Table 1 Deep IA-search family

From: Deep bidirectional intelligence: AlphaZero, deep IA-search, deep IA-infer, and TPC causal learning

(A) Columns: Deep Scout A* (DSA); Deep CNneim-A (DCA); Deep Bi-Scout A* (DBA); V-AlphaGoZero
Deep learning: DSA, DCA, DBA: \([v_{h}, g](\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\); V-AlphaGoZero: \([v_{h}, \mathbf{p}](\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\)
Selection step: DSA: get expanding node \(\mathbf{S}_{\mathrm{E}}\) by A*; DCA: in each child tree, get expanding node by A*; DBA: get expanding node by A*, with subtree scout for \(\mu\) by A*; V-AlphaGoZero: get expanding node by DBFS
Valuating: DSA: Equation (1) by \(f = g + h\); DCA: Equation (1) by \(f = g + h\); DBA: Equation (1) with \(\boldsymbol{\mu}\) replacing \(f\); V-AlphaGoZero: \(Q\) and \(\mathbf{p}\) by Eq. (6)
Moving policy: DSA: frequency \(\pi_{i}\); DCA: mean \(\mu_{i}\); DBA: frequency \(\pi_{i}\); V-AlphaGoZero: frequency \(\pi_{i}\)
(B) Columns: DSA-E; DCA-E; DBA-E; AlphaGoZero-E
Deep learning (all columns): \([v_{h}, g, \mathbf{p}]_{\boldsymbol{\theta}}(\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\)
Selection step (all columns): get expanding node \(\mathbf{S}_{\mathrm{E}}\) by DBFS-n-A* selection
Bayesian valuation (all columns): make an action either stochastically according to \(\mathbf{q}\) or by \(\max_{a} q_{a}\), based on the posterior \(\mathbf{q} = [q_{a}]\), \(q_{a} = p_{a} e_{a} / \mathbf{p}^{T}\mathbf{E}\), \(\mathbf{E} = [e_{a}]\).
TypeQ: \(e_{a} = \rho(Q(s, s_{a}))\) or \(e_{a} = \rho(r + v_{h}(s_{a}))\), where \(s_{a} = a(s)\); TypeF: \(e_{a} = \rho(\mu_{a})\), \(\boldsymbol{\mu} = [\mu_{a}]\); TypeQ: \(e_{a} = Q(s, a)\)
TypeF: \(e_{a} = \rho(f(s_{a}))\), where \(s_{a} = a(s)\); TypeF: \(e_{a} = \rho(f(s_{a}))\)
If \(q_{a}\) is larger than a pre-specified threshold, put \(s_{a}\) into OPEN; otherwise, put it into WAIT. When OPEN becomes empty, move some nodes from WAIT to OPEN. Note: \(\rho(r)\) is monotonically increasing for reward maximisation or decreasing for cost minimisation.
OPEN revision (all columns): revise the \(f\) values in OPEN by back-forward propagation after each expansion.
Others: DSA-E: same as DSA; DCA-E: same as DCA; DBA-E: same as DBA; AlphaGoZero-E: same as AlphaGoZero
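The selection and valuating rows of part (A) rest on A*. In A*, the expanding node \(\mathbf{S}_{\mathrm{E}}\) is the OPEN node with the smallest \(f = g + h\). The minimal Python sketch below shows plain A* and assumes the heuristic \(h\) is supplied as a callable, which could be a stand-in for the learned \(v_h\) from \(\mathbf{f}_{\boldsymbol{\theta}}\). The names `a_star`, `neighbors`, and `h` are hypothetical. The sketch does not model the scout, child-tree, or bidirectional refinements of DSA, DCA, and DBA.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def a_star(
    start: State,
    goal: State,
    neighbors: Callable[[State], Iterable[Tuple[State, float]]],
    h: Callable[[State], float],
) -> Optional[List[State]]:
    """Return a lowest-cost path from start to goal, or None if unreachable.

    Each iteration pops the OPEN node with the smallest f = g + h, i.e. the
    expanding node S_E. h is assumed to be a (possibly learned) heuristic.
    """
    g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}
    # Heap entries: (f, tie-breaker, g at push time, state); the unique
    # tie-breaker means states themselves are never compared.
    open_heap: List[Tuple[float, int, float, State]] = [(h(start), 0, 0.0, start)]
    counter = 1
    while open_heap:
        _, _, g_at_push, s = heapq.heappop(open_heap)
        if g_at_push > g[s]:
            continue  # stale entry: a cheaper path to s was found later
        if s == goal:
            path: List[State] = []
            node: Optional[State] = s
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt, cost in neighbors(s):
            new_g = g[s] + cost
            if new_g < g.get(nxt, float("inf")):
                g[nxt] = new_g
                parent[nxt] = s
                heapq.heappush(open_heap, (new_g + h(nxt), counter, new_g, nxt))
                counter += 1
    return None


if __name__ == "__main__":
    graph = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0), ("D", 5.0)],
             "C": [("D", 1.0)], "D": []}
    print(a_star("A", "D", lambda s: graph[s], lambda s: 0.0))  # ['A', 'B', 'C', 'D']
```

With the zero heuristic, this reduces to uniform-cost search. Storing \(g\) in each heap entry lets the loop skip stale entries after a cheaper path to the same node is found, instead of searching the heap to update it.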
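Part (B) reweights a prior \(\mathbf{p}\) over actions by \(e_a\) to obtain the posterior \(q_a = p_a e_a / \mathbf{p}^{T}\mathbf{E}\). It then uses \(q_a\) both to choose a move and to route each child \(s_a\) into OPEN or WAIT. The minimal Python sketch below follows that rule under several assumptions:

* \(\rho(x) = e^{-x/\tau}\) is one hypothetical monotonically decreasing choice for cost minimisation, used here with the TypeF form \(e_a = \rho(f(s_a))\).
* Refilling OPEN with the \(k\) highest-\(q\) nodes from WAIT is an assumption, because the table only says to move "some".
* All function names are illustrative.

```python
import math
import random
from typing import Hashable, List, Sequence, Tuple

Node = Tuple[Hashable, float]  # (child state s_a, posterior q_a)


def posterior(prior: Sequence[float], e: Sequence[float]) -> List[float]:
    """q_a = p_a * e_a / (p^T E): reweight the prior p by E = [e_a], renormalise."""
    z = sum(p_a * e_a for p_a, e_a in zip(prior, e))
    if z <= 0.0:
        raise ValueError("p^T E must be positive")
    return [p_a * e_a / z for p_a, e_a in zip(prior, e)]


def rho_cost(x: float, tau: float = 1.0) -> float:
    """Assumed rho: monotonically decreasing, so a lower cost f gives a larger e_a.
    For reward maximisation an increasing rho such as exp(x / tau) would be used."""
    return math.exp(-x / tau)


def choose_action(q: Sequence[float], stochastic: bool, rng: random.Random) -> int:
    """Pick an action index either by sampling from q or by argmax_a q_a."""
    if stochastic:
        return rng.choices(range(len(q)), weights=q, k=1)[0]
    return max(range(len(q)), key=lambda a: q[a])


def split_open_wait(children: Sequence[Hashable], q: Sequence[float],
                    threshold: float) -> Tuple[List[Node], List[Node]]:
    """Children with q_a above the threshold go to OPEN, the rest to WAIT."""
    open_list = [(s, qa) for s, qa in zip(children, q) if qa > threshold]
    wait_list = [(s, qa) for s, qa in zip(children, q) if qa <= threshold]
    return open_list, wait_list


def refill_open(open_list: List[Node], wait_list: List[Node], k: int = 1) -> None:
    """If OPEN is empty, move the k highest-q WAIT nodes into it (k is assumed)."""
    if not open_list and wait_list:
        wait_list.sort(key=lambda node: node[1], reverse=True)
        open_list.extend(wait_list[:k])
        del wait_list[:k]


if __name__ == "__main__":
    prior = [0.5, 0.3, 0.2]    # p, e.g. as produced by the learned f_theta
    child_f = [2.0, 1.0, 3.0]  # f(s_a) for each child, treated as costs
    q = posterior(prior, [rho_cost(f) for f in child_f])
    print([round(x, 3) for x in q])  # [0.36, 0.587, 0.053]
    print(choose_action(q, stochastic=False, rng=random.Random(0)))  # 1
    print(split_open_wait(["s0", "s1", "s2"], q, threshold=0.1))
```

In this example, the child with the lowest cost has the largest posterior even though its prior is not the largest. The threshold then decides whether a child is expanded soon (OPEN) or held back (WAIT).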
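The moving-policy row distinguishes a frequency policy \(\pi_i\) from a mean \(\mu_i\). The minimal Python sketch below makes two assumptions:

* \(\pi_i\) is the normalised visit count of move \(i\), using the AlphaGo Zero temperature convention \(\pi_i \propto N_i^{1/\tau}\).
* \(\mu_i\) is the average of the values backed up through move \(i\), where a larger value is better. For costs, take the minimum instead.

The function names and the temperature parameter are illustrative additions that the table does not specify.

```python
from typing import Dict, Hashable, List


def frequency_policy(visits: Dict[Hashable, int], tau: float = 1.0) -> Dict[Hashable, float]:
    """pi_i proportional to N_i ** (1 / tau); smaller tau sharpens toward greedy play.
    Assumes at least one move has a positive visit count."""
    weights = {move: n ** (1.0 / tau) for move, n in visits.items()}
    total = sum(weights.values())
    return {move: w / total for move, w in weights.items()}


def mean_policy_move(values: Dict[Hashable, List[float]]) -> Hashable:
    """Return the move whose backed-up values have the largest mean mu_i."""
    means = {move: sum(v) / len(v) for move, v in values.items() if v}
    return max(means, key=lambda move: means[move])


if __name__ == "__main__":
    print(frequency_policy({"a": 3, "b": 1}))                # {'a': 0.75, 'b': 0.25}
    print(mean_policy_move({"a": [1.0, 3.0], "b": [2.5]}))  # b
```

A frequency policy favours moves the search visited most often. A mean policy can prefer a less-visited move whose average value is higher, as move `b` shows here.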