
Table 1 Deep IA-search family

From: Deep bidirectional intelligence: AlphaZero, deep IA-search, deep IA-infer, and TPC causal learning

(A)

| | Deep Scout A* (DSA) | Deep CNneim-A (DCA) | Deep Bi-Scout A* (DBA) | V-AlphaGoZero |
|---|---|---|---|---|
| Deep learning | \([v_{h}, g](\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\) | \([v_{h}, g](\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\) | \([v_{h}, g](\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\) | \([v_{h}, \mathbf{p}](\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\) |
| Selection step | Get expanding node \(\mathbf{S}_{E}\) by A* | In each child tree, get expanding node by A* | Get expanding node by subtree scout for \(\mu\), by A* | Get expanding node by DBFS |
| Valuating | Equation (1) with \(f = g + h\) | Equation (1) with \(f = g + h\) | Equation (1) with \(\mu\) replacing \(f\) | \(Q\) and \(\mathbf{p}\) by Eq. (6) |
| Moving policy | Frequency \(\pi_{i}\) | Mean \(\mu_{i}\) | Frequency \(\pi_{i}\) | Frequency \(\pi_{i}\) |
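To make the shared mechanics of the (A) variants concrete, here is a minimal sketch of A*-style selection with \(f = g + h\) and a frequency-based moving policy \(\pi_{i}\). The toy graph, the hand-set \(v_{h}\) values, and the expansion budget are illustrative assumptions only; in the paper, \(v_{h}\) and \(g\) come from the learned network \(\mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\).

```python
import heapq
from collections import defaultdict

# Toy search graph (state -> [(child, edge cost)]) and hand-set v_h values;
# in the paper, v_h (and g) would come from the learned network f_theta(s).
graph = {'s0': [('s1', 1.0), ('s2', 2.0)],
         's1': [('s3', 1.0)],
         's2': [('s3', 0.5)],
         's3': []}
v_h = {'s0': 2.0, 's1': 1.0, 's2': 0.5, 's3': 0.0}

def a_star_moving_policy(root, n_expansions=10):
    """Selection step: repeatedly pop the OPEN node with minimal f = g + h.
    Moving policy: the frequency pi_i with which each root move occurs
    among the expanded nodes."""
    g_cost = {root: 0.0}
    visits = defaultdict(int)          # expansion counts per root move
    counter = 0                        # tie-breaker for equal f values
    open_heap = [(v_h[root], counter, root, None)]  # (f, tie, state, root move)
    while open_heap and n_expansions > 0:
        _, _, s, root_move = heapq.heappop(open_heap)
        if root_move is not None:
            visits[root_move] += 1
        for child, edge_cost in graph[s]:
            g = g_cost[s] + edge_cost
            if g < g_cost.get(child, float('inf')):  # keep the cheaper path
                g_cost[child] = g
                counter += 1
                move = child if root_move is None else root_move
                heapq.heappush(open_heap, (g + v_h[child], counter, child, move))
        n_expansions -= 1
    total = sum(visits.values()) or 1
    return {move: n / total for move, n in visits.items()}

print(a_star_moving_policy('s0'))   # e.g. {'s1': 0.67, 's2': 0.33}
```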

(B)

| | DSA-E | DCA-E | DBA-E | AlphaGoZero-E |
|---|---|---|---|---|
| Deep learning | \([v_{h}, g, \mathbf{p}]_{\boldsymbol{\theta}}(\mathbf{s}) = \mathbf{f}_{\boldsymbol{\theta}}(\mathbf{s})\) (all four variants) | | | |
| Selection step | Get expanding node \(\mathbf{S}_{E}\) by DBFS-n-A* selection (all four variants) | | | |
| Bayesian valuation | Make the action either stochastically by value \(q\) or by \(\max_{a} q_{a}\), upon the posterior \(\mathbf{q} = [q_{a}]\), \(q_{a} = p_{a} e_{a} / \mathbf{p}^{T}\mathbf{E}\), \(\mathbf{E} = [e_{a}]\) (all four variants) | | | |
| Evidence \(e_{a}\) | TypeQ: \(e_{a} = \rho(Q(s, s_{a}))\) or \(e_{a} = \rho(r + v_{h}(s_{a}))\), where \(s_{a} = a(s)\); TypeF: \(e_{a} = \rho(f(s_{a}))\), where \(s_{a} = a(s)\) | TypeF: \(e_{a} = \rho(\mu_{a})\), \(\boldsymbol{\mu} = [\mu_{a}]\) | TypeF: \(e_{a} = \rho(\mu_{a})\), \(\boldsymbol{\mu} = [\mu_{a}]\) | TypeQ: \(e_{a} = Q(s, a)\); TypeF: \(e_{a} = \rho(f(s_{a}))\) |
| OPEN revision | Revise \(f\) values in OPEN by back-forward propagation after each expansion | | | |
| Others | Same as DSA | Same as DCA | Same as DBA | Same as AlphaGoZero |

If \(q_{a}\) is larger than a pre-specified threshold, put \(s_{a}\) into OPEN, otherwise into WAIT; when OPEN becomes empty, move some nodes from WAIT back to OPEN. Note: \(\rho(r)\) is monotonically increasing for reward maximisation or decreasing for cost minimisation.
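The Bayesian valuation shared by the (B) variants is a one-line computation; a minimal sketch follows. Only the posterior \(q_{a} = p_{a} e_{a} / \mathbf{p}^{T}\mathbf{E}\) and the OPEN/WAIT rule come from the table: the choice \(\rho = \exp\), the threshold value, and the toy priors and \(f(s_{a})\) numbers are assumptions for illustration.

```python
import random
from math import exp

def rho(r):
    """Assumed monotonically increasing rho for reward maximisation
    (per the table's note); exp is one simple choice."""
    return exp(r)

def bayesian_valuation(p, f_values, stochastic=True):
    """Posterior q = [q_a] with q_a = p_a * e_a / p^T E, E = [e_a],
    using TypeF evidence e_a = rho(f(s_a)); the action is chosen either
    stochastically by q or deterministically by max_a q_a."""
    e = [rho(f) for f in f_values]
    z = sum(pa * ea for pa, ea in zip(p, e))   # p^T E
    q = [pa * ea / z for pa, ea in zip(p, e)]
    if stochastic:
        a = random.choices(range(len(q)), weights=q, k=1)[0]
    else:
        a = max(range(len(q)), key=lambda i: q[i])
    return a, q

def route_children(q_by_child, threshold=0.2):
    """If q_a exceeds a pre-specified threshold, the child goes to OPEN,
    otherwise to WAIT; when OPEN empties, refill it from WAIT
    (here simply the first waiting node)."""
    OPEN = [a for a, qa in q_by_child.items() if qa > threshold]
    WAIT = [a for a, qa in q_by_child.items() if qa <= threshold]
    if not OPEN and WAIT:
        OPEN.append(WAIT.pop(0))
    return OPEN, WAIT

p = [0.5, 0.3, 0.2]           # prior move probabilities p from f_theta(s)
f_values = [0.1, 0.4, -0.2]   # assumed f(s_a) for the three candidate moves
a, q = bayesian_valuation(p, f_values, stochastic=False)
print(a, [round(qa, 3) for qa in q])        # 0 [0.475, 0.385, 0.141]
print(route_children(dict(enumerate(q))))   # ([0, 1], [2])
```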