# Table 1 Deep IA-search family

(A)

|  | Deep Scout A* (DSA) | Deep CNneim-A (DCA) | Deep Bi-Scout A* (DBA) | V-AlphaGoZero |
| --- | --- | --- | --- | --- |
| Deep learning | $[v_h, g](\mathbf{s}) = \mathbf{f}_{\theta}(\mathbf{s})$ | $[v_h, g](\mathbf{s}) = \mathbf{f}_{\theta}(\mathbf{s})$ | $[v_h, g](\mathbf{s}) = \mathbf{f}_{\theta}(\mathbf{s})$ | $[v_h, \mathbf{p}](\mathbf{s}) = \mathbf{f}_{\theta}(\mathbf{s})$ |
| Selection step | Get expanding node $\mathbf{S}_{\mathrm{E}}$ by A* | In each child tree, get expanding node by A* | Get expanding node by subtree scout for $\mu$, by A* | Get expanding node by DBFS |
| Valuating | Equation (1) by $f = g + h$ | Equation (1) by $f = g + h$ | Equation (1) with $\mu$ replacing $f$ | $Q$ and $\mathbf{p}$ by Eq. (6) |
| Moving policy | Frequency $\pi_i$ | Mean $\mu_i$ | Frequency $\pi_i$ | Frequency $\pi_i$ |
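The A*-style selection step shared by the part (A) algorithms can be sketched as follows: expand the OPEN node with minimal $f = g + h$, where the heuristic term ($v_h$ here) stands in for the learned network output $f_{\theta}(s)$. This is a minimal illustration; the helper names and the numeric values are assumptions, not from the paper.

```python
import heapq

# Sketch of the A*-style selection step: expand the OPEN node with the
# smallest f = g + h, where v_h plays the role of the learned heuristic.
# Helper names and numbers below are illustrative assumptions.

def push_node(open_heap, g, v_h, state):
    """Insert a state with cost-so-far g and learned heuristic v_h."""
    heapq.heappush(open_heap, (g + v_h, state))

def select_expanding_node(open_heap):
    """Selection step: pop the state with minimal f = g + h."""
    f, state = heapq.heappop(open_heap)
    return state

open_heap = []
push_node(open_heap, 3.0, 2.0, "s1")  # f = 5.0
push_node(open_heap, 1.0, 1.5, "s2")  # f = 2.5
expanded = select_expanding_node(open_heap)  # "s2" has the smaller f
```

A heap keeps the selection step O(log n) per expansion, which is why OPEN is conventionally maintained as a priority queue in A* implementations.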
(B) DSA-E, DCA-E, DBA-E, AlphaGoZero-E

- **Deep learning** (all variants): $[v_h, g, \mathbf{p}]_{\theta}(\mathbf{s}) = \mathbf{f}_{\theta}(\mathbf{s})$
- **Selection step** (all variants): get expanding node $\mathbf{S}_{\mathrm{E}}$ by DBFS-n-A* selection
- **Bayesian valuation** (all variants): make the action either stochastically by value $q$ or by $\max_a q_a$ upon the posterior $\mathbf{q} = [q_a]$, $q_a = p_a e_a / \mathbf{p}^{T}\mathbf{E}$, $\mathbf{E} = [e_a]$
  - DSA-E — TypeQ: $e_a = \rho(Q(s, s_a))$ or $e_a = \rho(r + v_h(s_a))$, where $s_a = a(s)$; TypeF: $e_a = \rho(f(s_a))$, where $s_a = a(s)$
  - DCA-E — TypeF: $e_a = \rho(\mu_a)$, $\boldsymbol{\mu} = [\mu_a]$
  - AlphaGoZero-E — TypeQ: $e_a = Q(s, a)$; TypeF: $e_a = \rho(f(s_a))$
  - If $q_a$ is larger than a pre-specified threshold, put $s_a$ into OPEN; otherwise put it into WAIT. When OPEN becomes empty, move some nodes from WAIT to OPEN. Note: $\rho(r)$ is monotonically increasing for reward maximisation or decreasing for cost minimisation
- **OPEN revision** (all variants): revise $f$ values in OPEN by back-forward propagation after each expansion
- **Others**: DSA-E is otherwise the same as DSA, DCA-E as DCA, DBA-E as DBA, and AlphaGoZero-E as AlphaGoZero
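The Bayesian valuation and OPEN/WAIT routing of part (B) can be sketched as follows: the posterior $q_a = p_a e_a / \mathbf{p}^{T}\mathbf{E}$ is computed from a prior policy and an evidence vector (e.g. TypeQ evidence $\rho(Q(s,a))$), and children are routed by a threshold. The concrete values of `p`, `e`, and the threshold `tau` are illustrative assumptions, not from the paper.

```python
# Sketch of the Bayesian valuation: posterior over actions
# q_a = p_a * e_a / (p^T E), then threshold routing into OPEN or WAIT.
# The prior p, evidence e, and threshold tau below are assumptions.

def posterior(p, e):
    """q_a = p_a e_a / sum_b p_b e_b (Bayes rule over the action set)."""
    z = sum(pa * ea for pa, ea in zip(p, e))
    return [pa * ea / z for pa, ea in zip(p, e)]

def route_children(q, children, tau):
    """Put child s_a into OPEN if q_a exceeds tau, else into WAIT;
    if OPEN ends up empty, move a node back from WAIT."""
    open_list, wait = [], []
    for qa, s in zip(q, children):
        (open_list if qa > tau else wait).append(s)
    if not open_list and wait:   # refill OPEN when it becomes empty
        open_list.append(wait.pop(0))
    return open_list, wait

p = [0.5, 0.3, 0.2]              # prior policy p = [p_a] from the network
e = [1.0, 2.0, 4.0]              # evidence E = [e_a], e.g. rho(Q(s, a))
q = posterior(p, e)              # ~ [0.263, 0.316, 0.421]
open_list, wait = route_children(q, ["s1", "s2", "s3"], tau=0.3)
```

Since $\rho$ is required to be monotone, the ordering of the posterior $q_a$ follows the ordering of the underlying rewards (or, reversed, the costs), so the threshold routing prunes consistently in both settings.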