From: Deep bidirectional intelligence: AlphaZero, deep IA-search, deep IA-infer, and TPC causal learning
(A) | Deep Scout A* (DSA) | Deep CNneim-A (DCA) | Deep Bi-Scout A* (DBA) | V-AlphaGoZero |
---|---|---|---|---|
Deep learning | \([v_{h} ,g](\varvec{s})~=~\varvec{f}_{\varvec{\theta }}~\varvec{(s)}\) | \([v_{h} ,\varvec{p}](\varvec{s})~=~\varvec{f}_{\varvec{\theta }}~\varvec{(s)}\) | ||
Selection step | Get expanding node \(\varvec{S}_{\mathbf{E}}\) by A* | In each child tree, get expanding node by A* | Get expanding node by A*; subtree scout for \({\mu }\) by A* | Get expanding node by DBFS |
Valuating | Equation (1) by f = g + h | Equation (1) by f = g + h | Equation (1) with \({\varvec{\mu }}\) replacing f | Q and p by Eq. (6) |
Moving policy | Frequency \({\pi }_{i}\) | Mean \({\mu }_{i}\) | Frequency \({\pi }_{i}\) | Frequency \({\pi }_{i}\) |
(B) | DSA-E | DCA-E | DBA-E | AlphaGoZero-E |
---|---|---|---|---|
Deep learning | \([v_{h}, g, \varvec{p}]_{\varvec{\theta }}(\varvec{s})~=~\varvec{f}_{\varvec{\theta }}~\varvec{(s)}\) | |||
Selection step | Get expanding node \(\varvec{S}_{\mathbf{E}}\) by DBFS-n-A* selection | |||
Bayesian valuation | Choose an action either stochastically according to \(\varvec{q}\) or greedily by \(\max _{a} q_{a}\), based on the posterior \(\varvec{q}=[q_{a}], q_{a}=p_{a}e_{a}/\varvec{p}^{T}{} \varvec{E}, \varvec{E}=[e_{a}]\) | |||
\(TypeQ: e_{a} = \rho (Q(s,s_{a}))~\mathrm{or}~e_{a} = \rho (r + v_{h}(s_{a})),~\mathrm{where}~s_{a} = a(s)\) | \(TypeF: e_{a} = \rho ({\mu }_{a}), {\varvec{\mu }} = [{\mu }_{a}]\) | \(TypeQ: e_{a} = Q(s,a)\) | ||
\(TypeF: e_{a} = \rho (f(s_{a})),~\mathrm{where}~s_{a} = a(s)\) | \(TypeF: e_{a} = \rho (f(s_{a}))\) | |||
If \(q_{a}\) is larger than a pre-specified threshold, put \(s_{a}\) into OPEN; otherwise put it into WAIT. When OPEN becomes empty, move some nodes from WAIT to OPEN. Note: \(\rho (r)\) is monotonically increasing for reward maximisation and monotonically decreasing for cost minimisation | ||||
OPEN revision | Revise f values in OPEN by back-forward propagation after each expansion | |||
Others | Same as DSA | Same as DCA | Same as DBA | Same as AlphaGoZero |
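
The Bayesian valuation row of table (B) can be illustrated with a minimal sketch. It computes the posterior \(q_{a}=p_{a}e_{a}/\varvec{p}^{T}\varvec{E}\) and routes each child into OPEN or WAIT by a threshold. Several parts are assumptions made only for illustration and are not specified by the table: the function names `rho`, `posterior`, `route` and `refill` are hypothetical, \(\rho\) is taken to be an exponential (any monotone map would fit the note in the table), and the refill rule promotes the `k` highest-\(q\) entries, whereas the table only says that some nodes are moved from WAIT to OPEN.

```python
import math
from typing import Dict, List, Tuple


def rho(x: float, maximise: bool = True) -> float:
    """Assumed monotone map: increasing for rewards, decreasing for costs."""
    return math.exp(x if maximise else -x)


def posterior(p: Dict[str, float], e: Dict[str, float]) -> Dict[str, float]:
    """Return q_a = p_a * e_a / (p^T E) for every action a of one node."""
    norm = sum(p[a] * e[a] for a in p)
    if norm <= 0:
        raise ValueError("p^T E must be positive")
    return {a: p[a] * e[a] / norm for a in p}


def route(q: Dict[str, float], threshold: float) -> Tuple[List[str], List[str]]:
    """Split actions into OPEN (q_a above the threshold) and WAIT (the rest)."""
    open_list = [a for a, qa in q.items() if qa > threshold]
    wait_list = [a for a, qa in q.items() if qa <= threshold]
    return open_list, wait_list


def refill(open_list: List[str], wait_list: List[str],
           q: Dict[str, float], k: int = 1) -> None:
    """If OPEN is empty, move the k highest-q entries from WAIT to OPEN.

    Choosing the top-k by q is an assumption of this sketch.
    """
    if open_list:
        return
    wait_list.sort(key=lambda a: q[a], reverse=True)
    open_list.extend(wait_list[:k])
    del wait_list[:k]


# TypeF-style example for cost minimisation: e_a = rho(f(s_a)), rho decreasing.
p = {"a1": 0.5, "a2": 0.3, "a3": 0.2}   # prior policy from the network
f = {"a1": 2.0, "a2": 1.0, "a3": 3.0}   # estimated costs f = g + h
e = {a: rho(f[a], maximise=False) for a in f}
q = posterior(p, e)
open_list, wait_list = route(q, threshold=0.3)
print(q, open_list, wait_list)
```

With all \(e_{a}\) equal, the posterior reduces to the prior \(\varvec{p}\), so the evidence term only reweights the network's policy rather than replacing it.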