Abstract
Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have achieved superhuman performance in many challenging tasks. However, the computational complexity of MCTS-based algorithms is influenced by the size of the search space. To address this issue, we propose a novel probability tree state abstraction algorithm to improve the search efficiency of MCTS. A general tree state abstraction with path transitivity is defined. In addition, the probability tree state abstraction is proposed for fewer mistakes during the aggregation step. Furthermore, the theoretical guarantees of the transitivity and aggregation error bound are justified. To evaluate the effectiveness of the PTSA algorithm, we integrate it with state-of-the-art MCTS-based algorithms, such as Sampled MuZero and Gumbel MuZero. Experimental results on different tasks demonstrate that our method can accelerate the training process of state-of-the-art algorithms with 10%-45% search space reduction.
Method
The proposed method introduces the Probability Tree State Abstraction (PTSA) algorithm to enhance Monte Carlo Tree Search (MCTS) efficiency by reducing the search space and identifying the smallest abstract space through transitive probability tree state abstraction. PTSA formalizes tree state abstraction using predicates to group paths and nodes into abstract clusters while preserving the Markov property. It defines path transitivity to extend state abstraction equivalence to tree structures, ensuring consistent abstraction. This approach maps the original path space to an abstracted space, addressing the NP-hard challenge of minimizing abstract states while maintaining computational feasibility and improving MCTS performance.
Fig. 1 The overview structure of \textit{PTSA} algorithm. The original tree search space in MCTS is mapped into a smaller abstract space efficiently by transitive probability tree state abstraction
Result
Experiments results on five Atari games with a normalized score plot. Sampled MuZero with PTSA (PTSAZero) is compared with three state-of-the-art MCTS-based methods.
The aggregation percentage on paths during the training process on Atari, Control, and Gomoku tasks varies as the network parameters are updated.
Speedup comparison of PTSAZero with different state abstraction functions, where Abs denotes the different tree state abstraction functions.