### MCTS

Monte Carlo Tree Search (MCTS) approximates exhaustive tree search by sampling, which makes it far cheaper than brute-force enumeration. MCTS is an inference-time procedure, not a training procedure.

Used in AlphaGo.

#### How to build a tree

We don't train the tree policy here; MCTS is an inference step run at each move of the game.

- Selection: Traverse the tree from the root using the (relatively slow) tree policy until reaching a node that still has untried actions.

- Expansion: Add a new child node for one of the untried actions.

- Simulation: From the new node, run a rollout to a terminal state using a fast default policy (e.g., random moves) and record the reward (win/lose); this evaluates the new state.

- Backpropagation: Propagate the simulation result back up the path to the root, updating the visit and win counts of every node along the way.

After the iterations finish, the most frequently visited action at the root is chosen.
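The four phases above can be sketched end-to-end. Below is a minimal illustration on a toy Nim game (take 1–3 stones; whoever takes the last stone wins) — the choice of game, the exploration constant `c=1.4`, and the iteration count are my own choices for the example, not from the notes:

```python
import math
import random

# Toy game for illustration: one-pile Nim, take 1-3 stones, taking the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones            # stones remaining
        self.player = player            # player to move (0 or 1)
        self.parent = parent
        self.move = move                # move that led to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who just moved into this node

    def uct_child(self, c=1.4):
        # Tree policy: UCT balances exploitation (mean win rate)
        # against exploration (visit-count bonus).
        return max(self.children, key=lambda ch:
                   ch.wins / ch.visits + c * math.sqrt(math.log(self.visits) / ch.visits))

def rollout(stones, player):
    # Default policy: play uniformly random moves to the end; return the winner.
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        player = 1 - player
    return 1 - player  # the player who took the last stone wins

def mcts(root_stones, root_player, iterations=2000):
    root = Node(root_stones, root_player)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend via UCT while the node is fully expanded.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for an untried action.
        if node.untried:
            m = node.untried.pop()
            node = Node(node.stones - m, 1 - node.player, parent=node, move=m)
            node.parent.children.append(node)
        # 3. Simulation: fast random rollout from the new state (terminal states score directly).
        winner = rollout(node.stones, node.player) if node.stones > 0 else 1 - node.player
        # 4. Backpropagation: update visit/win counts up to the root.
        while node is not None:
            node.visits += 1
            if winner == 1 - node.player:   # a win for the player who moved into this node
                node.wins += 1
            node = node.parent
    # Final decision: the most frequently visited root action.
    return max(root.children, key=lambda ch: ch.visits).move
```

With 5 stones the winning move is to take 1 (leaving a multiple of 4), and MCTS converges to it quickly on a tree this small.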

We give rarely visited nodes a chance by adding an exploration incentive (a visit-count bonus) to their selection score.
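A common concrete form of this incentive is the UCB1/UCT score; a minimal sketch (the exploration constant `c` is the usual default, not something specified in these notes):

```python
import math

def uct_score(total_reward, child_visits, parent_visits, c=1.4):
    # Mean reward (exploitation) plus a bonus that is large for rarely
    # visited children (exploration); c trades the two terms off.
    if child_visits == 0:
        return float("inf")  # unvisited children are always tried first
    return total_reward / child_visits + c * math.sqrt(math.log(parent_visits) / child_visits)
```

The bonus shrinks as a child accumulates visits, so attention shifts toward children with genuinely high win rates.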

### Efficient MCTS

Prune out less likely actions by searching only plausible states (reduced breadth), and reduce depth by estimating the expected result (win/lose) without running a full simulation.

Specifically, AlphaGo reduces depth with a value network and reduces breadth with a policy network.
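A sketch of the two reductions, where `policy_net` and `value_net` are hypothetical stand-ins for trained networks (here just stubs), and `top_k` is an illustrative pruning choice:

```python
# Illustrative stubs only -- in AlphaGo these would be trained neural networks.
def legal_moves(state):
    # Toy state: an integer pile, actions take 1-3 (same toy game as before).
    return [a for a in (1, 2, 3) if a <= state]

def policy_net(state):
    # Stub prior: uniform probability over legal actions.
    moves = legal_moves(state)
    return {m: 1.0 / len(moves) for m in moves}

def value_net(state):
    # Stub value estimate in [-1, 1] for the side to move.
    return 0.0

def expand(state, top_k=2):
    # Reduced breadth: keep only the top-k actions under the policy prior,
    # instead of expanding every legal action.
    priors = policy_net(state)
    return sorted(priors, key=priors.get, reverse=True)[:top_k]

def evaluate(state):
    # Reduced depth: ask the value network for an estimate
    # instead of simulating the game to the end.
    return value_net(state)
```

(AlphaGo actually mixes the value-network estimate with a fast rollout; the stub above shows only the depth-reduction idea.)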

In the selection rule, the value term encourages exploitation, while the prior/visit-count bonus term encourages exploration.