Monte Carlo Tree Search

Creator
Creator
Alan JoAlan Jo
Created
Created
2023 Feb 17 18:9
Editor
Editor
Alan JoAlan Jo
Edited
Edited
2024 Jul 3 15:50
Refs
Refs

MCTS

Approximate tree search with sampling which is faster than brute force. MCTS is for inference, not training.
Alpha Go
에서 사용
notion image

How to build a tree

We don’t train tree policy here, this is kind of inference step for game.
  1. Selection: Traverse a tree using slow tree policy until it needs to expand a new node.
  1. Expansion:
  1. Simulation: Run a simulation using a fast default policy and get reward (win/lose), evaluate the new state using default policy.
  1. Backpropagation
After iteration, most frequently visited action will be chosen.
We give a change by adding incentive to barely visited node.
 
notion image
 

Efficient MCTS

Pruned out less likely actions by only search plausible states (reduced breadth) and reduced depth by estimate expected results (win/lose) without simulation.
Specifically, reduced depth with value network and reduced breadth with policy network.
encourages exploitation vs encourages exploration
 
 
 

For LLM (LLaMa8b → GPT4) Monte Carlo Tree Self-refine

AI search matters

 
 
 

Recommendations