Monte Carlo Tree Search (MCTS) is used because the full game tree is too large to simulate exhaustively, and there is a time limit for choosing each action.
Use a default (rollout) policy for quick simulation. The default policy takes about 1 microsecond per move, whereas the tree policy takes about 1 ms.
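
A minimal sketch of how a tree policy and a default policy can fit together in MCTS. Everything here is an assumption for illustration and not part of the notes: the toy game (one pile of stones, take 1 to 3, whoever takes the last stone wins), the UCT selection rule used as the tree policy, the uniformly random playout used as the default policy, and all function names.

```python
import math
import random

# Toy game (assumed for illustration): players alternate taking 1-3 stones
# from one pile; whoever takes the last stone wins.
def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones            # stones left for the player to move
        self.parent = parent
        self.move = move                # move that led from parent to this node
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who moved INTO this node

def tree_policy_select(node, c=1.4):
    # UCT: balance a high win rate against rarely visited children.
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def default_policy_rollout(stones, rng):
    # Cheap random playout. Returns 1 if the player to move at the start wins.
    turn = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1 if turn == 0 else 0
        turn ^= 1
    return 0  # no stones at the start: the player to move has already lost

def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the tree policy while the node is fully expanded.
        while not node.untried and node.children:
            node = tree_policy_select(node)
        # 2. Expansion: add one new child if any move is still untried.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: finish the game with the fast default policy.
        reward = 1 - default_policy_rollout(node.stones, rng)
        # 4. Backpropagation: flip the reward at each level (players alternate).
        while node is not None:
            node.visits += 1
            node.wins += reward
            reward = 1 - reward
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```

Calling mcts(3) searches from a pile of 3 stones, where taking all 3 wins immediately. The iteration count stands in for the time limit: the search can be stopped at any point and still return its current best move.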
AlphaGo was pre-trained on human game data and then applied self-play with an efficient MCTS. The outcomes of the self-play games are used as the training signal for further training.
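
A rough sketch of the "train on the result of self-play" loop, under heavy simplifying assumptions that are not from the notes: a lookup table stands in for AlphaGo's neural networks, a one-step greedy lookahead with random exploration stands in for MCTS, and the game is a toy pile game (take 1 to 3 stones, last stone wins). All names and parameters are hypothetical.

```python
import random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def self_play_game(start, value, rng, epsilon=0.2):
    # Play one game against itself (assumes start > 0).
    # Returns the visited (stones, player) pairs and the winner (0 or 1).
    stones, player, history = start, 0, []
    while stones > 0:
        history.append((stones, player))
        moves = legal_moves(stones)
        if rng.random() < epsilon:
            move = rng.choice(moves)    # explore
        else:
            # Leave the opponent in the position with the lowest estimated value.
            move = min(moves, key=lambda m: value.get(stones - m, 0.5))
        stones -= move
        if stones == 0:
            return history, player      # this player took the last stone
        player = 1 - player

def train(start=10, games=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # value[s]: estimated win probability for the player to move with s stones.
    value = {0: 0.0}                    # no stones left: the player to move has lost
    for _ in range(games):
        history, winner = self_play_game(start, value, rng)
        # Move every visited position toward the final game outcome.
        for stones, player in history:
            target = 1.0 if player == winner else 0.0
            old = value.get(stones, 0.5)
            value[stones] = old + lr * (target - old)
    return value
```

The point of the sketch is the feedback loop: the current value estimates drive move selection during self-play, and the game results are then used to update those same estimates.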