Episode, Trajectory

Rollout
$(s_0, a_0, s_1, a_1, \dots, s_T)$

Policy Rollout Techniques
- Replay Buffer
- Epsilon Greedy
- RL Frame skip
- RL Target Network
- Entropy Bonus
- Maximum Entropy Objective
- UCB exploration

A trajectory can end in two ways:
- catastrophic failure, like crashing
- truncation, like exceeding the maximum episode length

Rollout policy: a blog that posts papers and knowledge related to AI.
https://ai-information.blogspot.com/2019/03/rollout-policy.html
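
To make the two ways a trajectory can end concrete, here is a minimal sketch of a rollout loop that records $(s_0, a_0, s_1, a_1, \dots, s_T)$ and reports whether the episode stopped on a crash or on the step limit. The `ToyCorridor` environment, the `rollout` function, and the `"terminated"` / `"truncated"` labels are hypothetical names made up for illustration; they are not taken from any particular library.

```python
class ToyCorridor:
    """Hypothetical 1-D environment: start at position 0, crash at position `wall`."""

    def __init__(self, wall=5):
        self.wall = wall
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the position cannot go below 0
        self.pos = max(0, self.pos + action)
        # catastrophic failure: the agent crashes into the wall
        terminated = self.pos >= self.wall
        return self.pos, terminated


def rollout(env, policy, max_steps):
    """Collect [s_0, a_0, s_1, a_1, ..., s_T] and report how the episode ended."""
    state = env.reset()
    trajectory = [state]
    for _ in range(max_steps):
        action = policy(state)
        state, terminated = env.step(action)
        trajectory += [action, state]
        if terminated:
            return trajectory, "terminated"  # ended by failure (crash)
    return trajectory, "truncated"  # ended by hitting the maximum episode length


def always_right(state):
    """Trivial policy that always moves right."""
    return 1


crash_traj, crash_reason = rollout(ToyCorridor(wall=3), always_right, max_steps=10)
short_traj, short_reason = rollout(ToyCorridor(wall=3), always_right, max_steps=2)
print(crash_traj, crash_reason)
print(short_traj, short_reason)
```

The same policy and environment produce both outcomes here: with a generous step limit the agent reaches the wall and the episode terminates, while with `max_steps=2` the rollout is cut off before the crash. Keeping the two cases distinct matters when bootstrapping value estimates, since a truncated final state is not a true terminal state.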