Maps situations to actions so as to maximize a numerical reward signal; the learned mapping is the policy model
An approach to sequential decision-making problems (sequential decision making is everywhere)
Reinforcement learning is the process of solving a Markov decision process (MDP)
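As an illustration of what "solving an MDP" means when the transition probabilities and rewards are fully known, here is a minimal value-iteration sketch in Python. The three-state MDP, its action names, and the helper names are hypothetical. Value iteration is one standard solver chosen for illustration, not something these notes prescribe.

```python
from typing import Dict, List, Tuple

# P[state][action] -> list of (probability, next_state, reward); an empty dict marks a terminal state.
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]

def q_value(P: MDP, V: Dict[int, float], s: int, a: str, gamma: float) -> float:
    """Expected return of taking action a in state s, then following the values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state: its value stays 0
                continue
            # Bellman optimality backup: value of the best action under the current estimates.
            best = max(q_value(P, V, s, a, gamma) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: q_value(P, V, s, a, gamma))
              for s, actions in P.items() if actions}
    return V, policy

# Hypothetical 3-state chain: moving right from state 1 reaches terminal state 2 with reward 1.
P: MDP = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}
V, policy = value_iteration(P)
```

Here `V` converges to roughly {0: 0.9, 1: 1.0, 2: 0.0} and the greedy policy chooses "right" in both non-terminal states.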
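Unlike supervised and unsupervised learning, which assume i.i.d. data, RL must deal with compounding error: each action determines the next state, so a small mistake can lead the agent into states unlike those it was trained on, where further mistakes become more likely. RL addresses this compounding error at every time step by accounting for the resulting distribution shift. Supervised learning, by contrast, does not account for distribution shift at each inference, even on time-series data; at most it handles it indirectly within the model.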
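The gap between i.i.d. error and compounding error can be made concrete with a simplified worst-case model, which is an assumption for illustration: the policy errs with probability eps per step, and after its first mistake it stays off the training distribution and keeps erring for the rest of the episode. Since 1 - (1 - eps)^t <= eps * t, the expected number of mistakes over horizon T is at most eps * T(T+1)/2, i.e. O(eps * T^2), compared with eps * T in the i.i.d. case. The function names below are hypothetical.

```python
def expected_mistakes_iid(eps: float, T: int) -> float:
    """i.i.d. setting: each of the T predictions errs independently with probability eps."""
    return eps * T

def expected_mistakes_compounding(eps: float, T: int) -> float:
    """Simplified sequential setting (assumed for illustration): the first mistake moves the
    agent off the training distribution, and it keeps making mistakes for the rest of the episode.
    The probability of being off-distribution at step t is 1 - (1 - eps)**t."""
    return sum(1 - (1 - eps) ** t for t in range(1, T + 1))

eps, T = 0.01, 100
print(expected_mistakes_iid(eps, T))          # roughly 1
print(expected_mistakes_compounding(eps, T))  # roughly 37, far more than eps * T
print(eps * T ** 2)                           # the O(eps * T^2) upper bound, 100
```

With the same per-step error rate, the sequential setting accumulates dozens of times more mistakes over 100 steps, which is why per-step accuracy alone is not enough in RL.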
In actual training, the data structure is the same, but rewards come from the environment rather than from ground-truth labels as in supervised learning. In practice, the difference between supervised learning and reinforcement learning lies in whether the learner interacts with the environment: online (collecting new data by acting), offline (learning from a fixed, previously collected dataset), or through a model of the environment (model-based approaches).
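The point that the data looks alike but the learning signal comes from a different place can be sketched in Python. `ChainEnv`, `Transition`, and `collect_online` are hypothetical names chosen for illustration, not an existing library API. Supervised data is a fixed list of (input, label) pairs; RL transitions are produced by acting, and the environment returns the reward. An offline dataset is simply such transitions stored and reused without further interaction.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    state: int
    action: int
    reward: float
    next_state: int

class ChainEnv:
    """Hypothetical toy environment: states 0..n-1; action 1 moves right, action 0 moves left.
    Reaching the last state yields reward 1 and ends the episode."""
    def __init__(self, n_states: int = 5):
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def collect_online(env: ChainEnv, policy, n_steps: int):
    """Online: the agent acts, and the environment (not a label) supplies the reward."""
    data = []
    state = env.reset()
    for _ in range(n_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        data.append(Transition(state, action, reward, next_state))
        state = env.reset() if done else next_state
    return data

# Supervised learning: targets are fixed ground-truth labels stored in the dataset.
supervised_data = [(0, 1), (1, 1), (2, 1)]  # (input, label) pairs

# Reinforcement learning: rewards come back from the environment after acting.
rng = random.Random(0)
online_data = collect_online(ChainEnv(), lambda s: rng.choice([0, 1]), n_steps=50)

# Offline RL: reuse a fixed, previously collected batch without further interaction.
offline_data = list(online_data)
```

For example, an always-right policy over 8 steps yields rewards [0, 0, 0, 1, 0, 0, 0, 1]: the reward appears only when the environment reports reaching the goal.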
Indirect supervision: an agent defined within an environment observes its current state and chooses, among the available actions, the one expected to maximize reward (it is never told the correct action directly)
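To make "indirect supervision" concrete, here is a minimal tabular Q-learning sketch in Python. Q-learning is one standard algorithm, swapped in here for illustration; the chain environment, the hyperparameters (`alpha`, `gamma`, `epsilon`), and all names are hypothetical. The agent only ever receives rewards, never the correct action, yet its greedy policy ends up moving right toward the goal.

```python
import random

N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = left, 1 = right

def step(state: int, action: int):
    """Hypothetical deterministic chain environment: reward 1 only on reaching GOAL."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: usually pick the action with the highest estimated return,
            # sometimes explore; ties are broken at random.
            if rng.random() < epsilon or Q[state][0] == Q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: Q[state][a])
            next_state, reward, done = step(state, action)
            # The only feedback is the reward; the agent is never told the correct action.
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q

Q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(GOAL)]
```

Unlike planning with a known model, this agent learns purely from the (state, action, reward, next state) feedback returned by `step`; because Q-learning is off-policy, the exploratory moves do not bias the values it converges to.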
Reinforcement Learning Notion
Reinforcement Learning Applications
OpenAI
CS285