Regret Minimization

Creator
Seonglae Cho
Created
2024 Nov 21 10:22
Edited
2024 Nov 21 11:59
Refs

Minimize the cumulative difference between the decisions made and the optimal choice

The reason we typically use regret minimization for Online Learning is similar to why we use a Replay Buffer for Sample efficiency in Off-policy learning. In online learning, data arrives sequentially, which makes it hard to train efficiently over all data points at once; yet accounting for all the data matters precisely because online learning deals with a changing environment, unlike traditional training. We compare against an optimal point because online learning must optimize gradually over unknown data, which is why the notions of Regret and the Optimal choice are useful.
Regret Minimization and Reward Maximization are philosophically opposite: the former focuses on past decisions while the latter focuses on future rewards. Regret itself is the cumulative performance difference of past decisions compared to the optimal choice.
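Written out, this is the standard cumulative-regret definition (using $\ell_t$ for the loss at round $t$ and $\mathcal{W}$ for the decision set, notation assumed here rather than taken from this note):
$$R_T = \sum_{t=1}^{T} \ell_t(w_t) - \min_{w \in \mathcal{W}} \sum_{t=1}^{T} \ell_t(w)$$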

Implementation is just a regularized Stochastic Gradient Descent.

Compared to plain loss minimization, we simply add a projection back onto the feasible decision space, since we do not know the optimal point. To implement this we use a restriction (projection) trick: it keeps the iterates close to the optimal choice while preventing the updates from exploding, as sketched below.
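A minimal sketch of this projected online gradient step, assuming squared-error losses on a stream of (x, y) pairs and an L2-ball feasible set; these choices are illustrative assumptions, not something specified in this note.

```python
import numpy as np

def project_l2_ball(w, radius=1.0):
    """Project w back onto the L2 ball of the given radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def online_gradient_descent(stream, dim, radius=1.0):
    """Play w_t, suffer its loss, take a gradient step, then project."""
    w = np.zeros(dim)
    losses = []
    for t, (x, y) in enumerate(stream, start=1):
        pred = w @ x
        losses.append(0.5 * (pred - y) ** 2)        # loss of the decision actually played
        grad = (pred - y) * x                        # gradient of the squared loss at w_t
        lr = 1.0 / np.sqrt(t)                        # decaying step size
        w = project_l2_ball(w - lr * grad, radius)   # projection keeps updates from exploding
    return np.array(losses)
```

Regret can then be estimated by comparing the accumulated losses to those of the best fixed point in hindsight, for example a least-squares fit over the whole stream.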
 
 
 
 
 
