Minimize the cumulative difference of our decisions compared to the optimal choice
The reason we typically use regret minimization in online learning is analogous to why we use a replay buffer for sample efficiency in off-policy learning. In online learning, data arrives sequentially, so we cannot efficiently train on all data points at once; yet a criterion that accounts for the whole sequence is appropriate, because online learning must cope with a changing environment, unlike traditional batch training. We also compare against an optimal point because online learning optimizes gradually on data that is unknown in advance, which is exactly where the notions of regret and the optimum in hindsight become useful.
Regret minimization and reward maximization are philosophically opposite: the former evaluates the decisions already made against the past, while the latter focuses on future rewards.
Regret is the cumulative performance difference of past decisions compared to the optimal choice.
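As a minimal formalization, using standard online-convex-optimization notation (the symbols $f_t$, $x_t$, and $\mathcal{X}$ are assumptions, not taken from these notes): if we play $x_t$ at round $t$ and then observe the loss $f_t$, the regret after $T$ rounds is

$$
R_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x),
$$

i.e., the cumulative loss of the decisions actually made minus the loss of the single best fixed decision in hindsight.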
The implementation is essentially regularized Stochastic Gradient Descent (see the sketch below).
Compared to plain loss minimization, we simply add a projection onto the feasible set in order to minimize regret, and we never know the optimal point in advance.
To implement this, we use a restriction (projection) trick: it keeps the iterate close to the optimal point while preventing the updates from exploding.
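A minimal sketch of this idea, assuming projected Online Gradient Descent on a convex feasible set (here an L2 ball); the helper names `project_l2_ball` and `online_gradient_descent`, the step size, and the toy quadratic losses are illustrative assumptions, not something specified in these notes:

```python
import numpy as np

def project_l2_ball(x, radius):
    """Project x back onto the L2 ball of the given radius (the 'restriction trick')."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(grad_fns, dim, radius=1.0, eta=0.1):
    """Projected OGD: x_{t+1} = Proj_X(x_t - eta * grad_f_t(x_t)).

    grad_fns yields one gradient function per round, mirroring data that
    arrives sequentially. Returns the decision played at each round.
    """
    x = np.zeros(dim)
    decisions = []
    for grad_f in grad_fns:
        decisions.append(x.copy())       # commit to a decision before seeing the loss
        x = x - eta * grad_f(x)          # regularized-SGD-style update
        x = project_l2_ball(x, radius)   # projection keeps the iterate from exploding
    return decisions

# Toy usage: quadratic losses f_t(x) = ||x - z_t||^2 with targets z_t revealed one at a time.
rng = np.random.default_rng(0)
targets = rng.normal(size=(100, 5))
grads = [lambda x, z=z: 2.0 * (x - z) for z in targets]
played = online_gradient_descent(grads, dim=5, radius=1.0, eta=0.05)
```

With a decaying step size (roughly $\eta_t \propto 1/\sqrt{t}$), this scheme achieves $O(\sqrt{T})$ regret for convex losses with bounded gradients on a bounded feasible set, which is the sense in which the cumulative gap to the best fixed decision stays small.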