Since Correlating offline evaluation with online evaluation critical
IPS estimation
We can estimate how customer interactions will change by re-weighting how often each interaction will occur based on how much more (or less) each item is shown by our new recommendation mode.
This IPS estimate suggests how much reward (i.e., clicks, purchases) the new recommender would get relative to the production recommender if the new recommender was shown to users.
Clipped IPS (CIPS) estimator
IPS outperformed Direct Method (DM) as the amount of logged data increases, and that CIPS didn’t perform much better than IPS.
Self-normalized IPS (SNIPS)
Overall, SNIPS performed the best (i.e., had the least error) and without the need for any parameter tuning