Need a good offline metric, and need to check that it correlates with online results
Considerations
- Logged feedback from the production recommender and the test recommender might differ
- Special logged data needs to be collected, either through randomized exposure or by logging propensity scores
- Counterfactual evaluation → off-policy evaluation
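
A minimal sketch of how logged propensity scores can support off-policy evaluation, assuming inverse propensity scoring (IPS) as the estimator; the notes do not name a specific one. The names `ips_estimate`, `logs`, and `always_a` are hypothetical, and the toy log assumes the production recommender showed each of two items with probability 0.5.

```python
def ips_estimate(logs, target_policy):
    """Estimate the average reward of target_policy from logged data.

    logs: list of (context, action, reward, propensity) tuples, where
          propensity is the probability that the logging (production)
          policy chose `action` in `context`.
    target_policy: function (context, action) -> probability that the
          policy under test would choose `action` in `context`.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)


# Toy log (assumed data): uniform-random logging over items "a" and "b".
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u1", "b", 0.0, 0.5),
    ("u2", "a", 0.0, 0.5),
    ("u2", "b", 1.0, 0.5),
]


def always_a(context, action):
    # Hypothetical test recommender: always recommends item "a".
    return 1.0 if action == "a" else 0.0


estimate = ips_estimate(logs, always_a)
print(estimate)
```

The estimate is only meaningful when every action the test recommender can take has a nonzero logged propensity, which is why randomized exposure or logged propensities are needed in the first place.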