Upper Confidence Bound
Balances exploration and exploitation automatically by adding an uncertainty bonus to each action's mean reward and choosing the action with the highest resulting value (its upper confidence bound).
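
As a rough illustration of the idea, here is a minimal sketch of one common variant, UCB1, in which the uncertainty bonus for an action is c * sqrt(ln(t) / n), with t the total number of plays and n the number of times that action was played. The function names `ucb1_select` and `update`, the list-based bookkeeping, and the default c = sqrt(2) are assumptions made for this example, not something the entry itself specifies.

```python
import math


def ucb1_select(counts, means, c=math.sqrt(2)):
    """Pick the action with the highest upper confidence bound (UCB1 variant).

    counts[i] is how often action i was played, means[i] its average reward.
    The exploration weight c is an illustrative default, not a prescribed value.
    """
    # Play every action once first, so each bonus term is well defined.
    for action, n in enumerate(counts):
        if n == 0:
            return action

    total = sum(counts)
    # Mean reward plus a bonus that shrinks as an action is played more often.
    scores = [
        mean + c * math.sqrt(math.log(total) / n)
        for mean, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, means, action, reward):
    """Fold one observed reward into the running mean for the chosen action."""
    counts[action] += 1
    means[action] += (reward - means[action]) / counts[action]
```

A rarely played action keeps a large bonus and is therefore retried even if its current mean is low, while a well-sampled action is judged almost entirely on its mean; this is how the trade-off is handled without a separate exploration schedule.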