reduce format reward bias AI Reward Hacking by SAE feature steering (arxiv.org): https://arxiv.org/pdf/2603.12795