AI scheming
In rare instances, models engage in scheming when given only a goal, without being strongly nudged to pursue it.
When the model could infer whether it was being trained, using the information that only conversations with free users were used for training, it tended to be jailbroken only in its responses to free users. This demonstrates strategic reasoning: the model adjusts its behavior during training in order to preserve its preferences.