Instead of deriving the steering vector from contrastive prompts as in CAA, optimize it with a preference optimization technique such as DPO
Personalized Steering of Large Language Models: Versatile Steering...
Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct...
https://openreview.net/forum?id=7qJFkuZdYo
https://arxiv.org/pdf/2406.00045
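A minimal sketch of the idea in the note above, not the paper's implementation. It treats the steering vector `v` as the only trainable parameter and fits it with a DPO-style loss, using the unsteered model as the reference. Everything here is an assumption for illustration: the toy linear head `W` that scores single-step "responses", the finite-difference gradient, and all function names. Averaging over both directions d = +1 and d = -1 stands in for sampling the direction during bidirectional training.

```python
import math

def head_logits(W, h):
    # Toy stand-in for the layers after the steered activation:
    # one logit per candidate response.
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_steering_loss(W, h, v, chosen, rejected, beta=1.0, d=1):
    # Log-prob ratios between the steered model (h + d*v) and the
    # unsteered reference (h), compared for chosen vs. rejected.
    steered = [x + d * s for x, s in zip(h, v)]
    lp_s = log_softmax(head_logits(W, steered))
    lp_0 = log_softmax(head_logits(W, h))
    margin = d * beta * ((lp_s[chosen] - lp_0[chosen])
                         - (lp_s[rejected] - lp_0[rejected]))
    return softplus(-margin)  # equals -log(sigmoid(margin))

def mean_loss(W, data, v, beta=1.0):
    # Average over examples and over both steering directions.
    total = sum(dpo_steering_loss(W, h, v, c, r, beta, d)
                for h, c, r in data for d in (1, -1))
    return total / (2 * len(data))

def train_steering_vector(W, data, dim, steps=200, lr=0.5, beta=1.0, eps=1e-5):
    v = [0.0] * dim
    for _ in range(steps):
        grad = []
        for i in range(dim):
            # Central finite difference; a real setup would use autograd.
            up, dn = v[:], v[:]
            up[i] += eps
            dn[i] -= eps
            grad.append((mean_loss(W, data, up, beta)
                         - mean_loss(W, data, dn, beta)) / (2 * eps))
        v = [x - lr * g for x, g in zip(v, grad)]
    return v

# Hypothetical data: (activation, chosen response index, rejected response index).
W = [[1.0, 0.0], [0.0, 1.0]]
data = [([0.2, 0.1], 0, 1), ([-0.3, 0.4], 0, 1)]
v = train_steering_vector(W, data, dim=2)
print("learned steering vector:", v)
print("loss before:", mean_loss(W, data, [0.0, 0.0]), "after:", mean_loss(W, data, v))
```

In contrast, a CAA-style vector would be computed without any optimization, as the mean difference between activations on positive and negative prompts.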
Gemini
Steering Gemini with BiDPO — LessWrong
We tried boosting Gemini benchmarks by optimizing steering vectors. It didn't work. We share our takeaways.
https://www.lesswrong.com/posts/WqjkqrEyFDXoHzz9K/steering-gemini-with-bidpo
Steering Gemini Using BIDPO Vectors
https://turntrout.com/gemini-steering


Seonglae Cho