BiDPO

Creator: Seonglae Cho
Created: 2025 Feb 1 17:26
Editor: Seonglae Cho
Edited: 2025 Aug 10 21:49
Instead of deriving the steering vector from contrastive prompts as CAA does, BiDPO optimizes it with a Preference Optimization technique such as DPO.
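
To make the contrast with CAA concrete, here is a minimal sketch of a DPO-style loss for one preference pair. It assumes the bidirectional objective takes the usual DPO form with an extra direction factor d in {-1, +1} that flips which response is preferred when the vector is subtracted rather than added. The function names, argument names, and the default `beta` are hypothetical, and the log-probabilities are plain numbers standing in for model outputs with and without the steering vector applied.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bidirectional_dpo_loss(
    steered_chosen: float,    # log p(chosen | activations + d * v), hypothetical input
    ref_chosen: float,        # log p(chosen | unsteered activations)
    steered_rejected: float,  # log p(rejected | activations + d * v)
    ref_rejected: float,      # log p(rejected | unsteered activations)
    direction: int,           # d: +1 adds the vector, -1 subtracts it
    beta: float = 0.1,        # assumed DPO temperature, not taken from the source
) -> float:
    """DPO-style loss for one preference pair under a steering direction."""
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    # How much more steering raises the chosen response than the rejected one.
    margin = (steered_chosen - ref_chosen) - (steered_rejected - ref_rejected)
    # With d = -1 the preference is reversed, so the same margin is penalized.
    return -log_sigmoid(direction * beta * margin)


if __name__ == "__main__":
    # Steering raised the chosen response by 2 nats and lowered the rejected by 1.
    print(bidirectional_dpo_loss(-8.0, -10.0, -13.0, -12.0, direction=1))
    print(bidirectional_dpo_loss(-8.0, -10.0, -13.0, -12.0, direction=-1))
```

In a real setup the four log-probabilities would come from a frozen model, and only the steering vector would receive gradients; sampling d at random for each example is what would make a single vector usable in both directions.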
Personalized Steering of Large Language Models: Versatile Steering...
Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct...
https://openreview.net/forum?id=7qJFkuZdYo
https://arxiv.org/pdf/2406.00045
gemini
Steering Gemini with BiDPO (LessWrong)
We tried boosting Gemini benchmarks by optimizing steering vectors. It didn't work. We share our takeaways.
https://www.lesswrong.com/posts/WqjkqrEyFDXoHzz9K/steering-gemini-with-bidpo
Steering Gemini Using BIDPO Vectors
https://turntrout.com/gemini-steering
Copyright Seonglae Cho