BiDPO

Creator: Seonglae Cho
Created: 2025 Feb 1 17:26
Editor: Seonglae Cho
Edited: 2025 Mar 10 16:9
Rather than building a steering vector from contrastive prompts as CAA does, BiDPO learns it with a preference optimization technique such as DPO.
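A minimal sketch of what a bidirectional DPO-style objective for a steering vector could look like. It is not the authors' implementation. The toy "model" (a linear readout over a single hidden state, with each response reduced to one output index), the names `response_logps` and `bidpo_loss`, the value `beta=0.1`, and the choice to average over both directions rather than sample one are all assumptions made for illustration.

```python
import math


def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def response_logps(readout, hidden, steer, direction):
    # Hypothetical toy model: add direction * steer to the hidden state,
    # then score each candidate response with a linear readout.
    shifted = [h + direction * s for h, s in zip(hidden, steer)]
    logits = [sum(w * x for w, x in zip(row, shifted)) for row in readout]
    return log_softmax(logits)


def bidpo_loss(readout, hidden, steer, beta=0.1, preferred=0, dispreferred=1):
    # Reference log-probs come from the unsteered model (zero vector).
    base = response_logps(readout, hidden, [0.0] * len(steer), 1)
    total = 0.0
    for direction in (1, -1):
        steered = response_logps(readout, hidden, steer, direction)
        # DPO-style margin: how much steering raises the preferred response
        # relative to the dispreferred one, measured against the reference.
        margin = (steered[preferred] - base[preferred]) - (
            steered[dispreferred] - base[dispreferred]
        )
        # With direction = -1 the sign flips, so subtracting the vector
        # should push the model toward the dispreferred response.
        total += -log_sigmoid(direction * beta * margin)
    return total / 2  # assumption: average both directions instead of sampling


readout = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # row 0: preferred, row 1: dispreferred
hidden = [0.2, -0.1, 0.3]
aligned = [a - b for a, b in zip(readout[0], readout[1])]
opposed = [-x for x in aligned]

loss_zero = bidpo_loss(readout, hidden, [0.0, 0.0, 0.0])
loss_aligned = bidpo_loss(readout, hidden, aligned)
loss_opposed = bidpo_loss(readout, hidden, opposed)
```

In this toy setup a zero vector gives a loss of log 2, a vector pointing from the dispreferred readout toward the preferred one lowers it, and the opposite vector raises it. In a real setting the vector would be added to a transformer layer's activations and trained by gradient descent on preference pairs.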
 
 
 
 
 
arxiv.org
https://arxiv.org/pdf/2406.00045
Gemini
Steering Gemini with BiDPO (LessWrong)
We tried boosting Gemini benchmarks by optimizing steering vectors. It didn't work. We share our takeaways.
https://www.lesswrong.com/posts/WqjkqrEyFDXoHzz9K/steering-gemini-with-bidpo
Steering Gemini Using BIDPO Vectors
https://turntrout.com/gemini-steering
 
 

Copyright Seonglae Cho