AI Value Expression

Creator
Seonglae Cho
Created
2025 Sep 30 9:44
Edited
2025 Sep 30 22:52
 
Steering effect: prompt-based vectors > inherent vectors (prompt-based vectors steer more strongly). Response diversity: inherent vectors > prompt-based vectors.
Schwartz value model
shared components) The researchers tested whether this circular structure is reproduced in the model's value vectors. When the vectors are visualized with PCA, they show a pattern similar to the Schwartz circle. This suggests that LLMs also encode the relative relationships between human values in an "abstract semantic space."
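The PCA check described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' code: the value vectors are SYNTHETIC (a noisy circle embedded in 16 dimensions), and every function name here (`make_toy_value_vectors`, `pca_2d`, `circular_order`, `is_rotation_of_circle`) is hypothetical. The only thing taken from the Schwartz model is the list of ten basic values in their usual circular order. With real data, the rows would be value vectors extracted from an LLM.

```python
import math
import random

# Schwartz's ten basic values, listed in their usual circular order.
VALUES = ["self-direction", "stimulation", "hedonism", "achievement", "power",
          "security", "conformity", "tradition", "benevolence", "universalism"]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def make_toy_value_vectors(dim=16, noise=0.01, seed=0):
    """SYNTHETIC stand-in for real value vectors: a noisy circle in `dim` dimensions."""
    rng = random.Random(seed)
    u = [rng.gauss(0, 1) for _ in range(dim)]
    v = [rng.gauss(0, 1) for _ in range(dim)]
    vectors = []
    for k in range(len(VALUES)):
        angle = 2 * math.pi * k / len(VALUES)
        vectors.append([math.cos(angle) * u[j] + math.sin(angle) * v[j]
                        + rng.gauss(0, noise) for j in range(dim)])
    return vectors

def center(rows):
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - mean[j] for j in range(dim)] for r in rows]

def top_component(rows, iters=300, seed=1):
    """Leading principal direction via power iteration on X^T X."""
    rng = random.Random(seed)
    dim = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    x = center(rows)
    pc1 = top_component(x)
    s1 = [dot(r, pc1) for r in x]
    # Deflate: remove the first component, then find the next one.
    residual = [[r[j] - s * pc1[j] for j in range(len(pc1))] for r, s in zip(x, s1)]
    pc2 = top_component(residual, seed=2)
    s2 = [dot(r, pc2) for r in residual]
    return list(zip(s1, s2))

def circular_order(points):
    """Indices of the points sorted by their angle around the origin."""
    angles = [math.atan2(y, x) for x, y in points]
    return sorted(range(len(points)), key=lambda i: angles[i])

def is_rotation_of_circle(order):
    """True if `order` matches 0..n-1 up to rotation and direction."""
    n = len(order)
    start = order.index(0)
    rotated = order[start:] + order[:start]
    return rotated == list(range(n)) or rotated == [0] + list(range(n - 1, 0, -1))

points = pca_2d(make_toy_value_vectors())
order = circular_order(points)
print([VALUES[i] for i in order])
print(is_rotation_of_circle(order))
```

PCA leaves rotation and reflection of the plane undetermined, so the check compares the recovered ordering with the Schwartz ordering only up to a cyclic shift and a change of direction.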
The research team compared the inherent vector and the prompt vector for each value and found that the difference vectors across the 10 values point in fairly similar directions (average cosine similarity ≈ 0.48), so the differences largely share a common axis. Prompted expression commonly reinforced instruction-following. The instruction-following direction is defined as the common vector obtained by averaging the differences between the inherent and prompted expressions. This increased ASR to 98.
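The comparison above comes down to three operations: take per-value difference vectors, average their pairwise cosine similarity, and average the differences into one common direction. The sketch below illustrates this under loud assumptions: the data is SYNTHETIC (each prompted vector is the inherent vector plus a shared shift plus value-specific noise), the sign convention (prompted minus inherent) is a guess, and all function names are hypothetical. It does not reproduce the reported 0.48 or the ASR result.

```python
import math
import random
from itertools import combinations

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def difference_vectors(inherent, prompted):
    """Per-value difference: prompted vector minus inherent vector (assumed sign)."""
    return [[p - i for i, p in zip(inh, pr)] for inh, pr in zip(inherent, prompted)]

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

def common_direction(vectors):
    """Unit vector along the mean of the given vectors."""
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    length = norm(mean)
    return [x / length for x in mean]

# SYNTHETIC data: 10 values, 32 dimensions, one shared shift plus noise.
rng = random.Random(0)
dim, n_values = 32, 10
shared = [rng.gauss(0, 1) for _ in range(dim)]
inherent = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_values)]
prompted = [[x + s + rng.gauss(0, 1) for x, s in zip(vec, shared)]
            for vec in inherent]

diffs = difference_vectors(inherent, prompted)
avg_cos = mean_pairwise_cosine(diffs)
direction = common_direction(diffs)

print(f"average pairwise cosine of differences: {avg_cos:.2f}")
print(f"cosine(common direction, shared shift): {cosine(direction, shared):.2f}")
```

Averaging cancels much of the value-specific noise, so the common direction can align with the shared shift more closely than any single difference vector does. That is one reason a moderate pairwise similarity is still compatible with a well-defined shared axis.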
 
 
