AI Value Expression

Steering effect: prompt-based vectors > inherent vectors (prompt-based vectors steer more strongly). Response diversity: inherent vectors > prompt-based vectors.

Schwartz value model (shared components): The researchers tested whether the model's circular structure is reproduced. When visualized with PCA, the value vectors do show a pattern similar to the Schwartz circle. This indicates that LLMs also encode the relative relationships between human values in an "abstract semantic space."
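As a rough illustration of the PCA check described above, the sketch below projects a set of value vectors onto their first two principal components and reads off their angular order. Everything in it is an assumption made for illustration: the vectors are synthetic points placed on a circle in a random plane, and `pca_2d` and `angular_order` are hypothetical helper names, not the paper's code.

```python
import numpy as np


def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def angular_order(points):
    """Return indices of 2-D points sorted by polar angle."""
    angles = np.arctan2(points[:, 1], points[:, 0])
    return np.argsort(angles)


# Synthetic stand-in for 10 value vectors (NOT real model activations):
# points evenly spaced on a circle inside a random 2-D plane of a 64-D space.
rng = np.random.default_rng(0)
n_values, dim = 10, 64
basis, _ = np.linalg.qr(rng.normal(size=(dim, 2)))  # orthonormal columns
theta = 2 * np.pi * np.arange(n_values) / n_values
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
vectors = circle @ basis.T + 0.01 * rng.normal(size=(n_values, dim))

points = pca_2d(vectors)
order = angular_order(points)
print(order)
```

If the circular structure is recovered, the printed order is a cyclic rotation of 0..9, possibly reversed, since PCA fixes the plane but not the orientation or starting angle.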

The research team compared the inherent vector and the prompt-based vector for each value and found that the difference vectors across the 10 values point in quite similar directions (average cosine similarity ≈ 0.48), so the differences lie largely along one shared axis. Prompted expression commonly reinforced instruction-following; the instruction-following direction is the common vector obtained by averaging the differences between inherent and prompted expressions. This increased ASR to 98.
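The comparison above can be sketched in a few lines, under stated assumptions. The data here is synthetic, the sign convention (prompted minus inherent) is a guess, and `mean_pairwise_cosine` and `common_direction` are hypothetical helper names; the paper's actual procedure may differ.

```python
import numpy as np


def unit(v):
    """Normalize a vector (or each row of a matrix) to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of rows."""
    u = unit(vectors)
    sims = u @ u.T
    upper = np.triu_indices(len(u), k=1)  # each pair once, no diagonal
    return float(sims[upper].mean())


def common_direction(inherent, prompted):
    """Unit-length mean of the per-value difference vectors."""
    diffs = prompted - inherent  # assumed sign convention
    return unit(diffs.mean(axis=0)), diffs


# Synthetic stand-in: each prompted vector equals its inherent vector
# shifted along one shared axis, plus independent noise.
rng = np.random.default_rng(0)
n_values, dim = 10, 64
shared = unit(rng.normal(size=dim))
inherent = rng.normal(size=(n_values, dim))
prompted = inherent + 3.0 * shared + 0.3 * rng.normal(size=(n_values, dim))

direction, diffs = common_direction(inherent, prompted)
avg_cos = mean_pairwise_cosine(diffs)
print(round(avg_cos, 2), round(float(direction @ shared), 2))
```

A clearly positive average cosine similarity among the difference vectors is what justifies averaging them into a single direction; if the differences pointed in unrelated directions, the mean would be close to zero and carry little meaning.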

https://arxiv.org/pdf/2509.24319
