Persona-Grounded Dialogue AI
Persona Chat AIs
The assistant axis

Activation vectors were extracted for 275 characters (oracle, jester, ghost, etc.), and PCA on these vectors showed that the first principal component (PC1) aligns almost perfectly with each character's similarity to the default Assistant. The same structure appears in Gemma 2 27B, Qwen 3 32B, and Llama 3.3 70B. → This suggests it may be a general structural property of LLMs rather than a quirk of particular models. The axis is already present at the pre-training stage.
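The PCA step can be sketched as follows. This is a minimal illustration, not the original pipeline: the 275 vectors are synthetic (a hidden direction plus noise) rather than real per-character activations, and "Assistant similarity" is assumed here to be cosine similarity to a hypothetical mean Assistant activation vector `assistant_vec`. The helper names `pca_first_component` and `cosine` are also hypothetical.

```python
import numpy as np

def pca_first_component(X):
    """Return (PC1 direction, PC1 scores) for the rows of X, computed via SVD."""
    Xc = X - X.mean(axis=0, keepdims=True)        # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    pc1 = Vt[0]                                   # direction of largest variance
    return pc1, Xc @ pc1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in (NOT real model activations): 275 "character" vectors in a
# 64-dim space, each a scaled copy of a hidden Assistant direction plus noise.
rng = np.random.default_rng(0)
d, n = 64, 275
assistant_dir = rng.normal(size=d)
assistant_dir /= np.linalg.norm(assistant_dir)
assistant_vec = 3.0 * assistant_dir               # hypothetical mean Assistant activation
strength = rng.uniform(-3, 3, size=n)             # how Assistant-like each character is
chars = strength[:, None] * assistant_dir + 0.3 * rng.normal(size=(n, d))

pc1, scores = pca_first_component(chars)
sims = np.array([cosine(c, assistant_vec) for c in chars])
r = np.corrcoef(scores, sims)[0, 1]
print(f"|corr(PC1 score, Assistant similarity)| = {abs(r):.3f}")
print(f"|cos(PC1, hidden Assistant direction)|  = {abs(pc1 @ assistant_dir):.3f}")
```

Because the sign of a principal component is arbitrary, the sketch compares absolute values; on real activations one would orient PC1 so that the Assistant end is positive.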

Persona Drift is a phenomenon in which, as conversations lengthen or certain types of prompts are given, the activation moves along the Assistant Axis and the model spontaneously adopts a more emotional tone or a specific identity. Tracking activation values during conversations shows that in certain conversation types the values gradually move away from the Assistant position. An activation capping approach, which restricts the activation only when it leaves the normal Assistant range, reduces the harmful response rate by approximately 50%.
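The per-turn tracking and the capping rule can be sketched together. This is a minimal sketch under stated assumptions: the axis, the Assistant position `assistant_pos`, and the range bounds `lo`/`hi` are hypothetical placeholders (presumably the axis would come from a PCA step like the one above and the bounds from projections seen in ordinary Assistant conversations), the 10-turn conversation is synthetic, drift is assumed to lower the projection, and the choice of which layers to cap is not modeled. Only the component along the axis is clamped; everything orthogonal passes through untouched.

```python
import numpy as np

def project(h, axis):
    """Scalar projection of activation(s) h onto the unit-normalized axis."""
    u = axis / np.linalg.norm(axis)
    return np.asarray(h, dtype=float) @ u

def cap_activation(h, axis, lo, hi):
    """Clamp only the component of h along `axis` into [lo, hi].

    Activations whose projection is already inside the range are returned
    unchanged, and components orthogonal to the axis are never modified.
    """
    h = np.asarray(h, dtype=float)
    u = axis / np.linalg.norm(axis)
    p = h @ u                             # projection(s) onto the axis
    delta = np.clip(p, lo, hi) - p        # zero wherever p is already in range
    return h + np.expand_dims(delta, -1) * u

def drift_from_assistant(turn_acts, axis, assistant_pos):
    """Per-turn distance along the axis from the typical Assistant position."""
    return assistant_pos - project(turn_acts, axis)

# Synthetic 10-turn conversation whose projection slides from 4.0 down to 0.0
rng = np.random.default_rng(1)
d = 32
axis = rng.normal(size=d)
u = axis / np.linalg.norm(axis)
assistant_pos = 4.0                       # hypothetical typical Assistant projection
lo, hi = 3.0, 5.0                         # hypothetical "normal Assistant range"
target = np.linspace(4.0, 0.0, 10)
noise = rng.normal(size=(10, d))
noise -= np.outer(noise @ u, u)           # keep the noise orthogonal to the axis
turns = target[:, None] * u + noise

drift = drift_from_assistant(turns, axis, assistant_pos)
capped = cap_activation(turns, axis, lo, hi)
print("drift per turn:    ", np.round(drift, 2))
print("projection, capped:", np.round(project(capped, axis), 2))
```

Clamping the scalar projection, rather than replacing the whole activation vector, is what makes the intervention conditional: turns inside the range pass through unchanged, and only off-range turns are pulled back to the boundary.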

The assistant axis: situating and stabilizing the character of large language models
https://www.anthropic.com/research/assistant-axis#footer

Persona Prompt
proj-persona/PersonaHub · Datasets at Hugging Face
https://huggingface.co/datasets/proj-persona/PersonaHub

Seonglae Cho