Persona Chat AI

Creator
Seonglae Cho
Created
2024 May 27 6:7
Edited
2026 Feb 12 18:19

Persona-Grounded Dialogue AI

Persona Chat AIs
 
 
 

The assistant axis

Activation vectors were extracted for 275 character personas (oracle, jester, ghost, etc.). PCA on these vectors showed that the first principal component (PC1) aligns almost perfectly with similarity to the Assistant persona. The same structure appears consistently across Gemma 2 27B, Qwen 3 32B, and Llama 3.3 70B, which suggests it may be a general structural characteristic of LLMs rather than something specific to particular models. It is present even at the pre-training stage.
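A minimal sketch of the PCA step, using synthetic data in place of real model activations. Everything here is an assumption made for illustration: the names `hidden_axis`, `assistant_likeness`, and `assistant_vector`, the hidden size of 64, and the noise scale are all hypothetical, and this is not the authors' extraction pipeline. It only shows how PC1 of a set of persona vectors can be compared with similarity to an Assistant vector.

```python
import numpy as np

def first_principal_component(persona_vectors):
    """Return the unit-norm first principal direction of row-wise vectors."""
    centered = persona_vectors - persona_vectors.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]

def cosine_similarity(matrix, vector):
    """Cosine similarity between each row of `matrix` and `vector`."""
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(vector)
    return (matrix @ vector) / norms

# Synthetic stand-in for per-persona activation vectors (assumption:
# one mean activation vector per persona, 275 personas, hidden size 64).
rng = np.random.default_rng(0)
n_personas, d_model = 275, 64

# Hypothetical ground-truth direction that the synthetic personas vary along.
hidden_axis = rng.normal(size=d_model)
hidden_axis /= np.linalg.norm(hidden_axis)

# Each persona gets a made-up "Assistant-likeness" score plus isotropic noise.
assistant_likeness = rng.uniform(-1.0, 1.0, size=n_personas)
persona_vectors = (
    5.0 * np.outer(assistant_likeness, hidden_axis)
    + 0.3 * rng.normal(size=(n_personas, d_model))
)
assistant_vector = 5.0 * hidden_axis  # hypothetical Assistant activation

pc1 = first_principal_component(persona_vectors)
pc1_scores = (persona_vectors - persona_vectors.mean(axis=0)) @ pc1
similarity = cosine_similarity(persona_vectors, assistant_vector)

# The sign of a principal component is arbitrary, so compare magnitudes.
alignment = abs(float(pc1 @ hidden_axis))
correlation = abs(float(np.corrcoef(pc1_scores, similarity)[0, 1]))
print(f"|cos(PC1, hidden axis)| = {alignment:.3f}")
print(f"|corr(PC1 score, Assistant similarity)| = {correlation:.3f}")
```

With real models the persona vectors would come from residual-stream activations rather than a random generator, and the strength of the alignment is an empirical question.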
Persona drift is a phenomenon in which, as a conversation lengthens or certain types of prompts are given, the activation moves along the Assistant Axis and the model spontaneously adopts a more emotional tone or a specific identity. Tracking activation values during conversations shows that, in certain conversation types, the values gradually move away from the Assistant position. Activation capping, which restricts the activation only when it deviates from the normal Assistant range, reduces the rate of harmful responses by approximately 50%.
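Drift tracking and capping can be sketched as a projection onto the axis followed by a clamp. This is a toy illustration under stated assumptions, not the published implementation: the function names `track_projection` and `cap_along_axis`, the three-dimensional activations, and the range `[0.0, 3.0]` are all hypothetical.

```python
import numpy as np

def track_projection(activations, axis):
    """Project each row of `activations` onto the unit-normalized axis."""
    unit = axis / np.linalg.norm(axis)
    return activations @ unit

def cap_along_axis(activation, axis, low, high):
    """Clamp the component along `axis` into [low, high].

    The activation is changed only when its projection falls outside the
    range; components orthogonal to the axis are left untouched.
    """
    unit = axis / np.linalg.norm(axis)
    projection = float(activation @ unit)
    capped = min(max(projection, low), high)
    return activation + (capped - projection) * unit

# Hypothetical Assistant Axis and per-turn activations for one conversation.
axis = np.array([1.0, 0.0, 0.0])
turns = np.array([
    [2.0, 0.5, -0.1],   # early turn, near the Assistant position
    [1.0, 0.4, 0.2],
    [-0.5, 0.6, 0.0],   # drifting away
    [-2.0, 0.3, 0.1],   # far outside the assumed normal range
])

drift = track_projection(turns, axis)
capped_turns = np.array(
    [cap_along_axis(t, axis, low=0.0, high=3.0) for t in turns]
)
capped_drift = track_projection(capped_turns, axis)
print("projection per turn:", drift)
print("after capping:      ", capped_drift)
```

Clamping only the out-of-range turns reflects the idea that in-range behavior should be left alone. How the normal range is chosen in practice is not specified here.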
The assistant axis: situating and stabilizing the character of large language models (Anthropic)

Persona Prompt

proj-persona/PersonaHub · Datasets at Hugging Face
 
 
 
