Mechanistic interpretability
Attempting to reverse engineer the detailed computations
Activation Engineering
AI Neural Circuit
Superposition Hypothesis
Linear representation hypothesis
Sparse Feature Circuit
Overlook
https://arxiv.org/pdf/2405.00208