Can we interpret latent reasoning using current mechanistic interpretability tools? — AI Alignment Forum
Authors: Bartosz Cywinski*, Bart Bussmann*, Arthur Conmy**, Joshua Engels**, Neel Nanda**, Senthooran Rajamanoharan** …
https://www.alignmentforum.org/posts/YGAimivLxycZcqRFR/can-we-interpret-latent-reasoning-using-current-mechanistic#How_many_latent_vectors_does_the_model_actually_use_