Cross-layer Superposition
SAE error nodes contain the uninterpretable parts of the original model activations that the SAE failed to reconstruct. Removing these residual error nodes significantly degrades model performance, but restoring some important SAE features from intermediate layers recovers much of the lost performance. This phenomenon may be due to cross-layer superposition.
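A minimal sketch of what this decomposition and the two interventions mean operationally, using a toy random-weight SAE (the names `encode`, `decode`, `W_enc`, `W_dec` and the top-k feature selection are illustrative assumptions, not code from the referenced work):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_feat = 64, 256

# Toy SAE with random weights (real SAEs are trained; this only fixes the algebra).
W_enc = rng.normal(0, 0.1, (d_model, n_feat))
W_dec = rng.normal(0, 0.1, (n_feat, d_model))

def encode(x):
    """ReLU feature activations of the toy SAE."""
    return np.maximum(x @ W_enc, 0.0)

def decode(f):
    """Linear reconstruction from feature activations."""
    return f @ W_dec

x_clean = rng.normal(0, 1, d_model)              # a residual-stream activation
recon = decode(encode(x_clean))
error_node = x_clean - recon                     # the part the SAE fails to reconstruct
assert np.allclose(recon + error_node, x_clean)  # exact decomposition: x = recon + error

# Intervention 1: ablate the error node, i.e. pass only the SAE reconstruction
# forward. Downstream computation now drifts away from the clean run.
x_ablated = recon

# Intervention 2: restore a handful of "important" features to their clean-run
# values by adding back their decoder directions (top-k by clean activation here,
# as a stand-in for features picked out by attribution at intermediate layers).
f_clean, f_ablated = encode(x_clean), encode(x_ablated)
important = np.argsort(f_clean)[-8:]
delta = (f_clean[important] - f_ablated[important]) @ W_dec[important]
x_restored = x_ablated + delta                   # reconstruction + restored features
```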
Compressed Computation is (probably) not Computation in Superposition — LessWrong
This research was completed during the Mentorship for Alignment Research Students (MARS 2.0) Supervised Program for Alignment Research (SPAR spring 2…
https://www.lesswrong.com/posts/ZxFchCFJFcgysYsT9/compressed-computation-is-probably-not-computation-in
Circuit Superposition
- Storage and computation have different levels of difficulty:
- simply "storing" features in a single layer allows very high capacity,
- but "computing" across multiple layers accumulates and amplifies noise, which is much harder.
- The number of simultaneously active circuits z must be small (sparsity assumption)
While it is possible to store many features in superposition, computation across multiple layers is limited by noise, so the number of circuits that can be implemented is much smaller. The posts give an upper bound on this number, up to log factors, in terms of the following quantities (see the numerical sketch after this list).
- D: large network width
- d: small circuit width
- z: number of active circuits at once
- Õ: ignoring log factors
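To make the noise argument concrete, here is a minimal numerical sketch (not code from the posts; `T` and the random-embedding construction are illustrative assumptions, while D, d, z follow the notation above): T circuits of width d are embedded into one layer of width D via random projections, and reading off one circuit picks up interference that grows with the number of simultaneously active circuits z.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, T = 512, 8, 2000           # big width, small circuit width, number of circuits

# One random, approximately orthogonal embedding matrix per circuit.
E = rng.normal(0, 1 / np.sqrt(D), size=(T, D, d))

def readout_snr(z: int) -> float:
    """Signal-to-noise ratio of reading off circuit 0 when z circuits are active."""
    others = rng.choice(np.arange(1, T), size=z - 1, replace=False)
    active = np.concatenate(([0], others))
    h = rng.normal(0, 1, size=(z, d))            # each active circuit's activations
    x = np.einsum('cDd,cd->D', E[active], h)     # superposed wide-layer activation
    est = E[0].T @ x                             # read-off: ~ h[0] + interference
    noise = est - h[0]
    return float(np.linalg.norm(h[0]) / np.linalg.norm(noise))

for z in (2, 8, 32, 128):
    print(f"z={z:4d}  read-off SNR ≈ {readout_snr(z):.2f}")

# The interference on each read-off grows roughly like sqrt(z * d / D), so the SNR
# falls as z increases; stacking layers compounds this noise, which is why
# multi-layer computation in superposition requires z to stay small.
```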
Circuits in Superposition 2: Now with Less Wrong Math — LessWrong
Summary & Motivation This post is a continuation and clarification of Circuits in Superposition: Compressing many small neural networks into one. Tha…
https://www.lesswrong.com/posts/FWkZYQceEzL84tNej/circuits-in-superposition-2-now-with-less-wrong-math

Ping pong computation in superposition — LessWrong
Overview: This post builds on Circuits in Superposition 2, using the same terminology. …
https://www.lesswrong.com/posts/g9uMJkcWj8jQDjybb/ping-pong-computation-in-superposition

Seonglae Cho