Identifies latent thought structures across multiple LLMs and enables latent-level communication
Uses a single shared Sparse Autoencoder (Jacobian SAE) to simultaneously encode and decode all agent hidden states.
Each agent selectively reads and shares only its relevant latent subspace during inference.
Viewed as a gradient-level MoE, the Jacobian's zero/non-zero pattern effectively acts as a 'routing mask'.
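As a rough illustration of that 'routing mask' idea (a sketch under my own assumptions, not the paper's implementation), the snippet below builds a shared SAE with one linear read-out head per agent. Because each read-out is linear, its Jacobian with respect to the shared latents is simply its weight matrix, so thresholding the column magnitudes gives a binary mask of which latent dimensions an agent is connected to. All names (`SharedSAE`, `agent_readouts`, `routing_mask`) are hypothetical.

```python
import torch
import torch.nn as nn

class SharedSAE(nn.Module):
    """Hypothetical shared SAE with per-agent linear read-out heads (illustrative)."""
    def __init__(self, d_model: int, d_latent: int, n_agents: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)
        # One linear read-out per agent; for a linear map, the Jacobian
        # d(read-out)/d(latents) is exactly this weight matrix.
        self.agent_readouts = nn.ModuleList(
            [nn.Linear(d_latent, d_model, bias=False) for _ in range(n_agents)]
        )

    def encode(self, h: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.encoder(h))  # sparse, non-negative latents

    def readout(self, z: torch.Tensor, agent: int) -> torch.Tensor:
        return self.agent_readouts[agent](z)

def routing_mask(sae: SharedSAE, agent: int, threshold: float = 1e-3) -> torch.Tensor:
    """Zero/non-zero pattern of an agent's Jacobian, read off as a routing mask."""
    W = sae.agent_readouts[agent].weight    # (d_model, d_latent)
    return W.abs().amax(dim=0) > threshold  # (d_latent,) boolean mask over latent dims
```

During inference, an agent would then read and share only the latent dimensions where its mask is `True`, which is the selective-subspace behavior described above.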
Jacobian mask
Only a subset of the SAE latent dimensions is connected to each agent. This connectivity is represented by the Jacobian mask, which forms automatically during training through Jacobian sparsity regularization.
Although the mask depends on the input, which makes it resemble MoE routing, in practice it behaves as a quasi-static routing map. Unlike MoE, it has the advantage of working across different architecture families.
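Continuing the sketch above, one plausible way such a mask could emerge during training is an L1 penalty on each agent's read-out Jacobian; with linear read-outs this reduces to an L1 penalty on the weight matrix, which drives entire latent columns to zero for agents that do not need them. This is my reading of "Jacobian sparsity regularization", not a verified reproduction of ThoughtComm's objective.

```python
import torch
import torch.nn.functional as F

def shared_sae_loss(sae, hidden_states, jac_l1=1e-3, latent_l1=1e-4):
    """Reconstruction + latent sparsity + Jacobian sparsity (illustrative sketch).

    hidden_states: one tensor per agent, each of shape (batch, d_model).
    """
    loss = hidden_states[0].new_zeros(())
    for agent, h in enumerate(hidden_states):
        z = sae.encode(h)
        h_hat = sae.readout(z, agent)
        loss = loss + F.mse_loss(h_hat, h)        # reconstruct each agent's hidden state
        loss = loss + latent_l1 * z.abs().mean()  # standard SAE sparsity on activations
        # Jacobian sparsity: for the linear read-out above, the Jacobian equals
        # the weight matrix, so this L1 term zeroes out whole latent dimensions
        # per agent -- the resulting zero/non-zero pattern is the routing mask.
        J = sae.agent_readouts[agent].weight
        loss = loss + jac_l1 * J.abs().sum()
    return loss
```

In this simplified linear version the Jacobian does not depend on the input at all; with a nonlinear read-out it would vary per input, which is where the quasi-static-routing interpretation above comes from.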
SAE Model Transferability
However, since training an SAE is already challenging for a single model, it is questionable how well representation matching will work when one SAE is shared across multiple models; it would likely require a lot of data.
- Use hidden states after layer normalization + pooling
- Unify autoencoder input dimensions (same embedding dim); see the adapter sketch after this list
- Jacobian sparsity forces alignment as a side-effect
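A hedged sketch of how the first two ingredients might be wired together when the agents have different hidden sizes; the adapter class, `d_shared`, and mean pooling over the sequence are my assumptions rather than details confirmed by the source.

```python
import torch
import torch.nn as nn

class HiddenStateAdapter(nn.Module):
    """Maps one model's hidden states into a shared SAE input space (illustrative)."""
    def __init__(self, d_model: int, d_shared: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)         # layer-normalize the hidden states
        self.proj = nn.Linear(d_model, d_shared)  # unify the embedding dimension

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, d_model) from one particular LLM / layer
        h = self.norm(h)
        h = h.mean(dim=1)    # pool over the sequence dimension
        return self.proj(h)  # (batch, d_shared): common input space for the shared SAE

# Example: two agents from different architecture families feeding one shared SAE.
agent_a = HiddenStateAdapter(d_model=4096, d_shared=2048)
agent_b = HiddenStateAdapter(d_model=3584, d_shared=2048)
x_a = agent_a(torch.randn(8, 128, 4096))
x_b = agent_b(torch.randn(8, 128, 3584))
```

The third bullet (alignment via Jacobian sparsity) then happens inside the shared SAE itself, as in the loss sketch above.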
ThoughtComm's SAE is likely not capturing a fully disentangled representation, but rather something more like "low-rank correlated directions".
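One way to sanity-check that intuition (my own diagnostic idea, not something from ThoughtComm) is to look at the singular-value spectrum of the learned read-out directions: if a handful of singular values carry most of the energy, the latents behave more like correlated low-rank directions than like a disentangled dictionary.

```python
import torch

def effective_rank(W: torch.Tensor, energy: float = 0.95) -> int:
    """Number of singular values needed to capture `energy` of the total spectral
    energy -- a rough proxy for how low-rank the learned directions are."""
    s = torch.linalg.svdvals(W)                    # singular values, descending
    cum = torch.cumsum(s**2, dim=0) / (s**2).sum()
    return int((cum < energy).sum().item()) + 1

# Example (hypothetical): inspect one agent's read-out matrix from the sketch above.
# print(effective_rank(sae.agent_readouts[0].weight.detach()))
```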

Seonglae Cho