Based on the StripedHyena architecture instead of a traditional Transformer, combining convolutional filters with a gating mechanism
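To make "convolutional filters with a gating mechanism" concrete, here is a toy gated convolution in the spirit of this design. It is a hypothetical sketch, not StripedHyena's actual implementation: the function names, the single explicit filter `h`, and all shapes are assumptions. Real Hyena-family layers typically use implicitly parameterized long filters, several gates, and FFT-based convolution for speed.

```python
import numpy as np

def causal_conv1d(x, h):
    """Depthwise causal convolution: y[t, d] = sum_k h[k, d] * x[t - k, d].

    x: (T, D) sequence, h: (L, D) per-channel filter. Position t only sees t, t-1, ...
    """
    T = x.shape[0]
    y = np.zeros_like(x, dtype=float)
    for k in range(min(h.shape[0], T)):
        # shift x forward by k steps (zero-padded at the start) and weight by tap k
        y[k:] += h[k] * x[:T - k]
    return y

def gated_conv_block(x, W_in, h, W_out):
    """Toy Hyena-style operator (hypothetical): value stream -> long conv, times a gate stream."""
    u = x @ W_in                      # (T, 2D): project into value and gate streams
    D = u.shape[1] // 2
    v, g = u[:, :D], u[:, D:]
    v = causal_conv1d(v, h)           # token mixing by convolution instead of attention
    return (v * g) @ W_out            # elementwise (multiplicative) gating, then output projection

rng = np.random.default_rng(0)
T, D = 16, 8
x = rng.standard_normal((T, D))
W_in = rng.standard_normal((D, 2 * D)) / np.sqrt(D)
h = rng.standard_normal((32, D)) / 32      # filter may be longer than the sequence
W_out = rng.standard_normal((D, D)) / np.sqrt(D)
y = gated_conv_block(x, W_in, h, W_out)

# causality check: perturbing token 10 leaves outputs before position 10 unchanged
x2 = x.copy()
x2[10] += 1.0
y2 = gated_conv_block(x2, W_in, h, W_out)
print(np.allclose(y[:10], y2[:10]))  # prints True
```

The convolution replaces attention as the token-mixing step, so cost grows with filter length rather than with all pairs of positions; the gate then makes the operator input-dependent.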
AI can now model and design the genetic code for all domains of life with Evo 2 | Arc Institute
Arc Institute develops the largest AI model for biology to date in collaboration with NVIDIA, bringing together Stanford University, UC Berkeley, and UC San Francisco researchers
https://arcinstitute.org/news/blog/evo2

Manuscript | Arc Institute
Arc Institute is an independent nonprofit research organization headquartered in Palo Alto, California.
https://arcinstitute.org/manuscripts/Evo2

Evo 2 Interpretability from Goodfire AI
www.goodfire.ai
We're thrilled to announce our collaboration with Arc Institute, a nonprofit research organization pioneering long-context biological foundation models (the "Evo" series). Through our partnership, we've developed methods to understand their model with unprecedented precision, enabling the extraction of meaningful units of model computation (i.e., features: interpretable patterns we extract from neural network neuron activity that reveal how the model processes information; they represent meaningful concepts that emerge from complex neural interactions, like a model's understanding of 'α-helices'). Preliminary experiments have shown promising directions for steering these features to guide DNA sequence generation, though this work is still in its early stages.
https://www.goodfire.ai/blog/interpreting-evo-2
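The note does not say how the features are extracted. Sparse autoencoders (SAEs) are a common choice for this kind of work, so the sketch below assumes a TopK SAE applied to one layer's activations: each activation vector is encoded into a wide, mostly-zero feature vector and decoded back. The class name, dimensions, `k`, and the random (untrained) weights are all hypothetical; training would minimize the reconstruction error `mse` over many real activations.

```python
import numpy as np

class TopKSAE:
    """Minimal TopK sparse autoencoder (hypothetical sketch; weights are random, not trained)."""

    def __init__(self, d_model, n_features, k, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.standard_normal((d_model, n_features)) / np.sqrt(d_model)
        self.b_enc = np.zeros(n_features)
        self.W_dec = self.W_enc.T.copy()   # tied initialization; common but not required
        self.b_dec = np.zeros(d_model)
        self.k = k

    def encode(self, acts):
        """acts: (N, d_model) -> sparse, non-negative feature activations (N, n_features)."""
        pre = np.maximum((acts - self.b_dec) @ self.W_enc + self.b_enc, 0.0)
        # keep only the k largest activations in each row, zero out the rest
        idx = np.argpartition(pre, -self.k, axis=1)[:, -self.k:]
        rows = np.arange(pre.shape[0])[:, None]
        z = np.zeros_like(pre)
        z[rows, idx] = pre[rows, idx]
        return z

    def decode(self, z):
        """Reconstruct activations as a sparse sum of decoder directions (the 'features')."""
        return z @ self.W_dec + self.b_dec

rng = np.random.default_rng(1)
acts = rng.standard_normal((4, 64))        # stand-in for a model layer's activations
sae = TopKSAE(d_model=64, n_features=512, k=16)
z = sae.encode(acts)
recon = sae.decode(z)
mse = np.mean((acts - recon) ** 2)         # reconstruction loss a training loop would minimize
```

After training, each column of `z` is a candidate feature; interpreting it means looking at which inputs (e.g. which DNA positions) activate it most strongly.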

Evo 2: DNA Foundation Model | Arc Institute
https://arcinstitute.org/tools/evo/evo-mech-interp

Evogeneao tree Manifold of Evo 2 in Mechanistic Interpretability
→ Embeddings form distinct clusters by phylogenetic rank (class, order, etc.). Building a KNN graph and computing geodesic distances along it → strong correlation with actual phylogenetic distances. The model's embedding space encodes phylogenetic relationships as a curved manifold, and a learned 10-dimensional "flat representation" reaches a 0.98 correlation with phylogenetic distances. The embeddings draw on codon-level signals and also capture species-specific statistical "DNA styles" such as GC content.
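The KNN graph + geodesic distance step resembles Isomap: connect each point to its nearest neighbors, then measure distance as the shortest path through that graph so it follows the manifold instead of cutting across it. The sketch below assumes Euclidean distances, a hypothetical `k`, Floyd-Warshall for shortest paths, and Pearson correlation over unique pairs; the exact choices in the Goodfire work may differ. A 2-D spiral with known arc length stands in for Evo 2 embeddings and phylogenetic distances.

```python
import numpy as np

def knn_geodesic_distances(X, k):
    """Approximate geodesic distances over a symmetric k-nearest-neighbor graph.

    X: (n, d) points. Returns (n, n) shortest-path distances (np.inf if disconnected).
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # exclude self when picking neighbors, even if duplicate points exist
    nbrs = np.argsort(D + np.diag(np.full(n, np.inf)), axis=1)[:, :k]
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        G[i, nbrs[i]] = D[i, nbrs[i]]
        G[nbrs[i], i] = D[i, nbrs[i]]      # symmetrize the graph
    for m in range(n):                      # Floyd-Warshall all-pairs shortest paths
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    return G

def upper_tri_corr(A, B):
    """Pearson correlation between the unique off-diagonal entries of two distance matrices."""
    iu = np.triu_indices_from(A, k=1)
    return np.corrcoef(A[iu], B[iu])[0, 1]

# toy curved manifold: Archimedean spiral r = t, with exact arc length s(t)
t = np.linspace(0, 3 * np.pi, 150)
X = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
s = 0.5 * (t * np.sqrt(1 + t ** 2) + np.arcsinh(t))
true_dist = np.abs(s[:, None] - s[None, :])

euclid = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))
geo = knn_geodesic_distances(X, k=5)
euc_corr = upper_tri_corr(euclid, true_dist)   # straight-line distances cut across the arms
geo_corr = upper_tri_corr(geo, true_dist)      # graph distances follow the curve
```

On the spiral, straight-line distance badly underestimates how far apart points on different arms are, while the graph distance tracks arc length closely; this is why geodesic distance is the natural quantity to compare against a tree-like target.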
Finding the Tree of Life in Evo 2
In this research update, we uncover how Evo 2, a DNA foundation model, represents the “tree of life”—the phylogenetic relationships between species. We find that phylogeny is encoded geometrically in the distances along a curved manifold, one of the most complex manifold examples yet found (to our knowledge) in a foundation model. Our results support an emerging picture of feature manifolds—that they tend to have a dominant flat representation (with respect to the ambient space) plus higher curvature deviations—and point to both better ways of understanding scientific AI models and better interpretability techniques.
https://www.goodfire.ai/papers/phylogeny-manifold
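A "flat representation with respect to the ambient space" means a linear projection of the embeddings. The notes do not say how the 10-dimensional flat representation was learned, so this sketch swaps in one simple approach: classical MDS turns the target distances into flat coordinates, and least squares fits a linear map from embeddings onto them. It is not Goodfire's procedure; the function names and the synthetic data (a linear 10-d part plus nonlinear "curvature" dimensions, echoing the flat-plus-curvature picture) are assumptions.

```python
import numpy as np

def pairwise_dist(A):
    """Euclidean distance matrix for the rows of A."""
    return np.sqrt(((A[:, None, :] - A[None, :, :]) ** 2).sum(-1))

def classical_mds(D, k):
    """Embed a distance matrix D (n, n) into k flat (Euclidean) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:k]         # top-k eigenpairs
    vals = np.clip(vals[order], 0.0, None)     # drop negative (non-Euclidean) parts
    return vecs[:, order] * np.sqrt(vals)

def fit_flat_projection(X, target_D, k):
    """Fit a linear map from centered embeddings X to an MDS layout of target_D."""
    Y = classical_mds(target_D, k)
    Xc = X - X.mean(axis=0)
    W, *_ = np.linalg.lstsq(Xc, Y, rcond=None)
    return Xc @ W                              # (n, k) flat representation

rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 10))                               # hidden flat structure
flat = Z @ rng.standard_normal((10, 32))                         # linear part of embeddings
curved = 0.3 * np.sin(3 * Z) @ rng.standard_normal((10, 32))     # nonlinear deviations
X = np.concatenate([flat, curved], axis=1)                       # stand-in for model embeddings
target_D = pairwise_dist(Z)                                      # stand-in for phylogenetic distances

P = fit_flat_projection(X, target_D, k=10)
iu = np.triu_indices(200, k=1)
corr = np.corrcoef(pairwise_dist(P)[iu], target_D[iu])[0, 1]
```

In this synthetic case the flat part is exactly recoverable, so `corr` is essentially 1; on real embeddings the leftover gap (e.g. 0.98 rather than 1) is one way to read how much of the structure lives in the curved deviations.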


Seonglae Cho
