Evo 2

Creator: Seonglae Cho
Created: 2024 Mar 8 6:07
Edited: 2025 Oct 6 16:29
Based on the Striped Hyena architecture instead of a traditional Transformer, combining convolutional filters with a gating mechanism.
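To make "convolutional filters with a gating mechanism" concrete, here is a toy single-channel sketch. It is not Evo 2's actual layer: the filter `h`, the scalar projections `w_value` and `w_gate`, and the sigmoid gate are illustrative assumptions. A causal convolution mixes information along the sequence, and an input-dependent gate then scales the result element-wise.

```python
import math


def causal_conv(x, h):
    """Causal 1-D convolution: y[t] = sum_k h[k] * x[t - k], zero-padded on the left."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0) for t in range(len(x))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_conv_block(x, h, w_gate, w_value):
    """Toy gated convolution with hypothetical scalar parameters.

    value = w_value * x            per-position projection
    mixed = causal_conv(value, h)  filter mixes information along the sequence
    gate  = sigmoid(w_gate * x)    data-dependent element-wise gate (illustrative choice)
    out   = gate * mixed
    """
    value = [w_value * xi for xi in x]
    mixed = causal_conv(value, h)
    gate = [sigmoid(w_gate * xi) for xi in x]
    return [g * m for g, m in zip(gate, mixed)]


print(gated_conv_block([1.0, 0.0, 2.0, 0.0, 0.0], [0.6, 0.3, 0.1], w_gate=1.0, w_value=1.0))
```

Because the convolution is causal, changing a later position never alters earlier outputs, which autoregressive generation requires. The direct sum here costs O(L · len(h)) and is chosen for readability; long filters are usually applied with FFT-based convolution instead.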
www.nature.com
AI can now model and design the genetic code for all domains of life with Evo 2 | Arc Institute
Arc Institute develops the largest AI model for biology to date in collaboration with NVIDIA, bringing together Stanford University, UC Berkeley, and UC San Francisco researchers
Manuscript | Arc Institute
Arc Institute is an independent nonprofit research organization headquartered in Palo Alto, California.
Evo 2 interpretability from Goodfire AI
www.goodfire.ai
We're thrilled to announce our collaboration with Arc Institute, a nonprofit research organization pioneering long-context biological foundation models (the "Evo" series). Through our partnership, we've developed methods to understand their model with unprecedented precision, enabling the extraction of meaningful units of model computation (i.e., features[1]). Preliminary experiments have shown promising directions for steering these features to guide DNA sequence generation, though this work is still in its early stages.
[1] Features are interpretable patterns we extract from neural network neuron activity, revealing how the model processes information. They represent meaningful concepts that emerge from complex neural interactions, like a model's understanding of 'α-helices'.
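Reading a feature out of activations and steering generation with it can be sketched generically. This is a hypothetical illustration, not Goodfire's method: a feature is modelled as a direction whose ReLU-clipped projection gives its activation, and steering adds a scaled copy of that direction to the activation vector. All names and vectors below are made up.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def feature_activation(h, encoder_dir, bias=0.0):
    """Hypothetical feature read-out: ReLU of the projection of activation h onto encoder_dir."""
    return max(0.0, dot(h, encoder_dir) + bias)


def steer(h, decoder_dir, alpha):
    """Hypothetical steering step: shift activation h by alpha times the feature's direction."""
    return [hi + alpha * di for hi, di in zip(h, decoder_dir)]


# Toy 3-d activation and a made-up feature that reads and writes along the first axis.
h = [0.2, -0.1, 0.4]
feature_dir = [1.0, 0.0, 0.0]

before = feature_activation(h, feature_dir)
after = feature_activation(steer(h, feature_dir, alpha=2.0), feature_dir)
suppressed = feature_activation(steer(h, feature_dir, alpha=-1.0), feature_dir)
print(before, after, suppressed)
```

Positive `alpha` amplifies the feature and negative `alpha` suppresses it; in a real model `h` would be a hidden state at some layer, and which layer and strength work is an empirical question.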
Evo 2: DNA Foundation Model | Arc Institute
Arc Institute is an independent nonprofit research organization headquartered in Palo Alto, California.

Evogeneao tree
Manifold of Evo 2 in Mechanistic Interpretability

→ Embeddings form distinct clusters by phylogenetic rank (class, order, etc.). → Building a KNN graph over the embeddings and measuring geodesic distances along it gives distances that correlate strongly with actual phylogenetic distances. The embedding space thus reflects the phylogenetic tree as a curved manifold, and learning a 10-dimensional "flat representation" reaches a 0.98 correlation with phylogenetic distances. The embeddings also use Codon-level signals and capture species-specific statistical "DNA styles" such as GC content.
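The KNN-graph plus geodesic-distance check can be sketched on toy data. Assumptions: the "embeddings" are 2-D points on a semicircle (a curved 1-D manifold), and the reference "phylogenetic" distance between points i and j is simply |i - j|. Real Evo 2 embeddings, the value of k, and the study's exact distance definitions are not reproduced. Geodesic distances follow the curve, so they should track the reference more closely than straight-line distances through the ambient space.

```python
import heapq
import math


def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph as adjacency dicts with Euclidean edge weights."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for d, j in nearest:
            adj[i][j] = d
            adj[j][i] = d  # symmetrise: keep an edge if either endpoint picked the other
    return adj


def geodesic_from(adj, src):
    """Dijkstra shortest-path lengths from src; math.inf marks unreachable nodes."""
    dist = [math.inf] * len(adj)
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy data: 20 "embeddings" on a semicircle; reference distance |i - j| stands in for phylogeny.
n = 20
points = [(math.cos(math.pi * i / (n - 1)), math.sin(math.pi * i / (n - 1))) for i in range(n)]
adj = knn_graph(points, k=2)
geo = [geodesic_from(adj, i) for i in range(n)]

pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
ref = [abs(i - j) for i, j in pairs]
r_geo = pearson([geo[i][j] for i, j in pairs], ref)
r_euc = pearson([math.dist(points[i], points[j]) for i, j in pairs], ref)
print(f"geodesic r = {r_geo:.4f}, straight-line r = {r_euc:.4f}")
```

With real data one would also confirm the graph is connected (no `math.inf` entries) and try several values of k: too small a k can disconnect clusters, and too large a k lets paths cut across the curve.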
Finding the Tree of Life in Evo 2
In this research update, we uncover how Evo 2, a DNA foundation model, represents the “tree of life”—the phylogenetic relationships between species. We find that phylogeny is encoded geometrically in the distances along a curved manifold, one of the most complex manifold examples yet found (to our knowledge) in a foundation model. Our results support an emerging picture of feature manifolds—that they tend to have a dominant flat representation (with respect to the ambient space) plus higher curvature deviations—and point to both better ways of understanding scientific AI models and better interpretability techniques.
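A "flat representation" means Euclidean coordinates whose straight-line distances reproduce a target distance matrix. One standard way to obtain one is classical (Torgerson) multidimensional scaling; this is an assumption for illustration, since the source says its 10-dimensional representation was learned and does not give the method. The toy input is a 3-by-1 rectangle, whose two eigenvalues (9 and 1) show one dominant direction plus a smaller one.

```python
import math
import random


def classical_mds(dist, dims, iters=500, seed=0):
    """Classical (Torgerson) MDS: coordinates in `dims` dimensions whose Euclidean
    distances approximate the symmetric distance matrix `dist`.

    Assumes the leading eigenvalues of the centred Gram matrix are positive, which
    holds when `dist` is (close to) Euclidean.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in sq]  # row means; equal to column means since sq is symmetric
    grand = sum(mean) / n
    # Double centring: B = -1/2 * J D^2 J, the Gram matrix of the centred points.
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    eigenvalues = []
    for axis in range(dims):
        # Power iteration for the leading eigenpair of the (deflated) matrix.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                lam = 0.0
                break  # nothing left to extract on this axis
            v = [x / norm for x in w]
            lam = norm
        eigenvalues.append(lam)
        for i in range(n):
            coords[i][axis] = math.sqrt(lam) * v[i]
        # Deflation: subtract the eigenpair just found so the next axis finds the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, eigenvalues


corners = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (0.0, 1.0)]
dist = [[math.dist(p, q) for q in corners] for p in corners]
coords, eigenvalues = classical_mds(dist, dims=2)
print("eigenvalues:", [round(e, 6) for e in eigenvalues])
print("share of the first axis:", eigenvalues[0] / sum(eigenvalues))
```

Power iteration with deflation keeps the sketch dependency-free; with NumPy one would instead call an eigensolver such as `numpy.linalg.eigh` on the centred matrix. Feeding geodesic distances (as in an Isomap-style pipeline) instead of straight-line ones would flatten a curved manifold.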