Group Composition via Representations
This paper reverse-engineers small neural networks trained on finite group operations (addition and multiplication) and shows that they implement the GCR (Group Composition via Representations) algorithm, which is grounded in representation theory.
Method
The architecture is organized into four weight blocks: left embedding, right embedding, MLP, and unembedding. This structure (two embeddings + MLP + unembedding) is trained entirely end-to-end, and the learned weights are then reverse-engineered to show that each of the four blocks stores, computes, or retrieves representation matrices. In effect, the weights act as lookup tables for representation-matrix values: the trained embedding tables store the representation matrices of the specific finite group directly, allowing the network to execute the GCR algorithm.
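As a concrete illustration, the sketch below implements the GCR readout for the dihedral group D_n using its 2-dimensional standard representation: the logit for a candidate answer c is the trace tr(ρ(a)ρ(b)ρ(c)⁻¹), which is maximal exactly when c = ab. This is a minimal NumPy sketch of the algorithm as described; the helper names (dihedral_rep, gcr_logits) are ours, not code from the paper.

```python
import numpy as np

def dihedral_rep(n):
    """2-D standard representation of the dihedral group D_n.

    Indices 0..n-1 are rotations r^k; indices n..2n-1 are r^k * s,
    where s is the reflection about the x-axis.
    """
    mats = []
    for k in range(n):
        t = 2 * np.pi * k / n
        mats.append(np.array([[np.cos(t), -np.sin(t)],
                              [np.sin(t),  np.cos(t)]]))   # rotation r^k
    flip = np.array([[1.0, 0.0], [0.0, -1.0]])             # reflection s
    mats += [m @ flip for m in mats[:n]]                    # r^k * s
    return np.stack(mats)

def gcr_logits(rho, a, b):
    """GCR readout: logit(c) = tr(rho(a) rho(b) rho(c)^{-1}).

    The representation is orthogonal, so rho(c)^{-1} = rho(c)^T, and the
    trace attains its maximum (the representation dimension) iff c = a*b.
    """
    ab = rho[a] @ rho[b]
    return np.array([np.trace(ab @ rho[c].T) for c in range(len(rho))])

rho = dihedral_rep(6)                        # D_6 has 12 elements
a, b = 2, 7                                  # r^2 and r^1 * s
pred = int(np.argmax(gcr_logits(rho, a, b)))
# Ground truth: the index whose matrix equals rho(a) @ rho(b).
true = next(c for c in range(len(rho)) if np.allclose(rho[c], rho[a] @ rho[b]))
print(pred, true)                            # both are 9, i.e. r^3 * s
```

In the paper's account, the embedding blocks supply ρ(a) and ρ(b), the MLP forms their product, and the unembedding evaluates this trace against every candidate c.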
Results
For dihedral groups, the data is represented by the 2-dimensional standard representation (rotations and reflections). For symmetric (permutation) groups, the networks learn the 1-dimensional sign representation and the (n-1)-dimensional standard representation; the group's irreducible representations also include the 1-dimensional trivial representation, the standard ⊗ sign Kronecker product, and, when needed, higher-dimensional representations corresponding to Young tableaux.
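The sketch below constructs these representations explicitly for S_3; it is an illustration under our own construction (permutation matrices, QR basis), not code from the paper. The sign representation is the determinant of the permutation matrix, the (n-1)-dimensional standard representation is the permutation representation restricted to the subspace orthogonal to the all-ones vector, and standard ⊗ sign is their pointwise product.

```python
import numpy as np
from itertools import permutations

n = 3
perms = list(permutations(range(n)))                     # the 6 elements of S_3
perm_rep = np.zeros((len(perms), n, n))
for i, p in enumerate(perms):
    perm_rep[i, np.arange(n), p] = 1.0                   # permutation matrices

# 1-D sign representation: determinant (+1 for even, -1 for odd permutations).
sign = np.array([np.sign(np.linalg.det(m)) for m in perm_rep])

# (n-1)-D standard representation: restrict the permutation representation
# to the subspace orthogonal to the all-ones vector.
basis, _ = np.linalg.qr(np.array([[1.0, 0.0],
                                  [-1.0, 1.0],
                                  [0.0, -1.0]]))          # orthonormal basis of 1^perp
standard = np.einsum('ja,gjk,kb->gab', basis, perm_rep, basis)

# standard ⊗ sign: multiply each standard matrix by the element's sign.
std_sign = sign[:, None, None] * standard

# Character check: permutation rep = trivial ⊕ standard, so traces differ by 1.
assert np.allclose(np.trace(perm_rep, axis1=1, axis2=2),
                   1.0 + np.trace(standard, axis1=1, axis2=2))
```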

Insights
Learning progresses in three stages (memorization → circuit formation → cleanup), and generalization performance improves dramatically only after the general-purpose circuits have formed. While all models implement variants of the GCR algorithm, they differ across random seeds in which irreducible representations they learn, how many, and in what order, refuting "strong universality" while supporting "weak universality" (the existence of shared general principles).
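One way to make "which irreducible representations a model learned" concrete is to ask how much of the learned embedding's variance is linearly explained by the flattened matrices of a candidate irrep. The sketch below is a hypothetical diagnostic in that spirit, not the paper's exact metric; W_E stands for a learned embedding table of shape |G| × d_model, and the 2-D rotation representation of the cyclic group Z_12 is used as the candidate irrep.

```python
import numpy as np

def irrep_variance_explained(W_E, rho):
    """Fraction of embedding variance linearly explained by an irrep.

    W_E: (|G|, d_model) learned embedding table.
    rho: (|G|, d, d) candidate irreducible representation.
    Fits each embedding column as a linear function of the flattened irrep
    entries and returns the overall R^2: a value near 1 suggests the
    embedding stores this irrep, a low value suggests it does not.
    """
    X = rho.reshape(len(rho), -1)
    X = X - X.mean(axis=0, keepdims=True)
    Y = W_E - W_E.mean(axis=0, keepdims=True)
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    return 1.0 - (resid ** 2).sum() / (Y ** 2).sum()

# Candidate irrep: 2-D rotation representation of the cyclic group Z_12.
rho = np.array([[[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]
                for t in 2 * np.pi * np.arange(12) / 12])

# Toy check: an embedding built as a random linear image of the irrep
# scores ~1.0, while a random embedding of the same shape scores much lower.
rng = np.random.default_rng(0)
W_irrep = rho.reshape(len(rho), -1) @ rng.normal(size=(4, 128))
W_rand = rng.normal(size=(len(rho), 128))
print(irrep_variance_explained(W_irrep, rho))   # ≈ 1.0
print(irrep_variance_explained(W_rand, rho))    # markedly lower
```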