More intelligence for free by scaling
Compute-optimal analyses recommend that the number of training tokens be scaled linearly with model size.
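As a rough illustration of what linear scaling means here, the sketch below computes a token budget as a fixed multiple of parameter count. The function name `tokens_for_model` and the default ratio of 20 tokens per parameter are assumptions made for this example (the ratio is a commonly quoted rule of thumb, not a figure taken from these notes).

```python
def tokens_for_model(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Training-token budget under a linear rule D = r * N.

    n_params: number of model parameters (N).
    tokens_per_param: assumed constant ratio r; 20 is illustrative only.
    """
    return tokens_per_param * n_params


if __name__ == "__main__":
    # Doubling the parameter count doubles the token budget.
    for n in (1e9, 2e9, 4e9):
        print(f"{n:.0e} params -> {tokens_for_model(n):.0e} tokens")
```

Under this rule, the ratio of tokens to parameters stays constant as the model grows, so the only free choice is the constant itself.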
AI Scaling Notion
AI Scaling Methods
Scaling Law (OpenAI 2020)
The primate neural architecture scales unusually well compared with the brains of other species, analogous to how transformers have better scaling curves than LSTMs and other RNNs.