After pretraining scaling and the test-time scaling of reasoning models, continual learning represents the third axis of scaling. How long we can keep training and educating models will become a competitive advantage in the industry, and it intersects with Catastrophic forgetting to create an intelligence that doesn't die, seemingly the opposite of humans. However, catastrophic forgetting affects humans too, which makes it a more fundamental and unavoidable problem.
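Below is a minimal sketch of the phenomenon, assuming PyTorch and a synthetic two-task setup (the model, data, and hyperparameters are illustrative, not from any linked source): a small network trained sequentially on two tasks loses the first task almost completely once it fits the second.

```python
# Hypothetical demo of catastrophic forgetting: sequential training on two
# tasks with opposite decision boundaries overwrites the first task's weights.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_task(shift):
    # Two Gaussian blobs; `shift` flips which side of the plane each class
    # sits on, so task A and task B need opposite decision boundaries.
    y = torch.arange(512) % 2                 # alternating labels 0/1
    signs = (1 - 2 * y).float()[:, None]      # +1 for class 0, -1 for class 1
    x = torch.randn(512, 2) + shift * signs
    return x, y

def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

model = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

task_a, task_b = make_task(2.0), make_task(-2.0)

for name, (x, y) in [("A", task_a), ("B", task_b)]:
    for _ in range(200):                      # train only on the current task
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    print(f"after task {name}: acc(A)={accuracy(model, *task_a):.2f}, "
          f"acc(B)={accuracy(model, *task_b):.2f}")
# Typical output: acc(A) is near 1.00 after task A and collapses toward 0.00
# after task B, since gradients for task B overwrite the weights encoding A.
```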
AI optimized for continual learning may emerge not as a superintelligence but as super-learners, appearing as the distinct individuals we once imagined. There was excessive faith in general AI because of the limitations of narrow AI before meta-learners and few-shot learners appeared. However, if we understand that AI's more fundamental paradigm lies not in ability itself but in the prior ability to learn, then advanced narrow AI that can learn anything in a general way is also valuable. The reason this is good is that it induces intelligence, not knowledge.
Continual Learning Notion

Ilya Sutskever 2025
AGI is intelligence that can learn to do anything. Any plan for deploying AGI has gradualism as an inherent component, because predictions typically fail to account for the way the future actually arrives: gradually. The difference lies in what to release first.
The term AGI itself was born as a reaction to past criticisms of narrow AI; it was needed to describe the final state of AI. Pre-training became the keyword for a new kind of generalization and had a strong influence. The fact that RL is currently task-specific is part of the process of erasing this imprint of generality. Above all, humans don't memorize all information the way pre-training does. Rather, they are intelligence well optimized for Continual Learning, adapting to anything while managing the Complexity-Robustness Tradeoff.
Abstraction and Reasoning Corpus
Ilya Sutskever – We're moving from the age of scaling to the age of research
Ilya & I discuss SSI’s strategy, the problems with pre-training, how to improve the generalization of AI models, and how to ensure AGI goes well.
𝐄𝐏𝐈𝐒𝐎𝐃𝐄 𝐋𝐈𝐍𝐊𝐒
* Transcript: https://www.dwarkesh.com/p/ilya-sutskever-2
* Apple Podcasts: https://podcasts.apple.com/us/podcast/dwarkesh-podcast/id1516093381?i=1000738363711
* Spotify: https://open.spotify.com/episode/7naOOba8SwiUNobGz8mQEL?si=39dd68f346ea4d49
𝐓𝐈𝐌𝐄𝐒𝐓𝐀𝐌𝐏𝐒
00:00:00 – Explaining model jaggedness
00:09:39 – Emotions and value functions
00:18:49 – What are we scaling?
00:25:13 – Why humans generalize better than models
00:35:45 – Straight-shotting superintelligence
00:46:47 – SSI’s model will learn from deployment
00:55:07 – Alignment
01:18:13 – “We are squarely an age of research company”
01:29:23 – Self-play and multi-agent
01:32:42 – Research taste
https://www.youtube.com/watch?v=aR20FWCCjAs

Keep following up-to-date information
[AI Paper Review] Continual Learning on Deep Learning
Catastrophic forgetting and continual learning of deep neural networks.
https://winnerus.medium.com/ai-논문리뷰-continual-learning-on-deep-learning-16969792acc7
Continual Learning: techniques for building models that keep growing
Continual learning explored topic by topic
https://tech.scatterlab.co.kr/continual-learning/
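The regularization family such articles survey can be sketched with Elastic Weight Consolidation (EWC): while training on a new task, a quadratic penalty pulls each weight back toward its old-task value, scaled by an estimate of how important that weight was. A minimal sketch, assuming PyTorch; `model`, `old_x`, `old_y`, and `lam` are hypothetical placeholders for the previous task's model, data, and penalty strength.

```python
# Minimal EWC sketch: Fisher-weighted quadratic penalty toward old weights.
import torch

def fisher_diagonal(model, x, y, loss_fn):
    # Estimate per-parameter importance as squared gradients of the old-task
    # loss (a crude one-batch diagonal Fisher approximation).
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return {n: p.grad.detach() ** 2 for n, p in model.named_parameters()}

def ewc_penalty(model, anchor, fisher, lam):
    # Pull each parameter toward its old-task value, weighted by importance.
    return (lam / 2) * sum(
        (fisher[n] * (p - anchor[n]) ** 2).sum()
        for n, p in model.named_parameters())

# After finishing the old task (placeholders, for illustration only):
#   anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
#   fisher = fisher_diagonal(model, old_x, old_y, loss_fn)
# While training the new task:
#   loss = loss_fn(model(new_x), new_y) + ewc_penalty(model, anchor, fisher, lam)
```

With the penalty added, weights that mattered for the old task resist change, while unimportant weights stay free to learn the new task.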

