Model Generalization
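Building a chain of reasoning step by step to reach an answer, using information beyond what is given, or even with no given information at all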
AI Reasoning Types
Procedural Knowledge in Pretraining
We observe that code data is highly influential for reasoning. StackExchange as a source has more than ten times as much influential data in the top and bottom portions of the rankings as would be expected if the influential data were randomly sampled from the pretraining distribution. Other code sources, as well as ArXiv & Markdown, are at least twice as influential as expected when drawing randomly from the pretraining distribution.
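The "N times more than expected" comparison can be read as an enrichment ratio: a source's share among the top- and bottom-ranked documents divided by its share of the pretraining distribution, where 1.0 means "as expected under random sampling". The sketch below is a minimal illustration under that assumption. The function name `source_enrichment`, its parameters, and the toy data are hypothetical and are not the paper's actual code or numbers.

```python
from collections import Counter


def source_enrichment(ranked_sources, pretraining_share, k):
    """Hypothetical helper: enrichment of each source in the top-k and bottom-k
    of an influence ranking, relative to its pretraining share.

    ranked_sources: source label of each document, sorted by influence (high to low).
    pretraining_share: fraction of the pretraining data each source makes up.
    Returns observed_share / expected_share per source (1.0 = random sampling).
    """
    if 2 * k > len(ranked_sources):
        raise ValueError("top and bottom portions would overlap")
    # Both tails count: strongly positive and strongly negative influence.
    tail = ranked_sources[:k] + ranked_sources[-k:]
    counts = Counter(tail)
    n = len(tail)
    return {src: (counts[src] / n) / share for src, share in pretraining_share.items()}


# Toy, made-up data: 20 ranked documents from three sources.
ranked = ["stackexchange"] * 2 + ["web"] * 16 + ["code"] * 2
shares = {"stackexchange": 0.1, "web": 0.8, "code": 0.1}
ratios = source_enrichment(ranked, shares, k=2)
print(ratios)
```

In this toy example the tails contain only StackExchange and code documents, so each of those sources is about 5 times over-represented relative to its 10% pretraining share, while web data is absent from the tails (ratio 0).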