Pretraining

Creator
Seonglae Cho
Created
2023 Mar 7 14:1
Editor
Edited
2025 Feb 1 16:27

The process by which artificial neural networks extract features from data, with each neuron learning an abstract separation of the input.

Updates all parameters while acquiring general comprehension ability
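A minimal sketch of this idea, assuming a toy character corpus and a hand-rolled bigram model (no real LM library): pretraining is next-token prediction, and every parameter is updated by the gradient of the cross-entropy loss.

```python
import math

# Toy corpus (hypothetical); pretraining = next-token prediction over raw text.
corpus = "abab" * 50
vocab = sorted(set(corpus))                # ['a', 'b']
stoi = {ch: i for i, ch in enumerate(vocab)}
V = len(vocab)

# The full logit table is the model; ALL parameters get updated during training.
logits = [[0.0] * V for _ in range(V)]
lr = 0.5

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

avg_loss = 0.0
for epoch in range(20):
    total_loss, steps = 0.0, 0
    for prev, nxt in zip(corpus, corpus[1:]):
        i, j = stoi[prev], stoi[nxt]
        probs = softmax(logits[i])
        total_loss += -math.log(probs[j])  # cross-entropy on the next token
        steps += 1
        # Gradient of cross-entropy w.r.t. logits is (probs - one_hot(target)).
        for k in range(V):
            grad = probs[k] - (1.0 if k == j else 0.0)
            logits[i][k] -= lr * grad      # SGD step on every parameter of this row
    avg_loss = total_loss / steps

print(f"final average loss: {avg_loss:.4f}")
```

As training proceeds, the loss falls toward zero because the model absorbs the corpus statistics; the same next-token objective, scaled up, is what gives real models their general comprehension ability.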

Datasets for AI come in three types

  • Problems with solutions -
    SFT
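To make the distinction concrete, here is a hypothetical sample of each format (invented data): raw text for pretraining versus a problem paired with its solution for SFT.

```python
# Raw text; the model learns by predicting the next token.
pretraining_sample = "Photosynthesis converts light energy into chemical energy."

# A problem with its solution, the format used for supervised fine-tuning (SFT).
sft_sample = {
    "problem": "What is 12 * 12?",
    "solution": "144",
}

print(type(pretraining_sample).__name__, sorted(sft_sample))
```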
Pre Training Notion

How the training process and loss value relate to a neural network's ability

Perhaps the most striking phenomenon Anthropic has noticed is that the learning dynamics of toy models with large numbers of features appear to be dominated by "energy level jumps", where features jump between different feature dimensionalities.

Procedural Knowledge in Pretraining

We observe that code data is highly influential for reasoning. StackExchange as a source has more than ten times more influential data in the top and bottom portions of the rankings than expected if the influential data were randomly sampled from the pretraining distribution. Other code sources and ArXiv & Markdown are twice or more as influential as expected when drawing randomly from the pretraining distribution.

Recommendations