Cross-Entropy Loss
The loss scales as a power-law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. These relationships allow us to determine the optimal allocation of a fixed compute budget.
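As a rough sketch of these power laws, the snippet below evaluates the separate fits for loss versus N, D, and C. The functional forms and constants are approximate values reported in the OpenAI paper and are only meant to illustrate the shape of the trends, not to reproduce the fits exactly.

```python
# Separate power-law fits from "Scaling Laws for Neural Language Models".
# The constants are approximate values quoted in the paper; treat them as illustrative.

def loss_vs_params(n_params, n_c=8.8e13, alpha_n=0.076):
    """Cross-entropy loss as a function of (non-embedding) parameter count N."""
    return (n_c / n_params) ** alpha_n

def loss_vs_data(n_tokens, d_c=5.4e13, alpha_d=0.095):
    """Cross-entropy loss as a function of dataset size D in tokens."""
    return (d_c / n_tokens) ** alpha_d

def loss_vs_compute(pf_days, c_c=3.1e8, alpha_c=0.050):
    """Cross-entropy loss as a function of (minimum) training compute C in PF-days."""
    return (c_c / pf_days) ** alpha_c

if __name__ == "__main__":
    # Each 10x increase in N multiplies the loss by roughly 10 ** -0.076 ~ 0.84.
    for n in (1e8, 1e9, 1e10):
        print(f"N={n:.0e}  L(N)={loss_vs_params(n):.3f}")
```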
Larger models are significantly more sample-efficient, so optimally compute-efficient training means training very large models on a relatively modest amount of data and stopping significantly before convergence.
N - model parameters, D - dataset size (tokens), C - training compute
- Larger models are more sample efficient
- Transfer improves with test performance: evaluating on a distribution different from the training one incurs a roughly constant offset in loss, but otherwise transfer performance improves in step with performance on the training distribution
Performance improves predictably as long as N and D are scaled up together, but runs into diminishing returns (a capacity or data bottleneck) when one of them is held fixed while the other grows.
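One way to see this bottleneck is the joint fit L(N, D) reported in the OpenAI paper: with D held fixed, growing N only pushes the fitted loss toward a data-limited floor. A minimal sketch, reusing the same approximate constants as above:

```python
# Joint loss fit L(N, D) from the OpenAI paper, with the same approximate constants
# as above. Holding D fixed while N grows shows the loss flattening toward the
# data-limited floor (D_C / D) ** ALPHA_D.

N_C, ALPHA_N = 8.8e13, 0.076
D_C, ALPHA_D = 5.4e13, 0.095

def joint_loss(n_params, n_tokens):
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

if __name__ == "__main__":
    n_tokens = 1e10  # dataset fixed at 10B tokens
    for n_params in (1e8, 1e9, 1e10, 1e11):
        print(f"N={n_params:.0e}  L={joint_loss(n_params, n_tokens):.3f}")
    print(f"data-limited floor: {(D_C / n_tokens) ** ALPHA_D:.3f}")
```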
A larger model reaches a given level of performance with fewer training samples than a smaller one, i.e. it is more sample-efficient. Consequently, for a fixed compute budget it is better to put more of the budget into model size than into dataset size.
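A simplified illustration of the allocation question: fix a compute budget C ≈ 6·N·D training FLOPs (a standard approximation), sweep how the budget is split between parameters and tokens, and evaluate the joint fit above at each split. This is only a sketch of the trade-off, not the derivation used in the papers; the Chinchilla paper listed below revisits the same question with updated fits and arrives at a more balanced split between N and D.

```python
# Sketch: for a fixed budget C ~= 6 * N * D FLOPs, sweep the split between
# parameters N and tokens D and evaluate the joint fit at each point.
# Illustrative only; constants are the same approximate values as above.

N_C, ALPHA_N = 8.8e13, 0.076
D_C, ALPHA_D = 5.4e13, 0.095

def joint_loss(n_params, n_tokens):
    return ((N_C / n_params) ** (ALPHA_N / ALPHA_D) + D_C / n_tokens) ** ALPHA_D

def best_split(compute_flops, n_grid=400):
    """Return (N, D, fitted loss) minimizing the joint fit subject to C = 6 * N * D."""
    best = None
    for i in range(n_grid):
        n_params = 10 ** (6 + 8 * i / (n_grid - 1))  # log-spaced, 1e6 .. 1e14 params
        n_tokens = compute_flops / (6 * n_params)
        cand = (n_params, n_tokens, joint_loss(n_params, n_tokens))
        if best is None or cand[2] < best[2]:
            best = cand
    return best

if __name__ == "__main__":
    for c in (1e19, 1e21, 1e23):
        n, d, loss_val = best_split(c)
        print(f"C={c:.0e} FLOPs -> N~{n:.1e} params, D~{d:.1e} tokens, L~{loss_val:.3f}")
```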
Papers
- OpenAI: Scaling Laws for Neural Language Models
- DeepMind: Unified Scaling Laws for Routed Language Models
- DeepMind: Training Compute-Optimal Large Language Models (Chinchilla scaling)