Model Generalization

Central goal of machine learning (Interpolation + Extrapolation)

The aim is to predict unseen data: a model's generalization ability is its capacity to adapt properly to new data.

The Bias-Variance Trade-off applies here: limiting model complexity reduces variance, usually at the cost of some bias, and balancing the two improves model generalization.
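
For squared-error loss, the trade-off is commonly written as the decomposition below. The notation is an assumption introduced here, not taken from the note: y = f(x) + ε with zero-mean noise of variance σ², f the true function, and f̂ the model fitted on a random training set (the expectation is over training sets and noise).

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

More flexible models tend to shrink the first term and grow the second; the noise term cannot be reduced by any choice of model.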

Model Generalization Notion

Deep double descent

PAC

Lottery ticket hypothesis

Robustness

AI Extrapolation

Neural Network Loss

Complexity-Robustness Tradeoff

AI Generalization Methods

Hold-out Method

Nested cross validation

Random Sampling

Train/Validation/Test splitting
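
The methods listed above can be sketched with the standard library alone. This is a minimal illustration, not a reference implementation: the function names (`train_val_test_split`, `k_fold_indices`, `nested_cv_splits`) and the default fractions are hypothetical choices made for this example.

```python
import random


def train_val_test_split(items, val_frac=0.2, test_frac=0.2, seed=0):
    """Hold-out method: randomly partition items into train/validation/test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(items)
    # Random sampling without replacement via a seeded shuffle.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def k_fold_indices(n, k):
    """Yield (train_idx, held_out_idx) for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, held_out
        start += size


def nested_cv_splits(n, outer_k, inner_k):
    """Nested cross-validation: inner folds are carved only from outer training data.

    The inner folds are for model selection; the outer held-out fold is
    touched only for the final performance estimate.
    """
    for outer_train, outer_test in k_fold_indices(n, outer_k):
        inner = [
            ([outer_train[i] for i in tr], [outer_train[i] for i in va])
            for tr, va in k_fold_indices(len(outer_train), inner_k)
        ]
        yield inner, outer_test
```

In practice the data would be shuffled before `k_fold_indices` is applied; contiguous folds are used here only to keep the sketch short.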

LLM Generality is a Timeline Crux — LessWrong

Short Summary: LLMs may be fundamentally incapable of fully general reasoning, and if so, short timelines are less plausible. …

https://www.lesswrong.com/posts/k38sJNLk7YbJA72ST/llm-generality-is-a-timeline-crux

Out-of-distribution (OOD) generalization is crucial given the wide range of real-world scenarios in which these models are used, while output diversity refers to the model's ability to generate varied outputs and is important for a variety of use cases.

RLHF generalizes better than SFT to new inputs, particularly as the distribution shift between train and test becomes larger. However, RLHF significantly reduces output diversity compared to SFT across a variety of measures, implying a tradeoff in current LLM fine-tuning methods between generalization and diversity.


https://arxiv.org/pdf/2310.06452
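
To make "output diversity" concrete, one simple proxy is the fraction of distinct n-grams across a set of sampled outputs. The sketch below is an illustration only: `distinct_n` is a hypothetical helper with whitespace tokenization, and it is not claimed to be one of the measures used in the paper above.

```python
def distinct_n(texts, n=2):
    """Fraction of n-grams across all texts that are unique (1.0 means no repeats)."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    # Return 0.0 when there are no n-grams at all (empty input or very short texts).
    return len(unique) / total if total else 0.0
```

A lower score on samples drawn for the same prompt would indicate that the model's outputs repeat each other more.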

Dreams are not meaningless byproducts, but rather evolved to prevent overfitting in the brain and aid generalization. Just as deep learning uses noise injection and dropout to prevent overfitting, dreams provide the brain with distorted, sparse, and hallucinatory inputs: the sparsity, hallucination, and narrative that differ from reality are precisely the "intentional corruption" that favors generalization.

In other words, dreams expose the brain to high-entropy data that differs from existing data, preventing overfitting and model collapse. This means the brain continuously generates its own synthetic dataset for self-training, thereby improving how well its performance generalizes.

This explains the phenomenology of dreams (their strangeness) better than memory consolidation or emotional regulation theories.

Sleep deprivation, and dream deprivation in particular, leaves memorization intact but reduces the ability to respond to new situations. Dreams contribute to performance recovery after repeated overtraining. Fiction such as novels and films can also aid generalization by acting as "artificial dreams".


https://arxiv.org/pdf/2007.09560
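
The deep-learning side of the analogy above, noise injection and dropout, can be sketched as follows. This is a minimal illustration on plain Python lists; the function names and default parameters are assumptions chosen for the example, and real frameworks apply the same idea to tensors during training only.

```python
import random


def dropout(values, p=0.5, rng=None):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    keep = 1.0 - p
    # Scaling by 1/keep leaves the expected value of each entry unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def add_gaussian_noise(values, sigma=0.1, rng=None):
    """Noise injection: perturb each value with zero-mean Gaussian noise."""
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma) for v in values]
```

Both functions deliberately corrupt the input, so a model trained on their output cannot rely on any single feature being present or exact.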
