Emergence, compositional generalization
Deep Learning presents a challenge to classical statistical learning theory. Neural networks often achieve zero training error, yet they generalize well to unseen data. This contradicts traditional expectations and makes many classical generalization bounds ineffective.
Sparse activation and the Superposition Hypothesis have been proposed as explanations for the Grokking phenomenon, in which a model first overfits its training data and then, after continued training, suddenly generalizes as it learns to activate sparsely.
Observation
- compute cost is decreasing exponentially
- a low-level substrate (the Transformer) serving a high-level incentive structure (intelligence)
- unlike humans, machines operate on a different time budget
Intuition
- some abilities emerge only with scale
- for an emergent ability, "this idea doesn't work" should be read as "this idea doesn't work yet"
- such scalability and generalization require time and compute
Approach
- make them learn how we think
- matmul + sequence length + dimension (see the sketch below)
- for superintelligence, we don't necessarily need to follow human methods (e.g., driving the loss to 0)
- learning the objective function and reasoning from induced incentives
- learning something general across millions of real-world tasks
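A minimal sketch of what the matmul bullet refers to: the core of a Transformer layer is a handful of matrix multiplications whose cost grows with sequence length and model dimension. All shapes and names here are illustrative, not taken from any particular model.

```python
import numpy as np

# Illustrative shapes (not from any specific model).
L, d = 1024, 4096                  # sequence length, model dimension

x = np.random.randn(L, d)          # token activations
W_qkv = np.random.randn(d, 3 * d)  # fused query/key/value projection
W_out = np.random.randn(d, d)      # attention output projection

# The bulk of a Transformer layer is matmuls like these:
qkv = x @ W_qkv                    # (L, d) @ (d, 3d) -> O(L * d^2) FLOPs
q, k, v = np.split(qkv, 3, axis=-1)
scores = q @ k.T / np.sqrt(d)      # (L, d) @ (d, L)  -> O(L^2 * d) FLOPs
attn = scores @ v                  # softmax omitted for brevity
out = attn @ W_out                 # (L, d) @ (d, d)  -> O(L * d^2) FLOPs

# Cost grows with both length (L) and dimension (d); scaling up
# mostly means running these matmuls efficiently across many machines.
```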

Emergent abilities do not follow the scaling law and do not appear in small models, which is consistent with the view that they arise from induction head formation.
Emergent Abilities of Large Language Models
Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon...
https://arxiv.org/abs/2206.07682

OpenAI perspective
Large Language Models (in 2023)
I gave a talk at Seoul National University.
I titled the talk “Large Language Models (in 2023)”. This was an ambitious attempt to summarize our exploding field.
Trying to summarize the field forced me to think about what really matters in the field. While scaling undeniably stands out, its far-reaching implications are more nuanced. I share my thoughts on scaling from three angles:
1:02 1) Change in perspective is necessary because some abilities only emerge at a certain scale. Even if some abilities don’t work with the current generation LLMs, we should not claim that it doesn’t work. Rather, we should think it doesn’t work yet. Once larger models are available many conclusions change.
This also means that some conclusions from the past are invalidated and we need to constantly unlearn intuitions built on top of such ideas.
7:12 2) From first-principles, scaling up the Transformer amounts to efficiently doing matrix multiplications with many, many machines. I see many researchers in the field of LLM who are not familiar with how scaling is actually done. This section is targeted for technical audiences who want to understand what it means to train large models.
27:52 3) I talk about what we should think about for further scaling (think 10000x GPT-4 scale). To me scaling isn’t just doing the same thing with more machines. It entails finding the inductive bias that is the bottleneck in further scaling.
I believe that the maximum likelihood objective function is the bottleneck in achieving the scale of 10000x GPT-4 level. Learning the objective function with an expressive neural net is the next paradigm that is a lot more scalable. With the compute cost going down exponentially, scalable methods eventually win. Don’t compete with that.
In all of these sections, I strive to describe everything from first-principles. In an extremely fast moving field like LLM, no one can keep up. I believe that understanding the core ideas by deriving from first-principles is the only scalable approach.
Disclaimer: I give my personal opinions and the talk material doesn't reflect my employer's opinion in any way.
https://www.youtube.com/watch?v=dbo3kNKPaUA
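For point 3 above, the "maximum likelihood objective" of a language model is just next-token cross-entropy. A minimal PyTorch-style sketch, with illustrative shapes and random tensors standing in for a real model:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch of 8 sequences, 128 tokens, 50k-token vocabulary.
B, T, V = 8, 128, 50_000
logits = torch.randn(B, T, V, requires_grad=True)  # stand-in for real LM outputs
tokens = torch.randint(0, V, (B, T))               # training token ids

# Maximum likelihood = minimize negative log-likelihood of the next token.
# Predictions at position t are scored against the token at position t+1.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, V),  # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),      # targets: the next tokens 1..T-1
)
loss.backward()
# This hand-specified objective is what the talk argues becomes the bottleneck
# at ~10000x GPT-4 scale, compared with learning the objective itself.
```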

[1hr Talk] Intro to Large Language Models
This is a 1 hour general-audience introduction to Large Language Models: the core technical component behind systems like ChatGPT, Claude, and Bard. What they are, where they are headed, comparisons and analogies to present-day operating systems, and some of the security-related challenges of this new computing paradigm.
As of November 2023 (this field moves fast!).
Context: This video is based on the slides of a talk I gave recently at the AI Security Summit. The talk was not recorded but a lot of people came to me after and told me they liked it. Seeing as I had already put in one long weekend of work to make the slides, I decided to just tune them a bit, record this round 2 of the talk and upload it here on YouTube. Pardon the random background, that's my hotel room during the thanksgiving break.
- Slides as PDF: https://drive.google.com/file/d/1pxx_ZI7O-Nwl7ZLNk5hI3WzAsTLwvNU7/view?usp=share_link (42MB)
- Slides as Keynote: https://drive.google.com/file/d/1FPUpFMiCkMRKPFjhi9MAhby68MHVqe8u/view?usp=share_link (140MB)
Few things I wish I said (I'll add items here as they come up):
- The dreams and hallucinations do not get fixed with finetuning. Finetuning just "directs" the dreams into "helpful assistant dreams". Always be careful with what LLMs tell you, especially if they are telling you something from memory alone. That said, similar to a human, if the LLM used browsing or retrieval and the answer made its way into the "working memory" of its context window, you can trust the LLM a bit more to process that information into the final answer. But TLDR right now, do not trust what LLMs say or do. For example, in the tools section, I'd always recommend double-checking the math/code the LLM did.
- How does the LLM use a tool like the browser? It emits special words, e.g. |BROWSER|. When the code "above" that is inferencing the LLM detects these words it captures the output that follows, sends it off to a tool, comes back with the result and continues the generation. How does the LLM know to emit these special words? Finetuning datasets teach it how and when to browse, by example. And/or the instructions for tool use can also be automatically placed in the context window (in the “system message”).
- You might also enjoy my 2015 blog post "Unreasonable Effectiveness of Recurrent Neural Networks". The way we obtain base models today is pretty much identical on a high level, except the RNN is swapped for a Transformer. http://karpathy.github.io/2015/05/21/rnn-effectiveness/
- What is in the run.c file? A bit more full-featured 1000-line version here: https://github.com/karpathy/llama2.c/blob/master/run.c
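A rough sketch of the tool-use loop described in the |BROWSER| bullet above: the serving code watches the generation for a special token, runs the tool, splices the result back into the context window, and continues. The token name and the `llm_generate` / `run_browser` helpers are hypothetical.

```python
def tool_loop(llm_generate, run_browser, prompt: str, max_rounds: int = 5) -> str:
    """Hypothetical serving-side loop: detect a special tool token,
    run the tool, append the result, and continue generation."""
    context = prompt
    for _ in range(max_rounds):
        # Ask the model to continue; it may emit a tool call like
        # "|BROWSER| <query>" that it learned from finetuning examples.
        completion = llm_generate(context)
        if "|BROWSER|" not in completion:
            return completion  # no tool needed; this is the final answer

        # Capture the query that follows the special token.
        before, query = completion.split("|BROWSER|", 1)
        result = run_browser(query.strip())

        # Put the tool output back into the context window ("working memory")
        # and let the model keep generating from there.
        context += before + f"\n[browser result]\n{result}\n"
    return context
```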
Chapters:
Part 1: LLMs
00:00:00 Intro: Large Language Model (LLM) talk
00:00:20 LLM Inference
00:04:17 LLM Training
00:08:58 LLM dreams
00:11:22 How do they work?
00:14:14 Finetuning into an Assistant
00:17:52 Summary so far
00:21:05 Appendix: Comparisons, Labeling docs, RLHF, Synthetic data, Leaderboard
Part 2: Future of LLMs
00:25:43 LLM Scaling Laws
00:27:43 Tool Use (Browser, Calculator, Interpreter, DALL-E)
00:33:32 Multimodality (Vision, Audio)
00:35:00 Thinking, System 1/2
00:38:02 Self-improvement, LLM AlphaGo
00:40:45 LLM Customization, GPTs store
00:42:15 LLM OS
Part 3: LLM Security
00:45:43 LLM Security Intro
00:46:14 Jailbreaks
00:51:30 Prompt Injection
00:56:23 Data poisoning
00:58:37 LLM Security conclusions
End
00:59:23 Outro
https://www.youtube.com/watch?v=zjkBMFhNj_g
Adversarial opinion
Nonlinear or discontinuous evaluation metrics can produce exponential-looking jumps, so smooth improvements get misread as emergent abilities (see the toy example below).
Are Emergent Abilities of Large Language Models a Mirage?
Recent work claims that large language models display \textit{emergent abilities}, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent...
https://openreview.net/forum?id=ITw9edRDlD
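A toy numerical illustration of the mirage argument: if per-token accuracy improves smoothly with scale, a discontinuous metric such as exact match over a k-token answer (roughly accuracy^k) still looks like a sudden jump. The numbers are made up purely to show the shape.

```python
# Toy illustration: smooth per-token accuracy vs. "emergent"-looking exact match.
per_token_acc = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]  # smooth improvement with scale
k = 20  # answer length in tokens

for p in per_token_acc:
    exact_match = p ** k  # all k tokens must be right
    print(f"per-token {p:.2f} -> exact-match {exact_match:.4f}")

# Exact match stays near zero (0.5^20 ≈ 1e-6, 0.9^20 ≈ 0.12) and then shoots
# up (0.99^20 ≈ 0.82), even though the underlying capability improved smoothly.
# Nonlinear metrics can manufacture apparent "emergence".
```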

Emergent Abilities of Large Language Models
Emergence can be defined as the sudden appearance of novel behavior. Large Language Models apparently display emergence by suddenly gaining new abilities as they grow. Why does this happen, and what does this mean?
https://www.assemblyai.com/blog/emergent-abilities-of-large-language-models/

Transformers can extrapolate and outperform their training data without RL
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure...
One way to address safety risks from large language models (LLMs) is to censor dangerous knowledge from their training data. While this removes the explicit information, implicit information can...
https://arxiv.org/abs/2406.14546

Transcendence: Generative Models Can Outperform The Experts That Train Them
Generative models are trained with the simple objective of imitating the conditional probability distribution induced by the data they are trained on. Therefore, when trained on data generated by humans, we may not expect the artificial model to outperform the humans on their original objectives.
In this work, we study the phenomenon of transcendence: when a generative model achieves capabilities that surpass the abilities of the experts generating its data. We demonstrate transcendence by training an autoregressive transformer to play chess from game transcripts, and show that the trained model can sometimes achieve better performance than all players in the dataset. (To play with our models, code, and data, please see our website at https://transcendence.eddie.win.) We theoretically prove that transcendence is enabled by low-temperature sampling, and rigorously assess this experimentally. Finally, we discuss other sources of transcendence, laying the groundwork for future investigation of this phenomenon in a broader setting.
https://arxiv.org/html/2406.11741v1
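A toy sketch of the low-temperature mechanism the paper describes: imitation learning fits the mixture of several imperfect experts, and sampling that mixture at low temperature concentrates probability on the consensus (correct) move, so the resulting policy can beat every individual expert. The expert distributions below are invented for illustration.

```python
import numpy as np

# Three imperfect "experts" over 4 candidate moves; move 0 is objectively best.
# Each expert puts only 0.40 on the best move but errs in a *different* way,
# so each expert's own argmax is a wrong move.
experts = np.array([
    [0.40, 0.45, 0.10, 0.05],
    [0.40, 0.10, 0.45, 0.05],
    [0.40, 0.05, 0.10, 0.45],
])
mixture = experts.mean(axis=0)  # what imitation learning fits

def sample_probs(p, temperature):
    """Temperature-adjusted distribution: p^(1/T), renormalized."""
    q = p ** (1.0 / temperature)
    return q / q.sum()

print(sample_probs(mixture, temperature=1.0))  # best move chosen ~40% of the time
print(sample_probs(mixture, temperature=0.1))  # mass concentrates on the best move
# Low temperature turns the mixture into a near-argmax policy that picks the
# consensus move more reliably than any single expert did.
```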
For non-experts:
2024's Biggest Breakthroughs in Computer Science
The year's biggest breakthroughs in computer science included a new understanding of what’s going on in large language models (LLMs) and a breakthrough in computing Hamiltonians — models that represent complex quantum systems. Read more at Quanta Magazine: https://www.quantamagazine.org/the-year-in-computer-science-20241219/?swcfpc=1
0:04 - Can Large Language Models Understand?
Are chatbots "stochastic parrots"? A new evaluation called Skill Mix suggests that the biggest large language models seem to learn enough skills to understand the words they’re processing.
Read more: https://www.quantamagazine.org/new-theory-suggests-chatbots-can-understand-text-20240122/
6:14 - Hamiltonian Learning Algorithm
After years of false starts, a team of computer scientists has found a way to efficiently deduce the Hamiltonian of a quantum system at any constant temperature.
Read more: https://www.quantamagazine.org/scientists-find-a-fast-way-to-describe-quantum-systems-20240501/
https://youtu.be/fTMMsreAqX0?si=Oyv23Cq1BJeILtBP

Utility Engineering
Value systems with a high degree of structural coherence emerge in AI preferences as models scale.

Seonglae Cho
