Teach language models to follow instructions to solve a task
By fine-tuning on instruction datasets, we can improve zero-shot performance without having to show task examples in the prompt, unlike few-shot prompting.
The key industrial significance is that anyone can easily create such datasets. Large datasets aren't necessary for instruction tuning: around 50k examples is sufficient, and data quality matters more than quantity. A minimal example of what such data looks like is sketched below.
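For illustration, here is a minimal instruction dataset in the widely used instruction/input/output schema (the field names follow the Alpaca-style convention; the exact template wording and file name are assumptions, not from the source):

```python
# Minimal sketch of instruction-tuning data and a prompt template.
# Schema follows the common Alpaca-style instruction/input/output
# convention; template wording is an assumption for illustration.
import json

examples = [
    {
        "instruction": "Translate the sentence into French.",
        "input": "The weather is nice today.",
        "output": "Il fait beau aujourd'hui.",
    },
    {
        "instruction": "Classify the sentiment as positive or negative.",
        "input": "I loved this movie.",
        "output": "positive",
    },
]

TEMPLATE = (
    "Below is an instruction that describes a task.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

# Write JSONL for training and print one fully formatted example.
with open("instructions.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex, ensure_ascii=False) + "\n")

print(TEMPLATE.format(**examples[0]))
```

Because examples like these are cheap to write or generate, quality control (clear instructions, correct outputs) is where the effort goes rather than sheer volume.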
The Notion of Instruction Tuning
Usages of Instruction Tuning
Evaluation of Instruction Tuning
Fine-tuned LLMs are better than base models at positional-information processing such as entity tracking. Using activation patching, the Position Transmitter and Value Fetcher mechanisms from the fine-tuned model could be patched into the base model, reproducing similar ability there.
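A rough sketch of activation patching follows (a generic illustration, not the exact entity-tracking setup: the GPT-2 stand-in model, patched layer, token position, and prompts are all assumptions). It caches a hidden state from a source run and overwrites the same position during a target run:

```python
# Minimal activation-patching sketch with GPT-2 (assumed stand-in model).
# Cache a hidden state from a "source" run, then overwrite the same
# position during a "target" run and compare next-token predictions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

LAYER = 6      # layer to patch (arbitrary choice for illustration)
POSITION = -1  # patch the last token's residual stream

source = tokenizer("The key is in the red box.", return_tensors="pt")
target = tokenizer("The key is in the blue box.", return_tensors="pt")

cached = {}

def cache_hook(module, inputs, output):
    # A GPT-2 block returns a tuple; hidden states are the first element.
    cached["h"] = output[0][:, POSITION, :].detach().clone()

def patch_hook(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, POSITION, :] = cached["h"]  # overwrite with cached activation
    return (hidden,) + output[1:]

block = model.transformer.h[LAYER]

# 1) Source run: record the activation.
handle = block.register_forward_hook(cache_hook)
with torch.no_grad():
    model(**source)
handle.remove()

# 2) Target run with the source activation patched in.
handle = block.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(**target).logits
handle.remove()

# 3) Clean target run for comparison.
with torch.no_grad():
    clean_logits = model(**target).logits

# If the patched prediction shifts toward the source run's continuation,
# the patched activation carries task-relevant information.
for name, logits in [("clean", clean_logits), ("patched", patched_logits)]:
    tok = logits[0, -1].argmax()
    print(name, repr(tokenizer.decode(tok)))
```

In the cross-model version used for the entity-tracking finding, the cached activations come from the fine-tuned model and are patched into the base model at the corresponding components.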
Instruction-tuned models' attention heads attend more to the verbs in an instruction, which helps the model understand what it is being asked to do.
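As a simple way to probe this kind of claim, the sketch below reads the attention weight from the last token back to the instruction verb at each layer (the GPT-2 stand-in model and prompt are assumptions; comparing a base and an instruction-tuned checkpoint of the same model would show the reported shift):

```python
# Rough sketch: how much does the final token attend to the instruction verb?
# Model choice and prompt are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the cited behavior concerns instruction-tuned LLMs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "Summarize the following article:"
inputs = tokenizer(prompt, return_tensors="pt")
verb_idx = 0  # "Summarize" starts the prompt, so its first subword is token 0

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions: one tensor per layer, shaped (batch, heads, seq, seq).
for layer, attn in enumerate(out.attentions):
    score = attn[0, :, -1, verb_idx].mean().item()  # mean over heads
    print(f"layer {layer:2d}: attention(last token -> verb) = {score:.4f}")
```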