Bootstrap AGGregation
Bagging (Bootstrap AGGregation) is a powerful ensemble learning method that trains multiple models on randomly sampled subsets of the original dataset and aggregates their predictions. As a special case of the model averaging approach, it effectively reduces variance and helps prevent overfitting; a minimal code sketch follows the key characteristics below.
Key Characteristics
- Creates multiple training datasets through random sampling with replacement (bootstrapping)
- Commonly used with decision trees, though applicable to various model types
- Functions as a specialized form of model averaging
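To make these characteristics concrete, here is a minimal from-scratch sketch of bagging for classification. It assumes scikit-learn's DecisionTreeClassifier as the base model, NumPy arrays as inputs, and non-negative integer class labels; the `fit_bagging` and `predict_bagging` names are purely illustrative, not part of any standard API.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, n_estimators=25, random_state=0):
    """Train one decision tree per bootstrap sample of (X, y)."""
    rng = np.random.default_rng(random_state)
    n_samples = X.shape[0]
    models = []
    for _ in range(n_estimators):
        # Bootstrap sample: draw row indices with replacement (duplicates allowed)
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=0)
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def predict_bagging(models, X):
    """Aggregate the ensemble by majority vote (the mode of the predictions)."""
    preds = np.stack([m.predict(X) for m in models])  # shape: (n_estimators, n_queries)
    # Column-wise mode; assumes non-negative integer class labels
    return np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, preds)
```

In practice, scikit-learn's BaggingClassifier (and BaggingRegressor) packages this same procedure behind the usual fit/predict interface.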
Implementation Methods
- Random Patches: combines instance sampling (bootstrapping) with feature sampling
- Random Subspace: uses feature sampling only, so every model sees all training instances (both variants are configured in the sketch below)
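As a rough illustration, assuming scikit-learn's BaggingClassifier, the two variants correspond to parameter settings along these lines (the sampling fractions are arbitrary examples, not recommended values):

```python
from sklearn.ensemble import BaggingClassifier

# Random Patches: sample both training instances (with replacement) and features
random_patches = BaggingClassifier(
    n_estimators=50,
    max_samples=0.8,          # fraction of instances drawn for each model
    bootstrap=True,           # instance sampling with replacement
    max_features=0.6,         # fraction of features drawn for each model
    bootstrap_features=True,  # feature sampling with replacement
    random_state=0,
)

# Random Subspace: keep every instance, sample only the features
random_subspace = BaggingClassifier(
    n_estimators=50,
    max_samples=1.0,          # every model sees all instances
    bootstrap=False,          # no instance resampling
    max_features=0.6,         # fraction of features drawn for each model
    bootstrap_features=False, # features drawn without replacement
    random_state=0,
)
```

Both objects are then trained and used with the ordinary fit(X, y) / predict(X) calls; with bootstrap=False and max_samples=1.0, only the feature subsets differ between models, which is exactly the Random Subspace setup.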
Statistical Impact
Feature bagging typically increases bias while reducing variance, whereas dataset bagging keeps bias roughly unchanged while decreasing variance. Compared with training a single model on the complete dataset, bagging generally maintains similar bias but achieves lower variance, because the final prediction aggregates many models' outputs, taken as the mode (most frequent predicted value) in classification.
In statistical terms, bootstrapping refers to resampling with replacement, allowing the same data point to be selected multiple times during the sampling process.
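A tiny NumPy example of bootstrapping, using an arbitrary ten-point dataset purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
data = np.arange(10)  # toy dataset: the points 0..9

# One bootstrap sample: same size as the original, drawn WITH replacement
sample = rng.choice(data, size=data.size, replace=True)
print(sample)                  # some points appear two or three times
print(np.unique(sample).size)  # typically only ~6-7 distinct points survive

# On average a bootstrap sample contains about 63.2% (1 - 1/e) of the
# distinct original points; the rest are "out-of-bag" for that model.
```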