Random Forest

Creator
Seonglae Cho
Created
2021 Oct 6 14:39
Edited
2025 Mar 24 22:09
Refs

Bagging
Decision tree

Random Forest is an ensemble learning method that improves on bagging by reducing the correlation between its decision trees: it builds a large collection of decorrelated trees and averages their predictions.
Bagging trees reduces variance but not bias. The idea in Random Forest is to improve the variance reduction of bagging by decorrelating the trees through random selection of the input features:
1) For each tree: draw a bootstrap sample of the data.
2) For each split: consider only a random subset of the features.
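The two sources of randomness above can be sketched in plain Python (a minimal illustration; the function names and parameters here are my own, not from any library):

```python
import random

def bootstrap_sample(data, rng):
    """Draw a bootstrap sample: n points sampled with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]

def random_feature_subset(n_features, max_features, rng):
    """Pick the random subset of feature indices considered at one split."""
    return rng.sample(range(n_features), max_features)

rng = random.Random(0)
data = list(range(10))           # stand-in for 10 training rows
sample = bootstrap_sample(data, rng)
features = random_feature_subset(n_features=4, max_features=2, rng=rng)
print(len(sample))    # 10 — same size as the original data, with repeats
print(len(features))  # 2
```

Because each bootstrap sample draws with replacement, some rows repeat and others are left out, which is what decorrelates the trees.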

Key Characteristics

  • Builds a "forest" of many decision trees
  • Generates predictions by averaging the results of decorrelated trees
  • Offers performance comparable to boosting while being simpler to train and tune
When growing each tree, every node splits on a random subset of features, which keeps the trees in the forest diverse.

Impurity and Information Gain

The splitting criterion at each node aims to minimize impurity (or maximize homogeneity) in the resulting subsets. In information theory, this reduction in uncertainty is known as Information Gain.
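As a concrete illustration, Gini impurity and the impurity reduction of a split can be computed directly (a small sketch; a real tree evaluates this for every candidate split):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(parent, left, right):
    """Decrease in weighted Gini impurity from a split (the gain to maximize)."""
    n = len(parent)
    return gini(parent) - (len(left) / n * gini(left) + len(right) / n * gini(right))

parent = ["a", "a", "b", "b"]          # maximally mixed: gini = 0.5
left, right = ["a", "a"], ["b", "b"]   # perfect split: both children pure
print(gini(parent))                    # 0.5
print(impurity_reduction(parent, left, right))  # 0.5
```

A perfect split drives both children to zero impurity, so the reduction equals the parent's entire impurity.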

Extra-Trees

Extra-Trees (Extremely Randomized Trees) takes randomization a step further. Instead of searching for optimal thresholds for node splitting, it randomly generates splitting points and selects the best among these random splits. This approach significantly reduces computational time compared to standard Random Forest, which searches for optimal thresholds at each node.
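The Extra-Trees idea of scoring a few random thresholds instead of searching all of them can be sketched like this (illustrative only; `split_score` and the candidate count are my own choices, not the library's internals):

```python
import random

def gini(labels):
    """Gini impurity of a label list (0.0 for empty or pure sets)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_score(values, labels, t):
    """Weighted child impurity for threshold t (lower is better)."""
    left = [y for x, y in zip(values, labels) if x <= t]
    right = [y for x, y in zip(values, labels) if x > t]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

rng = random.Random(0)
values = [0.1, 0.2, 0.8, 0.9]
labels = ["a", "a", "b", "b"]
# Extra-Trees style: draw a handful of random thresholds, keep the best one,
# instead of exhaustively scoring every midpoint between sorted values.
candidates = [rng.uniform(min(values), max(values)) for _ in range(5)]
best = min(candidates, key=lambda t: split_score(values, labels, t))
```

Skipping the exhaustive threshold search is what makes Extra-Trees cheaper to train than a standard Random Forest.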
A key advantage of Random Forest is its ability to measure feature importance. In Scikit-Learn, this is calculated by measuring how much each feature reduces impurity when used in node splitting decisions.
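In Scikit-Learn this looks like the following (assuming scikit-learn is installed; the dataset here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 4 features, of which only 2 are informative
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
# Impurity-based importances: one non-negative value per feature, summing to 1
for i, imp in enumerate(clf.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```

The informative features should receive visibly larger importance scores than the noise features.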

Applications

One practical application is visualizing feature importance, such as displaying pixel-wise importance in image analysis tasks.