Random forest is a supervised learning algorithm. It is an ensemble method, meaning it combines multiple models of the same type and aggregates their predictions (by averaging for regression, or by majority vote for classification) to improve predictive accuracy. Random forest can be used for both regression and classification tasks.
In ensemble learning, we combine several models, of the same or of different types, to form a stronger prediction model. A random forest is built by combining many decision trees; the name comes from the fact that it grows a "forest" of randomized trees and aggregates their outputs to produce a prediction.
(Diagram: the structure of a random forest, with many decision trees feeding one combined prediction.)
How the algorithm works
- Draw N random samples (bootstrap samples, drawn with replacement) from the training dataset.
- Build a decision tree for each sample and obtain a prediction from each tree.
- For classification, each tree's prediction counts as one vote.
- The prediction with the majority of votes wins (for regression, the trees' outputs are averaged instead).
- For a new data point, collect the prediction of every decision tree and assign the point to the category that wins the majority vote.
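The steps above can be sketched with scikit-learn's `RandomForestClassifier` (a minimal sketch, assuming scikit-learn is available; the iris dataset and the parameter values are illustrative choices, not part of the original text):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small example dataset (150 samples, 3 classes).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Each of the 100 trees is trained on a bootstrap sample of the
# training data; class predictions are combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# For a new data point, every tree votes and the majority class wins.
print("Prediction for one point:", forest.predict(X_test[:1]))
print("Test accuracy:", forest.score(X_test, y_test))
```

Here `n_estimators` is the number of trees in the forest; each `fit` call runs the sampling, tree-building, and voting pipeline described in the steps above.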
Why Random Forest?
For large datasets we may need to grow deep decision trees, and deep trees can overfit. A random forest reduces overfitting by building each tree on a random subset of the training samples (and, typically, a random subset of features at each split) and then averaging the trees. Increasing the number of trees in the forest generally improves accuracy, up to a point, at the cost of more computation.
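As a rough illustration of this point (a sketch, not a benchmark; the synthetic dataset and parameters below are assumptions made for the example), one can compare a single fully grown decision tree against a forest on held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset with label noise (flip_y), so a single
# unpruned tree is prone to overfitting.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=5, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X_train, y_train)

# The single deep tree typically fits the training data perfectly
# but generalizes worse than the averaged forest.
print("Tree   train/test:", tree.score(X_train, y_train),
      tree.score(X_test, y_test))
print("Forest train/test:", forest.score(X_train, y_train),
      forest.score(X_test, y_test))
```

The gap between training and test accuracy for the single tree, versus the smaller gap for the forest, is the overfitting effect described above.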
Advantages of Random Forest
- It is among the most accurate general-purpose machine learning algorithms available; for many datasets it produces highly accurate results.
- It runs efficiently and performs well on large datasets.
- Some random forest implementations can handle missing values automatically.
- It can maintain accuracy even when a large proportion of the data is missing.
Disadvantages of Random Forest
- Random forest builds many trees and combines their outputs, so the algorithm requires considerably more computational power and memory than a single decision tree.
- Random forests take much longer to train than many other algorithms, although training can be parallelized because the trees are built independently.
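The training-cost trade-off can be made concrete with a quick timing sketch (assumptions: scikit-learn, a synthetic dataset, and 200 trees trained on a single core; absolute times will vary by machine):

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Time a single decision tree.
start = time.perf_counter()
DecisionTreeClassifier(random_state=0).fit(X, y)
tree_time = time.perf_counter() - start

# Time a 200-tree forest; passing n_jobs=-1 instead would spread
# the tree-building work across all available CPU cores.
start = time.perf_counter()
RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
forest_time = time.perf_counter() - start

print(f"Single tree: {tree_time:.2f}s  Forest: {forest_time:.2f}s")
```

Because each tree is independent, the extra cost scales roughly linearly with `n_estimators` and can be reduced with parallel training.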