K-Nearest Neighbors (KNN) is one of the simplest supervised machine learning algorithms used for classification. KNN classifies a test point by comparing it with its K nearest training examples and assigning it to the group most of them belong to, where K is the number of neighbors considered.
The KNN algorithm is used in the following scenarios:
- Data is noise-free
- Data is labeled
- Dataset is small
Selection of K-Value:
Selecting the right K value is a process called parameter tuning, and it is important for achieving higher accuracy. There is no definitive way to determine the best value of K; it depends on the type of problem you are solving.
Choosing the right K value when analyzing the dataset helps avoid both overfitting and underfitting.
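One common way to tune K is to score several candidate values with cross-validation and keep the best one. The sketch below assumes the iris dataset (used later in this article) and scikit-learn's cross_val_score; the range of candidate K values is an illustrative choice, not a rule.

```python
# Parameter tuning sketch: try several K values with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

x, y = load_iris(return_X_y=True)

scores = {}
for k in range(1, 16, 2):  # odd K values help avoid tied votes
    knn = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 cross-validation folds
    scores[k] = cross_val_score(knn, x, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Plotting the scores against K (an "elbow" plot) is another common way to eyeball a good value.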
Nearest neighbors calculation:
To find the nearest neighbors, we calculate the Euclidean distance.
The Euclidean distance between two points in the plane, (x1, y1) and (x2, y2), is given by:
d = √((x2 − x1)² + (y2 − y1)²)
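The Euclidean distance calculation can be written out directly in Python; this small helper generalizes it to points of any dimension:

```python
# Euclidean distance between two points of equal dimension.
import math

def euclidean_distance(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

print(euclidean_distance((1, 2), (4, 6)))  # 3-4-5 triangle: 5.0
```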
Factors such as k value, distance calculation, and choice of appropriate predictors all have a significant impact on the model performance.
For example, with k = 3 a new data point is classified by a majority vote among its three nearest neighbors: if two of the three neighbors are red, the new point is classified as red.
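The distance-then-vote procedure can be sketched in plain Python. The points, labels, and helper name below are made up for illustration:

```python
# Minimal from-scratch KNN: rank training points by distance, then vote.
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    # Indices of training points sorted by Euclidean distance to the query
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )
    # Majority vote among the k nearest labels
    votes = Counter(train_labels[i] for i in nearest[:k])
    return votes.most_common(1)[0][0]

points = [(1, 1), (1, 2), (5, 5), (2, 1), (6, 5)]
labels = ["red", "red", "blue", "red", "blue"]
print(knn_classify(points, labels, query=(1.5, 1.5), k=3))  # red
```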
Non-parametric learning algorithm
KNN is a non-parametric learning algorithm because it makes no assumptions about the underlying data distribution.
Lazy Learning algorithm
KNN is a lazy learning algorithm because it has no specialized training phase: it simply stores the training data and defers all computation until classification.
Implementation of the KNN algorithm using Python
# Importing necessary modules
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

irisData = load_iris()
x = irisData.data
y = irisData.target

# Splitting data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Training the classifier with K = 5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)

# Predicting results
y_pred = knn.predict(x_test)
print(knn.score(x_test, y_test))
Advantages of KNN:
- The algorithm is simple and easy to implement and works well with small datasets.
- There’s no need to build a model, tune several parameters, or make additional assumptions.
Disadvantages of KNN:
- The algorithm gets slower as the number of predictors increases.
- Sensitive to the scale of the data and irrelevant features.
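Because KNN relies on distances, features measured on larger scales can dominate the vote. A common remedy, sketched here with scikit-learn's StandardScaler in a pipeline (one option among several), is to standardize the features first:

```python
# Standardizing features before KNN so no single feature dominates the distance.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# StandardScaler gives every feature zero mean and unit variance
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(x_train, y_train)
print(round(model.score(x_test, y_test), 3))
```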
Applications of KNN Algorithm:
The KNN algorithm can be used in credit scoring, prediction of cancer cells, image recognition, and more.