Introduction
KNN is another classification method. It takes a comparison approach: a new data point is classified by looking at the data points closest to it.
It works as shown in the graph below. The Euclidean distance is calculated to find the 5 closest data points, and the new data point is simply assigned to the category with the largest number of nearest neighbors; in this case it would be Category 1.
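For concreteness, here is a minimal sketch of the Euclidean distance calculation; the coordinates of p and q below are made up purely for illustration:

import numpy as np

# Two hypothetical feature vectors (made-up coordinates for illustration)
p = np.array([1.0, 2.0])
q = np.array([4.0, 6.0])

# Euclidean distance: square root of the sum of squared differences
distance = np.sqrt(np.sum((p - q) ** 2))
print(distance)  # 5.0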
Steps for KNN
- Choose the number K of neighbors (the default is usually 5)
- Take the K nearest neighbors of the new data point, according to the Euclidean distance (or any other distance metric)
- Among these K neighbors, count the number of data points in each category
- Assign the new data point to the category where you counted the most neighbors (see the from-scratch sketch after this list)
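Taken together, these steps are simple enough to sketch from scratch. Below is a minimal NumPy version with made-up 2D training data; the arrays and the knn_predict helper are purely illustrative, not part of scikit-learn:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Indices of the K nearest neighbors
    nearest = np.argsort(distances)[:k]
    # Steps 3 and 4: count categories among the neighbors, pick the most common
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Made-up data: category 0 clustered on the left, category 1 on the right
X_train = np.array([[1, 1], [1, 2], [2, 1], [6, 5], [7, 6], [6, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([6.0, 5.5])))  # 1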
Building the K-NN in Python
Use the classification template and preprocess the data. Import KNeighborsClassifier from the sklearn.neighbors library, create the classifier object, and fit it to the training set.
‘KNeighborsClassifier’ has the following parameters:
- n_neighbors: the number of neighbors to use
- metric: the distance metric to use; use ‘minkowski’
- p: power parameter for the Minkowski metric; p = 2 gives the Euclidean distance
from sklearn.neighbors import KNeighborsClassifier

# 5 neighbors, Minkowski metric with p = 2, i.e. the Euclidean distance
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)  # fit the classifier to the training set
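Once fitted, the classifier can make predictions. A minimal follow-up, assuming X_test and y_test come from the same preprocessing step as the training set:

from sklearn.metrics import confusion_matrix

# Predict the categories of the test set observations
y_pred = classifier.predict(X_test)

# Compare predictions against the true labels
cm = confusion_matrix(y_test, y_pred)
print(cm)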