Lazy learning in k-nearest neighbor

It is said that k-nearest neighbors is an example of a “lazy learner” algorithm. What does ‘lazy learner’ mean?


KNN is a lazy learning algorithm. That doesn’t mean the algorithm itself is lazy by nature :grinning:. It means KNN has no training phase: when we feed data into a KNN model, it simply stores the whole dataset, and it takes all of that data into account for every prediction, each and every time. That makes prediction expensive.
That’s why it is known as a lazy learning algorithm.


Why do we use the fit() method, if we are not training the model?


Hey @nileshsuryavanshi395,
Thanks for your enthusiasm to dig deeper into the topic.
How does the fit() method work?
An algorithm that has a training phase is known as an eager learning algorithm. For an eager learner, the fit() method takes the input, trains the model, and generalizes it for later predictions via its parameters and hyperparameters. KNN has no training phase. In its fit() method, KNN stores the input training data, saves the hyperparameters, and organizes the data for the calculations done at prediction time.
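As a rough illustration (a toy class written for this post, not scikit-learn’s actual implementation), a lazy learner’s fit() amounts to little more than remembering the data; all the distance work happens in predict():

```python
import math
from collections import Counter

class LazyKNN:
    """Toy KNN classifier: fit() only stores the data and hyperparameters."""

    def __init__(self, k=3):
        self.k = k  # hyperparameter, saved at "training" time

    def fit(self, X, y):
        # No training at all: just remember the whole dataset.
        self.X = list(X)
        self.y = list(y)
        return self

    def predict(self, point):
        # All the real work happens here, scanning the entire training set.
        order = sorted(range(len(self.X)),
                       key=lambda i: math.dist(point, self.X[i]))
        top_k = [self.y[i] for i in order[:self.k]]
        return Counter(top_k).most_common(1)[0][0]

knn = LazyKNN(k=3)
knn.fit([(0, 0), (0, 1), (5, 5), (6, 5)], ["a", "a", "b", "b"])
print(knn.predict((5, 6)))  # -> "b" (two of the three nearest points are "b")
```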


Besides “training” the model, scikit-learn also uses the fit() method to choose the most appropriate nearest-neighbor search algorithm based on the values passed to it.

Choosing the optimal algorithm for the given data can make a significant impact on time complexity and space complexity.

According to the scikit-learn documentation on KNeighborsClassifier, scikit-learn can use a brute-force search, a k-d tree, or a ball tree to find the nearest neighbors (controlled by the algorithm parameter: 'brute', 'kd_tree', 'ball_tree', or 'auto', which tries to pick the best option for the data).

In KNN we have to search for the nearest neighbours in the training set every time, which can be an expensive operation if the training set is large.

But there are techniques to speed up this search, which typically work by creating various data structures based on the training set.

The general idea is that some of the computational work needed to classify new points is common across points. So, this work can be done ahead of time and then re-used, rather than repeated for each new instance.

So scikit-learn’s kd-tree and ball-tree KNN implementations do this work during the training phase, when the fit() method is called.
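To see the idea in miniature (this is a simplified 1-D illustration I wrote for this post, not scikit-learn’s tree code): sorting the training points once at “fit” time lets every later query use binary search in O(log n) time instead of scanning all n points.

```python
import bisect

class NearestNeighbor1D:
    def fit(self, points):
        # Up-front work shared by all future queries: sort once, O(n log n).
        self.points = sorted(points)
        return self

    def query(self, x):
        # Binary search per query, O(log n), instead of an O(n) linear scan.
        i = bisect.bisect_left(self.points, x)
        candidates = self.points[max(i - 1, 0):i + 1]
        return min(candidates, key=lambda p: abs(p - x))

nn = NearestNeighbor1D().fit([9, 2, 7, 4, 1])
print(nn.query(5))  # -> 4 (closest of the sorted points [1, 2, 4, 7, 9])
```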

P.S. The time complexity of a brute-force KNN query is O(n).
With a kd-tree it can typically be reduced to about O(k * log(n)) per query,
where n = number of data points & k = number of nearest neighbors


Brother, I didn’t understand :thinking:

But in the DQ exercise, they tell us to train the model with the train dataset. I have read many blogs; some of them say there is no training phase, and some of them say to train the model. Which is right? It is confusing me.

According to the DQ step 7 Fitting a model and making predictions in Multivariate K-Nearest Neighbors mission

When the fit() method is called, scikit-learn stores the training data we specified within the KNearestNeighbors instance ( knn ).

According to DQ exercise for step 7 Fitting a model and making predictions

Use the fit method to specify the data we want the k-nearest neighbor model to use.

For now, to avoid confusion, I recommend that you stick with DQ’s explanation:
for KNN, the fit() method is used to store the training data.
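For example (a tiny made-up dataset, assuming scikit-learn is installed; the DQ mission uses its own listings data instead):

```python
from sklearn.neighbors import KNeighborsRegressor

# Made-up example: predict price from (accommodates, bedrooms).
X_train = [[2, 1], [4, 2], [6, 3], [8, 4]]
y_train = [50, 100, 150, 200]

knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X_train, y_train)     # "fit" here stores the training data in knn
print(knn.predict([[5, 2]]))  # averages the prices of the 2 nearest neighbors
```

The call to fit() is instant: the expensive neighbor search only happens inside predict().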

If you still want to read further and understand more about the fit() method for KNN,
I recommend this article
