Machine Learning with Swift

KNN cons

  • The algorithm is fast to train but slow at inference: every prediction requires comparing the query against all stored samples.
  • You need to choose the best k somehow (see the Choosing a good k section).
  • With small values of k, the model can be badly affected by outliers; in other words, it is prone to overfitting.
  • You need to choose a distance metric. For ordinary real-valued features, there are many options to choose from (see the Calculating the distance section), and different metrics can yield different nearest neighbors, as illustrated in the sketch after this list. Many machine learning packages use the Euclidean distance by default; however, this choice is little more than a convention, and for many applications it is not optimal.
  • Model size grows as new data is incorporated.
  • If there are several identical samples with different labels, the prediction can depend on the order in which the samples are stored.
  • The model suffers from the curse of dimensionality.
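To make the inference-cost and metric-choice points concrete, here is a minimal brute-force sketch in Swift. The function names, toy data, and query point are hypothetical, chosen only for illustration; this is not the book's implementation. "Training" is just storing the samples, every prediction scans all of them, and swapping the Euclidean metric for the Manhattan metric changes which neighbor comes back first:

```swift
import Foundation

// Euclidean (L2) distance: square root of the sum of squared differences.
func euclideanDistance(_ a: [Double], _ b: [Double]) -> Double {
    let sumOfSquares = zip(a, b).reduce(0.0) { sum, pair in
        let diff = pair.0 - pair.1
        return sum + diff * diff
    }
    return sqrt(sumOfSquares)
}

// Manhattan (L1) distance: sum of absolute differences.
func manhattanDistance(_ a: [Double], _ b: [Double]) -> Double {
    return zip(a, b).reduce(0.0) { $0 + abs($1.0 - $1.1) }
}

// Brute-force neighbor search: the "model" is the stored training set,
// and every query is compared against all of it.
func nearestLabels(to query: [Double],
                   among samples: [(features: [Double], label: String)],
                   k: Int,
                   distance: ([Double], [Double]) -> Double) -> [String] {
    return samples
        .sorted { distance($0.features, query) < distance($1.features, query) }
        .prefix(k)
        .map { $0.label }
}

// Hypothetical toy data: the nearest neighbor flips with the metric.
let training: [(features: [Double], label: String)] = [
    ([3.0, 0.0], "A"),   // Euclidean: 3.0,  Manhattan: 3.0 from the query
    ([2.0, 2.0], "B"),   // Euclidean: 2.83, Manhattan: 4.0 from the query
    ([5.0, 5.0], "A")
]
let query = [0.0, 0.0]

print(nearestLabels(to: query, among: training, k: 1, distance: euclideanDistance)) // ["B"]
print(nearestLabels(to: query, among: training, k: 1, distance: manhattanDistance)) // ["A"]
```

Note that this sketch recomputes distances inside the sort comparator and keeps every sample in memory, which is exactly why inference gets slower and the model gets bigger as data accumulates; practical implementations cache the distances or use spatial index structures to mitigate this.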