Machine Learning Playground
K Nearest Neighbors

TL;DR - Birds of a feather flock together

Picks the k closest points from the training data, then predicts by majority vote among them.
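That nearest-neighbors-plus-vote step can be sketched in a few lines of Python. The function name, the toy data, and the label values here are hypothetical, not part of the demo:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (features, label) pairs (hypothetical format)."""
    # Sort training points by Euclidean distance to the query point
    by_distance = sorted(train, key=lambda point: math.dist(point[0], query))
    # Tally the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((0, 0), "red"), ((1, 0), "red"), ((5, 5), "blue"), ((6, 5), "blue")]
print(knn_predict(train, (0.5, 0.5), k=3))  # red: two of the 3 nearest are red
```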

Parameters

  • k (≥ 1): number of closest neighbors to select

Use Cases:

  • Binary Classification
  • Multi-class Classification
  • Regression
A simple and straightforward algorithm. The underlying assumption is that datapoints close to each other share the same label.
Analogy: if I hang out with CS majors, then I'm probably also a CS major (or that one Philosophy major who's minoring in everything.)
Note that distance can be defined in different ways: Manhattan (sum of the absolute differences across features, or inputs), Euclidean (straight-line geometric distance), p-norm distance, and so on. Euclidean is the typical choice (and the one this demo uses), but Manhattan is cheaper to compute and can therefore be preferable.
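For concreteness, here is a sketch of those three metrics; the p-norm (Minkowski) form generalizes the other two, with p=1 giving Manhattan and p=2 giving Euclidean:

```python
def manhattan(a, b):
    # L1 norm: sum of absolute per-feature differences
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # L2 norm: straight-line geometric distance
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def p_norm(a, b, p):
    # Minkowski distance; p=1 is Manhattan, p=2 is Euclidean
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(manhattan(a, b))   # 7
print(euclidean(a, b))   # 5.0
print(p_norm(a, b, 2))   # 5.0
```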

The Good

  • Simple to implement

The Bad

  • Non-Parametric - the model grows as the training data grows, and every prediction must touch it. Computing distances to billions of datapoints could take a long time.
  • Curse of Dimensionality - as the number of features increases (i.e. more dimensions), the average distance between randomly distributed points converges to a fixed value. This means that most points end up nearly equidistant from each other - so distance becomes less meaningful as a metric
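The distance concentration above can be demonstrated empirically. This sketch (the helper name and sample sizes are my own, not from the demo) measures the spread between the nearest and farthest of 500 random points from a query point in the unit hypercube; as the dimension grows, that spread shrinks relative to the distances themselves:

```python
import random

def distance_contrast(dim, n=500, seed=0):
    """Relative spread (max - min) / min of Euclidean distances from one
    query point to n random points in the dim-dimensional unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n):
        point = [rng.random() for _ in range(dim)]
        dists.append(sum((a - b) ** 2 for a, b in zip(query, point)) ** 0.5)
    return (max(dists) - min(dists)) / min(dists)

# Contrast shrinks with dimension: the nearest and farthest neighbors
# become nearly the same distance away, so "nearest" carries less signal.
for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```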