Classify data using the K-nearest neighbors algorithm in JavaScript

Definition
The K-nearest neighbors algorithm is a simple, instance-based learning algorithm used for classification and regression tasks. It classifies a data point based on the majority class of its k
nearest neighbors in a feature space.
Implementation
This implementation classifies a data point relative to a labelled data set, using the K-nearest neighbors algorithm. The data
is expected to be an array of objects, where each object represents a data point with its features. The labels
are an array of the corresponding classes for each data point.
- Use
Array.prototype.map()
to map thedata
to objects. Each object contains the Euclidean distance of the element frompoint
, calculated usingMath.hypot()
,Object.keys()
and itslabel
. - Use
Array.prototype.sort()
andArray.prototype.slice()
to get thek
nearest neighbors ofpoint
. - Use
Array.prototype.reduce()
in combination withObject.keys()
andArray.prototype.indexOf()
to find the most frequentlabel
among them. - Return the most frequent
label
as the classification ofpoint
.
const kNearestNeighbors = (data, labels, point, k = 3) => {
const kNearest = data
.map((el, i) => ({
dist: Math.hypot(...Object.keys(el).map(key => point[key] - el[key])),
label: labels[i]
}))
.sort((a, b) => a.dist - b.dist)
.slice(0, k);
return kNearest.reduce(
(acc, { label }, i) => {
acc.classCounts[label] =
Object.keys(acc.classCounts).indexOf(label) !== -1
? acc.classCounts[label] + 1
: 1;
if (acc.classCounts[label] > acc.topClassCount) {
acc.topClassCount = acc.classCounts[label];
acc.topClass = label;
}
return acc;
},
{
classCounts: {},
topClass: kNearest[0].label,
topClassCount: 0
}
).topClass;
};
const data = [[0, 0], [0, 1], [1, 3], [2, 0]];
const labels = [0, 1, 1, 0];
kNearestNeighbors(data, labels, [1, 2], 2); // 1
kNearestNeighbors(data, labels, [1, 0], 2); // 0
Complexity
The time complexity of this algorithm is O(n)
for each classification, where n
is the number of data points. The space complexity is O(n)
as well, since we are storing the distances and labels for each data point. The algorithm is non-parametric, meaning it does not make any assumptions about the underlying data distribution. It is also lazy, as it does not require a training phase, making it suitable for online learning scenarios.