The fastknn package provides the `knnExtract()` function to do feature extraction using KNN. It generates `k * c` new features, where `c` is the number of class labels. The new features are computed from the distances between the observations and their `k` nearest neighbors inside each class, as follows:

- The first test feature contains the distances between each test instance and its nearest neighbor inside the first class.
- The second test feature contains the sums of distances between each test instance and its 2 nearest neighbors inside the first class.
- The third test feature contains the sums of distances between each test instance and its 3 nearest neighbors inside the first class.
- And so on.

This procedure repeats for each class label, generating `k * c` new features. The new training features are then generated using an n-fold CV approach, in order to avoid overfitting. Parallelization is available; you can specify the number of threads via the `nthread` parameter.

The feature extraction technique proposed here is based on the ideas presented in the winning solution of the Otto Group Product Classification Challenge on Kaggle.

The following example shows that the KNN features carry information about the original data that cannot be extracted by a linear learner, such as a GLM:

```r
library("mlbench")
library("caTools")
library("fastknn")
library("glmnet")

#### Load data
data("Ionosphere", package = "mlbench")
x <- data.matrix(subset(Ionosphere, select = -Class))
y <- Ionosphere$Class

#### Remove near zero variance columns
x <- x[, -c(1, 2)]

#### Split data
set.seed(123)
tr.idx <- which(sample.split(Y = y, SplitRatio = 0.7))
x.tr <- x[tr.idx, ]
x.te <- x[-tr.idx, ]
y.tr <- y[tr.idx]
y.te <- y[-tr.idx]

#### GLM with original features
glm <- glmnet(x = x.tr, y = y.tr, family = "binomial", lambda = 0)
yhat <- drop(predict(glm, x.te, type = "class"))
yhat1 <- factor(yhat, levels = levels(y.tr))

#### Generate KNN features
set.seed(123)
new.data <- knnExtract(xtr = x.tr, ytr = y.tr, xte = x.te, k = 3)

#### GLM...
```
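To make the construction above concrete, here is a minimal sketch of the test-set features built "by hand". The `make_knn_features()` helper is hypothetical (it is not part of fastknn), Euclidean distance is assumed, and no CV or parallelization is done; it only illustrates the cumulative-distance idea.

```r
## Hypothetical helper, NOT fastknn's implementation: for each class and each
## j in 1..k, the feature is the sum of distances from a test instance to its
## j nearest training neighbors of that class (Euclidean distance assumed).
make_knn_features <- function(x.tr, y.tr, x.te, k = 3) {
  classes <- levels(y.tr)
  feat <- matrix(NA_real_, nrow = nrow(x.te), ncol = k * length(classes))
  col <- 1
  for (cl in classes) {
    x.cl <- x.tr[y.tr == cl, , drop = FALSE]
    for (i in seq_len(nrow(x.te))) {
      ## Distances from the i-th test instance to every training instance of
      ## class 'cl', sorted from nearest to farthest
      d <- sort(sqrt(colSums((t(x.cl) - x.te[i, ])^2)))
      ## Cumulative sums give the k features for this class
      feat[i, col:(col + k - 1)] <- cumsum(d[seq_len(k)])
    }
    col <- col + k
  }
  feat
}

## Example: the same kind of features described above, for the test set
kf.te <- make_knn_features(x.tr, y.tr, x.te, k = 3)
dim(kf.te)  # nrow(x.te) x (k * number of classes) = nrow(x.te) x 6
```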
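The n-fold CV step used for the training features can be sketched in the same spirit. Again, this is only an illustration (it reuses the hypothetical `make_knn_features()` helper above, with 5 folds chosen arbitrarily), not fastknn's implementation, which also parallelizes the neighbor search via `nthread`:

```r
## Hypothetical out-of-fold construction of the training features: each row's
## features are computed against the other folds only, so a training instance
## is never its own neighbor, which reduces overfitting.
make_knn_features_cv <- function(x.tr, y.tr, k = 3, nfolds = 5) {
  folds <- sample(rep(seq_len(nfolds), length.out = nrow(x.tr)))
  feat <- matrix(NA_real_, nrow = nrow(x.tr), ncol = k * nlevels(y.tr))
  for (f in seq_len(nfolds)) {
    in.fold <- folds == f
    feat[in.fold, ] <- make_knn_features(
      x.tr = x.tr[!in.fold, , drop = FALSE],
      y.tr = y.tr[!in.fold],  # subsetting a factor keeps all class levels
      x.te = x.tr[in.fold, , drop = FALSE],
      k = k
    )
  }
  feat
}

kf.tr <- make_knn_features_cv(x.tr, y.tr, k = 3, nfolds = 5)
```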