A VISUAL-EMPIRICAL STUDY OF SCALING EFFECTS AND HYPER- PARAMETER ROBUSTNESS IN K-NEAREST NEIGHBOR CLASSIFICATION
Keywords:
Key words: k-nearest neighbors algorithm, z-score standardization, hyper- parameter tuning, benchmark datasets (Iris, Wine, Breast-Cancer), PCA visualization, classification accuracy.Abstract
Annotation. The paper revisits the k-Nearest Neighbors (k-NN) algorithm by
combining mathematical exposition with empirical testing on three benchmark
datasets—Iris, Wine and Breast-Cancer. All features were z-score standardized;
classification accuracy was recorded for k ranging from 1 to 15. Two visual tools—an
accuracy-versus-k curve and a 2-D PCA scatter plot—highlight how hyper-parameter
choice affects performance and reveal the inherent class structure. Findings confirm
that, with proper scaling and a moderate neighborhood size (k ≈ 5–11), k-NN attains
stable accuracies of roughly 94–96 %.
References
1. Cover T.M., Hart P.E. Nearest Neighbor Pattern Classification. IEEE Transactions
on Information Theory 13 (1): 21–27, 1967.
2. Wilson D.R., Martinez T.R. Improved Heterogeneous Distance Functions. Journal
of Artificial Intelligence Research 6: 1–34, 1997. (arxiv.org)
3. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. 2nd ed., Springer, 2009.
4. Bishop C.M. Pattern Recognition and Machine Learning. Springer, 2006.
5. Jolliffe I.T., Cadima J. Principal Component Analysis: A Review and Recent
Developments. Philosophical Transactions of the Royal Society A 374 (2065):
20150202, 2016.
6. Pedregosa F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine
Learning Research 12: 2825–2830, 2011.
7. Johnson J., Douze M., Jégou H. Billion-Scale Similarity Search with GPUs. IEEE
Transactions on Big Data 7 (3): 535–547, 2021. (scirp.org)
8. Malkov Y.A., Yashunin D.A. Efficient and Robust Approximate Nearest Neighbor
Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on
Pattern Analysis and Machine Intelligence 42 (4): 824–836, 2020.
(en.wikipedia.org)
9. Cunningham P., Delany S.J. k-Nearest Neighbour Classifiers – A Tutorial. ACM
Computing Surveys 54 (6): 128:1–128:54, 2022.
10. Giannopoulos P.G., Dasaklis T.K., Rachaniotis N. Development and Evaluation of
a Novel Framework to Enhance k-NN Algorithm’s Accuracy in Data Sparsity
Contexts. Scientific Reports 14: 25036, 2024. (nature.com)
11. Halder R.K. et al. Enhancing k-Nearest Neighbor Algorithm: A Comprehensive
Review and Performance Analysis of Modifications. Journal of Big Data 11: 113,
2024. (journalofbigdata.springeropen.com)
12. Dua D., Graff C. UCI Machine Learning Repository. University of California,
Irvine, 2019. (archive.ics.uci.edu)
13. Park H.S., Pastor D. A Comprehensive Survey on Feature Scaling Techniques for
k-Nearest Neighbor. Pattern Recognition Letters 167: 60–66, 2023.
14. Aggarwal C.C., Reddy C.K. Data Clustering: Algorithms and Applications. 2nd
ed., CRC Press, 2023.
15. Fix E., Hodges J.L. Discriminatory Analysis: Nonparametric Discrimination,
Consistency Properties. USAF School of Aviation Medicine, Technical Report 4,
1951.