A VISUAL-EMPIRICAL STUDY OF SCALING EFFECTS AND HYPER- PARAMETER ROBUSTNESS IN K-NEAREST NEIGHBOR  CLASSIFICATION

Tuxtabayev Qudratillo Axmadjanovich; Xo’jayev Shukurjon Ahmedovich

Authors

Tuxtabayev Qudratillo Axmadjanovich Author
Xo’jayev Shukurjon Ahmedovich Author

Keywords:

Key words: k-nearest neighbors algorithm, z-score standardization, hyper- parameter tuning, benchmark datasets (Iris, Wine, Breast-Cancer), PCA visualization, classification accuracy.

Abstract

Annotation. The paper revisits the k-Nearest Neighbors (k-NN) algorithm by
combining mathematical exposition with empirical testing on three benchmark
datasets—Iris, Wine and Breast-Cancer. All features were z-score standardized;
classification accuracy was recorded for k ranging from 1 to 15. Two visual tools—an
accuracy-versus-k curve and a 2-D PCA scatter plot—highlight how hyper-parameter
choice affects performance and reveal the inherent class structure. Findings confirm
that, with proper scaling and a moderate neighborhood size (k ≈ 5–11), k-NN attains
stable accuracies of roughly 94–96 %.

References

1. Cover T.M., Hart P.E. Nearest Neighbor Pattern Classification. IEEE Transactions

on Information Theory 13 (1): 21–27, 1967.

2. Wilson D.R., Martinez T.R. Improved Heterogeneous Distance Functions. Journal

of Artificial Intelligence Research 6: 1–34, 1997. (arxiv.org)

3. Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data

Mining, Inference, and Prediction. 2nd ed., Springer, 2009.

4. Bishop C.M. Pattern Recognition and Machine Learning. Springer, 2006.

5. Jolliffe I.T., Cadima J. Principal Component Analysis: A Review and Recent

Developments. Philosophical Transactions of the Royal Society A 374 (2065):

20150202, 2016.

6. Pedregosa F. et al. Scikit-learn: Machine Learning in Python. Journal of Machine

Learning Research 12: 2825–2830, 2011.

7. Johnson J., Douze M., Jégou H. Billion-Scale Similarity Search with GPUs. IEEE

Transactions on Big Data 7 (3): 535–547, 2021. (scirp.org)

8. Malkov Y.A., Yashunin D.A. Efficient and Robust Approximate Nearest Neighbor

Search Using Hierarchical Navigable Small World Graphs. IEEE Transactions on

Pattern Analysis and Machine Intelligence 42 (4): 824–836, 2020.

(en.wikipedia.org)

9. Cunningham P., Delany S.J. k-Nearest Neighbour Classifiers – A Tutorial. ACM

Computing Surveys 54 (6): 128:1–128:54, 2022.

10. Giannopoulos P.G., Dasaklis T.K., Rachaniotis N. Development and Evaluation of

a Novel Framework to Enhance k-NN Algorithm’s Accuracy in Data Sparsity

Contexts. Scientific Reports 14: 25036, 2024. (nature.com)

11. Halder R.K. et al. Enhancing k-Nearest Neighbor Algorithm: A Comprehensive

Review and Performance Analysis of Modifications. Journal of Big Data 11: 113,

2024. (journalofbigdata.springeropen.com)

12. Dua D., Graff C. UCI Machine Learning Repository. University of California,

Irvine, 2019. (archive.ics.uci.edu)

13. Park H.S., Pastor D. A Comprehensive Survey on Feature Scaling Techniques for

k-Nearest Neighbor. Pattern Recognition Letters 167: 60–66, 2023.

14. Aggarwal C.C., Reddy C.K. Data Clustering: Algorithms and Applications. 2nd

ed., CRC Press, 2023.

15. Fix E., Hodges J.L. Discriminatory Analysis: Nonparametric Discrimination,

Consistency Properties. USAF School of Aviation Medicine, Technical Report 4,

1951.

A VISUAL-EMPIRICAL STUDY OF SCALING EFFECTS AND HYPER- PARAMETER ROBUSTNESS IN K-NEAREST NEIGHBOR CLASSIFICATION

Authors

Keywords:

Abstract

References

Downloads

Published

Issue

Section

How to Cite