Stanislav Protasov
Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets
Protasov, Stanislav; Khan, Adil Mehmood
Abstract
K-nearest neighbours (kNN) is a very popular instance-based classifier due to its simplicity and good empirical performance. However, large-scale datasets make it difficult to build fast and compact neighbourhood-based classifiers. This work presents the design and implementation of a classification algorithm with index data structures, which allows us to build fast and scalable solutions for large multidimensional datasets. We propose a novel approach that uses a navigable small-world (NSW) proximity graph representation of large-scale datasets. Our approach shows a 2-4 times classification speedup for both average and 99th percentile time, with asymptotically close classification accuracy compared to the 1-NN method. We observe two orders of magnitude better classification time in cases when the method uses swap memory. We show that the NSW graph used in our method outperforms other proximity graphs in classification accuracy. Our results suggest that the algorithm can be used in large-scale applications for fast and robust classification, especially when the search index is already constructed for the data.
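The abstract contrasts an exact 1-NN scan with classification over a proximity graph. As a rough illustration only, and not the authors' graph-cut algorithm, the Python sketch below shows the general idea of replacing a full scan with a greedy walk over a graph of nearby points. The names `build_graph`, `greedy_classify`, and the parameter `m` are hypothetical, and the brute-force neighbour graph is a simple stand-in for a real NSW index, which would be built incrementally.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_nn(points, labels, query):
    """Exact 1-NN baseline: scan every stored point."""
    best = min(range(len(points)), key=lambda i: dist(points[i], query))
    return labels[best]


def build_graph(points, m=5):
    """Stand-in proximity graph (assumption, not NSW construction):
    link each point to its m nearest points, with symmetric edges."""
    graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        for j in others[:m]:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def greedy_classify(points, labels, graph, query, start=0):
    """Greedy walk: step to the neighbour closest to the query until
    no neighbour is closer, then return the label of that vertex."""
    current = start
    current_d = dist(points[current], query)
    while graph[current]:
        nxt = min(graph[current], key=lambda j: dist(points[j], query))
        nxt_d = dist(points[nxt], query)
        if nxt_d >= current_d:
            break  # local minimum reached
        current, current_d = nxt, nxt_d
    return labels[current]


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
    lbl = ["a", "a", "b", "b"]
    g = build_graph(pts, m=1)
    print(one_nn(pts, lbl, (2.9,)), greedy_classify(pts, lbl, g, (2.9,)))
```

The greedy walk only inspects the neighbours of the vertices it visits, which is where a graph index can save time over a full scan. It can also stop at a local minimum or stay inside a disconnected component, so its answer is approximate in general.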
Citation
Protasov, S., & Khan, A. M. (2021). Using Proximity Graph Cut for Fast and Robust Instance-Based Classification in Large Datasets. Complexity, 2021, Article 2011738. https://doi.org/10.1155/2021/2011738
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Oct 29, 2021 |
Online Publication Date | Nov 29, 2021 |
Publication Date | Nov 29, 2021 |
Deposit Date | Aug 28, 2024 |
Publicly Available Date | Sep 2, 2024 |
Journal | Complexity |
Print ISSN | 1076-2787 |
Publisher | Hindawi |
Peer Reviewed | Peer Reviewed |
Volume | 2021 |
Article Number | 2011738 |
DOI | https://doi.org/10.1155/2021/2011738 |
Public URL | https://hull-repository.worktribe.com/output/4792231 |
Files
Published article (PDF, 1.5 MB)
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0
Copyright Statement
Copyright © 2021 Stanislav Protasov and Adil Mehmood Khan.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.