Nongnuch Poolsawad
Issues in the mining of heart failure datasets
Poolsawad, Nongnuch; Moore, Lisa; Kambhampati, Chandrasekhar; Cleland, John G.F.
Authors
Lisa Moore
Dr Chandrasekhar Kambhampati C.Kambhampati@hull.ac.uk
Reader in Computer Science
John G.F. Cleland
Abstract
This paper investigates the characteristics of a clinical dataset, using a combination of feature selection and classification methods to handle missing values and to understand the statistical characteristics of a typical clinical dataset. A large clinical dataset typically presents challenges such as missing values, high dimensionality, and unbalanced classes, which pose inherent problems when implementing feature selection and classification algorithms. With most clinical datasets, an initial exploration is carried out and attributes with more than a certain percentage of missing values are eliminated. Prognostic and diagnostic models are then developed with the help of missing value imputation, feature selection and classification algorithms. This paper has two main conclusions: 1) Despite the nature of clinical datasets and their large size, the choice of missing value imputation method does not affect the final performance. What is crucial is that the dataset is an accurate representation of the clinical problem; the methods of imputing missing values are not critical for developing classifiers and prognostic/diagnostic models. 2) Supervised learning has proven to be more suitable for mining clinical data than unsupervised methods. It is also shown that non-parametric classifiers such as decision trees give better results than parametric classifiers such as radial basis function networks (RBFNs). © 2014 Institute of Automation, Chinese Academy of Sciences and Springer-Verlag Berlin Heidelberg.
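The preprocessing step described in the abstract (eliminating attributes with too many missing values, then imputing the rest) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the 50% cutoff, the use of column-mean imputation, the toy records, and the function names `drop_sparse_columns` and `impute_column_means` are all assumptions made for illustration, since the abstract names neither a threshold nor a specific imputation method.

```python
from statistics import mean


def drop_sparse_columns(rows, max_missing_fraction):
    """Return (kept column indices, filtered rows).

    A column is kept when its fraction of missing (None) values
    does not exceed max_missing_fraction.
    """
    if not rows:
        return [], []
    n_rows = len(rows)
    n_cols = len(rows[0])
    kept = [
        j for j in range(n_cols)
        if sum(row[j] is None for row in rows) / n_rows <= max_missing_fraction
    ]
    return kept, [[row[j] for j in kept] for row in rows]


def impute_column_means(rows):
    """Replace each None with the mean of the observed values in its column."""
    if not rows:
        return []
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Arbitrary fallback of 0.0 for a column with no observed values (assumption).
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records: three attributes per patient, None marks a missing value.
records = [
    [72.0, None, 1.1],
    [None, None, 0.9],
    [80.0, 3.5, None],
    [76.0, None, 1.0],
]

# Assumed cutoff: drop any attribute missing in more than 50% of records.
kept_columns, filtered = drop_sparse_columns(records, max_missing_fraction=0.5)
completed = impute_column_means(filtered)
```

In this toy example the second attribute is missing in three of four records, so it is dropped, and the remaining gaps are filled with column means. Feature selection and classification would then run on `completed`.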
Citation
Poolsawad, N., Moore, L., Kambhampati, C., & Cleland, J. G. (2014). Issues in the mining of heart failure datasets. International Journal of Automation and Computing, 11(2), 162-179. https://doi.org/10.1007/s11633-014-0778-5
Journal Article Type | Article |
---|---|
Acceptance Date | Jul 18, 2013 |
Online Publication Date | Mar 12, 2015 |
Publication Date | 2014-04 |
Deposit Date | Jan 2, 2019 |
Publicly Available Date | Mar 29, 2024 |
Journal | International Journal of Automation and Computing |
Print ISSN | 1476-8186 |
Electronic ISSN | 1751-8520 |
Publisher | Springer Verlag |
Peer Reviewed | Peer Reviewed |
Volume | 11 |
Issue | 2 |
Pages | 162-179 |
DOI | https://doi.org/10.1007/s11633-014-0778-5 |
Keywords | Heart failure; Clinical dataset; Classification; Clustering; Missing values; Feature selection |
Public URL | https://hull-repository.worktribe.com/output/565359 |
Publisher URL | https://link.springer.com/article/10.1007%2Fs11633-014-0778-5 |