Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

Al Khaldy, Mohammad

Authors

Mohammad Al Khaldy

Contributors

Dr Chandrasekhar Kambhampati C.Kambhampati@hull.ac.uk
Supervisor

Abstract

Over the last decade, research has focused on machine learning and data mining to develop frameworks that improve data analysis and output performance, and to build accurate decision support systems that benefit from real-life datasets. This has led to the field of clinical data analysis, which has attracted significant interest in the computing, information systems, and medical fields. Existing machine learning algorithms need a particular type of data in order to build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. To build a framework for mining such data, it is first necessary to preprocess the data by eliminating patient records that have too many missing values, imputing the remaining missing values, and addressing high dimensionality, before classifying the data for decision support.
This thesis investigates a real clinical dataset in order to address these challenges. An autoencoder is employed as a tool that can condense the data mining methodology by extracting features and classifying data in one model. The first step in the methodology is to impute missing values, so several imputation methods are analysed and employed. The high dimensionality of the data is then demonstrated and addressed by discarding irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Finally, the class balance is manipulated to investigate its effect on feature selection algorithms and classification algorithms.
The first stage of analysis investigates the role of the missing values. The results show that techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance. Although a small set of features was found that can improve classification performance, balancing the classes does not affect performance as much as class imbalance does.

Citation

Al Khaldy, M. Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition. (Thesis). University of Hull. https://hull-repository.worktribe.com/output/4224219

Thesis Type	Thesis
Deposit Date	Jul 26, 2022
Publicly Available Date	Feb 24, 2023
Keywords	Computer science
Public URL	https://hull-repository.worktribe.com/output/4224219
Additional Information	Department of Computer Science, The University of Hull
Award Date	Jul 1, 2017

Files

Thesis (3.2 Mb)
PDF

Copyright Statement
© 2017 Al Khaldy, Mohammad. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.