
Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition

Al Khaldy, Mohammad





Over the last decade, research in machine learning and data mining has focused on developing frameworks that improve data analysis and predictive performance, with the aim of building accurate decision support systems from real-life datasets. This has led to the field of clinical data analysis, which has attracted significant interest in the computing, information systems, and medical fields. Machine learning algorithms need data in a suitable form before they can build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. Building a framework for mining such data therefore requires preprocessing first: eliminating patient records that have too many missing values, imputing the remaining missing values, and addressing high dimensionality, before classifying the data for decision support.
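As an illustration of the record-elimination step mentioned above, the following is a minimal sketch, not the thesis's implementation. The function names, the use of `None` to mark a missing value, and the 50% threshold are all assumptions made for this example; the abstract does not state which cut-off was used.

```python
from typing import List, Optional

# A patient record is a list of feature values; None marks a missing value
# (an assumption of this sketch, not a convention taken from the thesis).
Record = List[Optional[float]]


def missing_fraction(record: Record) -> float:
    """Return the share of fields in a record that are missing."""
    if not record:
        return 0.0
    return sum(value is None for value in record) / len(record)


def drop_sparse_records(records: List[Record], max_missing: float = 0.5) -> List[Record]:
    """Keep only records whose missing fraction is at most max_missing."""
    return [r for r in records if missing_fraction(r) <= max_missing]


# Hypothetical toy data: three patients, four features each.
patients = [
    [63.0, 1.2, None, 140.0],    # 1 of 4 missing -> kept
    [None, None, None, 120.0],   # 3 of 4 missing -> dropped
    [71.0, None, 0.8, 135.0],    # 1 of 4 missing -> kept
]
kept = drop_sparse_records(patients, max_missing=0.5)
```

The threshold is a tuning choice: a stricter value discards more patients but leaves fewer values to impute.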
This thesis investigates a real clinical dataset and addresses these challenges. An autoencoder is employed as a tool that can compress the data mining methodology by extracting features and classifying data in one model. The first step in the methodology is to impute missing values, so several imputation methods are analysed and employed. Dimensionality reduction is then demonstrated and used to discard irrelevant and redundant features, in order to improve prediction accuracy and reduce computational complexity. Class imbalance is manipulated to investigate its effect on feature selection algorithms and classification algorithms.
The first stage of the analysis investigates the role of missing values. The results show that imputation techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance. A small set of features was found to improve classification performance, while balancing the classes did not affect performance as much as class imbalance did.
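The abstract does not name the specific class-based imputation techniques that were evaluated. As one possible reading of "techniques based on class separation", the sketch below fills each missing value with the mean of that feature among records of the same class. The function name, the `None` marker for missing values, and the toy data are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Hashable, List, Optional, Sequence


def classwise_mean_impute(
    rows: Sequence[Sequence[Optional[float]]],
    labels: Sequence[Hashable],
) -> List[List[float]]:
    """Replace each None with the mean of that feature within the row's class."""
    n_features = len(rows[0])
    sums = defaultdict(lambda: [0.0] * n_features)
    counts = defaultdict(lambda: [0] * n_features)

    # First pass: accumulate per-class sums and counts of observed values.
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                sums[label][j] += value
                counts[label][j] += 1

    # Second pass: fill the gaps using the class-specific feature mean.
    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                if counts[label][j] == 0:
                    raise ValueError(f"no observed value for feature {j} in class {label!r}")
                value = sums[label][j] / counts[label][j]
            new_row.append(value)
        imputed.append(new_row)
    return imputed


# Hypothetical toy data: two features, two classes.
rows = [[1.0, None], [3.0, 4.0], [None, 10.0], [8.0, 20.0]]
labels = [0, 0, 1, 1]
filled = classwise_mean_impute(rows, labels)
```

Because this approach needs the class label of every record, it applies only to labelled training data; unlabelled records would need a different imputation strategy.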


Al Khaldy, M. (2017). Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition. (Thesis). University of Hull. Retrieved from

Thesis Type: Thesis
Deposit Date: Jul 26, 2022
Publicly Available Date: Feb 24, 2023
Keywords: Computer science
Public URL
Additional Information: Department of Computer Science, The University of Hull
Award Date: Jul 1, 2017


Thesis (3.2 Mb)

Copyright Statement
© 2017 Al Khaldy, Mohammad. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.
