Mohammad Al Khaldy
Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition
Al Khaldy, Mohammad
Abstract
Over the last decade, research has focused on machine learning and data mining to develop frameworks that improve data analysis and output performance, and to build accurate decision support systems that benefit from real-life datasets. This has led to the field of clinical data analysis, which has attracted significant interest in the computing, information systems, and medical fields. Machine learning algorithms need data in a particular form in order to build an efficient model. Clinical datasets pose several issues that can affect classification: missing values, high dimensionality, and class imbalance. To build a framework for mining the data, it is first necessary to preprocess it by eliminating patient records that have too many missing values, imputing the remaining missing values, and addressing high dimensionality, before classifying the data for decision support.
This thesis investigates a real clinical dataset in order to address the challenges it poses. An autoencoder is employed as a tool that can condense the data mining methodology by extracting features and classifying data in one model. The first step in the methodology is to impute missing values, so several imputation methods are analysed and employed. High dimensionality is then demonstrated, and feature selection is used to discard irrelevant and redundant features in order to improve prediction accuracy and reduce computational complexity. The class balance is manipulated to investigate its effect on feature selection algorithms and classification algorithms.
The first stage of the analysis investigates the role of missing values. The results show that imputation techniques based on class separation outperform other techniques in predictive ability. The next stage investigates high dimensionality and class imbalance. A small set of features was found that can improve classification performance; however, the balanced classes do not affect performance as much as the imbalanced classes do.
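As an illustration of what imputation "based on class separation" can look like, the sketch below fills each missing value with the mean of that feature among records of the same class. This is only one simple reading of the idea: the abstract does not say which class-based techniques the thesis uses, and the function name `impute_by_class_mean` and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def impute_by_class_mean(rows, labels):
    """Return a copy of rows with None replaced by the per-class feature mean.

    Falls back to the overall feature mean when a class has no observed value
    for a feature, and leaves None in place if the feature is never observed.
    """
    n_features = len(rows[0])
    # Observed values grouped by (class, feature) and by feature overall.
    per_class = defaultdict(lambda: [[] for _ in range(n_features)])
    overall = [[] for _ in range(n_features)]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value is not None:
                per_class[label][j].append(value)
                overall[j].append(value)

    imputed = []
    for row, label in zip(rows, labels):
        new_row = []
        for j, value in enumerate(row):
            if value is None:
                candidates = per_class[label][j] or overall[j]
                value = mean(candidates) if candidates else None
            new_row.append(value)
        imputed.append(new_row)
    return imputed


# Toy records (hypothetical values): two features, two classes.
records = [[1.0, None], [3.0, 10.0], [None, 20.0], [8.0, 40.0]]
classes = ["a", "a", "b", "b"]
filled = impute_by_class_mean(records, classes)
```

Because this approach relies on class labels, it can only be applied where labels are known, for example on training data.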
Citation
Al Khaldy, M. Autoencoder for clinical data analysis and classification : data imputation, dimensional reduction, and pattern recognition. (Thesis). University of Hull. https://hull-repository.worktribe.com/output/4224219
Field | Value |
---|---|
Thesis Type | Thesis |
Deposit Date | Jul 26, 2022 |
Publicly Available Date | Feb 24, 2023 |
Keywords | Computer science |
Public URL | https://hull-repository.worktribe.com/output/4224219 |
Additional Information | Department of Computer Science, The University of Hull |
Award Date | Jul 1, 2017 |
Files
Thesis (PDF, 3.2 Mb)
Copyright Statement
© 2017 Al Khaldy, Mohammad. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.