Lisa Kirke
Data mining for heart failure : an investigation into the challenges in real life clinical datasets
Kirke, Lisa
Abstract
Clinical data presents a number of challenges, including missing data, class imbalance, high dimensionality and non-normal distribution. A motivation for this research is to investigate and analyse how these challenges affect the performance of algorithms. The challenges were explored with the help of a real life heart failure clinical dataset known as Hull LifeLab, obtained from a live cardiology clinic at the Hull Royal Infirmary Hospital. A Clinical Data Mining Workflow (CDMW) was designed with three intuitive stages, namely descriptive, predictive and prescriptive. The naming of these stages reflects the nature of the analysis that is possible within each stage; therefore a number of different algorithms are employed. Most algorithms assume that the data are normally distributed, although they do not use the distribution explicitly. Approaches based on Bayes use the properties of the distributions very explicitly, and thus provide valuable insight into the nature of the data.
The first stage of the analysis investigates whether the assumptions made for Bayes hold, e.g. the strong independence assumption and the assumption of a Gaussian distribution. The next stage investigates the role of missing values. The results show that imputation affects performance less than the records that are complete from the outset. These records are often not outliers, but contain problem variables, and a method was developed to identify these. The effect of skews in the data was also investigated within the CDMW; methods based on Bayes were found to handle these, albeit with a small variability in performance. The thesis provides an insight into the reasons why clinical data often causes problems. Even class imbalance does not pose a problem, since Bayes is independent of it.
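As a rough illustration of the first stage described above, the two Bayes assumptions can be probed with simple summary statistics: skewness as a check on the Gaussian assumption, and pairwise correlation as a proxy for the independence assumption. This is a minimal sketch and not the method used in the thesis; the function names, the variable names and the toy values are all hypothetical and are not taken from the Hull LifeLab dataset.

```python
import math

def skewness(xs):
    """Moment-based sample skewness; close to 0 for symmetric data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

def pearson(xs, ys):
    """Pearson correlation; values far from 0 suggest dependence."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy values for the records of a single class.
feature_a = [61, 64, 70, 72, 75, 78, 80, 83]
feature_b = [80, 85, 90, 95, 100, 140, 210, 350]  # long right tail

# A large absolute skewness hints that a Gaussian fit may be poor.
print("skewness of feature_b:", round(skewness(feature_b), 2))

# A strong correlation hints that the independence assumption is violated.
print("correlation a vs b:", round(pearson(feature_a, feature_b), 2))
```

Zero correlation does not imply independence, so a check of this kind can only flag violations of the assumption; it cannot confirm that the assumption holds.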
Citation
Kirke, L. (2015). Data mining for heart failure : an investigation into the challenges in real life clinical datasets. (Thesis). University of Hull. Retrieved from https://hull-repository.worktribe.com/output/4218159
Field | Value |
---|---|
Thesis Type | Thesis |
Deposit Date | Jun 9, 2016 |
Publicly Available Date | Feb 23, 2023 |
Keywords | Computer science |
Public URL | https://hull-repository.worktribe.com/output/4218159 |
Additional Information | Department of Computer Science, The University of Hull |
Award Date | Jun 1, 2015 |
Copyright Statement
© 2015 Kirke, Lisa. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder.