Missing Value Imputation Using Stratified Supervised Learning for Cardiovascular Data

Davis, Darryl; Rahman, MM


Legacy (and current) medical datasets are a rich source of information and knowledge. However, the use of most legacy medical datasets is beset with problems. One of the most common is missing data, often due to oversights in data capture or data entry procedures. Algorithms commonly used in data analysis often depend on a complete dataset. Missing value imputation offers a solution to this problem. Imputation may result in the generation of synthetic data, with artificially induced missing values, whereas simply removing the incomplete data records often produces the best classifier results. With legacy data, though, simply removing the records from the original datasets can significantly reduce the data volume and often affects the class balance of the dataset. A suitable method for missing value imputation is therefore needed to produce good-quality datasets for better analysis of data resulting from clinical trials. This paper proposes a framework for missing value imputation using stratified machine learning methods. We explore machine learning techniques to predict missing values for incomplete clinical (cardiovascular) data, with experiments comparing this approach with other standard methods. Two machine learning (classifier) algorithms, the fuzzy unordered rule induction algorithm and the decision tree, plus other machine learning algorithms (for comparison purposes), are trained on complete data and subsequently used to predict missing values for incomplete data. The complete datasets are classified using decision tree, neural network, k-NN, and k-means clustering. Classification performance is evaluated using sensitivity, specificity, accuracy, positive predictive value, and negative predictive value. The results show that final classifier performance can be significantly improved for all class labels when stratification is used with the fuzzy unordered rule induction algorithm to predict missing attribute values.
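The general idea of stratified imputation and the five evaluation measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the framework's details, so the sketch assumes that stratification means grouping records by class label, and it swaps in a simple 1-nearest-neighbour predictor in place of the fuzzy unordered rule induction algorithm or decision tree. The function names `impute_stratified` and `binary_metrics`, and the dictionary-based record layout, are hypothetical.

```python
from collections import defaultdict


def impute_stratified(records, target, label):
    """Fill missing `target` values (None) using only same-class records.

    Assumptions (not from the paper): each record is a dict, all other
    attributes are numeric and present, and a 1-nearest-neighbour lookup
    stands in for the trained classifier.
    """
    # Group records by class label so each stratum is handled separately.
    strata = defaultdict(list)
    for rec in records:
        strata[rec[label]].append(rec)

    imputed = []
    for rec in records:
        if rec[target] is not None:
            imputed.append(dict(rec))  # already complete: copy unchanged
            continue
        # "Train" only on complete records from the same stratum.
        complete = [r for r in strata[rec[label]] if r[target] is not None]
        if not complete:
            imputed.append(dict(rec))  # nothing to learn from: leave missing
            continue
        features = [k for k in rec if k not in (target, label)]
        # Stand-in learner: nearest complete record by squared distance.
        nearest = min(
            complete,
            key=lambda r: sum((r[f] - rec[f]) ** 2 for f in features),
        )
        imputed.append({**rec, target: nearest[target]})
    return imputed


def binary_metrics(tp, fp, tn, fn):
    """Five measures from confusion-matrix counts for one class label.

    Raises ZeroDivisionError if a denominator is zero (for example, no
    positive predictions); callers should guard against that case.
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```

Under these assumptions, a record with a missing value is filled only from complete records that share its class label, even when a record from another class is closer in attribute space. In a multi-class setting, `binary_metrics` would be applied once per class label in a one-versus-rest fashion.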


Davis, D., & Rahman, M. (2016). Missing Value Imputation Using Stratified Supervised Learning for Cardiovascular Data. Journal of Informatics and Data Mining, 1(2), Article 13.

Journal Article Type: Article
Acceptance Date: Jun 20, 2016
Publication Date: Jun 27, 2016
Deposit Date: Feb 16, 2017
Publicly Available Date: Apr 27, 2018
Journal: Journal of Informatics and Data Mining
Peer Reviewed: Peer Reviewed
Volume: 1
Issue: 2
Article Number: 13
Keywords: Data Mining, Missing Values, Machine Learning