Experimental analysis of methods for imputation of missing values in databases
A PHP Error was encountered
Message: Undefined index: id
Line Number: 218
A very important issue faced by researchers and practitioners who use industrial and research databases is incompleteness of data, usually in terms of missing or erroneous values. While some of data analysis algorithms can work with incomplete data, a large portion of them require complete data. Therefore, different strategies, such as deletion of incomplete examples, and imputation (filling) of missing values through variety of statistical and machine learning (ML) procedures, are developed to preprocess the incomplete data. This study concentrates on performing experimental analysis of several algorithms for imputation of missing values, which range from simple statistical algorithms like mean and hot deck imputation to imputation algorithms that work based on application of inductive ML algorithms. Three major families of ML algorithms, such as probabilistic algorithms (e.g. Naïve Bayes), decision tree algorithms (e.g. C4.5), and decision rule algorithms (e.g. CLIP4), are used to implement the ML based imputation algorithms. The analysis is carried out using a comprehensive range of databases, for which missing values were introduced randomly. The goal of this paper is to provide general guidelines on selection of suitable data imputation algorithms based on characteristics of the data. The guidelines are developed by performing a comprehensive experimental comparison of performance of different data imputation algorithms.