VIM introduces tools for the visualization of missing and imputed values. Furthermore, methods to impute missing values are featured. This vignette gives a brief look at a common imputation scenario and shows how VIM can be used both to impute the data and to interpret the results visually.
library(VIM)
data(sleep)
a <- aggr(sleep, plot = FALSE)          # compute the missingness summary without plotting
plot(a, numbers = TRUE, prop = FALSE)   # show absolute counts instead of proportions
The left plot shows the number of missings for each column in the
dataset sleep and the right plot shows how often each
combination of missings occurs. For example, there are 9 rows which
contain a missing in both NonD and Dream.
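The counts behind both panels can also be checked numerically. The following is a minimal sketch in base R rather than a VIM feature; the variable name pattern is chosen here for illustration only.

```r
library(VIM)                   # provides the sleep dataset used in this vignette
data(sleep, package = "VIM")   # explicit package, since base R also ships a dataset named sleep

# Number of missing values per column (the information in the left plot)
colSums(is.na(sleep))

# Encode each row's missingness as a string of 0/1 flags, one flag per column,
# then count how often each combination occurs (the information in the right plot)
pattern <- apply(is.na(sleep), 1, function(r) paste(as.integer(r), collapse = ""))
sort(table(pattern), decreasing = TRUE)
```

Each string in the table has one character per column of sleep, in column order, with 1 marking a missing value.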
For simplicity, we will only look at the variables Dream and
Sleep for the remainder of this vignette. Bivariate
datasets can be passed to special functions that visualize the structure
of missings, such as marginplot().
x <- sleep[, c("Dream", "Sleep")]
marginplot(x)
The red boxplot on the left
shows the distribution of all values of Sleep where
Dream contains a missing value. The
blue boxplot on the left
shows the distribution of the values of Sleep where
Dream is observed.
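The two groups compared by these boxplots can be reproduced by hand. This is a rough sketch in base R, assuming x is the two-column subset built above; dream_missing is a helper name introduced here, not part of VIM.

```r
library(VIM)
data(sleep, package = "VIM")
x <- sleep[, c("Dream", "Sleep")]

# TRUE for rows in which Dream is missing
dream_missing <- is.na(x$Dream)

# Sleep values in rows where Dream is missing (the red boxplot)
summary(x$Sleep[dream_missing])

# Sleep values in rows where Dream is observed (the blue boxplot)
summary(x$Sleep[!dream_missing])
```

A clear difference between the two summaries would suggest that missingness in Dream is related to the value of Sleep.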
In order to impute missing values,
VIM offers a spectrum
of imputation methods like
kNN() (k nearest neighbours),
hotdeck() and so forth. Those functions can be applied to a
data.frame and return another
data.frame in which missings are replaced by imputed values.
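As a minimal sketch, kNN() can be applied to the two-column subset with its default settings. The object name x_imputed is chosen to match the plotting call used later in this vignette; tuning arguments such as the number of neighbours are left at their defaults here.

```r
library(VIM)
data(sleep, package = "VIM")
x <- sleep[, c("Dream", "Sleep")]

# Impute missing values with k nearest neighbours, default settings
x_imputed <- kNN(x)

# Besides Dream and Sleep, the result carries logical indicator columns
# (Dream_imp, Sleep_imp) that flag which values were imputed
head(x_imputed)
```

The indicator columns are what allow the plotting functions to tell imputed values apart from observed ones.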
To learn more about all implemented imputation methods, three vignettes are available, among them:
vignette("donorImp") explains the donor-based imputation methods
vignette("modelImp") gives insight into the model-based imputation methods
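To see which vignettes the installed version of VIM actually ships, the base R function vignette() can list them. This is a small sketch; the variable name available is illustrative, and the set of vignettes may differ between package versions.

```r
# List all vignettes provided by the installed VIM package
available <- vignette(package = "VIM")
available

# Open one of them by name, for example:
# vignette("donorImp", package = "VIM")
```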
The same functions that visualize missing values can also visualize the imputed dataset.
marginplot(x_imputed, delimiter = "_imp")
In this plot, three different colors are used in the top-right. These colors represent the structure of missings:
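one color marks the points where Dream was missing initially
a second color marks the points where Sleep was missing initially
the third color marks the points where both Dream and Sleep were missing initially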
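The sizes of these groups can be read off the indicator columns that kNN() appends. The following is a sketch under the assumption that the imputed data were produced with kNN() and its default _imp suffix, as the delimiter argument above suggests.

```r
library(VIM)
data(sleep, package = "VIM")
x <- sleep[, c("Dream", "Sleep")]
x_imputed <- kNN(x)

# Cross-tabulate the indicator columns: each cell counts the rows that are
# observed in both variables, imputed in one of them, or imputed in both
table(Dream_imputed = x_imputed$Dream_imp,
      Sleep_imputed = x_imputed$Sleep_imp)
```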
The kNN() method seemingly preserves the correlation between Dream and Sleep.
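One rough way to check this impression is to compare the correlation computed on the fully observed rows with the correlation after imputation. This is an informal sketch, not a formal test; the names cor_observed and cor_imputed are introduced here for illustration.

```r
library(VIM)
data(sleep, package = "VIM")
x <- sleep[, c("Dream", "Sleep")]
x_imputed <- kNN(x)

# Correlation using only rows in which both variables were observed
cor_observed <- cor(x$Dream, x$Sleep, use = "complete.obs")

# Correlation after imputation (observed and imputed values together)
cor_imputed <- cor(x_imputed$Dream, x_imputed$Sleep, use = "complete.obs")

c(observed = cor_observed, imputed = cor_imputed)
```

Similar values are consistent with the visual impression, although a single coefficient cannot show whether the shape of the relationship is preserved.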