LDATree is an R modeling package for fitting classification trees. If you are unfamiliar with classification trees, here is a tutorial about the traditional CART and its R implementation, `rpart`.
Compared with other classification tree packages, LDATree stands out in the following ways:
- It applies the idea of LDA (Linear Discriminant Analysis) when selecting variables, finding splits, and fitting models in terminal nodes.
- It addresses several limitations of the standard R implementation of LDA (`MASS::lda`): missing values, more features than samples, and variables that are constant within groups.
- By re-implementing LDA with the Generalized Singular Value Decomposition (GSVD), LDATree runs fast, particularly on large datasets.
- The package also includes several visualization tools that provide deeper insight into the data.
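To make one of these limitations concrete, the snippet below uses plain MASS, not LDATree. It adds a column to iris that takes the same value in every class. `MASS::lda` is expected to stop with an error instead of fitting a model. The column name `constant` is arbitrary.

```r
library(MASS)

# Add a column with the same value in every class, so its
# within-group variance is exactly zero
dat <- iris
dat$constant <- 1

# MASS::lda is expected to stop here instead of returning a model
msg <- tryCatch({
  lda(Species ~ ., data = dat)
  "fitted without error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The captured message should mention that a variable "appears to be constant within groups".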
Currently, LDATree offers two methods to construct a tree:
- The first uses a direct-stopping rule, halting growth as soon as specific conditions are met.
- The second uses pruning: it first grows a larger tree and then prunes it back using cross-validation.
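The same two strategies exist in `rpart`, the traditional CART implementation in R, so they can be illustrated without assuming anything about LDATree's own arguments. The sketch below uses rpart only, not LDATree. The settings (`maxdepth = 2`, `cp = 0`, 10-fold cross-validation) are arbitrary choices for illustration. LDATree's way of choosing between the strategies is documented in `help(Treee)`.

```r
library(rpart)

set.seed(1)  # rpart's built-in cross-validation (xval) uses random folds

# 1. Direct stopping: growth halts once a control condition is hit
#    (here: nodes at most 2 levels below the root, >= 20 observations to try a split)
stopped <- rpart(Species ~ ., data = iris, method = "class",
                 control = rpart.control(maxdepth = 2, minsplit = 20))

# 2. Grow then prune: grow a large tree (cp = 0), then cut it back to the
#    complexity parameter with the smallest cross-validated error
big <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2, xval = 10))
cp_tab <- big$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]
pruned <- prune(big, cp = best_cp)

print(stopped)
print(pruned)
```

Direct stopping is cheaper because it never builds the large tree. Pruning lets cross-validation decide the final size instead of a fixed rule.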
LDATree offers two plotting methods:
- Call `plot()` on the fitted tree directly to view the full tree diagram.
- To view the individual plot for a node you are interested in, also pass the (training) data and specify the node index.
```r
library(LDATree)
# `fit` is assumed to be a tree already fitted with Treee(); see help(Treee)

# Three types of individual plots
# 1. Scatter plot on first two LD scores
plot(fit, data = iris, node = 1)

# Prediction only
predictions <- predict(fit, iris)
head(predictions)
#> [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
```
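To get a feel for what a scatter plot on the first two LD scores shows, the sketch below builds a similar plot with `MASS::lda` on the full iris data. Assuming node 1 is the root, that is the data the root node sees. This is not LDATree's plotting code, and LDATree's own plot (based on its GSVD-based LDA) will differ in details and styling.

```r
library(MASS)

# Fit LDA on all training observations
root_lda <- lda(Species ~ ., data = iris)

# Project the data onto the discriminant directions: a 150 x 2 matrix (LD1, LD2)
scores <- predict(root_lda, iris)$x

# Colour each flower by its true species
plot(scores[, "LD1"], scores[, "LD2"], col = as.integer(iris$Species), pch = 19,
     xlab = "LD1", ylab = "LD2")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```

With three classes there are at most two discriminant directions, so these two scores carry all of the between-class separation that LDA finds.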
You do not need to specify anything for missing values (unless you want to); LDATree handles them automatically. By default, it fills in missing values of numerical variables with the variable mean and adds a missing-value flag. For factor variables, it assigns missing values to a new level. For more options, please refer to `help(Treee)`.
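The default missing-value treatment described above can be sketched in base R. `impute_with_flags()` is a hypothetical helper, not part of LDATree. The flag-column suffix `_missing` and the level label `"(missing)"` are assumptions, and LDATree's internal names may differ.

```r
# Hypothetical helper (not part of LDATree) mimicking the described defaults:
#   numeric columns -> mean imputation plus a 0/1 missing-flag column
#   factor columns  -> missing entries moved to a new "(missing)" level
# Columns of other types (e.g. character) are left untouched.
impute_with_flags <- function(df) {
  out <- df
  for (col in names(df)) {
    x <- df[[col]]
    miss <- is.na(x)
    if (!any(miss)) next
    if (is.numeric(x)) {
      x[miss] <- mean(x, na.rm = TRUE)
      out[[col]] <- x
      out[[paste0(col, "_missing")]] <- as.integer(miss)  # assumed flag naming
    } else if (is.factor(x)) {
      levels(x) <- c(levels(x), "(missing)")              # assumed level label
      x[miss] <- "(missing)"
      out[[col]] <- x
    }
  }
  out
}

df <- data.frame(a = c(1, NA, 3), b = factor(c("x", NA, "y")))
imputed <- impute_with_flags(df)
print(imputed)
```

Here column `a` becomes 1, 2, 3 (2 being the mean of the observed values) with `a_missing` equal to 0, 1, 0. The second entry of `b` moves to the new `"(missing)"` level. The flag lets a later split still separate imputed rows from observed ones.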
Because we re-implemented LDA/GSVD and use it in model fitting, the package also provides a by-product: the `ldaGSVD` function. Feel free to try it and see how it compares with `MASS::lda`.
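The idea behind an LDA/GSVD fit can be illustrated with a base-R sketch. The hypothetical helpers `lda_gsvd_sketch()` and `predict_lda_gsvd_sketch()` below are not LDATree code. They follow the LDA/GSVD construction of Howland and Park: stack the between-class and within-class factors, take one SVD of the stack and a second SVD of its top block, and never invert the within-class scatter matrix. Two further choices are assumptions, made only so the result lines up with `MASS::lda` on iris:

- the kept directions are rescaled so that projected within-class scatter is the identity;
- classification uses the nearest projected centroid with equal priors (iris has 50 flowers per class).

Missing values are not handled.

```r
library(MASS)  # used only for the comparison with MASS::lda

# Hypothetical helper: minimal LDA/GSVD sketch, NOT LDATree's ldaGSVD()
lda_gsvd_sketch <- function(X, y) {
  X <- as.matrix(X)
  y <- factor(y)
  k <- nlevels(y)
  mu <- colMeans(X)
  centroids <- t(sapply(levels(y), function(l) colMeans(X[y == l, , drop = FALSE])))
  n_k <- as.vector(table(y))
  Hb <- t(sqrt(n_k) * sweep(centroids, 2, mu))           # p x k between-class factor
  Hw <- t(X - centroids[as.integer(y), , drop = FALSE])   # p x n within-class factor
  # Step 1: SVD of the stacked matrix K = [Hb^T; Hw^T]
  K <- rbind(t(Hb), t(Hw))
  sv <- svd(K)
  r <- sum(sv$d > max(dim(K)) * max(sv$d) * .Machine$double.eps)  # numerical rank
  # Step 2: SVD of the top k rows of the left singular vectors
  sa <- svd(sv$u[seq_len(k), seq_len(r), drop = FALSE])
  q <- min(k - 1, length(sa$d))
  G <- sv$v[, seq_len(r), drop = FALSE] %*% diag(1 / sv$d[seq_len(r)], r) %*%
    sa$v[, seq_len(q), drop = FALSE]
  # Step 3 (assumption): rescale so projected within-class scatter is the identity
  beta <- sqrt(pmax(1 - sa$d[seq_len(q)]^2, 0))
  if (any(beta < 1e-8)) stop("within-class scatter vanishes on a kept direction")
  G <- G %*% diag(1 / beta, q)
  list(G = G, centroids = centroids %*% G, levels = levels(y))
}

# Hypothetical helper: nearest projected centroid (equal priors assumed)
predict_lda_gsvd_sketch <- function(fit, X) {
  Z <- as.matrix(X) %*% fit$G
  d2 <- vapply(seq_len(nrow(fit$centroids)),
               function(j) rowSums(sweep(Z, 2, fit$centroids[j, ])^2),
               numeric(nrow(Z)))
  d2 <- matrix(d2, nrow = nrow(Z))  # keep an n x k matrix even when n == 1
  factor(fit$levels[max.col(-d2, ties.method = "first")], levels = fit$levels)
}

fit_gsvd <- lda_gsvd_sketch(iris[, 1:4], iris$Species)
pred_gsvd <- predict_lda_gsvd_sketch(fit_gsvd, iris[, 1:4])
pred_mass <- predict(lda(Species ~ ., data = iris), iris)$class
agreement <- mean(pred_gsvd == pred_mass)
print(agreement)
```

Both approaches end up in the same discriminant subspace, whitened up to a constant factor. The predicted classes on iris should therefore agree almost entirely. Because the stacked SVD never forms the inverse of the within-class scatter matrix, steps 1 and 2 still run when there are more features than samples. The rescaling in step 3 is the part of this sketch that would need care in that setting.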