The plotmo package: Plotting model surfaces

After building a regression or classification model, it’s often useful to plot the model response as the predictors vary. These model surface plots are helpful for visualizing “black box” models.

The plotmo package makes it easy to generate model surfaces for a wide variety of R models, including rpart, gbm, earth, and many others.

An example model surface

Let’s generate a randomForest model from the well-known ozone dataset. (We use a random forest for this example, but any model could be used.)

```r
library(earth)   # for the ozone1 data
data(ozone1)
oz <- ozone1[, c("O3", "humidity", "temp")]  # simple dataset for illustration
library(randomForest)
mod <- randomForest(O3 ~ ., data = oz)
```

We now have a model, but what does it tell us about the relationship between ozone pollution (O3) and humidity and temperature? We can visualize this relationship with `plotmo`:

```r
library(plotmo)
plotmo(mod)
```

From the plots, we see that ozone increases with humidity and temperature, although humidity doesn’t have much effect at low temperatures.

Some details

The top two plots in the above figure are generated by plotting the predicted response as a variable changes. Variables that don’t appear in a plot are held fixed at their median values. Plotmo automatically creates a separate plot for each variable in the model.
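The idea behind these single-variable plots can be sketched in a few lines of base R. This is not plotmo's implementation: the helper `degree1_curve` is hypothetical, and to keep the example self-contained it uses an `lm` model on R's built-in `airquality` data in place of the random forest on `ozone1`.

```r
# Stand-in model: base R lm() on the built-in airquality data
# (not the randomForest model on ozone1 used in the text).
aq  <- na.omit(airquality[, c("Ozone", "Wind", "Temp")])
fit <- lm(Ozone ~ Wind * Temp, data = aq)

# Hypothetical helper, not part of plotmo: predicted response as `var`
# sweeps its observed range, with every other predictor at its median.
degree1_curve <- function(model, data, var, ngrid = 20) {
  predictors <- all.vars(formula(model))[-1]  # drop the response name
  # ngrid rows, each holding the median of every predictor
  newdata <- as.data.frame(lapply(data[predictors],
                                  function(x) rep(median(x), ngrid)))
  # overwrite the plotted variable with an evenly spaced grid
  newdata[[var]] <- seq(min(data[[var]]), max(data[[var]]),
                        length.out = ngrid)
  data.frame(x = newdata[[var]], yhat = unname(predict(model, newdata)))
}

curve_temp <- degree1_curve(fit, aq, "Temp")
plot(curve_temp$x, curve_temp$yhat, type = "l",
     xlab = "Temp", ylab = "predicted Ozone")
```

Because this stand-in model is linear in `Temp` once `Wind` is fixed, the resulting curve is a straight line; a random forest would typically give a step-like curve.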

The lower interaction plot shows the predicted response as two variables are changed (again with any other variables held at their median values). Plotmo draws just one interaction plot for this model, since there are only two variables.
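An interaction plot of this kind can be sketched the same way, by predicting over a grid of two variables. The helper `degree2_surface` is hypothetical, not plotmo's code, and the model is an `lm` stand-in on the built-in `airquality` data. A third predictor, `Solar.R`, is included so that there is a background variable to hold at its median; base R's `persp` draws a perspective plot roughly like plotmo's.

```r
# Stand-in model on the built-in airquality data (not the model in the text).
aq  <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit <- lm(Ozone ~ Wind * Temp + Solar.R, data = aq)

# Hypothetical helper, not part of plotmo: predictions over a grid of two
# variables, with all remaining predictors held at their medians.
degree2_surface <- function(model, data, var1, var2, ngrid = 20) {
  predictors <- all.vars(formula(model))[-1]  # drop the response name
  g1 <- seq(min(data[[var1]]), max(data[[var1]]), length.out = ngrid)
  g2 <- seq(min(data[[var2]]), max(data[[var2]]), length.out = ngrid)
  grid <- expand.grid(g1, g2)                 # g1 varies fastest
  names(grid) <- c(var1, var2)
  for (v in setdiff(predictors, c(var1, var2)))
    grid[[v]] <- median(data[[v]])            # background variables
  # column-major fill, so z[i, j] is the prediction at (g1[i], g2[j])
  z <- matrix(predict(model, newdata = grid), nrow = ngrid)
  list(x = g1, y = g2, z = z)
}

surf <- degree2_surface(fit, aq, "Wind", "Temp")
persp(surf$x, surf$y, surf$z, theta = 30, phi = 25,
      xlab = "Wind", ylab = "Temp", zlab = "predicted Ozone")
```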

Partial dependence plots

We can generate partial dependence plots by specifying `pmethod="partdep"` when invoking `plotmo`. In partial dependence plots, the predictions are averaged over the observed values of the background variables, instead of simply holding the background variables at their medians. Partial dependence plots can be very slow, but they incorporate more information about the distribution of the data.
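The averaging step, and the reason it is slow, can be seen in a hand-rolled version. This is a minimal sketch under stated assumptions: the helper `partdep_curve` is hypothetical, and an `lm` model on the built-in `airquality` data stands in for the random forest. Each grid point requires a prediction for every row of the data, so the cost grows with the grid size times the number of rows.

```r
# Stand-in model on the built-in airquality data (not the model in the text).
aq  <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])
fit <- lm(Ozone ~ Wind * Temp + Solar.R, data = aq)

# Hypothetical helper: partial dependence of the prediction on `var`.
# For each grid value, set `var` to that value in every row of the data,
# predict, and average the predictions over the rows.
partdep_curve <- function(model, data, var, ngrid = 20) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = ngrid)
  yhat <- vapply(grid, function(g) {
    d <- data
    d[[var]] <- g                       # same value in every row
    mean(predict(model, newdata = d))   # average over the background
  }, numeric(1))
  data.frame(x = grid, yhat = yhat)
}

pd_temp <- partdep_curve(fit, aq, "Temp")
# With plotmo installed, the corresponding plots would come from
#   plotmo(fit, pmethod = "partdep")
```

For a model that is linear in the background variables, as here, the average equals the prediction with the background variables at their means; for nonlinear models such as random forests the two generally differ, which is where partial dependence adds information.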

Plotting model residuals

The `plotres` function is also included in the `plotmo` package. This function shows residuals and other useful information about the model, if available. Using the above model as an example:

```r
plotres(mod)
```

which draws a page of residual plots.

Note the “<” shape in the residuals plot in the lower left. This suggests that we should transform the response before building the model, perhaps by taking the square root or cube root. Cases 53, 237, and 258 have the largest residuals and perhaps should be investigated. This kind of information is not obvious without plotting the residuals.
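The two follow-up steps suggested by the residual plot, listing the cases with the largest residuals and refitting on a cube-root scale, might look like this in base R. It is a sketch only: it uses an `lm` model on the built-in `airquality` data rather than the random forest on `ozone1`, so it does not reproduce the figure or the case numbers discussed above.

```r
# Stand-in data and models (not the ozone1 random forest from the text).
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

fit_raw  <- lm(Ozone ~ Solar.R + Wind + Temp, data = aq)
# Refit with a cube-root transformed response, one of the options above
fit_cube <- lm(I(Ozone^(1/3)) ~ Solar.R + Wind + Temp, data = aq)

# Row names of the three cases with the largest absolute residuals
top3 <- names(sort(abs(residuals(fit_raw)), decreasing = TRUE))[1:3]
print(top3)

# Residuals versus fitted values, before and after the transform
op <- par(mfrow = c(1, 2))
plot(fitted(fit_raw),  residuals(fit_raw),  main = "Ozone")
plot(fitted(fit_cube), residuals(fit_cube), main = "Ozone^(1/3)")
par(op)
```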

Miscellaneous

More details and examples can be found in the package vignettes.

The package also provides a few utility functions such as `plot_glmnet` and `plot_gbm`. These functions enhance similar functions in the glmnet and gbm packages.

Which models work with plotmo?

Any model that conforms to standard S3 model guidelines will work with `plotmo`. Plotmo knows how to deal with logistic, classification, and multiple response models. It knows how to handle different `type` arguments to `predict` functions.

Package authors may want to look at Guidelines for S3 Regression Models. If `plotmo` or `plotres` doesn’t work with your model, contact the `plotmo` package maintainer. Often a minor tweak to the model code is all that is needed.
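As a rough illustration of what following the usual S3 conventions can look like, here is a toy model class. The class `toylm` is hypothetical. It stores its `call` and `terms`, provides a `predict` method with a `newdata` argument, and keeps `fitted.values` and `residuals` components that R's default `fitted` and `residuals` methods pick up. Whether these conventions are sufficient for `plotmo` and `plotres` in every case is an assumption; the guidelines document mentioned above is the authority.

```r
# Hypothetical toy model class following common S3 conventions.
# Numeric predictors only (factors would also need stored xlevels).
toylm <- function(formula, data) {
  cl <- match.call()                  # record the call, as lm() does
  mf <- model.frame(formula, data = data)
  tt <- terms(mf)
  X  <- model.matrix(tt, mf)
  y  <- model.response(mf)
  fit <- lm.fit(X, y)                 # ordinary least squares
  structure(list(call          = cl,
                 terms         = tt,
                 coefficients  = fit$coefficients,
                 fitted.values = drop(X %*% fit$coefficients),
                 residuals     = fit$residuals),
            class = "toylm")
}

# predict() method: with no newdata, return the fitted values
predict.toylm <- function(object, newdata = NULL, ...) {
  if (is.null(newdata)) return(object$fitted.values)
  tt <- delete.response(object$terms)  # newdata need not contain the response
  X  <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% object$coefficients)
}

aq <- na.omit(airquality[, c("Ozone", "Wind", "Temp")])
m  <- toylm(Ozone ~ Wind + Temp, data = aq)
predict(m, newdata = aq[1:3, ])
# With plotmo installed, one would then try: plotmo(m); plotres(m)
```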

Stephen Milborrow