# Exploring interactions with continuous predictors in regression models

#### 2021-07-02

Understanding an interaction effect in a linear regression model is usually difficult when using just the basic output tables and looking at the coefficients. The `interactions` package provides several functions that can help analysts probe more deeply.

**Categorical by categorical interactions:** All the tools described here require at least one variable to be continuous. A separate vignette describes `cat_plot`, which handles the plotting of interactions in which all the focal predictors are categorical variables.

First, we use example data from `state.x77` that is built into R. Let’s look at the interaction model output with `summ` as a starting point.

```r
library(jtools) # for summ()
states <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder + `HS Grad`, data = states)
summ(fiti)
```
Observations: 50; Dependent variable: Income; Type: OLS linear regression

F(4,45) = 10.65; R² = 0.49; Adj. R² = 0.44

|                   |    Est. |   S.E. | t val. |    p |
|:------------------|--------:|-------:|-------:|-----:|
| (Intercept)       | 1414.46 | 737.84 |   1.92 | 0.06 |
| Illiteracy        |  753.07 | 385.90 |   1.95 | 0.06 |
| Murder            |  130.60 |  44.67 |   2.92 | 0.01 |
| `HS Grad`         |   40.76 |  10.92 |   3.73 | 0.00 |
| Illiteracy:Murder |  -97.04 |  35.86 |  -2.71 | 0.01 |

Standard errors: OLS

Our interaction term is significant, suggesting some more probing is warranted to see what’s going on. It’s worth recalling that you shouldn’t focus too much on the main effects of terms included in the interaction since they are conditional on the other variable(s) in the interaction being held constant at 0.

# Plotting interactions

Plotting is a versatile and often the most interpretable way to understand interaction effects. `interactions` provides `interact_plot` as a relatively pain-free method to get good-looking plots of interactions using `ggplot2` on the backend.

```r
interact_plot(fiti, pred = Illiteracy, modx = Murder)
```

Keep in mind that the default behavior of `interact_plot` is to mean-center all continuous variables not involved in the interaction so that the predicted values are more easily interpreted. You can disable that by adding `centered = "none"`. You can choose specific variables by providing their names in a vector to the `centered` argument.
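As a quick sketch of those two options (the model is refit here so the snippet stands alone), you can disable centering entirely or center only a chosen covariate:

```r
library(jtools)        # for summ()
library(interactions)  # for interact_plot()

states <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder + `HS Grad`, data = states)

# Disable mean-centering of all non-focal variables
interact_plot(fiti, pred = Illiteracy, modx = Murder, centered = "none")

# Center only the `HS Grad` covariate
interact_plot(fiti, pred = Illiteracy, modx = Murder, centered = "HS Grad")
```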

By default, with a continuous moderator you get three lines — 1 standard deviation above and below the mean and the mean itself. If you specify `modx.values = "plus-minus"`, the mean of the moderator is not plotted, just the two +/- SD lines. You may also choose `"terciles"` to split the data into three equal-sized groups — representing the upper, middle, and lower thirds of the distribution of the moderator — and get the line that represents the median of the moderator within each of those groups.
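For example, reusing the model from the start of the vignette (refit here so the snippet is self-contained), the two alternative settings look like this:

```r
library(interactions)

states <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder + `HS Grad`, data = states)

# Only the +/- 1 SD lines; the mean line is dropped
interact_plot(fiti, pred = Illiteracy, modx = Murder,
              modx.values = "plus-minus")

# One line per tercile group, at the median of the moderator in each group
interact_plot(fiti, pred = Illiteracy, modx = Murder,
              modx.values = "terciles")
```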

If your moderator is a factor, each level will be plotted and you should leave `modx.values = NULL`, the default.

```r
fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
interact_plot(fitiris, pred = Petal.Width, modx = Species)
```

If you want, you can specify a subset of a factor’s levels via the `modx.values` argument:

```r
interact_plot(fitiris, pred = Petal.Width, modx = Species,
              modx.values = c("versicolor", "virginica"))
```

## Plotting observed data

If you want to see the individual data points plotted to better understand how the fitted lines relate to the observed data, you can use the `plot.points = TRUE` argument.

```r
interact_plot(fiti, pred = Illiteracy, modx = Murder, plot.points = TRUE)
```

For continuous moderators, as you can see, the observed data points are shaded depending on the level of the moderator variable.

It can be very enlightening, too, for categorical moderators.

```r
interact_plot(fitiris, pred = Petal.Width, modx = Species,
              plot.points = TRUE)
```

When many points overlap, as they do here, it can be useful to apply a random “jitter” that nudges them apart. Use the `jitter` argument to do this. If you provide a single number, a jitter of proportional magnitude is applied both horizontally and vertically. If you provide a vector of length 2, the first value sets the left/right (horizontal) jitter and the second the up/down (vertical) jitter.
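A sketch of the two-element form, using the `fitiris` model (refit here so the snippet runs on its own); this nudges points left/right only, leaving the vertical positions untouched:

```r
library(interactions)

fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)

# Horizontal jitter of 0.1, no vertical jitter
interact_plot(fitiris, pred = Petal.Width, modx = Species,
              plot.points = TRUE, jitter = c(0.1, 0))
```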

If you would like to better differentiate the points with factor moderators, you can use `point.shape = TRUE` to give a different shape to each point. This can be especially helpful for black and white publications.

```r
interact_plot(fitiris, pred = Petal.Width, modx = Species,
              plot.points = TRUE, jitter = 0.1, point.shape = TRUE)
```

If your original data are weighted, then the points will be sized based on the weight. For the purposes of our example, we’ll weight the same model we’ve been using with the population of each state.

```r
fiti <- lm(Income ~ Illiteracy * Murder, data = states,
           weights = Population)
interact_plot(fiti, pred = Illiteracy, modx = Murder, plot.points = TRUE)
```

For those working with weighted data, it can be hard to use a scatterplot to explore the data unless there is some way to account for the weights. Using size is a nice middle ground.

## Plotting partial residuals

In more complex regressions, plotting the observed data can sometimes be relatively uninformative because the points seem to be all over the place.

For an example, let’s take a look at this model. I am using the `mpg` dataset from `ggplot2` and predicting the city miles per gallon (`cty`) based on several variables, including model year, type of car, fuel type, drive type, and an interaction between engine displacement (`displ`) and number of cylinders in the engine (`cyl`). Let’s take a look at the regression output:

```r
library(ggplot2)
data(mpg)
fitc <- lm(cty ~ year + cyl * displ + class + fl + drv, data = mpg)
summ(fitc)
```
Observations: 234; Dependent variable: cty; Type: OLS linear regression

F(16,217) = 99.73; R² = 0.88; Adj. R² = 0.87

|                 |    Est. |  S.E. | t val. |    p |
|:----------------|--------:|------:|-------:|-----:|
| (Intercept)     | -200.98 | 47.01 |  -4.28 | 0.00 |
| year            |    0.12 |  0.02 |   5.03 | 0.00 |
| cyl             |   -1.86 |  0.28 |  -6.69 | 0.00 |
| displ           |   -3.56 |  0.66 |  -5.41 | 0.00 |
| classcompact    |   -2.60 |  0.93 |  -2.80 | 0.01 |
| classmidsize    |   -2.63 |  0.93 |  -2.82 | 0.01 |
| classminivan    |   -4.41 |  1.04 |  -4.24 | 0.00 |
| classpickup     |   -4.37 |  0.93 |  -4.68 | 0.00 |
| classsubcompact |   -2.38 |  0.93 |  -2.56 | 0.01 |
| classsuv        |   -4.27 |  0.87 |  -4.92 | 0.00 |
| fld             |    6.34 |  1.69 |   3.74 | 0.00 |
| fle             |   -4.57 |  1.66 |  -2.75 | 0.01 |
| flp             |   -1.92 |  1.59 |  -1.21 | 0.23 |
| flr             |   -0.79 |  1.57 |  -0.50 | 0.61 |
| drvf            |    1.40 |  0.40 |   3.52 | 0.00 |
| drvr            |    0.49 |  0.46 |   1.06 | 0.29 |
| cyl:displ       |    0.36 |  0.08 |   4.56 | 0.00 |

Standard errors: OLS

Okay, we are explaining a lot of variance here but there is quite a bit going on. Let’s plot the interaction with the observed data.

```r
interact_plot(fitc, pred = displ, modx = cyl, plot.points = TRUE,
              modx.values = c(4, 5, 6, 8))
```

Hmm, doesn’t that look…bad? You can kind of see the pattern of the interaction, but the predicted lines don’t seem to match the data very well. But the model includes several variables besides `cyl` and `displ`, and they may be accounting for some of this variation. This is what partial residual plots are designed to help with. You can learn more about the technique and theory in Fox and Weisberg (2018); Fox’s `effects` package is another place to generate partial residual plots.

Using the argument `partial.residuals = TRUE`, what is plotted instead is the observed data with the effects of all the control variables accounted for. In other words, the value of `cty` for the observed data is based only on the values of `displ`, `cyl`, and the model error. Let’s take a look.

```r
interact_plot(fitc, pred = displ, modx = cyl, partial.residuals = TRUE,
              modx.values = c(4, 5, 6, 8))
```

Now we can understand how the observed data and the model relate to each other much better. One insight is how the model really underestimates the values at the low end of displacement and cylinders. You can also see how strongly cylinders and displacement seem to be correlated with each other, which makes it difficult to say how much we can learn from this kind of model.

## Confidence bands

Another way to get a sense of the precision of the estimates is by plotting confidence bands. To get started, just set `interval = TRUE`. To control how wide the confidence interval is, express the width as a proportion via `int.width`; e.g., `int.width = 0.8` corresponds to an 80% interval.

```r
interact_plot(fiti, pred = Illiteracy, modx = Murder, interval = TRUE,
              int.width = 0.8)
```

You can also use the `robust` argument to plot confidence intervals based on robust standard error calculations.
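A minimal sketch of that option, refitting the original (unweighted) model so the snippet stands alone; this assumes the `sandwich` package is installed, which `jtools` relies on for robust standard error calculations:

```r
library(interactions)

states <- as.data.frame(state.x77)
fiti <- lm(Income ~ Illiteracy * Murder + `HS Grad`, data = states)

# Confidence bands based on heteroskedasticity-robust standard errors
interact_plot(fiti, pred = Illiteracy, modx = Murder,
              interval = TRUE, robust = TRUE)
```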

## Check linearity assumption

A basic assumption of linear regression is that the relationship between the predictors and response variable is linear. When you have an interaction effect, you add the assumption that the relationship between your predictor and response is linear regardless of the level of the moderator.

To show a striking example of how this can work, we’ll generate two simple datasets to replicate Hainmueller et al. (2017).

The first has a focal predictor \(X\) that is normally distributed with mean 3 and standard deviation 1. It then has a dichotomous moderator \(W\) that is Bernoulli distributed with probability 0.5. We also generate a normally distributed error term with standard deviation 4.

```r
set.seed(99)
x <- rnorm(n = 200, mean = 3, sd = 1)
err <- rnorm(n = 200, mean = 0, sd = 4)
w <- rbinom(n = 200, size = 1, prob = 0.5)

y_1 <- 5 - 4*x - 9*w + 3*w*x + err
```

We fit a linear regression model with an interaction between `x` and `w`.

```r
model_1 <- lm(y_1 ~ x * w)
summ(model_1)
```
Observations: 200; Dependent variable: y_1; Type: OLS linear regression

F(3,196) = 25.93; R² = 0.28; Adj. R² = 0.27

|             |  Est. | S.E. | t val. |    p |
|:------------|------:|-----:|-------:|-----:|
| (Intercept) |  3.19 | 1.29 |   2.47 | 0.01 |
| x           | -3.59 | 0.42 |  -8.61 | 0.00 |
| w           | -7.77 | 1.90 |  -4.10 | 0.00 |
| x:w         |  2.86 | 0.62 |   4.62 | 0.00 |

Standard errors: OLS

In the following plot, we use the `linearity.check = TRUE` argument to split the data by the level of the moderator \(W\) and plot predicted lines (black) and a loess line (red) within each group. The predicted lines come from the full data set. The loess line looks only at the subset of data in each pane and will be curved if the relationship is nonlinear. What we’re looking for is whether the red loess line is straight like the predicted line.

```r
interact_plot(model_1, pred = x, modx = w, linearity.check = TRUE,
              plot.points = TRUE)
```

In this case, it is. That means the assumption is satisfied.

Now we generate similar data, but this time the linearity assumption will be violated. \(X_2\) will now be uniformly distributed across the range of -3 to 3. The true relationship between `y_2` and \(X_2\) (`x_2`) is non-linear.

```r
x_2 <- runif(n = 200, min = -3, max = 3)
y_2 <- 2.5 - x_2^2 - 5*w + 2*w*(x_2^2) + err
data_2 <- as.data.frame(cbind(x_2, y_2, w))

model_2 <- lm(y_2 ~ x_2 * w, data = data_2)
summ(model_2)
```
Observations: 200; Dependent variable: y_2; Type: OLS linear regression

F(3,196) = 3.28; R² = 0.05; Adj. R² = 0.03

|             |  Est. | S.E. | t val. |    p |
|:------------|------:|-----:|-------:|-----:|
| (Intercept) | -0.75 | 0.49 |  -1.51 | 0.13 |
| x_2         |  0.53 | 0.30 |   1.77 | 0.08 |
| w           |  1.63 | 0.72 |   2.28 | 0.02 |
| x_2:w       | -0.25 | 0.42 |  -0.61 | 0.54 |

Standard errors: OLS

The regression output would lead us to believe there is no interaction.

Let’s do the linearity check:

```r
interact_plot(model_2, pred = x_2, modx = w, linearity.check = TRUE,
              plot.points = TRUE)
```

This is a striking example of the assumption being violated. At both levels of \(W\), the relationship between \(X_2\) and the response is non-linear. There really is an interaction, but it’s a non-linear one.

You could try to fit the true model with a polynomial term like this:

```r
model_3 <- lm(y_2 ~ poly(x_2, 2) * w, data = data_2)
summ(model_3)
```
Observations: 200; Dependent variable: y_2; Type: OLS linear regression

F(5,194) = 18.32; R² = 0.32; Adj. R² = 0.30

|                 |   Est. | S.E. | t val. |    p |
|:----------------|-------:|-----:|-------:|-----:|
| (Intercept)     |  -1.01 | 0.42 |  -2.41 | 0.02 |
| poly(x_2, 2)1   |   9.90 | 6.14 |   1.61 | 0.11 |
| poly(x_2, 2)2   | -34.14 | 6.15 |  -5.55 | 0.00 |
| w               |   1.61 | 0.61 |   2.65 | 0.01 |
| poly(x_2, 2)1:w |  -6.33 | 8.60 |  -0.74 | 0.46 |
| poly(x_2, 2)2:w |  75.58 | 8.62 |   8.77 | 0.00 |

Standard errors: OLS

`interact_plot` can plot these kinds of effects, too. Just provide the untransformed predictor’s name (in this case, `x_2`) and also include the data in the `data` argument. If you don’t include the data, the function will try to find the data you used but it will warn you about it and it could cause problems under some circumstances.

```r
interact_plot(model_3, pred = x_2, modx = w, data = data_2)
```

Look familiar? Let’s run the linearity check, which in this case is more like a non-linearity check:

```r
interact_plot(model_3, pred = x_2, modx = w, data = data_2,
              linearity.check = TRUE, plot.points = TRUE)
```

The red loess line almost perfectly overlaps the black predicted line, indicating the assumption is satisfied. As a note of warning, in practice the lines will rarely overlap so neatly and you will have to make more difficult judgment calls.

## Other options

There are a number of other options not mentioned, many relating to the appearance.

For instance, you can manually specify the axis labels, add a main title, choose a color scheme, and so on.

```r
interact_plot(fiti, pred = Illiteracy, modx = Murder,
              x.label = "Custom X Label", y.label = "Custom Y Label",
              main.title = "Sample Plot", legend.main = "Custom Legend Title",
              colors = "seagreen")
```

Because the function uses `ggplot2`, it can be modified and extended like any other `ggplot2` object. For example, using the `theme_apa` function from `jtools`:

```r
interact_plot(fitiris, pred = Petal.Width, modx = Species) + theme_apa()
```