# Introduction to the goric package

## Data example

An experiment was conducted to find out whether vinylidene fluoride gives rise to liver damage. The dataset is available on page 10 of Silvapulle and Sen (2005) and in a report prepared by Litton Bionetics Inc in 1984. Since increased levels of serum enzyme are inherent in liver damage, the focus is on whether enzyme levels are affected by vinylidene fluoride. The variable of interest is the serum enzyme level. Three types of enzymes are inspected, namely SDH, SGOT, and SGPT. To study whether vinylidene fluoride has an influence on the three serum enzymes, four dosages of this substance are examined. In each of these four treatment groups, ten male Fischer-344 rats received the substance.

The data is available in the goric package.

library(goric)
data(vinylidene)

We will first only consider the single response SDH.

## Estimating marginal dose means

We assume the linear model $\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \quad \mbox{assuming} \quad \epsilon \sim N(\boldsymbol{0}, \boldsymbol{I}\sigma^2)$

• $$\boldsymbol{y}$$ being a $$N$$-dimensional response vector
• $$\boldsymbol{X}$$ is a $$(N \times p)$$ design matrix
• $$\boldsymbol{\beta}$$ is a $$p$$-dimensional parameter vector
• $$\boldsymbol{\epsilon}$$ is a $$N$$-dimensional residual vector

Choosing dose level d1 as a control group we can estimate the change in the average response at consecutive dose levels compared to the control by

X <- model.matrix(~ dose, data=vinylidene)[,-1]
m <- lm(SDH ~ X, data=vinylidene)
coefficients(m)
## (Intercept)     Xdosed2     Xdosed3     Xdosed4
##        22.7         0.1         1.0         4.6

## AIC selection of model complexity

Instead of using three slope parameters to represent the dose-response rate, we can search for a simpler representation, reducing model complexity but increasing prediction accuracy.

A set of models can be stated, removing different subsets of design matrix columns.

m0 <- lm(SDH ~ 1, data=vinylidene)
m1 <- lm(SDH ~ X[,-1], data=vinylidene)
m2 <- lm(SDH ~ X[,-2], data=vinylidene)
m3 <- lm(SDH ~ X[,-3], data=vinylidene)
m4 <- lm(SDH ~ X[,-c(1,2)], data=vinylidene)
m5 <- lm(SDH ~ X[,-c(2,3)], data=vinylidene)
m6 <- lm(SDH ~ X[,-c(1,3)], data=vinylidene)

The Akaike Information Criterion can be used to select a ‘best’ model out o the set, which is a compromise of complexity and goodness of fit.

AIC(m, m0,m1,m2,m3,m4,m5,m6)
##    df      AIC
## m   5 218.6877
## m0  2 223.9517
## m1  4 216.6923
## m2  4 217.1481
## m3  4 225.4519
## m4  3 215.2456
## m5  3 224.2814
## m6  3 225.7830

The model with an intercept and only a slope parameter considering the last dose level is selected.

coefficients(m4)
##   (Intercept) X[, -c(1, 2)]
##     23.066667      4.233333

## Fitting linear models under order restriction

The model coefficients are estimated under the linear constraints $\boldsymbol{A} \boldsymbol{\beta} \geq \boldsymbol{\delta}$ where

• $$\boldsymbol{A}$$ denotes a $$(k \times p)$$ dimensional constraint matrix
• $$\boldsymbol{\delta}$$ is a vector with $$k$$ constraint margins

The operator $$\geq$$ can be exchanged element-specific with a strict equality constraint, using $$=$$.