# Models of nucleotide substitution

## Introduction

This document outlines the models of substitution used in the package. The matrices below are substitution-rate matrices for each model. The rates within these matrices are ordered as follows:

$\begin{bmatrix} \cdot & T\rightarrow C & T\rightarrow A & T\rightarrow G \\ C\rightarrow T & \cdot & C\rightarrow A & C\rightarrow G \\ A\rightarrow T & A\rightarrow C & \cdot & A\rightarrow G \\ G\rightarrow T & G\rightarrow C & G\rightarrow A & \cdot \end{bmatrix}$

(For example, $$C \rightarrow T$$ marks the cell that holds the rate of substitution from $$C$$ to $$T$$.) Each diagonal element is set so that its row sums to zero (Yang 2006).
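The row-sum rule can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not code from the package; the function name `fill_diagonal` and the example rate are assumptions made here.

```python
# Sketch: fill the diagonal of a 4x4 rate matrix so each row sums to zero.
# Row/column order follows the layout above: T, C, A, G.
# `fill_diagonal` is a hypothetical helper, not a jackalope function.

def fill_diagonal(q):
    """Return a copy of q whose diagonal makes every row sum to zero.

    Whatever values sit on the input diagonal are ignored.
    """
    n = len(q)
    out = [list(row) for row in q]
    for i in range(n):
        off_diag_sum = sum(out[i][j] for j in range(n) if j != i)
        out[i][i] = -off_diag_sum
    return out


# JC69-style matrix with lambda = 0.25 (an arbitrary illustrative value).
lam = 0.25
jc69 = fill_diagonal([[0.0 if i == j else lam for j in range(4)]
                      for i in range(4)])
```

With three off-diagonal rates of 0.25 in each row, every diagonal element becomes the negative of their sum.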

Below each rate matrix is a list of the parameters that the function for that model requires.

The parameters are defined as follows, in order of first appearance:

• lambda: $$\lambda$$
• alpha: $$\alpha$$
• beta: $$\beta$$
• pi_tcag: vector of $$\pi_T$$, $$\pi_C$$, $$\pi_A$$, then $$\pi_G$$
• alpha_1: $$\alpha_1$$
• alpha_2: $$\alpha_2$$
• kappa: transition / transversion rate ratio
• abcdef: vector of $$a$$, $$b$$, $$c$$, $$d$$, $$e$$, then $$f$$
• Q: matrix of all parameters, where diagonals are ignored

Functions in jackalope that employ each model take the form sub_X for model X (e.g., sub_JC69 for the JC69 model).

Note: In all models, the matrices are scaled such that the overall mutation rate is 1, but this behavior can be changed using the mu parameter for each function.
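As a rough sketch of what such scaling involves, the snippet below rescales a matrix so that its average substitution rate at equilibrium, $$\sum_i \pi_i \sum_{j \neq i} q_{ij}$$, equals a target value. That this is the exact definition of "overall mutation rate" used by the package is an assumption, and the helper names `overall_rate` and `scale_rates` are hypothetical; only the argument name `mu` is taken from the note above.

```python
# Sketch: rescale a rate matrix to a target overall rate `mu`.
# Assumption: "overall rate" means the equilibrium-weighted average rate,
# sum_i pi_i * sum_{j != i} q_ij. Helper names are hypothetical.

def overall_rate(q, pi):
    """Average substitution rate at equilibrium frequencies pi."""
    n = len(q)
    return sum(pi[i] * sum(q[i][j] for j in range(n) if j != i)
               for i in range(n))


def scale_rates(q, pi, mu=1.0):
    """Return a rescaled copy of q whose overall rate equals mu.

    Off-diagonal rates are multiplied by a common factor; the diagonal
    is then refilled so that every row sums to zero.
    """
    n = len(q)
    factor = mu / overall_rate(q, pi)
    out = [[q[i][j] * factor if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    for i in range(n):
        out[i][i] = -sum(out[i][j] for j in range(n) if j != i)
    return out


# F81-style rates (q_ij = pi_j) with arbitrary illustrative frequencies.
pi_tcag = [0.1, 0.2, 0.3, 0.4]
q_f81 = [[pi_tcag[j] if i != j else 0.0 for j in range(4)]
         for i in range(4)]
q_scaled = scale_rates(q_f81, pi_tcag, mu=1.0)
```

Because every off-diagonal rate is multiplied by the same factor, the relative rates that define the model are unchanged; only the time scale differs.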

## JC69

The JC69 model (Jukes and Cantor 1969) uses a single rate, $$\lambda$$.

$\mathbf{Q} = \begin{bmatrix} \cdot & \lambda & \lambda & \lambda \\ \lambda & \cdot & \lambda & \lambda \\ \lambda & \lambda & \cdot & \lambda \\ \lambda & \lambda & \lambda & \cdot \end{bmatrix}$

Parameters:

• lambda

## K80

The K80 model (Kimura 1980) uses separate rates for transitions ($$\alpha$$) and transversions ($$\beta$$).

$\mathbf{Q} = \begin{bmatrix} \cdot & \alpha & \beta & \beta \\ \alpha & \cdot & \beta & \beta \\ \beta & \beta & \cdot & \alpha \\ \beta & \beta & \alpha & \cdot \end{bmatrix}$

Parameters:

• alpha
• beta
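To make the placement of $$\alpha$$ and $$\beta$$ concrete, here is a minimal sketch that builds the K80 matrix in the T, C, A, G order used throughout this document. The function `k80_matrix` is a hypothetical helper written for illustration, not the package's sub_K80, and the matrix is left unscaled.

```python
# Sketch: build an (unscaled) K80 rate matrix in T, C, A, G order.
# `k80_matrix` is a hypothetical helper, not a jackalope function.

NUCLEOTIDES = "TCAG"
# Transitions stay within pyrimidines (T, C) or within purines (A, G).
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def k80_matrix(alpha, beta):
    """Return the 4x4 K80 matrix as a list of rows."""
    q = []
    for i, x in enumerate(NUCLEOTIDES):
        row = []
        for y in NUCLEOTIDES:
            if x == y:
                row.append(0.0)      # placeholder, filled in below
            elif (x, y) in TRANSITIONS:
                row.append(alpha)    # transition rate
            else:
                row.append(beta)     # transversion rate
        row[i] = -sum(row)           # make the row sum to zero
        q.append(row)
    return q


# Arbitrary illustrative values: transitions four times as fast.
q_k80 = k80_matrix(alpha=2.0, beta=0.5)
```

Each row contains one transition and two transversions, so every diagonal element equals $$-(\alpha + 2\beta)$$.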

## F81

The F81 model (Felsenstein 1981) allows each nucleotide to have a different equilibrium frequency ($$\pi_X$$ for nucleotide $$X$$).

$\mathbf{Q} = \begin{bmatrix} \cdot & \pi_C & \pi_A & \pi_G \\ \pi_T & \cdot & \pi_A & \pi_G \\ \pi_T & \pi_C & \cdot & \pi_G \\ \pi_T & \pi_C & \pi_A & \cdot \end{bmatrix}$

Parameters:

• pi_tcag

## HKY85

The HKY85 model (Hasegawa et al. 1984, 1985) combines different equilibrium frequency distributions with unequal transition and transversion rates.

$\mathbf{Q} = \begin{bmatrix} \cdot & \alpha \pi_C & \beta \pi_A & \beta \pi_G \\ \alpha \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & \cdot & \alpha \pi_G \\ \beta \pi_T & \beta \pi_C & \alpha \pi_A & \cdot \end{bmatrix}$

Parameters:

• alpha
• beta
• pi_tcag

## TN93

The TN93 model (Tamura and Nei 1993) extends the HKY85 model by distinguishing between the two types of transitions: between pyrimidines ($$\alpha_1$$) and between purines ($$\alpha_2$$).

$\mathbf{Q} = \begin{bmatrix} \cdot & \alpha_1 \pi_C & \beta \pi_A & \beta \pi_G \\ \alpha_1 \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & \cdot & \alpha_2 \pi_G \\ \beta \pi_T & \beta \pi_C & \alpha_2 \pi_A & \cdot \end{bmatrix}$

Parameters:

• alpha_1
• alpha_2
• beta
• pi_tcag
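The sketch below builds the TN93 matrix shown above and illustrates that setting $$\alpha_1 = \alpha_2$$ gives the HKY85 matrix. The function `tn93_matrix` is a hypothetical helper for illustration, not the package's sub_TN93; the matrix is unscaled and the parameter values are arbitrary.

```python
# Sketch: build an (unscaled) TN93 rate matrix in T, C, A, G order.
# `tn93_matrix` is a hypothetical helper, not a jackalope function.

def tn93_matrix(alpha_1, alpha_2, beta, pi_tcag):
    """Return the 4x4 TN93 matrix as a list of rows."""
    pyrimidines = {0, 1}  # indices of T and C; A and G are purines
    n = 4
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if i in pyrimidines and j in pyrimidines:
                rate = alpha_1   # transition between pyrimidines
            elif i not in pyrimidines and j not in pyrimidines:
                rate = alpha_2   # transition between purines
            else:
                rate = beta      # transversion
            q[i][j] = rate * pi_tcag[j]
        q[i][i] = -sum(q[i])     # diagonal is still 0.0 when summed
    return q


pi_tcag = [0.1, 0.2, 0.3, 0.4]
q_tn93 = tn93_matrix(3.0, 2.0, 1.0, pi_tcag)
# With alpha_1 == alpha_2, the result matches the HKY85 matrix.
q_hky85 = tn93_matrix(2.0, 2.0, 1.0, pi_tcag)
```

Expressing HKY85 as a special case this way mirrors how the models in this document nest within one another.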

## F84

The F84 model (Kishino and Hasegawa 1989) is a special case of TN93, where $$\alpha_1 = (1 + \kappa/\pi_Y) \beta$$ and $$\alpha_2 = (1 + \kappa/\pi_R) \beta$$ ($$\pi_Y = \pi_T + \pi_C$$ and $$\pi_R = \pi_A + \pi_G$$).

$\mathbf{Q} = \begin{bmatrix} \cdot & (1 + \kappa/\pi_Y) \beta \pi_C & \beta \pi_A & \beta \pi_G \\ (1 + \kappa/\pi_Y) \beta \pi_T & \cdot & \beta \pi_A & \beta \pi_G \\ \beta \pi_T & \beta \pi_C & \cdot & (1 + \kappa/\pi_R) \beta \pi_G \\ \beta \pi_T & \beta \pi_C & (1 + \kappa/\pi_R) \beta \pi_A & \cdot \end{bmatrix}$

Parameters:

• beta
• kappa
• pi_tcag
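The mapping from the F84 parameters to the two TN93 transition rates follows directly from the formulas above. A minimal sketch, using a hypothetical helper name (`f84_to_tn93`) and arbitrary parameter values:

```python
# Sketch: convert F84 parameters (beta, kappa, pi_tcag) into the
# TN93 transition rates alpha_1 and alpha_2, using the formulas above.
# `f84_to_tn93` is a hypothetical helper, not a jackalope function.

def f84_to_tn93(beta, kappa, pi_tcag):
    """Return (alpha_1, alpha_2) for the equivalent TN93 model."""
    pi_t, pi_c, pi_a, pi_g = pi_tcag
    pi_y = pi_t + pi_c  # total pyrimidine frequency
    pi_r = pi_a + pi_g  # total purine frequency
    alpha_1 = (1 + kappa / pi_y) * beta
    alpha_2 = (1 + kappa / pi_r) * beta
    return alpha_1, alpha_2


# Arbitrary illustrative values.
alpha_1, alpha_2 = f84_to_tn93(beta=1.0, kappa=0.6,
                               pi_tcag=[0.1, 0.2, 0.3, 0.4])
```

Here $$\pi_Y = 0.3$$ and $$\pi_R = 0.7$$, so $$\alpha_1$$ is about 3 and $$\alpha_2$$ is about 1.86; with $$\kappa = 0$$ both reduce to $$\beta$$.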

## GTR

The GTR model (Tavaré 1986) is the least restrictive model that is still time-reversible (i.e., $$\pi_x q_{xy} = \pi_y q_{yx}$$ for all nucleotides $$x$$ and $$y$$, where $$q_{xy}$$ is the rate from $$x$$ to $$y$$).

$\mathbf{Q} = \begin{bmatrix} \cdot & a \pi_C & b \pi_A & c \pi_G \\ a \pi_T & \cdot & d \pi_A & e \pi_G \\ b \pi_T & d \pi_C & \cdot & f \pi_G \\ c \pi_T & e \pi_C & f \pi_A & \cdot \end{bmatrix}$

Parameters:

• pi_tcag
• abcdef
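Time-reversibility can be checked numerically through the detailed-balance condition $$\pi_x q_{xy} = \pi_y q_{yx}$$. The sketch below builds the GTR matrix above from a symmetric table of the six parameters and tests that condition. Both function names are hypothetical, the matrix is unscaled, and the parameter values are arbitrary.

```python
# Sketch: build an (unscaled) GTR matrix and check detailed balance.
# `gtr_matrix` and `is_reversible` are hypothetical helpers,
# not jackalope functions. Order of rows/columns: T, C, A, G.

def gtr_matrix(pi_tcag, abcdef):
    """Return the 4x4 GTR matrix as a list of rows."""
    a, b, c, d, e, f = abcdef
    # Symmetric table of the six parameters: s[x][y] == s[y][x].
    s = [[0.0, a, b, c],
         [a, 0.0, d, e],
         [b, d, 0.0, f],
         [c, e, f, 0.0]]
    q = [[s[i][j] * pi_tcag[j] for j in range(4)] for i in range(4)]
    for i in range(4):
        q[i][i] = -sum(q[i])  # diagonal is 0.0 before this assignment
    return q


def is_reversible(q, pi, tol=1e-12):
    """True if pi_x * q_xy == pi_y * q_yx for every pair (x, y)."""
    n = len(q)
    return all(abs(pi[i] * q[i][j] - pi[j] * q[j][i]) <= tol
               for i in range(n) for j in range(n))


pi_tcag = [0.1, 0.2, 0.3, 0.4]
q_gtr = gtr_matrix(pi_tcag, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
reversible = is_reversible(q_gtr, pi_tcag)
```

The check passes for any choice of the six parameters because the table `s` is symmetric; changing a single off-diagonal rate on its own breaks it.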

## UNREST

The UNREST model (Yang 1994) places no restrictions on the rates: each of the 12 off-diagonal rates is a free parameter.

$\mathbf{Q} = \begin{bmatrix} \cdot & q_{TC} & q_{TA} & q_{TG} \\ q_{CT} & \cdot & q_{CA} & q_{CG} \\ q_{AT} & q_{AC} & \cdot & q_{AG} \\ q_{GT} & q_{GC} & q_{GA} & \cdot \end{bmatrix}$

Parameters:

• Q