# Matrix classes

## Overview

tabula provides a set of S4 classes that extend the base R matrix data type. Each class represents a specific type of matrix.

• Abundance matrix:
  • CountMatrix represents count data,
  • FrequencyMatrix represents frequency data.
• Logical matrix:
  • IncidenceMatrix represents presence/absence data,
  • OccurrenceMatrix represents a co-occurrence matrix.

tabula assumes that your data are tidy: each variable (type) must be saved in its own column and each observation (case) must be saved in its own row.

Missing values are not allowed.

## Definitions

### Abundance matrix

#### Count matrix

We denote the $$m \times p$$ count matrix by $$A = \left[ a_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$$ with row and column sums:

\begin{align} a_{i \cdot} = \sum_{j = 1}^{p} a_{ij} && a_{\cdot j} = \sum_{i = 1}^{m} a_{ij} && a_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} a_{ij} && \forall a_{ij} \in \mathbb{N} \end{align}
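The marginal sums defined above can be reproduced with base R alone. The following is a minimal sketch on a hypothetical 3 x 4 toy matrix; the object names (`A`, `row_totals`, `col_totals`, `grand_total`) are illustrative and not part of tabula.

```r
# Hypothetical toy data: 3 cases (rows) by 4 types (columns)
A <- matrix(c(5, 0, 2, 1,
              3, 3, 0, 0,
              0, 4, 1, 7),
            nrow = 3, byrow = TRUE)

row_totals  <- rowSums(A)  # a_{i.}, one total per case: 8 6 12
col_totals  <- colSums(A)  # a_{.j}, one total per type: 8 7 3 8
grand_total <- sum(A)      # a_{..}, overall total: 26
```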

#### Frequency matrix

A frequency matrix represents relative abundances.

We denote the $$m \times p$$ frequency matrix by $$B = \left[ b_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$$ with row and column sums:

\begin{align} b_{i \cdot} = \sum_{j = 1}^{p} b_{ij} = 1 && b_{\cdot j} = \sum_{i = 1}^{m} b_{ij} && b_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} b_{ij} && \forall b_{ij} \in \left[ 0,1 \right] \end{align}
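The constraint that each row sums to 1 suggests that relative abundances are computed row-wise, i.e. $$b_{ij} = a_{ij} / a_{i \cdot}$$. A base R sketch under that assumption (toy data and object names are hypothetical, and every row is assumed to have a non-zero total):

```r
# Hypothetical toy count matrix: 3 cases (rows) by 4 types (columns)
A <- matrix(c(5, 0, 2, 1,
              3, 3, 0, 0,
              0, 4, 1, 7),
            nrow = 3, byrow = TRUE)

# Divide every row by its total; the vector of row sums is recycled
# down the columns, so element [i, j] is divided by the total of row i
B <- A / rowSums(A)

# Each row of B now sums to 1 (up to floating point error)
row_sums_B <- rowSums(B)
```

`prop.table(A, margin = 1)` from base R gives the same result.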

### Logical matrix

#### Incidence matrix

We denote the $$m \times p$$ incidence matrix by $$C = \left[ c_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$$ with row and column sums:

\begin{align} c_{i \cdot} = \sum_{j = 1}^{p} c_{ij} && c_{\cdot j} = \sum_{i = 1}^{m} c_{ij} && c_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} c_{ij} && \forall c_{ij} \in \lbrace 0,1 \rbrace \end{align}
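As an illustration, an incidence matrix can be derived from counts by testing which entries are positive. This is a base R sketch with hypothetical toy data, not tabula's own code; with logical entries, the marginal sums count presences.

```r
# Hypothetical toy count matrix: 3 cases (rows) by 4 types (columns)
A <- matrix(c(5, 0, 2, 1,
              3, 3, 0, 0,
              0, 4, 1, 7),
            nrow = 3, byrow = TRUE)

# TRUE where a type is present in a case, FALSE otherwise
C <- A > 0

types_per_case <- rowSums(C)  # c_{i.}: 3 2 3
cases_per_type <- colSums(C)  # c_{.j}: 2 2 2 2
n_presences    <- sum(C)      # c_{..}: 8
```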

#### Co-occurrence matrix

A co-occurrence matrix is a symmetric matrix with zeros on its main diagonal. It records which pairs of taxa occur together in at least one sample.
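One way to build such a matrix from presence/absence data is via a cross-product. The sketch below uses base R only and hypothetical toy data; it illustrates the definition and is not necessarily how tabula computes an OccurrenceMatrix.

```r
# Hypothetical incidence matrix: 3 samples (rows) by 4 taxa (columns)
C <- matrix(c(TRUE,  FALSE, TRUE,  FALSE,
              TRUE,  TRUE,  FALSE, FALSE,
              FALSE, TRUE,  TRUE,  TRUE),
            nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# t(C) %*% C counts, for each pair of taxa, the samples containing both
# (C + 0 converts the logical matrix to numeric first)
n_shared <- crossprod(C + 0)

# Keep only whether a pair occurs together at least once,
# and blank out the main diagonal
D <- n_shared > 0
diag(D) <- FALSE
```

Here taxa `a` and `d` never share a sample, so `D["a", "d"]` is `FALSE`, while every other pair co-occurs at least once.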

## Usage

### Create

These classes are simple to use and are created in the same way as a base matrix:

# Create a count data matrix
CountMatrix(data = sample(0:10, 100, TRUE),
            nrow = 10, ncol = 10)
#> 10 x 10 count data matrix:
#>    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> 1   5  9  7  9  6 10  8  0  3   4
#> 2   2  0  8  3  1  3  6  6  6   2
#> 3   4  0  9  7  9  3  9  8  5   4
#> 4   7  7  6  7 10  2  0  9  9   8
#> 5   5 10  4  7  0  2  7  3  6   9
#> 6   1  7  7  0  3  9  7  0  4   0
#> 7  10  5  2  5  3  8  8  6  5   5
#> 8   6  5  0  2  6  1  2  1  1  10
#> 9   9  8 10  5  3 10 10  9  0   5
#> 10  7 10  1  8  9  5  9  6  3   0

# Create an incidence (presence/absence) matrix
# Numeric values are coerced to logical, as with as.logical()
IncidenceMatrix(data = sample(0:1, 100, TRUE),
                nrow = 10, ncol = 10)
#> 10 x 10 presence/absence data matrix:
#>       V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
#> 1  FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
#> 2  FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
#> 3  FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
#> 4   TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
#> 5  FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
#> 6  FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE
#> 7  FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
#> 8   TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
#> 9  FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE
#> 10 FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

Note that a FrequencyMatrix can only be created by coercion from a CountMatrix to ensure data integrity (see below).

### Coerce

tabula uses coercion mechanisms (with validation methods) for data type conversions:

# Create a count matrix
# Numeric values are coerced to integer and hence truncated towards zero
A1 <- CountMatrix(data = sample(0:10, 100, TRUE),
                  nrow = 10, ncol = 10)

# Coerce counts to frequencies
B <- as(A1, "FrequencyMatrix")

# Row sums are internally stored before coercing to a frequency matrix
totals(B)
#>  1  2  3  4  5  6  7  8  9 10
#> 56 59 66 41 70 50 56 48 68 45
# This allows the source data to be restored
A2 <- as(B, "CountMatrix")
all(A1 == A2)
#> [1] TRUE

# Coerce to presence/absence
C <- as(A1, "IncidenceMatrix")

# Coerce to a co-occurrence matrix
D <- as(A1, "OccurrenceMatrix")
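The count/frequency round trip shown above is lossless because the row totals are kept alongside the frequencies. The idea can be sketched in base R as follows; this is an illustration with hypothetical names (`row_totals`, `A_restored`), not tabula's internal implementation.

```r
# Hypothetical toy count matrix: 3 cases (rows) by 4 types (columns)
A <- matrix(c(5, 0, 2, 1,
              3, 3, 0, 0,
              0, 4, 1, 7),
            nrow = 3, byrow = TRUE)

# Store the row totals before converting to frequencies
row_totals <- rowSums(A)
B <- A / row_totals

# Multiply back by the stored totals to recover the counts;
# round() guards against floating point error
A_restored <- round(B * row_totals)

round_trip_ok <- all(A_restored == A)
```

Without the stored totals, the frequencies alone would only determine the counts up to a per-row scaling factor.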