Matrix classes

N. Frerebeau

2018-11-22

Overview

tabula provides a set of S4 classes that extend the base R matrix data type. Each class represents a special type of matrix.

tabula assumes that your data are tidy: each variable (type) must be stored in its own column and each observation (case) in its own row.

Missing values are not allowed.

Definitions

Abundance matrix

Count matrix

We denote the \(m \times p\) count matrix by \(A = \left[ a_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} a_{i \cdot} = \sum_{j = 1}^{p} a_{ij} && a_{\cdot j} = \sum_{i = 1}^{m} a_{ij} && a_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} a_{ij} && \forall a_{ij} \in \mathbb{N} \end{align}\]
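
The marginal sums defined above can be checked on a small example. This is a minimal sketch in base R only: it uses a plain integer matrix with made-up values as a stand-in for a count matrix, not a tabula class, and the names `row_totals`, `col_totals` and `grand_total` are illustrative.

```r
# Plain base R matrix standing in for a count matrix
# (m = 3 cases in rows, p = 4 types in columns; values are made up).
# matrix() fills column by column.
A <- matrix(c(5L, 0L, 2L,
              1L, 3L, 0L,
              4L, 4L, 1L,
              0L, 2L, 7L),
            nrow = 3, ncol = 4)

row_totals  <- rowSums(A)  # a_{i.}: one total per case
col_totals  <- colSums(A)  # a_{.j}: one total per type
grand_total <- sum(A)      # a_{..}: overall total

# The grand total is the sum of either set of marginal totals
sum(row_totals) == grand_total
sum(col_totals) == grand_total
```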

Frequency matrix

A frequency matrix represents relative abundances.

We denote the \(m \times p\) frequency matrix by \(B = \left[ b_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} b_{i \cdot} = \sum_{j = 1}^{p} b_{ij} = 1 && b_{\cdot j} = \sum_{i = 1}^{m} b_{ij} && b_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} b_{ij} && \forall b_{ij} \in \left[ 0,1 \right] \end{align}\]
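
Since every row of a frequency matrix sums to 1, the frequencies can be read as counts divided by their row total, \(b_{ij} = a_{ij} / a_{i \cdot}\). The sketch below illustrates this in base R on a plain matrix with made-up values; it assumes no row total is zero and is not meant to describe how tabula performs the conversion internally.

```r
# Plain base R count matrix (3 cases x 4 types; values are made up)
A <- matrix(c(5L, 0L, 2L,
              1L, 3L, 0L,
              4L, 4L, 1L,
              0L, 2L, 7L),
            nrow = 3, ncol = 4)

# Divide each count by its row total. rowSums(A) has one value per row
# and is recycled down the columns, so row i is divided by a_{i.}.
B <- A / rowSums(A)

rowSums(B)                # each row sums to 1 (up to floating point error)
all(B >= 0 & B <= 1)      # every b_{ij} lies in [0, 1]
```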

Logical matrix

Incidence matrix

We denote the \(m \times p\) incidence matrix by \(C = \left[ c_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:

\[\begin{align} c_{i \cdot} = \sum_{j = 1}^{p} c_{ij} && c_{\cdot j} = \sum_{i = 1}^{m} c_{ij} && c_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} c_{ij} && \forall c_{ij} \in \lbrace 0,1 \rbrace \end{align}\]
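
For presence/absence data the marginal sums have a direct reading: \(c_{i \cdot}\) is the number of types present in case \(i\) and \(c_{\cdot j}\) is the number of cases in which type \(j\) occurs. A minimal base R sketch, assuming presence simply means a non-zero count (plain matrices with made-up values, not tabula classes):

```r
# Plain base R count matrix (3 cases x 4 types; values are made up)
A <- matrix(c(5L, 0L, 2L,
              1L, 3L, 0L,
              4L, 4L, 1L,
              0L, 2L, 7L),
            nrow = 3, ncol = 4)

# Assumption: a type is "present" in a case when its count is non-zero
C <- A != 0

types_per_case <- rowSums(C)  # c_{i.}: TRUE counts as 1
cases_per_type <- colSums(C)  # c_{.j}
```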

Co-occurrence matrix

A co-occurrence matrix is a symmetric matrix with zeros on its main diagonal that records which pairs of taxa occur together in at least one sample.
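
One way to build such a matrix from presence/absence data is sketched below in base R. This is an illustration of the definition under stated assumptions, not tabula's implementation: the data and the names `type1` to `type3` are made up, and the result is kept as a logical matrix.

```r
# Presence/absence of 3 types (columns) in 3 samples (rows), coded 0/1.
# matrix() fills column by column.
C <- matrix(c(1L, 0L, 0L,   # type1: present in sample 1 only
              1L, 1L, 0L,   # type2: present in samples 1 and 2
              0L, 0L, 1L),  # type3: present in sample 3 only
            nrow = 3, ncol = 3,
            dimnames = list(NULL, c("type1", "type2", "type3")))

# crossprod(C) is t(C) %*% C: entry (j, k) counts the samples in which
# types j and k are both present.
D <- crossprod(C) > 0

# By definition the main diagonal is set to zero (FALSE)
diag(D) <- FALSE

D["type1", "type2"]  # TRUE: both occur in sample 1
D["type1", "type3"]  # FALSE: they never occur in the same sample
```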

Usage

Create

These classes are simple to use and are created in the same way as a base matrix:

# Create a count data matrix
CountMatrix(data = sample(0:10, 100, TRUE),
            nrow = 10, ncol = 10)
#> 10 x 10 count data matrix: 
#>    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#> 1   5  9  7  9  6 10  8  0  3   4
#> 2   2  0  8  3  1  3  6  6  6   2
#> 3   4  0  9  7  9  3  9  8  5   4
#> 4   7  7  6  7 10  2  0  9  9   8
#> 5   5 10  4  7  0  2  7  3  6   9
#> 6   1  7  7  0  3  9  7  0  4   0
#> 7  10  5  2  5  3  8  8  6  5   5
#> 8   6  5  0  2  6  1  2  1  1  10
#> 9   9  8 10  5  3 10 10  9  0   5
#> 10  7 10  1  8  9  5  9  6  3   0

# Create an incidence (presence/absence) matrix
# Numeric values are coerced to logical as by as.logical
IncidenceMatrix(data = sample(0:1, 100, TRUE),
                nrow = 10, ncol = 10)
#> 10 x 10 presence/absence data matrix: 
#>       V1    V2    V3    V4    V5    V6    V7    V8    V9   V10
#> 1  FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
#> 2  FALSE FALSE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
#> 3  FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
#> 4   TRUE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE
#> 5  FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
#> 6  FALSE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE
#> 7  FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
#> 8   TRUE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
#> 9  FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE
#> 10 FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
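
The coercion rule mentioned in the comment above is that of base R's as.logical: zero becomes FALSE and any other number becomes TRUE. A short base R illustration (the vector `x` is made up):

```r
# Base R coercion of numeric values to logical
x <- c(0, 1, 2, 0.5, -3)
as.logical(x)
#> [1] FALSE  TRUE  TRUE  TRUE  TRUE
```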

Note that a FrequencyMatrix can only be created by coercion from a CountMatrix to ensure data integrity (see below).

Coerce

tabula uses coercion mechanisms (with validation methods) for data type conversions:

# Create a count matrix
# Numeric values are coerced to integer and hence truncated towards zero
A1 <- CountMatrix(data = sample(0:10, 100, TRUE),
                  nrow = 10, ncol = 10)

# Coerce counts to frequencies
B <- as(A1, "FrequencyMatrix")

# Row sums are internally stored before coercing to a frequency matrix
totals(B)
#>  1  2  3  4  5  6  7  8  9 10 
#> 56 59 66 41 70 50 56 48 68 45
# This allows the source data to be restored
A2 <- as(B, "CountMatrix")
all(A1 == A2)
#> [1] TRUE

# Coerce to presence/absence
C <- as(A1, "IncidenceMatrix")

# Coerce to a co-occurrence matrix
D <- as(A1, "OccurrenceMatrix")
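
The round trip from counts to frequencies and back works only because the row totals are kept: frequencies alone do not determine the counts (rows with counts 1, 1 and 2, 2 give the same frequencies). The sketch below reproduces the idea in base R with plain matrices and made-up values; the variable names are illustrative and this is not tabula's internal code.

```r
# Plain base R count matrix (3 cases x 4 types; values are made up)
A <- matrix(c(5L, 0L, 2L,
              1L, 3L, 0L,
              4L, 4L, 1L,
              0L, 2L, 7L),
            nrow = 3, ncol = 4)

# Keep the row totals before converting to frequencies
row_totals <- rowSums(A)
B <- A / row_totals

# Multiply back by the stored totals; round() guards against
# floating point error before converting back to integer
A_restored <- round(B * row_totals)
storage.mode(A_restored) <- "integer"

all(A_restored == A)
```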