The R package `fdcov`

(**F**unctional **D**ata **Cov**ariance) contains a collection of tools for performing statistical inference on functional data specifically through an analysis of the covariance structure of the data. It includes two methods for performing a k-sample test for equality of covariance in `ksample.perm`

and `ksample.com`

and two methods for performing a 2-sample test `ksample.gauss`

and `ksample.vstab`

. For supervised and unsupervised learning, it contains a method to classify functional data with respect to each category’s covariance operator in `classif.com`

, and it contains a method to cluster functional data, `cluster.com`

, again based on the covariance structure of the data.

The current version of this package assumes that all functional data is sampled on the same grid at the same intervals. Future updates are planned to allow for the below methods to interface with the fda package and its functional basis representations of the data.

There are two methods for performing a k-sample test for equality of covariance operators.

`ksample.com`

uses concentration inequalities to test for equality of covariance operator in the non-asymptotic setting from Kashlak et al. (2017). We use a subset of the data contained in the phoneme dataset from the R package `fds`

to illustrate how to use this function.

```
# Load fdcov package
library(fdcov)
# Load in phoneme data
library(fds)
# Setup data arrays
dat1 = rbind( t(aa$y)[1:20,], t(sh$y)[1:20,] );
dat2 = rbind( t(aa$y)[1:20,], t(ao$y)[1:20,] );
dat3 = rbind( dat1, t(ao$y)[1:20,] );
# Setup group labels
grp1 = gl(2,20);
grp2 = gl(2,20);
grp3 = gl(3,20);
```

This function takes as arguments a data matrix with one entry per row. The second argument contains the labels identifying which entries in the data matrix belong to which group.

```
# Compare two disimilar phonemes (should return TRUE)
ksample.com(dat1,grp1);
```

`## [1] TRUE`

```
# Compare two similar phonemes (should return FALSE)
ksample.com(dat2,grp2);
```

`## [1] FALSE`

```
# Compare three phonemes (should return TRUE)
ksample.com(dat3,grp3);
```

`## [1] TRUE`

A boolean variable is returned indicating whether or not the test believes the covariance operators differ significantly. This test is fast to compute, but inherently conservative. As a result, the two scale arguments can be used to tweak the test as detailed in Kashlak et al. (2017). The default test size is alpha = 0.05. This can also be modified as an argument to the above functions.

`ksample.perm`

can be used to perform the metric-based permutation test for the equality of covariance operators of Cabassi et al. (2017). Again, we use a subset of the data contained in the phoneme dataset from the R package `fds`

to illustrate how to use this function.

```
## Create data set
# Load data
data(aa); data(ao); data(dcl); data(iy); data(sh)
# Select 20 observations from each dataset
dat = cbind(aa$y[,1:20],ao$y[,1:20],dcl$y[,1:20],iy$y[,1:20],sh$y[,1:20])
# Input matrix must be of size N (number of observations) X P (number of time points)
dat = t(dat)
# Define cluster labels
grp = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20))
```

The function takes as input the dataset `dat`

and the vector of group labels `grp`

.

```
## Test the equality of the covariance operators
p = ksample.perm(dat, grp, part = TRUE)
```

The user can also choose the number of iterations of the permutation test `iter`

(the default is `1000`

), the distance between covariance operators `dist`

(the default is `sq`

, the square root distance) and the combining function to find the global p-value `comb`

(the default is `tipp`

, the Tippett combining function). If the data have already been centred around the mean, we suggest to set `cent = TRUE`

. Moreover, if `part=FALSE`

, this function returns only the global p-value. If `part=TRUE`

, it returns a vector that contains both the global p-value (in `p$global`

) and a list of partial p-values, one for each pairwise comparison (in `p$partial`

).

`p$global # global p-value`

`## [1] 0`

`p$partial # partial p-values`

```
## dist p_value signif
## 1-2 30.00486 0.000 ***
## 1-3 28.78225 0.000 ***
## 1-4 30.23026 0.075 .
## 1-5 30.05272 0.000 ***
## 2-3 26.96973 0.000 ***
## 2-4 29.17826 0.000 ***
## 2-5 27.55809 0.000 ***
## 3-4 28.35040 0.000 ***
## 3-5 26.81700 0.000 ***
## 4-5 29.97638 0.000 ***
```

The values of the observed distances between each pair of covariance operators are also reported in the table. The user can also choose to adjust the p-values with the step-down method of Westfall and Young (1993) by setting `adj = TRUE`

in `ksample.perm`

. For more information about these options please see and Cabassi et al. (2017).

The partial p-values can be plotted using the `perm.plot`

function, that takes as input the list `p`

, output by `perm.plot`

, the number of groups and a vector of strings that indicate the class labels (optional):

`perm.plot(p, 5, lab=c('aa','ao','dcl','iy','sh')) # visualise partial p-values`

It is also possible to automatically save the figure by setting `save = TRUE`

. The figure will be saved in `.eps`

format. The name of the figure can be customised using the parameter `name`

.

`ksample.gauss`

and `ksample.vstab`

are two methods detailed in Panaretos et al. (2010) that compare two samples of functional data for testing for equality of covariance operators. In this case, the data is assumed to be Gaussian. Considering the same phoneme data,

```
# Load in phoneme data
library(fds)
# Set up test data
dat1 = t(aa$y)[1:20,];
dat2 = t(sh$y)[1:20,];
dat3 = t(aa$y)[21:40,];
```

we can compare two sets of 20 observations corresponding to different phonemes and two sets of 20 observations from the same population of phonemes.

```
# Compare two disimilar phonemes
# Resulting in a small p-value
ksample.gauss(dat1,dat2,K=5);
```

`## [1] 0.0002660378`

`ksample.vstab(dat1,dat2,K=5);`

`## [1] 1.1825e-08`

```
# Compare two sets of the same phonemes
# Resulting in a large p-value
ksample.gauss(dat1,dat3,K=5);
```

`## [1] 0.5825128`

`ksample.vstab(dat1,dat3,K=5);`

`## [1] 0.4979127`

In these examples, K=5 eigen-functions were used to reduce the data to a finite dimensional space. Then, a test statistic is constructed with asymptotic chi-squared distribution with K(K+1)/2 degrees of freedom. From there, the p-values are computed.

`classif.com`

trains a covariance operator based functional data classifier that makes use of concentration inequalities in the same way as the above `ksample.com`

function. `predict.classif.com`

expands upon the generic `predict`

function for the classif.com class. It uses the previously trained classifier to classify new observations.

```
library(fds);
# Setup training data
dat1 = rbind(
t(aa$y[,1:100]), t(ao$y[,1:100]), t(dcl$y[,1:100]),
t(iy$y[,1:100]), t(sh$y[,1:100])
);
# Setup testing data
dat2 = rbind(
t(aa$y[,101:400]), t(ao$y[,101:400]), t(dcl$y[,101:400]),
t(iy$y[,101:400]), t(sh$y[,101:400])
);
datgrp = gl(5,100);
```

Once the data has been read in, the classifier can be trained.

`clCom = classif.com( datgrp, dat1 );`

Subsequently, the class associated with new observations can be predicted based on the trained classifier.

```
grp = predict( clCom, dat2, LOADING=FALSE );
acc = c(
sum( grp[1:300]==1 ), sum( grp[301:600]==2 ), sum( grp[601:900]==3 ),
sum( grp[901:1200]==4 ), sum( grp[1201:1500]==5 )
)/300;
print(rbind(gl(5,1),signif(acc,3)));
```

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.000 2.000 3.00 4.000 5.00
## [2,] 0.783 0.797 0.96 0.993 0.99
```

`cluster.com`

clusters sets of functional data via their covariance operators making use of an EM style algorithm with concentration inequalities. This method similarly is based on the same concentration paradigm as `ksample.com`

and `classif.com`

.

```
# Setup data to be clustered
dat = rbind( t(aa$y[,1:20]),t(iy$y[,1:20]),t(sh$y[,1:20]) );
```

Given the unlabelled phoneme data, the following method will assign categories to each entry given the preselected number of categories in the argument grpCnt.

```
# Cluster data into three groups
clst = cluster.com(dat,grpCnt=3,PRINTLK = FALSE);
matrix(clst,3,20,byrow=TRUE);
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,] 1 1 1 1 1 1 1 1 1 1 1 1 1
## [2,] 2 2 2 2 2 2 2 2 2 3 2 2 2
## [3,] 3 3 3 3 3 3 3 3 3 3 3 3 3
## [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## [1,] 1 1 1 1 1 1 1
## [2,] 2 2 2 3 2 2 3
## [3,] 3 3 3 3 3 3 3
```

In this next example, groups of curves are clustered as units. That is, we begin with a sample of 120 observations in the variable dat. We next assign labels grouping each sequential set of four obervations as a single unit. In this way, the algorithm clusters based on 30 rank 4 empirical covariance operators instead of on 120 individual curves.

```
# cluster groups of curves
dat = rbind( t(aa$y[,1:40]),t(iy$y[,1:40]),t(sh$y[,1:40]) );
lab = gl(30,4);
# Cluster data into three groups
clst = cluster.com(dat,labl=lab,grpCnt=3,PRINTLK = FALSE);
matrix(clst,3,10,byrow=TRUE);
```

```
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 2 2 2 2 2 2 2 2 2 2
## [2,] 1 1 1 1 1 1 1 1 1 1
## [3,] 3 3 3 3 3 3 3 3 3 3
```

Cabassi, A., Pigoli, D., Secchi, P., Carter, P. A. (2017). Permutation tests for the equality of covariance operators of functional data with applications to evolutionary biology. Electron. J. Statist. 11(2), pp.3815–3840.

Kashlak, A.B., Aston, J.A. and Nickl, R., (2016). Inference on covariance operators via concentration inequalities: k-sample tests, classification, and clustering via Rademacher complexities. arXiv preprint arXiv:1604.06310.

Panaretos, Victor M., David Kraus, and John H. Maddocks. “Second-order comparison of Gaussian random functions and the geometry of DNA minicircles.” Journal of the American Statistical Association 105.490 (2010): 670-682.

Pigoli, D., Aston, J.A., Dryden, I.L. and Secchi, P., (2014). Distances and inference for covariance operators. Biometrika, 101(2), pp.409-422.

Westfall, P. H. and Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment, volume 279. John Wiley & Sons.