The univariate Kolmogorov-Smirnov (KS) test is a non-parametric statistical test designed to assess whether two samples come from the same underlying distribution. The versatility of the KS test has made it a cornerstone of statistical analysis across the scientific disciplines. However, the test proposed by Kolmogorov and Smirnov does not naturally extend to multidimensional distributions. Here, we present the `fasano.franceschini.test` package, an **R** implementation of the 2-D KS two-sample test as defined by Fasano and Franceschini (1).
The `fasano.franceschini.test` package provides three improvements over the current 2-D KS test on the Comprehensive **R** Archive Network (CRAN): (i) the Fasano and Franceschini test has been shown to run in \(O(n^2)\) time, versus \(O(n^3)\) for the Peacock implementation; (ii) the package implements a procedure for handling ties in the data; and (iii) the package implements a parallelized permutation procedure for improved significance testing. Ultimately, the `fasano.franceschini.test` package presents a robust statistical test for analyzing random samples defined in two dimensions.

The Kolmogorov–Smirnov (KS) test is a non-parametric, univariate statistical test designed to assess whether a set of data is consistent with a given probability distribution (or, in the two-sample case, whether two samples come from the same underlying distribution). First derived by Kolmogorov and Smirnov in a series of papers (2–8), the one-sample KS test defines the distribution of the quantity \(D_{KS}\), the maximal absolute difference between the empirical cumulative distribution function (CDF) of a set of values and a reference probability distribution. Kolmogorov and Smirnov's key insight was proving that the distribution of \(D_{KS}\) is independent of the CDFs being tested. Thus, the test can effectively be used to compare any univariate empirical data distribution to any continuous univariate reference distribution. The two-sample KS test can further be used to compare any two univariate empirical data distributions against each other to determine whether they are drawn from the same underlying univariate distribution.
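As a concrete illustration of the univariate case, both the one-sample and two-sample statistics are available in standard scientific software; the sketch below uses SciPy (Python is used here purely for illustration, since the package described in this paper targets **R**):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=200)             # drawn from N(0, 1)
y = rng.normal(1.0, 1.0, size=300)   # drawn from N(1, 1)

# One-sample test: compare the empirical CDF of x to the N(0, 1) CDF.
d_one, p_one = stats.kstest(x, "norm")

# Two-sample test: compare the empirical CDFs of x and y to each other.
d_two, p_two = stats.ks_2samp(x, y)

# The shifted sample y produces a much larger discrepancy against x
# than x produces against its true reference distribution.
print(d_one, p_one)
print(d_two, p_two)
```

Note that the two-sample comparison requires no knowledge of the underlying distribution, which is the distribution-free property discussed above.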

The nonparametric versatility of the univariate KS test has made it a cornerstone of statistical analysis, and it is commonly used across the scientific disciplines (9–14). However, the KS test as proposed by Kolmogorov and Smirnov does not naturally extend to distributions in more than one dimension. Fortunately, a solution to the dimensionality issue was articulated by Peacock (15) and later extended by Fasano and Franceschini (1).

Currently, only the Peacock implementation of the 2-D two-sample KS test is available in **R** (16), through the `Peacock.test` package via the `peacock2()` function, but this has been shown to be markedly slower than the Fasano and Franceschini algorithm (17). A **C** implementation of the Fasano–Franceschini test is available in (18); however, arguments have been made against the validity of that implementation, as the test is not distribution-free (19). Furthermore, in the **C** implementation, statistical testing is based on a fit to Monte Carlo simulation that is only valid for significance levels \(\alpha \lessapprox 0.20\).
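To make the algorithmic contrast concrete: in the Fasano and Franceschini variant, as commonly described, each point of a sample in turn serves as the origin of four quadrants, and the statistic is built from the largest discrepancy between the fractions of the two samples falling in any quadrant. Below is a naive \(O(n^2)\) sketch in Python (illustration only; the function names are ours and do not reflect the package's API):

```python
import numpy as np

def quadrant_fractions(origin, pts):
    """Fraction of pts falling in each quadrant around origin.
    (Points exactly on a quadrant boundary are assigned arbitrarily
    here; real implementations handle ties explicitly.)"""
    right = pts[:, 0] > origin[0]
    above = pts[:, 1] > origin[1]
    return np.array([
        np.mean(right & above),    # upper right
        np.mean(~right & above),   # upper left
        np.mean(~right & ~above),  # lower left
        np.mean(right & ~above),   # lower right
    ])

def ff_statistic(s1, s2):
    """Naive O(n^2) sketch of the 2-D statistic: the maximal quadrant
    discrepancy is found using each sample's points as origins in
    turn, and the two maxima are averaged."""
    d1 = max(np.max(np.abs(quadrant_fractions(p, s1) - quadrant_fractions(p, s2)))
             for p in s1)
    d2 = max(np.max(np.abs(quadrant_fractions(p, s1) - quadrant_fractions(p, s2)))
             for p in s2)
    return 0.5 * (d1 + d2)

rng = np.random.default_rng(1)
a = rng.normal(size=(100, 2))                   # N(0, I) sample
b = rng.normal([2.0, 0.0], 1.0, size=(100, 2))  # shifted along x
print(ff_statistic(a, a))  # identical samples give 0
print(ff_statistic(a, b))
```

Restricting the candidate origins to the sample points themselves is what brings the cost down from Peacock's exhaustive scan over all quadrant positions.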

Here we present the `fasano.franceschini.test` package as an **R** implementation of the 2-D two-sample KS test described by Fasano and Franceschini (1). The `fasano.franceschini.test` package provides two improvements over the current 2-D KS test available on the Comprehensive **R** Archive Network (CRAN): (i) the Fasano and Franceschini test has been shown to run in \(O(n^2)\) time, versus \(O(n^3)\) for the Peacock implementation; and (ii) the package implements a permutation procedure for improved significance testing, which mitigates the limitations of the test noted by (19).
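The permutation procedure itself is generic and worth sketching: the two samples are pooled, repeatedly re-partitioned at random, and the observed statistic is ranked against the resulting null distribution. A minimal Python sketch, using SciPy's univariate two-sample statistic as a stand-in (illustration only; the package applies this idea to the 2-D statistic, and in **R**):

```python
import numpy as np
from scipy import stats

def permutation_pvalue(s1, s2, statistic, n_perm=200, seed=0):
    """Estimate a p-value by re-partitioning the pooled samples at
    random and counting permutations whose statistic reaches the
    observed value."""
    rng = np.random.default_rng(seed)
    observed = statistic(s1, s2)
    pooled = np.concatenate([s1, s2])
    n1 = len(s1)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if statistic(perm[:n1], perm[n1:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

ks = lambda a, b: stats.ks_2samp(a, b).statistic
rng = np.random.default_rng(2)
p_same = permutation_pvalue(rng.normal(size=80), rng.normal(size=80), ks)
p_diff = permutation_pvalue(rng.normal(size=80), rng.normal(2.0, 1.0, 80), ks)
print(p_same, p_diff)
```

Because the null distribution is generated from the data themselves, this approach avoids relying on an asymptotic fit and, unlike the Monte Carlo fit in the **C** implementation, is not restricted to a particular range of significance levels. The permutations are independent, which is why the procedure parallelizes naturally.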

The Kolmogorov–Smirnov (KS) test is a non-parametric method for determining whether a sample is consistent with a given probability distribution (20). In one dimension, the Kolmogorov-Smirnov statistic (\(D_{KS}\)) is defined as the maximum absolute difference between the cumulative distribution functions of the data and model (one-sample), or between those of the two data sets (two-sample), as illustrated in **Figure 1**.
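In the two-sample case, this maximum can be computed directly: evaluate both empirical CDFs at the pooled sample points and take the largest gap. A minimal Python sketch (for illustration; not the package's code):

```python
import numpy as np

def two_sample_d(a, b):
    """Maximum absolute difference between the empirical CDFs of a
    and b, evaluated at the pooled sample points (where the maximum
    must occur, since both ECDFs are step functions)."""
    grid = np.sort(np.concatenate([a, b]))
    ecdf_a = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    ecdf_b = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(ecdf_a - ecdf_b))

a = np.array([0.1, 0.4, 0.7, 1.2])
b = np.array([0.5, 0.9, 1.1, 1.6])
print(two_sample_d(a, b))  # 0.5
```

It is precisely this construction, one maximum over one ordered axis, that fails to generalize: in two dimensions there is no single natural ordering of the plane, which is the difficulty the Peacock and Fasano-Franceschini approaches address.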