1 Installation

At present, fasthplus can only be installed from GitHub, using the devtools package. A CRAN distribution is in preparation, and this section will be updated once it is available.

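# Install the main branch from GitHub and build the vignettes.
# Requires devtools: install.packages("devtools")
devtools::install_github(repo = "ntdyjack/fasthplus",
                         ref = "main",
                         build_vignettes = TRUE,
                         subdir = NULL)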

Once the package is installed, the library can be loaded:

library(fasthplus)

This vignette is an introductory example of how to use the fasthplus package.

2 Background

A standard unsupervised analysis is to cluster (i.e., label or partition) observations into discrete groups using a dissimilarity measure, such as Euclidean distance. When no ground-truth label exists for each observation, internal validity metrics that measure the tightness or consistency of the clusters, such as within-cluster sums of squares (WCSS) or Silhouette scores (Rousseeuw 1987), are often used to evaluate the performance of a set of predicted cluster labels.
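As a rough illustration of these two internal validity metrics, the sketch below computes WCSS and the mean Silhouette width for one set of predicted labels. The iris data, the k-means clustering, and the choice of three clusters are assumptions made only for this example; none of it uses fasthplus.

```r
# Illustrative sketch: the data set, clustering method, and k are assumptions.
library(cluster)  # recommended package shipped with R; provides silhouette()

x <- scale(iris[, 1:4])               # four numeric features, standardized
d <- dist(x, method = "euclidean")    # pairwise Euclidean dissimilarities

set.seed(1)
km <- kmeans(x, centers = 3, nstart = 10)  # predicted cluster labels

# Within-cluster sum of squares, summed over clusters (lower means tighter)
wcss <- km$tot.withinss

# Silhouette width per observation, each in [-1, 1] (higher means better separated)
sil <- silhouette(km$cluster, d)
mean_sil <- mean(sil[, "sil_width"])

print(c(wcss = wcss, mean_silhouette = mean_sil))
```

Both quantities are computed from the dissimilarities themselves, so their values depend on the measure that produced them.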

Alternatively, one may seek to compare the performance of multiple dissimilarity measures (e.g., geometric versus probabilistic dissimilarity) (Baker et al. 2021). In that setting, these internal validity metrics become difficult to interpret: different dissimilarity measures span different magnitudes and ranges of values, so the apparent tightness of the same clusters differs from one measure to the next.
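The scale problem can be seen in a small sketch. Here Euclidean and Manhattan distance stand in for the two dissimilarity measures (a simpler pair than the geometric versus probabilistic comparison cited above), and the iris species serve as stand-in labels; both choices are assumptions for illustration only.

```r
# Illustrative sketch: the data, labels, and pair of measures are assumptions.
x <- as.matrix(iris[, 1:4])
labels <- as.integer(iris$Species)    # one fixed set of labels for both measures

d_euc <- as.matrix(dist(x, method = "euclidean"))
d_man <- as.matrix(dist(x, method = "manhattan"))

# Unique pairs of observations (upper triangle) that share a label
same <- outer(labels, labels, "==") & upper.tri(d_euc)

# Mean within-cluster dissimilarity: same labels, different scales
mean_within <- c(euclidean = mean(d_euc[same]),
                 manhattan = mean(d_man[same]))

# Overall range of each dissimilarity measure
ranges <- rbind(euclidean = range(d_euc[upper.tri(d_euc)]),
                manhattan = range(d_man[upper.tri(d_man)]))

print(mean_within)
print(ranges)
```

The labels are identical in both cases, yet the within-cluster tightness differs purely because the two measures live on different scales, so the raw values cannot be compared directly.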

One solution is to use discordance as an internal validity metric. For example, the discordance metric \(G_{+}\) (Williams and Clifford 1971;