Please see the arXiv paper for details. We refer to the R package as dsos to avoid confusion with D-SOS, the method.

DIY: Bring Your Own Scores

We show how easy it is to implement D-SOS for a particular notion of outlyingness. Suppose we want to test for no adverse shift based on isolation scores in a multivariate two-sample comparison. To do so, we need two main ingredients: a score function and a method to compute the \(p\)-value.

First, the scores are obtained using predictions from an isolation forest fitted with the isotree package (Cortes 2020). An isolation forest detects isolated points, that is, instances that are typically out-of-distribution relative to the high-density regions of the data distribution. Any performant method for density-based out-of-distribution detection can serve the same goal; isolation forest just happens to be a convenient choice. The internal function outliers_no_split shows the implementation of one such score function in the dsos package.

dsos:::outliers_no_split
## function (x_train, x_test, num_trees = 500) 
## {
##     iso_fit <- isotree::isolation.forest(data = x_train, ntrees = num_trees)
##     os_train <- predict(iso_fit, newdata = x_train)
##     os_test <- predict(iso_fit, newdata = x_test)
##     return(list(test = os_test, train = os_train))
## }
## <bytecode: 0x000000001d5ae0f0>
## <environment: namespace:dsos>
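
To illustrate the "bring your own scores" idea, here is a minimal sketch of an alternative score function that returns the same list(test, train) structure as outliers_no_split. It swaps isolation scores for squared Mahalanobis distances to the training data, computed with base R only. The name outliers_mahalanobis and the simulated data are hypothetical and not part of dsos; the sketch also assumes that, as with isolation scores, higher scores mean more outlying.

```r
# Hypothetical score function (not part of dsos): squared Mahalanobis
# distance of each row to the centre of the training data.
# Assumption: larger scores mean more outlying, as with isolation scores.
outliers_mahalanobis <- function(x_train, x_test) {
  x_train <- as.matrix(x_train)
  x_test <- as.matrix(x_test)
  # Location and scatter are estimated on the training set only
  center <- colMeans(x_train)
  covariance <- stats::cov(x_train)
  os_train <- stats::mahalanobis(x_train, center = center, cov = covariance)
  os_test <- stats::mahalanobis(x_test, center = center, cov = covariance)
  # Same return structure as dsos:::outliers_no_split
  return(list(test = os_test, train = os_train))
}

# Simulated data for illustration: the test set is shifted away from training
set.seed(12345)
x_train <- matrix(rnorm(200 * 3), ncol = 3)
x_test <- matrix(rnorm(100 * 3, mean = 2), ncol = 3)
scores <- outliers_mahalanobis(x_train, x_test)
str(scores)
```

Because the test sample is shifted, its scores should tend to be larger than the training scores. Mahalanobis distance suits roughly elliptical data; the isolation forest above makes no such assumption.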

Second, we estimate the empirical null distribution for the \(p-\