This R package provides a fast C++ reimplementation of several density-based algorithms of the DBSCAN family for spatial data. The package includes:

- **DBSCAN:** Density-based spatial clustering of applications with noise.
- **OPTICS/OPTICSXi:** Ordering points to identify the clustering structure (OPTICS), with Xi-based cluster extraction.
- **HDBSCAN:** Hierarchical DBSCAN with simplified hierarchy extraction.
- **LOF:** Local outlier factor algorithm.
- **GLOSH:** Global-Local Outlier Score from Hierarchies algorithm.
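To make the DBSCAN idea concrete, here is a minimal, naive sketch in base R. In this sketch, points with at least `minPts` points within distance `eps` (counting the point itself) are core points, and clusters grow by expanding from core points; points that are neither core points nor within `eps` of one are noise. It computes all pairwise distances, so it needs O(n^2) memory and is only meant to illustrate the algorithm. It is not the package's kd-tree based C++ implementation, and `naive_dbscan` is a hypothetical helper.

```r
# Naive DBSCAN for illustration only (hypothetical helper, not part of the dbscan package).
naive_dbscan <- function(x, eps, minPts) {
  d <- as.matrix(dist(x))                       # all pairwise Euclidean distances
  n <- nrow(d)
  # eps-neighborhood of every point; each neighborhood includes the point itself
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= minPts
  cluster <- integer(n)                         # 0 = noise (or not yet assigned)
  id <- 0L
  for (i in seq_len(n)) {
    if (!is_core[i] || cluster[i] != 0L) next   # new clusters start only at unassigned core points
    id <- id + 1L
    queue <- i
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] != 0L) next                # already assigned (to this or an earlier cluster)
      cluster[j] <- id
      # only core points propagate the cluster; border points are assigned but not expanded
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  cluster
}

# Two small square grids of points plus one isolated point
square <- as.matrix(expand.grid(seq(0, 0.4, by = 0.1), seq(0, 0.4, by = 0.1)))
toy <- rbind(square, square + 3, c(10, 10))
cl <- naive_dbscan(toy, eps = 0.15, minPts = 4)
table(cl)   # expected: 1 noise point (0), 25 points in cluster 1, 25 points in cluster 2
```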

The implementations use the kd-tree data structure (from the ANN library) for faster k-nearest neighbor search. An R interface to **fast kNN and fixed-radius NN search** is provided along with **Jarvis-Patrick clustering** and **Shared Nearest Neighbor Clustering.** Additionally, a fast implementation of the **Framework for Optimal Selection of Clusters (FOSC)** is available that supports unsupervised and semi-supervised clustering of hierarchical cluster trees (`hclust` objects) built with any linkage criterion.
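The nearest-neighbor search functions can also be called directly. The sketch below, assuming a recent CRAN release of the package, queries the 3 nearest neighbors of each iris flower with `kNN()` and checks the distances against a brute-force computation in base R, then runs a fixed-radius search with `frNN()`. It also shows calls for Jarvis-Patrick clustering (`jpclust()`), shared nearest neighbor clustering (`sNNclust()`), and FOSC (`extractFOSC()`) applied to an average-linkage `hclust` tree; all parameter values are illustrative, not tuned.

```r
library("dbscan")
x <- as.matrix(iris[, 1:4])

# kd-tree k-nearest neighbor search; nn$id and nn$dist are 150 x 3 matrices
nn <- kNN(x, k = 3)

# brute-force check: sort each row of the distance matrix, skipping the point itself
d <- as.matrix(dist(x))
diag(d) <- Inf
brute <- t(apply(d, 1, function(r) sort(r)[1:3]))
all.equal(unname(nn$dist), unname(brute))

# fixed-radius search: fr$id is a list with the neighbors of each point within eps
fr <- frNN(x, eps = .4)
summary(lengths(fr$id))

# Jarvis-Patrick and shared nearest neighbor clustering
jp <- jpclust(x, k = 20, kt = 10)
snn <- sNNclust(x, k = 20, eps = 7, minPts = 5)

# FOSC: extract a flat clustering from an hclust tree (here built with average linkage)
hc <- hclust(dist(x), method = "average")
fosc <- extractFOSC(hc, minPts = 5)
table(fosc$cluster)
```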

The implementations are typically faster than native R implementations (e.g., `dbscan` in package `fpc`) and than the implementations in WEKA, ELKI, and Python's scikit-learn.

**Stable CRAN version:** install from within R with

`install.packages("dbscan")`

**Current development version:** Download the package from AppVeyor or install it from GitHub (requires devtools).

`devtools::install_github("mhahsler/dbscan")`

Load the package and use the numeric variables in the iris dataset

```
library("dbscan")
data("iris")
x <- as.matrix(iris[, 1:4])
```

Run DBSCAN

```
db <- dbscan(x, eps = .4, minPts = 4)
db
```

```
DBSCAN clustering for 150 objects.
Parameters: eps = 0.4, minPts = 4
The clustering contains 4 cluster(s) and 25 noise points.
0 1 2 3 4
25 47 38 36 4
Available fields: cluster, eps, minPts
```

Visualize results (noise is shown in black)

`pairs(x, col = db$cluster + 1L)`

Calculate the LOF (local outlier factor) and visualize it (larger bubbles indicate a larger LOF)

```
lof_scores <- lof(x, k = 4)
pairs(x, cex = lof_scores)
```
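LOF compares the local density of a point with the densities of its k nearest neighbors: values near 1 indicate a point in a region of similar density, and values clearly above 1 indicate outliers. The base-R sketch below follows the textbook definition (reachability distance, local reachability density, and the ratio of densities) on a small toy data set. It is for illustration only: `naive_lof` is a hypothetical helper, it breaks distance ties by index instead of including all tied neighbors, and it assumes there are no duplicate points (which would produce infinite densities).

```r
# Naive LOF for illustration only (hypothetical helper, not part of the dbscan package).
naive_lof <- function(x, k) {
  d <- as.matrix(dist(x))
  diag(d) <- Inf                                 # a point is not its own neighbor
  n <- nrow(d)
  # n x k matrix with the indices of the k nearest neighbors (ties broken by index)
  knn <- matrix(unlist(lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])),
                nrow = n, byrow = TRUE)
  k_dist <- d[cbind(seq_len(n), knn[, k])]       # distance to the k-th neighbor
  # local reachability density: inverse of the mean reachability distance,
  # where reach-dist(i, o) = max(k_dist(o), d(i, o))
  lrd <- vapply(seq_len(n), function(i) {
    o <- knn[i, ]
    1 / mean(pmax(k_dist[o], d[i, o]))
  }, numeric(1))
  # LOF: average density of the neighbors relative to the point's own density
  vapply(seq_len(n), function(i) mean(lrd[knn[i, ]]) / lrd[i], numeric(1))
}

square <- as.matrix(expand.grid(seq(0, 0.4, by = 0.1), seq(0, 0.4, by = 0.1)))
toy <- rbind(square, c(2, 2))                    # 25 grid points plus one outlier
scores <- naive_lof(toy, k = 4)
round(scores[26], 1)                             # score of the isolated point
summary(scores[1:25])                            # scores of the grid points
```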

Run OPTICS

```
opt <- optics(x, eps = 1, minPts = 4)
opt
```

```
OPTICS clustering for 150 objects.
Parameters: minPts = 4, eps = 1, eps_cl = NA, xi = NA
Available fields: order, reachdist, coredist, predecessor, minPts, eps, eps_cl, xi
```

Extract a DBSCAN-like clustering from OPTICS and create a reachability plot (the DBSCAN clusters extracted at eps_cl = .4 are colored)

```
opt <- extractDBSCAN(opt, eps_cl = .4)
plot(opt)
```
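The reachability plot shows the reachability distance of each point in the order in which OPTICS visited the points; valleys below the `eps_cl` threshold correspond to the extracted clusters. A rough base-graphics version can be drawn from the `order` and `reachdist` fields listed in the OPTICS output above. This is a sketch, not the package's `plot()` method, and it does not color the clusters.

```r
library("dbscan")
x <- as.matrix(iris[, 1:4])
opt <- optics(x, eps = 1, minPts = 4)

# reachability distances in OPTICS order
r <- opt$reachdist[opt$order]
r[!is.finite(r)] <- NA    # undefined reachability (e.g., the first point) is not drawn
plot(r, type = "h", xlab = "OPTICS order", ylab = "reachability distance")
abline(h = .4, lty = 2)   # eps_cl used for extractDBSCAN() above

# the DBSCAN-like clusters extracted at eps_cl = .4
opt <- extractDBSCAN(opt, eps_cl = .4)
table(opt$cluster)
```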

Extract a hierarchical clustering using the Xi method (captures clusters of varying density)

```
opt <- extractXi(opt, xi = .05)
opt
plot(opt)
```

Run HDBSCAN (captures stable clusters)

```
hdb <- hdbscan(x, minPts = 4)
hdb
```

```
HDBSCAN clustering for 150 objects.
Parameters: minPts = 4
The clustering contains 2 cluster(s) and 0 noise points.
1 2
100 50
Available fields: cluster, minPts, cluster_scores, membership_prob, outlier_scores, hc
```
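Conceptually, HDBSCAN replaces the Euclidean distance with the mutual reachability distance max(core(a), core(b), d(a, b)), where the core distance core(a) is the distance from a to its minPts-th nearest point counting a itself, and then builds a single-linkage hierarchy (equivalently, a minimum spanning tree) on it. The base-R sketch below computes only this raw hierarchy for a toy data set; it omits the condensing of the tree and the stability-based cluster selection that `hdbscan()` performs, and `mutual_reachability` is a hypothetical helper.

```r
# Hypothetical helper: mutual reachability distances (illustration only)
mutual_reachability <- function(x, minPts) {
  d <- as.matrix(dist(x))
  # core distance: distance to the minPts-th closest point, counting the point itself
  # (sort(r)[1] is the zero distance of the point to itself)
  core <- apply(d, 1, function(r) sort(r)[minPts])
  m <- pmax(d, outer(core, core, pmax))   # max(core(a), core(b), d(a, b))
  diag(m) <- 0
  as.dist(m)
}

square <- as.matrix(expand.grid(seq(0, 0.4, by = 0.1), seq(0, 0.4, by = 0.1)))
toy <- rbind(square, square + 3, c(10, 10))    # two dense groups plus one isolated point

# single linkage on mutual reachability = hierarchy from the minimum spanning tree
hc_mr <- hclust(mutual_reachability(toy, minPts = 4), method = "single")
groups <- cutree(hc_mr, k = 3)
table(groups)   # expected: 25 points in group 1, 25 in group 2, 1 in group 3
```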

Visualize the results as a simplified tree

`plot(hdb, show_flat = TRUE)`

Show how well each point fits its assigned cluster, using the membership probabilities as color transparency

```
# fade each point's cluster color by its membership probability
colors <- mapply(function(col, i) adjustcolor(col, alpha.f = hdb$membership_prob[i]),
                 palette()[hdb$cluster + 1], seq_along(hdb$cluster))
plot(x, col = colors, pch = 20)
```
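The `outlier_scores` field listed in the HDBSCAN output holds the GLOSH outlier score of each point, where larger values indicate stronger outliers. A minimal sketch of how one might inspect them on the same iris data: the 5 highest-scoring flowers are listed and drawn with a larger symbol. The cutoff of 5 points is arbitrary.

```r
library("dbscan")
x <- as.matrix(iris[, 1:4])
hdb <- hdbscan(x, minPts = 4)

# indices of the 5 points with the largest GLOSH outlier scores
top <- order(hdb$outlier_scores, decreasing = TRUE)[1:5]
iris[top, ]

# draw the top-scoring points with larger symbols (first two variables only)
plot(x, col = hdb$cluster + 1L, pch = 20,
     cex = ifelse(seq_len(nrow(x)) %in% top, 2.5, 1))
```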

The dbscan package is licensed under the GNU General Public License (GPL) Version 3. The **OPTICSXi** R implementation was directly ported from the ELKI framework's Java implementation (GNU AGPLv3), with explicit permission granted by the original author, Erich Schubert.

- Development version of dbscan on GitHub.
- List of changes from NEWS.md
- dbscan reference manual

*Maintainer:* Michael Hahsler