Clustering with the Leiden Algorithm in R

This package allows calling the Leiden algorithm for clustering on an igraph object from R. See the Python and Java implementations for more details:

https://github.com/CWTSLeiden/networkanalysis

https://github.com/vtraag/leidenalg

Install

This package requires the 'leidenalg' and 'igraph' modules for python (2) to be installed on your system. For example:

pip install leidenalg igraph

If you do not have root access, you can use pip install --user or pip install --prefix to install these in your user directory (which you have write permissions for) and ensure that this directory is in your PATH so that Python can find it.

The 'devtools' package will be used to install 'leiden' and the dependancies (igraph and reticulate). To install the development version:

if (!requireNamespace("devtools"))
    install.packages("devtools")
devtools::install_github("TomKellyGenetics/leiden")

The current release on CRAN can be installed with:

install.packages("leiden")
library("leiden")

Usage

Running the Leiden algorithm in R

First set up a compatible adjacency matrix:

adjacency_matrix <- rbind(cbind(matrix(round(rbinom(4000, 1, 0.8)), 20, 20),
                                matrix(round(rbinom(4000, 1, 0.3)), 20, 20), 
                                matrix(round(rbinom(400, 1, 0.1)), 20, 20)),
                          cbind(matrix(round(rbinom(400, 1, 0.3)), 20, 20), 
                                matrix(round(rbinom(400, 1, 0.8)), 20, 20), 
                                matrix(round(rbinom(4000, 1, 0.2)), 20, 20)),
                          cbind(matrix(round(rbinom(400, 1, 0.3)), 20, 20), 
                                matrix(round(rbinom(4000, 1, 0.1)), 20, 20), 
                                matrix(round(rbinom(4000, 1, 0.9)), 20, 20)))
str(adjacency_matrix)
#>  num [1:60, 1:60] 1 1 1 1 1 1 1 1 1 1 ...
dim(adjacency_matrix )
#> [1] 60 60

An adjacency matrix is any binary matrix representing links between nodes (column and row names). It is a directed graph if the adjacency matrix is not symmetric.

library("igraph")
rownames(adjacency_matrix) <- 1:60
colnames(adjacency_matrix) <- 1:60
graph_object <- graph_from_adjacency_matrix(adjacency_matrix, mode = "directed")
graph_object
#> IGRAPH 078946c DN-- 60 1539 -- 
#> + attr: name (v/c)
#> + edges from 078946c (vertex names):
#>  [1] 1->1  1->2  1->3  1->4  1->5  1->6  1->7  1->8  1->9  1->10 1->11
#> [12] 1->12 1->13 1->14 1->15 1->16 1->18 1->20 1->25 1->31 1->33 1->36
#> [23] 1->37 1->38 1->39 1->40 1->55 2->1  2->2  2->3  2->5  2->6  2->7 
#> [34] 2->8  2->9  2->10 2->11 2->12 2->13 2->16 2->18 2->19 2->20 2->25
#> [45] 2->26 2->37 2->40 2->41 2->54 2->56 2->57 3->1  3->2  3->3  3->5 
#> [56] 3->6  3->7  3->8  3->10 3->11 3->12 3->13 3->14 3->15 3->16 3->17
#> [67] 3->18 3->19 3->20 3->22 3->26 3->27 3->34 3->36 3->39 3->55 4->1 
#> [78] 4->2  4->3  4->4  4->5  4->6  4->7  4->10 4->11 4->12 4->15 4->16
#> + ... omitted several edges

This represents the following graph structure.

plot(graph_object, vertex.color = "grey75")

plot of chunk unnamed-chunk-6

This can be a shared nearest neighbours matrix derived from a graph object.

library("igraph")
adjacency_matrix <- igraph::as_adjacency_matrix(graph_object)

Then the Leiden algorithm can be run on the adjacency matrix.

partition <- leiden(adjacency_matrix)
table(partition)
#> partition
#>  1  2  3 
#> 20 20 20

Here we can see partitions in the plotted results. The nodes that are more interconnected have been partitioned into separate clusters.

library("RColorBrewer")
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(graph_object, vertex.color = node.cols)

plot of chunk unnamed-chunk-11

Running Leiden with arguments passed to leidenalg

Arguments can be passed to the leidenalg implementation in Python:

#run with defaults
  partition <- leiden(adjacency_matrix)


#run with ModularityVertexPartition"
  partition <- leiden(adjacency_matrix, partition_type = "ModularityVertexPartition")


#run with resolution parameter
  partition <- leiden(adjacency_matrix, resolution_parameter = 0.95)

In particular, the resolution parameter can tune the number of clusters to be detected.

partition <- leiden(adjacency_matrix, resolution_parameter = 0.5)
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(graph_object, vertex.color = node.cols)

plot of chunk unnamed-chunk-13

partition <- leiden(adjacency_matrix, resolution_parameter = 1.8)
node.cols <- brewer.pal(max(c(3, partition)),"Pastel1")[partition]
plot(graph_object, vertex.color = node.cols)

plot of chunk unnamed-chunk-14

Weights for edges an also be passed to the leiden algorithm either as a separate vector or weights or a weighted adjacency matrix.

#generate example weights
weights <- sample(1:10, sum(adjacency_matrix!=0), replace=TRUE)
partition <- leiden(adjacency_matrix, weights = weights)
table(partition)
#> partition
#>  1  2  3 
#> 20 20 20
#generate example weighted matrix
adjacency_matrix[adjacency_matrix == 1] <- weights
partition <- leiden(adjacency_matrix)
table(partition)
#> partition
#>  1  2  3 
#> 20 20 20

See the documentation on the leidenalg Python module for more information: https://leidenalg.readthedocs.io/en/latest/reference.html

Running on a Seurat Object

Seurat version 2

To use Leiden with the Seurat pipeline for a Seurat Object object that has an SNN computed (for example with Seurat::FindClusters with save.SNN = TRUE). This will compute the Leiden clusters and add them to the Seurat Object Class.

adjacency_matrix <- as.matrix(object@snn)
membership <- leiden(adjacency_matrix)
object@ident <- as.factor(membership)
names(object@ident) <- rownames(object@meta.data)
object@meta.data$ident <- as.factor(membership)

Note that this code is designed for Seurat version 2 releases.

Seurat version 3 or later

Note that the object for Seurat version 3 has changed. For example an SNN can be generated:

library("Seurat")
FindClusters(pbmc_small)
membership <- leiden(pbmc_small@graphs$RNA_snn)
table(membership)

For Seurat version 3 objects, the Leiden algorithm has been implemented in the Seurat version 3 package with Seurat::FindClusters and algorithm = "leiden"). See the documentation for these functions.

FindClusters(pbmc_small, algorithm = "leiden")
table(pbmc_small@active.ident)