In network analysis, many community detection algorithms have been developed. However,their applications leave unaddressed one important question: **the statistical validation of the results**.

*robin* (ROBustness in Network) is an R package for the validation of community detection. It has a double aim: **to study the robustness** of a community detection algorithm and **to compare** the robustness of **different community detection algorithms** on the same network.

It provides: 1) a procedure to examine the robustness of a community detection algorithm against random perturbations of the original graph; 2) two tests to determine the statistical difference between stability measure curves; 3) a routine to choose among different community detection algorithms the one that better fits the network of interest; 4) a graphical interactive representation.

The input of the *robin* package is a network that can be read from different format: edgelist, pajek, graphml, gml, ncol, lgl, dimacs, graphdb and igraph graphs.

**prepGraph** function creates an *igraph* object from the input file. This step is necessary for *robin* execution

```
my_network <- system.file("example/football.gml", package="robin")
# downloaded from: http://www-personal.umich.edu/~mejn/netdata/
graph <- prepGraph(file=my_network, file.format="gml")
graph
```

```
## IGRAPH b2be2a0 U--- 115 613 --
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from b2be2a0:
## [1] 1-- 2 1-- 5 1-- 10 1-- 17 1-- 24 1-- 34 1-- 36 1-- 42 1-- 66 1-- 91
## [11] 1-- 94 1--105 2-- 26 2-- 28 2-- 34 2-- 38 2-- 46 2-- 58 2-- 90 2--102
## [21] 2--104 2--106 2--110 3-- 4 3-- 7 3-- 14 3-- 15 3-- 16 3-- 48 3-- 61
## [31] 3-- 65 3-- 73 3-- 75 3--101 3--107 4-- 6 4-- 12 4-- 27 4-- 41 4-- 53
## [41] 4-- 59 4-- 73 4-- 75 4-- 82 4-- 85 4--103 5-- 6 5-- 10 5-- 17 5-- 24
## [51] 5-- 29 5-- 42 5-- 70 5-- 94 5--105 5--109 6-- 11 6-- 12 6-- 53 6-- 75
## [61] 6-- 82 6-- 85 6-- 91 6-- 98 6-- 99 6--108 7-- 8 7-- 33 7-- 40 7-- 48
## [71] 7-- 56 7-- 59 7-- 61 7-- 65 7-- 86 7--101 7--107 8-- 9 8-- 22 8-- 23
## + ... omitted several edges
```

*robin* offers two choiches for the null model:

external building according to users’ preferences, then the null graph is passed as a variable;

generation by using the function

**random**.

**random** function creates the null model for the robustness procedure **robinRobust**. The graph argument must be the same returned by **prepGraph** function.

The function **random** creates a random graph with the same degree distribution of the original graph, but with completely random edges, using the **rewire** function of the package igraph with the option “keeping_degseq” that preserves the degree distribution of the original network. The function **rewire** randomly assigns a number of edges between vertices with the given degree distribution.

```
## IGRAPH b2ca72c U--- 115 613 --
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from b2ca72c:
## [1] 1-- 52 1-- 24 1--110 1-- 47 11--110 1-- 53 1--107 40-- 74 1-- 66
## [10] 54-- 63 35-- 80 83--105 2-- 26 63--108 2-- 34 83-- 87 2-- 60 2-- 58
## [19] 2-- 55 2--114 2-- 36 2-- 15 70--103 4-- 29 7--110 3-- 14 15--108
## [28] 64-- 83 3-- 48 3--100 91-- 99 55-- 94 75--104 66--101 3-- 85 64-- 85
## [37] 4-- 28 4--101 4-- 57 4-- 53 56-- 59 4-- 73 69--110 66-- 82 4-- 75
## [46] 4-- 54 87--106 5-- 41 5-- 17 81--100 5-- 12 5-- 42 5-- 27 5-- 94
## [55] 5--105 5-- 97 6-- 27 12--107 6-- 88 6-- 77 6-- 33 6-- 78 24-- 93
## [64] 17-- 98 6-- 99 6--102 7-- 8 33--102 6-- 7 7-- 48 7-- 56 7-- 59
## + ... omitted several edges
```

**plotGraph** function offers a graphical representation of the network with the aid of *networkD3* package.

**methodCommunity** function detects communities using all the algorithms implemented in *igraph* package: “walktrap”, “edgeBetweenness”, “fastGreedy”, “spinglass”, “leadingEigen”, “labelProp”, “infomap”, “optimal”, “other”.

```
## IGRAPH clustering fast greedy, groups: 6, mod: 0.55
## + groups:
## $`1`
## [1] 7 14 16 33 40 48 61 65 101 107
##
## $`2`
## [1] 8 9 10 17 22 23 24 42 47 50 52 54 68 69 74 78 79 89
## [19] 105 109 111 112 115
##
## $`3`
## [1] 1 2 20 26 30 31 34 36 38 46 56 80 81 83 90 94 95 102
## [19] 104 106 110
## + ... omitted several groups/vertices
```

**membershipCommunities** function detects the community membership.

```
## [1] 3 3 5 5 5 5 1 2 2 2 5 5 4 1 4 1 2 6 4 3 6 2 2 2 5 3 4 6 5 3 3 4 1 3 4 3 6
## [38] 3 4 1 5 2 6 4 6 3 2 1 6 2 5 2 5 2 4 3 6 6 6 6 1 4 6 6 1 6 6 2 2 5 6 4 5 2
## [75] 5 6 6 2 2 3 3 5 3 5 5 4 6 6 2 3 5 6 6 3 3 6 6 6 5 4 1 3 5 3 2 3 1 5 2 3 2
## [112] 2 6 6 2
```

**plotComm** function produces an interactive 3D plot of the communites detected by the chosen algorithm.

**robinRobust** function implements the validation of the community robustness. In this example we use “vi” distance as stability measure, “indipendent” type procedure and “louvain” as community detection algorithm.

Users can choose different measures: “nmi”,“split.join”, “adjusted.rand”.

```
proc <- robinRobust(graph=graph, graphRandom=graphRandom, measure="vi",
method="louvain", type="independent")
```

**plotRobin** function plots the stability measure curves. The (x,y)-axes represent the percentuage of perturbation and the average of the stability measure, respectively.

```
plotRobin(graph=graph, model1=proc$Mean, model2=proc$MeanRandom,
legend=c("real data", "null model"), measure="vi")
```

The differeces between the stability measure curves are tested using: 1) Functional Data Analysis (FDA); 2) Gaussian Process (GP). Moreover the Area Under the Curve (AUC) is evaluated too.

**robinFDATest** function implements the FDA testing.

```
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is simpler,
## and 'argvals' should be; now dim(argvals) = 13 ; dim(y) = 13 x 20
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"
```

```
## $ask
## [1] TRUE
```

`## [1] 0.2321 0.0104 0.0021 0.0021 0.0021 0.0021 0.0021 0.0042 0.0055`

The first figure represents the stability measure plot using “louvain” algorithm for detecting communities. The second one represents the corresponding adjusted p-values of the Interval Testing procedure. Horizontal red line corresponds to the critical value 0.05.

**robinGPTest** function implements the GP testing.

```
## Profile 1
## Profile 2
```

`## [1] 109.1337`

The output is the Bayes Factor.

**robinAUC** function implements the AUC.

```
## $area1
## [1] 0.1131984
##
## $area2
## [1] 0.2541
```

The outputs are the area under the two curves

**robinCompare** function compares two detection algorithms on the same network and permitts the user to choose the one that better fits the network of interest.

In this example we consider the “Fast Greedy” and “Louvain” algorithms.

We firstly plot the communities dectected by both algorithms.

```
membersFast <- membershipCommunities(graph=graph, method="fastGreedy")
membersLouv <- membershipCommunities(graph=graph, method="louvain")
plotComm(graph=graph, members=membersFast)
```

Secondly, we compare them with **robinCompare** function.

```
comp <- robinCompare(graph=graph, method1="fastGreedy",
method2="louvain", measure="vi", type="independent")
```

Thirdly, we plot the curves of the compared methods.

```
plotRobin(graph=graph, model1=comp$Mean1, model2=comp$Mean2, measure="vi",
legend=c("fastGreedy", "louvain"), title="FastGreedy vs Louvain")
```

In this example, the Louvain algorithm fits better the network of interest, as the curve of the stability measure assumes lower values than the one obtained by the Fast greedy method.

The following procedures test the statistical differeces between the two curves using two different methods

```
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is simpler,
## and 'argvals' should be; now dim(argvals) = 13 ; dim(y) = 13 x 20
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"
```

```
## $ask
## [1] TRUE
```

`## [1] 0.0478 0.0471 0.0034 0.0023 0.0056 0.0432 0.7487 0.3727 0.2772`

```
## Profile 1
## Profile 2
```

`## [1] 21.2705`

while

```
## $area1
## [1] 0.1682532
##
## $area2
## [1] 0.1187933
```

calculates the area under the curves.