BDgraph with Simple Examples

Reza Mohammadi (https://orcid.org/0000-0001-9538-0648)

2022-09-28

The R package BDgraph provides statistical tools for Bayesian structure learning in undirected graphical models with continuous, count, binary, and mixed data. The package implements recent improvements from the Bayesian graphical models literature, including Mohammadi and Wit (2015), Mohammadi et al. (2021), Mohammadi et al. (2017), and Dobra and Mohammadi (2018). In addition, the package contains several functions for simulation and visualization, as well as several multivariate datasets taken from the literature.

Install BDgraph using

install.packages( "BDgraph" )

First, we load the BDgraph package, as well as pROC and ggplot2:

library( BDgraph )

library( pROC )    
library( ggplot2 )  

Here are two simple examples to show how to use the functionality of the package.

Example 1: Gaussian Graphical Models

This example illustrates the performance of the package for Gaussian graphical models. First, using the function bdgraph.sim(), we simulate 200 observations (n = 200) from a multivariate Gaussian distribution with 15 variables (p = 15) and a “scale-free” graph structure, as follows

set.seed( 20 )

data.sim = bdgraph.sim( n = 200, p = 15, graph = "scale-free", vis = TRUE )

Since the generated data are Gaussian, we run the bdgraph() function by choosing method = "ggm", as follows

bdgraph.obj = bdgraph( data = data.sim, method = "ggm", iter = 5000, verbose = FALSE )
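Before summarizing the fit, it can help to look at the estimated edge inclusion probabilities directly. The following is a minimal sketch, assuming that the plinks() and select() functions behave as described in the BDgraph documentation of the installed version. The names toy.sim, toy.fit, toy.plinks, and toy.graph are hypothetical, and the smaller problem (p = 5, 1000 iterations) is chosen only so that the sketch runs quickly and does not overwrite the objects above.

```r
library( BDgraph )

set.seed( 1 )

# Small hypothetical problem, separate from the example in the text.
toy.sim = bdgraph.sim( n = 100, p = 5, graph = "random", vis = FALSE )
toy.fit = bdgraph( data = toy.sim, method = "ggm", iter = 1000, verbose = FALSE )

# Estimated posterior edge inclusion probabilities ( p x p matrix ).
toy.plinks = BDgraph::plinks( toy.fit, round = 2 )

# Adjacency matrix of the graph that keeps edges with probability above 0.5.
toy.graph = BDgraph::select( toy.fit, cut = 0.5 )

print( toy.plinks )
print( toy.graph )
```

The same two calls can be applied to bdgraph.obj; the cut value plays the same role as the cutoff argument used below.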

To report the confusion matrix with a cutoff point of 0.5:

conf.mat( actual = data.sim, pred = bdgraph.obj, cutoff = 0.5 )
            Actual
  Prediction  0  1
           0 89  4
           1  2 10

conf.mat.plot( actual = data.sim, pred = bdgraph.obj, cutoff = 0.5 )
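To make the role of the cutoff concrete, here is a minimal base-R sketch of how thresholding edge inclusion probabilities leads to a confusion matrix of the same shape as the one above. The 4-node graph and the probabilities in p.links.toy are invented for illustration and are not output of bdgraph(); the internals of conf.mat() may differ in detail (for example, in how ties at the cutoff are handled).

```r
# Hypothetical 4-node example; all values are invented for illustration.
true.adj = matrix( 0, 4, 4 )
true.adj[ 1, 2 ] = 1
true.adj[ 2, 3 ] = 1
true.adj[ 3, 4 ] = 1

p.links.toy = matrix( 0, 4, 4 )
p.links.toy[ 1, 2 ] = 0.95
p.links.toy[ 1, 3 ] = 0.10
p.links.toy[ 1, 4 ] = 0.60
p.links.toy[ 2, 3 ] = 0.40
p.links.toy[ 2, 4 ] = 0.05
p.links.toy[ 3, 4 ] = 0.80

# Each undirected edge is counted once, via the upper triangle.
upper  = upper.tri( true.adj )
actual = true.adj[ upper ]
pred   = as.numeric( p.links.toy[ upper ] > 0.5 )

# Cross-tabulate predicted against actual edge indicators.
toy.conf.mat = table( Prediction = factor( pred,   levels = c( 0, 1 ) ),
                      Actual     = factor( actual, levels = c( 0, 1 ) ) )
print( toy.conf.mat )
```

Raising the cutoff moves edges from the Prediction = 1 row to the Prediction = 0 row, trading false positives for false negatives.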

To compare the result with the true graph:

compare( data.sim, bdgraph.obj, main = c( "Target", "BDgraph" ), vis = TRUE )

                 Target BDgraph
  true positive      14  10.000
  true negative      91  89.000
  false positive      0   2.000
  false negative      0   4.000
  F1-score            1   0.769
  specificity         1   0.978
  sensitivity         1   0.714
  MCC                 1   0.740
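The summary measures in this table follow from the four counts in the BDgraph column. As a check, the sketch below recomputes them in base R using the standard definitions of sensitivity, specificity, F1-score, and the Matthews correlation coefficient (MCC). The variable names are illustrative, and this is not necessarily how compare() computes them internally.

```r
# Counts taken from the BDgraph column of the table above.
tp = 10   # true positives
tn = 89   # true negatives
fp = 2    # false positives
fn = 4    # false negatives

sensitivity = tp / ( tp + fn )
specificity = tn / ( tn + fp )
f1.score    = 2 * tp / ( 2 * tp + fp + fn )
mcc         = ( tp * tn - fp * fn ) /
              sqrt( ( tp + fp ) * ( tp + fn ) * ( tn + fp ) * ( tn + fn ) )

metrics = round( c( F1 = f1.score, specificity = specificity,
                    sensitivity = sensitivity, MCC = mcc ), 3 )
print( metrics )
```

Rounded to three decimals, these values agree with the F1-score, specificity, sensitivity, and MCC rows reported by compare().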

Now, as an alternative, we run the bdgraph.mpl() function, which is based on GGMs and the marginal pseudo-likelihood, as follows

bdgraph.mpl.obj = bdgraph.mpl( data = data.sim, method = "ggm", iter = 5000, verbose = FALSE )

conf.mat( actual = data.sim, pred = bdgraph.mpl.obj )
            Actual
  Prediction  0  1
           0 89  4
           1  2 10

conf.mat.plot( actual = data.sim, pred = bdgraph.mpl.obj )

We can compare the results of both algorithms with the true graph as follows

compare( data.sim, list( bdgraph.obj, bdgraph.mpl.obj ), 
         main = c( "Target", "BDgraph", "BDgraph.mpl" ), vis = TRUE )

                 Target BDgraph BDgraph.mpl
  true positive      14  10.000      10.000
  true negative      91  89.000      89.000
  false positive      0   2.000       2.000
  false negative      0   4.000       4.000
  F1-score            1   0.769       0.769
  specificity         1   0.978       0.978
  sensitivity         1   0.714       0.714
  MCC                 1   0.740       0.740

To see the performance of the BDMCMC algorithm, we can plot the ROC curves as follows

roc.bdgraph     = BDgraph::roc( pred = bdgraph.obj,     actual = data.sim )
roc.bdgraph.mpl = BDgraph::roc( pred = bdgraph.mpl.obj, actual = data.sim )

pROC::ggroc( list( BDgraph = roc.bdgraph, BDgraph.mpl = roc.bdgraph.mpl ), size = 0.8 ) + 
  theme_minimal() + ggtitle( "ROC plots with AUC" ) +
  scale_color_manual( values = c( "red", "blue" ), 
    labels = c( paste( "AUC=", round( auc( roc.bdgraph ), 3 ), "; BDgraph; " ),
                paste( "AUC=", round( auc( roc.bdgraph.mpl ), 3 ), "; BDgraph.mpl" ) ) ) +
  theme( legend.title = element_blank() ) +
  theme( legend.position = c( .7, .3 ), text = element_text( size = 17 ) ) + 
  geom_segment( aes( x = 1, xend = 0, y = 0, yend = 1 ), color = "grey", linetype = "dashed" )
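An ROC curve traces the true positive rate against the false positive rate as the cutoff on the edge inclusion probabilities varies, and the AUC is the area under that curve. The base-R sketch below illustrates the idea on a handful of invented edge probabilities; the vectors actual and prob and the name auc.toy are hypothetical, and the sketch is not meant to reproduce the internals of BDgraph::roc() or pROC.

```r
# Hypothetical edge-level data: true edge indicators and invented probabilities.
actual = c( 1, 0, 1, 0, 0, 1 )
prob   = c( 0.95, 0.10, 0.40, 0.60, 0.05, 0.80 )

# Sweep the cutoff from "select nothing" down to "select everything".
cutoffs = c( Inf, sort( prob, decreasing = TRUE ) )

tpr = sapply( cutoffs, function( cut ) sum( prob >= cut & actual == 1 ) / sum( actual == 1 ) )
fpr = sapply( cutoffs, function( cut ) sum( prob >= cut & actual == 0 ) / sum( actual == 0 ) )

# Area under the piecewise-linear curve, using the trapezoidal rule.
auc.toy = sum( diff( fpr ) * ( head( tpr, -1 ) + tail( tpr, -1 ) ) / 2 )

print( data.frame( cutoff = cutoffs, fpr = fpr, tpr = tpr ) )
print( auc.toy )
```

An AUC close to 1 means that true edges tend to receive higher inclusion probabilities than non-edges, whereas the dashed diagonal in the plot corresponds to an AUC of 0.5.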

Example 2: Gaussian Copula Graphical Models

This example illustrates the performance of the package for mixed data using Gaussian copula graphical models. First, using the function bdgraph.sim(), we simulate 300 observations (n = 300) from mixed data (type = "mixed") with 10 variables (p = 10) and a “random” graph structure, as follows

set.seed( 2 )

data.sim = bdgraph.sim( n = 300, p = 10, type = "mixed", graph = "random", vis = TRUE )

Since the generated data are mixed, we run the bdgraph() function by choosing method = "gcgm", as follows:

bdgraph.obj = bdgraph( data = data.sim, method = "gcgm", iter = 5000, verbose = FALSE )

To compare the result with the true graph, we could run

compare( data.sim, bdgraph.obj, main = c( "Target", "BDgraph" ), vis = TRUE )

                 Target BDgraph
  true positive      12   9.000
  true negative      33  29.000
  false positive      0   4.000
  false negative      0   3.000
  F1-score            1   0.720
  specificity         1   0.879
  sensitivity         1   0.750
  MCC                 1   0.613

For more examples, see Mohammadi and Wit (2019).