GERGM -- Master: Travis-CI Build Status CRAN_Status_Badge Development: Travis-CI Build Status

An R package to estimate Generalized Exponential Random Graph Models. To get started, check out this vignette!

PLEASE REPORT ANY BUGS OR ERRORS TO .

News

[03/12/17] A number of minor updates to the GERGM package (mostly in the documentation) and bump to version 0.11.0 on CRAN:

[09/17/16] A major update to the package has been pushed to the public GERGM repo. Here are some highlights:

[08/06/16] A major update to all aspects of the package has been pushed to the public GERGM repo and will shortly be up on CRAN. Note that this version does break compatibility with estimation results generated by the previous version of the package. Here are some highlights:

Model Overview

An R package which implements the Generalized Exponential Random Graph Model (GERGM) with an extension to estimation via Metropolis Hastings. The relevant papers detailing the model can be found at the links below:

To maximize translation across related methods, we have followed many naming conventions used in the statnet suite of packages, and in the ergm package particularly.

Installation

Requirements for using C++ code with R

See the Requirements for using C++ code with R section in the following tutorial: Using C++ and R code Together with Rcpp. You will likely need to install either Xcode or Rtools depending on whether you are using a Mac or Windows machine before you can use the package.

Installing The Package

The easiest way to do this is to install the package from CRAN via the standard install.packages command:

install.packages("GERGM")

This will take care of some weird compilation issues that can arise, and is the best option for most people. If you want the most current development version of the package (available here), you will need to start by making sure you have Hadley Wickham's devtools package installed.

install.packages("devtools")

Now we can install from Github using the following line:

devtools::install_github("matthewjdenny/GERGM")

I have had success installing this way on most major operating systems with R 3.2.0+ installed, but if you do not have the latest version of R installed, or run into some install errors (please email if you do), it should work as long as you install the dependencies first with the following block of code:

install.packages( pkgs = c("BH","RcppArmadillo","ggplot2","methods",
"stringr","igraph", "plyr", "parallel", "coda", "vegan", "scales",
"RcppParallel","slackr"), dependencies = TRUE)

Once the GERGM package is installed, you may access its functionality as you would any other package by calling:

library(GERGM)

If all went well, check out the ?GERGM help file to see a full working example with info on how the data should look.

Basic Usage

To use this package, first load in the network you wish to use as a (square) matrix, following the example provided below. You may then use the gergm() function to estimate a model using any combination of the following statistics: out2stars(alpha = 1), in2stars(alpha = 1), ctriads(alpha = 1), mutual(alpha = 1), ttriads(alpha = 1), absdiff(covariate = "MyCov"), sender(covariate = "MyCov"), reciever(covariate = "MyCov"), nodematch(covariate), nodemix(covariate, base = "MyBase"), netcov(network), and edges(alpha = 1, method = c("regression","endogenous")). Note that the edges term must be specified if the user wishes to include an intercept (strongly recommended). The user may select the "regression" method (default) to include an intercept in the lambda transformation of the network, or "endogenous" to include the intercept as in a traditional ERGM model. To use exponential down-weighting for any of the network level terms, simply specify a value for alpha less than 1. The (alpha = 1) term may be omitted from the structural terms if no exponential down-weighting is required. In this case, the terms may be provided as: out2stars, in2stars, ctriads, mutual, ttriads, edges.

If the network is undirected the user may only specify the following terms: twostars(alpha = 1), ttriads(alpha = 1), absdiff(covariate = "MyCov"), sender(covariate = "MyCov"), nodematch(covariate), nodemix(covariate, base = "MyBase"), netcov(network), and edges(alpha = 1, method = c("regression","endogenous")). The gergm() function will provide all of the estimation and diagnostic functionality and the parameters of this function can be queried by typing ?gergm into the R console. You may also generate diagnostic plots using a GERGM Object returned by the gergm() function by using any of the following functions: Estimate_Plot(), GOF(), Trace_Plot(). The user may also investigate the sensitivity of parameter estimates using the hysteresis() function. This function simulates large numbers of networks at parameter values around the estimated parameter values an plots the mean network density at each of these values to examine whether the model becomes degenerate due to small deviations in the parameter estimates. See the following reference for details:

The user may also predict edge values (one at a time, conditioning on the rest of the edge values, and estimated parameters), using the conditional_edge_prediction() and conditional_edge_prediction_plot() functions.

Examples

Here are two simple working examples using the gergm() function:

library(GERGM)
########################### 1. No Covariates #############################
# Preparing an unbounded network without covariates for gergm estimation #
set.seed(12345)
net <- matrix(rnorm(100,0,20),10,10)
colnames(net) <- rownames(net) <- letters[1:10]
formula <- net ~ edges(method = "endogenous") + mutual + ttriads 
  
test <- gergm(formula,
              normalization_type = "division",
              network_is_directed = TRUE,
              number_of_networks_to_simulate = 40000,
              thin = 1/10,
              proposal_variance = 0.2,
              MCMC_burnin = 10000,
              seed = 456,
              convergence_tolerance = 0.01,
              force_x_theta_update = 4)
  
########################### 2. Covariates #############################
# Preparing an unbounded network with covariates for gergm estimation #
set.seed(12345)
net <- matrix(runif(100,0,1),10,10)
colnames(net) <- rownames(net) <- letters[1:10]
node_level_covariates <- data.frame(Age = c(25,30,34,27,36,39,27,28,35,40),
                                    Height = c(70,70,67,58,65,67,64,74,76,80),
                                    Type = c("A","B","B","A","A","A","B","B","C","C"))
rownames(node_level_covariates) <- letters[1:10]
network_covariate <- net + matrix(rnorm(100,0,.5),10,10)
formula <- net ~  edges(method = "regression") + mutual + ttriads + sender("Age") + 
netcov("network_covariate") + nodemix("Type",base = "A")  
   
test <- gergm(formula,
              covariate_data = node_level_covariates,
              number_of_networks_to_simulate = 100000,
              thin = 1/10,
              proposal_variance = 0.2,
              MCMC_burnin = 50000,
              seed = 456,
              convergence_tolerance = 0.01,
              force_x_theta_update = 2)
  
# Generate Estimate Plot
Estimate_Plot(test)
# Generate GOF Plot
GOF(test)
# Generate Trace Plot
Trace_Plot(test)
# Generate Hysteresis plots for all structural parameter estimates
hysteresis_results <- hysteresis(test,
                                 networks_to_simulate = 1000,
                                 burnin = 500,
                                 range = 2,
                                 steps = 20,
                                 simulation_method = "Metropolis",
                                 proposal_variance = 0.2)
                                 

Edge Prediction

Following on from the example above, we can also predict individual edge values, conditioning on the rest of the observed edges and estimated parameters. These predictions can then be plotted as a grid of box plots for each individual edge, as in the following example code:

test2 <- conditional_edge_prediction(
  GERGM_Object = test,
  number_of_networks_to_simulate = 1000,
  thin = 1,
  proposal_variance = 0.5,
  MCMC_burnin = 1000,
  seed = 123)

# set working directory where we want to save the file. Only outputs a PDF
# because plotting is too slow in the graphics device.
setwd("~/Desktop")
conditional_edge_prediction_plot(edge_prediction_results = test2,
                                 filename = "test.pdf",
                                 plot_size = 20,
                                 node_name_cex = 6)

# make a scatter plot of predicted vs observed edge values
conditional_edge_prediction_correlation_plot(edge_prediction_results = test2)

# set x and y limits manually
conditional_edge_prediction_correlation_plot(edge_prediction_results = test2,
                                             xlim = c(-10,10),
                                             ylim = c(-10,10))

Parallel GERGM Specification Fitting

There is also now functionality to run multiple gergm() model specifications in parallel using the parallel_gergm() function. This can come in very handy if the user wishes to specify the same model but for a large number of networks, or multiple models for the same network. Another useful (experimental) feature that can now be turned out is hyperparameter_optimization = TRUE, which will seek to automatically optimize the number of networks simulated during MCMC, the burnin, the Metropolis Hastings proposal variance and will seek to address any issues with model degeneracy that arise during estimation by reducing exponential weights if using Metropolis Hastings. This feature is generally meant to make it easier and less time intensive to find a model that fits the data well.

set.seed(12345)
net <- matrix(runif(100,0,1),10,10)
colnames(net) <- rownames(net) <- letters[1:10]
node_level_covariates <- data.frame(Age = c(25,30,34,27,36,39,27,28,35,40),
                           Height = c(70,70,67,58,65,67,64,74,76,80),
                           Type = c("A","B","B","A","A","A","B","B","C","C"))
rownames(node_level_covariates) <- letters[1:10]
network_covariate <- net + matrix(rnorm(100,0,.5),10,10)

network_data_list <- list(network_covariate = network_covariate)

formula <- net ~ edges + netcov("network_covariate") + nodematch("Type",base = "A")
formula2 <- net ~ edges + sender("Age") +
  netcov("network_covariate") + nodemix("Type",base = "A")

form_list <- list(f1 = formula,
                  f2 = formula2)

testp <- parallel_gergm(formula_list = form_list,
                        observed_network_list = net,
                        covariate_data_list = node_level_covariates,
                        network_data_list = network_data_list,
                        cores = 2,
                        number_of_networks_to_simulate = 10000,
                        thin = 1/100,
                        proposal_variance = 0.1,
                        MCMC_burnin = 5000)

Finally, if you specified an output_directory and output_name, you will want to check the output_directory which will contain a number of .pdf's which can aide in assessing model fit and in determining the statistical significance of theta parameter estimates.

Output

If output_name is specified in the gergm() function, then five files will be automatically generated and saved to the output_directory. The example file names provided below are for output_name = "Test":

Advanced Features

The GERGM package incorporates a number of advanced features that are designed to aid in model fitting and evaluation, and expand the applicability of the model to a wider range of networks and network statistics.

gergm() optional arguments

Testing

So far, this package has been tested successfully on OSX, CentOS 7, Ubuntu and Windows 7. Please email me at mdenny@psu.edu if you have success on another OS or run into any problems.

Intellectual Property

Please note that the use of the correlation network estimation functionality in this package is protected from unlicensed commercial use in neuroimaging applications under a provisional patent. Please contact mdenny@psu.edu for more information if your company is interested in commercial neuroimaging applications using the GERGM.