library(redist)
library(igraph)
#> 
#> Attaching package: 'igraph'
#> The following objects are masked from 'package:stats':
#> 
#>     decompose, spectrum
#> The following object is masked from 'package:base':
#> 
#>     union
library(spdep)
#> Warning: package 'spdep' was built under R version 3.6.2
#> Loading required package: sp
#> Warning: package 'sp' was built under R version 3.6.2
#> Loading required package: spData
#> Warning: package 'spData' was built under R version 3.6.2
#> To access larger datasets in this package, install the spDataLarge
#> package with: `install.packages('spDataLarge',
#> repos='https://nowosad.github.io/drat/', type='source')`
#> Loading required package: sf
#> Warning: package 'sf' was built under R version 3.6.2
#> Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
library(coda)
set.seed(1)

The redist package is designed to allow for replicable redistricting simulations. The package comes loaded with data for simple testing and with functions to simulate redistricting and to run diagnostics on the redistricting plans created. These data form the basis for small-scale validations of sampling methods in Automated Redistricting Simulation Using Markov Chain Monte Carlo. For larger scale validation, see The Essential Role of Empirical Validation in Legislative Redistricting Simulation, which has additional methods which will be added to this package later in 2020.

Table of Contents {#top}

Loading redist

redist can be installed either from CRAN, for the stable version with:

install.packages('redist')

or from GitHub, which is updated more often:

devtools::install_github(repo = 'kosukeimai/redist', ref = 'master')

Helpful Packages to be used with redist.

This package is often used with a set of other packages: sf and sp are useful for working with shapefiles and can be loaded with:

library(sf)
library(sp)

For additional functions for working with the shapefiles, using spdep is recommended. This is also used with creating lists of which precincts are adjacent to other precincts. It can be loaded with:

library(spdep)

For plotting maps and adjacency graphs, ggplot2 and igraph are useful. The former allows for making ggplot maps, when used with sf. These may be loaded with:

library(igraph)
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 3.6.2

Additionally useful data manipulation tools for working with objects from the package are dplyr and magrittr.

library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:igraph':
#> 
#>     as_data_frame, groups, union
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)

Included Data {#data}

The package contains four datasets. The first three are algdat.p10, algdat.p20, and algdat.pfull which contain data for 25 continuous precincts within Florida when partitioned into three districts. Respectively, these are those plans which fall within 10% population parity, 20% population parity, and all possible partitions. These are loaded in with the package and can be loaded as follows:

data("algdat.p10")
data("algdat.p20")
data("algdat.pfull")

Each algdat object is a list with five objects:

names(algdat.p10)
#> [1] "adjlist"           "cdmat"             "precinct.data"    
#> [4] "segregation.index" "distancemat"
names(algdat.p20)
#> [1] "adjlist"           "cdmat"             "precinct.data"    
#> [4] "segregation.index" "distancemat"
names(algdat.pfull)
#> [1] "adjlist"           "cdmat"             "precinct.data"    
#> [4] "segregation.index" "distancemat"

The fourth data set is the sf dataframe, containing the shapes themselves for creating these objects. It can be called with:

data("fl25")

From the head of the data, we can see this is a typical dataframe with a geometry column appended. See the website for sf for more information about this type of object.

head(fl25)
#> Simple feature collection with 6 features and 11 fields
#> geometry type:  POLYGON
#> dimension:      XY
#> bbox:           xmin: -81.85582 ymin: 26.4483 xmax: -81.56292 ymax: 26.7171
#> CRS:            NA
#>       geoid10   pop   vap obama mccain TotPop BlackPop HispPop   VAP BlackVAP
#> 2942 2519.0_0 15993 12379   647   1522  15993      863    1980 12379      582
#> 2561 2402.0_0  7799  6084   266    226   7799     2417    1600  6084     1601
#> 2581 2520.0_0  6347  5698   758   1846   6347       97     304  5698       78
#> 2594 2532.0_0  3808  3134   424    828   3808       41     180  3134       27
#> 2687 2438.0_0  4982  4173   668    842   4982      316     837  4173      199
#> 2732 2521.0_0  8907  7587   644    935   8907      195    1147  7587      153
#>      HispVAP                       geometry
#> 2942    1292 POLYGON ((-81.78102 26.6274...
#> 2561    1133 POLYGON ((-81.80102 26.6341...
#> 2581     240 POLYGON ((-81.79772 26.5093...
#> 2594     134 POLYGON ((-81.79842 26.5474...
#> 2687     545 POLYGON ((-81.75042 26.7038...
#> 2732     860 POLYGON ((-81.66622 26.4657...

Each column contains useful information for typical redistricting questions:

Adjacency-Based Redistricting {#adj}

We begin with a simple example of thinking about adjacency-based redistricting, using a small piece of Florida, named fl25. It can be loaded with the data command as follows. For more information on the data, see the data section

data("fl25")

This Florida subset contains the shapefiles for the districts, which can be plotted with sf.

fl25 %>% ggplot(aes(fill = pop)) + 
  geom_sf()

plot of chunk unnamed-chunk-11

In this map, the color represents the population, where darker colors are smaller populations. The 25 shapes outlined above are the precincts that we work with. A difficulty in redistricting is that the area of the precinct does not necessarily tell us anything about the population of the precinct. As such, we work with adjacency lists for the precincts.

If we number the districts from 1 to 25, we say that two districts are connected if they share a side, which is referred to as rook contiguity.

If we arbitrarily number the districts from the plot above, we can pick a small subset of them.

fl25$id <- 1:25
fl25[c(18,19,23:25),] %>% ggplot() + 
  geom_sf() +
  geom_sf_label(aes(label = id))

plot of chunk unnamed-chunk-12

Then, the 25th precinct by this numbering is connected to 19, 23, and 24. So, the adjacency list would be 19,23,24. 18 would not be on this list, as it does not share a portion of a side with precinct 25.

To make an adjacency list, we may use the packages spdep with function poly2nb. We use this function with queen = FALSE, as we want the rooks adjacency, not the queens adjacency, which would allow for if only corners touch, like on a chess board.

adjlist <- poly2nb(pl = fl25, queen = FALSE)

We can further verify the above comments by looking at the 25th element of the adjacency list, which corresponds to the 25th district.

adjlist[[25]]
#> [1]  5 22 24

Using igraph, we can easily plot the whole set of adjacencies:

plot(graph_from_adj_list(adjlist, mode = 'total'))

plot of chunk unnamed-chunk-15

While these are numbered in 1:25, the back end in C++ requires 0 indexing for efficiency, so we sink the adjacency list to be 0:24.

for(i in 1:25){
  adjlist[[i]] <- adjlist[[i]]-1
}

Thus, everything is the same up to naming.

adjlist[[25]]
#> [1]  4 21 23

Each algorithm within the package maintains geographically contiguous districts. The MCMC and enumerate algorithms consider the underlying graphical structure of the districts, where contiguity is represented by the edges in the above graph.

Back to top

Redistricting with MCMC {#mcmc}

To begin running the MCMC algorithm, we have to provide some basic data. We provide the adjacency list to adjobj, a vector of populations to popvec, the number of districts as 3 to ndists, and choose to run 10000 simulations with nsims. We can also save the output with savename.

alg_mcmc <- redist.mcmc(adjobj = algdat.pfull$adjlist,
                        popvec = algdat.pfull$precinct.data$pop,
                        ndists = 3,
                        nsims = 10000,
                        savename = "redist.mcmc")
#> 
#> ==================== 
#> redist.mcmc(): Automated Redistricting Simulation Using
#>          Markov Chain Monte Carlo
#> 
#> Preprocessing data.
#> 
#> 
#> ==================== 
#> Using redist.rsg() to generate starting values.
#> 
#> 10 percent done.
#> Metropolis acceptance ratio: 0.962963
#> 
#> 20 percent done.
#> Metropolis acceptance ratio: 0.964982
#> 
#> 30 percent done.
#> Metropolis acceptance ratio: 0.962654
#> 
#> 40 percent done.
#> Metropolis acceptance ratio: 0.95949
#> 
#> 50 percent done.
#> Metropolis acceptance ratio: 0.962993
#> 
#> 60 percent done.
#> Metropolis acceptance ratio: 0.962827
#> 
#> 70 percent done.
#> Metropolis acceptance ratio: 0.963138
#> 
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96237
#> 
#> 90 percent done.
#> Metropolis acceptance ratio: 0.962551
#> 
#> 100 percent done.
#> Metropolis acceptance ratio: 0.962796

Note that we do not need to specify ndists when we supply a different argument, initcds, which is the set of districts to initialize to. If we do not specific initcds, the RSG function is run to initialize districts. When tuning parameters, the initcds argument may be useful for ensuring diverse starting positions for different chains, though this is not typically necessary.

We can specify initcds, for example, as the first column of the full enumeration in algdat.pfull.

initcds <- algdat.pfull$cdmat[,1]
alg_mcmc <- redist.mcmc(adjobj = algdat.pfull$adjlist,
                        popvec = algdat.pfull$precinct.data$pop,
                        initcds = initcds,
                        nsims = 10000,
                        savename = "redist.mcmc")
#> 
#> ==================== 
#> redist.mcmc(): Automated Redistricting Simulation Using
#>          Markov Chain Monte Carlo
#> 
#> Preprocessing data.
#> 
#> 10 percent done.
#> Metropolis acceptance ratio: 0.958959
#> 
#> 20 percent done.
#> Metropolis acceptance ratio: 0.962481
#> 
#> 30 percent done.
#> Metropolis acceptance ratio: 0.960654
#> 
#> 40 percent done.
#> Metropolis acceptance ratio: 0.963241
#> 
#> 50 percent done.
#> Metropolis acceptance ratio: 0.962993
#> 
#> 60 percent done.
#> Metropolis acceptance ratio: 0.961327
#> 
#> 70 percent done.
#> Metropolis acceptance ratio: 0.961566
#> 
#> 80 percent done.
#> Metropolis acceptance ratio: 0.960995
#> 
#> 90 percent done.
#> Metropolis acceptance ratio: 0.961662
#> 
#> 100 percent done.
#> Metropolis acceptance ratio: 0.961096

Once we've run the algorithm, the output is of class redist. As savename was provided, there is an Rdata file created in the working directory with a copy of the output.

class(alg_mcmc)
#> [1] "redist"
names(alg_mcmc)
#>  [1] "partitions"             "distance_parity"        "distance_original"     
#>  [4] "mhdecisions"            "mhprob"                 "pparam"                
#>  [7] "beta_sequence"          "energy_psi"             "constraint_pop"        
#> [10] "constraint_compact"     "constraint_segregation" "constraint_similar"    
#> [13] "constraint_countysplit" "boundary_partitions"    "boundaryratio"

For simple runs of the algorithm, the most important pieces of the output are partitions and distance_parity. The partitions object is a matrix with ndist rows and nsims columns. Each column is one redistricting plan, where the numbers go from 0 to n-1, with each number represents the district assignment.

alg_mcmc$partitions[,1]
#>  [1] 0 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2

The distance_parity output provides the population parity of the districts as an array with nsims entries.

alg_mcmc$distance_parity[1]
#> [1] 1.538696

The additional outputs are useful for implementing the algorithm with constraints.

Back to top

Redistricting with MCMC using MPI {#mpi}

The running of this function is the same as in redist.mcmc, but it requires Rmpi installed.

library(Rmpi)
redist.mcmc.mpi(adjobj = algdat.pfull$adjlist,
                popvec = algdat.pfull$precinct.data$pop,
                nsims = 10000, 
                ndists = 3,
                savename = "redist.mcmc.mpi")

Outputs and usage are the same as in the MCMC Section, but MPI will allow for much faster computation.

Back to top

Using Multiple Chains {#chains}

When running larger redistricting analyses, one important step is to run multiple chains of the MCMC algorithm. This will also allow us to diagnose convergence better, using the Gelman-Rubin plot, as seen in the section on Diagnostic Plots.

On Windows and in smaller capacities, it is useful to run the algorithm within an lapply loop. First, we set up the seed for replicability and decide on the number of chains and simulations.

RNGkind(kind = "L'Ecuyer-CMRG")
set.seed(1)
nchains <- 4
nsims <- 10000
mcmc_chains <- lapply(1:nchains, function(x){
          redist.mcmc(adjobj = algdat.pfull$adjlist, 
                      popvec = algdat.pfull$precinct.data$pop, 
                      nsims = nsims,
                      ndists = 3)
})
#> 
#> ==================== 
#> redist.mcmc(): Automated Redistricting Simulation Using
#>          Markov Chain Monte Carlo
#> 
#> Preprocessing data.
#> 
#> 
#> ==================== 
#> Using redist.rsg() to generate starting values.
#> 
#> 10 percent done.
#> Metropolis acceptance ratio: 0.966967
#> 
#> 20 percent done.
#> Metropolis acceptance ratio: 0.966483
#> 
#> 30 percent done.
#> Metropolis acceptance ratio: 0.964321
#> 
#> 40 percent done.
#> Metropolis acceptance ratio: 0.964491
#> 
#> 50 percent done.
#> Metropolis acceptance ratio: 0.964993
#> 
#> 60 percent done.
#> Metropolis acceptance ratio: 0.964827
#> 
#> 70 percent done.
#> Metropolis acceptance ratio: 0.963709
#> 
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96362
#> 
#> 90 percent done.
#> Metropolis acceptance ratio: 0.964996
#> 
#> 100 percent done.
#> Metropolis acceptance ratio: 0.964796
#> 
#> 
#> ==================== 
#> redist.mcmc(): Automated Redistricting Simulation Using
#>          Markov Chain Monte Carlo
#> 
#> Preprocessing data.
#> 
#> 
#> ==================== 
#> Using redist.rsg() to generate starting values.
#> 
#> 10 percent done.
#> Metropolis acceptance ratio: 0.96997
#> 
#> 20 percent done.
#> Metropolis acceptance ratio: 0.964982
#> 
#> 30 percent done.
#> Metropolis acceptance ratio: 0.969323
#> 
#> 40 percent done.
#> Metropolis acceptance ratio: 0.965491
#> 
#> 50 percent done.
#> Metropolis acceptance ratio: 0.965993
#> 
#> 60 percent done.
#> Metropolis acceptance ratio: 0.963994
#> 
#> 70 percent done.
#> Metropolis acceptance ratio: 0.963852
#> 
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96362
#> 
#> 90 percent done.
#> Metropolis acceptance ratio: 0.964774
#> 
#> 100 percent done.
#> Metropolis acceptance ratio: 0.964896
#> 
#> 
#> ==================== 
#> redist.mcmc(): Automated Redistricting Simulation Using
#>          Markov Chain Monte Carlo
#> 
#> Preprocessing data.
#> 
#> 
#> ==================== 
#> Using redist.rsg() to generate starting values.
#> 
#> 10 percent done.
#> Metropolis acceptance ratio: 0.974975
#> 
#> 20 percent done.
#> Metropolis acceptance ratio: 0.964982
#> 
#> 30 percent done.
#> Metropolis acceptance ratio: 0.964988
#> 
#> 40 percent done.
#> Metropolis acceptance ratio: 0.964241
#> 
#> 50 percent done.
#> Metropolis acceptance ratio: 0.964393
#> 
#> 60 percent done.
#> Metropolis acceptance ratio: 0.965494
#> 
#> 70 percent done.
#> Metropolis acceptance ratio: 0.965424
#> 
#> 80 percent done.
#> Metropolis acceptance ratio: 0.964996
#> 
#> 90 percent done.
#> Metropolis acceptance ratio: 0.964774
#> 
#> 100 percent done.
#> Metropolis acceptance ratio: 0.965097
#> 
#> 
#> ==================== 
#> redist.mcmc(): Automated Redistricting Simulation Using
#>          Markov Chain Monte Carlo
#> 
#> Preprocessing data.
#> 
#> 
#> ==================== 
#> Using redist.rsg() to generate starting values.
#> 
#> 10 percent done.
#> Metropolis acceptance ratio: 0.951952
#> 
#> 20 percent done.
#> Metropolis acceptance ratio: 0.956478
#> 
#> 30 percent done.
#> Metropolis acceptance ratio: 0.95932
#> 
#> 40 percent done.
#> Metropolis acceptance ratio: 0.95999
#> 
#> 50 percent done.
#> Metropolis acceptance ratio: 0.960992
#> 
#> 60 percent done.
#> Metropolis acceptance ratio: 0.960327
#> 
#> 70 percent done.
#> Metropolis acceptance ratio: 0.961423
#> 
#> 80 percent done.
#> Metropolis acceptance ratio: 0.96162
#> 
#> 90 percent done.
#> Metropolis acceptance ratio: 0.961218
#> 
#> 100 percent done.
#> Metropolis acceptance ratio: 0.961896

In unix-based systems, this can be run considerably faster by running this in parallel.

mcmc_chains <- parallel::mclapply(1:nchains, function(x){
          redist.mcmc(adjobj = algdat.pfull$adjlist, 
                      popvec = algdat.pfull$precinct.data$pop, 
                      nsims = nsims,
                      ndists = 3)
}, mc.set.seed = 1, mc.cores = parallel::detectCores())

Back to top

Random Seed Growth {#rsg}

This package also contains an implementation of Chen and Rodden's non-compact Random Seed and Grow redistricting algorithm from their paper Unintentional Gerrymandering: Political Geography and Electoral Biases in Legislatures.

The adjacency list, via adj.list, is created as in Adjacency-Based Redistricting, population is the population of the districts, ndists is the number of districts, and thresh is the allowed population parity. A thresh of 0.05 allows for 5\% population parity. In easier redistricting questions, a small maxiter is typically sufficient, but in large maps, values may be closer to 50,000 to ensure that a solution is found.

rsg <- redist.rsg(adj.list = algdat.pfull$adjlist,
                  population = algdat.pfull$precinct.data$pop,
                  ndists = 3,
                  thresh = 0.05, 
                  maxiter = 5000)
#> 
#> ==================== 
#> redist.rsg(): Automated Redistricting Starts
#> 
#> 
#>  3 districts built using 25 precincts in 0 seconds...

The output of redist.rsg is a list with three objects. district_membership provides a numeric array with indices for which district each precinct belongs to.

rsg$district_membership
#>  [1] 2 1 2 2 1 2 1 2 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 1 1

district_list will provide an alternate formulation of this with ndists arrays, indicating which precincts are in each district.

rsg$district_list
#> [[1]]
#> [1]  9 11 14 13 10  8 12
#> 
#> [[2]]
#>  [1] 15 19 18 17 23 24 16 21  1  4  6 20
#> 
#> [[3]]
#> [1] 22  2  0  7  5  3

Corresponding to this, district_pop will give the district population for each of the ndists districts.

rsg$district_pop
#> [1] 56227 60047 58769

Back to top

Segregation Index {#seg}

To evaluate redistricting plans, the redist package comes with the redist.segcalc function. This computes the segregation index introduced in Massey and Denton 1987.

It takes three arguments, algout, which is a redist object from the various mcmc function or rsg function, grouppop which is the population of the group being used for comparison, and fullpop is the total population of the precinct.

seg <- redist.segcalc(algout = alg_mcmc, 
                      grouppop = algdat.pfull$precinct.data$blackpop,
                      fullpop = algdat.pfull$precinct.data$pop)

This returns a numeric vector with the an entry for each district provided.

Back to top

Diagnostic Plots {#diag}

When using the MCMC algorithms, there are various useful diagnostic plots. The redist.diagplot function creates familiar plots by converting numeric entries into mcmc objects to use with coda.

The first four plots take a single index, such as one created by redist.segcalc.

redist.diagplot(seg, plot = "trace")

plot of chunk unnamed-chunk-33

redist.diagplot(seg, plot = "autocorr")

plot of chunk unnamed-chunk-34

redist.diagplot(seg, plot = "densplot")

plot of chunk unnamed-chunk-35

redist.diagplot(seg, plot = "mean")

plot of chunk unnamed-chunk-36

The final plot requires at least two chains.

seg_chains <- lapply(1:nchains, 
                     function(i){redist.segcalc(algout = mcmc_chains[[i]], 
                                  grouppop = algdat.pfull$precinct.data$blackpop,
                                  fullpop = algdat.pfull$precinct.data$pop)})

redist.diagplot(seg_chains, plot = 'gelmanrubin')

plot of chunk unnamed-chunk-37 Beware that this diagnostic will often fail to converge if there are insufficient iterations. If an error is thrown and no plot is created, try increasing nsims.

Back to top