Introduction

Cliff and Ord (1969), published forty years ago, marked a turning point in the treatment of spatial autocorrelation in quantitative geography. It provided the framework needed by any applied researcher to attempt an implementation for a different system, possibly using a different programming language. In this spirit, here we examine how spatial weights have been represented in implementations and may be reproduced, how the tabulated results in the paper may be reproduced, and how they may be extended to cover simulation.

One of the major assertions of Cliff and Ord (1969) is that their statistic advances the measurement of spatial autocorrelation with respect to Moran (1950) and Geary (1954) because a more general specification of spatial weights could be used. This more general form has implications both for the preparation of the weights themselves, and for the calculation of the measures. We will look at spatial weights first, before moving on to consider the measures presented in the paper and some of their subsequent developments. Before doing this, we will put together a data set matching that used in Cliff and Ord (1969). They provide tabulated data for the counties of the Irish Republic, but omit Dublin from analyses. A shapefile included in this package, kindly made available by Michael Tiefelsdorf, is used as a starting point:

library(spdep)
eire <- as(sf::st_read(system.file("shapes/eire.gpkg", package="spData")[1]), "Spatial")
row.names(eire) <- as.character(eire$names)
#proj4string(eire) <- CRS("+proj=utm +zone=30 +ellps=airy +units=km")
class(eire)
## [1] "SpatialPolygonsDataFrame"
## attr(,"package")
## [1] "sp"
names(eire)
##  [1] "A"       "towns"   "pale"    "size"    "ROADACC" "OWNCONS" "POPCHG" 
##  [8] "RETSALE" "INCOME"  "names"

and read into a SpatialPolygonsDataFrame — classes used for handling spatial data in are fully described in Roger S. Bivand, Pebesma, and Gómez-Rubio (2008). To this we need to add the data tabulated in the paper in Table 2,1 p. 40, here in the form of a text file with added rainfall values from Table 9, p. 49:

fn <- system.file("etc/misc/geary_eire.txt", package="spdep")[1]
ge <- read.table(fn, header=TRUE)
names(ge)
##  [1] "serlet"        "county"        "pagval2_10"    "pagval10_50"  
##  [5] "pagval50p"     "cowspacre"     "ocattlepacre"  "pigspacre"    
##  [9] "sheeppacre"    "townvillp"     "carspcap"      "radiopcap"    
## [13] "retailpcap"    "psinglem30_34" "rainfall"

Since we assigned the county names as feature identifiers when reading the shapefiles, we do the same with the extra data, and combine the objects:

row.names(ge) <- as.character(ge$county)
all.equal(row.names(ge), row.names(eire))
## [1] TRUE
eire_ge <- cbind(eire, ge)

Finally, we need to drop the Dublin county omitted in the analyses conducted in Cliff and Ord (1969):

eire_ge1 <- eire_ge[!(row.names(eire_ge) %in% "Dublin"),]
length(row.names(eire_ge1))
## [1] 25

To double-check our data, let us calculate the sample Beta coefficients, using the formulae given in the paper for sample moments:

skewness <- function(z) {z <- scale(z, scale=FALSE); ((sum(z^3)/length(z))^2)/((sum(z^2)/length(z))^3)}
kurtosis <- function(z) {z <- scale(z, scale=FALSE); (sum(z^4)/length(z))/((sum(z^2)/length(z))^2)}

These differ somewhat from the ways in which skewness and kurtosis are computed in modern statistical software, see for example Joanes and Gill (1998). However, for our purposes, they let us reproduce Table 3, p. 42:

print(sapply(as(eire_ge1, "data.frame")[13:24], skewness), digits=3)
##    pagval2_10   pagval10_50     pagval50p     cowspacre  ocattlepacre 
##      1.675429      1.294978      0.000382      1.682094      0.086267 
##     pigspacre    sheeppacre     townvillp      carspcap     radiopcap 
##      1.138387      1.842362      0.472748      0.011111      0.342805 
##    retailpcap psinglem30_34 
##      0.002765      0.068169
print(sapply(as(eire_ge1, "data.frame")[13:24], kurtosis), digits=4)
##    pagval2_10   pagval10_50     pagval50p     cowspacre  ocattlepacre 
##         3.790         4.331         1.508         4.294         2.985 
##     pigspacre    sheeppacre     townvillp      carspcap     radiopcap 
##         3.754         4.527         2.619         1.865         2.730 
##    retailpcap psinglem30_34 
##         2.188         2.034
print(sapply(as(eire_ge1, "data.frame")[c(13,16,18,19)], function(x) skewness(log(x))), digits=3)
## pagval2_10  cowspacre  pigspacre sheeppacre 
##    0.68801    0.17875    0.00767    0.04184
print(sapply(as(eire_ge1, "data.frame")[c(13,16,18,19)], function(x) kurtosis(log(x))), digits=4)
## pagval2_10  cowspacre  pigspacre sheeppacre 
##      2.883      2.799      2.212      2.421

Using the tabulated value of \(23.6\) for the percentage of agricultural holdings above 50 in 1950 in Waterford, the skewness and kurtosis cannot be reproduced, but by comparison with the irishdata dataset in , it turns out that the value should rather be \(26.6\), which yields the tabulated skewness and kurtosis values.

Before going on, the variables considered are presented in Table \[vars\].

Description of variables in the Geary data set.
variable description
pagval2_10 Percentage number agricultural holdings in valuation group £2–£10 (1950)
pagval10_50 Percentage number agricultural holdings in valuation group £10–£50 (1950)
pagval50p Percentage number agricultural holdings in valuation group above £50 (1950)
cowspacre Milch cows per 1000 acres crops and pasture (1952)
ocattlepacre Other cattle per 1000 acres crops and pasture (1952)
pigspacre Pigs per 1000 acres crops and pasture (1952)
sheeppacre Sheep per 1000 acres crops and pasture (1952)
townvillp Town and village population as percentage of total (1951)
carspcap Private cars registered per 1000 population (1952)
radiopcap Radio licences per 1000 population (1952)
retailpcap Retail sales £ per person (1951)
psinglem30_34 Single males as percentage of all males aged 30–34 (1951)
rainfall Average of rainfall for stations in Ireland, 1916–1950, mm

Spatial weights

As a basis for comparison, we will first read the unstandardised weighting matrix given in Table A1, p. 54, of the paper, reading a file corrected for the misprint giving O rather than D as a neighbour of V:

fn <- system.file("etc/misc/unstand_sn.txt", package="spdep")[1]
unstand <- read.table(fn, header=TRUE)
summary(unstand)
##      from                to                weight        
##  Length:110         Length:110         Min.   :0.000600  
##  Class :character   Class :character   1st Qu.:0.003225  
##  Mode  :character   Mode  :character   Median :0.007550  
##                                        Mean   :0.007705  
##                                        3rd Qu.:0.010225  
##                                        Max.   :0.032400

In the file, the counties are represented by their serial letters, so ordering and conversion to integer index representation is required to reach a representation similar to that of the SpatialStats module for spatial neighbours:

class(unstand) <- c("spatial.neighbour", class(unstand))
of <- ordered(unstand$from)
attr(unstand, "region.id") <- levels(of)
unstand$from <- as.integer(of)
unstand$to <- as.integer(ordered(unstand$to))
attr(unstand, "n") <- length(unique(unstand$from))

Having done this, we can change its representation to a listw object, assigning an appropriate style (generalised binary) for unstandardised values:

lw_unstand <- sn2listw(unstand)
lw_unstand$style <- "B"
lw_unstand
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 25 
## Number of nonzero links: 110 
## Percentage nonzero weights: 17.6 
## Average number of links: 4.4 
## 
## Weights style: B 
## Weights constants summary:
##    n  nn     S0         S1        S2
## B 25 625 0.8476 0.01871808 0.1229232

Note that the values of S0, S1, and S2 correspond closely with those given on page 42 of the paper, \(0.84688672\), \(0.01869986\) and \(0.12267319\). The discrepancies appear to be due to rounding in the printed table of weights.

The contiguous neighbours represented in this object ought to match those found using poly2nb. However, we see that the reproduced contiguities have a smaller link count:

nb <- poly2nb(eire_ge1)
nb
## Neighbour list object:
## Number of regions: 25 
## Number of nonzero links: 108 
## Percentage nonzero weights: 17.28 
## Average number of links: 4.32

The missing link is between Clare and Kerry, perhaps by the Tarbert–Killimer ferry, but the counties are not contiguous, as Figure \[plot\_nb\] shows:

xx <- diffnb(nb, lw_unstand$neighbours, verbose=TRUE)
## Neighbour difference for region id: Clare in relation to id: Kerry 
## Neighbour difference for region id: Kerry in relation to id: Clare
plot(eire_ge1, border="grey60")
plot(nb, coordinates(eire_ge1), add=TRUE, pch=".", lwd=2)
plot(xx, coordinates(eire_ge1), add=TRUE, pch=".", lwd=2, col=3)