Moran’s I rescaling

Ivan Fuentes, Thomas Dewiit, Thomas Ioerger, Michael Bishop

2019-04-15

Many geographical analyses use spatial autocorrelation, which allows us to study geographical patterns from different points of view. One measure of spatial autocorrelation is Moran’s I, which is based on Pearson’s correlation coefficient in general statistics (Chen 2009).

Performing the Analysis

This package offers a straightforward way to perform the whole analysis with the function rescaleI, which requires an input file in a specific format (see the Loading data section).

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
data<-loadFile(fileInput)
scaledI<-rescaleI(data,samples=1000, scalingUpTo="MaxMin")

fn = file.path(tempdir(),"output.csv",fsep = .Platform$file.sep)
saveFile(fn,scaledI)
if (file.exists(fn)) 
  #Delete file if it exists
  file.remove(fn)
#> [1] TRUE

Analysis Step by Step

The analysis can also be carried out one step at a time, as described in the following sections.

Loading data

The input file should have the following format (see note 1).

fileInput<-system.file("testdata", "chen.csv", package="Irescale")
head(read.csv(fileInput))
#>           City Latitude Longitude Population
#> 1      Beijing 39.90420  116.4074    9496688
#> 2      Tianjin 39.34336  117.3616    5313702
#> 3 Shijiazhuang 39.34336  117.3616    1930579
#> 4      Taiyuan 37.87059  112.5489    2538336
#> 5      Hohehot 40.82192  111.6581     990954
#> 6     Shenyang 41.80570  123.4315    4344933

Loading the data for the analysis is simple: the function loadFile provides the interface. loadFile returns a list with two elements, data and varOfInterest. The first holds the latitude and longitude of each point; varOfInterest is a matrix with all the measurements from the field.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
head(input$data)
#>              Latitude Longitude
#> Beijing      39.90420  116.4074
#> Tianjin      39.34336  117.3616
#> Shijiazhuang 39.34336  117.3616
#> Taiyuan      37.87059  112.5489
#> Hohehot      40.82192  111.6581
#> Shenyang     41.80570  123.4315
head(input$varOfInterest)
#>              Population
#> Beijing         9496688
#> Tianjin         5313702
#> Shijiazhuang    1930579
#> Taiyuan         2538336
#> Hohehot          990954
#> Shenyang        4344933

If the data has a chessboard shape, the file is organized in rows and columns, where the rows represent latitude and the columns longitude, and the measurements are in the cells. The function loadChessBoard loads such a file for the analysis.

library(Irescale)
fileInput<-system.file("testdata", "chessboard.csv", package="Irescale")
input<-loadChessBoard(fileInput)
#> [1] 21
head(input$data)
#>      [,1] [,2]
#> [1,]    1    1
#> [2,]    1    2
#> [3,]    1    3
#> [4,]    1    4
#> [5,]    1    5
#> [6,]    1    6
head(input$varOfInterest)
#> [1] 1 1 1 1 1 1

Calculate Distance

Once the data is loaded, the distance matrix, that is, the distance between every pair of points, can be calculated. The distance can be calculated with calculateEuclideanDistance if the points are taken at geospatial locations.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
distM[1:5,1:5]
#>          [,1]     [,2]     [,3]     [,4]     [,5]
#> [1,] 0.000000 1.106865 1.106865 4.361616 4.837197
#> [2,] 1.106865 0.000000 0.000000 5.033068 5.892130
#> [3,] 1.106865 0.000000 0.000000 5.033068 5.892130
#> [4,] 4.361616 5.033068 5.033068 0.000000 3.082845
#> [5,] 4.837197 5.892130 5.892130 3.082845 0.000000
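
For readers who want to check these numbers, a matrix of the same kind can be reproduced in base R. This is a minimal sketch, not the package's implementation: it assumes the distance is the plain Euclidean distance on the Latitude and Longitude columns, which is consistent with the output above, and it types in the first four rows of the example file by hand.

```r
# Coordinates copied from the first four rows of the example file
coords <- matrix(
  c(39.90420, 116.4074,
    39.34336, 117.3616,
    39.34336, 117.3616,
    37.87059, 112.5489),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("Beijing", "Tianjin", "Shijiazhuang", "Taiyuan"),
                  c("Latitude", "Longitude"))
)

# stats::dist() returns a compact "dist" object; as.matrix() expands it
# into the full symmetric matrix with zeros on the diagonal
dist_base <- as.matrix(dist(coords, method = "euclidean"))
round(dist_base, 4)
```

Rows 2 and 3 share the same coordinates in the example file, which is why their mutual distance is zero both here and in the package output.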

If the data is taken from a chessboard-like field, the Manhattan distance can be used.

library(Irescale)
fileInput<-system.file("testdata", "chessboard.csv", package="Irescale")
input<-loadChessBoard(fileInput)
#> [1] 21
distM<-calculateManhattanDistance(input$data)
distM[1:5,1:5]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    1    2    3    4
#> [2,]    1    0    1    2    3
#> [3,]    2    1    0    1    2
#> [4,]    3    2    1    0    1
#> [5,]    4    3    2    1    0
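
The Manhattan distance between two cells is the sum of the absolute row and column differences. As a rough illustration in base R (the 3 x 3 grid and the names grid and dist_grid are made up for this sketch and are not part of the package):

```r
# A 3 x 3 grid: one row per cell, columns are (row index, column index)
grid <- as.matrix(expand.grid(row = 1:3, col = 1:3))

# Manhattan distance = |row difference| + |column difference|
dist_grid <- as.matrix(dist(grid, method = "manhattan"))

# Distances from the first cell (1, 1) to every cell of the grid
dist_grid[1, ]
```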

Calculate Weighted Distance Matrix

The weighted distance matrix can be calculated with the function calculateWeightedDistMatrix; however, this step is not required, because calculateMoranI does it internally.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
distW<-calculateWeightedDistMatrix(distM)
distW[1:5,1:5]
#>             [,1]        [,2]        [,3]        [,4]        [,5]
#> [1,] 0.000000000 0.009745774 0.009745774 0.002473224 0.002230063
#> [2,] 0.009745774 0.000000000 0.000000000 0.002143276 0.001830791
#> [3,] 0.009745774 0.000000000 0.000000000 0.002143276 0.001830791
#> [4,] 0.002473224 0.002143276 0.002143276 0.000000000 0.003499124
#> [5,] 0.002230063 0.001830791 0.001830791 0.003499124 0.000000000
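
In the output above, each weight multiplied by its distance gives roughly the same constant, and pairs at distance zero receive weight zero. That pattern is what inverse-distance weights scaled by a common factor would produce. The sketch below illustrates that reading in base R; the normalisation to a total of one and the helper name inverse_distance_weights are assumptions made for this example, not the package's documented behaviour.

```r
# Hypothetical helper: inverse-distance weights, scaled to sum to one
inverse_distance_weights <- function(dist_m) {
  w <- 1 / dist_m
  # Zero distances (the diagonal and co-located points) give Inf; use weight 0
  w[!is.finite(w)] <- 0
  w / sum(w)
}

# Three points on a line at positions 0, 1 and 3
dist_m <- as.matrix(dist(cbind(c(0, 1, 3), c(0, 0, 0))))
w <- inverse_distance_weights(dist_m)
round(w, 4)
```

Closer pairs receive larger weights, so nearby observations contribute more to the autocorrelation statistic.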

Moran’s I

It is time to calculate the spatial autocorrelation statistic, Moran’s I. The function calculateMoranI requires the distance matrix and the variable you are interested in.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
I<-calculateMoranI(distM = distM,varOfInterest = input$varOfInterest)
I
#> [1] -0.04800907
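
For reference, the textbook definition of Moran’s I can be written in a few lines of base R. This is a sketch of the standard formula, I = (n / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2 with z = x - mean(x); it is not taken from the package source, and the function name moran_i and the toy data are made up for the illustration.

```r
# Textbook Moran's I for a numeric vector x and a weight matrix w
# (w is assumed to have zeros on the diagonal)
moran_i <- function(x, w) {
  z <- x - mean(x)                     # deviations from the mean
  cross <- sum(w * outer(z, z))        # sum over i, j of w_ij * z_i * z_j
  (length(x) / sum(w)) * cross / sum(z^2)
}

# Four points on a line; neighbours are adjacent positions
w_adj <- matrix(0, 4, 4)
w_adj[cbind(1:3, 2:4)] <- 1
w_adj <- w_adj + t(w_adj)

i_trend <- moran_i(c(1, 2, 3, 4), w_adj)        # smooth trend
i_alternating <- moran_i(c(1, 4, 1, 4), w_adj)  # alternating values
c(trend = i_trend, alternating = i_alternating)
```

A smooth trend yields a positive value, while alternating high and low neighbours yield a negative one.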

Resampling Method for I

The scaling process uses a Monte Carlo resampling method. The idea is to shuffle the values and recalculate I at least 1000 times. In the code below, after resampling the value of I, a set of summary statistics is calculated for the generated vector.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
I<-calculateMoranI(distM = distM,varOfInterest = input$varOfInterest)
vI<-resamplingI(1000,distM, input$varOfInterest) # This is the permutation
statsVI<-summaryVector(vI)
statsVI
#> $mean
#> [1] -0.03437608
#> 
#> $sd
#> [1] 0.03668554
#> 
#> $max
#> [1] 0.107634
#> 
#> $min
#> [1] -0.1414878
#> 
#> $Q1
#>       0.1% 
#> -0.1318318 
#> 
#> $Q99
#>    99.99% 
#> 0.1069136 
#> 
#> $median
#> [1] -0.03815818
#> 
#> $skew
#> [1] 0.556846
#> 
#> $kurt
#> [1] 0.9046789
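
The shuffling idea can be sketched in base R as follows. The example uses simulated coordinates and values, a textbook Moran’s I (the helper moran_i is defined here and is not the package function), and plain inverse-distance weights; all of these are assumptions made for illustration.

```r
set.seed(1)

# Textbook Moran's I (hypothetical helper, not the package function)
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

n <- 10
coords <- cbind(runif(n), runif(n))   # simulated locations
w <- 1 / as.matrix(dist(coords))      # inverse-distance weights
diag(w) <- 0
x <- rnorm(n)                         # simulated measurements

# Shuffle the values over the fixed locations and recompute I each time
null_i <- replicate(1000, moran_i(sample(x), w))

c(mean = mean(null_i), sd = sd(null_i), expected = -1 / (n - 1))
```

Under random permutation the expected value of Moran’s I is -1/(n - 1), so the mean of the resampled vector should lie close to that value rather than to zero.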

Plotting Distribution (Optional)

To see how the value of I is distributed, the function plotHistogramOverlayNormal draws a histogram of the vector generated by resampling, with a theoretical normal distribution overlaid.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
I<-calculateMoranI(distM = distM,varOfInterest = input$varOfInterest)
vI<-resamplingI(1000,distM, input$varOfInterest) # This is the permutation
statsVI<-summaryVector(vI)
plotHistogramOverlayNormal(vI,statsVI, main=colnames(input$varOfInterest))

Rescaling I

Once the null distribution has been calculated via resampling, I needs to be rescaled by centering and stretching. The function iCorrection returns an object with the rescaled resampling vector and the summary statistics for that vector; the new value of I is returned in an element named newI.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
I<-calculateMoranI(distM = distM,varOfInterest = input$varOfInterest)
vI<-resamplingI(1000,distM, input$varOfInterest) # This is the permutation
statsVI<-summaryVector(vI)
corrections<-iCorrection(I,vI)
corrections$newI
#>       0.1% 
#> -0.0849232
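
One plausible reading of "centering and stretching" is sketched below in base R. The label 0.1% on the output above suggests that an extreme quantile of the null distribution is used as the divisor. The choice of the median as the center, the quantile levels 0.001 and 0.9999, and the function name rescale_i are assumptions made for this illustration; the exact computation in iCorrection may differ.

```r
# Hypothetical sketch: center on the null median, then stretch so that the
# extreme quantile on the same side as the observed value maps to -1 or +1
rescale_i <- function(i_obs, null_i, probs = c(0.001, 0.9999)) {
  center <- median(null_i)
  q <- quantile(null_i, probs = probs, names = FALSE)
  centered <- i_obs - center
  if (centered < 0) {
    centered / abs(q[1] - center)   # stretch by the lower tail
  } else {
    centered / abs(q[2] - center)   # stretch by the upper tail
  }
}

set.seed(2)
null_i <- rnorm(1000, mean = -0.03, sd = 0.04)  # stand-in for a resampled vector
rescale_i(-0.048, null_i)
```

With this convention a rescaled value near 0 means the observed I sits at the center of the null distribution, and values near -1 or +1 sit at its extremes.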

Calculate P-value

To attach a significance level to this new value, you can calculate the p-value with the function calculatePvalue. It requires the scaled vector (scaledData), the scaled I (newI), and the mean of scaledData.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
distM<-calculateEuclideanDistance(input$data)
I<-calculateMoranI(distM = distM,varOfInterest = input$varOfInterest)
vI<-resamplingI(1000,distM, input$varOfInterest) # This is the permutation
statsVI<-summaryVector(vI)
corrections<-iCorrection(I,vI)
pvalueIscaled<-calculatePvalue(corrections$scaledData,corrections$newI,corrections$summaryScaledD$mean)
pvalueIscaled
#> [1] 0.3466533
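
A permutation p-value of this kind is usually the share of resampled values at least as extreme as the observed one. The value above equals 347/1001 to the printed precision, which is consistent with an add-one correction on 1000 resamples. The sketch below follows that convention in base R; the two-sided comparison around the center and the name empirical_p_value are assumptions, not the package's documented definition.

```r
# Hypothetical two-sided empirical p-value with an add-one correction
empirical_p_value <- function(null_values, observed, center = mean(null_values)) {
  # Count resampled values at least as far from the center as the observed one
  extreme <- sum(abs(null_values - center) >= abs(observed - center))
  (extreme + 1) / (length(null_values) + 1)
}

# Tiny deterministic example: two of five null values are as extreme as 2
p_small <- empirical_p_value(c(-2, -1, 0, 1, 2), observed = 2)
p_small
```

The add-one correction keeps the estimate away from exactly zero, which no finite number of resamples can justify.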

Stability Analysis

To determine how many iterations the resampling method needs, you can run a stability analysis. The function buildStabilityTable draws a chart, in log scale (10^x), of the number of iterations needed to reach stability in the Monte Carlo simulation.

library(Irescale)
fileInput<-system.file("testdata", "chen.csv", package="Irescale")
input<-loadFile(fileInput)
resultsChen<-buildStabilityTable(data=input, times=100, samples=1000, plots=TRUE)
#>                 [,1]         [,2]        [,3]
#> 0.1%   -0.1100477170 -0.033855533 -0.09094158
#> 0.1%   -0.7968325914 -0.173691009 -0.10946002
#> 0.1%   -0.0476478128 -0.164981137 -0.11317752
#> 0.1%   -0.1852989922 -0.115728189 -0.09787173
#> 0.1%   -0.0273691571 -0.110433173 -0.10659597
#> 0.1%   -0.3590361431 -0.162301028 -0.09331094
#> 0.1%   -0.4117878616 -0.160768187 -0.10942879
#> 0.1%   -0.1023578850 -0.067556652 -0.09535132
#> 0.1%   -0.4602703949 -0.136926079 -0.09108123
#> 0.1%   -0.4824180440 -0.103241378 -0.08069736
#> 0.1%   -0.3313552186 -0.131398209 -0.11716722
#> 99.99%  0.0305069059 -0.100187787 -0.07723382
#> 99.99%  0.0002159302 -0.097410923 -0.09709291
#> 0.1%   -0.1116537410 -0.137310020 -0.08796895
#> 0.1%   -0.0857060639 -0.075205566 -0.10854973
#> 0.1%   -0.4576531108 -0.086573152 -0.09292019
#> 99.99%  0.0255257161 -0.135001911 -0.09366838
#> 0.1%   -0.2386937242 -0.083884400 -0.09068054
#> 0.1%   -0.2544993627 -0.090496332 -0.09356458
#> 0.1%   -0.0365540361 -0.125575830 -0.11837364
#> 0.1%   -0.3111403148 -0.042682921 -0.11078670
#> 0.1%   -0.1910962980 -0.123896594 -0.10356858
#> 0.1%   -0.0584203895 -0.129234100 -0.11364286
#> 99.99%  0.0116356107 -0.105845842 -0.09621248
#> 0.1%   -0.3584862168 -0.034779862 -0.08402458
#> 99.99%  0.0435661907 -0.107405508 -0.09040584
#> 99.99%  0.0079472155 -0.140866913 -0.12642794
#> 0.1%   -0.3321632063 -0.151439904 -0.06427120
#> 0.1%   -0.6241906384 -0.126248582 -0.09291878
#> 0.1%   -0.3867733215 -0.089026630 -0.09388036
#> 0.1%   -0.8242046886 -0.076292721 -0.14307526
#> 0.1%   -0.1680040082 -0.079380450 -0.12269651
#> 0.1%   -0.3428702932 -0.154759154 -0.10424134
#> 0.1%   -0.1169670558 -0.109509699 -0.12272626
#> 0.1%   -0.5132613179 -0.071259365 -0.09675470
#> 0.1%   -0.3370013371 -0.182135249 -0.09530322
#> 0.1%   -0.3133688529 -0.116111497 -0.08627645
#> 0.1%   -0.8392213274 -0.099071076 -0.09865377
#> 0.1%   -0.2518294981 -0.194323558 -0.08478823
#> 0.1%   -0.2904034001 -0.166274979 -0.10593781
#> 0.1%   -1.0050445689 -0.178921538 -0.10300357
#> 0.1%   -0.3289415251 -0.107032737 -0.11605746
#> 0.1%   -0.0072018406 -0.176453734 -0.10769863
#> 0.1%   -0.3700020156 -0.086269135 -0.10496448
#> 0.1%   -0.0361989848 -0.003229186 -0.07486909
#> 0.1%   -0.4917804890 -0.157707506 -0.10106833
#> 0.1%   -0.1015856819 -0.118226074 -0.09677565
#> 0.1%   -0.5405311301 -0.212889833 -0.09811767
#> 0.1%   -0.0862715749 -0.083248065 -0.09558480
#> 99.99%  0.0306692226 -0.216263210 -0.12057079
#> 99.99%  0.0382505904 -0.108057286 -0.08047858
#> 99.99%  0.3776285202 -0.155906293 -0.11405303
#> 99.99%  0.3709634499 -0.047894557 -0.11041672
#> 0.1%   -0.1582968242 -0.111203480 -0.12205529
#> 0.1%   -0.2882743070 -0.161388571 -0.09705002
#> 0.1%   -0.1822940521 -0.037365334 -0.07895892
#> 99.99%  0.0887863187 -0.090731160 -0.12072753
#> 99.99%  0.0337398380 -0.141378450 -0.11008424
#> 0.1%   -0.4250872285 -0.050993763 -0.07916848
#> 0.1%   -0.4349650411 -0.181815168 -0.10806329
#> 0.1%   -0.1544981507 -0.114460853 -0.10839030
#> 0.1%   -0.4027638995 -0.130938178 -0.12687433
#> 0.1%   -0.3661589236 -0.140572670 -0.10384686
#> 0.1%   -0.4870259274 -0.126881964 -0.09371733
#> 0.1%   -0.2645200760 -0.218031665 -0.12087870
#> 0.1%   -1.0031343207 -0.222720363 -0.10381323
#> 99.99%  0.0115494639 -0.134156070 -0.12606699
#> 0.1%   -0.0007892366 -0.152154039 -0.11290709
#> 99.99%  0.0239598851 -0.117469163 -0.12501232
#> 0.1%   -0.1298766707 -0.063093079 -0.10241842
#> 0.1%   -0.1717180015 -0.140316795 -0.11672135
#> 0.1%   -0.7345547397 -0.133069132 -0.08075124
#> 0.1%   -0.5297070794 -0.141109504 -0.11804873
#> 0.1%   -0.2029398618 -0.145820626 -0.12040386
#> 0.1%   -0.1203454149 -0.156517300 -0.10190200
#> 0.1%   -0.1252047063 -0.186799641 -0.11581310
#> 0.1%   -0.1813474447 -0.125610211 -0.08180902
#> 0.1%   -0.4027390493 -0.066856235 -0.08605978
#> 0.1%   -0.8754498510 -0.084794603 -0.09380929
#> 99.99%  0.0551517626 -0.151426355 -0.12584170
#> 99.99%  0.0293693718 -0.112848664 -0.09559771
#> 99.99%  0.0006223942 -0.137205916 -0.12087375
#> 0.1%   -1.0005928534 -0.153319049 -0.11318207
#> 0.1%   -0.6763751255 -0.277558150 -0.08475437
#> 0.1%   -0.0651677707 -0.153356010 -0.07823124
#> 99.99%  0.0327260766 -0.163401154 -0.10474311
#> 0.1%   -0.2147541984 -0.012302225 -0.08805698
#> 0.1%   -0.0028146067 -0.139321953 -0.12967728
#> 0.1%   -0.0902258889 -0.025472230 -0.07555643
#> 0.1%   -0.2663921409 -0.118079245 -0.10334561
#> 0.1%   -0.2675010000 -0.194982214 -0.10763511
#> 0.1%   -0.0879999005 -0.128910699 -0.09047845
#> 0.1%   -0.6682638502 -0.209776977 -0.09400441
#> 0.1%   -0.8107494801 -0.240484640 -0.10014187
#> 0.1%   -0.6439780109 -0.082516100 -0.12792346
#> 0.1%   -0.3064404483 -0.185804878 -0.12379360
#> 0.1%   -0.5104805736 -0.161198984 -0.12144691
#> 0.1%   -0.1048352202 -0.132601613 -0.13202342
#> 0.1%   -0.0576266942 -0.065238739 -0.08449641
#> 0.1%   -0.2829310938 -0.152265959 -0.09306511
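
The spread within each column of the table shrinks from left to right, which is the sense in which the simulation stabilises. The same effect can be reproduced in base R with simulated data. The sketch below is an illustration only: it tracks the median of the null distribution (a simpler quantity than the one the package tabulates) for 10, 100 and 1000 permutations, and every name in it is hypothetical.

```r
set.seed(42)

# Textbook Moran's I (hypothetical helper, not the package function)
moran_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

n <- 12
w <- 1 / as.matrix(dist(cbind(runif(n), runif(n))))  # inverse-distance weights
diag(w) <- 0
x <- rnorm(n)

# Median of the null distribution for a given number of permutations
null_median <- function(samples) {
  median(replicate(samples, moran_i(sample(x), w)))
}

sizes <- 10^(1:3)   # 10, 100 and 1000 permutations
times <- 20         # repetitions per size
stability <- sapply(sizes, function(s) replicate(times, null_median(s)))
colnames(stability) <- paste0("10^", 1:3)

# Spread across repetitions for each number of permutations
spread <- apply(stability, 2, sd)
spread
```

When the spread no longer drops noticeably from one column to the next, adding more permutations brings little extra precision.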

References

Chen, Yan-guang. 2009. “Reconstructing the Mathematical Process of Spatial Autocorrelation Based on Moran’s Statistics.” Geographic Research 28 (6): 1449–63.


  1. The data used in this example is taken from (Chen 2009).