The package hgwrr calibrates Hierarchical and Geographically Weighted Regression (HGWR) models on spatial data. It requires a spatial hierarchical structure in the data; i.e., samples are grouped by their locations. Every variable is either at the group level or at the sample level. Group-level variables can have fixed effects (globally constant) or spatially weighted effects (varying with location). Sample-level variables can have fixed effects or random effects (varying among groups). We denote the fixed effects as \(\beta\), the group-level spatially weighted (GLSW) effects as \(\gamma\), and the sample-level random (SLR) effects as \(\mu\). The HGWR model combines these three kinds of effects and estimates them while accounting for spatial heterogeneity.
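Written out, the three kinds of effects combine additively. The following is a sketch of the model form. Apart from \(\beta\), \(\gamma\), and \(\mu\), the notation is assumed here for illustration: \(y_{ij}\) is the response of sample \(j\) in group \(i\), \(g_i\) holds the group-level variables with GLSW effects, \(x_{ij}\) the variables with fixed effects, \(z_{ij}\) the variables with SLR effects, \((u_i, v_i)\) the location of group \(i\), and \(\epsilon_{ij}\) the residual.

```latex
y_{ij} = g_i^{\top} \gamma(u_i, v_i) + x_{ij}^{\top} \beta + z_{ij}^{\top} \mu_i + \epsilon_{ij}
```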
library(hgwrr)
#> Loading required package: sf
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
To calibrate an HGWR model, use the function hgwr().
hgwr(
formula, data, ..., bw = "CV",
kernel = c("gaussian", "bisquared"),
alpha = 0.01, eps_iter = 1e-6, eps_gradient = 1e-6,
max_iters = 1e6, max_retries = 1e6,
ml_type = c("D_Only", "D_Beta"), verbose = 0
)
The following explains some important parameters.
formula
This parameter specifies the model form. Recall that the three kinds of effects are GLSW, fixed, and SLR effects. They are specified in different parts of the formula.
In the formula, L() is used to mark some effects as GLSW effects, and ( | group) is used to set the SLR effects and the grouping indicator. Terms outside both are given fixed effects. Only group-level variables can have GLSW effects.
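As a small illustration of how the parts of a formula map to the three kinds of effects, the sketch below annotates the formula that appears in the model output in this document and lists its variables with base R. The variable names f and vars are hypothetical, and nothing here calls hgwrr itself.

```r
# Formula taken from the model output shown in this document
f <- y ~ L(g1 + g2) + x1 + (z1 | group)

# L(...) wraps the group-level variables with GLSW effects: g1, g2
# x1 stands outside L() and ( | ), so it gets a fixed effect
# (z1 | group) gives z1 an SLR effect, with `group` as the grouping indicator

# Base R can list every variable the formula refers to
vars <- all.vars(f)
print(vars)
```

Every name listed this way needs to be a column of the data passed to hgwr().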
data
sf objects
From version 0.3-1, this parameter supports sf objects. In this case, no further arguments in ... are required.
Here is an example.
data(wuhan.hp)
m_sf <- hgwr(
formula = Price ~ L(d.Water + d.Commercial) + BuildingArea + (Floor.High | group),
data = wuhan.hp,
bw = 299
)
data.frame objects
If the data is a normal data.frame object, an extra argument coords is required to specify the coordinates of each group. Note that the row order of coords needs to match that of the group variable. Here is an example.
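A minimal sketch of such a call, consistent with the formula and data name printed in the model output in this document. It assumes that the bundled multisampling list provides the group coordinates as multisampling$coords, and it leaves bw at its default "CV". Both choices are assumptions rather than confirmed settings.

```r
# Sketch only: runs when hgwrr is installed.
# Assumption: `multisampling` is a list with `data` (a data.frame) and
# `coords` (one row of coordinates per group).
if (requireNamespace("hgwrr", quietly = TRUE)) {
  library(hgwrr)
  data(multisampling)

  # coords should hold one row per group, in the order of the group index
  stopifnot(
    nrow(multisampling$coords) == length(unique(multisampling$data$group))
  )

  m_df <- hgwr(
    formula = y ~ L(g1 + g2) + x1 + (z1 | group),
    data = multisampling$data,
    coords = multisampling$coords  # required because data is not an sf object
  )
}
```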
bw and kernel
Argument bw is the bandwidth used to estimate GLSW effects. It can be either of the following options:
A numeric value, as in bw = 299 in the sf example above.
"CV", letting the algorithm select one.
Argument kernel is the kernel function used to estimate GLSW effects. Currently, there are only two choices: "gaussian" and "bisquared".
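To make the two kernel choices concrete, the sketch below uses the textbook Gaussian and bi-square weighting functions from the geographically weighted regression literature. The function names and the exact formulas are assumptions for illustration; the implementation inside hgwrr may differ in detail, for example in how a nearest-neighbour bandwidth is turned into a distance.

```r
# Textbook GWR kernels (illustrative; not taken from the hgwrr source).
# d: distances from the focal group to other groups
# b: bandwidth, in the same units as d
gaussian_weight <- function(d, b) exp(-0.5 * (d / b)^2)
bisquare_weight <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

d <- c(0, 1, 2, 4)
b <- 2
w_gaussian <- gaussian_weight(d, b)
w_bisquare <- bisquare_weight(d, b)

# Compare the two weighting schemes side by side
print(rbind(d = d, gaussian = w_gaussian, bisquare = w_bisquare))
```

Both kernels give weight 1 at zero distance. The bi-square weight drops to exactly zero at and beyond the bandwidth, while the Gaussian weight only decays and never reaches zero.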
Printing the object returned by hgwr() shows the estimates of the effects.
m_df
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#> Method: Back-fitting and Maximum likelihood
#> Data: multisampling$data
#>
#> Global Fixed Effects
#> -------------------
#> Intercept x1
#> 4.057598 0.966088
#>
#> Group-level Spatially Weighted Effects
#> -------------------
#> Bandwidth: 9.000016 (nearest neighbours)
#> Coefficient Min 1st Quartile Median 3rd Quartile Max
#> Intercept -2.477395 -2.355703 -2.121756 -1.877999 -1.692284
#> g1 5.557560 6.303393 7.248665 7.500396 8.628861
#> g2 -0.868120 0.069715 0.958127 1.318092 1.551972
#>
#> Sample-level Random Effects
#> --------------
#> Groups Name Std.Dev. Corr
#> group Intercept 1.920528
#> z1 1.920528 0.000000
#> Residual 1.920528
#>
#> Other Information
#> -----------------
#> Number of Obs: 484
#> Groups: group , 16
The summary() method shows some diagnostic information.
summary(m_df)
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#> Method: Back-fitting and Maximum likelihood
#> Data: multisampling$data
#>
#> Parameter Estimates
#> -------------------
#> Fixed effects:
#> Estimated Sd. Err t.val Pr(>|t|)
#> Intercept 4.0575982 0.25449128 15.94396 0 ***
#> x1 0.9660875 0.04676336 20.65907 0 ***
#>
#> GLSW effects:
#> Mean Est. Mean Sd. *** ** * .
#> Intercept -2.103489 0.565900 75.0% 25.0% 0.0% 0.0%
#> g1 6.948511 3.266527 0.0% 6.2% 62.5% 31.2%
#> g2 0.719551 3.296013 0.0% 0.0% 0.0% 0.0%
#>
#> Bandwidth: 9.000016 (nearest neighbours)
#>
#> SLR effects:
#> Groups Name Mean Std.Dev. Corr
#> group Intercept 0.000000 1.920528
#> z1 0.005523 1.920528 0.000000
#> Residual -0.000000 1.920528
#>
#>
#> Diagnostics
#> -----------
#> rsquared 0.647039
#> logLik NaN
#> AIC NaN
#>
#> Scaled Residuals
#> ----------------
#> Min 1Q Median 3Q Max
#> -5.454886 -1.216360 -0.073301 1.247373 5.327542
#>
#> Other Information
#> -----------------
#> Number of Obs: 484
#> Groups: group , 16
The significance level of spatial heterogeneity in GLSW effects can be tested with the following code.
summary(m_df, test_hetero = TRUE)
#> Hierarchical and geographically weighted regression model
#> =========================================================
#> Formula: y ~ L(g1 + g2) + x1 + (z1 | group)
#> Method: Back-fitting and Maximum likelihood
#> Data: multisampling$data
#>
#> Parameter Estimates
#> -------------------
#> Fixed effects:
#> Estimated Sd. Err t.val Pr(>|t|)
#> Intercept 4.0575982 0.25449128 15.94396 0 ***
#> x1 0.9660875 0.04676336 20.65907 0 ***
#>
#> GLSW effects:
#> Mean Est. Mean Sd. *** ** * .
#> Intercept -2.103489 0.565900 75.0% 25.0% 0.0% 0.0%
#> g1 6.948511 3.266527 0.0% 6.2% 62.5% 31.2%
#> g2 0.719551 3.296013 0.0% 0.0% 0.0% 0.0%
#>
#> GLSW effect spatial heterogeneity:
#> Sd. Est. t0 Min. t Max. t Pr(t>t0)
#> Intercept 0.268012 0.253645 0.044126 0.255320 0.000400 ***
#> g1 0.854396 0.809485 0.152553 0.814342 0.000600 ***
#> g2 0.722266 0.691518 0.109949 0.685669 0.000000 ***
#>
#> Bandwidth: 9.000016 (nearest neighbours)
#>
#> SLR effects:
#> Groups Name Mean Std.Dev. Corr
#> group Intercept 0.000000 1.920528
#> z1 0.005523 1.920528 0.000000
#> Residual -0.000000 1.920528
#>
#>
#> Diagnostics
#> -----------
#> rsquared 0.647039
#> logLik NaN
#> AIC NaN
#>
#> Scaled Residuals
#> ----------------
#> Min 1Q Median 3Q Max
#> -5.454886 -1.216360 -0.073301 1.247373 5.327542
#>
#> Other Information
#> -----------------
#> Number of Obs: 484
#> Groups: group , 16
The methods coef(), fitted(), and residuals() are also provided.
head(coef(m_df))
#> Intercept g1 g2 x1 z1
#> 1 0.8738330 7.721070 0.06175177 0.9660875 -0.1931416
#> 2 3.3857847 6.335641 1.52880953 0.9660875 0.2870040
#> 3 0.8175705 7.476406 0.84953498 0.9660875 1.2706663
#> 4 2.4934172 7.398369 0.86755061 0.9660875 -0.5998743
#> 5 5.9128656 8.628861 -0.86811997 0.9660875 -1.4658340
#> 6 0.6129129 6.271144 1.35855541 0.9660875 0.6958578
head(fitted(m_df))
#> [1] 2.6579678 2.5879538 3.9085387 0.8865106 3.9384948 2.6547040
head(residuals(m_df))
#> [1] -1.4267714 0.1274905 -1.9104633 2.8806623 0.5553585 -1.9290357
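As a small check on how these accessors relate, the sketch below adds the fitted values and residuals printed above. It assumes the usual R convention that residuals are observed minus fitted values, so the sum recovers the observed response. The variable names are hypothetical.

```r
# First six fitted values and residuals, copied from the output above
fit <- c(2.6579678, 2.5879538, 3.9085387, 0.8865106, 3.9384948, 2.6547040)
res <- c(-1.4267714, 0.1274905, -1.9104633, 2.8806623, 0.5553585, -1.9290357)

# Under the assumed convention (residual = observed - fitted),
# the observed responses are the element-wise sum
y_obs <- fit + res
print(y_obs)
```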
The following papers give more details on the mathematical basis of the HGWR model.