Estimation of error components models with the plm function

Yves Croissant

2019-09-06

plm is a very versatile function which enable the estimation of a wide range of error component models. Those models can be written as follow :

\[ y_{nt}=\alpha + \beta^\top x_{nt} + \epsilon_{nt} = \alpha + \beta^\top x_{nt} + \eta_n + \mu_t + \nu_{nt} \]

where \(n\) and \(t\) are the individual and time indexes, \(y\) the response, \(x\) a vector of covariates, \(\alpha\) the overall intercept and \(\beta\) the vector of parameters of interest that we are willing to estimate. The error term is composed of three elements :

Basic use of plm

The first two arguments of plm are, like for most of the estimation functions of R a formula which describes the model to be estimated and a data.frame. subset, weights and na.action are also available and have the same behavior as in the lm function. Three more main arguments can be set :

The estimation of all but the last model is straightforward, as it requires only the estimation by OLS of obvious transformations of the data. The GLS model requires more explanation. In most of the cases, the estimation is obtained by quasi-differencing the data from the individual and/or the time means. The coefficients used to perform this quasi-difference depends on estimators of the variance of the components of the error, namely \(\sigma^2_\nu\), \(\sigma^2_\eta\) in case of individual effects and \(\sigma^2_\mu\) in case of time effects.

The most common technique used to estimate these variance is to use the following result :

\[ \frac{\mbox{E}(\epsilon^\top W \epsilon)}{N(T-1)} = \sigma_\nu^2 \]

and

\[ \frac{\mbox{E}(\epsilon^\top B \epsilon)}{N} = T \sigma_\eta^2 + \sigma_\nu^2 \]

where \(B\) and \(W\) are respectively the matrices that performs the individual (or time) means and the deviations from these means. Consistent estimators can be obtained by replacing the unknown errors by the residuals of a consistent preliminary estimation and by dropping the expecting value operator. Some degree of freedom correction can also be introduced. plm calls the general function ercomp to estimate the variances. Important arguments to ercomp are:

Note that when only one model is provided in models, this means that the same residuals are used to compute the two quadratic forms.

To illustrate the use of plm, we use examples reproduced in Baltagi (2013). Table 2.1 on page 21 presents EViews’ results of the estimation on the Grunfeld data set :

library("plm")
data("Grunfeld", package = "plm")
ols <- plm(inv ~ value + capital, Grunfeld, model = "pooling")
between <- update(ols, model = "between")
within <- update(ols, model = "within")
walhus <- update(ols, model = "random", random.method = "walhus", random.dfcor = 3)
amemiya <- update(walhus, random.method = "amemiya")
swar <- update(amemiya, random.method = "swar")

Note that the random.dfcor argument is set to 3, which means that the unbiased version of the estimation of the error components is used. We use the texreg package to present the results :

library("texreg")
screenreg(list(ols = ols, between = between, within = within, 
            walhus = walhus, amemiya = amemiya, swar = swar),
        digits = 5, omit.coef = "(Intercept)")
## 
## =================================================================================================
##            ols            between      within         walhus         amemiya        swar         
## -------------------------------------------------------------------------------------------------
## value        0.11556 ***   0.13465 **    0.11012 ***    0.10979 ***    0.10978 ***    0.10978 ***
##             (0.00584)     (0.02875)     (0.01186)      (0.01052)      (0.01048)      (0.01049)   
## capital      0.23068 ***   0.03203       0.31007 ***    0.30818 ***    0.30808 ***    0.30811 ***
##             (0.02548)     (0.19094)     (0.01735)      (0.01717)      (0.01718)      (0.01718)   
## -------------------------------------------------------------------------------------------------
## R^2          0.81241       0.85777       0.76676        0.76941        0.76954        0.76950    
## Adj. R^2     0.81050       0.81713       0.75311        0.76707        0.76720        0.76716    
## Num. obs.  200            10           200            200            200            200          
## s_idios                                                53.74518       52.76797       52.76797    
## s_id                                                   87.35803       83.52354       84.20095    
## =================================================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

The estimated variance can be extracted using the ercomp function. For example, for the amemiya model :

ercomp(amemiya)
##                   var std.dev share
## idiosyncratic 2784.46   52.77 0.285
## individual    6976.18   83.52 0.715
## theta: 0.8601

Baltagi (2013), p. 27, presents the Stata estimation of the Swamy-Arora estimator ; the Swamy-Arora estimator is the same if random.dfcor is set to 3 or 2 (the quadratic forms are divided by \(\sum_n T_n - K - N\) and by \(N - K - 1\)), so I don’t know what is the behaviour of Stata for the other estimators for which the unbiased estimators differs from the simple one.

data("Produc", package = "plm")
PrSwar <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, 
           model = "random", random.method = "swar", random.dfcor = 3)
summary(PrSwar)
## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, 
##     data = Produc, model = "random", random.method = "swar", 
##     random.dfcor = 3)
## 
## Balanced Panel: n = 48, T = 17, N = 816
## 
## Effects:
##                    var  std.dev share
## idiosyncratic 0.001454 0.038137 0.175
## individual    0.006838 0.082691 0.825
## theta: 0.8888
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -0.1067230 -0.0245520 -0.0023694  0.0217333  0.1996307 
## 
## Coefficients:
##                Estimate  Std. Error z-value  Pr(>|z|)    
## (Intercept)  2.13541100  0.13346149 16.0002 < 2.2e-16 ***
## log(pcap)    0.00443859  0.02341732  0.1895    0.8497    
## log(pc)      0.31054843  0.01980475 15.6805 < 2.2e-16 ***
## log(emp)     0.72967053  0.02492022 29.2803 < 2.2e-16 ***
## unemp       -0.00617247  0.00090728 -6.8033 1.023e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    29.209
## Residual Sum of Squares: 1.1879
## R-Squared:      0.95933
## Adj. R-Squared: 0.95913
## Chisq: 19131.1 on 4 DF, p-value: < 2.22e-16

The twoways effect model

The two-ways effect model is obtained by setting the effect argument to "twoways". Baltagi (2013) p. 44-46, presents EViews’ output for the Grunfeld data set.

Grw <- plm(inv ~ value + capital, Grunfeld, model = "random", effect = "twoways", 
           random.method = "walhus", random.dfcor = 3)
Grs <- update(Grw, random.method = "swar")
Gra <- update(Grw, random.method = "amemiya")
screenreg(list("Wallace-Hussain" = Grw, "Swamy-Arora" = Grs, "Amemiya" = Gra), digits = 5)
## 
## ==========================================================
##              Wallace-Hussain  Swamy-Arora    Amemiya      
## ----------------------------------------------------------
## (Intercept)  -57.81705 *      -57.86538 *    -63.89217 *  
##              (28.63258)       (29.39336)     (30.53284)   
## value          0.10978 ***      0.10979 ***    0.11145 ***
##               (0.01047)        (0.01053)      (0.01096)   
## capital        0.30807 ***      0.30819 ***    0.32353 ***
##               (0.01719)        (0.01717)      (0.01877)   
## ----------------------------------------------------------
## s_idios       55.33298         51.72452       51.72452    
## s_id          87.31428         84.23332       89.26257    
## s_time         0.00000          0.00000       15.77783    
## R^2            0.76956          0.76940        0.74898    
## Adj. R^2       0.76722          0.76706        0.74643    
## Num. obs.    200              200            200          
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

The estimated variance of the time component is negative for the Wallace-Hussain as well as the Swamy-Arora models and plm sets it to 0.

Baltagi (2009) p. 60-62, presents EViews’ output for the Produc data.

data("Produc", package = "plm")
Prw <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, Produc, 
           model = "random", random.method = "walhus", 
           effect = "twoways", random.dfcor = 3)
Prs <- update(Prw, random.method = "swar")
Pra <- update(Prw, random.method = "amemiya")
screenreg(list("Wallace-Hussain" = Prw, "Swamy-Arora" = Prs, "Amemiya" = Pra), digits = 5)
## 
## ==========================================================
##              Wallace-Hussain  Swamy-Arora    Amemiya      
## ----------------------------------------------------------
## (Intercept)    2.39200 ***      2.36350 ***    2.85210 ***
##               (0.13833)        (0.13891)      (0.18502)   
## log(pcap)      0.02562          0.01785        0.00221    
##               (0.02336)        (0.02332)      (0.02469)   
## log(pc)        0.25781 ***      0.26559 ***    0.21666 ***
##               (0.02128)        (0.02098)      (0.02438)   
## log(emp)       0.74180 ***      0.74490 ***    0.77005 ***
##               (0.02371)        (0.02411)      (0.02584)   
## unemp         -0.00455 ***     -0.00458 ***   -0.00398 ***
##               (0.00106)        (0.00102)      (0.00108)   
## ----------------------------------------------------------
## s_idios        0.03571          0.03429        0.03429    
## s_id           0.08244          0.08279        0.15390    
## s_time         0.01595          0.00984        0.02608    
## R^2            0.92915          0.93212        0.85826    
## Adj. R^2       0.92880          0.93178        0.85756    
## Num. obs.    816              816            816          
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

Unbalanced panels

Two difficulties arise with unbalanced panels :

Baltagi (2013) and Baltagi (2009) present results of the estimation of the Swamy and Arora (1972) model with the Hedonic data set. Baltagi (2013), p. 174, presents the Stata output as Baltagi (2009), p. 211 presents EViews’ output. EViews’ Wallace-Hussain estimator is reported in Baltagi (2009), p. 210.

data("Hedonic", package = "plm")
form <- mv ~ crim + zn + indus + chas + nox + rm + 
    age + dis + rad + tax + ptratio + blacks + lstat
HedStata <- plm(form, Hedonic, model = "random", index = "townid", 
                random.models = c("within", "between"))
HedEviews <- plm(form, Hedonic, model = "random", index = "townid", 
                 random.models = c("within", "Between"))
HedEviewsWH <- update(HedEviews, random.models = "pooling")
screenreg(list(EViews = HedEviews, Stata = HedStata, "Wallace-Hussain" = HedEviewsWH), 
          digits = 5, single.row = TRUE)
## 
## ======================================================================================
##              EViews                   Stata                    Wallace-Hussain        
## --------------------------------------------------------------------------------------
## (Intercept)    9.68587 (0.19751) ***    9.67780 (0.20714) ***    9.68443 (0.19922) ***
## crim          -0.00741 (0.00105) ***   -0.00723 (0.00103) ***   -0.00738 (0.00105) ***
## zn             0.00008 (0.00065)        0.00004 (0.00069)        0.00007 (0.00066)    
## indus          0.00156 (0.00403)        0.00208 (0.00434)        0.00165 (0.00409)    
## chasyes       -0.00442 (0.02921)       -0.01059 (0.02896)       -0.00565 (0.02916)    
## nox           -0.00584 (0.00125) ***   -0.00586 (0.00125) ***   -0.00585 (0.00125) ***
## rm             0.00906 (0.00119) ***    0.00918 (0.00118) ***    0.00908 (0.00119) ***
## age           -0.00086 (0.00047)       -0.00093 (0.00046) *     -0.00087 (0.00047)    
## dis           -0.14442 (0.04409) **    -0.13288 (0.04568) **    -0.14236 (0.04439) ** 
## rad            0.09598 (0.02661) ***    0.09686 (0.02835) ***    0.09614 (0.02692) ***
## tax           -0.00038 (0.00018) *     -0.00037 (0.00019) *     -0.00038 (0.00018) *  
## ptratio       -0.02948 (0.00907) **    -0.02972 (0.00975) **    -0.02951 (0.00919) ** 
## blacks         0.56278 (0.10197) ***    0.57506 (0.10103) ***    0.56520 (0.10179) ***
## lstat         -0.29107 (0.02393) ***   -0.28514 (0.02385) ***   -0.28991 (0.02391) ***
## --------------------------------------------------------------------------------------
## s_idios        0.13025                  0.13025                  0.14050              
## s_id           0.11505                  0.12974                  0.12698              
## R^2            0.99091                  0.99029                  0.99081              
## Adj. R^2       0.99067                  0.99004                  0.99057              
## Num. obs.    506                      506                      506                    
## ======================================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

The difference is due to the fact that Stata uses a between regression on \(N\) observations while EViews uses a between regression on \(\sum_n T_n\) observations, which are not the same on unbalanced panels. Note the use of between with or without the B capitalized in the random.models argument. plm’s default is to use the between regression with \(\sum_n T_n\) observations when setting model = "random", random.method = "swar". The default employed is what the original paper for the unbalanced one-way Swamy-Arora estimator defined (in Baltagi and Chang (1994), p. 73). A more detailed analysis of Stata’s Swamy-Arora estimation procedure is given by Cottrell (2017).

Instrumental variable estimators

All of the models presented above may be estimated using instrumental variables (IV). The instruments are specified using two- or three-part formulas, each part being separated by a | sign :

The instrumental variables estimator used is indicated with the inst.method argument:

The various possible values of the inst.method argument are not relevant for fixed effect IV models as there is only one method for this type of IV models but many for random effect IV models.

The instrumental variable estimators are illustrated in the following example from Baltagi (2005), p. 120/ Baltagi (2013), p. 137.

data("Crime", package = "plm")
crbalt <- plm(lcrmrte ~ lprbarr + lpolpc + lprbconv + lprbpris + lavgsen +
              ldensity + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed +
              lwsta + lwloc + lpctymle + lpctmin + region + smsa + factor(year)
              | . - lprbarr - lpolpc + ltaxpc + lmix,
              data = Crime, model = "random", inst.method = "baltagi")
crbvk <- update(crbalt, inst.method = "bvk")
crwth <- update(crbalt, model = "within")
screenreg(list(FE2SLS = crwth, EC2SLS = crbalt, G2SLS = crbvk), 
          single.row = TRUE, digits = 5, omit.coef = "(region)|(year)",
          reorder.coef = c(1:16, 19, 18, 17))
## 
## ==================================================================================
##              FE2SLS               EC2SLS                   G2SLS                  
## ----------------------------------------------------------------------------------
## lprbarr       -0.57551 (0.80218)   -0.41293 (0.09740) ***   -0.41414 (0.22105)    
## lpolpc         0.65753 (0.84687)    0.43475 (0.08970) ***    0.50495 (0.22778) *  
## lprbconv      -0.42314 (0.50194)   -0.32289 (0.05355) ***   -0.34325 (0.13246) ** 
## lprbpris      -0.25026 (0.27946)   -0.18632 (0.04194) ***   -0.19005 (0.07334) ** 
## lavgsen        0.00910 (0.04899)   -0.01018 (0.02702)       -0.00644 (0.02894)    
## ldensity       0.13941 (1.02124)    0.42903 (0.05485) ***    0.43434 (0.07115) ***
## lwcon         -0.02873 (0.05351)   -0.00748 (0.03958)       -0.00430 (0.04142)    
## lwtuc          0.03913 (0.03086)    0.04545 (0.01979) *      0.04446 (0.02154) *  
## lwtrd         -0.01775 (0.04531)   -0.00814 (0.04138)       -0.00856 (0.04198)    
## lwfir         -0.00934 (0.03655)   -0.00364 (0.02892)       -0.00403 (0.02946)    
## lwser          0.01859 (0.03882)    0.00561 (0.02013)        0.01056 (0.02158)    
## lwmfg         -0.24317 (0.41955)   -0.20414 (0.08044) *     -0.20180 (0.08394) *  
## lwfed         -0.45134 (0.52712)   -0.16351 (0.15945)       -0.21346 (0.21510)    
## lwsta         -0.01875 (0.28082)   -0.05405 (0.10568)       -0.06012 (0.12031)    
## lwloc          0.26326 (0.31239)    0.16305 (0.11964)        0.18354 (0.13968)    
## lpctymle       0.35112 (1.01103)   -0.10811 (0.13969)       -0.14587 (0.22681)    
## smsayes                            -0.22515 (0.11563)       -0.25955 (0.14997)    
## lpctmin                             0.18904 (0.04150) ***    0.19488 (0.04594) ***
## (Intercept)                        -0.95380 (1.28397)       -0.45385 (1.70298)    
## ----------------------------------------------------------------------------------
## R^2            0.44364              0.59847                  0.59230              
## Adj. R^2       0.32442              0.58115                  0.57472              
## Num. obs.    630                  630                      630                    
## s_idios                             0.14924                  0.14924              
## s_id                                0.21456                  0.21456              
## ==================================================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

The Hausman-Taylor model (Hausman and Taylor (1981)) may be estimated with the plm function by setting argument random.method = "ht". The following example is from Baltagi (2005), p. 130 and Baltagi (2013), p. 146.

data("Wages", package = "plm")
ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp^2) + 
            bluecol + ind + union + sex + black + ed | 
            bluecol + south + smsa + ind + sex + black | 
            wks + married + exp + I(exp^2) + union, 
          data = Wages, index = 595, 
          inst.method = "baltagi", model = "random", 
          random.method = "ht")

am <- update(ht, inst.method = "am")
  
screenreg(list("Hausman-Taylor" = ht, "Amemiya-MaCurdy" = am), 
          digits = 5, single.row = TRUE)
## 
## ===============================================================
##              Hausman-Taylor            Amemiya-MaCurdy         
## ---------------------------------------------------------------
## (Intercept)     2.91273 (0.28365) ***     2.92734 (0.27513) ***
## wks             0.00084 (0.00060)         0.00084 (0.00060)    
## southyes        0.00744 (0.03196)         0.00728 (0.03194)    
## smsayes        -0.04183 (0.01896) *      -0.04195 (0.01895) *  
## marriedyes     -0.02985 (0.01898)        -0.03009 (0.01897)    
## exp             0.11313 (0.00247) ***     0.11297 (0.00247) ***
## I(exp^2)       -0.00042 (0.00005) ***    -0.00042 (0.00005) ***
## bluecolyes     -0.02070 (0.01378)        -0.02085 (0.01377)    
## ind             0.01360 (0.01524)         0.01363 (0.01523)    
## unionyes        0.03277 (0.01491) *       0.03248 (0.01489) *  
## sexfemale      -0.13092 (0.12666)        -0.13201 (0.12660)    
## blackyes       -0.28575 (0.15570)        -0.28590 (0.15549)    
## ed              0.13794 (0.02125) ***     0.13720 (0.02057) ***
## ---------------------------------------------------------------
## s_idios         0.15180                   0.15180              
## s_id            0.94180                   0.94180              
## R^2             0.60945                   0.60948              
## Adj. R^2        0.60833                   0.60835              
## Num. obs.    4165                      4165                    
## ===============================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

Nested error component model

Baltagi, Song, and Jung (2001)

data("Produc", package = "plm")
swar <- plm(form <- log(gsp) ~ log(pc) + log(emp) + log(hwy) + log(water) + log(util) + unemp, 
            Produc, index = c("state", "year", "region"), effect = "nested", random.method = "swar")
walhus <- update(swar, random.method = "walhus")
amem <- update(swar, random.method = "amemiya")
screenreg(list("Swamy-Arora" = swar, "Wallace-Hussain" = walhus, "Amemiya" = amem), digits = 5)
## 
## ==========================================================
##              Swamy-Arora    Wallace-Hussain  Amemiya      
## ----------------------------------------------------------
## (Intercept)    2.08921 ***    2.08165 ***      2.13133 ***
##               (0.14570)      (0.15035)        (0.16014)   
## log(pc)        0.27412 ***    0.27256 ***      0.26448 ***
##               (0.02054)      (0.02093)        (0.02176)   
## log(emp)       0.73984 ***    0.74164 ***      0.75811 ***
##               (0.02575)      (0.02607)        (0.02661)   
## log(hwy)       0.07274 ***    0.07493 ***      0.07211 ** 
##               (0.02203)      (0.02235)        (0.02363)   
## log(water)     0.07645 ***    0.07639 ***      0.07616 ***
##               (0.01386)      (0.01387)        (0.01402)   
## log(util)     -0.09437 ***   -0.09523 ***     -0.10151 ***
##               (0.01677)      (0.01677)        (0.01705)   
## unemp         -0.00616 ***   -0.00615 ***     -0.00584 ***
##               (0.00090)      (0.00091)        (0.00091)   
## ----------------------------------------------------------
## s_idios        0.03676        0.03762          0.03676    
## s_id           0.06541        0.06713          0.08306    
## s_gp           0.03815        0.05239          0.04659    
## R^2            0.97387        0.97231          0.96799    
## Adj. R^2       0.97368        0.97210          0.96776    
## Num. obs.    816            816              816          
## ==========================================================
## *** p < 0.001, ** p < 0.01, * p < 0.05

Bibliography

Amemiya, T. 1971. “The Estimation of the Variances in a Variance–Components Model.” International Economic Review 12: 1–13.

Amemiya, Takeshi, and Thomas E MaCurdy. 1986. “Instrumental-Variable Estimation of an Error-Components Model.” Econometrica 54 (4): 869–80.

Balestra, P., and J. Varadharajan–Krishnakumar. 1987. “Full Information Estimations of a System of Simultaneous Equations with Error Components.” Econometric Theory 3: 223–46.

Baltagi, B.H. 2009. A Companion to Econometric Analysis of Panel Data. John Wiley; Sons ltd.

———. 2013. Econometric Analysis of Panel Data. 5th ed. John Wiley; Sons ltd.

Baltagi, B.H. 1981. “Simultaneous Equations with Error Components.” Journal of Econometrics 17: 21–49.

———. 2005. Econometric Analysis of Panel Data. 3rd ed. John Wiley; Sons ltd.

Baltagi, B.H., and Y.J. Chang. 1994. “Incomplete Panels: A Comparative Study of Alternative Estimators for the Unbalanced One-Way Error Component Regression Model.” Journal of Econometrics 62: 67–89.

Baltagi, B.H., S.H. Song, and B.C. Jung. 2001. “The Unbalanced Nested Error Component Regression Model.” Journal of Econometrics 101: 357–81.

Breusch, Trevor S, Grayham E Mizon, and Peter Schmidt. 1989. “Efficient Estimation Using Panel Data.” Econometrica 57 (3): 695–700.

Cottrell, A. 2017. “Random Effects Estimators for Unbalanced Panel Data: A Monte Carlo Analysis.” Gretl Working Papers, no. 4. https://EconPapers.repec.org/RePEc:anc:wgretl:4.

Hausman, J.A., and W.E. Taylor. 1981. “Panel Data and Unobservable Individual Effects.” Econometrica 49: 1377–98.

Nerlove, M. 1971. “Further Evidence on the Estimation of Dynamic Economic Relations from a Time–Series of Cross–Sections.” Econometrica 39: 359–82.

Swamy, P.A.V.B., and S.S Arora. 1972. “The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models.” Econometrica 40: 261–75.

Wallace, T.D., and A. Hussain. 1969. “The Use of Error Components Models in Combining Cross Section with Time Series Data.” Econometrica 37 (1): 55–72.