CRAN Task View: Econometrics
Base R ships with a lot of functionality useful for computational
econometrics, in particular in the stats package. This
functionality is complemented by many packages on CRAN; a brief overview
is given below. There is also considerable overlap between the tools
for econometrics in this view and those in the task views on
is a suitable mailing list for obtaining help
and discussing questions about both computational finance and econometrics.
The packages in this view can be roughly structured into the following topics.
If you think that some package is missing from the list, please contact the maintainer.
Basic linear regression
Estimation and standard inference
: Ordinary least squares (OLS) estimation for linear models is provided
(from stats) and standard tests for model comparisons are available in various
methods such as
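The sketch below shows what OLS estimation and the usual t test compute for a single regressor plus intercept. It is written in plain Python (standard library only) purely for illustration. It is not how the R functions referred to above are implemented, and the name `ols_simple` is hypothetical.

```python
import math


def ols_simple(x, y):
    """OLS fit of y = a + b*x + e with a classical (homoskedastic) SE for b."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope estimate
    a = my - b * mx                    # intercept: fitted line passes through the means
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)             # residual variance with n - 2 degrees of freedom
    se_b = math.sqrt(sigma2 / sxx)
    return a, b, se_b


a, b, se_b = ols_simple([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"a={a:.3f} b={b:.3f} se(b)={se_b:.4f} t={b / se_b:.1f}")  # t for H0: b = 0
```

Comparing nested models generalizes this idea: the same residual sums of squares feed the F (or Chi-squared) statistics.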
Further inference and nested model comparisons
: Functions analogous to
that also support asymptotic tests (Chi-squared instead of
tests) and plug-in of other covariance
Tests of more general linear hypotheses are implemented in
and for nonlinear hypotheses in
Robust standard errors
: HC and HAC covariance matrices are available in
and can be plugged into the inference functions mentioned above.
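The idea behind heteroskedasticity-consistent (HC) covariances can be sketched for the slope of a simple regression. The classical variance sigma^2 / Sxx is replaced by a "sandwich" that weights each squared residual by its own (x_i - mean(x))^2. The plain-Python sketch below uses the hypothetical name `ols_slope_hc` and computes the HC0 variant and the HC1 variant with a degrees-of-freedom correction. HAC estimators, which also account for autocorrelation, are not covered here.

```python
import math


def ols_slope_hc(x, y):
    """Slope of y = a + b*x + e with classical, HC0 and HC1 standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [xi - mx for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - my) for d, yi in zip(dx, y)) / sxx
    a = my - b * mx
    e = [yi - a - b * xi for xi, yi in zip(x, y)]          # OLS residuals
    # sandwich "meat" for the slope: sum of (x_i - mean x)^2 * e_i^2
    meat = sum(d * d * ei * ei for d, ei in zip(dx, e))
    var_hc0 = meat / sxx ** 2
    var_hc1 = var_hc0 * n / (n - 2)                         # df correction with k = 2
    var_classic = sum(ei * ei for ei in e) / (n - 2) / sxx
    return b, math.sqrt(var_classic), math.sqrt(var_hc0), math.sqrt(var_hc1)


# error spread grows with x, so the HC and classical SEs differ
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1 + 2 * xi + (0.3 * xi if xi % 2 else -0.3 * xi) for xi in x]
print(ols_slope_hc(x, y))
```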
Nonnested model comparisons
: Various tests for comparing non-nested linear
models are available in
(encompassing test, J test, Cox test).
The Vuong test for comparing other non-nested models is provided by
(and specifically for count data regression in
: The packages
provide a large collection
of regression diagnostics and diagnostic tests.
Generalized linear models (GLMs)
: Many standard microeconometric models belong to the
family of generalized linear models and can be fitted by
from package stats. This includes in particular logit and probit models
for modeling choice data and Poisson models for count data. Effects for typical
values of regressors in these models can be obtained and visualized using
Marginal effects tables for certain GLMs can be obtained using the
package. Interactive visualizations of both effects and marginal
effects are possible in
: The standard logit and probit models (among many others) for binary
responses are GLMs that can be estimated by
family = binomial.
Bias-reduced GLMs that are robust to complete and quasi-complete separation are provided by
brglm. Discrete choice models estimated by simulated maximum likelihood are in
Rchoice. Heteroskedastic probit models (and other heteroskedastic
GLMs) are implemented in
along with parametric link functions and goodness-of-link
tests for GLMs.
: The basic Poisson regression is a GLM that can be estimated by
family = poisson
as explained above.
Negative binomial GLMs are available via
Another implementation of negative binomial models
is provided by
aod, which also contains other models for overdispersed
data. Zero-inflated and hurdle count models are provided in package
A reimplementation by the same authors is currently under development in
on R-Forge, which also encompasses separate functions for zero-truncated regression,
finite mixture models etc.
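For the canonical log link, maximum likelihood estimation of a Poisson GLM takes only a few Newton-Raphson steps. For canonical links these steps coincide with Fisher scoring and IRLS. The following is a minimal plain-Python sketch with a single regressor. It assumes the sample mean of y is positive, and the name `poisson_regression` is hypothetical.

```python
import math


def poisson_regression(x, y, tol=1e-10, max_iter=100):
    """ML fit of log E[y | x] = a + b*x by Newton-Raphson."""
    a, b = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # score vector U = sum_i (y_i - mu_i) * (1, x_i)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # information matrix I = sum_i mu_i * [[1, x_i], [x_i, x_i^2]]
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step (da, db) = I^{-1} U via the closed-form 2x2 inverse
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            return a, b
    raise RuntimeError("Newton-Raphson did not converge")


a, b = poisson_regression([0, 0, 0, 1, 1, 1], [1, 2, 3, 4, 6, 8])
print(math.exp(a), math.exp(a + b))   # fitted means for x = 0 and x = 1
```

With a single binary regressor, the fitted means reproduce the two group means (here 2 and 6). This makes a convenient sanity check.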
: Multinomial models
with individual-specific covariates only are available in
nnet. Implementations with both individual- and
choice-specific variables are
multinomial logit models (e.g., with random effects etc.) are in
Generalized additive models
(GAMs) for multinomial responses can be fitted with the
A Bayesian approach to multinomial probit models is provided by
Various Bayesian multinomial models (including logit and probit) are available in
bayesm. Furthermore, the package
hierarchical Bayesian specifications based on direct specification of the likelihood
: Proportional-odds regression for ordered responses is implemented in
MASS. The package
provides cumulative link models for ordered data, which encompass proportional
odds models but also includes more general specifications. Bayesian ordered probit
models are provided by
: Basic censored regression models (e.g., tobit models)
can be fitted by
survival, a convenience
is in package
AER. Further censored
regression models, including models for panel data, are provided in
Interval regression models are in
intReg. Censored regression models with
conditional heteroskedasticity are in
Furthermore, hurdle models for left-censored data at zero can be estimated with
mhurdle. Models for sample selection are available in
and semiparametric extensions of these are provided by
corrects for selection bias when the sample is the
result of a stable matching process (e.g., a group formation or college admissions problem).
for truncated Gaussian responses.
Fraction and proportion responses
: Fractional response models are in
Beta regression for responses in (0, 1) is in
: Further, more refined tools for microeconometrics are provided in
family of packages: Analysis with
Cobb-Douglas, translog, and quadratic functions is in
the constant elasticity of scale (CES) function is in
the symmetric normalized quadratic profit (SNQP) function is in
The almost ideal demand system (AIDS) is in
Stochastic frontier analysis (SFA) is in
and certain special cases also in
Semiparametric SFA is available in
and spatial SFA in
implements a Bayesian approach to microeconometrics and marketing.
Estimation and marginal effect computations for multivariate probit models can be carried out with
Inference for relative distributions is contained in package
Basic instrumental variables (IV) regression
: Two-stage least squares (2SLS)
is provided by
AER. Other implementations are in
focus on multiple group fixed effects).
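Two-stage least squares runs two regressions. The first regresses the endogenous regressor on the instrument, and the second regresses y on the first-stage fitted values. The plain-Python sketch below (hypothetical names `_fit` and `tsls_simple`) covers one endogenous regressor and one instrument. In this just-identified case the slope equals cov(z, y) / cov(z, x). Standard errors from a naive second-stage regression are not valid because they use residuals based on the fitted rather than the observed regressor, so dedicated IV functions are preferable in practice.

```python
def _fit(x, y):
    """OLS intercept and slope of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)   # zero if the instrument has no first-stage power
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def tsls_simple(y, x, z):
    """2SLS with one endogenous regressor x and one instrument z."""
    c, d = _fit(z, x)                    # first stage: x on z
    x_hat = [c + d * zi for zi in z]     # fitted (exogenous) part of x
    return _fit(x_hat, y)                # second stage: y on fitted x


z = [1, 2, 3, 4, 5]
x = [1.2, 1.9, 3.1, 4.2, 4.8]
y = [3.1, 4.7, 7.4, 9.0, 10.2]
print(tsls_simple(y, x, z))
```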
: An IV probit model via GLS estimation
is available in
local average response functions for binary treatments and binary instruments.
: Certain basic IV models for panel data can also be estimated
with standard 2SLS functions (see above). Dedicated IV panel data models are provided
(fixed effects) and
(between and random effects).
estimates Bayesian IV models with conditional Bayes factors.
implements the Lewbel approach based on GMM estimation of triangular systems using heteroscedasticity-based IVs.
Panel data models
Panel-corrected standard errors
: A simple approach for panel data is
to fit the pooling (or independence) model (e.g., via
and only correct the standard errors. Different types of panel-corrected standard
errors are available in
geepack, respectively. The latter two require estimation of the
pooling/independence models via
the respective packages (which also provide other types of models, see below).
Linear panel models
plm, providing a wide range of within,
between, and random-effect methods (among others) along with corrected standard
errors, tests, etc. Another implementation of several of these models is in
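The within (fixed-effects) estimator can be sketched in a few lines. Demeaning x and y by unit removes any time-constant individual effect, and OLS on the demeaned data gives the within slope. This is plain Python with a single regressor. The name `within_slope` is hypothetical, and no standard errors are computed.

```python
from collections import defaultdict


def within_slope(ids, x, y):
    """Fixed-effects (within) slope: OLS on unit-demeaned x and y."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # unit -> [sum of x, sum of y, count]
    for i, xi, yi in zip(ids, x, y):
        s = sums[i]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for i, xi, yi in zip(ids, x, y):
        sx, sy, n = sums[i]
        xd, yd = xi - sx / n, yi - sy / n       # deviations from the unit means
        num += xd * yd
        den += xd * xd
    return num / den


# two units with different intercepts (10 and -5) but a common slope of 1.5
ids = [1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 4, 5, 6]
y = [10 + 1.5 * xi for xi in x[:3]] + [-5 + 1.5 * xi for xi in x[3:]]
print(within_slope(ids, x, y))
```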
Generalized estimating equations and GLMs
: GEE models for panel data (or longitudinal
data in statistical jargon) are in
estimation of GLM-like models for panel data.
Mixed effects models
: Linear and nonlinear models for panel data (and more
general multi-level data) are available in
ivpanel, see also above.
Heterogeneous time trends
offers the possibility of
analyzing panel data with large dimensions n and T and can be considered when
the unobserved heterogeneity effects are time-varying.
Multiple group fixed effects are in
Autocorrelation and heteroskedasticity corrections are available in
PANIC tests of nonstationarity are in
Threshold regression and unit root tests are in
The panel data approach method for program evaluation is available in
Further regression models
Nonlinear least squares modeling
in package stats.
(including linear, nonlinear, censored,
locally polynomial and additive quantile regressions).
Generalized method of moments (GMM) and generalized empirical likelihood (GEL)
Spatial econometric models
view gives details about
handling spatial data, along with information about (regression) modeling. In particular,
spatial regression models can be fitted using
latter using a GMM approach).
is a package for spatial panel
models. Spatial probit models are available in
Bayesian model averaging (BMA)
: A comprehensive toolbox for BMA is provided by
including flexible prior selection, sampling, etc. A different implementation
for linear models, generalized linear models and survival models (Cox regression).
Linear structural equation models
See also the
task view for more details.
Simultaneous equation estimation
Nonparametric kernel methods
Linear and nonlinear mixed-effect models
Generalized additive models (GAMs)
Extreme bounds analysis
: The packages
provide several tools for extended
handling of (generalized) linear regression models.
is a unified
easy-to-use interface to a wide range of regression models.
Time series data and models
task view provides much more detailed
information about both basic time series infrastructure and time series models.
Here, only the most important aspects relating to econometrics are briefly mentioned.
Time series models for financial econometrics (e.g., GARCH, stochastic volatility models, or
stochastic differential equations, etc.) are described in the
Infrastructure for regularly spaced time series
: The class
in package stats is R's standard class for
regularly spaced time series (especially annual, quarterly, and monthly data). It can be
coerced back and forth without loss of information to
Infrastructure for irregularly spaced time series
provides infrastructure for
both regularly and irregularly spaced time series (the latter via the class
"zoo") where the time information can be of arbitrary class.
This includes daily series (typically with
or intra-day series (e.g., with
time index). An extension
geared towards time series with different kinds of
time index is
xts. Further packages aimed particularly at
finance applications are discussed in the
Classical time series models
: Simple autoregressive models can be estimated
and ARIMA modeling and Box-Jenkins-type analysis can be
carried out with
(both in the stats package). An enhanced
Linear regression models
: A convenience interface to
for estimating OLS and 2SLS models based on time series data is
Estimation of linear regression models with AR error terms via GLS is possible
Structural time series models
: Standard models can be fitted with
Further packages are discussed in the
Filtering and decomposition
in stats. The basic function for computing filters (both rolling and autoregressive) is
in stats. Many extensions to these methods, in particular for
forecasting and model selection, are provided in the
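The two kinds of filters described above work as follows. A rolling (convolution) filter forms a weighted sum of current and lagged inputs, and an autoregressive (recursive) filter feeds past outputs back in. The plain-Python sketch below assumes a one-sided convolution and zero starting values. Its helper names are hypothetical and are not the stats API.

```python
def convolution_filter(x, coefs):
    """One-sided rolling filter: y[t] = sum_j coefs[j] * x[t - j]; None until enough data."""
    p = len(coefs)
    out = []
    for t in range(len(x)):
        if t < p - 1:
            out.append(None)            # not enough lags yet
        else:
            out.append(sum(c * x[t - j] for j, c in enumerate(coefs)))
    return out


def recursive_filter(x, coefs):
    """Autoregressive filter: y[t] = x[t] + sum_j coefs[j] * y[t - 1 - j], zero start values."""
    y = []
    for t, xt in enumerate(x):
        acc = xt
        for j, c in enumerate(coefs):
            if t - 1 - j >= 0:          # earlier outputs are treated as zero
                acc += c * y[t - 1 - j]
        y.append(acc)
    return y


print(convolution_filter([1, 2, 3, 4], [0.5, 0.5]))   # two-term moving average
print(recursive_filter([1, 2, 3], [1]))               # coefficient 1 gives a cumulative sum
```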
: Simple models can be fitted by
in stats, more
elaborate models are provided in package
along with suitable diagnostics,
visualizations etc. A Bayesian approach is available in
Unit root and cointegration tests
- Threshold and smooth transition models.
and other econometric methods for mixed frequency time series data analysis.
- GEneral-To-Specific (GETS) model selection for either ARX models with log-ARCH-X errors, or a log-ARCH-X model of the log variance.
- Time series factor analysis.
- Asymmetric price transmission models.
Textbooks and journals
contain comprehensive collections of data sets from various standard econometric
textbooks as well as several data sets from the Journal of
Applied Econometrics and the Journal of Business & Economic Statistics
additionally provides an extensive set of
examples reproducing analyses from the textbooks/papers, illustrating
various econometric methods.
Tsay's 'Analysis of Financial Time Series'
is the R companion to Tsay's 'Analysis of
Financial Time Series' (2nd ed., 2005, Wiley) containing data sets, functions
and script files required to work through some of the examples.
Canadian monetary aggregates
Penn World Table
provides versions 5.6, 6.x, and 7.x. Versions 8.x and 9.x
data are available in
Time series and forecasting data
: The packages
data packages with time series data
from the books 'Forecasting with Exponential Smoothing: The State Space Approach'
(Hyndman, Koehler, Ord, Snyder, 2008, Springer) and 'Forecasting: Methods and Applications'
(Makridakis, Wheelwright, Hyndman, 3rd ed., 1998, Wiley) and the M-competitions,
Empirical Research in Economics
contains functions and datasets for the book
'Empirical Research in Economics: Growing up with R' (Sun, forthcoming).
Panel Study of Income Dynamics (PSID)
can build panel data sets
from the Panel Study of Income Dynamics (PSID).
US state- and county-level panel data:
World Bank data and statistics: The
programmatic access to the World Bank API.
: As a vector- and matrix-based language, base R
ships with many powerful tools for doing matrix manipulations, which are
complemented by the packages
Optimization and mathematical programming
: R and many of its contributed
packages provide many specialized functions for solving particular optimization
problems, e.g., in regression as discussed above. Further functionality for
solving more general optimization problems, e.g., likelihood maximization, is
discussed in the
: In addition to the recommended
there are some other general bootstrapping techniques available in
as well as some bootstrap techniques
designed for time-series data, such as the maximum entropy bootstrap in
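The basic nonparametric bootstrap resamples observations with replacement, recomputes a statistic on each resample, and uses the spread of the replicates to estimate the statistic's standard error. The following is a minimal plain-Python sketch. The name `bootstrap_se` is hypothetical, and a fixed seed is used for reproducibility. It assumes i.i.d. observations, which is the assumption that time-series bootstraps such as the maximum entropy bootstrap are designed to relax.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, reps=2000, seed=1):
    """Bootstrap standard error of `stat` from `reps` resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [stat(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(replicates)


data = list(range(1, 21))
print(bootstrap_se(data))   # close to the population SD of the data divided by sqrt(20)
```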
: For measuring inequality, concentration and poverty the
provides some basic tools such as Lorenz curves,
Pen's parade, the Gini coefficient and many more.
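The Lorenz curve plots the cumulative share of total income against the cumulative population share, with incomes sorted in ascending order. One common form of the Gini coefficient is G = sum_i (2i - n - 1) x_(i) / (n * sum_i x_i). The following is a plain-Python sketch. The names are hypothetical, and no small-sample correction is applied.

```python
def lorenz_curve(incomes):
    """Points (p, L(p)) of the empirical Lorenz curve, starting at (0, 0)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, cum = [(0.0, 0.0)], 0.0
    for i, v in enumerate(xs, start=1):
        cum += v
        points.append((i / n, cum / total))   # population share, income share
    return points


def gini(incomes):
    """Gini coefficient sum_i (2i - n - 1) x_(i) / (n * sum x), x sorted ascending."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need a non-empty sample with a positive total")
    return sum((2 * i - n - 1) * v for i, v in enumerate(xs, start=1)) / (n * total)


print(gini([5, 5, 5, 5]), gini([0, 0, 0, 1]))   # perfect equality vs. one person holds all
```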
: R is particularly strong when dealing with
structural changes and changepoints in parametric models, see
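The classical Chow test for a single, known breakpoint is a basic example of a structural change test. It compares the residual sum of squares of one pooled regression with that of separate regressions before and after the break: F = ((RSS_pooled - RSS_1 - RSS_2) / k) / ((RSS_1 + RSS_2) / (n - 2k)). The following is a plain-Python sketch for a simple regression (k = 2) with hypothetical names. Tests with unknown break dates go well beyond it.

```python
def _rss(x, y):
    """Residual sum of squares from an OLS fit of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, split):
    """Chow F statistic for one known break before index `split`.

    Each regime has k = 2 coefficients (intercept and slope); under the null of
    no break and i.i.d. normal errors, F follows an F(k, n - 2k) distribution.
    """
    k, n = 2, len(x)
    if split < k + 1 or n - split < k + 1:
        raise ValueError("each regime needs more observations than coefficients")
    rss_pooled = _rss(x, y)
    rss_split = _rss(x[:split], y[:split]) + _rss(x[split:], y[split:])
    return ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))


x = list(range(20))
noise = [0.1 if i % 2 else -0.1 for i in range(20)]
y_break = [xi + e if xi < 10 else 10 + 3 * xi + e for xi, e in zip(x, noise)]
print(chow_f(x, y_break, 10))   # compare with an F(2, 16) critical value
```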
Exchange rate regimes
: Methods for inference about exchange
rate regimes, in particular in a structural change setting, are provided
Global value chains
: Tools and decompositions for global value
chains are in
Regression discontinuity design
: A variety of methods are provided in
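A basic sharp regression discontinuity estimate fits separate linear regressions on each side of the cutoff within a bandwidth. The estimated effect is the difference between the two fitted values at the cutoff. The following is a plain-Python sketch with hypothetical names. It assumes a uniform kernel and a user-chosen bandwidth, and it omits data-driven bandwidth selection and robust inference.

```python
def _line_intercept(points):
    """Intercept of an OLS line through (u, y) points, i.e. the fitted value at u = 0."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two observations per side")
    mu = sum(u for u, _ in points) / n
    my = sum(y for _, y in points) / n
    suu = sum((u - mu) ** 2 for u, _ in points)
    if suu == 0:
        raise ValueError("need at least two distinct running-variable values per side")
    b = sum((u - mu) * (y - my) for u, y in points) / suu
    return my - b * mu


def local_linear_rd(x, y, cutoff, bandwidth):
    """Sharp RD estimate: jump between local linear fits at the cutoff (uniform kernel)."""
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    return _line_intercept(right) - _line_intercept(left)


x = [i / 10 for i in range(-20, 21)]
y = [1 + 0.5 * xi if xi < 0 else 3 + 0.5 * xi for xi in x]   # true jump of 2 at 0
print(local_linear_rd(x, y, cutoff=0.0, bandwidth=1.0))
```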