INDperform

Overview

INDperform is an R package for validating the performance of ecological state indicators and assessing the ecological status based on a suite of indicators.

Finding suitable state indicators (IND) is challenging and cumbersome in stochastic and complex ecological systems. In particular, features associated with an indicator's performance, such as sensitivity or robustness, are often neglected due to the lack of quantitative validation tools. INDperform implements a novel quantitative framework for selecting and validating the performance of state indicators tailored to meet regional conditions and specific management needs, as described in Otto et al. (2018). The package builds upon tidy data principles and offers functions to identify temporal indicator trends, model indicator responses to single pressures (accounting for non-linear relationships and temporal autocorrelation), assess the robustness of these models, and score and compare the resulting indicator performances.

These functions can be executed on any number of indicators and pressures. Based on these analyses and a scoring scheme for selected criteria, the individual performances can be quantified, visualized, and compared. The combination of tools provided in this package can help make state indicators operational under given management schemes such as the EU Marine Strategy Framework Directive.

Installation

Install the development version from GitHub using devtools (soon also on CRAN):

# install.packages("devtools")
devtools::install_github("saskiaotto/INDperform")

If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions, feel free to email me any time.

Cheatsheet

Usage

INDperform offers functions that can, to some extent, be applied individually but mostly build upon each other to follow the 7-step process proposed in Otto et al. (2018) (see also the package's cheat sheet for detailed instructions). For demonstration purposes, the package provides a dataset of food web indicators and pressure variables from the Central Baltic Sea (modified from Otto et al., 2018).

This is a suggested workflow demonstrated on the example data included in the package:

library(INDperform)
# Using the demo data
head(ind_ex)
head(press_ex)
head(press_type_ex)
# Scoring template:
crit_scores_tmpl


# Trend modeling -------------

m_trend <- model_trend(ind_tbl = ind_ex[ ,-1],
  time = ind_ex$Year)
# Model diagnostics
pd <- plot_diagnostics(model_list = m_trend$model)
pd$all_plots[[1]] # first indicator
# Inspect trends
pt <- plot_trend(m_trend)
pt$TZA # shows trend of TZA indicator


# Indicator response modeling ------------

### Initialize data (combining IND with pressures)
dat_init <- ind_init(ind_tbl = ind_ex[ ,-1],
  press_tbl = press_ex[ ,-1], time = ind_ex$Year)

### Model responses
m_gam <- model_gam(init_tbl = dat_init)

# Model diagnostics (e.g. first model)
plot_diagnostics(model_list = m_gam$model[[1]])$all_plots[[1]]
# Any outlier?
m_gam$pres_outlier %>% purrr::compact(.)
# - get number of models with outliers detected
purrr::map_lgl(m_gam$pres_outlier, ~!is.null(.)) %>% sum()
# - which models and what observations?
m_gam %>%
    dplyr::select(id, ind, press, pres_outlier) %>%
    dplyr::filter(!purrr::map_lgl(m_gam$pres_outlier, .f = is.null)) %>%
    tidyr::unnest(pres_outlier)
# Exclude outlier in models
m_gam <- model_gam(init_tbl = dat_init, excl_outlier = m_gam$pres_outlier)
# Any temporal autocorrelation
sum(m_gam$tac)
# - which models
m_gam %>%
    dplyr::select(id, ind, press, tac) %>%
    dplyr::filter(tac)

# If temporal autocorrelation present
m_gamm <- model_gamm(init_tbl = dat_init,
  filter = m_gam$tac)
# Again, any outlier?
purrr::map_lgl(m_gamm$pres_outlier, ~!is.null(.)) %>% sum()

# Select best GAMM from different correlation structures
# (based on AIC)
best_gamm <- select_model(gam_tbl = m_gam,
  gamm_tbl = m_gamm)
plot_diagnostics(model_list = best_gamm$model[[1]])$all_plots[[1]]
# Merge GAM and GAMMs
m_merged <- merge_models(m_gam[m_gam$tac == FALSE, ], best_gamm)

# Calculate derivatives
m_calc <- calc_deriv(init_tbl = dat_init,
  mod_tbl = m_merged)

# Test for pressure interactions
it <- select_interaction(mod_tbl = m_calc)
# (creates combinations to test for)
m_all <- test_interaction(init_tbl = dat_init, mod_tbl = m_calc,
     interactions = it)


# Scoring based on model output ------------
scores <- scoring(trend_tbl = m_trend, mod_tbl = m_all, press_type = press_type_ex)
# Runs a shiny app to modify the score for the subcriterion 10.1:
#scores <- expect_resp(mod_tbl = m_all, scores_tbl = scores)
sum_sc <- summary_sc(scores)
spie <- plot_spiechart(sum_sc)
spie$TZA # shows the spiechart of the indicator TZA

NOTE FOR SPATIAL DATA:

All functions are tailored to indicator time series. Spatial data and spatial autocorrelation testing are currently not supported. However, if you have spatial data you can still use all functions except model_gamm(), as it incorporates only temporal autocorrelation structures (AR and ARMA). Simply use as the time vector in ind_init() an integer variable with consecutive numbers (no gaps!) representing your different stations:

### Use of station numbers instead of time vector
station_id <- 1:nrow(your_indicator_dfr) 
dat_init <- ind_init(ind_tbl = your_indicator_dfr,
  press_tbl = your_pressure_dfr, time = station_id)

Validation of IND performances

Each IND is modeled as a function of time or of a single pressure variable using Generalized Additive Models (GAMs), based on the mgcv package.
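
To make this step more tangible, here is a minimal sketch of the kind of model fitted for one indicator-pressure pair, using mgcv directly on the demo data. The smoother settings (default thin-plate spline with k = 4) are illustrative assumptions, not necessarily the package defaults; model_trend() and model_gam() wrap and standardize this step for all indicator-pressure combinations.

library(mgcv)
library(INDperform)

# One indicator (TZA from the demo data) against the first pressure column
# (column 1 of press_ex is the Year and is dropped, as in the workflow above)
dat <- data.frame(ind = ind_ex$TZA, press = press_ex[[2]])
m <- gam(ind ~ s(press, k = 4), data = dat)  # k = 4 is an assumption for illustration
summary(m)  # edf, smooth term significance, deviance explained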

To show the model diagnostics or the complete model results, apply plot_diagnostics() to the model column of any model output tibble or inspect the returned tibble itself.
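
For instance, using only the objects created in the workflow above (no new names introduced here):

# Diagnostics for the first indicator-pressure GAM fitted above
pd_gam <- plot_diagnostics(model_list = m_gam$model[[1]])
pd_gam$all_plots[[1]]
# The complete model results are stored in the returned tibble
m_gam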

Scoring IND performance based on model output

Among the 16 common indicator selection criteria, five relate to the indicator's performance and require time series for their evaluation, i.e.

  1. Development reflects ecosystem change caused by variation in manageable pressure(s)
  2. Sensitive or responsive to pressures
  3. Robust, i.e. responds in a predictable fashion, and statistically sound
  4. Links to management measures (responsiveness and specificity)
  5. Relates where appropriate to other indicators but is not redundant

As these criteria are subject to the quality of the underlying data, a thorough determination of whether the indicator as implemented meets the expected requirements is needed. In this package, the scoring scheme for these criteria proposed by Otto et al. (2018) serves as the basis for quantifying IND performance. Sensitivity (criterion 9) and robustness (criterion 10) are broken down into more detailed sub-criteria to allow quantification based on the statistical models, and are rated individually for every potential pressure that might affect the IND directly or indirectly.

However, the scoring scheme can easily be adapted to any kind of state indicator and management scheme by modifying the scores, the weighting of scores, or by removing (sub-)criteria.

crit_scores_tmpl

This table contains the scores and weights for each (sub-)criterion. It also includes the variables from the model output tibbles on which each (sub-)criterion is based, as well as the condition used to determine the actual score. crit_scores_tmpl is set as the default in the scoring() function and, if needed, should be modified prior to calling the function.
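
A hedged sketch of such an adaptation; the name of the scoring() argument that takes the modified template (crit_scores) is assumed here from its role as the default, so check ?scoring before use:

# Copy the template and adjust scores, weights or (sub-)criteria as needed
my_crit_scores <- crit_scores_tmpl
# ... edit the relevant rows/columns of my_crit_scores (see ?crit_scores_tmpl) ...
# Pass the modified template to scoring(); argument name 'crit_scores' is assumed
scores <- scoring(trend_tbl = m_trend, mod_tbl = m_all,
  press_type = press_type_ex, crit_scores = my_crit_scores)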


Examining redundancies and selecting robust indicator suites
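
INDperform provides its own helpers for this step (see the cheat sheet). Purely to illustrate the underlying idea, redundancy among indicators can be examined by clustering them on their performance scores; the following base-R sketch uses a made-up score matrix, so the data as well as the distance and linkage choices are illustrative assumptions.

# Hypothetical score matrix: rows = indicators, columns = (sub-)criterion scores
set.seed(1)
score_mat <- matrix(sample(0:5, 5 * 8, replace = TRUE), nrow = 5,
  dimnames = list(paste0("IND", 1:5), paste0("C", 1:8)))
# Pairwise distances between indicators based on their scores
d <- dist(score_mat, method = "euclidean")
# Hierarchical clustering to reveal groups of indicators carrying redundant information
hc <- hclust(d, method = "average")
plot(hc, main = "Similarity of indicator performances")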

Assessment of current state status

Two approaches based on trajectories in state space are implemented to determine the current state of the system in comparison to an earlier reference period, using the selected IND suite (state space = the n-dimensional space of possible locations of the IND variables):

  1. Calculation of the Euclidean distance in state space of any dimensionality between each single year (or any other time step used) and a defined reference year (a minimal sketch of this approach follows the list).

  2. Given the identification of a reference domain in state space, more recent observations may lie within or outside this domain. The convex hull is a multivariate measure from computational geometry representing the smallest convex set that contains all reference points in the Euclidean plane or space. For visualization, only two dimensions are considered (dimension reduction through, e.g., Principal Component Analysis is suggested).
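
INDperform includes dedicated functions for both approaches (see the cheat sheet). Purely to illustrate the idea behind approach 1, here is a base-R sketch on the demo indicators; taking the first year as the reference is an arbitrary choice for demonstration.

# Indicator values only (Year in column 1 is dropped, as in the workflow above)
ind_mat <- as.matrix(ind_ex[, -1])
ref <- ind_mat[1, ]  # first year used as the (assumed) reference state
# Euclidean distance of every year to the reference year in IND state space
ed <- sqrt(rowSums(sweep(ind_mat, 2, ref)^2))
plot(ind_ex$Year, ed, type = "b",
  xlab = "Year", ylab = "Euclidean distance to reference year")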

Documentation and further information

For guidance on how to apply the functions step by step, see also the INDperform cheat sheet. We are currently working on the vignette, but if you want more information on the framework for quantifying IND performance and the statistical tools implemented in this package, see

Otto, S.A., Kadin, M., Casini, M., Torres, M.A., Blenckner, T. (2018): A quantitative framework for selecting and validating food web indicators. Ecological Indicators, 84: 619-631. https://doi.org/10.1016/j.ecolind.2017.05.045