# Introduction

## Scope

Discharge simulations are affected by several sources of uncertainty (e.g. errors in meteorological forcings, sub-optimal parameter estimates).

Data assimilation (DA) allows to combine measurements and model simulations under the assumption that both supply useful information to obtain the most likely estimate of the system state. Among the sequential methods, the ensemble-based techniques allow to explicitly handle different sources of uncertainty and to quantify the unknown errors of both model states and observations.

After the study of Piazzi et al. (accepted), airGRdatassim package (Piazzi and Delaigue 2021) provides functions allowing to perform DA-based ensemble discharge simulations. airGRdatassim is a package based on the airGR hydrological modeling package (Coron et al. 2017, 2021). This package is designed to flexibly use the Ensemble Kalman filter (EnKF) or Particle filter (PF) technique to assimilate daily discharge observations into GR4J, GR5J, and GR6J models (use the command ?airGR to get the references of the GR models).

The uncertainty in both the meteorological forcings (i.e. precipitation and potential evapotranspiration) and model states (i.e. production store level, routing store level and unit hydrographs) can be easily taken into account by enabling specific perturbation procedures.

In this user guide we go through the assimilation procedure with the aim of explaining how to perform DA-based discharge simulations. Along with the description of the modelling system, key recommendations are provided to support the definition of the operational configuration.

airGRdatassim is a package based on the airGR hydrological modeling package (Coron et al. 2017, 2021).

All the examples rely on the following data set, which is available in airGR package:

library(airGRdatassim)
## Loading required package: airGR
data(L0123001, package = "airGR")
head(BasinObs)
##       DatesR    P    T   E   Qls    Qmm
## 1 1984-01-01  4.1  0.5 0.2  2640 0.6336
## 2 1984-01-02 15.9  0.2 0.2  3440 0.8256
## 3 1984-01-03  0.8  0.9 0.3 12200 2.9280
## 4 1984-01-04  0.0  0.5 0.3  7600 1.8240
## 5 1984-01-05  0.0 -1.6 0.1  6250 1.5000
## 6 1984-01-06  0.0  0.9 0.3  5650 1.3560

# Settings for data assimilation

This section introduces and explains the main settings of the DA scheme that the user needs to define.

## Ensemble size

The ensemble size can be defined by assigning NbMbr with the number of ensemble members.

It is noteworthy that the PF technique is more sensitive to the ensemble size than the EnKF scheme. Indeed, compared to the EnKF scheme, the PF generally needs a larger ensemble size to efficiently perform the weighting and resampling of particles. It is recommended to generate and use an ensemble of at least 20 or 30 members, when using the EnKF or the PF scheme, respectively.

## Enable/disable the assimilation via EnKF or PF

The DA methodology can be chosen between EnKF and PF by defining DaMethod. To perform the open-loop simulation (i.e., without DA), the assimilation of discharge observations can be disabled.

If DaMethod = "PF", the PF performs the joint update of all the state variables.

If DaMethod = "EnKF", the EnKF performs the update of the state variables specified in StateEnKF.

If DaMethod = "none", discharge assimilation is not performed (i.e., open-loop simulation).

Among the model states, the update of the routing store level ensures the most benefit from the assimilation of observed discharges. Conversely, the update of the unit hydrograph only has a slight impact on the accuracy of the DA-based discharge simulations (Piazzi et al. accepted).

## Model uncertainties

Both the uncetainties in meteorological forcings and state variables can be taken into account in the assimilation scheme.

For both DA techniques, the state variables defined in StatePert are perturbed.

The perturbation of model states is performed through a normally-distributed null-mean noise. The noise variance is assumed equal to the variance of the state variables resulting from the analysis procedure and it is restricted between upper and lower limits (Salamon and Feyen 2009).

When considering the uncertainty in the forcings, the CreateInputsPert() function generates an ensemble for precipitation or/and potential evapotranspiration, provided as function argument/s. It is noteworthy that a significant reduction in the ensemble spread may occur especially in no-rain periods, when accounting only for meteorological uncertainty.

Even though a more comprehensive representation of the state uncertainty can prevent the ensemble shrinkage, it has a different impact on the performance of the PF and the EnKF schemes (Piazzi et al. accepted). Indeed, while a higher model error covariance can lead to an overweighting of observations in the EnKF-based analysis procedure, a larger spread of the ensemble simulations allows for a more straightforward weighting and thus a more efficient resampling of particles in the PF scheme.

## Definition of DA settings

The example below considers EnKF-based discharge simulations relying on a 100-member ensemble, when accounting for the uncertainty in model state variables (see StatePert). More specifically, only the levels of the production ("Prod") and the routing ("Rout") stores are updated and perturbed (see StatePert).

It is of key importance to consider that StateEnKF allows to select the state variables to be updated and StatePert identifies the state variables to be perturbed in the EnKF scheme. Conversely, in the PF scheme StatePert identifies the state variables of interest only within the perturbation procedure, as this DA technique jointly updates all the model states.

# ensemble size
NbMbr <- 100L

# "EnKF" ; "PF" ; "none"
DaMethod <- "EnKF"

# "Prod": production store level
# "Rout": routing store level;
# "UH1": unit hydrograph 1 levels (not defined in GR5J model)
# "UH2": unit hydrograph 2 levels
StateEnKF <- c("Prod", "Rout")
StatePert <- c("Prod", "Rout") 

# Ensemble meteorological forcings

When accounting for the uncertainty in meteorological forcings, the CreateInputsPert() function generates an ensemble for precipitation or/and potential evapotranspiration, provided as function argument/s.

Probabilistic model inputs result from the perturbation of the meteorological data through a multiplicative stochastic noise, according to the methodology proposed by Clark et al. (2008) The random perturbations are provided through a first-order autoregressive model in order to guarantee a physical consistency and a temporal correlation of the time-variant forcings. For the generation of perturbations of both the meteorological variables, the fractional error parameter is set equal to 0.65 and the temporal decorrelation length is defined as 1 day for rainfall and 2 days for potential evapotranspiration, by default.

Deterministic model inputs are still needed for the warm-up period and they are generated through the CreateInputsModel() function of the airGR package.

In this example, discharge simulations are provided by GR5J model throughout the period from 2006-09-01 to 2006-09-31.

## simulation period
IndRun <- seq(which(format(BasinObs$DatesR, format = "%Y-%m-%d") == "2006-09-01"), which(format(BasinObs$DatesR, format = "%Y-%m-%d") == "2006-10-31"))

## preparation of InputsModel object
InputsModel <- CreateInputsModel(FUN_MOD = RunModel_GR5J, DatesR = BasinObs$DatesR, Precip = BasinObs$P, PotEvap = BasinObs$E) ## generation of probabilistic model inputs InputsPert <- CreateInputsPert(FUN_MOD = RunModel_GR5J, DatesR = BasinObs$DatesR,
Precip = BasinObs$P, PotEvap = BasinObs$E,
NbMbr = NbMbr)

Please note that if the uncertainty in meteorological forcings is not taken into account, the same deterministic InputsModel object is used to force each ensemble member.

# Discharge observations for DA

The estimate of the observation error is of critical importance, as it defines how the filter trusts the observations and thus to what extent they are assimilated into the model.

By default, the measurement noise is generated from a normal distribution with a zero-valued mean and a variance parameterized as a function of the observed discharge rate (Weerts and El Serafy 2006; Clark et al. 2008). With the aim of preventing underestimated error variances in case of low discharge, the minimum threshold of the observation error variance is evaluated as the quantile 10 of discharge observations, by default (Thirel et al. 2010).

## discharge observations to be assimilated
Qobs <- BasinObs$Qmm[IndRun] # DA-based discharge simulations We assume that the GR5J model has been previously calibrated by using the Calibration() function of airGR package. Therefore, we refer to the calibrated values of model parameters as Param. The RunModel_DA() function supplies the ensemble discharge simulations resulting from the sequential assimilation of daily discharge observations via EnKF or PF. Because of the recursive approach, the analysis states resulting from the assimilation procedure at time step t are used as initial states at the following prediction time step t + 1. If the state uncertainty is taken into account (StatePert), the initial state at the prediction time step t + 1 is the perturbed analysis state resulting from the assimilation and perturbation procedures at time step t. The function provides three outputs: • the ensemble discharge simulations (QsimEns) • the ensemble values of background model states, namely before the filter update (EnsStateBkg) • the ensemble values of analysis model states, namely after the filter update (EnsStateA) ## EnKF-based discharge simulations The following example supplies probabilistic discharge simulations relying on the EnKF-based assimilation of daily discharge observations. Param <- c(X1 = 194.243, X2 = -0.088, X3 = 117.740, X4 = 1.680, X5 = 0.000) ResEnKF <- RunModel_DA(InputsModel = InputsModel, InputsPert = InputsPert, Qobs = BasinObs$Qmm,
IndRun = IndRun,
FUN_MOD = RunModel_GR5J, Param = Param,
DaMethod = "EnKF", NbMbr = NbMbr,
StateEnKF = StateEnKF, StatePert = StatePert)

As defined in StateEnKF, in this example the EnKF scheme updates only the levels of the production ("Prod") and the routing ("Rout") stores through the assimilation of discharge observations. Along with the uncertainty in meterological forcings (InputsPert is supplied), also the uncertainty in model states (StatePert) is taken into account. Therefore, the selected state variables (i.e. the store levels) are consistently perturbed after their update.

The EnKF scheme relies on mass-constraints to guarantee the non-negativity of state variables by ensuring minimum state values. It is noteworthy that the level of production store cannot drop below the 5 percent of its capacity. A further constraint prevents the levels of both the production and the routing stores from exceeding their maximum capacities.

## PF-based discharge simulations

The example below provides ensemble discharge simulations resulting from the daily PF-based assimilation of discharge observations.

The PF scheme relies on the Sequential Importance Resampling (SIR) technique (Gordon, Salmond, and Smith 1993). According to the SIR approach, the importance weights associated to the particles are updated according to the likelihood of particle states with respect to the observed one. A resampling procedure allows to discard particles having a low probability and to replicate those having a high importance weight. If the ensemble is too squeezed to properly evaluate the importance weights, all particles are assigned equal weights.

ResPF <- RunModel_DA(InputsModel = InputsModel,
InputsPert = InputsPert,
Qobs = BasinObs$Qmm, IndRun = IndRun, FUN_MOD = RunModel_GR5J, Param = Param, DaMethod = "PF", NbMbr = NbMbr, StatePert = "Rout") Unlike the previous example, here only the uncertainty in the level of the routing store (see StatePert) is taken into account. Therefore, the routing store level is the only state variable to be perturbed after the updating procedure. After the state perturbation, a mass-constraint prevents the levels of both the production and the routing stores from exceeding their maximum capacities. # Comparative performance assessment Here the performances of the EnKF- and PF-based modelling systems are assessed against the open-loop run, which provides discharge simulations without the assimilation of discharge observations. RunOptions <- CreateRunOptions(FUN_MOD = RunModel_GR5J, InputsModel = InputsModel, IndPeriod_Run = IndRun) InputsCritMulti <- CreateInputsCrit(FUN_CRIT = list(ErrorCrit_KGE, ErrorCrit_NSE), InputsModel = InputsModel, RunOptions = RunOptions, Obs = list(Qobs, Qobs), VarObs = list("Q", "Q"), transfo = list("", "sqrt"), Weights = NULL) # open-loop run (without DA) OutputsModel_OL <- RunModel_GR5J(InputsModel = InputsModel, RunOptions = RunOptions, Param = Param) CritOL <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_OL) # EnKF run OutputsModel_EnKF <- OutputsModel_OL OutputsModel_EnKF$Qsim <- rowMeans(ResEnKF$QsimEns) CritEnKF <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_EnKF) # PF run OutputsModel_PF <- OutputsModel_OL OutputsModel_PF$Qsim <- rowMeans(ResPF\$QsimEns)
CritPF <- ErrorCrit(InputsCrit = InputsCritMulti, OutputsModel = OutputsModel_PF)

When comparing the performance of these 3 modelling configurations, it is noteworthy that the assimilation of discharge observations generally succeeds in enhancing the accuracy of discharge simulations, though they rely on different DA settings (i.e. different DA techniques, system uncertainties and updated states).

##   Method KGE[Q] NSE[sqrt(Q)]
## 1     OL  0.747        0.771
## 2   EnKF  0.749        0.677
## 3     PL  0.864        0.787

# References

Clark, M. P., D. E. Rupp, R. A. Woods, X. Zheng, R. P. Ibbitt, A. G. Slater, J. Schmidt, and M. J. Uddstrom. 2008. “Hydrological Data Assimilation with the Ensemble Kalman Filter: Use of Streamflow Observations to Update States in a Distributed Hydrological Model.” Advances in Water Resources 31 (10): 1309–24. https://doi.org/10.1016/j.advwatres.2008.06.005.
Coron, L., O. Delaigue, G. Thirel, D. Dorchies, C. Perrin, and C. Michel. 2021. airGR: Suite of GR Hydrological Models for Precipitation-Runoff Modelling. R News. https://doi.org/10.15454/EX11NA.
Coron, L., G. Thirel, O. Delaigue, C. Perrin, and V. Andréassian. 2017. “The Suite of Lumped GR Hydrological Models in an R Package.” Environmental Modelling and Software 94: 166–71. https://doi.org/10.1016/j.envsoft.2017.05.002.
Gordon, N. J., D. J. Salmond, and A. F. M. Smith. 1993. “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation.” IEE Proceedings F - Radar and Signal Processing 140 (2): 107–13. https://doi.org/10.1049/ip-f-2.1993.0015.
Piazzi, G., and O. Delaigue. 2021. airGRdatassim: Suite of Tools to Perform Ensemble-Based Data Assimilation in GR Hydrological Models. R News. https://doi.org/10.15454/WEYYVZ.
Piazzi, G., G. Thirel, C. Perrin, and O. Delaigue. accepted. “Sequential Data Assimilation for Streamflow Forecasting: Assessing the Sensitivity to Uncertinties and to Updated Variables of a Conceptual Hydrological Model.” Water Resources Research, accepted. https://doi.org/10.1029/2020WR028390.
Salamon, P., and L. Feyen. 2009. “Assessing Parameter, Precipitation, and Predictive Uncertainty in a Distributed Hydrological Model Using Sequential Data Assimilation with the Particle Filter.” Journal of Hydrology 376 (3): 428–42. https://doi.org/https://doi.org/10.1016/j.jhydrol.2009.07.051.
Thirel, G., E. Martin, J.-F. Mahfouf, S. Massart, S. Ricci, and F. Habets. 2010. “A Past Discharges Assimilation System for Ensemble Streamflow Forecasts over FrancePart 1: Description and Validation of the Assimilation System.” Hydrology and Earth System Sciences 14 (8): 1623–37. https://doi.org/10.5194/hess-14-1623-2010.
Weerts, A. H., and G. Y. H. El Serafy. 2006. “Particle Filtering and Ensemble Kalman Filtering for State Updating with Hydrological Conceptual Rainfall-Runoff Models.” Water Resources Research 42 (9). https://doi.org/10.1029/2005WR004093.