**Vignette with figures can be found here.**

*Hydroclimatic processes come in all “shapes and sizes”. They are characterized by different spatiotemporal correlation structures and probability distributions that can be continuous, mixed-type, discrete or even binary. Simulating such processes by reproducing precisely their marginal distribution and linear correlation structure, including features like intermittency, can greatly improve hydrological analysis and design. Traditionally, modelling schemes are case specific and typically attempt to preserve few statistical moments providing inadequate and potentially risky distribution approximations.*

*Here, a single framework is proposed that unifies, extends, and improves a general-purpose modelling strategy, based on the assumption that any process can emerge by transforming a specific “parent”" Gaussian process. A novel mathematical representation of this scheme, introducing parametric correlation transformation functions, enables straightforward estimation of the parent-Gaussian process yielding the target process after the marginal back transformation, while it provides a general description that supersedes previous specific parameterizations, offering a simple, fast and efficient simulation procedure for every stationary process at any spatiotemporal scale.*

*This framework, also applicable for cyclostationary and multivariate modelling, is augmented with flexible parametric correlation structures that parsimoniously describe observed correlations. Real-world simulations of various hydroclimatic processes with different correlation structures and marginals, such as precipitation, river discharge, wind speed, humidity, extreme events per year, etc., as well as a multivariate example, highlight the flexibility, advantages, and complete generality of the method.*

From Papalexiou (2018).

The implementation of the stochastic simulation framework follows a few simple steps. Our aim is to generate a time series from a given marginal distribution, autocorrelation function and probability zero. The latter is the probability that a sample value is zero and is commonly used in modelling variables with a substantial number of zero values, such as daily precipitation (intermittent processes).

Marginal distributions have different number of parameters and different type of parameters (location, scale, shape). For example, the Generalized Gamma distribution has one scale and two shape parameters, e.g., `margarg = list(scale = 2, shape1 = 0.9, shape2 = 0.8)`

.

In general, CoSMoS supports all build-in distribution functions of `R`

and of other packages, but also comes with some build-in distributions such as Generalized Gamma or Pareto type II. A list of the `CoSMoS`

distributions with their parameters can be found in Section 3.

Hence, the first step of stochastic simulation with `CoSMoS`

is to set the theoretical distribution that we will simulate, as well as its parameters:

We can introduce the autocorrelation structure in two ways. Either by using specific lag autocorrelations as a list starting from lag 0 up to a desired lag, or by using a pre-defined parametric autocorrelation structure.

For instance, a time series with lag-1 autocorrelation \(\rho_1 = 0.8\) can be generated by:

```
acf <- c(1, 0.8)
ggamma_sim <- generateTS(n = no,
margdist = marginaldist,
margarg = param,
acsvalue = acf)
plot(ggamma_sim, main = "")
par(mfrow = c(1, 2))
plot(density(ggamma_sim[[1]]), main = "")
acf(ggamma_sim[[1]], main = "")
```

Similarly, we can add more autocorrelation coeficients for a higher number of lags.

```
acf <- c(1, 0.5, 0.5, 0.4, 0.4) #up to lag-4
ggamma_sim <- generateTS(n = no,
margdist = marginaldist,
margarg = param,
acsvalue = acf)
plot(ggamma_sim, main = "")
par(mfrow = c(1, 2))
plot(density(ggamma_sim[[1]]), main = "")
acf(ggamma_sim[[1]], main = "")
```

Another, more sophisticated approach is to use a function that creates the autocorrelation structure. This can be done be the `acs()`

function or the user can create his own function. In the case of `acs()`

, four types of autocorrelation functions are provided:

- Fractional Gaussian Noise (FGN)
- Weibull
- Pareto II (also known as Lomax)
- Burr XII

The `acs()`

can produce values of the linear autocorrelation according to this four structures up to a desired lag. Thus, instead of setting each autocorrelation coefficient explicitly, we can fit a model with one to four arguments depending on the selected structure.

```
## specify lag
lags <- 0:10
## get the ACS
f <- acs('fgn', t = lags, H = .75)
b <- acs('burrXII', t = lags, scale = 1, shape1 = .6, shape2 = .4)
w <- acs('weibull', t = lags, scale = 2, shape = 0.8)
p <- acs('paretoII', t = lags, scale = 3, shape = 0.3)
## visualize the ACS
dta <- data.frame(lags, f, b, w, p)
m.dta <- melt(dta, id.vars = 'lags')
ggplot(m.dta,
aes(x = lags,
y = value,
group = variable,
colour = variable)) +
geom_point(size = 2.5) +
geom_line(lwd = 1) +
scale_color_manual(values = c('steelblue4', 'red4', 'green4', 'darkorange'),
labels = c('FGN', 'Burr XII', 'Weibull', 'Pareto II'),
name = '') +
labs(x = bquote(lag ~ tau),
y = 'Acf') +
scale_x_continuous(breaks = t) +
theme_grey()
```

Back to our example, we can implement a 30-lag autocorrelation structure as:

```
acf = acs(id = 'paretoII',
t = 0:30,
scale = 1,
shape = .75)
ggamma_sim <- generateTS(n = no,
margdist = marginaldist,
margarg = param,
acsvalue = acf)
par(mfrow = c(1, 1))
plot(ggamma_sim, main = "")
```

Note that in this case we use only 2 parameters for autocorrelation and not 30, which increases our simulation parsimony. More details about the parametric autocorrelation structures can be found in section 3.2 in Papalexiou (2018).

Except for the four autocorrelation functions provided by `acs()`

, we can also create the our own:

If we wish to simulate an intermittent process (e.g., precipitation), then we have to define the probability zero. For example, if `p0 = 0.9`

, then generated data shall have 90% of zero values (dry days). In addition, we can simulate multiple processes at once by using argument `TSn = 5`

.

The efficiency of the generated time series can be validated with `checkTS()`

. The first three moments, probability zero and the first three autocorrelation coefficients are printed.

In practice, we usually want to simulate a natural process using some sampled time series. A typical hydroclimatic variable, also available in `CoSMoS`

package is daily precipitation.

```
data('precip')
plot(precip, type = 'l')
par(mfrow = c(1, 2))
plot(density(precip$value), main = "", xlim = c(0, 10)) #Does not plot extreme values
acf(precip$value, main = "", lag.max = 20)
```

To generate a synthetic time series with similar characteristics to the observed precipitation, we have to determine their marginal distribution, autocorrelation structure and probability zero for each individual month. This can be done by trying to fit various distributions and autocorrelation structures with `analyzeTS()`

and then check the goodness-of-fit with `reportTS()`

.

```
precip_ggamma <- analyzeTS(TS = precip,
season = 'month',
dist = 'ggamma',
acsID = "weibull",
lag.max = 12)
reportTS(precip_ggamma, 'dist')
reportTS(precip_ggamma, 'acs')
reportTS(precip_ggamma, 'stat')
```

In this example, the Generalized Gamma distribution, with a Weibull auto-correlation structure, appears to fit well to the observered time series. Other combinations might not give so good results. For instance:

```
precip_pareto <- analyzeTS(TS = precip,
season = 'month',
dist = 'paretoII',
acsID = 'fgn',
lag.max = 12)
reportTS(precip_pareto, 'dist')
reportTS(precip_pareto, 'acs')
```

When the marginal distribution and the autocorrelation structure with the best fit are determined, we can produce a synthetic time series with the same statistical properties with `simulateTS()`

for a given period.

```
sim_precip <- simulateTS(aTS = precip_ggamma,
from = as.POSIXct('1978-12-01 00:00:00'),
to = as.POSIXct('2008-12-01 00:00:00'))
dta <- precip
dta[, id := 'observed']
sim_precip[, id := 'simulated']
dta <- rbind(dta, sim_precip)
ggplot(dta) +
geom_line(aes(x = date, y = value)) +
facet_wrap(~id, ncol = 1) +
theme_classic()
```

We can also apply the same methods to a different dataset - daily streamflow! For this example we have used random dataset from the MOPEX database (https://www.nws.noaa.gov/oh/mopex/mo_datasets.htm).

```
id <- '02135000'
dta_raw <- as.data.table(read.fwf(sprintf(
'ftp://hydrology.nws.noaa.gov/pub/gcip/mopex/US_Data/Us_438_Daily/%s.dly', id),
widths = c(8,10,10,10,10,10),
col.names = c('date', 'P', 'E', 'value', 'Tmax', 'Tmin')))
dta_raw[, date := as.POSIXct(gsub(' ','0', date), format = '%Y%m%d')]
daily_streamflow <- dta_raw[value >= 0, .(date, value)]
str_burr <- analyzeTS(daily_streamflow,
dist = 'ggamma',
norm = 'N4',
acsID = 'paretoII',
lag.max = 50)
sim_str <- simulateTS(str_burr)
dta <- daily_streamflow
dta[, id := 'observed']
sim_str[, id := 'simulated']
dta <- rbind(dta, sim_str)
ggplot(dta) +
geom_line(aes(x = date, y = value)) +
facet_wrap(~id, ncol = 1) +
theme_classic()
```

**Recomended distributions for variables:**

*precipitation*: ggamma (Generalized Gamma), burr### (Burr type)

*streamflow*: ggamma (Generalized Gamma), burr### (Burr type)

*relative humidity*: beta

*temperature*: norm (Normal distribution)

Package CoSMoS supports majority of continuous distributions available in R with defined quantile and probability density functions. We also provide four widely used distributions - Burr type III and XII, Generalized gamma and Pareto type II distribution.

distribution argument/name - burrIII

parameters - list(scale, shape1, shape2)

\[f_{\text{BrIII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}-1} \left(\frac{\left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}}{\gamma _1}+1\right){}^{-\gamma _1 \gamma _2-1}}{\beta } \\ F_{\text{BrIII}}(x)=\left(\frac{\left(\frac{x}{\beta }\right)^{-\frac{1}{\gamma _2}}}{\gamma _1}+1\right){}^{-\gamma _1 \gamma _2} \\ Q_{\text{BrIII}}(u)=\beta \left(\gamma _1 \left(u^{-\frac{1}{\gamma _1 \gamma _2}}-1\right)\right){}^{-\gamma _2} \\ m_{\text{BrIII}}(q)=\frac{\beta ^q \gamma _1^{\gamma _2 (-q)} \Gamma \left(\left(q+\gamma _1\right) \gamma _2\right) \Gamma \left(1-q \gamma _2\right)}{\Gamma \left(\gamma _1 \gamma _2\right)}\]

distribution argument/name - burrXII

parameters - list(scale, shape1, shape2)

\[f_{\text{BrXII}}(x)=\frac{\left(\frac{x}{\beta }\right)^{\gamma _1-1} \left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right){}^{-\frac{1}{\gamma _1 \gamma _2}-1}}{\beta } \\ F_{\text{BrXII}}(x)=1-\left(\gamma _2 \left(\frac{x}{\beta }\right)^{\gamma _1}+1\right){}^{-\frac{1}{\gamma _1 \gamma _2}} \\ Q_{\text{BrXII}}(u)=\beta \left(-\frac{1-(1-u)^{-\gamma _1 \gamma _2}}{\gamma _2}\right){}^{\frac{1}{\gamma _1}} \\ m_{\text{BrXII}}(q)=\frac{\beta ^q \gamma _2^{-\frac{q}{\gamma _1}-1} B\left(\frac{q+\gamma _1}{\gamma _1},\frac{1-q \gamma _2}{\gamma _1 \gamma _2}\right)}{\gamma _1}\]

distribution argument/name - ggamma

parameters - list(scale, shape1, shape2)

\[f_{\mathcal{G}\mathcal{G}}(x)=\frac{\gamma _2 e^{-\left(\frac{x}{\beta }\right)^{\gamma _2}} \left(\frac{x}{\beta }\right)^{\gamma _1-1}}{\beta \Gamma \left(\frac{\gamma _1}{\gamma _2}\right)} \\ F_{\mathcal{G}\mathcal{G}}(x)=Q\left(\frac{\gamma _1}{\gamma _2},0,\left(\frac{x}{\beta }\right)^{\gamma _2}\right) \\ Q_{\mathcal{G}\mathcal{G}}(u)=\beta Q^{-1}\left(\frac{\gamma _1}{\gamma _2},0,u\right){}^{\frac{1}{\gamma _2}} \\ m_{\mathcal{G}\mathcal{G}}(q)=\frac{\beta ^q \Gamma \left(\frac{q}{\gamma _2}+\frac{\gamma _1}{\gamma _2}\right)}{\Gamma \left(\frac{\gamma _1}{\gamma _2}\right)}\]

distribution argument/name - burrIII

parameters - list(scale, shape)

\[f_{\text{PII}}(x)=\frac{\left(\frac{\gamma x}{\beta }+1\right)^{-\frac{1}{\gamma }-1}}{\beta } \\ F_{\text{PII}}(x)=1-\left(\frac{\gamma x}{\beta }+1\right)^{-1/\gamma } \\ Q_{\text{PII}}(u)=\frac{\beta \left((1-u)^{-\gamma }-1\right)}{\gamma } \\ m_{\text{PII}}(q)=\frac{\Gamma (q+1) \left(\frac{\beta }{\gamma }\right)^q \Gamma \left(\frac{1}{\gamma }-q\right)}{\Gamma \left(\frac{1}{\gamma }\right)}\]

Papalexiou, S.M., 2018. Unified theory for stochastic modelling of hydroclimatic processes: Preserving marginal distributions, correlation structures, and intermittency. Advances in Water Resources 115, 234-252. link