# GPoM: Modelling

## Global Modelling

This document gives an illustration on how to apply the global modelling technique. The aim of global modelling is to obtain models of Ordinary Differential Equations (ODEs) of polynomial form from either single or multiple time series. Such type of modelling requires a careful time series preprocessing. Such data preprocessing was sketched in the vignette II_PreProcessing. In the present vignette, it is assumed that a careful and adapted preprocessing was already applied to the studied time series, so that the modelling tools can be applied directly.

## Single time series modelling

The chaotic system introduced Rössler in 19761 is well adapted to illustrate the high skills of the approach to model nonlinear dynamics. Simulations from this chaotic system are available in the GPoM package and can be directly loaded as follows:

    # load data
data("Ross76")

(note that the GPoM package can easily be used to produce such chaotic data set, see vignette I_Generate).

The loaded variable Ross76 is a matrix. The time vector is in the first column and the variables $$x$$, $$y$$ and $$z$$ in the next columns two to four. The global modelling from one single time series is presented in the present section. The variable $$y$$ will be considered here:

    # time vector
tin <- Ross76[,1]
# single time series
data <- Ross76[,3]
# plot
plot(tin, data, type='l',
xlab = 't', ylab = 'y(t)', main = 'Original time series')

Starting from single time series, the following canonical form of equations may be expected2:

$$dX_1/dt = X_2$$

$$dX_2/dt = X_3$$

$$...$$

$$dX_n/dt = P(X_1, X_2, ..., X_n)$$

To apply the global modelling technique to the time series presented above, it is necessary to choose a trial model dimension nVar and maximal polynomial degree dMax. To restrict the model search, the minimum and maximum number of parameters can be chosen manually with nPmin and nPmax. IstepMax corresponds to the maximum iterations of the numerical integration.

    # model dimension
nVar = 3
# maximum polynomial degree
dMax = 2
# generalized Polynomial modelling
out1 <- gPoMo(data, tin=tin, dMax = dMax, nS = nVar, show = 0,
nPmin = 9, nPmax = 11, verbose = FALSE, IstepMax = 3000)

Note that, to follow the model search during the searching process, parameter show should be set on such as show = 1. In practice, the model size (that is, its number of parameters) is generally unknwown. It is then necessary to extend this range. In practice, nPmax cannot be chosen too large in order to keep the computation time reasonable.

With the parameterization chosen here, two models are obtained:

    which(out1$okMod == 1) ## [1] 1 3 The model #3 shows a quite good skill to reproduce the original phase portrait:  # obtained model #3 plot(out1$stockoutreg$model3[,1], out1$stockoutreg$model3[,2], type='l', col = 'red', xlab = 'y(t)', ylab = 'dy(t)/dt', main = 'Phase portrait') # original phase portrait lines(out1$filtdata[,1], out1$filtdata[,2]) The equations of the obtained model can be edited as follows:  # obtained model #3 visuEq(nVar, dMax, out1$models$model3) ## dX1/dt = X2 ## ## dX2/dt = X3 ## ## dX3/dt = -1.99343273198868 -3.4734227287143 X3 + 1.07590783964826 X2 + 0.997785206869574 X2 X3 -0.519426529124158 X2^2 -3.99380611347708 X1 -0.519140167914684 X1 X3 + 1.26817620504214 X1 X2 -0.519439096048853 X1^2 Various methods may be used to validate such a model. In practice, note that validation may depend on the objective of the model. One way to validate the model can be based on predictability (see vignette V_Predictability). ## Causal couplings detection and retro-modelling One interest of this approach is (1) to permit the detection of nonlinear dynamical couplings, (2) to obtain an analytical formulation of the couplings, and (3) to infer causal relations. Here again, the Rössler-1976 chaotic system is used to illustrate the purpose. The time series of the three variables $$x$$, $$y$$ and $$z$$ are used this time.  # multiple (three) time series data <- Ross76[,2:4] The three time series are as follows:  # x(t) plot(tin, data[,1], ylim = c(-6.5,12), type='l', col = 'blue', xlab = 't', ylab = 'x(t)', main = 'Original time series') # y(t) lines(tin, data[,2], type = 'l', col = 'brown') # z(t) lines(tin, data[,3], type = 'l', col = 'orange') A set of three equations is expected, one for each variable: $$dx/dt = P_x(x,y,z)$$ $$dy/dt = P_y(x,y,z)$$ $$dz/dt = P_z(x,y,z)$$ To define such structure, the number nS of equation for each series must be defined as a vector such as: nS = c(1,1,1). (note that not to get information about the curent number of model under test, the optional parameter verbose can be set to verbose = false).  # generalized Polynomial modelling out2 <- outGP <- gPoMo(data, tin = tin, dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 3000, nPmin = 7, nPmax = 8) ## ### For Istep = 10 (max: 3000), models to test: 36 / 36 ## ### For Istep = 20 (max: 3000), models to test: 36 / 36. Runtime: ~ 0h 0min 0.66s ## ### For Istep = 40 (max: 3000), models to test: 36 / 36. Runtime: ~ 0h 0min 0.9s ## ### For Istep = 80 (max: 3000), models to test: 36 / 36. Runtime: ~ 0h 0min 1.8s ## ### For Istep = 160 (max: 3000), models to test: 36 / 36. Runtime: ~ 0h 0min 3s ## ### For Istep = 320 (max: 3000), models to test: 36 / 36. Runtime: ~ 0h 0min 6.4s ## ### For Istep = 640 (max: 3000), models to test: 36 / 36. Runtime: ~ 0h 0min 12.82s ## ### For Istep = 1280 (max: 3000), models to test: 15 / 36. Runtime: ~ 0h 0min 9.24s ## ### For Istep = 2560 (max: 3000), models to test: 15 / 36. Runtime: ~ 0h 0min 19s ## ### Number of unclassified models: 15 / 36 The simplest obtained model able to reproduce the observed dynamics is the model #5:  # obtained model #5 plot(out2$stockoutreg$model5[,1], out2$stockoutreg$model5[,2], type='l', col = 'red', xlab = 'y(t)', ylab = 'dy(t)/dt', main = 'Phase portrait') # original phase portrait lines(out2$filtdata[,1], out2$filtdata[,2]) The equations of the obtained model are given by:  visuEq(3, 2, outGP$models$model5, approx = 4) ## dX1/dt = -0.99924 X3 -0.9995 X2 ## ## dX2/dt = 0.51985 X2 + 0.99971 X1 ## ## dX3/dt = 1.99568 -3.99102 X3 + 0.99775 X1 X3 The original Rössler system is thus retrieved. A practical application of the global modelling technique under real world conditions was done for modelling the plague epidemic of Bombay by the end of 19th century and its coupling to the epizootics among black and brown rats3. ## Generalized global modelling from bivariate time series The GPoM package permits a generalized formulation of the global modelling technique that combines canonical and multivariate forms (see the two previous sections). The two variables $$x$$ and $$y$$ of the Rössler-1976 chaotic system are considered here to illustrate the purpose.  # time vector tin <- Ross76[,1] # multiple (two) time series data <- Ross76[,2:3] The two time series are as follows:  # obtained model #5 plot(tin, data[,1], ylim = c(-6.5,6.5), type='l', col = 'blue', xlab = 't', ylab = 'x(t)', main = 'Original time series') # original phase portrait lines(tin, data[,2], type = 'l', col = 'brown') Obviously, the correlation between the two time series is poor (close to 0.5 in magnitude):  # correlation between Rössler-x and Rössler-y cor(data[,1], data[,2]) ## [1] -0.502108 Though, their dynamics are linked in a causal way since dynamically coupled as defined (and obtained from) the following fully deterministic system: $$dx/dt = - y - z$$ $$dy/dt = x + ay$$ $$dz/dt = b + z(x - c)$$ Because $$x$$ and $$y$$ are coupled to the third variable $$z$$ (assumed not to be observed in the present example), and because of the presence of a nonlinear term ($$xz$$ in the last equation), the relation between $$x$$ and $$y$$ can only be poorly detected using a correlation technique. This difficulty directly results from the complex coevolution between these two variables as illustrated by the following scatter plot (x,y):  plot(data[,1], data[,2], type='p', cex = 0.01, col = 'blue', xlab = 'x', ylab = 'y', main = 'Scatter plot (x,y)') In its principle, the global modelling technique is well adapted to tackle the detection of nonlinear couplings. As the Rössler-1976 system is three-dimensional, a three-dimensional system can be expected, that is, such as sum(nS) = 3. Two formulations may thus be expected, either: $$dX_1/dt = X_2$$ $$dX_2/dt = P_X(X_1, X_2, Y_1)$$ $$dY_1/dt = P_Y(X_1, X_2, Y_1)$$, which general structure will be defined by nS = c(2,1) (two equations of canonical form will be generated from the observed variable $$x$$, a single one from $$y$$); or $$dX_1/dt = P_X(X_1, Y_1, Y_2)$$ $$dY_1/dt = Y_2$$ $$dY_2/dt = P_Y(X_1, Y_1, Y_2)$$ which general structure will be defined by nS = c(1,2) (one equation generated from $$x$$, two ones of canonical form from $$y$$). The former formulation is considered in the present illustration. In some cases, the model structure can be partly known. It is then possible to constrain the model formulation in more details by allowing some terms of the formulation and forbidding others. This structure can be defined as an input of the GPoMo function through the optional parameter EqS. In the present case, the following structure will be used as a template:  # model template: EqS <- matrix(1, ncol = 3, nrow = 10) EqS[,1] <- c(0,0,0,1,0,0,0,0,0,0) EqS[,2] <- c(1,1,0,1,0,1,1,1,1,1) EqS[,3] <- c(0,1,0,0,0,0,1,1,0,0) visuEq(3, 2, EqS, substit = c('X','Y','Z')) ## dX/dt = Y ## ## dY/dt = 1 + Z + Y + Y^2 + X + X Z + X Y + X^2 ## ## dZ/dt = Z + X + X Z Each term of this template may be used (or not) in the global model. Other terms will not. The search for a Generalized Global Model will be launched as follows:  # generalized Polynomial modelling out3 <- gPoMo(data, tin=tin, dMax = 2, nS=c(2,1), EqS = EqS, show = 0, verbose = FALSE, IstepMin = 10, IstepMax = 2000, nPmi = 9, nPmax = 11) The simplest obtained model able to reproduce the observed dynamics is found to be model #2:  # obtained model #2 plot(out3$stockoutreg$model2[,1], out3$stockoutreg$model2[,2], type='l', col = 'red', xlab = 'y(t)', ylab = 'dy(t)/dt', main = 'Phase portrait') # original phase portrait lines(out3$filtdata[,1], out3$filtdata[,2]) The equations of the obtained model are the following  visuEq(3, 2, out3$models$model2, approx = 4) ## dX1/dt = X2 ## ## dX2/dt = -1.99472 -4.51179 X3 -3.99202 X2 -0.99974 X1 + 0.99828 X1 X3 + 0.99783 X1 X2 ## ## dX3/dt = 0.51985 X3 + 0.99971 X1 Note that a practical application of the generalized global modelling technique could be done from observational data under real world conditions for the modelling of the West-Africa epidemic of Ebola Virus Desease4. ## Blind separation and modelling of two independant sets of equations Finally, since able in its principle to detect nonlinear couplings, the global modelling technique may also be used to detect separated sets of equations. Two separated systems are considered here: the Rössler-1976 system already presented upper and the Sprott-K system5.  # load data data(sprottK) data(rossler) # multiple (six) time series data <- cbind(rossler,sprottK)[1:400,] Six time series are thus considered in the present illustration:  tin = (0:(dim(data)[1] - 1)) * 1/20 # For the Rössler-1976 system # x(t) plot(tin, data[,1], ylim = c(-6.5,8.5), type='l', col = 'red', xlab = 't', ylab = 'x(t)', main = 'Original time series') # y(t) lines(tin, data[,2], type = 'l', col = 'brown') # z(t) lines(tin, data[,3], type = 'l', col = 'orange') # For the Sprott-K system # u(t) lines(tin, data[,4], type = 'l', col = 'green') # v(t) lines(tin, data[,5], type = 'l', col = 'darkgreen') # w(t) lines(tin, data[,6], type = 'l', col = 'lightgreen') Since one equation is expected from each variable, vector nS is defined as nS = c(1,1,1,1,1,1). To speed up the computation, Runge-Kutta numerical integration is chosen6.  # generalized Polynomial modelling out4 <- gPoMo(data, dtFixe = 1/20, dMax = 2, nS = c(1,1,1,1,1,1), show = 0, method = 'rk4', IstepMin = 2, IstepMax = 3, nPmin = 13, nPmax = 13) ## ### For Istep = 2 (max: 3), models to test: 792 / 792 ## ### Number of unclassified models: 792 / 792 Due to the huge number of models to be tested, numerical integration is sketched here on a quite short duration and the model search is focused on models of 13-parameter size. A larger range of model size and a longer numerical integration should be chosen to illustrate the ability of the technique to retrieve the original models. This can easily be done, except that the run time will be much longer. However, to distinguish between integrable and non integrable models (in a numerical sense), it is thus necessary to consider the numerical integration of long lengths. Considering longer integration durations (e.g. from IstepMin = 10 to IstepMax = 10000), many of the models will be rejected or lead to trivial solutions (fixed points, periodic cycles) In the present case, only one obtained set of equations corresponding to model #347 is found able to reproduce the phase portraits of the two original systems (Rössler-1976 and Sprott-K):  KL <- out4$models$model347 v0 <- as.numeric(head(data,1)) outNumi <- numicano(nVar = 6, dMax = 2, Istep=5000, onestep=1/20, KL=KL, v0=v0) # obtained model #347 plot(outNumi$reconstr[,2], outNumi$reconstr[,3], type='l', col = 'red', xlab = 'x(t)', ylab = 'y(t)', main = 'Phase portrait') # original phase portrait lines(out4$filtdata[,1], out4$filtdata[,2]) # original phase portrait plot(outNumi$reconstr[,5], outNumi$reconstr[,6], type='l', col = 'green', xlab = 'u(t)', ylab = 'v(t)', main = 'Phase portrait') # original phase portrait lines(out4$filtdata[,4], out4$filtdata[,5]) The estimated model reads  visuEq(6, 2, out4$models$model347, approx = 2, substit = 1) ## dx/dt = -0.983 z -0.99 y ## ## dy/dt = 0.517 y + 0.994 x ## ## dz/dt = 1.91 -3.802 z + 0.951 x z ## ## dw/dt = -0.43 u + 0.992 w v ## ## du/dt = 0.199 u + 0.997 w ## ## dv/dt = -0.996 v + 0.996 w This global modelling permits to distinguish two independant sets of equations: one for the Rössler-1976 system, the other one for the Sprott-K system. Moreover, the obtained equations are very close to the original equations from which time series were derived. This latter point is interesting since it shows that - in the present case at least - not only the couplings, but also each of the nonlinear terms of the equations can be retrieved. The approach may thus be used as an retro-modelling technique in order to interpret the the model terms. ## Time series with gaps and associate modelling When measurements are gathered under real environmental conditions, it can happend that the time series have gaps, or that temporary perturbations may have to be removed, or that equivalent records may be the result of the same dynamical behavior, but were performed at different locations (concurently or not). The GPoM can be used to model such types of data set7. An example of this type of situation is given in the present section. The Rössler system is used again in this example:  # load data data("Ross76") # time vector tin <- Ross76[1:500,1] # single time series series <- Ross76[1:500,3] # plot plot(tin, series, type = 'l', col = 'gray') To illustrate our purpose, some noise is added to the original time series and a weight function is defined to distinguish on what part of the signal the anaysis should focus. This weighing function is defined equal to 1 when free of noise, equal to 0 where noise-contaminated.  # some noise is added series[1:100] <- series[1:100] + 0.1 * runif(1:100, min = -1, max = 1) series[301:320] <- series[301:320] + 0.5 * runif(1:20, min = -1, max = 1) plot(tin, series, type = 'l', col = 'black') # # weighting function W <- tin * 0 + 1 W[1:100] <- 0 # the first fourty values will not be considered W[301:320] <- 0 # twenty other values will not be considered either lines(tin, W, type = 'l', col = 'brown') The global modeling is then applied:  reg <- gloMoId(series, dt=1/100, weight = W, nVar=3, dMax=2, show=1) The following equations are retrieved:  visuEq(3, 2, reg$K, approx = 4)
## dX1/dt = X2
##
## dX2/dt = X3
##
## dX3/dt = -1.99549101  -3.46624593 X3  + 0.00016261 X3^2  + 1.07233583 X2  + 0.99644881 X2 X3  -0.51742454 X2^2  -3.98532045 X1  -0.52077355 X1 X3  + 1.26695183 X1 X2  -0.52024707 X1^2

This set of equations can thus be integrated numerically. Integration will be started from the first state for which weight is equal to 1.

    # first weight which value not equal to zero:
i1 = which(reg$finalWeight == 1)[1] v0 <- reg$init[i1,1:3]
reconstr <- numicano(nVar=3, dMax=2, Istep=5000, onestep=1/250, PolyTerms=reg$K, v0=v0, method="rk4") The Rössler model is correctly retrieved:  plot(reconstr$reconstr[,2], reconstr$reconstr[,3], type='l', lwd = 3, main='phase portrait', xlab='time t', ylab = 'x(t)', col='orange') # original data: lines(reg$init[,1], reg\$init[,2], type='l',
main='phase portrait', xlab='x', ylab = 'dx/dt', col='black')
# initial condition
lines(v0[1], v0[2], type = 'p', col = 'red')

## Output visualization and global models validation

The outputs of the gPoMo function are gathered in one single list. The next vignette IV_VisualizeOutputs aims to show how results are organized in this list. Examples of model validation are then presented in the last vignette V_Predictability.

1. O. Rössler, An Equation for Continuous Chaos, Physics Letters, vol. 57A, no 5, 1976, p. 397-398.

2. G. Gouesbet & C. Letellier, 1994. Global vector-field reconstruction by using a multivariate polynomial L2 approximation on nets. Phys Rev E, 49(6), 4955-4972.

3. S. Mangiarotti, 2015. Low dimensional chaotic models for the plague epidemic in Bombay (1896–1911). Chaos, Solitons & Fractals, 81A, 184–196.

4. S. Mangiarotti, M. Peyre & M. Huc, 2016. A chaotic model for the epidemic of Ebola virus disease in West Africa (2013–2016). Chaos, 26, 113112.

5. J.C. Sprott, 1994. Some simple chaotic flows. Physical Review E, 50(2), 647-650

6. package deSolve is used for this purpose.

7. S. Mangiarotti, F. Le Jean, M. Huc, C. Letellier, 2016. Global modeling of aggregated and associated chaotic dynamics. Chaos, Solitons & Fractals 83, 82-96