This document was built in Markdown in R 4.1.0 and compiled on 14 July 2021. It covers package `lefko3`

version 3.5.3. Please note that this vignette was written with space considerations in mind. To reduce output size, we have prevented some statements from running if they produce long stretches of output. Examples include most `summary()`

calls. In these cases, we include hashtagged versions of these calls, and our text assumes that the user runs these statements without hashtags. We have also written this vignette using the default Ehrlén format as the output format for all historical MPMs. All examples work with deVries format as well.

In this vignette, we will use the `lathyrus`

dataset to illustrate the estimation of **integral projection models**. Please see the other vignettes included in package `lefko3`

, as well as further vignettes posted online on the projects page of the Shefferson lab website. Other vignettes include demonstrations of raw and function-based MPMs, as well as age-by-stage MPMs.

*Lathyrus vernus* (family Fabaceae) is a long-lived forest herb, native to Europe and large parts of northern Asia. Individuals grow larger slowly and usually flower only after 10-15 years of vegetative growth. Flowering individuals have an average conditional lifespan of 44.3 years (Ehrlen and Lehtila 2002). *L. vernus* lacks organs for vegetative spread and individuals are well delimited (Ehrlen 2002). One or several erect shoots of up to 40 cm height emerge from a subterranean rhizome in March and April. Flowering occurs about four weeks after shoot emergence. Shoot growth is determinate, and the number of flowers is determined in the previous year (Ehrlen and Van Groenendael 2001). Individuals sometimes do not produce aboveground structures every year, instead remaining dormant in one or more seasons. *L. vernus* is self-compatible but requires visits from bumble-bees to produce seeds. Individuals produce few, large seeds and establishment from seeds is relatively frequent (Ehrlen and Eriksson 1996). The pre-dispersal seed predator *Bruchus atomarius* often consumes a large fraction of developing seeds, and roe deer (*Capreolus capreolus*) sometimes consume the shoots (Ehrlen and Munzbergova 2009).

Data for this study were collected from six permanent plots in a population of *L. vernus* located in a deciduous forest in the Tullgarn area, SE Sweden (58.9496 N, 17.6097 E), during 1988–1991 (Ehrlen 1995). The six plots were similar with regard to soil type, elevation, slope, and canopy cover. Within each plot, all individuals were marked with numbered tags that remained over the study period, and their locations were mapped. New individuals were included in the study in each year. Individuals were recorded at least three times every growing season. At the time of shoot emergence, we recorded whether individuals were alive and produced above-ground shoots, and if shoots had been grazed. During flowering, we recorded flower number and the height and diameter of all shoots. At fruit maturation, we counted the number of intact and damaged seeds. To derive a measure of above-ground size for each individual, we calculated the volume of each shoot as \(\pi × (\frac{1}{2} diameter)^2 × height\), and summed the volumes of all shoots. This measure is strongly correlated with the dry mass of aboveground tissues (\(R^2 = 0.924\), \(P < 0.001\), \(n = 50\), log-transformed values; Ehrlén 1995). Size of individuals that had been grazed was estimated based on measures of shoot diameter in grazed shoots, and the relationship between shoot diameter and shoot height in non-grazed individuals. Only individuals with an aboveground volume of more than 230 mm^{3} flowered and produced fruits during this study. Individuals that lacked aboveground structures in one season but reappeared in the following year were considered dormant. Individuals that lacked aboveground structures in two subsequent seasons were considered dead from the year in which they first lacked aboveground structures. Probabilities of seeds surviving to the next year, and of being present as seedlings or seeds in the soil seed bank, were derived from separate yearly sowing experiments in separate plots adjacent to each subplot (Ehrlen and Eriksson 1996).

Here we will build historical and ahistorical integral projection models (IPMs). An IPM is a kind of function-based MPM in which transitions are modeled on a continuous state variable rather than discrete stages (Ellner and Rees 2006). A size metric is often used as this continuous state variable. In practice, all size-classified matrix projection models including IPMs require discrete size classes. So, although size is modeled on the Gaussian distribution, the actual stages developed break size up into many fine-scale, equally sized classes or bins. Although the number of these bins varies from analysis to analysis, package `lefko3`

uses a default of 100 (this can be changed as an option). Because vital rates are modeled rather than directly calculated from the data, the number of individuals moving through any particular size class at any particular time does not need to be considered in determining stage boundaries (as they would be in raw MPM estimation). IPMs are often **complex**, meaning that they include some life history stages that fall outside of the size classifications developed for the matrices, such as dormant seeds or juveniles.

The dataset that we have provided is organized in horizontal format, meaning that each row holds all of the data for a single, unique individual, and columns correspond to individual condition in particular observation times (which we refer to as *years* here, since there was one main census in each year). The original spreadsheet file used to keep the dataset has a repeating pattern to these columns, with each year having a similarly arranged group of variables. Package `lefko3`

includes functions to handle data in horizontal format, as well as vertically formatted data (i.e. data for individuals is broken up across rows, where each row is a unique combination of individual and year in time *t*).

First, we will clear memory and take a look at the dataset.

```
rm(list=ls(all=TRUE))
library(lefko3)
data(lathyrus)
#summary(lathyrus)
```

This dataset includes information on 1,119 individuals, so there are 1,119 rows with data. There are 38 columns. The first two columns give identifying information about each individual (`SUBPLOT`

refers to the patch, and `GENET`

refers to individual identity), with each individual’s data entirely restricted to one row. This is followed by four sets of nine columns, each named `VolumeXX`

, `lnVolXX`

, `FCODEXX`

, `FlowXX`

, `IntactseedXX`

, `Dead19XX`

, `DormantXX`

, `Missing19XX`

, and `SeedlingXX`

, where `XX`

corresponds to the year of observation and with years organized consecutively. Thus, columns 3-11 refer to year 1988, columns 12-20 refer to year 1989, etc. This strictly repeated pattern allows us to manipulate the original dataset quickly and efficiently via `lefko3`

. We should know the number of years used, which is 4 years here (includes all years from and including 1988 to 1991). Ideally, we should also have arranged the columns in the same order for each year, with years in consecutive order with no extra columns between them. Note that this order is not required, provided that we are willing to input all variable names in the correct order when transforming the dataset later.

To begin, we need to create a **stageframe** for this dataset. A stageframe is a data frame that describes all stages in the life history of the organism, in a way usable by the functions in this package and using stage names and classifications that completely match those used in the dataset. It needs to include complete descriptions of all stages that occur in the dataset, with each stage defined uniquely. Since this object can be used for automated classification of individuals, all sizes, reproductive states, and other characteristics defining each stage in the dataset need to be accounted for explicitly. This can be difficult if a few data points exist outside the range of sizes specified in the stageframe, so great care must be taken to include all size values and values of other descriptor variables occurring within the dataset. The final description of each stage occurring in the dataset must not completely overlap with any other stage also found in the dataset, although partial overlap is allowed and expected. We will base our stageframe on the life history model provided in Ehrlén (2000), but use a different size classification to allow IPM construction and make all mature stages other than vegetative dormancy reproductive.

In the stageframe code below, we show that we want an IPM by choosing two stages that serve as the size limits for IPM size classification. These two size classes should have exactly the same characteristics in the stageframe *other than size*. By choosing these two size limits, we can skip adding and describing the many size classes that will fall between these limits - function `sf_create()`

will create all of these for us. We mark these limits in the vector that we load into the `stagenames`

option using the string `ipm`

. Package `lefko3`

will then create and name all IPM size classes according to its own conventions. The default number of size classes is 100 bins. We can alter this if we wish using the `ipmbins`

option.

```
<- c(0, 100, 0, 1, 7100)
sizevector <- c("Sd", "Sdl", "Dorm", "ipm", "ipm")
stagevector <- c(0, 0, 0, 1, 1)
repvector <- c(0, 1, 0, 1, 1)
obsvector <- c(0, 0, 1, 1, 1)
matvector <- c(1, 1, 0, 0, 0)
immvector <- c(1, 0, 0, 0, 0)
propvector <- c(0, 1, 1, 1, 1)
indataset <- c(0, 100, 0.5, 1, 1)
binvec <- c("Dormant seed", "Seedling", "Dormant", "ipm adult stage", "ipm adult stage")
comments <- sf_create(sizes = sizevector, stagenames = stagevector,
lathframeipm repstatus = repvector, obsstatus = obsvector, propstatus = propvector,
immstatus = immvector, matstatus = matvector, comments = comments,
indataset = indataset, binhalfwidth = binvec, ipmbins = 100, roundsize = 3)
#lathframeipm
```

This stageframe has 103 stages (we can see this by typing `dim(lathframeipm)`

, which will show the numbers of rows and columns, respectively). The IPM portion technically starts with the fourth stage and keeps going through the 103rd stage. Stage names within this range are concatenations of the size centroid (designated with `sz`

), reproductive status (designated with `rp`

), maturity status (designated with `mt`

), and observation status (designated with `ob`

). The first three stages, which fall outside of the IPM classification, are left unaltered.

To work with this dataset, we first need to format the data into *vertical format*, in which each row corresponds to the state of a single individual in two (if ahistorical) or three (if historical) consecutive time intervals. Because this is an IPM, we will need to estimate linear models of vital rates. This will require us to avoid NAs in size and fecundity (fortunately, in this dataset, NA is equivalent to 0), so we will set `NAas0 = TRUE`

. We will also set `NRasRep = TRUE`

because we will assume that all adult stages other than dormancy are reproductive, and there are mature individuals in the dataset that do not reproduce but need to be included in reproductive stages (setting this option to TRUE makes sure that the reproductive status of non-reproductive individuals in potentially reproductive stages is set to 1, although the actual fecundity is not altered). Finally, we will ignore patches marked in the dataset and estimate matrices only for the full population, in order to preserve statistical power for vital rate modeling in historical IPM analysis.

In the input to `verticalize3()`

below, we utilize a repeating pattern of variable names arranged in the same order for each observation time. This arrangement allows us to enter only the first variable in each set, as long as `noyears`

and `blocksize`

are set properly and no gaps or shuffles appear in the dataset. The data management functions that we have created for `lefko3`

do not require such repeating patterns, but they do make the required input in the function much shorter and more succinct.

```
<- verticalize3(lathyrus, noyears = 4, firstyear = 1988,
lathvertipm individcol = "GENET", blocksize = 9, juvcol = "Seedling1988",
sizeacol = "Volume88", repstracol = "FCODE88", fecacol = "Intactseed88",
deadacol = "Dead1988", nonobsacol = "Dormant1988", stageassign = lathframeipm,
stagesize = "sizea", censorcol = "Missing1988",censorkeep = NA, censor = TRUE,
NAas0 = TRUE, NRasRep = TRUE)
#summary(lathvertipm)
```

Before we move on to the next steps in analysis, let’s take a closer look at fecundity. In this dataset, fecundity is mostly a count of intact seeds, and only differs in six cases where the seed output was estimated based on other models. To see this, try the following code.

```
writeLines(paste0("Total length of variable corresponding to fecundity in time t+1: ", length(lathvertipm$feca3)))
#> Total length of variable corresponding to fecundity in time t+1: 2527
writeLines(paste0("Total non-integer entries in fecundity in time t+1: ", length(which(lathvertipm$feca3 != round(lathvertipm$feca3)))))
#> Total non-integer entries in fecundity in time t+1: 0
writeLines(paste0("\nTotal length of variable corresponding to fecundity in time t: ", length(lathvertipm$feca2)))
#>
#> Total length of variable corresponding to fecundity in time t: 2527
writeLines(paste0("Total non-integer entries in fecundity in time t: ", length(which(lathvertipm$feca2 != round(lathvertipm$feca2)))))
#> Total non-integer entries in fecundity in time t: 6
writeLines(paste0("\nTotal length of variable corresponding to fecundity in time t-1: ", length(lathvertipm$feca1)))
#>
#> Total length of variable corresponding to fecundity in time t-1: 2527
writeLines(paste0("Total non-integer entries in fecundity in time t-1: ", length(which(lathvertipm$feca1 != round(lathvertipm$feca1)))))
#> Total non-integer entries in fecundity in time t-1: 6
```

We see that we have quite a bit of fecundity data, and that it is overwhelmingly but not exclusively integer. So, we can either treat fecundity as a continuous variable, or round the values and treat it as a count variable. We will round fecundity so that we can treat fecundity as a count variable in the analysis.

```
$feca3 <- round(lathvertipm$feca3)
lathvertipm$feca2 <- round(lathvertipm$feca2)
lathvertipm$feca1 <- round(lathvertipm$feca1) lathvertipm
```

Although we wish to treat fecundity as a count, it is still not clear what underlying distribution we should use. This package currently allows 7 choices: Gaussian, Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, zero-truncated Poisson, and zero-truncated negative binomial. To assess which to use, we should first assess whether the mean and variance of the count are equal using a dispersion test. The Poisson distribution assumes that the mean and variance are equal, and so we can test this assumption using a chi-squared test. If it is not significantly different, then we may use some variant of the Poisson distribution. If the data are significantly over- or under-dispersed, then we should use the negative binomial distribution. If fecundity of 0 is possible in reproductive stages, as in cases where reproductive status is defined by flowering rather than by offspring production, then we should also test whether the number of zeroes is significantly greater than expected under these distributions, and use a zero-inflated distribution if so (if fecundity does not equal 0 in any reproductive individuals at all, then we should use a zero-truncated distribution).

Let’s start off by looking at a plot of the distribution of fecundity.

```
hist(subset(lathvertipm, repstatus2 == 1)$feca2, main = "Fecundity",
xlab = "Intact seeds produced in time t")
```

We see that the distribution seems to conform to a classic count variable with a very low mean value. The first bar suggests that there may be too many zeroes. But to make that decision, let’s formally test the two assumptions of \(mean = variance\) and no excess 0s. Both tests use chi-squared distribution-based approaches, with the zero-inflation test based on van der Broek (1995). The following line will assess all of this in fecundity, but can also assess size if necessary (we will treat size as conforming to a Gaussian distribution, so will not test these assumptions with it).

```
sf_distrib(lathvertipm, size2 = "sizea2", fec = "feca2", repst = "repstatus2")
#>
#> Mean fecundity is 4.791
#> The variance in fecundity is 70.14
#> The probability of this dispersion level by chance assuming the true mean fecundity = variance in fecundity, and an alternative hypothesis of overdispersion, is 0
#>
#> Fecundity is significantly overdispersed.
#>
#>
#> Mean lambda is 0.008302
#> The actual number of 0s in fecundity is 334
#> The expected number of 0s in fecundity under the null hypothesis is 4.973
#> The probability of this deviation in 0s is 0
#>
#> Fecundity is significantly zero-inflated.
#> NULL
```

Such significant results for both tests show us that we really need to use a zero-inflated negative binomial distribution here.

Now we will create **supplemental tables**, which provide extra data for matrix estimation that is not included in the main demographic dataset. Specifically, we will provide the seed dormancy probability and germination rate, which are given as transitions from the dormant seed stage to another year of seed dormancy or to the germinated seedling stage, respectively. We assume that the germination rate is the same regardless of whether seed was produced in the previous year or has been in the seedbank for longer. We will incorporate these terms both as fixed constants for specific transitions within the resulting matrices, and as multipliers for fecundity, since ultimately fecundity will be estimated as the production of seed multiplied by the seed germination rate or the seed dormancy rate. Because some individuals stay in the seedling stage for only 1 year, and the seed stage itself cannot be observed and so does not exist in the dataset, we will also set a proxy set of transitions so that R assumes that the transitions from seed in time *t*-1 to seedling in time *t* to all mature stages in time *t*+1 are equal to the equivalent transitions from seedling in both times *t*-1 and *t*. We will start with the ahistorical case, and then move on to the historical case, where we also need to input the corresponding stages in time *t*-1 and transition types from time *t*-1 to *t* for each transition. Note the use of the `"rep"`

and `"npr"`

designations in `Stage1`

- these are abbreviations telling R to use all reproductive stages, or all non-propagule stages (mature stages plus the seedling stage) in general, respectively.

```
<- supplemental(stage3 = c("Sd", "Sdl", "Sd", "Sdl"),
lathsupp2 stage2 = c("Sd", "Sd", "rep", "rep"),
givenrate = c(0.345, 0.054, NA, NA),
multiplier = c(NA, NA, 0.345, 0.054),
type = c(1, 1, 3, 3), stageframe = lathframeipm, historical = FALSE)
<- supplemental(stage3 = c("Sd", "Sd", "Sdl", "Sdl", "npr", "Sd", "Sdl"),
lathsupp3 stage2 = c("Sd", "Sd", "Sd", "Sd", "Sdl", "rep", "rep"),
stage1 = c("Sd", "rep", "Sd", "rep", "Sd", "mat", "mat"),
eststage3 = c(NA, NA, NA, NA, "npr", NA, NA),
eststage2 = c(NA, NA, NA, NA, "Sdl", NA, NA),
eststage1 = c(NA, NA, NA, NA, "Sdl", NA, NA),
givenrate = c(0.345, 0.345, 0.054, 0.054, NA, NA, NA),
multiplier = c(NA, NA, NA, NA, NA, 0.345, 0.054),
type = c(1, 1, 1, 1, 1, 3, 3), type_t12 = c(1, 2, 1, 2, 1, 1, 1),
stageframe = lathframeipm, historical = TRUE)
#lathsupp2
#lathsupp3
```

Integral projection models (IPMs) require functions of vital rates to populate them. Here, we will develop these functions as linear models using `modelsearch()`

. This function automates several crucial and complex tasks in MPM analysis. Specifically, it automates 1) the building of global models for each vital rate requested, 2) the exhaustive construction of all reduced models, and 3) the selection of the best-fit models. Let’s look at some of the options that we will utilize for this purpose (please note that this list includes only some of the options actually offered by the function - further options are shown in the documentation for `modelsearch()`

, and further theoretical details are shown in the `Basic theory and concepts`

vignette):

**historical**: Setting this to TRUE or FALSE indicates to include state in time *t*-1 in the global models.

**approach**: Designates whether to use generalized linear models (`glm`

), in which all factors are treated as fixed, or mixed effects models (`mixed`

), in which factors are treated as either fixed or random (most notably, time, patch, and individual identity can be treated as random).

**suite**: Designates which factors to include in the global model. Possible values include `size`

, which includes size only; `rep`

, which includes reproductive status only; `main`

, which includes both size and reproductive status as main effects only; `full`

, which includes both size and reproductive status and all two-way interactions; and `const`

, which includes only an intercept. These factors are in addition to individual identity, patch, and time.

**vitalrates**: Designates which vital rates to model. The default is to model survival, size, and fecundity, but users can also model observation status and reproduction status.

**juvestimate**: Designates the name of the juvenile stage transitioning to maturity, in cases where the dataset includes data on juveniles.

**juvsize**: Denotes whether size should be used as an independent term in models involving transition from the juvenile stage.

**bestfit**: Denotes whether the best-fit model should be chosen as the model with the lowest AICc (`AICc`

) or as the most parsimonious model (`AICc&k`

), where the latter is the model that has the fewest estimated parameters and is within 2 AICc units of the model with the lowest AICc.

**sizedist**: Designates the distribution used for size. The options include `gaussian`

, `poisson`

, and `negbin`

, the last of which refers to the negative binomial distribution.

**fecdist**: Designates the distribution used for fecundity. The options include `gaussian`

, `poisson`

, and `negbin`

, the last of which refers to the negative binomial distribution.

**fectime**: Designates whether fecundity is estimated within time *t* (the default) or time *t*+1. Plant ecologists are likely to choose the former, since fecundity is typically estimated via a proxy such as flowers, fruit, or seed produced. Wildlife ecologists might choose the latter, since fecundity may be best estimated as a count of actual juveniles within nests, burrows, or other family structures.

**size.zero**: Designates whether the size distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

**size.trunc**: Designates whether to use a zero-truncated distribution for size. Only applies to the Poisson and negative binomial distributions, and cannot be applied if `size.zero = TRUE`

.

**fec.zero**: Designates whether the fecundity distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

**fec.trunc**: Designates whether to use a zero-truncated distribution for fecundity. Only applies to the Poisson and negative binomial distributions, and cannot be applied if `fec.zero = TRUE`

.

**jsize.zero**: Designates whether the juvenile size distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

**jsize.trunc**: Designates whether to use a zero-truncated distribution for juvenile size. Only applies to the Poisson and negative binomial distributions, and cannot be applied if `jsize.zero = TRUE`

.

**indiv**: Designates the variable corresponding to individual identity in the vertical dataset.

**patch**: Designates the variable corresponding to patch identity in the vertical dataset.

**year**: Designates the variable corresponding to time *t* in the vertical dataset.

**age**: Designates the name of the variable corresponding to age in time *t*. Should be set **only if an age-by-stage model is desired**.

**year.as.random**: Designates whether to treat year as a random factor, and is only used when `approach = "mixed"`

.

**patch.as.random**: Designates whether to treat patch identity as a random factor, and is only used when `approach = "mixed"`

.

**show.model.tables**: Designates whether to include the full dredge model tables showing all models developed and their associated AICc values.

**quiet**: Denotes whether to silence guidepost statements and most warning messages. Only warnings incurred in model testing will be visible, including warnings from the testing of global and reduced models produced during model dredging.

In addition, there are other options that provide flexibility in handling datasets with different designations for key variables, and allowing manual designation of stages.

This function is useful not just because it develops the vital rate models necessary for function-based MPM estimation, but because it provides a test of whether individual history affects demography. Setting `historical = TRUE`

fits size and/or reproductive status in time *t*-1 into global models, while setting `historical = FALSE`

prevents testing of the impacts of state in time *t*-1. The model building and selection protocols can then be used to see if history has a significant impact on a vital rate, by assessing whether a historical term is retained in the best-fit model.

First, we will create the historical models to assess whether history is a significant influence on vital rates. Please remove the hashtag from the `summary`

call to see details of the best-fit models.

```
<- modelsearch(lathvertipm, historical = TRUE, approach = "mixed",
lathmodels3ipm suite = "size", vitalrates = c("surv", "obs", "size", "fec"),
juvestimate = "Sdl", bestfit = "AICc&k", sizedist = "gaussian",
fecdist = "negbin", fec.zero = TRUE, indiv = "individ", year = "year2",
year.as.random = TRUE, juvsize = TRUE, show.model.tables = TRUE, quiet = TRUE)
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
#> Model failed to converge with max|grad| = 0.0158034 (tol = 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#> - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
#> - Rescale variables?
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
#> boundary (singular) fit: see ?isSingular
#> Warning: Some predictor variables are on very different scales: consider
#> rescaling
#> boundary (singular) fit: see ?isSingular
#> Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
#> Hessian matrix. See vignette('troubleshooting')
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#summary(lathmodels3ipm)
```

We see here, as before, that size in time *t*-1 exerts an influence on some vital rates, including survival to time *t*+1 and size in time *t*+1. So, the historical IPM is the correct choice here. However, we will also create an ahistorical IPM for comparison. For that purpose, we will create the ahistorical linear model set.

```
<- modelsearch(lathvertipm, historical = FALSE,
lathmodels2ipm approach = "mixed", suite = "size",
vitalrates = c("surv", "obs", "size", "fec"), juvestimate = "Sdl",
bestfit = "AICc&k", sizedist = "gaussian", fecdist = "negbin",
fec.zero = TRUE, indiv = "individ", year = "year2", year.as.random = TRUE,
juvsize = TRUE, show.model.tables = TRUE, quiet = TRUE)
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#summary(lathmodels2ipm)
```

We will now create the historical suite of matrices covering the years of study. These matrices will be extremely large - large enough that some computers might have difficulty with them. If you encounter an error message telling you that you have run out of memory, then please try this on a more powerful computer :) .

```
<- flefko3(stageframe = lathframeipm, modelsuite = lathmodels3ipm,
lathmat3ipm supplement = lathsupp3, data = lathvertipm, year.as.random = FALSE,
patch.as.random = FALSE, reduce = FALSE)
summary(lathmat3ipm)
#>
#> This historical lefkoMat object contains 3 matrices.
#>
#> Each matrix is a square matrix with 10609 rows and columns, and a total of 112550881 elements.
#> A total of 3122136 survival transitions were estimated, with 1040712 per matrix.
#> A total of 60600 fecundity transitions were estimated, with 20200 per matrix.
#>
#> Vital rate modeling quality control:
#>
#> Survival estimated with 257 individuals and 2246 individual transitions.
#> Observation estimated with 254 individuals and 2121 individual transitions.
#> Size estimated with 254 individuals and 1916 individual transitions.
#> Reproduction probability not estimated.
#> Fecundity estimated with 257 individuals and 2246 individual transitions.
#> Juvenile survival estimated with 187 individuals and 281 individual transitions.
#> Juvenile observation estimated with 154 individuals and 210 individual transitions.
#> Juvenile size estimated with 144 individuals and 193 individual transitions.
#> Juvenile reproduction probability not estimated.
#>
#> Survival probability sum check (each matrix represented by column in order):
#> [,1] [,2] [,3]
#> Min. 0.000 0.000 0.000
#> 1st Qu. 0.996 0.983 0.988
#> Median 1.000 0.999 1.000
#> Mean 0.969 0.952 0.960
#> 3rd Qu. 1.000 1.000 1.000
#> Max. 1.000 1.000 1.000
#> NULL
```

These are giant matrices. With 10,609 rows and columns, there are a total of 112,550,881 elements per matrix. But they are also amazingly sparse - with 1,060,912 elements estimated, only 0.9% of elements per matrix are non-zero. The survival probability sums all look good, so we appear to have no problems with overly large given and proxy survival transitions provided through our supplemental tables. Let’s now build the ahistorical IPMs.

```
<- flefko2(stageframe = lathframeipm, modelsuite = lathmodels2ipm,
lathmat2ipm supplement = lathsupp2, data = lathvertipm, year.as.random = FALSE,
patch.as.random = FALSE, reduce = FALSE)
summary(lathmat2ipm)
#>
#> This ahistorical lefkoMat object contains 3 matrices.
#>
#> Each matrix is a square matrix with 103 rows and columns, and a total of 10609 elements.
#> A total of 30621 survival transitions were estimated, with 10207 per matrix.
#> A total of 600 fecundity transitions were estimated, with 200 per matrix.
#>
#> Vital rate modeling quality control:
#>
#> Survival estimated with 257 individuals and 2246 individual transitions.
#> Observation estimated with 254 individuals and 2121 individual transitions.
#> Size estimated with 254 individuals and 1916 individual transitions.
#> Reproduction probability not estimated.
#> Fecundity estimated with 257 individuals and 2246 individual transitions.
#> Juvenile survival estimated with 187 individuals and 281 individual transitions.
#> Juvenile observation estimated with 154 individuals and 210 individual transitions.
#> Juvenile size estimated with 144 individuals and 193 individual transitions.
#> Juvenile reproduction probability not estimated.
#>
#> Survival probability sum check (each matrix represented by column in order):
#> [,1] [,2] [,3]
#> Min. 4.87e-05 1.35e-05 0.000059
#> 1st Qu. 9.84e-01 9.61e-01 0.985057
#> Median 9.99e-01 9.99e-01 0.998781
#> Mean 9.54e-01 9.24e-01 0.956967
#> 3rd Qu. 1.00e+00 1.00e+00 0.999866
#> Max. 1.00e+00 1.00e+00 0.999985
#> NULL
```

The ahistorical IPMs are certainly smaller than the historical IPMs, but are nonetheless quiet large. Although huge, these matrices are not sparse - 10,407 elements out of 10,609 per matrix are estimated (98.1%). Fortunately, we can assume that neither IPM is overparameterized, since ultimately the elements in an IPM and any other function-based matrix reflect the statistical power of the underlying vital rate models. In `lefko3`

, the vital rate models used are the best-fit models from exhaustive model selection, and so are already the most parsimonious from within the suite of tested models.

Let’s do a further comparison - let’s view a matrix plot of each kind of MPM, ahistorical and then historical. First, the ahistorical plot.

`image3(lathmat2ipm, used = 1)`

`#> NULL`

Now the historical plot.

`image3(lathmat3ipm, used = 1)`

`#> NULL`

The plots above show that the ahistorical matrix is large but dense, mostly full of non-zero entries (the colored elements correspond to non-zero elements). In contrast, the historical matrix is huge and sparse, mostly full of zeroes with a general pattern to the distribution of non-zero elements.

Now let’s estimate the mean IPM matrices. We will estimate one mean matrix each, because we did not separate patches in the data reorganization and vital rate modeling. As a check, let’s also take a look at the column sums of the grand mean survival-transition matrix from each case.

```
<- lmean(lathmat2ipm)
lath2ipmmean <- lmean(lathmat3ipm)
lath3ipmmean
writeLines("\nAhistorical matrix stage survival distribution: ")
#>
#> Ahistorical matrix stage survival distribution:
summary(colSums(lath2ipmmean$U[[1]]))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000404 0.9765161 0.9987748 0.9451579 0.9998662 0.9999849
writeLines("\nHistorical matrix stage-pair survival distribution: ")
#>
#> Historical matrix stage-pair survival distribution:
summary(colSums(lath3ipmmean$U[[1]]))
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.0000 0.9887 0.9994 0.9604 1.0000 1.0000
```

All looks fine! Let’s also take a look at a portion of one of the conditional historical matrices, particularly the matrix conditional on vegetative dormancy in time *t*-1. This matrix can be compared to the ahistorical mean to assess the impacts of history on the matrix elements themselves. Please remember to remove the hashtag.

```
<- cond_hmpm(lath3ipmmean)
l3mcond #l3mcond$Acond[[1]]$Dorm
```

Now let’s estimate and plot the deterministic population growth rate, \(\lambda\), for each year.

```
<- lambda3(lathmat2ipm)
ipm2lambda <- lambda3(lathmat3ipm)
ipm3lambda
plot(lambda ~ year2, data = ipm2lambda, xlab = "Year", ylab = "Lambda",
ylim = c(0.65, 1.00), type = "l", lwd = 2, bty = "n")
lines(lambda ~ year2, data = ipm3lambda, lwd = 2, lty = 2, col = "red")
legend("bottomleft", c("ahistorical", "historical"), lty = c(1, 2),
col = c("black", "red"), lwd = 2, bty = "n")
```

Ahistorical estimates of \(\lambda\) are lower than historical estimates, and the historical \(\lambda\) values are more in line with estimates from the other *Lathyrus* vignettes.

Let’s now look at \(\lambda\) for the mean matrices and compare with the stochastic growth rate, \(a = \text{log} \lambda _{S}\). We will set the number of simulations low in the historical case in order to keep the amount of memory used and computation time low, because the size of the historical matrices will use up plenty of both (the default is 10,000). We will also set the seed for the random number generator to make our results reproducible.

```
lambda3(lath2ipmmean)
#> pop patch lambda
#> 1 1 1 0.815444
lambda3(lath3ipmmean)
#> pop patch lambda
#> 1 1 1 0.8719855
set.seed(42)
slambda3(lathmat2ipm)
#> pop patch a var sd se
#> 1 1 1 -0.2112375 0.01050791 0.1025081 0.001025081
slambda3(lathmat3ipm, times = 1000)
#> pop patch a var sd se
#> 1 1 1 -0.1501268 0.007987215 0.08937122 0.002826166
```

The historical growth rate is larger than the ahistorical in both deterministic and stochastic analyses, although all four numbers suggest a declining population. Now let’s compare the stable stage distribution from both the ahistorical and historical mean MPMs.

```
<- stablestage3(lath2ipmmean)
ipm2ss <- stablestage3(lath3ipmmean)
ipm3ss <- stablestage3(lathmat2ipm, stochastic = TRUE, seed = 42)
ipm2ss_s <- stablestage3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)
ipm3ss_s
<- cbind.data.frame(ipm2ss$ss_prop, ipm3ss$ahist$ss_prop,
ss_put_together $ss_prop, ipm3ss_s$ahist$ss_prop)
ipm2ss_snames(ss_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(ss_put_together) <- ipm2ss$stage_id
<- par("lty")
lty.o par(lty = 0)
barplot(t(ss_put_together), beside=T, ylab = "Proportion", xlab = "Stage",
col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

`par(lty = lty.o)`

Both ahistorical and historical approaches show the stable stage distribution dominated by the first stage, which is the dormant seed stage. However, the deterministic historic analysis suggests roughly equal weighting to the seedling stage, and increases the importance of small adult stages. Stochastic analysis shows similar patterns. The importance of the seed bank to the population is quite clear in this analysis!

Next, we will estimate the reproductive values associated with the element-wise mean matrices.

```
<- repvalue3(lath2ipmmean)
ipm2rv <- repvalue3(lath3ipmmean)
ipm3rv <- repvalue3(lathmat2ipm, stochastic = TRUE, seed = 42)
ipm2rv_s <- repvalue3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)
ipm3rv_s
<- cbind.data.frame((ipm2rv$rep_value / max(ipm2rv$rep_value)),
rv_put_together $ahist$rep_value / max(ipm3rv$ahist$rep_value)),
(ipm3rv$rep_value / max(ipm2rv_s$rep_value)),
(ipm2rv_s$ahist$rep_value / max(ipm3rv_s$ahist$rep_value)))
(ipm3rv_snames(rv_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(rv_put_together) <- ipm2rv$stage_id
<- par("lty")
lty.o par(lty = 0)
barplot(t(rv_put_together), beside=T, ylab = "Relative rep value",
xlab = "Stage", col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topleft", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

`par(lty = lty.o)`

A quick scan through these values shows that the highest reproductive values in the deterministic ahistorical analysis are for the largest adults. However, historically-corrected deterministic reproductive values drop off once adults get somewhat large, with the largest value associated with plants with a leaf volume of 5076 (the maximum is >7000). Stochastic analysis shows a general build-up of reproductive value with increasing size, similarly to the deterministic ahistorical projection. However, the ahistorical stochastic analyses match the deterministic analyses fairly strongly, while the historical stochastic analyses show a larger hump of reproductive value associated with medium sizes and a slower build-up to the largest sizes. So, individual history and temporal environmental stochasticity both have strong impacts here.

Let’s now try a sensitivity analysis. We will conduct both deterministic and stochastic analyses of ahistorical and historical MPMs. Because of the size of the hMPMs, we will limit run time of the stochastic sensitivity analysis to 200 time steps (this may take several hours even with reduced time steps). We will also limit our analysis to biologically possible transitions by ignoring the sensitivities of zero elements.

```
<- sensitivity3(lath2ipmmean)
lath2ipmsens <- sensitivity3(lath3ipmmean)
lath3ipmsens set.seed(42)
<- sensitivity3(lathmat2ipm, stochastic = TRUE)
lath2ipmsens_s <- sensitivity3(lathmat3ipm, stochastic = TRUE, steps = 200)
lath3ipmsens_s
writeLines("\nThe highest ahistorical deterministic sensitivity is associated with element: ")
#>
#> The highest ahistorical deterministic sensitivity is associated with element:
which(lath2ipmsens$ah_sensmats[[1]] == max(lath2ipmsens$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 309
writeLines("\nThe highest historically-corrected deterministic sensitivity is associated with element: ")
#>
#> The highest historically-corrected deterministic sensitivity is associated with element:
which(lath3ipmsens$ah_sensmats[[1]] == max(lath3ipmsens$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 283
writeLines("\nThe highest ahistorical stochastic sensitivity is associated with element: ")
#>
#> The highest ahistorical stochastic sensitivity is associated with element:
which(lath2ipmsens_s$ah_sensmats[[1]] == max(lath2ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 309
writeLines("\nThe highest historically-corrected stochastic sensitivity is associated with element: ")
#>
#> The highest historically-corrected stochastic sensitivity is associated with element:
which(lath3ipmsens_s$ah_sensmats[[1]] == max(lath3ipmsens_s$ah_sensmats[[1]][which(lath2ipmmean$A[[1]] > 0)]))
#> [1] 309
writeLines("\nThe highest historical deterministic sensitivity is associated with element: ")
#>
#> The highest historical deterministic sensitivity is associated with element:
which(lath3ipmsens$h_sensmats[[1]] == max(lath3ipmsens$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > 0)]))
#> [1] 2206981
writeLines("\nThe highest historical stochastic sensitivity is associated with element: ")
#>
#> The highest historical stochastic sensitivity is associated with element:
which(lath3ipmsens_s$h_sensmats[[1]] == max(lath3ipmsens_s$h_sensmats[[1]][which(lath3ipmmean$A[[1]] > 0)]))
#> [1] 2206981
```

Our sensitivity analyses generally suggest that population growth rate is most sensitive to element 309, which is the third column (309 / 103 = 3) and the 103rd row and so is associated with the transition from vegetative dormancy to the largest adult stage. However, deterministic analysis of the hMPM differs, suggesting that population growth rate is most elastic to element 283, which is in the third column and 77th row. This transition is from vegetative dormancy to a medium-to-large size adult. Historical deterministic and stochastic sensitivity analyses both suggest that population growth rate is most sensitive to changes in element 2,206,981, which is in the 209th column (2,206,981 / 10,609 = 208.0291) and the 309th row. That element corresponds to the transition from vegetative dormancy in times *t*-1 and *t* to the largest adult in time *t*+1. So, sensitivity analysis strongly suggests that vegetative dormance has strong impacts on population growth rate.

We will now move on to elasticity analysis. We will reduce the number of time steps simulated in the stochastic historical analysis to reduce the amount of computer processing time (this might take several hours even with the number of steps reduced). We will also remove the sensitivity analyses from memory.

```
rm("lath2ipmsens", "lath3ipmsens", "lath2ipmsens_s", "lath3ipmsens_s")
<- elasticity3(lath2ipmmean)
lath2ipmelas <- elasticity3(lath3ipmmean)
lath3ipmelas set.seed(42)
<- elasticity3(lathmat2ipm, stochastic = TRUE)
lath2ipmelas_s <- elasticity3(lathmat3ipm, stochastic = TRUE, steps = 200)
lath3ipmelas_s
writeLines("\nThe highest ahistorical deterministic elasticity is associated with element: ")
#>
#> The highest ahistorical deterministic elasticity is associated with element:
which(lath2ipmelas$ah_elasmats[[1]] == max(lath2ipmelas$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historically-corrected deterministic elasticity is associated with element: ")
#>
#> The highest historically-corrected deterministic elasticity is associated with element:
which(lath3ipmelas$ah_elasmats[[1]] == max(lath3ipmelas$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest ahistorical stochastic elasticity is associated with element: ")
#>
#> The highest ahistorical stochastic elasticity is associated with element:
which(lath2ipmelas_s$ah_elasmats[[1]] == max(lath2ipmelas_s$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historically-corrected stochastic elasticity is associated with element: ")
#>
#> The highest historically-corrected stochastic elasticity is associated with element:
which(lath3ipmelas_s$ah_elasmats[[1]] == max(lath3ipmelas_s$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historical deterministic elasticity is associated with element: ")
#>
#> The highest historical deterministic elasticity is associated with element:
which(lath3ipmelas$h_elasmats[[1]] == max(lath3ipmelas$h_elasmats[[1]]))
#> [1] 2206881
writeLines("\nThe highest historical stochastic elasticity is associated with element: ")
#>
#> The highest historical stochastic elasticity is associated with element:
which(lath3ipmelas_s$h_elasmats[[1]] == max(lath3ipmelas_s$h_elasmats[[1]]))
#> [1] 2206881
```

All analyses agree that \(\lambda\) is most elastic in response to stasis in vegetative dormancy (element 209 is the element in the third column, 209 / 103 = 2.029, and the third row, 209 - (103 * 2) = 3). The historical analyses both agree that population growth rate is most elastic to element 2,206,881, which is in the 209th column and the 209th row and corresponds to the transition from vegetative dormancy in times *t*-1 and *t* to vegetative dormancy in time *t*+1. Now let’s plot the elasticity of \(\lambda\) to transitions from both perspectives.

```
<- cbind.data.frame(colSums(lath2ipmelas$ah_elasmats[[1]]),
elas_put_together colSums(lath3ipmelas$ah_elasmats[[1]]), colSums(lath2ipmelas_s$ah_elasmats[[1]]),
colSums(lath3ipmelas_s$ah_elasmats[[1]]))
names(elas_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_put_together) <- lath2ipmelas$ah_stages$stage_id
<- par("lty")
lty.o par(lty = 0)
barplot(t(elas_put_together), beside=T, ylab = "Elasticity", xlab = "Stage",
col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

`par(lty = lty.o)`

The plot of these distributions shows the strong importance of vegetative dormancy, which is the tallest bar in both plots (the first bars, corresponding to dormant seeds and seedlings, have elasticity at nearly 0). However, the distribution of elasticity values in adult stages is shifted to the right in the historically-corrected IPM. Thus, while the ahistorical IPMs show the second highest elasticity of \(\lambda\) to be associated with plants with a leaf volume of 746, historically-corrected analyses suggest that the second highest elasticity is associated with plants with a leaf volume of 960.

Now let’s take a look at the summed elasticities of different kinds of transitions, beginning with a comparison of ahistorical to historically-corrected transitions.

```
<- summary(lath2ipmelas)
lath2elas_sums <- summary(lath3ipmelas)
lath3elas_sums <- summary(lath2ipmelas_s)
lath2elas_sums_s <- summary(lath3ipmelas_s)
lath3elas_sums_s
<- cbind.data.frame(lath2elas_sums$ahist[,2],
elas_sums_together $ahist[,2], lath2elas_sums_s$ahist[,2], lath3elas_sums_s$ahist[,2])
lath3elas_sumsnames(elas_sums_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_sums_together) <- lath2elas_sums$ahist$category
barplot(t(elas_sums_together), beside=T, ylab = "Elasticity", xlab = "Transition",
col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")
```

We see similar transition elasticities between ahistorical and historical analyses, with growth and shrinkage the most influential on \(\lambda\), while fecundity is by far the least influential.

Further analytical tools are being planned for `lefko3`

, but packages that handle projection matrices can typically handle the individual matrices produced and saved in `lefkoMat`

objects in this package. Differences, obscure results, and errors sometimes arise when packages are not made to handle large and/or sparse matrices - historical matrices are both, and so care must be taken with their analysis.

We are grateful to two anonymous reviewers whose scrutiny improved the quality of this vignette. The project resulting in this package and this tutorial was funded by Grant-In-Aid 19H03298 from the Japan Society for the Promotion of Science.

Broek, Jan van den. 1995. “A Score Test for Zero Inflation in a Poisson Distribution.” *Biometrics* 51 (2): 738–43. https://doi.org/10.2307/2532959.

Ehrlen, Johan. 1995. “Demography of the Perennial Herb *Lathyrus Vernus*. I. Herbivory and Individual Performance.” *Journal of Ecology* 83 (2): 287–95. https://doi.org/10.2307/2261567.

———. 2000. “The Dynamics of Plant Populations: Does the History of Individuals Matter?” *Ecology* 81 (6): 1675–84. https://doi.org/10.1890/0012-9658(2000)081[1675:TDOPPD]2.0.CO;2.

———. 2002. “Assessing the Lifetime Consequences of Plant-Animal Interactions for the Perennial Herb Lathyrus Vernus (Fabaceae).” *Perspectives in Plant Ecology, Evolution and Systematics* 5 (3): 145–63. https://doi.org/10.1078/1433-8319-00031.

Ehrlen, Johan, and Ove Eriksson. 1996. “Seedling Recruitment in the Perennial Herb Lathyrus Vernus.” *Flora* 191 (4): 377–83. https://doi.org/10.1016/S0367-2530(17)30744-2.

Ehrlen, Johan, and Kari Lehtila. 2002. “How Perennal Are Perennial Plants?” *Oikos* 98: 308–22. https://doi.org/10.1034/j.1600-0706.2002.980212.x.

Ehrlen, Johan, and Zuzana Munzbergova. 2009. “Timing of Flowering: Opposed Selection on Different Fitness Components and Trait Covariation.” *The American Naturalist* 173 (6): 819–30. https://doi.org/10.1086/598492.

Ehrlen, Johan, and Jan Van Groenendael. 2001. “Storage and the Delayed Costs of Reproduction in the Understorey Perennial *Lathyrus Vernus*.” *Journal of Ecology* 89 (2): 237–46. https://doi.org/10.1046/j.1365-2745.2001.00546.x.

Ellner, Stephen P., and Mark Rees. 2006. “Integral Projection Models for Species with Complex Demography.” *American Naturalist* 167 (3): 410–28. https://doi.org/10.1086/499438.