Lathyrus vernus IPMs

Richard P. Shefferson, Shun Kurokawa, and Johan Ehrlén

This document was built in Markdown in R 4.0.4 and compiled on 30 March 2021. It covers package lefko3 version 3.4.0. Please note that this vignette was written with space considerations in mind. To reduce output size, we have prevented some statements from running if they produce long stretches of output. Examples include most summary() calls. In these cases, we include hashtagged versions of these calls, and our text assumes that the user runs these statements without hashtags.

CASE STUDIES OF SWEDISH Lathyrus vernus POPULATION

In this vignette, we will use the lathyrus dataset to illustrate the estimation of integral projection models. Please see the other vignettes included in package lefko3, as well as further vignettes posted online on the projects page of the Shefferson lab website. Other vignettes include demonstrations of raw and function-based MPMs, as well as age-by-stage MPMs.

ORGANISM AND POPULATION

Lathyrus vernus (family Fabaceae) is a long-lived forest herb, native to Europe and large parts of northern Asia. Individuals increase slowly in size and usually flower only after 10-15 years of vegetative growth. Flowering individuals have an average conditional lifespan of 44.3 years (Ehrlen and Lehtila 2002). Lathyrus vernus lacks organs for vegetative spread and individuals are well delimited (Ehrlen 2002). One or several erect shoots of up to 40 cm height emerge from a subterranean rhizome in March and April. Flowering occurs about four weeks after shoot emergence. Shoot growth is determinate, and the number of flowers is determined in the previous year (Ehrlen and Van Groenendael 2001). Individuals sometimes do not produce aboveground structures every year, instead remaining dormant in one season. Lathyrus vernus is self-compatible but requires visits from bumble-bees to produce seeds. Individuals produce few, large seeds and establishment from seeds is relatively frequent (Ehrlen and Eriksson 1996). The pre-dispersal seed predator Bruchus atomarius often consumes a large fraction of developing seeds, and roe deer (Capreolus capreolus) sometimes consume the shoots (Ehrlen and Munzbergova 2009).

Data for this study were collected from six permanent plots in a population of L. vernus located in a deciduous forest in the Tullgarn area, SE Sweden (58.9496 N, 17.6097 E), during 1988–1991 (Ehrlen 1995). The six plots were relatively similar with regard to soil type, elevation, slope, and canopy cover. Within each plot, all individuals were marked with numbered tags that remained over the study period, and their locations were carefully mapped. New individuals were included in the study in each year. Individuals were recorded at least three times every growing season. At the time of shoot emergence, we recorded whether individuals were alive and produced above-ground shoots, and if shoots had been grazed. During flowering, we recorded flower number and the height and diameter of all shoots. At fruit maturation, we counted the number of intact and damaged seeds. To derive a measure of above-ground size for each individual, we calculated the volume of each shoot as \(\pi × (\frac{1}{2} diameter)^2 × height\), and summed the volumes of all shoots. This measure is strongly correlated with the dry mass of aboveground tissues (\(R^2 = 0.924\), \(P < 0.001\), \(n = 50\), log-transformed values; Ehrlén 1995). Size of individuals that had been grazed was estimated based on measures of shoot diameter in grazed shoots, and the relationship between shoot diameter and shoot height in non-grazed individuals. Only individuals with an aboveground volume of more than 230 mm3 flowered and produced fruits during this study. Individuals that lacked aboveground structures in one season but reappeared in the following year were considered dormant. Individuals that lacked aboveground structures in two subsequent seasons were considered dead from the year in which they first lacked aboveground structures. Probabilities of seeds surviving to the next year, and of being present as seedlings or seeds in the soil seed bank, were derived from separate yearly sowing experiments in separate plots adjacent to each subplot (Ehrlen and Eriksson 1996).

OVERALL GOALS AND INITIAL CONSIDERATIONS

Here we will build historical and ahistorical integral projection models (IPMs). An IPM is a kind of function-based MPM in which transitions are modeled on a continuous state variable rather than discrete stages (Ellner and Rees 2006). A size metric is often used as this continuous state variable. In practice, all size-classified matrix projection models including IPMs require discrete size classes. So, although size is modeled as conforming to the Gaussian distribution, the actual stages developed break size up into many fine-scale, equally sized classes or bins. Although the number of these bins varies from analysis to analysis, package lefko3 uses a default of 100 (this can be changed as an option). Because vital rates are modeled rather than directly calculated from the data, the number of individuals moving through any particular size class at any particular time does not need to be considered in determining stage boundaries (as they would be in raw MPM estimation). IPMs are often complex, meaning that they include some life history stages that fall outside of the size classifications developed for the matrices, such as dormant seeds or juveniles. Package lefko3 can handle all of this complexity.

The dataset that we have provided is organized in horizontal format, meaning that each row holds all of the data for a single, unique individual, and columns correspond to individual condition in particular observation times (which we refer to as years here, since there was one main census in each year). The original Excel spreadsheet used to keep the dataset has a repeating pattern to these columns, with each year having a similarly arranged group of variables. Package lefko3 includes functions to handle data in horizontal format based on these patterns, as well as functions to handle vertically formatted data (i.e. data for individuals is broken up across rows, where each row is a unique combination of individual and year in time t).

Figure 6.1. Organization of the Lathyrus dataset, as viewed in Microsoft Excel.

This dataset includes information on 1,119 individuals, so there are 1,119 rows with data (not counting the header). There are 38 columns. The first two columns are variables giving identifying information about each individual (SUBPLOT refers to the patch, and GENET refers to individual identity), with each individual’s data entirely restricted to one row. This is followed by four sets of nine columns, each named VolumeXX, lnVolXX, FCODEXX, FlowXX, IntactseedXX, Dead19XX, DormantXX, Missing19XX, and SeedlingXX, where XX corresponds to the year of observation and with years organized consecutively. Thus, columns 3-11 refer to year 1988, columns 12-20 refer to year 1989, etc. For lefko3 to handle this dataset correctly, we need to know the exact number of years used, which is 4 years here (includes all years from and including 1988 to 1991), we need the columns to be repeated in the same order for each year, and we need years in consecutive order with no extra columns between them.

First, we clear memory and load the dataset.

rm(list=ls(all=TRUE))

library(lefko3)
data(lathyrus)
#summary(lathyrus)

Step 1. Life history model development

To begin, we need to create a stageframe for this dataset. A stageframe is a data frame that describes all stages in the life history of the organism, in a way usable by the functions in this package and using stage names and classifications that completely match those used in the dataset. It needs to include complete descriptions of all stages that occur in the dataset, with each stage defined uniquely. Since this object can be used for automated classification of individuals, all sizes, reproductive states, and other characteristics defining each stage in the dataset need to be accounted for explicitly. This can be difficult if a few data points exist outside the range of sizes specified in the stageframe, so great care must be taken to include all size values and values of other descriptor variables occurring within the dataset. The final description of each stage occurring in the dataset must not completely overlap with any other stage also found in the dataset, although partial overlap is allowed and expected. We will base our stageframe on the life history model provided in Ehrlén (2000), but use a different size classification to allow IPM construction and make all mature stages other than vegetative dormancy reproductive.

In the stageframe code below, we show that we want an IPM by choosing two stages that serve as the size limits for IPM size classification. These two size classes should have exactly the same characteristics in the stageframe other than size. By choosing these two size limits, we can skip adding and describing the many size classes that will fall between these limits - function sf_create() will create all of these for us. We mark these limits in the vector that we load into the stagenames option using the string ipm. Package lefko3 will then create and name all IPM size classes according to its own conventions. The default number of size classes is 100 bins. We can alter this if we wish using the ipmbins option.

sizevector <- c(0, 100, 0, 1, 7100)
stagevector <- c("Sd", "Sdl", "Dorm", "ipm", "ipm")
repvector <- c(0, 0, 0, 1, 1)
obsvector <- c(0, 1, 0, 1, 1)
matvector <- c(0, 0, 1, 1, 1)
immvector <- c(1, 1, 0, 0, 0)
propvector <- c(1, 0, 0, 0, 0)
indataset <- c(0, 1, 1, 1, 1)
binvec <- c(0, 100, 0.5, 1, 1)
comments <- c("Dormant seed", "Seedling", "Dormant", "ipm adult stage", "ipm adult stage")
lathframeipm <- sf_create(sizes = sizevector, stagenames = stagevector, 
  repstatus = repvector, obsstatus = obsvector, propstatus = propvector,
  immstatus = immvector, matstatus = matvector, comments = comments,
  indataset = indataset, binhalfwidth = binvec, ipmbins = 100, roundsize = 3)
#lathframeipm

This stageframe has 103 stages (we can see this by typing dim(lathframeipm), which will show the numbers of rows and columns, respectively). The IPM portion technically starts with the fourth stage and keeps going through the 103rd stage. Stage names within this range are concatenations of the size centroid (designated with sz), reproductive status (designated with rp), maturity status (designated with mt), and observation status (designated with ob). The first three stages, which fall outside of the IPM classification, are left unaltered.

Step 2a. Dataset organization

To work with this dataset, we first need to format the data into vertical format, in which each row corresponds to the state of a single individual in two (if ahistorical) or three (if historical) consecutive time intervals. Because this is an IPM, we will need to estimate linear models of vital rates This will require us to avoid NAs in size and fecundity (fortunately, in this dataset, NA is equivalent to 0), so we will set NAas0 = TRUE. We will also set NRasRep = TRUE because we will assume that all adult stages other than dormancy are reproductive, and there are mature individuals in the dataset that do not reproduce but need to be included in reproductive stages (setting this option to TRUE makes sure that the reproductive status of non-reproductive individuals in potentially reproductive stages is set to 1, although the actual fecundity is not altered). Finally, we will ignore patches marked in the dataset and estimate matrices only for the full population, in order to preserve statistical power for vital rate modeling in historical IPM analysis.

lathvertipm <- verticalize3(lathyrus, noyears = 4, firstyear = 1988, 
  individcol = "GENET", blocksize = 9, juvcol = "Seedling1988", 
  sizeacol = "Volume88", repstracol = "FCODE88", fecacol = "Intactseed88", 
  deadacol = "Dead1988", nonobsacol = "Dormant1988", stageassign = lathframeipm,
  stagesize = "sizea", censorcol = "Missing1988",censorkeep = NA, censor = TRUE,
  NAas0 = TRUE, NRasRep = TRUE)
#summary(lathvertipm)

Before we move on to the next key steps in analysis, let’s take a closer look at fecundity. In this dataset, fecundity is mostly a count of intact seeds, and only differs in six cases where the seed output was estimated based on other models. To see this, try the following code.

writeLines(paste0("Total length of variable corresponding to fecundity in time t+1: ", length(lathvertipm$feca3)))
#> Total length of variable corresponding to fecundity in time t+1: 2527
writeLines(paste0("Total non-integer entries in fecundity in time t+1: ", length(which(lathvertipm$feca3 != round(lathvertipm$feca3)))))
#> Total non-integer entries in fecundity in time t+1: 0
writeLines(paste0("\nTotal length of variable corresponding to fecundity in time t: ", length(lathvertipm$feca2)))
#> 
#> Total length of variable corresponding to fecundity in time t: 2527
writeLines(paste0("Total non-integer entries in fecundity in time t: ", length(which(lathvertipm$feca2 != round(lathvertipm$feca2)))))
#> Total non-integer entries in fecundity in time t: 6
writeLines(paste0("\nTotal length of variable corresponding to fecundity in time t-1: ", length(lathvertipm$feca1)))
#> 
#> Total length of variable corresponding to fecundity in time t-1: 2527
writeLines(paste0("Total non-integer entries in fecundity in time t-1: ", length(which(lathvertipm$feca1 != round(lathvertipm$feca1)))))
#> Total non-integer entries in fecundity in time t-1: 6

We see that we have quite a bit of fecundity data, and that it is overwhelmingly but not exclusively integer. So, we can either treat fecundity as a continuous variable, or round the values and treat it as a count variable. We will round fecundity so that we can treat fecundity as a count variable in the analysis.

lathvertipm$feca3 <- round(lathvertipm$feca3)
lathvertipm$feca2 <- round(lathvertipm$feca2)
lathvertipm$feca1 <- round(lathvertipm$feca1)

Although we wish to treat fecundity as a count, it is still not clear what underlying distribution we should use. This package currently allows 7 choices: Gaussian, Poisson, negative binomial, zero-inflated Poisson, zero-inflated negative binomial, zero-truncated Poisson, and zero-truncated negative binomial. To assess which to use, we should first assess whether the mean and variance of the count are equal using a dispersion test. The Poisson distribution assumes that the mean and variance are equal, and so we can test this assumption using a chi-squared test. If it is not significantly different, then we may use some variant of the Poisson distribution. If the data are significantly over- or under-dispersed, then we should use the negative binomial distribution. If fecundity of 0 is possible in reproductive stages, as in cases where reproductive status is defined by flowering rather than by offspring production, then we should also test whether the number of zeroes is significantly greater than expected under these distributions, and use a zero-inflated distribution if so (if fecundity does equal 0 in reproductive individuals in the dataset, then we should use a zero-truncated distribution).

Let’s start off by looking at a plot of the distribution of fecundity.

hist(subset(lathvertipm, repstatus2 == 1)$feca2, main = "Fecundity", 
  xlab = "Intact seeds produced in time t")

Figure 6.2. Histogram of fecundity in time t

We see that the distribution seems to conform to a classic count variable with a very low mean value. The first bar suggests that there may be too many zeroes, and we can see this in the following plot, which focuses in on the leftmost portion of the above graph.

hist(subset(lathvertipm, repstatus2 == 1)$feca2[which(subset(lathvertipm, 
  repstatus2 == 1)$feca2 < 11)], main = "Fecundity", 
  xlab = "Intact seeds produced in time t")

Figure 6.3. Histogram of fecundity in time t, only cases with <10 seeds produced

This is fairly ample evidence that we should use a zero-inflated distribution of some sort. But to make that decision, let’s now go to a formal test of the two assumptions of \(mean = variance\) and no excess 0s. Both tests use chi-squared distribution-based approaches, with the zero-inflation test based on van der Broek (1995). The following line will assess all of this in fecundity, but can also assess size if necessary (we will treat size as conforming to a Gaussian distribution, so will not test these assumptions with it).

sf_distrib(lathvertipm, size2 = "sizea2", fec = "feca2", repst = "repstatus2")
#> 
#> Mean fecundity is 4.791
#> The variance in fecundity is 70.14
#> The probability of this dispersion level by chance assuming the true mean fecundity = variance in fecundity, and an alternative hypothesis of overdispersion, is 0
#> 
#> Fecundity is significantly overdispersed.
#> 
#> 
#> Mean lambda is 0.008302
#> The actual number of 0s in fecundity is 334
#> The expected number of 0s in fecundity under the null hypothesis is 4.973
#> The probability of this deviation in 0s is 0
#> 
#> Fecundity is significantly zero-inflated.
#> NULL

Such significant results for both tests show us that we really need to use a zero-inflated negative binomial distribution here.

Step 2b: Develop supplemental information for matrix estimation

Now we will create supplemental tables, which provide extra data for matrix estimation that is not included in the main demographic dataset. Specifically, we will provide the seed dormancy probability and germination rate, which are given as transitions from the dormant seed stage to another year of seed dormancy or to the germinated seedling stage, respectively. We assume that the germination rate is the same regardless of whether the seed was produced in the previous year or has been in the seedbank for longer. We will incorporate these terms both as fixed constants for specific transitions within the resulting matrices, and as multipliers for fecundity, since ultimately fecundity will be estimated as the production of seed multiplied by the seed germination rate or the seed dormancy/survival rate. We will start with the ahistorical case, and then move on to the historical case, where we also need to input the corresponding stages in time t-1 for each transition.

lathsupp2 <- supplemental(stage3 = c("Sd", "Sdl", "Sd", "Sdl"), 
  stage2 = c("Sd", "Sd", "rep", "rep"),
  givenrate = c(0.345, 0.054, NA, NA),
  multiplier = c(NA, NA, 0.345, 0.054),
  type = c(1, 1, 3, 3), stageframe = lathframeipm, historical = FALSE)
lathsupp3 <- supplemental(stage3 = c("Sd", "Sd", "Sdl", "Sd", "Sdl"), 
  stage2 = c("Sd", "Sd", "Sd", "rep", "rep"),
  stage1 = c("Sd", "rep", "rep", "all", "all"), 
  givenrate = c(0.345, 0.345, 0.054, NA, NA),
  multiplier = c(NA, NA, NA, 0.345, 0.054),
  type = c(1, 1, 1, 3, 3), stageframe = lathframeipm, historical = TRUE)
#lathsupp2
#lathsupp3

Step 3. Tests of history, and vital rate modeling

Integral projection models (IPMs) require functions of vital rates to populate them. Here, we will develop these functions as linear models using modelsearch(). This function will develop our best-fit vital rate models for us. This function looks simple, but it automates several crucial and complex tasks in MPM analysis. Specifically, it automates 1) the building of global models for each vital rate requested, 2) the exhaustive construction of all reduced models, and 3) the selection of the best-fit models. Let’s look at some of the options that we will utilize for this purpose (please note that this list includes only some of the options actually offered by the function - further options are shown in the documentation for modelsearch(), and further theoretical details are shown in the Basic theory and concepts vignette):

historical: Setting this to TRUE or FALSE indicates to include state in time t-1 in the global models.

approach: Designates whether to use generalized linear models (glm), in which all factors are treated as fixed, or mixed effects models (mixed), in which factors are treated as either fixed or random (most notably, time, patch, and individual identity can be treated as random). We encourage users to use the latter option, as it accounts for pseudoreplication, but the former approach is more common.

suite: Designates which factors to include in the global model. Possible values include size, which includes size only; rep, which includes reproductive status only; main, which includes both size and reproductive status as main effects only; full, which includes both size and reproductive status and all two-way interactions; and const, which includes only an intercept. These factors are in addition to individual identity, patch, and time.

vitalrates: Designates which vital rates to model. The default is to model survival, size, and fecundity, but users can also model observation status and reproduction status.

juvestimate: Designates the name of the juvenile stage transitioning to maturity, in cases where the dataset includes data on juveniles.

juvsize: Denotes whether size should be used as an independent term in models involving transition from the juvenile stage.

bestfit: Denotes whether the best-fit model should be chosen as the model with the lowest AICc (AICc) or as the most parsimonious model (AICc&k), where the latter is the model that has the fewest estimated parameters and is within 2 AICc units of the model with the lowest AICc. We encourage users to choose the latter. This is the default setting, and is more in-line with model selection protocols preferred by information theorists (Burnham and Anderson 2002)..

sizedist: Designates the distribution used for size. The options include gaussian, poisson, and negbin, the last of which refers to the negative binomial distribution.

fecdist: Designates the distribution used for fecundity. The options include gaussian, poisson, and negbin, the last of which refers to the negative binomial distribution.

fectime: Designates whether fecundity is estimated within time t (the default) or time t+1. Plant ecologists are likely to choose the former, since fecundity is typically estimated via a proxy such as flowers, fruit, or seed produced. Wildlife ecologists might choose the latter, since fecundity may be best estimated as a count of actual juveniles within nests, burrows, or other family structures.

size.zero: Designates whether the size distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

size.trunc: Designates whether to use a zero-truncated distribution for size. Only applies to the Poisson and negative binomial distributions, and cannot be applied is size.zero = TRUE.

fec.zero: Designates whether the fecundity distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

fec.trunc: Designates whether to use a zero-truncated distribution for fecundity. Only applies to the Poisson and negative binomial distributions, and cannot be applied is fec.zero = TRUE.

jsize.zero: Designates whether the juvenile size distribution should be zero-inflated. Only applies to the Poisson and negative binomial distributions.

jsize.trunc: Designates whether to use a zero-truncated distribution for juvenile size. Only applies to the Poisson and negative binomial distributions, and cannot be applied is jsize.zero = TRUE.

indiv: Designates the variable corresponding to individual identity in the vertical dataset.

patch: Designates the variable corresponding to patch identity in the vertical dataset.

year: Designates the variable corresponding to time t in the vertical dataset.

age: Designates the name of the variable corresponding to age in time t. Should be set only if an age-by-stage model is desired. Do not use this option to produce a model that does not incorporate age.

year.as.random: Designates whether to treat year as a random factor, and is only used when approach = "mixed".

patch.as.random: Designates whether to treat patch identity as a random factor, and is only used when approach = "mixed".

show.model.tables: Designates whether to include the full dredge model tables showing all models developed and their associated AICc values.

quiet: Denotes whether to silence guidepost statements and most warning messages. Only warnings incurred in model testing will be visible, including warnings from the testing of global models and those from the testing of reduced models produced during model dredging.

In addition, there are other options that provide flexibility in handling datasets with different designations for key variables, and allowing manual designation of stages.

This function is useful not just because it develops the vital rate models necessary for function-based MPM estimation, but because it provides a test of whether individual history affects demography. Setting historical = TRUE fits size and/or reproductive status in time t-1 into global models, while setting historical = FALSE prevents testing of the impacts of state in time t-1. The model building and selection protocols can then be used to see if history has a significant impact on a vital rate, by assessing whether a historical term is retained in the best-fit model.

First, we will create the historical models in order to assess whether history is a significant influence on vital rates. Please remove the hashtag from the summary call to see details of the best-fit models.

lathmodels3ipm <- modelsearch(lathvertipm, historical = TRUE, approach = "mixed", 
  suite = "size", vitalrates = c("surv", "obs", "size", "fec"), 
  juvestimate = "Sdl", bestfit = "AICc&k", sizedist = "gaussian", 
  fecdist = "negbin", fec.zero = TRUE, indiv = "individ", year = "year2", 
  year.as.random = TRUE, juvsize = TRUE, show.model.tables = TRUE, quiet = TRUE)
#> Warning: Some predictor variables are on very different scales: consider rescaling
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.0158034 (tol =
#> 0.002, component 1)
#> Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
#>  - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
#>  - Rescale variables?
#> Warning: Some predictor variables are on very different scales: consider rescaling
#> boundary (singular) fit: see ?isSingular
#> Warning: Some predictor variables are on very different scales: consider rescaling
#> boundary (singular) fit: see ?isSingular
#> Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j = integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for
#> you
#> Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j = integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for
#> you

#> Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j = integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for
#> you
#> Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#summary(lathmodels3ipm)

We see here, as before, that size in time t-1 exerts an influence on some vital rates, including survival to time t+1 and size in time t+1. So, the historical IPM is the correct choice here. However, we will also create an ahistorical IPM for comparison. For that purpose, we will create the ahistorical linear model set.

lathmodels2ipm <- modelsearch(lathvertipm, historical = FALSE, 
  approach = "mixed", suite = "size", 
  vitalrates = c("surv", "obs", "size", "fec"), juvestimate = "Sdl", 
  bestfit = "AICc&k", sizedist = "gaussian", fecdist = "negbin", 
  fec.zero = TRUE, indiv = "individ", year = "year2", year.as.random = TRUE, 
  juvsize = TRUE, show.model.tables = TRUE, quiet = TRUE)
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#> Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j = integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for
#> you

#> Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j = integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for
#> you

#> Warning in Matrix::sparseMatrix(dims = c(0, 0), i = integer(0), j = integer(0), : 'giveCsparse' has been deprecated; setting 'repr = "T"' for
#> you
#> boundary (singular) fit: see ?isSingular
#> boundary (singular) fit: see ?isSingular
#summary(lathmodels2ipm)

Type summary(lathmodels2ipm) to examine and compare against the historical case. We note some strong similarities here, although obviously size in time t-1 is no longer present in any vital rate model. Let’s move on now to the matrices themselves.

Step 4. IPM estimation

We will now create the historical suite of matrices covering the years of study. Be aware that the output matrices will be extremely large - large enough that some computers might have difficulty with them. If you encounter an error message telling you that you have run out of memory, then please try this on a more powerful computer :) .

lathmat3ipm <- flefko3(stageframe = lathframeipm, modelsuite = lathmodels3ipm,
  supplement = lathsupp3, data = lathvertipm, year.as.random = FALSE,
  patch.as.random = FALSE, reduce = FALSE)
summary(lathmat3ipm)
#> 
#> This historical lefkoMat object contains 3 matrices.
#> 
#> Each matrix is a square matrix with 10609 rows and columns, and a total of 112550881 elements.
#> A total of 3122121 survival transitions were estimated, with 1040707 per matrix.
#> A total of 61800 fecundity transitions were estimated, with 20600 per matrix.
#> 
#> Vital rate modeling quality control:
#> 
#> Survival estimated with 257 individuals and 2246 individual transitions.
#> Observation estimated with 254 individuals and 2121 individual transitions.
#> Size estimated with 254 individuals and 1916 individual transitions.
#> Reproduction probability not estimated.
#> Fecundity estimated with 257 individuals and 2246 individual transitions.
#> Juvenile survival estimated with 187 individuals and 281 individual transitions.
#> Juvenile observation estimated with 154 individuals and 210 individual transitions.
#> Juvenile size estimated with 144 individuals and 193 individual transitions.
#> Juvenile reproduction probability not estimated.
#> NULL

These are giant matrices. With 10,609 rows and columns, there are a total of 112,550,881 elements per matrix. But they are also amazingly sparse - with 1,061,307 elements estimated, 0.9% of elements per matrix are non-zero. Let’s now build the ahistorical IPMs.

lathmat2ipm <- flefko2(stageframe = lathframeipm, modelsuite = lathmodels2ipm,
  supplement = lathsupp2, data = lathvertipm, year.as.random = FALSE,
  patch.as.random = FALSE, reduce = FALSE)
summary(lathmat2ipm)
#> 
#> This ahistorical lefkoMat object contains 3 matrices.
#> 
#> Each matrix is a square matrix with 103 rows and columns, and a total of 10609 elements.
#> A total of 30621 survival transitions were estimated, with 10207 per matrix.
#> A total of 600 fecundity transitions were estimated, with 200 per matrix.
#> 
#> Vital rate modeling quality control:
#> 
#> Survival estimated with 257 individuals and 2246 individual transitions.
#> Observation estimated with 254 individuals and 2121 individual transitions.
#> Size estimated with 254 individuals and 1916 individual transitions.
#> Reproduction probability not estimated.
#> Fecundity estimated with 257 individuals and 2246 individual transitions.
#> Juvenile survival estimated with 187 individuals and 281 individual transitions.
#> Juvenile observation estimated with 154 individuals and 210 individual transitions.
#> Juvenile size estimated with 144 individuals and 193 individual transitions.
#> Juvenile reproduction probability not estimated.
#> NULL

The ahistorical IPMs are certainly smaller than the historical IPMs, but are nonetheless quiet large. Although huge, these matrices are not sparse - 10,407 elements out of 10,609 per matrix are estimated (98.1%). Fortunately, we can assume that neither IPM is overparameterized, since ultimately the elements in an IPM and any other function-based matrix reflect the statistical power of the underlying vital rate models. In lefko3, the vital rate models used are the best-fit models from exhaustive model selection, and so are already the most parsimonious from within the suite of tested models.

Let’s take a look at the column sums of the first ahistorical U matrix, to make sure that all column sums look like survival probabilities.

summary(colSums(lathmat2ipm$U[[1]]))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0000487 0.9838110 0.9987809 0.9540850 0.9998662 0.9999848

Everything looks OK, with stage survival probabilities within the realm of possibility. Now let’s repeat with the first historical U matrix. First the top corner of the first historical matrix.

summary(colSums(lathmat3ipm$U[[1]]))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.0000  0.9963  0.9999  0.9690  1.0000  1.0000

These numbers also all look fine. These are the expected survival probabilities to time t+1 of individuals in each combination of stages in times t-1 and t.

Let’s do a further comparison - let’s view a matrix plot of each kind of MPM, ahistorical and then historical. First, the ahistorical plot.

image3(lathmat2ipm, used = 1)

Figure 6.4. Image of ahistorical projection matrix

#> NULL

Now the historical plot.

image3(lathmat3ipm, used = 1)

Figure 6.5. Image of historical projection matrix

#> NULL

The plots above show the major differences between ahistorical and historical MPMs. The ahistorical matrix is large but dense, mostly full of non-zero entries (the colored elements correspond to non-zero elements). In contrast, the historical matrix is huge and sparse, mostly full of zeroes with a general pattern to the distribution of non-zero elements. It should be obvious that many methodologies used for ahistorical analysis need to change to deal with the reality of the historical MPM as a series of huge, sparse matrices. Fortunately, package lefko3 includes these methodologies.

Now let’s estimate the mean IPM matrices. This code will estimate 1 mean matrix each, because we did not separate patches in the data reorganization and vital rate modeling. As a check, let’s also take a look at the column sums of the grand mean survival-transition matrix from each case.

lath2ipmmean <- lmean(lathmat2ipm)
lath3ipmmean <- lmean(lathmat3ipm)

writeLines("\nAhistorical matrix stage survival distribution: ")
#> 
#> Ahistorical matrix stage survival distribution:
summary(colSums(lath2ipmmean$U[[1]]))
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0000404 0.9765161 0.9987748 0.9451579 0.9998662 0.9999849
writeLines("\nHistorical matrix stage-pair survival distribution: ")
#> 
#> Historical matrix stage-pair survival distribution:
summary(colSums(lath3ipmmean$U[[1]]))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.0000  0.9887  0.9994  0.9603  1.0000  1.0000

All looks fine! Let’s also take a look at a portion of one of the conditional historical matrices, particularly the matrix conditional on vegetative dormancy in time t-1. This matrix can be compared to the ahistorical mean to assess the impacts of history on the matrix elements themselves. Please remember to remove the hashtag.

l3mcond <- cond_hmpm(lath3ipmmean)
#l3mcond$Acond[[1]]$Dorm

Step 5. MPM analysis

Now let’s estimate and plot the deterministic population growth rate, \(\lambda\), for each year.

ipm2lambda <- lambda3(lathmat2ipm)
ipm3lambda <- lambda3(lathmat3ipm)

plot(lambda ~ year2, data = ipm2lambda, xlab = "Year", ylab = "Lambda", 
  ylim = c(0.65, 1.00), type = "l", lwd = 2, bty = "n")
lines(lambda ~ year2, data = ipm3lambda, lwd = 2, lty = 2, col = "red")
legend("bottomleft", c("ahistorical", "historical"), lty = c(1, 2), 
  col = c("black", "red"), lwd = 2, bty = "n")

Figure 6.6. Ahistorical vs. historical lambda

Ahistorical estimates of \(\lambda\) are lower than historical estimates, and the historical \(\lambda\) values are more in line with estimates from the other Lathyrus vignettes.

Let’s now take a peek at \(\lambda\) for the mean matrices and compare with the stochastic growth rate, \(a = \text{log} \lambda _{S}\). We will set the number of simulations low in the historical case in order to keep the amount of memory used and computational time low, because the size of the historical matrices will use up plenty of both. Normally we would want to keep the number of simulations at least at the default level, which is 10,000. We will also set the seed for the random number generator to make our results reproducible.

lambda3(lath2ipmmean)
#>   pop patch   lambda
#> 1   1     1 0.815444
lambda3(lath3ipmmean)
#>   pop patch    lambda
#> 1   1     1 0.8719852
set.seed(42)
slambda3(lathmat2ipm)
#>   pop patch          a        var        sd          se
#> 1   1     1 -0.2112375 0.01050791 0.1025081 0.001025081
slambda3(lathmat3ipm, times = 1000)
#>   pop patch          a         var         sd          se
#> 1   1     1 -0.1501267 0.008062375 0.08979073 0.002839432

The historical growth rate is larger than the ahistorical in both deterministic and stochastic analyses, although all four numbers suggest a declining population.

Now let’s compare the stable stage distribution from both the ahistorical and historical mean MPMs.

ipm2ss <- stablestage3(lath2ipmmean)
ipm3ss <- stablestage3(lath3ipmmean)
ipm2ss_s <- stablestage3(lathmat2ipm, stochastic = TRUE, seed = 42)
ipm3ss_s <- stablestage3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)

ss_put_together <- cbind.data.frame(ipm2ss$ss_prop, ipm3ss$ahist$ss_prop,
  ipm2ss_s$ss_prop, ipm3ss_s$ahist$ss_prop)
names(ss_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(ss_put_together) <- ipm2ss$stage_id

lty.o <- par("lty")
par(lty = 0)
barplot(t(ss_put_together), beside=T, ylab = "Proportion", xlab = "Stage",
  col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.7. Ahistorical vs. historically-corrected stable stage distribution

par(lty = lty.o)

Both ahistorical and historical approaches show the stable stage distribution dominated by the first stage, which is the dormant seed stage. However, the deterministic historic analysis suggests roughly equal weighting to the seedling stage, and increases the importance of larger adult stages. Stochastic analysis shows similar patterns. The importance of the seed bank to the population is quite clear in this analysis!

Next, we will estimate the reproductive values associated with the element-wise mean matrices.

ipm2rv <- repvalue3(lath2ipmmean)
ipm3rv <- repvalue3(lath3ipmmean)
ipm2rv_s <- repvalue3(lathmat2ipm, stochastic = TRUE, seed = 42)
ipm3rv_s <- repvalue3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)

rv_put_together <- cbind.data.frame((ipm2rv$rep_value / max(ipm2rv$rep_value)),
  (ipm3rv$ahist$rep_value / max(ipm3rv$ahist$rep_value)),
  (ipm2rv_s$rep_value / max(ipm2rv_s$rep_value)),
  (ipm3rv_s$ahist$rep_value / max(ipm3rv_s$ahist$rep_value)))
names(rv_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(rv_put_together) <- ipm2rv$stage_id

lty.o <- par("lty")
par(lty = 0)
barplot(t(rv_put_together), beside=T, ylab = "Relative rep value", 
  xlab = "Stage", col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topleft", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.8. Ahistorical vs. historically-corrected reproductive values

par(lty = lty.o)

A quick scan through these values shows that the highest reproductive values in the deterministic ahistorical analysis are for the largest adults. However, historically-corrected deterministic reproductive values drop off once adults get mildly large, with the largest value associated with plants with a leaf volume of 5076 (the maximum is >7000). Stochastic analysis shows a general build-up of reproductive value with increasing size, similarly to the deterministic ahistorical projection. However, the ahistorical stochastic analyses match the deterministic analyses fairly strongly, while the historical stochastic analyses show a larger hump of reproductive value associated with medium sizes and a slower build-up to the largest sizes. So, individual history and temporal environmental stochasticity both have strong impacts here.

Let’s move on. Given the size of these matrices and the difficulty of working with them, we will skip sensitivity analysis here and move on to elasticity analysis. We will also reduce the number of time steps simulated in the stochastic historical analysis to reduce the amount of computer processing time (note that this might take several hours even with the number of steps reduced).

lath2ipmelas <- elasticity3(lath2ipmmean)
lath3ipmelas <- elasticity3(lath3ipmmean)
lath2ipmelas_s <- elasticity3(lathmat2ipm, stochastic = TRUE, seed = 42)
lath3ipmelas_s <- elasticity3(lathmat3ipm, stochastic = TRUE, times = 1000, seed = 42)

writeLines("\nThe highest ahistorical deterministic elasticity is associated with element: ")
#> 
#> The highest ahistorical deterministic elasticity is associated with element:
which(lath2ipmelas$ah_elasmats[[1]] == max(lath2ipmelas$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historically-corrected deterministic elasticity is associated with element: ")
#> 
#> The highest historically-corrected deterministic elasticity is associated with element:
which(lath3ipmelas$ah_elasmats[[1]] == max(lath3ipmelas$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest ahistorical stochastic elasticity is associated with element: ")
#> 
#> The highest ahistorical stochastic elasticity is associated with element:
which(lath2ipmelas_s$ah_elasmats[[1]] == max(lath2ipmelas_s$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historically-corrected stochastic elasticity is associated with element: ")
#> 
#> The highest historically-corrected stochastic elasticity is associated with element:
which(lath3ipmelas_s$ah_elasmats[[1]] == max(lath3ipmelas_s$ah_elasmats[[1]]))
#> [1] 209
writeLines("\nThe highest historical deterministic elasticity is associated with element: ")
#> 
#> The highest historical deterministic elasticity is associated with element:
which(lath3ipmelas$h_elasmats[[1]] == max(lath3ipmelas$h_elasmats[[1]]))
#> [1] 2206881
writeLines("\nThe highest historical stochastic elasticity is associated with element: ")
#> 
#> The highest historical stochastic elasticity is associated with element:
which(lath3ipmelas_s$h_elasmats[[1]] == max(lath3ipmelas_s$h_elasmats[[1]]))
#> [1] 2206881

All analyses agree that \(\lambda\) is most elastic in response to stasis in vegetative dormancy (element 209 is the element in the third column, 209 / 103 = 2.029, and the third row, 209 - (103 * 2) = 3). The historical analyses both agree that population growth rate is most elastic to element 2,206,881, which is in the 209th column and the 209th row and corresponds to the transition from vegetative dormancy in times t-1 and t to vegetative dormancy in time t+1. Now let’s plot the elasticity of \(\lambda\) to transitions from both perspectives.

elas_put_together <- cbind.data.frame(colSums(lath2ipmelas$ah_elasmats[[1]]), 
  colSums(lath3ipmelas$ah_elasmats[[1]]), colSums(lath2ipmelas_s$ah_elasmats[[1]]), 
  colSums(lath3ipmelas_s$ah_elasmats[[1]]))
names(elas_put_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_put_together) <- lath2ipmelas$ah_stages$stage_id

lty.o <- par("lty")
par(lty = 0)
barplot(t(elas_put_together), beside=T, ylab = "Elasticity", xlab = "Stage",
  col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.9. Ahistorical vs. historically-corrected elasticity of lambda to stage

par(lty = lty.o)

The plot of these distributions shows the strong importance of vegetative dormancy, which is the tallest bar in both plots (the first bars, corresponding to dormant seeds and seedlings, have elasticity at nearly 0). However, the distribution of elasticity values in adult stages is shifted to the right in the historically-corrected IPM. Thus, while the ahistorical IPMs show the second highest elasticity of \(\lambda\) to be associated with plants with a leaf volume of 746, historically-corrected analyses suggest that the second highest elasticity is associated with plants with a leaf volume of 960.

Now let’s take a look at the summed elasticities of different kinds of transitions, beginning with a comparison of ahistorical to historically-corrected transitions.

lath2elas_sums <- summary(lath2ipmelas)
lath3elas_sums <- summary(lath3ipmelas)
lath2elas_sums_s <- summary(lath2ipmelas_s)
lath3elas_sums_s <- summary(lath3ipmelas_s)

elas_sums_together <- cbind.data.frame(lath2elas_sums$ahist[,2],
  lath3elas_sums$ahist[,2], lath2elas_sums_s$ahist[,2], lath3elas_sums_s$ahist[,2])
names(elas_sums_together) <- c("det ahist", "det hist", "sto ahist", "sto hist")
rownames(elas_sums_together) <- lath2elas_sums$ahist$category

barplot(t(elas_sums_together), beside=T, ylab = "Elasticity", xlab = "Transition",
  col = c("black", "orangered", "grey", "darkred"), bty = "n")
legend("topright", c("det ahist", "det hist", "sto ahist", "sto hist"),
  col = c("black", "orangered", "grey", "darkred"), pch = 15, bty = "n")

Figure 6.10. Ahistorical vs. historically-corrected elasticity of lambda to transitions

We see extremely similar transition elasticities between ahistorical and historical analyses, with growth and shrinkage the most influential on \(\lambda\), while fecundity is by far the least influential.

Further analytical tools are being planned for lefko3, but packages that handle projection matrices can typically handle the individual matrices produced and saved in lefkoMat objects in this package. Differences, obscure results, and errors sometimes arise when packages are not made to handle large and/or sparse matrices - historical matrices are both, and so care must be taken with their analysis.

Acknowledgements

We are grateful to two anonymous reviewers whose scrutiny improved the quality of this vignette. The project resulting in this package and this tutorial was funded by Grant-In-Aid 19H03298 from the Japan Society for the Promotion of Science.

Literature cited

Broek, Jan van den. 1995. “A Score Test for Zero Inflation in a Poisson Distribution.” Biometrics 51 (2): 738–43. https://doi.org/10.2307/2532959.
Burnham, Kenneth P., and David R. Anderson. 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York, New York, USA: Springer-Verlag New York, Inc.
Ehrlen, Johan. 1995. “Demography of the Perennial Herb Lathyrus Vernus. I. Herbivory and Individual Performance.” Journal of Ecology 83 (2): 287–95. https://doi.org/10.2307/2261567.
———. 2000. “The Dynamics of Plant Populations: Does the History of Individuals Matter?” Ecology 81 (6): 1675–84. https://doi.org/10.1890/0012-9658(2000)081[1675:TDOPPD]2.0.CO;2.
———. 2002. “Assessing the Lifetime Consequences of Plant-Animal Interactions for the Perennial Herb Lathyrus Vernus (Fabaceae).” Perspectives in Plant Ecology, Evolution and Systematics 5 (3): 145–63. https://doi.org/10.1078/1433-8319-00031.
Ehrlen, Johan, and Ove Eriksson. 1996. “Seedling Recruitment in the Perennial Herb Lathyrus Vernus.” Flora 191 (4): 377–83. https://doi.org/10.1016/S0367-2530(17)30744-2.
Ehrlen, Johan, and Kari Lehtila. 2002. “How Perennal Are Perennial Plants?” Oikos 98: 308–22. https://doi.org/10.1034/j.1600-0706.2002.980212.x.
Ehrlen, Johan, and Zuzana Munzbergova. 2009. “Timing of Flowering: Opposed Selection on Different Fitness Components and Trait Covariation.” The American Naturalist 173 (6): 819–30. https://doi.org/10.1086/598492.
Ehrlen, Johan, and Jan Van Groenendael. 2001. “Storage and the Delayed Costs of Reproduction in the Understorey Perennial Lathyrus Vernus.” Journal of Ecology 89 (2): 237–46. https://doi.org/10.1046/j.1365-2745.2001.00546.x.
Ellner, Stephen P., and Mark Rees. 2006. “Integral Projection Models for Species with Complex Demography.” American Naturalist 167 (3): 410–28. https://doi.org/10.1086/499438.