We will introduce the basic functionality of **assignR**
using data bundled with the package. We’ll review how to access data for
known-origin biological samples and environmental models, use these to
fit and apply functions estimating the probability of sample origin
across a study region, and summarize these results to answer research
and conservation questions. We’ll also demonstrate a quality analysis
tool useful in study design, method comparison, and uncertainty
analysis.

Let’s load **assignR** and two other packages we’ll
need.

```
library(assignR)
library(raster)
library(sp)
```

Now add some data from the package to your local environment. Load and plot the North America boundary mask.

```
data("naMap")
plot(naMap)
```

Let’s do the same for a growing season precipitation H isoscape for
North America. Notice this is a RasterBrick with two layers, the mean
prediction and a standard deviation of the prediction. The layers are
from waterisotopes.org,
and their resolution has been reduced to speed up processing in these
examples. Full-resolution isoscapes of several different types can be
downloaded using the `getIsoscapes`

function (refer to the
help page for details).

```
data("d2h_lrNA")
plot(d2h_lrNA)
```

The package includes a database of H and O isotope data for known
origin samples (`knownOrig.rda`

), which consists of three
features (`sites`

, `samples`

, and
`sources`

). Let’s load it and have a look. First we’ll get
the names of the data fields available in the tables.

```
data("knownOrig")
names(knownOrig$sites)
```

```
## [1] "Site_ID" "Site_name" "State" "Country"
## [5] "Site_comments"
```

`names(knownOrig$samples)`

```
## [1] "Sample_ID" "Sample_ID_orig" "Site_ID" "Dataset_ID"
## [5] "Taxon" "Group" "Source_quality" "Age_class"
## [9] "Material_type" "Matrix" "d2H" "d2H.sd"
## [13] "d18O" "d18O.sd" "Sample_comments"
```

`names(knownOrig$sources)`

```
## [1] "Dataset_ID" "Dataset_name"
## [3] "Citation" "Sampling_method"
## [5] "Sample_powdered" "Lipid_extraction"
## [7] "Lipid_extraction_method" "Exchange"
## [9] "Exchange_method" "Exchange_T"
## [11] "H_cal" "O_cal"
## [13] "Std_powdered" "Drying"
## [15] "Analysis_method" "Analysis_type"
## [17] "Source_comments"
```

The `sites`

feature is a spatial object that records the
geographic location of all sites from which samples are available.

```
plot(assignR:::wrld_simpl)
points(knownOrig$sites, col = "red")
```

Now lets look at a list of species names available.

`unique(knownOrig$samples$Taxon)`

```
## [1] "Danaus plexippus" "Setophaga ruticilla"
## [3] "Turdus migratorius" "Setophaga coronata auduboni"
## [5] "Poecile atricapillus" "Thryomanes bewickii"
## [7] "Thryothorus ludovicianus" "Spizella passerina"
## [9] "Geothlypis trichas" "Setophaga pensylvanica"
## [11] "Baeolophus bicolor" "Vermivora chrysoptera"
## [13] "Catharus guttatus" "Setophaga citrina"
## [15] "Geothlypis formosa" "Geothlypis tolmiei"
## [17] "Oreothlypis ruficapilla" "Cardinalis cardinalis"
## [19] "Oreothlypis celata" "Junco hyemalis oregonus"
## [21] "Seiurus aurocapilla" "Vireo olivaceus"
## [23] "Melospiza melodia" "Catharus ustulatus"
## [25] "Catharus fuscescens" "Vireo griseus"
## [27] "Cardellina pusilla" "Hylocichla mustelina"
## [29] "Icteria virens" "Setophaga petechia"
## [31] "Melozone aberti" "Vermivora cyanoptera"
## [33] "Passer domesticus" "Aimophila ruficeps"
## [35] "Poecile carolinensis" "Troglodytes aedon"
## [37] "Dumetella carolinensis" "Mniotilta varia"
## [39] "Lanius ludovicianus" "Anthus spragueii"
## [41] "Euphagus carolinus" "Empidonax minimus"
## [43] "Oreothlypis peregrina" "Aythya affinis"
## [45] "Cyanistes caeruleus" "Phasianus colchicus"
## [47] "Lagopus lagopus" "Tetrao tetrix"
## [49] "Dryocopus maritus" "Serin serin"
## [51] "Vanellus vanellus" "Corvus corone"
## [53] "Turdus merula" "Corvus monedula"
## [55] "Columba palumbus" "Turtle Dove"
## [57] "Tetrastes bonasia" "Perdix perdix"
## [59] "Anas platyrhyncos" "Branta canadensis"
## [61] "Columba livia" "Numenius arguata"
## [63] "Turdus pilaris" "Turdus iliacus"
## [65] "Turdus philomelos" "Fringilla coelebs"
## [67] "Buteo lagopus" "Accipiter striatus"
## [69] "Falco sparverius" "Accipiter gentillis"
## [71] "Accipiter cooperii" "Buteo jamaicensis"
## [73] "Buteo platypterus" "Buteo swainsoni"
## [75] "Circus cyaneus" "Falco columbarius"
## [77] "Falco mexicanus" "Falco perigrinus"
## [79] "Wilsonia citrina" "Oporornis tolmiei"
## [81] "Wilsonia pusilla" "Homo sapiens"
## [83] "Charadrius montanus" "Anas platyrhynchos"
## [85] "Locustella luscinioides" "Acrocephalus arundinaceus"
## [87] "Acrocephalus scirpaceus" "Pipilo maculatus"
```

Load H isotope data for North American Loggerhead Shrike from the package database.

`= subOrigData(taxon = "Lanius ludovicianus", mask = naMap) Ll_d `

`## mask was reprojected`

`## 524 samples are found from 427 sites`

`## 524 samples from 427 sites in the transformed dataset`

By default, the `subOrigData`

function transforms all data
to a common reference scale (defined by the standard materials and
assigned, calibrated values for those; by default VSMOW-SLAP) using data
from co-analysis of different laboratory standards (see Magozzi et al.,
2021). The calibrations used are documented in the function’s return
object.

`$chains Ll_d`

```
## [[1]]
## [1] "OldEC.1_H_1" "EC_H_7" "EC_H_9" "VSMOW_H"
```

Information on these calibrations is contained in the
`stds.rda`

data file.

Transformation is important when blending data from different labs or papers because different reference scales have been used to calibrate published data and these calibrations are not always comparable. In this case all the data come from one paper:

`$sources[,1:3] Ll_d`

```
## Dataset_ID Dataset_name
## 2 2 Hobson et al. 2012 Plos
## Citation
## 2 Hobson KA, Van Wilgenburg SL, Wassenaar LI, Larson K. 2012. Linking hydrogen (d2H) isotopes in feathers and precipitation: sources of variance and consequences for assignment to isoscapes. Plos One 7:e35137
```

If we didn’t want to transform the data, and instead wished to use
the reference scale from the original publication, we can specify that
in our call to `subOrigData`

. Keep in mind that any
subsequent analyses using these data will be based on this calibration
scale: for example, if you wish to assign samples of unknown origin, the
values for those samples should be reported on the same scale.

`= subOrigData(taxon = "Lanius ludovicianus", mask = naMap, ref_scale = NULL) Ll_d `

`## mask was reprojected`

`## 524 samples are found from 427 sites`

`$sources$H_cal Ll_d`

`## [1] "OldEC.1_H_1"`

For a real application you would want to explore the database to find measurements that are appropriate to your study system (same or similar taxon, geographic region, measurement approach, etc.) or collect and import known-origin data that are specific to your system.

We need to start by assessing how the environmental (precipitation)
isoscape values correlate with the sample values. `calRaster`

fits a linear model relating the precipitation isoscape values to sample
values, and applies it to produce a calibrated, sample-type specific
isoscape.

`= calRaster(known = Ll_d, isoscape = d2h_lrNA, mask = naMap) d2h_Ll `

`## known was reprojected`

```
##
##
## ---------------------------------------
## ------------------------------------------
## rescale function uses linear regression model,
## the summary of this model is:
## -------------------------------------------
## --------------------------------------
##
## Call:
## lm(formula = tissue.iso ~ isoscape.iso[, 1], weights = tissue.iso.wt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -148.787 -16.584 0.353 19.239 107.727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.92452 1.77787 0.52 0.603
## isoscape.iso[, 1] 1.11204 0.03966 28.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 32.4 on 522 degrees of freedom
## Multiple R-squared: 0.601, Adjusted R-squared: 0.6002
## F-statistic: 786.2 on 1 and 522 DF, p-value: < 2.2e-16
```

Let’s create some hypothetical samples to use in demonstrating how we can evaluate the probability that the samples originated from different parts of the isoscape. The isotope values are drawn from a random distribution with a standard deviation of 8 per mil, which is a pretty reasonable variance for conspecific residents at a single location. We’ll also add made-up values for the analytical uncertainty for each sample and a column recording the calibration scale used for our measurements. If you had real measured data for your study samples you would load them here, instead.

```
= letters[1:5]
id set.seed(123)
= rnorm(5, -110, 8)
d2H = runif(5, 1.5, 2.5)
d2H.sd = rep("UT_H_1", 5)
d2H_cal = data.frame(id, d2H, d2H.sd, d2H_cal)
Ll_un print(Ll_un)
```

```
## id d2H d2H.sd d2H_cal
## 1 a -114.48381 2.456833 UT_H_1
## 2 b -111.84142 1.953334 UT_H_1
## 3 c -97.53033 2.177571 UT_H_1
## 4 d -109.43593 2.072633 UT_H_1
## 5 e -108.96570 1.602925 UT_H_1
```

As discussed above, one issue that must be considered with any
organic H or O isotope data is the reference scale used by the
laboratory producing the data. The reference scale for your unknown
samples should be the same as that for the known origin data used in
calRaster. Remember that the scale for our known origin data
`d`

is *OldEC.1_H_1*. Let’s assume that our fake data
were normalized to the *UT_H_1* scale. The `refTrans`

function allows us to convert between the two.

```
= refTrans(Ll_un, ref_scale = "OldEC.1_H_1")
Ll_un print(Ll_un)
```

```
## $data
## id d2H d2H.sd d2H_cal
## 1 a -127.8800 3.374984 OldEC.1_H_1
## 2 b -125.0795 2.970657 OldEC.1_H_1
## 3 c -109.6746 2.808334 OldEC.1_H_1
## 4 d -122.5874 3.016841 OldEC.1_H_1
## 5 e -121.7123 2.594943 OldEC.1_H_1
##
## $chains
## $chains[[1]]
## [1] "UT_H_1" "UT_H_2" "US_H_5" "US_H_6" "VSMOW_H"
## [6] "EC_H_9" "EC_H_7" "OldEC.1_H_1"
##
##
## attr(,"class")
## [1] "refTrans"
```

Notice that both the d2H values and the uncertainties have been updated to reflect the scale transformation.

Now we will produce posterior probability density maps for the unknown samples. For reference on the Bayesian inversion method see Wunder, 2010

`= pdRaster(d2h_Ll, unknown = Ll_un) Ll_prob `

Cell values in these maps are small because each cell’s value
represents the probability that this one cell, out of all of them on the
map, is the actual origin of the sample. Together, all cell values on
the map sum to ‘1’, reflecting the assumption that the sample originated
*somewhere* in the study area. Let’s check this for sample
‘a’.

`cellStats(Ll_prob[[1]], 'sum')`

`## [1] 1`

Check out the help page for `pdRaster`

for additional
options, including the use of informative prior probabilities.

We can also use multiple isoscapes to (potentially) add power to our
analyses. We will start by calibrating a H isoscape for the monarch
butterfly, *Danaus plexippus*.

`= subOrigData(taxon = "Danaus plexippus") Dp_d `

`## 208 samples are found from 32 sites`

`## 150 samples from 31 sites in the transformed dataset`

`= calRaster(Dp_d, d2h_lrNA) d2h_Dp `

`## known was reprojected`

```
##
##
## ---------------------------------------
## ------------------------------------------
## rescale function uses linear regression model,
## the summary of this model is:
## -------------------------------------------
## --------------------------------------
##
## Call:
## lm(formula = tissue.iso ~ isoscape.iso[, 1], weights = tissue.iso.wt)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -11.1010 -3.4064 0.3067 3.0780 8.7990
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -51.86981 1.98424 -26.14 <2e-16 ***
## isoscape.iso[, 1] 0.75543 0.04154 18.19 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.469 on 148 degrees of freedom
## Multiple R-squared: 0.6908, Adjusted R-squared: 0.6888
## F-statistic: 330.7 on 1 and 148 DF, p-value: < 2.2e-16
```

Our second isoscape represents ^{87}Sr/^{86}Sr values
across our study region, the state of Michigan. It was published by Bataille and Bowen,
2012, obtained from waterisotopes.org,
cropped and aggregated to coarser resolution, and a rough estimate of
uncertainty added.

In this case, we do not have any known-origin tissue samples to work with. However, our isoscape was developed to approximate the bioavailable Sr pool, and Sr isotopes are not strongly fractionated in food webs. Thus, our analysis will assume that the isoscape provides a good representation of the expected Sr values for our study species without calibration.

Let’s look at the Sr isoscape and compare it with our butterfly H isoscape.

```
data("sr_MI")
plot(sr_MI$weathered.mean)
```

`proj4string(sr_MI)`

`## [1] "+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +ellps=sphere +units=m +no_defs"`

`proj4string(d2h_Dp$isoscape.rescale)`

`## [1] "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"`

Notice that the we have two different spatial data objects, one for
Sr and one for d2H, and that they have different extents and
projections. In order to conduct a multi-isotope analysis, we’ll first
combine these into a single object using the `isoStack`

function. In addition to combining the objects, this function resolves
differences in their projection, resolution, and extent. It’s always a
good idea to check that the properties of the isoStack components are
consistent with your expectations.

```
= isoStack(d2h_Dp, sr_MI)
Dp_multi lapply(Dp_multi, proj4string)
```

```
## [[1]]
## [1] "+proj=longlat +ellps=WGS84 +no_defs"
##
## [[2]]
## [1] "+proj=longlat +ellps=WGS84 +no_defs"
```

`lapply(Dp_multi, bbox)`

```
## [[1]]
## min max
## s1 -91.73724 -80.36224
## s2 40.54362 48.93212
##
## [[2]]
## min max
## s1 -91.73724 -80.36224
## s2 40.54362 48.93212
```

Now we’ll generate a couple of hypothetical unknown samples to use in
our analysis. It is important that our isotopic markers appear here in
the same order as in the `isoStack`

object we created
above.

`= data.frame("ID" = c("A", "B"), "d2H" = c(-86, -96), "Sr" = c(0.7089, 0.7375)) Dp_unk `

We are ready to make our probability maps. First let’s see how our posterior probabilities would look if we only used the hydrogen isotope data.

`= pdRaster(Dp_multi[[1]], Dp_unk[,-3]) Dp_pd_Honly `

We see pretty clear distinctions between the two samples, driven by a strong SW-NE gradient in the tissue isoscape H values across the state.

What if we add the Sr information to the analysis? The syntax for
running `pdRaster`

is the same, but now we provide our
isoStack object in place of the single isoscape. The function will use
the spatial covariance of the isoscape values to approximate the error
covariance for the two (or more) markers and return posterior
probabilities based on the multivariate normal probability density
function evaluated at each grid cell.

`= pdRaster(Dp_multi, Dp_unk) Dp_pd_multi `

Note that the addition of Sr data greatly strengthens the geographic
constraints on our hypothetical unknown samples: the difference between
the highest and lowest posterior probabilities is much larger than with
H only, and the pattern of high probabilities reflects the
regionalization characteristic of the Sr isoscape. This is especially
true for sample B, which has a fairly distinctive, high
^{87}Sr/^{86}Sr value.

Many of the functions in **assignR** are designed to
help you analyze and draw inference from the posterior probability
surfaces we’ve created above. For the following examples we’ll return to
our single-isoscape, Loggerhead shrike analysis, but the tools work
identically for multi-isoscape results.

The `oddsRatio`

tool compares the posterior probabilities
for two different locations or regions. This might be useful in
answering real-world questions…for example “is this sample more likely
from France or Spain?”, or “how likely is this hypothesized location
relative to other possibilities?”.

Let’s compare probabilities for two spatial areas - the states of Utah and New Mexico. First we’ll load the SpatialPolygons and plot them.

```
data("states")
= states[states$STATE_ABBR == "UT",]
s1 = states[states$STATE_ABBR == "NM",]
s2 plot(naMap)
plot(s1, col = c("red"), add = TRUE)
plot(s2, col = c("blue"), add = TRUE)
```

Now we can get the odds ratio for the two regions. The result reports the odds ratio for the regions (first relative to second) for each of the 5 unknown samples plus the ratio of the areas of the regions. If the isotope values (& prior) were completely uninformative the odds ratios would equal the ratio of areas.

```
= rbind(s1, s2)
s12 oddsRatio(Ll_prob, s12)
```

```
## $`P1/P2 odds ratio`
## a b c d e
## 915.0020 653.3719 107.4004 485.0700 437.0938
##
## $`Ratio of numbers of cells in two polygons`
## [1] 2
```

Here you can see that even though Utah is quite a bit smaller the isotopic evidence suggests it’s much more likely to be the origin of each sample. This result is consistent with what you might infer from a first-order comparison of the state map with the posterior probability maps, above.

Comparisons can also be made using points. Let’s create two points (one in each of the Plover regions) and compare their odds. This result also shows the odds ratio for each point relative to the most- and least-likely grid cells on the posterior probability map.

```
= c(-112,40)
pp1 = c(-105,33)
pp2 = SpatialPoints(coords = rbind(pp1,pp2))
pp12 proj4string(pp12) = proj4string(naMap)
oddsRatio(Ll_prob, pp12)
```

`## inputP was reprojected`

```
## $`P1/P2 odds ratio`
## a b c d e
## 6217.4161 3952.0075 327.4901 2640.8761 2292.3554
##
## $`Odds relative to the max/min pixel`
## ratioToMax.a ratioToMax.b ratioToMax.c ratioToMax.d ratioToMax.e
## P1 3.010867e-01 3.620427e-01 0.873982278 0.4532542433 0.477526920
## P2 4.877685e-05 9.534945e-05 0.002709692 0.0001647915 0.000207839
## ratioToMin.a ratioToMin.b ratioToMin.c ratioToMin.d ratioToMin.e
## P1 3638241569 6564051 1045523273 1.873758e+09 783944970.4
## P2 201644 156533 32743104 2.987769e+03 248632.6
```

The odds of the first point being the location of origin are pretty high for each sample, and much higher than for the second point.

A common goal in movement research is to characterize the distance or
direction of movement for individuals. The `wDist`

tool and
it’s helper methods are designed to leverage the information in the
posterior probability surfaces for this purpose.

The analyses conducted in **assignR** cannot determine a
single unique location of origin for a given sample, but the do give the
probability that each location on the map is the location of origin. If
we know the collection location for a sample, we can calculate the
distance and direction between each possible location of origin and the
collection site, and weighting these by their posterior probability
generate a distribution (and statistics for that distribution)
describing the distance and direction of travel.

Let’s do a weighted distance analysis for our first two unknown
origin loggerhead shrike samples. Since these are pretend samples, we’ll
pretend that the two point locations we defined above for the
`oddsRatio`

analysis are the locations at which these samples
were collected. Here are those locations plotted with the corresponding
posterior probability maps. Then we’ll use the functions `c`

and `plot`

to view the summary statistics and distributions
returned by `wDist`

.

```
# View the data
plot(Ll_prob[[1]], main = names(Ll_prob)[1])
points(pp12[1])
```

```
plot(Ll_prob[[2]], main = names(Ll_prob)[2])
points(pp12[2])
```

Now let’s run the analysis and use the functions `c`

and
`plot`

to view the summary statistics and distributions
returned by `wDist`

.

```
= wDist(Ll_prob[[1:2]], pp12)
wd c(wd)[c(1,2,4,6,8,10,12,14,16)] #only showing select columns for formatting!
```

```
## Sample_ID wMeanDist w10Dist w50Dist w90Dist wMeanBear w10Bear w50Bear
## 1 a 2812114 1258374 3059893 4095615 -168.4333 114.1813 -159.4810
## 2 b 3447815 1987950 3495115 4950487 178.9947 113.8129 -177.3227
## w90Bear
## 1 -105.4942
## 2 -121.0331
```

`plot(wd)`

`## NULL`

Comparing these statistics and plots with the data shows how the
`wDist`

metrics nicely summarize the direction and distance
of movement. Both individuals almost certainly moved south from their
location of origin to the collection location. Individual a’s migration
may have been a little bit shorter than b’s, and in a more southwesterly
direction, patterns that are dominated more by the difference in
collection locations than the probability surfaces for location of
origin. Also notice the multi-modal distance distribution for individual
a…these can be common in `wDist`

summaries so it’s a good
ideal to look at the distributions themselves before choosing and
interpreting summary statistics.

Researchers often want to classify their study area in to regions
that are and are not likely to be the origin of the sample (effectively
‘assigning’ the sample to a part of the area). This requires choosing a
subjective threshold to define how much of the study domain is
represented in the assignment region. `qtlRaster`

offers two
choices.

Let’s extract 10% of the study area, giving maps that show the 10% of grid cells with the highest posterior probability for each sample.

`qtlRaster(Ll_prob, threshold = 0.1)`

```
## class : RasterStack
## dimensions : 17, 38, 646, 5 (nrow, ncol, ncell, nlayers)
## resolution : 2.999999, 2.999999 (x, y)
## extent : -165, -51.00005, 20.58329, 71.58327 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## names : a, b, c, d, e
## min values : 0, 0, 0, 0, 0
## max values : 1, 1, 1, 1, 1
```

Now we’ll instead extract 80% of the posterior probability density, giving maps that show the smallest region within which there is an 80% chance each sample originated.

`qtlRaster(Ll_prob, threshold = 0.8, thresholdType = "prob")`

```
## class : RasterStack
## dimensions : 17, 38, 646, 5 (nrow, ncol, ncell, nlayers)
## resolution : 2.999999, 2.999999 (x, y)
## extent : -165, -51.00005, 20.58329, 71.58327 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## names : a, b, c, d, e
## min values : 0, 0, 0, 0, 0
## max values : 1, 1, 1, 1, 1
```

Comparing the two results, the probability-based assignment regions are broader. This suggests that we’ll need to assign to more than 10% of the study area if we want to correctly assign 80% or more of our samples. We’ll revisit this below and see how we can chose thresholds that are as specific as possible while achieving a desired level of assignment ‘quality’.

Most studies involve multiple unknown samples, and often it is
desirable to summarize the results from these individuals.
`jointP`

and `unionP`

offer two options for
summarizing posterior probabilities from multiple samples.

`jointP`

calculates the probability that
**all** samples came from each grid cell in the analysis
area. Note that this summarization will only be useful if all samples
are truly derived from a single population of common geographic
origin.

`jointP(Ll_prob)`

```
## class : RasterLayer
## dimensions : 17, 38, 646 (nrow, ncol, ncell)
## resolution : 2.999999, 2.999999 (x, y)
## extent : -165, -51.00005, 20.58329, 71.58327 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## source : memory
## names : Joint_Probability
## values : 1.5326e-47, 0.02101926 (min, max)
```

`unionP`

calculates the probability that
**any** sample came from each grid cell in the analysis
area. In this case we’ll save the output to a variable for later
use.

`= unionP(Ll_prob) Ll_up `

The results from `unionP`

highlight a broader region, as
you might expect.

Any of the other post-hoc analysis tools can be applied to the
summarized results. Here we’ll use `qtlRaster`

to identify
the 10% of the study area that is most likely to be the origin of one or
more samples.

`qtlRaster(Ll_up, threshold = 0.1)`

```
## class : RasterLayer
## dimensions : 17, 38, 646 (nrow, ncol, ncell)
## resolution : 2.999999, 2.999999 (x, y)
## extent : -165, -51.00005, 20.58329, 71.58327 (xmin, xmax, ymin, ymax)
## crs : +proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs
## source : memory
## names : layer
## values : 0, 1 (min, max)
```

How good are the geographic assignments? What area or probability
threshold should be used? Is it better to use isoscape *A* or
*B* for my analysis? The `QA`

function is designed to
help answer these questions.

`QA`

uses known-origin data to test the quality of
isotope-based assignments and returns a set of metrics from this test.
The default method conducts a split-sample test, iteratively splitting
the dataset and using part to calibrate the isoscape(s) and the rest to
evaluate assignment quality. The option `recal = FALSE`

allows `QA`

to be run without the `calRaster`

calibration step. This provides a less complete assessment of
methodological error but allows evaluation of assignments to tissue
isoscapes made outside of the `QA`

function, for example
those calibrated using a different known-origin dataset or made through
spatial modeling of tissue data, directly.

We will run quality assessment on the Loggerhead shrike known-origin dataset and precipitation isoscape. These analyses take some time to run, depending on the number of stations and iterations used.

`= QA(Ll_d, d2h_lrNA, valiStation = 8, valiTime = 4, by = 5, mask = naMap, name = "normal") qa1 `

`## known was reprojected`

We can plot the result using `plot`

.

`plot(qa1)`

The first three panels show three metrics, granularity (higher is better), bias (closer to 1:1 is better), and sensitivity (higher is better). The second plot shows the posterior probabilities at the known locations of origin relative to random (=1, higher is better). More information is provided in Ma et al., 2020.

A researcher might refer to the sensitivity plot, for example, to
assess what `qtlRaster`

area threshold would be required to
obtain 90% correct assignments in their study system. Here it’s
somewhere between 0.25 and 0.3.

How would using a different isoscape or different known origin dataset affect the analysis? Multiple QA objects can be compared to make these types of assessments.

Let’s modify our isoscape to add some random noise.

```
= getValues(d2h_lrNA[[1]])
dv = dv + rnorm(length(dv), 0, 15)
dv = setValues(d2h_lrNA[[1]], dv)
d2h_fuzzy plot(d2h_fuzzy)
```

We’ll combine the fuzzy isoscape with the uncertainty layer from the
original isoscape, then rerun `QA`

using the new version.
Obviously this is not something you’d do in real work, but as an example
it allows us to ask the question “how would the quality of my
assignments change if my isoscape predictions were of reduced
quality?”.

```
= brick(d2h_fuzzy, d2h_lrNA[[2]])
d2h_fuzzy = QA(Ll_d, d2h_fuzzy, valiStation = 8, valiTime = 4, by = 5, mask = naMap, name = "fuzzy") qa2
```

`## mask was reprojected`

Now we can `plot`

to compare.

`plot(qa1, qa2)`

`## NULL`

Assignments made using the fuzzy isoscape are generally poorer than those made without fuzzing. Hopefully that’s not a surprise, but you might encounter cases where decisions about how to design your project or conduct your data analysis do have previously unknown or unexpected consequences. These types of comparisons can help reveal them!

Questions or comments? gabe.bowen@utah.edu