Setup and Introduction

This vignette demonstrates the latest features from the development version, which is installed via GitHub.

library(devtools)
install_github("SPATIAL-Lab/assignR@*release")

We will introduce the basic functionality of assignR using data bundled with the package. We’ll review how to access data for known-origin biological samples and environmental models, use these to fit and apply functions estimating the probability of sample origin across a study region, and summarize these results to answer research and conservation questions. We’ll also demonstrate a quality analysis tool useful in study design, method comparison, and uncertainty analysis.


Let’s load assignR and another package we’ll need.

library(assignR)
library(terra)

Now use data from the package to plot a simplified North America boundary mask.

plot(naMap)


Let’s do the same for a growing season precipitation H isoscape for North America. Notice this is a spatial raster (SpatRaster) with two layers, the mean prediction and a standard error of the prediction. The layers are from waterisotopes.org, and their resolution has been reduced to speed up processing in these examples. Full-resolution isoscapes of several different types can be downloaded using the getIsoscapes function (refer to the help page for details).

plot(d2h_lrNA)


The package includes a database of H and O isotope data for known origin samples (knownOrig.rda), which consists of three features (sites, samples, and sources). Let’s load it and have a look. First we’ll get the names of the data fields available in the tables.

names(knownOrig$sites)
## [1] "Site_ID"    "Site_name"  "State"      "Country"    "Site_comme"
names(knownOrig$samples)
##  [1] "Sample_ID"       "Sample_ID_orig"  "Site_ID"         "Dataset_ID"      "Taxon"           "Group"          
##  [7] "Source_quality"  "Age_class"       "Material_type"   "Matrix"          "d2H"             "d2H.sd"         
## [13] "d18O"            "d18O.sd"         "Sample_comments"
names(knownOrig$sources)
##  [1] "Dataset_ID"              "Dataset_name"            "Citation"                "Sampling_method"        
##  [5] "Sample_powdered"         "Lipid_extraction"        "Lipid_extraction_method" "Exchange"               
##  [9] "Exchange_method"         "Exchange_T"              "H_cal"                   "O_cal"                  
## [13] "Std_powdered"            "Drying"                  "Analysis_method"         "Analysis_type"          
## [17] "Source_comments"

The sites feature is a spatial object that records the geographic location of all sites from which samples are available.

plot(wrld_simpl)
points(knownOrig$sites, col = "red")

Now lets look at a list of species names available.

unique(knownOrig$samples$Taxon)
##  [1] "Danaus plexippus"            "Setophaga ruticilla"         "Turdus migratorius"          "Setophaga coronata auduboni"
##  [5] "Poecile atricapillus"        "Thryomanes bewickii"         "Thryothorus ludovicianus"    "Spizella passerina"         
##  [9] "Geothlypis trichas"          "Setophaga pensylvanica"      "Baeolophus bicolor"          "Vermivora chrysoptera"      
## [13] "Catharus guttatus"           "Setophaga citrina"           "Geothlypis formosa"          "Geothlypis tolmiei"         
## [17] "Oreothlypis ruficapilla"     "Cardinalis cardinalis"       "Oreothlypis celata"          "Junco hyemalis oregonus"    
## [21] "Seiurus aurocapilla"         "Vireo olivaceus"             "Melospiza melodia"           "Catharus ustulatus"         
## [25] "Catharus fuscescens"         "Vireo griseus"               "Cardellina pusilla"          "Hylocichla mustelina"       
## [29] "Icteria virens"              "Setophaga petechia"          "Melozone aberti"             "Vermivora cyanoptera"       
## [33] "Passer domesticus"           "Aimophila ruficeps"          "Poecile carolinensis"        "Troglodytes aedon"          
## [37] "Dumetella carolinensis"      "Mniotilta varia"             "Lanius ludovicianus"         "Anthus spragueii"           
## [41] "Euphagus carolinus"          "Empidonax minimus"           "Oreothlypis peregrina"       "Aythya affinis"             
## [45] "Cyanistes caeruleus"         "Phasianus colchicus"         "Lagopus lagopus"             "Tetrao tetrix"              
## [49] "Dryocopus maritus"           "Serin serin"                 "Vanellus vanellus"           "Corvus corone"              
## [53] "Turdus merula"               "Corvus monedula"             "Columba palumbus"            "Turtle Dove"                
## [57] "Tetrastes bonasia"           "Perdix perdix"               "Anas platyrhyncos"           "Branta canadensis"          
## [61] "Columba livia"               "Numenius arguata"            "Turdus pilaris"              "Turdus iliacus"             
## [65] "Turdus philomelos"           "Fringilla coelebs"           "Buteo lagopus"               "Accipiter striatus"         
## [69] "Falco sparverius"            "Accipiter gentillis"         "Accipiter cooperii"          "Buteo jamaicensis"          
## [73] "Buteo platypterus"           "Buteo swainsoni"             "Circus cyaneus"              "Falco columbarius"          
## [77] "Falco mexicanus"             "Falco perigrinus"            "Wilsonia citrina"            "Oporornis tolmiei"          
## [81] "Wilsonia pusilla"            "Homo sapiens"                "Charadrius montanus"         "Anas platyrhynchos"         
## [85] "Locustella luscinioides"     "Acrocephalus arundinaceus"   "Acrocephalus scirpaceus"     "Pipilo maculatus"

Load H isotope data for North American Loggerhead Shrike from the package database.

Ll_d = subOrigData(taxon = "Lanius ludovicianus", mask = naMap)
## 524 samples are found from 427 sites
## 524 samples from 427 sites in the transformed dataset

By default, the subOrigData function transforms all data to a common reference scale (defined by the standard materials and assigned, calibrated values for those; by default VSMOW-SLAP) using data from co-analysis of different laboratory standards (see Magozzi et al., 2021). The calibrations used are documented in the function’s return object.

Ll_d$chains
## [[1]]
## [1] "OldEC.1_H_1" "EC_H_7"      "EC_H_9"      "VSMOW_H"

Information on these calibrations is contained in the stds.rda data file.

Transformation is important when blending data from different labs or papers because different reference scales have been used to calibrate published data and these calibrations are not always comparable. In this case all the data come from one paper:

Ll_d$sources[,1:3]
##   Dataset_ID            Dataset_name
## 2          2 Hobson et al. 2012 Plos
##                                                                                                                                                                                                         Citation
## 2 Hobson KA, Van Wilgenburg SL, Wassenaar LI, Larson K. 2012. Linking hydrogen (d2H) isotopes in feathers and precipitation: sources of variance and consequences for assignment to isoscapes. Plos One 7:e35137

If we didn’t want to transform the data, and instead wished to use the reference scale from the original publication, we can specify that in our call to subOrigData. Keep in mind that any subsequent analyses using these data will be based on this calibration scale: for example, if you wish to assign samples of unknown origin, the values for those samples should be reported on the same scale.

Ll_d = subOrigData(taxon = "Lanius ludovicianus", mask = naMap, ref_scale = NULL)
## 524 samples are found from 427 sites

Ll_d$sources$H_cal
## [1] "OldEC.1_H_1"

For a real application you would want to explore the database to find measurements that are appropriate to your study system (same or similar taxon, geographic region, measurement approach, etc.) or collect and import known-origin data that are specific to your system.


Single-isoscape Analysis

We need to start by assessing how the environmental (precipitation) isoscape values correlate with the sample values. calRaster fits a linear model relating the precipitation isoscape values to sample values, and applies it to produce a calibrated, sample-type specific isoscape.

d2h_Ll = calRaster(known = Ll_d, isoscape = d2h_lrNA, mask = naMap)
## 
## 
## ---------------------------------------
##         ------------------------------------------
## rescale function uses linear regression model, 
##         the summary of this model is:
## -------------------------------------------
##         --------------------------------------
## 
## Call:
## lm(formula = tissue.iso ~ isoscape.iso[, 1], weights = tissue.iso.wt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -149.640  -16.474    0.743   19.946  107.625 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.20058    1.81952   1.209    0.227    
## isoscape.iso[, 1]  1.12628    0.04019  28.026   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.41 on 522 degrees of freedom
## Multiple R-squared:  0.6007, Adjusted R-squared:    0.6 
## F-statistic: 785.4 on 1 and 522 DF,  p-value: < 2.2e-16