Here is an example of how to use BPER functions to interpret foraminiferal data.
First let’s install BPER from GitHub. You will need to load the devtools package to do this.
Now that the BPER package has been installed and
loaded, load proxy data. Proxy data can be loaded into the model in a
variety of formats. For example, you can directly load a data frame R
object, or provide a path from the working directory to a .csv or .xlsx
file. The function load_foram
should be the first function
used and will load and check the proxy data.
The input data should be formatted with 8 columns and n row(s), with columns ordered as: age, d11B, d11B2s, d11Bspec, Mg/Ca, Mg/Ca2s, d18O, d18O2s. The column order matters. Where ‘…2s’ indicates 2 sigma uncertainties.
You must indicate which modern species’ d11B vital effects are to be
applied for each row of d11B data - this goes in the d11Bspec column.
Use abbreviations including: Grub
(G. ruber),
Tsac
(T. sacculifer), Ouni
(O.
universa), custom
(specify values below using
priors_foram_adj
function), or borate
.
Here, we’ll use low resolution ODP Site 1209 PETM data as an example
and load it as a .csv. This .csv is stored as a system file in the
package. For most applications, users should simply provide the file
path from the directory to the .csv or .xlsx file in quotes for the
foram_data
argument in the load_foram
function. Alternatively, users can load a data frame directly. Let’s
take a look first at the raw data we will be loading.
## age d11B d11B2s d11Bspec Mg.Ca Mg.Ca2s d18O d18O2s
## 1 55956 14.61 0.23 "Tsac" 3.26 0.10 NA NA
## 2 55956 15.46 0.22 "Grub" NA NA NA NA
## 3 55954 NA NA 3.33 0.10 NA NA
## 4 55932 14.49 0.34 "Grub" 5.04 0.15 -2.14 0.16
## 5 55926 14.11 0.33 "Tsac" NA NA NA NA
## 6 55924 NA NA 4.83 0.14 -1.98 0.16
## 7 55901 14.46 0.21 "Grub" NA NA -2.19 0.16
## 8 55899 13.94 0.29 "Tsac" NA NA -1.99 0.16
Blanks or NAs are okay in your data - you do not need to have every
measurement for each sample. You can have repeat sample ages for
replicates etc., and you may provide d11B data from multiple species.
Just indicate which modern species’, or custom
, calibration
you’d like to use for each row in d11Bspec.
Now let’s run the load_foram
function on the input data
and save the output as an object called lf_out
(this will
be a list) in the global environment.
lf_out <- load_foram(foram_data = system.file("extdata", "PETM_VignetteData.csv", package = "BPER"))
In the MCMC inversion, a proposed time series model of the suite of environmental variables is evaluated against real proxy measurements to determine the likelihood of the time series model. Thus, we must link our measurements to the appropriate time series model time step by age indexing our data. In essence, this will tell the proxy model which proxy data should be used to evaluate each time step of the time series model.
To do this, take the returned list from load_foram
,
which we called lf_out
and use it for the
load_proxy
argument in the age_index
function.
Specify: the age_units
used in the input data set
(kyr
or Myr
are the only two options), the
time step type (step_type
): regular
,
every
, both
or custom
.
regular
: regularly spaced intervals for time steps.
If ‘regular’ is used, you must specify a value for the
step_int
argument (i.e., step interval; in the same units
indicated for age_units
).
every
: there will be a time step at each age for
which there is data. No further arguments are needed. for every unique
age in the data set.
both
: there will be a time step at each age for
which there is data plus regular spaced time steps.
step_int
must be specified for if both
is
used.
custom
: The user provides a vector of ages for which
they want time steps using the cust_steps
argument in the
function.
Note that this information, and all the documentation on the
functions and arguments within the functions can be easily accessed by
typing ?<function name>
in the command line. For
example, type ?age_index
into the command line. If using R
studio, you should see the function documentation pop up in the help
window.
For our example, let’s use regular
for
step_type
with a 10 kyr step interval. This function
returns a list. Let’s call that list ai_out
.
Many of the parameter values, including those that define priors, can
be revised using parms_foram_adjust
function. There are two
arguments in this function, parms2change
and
change_values
. For parms2change
provide a
vector of character strings for the parameters you would like to adjust
from default values. For change_values
argument, provide a
vector of numeric values which correspond to the parameter names in
parms2change
.
The user-changeable parameters are listed in the help documentation
of this function (type ?parms_foram_adj
). For environmental
variables, BPER defaults to distributions typical of
mid- to low- latitudes in the earliest Eocene. The user is
expected to adjust environmental priors to values consistent with their
study interval and locality. You can take a look at all of
the default values for all of the user-accessible parameters by opening
up one of the package system files. To do this, type in the console:
file.edit(system.file("extdata", "parms_in_foram.R", package = "BPER"))
To demonstrate how to adjust these parameters, we will change the parameters which define the prior distribution of the % of secondary calcite for the diagenetic correction on d18O. We will keep the remaining parameter values unchanged since these are consistent with our study interval and location.
Next, load in any adjustments to default parameter and prior values.
You may load in the adjusted parameters using the
parms_foram_adj
argument in the priors_foram
function. This is what we will do for this example, however, you can
leave this argument empty to maintain the default model parameter
values.
Include the age_index
return object (here we called it
ai_out
) in arguments. Specify which 2nd carbonate chemistry
variable to use in CO2 calculations. Type ?priors_foram
into the console for list of options.
We will also need to specify the type of prior to use for the 2nd
carbonate chemistry variable with the cc2ndparm.pt
argument:
You can choose t1
, where the user specifies the
prior distribution which will be sampled at time step 1, and this
distribution is allowed to shift for subsequent time steps following an
autocorrelated time series model.
Or you can choose ts
, where the user supplies a time
series record of their choosing which will be used to define the prior
distribution at each time step. If ts
is selected, the user
must include time series data object for cc2ndparmTS
argument consisting of 3 columns: age, mean value for 2nd carbonate
chemistry variable, and 1sd for 2nd carbonate chemistry variable. This
record should span the data interval, but values do not need to be
included for every time step as the record will be linearly interpolated
to create a time-continuous record.
Here, let’s demonstrate using DIC for the second carbonate chemistry
variable. Let’s indicate ts
for cc2ndparm.pt
,
and supply an input data.frame containing a DIC record which will be
used to define the DIC prior at each time step. Here, we use a carbon
cycle model result, and call it dic_pri
.
Let’s run the priors_foram
function and save the
returned object as pf_out
. Note that
priors_foram
can take several minutes to run as it includes
MCMC inversions to generate modern foraminiferal d11Bf vs. d11B of
borate calibrations.
dic_pri <- data.frame(c(56200, 55934, 55928, 55919, 55900, 55883),
c(2320, 2328, 2405, 2440, 2481, 2500),
c(150, 150, 150, 150, 150, 150))
pf_out <- priors_foram(parms_foram_adj = parms_foram_adj,
age_index = ai_out,
cc2ndparm.vt = 'dic',
cc2ndparm.pt = 'ts',
cc2ndparmTS = dic_pri)
## module glm loaded
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 6
## Unobserved stochastic nodes: 5
## Total graph size: 47
##
## Initializing model
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 10
## Unobserved stochastic nodes: 7
## Total graph size: 67
##
## Initializing model
##
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 38
## Unobserved stochastic nodes: 21
## Total graph size: 191
##
## Initializing model
Now we will run the MCMC inversion by supplying the
priors_foram
return object (pf_out
from
above). Specify a vector of character strings indicating the parameters
you would like saved in the MCMC output (save.parms
argument). Also indicate the number of iterations (n.iter
),
the number of chains (n.chains
; recommendation is to use
from 3 to 9 chains depending on the application), the number of burn-in
iterations (n.burnin
; should be less than
n.iter
) and the amount of thinning (n.thin
;
i.e., saves every nth iteration).
Here, we will use a much lower number for iterations and burn-in than is recommended so that we can more quickly run the function for demonstration purposes. Even with lower iterations, the model can take several minutes to initialize and run. For this particular model, it is recommended to use on the order of hundreds of thousands of iterations, which may take several hours to days depending on the amount of proxy data included and computational capability.
inv_out <- run_inversion(priors = pf_out,
save.parms = c("sal", "tempC", "xca", "xmg", "xso4", "d11Bsw", "d18Osw", "d18Osw.local","pco2", "dic", "pH", "press"),
n.iter = 100,
n.chains = 3,
n.burnin = 30,
n.thin = 1)
## Compiling model graph
## Resolving undeclared variables
## Allocating nodes
## Graph information:
## Observed stochastic nodes: 14
## Unobserved stochastic nodes: 89
## Total graph size: 3052889
##
## Initializing model
## Warning in run_inversion(priors = pf_out, save.parms = c("sal", "tempC", : Some
## parameters have n.eff statistical parameter values less than 50, consider
## running more iterations
## Warning in run_inversion(priors = pf_out, save.parms = c("sal", "tempC", : Some
## parameters have Rhat statistical parameter values greater than 1.03, consider
## running more iterations
A list is returned with MCMC data object jout
as the
first object. As you can see, the model generated warnings indicating
that Rhat
was high and n.eff
was low.
It is recommended that you take a look at the statistical parameters
Rhat
and n.eff
to make sure they are
acceptable and support a robust result. These parameters can be found in
the jout
MCMC data object:
The sub-optimal values for these statistical parameters indicate we should increase the number of iterations to generate a more robust result.
It is recommended that chain convergence is confirmed using trace
plots (i.e., the BPER trace_plot
function). Specify the parameter of interest and the number of randomly
selected trace plots for that parameter (n > 1 for time-dependent
parameters). This parameter must be one of those listed in the
save.parms
argument used above in the
run_inversion
function.
Let’s take a look at 3 randomly selected trace plots for pCO2
## Registered S3 method overwritten by 'mcmcplots':
## method from
## as.mcmc.rjags R2jags
## NULL
Note that the chains (each chain is a different color) in these plots show poor convergence and mixing. Ideally, these chains should overlap often and explore roughly the same parameter space. This likely means that the posterior distributions are not reliable and more iterations and/or burn-in should be used. For now let’s move on and take a look at some of the output plots.
You can use the post_plot
function to generate a time
series plot of the posterior distributions for a time-dependent
parameter. For type
argument, the user can select either
CI
to plot the 95% credible interval, or draws
for random draws of time series posteriors. 500 random draws is default,
but this can be adjusted using the n.draws
argument.
Let’s try out both types of time series plots using our pCO2 posteriors:
## NULL
## NULL
Note that our resulting pCO2 reconstruction is poorly constrained (e.g., the relative change is somewhat strange showing a linear increase that doesn’t closely reflect d11B data, which may or may not be suspect). This poor reconstruction is likely because we ran with low iteration and burn-in values, and there was poor chain convergence for pCO2.
Use the post_plot_ind
function to generate a density
plot for a parameter of interest. For time-dependent parameters, the
time step to be plotted must be specified. You can also turn on/off the
median line in the plot, include the prior distribution, and turn on/off
and relocate the legend. Here is an example using pH as our parameter of
interest:
post_plot_ind(inv_out = inv_out, parm = "pH", tstep_age = 55944, show.median = TRUE, show.legend = TRUE, show.prior = TRUE, leg.pos = "topright")
## Warning in post_plot_ind(inv_out = inv_out, parm = "pH", tstep_age = 55944, :
## 'tstep_age' is not a value in 'ages_prox'. The closest time step to 'tstep_age'
## in 'ages_prox' vector has been used.
## NULL
Note that the tstep_age
we specified above resulted in a
warning, but the function still runs. This is because the
tstep_age
value we included does not correspond to any of
the time step ages used in the inversion. By default, the
post_plot_ind
function uses the closest time step age to
the specified tstep_age
. The ages used for each inversion
time step are listed in the ages_prox
vector. This vector
can be accessed from the inv_out
list:
## [1] 55951 55941 55931 55921 55911 55901
It’s also worth noting that the pH distribution for this time step (gray distribution) may fall outside the prior distribution for the parameter (red distribution). This is because the prior pH distribution is sampled at time step 1, and subsequent time steps are allowed to further explore parameter space beyond the prior, within realistic bounds, according to the BPER time series model.
Last, BPER has a built in carbonate chemistry calculator. This calculator follows Zeebe and Wolf-Gladrow (2001) and references therein, with added adjustments to account for major ion concentration on equilibrium chemistry following Zeebe and Tyrrell (2019).
This allows you to compute other carbonate chemistry variables of
interest from the inv_out
object, or a suite of
user-provided vectors. Simply specify which carbonate chemistry variable
(ccparm2
) you would like to use in addition to pH, to
calculate the other carbonate chemistry variable of interest
(ccout
). If using inv_out
argument to specify
carbonate chemistry values, the ccparm2
variable will need
to be one of the variables specified in save.parms
argument
of run_inversion
function.
The current version only uses median variable values from the
Bayesian inversion in inv_out
to compute carbonate
chemistry. Incorporation of uncertainty (i.e., Monte Carlo sampling of
posteriors) is in the works.
## [1] 0.002557721 0.002527981 0.002583889 0.002604856 0.002610899 0.002654425
A vector of alkalinity corresponding to the time steps of the
inversion (i.e., inv_out$ages_prox
) is returned.
For questions, comments or to report an issue contact: dustin.t.harper@utah.edu