Loading Proxy Data

Now that the BPER package has been installed and loaded, load proxy data. Proxy data can be loaded into the model in a variety of formats. For example, you can directly load a data frame R object, or provide a path from the working directory to a .csv or .xlsx file. The function load_foram should be the first function used and will load and check the proxy data.

The input data should be formatted with 8 columns and n row(s), with columns ordered as: age, d11B, d11B2s, d11Bspec, Mg/Ca, Mg/Ca2s, d18O, d18O2s. The column order matters. Where ‘…2s’ indicates 2 sigma uncertainties.

You must indicate which modern species’ d11B vital effects are to be applied for each row of d11B data - this goes in the d11Bspec column. Use abbreviations including: Grub (G. ruber), Tsac (T. sacculifer), Ouni (O. universa), custom (specify values below using priors_foram_adj function), or borate.

Here, we’ll use low resolution ODP Site 1209 PETM data as an example and load it as a .csv. This .csv is stored as a system file in the package. For most applications, users should simply provide the file path from the directory to the .csv or .xlsx file in quotes for the foram_data argument in the load_foram function. Alternatively, users can load a data frame directly. Let’s take a look first at the raw data we will be loading.

print(read.csv(system.file("extdata", "PETM_VignetteData.csv", package = "BPER")))

##     age  d11B d11B2s d11Bspec Mg.Ca Mg.Ca2s  d18O d18O2s
## 1 55956 14.61   0.23   "Tsac"  3.26    0.10    NA     NA
## 2 55956 15.46   0.22   "Grub"    NA      NA    NA     NA
## 3 55954    NA     NA           3.33    0.10    NA     NA
## 4 55932 14.49   0.34   "Grub"  5.04    0.15 -2.14   0.16
## 5 55926 14.11   0.33   "Tsac"    NA      NA    NA     NA
## 6 55924    NA     NA           4.83    0.14 -1.98   0.16
## 7 55901 14.46   0.21   "Grub"    NA      NA -2.19   0.16
## 8 55899 13.94   0.29   "Tsac"    NA      NA -1.99   0.16

Blanks or NAs are okay in your data - you do not need to have every measurement for each sample. You can have repeat sample ages for replicates etc., and you may provide d11B data from multiple species. Just indicate which modern species’, or custom, calibration you’d like to use for each row in d11Bspec.

Now let’s run the load_foram function on the input data and save the output as an object called lf_out (this will be a list) in the global environment.

lf_out <- load_foram(foram_data = system.file("extdata", "PETM_VignetteData.csv", package = "BPER"))

Age Indexing Proxy Data

In the MCMC inversion, a proposed time series model of the suite of environmental variables is evaluated against real proxy measurements to determine the likelihood of the time series model. Thus, we must link our measurements to the appropriate time series model time step by age indexing our data. In essence, this will tell the proxy model which proxy data should be used to evaluate each time step of the time series model.

To do this, take the returned list from load_foram, which we called lf_out and use it for the load_proxy argument in the age_index function. Specify: the age_units used in the input data set (kyr or Myr are the only two options), the time step type (step_type): regular, every, both or custom.

Time step options

regular: regularly spaced intervals for time steps. If ‘regular’ is used, you must specify a value for the step_int argument (i.e., step interval; in the same units indicated for age_units).
every: there will be a time step at each age for which there is data. No further arguments are needed. for every unique age in the data set.
both: there will be a time step at each age for which there is data plus regular spaced time steps. step_int must be specified for if both is used.
custom: The user provides a vector of ages for which they want time steps using the cust_steps argument in the function.

Note that this information, and all the documentation on the functions and arguments within the functions can be easily accessed by typing ?<function name> in the command line. For example, type ?age_index into the command line. If using R studio, you should see the function documentation pop up in the help window.

For our example, let’s use regular for step_type with a 10 kyr step interval. This function returns a list. Let’s call that list ai_out.

ai_out <- age_index(load_proxy = lf_out, age_units = 'kyr', step_type = 'regular', step_int = 10)

Adjusting and Loading Prior Distributions

Adjusting parameter values and priors used in the MCMC inversion

Many of the parameter values, including those that define priors, can be revised using parms_foram_adjust function. There are two arguments in this function, parms2change and change_values. For parms2change provide a vector of character strings for the parameters you would like to adjust from default values. For change_values argument, provide a vector of numeric values which correspond to the parameter names in parms2change.

The user-changeable parameters are listed in the help documentation of this function (type ?parms_foram_adj). For environmental variables, BPER defaults to distributions typical of mid- to low- latitudes in the earliest Eocene. The user is expected to adjust environmental priors to values consistent with their study interval and locality. You can take a look at all of the default values for all of the user-accessible parameters by opening up one of the package system files. To do this, type in the console:

file.edit(system.file("extdata", "parms_in_foram.R", package = "BPER"))

To demonstrate how to adjust these parameters, we will change the parameters which define the prior distribution of the % of secondary calcite for the diagenetic correction on d18O. We will keep the remaining parameter values unchanged since these are consistent with our study interval and location.

parm_names <- c("seccal", "seccal.sd")
parm_values <- c(60, 5)

parms_foram_adj <- parms_foram_adj(parms2change = parm_names, change_values = parm_values)

Loading parameter values and priors used in the MCMC inversion

Next, load in any adjustments to default parameter and prior values. You may load in the adjusted parameters using the parms_foram_adj argument in the priors_foram function. This is what we will do for this example, however, you can leave this argument empty to maintain the default model parameter values.

Include the age_index return object (here we called it ai_out) in arguments. Specify which 2nd carbonate chemistry variable to use in CO2 calculations. Type ?priors_foram into the console for list of options.

We will also need to specify the type of prior to use for the 2nd carbonate chemistry variable with the cc2ndparm.pt argument:

You can choose t1, where the user specifies the prior distribution which will be sampled at time step 1, and this distribution is allowed to shift for subsequent time steps following an autocorrelated time series model.
Or you can choose ts, where the user supplies a time series record of their choosing which will be used to define the prior distribution at each time step. If ts is selected, the user must include time series data object for cc2ndparmTS argument consisting of 3 columns: age, mean value for 2nd carbonate chemistry variable, and 1sd for 2nd carbonate chemistry variable. This record should span the data interval, but values do not need to be included for every time step as the record will be linearly interpolated to create a time-continuous record.

Here, let’s demonstrate using DIC for the second carbonate chemistry variable. Let’s indicate ts for cc2ndparm.pt, and supply an input data.frame containing a DIC record which will be used to define the DIC prior at each time step. Here, we use a carbon cycle model result, and call it dic_pri.

Let’s run the priors_foram function and save the returned object as pf_out. Note that priors_foram can take several minutes to run as it includes MCMC inversions to generate modern foraminiferal d11Bf vs. d11B of borate calibrations.

dic_pri <- data.frame(c(56200, 55934, 55928, 55919, 55900, 55883), 
                      c(2320, 2328, 2405, 2440, 2481, 2500), 
                      c(150, 150, 150, 150, 150, 150)) 

pf_out <- priors_foram(parms_foram_adj = parms_foram_adj, 
                       age_index = ai_out, 
                       cc2ndparm.vt = 'dic', 
                       cc2ndparm.pt = 'ts', 
                       cc2ndparmTS = dic_pri)

## module glm loaded

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 6
##    Unobserved stochastic nodes: 5
##    Total graph size: 47
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 10
##    Unobserved stochastic nodes: 7
##    Total graph size: 67
## 
## Initializing model
## 
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 38
##    Unobserved stochastic nodes: 21
##    Total graph size: 191
## 
## Initializing model

Running the MCMC Inversion

Now we will run the MCMC inversion by supplying the priors_foram return object (pf_out from above). Specify a vector of character strings indicating the parameters you would like saved in the MCMC output (save.parms argument). Also indicate the number of iterations (n.iter), the number of chains (n.chains; recommendation is to use from 3 to 9 chains depending on the application), the number of burn-in iterations (n.burnin; should be less than n.iter) and the amount of thinning (n.thin; i.e., saves every nth iteration).

Here, we will use a much lower number for iterations and burn-in than is recommended so that we can more quickly run the function for demonstration purposes. Even with lower iterations, the model can take several minutes to initialize and run. For this particular model, it is recommended to use on the order of hundreds of thousands of iterations, which may take several hours to days depending on the amount of proxy data included and computational capability.

inv_out <- run_inversion(priors = pf_out, 
                         save.parms = c("sal", "tempC", "xca", "xmg", "xso4", "d11Bsw", "d18Osw", "d18Osw.local","pco2", "dic", "pH", "press"), 
                         n.iter = 100, 
                         n.chains = 3, 
                         n.burnin = 30, 
                         n.thin = 1)

## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 14
##    Unobserved stochastic nodes: 89
##    Total graph size: 3052889
## 
## Initializing model

## Warning in run_inversion(priors = pf_out, save.parms = c("sal", "tempC", : Some
## parameters have n.eff statistical parameter values less than 50, consider
## running more iterations

## Warning in run_inversion(priors = pf_out, save.parms = c("sal", "tempC", : Some
## parameters have Rhat statistical parameter values greater than 1.03, consider
## running more iterations

A list is returned with MCMC data object jout as the first object. As you can see, the model generated warnings indicating that Rhat was high and n.eff was low.

Checking the Results

It is recommended that you take a look at the statistical parameters Rhat and n.eff to make sure they are acceptable and support a robust result. These parameters can be found in the jout MCMC data object:

inv_out$jout$BUGSoutput$summary

The sub-optimal values for these statistical parameters indicate we should increase the number of iterations to generate a more robust result.

It is recommended that chain convergence is confirmed using trace plots (i.e., the BPER trace_plot function). Specify the parameter of interest and the number of randomly selected trace plots for that parameter (n > 1 for time-dependent parameters). This parameter must be one of those listed in the save.parms argument used above in the run_inversion function.

Evaluating trace plots for chain convergence

Let’s take a look at 3 randomly selected trace plots for pCO2

trace_plot(inv_out = inv_out, parm = "pco2", n = 3)

## Registered S3 method overwritten by 'mcmcplots':
##   method        from  
##   as.mcmc.rjags R2jags

## NULL

Note that the chains (each chain is a different color) in these plots show poor convergence and mixing. Ideally, these chains should overlap often and explore roughly the same parameter space. This likely means that the posterior distributions are not reliable and more iterations and/or burn-in should be used. For now let’s move on and take a look at some of the output plots.

Plotting Results

You can use the post_plot function to generate a time series plot of the posterior distributions for a time-dependent parameter. For type argument, the user can select either CI to plot the 95% credible interval, or draws for random draws of time series posteriors. 500 random draws is default, but this can be adjusted using the n.draws argument.

Let’s try out both types of time series plots using our pCO2 posteriors:

post_plot(inv_out = inv_out, parm = "pco2", type = "CI")

## NULL

post_plot(inv_out = inv_out, parm = "pco2", type = "draws", n.draws = 800)

## NULL

Note that our resulting pCO2 reconstruction is poorly constrained (e.g., the relative change is somewhat strange showing a linear increase that doesn’t closely reflect d11B data, which may or may not be suspect). This poor reconstruction is likely because we ran with low iteration and burn-in values, and there was poor chain convergence for pCO2.

Use the post_plot_ind function to generate a density plot for a parameter of interest. For time-dependent parameters, the time step to be plotted must be specified. You can also turn on/off the median line in the plot, include the prior distribution, and turn on/off and relocate the legend. Here is an example using pH as our parameter of interest:

post_plot_ind(inv_out = inv_out, parm = "pH", tstep_age = 55944, show.median = TRUE, show.legend = TRUE, show.prior = TRUE, leg.pos = "topright")

## Warning in post_plot_ind(inv_out = inv_out, parm = "pH", tstep_age = 55944, :
## 'tstep_age' is not a value in 'ages_prox'. The closest time step to 'tstep_age'
## in 'ages_prox' vector has been used.

## NULL

Note that the tstep_age we specified above resulted in a warning, but the function still runs. This is because the tstep_age value we included does not correspond to any of the time step ages used in the inversion. By default, the post_plot_ind function uses the closest time step age to the specified tstep_age. The ages used for each inversion time step are listed in the ages_prox vector. This vector can be accessed from the inv_out list:

inv_out$ages_prox

## [1] 55951 55941 55931 55921 55911 55901

It’s also worth noting that the pH distribution for this time step (gray distribution) may fall outside the prior distribution for the parameter (red distribution). This is because the prior pH distribution is sampled at time step 1, and subsequent time steps are allowed to further explore parameter space beyond the prior, within realistic bounds, according to the BPER time series model.

Carbonate Chemistry Calculations

Last, BPER has a built in carbonate chemistry calculator. This calculator follows Zeebe and Wolf-Gladrow (2001) and references therein, with added adjustments to account for major ion concentration on equilibrium chemistry following Zeebe and Tyrrell (2019).

This allows you to compute other carbonate chemistry variables of interest from the inv_out object, or a suite of user-provided vectors. Simply specify which carbonate chemistry variable (ccparm2) you would like to use in addition to pH, to calculate the other carbonate chemistry variable of interest (ccout). If using inv_out argument to specify carbonate chemistry values, the ccparm2 variable will need to be one of the variables specified in save.parms argument of run_inversion function.

The current version only uses median variable values from the Bayesian inversion in inv_out to compute carbonate chemistry. Incorporation of uncertainty (i.e., Monte Carlo sampling of posteriors) is in the works.

carbchem(ccparm2 = "pco2", ccout = "alk", inv_out = inv_out)

## [1] 0.002557721 0.002527981 0.002583889 0.002604856 0.002610899 0.002654425

A vector of alkalinity corresponding to the time steps of the inversion (i.e., inv_out$ages_prox) is returned.

Foram Data Example

Dustin Harper and Gabriel Bowen

June 04, 2024

Setup and Installation