Using exPrior with multiple types of data

Falk Heße

2019-11-13

The exPrior package is written flexibly, so that it can assimilate data in the form of measurements, bounds, or moments. To illustrate this flexibility, this example uses synthetic data from three sites called S1, S2, and S3. For site S1 we have data in the form of bounds: the minimum value of the property at S1 is 2 and its maximum value is 4. Site S2 has data in the form of moments: the first moment, or site mean, is 2, and the second (central) moment, or site variance, is 0.1. Finally, site S3 has three measurements. The code below shows how to format the data in R such that it can be read into genExPrior().

library(devtools)
## Loading required package: usethis
load_all()
## Warning: 1 components of `...` were not used.
## 
## We detected these problematic arguments:
## * `action`
## 
## Did you misspecify an argument?
## Loading exPrior
## Loading required package: nimble
## nimble version 0.8.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit http://R-nimble.org.
## 
## Attaching package: 'nimble'
## The following object is masked from 'package:stats':
## 
##     simulate
## 
## Attaching package: 'testthat'
## The following object is masked from 'package:devtools':
## 
##     test_file
library(exPrior)

Under the assumption that the site-specific parameter follows a normal distribution, the function genExPrior takes three arguments. First, exdata is a data frame in which the first column (val) contains the data values, the second column (site_id) is an index of the site the data come from, and the third column (type) states the kind of data (here bound.min, bound.max, moment.1, moment.2, or meas). Second, theta is a numerical vector of values at which to evaluate the prior distribution of \(\theta\). Finally, niter is an integer giving the number of MCMC samples used to estimate the unknown \(\mu_i, \sigma^2_i\) at each site i (i = 1, 2, 3 in our case). By default it is set to \(10^5\), a value chosen to be large enough for the MCMC; users are free to choose a different sample size. Putting the data of the three sites into a data frame, we have:

exdata_S1 = data.frame(val=c(2,4), site_id=rep('S1',2), type=c('bound.min','bound.max'))
exdata_S2 = data.frame(val=c(2,0.1), site_id=rep('S2',2), type=c('moment.1','moment.2'))
exdata_S3 = data.frame(val=c(2,3,4), site_id=rep('S3',3), type=c('meas','meas','meas'))
exdata_multitype <- rbind(exdata_S1, exdata_S2, exdata_S3)
exdata_multitype
##   val site_id      type
## 1 2.0      S1 bound.min
## 2 4.0      S1 bound.max
## 3 2.0      S2  moment.1
## 4 0.1      S2  moment.2
## 5 2.0      S3      meas
## 6 3.0      S3      meas
## 7 4.0      S3      meas
theta = seq(from=-10, to=10, by=0.1)
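Before passing such a data frame on, it can help to confirm that every site carries a consistent set of type labels. The following is a minimal sketch in base R, assuming that each site provides exactly one kind of data (as in this example); check_site is a hypothetical helper and not part of exPrior.

```r
# Rebuild the example data frame from the text (same values as above).
exdata_multitype <- rbind(
  data.frame(val = c(2, 4), site_id = "S1", type = c("bound.min", "bound.max")),
  data.frame(val = c(2, 0.1), site_id = "S2", type = c("moment.1", "moment.2")),
  data.frame(val = c(2, 3, 4), site_id = "S3", type = "meas")
)

# Hypothetical helper (not part of exPrior): classify one site by the kind
# of data it provides and check that bounds and moments come as a pair.
check_site <- function(site) {
  types <- as.character(site$type)
  if (all(types == "meas")) {
    return("measurements")
  }
  if (length(types) == 2 && setequal(types, c("bound.min", "bound.max"))) {
    lo <- site$val[types == "bound.min"]
    hi <- site$val[types == "bound.max"]
    stopifnot(lo < hi)  # the lower bound must lie below the upper bound
    return("bounds")
  }
  if (length(types) == 2 && setequal(types, c("moment.1", "moment.2"))) {
    stopifnot(site$val[types == "moment.2"] > 0)  # a variance must be positive
    return("moments")
  }
  stop("unexpected combination of types for site ", site$site_id[1])
}

# Apply the check to each site separately.
site_kinds <- vapply(
  split(exdata_multitype, exdata_multitype$site_id),
  check_site,
  character(1)
)
print(site_kinds)
```

The check stops with an error if, for instance, a site lists a lower bound without an upper bound, or if a variance is not positive.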

Running genExPrior with these arguments, we obtain the prior distribution for \(\theta\) as well as the posteriors of the hyperparameters of our Bayesian hierarchical model.

resExPrior = genExPrior(exdata = exdata_multitype, theta = theta)
## defining model...
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1] conjugate_dnorm_dnorm sampler: alpha
## [2] RW sampler: chSqTau
## [3] RW sampler: sigma
## [4] RW sampler: xiTau_negOrPos
## [5] conjugate_dnorm_dnorm sampler: mu[1]
## [6] conjugate_dnorm_dnorm sampler: mu[2]
## [7] conjugate_dnorm_dnorm sampler: mu[3]
## thin = 1: alpha, chSqTau, sigma, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
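To give an intuition for how a prior for \(\theta\) can follow from posterior draws of hyperparameters, the sketch below averages a normal density over such draws on the theta grid. It rests on assumptions that are not confirmed by the text above: that \(\theta\) is normally distributed given a population mean and standard deviation (named alpha and tau here after the sampler names in the log), and that the prior is the average of these densities. The draws are synthetic stand-ins, not output of genExPrior, and this is not necessarily how exPrior computes its result.

```r
set.seed(1)
theta <- seq(from = -10, to = 10, by = 0.1)

# Synthetic stand-in draws for the hyperparameters (assumption: these are
# NOT taken from genExPrior; they only illustrate the averaging step).
n_draws <- 2000
alpha_draws <- rnorm(n_draws, mean = 3, sd = 0.5)       # population mean
tau_draws <- abs(rnorm(n_draws, mean = 1, sd = 0.2))    # population sd

# Monte Carlo average of the normal density over the hyperparameter draws,
# evaluated at every grid point of theta.
prior_density <- vapply(
  theta,
  function(t) mean(dnorm(t, mean = alpha_draws, sd = tau_draws)),
  numeric(1)
)

# A Riemann sum over the grid should be close to 1 when the grid covers
# essentially all of the probability mass.
grid_mass <- sum(prior_density) * 0.1
print(grid_mass)
```

If the Riemann sum is clearly below 1, the theta grid is too narrow or too coarse for the density being evaluated.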

If the distribution of the parameter is not normal, genExPrior provides an option to transform the data towards normality according to the user's choice. Two types of Johnson transformation, logarithm and log ratio, as well as the Box-Cox transformation are provided. The lower and upper limits of the log ratio and the value of \(\lambda\) for the Box-Cox transformation should be chosen such that the transformed data are approximately normally distributed.
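For reference, the three transformations named above can be written down in a few lines of base R. This is a minimal sketch using the standard textbook forms; the function names and arguments are hypothetical and do not reflect the exPrior interface.

```r
# Hypothetical helpers (not the exPrior interface).

# Logarithmic transformation for strictly positive data.
log_transform <- function(x) {
  stopifnot(all(x > 0))
  log(x)
}

# Log-ratio transformation for data bounded between lower and upper.
log_ratio_transform <- function(x, lower, upper) {
  stopifnot(all(x > lower), all(x < upper))
  log((x - lower) / (upper - x))
}

# Inverse of the log-ratio transformation, mapping back to (lower, upper).
log_ratio_inverse <- function(z, lower, upper) {
  (lower + upper * exp(z)) / (1 + exp(z))
}

# Box-Cox transformation; lambda = 0 is defined as the logarithm.
box_cox_transform <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Example values lying strictly between the bounds 2 and 4.
x <- c(2.5, 3, 3.5)
z <- log_ratio_transform(x, lower = 2, upper = 4)
x_back <- log_ratio_inverse(z, lower = 2, upper = 4)
print(z)
print(x_back)
```

Because the log ratio is only defined strictly inside the limits, the lower and upper limits have to enclose all data values.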

First, let us look at the posteriors of the hyperparameters, which are conditioned on the data in the exdata data frame. To that end, we use the function plotHyperDist with the results from genExPrior as input:

plotHyperDist(resExPrior)

Then, we can visualize both the uninformative and the informative distribution of \(\theta\) using plotExPrior. This function again takes the output of genExPrior as input, together with a Boolean flag, plotExData, that controls whether the data used are plotted as well.

plotExPrior(resExPrior, plotExData = FALSE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.