The exPrior package is written flexibly, so that it can assimilate data in the form of measurements, bounds, or moments. To illustrate this flexibility, this example uses synthetic data from three sites called S1, S2, and S3. For site S1 we have data in the form of bounds: the minimum value of a property at S1 is 2 and its maximum value is 4. Site S2 has data in the form of moments: the first moment, or site mean, is 2, while the second moment, or site variance, is 0.1. Finally, site S3 has three measurements. The code below shows how to format the data in R so that they can be read into genExPrior().
library(devtools)
## Loading required package: usethis
load_all()
## Warning: 1 components of `...` were not used.
##
## We detected these problematic arguments:
## * `action`
##
## Did you misspecify an argument?
## Loading exPrior
## Loading required package: nimble
## nimble version 0.8.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit http://R-nimble.org.
##
## Attaching package: 'nimble'
## The following object is masked from 'package:stats':
##
## simulate
##
## Attaching package: 'testthat'
## The following object is masked from 'package:devtools':
##
## test_file
library(exPrior)
Under the assumption that the site-specific parameter follows a normal distribution, the function genExPrior takes three arguments. First, exdata is a data frame in which the first column (val) contains the data, the second column (site_id) is the index of the site the data come from, and the third column (type) states whether a value is a bound, a moment, or a measurement. Second, theta is a vector of numerical values at which to evaluate the prior distribution. Finally, niter is an integer giving the sample size of the MCMC that is used to evaluate the unknown \(\mu_i, \sigma^2_i\) at each site \(i\) (\(i = 1, 2, 3\) in our case). By default it is set to \(10^5\), which is generally a sufficient sample size for the MCMC. Users are free to choose a different sample size. Putting the data of the three sites into a data frame, we have:
exdata_S1 = data.frame(val=c(2,4), site_id=rep('S1',2), type=c('bound.min','bound.max'))
exdata_S2 = data.frame(val=c(2,0.1), site_id=rep('S2',2), type=c('moment.1','moment.2'))
exdata_S3 = data.frame(val=c(2,3,4), site_id=rep('S3',3), type=c('meas','meas','meas'))
exdata_multitype <- rbind(exdata_S1, exdata_S2, exdata_S3)
exdata_multitype
## val site_id type
## 1 2.0 S1 bound.min
## 2 4.0 S1 bound.max
## 3 2.0 S2 moment.1
## 4 0.1 S2 moment.2
## 5 2.0 S3 meas
## 6 3.0 S3 meas
## 7 4.0 S3 meas
theta = seq(from=-10, to=10, by=0.1)
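Before running the model, it can help to confirm that the assembled data frame and the evaluation grid have the expected shape. The following is a minimal sketch in base R. The helper check_exdata and the vector known_types are hypothetical names introduced here for illustration and are not part of exPrior, and the list of type labels only covers the ones used in this example.

```r
# Rebuild the example data so that this block is self-contained.
exdata_S1 <- data.frame(val = c(2, 4), site_id = rep("S1", 2),
                        type = c("bound.min", "bound.max"))
exdata_S2 <- data.frame(val = c(2, 0.1), site_id = rep("S2", 2),
                        type = c("moment.1", "moment.2"))
exdata_S3 <- data.frame(val = c(2, 3, 4), site_id = rep("S3", 3),
                        type = rep("meas", 3))
exdata_multitype <- rbind(exdata_S1, exdata_S2, exdata_S3)
theta <- seq(from = -10, to = 10, by = 0.1)

# Assumption: only the type labels that appear in this example are listed.
known_types <- c("bound.min", "bound.max", "moment.1", "moment.2", "meas")

# Hypothetical helper (not an exPrior function): basic structural checks.
check_exdata <- function(exdata, theta) {
  stopifnot(
    is.data.frame(exdata),
    all(c("val", "site_id", "type") %in% names(exdata)),
    is.numeric(exdata$val),
    all(as.character(exdata$type) %in% known_types),
    is.numeric(theta),
    !is.unsorted(theta)
  )
  # Values on the scale of theta: everything except the variance entries.
  on_theta_scale <- exdata$val[as.character(exdata$type) != "moment.2"]
  list(
    n_sites = length(unique(as.character(exdata$site_id))),
    rows_per_site = table(as.character(exdata$site_id)),
    grid_covers_data = min(theta) <= min(on_theta_scale) &&
      max(theta) >= max(on_theta_scale)
  )
}

checks <- check_exdata(exdata_multitype, theta)
checks$rows_per_site
```

The grid check only compares the range of theta with the bounds, means, and measurements. It says nothing about whether the grid is fine enough for a given application.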
Running genExPrior with these arguments, we obtain the prior distribution for \(\theta\) as well as the posterior distributions of the hyperparameters of our Bayesian hierarchical model.
resExPrior = genExPrior(exdata = exdata_multitype, theta = theta)
## defining model...
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ...
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1] conjugate_dnorm_dnorm sampler: alpha
## [2] RW sampler: chSqTau
## [3] RW sampler: sigma
## [4] RW sampler: xiTau_negOrPos
## [5] conjugate_dnorm_dnorm sampler: mu[1]
## [6] conjugate_dnorm_dnorm sampler: mu[2]
## [7] conjugate_dnorm_dnorm sampler: mu[3]
## thin = 1: alpha, chSqTau, sigma, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
If the distribution of the parameter is not normal, genExPrior provides an option to transform it to a normal distribution according to the user's choice. Two types of Johnson transformation (logarithm and log ratio) as well as the Box-Cox transformation are provided. The lower and upper limits of the log ratio and the value of \(\lambda\) for the Box-Cox transformation should be chosen so that the transformed data are approximately normally distributed.
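To make the three transformations concrete, the sketch below writes them out in base R using their standard textbook forms. All function names here are hypothetical and are not part of exPrior. How the package exposes these options through its own arguments is not shown in this section, so the sketch only illustrates the mathematics.

```r
# Hypothetical helpers (not exPrior functions) for the three transformations.

# Logarithmic (Johnson) transformation for strictly positive data.
log_transform <- function(x) {
  stopifnot(all(x > 0))
  log(x)
}

# Log-ratio (Johnson) transformation for data strictly inside (lower, upper).
log_ratio_transform <- function(x, lower, upper) {
  stopifnot(all(x > lower), all(x < upper))
  log((x - lower) / (upper - x))
}

# Inverse of the log-ratio transformation, back to the original scale.
log_ratio_inverse <- function(y, lower, upper) {
  (lower + upper * exp(y)) / (1 + exp(y))
}

# Box-Cox transformation; lambda = 0 is the logarithmic limit.
box_cox_transform <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

# Illustration with the three S3 measurements and assumed limits 1 and 5.
x <- c(2, 3, 4)
y_ratio <- log_ratio_transform(x, lower = 1, upper = 5)
x_back <- log_ratio_inverse(y_ratio, lower = 1, upper = 5)
y_boxcox <- box_cox_transform(x, lambda = 0.5)
```

One way to choose the limits or \(\lambda\) is to compare a normal quantile plot of the transformed values, for example with qqnorm(), across a few candidate settings.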
First, let us look at the posteriors of the hyperparameters, which are conditioned on the data in the exdata data frame. To that end, we use the function plotHyperDist with the results from genExPrior as input:
plotHyperDist(resExPrior)
Then, we can visualize both the uninformative and the informative distribution of \(\theta\) using plotExPrior. This function again takes the output from genExPrior as input, together with a Boolean that indicates whether to additionally plot the data used.
plotExPrior(resExPrior, plotExData = FALSE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.