Using exPrior with real-world data

Falk Heße

2019-11-13

This package provides real-world data on porosity. In this vignette, we illustrate how to use the package with these data.

First, we load the libraries that give access to the package functions:

library(devtools)
load_all()
## Warning: 1 components of `...` were not used.
## 
## We detected these problematic arguments:
## * `action`
## 
## Did you misspecify an argument?
## Loading exPrior
library(exPrior)

Let us start by importing data on porosity in sandstone aquifers.

load(file="../data/df_porosity.rda")
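
Because load() restores objects under the names they were saved with, it can be worth checking what actually arrived in the workspace. The following is a minimal sketch of that check. It uses a temporary file and a made-up stand-in data frame (`df_demo`, with the hypothetical columns `porosity` and `site`), not the real df_porosity data, whose structure is not shown in this vignette.

```r
# Stand-in data frame; names and values are purely illustrative
# and are NOT taken from df_porosity.
df_demo <- data.frame(porosity = c(0.21, 0.25, 0.28),
                      site = c("S1", "S1", "S2"))

# Save it to a temporary .rda file, then remove it from the workspace.
tmp <- tempfile(fileext = ".rda")
save(df_demo, file = tmp)
rm(df_demo)

# load() invisibly returns the names of the restored objects.
loaded <- load(file = tmp)
print(loaded)

# Inspect the structure of the restored object before using it.
str(get(loaded))
```

Applied to this vignette, the same pattern would amount to capturing the return value of the load() call above and then calling str(df_porosity).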

These real-world data can now be used to compute the prior.

resExPrior = genExPrior(exdata = df_porosity, theta = seq(from=0, to=1, by=0.01))
## defining model...
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1]  conjugate_dnorm_dnorm sampler: alpha
## [2]  RW sampler: chSqTau
## [3]  RW sampler: sigma
## [4]  RW sampler: xiTau_negOrPos
## [5]  conjugate_dnorm_dnorm sampler: mu[1]
## [6]  conjugate_dnorm_dnorm sampler: mu[2]
## [7]  conjugate_dnorm_dnorm sampler: mu[3]
## [8]  conjugate_dnorm_dnorm sampler: mu[4]
## [9]  conjugate_dnorm_dnorm sampler: mu[5]
## [10] conjugate_dnorm_dnorm sampler: mu[6]
## thin = 1: alpha, chSqTau, sigma, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|

Here, the range of the theta vector reflects the physical constraint that porosity, being a volume fraction, can only take values between 0 and 1.
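
To make the grid concrete, the short sketch below rebuilds the same theta vector and checks its size and bounds. Only base R is used; the names `theta_grid`, `n_points` and `grid_range` are chosen here for illustration.

```r
# Evaluation grid for porosity: 0 to 1 in steps of 0.01.
theta_grid <- seq(from = 0, to = 1, by = 0.01)

# Number of grid points and the range they cover.
n_points <- length(theta_grid)
grid_range <- range(theta_grid)

print(n_points)
print(grid_range)
```

A smaller step (for example `by = 0.001`) would resolve the distribution more finely, at the cost of more evaluation points.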

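Once genExPrior has finished, we can visualize the results. plotExPrior shows both the uninformative and the informative distribution of \(\theta\); it takes the output of genExPrior as input, together with a Boolean that controls whether the underlying data are plotted as well. Before turning to that plot, we first look at the distributions of the hyperparameters using plotHyperDist, which also takes the output of genExPrior.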

plotHyperDist(resExPrior)

Compared to the introductory example, the hyper prior distributions are much narrower, indicating very little uncertainty about their values. This is a consequence of the large amount of data used for the inference. Next, let us look at the predicted prior distribution.

plotExPrior(resExPrior, plotExData = TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 12 rows containing missing values (geom_bar).

## NULL

As can be seen, the prior distribution for porosity in sandstone peaks sharply between 0.2 and 0.3, as determined by the available data. This prior can therefore provide a sound foundation for subsequent statistical inference.
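
To put a number on where such a gridded distribution peaks, one can locate the grid point with the highest density and sum up the mass in the interval of interest. The sketch below does this for a stand-in density: `dens_demo` is a Beta(6, 18) curve chosen only because it has a roughly similar shape, and it is not the output of genExPrior. How the actual density is stored in resExPrior is not covered in this vignette, so that part would need adapting.

```r
# Same grid as used for theta above.
theta_grid <- seq(from = 0, to = 1, by = 0.01)

# Stand-in density (assumption): a Beta(6, 18) curve, NOT the real prior.
dens_demo <- dbeta(theta_grid, shape1 = 6, shape2 = 18)

# Grid point at which the density is largest (the mode on the grid).
theta_mode <- theta_grid[which.max(dens_demo)]

# Grid points between 0.2 and 0.3; the small tolerance guards
# against floating-point round-off at the interval ends.
in_band <- theta_grid > 0.2 - 1e-9 & theta_grid < 0.3 + 1e-9
x <- theta_grid[in_band]
y <- dens_demo[in_band]

# Approximate probability mass in that interval (trapezoidal rule).
mass_band <- sum(diff(x) * (y[-1] + y[-length(y)]) / 2)

print(theta_mode)
print(mass_band)
```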