In most cases, the data used in assimilation are spatially correlated, since measurements are usually collected in clusters. The data assimilation model of exPrior can, in principle, account for patterns of spatial variability by using multivariate distributions as site-specific distributions. First, we load the package to get access to the functions used below:
library(devtools)
load_all()
## Warning: 1 components of `...` were not used.
##
## We detected these problematic arguments:
## * `action`
##
## Did you misspecify an argument?
## Loading exPrior
library(exPrior)
Using this model, we will work with spatially correlated, synthetic data from 3 different sites. Other than that, we will follow the setup of the vignette “Using exPrior by virtue of a simple example”. To ensure spatial correlation in the data, we use values that were generated with a statistical algorithm from the gstat package:
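# Coordinates are drawn at random on a 0.01 grid in [0, 1], so they change
# between runs unless a seed is set; the values and site labels are fixed.
exdata_spatial <- data.frame(
  "x" = sample(seq(0, 1, 0.01), 22),
  "y" = sample(seq(0, 1, 0.01), 22),
  "val" = c(-2.5020, -1.9410, -3.0240, -2.5929, -2.4292, -3.0682,
            -2.9953, -2.8178, -2.7236, -1.9657, -2.6567, -2.4977,
            -1.1583, -3.0637, -1.6788, -3.5102, -2.3866, -3.4092,
            -3.5907, -3.2470, -4.1272, -3.5717),
  # 10 measurements at site S1, 5 at S2 and 7 at S3
  "site_id" = c(rep("S1", 10), rep("S2", 5), rep("S3", 7)))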
The resulting data frame looks like this:
exdata_spatial
## x y val site_id
## 1 0.45 0.16 -2.5020 S1
## 2 0.35 0.59 -1.9410 S1
## 3 0.25 0.73 -3.0240 S1
## 4 0.75 0.97 -2.5929 S1
## 5 0.91 0.01 -2.4292 S1
## 6 0.87 0.72 -3.0682 S1
## 7 0.86 0.18 -2.9953 S1
## 8 0.17 0.78 -2.8178 S1
## 9 0.28 0.64 -2.7236 S1
## 10 0.77 0.44 -1.9657 S1
## 11 0.66 0.33 -2.6567 S2
## 12 0.52 0.19 -2.4977 S2
## 13 0.33 0.51 -1.1583 S2
## 14 0.23 0.50 -3.0637 S2
## 15 0.37 0.43 -1.6788 S2
## 16 0.16 0.92 -3.5102 S3
## 17 0.26 0.77 -2.3866 S3
## 18 0.71 0.94 -3.4092 S3
## 19 0.06 0.96 -3.5907 S3
## 20 0.11 0.60 -3.2470 S3
## 21 0.12 0.24 -4.1272 S3
## 22 0.07 0.71 -3.5717 S3
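Before running the inference, it can help to look at how the measurements are distributed across the sites. The following is a minimal sketch in base R that is not part of exPrior; it copies the values and site labels from the data frame above and summarises them per site. The helper names site_n, site_mean and site_sd are chosen here for illustration only.

```r
# Values and site labels copied from exdata_spatial above
val <- c(-2.5020, -1.9410, -3.0240, -2.5929, -2.4292, -3.0682,
         -2.9953, -2.8178, -2.7236, -1.9657, -2.6567, -2.4977,
         -1.1583, -3.0637, -1.6788, -3.5102, -2.3866, -3.4092,
         -3.5907, -3.2470, -4.1272, -3.5717)
site_id <- c(rep("S1", 10), rep("S2", 5), rep("S3", 7))

# Number of measurements, mean and standard deviation per site
site_n    <- tapply(val, site_id, length)
site_mean <- tapply(val, site_id, mean)
site_sd   <- tapply(val, site_id, sd)

print(round(cbind(n = site_n, mean = site_mean, sd = site_sd), 3))
```

The sites contribute different numbers of measurements and have different sample means, which is the situation the site-specific distributions are meant to capture.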
Now that we have generated our data frame exdata_spatial, containing the data together with the spatial coordinates, we can start inferring the prior distribution. Let us start by running genExPrior with these data but without accounting for the spatial correlation, and look at the uninformative and informative distributions of \(\theta\) using plotExPrior.
theta <- seq(from=-10, to=10, by=0.1)
resExPrior = genExPrior(exdata=exdata_spatial, theta=theta)
## defining model...
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ...
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1] conjugate_dnorm_dnorm sampler: alpha
## [2] RW sampler: chSqTau
## [3] RW sampler: sigma
## [4] RW sampler: xiTau_negOrPos
## [5] conjugate_dnorm_dnorm sampler: mu[1]
## [6] conjugate_dnorm_dnorm sampler: mu[2]
## [7] conjugate_dnorm_dnorm sampler: mu[3]
## thin = 1: alpha, chSqTau, sigma, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
plotExPrior(resExPrior, plotExData=TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## NULL
As can be seen, the results are along the lines of the example in the vignette “Using exPrior by virtue of a simple example”. Now let us redo the inference, this time accounting for the spatial correlations in the data, and visualize the results. This is done by setting the flag spatialCoordinates to TRUE.
resExPrior_spatial = genExPrior(exdata=exdata_spatial, theta=theta, spatialCoordinates=TRUE)
## defining model...
## Adding matrix_dist, matrix_ones as data for building model.
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ...
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1] conjugate_dnorm_dnorm sampler: alpha
## [2] RW sampler: chSqTau
## [3] RW sampler: sigma
## [4] RW sampler: lambda
## [5] RW sampler: xiTau_negOrPos
## [6] RW sampler: mu[1]
## [7] RW sampler: mu[2]
## [8] RW sampler: mu[3]
## thin = 1: alpha, chSqTau, sigma, lambda, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
plotExPrior(resExPrior_spatial, plotExData=TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## NULL
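The model output above mentions a distance matrix (matrix_dist) and an additional sampled parameter lambda. How exPrior combines them internally is not shown in this vignette. As a rough illustration only, the sketch below builds a pairwise distance matrix from coordinates and turns it into a covariance matrix, assuming an exponential covariance function with lambda as a correlation length. The coordinates, the parameter values and the choice of covariance function are all assumptions made for this example.

```r
# Hypothetical coordinates of three measurements at one site
# (illustrative values, not taken from exdata_spatial)
coords <- data.frame(x = c(0.10, 0.15, 0.80),
                     y = c(0.20, 0.25, 0.90))

# Pairwise Euclidean distances between the measurement locations
matrix_dist <- as.matrix(dist(coords))

# Assumed parameters: correlation length and marginal standard deviation
lambda <- 0.3
sigma  <- 1.0

# Assumed exponential covariance: correlation decays with distance
cov_exp <- sigma^2 * exp(-matrix_dist / lambda)

print(round(cov_exp, 3))
```

Under this assumption, the two nearby points are strongly correlated, while the distant third point is only weakly correlated with either of them.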
If we compare the two prior distributions, one obtained without accounting for spatial correlation and one obtained with it, we see similar results overall. The main difference is that the latter shows somewhat increased uncertainty, i.e., a larger variance. The fact that the more realistic model produces more uncertain results may seem surprising at first. However, the aim of Bayesian inference is not to reduce the uncertainty as much as possible, but to correctly represent the uncertainty in the data and the model.
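One way to see why correlation tends to increase uncertainty is that correlated measurements carry less independent information than the same number of uncorrelated ones. The sketch below is a simplified illustration in base R, not the exPrior model: it assumes n measurements with equal variance and a single common pairwise correlation rho, and compares the variance of their sample mean with the independent case.

```r
n      <- 10    # number of measurements at a site
sigma2 <- 1     # variance of each measurement
rho    <- 0.5   # assumed common correlation between any two measurements

# Covariance matrices for independent and equally correlated measurements
Sigma_indep <- diag(sigma2, n)
Sigma_corr  <- sigma2 * ((1 - rho) * diag(n) + rho * matrix(1, n, n))

# Variance of the sample mean: (1' Sigma 1) / n^2
ones <- rep(1, n)
var_mean_indep <- drop(t(ones) %*% Sigma_indep %*% ones) / n^2
var_mean_corr  <- drop(t(ones) %*% Sigma_corr  %*% ones) / n^2

# Effective number of independent measurements under correlation
n_eff <- sigma2 / var_mean_corr

print(c(var_mean_indep = var_mean_indep,
        var_mean_corr  = var_mean_corr,
        n_eff          = n_eff))
```

With positive correlation, the variance of the mean is larger and the effective sample size is smaller than n, which is consistent with the wider distribution obtained when spatial correlation is taken into account.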