Accounting for spatial autocorrelation in the data

Falk Heße

2019-11-13

In most cases, the data used in assimilation are spatially correlated, since measurements are usually collected in clusters. The data assimilation model of exPrior can, in principle, account for such patterns of spatial variability by using multivariate distributions as site-specific distributions. First, we load the libraries to get access to this functionality:

library(devtools)
load_all()
## Warning: 1 components of `...` were not used.
## 
## We detected these problematic arguments:
## * `action`
## 
## Did you misspecify an argument?
## Loading exPrior
library(exPrior)

With this model, we will use spatially correlated, synthetic data from three different sites. Other than that, we will follow the setup of the vignette “Using exPrior by virtue of a simple example”. To ensure that the data are spatially correlated, we use values that were generated with a statistical algorithm from the gstat package:

# Coordinates are drawn at random (no seed is set, so they change between runs);
# the values in "val" are fixed and were generated beforehand.
exdata_spatial <- data.frame("x" = sample(seq(0, 1, 0.01), 22),
                             "y" = sample(seq(0, 1, 0.01), 22),
                             "val" = c(-2.5020, -1.9410, -3.0240, -2.5929, -2.4292, -3.0682,
                                       -2.9953, -2.8178, -2.7236, -1.9657, -2.6567, -2.4977,
                                       -1.1583, -3.0637, -1.6788, -3.5102, -2.3866, -3.4092,
                                       -3.5907, -3.2470, -4.1272, -3.5717),
                             "site_id" = c(rep("S1", 10), rep("S2", 5), rep("S3", 7)))

The resulting data frame looks like this:

exdata_spatial
##       x    y     val site_id
## 1  0.45 0.16 -2.5020      S1
## 2  0.35 0.59 -1.9410      S1
## 3  0.25 0.73 -3.0240      S1
## 4  0.75 0.97 -2.5929      S1
## 5  0.91 0.01 -2.4292      S1
## 6  0.87 0.72 -3.0682      S1
## 7  0.86 0.18 -2.9953      S1
## 8  0.17 0.78 -2.8178      S1
## 9  0.28 0.64 -2.7236      S1
## 10 0.77 0.44 -1.9657      S1
## 11 0.66 0.33 -2.6567      S2
## 12 0.52 0.19 -2.4977      S2
## 13 0.33 0.51 -1.1583      S2
## 14 0.23 0.50 -3.0637      S2
## 15 0.37 0.43 -1.6788      S2
## 16 0.16 0.92 -3.5102      S3
## 17 0.26 0.77 -2.3866      S3
## 18 0.71 0.94 -3.4092      S3
## 19 0.06 0.96 -3.5907      S3
## 20 0.11 0.60 -3.2470      S3
## 21 0.12 0.24 -4.1272      S3
## 22 0.07 0.71 -3.5717      S3
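
For illustration, spatially correlated values of this kind could be produced along the following lines. This is a minimal base-R sketch and not the gstat procedure used for the values above: the helper simulate_site, the exponential covariance, and the parameters mean, sill and range are all assumptions made for this example.

```r
# Hypothetical helper: draw one spatially correlated Gaussian sample at the
# given coordinates, assuming an exponential covariance function.
simulate_site <- function(coords, mean, sill, range) {
  d <- as.matrix(dist(coords))      # pairwise Euclidean distances
  Sigma <- sill * exp(-d / range)   # covariance decays with distance
  L <- t(chol(Sigma))               # lower Cholesky factor, Sigma = L %*% t(L)
  as.numeric(mean + L %*% rnorm(nrow(coords)))
}

set.seed(1)                         # fixed seed so the sketch is reproducible
coords <- data.frame(x = runif(10), y = runif(10))
vals <- simulate_site(coords, mean = -2.5, sill = 0.5, range = 0.3)
```

Points that lie close together then tend to receive similar values, which is the pattern a spatial model is meant to capture.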

Now that we have the data frame exdata_spatial, which contains the data together with their spatial coordinates, we can start inferring the prior distribution. Let us first run genExPrior on these data without accounting for the spatial correlation and look at the uninformative and informative distributions of \(\theta\) using plotExPrior.

theta <- seq(from = -10, to = 10, by = 0.1)
resExPrior <- genExPrior(exdata = exdata_spatial, theta = theta)
## defining model...
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1] conjugate_dnorm_dnorm sampler: alpha
## [2] RW sampler: chSqTau
## [3] RW sampler: sigma
## [4] RW sampler: xiTau_negOrPos
## [5] conjugate_dnorm_dnorm sampler: mu[1]
## [6] conjugate_dnorm_dnorm sampler: mu[2]
## [7] conjugate_dnorm_dnorm sampler: mu[3]
## thin = 1: alpha, chSqTau, sigma, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
plotExPrior(resExPrior, plotExData=TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## NULL

As can be seen, the results are in line with the example in the vignette “Using exPrior by virtue of a simple example”. Now let us redo the inference, this time accounting for the spatial correlation in the data, and visualize the results. This is done by setting the flag spatialCoordinates to TRUE.

resExPrior_spatial <- genExPrior(exdata = exdata_spatial, theta = theta, spatialCoordinates = TRUE)
## defining model...
## Adding matrix_dist, matrix_ones as data for building model.
## building model...
## setting data and initial values...
## running calculate on model (any error reports that follow may simply reflect missing values in model variables) ... 
## checking model sizes and dimensions... This model is not fully initialized. This is not an error. To see which variables are not initialized, use model$initializeInfo(). For more information on model initialization, see help(modelInitialization).
## model building finished.
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## [1] conjugate_dnorm_dnorm sampler: alpha
## [2] RW sampler: chSqTau
## [3] RW sampler: sigma
## [4] RW sampler: lambda
## [5] RW sampler: xiTau_negOrPos
## [6] RW sampler: mu[1]
## [7] RW sampler: mu[2]
## [8] RW sampler: mu[3]
## thin = 1: alpha, chSqTau, sigma, lambda, xiTau_negOrPos, tau
## compiling... this may take a minute. Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## compilation finished.
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
plotExPrior(resExPrior_spatial, plotExData=TRUE)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## NULL
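
The log output above mentions a distance matrix (matrix_dist) being added to the model, and a new parameter lambda appears among the samplers. How exPrior uses them internally is not shown here; the sketch below only illustrates the general idea of a multivariate site-specific distribution, assuming a normal distribution with an exponential correlation function. The helper log_dmvnorm and the values chosen for mu, sigma and lambda are hypothetical.

```r
# Site S2 as printed above (coordinates and values copied from the data frame).
s2 <- data.frame(x   = c(0.66, 0.52, 0.33, 0.23, 0.37),
                 y   = c(0.33, 0.19, 0.51, 0.50, 0.43),
                 val = c(-2.6567, -2.4977, -1.1583, -3.0637, -1.6788))

# Pairwise Euclidean distances between the measurement locations.
d <- as.matrix(dist(s2[, c("x", "y")]))

# Hypothetical parameter values, chosen only for illustration.
mu     <- mean(s2$val)
sigma  <- sd(s2$val)
lambda <- 0.3   # assumed correlation length

# Log-density of a multivariate normal via the Cholesky factor of Sigma.
log_dmvnorm <- function(y, mu, Sigma) {
  R <- chol(Sigma)                              # Sigma = t(R) %*% R
  z <- backsolve(R, y - mu, transpose = TRUE)   # solves t(R) %*% z = y - mu
  -0.5 * (length(y) * log(2 * pi) + 2 * sum(log(diag(R))) + sum(z^2))
}

Sigma_indep   <- sigma^2 * diag(nrow(s2))       # no spatial correlation
Sigma_spatial <- sigma^2 * exp(-d / lambda)     # correlation decays with distance

ll_indep   <- log_dmvnorm(s2$val, mu, Sigma_indep)
ll_spatial <- log_dmvnorm(s2$val, mu, Sigma_spatial)
```

With a diagonal covariance matrix, the multivariate density is the product of independent normal densities; the off-diagonal entries are what carry the spatial information.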

If we compare the two prior distributions, one obtained without accounting for spatial correlation and one obtained with it, we see similar results overall. The main difference is that the latter shows somewhat increased uncertainty, i.e., a larger variance. The fact that the more realistic model produces more uncertain results may seem surprising at first. However, spatially correlated measurements carry partly redundant information, so treating them as independent overstates how much the data tell us. The aim of Bayesian inference is not to reduce the uncertainty as much as possible but to represent the uncertainty in the data and the model correctly.
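
The increase in uncertainty can be illustrated with a small calculation on the variance of a site mean. The following is a sketch under stated assumptions, not part of exPrior: the locations, the unit variance, the exponential correlation function and the correlation length lambda are all made up for this example.

```r
# Hypothetical setting: five measurements on a line, unit variance,
# exponential correlation with an assumed correlation length lambda.
x      <- c(0.0, 0.1, 0.2, 0.3, 0.4)
n      <- length(x)
sigma2 <- 1
lambda <- 0.3

d     <- as.matrix(dist(x))          # pairwise distances
Sigma <- sigma2 * exp(-d / lambda)   # covariance of the measurements

var_mean_indep <- sigma2 / n         # variance of the mean if independent
var_mean_corr  <- sum(Sigma) / n^2   # variance of the mean: 1' Sigma 1 / n^2

n_eff <- sigma2 / var_mean_corr      # equivalent number of independent data
```

Because all covariances are positive, var_mean_corr exceeds var_mean_indep, and n_eff is smaller than the actual number of measurements: the correlated data constrain the site mean less than the same number of independent data would.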