The functions listed in this vignette apply to linear regression models, linear mixed models, and GAMMs (i.e., the functions are tested with lm, glm, lmer, glmer, gam, and bam models).

suppressMessages(library(itsadug))
info('version')
## Package itsadug, version 1.0.1

Example GAMM model

The code below was used to fit a GAMM model m1 to the data set simdat from the package itsadug. The data set simdat is simulated time series data with arbitrary predictors.

data(simdat)

# For illustration purposes, we build a GAMM model
# with a nonlinear interaction, two groups, and
# random wiggly smooths for Subjects:
m1 <- bam(Y ~ Group + te(Time, Trial, by=Group)
  + s(Time, Subject, bs='fs', m=1),
  data=simdat)

Visualizing the ACF of model residuals with acf_resid

The function acf_resid is a wrapper around the functions acf_plot and acf_n_plots. It allows for different ways of checking the ACF.

1. standard ACF

The default acf function in R plots the autocorrelation function of the residuals as if the residuals were a single time series:

acf(resid(m1))
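
For reference, the value that acf shows at lag k is the sample autocorrelation: the lag-k covariance of the demeaned series divided by its lag-0 covariance. The following base R sketch reproduces this by hand. The vector x is simulated toy data standing in for a residual vector, and manual_acf is a hypothetical helper, not part of itsadug:

```r
# Toy series standing in for a residual vector (hypothetical data)
set.seed(1)
x <- as.numeric(arima.sim(list(ar = 0.5), n = 200))

# Sample autocorrelation at lag k:
# lagged covariance of the demeaned series divided by the lag-0 covariance
manual_acf <- function(x, k) {
  n <- length(x)
  d <- x - mean(x)
  sum(d[1:(n - k)] * d[(1 + k):n]) / sum(d^2)
}

by_hand  <- sapply(0:5, function(k) manual_acf(x, k))
from_acf <- acf(x, lag.max = 5, plot = FALSE)$acf[, 1, 1]

# The two vectors agree up to floating-point error
all.equal(by_hand, from_acf)
```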

Alternatively, the function acf_resid from the package itsadug can be used. It offers several options, described below:

acf_resid(m1)

2. Average ACF over different time series

Individual time series can be specified as a named list of vectors, or as a vector with the names of model predictors.

# Option A: include named list
acf_resid(m1, split_pred=list(Subject=simdat$Subject, Trial=simdat$Trial))

# Option B: include model predictors
# This method only works for predictors that are included in the model.
acf_resid(m1, split_pred=c("Subject","Trial"))
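
To illustrate why splitting matters, here is a minimal base R sketch on simulated toy data (not the simdat residuals). Each series is white noise around its own constant offset, so there is no autocorrelation within a series, yet the pooled ACF is large. The per-series averaging shown here is a plain re-implementation for illustration and is not claimed to be what acf_resid does internally:

```r
set.seed(2)
n_series <- 20
n_obs <- 100

# Hypothetical toy data: white noise plus a constant offset per series
series <- rep(seq_len(n_series), each = n_obs)
offset <- seq(-3, 3, length.out = n_series)[series]
x <- offset + rnorm(n_series * n_obs)

# Pooled: treat the concatenated vector as one long time series
pooled_lag1 <- acf(x, lag.max = 1, plot = FALSE)$acf[2, 1, 1]

# Split: one ACF per series (columns), then average per lag (rows)
acf_per_series <- sapply(split(x, series), function(s) {
  acf(s, lag.max = 10, plot = FALSE)$acf[, 1, 1]
})
avg_acf <- rowMeans(acf_per_series)
names(avg_acf) <- 0:10

pooled_lag1    # large: dominated by differences between series
avg_acf["1"]   # close to zero: no autocorrelation within a series
```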

By default, the function acf_resid calls acf_plot to calculate the average ACF over the time series. However, a different summary measure can be provided with the argument fun of acf_plot:

# Minimum ACF per lag:
acf_resid(m1, split_pred=c("Subject","Trial"), fun=min)

# Maximum ACF per lag:
acf_resid(m1, split_pred=c("Subject","Trial"), fun=max)
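
Conceptually, a summary function such as min or max is applied per lag across the individual ACFs. A minimal base R sketch of that idea, assuming simulated toy series and a hypothetical helper summarise_lags (neither is part of itsadug):

```r
set.seed(3)
# Hypothetical toy data: 8 AR(1) series of 60 observations each
x <- unlist(lapply(1:8, function(i) {
  as.numeric(arima.sim(list(ar = 0.4), n = 60))
}))
series <- rep(paste0("s", 1:8), each = 60)

# One column of ACF values per series, one row per lag
acf_mat <- sapply(split(x, series), function(s) {
  acf(s, lag.max = 5, plot = FALSE)$acf[, 1, 1]
})
rownames(acf_mat) <- 0:5

# Apply an arbitrary summary function to each lag (row)
summarise_lags <- function(m, fun = mean) apply(m, 1, fun)

acf_mean <- summarise_lags(acf_mat)
acf_min  <- summarise_lags(acf_mat, min)
acf_max  <- summarise_lags(acf_mat, max)
acf_q25  <- summarise_lags(acf_mat, function(v) quantile(v, .25))
```

Any function that reduces a numeric vector to a single value can be passed in the same way, which is how the quantile functions are used for the 25% and 75% bounds.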

The function optionally returns the ACF values, which can be used to generate more advanced ACF plots:

# Median ACF per lag:
acf_resid(m1, split_pred="Subject", fun=median, lwd=3,
          main="Distribution of ACF")
# Calculate 25% and 75% quantiles:
acf1 <- acf_resid(m1, split_pred="Subject", 
    fun=function(x){quantile(x, .25)}, plot=FALSE)
acf2 <- acf_resid(m1, split_pred="Subject", 
    fun=function(x){quantile(x, .75)}, plot=FALSE)
# Plot these as error bars in different colors:
len <- length(acf1)-1
fill_area(x=0:len, y=acf2, from=acf1, col=alpha(1))
addInterval(pos=0:len, acf1, acf2, horiz=FALSE, col=alpha(1))
# add legend:
legend('topright',
    fill=alpha(1),
    border=alpha(1),
    legend='25-75%',
    bty='n')

3. N different ACF plots

The function acf_resid makes use of the function acf_n_plots to plot individual time series when the argument n is specified.

Quantiles

By default, \(N\) time series are plotted (as set with the argument n) that represent \(N\) quantiles of the ACF value at lag 1.

acf_resid(m1, split_pred=c("Subject","Trial"), n=6)
## Quantiles to be plotted:
##          0%         20%         40%         60%         80%        100% 
## -0.31881507 -0.01289427  0.08513155  0.21909939  0.53231119  0.96714554
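
The quantile labels in this output (0% to 100% in steps of 20% for n=6) are what one gets from n equally spaced probabilities. The base R sketch below shows this on hypothetical lag-1 values and groups the events into the bins between consecutive quantiles. How acf_n_plots picks the series to display within each bin is not reproduced here:

```r
set.seed(4)
# Hypothetical lag-1 ACF values for 30 events
lag1 <- setNames(runif(30, -0.3, 0.9), paste0("event", 1:30))

n <- 6
# n equally spaced probabilities from the minimum (0%) to the maximum (100%)
probs <- seq(0, 1, length.out = n)
q <- quantile(lag1, probs = probs)
names(q)

# Assign each event to the bin between two consecutive quantiles
bin <- cut(lag1, breaks = q, include.lowest = TRUE, labels = FALSE)

# Events whose lag-1 value falls between the 0% and 20% quantiles
split(names(lag1), bin)[[1]]
```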

Optionally, the function outputs the quantiles:

out <- acf_resid(m1, split_pred=c("Subject","Trial"), n=6, plot=FALSE)
## Quantiles to be plotted:
##          0%         20%         40%         60%         80%        100% 
## -0.31881507 -0.01289427  0.08513155  0.21909939  0.53231119  0.96714554
# print the head of the elements in the first quantile:
head(out[[1]][['elements']])
##     event        lag1
## 1 c05.-10 -0.04423006
## 2 c11.-10 -0.10037337
## 3  a05.-9 -0.18734255
## 4  a09.-9 -0.17250376
## 5  a13.-9 -0.02298813
## 6  c02.-9 -0.20476818
# print the quantile:
out[[1]][['quantile']]
##          0%         20% 
## -0.31881507 -0.01289427

Random events

When random=TRUE, \(N\) randomly selected events are plotted:

acf_resid(m1, split_pred=c("Subject","Trial"), n=6, random=TRUE)

Selection

With the argument cond (see help(acf_n_plots)), specific events can be plotted:

simdat$Event <- with(simdat, interaction(Subject, Trial))
acf_resid(m1, split_pred=list(Event=simdat$Event), n=6, 
    cond=list(Event=c('c05.-10', 'c11.-10', 'a05.-9', 'a09.-9', 'a13.-9', 'c02.-9')))
## Quantiles to be plotted:
##          0%         20%         40%         60%         80%        100% 
## -0.20476818 -0.18734255 -0.17250376 -0.10037337 -0.04423006 -0.02298813
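
The event labels used in cond, such as 'c05.-10', are the level names that interaction() builds by joining the Subject and Trial values with a dot. A small base R sketch on a hypothetical miniature data frame (toy is not simdat) shows the labels and a selection by label:

```r
# Hypothetical miniature version of the Subject and Trial columns
toy <- data.frame(
  Subject = factor(c("a01", "a01", "c05", "c05")),
  Trial   = c(-10, -9, -10, -9)
)

# One label per unique Subject-Trial combination
toy$Event <- with(toy, interaction(Subject, Trial))
as.character(toy$Event)

# Logical index of the rows that belong to the requested events
keep <- toy$Event %in% c("c05.-10", "a01.-9")
toy[keep, ]
```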

4. Output

The function acf_resid optionally returns information about the individual time series:

# default output is the acf values:
(out <- acf_resid(m1, split_pred=c("Subject","Trial"), plot=FALSE))
##          0          1          2          3          4          5 
## 1.00000000 0.23712054 0.22873738 0.22579123 0.22026437 0.20615686 
##          6          7          8          9         10         11 
## 0.20717297 0.19301394 0.18658474 0.18870720 0.17914887 0.17146109 
##         12         13         14         15         16         17 
## 0.15874921 0.15707638 0.14718378 0.14104510 0.13889794 0.12968832 
##         18         19         20 
## 0.11796424 0.10745294 0.09860553
# Alternatively, more information could be retrieved:
out <- acf_resid(m1, split_pred=c("Subject","Trial"), plot=FALSE, return_all=TRUE)
# out is a list of info:
names(out)
## [1] "acf"       "acftable"  "dataframe" "n"         "series"    "FUN"
# 1. acf gives the acf values:
out[['acf']]
##          0          1          2          3          4          5 
## 1.00000000 0.23712054 0.22873738 0.22579123 0.22026437 0.20615686 
##          6          7          8          9         10         11 
## 0.20717297 0.19301394 0.18658474 0.18870720 0.17914887 0.17146109 
##         12         13         14         15         16         17 
## 0.15874921 0.15707638 0.14718378 0.14104510 0.13889794 0.12968832 
##         18         19         20 
## 0.11796424 0.10745294 0.09860553
# 2. acftable provides the individual acf's in wide table format:
head(out[['acftable']], 3)
##         0         1         2         3         4         5         6
## a01.-10 1 0.2208330 0.1351628 0.2910060 0.1277496 0.2439650 0.2377283
## a02.-10 1 0.4069722 0.4542935 0.3196872 0.4171387 0.3109420 0.3262143
## a03.-10 1 0.1448119 0.2602716 0.2236864 0.1480399 0.3161058 0.1507453
##                 7         8         9        10        11         12
## a01.-10 0.2283752 0.1663220 0.1378972 0.1784408 0.2712004 0.05067271
## a02.-10 0.3280109 0.3654185 0.3015577 0.2539762 0.2672726 0.25087072
## a03.-10 0.3021498 0.2405207 0.2565739 0.1676992 0.2306338 0.18238364
##                 13        14         15        16        17        18
## a01.-10 0.01943445 0.2884304 0.13345619 0.1178192 0.2474626 0.1080486
## a02.-10 0.34361377 0.3056743 0.28053685 0.2541574 0.2225718 0.1528064
## a03.-10 0.21911029 0.2525816 0.04745451 0.2099518 0.2957930 0.1640577
##                 19         20
## a01.-10 0.07749091 0.07787484
## a02.-10 0.19932699 0.13487574
## a03.-10 0.25687585 0.04938008
dim(out[['acftable']])
## [1] 756  21
# 3. dataframe provides a data frame with the acf, n, and ci information
# in long table format:
head(out[['dataframe']])
##    event        acf lag   n   ci Subject Trial
## 1 a01.-1 1.00000000   0 100 0.19     a01    -1
## 2 a01.-1 0.09764361   1 100 0.19     a01    -1
## 3 a01.-1 0.03373664   2 100 0.19     a01    -1
## 4 a01.-1 0.18912723   3 100 0.19     a01    -1
## 5 a01.-1 0.12477850   4 100 0.19     a01    -1
## 6 a01.-1 0.08529486   5 100 0.19     a01    -1
# 4. n provides the number of data points underlying each ACF:
head(out[['n']])
##     n   event
## 1 100 a01.-10
## 2 100 a02.-10
## 3 100 a03.-10
## 4 100 a04.-10
## 5 100 a05.-10
## 6 100 a06.-10
# 5. series and FUN provide info on input and function:
out[['series']]
## [1] "resid_gam(model)"
out[['FUN']]
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x7fe56001a8e8>
## <environment: namespace:base>
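
The relation between the wide acftable and the long dataframe can be sketched in base R. The values below are hypothetical, and the formula for ci is an assumption: it is the bound -(1/n) + 2/sqrt(n), which reproduces the value 0.19 shown above for n = 100, but the package source has not been checked here.

```r
# Hypothetical wide table: one row per event, one column per lag
wide <- rbind("a01.-10" = c(1, 0.22, 0.14),
              "a02.-10" = c(1, 0.41, 0.45))
colnames(wide) <- 0:2
n_obs <- c("a01.-10" = 100, "a02.-10" = 100)

# Reshape to long format: one row per event and lag
long <- as.data.frame(as.table(wide), stringsAsFactors = FALSE)
names(long) <- c("event", "lag", "acf")
long$lag <- as.integer(long$lag)
long$n <- unname(n_obs[long$event])

# Assumed bound; gives 0.19 for n = 100
long$ci <- -(1 / long$n) + 2 / sqrt(long$n)

long <- long[order(long$event, long$lag), ]
long
```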

The data frames are useful for plotting the ACFs with other packages. The following example is adapted from the vignette that accompanies the article by @BatesEtal:

# Plot individual participants with the package lattice:
library(lattice)
out <- acf_resid(m1, split_pred=c("Subject"), plot=FALSE, return_all=TRUE)$dataframe
civec = out[out$lag==0,]$ci
xyplot(acf ~ lag | event, type = "h", data = out, col.line = "black", 
            panel = function(...) {
                panel.abline(h = civec[panel.number()], col.line = "grey")
                panel.abline(h = -civec[panel.number()], col.line = "grey")
                panel.abline(h = 0, col.line = "black")
                panel.xyplot(...)
            }, 
            strip = strip.custom(bg = "grey90"), 
            par.strip.text = list(cex = 0.8),
            xlab="lag", ylab="autocorrelation")

5. Correcting for AR1 \(\rho\)

When an AR1 model is included in a gam or bam model, the function acf_resid automatically corrects for it:

# generate AR start column:
simdat <- start_event(simdat, column="Time", event="Event")

# run GAMM with AR1 model:
m1 <- bam(Y ~ Group + te(Time, Trial, by=Group)
  + s(Time, Subject, bs='fs', m=1),
  data=simdat, rho=.65, AR.start=simdat$start.event)

# plot normal acf, without correction for rho:
acf(resid(m1))

# plot acf with acf_resid, which corrects for rho:
acf_resid(m1)

# plot uncorrected acf with acf_plot, averaged over Subjects:
acf_plot(resid(m1), split_by=list(simdat$Subject))

# plot corrected acf with acf_plot:
acf_plot(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject))

# the data now include the column start.event:
head(simdat)
##    Group      Time Trial Condition Subject         Y   Event start.event
## 1 Adults   0.00000   -10        -1     a01 0.7554469 a01.-10        TRUE
## 2 Adults  20.20202   -10        -1     a01 2.7834759 a01.-10       FALSE
## 3 Adults  40.40404   -10        -1     a01 1.9696963 a01.-10       FALSE
## 4 Adults  60.60606   -10        -1     a01 0.6814298 a01.-10       FALSE
## 5 Adults  80.80808   -10        -1     a01 1.6939195 a01.-10       FALSE
## 6 Adults 101.01010   -10        -1     a01 2.3651969 a01.-10       FALSE
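
The idea behind the correction can be sketched in base R: from each residual, subtract rho times the previous residual, except at the first observation of an event, where there is no previous residual. The data are simulated, the start flag is built with !duplicated() under the assumption that rows are sorted by time within event, and the scaling by sqrt(1 - rho^2) follows standard AR(1) whitening. None of this is claimed to be the exact computation in mgcv or resid_gam.

```r
set.seed(5)
rho <- 0.65
n_events <- 50
n_obs <- 100

# Hypothetical residuals: an independent AR(1) process within each event
event <- rep(seq_len(n_events), each = n_obs)
e <- unlist(lapply(seq_len(n_events), function(i) {
  as.numeric(arima.sim(list(ar = rho), n = n_obs))
}))

# TRUE on the first row of each event (rows assumed sorted by time within event)
start <- !duplicated(event)

# Remove the part of each residual that is predicted from the previous one;
# the first residual of an event is kept as is
prev <- c(0, e[-length(e)])
corrected <- ifelse(start, e, (e - rho * prev) / sqrt(1 - rho^2))

# Average lag-1 autocorrelation over events
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2, 1, 1]
per_event_lag1 <- function(v) mean(sapply(split(v, event), lag1))

per_event_lag1(e)          # clearly positive
per_event_lag1(corrected)  # near zero
```

If the chosen rho matches the autocorrelation in the residuals, the corrected series should look like white noise, which is what the corrected ACF plots are meant to check.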

The use of acf_plot

The function acf_plot generates the ACF for individual time series and can plot the ACF averaged over those series. In contrast with acf_resid, the input needs to be a vector of values, and the grouping predictors are provided to the argument split_by as a list of vectors.

acf_plot(resid_gam(m1))

acf_plot(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject))
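
Because the grouping vectors in split_by come from the original data, the residual vector has to have one element per data row. The argument incl_na=TRUE presumably serves that purpose. The same alignment issue can be shown in base R with lm and na.exclude on hypothetical data (this does not use resid_gam):

```r
# Hypothetical data with two missing responses
d <- data.frame(
  subj = rep(c("s1", "s2"), each = 5),
  x = 1:10,
  y = c(1.2, NA, 2.9, 4.1, 5.2, 5.8, 7.1, NA, 9.0, 9.9)
)

# Default na.action (na.omit) drops incomplete rows:
# the residual vector is shorter than the data
fit_omit <- lm(y ~ x, data = d)
length(resid(fit_omit))   # 8

# na.exclude pads the residuals with NA, so they line up with the rows of d
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)
r <- resid(fit_excl)
length(r)                 # 10

# Only the padded vector can be split by a column of the original data
by_subj <- split(r, d$subj)
```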

The use of acf_n_plots

The function acf_n_plots generates \(N\) ACF plots of individual time series. As with acf_plot, and in contrast with acf_resid, the input needs to be a vector of values, and the grouping predictors are provided to the argument split_by as a list of vectors.

acf_n_plots(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject), n=6, random=TRUE)

References