The functions listed in this vignette apply to linear regression models, linear mixed models, and GAMMs (i.e., the functions are tested with lm, glm, lmer, glmer, gam, and bam models).
suppressMessages(library(itsadug))
info('version')
## Package itsadug, version 1.0.1
The code below was used to fit a GAMM model m1 to the data set simdat from the package itsadug. The data set simdat is simulated time series data with arbitrary predictors.
data(simdat)
# For illustration purposes, we build a GAMM model
# with a nonlinear interaction, two groups, and
# random wiggly smooths for Subjects:
m1 <- bam(Y ~ Group + te(Time, Trial, by=Group)
+ s(Time, Subject, bs='fs', m=1),
data=simdat)
acf_resid
The function acf_resid is a wrapper around the functions acf_plot and acf_n_plots. It allows for different ways of checking the ACF.
The default acf function in R plots the autocorrelation function of the residuals as if the residuals were a single time series:
acf(resid(m1))
Alternatively, the function acf_resid of the package itsadug can be used. This function offers several options, as listed below:
acf_resid(m1)
Individual time series can be specified either as a named list of vectors, or as a character vector with the names of model predictors.
# Option A: include named list
acf_resid(m1, split_pred=list(Subject=simdat$Subject, Trial=simdat$Trial))
# Option B: include model predictors
# This method only works for predictors that are included in the model.
acf_resid(m1, split_pred=c("Subject","Trial"))
By default, the function acf_resid calls acf_plot to calculate the average ACF over the individual time series. However, a different summary measure can be specified with the argument fun of acf_plot:
# Minimum ACF per lag:
acf_resid(m1, split_pred=c("Subject","Trial"), fun=min)
# Maximum ACF per lag:
acf_resid(m1, split_pred=c("Subject","Trial"), fun=max)
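To make the role of fun concrete, the following base R sketch computes one ACF per time series and then summarises across series at each lag. It is an illustration of the idea only, not the code of acf_plot; the toy data and the names series_id, x, and acf_by_series are assumptions introduced here.

```r
set.seed(1)

# Hypothetical toy data: 5 independent AR(1) series of 100 points each.
n_series <- 5
n_points <- 100
series_id <- rep(seq_len(n_series), each = n_points)
x <- unlist(lapply(seq_len(n_series), function(i) {
  as.numeric(arima.sim(model = list(ar = 0.6), n = n_points))
}))

# One ACF per series (lags 0 to 10); rows are lags, columns are series.
acf_by_series <- sapply(split(x, series_id), function(s) {
  acf(s, lag.max = 10, plot = FALSE)$acf[, 1, 1]
})
rownames(acf_by_series) <- 0:10

# Summarise across series at each lag, analogous to fun=mean, fun=min, fun=max.
acf_mean <- apply(acf_by_series, 1, mean)
acf_min  <- apply(acf_by_series, 1, min)
acf_max  <- apply(acf_by_series, 1, max)

round(cbind(mean = acf_mean, min = acf_min, max = acf_max), 2)
```

Any function that reduces a vector to a single number (median, a quantile, and so on) can take the place of mean in the last step.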
The function optionally returns the ACF values, which can be used to generate more advanced ACF plots:
# Median ACF per lag:
acf_resid(m1, split_pred="Subject", fun=median, lwd=3,
main="Distribution of ACF")
# Calculate 25% and 75% quantiles:
acf1 <- acf_resid(m1, split_pred="Subject",
fun=function(x){quantile(x, .25)}, plot=FALSE)
acf2 <- acf_resid(m1, split_pred="Subject",
fun=function(x){quantile(x, .75)}, plot=FALSE)
# Plot these as error bars in different colors:
len <- length(acf1)-1
fill_area(x=0:len, y=acf2, from=acf1, col=alpha(1))
addInterval(pos=0:len, acf1, acf2, horiz=FALSE, col=alpha(1))
# add legend:
legend('topright',
fill=alpha(1),
border=alpha(1),
legend='25-75%',
bty='n')
The function acf_resid makes use of the function acf_n_plots to plot individual time series when the argument n is specified.
By default, n time series are plotted that represent \(N\) quantiles (with respect to the value of the ACF at lag 1).
acf_resid(m1, split_pred=c("Subject","Trial"), n=6)
## Quantiles to be plotted:
## 0% 20% 40% 60% 80% 100%
## -0.31881507 -0.01289427 0.08513155 0.21909939 0.53231119 0.96714554
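The quantile boundaries printed above can be reproduced in spirit with base R. The sketch below uses made-up lag-1 values (the names lag1, q, and bin are hypothetical) and only shows how six equally spaced quantile boundaries divide the events into ranges; how acf_n_plots then chooses which series to draw is not covered here.

```r
set.seed(2)

# Hypothetical lag-1 autocorrelations for 50 events (not taken from m1).
lag1 <- runif(50, min = -0.3, max = 0.9)
names(lag1) <- paste0("event", seq_along(lag1))

# n = 6 quantile boundaries, from 0% to 100% in equal steps.
n <- 6
q <- quantile(lag1, probs = seq(0, 1, length.out = n))
q

# Assign every event to the range between two neighbouring boundaries.
bin <- findInterval(lag1, q, rightmost.closed = TRUE)
table(bin)
```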
Optionally, the function outputs the quantiles:
out <- acf_resid(m1, split_pred=c("Subject","Trial"), n=6, plot=FALSE)
## Quantiles to be plotted:
## 0% 20% 40% 60% 80% 100%
## -0.31881507 -0.01289427 0.08513155 0.21909939 0.53231119 0.96714554
# print the head of the elements in the first quantile:
head(out[[1]][['elements']])
## event lag1
## 1 c05.-10 -0.04423006
## 2 c11.-10 -0.10037337
## 3 a05.-9 -0.18734255
## 4 a09.-9 -0.17250376
## 5 a13.-9 -0.02298813
## 6 c02.-9 -0.20476818
# print the quantile:
out[[1]][['quantile']]
## 0% 20%
## -0.31881507 -0.01289427
When random=TRUE, \(N\) random events are plotted:
acf_resid(m1, split_pred=c("Subject","Trial"), n=6, random=TRUE)
With the argument cond (see help(acf_n_plots)), specific events can be plotted:
simdat$Event <- with(simdat, interaction(Subject, Trial))
acf_resid(m1, split_pred=list(Event=simdat$Event), n=6,
cond=list(Event=c('c05.-10', 'c11.-10', 'a05.-9', 'a09.-9', 'a13.-9', 'c02.-9')))
## Quantiles to be plotted:
## 0% 20% 40% 60% 80% 100%
## -0.20476818 -0.18734255 -0.17250376 -0.10037337 -0.04423006 -0.02298813
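The event labels used in cond follow from how interaction() combines factor levels. A small base R sketch on a made-up data frame (toy is a hypothetical stand-in for simdat):

```r
# Hypothetical miniature version of the Subject and Trial columns.
toy <- data.frame(Subject = c("a01", "a01", "c05", "c05"),
                  Trial   = c(-10, -9, -10, -9))

# interaction() pastes the levels together, separated by "." by default.
toy$Event <- with(toy, interaction(Subject, Trial))
as.character(toy$Event)
```

This is why a label such as 'c05.-10' identifies Trial -10 of Subject c05.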
The function acf_resid optionally returns information about the individual time series:
# default output is the acf values:
(out <- acf_resid(m1, split_pred=c("Subject","Trial"), plot=FALSE))
## 0 1 2 3 4 5
## 1.00000000 0.23712054 0.22873738 0.22579123 0.22026437 0.20615686
## 6 7 8 9 10 11
## 0.20717297 0.19301394 0.18658474 0.18870720 0.17914887 0.17146109
## 12 13 14 15 16 17
## 0.15874921 0.15707638 0.14718378 0.14104510 0.13889794 0.12968832
## 18 19 20
## 0.11796424 0.10745294 0.09860553
# Alternatively, more information could be retrieved:
out <- acf_resid(m1, split_pred=c("Subject","Trial"), plot=FALSE, return_all=TRUE)
# out is a list of info:
names(out)
## [1] "acf" "acftable" "dataframe" "n" "series" "FUN"
# 1. acf gives the acf values:
out[['acf']]
## 0 1 2 3 4 5
## 1.00000000 0.23712054 0.22873738 0.22579123 0.22026437 0.20615686
## 6 7 8 9 10 11
## 0.20717297 0.19301394 0.18658474 0.18870720 0.17914887 0.17146109
## 12 13 14 15 16 17
## 0.15874921 0.15707638 0.14718378 0.14104510 0.13889794 0.12968832
## 18 19 20
## 0.11796424 0.10745294 0.09860553
# 2. acftable provides the individual acf's in wide table format:
head(out[['acftable']], 3)
## 0 1 2 3 4 5 6
## a01.-10 1 0.2208330 0.1351628 0.2910060 0.1277496 0.2439650 0.2377283
## a02.-10 1 0.4069722 0.4542935 0.3196872 0.4171387 0.3109420 0.3262143
## a03.-10 1 0.1448119 0.2602716 0.2236864 0.1480399 0.3161058 0.1507453
## 7 8 9 10 11 12
## a01.-10 0.2283752 0.1663220 0.1378972 0.1784408 0.2712004 0.05067271
## a02.-10 0.3280109 0.3654185 0.3015577 0.2539762 0.2672726 0.25087072
## a03.-10 0.3021498 0.2405207 0.2565739 0.1676992 0.2306338 0.18238364
## 13 14 15 16 17 18
## a01.-10 0.01943445 0.2884304 0.13345619 0.1178192 0.2474626 0.1080486
## a02.-10 0.34361377 0.3056743 0.28053685 0.2541574 0.2225718 0.1528064
## a03.-10 0.21911029 0.2525816 0.04745451 0.2099518 0.2957930 0.1640577
## 19 20
## a01.-10 0.07749091 0.07787484
## a02.-10 0.19932699 0.13487574
## a03.-10 0.25687585 0.04938008
dim(out[['acftable']])
## [1] 756 21
# 3. dataframe provides a data frame with the acf, n, and ci information
# in long table format:
head(out[['dataframe']])
## event acf lag n ci Subject Trial
## 1 a01.-1 1.00000000 0 100 0.19 a01 -1
## 2 a01.-1 0.09764361 1 100 0.19 a01 -1
## 3 a01.-1 0.03373664 2 100 0.19 a01 -1
## 4 a01.-1 0.18912723 3 100 0.19 a01 -1
## 5 a01.-1 0.12477850 4 100 0.19 a01 -1
## 6 a01.-1 0.08529486 5 100 0.19 a01 -1
# 4. n provides the number of data points underlying each ACF:
head(out[['n']])
## n event
## 1 100 a01.-10
## 2 100 a02.-10
## 3 100 a03.-10
## 4 100 a04.-10
## 5 100 a05.-10
## 6 100 a06.-10
# 5. series and FUN provide info on input and function:
out[['series']]
## [1] "resid_gam(model)"
out[['FUN']]
## function (x, ...)
## UseMethod("mean")
## <bytecode: 0x7fe56001a8e8>
## <environment: namespace:base>
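The ci column reports 0.19 for series of n = 100 points. That value is consistent with the bound -1/n + 2/sqrt(n); this is inferred from the printed output rather than taken from the package documentation, so the formula below should be read as an assumption. A quick check in base R, next to the common large-sample approximation:

```r
n <- 100

# Assumed formula, inferred from the printed ci value of 0.19:
ci_assumed <- -(1 / n) + 2 / sqrt(n)

# Common large-sample approximation, for comparison:
ci_normal <- qnorm(0.975) / sqrt(n)

c(assumed = ci_assumed, normal = ci_normal)
```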
The data frames are useful for plotting the ACFs with other packages. This example is from the vignette accompanying the article by @BatesEtal:
# Plot individual participants with the package lattice:
library(lattice)
out <- acf_resid(m1, split_pred=c("Subject"), plot=FALSE, return_all=TRUE)$dataframe
civec = out[out$lag==0,]$ci
xyplot(acf ~ lag | event, type = "h", data = out, col.line = "black",
panel = function(...) {
panel.abline(h = civec[panel.number()], col.line = "grey")
panel.abline(h = -civec[panel.number()], col.line = "grey")
panel.abline(h = 0, col.line = "black")
panel.xyplot(...)
},
strip = strip.custom(bg = "grey90"),
par.strip.text = list(cex = 0.8),
xlab="lag", ylab="autocorrelation")
When an AR1 model is included in a gam or bam model, the function acf_resid automatically corrects for it:
# generate AR start column:
simdat <- start_event(simdat, column="Time", event="Event")
head(simdat)
## Group Time Trial Condition Subject Y Event start.event
## 1 Adults 0.00000 -10 -1 a01 0.7554469 a01.-10 TRUE
## 2 Adults 20.20202 -10 -1 a01 2.7834759 a01.-10 FALSE
## 3 Adults 40.40404 -10 -1 a01 1.9696963 a01.-10 FALSE
## 4 Adults 60.60606 -10 -1 a01 0.6814298 a01.-10 FALSE
## 5 Adults 80.80808 -10 -1 a01 1.6939195 a01.-10 FALSE
## 6 Adults 101.01010 -10 -1 a01 2.3651969 a01.-10 FALSE
# run GAMM with AR1 model:
m1 <- bam(Y ~ Group + te(Time, Trial, by=Group)
+ s(Time, Subject, bs='fs', m=1),
data=simdat, rho=.65, AR.start=simdat$start.event)
# plot normal acf, without correction for rho:
acf(resid(m1))
# plot corrected acf with acf_resid:
acf_resid(m1)
# plot normal acf with acf_plot, without correction for rho:
acf_plot(resid(m1), split_by=list(simdat$Subject))
# plot corrected acf plot with acf_plot:
acf_plot(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject))
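What "correcting for rho" means can be sketched in base R on simulated data. The example below is a conceptual illustration under simplifying assumptions (a single event, a known rho, no rescaling), not the computation performed by bam or resid_gam; the names e and e_corrected are hypothetical.

```r
set.seed(3)
rho <- 0.65

# Hypothetical residuals from a single event with AR(1) structure.
e <- as.numeric(arima.sim(model = list(ar = rho), n = 1000))

# Remove the part of each residual that is predicted by its predecessor.
# The first point has no predecessor and is dropped here.
e_corrected <- e[-1] - rho * e[-length(e)]

# Helper: autocorrelation at lag 1.
lag1 <- function(v) acf(v, lag.max = 1, plot = FALSE)$acf[2, 1, 1]

c(raw = lag1(e), corrected = lag1(e_corrected))
```

In a data set with many events the correction would restart at every row marked TRUE in the AR start column, so that the last point of one event is never used to correct the first point of the next.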
acf_plot
The function acf_plot is used for generating the ACF for individual time series, and can plot the averaged ACF. In contrast with acf_resid, the input needs to be a vector, and the grouping predictors are provided to the argument split_by as a list of vectors.
acf_plot(resid_gam(m1))
acf_plot(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject))
acf_n_plots
The function acf_n_plots is used for generating \(N\) ACF plots of individual time series. In contrast with acf_resid, the input needs to be a vector, and the grouping predictors are provided to the argument split_by as a list of vectors.
acf_n_plots(resid_gam(m1, incl_na=TRUE), split_by=list(simdat$Subject), n=6, random=TRUE)