This document describes an analysis of psychological well-being using multivariate tree boosting (Miller, Lubke, McArtor, & Bergeman, 2015).
Identifying the factors that impact well-being in aging adults is an important step to understanding successful aging and decreasing the risk for pathological aging (Wallace et al., 2002). Previous research has identified that high resilience, coping strategies, social support from family and friends, good physical health, and the lack of stress and depression are important to successful aging (Wallace et al., 2002). In our exploratory analysis, we included these predictors as well as several new predictors — control of internal states, trait-ego resilience, and hardiness — and investigated the extent to which these predictors influenced particular aspects of well-being. Most research has focused on a well-being aggregate score, and little is known about whether the influence of these predictors varies across the different sub-scales of well-being.
The Psychological Well-Being Scale (Ryff & Keyes, 1995) has six sub-scales: autonomy, environmental mastery, personal growth, positive relationships with others, purpose in life, and self-acceptance. These were used as dependent variables in the analysis. Gender, age, income, and education were included as demographic predictors. The primary predictors of interest were chronic, somatic, and self-reported health, depression (positive and negative affect), perceived social control, control of internal states, sub-scales of dispositional resilience (commitment, control, and challenge), ego resilience, social support (friends and family), self-reported stress (problems, emotions), and loneliness. Each sub-scale score is continuous, and approximately normally distributed. In total, 20 predictors were included in the analysis.
The code below loads the data, standardizes the outcomes and the continuous predictors, and assigns descriptive labels.
#install.packages("mvtboost")
library(mvtboost)
data(wellbeing)
Y <- wellbeing[,21:26]
X <- wellbeing[,1:20]
Ys <- scale(Y)
ynames <- c("Autonomy","Environmental Mastery","Personal Growth","Positive Relationships","Purpose in Life","Self Acceptance")
xnames <- c("Gender","Age","Income","Education","Chronic Health","Somatic Health","Self Report Health","Positive Affect","Negative Affect","Perceived Social Control","Control Internal States","Commitment","Control","Challenge","Ego Resilience","Social Support - Friends","Social Support - Family","Stress-Problems","Stress-Emotions","Loneliness")
cont.id <- unlist(lapply(X,is.numeric))
Xs <- X
Xs[,cont.id] <- scale(X[,cont.id])
colnames(Xs) <- xnames
colnames(Ys) <- ynames
res <- mvtb(Y=Ys,X=Xs)   # fit with package defaults
res5 <- mvtb(Y=Ys,X=Xs,n.trees=10000,shrinkage=.005,cv.folds=5,compress=FALSE)   # 5-fold CV
As with the univariate procedure, the number of trees can be chosen to minimize a test or cross-validation estimate of the prediction error. In mvtb, a useful criterion is the multivariate mean-squared error, summed over the outcome variables.
res5$best.trees
## $best.testerr
## integer(0)
##
## $best.cv
## [1] 2482
##
## $last
## [1] 10000
Most procedures in the mvtboost package will by default automatically select the lowest estimated-best number of trees, corresponding to a minimally complex model, if the number of trees is not specified.
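The selected number of trees can also be passed explicitly when computing fitted values. A minimal sketch, assuming that predict.mvtb accepts an n.trees argument as gbm's predict method does:

```r
# Fitted values at the CV-selected number of trees
# (the n.trees argument is assumed to behave as in gbm)
yhat.cv <- predict(res5, newdata = Xs, n.trees = res5$best.trees$best.cv)
```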
Choosing the model complexity that minimizes the CV error, rather than the training error, protects against overfitting.
One of the challenges of using multivariate decision tree ensembles is that the model is more difficult to interpret than a single tree. While tree boosting can be used to build a very accurate predictive model, it is often more important for researchers to interpret the effects of predictors. Below, we describe approaches that have been developed to 1) identify predictors with effects on individual outcome variables, 2) identify groups of predictors that jointly influence one or more outcome variables, 3) visualize the functional form of the effect of important predictors, and 4) detect predictors with possible interactions or non-linear effects.
The first goal in interpretation is to identify which predictors influence which outcome variables. The influence (or variable importance) of each predictor from the tree ensemble has been defined as the reduction in sums of squared error due to any split on that predictor, summed over all trees in the model (Friedman, 2001).
Below, we compute the relative influences for the well-being data and plot them as a heat map. Using the ‘mvtb.ri’ function, we can make the influence scores relative per outcome (‘col’) or across all outcomes (‘tot’). By default, importances are relative to the column.
summary(res5,covex=FALSE)
round(mvtb.ri(res5,relative = "tot"),2)
numformat <- function(val){sub("^(-?)0.", "\\1.", sprintf("%.1f", val))}
par(mar=c(8,10,1,1))
mvtb.heat(t(mvtb.ri(res5)),clust.method = NULL,cexRow=1,cexCol=1,numformat=numformat)
We see that control of internal states affects all aspects of psychological well-being except positive relationships with others. Like control of internal states, perceived stress-problems affects three aspects of well-being: self-acceptance, purpose in life, and environmental mastery. Personal growth is driven by control of internal states and ego resilience. Other patterns in the influences can be interpreted similarly, and conform to theoretical expectations (e.g. Wallace et al., 2002; Ryff & Keyes, 1995).
Because selecting variables based on relative influence scores can be problematic, we should also check the fit of the model. If the model explains little or no variance in the outcomes, there is no reason to use it for variable selection. For the well-being data, we compute the R2 for each dependent variable below.
yhat <- predict(res5,newdata=Xs)
diag(var(yhat)/var(Ys))
## Autonomy Environmental Mastery Personal Growth
## 0.2586866 0.6358911 0.4378024
## Positive Relationships Purpose in Life Self Acceptance
## 0.6353121 0.5771950 0.5834466
In addition to selecting predictors for inclusion into a subsequent multivariate model (e.g. a multivariate regression model or SEM), it may also be informative to select the outcome variables that are associated with the set of predictors. One criterion for selecting outcome variables is to choose the outcome variables whose covariance can be explained by a function of a common set of predictors. This approach could be used to, for example, 1) identify a set of demographic predictors that similarly affect particular symptoms of a disorder, 2) indicate to what extent covariance in sub-scales of a construct is due to effects of predictors.
The covariance explained in each pair of outcomes by the predictors is estimated directly in mvtb. The covariance-explained matrix can be organized as a \(Q(Q+1)/2 \times p\) table (where \(Q\) is the number of outcomes and \(p\) is the number of predictors), where each element is the covariance explained in a pair of outcomes by a given predictor.
When the outcomes are standardized to unit variance, each element can be interpreted as the correlation explained in any pair of outcomes by predictor \(X_j\). This decomposition is similar to decomposing R2 in multiple regression. When the trees of the ensemble are limited to a single split and the predictors are independent, this decomposition is exact, otherwise it is approximate.
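As a rough check on this decomposition, the row sums of the covariance-explained matrix (summing over predictors for each pair of outcomes) can be compared with the covariance actually explained by the model, computed from the fitted values. This is a sketch: it assumes res5$covex has one row per pair of outcomes, ordered like the lower triangle of the outcome covariance matrix, and the agreement is only approximate here because the trees have more than one split and the predictors are correlated:

```r
# Covariance of the outcomes explained by the model:
# cov(Y) minus the covariance of the residuals
yhat <- predict(res5, newdata = Xs)
cov.explained <- cov(Ys) - cov(Ys - yhat)

# Total covariance explained per pair of outcomes, summed over predictors,
# next to the corresponding elements of cov.explained (assumed same ordering)
cbind(rowSums(res5$covex),
      cov.explained[lower.tri(cov.explained, diag = TRUE)])
```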
For the well-being data, the covariance explained matrix is obtained directly from the fitted model as res5$covex. We also plot it as a heat map below:
par(mar=c(8,15,1,1),mfrow=c(1,1))
mvtb.heat(res5$covex[,-c(1:7)],cexRow=.9,numformat=numformat,clust.method = NULL)
We see that negative affect and stress problems have widespread effects on well-being. Control of internal states explains correlations across all dimensions, and is the primary explanatory predictor for autonomy. Similarly, stress, which can be detrimental to well-being, most strongly affects purpose in life and environmental mastery. Unsurprisingly, loneliness and social support from friends primarily affect positive relationships with others. Ego resilience mainly affects personal growth.
If the number of predictors or outcomes is large, interpreting the matrix by eye is challenging. The covariance-explained matrix can be clustered by first computing the distances between columns (predictors) and between rows (pairs of outcomes). Predictors that explain similar patterns of covariance in the outcomes will be closer together (have smaller distance), as will pairs of outcomes that are functions of a similar set of predictors. The resulting distance matrices can then be used to group rows and columns by hierarchical clustering. This corresponds to grouping predictors that explain covariance in similar pairs of outcomes, and grouping pairs of outcomes that depend on similar sets of predictors.
Clustering the matrix can be done with mvtb.cluster. Below we plot the clustered solution as a heat map with mvtb.heat.
mvtb.cluster(res5)
par(mar=c(8,12,1,1),mfrow=c(1,1))
mvtb.heat(res5$covex[,-c(1:7)],cexRow=.9)
Partial dependence plots complement interpretations of relative influence by showing the direction and functional form of a predictor's effect. Identifying non-linear effects with plots can also help prevent model misspecification if a parametric model is the final goal.
Here we show the univariate and multivariate perspective plots. The first plot shows that above-average control of internal states is associated with larger personal growth. The second shows the non-additive effect of control of internal states and perceived stress problems on self-acceptance.
par(mfcol=c(1,2),mar=c(5,5,4,1))
plot(res5,predictor.no=11,response.no=3,ylim=c(-1,1.5)) # personal growth on control of internal states
text(-4,1.825,labels="A",xpd=TRUE)
mvtb.perspec(res5,predictor.no=c(11,18),response.no=6)
text(-.5,.5,labels="B",xpd=TRUE)
Though decision trees model interactions, it is difficult to detect and interpret interaction effects from a decision tree ensemble. To address this issue, we can again analyze the fitted values of the model. Following Elith et al. (2008), possible two-way interactions can be detected by checking whether the fitted values, as a function of any pair of predictors, deviate from a linear combination of the two predictors. Such departures indicate that the joint effect of the predictors is not additive, suggesting a non-linear effect or a possible interaction.
A check of departures from additivity can be accomplished by computing the fitted values for any pair of predictors, over a grid of all possible levels for the two variables. For continuous predictors, 100 sample values are taken. The fitted values are then regressed onto the grid. Large residuals from this model indicate the fitted values are not a linear combination of the predictors, demonstrating non-linearity or a possible interaction. For computational simplicity with many predictors, this might be done only for pairs of important variables.
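The idea behind this check can be sketched directly. The following is a simplified illustration of the heuristic, not the mvtb.nonlin implementation; it assumes all predictors in Xs are numeric, and uses columns 11 and 18 (control of internal states and stress-problems) with self-acceptance (outcome 6):

```r
# Grid of 100 sample values for each of the two predictors
x1 <- seq(min(Xs[,11]), max(Xs[,11]), length.out = 100)
x2 <- seq(min(Xs[,18]), max(Xs[,18]), length.out = 100)
grid <- expand.grid(x1 = x1, x2 = x2)

# Hold all remaining predictors at their means
newX <- as.data.frame(lapply(Xs, function(x) rep(mean(x), nrow(grid))))
newX[, 11] <- grid$x1
newX[, 18] <- grid$x2

# Fitted values for self-acceptance over the grid
fit <- predict(res5, newdata = newX)[, 6]

# Regress the fitted surface on an additive (linear) model of the two
# predictors; a large residual sum of squares indicates non-additivity
sum(resid(lm(fit ~ grid$x1 + grid$x2))^2)
```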
We note that this approach is primarily a heuristic for interpreting the model. A variable with a non-additive effect (e.g. a non-linear effect like control of internal states) can produce bivariate departures from additivity which are not necessarily interactions.
Several methods for detecting departures are available (‘grid’, ‘influence’, and ‘lm’), following different suggestions from the literature. All are heuristics and can give somewhat different results.
res.nl <- mvtb.nonlin(res5,Y=Ys,X=Xs)
Below, we show the largest departures for each outcome variable.
lapply(res.nl,function(r){head(r[[1]])})
## $Autonomy
## var1.index var1.names var2.index var2.names
## 1 14 Challenge 11 Control Internal States
## 2 13 Control 11 Control Internal States
## 3 11 Control Internal States 2 Age
## 4 19 Stress-Emotions 11 Control Internal States
## 5 11 Control Internal States 8 Positive Affect
## 6 11 Control Internal States 5 Chronic Health
## nonlin.size
## 1 43.26387
## 2 41.50832
## 3 38.31550
## 4 37.32876
## 5 36.19117
## 6 35.76509
##
## $`Environmental Mastery`
## var1.index var1.names var2.index var2.names
## 1 18 Stress-Problems 11 Control Internal States
## 2 11 Control Internal States 9 Negative Affect
## 3 19 Stress-Emotions 11 Control Internal States
## 4 11 Control Internal States 10 Perceived Social Control
## 5 20 Loneliness 11 Control Internal States
## 6 11 Control Internal States 7 Self Report Health
## nonlin.size
## 1 26.65935
## 2 24.61075
## 3 24.59578
## 4 24.03992
## 5 23.11341
## 6 23.00445
##
## $`Personal Growth`
## var1.index var1.names var2.index var2.names
## 1 15 Ego Resilience 11 Control Internal States
## 2 14 Challenge 11 Control Internal States
## 3 16 Social Support - Friends 11 Control Internal States
## 4 11 Control Internal States 2 Age
## 5 13 Control 11 Control Internal States
## 6 12 Commitment 11 Control Internal States
## nonlin.size
## 1 69.02447
## 2 68.81142
## 3 62.94969
## 4 62.63443
## 5 62.29467
## 6 61.76429
##
## $`Positive Relationships`
## var1.index var1.names var2.index var2.names
## 1 16 Social Support - Friends 11 Control Internal States
## 2 20 Loneliness 16 Social Support - Friends
## 3 20 Loneliness 11 Control Internal States
## 4 16 Social Support - Friends 15 Ego Resilience
## 5 16 Social Support - Friends 12 Commitment
## 6 15 Ego Resilience 11 Control Internal States
## nonlin.size
## 1 20.18334
## 2 19.26420
## 3 18.06628
## 4 15.79557
## 5 15.14645
## 6 14.59766
##
## $`Purpose in Life`
## var1.index var1.names var2.index var2.names
## 1 18 Stress-Problems 11 Control Internal States
## 2 12 Commitment 11 Control Internal States
## 3 14 Challenge 11 Control Internal States
## 4 11 Control Internal States 10 Perceived Social Control
## 5 11 Control Internal States 9 Negative Affect
## 6 13 Control 11 Control Internal States
## nonlin.size
## 1 24.80944
## 2 24.80262
## 3 24.02665
## 4 22.97823
## 5 22.24065
## 6 21.53987
##
## $`Self Acceptance`
## var1.index var1.names var2.index var2.names
## 1 18 Stress-Problems 11 Control Internal States
## 2 12 Commitment 11 Control Internal States
## 3 11 Control Internal States 9 Negative Affect
## 4 14 Challenge 11 Control Internal States
## 5 19 Stress-Emotions 11 Control Internal States
## 6 11 Control Internal States 10 Perceived Social Control
## nonlin.size
## 1 15.19187
## 2 13.15652
## 3 13.11163
## 4 12.91632
## 5 12.90175
## 6 12.82175
Elith, J., Leathwick, J. R., & Hastie, T. (2008). A working guide to boosted regression trees. Journal of Animal Ecology, 77(4), 802-813.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189-1232.
Miller, P. J., Lubke, G. H., McArtor, D. B., & Bergeman, C. S. (2015). Finding structure in data with multivariate tree boosting.
Ridgeway, G. (2013). gbm: Generalized boosted regression models. R package.
Wallace, K. A., Bergeman, C. S., & Maxwell, S. E. (2002). Predicting well-being outcomes in later life: An application of classification and regression tree (CART) analysis.