The dendroTools R package provides some novel dendroclimatological methods to analyse the relationship between tree-ring and environmental data. In this document we give details about how to use our package. All the data included in examples bellow is a part of the dendroTools R package.
Please note, the examples presented here are made computationally less intensive to satisfy the CRAN policy. You are invited to explore the full potential of our package by using the wide range of possible window widths.
The daily_reponse() works by sliding a moving window through daily environmental data and calculates statistical metrics with one or more tree ring proxies. Possible metrics are correlation coefficient, coefficient of determination (r squared) or adjusted coefficient of determination (adjusted r squared). In addition to linear regression, it is possible to use nonlinear artificial neural network with Bayesian regularization training algorithm (brnn). In general, user can use a fixed or progressive window for calculations of moving averages. To use a fixed window, select its width by assigning an integer to the argument fixed_width. To use many different windows, define the lower_limit and upper_limit arguments. In this case, all window widths between the lower and upper limits will be considered. In the later text, window width representative of a specific day of year (DOY) is defined as the values for this particular day and number of subsequent days corresponding to window width. All calculated metrics are stored in a matrix, which is later used to find the optimal window (i.e. optimal consecutive sequence of days), that returns the highest calculated metric. This selection is later used to calculate temporal stability and cross validation.
In this example, we analyse the relationship between MVA and daily temperature data for wind width of 60 days (argument window_width = 60). Importantly, the row_names_subset argument is set to TRUE. This argument automatically subsets both data frames (i.e., environmental and tree-ring data) and keep only matching years, which are used for calculations. To use this feature, years must be included as row names. All insignificant correlations are removed by setting the argument remove_insignificant to TRUE. The threshold for significance is set with the alpha argument.
# Load the dendroTools R package
library(dendroTools)
# Load data
data(data_MVA)
data(LJ_daily_temperatures)
# Example with fixed width
example_fixed_width <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
method = "cor", fixed_width = 60,
row_names_subset = TRUE, remove_insignificant = TRUE,
alpha = 0.05)
example_fixed_width$plot_extreme
Figure 1: The MVA parameter contains the optimal temperature signal from March 14 (DOY 73) to May 12 (DOY 132).
In the exploration of tree-climate relationships, researchers might be interested in how this relationship is different between past and present. The key argument for such analysis is subset_years, which defines the subset of years to be analysed. In the following example, we will analyse the relationship between MVA and daily temperature data for two periods, 1940 – 1980 and 1981 – 2010. All possible window widths between 30 and 70 days, including the previous year, will be considered. The latter is achieved by setting the previous_year argument to TRUE. Importantly, the row_names_subset argument is set to TRUE. This argument automatically subsets both data frames (i.e., environmental and tree-ring data) and keep only matching years, which are used for calculations. To use this feature, years must be included as row names. All insignificant correlations are removed by setting the argument remove_insignificant to TRUE. The threshold for significance is set with the alpha argument. After the function converges, we will plot the temporal patterns for past and present. In addition, we are interested in specific window width with span of 60 days (use plot_specific_window argument).
# Load the dendroTools R package
library(dendroTools)
# Load the data
data(data_MVA)
data(LJ_daily_temperatures)
# Example for past and present
example_MVA_past <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
method = "cor", lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, previous_year = TRUE,
remove_insignificant = TRUE, alpha = 0.05,
plot_specific_window = 60, subset_years = c(1940, 1980))
example_MVA_present <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
method = "cor", lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, previous_year = TRUE,
remove_insignificant = TRUE, alpha = 0.05,
plot_specific_window = 60, subset_years = c(1981, 2010))
example_MVA_past$plot_heatmap
Figure 2: The temporal correlations pattern for the past
example_MVA_present$plot_heatmap
Figure 3: The temporal correlations pattern for the present
example_MVA_past$plot_specific
Figure 4: The temporal correlations for pre-defined window of 60 days for past years
example_MVA_present$plot_specific
Figure 5: The temporal correlations for pre-defined window of 60 days for present years
The climate-tree relationships between past and present are similar. However, there are higher correlations with previous year temperatures for the present.
Principal Component Analysis (PCA) is commonly used with tree-ring data to reduce the full set of original tree-ring chronologies to a more manageable set of transformed variables. These transformed variables, the set of principal component scores, are then used as predictors in climate reconstruction models. The PCA also acts to strengthen the common regional-scale climate response within a group of tree-ring chronologies by concentrating the common signal in the components with the largest eigenvalues.
To use PCA regression within the daily_response(), set the argument PCA_transformation to TRUE. All variables in the response data frame will be transformed using PCA transformation. If the parameter log_preprocess is set to TRUE, variables will be transformed with logarithmic transformation before used in PCA. With the argument components_selection, we specify how to select PC scores that will be used as predictors. There are three options: “automatic”, “manual” and “plot_selection”. If argument is set to “automatic”, all PC scores with eigenvalues above 1 will be selected. This threshold could be changed by changing the eigenvalues_threshold argument. If argument is set to “manual”, user should set the number of components with N_components argument. If components_selection is set to “plot_selection”, A scree plot will be shown, and a user must manually enter the number of components to be used as predictors. The latter seems to be the most reasonable choice.
In our example, we use PCA on 10 individual Mean Vessel Area (MVA) chronologies (example_proxies_individual). For climate data, LJ_daily_temperatures data is used. The selection of components is set to “manual”, N_components of components to be used in the later analysis is set to 2. All window widths between 30 and 70 days, will be considered. Importantly, the row_names_subset argument is set to TRUE. This argument automatically subsets both data frames (i.e., env_data and response) and keep only matching years, which are used for calculations. To use this feature, years must be included as row names. All insignificant correlations are removed by setting the argument remove_insignificant to TRUE. The threshold for significance is controlled by the alpha argument.
# Load the dendroTools R package
library(dendroTools)
# Load data
data(example_proxies_individual)
data(LJ_daily_temperatures)
# Example PCA
example_PCA <- daily_response(response = example_proxies_individual,
env_data = LJ_daily_temperatures, method = "lm",
lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, remove_insignificant = TRUE,
alpha = 0.01, PCA_transformation = TRUE,
components_selection = "manual", N_components = 2)
# Get the summary statistics for the PCA
summary(example_PCA$PCA_output)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 1.8691406 1.5007125 1.1867201 0.87168877 0.79432916
## Proportion of Variance 0.3493686 0.2252138 0.1408305 0.07598413 0.06309588
## Cumulative Proportion 0.3493686 0.5745824 0.7154129 0.79139704 0.85449292
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.68818504 0.61467322 0.57233447 0.49611738
## Proportion of Variance 0.04735987 0.03778232 0.03275667 0.02461325
## Cumulative Proportion 0.90185279 0.93963510 0.97239178 0.99700502
## Comp.10
## Standard deviation 0.173060035
## Proportion of Variance 0.002994978
## Cumulative Proportion 1.000000000
example_PCA$plot_heatmap
Figure 6: The temporal pattern for the r.squared. The highest coefficients of determination were calculated for DOY around 90 with span of 2 months.
The daily_reponse() recognizes negative relationship (negative correlations) between proxy and daily climate data. Here we show a simple example with negative correlation coefficients. For the demonstration, we use data_TRW_1 and LJ_daily_temperatures.
# Load the dendroTools R package
library(dendroTools)
# Load data
data(data_TRW_1)
data(LJ_daily_temperatures)
# Example negative correlations
data(data_TRW_1)
example_neg_cor <- daily_response(response = data_TRW_1, env_data = LJ_daily_temperatures,
method = "cor", lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, remove_insignificant = TRUE,
alpha = 0.05)
example_neg_cor$plot_heatmap
Figure 7: The temporal pattern of correlation coefficients.
example_neg_cor$plot_extreme
Figure 8: The highest calculated correlation coefficient.
# The temporal stability of correlations
example_neg_cor$temporal_stability
## Period correaltion
## 1 1961 - 1971 -0.534
## 2 1972 - 1981 -0.554
## 3 1982 - 1991 0.002
## 4 1992 - 2001 -0.335
## 5 2002 - 2012 -0.486
Reconstructions of past climate conditions include reconstructions of past temperature, precipitation, vegetation, streamflow, sea surface temperature, and other climatic or climate-dependent conditions. With the daily_response(), we first find the optimal sequence of consecutive days that returns the highest metric. In this example, we use data_TRW and daily temperatures from Kredarica (KRE_daily_temperatures). The temporal stability of statistical relationship will be analysed with “progressive” method, using 3 splits (k = 3). Progressive check splits data into k splits, calculates metric for the first split and then progressively adds 1 split at a time and calculates selected metric. The argument cross_validation_type is set to “randomized”, so years will be shuffled before the cross-validation test.
# Load the dendroTools R package
library(dendroTools)
# Load data
data(data_TRW)
data(KRE_daily_temperatures)
example_reconstruction_lin <- daily_response(response = data_TRW,
env_data = KRE_daily_temperatures,
method = "lm", metric = "r.squared",
lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE,
temporal_stability_check = "progressive",
cross_validation_type = "randomized", k = 3)
example_reconstruction_lin$plot_extreme
Figure 9: The highest calculated coefficient of determination.
The highest r squared was calculated for the period from May 15 to June 27. The aggregated (averaged) daily data for this perod is saved in a data frame $optimized_return. Before reconstructing the May 15 to June 27 mean temperature, we should check temporal stability, cross validation and transfer function.
example_reconstruction_lin$temporal_stability
## Period r.squared
## 1 1955 - 1963 0.098
## 2 1955 - 1972 0.222
## 3 1955 - 1981 0.362
example_reconstruction_lin$cross_validation
## CV Period cor RMSE RE CE
## 1 1 Calibration 0.5028338 0.7240896 NA NA
## 2 1 Validation 0.6240327 0.9963485 0.44991226 0.07910855
## 3 2 Calibration 0.5559560 0.8068514 NA NA
## 4 2 Validation 0.6677315 0.8160608 0.45762402 0.15541709
## 5 3 Calibration 0.6717064 0.8075798 NA NA
## 6 3 Validation 0.1817090 0.7697515 0.01436086 -0.04833285
example_reconstruction_lin$transfer_function
Figure 10: Linear transfer function
A sample code for climate reconstruction with the lm method:
linear_model <- lm(Optimized_return ~ TRW, data = example_reconstruction_lin$optimized_return)
reconstruction <- data.frame(predictions = predict(linear_model, newdata = data_TRW))
plot(row.names(data_TRW), reconstruction$predictions, type = "l", xlab = "Year", ylab = "Mean temperature May 15 - Jun 27 [ºC]")
Figure 11: The reconstructed average temperature May 15 - June 27 with linear model
In this example, we use the same data and procedure as in previous one, with the exception of method. Here, we perform a climate reconstruction with the brnn method.
# Load the dendroTools and brnn R package
library(dendroTools)
library(brnn)
## Loading required package: Formula
# Load data
data(data_TRW)
data(KRE_daily_temperatures)
example_reconstruction_brnn <- daily_response(response = data_TRW,
env_data = KRE_daily_temperatures,
method = "brnn", metric = "r.squared",
lower_limit = 40, upper_limit = 45,
row_names_subset = TRUE,
temporal_stability_check = "progressive",
cross_validation_type = "randomized", k = 3)
example_reconstruction_brnn$plot_extreme
Figure 12: The highest calculated coefficient of determination for the brnn model.
The results for the brnn model are the same as for the linear model. The highest r squared was calculated for the period from May 15 to June 27. The aggregated (averaged) daily data for this perod is saved in a data frame $optimized_return. Before reconstructing the May 15 to June 27 mean temperature, we should check temporal stability, cross validation and transfer function.
example_reconstruction_brnn$temporal_stability
## Period r.squared
## 1 1955 - 1963 -0.013
## 2 1955 - 1972 -0.002
## 3 1955 - 1981 0.348
example_reconstruction_brnn$cross_validation
## CV Period cor RMSE RE CE
## 1 1 Calibration 0.5642525 0.6635147 NA NA
## 2 1 Validation 0.5912647 1.0713228 0.3302211 0.3059996
## 3 2 Calibration 0.6148061 0.8777381 NA NA
## 4 2 Validation 0.4283027 0.6246597 0.2961269 0.1440352
## 5 3 Calibration 0.5810958 0.8562066 NA NA
## 6 3 Validation 0.6336912 0.7064833 0.3670393 0.3631613
example_reconstruction_brnn$transfer_function
Figure 13: Nonlinear brnn transfer function
A sample code for climate reconstruction with the brnn method:
brnn_model <- brnn(Optimized_return ~ TRW, data = example_reconstruction_brnn$optimized_return)
## Number of parameters (weights and biases) to estimate: 6
## Nguyen-Widrow method
## Scaling factor= 1.4
## gamma= 1.9577 alpha= 0.9523 beta= 3.7442
reconstruction <- data.frame(predictions = predict(brnn_model, newdata = data_TRW))
plot(row.names(data_TRW),reconstruction$predictions, type = "l", xlab = "Year",
ylab = "Mean temperature May 15 - Jun 27 [ºC]")
Figure 14: The reconstructed average temperature May 15 - June 27 with the nonlinear brnn model
The current version of the daily_response() does not allow for using more than 1 daily environmental data, but the analysis with, e.g., daily temperature and precipitation data could be done separately and visualised together. In the following example, we analyse the relationship between Mean Vessel Area (MVA) chronology and daily temperature and precipitation data for the period 1960 – 2010. The period is defined with the subset_ years argument.
The daily precipitation data is in tidy form, i.e. data frame with three columns representing “Year”, “DOY” and “Precipitation”. When tidy data is used, the argument tidy_env_data has to be set to TRUE.
# Load the dendroTools and brnn R package
library(dendroTools)
# Load data
data(data_MVA)
data(LJ_daily_temperatures)
data(LJ_daily_precipitation)
# Example with precipitation and temperatures
example_MVA_TEMP <- daily_response(response = data_MVA, env_data = LJ_daily_temperatures,
method = "cor", lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, previous_year = FALSE,
remove_insignificant = TRUE, alpha = 0.05,
tidy_env_data = FALSE, subset_years = c(1960, 2010))
example_MVA_PREC <- daily_response(response = data_MVA, env_data = LJ_daily_precipitation,
method = "cor", lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, previous_year = FALSE,
remove_insignificant = TRUE, alpha = 0.05,
tidy_env_data = TRUE, subset_years = c(1960, 2010))
library(gridExtra)
grid.arrange(example_MVA_TEMP$plot_heatmap, example_MVA_PREC$plot_heatmap)
Figure 15: The temporal pattern of correlatioins for temperatures (up) and precipitation (down)
grid.arrange(example_MVA_TEMP$plot_extreme, example_MVA_PREC$plot_extreme)
Figure 16: The highest calculated correlations for temperatures (up) and precipitation (down)
The daily_response() enables the analysis of the relationship between daily environmental data and more than one response (proxy) variable, simultaneously. However, users should select multiple proxies reasonably and with caution, since there is nothing to prevent from including colinear variables. Using several proxies will result in higher explained variance but at the cost of degrees of freedom. In those cases, users should check the temporal stability of relationship and the cross-validation result. If metrics on validation data are much lower than on calibration data, there is a problem of overfitting and users should exclude some proxies and repeat the analysis.
# Load the dendroTools and brnn R package
library(dendroTools)
# Example of multiproxy analysis
data(example_proxies_1)
data(LJ_daily_temperatures)
# Summary of the example_proxies_1 data frame
summary(example_proxies_1)
## MVA O18 TRW
## Min. :0.03812 Min. :-2.12315 Min. :0.7851
## 1st Qu.:0.04204 1st Qu.:-0.54584 1st Qu.:0.8918
## Median :0.04482 Median :-0.04609 Median :1.0049
## Mean :0.04532 Mean : 0.04739 Mean :1.0197
## 3rd Qu.:0.04752 3rd Qu.: 0.48215 3rd Qu.:1.1159
## Max. :0.05608 Max. : 2.66724 Max. :1.3782
There are three proxy variables: Tree-ring width (TRW), stable oxygen isotope ratio (O18) and Mean Vessel Area (MVA).
cor(example_proxies_1)
## MVA O18 TRW
## MVA 1.0000000 0.2114934 0.0342514
## O18 0.2114934 1.0000000 0.2702639
## TRW 0.0342514 0.2702639 1.0000000
The correlation matrix shows only low correlations among the three proxies.
example_multiproxy <- daily_response(response = example_proxies_1,
env_data = LJ_daily_temperatures,
method = "lm", metric = "adj.r.squared",
lower_limit = 30, upper_limit = 70,
row_names_subset = TRUE, previous_year = FALSE,
remove_insignificant = TRUE, alpha = 0.05)
example_multiproxy$plot_heatmap
Figure 17: The temporal pattern of r squared for the multiproxy example)
The temporal pattern shows similar result than for the example with MVA only. Therefore, the highest percentage of explained variance is probable due to the MVA variable. TRW and O18 are probable not significant variables. Let’s check this out.
liner_model <- lm(Optimized_return ~ ., data = example_multiproxy$optimized_return)
summary(liner_model)
##
## Call:
## lm(formula = Optimized_return ~ ., data = example_multiproxy$optimized_return)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6020 -0.5727 -0.2066 0.5311 2.1176
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.33773 1.53493 -0.872 0.388
## MVA 255.31565 27.25022 9.369 1.13e-12 ***
## O18 -0.02921 0.13109 -0.223 0.825
## TRW 1.11592 0.86094 1.296 0.201
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8556 on 51 degrees of freedom
## Multiple R-squared: 0.6472, Adjusted R-squared: 0.6265
## F-statistic: 31.19 on 3 and 51 DF, p-value: 1.358e-11
For the multiproxy approach, there is no transfer function created.
example_multiproxy$transfer_function
## [1] "No transfer function is created for two or more response variables"