We introduce a Statistical Approach via Pseudo-value Information and Estimation for Differential Network Analysis (SOHPIE; pronounced as “Sofie”) [1]. This is a regression modeling method for differential network (DN) analysis that can include covariate information in analyzing microbiome data.
Please install these R packages before using SOHPIE-DNA.
# library(robustbase) # To fit a robust regression.
# library(parallel) # To use mclapply() when reestimating the association matrix.
# library(dplyr) # For the convenience of tabulating p-values, coefficients, and q-values.
# library(fdrtool) # For false discovery rate control.
# library(gtools) # To estimate an association matrix via SparCC.
Two sample datasets are available in this package. One (combinedamgut) is from the American Gut Project [2] and contains 138 taxa and 268 subjects. In this user manual, only the first 30 of the 138 taxa are used, to keep the demonstration simple. The other (combineddietswap) is from the geographical epidemiology study of a diet swap intervention [3] and includes 112 taxa and 37 subjects (20 African Americans from Pittsburgh and 17 rural South Africans). The full data of each study are available in the SpiecEasi and microbiome R packages, respectively.
The main grouping variable is the indicator for the status of living with a dog. After data processing, the subject indices are available for each group: ‘Not living with a dog’ (Group A) and ‘Living with a dog’ (Group B). We need these indices to estimate the group-specific \(p \times p\) association matrices, and later to re-estimate the association matrices for the pseudo-value calculations.
# Note: Again, we will use a toy example with the first 30 out of 138 taxa.
OTUtab = combinedamgut[ , 8:37]
# Clinical/demographic covariates (phenotypic data):
# Note: All of the covariates in phenodat below will be included in the regression
# when you use the SOHPIE_DNA function later. Please make sure that
# phenodat contains only the variables you want to analyze.
phenodat = combinedamgut[, 1:7] # first column is ID, so not using it.
# Obtain indices of each grouping factor.
# In this example, a variable indicating the status of living with a dog was chosen (i.e. bin_dog).
# Accordingly, Groups A and B imply living without and with a dog, respectively.
newindex_grpA = which(combinedamgut$bin_dog == 0)
newindex_grpB = which(combinedamgut$bin_dog == 1)
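To illustrate why the group indices matter, the following minimal sketch splits a toy count table by group and estimates one \(p \times p\) association matrix per group. It is an illustration under stated assumptions, not the package's procedure: the simulated data, the object names (toy_otu, toy_group, idx_A, idx_B, assoc_A, assoc_B), and the use of Pearson correlation in place of SparCC are all introduced here for demonstration only.

```r
# Toy sketch (base R only). Assumptions: simulated counts, and Pearson
# correlation standing in for the SparCC estimator used by the package.
set.seed(123)
n_subjects <- 12
n_taxa <- 4
toy_otu <- matrix(rpois(n_subjects * n_taxa, lambda = 20),
                  nrow = n_subjects,
                  dimnames = list(NULL, paste0("taxon", seq_len(n_taxa))))
toy_group <- rep(c(0, 1), each = n_subjects / 2)  # hypothetical binary indicator

# Indices of the subjects in each group, mirroring the processing step above.
idx_A <- which(toy_group == 0)
idx_B <- which(toy_group == 1)

# One p x p association matrix per group, estimated from that group's rows only.
assoc_A <- cor(toy_otu[idx_A, , drop = FALSE])
assoc_B <- cor(toy_otu[idx_B, , drop = FALSE])

dim(assoc_A)
dim(assoc_B)
```

Keeping the indices as separate objects means the same row subsets can be reused whenever an association matrix has to be re-estimated.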
Once the data processing step above is complete, we can fit a pseudo-value regression using the SOHPIE_DNA function. Important: provide the object names of the OTU table and of the clinical/demographic data (i.e. metadata) separately in the function call. In addition, you must supply the object names of the index vectors for each group of the binary indicator variable that is used as the main predictor (e.g. living with a dog vs. without a dog). The examples below assume that the fitted result is stored in an object named SOHPIEres.
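For readers who want to see the idea behind a pseudo-value regression, here is a minimal base R sketch. It is not the package implementation and makes several simplifying assumptions: Pearson correlation replaces SparCC, ordinary least squares via lm() replaces the robust regression, the data are simulated, and connectivity() and jackknife_pseudo() are hypothetical helper names. The jackknife pseudo-value for subject i within a group of size n_g is n_g times the full-sample estimate minus (n_g - 1) times the leave-one-out estimate.

```r
# Toy sketch (base R only); not the package implementation.
# Assumptions: Pearson correlation replaces SparCC, lm() replaces the robust
# regression, and connectivity() / jackknife_pseudo() are hypothetical helpers.
set.seed(42)
n <- 30
p <- 5
otu <- matrix(rpois(n * p, lambda = 15), nrow = n,
              dimnames = list(NULL, paste0("taxon", seq_len(p))))
group <- rep(c(0, 1), each = n / 2)  # binary main predictor
age <- round(runif(n, 20, 70))       # one additional covariate

# Per-taxon connectivity: mean absolute association with all other taxa.
connectivity <- function(mat) {
  A <- abs(cor(mat))
  diag(A) <- 0
  rowSums(A) / (ncol(mat) - 1)
}

# Jackknife pseudo-values within one group:
# n_g * (full estimate) - (n_g - 1) * (estimate with subject i left out).
jackknife_pseudo <- function(mat) {
  n_g <- nrow(mat)
  theta_full <- connectivity(mat)
  t(sapply(seq_len(n_g), function(i) {
    n_g * theta_full - (n_g - 1) * connectivity(mat[-i, , drop = FALSE])
  }))
}

# Fill an n x p matrix of pseudo-values, group by group.
pseudo <- matrix(NA_real_, nrow = n, ncol = p,
                 dimnames = list(NULL, colnames(otu)))
pseudo[group == 0, ] <- jackknife_pseudo(otu[group == 0, , drop = FALSE])
pseudo[group == 1, ] <- jackknife_pseudo(otu[group == 1, , drop = FALSE])

# One regression per taxon: pseudo-value ~ group + covariate.
fits <- lapply(seq_len(p), function(k) lm(pseudo[, k] ~ group + age))
coef_group <- sapply(fits, function(f) unname(coef(f)["group"]))
names(coef_group) <- colnames(otu)
coef_group
```

Each row of pseudo corresponds to one subject, which is what makes it possible to regress a network-level quantity on subject-level covariates.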
SOHPIE also provides convenience functions for use after fitting a pseudo-value regression. They let you quickly extract the names of taxa that are significantly differentially connected (DC; DCtaxa_tab), as well as adjusted p-values (q-values; qval and qval_specific_var) and coefficient estimates (coeff and coeff_specific_var), either for all variables included in the regression or for one specific variable.
# qval() returns a table of q-values (rows: taxa, columns: regression variables).
qval(SOHPIEres)
#> bin_dog age sex bin_floss bin_exercise cat_alcohol1
#> 326792 0.65184904 0.1389609733 0.90617331 0.05472762 0.7368758 0.27482645
#> 348374 0.58300569 0.2463704326 0.58241648 0.38025648 0.2493295 0.40114472
#> 181016 0.48996325 0.0913125647 0.87246819 0.56765606 0.4433666 0.48024987
#> 191687 0.57750814 0.0110468105 0.86275115 0.10030278 0.5383121 0.26896567
#> 305760 0.28750529 0.1752438021 0.60603659 0.58931824 0.6217049 0.09825625
#> 326977 0.20931924 0.5668848235 0.67190001 0.12065409 0.3597920 0.28059659
#> 194648 0.22253539 0.5332629827 0.64896236 0.05472762 0.7944077 0.22605114
#> 28186 0.36090220 0.0004095281 0.61171421 0.23415637 0.7085595 0.28961734
#> 541301 0.37686808 0.0050424024 0.79392896 0.11940747 0.7252734 0.47820453
#> 198941 0.15497052 0.5340275405 0.90737371 0.57932699 0.6579054 0.23606726
#> 353985 0.24596522 0.6198580596 0.85785119 0.55661429 0.2023414 0.02210682
#> 187524 0.59541018 0.6449340252 0.84454413 0.26575684 0.5515914 0.54860345
#> 182054 0.66167577 0.0886202057 0.38057886 0.26008467 0.7117856 0.25855048
#> 175537 0.48493346 0.4758601016 0.52449237 0.05472762 0.8121989 0.28767737
#> 9753 0.45091765 0.2809900536 0.91410489 0.53374251 0.6951400 0.06911578
#> 194211 0.23408249 0.4527357227 0.78797884 0.47187429 0.4613934 0.55318493
#> 188518 0.24425275 0.3958358319 0.91328534 0.69730083 0.6444959 0.20049119
#> 189396 0.41900690 0.2679203541 0.87290951 0.69564553 0.6466805 0.23556564
#> 90487 0.68779637 0.0722434282 0.90914994 0.55294579 0.6048882 0.20028213
#> 203708 0.67757620 0.6123421255 0.61715647 0.10261804 0.8063447 0.31020069
#> 173965 0.24713952 0.2124947280 0.88580005 0.38506751 0.8048103 0.24902645
#> 194661 0.01218206 0.5046086741 0.86654567 0.50931051 0.7330917 0.37324020
#> 512309 0.60922763 0.5682693101 0.81455958 0.40831356 0.6600038 0.54252226
#> 170124 0.66152769 0.0016034816 0.77208715 0.65814980 0.7107611 0.28570121
#> 216862 0.44045115 0.5521486658 0.87933814 0.08408232 0.4631340 0.32678431
#> 352304 0.66717079 0.5200318452 0.84973277 0.65578900 0.5957770 0.27854988
#> 191306 0.60299785 0.0783352663 0.07426693 0.50851045 0.2885065 0.37913124
#> 191541 0.66687266 0.6408471387 0.82777326 0.06454390 0.4227028 0.23663949
#> 191547 0.24657357 0.3218825745 0.86869995 0.18362275 0.7720381 0.42548884
#> 195493 0.42819397 0.2502292584 0.90741229 0.65392874 0.7755484 0.53809805
#> cat_alcohol2 bin_migraine
#> 326792 0.28342500 0.1994183306
#> 348374 0.24695632 0.2963821197
#> 181016 0.27372862 0.6781458306
#> 191687 0.24144296 0.1488682335
#> 305760 0.31904174 0.6720119714
#> 326977 0.07813721 0.6459028932
#> 194648 0.45471383 0.6835435515
#> 28186 0.27062487 0.6299888603
#> 541301 0.50805246 0.1914264363
#> 198941 0.20841234 0.3120604923
#> 353985 0.17087028 0.1755102727
#> 187524 0.43797294 0.0006258633
#> 182054 0.25053377 0.5509849337
#> 175537 0.38832256 0.4063732597
#> 9753 0.25576680 0.3182100461
#> 194211 0.33959878 0.3117821676
#> 188518 0.07813721 0.5813385833
#> 189396 0.26160370 0.2780263116
#> 90487 0.22589619 0.2071820407
#> 203708 0.25828792 0.0798324461
#> 173965 0.24483846 0.5675864474
#> 194661 0.30264968 0.4067857459
#> 512309 0.28074548 0.5802520112
#> 170124 0.42037666 0.2985925591
#> 216862 0.38326730 0.5371295483
#> 352304 0.47081255 0.2764918761
#> 191306 0.26664805 0.6517030651
#> 191541 0.26384779 0.4748121853
#> 191547 0.27502704 0.4780347064
#> 195493 0.26806946 0.5061621734
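As a rough illustration of what "adjusted p-values" means for a table like this one, the sketch below adjusts a small, made-up p-value table one column (variable) at a time. It uses the Benjamini-Hochberg method from base R's p.adjust() as a stand-in. The package itself lists fdrtool for false discovery rate control, so its q-values will generally differ, and it is an assumption here that the adjustment is applied per variable. The objects pval_toy and qval_toy are hypothetical.

```r
# Toy sketch: column-wise multiple-testing adjustment of a p-value table.
# Assumption: Benjamini-Hochberg via stats::p.adjust() as a stand-in for the
# fdrtool-based q-values; the p-values below are made up for illustration.
pval_toy <- matrix(
  c(0.001, 0.020, 0.300, 0.040,
    0.500, 0.010, 0.700, 0.900),
  ncol = 2,
  dimnames = list(c("taxonA", "taxonB", "taxonC", "taxonD"),
                  c("bin_dog", "age"))
)

# Adjust each column separately, i.e. across taxa for a given variable.
qval_toy <- apply(pval_toy, 2, p.adjust, method = "BH")
qval_toy
```

Adjusted values are never smaller than the raw p-values, which is why some taxa that look significant before adjustment are no longer significant afterwards.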
The qval_specific_var function retrieves the q-values of a single variable, bin_dog in this example.
# Create an object to keep the table with q-values.
qvaltab <- qval(SOHPIEres)
# Retrieve a vector of q-values for a single variable of interest.
qval_specific_var(qvaltab = qvaltab, varname = "bin_dog")
#> bin_dog
#> 326792 0.65184904
#> 348374 0.58300569
#> 181016 0.48996325
#> 191687 0.57750814
#> 305760 0.28750529
#> 326977 0.20931924
#> 194648 0.22253539
#> 28186 0.36090220
#> 541301 0.37686808
#> 198941 0.15497052
#> 353985 0.24596522
#> 187524 0.59541018
#> 182054 0.66167577
#> 175537 0.48493346
#> 9753 0.45091765
#> 194211 0.23408249
#> 188518 0.24425275
#> 189396 0.41900690
#> 90487 0.68779637
#> 203708 0.67757620
#> 173965 0.24713952
#> 194661 0.01218206
#> 512309 0.60922763
#> 170124 0.66152769
#> 216862 0.44045115
#> 352304 0.66717079
#> 191306 0.60299785
#> 191541 0.66687266
#> 191547 0.24657357
#> 195493 0.42819397
DCtaxa_tab returns a list containing (1) the names and q-values of the taxa that are significantly DC between two biological conditions and (2) the names of the DC taxa only.
# Remember to provide the name of the grouping variable via groupvar
# and the level of significance via alpha (0.3 in this example).
# The result is stored under a new name so that the function DCtaxa_tab is not masked.
DCtaxa_res <- DCtaxa_tab(qvaltab = qvaltab, groupvar = "bin_dog", alpha = 0.3)
DCtaxa_res
#> $DCtaxa_complete_tab
#> bin_dog
#> 305760 0.28750529
#> 326977 0.20931924
#> 194648 0.22253539
#> 198941 0.15497052
#> 353985 0.24596522
#> 194211 0.23408249
#> 188518 0.24425275
#> 173965 0.24713952
#> 194661 0.01218206
#> 191547 0.24657357
#>
#> $DCtaxa_names_only
#> [1] "305760" "326977" "194648" "198941" "353985" "194211" "188518" "173965"
#> [9] "194661" "191547"
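The selection shown above can be mimicked in base R with a simple threshold on one column of the q-value table. The sketch below is only an approximation of that behaviour: it copies five of the bin_dog q-values from the output above into a hypothetical object qvaltab_toy and assumes that a taxon is kept when its q-value is strictly below alpha. Whether the package uses a strict or non-strict inequality is not stated here.

```r
# Toy sketch: keep taxa whose q-value for the grouping variable is below alpha.
# The five q-values are copied from the bin_dog column shown earlier;
# qvaltab_toy, dc_complete, and dc_names are hypothetical object names.
qvaltab_toy <- matrix(
  c(0.65184904, 0.28750529, 0.20931924, 0.01218206, 0.42819397),
  ncol = 1,
  dimnames = list(c("326792", "305760", "326977", "194661", "195493"),
                  "bin_dog")
)
alpha <- 0.3

# Assumption: strict inequality (q-value < alpha).
keep <- qvaltab_toy[, "bin_dog"] < alpha

# (1) Names and q-values of the selected taxa, kept as a one-column table.
dc_complete <- qvaltab_toy[keep, "bin_dog", drop = FALSE]

# (2) Names of the selected taxa only.
dc_names <- rownames(dc_complete)
dc_names
```

Using drop = FALSE keeps the result as a table with row names even when only one taxon passes the threshold.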
[1] Ahn S, Datta S. (2023). Differential Co-Abundance Network Analyses for Microbiome Data Adjusted for Clinical Covariates Using Jackknife Pseudo-Values. Under Review at \(\textit{BMC Bioinformatics}\).
[2] McDonald D. et al. (2018). American Gut: an Open Platform for Citizen Science Microbiome Research. \(\textit{mSystems}\). 3(3), e00031–18
[3] O’Keefe SJ. et al. (2015). Fat, fibre and cancer risk in African Americans and rural Africans. \(\textit{Nat Commun}\). 6, 6342