In this short vignette, we fit a sparse linear regression model with up to \(L > 0\) non-zero effects. Generally, there is no harm in overstating \(L\) (the method is fairly robust to overfitting), except that the computation grows as \(L\) grows.
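For orientation, the model can be sketched as follows. The notation (\(b\), \(b_l\), \(e\), \(\sigma^2\)) is an assumption introduced here for illustration and is not taken from the vignette itself:

\[
y = Xb + e, \qquad e \sim N(0, \sigma^2 I_n), \qquad b = \sum_{l=1}^{L} b_l,
\]

where each \(b_l\) is a vector of length \(p\) with exactly one non-zero entry. Under this sketch \(b\) has at most \(L\) non-zero entries, which is why choosing \(L\) larger than the true number of effects is permitted by the model.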
Here is a minimal example:
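library(susieR)
set.seed(1)
# Simulate n samples and p predictors; only 4 predictors have non-zero effects.
n <- 1000
p <- 1000
beta <- rep(0,p)
beta[c(1,2,300,400)] <- 1
X <- matrix(rnorm(n*p),nrow=n,ncol=p)
y <- X %*% beta + rnorm(n)
# Fit the model, allowing up to L = 10 non-zero effects.
res <- susie(X,y,L=10)
# Plot the estimated coefficients, dropping the intercept (the first element).
plot(coef(res)[-1],pch = 20)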
Plot the ground-truth outcomes vs. the predicted outcomes:
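One way to draw this plot, as a minimal sketch: it reuses `y` and `res` from the example above and assumes that calling `predict()` on the fitted object without new data returns the fitted values for the training samples. The name `yhat`, the axis labels and the reference line are illustrative additions.

```r
# Continues the example above: reuses `y` and `res`.
# Assumption: predict() without new data returns fitted values for the training samples.
yhat <- predict(res)

# Observed outcomes on the x-axis, predicted outcomes on the y-axis.
plot(y, yhat, pch = 20, xlab = "observed y", ylab = "predicted y")

# Dashed reference line y = x; points near it indicate accurate predictions.
abline(a = 0, b = 1, lty = 2)
```

If the fit has recovered the four simulated effects, the points should cluster around the dashed line, with scatter on the order of the unit-variance noise added to `y`.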
Here are some details about the computing environment, including the versions of R and the R packages used to generate these results.
sessionInfo()
# R version 3.6.2 (2019-12-12)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Catalina 10.15.7
#
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] L0Learn_1.2.0 susieR_0.11.33
#
# loaded via a namespace (and not attached):
# [1] Rcpp_1.0.5 pillar_1.4.3 compiler_3.6.2 plyr_1.8.5
# [5] highr_0.8 tools_3.6.2 digest_0.6.23 evaluate_0.14
# [9] lifecycle_0.1.0 tibble_2.1.3 gtable_0.3.0 lattice_0.20-38
# [13] pkgconfig_2.0.3 rlang_0.4.5 Matrix_1.2-18 yaml_2.2.0
# [17] xfun_0.11 stringr_1.4.0 dplyr_0.8.3 knitr_1.26
# [21] grid_3.6.2 tidyselect_0.2.5 reshape_0.8.8 glue_1.3.1
# [25] R6_2.4.1 rmarkdown_2.3 mixsqp_0.3-46 irlba_2.3.3
# [29] reshape2_1.4.3 ggplot2_3.3.0 purrr_0.3.3 magrittr_1.5
# [33] scales_1.1.0 htmltools_0.4.0 assertthat_0.2.1 colorspace_1.4-1
# [37] stringi_1.4.3 munsell_0.5.0 crayon_1.3.4