The goal of surveysd is to combine all the steps needed to use calibrated bootstrapping with custom estimation functions. This vignette covers the usage of the most important functions. For insight into the theory used in this package, refer to vignette("methodology").
A test data set based on data(eusilc, package = "laeken") can be created with demo.eusilc().
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
eusilc[1:5, .(year, povertyRisk, gender, pWeight)]
year | povertyRisk | gender | pWeight |
---|---|---|---|
2010 | FALSE | female | 504.5696 |
2010 | FALSE | male | 504.5696 |
2010 | FALSE | male | 504.5696 |
2010 | FALSE | female | 493.3824 |
2010 | FALSE | male | 493.3824 |
Use stratified resampling without replacement to generate 10 samples. Those samples are consistent with respect to the reference periods.
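A minimal sketch of this resampling step, which produces the dat_boot object used in the calibration step. It assumes that the test data contain a household identifier column named hid, that region serves as the stratum and year as the period, and that draw.bootstrap() accepts the arguments REP, hid, weights, strata and period as written here; check ?draw.bootstrap before relying on it.

```r
library(surveysd)

# Repeated from the setup so that this block runs on its own
set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)

# Draw 10 bootstrap replicates, stratified by region.
# Assumption: "hid" identifies households and "year" is the reference period.
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
```

The replicate weights are expected to appear as additional columns (w1, w2, ...) next to the original pWeight column.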
Calibrate each sample according to the distribution of gender (on a personal level) and region (on a household level).
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]
year | povertyRisk | gender | pWeight | w1 | w2 | w3 | w4 |
---|---|---|---|---|---|---|---|
2010 | FALSE | female | 504.5696 | 1025.360 | 0.4581938 | 0.4456302 | 0.4520549 |
2010 | FALSE | male | 504.5696 | 1025.360 | 0.4581938 | 0.4456302 | 0.4520549 |
2010 | FALSE | male | 504.5696 | 1025.360 | 0.4581938 | 0.4456302 | 0.4520549 |
2011 | FALSE | female | 504.5696 | 1024.862 | 0.4721126 | 0.4582807 | 0.4608312 |
2011 | FALSE | male | 504.5696 | 1024.862 | 0.4721126 | 0.4582807 | 0.4608312 |
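The idea behind calibrating a bootstrap weight can be illustrated with a single margin in base R. This is a simplified sketch on made-up toy data, not the package's implementation: recalib() has to satisfy person-level and household-level margins at the same time and therefore works iteratively, with epsP and epsH acting as convergence tolerances. The data frame toy and the column w1_calib are hypothetical names.

```r
# Toy data: original weights and one uncalibrated bootstrap weight (made up)
toy <- data.frame(
  gender  = c("female", "male", "male", "female", "male"),
  pWeight = c(500, 500, 490, 510, 500),
  w1      = c(1000, 0, 980, 1020, 1000)
)

# Target totals per gender, taken from the original weights
target  <- tapply(toy$pWeight, toy$gender, sum)
# Totals per gender under the bootstrap weight before calibration
current <- tapply(toy$w1, toy$gender, sum)

# Scale each bootstrap weight so that the gender totals match the targets
scale_factor <- target[as.character(toy$gender)] / current[as.character(toy$gender)]
toy$w1_calib <- toy$w1 * as.numeric(scale_factor)

# After scaling, the calibrated totals equal the target totals
tapply(toy$w1_calib, toy$gender, sum)
```

With only one margin a single scaling pass is enough; with several margins each pass can disturb the others, which is why an iterative procedure with a tolerance is needed.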
Estimate the share of persons at risk of poverty per period and gender.
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
year | n | N | gender | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2010 | 7267 | 3979572 | male | 12.02660 | 0.5882841 |
2010 | 7560 | 4202650 | female | 16.73351 | 0.7473909 |
2010 | 14827 | 8182222 | NA | 14.44422 | 0.6626295 |
2011 | 7267 | 3979572 | male | 12.81921 | 0.6059190 |
2011 | 7560 | 4202650 | female | 16.62488 | 0.7355060 |
2011 | 14827 | 8182222 | NA | 14.77393 | 0.6631967 |
The output contains estimates (val_povertyRisk) as well as standard errors (stE_povertyRisk), both measured in percent. The rows with gender = NA denote the aggregate over all genders for the corresponding year.
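How an estimate and its standard error relate to the weight columns can be sketched in base R. The example below uses made-up toy data and a hypothetical helper weighted_ratio(); it assumes that the point estimate is computed with the original weight and that the standard error is the standard deviation of the estimates obtained with each bootstrap weight. It illustrates the principle only and is not the code that calc.stError() runs.

```r
# Hypothetical helper: weighted share of TRUE values, in percent
weighted_ratio <- function(x, w) {
  keep <- !is.na(x)
  100 * sum(w[keep & x]) / sum(w[keep])
}

# Toy data: one original weight and three bootstrap weights (made up)
toy <- data.frame(
  povertyRisk = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  pWeight     = c(100, 200, 100, 100, 500),
  w1          = c(150, 180, 120,  90, 460),
  w2          = c( 80, 220, 100, 110, 490),
  w3          = c(100, 190, 110, 130, 470)
)

# Point estimate from the original weights
point_est <- weighted_ratio(toy$povertyRisk, toy$pWeight)

# One estimate per bootstrap weight
boot_est <- sapply(c("w1", "w2", "w3"),
                   function(b) weighted_ratio(toy$povertyRisk, toy[[b]]))

# Standard error as the standard deviation over the bootstrap estimates
std_err <- sd(boot_est)
```

Because the estimation function only needs a value vector and a weight vector, any statistic of that shape can be plugged in the same way.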
Estimate the share of persons at risk of poverty per period for each region, each gender, and each combination of both.
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
year | n | N | gender | region | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2010 | 261 | 122741.8 | male | Burgenland | 17.414524 | 3.831697 |
2010 | 288 | 137822.2 | female | Burgenland | 21.432598 | 3.243412 |
2010 | 359 | 182732.9 | male | Vorarlberg | 12.973259 | 1.869263 |
2010 | 374 | 194622.1 | female | Vorarlberg | 19.883637 | 3.112974 |
2010 | 440 | 253143.7 | male | Salzburg | 9.156964 | 1.809600 |
2010 | 484 | 282307.3 | female | Salzburg | 17.939382 | 2.587059 |