Exercise 25. Localised melanoma: generating and analysing a nested case-control (NCC) study


You may have to install the required packages the first time you use them. To install a package, run install.packages("package_of_interest") once for each package you need.
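
To avoid reinstalling packages that are already present, the installation can be made conditional. The helper below, install_if_missing, is a hypothetical name introduced here for illustration and is not part of the course material. It is a minimal sketch that relies only on base R's requireNamespace and install.packages.

```r
# Hypothetical helper (not part of biostat3 or Epi): install only what is missing.
install_if_missing <- function(pkgs) {
  # requireNamespace() returns TRUE when the package can be loaded
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!present]
  if (length(to_install) > 0) install.packages(to_install)
  invisible(to_install)   # names of the packages that had to be installed
}

# Usage for this exercise (uncomment to run; needs access to a CRAN mirror):
# install_if_missing(c("biostat3", "Epi"))
```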

library(biostat3) # Cox and conditional logistic analyses
library(Epi)      # sample a nested case-control study from a cohort

Load the melanoma data. Restrict the data to localised melanoma and to the first 10 years of follow-up. Use time-on-study as the timescale. Define the event as death from cancer.

## Get the data for exercise 25 and have a look at it
data(melanoma)
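mel <- subset(melanoma, stage=="Localised")           # restrict the cohort to localised melanoma
mel <- transform(mel,
                 dc = (status=="Dead: cancer" & surv_mm<120)+0,   # 1 = cancer death within 120 months, 0 otherwise
                 surv_10y = pmin(120, surv_mm))                   # censor follow-up at 120 months
table(mel$dc , mel$status)                            # check the event indicator against the original status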
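
The effect of truncating follow-up at 10 years is easiest to see on a few rows. The sketch below applies the same transformation to a small made-up data frame (the four rows in toy are invented for illustration and are not taken from the melanoma data).

```r
# Made-up rows for illustration only (not from the melanoma data)
toy <- data.frame(
  status  = c("Dead: cancer", "Dead: cancer", "Alive", "Dead: other"),
  surv_mm = c(30, 150, 200, 60)
)

# Same logic as for the cohort: event indicator and follow-up capped at 120 months
toy <- transform(toy,
                 dc       = (status == "Dead: cancer" & surv_mm < 120) + 0,
                 surv_10y = pmin(120, surv_mm))
toy
```

The second row is a cancer death at 150 months. Because it happens after the 10-year window, it counts as censored at 120 months (dc = 0, surv_10y = 120).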

The Cox proportional hazards analysis is done with the "coxph" function from the "survival" package. It is often a good idea to first check the structure of the variables.

str(mel$sex) ; str(mel$year8594) ; str(mel$agegrp)    #Check structure of risk factors/confounders
out_coh <- coxph(Surv(surv_10y,dc) ~ sex + year8594 + agegrp, data = mel)
summary(out_coh)

(a)

How many individuals are in the study?

n_ind <- length(mel$id)
n_ind

(b)

How many experience the event?

table(mel$dc, useNA="always")
ncase <- table(mel$dc, useNA="always")["1"]   # number of events (dc == 1), selected by name rather than by position
ncase

(c1)

Generate a nested case-control study with 1 control per case. Cases and controls are matched on time since entry. This is done with the function "ccwc" in the Epi package. Note that in the code provided here, the variables "dc" and "id" are not needed for the analysis. They do, however, help to understand how the data are handled, for example: how many individuals are sampled several times, or how many cases happened to be sampled as controls before their failure time.

nccdata <- ccwc(entry=0, exit=surv_10y, fail=dc, origin=0, controls=1, include=list(sex,year8594,agegrp,dc,id), data=mel)
tail(nccdata, 8)
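
The questions raised above (how often individuals are sampled, and how many eventual cases serve as controls) can be answered by tabulating "id" and cross-checking "Fail" against "dc". The function ncc_sampling_summary below is a hypothetical helper written for this illustration. It assumes a data frame with the columns Fail, id and dc, as in the sampled data above, and is demonstrated here on a small made-up data set.

```r
# Hypothetical helper: summarise how individuals were sampled in NCC data.
# Assumes columns Fail (1 = case, 0 = control), id and dc (event indicator in the cohort).
ncc_sampling_summary <- function(ncc) {
  times_sampled <- table(ncc$id)                  # number of rows per individual
  c(rows                   = nrow(ncc),
    unique_ids             = length(times_sampled),
    sampled_more_than_once = sum(times_sampled > 1),
    # control rows filled by individuals who are cases in the cohort
    eventual_cases_as_controls = sum(ncc$Fail == 0 & ncc$dc == 1))
}

# Made-up example: three matched sets with one control each
toy_ncc <- data.frame(
  Set  = c(1, 1, 2, 2, 3, 3),
  Fail = c(1, 0, 1, 0, 1, 0),
  id   = c(10, 20, 20, 30, 40, 30),
  dc   = c(1, 1, 1, 0, 1, 0)
)
res <- ncc_sampling_summary(toy_ncc)
res
```

In the made-up data, individual 20 is a control in set 1 and the case in set 2, and individual 30 is a control twice. The same call, ncc_sampling_summary(nccdata), should work on the sampled data as long as "dc" and "id" were listed in "include".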

(c2)

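Analyse the nested case-control data (with the survival package or the Epi package) using the function "clogit". As the NCC data were generated with the Epi package, we use the variable names created by this package: "Fail" identifies the case in each matched set and "Set" identifies the matched set.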

out_ncc <- clogit(Fail ~ sex + year8594 + agegrp + strata(Set), data=nccdata)
summary(out_ncc)

(d)

How many unique individuals are in our study?

n_uni <- length(unique(nccdata$id))
n_uni

(e)

Compare the estimated parameters and standard errors between the full cohort analysis and the nested case-control study. What is the relative efficiency (ratio of variances) of the nested case-control compared to the full cohort design?

comp <- exp(data.frame(coef(out_coh),coef(out_ncc)))
colnames(comp) <- c("cohort HR", "NCC HR")
comp                           # print the HR estimates from the cohort and the NCC
var <- data.frame(diag(vcov(out_coh)), diag(vcov(out_ncc)), diag(vcov(out_coh))/diag(vcov(out_ncc)))
colnames(var) <- c("cohort var", "NCC var", "ratio coh/ncc" )
var                            # print the variances of estimates from the cohort and the NCC and their ratio
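
As a rough benchmark for the observed ratios, a commonly quoted approximation for the efficiency of a design with m controls per case, relative to the full cohort, is m/(m+1). This holds under the assumption of no exposure effect, so it is only a guide and not an exact prediction for these data. The short sketch below tabulates it; the object name expected_eff is introduced here for illustration.

```r
# Approximate efficiency of an NCC study with m controls per case relative to the
# full cohort, assuming no exposure effect (beta = 0): m / (m + 1)
m <- 1:5
expected_eff <- data.frame(controls_per_case   = m,
                           expected_efficiency = m / (m + 1))
expected_eff
```

With one control per case the approximation gives 0.5, so the NCC variances would be expected to be roughly twice the cohort variances. The observed ratios will differ from this because of random sampling and because the covariate effects are not zero.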

(f)

Generate several nested case-control studies and analyse them. A loop draws M NCC samples with 1 control per case. The code stores the estimated HRs from each iteration in a data frame whose first row contains the cohort's HRs. The code also provides a summary table with the cohort's HRs and the mean and sd of the HRs from the M iterations. The histograms for each of the variables include a green vertical line at the cohort's HR and a red line at the mean of the HRs over the iterations.

M <- 20                   # Number of loops: change the M value to change the number of loops
param   <- matrix(0,M,5)  # Define the matrix of the coefficients (M rows, 5 coefficients)
for (i in 1:M)  {         # Start of the loop, create NCC data and analyse it
    nccdata <- ccwc(entry=0, exit=surv_10y, fail=dc, origin=0, controls=1, include=list(sex,year8594,agegrp), data=mel)
    out_ncc <- clogit(Fail ~ sex + year8594 + agegrp + strata(Set), data=nccdata)
    param[i, 1:5] <- coef(out_ncc)   # store the 5 coefficients of iteration i in row i
}                  # End of the loop