Exercise 23. Calculating SMRs/SIRs


You may have to install the required packages the first time you use them. You can install a package by running install.packages("package_of_interest") for each package you require.

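library(biostat3) # melanoma, popmort, survRate, confint.predictnl; also provides Surv and survSplit
library(dplyr)    # for data manipulation
library(splines)  # for ns() in the Poisson regression below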

Load the melanoma data. Use time since diagnosis (time on study) as the timescale, censor follow-up at 110 months, and define the event as death from any cause (cancer or other), because the expected rates in `popmort` are all-cause mortality rates.

The standardized mortality ratio (SMR) is the ratio of the observed number of deaths in the study population to the number that would be expected if the study population experienced the same mortality as the standard population. It is obtained by indirect standardization. When studying disease incidence, the corresponding quantity is called a standardized incidence ratio (SIR). These measures are typically used when the entire study population is considered ‘exposed’. Rather than following up both the exposed study population and an unexposed control population and comparing the two estimated rates, we estimate only the rate (or number of events) in the study population and compare it with the expected rate (expected number of events) obtained by applying the standard population's rates to the study population's person-time. For example, we might study disease incidence or mortality among individuals with a certain occupation (farmers, painters, airline cabin crew) or cancer incidence in a cohort exposed to ionising radiation.

In the analysis of cancer patient survival we typically estimate excess mortality (observed minus expected deaths). The SMR (observed/expected deaths) is a measure of relative mortality. The observed and expected numbers of deaths are estimated in the same way for both measures, but the SMR assumes that the effect of exposure is multiplicative on the baseline rate, whereas excess mortality assumes that it is additive. Which measure, relative mortality or excess mortality, do you think is more homogeneous across age?
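To make the two scales concrete, the sketch below computes both measures for two age groups. The observed deaths, expected deaths and person-years are made-up illustrative numbers, not results from the melanoma data.

```r
# Hypothetical counts for two age groups (illustrative only)
ex <- data.frame(
  agegrp = c("younger", "older"),
  O  = c(30, 120),     # observed deaths
  E  = c(10, 60),      # expected deaths from population rates
  PY = c(1000, 2000)   # person-years at risk
)

# Relative mortality (multiplicative scale): SMR = O / E
ex$SMR <- ex$O / ex$E

# Excess mortality (additive scale): (O - E) / PY, per 1000 person-years
ex$excess_per_1000 <- 1000 * (ex$O - ex$E) / ex$PY

print(ex)
```

In this invented example the SMR falls with age (3 vs 2) while the excess rate rises (20 vs 30 per 1000 person-years), so whether the effect looks homogeneous across age depends on the scale.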

The following example illustrates the approach to estimating
SMRs/SIRs using R. Specifically, we will estimate SMRs for the
melanoma data using the general population mortality rates
stratified by age and calendar period (derived from
`popmort`) to estimate the expected number of deaths.
The expected mortality rates depend on attained age and calendar year,
so the approach is as follows:

-   Split follow-up into 1-year age bands

-   Split the resulting data into 1-year calendar period bands

-   For each age-period band, merge with `popmort` to
    obtain the expected mortality rates

-   Sum the observed and expected numbers of deaths and calculate
    the SMR (observed/expected) and a 95% CI
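The first two steps amount to a Lexis split: follow-up is cut wherever attained age or calendar time crosses an integer. A minimal base-R sketch for a single subject follows. `split_lexis` is a hypothetical helper written for illustration (the exercise itself calls `survSplit` twice), and it assumes each piece is classified by the age and year at its mid-point.

```r
# Hypothetical helper: split one subject's follow-up at every integer
# attained age and every integer calendar year.
#   adx : exact age at diagnosis (years)
#   ydx : exact calendar time of diagnosis (decimal years)
#   fu  : length of follow-up (years)
split_lexis <- function(adx, ydx, fu) {
  # follow-up times at which attained age / calendar time cross an integer
  a_cuts <- (ceiling(adx):floor(adx + fu)) - adx
  y_cuts <- (ceiling(ydx):floor(ydx + fu)) - ydx
  # keep only cut points strictly inside the follow-up period
  cuts <- sort(unique(c(a_cuts, y_cuts)))
  cuts <- cuts[cuts > 0 & cuts < fu]
  t0 <- c(0, cuts)
  t1 <- c(cuts, fu)
  mid <- (t0 + t1) / 2
  data.frame(
    tstart = t0,
    tstop  = t1,
    age    = floor(adx + mid),  # attained age band at the mid-point
    year   = floor(ydx + mid),  # calendar year at the mid-point
    pt     = t1 - t0            # person-years in the piece
  )
}

split_lexis(adx = 50.5, ydx = 1980.25, fu = 1)
```

Here one year of follow-up is cut into three pieces (age 50 in 1980, age 51 in 1980, age 51 in 1981) whose person-years sum to the original follow-up.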

(a)

Start by splitting the follow-up into 1-year age bands. Date of birth is not available, so we approximate the exact age at diagnosis by the mid-point of the recorded (completed) year of age, that is, age + 0.5.

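data(melanoma)
scale <- 365.24 # days per year
mel <- mutate(melanoma,
              ydx=biostat3::year(dx),     # year of diagnosis
              adx=age+0.5,                # mid-point approximation
              ## death from any cause within the first 110 months
              dead=(status %in% c("Dead: cancer","Dead: other") & surv_mm<110)+0,
              surv_mm=pmin(110,surv_mm),  # censor follow-up at 110 months
              astart=adx,                 # attained age at entry
              astop=adx+surv_mm/12)       # attained age at exit
## split follow-up at each integer attained age
mel.split <- survSplit(mel,
                       cut=1:110,
                       event="dead",start="astart", end="astop")
## inspect the records for the first two subjects
subset(mel.split, id<=2, select=c(id,astart,astop,dead))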

(b)

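Now split these new records into 1-year calendar period bands.

## calendar time at entry and exit of each age-band record
mel.split <- mutate(mel.split,
                    ystart=year(dx)+astart-adx,
                    ystop=year(dx)+astop-adx)
## split again at each integer calendar year
mel.split2 <- survSplit(mel.split,
                       cut=1970:2000,event="dead",
                       start="ystart", end="ystop") %>%
    mutate(astart=adx+ystart-ydx,
           astop=adx+ystop-ydx,
           ## classify each piece at its mid-point: a piece ending exactly on a
           ## band boundary still belongs to the band it started in
           age=floor((astart+astop)/2),
           year=floor((ystart+ystop)/2),
           pt = ystop - ystart)  # person-years
subset(mel.split2, id<=2, select=c(id,ystart,ystop,astart,astop,dead))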

(c)

Each subject’s follow-up is now divided into small pieces corresponding to the age bands and calendar periods the subject passes through. We can make tables of person-years and deaths by age and calendar period (here for ages 50 to 59) with

xtabs(pt ~ age+year, data=mel.split2, subset = age>=50 & age<60)
xtabs(dead ~ age+year, data=mel.split2, subset = age>=50 & age<60)

Because the data have been split into 1-year intervals on both timescales, these tables have many small, sparse cells and are not very informative. Grouping age and calendar period into wider bands gives a better overview.

(d)

To make a table of rates by age and calendar period, try

mel.split2 <- mutate(mel.split2,
                     age10=cut(age,seq(0,110,by=10),right=FALSE),     # 10-year age groups
                     year5=cut(year,seq(1970,2000,by=5),right=FALSE)) # 5-year calendar periods
head(survRate(Surv(pt,dead)~sex+age10+year5, data=mel.split2))

(e)

To calculate the expected number of cases for a cohort using reference mortality rates classified by age and calendar period, we first merge the population rates with the observed person-time. The expected number of cases is then obtained by multiplying the follow-up time in each record by the reference rate for that record and summing over records. The SMR is the ratio of the total observed cases to the total expected.
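The merge-multiply-sum logic can be seen on a toy example. The records and reference rates below are invented for illustration, and base R `merge()` stands in for the `dplyr::inner_join()` used in the exercise; with its default `all = FALSE` it likewise drops records that have no matching reference rate.

```r
# Hypothetical person-years and deaths by sex, age and calendar year
obs <- data.frame(
  sex    = c(1, 1, 2),
  age    = c(50, 51, 50),
  year   = c(1980, 1980, 1981),
  pt     = c(400, 250, 500),
  deaths = c(3, 2, 1)
)

# Hypothetical reference rates (deaths per person-year), keyed like popmort
ref <- data.frame(
  sex  = c(1, 1, 2),
  age  = c(50, 51, 50),
  year = c(1980, 1980, 1981),
  rate = c(0.004, 0.005, 0.002)
)

# Attach the reference rate to each record, then E_i = pt_i * rate_i
m <- merge(obs, ref, by = c("sex", "age", "year"))
m$E <- m$pt * m$rate

O   <- sum(m$deaths)  # total observed
E   <- sum(m$E)       # total expected
SMR <- O / E
```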

## person-years by sex, age and year; numeric sex codes to match popmort$sex
pt <- mutate(mel.split2, sex=as.numeric(sex)) %>%
    group_by(sex, age, year) %>%
    summarise(pt=sum(pt), .groups="drop")
## expected deaths: person-years times the reference rate, summed by sex and year
expected <- inner_join(popmort, pt, by=c("sex","age","year")) %>%
    group_by(sex, year) %>%
    summarise(E=sum(rate*pt), .groups="drop")
## observed deaths by sex and year
observed <- mutate(mel.split2, sex=as.numeric(sex)) %>%
    group_by(sex, year) %>%
    summarise(O=sum(dead), .groups="drop")
joint <- inner_join(observed, expected, by=c("sex","year")) %>%
    mutate(SMR = O/E)

(f)

We can then model the observed outcomes using Poisson regression with an offset of the log of the expected counts, or using poisson.test with the expected counts as the exposure time.
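With `E` as the time base, `poisson.test(O, E)` estimates O/E, i.e. the SMR, with an exact Poisson interval, which can also be written as lower = qchisq(0.025, 2O)/(2E) and upper = qchisq(0.975, 2(O+1))/(2E). The base-R sketch below uses hypothetical counts for two strata to check this and to show that an intercept-only `glm()` with offset log(E) recovers the same pooled SMR.

```r
# Hypothetical observed (O) and expected (E) deaths in two strata
strata <- data.frame(O = c(15, 25), E = c(10, 15))
O_tot <- sum(strata$O)   # 40
E_tot <- sum(strata$E)   # 25

# Exact Poisson inference: with T = E, the estimated "rate" is O / E = SMR
res <- poisson.test(O_tot, E_tot)
smr_exact <- unname(res$estimate)
ci_exact  <- as.numeric(res$conf.int)

# The same exact 95% interval written with chi-squared quantiles
ci_chisq <- c(qchisq(0.025, 2 * O_tot),
              qchisq(0.975, 2 * (O_tot + 1))) / (2 * E_tot)

# Intercept-only Poisson regression with offset log(E):
# exp(intercept) equals sum(O) / sum(E)
fit0 <- glm(O ~ offset(log(E)), family = poisson, data = strata)
smr_glm <- unname(exp(coef(fit0)))

print(c(SMR = smr_exact, lower = ci_exact[1], upper = ci_exact[2]))
```

Adding covariates to the `glm()` formula, as in the calendar-period model, then lets the SMR vary across strata.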

## overall SMRs
by(joint, joint$sex, function(data) poisson.test(sum(data$O), sum(data$E)))

## utility function to draw a confidence interval
polygon.ci <- function(time, interval, col="lightgrey") 
    polygon(c(time,rev(time)), c(interval[,1],rev(interval[,2])), col=col, border=col)

## modelling by calendar period
summary(fit <- glm(O ~ sex*ns(year,df=3)+offset(log(E)), data=joint, family=poisson))
pred <- predict(fit,type="response",newdata=mutate(joint,E=1),se.fit=TRUE)
full <- cbind(mutate(joint,fit=pred$fit), confint.predictnl(pred))
ci.cols <- c("lightgrey", "grey")
matplot(full$year, full[,c("2.5 %", "97.5 %")], type="n", ylab="SMR", xlab="Calendar year")
for (i in 1:2) {
    with(subset(full, sex==i), {
        polygon.ci(year, cbind(`2.5 %`, `97.5 %`), col=ci.cols[i])
    })
}
for (i in 1:2) {
    with(subset(full, sex==i), {
        lines(year,fit,col=i)
    })
}
legend("topright", legend=levels(mel.split2$sex), lty=1, col=1:2, bty="n")