Model Comparison and Norovirus Fit

Overview

This app demonstrates basic fitting of two simple infection models to data, illustrating the concept of model/hypothesis testing. Read about the model in the “Model” tab. Then do the tasks described in the “What to do” tab.

The Model

Model Overview

This app fits two different SIR-type models to norovirus infection data.

Models

The overall model is a variant of the basic SIR model, with the inclusion of a process that allows infection of individuals from some common (unmodeled) source.

Model Diagram

The diagram illustrates the model.

Flow diagram for the model.

Model Equations

Implementing the models as continuous-time, deterministic systems leads to the following set of ordinary differential equations:

\[ \begin{aligned} \dot S & = - nS - bSI \\ \dot I & = nS + bSI - gI \\ \dot R & = gI \\ \end{aligned} \]

Here, b is the rate of transmission between infected and susceptible individuals, g is the rate at which infected individuals recover, and n is the rate at which susceptible individuals become infected from the common (unmodeled) external source.
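To make the equations concrete, here is a minimal sketch of how they could be simulated in R with the deSolve package; the function name, starting values, and parameter values are illustrative, not the app’s actual implementation.

```r
library(deSolve)

# Right-hand side of the ODEs: SIR with infection from an external source
sir_external <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dS <- -n * S - b * S * I
    dI <-  n * S + b * S * I - g * I
    dR <-  g * I
    list(c(dS, dI, dR))
  })
}

y0    <- c(S = 100, I = 1, R = 0)        # initial conditions (illustrative)
parms <- c(b = 0.01, g = 0.5, n = 0.001) # b, g, n as defined above
times <- seq(0, 20, by = 0.1)            # time in days

out <- ode(y = y0, times = times, func = sir_external, parms = parms)
head(out)
```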

Model Variants

Data source

The data used in this app are daily new cases of norovirus from an outbreak at a school camp (Kuo et al. 2009). See `help('norodata')` for more details.
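A minimal sketch for loading and inspecting the data, assuming the app is run from the DSAIDE R package, which provides the norodata dataset:

```r
# Load and inspect the outbreak data (assumes the DSAIDE package,
# which ships the 'norodata' dataset referenced above, is installed)
library(DSAIDE)
data(norodata)
head(norodata)  # first few days of reported new cases
str(norodata)   # structure of the dataset
```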

Model comparison

There are different ways to evaluate how well a model fits the data, and to compare different models. This app uses the approach of Akaike’s “An Information Criterion” (AIC), or more precisely the version corrected for small sample size, AICc. If we fit by minimizing the sum of squared residuals (SSR), as we do here, the formula for the AICc is

\[ AICc = N \log\left(\frac{SSR}{N}\right) + 2(K+1) + \frac{2(K+1)(K+2)}{N-K} \]

where N is the number of data points, SSR is the sum of squared residuals at the final fit, and K is the number of parameters being fit. A lower value means a better model.

One nice feature of the AIC is that one can compare as many models as one wants without running into issues with p-values and corrections for multiple comparisons, and the models do not need to be nested (i.e., each smaller model contained inside the larger ones). That said, the AIC has its drawbacks. Nowadays, if enough data are available, the best approach is to evaluate model performance with a method like cross-validation (Hastie, Tibshirani, and Friedman 2011).
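The formula above translates directly into a small R function; the SSR values and parameter counts in the example below are made up purely to illustrate a comparison.

```r
# AICc for a model fit by least squares, following the formula above:
#   ssr = sum of squared residuals at the final fit
#   N   = number of data points
#   K   = number of fitted parameters
aicc <- function(ssr, N, K) {
  N * log(ssr / N) + 2 * (K + 1) + 2 * (K + 1) * (K + 2) / (N - K)
}

# Hypothetical comparison of two models fit to the same 20 data points;
# the extra parameters are only 'worth it' if they lower the AICc
aicc(ssr = 150, N = 20, K = 2)  # simpler model
aicc(ssr = 100, N = 20, K = 4)  # more complex model
```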

For evaluating models with different AIC values there is, fortunately, no arbitrary “magic” threshold (like p = 0.05). A rule of thumb is that if two models differ in AIC by more than 2, the one with the smaller AIC is considered better supported (avoid the word ‘significant’, since that is usually associated with a p-value < 0.05, which we don’t have here). I tend to be more conservative and want AIC differences of at least 10 before I’m willing to favor a given model. Visual inspection of the fits is also useful: if one model has a lower AIC but the fit doesn’t look biologically convincing (e.g. very steep increases or decreases in some quantity), I’d be careful about drawing strong conclusions.

Note that the absolute value of the AIC is unimportant and varies from dataset to dataset. Only relative differences matter. And it goes without saying that we can only compare models that are fit to exactly the same data.

What to do

The tasks below are described in a way that assumes everything is in units of days (rate parameters, therefore, have units of inverse days). If any quantity is not given in those units, you need to convert it first (e.g. if it says a week, you need to convert it to 7 days).
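For example, a one-week duration converts like this (a minimal sketch; the variable names are illustrative):

```r
# Convert a duration given in weeks into days, and into the
# corresponding rate with units of inverse days
duration_weeks <- 1
duration_days  <- duration_weeks * 7    # 1 week = 7 days
rate_per_day   <- 1 / duration_days     # e.g. a recovery rate of ~0.14/day
```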

Task 1

Task 2

Task 3

Task 4

The SSR is clearly lower than those for models 1 and 2, and the AICc value suggests that the extra model parameters are ‘worth it’; overall, this model should be favored.

You might have noticed that getting a good fit is tricky and often you don’t reach one. This comes back to the ‘optimizer getting stuck’ concept discussed in the ‘flu fitting’ app.
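One common way to reduce the risk of the optimizer getting stuck is to restart it from several starting values and keep the best result. A minimal sketch follows; `ssr_fn` is a hypothetical stand-in for the app’s actual sum-of-squares objective.

```r
# Sketch: restart optim() from multiple random starting values and keep
# the run with the lowest objective value; ssr_fn is a made-up objective
# with local minima, standing in for a sum-of-squared-residuals function
ssr_fn <- function(par) {
  sum((par - c(1, 2))^2) + sin(5 * par[1])
}

set.seed(42)
fits <- lapply(1:10, function(i) {
  start <- runif(2, min = 0, max = 5)   # random starting values
  optim(par = start, fn = ssr_fn)
})
best <- fits[[which.min(sapply(fits, function(f) f$value))]]
best$par    # parameter estimates from the best restart
best$value  # corresponding objective (SSR) value
```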

Further Information

References

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2011. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics). Springer.

Kuo, Hung-Wei, Daniela Schmid, Karin Schwarz, Anna-Margaretha Pichler, Heidelinde Klein, Christoph König, Alfred de Martin, and Franz Allerberger. 2009. “A Non-Foodborne Norovirus Outbreak Among School Children During a Skiing Holiday, Austria, 2007.” Wiener Klinische Wochenschrift 121 (3-4): 120–24. https://doi.org/10.1007/s00508-008-1131-1.