ABSTRACT. Theta Hat is a blog about statistics, launched in February of 2009, and abandoned in April of the same year.
The odds ratio is a statistical measure used to compare the odds for a particular outcome across two groups. It is, as the name implies, a ratio of two odds. Suppose for instance that among smokers the prevalence for a particular disease is 5%, whereas among non-smokers the prevalence is 2.5%. The odds for disease amongst smokers is 0.05/(1-0.05) or approximately 0.053; the odds for disease amongst non-smokers is 0.025/(1-0.025) or approximately 0.026. The odds ratio for disease comparing smokers to non-smokers is the ratio of the two odds: 0.053/0.026, approximately 2.05.
The odds ratio is not a very intuitive measure; what does it mean, for instance, to have twice the odds of disease? Unless we spend a lot of time gambling, we are generally more accustomed to thinking about probabilities rather than odds. Thus, from a conceptual perspective, the relative risk is often preferred to the odds ratio. The relative risk, as the name implies, is a comparison of probabilities across two groups: it is the probability for an outcome amongst one group, divided by the probability for the outcome amongst a second group. In the example above, the relative risk for disease comparing smokers to non-smokers is 0.05/0.025, which is precisely 2.
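The arithmetic in the running example can be checked with a few lines of code (the prevalences below are the ones from the example above):

```python
def odds(p):
    """Convert a probability to an odds."""
    return p / (1 - p)

p_smokers = 0.05      # disease prevalence among smokers (from the example)
p_nonsmokers = 0.025  # disease prevalence among non-smokers

odds_ratio = odds(p_smokers) / odds(p_nonsmokers)
relative_risk = p_smokers / p_nonsmokers

print(round(odds(p_smokers), 3))     # 0.053
print(round(odds(p_nonsmokers), 3))  # 0.026
print(round(odds_ratio, 2))          # 2.05
print(relative_risk)                 # 2.0
```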
Why does the odds ratio persist in the literature if the relative risk is more intuitive? Practicalities. Mathematically, it is more convenient to model odds than to model probabilities. Note that a probability can only take on a value between 0 and 1, and the natural log of a probability is bounded by negative infinity and 0. In comparison, the natural log of an odds can range from negative infinity to positive infinity. Logistic regression, by far the most popular method for modeling dichotomous outcomes, exploits this property: it models the log odds of an outcome as a linear function of the predictors.
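A small sketch makes the point concrete: the log odds (the "logit") spans the entire real line, so a linear model on that scale can never produce an impossible probability. The intercept and slope below are made up purely for illustration:

```python
import math

def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Inverse logit: map any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# The log odds spans the whole real line as p moves from 0 to 1...
print(logit(0.001))  # large negative
print(logit(0.5))    # 0.0
print(logit(0.999))  # large positive

# ...so a linear predictor on the log-odds scale, however extreme,
# always maps back to a valid probability. (Hypothetical coefficients.)
intercept, slope = -2.0, 0.8
for x in [-10, 0, 10]:
    p = inv_logit(intercept + slope * x)
    print(0 < p < 1)  # always True
```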
Mathematical convenience isn’t the only reason why the odds ratio is so popular: there are also practical constraints. In certain circumstances, the relative risk simply cannot be appropriately calculated. A common example arises when the data are sampled separately by outcome. In case-control studies, for instance, diseased individuals are sampled from a registry of cases, and controls are sampled from the general population of non-diseased individuals. When the data are sampled in such a manner, the relative risk computed from the sample is simply not an appropriate estimate of the relative risk in the population. The reason is intuitive: if the sampling scheme depends on the outcome, the probability of the outcome cannot be appropriately estimated from the sample.
How, then, are case-control data analyzed? It turns out that the odds ratio for the outcome comparing levels of the exposure is equal to the odds ratio for the exposure comparing levels of the outcome; this is a mathematical fact, and it is always true. The odds ratio for exposure can be appropriately estimated from a case-control study (the sampling scheme does not depend on the exposure), and thus the odds ratio for disease can also be appropriately estimated from a case-control study. As a result, analysis of case-control studies almost invariably involves the odds ratio.
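The symmetry is easy to verify on any 2x2 table, because both odds ratios reduce to the same cross-product ratio. The counts below are made up for illustration:

```python
# A hypothetical 2x2 table of counts (exposure by disease status):
#                 diseased   not diseased
a, b = 40, 60    # exposed
c, d = 10, 90    # unexposed

# Odds ratio for disease, comparing exposed to unexposed:
or_disease = (a / b) / (c / d)

# Odds ratio for exposure, comparing diseased to non-diseased:
or_exposure = (a / c) / (b / d)

# Both reduce algebraically to the cross-product ratio (a*d)/(b*c),
# so they are always equal.
print(round(or_disease, 9), round(or_exposure, 9))  # 6.0 6.0
```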
But: the odds ratio is not intuitive! Fortunately, in situations when the outcome is rare, the odds ratio approximates the relative risk; this is a mathematical nicety owing to the fact that the odds of an outcome is roughly equal to the probability for the outcome if the probability is small. Thus, when the outcome is rare, and the analysis explicitly involves the odds ratio, the odds ratio is often presented as an approximation for the relative risk. Note for instance that in the example provided above, the prevalence of the disease is fairly low, and thus the odds ratio is fairly close to the relative risk (2.05 versus 2).
When is the odds ratio not a good approximation for the relative risk? When the outcome is not rare. Many situations arise when the outcome is not sufficiently rare to use the odds ratio to approximate the relative risk. For instance, in a study examining risk factors for unprotected sex amongst groups with a high risk for STDs, the outcome (unprotected sex) is probably reasonably common.
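A quick sweep over baseline prevalences shows how the approximation degrades. Here the relative risk is held fixed at exactly 2 while the outcome becomes more common; the prevalence values are chosen purely for illustration:

```python
def odds(p):
    """Convert a probability to an odds."""
    return p / (1 - p)

# Hold the relative risk fixed at 2 and vary the baseline prevalence.
for p0 in [0.005, 0.025, 0.10, 0.25, 0.40]:
    p1 = 2 * p0                        # relative risk is exactly 2
    odds_ratio = odds(p1) / odds(p0)
    print(f"baseline {p0:.3f}: RR = 2.00, OR = {odds_ratio:.2f}")
```

For a rare outcome (baseline 0.5%) the odds ratio is about 2.01, essentially the relative risk; by the time the baseline prevalence reaches 40% the odds ratio is 6, wildly overstating a relative risk of 2.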
Unfortunately, it is not actually very difficult to find examples, even in generally esteemed journals and publications, in which the odds ratio is inappropriately presented as a relative risk. For example, in a 2004 article in CHANCE (a general-interest statistical magazine published by the American Statistical Association) regarding field goals in (American) football, odds ratios are explicitly labeled as relative risks, even though the outcome, making a field goal, was clearly not sufficiently rare to allow for such an approximation. Worse, a couple of passages in the text imply a comparison of probabilities, and not in fact a comparison of odds. The author writes, for instance, that in cloudy weather there is “an estimated 20.2% increase in the probability of success on each kick.” This is simply not true. There is an estimated 20.2% increase in the odds of success, not a 20.2% increase in the probability.
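To see how different those two statements are when the outcome is common, consider a hypothetical baseline success probability of 70% (the number is an assumption for illustration; field goals are made far more often than not, so the outcome is certainly not rare):

```python
def odds(p):
    """Convert a probability to an odds."""
    return p / (1 - p)

def prob_from_odds(o):
    """Convert an odds back to a probability."""
    return o / (1 + o)

# Hypothetical baseline success probability (chosen for illustration):
p = 0.70

# A 20.2% increase in the odds...
new_odds = odds(p) * 1.202

# ...translates into a much smaller increase in the probability:
new_p = prob_from_odds(new_odds)
print(f"probability: {p:.3f} -> {new_p:.3f}")
print(f"increase in probability: {100 * (new_p / p - 1):.1f}%")  # about 5.3%, not 20.2%
```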
Such mistakes can have unfortunate consequences. Interesting scientific research is often picked up by the popular media, and any odds ratio inappropriately presented as a relative risk in the scientific article is likely to be presented in the media as a relative risk (I’ve come across examples where this has occurred). Headline: “people who do A are 2.2 times more likely to have B.” Maybe… but if 2.2 was actually an odds ratio, then no.
Moral of the story: be careful when presenting or reading about odds ratios or relative risks; if the odds ratio is being used to approximate the relative risk, be sure that the outcome is sufficiently rare to ensure that the approximation is appropriate.