LAST UPDATE : February 26, 2013





History of Epidemiologic Methods and Concepts




S.D. Walter, McMaster University, Hamilton, Canada

Updated: 121023

Essay # 6:

How to deal with multiple causes?


This essay continues the discussion of how epidemiologists try to establish causal associations between risk factors and disease, and specifically extends the thinking to situations where several causes for a particular disease may be involved. We will see that multiple causality is quite common in practice, discuss how the existence of multiple risk factors might affect our interpretation of epidemiologic data, and describe the challenges in trying to establish the relative importance of each risk factor in turn. We also mention some of the strategies that epidemiologists might employ to avoid some of these difficulties, in particular by choosing appropriate study designs and careful statistical analysis.

Diseases with multiple causes

Many diseases are caused not by just one thing, but by several. For instance, the risk of cancer may be determined by lifestyle factors such as smoking or diet, but there may be genetic components as well. The risk for a particular individual of developing the cancer will be a complicated function reflecting the individual's personal set of risk exposures. As we have seen, epidemiology studies diseases in populations, and those populations are typically made up of individuals who essentially all have unique profiles of exposure. The challenge to the epidemiologist is, therefore, to sort out which particular factors might be causally related to the disease, and to try to assign responsibility for cases of disease to each of those risk factors in turn; in other words, to calculate the share of the disease burden in the population that is attributable to each of the relevant risk factors.

Interacting risk factors

In some situations, the disease risks associated with each of two factors may simply add up. In other words, we might suppose that the increased risk of heart disease from failing to engage in regular exercise is the same for individuals whose diets are rich in fruit and vegetables as for those whose diets are poor in them. However, in other situations, the combined effect of two (or more) risk factors is greater than the sum of their individual risk components. For instance, in considering the pattern of oral cancers in the community, it has been found that people who smoke and who consume relatively large quantities of alcohol are at much greater risk of oral cancer than would be expected on the basis of the risk increases associated with smoking or alcohol use alone. This phenomenon, in which the combination of two risk factors changes the risk by more than the sum of their separate effects, is known as interaction.

This situation means that there are people in the community who would not develop oral cancer from smoking alone, nor from drinking alcohol excessively alone, but who would develop it when both risk factors operate jointly. An analogy here is where one person is trying to push open a heavy door but does not succeed. Her friend also tries alone, but similarly does not succeed. However, when they push together, they succeed in opening the door. Here the combined effect of the forces from the two individuals is sufficient to overcome the threshold necessary to open the door. Similarly, when two factors have a strong interaction, both must be present in an individual before the risk of disease is increased.

Of course, as with any causal effect, we cannot witness interaction in individuals (see essay 5); we only observe rates or risks in populations. In the situations where two risk factors combine to produce greater risk than expected from each of their components, the interaction is sometimes referred to as positive or “synergistic”. A similar but less common situation is where two factors combined result in a reduced level of disease risk, compared to what we would expect if the risk differences for each factor were additive. For these situations the interaction may be referred to as negative or “antagonistic”.
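
The comparison between the observed joint risk and the risk expected if the separate risk differences simply added up can be sketched in a few lines of Python. The function names and the risk figures (cases per 100,000 persons) are hypothetical and chosen only for illustration; they are not taken from any study.

```python
def interaction_contrast(r00, r10, r01, r11):
    """Departure of the joint risk from additivity of risk differences.

    r00: risk with neither factor, r10: first factor only,
    r01: second factor only, r11: both factors present.
    """
    expected_joint = r00 + (r10 - r00) + (r01 - r00)
    return r11 - expected_joint


def classify_interaction(r00, r10, r01, r11, tol=1e-12):
    """Label the interaction as positive, negative, or absent."""
    contrast = interaction_contrast(r00, r10, r01, r11)
    if contrast > tol:
        return "positive (synergistic)"
    if contrast < -tol:
        return "negative (antagonistic)"
    return "none (additive)"


# Invented risks per 100,000 persons: neither factor, smoking only,
# alcohol only, and both together.
print(interaction_contrast(1, 3, 2, 12))   # excess over the additive expectation
print(classify_interaction(1, 3, 2, 12))
```

With these invented figures the additive expectation for people with both exposures is 4 per 100,000, so an observed joint risk of 12 leaves an excess of 8 that neither factor accounts for on its own.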

When factors interact, the task of assigning shares of disease in a population to those risk factors becomes more complex, especially if more than two risk factors are involved. In order to estimate the contribution of each factor to disease in the population, or similarly to estimate the likely impact of preventive interventions in that population, we would need to know the number of people at each possible combination of exposure levels for the relevant risk factors, and the associated disease risks for each of those combinations. We typically will require much more extensive and detailed data to answer such questions, compared to the situations where risk factors do not interact. Unfortunately, the assumption that risk factors do not interact is, most of the time, a rather gross simplification.
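
To illustrate why interaction complicates the assignment of shares, the following sketch uses an entirely hypothetical population of 10,000 people cross-classified by smoking and heavy alcohol use, with invented risks. It assumes, as a simplification, that removing a factor moves each person to the risk of the corresponding group without that factor.

```python
# (smoker, heavy drinker) -> (number of people, risk per 1,000 persons)
# All figures are invented for illustration.
population = {
    (False, False): (5000, 1),
    (True, False): (2000, 3),
    (False, True): (2000, 2),
    (True, True): (1000, 12),
}


def expected_cases(pop):
    """Expected number of cases with exposures as they are."""
    return sum(n * risk / 1000 for n, risk in pop.values())


def cases_if_removed(pop, removed):
    """Expected cases if the factors at positions `removed` were eliminated.

    Position 0 is smoking, position 1 is heavy alcohol use.
    """
    total = 0.0
    for combo, (n, _) in pop.items():
        new_combo = tuple(
            False if i in removed else exposed for i, exposed in enumerate(combo)
        )
        total += n * pop[new_combo][1] / 1000
    return total


baseline = expected_cases(population)
prevented_smoking = baseline - cases_if_removed(population, {0})
prevented_alcohol = baseline - cases_if_removed(population, {1})
prevented_both = baseline - cases_if_removed(population, {0, 1})
print(baseline, prevented_smoking, prevented_alcohol, prevented_both)
```

Under these invented figures, removing smoking alone would prevent 14 of the 27 expected cases and removing alcohol alone would prevent 11, yet removing both prevents only 17. The separate shares add up to more than the total that can be prevented, which is why they cannot be treated as simple slices of the disease burden.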

Confounding of risk factors

A further major difficulty that confronts epidemiologists in many cases is that the exposure to various risk factors may be correlated in the population. For instance, persons who smoke may be less likely to engage in regular physical activity. Both smoking and the lack of physical activity are risk factors for heart disease, so it becomes a difficult task to determine which of the two factors might be more or less responsible for cases of disease. This phenomenon of correlated exposures is known as confounding. “Confounding” comes from the Medieval Latin word “con-fundere”, which meant “pour together.” In a certain sense, the two potential causes are ‘poured together’ and it becomes difficult to tell which is the real agent and which is not. This often leads to confusion, which, amusingly, is linked in the English language to the other meaning of the word “confounding”, as in “Confound Thy Enemies”, which means to bring your enemies into a state of utter confusion.

It should also be noted here that risk factors can be confounded as well as interactive. Returning to the example of risk factors for oral cancer, we may find that smoking and alcohol consumption are correlated (or confounded) in the population, and additionally have interacting disease risks. Once again, this would necessitate considerably more detail in the epidemiologic data in order to reliably quantify the relative importance of each risk factor.

It is important for the epidemiologist to sort out which risk factors are truly causing cases of disease in these situations, as opposed to being simply confounded with other risk factors. On the one hand, a causal risk factor might become the target of productive preventive interventions in the population, leading to a reduction in disease risks. On the other hand, a risk factor that is simply confounded with other factors, but is not itself causal, would be a distraction, and prevention efforts directed towards it would be unproductive. Considering again the risk factors of smoking and physical exercise for heart disease, it would be important to know whether one or the other risk factor, or both, was causing cases of disease. For instance, if lack of exercise was the predominant risk factor for heart disease, there would be relatively little gain in initiating anti-smoking programs as a way to reduce heart disease. But if smoking was the real risk factor and it simply turned out that smokers tended to exercise less, with no implication for the risk of heart disease, then exercise promotion schemes would also be less productive. Finally, it is quite likely in fact that both risk factors are causal, in which case the question becomes which risk factor is responsible for more cases, and correspondingly which risk factor should be the primary target for prevention.
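
The scenario in which smoking is the real risk factor and inactivity is merely correlated with it can be made concrete with a small sketch. The counts below are invented for illustration: within smokers and within non-smokers the risk of heart disease is the same for inactive and active people, but smokers are more often inactive.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical counts per smoking stratum:
# (cases among inactive, inactive people, cases among active, active people)
strata = {
    "smokers": (80, 800, 20, 200),
    "non-smokers": (4, 200, 16, 800),
}

# Crude comparison: add the counts over both strata, ignoring smoking.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_rr = risk_ratio(*crude_counts)

# Comparison of inactive versus active people within each smoking stratum.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

print(round(crude_rr, 2), stratum_rr)
```

In this invented example the crude risk ratio for inactivity is about 2.3, although the ratio within each smoking stratum is exactly 1. The apparent effect of inactivity arises entirely from its correlation with smoking.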

Implications for disease prevention

In general, the existence of confounding as well as interaction also means that we can never be completely certain what will happen if we reduce a risk factor for disease in a population. The results of such an intervention might differ from what was expected on the basis of epidemiologic studies: the effect may be smaller, larger, or of an entirely unexpected kind.

For situations where two risk factors interact positively, and hence generate more cases of disease than would be expected from their individual exposures, we anticipate that removing exposure to even one of those factors would be relatively beneficial. Furthermore, if exposures to those risk factors are positively correlated, then preventive efforts are more easily targeted to relevant sectors of the community that are at higher risk of disease. At the other extreme, where risk factors act independently and are uncorrelated in the population, a case may be made for preventive action on either or both risk factors, in the expectation that some reduction in disease risk for the community will occur. Of course, some risk factors cannot be modified, and hence we can envisage no preventive strategies to change disease risk; obvious examples of these risk factors would include age and genetics.

Epidemiologists' tactics to deal with interaction and confounding

We have previously mentioned that one of the strengths of epidemiology is that it studies disease in the context of human populations in their communities. While this makes the results of epidemiology studies immediately and directly relevant to those communities, there is no escaping the fact that dealing with multiple causes for disease is a considerable challenge. Epidemiologists have developed a number of strategies to be used in their study designs and data analysis in order to avoid the major problems of interaction and confounding of risk factors, as outlined above. For example, one possibility is to identify sectors of the population that are potentially exposed to only one risk factor, while individuals exposed to other risk factors are excluded. For instance, in trying to clearly establish a causative role for level of physical exercise as a risk factor for heart disease, epidemiologists might impose a restriction on their study to exclude smokers. This would then provide a clearer picture of the effect of physical exercise as a potential causal risk factor, exclusive of any effect of smoking. Of course, this description is a simplification, because in practice there will be more than two risk factors involved in many diseases. Furthermore, the exclusion of a substantial sector of the population (in our example, the smokers) would reduce the relevance of the study to the whole community, and limit its generalisability to communities elsewhere.

Another design strategy employed by epidemiologists is that of matching. Suppose an epidemiologist is investigating the possible risks of cancer associated with chemical exposures in a factory workplace. To do so, he may assemble a cohort of workers who are exposed to the hazard in their particular jobs, and compare them to other workers who are matched, that is, who have similar levels of exposure to other risk factors, such as age and smoking status. So, for instance, a 45-year-old non-smoking chemical worker would be matched with a 45-year-old non-smoker working elsewhere in the same industry or in some other population. By ensuring that the comparison groups of the study have identical (or at least very similar) levels of exposure to other risk factors, the epidemiologist can focus more clearly on the specific effect of the risk under investigation (here the chemical exposure).

Tactics such as matching and careful selection of study participants relate to the design of epidemiology studies, but there is a similar set of techniques associated with the statistical analysis of the data. For instance, in examining the effect of chemical exposures in the previous example, the data analysis may examine subgroups of the study participants according to their status on other risk factors such as age and smoking. Comparisons can be made between exposed and unexposed workers, within each of a number of age groups, and specifically for smokers and non-smokers. Again, the idea here is to achieve a situation in which comparison groups should be balanced with respect to all risk factors other than the one under investigation.

Unfortunately, there are many practical difficulties associated with implementing these strategies. For example, it may not be possible to identify workers who exactly fit the required risk factor profile if matching is adopted. So, for instance, if we have an exposed worker who is 43 years old, and he has smoked two packs of cigarettes a day for the last 17 years, it may not be possible to identify a similar unexposed worker who has exactly the same age and smoking history. Compromises may be required so that the other risk factors are only approximately but not exactly balanced between the comparison groups when the effect of chemical exposure is evaluated.
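
The kind of compromise described above can be sketched as approximate matching: each exposed worker is paired with an unexposed worker of the same smoking status whose age lies within a tolerated gap. The `Worker` class, the `match_workers` function, the two-year tolerance, and the example workers are all hypothetical and serve only as an illustration; real studies may match on more detailed smoking histories and use other rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    worker_id: str
    age: int
    smoker: bool


def match_workers(exposed, unexposed, max_age_gap=2):
    """Pair each exposed worker with the closest-aged available unexposed
    worker of the same smoking status, within max_age_gap years.

    Each unexposed worker is used at most once; an exposed worker with no
    acceptable partner is paired with None.
    """
    available = list(unexposed)
    pairs = []
    for person in exposed:
        candidates = [
            u for u in available
            if u.smoker == person.smoker and abs(u.age - person.age) <= max_age_gap
        ]
        if not candidates:
            pairs.append((person, None))
            continue
        best = min(candidates, key=lambda u: abs(u.age - person.age))
        available.remove(best)
        pairs.append((person, best))
    return pairs


exposed = [Worker("E1", 45, False), Worker("E2", 43, True)]
unexposed = [Worker("U1", 45, False), Worker("U2", 44, True), Worker("U3", 60, True)]

for person, partner in match_workers(exposed, unexposed):
    print(person.worker_id, "->", partner.worker_id if partner else "no match")
```

Setting `max_age_gap=0` would demand exact agreement on age, in which case the 43-year-old smoker in this invented example would be left without a partner. Widening the gap recovers the pair at the cost of a less exact balance.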

An additional problem is that when the data are divided up according to the levels of exposure to other risk factors, the sample size in each of those subgroups becomes progressively smaller, especially as more risk factors are considered. When the sample sizes become smaller, the estimates of disease risk become correspondingly less precise, so it is difficult to reach a definitive conclusion about the impact of exposure within each of the subgroups. The investigator may need to calculate an overall average of the increases in risk of disease associated with exposure, taken over all of the study subgroups, but without attempting to identify precise estimates of risk within the subgroups. When these steps are necessary, it may be difficult or impossible to determine whether risk factors are interacting.
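
Both points can be illustrated with a short sketch: the loss of precision in small subgroups, and the calculation of an overall summary across subgroups. The Mantel-Haenszel risk ratio used here is one common choice of summary, not necessarily the one the essay has in mind, and the stratum counts are invented.

```python
import math


def log_rr_standard_error(a, n1, c, n0):
    """Approximate standard error of the log risk ratio.

    a cases among n1 exposed people, c cases among n0 unexposed people.
    """
    return math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio over (a, n1, c, n0) strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


# Hypothetical subgroups of workers, from largest to smallest:
# (cases among exposed, exposed, cases among unexposed, unexposed)
strata = [
    (24, 400, 12, 400),  # younger non-smokers
    (16, 200, 8, 200),   # younger smokers
    (12, 100, 6, 100),   # older non-smokers
    (6, 40, 3, 40),      # older smokers
]

se_by_stratum = [log_rr_standard_error(*s) for s in strata]
summary_rr = mantel_haenszel_rr(strata)
print([round(se, 2) for se in se_by_stratum], round(summary_rr, 2))
```

In these invented data every subgroup has a risk ratio of 2, so the summary is also 2, but the standard error of the subgroup estimate grows steadily as the subgroups shrink. With real data the subgroup estimates would scatter around the summary, and small subgroups would rarely be precise enough to show whether the scatter reflects interaction or chance.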


We have seen that epidemiology has become the essential method of studying problems of health and disease in the community. Because of the realities of studying free-living human populations, the difficulties of clearly identifying causal risk factors of disease and their importance in the population are substantial. Despite these difficulties, the results of epidemiology studies have immediate relevance to public health authorities, clinicians and other health care workers, and to members of the population themselves. There will be a continuing need for the work of epidemiologists in the future, as the pattern of older diseases changes and newer diseases emerge. Advances in basic science through laboratory investigation, and improvements in the delivery of clinical medicine to the community, lead to a constantly evolving spectrum of diseases and associated risk factors that require investigation. The continuing efforts of epidemiologists are required to maintain constant improvements to the level of health in the population and to study advances in health care.
