**PEOPLE'S EPIDEMIOLOGY LIBRARY**

**PEL ESSAYS ON EPIDEMIOLOGY**

**S.D. Walter, McMaster University, Hamilton, Canada**

**Updated: 120227**

__Essay # 2:__

__How to Count?__

In this essay, we review how epidemiologists count the number of people at risk of disease and the number of people who have a disease. These counts are used to calculate risks of disease in various parts of the population or for various time periods.

__Counting__

Epidemiologists often wish to identify groups of individuals and calculate their risk of disease [Elandt-Johnson, 1975]; [Miettinen, 1976]; [Morgenstern et al., 1980]. To do so, we need to know two things: the number of people who have the disease, and the number of people in the part of the population that we wish to study. For example, we might be interested in the disease risk of men aged 50-59 years. The number of men in that age group who are potentially at risk of having or developing the disease under study forms the *denominator* of a risk estimate.

The number of people who develop the disease forms the *numerator* of the risk. For example, suppose that a particular study involves 1000 men in the 50-59 year age group, and that we find 15 of those men develop lung cancer over 10 years. The 10-year lung cancer risk in the population would therefore be 15/1000 or 1.5%.
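The arithmetic of this example can be written out as a minimal Python sketch. The function name `risk` and its input checks are illustrative assumptions, not part of any standard epidemiology library; the figures are the ones used in the example above.

```python
def risk(cases: int, population_at_risk: int) -> float:
    """Risk = numerator (people who develop the disease)
    divided by denominator (people at risk at the start)."""
    if population_at_risk <= 0:
        raise ValueError("the denominator must be a positive count")
    if not 0 <= cases <= population_at_risk:
        raise ValueError("cases must lie between 0 and the denominator")
    return cases / population_at_risk


# Figures from the example: 15 lung cancer cases among 1000 men aged 50-59,
# followed for 10 years.
ten_year_risk = risk(cases=15, population_at_risk=1000)
print(f"10-year risk: {ten_year_risk:.1%}")  # prints "10-year risk: 1.5%"
```

The check that the numerator cannot exceed the denominator reflects the fact that every counted case must come from the group that was counted as being at risk.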

__Sources of count data__

People in populations were being counted long before the 17th century. Censuses or registries of the population were taken in ancient civilizations such as those of the Egyptians and the Romans, and throughout the 2000 years of existence of the Chinese Empire. Their purposes were mainly tax collection and military recruitment. The first known uses of population counts for health purposes date from the 17th century; for instance, Graunt analysed the Bills of Mortality, which had been centralized and published weekly from around 1600 as a way to monitor the recurrent outbreaks of plague [Graunt, 1662].

More recently, public registration of births and deaths has helped us to count in contemporary populations, and epidemiologists often use this information directly as denominators when establishing the frequency of health events. For example, epidemiological studies can use numerators based on causes of death as recorded on death certificates, with denominators based on census data. Certain diseases, most notably cancer, are recorded in special registries for some populations, and epidemiologists may use these registries to obtain disease numerators that include both fatal and non-fatal cases of disease [Terracini and Zanetti, 2003].

Public records and special disease registries can be used to monitor rates of disease on a routine basis. However, for many epidemiological studies, readily available data of this kind do not exist, and the studies must be designed to count disease numerators and denominators themselves. The counting can be done within a selected group of people (the 50-59 year old men in the previous example), within particular periods of time such as calendar years, or in certain places within the community. The same approach applies in all of these situations when counting cases of disease, and it also applies when counting the number of people exposed to a particular risk factor, such as the number of people who smoke. Counting is used in clinical applications of epidemiology as well: for example, the number of patients whose surgery is successful is a numerator, which is related to the denominator of all patients who underwent surgery.

__Uses of count data__

By counting numerators and denominators in various sectors of the population, epidemiologists can calculate the frequency of a disease and find out who might be at highest risk. They can do this in a number of different ways. For example, they can calculate the percentage of the 50-59 year old men who are defined to be obese at a particular point in time; this type of figure is known as the *prevalence* of disease. In contrast, sometimes we are interested in the rate at which new disease is occurring. For example, we might study the number of new cases of influenza diagnosed each year per 1000 women who are alive during that year in the population; the denominator is 1000 woman-years. Importantly, the denominator is not just people, but people multiplied by time. This type of calculation leads to an *incidence rate*. Finally, we may be interested in the probability that an individual will develop a particular disease over the next 10 years, or possibly over his or her entire lifetime; the denominator is the number of people with which the follow-up started. For example, we might calculate the lifetime risk of breast cancer. When calculated in this way, the figure is described as a risk or a *cumulative risk* [Farr, 1838]; [Miettinen, 1976].
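The three measures differ mainly in what goes into the denominator, which a short Python sketch can make explicit. The function names are illustrative assumptions, and the obesity and influenza counts are hypothetical numbers chosen only to show the arithmetic; the lung cancer figures are those from the earlier example.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of a population that has the condition at one point in time."""
    return existing_cases / population


def incidence_rate(new_cases: int, person_years: float, per: float = 1000) -> float:
    """New cases per `per` person-years; the denominator is people multiplied by time."""
    return new_cases * per / person_years


def cumulative_risk(new_cases: int, persons_at_start: int) -> float:
    """Probability of developing the disease during follow-up;
    the denominator is the number of people with which the follow-up started."""
    return new_cases / persons_at_start


# Hypothetical: 250 of 1000 men aged 50-59 are obese on the survey date.
obesity_prevalence = prevalence(250, 1000)

# Hypothetical: 30 new influenza cases observed during 1000 woman-years.
flu_rate = incidence_rate(30, 1000)

# From the earlier example: 15 of 1000 men develop lung cancer over 10 years.
lung_cancer_risk = cumulative_risk(15, 1000)

# Person-time is the sum of each person's time under observation (hypothetical):
# two women observed for a full year, one for half a year, one for three months.
woman_years = sum([1.0, 1.0, 0.5, 0.25])

print(f"prevalence: {obesity_prevalence:.1%}")
print(f"incidence rate: {flu_rate:.1f} per 1000 woman-years")
print(f"10-year cumulative risk: {lung_cancer_risk:.1%}")
print(f"person-time: {woman_years} woman-years")
```

Prevalence and cumulative risk are proportions and cannot exceed 1, whereas an incidence rate carries units of "per unit of person-time" and has no such upper bound.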

The formal distinction between risks and incidence rates goes back to William Farr's work between the 1830s and 1850s [Vandenbroucke, 1985]. Farr explained that people were more afraid of cholera than of tuberculosis, not because of its ultimate mortality, but because cholera kills more swiftly, i.e. in less time: cholera kills in a week, while tuberculosis may take years to kill; thus, the incidence rate of mortality from cholera is larger. However, of all people with clinical signs and symptoms of cholera, a smaller proportion die of that disease than is the case among people who develop clinical tuberculosis; thus the ultimate risk of death from cholera is less than that from tuberculosis [Farr, 1838]. Even today, the distinctions that Farr made, far ahead of his time, continue to give rise to confusion [Vandenbroucke, 2004].

Comparisons of these calculated risks and rates over time or between sectors of the population can be very revealing. For example, by calculating the incidence rate of an infectious disease over time, one will often see an epidemic curve in which the disease frequency initially increases quite rapidly over a relatively short period of time, but then gradually declines as more time passes and the epidemic dissipates. Here again, William Farr was probably the first to observe this 'law', in a smallpox epidemic in London in the 1830s; a graph of a smallpox epidemic in the 1870s can be found in [Martin, 1934]. Similarly, examination of long-term trends in disease incidence rates can be used to predict the number of disease cases in future years. Those predictions can be used to assess the impact of preventive interventions such as cancer screening programs, for instance when the observed number of new cases falls markedly short of the predicted number after the preventive measures were instituted. This type of information is very important for health care planning, and has enormous economic importance.
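As a rough illustration of comparing observed with predicted cases, the Python sketch below extrapolates a straight-line trend in past incidence rates to a later year. Everything in it is an assumption made for illustration: the straight-line trend (real projections usually rely on more elaborate models), the function name `fit_linear_trend`, and all of the rates, years and counts, which are invented.

```python
def fit_linear_trend(years, rates):
    """Least-squares straight line through (year, rate) points.
    Returns a function that gives the projected rate for any year."""
    n = len(years)
    mean_year = sum(years) / n
    mean_rate = sum(rates) / n
    sxx = sum((x - mean_year) ** 2 for x in years)
    sxy = sum((x - mean_year) * (y - mean_rate) for x, y in zip(years, rates))
    slope = sxy / sxx
    return lambda year: mean_rate + slope * (year - mean_year)


# Hypothetical incidence rates (new cases per 1000 person-years)
# in the years before a preventive program was introduced.
years = [2000, 2001, 2002, 2003, 2004]
rates = [10.0, 11.0, 12.0, 13.0, 14.0]
project_rate = fit_linear_trend(years, rates)

# Hypothetical follow-up year after the program began.
person_years_2006 = 50_000
predicted_cases = project_rate(2006) * person_years_2006 / 1000
observed_cases = 650  # hypothetical count actually recorded in 2006

shortfall = predicted_cases - observed_cases
print(f"predicted {predicted_cases:.0f}, observed {observed_cases}, "
      f"shortfall {shortfall:.0f}")
```

A shortfall of this kind is only suggestive: it shows that fewer cases occurred than the earlier trend would have implied, not why.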

To epidemiologists, the major use of count data is to compare frequencies of disease in studies that are designed to elucidate causes of disease, as will be explained in the next essay.