GLOSSARY OF COMMON TERMS IN EPIDEMIOLOGY


A

Accuracy
The degree to which a measurement, or an estimate based on measurements, represents the true value of the attribute that is being measured.
Adjustment
A summarizing procedure for a statistical measure in which the effects of differences in composition of the populations being compared have been minimized by statistical methods. Examples are adjustment by regression analysis and by standardization.
Age Standardization
A procedure for adjusting rates, e.g. death rates, designed to minimize the effects of differences in age composition when comparing rates for different populations.
Analytic Study
A study designed to examine associations, commonly putative or hypothesized causal relationships. An analytic study is usually concerned with identifying or measuring the effects of risk factors, or is concerned with the health effects of specific exposure(s). The common types of analytic study are CROSS-SECTIONAL, COHORT, and CASE-CONTROL. In an analytic study, individuals in the study population may be classified according to the absence or presence (or future development) of specific disease and according to "attributes" that may influence disease occurrence.
Association
Statistical dependence between two or more events, characteristics, or other variables. An association is present if the probability of occurrence of an event or characteristic, or the quantity of a variable, depends upon the occurrence of one or more other events, the presence of one or more other characteristics, or the quantity of one or more other variables. An association may be fortuitous or may be produced by various other circumstances; the presence of an association does not necessarily imply a causal relationship.
Attack Rate
Attack rate, or case rate, is a cumulative incidence rate often used for particular groups, observed for limited periods and under special circumstances, as in an epidemic. The secondary attack rate is the number of cases among contacts occurring within the accepted incubation period following exposure to a primary case, in relation to the total of exposed contacts.
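A minimal Python sketch of the two calculations. The function names and the outbreak counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def attack_rate(cases: int, group_at_risk: int) -> float:
    """Cumulative incidence in a group observed over a limited period, e.g. an outbreak."""
    return cases / group_at_risk


def secondary_attack_rate(contact_cases: int, exposed_contacts: int) -> float:
    """Cases among exposed contacts arising within the incubation period after
    exposure to a primary case, divided by the number of exposed contacts."""
    return contact_cases / exposed_contacts


# Hypothetical outbreak: 30 of 200 attendees fell ill; 8 of their 80 exposed
# contacts became ill within the accepted incubation period.
print(attack_rate(30, 200))          # 0.15
print(secondary_attack_rate(8, 80))  # 0.1
```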
Attributable Fraction (Exposed)
With a given outcome, exposure factor and population, the attributable fraction among the exposed is the proportion by which the incidence rate of the outcome among those exposed would be reduced if the exposure were eliminated.
Attributable Fraction (Population)
With a given outcome, exposure factor, and population, the attributable fraction in the population is the proportion by which the incidence rate of the outcome in the entire population would be reduced if exposure were eliminated.
Attributable Risk
The rate of a disease or other outcome in exposed individuals that can be attributed to the exposure.
Attributable Risk Percent
Attributable fraction expressed as a percentage rather than as a proportion.
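The attributable measures above can be sketched in Python as follows. The rates and the 30% exposure prevalence are hypothetical, and these simple formulas assume the unexposed rate is the rate the exposed would have had without exposure (no confounding).

```python
def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Excess rate among the exposed attributed to the exposure (a rate difference)."""
    return rate_exposed - rate_unexposed


def attributable_fraction_exposed(rate_exposed: float, rate_unexposed: float) -> float:
    """Proportion of the exposed group's rate removed if exposure were eliminated."""
    return (rate_exposed - rate_unexposed) / rate_exposed


def attributable_fraction_population(rate_population: float, rate_unexposed: float) -> float:
    """Proportion of the whole population's rate removed if exposure were eliminated."""
    return (rate_population - rate_unexposed) / rate_population


# Hypothetical annual rates; 30% of the population is exposed.
rate_exp, rate_unexp, share_exposed = 0.02, 0.005, 0.30
rate_pop = share_exposed * rate_exp + (1 - share_exposed) * rate_unexp  # about 0.0095

af_e = attributable_fraction_exposed(rate_exp, rate_unexp)
af_p = attributable_fraction_population(rate_pop, rate_unexp)
print(round(attributable_risk(rate_exp, rate_unexp), 4))  # 0.015
print(round(100 * af_e, 1))  # attributable risk percent: 75.0
print(round(100 * af_p, 1))  # population attributable risk percent: 47.4
```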
B
Bias
Deviation of results or inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth. Examples of specific types of bias are the following:
Ascertainment Bias: Systematic error arising from the kind of individuals or patients that the individual observer is seeing. Also systematic error arising from the diagnostic process.

Design Bias: The difference between a true value and that actually obtained, occurring as a result of faulty design of a study.

Detection Bias: Bias due to systematic errors in methods of ascertainment, diagnosis, or verification of cases in an epidemiologic survey, study, or investigation.

Information Bias: A flaw in measuring outcome or exposure that results in differential quality (accuracy) of information between compared groups.

Measurement Bias: Systematic error arising from inaccurate measurement (or classification) of subjects on the study variables.

Recall Bias: Systematic error due to differences in accuracy or completeness of recall to memory of prior events or experiences.

Reporting Bias: Selective suppression or revealing of information such as past history of sexually transmitted disease.

Response Bias: Systematic error due to differences in characteristics between those who choose or volunteer to participate in a study and those who do not.

Sampling Bias: Unless the sampling method ensures that all members of the "universe" or reference population have a known chance of inclusion in the sample, bias is possible.

Selection Bias: Error due to systematic differences in characteristics between those who are selected for study and those who are not. Selection bias can also invalidate conclusions generalized from surveys that include only volunteers from a healthy population.

Berkson's Bias: (A special example of selection bias) The set of selective factors that lead hospital cases and controls in a case-control study to be systematically different from one another. This occurs when the combination of exposure and disease under study increases the risk of hospital admission, thus leading to a higher exposure rate among the hospital cases than the hospital controls.

Biological Plausibility
The criterion that an observed, presumably or putatively causal association fits previously existing biological or medical knowledge.
Blind(ed) Study
A study in which observer(s) and/or subjects are kept ignorant of the group to which the subjects are assigned or of the population from which the subjects come.
C
Case
In epidemiology, a person in the population or study group identified as having the particular disease, health disorder, or condition under investigation.
Case-Control Study
A study that starts with the identification of persons with the disease (or other outcome variable) of interest, and a suitable control (comparison, reference) group of persons without the disease. The relationship of an attribute to the disease is examined by comparing the diseased and nondiseased with regard to how frequently the attribute is present, or, if quantitative, the levels of the attribute, in each of the groups.
Causality
The relating of causes to the effects they produce. Most of epidemiology concerns causality.
Chi-Square (χ²) Test
Any statistical test based on comparison of a test statistic to a chi-square distribution. The most common chi-square tests are for detecting whether two or more population distributions differ from one another.
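As an illustration, here is a minimal Python sketch of one common chi-square test, Pearson's test for a 2x2 table, without a continuity correction. The function name and the counts are hypothetical; the P value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square statistic (no continuity correction) and its P value
    for the 2x2 table [[a, b], [c, d]], which has 1 degree of freedom."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical table: 20 of 100 exposed and 10 of 100 unexposed developed the disease.
stat, p = chi_square_2x2(20, 80, 10, 90)
print(round(stat, 3), round(p, 3))  # 3.922 0.048
```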
Clinical Trial
A research activity that involves the administration of a test regimen to humans to evaluate its efficacy and safety.
Cohort
The term can be used to describe any designated group of persons who are followed or traced over a period of time, as in a cohort study.
Cohort Study
The method of epidemiologic study in which subsets of a defined population can be identified who are, have been, or in the future may be exposed or not exposed, or exposed in different degrees, to a factor or factors hypothesized to influence the probability of occurrence of a given disease or other outcome.
Communicable Disease
An illness due to a specific infectious agent or its toxic products that arises through transmission of that agent or its products from an infected person, animal, or reservoir to a susceptible host, either directly or indirectly through an intermediate plant or animal host, vector, or the inanimate environment.
Community Trial
Experiment in which the unit of allocation to receive a preventive or therapeutic regimen is an entire community or political subdivision.
Confidence Interval
A range of values for a variable of interest, e.g., a rate, constructed so that this range has a specified probability of including the true value of the variable. The specified probability is called the confidence level, and the end points of the confidence interval are called the confidence limits.
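A minimal Python sketch of one simple way to construct such an interval for a proportion, the normal-approximation (Wald) interval. The function name and counts are hypothetical. This method is only one of several and performs poorly with small samples or proportions near 0 or 1, where alternatives such as the Wilson interval are often preferred.

```python
import math


def wald_interval(events: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a proportion using the normal
    approximation; z = 1.96 gives a nominal 95% confidence level."""
    p = events / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p - z * standard_error, p + z * standard_error


low, high = wald_interval(40, 400)  # hypothetical: 40 cases in a sample of 400
print(round(low, 4), round(high, 4))  # 0.0706 0.1294
```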
Confounding
1. A situation in which the effects of two processes are not separated. The distortion of the apparent effect of an exposure on risk brought about by the association with other factors that can influence the outcome.
2. A situation in which a measure of the effect of an exposure on risk is distorted because of the association of exposure with other factor(s) that influence the outcome under study.
Confounding Variable
A variable that can cause or prevent the outcome of interest, is not an intermediate variable, and is associated with the factor under consideration. Such a variable must be controlled in order to obtain an undistorted estimate of the effect of the study factor on risk.
Control
As used in the expressions case-control study and randomized control(led) trial, "control" means person(s) in a comparison group that differs, respectively, in disease experience or allocation to a regimen, from the subjects of the study.
Controls, Matched
Controls who are selected so that they are similar to the study group, or cases, in specific characteristics. Some commonly used matching variables are age, sex, race, and socioeconomic status.
Correlation Coefficient
A measure of association that indicates the degree to which two variables have a linear relationship.
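A minimal Python sketch of the most common such measure, the Pearson product-moment correlation coefficient (the function name is hypothetical). The second example shows why the word "linear" matters: y depends perfectly on x, yet the coefficient is 0.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0 (perfect linear relationship)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0, although y = x**2 exactly
```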
Cross-Sectional Study
A study that examines the relationship between disease (or other outcome) and other variables of interest as they exist in a defined population at one particular time. The presence or absence of disease and the presence or absence of the other variables (or, if they are quantitative, their level) are determined in each member of the study population or in a representative sample at one particular time. The relationship between a variable and the disease can be examined (1) in terms of the prevalence of disease in different population subgroups defined according to the presence or absence (or level) of the variables and (2) in terms of the presence or absence (or level) of the variables in the diseased versus the nondiseased.
Cumulative Incidence
The number or proportion of a group of people who experience the onset of a health-related event during a specified time interval.
D
Descriptive Study
A study concerned with and designed only to describe the existing distribution of variables, without regard to causal or other hypotheses.
Dose-response Relationship
A relationship in which a change in amount, intensity, or duration of exposure is associated with a change in risk of a specified outcome.
E
Ecologic Fallacy
The bias that may occur because an association observed between variables on an aggregate level does not necessarily represent the association that exists at an individual level.
Ecologic Study
A study in which the units of analysis are populations or groups of people, rather than individuals.
Effect Measure
A quantity that measures the effect of a factor on the frequency or risk of a health outcome. Three such measures are attributable fractions, which measure the fraction of cases due to a factor; risk and rate differences, which measure the amount a factor adds to the risk or rate of a disease; and risk and rate ratios, which measure the amount by which a factor multiplies the risk or rate of disease.
Effect Modifier
A factor that modifies the effect of a putative causal factor under study. For example, age is an effect modifier for many conditions. Effect modification is detected when the selected effect measure for the factor under study varies across levels of another factor.
Epidemic
The occurrence in a community or region of cases of an illness, specific health-related behavior, or other health-related events clearly in excess of normal expectancy.
Epidemiology
The study of the distribution and determinants of health-related states or events in specified populations, and the application of this study to control of health problems.
Epidemiology, Descriptive
Study of the occurrence of disease or other health-related characteristics in human populations. The major characteristics in descriptive epidemiology can be classified under the headings: person, place, and time.
Error, Type I
The error of rejecting a true null hypothesis.
Error, Type II
The error of failing to reject a false null hypothesis.
Exposed
In epidemiology, the exposed group is often used to connote a group whose members have been exposed to a supposed cause of a disease or health state of interest, or possess a characteristic that is a determinant of the health outcome of interest.
F
False Negative
Negative test result in a subject who possesses the attribute for which the test is conducted. The labeling of a diseased person as healthy when screening in the detection of disease.
False Positive
Positive test result in a subject who does not possess the attribute for which the test is being conducted. The labeling of a healthy person as diseased when screening in the detection of disease.
Follow-up
Observation over a period of time of an individual, group, or initially defined population whose appropriate characteristics have been assessed in order to observe changes in health status or health-related variables.
Follow-up Study
A study in which individuals or populations, selected on the basis of whether they have been exposed to risk, received a specified preventive or therapeutic procedure, or possess a certain characteristic, are followed to assess the outcome of exposure, the procedure, or effect of the characteristic.
G
"Gold Standard"
A jargon term, used to describe a method, procedure, or measurement that is widely accepted as being the best available. Often used to compare with new methods.
H
Historical Cohort Study
A cohort study conducted by reconstructing data about persons at a time or times in the past. This method uses existing records about the health or other relevant aspects of a population as it was at some time in the past and determines the current (or subsequent) status of members of this population with respect to the condition of interest.
I
Incidence Rate
The rate at which new events occur in a population. The numerator is the number of new events that occur in a defined period; the denominator is the population at risk of experiencing the event during this period, sometimes expressed as person-time.
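A minimal Python sketch contrasting a person-time incidence rate with cumulative incidence. The follow-up records are hypothetical: everyone is followed for the full 5 years, and each case stops contributing person-time at onset.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New events per unit of person-time at risk."""
    return new_cases / person_time


def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of an initially at-risk group that develops the outcome during the interval."""
    return new_cases / at_risk_at_start


# Hypothetical 5-year follow-up: (years at risk, developed the disease?).
# Cases stop contributing person-time at onset; no one is lost to follow-up.
follow_up = [(5.0, False), (2.5, True), (5.0, False), (1.0, True), (5.0, False)]
person_years = sum(years for years, _ in follow_up)    # 18.5
cases = sum(1 for _, is_case in follow_up if is_case)  # 2
print(incidence_rate(cases, person_years) * 1000)   # about 108 per 1,000 person-years
print(cumulative_incidence(cases, len(follow_up)))  # 0.4 over 5 years
```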
Intervention Study
An epidemiologic investigation designed to test a hypothesized cause-effect relationship by modifying a supposed causal factor in a population.
L
Least Squares
A principle of estimation, due to Gauss, in which the estimates of a set of parameters in a statistical model are those quantities that minimize the sum of squared differences between the observed values of the dependent variable and the values predicted by the model.
Likelihood Function
A function constructed from a statistical model and a set of observed data, which gives the probability of the observed data for various values of the unknown model parameters. The parameter values that maximize the probability are the maximum likelihood estimates of the parameters.
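A minimal Python sketch for the simplest case, a binomial risk estimated from hypothetical counts. A crude grid search stands in for a real optimizer and finds the value that maximizes the log-likelihood, which matches the closed-form estimate (cases divided by subjects).

```python
import math


def binomial_log_likelihood(p: float, cases: int, n: int) -> float:
    """Log-likelihood of risk p given `cases` events among n independent subjects.
    The binomial coefficient is omitted because it does not depend on p."""
    return cases * math.log(p) + (n - cases) * math.log(1 - p)


cases, n = 12, 40  # hypothetical data
grid = [i / 1000 for i in range(1, 1000)]  # candidate values of p strictly between 0 and 1
p_hat = max(grid, key=lambda p: binomial_log_likelihood(p, cases, n))
print(p_hat)  # 0.3, matching the closed-form maximum likelihood estimate cases / n
```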
Linear Model
A statistical model in which the value of a parameter for a given value of a factor, x, is assumed to be equal to a + bx, where a and b are constants.
Linear Regression
Regression analysis of data using linear models.
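A minimal Python sketch of least squares for the linear model a + bx with a single factor, using the standard closed-form solution. The function name and observations are hypothetical.

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return intercept a and slope b minimizing sum((y_i - (a + b * x_i)) ** 2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


x = [0, 1, 2, 3]
y = [1.2, 2.8, 5.1, 7.0]  # hypothetical observations
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 1.07 1.97
```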
Logistic Model
A statistical model of an individual's risk (probability of disease y) as a function of a risk factor x:
$$P(y \mid x) = \frac{1}{1 + e^{-a - bx}}$$

where e is the base of the natural logarithm, so $e^{-a-bx}$ is the exponential function evaluated at $-a - bx$. This model has a desirable range, 0 to 1, and other attractive statistical features. In the multiple logistic model, the term bx is replaced by a linear term involving several factors, e.g. $b_1x_1 + b_2x_2$ if there are two factors $x_1$ and $x_2$.

Logit
The logarithm of the ratio of frequencies of two different categorical outcomes such as healthy versus sick.
Logit Model
A linear model for the logit (natural log of the odds) of disease as a function of a quantitative factor X:
Logit (disease given X = x) = a + bx

This model is mathematically equivalent to the logistic model.
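A minimal Python sketch of that equivalence, with hypothetical coefficients a and b: taking the logit of the risk given by the logistic model recovers the linear term a + bx.

```python
import math


def logistic_risk(x: float, a: float, b: float) -> float:
    """Logistic model: P(y | x) = 1 / (1 + exp(-a - b * x)), always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a - b * x))


def logit(p: float) -> float:
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


a, b = -3.0, 0.5  # hypothetical coefficients
for x in (0.0, 2.0, 6.0):
    risk = logistic_risk(x, a, b)
    # The logit of the modelled risk recovers the linear term a + b * x.
    print(x, round(risk, 3), round(logit(risk), 6))
```

Because the logit is linear in x, exp(b) is the odds ratio for a one-unit increase in x (here exp(0.5), about 1.65).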

M

Mantel-Haenszel Test
A summary CHI-SQUARE TEST developed by Mantel and Haenszel for stratified data and used when controlling for confounding.
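A minimal Python sketch of the Mantel-Haenszel chi-square statistic (without continuity correction) together with the Mantel-Haenszel summary odds ratio, for hypothetical strata. The table layout and function name are assumptions made for illustration.

```python
import math


def mantel_haenszel(strata: list[tuple[int, int, int, int]]) -> tuple[float, float, float]:
    """Mantel-Haenszel summary odds ratio, chi-square statistic (no continuity
    correction) and 1-df P value for a list of 2x2 strata (a, b, c, d), where
    a = exposed cases, b = exposed noncases, c = unexposed cases, d = unexposed noncases."""
    or_num = or_den = observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_num += a * d / n
        or_den += b * c / n
        observed += a
        expected += (a + b) * (a + c) / n  # E[a] if there is no association within the stratum
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
    chi_square = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return or_num / or_den, chi_square, p_value


# Hypothetical data stratified by age group; each stratum has an odds ratio of 2.
strata = [(10, 20, 5, 20), (20, 10, 10, 10)]
or_mh, chi2, p = mantel_haenszel(strata)
print(round(or_mh, 3), round(chi2, 3), round(p, 3))
```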
Matching
The process of making a study group and a comparison group comparable with respect to extraneous factors. Frequency matching requires that the frequency distributions of the matched variable(s) be similar in study and comparison groups. Individual matching relies on identifying individual subjects for comparison, each resembling a study subject on the matched variable(s).
Measure of Association
A quantity that expresses the strength of association between variables. Commonly used measures of association are differences between means, proportions or rates, the rate ratio, the odds ratio, and correlation and regression coefficients.
Meta-Analysis
The process of using statistical methods to combine the results of different studies. A frequent application has been the pooling of results from a number of small randomized controlled trials, none in itself large enough to demonstrate statistically significant differences, but in aggregate, capable of so doing.
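One common pooling approach is the fixed-effect, inverse-variance method applied to log odds ratios; random-effects methods (e.g. DerSimonian-Laird) are often used instead when studies are heterogeneous. A minimal Python sketch of the fixed-effect version, with hypothetical trial results:

```python
import math


def fixed_effect_pool(estimates: list[float], standard_errors: list[float]) -> tuple[float, float]:
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]  # more precise studies count more
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical log odds ratios and standard errors from three small trials.
log_ors = [math.log(0.70), math.log(0.85), math.log(0.60)]
ses = [0.30, 0.25, 0.35]
pooled, se = fixed_effect_pool(log_ors, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(round(math.exp(pooled), 2), round(math.exp(low), 2), round(math.exp(high), 2))
```

The pooled standard error is smaller than that of any single trial, which is how pooling can reach statistical significance when no individual trial does.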
Misclassification
The erroneous classification of an individual, a value, or an attribute into a category other than that to which it should be assigned. The probability of misclassification may be the same in all study groups (nondifferential misclassification) or may vary between groups (differential misclassification).
N
Natural History of Disease
The course of a disease from onset (inception) to resolution. Many diseases have certain well-defined stages that, taken all together, are referred to as the "natural history of the disease" in question.
Nested Case-Control Study
A case-control study in which cases and controls are drawn from the population in a cohort study.
Nonparticipants
Members of a study sample or population who do not take part in the study for whatever reason, or members of a target population who do not participate in an activity. Differences between participants and nonparticipants have been demonstrated repeatedly in studies of many kinds, and this is often a source of bias.
Null Hypothesis
The statistical hypothesis that one variable has no association with another variable or set of variables, or that two or more population distributions do not differ from one another. In simplest terms, the null hypothesis states that the results observed in a study, experiment, or test are no different from what might have occurred as a result of the operation of chance alone.
O
Observational Study
Epidemiologic study in situations where nature is allowed to take its course; changes or differences in one characteristic are studied in relation to changes or differences in other(s), without the intervention of the investigator.
Odds
The ratio of the probability of occurrence of an event to that of nonoccurrence, or the ratio of the probability that something is so, to the probability that it is not.
Odds Ratio
The ratio of two odds. The exposure-odds ratio for a set of case-control data is the ratio of the odds of exposure in the cases to the odds of exposure among noncases. The disease-odds ratio for a cohort or cross-sectional study is the ratio of the odds of disease among the exposed to the odds of disease among the unexposed.
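A minimal Python sketch showing that, for the same 2x2 table, the exposure-odds ratio and the disease-odds ratio both reduce to ad/bc. The table layout, function name, and counts are hypothetical.

```python
def odds_ratios_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Exposure-odds ratio and disease-odds ratio for the table
        exposed:   a cases, b noncases
        unexposed: c cases, d noncases
    """
    exposure_odds_ratio = (a / c) / (b / d)  # odds of exposure in cases vs. in noncases
    disease_odds_ratio = (a / b) / (c / d)   # odds of disease in exposed vs. in unexposed
    return exposure_odds_ratio, disease_odds_ratio


# Hypothetical counts: both ratios reduce algebraically to a*d / (b*c) = 2.25.
exposure_or, disease_or = odds_ratios_2x2(20, 80, 10, 90)
print(round(exposure_or, 3), round(disease_or, 3))  # 2.25 2.25
```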
Outcomes
All the possible results that may stem from exposure to a causal factor, or from preventive or therapeutic interventions.
P
P Value (Probability Value)
The probability that a test statistic would be as extreme as or more extreme than observed if the null hypothesis were true. The letter P, followed by the abbreviation n.s. (not significant) or a number, is a statement of the probability that a difference at least as large as the one observed could have occurred by chance if the groups are really alike.
In most biomedical and epidemiologic work, a study result whose probability value is less than 5% (P<0.05) or 1% (P<0.01) is considered sufficiently unlikely to have occurred by chance to justify the designation "statistically significant."
Person-Time
A measurement combining persons and time, used as denominator in person-time incidence and mortality rates. It is the sum of individual units of time that the persons in the study population have been exposed to the condition of interest.
Population Attributable Risk
Term used by some in preference to the terms "attributable fraction (population)" or "etiologic fraction (population)." It is the incidence of a disease in a population that is associated with (attributable to) exposure to the risk factor.
Population Attributable Risk %
The attributable fraction in the population, expressed as a percentage.
Population Based
Pertaining to a general population defined by geopolitical boundaries; this population is the denominator/sampling frame.
Precision
The quality of being sharply defined or stated. Precision does not imply accuracy.
Predictive Value
In screening and diagnostic tests, the probability that a person with a positive test is a true positive (does have the disease) is referred to as the "predictive value of a positive test." The predictive value of a negative test is the probability that a person with a negative test does not have the disease.
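Predictive values depend on the prevalence of disease as well as on the test itself. A minimal Python sketch using Bayes' theorem, with a hypothetical test of 90% sensitivity and 95% specificity:

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Positive and negative predictive values computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test with 90% sensitivity and 95% specificity.
for prevalence in (0.01, 0.20):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(prevalence, round(ppv, 3), round(npv, 3))
```

With these figures the predictive value of a positive test rises from about 0.15 at 1% prevalence to about 0.82 at 20% prevalence.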
Prevalence
The number of instances of a given disease or other condition in a given population at a designated time.
R
Randomization
Allocation of individuals to groups, e.g., for experimental and control regimens, by chance. Within the limits of chance variation, randomization should make the control and experimental groups similar at the start of an investigation.
Randomized Controlled Trial
An epidemiologic experiment in which subjects in a population are randomly allocated into groups, usually called "study" and "control" groups, to receive or not receive an experimental preventive or therapeutic procedure or intervention.
Random Sample
A sample that is arrived at by selecting sample units such that each possible unit has a fixed and determinate probability of selection.
Rate
A rate is a measure of the frequency of a phenomenon. In epidemiology, demography, and vital statistics, a rate is an expression of the frequency with which an event occurs in a defined population; the use of rates rather than raw numbers is essential for comparison of experience between populations at different times, different places, or among different classes of persons.
Rate Difference
The absolute difference between two rates.
Rate Ratio
The ratio of two rates. In epidemiologic research, the term refers to the ratio of the rate in the exposed population to the rate in the unexposed population.
Relative Risk
The ratio of the risk of disease or death among the exposed to the risk among the unexposed; this usage is synonymous with risk ratio.
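A minimal Python sketch of the risk ratio (from counts of persons) alongside the rate ratio and rate difference (from person-time). The function names and cohort figures are hypothetical.

```python
def risk_ratio(cases_exposed: int, n_exposed: int, cases_unexposed: int, n_unexposed: int) -> float:
    """Ratio of cumulative risks (relative risk) in exposed vs. unexposed groups."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


def rate_ratio_and_difference(cases_exposed: int, pt_exposed: float,
                              cases_unexposed: int, pt_unexposed: float) -> tuple[float, float]:
    """Ratio and absolute difference of incidence rates based on person-time."""
    rate_exposed = cases_exposed / pt_exposed
    rate_unexposed = cases_unexposed / pt_unexposed
    return rate_exposed / rate_unexposed, rate_exposed - rate_unexposed


# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed became ill.
print(round(risk_ratio(30, 1000, 10, 1000), 2))  # 3.0
ratio, difference = rate_ratio_and_difference(30, 4800.0, 10, 4950.0)
print(round(ratio, 2), round(difference * 1000, 2))  # difference per 1,000 person-years
```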
Reliability
The degree of stability exhibited when a measurement is repeated under identical conditions. Reliability refers to the degree to which the results obtained by a measurement procedure can be replicated.
Repeatability
A test or measurement is repeatable if the results are identical or closely similar each time it is conducted.
Retrospective Study
A research design that is used to test etiologic hypotheses in which inferences about exposure to the putative causal factor(s) are derived from data relating to characteristics of the persons under study or to events or experiences in their past.
Risk
The probability that an event will occur, e.g., that an individual will become ill or die within a stated period of time or age.
Risk Factor
An aspect of personal behavior or lifestyle, an environmental exposure, or an inborn or inherited characteristic, which on the basis of epidemiologic evidence is known to be associated with health-related condition(s) considered important to prevent.
Risk Ratio
The ratio of two risks.
S
Sample
A selected subset of a population. A sample may be random or nonrandom and may be representative or nonrepresentative.
Sampling
The process of selecting a number of subjects from all the subjects in a particular group or "universe." Conclusions based on sample results may be attributed only to the population sampled.
Sensitivity and Specificity
(of a screening test) Sensitivity is the proportion of truly diseased persons in the screened population who are identified as diseased by the screening test; it is a measure of the probability of correctly diagnosing a case. Specificity is the proportion of truly nondiseased persons who are identified as nondiseased by the screening test; it is a measure of the probability of correctly identifying a nondiseased person.
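A minimal Python sketch computing both measures from a hypothetical validation table, in which the screening test has been compared against a "gold standard" diagnosis. The function name is an assumption for illustration.

```python
def sensitivity_specificity(true_pos: int, false_neg: int,
                            false_pos: int, true_neg: int) -> tuple[float, float]:
    """Sensitivity and specificity of a screening test from a 2x2 validation table."""
    sensitivity = true_pos / (true_pos + false_neg)  # diseased persons who test positive
    specificity = true_neg / (true_neg + false_pos)  # nondiseased persons who test negative
    return sensitivity, specificity


# Hypothetical validation: 100 diseased and 900 nondiseased persons.
sens, spec = sensitivity_specificity(true_pos=90, false_neg=10, false_pos=45, true_neg=855)
print(sens, spec)  # 0.9 0.95
```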
Sequential Analysis
A statistical method that allows an experiment to be ended as soon as an answer of the desired precision is obtained.
Socioeconomic Classification
Arrangement of persons into groups according to such characteristics as prior education, occupation, and income. Analysis usually reveals a strong correlation between such groupings and health-related characteristics such as average length of life and risk of dying from certain specific causes.
Socioeconomic Status (SES)
Descriptive term for a person's position in society.
Standard Error
The standard deviation of an estimate.
Standardization
A set of techniques used to remove as far as possible the effects of differences in age or other confounding variables, when comparing two or more populations. The common method uses weighted averaging of rates specific for age, sex, or some other potential confounding variable(s) according to some specific distribution of these variables.
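A minimal Python sketch of the weighted-averaging (direct) method for age. The two hypothetical populations share identical age-specific rates but differ in age structure, so their crude rates differ while their standardized rates agree.

```python
def weighted_rate(stratum_rates: list[float], weights: list[float]) -> float:
    """Average of stratum-specific rates weighted by the given population counts."""
    return sum(r * w for r, w in zip(stratum_rates, weights)) / sum(weights)


# Hypothetical populations with identical age-specific death rates
# (young, old) but different age structures.
rates = [0.001, 0.020]
population_a = [8000, 2000]
population_b = [4000, 6000]
standard = [5000, 5000]  # chosen standard population

# Crude rates weight by each population's own age structure and so differ.
print(weighted_rate(rates, population_a), weighted_rate(rates, population_b))
# Directly standardized rates use the same standard weights and so agree.
print(weighted_rate(rates, standard))
```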
Statistical Significance
Statistical methods allow an estimate to be made of the probability of the observed or greater degree of association between independent and dependent variables under the null hypothesis. Usually the level of statistical significance is stated by the P value.
Stochastic Process
A process that incorporates some element of randomness.
Stratification
The process or result of separating a sample into several subsamples according to specified criteria such as age groups, socioeconomic status, etc. The effect of confounding variables may be controlled by stratifying the analysis of results.
Surveillance
Ongoing scrutiny, generally using methods distinguished by their practicability, uniformity, and frequently their rapidity, rather than by complete accuracy. Its main purpose is to detect changes in trend or distribution in order to initiate investigative or control measures.
T
Target Population
The collection of individuals, items, measurements, etc., about which we want to make inferences. The term is sometimes used to indicate the population from which a sample is drawn and sometimes to denote any "reference" population about which inferences are required.
V
Validity, Study
The degree to which the inferences drawn from a study, especially generalizations extending beyond the study sample, are warranted when account is taken of the study methods, the representativeness of the study sample, and the nature of the population from which it is drawn.
Internal Validity: The index and comparison groups are selected and compared in such a manner that the observed differences between them on the dependent variables under study may, apart from sampling error, be attributed only to the hypothesized effect under investigation.
External Validity: A study is externally valid or generalizable if it can produce unbiased inferences regarding a target population (beyond the subjects in the study).
Vital Statistics
Systematically tabulated information concerning births, marriages, divorces, separations, and deaths based on registrations of these vital events.
Source: Last, J. M. A Dictionary of Epidemiology, 2nd ed. Oxford University Press.