Methodology of case-control studies in the epidemiology of multidrug-resistant tuberculosis

M. Vaquero, J. Gutiérrez and M.J. Casal

Mycobacteria Reference Center, Faculty of Medicine, University of Cordoba, Spain.


Tuberculosis (TB) is the leading cause of death worldwide due to a single infectious agent. It has been estimated to be responsible for around 7% of all deaths and 26% of preventable deaths throughout the world. Most of these deaths are among young adults. The emergence of the HIV epidemic in the last 10 years has not only increased the problem of TB, but has also raised questions over the adequacy of current diagnostic, treatment and prevention methods. Furthermore, the increase in multidrug-resistant TB, with high death rates especially among HIV-infected patients, has revealed the shortcomings in current control and prevention methods (1).


From a bacteriological point of view, a population of Mycobacterium tuberculosis is resistant if 1% or more of the microorganisms multiply and develop in the presence of a specific concentration of a drug. Drug-resistant TB occurs when the number of bacilli resistant to the drug exceeds that of the bacilli susceptible to it. This occurs because of the selection and multiplication of resistant mutants which arises from inappropriate treatment. From a clinical point of view, there are two types of resistance to drugs: primary resistance and acquired (or secondary) resistance. Primary resistance is found in people who have never been treated for TB and who are infected with resistant microorganisms. Acquired resistance develops during treatment for TB, either because the patient has been treated with an inadequate regimen or because he or she did not correctly follow the regimen prescribed.

Multidrug-resistant TB is defined as the in vitro resistance of a strain of M. tuberculosis to two or more antituberculous drugs. From the clinical perspective, resistance to isoniazid and rifampicin is the most important pattern of resistance as it has been linked to a much lower cure rate than in drug-susceptible TB (2).

The World Health Organization (WHO) provides regular information on resistance to drugs for TB at a worldwide level. The overall trend is stable and even shows a slight decline if we look at initial resistance to one or more drugs. However, the increase in initial resistance to rifampicin and in multidrug resistance in some countries is a cause for concern (3).

With regard to acquired resistance, which indicates that earlier treatment has failed, the available global reports show that this type of resistance has decreased over the last 30 years (3). The two main reasons for the development of clinical drug resistance are the lack of compliance with prescribed treatment and the use of inadequate treatment regimens.

The emergence of drug-resistant bacilli can be prevented by treating TB with a combination of two or more drugs. During the early phase of treatment, combined therapy is extremely important as this is when the bacterial population is at its highest (4). Antituberculous drugs vary in their ability to prevent the appearance of resistance to other drugs; the most effective seem to be isoniazid and rifampicin. In order to prevent the emergence of resistance when other comparatively weaker drugs are used, a combination of three drugs is required. Later, during continued treatment, combined therapy is of lesser importance because the level of the bacillary population is lower. Resistance to drugs is much less likely to occur in most types of extrapulmonary TB, which have much smaller populations of bacilli.

Drug resistance occurs more frequently in people living in parts of the world where the prevalence of drug-resistant TB is high (Southeast Asia, Latin America, Haiti and the Philippines). A number of studies have found high rates of primary drug resistance among these people, suggesting that drug-resistant TB is transmitted in the country of origin. Nevertheless, some patients treated previously may be mistakenly classified as cases of primary resistance, since it can be difficult to obtain an accurate history of their treatment.

Multidrug-resistant TB was reported shortly after the introduction of antituberculous drugs, but it was not until 1970 that outbreaks of this kind of TB were seen to be important. Between 1970 and 1990, there were microepidemics of TB resistant to isoniazid, as well as to several drugs; they affected a few people with close contact who had had prolonged or repeated exposure to probands.

From 1990 to October 1992, the Centers for Disease Control and Prevention (CDC) investigated epidemic outbreaks of TB resistant to numerous drugs in hospitals in Florida, New York and New Jersey, and in the penitentiary system in the state of New York.

In recent epidemics, transmission has been from patient to patient and from patient to health-care worker. Nosocomial transmission was confirmed by DNA typing of the bacteria: strains from epidemiologically linked cases had identical patterns according to the restriction fragment length polymorphism (RFLP) technique. There is no evidence that people infected with HIV are more likely to be infected by the TB bacillus if they are exposed to it. However, once infected, patients who have HIV infection are at a much greater risk of developing active disease than people who are not. In addition, TB infection may progress rapidly towards active disease. In these outbreaks, cases of active TB developed several weeks after exposure to the disease. These patients themselves went on to become a source of transmission, meaning that in a short space of time they caused multiple cases.

Prolonged infectiousness also aids transmission. The diagnosis of TB was delayed in people infected with HIV due to radiography patterns not typical of TB, coinfection with other pulmonary pathogens to which patients' symptoms were attributed, and to the excessive isolation in the laboratory of mycobacteria other than M. tuberculosis. Slow diagnosis led to delays in starting treatment and in taking steps to control the source of infection. In addition, resistance to antituberculous drugs was reported late, making effective treatment difficult and prolonging infectiousness.

One important risk factor in drug resistance is prior treatment with antituberculous medication. In a study conducted in New York by Frieden et al., the most reliable predictor of the presence of drug-resistant microorganisms was a history of previous treatment (5). Other studies have shown that drug-resistance rates increase with the duration of prior treatment. In most cases, drug resistance linked to prior treatment occurs because the treatment was inadequate (6). Furthermore, previous treatment has not been linked with drug resistance when short treatment regimens have been adequately supervised (7). Bacterial relapses after completion of supervised short-course chemotherapy with regimens containing both isoniazid and rifampicin are usually due to microorganisms susceptible to the drugs. Such patients respond well to a repeat of the previous treatment (8).

Another risk factor in multidrug-resistant TB is being in contact with a person who has infectious drug-resistant TB (9). Recent nosocomial outbreaks show a clear link between prior exposure to a patient who has infectious TB resistant to numerous drugs and the subsequent development of TB which is also resistant to the same drugs (10). Moreover, people previously treated for TB susceptible to antituberculous drugs may be reinfected with strains resistant to these drugs (11).


Descriptive epidemiology studies do not explain the causes of diseases: they provide us with ideas of how they spread. In order to determine whether certain characteristics of a person, time and place give rise to health problems, we require analytical studies based on numerous comparisons. Analytical studies form the basis of etiological research and are useful in helping us determine the risk factors in multidrug-resistant TB.

According to the general definition of cause, a "cause" (etiological factor) must be modifiable, and its modification must produce changes in its effect (patients' state and propagation of the disease).

The causes of disease are often called risk factors. This term is frequently misused as it represents only part of the risk characteristics (i.e., any variable that may be associated with a high probability of disease). Grundy (12) perceives two elements in risk characteristics: risk markers (age, gender, etc.) which cannot be changed, but which nevertheless mean there is a high probability of disease in the future; and risk factors which are modifiable (smoking, diet, etc.).

In statistical terms, in keeping with the cause and effect sequence, epidemiology studies explore the relationships between independent variables (factors or risk markers, causes, etc.) and dependent variables (consequences, diseases, etc.). Identifying direct causes remains one of the major challenges in etiological research in medicine.

Etiological constellations or clusters of causes may appear at a given moment or in a particular sequence. If more than one "cause" is required to produce an effect, the "minimum sufficient causes" must be known. For example, an epidemic of multidrug-resistant TB is not due solely to the presence of isoniazid and rifampicin-resistant M. tuberculosis in the environment, but to inappropriate treatment, prolonged infectiousness, etc.

The most commonly used studies in epidemiology to determine the causes of multidrug-resistant TB are case-control studies. A case-control study may be defined as the comparison of earlier frequency of exposure (to a disease risk factor, protection factor or attribute) in at least two groups of subjects selected on the basis of their status with respect to a particular disease or condition. People with the disease (cases) and people without the disease (controls) are studied and compared in relation to attributes or exposure in the present or past that may be relevant to the development of the condition or disease (13). Instead of numerous measurements of the state of health and exposure, as required in cohort studies (at least two – one at the start and one at the end of a defined period), case-control studies are based on a single identification of the state of health and exposure in the past or present for each individual being studied. In terms of direction, a case-control study begins at the end of the natural history of the disease and "looks back" towards the exposure that might have given rise to the consequence (disease) being studied.

Generally, fewer individuals are analyzed than in a cohort study. A limited number of cases is usually defined and a control group is chosen that excludes cases (one or more controls per case). Exposure or lack of exposure is determined for all participants, and estimates of the strength and specificity of the causal link are obtained, while also saving time and resources.


Fundamental principles

It is not possible to calculate individual risks for exposed and unexposed subjects in a case-control study. Unfortunately, these studies are almost always based on prevalence (not incidence) and the denominators are ambiguous or unknown. Nevertheless, most of the studies of this type applied to multiresistance work with the number of incident cases of tuberculosis reported in a given period of time.

The best case-control study is one which has correctly neutralized systematic bias or errors (14); these errors may be derived from the selection of the cases and controls, from the gathering of information and the skewing that may be caused by confounding factors. Wacholder et al. (15) indicate four basic principles to be kept in mind when designing a case-control study: the basis of the study – to reduce bias in selection; the comparability of the measurement of exposure – to reduce bias in the information; "deconfounding" – to control the effect of other confounding factors; and efficiency – related to the appropriate use of time and resources.

Sample, case and control sources

One of the first problems that emerges when designing a case-control study is defining the population. Both the criteria and the source for selecting the subjects in the study must be fully representative of this base population. There are various sources that can be used for selecting cases and controls, including base population records, such as tumor registries, records of epidemiological surveillance systems or vital statistics (deaths), and records from hospitals, primary health care centers, companies, insurance companies, etc.

Including individuals randomly selected from population records prevents bias in the selection and makes it possible to calculate the risk attributable to factors being studied in the population. The drawbacks are that this usually takes longer, it is more expensive to locate, identify and obtain information (16), and individuals selected in this manner are less likely to cooperate.

Defining and selecting cases

The study population may include all the cases that occurred in a particular period of time, or a sample may be selected when the number of cases is very large. Cases of TB that develop multiresistance within a period of time are selected for multidrug-resistant TB studies (17-19). Perfectly valid case-control studies can also be done by restricting cases by age, gender, place of residence or any other condition or characteristic of interest (20).

Suppose that we wish to conduct a case-control study to verify the association between the risk of multidrug-resistant TB and noncompliance with treatment. The cases could include all those of multidrug-resistant TB that occurred in a sample in a given year, or we could restrict it to cases in a district within the city, to those that occurred in penitentiary institutions, etc. Some authors (13) have advocated a case-restriction criterion based on the principle of the possibility of exposure: to restrict the study only to certain cases of multidrug-resistant TB (20-22).

The aim of this restriction would be to prevent a weakening of the estimated association and a decrease in the accuracy of the effect. In reality, however, this restriction criterion is related to efficiency: it reduces the number of subjects and interviews and reduces the cost of the study without affecting its validity, weakening the association or distorting the value of the estimated effect (23).

One important aspect regarding the determination of cases to be included in the study is the condition of the patient. It is possible to include only incident cases (i.e., recently diagnosed cases) or prevalent cases (cases already existing at the start of the study; 24), living cases or dead cases. The best approach is to take only incident cases and living subjects at the time of the interview. It should be kept in mind that incident cases have a better recollection of their past experience as it is more recent. Their survival is not affected by risk factors, as may occur in prevalent cases, and it is less likely that the disease will modify the exposure being studied. Prevalent cases may be included in the study in certain special situations, when there are not enough cases, and above all when dealing with uncommon diseases. It may be justifiable to include dead cases in the study in special circumstances, depending on the aims of the study and the sources of information used. When collecting detailed information on a subject's past, it should be remembered that if the source of information is the family, they will not know the details as well as the subject him or herself (25). If, however, the subject's clinical history or company records are used as a source of information, it is of no great importance whether the person is alive or dead (26).

It is also important to consider the criteria for defining cases. Published case-control studies on multiresistance establish clinical criteria (20) and bacteriological criteria that include the identification of the microorganism responsible for the process, tests on susceptibility to antimicrobials (27, 28) and even genetic constitution (18, 29). The diagnostic criteria used will affect whether or not false-positive cases are included in the study. The more false-positive cases are included in an etiological study on risk factors, the more the estimated association will be weakened. In the case of multidrug-resistant TB, the cases would be TB patients with a strain of M. tuberculosis resistant to at least isoniazid and rifampicin, although clinical and/or therapeutic criteria that classify the multiresistance as primary or acquired may be added (28).

Definition and selection of controls

Selecting controls has been seen as one of the most difficult issues in a case-control study. The most appropriate controls must be carefully defined for each study. The idea that controls must be similar to cases in all aspects except for not having the disease is not correct (30).

The first requirement in selecting a control is that he or she must come from the same population as the case. A practical aspect to bear in mind is that subjects selected as controls must represent subjects that may turn into cases in the study (17, 26, 27). The number of those exposed to one or more of the risk factors studied in unhospitalized cases may be different from the number of those exposed to the same risk factors among the individuals comprising the population base that the cases come from. As a result, inclusion and exclusion criteria are generally used for controls that will prevent the under- or overrepresentation of conditions associated with the risk factors being studied and which might distort the representative nature of the controls. A list of inclusion and exclusion diagnostic conditions may also be established.

It is important to ensure that the probability of an admitted subject being selected as a control is not connected to exposure, i.e., it does not depend on a positive or negative association with the factors being studied.

The advantages and disadvantages of using healthy or sick subjects as controls must also be considered. Sick subjects may have a better recollection of past events related to their disease. The researcher has a number of alternatives when it comes to selecting controls. One possibility is to conduct an unpaired study (27, 31) by obtaining a random sample of controls from the base population of the study, regardless of the characteristics of the cases. Another option is to conduct a paired study by selecting controls in accordance with one or more characteristics shared among the cases (26), such as age group, gender, etc. Pairing may be one-to-one (each control is paired to one case) or on a group basis (known as frequency pairing).

Pairing has several advantages (32): it increases efficiency (there is more useful information for analysis); it is useful in controlling known confounding factors; and it also protects against unknown or unmeasured confounding factors. When, for example, the distribution of cases by age is very diverse, pairing controls by age can ensure that the number of controls per case will be similar in all age groups, which will be useful for evaluating an effect in the different groups. When pairing is done using variables related to time (e.g., year of TB relapse, year of death, etc.), it makes it easier to compare the cases and controls in relation to exposures that vary in time.

One drawback is that pairing sometimes makes it more difficult to identify and obtain controls that meet the requirements (it takes longer or is more expensive). In addition, pairing also reduces the analysis possibilities as it is impossible to analyze the risk effect of a pairing variable because it is, by definition, similar in both the cases and the controls. It is, however, possible to measure the modification of the effect of exposure produced by the pairing variable. The variables most commonly recommended or to which pairing is applied are age, gender and race (21).

In multidrug-resistant TB, the controls may be TB patients in whom M. tuberculosis susceptible to antituberculous drugs and/or strains resistant to a single antituberculous agent have been isolated and identified. In our experience, pairing is more complicated in multicenter studies due to the difficulties of obtaining controls with the same age and gender characteristics as the cases.

Size of study, accuracy and power

In the design phase of a case-control study, the size of the study must be determined, i.e., the number of cases and the number of controls to be included. The study must be of sufficient size to prevent two types of errors: α or type I error and β or type II error. The α error expresses the probability of concluding that there is a link to exposure when in reality there is not. The β error is defined as the probability of not detecting a real effect (the results of the study find there is no link when in fact there is).

The power of a study (in relation to size) is its capacity to reveal a link when this link really exists. The power is equal to one minus the β error: the higher the β error, the lower the power. Accuracy is related to the reduction of random errors or errors caused by chance. Increasing the size of the study reduces the probability of there being an error due to chance. The larger the study, the greater the accuracy of the estimates and the lower the probability of finding false results due to chance.

However, increasing the size of a study does not increase its validity and may in fact reduce its efficiency. Validity is determined by the absence of systematic errors (in selection, information and confounding). Accuracy is linked to the exactness of the estimate of the effect, but not to validity. It is possible to conduct a very large study with limited validity, while a small study may be highly valid.

The elements that define the size of a study are the level of preestablished statistical significance (α error or type I error); the desired power; the expected number of exposed controls (the prevalence of exposure in the absence of disease); the magnitude of the expected effect (of the relative risk); and the ratio between the number of cases and the number of controls (33).

There are formulae for calculating the size of the study and there are also tables available (13) that make it possible to estimate size according to these parameters. It is also possible to work out the degree of power of a study given a number of cases and controls, an effect magnitude and a determined level of exposure among the controls. In general terms, the greater the magnitude of the effect and the higher the proportion of exposed controls, the smaller the study size required.
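As a rough illustration of how these parameters interact, the sketch below uses one commonly cited normal-approximation formula for unmatched case-control studies. It is not necessarily the formula behind the tables cited above, and the function names, default values and example inputs are assumptions introduced only for illustration.

```python
from math import ceil
from statistics import NormalDist


def exposure_in_cases(p0: float, odds_ratio: float) -> float:
    """Proportion of exposed cases implied by the exposure prevalence
    among controls (p0) and the odds ratio the study should detect."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))


def cases_needed(p0: float, odds_ratio: float, controls_per_case: int = 1,
                 alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases required (hypothetical helper).

    alpha is the two-sided type I error; power is one minus the type II error.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    r = controls_per_case
    p1 = exposure_in_cases(p0, odds_ratio)
    # Exposure prevalence pooled over cases and controls.
    p_bar = (p1 + r * p0) / (1 + r)
    n = ((r + 1) / r) * p_bar * (1 - p_bar) * (z_alpha + z_beta) ** 2 / (p1 - p0) ** 2
    return ceil(n)


if __name__ == "__main__":
    # Illustrative inputs only: 20% of controls exposed, OR of 2 to be detected.
    for r in (1, 2, 4):
        print(r, "control(s) per case:", cases_needed(0.20, 2.0, r), "cases")
```

Under this approximation, a larger expected effect or more controls per case lowers the number of cases required, which matches the qualitative statements above.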

One of the components that defines the size of the study is the ratio of cases to controls, or in other words, the number of controls per case, which determines the total number of individuals involved in the study. This is of practical importance because when the number of available cases is relatively low, more controls per case may be included, thus maintaining the power of the study at the same level.

There are mathematical formulae that demonstrate that more than four controls per case gives only a minimal gain in power, meaning that it is advisable to include two, three and even four controls per case, but no more. Logically, when the number of available cases and controls for a study is high, and the cost and the difficulties in selecting, locating and interviewing them are similar, it is more reasonable to conduct the study with one control for each case.
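The diminishing return from additional controls can be seen with a standard approximation that is not taken from the studies reviewed here: assuming no association and equal costs per subject, the efficiency of r controls per case relative to an unlimited supply of controls is roughly r/(r + 1). The function name below is hypothetical.

```python
def relative_efficiency(controls_per_case: int) -> float:
    """Approximate efficiency of r controls per case, relative to
    an unlimited number of controls: r / (r + 1)."""
    return controls_per_case / (controls_per_case + 1)


if __name__ == "__main__":
    for r in range(1, 7):
        print(f"{r} control(s) per case: {relative_efficiency(r):.2f}")
```

Moving from one to two controls per case raises this figure from 0.50 to about 0.67, whereas moving from four to five raises it only from 0.80 to about 0.83.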

The studies reviewed vary in sample size. They all use a small number of cases, but the number of controls can be similar to the number of cases, the total number of TB cases reported (18) or the total number of cases identified for which susceptibility to antimicrobials was tested (28).

Information gathering

The basic information in a case-control study is the past exposure of both the cases and the controls. The tools and markers used to define a category or level of exposure may vary in sensitivity and specificity.

The new fields of molecular biology and the various laboratory techniques to measure exposure are of great importance in the future of epidemiology (17, 18, 29). The biological indicators of exposure open up new horizons that may have a significant impact on the study of numerous risk factors, although they still raise many problems in the interpretation, validity and receptivity of the results, as well as in the standardization of laboratory techniques.

Classic tools used as sources of information include records (e.g., clinical histories, company records, insurance company records, etc.) and special questionnaires designed for the study. Questionnaires have been and remain the most common and useful means of gathering information.

Questionnaires for finding out about risk factors in resistance to antituberculous drugs may include open or closed questions. Closed questions may be more uniform and precise and may be easier to code and record. Questionnaires may be completed during an interview in person or over the phone, or may be completed by individuals and sent in by post. Self-administered questionnaires are easy to use with a large number of people, they reduce costs because there is no need for an interview, and they also reduce the amount of time required to gather the information. They also eliminate bias that the interviewer may introduce, but they can generate a very high proportion of "no response" answers and it is not possible to be certain about who really answered the questions.

The information to be gathered, the system used to collect it, the interview method and the structure of the questionnaire should be defined in the study design.

One of the classic limitations of case-control studies is that by studying the existence of exposure in the past, it may be difficult to obtain accurate, detailed information. Depending on the information to be obtained, it is possible that the subject may not remember distant events in detail. In order to avoid this, information from the recent past is gathered with the assumption that it is representative of the distant past.

The most important point is to ensure that, if there is an error in the information on exposure, it affects both the cases and the controls in a similar manner. This is the so-called nondifferential error in classifying exposure (34), which leads to an underestimate of the effect of exposure. Consequently, if no effect is observed, we should always consider that the negative result may be the consequence of nondifferential error in classifying exposure.
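A small numerical sketch can show how an error that affects cases and controls equally pulls the estimate towards the absence of effect. The counts, the sensitivity (0.8) and the specificity (0.9) of the exposure measurement are invented for illustration, and the calculation works on expected counts rather than on a random simulation.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """OR from exposed cases (a), exposed controls (b),
    unexposed cases (c) and unexposed controls (d)."""
    return (a * d) / (b * c)


def misclassify(exposed: float, unexposed: float,
                sensitivity: float, specificity: float) -> tuple[float, float]:
    """Expected counts classified as exposed and unexposed when the
    exposure measurement is imperfect."""
    observed_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    observed_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return observed_exposed, observed_unexposed


# Hypothetical true exposure status: 100 cases and 100 controls.
true_a, true_c = 60, 40  # cases: exposed, unexposed
true_b, true_d = 30, 70  # controls: exposed, unexposed

# The same measurement error is applied to both groups (nondifferential).
obs_a, obs_c = misclassify(true_a, true_c, sensitivity=0.8, specificity=0.9)
obs_b, obs_d = misclassify(true_b, true_d, sensitivity=0.8, specificity=0.9)

true_or = odds_ratio(true_a, true_b, true_c, true_d)
observed_or = odds_ratio(obs_a, obs_b, obs_c, obs_d)

if __name__ == "__main__":
    print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these invented figures the observed OR is smaller than the true OR but still above 1, which is the attenuation described in the text.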

The information gathered will depend on the aims of the study and on the risk factors being researched. The information should not only cover the risk factors being studied, but also other variables which may modify the effect or which may be potential confounding factors to be controlled in the analysis.

It is advisable to conduct a prior study on the validity of the tools or to use tools that have already been validated. In addition, conducting a pilot study on a small sample of subjects with characteristics similar to those of the overall study will make it possible to check the feasibility and viability of the information gathering techniques. A pilot study is essential when questionnaires and interviewers are used.

To avoid the "halo" effect in case interviews, the same interviewers must be used for the controls so that any systematic error introduced by an interviewer affects both the cases and the controls in a similar way. In addition, interviews with cases must be held at the same time as interviews with controls in order to eliminate any influence due to changes in the season or weather. It is also better to ensure that, in as much as possible, interviewers are unaware of the hypothesis of the study and of whether they are interviewing cases or controls so that they have no influence on the gathering of the information.


Before starting the analysis phase of the study, it is necessary to check the quality of the information gathered through the questionnaires, and then code it. In addition, all the variables should first be examined, the minimum, maximum and aberrant values should be checked and the data corrected before beginning the analysis.

Measuring the effect

There are two possible measures of the effect of exposure in a case-control study: the relative measure and the absolute measure.

The relative measure of the effect is indicated by the RR, which measures the strength of the association, i.e., it measures how many times the disease (or effect being studied) is more frequent among the exposed than the unexposed. The RR is the rate ratio or relative rate as it expresses the ratio between the rate of incidence among the exposed in comparison with that among the unexposed.

Table 1. Measuring the effect in case-control studies: 2 × 2 table for calculating the odds ratio (OR).

                               MDRTB: yes    MDRTB: no
                               (cases)       (controls)    Total
Exposure to the variable       a             b             a + b
No exposure to the variable    c             d             c + d
Total                          a + c         b + d         N

MDRTB = multidrug-resistant tuberculosis.

\[ \mathrm{OR} = \frac{\text{odds of exposure given the disease}}{\text{odds of exposure in the absence of disease}} = \frac{[a/(a+c)] \,/\, [c/(a+c)]}{[b/(b+d)] \,/\, [d/(b+d)]} = \frac{a/c}{b/d} = \frac{a \times d}{b \times c} \]

\[ \mathrm{RR} = \frac{\text{incidence rate among the exposed}}{\text{incidence rate among the unexposed}} \]

In a case-control study, we do not usually have the information necessary to enable us to calculate the incidence rates, so we calculate the odds ratio (OR), which is considered to be a good estimator of RR when the duration of the disease is not associated with exposure.

In such a study, the information that is analyzed in relation to a variable is that given in a simple 2 × 2 table (Table 1). As shown in Table 1, the column totals for "yes" and "no" (the total numbers of cases and controls) are known in advance because they are defined in the design. The unknown values to be estimated are "a" and "b" ("c" and "d" are only their complements): a/(a + c) represents the probability of exposure in the cases, while b/(b + d) represents the probability of exposure in the controls.
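The calculation from the cells of Table 1 can be sketched as follows. The cross-product follows the table directly; the confidence interval uses the logarithmic (Woolf) approximation, which is an addition not discussed in the text, and the counts in the example are invented.

```python
from math import exp, log, sqrt


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-product ratio (a x d) / (b x c), with a = exposed cases,
    b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def woolf_interval(a: float, b: float, c: float, d: float,
                   z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for the OR on the log scale.
    Requires all four cells to be greater than zero."""
    log_or = log(odds_ratio(a, b, c, d))
    standard_error = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or - z * standard_error), exp(log_or + z * standard_error)


if __name__ == "__main__":
    # Hypothetical study: 50 cases (30 exposed) and 50 controls (10 exposed).
    a, b, c, d = 30, 10, 20, 40
    low, high = woolf_interval(a, b, c, d)
    print(f"OR = {odds_ratio(a, b, c, d):.1f} (95% CI {low:.2f} to {high:.2f})")
```

An interval that excludes 1 suggests that the association is unlikely to be explained by chance alone, although it says nothing about bias or confounding.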

The odds (sometimes rendered as "possibility") are defined as a probability divided by its complement, i.e., the probability that an event will occur divided by the probability that the event will not occur:

\[ \text{odds} = \frac{P}{1 - P} \]

The OR expresses the odds of exposure given the disease divided by the odds of exposure in the absence of disease:

\[ \mathrm{OR} = \frac{\text{odds of exposure given the disease}}{\text{odds of exposure in the absence of disease}} \]

The formula for this ratio can be deduced and simplified, and is equivalent to the odds of disease among those exposed divided by the odds of disease among those not exposed (the equivalent to the incidence rates ratio).

It must be remembered that the OR is a value that has no unit: it is simply a ratio. Its minimum value may be zero (the maximum possible effect of a protection factor) and there is no limit to its maximum value. An OR of 1 signifies that the effect studied is nonexistent as there is no difference in risk associated with exposure, i.e., exposure neither increases nor reduces risk. An OR of 0.5 indicates that exposure reduces the risk by 50%.

Depending on the type of exposure, the OR is generally calculated in accordance with whether the sample is or has been affected or unaffected by the exposure in question. It is also calculated in accordance with specific categories of exposure, which may represent different degrees of strength or periods of time of exposure. The odds ratios can easily be calculated for each category or level of exposure.
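When exposure has several categories, each level is usually compared with a common reference level (typically the unexposed). The sketch below assumes that convention; the function name, the level labels and the counts are hypothetical.

```python
def odds_ratios_by_level(cases: dict[str, int], controls: dict[str, int],
                         reference: str) -> dict[str, float]:
    """OR for each exposure level relative to the reference level.

    cases and controls map each exposure level to the number of subjects
    observed at that level.
    """
    ref_cases = cases[reference]
    ref_controls = controls[reference]
    return {
        level: (cases[level] * ref_controls) / (controls[level] * ref_cases)
        for level in cases
    }


if __name__ == "__main__":
    # Invented counts for three levels of exposure.
    cases = {"none": 20, "low": 30, "high": 50}
    controls = {"none": 50, "low": 30, "high": 20}
    for level, value in odds_ratios_by_level(cases, controls, "none").items():
        print(f"{level}: OR = {value:.2f}")
```

An OR that rises steadily across levels, as in this invented example, is what a dose-response relationship would look like.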

Published case-control studies on multidrug-resistant TB found higher OR values for the following risk factors: AIDS [OR = 20.2 (17), 1.6 (28), 4.5 (24) and 3.4 (35)]; noncompliance with treatment [OR = 19.7 (17) and 3.6 (36)]; gastrointestinal symptoms [OR = 11.5 (17)]; prolonged duration of disease [OR = 2.7 (31)]; immigration [OR = 4.8 (28), 5.5 (20), 4.5 (18) and 2.1 (31)]; and age group [OR = 2.2 (28), 2.9 (36) and 2.1 (31)].

The OR is calculated differently in a paired study than in an unpaired study. The analysis is conducted in pairs (if the study has one control for each case) or in sets made up of a case and its matching controls. Pairs or sets in which the case and the controls coincide in terms of exposure or nonexposure (concordant sets) do not contribute to the analysis. As can be seen in Table 2, the OR calculation only includes discordant sets, in which the case is exposed and the control is not, or the case is not exposed and the control is.
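For the simplest situation, one control per case, the calculation can be sketched as follows. The sketch assumes the usual estimator based only on pairs in which the case and the control differ in exposure, together with McNemar's chi-square without continuity correction; the function names and counts are hypothetical.

```python
def matched_pair_or(case_only, control_only):
    """OR for 1:1 matched pairs.

    case_only:    pairs where the case is exposed and the control is not
    control_only: pairs where the control is exposed and the case is not
    Pairs with the same exposure status in both members are not used.
    """
    return case_only / control_only


def mcnemar_chi2(case_only, control_only):
    """McNemar chi-square (1 degree of freedom, no continuity correction)."""
    diff = case_only - control_only
    return diff * diff / (case_only + control_only)


# Hypothetical study with 25 and 10 pairs of each differing type.
print(matched_pair_or(25, 10))          # 2.5
print(round(mcnemar_chi2(25, 10), 2))   # 6.43
```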

In general, if the design has included pairing, the analysis must respect this. When the variable used for pairing is associated with exposure, ignoring the pairing in the analysis leads to an underestimate of the relative risk (RR); a paired analysis is therefore obligatory. In contrast, if the pairing variable is not associated with exposure, i.e., it is not a true confounding factor, paired analysis may be unnecessary and, if conducted, may even reduce the precision of the OR.

The absolute estimate of the effect in a case-control study is measured by the etiological fraction (37), also known as the attributable risk percentage (38). It expresses the proportion of cases that may be attributable to exposure or, in other words, the proportion of cases that could have been prevented if exposure had not occurred. It is therefore one of the principal measures used in public health, since it expresses the extent to which a specific form of exposure is a public health problem.

Table 2. Measuring the effect. Calculating the odds ratio (OR) in 1:2 paired case-control studies (2 controls to 1 case).

                   Exposed controls (first, second)
                   + +        + -        - -        Total
Exposed cases      A          B          C          A+B+C
Unexposed cases    D          E          F          D+E+F
Total              A+D        B+E        C+F        N

OR =
McNemar χ2
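A minimal sketch for the 1:2 design of Table 2, assuming that the Mantel-Haenszel estimator is the one applied to the matched sets (the source does not spell out which estimator is meant). Under that assumption, and reading the columns of Table 2 as "both controls exposed", "one control exposed" and "neither control exposed", the estimate reduces to (B + 2C) / (2D + E). The function name and the counts are hypothetical.

```python
def mh_or_one_to_two(A, B, C, D, E, F):
    """Mantel-Haenszel OR for 1:2 matched sets (assumed estimator).

    Rows: exposed case (A, B, C) and unexposed case (D, E, F).
    Columns: both, one, or neither of the two controls exposed.
    Cells A and F are fully concordant sets and do not contribute.
    """
    # Every set has 3 subjects, so the common factor 1/3 cancels out.
    numerator = B + 2 * C      # exposed case times unexposed controls in the set
    denominator = 2 * D + E    # unexposed case times exposed controls in the set
    return numerator / denominator


# Hypothetical counts of matched sets.
print(mh_or_one_to_two(A=5, B=12, C=9, D=2, E=6, F=20))  # 3.0
```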

One of the most commonly used formulae for calculating the etiological fraction (EF) or risk attributable to a factor in the population is Miettinen's formula (37):

EFp = (OR - 1)P1 / OR

where P1 is the proportion of cases exposed in the study sample.

Miettinen has also proposed the preventable fraction (PF) for a protective factor, which is the fraction of potential cases prevented by exposure (as occurs with the multidrug-resistant TB cases that can be prevented by full, adequate treatment) and is calculated using a simple formula: PF = 1 – OR.

If the OR of a protection factor is 0.40, the PF would be 60%, meaning that 60% of cases have been prevented thanks to exposure to this particular factor.
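The two formulae can be checked with a short Python sketch. The function names and the risk-factor example (OR = 4.0 with half of the cases exposed) are hypothetical; the protective-factor example reuses the OR of 0.40 given in the text.

```python
def etiological_fraction(or_value, p1):
    """Miettinen's formula: EFp = (OR - 1) * P1 / OR.

    p1 is the proportion of cases exposed in the study sample.
    """
    return (or_value - 1.0) * p1 / or_value


def preventable_fraction(or_value):
    """PF = 1 - OR, meaningful for a protective factor (OR < 1)."""
    return 1.0 - or_value


# Hypothetical risk factor: OR = 4.0 with 50% of cases exposed.
print(round(etiological_fraction(4.0, 0.5), 3))  # 0.375
# Protective factor from the text: OR = 0.40 gives PF = 60%.
print(round(preventable_fraction(0.40), 2))      # 0.6
```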

Statistical significance. Confidence limits

The statistical significance of a result expresses the probability that this result may be due to chance.

From the point of view of hypothesis testing, there is the null hypothesis (which assumes that there is no association) and the alternative hypothesis. The test aims to determine whether the null hypothesis can be rejected.

Statistical tests make it possible to quantify the degree of certainty with which a null hypothesis is rejected or accepted. They are a measure of random error and of the probability (the p-value) that the data reported are chance findings. If the null hypothesis is rejected, this reinforces the alternative hypothesis but does not prove it, as is often mistakenly believed in the medical literature. Rejecting the null hypothesis does not mean that the alternative hypothesis must be accepted without further analysis.

The chi-square calculation is usually used in case-control studies to analyze the statistical significance of the difference in the amount of exposure in the cases in comparison with the controls. The null hypothesis indicates that the differences are due to chance. Using the chi-square calculation obtained, the value of p must then be sought in a table of p-values.
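As an illustration, the chi-square for a 2 × 2 table can be computed directly. The sketch below assumes Pearson's chi-square without continuity correction and obtains the p-value for one degree of freedom from the complementary error function instead of a printed table; the table layout (a, b exposed and unexposed cases; c, d exposed and unexposed controls), the function name and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its p-value, 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 of 50 cases and 20 of 100 controls exposed.
chi2, p = chi_square_2x2(30, 20, 20, 80)
print(chi2)        # 24.0
print(p < 0.05)    # True
```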

The calculation of the confidence interval (CI) of the OR is an alternative means of assessing statistical significance and has considerable advantages over the chi-square test (34). Interpreting p requires a dichotomy based on an arbitrary limit (an error of 5% is usually accepted), and the result is or is not significant depending on whether p falls below or above this limit. The confidence limits are more informative in that they describe the variability of the effect: they give the range expected to contain the value of the effect in a set proportion of repetitions of the study. The confidence level is set by the researcher and reflects the accepted error (for an error of 5%, z = 1.96).
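A minimal sketch of one common way to obtain these limits, assuming the logarithmic (Woolf) method, in which the standard error of ln(OR) is the square root of 1/a + 1/b + 1/c + 1/d. The text does not state which method is intended, so this choice, the table layout (a, b exposed and unexposed cases; c, d exposed and unexposed controls) and the counts are assumptions.

```python
import math


def or_confidence_interval(a, b, c, d, z=1.96):
    """OR with confidence limits by the log (Woolf) method.

    z = 1.96 corresponds to an accepted error of 5% (95% CI).
    """
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


# Hypothetical counts: 30 of 50 cases and 20 of 100 controls exposed.
or_value, lower, upper = or_confidence_interval(30, 20, 20, 80)
print(or_value)       # 6.0
print(lower > 1.0)    # True: the interval excludes 1
```

An interval that excludes 1 corresponds to a result that is significant at the chosen error level.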

The CI also indicates the accuracy of an estimate: the broader the interval, the lower the accuracy of the estimate. The confounding factors may be controlled a priori in the design phase by pairing, or a posteriori in the analysis by means of stratification or multivariate analysis.

The researcher may reach the conclusion that it is necessary to conduct stratified research through preliminary analysis of the data as a result of knowing or initially suspecting that a variable is or may be a confounding factor. Stratified analysis makes it possible firstly to compare the cases and controls with regard to exposure within the same category of confounding factor. Secondly, it makes it possible to calculate a weighted value of the observed effect that is not affected by the confounding factor, as in standardization.
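The weighted value mentioned above can be sketched with the Mantel-Haenszel summary OR, which is one standard way of pooling the stratum-specific tables; the text does not name a specific estimator, so this choice is an assumption, as are the function name and the counts.

```python
def mantel_haenszel_or(strata):
    """Summary OR over strata of a confounding factor.

    strata: list of (a, b, c, d) tuples, one 2 x 2 table per stratum, with
      a, b: exposed and unexposed cases
      c, d: exposed and unexposed controls
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data: the OR is 4.0 within each stratum.
strata = [(10, 10, 5, 20), (20, 5, 20, 20)]
print(round(mantel_haenszel_or(strata), 2))   # 4.0

# Collapsing the strata gives a crude OR distorted by the confounder.
a, b, c, d = (sum(col) for col in zip(*strata))
print(round(a * d / (b * c), 2))              # 3.2
```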

Multivariate analysis is similar to stratified analysis. When a number of variables are analyzed simultaneously in a stratified model, the number of cells multiplies and, unless the sample is very large, many of them contain too few subjects to be informative. Multivariate analysis has the advantage of not being limited by the number of strata. It also makes it possible to analyze continuous variables, which would have to be categorized in stratified analysis. The logistic regression model is the multivariate analysis method used in case-control studies. Conditional (paired) and unconditional (unpaired) analysis is conducted by calculating a regression coefficient, which yields a maximum-likelihood estimate of the relative risk for a single variable or for a variable adjusted for the effect of others. It also enables interaction and dose-response coefficients to be calculated and tested. It is the most commonly used method in case-control studies on multiresistance (20, 28, 39).
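To make the link between the regression coefficient and the OR concrete, the following sketch fits an unconditional logistic model with a single binary exposure by Newton-Raphson in plain Python. It is an illustration only: a real analysis would use a statistical package and several covariates, the conditional (paired) model is not shown, and the function name, table layout and counts are hypothetical.

```python
import math


def fit_logistic_binary(a, b, c, d, iterations=25):
    """Maximum-likelihood fit of logit P(case) = b0 + b1 * exposure.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    Returns (b0, b1); exp(b1) is the estimated OR for exposure.
    """
    groups = [(1, a, a + c), (0, b, b + d)]   # (exposure, cases, total)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, cases, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = cases - total * p          # score contribution
            w = total * p * (1.0 - p)          # information weight
            g0 += resid
            g1 += x * resid
            i00 += w
            i01 += x * w
            i11 += x * x * w
        # Newton step: solve the 2 x 2 system (information matrix) * step = score.
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


# Hypothetical counts: 30 of 50 cases and 20 of 100 controls exposed.
b0, b1 = fit_logistic_binary(30, 20, 20, 80)
print(round(math.exp(b1), 3))   # 6.0
```

With a single binary exposure the exponential of the coefficient reproduces the cross-product OR of the 2 × 2 table; with further covariates in the model it becomes an OR adjusted for them.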


Although it is possible to conduct research into the possible causes of multidrug-resistant TB using a cohort study, when we decide on the design we opt to use a case-control study, since a cohort study requires greater financial resources, needs a larger number of subjects and takes longer.

In this first design phase, we consider TB to be class 3 of the current classification of the American Thoracic Society and the Centers for Disease Control, which is based on the pathogenesis of the disease and establishes six classes or categories ranging from class 0 (no exposure and no disease) to class 5 (suspected TB) (40). To this we add the isolation, identification and antibiogram of M. tuberculosis isolated in the patients.

A prospective study is chosen, with cases and controls enrolled as they arise from the start of the study onward, in order to avoid selection and information bias.

As cases do not occur very frequently, it is easier to conduct an unpaired study. However, if the incidence of cases of multiresistance were higher, the design could be changed to a paired design by matching on variables such as age and/or gender, although no conclusion could then be drawn about the role of these variables in the etiology of multidrug-resistant TB.

The study sample may be taken from a particular province, region, nation or continent. The decision on this will take into account multiresistance morbidity and mortality rates and the presentation of TB and multidrug-resistant TB in the various categories or strata into which we divide the sample.

We need to know, for example, whether one gender predominates among the cases, whether the disease presents in one or more age groups or in a particular race, etc., in order then to choose the most representative sample possible, taking into account randomness (random selection) and stratification (the different categories in the population maintained at similar percentages in the sample).
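A minimal sketch of such a selection, combining random selection with proportional allocation across strata. The function name, the record format and the 60/40 population are hypothetical, and a fixed seed is used only to make the example reproducible.

```python
import random


def proportional_stratified_sample(population, stratum_of, sample_size, seed=0):
    """Random sample that keeps each stratum at its population percentage.

    population: list of records
    stratum_of: function mapping a record to its stratum (e.g., gender)
    Rounding may make the total differ slightly from sample_size.
    """
    rng = random.Random(seed)
    strata = {}
    for record in population:
        strata.setdefault(stratum_of(record), []).append(record)
    sample = []
    for members in strata.values():
        # Allocation proportional to the stratum's share of the population.
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Hypothetical population: 60 women and 40 men, sample of 10.
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = proportional_stratified_sample(population, lambda r: r[0], 10)
print(sum(1 for r in sample if r[0] == "F"), len(sample))   # 6 10
```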

Cases will be defined as TB cases in which M. tuberculosis has been isolated and identified and then shown by an antibiogram to be resistant to at least isoniazid and rifampicin. The controls will be TB cases in which the M. tuberculosis isolated and identified is either susceptible to all antituberculous drugs or monoresistant.

The cases and controls may be either incident or prevalent depending on whether we choose new patients identified within a set time period as having the disease or whether we choose the total number of patients with the disease at a given time.

We believe that the best returns are to be had with a study in which there are two controls for each case, one of which is resistant to one antimicrobial (isoniazid or rifampicin), and the other susceptible to all antituberculous drugs. It is possible to use more controls for each case (e.g. four controls, two monoresistant and two susceptible, for each case).

Gathering the information needed for an epidemiological study requires careful selection of questions on the variables that we think may be possible causes of multiresistance in TB. The questionnaire will therefore consider variables such as the following:

  1. Gender, date and place of birth, contacts with TB.
  2. Previous TB: (< one year without treatment), type (pulmonary and/or extrapulmonary), treatment, antimicrobials used and length of time.
  3. Place of residence (family home/living alone, institution, prison, health-care worker, homeless/no permanent address, other).
  4. Working status (employed, retired, unemployed, tramp, other, unknown).
  5. Intravenous drug user (no, yes, unknown).
  6. Immunosuppression due to a cause other than HIV infection (no, yes, unknown).
  7. Associated diseases (diabetes, gastrointestinal disorders, transplants, others).
  8. HIV status (HIV-negative, HIV-positive but AIDS has not developed, AIDS, unknown), CD4 lymphocyte count.
  9. Current TB: location (pulmonary, extrapulmonary, pulmonary and extrapulmonary).
  10. Current treatment, drugs, start date.

This information is to be gathered by interviewing the cases and controls in person or over the phone. In our experience, sending out the questionnaires by post makes it more likely that there will be no response. In order to obtain the information in the correct manner, the same person should be responsible for gathering it or teams of interviewers should be trained to fill in the survey questions using the same criteria. If the study covers a very large area, telephone interviews may be used.

The microbiological data are important, and it is essential to record the type of sample, the isolation and identification techniques and the tests of susceptibility to antituberculous drugs.

Despite the existence of more comprehensive database and spreadsheet programs (Access, dBase and Excel) and statistical packages (SPSS and SAS), we choose to enter, tabulate and analyze our data using Epi Info 6.04b because it is easy to use, it can be run on computers of different technological standards and with different operating systems, and above all because the WHO uses it to gather, analyze and circulate information on TB. The subprograms used will be ENTER, ANALYSIS and EPITABLE. Nevertheless, we recommend SPSS for multivariate analysis.

To analyze the data presented in a 2 × 2 table, the first step might be to conduct multivariate analysis or logistic regression analysis, which makes it possible to estimate the importance of the various variables gathered by calculating the corresponding regression coefficients.

The main test for working out statistical significance is the chi-square test, which analyzes whether the percentage difference of exposure between the cases and controls can be explained by chance occurrence.

The estimate of risk can be produced in the following ways:

  1. by calculating the OR for multidrug-resistant TB (MDRTB):

    OR MDRTB = (odds of exposure to the variable considered, given MDRTB) / (odds of exposure to the variable considered, without MDRTB)

  2. by finding the etiological fraction of the different variables tested:

    EF MDRTB = (OR - 1)P / OR

where P is the proportion of multidrug-resistant TB cases exposed to the different variables in the population studied.

It is necessary to calculate the confidence limits with an α error of 5% for the chi-square and for the OR, so that the resulting interval has a 95% probability of including the value in the population for which the results are estimated.

The results will be presented in a table providing the OR value with its confidence limits for each variable considered to be a risk factor for multidrug-resistant TB.

As mentioned earlier, the OR values express the difference in risk associated with exposure and thus indicate whether the variable being considered is or is not a possible cause of multidrug-resistant TB.

  1. World Health Organization. Tuberculosis control. Progress in 1995-1997. Weekly Epidemiological Record 1999; 27: 217-227.
  2. Hood, J., Amyes, S.G.B. The chromosomal β-lactamases of genus Acinetobacter: Enzymes which challenge our imagination. In: Towner, K.J., Bergogne-Bérézin, E., Fewson, C.A. (Eds.). The biology of Acinetobacter. Plenum Publishing Corp., New York 1991; 117-132.
  3. Crofton, J., Caullet, P., Maher, D. Guidelines for the management of drug-resistant tuberculosis. World Health Organization (WHO/TB/96.210) 1997.
  4. World Health Organization. Antituberculosis Drug Resistance in the World. The WHO/IUATLD global project on antituberculosis drug resistance surveillance 1994-1997. World Health Organization (WHO/TB/97.229) 1997.
  5. Canetti, G. Present aspects of bacterial resistance in tuberculosis. Am Rev Respir Dis 1965; 92: 678-703.
  6. Frieden, T., Sterling, T., Pablos-Méndez, A. et al. The emergence of drug-resistant tuberculosis in New York city. N Engl J Med 1993; 328: 521-526.
  7. Dooley, S.W., Jarvis, W.R., Marlone, W.J., Sneider, D.E. Multidrug-resistant tuberculosis. Ann Intern Med 1992; 117: 257-259.
  8. Mahmoudi, A., Iseman, M.D. Pitfalls in the care of patients with tuberculosis. JAMA 1993; 270: 65-68.
  9. Centers for Disease Control. Management of persons exposed to multidrug-resistant tuberculosis infection. MMWR 1992; 41: 61-65.
  10. Kochi, A., Vareldzis, B., Styblo, K. Multidrug resistant tuberculosis and its control. Res Microbiol 1994; 144: 104-110.
  11. Centers for Disease Control. Nosocomial transmission of multidrug-resistant tuberculosis among HIV-infected persons. MMWR 1991; 40: 585-591.
  12. Lambregts-van Weezenbeek, C.S., Veen, J. Control of drug-resistant tuberculosis. Tuberc Lung Dis 1995; 76: 455-459.
  13. Grundy, P.F. A rational approach to the "At risk" concept. Lancet 1973; 2: 1489.
  14. Schlesselman, J.J. Case control studies: Design, conduct, analysis. Oxford University Press 1982.
  15. Cole, P. The evolving case control study. J Chronic Dis 1979; 32: 15-27.
  16. Wacholder, S., McLaughlin, J.K., Silverman, D.T. et al. Selections of controls in case control studies. Design options. Am J Epidemiol 1992; 135: 1042-1050.
  17. González, C.A., López, G., Errezola, M. et al. Occupation and bladder cancer in Spain: A multi-centre case-control study. Int J Epidemiol 1989; 18: 569-577.
  18. Bradford, W.Z., Martin, J.N., Reingold, A.L., Schecter, G.F., Hopewell, P.C., Small, P.M. The changing epidemiology of acquired drug-resistant tuberculosis in San Francisco. Lancet 1996; 348: 928-931.
  19. Lambregts-van Weezenbeek, C.S., Jansen, H.M., Nagelkerke, N.J., van Klingeren, B., Veen, J. Nationwide surveillance of drug-resistant tuberculosis in the Netherlands: Rates, risk factors and treatment outcome. Int J Tuberc Lung Dis 1998; 2 (4): 288-295.
  20. Suo, J., Yu, M.C., Lee, C.N., Chiang, C.Y., Lin, T.P. Treatment of multidrug-resistant tuberculosis in Taiwan. Chemotherapy 1996; 42 (Suppl. 3): 20-23.
  21. Rapiti, E., Fano, V., Forastiere, F., Agabiti, N., Geraci, S., Scano, M., Alichino, F., Rinnenburger, D. Determinants of tuberculosis in an immigrant population in Rome: A case-control study. Int J Tuberc Lung Dis 1998; 2 (6): 479-483.
  22. Pablos-Méndez, A., Blustein, J., Knirsch, C.A. The role of diabetes mellitus in the higher prevalence of tuberculosis among Hispanics. Am J Public Health 1997; 87 (4): 574-579.
  23. Del Amo, J., Petruckevitch, A., Phillips, A.N. et al. Risk factors for tuberculosis in patients with AIDS in London: A case-control study. Int J Tuberc Lung Dis 1999; 3 (1): 12-17.
  24. Poole, C. Exposure opportunity in case-control studies. Am J Epidemiol 1986; 123: 352-358.
  25. Weltman, A.C., David, N.R. Tuberculosis susceptibility pattern predictors of MDR, and implications for initial therapeutic regimens at New York City Hospital. Arch Intern Med 1994; 154: 2161-2167.
  26. McLaughlin, J.K., Dietz, M.S., Mehl, E.S. et al. Reliability of surrogate information on cigarette smoking by type of informant. Am J Epidemiol 1987; 126: 144-146.
  27. Sacks, L.V., Pendle, S. Factors related to in-hospital deaths in patients with tuberculosis. Arch Intern Med 1998; 158 (17): 1916-1922.
  28. Boudville, I.C., Wong, S.Y., Snodgrass, I. Drug-resistant tuberculosis in Singapore, 1995 to 1996. Ann Acad Med Singapore 1997; 26 (5): 549-556.
  29. Schwoebel, V., Decludt, B., de Benoist, A.C., Haeghebaert, S., Torrea, G., Vincent, V., Grosset, J. Multidrug resistant tuberculosis in France 1992-1994: Two case-control studies. BMJ 1998; 317: 630-631.
  30. Guerrero, A., Cobo, J., Fortun, J., Navas, E., Quereda, C., Asensio, A., Canon, J., Blázquez, J., Gómez-Mampaso, E. Nosocomial transmission of Mycobacterium bovis resistant to 11 drugs in people with advanced HIV-1 infection. Lancet 1997; 350: 1738-1742.
  31. Cole, P. Introduction. In: Breslow, N.E., Day, N.E. (Eds.). Statistical methods in cancer research, vol. 1. The analysis of case-control studies. International Agency for Research on Cancer, Lyon 1980.
  32. Bloch, A.B., Cauthen, G.M., Onorato, I.M. et al. Nationwide survey of drug-resistant tuberculosis in the United States. JAMA 1994; 271 (9): 665-671.
  33. Wacholder, S., McLaughlin, J.K., Silverman, D.T. et al. Selections of controls in case control studies. Types of controls. Am J Epidemiol 1992; 135: 1029-1041.
  34. Casagrande, M., Pike, M., Smith, P. Sample size. Biometrics 1978; 34: 483.
  35. Rothman, K.J. Modern Epidemiology. Little, Brown & Company, Boston 1986.
  36. Corbett, E.L., Churchyard, G.J., Clayton, T., Herselman, P., Williams, B., Hayes, R., Mulder, D., De Cock, K.M. Risk factors for pulmonary mycobacterial disease in South African gold miners. A case-control study. Am J Respir Crit Care Med 1999; 159 (1): 94-99.
  37. Johnson, J.L., Okwera, A., Vjecha, M.J. et al. Risk factors for relapse in human immunodeficiency virus type 1 infected adults with pulmonary tuberculosis. Int J Tuberc Lung Dis 1997; 1 (5): 446-453.
  38. Miettinen, O.S. Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 1974; 99: 325-332.
  39. Cole, P., MacMahon, B. Attributable risk percent in case-control studies. Brit J Prev Soc Med 1971; 25: 245-246.
  40. Ellison, E., Lapuerta, P., Martin, S.E. Cytologic features of mycobacterial pleuritis: Logistic regression and statistical analysis of a blinded, case-controlled study. Diagn Cytopathol 1998; 19 (3): 173-176.
  41. American Thoracic Society, Centers for Disease Control. Diagnostic standards and classification of tuberculosis. Am Rev Respir Dis 1990; 142: 725-735.