Critical Appraisal of Evidence - Diagnosis Scenario

Are the results of this study valid?

Returning to our clinical scenario from the question formulation tutorial:

You admit a 75-year-old woman with community-acquired pneumonia. She responds nicely to appropriate antibiotics but her hemoglobin remains at 100 g/l with an MCV of 80. Her peripheral blood smear shows hypochromia; she is otherwise well and is on no incriminating medications. You contact her family physician and find out that her Hgb was 105 g/l 6 months ago. She has never been investigated for anaemia. A ferritin has been ordered and comes back at 40 mmol/l. You admit to yourself that you're unsure how to interpret a ferritin result and don't know how precise and accurate it is.

Our search of the literature to answer this question retrieved an article from the American Journal of Medicine (1990;88:205-9).

How do we critically appraise this diagnosis paper? We'll start by considering validity. The following list outlines the questions that we need to consider when deciding if a diagnosis paper is valid.

Was there an independent, blind comparison with a reference ('gold') standard of diagnosis?

In considering this question, we need to determine whether all patients in the study underwent both the diagnostic test under evaluation (in our scenario, the serum ferritin) and the reference standard (in our scenario, bone marrow biopsy) to show that they definitely do or do not have the target disorder. We should also ensure that those investigators who are applying and interpreting the reference standard do not know the results from the diagnostic test.

We also need to consider if the reference standard is appropriate. Sometimes a reference standard may not be clear cut (such as in the diagnosis of delirium) and in this case, we'd need to review the rationale for the choice of reference standard as outlined by the study authors.

All patients in the study we found underwent serum ferritin testing and bone marrow biopsy.

Was the diagnostic test evaluated in an appropriate spectrum of patients (like those in whom we would use it in practice)?

The study should include both patients with common presentations of the target disorder and those with conditions that are commonly confused with it. If the study only includes patients with severe symptoms of the target disorder (who would be very obvious to diagnose), it is not likely to be useful to us. We need to find out if patients with varying severity of the disease were included in the study and also whether it includes patients with other disorders that are often confused with this one. For example, anaemic patients can be symptomatic or asymptomatic and the anaemia can result from a number of causes - we would want to ensure that the study we retrieved included patients with a variety of presentations and symptoms.

Reviewing the ferritin study, it included consecutive patients over the age of 65 who were admitted with anaemia to a university-affiliated hospital in Canada. It excluded patients from institutions and patients who were too ill or who had severe dementia. No details are provided on the definitions used for 'too ill' or 'severe dementia'.

Was the reference standard applied regardless of the diagnostic test result?

We need to check that the study investigators performed the reference standard even if a patient's serum ferritin was normal. Sometimes if the reference standard is invasive, it may be considered unethical to perform it on patients with a negative test result. For example, if a patient with chest pain is suspected to be at low risk of a pulmonary embolism and has a negative V/Q scan, an investigator (who is performing a study looking at the accuracy of the V/Q scan in diagnosing pulmonary embolism) may not want to subject the patient to pulmonary angiography, which is not without morbidity and mortality. Indeed, this was what the investigators did in the PIOPED study - if patients were considered to be at a low risk of a pulmonary embolism and had a negative V/Q scan, rather than undergoing a pulmonary angiogram, they were followed up clinically for several months, without receiving antithrombotic therapy, to see if an event occurred.

In the ferritin study, all patients received both the diagnostic test and the reference standard.

Was the test (or cluster of tests) validated in a second, independent group of patients?

The tests should be assessed in an independent 'test' set of patients. This question is important in studies looking at multiple diagnostic elements.

If the study fails any of the above criteria, we need to consider if the flaw is significant and threatens the validity of the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all of the above criteria and we will proceed to assessing it for importance.

Are the results of this study important?

Let's begin by drawing a 2x2 table, using the results from the study that we identified:

Diagnostic test result (serum ferritin)	Target disorder (iron deficiency anaemia) present	Target disorder (iron deficiency anaemia) absent	Totals
Test positive (≤ 45 mmol/l)	a = 70	b = 15	a + b = 85
Test negative (> 45 mmol/l)	c = 15	d = 135	c + d = 150
Totals	a + c = 85	b + d = 150	a + b + c + d = 235

Our patient's serum ferritin comes back at 40 mmol/l, so looking at the table we can see that she fits somewhere in the top row (either cell 'a' or cell 'b'). From the table we can also see that 82% (70/85) of people who have iron deficiency anaemia have a serum ferritin in the same range as our patient - this is called the sensitivity of the test. And 10% (15/150) of people without this diagnosis have a serum ferritin in the same range as our patient - this is the complement of the specificity (1 - specificity). The specificity is the proportion of people without iron deficiency anaemia who have a negative or normal test result (here 135/150 = 90%). We're interested in how likely a serum ferritin of 40 mmol/l is in a patient with iron deficiency anaemia as compared to someone without this target disorder. Our patient's serum ferritin is about 8 (82%/10%) times as likely to occur in a patient with iron deficiency anaemia as in someone without it - this is called the likelihood ratio for a positive test. We can now use this likelihood ratio to calculate our patient's posttest probability of having iron deficiency anaemia.
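The arithmetic in this paragraph can be sketched in a few lines of Python. This is only an illustration: the function name `two_by_two_summary` and the dictionary keys are our own choices, not something taken from the paper. The cell counts a, b, c and d are those from the 2x2 table.

```python
def two_by_two_summary(a, b, c, d):
    """Summarise a 2x2 diagnostic table (illustrative helper, hypothetical name).

    a: test positive, disorder present   b: test positive, disorder absent
    c: test negative, disorder present   d: test negative, disorder absent
    """
    sensitivity = a / (a + c)          # positives among those with the disorder
    specificity = d / (b + d)          # negatives among those without the disorder
    # Likelihood ratio for a positive test: sensitivity / (1 - specificity)
    lr_positive = sensitivity / (1 - specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_positive": lr_positive,
    }


# Cell counts from the ferritin study's 2x2 table
summary = two_by_two_summary(a=70, b=15, c=15, d=135)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Run on the study's counts, this should reproduce the figures quoted in the text once rounded: a sensitivity of about 82%, a specificity of 90% and a likelihood ratio for a positive test of about 8.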

Our patient's posttest probability of having iron deficiency anaemia is obtained by calculating:

posttest odds/(posttest odds + 1)

where

posttest odds = pretest odds x likelihood ratio

The pretest odds are calculated as pretest probability/(1 - pretest probability). We judge our patient's pretest probability of having iron deficiency anaemia as being similar to that of the patients in this study ((a + c)/(a + b + c + d) = 85/235 = 36%) and therefore:

pretest odds = 0.36/(1 - 0.36)
pretest odds = 0.56

Using this we can calculate

posttest odds = 0.56 x 8
posttest odds = 4.5

And, finally,

posttest probability = 4.5/5.5
posttest probability = 82%
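For readers who prefer to check the chain of calculations programmatically, here is a minimal sketch of the same three steps (probability to odds, odds times likelihood ratio, odds back to probability). The function name `posttest_probability` is an assumption made for this example.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    # Step 1: pretest odds = p / (1 - p)
    pretest_odds = pretest_probability / (1 - pretest_probability)
    # Step 2: posttest odds = pretest odds x likelihood ratio
    posttest_odds = pretest_odds * likelihood_ratio
    # Step 3: posttest probability = odds / (odds + 1)
    return posttest_odds / (posttest_odds + 1)


# Pretest probability of 36% and a likelihood ratio of 8, as in the text
result = posttest_probability(0.36, 8)
print(f"posttest probability: {result:.0%}")
```

With the rounded inputs used in the text (0.36 and 8) this gives the same 82% as the hand calculation.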

With this information, we can conclude that based on our patient's serum ferritin, it is very likely that she has iron deficiency anaemia (posttest probability > 80%) and that our posttest probability is sufficiently high that we would want to work our patient up for causes of this target disorder.

Instead of doing all of the above calculations, we could simply use the likelihood ratio nomogram. Considering that our patient's pretest probability of iron deficiency anaemia was 36%, and that the likelihood ratio for a serum ferritin of 40 mmol/l was 8, we can see that her posttest probability of iron deficiency anaemia is just over 80%.

Multilevel tests

In the paper we found, the serum ferritin results are divided into 3 levels: ≤ 45 mmol/l, 46-100 mmol/l and > 100 mmol/l. We can see that more information about the diagnostic test is available when results are presented at multiple levels:

Diagnostic test result (serum ferritin)	Iron deficiency anaemia present	Iron deficiency anaemia absent	Likelihood ratio
≤ 45 mmol/l	70/85	15/150	8
> 45 and ≤ 100 mmol/l	7/85	27/150	0.4
> 100 mmol/l	8/85	108/150	0.1
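The likelihood ratio for each level is the proportion of patients with the disorder who fall in that level divided by the proportion of patients without the disorder who fall in it. A short sketch of that calculation follows; the dictionary layout and the name `stratum_likelihood_ratios` are illustrative assumptions, while the counts come from the table.

```python
# (disorder present, disorder absent) counts for each ferritin level, from the table
levels = {
    "<= 45": (70, 15),
    "46-100": (7, 27),
    "> 100": (8, 108),
}


def stratum_likelihood_ratios(counts):
    """Return the likelihood ratio for each level of a multilevel test."""
    total_present = sum(present for present, _ in counts.values())
    total_absent = sum(absent for _, absent in counts.values())
    return {
        level: (present / total_present) / (absent / total_absent)
        for level, (present, absent) in counts.items()
    }


for level, lr in stratum_likelihood_ratios(levels).items():
    print(f"ferritin {level} mmol/l: LR = {lr:.2f}")
```

The unrounded values come out at roughly 8.2, 0.46 and 0.13, which the table reports more coarsely as 8, 0.4 and 0.1.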

If our patient's serum ferritin was 110 mmol/l (and using her pretest probability of 36% and the likelihood ratio of 0.1), her posttest probability of iron deficiency anaemia would be about 5% (posttest odds = 0.56 x 0.1 = 0.056), making this diagnosis very unlikely. However, if her serum ferritin came back at 65 mmol/l (likelihood ratio 0.4), her posttest probability would be about 18% (posttest odds = 0.56 x 0.4 = 0.22) and we'd have to decide if this was sufficiently low to stop testing or if we needed to do further investigations.
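To explore such "what if" results for every ferritin level at once, the odds calculation can be applied to each likelihood ratio in turn. The sketch below assumes the pretest probability of 36% and the rounded likelihood ratios from the multilevel table; the names `posttest_by_level` and `table_lrs` are hypothetical.

```python
def posttest_by_level(pretest_probability, likelihood_ratios):
    """Posttest probability for each test level, given one pretest probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    results = {}
    for level, lr in likelihood_ratios.items():
        posttest_odds = pretest_odds * lr
        results[level] = posttest_odds / (posttest_odds + 1)
    return results


# Rounded likelihood ratios as quoted in the multilevel table
table_lrs = {"<= 45": 8, "46-100": 0.4, "> 100": 0.1}

for level, probability in posttest_by_level(0.36, table_lrs).items():
    print(f"ferritin {level} mmol/l: posttest probability = {probability:.1%}")
```

Changing the first argument shows how the same test result would move a patient with a different pretest probability.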