Critical Appraisal of Evidence - Therapy (Single Trials) Scenario
Are the results of this study valid?
Returning to our clinical scenario from the question formulation tutorial:
You admit a 65-year-old man with a stroke. On examination you find that he has mild weakness of the right arm and right leg and bilateral carotid bruits. You send the patient for carotid Doppler ultrasonography and subsequently receive the report that he has moderate stenosis (50-69% by NASCET criteria) of the ipsilateral carotid artery. You've noticed in the pile of journals accumulating in your office that there has been some recent literature addressing surgical versus medical therapy for patients with symptomatic carotid stenosis, but you are unsure of what the results of these studies indicate.
Our search of the literature found an article in Best Evidence (1999;130:33).
How do we critically appraise this therapy paper? We'll start off by considering validity first and the following list outlines the questions that we need to consider when deciding if a therapy paper is valid.
Was the assignment of patients to treatment randomized? And, was the randomization list concealed?
Randomisation helps ensure that patients in the treatment groups start the study with a similar risk of the event we are hoping to prevent. It balances the groups for prognostic factors (good or bad) that, if unequally distributed between the groups, could increase, decrease or nullify the apparent effect of the therapy.
We need to check whether the randomisation list was concealed from the clinicians who entered patients into the trial, so that they could not know which treatment the next patient would receive.
The study that we found was randomised (which is one of the inclusion criteria for a therapy article in Best Evidence). From the original article we can see that the randomisation list was concealed and details on the randomisation process were also provided.
Was follow-up of patients sufficiently long and complete?
We'd want to see that the duration of follow-up was long enough for the outcomes of interest to occur. It is also important that the investigators provide details on the number of patients followed up and, if possible, on the outcomes of patients who dropped out of the study. If we are unsure of what effect the dropouts may have on the study result, we can perform a 'sensitivity analysis' for a 'worst case scenario': for the group that did better, assume that all the people who were lost to follow-up did poorly; for the group that did worse, assume that all the people who were lost to follow-up fared well. If the result still supports the original conclusion, then the follow-up was sufficiently complete. It would be unusual for a study to be able to withstand more than a 20% loss to follow-up, and therefore most journals of secondary publication (including ACP Journal Club and EBM) use this as an exclusion criterion for article selection.
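The worst-case check described above is mechanical enough to sketch in code. This is a minimal illustration with made-up numbers (the event counts, follow-up counts, and losses below are hypothetical, not NASCET data):

```python
def worst_case_rate(events, followed, lost, assume_events_in_lost):
    """Event rate if every patient lost to follow-up is assumed
    to have (or not have) the outcome event."""
    extra = lost if assume_events_in_lost else 0
    return (events + extra) / (followed + lost)

# Hypothetical trial in which the treatment group did better.
# Worst case: all losses in the better (treatment) group had the
# event, and all losses in the worse (control) group did not.
treat_rate = worst_case_rate(events=20, followed=95, lost=5,
                             assume_events_in_lost=True)    # 25/100
control_rate = worst_case_rate(events=30, followed=97, lost=3,
                               assume_events_in_lost=False)  # 30/100

# If the treatment event rate is still lower, the original
# conclusion survives the worst-case scenario.
conclusion_holds = treat_rate < control_rate
```

Here the treatment arm still has the lower event rate even under the worst case, so this hypothetical study would pass the check.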
From the abstract we identified in Best Evidence, 99.7% of patients were followed up for 5 years.
Were all patients analyzed in the groups to which they were randomized?
Anything that happens after randomisation can affect the chance that a study patient has an outcome event. Therefore, we need to see if the investigators analysed the patients in the groups to which they were randomised, even if they crossed over to the other treatment group. This 'intention to treat' analysis preserves the value of randomisation.
An intention to treat analysis was done in the study that we identified. (This information was provided in the abstract available on Best Evidence.)
Were patients and clinicians kept blind to treatment? And, were groups treated equally, apart from the experimental therapy?
Blinding of clinicians and patients helps to prevent cointervention: the provision of additional treatment (beyond the experimental treatment) to just one of the groups. If either the patients or the clinicians weren't blinded, suspicion about the effectiveness of the treatment under investigation could affect how symptoms are reported and interpreted.
In the NASCET study, all patients received antiplatelet therapy (this was usually ASA and the dose was left to the discretion of the neurologist at each study centre), and when indicated they received antihypertensive and/or antilipidemic medications.
Blinding is not always possible (such as in surgery trials) and in these situations we should check to see if outcome events were assessed by blinded investigators. For example, in NASCET, outcome events were assessed by 4 groups: the participating neurologist and surgeon; the neurologist at the study centre; 'blinded' members of the steering committee; and 'blinded' external adjudicators.
Were the groups similar at the start of the trial?
This is usually reported in the 'Table 1' of the article. If the groups aren't similar, we need to see if there was an adjustment made for the potentially important prognostic factors.
The medical and surgical groups were similar in NASCET. For example, the percentages of patients who were prescribed antihypertensive or antilipidemic medications were similar.
If the study fails any of the above criteria, we need to decide if the flaw is significant and threatens the validity of the study. If this is the case, we'll need to look for another study. Returning to our clinical scenario, the paper we found satisfies all the above criteria and we will proceed to assessing it for importance.
Are the results of this study important?
What is the magnitude of the treatment effect?
There are several ways that information about treatment effects can be presented. This discussion will be illustrated using the results of NASCET (for any stroke at 5 years) as shown in the first row of numbers in the table below.
| Control Event Rate (CER) | Experimental Event Rate (EER) | Relative Risk Reduction (RRR) | Absolute Risk Reduction (ARR) | Number Needed to Treat (NNT) |
|---|---|---|---|---|
| 0.264 | 0.198 | 25% | 0.066 | 15 |
| 0.000000264 | 0.000000198 | 25% | 0.000000066 | >15,000,000 |
The control event rate (CER) is the proportion of patients in the control group (in this study, the group that received medical care) that had the outcome event of interest (in our scenario, this would be any stroke). The experimental event rate (EER) is the proportion of patients in the experimental group (patients in the carotid endarterectomy group) that had the outcome of interest.
The relative risk reduction (RRR) is one way of describing the treatment effects and is calculated as:
RRR = |EER-CER|/CER
RRR = |0.198-0.264|/0.264
RRR = 25%
Applying this, we can say that if we treat people who have moderate carotid stenosis with carotid endarterectomy we can decrease their risk of future stroke by 25% compared to those people who receive medical therapy only.
If the experimental treatment increases the probability of a good event, we can use this same equation to calculate the relative benefit increase (RBI). Similarly, if the experimental treatment increases the risk of an adverse event, we can use the equation to calculate the relative risk increase (RRI).
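The arithmetic above is easy to verify; a quick sketch using the event rates from the table:

```python
def relative_risk_reduction(eer, cer):
    """RRR = |EER - CER| / CER. The same formula gives the RBI or
    RRI when the experimental treatment raises an event's probability."""
    return abs(eer - cer) / cer

# NASCET, any stroke at 5 years: CER 0.264, EER 0.198.
rrr = relative_risk_reduction(eer=0.198, cer=0.264)  # ≈ 0.25, i.e. 25%

# With a tiny baseline risk the RRR is unchanged, which is
# exactly the limitation discussed below the table.
rrr_tiny = relative_risk_reduction(eer=0.000000198, cer=0.000000264)  # also 25%
```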
The RRR has limitations. Consider the second row of numbers in the table above - when the CER is incredibly small (0.000000264), the RRR remains at 25%. The RRR is unable to discriminate between small treatment effects and large ones and doesn't reflect the baseline risk of the event.
One measure that overcomes this is the absolute difference between the CER and EER or the absolute risk reduction (ARR). It is calculated as:
ARR = |EER-CER|
ARR = |0.198-0.264|
ARR = 0.066
If the experimental treatment increases the probability of a good event, we can use this same equation to calculate the absolute benefit increase (ABI). Or, if the experimental treatment increases the risk of an adverse event, we can use the equation to calculate the absolute risk increase (ARI).
Returning to the data in the table, we can see that the ARR reflects the baseline risk of the event and that it discriminates between small and large treatment effects. However, because it is not a whole number, it is often difficult to remember and to translate to patients.
To overcome these difficulties, we can take the inverse of the ARR which tells us the number of patients that we'd need to treat with the experimental therapy in order to prevent one additional bad event. This is called the number needed to treat (NNT) and in our example, the NNT is 15. We can see from the table that the NNT (like the ARR) is able to differentiate between small and large treatment effects - in the second row of the table, when the CER and EER are very small, the NNT is over 15 million!
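Putting the ARR and NNT together, a short sketch using the same figures shows why the NNT, unlike the RRR, reflects the baseline risk:

```python
def absolute_risk_reduction(eer, cer):
    """ARR = |EER - CER| (the same formula gives the ABI or ARI)."""
    return abs(eer - cer)

def number_needed_to_treat(eer, cer):
    """NNT = 1 / ARR; applied to the ARI, the same inverse gives the NNH."""
    return 1 / absolute_risk_reduction(eer, cer)

# First row of the table: ARR 0.066, NNT ≈ 15.
nnt_moderate = number_needed_to_treat(eer=0.198, cer=0.264)

# Second row: the same 25% RRR, but an NNT of over 15 million.
nnt_tiny = number_needed_to_treat(eer=0.000000198, cer=0.000000264)
```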
When the treatment increases the risk of adverse events, we can calculate the number of patients we'd need to treat with this therapy to cause one additional adverse event; this is called the number needed to harm (NNH). The NNH is calculated as 1/ARI.
How big should an NNT be for us to be impressed? Consider some examples. We'd need to treat 40 people who have suspected MI with aspirin to prevent 1 additional death. And, we'd only need to treat 20 people who have suspected MI with aspirin and thrombolysis to prevent 1 additional death.
What is the precision of the treatment effect?
The confidence interval around the NNT can be calculated as the inverse of the confidence interval for the ARR. The smaller the number of patients who have the event of interest, the wider the confidence interval.
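One common way to put numbers on this is a normal-approximation (Wald) confidence interval for the ARR, inverted to give the interval for the NNT. The group sizes below are hypothetical placeholders, not the actual NASCET denominators:

```python
import math

def arr_confidence_interval(cer, n_control, eer, n_exp, z=1.96):
    """95% Wald confidence interval for the absolute risk reduction."""
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_exp)
    arr = abs(eer - cer)
    return arr - z * se, arr + z * se

# NASCET event rates with hypothetical group sizes of 600 each.
lo, hi = arr_confidence_interval(cer=0.264, n_control=600,
                                 eer=0.198, n_exp=600)

# The NNT interval is the inverse of the ARR interval with the
# ends swapped; this only works when the ARR interval excludes zero.
nnt_lo, nnt_hi = 1 / hi, 1 / lo
```

Note how halving the group sizes would widen both intervals, illustrating the point above: fewer patients (and fewer events) means less precision.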