Medical literature increasingly relies on complex statistics, and many of our practice guidelines will consider only rigorously conducted studies with sophisticated statistical analyses. It is important that clinical lipid specialists and others in the field of preventive cardiology understand and be wary of some key issues in the interpretation of different studies. This primer reviews the basics of statistical significance versus clinical significance, relative risk versus absolute risk, and number needed to treat/harm, as well as the role of primary versus secondary endpoints and subgroup analyses in judging the importance of clinical studies. Key examples from the lipid literature are used to illustrate these issues.
Statistical vs. Clinical Significance:
We live in a world in which a study generally is deemed to be positive if the primary outcome of interest is met with a p-value of <0.05, meaning that, if the treatment truly had no effect, a result at least this extreme would be expected less than 5 percent of the time. Whether a study reaches this level of significance can ultimately decide whether a drug is approved for a given indication. In reality, life is not as black and white as this appears. For one, statistical significance is really a continuum, and while p<0.05 is the arbitrary cut point most use to deem whether a study is positive, a result such as p<0.01 or p<0.001 would indicate a much more definitive result with much less room for error. In addition, just because a study reaches statistical significance does not necessarily mean the result is clinically significant. Take, for example, the recent Improved Reduction of Outcomes: Vytorin Efficacy International Trial (IMPROVE-IT)1, which randomized high-risk patients who had suffered an acute coronary syndrome within the preceding 10 days to simvastatin alone or simvastatin plus ezetimibe. Yes, the trial reached statistical significance with a p-value of 0.016. However, with a hazard ratio of 0.94 in favor of the combination therapy, this 6 percent relative risk reduction can hardly be considered a dramatic result. In fact, a Food and Drug Administration (FDA) advisory panel did not recommend expanding the label to include an indication for reducing cardiovascular disease events;2 often such a new indication requires two large trials with positive results or one very large trial with highly significant results. Yet, because this remains to date the only “positive” clinical trial showing event reduction from the addition of a non-statin on top of a statin, recent recommendations3 consider its use for high-risk patients not achieving an acceptable response from a statin alone.
In contrast, the Factor-64 clinical trial4 randomized patients with diabetes to cardiac computed tomographic angiography (CCTA)-directed care (where those who were found to have atherosclerotic plaque were assigned to more intensive risk-reducing therapy) versus usual standard guideline-directed care in a single-payer population of more than 800 subjects. A more impressive hazard ratio (compared to IMPROVE-IT) of 0.80 (meaning a 20 percent relative risk reduction) for the composite outcome of total mortality, myocardial infarction, or unstable angina in favor of the CCTA-directed care was noted, yet this result was not statistically significant. The authors conclude, based on the statistics, that CCTA-directed care was not effective in improving outcomes. However, one can argue the study was underpowered because of the relatively low event rate in the overall patient sample and/or limited follow-up time. Therefore, the lack of a statistically significant difference does not necessarily rule out the presence of a clinically significant one.
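To make the notion of an underpowered study concrete, the sketch below estimates how many outcome events a two-arm trial would need to detect a given hazard ratio. It uses Schoenfeld's approximation for a 1:1 randomized time-to-event comparison, which is a standard textbook formula and not something either trial report is quoted as using here. The two-sided alpha of 0.05 and the 80 percent power are illustrative assumptions, and the function name `events_needed` is hypothetical.

```python
import math
from statistics import NormalDist


def events_needed(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate events required in a 1:1 two-arm trial (Schoenfeld approximation).

    alpha is two-sided; alpha and power are illustrative assumptions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


# Hazard ratios quoted in the text for Factor-64 (0.80) and IMPROVE-IT (0.94).
for hr in (0.80, 0.94):
    print(f"HR {hr:.2f}: roughly {events_needed(hr)} events needed")
```

Under these assumptions, a hazard ratio of 0.80 calls for several hundred events, and a hazard ratio of 0.94 calls for more than ten times as many. A trial of a few hundred patients with a low event rate can therefore miss a 20 percent relative risk reduction, while a very large, long trial can confirm a 6 percent one.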
Precision vs. Imprecision:
When interpreting a point estimate (e.g., a hazard ratio), it is important to understand its precision in terms of the confidence interval associated with the estimate, because two estimates can both be statistically significant while one is substantially more precise (and more worthy of your trust) than the other. For instance, the harm of an intervention versus control for cardiovascular events may be represented as a hazard ratio (HR) of 1.30 with a 95 percent confidence interval (CI) of 1.20–1.40, as opposed to a much less precise estimate of HR=1.30 with a 95 percent CI of 1.02–22.5. A 95 percent CI means that, if the trial were repeated many times, about 19 of every 20 intervals constructed this way would be expected to contain the true hazard ratio. Both estimates are statistically significant (since their intervals exclude the null value of 1.0), but the latter is substantially less precise, possibly because of a small number of outcomes in one group or the other.
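The comparison of the two intervals can be sketched in a few lines. The helper below checks whether an interval excludes the null value of 1.0 and summarizes its width as the ratio of the upper to the lower bound. It also backs out an approximate standard error on the log scale, which assumes the interval was built as estimate ± 1.96 standard errors on the log hazard ratio; that is a common convention, but it is an assumption here. The function name `describe_interval` is hypothetical.

```python
import math


def describe_interval(lower, upper, null_value=1.0, z=1.96):
    """Summarize a ratio-scale confidence interval.

    Returns (excludes_null, upper_to_lower_ratio, approx_se_on_log_scale).
    The standard error assumes a symmetric interval on the log scale.
    """
    excludes_null = not (lower <= null_value <= upper)
    width_ratio = upper / lower
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    return excludes_null, width_ratio, se_log


# The two intervals quoted in the text.
for lower, upper in ((1.20, 1.40), (1.02, 22.5)):
    significant, ratio, se = describe_interval(lower, upper)
    print(f"CI {lower}-{upper}: excludes 1.0 = {significant}, "
          f"upper/lower = {ratio:.1f}, approx. SE(log HR) = {se:.2f}")
```

Both intervals exclude 1.0, but the upper bound of the second is more than 20 times its lower bound, compared with about 1.2 times for the first.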
Relative vs. Absolute Risk Reduction and Number Needed to Treat or Harm:
We often pay more attention to relative than to absolute risk, but it is crucial to incorporate both into our interpretation of studies. While the efficacy of an intervention in a clinical trial frequently is described in terms of relative risk reduction (RRR), such as “drug A was associated with a 25 percent relative risk reduction in the primary endpoint,” such information is of little use unless we know the absolute risk reduction (ARR). A 25 percent relative risk reduction could be a reduction in risk from 10 to 7.5 percent (ARR=2.5 percent) or from 0.4 to 0.3 percent (ARR=0.1 percent), a very different meaning, indeed. With the number needed to treat (NNT) defined as 1/ARR, in the former example with the ARR of 2.5 percent, the NNT would be 40, considered to be reasonable, as compared to the latter example with the ARR of 0.1 percent, where the NNT would be 1,000. Generally, an NNT of 50 or lower is seen as favorable, although this may vary depending on the intervention and the duration of treatment needed to achieve it. The ARR often is very dependent on the population studied and the length of follow-up. Take again the example of the IMPROVE-IT trial: While the RRR was only 6 percent, the fact that the trial enrolled very-high-risk acute coronary syndrome patients within 10 days of their event and followed them for seven years resulted in cumulative event rates of 32.7 percent for the intervention and 34.7 percent for the control groups, for an ARR of 2.0 percent and NNT of 50, which is considered reasonable for an intervention. Had the trial enrolled stable coronary disease patients (e.g., one year after their event), the ARR and NNT might have been much different. Finally, and of even greater relevance, is the concept of “net clinical benefit,” which weighs the NNT against the number needed to harm (NNH) and so considers the balance between efficacy and adverse side effects.
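The arithmetic behind these figures is simple enough to sketch. The function below takes the event rates in the control and treated groups as proportions and returns the RRR, the ARR, and the NNT (1/ARR). The function name `risk_summary` is hypothetical, and the inputs are the rates quoted in the text.

```python
def risk_summary(control_rate, treated_rate):
    """Return (RRR, ARR, NNT) from two event rates given as proportions."""
    arr = control_rate - treated_rate      # absolute risk reduction
    if arr <= 0:
        raise ValueError("no risk reduction, so the NNT is not defined")
    rrr = arr / control_rate               # relative risk reduction
    nnt = 1 / arr                          # number needed to treat
    return rrr, arr, nnt


examples = {
    "10% -> 7.5%": (0.10, 0.075),
    "0.4% -> 0.3%": (0.004, 0.003),
    "IMPROVE-IT (34.7% -> 32.7%)": (0.347, 0.327),
}
for label, (control, treated) in examples.items():
    rrr, arr, nnt = risk_summary(control, treated)
    print(f"{label}: RRR = {rrr:.1%}, ARR = {arr:.1%}, NNT = {round(nnt)}")
```

The first two examples share the same 25 percent RRR yet differ 25-fold in NNT, while the IMPROVE-IT rates give a small RRR but an NNT of about 50 because the baseline risk was so high.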
Primary vs. Secondary Endpoints:
While the success of a clinical trial customarily is tied to whether it met its primary endpoint, many feel this should not be the exclusive criterion on which to base one’s conclusion about the clinical trial. Clinicians often feel that a careful examination of secondary endpoints, in particular those that may be most relevant to their patient population or practice, is necessary. Take, for example, the Prospective Pioglitazone Clinical Trial in Macrovascular Events (PROactive trial),5 which evaluated pioglitazone versus placebo in patients with diabetes and where the primary outcome was death, myocardial infarction, stroke, acute coronary syndrome, endovascular surgery, or leg amputation. The primary endpoint failed by a small margin, with a p-value of 0.08, a result that turned out to be greatly influenced by an unfavorable effect on endovascular surgery, a component that most would agree typically is not included in a composite cardiovascular endpoint. The “principal secondary” and more conventional endpoint of death, myocardial infarction, or stroke, however, was significant at p=0.03. More recently, the Empagliflozin, Cardiovascular Outcomes, and Mortality in Type 2 Diabetes trial (EMPA-REG OUTCOME trial)6 also can serve as a useful illustration. Randomization of type 2 diabetes patients to empagliflozin versus placebo did result in a statistically significant reduction in the primary outcome of cardiovascular death, myocardial infarction, or stroke, but to a modest degree (HR=0.86, p=0.04). In fact, myocardial infarction and stroke were not in and of themselves significantly reduced. It is the convincing reductions in heart failure hospitalizations (HR=0.65, p=0.002), cardiovascular death (HR=0.62, p<0.001), and all-cause death (HR=0.68, p<0.001) that are motivating the enthusiasm for this trial and possible further investigation.
Subgroup Analyses:
While neither guidelines nor new drug approvals can be based on subgroup analyses, the careful reader (and clinician) will understand the potential significance and implications of prespecified subgroup analyses and make their own judgments as to whether they should be applied to patient care. In clinical lipidology, probably some of the best examples derive from the Action to Control Cardiovascular Risk in Diabetes Lipid Trial (ACCORD-Lipid)7 and the Atherothrombosis Intervention in Metabolic Syndrome with Low-HDL/High Triglycerides: Impact on Global Health Outcomes Trial (AIM-HIGH).8 While each failed its primary endpoint, the prespecified subgroup analysis in those with high triglycerides and low high-density lipoprotein cholesterol (HDL-C), the patients who one could argue were most appropriate for the interventions targeted (fenofibrate and niacin, respectively), did show important reductions in risk (that were statistically significant in the case of ACCORD). While these interventions were never approved for reducing cardiovascular disease (CVD) events beyond statin therapy, some clinicians who understand and appreciate these subgroup analyses might still consider such patients for these therapies. Another conundrum deals with clinical trials that were done globally, or in more than one country or ethnic group, where the response to a medication may be very different from one region or population to the next. The Heart Protection Study 2–Treatment of HDL to Reduce the Incidence of Vascular Events (HPS2-THRIVE)9 trial of the niacin-laropiprant combination is a perfect case in point. Here the primary endpoint was completely negative, without any discernible benefit seen from the intervention. Nevertheless, upon inspecting the subgroup data, patients enrolled in Europe did have a modest benefit (11.3 vs. 12.4 percent), whereas those in China had no benefit at all (15.8 vs. 15.5 percent) (p-heterogeneity=0.06), a finding supported by the fact that the Chinese patients had more side effects and less efficacy in terms of low-density lipoprotein cholesterol (LDL-C) reduction and HDL-C increase. This provides us insight into how regional/ethnic differences in response to a therapy may play into the results of a trial and its interpretation.
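One reason subgroup findings are treated with caution is multiplicity: the more subgroups that are tested, the more likely it is that at least one crosses p<0.05 by chance alone. The sketch below computes that probability under the simplifying assumption that the tests are independent and that the treatment truly has no effect in any subgroup. Real subgroups overlap and correlate, so the figures are only indicative, and the function name `chance_of_false_positive` is hypothetical.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 10, 20):
    print(f"{n:>2} subgroup tests: "
          f"{chance_of_false_positive(n):.0%} chance of at least one false positive")
```

With 10 independent tests at the 0.05 level, the chance of at least one spurious “significant” subgroup is roughly 40 percent, which is why prespecification and tests for heterogeneity carry so much weight.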
Conclusions:
We have presented several key statistical issues that need to be considered by the clinical lipidologist, cardiologist, or other clinician or scientist in the careful analysis of clinical trial or observational study results. While there often are rigid rules for criteria used to consider whether a study should form the basis of a new guideline, it is always prudent to advise the practicing clinician not to be “bound” to such evidence or guidelines when deciding whether a therapy is appropriate for an individual patient or select group of patients, but to consider them within the totality of the evidence given criteria such as those we have reviewed in this short primer.
Disclosure statement: Dr. Wong has received consulting fees from Merck, Amgen, and Pfizer. He has received honoraria from Sanofi-Regeneron, and participated in contracted research through his institution for Amgen and Regeneron.










