There is Madness in Their Methods
Mikel Aickin, University of Arizona
September 3, 2005
In The Structure of Scientific Revolutions, Thomas Kuhn portrayed normal science as slipping into a moribund condition, in which it could no longer provide acceptable answers to its own questions. Then some kind of shift would come along to replace the dominant outlook with an improvement, which was in turn destined to become the new version of normal science, perpetuating the whole cycle. In Kuhn's version, it was the discovery of new puzzles, consisting of observations that could not be satisfactorily explained in the current paradigm, that led to a paradigm shift. He did not, however, consider a situation in which the methods of normal science might simply degenerate, producing the same kind of crisis, and possibly the same kind of resolution.
There is unsettling evidence that we are now in the midst of a methodologic degeneration in biomedical science. This appears to be occurring in, of all places, our fundamental approach to inference: using observation and evidence to decide how to act or what to believe. That it might be happening in medical research makes it of more than just academic interest.
One of the few benefits of a degeneration of conventional methods is that the normal scientists are unlikely to recognize that it is happening, and so the process will be not only made public, but actually touted as excellent science. So it is with a remarkable article published on 27 August 2005 in the Lancet regarding homeopathy.1 The editors of the Lancet are evidently proud of their publication, since they use it as the basis for a call to end homeopathy. Does this article justify the editorial, or is it in fact a betrayal of the very principles that the Lancet claims to stand for? Let us see how this specific article fares in the light of the conventional criteria applied to articles on clinical trials and in biomedical science generally.
Treatment Comparisons. A clinical trial has investigated two therapies for a given condition. On a scale in which larger numbers are better, and zero stands for no effect, the effect of therapy A is 0.54 (SDE=0.196), while for therapy B it is 0.13 (SDE=0.154), where SDE denotes the standard error of the estimate. The researchers conclude that therapy A is effective (p=0.006) while therapy B is not (p=0.40).
The article is, of course, not accepted for publication. The reason is that the whole point of having two groups in a study is to compare them with each other. The difference between the treatment effects in the two groups is 0.41 (SDE=0.249) with p=0.10. By the conventional criterion for making such comparisons, this result is not “statistically significant”. It means that there is no basis for saying that the two therapies have different effects. The study is null.
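Anyone can check the arithmetic. A minimal sketch in Python, assuming (as the quoted p-values imply) that SDE is a standard error and that conventional normal-theory z-tests are intended:

```python
from math import erf, sqrt

def p_two_sided(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# Effect estimates and standard errors quoted above.
a, se_a = 0.54, 0.196   # therapy A
b, se_b = 0.13, 0.154   # therapy B

print(p_two_sided(a / se_a))   # ~0.006: A differs from zero
print(p_two_sided(b / se_b))   # ~0.40:  B does not differ from zero

# The comparison the two-group design was meant for: A against B.
diff = a - b                         # 0.41
se_diff = sqrt(se_a**2 + se_b**2)    # ~0.249
print(p_two_sided(diff / se_diff))   # ~0.10: no demonstrated difference
```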
The data come from the abstract of the Lancet article. The value 0.54 is the negative log of the odds ratio (0.58) from conventional studies, and 0.13 is the same transformation of the odds ratio (0.88) in homeopathic studies. In the abstract, and in the comments elsewhere in the issue, the faulty analysis is treated as if it were correct: therapy A (conventional medicine) is indeed effective, while therapy B (homeopathy) is not.
Differential Compliance. Another study has randomized 110 patients to each of two therapy groups. The therapies are hard to maintain, and so only 21 patients comply in one group, while an even more disappointing 9 comply in the other. The difference is "statistically significant" with p=0.018. The authors are surprised when their article is rejected, on the grounds that such a low rate of compliance, combined with a differential between the two groups, casts serious doubt on the results. The study has failed.
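The quoted p-value is reproduced by the standard two-proportion z-test with a pooled variance estimate; a sketch, assuming that is the test intended:

```python
from math import erf, sqrt

def p_two_sided(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

n = 110            # randomized per group
c1, c2 = 21, 9     # compliers per group

p1, p2 = c1 / n, c2 / n
pooled = (c1 + c2) / (2 * n)
se = sqrt(pooled * (1 - pooled) * (2 / n))
print(p_two_sided((p1 - p2) / se))   # ~0.018
```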
The numbers are from the abstract of the Lancet article. There were 21 "high-quality" homeopathic studies, and 9 "high-quality" conventional studies. The conclusion is clear: there has been a "statistically significant" demonstration that homeopathy articles are of higher quality than comparable conventional medical articles on the same topics. Unfortunately, this invalidates the rest of the paper. (As a footnote, it was only recently that the supposed poor quality of CAM research was being cited as the reason for a false excess of positive CAM studies. Now that the quality results point in the opposite direction, this argument is evidently no longer valid.)
Intent-to-treat. Yet another study also enrolls 110 pair-matched patients in each of two groups. One group has 8 evaluables while the other has only 6. The article is rejected on the grounds that once patients are entered into the study, they must be analyzed in their original groups. This means, among other things, that if they did not contribute endpoint data, then some imputation scheme must be used. The results as presented are faulty not only because more than ninety percent of the data are missing, but also because there is no guarantee that the patients actually analyzed are matched (that is, the pair matching was destroyed by the missing data, a point passed over by the authors). The process of selection that produced the "evaluables" is not above question.
The data come from the abstract of the Lancet article. The odds ratios cited above are based on 8 homeopathic and 6 conventional articles (not 110 of each, as implied elsewhere in the article and in the Lancet editorial). The loss of pairing was ignored, of course. The validity of the measures used to include articles is not adequately justified, despite the fact that the results might well be almost entirely driven by them.
Post-study power computations. A study without a control group reports an apparent treatment effect of 0.13 (SDE=0.154). This is properly reported as not "statistically significant". The article is accepted only subject to revision, since a negative study with a small sample size should provide a power computation (this is not, as is often and erroneously thought, to justify the study in the first place, but to determine whether the results are worth anything at all). A conventional calculation gives a detectable effect of 0.462 (power 85%). The editors decide that this is too large to be reasonable, and reject the article.
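Under the conventional normal-theory formula, the detectable effect is (z at 1 - alpha/2, plus z at the desired power) times the standard error; a sketch, assuming the usual two-sided alpha of 0.05:

```python
from statistics import NormalDist

se = 0.154                  # standard error of the observed effect
alpha, power = 0.05, 0.85   # two-sided test, 85% power

z = NormalDist().inv_cdf
detectable = (z(1 - alpha / 2) + z(power)) * se
print(detectable)   # ~0.461; 0.462 follows from rounding the z-values to 1.96 and 1.04
```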
The data come from the abstract of the Lancet article. A negative result is reported (homeopathy is no better than placebo) with a minuscule sample size, and no power calculation.
Control of confounding. A group of epidemiologists conducts an observational study of seven risk factors for a disease outcome. The issue is to determine whether the risk factors operate in the same way in two groups of people. The data presentation consists of a series of univariate odds ratios, one for each risk factor, with p-values testing the null hypothesis of no association. The article is rejected for two reasons. First, since the purpose was to compare risk factors across the two groups, the comparisons against the null are not germane, and the obvious comparisons between the groups should have been made. Second, and more importantly, there is a known confounder that should have been controlled in the analysis (that is, there should have been a multivariate analysis), and moreover the risk factors that were analyzed are intercorrelated, so that again multivariate analyses should have been carried out.
The data are from Table 3 of the Lancet article. One could take quality as the confounder, or perhaps one of the other factors. There is, of course, no reason to dichotomize quality, and since this generally results in misclassification bias, there is reason not to. Obviously the analysis does not compare conventional medicine with homeopathy, but rather compares each to the null. An appropriate analysis would not only make the comparison between therapy groups, but would take the pairing into account.
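For concreteness, here is a sketch of the shape such an analysis might take. Everything in it is hypothetical: the data, the column names (positive, homeopathy, quality), and the model are illustrative, not taken from the Lancet article.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Entirely hypothetical data: one row per trial, with `positive` as the
# outcome, `homeopathy` marking which literature the trial belongs to,
# and a continuous `quality` score (deliberately not dichotomized).
trials = pd.DataFrame({
    "positive":   [1, 0, 1, 0, 1, 0, 1, 0],
    "homeopathy": [1, 1, 1, 1, 0, 0, 0, 0],
    "quality":    [0.9, 0.8, 0.4, 0.6, 0.7, 0.5, 0.3, 0.6],
})

# Compare the two groups directly, adjusting for quality as a
# continuous covariate.
fit = smf.logit("positive ~ homeopathy + quality", data=trials).fit(disp=0)
print(fit.summary())
```

A matched-pairs (conditional) version would additionally respect the pairing; the sketch shows only the two elements the text calls for, a direct between-group comparison and multivariate control.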
Meta-analyses. There are, therefore, five areas in which the Lancet article does not meet the minimum, conventional criteria for publication in biomedicine. This is, however, not the most serious problem with the article. For that, we need to recall the original aim of a meta-analysis, or overview: to assemble all of the obtainable, relevant literature from studies done for the purpose of comparing different therapies for a given condition. The original reasons for developing the concept were to collect scattered literature into one place, to apply uniform criteria for study selection and analysis, and to come to a conclusion about the best therapeutic approach, or to say that the evidence was not yet conclusive. Somehow this precise and useful form has degraded into an unrecognizable hash, in which any papers on any topic can be bundled together in an investigation of questions of unlimited ambiguity. A classic paper along this path has already been published in the Annals of Internal Medicine.2 Here the authors studied a single therapy (vitamin E supplementation), breaking the first rule of meta-analysis, for multiple conditions (breaking the second rule), in studies not designed to test the therapy (breaking the third rule). There is evidence that they were not sufficiently careful about the form of the various vitamin E treatments, violating a fourth rule.3 This study in effect concocted perhaps the most biased sample of human beings one could find in the biomedical literature, and then made the truly bizarre assertion that its results applied to everyone. One can only presume that the lack of a negative reaction to the Annals article paved the way for the Lancet article.
There are other examples of a similar order of strangeness, but I will mention only the therapeutic touch article published in the Journal of the American Medical Association.4 This article reported research carried out by a nine-year-old girl under the direction of her mother, an ardent opponent of therapeutic touch. The methodology was debunked in an article in Alternative Therapies,5 which showed that it contained appalling, irremediable flaws. After the original article was published, the JAMA editor was criticized for poor judgment in accepting a low-quality article to make a political statement. This could be seen as an unjustifiably beneficent interpretation, however, because no one seems to have noticed the very real possibility that the article, poor though it was, actually did meet JAMA's scientific standards.
Malpractice. If you make a few simple assumptions, you can roughly compute the number of possible instances of malpractice that a physician might commit in a lifetime of practice. It is not a particularly large number. Now consider a journal that publishes papers which mislead health professionals and ordinary people about the effectiveness of medical practices. Surely one article, to say nothing of one researcher, can have a harmful effect through research malpractice that dwarfs the meager capacity of a single physician. The malpractice risk of a typical journal must be larger still.
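As a back-of-envelope illustration only, with every figure an assumption of mine rather than the author's:

```python
# Rough count of one physician's lifetime patient encounters, i.e. the
# opportunities for malpractice, under assumed (illustrative) workload.
patients_per_day = 20
working_days_per_year = 250
years_in_practice = 40

encounters = patients_per_day * working_days_per_year * years_in_practice
print(encounters)   # 200000 encounters in a career
```

A single misleading article in a major journal can plausibly reach many times that number of clinicians, and through them their patients.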
If we are to see a continued degradation of methods in biomedical research, supported by "leading" journals, then perhaps it is time to think about the End of Biomedical Journals as we know them. In the US at least, it would not seem unthinkable for the National Institutes of Health, through the National Library of Medicine, to undertake a web-based project to publish all funded, and much of the unfunded, research in all areas of biomedicine. The need for the current unregulated system would vanish. Editors and referees would still be needed, but they would operate under rational regulations, and would not be in a position to endanger the public health on the basis of personal whims. Incidentally, the job of meta-analysis would become infinitely easier, since the hunt for relevant articles would be all but accomplished. And, as we know from PubMed, the technology is available, and the NLM knows how to apply it.
To return to Thomas Kuhn, we certainly have many biomedical puzzles that are worth working on, and which have not been addressed by normal biomedical science. Some of us are engaged in an experiment to see whether we can fashion research tools that will help us to understand more, by extending existing methods when feasible, and developing new ones when appropriate. For us, it is particularly discouraging to see normal biomedical scientists perverting their own tools for the evident purpose of attacking unconventional therapies.
References
1. Shang A, Huwiler-Müntener K, Nartey L, Jüni P, Dörig S, Sterne JAC, Pewsner D, Egger M. Are the clinical effects of homoeopathy placebo effects? Comparative study of placebo-controlled trials of homoeopathy and allopathy. Lancet 2005;366:726-32
2. Miller ER, Pastor-Barriuso R, Dalal D, Riemersma RA, Appel LJ, Guallar E. Meta-analysis: High-dosage vitamin E supplementation may increase all-cause mortality. Annals of Internal Medicine 2005;142(1):37-46
3. Neustadt J, Pizzorno J. Vitamin E and All-Cause Mortality. Integrative Medicine 2005;4(1):14-17
4. Rosa L, Rosa E, Sarner L, Barrett S. A close look at therapeutic touch. JAMA 1998;279(13):1005-1010
5. Cox T. A nurse-statistician reanalyzes data from the Rosa therapeutic touch study. Alternative Therapies 2003;9(1):58-65