Finding the Truth – research methods in biomedical science. V Meta-analysis

What is meta-analysis?

Individual intervention trials may not be large enough to produce a statistically definitive answer, or different trials investigating a particular intervention may not always produce the same results. Meta-analysis is a statistical tool that allows researchers to reach a consensus by combining trials that are essentially similar to produce what is effectively a larger study of greater statistical power. When studies are combined in this way they are weighted according to their sample size (in most methods, strictly by the precision of each estimate, which depends largely on sample size) and so the larger trials have a bigger impact on the final result than smaller ones. This technique is not only used to combine clinical and field trials but can also be used to combine animal experiments, cohort studies, case control studies and indeed almost any studies which are investigating the same hypothesis, are similar in their essential features and use the same outcome measures.
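The sample-size weighting described above can be illustrated with a minimal sketch. The three trials and their effect estimates are hypothetical, and real meta-analysis software usually weights by the inverse of each trial's variance rather than by the raw number of subjects.

```python
def pooled_effect(trials):
    """Sample-size-weighted mean of per-trial effect estimates.

    `trials` is a list of (effect, sample_size) pairs.
    """
    total_n = sum(n for _, n in trials)
    return sum(effect * n for effect, n in trials) / total_n


# Hypothetical trials: (effect estimate, number of subjects)
trials = [(0.2, 50), (0.5, 1000), (0.3, 100)]

simple_mean = sum(effect for effect, _ in trials) / len(trials)
weighted_mean = pooled_effect(trials)

print(f"unweighted mean: {simple_mean:.3f}")
print(f"weighted mean:   {weighted_mean:.3f}")
```

The weighted result sits close to the estimate from the 1000-subject trial, whereas the unweighted mean treats the 50-subject trial as its equal.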

The first step in a meta-analysis is to conduct a systematic review of the literature that aims to identify all of the trials/studies that could possibly be included. Additional searching may be conducted to obtain the results of those that have not been formally published or are not listed on the search database used. All of these studies must then be sifted according to pre-determined and objective criteria to decide whether or not they should be included in the final analysis. The sorts of inclusion criteria that might be set for a meta-analysis of intervention trials would be things like:

  • Characteristics of the subjects e.g. sex and age range, current health status and maybe ethnic or socio-economic characteristics
  • The minimum duration of the trial
  • The minimum (and/or maximum) dose level of the intervention
  • The outcome measures that were used e.g. total or disease specific mortality, changes in symptoms, changes in a risk factor like blood cholesterol or blood pressure
  • If the effect of an intervention on a particular disease is being investigated then the diagnostic criteria used to select subjects for the trial and perhaps the initial severity of the condition
  • Quality criteria; one would probably decide that only RCTs would be included in the final analysis if there were a reasonable number of such trials and one might only include trials that met certain additional quality criteria.
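Sifting against pre-determined criteria like those listed above can be sketched as a simple filter. The field names, threshold values and trials below are all hypothetical; the point is only that the criteria are fixed in advance and applied in the same way to every candidate study.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    name: str
    randomised: bool       # was the trial an RCT?
    duration_weeks: int
    daily_dose_mg: float
    min_age: int
    max_age: int


# Hypothetical criteria, set before any results are examined
CRITERIA = {"min_duration_weeks": 12, "min_dose_mg": 0.4, "age_range": (18, 45)}


def meets_criteria(trial, criteria=CRITERIA):
    """Return True only if the trial satisfies every inclusion criterion."""
    low, high = criteria["age_range"]
    return (
        trial.randomised
        and trial.duration_weeks >= criteria["min_duration_weeks"]
        and trial.daily_dose_mg >= criteria["min_dose_mg"]
        and trial.min_age >= low
        and trial.max_age <= high
    )


candidates = [
    Trial("Trial A", True, 24, 0.4, 18, 40),
    Trial("Trial B", False, 52, 4.0, 20, 35),  # excluded: not randomised
    Trial("Trial C", True, 8, 0.8, 18, 45),    # excluded: too short
]

included = [t.name for t in candidates if meets_criteria(t)]
print(included)
```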

Growth of meta-analysis

The idea of aggregating clinical studies to try to give a more comprehensive judgement about effectiveness goes back over a hundred years. The term meta-analysis was coined in the 1970s and the technique was refined by researchers working in the field of education. The ability to search the literature and access papers electronically and the availability of software to perform the data analysis have made meta-analysis a relatively low-cost way of generating “original data” i.e. a cheap and not too labour-intensive way of producing primary research papers that are the key to advancement of scientific careers. No expensive laboratory space and equipment and almost no consumable items and technical support are required to conduct meta-analyses; most of the work can be done whilst sitting in front of an internet-linked computer. Figure 1 shows the exponential growth in the use of meta-analysis since the 1980s. To generate this figure, I found all the papers with “meta-analysis” in the title using the academic search engine PubMed and broke them down according to year of publication.

Figure 1           The exponential growth in the use of meta-analysis


Summarising the results of a meta-analysis

The results of a meta-analysis are presented visually as what is known as a Forest plot. Figure 2 shows a Forest plot of five trials that tested the effects of folic acid supplements, given prior to conception and in early pregnancy, on the incidence of neural tube defects (NTD) like spina bifida in the babies. The central vertical line is the no effect line and the five small dark squares represent the relative rates in each study for those given and not given extra folic acid. A square that lies on the central vertical line means that the rate is exactly the same in the control and test groups and that folic acid had no effect. A square to the left of the vertical line favours folic acid, and a square to the right of the line would indicate that the rate was higher in the folic acid group i.e. that it was having a harmful effect. Note that the size of each square indicates the size of the trial and its weighting in the final analysis. The horizontal lines on either side of each square indicate what statisticians call the confidence limits of that trial; if these horizontal lines cross the vertical line then the individual trial is not considered to be statistically significant on its own. Clearly all five trials favour folic acid supplements but only the large, expensive and sophisticated trial at the bottom is statistically significant in its own right. The diamond-shaped block at the bottom shows the effect of a weighted combination of all five trials and this is very highly significant, leaving no doubt that folic acid reduces NTD risk. If one did a meta-analysis and just combined the top four individually non-significant trials then this combination would also show a highly significant effect of folic acid; adding in the fifth large trial reinforces this.

Figure 2           A Forest Plot of five clinical trials of effect of folic acid supplements on the rate of neural tube defects.


Adapted from De-Regil et al (2010).

Folic acid supplements reduce the risk of having an affected pregnancy by over 70%. The results of individual studies vary because of factors like the dose used, the type of women recruited and just chance. The numbers in the top trial indicate a much lower incidence in this trial than in the others, even in the control group; this is because this trial used a random selection of women whereas the others used “high risk” women.

This example demonstrates how a meta-analysis can combine several small studies that are not statistically significant individually (the top four in figure 2) to give an outcome that is highly statistically significant. Of course the statistical analysis merely confirms objectively what most scientists would have concluded simply by using their judgement to interpret the data.
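The way in which several individually non-significant trials can combine into a significant result can be sketched numerically. The event counts below are hypothetical (they are not the folic acid data), and the pooling uses a fixed-effect, inverse-variance method on the log risk ratio, which is one common choice and not necessarily the method used for figure 2.

```python
import math


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Return (log risk ratio, standard error) for one trial.

    Assumes at least one event in each group (zero counts would
    need a continuity correction, which is omitted here).
    """
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    se = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    return log_rr, se


def ci95(log_rr, se):
    """95% confidence limits, converted back to the risk ratio scale."""
    return math.exp(log_rr - 1.96 * se), math.exp(log_rr + 1.96 * se)


def pool(estimates):
    """Fixed-effect, inverse-variance pooled (log risk ratio, standard error)."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * lr for w, (lr, _) in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical counts: (events treated, n treated, events control, n control)
trials = [(2, 100, 6, 100), (3, 150, 8, 150), (1, 80, 4, 80), (4, 200, 9, 200)]

estimates = [log_risk_ratio(*t) for t in trials]
trial_cis = [ci95(lr, se) for lr, se in estimates]
pooled_lr, pooled_se = pool(estimates)
pooled_ci = ci95(pooled_lr, pooled_se)

for low, high in trial_cis:
    print(f"single trial: 95% limits {low:.2f} to {high:.2f}")
print(f"pooled risk ratio {math.exp(pooled_lr):.2f}, "
      f"95% limits {pooled_ci[0]:.2f} to {pooled_ci[1]:.2f}")
```

Each trial's confidence limits span 1 (the no effect line on a Forest plot), but the pooled limits lie wholly below 1 because combining the trials shrinks the standard error.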

Some general problems with meta-analysis

Multiple publishing

The same data may be published more than once to boost the apparent research output and thus the careers of researchers. This is regarded as a form of scientific misconduct. It is not always easy to spot where this has happened. If the same data is included more than once in the meta-analysis then this increases its influence on the outcome of the meta-analysis. If a study with 100 subjects was included twice then it would effectively count as if it had 200 subjects.
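A minimal sketch of this double counting, using hypothetical studies and simple sample-size weighting. It assumes that duplicates share an identifier, which is rarely true in practice; spotting a duplicate publication usually means comparing authors, sites, dates and subject numbers by hand.

```python
def pooled_effect(studies):
    """Sample-size-weighted mean effect over (study_id, effect, n) records."""
    total_n = sum(n for _, _, n in studies)
    return sum(effect * n for _, effect, n in studies) / total_n


def deduplicate(studies):
    """Keep only the first record seen for each study identifier."""
    seen, unique = set(), []
    for study in studies:
        if study[0] not in seen:
            seen.add(study[0])
            unique.append(study)
    return unique


# Hypothetical records; study "S1" (100 subjects) appears twice
studies = [("S1", 0.6, 100), ("S2", 0.1, 300), ("S1", 0.6, 100)]

with_duplicate = pooled_effect(studies)
without_duplicate = pooled_effect(deduplicate(studies))

print(f"pooled effect with S1 counted twice: {with_duplicate:.3f}")
print(f"pooled effect with S1 counted once:  {without_duplicate:.3f}")
```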

Publication bias

It is common for negative data to remain unpublished e.g. data suggesting that an intervention has no effect or that there is no relationship between a lifestyle variable and a disease or a risk factor. Where this occurs, a meta-analysis of the published literature will give a conclusion that has a positive bias. This can happen for all sorts of innocent reasons; authors may be less enthusiastic about publishing a paper that indicates that an intervention does nothing, and journal referees and editors may be less interested in publishing negative and therefore uninteresting results. There may also be more sinister motivations involved; in his book Bad Pharma, Ben Goldacre describes several drug trials that were unfavourable to a company’s product and that remained unpublished long after the trial was completed. A paper by Erick Turner and colleagues, published in the New England Journal of Medicine in 2008, analysed the publication or non-publication of antidepressant drug trials. This group identified 74 clinical trials that had been registered with the Food and Drug Administration (FDA) in the USA. Only one of the 38 trials which were classified by these authors as having a positive outcome (the drugs work) remained unpublished, but 22 of the 36 studies with negative or questionable results were not published. Of the 14 trials in this latter category that were published, the report’s authors suggested that 11 were published in a way that suggested a positive outcome. If a research sponsor like a drug or supplement manufacturer engineers the non-publication or delayed publication of negative data about one of their products then this is a form of serious research misconduct. It may mean that even some large and well-conducted trials do not get published and are not included in a meta-analysis, and this can give a strong positive bias to the outcome.
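The size of the distortion in the antidepressant example can be worked out from the figures quoted above. The short calculation below uses only those numbers; the variable names are my own.

```python
# Figures quoted above from the analysis of 74 FDA-registered trials
positive_total, positive_unpublished = 38, 1
negative_total, negative_unpublished = 36, 22
negative_published_as_positive = 11

positive_published = positive_total - positive_unpublished
negative_published = negative_total - negative_unpublished
published = positive_published + negative_published

# Trials that a reader of the journals would take to be positive
apparently_positive = positive_published + negative_published_as_positive

share_in_literature = apparently_positive / published
share_in_registry = positive_total / (positive_total + negative_total)

print(f"published trials: {published}")
print(f"apparently positive in the literature: {share_in_literature:.0%}")
print(f"actually positive among registered trials: {share_in_registry:.0%}")
```

So about 94% of the published trials appear positive, compared with about 51% of all the registered trials.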

If one makes the assumption that smaller negative trials are more likely not to be published then it is possible to detect this using something called a Funnel Plot (see figure 3). Larger trials will tend to give the most consistent results and as the trial size gets smaller so the variability in the results would be expected to increase. If the effect is plotted against the sample size then one would expect to see the trials spread symmetrically around the average effect line, with gradually increasing deviation from this line as the sample size gets smaller. This is shown in figure 3, where the top figure shows a theoretical and symmetrical Funnel Plot and the bottom figure shows the effect on the Funnel of leaving out five smaller negative trials, simulating the effect of non-publication of these trials. Of course if large and well conducted negative trials are purposely withheld from publication then this will not be detected by a Funnel plot.

Figure 3           Two hypothetical Funnel plots to show the effect of leaving out five small negative studies in the bottom plot to simulate the effect of publication bias.


From Sutton A.J. et al., BMJ, 320, 1574-1577. BMJ Publishing Group Limited
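The asymmetry that a Funnel plot reveals can be reproduced with a small simulation. This is a sketch under stated assumptions, not the method used to draw figure 3: the true effect, the range of trial sizes, the cut-off for a “small” trial and the rule that small trials with a zero or negative observed effect go unpublished are all hypothetical.

```python
import random
import statistics


def simulate_trials(count, true_effect, rng):
    """Return (sample_size, observed_effect) pairs for simulated trials."""
    trials = []
    for _ in range(count):
        n = rng.randint(10, 400)
        se = 2 / n ** 0.5  # assumed: two equal arms, outcome SD of 1
        trials.append((n, rng.gauss(true_effect, se)))
    return trials


def publish(trials, small=100):
    """Drop small trials whose observed effect is zero or negative."""
    return [(n, e) for n, e in trials if not (n < small and e <= 0)]


def mean_small_trial_effect(trials, small=100):
    """Unweighted mean observed effect among the small trials only."""
    return statistics.mean(e for n, e in trials if n < small)


rng = random.Random(1)  # fixed seed so that the run is repeatable
all_trials = simulate_trials(500, true_effect=0.2, rng=rng)
published = publish(all_trials)

print(f"trials run: {len(all_trials)}, trials published: {len(published)}")
print(f"mean effect in small trials, all:       {mean_small_trial_effect(all_trials):.3f}")
print(f"mean effect in small trials, published: {mean_small_trial_effect(published):.3f}")
```

Plotting the observed effect against sample size for `all_trials` would give a symmetrical funnel, whereas the same plot for `published` would have a bite missing from the negative side at the small-trial end. As in the text, nothing in this check would reveal the withholding of a large negative trial.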

Selection bias

The decision about whether or not to include a trial in a meta-analysis must be made according to pre-set criteria. Normally at least two people decide independently whether any study should be included or not. Where they do not agree, closer scrutiny will be given to that decision, ideally by another person. In an editorial published in 2003 in the BMJ, Hywel Williams discusses the controversy surrounding the use of Evening Primrose Oil (EPO) to treat atopic (allergic) eczema. Products containing EPO had recently had their licences as medicines withdrawn by UK regulators because of doubts about their efficacy. In this BMJ editorial Williams suggests that a meta-analysis with a positive outcome, conducted by people working for one of the companies involved in the marketing of EPO, had been quite influential in persuading regulators and the medical community of the effectiveness of EPO in the treatment of atopic eczema. Seven of the small trials included in the meta-analysis were sponsored by the company and one large negative study was not included in the analysis on grounds that Williams clearly considered to be dubious. The title of this editorial is “Evening Primrose Oil for atopic eczema” with the subtitle “Time to say goodnight”.

If you put rubbish in you get rubbish out

As with all statistical techniques, the reliability of the outcome of a meta-analysis is dependent upon the quality of the data that is put into it. If predominantly rather low quality trials are put in then the findings that it generates will not be reliable and will probably tend to be biased in favour of a positive effect of the intervention. The inclusion of fabricated data will distort the outcome in favour of that promoted by the fraudster; it is much easier to generate large, positive data sets by fraudulent means than by honest endeavour. It is thus important that fraudulent or suspect papers are identified and either retracted or highlighted to readers. Could electronic literature searches give a warning to readers that publications have emanated from a known fraudster? In the Yoshitaka Fujii case-study, his mass of fabricated clinical trials of treatments for post-operative nausea and vomiting certainly distorted conclusions about the most effective ways of treating this problem. In the Ranjit Chandra case-study, his fabricated data exaggerated the benefits of hydrolysed infant formulas in preventing allergic conditions and the benefits of vitamin and mineral supplements in improving immune function and reducing infection risk in elderly people. This effect of fraudulent trials on a meta-analysis can work both ways i.e. not only potentially distorting the outcome but also highlighting how the results of an author or research group deviate markedly from those of other authors researching the same topic. In the Fujii case-study one of the reasons why Fujii was initially suspected of committing research fraud was that his trials of the anti-nausea drug granisetron gave quite different findings to those of all other authors. Likewise Chandra’s data on the effects of vitamin and mineral supplements on immune function in well elderly people gave results that were completely at odds with those of other authors. These meta-analyses highlighted the inconsistency of these authors’ findings compared to others working in the field. Such findings should raise suspicions about fraud and trigger a more thorough review of an author’s work.

The strong impression that I have obtained from reading a number of meta-analyses is that when the analysis is conducted first using all of the studies meeting wide inclusion criteria and then again using only those of the highest quality, the apparent benefit of the intervention is often reduced i.e. the poorer studies tend to give the most positive results. This rather anecdotal impression is supported by a study published in the Lancet in 1998 by David Moher and colleagues, who randomly selected eleven meta-analyses testing various medical interventions. For each of these meta-analyses they graded the included trials according to two different methods of assessing trial quality. They found that the lower quality trials gave a substantially greater positive effect of treatment than those of the highest quality.
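Repeating an analysis on the higher quality trials alone is a simple form of sensitivity analysis, and it can be sketched as follows. The trials, effect estimates and quality scores are hypothetical, the quality cut-off is arbitrary and the pooling is plain sample-size weighting; this is not the grading method used by Moher and colleagues.

```python
def pooled_effect(trials):
    """Sample-size-weighted mean of the trials' effect estimates."""
    total_n = sum(t["n"] for t in trials)
    return sum(t["effect"] * t["n"] for t in trials) / total_n


# Hypothetical trials with a quality score from 1 (poor) to 5 (good)
trials = [
    {"name": "A", "effect": 0.50, "n": 40, "quality": 2},
    {"name": "B", "effect": 0.45, "n": 60, "quality": 1},
    {"name": "C", "effect": 0.15, "n": 300, "quality": 5},
    {"name": "D", "effect": 0.10, "n": 250, "quality": 4},
]

high_quality = [t for t in trials if t["quality"] >= 4]  # arbitrary cut-off

all_pooled = pooled_effect(trials)
high_quality_pooled = pooled_effect(high_quality)

print(f"pooled effect, all trials:        {all_pooled:.3f}")
print(f"pooled effect, high quality only: {high_quality_pooled:.3f}")
```

A marked drop in the pooled effect when the weaker trials are left out is a warning that the headline result may owe more to trial quality than to the intervention.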

Having read many meta-analyses, particularly those relating to the effect of diet, lifestyle, dietary supplements and herbal products, I often find that they have not really added very much to my ability to make clear and objective judgements. Some authors freely admit that the data they have used is not of the highest quality; under these circumstances a meta-analysis may add extra credibility to dubious conclusions. Sometimes so little data suitable for inclusion in a meta-analysis is found that authors can do little more than conclude that, because of the lack of data, there is a need for more research. Sometimes, where there is a clear positive outcome, the meta-analysis may give objective confirmation of what most reviewers might have concluded from a more narrative review of the trials, as in the folic acid and NTD example above.

Even where many apparently untainted trials have tested the effects of a particular intervention, meta-analysis does not always provide definitive clarification of whether the intervention is effective. Meta-analyses which apparently test the same question do not always come up with the same answer. These issues are discussed in post VIII in this series.
