Today, Science published an important comment pointing out that there were serious errors in a climate research article that it published in October 2004. The article concerned (Von Storch et al. 2004) was no ordinary paper: it has gone through a most unusual career. Not only did it make many newspaper headlines [New Research Questions Uniqueness of Recent Warming, Past Climate Change Questioned etc.] when it first appeared, it was also raised in the US Senate as a reason for the US not to join the global climate protection efforts. It furthermore formed part of the basis for the highly controversial enquiry by a Congressional committee into the work of scientists, which elicited sharp protests last year from the AAAS, the National Academy, the EGU and other organisations. It now turns out that the main results of the paper were simply wrong.
Von Storch et al. claimed to have tested the climate reconstruction method of Mann et al. (1998) in model simulations, and found it performed very poorly. Now, Eugene Wahl, David Ritson and Caspar Ammann show that the main reason for the alleged poor performance is that Von Storch et al. implemented the method incorrectly. What Von Storch et al. did, without mentioning it in their paper, was to remove the trend before calibrating the method against observational data – a step that severely degrades the performance of Climate Field Reconstruction (CFR) methods such as the Mann et al. method. Unfortunately, this erroneous procedure has already been propagated in a paper by Burger and Cubasch (GRL, 2005), where the authors refer to a personal communication with Von Storch to justify the use of the procedure. Another more recent analysis has shown that CFR methods perform well when used correctly. (See our addendum for a less technical description of what this is all about.)
How big a difference does this all make? The calibration error in the temperature minimum around 1820, where one of the largest errors occurs, is shown as 0.6ºC in the standard case of 75% variance in the Von Storch et al. analysis. This error reduces to 0.3ºC even in the seriously drift-affected ECHO-G run when the erroneous detrending step is left out. In the more realistic HadCM3 simulation, this error is just above 0.1ºC. The error margins (2 sigma) provided by Mann et al. and pictured in the IPCC report are ±0.17ºC (Fig. 2.21; the curves are reproduced in our addendum). It is therefore clear that the model test of Von Storch et al., had it been implemented correctly, would have shown a small, undramatic underestimation of variance and would have barely ruffled a feather.
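To see why detrending before calibration matters, here is a minimal toy sketch. It is not the Mann et al. CFR method and not the Von Storch et al. setup: it uses a single synthetic proxy, a simple least-squares fit, and made-up amplitudes and noise levels (all names and numbers below are assumptions chosen for illustration). The point is only that removing the trend from the calibration window throws away most of the shared signal, so the fitted scaling between proxy and temperature shrinks.

```python
import numpy as np

def calibrate_slope(proxy, temp, detrend):
    """Least-squares slope of temperature on proxy over the calibration window."""
    if detrend:
        # Remove a linear trend from both series before fitting
        t = np.arange(len(temp))
        proxy = proxy - np.polyval(np.polyfit(t, proxy, 1), t)
        temp = temp - np.polyval(np.polyfit(t, temp, 1), t)
    p = proxy - proxy.mean()
    y = temp - temp.mean()
    return float(p @ y / (p @ p))

rng = np.random.default_rng(0)
n_years, n_cal = 1000, 100          # hypothetical record and calibration lengths
years = np.arange(n_years)

# Toy "true" temperature: slow oscillation plus year-to-year variability,
# with a warming trend added over the final (calibration) century
temp = 0.3 * np.sin(2 * np.pi * years / 500) + 0.1 * rng.standard_normal(n_years)
temp[-n_cal:] += np.linspace(0.0, 0.6, n_cal)

# Toy proxy: temperature plus white noise (noise level is an assumption)
proxy = temp + 0.2 * rng.standard_normal(n_years)

slope_raw = calibrate_slope(proxy[-n_cal:], temp[-n_cal:], detrend=False)
slope_det = calibrate_slope(proxy[-n_cal:], temp[-n_cal:], detrend=True)

print(f"slope without detrending: {slope_raw:.2f}")
print(f"slope with detrending:    {slope_det:.2f}")
```

In this toy setting a reconstruction scales proxy anomalies by the fitted slope, so the smaller slope from the detrended calibration translates directly into a damped reconstructed amplitude for the earlier centuries.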
Error made, error corrected, and all is well? Unfortunately not. A number of questions remain, which need to be resolved before the climate science community can put this affair to rest.
The first is: why did it take so long to correct this error, and why did the authors of the original paper not correct it themselves? The error is reasonably easy to spot, even for non-specialists (see addendum). And it was in fact spotted very soon after publication. In January 2005, a comment was submitted to Science which correctly pointed out that Von Storch et al. had calibrated with detrended data and had therefore not tested the Mann et al. method. As such comments are routinely passed to the original authors for a response, Von Storch et al. must have become aware of their mistake at this point at the latest. However, the comment was rejected by Science in May 2005.
In a paper dated July 2005, Zorita and Von Storch admit their error in passing, writing: “the trend is subtracted prior to the fit of the MBH regression/inflation model (von Storch et al. 2004). […] It seems, however, that MBH have exploited the trends”. It is thus clear that they knew that the central claim of their Science paper, namely that they had tested the Mann et al. method, was false. But rather than publishing a correction in Science, they wrote the above in a non-ISI journal called “Memorie della Societa Astronomica Italiana” that not many climatologists would read.
An unambiguous correction in Science, where the original paper appeared, would not only have been good scientific practice; it would also have been particularly important given the large public and political impact of their paper. It would have been a matter of courtesy towards their colleagues Mike Mann, Raymond Bradley and Malcolm Hughes, who had suffered a major challenge to their scientific reputations as well as having to invest a large amount of time to deal with the Congressional enquiry mentioned above. And it would have been especially pertinent given the unusually vitriolic media statements made previously: in an interview with a leading German news magazine, Von Storch had denounced the work of Mann, Bradley and Hughes as “nonsense” (“Quatsch”). And in a commentary written for the March 2005 German edition of “Technology Review”, Von Storch accused the journal Nature of putting its sales interests above peer review when publishing the Mann et al. 1998 paper. He also called the IPCC “stupid” and “irresponsible” for highlighting the results of Mann et al. in their 2001 report.
There were at least two further issues with the Von Storch et al. paper:
– The model run of Von Storch et al. suffers from a major climate drift due to an inappropriate initialisation procedure. Despite starting in medieval times, the model was initialised from a present-day, rather than pre-industrial, climate state – i.e. from a climate affected by human-caused warming. As a result, the Northern Hemisphere temperature in the model drops by about 1.5ºC during the initial 100-year adjustment phase and keeps drifting down over the following centuries. This problem is never mentioned and this part of the experiment is not shown in publications, although climate modellers know that such severe disequilibrium must cause a long-lasting climate drift in the remainder of the run. After Osborn et al. (2006) documented this problem, Von Storch et al. repeated their experiment with improved initialisation. Their new run shows that about half the cooling from medieval times to the 19th Century in their original paper was due to this artificial drift, but again they have not published a correction or demonstrated the impact of this issue (see addendum).
– Von Storch et al. also looked at another model, stating: “Similar results are obtained with a simulation with the third Hadley Centre coupled model (HadCM3), demonstrating that the results obtained here are not dependent on the particular climate characteristics of the ECHO-G simulation.” They have repeatedly made similar claims in the media. This is important, as any model result is considered somewhat preliminary until confirmed with an independent model. However, their statement appears to us to be a serious misrepresentation of the HadCM3 results, which were shown only in the online supplement to their paper (see addendum).
In their response to the Wahl et al. critique, Von Storch et al. acknowledge the original problem but, in order to salvage their result, they introduce a large ‘red noise’ component into the proxies. This changes the nature of their test and implies an ‘a priori’ loss of low frequency variance instead of trying to calculate whether a particular methodology produces such a loss.
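For readers unfamiliar with the term, ‘red noise’ is noise with memory from one year to the next, so its variance is concentrated at low frequencies. The sketch below is a rough illustration only: it models red noise as a first-order autoregressive (AR(1)) process, and the autocorrelation of 0.7 and the 50-year averaging window are arbitrary values we picked, not those used by Von Storch et al. It compares white and red noise of equal total variance.

```python
import numpy as np

def ar1_noise(n, phi, sigma, rng):
    """AR(1) noise with lag-1 autocorrelation phi and stationary std sigma.

    phi = 0 gives white noise; phi > 0 gives 'red' noise.
    """
    # Scale the innovations so the stationary variance equals sigma**2
    eps = rng.standard_normal(n) * sigma * np.sqrt(1.0 - phi**2)
    x = np.empty(n)
    x[0] = rng.standard_normal() * sigma
    for i in range(1, n):
        x[i] = phi * x[i - 1] + eps[i]
    return x

def block_mean_variance(x, block):
    """Variance of non-overlapping block means: a crude low-frequency measure."""
    m = len(x) // block
    return float(x[: m * block].reshape(m, block).mean(axis=1).var())

rng = np.random.default_rng(1)
n, sigma = 10_000, 1.0                      # illustrative length and amplitude

white = ar1_noise(n, phi=0.0, sigma=sigma, rng=rng)
red = ar1_noise(n, phi=0.7, sigma=sigma, rng=rng)   # phi is an assumed value

lf_white = block_mean_variance(white, block=50)
lf_red = block_mean_variance(red, block=50)

print(f"variance of 50-year means, white noise: {lf_white:.3f}")
print(f"variance of 50-year means, red noise:   {lf_red:.3f}")
```

Because the red noise carries much more variance at multi-decadal time scales, adding a large amount of it to the proxies contaminates exactly the low-frequency band that the test is meant to examine.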
One could view this story as a positive example of the self-correcting process of science: erroneous results are eventually spotted and corrected, even if it sometimes takes time. If only science were at stake here, we’d need say no more: this would have been a sometimes inappropriately sharp, but otherwise regular technical debate about improving the methodology of proxy reconstructions.
Unfortunately, while the dispute has been used in the public arena to score political points, e.g. to discredit the IPCC process and to question all of the relevant climate science, the significance of this dispute for the bigger picture has been wildly blown out of proportion (see here for a previous discussion). We hope that after this new correction, the discussion can move on to a more productive level. The key issue is how we can improve reconstructions of past large-scale climate variability – of which by now almost a dozen exist. We should not lose sight of the fact that the debate here is about a few tenths of a degree – a much smaller change than is projected for the next century. It is also important to remember one principal point: Conclusions on whether recent warmth is likely to have been unprecedented in the past millennium, or the recent extent of human-caused warming, are based on the accumulation of evidence from many different analyses and are rarely impacted by a technical dispute about any one paper such as this.