
Peer Review: A Necessary But Not Sufficient Condition II

Filed under: — group @ 27 January 2005

by Michael Mann and Gavin Schmidt

In a previous post, we discussed a number of examples where the “Peer Review” process has failed and poor papers have been published in the ostensibly peer-reviewed literature. In this context, we revisit our previous discussions of the flawed work of McIntyre and McKitrick (henceforth “MM”). MM published a paper, in the controversial journal Energy and Environment, claiming to “correct” the proxy-based reconstruction of Northern Hemisphere temperatures published by Mann et al (1998–henceforth “MBH98”). Following the all-too-familiar pattern, this deeply flawed paper was heavily promoted by special interests as somehow challenging the scientific consensus that humans are altering the climate (an excellent account is provided by science journalist Dan Vergano of USA Today here). As detailed already on the pages of RealClimate, this so-called ‘correction’ was nothing more than a botched application of the MBH98 procedure, in which the authors (MM) removed 80% of the proxy data actually used by MBH98 during the 15th century period (failing in the process to produce a reconstruction that passes standard “verification” procedures–an error that is oddly similar to that noted by Benestad (2004) with regard to another recent McKitrick paper). Indeed, the bizarre resulting claim by MM of anomalous 15th century warmth (which falls within the heart of the “Little Ice Age”) is at odds not only with the MBH98 reconstruction but also with the roughly dozen other estimates now published that agree with MBH98 within estimated uncertainties.

All of their original claims have now been fully discredited (see e.g. this previous post as well as this discussion of a paper ‘in press’ in the Journal of Climate by Rutherford et al). MM, however, continue to promote false and specious claims. McIntyre and McKitrick (2005), in a paper they have managed to slip through the imperfect peer-review filter of GRL, now simply recycle the very same false claims they made previously in their comment on MBH98 that was rejected by Nature. Sifting through the large number of false and misleading statements in this latest paper, we find two primary criticisms of MBH98, both of which are demonstrably specious. The first criticism claims that the “Hockey Stick” shape of the MBH98 reconstruction is in some way an artifact of the conventions used to represent certain proxy data networks by Principal Component Analysis (PCA). This has already been demonstrated in detail to be false. We quickly recap the points for readers who do not want to wade through the details: i) the MBH98 results do not depend on what kind of PCA is used, as long as all significant PCs are included, ii) the results are insensitive to whether PCA is used at all (or whether all proxies are included directly), and iii) the results are replicated using a completely different methodology (Rutherford et al, 2005).

Their second criticism is of the statistic employed by MBH98 as a diagnostic of statistical skill, the “Reduction of Error” or “RE” (note that this statistic was favored as a skill diagnostic in prominent recent studies by Cook et al (2004) and Luterbacher et al (2004) in Science). This criticism was summarily dismissed by the reviewers of the rejected MM Nature comment. MM instead promote the use of a simple linear correlation coefficient (“r”) in its place. RE is favored by scientists in testing the “skill” of a statistical reconstruction (that is, in determining the reliability of a statistical model based on its ability to match data not used in constructing the model), precisely because it takes into account not only whether a reconstruction is ‘correlated’ with the actual withheld test data, but also whether it can closely reproduce the mean and the standard deviation of the test data. Because a simple linear correlation coefficient (r) is insensitive to errors in the mean and the standard deviation, it is widely recognized as an insufficient metric of reconstructive skill. An excellent discussion of these considerations is provided in the textbook by Wilks (1995–see chapter 7 on “Forecast Verification”, section 7.3.3 on “Skill Scores”), while illustrative examples are provided in a supplementary appendix to Rutherford et al (2005). As the MM reconstruction fails verification tests using the accepted metric RE (by contrast with the MBH98 reconstruction, the MM reconstruction produces negative values of RE, indicating that it is inferior to a statistical model that simply assigns the mean to all predicted values–the very definition of ‘no skill’), it is perhaps understandable why MM might try to promote this alternative, but inappropriate, statistic.
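
The difference between the two statistics can be illustrated with a minimal sketch in Python. It assumes the common textbook form of the statistic, RE = 1 - SSE / SSE_ref, where SSE is the sum of squared errors of the reconstruction over the withheld verification data and SSE_ref is the same quantity for a trivial model that predicts the calibration-period mean everywhere. The function names, the short series, and the calibration mean are hypothetical values chosen for illustration only; they are not data from MBH98, MM, or any other study.

```python
import math


def pearson_r(obs, pred):
    """Simple linear (Pearson) correlation coefficient between two series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_pred = sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    spread_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    spread_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in pred))
    return cov / (spread_obs * spread_pred)


def reduction_of_error(obs, pred, reference_mean):
    """RE = 1 - SSE / SSE_ref (assumed textbook form).

    reference_mean is the mean of the calibration period, i.e. the
    prediction of a 'no skill' model that always returns that mean.
    """
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sse_ref = sum((o - reference_mean) ** 2 for o in obs)
    return 1.0 - sse / sse_ref


# Hypothetical withheld verification data (temperature anomalies).
obs = [-0.2, -0.1, 0.0, 0.1, 0.2]
calibration_mean = 0.05  # hypothetical calibration-period mean

# A reconstruction that tracks the data closely.
close = [-0.18, -0.12, 0.02, 0.08, 0.21]

# A reconstruction that is perfectly correlated with the data but has
# the wrong mean (offset 0.5) and the wrong amplitude (factor 3).
biased = [3.0 * o + 0.5 for o in obs]

# The trivial model: the calibration mean at every point.
trivial = [calibration_mean] * len(obs)

for name, pred in [("close", close), ("biased", biased)]:
    print(name, "r =", pearson_r(obs, pred),
          "RE =", reduction_of_error(obs, pred, calibration_mean))
print("trivial RE =", reduction_of_error(obs, trivial, calibration_mean))
```

In this toy case the biased series scores a correlation of essentially 1 yet a strongly negative RE, because r ignores the errors in mean and amplitude that RE penalizes. The trivial model scores an RE of zero, which is why negative values indicate no skill.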

On a more general note, the intense criticism leveled against MBH98 is peculiar in that the authors of that study have in fact emphasized and quantified the uncertainties in their reconstructions in published work, something that was very difficult in previously published methodologies. The follow-up to MBH98 by Mann et al (1999) was entitled “Northern Hemisphere Temperatures During the Past Millennium: Inferences, Uncertainties, and Limitations” (italics added for emphasis), and indeed emphasized the substantial remaining uncertainties in proxy-based estimates of Northern Hemisphere temperature change in past centuries. The validity of the so-called “Hockey Stick” can, of course, rest neither on the strength of MBH98 nor on any one reconstruction or model simulation result alone. Rather, as demonstrated in IPCC (2001) [see this comparison here] and numerous additional studies since, it is what is perhaps more aptly termed the “Hockey Team” (that is, the multiple independent reconstructions and model simulations that now indicate essentially the same pattern of hemispheric mean temperature variation in past centuries) that supports a “Hockey Stick” description of past temperature changes.

Ironically, while some continue to attack this nearly decade-old work, the actual scientific community has moved well beyond the earlier studies, focusing now on the detailed patterns of modeled and reconstructed climate changes in past centuries, and insights into the roles of external forcing and internal modes of variability (such as the North Atlantic Oscillation or “NAO” and the “El Niño/Southern Oscillation” or “ENSO”) in explaining this past variability. For example, it is relatively well established now that the “Little Ice Age” represented only a moderate cooling for the Northern Hemisphere on average, because larger offsetting regional patterns of temperature change (both warm and cold) tended to cancel in a hemispheric or global mean. Modelers now are comparing not just hemispheric mean series, but the actual spatial patterns of estimated and observed climate changes in past centuries. See e.g. our review paper (Schmidt et al, 2004), where the response of a climate model to estimated past changes in natural forcing due to solar irradiance variations and explosive volcanic eruptions is shown to match the spatial pattern of reconstructed temperature changes during the “Little Ice Age” (which includes enhanced cooling in certain regions such as Europe) as well as the smaller hemispheric-mean changes.


Benestad, R.E., Are temperature trends affected by economic activity? Comment on McKitrick & Michaels, Climate Research, 27, 171-173, 2004.

Mann, M.E., R.S. Bradley, and M.K. Hughes, Global-scale temperature patterns and climate forcing over the past six centuries, Nature, 392, 779-787, 1998.

Mann, M.E., R.S. Bradley, and M.K. Hughes, Northern Hemisphere Temperatures During the Past Millennium: Inferences, Uncertainties, and Limitations, Geophysical Research Letters, 26, 759-762, 1999.

McIntyre, S. and R. McKitrick, Hockey Sticks, Principal Components, and Spurious Significance, Geophys. Res. Lett., 32, 2005.

Rutherford, S., M.E. Mann, T.J. Osborn, R.S. Bradley, K.R. Briffa, M.K. Hughes, and P.D. Jones, Proxy-based Northern Hemisphere Surface Temperature Reconstructions: Sensitivity to Methodology, Predictor Network, Target Season and Target Domain, Journal of Climate, in press, 2005.

Schmidt, G.A., D.T. Shindell, R.L. Miller, M.E. Mann, and D. Rind, General circulation modelling of Holocene climate variability, Quaternary Sci. Rev., 23, 2167-2181, doi:10.1016, 2004.

Wilks, D.S., Statistical Methods in the Atmospheric Sciences, Academic Press, 1995.

5 Responses to “Peer Review: A Necessary But Not Sufficient Condition II”

  1.

    RealClimate – Climate Scientists Use The Blog
    :: RealClimate is a new blog, launched in December 2004 by a group of concerned climate scientists, who describe it as a commentary site: RealClimate is a commentary site on climate science by working climate scientists for the interested public and jo…

  2.

    It would be constructive if you were to present a 1000-word precis of this to the editor of TCS. I for one would endorse their running it, but then, I would endorse a full-fledged public debate between the warring parties in this matter at AEI.

  3.
    CharlieT says:

    What is your position on the Bristlecone Pines?
    (I’m referring to MM’s suggestion that they are in some way anomalous)

    [Response: Thanks for the question. Much has been written on the potential influence of non-climatic factors in recent centuries (potentially associated with CO2 effects) on the growth pattern of certain high-elevation, drought-stressed trees such as the Bristlecone Pines you refer to. In Mann et al (1999) [Mann, M.E., Bradley, R.S. and Hughes, M.K., Northern Hemisphere Temperatures During the Past Millennium: Inferences, Uncertainties, and Limitations, Geophysical Research Letters, 26, 759-762, 1999], an attempt was made to remove these potential non-climatic influences. This was done by subtracting the anomalous pattern of growth that emerges over the past couple of centuries in these chronologies relative to other tree-ring chronologies that otherwise exhibit very similar patterns of growth back in time, but which are unlikely to be influenced by the same non-climatic factors. More discussion of these issues (and references to relevant past work) can be found in the paper. -Mike]

    Isn’t it a bit wrong to use the word ‘independent’ [above]?
    If the various reconstructions use a common core of proxies, they wouldn’t seem to be statistically independent to me, not that I am a statistician.

    [Response: No, it’s an appropriate description. Several of the reconstructions that have been performed are based on entirely independent proxy data and entirely independent methodologies. Other reconstructions use a small number of common series but an independent methodology. None of the reconstructions uses largely the same dataset or precisely the same methodology. -Mike]

  4.
    Neil Craig says:

    The Hockeystick theory & indeed part 4 of your defined consensus, that things are so bad that serious action is required, requires that there was not a period (the medieval warming) when average temperatures were several degrees warmer than now.

    On the other hand many students of history have said that there was & pointed out that there were extensive Norse settlements in Greenland which died out apparently because of worsening climate. What convinces you that this period was illusory?

    [Response: Sigh…. Try here for what the hockey stick implies (and doesn’t), here and here for more about the medieval warm period, and Jared Diamond for information about the Norse settlements. – gavin]

  5. 5