In a recent paper, McKitrick and Michaels (2004, or “MM04”) argue that non-climatic factors such as economic activity may contaminate climate station data, and thus may invalidate any estimates of surface temperature trends derived from these data. They propose that surface temperature trends may be linked to various local economic factors, such as national coal consumption, income per capita, GDP growth rate, literacy rates, and whether or not temperature stations were located within the former Soviet Union. If their conclusions were correct, they would have implications for the reliability of the modern surface temperature record, an important piece of evidence for 20th century surface warming. However, numerous flaws in their analysis, some of them absolutely fundamental, render their conclusions invalid.
First of all, there are a number of issues that they did not address, but that logically must be addressed for their conclusions to be tenable. MM04 failed to acknowledge other independent data supporting the thermometer-based land surface temperature observations, such as satellite-derived temperature trend estimates over Northern Hemisphere land areas (Intergovernmental Panel on Climate Change, Third Assessment Report, Chapter 2, Box 2.1, p. 106), which cannot conceivably be subject to the non-climatic sources of bias they consider. Furthermore, they fail to reconcile their hypothesis with the established large-scale warming evident in global sea surface temperature data, which, again, cannot be influenced by the local, non-climatic factors they argue contaminate the evidence for surface warming. By focusing on thermometer-based land observations only, and ignoring other evidence that conflicts with their hypothesis, MM04 failed to address basic flaws in their argument.
Perhaps even more troubling, it has been noted elsewhere that MM04 confused “degrees” and “radians” in their calculations of areal weighting factors, rendering all of their calculations incorrect, and their conclusions presumably entirely invalid.
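To see why such a mix-up is so damaging, consider the usual area weight for a latitude-longitude grid cell, which is proportional to the cosine of its latitude. The sketch below is a generic, hypothetical illustration (the function names are invented, and we make no claim about how the error was coded in MM04): most math libraries expect angles in radians, so a latitude given in degrees must be converted before taking the cosine.

```python
import math

def cos_lat_weight(lat_deg):
    """Area weight for a grid cell centred at latitude lat_deg (degrees).
    math.cos expects radians, so the latitude is converted first."""
    return math.cos(math.radians(lat_deg))

def cos_lat_weight_mixed_units(lat_deg):
    """The mix-up: a latitude in degrees passed straight to math.cos,
    which silently interprets it as an angle in radians."""
    return math.cos(lat_deg)

for lat in (0, 30, 60):
    print(lat, round(cos_lat_weight(lat), 3), round(cos_lat_weight_mixed_units(lat), 3))
```

At 60° the correct weight is 0.5, whereas the mixed-units version returns about -0.95, i.e. a negative "area". Because no error is raised, every weighted quantity downstream is silently wrong.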
The focus of this piece, however, is on yet another fundamental problem with their analysis, identified by Benestad (2004). Benestad (2004) repeated their analysis on the same data set using different statistical models (linear and generalised multiple regression). Benestad (2004) first reproduced the basic results of MM04 (i.e., obtained similar coefficients for the various factors used by MM04) using the full data set. This established an appropriate baseline for further tests of the robustness of their statistical model. As described below, their statistical model failed these tests dramatically.
For one thing, the statistical significance they cited for their results was vastly overstated. One of the most basic assumptions in standard regression analysis is that the data (strictly, the model errors) are independent and identically distributed (‘IID’). It is well known, however, that temperatures from neighboring stations are not independent: because surface temperature variations have a large spatial scale, nearby measurements partly describe the same phenomenon. Any statistical analysis using such temperature data must therefore account for the fact that the effective number of degrees of freedom is far lower than the nominal number of stations (see e.g. Wilks, 1995). McKitrick and Michaels, however, failed to account for this in estimating the statistical significance of their results. Had they accounted for this “spatial correlation”, as Benestad (2004) points out, they would have found their results to be statistically insignificant.
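A rough way to see how much spatial correlation matters is the variance-inflation (design-effect) approximation for equally correlated samples, n_eff = n / (1 + (n - 1)ρ), where ρ is the mean correlation between stations. This is an assumption-laden sketch: real inter-station correlations decay with distance rather than being constant, Benestad (2004) did not necessarily use this formula, and the station count, ρ and correlation r below are invented for illustration only.

```python
import math

def effective_sample_size(n, mean_corr):
    """Effective number of independent stations under a crude equicorrelation
    assumption: every pair of stations shares the correlation mean_corr."""
    return n / (1.0 + (n - 1) * mean_corr)

def correlation_p_value(r, n):
    """Two-sided p-value for a sample correlation r based on n samples,
    using the usual t-statistic with a normal approximation to its tail."""
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

# Illustrative numbers only (not taken from MM04):
n, mean_corr, r = 200, 0.05, 0.3
n_eff = effective_sample_size(n, mean_corr)
print(f"n_eff = {n_eff:.1f}")
print(f"p (nominal n)   = {correlation_p_value(r, n):.1e}")
print(f"p (effective n) = {correlation_p_value(r, n_eff):.2f}")
```

With these made-up numbers a modest mean correlation of 0.05 shrinks 200 stations to about 18 effectively independent ones, and the same r goes from p of order 1e-5 to p of roughly 0.2. Since the normal approximation understates the tails of the t-distribution at small sample sizes, an exact t-test with n_eff - 2 degrees of freedom would give a somewhat larger p-value still.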
Benestad (2004) then tested the skill of the model through a ‘validation’ experiment. Such an experiment constructs a statistical model using part of the dataset, and then independently tests the model’s validity by seeing how well it predicts the rest of the data, which were not used in the fit. Benestad (2004) thus divided the data into two independent batches. Temperature station data between 75.5S and 35.2N were used to calibrate the statistical model, while the remaining data (stations north of 35.2N, representing somewhat less than 25% of the Earth’s surface) were used to validate it. The model clearly fails to reproduce the trends in the independent data (see Figure 1). The conclusion of McKitrick and Michaels that surface temperature measurements are significantly influenced by the non-climatic factors in their statistical model therefore appears to be false.
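A minimal sketch of such a split-sample test follows, under strong simplifying assumptions: a single predictor rather than MM04's several, ordinary least squares, the latitude split at 35.2N described above, and invented toy numbers. The function names are hypothetical, and this is not the code used in Benestad (2004).

```python
import math
from statistics import mean

def fit_ols(x, y):
    """Ordinary least-squares fit of y = a + b*x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def split_sample_validation(lat, predictor, trend, split_lat=35.2):
    """Calibrate on cells at or south of split_lat, then predict the
    trends of the cells north of it, which the fit never saw."""
    cal = [i for i, phi in enumerate(lat) if phi <= split_lat]
    val = [i for i, phi in enumerate(lat) if phi > split_lat]
    a, b = fit_ols([predictor[i] for i in cal], [trend[i] for i in cal])
    predicted = [a + b * predictor[i] for i in val]
    observed = [trend[i] for i in val]
    rmse = math.sqrt(mean((p - o) ** 2 for p, o in zip(predicted, observed)))
    return (a, b), predicted, observed, rmse

# Toy data (invented): in the calibration region the trend rises with the
# predictor, but in the validation region it does not depend on it at all.
lat = [-40, -10, 10, 30, 40, 50, 60]
predictor = [0, 1, 2, 3, 1, 2, 3]
trend = [0.10, 0.15, 0.20, 0.25, 0.30, 0.30, 0.30]  # deg C per decade
(a, b), predicted, observed, rmse = split_sample_validation(lat, predictor, trend)
print(f"a={a:.3f}, b={b:.3f}, validation RMSE={rmse:.3f}")
```

Here the calibration fit is perfect (b = 0.05), yet the validation error is about 0.108 deg C per decade, as large as the trends themselves. A good in-sample fit therefore says little about whether the relationship holds elsewhere, which is exactly what the independent test probes.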
In their reply to Benestad (2004), McKitrick and Michaels (2004b, or “MM04b”) argue that such validation experiments (i.e., splitting up the data to test the validity of a statistical model) are not common in the refereed climatological literature. This argument is puzzling, as such tests are standard in statistical modeling, and have been used and documented in many peer-reviewed articles in the meteorological and climatological literature (see this list of publications by a single researcher, or even the introductory textbook by Wilks, 1995).
MM04b also complain that Benestad (2004) calibrated the statistical model with the ‘worst’ data, and that the ‘better’ data, covering less than 25% of the Earth’s surface, should have been used instead. This too is puzzling: as we understand the very premise of their hypothesis, any deterioration in data quality should in principle be accounted for within the statistical model itself, through predictors such as literacy or GDP.
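The ‘less than 25%’ figure can be checked with the standard result that the fraction of a sphere's surface between latitudes φ1 and φ2 is (sin φ2 - sin φ1)/2. The sketch below, with a hypothetical helper name, applies it to the latitude split used in Benestad (2004); it measures area only, not how many stations or grid cells fall in each region.

```python
import math

def area_fraction(lat_south_deg, lat_north_deg):
    """Fraction of a sphere's surface lying between two latitudes (degrees).
    The area of a latitude band is proportional to the difference of sines."""
    s = math.sin(math.radians(lat_south_deg))
    n = math.sin(math.radians(lat_north_deg))
    return (n - s) / 2.0

north = area_fraction(35.2, 90.0)          # validation region
calibration = area_fraction(-75.5, 35.2)   # calibration region
print(f"north of 35.2N: {north:.1%}, 75.5S to 35.2N: {calibration:.1%}")
```

This gives roughly 21% of the Earth's surface north of 35.2N and roughly 77% between 75.5S and 35.2N, with the remaining 1.6% or so lying south of 75.5S.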
In their reply to Benestad (2004), McKitrick and Michaels (2004b) also claim that I do not dispute their approach (i.e., multivariate regression using economic variables as potential predictors of surface temperature). That claim is both peculiar and beside the point. A method is only valid when it is applied correctly, and, as described above, MM04 failed egregiously in this regard. The purpose of my paper was simply to demonstrate that, whether or not one accepts the merits of their approach, a correct and more careful repetition of their analysis is by itself sufficient to falsify their results and conclusions.
The conclusions of McKitrick and Michaels (2004) thus clearly do not stand up to independent scrutiny. This alone does not mean that their analysis could not have been a useful contribution to the field: a critical analysis of past work by other researchers can provide independent quality control on scientific undertakings, provided that the analysis is performed properly. Unfortunately, in the case of the McKitrick and Michaels (2004) analysis, this does not appear to have been the case.
FIGURE 1. Results of regression analyses with different models, using half the data for calibration and half for prediction. The blue dots represent the calibration data. The red, green and black symbols (circles, crosses and triangles) show the values predicted for the independent data by different model configurations (red corresponds to McKitrick and Michaels’ analysis); if the model were valid, they would follow the grey dots, which are the actual trend data that the model tries to predict. The y-axis shows trends in °C per decade. After Benestad (2004).
Benestad, R.E. (2004). Are temperature trends affected by economic activity? Comment on McKitrick & Michaels. Climate Research, 27: 171–173.
McKitrick, R., and Michaels, P.J. (2004). A test of corrections for extraneous signals in gridded surface temperature data. Climate Research, 26: 159–173.
McKitrick, R., and Michaels, P.J. (2004b). Are temperature trends affected by economic activity? Reply to Benestad (2004). Climate Research, 27: 175–176.
Wilks, D. S. (1995). Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, New York, 467 pp.