Are Temperature Trends Affected by Economic Activity?

For one thing, the statistical significance they cited for their results was vastly overstated. One of the most basic assumptions in statistical modeling is that the data used as predictors in the model are Independent and Identically Distributed (‘IID’). It is well known, however, that temperatures from neighboring stations are not independent. Due to the large-scale structure of surface temperature variations, nearby measurements partly describe the same phenomenon. Any statistical analysis using such temperature data must account for the fact that the effective number of degrees of freedom in the data is far lower than the nominal number of stations (see e.g. Wilks, 1995). McKitrick and Michaels, however, failed to account for this issue in estimating the statistical significance of their results. Had they accounted for this “spatial correlation”, as Benestad (2004) points out, they would have found their results to be statistically insignificant.
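To illustrate the point, here is a small, self-contained Monte Carlo sketch using purely synthetic data (it is not the MM04 or Benestad (2004) calculation; the station count, number of spatial modes, and noise level are assumptions made for the illustration). Both the “trend” field and the “economic” covariate are built from a handful of large-scale spatial modes, so they are generated completely independently of one another, yet a regression that treats every station as an independent observation rejects the null hypothesis far more often than the nominal 5% level.

```python
# Hedged illustration with synthetic data (not the MM04 or Benestad code):
# both fields below are built from a few large-scale spatial modes, so the
# ~200 "stations" carry only a few effective degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_stations, n_modes, n_trials = 200, 3, 2000
s = np.linspace(0.0, 1.0, n_stations)   # station coordinate (e.g. rescaled latitude)

def smooth_field():
    """A spatially coherent field: a few large-scale modes plus weak local noise."""
    amps = rng.normal(size=n_modes)
    field = sum(a * np.sin((k + 1) * np.pi * s) for k, a in enumerate(amps))
    return field + 0.1 * rng.normal(size=n_stations)

false_positives = 0
for _ in range(n_trials):
    trend = smooth_field()   # stand-in for station temperature trends
    econ = smooth_field()    # stand-in for an economic covariate, generated independently
    p_naive = stats.linregress(econ, trend).pvalue
    false_positives += p_naive < 0.05

print(f"naive rejection rate at the 5% level: {false_positives / n_trials:.2f}")
# With only a few effective spatial degrees of freedom instead of 200 independent
# stations, this rate comes out far above 0.05: the nominal significance is overstated.
```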

Benestad (2004) then tested the skill of the model through a ‘validation’ experiment. Such an experiment seeks to construct a statistical model using part of the dataset, and then independently test the model’s validity by seeing how well it predicts the rest of the data that were not used in the calibration. Benestad (2004) thus divided the data into two independent batches: temperature station data between 75.5S and 35.2N were used to calibrate the statistical model, while the remaining data (stations north of 35.2N, representing something under 25% of the earth’s surface) were used to validate the model. It is clear that the model was not able to reproduce the trends in the independent data (see Figure 1). The conclusion of McKitrick and Michaels that surface temperature measurements are significantly influenced by the non-climatic factors used in their statistical model therefore appears to be false.
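For concreteness, the sketch below shows the shape of such a split-sample validation. The arrays, covariate columns and random numbers are placeholders, not the actual station data or economic variables used by MM04 or Benestad (2004): an ordinary least-squares regression is calibrated on the stations south of 35.2N, used to predict the trends at the withheld northern stations, and then scored against the observed trends there.

```python
# Hedged sketch of split-sample validation; the data below are placeholders,
# not the station trends or covariates used by MM04 or Benestad (2004).
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coefs

def predict(coefs, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coefs

rng = np.random.default_rng(1)
n = 300
lat = rng.uniform(-75.5, 80.0, size=n)       # station latitudes (placeholder)
X = rng.normal(size=(n, 3))                  # placeholder non-climatic covariates
trend = 0.15 + 0.05 * rng.normal(size=n)     # placeholder observed trends (deg C/decade)

# Calibrate on stations between 75.5S and 35.2N; validate on stations north of 35.2N.
calib = lat <= 35.2
coefs = fit_ols(X[calib], trend[calib])
pred = predict(coefs, X[~calib])

# Skill on the withheld stations: root-mean-square error and correlation between
# predicted and observed trends. In Benestad (2004) the predictions failed to
# reproduce the independent trends (see Figure 1).
rmse = np.sqrt(np.mean((pred - trend[~calib]) ** 2))
corr = np.corrcoef(pred, trend[~calib])[0, 1]
print(f"validation RMSE = {rmse:.3f} deg C/decade, correlation = {corr:.2f}")
```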

In their reply to Benestad (2004), McKitrick and Michaels (2004b, or “MM04b”) argue that such validation experiments (i.e., splitting up the data to test the validity of a statistical model) are not common in the refereed climatological literature. That argument is puzzling indeed, as such tests are standard in statistical modeling exercises, and have been used and documented in many peer-reviewed articles in the meteorological and climatological literature (see this list of publications by just one researcher alone, or even the introductory textbook by Wilks, 1995).

MM04b also complain that in Benestad (2004) the statistical model was calibrated with the ‘worst’ data (and that the ‘better’ data, covering less than 25% of the earth’s surface, should have been used instead). This too is puzzling, since any hypothesised deterioration of data quality should in principle, as we understand the very premise of their hypothesis, be taken into account in the statistical model through the use of factors such as literacy or GDP.

In their reply to Benestad (2004), McKitrick and Michaels (2004b) also claim that I do not dispute their approach (i.e., multivariate regression using economic variables as potential predictors of surface temperature). That claim is peculiar, and it misses the point: a method is only valid when applied correctly, and, as described above, MM04 failed egregiously in this regard. The purpose of my paper was simply to demonstrate that, whether or not one accepts the merits of their approach, a correct and more careful repetition of their analysis is alone sufficient to falsify their results and their conclusions.

The conclusions of McKitrick and Michaels (2004) thus clearly do not stand up to independent scrutiny. That alone does not mean that such an analysis could not have been a useful contribution to the field: a critical analysis of past work by other researchers can provide independent quality control on scientific undertakings, provided the analysis is performed properly. Unfortunately, that does not appear to have been so in the case of McKitrick and Michaels (2004).

FIGURE 1. Results of regression analyses with different models, using half the data for calibration and half for prediction. The blue dots represent the calibration interval. The red, green and black symbols (circles, crosses and triangles) show the values predicted for the independent data using different model configurations (red corresponds to McKitrick and Michaels’ analysis); if the model were valid, these would match the grey dots, which are the actual trend data the model tries to predict. The y-axis is in deg C per decade. After Benestad (2004).
