# Dummies guide to the latest “Hockey Stick” controversy

by Gavin Schmidt and Caspar Ammann

Due to popular demand, we have put together a ‘dummies guide’ which tries to describe what the actual issues are in the latest controversy, in language even our parents might understand. A pdf version is also available. More technical descriptions of the issues can be seen here and here.

This guide is in two parts: the first deals with the background to the technical issues raised by McIntyre and McKitrick (2005) (MM05), while the second discusses how these issues apply to the original Mann, Bradley and Hughes (1998) (MBH98) reconstruction. The wider climate science context is discussed here, and the relationship to other recent reconstructions (the ‘Hockey Team’) can be seen here.

NB. All the data that were used in MBH98 are freely available for download at ftp://holocene.evsc.virginia.edu/pub/sdr/temp/nature/MANNETAL98/ (and also as supplementary data at *Nature*) along with a thorough description of the algorithm.

Part I: Technical issues

1) What is principal component analysis (PCA)?

This is a mathematical technique that is used (among other things) to summarize the data found in a large number of noisy records so that the essential aspects can more easily be seen. The most common patterns in the data are captured in a number of ‘principal components’, each of which describes some percentage of the variation in the original records. Usually only a limited number of components (‘PCs’) have any statistical significance, and these can be used instead of the larger data set to give basically the same description.
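To make this concrete, here is a minimal Python sketch (not the code used in any of the studies discussed here). It builds 20 hypothetical noisy records that share one common oscillation, computes their covariance matrix, and extracts the leading PCs by power iteration with deflation, reporting the fraction of the total variance each one explains. The record count, noise level and function names are illustrative assumptions, and a real analysis would use a linear algebra library (an SVD or eigen-solver) rather than this hand-rolled version.

```python
import math
import random


def centered(records):
    """Remove each record's own mean (conventional full-period centering)."""
    out = []
    for r in records:
        mean = sum(r) / len(r)
        out.append([x - mean for x in r])
    return out


def covariance(records):
    """Covariance matrix between records (one row/column per record)."""
    n = len(records[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in records]
            for ri in records]


def leading_eigenpair(mat, iters=300):
    """Largest eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    k = len(mat)
    start = random.Random(0)
    v = [start.gauss(0, 1) for _ in range(k)]  # arbitrary (seeded) starting vector
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v.(Mv) estimates the eigenvalue for a unit vector v.
    mv = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv)), v


def explained_variance(records, n_pcs):
    """Fraction of the total variance captured by each of the first n_pcs PCs."""
    cov = covariance(centered(records))
    total = sum(cov[i][i] for i in range(len(cov)))  # trace = sum of all eigenvalues
    fractions = []
    for _ in range(n_pcs):
        lam, v = leading_eigenpair(cov)
        fractions.append(lam / total)
        # Deflation: subtract lam * v v^T so the next pass finds the next PC.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(len(v))]
               for i in range(len(v))]
    return fractions


# Hypothetical data: 20 noisy records that all share one common oscillation.
rng = random.Random(42)
signal = [math.sin(2 * math.pi * t / 25) for t in range(100)]
records = [[s + rng.gauss(0, 0.5) for s in signal] for _ in range(20)]

fractions = explained_variance(records, 3)
print("variance explained by PC1-PC3:", [round(f, 3) for f in fractions])
```

Because the records share one signal, PC1 should capture most of the variance, while the remaining PCs pick up only noise.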

2) What do these individual components represent?

Often the first few components represent something recognisable and physically meaningful (at least in climate data applications). If a large part of the data set has a trend, then the mean trend may show up as one of the most important PCs. Similarly, if there is a seasonal cycle in the data, that will generally be represented by a PC. However, remember that PCs are just mathematical constructs. By themselves they say nothing about the physics of the situation. Thus, in many circumstances, physically meaningful timeseries are ‘distributed’ over a number of PCs, each of which individually does not appear to mean much. Different methodologies or conventions can make a big difference in which pattern comes out on top. If the aim of the PCA is to determine the most important pattern, then it is important to know how robust that pattern is to the methodology. However, if the idea is simply to summarize the larger data set, the ordering of the individual PCs is less important, and it is more crucial to make sure that as many of the significant PCs as possible are included.
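As a hedged illustration of how a convention can change the outcome, the sketch below varies one choice, picked here purely for illustration: the period over which each record's mean is removed before the PCA. The data (two records ending in an upward ramp plus eight pure-noise records) and the helper names are hypothetical, and only the share of variance in PC1 is compared.

```python
import math
import random


def center_on(records, start, end):
    """Subtract from each record the mean of its own values in the window [start, end)."""
    out = []
    for r in records:
        ref = sum(r[start:end]) / (end - start)
        out.append([x - ref for x in r])
    return out


def second_moments(records):
    """Cross-product matrix of the centered records divided by n - 1; this is the
    covariance matrix when each record is centered on its full-period mean."""
    n = len(records[0])
    return [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in records]
            for ri in records]


def pc1_fraction(records, iters=300):
    """Share of the total (co)variance captured by the first PC, via power iteration."""
    mat = second_moments(records)
    k = len(mat)
    v = [1.0] * k  # simple starting vector; assumed not orthogonal to PC1
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(mat[i][j] * v[j] for j in range(k)) for i in range(k)]
    leading = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return leading / sum(mat[i][i] for i in range(k))


# Hypothetical data: 2 of 10 records end in an upward ramp; the rest are pure noise.
rng = random.Random(1)
n = 100
ramp = [0.0 if t < 80 else 0.15 * (t - 80) for t in range(n)]
records = [[x + rng.gauss(0, 1) for x in ramp] for _ in range(2)]
records += [[rng.gauss(0, 1) for _ in range(n)] for _ in range(8)]

full = pc1_fraction(center_on(records, 0, n))   # mean removed over all 100 points
late = pc1_fraction(center_on(records, 80, n))  # mean removed over the last 20 only
print(f"PC1 share: full-period centering {full:.2f}, late-period centering {late:.2f}")
```

Removing a mean taken over only part of a record adds the squared difference between that partial mean and the full-period mean to the record's apparent variance, so records that depart most from their long-term mean during the chosen window gain weight in the leading PC.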

3) How do you know whether a PC has significant information?

This determination is usually based on a ‘Monte Carlo’ simulation (so-called because of the random nature of the calculations). For instance, if you take 1000 sets of random data (that have the same statistical properties as the data set in question), and you perform the PCA analysis 1000 times, there will be 1000 examples of the first PC. Each of these will explain a different amount of the variation (or variance) in the original data. When ranked in order of explained variance, the tenth one down then defines the 99% confidence level: i.e. if your real PC explains more of the variance than 99% of the random PCs, then you can say that this is significant at the 99% level. This can be done for each PC in turn. (This technique was introduced by Preisendorfer et al. (1981), and is called the Preisendorfer N-rule).
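The Monte Carlo recipe above can be sketched in a few lines of Python. This is a simplified version under stated assumptions: the random data sets are independent white noise that match only each real record's length and standard deviation (real applications often use red noise with the same autocorrelation), PCs are computed by power iteration, and only PC1 is tested; the same comparison at rank k would test the k-th PC. The data and function names are hypothetical.

```python
import math
import random


def pc1_fraction(records, iters=100):
    """Fraction of total variance explained by the first PC (full-period centering)."""
    centered = []
    for r in records:
        mean = sum(r) / len(r)
        centered.append([x - mean for x in r])
    n, k = len(records[0]), len(records)
    cov = [[sum(a * b for a, b in zip(ri, rj)) / (n - 1) for rj in centered]
           for ri in centered]
    v = [1.0] * k
    for _ in range(iters):  # power iteration for the leading eigenvector
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
    return sum(a * b for a, b in zip(v, mv)) / sum(cov[i][i] for i in range(k))


def rule_n_threshold(records, n_trials=1000, level=0.99, seed=0):
    """Monte Carlo threshold for PC1: the value exceeded by only (1 - level) of random PC1s.

    Assumption: 'same statistical properties' is reduced here to independent white
    noise with each real record's length and standard deviation.
    """
    rng = random.Random(seed)
    n = len(records[0])
    sds = []
    for r in records:
        mean = sum(r) / n
        sds.append(math.sqrt(sum((x - mean) ** 2 for x in r) / (n - 1)))
    random_fracs = []
    for _ in range(n_trials):
        fake = [[rng.gauss(0, sd) for _ in range(n)] for sd in sds]
        random_fracs.append(pc1_fraction(fake))
    random_fracs.sort(reverse=True)       # largest explained variance first
    rank = round(n_trials * (1 - level))  # 10 when n_trials=1000 and level=0.99
    return random_fracs[rank - 1]         # "the tenth one down"


# Hypothetical "real" data: 6 records of 60 points sharing a common signal.
rng = random.Random(7)
signal = [math.sin(2 * math.pi * t / 20) for t in range(60)]
records = [[s + rng.gauss(0, 0.5) for s in signal] for _ in range(6)]

real_fraction = pc1_fraction(records)
threshold = rule_n_threshold(records)
print(f"real PC1: {real_fraction:.2f}, 99% Monte Carlo level: {threshold:.2f}")
print("significant at 99%" if real_fraction > threshold else "not significant")
```

With 1000 trials and a 99% level, the threshold is the tenth-largest random PC1 share, matching the ranking described above.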
