Dummies guide to the latest “Hockey Stick” controversy

The figure to the right gives two examples of this. Here each PC is plotted against the amount of fractional variance it explains. The blue line is the result from the random data, while the blue dots are the PC results for the real data. It is clear that at least the first two are significantly separated from the random noise line. In the other case, there are 5 (maybe 6) red crosses that appear to be distinguishable from the red line random noise. Note also that the first (‘most important’) PC does not always explain the same amount of the original data.

4) What do different conventions for PC analysis represent?

Some different conventions exist regarding how the original data should be normalized. For instance, the data can be normalized to have an average of zero over the whole record, or over a selected sub-interval. The variance of the data is associated with departures from the whatever mean was selected. So the pattern of data that shows the biggest departure from the mean will dominate the calculated PCs. If there is an a priori reason to be interested in departures from a particular mean, then this is a way to make sure that those patterns move up in the PC ordering. Changing conventions means that the explained variance of each PC can be different, the ordering can be different, and the number of significant PCs can be different.

5) How can you tell whether you have included enough PCs?

This is rather easy to tell. If your answer depends on the number of PCs included, then you haven’t included enough. Put another way, if the answer you get is the same as if you had used all the data without doing any PC analysis at all, then you are probably ok. However, the reason why the PC summaries are used in the first place in paleo-reconstructions is that using the full proxy set often runs into the danger of ‘overfitting’ during the calibration period (the time period when the proxy data are trained to match the instrumental record). This can lead to a decrease in predictive skill outside of that window, which is the actual target of the reconstruction. So in summary, PC selection is a trade off: on one hand, the goal is to capture as much variability of the data as represented by the different PCs as possible (particularly if the explained variance is small), while on the other hand, you don’t want to include PCs that are not really contributing any more significant information.

Part II: Application to the MBH98 ‘Hockey Stick’

1) Where is PCA used in the MBH methodology?

When incorporating many tree ring networks into the multi-proxy framework, it is easier to use a few leading PCs rather than 70 or so individual tree ring chronologies from a particular region. The trees are often very closely located and so it makes sense to summarize the general information they all contain in relation to the large-scale patterns of variability. The relevant signal for the climate reconstruction is the signal that the trees have in common, not each individual series. In MBH98, the North American tree ring series were treated like this. There are a number of other places in the overall methodology where some form of PCA was used, but they are not relevant to this particular controversy.

2) What is the point of contention in MM05?

MM05 contend that the particular PC convention used in MBH98 in dealing with the N. American tree rings selects for the ‘hockey stick’ shape and that the final reconstruction result is simply an artifact of this convention.

3) What convention was used in MBH98?

MBH98 were particularly interested in whether the tree ring data showed significant differences from the 20th century calibration period, and therefore normalized the data so that the mean over this period was zero. As discussed above, this will emphasize records that have the biggest differences from that period (either positive of negative). Since the underlying data have a ‘hockey stick’-like shape, it is therefore not surprising that the most important PC found using this convention resembles the ‘hockey stick’. There are actual two significant PCs found using this convention, and both were incorporated into the full reconstruction.

PC1 vs PC4

4) Does using a different convention change the answer?

Page 2 of 4 | Previous page | Next page