Responses to McShane and Wyner

On that specific issue, presumably just an oversight, MW apparently used the “Start Year” column in the M08 spreadsheet instead of the “Start Year (for recon)” column. The two columns differ because many tree ring reconstructions have only a small number of trees in their earliest periods, which greatly inflates their uncertainty (and therefore reduces their utility). To reduce the impact of this problem, M08 only used tree ring records when they had at least 8 individual trees, which left 59 series in the 1000 AD frozen network. The fact that there were only 59 series in the AD 1000 network of M08 was stated clearly in the paper, and the criterion regarding the minimal number of trees (8) was described in the Supplementary Information. The difference in results between the correct M08 network and the spurious 95-record network that MW *actually* used is unfortunately quite significant. Using the correct data substantially reduces the estimates of peak medieval warmth shown by MW (as well as reducing the apparent spread among the reconstructions). This is even more true when the frequently challenged “Tiljander” series are removed, leaving a network of 55 series. In their rebuttal, MW claim that the M08 quality control is simply an ‘ad hoc’ filtering and deny that they made a mistake at all. This is not really credible, and it would have done them much credit to simply accept this criticism.
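
To see how the choice of column changes the size of a network, here is a minimal Python sketch. The records, their names and their years are invented for illustration and are not taken from the M08 spreadsheet; only the distinction between the two start-year columns comes from the text above.

```python
# Hypothetical records: (name, start_year, start_year_for_recon).
# For tree-ring series the second value is later, standing in for the
# first year with at least 8 trees; for other proxies both agree.
records = [
    ("tree_a", 850, 1120),
    ("tree_b", 900, 980),
    ("lake_c", 700, 700),
    ("tree_d", 1000, 1250),
]

COLUMN_INDEX = {"start": 1, "start_for_recon": 2}


def network(records, target_year, column):
    """Return the names of records that already exist at target_year,
    judged by the chosen start-year column."""
    idx = COLUMN_INDEX[column]
    return [rec[0] for rec in records if rec[idx] <= target_year]


# Selecting on the raw start year admits poorly replicated series.
loose = network(records, 1000, "start")
# Selecting on the reconstruction start year keeps only well replicated ones.
strict = network(records, 1000, "start_for_recon")

print(len(loose), "records using the raw start year")
print(len(strict), "records using the start year for reconstruction")
```

In this toy case the raw column admits all four records while the reconstruction column admits two, which is the same kind of gap as the 95 versus 59 series discussed above.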

With just this correction, applying MW’s *own procedures* yields strong conclusions regarding how anomalous recent warmth is in the longer-term context. MW found recent warmth to be unusual in a long-term context: they estimated an 80% likelihood that the decade 1997-2006 was warmer than any other for at least the past 1000 years. Using the more appropriate 55-proxy dataset with the same estimation procedure (which involved retaining K=10 PCs of the proxy data) yields a higher probability of 84% that recent decadal warmth is unprecedented for the past millennium.
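
A probability of this kind can be read as the share of reconstruction draws in which the final decade is warmer than every earlier decade. The Python sketch below illustrates that counting step only. It is not MW's code: the function names, the use of non-overlapping decades and the toy draws are all assumptions made for illustration.

```python
def decadal_means(series, decade=10):
    """Average a yearly series into non-overlapping decades,
    aligned so that the final decade ends at the last year."""
    n = len(series) // decade
    if n < 2:
        raise ValueError("need at least two full decades")
    trimmed = series[len(series) - n * decade:]
    return [sum(trimmed[i * decade:(i + 1) * decade]) / decade
            for i in range(n)]


def prob_last_decade_warmest(draws, decade=10):
    """Fraction of draws whose final decade beats all earlier decades."""
    hits = 0
    for series in draws:
        means = decadal_means(series, decade)
        if means[-1] > max(means[:-1]):
            hits += 1
    return hits / len(draws)


# Two toy 30-year "draws": one ends warm, the other starts warm.
toy_draws = [
    [0.0] * 20 + [1.0] * 10,
    [1.0] * 10 + [0.0] * 20,
]
print(prob_last_decade_warmest(toy_draws))
```

With a real ensemble, each draw would be one sampled temperature history from the fitted model, and the returned fraction would play the role of the 80% or 84% figures quoted above.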

However, K=10 principal components is almost certainly too large, and the resulting reconstruction likely suffers from statistical over-fitting. Objective selection criteria applied to the M08 AD 1000 proxy network, as well as independent “pseudoproxy” analyses (discussed below), favor retaining only K=4 PCs. (Note that MW correctly point out that SMR made an error in calculating this, but correct application of the Wilks (2006) method fortunately does not change the result: 4 PCs should be retained in each case.) Nonetheless, this choice yields a very close match with the relevant M08 reconstruction. It also yields considerably higher probabilities, up to 99%, that recent decadal warmth is unprecedented for at least the past millennium. These posterior probabilities imply substantially higher confidence than the “likely” assessment by M08 and IPCC (2007) (a 67% level of confidence). Indeed, a probability of 99% not only exceeds the IPCC “very likely” threshold (90%), but reaches the “virtually certain” (99%) threshold. In this sense, the MW analysis, using the proper proxy data and proper methodological choices, yields inferences regarding the unusual nature of recent warmth that are even *more confident* than expressed in past work.

An important real issue is whether proxy data provide more information than naive models (such as the mean of the calibrating data) or outperform random noise of various types. This has been addressed in many previous studies, which have come to very different conclusions than MW, and so the reasons why MW came to their conclusion are worth investigating. Two factors appear to be important: their exclusive use of the “Lasso” method to assess this, and the use of short holdout periods (30 years) for both extrapolated and interpolated validation periods.
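
The holdout design can be illustrated with a short Python sketch. It assumes sliding 30-year blocks over a 149-year calibration record, treats a block as "extrapolated" when it touches either end of the record and "interpolated" otherwise, and scores only the naive benchmark (the mean of the remaining calibration years). The record length, the linear toy temperatures and all function names are assumptions for illustration, not MW's actual setup.

```python
def holdout_blocks(n_years, block=30):
    """Yield (start, end, kind) for each contiguous holdout block."""
    for start in range(n_years - block + 1):
        end = start + block
        at_edge = start == 0 or end == n_years
        yield start, end, "extrapolated" if at_edge else "interpolated"


def mean_baseline_rmse(temps, start, end):
    """RMSE over the held-out block when predicting the mean of the
    remaining (training) years."""
    train = temps[:start] + temps[end:]
    mean = sum(train) / len(train)
    test = temps[start:end]
    return (sum((t - mean) ** 2 for t in test) / len(test)) ** 0.5


# Toy calibration record: a steady warming trend of 0.01 per year.
temps = [0.01 * year for year in range(149)]
blocks = list(holdout_blocks(len(temps)))

rmse_edge = mean_baseline_rmse(temps, *blocks[0][:2])   # first block
rmse_mid = mean_baseline_rmse(temps, 60, 90)            # central block

print(len(blocks), "blocks")
print("edge block RMSE:", round(rmse_edge, 3))
print("middle block RMSE:", round(rmse_mid, 3))
```

On a trending record the naive mean scores far better on interior blocks than on blocks at the ends, so a benchmark can look strong in interpolation without having any real skill.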

So how do you assess how good a method is? This is addressed in almost half of the discussion papers. Tingley in particular gives strong evidence that Lasso is not in fact a very suitable method, and that it is outperformed by his Composite Regression method in test cases. Kaplan points out that noise with significant long-term trends will also perform well in interpolation. Both Smith and the paper by Craigmile and Rajaratnam also address this point.

