RealClimate

Antarctic warming is robust

Filed under: — gavin @ 4 February 2009

The difference between a single calculation and a solid paper in the technical literature is vast. A good paper examines a question from multiple angles and finds ways to assess the robustness of its conclusions to all sorts of possible sources of error: in input data, in assumptions, and even occasionally in programming. If a conclusion is robust over as much of this as can be tested (and good peer reviewers generally insist that this be shown), then the paper is likely to stand the test of time. Although science proceeds by making use of the work that others have done before, it is not based on the assumption that everything that went before is correct. It is precisely because there is always the possibility of errors that so much is based on mutually reinforcing ‘balance of evidence’ arguments.

So it is with the Steig et al paper published last week. Their conclusions, that West Antarctica is warming quite strongly and that even Antarctica as a whole has been warming since 1957 (the start of systematic measurements), were based on extending the long-term manned weather station data (42 stations) using two different methodologies (RegEM and PCA) to interpolate to undersampled regions, using correlations from two independent data sources (satellite AVHRR and the Automated Weather Stations (AWS)), and on validations based on subsets of the stations (15 vs 42 of them), etc. The answers in each of these cases are pretty much the same; thus the issues that undoubtedly exist (and that were raised in the paper), such as the satellite data only being valid on clear days, the spottiness of the AWS data, and the fundamental limits of the long-term manned weather station data itself, aren’t that important to the basic conclusion.

One quick point about the reconstruction methodology. These methods are designed to fill in missing data points using as much information as possible about how the existing data at that point relate to the data that exist elsewhere. To give a simple example, if one station gave readings that were always the average of two other stations when it was working, then a good estimate of the value at that station when it wasn’t working would simply be the average of the two other stations. Thus it is always the missing data points that are reconstructed; the process doesn’t affect the original input data.
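The ‘average of two other stations’ example can be made concrete with a minimal sketch. It assumes plain least squares with no intercept, applied to anomaly data, which is a deliberate simplification: RegEM itself is an iterative, regularized expectation-maximization scheme, and nothing here reproduces Steig et al.'s code. The station series and the function names `fit_two_predictors` and `infill` are hypothetical.

```python
def fit_two_predictors(y, x1, x2):
    """Least-squares fit y ~ a*x1 + b*x2 (no intercept), using only the
    time steps where the target station y was actually observed (not None)."""
    obs = [(t, u, v) for t, u, v in zip(y, x1, x2) if t is not None]
    s11 = sum(u * u for _, u, _ in obs)
    s22 = sum(v * v for _, _, v in obs)
    s12 = sum(u * v for _, u, v in obs)
    s1y = sum(u * t for t, u, _ in obs)
    s2y = sum(v * t for t, _, v in obs)
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    return a, b


def infill(y, x1, x2):
    """Fill only the missing entries of y; observed values pass through unchanged."""
    a, b = fit_two_predictors(y, x1, x2)
    return [t if t is not None else a * u + b * v for t, u, v in zip(y, x1, x2)]


# Hypothetical anomaly series (deg C): the target is the mean of x1 and x2
# whenever it was working, and None when it was not.
x1 = [-1.0, 0.5, 2.0, -0.3, 1.2, -2.1]
x2 = [0.4, -1.5, 1.0, 0.8, -0.6, 0.3]
y = [-0.3, None, 1.5, 0.25, None, -0.9]

filled = infill(y, x1, x2)
print([round(v, 3) for v in filled])  # the two gaps are estimated from x1 and x2
```

The fit recovers weights of one half for each neighbour, so the gaps are filled with the neighbours' average, while the observed values are returned untouched, which is the point made in the paragraph above.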

This paper clearly increased the scrutiny of the various Antarctic data sources, and indeed, within the week, errors were found in the records from the AWS sites ‘Harry’ (West Antarctica) and ‘Racer Rock’ (Antarctic Peninsula) stored in the SCAR READER database. (There was a coincidental typo in the listing of Harry’s location in Table S2 in the supplemental information to the paper, but a trivial examination of the online resources, or of the paper itself, in which Harry is shown in the correct location (Fig. S4b), would have indicated that this was indeed only a typo.) Those errors have now been fixed by the database managers at the British Antarctic Survey.

Naturally, people are interested in what effect these corrections will have on the analysis of the Steig et al paper. But before we get to that, we can think about some ‘Bayesian priors’. Specifically, given that the results using the satellite data (the main reconstruction and source of the Nature cover image) were very similar to those using the AWS data, it is highly unlikely that a single station revision will have much of an effect on the conclusions (and clearly none at all on the main reconstruction, which didn’t use AWS data). Additionally, the quality of the AWS data, particularly any trends, has frequently been questioned. The main issue is that since they are automatic and not manned, individual stations can be buried in snow, drift with the ice, fall over, etc. and not be immediately fixed. Thus one of the tests Steig et al. did was a variation of the AWS reconstruction that detrended the AWS data before using them; any trend in the reconstruction would then come solely from the higher-quality manned weather stations. The nature of the error in the Harry data record gave an erroneous positive trend, but this wouldn’t have affected the trend in the reconstruction based on the detrended AWS data.
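The detrending test described above can be sketched as follows. This assumes simple linear (ordinary least-squares) detrending of each AWS series, which may differ in detail from the paper's procedure; `ols_slope`, `detrend` and the synthetic series are hypothetical. It also shows how a trend per year becomes the °C/decade figure quoted below (multiply by 10).

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y against t (units of y per unit of t)."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def detrend(t, y):
    """Remove the fitted OLS line (trend and mean), returning the residuals."""
    n = len(t)
    t_mean, y_mean = sum(t) / n, sum(y) / n
    b = ols_slope(t, y)
    return [yi - (y_mean + b * (ti - t_mean)) for ti, yi in zip(t, y)]


# Hypothetical annual AWS anomalies: a spurious drift of 0.05 C/yr plus a wiggle.
years = list(range(1980, 1990))
wiggle = [0.2, -0.2] * 5
aws = [0.05 * (yr - 1980) + w for yr, w in zip(years, wiggle)]

print(f"trend before: {10 * ols_slope(years, aws):.3f} C/decade")
print(f"trend after:  {10 * ols_slope(years, detrend(years, aws)):.3f} C/decade")
```

After detrending, the series still carries its year-to-year covariability with other stations, but no trend of its own, so a spurious drift like Harry's cannot leak into the reconstructed trend.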

Given all of the above, the Bayesian prior would therefore lean towards the expectation that the data corrections will not have much effect.

The trends in the AWS reconstruction in the paper are shown above. This is for the full period 1957-2006, and the dots are scaled a little smaller than they were in the paper for clarity. The biggest dot (on the Peninsula) represents about 0.5°C/dec. The difference that you get if you use detrended data is shown next.

As anticipated, detrending the Harry data affects the reconstruction at Harry itself (the big blue dot in West Antarctica), reducing the trend there to about 0.2°C/dec, but there is no other significant effect (a couple of stations on the Antarctic Peninsula show small differences). (Note the scale change from the preceding figure: the blue dot represents a change of 0.2°C/dec.)

Now that we know that the trend (and much of the data) at Harry was in fact erroneous, it’s useful to see what happens when you don’t use Harry at all. The differences with the original results (at each of the other points) are almost undetectable. (Same scale as immediately above; if the scale in the first figure were used, you couldn’t see the dots at all!).

In summary, speculation that the erroneous trend at Harry was the basis of the Antarctic temperature trends reported by Steig et al. is completely specious, and could have been dismissed by even a cursory reading of the paper.

However, we are not yet done. There was erroneous input data used in the AWS reconstruction part of the study, and so it’s important to know what impact the corrections will have. Eric managed to do some of the preliminary tests on his way to the airport for his Antarctic sojourn and the trend results are as follows:

There is a big difference at Harry, of course (a reduction of the trend by about half), and an increase in the trend at Racer Rock (the error there had given an erroneous cooling), but the other points are pretty much unaffected. The differences in the mean trends for Antarctica, or for the West Antarctic Ice Sheet (WAIS), are very small (around 0.01°C/decade), and the resulting new reconstruction is actually in slightly better agreement with the satellite-based reconstruction than before (which is pleasing, of course).

Bayes wins again! Or should that be Laplace? ;)

Update (6/Feb/09): The corrected AWS-based reconstruction is now available. Note that the main satellite-based reconstruction is unaffected by any issues with the AWS stations since it did not use them.

375 Responses to “Antarctic warming is robust”

  1. 351
    pete best says:

    Re #350, regardless, it's all carbon use, and the military use a lot, and most people do not have the luxury of bringing in the Peace Corps or doing it themselves, travelling around the globe in planes and burning carbon to do things. It just raises the question: what is the solution?

    Do nothing – BAU scenario
    Do something but not enough – 2 ppmv down to 1 ppmv perhaps
    Do a lot, but wish upon a star if you think life will continue as it is now – avert the climate disaster people seem happy to tell everyone about and avert population decimation and mass unrest.

    But how do we avert it all? We can all shout about the science endlessly and get wound up by the deniers and skeptics but where is the solution, there is not one. Just another load of people arguing about nuclear, renewables etc whilst we continue the carbon burn and carry out our normal lives. Prosperity and progress will render this site useless.

  2. 352
    Mark says:

    “where is the solution, there is not one.”

    Yes there is.

    Don’t burn fossil fuels.

    There’s your solution.

    At the moment we have too much infrastructure that relies on fossil fuels. So change the infrastructure (in exactly the same way as you build more roads to new cities, build new houses for new families to live in, or build new power stations to produce more power, and all the infrastructure building that comes from entropy being an unstoppable feature of reality).

    At the moment we demand too much energy to easily change power sources. So use less. The average Scottish household uses 20% of the energy of the average US household. Sweden uses something less than 1/3. Neither is in a temperate area that doesn't rely much on heating or lighting to live in comfort. Neither is a second-world or third-world country with a lower standard of living. Neither is living in caves, eating tree roots and berries and whatever dead animal they find.

    There may not be ones you WANT to solve the problem with, but then again if my arm is gangrenous, I STILL don’t want it sawn off. If I could keep the arm and lose the gangrene, I would much prefer that option.

    But you can blame your parents, your grandparents and back 5-8 generations for the mess they made so that they could have a good life and hang the mess they caused. If it hadn’t been for their profligacy CO2 levels would not be so high and the infrastructure would not be so dependent on fossil fuels.

    And if YOU don’t take the hard steps, your children will have to work even harder to undo the mess YOU left behind, along with that of your ancestors back 6-9 generations.

  3. 353
    PaulM says:

    One of the flaws of this paper is the reliance on a small number of principal components. I’ve just been looking at Steig et al again: they say “peninsula warming averages 0.11 ± 0.04 C per decade”. That’s 0.55 degrees over the last 50 years. But if you look at the BAS site they say the peninsula has warmed by 2.8 C over the last 50 years, which is roughly what you can see by looking at the station data. So if Steig et al’s “reconstruction” is correct, the station data overestimate warming by about a factor of 5! One of their conclusions ought to be that the peninsula is warming much less than previously thought. But curiously this is not the conclusion they reached or the story reported in the media.
    So the “reconstruction” is clearly wrong. Where has the peninsula warming gone? Answer: it has been smeared and diluted over the continent by the inappropriate use of a small number of PCs. All of this ought to have been obvious to Eric Steig and his co-authors and the Nature referees and editors.
    [For those familiar with Fourier analysis: It is like taking a spiky function and trying to represent it with just 3 Fourier modes. The spike gets lower and broader]

    [Response: Surprisingly enough, this was realised by the authors and they discussed it: “A disadvantage of excluding higher-order terms (k > 3) is that this fails to fully capture the variance in the Antarctic Peninsula region. We accept this trade-off because the Peninsula is already the best-observed region of the Antarctic.” – gavin]
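PaulM's Fourier analogy can be illustrated with a short sketch: a single spike, reconstructed from only its mean and first three harmonics, comes back lower and broader. This is only an illustration of the analogy, under the assumption of fixed sinusoidal modes; the modes in a PCA or RegEM reconstruction are derived from the data's own covariance rather than being fixed sinusoids, so the sketch says nothing about how much smoothing the actual reconstruction does. The function names are hypothetical.

```python
import cmath


def truncated_fourier(x, n_modes):
    """Keep the mean plus the first n_modes harmonics (and their negative-frequency
    partners) of a real sequence, then transform back. Plain O(N^2) DFT."""
    N = len(x)
    X = [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) for k in range(N)]
    kept = [X[k] if k <= n_modes or k >= N - n_modes else 0 for k in range(N)]
    return [
        (sum(kept[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N).real
        for n in range(N)
    ]


def width_above_half_max(x):
    """Number of samples above half of the peak value: a crude width measure."""
    peak = max(x)
    return sum(1 for v in x if v > peak / 2)


spike = [0.0] * 64
spike[32] = 1.0
smooth = truncated_fourier(spike, n_modes=3)

print(f"peak:  {max(spike):.3f} -> {max(smooth):.3f}")
print(f"width: {width_above_half_max(spike)} -> {width_above_half_max(smooth)} samples")
```

With 7 retained coefficients out of 64, the peak drops to 7/64 of its original height and the spike spreads over roughly a dozen samples.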

  4. 354
    Ray Ladbury says:

    Pete, I share your frustration. Indeed, I think the frustration you feel is one of the reasons why denialists have such a hard time accepting the science. It looks as if the only solution would be to cut back to zero fossil fuel use immediately. However, this isn’t an option in a world of 6.5 billion people dependent on fossil fuels for everything from the food they eat to the clothes they wear. We cannot conserve ourselves out of this mess.

    At the same time, conservation is critical, since it is something we can do now that buys us time for developing more effective solutions, mitigations and models.

    FWIW, I am fully cognizant of how privileged I have been to have such experiences, and I hope I have used them to advantage–for instance indicating the shortsightedness of Tom’s trolling. To commit to a fight, it’s sometimes an advantage to have experienced some of what you are fighting for.

  5. 355
    SecularAnimist says:

    Ray Ladbury wrote: “My goal is to preserve civilization and make it sustainable, not to go back to a hunter-gatherer existence, or even a medieval feudalism.”

    Medieval feudalism refers to a political system, not a level of technological development. It is certainly conceivable to imagine a high-tech version of medieval feudalism. I believe such a scenario has been imagined in various science fiction novels. Frank Herbert’s Dune would be an example. Some, including myself, see parallels between the modern corporation and the medieval feudal state.

    The fundamental basis of any civilization is agriculture. My view is that the essential ingredient of a technologically advanced civilization is the ability to generate, distribute and use electricity. Fortunately there are plenty of ways to produce abundant electricity without carbon emissions, principally with solar, wind and hydro energy.

  6. 356
    pete best says:

    Re #352, let's forget the blame game, shall we? Let's just accept the orthodox position on the matter and devise the solution(s), but as it stands I doubt it's going to happen yet. Yes, there are some definite large-scale projects on wind, CSP, etc., but it's not strategic enough, and the post-Kyoto treaty due to start by 2012 needs to be a major step forward: 80% cuts all round by some time in the future (2050, I believe), or Hansen’s moratorium on coal by 2030, or the introduction of CCS to keep coal or postpone the moratorium until it is ready. And so it goes on and on. Talk, talk, talk so far; some action, but the cuts are small and often gobbled up in China and India, etc.

    I reckon it's peak fossil fuels before we go on an eco war.

  7. 357
    Jim Eager says:

    “Where has the peninsula warming gone?”

    Hmmmm, heat of fusion comes to mind.

  8. 358
    Mark says:


    Most of the world’s energy is gobbled up by the USA.

    A 30% reduction (which doesn’t even take them down to UK levels of use) frees up around 15% of the energy demand.

    What are the Chinese going to do with all that extra energy? Run a Stargate?

    Don’t let someone else doing bad stop you from doing good. Or you’re worse than they are.

  9. 359
    pete best says:

    #358, I doubt it works like that, and since when is the USA going to consume less energy? They are just going to try getting it from a different source at best, or lobby for the CCS revolution that will come too little too late.

    President Obama again stated that our way of life is non-negotiable, even though he wants an alleged energy revolution. So what gives, apart from the rhetoric of his speeches, I guess.

  10. 360

    Mark writes:

    Most of the world’s energy is gobbled up by the USA.

    This isn’t even close to being true. The USA’s share of world energy use is about 20%.

  11. 361
    Sekerob says:

    BPL, just let him and few more. Not worth your time.

    A few months ago China was reported to have just passed the USA in CO2 emissions, with each now sitting at about 25%. How efficient the path from fossil fuel to CO2 is I don’t know, but my feeling is that the USA might be a bit better, so I’d guesstimate a share in the 20-25% range of global energy consumption. The French are smiling ear to ear, having just signed to build 4 new nuclear plants in Italy.

    Now a question for the American brethren. Why are you running your office air conditioning at 20C and lower, when 22-24C is much better and healthier? Just imagine if the USA would just switch those energy gobblers up by 1 or 2 C.

    Dead back on mean, NSIDC daily charts are back up, currently showing about 1.2 million km square less summer sea ice extent than same time last year… global cooling we wish.

  12. 362
  13. 363
    Hank Roberts says:

    On topic:

    Interactive map:

    Click on Antarctica.
    See this text pop up, in Google Earth:
    Western Antarctica

    This area will be simply unrecognisable. Instead of the vast ice sheets, it will be densely inhabited with high-rise cities.


  14. 364
    ApolytonGP says:

    353, Gavin reply:

    But Gavin, I don’t think you are addressing his point that the low number of PCs is inappropriate given the spikiness of the function, and that it transfers peninsula warming into the interior. Now, that guy may be wrong…but you are not addressing the issue in debate, merely making a textualist comment about the paper.

    [Response: If people want to make specific points, they should make them. I’ll often try to intuit what someone is trying to get to, but I’m not psychic. In this case, I didn’t find people expressing shock that they have worked out the number of eigenmodes used in an analysis, when it was clearly stated in the paper, particularly interesting. With respect to your point, the number of modes is always a balance between including more to include smaller scale features, and not including modes that are not climatically relevant or that might be contaminated by artifacts. There are a number of ad hoc rules to determine this – ‘Rule N’ for instance, but I don’t know exactly what was used here. It doesn’t appear to make much difference, but it’s a fair point to explore. – gavin]

  15. 365
    Aylamp says:

    363 Hank Roberts

    From New “Scientist”.

    “According to models, we could cook the planet by 4 °C by 2100.”

    According to who – Kate Moss?

  16. 366
    Philippe Chantreau says:

    OK BPL, but the US represents what fraction of the world’s population? So the per capita energy consumption is quite high, perhaps not as high as Canada but still much higher than most, including China. Of course China is striving to change that, a scary prospect.

  17. 367
    Hank Roberts says:

    Aylamp inquires about the New Scientist article:

    >> According to who — Kate Moss?

    For the statement Aylamp questions,
    New Scientist cites their source.

    Fifth paragraph.

  18. 368
    Hank Roberts says:

    Some sources on per capita energy consumption:

    The page I linked recently above may help:

    As of 2005, some numbers here:
    (cited to: )

  19. 369
    Ryan O says:

    Gavin says: “If people want to make specific points, they should make them.”
    With all due respect, he did make a specific point: The method used by the authors smeared the peninsula warming over the interior. Or, to put it slightly differently, the low number of PCs used was insufficient to properly capture the geographical distribution of the temperature trends.
    Stating that the authors “realized that the disadvantage of not including higher order terms” led to an inaccurate depiction of peninsula warming in no way addresses the statement that the failure to include higher-order terms also had the effect of transferring peninsula warming to the interior. The followup that “there are ad hoc rules” to determine this is similarly irrelevant. It doesn’t matter what the ad hoc rules are . . . if the application of one or more of those rules resulted in an inaccurate geographic distribution of temperature trends, then the rule was either wrong, inappropriately used, or both.
    And if, as you imply, the higher-order terms are contaminated by artifacts (and thus cannot be used) – while simultaneously the lower-order terms are shown to be insufficient to accurately depict the geographical distribution of trends – then the obvious conclusion is that the available information is insufficient to support the main conclusion of the paper: heretofore unreported significant warming in West Antarctica.

    [Response: We have a situation where we don’t have complete information going back in time. The information we do have has issues (data gaps, sampling inhomogeneities, possible un-climatic trends). The goal is to extract enough information from the periods when there is more information about the spatial structure of temperature covariance to make an estimate of the spatial structure of changes in the past. Since we are interested in the robust features of the spatial correlation, you don’t want to include too many PCs or eigenmodes (each with ever more localised structures) since you will be including features that are very dependent on individual (and possibly suspect) records. Schneider et al (2004) looked much more closely at how many eigenmodes can be usefully extracted from the data and how much of the variance they explain. Their answer was 3 or possibly 4. That’s just how it works out. The fact is that multiple methods (as shown in the Steig et al paper) show that the West Antarctic long term warming is robust and I have seen no analysis that puts that into question. You could clearly add in enough modes to better resolve the peninsular trends, but at the cost of adding spurious noise elsewhere. The aim is to see what can be safely deduced with the data that exists. – gavin]

  20. 370
    Ryan O says:

    Gavin, thanks for the reply. I do appreciate it. However, I feel that it misses the point. For the sake of this discussion, I will accept the offering that using more than 3 or 4 PCs results in incorporating artifacts into the reconstruction. In that case, the authors are perfectly justified in truncating the rank at 3.
    What the reply doesn’t address, however, is that by using only the first 3 PCs, the geographical distribution of temperature trends is not accurately captured. The Big Deal associated with the paper – and it definitely generated a lot of excitement – was that the warming was not merely restricted to the peninsula. But a 3-PC analysis appears to inaccurately transfer peninsula warming to all of West Antarctica.
    So I am in agreement that the aim is to see what can be safely deduced with the data that exists. Unfortunately, if the higher order modes cannot be used due to contamination or artifacting and the lower order modes do not provide enough geographical resolution, then the data does not support the conclusion.
    By the way, the AWS recon shows no statistically significant warming in West Antarctica post-1969 and most of the stations show a negative trend post-1979 – a very different picture of West Antarctica.

    [Response: You appear to be under some misapprehension here. Neither the PCA nor the RegEM methodology knows anything about physical location. Thus any correlations between stations, or similar weightings for a particular mode, occur because there are real correlations in time. There can be no aphysical ‘smearing’ simply because of physical closeness in the absence of an actual correlation. Higher-order modes are not distinguishable from noise and so shouldn’t be used (whatever the cut-off point), but the remaining modes define the spatial scales at which the reconstruction is useful. And that is roughly at the semi-continental scale in this case. Making strong statements about smaller regions would not be sensible, though the pointwise validation scores in held-back data given in the supp. mat. indicate that there might not be much of a problem. – gavin]

  21. 371
    ApolytonGP says:

    I don’t quite get the argument for cutting off the higher PCs. Isn’t it true that infinite PCs gives back the original data? And aren’t lots of stats analyses done on the data themselves? You seem to be a bit too strong in thinking that PCA will cut the bad stuff and keep the good. It has a real danger of doing the reverse. It’s just a math transform.

    [Response: You don’t need an infinite number of PCs. There are only as many PCs as there are initial timeseries (assuming they are not degenerate), and using all of them gives you the original data back – at which point there was no point in doing PCA at all. In these kinds of applications, the PCs are used as filters, with higher modes explaining less and less of the joint variance. If you are interested in what the ensemble of the data gives that doesn’t depend on one or two elements, it makes sense to focus on the first few modes. Higher modes are much more sensitive to small random variations in ways that can be checked using Monte Carlo simulations and which form the basis of some of the criteria for how many modes you keep. – gavin]
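The points in this reply (there are only as many modes as input series, the variance explained falls off with mode number, and keeping every mode simply returns the data) can be checked with a small pure-Python sketch. It uses a cyclic Jacobi eigen-solver only to stay self-contained; in practice one would call something like `numpy.linalg.eigh` or an SVD. The station table and the helper names are hypothetical, and the Monte Carlo criteria for choosing the cut-off are not shown.

```python
import math


def jacobi_eigen(a_in, max_sweeps=100):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) where column m of V is the eigenvector for value m."""
    n = len(a_in)
    a = [row[:] for row in a_in]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca(data):
    """data: rows are time steps, columns are stations. Returns (eigenvalues,
    spatial patterns, PC time series, centred data), sorted by decreasing variance."""
    n_t, n_s = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_t for j in range(n_s)]
    x = [[row[j] - means[j] for j in range(n_s)] for row in data]
    cov = [[sum(r[i] * r[j] for r in x) / (n_t - 1) for j in range(n_s)] for i in range(n_s)]
    vals, vecs = jacobi_eigen(cov)
    order = sorted(range(n_s), key=lambda m: vals[m], reverse=True)
    eofs = [[vecs[i][m] for i in range(n_s)] for m in order]
    pcs = [[sum(r[i] * e[i] for i in range(n_s)) for r in x] for e in eofs]
    return [vals[m] for m in order], eofs, pcs, x


def reconstruct(eofs, pcs, k):
    """Rebuild the centred data from the first k modes only."""
    n_t, n_s = len(pcs[0]), len(eofs[0])
    return [[sum(pcs[m][t] * eofs[m][i] for m in range(k)) for i in range(n_s)] for t in range(n_t)]


# Hypothetical anomalies: three closely related stations and one independent one.
data = [
    [0.9, 1.1, 0.8, 0.2], [1.8, 2.1, 1.9, -0.4], [0.1, -0.2, 0.3, 0.5],
    [-1.2, -0.9, -1.1, 0.1], [-2.1, -1.8, -2.2, -0.3], [0.4, 0.6, 0.5, 0.6],
    [1.6, 1.4, 1.3, -0.2], [-1.9, -2.2, -2.0, 0.0],
]
vals, eofs, pcs, x = pca(data)
fractions = [v / sum(vals) for v in vals]
print("variance fraction per mode:", [round(f, 3) for f in fractions])
for k in (1, 4):
    err = max(abs(r - o) for rr, xr in zip(reconstruct(eofs, pcs, k), x) for r, o in zip(rr, xr))
    print(f"k = {k}: max reconstruction error {err:.2e}")
```

Keeping all four modes reproduces the centred data to rounding error, while the first mode alone captures the shared signal and drops the independent station, which is the filtering behaviour the reply describes.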

  22. 372
    Ryan O says:

    Gavin says: “You appear to under some mis-apprehension here. Neither the PCA nor the RegEm methodologies know anything about physical location. Thus any correlation between stations, or similar weightings for a particular mode occur because there are real correlations in time.”
    No, that doesn’t have anything to do with it. It doesn’t matter that PCA and RegEM don’t know or care about physical distance. What matters is whether the amount of variation captured by the PC changes with location.
    Let’s say that the first PC captures 50% of the variation in the data set as a whole. That does not allow you to say that it captures 50% of the variation in the peninsula, 50% of the variation along the Adelaide coast, 50% of the variation near the Ross ice shelf, and so forth. It may capture 80% of the variation in the plateau but only 15% of the variation at the Ross ice shelf. This can result in major over/under estimation of the signal depending on location. The “smearing” does not depend on RegEM or PCA “knowing” the station locations; it’s entirely dependent on the geographic distribution of the variation in the real data. Using fewer PCs results in less noise, certainly, but also results in an inability to fully capture variation that changes with physical location. PCs aren’t magic. Every one of them contains some information and some noise. Not using higher-order PCs necessarily results in removing some information. You are exactly right in that it is a trade-off. The question here is whether the trade-off chosen by the authors results in a conclusion that can be supported by the data.
    The way to test this is to compare the results from the PCA to independent records by location. I have done this, and done it in detail. I have compared the sat recons to the AWS recon. I have compared all the recons to the ground data. And I can tell you that for the peninsula, the grid points corresponding to ground stations in the main/PCA recons spend more time outside the 95% confidence intervals for the means (doesn’t matter whether you use 24, 36, 48, 60, 72, 84, 96…192 month means) than within. Many West Antarctica stations spend 30% or more of their time outside the 95% confidence intervals, and most are outside the lower 95% interval in the 1970-1980 timeframe and outside the upper 95% interval in the 2000-2006 timeframe (paired Wilcoxon test). Not only that, but the shape of the difference-in-means curves has geographical significance, meaning the choice of 3 PCs insufficiently captures the evolution of temperature with location.
    The fact that RegEM or PCA don’t care about physical distance is not relevant.

    [Response: But what’s your null hypothesis here? That temperatures at a location must be gaussian? That they have no auto-correlation? Why? There is nothing magic about 3 PCs – they were chosen based on the prior analysis of Schneider et al. Nothing much is going to change with 4 or 5 – and how much variance do they explain in any case? – gavin]
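Ryan O's point that a leading mode can explain 50% of the total variance while explaining very different shares at individual locations can be made exact for one mode. If e is the unit eigenvector and λ its eigenvalue, the rank-1 reconstruction at station i has variance λ·e_i², so its share there is λ·e_i² / C_ii, and the overall share λ / trace(C) is the variance-weighted average of those station shares. The sketch below uses power iteration and hypothetical data in which the fourth column stands in for a region with its own independent variability; it illustrates the arithmetic only, not which side of the exchange is right.

```python
import math


def covariance(data):
    """Sample covariance matrix of the columns (stations) of a time-by-station table."""
    n_t, n_s = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_t for j in range(n_s)]
    x = [[row[j] - means[j] for j in range(n_s)] for row in data]
    return [[sum(r[i] * r[j] for r in x) / (n_t - 1) for j in range(n_s)] for i in range(n_s)]


def leading_mode(cov, iters=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    n = len(cov)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def explained_fractions(cov):
    """Share of variance captured by mode 1: overall, and separately at each station.
    At station i the rank-1 reconstruction pc1(t) * e_i has variance lam * e_i**2."""
    lam, e = leading_mode(cov)
    overall = lam / sum(cov[i][i] for i in range(len(cov)))
    per_station = [lam * e[i] ** 2 / cov[i][i] for i in range(len(cov))]
    return overall, per_station


# Hypothetical anomalies: three closely related stations and one independent one.
data = [
    [0.9, 1.1, 0.8, 0.2], [1.8, 2.1, 1.9, -0.4], [0.1, -0.2, 0.3, 0.5],
    [-1.2, -0.9, -1.1, 0.1], [-2.1, -1.8, -2.2, -0.3], [0.4, 0.6, 0.5, 0.6],
    [1.6, 1.4, 1.3, -0.2], [-1.9, -2.2, -2.0, 0.0],
]
overall, per_station = explained_fractions(covariance(data))
print(f"overall: {overall:.2f}")
print("per station:", [round(f, 2) for f in per_station])
```

Because the overall figure is weighted by each station's variance, a low-variance location can be poorly represented even when the total looks good; equally, as the response notes, such a location contributes little to the total variance in the first place.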

  23. 373
    Ryan O says:

    First, I know you’re in the difficult position of answering questions about a paper you did not write – so I hope I’m not coming off as too argumentative. :) I’m not sure you understood what I meant, which is most likely lack of clarity on my part.
    The test I describe is comparative between ground record and recon or recon A and recon B. The null hypothesis is nothing more complex than there is no difference in the sample means. There is no underlying model. I make no assumptions about the distribution of temperatures or degree of spatial autocorrelation. I simply compare the means corresponding to the same physical location over various time periods in order to determine how the means evolve temporally and whether differences in the evolution are statistically significant.
    All of the provided recons, when compared to each other gridpoint to gridpoint (or, for the AWS and ground station comparison, station location to the corresponding gridpoint in the satellite recon), show differences in the means that are statistically unlikely if they were due to chance alone. The shape of the curve of differences is easily associated with geographical location. In other words, the geographic distribution of temperature change differs between the two satellite recons (minor), either satellite recon and the AWS recon (major), and either satellite recon and the manned station records (major).
    As a finer point, one could potentially assume that the satellite recons are correct and it is the station records that are incorrect. However, this requires that the ground station records within specific geographic regions all evolve incorrectly in the same manner. This is entirely implausible.
    The plots of the differences in means are not random. They are so not random that I was able to group the manned stations with 10+ years of data and all of the AWS stations from the AWS recon (a total of 79 stations) simultaneously into 6 geographically distinct regions by curve shape alone with only 1 incorrectly placed station (I had placed D-47 in an adjacent region). The differences are not due to ground instrumentation error. They are solely a function of the lack of geographical resolution in the satellite recons.
    The comment about whether 3 or 4 PCs would have changed anything is irrelevant. The only relevant question to ask is “How many PCs would be required to properly capture the geographic distribution of temperature trends?” with the followup question of, “Does this number of PCs result in inclusion of an undesirable magnitude of mathematical artifacts and noise?”
    If the answer to the followup is “yes”, then the conclusion is that the analysis is not powerful enough to geographically discriminate temperature trends on the level claimed in the paper. It doesn’t mean the authors did anything wrong or inappropriate in the analysis (I am NOT saying or implying anything of the sort). It simply means that the conclusion of heretofore unnoticed West Antarctic warming overreaches the analysis.
    In answer to your question about how much variation they need to explain, that is entirely dependent on the level of detail they want to find. If they wish to have enough detail to properly discriminate between West Antarctic warming and peninsula warming, then they must use an appropriate number of PCs. If they wish to simply confirm that, on average, the continent appears to be warming without being able to discriminate between which parts are warming, then the number of PCs required is correspondingly less.

    [Response: Sorry, but any statement that contains a 95% confidence interval implies an underlying null hypothesis and a distribution. You don’t get a pass on that. – gavin]

  24. 374
    Ryan O says:

    Gavin, the null hypothesis is that there is no difference in the means. The Wilcoxon test does not require any assumptions on the distribution of differences. Had I used a t-test, you’d have a point; but the Wilcoxon test does not have that restriction.

    [Response: Fair enough, but that just gives a finding that there is “a significant difference in the means” over some time period. Your interpretation assumes that there is some expected distribution of what this should be in any particular reconstruction and that must depend on the temporal auto-correlation of the time-series among other things. Wilcoxon also does not tell you how big the deviations are. The fact is that the number of retained PCs above two doesn’t impact the long term trends (for instance, for the AWS reconstruction the trends in the mean are -0.08, 0.15, 0.14, 0.16, 0.13 deg C/dec for k=1,2,3,4,5). – gavin]
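For readers following this exchange, a paired Wilcoxon signed-rank test can be sketched in a few lines. This version uses the large-sample normal approximation without tie or continuity corrections, so it is a sketch rather than a substitute for a library routine such as SciPy's `scipy.stats.wilcoxon`. Note its assumptions: the pairs are treated as independent, and the differences as symmetric about their centre, so it tests a location shift (the pseudo-median of the differences) rather than strictly the mean. Serial auto-correlation in monthly series, as raised in the response, would make the p-values too small. The data and function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Paired signed-rank test, normal approximation, zero differences dropped.
    Returns (W+, z, two-sided p-value)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, z, p


# Hypothetical paired anomalies (deg C) at one location: reconstruction vs station.
recon = [0.4, 0.1, 0.9, -0.2, 0.6, 0.3, 0.8, 0.5, 0.2, 0.7]
station = [0.1, 0.3, 0.2, -0.5, 0.1, 0.4, 0.2, 0.1, 0.0, 0.3]
w, z, p = wilcoxon_signed_rank(recon, station)
print(f"W+ = {w}, z = {z:.2f}, p = {p:.3f}")
```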

  25. 375
    Ryan O says:

    “Fair enough, but that just gives a finding that there is “a significant difference in the means” over some time period.”
    “Your interpretation assumes that there is some expected distribution of what this should be in any particular reconstruction and that must depend on the temporal auto-correlation of the time-series among other things.”
    Not quite. The results of the test itself can only tell me whether I can accept or reject the null hypothesis (no difference in means) at the chosen confidence level (in this case, 95%) for a particular set of paired data. The test itself gives me no additional information. At this point, the only thing I know is that they are different. By the way, the Wilcoxon test most certainly can be used to give you estimates of the difference in sample means and can be used to yield exact confidence intervals and p-values.
    For the next part, that I assume an expected distribution, this is true only in the sense that the null hypothesis is that there is no statistically significant difference in the means. I do not need to make any additional assumptions.
    At this point, you are entirely correct that we have exhausted what the Wilcoxon test can tell us. It cannot tell us that there are geographical differences and it cannot tell us which data set is “right”.
    You can, however, run an additional test. In this case, the null hypothesis I chose is that there is no correlation between curve shape and geographical location. I make no prior assumptions about the shape of the curve. I then attempt to group the curves based on shape. Were the curves entirely arbitrary, this would be a fruitless task, but they are not. The groupings are easy to see. Following the grouping, I plot the locations of the groups and find that they are strongly correlated geographically.
    If you wish to challenge the rigor of this, your concerns would be legitimate based only on the information I’ve provided so far. I have not yet challenged my grouping statistically (though I will). As a warm-and-fuzzy test, however, I would have no problem providing all the curves to you and letting you try to group them yourself. I am confident that you will arrive at a very similar grouping.
    “The fact is that the number of retained PCs above two doesn’t impact the long term trends (for instance, for the AWS reconstruction the trends in the mean are -0.08, 0.15, 0.14, 0.16, 0.13 deg C/dec for k=1,2,3,4,5).”
    This makes no statement on where those trends occur – which is how this latest discussion started. The aggregate measurement is similar. The geographic distribution of those trends may not be. I haven’t run it myself, but I’d be willing to bet that the geographic distribution changes significantly as higher-order PCs are included.
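The difference estimate that goes with the signed-rank test is usually the Hodges–Lehmann estimator: the median of all Walsh averages (d_i + d_j)/2 with i ≤ j. A minimal sketch follows; strictly it estimates the pseudo-median of the paired differences rather than the difference in means, and the confidence-interval construction is not shown. The function name and data are hypothetical.

```python
import statistics


def hodges_lehmann(diffs):
    """Median of the Walsh averages (d_i + d_j) / 2 over all i <= j."""
    n = len(diffs)
    walsh = [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


# Hypothetical paired differences (reconstruction minus station, deg C), one outlier.
d = [0.1, 0.2, 0.3, 10.0]
print(f"mean difference:         {statistics.mean(d):.3f}")
print(f"Hodges-Lehmann estimate: {hodges_lehmann(d):.3f}")
```

The estimate stays near the bulk of the differences even when one pair is wildly off, which is why it pairs naturally with a rank-based test.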