RealClimate

Comments

  1. (something called for by )

    ??

    Otherwise quite clear.

    [Response: oops. Fixed-thanks! -mike]

    Comment by David B. Benson — 14 Dec 2010 @ 12:39 AM

  2. I do appreciate the back and forth, but when MW get both the initial paper and the rejoinder, the appearance of having the last word favors them. I’m glad you have some thoughts on it, and I hope this will be further addressed here and in the lit proper.

    Comment by Pinko Punko — 14 Dec 2010 @ 1:05 AM

  3. On code that didn’t work, isn’t MW talking about the RegEM code in MATLAB? Out of curiosity, I was able to run the MATLAB code in Octave, though I had to fiddle around with the folders to get it working. It looks like the R code only takes in output from the EM stuff.

    btw I liked the symphony metaphor from Tingley :)

    Comment by apeescape — 14 Dec 2010 @ 1:35 AM

  4. Gavin,

    can I suggest a forum where you limit the commenters to the authors who submitted and one designated “second” for each team? It’s worth a shot; maybe some fruitful dialog would ensue. There are plenty of other blogs for the peanut gallery to engage each other.

    Say hi, when you hit SF for AGU.

    Comment by steven mosher — 14 Dec 2010 @ 1:39 AM

  5. Well, the last statement really does cover it. I first heard this phrase from Hank Shugart (who was a global change scientist long before any such thing really existed): ‘The problem with studying the Earth is that n=1 and df=0.’ So stats won’t help us solve this problem of 20th century warmth in the context of the last few centuries. But fortunately we have 100+ years of radiative physics to help point the way…

    Comment by Andy — 14 Dec 2010 @ 2:46 AM

  6. Interesting post. I can’t claim to be a whiz at statistics, but I remember telling some skeptics on another forum (Accuweather / climate change, I believe) that the major problems with this paper were that the results still showed a ‘hockey stick’, indicating current warming is pretty anomalous, and that the authors were not climatologists, nor did they seem to consult any to discuss why certain methods were used over the ones they decided to use. That criticism seems to be borne out here.

    Comment by Lazarus — 14 Dec 2010 @ 4:01 AM

  7. “The difference in results between the correct M08 network and spurious 95 record network MW actually used is unfortunately quite significant – significantly reducing the spread of reconstructions and reducing their estimates of peak medieval warmth.”

    I stumbled on this sentence. It seemed to say MW’s use of data gave a lower estimate of peak medieval warmth. Anyway, it’s clear from the context (and your paper) that it’s the other way around.

    [Response: Thanks--rephrased this for clarity. - mike]

    Comment by CM — 14 Dec 2010 @ 4:21 AM

  8. I know of peer-reviewed papers that have a large number of unjustified, unsupportable and irrelevant statements.

    [Response: My experience is that they go down tremendously as a function of the quality of the reviews. - gavin]

    Comment by Ibrahim — 14 Dec 2010 @ 4:58 AM

  9. Nice. A really good glimpse of how science actually gets done. Of course it will be ignored by the denialati.

    Comment by Ray Ladbury — 14 Dec 2010 @ 5:15 AM

  10. “The problem of anthropogenic climate change cannot be settled by a purely statistical argument. … Rather, the issue involves the combination of statistical analyses and, rather than versus, climate science.”
    I’m happy to see this point made (again), one obvious to scientists who use statistics as a tool to discover cause and effect relationships and build their theories (models) of bio-physical reality. That is why the denialist “argument” that climate change is “natural” is anti-science, since it is never supported by a credible theory (or any predictive theory, for that matter) that explains how natural causes lead to the observed evidence.

    Comment by Hugh Laue — 14 Dec 2010 @ 5:30 AM

  11. I welcome the fact that all the contributions include all their data and code, and the clear call, in the accompanying editorial, for this to be a requirement.

    Comment by Nick Barnes — 14 Dec 2010 @ 5:38 AM

  12. The long version of M&W’s rejoinder doesn’t appear to be up on the AOAS site yet, but McShane has posted a copy here.

    Comment by MartinM — 14 Dec 2010 @ 6:03 AM

  13. Hugh Laue #10

    Oh, denialists do come up with predictive theories now and then. They just don’t match the reality they tried to predict. Maybe that’s why they avoid doing it very often…

    Comment by Alexandre — 14 Dec 2010 @ 6:25 AM

  14. Criteria -> criterium

    [Response: I bet you're a bike racer...jim]

    Comment by Neven — 14 Dec 2010 @ 9:56 AM

  15. You see this type of work by economists all the time. I’ve come to call it the one-size-fits-all approach.

    Why did they choose that particular method? In the prior version of the paper, the motivation was along the lines of “there’s no best way, so we’re justified in using the lasso”.

    In my experience, their choice of method was dictated by one key constraint: they had to pick a method that didn’t require any actual detailed understanding of the subject matter. Hence the lasso. No need to know the relationships among the data series, perform any preliminary data reduction, select or reject items, and so on.

    Just toss them all in and let the machine sort it out for you.

    And those one-size-fits-all approaches typically yield statistically inefficient estimators. It’s no different from leaving sets of nearly multicollinear regressors in a regression. You’ll get an answer, probably even a mostly reasonable set of predicted values, but you won’t have minimized the confidence intervals around your predicted values.

    Normally, that’s not a huge deal. In normal statistical analysis (even in the social sciences), you’re trying to reject a null hypothesis. As long as your estimator is unbiased, if you’re dumb enough to inflate your confidence intervals, that’s your tough luck. So there’s a nice alignment of incentives — sloppiness costs you.

    Here, this is normal science stood on its head. They succeed by failing to reject the null hypothesis (that current temperatures are no different from the historical reconstruction). The large size of the confidence interval is the point of the research. The incentives here are perverse — the sloppier the analysis, the better.

    Upshot: In this topsy-turvy analysis, where they are seeking to fail to reject the null, they want the least efficient estimator they can get away with. That gives them the highest chance of failing to reject.

    So here, if they used anything other than the most efficient estimator available, that’s just bad science. Basically, if they screw it up, they get the result they are after.

    So, IMHO, what they proved is not that Mann et al. are wrong, but that a statistically inefficient estimator … is inefficient.
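
    One way to see the efficiency point concretely is a toy simulation (a sketch only: the data and names below are invented, and this is not MW’s lasso setup): regress y on x alone, and then on x plus ten nearly collinear copies of x. Both estimators are unbiased, but the “toss everything in” estimate of the same coefficient is an order of magnitude noisier, i.e. its confidence intervals are correspondingly wider.

        import numpy as np

        rng = np.random.default_rng(0)
        n, reps = 100, 1000
        est = {"lean": [], "wide": []}
        for _ in range(reps):
            x = rng.normal(size=n)
            # ten near-multicollinear copies of x
            clones = x[:, None] + 0.05 * rng.normal(size=(n, 10))
            y = x + rng.normal(size=n)  # true coefficient on x is 1.0
            designs = {"lean": np.column_stack([np.ones(n), x]),
                       "wide": np.column_stack([np.ones(n), x, clones])}
            for name, X in designs.items():
                beta = np.linalg.lstsq(X, y, rcond=None)[0]
                est[name].append(beta[1])  # estimate of x's coefficient
        for name, b in est.items():
            print(name, "mean:", round(float(np.mean(b)), 2),
                  "sd:", round(float(np.std(b)), 2))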

    Comment by Christopher Hogan — 14 Dec 2010 @ 11:39 AM

  16. A brief grammar note: “as is declaring that one of us to be a mere blogger” should either lose the “that” or else change “to be” to “is.”

    I’d also (in a way) strongly second Steve Mosher’s preference that we respect Gavin and Michael’s plea “to open the discussion here to all of the authors to give their thoughts…” Only the authors should post comments (at least initially, and for some time after that), and the peanut gallery should watch.

    I’m not suggesting at all that the moderators should enforce this. It should be a self-imposed ban. Listen and learn for a while (although I think brief, meaningful inquiries should be entertained, as long as they don’t involve long-winded dissertations or combative replies by the person posing the question). Keep it short and polite, and use it only to address one’s confusion or misunderstanding.

    Comment by Bob (Sphaerica) — 14 Dec 2010 @ 12:23 PM

  17. At least criterium is singular, which makes it less wrong than criteria.

    Now, can we all just agree on criterion?

    Comment by CTG — 14 Dec 2010 @ 1:08 PM

  18. Thanks once again to RealClimate for digesting a long and complicated set of papers and comments into something that is clear and comprehensible to an interested and scientifically literate layman.
    I remember the issue of deciding how many principal components to use in my thesis work involving automated land use classification based on remote sensing, and it is clear that using too many PCs will begin to NOT explain the variability, while using the RIGHT number gets us the best answer. So our understanding of different climate factors gets better with time; now all we need to do is ACT on our collective knowledge. Why do we act as if our descendants do not matter?

    Comment by Scientific American — 14 Dec 2010 @ 1:09 PM

  19. DC has an interesting comment:

    http://deepclimate.org/2010/12/10/open-thread-7/#comment-6909

    Comment by J Bowers — 14 Dec 2010 @ 2:32 PM

  20. Re comment 14: The singular of criteria is criterion. Sorry, I am a classics-educated physicist, which makes me a nit-picking nuisance – but I am a very good editor! I see criteria used instead of criterion so often that I had to say something….

    Comment by David Beach — 14 Dec 2010 @ 2:36 PM

  21. I’m curious to know which flaws of MW2010, if any, were identified by *multiple* discussion pieces in the AOAS set? I would assume these to be the most egregious.

    Comment by Steven Sullivan — 14 Dec 2010 @ 2:50 PM

  22. Ah yes, indeed, criterion. In Dutch we say ‘criterium’. I should’ve been more specific:

    “that the main criteria in choosing a method should be how long it has been used”

    and

    “A very odd criteria”

    Comment by Neven — 14 Dec 2010 @ 4:30 PM

  23. “Second, we take the data as given and do not account for uncertainties, errors, and biases in selection, processing, in-filling, and smoothing of the data as well as the possibility that the data has been ‘snooped’ (subconsciously or otherwise) based on key features of the first and last block.”

    [Response: Except that they didn't. They used their own set of data, and are complaining about 'ad hockery' when it was pointed out. - gavin]

    Comment by Carmen S — 14 Dec 2010 @ 5:56 PM

  24. Re #18:
    “So our understanding of different climate factors gets better with time; now all we need to do is ACT on our collective knowledge. Why do we act as if our descendants do not matter?”
    Why ask a question like this and then not allow any discussion of the answers? Like many people, we would like to hear the expert views on what should be done. If it’s not allowed on here, then why allow the original post?

    Comment by Bill — 14 Dec 2010 @ 6:17 PM

  25. As you suggest, a main point of contention between MW and SMR seems to be the use of 95 vs. 59 series. MW claim the M08 choice of series was ad-hoc, whereas you mention here that:

    “To reduce the impact of this problem, M08 only used tree ring records when they had at least 8 individual trees, which left 59 series in the 1000 AD frozen network.”

    Perhaps elaborating on why the specific number 8 was chosen for the minimum number of trees can put this issue to rest?

    [Response: Mike might be better placed to answer - but the main reason is that reconstructions with small numbers of trees have much greater variance. However, in this case it's irrelevant why that was done because if someone is purporting to analyse the same data as a previous paper, they should do so - or provide some justification why not. Neither option was followed by M&W, and all we did was point it out. - gavin]

    Comment by Troy Ca — 14 Dec 2010 @ 7:35 PM

  26. One thing I dislike in these discussions is the constant assumption that the other person is just wrong on a specific. This has been happening on both sides in this discussion and it makes it painful to follow. It would be much better if you guys were emailing each other when you thought you noticed something and resolving the issue rather than a game of gotcha. If in the end you couldn’t resolve it that way then you could report that explaining each view. The prime example at the moment is centering. They thought you were dead wrong, so they should have emailed and asked. You now think they are dead wrong, so you should have emailed and asked.

    [Response: I agree it's stupid. When I found an error in their code, I emailed McShane directly and they fixed it without a fuss. The first I heard of them having an issue with a technical part of our code was reading the rejoinder two days ago. They are correct that I implemented the fit to the log eigenvalue spectrum in fig S4 incorrectly, but fortunately it makes no difference (as stated above) - I have no idea why they didn't let me know when they found it. As for the centering issue, there is no issue. All PCs are calculated with "center=true" - and again, if they thought there was something silly that would get in the way of the scientific issue (which I still presume they are interested in), I have no idea why they didn't email me. McShane has emailed me previously (for explanations on the area weighting we used, and the locations and use of the gcm data), so it is not as if he's shy. Right now I'm not anywhere I can look into these issues in detail - or decide what to do about it, but to reiterate, I'd much rather be talking about something serious than dealing with supposed gotchas. - gavin]

    Comment by Nicolas Nierenberg — 14 Dec 2010 @ 7:49 PM

  27. In their “A Comment on ‘A statistical analysis…’” (PDF), Schmidt, Mann, and Rutherford mention the Lake Korttajarvi varved sediment (Tiljander) data series twice.

    Line 38ff:

    The further elimination of 4 potentially contaminated “Tiljander” proxies [as tested in M08...], which yields a set of 55 proxies, further reduces the level of peak Medieval warmth.

    Lines 159/160:

    Results using the M08 frozen AD 1000 network of 59 minus 4 “Tiljander” proxy records…

    There are not four Tiljander data series — only three. The primary series recorded by Tiljander et al. were X-Ray Density, varve Thickness, and Lightsum. Lightsum is the portion of varve thickness contributed by mineral matter. (Varve Thickness and Lightsum can each be measured in millimeters; diagrammed here.)

    Darksum is taken to be the portion of varve thickness contributed by organic matter. It was calculated as:

    Darksum (mm) = Thickness (mm) – Lightsum (mm)

    There are only two degrees of freedom among Thickness, Lightsum, and Darksum.

    The authors of Tiljander et al. (2003) suggested that the pre-1720 portions of XRD, Lightsum, and Darksum contain climate-related signals. They made no such claim for Thickness.

    Comment by AMac — 15 Dec 2010 @ 12:51 AM

  28. Why the concern with peer review? All of your peers are now reviewing all of the articles, and they are reading this thread and the threads at Climate Audit, Watts Up With That, and others. This might be the most thoroughly peer-reviewed paper and commentary in your field. They just are not the peers that you would have preferred. Not only that, almost all the peers will have access to almost all of the data and code used in preparation of the paper and commentaries.

    Personally, I am awaiting the print version (I am a member of the IMS which publishes AOAS, and I pay for it as it has become my favorite periodical), with which I shall spend much time before downloading and running the available code on the available data.

    [Response: Because it will be extremely difficult for future readers to see where the discussion leads - fatuous statements that are now printed will be quoted for a long time, while their rebuttals (on blogs, or in future papers, etc.) won't be. It is far more efficient to have less error in the first place. - gavin]

    Comment by Septic Matthew — 15 Dec 2010 @ 1:38 AM

  29. Bill,
    14 December 2010 at 6:17 PM

    Like many people, we would like to hear the expert views on what should be done. If its not allowed on here, then why allow the original post.

    This is the most common source of confusion. It is what drives those false skeptics who are trying to sell us the idea that we have no problem because they don’t like the solution. Problem and solution are not the same, and they are separate debates.

    This post is, like the majority of posts on RealClimate, not about views on ‘what should be done’, but an analysis of how the planetary climate system works and what consequences we can expect from our collective actions.

    With regard to solutions, you’ll not easily see RealClimate going further than a simple: “reduce CO2 emissions, and other warming agents too. ASAP.” How that should be done is of course the issue that merits debate. But it is a different debate about political, societal, economical and technological changes. This is not the venue for such a discussion.

    Scientific American merely expressed the common frustration that no real action is being taken, although the analysis has been showing beyond a reasonable doubt, for decades now, that there is a problem.

    Comment by Anne van der Bom — 15 Dec 2010 @ 3:18 AM

  30. “MW found recent warmth to be unusual in a long-term context: they estimated an 80% likelihood that the decade 1997-2006 was warmer than any other for at least the past 1000 years.”

    Oh yes, they are to be commended for anything that refutes the Medieval Warm Period. A simple question for Gavin and Michael Mann: why is the last 1000 years important, given that it is so insignificant in the earth’s lifetime? I mean, there’s a 100% chance that the temperature was much higher in early periods of earth’s history. There is a 100% chance that co2 was much much higher, especially during the time of the dinosaurs.

    Basically, you scientists are telling us that something that has happened before cannot be allowed to happen again because it will somehow be worse. Despite the fact that we have had co2 concentrations much higher than the 780ppm doubling that you all fear. So given that the earth has sustained higher co2 levels, higher temperature, why is 780ppm now too high? Did I miss something? Did history begin when I was born?

    [Response: The short answer is that there is now an advanced civilization in which many millions of people are dependent on an agricultural system that was designed and implemented to work within a relatively stable climate regime, especially that of the last couple hundred years. There are now many, and complex, dependencies on a stable climate. Also, the doubling typically referred to is 560 ppm, not 780.--Jim]

    Comment by Dr. Shooshmon, phd. — 15 Dec 2010 @ 9:16 AM

  31. I thought that the editorial accompanying the paper and responses was quite revealing. I read it as being, basically, an apology for having accepted the paper.

    First, the editor carefully describes the review process, in great detail. Editors, assistant editors, reviewers, incoming new editor, rounds of review, ending with this:

    “Acceptance of a paper reflects our opinion that the work represents a meaningful contribution to applied statistics, broadly construed, and that the authors have made a good faith effort to respond to the concerns of the reviewers.”

    That is, he takes great pains to show that they did all normal due diligence, and his only guarantee is that the authors met those standards of due diligence.

    A few other things stand out.

    First, he says it’s so obvious that CO2 warms the earth that it’s pointless to test an hypothesis of no warming. (Which is, I think, the hypothesis this paper just tested, isn’t it?)

    “I particularly object to the testing of sharp null hypotheses when there is no plausible basis for believing the null is true. An example of an implausible sharp null hypothesis would be that a large increase in the concentration of CO2 in the atmosphere has exactly zero effect on the global mean temperature.”

    Second, there’s the clear statement that getting the data right is the most important thing. Is this in response to the commenters’ pointing out that MW got the data wrong?

    “One claim I frequently make is that, in terms of what is most important about using statistics to answer scientific questions, data are more important than models and models are more important than specific modes of inference.”

    Third, he clearly states that the right way to do this is with teams that include climatologists. But that’s saying that MW is exactly the wrong model for how to make real progress in this area:

    “Greater cooperation between the climatological and statistical communities would benefit both disciplines and be invaluable in the broader public discussion of climate change. There have been great strides made in this regard in recent years, which is reflected in the diversity of affiliations of the discussants and the extent to which they demonstrate their understanding of both statistics and climatology.”

    Finally, it ends with a strong, unambiguous policy recommendation unsupported by the analysis, in what I’m pretty sure is not actually a policy journal:

    “Thus, while research on climate change should continue, now is the time for individuals and governments to act to limit the consequences of greenhouse gas emissions on the Earth’s climate over the next century and well beyond.”

    My paraphrase:

    “We did our normal due diligence. In hindsight, statisticians shouldn’t undertake this alone. For example, how could we know they’d screwed up the data until after we’d accepted the paper and gotten the commentaries from climatologists? And in any event, it’s so obvious that CO2 warms the earth that it isn’t worth testing the ‘sharp null’ that it doesn’t. Ignore the thrust of what we just published here (that we can’t say that current temperatures are anomalous), and start restricting GHG emissions now.”

    Comment by Christopher Hogan — 15 Dec 2010 @ 10:03 AM

  32. @25:
    a very basic concept in dendrochronology is replication of proxy series – the usual conceptual visualisation for this is the “linear aggregate model” of Ed Cook (Cook, E., Kairiūkštis, L. 1990, Methods of Dendrochronology: Applications in the Environmental Sciences, p. 98). It basically says that ring width is a function of:
    age trend + climate signal + endogenous (local) disturbances + exogenous (standwide) disturbances + unexplained variation
    Basically, this concept also underlies all other dendro proxies, be it maximum latewood density or stable isotope ratios, even though there are variations (e.g. age trend in density data is treated differently from ring-width data). Of those factors, the climate signal (and possibly exogenous disturbances) comprises the common signal of all trees in one site, while endogenous disturbances and unexplained variations are supposed to occur randomly. Averaging as many trees as possible will strengthen the common (climate) signal and reduce the noise, and therefore improve the reconstruction quality (without some weighted-mean statistics, for two trees, each will contribute 1/2 of the variance of a proxy time series – for 8 trees only 1/8, and so on).

    Working “hands on” with dendro data, this means that a chronology comprising 8 or 10 trees in one time span will not change significantly, even if you add some other trees. The other way round is more problematic – having only 1 or 2 trees in a chronology means that dating is not completely certain (missing/false rings) and that there is no guarantee that the growth depression you see is not caused by some disturbance, instead of temperature. This becomes especially important once you get to the juvenile growth phase of those trees: keep in mind that what was a 1200-year-old mighty tree at sampling in 1980 or whenever was much thinner 1000 years earlier, and therefore much likelier to be influenced by competition, insect outbreaks, avalanches, rock movements, fire or other disturbances during its youth.
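
    The 1/N variance argument is easy to check numerically (a toy sketch under the simplifying assumption that each tree records the common signal plus independent unit-variance noise; all names are invented for illustration):

        import numpy as np

        rng = np.random.default_rng(1)
        years = 1000
        signal = np.sin(np.linspace(0, 20, years))  # stand-in for the common climate signal

        def chronology(n_trees):
            # each tree = common signal + independent unit-variance noise
            trees = signal + rng.normal(size=(n_trees, years))
            return trees.mean(axis=0)  # simple unweighted site chronology

        for n in (1, 2, 8, 32):
            noise_var = (chronology(n) - signal).var()
            print(f"{n:2d} trees: residual noise variance = {noise_var:.3f} (expected ~ {1 / n:.3f})")

    The steeply diminishing returns beyond roughly 8 trees are the same behaviour the EPS statistic quantifies (see Malcolm Hughes at #57 below).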

    [Response: Thanks for the nice explanation. The biggest issue with small vs large trees is typically the influence of tree size on ring characteristics, but the things you mention can play a role too.--Jim]

    Comment by is(de) — 15 Dec 2010 @ 10:24 AM

  33. Gavin, thank you for your response. You say that there is no issue on the centering. I’ll assume you are correct. Did you email McShane to confirm your belief? If you did, then including the result of the exchange would make it much easier on the reader, and I would know that the issue had been settled. As it stands, it is just an argument.

    [Response: No it's not. The code is online and anyone can check it for themselves. As I stated, right now I'm not anywhere where I can do any work on this or engage in technical back and forths, but when I am, i certainly will be engaging on this further. - gavin]

    On a higher-level topic: I understand that you don’t agree with the interpretation that these statisticians have of the results and uncertainties in various reconstructions. How about enlisting some other prominent statisticians to do a joint paper on the issues? Cross-field collaboration is a very big thing these days, and clearly there are issues here that are deeply statistical in nature rather than having to do with expertise in climatology.

    [Response: There are already lots of statisticians working on this - Rougier, Nychka, Tingley for instance all submitted commentary on M&W, who are by no means the voice of 'statisticians'. Their paper was just not done very well, and our addition of the pseudo-proxy analysis from long GCM runs showed that clearly. I'm confident that other people will take this forward, and if I can be useful to that I will be, but that remains to be seen. - gavin]

    Comment by Nicolas Nierenberg — 15 Dec 2010 @ 11:27 AM

  34. Dr. Shooshmon:

    Did I miss something?

    Yeah, you missed the point that some of us don’t want our species to go the way of the dinosaurs for as long as we can put it off.

    Comment by dhogaza — 15 Dec 2010 @ 12:09 PM

  35. To Dr. Shooshmon @30 – the last 1,000 years isn’t significant in the lifetime of the planet, but it’s extremely significant in the lifetime of civilization.

    As I understand it, not just the change in CO2 and temperature is a problem, but the rate of change. It’s astronomical, in geological time.

    Comment by Maya from the peanut gallery — 15 Dec 2010 @ 2:09 PM

  36. #30 said: “the earth has sustained higher co2 levels…Did history begin when I was born?”

    Dude, I didn’t see you around 15 million years ago.

    Comment by CM — 15 Dec 2010 @ 2:13 PM

  37. 30 Dr. Shooshmon, phd. says:

    Despite the fact that we have had co2 concentrations much higher than the 780ppm doubling that you all fear.

    This is about the third time just this week that I’ve seen this ‘CO2 doubling = 780ppm’ meme appearing in an AGW comments thread. As Jim already pointed out inline above, the (first) doubling that climatologists are concerned about is the one from the *pre-industrial* concentration of 280ppm to 560ppm, NOT from the 390ppm we have today. So of course, we will reach 560ppm, with all the resultant consequences, *much quicker* than 780ppm.
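
    The reason a “doubling” is the natural unit at all is that CO2 forcing grows with the logarithm of concentration. A quick check with the standard simplified expression ΔF = 5.35 ln(C/C0) W/m² (Myhre et al. 1998); the snippet below is just that formula evaluated, nothing more:

        import math

        def forcing(c, c0):
            # simplified CO2 radiative forcing, W/m^2 (Myhre et al. 1998)
            return 5.35 * math.log(c / c0)

        print(forcing(560, 280))  # pre-industrial doubling: ~3.7 W/m^2
        print(forcing(780, 390))  # a doubling from today: the same ~3.7 W/m^2
        print(forcing(560, 390))  # still to come between today and 560 ppm: ~1.9 W/m^2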

    What happens is these memes spring up on contrarian sites and get repeated without question, and then they are impossible to kill off without a lengthy explanation. *Every* place they crop up. Whack-A-Mole indeed.

    Comment by Steve Metzler — 15 Dec 2010 @ 2:55 PM

  38. Berliner’s last paragraph: “The problem of anthropogenic climate change cannot be settled by a purely statistical argument …”

    I’m sorry, but what exactly is the “problem” that needs to be “settled”?

    [Response: The epistemological problem of how much statistics alone, when not tightly integrated with physical knowledge, can tell you. Sort of an age-old question.--Jim]

    We know what the problem is and we know what needs to be done to solve it. We have known both of those things for years.

    And we know that we are still not doing what needs to be done to solve it. Not even close.

    Comment by SecularAnimist — 15 Dec 2010 @ 4:12 PM

  39. Since this is likely to come up anyway, there is another conspiracy-laden post at CA with regards to the peer review of a new publication by McKitrick and Nierenberg. This had its genesis in a ‘comment’ on my 2009 paper on spurious correlations in work by McKitrick and Michaels and, separately, de Laat and Maurelis. Instead of submitting a comment on my paper, M&N submitted a ‘new’ paper that was in effect simply a comment, and specifically asked that I not be a reviewer. I was therefore not chosen as a reviewer (and I have no idea who the reviewers were). Nonetheless, since the submission was so highly related to my paper, and used some of the data I had uploaded as part of my earlier paper, the editor of IJOC asked me to prepare a counter-point to their submission. I did so, and in so doing pointed out a number of problems in the M&N paper (comparing the ensemble mean of the GCM simulations with a single realisation from the real world, ignoring the fact that the single GCM realisations showed very similar levels of ‘contamination’, misunderstandings of the relationships between model versions, continued use of a flawed experimental design, etc.). I had no further connection to the review process and at no time did I communicate directly with the reviewers.

    The counter-point I submitted was fair and to the point (though critical), and in no way constituted any kind of improper influence. Editors make decisions about who reviews what paper – not authors, and they make the decisions about what gets accepted or not, not reviewers. Authors who seek to escape knowledgeable scrutiny of their work often come up with lists of people who they claim are unable to give a fair review, and editors need to use their discretion in assessing whether this is a genuine issue, or simply an attempted end run around the review process.

    I have not yet seen the ‘new’ M&N paper, but it is very likely to be more of the same attempts to rescue a flawed analysis. It should be noted that the main objection to my 2009 paper was that I didn’t show that the residuals from McKitrick’s regression contained auto-correlation. This could have been usefully added (and can be seen here), and in any case was admitted by McKitrick in yet another flawed paper on the topic earlier this year. The overwhelming reason why McKitrick is wrong though is because he is using an incorrect null hypothesis to judge the significance of his results. A much more relevant null is whether the real data exhibit patterns to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.
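
    In outline, such a null test looks like the sketch below. Everything in it is a placeholder (the fields are random noise, the statistic is a simple pattern correlation, and 20 runs is arbitrary); it is meant only to show the logic of building a null distribution from single model realisations rather than from an ensemble mean:

        import numpy as np

        rng = np.random.default_rng(2)

        def pattern_stat(trend_field, econ_field):
            # placeholder statistic: correlation between a temperature-trend
            # pattern and a 'socio-economic activity' pattern
            return np.corrcoef(trend_field.ravel(), econ_field.ravel())[0, 1]

        econ = rng.normal(size=(36, 72))       # stand-in socio-economic field
        obs = rng.normal(size=(36, 72))        # stand-in observed trend field
        runs = rng.normal(size=(20, 36, 72))   # stand-in single-run GCM trend fields

        s_obs = pattern_stat(obs, econ)
        s_null = np.array([pattern_stat(r, econ) for r in runs])
        # empirical p-value: how often do 'uncontaminated' model realisations
        # produce a statistic at least as extreme as the observed one?
        p = (np.sum(np.abs(s_null) >= abs(s_obs)) + 1) / (len(s_null) + 1)
        print(f"observed stat {s_obs:.3f}, empirical p = {p:.2f}")

    If the observed pattern correlation falls well inside the spread of the single-run null distribution, there is no evidence of contamination. That is also why individual realisations matter: averaging runs first would remove exactly the internal variability the null distribution needs.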

    Comment by gavin — 15 Dec 2010 @ 8:15 PM

  40. The short answer is that there is now an advanced civilization in which many millions of people are dependent on an agricultural system that was designed and implemented to work within a relatively stable climate regime, especially that of the last couple hundred years.

    You meant ‘billions’, right?

    [Response: The larger point is that many of us could stand to have a better awareness of how much of societal/global security is dependent on a climate stability that we mostly, like so many things, take for granted.--Jim]

    Comment by Thomas Lee Elifritz — 15 Dec 2010 @ 10:03 PM

  41. Gavin @ 8.15PM 15 Dec

    It would help clear the air if you could share the counter-point you submitted. Is that possible?

    [Response: I'll think about it. - gavin]

    Comment by HAS — 16 Dec 2010 @ 12:46 AM

  42. #40
    Not really, it’s “many millions,” but only “a handful of billions” (unless you are using UK billions in which case you have only “a tiny fraction of a billion”).

    Whatever, the point is that, to the best of our current knowledge, many millions of people will die and otherwise suffer as a result of our not taking the actions we should have back in 1990.

    Comment by James Killen — 16 Dec 2010 @ 1:43 AM

  43. “The only support for ‘Lasso’ comes from McIntyre and McKitrick who curiously claim that the main criteria in choosing a method should be how long it has been used in other contexts, regardless of how poorly it performs in practice for a specific new application. A very odd criteria indeed, which if followed would lead to the complete cessation of any innovation in statistical approaches.”

    Gavin, Mike, this assertion sounds reasonable, but does it imply that the reliability of your results depends entirely on the novelty of the method you introduced? This seems a questionable point to rest on.

    [Response: No, as any elementary class in logic would have taught you. - gavin]

    Comment by Gilles — 16 Dec 2010 @ 2:02 AM

  44. The overwhelming reason why McKitrick is wrong though is because he is using an incorrect null hypothesis to judge the significance of his results. A much more relevant null is whether the real data exhibit patterns to economic activity that do not occur in single realisations of climate models, and this is something he has singularly failed to do in any of these iterations.

    #####
    That’s a good point Gavin. With his code in hand could you do that?

    [Response: Yes, and I have for each iteration I have looked at. And McKitrick's conclusions fail to hold up every time. There is a limit to how many times I'm going to do it again. Anyone else could do so themselves using the archived data for Schmidt (2009) or data from IPCC AR4. Note that this requires looking at individual runs, not ensemble means. - gavin]

    Comment by steven mosher — 16 Dec 2010 @ 2:58 AM

  45. @gavin #39
    I particularly liked McKitrick’s complaint about not being “given a chance to reply” to the “inane” reviews, and about the editor “refusing” to “reconsider their paper” after it was rejected…

    Comment by ICE — 16 Dec 2010 @ 6:17 AM

  46. Gavin,

    are you saying that your approach would be to mine through individual runs of individual models to find something that matches? That sounds odd. Are you not bound to find examples of matching patterns in very noisy data if you look hard enough? Sorry if I’m misunderstanding.

    [Response: Huh? Where did I say that? No, I am saying that the patterns of temperature trends are spatially complex due to a whole host of reasons - internal variability of the climate system, regionally specific forcings, local and micro-climate issues etc. Deciding that a spatial pattern that is correlated to a 'socio-economic' variable is causative requires an understanding of what the distribution of that pattern is under a null hypothesis of no 'contamination'. GCMs can produce such a distribution (albeit imperfectly), and so should be used for the null. In Schmidt (2009), I used the 5 runs I had easily available, to demonstrate that the significance test that McKitrick had used vastly overstated the importance of his correlations. I speculated that this was due to him not appreciating the spatial auto-correlation structure of the variables and over-estimating the degrees of freedom. This was true (as he now has admitted in McKitrick (2010) and the new paper). He claims that this can be corrected for, but he still isn't using the proper null - in M&N they show the results from the ensemble means (of the GISS model and the full AR4 model set), but seem to be completely ignorant of the fact that ensemble mean results remove the spatial variations associated with internal variability, which should be the exact thing you would use! Now, I haven't done a full analysis of the AR4 individual runs to do this properly, and I am not motivated to do so, but if McKitrick was serious, he would have done it already, and of course he still could. - gavin]
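
    The degrees-of-freedom trap described in the response is easy to demonstrate in one dimension (a toy sketch with AR(1) time series standing in for spatially autocorrelated fields; nothing here comes from the actual papers):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)

        def ar1(n, phi):
            # autocorrelated series: x[t] = phi * x[t-1] + white noise
            x = np.zeros(n)
            for t in range(1, n):
                x[t] = phi * x[t - 1] + rng.normal()
            return x

        n, reps, phi = 200, 1000, 0.9
        false_pos = sum(stats.pearsonr(ar1(n, phi), ar1(n, phi))[1] < 0.05
                        for _ in range(reps))
        print(f"naive 5% test rejects {100 * false_pos / reps:.0f}% of the time")

    The two series are independent by construction, yet the naive test (which assumes n independent data points) typically rejects around half the time rather than 5%: the effective degrees of freedom have been grossly overestimated. Spatial autocorrelation produces the same kind of inflation in map-based regressions.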

    Comment by Stephen — 16 Dec 2010 @ 6:42 AM

  47. Gavin, at least they acknowledge you are a “popular” blogger :)

    In their rejoinder MW claim they didn’t agree with reducing the data set to 59 as follows: “the application of ad hoc methods to screen and exclude data increases model uncertainty in ways that are unmeasurable and uncorrectable.” I would have thought that excluding data that fail to meet quality criteria is reasonable. The method for excluding data appears to me quite clearly described in the supplement to Mann et al. though the reason for requiring specific features could have been explained (e.g. exactly why “there must be at least eight samples during the screened period 1800–1960 and for every year used.”)

    Comment by Philip Machanick — 16 Dec 2010 @ 7:46 AM

  48. There have been some reports that the area weighting might not have been done correctly. This would have overweighted the Arctic region and, in consequence, the slope of the baseline. Could anyone who has actually read the code comment on it?

    [Response: No idea what you are referring to. Can you be more precise? - gavin]

    Comment by Yvan Dutil — 16 Dec 2010 @ 9:10 AM

  49. Re. Gavin 39: Thanks for that response. Useful already.

    Comment by J Bowers — 16 Dec 2010 @ 6:20 PM

  50. Regarding the relatively stable climate regime of the last couple hundred years, has anyone asked whether that was the anomaly? Is it not possible that the recent agricultural boom was the result of a preferential climate? History tells us of mass chaos caused by a massive downfall in agricultural yields occurring numerous times. Granted, technology has contributed, but how many millions of deaths have been averted because of the abundance of food?
    I know some will point to various droughts and floods that ruined crops, which happen frequently. However, recent crop failures pale in comparison to those that have occurred throughout history. It may just be possible that whatever direction we move from here – hotter, colder, wetter, or dryer – agriculture will suffer.

    Comment by Dan H. — 16 Dec 2010 @ 7:01 PM

  51. Hi Gavin,

    thanks for the considered reply to my #46 above. I believe I follow that; you’re saying that the information that one would want to test against is lost in averaging in the ensemble mean. My point remains, I think. Which models should one examine? You say look at all of them. Which runs? – you’d presumably want to do a bunch of them and look at the results you get. But is there not so much variability, so much noise, across different models and different runs of different models that you are bound to find – by chance – examples of the claimed real-data pattern? How many possible “false positives” would you need to see to satisfy yourself that you were getting useful information, and not coincidental matches from the individual runs? A single example? A certain percentage?

    [Response: This is hardly a unique problem to this case. Classical statistics of the sort practiced by McKitrick by convention uses a 95% cutoff, so if you found that this pattern occurred less than one time in 20 runs you might start to think that there was something to be explained. However, it still would not prove that there was contamination - perhaps the real world was just one of those times. McKitrick's hypothesis would have a lot more traction if it actually predicted something observable, rather than being a post hoc validation. - gavin]

    Furthermore, the “results” are just the (highly variable) output of a computer model. What’s to stop a researcher from simply pushing the model “go” button again and again until he or she got the desired result (in whatever direction), and their opponent doing the same in the opposite direction (or at least each side being suspicious that the other had done that)?

    [Response: The set of model runs useful for this kind of exercise are quite limited - essentially being limited to the simulations archived for IPCC AR4/CMIP3 (and that are now being produced for CMIP5). They take months of super-computer time and I assure you that no group is doing them for any reason remotely connected to McKitrick. That's an amusing thought actually... - gavin]

    In short, how can it be useful to declare “ah look, I found the same pattern in a model run that you claim to have found in the real data, therefore the null hypothesis is not disproved” (or the reverse, mutatis mutandis) when there is so much noise and variability in individual model runs. Different people doing different runs of different models will (may) get different results.

    [Response: It's precisely because there is variability in the runs that this is useful. At minimum, the fact that I found similar 'significant' patterns in the 5 runs I looked at (and I only looked at those 5) indicated to me that the significance tests McKitrick proposed are way too generous to his hypothesis - which in my opinion is hopelessly flawed in conception in any case. It is open to him to show his case more forcefully, but unfortunately each of his attempts has only added noise to an already very weak signal. I told him years ago what would be needed to make a reasonable case, but he has yet to do it. - gavin]

    Comment by Stephen — 17 Dec 2010 @ 6:13 AM

  52. Philip #47, I would say that this supposed issue of removing “poor” proxies is really a diversion. I have been playing with M&W’s code (well written and easy to use BTW) and, while I did notice that the number of proxies they used was larger than that of Mann et al. for the corresponding period, I never got around to figuring out why.

    What I did notice was that M&W performed a set of traditional regressions (as reported in their Figure 14), truncating not only the proxy data set by retaining only a limited number of principal components, but similarly also truncating the instrumental data used for calibration. In this way they produced 4×4 = 16 of the total of 27 reconstruction curves plotted in Figure 14, for numbers of PCs of 1, 5, 10 and 20, for both the proxy and the instrumental data.

    The remarkable thing that I found was that of those 16 recons, no fewer than 8 — for proxy PC numbers of both 5 and 10, and for any number of instrumental PCs — lie very close to each other and to the corresponding Mann et al. 2008 reconstruction curve!

    It thus appears to me (as an amateur) that PCA truncation of the calibration data acts as a regularization technique, rendering harmless the effect of including these “poor” proxies, as well as that of retaining too large a number (10) of proxy principal components.

    So, M&W tried very hard not to reproduce the Mann et al. 2008 solution, but failed in no fewer than eight instances… see here, the curves in blue and cyan. See also my annotated Figure 13, showing that, even by the somewhat questionable verification metric used by M&W, these recons actually perform pretty well…
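
    For readers who want to see the mechanics, the kind of doubly truncated regression described above can be sketched as follows (synthetic data and invented names throughout; M&W’s actual code is available online):

        import numpy as np

        rng = np.random.default_rng(4)
        T, P, G = 150, 90, 40          # calibration years, proxies, instrumental grid cells
        common = rng.normal(size=T)    # shared 'climate' signal
        proxies = np.outer(common, rng.normal(size=P)) + rng.normal(size=(T, P))
        instr = np.outer(common, rng.normal(size=G)) + 0.3 * rng.normal(size=(T, G))

        def pcs(X, k):
            # leading k principal-component scores and loadings of a centered matrix
            Xc = X - X.mean(axis=0)
            U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
            return U[:, :k] * s[:k], Vt[:k]

        k_proxy, k_instr = 5, 4
        Zp, _ = pcs(proxies, k_proxy)   # truncated proxy representation
        Zi, Vi = pcs(instr, k_instr)    # truncated instrumental representation
        B = np.linalg.lstsq(Zp, Zi, rcond=None)[0]    # calibrate proxy PCs -> instrumental PCs
        recon = (Zp @ B) @ Vi + instr.mean(axis=0)    # map back to the grid
        print("calibration RMSE:", round(float(np.sqrt(((recon - instr) ** 2).mean())), 3))

    The truncation on both sides is what does the regularizing: with only a handful of PCs carrying the fit, a single noisy “poor” proxy cannot dominate the regression.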

    Comment by Martin Vermeer — 17 Dec 2010 @ 6:23 AM

  53. #50–Interesting speculation, Dan, but what consequence is to be drawn? Do we ignore the knowledge we have (imperfect and incomplete though it may be) on the off chance that no matter what we do, things will get worse?

    After all, there’s a chance that Earth could get hit with an intragalactic gamma ray burst or, more prosaically, a really massive asteroid, too. And then all this time spent commenting on RC would be wasted. . .

    Comment by Kevin McKinney — 17 Dec 2010 @ 7:44 AM

    #48 “There have been some reports that the area weighting might not have been done correctly. This would have overweighted the Arctic region and, in consequence, the slope of the baseline. Could anyone who has actually read the code comment on it?

    [Response: No idea what you are referring to. Can you be more precise? - gavin]”

    The contentious point was that M&W combined the temperature proxies without weighting by area. The claim came from the fact that they did not mention doing so in the paper, and also from the fact that their curve was almost a carbon copy of the Kaufman et al curve for the northern region. This gave the impression that high latitudes were overweighted.

    Comment by Yvan Dutil — 17 Dec 2010 @ 9:05 AM

  55. 50 (Dan H),

    History tells us of mass chaos caused by a massive downfall in agricultural yields occurring numerous times.

    Yes. In fact, such climate change is suspected in the downfall of any number of civilizations (Maya, Anasazi, etc.). Of course, in almost all of those cases, the droughts/negative effects have been “local” (e.g. North America only), not global.

    However, recent crop failures pale in comparison to those that have occurred throughout history.

    This is in fact true with respect to the “relatively stable climate regime” of the past two thousand years. In that time, the regular, long-term swings of the climate have been enough to destroy individual civilizations, but not all human civilization at once.

    Today, however, we face a problem which is radically different in three respects.

    First, civilization is global, not local. We’re not talking about Rome or North America, we’re talking about everyone, everywhere.

    Second, the human population is currently dependent upon a greatly enhanced agricultural output. We’re not talking about a city state of 50,000 or 200,000 people where most of the population is directly involved in food production, we’re talking about an interconnected civilization of billions where (at least in the west) almost no one does anything more than shop for food. People don’t provide for themselves… the system provides.

    Lastly, we’re not talking about a sustained 1C swing in global temperatures (and associated precipitation changes, which are the real killer) as in the past 2,000 years (with local swings of — I’m guessing — 3C to 5C). We’re talking about a 3C to 6C global swing, with local swings up to 8C or 12C or even 20C (very, very worst case, so don’t jump on that). And this isn’t something that will necessarily pass in “just” a few hundred years.

    [As a side note, part of the problem in your presentation is the use of “recent crop failures” as a reference point. No one is all that worried about what has happened to date. Today isn’t the problem. 30 or 50 years from today is the problem.

    This is very much like the rather optimistic man who fell off of the top of the Empire State Building. Each time he passed an open window, he was heard to say “so far, so good.”]

    Regarding the relatively stable climate regime of the last couple hundred years, has anyone asked whether that was the anomaly?

    Short answer — Yes, of course. That’s why deniers like to argue about the Medieval Warm Phallacy, because it implies that this has happened before, so it’s not a problem. But a rather large pile of paleoclimate evidence leads to a markedly different conclusion.

    Comment by Bob (Sphaerica) — 17 Dec 2010 @ 9:21 AM

  56. Gavin,

    thanks for taking the time to respond again at #51; your thoughts are much appreciated.

    Comment by Stephen — 17 Dec 2010 @ 9:30 AM

    Phil #47: As the author responsible for suggesting the selection criteria for tree-ring data in the Mann et al papers (1998, 1999, 2000, 2008, 2009 etc) I can tell you why the particular criteria were used, and why there is a clear basis for them. It is important to note that the requirement for at least 8 series was coupled with the criterion that there be a mean correlation equal to or greater than 0.5 between the individual series at one site that were combined to produce the site chronology.

    Wigley et al (1984) derived and tested a statistic they called ‘Expressed Population Signal’ (EPS) that has since been referred to in many, many publications (444 by December 17 2010 according to the ISI Web of Knowledge). They wrote ‘time series are averaged to enhance a common underlying signal or combined to produce area averages. How well, then, does the average of a finite number (N) of time series represent the population average…?’. To calculate EPS you need N and the mean correlation between the N series (rbar). In FORTRAN terms it is given by N*rbar/(1+(N-1)*rbar). If you write a simple MS-Excel formula you can calculate EPS for various values of rbar and N. Setting rbar as 0.5 shows EPS rising steeply up to 0.89 at N ~ 8, and then yielding very little increase in EPS for each additional series. By the way, the Wigley et al. (1984) paper includes not only testing of this and another statistic in real-life use, but also a formal derivation of them. Of course, as with any statistic, EPS is a guide to judgment and the assumptions on which it is based must be borne in mind.

    Given how much attention has been given to the problem of replication of tree-ring data in the published literature, as witnessed by the frequent citing of the Wigley et al (1984) paper, McShane and Wyner’s rejoinder reveals a distinct lack of familiarity with the most basic material on which they chose to pronounce.
    Reference:
    Wigley et al. 1984. Journal of Climate and Applied Meteorology, 23, 201-203.
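
    The EPS calculation is trivial to reproduce (a direct transcription of the formula above; the printout is the rbar = 0.5 case described):

        def eps(n, rbar):
            # Expressed Population Signal (Wigley et al. 1984)
            return n * rbar / (1 + (n - 1) * rbar)

        for n in (2, 4, 8, 16, 32):
            print(n, round(eps(n, 0.5), 3))
        # prints 2 0.667, 4 0.8, 8 0.889, 16 0.941, 32 0.97:
        # steep gains up to ~8 series, diminishing returns thereafter,
        # matching the rationale for the 8-tree minimum.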

    Comment by Malcolm Hughes — 17 Dec 2010 @ 12:18 PM

  58. That’s a good point Gavin. With his code in hand could you do that?

    [Response: Yes, and I have for each iteration I have looked at. And McKitrick's conclusions fail to hold up every time. There is a limit to how many times I'm going to do it again. Anyone else could do so themselves using the archived data for Schmidt (2009) or data from IPCC AR4. Note that this requires looking at individual runs, not ensemble means. - gavin]

    ##
    Thanks gavin. I think that kind of test could move discussions forward and get them off the whole discussion of “peer review.” The latter question just devolves into a side show, while the former is actually discussable even by people who don’t like each other personally. FWIW

    Comment by steven mosher — 17 Dec 2010 @ 1:52 PM

  59. Bob #55, re: “Medieval Warm Phallacy”,

    Clearly you’re as fed up as I am with the phallic imagery of the [winks knowingly] “hockey stick” debate.

    (Sorry. Friday evening. Will shut up now.)

    Comment by CM — 17 Dec 2010 @ 2:51 PM

  60. 59 (CM),

    Must be something in the air. Visit my recent comments here and here.

    Nothing, however, beats retitling the Musthavebeen Warm Period for quick, random laughs.

    Tis the season.

    (Sorry. Friday afternoon, too near to Xmas. Will shut up now, too.)

    Comment by Bob (Sphaerica) — 17 Dec 2010 @ 3:40 PM

  61. 1) M&W have convinced me that the past 30 years are sufficiently outside of the norm that there is something significant going on. Quite frankly, I don’t think it makes a difference what happened 1,000 years ago. The M&W paper has significantly strengthened the case that something is happening.

    2) The most telling comment actually appears as a footnote: “On the other hand, perhaps our model is unable to detect the high level of and sharp run-up in recent temperatures because anthropogenic factors have, for example, caused a regime change in the relation between temperatures and proxies. While this is certainly a consistent line of reasoning, it is also fraught with peril for, once one admits the possibility of regime changes in the instrumental period, it raises the question of whether such changes exist elsewhere over the past 1,000 years. Furthermore, it implies that up to half of the already short instrumental record is corrupted by anthropogenic factors, thus undermining paleoclimatology as a statistical enterprise.”

    Going back to the editorial, the comment regarding the null hypothesis remains. The physics of warming have been known since Fourier and Tyndall; the question is how quickly it will occur. Once again, I don’t care about what happened 1,000 years ago… the data are screaming that something is happening now.

    While the statistical methods cannot say what is causing the deviation, the first place to look is to somehow incorporate what the large supercomputer runs are saying as a proxy within the proxy.

    Bringing in another paper, the Dessler one… a 2% correlation coefficient? You have to be kidding… and the technique is very elementary, not even filtering out the known MJO impacts. I look at the interpretation in that paper and simply note that it convinces me that other things are much more important than the effect which they are trying to measure!

    No, somebody is going to have to form a proxy of the long term models to incorporate as “physics” for the past 30 years for these models to be worthwhile…

    3) There are lies, damn lies, and statistics–Mark Twain

    Comment by Energy Moron — 19 Dec 2010 @ 9:38 AM

  62. Figure 8 of M&W is the most important result. Once again this shows that, no matter how one slices and dices the information, something happened in the 1960′s. Statistics alone cannot tell you anything about the causes.

    It’s getting hotter folks.

    On a lighter note it should be noted that this website made an important advancement as to the possible causes back in 2007 with the rediscovery of the sheep-albedo feedback caused by the introduction of polyester back in the 1960′s.

    I have to point out that the effect (not the cause) was noted by E. Presley back in 1972 with his observation that “I feel the temperature rising” without realizing the impact of the polyester jumpsuit.

    Comment by Energy Moron — 19 Dec 2010 @ 10:36 AM

  64. Meta note on blog science:

    When folks reading here run into discussions about this topic elsewhere, it will inform them if you add a cross-reference to info above from Malcolm Hughes (17 December 2010 at 12:18 PM) that begins:
    “Phil #47: As the author responsible for suggesting the selection criteria for tree-ring data in the Mann et al papers (1998, 1999, 2000, 2008, 2009 etc) I can tell you why the particular criteria were used, and why there is a clear basis for them. It is important to note ….”

    Comment by Hank Roberts — 19 Dec 2010 @ 5:16 PM

  65. Gavin, since you commented on our paper I have put up my comments at my blog.

    Comment by Nicolas Nierenberg — 20 Dec 2010 @ 12:18 PM

  66. Malcolm #57: Thanks for the explanation. Once again the willingness of the real experts to help out those of us willing to learn is a great feature of this site.

    I don’t know of any other area of science where those unfamiliar with the field have the arrogant presumption that they know better than those who’ve worked with the data for decades, and publicly attack their work without a thorough analysis of how they got there. An outsider can of course expose errors missed by those close to the work, but I’m still waiting for that to happen with anything significant to any fundamental findings.

    Comment by Philip Machanick — 21 Dec 2010 @ 4:42 AM

  67. Nitpick: “A very odd criteria indeed” should be “A very odd criterion indeed”.

    Comment by Philip Machanick — 21 Dec 2010 @ 4:47 AM

  68. Nicolas #65: if you have something to add to the discussion, please add it here. Some of us are limited in time to go to every blog where someone has an opinion.

    Comment by Philip Machanick — 21 Dec 2010 @ 4:58 AM

    Philip,

    A very silly comment indeed. I am the co-author of the paper, and all you would have to do is follow the link to see my remarks. That is hardly going to every blog where someone has an opinion. If you don’t want my opinion about my paper, that’s fine. I really keep hoping we can raise the level of discourse.

    Comment by Nicolas Nierenberg — 21 Dec 2010 @ 1:03 PM

  70. 67 (Philip),

    Nitpick counter…

    “Criteria” has been in fairly common, informal usage as an alternative singular form of criterion for over half a century (see Webster here). As such, it is likely to soon be granted status as a new, official definition. Consider the similar but different case of the word “agenda”.

    The word “nitpick,” on the other hand, has itself been around for less than 50 years, and so its particular use in accosting the use of “criteria” in the singular puts you in a bit of an embarrassingly contradictory sticky wicket!

    Comment by Bob (Sphaerica) — 21 Dec 2010 @ 5:34 PM

  71. Nicholas – it really would be nice if you did put your comments here. The reason being is that it makes it very hard to follow the discussion if you put all your refutations/comments about what you see as errors etc that you feel that people like Gavin made in your blog versus putting them here where there has been an invitation to use this spot as a point where the discussion can be focused.
    Its very hard to keep flipping back and forth to try and see what you think Gavin did/didn’t do – particulary because you don’t provide any links to his side of the discussion.

    Comment by Donna — 22 Dec 2010 @ 11:53 AM

    Hi Donna, if you have questions for Nicolas Nierenberg you may want to post them at his site; however, I’ve pasted his response below:

    “Socioeconomic Patterns in Climate Data” by Ross McKitrick and me (MN 2010) has just been accepted for publication by the Journal of Economic and Social Measurement. It can be accessed here. This paper is largely in response to Gavin Schmidt’s 2009 paper “Spurious Correlations…” (S09) that I have discussed earlier. S09 was published in the International Journal of Climatology (IJOC), which subsequently rejected an earlier version of MN2010. I was very happy to provide a bit of the work on this paper. In particular I did some analysis, some modeling, and helped a bit with the editing.
    There is, as often seems to be the case in climate science, some heated discussion surrounding two distinct areas of our paper. First there is the question of whether we received a fair hearing in peer review from Journal of Climate. Second, once again Gavin is saying that our conclusions are incorrect. I should add that he has done this without benefit of reading our actual paper, but it seems fairly clear that reading the paper will not change his mind.
    For me there are two distinct fairness and good practice issues. First, S09 was clearly a response to Ross’s earlier work. I’m sure this is too much to ask, but Gavin should have sent his paper to Ross for comments before publishing. It would have been the right thing to do scientifically, but I’m not sure how much this is about science. Failing that, IJOC certainly should have asked the authors of the previous papers if they had comments. At the absolute minimum they should have offered space for responses in their publication. They didn’t do any of these things, and it doesn’t appear to me as if the reviewers of either S09 or MN2010 even read the predecessor papers. Second, the objections to MN2010 from IJOC didn’t have to do with whether we were right. They had to do with whether they felt the predecessor papers were the right approach at all. But the problem is that they were different, and less specific, arguments than those in S09. The weird thing is that these comments weren’t themselves subject to peer review or response, so from IJOC’s perspective Gavin’s incorrect arguments were allowed to stand, because the reviewers had altogether different objections to Ross’s earlier work. In my opinion they should have asked us to submit a response rather than a paper in order to resolve the situation, but they didn’t.
    In response to our paper Gavin is now making new technical arguments about why we are incorrect. The first argument is that he has drawn a graph that shows spatial autocorrelation (SAC) of the residuals. It is at least nice of him to acknowledge that the argument in S09 was incorrect, and that you need to look at the residuals. The problem is that he is still not doing any type of standard test for SAC. These are well known, and we have done those tests in our paper. This part is really amazing. I’m not an expert in this area, but back when I was looking at this I was able to quickly find a text on the subject and find these standard tests. Who would make a statistical argument without using the standard statistical tests in the literature? We have also shown the effect of allowing for SAC where necessary and that the results stand. So in my opinion that is what he needs to respond to. His second argument is that it is possible to see these types of correlations in a single instance of a GCM run. This will take a little more examining.
    In S09 Gavin showed several GCM runs. Using those he showed that some economic variables were significant in the same regression. Since, of course, socioeconomic variables can’t be influencing a GCM, this shows that these types of correlations are spurious. There are two problems. First, where they were significant the coefficients were very small, and of the opposite sign of those found with the real world climate data. Second, and rather ironically, if you allow for SAC they lose all significance, unlike those from real world climate data. In other words he managed to incorrectly argue that Ross’s earlier results were wrong because of SAC, and then make a flawed argument because he didn’t allow for SAC.
    Now he is making a different argument, which is that if you do a whole bunch of GCM runs you will see a result exactly like Ross’s earlier work. The problem is that none of the runs in S09 look like that, and he isn’t producing any others. If he does then I guess we could take a look. Even if it does happen sometimes, and I guess it could as a matter of random outcomes, it would need to happen a lot for our conclusions to be incorrect. That is the whole idea of significance testing.
    These results indicate urban heat island (UHI) and other measurement issues may be affecting the published long-term land temperature trends. I believe that this result is plausible given what is known about UHI and the lack of meta data for large portions of the world. The results also indicate that it is in fact areas where we have the least amount of meta data and the poorest records that are the most affected. Also remember that land makes up only one third of the Earth’s surface, so even if there were a 50% error in land trends this would only be a 15% difference in the overall trend. Therefore this shouldn’t be an argument over the big picture. But people building models need accurate measurements of the various portions of the temperature trend, so they should be quite interested if corrections need to be made. The results of any one study aren’t definitive of course, but they should be taken seriously, and additional work should be encouraged rather than huge amounts of energy and time being spent on spurious arguments trying to get rid of it.

    Comment by oneuniverse — 23 Dec 2010 @ 4:59 PM
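
    The “standard tests” for spatial autocorrelation mentioned in the comment above are worth making concrete. The most common is Moran’s I, which asks whether residuals at nearby locations are more alike than residuals at distant ones. The sketch below is purely illustrative, assuming only numpy; the station coordinates and residuals are hypothetical, not data or code from S09 or MN2010.

        import numpy as np

        def morans_I(x, w):
            # I = (n / sum_ij w_ij) * sum_ij w_ij z_i z_j / sum_i z_i^2,
            # where z = x - mean(x) and w is a spatial weights matrix
            z = np.asarray(x, float) - np.mean(x)
            num = (w * np.outer(z, z)).sum()
            return (len(z) / w.sum()) * num / (z ** 2).sum()

        rng = np.random.default_rng(0)
        lon, lat = rng.uniform(0, 50, 40), rng.uniform(0, 50, 40)
        resid = rng.normal(size=40)           # spatially random by construction

        # inverse-distance weights, zero on the diagonal
        d = np.hypot(lon[:, None] - lon[None, :], lat[:, None] - lat[None, :])
        w = np.zeros_like(d)
        w[d > 0] = 1.0 / d[d > 0]

        obs = morans_I(resid, w)
        # permutation test: shuffle residuals among stations to build a null
        null = np.array([morans_I(rng.permutation(resid), w) for _ in range(999)])
        print(obs, (1 + (null >= obs).sum()) / 1000.0)

    Under the null of no spatial structure, Moran’s I has expectation -1/(n-1); values well above that, judged against the permutation distribution, indicate spatially clustered residuals, which is exactly the condition under which ordinary regression significance levels become unreliable.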

    oneuniverse – thank you for posting the comments. I guess I would have preferred that Nicolas Nierenberg had done so, because that would be a sign to me that he understood the point I was making.
    Given some of his complaints about wanting people to email, to send papers to others for a read before publishing, his behavior seems odd. If he and his coauthor did specifically request that Gavin not be a reviewer, and he thinks that is fine, then why all the comments that others should have sent their work to those they were commenting on? That does not appear to be a consistent standard of behavior.
    Also, his last statement seems, for lack of a better term, “off” – it seems like their paper was treated seriously. Whether or not he agrees with the arguments/comments, people evidently did not just dismiss it out of hand but took the time to do a technical analysis of it and post what they thought were the weaknesses. That hardly counts as spurious arguments trying to get rid of it.

    Comment by Donna — 23 Dec 2010 @ 10:28 PM

  74. Bob (Sphaerica) #70: Webster’s has not yet accorded it official status. What next? Accepting that “begs the question” now means “invites the question”, rather than describing a flawed logical argument? “Agenda” is a different case, as you could argue it is a kind of collective noun. “Nitpick” may be new, but that doesn’t make it wrong.

    Back to the main topic: Nicolas #69: calling me silly is hardly raising “the level of discourse”.

    While I agree that there is value in letting an author know if you are planning a rebuttal, this is not in my experience (in fields other than climate science) universal practice. Did you ask your co-author if he routinely does this? If you read this article by Ben Santer, you may get some hint as to why the “level of discourse” may be problematic.

    Comment by Philip Machanick — 24 Dec 2010 @ 3:06 AM

  75. one 72: First there is the question of whether we received a fair hearing in peer review from Journal of Climate.

    BPL: Right, because the warmers are out to suppress dissent! Why don’t you try Energy & Environment, instead? I’m sure they’ll give you a fair hearing–i.e., one which lets you publish whatever you want, as long as it’s anti-AGW theory.

    Comment by Barton Paul Levenson — 24 Dec 2010 @ 6:58 AM

  76. One useful comment in the excerpt that oneuniverse pulled over from Nicolas Nierenberg’s blog is where he states “The weird thing is that these comments weren’t themselves subject to peer review or response, so from IJOC’s perspective Gavin’s incorrect arguments were allowed to stand”. I don’t know much about comments being subject to peer review or response, but it seems unlikely to me that any journal would do what this seems to imply. Comments are comments, and I would guess that most people know that, just like blog posts, they are hardly the final say on anything. Of course, my impression of science is that nothing is the “final say”, since people are always challenging how best to describe how things work, so that is not surprising. I guess what is surprising is that someone would think that the comments were the final say.
    I think that in science the closest thing to a final say is when you go back and see that a mechanism/model/idea in a paper is being cited and used in further research, so that it has become one of the building blocks of an understanding of some aspect of how things work.

    Comment by Donna — 24 Dec 2010 @ 10:15 AM

  77. Has anyone read the blog post by Dr. Rosemary Redfield — Arsenic-associated bacteria (NASA’s claims) — addressing the recent rather well publicized and seemingly ground-breaking paper by Dr. Wolfe-Simon, along with the long stream of comments??

    The parallels with McShane and Wyner are striking, as well as with climate change science in general and specifically many other “climate soap operas.”

    What is interesting is to see it happen in a completely different branch of science, in a microcosm totally isolated from the wider, carnivorous world of AGW science. It’s almost like a test tube experiment in itself of the collision of traditional science, media promotion, human nature and the Internet. There are even implications of using deceptive tricks in constructing graphs!!!

    Key components:

    1) The science goes against the mainstream.
    2) The science was announced in a live NASA press briefing (an unusual step, and one that apparently puts off other scientists).
    3) The idea that science was done “by press release” (i.e. a dramatic press briefing) was criticized.
    4) A scientist used a blog post to criticize (with details) the paper.
    5) That scientist also got more than a little “snarky” in the blog post.
    6) Commenters, often fairly erudite and apparently qualified, piled on, or attacked the blogger, or each other (including a fair number of comments on spelling and grammar, just like here!).
    7) Science was discussed.
    8) Grammar was discussed.
    9) Etiquette (in both science and blogging) was discussed.
    10) The quality of the reviewers was discussed.
    11) The value and quality of the peer review process was discussed (should the paper ever have made it past peer review?).
    12) The value and quality of the blog-post approach to science was discussed.
    13) The authors of the original paper asked that comments and questions be sent to them rather than discussed on the blog, which was seen as contradictory to the public media fanfare given to the paper’s announcement.

    I’d strongly recommend the blog post and comments to everyone. It makes fascinating reading, and the fact that the parallels occur in a closed environment (a sort of test tube experiment, where the scientists are in the tube!), isolated from the political nature of the climate change debate, helps to highlight that science itself may, under the pressures of the Internet Revolution, be on the cusp of a dramatic change in the paradigm that defines business-as-usual. At the least, it is at a crisis point where science as a culture needs to get together and lay out new ground rules on when, where, what and how things should be discussed, what counts as appropriate blog behavior for reputable scientists in the field, and how they treat each other.

    Basically, one could afford to be snarky in 1960, because such snarkiness was expressed in phone calls or conversations at conventions, in ways that lived and died in moments, or perhaps by he-said-she-said rumors circulated among other scientists. That is nothing like the open-to-everyone-out-there-forever buzz-mill that the Internet has created.

    Comment by Bob (Sphaerica) — 24 Dec 2010 @ 10:56 AM

  78. 74 (Philip),

    I was joking (mostly).

    Comment by Bob (Sphaerica) — 24 Dec 2010 @ 11:00 AM

  79. Dear moderators, I’ve posted a nearly identical comment to my #72 (reworded to get through the ‘duplicate comment’ block), apologies, please delete if possible. As explanation, I received an ‘ERROR: That reCAPTCHA response was incorrect.’ message for every submission, and thought that none had got through.

    Donna, my only complaint against M&N’s behaviour during the review process would be if they hadn’t provided Gavin with a copy of MN2010. I’m more interested in the technical arguments, however, which is why I reproduced his comments.

    Philip, I thought your comment to Nicolas was unnecessarily rude. It was maybe humorously ironic – you wrote: “Some of us are limited in time to go to every blog where someone has an opinion.”, yet you link your name to your blog, which is called “Opinionations – Philip Machanick’s views on the world”.

    BPL, you’ve wrongly attributed Nicolas’s comment as mine in your #75. Your own response then mischaracterises what he said – his comments are restricted to the particular review process, he makes no extrapolations (BPL: “Right, because the warmers are out to suppress dissent!”). I think you’ve created a strawman.

    Comment by oneuniverse — 24 Dec 2010 @ 12:51 PM

  80. oneuniverse – while your only complaint against M&N’s behaviour during the review process would be if they hadn’t provided Gavin with a copy of MN2010, I noted several things that seemed off. I remain unconvinced that they held themselves to the standard to which they seem to think others should have been held.
    I am also interested in the technical discussions though I suspect that some may be tired of explaining their points again.

    Comment by Donna — 24 Dec 2010 @ 4:28 PM

  81. Donna, there’s an interesting discussion at the ClimateAudit thread – e.g. commenter ‘pete’ has made some good critical points and Ross McKitrick has responded, including some new calculations, although more work needs to be done before the area illuminated by the criticism is fully explored.

    I don’t think that such discussion can be fairly described as one ‘side’ futilely explaining things to the other.

    Comment by oneuniverse — 24 Dec 2010 @ 6:28 PM

  82. one,

    Sorry I misattributed that quote. My other comments I think I’ll let stand. Anyone who complains that a respected journal didn’t give them a fair hearing usually has an axe to grind.

    Comment by Barton Paul Levenson — 25 Dec 2010 @ 5:24 AM

  83. > flipping back and forth to try and see what you think …
    > you don’t provide any links to his side of the discussion.

    Quoting and citing sources — not required for blog science.
    Nevertheless, it would be a good idea to raise the level that much.

    Comment by Hank Roberts — 25 Dec 2010 @ 10:25 AM

  84. Barton (#82): “Anyone who complains that a respected journal didn’t give them a fair hearing usually has an axe to grind.”

    Perhaps, but each case should be judged on its own merit. Not all complaints against respected institutions are bogus. Not sure if you’ve read McKitrick and Nierenberg’s letter to the IJOC explaining their grievances with the review, but if their descriptions of and quotations from the reviewers’ comments are accurate, they have a good point – I’d agree with them that two of the three reviewers seem to have raised many spurious or incorrect criticisms as justifications for not publishing their paper. You’ll note that MN weren’t given an opportunity to respond to the reviewers’ comments.

    Hank: “Quoting and citing sources — not required for blog science.”

    Nicolas’ post was written partly in response to Gavin’s comment #39. Both made reference only to S09 and MN10.

    Comment by oneuniverse — 25 Dec 2010 @ 3:10 PM

  85. Re #72: It appears that on his blog (see #65), in a comment dated 12/26/2010, Nicolas Nierenberg acknowledges not having tested whether “the significance was lost if you adjusted for SAC” also applies if you use individual runs instead of ensembles. (This refers to the false-positive significance in GCMs.) He also indicates he doesn’t intend to make that test, as “we could keep going indefinitely”, meaning he intends to ignore Gavin’s point since he doesn’t know if it would be the “definite test”.

    Comment by Someone — 1 Jan 2011 @ 1:27 AM
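
    The false-positive significance referred to in #85 is easy to reproduce in a toy Monte Carlo. The sketch below is illustrative only (one-dimensional synthetic fields, nothing from S09 or MN2010; assumes numpy and scipy): two independent but spatially smoothed fields, regressed on one another while ignoring the autocorrelation, reject a nominal 5% significance test far more often than 5% of the time.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        def smooth_field(n, passes=30):
            # white noise repeatedly averaged with its (periodic) neighbours,
            # giving a strongly autocorrelated field containing no real signal
            x = rng.normal(size=n)
            for _ in range(passes):
                x = (x + np.roll(x, 1) + np.roll(x, -1)) / 3.0
            return x

        trials, hits = 1000, 0
        for _ in range(trials):
            y, x = smooth_field(200), smooth_field(200)  # independent by construction
            if stats.linregress(x, y).pvalue < 0.05:     # naive test ignoring SAC
                hits += 1
        print(f"{100.0 * hits / trials:.0f}% rejections at a nominal 5% level")

    This is why testing for, and adjusting to, autocorrelation in the residuals, rather than taking significance from a naive regression at face value, is central to the disagreement over S09 and MN2010.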

  86. > on his blog

    December 26, 2010 10:20 AM

    Comment by Hank Roberts — 1 Jan 2011 @ 3:15 AM

  87. one – “Perhaps, but each case should be judged on its own merit. Not all complaints against respected institutions are bogus. Not sure if you’ve read McKitrick and Nierenberg’s letter to the IJOC explaining their grievances with the review, but if their descriptions of and quotations from the reviewers’ comments are accurate, they have a good point – I’d agree with them that two of the three reviewers seem to have raised many spurious or incorrect criticisms as justifications for not publishing their paper. You’ll note that MN weren’t given an opportunity to respond to the reviewers’ comments”
    To decide if there was really a case against the journal, a whole lot of missing information would be needed. You said “if their descriptions of and quotations from the reviewers’ comments are accurate”, which, absent the actual complete documents, there is no way to prove one way or the other. Also, their claim is that they were treated badly (implying that they were given special treatment, different from what others receive). They need to prove that too – if others also get criticism that the authors think is spurious or incorrect, that would tend to show it’s not a case of “they are out to get us” but of reviewers sometimes making mistakes (if the criticism really is inaccurate) or not seeing things exactly as the authors do (what the author thinks is spurious, the reviewer thought was pertinent).
    I really doubt that they can show that they were treated worse than others – particularly when they seem to have gone out of their way to have the person who was most likely to be the best reviewer deliberately excluded.
    And then you say that MN weren’t given the opportunity to respond to the reviewers’ comments. Again, this is only relevant if the common practice at that journal is to have authors respond to the reviewers’ comments and they were treated outside that norm (which I doubt).
    They come across more as people who are new to a process and don’t understand how it works than as having any true case of unequal treatment. And like many people who believe that they are “right”, they are having some trouble understanding why others say they don’t agree and think that they missed some important items. Of course they (MN) don’t like it, but it hardly constitutes any sort of proof of a conspiracy against them.
    I read the blog etc. and find some of the comments weird. A comment was made that a critique wasn’t part of the original concerns raised, so it would basically be ignored. If I cared about whether what I was proposing was accurate or not, then whether a point was raised today or yesterday would be irrelevant. Whether the critique was valid, and whether it raised questions about the results, is what matters, not some weird sort of “well, you raised that point too late”.
    Maybe they think this is a game of some sort rather than an issue with profound implications. It isn’t.

    Comment by Donna — 1 Jan 2011 @ 4:08 PM

  88. Two points: It is standard in statistical journals to have the discussions of a paper be judged by the editors only. They are typically not peer reviewed. The original paper, however, is typically reviewed more carefully than most.
    (I know this because I am one of two co-editors of Environmetrics).
    Second, the lasso method is a standard approach to constrained regression in modern statistical science. The constraint is on the sum of the absolute values of the coefficients.
    That being said, I think the paper by McShane and Wyner suffers from a lack of understanding of the data. But it is an effort by skilled statisticians to analyze what they think is a standard data set. As such it is a useful addition to the literature, and what I really think is needed is more skilled statisticians getting interested in the field. There has been some growth, lately, and I am involved in several efforts to increase that growth.

    [Response: The problem with basically unsupervised discussion on an issue that has a lot of 'background' is that the quality of many statements rarely rises above a blog comment. Whether M&W's efforts are 'useful' depends clearly on whether the few innovative elements they introduced can be extracted from post-hoc justifications like the idea that large interannual variability must be plotted to avoid anyone noticing the variation in skill in reconstructing long term variability across different methods. Their use of lasso in one section to demonstrate how bad reconstructions are, and then its abandonment for another method in another section without any declared justification, is another issue that peer-review from within the paleo-climatological community would have likely flagged. As the discussion papers clearly show, there is no shortage of independent people capable of assessing new statistical papers in this field. Outside peer-review is even more warranted when people come in and make sweeping statements based on, as you say, a lack of understanding of the data, and a lack of having even read (let alone understood) most of the extant literature. I'm sure a better managed approach from the outset would have given rise to a much more focused and useful exercise. - gavin]

    Comment by Peter Guttorp — 11 Jan 2011 @ 1:09 AM
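
    For readers unfamiliar with the lasso Peter Guttorp describes in #88: it fits a least-squares regression subject to a bound on the sum of the absolute values of the coefficients (equivalently, with an L1 penalty), which drives many coefficients exactly to zero. A toy sketch, assuming scikit-learn, with wholly synthetic stand-ins for a proxy matrix and a temperature series (not the M&W data or code):

        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(42)
        n_years, n_proxies = 150, 100
        X = rng.normal(size=(n_years, n_proxies))   # stand-in proxy matrix
        beta = np.zeros(n_proxies)
        beta[:5] = 0.8                              # only 5 proxies carry signal
        y = X @ beta + rng.normal(scale=0.5, size=n_years)  # stand-in temperatures

        # alpha sets the strength of the L1 penalty: the larger it is, the
        # tighter the constraint on sum(|beta_j|) and the fewer proxies kept
        fit = Lasso(alpha=0.1).fit(X, y)
        print("proxies retained:", int(np.sum(fit.coef_ != 0)))

    How alpha is chosen, and whether the resulting sparsity is appropriate for noisy, redundant proxy networks, is precisely the kind of methodological choice that the inline response to #88 argues needed an explicit justification.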

  89. Peter Guttorp@88 I would suggest that any value the McS&W paper has is largely negated by its implicit encouragement for statisticians to analyse environmental datasets without fully engaging with the existing literature and hence gaining an awareness of the likely issues. It would indeed be a good thing for more statisticians to become interested in the field; however, it is best done through close collaboration with climatologists (as McS&W inadvertently demonstrate).

    BTW the McShane and Wyner paper will be discussed at the Cross-Validated journal club later this month; I expect the discussion would greatly benefit from statistically minded RCers’ contributions!

    http://meta.stats.stackexchange.com/questions/685/second-cross-validated-journal-club

    Comment by Dikran Marsupial — 12 Jan 2011 @ 4:33 AM
