RealClimate

Comments


  1. A related, common confusion is the meaning of statistical significance in all fields, not just climatology. Often it is explained “there is only a five percent chance that this experiment’s results were due to chance.” Lay folk often take that to mean “there is only a five percent chance that this drug doesn’t work.” Nope.

    The significance test probability of that experiment is relevant to an extraordinarily narrow conclusion about that particular experiment. You can repeat that particular experiment slavishly hundreds of times and thereby verify that probability, but the experiment’s design might well be fundamentally flawed in regard to the take-home message about the drug’s effectiveness.
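
    That narrow claim can be checked directly by simulation. A minimal sketch (toy numbers only: a two-arm trial of a drug with zero true effect, tested at roughly the 5% level with a large-sample approximation):

      import numpy as np

      rng = np.random.default_rng(42)
      n_trials, n_per_arm = 10_000, 50

      def significant(treat, control, crit=1.96):
          # two-sample test at roughly the 5% level (large-sample approximation)
          diff = treat.mean() - control.mean()
          se = np.sqrt(treat.var(ddof=1) / treat.size + control.var(ddof=1) / control.size)
          return abs(diff / se) > crit

      # a drug with NO true effect: both arms drawn from the same distribution
      false_positives = sum(
          significant(rng.normal(0, 1, n_per_arm), rng.normal(0, 1, n_per_arm))
          for _ in range(n_trials)
      )
      print(f"'significant' trials of an ineffective drug: {false_positives / n_trials:.1%}")  # ~5%

    Roughly 5% of the null experiments come out “significant”, which is exactly the narrow probability statement; it says nothing directly about the broader question of whether a real drug works.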

    There is no rock-solid, objective, quantitative way to compute the probability of the real, broader question of whether the drug is effective. That’s the role of “consensus” of expert scientists’ judgments.

    Comment by Tom Dayton — 8 Feb 2009 @ 7:59 AM

  2. To be clear, then, was there no need for dLM06 and MM07 to have attempted to provide transparency? Could you have made your paper’s essential points without access to archived data and/or code or, in one case, a quick e-mail to an author?

    You have offered this new thread, presumably, in the context of the debate about how much transparency should be offered re. the Schmidt paper now that a few minor errors have surfaced. I take it that you think he has done enough at this point and that there is no further need to help Murelka, for example, work through his reconstruction.

    [Response: Perhaps you could be clearer in your questions. What 'Schmidt' paper are you talking about (do you mean Steig)? and who is Murelka? As to what was necessary above, I made it clear (I think) that both sets of authors provided enough data and information to replicate their results. - gavin]

    [Response: If you are talking about Steig et al, please don't repeat the dishonest talking point that "errors have surfaced". Surely, if you've actually read what we've written on this, you are aware that there are no errors at all in the analysis presented in Steig et al, the problems are only with some restricted AWS stations which were used in a secondary consistency check, and not the central analysis upon which the conclusions are based (which uses AVHRR data, not AWS data). Those who continue to repeat this manufactured controversy will not find their comments published -mike]

    Comment by wmanny — 8 Feb 2009 @ 8:55 AM

  3. So seriously, as the result of this process, which you felt to be useful, you don’t think the authors of papers should provide as much as possible? In both cases you cited, this either did save you time or would have. As you point out, the replication of the previous analysis was a necessary means to an end.

    I agree completely that science is generally not determined by trivial errors. However, it is a very fine line when you cross over from trivial to important. The tests that you did in these cases (changing time periods, looking at autocorrelation, etc.) seem non-trivial to you, but that is because you found them relevant. The original authors might not agree.

    [Response: No. They should provide as much as necessary. - gavin]

    Comment by Nicolas Nierenberg — 8 Feb 2009 @ 9:37 AM

  4. Re Murelka, I’ve seen references made to a Roman M. Guess he’s the same, since he’s found in even today’s discussions on Steig’s latest work.

    Comment by Sekerob — 8 Feb 2009 @ 10:42 AM

  5. Outstanding post Gavin. The essence of what the scientific enterprise is all about. The crux: don’t get hung up on uncertainties or supposed system determinants that may in fact be unimportant. Understand sensitivities, don’t leave out the major players, and don’t ignore scale issues.

    “…the vast majority of papers that turn out to be wrong, or non-robust are because of incorrect basic assumptions, overestimates of the power of a test, some wishful thinking, or a failure to take account of other important processes…”

    I would add poor understanding of appropriate statistical methods, inattention to scale issues, and lack of existing, appropriate empirical data with which to test proposed models, the latter usually being beyond the control of the researcher.

    Comment by Jim Bouldin — 8 Feb 2009 @ 10:58 AM

  6. Gavin, Thanks for this post. It makes some very important points that nonscientists often miss–namely that “replication” is not merely “copying”. Underlying the scientific method is the assumption that there is an objective reality that will point to the same conclusion for any valid method of inquiry. Thus, if a scientist understands the data sources and what constitutes a valid method, the new analysis should replicate the conclusions of the previous analysis. If it is truly independent, it will also validate the methodology of the previous study to some extent (i.e. it will show that any errors do not materially change the conclusions).
    Another common mistake: The model is not the code. Rather the code is an expression of the model. So archiving the “code,” which is obsolete by the time the study is published, doesn’t really accomplish that much.
    Finally, a mistake even many scientists make: If you want cooperation, learn to play nice with others. If someone is a jerk and making accusations of fraud, incompetence or conspiracy, why in the hell would anyone want to work with them?

    Comment by Ray Ladbury — 8 Feb 2009 @ 11:11 AM

  7. Nicolas, Just because Rutherford said that physics should be explainable to a barmaid does not mean that a journal article must be a cookbook so that a barmaid can copy (not replicate) what was done. There is a reason that it takes ~10 years of graduate and undergraduate education in a specific field before one is usually considered capable of contributing to that field. There is also a reason why the scientific method–including publication–has evolved the way it has. It’s the most efficient way of elucidating understanding of the natural realm.

    [Response: For those who have been following the Steig et al paper discussion, this is a different (not Scott!) Rutherford ;) - mike]

    Comment by Ray Ladbury — 8 Feb 2009 @ 11:20 AM

  8. reproducibility of an analysis does not imply correctness of the conclusions.

    It’s true that reproducibility doesn’t prove correctness, but it does rule out several possible failure modes and is therefore a necessary-but-not-sufficient criterion on which to judge the plausibility of a conclusion you previously had reason to doubt.

    Suppose a new study has been published which claims to overturn the prior consensus view on some issue. Before this study came out, everybody believed “A”, but the author of Study X has done some nontrivial math and analysis which he claims supports conclusion “B”. How should we decide whether to update our beliefs?

    One unfortunate possibility is that the author of Study X might be lying or might have made a simple mistake that affected the conclusion. If the code is available, we can quickly rule those possibilities out and move on to analyzing the sources and methodology in greater detail. But until the code is available we can’t rule those out and the likely level of dialog is correspondingly impoverished.

    If I’m a layman with no relevant expertise I might not be qualified to judge whether the code/data is correct, but even I can probably tell if it is present. If it is present, simply knowing that other people who share my attitudes on the subject matter have had the opportunity to inspect it gives me warm fuzzy feelings about the possibility that the new study might be valid. And when – inevitably – a few bugs are discovered, if the code is present I am more willing to accept assurances regarding the minimal impact of those bugs than I am when I essentially have to take it on faith.

    To answer your questions in the other thread: yes, we do see studies receive this level of scrutiny in other fields in similar circumstances. Consider how we treat economic studies that claim to reach new and unlikely conclusions related to gun control or minimum wage laws.

    Comment by Glen Raphael — 8 Feb 2009 @ 11:28 AM

  9. 2r1 I apologize for the typo (Schmidt should be Steig, yes) and Murelka is the person over at CA trying to reconstruct Steig’s AWS backup analysis. The question I was trying to ask [and yes, it is a mere layman’s question] is: are you attempting to defend the lack of transparency through the example of your paper’s refutation of the other two papers? In the context of the Steig paper controversy/tempest-in-a-teapot, the criticism has been that he has not done enough to make his work transparent, and there have been many arguments put forth here about why transparency is not necessary. I find it confusing, then, for you to have offered up three transparent papers to make an argument that transparency is not really required. I assume I am misinterpreting something, then, and I am hoping you can clear that misinterpretation up.

    [Response: let's get something straight. Scientific work needs to be replicable - I have never suggested otherwise, so please stop accusing me of being against transparency every time I point out that it is more complex than you think. My two examples here showed that replicability doesn't require what is being continually demanded every time some one doesn't like the conclusions of a study. Looking at the Steig et al data page, it is clear that there is enough information to replicate the AWS reconstruction with only a little work, which presumably interested parties will put in. - gavin]

    r2 As to the “errors that have surfaced”, I don’t know what else to call them, since the BAS is listing “corrections”, and I understand, because of your previous explanation on “Robust”, that they have only to do with the AWS backup, which is why I referred to them as “minor”. I don’t see what’s so dishonest about that, though I understand your sensitivity about the integrity of the study.

    [Response: Errors in a secondary input data set are not the same as errors in the analysis. And they have already been incorporated and make little or no difference to the results. - gavin]

    Comment by wmanny — 8 Feb 2009 @ 12:11 PM

  10. Wildly OT

    What kind of effect do fires, like the current one in Australia, have on climate? Do they cool, like Pinatubo, or do the soot and smoke trap more heat than they turn away? Since drought and attendant fires are expected to increase due to AGW, this seems like a reasonable issue to explore.

    [Response: Short answer - lots. Biomass burning is a big source of black carbon and organic aerosols (warming), CO and VOCs (ozone precursors), also SO2 (leading to sulphate aerosols) (cooling). It's complex, and depends on many poorly determined factors (fuel, fire temperatures etc.). - gavin]

    Comment by duBois — 8 Feb 2009 @ 12:13 PM

  11. One unfortunate possibility is that the author of Study X might be lying or might have made a simple mistake that affected the conclusion. If the code is available, we can quickly rule those possibilities out and move on to analyzing the sources and methodology in greater detail. But until the code is available we can’t rule those out and the likely level of dialog is correspondingly impoverished.

    So the “skeptic” position is that climate scientists are lying or incompetent until one can prove that they aren’t.

    This open admission does more to explain the antagonism between working climate scientists and the stone-throwing mob than anything I’ve seen posted thus far …

    Comment by dhogaza — 8 Feb 2009 @ 12:14 PM

  12. Nice work Gavin.

    I’ve never understood all the hoopla about the codes. If papers omitted relevant data sources, there might be an argument.

    Comment by wildlifer — 8 Feb 2009 @ 12:20 PM

  13. Dr. Schmidt’s work illustrates one of the major differences in ‘culture’ surrounding replication. In one culture the issue is whether your calculations and their description actually match each other. In the other culture, the goal is to reinvent the analysis guided by the description.

    Among the former culture, which Dr. Schmidt has tentatively entered, people release their code and data along with their results and description of the procedure. Using materials provided by the authors, occasionally subtle but important errors are found later on by people looking at the code. That is, the code and the published description/implications do not jibe. Sometimes even when the code checks out they may show that some step in the procedure that is a judgment call has an important impact.

    An analogy in symbolic math: a person states a theorem and provides a detailed proof. Another person confirms the proof but notices that one step includes “assume a solution to f(x)=0 exists.” The replicator goes on to show that this is only true under more restrictive conditions than the other assumptions. They publish a follow up that shows that the initial results are less interesting than advertised. (As I understand Dr. Schmidt, the analogy in the other culture would be as follows: the full proof is not released, just key steps. Others would recreate the full proof from these guideposts. If they got stuck in this process they would expect no help from the original authors, especially if they were considered antagonistic.)

    A real-life example in the first culture: Donohue and Levitt published an (in)famous paper concerning abortion and crime [Quarterly Journal of Economics 119(1) (2001), 249–275].
    By releasing the Stata code they allowed other researchers to discover an error [Foote and Goetz, Quarterly Journal of Economics, February 2008, Vol. 123, No. 1: 407–423]. It turned out that they had not actually reported state-level fixed effect results as the article claimed (due to an error in the code). The replicators went on to make a case that the results are much less robust than advertised. Besides getting the chance to get rich popularizing their faulty result (i.e. Freakonomics), the original authors got a chance to reply [Donohue and Levitt, Quarterly Journal of Economics, February 2008, Vol. 123, No. 1: 425–440]. Now people can make up their own minds.
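
    To make the fixed-effects point concrete, here is a synthetic sketch (toy numbers, in no way the actual Donohue and Levitt data or code) of how an estimate can change depending on whether the fixed effects promised in the text actually made it into the calculation:

      import numpy as np

      rng = np.random.default_rng(0)
      S, T, beta = 30, 20, 0.5                  # "states", "years", true coefficient
      alpha = rng.normal(0.0, 2.0, S)           # state-level fixed effects
      x = alpha[:, None] + rng.normal(0.0, 1.0, (S, T))   # regressor correlated with the state effect
      y = alpha[:, None] + beta * x + rng.normal(0.0, 1.0, (S, T))

      def ols_slope(x, y):
          x, y = x.ravel(), y.ravel()
          xd = x - x.mean()
          return np.sum(xd * (y - y.mean())) / np.sum(xd * xd)

      pooled = ols_slope(x, y)                               # fixed effects omitted
      within = ols_slope(x - x.mean(axis=1, keepdims=True),  # demeaning by state: the usual
                         y - y.mean(axis=1, keepdims=True))  # fixed-effects estimator
      print(f"pooled: {pooled:.2f}  fixed effects: {within:.2f}  true beta: {beta}")

    The pooled estimate is badly biased (around 1.3 here) while the within estimate recovers the true 0.5; whether the demeaning step actually ran is exactly the kind of thing only the released code can settle.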

    Of course, if the code and the description check out then one usually would expect to get no publication out of it. In replicating the results of MM07 using materials provided publicly by the authors, Dr. Schmidt has achieved what would be expected of a senior project for an applied econometrics undergraduate course. My congratulations to him. I would give him an A-.

    [Response: You are too generous. I agree, replication projects are generally good for students to do, but without further work, the scientific value is small. But the point here is not to replicate this for the sake of it, but to discover whether the conclusions drawn from their analysis were valid by testing their procedure in test situations where one knows the answer already. I certainly wouldn't have written a paper based purely on a replication without looking further into the science of what was being analysed. I gave this example to demonstrate not how clever I am (oooh! I can replicate!) but to highlight issues that people were discussing without nuance or practical examples. - gavin]

    Comment by Chris Ferrall — 8 Feb 2009 @ 12:35 PM

  14. I think some people are arguing for the sake of trying to establish (pardon my oversimplification) that science is not sound unless the scientists can show every scrap of thought process.

    Maybe they think this will show that this whole silly idea about humans causing the earth to warm will go away.

    Just my thoughts.

    Gavin, excellent post!

    OT: For those who may be interested, I did a new rebuttal of John Coleman’s article “The Amazing Story Behind the Global Warming Scam”.

    http://www.uscentrist.org/about/issues/environment/john_coleman/the-amazing-story-behind-the-global-warming-scam

    It’s a fun article, especially where he ambiguously insinuates that catalytic converters helped reduce CO2 output from car exhaust (reduction by association with reduction of other pollutants?).

    Comment by John P. Reisman (OSS Foundation) — 8 Feb 2009 @ 12:44 PM

    Glen (#8)

    “One unfortunate possibility is that the author of Study X might be lying or might have made a simple mistake that affected the conclusion. If the code is available, we can quickly rule those possibilities out and move on to analyzing the sources and methodology in greater detail. But until the code is available we can’t rule those out and the likely level of dialog is correspondingly impoverished.”

    I’d have to disagree with that sequence of steps. If the author(s) did any sort of decent job in the Methods section, there should be a number of questions you could probe in your mind, alluded to above. The very last thing I would do is start wading into reading computer code, and I highly doubt there would be anything “quick” about it if I did. At any rate, the lion’s share of science is done with standard methods in +/- standard statistical packages, and this “code” consideration doesn’t even apply. You look for bigger-type study design issues long before you think about computer code. And lying? You can rule that out in 99.9999% of the cases just by knowledge of science culture and practice.

    Comment by Jim Bouldin — 8 Feb 2009 @ 12:44 PM

  16. What an excellent posting! Thank you.

    Denialists are now shut down; however, their secondary goal, delay, continues to be wildly successful.

    Excessive challenges to your work mean that you must now do more precise, more careful science, but you are doing it slower. And then you re-do it.

    This means we need more science.

    Or perhaps public policy can catch up some.

    Comment by Richard Pauli — 8 Feb 2009 @ 1:27 PM

  17. Chris Ferrall nailed it. Many skeptics come out of competing scientific traditions (math, stats, econ, compsci…) in which “can I replicate this exactly?” is a primary concern.

    JimB and dhogaza:

    I work in software quality assurance. One thing I’ve learned from that tradition is that the person who wrote a batch of code often isn’t the best person to evaluate its robustness. Software developers acquire blind spots; they have hidden assumptions they don’t even realize are assumptions. Somebody who expects the code to work uses it in the manner they expect it to work and might on that basis confidently say “this works flawlessly!” right before a QA engineer finds dozens of bugs in the same program by using it a slightly different way or different environment or different mindset. The QA engineer is indeed trying to find something wrong with it but that’s a good thing, because the code is intended to ultimately work in a much larger context than “just when this one guy uses it”. Being vetted makes programs better.

    My default assumption about both software developers and researchers is merely that they are fallible. All code has bugs. Some bugs “matter” and many go unnoticed until an antagonistic second party – somebody who expects to see problems – attempts replication.
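
    A toy illustration of that kind of blind spot (a hypothetical function and made-up data, not anyone’s actual analysis code):

      import numpy as np

      def trend_per_decade(years, values):
          # least-squares trend, written assuming complete, clean input data
          return 10.0 * np.polyfit(years, values, 1)[0]

      # the developer's own test: a clean synthetic series -- "works flawlessly"
      years = np.arange(1979, 2009)
      rng = np.random.default_rng(0)
      clean = 0.02 * (years - 1979) + rng.normal(0, 0.1, years.size)
      print(round(trend_per_decade(years, clean), 2))   # ~0.2 per decade, as expected

      # the QA-style test: the same routine, fed data in a slightly different but realistic state
      dirty = clean.copy()
      dirty[5] = -9999.0                                # a common "missing value" sentinel
      print(round(trend_per_decade(years, dirty), 2))   # wildly wrong, and no warning raised

    The developer’s own test passes; the slightly different one silently produces garbage. That is the sort of thing an outside pair of eyes tends to find first.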

    Thus, if I explicitly withhold information that would allow my code to be verified by others that sends a message that I don’t care if my code is robust. This should reduce confidence in any conclusions reached based on that code.

    My conclusion: Ease of replicability does correlate to the believability of the scientific result.

    Caveat: this applies most strongly in the case of studies that utilize novel computational procedures.

    Comment by Glen Raphael — 8 Feb 2009 @ 2:32 PM

  18. I’m baffled on several counts.

    First of all, teapot or not, I think I had a hand in stirring up this tempest and wonder why I am neither credited nor blamed.

    Second, I can’t understand how the distinction between “all of the code” and a “barely sufficient” amount of code (the distinction that got me mixed up in all this) seems to be escaping people. In theory they may be the same, assuming that the code was error-free to begin with, but in practice they are drastically different.

    Third, the statement “Ease of replicability does not correlate to the quality of the scientific result,” delivered with a flourish at the end of this article, seems totally obvious to all concerned. What it affects is the ability of others to advance the conversation, either by building on the result or by challenging it.

    Fourth, the discussion in point 3 above is baffling. Nicolas asks “So seriously, as the result of this process, which you felt to be useful, you don’t think the authors of papers should provide as much as possible?” and Gavin replies “No. They should provide as much as necessary.” The only way I can reconcile this is with different meanings of “should”. Arguably, the minimum necessary to replicate the result, in the 19th century sense, is what is traditionally required for publication. If, however, one places the advancement of knowledge ahead of the advancement of one’s own position, a different normative structure applies. I can see how Gavin’s position is legalistically correct, but with all due respect it seems hollow.

    I will note that the distinction between standard practice among scientists and those of other trained producers of digital products is remarkable, and that repeatability of a much higher order is built into commercial workflows everywhere. If nothing else, this feeds into the perception by our critics that we are hiding something. To the extent that what we are doing is actually important, meeting the minimal standards of publication that have existed in the past is simply not rising to the occasion.

    Finally, I again refer the interested reader to the recent special issue of IEEE Computers in Science and Engineering on the subject of “Reproducible Research”, and notably the introductory editorial essay by Claerbout and Fomel.

    [Response: Ok, let's talk cases: The STATA script used by MM07 was indeed 'all the code'. Did it aid replication? No. dLM06 provided no code at all. Did that make a difference? No. But because both papers provided documentation, pointers or data, replication was relatively easy. Thus the 'all the code' mantra is not correlated to the ease of replicability. If you use exactly the same programs/styles/proprietary software/operating system then a script like the STATA one would be instantly usable to you. But that doesn't include 95% of people. Thus more general and traditional concepts of replicability have to dominate. The issue here is that there is always a cost to any new standard. Time taken to provide unnecessary and superfluous details is time taken away from blogging (er, doing real work). Given a cost, the benefit needs to outweigh it - and you seem to be implying that even mentioning this is somehow 'old-school'. It's not, and if you want to bring people along with you on this, you need to be explicit about the costs as well as trumpeting the benefits, otherwise it's going to be seen as utopian and unrealistic. - gavin]

    Comment by Michael Tobis — 8 Feb 2009 @ 2:42 PM

  19. Dr. Schmidt writes:
    I gave this example to demonstrate not how clever I am (oooh! I can replicate!) but to highlight issues that people were discussing without nuance or practical examples.

    Perhaps he is not aware that the issue of replication in the ‘first culture’ has been explored with nuance and practical examples. So perhaps that explains why they are surprised/perplexed/infuriated that the other culture not only does not support it but actually trivializes that culture and claims the superiority of its own approach, apparently ignorant that other fields have some familiarity with non-experimental data.

    I recommend people in the other culture consider this paper
    http://economics.ca/cgi/jab?journal=cje&article=v40n3p0715

    And from that I draw this quote from a highly influential economist who ‘benefited’ from this type of replication.
    “The best model for this admission is Feldstein’s (1982) ‘Reply,’ the first sentence of which was ‘I am embarrassed by the one programming error that Dean Leimer and Selig Lesnoy uncovered but grateful to them for the care with which they repeated my original study.’”

    Dr. Schmidt is right, I was being too generous. I really would have given his replication a B-, but given that he misses the point of my comment I would adjust the grade down.

    [Response: One of the guidelines in cross-cultural communication is that one should learn not to patronize people whose culture you don't appreciate. I am perhaps sometimes guilty of that, but so are you. If you want to move past snide insinuations of cultural superiority, you would be most welcome. There may well be lessons worth learning from other fields, but unless one recognises that different fields do have a different culture, imposing inappropriate standards that work well in one case may not be that fruitful. The biggest barrier is related to how results are valued in a field. I would venture to suggest that it is very different in economics than in climatology. We tend to grade based on getting the answer right, rather than the attitude of the student. - gavin]

    Comment by Chris Ferrall — 8 Feb 2009 @ 2:57 PM

  20. Michael, I replied just after Eric’s inline response — I pointed out how the guy first asked for “the code” then “your code” and finally “antarctic code” — each request making clearer he knew little, and escalating as Eric was leaving and couldn’t respond.

    You didn’t know who the guy was at the time. Would that have changed how you responded — to his specific request, at that particular time, to Eric in particular, on this?

    Hard cases make bad law, as they say. This guy’s request is not a real good example of an appropriate request for a scientist’s available time.

    There’s a reason scientists cooperate with other scientists. Because they can, eh? It’s mutual.

    But this guy seemed more like: http://abstrusegoose.com/98

    Maybe he’ll publish, and prove me wrong. Let’s see.

    Comment by Hank Roberts — 8 Feb 2009 @ 3:09 PM

  21. My default assumption about both software developers and researchers is merely that they are fallible.

    You brought in the word “lying”, not us: “One unfortunate possibility is that the author of Study X might be lying “.

    Somebody who expects the code to work uses it in the manner they expect it to work and might on that basis confidently say “this works flawlessly!” right before a QA engineer finds dozens of bugs in the same program by using it a slightly different way or different environment or different mindset.

    So you’re saying a QA engineer doesn’t simply replicate the tests done by the developer?

    Are you suggesting that a QA engineer does something a bit different than the developer, attacks the code in different ways than the developer, in order to learn whether or not it is robust. In other words, are you suggesting that a QA engineer acts ANALOGOUSLY TO THE EXAMPLE PROVIDED BY GAVIN ABOVE RATHER THAN SIMPLY MIMIC WHAT THE DEVELOPER DID?

    Gosh.

    Here’s another trick question: does a QA engineer ever design tests without reading the code being tested? Or would you claim that the only way a QA engineer can do their job is to be able to read the code before designing tests?

    When a company like MS or Apple releases beta versions of new operating systems to developers of third party software to test, do MS and Apple release full source of that operating system to each of these third party software developers?

    Do these third-party software developers insist that the only way they can test the new version is to have full access to all the source?

    Comment by dhogaza — 8 Feb 2009 @ 3:12 PM

  22. I will note that the distinction between standard practice among scientists and those of other trained producers of digital products is remarkable, and that repeatability of a much higher order is built into commercial workflows everywhere

    But not into the production of software resulting from research into software engineering or computer science. Yet strangely you and the others aren’t calling for such standards in those fields.

    Comment by dhogaza — 8 Feb 2009 @ 3:18 PM

  23. Glen, #17: the “issue” isn’t whether the code is reliable, but whether the RESULTS are reliable.

    And it’s actually BETTER if you don’t know the code that did it, because you have to make it up yourself.

    And with all your high-falutin’ checking going on, why are fly-by-wire systems not only made up of three independently written software codes but also run on three different architectures? Either:

    a) FBW system programmers don’t know how to write code
    b) systems checks don’t work reliably
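
    The redundancy here is essentially N-version programming with a majority voter; a toy sketch of the idea (illustrative only, nothing like real avionics code):

      def vote(*channels, tol=1e-6):
          # return the output that at least two redundant channels agree on (within tol)
          for a in channels:
              if sum(abs(a - b) <= tol for b in channels) >= 2:
                  return a
          raise RuntimeError("no two channels agree -- fail safe")

      # three independently written implementations of the same (toy) control law
      def channel_a(x): return 2.0 * x + 1.0
      def channel_b(x): return x + x + 1.0
      def channel_c(x): return 2.0 * x + 1.0001   # one implementation carries a subtle bug

      print(vote(channel_a(3.0), channel_b(3.0), channel_c(3.0)))   # 7.0 -- the buggy channel is outvoted

    The point of writing the channels independently, and running them on different hardware, is precisely that no amount of checking of any single implementation is trusted to catch every bug.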

    Comment by Mark — 8 Feb 2009 @ 3:23 PM

  24. Also …

    Somebody who expects the code to work uses it in the manner they expect it to work and might on that basis confidently say “this works flawlessly!” right before a QA engineer finds dozens of bugs in the same program by using it a slightly different way or different environment or different mindset.

    The fact that someone might find that the software used to generate results reported in a paper fails if used differently is not relevant to the results reported in that paper. Think about it. A researcher won’t claim “this works flawlessly!”, only that “this worked flawlessly when used in our working (computer and software) environment, on a given dataset, in this way”.

    That’s a much weaker quality requirement than is necessary for generalized commercial software which is expected to work correctly under a wide range of unanticipated circumstances.

    Comment by dhogaza — 8 Feb 2009 @ 3:24 PM

  25. Gavin in reply to #18: “The issue here is that there is always a cost to any new standard. Time taken to provide unnecessary and superfluous details is time taken away from blogging (er, doing real work). Given a cost, the benefit needs to outweigh it – and you seem to be implying that even mentioning this is somehow ‘old-school’. It’s not, and if you want to bring people along with you on this, you need to be explicit about the costs as well as trumpeting the benefits, otherwise it’s going to be seen as utopian and unrealistic.”

    There is no doubt that there is an activation barrier, but experience shows that the long-term result of codifying the workflow for every digital product is a net benefit in productivity, for the individual researcher as well as for the community.

    As Claerbout says in the CiSE article I linked: “I began inflicting this goal upon a team of graduate students – all our research should be reproducible by other people by means of a simple build instruction. … Although I made the claim (which was true) that reproducibility was essential to pass wisdom on to the next generation, our experience was always that the most likely recipient would be the author herself at a later stage of life.”

    Again, this is such common practice in industry, including applied sciences and engineering, that many readers assume it is common practice among scientists. On that basis alone it is difficult to see this simple technical advance as “utopian and unrealistic”.

    [Response: Well, most scientists are pretty much self-taught in everything useful and they are almost always working in an exploratory mode. This is a huge contrast from a large firm (think Google, Accenture or McKinsey) that spends millions of dollars training their employees to code the same way and use the same workflow methods on all their (very repetitive) projects. First, there isn't the same level of resources; second, the work is much less repetitive; and third, no one has designed workflow methods that are going to work over the large range of methods that scientists actually use. Methods just don't easily translate. - gavin]

    Comment by Michael Tobis — 8 Feb 2009 @ 3:31 PM

  26. Re 10.
    Gavin, in your short answer, you forget that biomass burning is a source of greenhouse gases: CO2, CH4 and N2O.

    [Response: Yes of course. I was thinking too far ahead... and worrying about the pre-industrial biomass burning estimates that we need for our new control runs that haven't been released yet.... sorry. - gavin]

    Comment by Aye — 8 Feb 2009 @ 3:35 PM

  27. Ross McKitrick did prepare a response to the argument re spatial correlation. http://www.uoguelph.ca/~rmckitri/research/jgr07/SpatialAC.pdf

    According to his website: “I submitted it [the paper] to JGR. The editor said that it is, technically, a response to comments from critics, but none of our critics have submitted their comments for peer review, so they cannot proceed with the paper.” http://www.uoguelph.ca/~rmckitri/research/jgr07/jgr07.html

    [Response: Well the editor is right. One can't submit a response to a comment that hasn't been submitted. However, he could have written a new paper discussing this issue more thoroughly. My take on his preprint is that his conclusion that 'zero spatial correlation cannot be rejected' is astonishing. Fig. 4 in my paper indicates that the d-o-f of all the fields is significantly less (and sometimes much, much less) than what you would get in the zero spatial correlation case. - gavin]

    Comment by Brian Rookard — 8 Feb 2009 @ 3:40 PM

  28. Regarding portability, I think it’s a red herring.

    Steig et al had no dependencies other than Matlab and Tapio Schneider’s published library. Accordingly, portability is not an issue in the case at hand. Their scripts (if they exist) should work anywhere that has Matlab, and fail instantly with “Matlab not found” elsewhere.

    In other cases, portability may be a bigger deal. Many of the difficulties in portability trace directly to the use of Fortran, whose design predates many contemporary standards. Things that work on one configuration commonly fail on another, which is one of the main reasons to argue that Fortran is a huge productivity sink.

    Even in the worst case, say a multi-parallel Fortran90/MPI/InfiniBand executable running on a queue-managed cluster (sigh), a build from source and data to final output should be achievable in a single script locally. Such a script, although it cannot work everywhere, is surely helpful to others attempting to replicate results elsewhere but can be crucial to local workers attempting to revive dormant research projects.
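
    A minimal sketch of what such a driver could look like (every name here, from the URL to the make target to the executable, is hypothetical; the only point is that the whole chain from archived data to final output is one runnable script):

      #!/usr/bin/env python
      """One-script 'source + data -> final output' driver (sketch only)."""
      import hashlib
      import pathlib
      import subprocess
      import urllib.request

      DATA_URL = "https://example.org/archive/station_data_v1.2.nc"
      DATA_SHA256 = None          # pin to the hash of the exact version used in the paper

      def fetch(url, dest):
          if not dest.exists():
              urllib.request.urlretrieve(url, dest)
          digest = hashlib.sha256(dest.read_bytes()).hexdigest()
          if DATA_SHA256 and digest != DATA_SHA256:
              raise RuntimeError(f"{dest} does not match the archived version of the data")

      def main():
          work = pathlib.Path("work")
          work.mkdir(exist_ok=True)
          fetch(DATA_URL, work / "station_data.nc")
          subprocess.run(["make", "-C", "src"], check=True)          # build the analysis code
          subprocess.run(["./src/reconstruct",                       # run it end to end
                          str(work / "station_data.nc"),
                          str(work / "recon_output.nc")], check=True)
          print("final output written to", work / "recon_output.nc")

      if __name__ == "__main__":
          main()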

    Comment by Michael Tobis — 8 Feb 2009 @ 4:02 PM

  29. Gavin,

    With all due respect, I think you misread the conclusion.

    RM states in the conclusion … “Across numerous weighting specifications a robust LM statistic fails to reject the null hypothesis that no spatial autocorrelation is present, indicating that the estimations and inferences reported in MM07 are not affected by spatial dependence of the surface temperature field.”

    The null hypothesis was “no spatial autocorrelation is present.”

    They could *not* reject the null hypothesis – meaning that “no spatial autocorrelation is present.”

    Too many double and triple negatives – and I had to re-read it to make sure that I wasn’t cracked.

    [Response: Right. I think that is very unlikely. There is lots of spatial correlation in the data. - gavin]
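
    To make that null hypothesis concrete, here is a toy sketch using Moran's I, a standard spatial-autocorrelation statistic (synthetic fields only; this is not the MM07 regression residuals, nor the robust LM test they actually used):

      import numpy as np

      rng = np.random.default_rng(1)

      def morans_i(field):
          # Moran's I on a 2-D grid with binary rook (up/down/left/right) weights
          z = field - field.mean()
          n = field.size
          pair_sum = np.sum(z[:, :-1] * z[:, 1:]) + np.sum(z[:-1, :] * z[1:, :])
          n_pairs = z[:, :-1].size + z[:-1, :].size
          # symmetric weights: the factor of 2 from counting each pair both ways cancels
          return (n / n_pairs) * pair_sum / np.sum(z * z)

      white = rng.normal(size=(40, 40))                        # no spatial structure
      raw = rng.normal(size=(44, 44))
      smooth = sum(raw[i:i + 40, j:j + 40]                     # crude 5x5 moving average:
                   for i in range(5) for j in range(5)) / 25.0 # a spatially correlated field

      print(f"Moran's I, white noise:     {morans_i(white):+.3f}")   # near zero
      print(f"Moran's I, smoothed field:  {morans_i(smooth):+.3f}")  # strongly positive

    For a temperature-like field the smoothed case is the relevant one, which is why a conclusion of zero spatial correlation is a surprising thing to accept at face value.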

    Comment by Brian Rookard — 8 Feb 2009 @ 4:06 PM

  30. Tangential, but perhaps relevant — snapping photographs of people’s work at poster sessions is increasing.
    Bad? http://network.nature.com/people/noah/blog/2008/09/10/poster-session-paparazzi

    Hat tip to Bryan Lawrence, who addresses that, critically, down the page here:
    http://home.badc.rl.ac.uk/lawrence/blog

    “… he sees “taking” (information) without “giving” (feedback) as not keeping up with the takers part of a two-way process. He’s also worried about what he calls “espionage”, and data getting discussed before it’s peer reviewed.

    “Oh please!

    “Firstly, as to the taking without giving: In some communities, presenting is the price of attendance ….”

    Comment by Hank Roberts — 8 Feb 2009 @ 4:14 PM

  31. The Stata script provided did aid replication; it is just that Gavin did not want to stump up the hundreds of dollars to purchase Stata. Whether it costs money to do an experiment is a different issue from whether you can replicate that experiment; conflating the two does not help.

    This commentary seems to consider only the small world of climate science. There is a larger environment (science generally) where replication of results is a critical issue. There are several important papers that have been withdrawn (e.g. Science Vol. 277, no. 5325, pp. 459–463) after failure to replicate (e.g. Nature 385:494). Professor JPA Ioannidis has made a career out of pointing out where epidemiology studies show poor replication, and the consequent implications for clinical practice. Given that even replication is such a high hurdle, it is very helpful to have all the information to be able to replicate, rather than an unusably terse subset.

    per

    [Response: I'm not saying whether it could have potentially been useful, but in this case it wasn't. But is your point that replication is fine if it's only theoretical (i.e. if everyone bought STATA)? - gavin]

    Comment by per — 8 Feb 2009 @ 4:16 PM

  32. Gavin wrote: “One of the guidelines in cross-cultural communication is that one should learn not to patronize people whose culture you don’t appreciate… The biggest barrier is related to how results are valued in a field. I would venture to suggest that it is very different in economics than in climatology. We tend to grade based on getting the answer right, rather than the attitude of the student.”

    Yup – economists don’t care about getting the answer right, just the attitude of the student. Ben Bernanke doesn’t care about fixing the current problems in the US, just that people feel OK about it. I think you might have confused economists with politicians. Care to reconsider your characterisation of economics?

    [Response: My comment wasn't directed at economics, just Dr. Ferrall. ;) - gavin]

    Comment by JS — 8 Feb 2009 @ 4:24 PM

  33. Interesting post. But I’d like to present a counter example:

    I’ll use the Steig et al. paper in my example. Suppose I’m interested in exploring RegEM, but with a different regularization scheme, and I’d like to compare the results of my new scheme with the results obtained by Steig. I decide that I’ll use the MATLAB code referenced at Steig’s Web site as a starting point to save time and add my regularization method as a new option. Unfortunately, I don’t know and/or don’t like MATLAB (the language used for the Antarctic analysis; since Gavin (still) uses FORTRAN he can probably identify with this!), but I am proficient in R and decide to port the code. As a test, I’d like to run the analysis using the TTLS calculation described in the paper. The Steig site, however, only contains pointers to the Antarctic station data and AVHRR satellite data, so I download the data from those sites, convert it, and run the analysis using my freshly ported R code. I look at the results, compare them to the Steig analysis, and they don’t match.

    How do I determine what went wrong? Is it my code, or has the data changed in some way? Have I made an error when converting the data? Note that I’m not just trying to reproduce the results of the paper as an exercise, but as a means of testing a new hypothesis — that my new regularization scheme is more reliable than TTLS. Now I have to do a lot of tedious debugging to determine the source of the problem.

    To make matters worse, what if I’m analyzing this data five years from now and funding has been cut for archiving the Antarctic data, so the data is no longer available? Or the links referenced in the paper have changed? Or suppose the algorithm used to generate the Tir data from AVHRR has changed?

    In short, I believe it’s extremely useful to have both code and source data archived. Really, it’s not that difficult to do. And while it’s true that ease of replicability doesn’t increase the quality of the science, it does make it easier for others to build on that science.
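
    A sketch of the kind of check that keeps that debugging tractable: archive a reference output (and the hashes of the exact inputs) alongside the paper, and compare the ported code against it with an explicit tolerance. The arrays below are stand-ins; none of these names come from Steig et al.:

      import numpy as np

      def check_port(mine, reference, rtol=1e-5, atol=1e-8):
          # compare a freshly ported implementation's output against an archived reference run
          if np.allclose(mine, reference, rtol=rtol, atol=atol):
              return "port reproduces the archived reconstruction"
          worst = np.max(np.abs(mine - reference))
          return (f"mismatch (max abs difference {worst:.3g}): "
                  "my port, my data conversion, or a changed input dataset?")

      # stand-in arrays; in practice `reference` is loaded from the archive and
      # `mine` is produced by the newly ported code on the same pinned inputs
      reference = np.linspace(-1.0, 1.0, 600).reshape(100, 6)
      mine = reference + 1e-9
      print(check_port(mine, reference))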

    [Response: Two issues (at least). Eric's page is not a permanent archive either - and my guess is that BAS is a more robust institution than a faculty page at UW. But it does underline the real lack of a permanent citeable databases with versioning. That's not Eric's fault though. As part of those databases, I would love to see upload areas for analysis code that could be edited and improved collectively. Such a system would deal with all your needs and would allow us to build more effectively on the science and as you know, I am advocating strongly for such a thing. But, since that system doesn't yet exist, everything possible is a compromise, which since it is not ideal, is always open to criticism. - gavin]

    Comment by Joe S — 8 Feb 2009 @ 4:26 PM

  34. Dr. Schmidt replied: “The biggest barrier is related to how results are valued in a field. I would venture to suggest that it is very different in economics than in climatology. We tend to grade based on getting the answer right, rather than the attitude of the student.”

    Anyone might learn something from taking an econometrics course that required a replication exercise just like the one Dr. Schmidt carried out using publicly provided data and code from the authors. However, it appears people from the other culture might refuse to submit their replication code to the teacher on the grounds that it is better if the teacher reads the methods section and writes their own code. I guess in Dr. Schmidt’s class the person would get an A for following the accepted norms. In my class such a student would get marked down for not understanding the bigger point (even while benefiting from it!).

    Comment by Chris Ferrall — 8 Feb 2009 @ 4:39 PM

  35. Just as a reminder of the benefits of writing one’s own code to deal with data sets (I found this for a friend just now), one of the more famous cases (also discussed at RC in the past) illustrates the reason to write one’s own code and not rely on even long-used prior work:

    http://jisao.washington.edu/print/reports/2006_AnnualReport.pdf

    “… The apparent lack of consistency between the temperature trends at the Earth’s surface and aloft was troubling because climate models indicate temperatures aloft should be rising at least as rapidly as temperatures at the Earth’s surface. Fu and collaborators devised a new algorithm for retrieving temperatures from the satellite measurements.

    “In contrast to previously published work, their results indicated that the lower atmosphere is warming at a rate consistent with model predictions. Subsequent studies by other groups have borne out the reliability of Fu et al.’s trend estimates and they have shown that the algorithm used in prior estimates had a sign error which led to spuriously small trends. These studies lend greater confidence to the detection of human-induced global warming and they serve to reduce the level of uncertainty inherent in estimates of the rate of warming.”
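
    The mechanics of that kind of error are easy to illustrate with toy numbers (this is not the actual MSU retrieval, only a caricature of how a correction applied with the wrong sign yields a spuriously small or even negative trend):

      import numpy as np

      years = np.arange(30)
      true_trend = 0.02
      contamination = -0.015 * years          # spurious cooling leaking into the measured channel
      measured = true_trend * years + contamination
      correction = 0.015 * years              # the term that should be *added* back

      print(np.polyfit(years, measured + correction, 1)[0])   # ~ 0.02: correct sign
      print(np.polyfit(years, measured - correction, 1)[0])   # ~ -0.01: spuriously small/negative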


    Comment by Hank Roberts — 8 Feb 2009 @ 4:42 PM

  36. #13, #19: Chris Ferrall,

    On replication in mathematics: I recall being told by a math prof years ago that the easiest way to understand and check the work of another mathematician is NOT to pore through his proof, but to focus on the major milestones and prove them oneself.

    In other words, focus on verifying the general conceptual consistency, not the specific steps taken. There is generally more than one way to skin a cat, and individuals may differ on which tools they like to do the skinning.

    This is much closer to Gavin’s philosophy of replication of results than it is to M&M’s “turn over all your code” approach.

    Comment by Neal J. King — 8 Feb 2009 @ 4:51 PM

  37. Many of the difficulties in portability trace directly to the use of Fortran, whose design predates many contemporary standards. Things that work on one configuration commonly fail on another, which is one of the main reasons to argue that Fortran is a huge productivity sink.

    Portability isn’t necessarily a design requirement for a lot of research-related software.

    Hell, it’s not even a design requirement of a lot of software that only runs on (say) Windows. C# applications, for instance.

    Comment by dhogaza — 8 Feb 2009 @ 4:54 PM

  38. Once again rather than the general, let’s be specific. I believe it removes the issues of how much extra work would be created. Dr. Steig has said that he is willing to provide the data to legitimate researchers. My response is to simply post what he would provide. I still haven’t heard from Dr. Schmidt what the objection is to that concept.

    Also as to the specifics in Dr. Steig’s paper. I believe that there is probably sufficient information on AWS trends. However I don’t think there is sufficient information to reproduce the gridded AVHRR temperature results. They are quite dependent on corrections for clouds, and manipulation to produce temperature values as I understand it.

    [Response: Joey Comiso is apparently working on making that available with appropriate documentation - patience. - gavin]

    Comment by Nicolas Nierenberg — 8 Feb 2009 @ 5:30 PM

  39. re. #27 — McKitrick will be posting his response tomorrow, apparently, with a preliminary comment on the thread “Gavin on McKitrick and Michaels” just begun this afternoon on CA. It should be an instructive exchange, or at least I hope it will be.

    Comment by wmanny — 8 Feb 2009 @ 5:31 PM

  40. Excellent post.

    1) No need to apologize for Fortran! It has often been said (starting with Backus, I think):
    Q: what high-performance language will we be using in year xxxx?
    A: don’t know, but it will be called Fortran.

    xxxx has generally been picked to be 10-20 years away. After all, some hoped that Algol 60 and then PL/I would make Fortran go away… and certainly, Fortran 90/95 have come a long way from Fortran II or IV.

    2) I’d missed that mess on protein-folding, but it’s not surprising, given how touchy those things are.

    3) Like I said in an earlier thread:

    a) Many people keep over-generalizing from subsets of computing applications to the rest. Sometimes there are reasonable arguments, sometimes it feels like Dunning-Kruger.

    b) People keep talking about version control, makefiles, rebuild scripts, etc. Many of the modern versions of those are rooted in code and methodologies done at Bell Labs in the 1970s, by various BTL colleagues. In many cases, the tool code has been rewritten (like SCCS => RCS => CVS, for example, and the current makes have evolved from Stu’s original), but we know where the ideas came from. Likewise, in statistics, S came from Bell Labs (John Chambers) about then, and of course, John Tukey was around to stir people up to do meaningful analyses rather than torture data endlessly.

    We were quite accustomed to using toolsets for automation and testing that went way beyond those widely available. [Which meant: we tried pretty hard to get our stuff out to the industry, despite the best efforts of certain lawyers worried about Consent Decrees and such. Of course old-timers know open-source code, especially in science, goes back probably to ~1948, maybe earlier, certainly much of modern, non-vendor-user-group open source approaches are rooted in those BTL efforts, although they were hardly the earliest. SHARE and DECUS go way back.]

    But still, inside BTL, the amount of machinery and Q/A done varied tremendously from {code written to analyze some lab data by a physics researcher} to the {fault-tolerance and testing done for an electronic switching system whose design goal was ~1980}, but much is still quite relevant.

    I especially recommend Fred Brooks’ The Mythical Man-Month (Anniversary Edition), page 182 (in one of 1995’s new chapters, “No Silver Bullet – Essence and Accident in Software Engineering”):

    “I believe the hard part of building software to be the specification, design, and testing of this conceptual construct, not the labor of representing it and testing the fidelity of the representation.” (emphasized in the original)

    Summarizing several chapters: we’ve done a lot to automate the simple stuff, but building the right thing is something different. Fred tends to talk more about software *products* of course, so this is a slightly different domain, but I think the principle applies.

    I know of two very large BTL 1970s projects, whose project methodologies were OK, who dedicated huge resources to Q/A, who were using automation tools way ahead of much of the industry at the time, with complex testframes able to provide workloads to Systems Under Test, etc, etc … and they both failed, miserably, because they turned out not to be the right products.

    4) Maybe it’s worth enumerating other examples around climate science. For example, one might consider the UAH-vs-RSS example, in which the error was *not* discovered by having hordes of people paw through code.

    Comment by John Mashey — 8 Feb 2009 @ 5:39 PM

  41. post #11 said: So the “skeptic” position is that climate scientists are lying or incompetent until one can prove that they aren’t

    That is not a correct definition of a skeptic.
    http://en.wikipedia.org/wiki/Skepticism

    an excerpt here:

    A scientific (or empirical) skeptic is one who questions the reliability of certain kinds of claims by subjecting them to a systematic investigation. The scientific method details the specific process by which this investigation of reality is conducted. Considering the rigor of the scientific method, science itself may simply be thought of as an organized form of skepticism. This does not mean that the scientific skeptic is necessarily a scientist who conducts live experiments (though this may be the case), but that the skeptic generally accepts claims that are in his/her view likely to be true based on testable hypotheses and critical thinking.

    Comment by Ed (a simple old carpenter) — 8 Feb 2009 @ 5:47 PM

  42. To amplify on Neal J. King’s point in #36 with respect to the hypothetical situation described by Chris Ferrall in #13 (“A person states a theorem and provides a detailed proof. Another person confirms the proof but notices that one step includes ‘assume a solution to f(x)=0 exists.’”): In fact, mathematicians never give all the details of their proofs (with the exception of a few logicians, and, even among them, precious few since Russell and Whitehead 100 years ago). Recently, there was a bit of a kerfuffle over the question of whether Grisha Perelman’s proof of the Poincaré conjecture was really a proof or just a suggestion of a proof. The consensus was that it qualified as a proof if a professional in the field could fill in the missing steps without the need for truly original work. Projects were undertaken to fill in the details, but even these efforts were not intended to provide a level of detail that a non-professional could follow. That’s just the way things are done in the real world, even in the totally rigorous field of mathematics. It seems to me both unreasonable and unproductive to expect a different standard from climatology.

    Comment by S. Molnar — 8 Feb 2009 @ 6:04 PM

  43. As part of those databases, I would love to see upload areas for analysis code that could be edited and improved collectively. Such a system would deal with all your needs and would allow us to build more effectively on the science and as you know, I am advocating strongly for such a thing. But, since that system doesn’t yet exist, everything possible is a compromise, which since it is not ideal, is always open to criticism.

    Why wouldn’t something like Google Code (or something similar) work for the code portion of this?

    [Response: There is some merit to that idea....how would it work in practice? - gavin]

    Comment by Joe S — 8 Feb 2009 @ 6:04 PM

  44. That is not a correct definition of a skeptic.

    Which is why I put “skeptic” in quotes, which are often used to indicate sarcasm.

    Comment by dhogaza — 8 Feb 2009 @ 6:06 PM

  45. Dr. Schmidt, thank you for the response to #38. I am in no hurry at all, and I am very happy to hear that this will be done. I know the world doesn’t change in a day.

    Comment by Nicolas Nierenberg — 8 Feb 2009 @ 6:07 PM

  46. I am reminded of some work I did a long time ago, when we were simulating the behaviour of a bolometer maintained at a constant temperature with a feedback loop. The simulation, a curve fit combined with a fourth-order adaptive Runge-Kutta, had no free parameters (!) and reproduced the data extremely well. We were very pleased and proceeded on our way. Some time later I reused this approach for another problem, and discovered that the algorithm was seriously flawed (an LU decomposition subroutine was horribly miscoded), so I spent some considerable time redoing a bunch of calculations. Amazingly, in this case, the error made absolutely no difference! So I spent even more time analysing why this was so. So publishing the original flawed code would not really have helped. In any event, this simulation was only a small part of the publication (a faster, less complicated simulation also worked, with experimentally determined input parameters).

    Another point I ought to make here regards the use of closed-source software. In my work we have used symbolic algebra packages since the days of REDUCE and MACSYMA; these days it is the closed-source Mathematica, Maple and others. I recall a case where both Mathematica and Maple got the answer wrong (although in fairness, Mathematica made more sophisticated errors, and Maple failed naively…), and we took some considerable time writing and verifying our own routines (three independent routines written by three different people, cross-checked against each other) before we were satisfied. In retrospect, we ought to have done this earlier, since the error took about a person-year to resolve. Someday, if Mathematica and Maple are ever open sourced, I might devote some of my copious spare time (not!) to digging out the flaw, if it still exists.

    How did we find the error? Because, as we proceeded, we were always doing pen-and-paper approximations to the calculations and comparing with the experimental results. We did not rely entirely on machine calculation.

    In this second case, we did convey bug reports. In both cases, I feel it is better to let the reviewers and readers satisfy themselves with their own calculations that the results are correct, rather than have them use our own, possibly erroneous, code.
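
    For what it’s worth, the kind of cross-check that catches a miscoded LU routine is cheap to automate; a sketch (a naive Doolittle factorisation on a random test matrix, nothing to do with the original bolometer code):

      import numpy as np

      def lu_doolittle(A):
          # naive Doolittle LU factorisation (no pivoting) -- the kind of routine worth cross-checking
          A = np.array(A, dtype=float)
          n = A.shape[0]
          L, U = np.eye(n), np.zeros((n, n))
          for k in range(n):
              U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
              L[k + 1:, k] = (A[k + 1:, k] - L[k + 1:, :k] @ U[:k, k]) / U[k, k]
          return L, U

      rng = np.random.default_rng(0)
      A = rng.normal(size=(5, 5)) + 5 * np.eye(5)    # diagonally dominant, so no pivoting needed
      b = rng.normal(size=5)

      L, U = lu_doolittle(A)
      assert np.allclose(L @ U, A), "factorisation does not reproduce A"
      x = np.linalg.solve(U, np.linalg.solve(L, b))  # solve A x = b via the factors
      assert np.allclose(A @ x, b), "LU-based solve disagrees with the original system"
      print("LU routine passes both cross-checks")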

    Comment by sidd — 8 Feb 2009 @ 6:13 PM

  47. 1. Pretty moderate and pleasant discussion going on. Kudos to my side and to Gavin for that.

    2. Captcha is giving me fits.

    3. I don’t know why some of my posts get through and others not (even moderate remarks). It’s hard to be engaged in the discussion and/or make meaningful remarks with that much uncertainty that they will get posted.

    4. Better methods descriptions are a longstanding cry from science purists (Katzoff, Wilson).

    5. For papers that use elaborate or new statistics (or both), or that turn mostly on the data analysis (as opposed to, say, someone doping silicon, measuring conductivity, doing a 1/T plot and getting the best fit and slope of that), it is a really good idea to give more thorough methods descriptions. It will even improve the quality of the work for other workers.

    Comment by ApolytonGP — 8 Feb 2009 @ 6:30 PM

  48. Nice post, Gavin. However, while I agree that exact reproducibility is not as crucial as the fundamental science, in this era of ever increasing amounts of data, I think it is important to keep track of data and methods. You mentioned emailing authors for clarifications. What happens if you want to try to reproduce results/methods in light of new information 10 years from now and the authors are no longer working in the field and/or no longer have records of their data or processing steps?

    That is why it is crucial to provide and preserve solid documentation on the data and methods used. I have seen numerous papers that use “NSIDC sea ice” in their methods, with no reference to the exact dataset or version used. While it may be that, as in the case shown above, this doesn’t matter much, in another case it could be crucial. I urge all scientists to be vigilant in making sure that refereed journal articles not only provide solid scientific results, but also solid information on their data and methods.

    Walt Meier
    National Snow and Ice Data Center

    [Response: Hi Walt, well in ten years time, most of this kind of analysis being done now will be obsolete (not the methods, just the results) since much more data will hopefully be available. Your larger point is well taken, and that goes to the citeability of datasets - especially when they are continually evolving. I asked on a previous thread whether anyone knew of a specific database suite that gave all of that functionality (versioning of binary data, URL citeability of specific versions, forward citation to see who used what etc.), but other than vague references, there wasn't anything concrete. Perhaps NSIDC are working on such a thing? - gavin]

    Comment by Walt — 8 Feb 2009 @ 7:09 PM

  49. [Response: There is some merit to that idea….how would it work in practice? - gavin]

    1) Code author creates account at Google (he already has an account if he has a GMail address)
    2) Author creates a project and uploads code
    3) Author determines which other users are allowed to upload/modify the code database
    4) Any user can download code, but only authorized users are allowed to modify code/data.

    Google code uses subversion as the source code control system. Note that subversion allows versioning of binary as well as text files. Don’t know what the limit on storage at Google is, but it is possible that smaller datasets as well as code could be stored in this way.

    I asked on a previous thread whether anyone knew of a specific database suite that gave all of that functionality (versioning of binary data, URL citeability of specific versions, forward citation to see who used what etc.), but other than vague references, there wasn’t anything concrete. Perhaps NSIDC are working on such a thing?

    This would be ideal. But even a system without version control might satisfy most of these requirements — for instance, if the data can be tagged with text and a URL, then it’s pretty easy for a user to figure out which version is which, especially if the number of versions is small. We’ve developed (warning: shameless plug) a system (in beta) at PMEL that allows users to archive, tag and visualize gridded netCDF data. We might be able to expand it to support text data. Or, perhaps Google Code will be adequate for most users. My point is that there are solutions to this problem that are available now — not perfect, but good enough.
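
    Even without such a system, a few lines of code can at least pin down exactly which files went into an analysis. A minimal sketch in R (the file names, version tags and URLs here are hypothetical placeholders, not a real archive):

        library(tools)   # md5sum() ships with base R
        files <- c("sea_ice_extent_v2.1.nc", "station_temps_2008.csv")
        provenance <- data.frame(
          file     = files,
          md5      = unname(md5sum(files)),        # checksum pins the exact bytes used
          version  = c("v2.1", "2008-12"),         # human-readable version tags
          source   = c("http://example.org/seaice", "http://example.org/stations"),
          accessed = Sys.Date()
        )
        write.csv(provenance, "data_provenance.csv", row.names = FALSE)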

    Comment by Joe S — 8 Feb 2009 @ 7:55 PM

  50. Gavin
    if you are looking for examples of database versioning, you could look at the DNA and protein databanks. Individual data entries are versioned, text description is versioned and the whole database release is versioned. See, e.g. :
    http://www.ebi.ac.uk/embl/
    example versioned file:
    http://www.ebi.ac.uk/cgi-bin/sva/sva.pl?query=X56734&search=Go&snapshot=

    Re: Stata, my point is that you could have bought Stata (it is commercially available), and you could have done a direct replication if it was important. In this case, there seems to be no difficulty in repeating the results. In other cases, there are difficulties in repeating the results; and in those cases, it is enormously helpful to be able to replicate exactly.

    Comment by per — 8 Feb 2009 @ 8:51 PM

  51. Ed #41 the carpenter:

    That is not a correct definition of a skeptic.

    There’s a reason why “skeptic” was in quotation marks ;-)

    Comment by Martin Vermeer — 8 Feb 2009 @ 11:27 PM

  52. You said:
    “MM07 used an apparently widespread statistics program called STATA and archived a script for all of their calculations. While this might have been useful for someone familiar with this proprietary software, it is next to useless for someone who doesn’t have access to it. STATA scripts are extremely high level, implying they are easy to code and use, but since the underlying code in the routines is not visible or public, they provide no means by which to translate the exact steps taken into a different programming language or environment.”

    That statement is not exactly true. There are a limited number of resources available to guide people in the conversion of STATA scripts to R:

    http://wiki.r-project.org/rwiki/doku.php?id=getting-started:translations:stata2r

    http://wiki.r-project.org/rwiki/doku.php?id=guides:demos:stata_demo_with_r

    It may be time consuming but it is doable without the proprietary software.
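
    To give a flavour of what such a translation looks like (variable and file names below are placeholders, not the ones in the MM07 script):

        # Rough STATA-to-R correspondence for the two commonest commands:
        #   STATA: regress y x1 x2   ->   R: lm(y ~ x1 + x2, data = dat)
        #   STATA: test x1 x2        ->   R: a joint F-test that both coefficients are zero
        dat <- read.csv("mydata.csv")          # placeholder input file
        fit <- lm(y ~ x1 + x2, data = dat)     # ordinary least squares
        summary(fit)                           # coefficients and standard errors
        anova(lm(y ~ 1, data = dat), fit)      # joint test, comparable to STATA's "test"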

    Alternatively, I am sure McKitrick would be happy to provide an alternative R script for the analyses. Well, he would if he values open replication of results.

    You only have to ask, I guess.

    Comment by Richard Steckis — 9 Feb 2009 @ 12:08 AM

  53. 21. dhogaza:

    Yes, QA performs both types of testing, but having the info available to exactly replicate a usage case does not preclude more elaborate or independent tests! So far as I can tell from Gavin’s example, MM07’s data provision is strictly better than dLM07’s because it’s the only one that allows both precise and approximate reconstruction. You have the freedom to roll your own methods or include your own alternative data sources, and the security of being able to exactly replicate the reference result set if you get confused by a possible bug and need to track down where in the process differences are creeping in – the best of both worlds.

    (And it sounds like both MM07 and dLM07 are strictly better than most of the studies that skeptics complain about. Maybe standards are improving.)

    Most of the time when a developer “lies” about how his code works he doesn’t know he’s lying because he has fooled himself too. Again, this is human nature – I don’t regard it as a symptom of either dishonesty or incompetence. (It is often said that QA “keeps developers honest” in regard to claims made about the code, and that’s part of the role I see for “auditors” as well.)

    31. Gavin:
    Yes, replication is fine if it’s only theoretical. When the issue comes up, what we code-obsessed nitpickers want to see is the actual code you used to generate your result, even if it’s in COBOL or uncommented APL or requires million-dollar hardware. What matters is that somebody could in theory reproduce your work precisely if they really needed to; that, for us, is part of what makes the work “science”. And yeah, you might still hear people whine about other issues – insufficient comments or inelegant code or whatever, which is fine, but all they have a reasonable right to expect is to see the code you yourself used, in whatever state it’s in. Anything more is gravy.

    You wrote “the total emissions I came up with differed with the number given in the paper. A quick email to the author resolved the issue”. Ah, but what if the author of the paper got hit by a bus? What if he’s too busy to respond or just doesn’t feel like doing so? MM07-style disclosure means you don’t have to worry about any of that; those who want to see what was done can figure it out if they need to.

    Comment by Glen Raphael — 9 Feb 2009 @ 12:21 AM

  54. Yes, QA performs both types of testing, but having the info available to exactly replicate a usage case does not preclude more elaborate or independent tests!

    Of course. But in the context of results reported in a research paper – hate to belabor the point, but you’ve apparently missed it – replication of the “usage case”, i.e. the computation(s) used to generate those results, is all that’s important regarding the correctness of those results.

    Someone above posted an example of exactly what I’m stating, i.e. that extended use of some code showed a serious bug, but the work done in the generation of results for a particular paper didn’t hit that bug. Therefore … the conclusions of the paper were not affected.

    Most of the time when a developer “lies” about how his code works he doesn’t know he’s lying because he has fooled himself too

    And quit trying to back off from your original statement regarding the possibility that a researcher might be lying about their research results. Not “lying”. Lying. Your meaning was clear. If you regret it, man up and apologize.

    Comment by dhogaza — 9 Feb 2009 @ 1:07 AM

  55. dhogaza #55, he CAN’T man up and apologize. Doing so would be admitting being wrong. And his entire point is that if you’re wrong in one thing, no matter how small, you can be readily accused of being wrong elsewhere in bigger things and you then have to prove you’re NOT wrong.

    So no apologizing is allowed.

    Comment by Mark — 9 Feb 2009 @ 5:26 AM

  56. Glen Raphael writes:

    what we code-obsessed nitpickers want to see is the actual code you used to generate your result, even if it’s in COBOL or uncommented APL or requires million-dollar hardware. What matters is that somebody could in theory reproduce your work precisely if they really needed to; that, for us, is part of what makes the work “science”.

    Then you don’t understand what reproducibility means in science. It means you get the same results, not you use exactly the same procedure and code. If you can’t reproduce the same results independently, you aren’t really doing any useful work. You need the method and the algorithm. You don’t need the code.

    Comment by Barton Paul Levenson — 9 Feb 2009 @ 5:42 AM

  57. I do find this very puzzling. They have data, they have code, they have run the code against the data. Why would they not want to publish it? It makes no sense. You have lots of more or less convoluted and emotional points here about this, but that’s the bottom line: why on earth not?

    Don’t think this question is missing something. It’s the whole and only point.

    Comment by michel — 9 Feb 2009 @ 7:40 AM

  58. 58, why does publishing it make sense? More work. No benefit. cost/benefit analysis negative.

    Pop along to Oracle and tell them it makes no sense to have closed source with copyrights, since open source with copyrights gives them the same power.

    Comment by Mark — 9 Feb 2009 @ 8:01 AM

  59. michel, you wouldn’t be able to run the code if it bit you in the bacon. The data and the code are out there to the satisfaction of any competent scientist. All you have to do is call little scripts in the right order on the right intermediate files. The Unix way, the science way. There is no end-user hand-holding mega-script. Writing one would be a major additional chore — precisely the intention of the bitchers.

    I know from experience, been there, done that. You don’t. That’s a big point you’re missing. Figure it out.

    Comment by Martin Vermeer — 9 Feb 2009 @ 8:09 AM

  60. 55. dhogaza
    56. mark

    I regret using the word “lying” in such a politically-charged context. I apologize for doing so.

    Although I didn’t intend by my generic hypothetical example to accuse anyone in particular of doing anything in particular, I can see how the background context of this discussion might have suggested one or more specific targets. I did not intend that implication, do regret it, and will try to be more careful when using similar language in the future.

    Comment by Glen Raphael — 9 Feb 2009 @ 8:58 AM

  61. #56 BPL:

    “You need the method and the algorithm”

    I have looked at McKitrick’s code. I am halfway through converting it to R code. I have only learnt R over the last month or so and do not know STATA at all. But getting the data and some of the analyses into R is a no-brainer.

    McKitrick has well documented his method and the algorithms used in the code file.

    He used Hausman’s test, which is apparently widely used in econometrics. R has a package that will perform the Hausman test.
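
    For instance, the plm package will run one version of it. A minimal sketch, using the example panel dataset that ships with plm rather than McKitrick’s data, and not necessarily the same variant of the test he applied:

        library(plm)
        data("Grunfeld", package = "plm")     # example panel dataset bundled with plm
        # Fit the same model under fixed effects ("within") and random effects:
        fixed  <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
        random <- plm(inv ~ value + capital, data = Grunfeld, model = "random")
        phtest(fixed, random)                 # Hausman test of the two sets of estimates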

    What I am trying to say is that with just the code and data files, McKitrick’s work is easily replicable.

    [Response: I certainly didn't disagree - in fact that was my main point. With a little expertise (with R in your case), all of these scientific results are quite easy to replicate. - gavin]

    Comment by Richard Steckis — 9 Feb 2009 @ 9:01 AM

  62. Hi Gavin,
    I had thought this post would be about the actual findings in your IJOC paper, and on that point I disagree with your interpretation of your results. But that can wait for another day. The immediate point of your post seems, to me, to be that there is a difference between reproducing results versus replicating an effect; and a difference between necessary and sufficient disclosure for replication. Full disclosure of data and code sufficient for reproducing the results does not ensure an effect can be replicated on a new data set: Agreed. But that is not an argument against full disclosure of data and code. Such disclosure substantially reduces the time cost for people to investigate the effect, it makes it easy to discover and correct coding and calculation errors (as happened to me when Tim Lambert found the cosine error in my 2004 code) and it takes off the table a lot of pointless intermediate issues about what calculations were done. Assuming you are not trying to argue that authors actually should withhold data and/or code–i.e. assuming you are merely pointing out that there is more to replication than simply reproducing the original results–one can hardly argue with what you are saying herein.

    [Response: And why would anyone want to? ;) - gavin]

    I do, however, dispute your suggestion that I am to blame for the fact that dispensing with the spatial autocorrelation issue has not appeared in a journal yet. Rasmus posted on this issue at RC in December 2007. I promptly wrote a paper about it and sent it to the JGR. The editor sent me a note saying: “Your manuscript has the flavour of the ‘Response’ but there are no scientists that have prepared a ‘Comment’ to challenge your original paper. Therefore, I don’t see how I can publish the current manuscript.” So I forwarded this to Rasmus and encouraged him to write up his RC post and submit it to the journal so our exchange could be refereed. Rasmus replied on Dec 28 2007 “I will give your proposition a thought, but I should also tell you that I’m getting more and more strapped for time, both at work and home. Deadlines and new projects are coming up…” Then I waited and waited, but by late 2008 it was clear he wasn’t going to submit his material to a journal. I have since bundled the topic in with another paper, but that material is only at the in-review stage. And of course I will go over it all when I send in a reply to the IJOC.
    Briefly, spatial autocorrelation of the temperature field only matters if it (i) affects the trend field, and (ii) carries over to the regression residuals. (i) is likely true, though not in all cases. (ii) is generally not true. Remember that the OLS/GLS variance matrix is a function of the regression residuals, not the dependent variable. But even if I treat for SAC, the results are not affected.

    [Response: I disagree, the significance of any correlation is heavily dependent on the true effective number of degrees of freedom, and assuming that there are really 400+ dof in your correlations is a huge overestimate (just look at the maps!). I'm happy to have a discussion on how the appropriate number should be calculated, but a claim that the results are independent of that can't be sustained. I look forward to any response you may have. - gavin]

    Comment by Ross McKitrick — 9 Feb 2009 @ 9:40 AM

  63. #58 why do you keep bringing up Oracle? I’m sure you realize that a commercial developer of software, where the software itself is the product, has no connection to the output of a scientific researcher.

    Other than a couple of people the conversation seems to be converging on the fact that providing the code and data is preferable. Dr. Schmidt has said that this will be done in the case of the Steig paper. [editor note: this was done in the case of Steig et al with respect to code, though perhaps not with as much hand-holding as you seem to want. Some of these data are proprietary (NASA), but will be made available in the near future]

    By the way the statement that scientific replication means getting the same result with your own methods just isn’t correct. In the case of experimental data, the idea is to produce the same result using the same methods. It is very critical in the case of experimentation to completely understand the methods used in the original experiment.

    It is also possible then to vary the methods to see if the result is robust.

    In this case the purpose of replication is to make sure that you completely understand what was done before varying the tests or adding new tests to see if the result is robust.

    Comment by Nicolas Nierenberg — 9 Feb 2009 @ 10:34 AM

  64. > vary the methods to see if the result is robust

    I believe this is not what “robust” means in this field. Would someone knowledgeable check the sense there? As I understood it “robust” means some of the data sets can be omitted — for example if you’re looking at ice core isotopes, tree ring measures, and species in a sequence of strata, a robust conclusion would be one where any one of those proxies could be omitted.

    I am sure that given complete access to someone’s work, it would be possible to vary the method used until it broke beyond some point and, if one were a PR practitioner, then proclaim it worthless.

    You can break anything. The point is: do you know the limits of the tool, within which it is usable and beyond which it fails, and how it fails?

    Comment by Hank Roberts — 9 Feb 2009 @ 11:32 AM

  65. #58 why do you keep bringing up Oracle?

    The claim was made that open source is “standard practice” in the commercial world. I originally provided a set of counterexamples which then devolved into nit-picking attempts to demonstrate that since Oracle does support a handful of open source projects, that the “standard practice” claim is correct.

    Other than a couple of people the conversation seems to be converging on the fact that providing the code and data is preferable.

    Which is different from demands that it is MANDATORY and that scientific results can’t be trusted unless the source code is available.

    Which, of course, means that any paper based on the results of STATA or other proprietary software needs to be immediately rejected by the “auditing” community, if they follow their demands to a logical conclusion.

    Which they won’t, as long as the “right people” are using STATA. As evidence I note that the rock-throwing “auditing” crowd isn’t demanding full source disclosure, or the use of open source software, from M&M.

    I can’t think of any reason for this double standard … (snark)

    Comment by dhogaza — 9 Feb 2009 @ 12:03 PM

  66. Many of those concerned about fraud and mistakes in scientific publications seem to forget about or misunderstand the peer review process. Top journals reject a high percentage of submitted papers, and scientific reviewers typically do a good job of critical, objective evaluation. Acceptance of a paper depends on a number of issues including, most importantly, the support of conclusions by evidence and the originality and significance of the results.

    People who submit manuscripts for peer review (or act as editors or reviewers) know how detailed and picky, but sometimes helpful and insightful, reviewers can be. Authors also appreciate detailed and critical reviews because they want their papers to be as clear and as error-free as possible when they appear in print. Often it is most helpful when a reviewer expresses a misunderstanding, because this usually means that the authors were unclear, even to specialists.

    I’ve been a reviewer or editor for about 600 manuscripts submitted to scientific peer-reviewed ecological journals (something that I need to document for annual reports). Scientists place trust in the review process and I think that this trust is well placed. This does not mean that no questionable work gets through. However, peer reviewers are very good at finding mistakes, misstatements, questionable citations, problems with statistics and data analysis, etc. When a reviewer feels that more documentation or basic data are needed he or she can ask for it. If a paper is considered a potentially significant contribution, editors give authors a chance to defend and/or revise their manuscripts in response to the reviewers’ comments. Often, controversial (and important) papers receive very critical reviews, but different reviewers may criticize for completely different reasons. In this situation, in my opinion, the editor has a rationale for supporting the authors in many cases.

    Recently, many journals have been allowing authors to submit “supplemental online supporting material” with more details about methods, statistics, etc. I occasionally see a paper in my field which I think has basic problems, but most published scientific research is made credible by the review process. Often papers are unimpressive the first time I read them, but seem much more valuable when I become interested in the specific issues that they deal with. In contrast, blog postings are not reviewed and should be viewed with skepticism unless the poster is an author talking about his or her own peer-reviewed publications.

    Comment by Bill DeMott — 9 Feb 2009 @ 12:12 PM

  67. Editor, all I have seen is a pointer to a set of routines implemented in Matlab. It is far less than “hand-holding” to point someone to a routine library and say that some combination of these routines was used. I am not saying that replication is impossible with this amount of information, but it would be a lot of work with an uncertain outcome. Admittedly I haven’t spent a lot of time on this, but I am sure that there is quite a bit in the use of those routines which is at the discretion of the user.

    Before someone starts talking about incompetent people etc. I am the founder of two software companies, and did most of the initial technical development on both of them. I am qualified to comment on this.

    I will repeat my earlier statement, which has not been contradicted: whatever Dr. Steig would make available to other researchers if asked should be posted. This appears to be more than a pointer to general-purpose statistical routines.

    I also note that the idea that some of this is proprietary is new to these threads. What portion is NASA claiming as proprietary?

    Also nothing about that library resolves the questions about the gridded AVHRR data as far as I know.

    Contrast this with the MM paper outlined here. Even if Dr. Schmidt didn’t use the particular software that was originally used, he could see the code and comments, which would allow him to determine what steps were taken. I don’t think Dr. Schmidt is contending that he replicated MM without reviewing that code.

    [Response: The code is too high level to be useful in replication if you aren't running the exact same software. Calculations are written "regress surf trop slp dry dslp" and "test g e x" and the like. The written descriptions in this case were much more useful. - gavin]

    Comment by Nicolas Nierenberg — 9 Feb 2009 @ 12:47 PM

  68. Before someone starts talking about incompetent people etc. I am the founder of two software companies, and did most of the initial technical development on both of them. I am qualified to comment on this.

    Well, I founded a compiler company, worked as a compiler writer for much of my life, and currently manage an open source project, which I founded, used quite widely in fairly arcane circles.

    And *I* don’t feel qualified to “audit” climate science papers, nor the statistics being evaluated, etc. Source to the software used isn’t really of help other than allowing me to say, “oh, it compiles and runs and gives the same answers in the paper”, but that’s it.

    And that’s true of the rest of the software engineering rock-throwing crowd that’s insisting here that they need the source code to properly vet such papers.

    This exercise is really only relevant to those who 1) really do believe that climate science is fraudulent and that researchers do lie about results (and it’s clear from reading CA and WUWT that this population exists and is very vocal in demanding code etc) or 2) don’t understand what is, and is not, of scientific value. Simply verifying that the same code run on the same dataset gives the same results really tells us nothing.

    It’s telling, don’t you think, that scientists working in the field aren’t those making these demands? That it’s really only amateurs who are asking?

    Do you ask for the source of the flight deck software before taking a flight? If you got it, could you, having founded two software companies, evaluate whether or not the software interacts properly with ground based navigation systems, correctly implements the model of the physics governing the flight characteristics of the airplane that allows modern cockpits to land an airplane automatically, etc?

    Now, the scientific community’s interested in increasing sharing in ways that meet their needs. It appears that online availability of datasets has moved a lot more quickly than the availability of source to code. This seems totally reasonable as it meets the needs of scientists to a far larger extent.

    Comment by dhogaza — 9 Feb 2009 @ 1:16 PM

  69. Gavin

    It is not really the topic of your post, but I don’t understand the importance of the problem.
    The RSS TLT trend is about the same as the surface temperature trend.
    Over the last 30 years both are about 0.16°C/decade.
    The anthropogenic heat flux was, in 1998, 0.03 W/m2 (AR4 chapter 2).
    Very roughly, the anthropogenic heat trend is about 0.005 W/m2 per decade.
    So even if the latter were treated as a TOA forcing (which it is not), and if we suppose a climate sensitivity of 0.8°C per W/m2, the temperature anomaly trend from anthropogenic heat would be about 0.004°C/decade.
    Thus, we can estimate that the RSS trend is “polluted” by anthropogenic heat by only 0.004°C/decade (if we suppose a constant lapse rate), or only about 3%.
    Since the RSS and surface trends are the same, can we say that urban or economic effects are completely insignificant for surface trends?

    [Response: Well, that presupposes that RSS is correct and UAH wrong - which is unclear. It is clear that there is structural uncertainty in those numbers which makes it difficult to assess whether there is a true discrepancy between them and the surface data. Thus it isn't a priori a waste of time to look for surface effects - as I suggest in the paper, there may be issues with local climate forcings being badly specified as well as potential contamination of the signal by noise. However, you are correct, the ancillary data (ocean heat content increases, phenology shifts, glacier retreat, Arctic sea ice etc.) all point towards the surface data not being fundamentally incorrect. - gavin]

    Comment by pascal — 9 Feb 2009 @ 1:24 PM

  70. Good post. In response to your response to Walt (#48)… I don’t know of an online system that provides all the data-citation functionality, but it is certainly something the data archiving community is working on. As one example, the International Polar Year data policy requires formal data acknowledgement and IPY provides some guidelines on “how to cite a data set” (http://ipydis.org/data/citations.html), but it is still an evolving practice. There are technical challenges that are slowly being addressed through better versioning and greater use of unique object identifiers, but the greater challenges lie in the culture of science. In many ways, data should be viewed as a new form of publication that correspondingly should be acknowledged, vetted, and curated. You rightly point out that few scientific errors are the result of bad analysis or coding, but I suspect erroneous results may often be due to errors in the data that may not be well characterized. As you say, replication is only part of the issue.

    Mark Parsons
    National Snow and Ice Data Center

    Comment by Mark — 9 Feb 2009 @ 1:54 PM

  71. dhogaza,

    If you go back to the original open source comment it was made by Dr. Schmidt, and it referred to the use of open source by enterprises not the publication of it. Everyone knows that not all software is open source so I don’t know what you are trying to prove. The question was whether the use of open source is standard/normal in enterprises. My answer is that it clearly is. Of course the use of proprietary software is also standard/normal. Most of the discussion after that has been a waste of time.

    I would say that if I were king it would be mandatory to make code and data available for scientific papers where code and data are used. I don’t know if I would require that it always be based on open source.

    I want to make it clear that for me this is not at all an issue of trust. It is an issue of understanding. Dr. Schmidt was curious about the conclusions of two papers. His ability to quickly replicate MM, because the data and at least a detailed high-level description of the code were available, allowed him to focus on the interesting parts of his analysis of the result.

    Mr. Roberts,

    Robust means exactly what Dr. Schmidt implied in his post. He looked at things like varying the start dates and time periods to see if the results changed. I don’t know what you mean when you say that you can break anything. If changing the start and end periods changes the results, then they might not be robust. This is part of the argument that Dr. Schmidt makes in his paper.
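
    As a toy illustration of that kind of check (entirely synthetic numbers, just to show the mechanics of varying the start year):

        set.seed(1)
        years <- 1979:2008
        temp  <- 0.016 * (years - 1979) + rnorm(length(years), sd = 0.1)   # fake anomalies
        trend_from <- function(start) {
          keep <- years >= start
          coef(lm(temp[keep] ~ years[keep]))[2]      # slope in degrees per year
        }
        starts <- 1979:1995
        data.frame(start = starts,
                   trend_per_decade = round(10 * sapply(starts, trend_from), 3))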

    BTW for those of you having issues with Captcha. I have found that if you click on the little circular arrows you can quickly get to a readable entry.

    Comment by Nicolas Nierenberg — 9 Feb 2009 @ 1:54 PM

  72. It would appear that some folks are confusing “the model” with “the code”. They aren’t the same thing. Unless you understand the physical model, of which the code is an expression/approximation, you probably won’t be able to come to an intelligent assessment of either model or code.

    [Response: I think there may be more fundamental miscommunication. Michael Tobis and others see the build scripts and makefiles and data formatting as part of the code - and from a computer science stand point they are right. The scientists on the other hand are much more focussed on the functional part of the algorithm (i.e. RegEM in this case) as being the bit of code that matters - and they too are right. This is possibly the nub of the issue. - gavin]

    Comment by Ray Ladbury — 9 Feb 2009 @ 1:59 PM

  73. The question was whether the use of open source is standard/normal in enterprises.

    The claim, not question, was that open source is a “standard practice” in commercial enterprises.

    This is a much stronger claim. Feel free to do the google to see what is meant by a “standard practice” in engineering etc.

    Obviously no one disagrees that a mix of open source and proprietary software is used in commercial enterprises.

    It’s all a dodge anyway. Ray Ladbury, above, makes the crucial distinction between model and code, the same point I’ve made in a couple of earlier posts. Code access only makes sense if you understand the model (or statistical analysis), or mistrust the researchers. Which camp a large number of wannabe “auditors” fall into is obvious from their posts at places like CA and WUWT.

    Comment by dhogaza — 9 Feb 2009 @ 2:31 PM

  74. Hank (64), the best I can get from the context in which *I* hear “robust” is that a robust answer is one where the answer doesn’t change if you got something unexpected wrong.

    E.g. the measurement 15.6 was actually mistranscribed and was really 16.5. If the answer didn’t change when you redid it, then the answer is robust.

    OR

    You used a 1-degree resolution model. If you used a 2 degree resolution model or a 1/2 degree resolution model, and the answer is the same, then the answer is robust.

    and so on.

    A robust answer can ONLY result from someone doing the work again THEMSELVES without following exactly the same path (if you add 1 and 1 to get 2, don’t expect anything different if someone else runs the same code to add 1 and 1 together); if the answer is the same, the result is robust.

    If the answer is different, then that shows how the answer may be variable based on assumptions made.

    Both are useful answers from a repeat of the work.

    But running the same code, on the same machine, with the same data? That only proves

    a) you didn’t deliberately lie
    b) you didn’t print out the wrong report

    Comment by Mark — 9 Feb 2009 @ 2:38 PM

  75. Mark,

    I generally agree, but that doesn’t mean it is useful or even best practice to do all the work again yourself. For example Dr. Schmidt didn’t have to go and gather up all the econometric data.

    The useful work is the change, not the replication.

    Comment by Nicolas Nierenberg — 9 Feb 2009 @ 2:56 PM

  76. “Robust” doesn’t refer to the experiment. It refers to the hypothesis and the underlying reality it’s describing. It’s “robust” because one needn’t approach it with kid gloves in a delicately restricted fashion. The hypothesis is sound enough to be pounded on. Etc.

    Comment by duBois — 9 Feb 2009 @ 3:06 PM

  77. In response to Ross, Dr. Schmidt writes “the true effective number of degrees of freedom, and assuming that there are really 400+ dof in your correlations is a huge overestimate (just look at the maps!).”

    Here there appears to be a simple misunderstanding. Effective degrees of freedom is not a concept used in applied econometrics. That doesn’t mean the issue is not dealt with. I believe you are simply saying that if the Gauss-Markov theorem assumptions do not hold then the usual estimated variance matrix is too small. Non-diagonal terms in the variance of the residuals qualify as a violation. However, I believe Ross is trying to say that they used Generalized Least Squares, which is robust to this kind of deviation from plain vanilla OLS. If I recall correctly, their ‘main’ result does not use GLS, but they explored these types of issues in subsequent sections. There is a tradeoff between using GLS when it is not called for and using OLS, so it is accepted practice to report and focus on OLS, then follow up with sensitivity tests and not dwell on those results if they turn out to make no difference.

    One can imagine mapping this approach into one in which we adjust critical values as if we do not have N-k degrees of freedom. But Dr. Schmidt, you seem to think that correlation among observables is itself a problem to be addressed. And the focus on a priori effective degrees of freedom seems to support that view. But this type of correlation is not an issue even for OLS unless the residuals display SIGNIFICANT violations of Gauss-Markov. In that case, GLS is a completely standard approach. The textbooks referenced by MM07 provide complete discussions of these concerns.

    [Response: Possibly there is a confusion of terms here. The issue is not that the coefficients that emerge from OLS should be different (they aren't), but that their significance will be inflated. Let's give a very simple example: I have a data set with one hundred pairs of data and I calculate the correlation. Now I duplicate that data exactly, so that I have 200 pairs of numbers. The regression is identical. However, if I tried to calculate the significance of r (which goes roughly like 1/sqrt(n)) using n=200 instead of 100, I would find that magically my nominal significance level has increased dramatically. The same thing is happening in this case because neighbouring points are not independent. Take 'e', the educational attainment: this has only a few dozen degrees of freedom, and it doesn't matter how many times you sample it, you can't increase the effective 'n'. - gavin]
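
    A quick R illustration of that duplication point (random numbers, purely for demonstration):

        set.seed(42)
        x <- rnorm(100)
        y <- 0.2 * x + rnorm(100)        # 100 pairs with a modest built-in correlation
        cor.test(x, y)                   # r and its p-value with n = 100
        # Duplicate every pair: r is unchanged, but the nominal p-value shrinks
        cor.test(rep(x, 2), rep(y, 2))   # same r, "n" = 200, inflated significance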

    Comment by Chris Ferrall — 9 Feb 2009 @ 3:13 PM

  78. Hi Gavin,

    As Mark Parsons pointed out, there are indeed efforts being developed to formalize dataset citation, documentation, etc., and NSIDC is involved in several. Most notable for NSIDC is the IPY data that Mark mentions. At a larger scale is the Group on Earth Observations (GEO) group, working to implement a Global Earth Observation System of Systems (GEOSS). But efforts are still in their infancy and there are many issues still to resolve. One such issue is that most of these efforts have focused on data (satellite, aircraft, in situ, etc.) and not model outputs, but it seems to me that model outputs should be considered in the discussions as well.

    Comment by Walt — 9 Feb 2009 @ 3:32 PM

  79. I note the implicit defense of not archiving data/methods completely is slipped in with “a quick email to the author resolved the issue”.

    What if a quick email to the author is met with “why would I waste my time on you?”, or just plain silence – what then? If everything is archived, personal animosity between parties cannot hinder the process.

    [Response: But there is always the potential for error and/or ambiguity in what was done - that doesn't go away because the code looks complete. - gavin]

    Comment by JohnWayneSeniorTheFifth — 9 Feb 2009 @ 3:38 PM

  80. “[Response: But there is always the potential for error and/or ambiguity in what was done - that doesn’t go away because the code looks complete. - gavin]”

    But the probability is greatly diminished? And that is certainly worth something, is it not?

    [Response: Not sure how you'd be able to evaluate that, maybe, maybe not. - gavin]

    Comment by Mike Walker — 9 Feb 2009 @ 5:34 PM

  81. Maybe the northern hemisphere winter triggers a focus on icy topics here in the past few weeks, but the current heatwaves and fires (death toll 175 and rising) in Australia right now might trigger some discussion about the effects of climate change on arid areas. It’s not at all that you neglect this topic, but for us in Australia right now, melting ice is not the most salient aspect of climate change.

    As one of our media commentators said this week, our firestorms are an indication that climate change will make the financial crisis look like a garden party.

    Thanks for the great work you do.

    Comment by Gillian — 9 Feb 2009 @ 7:19 PM

  82. As one of our media commentators said this week, our firestorms are an indication that climate change will make the financial crisis look like a garden party.

    While not nearly as horrifying in human or property loss thus far, the western US is also experiencing more frequent and more intense fires.

    We probably share one problem with SE australia that makes it potentially difficult to tease out any global warming signal – fire suppression. But with recent work showing that trees are dying at an increased rate across the West (almost certainly due to global warming), northward expansion of insect pests and invasive non-native pest species (while some are disappearing in their more southern reaches), and a bunch of other ecological changes, it’s getting harder and harder to ignore the role climate change will play in the future, and is playing now. I know, for instance, that fire season has been starting a couple of weeks earlier in recent years than three decades ago …

    The brouhaha over the fact that a *portion* of the US has been experiencing a cold winter not only ignores Australia’s exceptionally warm summer, but also the fact that the west coast of the US has had an overall mild winter. Water managers are already wailing about the low snowpack in the mountains and expressing worries about summer.

    Comment by dhogaza — 9 Feb 2009 @ 9:01 PM

  83. Mr Nierenberg wrote at 10:34 am on the 9th of February:

    “By the way the statement that scientific replication means getting the same result with your own methods just isn’t correct. In the case of experimental data, the idea is to produce the same result using the same methods.”

    I cannot agree. If I publish a result that the boiling point at STP of a certain compound is 190 C, you hardly need to use exactly the same thermometer and the same heating pad, and the same beaker to replicate my result.

    Comment by sidd — 9 Feb 2009 @ 10:16 PM

  84. Gavin,

    In your reply to Chris you state that the standard errors may be wrong in OLS. This is well recognised and in this circumstance you use GLS – apparently as McKitrick did. (Or, you can do Newey-West corrections, or White corrections or other well-established approaches as the circumstances dictate.)

    So, yes there would seem to be a confusion of terms – the use of GLS would seem to directly address your concerns – as Chris has suggested.

    GLS allows for a covariance matrix that is not the identity – it explicitly allows for the possibility that you raise that adjacent (or other) entries are not independent in a clearly specified manner through the specification of the covariance matrix. Thus, one does not need to talk about effective degrees of freedom, one has a parameterisation that deals with departures of the covariance matrix from the identity. Heck, if you use White’s correction it doesn’t even require you to specify the form of heteroskedasticity.

    [Response: Let's make a rule - any digressions into econometrics-speak require links to define terms, and a sincere effort not to bludgeon readers with unreferenced jargon, which, while it might make perfect sense, is not what is required on a climate science blog. For a start, instead of heteroskedasticity, can we say unequal variance (or similar)? Thanks.

    I'll cheerfully admit to not being an econometrician, and I'm not going to get involved in esoteric-named-test duels (way above my pay grade!). I am however willing to discuss (and learn), but you have to also see where I'm coming from here. The tests used by MM07 give statistical significance to regressions with synthetic model data that can have had no extraneous influences from economic variables. This is prima facie evidence that the statistical significance is overstated, as is the fragility of 'noteworthy' correlations when RSS MSU data is used instead of UAH. My estimates of the spatial correlation were an attempt to see why that might be. The numbers I used are in the supplementary data, and so if you (or someone else) can show me that all these various tests or corrections get it right for the synthetic data (i.e. showing that the correlations are not significant), then I'll be willing to entertain the notion that they might work in the real world. - gavin]

    [Response: One little puzzle though. The significance reported in MM07 Table 2 is the same as I got in my replication - but I was just using OLS and no corrections. If GLS made a difference (and if it didn't then we are back to my original point), where is that reported? - gavin]

    Comment by JS — 9 Feb 2009 @ 10:29 PM

  85. It just occurred to me that almost everyone here seems to be overlooking the main reason why scientific results should be replicable. It’s not so that they can be checked, or “audited” by those hoping to find some trivial error that they can use as a talking point. It’s so the results can be used as a basis for further work.

    If for instance I read a computer science journal and see a paper on a new algorithm to do X, I’m not likely to code it just to check whether the authors knew what they were doing. If I do try to implement their algorithm, it’s because I’ve found that I need some way of doing X in order to achieve Y.

    [Response: Of course. And whether that further work gets done clearly marks the difference between people who are genuinely interested and those who are grandstanding. - gavin]

    Comment by James — 9 Feb 2009 @ 11:39 PM

    sidd,

    If you can’t understand the difference between methods and objects then perhaps programming isn’t for you ;-)

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 12:38 AM

  87. Dr. Schmidt,

    Could you comment on the fact that while you show significant relationships with model data they are of the inverse sign to MM? You note this in the paper, but that seems relevant to me.

    [Response: Sure, I think those relationships are spurious. They aren't really significant (despite what the calculation suggests) and therefore whether they are positive or negative is moot. If anyone thinks otherwise, they have to explain why models with no extraneous contamination show highly significant correlations with economic factors that have nothing to do with them, and which disappear if you sub-sample down to the level where the number of effective degrees of freedom is comparable to the number of data points used. - gavin]

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 12:42 AM

  88. My apologies for using jargon. Here are some links that might make some slight amends:
    GLS
    White’s correction for heteroskedasticity
    Newey West standard errors (A Stata help page – I couldn’t resist, it was the best reference I could find on a quick Google.)

    Section 4.1 states that “Equation (3) was estimated using Generalized Least Squares…following White”. The table notes say that the standard errors in Table 2 are ‘robust’. This implies to me, based on the text, that they have used the White correction in reporting the standard errors in this table. It could be that the White correction makes no difference, so you get the same results with OLS, but to delve deeper I might suggest you contact the author for more details ;)

    But, recognising that this is taking this thread a little away from its original focus I will curtail comment here. (Or is this a case in point about replication and what is necessary?)

    [Response: Thanks (not sure I'm much the wiser though). (By the way, I have nothing against Stata, it seems extremely powerful. It's just that I don't have it on my desktop). - gavin]

    Comment by JS — 10 Feb 2009 @ 12:58 AM

  89. I still don’t get it. Whatever the code is, whether it’s in snippets of script or in one large program, why not release it? Why do you have to do extra work? Just release whatever you used, and let people make what they can of it.

    And don’t all pile in and tell me I have never done things, and don’t understand things, like write shell scripts, run shell scripts or write programs, or use Unix, that I do every day, thank you!

    [Response: Samuel Johnson may have once written that "I did not have time to write you a short letter, so I wrote you a long one instead." Encapsulating what is necessary in an efficient form is precisely what scientists do, and this is the case whether it's in a paper or an online submission. My working directories are always a mess - full of dead ends, things that turned out to be irrelevant or that never made it into the paper, or are part of further ongoing projects. Some elements (such as one-line unix processing) aren't written down anywhere. Extracting exactly the part that corresponds to a single paper and documenting it so that it is clear what your conventions are (often unstated) is non-trivial. - gavin]

    Comment by michel — 10 Feb 2009 @ 4:12 AM

  90. #79
    [Response: But there is always the potential for error and/or ambiguity in what was done - that doesn’t go away because the code looks complete. - gavin]

    Ambiguity is generally larger in prose than in code.

    Scientific papers today seem to rely on a lot more data than just a few years back. Computer processing is no longer optional, it has become a requirement. Thus, the code exists, and it will also be highly unambiguous, regardless of what language was used.

    One simple way of assuring that replication is possible is to make the code available. If someone wants to run it, fine. But the main thing is that code availability reduces the algorithmic ambiguity that will always be in the paper itself.

    Comment by Thor — 10 Feb 2009 @ 5:55 AM

  91. Re: #88 (JS)

    If you use the White correction to least squares, it’s a very restrictive form of GLS which assumes the noise is uncorrelated (although heteroskedastic), so the variance-covariance matrix is not the identity matrix but is still diagonal. If the var-covar matrix is not actually diagonal, then the White correction doesn’t give the right answer.

    Newey-West (as referenced in Stata) isn’t GLS at all; it’s a correction to the standard errors for autocorrelated/heteroskedastic noise when using OLS. That’s a perfectly valid procedure, and is computationally simpler than GLS, but not quite as precise.

    But the bottom line is: for any method to be right you have to get the variance-covariance matrix right. If you assume no correlation (i.e., a diagonal matrix) when it ain’t, your results will be incorrect.
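
    In R, the mechanical distinction looks something like this (synthetic data and the sandwich and lmtest packages; a sketch of the estimators being discussed, not anyone’s actual analysis):

        library(sandwich)
        library(lmtest)
        set.seed(7)
        n <- 120
        x <- rnorm(n)
        e <- as.numeric(arima.sim(list(ar = 0.6), n))    # autocorrelated noise
        y <- 1 + 0.3 * x + e
        fit <- lm(y ~ x)
        coeftest(fit)                                    # naive OLS standard errors
        coeftest(fit, vcov = vcovHC(fit, type = "HC0"))  # White: allows unequal variance only
        coeftest(fit, vcov = NeweyWest(fit))             # Newey-West: unequal variance plus autocorrelation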

    Comment by tamino — 10 Feb 2009 @ 7:06 AM

  92. re: #90.

    How can ambiguity be “larger”? What does “larger” mean?

    And even if English isn’t your first language, remember that ambiguity can happen in code too: NOBODY has a programming language as their first language.

    There is a reason why Ada is still being used, and that’s because it was designed as a formally provable programming language. This being the case, it is easy to deduce that most languages (and all the popular ones) are not formally provable.

    And if they can’t be formally proved, that must mean there is ambiguity between what the program says it’s doing and what it really IS doing.

    Comment by Mark — 10 Feb 2009 @ 7:12 AM

  93. re: 75. You missed the point there.

    You shouldn’t even be recreating the process exactly. As someone else put it, if you are told in a paper that water boils at 100C at 1 atmosphere, you don’t need to take the same thermometer to check if they’re right.

    In fact, if their thermometer wasn’t calibrated correctly, you may be WORSE off if you used their thermometer.

    So do the work again yourself. If you use exactly the same programs, exactly the same data and exactly the same procedures, all you’ve proven is that the original authors didn’t mistype.

    Whooo. Big it up for the science.

    I mean, when it came to cold fusion, people were trying to replicate the process. And when failing, said so. And the original authors said “You have to do it EXACTLY our way”. Now do you think if, instead of trying it again themselves, they went and put the data from the experiment they did into the program they used, you’d get the wrong result? Possibly, but just as likely you’d get the same result because the measurements were incorrectly done or someone in the next lab was using a polonium target or something.

    So redoing their calculations, rather than redoing their experiment, could easily have “proven” cold fusion.

    Way to work!

    Comment by Mark — 10 Feb 2009 @ 7:25 AM

  94. Nicolas Nierenberg writes:

    The question was whether the use of open source is standard/normal in enterprises. My answer is that it clearly is.

    And we objected because your answer is clearly insane. It is NOT “standard” for software companies to make their code open source. If it were, there probably wouldn’t be any software companies.

    Comment by Barton Paul Levenson — 10 Feb 2009 @ 7:41 AM

  95. #92 Mark,

    I’m sure you understood perfectly well what I meant – even though my prose might have had a slightly higher level of ambiguity than intended. And yes you are correct, English is not my first language.

    My entire point is that algorithmic ambiguity should be at as low a level as feasible. Formal proofs in that context are just silly.

    Comment by Thor — 10 Feb 2009 @ 8:20 AM

  96. My entire point is that algorithmic ambiguity should be at as low a level as feasible. Formal proofs in that context is just silly.

    Not silly, just impractical.

    Tamino above illustrates a point several of us have been making about reading code in order to “audit science”:

    for any method to be right you have to get the variance-covariance matrix right. If you assume no correlation (i.e., a diagonal matrix) when it ain’t, your results will be incorrect.

    You need to have sufficient knowledge in order to have the proper insight to what the underlying algorithms, stat analysis, physical model which is being programmed actually does.

    In tamino’s example, if your assumption regarding correlation is wrong, it doesn’t matter a bit whether the code doing the analysis is correct or buggy as hell: you will get an incorrect result.

    Now, scroll backwards to michel’s comment, in which he talks about why he wants the code at a level which seems fairly representative of the rock-throwing “free the source” hacker mob:

    And don’t all pile in and tell me I have never done things, and don’t understand things, like write shell scripts, run shell scripts or write programs, or use Unix, that I do every day, thank you!

    This clearly demonstrates the misconception, seemingly held by that crowd, that understanding something about software (and operating environments and how to type on a keyboard) is somehow relevant to really understanding what a piece of scientific programming does.

    Comment by dhogaza — 10 Feb 2009 @ 9:26 AM

  97. Thor (95), and you missed my point. NOBODY outside a mom’s basement at 47 years of age has “C” as their first language.

    The algorithms are in the paper and that is what the code SHOULD be doing.

    Write your own code to the algorithm presented. And, like the airplane fly-by-wire systems, you have a MUCH better answer.

    The source code might be nice, but so would a pony. Do we go demanding a pony too?

    Comment by Mark — 10 Feb 2009 @ 9:39 AM

  98. Tamino #91 and there’s a great example of why the code isn’t really worth as much as the paper itself.

    As a computer programmer (though not a CS major), I am all “WTF???” about that.

    So to make appropriate use of the code and paper (enough to enable us to DEMAND it all be available), we need a team of

    climatologists
    statisticians
    logicians
    CS majors

    and a typist.

    Anyone out there asking for the code got such a group ready to work on it???

    Comment by Mark — 10 Feb 2009 @ 9:43 AM

  99. Mr. Levenson,

    I’m sorry if my wording confused you. In my business the word enterprises generally refers to the users of software not the publishers. So please read my comments in that context. Many if not most large enterprises have open source as at least a portion of their corporate standard.

    It doesn’t seem like that profitable a discussion anyway since I don’t believe that Dr. Schmidt is arguing that the software they developed is proprietary. I think instead his point had to do with a perception that I would require the use of open source software. I wouldn’t although all things being equal it would be preferable.

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 10:46 AM

  100. Dr. Schmidt, you comment that at the bottom line you think the correlations in MM07 are spurious. I find it interesting that, when a significant number of points are withheld from the regression, it is still successful at predicting the values of the held-out points. I am not an expert on this type of statistical analysis, but it seems to me that this is a good test of whether it is completely spurious.

    [Response: But this test was done picking at random. If there is spatial correlation, picking one point instead of another point in the same region will give pretty much the same result (there is a lot of redundancy). So you can in fact discard most points and find the same correlation, but this actually confirms that the points are not all independent. A better test would have been whether results from one region (say Europe and surroundings) predicted values elsewhere. - gavin]
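
    A toy one-dimensional sketch of the difference between the two holdout tests (synthetic, strongly autocorrelated series with no real relationship between them by construction; scattered held-out points sit next to training points, while a held-out block does not):

        set.seed(3)
        n <- 400
        covar <- as.numeric(arima.sim(list(ar = 0.95), n))   # smooth "predictor" field
        field <- as.numeric(arima.sim(list(ar = 0.95), n))   # smooth "target" field, unrelated
        rmse <- function(train, test) {
          fit <- lm(field ~ covar, subset = train)
          sqrt(mean((field[test] - predict(fit, data.frame(covar = covar[test])))^2))
        }
        random_test <- sample(n, n / 4)                # scattered held-out points
        block_test  <- (3 * n / 4 + 1):n               # one contiguous "region" held out
        rmse(setdiff(1:n, random_test), random_test)   # skill on scattered points
        rmse(setdiff(1:n, block_test),  block_test)    # skill on the held-out region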

    I guess I’m also not completely surprised that there would be some correlation between the results of model output and the factors studied in MM07. The models are designed to emulate the real world, and to some degree must have been corrected to improve this emulation. This would mean that they correlate generally with real world temperature distributions over historical periods as measured. This is the same historical measurement system studied in MM07 as I understand it.

    [Response: Regional down to grid-box trends for short time periods (we are looking at a 23 year period) are not robust in coupled models due to the different realisations of internal variability. These models do not include any temperature data either (though they do have volcanic effects included at the right time, and good estimates of the other forcings). - gavin]

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 10:53 AM

  101. #98 I guess a cheap supercomputer, an inexpensive wideband internet connection, and instant access to all human knowledge just don’t cut it in the modern world anymore.

    Comment by Thomas Lee Elifritz — 10 Feb 2009 @ 10:58 AM

  102. Reply to #89:

    “[Response: My working directories are always a mess - full of dead ends, things that turned out to be irrelevent or that never made it into the paper, or are part of further ongoing projects. Some elements (such a one line unix processing) aren’t written down anywhere. Extracting exactly the part that corresponds to a single paper and documenting it so that it is clear what your conventions are (often unstated) is non-trivial. - gavin]“

    Gavin, what you are describing here is what would be called, in any commercial or industrial setting, bad practice.

    That is exactly the point I am trying to make. It’s not a point about openness, it’s about effectiveness. Good practice in any discipline evolves from long experience. The behavior you are describing is behavior every programmer occasionally does on quick projects. However, most of us know better than to defend such behavior on major work products.

    It is considered bad practice with good reason. It takes a lot of effort to go back and replicate your own results from memory, but very little to maintain a script which can do all of it. If steps are expensive, you need to learn a tiny bit of rule-based logic, but that is hardly beyond the abilities of anybody doing scientific computations. The payoff is not just throwing the CA folks a bone to chew on. It’s a very important component of reasoning about computations, which are, after all, error-prone.

    Basically, you are making it easier to make mistakes.

    Why should pure science be held to a lower standard than applied science or commerce? Does climate science matter or doesn’t it?

    Per #98,

    So to make appropriate use of the code and paper (enough to enable us to DEMAND it all be available), we need a team…

    The issue is not whether the CA people are competent to examine the process or not. They might or might not be.

    The issue is that when Gavin claims that this is unreasonably difficult, he is making a claim that many readers already know, as a consequence of their own daily practice, to be false and indeed absurd. Indeed, these readers overlap strongly with the group of nonscientists most competent and most willing to evaluate scientific claims. This does the credibility of RC, and by further extension the whole of climate science, no good.

    In any case, whether this is sound practice on the part of the scientist or not, whether it is responsible behavior on the part of the hobbyists or not, one can expect demands for replication on any observational climatology analysis. Observational climatology is not at fault for having perhaps the worst relationship with its interested public of any science ever. There really are, after all, some truly malign forces involved. But it’s nothing to celebrate, and it’s worth making some effort not to make it worse.

    In summary: First, it is not true that maintaining end-to-end scripts is onerous. If large calculations are involved a series of scripts or a rule-based makefile may be practical, but these are easy skills to develop compared to the background needed to do science. Doing so in commercial and engineering settings is standard practice because it dramatically reduces error and increases the potential for objective tests.
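
    For what it is worth, the kind of end-to-end driver being described can be very small. The sketch below is a make-like Python script; every file and script name in it is a placeholder, not anyone's actual workflow:

        # A make-like driver sketch: each step re-runs only when its output is
        # missing or older than its inputs. All file names are placeholders.
        import os
        import subprocess

        def needs_update(target, sources):
            """True if the target is missing or older than any source."""
            if not os.path.exists(target):
                return True
            t = os.path.getmtime(target)
            return any(os.path.getmtime(s) > t for s in sources)

        def step(target, sources, command):
            """Run the command only when the target is out of date."""
            if needs_update(target, sources):
                print("rebuilding", target)
                subprocess.run(command, shell=True, check=True)
            else:
                print(target, "is up to date")

        # The whole workflow, from raw archive to final figure:
        step("station_data.csv", ["download_stations.py"],
             "python download_stations.py > station_data.csv")
        step("gridded_trends.nc", ["station_data.csv", "grid_trends.py"],
             "python grid_trends.py station_data.csv gridded_trends.nc")
        step("figure_2.pdf", ["gridded_trends.nc", "plot_figure2.py"],
             "python plot_figure2.py gridded_trends.nc figure_2.pdf")

    Re-running the driver after any edit rebuilds only what is out of date, which is the whole point: the published figure can always be regenerated from the raw data with one command.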

    Second, that some branches of science don’t do this is going to be perceived as an embarrassment. Defending the absence of a practice of end-to-end in-house repeatability is difficult and, coming from someone who has not spent much time thinking about it, is likely to produce silly claims.

    Of course, as climatologists, we are barraged with silly claims, but in that we are not unique. We tend to lose patience with people making strident and erroneous claims about things they don’t understand. In this, we are not unique.

    Strong programmers also tend to dismiss the opinions of those making strong claims they know to be untrue. Every technical mailing list has plenty of examples, some quite funny. (Strong programmers can be quite clever in their putdowns.) Generally, if one is trying to convince others of the validity of one’s ideas, it pays to be modest and willing to learn about points where one may be less expert than the person one is trying to convince.

    [Response: Michael, you spend your time decrying the fact that everything isn't perfect. I am trying to explain to you why that is the case; that is not 'absurd', it is reality. Things just don't work as well as a perfectly executed, flawless plan says they should. I was not defending my bad practice (though show me anyone among us who does everything the way it should be done at all times?) - I'm just observing it. It would indeed be great if I knew exactly what was going to work ahead of time, that I never got interrupted in the middle of things, that I never made mistakes, that I never did preliminary estimates to see what was worth doing, or that I never hacked something so that it would work now because I had a deadline rather than doing it properly even though that would be better in the long run. But that is simply not the real world. Methodologies stick when they work with a culture, not against it. Doing science is messy - it just is. Discussing how to make it prettier is fine, but as long as you think of scientists as willfully ignoring your wonderful advice you aren't going to get anywhere.

    Here's an analogy. Chefs in big kitchens with hundreds of diners and dozens of menu choices have to have an extremely regimented workflow. They have sous chefs by the dozen, well-trained waiters and strong commercial pressures to make it work well night after night (and we've all seen Gordon Ramsay's Kitchen Nightmares to know what happens when it doesn't). Most scientists, however, are the equivalent of the home cook, mostly small scale stuff and the occasional big dinner party. For bigger projects (such as team GCM development) better practice has to be enforced, but most scientific work is not like that. Do these cooks use the same methods as the professional chef? No. They don't have the resources, nor the same training, nor the same pressures. Thus the kitchen after a domestic dinner party is usually a mess, and the clearing up is left until afterwards. Your comments are the equivalent of saying that the meal tastes bad because there is washing-up in the sink. I'm sure the host would much rather hear you offering to help clean up. - gavin]

    Comment by Michael Tobis — 10 Feb 2009 @ 11:24 AM

  103. 101, you don’t need that team just to run the code. However, you do need the list I gave to understand whether the code is CORRECT.

    Comment by Mark — 10 Feb 2009 @ 11:59 AM

  104. Nicolas, 99, so I’m a user of Windows XP.

    Where’s my source code???

    It’s even worse for the users. They don’t get to choose what’s in the license. It’s a Hobson’s choice for them. Take it. Leave it. Choose one.

    And the biggest customers must pay with money and an NDA (and the contractual interference in their natural work practices that the NDA, in perpetuity, places upon them).

    You’re getting further from real life each time you backpedal.

    Comment by Mark — 10 Feb 2009 @ 12:02 PM

  105. TLE, your comment made me smile, but I have to add that it isn’t really “instant access to all human knowledge;” it’s instant access to lots of human information. Information isn’t knowledge until someone understands it, and “ain’t none of us” understand it all.

    Comment by Kevin McKinney — 10 Feb 2009 @ 12:04 PM

  106. Michael, take that back.

    “The issue is that when Gavin claims that this is unreasonably difficult, he is making a claim that many readers already know, as a consequence of their own daily practice, to be false and indeed absurd. ”

    1) Tell Microsoft they are cack because not only do they have bad documentation, they have to rely on third parties reverse engineering procedures to find out what their code is doing.

    2) Gavin would have to do the work. Not you. So it’s easy for YOU to say you won’t do it, yet demand that Gavin prove he can’t. And why won’t YOU do the work of replicating the paper? Are you saying it’s impossible? Maybe it is, maybe it isn’t, but that’s not the point, is it? YOU COULD.

    3) Don’t tell others that they are scurrilous ne’er-do-wells because they haven’t done what you DEMANDED of them. You prove you’re not trying to kill research so that you can go back to thinking “It’s not my fault, it’s not my fault”. All it requires is your complete job history and your bank statements. That shouldn’t be hard, should it? And if you have nothing to hide, you have nothing to fear, right????

    Comment by Mark — 10 Feb 2009 @ 12:06 PM

  107. Michael, Gavin is honest about his working conditions.

    Ask most people in “applied science or commerce” environments to describe their working conditions publicly, and they will indeed say they meet your standards.

    Of course they will. Anyone who hasn’t learned in a business environment to lie about this stuff in public was fired long ago.

    Stuff gets shoved under the rug to make a pretty looking environment. But not necessarily sorted out perfectly. Everyone has a “respected trash” file of bits they may need — no?

    Tidying up gets done, but nowhere near as much as people like to pretend it was already done before you asked.

    Heck, ask ReCaptcha about the commercial environment, let’s see:
    _____________
    “Dupont cult”

    Comment by Hank Roberts — 10 Feb 2009 @ 12:17 PM

  108. Re: #102 (Michael Tobis)

    Gavin, what you are describing here is what would be called, in any commercial or industrial setting, bad practice.

    Yet science has just about the best track record for progress of any human endeavor. Scientists do things differently; Gavin’s situation is commonplace. And: it works brilliantly! But: bean-counters just can’t handle it.

    We’re not slaves to procedure because if we took the time to make things “good practice” by your definition, we’d cut our productivity by 99%. God save us from industry types who think they know how to do science better than scientists.

    Comment by tamino — 10 Feb 2009 @ 12:31 PM

  109. Tidying up gets done, but nowhere near as much as people like to pretend it was already done before you asked.

    Anyone who has managed, or has helped manage, a “real” software project, knows that release management consumes sizable resources. Programs written for the private use of a researcher or research group (no matter what the field), or just for private chores on a home computer, are very unlikely to be tidied up in this way. For the intended use, there’s simply no need to expend the resources, and in a research group, I’d be very surprised to see doing so be part of an approved budget.

    Comment by dhogaza — 10 Feb 2009 @ 12:38 PM

  110. “We’re not slaves to procedure because if we took the time to make things “good practice” by your definition, we’d cut our productivity by 99%.”

    Nice theory, but isn’t good practice what is supposed to be assured by peer review and journal publishing?
    And I’m not seeing such freedoms granted to skeptic scientists.
    Quite the contrary: they appear to be held to a higher standard because they are outside looking in.

    Comment by Richard — 10 Feb 2009 @ 1:05 PM

  111. 110, that isn’t what’s being asked for here. It’s the stuff that doesn’t go in to the paper that’s being asked for.

    For people who don’t have to do the work there’s apparently no effort involved…

    You’re not seeing freedoms granted because you refuse to acknowledge their existence.

    People have asked for no more than the scientists at the IPCC et al ALREADY DO.

    However, these “scientists” post on blogs, write opinion pieces and usually aren’t even in the field.

    So when they get around to PUBLISHING their “proof it’s all a swindle!!!” and we then ask about their code etc., THEN you can say there’s a double standard going on.

    At the moment, we’re just asking the denialists to do what the scientists in the field are doing. Not more, like you and the other waste-of-oxygen meatsacks on here.

    Comment by Mark — 10 Feb 2009 @ 1:19 PM

  112. Hey Michael.

    Would not the Precautionary Principle require that we listen to those who have actually put into practice the concepts they propose for consideration, in contrast to listening to those who have not yet even attempted to apply the concepts?

    Especially those who have yet to attempt the practices and insist they cannot under any conditions known to human-kind be useful and simply dismiss them with hand-waving.

    ——-

    reCAPTCHA: Reeves country … would that be Jim?

    Comment by Dan Hughes — 10 Feb 2009 @ 1:34 PM

  113. I’m sorry to have been seen as so disagreeable, but I just don’t accept that I am being unrealistic at all.

    In exchange for a modest change in behavior you can have improved productivity and dramatically improved error reduction. I am not suggesting you stop exploring, just to consistently leave a trail when you make progress. This could hardly amount to a 1% tax on your time if you are so competent that you never need to backtrack. For most people, that 1% will pay back handily on the first occasion that there is the least correction somewhere in your workflow.

    Regarding Tamino’s claim “Yet science has just about the best track record for progress of any human endeavor.” I appreciate the qualifier. However, I believe the applied sciences (notably engineering and medicine) actually do better than the pure sciences in terms of track record for progress, precisely because they can’t escape rigorous quality control. It is indeed slightly less fun to practice these disciplines, because they are more disciplined. It’s difficult to compare across disciplines, but it’s my impression that pure sciences are not as productive as applied ones. I also note that climatology has become an applied science, so the increased demand for formal method is a consequence of being consequential.

    The productivity of science is due to the brilliance of its strongest practitioners, and I make no claim to being one of those myself. So perhaps it is absurd of me to criticize. On the other hand, I know enough about what scientists do to find these arguments hollow. I feel that the productivity of science could be vastly increased if science paid some attention to how productivity is achieved in other fields.

    I am indeed trying to construct tools to help make this sort of thing easier. That is, in fact, what I do.

    But it’s really a worthwhile endeavor in any case. It seems to me a matter of principle that every graphic you publish should be a graphic you can reproduce exactly from the raw data, ideally matching to the last pixel. This is one of the main advantages of computation and it seems very strange to me to reject it as too onerous.

    Comment by Michael Tobis — 10 Feb 2009 @ 2:01 PM

  114. Folks I’m sorry but twenty years ago good source control and release management took a lot of time. Today it saves time, and essentially everyone who does this for a living practices it. The tools have become so integrated that there is no excuse other than training for not using them. The fact that you have to start and stop on projects is a reason to use them. The fact that you might change your mind is a reason to use them.

    But I just believe that Dr. Schmidt hasn’t been exposed to and trained on these tools and so doesn’t see the value yet. I hope he does as it will improve his life.

    [Response: But I do use these systems - particularly for big software projects. Just not for every random little thing I do. - gavin]

    But I will bring up that this is all theoretical. In the case of Dr. Steig’s paper it appears that the data and code are available for sharing, or will be soon. So I hope that they will be posted in an open repository without worrying about who looks at it.

    My father used to regularly get manuscripts from people claiming they had disproved the general theory of relativity. He always wrote them a nice note in response saying he would give it “all due attention.”

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 2:09 PM

  115. Dr. Schmidt,

    If the model data isn’t significant down to the grid cell level why not just compare with randomly generated data? I assumed that your point was that the models predicted some of the spatial pattern, and that it just happens to correlate with areas of economic activity. Wasn’t this the statement in AR4?

    [Response: I was initially thinking that the patterns might be part of the forced response and something related to the land/ocean mask or topography etc. I don't think that is the case. Then I thought that maybe it was related to internal patterns of variability (not just the big things like ENSO, but the wider impacts of internal variability). These patterns do have structure that isn't random. However, your idea has merit - one could generate synthetic fields with about the same level of spatial correlation as in the data and see what you got. Any volunteers? - gavin]
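
    For any volunteer who wants a starting point, a bare-bones version of that experiment might look like the sketch below. Smoothed white noise stands in for fields with realistic spatial correlation, and the grid size and smoothing scale are arbitrary placeholders:

        # How often does a naive test call two *independent* but spatially
        # smooth fields "significantly" correlated? All parameters here are
        # arbitrary stand-ins for real data.
        import numpy as np
        from scipy import stats
        from scipy.ndimage import gaussian_filter

        rng = np.random.default_rng(1)
        n, sigma, trials = 40, 6, 500
        false_hits = 0

        for _ in range(trials):
            a = gaussian_filter(rng.standard_normal((n, n)), sigma).ravel()
            b = gaussian_filter(rng.standard_normal((n, n)), sigma).ravel()
            _, p = stats.pearsonr(a, b)   # p-value assumes n*n independent points
            if p < 0.05:
                false_hits += 1

        print("nominal false-positive rate: 5%")
        print("rate for spatially smooth but unrelated fields: %.0f%%"
              % (100 * false_hits / trials))
        # The naive test fires far more often than 5%, because the effective
        # number of independent grid boxes is much smaller than n*n.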

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 2:11 PM

  116. Re 102 Michael Tobis

    49 years ago, struggling with the first commercial computers, we came to the conclusion that in programming for mass data processing it was worth keeping a detailed record allowing one to retrace and examine every single step. The investment was worthwhile because of the time it saved curing bugs and improving programs. As the scale of programming has increased, that investment has become steadily more worthwhile, and the procedures for keeping the record have improved accordingly.

    In no other human activity that I have encountered is that investment worth making. In some engineering and maintenance work, it is worth making a very substantial effort to record every step (think airliners). But I do not think any design or physical systems engineer works with records as precise as those of programmers.

    Most of the world’s activities which need records that enable one to trace and replicate operate with systems analogous to a good audit trail in accounting. Such science as I am acquainted with falls in that group. The scientific initiatives to archive source data and procedures have analogies in modern accounting.

    In my own principal field of economics, we tend to be a bit slapdash in our provision of the information needed to replicate studies; but we are no worse than many branches of applied science (and only a little worse than mainstream accountants). Our principal methodological faults are frequent failure to try and generate evidence which could disprove our hypotheses, and forgetting assumptions underlying some of our cook-book standard methods (see 91 above, and reams of comment on the assumed data distributions that had a lot to do with the financial world’s failure to assess its risks correctly so far this century). We had better spend time on improving our performance on those points rather than on better archiving of our work.

    Comment by D iversity — 10 Feb 2009 @ 2:24 PM

  117. Would not the Precautionary Principle require that we listen to those who have actually put into practice the concepts they propose for consideration, in contrast to listening to those who have not yet even attempted to apply the concepts?

    No. It’s a bit like insisting that before being allowed to fly a Cessna, you must qualify for all ratings up to and including your multi-engine commercial jet certificate.

    Would you apply the precautionary principle here?

    Have those who developed the idea ever insisted on it?

    Especially those who have yet to attempt the practices and insist they cannot under any conditions known to human-kind be useful and simply dismiss them with hand-waving.

    I’ve been a professional software engineer for nearly forty years, and am well aware of what’s involved in commercial software production (I ran, and was principal engineer for, a compiler products company for many years) and open source software production (I’m the release manager for two open source products). There are others posting here with software engineering experience (John Mashey has a vast background) who disagree with the rabble.

    I don’t dismiss the well-meant (I hope) advice by simply hand-waving, but rather by pointing out what any software engineer should know: the requirements for the production and release of software products are vastly different from the requirements for one-off bits of code cobbled together for a particular purpose (for instance, the analysis of data for a single paper).

    Comment by dhogaza — 10 Feb 2009 @ 2:45 PM

  118. However, I believe the applied sciences (notably engineering and medicine) actually do better than the pure sciences in terms of track record for progress, precisely because they can’t escape rigorous quality control.

    Hmmm … galloping gertie … flopping 737 rudder … solid-fuel booster o-rings (challenger) … lockheed electra … I could go on for a long time.

    medicine … lot of drug recalls out there …

    I won’t argue that research has a better track record, but I really don’t see any basis for arguing that applied sciences are any better.

    Comment by dhogaza — 10 Feb 2009 @ 2:51 PM

  119. Michael, look, I’m all for transparency, but transparency does not mean sharing code or even sharing data. If everything worked as you envision it, then you might be right that we could improve science by archiving. The thing is that we have to not only envision how such a development would be useful, but also how it could be misused. We’ve seen plenty of examples of ignorant food tubes who love nothing better than to comb through code for every error, however trivial; every inelegant branch, however inconsequential. Do you really think responding to such idiots wouldn’t place demands on an author’s time? Hell, look at what Eric has gone through on this site.
    I can also envision that if code existed for a tedious task, it might find its way into code from other groups, compromising independence and propagating any errors therein. Now maybe you could find ways to address these risks. Maybe you could anticipate other risks and mitigate them, too.
    However, there’s no evidence that the problem your solution addresses even exists. Science is conservative. It’s the way it is for a reason. It will change, but only as it becomes necessary, and I for one am very leery of any change that has the potential to compromise the independence of research efforts.

    Comment by Ray Ladbury — 10 Feb 2009 @ 3:06 PM

  120. Michael #113 about your claim “I believe the applied sciences (notably engineering and medicine) actually do better than the pure sciences in terms of track record for progress, precisely because they can’t escape rigorous quality control.”

    You have already been called to account on the veracity of that statement; however, you have not even shown what errors have been removed only by the rigorous quality controls, or how they measure up to the ones picked up by other methods.

    And rigorous quality control has to include a metric for utility (which the above metric is). Additionally, the cost/benefit analysis should be available so that the correct level of control is maintained. To operate without these elements would be counterproductive to the aims of quality control, if not in actual conflict with the procedures themselves.

    Oh, and isn’t it unpleasant when someone puts your words back to you with “And as to X’s claim ….”.

    [edit - please try to stay polite]

    Comment by Mark — 10 Feb 2009 @ 3:49 PM

  121. As to putting code out there, good caution about others picking it up and using it. Remember the pointer to the protein folding papers that were withdrawn because of a sign error? And that problem was in “legacy code” they’d taken from elsewhere? I wonder how many other papers were based on the same code — wherever and whenever it originated, for however long it had been recycled. Likely a few more that didn’t rise to the level of notoriety of the particular group that had to be retracted so publicly.

    As Mark Twain warned about reading medical articles: “Be careful, you could die of a typographical error.”

    Comment by Hank Roberts — 10 Feb 2009 @ 4:03 PM

  122. I can also envision that if code existed for a tedious task, it might find its way into code from other groups, compromising independence and propagating any errors therein. Now maybe you could find ways to address these risks. Maybe you could anticipate other risks and mitigate them, too.

    This is precisely the point at which the kind of software methodology processes being described start to make sense, because in such a case you’re moving from a situation where someone is using code they (or a close co-worker) has written for some one-off (or nearly one-off) use to a situation where code’s being shared and used by a potentially wide audience. Used in novel ways not anticipated by the original author, in slightly different operating environments, etc. The “borrower” or user might not be aware of limitations or constraints on data which if not met might lead to error, etc.

    This is the point where people start organizing the bundling together of their bits of code, give the bundle a name (“RegEM”, perhaps, I’m not aware of the history of that code but it’s the kind of library set that often begins as a personal tool then grows into something that’s distributed, documented, etc), make clear constraints on the kind of datasets it works well with, make clear limitations of the code, and so forth.

    Comment by dhogaza — 10 Feb 2009 @ 4:03 PM

  123. RE 7: Rutherford was a Kiwi joker, and most barpersons in today’s deregulated labour market in New Zealand are tertiary students or graduates, so explaining physics to them is probably a bit easier than it was in the past… On the other hand, as someone who had two years of school physics (back in the days when computers were room-sized), I downloaded Christoph Schiller’s book Motion Mountain: The Adventure of Physics in order to improve my knowledge. It’s a great book, but physics certainly is a tough subject! One point he makes is clear enough for any layperson to understand: “Global warming exists and is due to humans.”

    Comment by Paul Harris — 10 Feb 2009 @ 4:21 PM

  124. Ok so Dr. Schmidt does use these systems, but not for every random thing. I assume that published papers don’t fall in the random little thing category, or at least I think they shouldn’t. So in the end, no dispute, no problem. If these tools are used, publishing the code is a non-issue. Then it just gets down to philosophy of openness.

    [Response: Some papers are little things, some are part of a much larger project, as always it depends. - gavin]

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 5:04 PM

  125. Dr. Schmidt,

    In AR4 what they said was “However, the locations of greatest socioeconomic development are also those that have been most warmed by atmospheric circulation changes.”

    This had the sound of circular logic since I assume they are measuring the warming through the same mechanism that MM was saying was affected by the socioeconomic development.

    AR4 doesn’t mention the spatial correlation problem.

    So I thought that your use of the climate model output was to show the warming in these areas is predicted by the models, and is therefore independent of development. But if that was what you were trying to show then the negative correlation would tend to disprove the position taken by AR4.

    If however you are saying that the model output is essentially random at those levels, then I’m not sure what the basis of the statement in AR4 is. The references are to the spatial trend patterns, which are again based on the measured results.

    [Response: The AR4 statement probably refers to the trend in the NAO over the period (peaking around 1995). It's a reasonable hypothesis but not what is happening in the model runs I looked at. I am not aware of anyone else looking at these statistics with a wider range of AR4 model runs - though that could certainly be a fruitful next step. In fact I would strongly suggest that it be looked at by anyone wanting to claim that my results were a fluke of some sort. - gavin]

    Comment by Nicolas Nierenberg — 10 Feb 2009 @ 5:17 PM

  126. Ray Ladbury: “Look, I’m all for transparency, but transparency does not mean sharing code or even sharing data.”

    I’ll have to take Feynman as a much better authority on this than you, Ray.

    Here is an excerpt from the Caltech commencement address given in 1974:
    http://www.lhup.edu/~DSIMANEK/cargocul.htm

    …It’s a kind of scientific integrity,
    a principle of scientific thought that corresponds to a kind of
    utter honesty–a kind of leaning over backwards. For example, if
    you’re doing an experiment, you should report everything that you
    think might make it invalid–not only what you think is right about
    it: other causes that could possibly explain your results; and
    things you thought of that you’ve eliminated by some other
    experiment, and how they worked–to make sure the other fellow can
    tell they have been eliminated.

    Details that could throw doubt on your interpretation must be
    given, if you know them. You must do the best you can–if you know
    anything at all wrong, or possibly wrong–to explain it. If you
    make a theory, for example, and advertise it, or put it out, then
    you must also put down all the facts that disagree with it, as well
    as those that agree with it. There is also a more subtle problem.
    When you have put a lot of ideas together to make an elaborate
    theory, you want to make sure, when explaining what it fits, that
    those things it fits are not just the things that gave you the idea
    for the theory; but that the finished theory makes something else
    come out right, in addition.

    In summary, the idea is to try to give all of the information to
    help others to judge the value of your contribution; not just the
    information that leads to judgment in one particular direction or
    another.

    Comment by Steve Reynolds — 10 Feb 2009 @ 5:25 PM

  127. (Oops, I see back in #40 that I lost a line: the quote was from Fred Brooks, “The Mythical Man-Month”, 1995 version.)

    In the old days, we’d have been asking:

    “what tools do people need to automate away repetitive work, with overhead appropriate to the nature of the project?”

    That means that:
    a) A large commercial program suite, like that of SAS Institute has a huge amount of machinery, and needs it, and may trade away ease of use to allow huge scalability.

    b) A substantial project (like say GISS Model E) can afford a certain amount of setup effort [and seems to have it, at least when I looked at it a while ago]

    c) Individual papers: less.

    See Small Is Beautiful, a talk from 1977, specifically slide 24 on the size of the support methodology. I.e., unnecessarily heavyweight tools push projects bigger, needing a higher budget. (The middle bar.)

    Trying to avoid that overhead led to things like:

    a) SCCS. We needed to have version control for both small and large projects, and easy enough to use to avoid needing a bunch of “program librarians” of the “chief programmer team” vintage. SCCS was *still* too much bother for individual physics researchers.

    b) Serious UNIX shell programming (1975-), using an interpretive language and familiar tools to let programmers easily automate their work without needing to write C code. UNIX sed, awk, etc.

    c) Nroff/troff macro packages to automate repetitive typing …
    Troff wizards could do amazing things (for that era), but it needed tbl, eqn, and macros to be usable by researchers. It took really robust, flexible macros to be usable in large BTL typing pools.

    d) UNIX “make”, originally done by Stu Feldman in Research to automate drudgery. However, it sometimes needed more work than people were willing to do to set up [because sometimes cc*.c -o myprog seemed enough], which is why people quickly whipped up *makefile generators* to start from existing work.

    e) SOLID, which we did originally for one application, but got generalized later. Big projects usually rolled their own configuration management, repositories, workflows, etc, but small projects would rarely use them, because there was just too much machinery. So, they would start simple, … and end up growing their own, so that at one point, there must have been 50 different flavors (all built on top of UNIX, SCCS, etc … but still different).

    My team generalized what we’d done for one project, and turned it into an easy-to-set-up-and-customize repository/workflow/templating system for software & documentation. It needed both to be simple for a tiny team and to be able to scale up. My lab director fortunately was willing to allocate budget to do this and even support its usage outside our division [monopoly money was very nice - we actually got to think long-term]. In the early 1980s, it was one of the very few such that got wide use around BTL.

    f) So, I still think the *real* question is: what tools do *individual scientists* think they need that they don’t have, that reduces the overhead of doing the software-engineering-repetitive-stuff? (Ideally, to the point where it’s so trivial that someone whose real job is science, not software engineering, can just do it without wasting their time. One thinks of the moral equivalent of a makefile generator.)

    I don’t expect that will make papers right or wrong, or improve the use and mis-use of statistics, or change variable names to be more meaningful, or eliminate GOTOs in old Fortran, or discover new physics … but maybe there would be less waste of time arguing about it. Remember that Fred Brooks thinks this is the easy part.

    Comment by John Mashey — 10 Feb 2009 @ 5:28 PM

  128. OT, but the “FAQ on Climate Models Part II” seems to be closed for comments. More objections from the persistent denialist I am dealing with elsewhere. His general approach is to cherry-pick findings from papers, ignoring their conclusions, and when these are pointed out, cherry-pick a finding from another paper in an attempt to throw doubt on these; and he has a very strong ideological bias as a “libertarian”; but he has clearly spent a lot of time reading the literature.

    Specifically:

    1) He claims that all current GCMs get the evaporation/precipitation cycle wrong (too little of each); and that they could therefore miss a large negative feedback from increased transport of heat to the upper troposphere where it can radiate away. Have you dealt with this here, or are there relevant papers?

    [Response: don't know where that comes from. Most GCMs are slightly high on precip (and therefore evap) in comparison with the best estimates (~3mm/day vs. 2.8 mm/day in GPCP/CMAP). - gavin]

    2) He claims that all current GCMs have underestimated the shift toward earlier NH spring snow melt, hence albedo change, and hence their apparent success at reproducing temperature increase must be concealing some other, significant negative feedback. I have found a paper, “The role of terrestrial snow cover in the climate system” by Steve Vavrus (Climate Dynamics 2007, 29:73-88), reporting simulations where snow was turned to rain on reaching the ground (i.e. snow cover was completely removed), with a resulting temperature rise of 0.8 K, so I guess the general form of the answer (assuming he’s right about the GCMs not getting the spring snow melt dates right) is that such an error will not make much difference – but are there any other papers I should look out for?

    [Response: He's probably referring to the recent Huybers et al paper, but he's misreading it. All the models have the onset of spring earlier as a function of the general level of warming - what they don't appear to have is an additional shift in the phasing that is unrelated to the mean warming. But this of course is an error of the models being not sensitive enough. Hardly something to make one confident about the future. - gavin]

    3) He cited “The Climate Change Commitment”, T. M. L. Wigley (2005), Science 307:1766-9, which used MAGICC to model the “climate commitment”, i.e. warming “in the pipeline”, as support for his claim that most of the warming up to now could be solar in origin. The paper looks at what would happen if all GHG emissions could be stopped now (i.e. in 2005), and includes the following:
    “Past natural forcing (inclusion of which is the default case here) has a marked effect. The natural forcing component is surprisingly large, 64% of the total commitment in 2050, reducing to 52% by 2400.”
    This would seem to suggest (but I may be wrong here) that most of the warming since 1970 could be solar. I’ve noted that the history of solar irradiance Wigley uses relies on a 1995 paper by Lean, Beer and Bradley, and that more recent work by Lean (2005), “SORCE CONTRIBUTIONS TO NEW UNDERSTANDING OF GLOBAL CHANGE AND SOLAR VARIABILITY”, Solar Physics 230:27-53, suggests much smaller past variability; but is this still a matter of debate?

    [Response: Well these things are always debated, but the general feeling is that solar is smaller than we thought a few years back. More precisely, the reasons that people had for thinking solar was important seem to have gone away with more observations and better data. As for the warming now being solar, the answer is definitely not. You would have had a decelerating trend in that case; what has happened is the opposite, and then you have stratospheric cooling trends which are in complete contradiction with a solar source. - gavin]

    Comments? Thanks.

    Comment by Nick Gotts — 10 Feb 2009 @ 5:30 PM

  129. RE 119,121:

    You surely can’t be serious in thinking that sharing code and data with the public will result in compromised scientific results. If this is true then there are quite a few scientists I know who need to immediately begin work on coding their own LINPACK routines to avoid contaminating their work with errors from this open source library.

    And note that the same argument could be applied to the publication of almost any kind of information, including journal articles and, yes, blogging.

    Comment by Joe S — 10 Feb 2009 @ 5:45 PM

  130. Re: the edit to 120. It’s kind of hard when one side DOESN’T have to keep it polite.

    It also doesn’t help when the whole thing becomes like the House of Commons in the UK, where you can lie, cheat and steal, but if you say that someone else is lying in the chamber, you can be in SERIOUS trouble. So you use euphemisms like “I believe the right honourable gentleman is mistaken”.

    It ISN’T nice to say “And according to J Smith’s claim …”. Especially when

    a) you’re already in a bit of trouble trying to get someone to do work that is more work for them, less for you
    b) already stuffed it up at least once before off your own bat
    c) supposed to be a people manager

    I mean, don’t MBAs get taught conflict management? I know the police don’t any more, but I thought it was still a required course for managers-to-be.

    Comment by Mark — 10 Feb 2009 @ 6:06 PM

  131. Re #21:

    Here’s another trick question: does a QA engineer ever design tests without reading the code being tested? Or would you claim that the only way a QA engineer can do their job is to be able to read the code before designing tests?

    Rhetoric aside, there really are both modes of testing, and they both have their merits. One can do black-box testing without knowing the code, and one can do white-box testing using the original code. There is also an in-between mode called grey-box testing, where some, but not all, knowledge of the internals is available and being used.

    Whichever mode one employs, one will need a way of creating test cases as well as a test oracle deciding whether observed behavior and results are correct or not. Creating test cases might be as simple as producing random input (fuzzing) or confronting the program with real users. (As an aside, a RealUsers blog might be funny. ;)) Deciding what is correct and what isn’t is often much more difficult.
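
    For instance, a black-box check of a hypothetical trend routine, fuzzed with random inputs and judged against a mathematical invariance rather than by reading its internals, might look like the sketch below (the routine itself is a stand-in invented for the example, not anyone's published code):

        # Black-box property check: fuzz a stand-in trend routine with random
        # series and use invariances as the oracle, never reading its code.
        import numpy as np

        def trend(series):
            """Stand-in for the routine under test: least-squares slope per step."""
            t = np.arange(len(series))
            return np.polyfit(t, series, 1)[0]

        rng = np.random.default_rng(2)
        for _ in range(1000):
            x = rng.standard_normal(rng.integers(10, 200))
            offset = rng.normal(scale=50.0)
            factor = rng.uniform(0.1, 10.0)
            # Oracle 1: adding a constant must not change the trend.
            assert np.isclose(trend(x + offset), trend(x))
            # Oracle 2: scaling the series must scale the trend by the same factor.
            assert np.isclose(trend(factor * x), factor * trend(x))
        print("1000 random cases passed")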

    According to my personal experience the different approaches to testing tend to work well for different types of errors and different objectives of testing. So if the problem at hand is really comprehensive software testing, a combination of approaches and techniques is the way to go and not having the source code renders some techniques unavailable.

    But comprehensive software testing includes testing for e.g. security, real-time properties, or stability under invalid inputs, which seem irrelevant for scientific computations. Furthermore, the (idealized) objective of testing is often to find all relevant instances of defects, including for instance defects in error handling routines or related to technical tasks such as file handling.

    To validate the computational functioning of a program alone thus appears to me as a pretty narrow notion of testing and even more so of QA. I conjecture that for the computational aspects alone a black-box along with a decent specification will do the job and I suppose that scientific papers provide such specifications. They would be pretty useless if they didn’t.

    Just my 2c.

    Comment by Sven Türpe — 10 Feb 2009 @ 6:28 PM

  132. Re #33:

    How do I determine what went wrong? Is it my code, or has the data changed in some way? Have I made an error when converting the data?

    Known answer test vectors might do the trick if the computation isn’t too complex in terms of input and output. Run the program on a few input vectors and publish those vectors along with the results. This approach is used with cryptographic algorithms to facilitate independent implementations.
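
    A concrete, entirely invented illustration of what such published vectors could look like for a simple routine (the function and the numbers below are made up for the example, not taken from any paper):

        # Published known-answer vectors for a hypothetical routine that
        # computes a cosine-of-latitude weighted mean. All values invented.
        import math

        def area_weighted_mean(values, lats_deg):
            """Cosine-of-latitude weighted mean of point values."""
            w = [math.cos(math.radians(lat)) for lat in lats_deg]
            return sum(v * wi for v, wi in zip(values, w)) / sum(w)

        # (values, latitudes) -> expected result: what an author could archive
        # alongside a paper so independent implementations can be checked.
        KNOWN_ANSWERS = [
            (([1.0, 1.0, 1.0], [0.0, 45.0, 60.0]), 1.0),
            (([0.0, 2.0], [0.0, 60.0]), 2.0 * 0.5 / (1.0 + 0.5)),
            (([3.0], [89.0]), 3.0),
        ]

        for (vals, lats), expected in KNOWN_ANSWERS:
            got = area_weighted_mean(vals, lats)
            assert math.isclose(got, expected, rel_tol=1e-12), (vals, lats, got)
        print("all known-answer vectors reproduced")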

    Comment by Sven Türpe — 10 Feb 2009 @ 6:46 PM

  133. Sven, 127.

    I worked as a programmer for a company. For a time (because I’m a quick learner) I was put into the QA team for the same company, i.e. among the people who tested the code I wrote.

    One report had canned data.

    But I knew it was wrong because I knew the problem space, the code, and the expected answer. A bit of a renaissance man but not necessarily a good one.

    Turns out the report was wrong. Transposition of elements meant the values were 48.7% and 205.3%. But because it was canned data, the QA team just assumed that the test data was atypical.

    Expectations. Need to manage them. But they aren’t part of Michael’s “Rigorous procedures”, because a procedure that covers everything of interest must include the procedure itself. And, like an ouroboros, this leads to trouble, if only for the procedure itself.

    Comment by Mark — 10 Feb 2009 @ 8:05 PM

  134. Increasingly influential high-level DC gossip rag “Politico” employs remarkably credulous stenographer Erika Lovley to spread fertilizer both old and new (31,000 “scientists” waving a petition, 5 decades of cooling in the U.S., etc.) on behalf of various entrenched commercial interests:

    “Scientists urge caution on global warming

    Climate change skeptics on Capitol Hill are quietly watching a growing accumulation of global cooling science and other findings that could signal that the science behind global warming may still be too shaky to warrant cap-and-trade legislation.

    While the new Obama administration promises aggressive, forward-thinking environmental policies, Weather Channel co-founder Joseph D’Aleo and other scientists are organizing lobbying efforts to take aim at the cap-and-trade bill that Democrats plan to unveil in January.

    [blah-blah, woof-woof redacted]

    Armed with statistics from the Goddard Institute for Space Studies and the National Oceanic and Atmospheric Administration’s National Climate Data Center, [Weather Channel co-founder Joseph] D’Aleo reported in the 2009 Old Farmer’s Almanac that the U.S. annual mean temperature has fluctuated for decades and has only risen 0.21 degrees since 1930 — which he says is caused by fluctuating solar activity levels and ocean temperatures, not carbon emissions.

    Data from the same source shows that during five of the past seven decades, including this one, average U.S. temperatures have gone down. And the almanac predicted that the next year will see a period of cooling. ”

    Full dreck here:
    http://dyn.politico.com/printstory.cfm?uuid=D0C4924D-18FE-70B2-A808D77A9C1FFFD3

    Old Farmer’s Almanac? I didn’t realize that was a scientific journal, but then I’m no expert.

    Anybody here have enough energy to comment on this latest zombie attack coming over the parapets? Sounds as though they’re seriously intent on eating some more brains…

    Comment by Doug Bostrom — 10 Feb 2009 @ 11:46 PM

  135. You surely can’t be serious in thinking that sharing code and data with the public will result in compromised scientific results. If this is true then there are quite a few scientists I know who need to immediately begin work on coding their own LINPACK routines to avoid contaminating their work with errors from this open source library.

    Oh, it’s totally serious in the context in which it was stated. Packages like LINPACK are subjected to a great deal of testing, vetting, and release management to ensure they work with a variety of compilers and processors, etc. Not just pieces of code that a scientist cobbled together for a particular problem.

    Surely you understand the difference? If not, please stay as far away from the implementation and management of software products, open source or closed.

    Comment by dhogaza — 10 Feb 2009 @ 11:53 PM

  136. Regarding comment #63: “editor note: this was done in the case of Steig et al with respect to code … some of these data are proprietary (NASA), but will be made available in the near future”

    I second the question in comment #67.

    If possible, please provide more information about the “proprietary NASA data” used in Steig 2009. There have been several (at least 4) clear statements from Dr. Steig, both at his website and RealClimate, that his study used only “publically available data sources.”

    [reply: the raw data are public; the processed data (i.e. cloud masking) are not yet, but will be in due course. so relax]

    Comment by Fabius Maximus — 11 Feb 2009 @ 2:00 AM

  137. Gavin’s reply in #48:

    Oh my GOD!!!
    Do you really mean that saving data and methods is obsolete because ten years from now we will have improved our methods?????!!!!!????

    That is the most unscientific argument I have ever heard…..
    And since I have taught science to high school kids, undergraduates and graduates … that says a lot.

    Forgive me if I have missed a previous comment. If not, then I am even more surprised that no one has commented on this earlier.

    [Response: Well, no one else had your ridiculous interpretation of the comment. Perhaps you think that the 'state-of-the-art' on Antarctic temperature trends in 10 years' time (2019) will use data-sets that stop in 2006? Please try to have a clue. - gavin]

    Comment by Anders — 11 Feb 2009 @ 3:50 AM

  138. I’m coming late to the talk – life always seems to get in the way. As a researcher who has feet firmly planted in both fields – science and computer science – I can feel the frustration on both sides. From the science side, I thoroughly agree with Gavin et al… you only need to publish enough to document and replicate your work. Scientific papers are for scientists, after all, not the general public. If you doubt that, try opening a copy of Nature to the original-work section and handing it to your local coffee slinger.

    On the other hand, as a computer scientist, I find replication an awesome criterion. It can be done in the scientific fields as well, largely NOT by publishing only the code and walking away, but rather by packing up the data set, governing formulae, statistical analysis parameters, and the statistical and other codes required; zipping them; and writing a nice front-end user interface that spits out the number for climate sensitivity or whatever the key parameter of the study is, so that the replicator doesn’t get just a mishmash of hundreds of brittle files causing them grief. The computer science fanboys should appreciate that that is a LOT of work, hundreds of hours, and that it should be someone else’s job to do that, not the scientists’. After all, they’re trained in *science*, not in scripting; they don’t typically work in such a way that automation is streamlined and easy to do; and they are trying to get papers correct and out, which is their *job*.

    With respect to the deep reason *why* there’s an unprecedented amount of oversight on the climate file, looking for gremlins in data analysis shouldn’t buy your conscience a pass for indulging in climate-changing activities. You’d have to assure yourself that *all* of the papers, models, journal articles, replicates, evidence and whatnot contained invalidating errors. No computer scientist looking at the data in 2009 should really be able to convince themselves of this – even if they did hold the view originally.

    Myself, I think that a round-up of “untouched by humans” evidence might be beneficial in convincing those who could be convinced by evidence. The ice caps aren’t melting for nothing, Australia isn’t drying up and burning away for nothing, and the animals aren’t migrating towards the poles for nothing. No human analysis features in those facts, so you will have to contend with them as external (unbiased) evidence in your world-view, or be established as a hypocrite.

    Anyway, rant over, keep up the good work, chums!
    Stef

    Comment by Stef in Canada — 11 Feb 2009 @ 10:00 AM

  139. Joe S.,
    There is a big difference between a commercial or at least widely distributed and tested package and code used within a research group. The former have been validated in a wide range of applications. The latter may be adapted to a specific purpose and/or based on a model that has limited applicability. As an example of scientific code eventually gaining wide applicability, I would suggest looking at the GEANT collaboration for particle physics. Each group uses routines and can make modifications for their own code, but the collaboration has a stringent process of validation for each new piece of code and makes sure the models used and their limitations are defined.

    I would suggest that if you haven’t done programming in such an environment, you may not fully understand the limitations and priorities for the code. More important, if you don’t understand the physical models, you will not understand the coded models.

    I would expect that eventually you may have some more formal treatment of code, etc. However, it’s more likely to grow out of climate science as the need arises than to be imposed upon it from outside.

    Comment by Ray Ladbury — 11 Feb 2009 @ 10:38 AM

  140. Stef, I can’t speak for anyone else, but please don’t confuse scientific curiosity with any particular belief about climate sensitivity. In my case I’m interested in this topic because it is important, and because so much has been written. I am also interested in general in issues of scientific collaboration and openness.

    Dr. Steig’s paper uses a unique, and I might add very clever, new method to come to a conclusion that is somewhat different than prior results. I am quite interested to understand more of the details, and to see the comments of others as the data and code are made available.

    I actually think it might be possible to apply this technique to global temperatures as a test of the conclusions about the accuracy of the land based temperature network.

    Comment by Nicolas Nierenberg — 11 Feb 2009 @ 11:22 AM

  141. To expand on Stef in Canada’s “untouched by humans” comment, I would like to add phenological evidence – particularly that recorded in long-term datasets such as the Marsham Phenological Record & the Kyoto cherry-blossom festival dates (although there could perhaps be a UHI argument for the latter’s abrupt change).

    Comment by Chris S — 11 Feb 2009 @ 11:26 AM

  142. Steve (126),
    What does that excerpt have to do with publishing code or data? His points are nothing novel. That is the standard information that is already provided in the discussion/conclusion section of virtually every paper published.

    Comment by Mike G — 11 Feb 2009 @ 11:34 AM

  143. However, the bigger point is that reproducibility of an analysis does not imply correctness of the conclusions.

    I agree; however, failure to replicate published results is an issue, and without access to datasets and scripts you will never know whether or not reproducibility is an issue. In my field, computer vision, we are plagued by the reproducibility issue because published results are more the product of a particular implementation than of the method as expressed in terms of equations in the published paper. I’ve spent countless hours trying to reproduce published methods with promising results, without success. In the rare case when the scripts are available, you’ll discover a lot of implementation details and parameter values that are not in the original paper.

    I’m a strong advocate of the Open Science 2.0 concept. Standards vary greatly between scientific fields; in statistics, for instance, most published methods are accompanied by R or S scripts from which the published results can be replicated. I understand that such a high standard cannot be followed in all fields because of the complexity of a particular research environment, but publishing results from black boxes will certainly hurt the science in the long run. Also, an open and transparent science is the best weapon against skeptics, because it will greatly reduce endless speculation about implementation issues.

    Comment by Khebab — 11 Feb 2009 @ 12:02 PM

  144. re: 129. Nope, it will result in LESS SCIENCE being done. And what science is done will have its validity put on hold until each and every query and nitpick about the code, program, data or phase of the moon has been answered. Given how “Mars is warming, so it’s gotta be the sun” and “It’s happened before, people” ***still*** get wheeled out, probably several times over.

    Which delays any action.

    Which *who* wants, again?

    Comment by Mark — 11 Feb 2009 @ 12:22 PM

  145. Dr. Schmidt,

    Based on your comment, I would say that AR4 would have been better worded with a conditional, essentially saying that their results may have been caused by coincidental warming from circulation changes in the areas of maximum economic development.

    Section 3.2.2.2 references a number of other studies that support the idea that the instrumental record isn’t polluted, and it isn’t clear why MM got this result. But the reason given in AR4 is just speculation.

    I’m sure there will be a lively debate over whether your paper shows that MM07 is fundamentally flawed.

    Do you understand the comment “greenhouse-induced warming is expected to be greater over land than over the oceans” as it relates to MM?

    [Response: That comment is clearly true, and given that MSU-2LT data is more dispersed, similar global trends in both the surface station and MSU-2LT fields imply that the land-only surface trends would be expected to be larger than the co-located MSU-2LT trends. - gavin]

    Comment by Nicolas Nierenberg — 11 Feb 2009 @ 1:28 PM

  146. Dr. Schmidt,

    I can see why the land trends would be higher than the MSU trends in those areas, but how does that relate to the MM study? They are looking at the differences as they relate to economic activity and other variables. Is the thought that economic indicators would be higher at coastal locations?

    [Response: It relates to what the true null hypothesis should look like. Clearly the differences between MSU and surface stations will not be random or spatially uncorrelated. I used 5 model runs - with the same model - in lieu of having an appropriate null. But I am not claiming that they define the proper null hypothesis. I think looking at more models would be useful if you wanted to do that (but you still wouldn't be certain). The bigger problem here is that no-one apart from the authors thinks this methodology is valid, regardless of the results. I used it because I was interested in what would happen with model data - and the fact that there are very clear "significant" correlations occurring much more frequently than the nominal power of the test would imply tells me that there is something wrong with the test. I'm happy to have other people pitch in and give their explanation of reasons for that, but continuing to insist that the test is really testing what they claimed seems to be unsupportable. - gavin]

    Comment by Nicolas Nierenberg — 11 Feb 2009 @ 2:56 PM

  147. Here is the fundamental issue as I see it.

    The Intergovernmental Panel for Climate Change (IPCC) Fourth Assessment Report (AR4, IPCC, 2007) states:

    A major advance of this assessment of climate change projections compared with the TAR is the large number of simulations available from a broader range of models. Taken together with additional information from observations, these provide a quantitative basis for estimating likelihoods for many aspects of future climate change. [ My bolding. ]

    Do the numbers from these “large number of simulations available from a broader range of models” GCM calculations have any meaning? My response is that the numbers have yet to be shown to be correct.

    One crucial and necessary first step is to show, through the application of Verification procedures, that the numbers produced by the software accurately reflect both (1) the original intent of the continuous equations for the models, and (2) the numerical solution methods applied to the discrete approximations to the continuous equations. That is, Verification shows that the equations have been solved right. Do the numbers actually satisfy the Verified-to-be-correct-as-coded discrete equations, and do the solutions of the discrete equations converge to the solution of the continuous equations? Neither of these has been demonstrated for any GCM. I will be pleased to be shown to be wrong on this point.
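
    For readers unfamiliar with what such a convergence check looks like in practice, here is a minimal sketch on a toy problem (du/dt = -u, exact solution exp(-t)); it has nothing to do with any GCM, but it is the kind of evidence being asked for:

        import math

        def euler_solve(dt, t_end=1.0):
            # forward Euler for du/dt = -u with u(0) = 1
            u, t = 1.0, 0.0
            while t < t_end - 1e-12:
                u += dt * (-u)
                t += dt
            return u

        exact = math.exp(-1.0)
        errors = [abs(euler_solve(dt) - exact) for dt in (0.1, 0.05, 0.025, 0.0125)]

        # for a first-order scheme, halving dt should roughly halve the error
        for coarse, fine in zip(errors, errors[1:]):
            print(f"observed order of accuracy ~ {math.log2(coarse / fine):.2f}")

    Refining the step size and recovering the scheme's theoretical order of accuracy is the standard demonstration that the discrete solution is converging to the continuous one.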

    All software can be Verified. Objective technical criteria and associated success metrics can be developed and applied in a manner that provides assurances about the correctness of the coding of the equations and their numerical solutions. Lack of Verification leaves open the potential that the numbers from the software are simply results of “bugs” in the coding.

    The present-day software development community, in all kinds of applications and organizations, is keenly aware that a lack of Software Quality Assurance (SQA) policies and procedures, and of their successful application to the software, leaves open a significant potential for problems to exist in the software. So far as I am aware, there are no precedents whatsoever for public policy decisions being based on software to which no SQA procedures have been applied.

    There are no other examples of calculations, the results of which guide decisions that affect the health and safety of the public, that are not Independently Verified. All aspects, from the front-end pre-processor to the post-processing of results for presentation, are Verified. Everyone encounters numerous examples every day, all day.

    And this applies to all calculations; from one-off data processing and analysis to GCMs. Whenever Press Conferences are called, or Press Releases produced, to announce the results of even the most trivial calculation, the purpose is to influence the public. And the public, it is hoped, is provided information that helps guide and shape their individual thinking about the concepts reported. Under these conditions, the numbers must be Verified to be correct prior to public announcements. Every time.

    [Response: And all Adjectives must be Capitalised. Every time. - gavin]

    Comment by Dan Hughes — 11 Feb 2009 @ 4:21 PM

  148. “There are no other examples of calculations, the results of which guide decisions that affect the health and safety of the public, that are not Independently Verified. All aspects, from the front-end pre-processor to the post-processing of results for presentation, are Verified.”

    Prove it.

    Your claim. Your burden.

    Comment by dhogaza — 11 Feb 2009 @ 5:00 PM

  149. It is really stunning that climate scientists are subjected to such demands while the denialists are quite content to unskeptically accept any old blatant pseudoscientific rubbish that comes along from any old ExxonMobil-funded propaganda mill.

    Comment by SecularAnimist — 11 Feb 2009 @ 5:02 PM

  150. Dr. Schmidt,

    I think that a statement like “no-one apart from the authors thinks this methodology is valid” is a little hard to prove and unlikely to be true.

    But I wasn’t asking about your paper, which did raise an interesting challenge to their approach; I was asking what the comment in AR4 about oceans and land had to do with MM. If they were just commenting on the fact that satellite-measured anomalies that cover land and ocean are likely to be lower than the surface-measured anomalies at coastal locations, then I don’t see why that, in particular, was a comment on MM. Unless there is some correlation between coastal location and economic activity?

    Or maybe you think that comment in AR4 wasn’t really on point?

    [Response: I didn't write the comment in the AR4 report, and so my guesses about the authors' thought processes are no more likely to be insightful than yours. I imagine that it is a rebuttal to the null hypothesis in the MM07 paper that the presence of correlations between the surface-troposphere trend difference and the economic variables automatically implies extraneous biases. That is clearly mistaken. - gavin]

    Comment by Nicolas Nierenberg — 11 Feb 2009 @ 5:04 PM

  151. Seems to Eli that if a largish number of models delivers about the same answer, that is pretty good assurance that what coding gremlins there are have only a small effect. OTOH, the place to look for significant stuff is whether there are cores of routines used in all of the models. Since those shared elements tend to be the ones produced by the professional coders (usually for mathematical applications such as integration), the Ball is Back in Your Court, Dan.

    Comment by Eli Rabett — 11 Feb 2009 @ 5:24 PM

  152. All software can be Verified. Objective technical criteria and associated success metrics can be developed and applied in a manner that provides assurances about the correctness of the coding of the equations and their numerical solutions. Lack of Verification leaves open the potential that the numbers from the software are simply results of “bugs” in the coding.

    If only the world were this simple. It’s not. Read a bit about the halting problem for a deeper understanding of some of the issues.

    Comment by Joe S — 11 Feb 2009 @ 5:37 PM

  153. > There is no other examples …, that are not
    > Independently Verified.

    Blather.

    http://www.agent2005.anl.gov/2005pdf/Kennedy%20et%20al.pdf
    VERIFICATION AND VALIDATION OF SCIENTIFIC AND ECONOMIC MODELS

    “… subjective methods typically require less effort than quantitative methods, can detect flaws early in the simulation process, and are often the only applicable verification and validation methods for exploratory simulation studies.

    “We next describe some of the subjective
    techniques proposed by Balci (1998) that may be applicable to economic and agent-based scientific simulations. His techniques are widely used in validating the models of manufacturing, engineering, and business processes. …

    “1. Face validation. This preliminary approach to validation involves asking domain experts whether the model behaves reasonably and is sufficiently accurate….”
    ________

    Comment by Hank Roberts — 11 Feb 2009 @ 5:46 PM

  154. I’m sure nobody’s heard of the halting problem, Joe. It’s not like it’s ever brought up as the pitfall of the Newton-Raphson method in mathematics…
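
    For anyone who genuinely hasn't seen it, a standard toy example (unrelated to any climate code) of Newton-Raphson failing to terminate on a root:

        # f(x) = x^3 - 2x + 2 has a real root near x = -1.77, but starting
        # Newton's method at x0 = 0 just cycles between 0 and 1 forever.
        def newton(f, fprime, x0, steps=8):
            x = x0
            for i in range(steps):
                x = x - f(x) / fprime(x)
                print(f"step {i + 1}: x = {x}")
            return x

        newton(lambda x: x ** 3 - 2 * x + 2,
               lambda x: 3 * x ** 2 - 2,
               0.0)

    In general, no review can tell you in advance, for every possible input, whether an iteration like this will stop; you find out by running it and checking.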

    Comment by Mark — 11 Feb 2009 @ 6:02 PM

  155. Dan Hughes #147: No validation, huh? Gee, I wonder why none of the dozens of scientific professional and honorific societies that have looked IN DETAIL at the consensus has reached a similar conclusion. One has to ask, Dan, what you know that the National Academies, the American Physical Society… hell, even the American Association of Petroleum Geologists doesn’t. Maybe, to paraphrase Mr. Twain–what you know is something that just ain’t so.

    Comment by Ray Ladbury — 11 Feb 2009 @ 8:09 PM

  156. Dan: “demanding tests of climate models” are indeed important, but first, take a look at these:

    http://www.aip.org/history/climate/simple.htm

    http://www.aip.org/history/climate/co2.htm

    http://www.aip.org/history/climate/oceans.htm

    http://www.jamstec.go.jp/frcgc/manabe/cbiblio.html
    (A bibliography of Syukuro Manabe, a good record of climate model development)

    Manabe, S., and R.T. Wetherald, The Effects of Doubling the CO2 Concentration on the Climate of a General Circulation Model, Journal of the Atmospheric Sciences, 32(1): 3-15, 1975.

    Validation: That’s when you do a 20th century model run with known CO2 and temperature data (as well as CH4, etc) and see what happens when you leave out the anthropogenic increases. If you do that, the models underestimate 20th century warming. If you include the forcing, you get an approximate fit to the observed warming.

    Verification: That’s where you make predictions based on expected CO2 levels over the late 20th and 21st century, and compare them to events as they unfold. Predictions include warming at the poles and at elevations, leading to a loss of mountain glaciers and increased melt from Greenland and Antarctica, as well as expansion of subtropical dry zones associated with a warming troposphere. That’s all been observed.

    The funny thing about denialists is that they are inconsistent – if the models show warming but the observational data shows less, they’ll howl about poor models – and if the data shows more warming than the models, they’ll point to the model prediction as proof that there’s something wrong with the observations – like clockwork. To see how this works, look here:

    http://www.realclimate.org/index.php/archives/2008/12/2008-year-in-review/#comment-107910

    Sallie Baliunas 1999: “One demanding test of the validity of the computer simulations of the climate of the earth is based on temperature records from the Arctic…”

    You see, in the early 1990s a large volcanic eruption threw a lot of aerosols high into the atmosphere, providing a unique semi-experimental test of global model predictive abilities. It also cooled the planet slightly – but nevertheless, there was a record of steady sea ice loss in the Arctic in the classified archives of the U.S. Navy. The validated model’s predictions of the Pinatubo effect were also verified. Now, if you compared a model prediction prepared in 1988 with the observed temperature in 1997, you would find things cooler than expected… for obvious reasons.

    Of course, after the past decade’s Arctic warming was noted, as per predictions, the denialists all admitted they were wrong and printed retractions – but only after the press dragged them all before the national cameras and gave them a Hansen-style grilling on Glen Beck and Larry King Live… or did they?

    Comment by Ike Solem — 11 Feb 2009 @ 8:47 PM

  157. Of course, after the past decade’s Arctic warming was noted, as per predictions, the denialists all admitted they were wrong and printed retractions – but only after the press dragged them all before the national cameras and gave them a Hansen-style grilling on Glen Beck and Larry King Live… or did they?

    I always knew you normally reside in Bizarro World, where everything is the opposite of the way it is here! :)

    Comment by dhogaza — 11 Feb 2009 @ 9:54 PM

  158. Dr. Schmidt,

    I think it would be interesting to do additional tests on the model driven data from your paper. Are the surface temperature data and emulated MSU data from your model runs available for download?

    [Response: Have at it - here. - gavin]

    Comment by Nicolas Nierenberg — 11 Feb 2009 @ 10:25 PM

  159. Dan Hughes writes at 4:21 pm on the 11th of February 2009:
    “…do the solutions of the discrete equations converge to solution of the continuous equations. Neither of these has been demonstrated for any GCM.”

    Mr. Hughes will no doubt be shocked and appalled to learn that there is not even a proof or disproof of the existence of solutions to the 3-D Navier-Stokes equations, even in the incompressible case, and that a large prize is being offered by the Clay Institute for such a proof or disproof.

    His scepticism notwithstanding, I can, and do, wield the 3-D compressible Navier-Stokes equations in anger, and obtain many useful results therefrom. So does every fluid engineer in the world. Merely because we umble fizicists cannot prove that a solution exists does not prevent Nature from solving it every day all around us. All we have to do, as someone said, is get close enough.

    Comment by sidd — 12 Feb 2009 @ 9:21 AM

  160. Sorry, this isn’t necessarily relevant to the topic being discussed but I thought I would solicit a response here.

    Although I disagree with the consensus of this community in regards to AGW, I do find that most of you are rational in terms of the predicted consequences of AGW, which are at odds with how the media generally portrays those consequences.

    Recently, the claim that we only have 4 years to do something about AGW has been attributed to James Hansen, which seems outlandish and alarmist. How do the members of this community regard this claim?

    Your rational and non-defensive responses are appreciated.

    [Response: When somebody quite sensible is quoted as saying something patently ridiculous, the response should be to question whether that was really what was said and whether the context supports the interpretation. Hansen's statements are all available on his website and are well worth reading carefully. In this case (see page 3), he is clearly referring to the length of a presidential term, not the imminent collapse of world civilisation. - gavin]

    Comment by Robert — 12 Feb 2009 @ 1:44 PM

  161. “I’ve never understood all the hoopla about the codes.”

    Ahhh…job security for me! ;) I’m a software test engineer, a.k.a. software quality assurance analyst, a.k.a. whatever we happened to be called at a given company.

    In *any* field where results depend on numbers being crunched by software, there is the possibility that an error can be introduced by a bug in the code. It may be trivial or insignificant, but it can also be catastrophic (Mars lander, anyone?). “Catastrophic” doesn’t happen often, thank goodness, but having someone *other than the person who wrote the code* check it for bugs is a basic precaution against spectacular failure. It’s amazingly easy for the author of the code to overlook something that will be obvious to a tester, simply because the author knows what he/she *meant* the code to do, and therefore it *looks* to them like the code does it that way.
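
    A tiny, hypothetical example of the kind of check a second pair of eyes brings (the function and the test below are invented; the point is only the habit of asking "what should this return for a trivial input?"):

        def monthly_anomalies(values, baseline):
            # intended behaviour: subtract the baseline-period mean from every value
            base = sum(baseline) / len(baseline)
            return [v - base for v in values]

        def test_anomalies_average_to_zero_over_baseline():
            baseline = [10.0, 12.0, 14.0]
            anoms = monthly_anomalies(baseline, baseline)
            assert abs(sum(anoms)) < 1e-9, "anomalies over the baseline should average to zero"

        test_anomalies_average_to_zero_over_baseline()
        print("sanity check passed")

    Trivial cases like this are exactly the ones the author's eye slides over and a tester starts with.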

    So my point is, it may seem like bugs in the code are not a big deal because ideally they get fixed before software is used or released. You won’t notice they aren’t there because, well, they aren’t there.

    Comment by Maya — 12 Feb 2009 @ 3:17 PM

  162. Maya, #161, so you’re suggesting that work you can do, and are employed to do, should now be done in an entirely new field…?

    Looking to make a lot more money, are we?

    Comment by Mark — 12 Feb 2009 @ 5:56 PM

  163. @161,
    Wouldn’t it behoove one to create and test one’s own code on the data? It seems to be all about theft over toil to me.

    Comment by wildlifer — 12 Feb 2009 @ 6:05 PM

  164. I’m finding the discussion here reminiscent of my own career – 5 years as a postdoc mostly running computational codes of one sort or another, added to my graduate degree in physics, left me with somewhere around 50,000 lines of code I had either written or heavily modified for my purposes (mostly C, some fortran, some perl – this was 15 years ago). A few bits and pieces were original and I put some effort in to make them shareable – graphics and PostScript creation, a multi-dimensional function integrator, etc. A few were done as part of much larger projects and at least ended up under proper revision control as a contribution to that project (that was my intro to CVS). But most were one-off things that tested some hypothesis, interpreted some data file, or were some sort of attempt at analysis. 90% of the time they weren’t a lot of use, and spending extra time documenting would have seemed pretty worthless – I used “grep” a lot to find things later. Sure they could have been made public, but nobody would have any idea what command-line arguments I’d used or the processing steps I’d taken, except in those rare instances where I anticipated my own reuse and created an explanatory “README”. Probably simpler for another scientist to just do it over from scratch than try to figure out what I’d done from looking at the code.

    And now I’m a professional software developer in a group where we have quite rigorous test and development procedures, everything is checked into a version control system and regularly built and run against regression tests to keep things robust. Nevertheless, I still have a directory with hundreds of one-off scripts that fit in that same category of being easier to rewrite than to generalize, and there’s little purpose in making them publicly available or putting them under version control since at most I’ll use them as starting points for other scripts rather than re-using as they are in any significant way.

    I’m not sure whether it was Fred Brooks or somebody else, but the expression I recall reading long ago was that turning a prototype into an internal software product took roughly a factor of 3 more effort, and turning an internal product into something you could publicly distribute (or sell) took roughly another factor of 3 beyond that. Software always falls along this spectrum, and most of what scientists use tends to be at the “prototype” level, simply because of the exploratory nature of science. Theoretically it would be nice to have the resources to keep everything clean and nicely polished, but if 90% of it is code you’re never going to re-use, what’s the point?

    Comment by Arthur Smith — 12 Feb 2009 @ 6:43 PM

  165. Gavin,

    I appreciate the response. I read page 3 and take James Hansen’s clarification to mean that his original answer gave the impression that we had 4 years to solve the problem, which wasn’t what he meant. His solution was to be more careful in how he answered those types of questions. I applaud him for recognizing that he should be more precise when answering questions from the media.

    Thanks for addressing my question.

    -Robert

    Comment by Robert — 12 Feb 2009 @ 8:58 PM

  166. re: #164 Arthur
    Yes, that was Fred. It’s Figure 1.1, the first one in the book, a 2×2 square:
    Program (1), Programming System (3)
    Programming Product (3), Programming System Product (9)

    Comment by John Mashey — 13 Feb 2009 @ 12:24 AM

  167. Thanks to this site for leading me to Spencer Weart’s great
    book THE DISCOVERY OF GLOBAL WARMING.

    I literally could not put it down. So much of the puzzle fell into place for me as I read it. I intend to recommend it (along with this site) in future articles for the Rico Bugle and will mention it on both the ASA (American Scientific Affiliation) mail list and my own web site.

    Comment by John Burgeson — 13 Feb 2009 @ 10:24 AM

  168. On 13 February 2009 at 10:24 AM John Burgeson wrote:

    “Thanks to this site for leading me to Spencer Weart’s great book THE DISCOVERY OF GLOBAL WARMING. .. So much of the puzzle fell into place for me as I read it…”

    If you don’t mind my asking, I am interested in learning which pieces of the puzzle the book helped you with in order to know if it would be worth my time to read. Thank you.

    Comment by Naj Tam Hudat — 13 Feb 2009 @ 12:08 PM

  169. A fair question. My introduction to the whole GW issue took place only recently (late 2006) when I was urged to read and review Al Gore’s book for the Rico Bugle. That review was published in PSCF (the journal of the ASA) and also in the Bugle. A copy of it is at http://www.burgy.50megs.com/inconvenient.htm

    I had read (off and on) other articles and books on the issue, but Gore’s book convinced me that the subject was one in which I needed to invest some serious study. (For the past 15 or 20 years my efforts have been directed to fighting the pseudoscience of the “young earth” advocates.) Once a physicist, I spent most of my life in the computer business with IBM, retiring almost 15 years ago. I know — at my advanced age, what should I care? Well — I have 12 grandchildren!

    Initially my reading was pretty much ad hoc. I wrote about the issue in December 2008 (www.burgy.50megs.com/gw.htm) but at that point I still had not really understood the science-history behind it all.

    Weart’s book filled in the gaps. I had had no idea the work behind GW went back so many years, or had involved so many people. It is simply a great story, and forms, I assert, a foundational understanding for anyone who, having been sucked into the controversy, wants to understand it from the beginnings. The “missing puzzle pieces” were the pre-21st-century efforts, studies and events.

    Again, I recommend it. It is definitely “worth your time” to read.

    Thanks for asking.

    John (Burgy)

    Comment by John Burgeson — 13 Feb 2009 @ 12:36 PM

  170. Re: #168 (Naj Tam Hudat)

    As about a thousand people who regularly read this blog can tell you, Spencer Weart’s Discovery of Global Warming is definitely worth the time to read. As for which pieces of the puzzle will fall into place … ALL of them.

    Comment by tamino — 13 Feb 2009 @ 12:42 PM

  171. Naj, Weart’s book is excellent for placing current science in a historical context. If you want to understand how climate scientists have reached their understanding of the driving forces in climate, this is an invaluable resource. READ IT!!! It’s also a good read.

    Comment by Ray Ladbury — 13 Feb 2009 @ 12:51 PM

  172. To Burgy, tamino, & Ray (#’s 169,170,171) above:

    Thank you very much for your comments. I will definitely look into it.

    Comment by Naj Tam Hudat — 13 Feb 2009 @ 1:53 PM

  173. Naj Tam Hudat,

    You will be interested also in Spencer Weart’s website. RC has a link on the right side of your screen “AIP:Discovery of Global Warming” which I personally find more useful and comprehensive than the book (and obviously more accessible).

    Comment by Chris Colose — 13 Feb 2009 @ 6:32 PM

  174. “Maya, #161, so you’re requesting work that you can and are employed to do should be done in an entirely new field..?

    Looking to make a lot more money, are we?”

    Um, no….. I’m really puzzled by your question. To what field do you imagine software testing is confined? Software = computer code. Certainly some fields use computers more than others, but I honestly can’t think of any scientific endeavor that doesn’t use computers at all.

    “Wouldn’t it behoove one to create and test one’s own code on the data? It seems to be all about theft over toil to me.”

    Theft by whom? The tester? Why? I have a hard time imagining what would be worth the risk. I mean, I won’t swear it never happens, but I’ve never personally encountered a case of industrial espionage. Heck, even insider trading will get you crucified – I shudder to think what would happen to someone caught selling proprietary source code.

    And yes, a developer tests his/her own code, but it’s a basic principle of software development that you also have someone else look at the code. Even in shops that weren’t big enough to have a dedicated QA staff, we checked each other’s code, both in code reviews and in functional testing. It’s just way, way too easy to overlook an error in your own code. I’ve been a developer as well as a tester, so I’m not pointing fingers at developers – it’s just human nature. It’s metaphorically akin to not being able to find your eyeglasses because they’re on top of your head.

    Comment by Maya — 13 Feb 2009 @ 8:22 PM

  175. Maya Says (13 February 2009 at 8:22 PM):

    “And yes, a developer tests his/her own code, but it’s a basic principle of software development that you also have someone else look at the code…”

    Even/especially in development. There’ve been plenty of times when I’ve been beating my brains out over some problem or other, and had someone else look at the code and spot the problem in minutes – and of course I’ve done the same for other people. All too often, you see what you expect, not what’s really there, as for instance repeated word errors in writing…

    Comment by James — 13 Feb 2009 @ 11:54 PM

  176. But the product of software development IS the software.

    The product of software development in science is the science.

    Now, if there ARE errors, they could be found as easily by a proper review of the code as by running the science again with different software.

    After all, fly-by-wire systems use triple redundancy in the hardware, the software and the language used, because their designers believe that even the most rigorous review possible is not enough to establish that there is no error; the only way to do that is to run the same process on the same data with different software and check the results.
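
    A stripped-down sketch of that replicate-and-vote idea (the three implementations are invented, and one is deliberately wrong):

        from statistics import median

        def impl_a(xs): return sum(xs) / len(xs)
        def impl_b(xs): return sum(x / len(xs) for x in xs)
        def impl_c(xs): return sum(xs) / (len(xs) - 1)   # deliberately buggy

        def voted_mean(xs, tol=1e-6):
            results = [impl_a(xs), impl_b(xs), impl_c(xs)]
            m = median(results)
            if sum(abs(r - m) <= tol for r in results) < 2:
                raise RuntimeError("no two implementations agree")
            return m

        print(voted_mean([1.0, 2.0, 3.0, 4.0]))   # 2.5, despite impl_c being wrong

    The voter never inspects anyone's source code; agreement between independently produced results does the work, which is the same logic as independent replication in science.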

    Which, oddly enough, is what science does at the moment…

    Comment by Mark — 14 Feb 2009 @ 8:10 AM

  177. Mark seems to think I am a delayer, because I have a disagreement with Gavin.

    In fact, as far as policy goes I think I’m pretty much in Hansen’s camp. CO2 emissions need to be restrained as vigorously and as early as is feasible. I consider that to have been established by 1990. Whatever further climate research is or should be about, it is not that. (We can call on new science to argue about what the safe level should be, but that is no argument for delay in getting started on a new trajectory.)

    So it’s peculiar to be cast as a member of the bad guys on the basis of a secondary disagreement. It’s both amusing and disturbing. Perhaps Mark is making this mistake because he has an oversimplified model of public discourse which has probably been set by the horribly polarized politics of the US over the past twenty years.

    All I’m saying here is that the argumentative position taken by Eric and Gavin is one they ought to loosen their grip on.

    Much talk here has been about publication and portability of code. These indeed are thorny issues. However, the idea that one should be able to replicate published results locally should not be controversial. What’s more, for desktop codes on commercial platforms like Matlab, this is easy to achieve. That’s all I’ve been trying to say here. Arguments to the contrary on those points seem awfully strained to me.

    My reason for concern and active participation here is that arguments that seem plausible to scientists and implausible to practicing engineers weaken the reputation of the field among influential groups where it matters. This stems from a mutual disrespect between scientists and engineers that is very counterproductive. As someone who is a little of both, it hasn’t helped me any, either.

    Comment by Michael Tobis — 14 Feb 2009 @ 10:57 AM

  178. Gavin,

    I’m glad you started this discussion. Clearly a lot of people who are interested in climate science, but are not scientists themselves, underestimate the inherent difficulty of perfectly duplicating a result that appears in a peer-reviewed scientific journal. Nevertheless we must exercise due diligence in presenting our results in ways that make them reproducible.

    To me the key distinction is between “data sharing” and “data auditing.” The former is essential for science to progress. The latter, except in special cases where misconduct is reasonably suspected, just slows things down.

    Regards,
    Curt

    Comment by Curt Covey — 14 Feb 2009 @ 3:20 PM

  179. http://www.google.com/search?q=define%3Asciolist

    Comment by Hank Roberts — 14 Feb 2009 @ 9:54 PM

  180. So, how’s the science coming?

    Comment by Walt Bennett — 14 Feb 2009 @ 10:54 PM

  181. #178 Curt Covey (14 February 2009 at 3:20 PM )
    I think the argument that Michael T has been stressing is that some scientists underestimate the inherent ease “of perfectly duplicating a result that appears in a peer-reviewed journal.” Some scientific fields simply haven’t embraced the standards and work practices that would enable this. To argue otherwise exposes those claiming it is difficult to a loss of credibility, which casts a shadow on the substance of their research.

    Comment by Bernie — 15 Feb 2009 @ 12:12 AM

  182. Why must we wait another 5 years for the IPCC 5th assessment report; why can’t it be fast-tracked to 2013 or earlier? My understanding is that the 4th IPCC assessment did not mention many positive feedback loops, owing to poor understanding of the mechanics behind them: the 1 GTonne of methane and CO2 tied up in the melting Arctic tundra; the fact that more equatorial forest is burning, turning a vital carbon sink into a very efficient carbon source; and the fact that low-altitude ocean wind currents are increasing, bringing CO2-saturated water to the surface that can’t absorb much more atmospheric CO2. None of these factors were mentioned in the last report. Oh, I almost forgot: China and India are steaming ahead, despite a global financial crisis blip, at spewing millions of additional tonnes of soot and CO2 into the air from uncontrolled reliance on coal-fuelled industry and power stations. If you feed those positive feedback loops into the equation of how long we have until the point of no return is reached, I think you’ll find we have crossed that fateful mark already. What do you think? Unstoppable? Yep, good description.

    Comment by Lawrence Coleman — 15 Feb 2009 @ 2:53 AM

  183. Bernie, Bernie, I says, who but the scientist knows or needs the science? Surely, since they are the ones who have to use the paper to replicate or refute the result, and since the code doesn’t have to follow the documentation (see for example MSDN code that doesn’t work), should they not be the ones considered when weighing the relative work/benefit trade-off?

    Comment by Mark — 15 Feb 2009 @ 7:04 AM

  184. Re 177, no, I think your idea is a delaying one. Stop trying to work the victim angle. You’re not. Your idea isn’t going to do much more than slow things down.

    In fact, if you figure that it is already proven, there’s no need to pursue more funding to investigate, and no need to re-prove old findings. And the idea you’re peddling is just a way to get more work (which requires more funding).

    Comment by Mark — 15 Feb 2009 @ 7:13 AM

  185. Duplication in the climate sciences must be extremely difficult; it’s not easy to compress our biosphere down to test-tube size. It’s very dangerous as well, because if two results are out even slightly you will tend to cast suspicion on the entire experiment. Nothing in our biosphere is exactly the same moment by moment. There is way too much complex interaction by myriads of variables, from the atomic level up to entire weather systems. If we wait and wait and delay and procrastinate and call for other tests and other universities to confirm previous experiments with precise duplication/replication, and do nothing to address climate change, we are history, simple as that.
    This is where governmental logic must play a vital part. If Barack Obama understands the mechanics and gist of climate change without having to wait for every scientific/bureaucratic ‘i’ to be dotted and ‘t’ crossed, and takes swift and decisive action, we might have a glimmer of a chance, but not if he waits for a certain unrealistic degree of accuracy or perfect replication in the data.

    Comment by Lawrence Coleman — 15 Feb 2009 @ 10:08 AM

  186. #184
    Markie Mark: I am confused as to the meaning of your comment. Which scientists are you referring to? (And who decides?) You also seem to be arguing that there is some significant burden in documenting the code and packaging the data. This simply does not seem to be the case in the vast majority of instances.

    Comment by Bernie — 15 Feb 2009 @ 10:50 AM

  187. You also seem to be arguing that there is some significant burden in documenting the code and packaging the data? This simply does not seem to be the case in the vast majority of instances.

    Then why does it cost commercial software companies so much money?

    Comment by dhogaza — 15 Feb 2009 @ 12:16 PM

  188. > this simply does not seem

    Baloney, read the thread. The nonscientists can’t see the problem any scientist has with the notion of freely giving them everything they need to act like scientists — in a cookbook recipe form tied up with a bow. You want help to act like a scientist without knowledge.

    If it were that easy, you’d have been taught how to do real science in grade school. Instead they tried to teach you basics. Guess why.

    Comment by Hank Roberts — 15 Feb 2009 @ 12:18 PM

  189. Dr. Schmidt,

    Could you describe the process that you used to convert the RSS data for use in your paper. I note that both RSS and UAH are on 2.5 degree grids, but I believe the surface temperature data is on a 5 degree grid.

    I’m asking because the RSS data that you provided and the UAH data that Dr. McKitrick provided show essentially identical trends over the time period (.237 versus .232), but they have very different standard deviations (.183 versus .133).

    Given that both of these are measuring the same thing, and that their trends are essentially identical, I’m wondering what is causing the difference in standard deviations between the two provided anomaly trends.

    I note that I have posed the same question to Dr. McKitrick over at CA.

    Comment by Nicolas Nierenberg — 15 Feb 2009 @ 1:27 PM

  190. Hank:
    I followed the thread. Your “cookbook” argument is pure unadulterated evasion, a head fake. So is the pretense that it takes too much time, or that “scientists” need to write undocumented code. It is a silly and juvenile argument. Michael T called Eric on it. Times have changed. Expectations have changed. Standards have changed. This is an argument those who decline to post their code and data cannot possibly win.

    Comment by Bernie — 15 Feb 2009 @ 5:02 PM

  191. To the points argued here about replication.

    I am currently looking at the Schmidt paper and the MM paper. Both authors have provided data sets. But as I looked at the data sets I was surprised to find that the distribution of data was different in the summarized RSS data than in the summarized UAH data even though the mean trends were identical.

    At the moment I can’t tell if this was an artifact of the original data, or the way that it was summarized to produce the decadal trends on these particular grid locations.

    The fact is that I don’t have the code or method that Dr. Schmidt used to summarize the RSS data. I have some old files written in an obscure language that Dr. McKitrick has pointed me to.

    Now, if I want to investigate, I am forced to do work to try to duplicate the methodology used by each author, and I won’t know I am doing it correctly until I get something that matches their data. Wouldn’t it have been nice if they had used the same summarization code, or commented on why that wasn’t appropriate?

    I want to point out that it is really no big deal in the grand scheme of things, but it is a good example.

    Comment by Nicolas Nierenberg — 15 Feb 2009 @ 5:07 PM

  192. Bernie, First, who says scientists are writing undocumented code? It is merely that the documentation is not intended for an outsider. My PhD was in another computer intensive area–experimental particle physics. Each group writes most of its own analysis, precisely because each analysis is unique. In our group, we usually had multiple people looking for the same particles, and we were careful not to share code even within our own group in order to preserve the independence of the efforts. Eventually, particle physics did start producing common code–for example, the GEANT package of Monte Carlo and analysis codes. The point is that when the need arises, the solution will be developed by the scientists themselves, and it will be a helluva lot better than any solution imposed from without by a bunch of self-appointed “auditors”.
    I would suggest that if you are interested in the subject, you start with learning the science. You will then be able to produce your own code.

    Comment by Ray Ladbury — 15 Feb 2009 @ 8:29 PM

  193. Dr Nierenberg (#191)

    Although I do not work in the area of climate science, what I will do when faced with such a situation in my own research is
    (i) mail the authors,
    (ii) irrespective of the reply, test the algorithm for both the grids. The fact that you find different results is a more interesting observation than mere duplication (note: not confirmation) of the results would have been. This means that either the results are not robust, or they might depend on some other factor. Figuring out why is where the next step in our understanding would lie.

    In passing, your comment about the code from Dr McKitrick being in an obscure language illustrates one of the major arguments against the idea that providing code is in any way required for, or useful to, science. Coding languages change, software versions change (not always in a backward-compatible manner), and even algorithm implementations change. A mathematical description is the only useful thing beyond a couple of years. I would even venture that, with the proliferation of versions of software tools, as well as the volume of code that any functional research group invariably produces, a mathematical description is the only thing that I would advise be made public.

    Comment by trying_to_make_sense — 15 Feb 2009 @ 9:33 PM

  194. Re 191

    I am currently looking at the Schmidt paper and the MM paper. Both authors have provided data sets. But as I looked at the data sets I was surprised to find that the distribution of data was different in the summarized RSS data than in the summarized UAH data even though the mean trends were identical.

    UAH and RSS use quite different approaches in their coverage of the Antarctic, probably accounting for the seasonal difference between the two; perhaps that is related to your problem?

    Comment by Phil. Felton — 15 Feb 2009 @ 10:21 PM

  195. This whole discussion seems to center on the lack of the programs used to arrive at the conclusions in published papers. I suggest that the place to look is not in the papers, but rather in the dissertations of the grad students who are listed as authors. I recall that the last several score pages of my dissertation were a listing of code that I used. In my grad school days, some decades ago, when I wanted to steal code and for some reason could not get it directly from the principals, I found that University Microfilms (in the USA) was my friend. (I must say that I usually regretted stealing the code and rapidly wrote my own…) And these days, one can usually find the code in the web directories of the usual suspects.

    Comment by sidd — 16 Feb 2009 @ 12:31 AM

  196. Re #182: “Why must we wait another 5 years for the IPCC 5th assessment report?”

    Lawrence, your plea has been answered! Next month in Copenhagen there will be held what amounts to a “mid-course correction” conference for the IPCC. We’ll be hearing a lot about it, I’m sure.

    Re #191: “Now if I want to investigage I am forced to do work to try to duplicate the methodology used by each author, and I won’t know I am doing it correctly until I get something that matches their data.”

    But if you come up with a valid method that gets a different result, you’ve learned something important. It seems to me that if you’re not doing something along those lines, you’re doing “auditing” rather than science. I think this nicely illustrates Curt Covey’s point in #178.

    Comment by Steve Bloom — 16 Feb 2009 @ 12:50 AM

  197. Re 190, events leading up to it: I think we’ve just witnessed the birth of a brand-spanking new red herring. How touching.

    I’m sorry to see how much time has been spent on this topic by the tiny handful of people actually performing useful work on the topic of climate change. I’m even sorrier to imagine the continued drain on time as disingenuous and strangely inconsistent demands for climate simulation-related code multiply like rabbits.

    All this talk of textbook perfect documentation and transmission of what are essentially one-off runs is useless smoke; I’m sure there are a few people who can’t help but actually care deeply about buttoned down collars and perfectly knotted ties but in point of fact most of us prefer to use our time more productively.

    In any case, all this generalized discussion of how nice it would be to have prettified code sort of begs the question of whether anybody asking for it has even the most rudimentary capability of understanding what it does. Personally, I doubt it; otherwise they’d be publishing in the climate field.

    At the end of the day, the only thing that really matters is data. It’s up to the peanut gallery to come up with their own processing methods.

    Comment by Doug Bostrom — 16 Feb 2009 @ 3:21 AM

  198. Re #182, I doubt that scientists will comment on anything that is not knowable in this realm. Lots of things that happen on our planet are short-term change, natural variability, and hence all the talk of a faster-warming Arctic and the potential of large-scale methane clathrate release may or may not come to pass. The pro-climate-change media seem to be stating that it is a bona fide fact that it will be released, but it is not a scientific fact as yet. When the Arctic sea ice melted faster than predicted in 2007, it could have been due to AGW or it could have been due to natural variability, and within a few years the sea ice may indeed recover sufficiently that the predictions of AGW turn out to be right after all.

    The IPCC has got to be sure on this and cannot just state that their models are conservative relative to the real world and that warming is happening faster than predicted. Their next report is due in 2012, which is fine, as it takes all this time to collate all of the information. Let us get the science right and put to bed the arguments the media loves: opinion and debate on the subject.

    Comment by pete best — 16 Feb 2009 @ 5:07 AM

  199. Dr. Schmidt, in your paper you state the following criterion for evaluating Ross’s correlations:

    “If the distribution encompasses the observed correlations, then the null hypothesis (that there is no contamination) cannot be rejected.” (Emphasis added)

    Since the distribution of correlations that you generated does not encompass the correlations found by McKitrick, this means that logically, by your own criterion, we must reject the null hypothesis and conclude that there is contamination in the data. Why, then, have you (apparently) abandoned the very criterion you stated in the paper and instead concluded that the McKitrick correlations are “spurious”?

    [Response: The statement you quote is true and applies in general, in particular, to the discussion of the de Laat and Maurellis paper. It is not however the sole reason why a correlation might be spurious. If you read the section regarding MM07, the issue there is that the quoted significance is likely to be very strongly over-estimated:

    ... the preponderance of nominally significant correlations certainly implies that the reported F -test values are not a fair assessment of the hypothesis put forward by MM07. We find that supposedly 95% significant correlations to ‘g’ and ‘e’ (in experiment G3) occur in 3 and 4 (respectively) simulations out of 9, roughly 7 times as often as should be expected if the ‘significance’ test used by MM07 had even its minimum reported power. This clearly demonstrates that there are far fewer degrees of freedom in these correlations than they assumed.

    It would be nice to have a good estimate of the distribution of the correlations of the null hypothesis in this case, but I doubt it can be properly defined with only 5 model runs. I would encourage someone interested in pursuing this further to start looking at the full set of AR4 model runs to get a better handle on it. Until then, it will be unclear how the line you quote applies to MM07. - gavin]
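
    As a back-of-the-envelope illustration of what fewer degrees of freedom do to such a test (the AR(1)-style adjustment and all the numbers below are rough assumptions for illustration, not values from either paper):

        from math import sqrt

        def n_eff(n, r1, r2):
            # common rule-of-thumb correction for two autocorrelated fields
            return n * (1.0 - r1 * r2) / (1.0 + r1 * r2)

        def r_crit_5pct(n):
            # |r| needed for nominal 5% (two-sided) significance, normal approximation
            t = 1.96
            return t / sqrt(n - 2 + t ** 2)

        n = 400                     # assumed number of grid cells in the regression
        for r in (0.0, 0.5, 0.8):   # assumed autocorrelation of both fields
            ne = n_eff(n, r, r)
            print(f"r = {r}: N_eff ~ {ne:5.0f}, |r| needed for 'significance' ~ {r_crit_5pct(ne):.2f}")

    The stronger the shared autocorrelation, the smaller the effective sample and the larger the correlation needed before "significance" means anything.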

    Comment by Michael Smith — 16 Feb 2009 @ 6:36 AM

  200. Bernie-wernie, #186, it isn’t needed. It’s probably *nice* to have, but it isn’t needed and, as I say time and again (though nobody seems to be reading it), fly-by-wire systems don’t rely on proven code methodologies. They rely on replication of results, and on voting about what to do, to make things reliable.

    Comment by Mark — 16 Feb 2009 @ 6:45 AM

  201. There is an interesting discussion on an attempt to replicate Steig et al Antarctic warming going on now in the blogosphere. Perhaps Gavin would like to comment.

    http://wattsupwiththat.com/2009/02/15/redoing-steig-et-al-with-simple-data-regrouping-gives-half-the-warming-result-in-antarctica/

    Comment by captdallas2 — 16 Feb 2009 @ 7:29 AM

  202. Commentary for Michael Smith:

    “If the distribution encompasses the observed correlations, then the null hypothesis (that there is no contamination)cannot be rejected”

    This statement has the logical form:

    If X, then not-Y

    This is equivalent to:

    If Y, then not-X.

    But it implies nothing about the case:

    If not-X, then…

    i.e., it says nothing, “logically” or otherwise, about:

    If the distribution does NOT encompass the observed correlations, then…

    Comment by paulina — 16 Feb 2009 @ 9:01 AM

  203. Re: 198. Thanks Pete, you’ve given me well-moderated answers to my questions in the past, but in relation to my argument in 185: what degree of duplication is necessary for a hypothesis to become scientific fact, how many independent studies must be undertaken, and how many valuable years will this take (I almost used the term ‘waste’)? If we wait for governmental agencies to get 100% certainty as to what degree climate change is natural variability and what degree is anthropogenic, we’re cooked already. The luxury scientists had in the past of being fastidious is NOT relevant anymore for this subject; we are in unprecedented times.

    If a raging forest fire is hurtling towards your house at 50 miles/hr (very relevant for me in Australia, although thank god I don’t live in Victoria!), even a Nobel-winning scientist is not likely to go and measure relative humidity, which vectors and variability the wind is coming from, the relative darkening of the sky, or a spectrograph of the brightness of the glow on the horizon. You are going to grab your most valuable things and your family, jump in your car, and get the hell out of there, taking decisive action that will ultimately save your life and your family’s. I can see no difference at all with climate change and the response we all have to take.

    Sorry for being blunt, but you would have to be the biggest numbskull not to believe that CC is caused primarily by human factors. 400,000 years ago the water level was 21 ft higher than today (recent studies in Bermuda), with a similar and corresponding rise in CO2; today CO2 is over 25-30% higher than 400 kyr ago, and guess what, sea level rise is again accelerating year by year. We simply cannot wait for the type of replication scientists were used to in the past. The scientific community and the media have to make it blindingly obvious to any policy maker that the time to take decisive action is TODAY.

    Comment by Lawrence Coleman — 16 Feb 2009 @ 9:21 AM

  204. Hank #107, tamino #108: this is somewhat relevant.

    http://www.paulgraham.com/opensource.html

    Comment by Martin Vermeer — 16 Feb 2009 @ 9:28 AM

  205. RE 201,

    Tell him to publish it in a scientific journal so we can see what it is worth… I don’t have time to check out all the “tries” that are done on blogs.

    Comment by Magnus Westerstrand — 16 Feb 2009 @ 9:47 AM

  206. Magnus, yeah.

    But off-the-cuff, one obvious problem with the method proposed is that the “simple data regrouping” also transmogrifies the empirical correlation matrix of the data…

    Remember the RegEM is based on using the empirical correlation directly between the station data. That already includes implicitly the long range spatial correlations present in the data, in a theoretically optimal way.

    Now you move around some points, by up to 550 nm (or more?) and merge a number of them into new points, averaging their data. The new “virtual stations” will have a vastly smeared out correlation matrix, with probably a much longer correlation range. IOW where the data is sparse, in the East, it will also get a higher areal weight. I think. Or something.
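
    A toy numerical version of that smearing argument, with entirely synthetic "stations" laid out along a line (nothing here is the real station or satellite data):

        import numpy as np

        rng = np.random.default_rng(3)
        n_t, n_s = 600, 8                        # time steps, stations along a line
        dist = np.abs(np.subtract.outer(np.arange(n_s), np.arange(n_s)))
        cov = np.exp(-dist / 1.5)                # correlation decays with separation
        series = rng.multivariate_normal(np.zeros(n_s), cov, size=n_t)

        # correlation between two individual stations three units apart
        before = np.corrcoef(series[:, 2], series[:, 5])[0, 1]

        # merge stations 1-3 and 4-6 into two "virtual stations" by averaging
        merged_a = series[:, 1:4].mean(axis=1)
        merged_b = series[:, 4:7].mean(axis=1)
        after = np.corrcoef(merged_a, merged_b)[0, 1]

        print(f"corr(station 2, station 5)   = {before:.2f}")
        print(f"corr(merged 1-3, merged 4-6) = {after:.2f}  (higher: the matrix is smeared)")

    The merged points end up more strongly correlated with each other than the original stations were, so any infilling based on the new matrix sees longer-range correlations than the data actually support.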

    Why oh why re-do something the method is already supposed to be doing right?

    [Response: Do you really need an answer to that? - gavin]

    Comment by Martin Vermeer — 16 Feb 2009 @ 10:29 AM

  207. I’m new to RealClimate, so perhaps I’m asking for something that is easily available but just don’t know where to find it. I would like to be able to understand how the various computer programs that simulate future climate are put together. If possible I’d like to be able to look at some of the code. Can you point me to a source? Thanks.

    [Response: NCAR CCSM or GISS ModelE are freely available for download (the former is prettier than the latter). - gavin]

    Comment by Bill Hamilton — 16 Feb 2009 @ 10:46 AM

  208. There is an interesting discussion on an attempt to replicate Steig et al Antarctic warming going on now in the blogosphere

    Watts et al are science illiterates, there’s really no point for adults to waste their time with their pseudo-scientific endeavors.

    Comment by dhogaza — 16 Feb 2009 @ 10:46 AM

  209. I find this discussion about code and data to be very entertaining – and I’d suggest that the same level of scrutiny be applied to the various clean coal proposals that are circulating through Congress, especially the issue of FutureGen, the public-private partnership between coal and mining and electric utility interests and the Department of Energy, set up in 2003 and managed by the contractor Battelle (who also operates the National Renewable Energy Lab for the the DOE).

    There are no publications on the supposed technology, and no data on performance has been made available. Even though the project was set up with DOE funding, all data on the technology has been kept secret because it is “proprietary” – and the recent stimulus package included $2 billion for FutureGen. There is certainly a large issue there, right?

    Most likely, the technology just doesn’t work and the project is simply being promoted as a greenwashing effort by the coal industry and a cash cow for whoever gets the contract. In particular, the effort to sequester the emissions is doomed to failure, which we’ve explained on simple mass balance and thermodynamic grounds: there’s no place to store the CO2, and capturing all of it would probably suck up 90% of the power plant’s electric output, meaning 1/10th the power per ton of coal compared to a modern dirty power plant.

    What is obvious is that the world will have to voluntarily stop burning coal, and that means shutting down the coal-fired power plants that do exist and replacing them with solar and wind energy. This will be a gigantic task; coal-fired electricity generation runs at something like 2,000,000 GWh per year, while wind is around 20,000 GWh and solar is near 500 GWh, and that’s just in the US. There is no way a ‘market-based approach’ is going to change that situation.

    For a good discussion of the political manipulation that results in $2 billion for a coal plant and nothing at all for solar and wind demonstration systems, despite popular demand, see this:

    At the same time Peabody Energy celebrated an eightfold increase in profits last quarter and announced its intention to reopen the controversial and widely denounced strip mine on tribal lands on Black Mesa in Arizona, Sen. Durbin has been arm-twisting Department of Energy Secretary Stephen Chu and President Barack Obama into subsidizing Peabody’s–and a host of the world’s largest extraction companies–FutureGen boondoggle.

    Another coal plant that should be halted is the Desert Rock coal plant – indeed, a public moratorium on coal plant construction should be implemented, and a plan for closing down and replacing every coal-fired plant in the U.S. should be drawn up immediately.

    Comment by Ike Solem — 16 Feb 2009 @ 11:00 AM

  210. Yup. Wossname up there doesn’t like the recipe/cookbook metaphor and thinks that’s a herring of the reddish variety. Block that metaphor.

    Michael Tobis and I both saw the original exchange as a cookbook/recipe issue, though from different perspectives.

    The exchange escalated fast; it went something like

    —I want the code (reply, see the Matlab code)
    —I need _your_ code (reply, I’m packing, you can take my class)
    —I mean the _Antarctic_ code (reply, leaving for Antarctica now)
    —I run a company, give me it all (beep…please leave a message)

    Tobis didn’t recognize jeffid. He answered discussing what’s ideal.

    Cookbook/recipe demand? Yeah, I think that’s still about right.
    A good cookbook includes general instructions; a good recipe includes specifics about amounts. Between chefs, a respectful answer may well be “I got the ingredients _there_, of course you know what to do.”

    Ideally, each question is answered by providing a cookbook as well as the recipe, and perhaps even a guide to how to find a kitchen. But between chefs, that would be a snarky answer. Ya can’t win, when you don’t know the individual who’s asking the question and why.

    How much detail and direction? It always depends on the cook:

    http://www.williamrubel.com/wordpress/2008/03/01/first-catch-your-hare/

    Comment by Hank Roberts — 16 Feb 2009 @ 11:09 AM

  211. Re #203, I muse around the web and post on several sites about this topic, but James Hansen testified in 1988, it’s now 2009, and look at the CO2 mitigation achieved: errr, none; emissions have increased globally. Politicians talk the talk but the walk is small, and in Australia, a baking rock with plenty of solar potential, it’s essentially zero. Its 22 million people pump out around 330 million tonnes a year, I believe; their per capita emissions are very high, although in global terms not that bad, I suppose.

    The USA, Europe and Australia (and New Zealand) have got nowhere in emissions mitigation, and have hooked India and China and many others besides. It’s all a bit tragic, and total renewable energy for the world is around 1%.

    Comment by pete best — 16 Feb 2009 @ 11:33 AM

  212. PS, I commend this — good common sense advice for anyone who has a problem figuring something out:

    http://www.chiark.greenend.org.uk/~sgtatham/bugs.html

    (Hat tip to Bi, over at Michael Tobis’s thread)

    Comment by Hank Roberts — 16 Feb 2009 @ 11:42 AM

  213. Ike:”I find this discussion about code and data to be very entertaining – and I’d suggest that the same level of scrutiny be applied to the various clean coal proposals that are circulating through Congress…”

    I agree, but am not surprised that politicians would respond as you discuss. How do you propose to change their response?

    Comment by Steve Reynolds — 16 Feb 2009 @ 11:43 AM

  214. trying_to_make_sense,

    It’s Mr. Nierenberg, but thanks for the upgrade.

    I actually have mailed Dr. McKitrick and he has been kind enough to answer. I don’t have Dr. Schmidt’s email address, and so far he hasn’t answered here.

    In this particular case the code isn’t as important to me, as an explanation of how the data was summarized from the underlying source. There is more than one way to do it, and for me it is a nuisance trying to explore and find out how each author did that.

    Neither paper explains the process. I believe that they thought it obvious, but even a first look for me shows that different choices could be made. So far I guess I haven’t made the same choices.

    As to code itself, it is the most accurate explanation generally, but I agree it isn’t the only way.

    [Response: I'm looking into it and I'll let you know. - gavin]

    Comment by Nicolas Nierenberg — 16 Feb 2009 @ 12:01 PM

  215. An aside related to the fire issue mentioned farther up:

    Who is responsible? “Environmentalists”. But the image Ball uses to explain the Australia fires is priceless. An authority on Australian climate and environment can’t even take the time to find a southern hemisphere textbook image?

    http://www.canadafreepress.com/images/uploads/ball0216-2.jpg

    http://canadafreepress.com/index.php/article/8504

    “Environmentalists played a role in disastrous Australia fire”,
    By Dr. Tim Ball Monday, February 16, 2009

    Comment by Dan — 16 Feb 2009 @ 12:29 PM

  216. Sorry – I didn’t mean to change the subject – the fire comment started me reading on the factors affecting this event, and I ran across this silly figure.

    Comment by Dan — 16 Feb 2009 @ 12:35 PM

  217. Re 201,
    Sorry, I couldn’t force myself to click the link. It’s too early in the week to endure the intellectual pain of a visit to WUWT. However, the title in the link suggests that the denizens have grown soft. Reduce the warming by just *half*? Back in the good ol’ days they’d have settled for nothing less than proving a complete reversal of the claimed trend, with falsification of the entire dataset as a bonus.

    Recaptcha: tales, 7:00 News

    Comment by spilgard — 16 Feb 2009 @ 12:39 PM

  218. Off-topic, but George Will is at it again, spouting much of the same garbage that the RealClimate folks took him to task for over 4 years ago!

    See http://www.washingtonpost.com/wp-dyn/content/article/2009/02/13/AR2009021302514.html for Will’s latest.

    Now jump into the realclimate “wayback machine” and go back to 2004: http://www.realclimate.org/index.php?p=91

    If I were a climate-scientist having a beer with Will, I’d tip the bartender generously and ask him/her to dip Will’s beer mug in soapy dishwater before filling it.

    Comment by caerbannog — 16 Feb 2009 @ 1:55 PM

  219. Re #208: “There is an interesting discussion on an attempt to replicate Steig et al Antarctic warming going on now in the blogosphere.”

    “Watts et al are science illiterates, there’s really no point for adults to waste their time with their pseudo-scientific endeavors.”

    I don’t disagree. There is a cliche in my neck of the woods: “Even a blind hog finds an acorn once in a while.”

    Comment by captdallas2 — 16 Feb 2009 @ 6:04 PM

  220. One of the primary points of the WUWT discussion is that RegEM assumes that the missing data locations are random in nature. Since the missing data in the Steig paper isn’t random (almost all the interior is “missing”), then is it proper to use a method that assumes otherwise?

    [Response: No. The issue isn't that the data have to be randomly missing in time or space, but that the value of the missing data is unrelated to the fact that it is missing. - gavin]

    On a similar note, and more on topic for this thread, when you use someone else’s code, it’s incumbent on you to understand the assumptions that go along with that code. Too often those assumptions are not clearly spelled out.

    Comment by dean — 16 Feb 2009 @ 6:40 PM

  221. Dr Schmidt,

    which of your papers are little things? And which not?

    regards

    Al

    Comment by AL — 16 Feb 2009 @ 7:11 PM

  222. Nicolas , I went over what I did, and your guess was correct – the data I archived was sampled from the 2.5×2.5 MSU data, not averaged over the 5×5 CRU grid box. I also did it the other way – which is probably better – and re-did the analysis. It makes the coherence of the surface data to the MSU data stronger, but otherwise leaves the analysis untouched in any significant way. If you’d like to investigate further, I updated the supplemental data to include the averaged RSS data and added the results to the readme file.

    Comment by gavin — 16 Feb 2009 @ 7:14 PM

  223. 220 Gavin’s response.
    [Response: No. The issue isn’t that the data have to be randomly missing in time or space, but that the value of the missing data is unrelated to the fact that it is missing. - gavin]

    Incorrect; on WUWT the major issue is that RegEM doesn’t deal adequately with spatial averaging and/or weighting. Both CA and WUWT mention concerns about how RegEM can handle non-random data infilling, but that has little to do with replication, the topic of this thread, I believe. In replicating Steig, the Jeffs had difficulty determining how the occupied weather stations and AWS were spatially dealt with by RegEM. That is interesting and deserves more than a wave.

    [Response: WUWT is very confused. The whole point of RegEM is to impute (strange verb) what the spatial correlations are rather than assuming what they should be a priori. RegEM knows nothing about the distance between any points, all it knows about are the similarities in variability between stations. How you combine the reconstruction to form a regional average afterwards is a whole other issue. By the way, my answer was not incorrect - your question however might not have been what you really wanted answering. - gavin]

    Comment by captdallas2 — 16 Feb 2009 @ 7:43 PM

  224. Gavin’s response to 223
    [Response: WUWT is very confused. The whole point of RegEM is to impute (strange verb) what the spatial correlations are rather than assuming what they should be a priori. RegEM knows nothing about the distance between any points, all it knows about are the similarities in variability between stations. How you combine the reconstruction to form a regional average afterwards is a whole other issue. By the way, my answer was not incorrect - your question however might not have been what you really wanted answering. - gavin]

    I have no doubt they (WUWT) are often confused (frustratingly often). But there is little comfort in knowing that RegEM is clueless about distances or the number of inputs spatially. How RegEM was used spatially in the Steig paper is the question. With 36 to 40 percent of the stations in the west, how were they weighted? Not clear, in my opinion, and that is something required for replication. Hopefully my question is clearer now. To add: your answer was not incorrect; the question was, for this thread at least.

    [Response: But "how RegEM was used spatially" is a meaningless statement. RegEM takes a data matrix and through an iterative process fills in the missing data based on the correlations of the existing data. It has no clue whether it is dealing with apple sales, tax filings or widgets. If a column in the matrix shows a coherent connection to another one, then it will be used to fill in data where needed. In this case, it is likely that WAIS stations will be more connected to the Peninsula data than to that on the opposite coastline; RegEM will therefore fill in preferentially from there. This is simply a reflection of the local weather patterns (and is seen more clearly in the satellite reconstruction). Ask yourself why the trans-antarctic mountains are so prominent in the figures. These mountains are not put in by hand; they are simply the natural demarcation for the various influences. Thus RegEM decides for itself what the radius of influence of any one station is. - gavin]
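
    (To make the mechanics above concrete, here is a stripped-down sketch of covariance-based infilling in Python: plain EM imputation of a Gaussian data matrix with a small ad hoc ridge for stability. It is not the actual RegEM code used in the paper, which adds regularization and error estimates, but it shows the key point that only column covariances, never station locations, enter the calculation.)

    import numpy as np

    def em_impute(X, n_iter=50):
        # Fill NaNs in data matrix X using only the mutual covariance of the
        # columns; a toy stand-in for RegEM, which adds regularization.
        X = np.array(X, dtype=float)
        miss = np.isnan(X)
        X_filled = np.where(miss, np.nanmean(X, axis=0), X)   # start from column means
        for _ in range(n_iter):
            mu = X_filled.mean(axis=0)
            C = np.cov(X_filled, rowvar=False)
            for i in range(X.shape[0]):
                m, o = miss[i], ~miss[i]
                if not m.any() or m.all():
                    continue
                # conditional expectation of the missing values given the observed ones
                Coo = C[np.ix_(o, o)] + 1e-6 * np.eye(o.sum())   # ad hoc ridge
                Cmo = C[np.ix_(m, o)]
                X_filled[i, m] = mu[m] + Cmo @ np.linalg.solve(Coo, X_filled[i, o] - mu[o])
        return X_filled

    # toy usage: three correlated "stations", one with a gap in half its record
    rng = np.random.default_rng(0)
    base = rng.normal(size=200)
    data = np.column_stack([base + 0.1 * rng.normal(size=200) for _ in range(3)])
    data[:100, 2] = np.nan
    print(em_impute(data)[:3, 2])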

    Comment by captdallas2 — 16 Feb 2009 @ 8:26 PM

  225. [But “how RegEM was used spatially” is a meaningless statement. RegEM takes a data matrix and through an iterative process fills in the missing data based on the correlations of the existing data. It has no clue whether it is dealing with apple sales, tax filings or widgets. If a column in the matrix shows a coherent connection to another one, then it will be used to fill in data where needed. In this case, it is likely that WAIS stations will be more connected to the Peninsula data than that on the opposite coastline, RegEM will therefore fill in preferentially from there. This is simply a reflection of the local weather patterns (and is seen more clearly in the satellite reconstruction). Ask yourself why the trans-antarctic mountains are so prominent in the figures? This mountains are not put in by hand, they are simply the natural demarcation for the various influences. Thus RegEM decides for itself what the radius of influence of any one station is. - gavin]

    Interesting. The satellite data were IR, not RSS or MSU, so the replicators could not gather the data needed to complete the replication. Do you know if those data are available online? Also, if the stations in each common area are averaged prior to RegEM input, pre-weighting them spatially, there appears to be a significant difference in the results. How significant do you feel that is to the analysis?

    [Response: Joey Comiso is apparently working on the data preparation along with sufficient explanation to make it usable. I have no particular insight into the timetable. If all the stations being averaged together are coherent, then pre-averaging shouldn't make that much difference. If instead there are some dud stations/bad data mixed in, they will corrupt the average and reduce the coherence with the far-field stations. Bad stations in the standard approach should be incoherent with any variability in the far-field stations and so shouldn't affect the result much. You'd be better off removing them rather than trying to average them away, I would think. - gavin]

    Comment by captdallas2 — 16 Feb 2009 @ 9:00 PM

  226. Thanks Gavin, I’ll fish on it a while before I respond. Have a good night.

    Comment by captdallas2 — 16 Feb 2009 @ 9:50 PM

  227. This is actually off-topic but I thought this might interest you. Lord Monckton recently put online an article here:

    http://scienceandpublicpolicy.org/images/stories/papers/monckton/temperature_co2_change_scientific_briefing.pdf

    On page 3 there is a graph with the words:

    “It may be the sun: a strong anti-correlation between intensity and radiosonde temperatures over the past 50 years. Source: Svensmark and Friis-Christensen, 2007.”

    However, [edit - please be polite] the graph shows a heavily edited temperature trend with:

    1.) removed ENSO
    2.) removed effects of volcanic activity
    3.) REMOVED WARMING TREND OF 0.14 C PER DECADE!

    I saved the pic just in case it is removed:

    http://2.bp.blogspot.com/_KMFU6wbcd2Y/SZrdC1ZS55I/AAAAAAAAAJ4/bgaDsMf0sDA/s400/SvensmarkTrop-Monckton.jpg

    You can see that the upper part of the graph is cut off since there are parts of numbers visible at the upper corner. Here’s the full graph:

    http://4.bp.blogspot.com/_KMFU6wbcd2Y/SZrdvmF1kLI/AAAAAAAAAKA/PT0YnmeAMW8/s400/SvensmarkTrop-todellinen.jpg

    The upper part has the unedited temperatures; the lower one is edited, most importantly with the warming trend removed.

    I’d love to see you publish some kind of response, since this kind of behavior is utterly unacceptable.

    Yours sincerely,

    Tuukka Simonen

    Comment by Tuukka Simonen — 17 Feb 2009 @ 11:25 AM

  228. Ref 224,225

    “How RegEM was used spatially is a meaningless statement.” My bad. I assumed that the lat/lon of the predictor sites and the sites to be predicted needed to be considered in RegEM, but that makes sense.

    [Response: No difference from PCA in this regard. Such methods make use only of the mutual covariance between the data, they know nothing about the locations or geometry. -mike]

    Comment by captdallas2 — 17 Feb 2009 @ 12:37 PM

  229. I have been having a good time learning R and looking at MM07 and S09. I found some stuff that I think is interesting and look forward to comments. I have made a blog post here.

    Comment by Nicolas Nierenberg — 17 Feb 2009 @ 2:49 PM

  230. Gavin (I hope it is ok to be informal),

    I wrote the entry above before I noticed that you had responded. I updated my post just now to reflect your comment.

    Since the points in the global table don’t correspond to the sat grid cells, which of the four choices did you make as to which grid cell to select? I am now planning to average all four, which I think is what you are saying you did in your update?

    [Response: In my original sample, it was top-right. The average I posted is for all four cells, weighted by the area (or cos(mid-point latitude)). - gavin]
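
    (For anyone repeating this step, a minimal sketch of the cos-latitude weighting described in the response above; the four cell values and mid-point latitudes are made up for illustration.)

    import numpy as np

    # hypothetical values of the four 2.5x2.5 cells inside one 5x5 box,
    # with the latitude of each cell's mid-point
    cell_values = np.array([0.21, 0.18, 0.25, 0.19])
    cell_lats   = np.array([41.25, 41.25, 43.75, 43.75])      # degrees

    weights = np.cos(np.radians(cell_lats))                   # area weight ~ cos(latitude)
    box_average = np.sum(weights * cell_values) / np.sum(weights)
    print(round(box_average, 3))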

    Comment by Nicolas Nierenberg — 17 Feb 2009 @ 3:01 PM

  231. Re: #229 (Nicolas Nierenberg)

    What you haven’t addressed, and neither did MM, is the spatial correlation of the data series. This is the reason for the spurious correlations, and why the results of MM are not just suspect but in error.

    The claim in MM07 that they did GLS (generalized least squares) is incorrect. They did OLS (ordinary least squares), but attempted to correct the confidence intervals — not for correlation but for heteroskedasticity. In essence they assume that there’s no spatial correlation — an assumption which is, frankly, ludicrous. Doing so leads to hugely exaggerated estimates of significance.

    Comment by tamino — 17 Feb 2009 @ 3:04 PM

  232. tamino,

    In my post I addressed what I set out to address, which was mainly to demonstrate that the choice of grid location was arbitrary and undocumented. My other comments just follow the logic of both papers.

    Why do you say that they didn’t use GLS? The paper says that they did.

    Comment by Nicolas Nierenberg — 17 Feb 2009 @ 5:06 PM

  233. Re: #232 (Nicolas Nierenberg)

    I said they didn’t use GLS because they didn’t.

    The paper states “Equation (3) was estimated using Generalized Least
    Squares (GLS) as follows.” They then proceed to describe a procedure which is most definitely not GLS. They may have made this mistake because some of the STATA documentation incorrectly indicates that certain procedures are a form of GLS when they’re not.

    What they did was OLS. They then attempted to adjust the estimated uncertainty of the parameters for heteroskedasticity of the residuals, by assuming that the variance-covariance matrix of the residuals is diagonal. This amounts to assuming that the noise in the data shows no spatial correlation, an assumption which is, frankly, ludicrous.

    But: even under their no-correlation assumption, they didn’t do GLS. The way to do that is to use the assumed form for the var-covar matrix of the residuals in order to compute the regression itself; they only used it to estimate the probable variation of the parameters.

    And: even if you *do* use GLS, it won’t be right if you get the var-covar matrix wrong. Assuming no spatial correlation is wrong.
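
    For readers who prefer code to words, here is a bare-bones sketch of the distinction (synthetic data and plain numpy; this is not the MM07 or Stata code). OLS with “robust” standard errors only patches the uncertainty estimate using the diagonal of the residual cross-products, whereas GLS uses the assumed error covariance inside the estimator itself.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    # made-up error covariance with off-diagonal (spatial-like) correlation
    Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    y = X @ np.array([1.0, 2.0]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_ols = XtX_inv @ X.T @ y                   # ordinary least squares
    resid = y - X @ beta_ols

    # White/HC0 "robust" standard errors: heteroskedasticity only, no cross-correlation
    meat = X.T @ np.diag(resid**2) @ X
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

    # GLS proper: the assumed covariance enters the estimator, not just the error bars
    Si = np.linalg.inv(Sigma)
    beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
    se_gls = np.sqrt(np.diag(np.linalg.inv(X.T @ Si @ X)))

    print(beta_ols, se_hc0)
    print(beta_gls, se_gls)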

    Comment by tamino — 17 Feb 2009 @ 6:09 PM

  234. Tamino (#233) yeah. Incredible.

    Don’t you think that scientists writing climatology papers involving non-trivial statistical methodology should consult with professional statisticians?

    Yeah I know, being mean ;-)

    Comment by Martin Vermeer — 18 Feb 2009 @ 2:18 AM

  235. Martin Vermeer, #234. Or just average?

    Comment by Mark — 18 Feb 2009 @ 6:38 AM

  236. Gavin and Mike (re. missing at random in RegEM) – I understand that spatiotemporal patterns in missing data are in general meaningless to RegEM. Per Rubin and Schneider (and your comments), the randomness in question only entails that the value of a missing datapoint cannot impact the probability that the datapoint is missing.

    However, very obvious spatial patterns in climate trends do exist (both in terms of absolute temperature and warming). Therefore, spatial patterns in missing data would often indicate nonrandomness in the data that is meaningful to the RegEM analysis, even if RegEM doesn’t care about the spatial pattern itself. We can (and have) accounted for this issue on a global scale without major problems. But it’s less clear how this fits with the Steig et al. paper on Antarctic warming.
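
    To make Rubin’s definition concrete, here is a toy illustration of my own (nothing to do with the actual station data): dropping values at random leaves a simple estimate unbiased, while dropping values because they are cold biases it.

    import numpy as np

    rng = np.random.default_rng(1)
    temps = rng.normal(loc=-20.0, scale=5.0, size=100_000)     # hypothetical temperatures

    # missing at random: every value has the same 50% chance of being missing
    mar_kept = temps[rng.random(temps.size) < 0.5]

    # not missing at random: colder values are more likely to be missing
    p_missing = 1.0 / (1.0 + np.exp(temps + 20.0))             # colder -> more likely missing
    nmar_kept = temps[rng.random(temps.size) > p_missing]

    print(temps.mean(), mar_kept.mean(), nmar_kept.mean())
    # the MAR sample mean matches the full mean; the non-MAR sample is warm-biased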

    On a global scale, spatial patterns in missing data often offset each other, and are thereby less problematic. For example, the randomness assumption would clearly be violated if the only place from which data was missing was the Arctic–because that data would be non-randomly cold, and would have a non-random warming trend (using Rubin’s definition of random). But if we are also missing a lot of data from the Sahara, the two sets of missing data largely offset each other as far as RegEM is concerned.

    However, when looking at a smaller geography like Antarctica, you don’t necessarily get these offsets. There does appear to be a spatiotemporal pattern to the missing data. There is also a fairly clear difference between the climate in East and West Antarctica in both the observed data and the imputed data (after all, the conclusion of the paper largely relies on the fact that the imputed data, on average, is colder than more recent, observed data for East Antarctica). Thus, the spatiotemporal pattern gives at least prima facie reason to question whether or not the missing data is random in a sense that is relevant to the RegEM analysis.

    I definitely do not have enough background here to diagnose whether this is actually an important issue in the Steig et al. paper. Maybe the missing data in Antarctica are still MAR, and even if they are not, it might not detract from the analysis. However, it does make me less surprised that the analysis produced a novel conclusion. I’d appreciate any feedback here. If there is a body of work already tackling this type of issue, a reference would also work fine.

    [Response: There is a substantial body of work testing how such infilling methods work in practice, i.e. whether the (somewhat weak) type of stationarity assumptions implicit in PCA-based (or related variants such as RegEM) infilling algorithms hold in the types of situations commonly encountered, where it may not be obvious whether or not the assumptions are satisfied. See both the Schneider et al paper and various other papers cited in Steig et al (and papers cited by those papers), which use artificial data, etc. It is precisely because one is never absolutely certain about the degree to which the underlying assumptions hold up that cross-validation (i.e. testing whether the model works in predicting withheld data), as done in Steig et al using both PCA- and RegEM-based infilling approaches, and in other related studies, is so important. -mike]

    Comment by Colin A — 18 Feb 2009 @ 4:35 PM

  237. A note on “using artificial data” from mike’s response: this is one of my common critiques of skeptic studies (e.g., Schwarz and his “response time” method of calculating climate sensitivity, or Spencer and his attempt to show that CO2 increases in the atmosphere are natural and not anthropogenic): using artificial data should be a first test of the method. Since we only have one Earth, running control experiments is hard. And models, of course, can never be perfect representations of the Earth. But they are a near-perfect testbed for methodologies: since we _know_ all the relevant information in the model (in this case, temperatures over the entire Antarctic; in Schwarz’s case, the response of the model to doubled CO2; in Spencer’s case, the real source of increased CO2), using a methodology on “imperfect information” from the model is a good first test of whether the methodology can back out a reasonable answer.

    This may not be a sufficient test of the methodology, but it is (almost always) a necessary test.

    And I’m glad (and not surprised) to hear that Schneider and other papers in the “reality” community do indeed run the appropriate artificial-data tests.

    Comment by Marcus — 18 Feb 2009 @ 6:11 PM

  238. Tamino does not seem to understand that Huber-White robust standard errors allow for arbitrary correlations within clusters. The procedure does not just correct for heteroscedasticity. MM07 corrected for clustering within countries because they used country-level data on smaller cells. The test then was not a test of spatial autocorrelation due to climate but due to data clustering. However, they fail to reject the null of no correlation (or heteroscedasticity) within countries. (Of course no real scientist would misinterpret that sentence as accepting the null hypothesis. Nor would any climate scientist misinterpret a p-value as an ex post probability.)
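
    In code, the clustered sandwich looks like this (a bare sketch with synthetic data, not the MM07 setup): the full cross-products of residuals within each cluster are kept, so any within-cluster correlation is allowed, while correlation across clusters is assumed to be zero by construction.

    import numpy as np

    def cluster_robust_se(X, resid, clusters):
        # Huber-White sandwich with clustering: arbitrary correlation within
        # a cluster, none across clusters.
        XtX_inv = np.linalg.inv(X.T @ X)
        meat = np.zeros((X.shape[1], X.shape[1]))
        for g in np.unique(clusters):
            sg = X[clusters == g].T @ resid[clusters == g]
            meat += np.outer(sg, sg)
        return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

    # toy example: 40 "countries" with 25 cells each and shared country-level noise
    rng = np.random.default_rng(2)
    clusters = np.repeat(np.arange(40), 25)
    X = np.column_stack([np.ones(1000), rng.normal(size=1000)])
    y = X @ np.array([0.5, 1.0]) + rng.normal(size=40)[clusters] + rng.normal(size=1000)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    print(cluster_robust_se(X, y - X @ beta, clusters))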

    Tamino is correct that the reported coefficients are OLS. He does not explain that in single-equation estimation it makes sense NOT to adjust OLS coefficients for deviations from the Gauss-Markov assumptions, especially when the weights are not known a priori. Why? Because OLS estimates remain unbiased even under the alternative. Meanwhile, if one attempts 2SLS on a single equation, you introduce a bias in small samples. And if you fail to reject the null, which is what the test did, these estimates are now biased AND inefficient relative to OLS. Thus MM07 followed standard operating procedures.

    Tamino in his infinite knowledge does not realize that Stata (which is a name, not an acronym, so not STATA) follows the convention of associating GLS with corrections to standard errors even if the coefficient estimates are OLS, because it does GENERALIZE OLS, and it is a form of observation WEIGHTING: not in terms of the coefficients, but in terms of the information about the variance matrix of the residuals (and thus the estimated standard errors).

    Regardless of naming conventions the results reported in the paper are clearly stated.

    But, hey, climate scientists are real scientists and they serve as the ultimate judge of all technique and interpretation. So don’t listen to me.

    [Response: The point tamino was making, which is correct, is that clustering by country is not the same as looking for and correcting the spatial correlation in the data. A claim that the generalisation of OLS 'deals' with this issue in this particular case is therefore not supported. You just have to look at the difference between Europe and N. America to see the very different impact country clustering will have - and in neither case is the real issue of spatial correlation dealt with. - gavin]

    Comment by Chris Ferrall — 19 Feb 2009 @ 8:51 AM

  239. “Assuming no spatial correlation is wrong.”

    Of course this is not a scientific statement, in that it is not quantitative. For example, according to Tamino the assumptions of general relativity are wrong because they are violated at the subatomic level. The fact that this violation is insignificant at the galactic level does not matter. They are wrong, wrong, wrong!

    Tamino seems to believe that demonstrating (significant) correlation among observables is sufficient to make inference under the assumption of no correlation invalid. And Tamino continues to assert without justification that the robust standard errors reported in the second part of this paper do not relax the assumption of no correlation.

    If Tamino said “testing for arbitrary correlation in residuals within countries and failing to reject the null of no correlation is not likely to be a good test of autocorrelation in residuals due to climate” then I would find what he had to say next about this topic worth some attention.

    Comment by Chris Ferrall — 19 Feb 2009 @ 10:00 AM

  240. Chris Ferrall, we aren’t talking about anything requiring string theory to resolve here. Climate doesn’t respect borders. GDP does. As such, it seems to me that Tamino’s objection is quite cogent. If I were the authors, I would want to address it and see if it affected the analysis–assuming the purpose of the analysis was to illuminate the science.

    Comment by Ray Ladbury — 19 Feb 2009 @ 10:35 AM

  241. Tamino claims that the robust standard errors as computed did not control for correlation across observations. That is NOT correct, yet he continues to repeat that claim. He is wrong, and continues to be wrong and misleading, about the limitations of the original analysis (which indeed has limitations). One might call this Huberis.

    Indeed as everyone says … it would be useful to carry out an analysis that captures this vaunted spatial autocorrelation based on climate that everyone thinks is the Achilles heel of this study. Dr. Schmidt’s replication did not do this either and ignored the tests carried out in the original study.

    Everyone here falls into the trap that any deviation from the null hypothesis is critical. It is not a quantitative issue concerning residuals – it is a qualitative issue that can be answered by looking at observables. And everyone seems to believe that a test of a particular alternative (arbitrary correlation within countries) says nothing about a related alternative (spatial correlation due to climate).

    But … I agree with this: MM07 could have clustered Europe even though the data is not clustered. Perhaps that would have affected their failure to reject the null. Dr. Schmidt should be able to carry out that test very easily so I look forward to seeing the results.

    Comment by Chris Ferrall — 19 Feb 2009 @ 11:17 AM

  242. Chris Ferrall claims Tamino made an error about general relativity.
    The word ‘relativity’ appears twice in the thread.

    Can you point to a basis for this claim? Use the ‘find’ tool.

    Or is this illustrating the tactic of taking something used in a particular context so far out of context that it breaks, and claiming that proves it couldn’t possibly have been correct where it was actually being used? Showing why scientists mistrust the motives of those who want to take all their work for free to nitpick it?

    Or some other intent here?

    Comment by Hank Roberts — 19 Feb 2009 @ 11:39 AM

  243. > One might call this Huberis.

    Petard, engineer, boom.

    Comment by Hank Roberts — 19 Feb 2009 @ 11:41 AM

  244. Re: Chris Ferrall

    I understand perfectly well that the procedure used by MM07 addresses the correlation within clusters. All you’ve done is attempt to deflect attention from the real point: that MM07 utterly ignores the correlation between clusters. Which invalidates their results.

    As for “assuming no spatial correlation is wrong,” I stand corrected. I should have said “There’s undeniable spatial correlation in these data, so ignoring it is wrong.”

    I’ve got no beef with correcting OLS for correlation rather than using GLS, I do it all the time. But I don’t call it GLS. It’s probably an honest mistake due to the way Stata (not STATA — thanks for the triviality) documents their procedures. But it’s already caused commenters here falsely to believe that their procedure does correct for all correlations, simply by the use of the term; as such it deserves correction.

    If you had contributed anything other than sound and fury signifying nothing, I might be interested in your further opinions. But since you chose only to strut and fret your hour upon the stage, I can only hope you’ll then be heard no more.

    Comment by tamino — 19 Feb 2009 @ 11:42 AM

  245. Chris Ferrall #241 “One might call this Huberis.”

    One could. One would wonder what you were on about. Maybe Hubris, but then again, who can tell.

    “Indeed as everyone says ” Isn’t true either. Many say. Not everyone. And many who say it would be useful would add that other things would be MORE useful. Cleaning the tiling grout in the bathroom may be useful, but if your mum is having a heart attack, phoning the ambulance would be more useful.

    And I also fail to see where you have shown Tamino was wrong except just by stating it is so. [edit - please keep it civil]

    Comment by Mark — 19 Feb 2009 @ 11:59 AM

  246. “But, hey, climate scientists are real scientists and they serve as the ultimate judge of all technique and interpretation. So don’t listen to me.”

    Tamino’s not a climate scientist, he’s a professional statistician, more than qualified to comment on the quality of the statistical analysis done by M&M.

    Your arrogance isn’t helping your cause, BTW. When people present themselves as being superior scientists to all those doing work in climate science, it’s pretty obvious to the objective observer that they’re just blowin’ smoke.

    Comment by dhogaza — 19 Feb 2009 @ 12:12 PM

  247. I have updated my results using the 5×5 grids after a couple of false starts. My results, while similar to S09, are not identical. I show more significance for the economic data using the RSS tropospheric values. This is probably caused by a difference in how the RSS data are processed, which is not documented in either S09 or the SI.

    Before you all jump down my throat about spatial correlation, I am just looking at the particular conclusion in S09 that using the RSS data got a significantly different result than the UAH data. I didn’t get the same result.

    My post is here.

    I also note that Dr. McKitrick wrote a paper discussing the spatial correlation issue. It can be found here. I don’t know whether the comments from tamino and Gavin already take this response into account.

    Comment by Nicolas Nierenberg — 19 Feb 2009 @ 12:53 PM

  248. I’m probably the only one who cares, but I suddenly recalled that S09 used an updated temperature set in addition to the RSS data. I made that change and now my results are consistent with S09. It requires both changes to get this result. The UAH data combined with the updated surface data still give the same result as MM07.

    Comment by Nicolas Nierenberg — 19 Feb 2009 @ 3:31 PM

  249. Thanks for convincing me of your view by such reasoned arguments. Clearly peer review failed in the case of this science journal since, as you have so clearly convinced me, the results they published are wrong.

    “I can only hope you’ll then be heard no more.”

    Indeed! I have now deleted Real Climate from my bookmarks so that the science will not be impeded any further by dealing with such a feeble mind.

    [Response: How convenient for you. Heaven forbid that you not find an excuse to engage without the sarcasm. Regardless, the correlations published by MM07 aren't GLS. - gavin]

    Comment by Chris Ferrall — 19 Feb 2009 @ 4:56 PM

  250. re Chris Ferrall’s posts:

    For anyone not familiar with academic subcultures, many people in Econ departments are primed to be aggressive in every interaction. Econ people seem to like this style and take it as a sign of intelligence; most of the rest of us refer to it as “d*** measuring.” It’s especially embarrassing when the criticism and bluster wander so far from the operative point, as demonstrated by Mr. Ferrall above.

    Comment by Ian — 20 Feb 2009 @ 10:24 AM

  251. > Ferrall
    Was that the NPR econ commentator? Australian economist at QED? The gaming blogger? Two out of three? There are so many.

    > Economists
    Yep. Deltoid quotes Pooley on this cultural trait
    http://scienceblogs.com/deltoid/2009/02/the_economists_consensus_on_gl.php

    “Journalists have missed the economic consensus partly because economists are such a querulous bunch–they argue bitterly among themselves even when they agree….. That sort of quarrelling masks the underlying consensus and communicates a greater degree of discord and uncertainty than actually exists.” (Pooley, quoted at Deltoid)

    A solitary economist will argue with himself. Recall Harry Truman’s plea: http://austincentrist.blogspot.com/2006/02/give-me-one-armed-economist.html

    The people in ‘ecological economics’ seem to work better in groups.
    Perhaps that reflects an understanding of ecology?

    Comment by Hank Roberts — 20 Feb 2009 @ 12:22 PM

  252. tamino,

    As I noted above, Dr. McKitrick has posted a follow-up to his paper specifically dealing with spatial correlation. So while it may be true that his original paper assumed no spatial correlation, that is no longer the case: in the follow-up posting he attempts to show that spatial correlation is not an issue for his results. This is quite different from assuming it doesn’t exist.

    So that I can learn more about this topic I would like to hear your comments on his follow up posting on spatial correlation.

    Comment by Nicolas Nierenberg — 20 Feb 2009 @ 7:27 PM

  253. Re #252 (Nicolas Nierenberg)

    I’ve downloaded the data and duplicated the regression, and it sure seems to me that there’s spatial correlation aplenty in the residuals, contrary to claims in the follow-up paper. I’ve downloaded that and am digesting it. There’s quite a bit more to do to understand it completely.

    But I can tell you this: the more I get into the details of MM07, the more screwball this analysis seems to be. It’s not just failure to deal with correlations correctly, there are lots of reasons to be suspicious of their conclusions. Whatever I find, whether I’m right or wrong, I’ll report in due time.

    Comment by tamino — 21 Feb 2009 @ 7:49 AM

  254. tamino,

    Maybe it would be more prudent to just have written the last line. To me the rest of your post didn’t carry much information content. I look forward to your analysis.

    Comment by Nicolas Nierenberg — 21 Feb 2009 @ 8:35 PM

  255. Mr Nierenberg (#252): there is a way to try to answer this question for yourself. What I suggest is the following. The MM07 follow-up offers the equation:

    u = lambda W u + e

    with e Gaussian and independently distributed.

    Use this equation to generate 10,000 samples of u, using a Gaussian generator for e, for each of the proposed W1, W2, W3.

    Then, look up the semivariogram plot in Gavin’s paper. The semivariogram is directly related to the autocorrelation function (in fact it’s that turned upside down) and computable from it.

    Now, plot the semivariograms for your generated synthetic data, and see if they resemble any of Gavin’s coloured curves, e.g. the one for the surface temp data. This is what Gavin did himself, though for GCM-generated data.
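
    A minimal sketch of that Monte Carlo in Python, with made-up point coordinates and a simple row-standardized contiguity W standing in for the real grid and the proposed W1-W3 (and a couple of hundred replicates instead of 10,000, just to keep it quick):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 300
    coords = rng.uniform(0, 100, size=(n, 2))               # hypothetical cell locations
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)

    # neighbours within distance 15, row-standardized
    W = ((dist > 0) & (dist < 15)).astype(float)
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)

    lam = 0.6
    A_inv = np.linalg.inv(np.eye(n) - lam * W)              # u = lam*W*u + e  =>  u = (I - lam*W)^-1 e

    def semivariogram(u, dist, bins):
        # average 0.5*(u_i - u_j)^2 over pairs falling in each distance bin
        diff2 = 0.5 * (u[:, None] - u[None, :]) ** 2
        return np.array([diff2[(dist > lo) & (dist <= hi)].mean()
                         for lo, hi in zip(bins[:-1], bins[1:])])

    bins = np.linspace(0, 60, 13)
    gammas = [semivariogram(A_inv @ rng.normal(size=n), dist, bins) for _ in range(200)]
    print(np.mean(gammas, axis=0))      # compare this curve to the empirical semivariogram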

    Hope this is of help.

    Comment by Martin Vermeer — 22 Feb 2009 @ 7:40 AM

  256. > more prudent to just have written the last line

    Nah. Remember he’s not just writing for you.

    Those of us who have learned over the long term to trust Tamino know he gets busy sometimes with his day job and doesn’t have much to say for a while. It’s good to have been given some information about what he’s working on to tide us over.

    Comment by Hank Roberts — 22 Feb 2009 @ 12:52 PM

  257. Mr. Vermeer,

    I am actually going down a different path. It appears from reading the literature and Dr. McKitrick’s paper that there are well-accepted tests for spatial autocorrelation. They seem to yield a test result that either rejects or fails to reject the hypothesis. I am looking at how to transform the data to fit the algorithms that are already implemented in R.

    Also, given that Dr. McKitrick has already done computations using some of these tests, I am interested in learning why his choice of tests was incorrect, or why his implementation of the tests was incorrect.
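
    For what it’s worth, the statistic behind the most common of those tests (Moran’s I on the residuals) is short enough to write out directly; here is a plain-Python sketch with a generic, made-up inverse-distance weights matrix (the R packages wrap the same quantity in a proper significance test):

    import numpy as np

    def morans_i(resid, W):
        # Moran's I for residuals given a spatial weights matrix W (zero diagonal).
        # Values near -1/(n-1) suggest no spatial autocorrelation; clearly positive
        # values indicate clustering of like residuals.
        z = resid - resid.mean()
        return (z.size / W.sum()) * (z @ W @ z) / (z @ z)

    # toy usage with hypothetical coordinates and inverse-distance weights
    rng = np.random.default_rng(4)
    coords = rng.uniform(0, 10, size=(50, 2))
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    W = np.zeros_like(d)
    W[d > 0] = 1.0 / d[d > 0]
    print(morans_i(rng.normal(size=50), W))        # uncorrelated residuals: near zero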

    Comment by Nicolas Nierenberg — 22 Feb 2009 @ 4:29 PM

  258. Correction to #255: Gavin’s semivariogram is for the full data, not the residuals (IIUC). So the comparison would have to be against a variogram of the empirical residuals instead.

    I don’t think the tests chosen by Dr McKitrick are incorrect; the question is how realistic the three chosen W models are. That is the issue my proposed test would address.

    Testing for autocorrelation is fine, but remember that even autocorrelation that does not show up as significant in such a test may nevertheless lead to a “significant” regression result for data that are compatible with the null hypothesis, because ignoring it when computing variances will make the error bounds too optimistic. Monte Carlo is good in that it exposes this kind of pathology.

    Comment by Martin Vermeer — 23 Feb 2009 @ 12:28 AM

  259. Mr. Roberts,

    With all due respect, remarks like “there’s spatial correlation aplenty” and “the more screwball this analysis seems to be” don’t give me any idea what he is working on. They seem like placeholders for “I know something is wrong but I haven’t figured it out yet.”

    My own preliminary work shows that the residuals are very slightly positively correlated with location. Right on the edge of significance. It is my impression at this time that this slight positive correlation doesn’t affect the results, but I’m still looking into that.

    Comment by Nicolas Nierenberg — 23 Feb 2009 @ 11:53 AM

  260. Dr Schmidt,

    As a relative newcomer to climate science I have a few observations and a few questions. First, let me say thank you for spending the time and effort to host this site and to make available important information regarding the current state of the art in this realm of the science.

    In keeping with the thread topic, I would strongly side with those calling for the routine publication of code/data in the most clear and transparent form possible. As someone upthread suggested, doing so in something like Google Code would be a great place to start.

    The reasonable arguments against this effort seem to be centered around the idea that the cost to the scientist in terms of time would be greater than the benefits. My only estimation of the cost side comes from my own modeling work in an unrelated field (computational finance). I am quite sympathetic to the state of your desktop and the myriad of code fragments which may reside there. Mine is similar.

    You made the point above that when things rise to a certain level of importance you spend more time on what one might call the end-to-end reproducibility of both code and data. For myself, that level would be reached whenever any capital is to be committed to an idea. In my world the cost of error can be catastrophic (on a personal level), so there is almost no level of ‘proving’ which is not beneficial. I would posit that in your field, where the public policy decisions will have global-scale effects, the cost of getting it wrong is infinitely higher. From this perspective I would suggest that anything being submitted for publication rises to the level of importance requiring the highest level of transparency and clarity.

    A secondary benefit of such activity would be to make the work accessible to many people not in the field but nevertheless capable of understanding the math and physics, were it presented in a manner designed to facilitate understanding. While some here argue that this would encourage ‘crackpots’ to nitpick and take shots, thus somehow ‘wasting’ the scientist’s time, I don’t see how that can be the case. The scientist is always free to ignore any criticism, justified or not. As for the argument that it would affect the political process because it would give ammunition to the ‘deniers’, I’d say that would be the worst reason of all. A lack of transparency is far, far more disturbing than some ill-formed opinion of a paper.

    David

    Comment by david_a — 23 Feb 2009 @ 1:23 PM

  261. Having had the opportunity to at least read some of the papers linked by your site, and having also perused the papers of those antagonistic to the magnitude of the AGW hypothesis, I’d appreciate some help with the following, as I understand it:

    The amount of solar radiation hitting the planet is relatively stable, at least from a total energy standpoint, on the time frame of, say, a century or so.

    At a time of energetic equilibrium the net radiative flux at the atmosphere/space boundary would be zero, so that the amount of energy entering the system would be equal to the amount of energy leaving the system.

    Adding a large fraction of additional CO2 to the atmosphere increases the radiative forcing, first by retransmission of longwave radiation back to the surface of the planet, and secondly by the positive feedback mechanism of increasing temperature and then water vapor in the atmosphere, which has the same retransmission effect.

    Absent any other forcings, these effects would monotonically increase the temperature of the system until the increase in outbound radiative flux due to higher surface/atmospheric temperature caused the net radiative flux to again be zero, but now with the system at a higher equilibrium temperature.

    The atmosphere, shallow ocean (1000 m?) and land will tend to reach equilibrium with each other significantly (?) faster than the shallow ocean does with the deep ocean.

    In equilibrium between land, atmosphere and shallow ocean, the energy stored in the shallow ocean is much greater (10x?) than that of the other two components.

    Looking at the time series of ocean heat content over the last 20 years, when the CO2/water vapor forcing should have been in effect, there is first a rise in OHC and then a flattening over the last 6-8 years or so, depending on the dataset used for OHC.

    Since the physics of the forcing imply a monotonically increasing OHC (absent transfer to the deep ocean and the minor transfer to other smaller sinks), it would seem that, at least over the near term, the forcing is being counterbalanced by some other forcing which is causing the net radiative flux to be in balance, so there is no accumulated energy in the system.

    Given your detailed understanding of the various forcing factors, could you guess which ones would be most likely to be increasing in the near term to offset the background CO2/H2O forcing?

    Relatedly, do the GCMs provide any probabilistic outputs as to the magnitude of the forcings and their couplings, so that Monte Carlo simulations could assess the relative probability of departure from the model predictions?

    Thanks,
    David

    Comment by david_a — 23 Feb 2009 @ 2:24 PM

  262. “Since the physics of the forcing imply a monotonically increasing OHC (absent transfer to the deep ocean and the minor transfer to other smaller sinks), it would seem that, at least over the near term, the forcing is being counterbalanced by some other forcing which is causing the net radiative flux to be in balance, so there is no accumulated energy in the system.”

    Where did anyone claim that ENSO and other heat transfer mechanisms have screeched to a halt?

    Comment by dhogaza — 23 Feb 2009 @ 2:55 PM

  263. David, ensembles are one method of providing “probabilistic” forecasts. It is more useful still in climatology to test sensitivity. E.g. if you run 10 programs that use the same code but change, say, the cloud-coverage fitting constants, you can see if the output of the model is particularly sensitive to that figure being wrong. If your model is not sensitive, then it is likely (unless you left something important out) that the real world is not particularly sensitive to you getting that feature wrong.

    I have heard a lot recently about ensembles wrt climate (wrt weather it’s kind of old hat), but storage becomes a HUGE problem. Add to that that skeptics and denialists want the raw data (whether they do anything with it is unknown), so you can’t even reduce storage. And you may have to keep it for decades, so you have a LOT of data to handle.

    And that costs.

    A lot.

    But, funnily enough, people don’t want to pay for that through extra taxes.

    Humans.

    Can’t live with them, can’t disintegrate them.

    Comment by Mark — 23 Feb 2009 @ 3:22 PM

  264. I’m sorry that I don’t directly understand what you are saying.

    My guess is that you are saying that ENSO and other heat transfer mechanisms are moving the heat to the deep ocean, which is why it is not accumulating in the upper ocean. Since I am somewhat new at this, could you explain more of this or point me towards a paper where the transfer of energy to the deep ocean is discussed, and what the variance of the process might be?
    thanks
    david

    Comment by david_a — 23 Feb 2009 @ 3:24 PM

  265. > over the last 6-8 years
    Not enough to determine a trend. Noisy planet.

    Comment by Hank Roberts — 23 Feb 2009 @ 3:35 PM

  266. #261 dave_a

    You are correct about the incoming solar radiation. There is a secular trend in solar irradiance from 1900-1950 or so, but none after that. Aside from longer-term changes, there is an 11-year solar cycle which has very small implications for surface temperature, and does not contribute to the long-term warming trend.

    You are right about equilibrium conditions

    The addition of further CO2 not only serves to “retransmit” radiation from the atmosphere to the surface, but affects other energy fluxes as well which are non-radiative. It is probably more useful to think of the enhanced greenhouse effect as working through the heat-loss side of the equation at the top of the atmosphere than as a radiative heat-gain term at the surface. It just happens that heat is mixed very well throughout the troposphere, so that warming will be realized. See An Analysis of Radiative Equilibrium, Forcings, and Feedbacks as well as RealClimate’s A saturated gassy argument.

    I don’t think I really agree with the terminology about land equilibrating faster than the ocean… the whole planet is out of equilibrium, and the oceans are what cause a significant lag between the forcing and the full response. Obviously land heats faster (and obviously oceans have a higher heat capacity), but the full warming will not be realized even on land until full equilibrium is established.

    //”Since the physics of the forcing imply a monotonically increasing OHC “//

    This is wrong. Inter- and intra-annual variability does not go away with more CO2, and the trend is small compared to the weather over short time intervals. 6-year “trends” are meaningless since there is no such thing as “6-year climate.” You need to focus on a longer and more suitable timeframe to address the question of climate change. The multidecadal trend in OHC is rising (Domingues et al 2008). As such, there is no necessity to invoke a “forcing” which is “counteracting” CO2/WV. And water vapor is a feedback, not a forcing.

    Comment by Chris Colose — 23 Feb 2009 @ 4:00 PM

  267. Mark,

    There is no need to save the output of multiple runs if the input data is provided as well as the software itself and the parameters used. If people properly kept versions of data there would be no need to provide input data either. But since this isn’t done it is necessary.

    In the situation that started this thread, it actually makes a difference whether you used the original CRU data or the updated CRU data in combination with the RSS data. Dr. Schmidt emphasized the effect of RSS, but I was a bit confused until I saw that the surface temperature set was changed as well. If Dr. McKitrick hadn’t provided the original surface temperature data, there would have been no way to confirm that this data change mattered. In this case I don’t think it is very important, but I could imagine other cases where it would be.

    Comment by Nicolas Nierenberg — 23 Feb 2009 @ 4:13 PM

  268. david_a (264) — Start here:

    THC primer
    http://www.pik-potsdam.de/~stefan/thc_fact_sheet.html

    Comment by David B. Benson — 23 Feb 2009 @ 5:22 PM

  269. Chris, David,

    Thanks for the links to the papers. I’ll try and read them tonight.
    david

    Comment by david_a — 23 Feb 2009 @ 6:28 PM

  270. re #267.

    Nope, for some, unless every byte is reproduced, someone will shout loud and long (and be picked up in the tabloids) about how the data is being massaged to “prove” AGW.

    Heck, if you check the BBC website you’ll see people saying that Gavin is bullsitting everyone and this can be proven because he hasn’t given out the source code to one of his programs used in a paper.

    Comment by Mark — 23 Feb 2009 @ 7:03 PM

  271. david_a, just remember …

    Before you earn your Nobel Prize by overturning all that’s known about climatology, you’re going to have to learn some climatology.

    Now, of course, as a guy in computational finance we’re all sure you’ll be able to master the field in a day or two, and the day following, overturn the work of thousands of professional scientists.

    But pardon us if we wait until you actually accomplish the feat, before we adorn you with global honors deserved of those that trigger a scientific revolution.

    Comment by dhogaza — 24 Feb 2009 @ 12:50 AM

  272. Hi Chris,
    The paper is very clear. It’s a great jumping off point.

    On page 4 there is an illustration due to Trenberth et al 2009 which shows a snapshot of the global energy balance. As per the illustration, the incoming solar is 341.3 W/m2, the outgoing reflected shortwave is 101.9 W/m2, and the outgoing longwave is 238.5 W/m2, for a total of 340.4 W/m2 outgoing, leaving a net imbalance of 0.9 W/m2, which is shown at the bottom as ‘net absorbed’. From your remarks above about 6 years not being long enough for this signal to dominate, I take it that the variance of the outgoing energy budget processes must easily be high enough to mask this signal in the short term, since the incoming solar side is relatively static over the short term.

    My intuition is that the outgoing longwave process is more invariant than the reflected solar, because the former relies on what are, in aggregate, statistical properties of molecular and atomic interactions, while the latter depends on much more macro-level effects.

    If you have any insight into both the magnitudes and the sources of these variances or could point me towards some papers that would be great.

    One bit of confusion: on page 10 of the paper there is a paragraph dealing with the direct radiative forcing due to a doubling of CO2. The number given is 3.7 W/m2. The next line goes on to say that this would be similar to a 2% rise in solar irradiance.
    3.7 / 341 ~ 1.1%. Is this a typo or am I missing something?

    Thanks again
    David

    [Response: Forcing from a 2% increase in solar is 0.02*1366*0.7/4 = 4.8W/m2 - gavin]

    Comment by david_a — 24 Feb 2009 @ 10:41 AM

  273. david_a

    No typo. Gavin’s response gave you a back-of-envelope for solar forcing. The 1366*0.02 part comes from the 2% change and the 0.7/4 part comes from the Earth’s albedo and geometry. Although not often done or reported in the literature, if the forcing is defined at the tropopause, the RF is even a bit less since UV is absorbed above that layer.
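
    A back-of-envelope version in code, using the round numbers above (S0 = 1366 W/m2, planetary albedo 0.3). The last line is where the 1.1% puzzle comes from: the 3.7 W/m2 should be compared with the absorbed solar flux, not the raw 341 W/m2.

    S0, albedo = 1366.0, 0.3
    absorbed = S0 * (1 - albedo) / 4     # ~239 W/m2 absorbed, globally averaged

    print(0.02 * absorbed)               # ~4.8 W/m2 forcing from a 2% solar increase
    print(3.7 / absorbed)                # ~0.015, i.e. 2xCO2 is roughly a 1.5% solar increase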

    Keep in mind that a 2% increase in solar irradiance is absurdly large. Most ideas invoking a strong solar influence on climate change generally involve indirect effects as opposed to simple increases in total solar irradiance (e.g., UV effects on circulation patterns, cosmic rays).

    C

    Comment by Chris Colose — 24 Feb 2009 @ 2:18 PM

  274. “David, ensembles are one method of providing “probabilistic” forecasts. It is more useful still in climatology to test sensitivity. E.g. if you run 10 programs that use the same code but change, say, the cloud-coverage fitting constants, you can see if the output of the model is particularly sensitive to that figure being wrong. If your model is not sensitive, then it is likely (unless you left something important out) that the real world is not particularly sensitive to you getting that feature wrong.

    I have heard a lot recently about ensembles wrt climate (wrt weather it’s kind of old hat), but storage becomes a HUGE problem. Add to that that skeptics and denialists want the raw data (whether they do anything with it is unknown), so you can’t even reduce storage. And you may have to keep it for decades, so you have a LOT of data to handle.”

    What is a “cloud coverage fitting constant”? How does one quantify or qualify a claim of “likely” with respect to models in general, and has that been done for any specific model? Why is storage of “ensembles”, presumably the raw data/code/docs used in any particular single study, a “huge problem”, and why would you necessarily “have” to keep that data for “decades”?

    Comment by Glenn — 24 Feb 2009 @ 5:35 PM

  275. Chuckle.

    “We will be happy to load our data onto your storage devices, and contribute floor space for them, provided they will thereafter remain physically on our site and be accessible to the public on the same terms you request we offer you. You will have to confirm you have made longterm contract arrangements for electricity and air conditioning with the local utility company, for connectivity with the local Internet Service Provider, and for system operations staff on your own account. To arrange delivery of your storage hardware to our facility, after these support contracts are confirmed, please contact us ….”

    And

    We regret that the pen drive and reel of mag tape you offered to send will only store 0.000001 and 0.00001 percent of the data you requested, respectively, and without backup.

    Comment by Hank Roberts — 24 Feb 2009 @ 6:24 PM

  276. Glenn, I don’t know.

    But probably, still, they just figure that if you take the values you have in a cell, like relative humidity, and make cloud cover A when it’s less than 40%, B when it’s less than 70%, C under 90% and D over 90%, then you get the right sort of weather.

    A, B, C and D, and the thresholds at which they swap, could easily change and be completely wrong.

    But if the model that uses this is run with different values of A, B, C and D and shows little sensitivity to having these factors changed, you know that the determination of these factors is not a problem.
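
    Something like the following, say (an entirely made-up toy “model”, just to show the shape of the test):

    import numpy as np

    def toy_model(rh, thresholds=(40, 70, 90)):
        # map relative humidity (%) to a crude cloud-cover class 0-3 and return a
        # fake diagnostic; a stand-in for a real parameterisation, not one
        cloud_class = np.digitize(rh, thresholds)
        return cloud_class.mean() * 0.1

    rng = np.random.default_rng(5)
    rh = rng.uniform(0, 100, size=10_000)

    # perturb the fitting constants and see how much the output moves
    for thresholds in [(35, 65, 85), (40, 70, 90), (45, 75, 95)]:
        print(thresholds, round(toy_model(rh, thresholds), 4))
    # a small spread suggests the result is not very sensitive to these constants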

    I don’t do climate modelling, remember. I’ve read some stuff. You can do it yourself.

    likely: run 10 different models with the same figures. If they diverge widely, it’s likely a difficult-to-forecast system or your parameterisations are wrong. Not certain; it could just be bad luck. Even if they all track, they could be doing that through very good luck.

    Likely it’s merely that it is good if they track and bad if they don’t.

    Likely.

    And the storage of the raw data is the only thing that will shut up the denialists. If any of the data is not kept, they will say that the data removed was done because it proved AGW was wrong or the model was bad.

    And that cannot be disproved to the public (it will NEVER be disproved to the denialist) without keeping ALL the data.

    That keeping it all is otherwise useless doesn’t matter when you are dealing with someone at the “banging on the table” stage of debate.

    Got it?

    Comment by Mark — 24 Feb 2009 @ 6:49 PM

  277. Hi Chris,
    Seek and ye shall find :)

    I found this paper, which is a study of the TOA SW radiation budget for 1984-1997. It is fascinating. The variability, both spatially and temporally, is quite high, and would be capable of swamping an average linear forcing signal of 0.6 W/m2. If I read it correctly, over the 14-year period there was an annual trend forcing of 2.3 W/m2, which is obviously a pretty big number. Again, from my first read it appears that the big driver of the variability is clouds, in quantity, type, and structure. The general direction is: more tropical clouds, more reflected shortwave.

    I am searching for one which extends the study to the present, as the last 8 years have been relatively cooler.

    david

    http://www.atmos-chem-phys-discuss.net/4/2671/print-redirect.html

    Comment by david_a — 25 Feb 2009 @ 1:00 PM

  278. david_a wrote in 277:

    I found this paper which is a study of TOA SW radiation budget for 1984 – 1997. It is fascinating. The variability both spatially and temporally is quite high, and would be capable of swamping an average linear forcing signal of .6 w/m2. If I read it correctly, over the 14 year period there was annual trend forcing of 2.3w/m2 which is obviously a pretty big number. Again from my first read it appears that the big driver of the variability is clouds both in quantity, type, and structure. The general direction is more tropical clouds more reflected short wave.

    We did see cloud cover decrease over the last fifteen years of the twentieth century in the tropics, which was not predicted by some models. Whether this is due to diminished aerosol load, global warming or natural variability (e.g., ENSO) is still an open question, inasmuch as the trend has been relatively short.

    But the net effect at the top of the atmosphere has been a reduction in reflected sunlight which almost exactly matches the increase in outgoing longwave radiation, implying no net warming or cooling as the result of diminished cloud cover, and over the same period, the temperature of the tropics has continued to rise.

    I will refer you to the same authors, later paper:

    A significant decreasing trend in OSR [outgoing solar radiation] anomalies, starting mainly from the late 1980s, was found in tropical and subtropical regions (30° S-30° N), indicating a decadal increase in solar planetary heating equal to 1.9±0.3Wm-2/decade, reproducing well the features recorded by satellite observations, in contrast to climate model results. This increase in solar planetary heating, however, is accompanied by a similar increase in planetary cooling, due to increased outgoing longwave radiation, so that there is no change in net radiation. The model computed OSR trend is in good agreement with the corresponding linear decadal decrease of 2.5±0.4Wm-2/decade in tropical mean OSR anomalies derived from ERBE S-10N non-scanner data (edition 2). An attempt was made to identify the physical processes responsible for the decreasing trend in tropical mean OSR.

    A. Fotiadi et al., Analysis of the decrease in the tropical mean outgoing shortwave radiation at the top of atmosphere for the period 1984–2000, Atmos. Chem. Phys., 5, 1721–1730, 2005
    http://www.atmos-chem-phys.org/5/1721/2005/acp-5-1721-2005.html

    Incidentally, that was actually a reduction in outgoing shortwave for that period being balanced by an increase in outgoing longwave – as a result of a reduction in cloud cover.

    Comment by Timothy Chase — 25 Feb 2009 @ 3:43 PM

  279. I have taken a look at the issues of spatial autocorrelation discussed in S09 (that started this thread) referring to MM07. My conclusions are that the main results of MM07 are not affected by spatial autocorrelation, which is in agreement with Dr. McKitrick’s follow-up article. I also found that the spurious correlations reported by Dr. Schmidt using the Model E data were indeed caused by spatial autocorrelation, which is what he hypothesized in S09.

    My analysis can be found here.

    It has been quite interesting looking into this and learning about R and about spatial analysis. I look forward to comments, suggestions, and criticisms.

    Comment by Nicolas Nierenberg — 25 Feb 2009 @ 5:49 PM
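
    A minimal sketch of the kind of spatial autocorrelation check described in #279, here using a hand-rolled Moran’s I statistic on a purely synthetic grid (the grid, weights, and fields below are illustrative assumptions, not MM07’s actual data or the analysis linked above):

        import numpy as np

        def morans_i(x, w):
            """Moran's I for values x with spatial weight matrix w.
            Values well above the null expectation of -1/(n-1) indicate
            positive spatial autocorrelation."""
            x = np.asarray(x, dtype=float)
            n = x.size
            z = x - x.mean()
            return n * np.sum(w * np.outer(z, z)) / (w.sum() * np.sum(z ** 2))

        rng = np.random.default_rng(0)

        # Hypothetical 10x10 grid of locations (think gridded regression residuals).
        lat, lon = np.meshgrid(np.arange(10.0), np.arange(10.0))
        coords = np.column_stack([lat.ravel(), lon.ravel()])

        # Inverse-distance spatial weights, zero on the diagonal.
        d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
        w = np.zeros_like(d)
        w[d > 0] = 1.0 / d[d > 0]

        # A spatially smooth field versus plain white noise, for comparison.
        smooth = np.sin(coords[:, 0] / 3.0) + 0.1 * rng.standard_normal(coords.shape[0])
        white = rng.standard_normal(coords.shape[0])

        print("Moran's I, smooth field:", round(morans_i(smooth, w), 3))
        print("Moran's I, white noise: ", round(morans_i(white, w), 3))

    A field with real spatial structure gives a clearly positive statistic, while white noise sits near the null expectation; the same idea, applied to regression residuals, is what separates genuine relationships from spuriously inflated correlations.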

  280. Hi Tim,
    Thanks for the link.

    The group did a lot of papers on radiation budget. The links to them are here
    http://de.scientificcommons.org/n_hatzianastassiou

    There is one on tropical longwave budget here that I am just starting to read
    http://de.scientificcommons.org/35494343

    Though I do not have the paper reference, I recall that ocean heat content rose during the period, which would imply that even though the trend in the radiation balance had not changed, either there was some difference in the integrals, or things were out of balance to begin with and simply stayed that way. As a first guess the integral hypothesis seems better, since OHC has since declined a bit; although that is only a short-term measurement, it still implies the radiation budget is currently roughly in balance.

    One thing I completely don’t get in this field is the apparent lag in the models, which are working from what seems to be real-time data. Or, more simply, why does the paper stop in 2000 when it was published in 2008? If the satellites are still orbiting and producing data, why wouldn’t you update and post the results as they became available? Or is this done and just emailed around within the community, and so not evident to the random Google searcher?

    If you have any links to ocean heat content papers, that would be great. This measurement would seem to be a pretty key one in the whole of the science, as it appears to be the best integrator of all of the data and so would be far more robust than any of the trend data.

    thanks
    david

    Comment by david_a — 26 Feb 2009 @ 8:36 AM

  281. “Or more simply, why does the paper stop in 2000 when it is published in 2008.”

    Why is Windows 7 still in beta when they finished the coding in 2007?

    Why are the Texaco financials only through April 3 when they were published in May?

    Comment by Mark — 26 Feb 2009 @ 10:58 AM

  282. david_a,

    The Earth system doesn’t have to be in equilibrium on short time scales (even in the absence of an underlying trend) – there is year-to-year variability all the time, and the ocean and atmosphere are always exchanging heat back and forth; changes in ocean circulation cause changes in atmospheric circulation, which cause changes in clouds and water vapor, which change shortwave and longwave radiation. The oceans exhibit their own variability, so you do not expect increases year after year; you really need to look at the trends. Trying to evaluate climate change using time periods 1/3 or 1/5 as long as a standard climatology is like declaring that summer has ended because there is a cold week in July (sorry to neglect any of the SH folks in here).

    Following the World Meteorological Organisation (WMO), 30 years is the classical period for performing the statistics used to define climate. This of course depends on context – people talking about the “climate of the last ice age” don’t talk about hundreds of 30-year segments, so this is used in a broader context, but the standard is well suited for studying recent decades, because such an analysis requires a reasonable amount of data and still provides a good sample of the different types of weather that can occur in a particular area.

    A persistent global warming signal over such a suitable timeframe is an indication that there is a top of the atmosphere net energy imbalance, and as you suggest, a rise in OHC. This has been discussed at RC here and here, as well as other posts if you scroll through the archives.

    Comment by Chris Colose — 26 Feb 2009 @ 11:44 AM
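
    A quick way to see the point in #282 about short windows: fit least-squares trends to synthetic series built from a fixed underlying warming rate plus interannual noise, and compare the spread of 10-year versus 30-year estimates. The trend and noise amplitudes below are made-up, illustrative numbers, and white noise understates real-world persistence, so this is only a sketch of the idea:

        import numpy as np

        rng = np.random.default_rng(42)
        true_trend = 0.02   # degrees C per year (illustrative, not a fitted value)
        noise_sd = 0.15     # interannual variability, degrees C (illustrative)

        def trend_sd(window_years, n_trials=5000):
            """Spread (1 sd) of least-squares trend estimates across many
            synthetic realizations of trend + white interannual noise."""
            t = np.arange(window_years)
            slopes = [
                np.polyfit(t, true_trend * t + noise_sd * rng.standard_normal(window_years), 1)[0]
                for _ in range(n_trials)
            ]
            return np.std(slopes)

        for w in (10, 30):
            print(f"{w:2d}-yr windows: estimated trend = {true_trend:.3f} +/- {trend_sd(w):.3f} C/yr")

    With these numbers, the spread of the 10-year trend estimates is comparable to the underlying trend itself, while the 30-year spread is several times smaller; that is roughly the intuition behind the 30-year convention.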

  283. Hi Chris,

    Yes, it is quite clear that there is a continual exchange of energy within the Earth’s climate system, as well as continual change at the TOA. However, I do not think it is in principle impossible to decompose the various effects to lend more or less credence to a hypothesized underlying trend. I would also guess that as time goes on and measurement accuracy increases, it will become easier to extract signals over shorter and shorter time periods. A lot of the noise appears to be much more a function of imprecise data sets than of any limitations imposed by the physics.

    I’ve had the chance now to read a paper by Lyman et al. about recent cooling in the upper ocean which can be found here:

    http://trs-new.jpl.nasa.gov/dspace/handle/2014/40964

    One of the interesting points of the paper was their estimate of how the error bars (1 sd) around one-year average OHC in the upper 750 meters had shrunk from about 3.7×10^22 joules to around 0.6×10^22 joules between the 1955 data and the 2005 data. The latter number, I believe, corresponds to a yearly radiative imbalance of about 0.5 W/m2 at the TOA. So at least from a pure measurement standpoint, a 0.9 W/m2 imbalance is well within the realm of detection in fairly short order, given the current state of the measurement system. Of course, if the number and variance of the non-trend components of the energy balance are large, then their ability to mask the trend component would increase over any finite time frame.

    To put my idea another way, suppose you had a bunch of coins of differing denominations and you flipped the whole bunch at once. Heads increases your bank account, tails decreases it. If the set of coins contained a two-headed coin, it would be pretty easy to see its trend in your bank account over time. The time it would take to establish its existence to any degree of confidence would depend on the value of the two-headed coin relative to the number and values of all the other coins. The bigger the other coins, the longer it would take to ‘know’ to some degree of certainty whether there was indeed a two-headed one in the bunch.

    I’m just trying to understand the sizes and quantity of the various coins to figure out how long it should take to decide whether a two-headed one is there or not, and how big it might be.

    david

    Comment by david_a — 26 Feb 2009 @ 4:37 PM
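
    As a back-of-the-envelope check on the arithmetic in #283, an annual OHC uncertainty of roughly 0.6×10^22 J can be converted into an equivalent flux. The areas and the choice between ocean-only and whole-Earth normalization below are my own assumptions, not numbers taken from Lyman et al.:

        SECONDS_PER_YEAR = 3.156e7   # approx.
        OCEAN_AREA_M2 = 3.6e14       # approx. global ocean surface area
        EARTH_AREA_M2 = 5.1e14       # approx. total Earth surface area

        ohc_sigma_joules = 0.6e22    # ~1 sd on annual upper-ocean heat content (from the comment above)

        flux_over_ocean = ohc_sigma_joules / (OCEAN_AREA_M2 * SECONDS_PER_YEAR)
        flux_over_earth = ohc_sigma_joules / (EARTH_AREA_M2 * SECONDS_PER_YEAR)

        print(f"equivalent flux over ocean area: {flux_over_ocean:.2f} W/m^2")  # ~0.53
        print(f"equivalent flux over Earth area: {flux_over_earth:.2f} W/m^2")  # ~0.37

    So an uncertainty of that size corresponds to roughly half a watt per square metre, consistent with the figure quoted above.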

  284. David, all that is done is to take a 30-year period and average all the March temperatures, all the April temperatures, and so on.

    30 years is picked because the known cyclic variations that we can treat as negligible in comparison with human-scale climate change all have periods much shorter than 30 years, so you get at least a *few* repeats of each cycle. This tends to reduce the noise.

    Longer periodicities can be seen by collecting several 30-year averages together, and this is done to see what the longer-term natural variations are, using proxy measurements (since we don’t have 200,000-year-old met stations and log books).

    The period is selected because of the physics we know, not because we are throwing bent coins, where the uncertainties can be calculated from abstract mathematics; the messy real world doesn’t let us do that with climate, so we HAVE to measure it. And to remove the known periodicities and the short-term outliers (which are not predictive of the long-term future), 30 years is about what is needed.

    Purely so that the smaller-scale periodicities are averaged out over at least a few cycles.

    Comment by Mark — 26 Feb 2009 @ 5:42 PM
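
    A toy version of the averaging described in #284: a periodic variation a few years long largely cancels in a 30-year mean, while a steady trend does not. The amplitude, period, and trend below are made-up numbers chosen only to show the cancellation:

        import numpy as np

        years = np.arange(1, 31)                       # a 30-year averaging window
        cycle = 0.2 * np.sin(2 * np.pi * years / 5.0)  # ~5-year cycle, 0.2 C amplitude (made up)
        trend = 0.02 * years                           # steady 0.02 C/yr trend (made up)

        # Several complete cycles cancel almost exactly in the 30-year mean,
        # while the trend contribution survives as the mid-window value.
        print("30-yr mean of the cycle:", round(cycle.mean(), 3))   # essentially 0
        print("30-yr mean of the trend:", round(trend.mean(), 3))   # about 0.31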

  285. about recent cooling

    You keep saying the above, and none of the science types has challenged it.

    Given “The authors of the original Lyman et al. paper (12) have now publicly acknowledged that their earlier finding of pronounced ocean cooling over 2003–2005 was spurious (30). Their unpublished analyses confirm that this “cooling” arose for reasons similar to those identified here. …”, was there recent cooling?

    Comment by JCH — 26 Feb 2009 @ 6:14 PM

  286. david,

    Perfect measurements or not, the climate system is characterized by noise and (possibly) an underlying signal. Better instruments will not make El Ninos or La Ninas go away, and these things operate on timescales as long as, or longer than, the timescales over which you think we can get a coherent signal, and they remain a key source of climatic fluctuations about the mean.

    The ability to detect a signal against background noise depends on the system (and the statistic) you are analyzing, and it’s not obvious that perfect measurements should make detection feasible at still shorter time intervals. During glacial times, for instance, the climate was not only colder but also more variable, so it was likely more difficult to distinguish possible trends from background variability (e.g., due to ocean circulation changes). There’s no argument from intuition that says the climate shouldn’t be more variable; rather, observations and models show relative stability under Holocene-like conditions, and no simulation of the coupled atmosphere-ocean system spontaneously produces persistent changes as strong as those from a doubling of CO2.

    The equilibrium conditions and any possible secular changes depend on the TOA energy (im)balance. As such, this is the key factor in climate prediction since it serves to define the basic boundary conditions which constrain the global climate. But there’s always going to be weather superimposed on the long-term trend, so a handful of data points just won’t cut it for trend analysis.

    Comment by Chris Colose — 26 Feb 2009 @ 8:19 PM

  287. Re: Lyman paper

    Wasn’t there a difficulty with this paper, already addressed in this forum?
    aha, the magic of the Memex reveals

    http://www.realclimate.org/index.php/archives/2007/04/ocean-cooling-not/

    followed by the discussion of the Domingues paper

    http://www.realclimate.org/index.php/archives/2008/06/ocean-heat-content-revisions/

    Comment by sidd — 26 Feb 2009 @ 9:46 PM

  288. dave_a is obviously mining denialist sites for evidence, as evidenced by his dragging up the Lyman paper which, as the authors themselves later agreed, was based on spurious results.

    dave_a, we know you didn’t find this all by yourself. ‘fess up, dude. Then tell us why you’re mining denialist sites for what you imagine is superior science.

    Comment by dhogaza — 27 Feb 2009 @ 12:15 AM

  289. dhogaza,

    david_a is not the “Dave A” known and loved by all of us ;-)

    Comment by Martin Vermeer — 27 Feb 2009 @ 5:09 AM

  290. You’re quite correct, Martin.

    He spells his name differently and doesn’t use capitals.

    Comment by Mark — 27 Feb 2009 @ 7:42 AM

  291. Chris,

    I would be very surprised if, at the level we are measuring, the climate system is characterized by ‘noise’. The only fundamental noise in physics is that due to quantum mechanical uncertainty. And while aggregating up from the atomic level can give macro effects (black body radiation spectra being a big one here), at ~10^-34 J s, h is small enough that it is not going to get in the way of forcings 50 orders of magnitude greater. Perhaps we are only having a semantic disagreement, but I have a very different understanding of noise, at least as it applies to the measurement of physical systems. From a purely semantic standpoint, I believe it would be more correct to say that there are processes we do not understand, or that we are not measuring finely enough to separate them from those that we do.

    I agree that better instruments will not make El Ninos or La Ninas go away, but I do not believe that is the issue, nor am I suggesting that it is. The issue is whether ENSO, PDO, NAO, etc. are truly random (emergent properties of quantum effects) or only appear random because we don’t have a handle on what forces them. But to bring this back around to my real contention: even if we cannot answer the prior question, can we isolate their effects on the Earth’s radiation budget to some degree? There seems to be an accepted wisdom that 30 years is somehow a magic number below which process-level uncertainty makes trend identification inherently impossible. Perhaps this is true, but shouldn’t there be some mathematical/physical reason for it to be so? If there is, then by all means point me towards it.

    The equilibrium conditions and any possible secular changes depend on the TOA energy (im)balance. As such, this is the key factor in climate prediction since it serves to define the basic boundary conditions which constrain the global climate.

    Perfectly said.

    I would add to that by saying that by far the best integrator of the energy (im)balance over the relevant time scales for measuring GHG, or any other, forcing/feedback would be the heat content of the world’s oceans. The added bonus of the oceans is that the measurement tools we currently have (thousands of thermometers and tide gauges bobbing around all over the place, plus a few satellites) are now reaching a resolution high enough to begin giving statistically strong global heat content data on annual and intra-annual time frames.

    We also know, to a high degree of precision, the amount of solar radiation reaching our planet, so we can easily remove this variance from the global energy budget.

    This leaves the outgoing shortwave and the outgoing longwave as the two missing pieces needed to close the system. And in fact, if we have one of them, the other just falls out, though from the standpoint of error checking and validation it would be far preferable to have them both.

    As a first-order approximation, the GHG forcing/feedback effects would affect only the longwave side of the equation. So in principle, if you had a good handle on the variance of the outbound shortwave and a measurement of the energy balance, you could produce a function relating the probability of any sequence of energy imbalances to the variability in the OLR process. One could then take the variance in the OLR predicted by the GCMs, plug it into this function, and begin to get estimates of the reliability of their forecasting skill given a particular sequence of OSR and OHC measurements. Since this data is available, I can go and do this. Cool.

    Thanks,
    David

    Comment by david_a — 27 Feb 2009 @ 10:49 AM
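
    A sketch of the bookkeeping proposed in #291, under the simplifying assumption that the year-to-year change in ocean heat content captures essentially all of the planetary imbalance. Every array below is a placeholder, not real satellite or float data; the point is only the shape of the calculation, with outgoing longwave estimated as the budget residual:

        import numpy as np

        EARTH_AREA_M2 = 5.1e14
        SECONDS_PER_YEAR = 3.156e7

        # Hypothetical global annual means (W/m^2) and ocean heat content (J); placeholders only.
        solar_in = np.array([340.2, 340.1, 340.3, 340.2, 340.1])      # incoming solar
        osr = np.array([99.5, 99.2, 99.8, 99.4, 99.1])                # outgoing shortwave
        ohc = np.array([10.0e22, 10.3e22, 10.5e22, 10.9e22, 11.2e22]) # ocean heat content

        # Assume the year-to-year OHC change is the planetary energy imbalance.
        imbalance = np.diff(ohc) / (EARTH_AREA_M2 * SECONDS_PER_YEAR)  # W/m^2

        # Close the budget: OLR = incoming solar - outgoing shortwave - imbalance.
        olr_residual = solar_in[1:] - osr[1:] - imbalance

        print("implied imbalance (W/m^2):    ", np.round(imbalance, 2))
        print("residual OLR estimate (W/m^2):", np.round(olr_residual, 1))
        print("variance of residual OLR:     ", round(float(olr_residual.var()), 3))

    The residual OLR series, and in particular its variance, is then the quantity one would compare against the OLR variability produced by the models.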

  292. I realize he’s not, but I do think he’s clearly mining the denialist ore. In a more sophisticated manner than cap Dave cap A.

    Comment by dhogaza — 27 Feb 2009 @ 10:51 AM

    david_a (291) — Rather than the term noise, the phrase internal variability is sometimes used to describe the effects of ocean oscillations and so forth. It is shorter to just write “noise”, and it agrees with common practice in many sciences in distinguishing “signal” from “noise”.

    Comment by David B. Benson — 27 Feb 2009 @ 2:31 PM

  294. I’m confused about this new meme that seems to be spreading through the denialist community that “noise” is not a good description of certain aspects of the system.

    Google search for “data noise” pops up this definition:
    “Analysis of interactions with a site commonly involve data sets that include ‘noise’ that may affect results. Noise is data that does not typically reflect the main trends thus making these trends more difficult to identify.”

    Or as the common phrase goes: “One man’s noise is another man’s signal”.

    Clearly, noise in most contexts does not mean “only quantum noise”. Nor does it mean that there isn’t an underlying explanation for the noise. In the climate context, I think it is perfectly valid to characterize the internal shifting of heat around the system (as typified by El Nino and La Nina) as “noise” compared to the long-term signal of heat accumulation. And indeed, because of the chaotic nature of weather, perfectly predicting El Ninos and La Ninas may well be impossible – possibly due to your quantum noise! Heck, climate models, much simpler than the Earth system, have “internal variability” which can be well described by “noise”, and there was a time when, due to some cheap chips and a broken air conditioner, I couldn’t repeat some model runs reliably: one bit flipping in the middle of a 36-hour run would lead, inexorably, to significant changes in the year-to-year variability – though not in the long-term trend. E.g., “noise” or “chaos” or whatever you want to call it.

    P.S. Ocean heat content would be a great way to monitor long-term flux imbalances. But looking at the variability in recent papers and research by Domingues, Willis, Levitus, and Gouretski, I am stunned that anyone thinks we are anywhere close to having an ocean record with reliability as good as the surface temperature record. Of course, denialists (WattsUp and Pielke) like claiming that the surface temperature record is faulty – but I thought the resolution of the satellite trend/SAT trend discrepancy in favor of SAT was a fairly impressive validation of the SAT methodology for determining accurate trends (at least for the last 40 years).

    Comment by Marcus — 27 Feb 2009 @ 2:44 PM
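
    The bit-flip anecdote in #294 has a compact numerical analogue: in a chaotic iteration, a machine-precision perturbation completely changes the detailed trajectory while leaving the long-run statistics essentially unchanged. The logistic map below is just a stand-in for illustration, not a claim about any actual climate model:

        import numpy as np

        def logistic_run(x0, n=100_000, r=3.9):
            """Iterate the chaotic logistic map x -> r*x*(1-x) and return the trajectory."""
            x = np.empty(n)
            x[0] = x0
            for i in range(1, n):
                x[i] = r * x[i - 1] * (1.0 - x[i - 1])
            return x

        a = logistic_run(0.4)
        b = logistic_run(0.4 + 1e-15)   # a "flipped bit" in the initial state

        # The individual values decorrelate after a few dozen steps...
        print("step 100:", a[100], "vs", b[100])
        # ...but the long-run statistics (the "climate" of the map) barely move.
        print("long-run means:", round(a.mean(), 4), round(b.mean(), 4))

    The two runs disagree on any given step after the first few dozen iterations, yet their long-run means are nearly identical.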

  295. Marcus, I have just had an idea about how to explain this “noise”. If it isn’t reliable, it’s noise.

    E.g., in a warm summer August, any one day will be cooler or warmer than the best guess of what that day’s temperature would be based SOLELY on the records (so no computer models, just observations), which is really what “average” means.

    However, I cannot rely on the 5th of August being cooler next time we have a 5th of August just because this one was.

    The data for this one specific day is not reliable.

    However, without anything forcing the weather to change (which is what climate change is), it is reliably consistent with the past mean and daily variations.

    Comment by Mark — 1 Mar 2009 @ 6:40 AM
