

Climate code archiving: an open and shut case?

Filed under: — eric @ 26 October 2010

Gavin Schmidt and Eric Steig

The last couple of weeks saw a number of interesting articles about archiving code – particularly for climate science applications. The Zeeya Merali news piece in Nature set the stage, and the commentary from Nick Barnes (of ClearClimateCode fame), proposed an ‘everything and the kitchen sink’ approach. Responses from Anthony Vejes and Stoat also made useful points concerning the need for better documentation and proper archiving. However, while everyone is in favor of openness, transparency, motherhood and apple pie, there are some serious issues that need consideration before the open code revolution is going to really get going.

It would help to start by being clear about what is meant by ‘code’. Punditry about the need for release of ‘all supporting data, codes and programmes’ is not very helpful because it wraps very simple things, like a few lines of Matlab script used to do simple linear regressions, along with very complex things, like climate model code, which is far more sophisticated. The issues involved in each are quite different, for reasons both scientific and professional, as well as organizational.

First, the practical scientific issues. Consider, for example, the production of key observational climate data sets. While replicability is a vital component of the enterprise, this is not the same thing as simple repetition. It is independent replication that counts far more towards acceptance of a result than merely demonstrating that given the same assumptions, the same input, and the same code, somebody can get the same result. It is far better to have two independent ice core isotope records from Summit in Greenland than it is to see the code used in the mass spectrometer in one of them. Similarly, it is better to have two (or three or four) independent analyses of the surface temperature station data showing essentially the same global trends than it is to see the code for one of them. It is better to have an ocean sediment core corroborate a cave record than to look at the code that produced the age model. Our point is not that the code is not useful, but that this level of replication is not particularly relevant to the observational sciences. In general, it is the observations themselves – not the particular manner in which they are processed – that are the source of the greatest uncertainty. Given that fundamental outlook, arguments for completely open code are not going to be seen as priorities in this area.

By contrast, when it comes to developers of climate models, the code is the number one issue, and debugging, testing and applying it to interesting problems is what they spend all their time on. Yet even there, it is very rare that the code itself (much of which has been freely available for some time) is an issue for replication – it is much more important whether multiple independent models show the same result (and even then, you still don’t know for sure that it necessarily applies to the real world).

The second set of issues is professional. Different scientists, and different sciences, have very different paths to career success. Mathematicians progress through providing step by step, line by line documentation of every proof. But data-gathering paleo-climatologists thrive based on their skill in finding interesting locations for records and applying careful, highly technical analyses to the samples. In neither case is ‘code’ a particularly important piece of their science.

However, there are many scientists who work on analysis or synthesis that make heavy use of increasingly complex code, applied to increasingly complex data, and this is (rightly) where most of the ‘action’ has been in the open code debate so far. But this is where the conflicts between scientific productivity at the individual level and at the community level are most stark. Much of the raw input data for climate analysis is freely available (reanalysis output, GCM output, paleo-records, weather stations, ocean records, satellite retrievals etc), and so the skill of the analyst is related to how they choose to analyse that data and the conclusions they are able to draw. Very often, novel methodologies applied to one set of data to gain insight can be applied to others as well. And so an individual scientist with such a methodology might understandably feel that providing all the details to make duplication of their type of analysis ‘too simple’ (that is, providing the code rather than carefully describing the mathematical algorithm) will undercut their own ability to get future funding to do similar work. There is certainly no shortage of people happy to use someone else’s ideas to analyse data or model output (and in truth, there is no shortage of analyses that need to be done). But to assume there is no perception of conflict between open code and what may be thought necessary for career success – and the advancement of science that benefits from a bit of competition for ideas – would be naïve.

The process of making code available is clearly made easier if it is established at the start of a project that any code developed will be open source, but taking an existing non-trivial code base and turning it into open source is not simple, even if all participants are willing. In a recent climate model source code discussion for instance, lawyers for the various institutions involved were very concerned that code that had been historically incorporated into the project might have come from outside parties who would assert copyright infringement related to their bits of code if it were now to be freely redistributed (which is what the developers wanted). Given that a climate model project might have been in existence 30 years or more, and involved hundreds of scientists and programmers, from government, universities and the private sector, even sorting out who would need to be asked was unclear. And that didn’t even get into what happens if some code that was innocently used for a standard mathematical function (say a matrix inversion) came from a commercial copyrighted source (see here for why that’s a problem).

Yet the need for more code archiving is clear. Analyses of the AR4 climate models done by hundreds of scientists not affiliated with the climate model groups are almost impossible to replicate on a routine and scalable basis by the groups developing the next generation of models, and so improvements in those metrics will not be priorities. When it comes to AR5 (for which model simulations are currently underway), archiving of code will certainly make replication of the analyses across all the models, and all the model configurations much less hit or miss. Yet recently, it was only recommended, not mandated, that the code be archived, and no mechanisms (AFAIK) have been set up yet to make even that easy. In these cases, it makes far more sense to argue for better code archiving on the basis of operational need, than it does on the basis of science replication.

This brings us to the third, and most important issue, which is organizational. The currently emerging system of archiving by ‘paper’ does not serve the operational needs of ongoing research very well at all (and see here for related problems in other fields). Most papers for which code is archived demonstrate the application of a particular method (or methods) to a particular data set. This can be broken down into generic code that applies the method (the function), and paper-specific code that applies that method to the data set at hand (the application). Many papers use a similar method but in varied applications, and with the current system of archiving by ‘paper’, the code that gets archived conflates the two aspects, making it harder than necessary to disentangle the functionality when it is needed in a new application. This leads to the archiving of multiple versions of essentially the same functional code causing unnecessary confusion and poor version control.

It would be much better if there existed a stable master archive of code, organised ‘by function’ (not ‘by paper’), that was referenced by specific applications in individual papers. Any new method would first be uploaded to the master archive, and then only the meta script for the application referencing the specific code version used would need to be archived with an individual paper. It would then be much easier to build on a previous set of studies, it would be clear where further development (either by the original authors or others) could be archived, and it would be easy to test whether the results of older papers were robust to methodological improvements. Forward citation (keeping track of links to papers that used any particular function) could be used to gauge impact and apportion necessary credit.
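
To make the distinction concrete, here is a minimal sketch in Python of what archiving ‘by function’ might look like. Everything in it is an assumption made for illustration: the `FunctionArchive` class, the `linear_trend` name, the version strings, the paper identifier and the toy numbers are hypothetical and do not describe any existing archive or data set.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (function name, version)


@dataclass
class FunctionArchive:
    """Hypothetical master archive: code is stored by function and version."""
    _functions: Dict[Key, Callable] = field(default_factory=dict)
    _cited_by: Dict[Key, List[str]] = field(default_factory=dict)

    def upload(self, name: str, version: str, func: Callable) -> None:
        key = (name, version)
        if key in self._functions:
            # An archived version never changes; improvements get a new version.
            raise ValueError(f"{name} {version} is already archived")
        self._functions[key] = func
        self._cited_by[key] = []

    def fetch(self, name: str, version: str, paper_id: str) -> Callable:
        key = (name, version)
        func = self._functions[key]
        if paper_id not in self._cited_by[key]:
            self._cited_by[key].append(paper_id)  # forward citation
        return func

    def citations(self, name: str, version: str) -> List[str]:
        return list(self._cited_by[(name, version)])


def trend_v1_0(years, values):
    """Ordinary least-squares slope of values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def trend_v1_1(years, values):
    """Same method, but missing values (None) are dropped first."""
    pairs = [(x, y) for x, y in zip(years, values) if y is not None]
    xs, ys = zip(*pairs)
    return trend_v1_0(xs, ys)


archive = FunctionArchive()
archive.upload("linear_trend", "1.0", trend_v1_0)
archive.upload("linear_trend", "1.1", trend_v1_1)

# The paper-specific 'meta script': only this part would be archived with the paper.
PAPER_ID = "example-paper-2010"      # hypothetical identifier
years = [2000, 2001, 2002, 2003]     # toy inputs, not real observations
anomalies = [0.1, 0.2, 0.3, 0.4]
slope = archive.fetch("linear_trend", "1.0", PAPER_ID)(years, anomalies)

# Later, anyone can check whether the result is robust to the improved method.
slope_new = archive.fetch("linear_trend", "1.1", PAPER_ID)(years, anomalies)
print(slope, slope_new, archive.citations("linear_trend", "1.0"))
```

The design choice worth noticing is that the application script contains no method code at all, only a pinned name and version, so the same function can be reused and cited by many papers without being copied into each of them.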

One could envision this system being used profitably for climate model/reanalysis output analysis, paleo-reconstructions, model-data comparisons, surface station analyses, and even for age-model construction for paleo-climate records, but of course this is not specific to climate science. Perhaps Nick Barnes’ Open Climate Code project has this in mind, in which case, good luck to them. Either way, the time is clearly ripe for a meta-project for code archiving by function.


200 Responses to “Climate code archiving: an open and shut case?”

  1. 101
  2. 102

    100 (Leif),

    “How one deals with code in the millions of lines is a different problem. Perhaps the real problem is that code should not be allowed to grow into such monsters.”

    Which is to say… don’t bother to use computers to solve anything except trivial problems?

  3. 103
    Didactylos says:

    No, klee12: you still have it backwards.

    Publishing the data and code is sufficient for repetition. It tells you absolutely nothing about the intent of the code, or whether the code does what is intended.

    And your description of “all I want in the way of documentation is a statement of what the subroutine does, and the order and data structures of the input parameters and the output parameters” sounds a lot like undocumentation. Yes, stale documentation is also a concern. But if the specification document is normative, then the code doesn’t matter. Given any implementation, you can tell whether it is correct – whether by examining the code, or testing the results. (And “specification documents” really are pretty normative, since that is the part that gets peer reviewed and published.)

    Publishing the method, the intent, the specification – that is the vital part. The code is just gravy.

    “often the specifications are not reflected in the code. One cannot claim the code is correct just because it runs and gives results. One has to be sure that the specifications are met.”

    You are so very close to being right with this. But then you ruin it by saying:

    “programmer B who wants to determine whether or not programmer A’s code satisfies the specifications needs to look at the code”

    …and you’re back to being wrong. 180 degrees wrong. The code without the specifications tells you nothing. The specifications without the code can tell you enough, because you also have the results of the code. You can determine from the output whether the code meets the specification. And most importantly, by replicating those results by applying the method from first principles yourself, you provide very strong evidence that the method is correct, and not an artefact of buggy code.

    Please be aware that it isn’t often code that future scientists will rely on when they build on someone’s work. It is the methods, the published algorithms. So for useful replication, it is the reliability of the methods that must be tested, not just the repeatability of a single implementation.

    Also, consider this: code review processes usually find something like 65% of bugs. Now, fixing 65% of bugs is a useful thing to do, but it doesn’t say anything about whether the output is correct. If there is a significant problem in that elusive 35%, then you are stuck. But if the code does what it should under the prevailing conditions, then we can be confident that the undetected bugs either a) have such a small effect on the result as to be negligible, or b) only manifest under conditions that are currently out of range.

    [Response: Italicized emphasis mine --eric]

  4. 104
    klee12 says:

    Didactylos @103

    I’m afraid we’re going to have to agree to disagree

    klee12

  5. 105
    Francis says:

    Many years ago I majored in Computer Science at Dartmouth and wrote a lot of code. About a decade later I was cleaning up my garage and found printouts of my own work. Now, I was an undergraduate at the time and went to law school in the interim, so a certain lack of skill and loss of skill might be expected. But I was completely incapable of understanding my own raw code. The only way I could figure out what I was trying to do was to read my own comments. So let me add my voice to those who argue that the raw code is much less important than the specification which describes what the programmer was trying to do.

  6. 106
    Didactylos says:

    klee12 said “I’m afraid we’re going to have to agree to disagree”

    It would be far more valuable if you examined your assumptions and worked out why I and the RC scientists disagree with you.

    Our opinions aren’t so far apart. We all want clear specifications and full archival of code and data. I’m not confident you have fully grasped the difference between repetition and replication, and that is why your priorities are different.

  7. 107
    Leonard Evens says:

    As with Francis, I also have had trouble deciphering my own code, at least when the project was somewhat complex. Early on, I learned the necessity of providing adequate comments, but even with those, it was usually quite difficult to work out just what I was doing. In many cases, it might have been easier to rewrite the program from scratch.

    On the basis of such experience, it seems to me to make more sense to describe the approach and algorithms that are being used and to have others write their own programs to see if they come up with the same results from the same inputs. Robustness under that kind of test would, to me, be much more convincing than checking through enormously many lines of code for mistakes.
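
As an illustration of the kind of test described in this comment (not something the commenter supplied), here is a minimal Python sketch. It assumes the published description says only "sample variance with an n-1 denominator" and shows two independently structured programs, a two-pass version and Welford's one-pass method, being compared on the same inputs. The function names, the tolerance and the sample values are all made up for the example.

```python
import math


def variance_two_pass(values):
    """Implementation A: compute the mean first, then the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variance_welford(values):
    """Implementation B: Welford's one-pass update of the mean and sum of squares."""
    count = 0
    mean = 0.0
    m2 = 0.0
    for v in values:
        count += 1
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return m2 / (count - 1)


def replicates(result_a, result_b, rel_tol=1e-9):
    """Agreement within a relative tolerance (the tolerance is an assumption)."""
    return math.isclose(result_a, result_b, rel_tol=rel_tol)


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy inputs
a = variance_two_pass(sample)
b = variance_welford(sample)
print(a, b, replicates(a, b))
```

Agreement between two programs that share only the written description is evidence about the method itself, which is the point being argued here; agreement from rerunning one program is not.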

  8. 108
    klee12 says:

    Francis @105 wrote

    “So let me add my voice to those who argue that the raw code is much less important than the specification which describes what the programmer was trying to do.”

    Oops, I didn’t mean to say that documentation in the form of comments is bad. I meant, but did not express well, that for me it is not all that important. Commenting can be time consuming and I would much rather be programming. I just wanted to encourage people here not to feel they must go through the trouble of extensive documentation of code before publishing it.

    Let me explain why documentation is not important for me. Suppose I get a program where I suspect there is a bug. I assume I know something about what the program is supposed to do, i.e. solve partial differential equations rather than render graphics.

    I would not start reading code. I would look at the main program and look at the subroutines in the top level. Then

    1. I would try to compile the program with all optimizations turned off, and warning messages or debugging mode turned on. There is a saying in scientific computing “Do you want it to run fast or run correctly. Pick one”. In debugging mode, a compiler can insert code for runtime checks that references to array elements are within the bounds of the declared arrays. Other runtime checks may be inserted. Optimizations can do strange things that I won’t go into now. If I get a good compilation, I read the warning messages. If not I fix it up.

    2. I would eyeball the main program and look at the major subroutines called. I try to figure out what they do by looking at the parameters going into the subroutines and the output parameters. I might insert my own debugging statements into the code. I might build my own test data and run the program and try to verify what is going on. The goal is to try to find the most likely place for an error.

    3. If I want to examine a subroutine more carefully and if it contains more subroutines I might repeat step 2 on the subroutine. Eventually I will want to examine a few subroutines. I might break them out of the code and build my own test harness (program) to run the subroutine. It is easier to vary the parameters in the harness than in the main program.

    4. At this point I look at the code. I examine calling sequences, undeclared variables, global variables, and other things. Then if there is documentation I now read the documentation. I note which data structures are 32 bit and which are 64 bit. If there are specifications, I check to see if they are followed. After all that I read the code, I insert debugging statements, and run the test harness.

    5. If I still can’t figure out what the code is doing I use a debugger to single-step through the code. The debugger can execute a single line of code and then wait for a command. Commands may be to (1) continue to execute the next line of code (2) print the value of any variable (3) set a breakpoint so the program runs until that line is reached. There are many other commands.

    That’s what I would do. Different programmers may use different approaches. Every program is different and this approach might not work for all programs. Sometimes I get things under control in a day, sometimes in a couple of weeks. But that’s a problem for the programmer checking the code, not the author.

    klee12
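
The test-harness idea in step 3 of the comment above can be sketched as follows. This is an illustrative example in Python rather than the Fortran or C the commenter worked in, and every name in it (`trapezoid`, `trapezoid_off_by_one`, `run_harness`, `CASES`) is hypothetical. The routine under test is broken out of its program and driven with inputs whose correct answers are known in advance.

```python
import math


def trapezoid(f, a, b, n):
    """Routine under test: composite trapezoid rule with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def trapezoid_off_by_one(f, a, b, n):
    """Deliberately buggy variant: the loop runs one step too far."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n + 1):  # bug: counts the end point a second time
        total += f(a + i * h)
    return total * h


def run_harness(routine, cases, tol):
    """Run the routine on each case and return the labels that miss the tolerance."""
    failures = []
    for label, f, a, b, n, expected in cases:
        got = routine(f, a, b, n)
        error = abs(got - expected)
        print(f"{label}: got={got:.8f} expected={expected:.8f} error={error:.2e}")
        if error > tol:
            failures.append(label)
    return failures


# Each case: (label, integrand, lower limit, upper limit, intervals, exact answer)
CASES = [
    ("constant", lambda x: 1.0, 0.0, 2.0, 10, 2.0),
    ("linear", lambda x: x, 0.0, 1.0, 10, 0.5),
    ("sine", math.sin, 0.0, math.pi, 1000, 2.0),
]

good_failures = run_harness(trapezoid, CASES, tol=1e-5)
bad_failures = run_harness(trapezoid_off_by_one, CASES, tol=1e-5)
print(good_failures, bad_failures)
```

The buggy variant passes the sine case, because the extra term is sin(π) and therefore almost exactly zero, but it fails the constant and linear cases. That is one reason it is easier to vary the parameters in a harness than to rely on a single run of the main program.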

  9. 109
    Eli Rabett says:

    Seems to Eli there are two cases here which are being inconveniently mixed. The first is one off, simple stuff, used once in some analysis and interred. There is absolutely no need to go beyond publishing a description of this.

    The second is production code (GISTEMP, RSS, UAH, the Community Climate Code, etc.), which (or at least the cores of which) is constantly used by many, or whose outputs are used by many. The code for those should be publicly available.

    One could make an argument for the first class of programs that they should be written in as high a level language as possible, MATHEMATICA and the like.

  10. 110
    Tony O'Brien says:

    Once code gets “out there” it can take on a life of its own.

    I wrote a useful little program, just for me, for a very specific set of conditions. Of course I didn’t document it: no input controls, no pretties. Coworkers liked it and used it also; it spread.

    Later I ran into a pretty version. Overall a nasty program.

    Yes my very simple little program was at the heart of the nasty one. It was no longer obvious that it only catered for a specific set of circumstances and gave garbage for others.

    In truth I put in less effort on the original program than someone else did creating input controls, but I only made it for me.

    So yes I can see problems in open sourcing all code.

  11. 111
    Mark says:

    Eric and Gavin,

    Thanks for bringing this out. It is amazing that scientists have to do extra work, with no further pay, to satisfy those who just don’t believe the facts. Too bad that in the political game donations do not have to be accountable. Maybe a comment not appropriate in this discussion but one to ponder.
    Much appreciation to the article and discussion.

  12. 112
    Didactylos says:

    klee12: that’s a great approach if you just want to fix the bugs. But if the question is “does the software meet the client’s operational needs” – then maybe you would try a different approach.

    Oh, and here’s a secret: nearly all programmers dislike writing comments. (We try not to let anyone know this, though.) After a few experiences like Francis and Leonard’s, you too will really wish your comments were better. You might even start writing better comments. It’s still unlikely you will enjoy it. If you think it’s easier to interpret the code – wait until you have had to reinterpret the same code a few times; you will wish you had added some comments the first time you worked everything out. Either that, or you’re some kind of masochist ;-)

  13. 113
    Steve Metzler says:

    Now yer talkin’ my kind of language…

    I was originally a computer designer, and now a professional programmer for the past 30 years. I realise that I’m mainly echoing the points raised in this excellent article, and by the likes of Damien in comment #4, and Bob in comment #11. But they are points that need to be constantly reinforced nonetheless:

    1. The algorithm is the important thing, not the code
    In a published paper, I want to see the basic algorithms/methods you used along with any important constants and assumptions. Then I can develop my own code and run my own data (coming to that in a second) through it to see if I get similar results.

    2. GIGO (Garbage In, Garbage Out)
    It is much more important that many independent, reliable, data sets of a given type are gathered, than that everyone is trying to run their own home-brewed algorithm on a single data set to see how far they can push the limits of credibility :-)

    So I agree that an open source library of climate-related functions that are difficult to implement would be handy and time-saving, just like any high level language comes with a built-in library. But asking for entire coded bases to be open-sourced? Not useful, IMO.

    ETA: in fact, pseudo-code might be even more useful. Then someone could implement each function in any computer language.

  14. 114
    Ray Ladbury says:

    klee12,
    I’m going to go out on a limb here and guess that you haven’t written any code for scientific computing. Specifications? You’re lucky if you get a well documented dataset. And the majority of the time, the issues you are looking for are not “bugs”. A decent scientific programmer or his working group will have found most of the significant bugs before the results are published. More often than not, a “bug” means the code either doesn’t work or yields results that are not consistent with known science. When was the last time you heard of a result being retracted because of a bug in the code?

    And I beg your pardon, but a third researcher trying to reproduce the results is precisely what would happen if you had two other researchers who disagreed. And a fourth, fifth and sixth for that matter. If a result significantly advances understanding of a field of study, you will have every other researcher in the field trying to reproduce the results and to see if it also advances their project.

    And why not release the code:
    1) it does not advance the science
    2) most scientific code is used within small groups and may not be sufficiently well documented for outsiders to use without wandering outside the validity of the models being used.
    3) if a researcher can’t reproduce the analysis from a description, then either the description was inadequate or the analyst trying to reproduce it is not sufficiently skilled to be mucking about in the field anyway.
    4) independent replication is more likely to lead to advances than simple repetition.

    I have found that it is possible to get access to code via certain advanced techniques–namely asking nicely. The advantage of this is that it usually also comes with help from the author of the code.

  15. 115
    Thomas says:

    Tony @110 had a good and easily overlooked point. The broader and less specialized the user community for a code, the greater is the need for investment in user-friendliness. Otherwise naive users can easily violate the logic and capabilities of the code, often getting nonsense results, or reporting bugs. So you can make a large investment in bells and whistles, user hand-holding, sophisticated algorithms to determine whether the solution is converging correctly, etc. This sort of coding can often cost more than the original code, and is not particularly enjoyable. So the temptation will be to just make the source code available on an as-is basis. But then you run a reputational risk when naive users get into trouble, and blame you for writing low quality code. So there is a danger that a requirement for openness which is motivated by a need for greater public confidence can end up having the exact opposite result.

    A good example of this, just happened to me with CAPTCHA. After typing in the comment, I hit SAYIT, without filling in the CAPTCHA, and had to completely retype this message (because of poor quality software).

  16. 116
    Rod B says:

    A quicky from the peanut gallery. I’m in full agreement with most of the commenters and the thrust of the article. Ray L.’s points in 114 sum up the rationale pretty well (as have many others). Distributing raw code seems a colossal waste of time with the probability of 0.1% of anything productive coming of it. Though there probably ought to be a mechanism for a very select list of people getting a copy in some circumstances.

    The documentation, specs, algorithms ought to be freely available (sometimes under non-disclosures) to aid the perception and reality of openness in this critical highly-charged area. Though I recognize the significant dilemma and aggravation this can be sometimes.

    btw, I once knew a guy in old Big Blue who could debug (once in a while) even massive programs by reading machine code. Don’t know if this is uncommon, but as a neophyte at the time I was greatly impressed!

  17. 117
    Robin D Johnson says:

    I think a useful analogy would be the world of computer security.

    Encryption algorithms and security protocols have to be rigorously described and all assumptions documented. The algorithms involve highly complex and advanced mathematical theory. The protocols have to be designed around established cryptographic principles. Successfully writing code to implement the algorithms and protocols requires a significant understanding of cryptography. The algorithms and supporting reasoning are generally subject to public review (that is, by folks that could actually follow the reasoning).

    Code to implement the encryption algorithms often is freely available under open source licensing (for example, openssl, openssh). But organizations and individuals often write their own implementations for their own purposes. The code for encryption is often quite “simple” from a software engineering perspective. The main software engineering problem is performance – so reducing overhead with fewer function calls and extraneous variable checking is critical. This requires a higher degree of software engineering skill, talent and experience to avoid fubars and subtle defects that can be exploited by a hacker. Such code needs a high degree of review to avoid subtle defects. Only experts bother to review the code for obvious reasons. Again for those reasons, most folks use the free versions mainly because they are the most widely used, best reviewed and hence most trusted.

    The analogy being that climate model code is really only comprehensible if you understand the physical model that the code implements. The difference here is that the implementation is being done largely by smart climate scientists and not software engineers. All that means though is that the code has much less gold plate, is probably not quite bulletproof (like say a temperature data set with an unexpected control character might crash), is probably not written in a self-documenting fashion using variable naming conventions and style to maximize readability, is potentially inefficient [although fortran is generally regarded as very efficient] and is not taking advantage of the latest advances in software engineering theory. [NOTE: I've read through some of the climate model code that is available so I'm not just guessing wildly here.]

    Could subtle bugs produce lousy results? Sure. But the vast majority of bugs, even the subtle ones, produce obviously wrong results. A bug in a decryption or encryption algorithm doesn’t change “Tom” to “Bob”, it changes “Tom” to “4#~”. When the climate model produces obviously wrong results, was it an error in the algorithm or the code or the data set or the computer configuration? Poorly written code can make that discovery process more time consuming than necessary and hence reduce the amount of time spent on climate science that could help us make sound policy decisions.

    This is where well-written, reviewed code is beneficial. A coder can quickly read the code and say “The only reason you could get that result is that the algorithm for pressure gradient was implemented wrong.” or “The temperature data set must be corrupt for Florida.”

    Documentation of algorithms, file formats, and interfaces are absolutely critical and need to be well-written and error free.

    But code “documentation” is largely useless or dangerous since it is rarely correct. Comments are particularly dangerous. The code needs to speak for itself.

    Seriously, I personally wrote, maintain and expand a codebase of 1+ million lines of relatively complex C++ and manage a team of Java developers with a codebase of 1.5+ million lines of code (by comparison Windows OS is reputed to be 40+ million lines). It really just comes down to the discipline of writing readable code. No amount of “documentation” will make poorly written code comprehensible. The time spent writing “documentation” would have been better spent making the code more readable.

    Bottom line, it seems to me that “archiving” climate modeling code should be done to help the climate scientists produce better code not for political purposes to satisfy demands for “transparency”.

    [Response: Thanks for your level-headed comments and interesting perspective as a software expert. I agree that in general, decisions ought to be made that best serve the science, not to satisfy political winds. I think Nick Barnes would reply that his project is serving both purposes -- hence ClearClimateCode.org and Open Climate Code.--eric]

  18. 118
    Jeffrey Davis says:

    It can’t be emphasized too much that the code is not the world. The code in GCMs attempts to mimic a physical process, but as near as I can tell, it’s never the code of the GCMs that the naysayers and deniers have fits about. The code they have fits about are statistical routines: the code which attempts to find signal in noise. The naysayers never seem to want to kvetch about code that mimics a physical process. No, they moan that certain statistical procedures weren’t done to their satisfaction.

  19. 119
    RiparianJohn says:

    While I agree that papers published describing coded implementation of mathematical models can be tedious and may not be a good presentation of the model they do serve to announce the existence of an implementation and some of the thinking behind it. I also agree that new and different coded and tested approaches to the same or similar problem and data provides another view for comparison, but if an implementation produces particularly useful data then it is necessary to look closely at the code and ideas that produced said data. If a coded solution does produce an answer as expected it is possible in the name of tuning or calibrating to paint the hood rather than fixing the engine. The result looks great, but it doesn’t really run. How would anyone know that if the code and documentation is not refereed?

  20. 120
    klee12 says:

    Ray Ladbury @114 wrote

    “I’m going to go out on a limb here and guess that you haven’t written any code for scientific computing.”

    Well, you decide. I worked several years as a senior scientist at NASA’s Ames Research Center at Moffett Field, California. I worked with a division involved with computational fluid dynamics (CFD) but I was attached to a section that was involved with benchmarking and algorithm development. The algorithms had to do with taking advantage of the architectures of the machines we had (Crays and early parallel processing). I worked with the guys who really developed code and sometimes I was asked to help make their code run faster. In my benchmarking I did write some code to investigate some algorithms but I didn’t write CFD code. At that time I could read Fortran like a newspaper. I later worked with C, but never got to the level of expertise that I had in Fortran because C was too close to the machine level … more tricky things could happen than in Fortran. I had an interest in software engineering.

    The work at Ames may have been defense related. I needed security clearance. The code, AFAIK, was not published. But it would not have been important for science even if it had been published. It was engineering; it did not add to our scientific knowledge. It was successful if it could produce a plane that could fly faster and/or more efficiently or whatever.

    Before I go on, let me say where I’m coming from. On this thread, there are many subthreads. Nick Barnes started a good one at comment 78. He gave reasons to publish code. I’m suggesting another reason, namely that science requires that results can be checked, and scientists should publish their code since (IMHO) you essentially cannot check the results without the code. In particular it is difficult to resolve differences if someone, attempting to reproduce the results, gets a different result. Who is correct? I am arguing that giving a description of an algorithm may not, in software engineering terms, be an adequate set of specifications and that even if the code implements the algorithms, bugs may occur in the code.

    I also point out that it does not take much effort to publish the code and the data. No documentation, no support; just whatever is necessary to produce the result. See post 85 by Nick Barnes.

    Now back to Ray Ladbury@114.

    Specifications? You’re lucky if you get a well documented dataset. Yes, I knew that and I think I wrote implicit specifications, which I left undefined. Generally, descriptions of methods do not specify the precision of the computation or the handling of run-time errors. However, these issues must be handled in the code, so I invented a classification of errors with the name implicit specification error. I was trying to point out why two different implementations, both without coding bugs, may give two different results. By the way, small round-off errors can lead to large errors when inverting ill-conditioned matrices. That’s why they invented double precision.
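
    The point about precision can be illustrated with a minimal sketch. Everything in it is hypothetical (the 2x2 system, the names `to_single` and `solve_2x2`), and it simplifies matters by only rounding the inputs to single precision while doing the arithmetic itself in double precision. Even so, two "bug-free" runs of the same algorithm on a nearly singular system disagree in the third decimal place:

```python
import struct


def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return x0, x1


# Hypothetical, nearly singular system whose exact solution is x = (1, 1).
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]

# "Implementation 1": inputs kept in double precision.
x_double = solve_2x2(A, b)

# "Implementation 2": same algorithm, but the inputs were stored in single
# precision first (a choice the written method description never specified).
A_single = [[to_single(v) for v in row] for row in A]
b_single = [to_single(v) for v in b]
x_single = solve_2x2(A_single, b_single)

print("double-precision inputs:", x_double)
print("single-precision inputs:", x_single)
```

    The input rounding is of order one part in ten million, yet the answers differ by roughly one part in a thousand, because the near-singular matrix amplifies the perturbation.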

    When was the last time you heard of a result being retracted because of a bug in the code?
    I won’t define bugs but I know one when I see it. When Microsoft issues a service pack, they have found bugs. When an open source project issues a security warning to update a package, they have found a bug. If teams of professional programmers, highly motivated to produce bug-free programs, still write code that is buggy, why do you think that you can write a rather large bug-free program? I certainly don’t think I can. There’s a saying from software engineering that says “there are bugs in every large program, you just haven’t found them yet”.

    klee12

  21. 121

    A lot of naysayers I’ve dealt with seem kind of hazy on the idea that there’s anything *besides* statistics in GCMs.

  22. 122
    Chris G says:

    A quick nod to Ray Ladbury #114, Rod B #116, Robin D Johnson #117, you guys (and others) have hit it well.

    So many times I’ve come into an old piece of code, read the comments, and discovered that the comments had little to do with the code anymore.

    I once knew a guy who was called into a problem in the software, looked at the output, and declared something to the effect that the only way this problem could happen was if register X of processor 13 of 16 was faulty. Awe ensued. It wasn’t even a problem in the millions of lines of multi-threaded code; it was a problem in the hardware.

    Sometimes readability and performance are at odds with each other. I once took a very simple function that was maybe 10 lines of code and rewrote it into something 3 times longer that even my teammates looked at and said,
    “Why the heck did you do this?”
    “It now runs 6 times faster, and it is used a lot.”

    Just seconding that looking at code without a high degree of expertise in both code and the domain the code was written for is generally a waste of time, not only for the viewer, but also for all the people they ask for help.

    Modularizing, eliminating redundancy, and imposing version control over a pre-existing, disparate library of climate code would be a daunting task. My hat is off to David Jones and company.

    Lawyers at my company would come into my cube, rip off my arms, and beat me with them if I released code into the public domain. Then security would come and escort me to the door. Maybe I should hope that security gets there first.

  23. 123

    I know it is not the right thread, but that piece on beetles and forests below this one is marvelous. It’s the kind of thing that blogs do so well….

  24. 124
    Didactylos says:

    klee12, you are still missing the point. Ray never implied that the code is bug-free, just that any bugs don’t affect the result.

    I believe it is relatively common for minor bugs to be found during replication. The number of times they have resulted in a retraction? Very small.

    Since this is exactly the point I made in a reply to you way back when, it seems you aren’t really taking in what is being said.

    “I am arguing that giving a description of an algorithm may not, in software engineering terms, be an adequate set of specifications”

    Then that is a flaw in the description. It’s not something that causes everything to collapse. The description gets fixed, we move on. And yes, this is a real situation that occurs during replication. Nobody is arguing that the “specification” is perfect – just that it is central. Nobody is arguing that any code is bug-free. Any non-trivial code inevitably contains bugs.

    This would be a lot easier if you didn’t argue against straw men.

  25. 125
    Rattus Norvegicus says:

    h/t to Robin w/respect to writing clear code.

    I worked on the Unix kernel and the NT kernel for about 15 years and some fairly complex applications (virtual filesystems and ill designed application code) for the last 10 years. None had much in the way of any comments, but all were reasonably clearly written. I’ve been writing code for very complex systems for a long time. Comments are, well, worthless.

    Pretty much the only time I write comments is to explain where I do something “tricky” (oh, that’s a bad word) or to make clear assumptions on internal routines. Sometimes this is done through asserts (to catch things I never expect) or comments to indicate the conditions under which I expect a routine to be called, although it will work in those circumstances in which it is not appropriate to call.

    The other situation in which I will write comments is to make clear the assumptions which were valid at the time the code was written indicating where code will have to be changed if the assumptions change. The vast majority of the code I write is uncommented, but I do try and make it readable. This approach is the best that you can do.
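
    A minimal sketch of that commenting style, in Python rather than C and with entirely hypothetical names (`mean_anomaly`, the None encoding of missing readings): the asserts catch conditions that are never expected, and the comments record only what the code cannot say for itself, namely the calling conditions and the assumptions that were valid when it was written.

```python
def mean_anomaly(temperatures, baseline):
    """Return the mean of `temperatures` minus `baseline`.

    Internal helper: callers are expected to have filtered out missing
    readings already. The arithmetic would still run on a list of plain
    numbers that includes placeholder values, but the result would be
    meaningless, so the asserts below catch the cases we never expect.
    """
    # Assumption valid at the time of writing: a missing reading is encoded
    # as None. If the encoding changes (say, to -999.0), this check has to
    # change with it.
    assert len(temperatures) > 0, "expected at least one reading"
    assert all(t is not None for t in temperatures), "filter missing values first"
    return sum(temperatures) / len(temperatures) - baseline


print(mean_anomaly([14.0, 15.0, 16.0], 14.0))
```

    Note that Python drops assert statements when run with the -O flag, so this style suits internal sanity checks, not validation of user input.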

    I’ve been working on complex systems which are mostly written by others for years — porting operating systems to new architectures for a long, long time. Readable code is your best friend. A good debugger is your next best friend. For the sort of stuff I am best at (OS work), being able to look at the dump provided by a logic analyzer is the next best thing (a good ICE which can read the symbolic output of your compiler or assembler is better). Lots of times you have to be able to read the output of your assembler or compiler and compare it with the LA dump to see what is wrong.

    However, you can do good clear programming even in assembler (though it does take lots of comments, especially for Intel architectures where you have a shortage of registers, an exception). I remember getting some questions on some cache control code I wrote for an adaptation of the MIPS R4600 to support a secondary cache. “Why don’t you follow the conventional calling conventions?”. I tried very hard to not pollute cache lines in the SC. Once I explained the reason (not polluting lines while trying to ascertain which lines were to be cleaned) the questions went away. This was fun code to write, because I used every trick in the book to speed it up. It was ugly code, but it worked. Have fun all you “programmers” out there pontificating.

    Notice I said nothing about comments. Currently I am doing server side web site programming and I comment much more, but only because I am making the assumption that people who might follow in my footsteps don’t know WTF they are doing and will take an approach to debugging like klee12. He obviously has never done Real Programming(tm).

    One of the things that I was proudest of, and which USL (Mashey knows who they are) did not accept was an interface which did auto configuration of the kernel at runtime for the System V ESMP kernel for the type of processor it was running on. This involved detection and loading at runtime of the proper driver for the MP architecture it was being run on.

    This involved abstract classes (written in C and assembler!) which made it very easy for users to install on a machine, and if the disk was moved to a new machine, it still worked — as long as it was an Intel machine, no way to tell which version of locore should be loaded. I realize now that I should have gotten a patent on that one, because we could have sued Microsoft a few years later, but didn’t realize that I had something there (too young to realize I had a good idea there).

    As has been pointed out in other posts what I like to call “pessimizing” should not be done (it makes the code less readable) because modern compilers are pretty good at optimizing and code movement which makes stuff faster. Writing fast code should wait until a routine is identified as a sinner and not prior to that.

    Writing real code is far removed from something like the Matlab scripts which Mike Mann currently releases with his papers (and he has been doing it for a long time). Two of the three most important climate models (CCSM and GISS) are available, in source, on the interwebs. Scripts, such as Mann and others release, are trivial and are well described in the papers. I don’t understand why this stuff should be subject to the same rigorous software engineering standards that so many of the people calling for “openness” seem to think that they should be subjected to. Oh, I’m sorry, Matlab isn’t open. Well, if you can’t afford a Matlab license (and there are open Matlab interpreters out there) get lost. You just aren’t serious.

  26. 126
    Didactylos says:

    I find some of the discussion about code comments to be quite extraordinarily scary. I strongly suspect that many who claim that they don’t need comments are (at least in part) using it as an excuse for their own poor commenting practices.

    Now, what you do in the privacy of your own home is your business. If nobody else will see your code, then do what you like. The only person who will be puzzling over your code later is yourself.

    But for shared code, comments are important. If you think otherwise, then perhaps you don’t know how to write good, useful comments yourself, and it’s a fact that good, useful comments are rarely to be found in any code, open source or commercial.

    So for the love of all that is holy, please stop using the existence of useless comments as an excuse for perpetuating the problem. Write comments that will actually save you time, save your colleagues time, and add to the value of the code. Comments that say things that are *not* immediately apparent from the code itself.

    I’m not holding myself up as an example. My comments range from good to bad to awful.

    Rattus Norvegicus: We all admire Real Programmers, and we all pray we will never, ever have to wade through the undocumented monstrosity they pretend is code. Job security is nice, but….

  27. 127
    klee12 says:

    Didactylos @124 wrote

    I believe it is relatively common for minor bugs to be found during replication. The number of times they have resulted in a retraction? Very small.

    The probability of a significant bug may be small, but they should still be exposed. I suspect that the probability of a significant bug is at least as large as that in large applications released to the public, and some of those applications have reported security bugs.

    Rattus Norvegicus @125 wrote

    people who might follow in my footsteps don’t know WTF they are doing and will take an approach to debugging like klee12.

    In my post @108 I wrote

    Let me explain why documentation is not important for me.

    I think it was understood that I was speaking of scientific code. I wrote not only to show why documentation is not important but also to offer scientists a way to approach porting and understanding code.

    Readable code is your best friend. A good debugger is your next best friend. For the sort of stuff I am best at (OS work), being able to look at the dump provided by a logic analyzer is the next best thing (a good ICE which can read the symbolic output of your compiler or assembler is better).

    I’m not sure this is relevant. Of course readable code is your best friend but this is something you have no control over when trying to understand codes written by other programmers. Before I use the debugger I try to narrow down the area of code that I might have problems with. I do this with print statements, looking at structure of code. I don’t think dumps and ICE are relevant or useful to the average scientific programmer.

    Currently I am doing server side web site programming and I comment much more, but only because I am making the assumption that people who might follow in my footsteps don’t know WTF they are doing and will take an approach to debugging like klee12.

    I have used my approach and found it successful. What approach would you take with large scientific programs?

    He obviously has never done Real Programming(tm).

    Not sure what you consider to be Real Programming, but it’s not relevant to my posts here.

    klee12

  28. 128
    Rattus Norvegicus says:

    Didactylos,

    I think we are in violent agreement. Code should be clear. Comments should make things which are not apparent from the code clear (like I said, tricks, assumptions, conditions in which the code should be called…). Redundant comments are pretty useless and can be worse than useless if poorly written or not maintained, and I’ve seen my share (and probably written some) of those too.

  29. 129
    Thomas says:

    klee12, I think security bugs are a different class of bugs than normal software bugs. I would bet that most security bugs are bugs of omission: not considering the possibility of some mode of attack. Security versus hackers is more like an arms race; the software developer has to outsmart the attackers. Normal scientific software is designed for “friendly” users. Now I don’t for a minute believe most scientific software is bug free; a lot of bugs cause too small an effect on the most common solutions to be noticed. Often it is when some input tries to use the code in an unanticipated way that they are revealed. Also there are a lot of bugs which result in using the wrong data somewhere in the code. Usually as long as the bogus values are innocuous enough, the small errors in the final solution are not noticed. But a fresh compile may scramble the order of memory usage, and suddenly the code crashes on input it used to appear to process fine. Over time, more and more such bugs are flushed out. Ideally we would have really good tools available to flush out such things, but in practice that is rarely doable.

  30. 130
    Didactylos says:

    Rattus Norvegicus: your code doesn’t look like line noise? I’m beginning to doubt your Real Programmer credentials.

    klee12: I take it you’re not familiar with the story of Mel?

  31. 131
    Steve Metzler says:

    Guys, you’re way OT here. Yeah, I know it’s a natural thing to want to expound on your field of expertise when given the slightest opportunity to do so. My natural inclination as well.

    But please, this level of discussion belongs on slashdot. Or go read the latest comments on TDWTF if you want to bitch about the subtleties of code. There be primarily climatologists here, and code is not even secondary to what their core concerns are. It’s arguably not even tertiary. Which is what the original article is all about, actually.

    Next, someone’s going to launch a diatribe about how bad Java is. And I just couldn’t take that.

    /concern troll

  32. 132
    Robin D Johnson says:

    Re #126

    I stand by my statement that comments are useless and dangerous. All code is dangerous to change without understanding the entire context. Modifying non-trivial code is essentially equivalent to writing that code from scratch plus understanding all the code that depends on the code that will be changed. This is why most large software projects are done using an object oriented language like C++ or Java or C#. The ability to extend, encapsulate and handle exceptions is critical to the long term health of large codebases. This is a well-worn area of software engineering.

  33. 133
    Rattus Norvegicus says:

    No it doesn’t look like line noise, but like Mel, I started out writing in machine code. The first programming class I ever took was in high school and involved the use of a Wang “desktop” programmable calculator. They called it a desktop because it was, well, the size of a desktop. We had an optical card reader and the way you wrote programs for this thing was to use a #2 pencil to write the 8-bit opcodes in binary. Had a bug? Dump the program you loaded and look at the assembly output, then examine the logic, find the error and go back and write a new card, or sometimes the remainder of the deck to fix the bug, because erasing the opcode and replacing it with the correct one often caused problems for the card reader. I was pretty proud of the fact that I could get the thing to play tic-tac-toe with some alacrity. Who needs an assembler! This was in the early 70′s and, thankfully, things have gotten better.

    I have written code which needed an oscilloscope to debug (it supplied a software generated synthetic clock signal to a chip which couldn’t use the bus clock) and in the mid to late 90′s that counted as real programming. Having to use an LA to debug code during the bring up of a new computer also counts, in my mind at least since you had to read hex, know the opcodes and understand what the contents of memory should be. I do have to say that once the last bug was out of the system and you saw “login:” on the screen, typed “root”, then “password” and hit return and got a “#” prompt gave you a feeling like nothing I have ever experienced in any other type of programming. It was fun!

    I have also written lots of code in assembler for the MIPS chips which made heavy use of the delay slot to load something useful for the target instruction, a feature of the chip that the compilers usually ignored at that time. Of course, almost every line of these routines was commented and subroutine block comments contained the pseudo code (c’ish in this case) which laid out the logic of the routine. God knows it is hard enough to read your own assembly, much less someone else’s.

    So no, in the sense of Mel, I am not a real programmer. I have never written self modifying code or code which was optimized for the spin of a drum. I would hope that I never have to write code like that, although I understand that writing for the Alpha might have entailed a more modern version of such nightmares.

    But this is all neither here nor there with respect to the subject of the post. The most complex climate code seems to be reasonably well written. Steve Easterbrook has done a lot of work on this. What I fail to see is the need to publish scripts which hold to the highest standards of software engineering practice when most of the code you use every day honors these standards in the breach. A couple of hundred lines of script just aren’t a big thing, and if you can’t read and understand them with under a day of work, I have to wonder why you are asking to see the code in the first place, because you Just. Can’t. Read. Code. and having the code won’t help you understand the paper one whit.

    So while publishing the code is good or at least good publicity, it is of limited scientific value. And while I sympathize with the idea of an open repository of library code, for the most part this doesn’t seem to be really necessary, given that most statistical analysis code is written in Matlab or R these days. Finally, the Nature news article linked in this post contains a cautionary tale about sharing code. The word “retraction” comes to mind.

  34. 134
    Didactylos says:

    Steve: you are quite right.

    Robin: you managed to be absolutely wrong and absolutely right in the same comment. Well done.

    Rattus: I worship at the feet of Dijkstra, and quiche tastes wonderful when made right. But assembly is important too, and I have little respect for programmers with no clue what goes on under the hood.

    I got the impression that any “open repository” would have a much larger scope than just bits and bobs of script. I got the impression from Gavin that the primary purpose would be archival – tagged versions that can be referenced in papers with zero ambiguity. Merely having everything in one place can be an enormous help – look at CPAN.

    Our dear hosts: is this OT for Unforced Variations, too, or should we just stop now?

  35. 135
    Steve Metzler says:

    Robin D Johnson (#132) says:

    I stand by my statement that comments are useless and dangerous.

    Sorry, I vehemently disagree. If a person writes page after page of code without any comments, to me that shows a callous disregard for those that have to maintain the stuff later on. I’m a stickler for good commenting, and it even helps when you come across a problem in your own code a few months (or even years) later. Well written comments tell a third party what you are trying to accomplish. Useful pseudo code, if you wish.

    OTOH, if the comments don’t agree with the code (you change the code but don’t update the comments), then that is bad. Shame on you for not being *professional*.

    As for the rest of your post after that opening salvo… I agree with what you say 100%, Hah! Bet you didn’t expect that :-)

  36. 136
    klee12 says:

    Didactylos @130


    I take it you’re not familiar with the story of Mel?

    Oh my gosh … I had forgotten about Real Programmers.

    Sorry if I got a little testy

    klee12

  37. 137
    klee12 says:

    Thomas wrote

    I think security bugs are a different class of bugs than normal software bugs.

    Well, OK. I mentioned them because I am most concerned about those kinds of bugs, but there were others. In the early days, the Blue Screen of Death on Microsoft products would indicate a bug. If you define a crash of an application (rather than a graceful exit that tries to preserve data) as a bug, certainly many open source applications have bugs. I guess I am just more pessimistic about significant bugs in climate science than others. When I worked at Ames Research Center we tested the results of our code by building physical models of planes and putting them in the wind tunnel (now they build the plane and fly it with instruments). AFAIK there really isn’t a test like that in climate science.

    On the subject of a climate science repository, at the Ames Research Center we used a numerical analysis library called NAG. It was very well documented and you could buy a book (IIRC, Numerical Recipes) that described the algorithms. A repository made up of voluntary contributions has the following defects, IMHO, that supported packages are not supposed to have.

    1. Uncertain quality control. Someone has to screen for bad code.

    2. Uncertain standards, especially on exceptions and documentation. On divide by zero, do you get a core dump or an informative message?

    3. Uncertain support. If I had a problem, who do I call? What if he/she has died?
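
    On the second point, the difference between a bare failure and an informative one can be sketched as follows. This is only an illustration under assumed names (`fraction_of_total` and its message are hypothetical, not taken from NAG or any other library):

```python
def fraction_of_total(part, total):
    """Return part / total, failing with an explanatory message if total is zero."""
    if total == 0:
        # Without this check Python would raise a bare ZeroDivisionError;
        # compiled code might produce inf, NaN, or a crash instead.
        raise ValueError(
            f"fraction_of_total: total is zero (part={part!r}); "
            "check that the input data set is not empty"
        )
    return part / total


print(fraction_of_total(1, 4))
try:
    fraction_of_total(1, 0)
except ValueError as err:
    print("informative failure:", err)
```

    A repository standard would have to say which of these behaviours contributed code must provide.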

    klee12

  38. 138

    klee 137: AFAIK there really isn’t a test like that in climate science.

    BPL: Look again.

    http://BartonPaulLevenson.com/ModelsReliable.html

  39. 139
    Steve Metzler says:

    OMG, I finally got a chance to read the story of Mel that Didactylos linked to in #130. That is so precious.

    I’m going to have to turn in my Real Programmer Badge™ now. I’m not worthy. Oh wait… I used to put bootstrap instructions into a PDP-11 using the 16 binary switches on the front panel. I’m still in there with a chance.

  40. 140
    Robin D Johnson says:

    Re #135 – C’mon. This isn’t SlashDot – which I avoid like the plague because I can’t stand Flame Wars. Accusing me of being unprofessional is pretty silly – especially since it seems you didn’t read my previous post #117 for appropriate context nor have you seen the 2.5 million lines of code my team has produced – so how would you know? Well-written code DOES speak for itself. I’ve worked with tons of open source software (openssl, Apache, curl, STL, etc) and routinely have to read OS code headers/source (sockets, pthreads, file i/o, etc) to understand what they are doing and their comments were useless (and it wasn’t particularly well-written either but that’s another story).

  41. 141
    klee12 says:

    Barton Paul Levenson wrote

    klee 137: AFAIK there really isn’t a test like that in climate science.

    BPL: Look again…

    Well it depends on what type of test one is talking about. In klee@137 I wrote

    When I worked at Ames Research Center we tested the results of our code by building physical models of planes and putting them in the wind tunnel (now they build the plane and fly it with instruments). AFAIK there really isn’t a test like that in climate science.

    I was talking about controlled tests that were reproducible (if you had a wind tunnel). I was talking about a high level of certainty, like Newton’s law (under certain conditions ignoring relativity). Building models which correctly predict the near future does not cut it for this level of certainty. Maybe the sun’s irradiance was exceptionally strong or weak, maybe there were exceptionally long range internal cycles of something that worked for a hundred years and which may reverse. You can’t rule out other factors, some of which may be yet unknown, that might have caused agreement between prediction and result. In view of the possibility of confounding effects, the level of certainty is not as high for me as it is for other scientists. How certain you feel is subjective of course. In the case of airplane design, the number of factors (pressure, temperature, whatever) is small and can be taken into account in the CFD models. Because of that my certainty about CFD applied to airplane design is much greater than my certainty about climate science models. But that doesn’t mean climate models should be ignored; IMHO, their results should still influence real world decisions.

    Economics is another area where my level of certainty is fairly low. Economists have elaborate models that seem reasonable and usually they make correct predictions. And the models influence my real world decisions (i.e. my investing). But I don’t have as high a level of faith in them as in CFD models of airplanes.

    klee12

  42. 142
    Ray Ladbury says:

    Klee12,
    OK, so let’s say you are living on Mt. Merapi, and the government seismologists tell you it’s going to erupt. Do you leave? After all, you can’t do repeatable experiments.

    Or you want to know if it’s safe to dump toxic substances in a region that may flow into an aquifer. Again, no repeatable experiments.

    And the same for much of biology, medicine, ecology, astrophysics, cosmology… Do you think that maybe your lack of confidence in these fields might have to do with your lack of understanding rather than with whether they lack sufficient evidence for confidence in their results?

  43. 143
    Damien says:

    But code “documentation” is largely useless or dangerous since it is rarely correct. Comments are particularly dangerous. The code needs to speak for itself.

    Comments *are* useful to a point. I have seen beautiful code without comments and ugly code exquisitely commented. For example:

    /* See Farrell p358 Equation 10.36 */

    Is extremely helpful for anyone reading the code later on.

    Where the comments do not match the code, it is an excellent indicator that both are wrong.
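
    One way to make such mismatches visible automatically is to put small executable examples in the documentation itself. The sketch below uses Python’s standard doctest module; the two functions and the helper `check_docstring` are hypothetical, made up purely for illustration.

```python
import doctest


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin.

    >>> celsius_to_kelvin(0.0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    return temp_c + 273.15


def stale_example(x):
    """Return twice x (a stale claim: the body was later changed to triple x).

    >>> stale_example(2)
    4
    """
    return 3 * x


def check_docstring(func):
    """Run the examples in func's docstring; return (failures, tries)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass globs explicitly so the examples can see the function under test.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        runner.run(test)
    return runner.failures, runner.tries


print(check_docstring(celsius_to_kelvin))
print(check_docstring(stale_example))
```

    The first docstring agrees with its code and passes; the second has drifted from the code and is reported as a failure, which is the signal that one or both need attention.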

  44. 144

    I particularly liked the Zeeya Merali article, which I saw before this one. I’ve already added a comment over there. The important take-away message is that for all the real flaws in climate science (nobody’s perfect), it is way ahead of biology (where I started doing research a couple of years back from a computer science background) and probably most other areas of applied science in terms of data and code quality. Sure, we all (that is those who actually studied computer science) would prefer that James Hansen didn’t use FORTRAN, but at least his code is published and available to anyone to check.

    Everyone who makes claims about the state of climate science as if it’s anomalously bad should be pointed at the Merali article.

  45. 145

    klee 141: I was talking about controlled tests that were reproducible (if you had a wind tunnel). I was talking about a high level of certainty, like Newton’s law (under certain conditions ignoring relativity). Building models which correctly predict the near future does not cut it for this level of certainty.

    BPL: Which part of “the models correctly made seventeen predictions which panned out” did you not understand? By any reasonable standard, as opposed to your standard, those models are vindicated. Period.

  46. 146

    #120 klee12

    If I might attempt a redirect for the sake of context. And mind you I’m not a programmer, though I have done a bit of schema design for software.

    Yes, bugs exist.

    But isn’t it really about the results?

    The main issue that is tossed about in the denial of human caused global warming is that the models are wrong and we are not warming, or that it is not human caused.

    Now, let’s set aside the models and start from scratch for a second.

    - We see that the Arctic ice seems to be losing volume.
    - We see the glaciers seem to be losing ice mass at what looks, from eyeballing, to be an accelerated rate.
    - We see that the frequency of stronger hurricanes seems to be increasing.
    - We see the sea level rate seems to be rising and recently accelerated its rise rate.
    - We see a lot of things that are changing in the biosystems.

    Why?

    Now, from the basic understanding of how GHG’s work, they block infrared. So we might assume that more GHG’s means more warming, and less would mean less warming.

    There’s also that thing about interglacial cycles and Milankovitch.

    Well, we know that we have increased atmospheric concentrations of GHGs and that information is empirical.

    Now, models.

    Lots of different disciplines have looked at global warming. They have used different methods, and models, and code. They all seem to be showing generally the same results.

    I will go back to what Scott Mandia said:

    Consider the odds that various international scientists using quite different data and quite different data analysis techniques can all be wrong in the same way.  What are the odds that a hockey stick is always the shape of the wrong answer?

    So while the models are likely wrong due to a number of reasons that would take too long to illustrate, they are right enough in showing replicable results from multiple disciplines.

    Making models better is great. Open code is great. But what is really important is that we recognize that we are warming, it is human caused, and we need to take meaningful action if we wish to avoid unpleasant results in the overall economy.


  47. 147
    MX says:

    Is it possible to point to any studies which have been performed using open / shared code and data sets that demonstrate that the majority of climate change simulations are incorrect in their forward projections over the coming decades and that the conclusions of the last UN Assessment Report are thereby unreliable or untenable?

    In those cases where researchers have demonstrated that the shared code is based on less-than-adequate precepts, have those researchers provided an open critique of the code and made openly available their new models and simulation systems? Do those new models and simulations explain the temperature data sets?

    Where is the ‘best’ simulator currently provided openly by the climate change skeptical research community?

  48. 148

    Gavin, Eric,

    You made a strong case as to why simple repetition isn’t half as important as independent replication, at least to the science.

    But to the suspicious public (and those susceptible to suspicion) simple repetition apparently is very important. Like it or dislike it, that seems to be the current reality, and I think that is why many supporters of science are also calling for the kind of openness that allows simple repetition.

    Hey, at least the time that people spend repeating what you did, they don’t spend that time yelling at you!

  49. 149
    Ray Ladbury says:

    Bart,
    Those calling for repetition base their calls on paranoid delusions of a vast scientific conspiracy. They will not be satisfied regardless of the evidence. Scientists should not cater to idiots.

    Outside efforts to make the code work–and even to improve it–are great. Scientists should stick to doing the science rather than changing the mental diapers of the denialists.

  50. 150
    Richard Simons says:

    Bart:

    Hey, at least the time that people spend repeating what you did, they don’t spend that time yelling at you!

    I have not been following the antics of denialists very closely, but have any actually repeated or independently done a serious analysis of the data?

