Where’s the data?

27 Nov 2009 by group

Much of the discussion in recent days has been motivated by the idea that climate science is somehow unfairly restricting access to raw data upon which scientific conclusions are based. This is a powerful meme and one that has clear resonance far beyond the people who are actually interested in analysing data themselves. However, many of the people raising this issue are not aware of what and how much data is actually available.

Therefore, we have set up a page of data links to sources of temperature and other climate data, codes to process it, model outputs, model codes, reconstructions, paleo-records, the codes involved in reconstructions etc. We have made a start on this on a new Data Sources page, but if anyone has other links that we’ve missed, note them in the comments and we’ll update accordingly.

The climate science community fully understands how important it is that data sources are made as open and transparent as possible, for research purposes as well as for other interested parties, and is actively working to increase accessibility and usability of the data. We encourage people to investigate the various graphical portals to get a feel for the data and what can be done with it. The providers of these online resources are very interested in getting feedback on any of these sites and so don’t hesitate to contact them if you want to see improvements.

Update: Big thank you to all for all the additional links given below. Keep them coming!

407 Responses to "Where’s the data?"

Matthew says

29 Nov 2009 at 11:18 AM

I am glad to read of the ccc-gistemp project, Alan Barnes above.

For an example of what is required for adequate disclosure, check out the paper by Mann et al in the current Science magazine. The supporting on line material comes to 22 MB, data and source code. It can be downloaded by any member of AAAS. For example, I downloaded it. Yes, this slows down the review process, but it makes the result more reliable, and it is required as a condition of publication by an increasing number of journals.
L. David Cooke says

29 Nov 2009 at 11:22 AM

Hey, Dr. Schmidt,

You may want to add in the SkyRad data sets that were available at arm.gov…

Cheers!
Dave Cooke
Matthew says

29 Nov 2009 at 11:40 AM

deech56: Now it so happens that I have experience with both patent applications and FDA submissions. My patents were based on material that I had published, and except for making sure I had everything in my (company-owned) notebooks in case of challenge, there wasn’t anything I put in the application that wasn’t found in my publications.

When I wrote reports in support of intellectual property, I included the source code. When I published papers, I created directories that had the source code, complete unedited output, and the data used (with a reference to the data that they had been selected from), and wrote the directory to a backup CD. The programs included references to the sources of the equations if I hadn’t thought of them myself. That was in case it was requested. Even you had supporting material in case of challenge.

As you note, there is no equivalent of the FDA in climate science. That’s why full public disclosure, exemplified by the Mann et al paper in the current Science, is necessary. It is the process of intense scrutiny of everything by skeptics that gives science its strength. This is one reason why full disclosure is increasingly required as a condition of receiving federal grant money.
Hank Roberts says

29 Nov 2009 at 11:49 AM

CER wrote:
> they didn’t pay for what is free, they paid
> for the 2% that’s not free.

CER, you misunderstood: some countries sell their meteorological data to businesses; the same data is shared with researchers who agree not to give it away — so they don’t undercut the market for selling it to businesses.

Businesses buy such country weather records to do business in that country; knowing growing season, rainy season, things like that affect business plans.

> if they are that good at the science
> they will be in jobs doing the science.

Or really, really wish they were: http://www.xkcd.com/664/
I watched a really clever Excel expert fall out of her chair laughing upon seeing that particular one.
caerbannog says

29 Nov 2009 at 11:56 AM

A forum for crank research? Yeah, there’s a crying need on the web for that.

Actually, I don’t expect to see *anything* like that from the skeptics here. Even “crank” research would be too much work for those guys.

However, a link to a blank page listing all research proposals/results from the folks who have been demanding “more data” might be a useful rhetorical tool. ;)
Jeffrey Davis says

29 Nov 2009 at 11:57 AM

“All we need is the ability to do

c:>runAnalysis.exe input.dat output.dat”

Well, no. That says nothing except that the code, given A, produces B. Who knows what that means? At a certain point, and it’s reached early on in the scientific process, a degree of literacy in the field is needed. The code is an abstraction, a facsimile, of the process described in the study. It isn’t the process.

I wouldn’t presume to any interest in the code and data resources being assembled here since I wouldn’t know if the data were apt for the programs or if the processes studied were apt for the science.

There’s a huge element of bunkum in the request for data and code. If you were a scientist, you’d know where the data were and you would know how to code to test the hypothesis in the study. You’d know roughly if the hypothesis described was worth the effort to study and if the experiment was properly designed to test its hypothesis. The code is more akin to an illustration than substance.

It’s a lot like a dog chasing a school bus except the dog knows he doesn’t know how to drive.
Hank Roberts says

29 Nov 2009 at 12:11 PM

Ray Ladbury says: 29 November 2009 at 8:47 AM
> Forrest Mims … Gavin … sparing us the screed

NOT

Ray, you were confused on two points there.

Forrest Mims posted in the thread early on; the full text is there.
Gavin did nothing to what Forrest Mims posted. It’s all there.

“ccpo” quoted _part_ and said he/she had “deleted” [WRONG WORD] ‘a screed’
Hank Roberts says

29 Nov 2009 at 12:14 PM

(testing the spamfilter)

—> omitted, elided, left out <— [RIGHT WORD for omitting something]
Henrik Nerbo says

29 Nov 2009 at 12:24 PM

I’m getting really tired of these deniers. It’s as if a global conspiracy engulfed the world. They are simply conspiracy theorists!

And they also hide themselves. If you truly believe in this conspiracy why don’t you show yourselves by name? I wish there was a badge these denialists were forced to wear on their breast!! Like a bright yellow fluorescent D, for denialist, or something.
The Bert Man says

29 Nov 2009 at 12:46 PM

If I pay taxes, then my meteorological services at all levels should provide me with the data for free as a citizen.
David says

29 Nov 2009 at 1:06 PM

Hank, BPL, Keven,

Thanks for the explanations you’ve given – so basically, the ‘missing’ ocean heat was never really missing and only thought to be missing due to bad data that’s since been corrected. Is that right?

Mark A. York, since you inquired as to why I asked you the question, it was to see if you knew as much as you presented yourself as knowing. Since you felt the need to smear the editor of the article to cover for not being aware of the ‘missing heat’ a puzzle which, as you note, made the headlines, you might want to go back and see the scientists he quoted and the substantial portion of the article that consisted of direct quotes from those scientists. At the time the article was written, it was genuinely a puzzle. It was good reporting by the NPR.
donQ says

29 Nov 2009 at 1:11 PM

David (others), GISTEMP code doesn’t do much, regardless of its faulty structure and implementation. (Harry, or whoever, might very well go crazy trying to fix this code.)
Out of all the people here who have downloaded it, I doubt there is even one who has managed to run this software to its completion; I doubt that even Gavin is using it for anything. Then, after it has been run, I don’t see how to verify what the output is.

It is great that someone is working (from scratch) to make things better, presumably rewriting the whole thing.

@ Christopher Hogan,
in GISTEMP_sources there isn’t a single html file and the code is not well documented. I’m not sure what you are looking at, can you provide a link?
Petro says

29 Nov 2009 at 1:31 PM

If I pay taxes, my intelligence agencies at all levels should provide me with the data as a citizen.
Mark A. York says

29 Nov 2009 at 1:33 PM

“Since you felt the need to smear the editor of the article”

David, what’s the chip on your shoulder about? I have no idea what you are talking about. I know what I know from study. It’s not all-encompassing though.
Chris Auld says

29 Nov 2009 at 1:48 PM

Gavin: We discussed this above. The replication that is required in an observational science like climate is the replication of the conclusion.

I am baffled by the claims that making code available is neither standard practice nor desirable. In the social and life sciences, anyways, it is standard practice at many journals to require that code be made publicly available, and hopefully this standard will become universal in the next few years. There are several high-profile examples of coding errors that have been caught only because authors made code available. Making code available also makes it completely transparent what steps were taken to clean or otherwise process data, and similarly clarifies the statistical analysis.

It isn’t that either “replication of the conclusion” by recreating a study from scratch is important or verification is important, it’s rather that both are important. As far as I am aware, that is not a controversial opinion in any quantitative discipline.

[Response: I never said it wasn’t desirable. The maximum openness possible is certainly a goal worth striving for. However, you may be mistaken in thinking that a) it will change many of the skeptics’ minds or b) reduce the number of attacks on climate scientists. Nor will it make much difference (if any) to the robustness of conclusions. It might reduce the spin-up time for new people into the field, but access to code is not a substitute for knowing what to do with it. You will still learn much more by doing something yourself than in starting off with someone else’s code. Much to many people’s disappointment, global warming is not caused by an arithmetical error. Just ask the glaciers. – gavin]
Hank Roberts says

29 Nov 2009 at 1:50 PM

Petro says: 29 November 2009 at 1:31 PM
> If I pay taxes, my intelligence agencies at all levels
> should provide me with the data as a citizen.

Chuckle. Let us know, when you ask them to deliver it, how that works out.
Hank Roberts says

29 Nov 2009 at 2:03 PM

David says: 29 November 2009 at 1:06 PM

> the ‘missing’ ocean heat was … due to bad data that’s
> since been corrected. Is that right?
…
> Mark York … I asked … to see if you knew as much …
> the need to smear the editor of the article

Follow the science. There’s no single point at which it’s “right.”
Posting faux-naive questions with outdated information to try to test your fellow reader wastes everyone’s time who bothers to try to be helpful here.

You’re using a common name as a userid — whoever you are, people won’t know if the next “David” along who asks a naive question is from you again, playing games, or from someone with an honest question.

I think I need more coffee. I’m grumpy this morning.
Anne van der Bom says

29 Nov 2009 at 2:04 PM

The Bert Man
29 November 2009 at 12:46 PM

I pay all my taxes. An extract of the population register is not free. Requesting MY OWN name & address costs money!

I do not see any harm in letting the people who use the data pay for it. After all, most is requested by companies that make money off the data. Why should government hand it out for free so others can make money off tax-payer funded data?

Damned if you do, damned if you don’t.
Joel Shore says

29 Nov 2009 at 2:15 PM

Chris Auld (#216) says: “I am baffled by the claims that making code available is neither standard practice nor desirable. In the social and life sciences, anyways, it is standard practice at many journals to require that code be made publicly available, and hopefully this standard will become universal in the next few years.”

In the areas of physics that I have worked in, it is quite unusual to make the code publicly available. In fact, I have published papers where my corporate employer would never allow me to make the code publicly available (nor even give the code to a specific person who asked for it). Certainly one effect of a journal requiring one to make code publicly available would be a severe drop-off in submissions from people who work in corporate R&D labs, which may not be too relevant for climate science where a fairly small fraction of the work is done in the corporate world but would be for other fields. There would likely be a huge drop-off in submissions for journals like Applied Physics Letters.
Frank Tuijnman says

29 Nov 2009 at 2:27 PM

In a lecture (Sept 2009) at the University of Strasbourg, Vincent Courtillot (director of the Geophysics Institute of Paris) mentions that he has asked CRU for several years for the data underlying the error estimate of the air temperatures above the ocean in the period 1850 – today. He claims that this data was never given to him, despite repeated requests. Can someone point out where on the Internet he can find this data?

The lecture (in French) can be watched on dailymotion.com. The particular claim is between minute 13 and 15 in the first part (of six).

[Response: It would almost certainly come from the ICOADS data (which is the collection of all the temperature measurements by ships and buoys). This is processed by the Hadley Centre to produce the HadISST gridded product, and their papers would be the ones to read on how the uncertainties are estimated. In neither case has this anything to do with CRU. – gavin]
David Wright says

29 Nov 2009 at 2:48 PM

“214.If I pay taxes, my intelligence agencies at all levels should provide me with the data as a citizen.”

Are met workers in danger of having their covers blown, losing their lives over a disclosure?
cer says

29 Nov 2009 at 3:11 PM

Hank Roberts wrote:

CER, you misunderstood: some countries sell their meteorological data to businesses; the same data is shared with researchers who agree not to give it away — so they don’t undercut the market for selling it to businesses.

Sorry, I should’ve been more specific – I wasn’t sure if the extra data CRU used was only from met offices or also from private/corporate sources. If CRU got it for free then I guess it was the former. Certainly when we did a regional study on Siberia we obtained a dataset from some Russian source, which we definitely had to pay for (but then it was partly for use with an industry project so I guess it didn’t qualify as purely academic purposes).

Chris Auld wrote:

I am baffled by the claims that making code available is neither standard practice nor desirable.

Well I can’t claim to be expert on every field of science, but it’s definitely not just climate science where making code available isn’t standard practice. I used to work in a particle physics lab and certainly there was no “open-source code” flying around. It worked exactly the same way as climate science – if you wanted to replicate someone else’s work, you read the relevant publications, understood their methodology and then wrote your own code to do the same thing. In fact my first supervisor wouldn’t let me have a copy of his own code at first, because he wanted me to check that his method was working correctly, so he had me write my own *completely independent* code to do the same calculation, because that’s the only way of really being sure the results are sound. No amount of “verification” or bug-fixing can provide the level of validation that a new, separate analysis can.
Chris Auld says

29 Nov 2009 at 3:13 PM

RE: post #215: Gavin, I agree with you that making code available is unlikely to reduce baseless attacks, substitute for substantive knowledge, or overthrow results that have been arrived at from multiple research strategies.

But I disagree that making code available has little or no effect on the robustness of particular results—as I noted, there are famous examples of specific results in the primary literature which were artifacts of coding errors. Making code available implies complete transparency in methods, allows others to catch coding errors, and perhaps reduces the frequency of such errors in the first place.

For these reasons many journals and societies in the observational and other sciences either require or encourage authors to make code available. I think you underestimate how important it is to make code available in general, not just in the context of climate research.
Ron Broberg says

29 Nov 2009 at 3:22 PM

@donQ#212: Out of all the people here who have downloaded it, I doubt there is even one who has managed to run this software to its completion…

I have just today run STEPS 0, 1, 2, 3, and 5 on an Ubuntu 9.10 x86 laptop.

Step 4 is an optional step where surface data is combined with sea surface data. But the sea surface data is BIGENDIAN, and requires special compiler flags to run on linux/x86 computers. I hope to crack that nut soon.
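
The byte-order issue can be illustrated with a minimal Python sketch. It is not part of GISTEMP; the sample values and the 4-byte float layout are assumptions chosen purely for illustration. The same bytes decode to different numbers depending on the byte order assumed by the reader, so data written on a big-endian machine needs either an explicit conversion in the reading code or, for Fortran unformatted reads, a compiler option that performs the conversion (gfortran's `-fconvert=big-endian` is one example).

```python
import struct

# Hypothetical sample values standing in for sea surface data.
values = [1.5, -2.25, 300.0]

# Pack the values as 4-byte big-endian floats, as a big-endian machine would write them.
raw = struct.pack(">3f", *values)

# Reading with the matching byte order recovers the original values.
as_big = struct.unpack(">3f", raw)

# Reading the same bytes as little-endian (the native x86 order) gives wrong numbers.
as_little = struct.unpack("<3f", raw)

print(as_big)
print(as_little)
```

The point of the sketch is only that byte order is a property of how the bytes are interpreted; whether that interpretation is chosen in the source code or by a compiler flag depends on the language and on how the file is read.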

After generating my own output, I have compared it with the official output of the file located here:
http://data.giss.nasa.gov/gistemp/tabledata/GLB.Ts.txt

There are some differences in the last significant digit. I will explore these as time is available. This is a hobby. I suspect that some of the differences have to do with variations in precision in various compilers on different processors.
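
A comparison of this kind can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure actually used here: it assumes both series have already been parsed into integers expressed in units of the last printed digit, and the function name and sample numbers are hypothetical.

```python
def last_digit_mismatches(mine, official, max_units=1):
    """Return (index, mine, official) for pairs differing by more than max_units.

    Both inputs are assumed to be integers in units of the last printed digit,
    so a difference of 1 is a one-unit disagreement in that digit.
    """
    if len(mine) != len(official):
        raise ValueError("series have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(mine, official))
        if abs(a - b) > max_units
    ]


# Hypothetical values, not real GISTEMP output.
mine = [-12, 5, 31, 47]
official = [-12, 6, 31, 50]

mismatches = last_digit_mismatches(mine, official)
print(mismatches)
```

Differences of one unit are tolerated as plausible rounding effects; anything larger is listed for closer inspection.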

DonQ, don’t generalize from your own lack of skills. I am nowhere near the first person to accomplish this. There are more than a few similar successes reported on various blogs. I am working on creating a tarball and instructions so that anyone capable of running a Linux workstation can duplicate this “feat.”
David B. Benson says

29 Nov 2009 at 3:28 PM

Benjamin (176) — Huh? I’m not even sure what “sockpuppetry” is…
Brian Dodge says

29 Nov 2009 at 4:00 PM

** I’m in the top ten percent of wage earners (in my country -USA), and we pay more taxes than the lower 90% combined, and I demand that access to the data, and lawmakers, and government regulators, and university researchers, and other policy makers be apportioned according to how much we pay in to the system. Now that I think about it, I already get that. **
** = air quotes. I just made most of that up, and explicitly included “air quotes” and this explanation for the potential parody-challenged readers. I did make a 5 figure donation to fund an ongoing fellowship at a local University, and I get a Christmas card from the dean each year; he either answers or returns my occasional phone call. I once used a free “student version” of commercial modeling software, and wanted to include some materials properties that weren’t in the free version’s data file. I wrote the company asking if the material properties were available to me, and politely explained what I was using it for – basically farting around for my own edification and enjoyment – and that I wasn’t interested in and didn’t need their full, commercially valuable, material database. I received a polite reply that the only data that could be supplied to users with a student license was included with the free download. They also noted that the files were updated regularly, and perhaps I should download the latest free student version of the material data file, and gave me the link to do so. When I downloaded the (3 day) newer version of the material property file, I found, **no doubt entirely by coincidence**, that the material I was interested in had been added to the latest data. I wouldn’t want to imply that Dr. Jones did give access to data that was restricted under commercial license to people outside the CRU, but it wouldn’t surprise me if some emails that were selectively omitted from the hackfile showed that the **”hidden, secret, now destroyed, smoking gun that disproves global warming”** data were shared with independent scientists who didn’t start off with accusations of fraud, conspiracy, hippy liberal eco-nazi socialist power grabs, and that Al Gore is fat.
Google search results
“Results 1 – 10 of about 313 from climateaudit.org for fraud. ”
“Results 1 – 10 of about 361 from climateaudit.org for conspiracy”
“Results 1 – 10 of about 63 from climateaudit.org for gore fat. ”
“Your search – hippy liberal eco-nazi site:climateaudit.org – did not match any documents.” (do I have to put ** around EVERY exaggeration?)

In the interest of full disclosure, from WUWT discussion on google hits –
“However, the point of this blog entry is still with merit and significance. Google has historically made pro-AGW material much more accessible via their system than material which challenges the IPCC, AGW supporters, and Gore.”
“I think it’s crystal clear that having climategate ”go missing” in the Google suggestions list was very definitely and deliberately ”thought about”, from the partisan AGW agenda perspective.
As another WUWT commenter astutely pointed out on another thread:
Al Gore sits (or at least used to sit) on the Google BOD. ‘nough said.”
dhogaza says

29 Nov 2009 at 4:11 PM

Are met workers in danger of having their covers blown, losing their lives over a disclosure?

Professor Jones was given police protection due to threats received after the hacked e-mails were released.
Mike of Oz says

29 Nov 2009 at 4:13 PM

With this avalanche of climate data (most of which was already available on the web), I am going to hold my breath while waiting for the first sceptical re-analysis which conclusively demonstrates how it was manipulated and the results fabricated.

Is this a good idea, do you think?
Pete W says

29 Nov 2009 at 4:47 PM

My state in the USA used to complete information requests free of charge. But too many crackpot groups abused the system. So now requestors must also pay expenses. Requests for information have gone way down.
Matthew says

29 Nov 2009 at 5:15 PM

Gavin: Much to many people’s disappointment, global warming is not caused by an arithmetical error. Just ask the glaciers.

Yes, but the forecasts for the future may be based on some sort of programming error.

However, you may be mistaken in thinking that a) it will change many of the skeptics’ minds or b) reduce the number of attacks on climate scientists.

I think that you underestimate the effect that openness will have among the skeptics, though you are surely right if you mean that the “deniers” will be unaffected. It seems (your information may be different) that the ranks of the skeptics have grown in the last few years, and especially in the last few days. A major effort to win back the skeptics might be worthwhile.

Nor will it make much difference (if any) to the robustness of conclusions.

That presumes that there are no mistakes or irregularities. Only a completely open review can establish that, I think, in light of the hacks.

Thorough open scrutiny of everything by skeptics isn’t fun for the participants, but it is one of the principal strengths of scientific research.
Phil M says

29 Nov 2009 at 5:17 PM

Thanks for putting up all the links to the info, code & data
– it’s a useful repository

From what I’ve seen of the GISS Model E, it seems well documented, and the code well written & commented
– so I’m not sure what the complaint above was about..

Perhaps you could encourage CRU to adopt a similar coding & documentation process, and then publish some of their work in the web so that no one would be in doubt about which bits of code were run for which results etc!

As an aside, I would have thought an OO language such as C++ would be great for climate/weather models, as you could describe the atmosphere boxes as objects, each with their own state variables, and functions to process the data for the next timeslice.
– I guess converting 100K lines of Fortran to another language is a no-go though – but it might not be as hard as you think!
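
The object-per-box idea can be sketched as follows. This is a toy illustration only, written in Python rather than C++ for brevity: the class name, the ring layout, and the "relax toward the neighbours' mean" update are all assumptions invented for the example and have nothing to do with how Model E or any real climate model computes its physics.

```python
class AtmosphereBox:
    """One grid box holding its own state (here just a temperature in kelvin)."""

    def __init__(self, temperature):
        self.temperature = temperature

    def next_temperature(self, neighbours, mixing=0.1):
        """Toy update: relax toward the mean temperature of the neighbours."""
        mean = sum(n.temperature for n in neighbours) / len(neighbours)
        return self.temperature + mixing * (mean - self.temperature)


def step(boxes, mixing=0.1):
    """Advance a ring of boxes by one timeslice."""
    count = len(boxes)
    # Compute all new values first so every box sees the same old state.
    new_values = [
        box.next_temperature([boxes[i - 1], boxes[(i + 1) % count]], mixing)
        for i, box in enumerate(boxes)
    ]
    for box, value in zip(boxes, new_values):
        box.temperature = value


boxes = [AtmosphereBox(t) for t in (280.0, 290.0, 300.0, 290.0)]
step(boxes)
temperatures = [box.temperature for box in boxes]
print(temperatures)
```

Each box owns its state and its update rule, and the driver only decides which boxes count as neighbours and when the new state is committed.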
David B. Benson says

29 Nov 2009 at 5:46 PM

Matthew (230) — All of the GCMs (whatever the number, 23?) give very close to the same answer. Amazing they all contain exactly the same programming error, is it not?

In fact, using just a single equation and assuming BAU, I can do quite a good job of predicting future temperatures; I’ve checked this against historical data. Amazing, is it not?

I leave out feedbacks thought to be accelerating in the future; I doubt these are in the GCMs yet either. But none of that is strictly necessary as one can use paleodata with profit here; Mark Lynas has done so in his research leading to his book, “Six Degrees”. Here is a link to a review:
http://www.timesonline.co.uk/tol/news/uk/science/article1480669.ece
Jason says

29 Nov 2009 at 6:02 PM

In the late 80’s and early 90’s I did ad hoc data analysis and developed a major new database we needed. My comments were minimal in the ad hoc analysis, because they were modified versions of other programs I wrote for a different, one-time data request. The reports I generated periodically did have more comments so that I could eventually hand them off for someone else to run. The edits, displays, etc. for the database I wrote were well documented, because these programs were run by users at the entities required to report the data. The code was available to the users of the system so they knew exactly what we were checking. I also wrote a three inch Data Dictionary (with error code definitions). The database took about three years before it went into production. Most of the ad hoc reports took from 5 minutes to an hour for the more difficult ones, though some complex questions could take several hours from start to printer.

To have heavily commented my ad hoc programs and data tools would have been a waste of time and my efficiency would have been dismal. The edits, displays, and other user tools for the database used script files in the form of \runAnalysis entity abbreviation, with a predetermined input file name format so that the documentation could be easily followed even by a computer illiterate PhD. These added layers for a simple user interface, user documentation, comments and meetings increased the development time by at least a factor of 10 compared to bareback programs.

These are scientists trying to analyze data in a timely manner. To require a “c:>runAnalysis.exe input.dat output.dat” end product would be a complete waste of time needed for other projects. Having said this, I almost cried when I read the HARRY_READ_ME.txt. I once had to comment and update an old program (which contained no comments) written by my boss’s boss – he was trying to hand it off to me…. It was a nightmare similar to the HARRY_READ_ME.txt file. Some commenting is necessary when others will need access to the code.

I really do feel for the people whose email was made public, it’s just wrong. I’ve read a very small percent of the email and only have one problem with what I’ve read so far: these scientists are supposed to be concerned with AGW, yet they seem to spend half their time jetting around and around the world then write about it almost as if there was a competition (i.e. whose frequent flyer account is bigger). Do CRU or the other organizations have any type of offsetting program to balance the CO2 produced by these frequent intercontinental flights?
Rod B says

29 Nov 2009 at 6:04 PM

Tom Franklin (93), but if they had actually done that Madoff would now probably be a free man…
David Kane says

29 Nov 2009 at 6:24 PM

For this to be an intellectually honest exercise, you would want to list all the major data/models that went into IPCC. Then, provide links to the data/models that are available and explanations/requests for the ones that are not.
grumpy software architect says

29 Nov 2009 at 6:28 PM

There are some severe misapprehensions about both software quality and about scientific method. In both cases, commenters seem to think that there is ONE approach.

Software quality can be achieved in a few ways, only one of which is to try to make a single piece of software as perfect as possible. It’s a nice, simple approach that you can see in the comments, more or less “let’s have everything and we will run it and find some bugs”, but in fact the bugs are just as likely to be in the choices of smoothing techniques, in the assumptions about errors or somewhere in the basic algorithms and software review will never find those. Indeed, review of code, even testing of it, has a characteristic defect discovery efficiency that is never 100%, so even the best code review and testing does not eliminate all bugs. My favorite program, one that is run hundreds of thousands of times all over the world every day, is two lines long and has had three defects raised against it in 45 years. Was that a failure of code review? Version one of the program had ONE line of code (it’s IEFBR14 for the dinosaurs out there who will recognise the name).

An alternative approach, one that is used in the software industry where extremely high reliability is required in very complex environments and that is also the fundamental approach in published science, is what is called a voting strategy, where independently developed systems evaluate the same data and compare results. The winning answer is the one that is reproduced independently by multiple systems. Sounds awfully like what happens in scientific journals and conferences. The choices people make about what they try to reproduce are driven by factors other than formal methods: they are driven by interest, by intuition and by ego (there are lots more).
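
The voting strategy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any production system: the three "independent implementations" are hypothetical stand-ins that compute a rounded mean, and one of them carries a deliberate bug so that the vote has something to outvote.

```python
from collections import Counter


def vote(implementations, data):
    """Run each independent implementation and return (majority answer, votes)."""
    results = [impl(data) for impl in implementations]
    answer, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among implementations")
    return answer, count


# Three hypothetical, independently written ways to compute a rounded mean.
def mean_a(xs):
    return round(sum(xs) / len(xs), 2)


def mean_b(xs):
    total = 0.0
    for x in xs:
        total += x
    return round(total / len(xs), 2)


def mean_buggy(xs):
    # Deliberate off-by-one bug: divides by one too few.
    return round(sum(xs) / (len(xs) - 1), 2)


answer, count = vote([mean_a, mean_b, mean_buggy], [1.0, 2.0, 3.0, 6.0])
print(answer, count)
```

The vote only protects against errors that are not shared, which is why the independence of the implementations matters more than the quality of any single one.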

The constant whining about conforming to THE scientific method is begging the question, assuming a weirdly naive Popperianism, when science is much messier than turn-the-crank fantasies about scientists collecting data in some completely wide-eyed-innocent way, evaluating it with no preconceptions, and being surprised by what pops out. We might be able to write genetic algorithms that work that way on complex data sets (I actually think that that is likely to be a fruitful approach in some areas of computing like system management) but science is not like that. It’s just too wildly inefficient for people to work that way. People have to make choices about what to focus on, and the successful ones do not choose to focus on the low-level stuff of commenting their code or making it easy for idiots to run; they are trying to get a result that they can feed into the science process: inventing, competing, reviewing, and re-creating others’ results.
Deech56 says

29 Nov 2009 at 6:29 PM

If I may make a suggestion, any chance for some coverage of that climate paper recently published in Science? As necessary as it was and as much as I appreciate that RC has turned into the primary place to go for the discussion of the CRU hack, it might be fun to get the inside scoop on some real data and give Gavin a well-earned break.
donQ says

29 Nov 2009 at 6:52 PM

@Ron Broberg:224
Ron,
you seem confused about what data and code means.
You claim that you must use special compiler flags to deal with big-endian data .. however big-endian data can be used without problems on little-endian machines regardless of compiler flags, and compiler flags are used to change how the code is interpreted/compiled. Your message is so confused, while attempting to use basic computer-science terms, that I’m inclined to think you have not done anything at all and are just trolling.

(btw, within this thread you are, so far, the first one to claim to have run this software with some level of success … congratulations. You just have to make it a bit more believable.)
Ben says

29 Nov 2009 at 6:53 PM

Quick question guys.

This article has raised concerns for me http://www.timesonline.co.uk/tol/news/environment/article6936328.ece, just wondering what the stance is on this, specifically “We do not hold the original raw data but only the value-added (quality controlled and homogenised) data.”

Is this true?
Duae Quartunciea says

29 Nov 2009 at 7:02 PM

Here is a link for the ETCCDI Climate Extreme Indices dataset.

This is particularly relevant to the matter of sharing local station data from many different countries which have different regulations and expectations for the sharing of their data. See especially the following paper (currently open access):

Thomas C. Peterson and Michael J. Manton (2008) Monitoring Changes in Climate Extremes: A Tale of International Collaboration, in BAMS 89(9) Sept 2008, pp 1266–1271, doi:10.1175/2008BAMS2501.1

In this case, rather than store complete station records, which would have been a problem for many countries, data is given as major climate indices. The description says this is a global land-based climate extremes dataset produced through the coordination of the ETCCDMI. It comprises 27 indices of temperature and precipitation computed from daily station data using the RClimDex software. Data is in the form of zip files for each participating country, plus a gridded version and pointers to related sources.
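
To give a feel for what "indices computed from daily station data" means, here is a minimal sketch of two of the simpler indices: frost days (count of days with daily minimum below 0 °C) and summer days (count of days with daily maximum above 25 °C). This is not the RClimDex implementation; the function names and the one-week sample values are made up for illustration, and a real calculation would run over a full year with quality control and missing-data handling.

```python
def frost_days(daily_min_temps_c):
    """Count days whose daily minimum temperature is below 0 degrees C."""
    return sum(1 for t in daily_min_temps_c if t < 0.0)


def summer_days(daily_max_temps_c):
    """Count days whose daily maximum temperature is above 25 degrees C."""
    return sum(1 for t in daily_max_temps_c if t > 25.0)


# Made-up daily values (degrees C) standing in for one station's record.
tmin = [-3.1, 0.0, 1.2, -0.5, 4.0, -7.8, 2.3]
tmax = [2.0, 6.5, 26.1, 25.0, 30.2, 1.1, 24.9]

fd = frost_days(tmin)
su = summer_days(tmax)
print("frost days:", fd)
print("summer days:", su)
```

The point of the approach is that only the counts need to be shared, not the underlying daily series.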
Ray Ladbury says

29 Nov 2009 at 7:07 PM

Matthew, there are no true skeptics left–merely the ignorant, the wilfully ignorant, the denialists and the wingnuts.

Dave Benson’s point is important–you have 23 models, all developed independently, which agree to a very large degree. This is strong evidence that the models are properly coded and have more or less the right physics.

If each team had had access to the others’ code we could not with confidence conclude that the models were independent, and your supposition about a programming error could not be dismissed as easily. To me, I see this as a strong argument for not sharing code or making code public. Independence is an essential criterion for most statistical analyses–and for good reason.
Hank Roberts says

29 Nov 2009 at 7:16 PM

In reply to: Ben says: 29 November 2009 at 6:53 PM
> Quick question

Ben, Google:

http://www.google.com/search?q=We+do+not+hold+the+original+raw+data+but+only+the+value-added+%28quality+controlled+and+homogenised%29+site%3Arealclimate.org
Ron Broberg says

29 Nov 2009 at 7:36 PM

@donQ#238: I’m inclined to think you have not done anything at all and are just trolling.

Well, at least we understand each other – I’m also inclined to think that you have not done anything at all and are just trolling. It’s probably pretty obvious to most of us, in fact.

Or have you in fact contributed something to this thread that I missed?
Matthew says

29 Nov 2009 at 8:25 PM

237, deech56: If I may make a suggestion, any chance for some coverage of that climate paper recently published in Science?

It is not easy to address in the short comment format of blogs, except for a few points. I commented that it illustrated what ought to have been done all along, namely putting everything together in the Supporting Online Material for independent, public skeptical review. Also, they have decided to give a new name to the Medieval Warm Period, namely “Medieval Climate Anomaly”, without justifying the claim that it was “anomalous”. Also, it appears that the GCMs that they tested were unable to reproduce the difference between the Medieval Warm Period and the Little Ice Age, perhaps calling into question the accuracy of those models.

Since all (23?) of the GCMs failed to predict the apparent end/reduction of warming since 1999, and since other studies show that they cannot reproduce any recent changes, it would seem that all of them are suspect. Of course it is possible that they all contain a common core of flaws.

241, Ray Ladbury: Matthew, there are no true skeptics left–merely the ignorant, the wilfully ignorant, the denialists and the wingnuts. Well, that’s one opinion. Mine is, as I stated, that the number of true skeptics has increased.
Hank Roberts says

29 Nov 2009 at 8:33 PM

Wai’minnit here:

A few days ago new poster “donQ” was suggesting Gavin was hiding the code;
now “donQ” 29 November 2009 at 6:52 PM suggests Ron Broberg 29 November 2009 at 3:22 PM might be a liar or troll for a post about partial success running the code, partly refuting donQ’s suggestion at 212 that nobody’s succeeded.

Ron, you could cite/point to reports of success; that’d let others check the fact. Otherwise it’s just an argument by assertion on both sides.

I’ll leave the question about big-endian for the programmers; surely there’s somewhere else to pursue that.

Useful: http://www.catb.org/~esr/faqs/smart-questions.html
(Ironic–on climate he doesn’t practice what he preaches–but I appreciate good preaching despite preachers’ personal peccadilloes in practice.)
Blair Dowden says

29 Nov 2009 at 9:15 PM

In #241 Ray Ladbury says there are 23 independently developed climate models. I do not think that is entirely true – I understand many of them are derived from a smaller set of older models, and groups share algorithms and even code. I don’t know how one could quantify the degree of independence. In any case, I disagree with the case for not disclosing code and data, and apparently so does Gavin in his recent comments.
Steve says

29 Nov 2009 at 9:30 PM

My set of links compiled a few months ago on availability of source code for *all* the GCMs used in IPCC AR4:
http://www.easterbrook.ca/steve/?p=667
Philip Machanick says

29 Nov 2009 at 9:59 PM

Gavin, there’s sea level and ocean heat content data here: http://www.cmar.csiro.au/sealevel/sl_data_cmar.html (from a variety of international sources).

Thanks for making all this stuff easier to find.

This maximum openness demand is marvellous, especially as many in the denial camp are free market extremists who believe in closing government down (the same people who are pressuring academia to be more like business, and charge for everything on a commercial basis).

Although I understand why CRU people may not have felt motivated to supply data to openly hostile denialists, this is a matter of damned if you do, damned if you don’t (remember the huge fuss made over the correction to GISTEMP that had no significant effect on the overall trend?). On balance the NASA approach of making everything freely available is better because it helps those genuinely interested in the science.

What I find particularly amusing about the “hide the decline” issue is that we’ve known for more than 10 years that tree ring data diverges from the late 20th century temperature record. If the denialists had even marginally competent scientists on their side, they could have used this openly known fact to sow confusion. So much easier than making up garbage statistical methods.
PeterW says

29 Nov 2009 at 10:08 PM

“[Response: And I would have got away with it too, if it hadn’t been for those pesky kids…. ;) – gavin]”

Thank you Gavin, that made my day :-)
Ken W says

29 Nov 2009 at 10:19 PM

Re: 210
“If I pay taxes, then my meteorological services at all levels should provide me with the data for free as a citizen.”

If that’s the way you feel, then be prepared to have your income taxes tripled or quadrupled! Civil Servants are hired for specific jobs to help various elements of the government to function. There are chains of command for each one (up to Congress, or the White House). These chains do NOT pass through every taxpaying citizen, just because they think their personal request should be the top priority of the entire government. You have the right to vote, if you don’t like how government functions, but you do not have the right to dictate how Civil Servants spend their time. That comes from the top down.