Where’s the data?

27 Nov 2009 by group

Much of the discussion in recent days has been motivated by the idea that climate science is somehow unfairly restricting access to raw data upon which scientific conclusions are based. This is a powerful meme and one that has clear resonance far beyond the people who are actually interested in analysing data themselves. However, many of the people raising this issue are not aware of what and how much data is actually available.

Therefore, we have set up a page of data links to sources of temperature and other climate data, codes to process it, model outputs, model codes, reconstructions, paleo-records, the codes involved in reconstructions etc. We have made a start on this on a new Data Sources page, but if anyone has other links that we’ve missed, note them in the comments and we’ll update accordingly.

The climate science community fully understands how important it is that data sources are made as open and transparent as possible, for research purposes as well as for other interested parties, and is actively working to increase accessibility and usability of the data. We encourage people to investigate the various graphical portals to get a feel for the data and what can be done with it. The providers of these online resources are very interested in getting feedback on any of these sites and so don’t hesitate to contact them if you want to see improvements.

Update: A big thank you to everyone for the additional links given below. Keep them coming!

407 Responses to "Where’s the data?"

Mark A. York says

28 Nov 2009 at 10:21 PM

Excellent article, Kevin McKinney. Thanks! That sure explains the ocean warming/cooling dichotomy and is a great example of how science is self-correcting. Our contrarian friends stay with the headlines, which can be written by editors who know little about the story. And thanks, Gavin, for that last answer.
Jackie says

28 Nov 2009 at 10:30 PM

I keep a directory of online scientific data resources at Element List (http://www.elementlist.com/lnx/scientific_data/).
Jimi Bostock says

28 Nov 2009 at 10:31 PM

Thanks David B. Benson, I am having a look now. Of course, much of it is beyond me. It is not a bad thing to admit, I am just a normal guy trying to make sense of it all.

But may I say that I am heartened that my comments were posted. I think your site has a wonderful opportunity to play a role in the healing.

I am certainly lobbying within the so-called denier camp that we should refrain from using words like “alarmist”. I think you should try to stop using the word “denier”.

I suggest this not just from a human perspective but from a PR angle. Yes, I know that is not a wonderful angle but it is vital. I simply think that using that word is going to rebound in light of the CRU emails. I think it will be too easy for us to turn that around and make it about the denials that are revealed in the emails.

To me, that would be a bad outcome. I liked someone saying that this is not about winners and losers.

So, all power to you, Gavin. I am urging you to play a big part in changing the tone. Some of your own posts here show that you are trying, but you are slipping into old ways.

Peace
Jackie says

28 Nov 2009 at 10:32 PM

I keep a large list of online scientific databases at Element List (http://www.elementlist.com/lnx/scientific_data/).
Susanne says

28 Nov 2009 at 10:40 PM

#50 Jimi, I found this site a couple of years ago after seeing the Great Global Warming Swindle, and I’ve followed the climate change discussions, both scientific (as best I can) and political, quite closely ever since.

I suggest you look back over some of the years of posts here and get a sense of the culture and of events from before the email hack.

Over and over again I’ve seen scientists answer the same questions and self-described “climate skeptics” ignore the answers. Over and over again I’ve seen “climate skeptics” attack the motives and integrity of scientists and ignore the content of their work. Over and over again I’ve seen open discussion and dispute about details of climate science distorted by “climate skeptics” into statements that global warming is all wrong.

I don’t mean to suggest that you’re a phony skeptic like most of the bloggers I’ve read. All I’m suggesting is that you ask yourself how you might talk to colleagues or friends about requests and attacks from people who had demonstrated bad faith many times over.
Mike of Oz says

28 Nov 2009 at 10:58 PM

Gavin, I DEMAND that you teach me how to be a climate scientist IMMEDIATELY, so that I can expose this giant con you are peddling!

Seriously though Gavin, that you actually still appear to be sane and rational after reading some of the recent comments is nothing short of a miracle.

I am eagerly awaiting the stony cold silence which will accompany the first competent (if that’s not wishful thinking) sceptical “re-analysis” of the data.
Bill S says

28 Nov 2009 at 11:09 PM

For anyone who knows anything about science, the refusal to release data, codes, methods, proxies, all of it, until forced to do so is hard to comprehend.

The excuse is that CRU has confidentiality agreements with the providers of data, but that data sets that are public correlate 98% with the data set used by CRU.

This brings up two issues. One, if the data sets are substantially the same, why did CRU pay for something that is free elsewhere? Two, why would anyone involved in the search for the unbiased truth sign a confidentiality agreement that would prevent others from having access to the raw data, along with all of the “value added” statistical adjustments?

As a corollary, when physicists were debating competing theories for the origin of the universe (big bang vs. steady state), one side did not make information unavailable to the other, and the debate between the two views was every bit as contentious as this one. Ultimately, the overwhelming evidence proved the big bang…until something else comes along.
Bernie says

28 Nov 2009 at 11:34 PM

This story from the London Times forthrightly states that the raw data that Phil Jones and his team depends upon has been lost.
http://www.timesonline.co.uk/tol/news/environment/article6936328.ece
Can you clarify whether this is true or not.
Thanks

[Response: No. The original data is curated at the met services where it originated. – gavin]
Alan of Oz says

28 Nov 2009 at 11:41 PM

Re #67 – “Alan of Oz brings up a great point, code and data availability isn’t going to help.”

To clarify my point, access to data and methods is required for reproduction, access to code is irrelevant.
Ken W says

29 Nov 2009 at 12:01 AM

“I’m no scientist and I have no understanding about the numbers. But I can tell when someone is BSing me.”

Really? If you don’t understand the science or math, then how can you possibly know who’s BSing whom? You can’t. You probably inherently trust someone merely because they agree with you in some unrelated realm (e.g. politics or religion). Sadly, that’s not a very good mechanism for determining the truth.
If you lack the science and math to understand this complex field, then the rational thing to do would be to accept what the vast majority of practicing climate scientists are saying (unless you buy into some silly conspiracy theory or discount all of them as liars).
Patrick Caldon says

29 Nov 2009 at 12:12 AM

ccpo –

Not sure what I wrote that justifies the long-winded reply. I pretty much agree with much of what you wrote, but I might employ a less strident tone to express it.

I’m really not sure where I said anything fundamental has been done in error.

Indeed, my criticisms of the “climate change project” would boil down to saying that the sea-ice modeling looks pretty conservative, and I’d like to see lots more work on biofeedbacks, which look poorly characterized. I’m really interested in whitecap albedo, and the papers I’ve read on this have not been clear. Further, I’m not sure why there’s such an emphasis on all this dendrochronology stuff. To me it’s academically interesting, but at the end of the day it doesn’t really answer the more fundamental question: “How does emitting a bejesus-load of GHG alter the planet?”

That’s just my amateur judgment. None of this says it’s a good idea to emit a bejesus-load of GHGs. At the end of the day not much turns on it. If you disagree I’d love to hear why.

My point was that linking batches of code is pretty useless as a pedagogical activity, and I learned a lot more from reading a textbook. Of course, I don’t think what Gavin is doing is a pedagogical activity; it’s more making a rhetorical point, and not a bad one at that. But pedagogy never hurts, and a few textbooks might be a nice addition to the site. I said Washington and Parkinson’s text is pretty good. As a suggestion to the site, I said it would therefore be nice to have a few textbook references on the “books” page.
Mac says

29 Nov 2009 at 12:17 AM

Gavin,

I saw your link to the NSIDC but to make collection a bit easier here is the direct link to the submarine upward looking sonar data.

http://nsidc.org/data/g01360.html

Mac
Joe V. says

29 Nov 2009 at 12:21 AM

Is the data raw or corrected? If it is corrected, where did the corrections primarily come from, and where would one find the raw data with a listing of the correction algorithms (if applicable)?

Thanks!
Matthew says

29 Nov 2009 at 12:21 AM

What’s required is that the data, metadata, program source code and program output for each published paper be put together for review by skeptics and allies alike. Then the code can be run on the data, the output confirmed, and any mistakes identified. Then modifications to the programs can be run, by skeptics and allies alike, and the differences caused by the modifications can be clearly identified. That way, assumptions that are made, and decisions that are made, can be clearly shown to determine, or not determine, specific claims made about the results. Put differently, for each published paper there needs to be a complete and accurate data audit trail.

That is what ClimateAudit has persistently requested. What RealClimate has supplied instead is a direction such as “We used the data at such and such a website.” That would never work for a patent application, or FDA approval.
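One way to make the “data audit trail” requested above concrete is a checksum manifest: record a cryptographic hash of every data, code and output file used for a paper, so that anyone re-running the analysis can confirm they have byte-identical inputs. The sketch below is a hypothetical illustration using Python’s standard hashlib and json modules; the function names and the manifest layout are assumptions made here, not a format that ClimateAudit, RealClimate or any journal has specified.

```python
import hashlib
import json


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paper_id, data_files, code_files, output_files):
    """Hypothetical manifest layout: one hash per file, grouped by role."""
    return {
        "paper": paper_id,
        "data": {p: sha256_of(p) for p in data_files},
        "code": {p: sha256_of(p) for p in code_files},
        "output": {p: sha256_of(p) for p in output_files},
    }


def save_manifest(manifest, path):
    """Write the manifest as human-readable JSON."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def verify_manifest(manifest):
    """Return the files whose current hash no longer matches the manifest."""
    changed = []
    for section in ("data", "code", "output"):
        for path, digest in manifest[section].items():
            if sha256_of(path) != digest:
                changed.append(path)
    return changed
```

A manifest saved alongside a paper’s supplementary material could then be checked with verify_manifest(); an empty list means every listed file still matches what was recorded.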
Tim McDermott says

29 Nov 2009 at 12:48 AM

donQ says:
28 November 2009 at 6:45 PM

Gavin,
your code (as found here http://data.giss.nasa.gov/gistemp/sources/GISTEMP_sources.tar.gz) is rather flaky and lacks basic documentation to make it usable.

What a hoot! Folks have demanded, DEMANDED!, access to the code, and then have to ask for tutorials on what it is and does. More than 30 years ago, Fred Brooks observed in The Mythical Man-Month that a software product is three times harder (read: more expensive) to build than a program. So what should our civil servants be doing with our money: building programs that are good enough to do their day-to-day jobs, or writing products that allow complete noobs to do the job too, looking over our civil servants’ shoulders? This is a serious question. According to the CRU emails, the CRU grants for the last 19 years average, roughly, a million USD per year. In the Washington, DC area I could cover the salary, benefits, overhead, and G&A for 4 good software product developers. I doubt that I could cover 5. I won’t take the folks who are demanding better code seriously until they also start demanding that the budget for the climate folks be tripled to pay for turning programs into products.
debreuil says

29 Nov 2009 at 1:00 AM

116: If Forrest Mims is not considered someone worth listening to here (**screed filled with blatantly false claims deleted**) then I’m at a loss. I don’t know what else to say.
Adlac says

29 Nov 2009 at 1:11 AM

[Response: No. The original data is curated at the met services where it originated. – gavin]

No one seems to know this but you… CRU is not pointing people to “the met.” Could you offer greater confirmation of this claim?

Here is the relevant CRU statement regarding this issue:

“We, therefore, do not hold the original raw data but only the value-added data.”
dhogaza says

29 Nov 2009 at 1:18 AM

This brings up two issues. One, if the data sets are substantially the same, why did CRU pay for something that is free elsewhere?

They didn’t pay. They got the data, for free, under an agreement (“contract”, just in case you’re obtuse) that they’d only use it for research purposes and not release it without permission from those who provided it.

Two, why would anyone involved in the search for the unbiased truth sign a confidentiality agreement that would prevent others from having access to the raw data, along with all of the “value added” statistical adjustments?

Because they want as much data as they can get. More data is better. A couple of decades ago they never imagined that someone would be fabricating a political McCarthy-ist attack on their research over something that’s not at all uncommon in science. They never imagined they’d have to defend themselves against extremist anti-science right-wingnut attacks.
dhogaza says

29 Nov 2009 at 1:21 AM

That is what ClimateAudit has persistently requested. What RealClimate has supplied instead is a direction such as “We used the data at such and such a website.” That would never work for a patent application

This person actually believes that the patent office requires applicants to provide working source code and ACTUALLY RUNS IT?

The mind boggles. The earth must be flat, after all, and the sky green and the sea sweet and my tap water salty.
Hank Roberts says

29 Nov 2009 at 1:30 AM

Whoah, debreuil — 116 is just ccpo, quoting part of what Forrest Mims wrote much earlier at https://www.realclimate.org/index.php/archives/2009/11/wheres-the-data/comment-page-2/#comment-145290

Nothing has been deleted; ccpo used the word “deleted” but meant “elided” or “omitted from quoting” — notice the timestamp is provided so you can find the original for yourself.

Forrest Mims doesn’t specify what he’s referring to:
“… advocating the misconduct revealed in many of the Hadley CRU e-mails” could be a specific accusation if he were specific; without specificity, it’s only an opinion that there must be something wrong somewhere.

Everyone wants to be judge and jury, but none of us get to decide what really happened; all we have is “randomly selected” (whatever _that_ means) bits from some larger stolen email file.

Why aren’t people demanding that the crackers reveal the rest of what they’re hiding, so we can know the whole truth?

Oh, wait, that won’t work well for them, will it?
Sean says

29 Nov 2009 at 1:31 AM

Bernie says:

“This story from the London Times forthrightly states that the raw data that Phil Jones and his team depends upon has been lost.
http://www.timesonline.co.uk/tol/news/environment/article6936328.ece
Can you clarify whether this is true or not.
Thanks

[Response: No. The original data is curated at the met services where it originated. – gavin]”

Gavin, the original data may indeed be at the met services where it originated, along with a lot of other data that CRU chose not to use for its global temperature record. So unless CRU released a list of which stations it used and which stations it chose not to use (which it has not done) saying that the original data is at the local met services is like saying that the answer to the “Who wants to be a millionaire” final question is in the encyclopedia. I’m sure it is, but the “trick” is: where?
Mike says

29 Nov 2009 at 1:39 AM

I have a question about whether (and if so, how) gridded temperature averages are weighted.

I am not talking about station temperature weight based on distance to the grid point – that’s a separate topic.

What I want to know is, when gridded average temperatures are themselves averaged to determine regional/global averages, whether each grid square (i.e. 2 or 5 degrees) is weighted by its actual surface area. I looked at the zonav.f code file, and I didn’t see any code that weighted grid squares by area.

On this topic, recent articles refer to previous articles going back to Hansen et al. 1981. Hansen et al. 1981 states “The results shown were obtained with 40 equal-area boxes in each hemisphere … and the global trend was obtained by area-weighting the trends for all latitude zones.” But there is no detailed explanation or formula for how these boxes were weighted, or even whether “equal-area boxes” means equal by surface area or means 2 X 2 degree or 5 X 5 degree squares as are used currently.

Since a grid square at, say, 60 degrees latitude is half the size of a square at the equator, it seems that heating or cooling away from the equator should receive less weight per gridded square. Does that happen in the calculations? If so, can you point me towards details? Thanks in advance for your help with this.

[Response: If you do a global mean from gridded data, then yes, you weight for the area of the box. A quick estimate is to use the cos(lat) for the mid-point of the box. A lat-lon grid (i.e. 2×2) definitely needs this, but you can have ‘equal area’ grids that have each grid box have the same area, in which case the global mean is simply the unweighted average. – gavin]
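The cos(lat) weighting described in the response can be sketched as follows. This is a minimal illustration, not the GISTEMP zonav.f or CRU code: it assumes a regular latitude-longitude grid stored as rows ordered south to north, with None marking boxes that have no data (a hypothetical convention chosen here), and weights each box by the cosine of its mid-point latitude.

```python
import math


def box_midpoints(box_deg):
    """Mid-point latitudes (degrees) of the bands of a regular lat-lon grid, south to north."""
    n = int(round(180 / box_deg))
    return [-90 + box_deg * (i + 0.5) for i in range(n)]


def area_weighted_mean(grid, box_deg):
    """Global mean of a regular lat-lon grid, weighting each box by cos(mid-point latitude).

    grid[i][j] is the value for latitude band i (south to north) and longitude j.
    Boxes set to None are skipped and the remaining weights are renormalized.
    """
    total = 0.0
    weight_sum = 0.0
    for lat, row in zip(box_midpoints(box_deg), grid):
        w = math.cos(math.radians(lat))
        for value in row:
            if value is None:
                continue
            total += w * value
            weight_sum += w
    return total / weight_sum


def exact_band_weight(lat_south, lat_north):
    """Exact relative area of a latitude band: proportional to sin(north) - sin(south)."""
    return math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
```

For example, on a 30-degree grid where only the two bands between 30°S and 30°N are set to 1, area_weighted_mean returns 0.5, the true fraction of the Earth’s surface between those latitudes (sin 30° = 0.5); a plain unweighted average of the same boxes would give 1/3. For bands of equal width, the exact band area, proportional to sin(north) − sin(south), equals 2·cos(mid-point)·sin(half-width), so on a regular grid the cos(lat) weights match the exact areas up to a constant factor.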
BlogReader says

29 Nov 2009 at 1:39 AM

Alan of Oz To clarify my point, access to data and methods is required for reproduction, access to code is irrelevant.

Why? Why shouldn’t the exact means on which a paper is based be freely available?

I don’t understand what you’re trying to accomplish here. Yes, the code might have warts in it; yes, it might be ugly. Who cares if the most efficient way of sorting an array wasn’t used? Just make the code available.

Can you clarify what you mean by “irrelevant”? You state that the “methods” should be open, so you mean describing the code but without releasing it? What if there was a mistake in there that many eyeballs would catch?

[Response: We discussed this above. The replication that is required in an observational science like climate is the replication of the conclusion. Perfect code, perfectly executed is worthless if the basic assumption that went in was flawed. Independent approaches using either the same base data, or completely different data are much better at assessing this than checking other peoples’ code line-by-line. Two different climate models that give the same result are much more informative than taking one of them and looking for bugs (which undoubtedly exist). Given limited resources, scientists focus mostly on issues that matter – ie. verifying/challenging conclusions rather than checking code. Sometimes there are problems, but these usually emerge because someone’s independent but nominally similar approach gives a different result (such as with the MSU records) – at that point it is worth looking more closely to see why that should be. – gavin]
David says

29 Nov 2009 at 1:55 AM

donQ,

The request was for “basic documentation.” Documenting one’s code is a software-development best practice. See HARRY_READ_ME.txt for an example of what happens if you don’t have adequate documentation.
http://www.anenglishmanscastle.com/HARRY_READ_ME.txt

Yes, there’s merit to the argument that even if adequate documentation was provided, a layperson may not understand it. However, Harry is (I assume) an expert, and he seemed despairing of ever getting this code to work, to say nothing of reconciling the databases’ issues.
CM says

29 Nov 2009 at 2:13 AM

“Esmeralda Dangerfield” said:

This comment is not for attrition

Then scientists will find it a welcome change from vexatious Freedom of Information requests and spurious insinuations of data fiddling.

(attrition: “the act of weakening or exhausting by constant harassment, abuse, or attack” – Merriam-Webster)
Benjamin says

29 Nov 2009 at 2:35 AM

David Benson – I guess you’re calling me a sockpuppet. I’ve got a link to a blog in my username, I’m a contributor on that blog. I came here trying to get information and most of what I see in these comments is not info but snark and scorn. Depressing.
Josh Cryer says

29 Nov 2009 at 3:34 AM

Tim McDermott, why sir, being a programmer and a hobbyist software developer, I am quite pleased by the ownage you have displayed. A hat tip to you!

This discussion reminds me of the creation of the PDS. The PDS (Planetary Data System) is a system NASA set up some time ago, into which all non-Earth, space-related science data paid for by a government grant (either NSF or NASA) must be released. Note that the PDS was possibly created because of conspiracy mongers begging for data and claiming falsification, though more likely it was created for academia and only coincidentally shut up the conspiracy freaks; I put denialists in the same basic camp as moon hoaxers or Mars alien artifact freaks when it comes to claims for data and accusations against scientists.
Alan of Oz says

29 Nov 2009 at 3:43 AM

Re 165: Agreed. The arrogance and wide-eyed naivety that is driving the (irrelevant) criticisms of code quality is unfortunately well known to experienced developers. A quote from an old friend of mine nails the behaviour very well: “Source code is like sh*t, everybody else’s stinks”.

As for costs, I work in a group of about 25 developers/testers on an esoteric set of data tools that I doubt anyone here has heard of. It’s not the largest project I’ve worked on by a long shot, but our annual budget is twice that of the IPCC! As in most of the commercial code I have seen over the last 20 years, source-level comments are virtually non-existent, and when they do appear they are often misleading because they are not maintained.
ccpo says

29 Nov 2009 at 3:49 AM

Bruce the Canuck says:
28 November 2009 at 8:56 PM

Post 116, in response to post 76:
>Dear Forrest…Deal with that, if you’ve the ethical and moral fortitude.

Ok, sorry but I call foul. By his linked website, Forrest is somewhat right leaning, and perhaps “on the other side” – but he’s *also* an admirable science geek.

And debreuil says:
29 November 2009 at 1:00 AM

116: If Forrest Mims is not considered someone worth listening to here (**screed filled with blatantly false claims deleted**) then I’m at a loss. I don’t know what else to say.

Sorry, but I don’t know either of your stances – whether anti-AGW or not – but did you read what the guy wrote? His background means nothing to me. His actions do. Did you, again, read what he wrote? Based on the responses by Gavin, et al., here and others elsewhere, some of his claims were, as I’ve already said, blatantly false. That he listed his resume before his screed may have fooled you with an Appeal to Authority, but it did not me. He actually asked RealClimate to stop doing science with the people involved in the e-mails! What in the world do you think that implies?

When in doubt, be the one toning it down.

I wasn’t aware I was in doubt.

…it is neither in the interests of the cause of reducing CO2 emissions, nor the greater cause of upholding science itself, to assume all skeptics are Denialists.

Why? They are. If they are so ill informed as to not know the very obvious, even as laypersons, perhaps they should not be posting, eh? Since the evidence is incredibly one-sided, and even the scientists here and elsewhere don’t dispute my contention posted here and elsewhere that there is not even one paper that in any way refutes any of the underpinnings of AGW, how can they be but a denialist?

Asking me to be “nice” will get you nowhere. We’ve been too nice, and the nuts are winning.

The only thing that will stop these pugilists is for them to have consequences. They need to find themselves in criminal or civil court.

Cheers
Mk says

29 Nov 2009 at 3:49 AM

When people say they want to know where the data is, I assume they mean the data that may have been deleted, or not made public – not the data that has always been out there, or the data that you have not deleted.

Just a thought.
Jonathan Fischoff says

29 Nov 2009 at 3:49 AM

Seeing a lot of comments on the blogosphere about software quality.

As a software engineer who worked on a commercial open source software product (Gamebryo), I would say the time ratio between making a standard-conforming, documented program and just making something that works is about 10 to 1.

Raw data and code are necessary, however I don’t think scientists need to create great code.

All we need is the ability to do

C:\> runAnalysis.exe input.dat output.dat

Can’t say if we are there or not, but that seems like an appropriate minimum.
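As an illustration of that minimum, here is a hypothetical Python sketch of a runAnalysis-style entry point: one command that takes an input file and writes an output file. The file format (one “year value” pair per line, with “#” marking comments) and the placeholder analysis (anomalies from the series mean) are assumptions made for the example, not any real climate code.

```python
import argparse
import sys


def read_rows(path):
    """Read 'year value' pairs, skipping blank lines and '#' comments."""
    rows = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0].startswith("#"):
                continue
            rows.append((int(parts[0]), float(parts[1])))
    return rows


def analyse(rows):
    """Hypothetical placeholder analysis: each value's anomaly from the series mean."""
    if not rows:
        raise ValueError("no data rows found")
    mean = sum(value for _, value in rows) / len(rows)
    return [(year, value - mean) for year, value in rows]


def write_rows(path, rows):
    """Write 'year value' pairs with three decimal places."""
    with open(path, "w") as f:
        for year, value in rows:
            f.write(f"{year} {value:.3f}\n")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run the analysis on one input file.")
    parser.add_argument("input", help="input data file")
    parser.add_argument("output", help="file to write results to")
    args = parser.parse_args(argv)
    write_rows(args.output, analyse(read_rows(args.input)))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Run as: python run_analysis.py input.dat output.dat (the file name run_analysis.py is likewise hypothetical). Keeping the analysis behind a single input-to-output command is what makes it easy for someone else to rerun and compare results.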
ccpo says

29 Nov 2009 at 3:54 AM

Patrick,

You simply misread my intent. You highlighted the silliness of non-climate scientists asking for all the code and the data, is all.
Donald Oats says

29 Nov 2009 at 4:35 AM

Even with the best software developers in the world, there is still the problem that they must understand the mathematics and algorithms to the level that they can think independently and discuss things at a high enough level with the climate scientists. That’s the *really* tricky bit: finding some software developers who are capable of crossing the scientific divide and still being damned good developers. Chances are, if they are that good at the science they will be in jobs doing the science.
ChrisC says

29 Nov 2009 at 4:36 AM

I once read on a programming support forum the following relating to many of the questions posed on said forum.

“…it seems that a lot of the posts in this forum are from people who say things like: I want to learn how to do ray tracing in cobol on a palm pilot; I don’t know anything about programming yet, and clearly I’m not able to use a search engine either, but I want to be able to complete my project by noon tomorrow”.

The “enquiries” from the denialists on the last few RC posts remind me a lot of this (of course with an obligatory reference to the scam added on).

If you want to know the first thing about climate science, you may actually need to study it. Flipping through blog posts does not count as studying. You may actually need to crack a book, read a paper and do some friggin’ work before it’ll make much sense.

Of course the vast majority of armchair “skeptics” have no intention of doing any work. If they had, this post would have been unnecessary, as instead of crowing “where is the data”, they would have spent the 10 minutes on google that are required to have found it themselves.

Thanks for your efforts Gavin. Australian Bureau of Meteorology data can be obtained here: http://www.bom.gov.au/climate (raw AWS/METAR/SYNOP data is often unavailable… the Bureau sells it). Monthly SOI data going back to 1876 is available here http://www.bom.gov.au/climate/current/soihtm1.shtml
Barton Paul Levenson says

29 Nov 2009 at 5:43 AM

David: Since you know so much, perhaps you could tell me why the oceans have not been warming after 2003 — or if they have why it’s a very slight warming contrary to the projections of the IPCC’s models.

BPL: Look again:

Domingues, C.M., J.A. Church, N.J. White, P.J. Gleckler, S.E. Wijffels, P.M. Barker, and J.R. Dunn 2008. “Improved Estimates of Upper-Ocean Warming and Multi-Decadal Sea-Level Rise.” Nature 453, 1090-1093.

Levitus et al. 2009. “Global ocean heat content 1955–2008 in light of recently revealed instrumentation problems” Geophys. Res. Lett. 36, L07608.

[Response: I’ll put up an updated figure on this shortly. – gavin]
Bart Verheggen says

29 Nov 2009 at 6:08 AM

Re Ocean Heat Content:

Levitus et al. (GRL 2009) seem to use these OHC data (http://www.nodc.noaa.gov/OC5/3M_HEAT_CONTENT/),
which seem to show a levelling off since 2003/2004. Based on the amount of noise in the whole signal (see e.g. the relative flatline in OHC from the late 70s to the late 80s, when global air temperatures climbed steeply), I wouldn’t make too much of it, but perhaps someone with more knowledge of ocean heat content could provide a perspective?

Re my comment 95 (see saw), guess it goes to show that I’m not a native speaker…
Michael Ashley says

29 Nov 2009 at 6:54 AM

Gavin et al.,

Here is an idea:

We all know that you are never going to satisfy the denialists, no matter how much data you provide, and how much code you supply. There will always be something more that they want.

The fundamental issue is that proving AGW is actually a relatively straightforward matter of physics, combined with a few observations.

So, how about stripping out everything except the bare essentials? Jettison tree-rings altogether. Ignore sea ice measurements. Ignore tide gauges. My guess as a physicist (and yes I know that we assume cows are spherical and isotropic :-)) would be that you could make do with a fairly small dataset, e.g.

– GISTEMP
– satellite measurements of sea level and temperature
– CO2 measurements from Mauna Loa
– baseline natural CO2 from ice cores
– satellite measurements of TSI

Then, build a really simple climate model. I guess the most complex input will be the detailed line spectra of CO2, etc, you can read this in from MODTRAN or whatever you use. Make the code high quality and freely available. Make all the databases freely available.

Next, collaborate with 20 or more climate scientists and write a peer-reviewed paper describing the above, together with the necessary physics and any additional assumptions you need (e.g., radioactive heat output of the earth, albedo, average volcanic activity) and show convincingly that it proves AGW.

Now, this isn’t going to be easy, and may well require additional complexity (clouds, aerosols, etc). But the basic idea is to provide the simplest, most robust, proof of AGW. With the code and data all available, anyone can try running the model, and try to find faults.

My outline above may not be practical, but I think that something close to it should be. If it could be done, it would short-circuit all the criticism of Yamal, HARRY, etc.
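A toy zero-dimensional version of the “really simple climate model” proposed above might look like the sketch below. It is far cruder than what the comment describes: instead of line spectra read in from MODTRAN, it uses the widely used simplified fit for CO2 forcing, ΔF = 5.35 ln(C/C0) W m^-2, and it treats the whole climate as a single heat reservoir. The sensitivity (0.8 K per W m^-2) and heat capacity values are assumptions chosen for illustration, not fitted or published model parameters, and clouds, aerosols, other gases and deep-ocean uptake are all ignored.

```python
import math

# Assumed parameters for illustration only, not values from any published model.
FORCING_COEFF = 5.35   # W m^-2, simplified CO2 forcing fit: dF = 5.35 ln(C / C0)
SENSITIVITY = 0.8      # K per W m^-2 (assumed; roughly 3 K per CO2 doubling)
HEAT_CAPACITY = 8.0    # W yr m^-2 K^-1 (assumed effective heat capacity)


def co2_forcing(c_ppm, c0_ppm=280.0):
    """Radiative forcing (W m^-2) from CO2 at c_ppm relative to a c0_ppm baseline."""
    return FORCING_COEFF * math.log(c_ppm / c0_ppm)


def equilibrium_warming(c_ppm, c0_ppm=280.0):
    """Equilibrium temperature change (K) under the assumed sensitivity."""
    return SENSITIVITY * co2_forcing(c_ppm, c0_ppm)


def run_one_box(co2_series, dt=0.1, c0_ppm=280.0):
    """Integrate C dT/dt = F(t) - T / lambda with forward Euler steps.

    co2_series holds one CO2 concentration (ppm) per year; each year is split
    into steps of dt years. Returns the temperature anomaly at the end of each year.
    """
    temp = 0.0
    result = []
    steps_per_year = int(round(1.0 / dt))
    for c_ppm in co2_series:
        forcing = co2_forcing(c_ppm, c0_ppm)
        for _ in range(steps_per_year):
            temp += dt * (forcing - temp / SENSITIVITY) / HEAT_CAPACITY
        result.append(temp)
    return result
```

With these assumed values, a doubling from 280 to 560 ppm gives an equilibrium warming of about 3.0 K (0.8 × 5.35 × ln 2 ≈ 2.97 K), and run_one_box([560.0] * 300) relaxes toward that value with an e-folding time of HEAT_CAPACITY × SENSITIVITY = 6.4 years.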
cer says

29 Nov 2009 at 7:14 AM

Bill S wrote:

“For anyone who knows anything about science, the refusal to release data, codes, methods, proxies, all of it, until forced to do so is hard to comprehend.”

Perhaps you missed the point that the vast majority of data, codes and proxies have been freely available for a long time without anybody being “forced” into it. As for the methods, they are all in the published literature, that being what journals are there for.

“The excuse is that CRU has confidentiality agreements with the providers of data, but that data sets that are public correlate 98% with the data set used by CRU.

This brings up two issues. One, if the data sets are substantially the same, why did CRU pay for something that is free elsewhere?”

Er, they didn’t pay for what is free, they paid for the 2% that’s not free. The reason for doing that even though it makes no difference to the global average temperature is that a) you can’t know whether something will make a difference or not without examining it and b) it doesn’t make a difference globally, but it provides a more complete small-scale / regional picture of temperature trends.

Are you really suggesting that Phil Jones should have said “eh, well, I know there’s some extra data out there but I’m not going to include it in my work because I’m pretty sure it won’t change much”. Can you imagine the “skeptics” uproar? “AGW alarmists refuse to include new data sources – what are they afraid of?!”

“Two, why would anyone involved in the search for the unbiased truth sign a confidentiality agreement that would prevent others from having access to the raw data, along with all of the “value added” statistical adjustments?”

Because if you don’t sign the agreement you don’t get access to the data and you can’t include it, rendering your analysis less complete as a result? I would’ve thought that’d be obvious by simple common sense.
Deech56 says

29 Nov 2009 at 7:27 AM

RE Matthew

“What’s required is that the data, metadata, program source code and program output for each published paper be put together for review by skeptics and allies alike. Then the code can be run on the data, the output confirmed, and any mistakes identified. Then modifications to the programs can be run, by skeptics and allies alike, and the differences caused by the modifications can be clearly identified. That way, assumptions that are made, and decisions that are made, can be clearly shown to determine, or not determine, specific claims made about the results. Put differently, for each published paper there needs to be a complete and accurate data audit trail.

That is what ClimateAudit has persistently requested. What RealClimate has supplied instead is a direction such as “We used the data at such and such a website.” That would never work for a patent application, or FDA approval.”

What a way to completely slow down the review process. An important part of peer review is selecting qualified reviewers, and I don’t see that happening under this approach. The “allies” would probably be other climatologists, who already have enough to do.

Now it so happens that I have experience with both patent applications and FDA submissions. My patents were based on material that I had published, and except for making sure I had everything in my (company-owned) notebooks in case of challenge, there wasn’t anything I put in the application that wasn’t found in my publications.

The FDA will accept manuscripts as supporting data. What you are describing are the clinical studies and the GLP (Good Laboratory Practice) animal safety studies. I can tell you that:

1. Getting set up for GLP studies (having SOPs, validation, QA review) is onerous. There’s a reason research labs, even corporate research labs, are not GLP-compliant. Allow for about a year of downtime and about $300,000 for each institute to start, plus about $150,000 to $200,000 per year for maintenance. Who will pay for this? Also allow for delays of 6 months to 1 year in the publication of each study.

2. All auditing is done internally by employees or contractors/consultants of the company. For submitted data, the FDA can go through the books and do its own checking. What’s the climate-science version of the FDA?

3. The raw data for these studies are not released to the public (or to rival companies), except under the terms of any journal that publishes a paper. Publication is not a precondition for FDA review; in fact, tables in an Investigator’s Brochure are fine with the FDA. This may be why many clinical studies are reported only in scientific meeting abstracts.

Science has a method for validating results: independent confirmation. Scientists are a tough audience. For example, the temperature reconstructions by MBH led to other reconstructions. More data gathering and experience from almost ten years of publications, meetings and comments led to Mann et al. 2008. I will admit that I believe the NAS findings helped Mann et al. I’m not sure whether these points would have been raised without the criticism from outside. Of course, lab animal care has improved over the 30 years I’ve been involved in research; that doesn’t mean that all lab animal protocols and data should be reviewed by PETA first.

Each finding is a piece of the puzzle. Nothing is entirely perfect, but the picture becomes clearer with each new study. As with the fossil record, scientists study what nature can show them to get the best idea of what has happened and is happening. We should at least admit that they know their field better than we do and that they know what they are doing.
Ray Ladbury says

29 Nov 2009 at 8:24 AM

Matthew says “What’s required is that the data, metadata, program source code and program output for each published paper be put together for review by skeptics and allies alike.”

Uh, no. This is not required except by tin-foil-hat-and-black-helicopter conspiracy theorists. What is required by scientists is validation of the conclusions and methodology of the work. If you want to come up with another endeavor, fine, but don’t pretend it’s science. Last I saw, science was working just fine.
The Raven says

29 Nov 2009 at 8:47 AM

DonQ, as a start, go to David Archer’s post, listen to his lectures, read his book. That will perhaps put you in a much better position to deal with the GCM code.

(Why yes, I am standing on one foot. But I’m a bird and this is not difficult.)
Ray Ladbury says

29 Nov 2009 at 8:47 AM

As much as it pains me to say it, since I have great respect for Forrest Mims’s work popularizing electronics, has anyone pointed out that Mims is a Fellow at the Discovery Institute? This hardly bespeaks any strong understanding of the scientific method, and given the anti-science activities of said Institute, I think we are safe in thanking Gavin for sparing us the screed.
eelco rohling says

29 Nov 2009 at 9:28 AM

Continuous composite sea-level record at 200- to 250-year resolution for the last 500,000 years on the EDC3 chronology (with the various component parts from which the composite was formed), for use alongside ice-core CO2 and temperature records (Rohling, E.J., Grant, K., Bolshaw, M., Roberts, A.P., Siddall, M., Hemleben, Ch., and Kucera, M., Antarctic temperature and global sea level closely coupled over the past five glacial cycles, Nature Geoscience, 2, 500-504, 2009). The composite record is shown in columns AR (EDC3 age) and AS (Relative Sea Level). Link: http://www.soes.soton.ac.uk/staff/ejr/Rohling-papers/Rohling%20et%20al%20Nat%20Geosc_%20data%20supplement.xls
David Wright says

29 Nov 2009 at 9:28 AM

If you want scientists to review and accept your work, then there is no need to make your data understandable to the general public. They are welcome to take the risk of citing your work if they believe in it. They assume the risk of later being proven wrong.
If you want the general public to understand your work and alter their behaviour accordingly, you had better be ready to educate them on how your science works. A fatherly pat on the head and a “trust me” will not suffice.
This would require a complete open-source model that the public can run, with all raw data open-sourced.
Welcome to the 21st century.
Paul Lindsey says

29 Nov 2009 at 10:24 AM

There is a website whose owner purports to have been analyzing the code used to turn GHCN data into GIStemp. He believes he has found multiple methodology errors, such as adding “false accuracy” of multiple decimal places to arithmetic means of integer data, and errors caused by using distant reference stations, which do not experience the same climate, to lengthen the temperature series of current stations.

He’s not a climatologist, but is obviously a very experienced programmer. He also states in at least one comment that he has attempted to post comments here at RC, only to have them round-filed.

It looks like he’s been at this since at least April 2009. I haven’t read and digested everything, but so far his assertions appear plausible: that on the way to becoming GIStemp, the temperature data has been bent, folded and mutilated. The website is http://chiefio.wordpress.com/gistemp/

[Response: Any coding errors found should of course be reported to the GISTEMP people – they will be happy to fix these things immediately and post a brief description of the impact on the results (as they have many times before). However, a brief perusal of this person’s site shows that he is pretty confused about where the data comes from and the motivations for choices made in the analysis, and has a woeful lack of understanding of what is going on in a calculation of the average temperature anomaly. His insistence that GISTEMP is dropping high-altitude sites to increase the mean temperature of the stations is just nonsense. The mean anomaly is calculated by averaging anomalies across stations, so the shift in stations through time would only have this effect if there were a systematically smaller anomaly in high-altitude regions. In fact, though, the evidence is that anomalies are of larger magnitude at altitude, so it would have the opposite effect to that claimed. Combined with the fact that GISTEMP doesn’t control what gets into the GHCN data in the first place, those kinds of criticisms have very little merit. However, I stress that any actual coding errors should of course be reported. – gavin]
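To make the anomaly point in the response concrete, here is a toy sketch in Python. The station values are made-up, hypothetical numbers (not GHCN or GISTEMP data), and the plain average over stations is a deliberate simplification; the real GISTEMP analysis combines stations and grids the results. It only illustrates the arithmetic: dropping colder, high-altitude stations raises the mean absolute temperature of the remaining network, but if those stations have larger anomalies, the mean anomaly goes down, not up.

```python
# Hypothetical toy data: (baseline climatology in deg C, anomaly in deg C) per station.
# These values are illustrative assumptions, not real GHCN or GISTEMP data.
lowland = [(15.0, 0.5), (14.0, 0.6), (16.0, 0.4)]
highland = [(2.0, 0.8), (1.0, 0.9)]  # colder sites, assumed to have larger anomalies


def mean(values):
    """Plain arithmetic mean (a simplification of the real gridded averaging)."""
    return sum(values) / len(values)


def summarize(stations):
    """Return (mean absolute temperature, mean anomaly) over a list of stations."""
    absolute = [base + anom for base, anom in stations]
    anomalies = [anom for _, anom in stations]
    return mean(absolute), mean(anomalies)


all_abs, all_anom = summarize(lowland + highland)
low_abs, low_anom = summarize(lowland)

print(f"all stations:  abs={all_abs:.2f}  anomaly={all_anom:.2f}")
print(f"lowland only:  abs={low_abs:.2f}  anomaly={low_anom:.2f}")
```

With these made-up numbers, dropping the two highland stations raises the mean absolute temperature from 10.24 to 15.50 °C, while the mean anomaly falls from 0.64 to 0.50 °C. That is why averaging anomalies, rather than absolute temperatures, keeps changes in the station mix from manufacturing a warming signal.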
caerbannog says

29 Nov 2009 at 10:26 AM

Of course the vast majority of armchair “skeptics” have no intention of doing any work. If they had, this post would have been unnecessary, as instead of crowing “where is the data”, they would have spent the 10 minutes on Google needed to find it themselves.

Perhaps it’s time for the RC folks to post a short piece devoted to soliciting informal research proposals from the skeptics here who have been demanding more access to data. These folks who have been loudly demanding full access to the data must be “chomping at the bit” to get started on their own independent research projects. As the ever-growing “Data Sources” page attests, there is an incredible variety of freely available raw and processed climatic data available to them. If what’s already out there isn’t sufficient, the skeptics could detail what additional data their research projects would require, and perhaps others here could help them get their hands on that data (all the while respecting IP guidelines, of course).

A proposal could be along the lines of “I don’t feel that UHI has been addressed adequately, so I am going to perform my own comparison of urban and rural temperature data to demonstrate that”, along with some follow-up details about temperature station selection, etc. Nothing terribly detailed and formal, mind you, just some quick “out there” ideas that skeptics feel haven’t been looked at closely enough.

And after the RC folks here have collected the skeptics’ research proposals, they might want to set up a page here devoted to publishing the skeptics’ results.
Christopher Hogan says

29 Nov 2009 at 10:36 AM

I finally got curious enough to download the GISS Model E from the NASA website.

In my opinion, people who say this is inadequately documented are simply stupid. The gzipped archive has a how-to document in HTML that walks you through running the model. The code is FORTRAN. About every 4th line is a comment describing what the program is doing. The variable names even (largely) make sense. What the @#$* else could you possibly want for a one-off piece of research software written in FORTRAN?

I haven’t written a large FORTRAN program in close to 20 years (I program in SAS). But this is perfectly readable code, even for somebody as out-of-practice as I am.

If I had a month or two, I could probably understand it in a fair level of detail. But if you think you’re going to be able to download it and find an error in a couple of hours, you don’t have your head on straight.
dhogaza says

29 Nov 2009 at 10:46 AM

“Perhaps it’s time for the RC folks to post a short piece devoted to soliciting informal research proposals from the skeptics here who have been demanding more access to data.”

Somewhere in one of these overburdened threads Gavin mentioned a “citizen science” project that would be “truly useful”.

Perhaps a list of “useful citizen science” projects that could be undertaken by people with a genuine interest might be interesting. As you say, “there is an incredible variety of freely available raw and processed climatic data available” … it could be a list of “useful armchair citizen science” projects, no field work required :)

I guess the difference between what you’re proposing and what I’m saying is that I’m suggesting ideas flow from researchers to citizens, rather than skeptics to researchers. I fear the latter case would result in a short list mostly boiling down to “I want to prove climate science is a fraud”.
Hans Olav Hygen says

29 Nov 2009 at 11:00 AM

I’m not sure if it was mentioned already, but rimfrost.no has a great collection of surface temperature data.
Jeffrey Davis says

29 Nov 2009 at 11:18 AM

“And after the RC folks here have collected the skeptics’ research proposals, they might want to set up a page here devoted to publishing the skeptics’ results.”

A forum for crank research? Yeah, there’s a crying need on the web for that.
