CRU Hack: More context

2 Dec 2009 by Gavin

Continuation of the older threads. Please scan those (even briefly) to see whether your point has already been dealt with. Let me know if there is something worth pulling from the comments to the main post.

In the meantime, read about why peer-review is a necessary but not sufficient condition for science to be worth looking at. Also, before you conclude that the emails have any impact on the science, read about the six easy steps that mean that CO2 (and the other greenhouse gases) are indeed likely to be a problem, and think specifically how anything in the emails affects them.

Update: The piece by Peter Kelemen at Columbia in Popular Mechanics is quite sensible, even if I don’t agree in all particulars.

Further update: Nature’s editorial.

Further, further update: Ben Santer’s mail (click on quoted text), the Mike Hulme op-ed, and Kevin Trenberth.


1285 Responses to "CRU Hack: More context"

Ken W says

9 Dec 2009 at 12:00 PM

Re: 645
Kevin,
I’ve rewritten tons of software in my career that came out of various research labs. What I’ve found is that they are usually very poorly written (no naming conventions, inefficient, etc., etc.) and lack all kinds of nice error-trapping, but they are accurate and generate correct results.

How can this be?

It’s because the research code is written by people who intimately understand the inputs, outputs, and processes that they are coding. In commercial software, where countless people (with no understanding of the inner workings of the software) can attempt all kinds of unanticipated things, you have to code against the possible (no matter how unlikely). But in research code, you can define very precisely the inputs and sequence of events, so it’s far easier to do it correctly.
Richard says

9 Dec 2009 at 12:07 PM

Re: “[Response: That was hyperbole born of frustration- no data was destroyed and none will be. And there are third party agreements. – gavin]”

Gavin, you’re a wonderful advocate for your cause, but at some point the layman, like me, has to ask how many strained explanations are too many? “Trick” is jargon. “Hide the decline” was about tree-rings not temperature and “hide” was just an unfortunate choice of words. “Redefin[ing] the peer review process” isn’t about rigging the game. An expressed willingness to break FOI laws is really just an expression of frustration.

I think Mr. Occam needs a shave.

Respectfully,

Richard Grath

[Response: I challenge you to publish 13 years of your email and have it be subjected to examination by hordes of hostile parties and then defend every single joke, ambiguous word choice or out of context quote they come up with. Unless you are some kind of saint, you would have just as many or more examples of things that can be spun to make you look bad. I wouldn’t wish that on my worst enemy. – gavin]
Timothy Chase says

9 Dec 2009 at 12:08 PM

Rod B wrote in 627:

Timothy Chase (602 [and 594]), your character assassination net casts a large trail. I’m impressed.

Maybe I’m getting a little of Atlas Shrugged out of my system. Then again there was James Taggart and the people he liked to do “business” with. Whatever. In my book identification precedes evaluation. Nothing is of greater importance than adherence to reality. SourceWatch has references for all the material I am relying upon. So much for the charge of character assassination.

All totalled I have 17 different organizations that were involved in both the tobacco and AGW denial campaigns — but I’m sure there are more. However, it does mean that a fair amount of the money from the Foundations (Scaife, Bradley, Koch and Coors) went to pay denial campaigns other than those related to AGW. So at a certain level this means that a large part of the Exxon disinformation network was already in place before Exxon got around to using it.

In this respect it reminds me of bacteria.

Scientists were surprised at how rapidly different strains and species of bacteria developed resistance to new antibiotics. But of course there was a small world lateral gene transfer network. It was/is highly efficient, small world, involving hubs and paths between distant nodes of the network that involve few jumps. And the reason why it exists isn’t simply in order to trade in genes for antibiotic resistance, but pathogenicity, metabolic processes, symbiosis, and other aspects of ecological fitness.

And yes, even antibiotic resistance. It turns out that sometimes strains of bacteria are already resistant to antibiotics when they are first developed, such that scientists have suggested testing new antibiotics against wild strains of bacteria found in the soil before taking them any further. The pinnacle of creation. We weren’t the ones that originated antibiotics: bacteria were, for use against competing strains. It was the original germ warfare.

AGW didn’t create the need for the network. It was acid rain. Tobacco. Lead paint. Asbestos. DDTs. CFCs. There is a reason why so many of these organizations espouse a libertarian ideology, why the funders have chosen to invest in it. Of course in some cases the funders genuinely believe in it. Perhaps the majority.

No matter. Cooption. In either case it makes the organizations multi-use. A large part of the money spent on a given organization in the present is capital invested for dealing with unseen contingencies and needs in the future. And what of the Religious Right? Richard Mellon Scaife was largely responsible for creating it as a political force in American politics. Some overlap in the organizations and funders. What role does it play? The attacks upon evolutionary biology? There is still a great deal I don’t understand.
Joe says

9 Dec 2009 at 12:17 PM

re: 647 John Pearson:

I’ve got to agree with Kevin King up to a point here. Note, the up to a point.

What seems to be ignored in all the comments about “reproducibility” is that the released code is actually at a very low level in the research. It’s concerned with building and maintaining the basic database, not with calculating and confirming / invalidating results.

As such, it’s highly unlikely that others have “written their own code to confirm” – any such confirmatory code will be at a higher level, but still based on data compiled by some pretty poor Fortran. The importance of exception handling has been understood pretty well since computers were invented and to include none in code handling huge amounts of data is foolish, bordering on negligent.

There is also the small matter, highlighted by the “rogue” precip value that caused an overflow for poor Harry, that the software doesn’t carry out even basic checks for obviously erroneous data. 4992mm of rain in a month in Syria is a fairly obvious rogue value but anomdtb doesn’t notice and crashes as a result.

The same module was used to process temperature sets but a temp set with similar error values that are an order of magnitude too high (entirely possible – 4992 was quite likely 492 with a double-tap on the 9) probably wouldn’t cause an overflow because its erroneous values would still be within the “normal” range of the precipitation.

Although it’s used with differing data ranges, the program doesn’t differentiate between the type of data it’s reading, or the likely values of that data. So you could quite easily re-name a precip file as temp and the program would happily insert 12 months of 100+ degree temperatures into the final dataset. I’m not suggesting for a minute that anyone would (intentionally) do that and such a big anomaly would almost certainly be spotted in later analysis. But the occasional mis-typed value here and there could easily slip through and produce a significant bias over time.

Protecting against that sort of error is such basic stuff that we were taught it during O level computer science over 28 years ago – hardly a “new, modern development” concept! Failing to add such very basic checking, even to in-house code, is a sign of either no appreciation of the issues (seriously, schoolboy level stuff guys) or intellectual laziness – take your pick which is more likely!
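
For readers who want to see what such a reasonableness check looks like, here is a minimal sketch in Python. It is not the CRU Fortran and makes no claim about how anomdtb is structured: the function name `check_monthly_values`, the variable labels and the plausible ranges are all assumptions chosen for illustration.

```python
# Hypothetical plausible ranges per variable. These limits are
# illustrative assumptions, not values taken from any CRU code.
PLAUSIBLE_RANGES = {
    "precip_mm": (0.0, 3000.0),   # monthly precipitation total, mm
    "temp_c": (-90.0, 60.0),      # monthly mean temperature, deg C
}


def check_monthly_values(variable, values):
    """Split values into (accepted, rejected) with a simple range test.

    Rejected entries come back as (index, value) pairs so they can be
    reviewed by hand instead of being merged silently into a dataset.
    """
    low, high = PLAUSIBLE_RANGES[variable]
    accepted, rejected = [], []
    for index, value in enumerate(values):
        if low <= value <= high:
            accepted.append(value)
        else:
            rejected.append((index, value))
    return accepted, rejected


# A 4992 mm month is flagged at read time rather than overflowing later.
ok, flagged = check_monthly_values("precip_mm", [12.0, 4992.0, 49.2])
print(flagged)

# A precipitation file mislabelled as temperature is caught the same way.
print(check_monthly_values("temp_c", [120.0, 15.0])[1])
```

Because the limits are keyed by variable type, the same routine can serve precipitation and temperature files without treating a 100+ value as normal in both.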
t_p_hamilton says

9 Dec 2009 at 1:00 PM

[Response: Oil companies have sometimes funded real scientists to do research on issues that were interesting to them. Deep time paleo-climate is a frequent subject for instance. But at no time do the companies sponsoring the research get to determine the outcome (at least not with any of the scientists I know). This is very different to oil companies sponsoring disinformation efforts on climate change which involve no research but plenty of misrepresentation instead. The former is fine, the latter, not so much. – gavin]

Funny how people who claim to follow the money never look at what it was used for!
kevin king says

9 Dec 2009 at 1:51 PM

Thank you, Joe. I’ll take up to a point. Maybe simply writing test results into a schema with adequate constraints would pick up data integrity issues, right?
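
A rough sketch of that idea, using Python's built-in `sqlite3` module. The table name `monthly_precip`, its columns, the 0 to 3000 mm limit and the helper `insert_reading` are all hypothetical; nothing here reflects how the CRU data are actually stored.

```python
import sqlite3

# In-memory database with a hypothetical schema. The CHECK limits are
# illustrative assumptions, not real climatological bounds.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE monthly_precip (
        station_id TEXT    NOT NULL,
        year       INTEGER NOT NULL,
        month      INTEGER NOT NULL CHECK (month BETWEEN 1 AND 12),
        precip_mm  REAL    NOT NULL CHECK (precip_mm BETWEEN 0 AND 3000),
        PRIMARY KEY (station_id, year, month)
    )
    """
)


def insert_reading(station_id, year, month, precip_mm):
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO monthly_precip VALUES (?, ?, ?, ?)",
                (station_id, year, month, precip_mm),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised for CHECK, NOT NULL and PRIMARY KEY violations alike.
        return False


print(insert_reading("STN001", 1990, 1, 49.2))    # plausible value
print(insert_reading("STN001", 1990, 2, 4992.0))  # out-of-range value
print(insert_reading("STN001", 1990, 1, 10.0))    # duplicate station/month
```

The constraints live with the data rather than in each program that writes to it, so every loader gets the same checks for free.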
KenW says

9 Dec 2009 at 2:04 PM

Re: 651
Richard: “Trick” is jargon.
KenW: Yes, it is jargon. Just the other day I told my wife, “here’s a great TRICK you can use in Excel”. No deception. No cover-up. Just a clever way to achieve something.

Richard: “Hide the decline” was about tree-rings not temperature and “hide” was just an unfortunate choice of words.
KenW: Yes, the decline was in a tree-ring dataset. A more precise way this could have been written is “clear up the confusion this known bad data would produce on a plot”. But since it was an e-mail between people that understand the context, there wasn’t any need to write it in that manner.

Richard: “Redefin[ing] the peer review process” isn’t about rigging the game.
KenW: Would you want to put your endorsement on a paper that contained what you knew to be inaccurate information? Any group project will contain efforts by some to weed out the bad work of others. That’s called integrity and due diligence. But alas, the paper in question ended up in the IPCC report anyways. So much for those powerful scientists rigging the game.

Richard: An expressed willingness to break FOI laws is really just an expression of frustration.
KenW: What would you do if you received dozens of requests for data (each of which would take significant time away from your already busy work schedule) by people you knew only intend to misrepresent your work and are seeking more ammunition to attack you with? I suspect you, like every other decent person, would at times express frustration and consider doing things you probably shouldn’t to eliminate the headaches.
Ray Ladbury says

9 Dec 2009 at 2:12 PM

Rod B. @627

Telling the truth is NOT character assassination. Issuing a selection of emails designed specifically to paint someone in a bad light is character assassination. In the former case, one is telling the truth; in the latter, one is distorting the truth. Do you see the difference, or do I need to draw you a map?
Hank Roberts says

9 Dec 2009 at 2:35 PM

Wano asks (569)
> can anybody debunk [American Thinker]

Yes; many places did, and long before AT wrote what you point to. You might say it’s been “predebunked.”

The key is to understand the first “cartoon” picture they start out with –the sketch with the bump called a ‘Medieval Warm’ period, which was a sketch, from few sources, of a local not global event, that is far better understood — and that’s why it disappears from more recent _global_ temperature charts, as shown.

It’s explained, for instance, here:
http://www.pewclimate.org/what_s_being_done/in_the_congress/7_27_06.cfm

What you’ve got there, Wano, is dead horse beaten into mushy hamburger. You can easily look this up for yourself.

Recommendation — talk to a librarian; use Google Scholar.
Don’t rely on some guy you don’t know, on a blog that you really want to trust for reasons unrelated to the science.
That’s falling for PR. Don’t trust _anybody_’s PR. Learn to look this stuff up for yourself.
manacker says

9 Dec 2009 at 2:37 PM

Gavin

I would fully agree with your comment (654) to t_p_hamilton.

Funding valid research work in any field increases knowledge and is, therefore, a good thing.

Funding “agenda driven science” is not a good thing, no matter who does it.

Max
manacker says

9 Dec 2009 at 2:44 PM

To the question “third party agreements” blocking disclosure under the requirements of the FOIA, is anyone here qualified to comment legally on which would take precedence?
manacker says

9 Dec 2009 at 2:59 PM

Gavin

You wrote

[Response: Research from government sources is not designed to ‘support the AGW premise’ – the vast majority supports the satellite programs which I’m pretty sure are not deciding which photon they register based on whether it supports warming or cooling. And the research grants that are funded at NSF can be looked up – and although I’ve challenged people repeatedly, no-one has ever found any evidence of claims that they are only funding research to support AGW. And finally, the research that oil companies pay for (and I have a number of colleagues who’ve received research grants on paleo-climate from the likes of Shell etc.) is perfectly fine in general and not tied to any political outcome. You are confusing the millions the oil companies have spent/spend on spreading disinformation with research. CFACT, CEI, FoS, Heartland etc. do not do scientific research. – gavin]

“Spreading disinformation” is a rather one-sided concept, which I’d rather not get into.

Is IPCC and are the key researchers who are cited by IPCC “seeking the truth” (about our planet’s climate and what makes it work) or are they “seeking the proof” (for anthropogenic causes)? What do you truly think?

The point was, Gavin, that there is far less money being spent by oil companies to disprove AGW than there is by governments to prove it (not fourteen times more, as SA had claimed).

Max

[Response: Wrong. The oil companies aren’t spending a cent trying to disprove AGW. They have instead spent millions trying to tell people that AGW is disproved. Big difference. Research money is not spent on PR efforts. – gavin]
Ian says

9 Dec 2009 at 3:08 PM

@Michael Dodge Thomas, #629, WUWT post “needs to be addressed on a priority basis”

Michael, a good start is to search the WUWT thread you linked to for Nick Stokes’ comments.
JLS says

9 Dec 2009 at 3:21 PM

David,

On your (430), agreed these are valid, although statement 4, atmospheric CO2 == AGW, IMHO is grounded as a correlation supporting the only available (ok) yet unfalsifiable (not ok) hypothesis. Please understand, I’m no climate expert, but the arguments of many real ones including some here lead me to accept that simulated scenarios may well be the best the models can “discover” to resolve things. Causation obviously cannot be empirically confirmed by experimentation at the required global scale.

I can see that the data shows a recent warming trend coincident with a trend of increasing population and economic outputs. And long-term CO2 levels are of the greatest concern simply because these pose an unknown, uncontrolled, and possibly runaway risk, and therefore mitigation strategies ought be researched and readied to scale in the event outcomes do start to emerge dismally.
Dendrite says

9 Dec 2009 at 4:02 PM

I have been shocked and appalled to read of the connections between the AGW deniers (I now feel justified in using the term) and other denial campaigns such as those associated with the health risks of tobacco and the environmental risks of acid rain. This may be old news to some in the US, but I suspect that many of us in Europe are not as clued up on these things and will be as outraged as I am.

These connections:
1) call into question the motives of the denialists, and
2) call into question their scientific judgement and competence. At best, they have a history of backing the wrong side, at worst they have shown a total disregard for scientific evidence and the principles of scientific practice.

To think that I have spent hours being preached at, in print and even on TV, about ‘standards’ and ‘how science should be conducted’ by people with this kind of pedigree.

Thanks to Timothy Chase (posts 594, 602, 627) for the info.

‘Climategate’ has certainly been an eye-opener for me.
t_p_hamilton says

9 Dec 2009 at 4:21 PM

Max Anacker seems to be determined to be exhibit 1 for my statement “Funny how people who claim to follow the money never look at what it was used for!”:

“Funding valid research work in any field increases knowledge and is, therefore, a good thing.

Funding “agenda driven science” is not a good thing, no matter who does it.”

Have you looked yet? Gavin gave some places to look for “government science”, i.e. the NSF which has every funded proposal abstract online. Go to http://www.nsf.gov and search awards for climate change. The first hit is:

Award number 0909523

Abstract

Rapid changes in the arctic climate system that occurred in the relatively recent past can be compared with the output of climate models to improve the understanding of the processes responsible for nonlinear system change. This study focuses on the transition between the Holocene thermal maximum (HTM) and the onset of Neoglaciation, and on the step-like changes that occurred subsequently during the late Holocene. The millennial-scale cooling trend that followed the HTM coincides with the decrease in Northern Hemisphere summer insolation driven by slow changes in Earth’s orbit. Despite the nearly linear forcing, the transition from the HTM to the Little Ice Age (1500-1900 AD) was neither gradual nor uniform. To understand how feedbacks and perturbations result in rapid changes, a geographically distributed network of proxy climate records will be used to study the spatial and temporal patterns of change, and to quantify the magnitude of change during these transitions.

rest of abstract deleted, you can read it yourself if interested, or any of the other 2000 proposals that have climate change as keywords.

The agenda appears to be a better temperature record and how it can be used to understand “tipping points”. People will be paid to go out into the field and collect data and analyze that data and publish papers. The annual and final reports will include the peer-reviewed papers that result from this work.

Now where is the Competitive Enterprise Institute money going? I’ll tell you, it is PR, nothing more nothing less. No data collection, no peer reviewed papers.

This is because the National Science Foundation is interested in science, the Competitive Enterprise Institute is not. The names of the organizations should be a big hint.
manacker says

9 Dec 2009 at 4:24 PM

Hank Roberts (659)

“The key is to understand the first “cartoon” picture they start out with –the sketch with the bump called a ‘Medieval Warm’ period, which was a sketch, from few sources, of a local not global event, that is far better understood — and that’s why it disappears from more recent _global_ temperature charts, as shown.”

Sorry, Hank, there have been over twenty studies using all sorts of paleoclimate reconstructions from all over the world, which confirm that the MWP was somewhat warmer than the current period.

[edit]
As you can see, the evidence for a global MWP with temperatures somewhat higher than today is overwhelming.

There is no need to rely on a single study (with a poor correlation between tree-ring data and physically observed temperature after 1960) or any of the “copy-hockeysticks” that followed it.

Max

Links to follow separately

[Response: 20 studies? Gosh. Which have ‘warm periods’ at different times. And all the other studies that don’t show it? You are falling into the same wishful thinking trap as Soon and Baliunas – only looking for what you want to see. (NB. Please do not spam the site with links). – gavin]
ZT says

9 Dec 2009 at 4:45 PM

I have to agree with Kevin King. The CRU data and code should represent software and scientific reproducibility, and be appropriately secured, backed up, commented, etc.

To try to defend a lack of reproducibility, documentation, or even interest in maintaining the integrity of the data is brave and loyal to the cause, but stretching things.

How would you like it if your medical records were not backed up but stored chaotically in a folder called ‘documents’ on a doctor’s hard drive?

Taxpayers invested millions obtaining this information and it has been treated as a political plaything.

It should be open sourced immediately. If some of the ‘proprietary’ data can’t be opened – no problem – leave that out. If the effects of global warming are so severe, then (presumably) that warming will still show up. This will end the debate. The skeptical scientists will be able to satisfy their skepticism (which is not a bad quality for a scientist to have) and the true believers will be able to ascend to seventh heaven – safe in the knowledge that not only did they save the planet but that they did it without impeding analysis. (Or even having to invent excuses for impeding analysis).
David B. Benson says

9 Dec 2009 at 4:49 PM

JLS (664) — The science supporting atmospheric CO2 == AGW is impeccable, thoroughly researched, and essentially all done by 1979. Please read the summary of
http://books.nap.edu/openbook.php?record_id=12181&page=1

Outcomes are already dismal in areas depending upon glacial meltwater. In addition, we are already seeing some of the 1 K predictions from Mark Lynas’s “Six Degrees”. Here is a review:
http://www.timesonline.co.uk/tol/news/uk/science/article1480669.ece
dhogaza says

9 Dec 2009 at 4:52 PM

To the question “third party agreements” blocking disclosure under the requirements of the FOIA, is anyone here qualified to comment legally on which would take precedence?

No, but the information officer who evaluated the FOI request, as well as the one who reviewed it upon appeal, are, and stated that the balance lies with the third party agreements. I believe – you can look this up yourself – that the fact that the information is available from those third parties was one of the reasons for this determination.

The US FOIA is even more restrictive in this regard – a FOIA can not be used to get ahold of such information under any circumstance. Only stuff – data, photos, whatever – that the FOIA’d agency has the legal right to distribute will be given out. You can read about these restrictions on any number of agency sites discussing the FOIA process and what is, and is not, subject to release as a result of a FOIA request.
Robert Butler says

9 Dec 2009 at 4:54 PM

I am more a student of people than climate. I’m more at home on an internet forum that discusses history and politics than science. And yet we too have a global warming thread.

A major aspect of the problem is values and world views. It is very difficult for an individual to reevaluate the basic principles and perspectives through which he navigates the world. Further, if an individual cannot calmly and rationally reevaluate whether he should change, say, his religion or political party, how much more difficult would it be for an entire culture to change, for a significant fraction of the population to let go and begin anew?

In the United States we have had this happen, but rarely… roughly every four score and seven years. We’ve had our Revolution, Civil War, and Great Depression. After each of these times of turmoil, we came out very different from what we were before. Alas, it takes a great hardship for a people to collectively reject old values. The old way of doing things must in general totally fail. If one wants to understand the degree of devastation it takes for a culture to collectively let go of dated values, one might examine pictures of Atlanta after Sherman rode through, or Berlin after Hitler’s suicide. People will cling to old values with precious fanaticism.

Today there seems to be a required shift from a world where economics provides the dominant values to one where ecology must take its place. People have kept trying to shovel dirt on Malthus’ grave. It has never quite happened. Still, there are too many people, and limited resources. His voice still speaks no matter how much dirt is thrown.

On the Fourth Turning forums, the libertarians, conservatives and anarchists are universally skeptics on the issue of climate change. Their way of looking at the world sees big government as a problem… and not without reason. There is much that is less than perfect about big government. There is a problem in that their political values are stronger than their scientific values. The role of government in perpetuating much that is wrong with their world is vitally important in how they see the world. Any solution to any problem that involves big government exerting greater authority will be rejected on a values basis.

Scientific integrity and adjusting one’s theories to fit observed data? Not so much. The bar required, for those dominated by political values, is not a matter of writing a better peer reviewed paper than the other guy. The bar is much higher. Such individuals must perceive their political values to have failed them in a catastrophic way before they will be willing to reevaluate their world views and accept that their culture must move on.

Which is a problem. The Confederacy or the Axis Powers could surrender, and the reconstruction began almost immediately. Mother Nature will not respond to such a surrender as graciously. She is not so forgiving.

I suspect everyone knows this on an intuitive level. I just felt a need to state it more formally. The ability of man to view the world objectively, and to change his mind in response to new information and peril, is really very limited. World views and values are necessary for cultures to be coherent and stable, for individuals to have quick and consistent answers to common problems. Cultures do need change. Given rapidly changing technology, the need for cultures to change has been far greater than the norm in recent centuries. Still, the process of change is invariably traumatic. People won’t change until they are virtually forced to do so.

And by the time the effects of Global Warming become traumatic enough for many to reconsider their values, it may very well be too late for anything approximating a soft landing.

All that philosophical rumbling aside, I’ve got problems in the latest skirmish of the moment. People might want to do a search on “darwin zero smoking gun.” The skeptics have started another skirmish, and there isn’t much available on the net yet to counter.
Wano says

9 Dec 2009 at 5:19 PM

#659 Thanks Hank

Just writing a piece about PR.
Ken W says

9 Dec 2009 at 5:23 PM

Re: 662
Max wrote:
“The point was, Gavin, that there is far less money being spent by oil companies to disprove AGW than there is by governments to prove it (not fourteen times more, as SA had claimed).”

Research money isn’t spent by governments “to prove it”. Research money is spent by government to gain an understanding of it. If that research somehow supports the truth of AGW (which I’d certainly expect to be the case if the theory has any validity), that’s a side effect, not the intent.
SecularAnimist says

9 Dec 2009 at 5:25 PM

manacker wrote: “there is far less money being spent by oil companies to disprove AGW than there is by governments to prove it”

gavin replied: “The oil companies aren’t spending a cent trying to disprove AGW. They have instead spent millions trying to tell people that AGW is disproved.”

Moreover, governments aren’t spending a cent trying to “prove” AGW. The purpose of climate research is not to “prove” anything, but to find out what is actually going on.

So manacker has managed to put two distinct falsehoods into a single sentence.
SecularAnimist says

9 Dec 2009 at 5:30 PM

JLS wrote: “Causation obviously cannot be empirically confirmed by experimentation at the required global scale.”

Wrong. Anthropogenic causation has been empirically confirmed.

JLS wrote: “… mitigation strategies ought be researched and readied to scale in the event outcomes do start to emerge dismally.”

Outcomes are already “emerging dismally”. The ice is melting. The oceans are acidifying (and warming). The forests are dying. The deserts are spreading. The crops are failing. The seas are rising. The permafrost is thawing. All of this is already happening, rapidly, as a result of the anthropogenic global warming.

If it isn’t already too late to stop it, then if we wait for things to get any more “dismal” than they already are, it surely will be too late.
Ken W says

9 Dec 2009 at 5:32 PM

Re: 664
JLS,
If you’re not familiar with Spencer Weart’s “The Discovery of Global Warming” page, I’d encourage you to spend some time reading it.

http://www.aip.org/history/climate/index.html
Joe says

9 Dec 2009 at 5:37 PM

re 656 Kevin King,

It would, but it’s a relatively modern technique (certainly wrt non-pro development). That’s what I meant by “to a point” – the critics on here are quite right that modern professional development techniques probably don’t have a place in this sort of work.

What they seem to forget, though, is that those techniques are only formalisation of best practice to make problems as easy to trace as possible. They’re purely designed to make commercial development relatively painless. You have to take an enormous step back from BEST practice to throw ALL error checking out the window and trust to the Gods (or the typists in some weather station somewhere), which is what a lot of this code seems to do.

At the time this code seems to have been written the minimum acceptable approach, for anything remotely important, would have been to check input values for reasonableness – so recorded rainfall of 4000mm in a month would never have had a chance to throw an overflow. Similarly, all those temps (that might have been) typed in as 330 degrees instead of 30 would have been caught as the dataset was being built – once they’re incorporated it becomes very hard to spot them and virtually impossible to remove them without reprocessing everything.
Philip Machanick says

9 Dec 2009 at 5:50 PM

RaymondT #614: no one here has any objection to genuine skeptics, people who are not gullible and want to understand for themselves. I’m not an expert either and I try to help, knowing the real experts are too busy to help with every question (but someone will for sure correct me if I get things wrong).

Go back and read the McLean et al. 2009 paper again. Look for this text:

To remove the noise, the absolute values were replaced with derivative values based on variations. Here the derivative is the 12-month running average subtracted from the same average for data 12 months later.

Try this technique on any data set of your choice (the function y=rand()+Ax will do nicely with A an arbitrary constant; vary A and see what happens — if you use a spreadsheet you may have to compute the random numbers once then copy and paste them as values to stop them recalculating). Or better still, think back to your first week of calculus. The effect of this data manipulation is to delete any linear trend from the data. If the data consists primarily of ENSO plus a linear trend, what you are left with is only variation due to ENSO.
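
If you would rather not use a spreadsheet, the experiment can be sketched in plain Python. This is only an illustration of the arithmetic described in the quoted text: the function names, the 600-point length and the values of A are arbitrary choices made here, and nothing below is taken from the paper's own code.

```python
import random


def running_mean(series, window=12):
    """Mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def lagged_difference_of_means(series, window=12, lag=12):
    """Running mean `lag` steps later minus the running mean now."""
    means = running_mean(series, window)
    return [means[i + lag] - means[i] for i in range(len(means) - lag)]


def ols_slope(series):
    """Least-squares slope of the series against its index."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


random.seed(0)  # fixed seed so every A is tested against the same noise
noise = [random.random() for _ in range(600)]

for A in (0.0, 0.01, 0.1):
    y = [noise[x] + A * x for x in range(600)]
    d = lagged_difference_of_means(y)
    print(f"A={A}: slope of y = {ols_slope(y):.4f}, "
          f"slope of differenced series = {ols_slope(d):.4f}, "
          f"mean of differenced series = {sum(d) / len(d):.4f}")
```

For the pure trend term Ax, each 12-point running mean is A times the window midpoint, so the difference between means 12 steps apart is the constant 12A. The trend therefore survives only as an offset in the differenced series, and an offset has no effect on a correlation.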

I wish I could get away with publishing such sloppy science. Then again, no. I have some self esteem.
Lee A. Arnold says

9 Dec 2009 at 5:53 PM

Max #632 Since almost all the corporations are using loopholes and subsidies to pay far less than the official U.S. federal corporate income tax rate of around 35%, and many corporations are paying next to nothing, (I think Exxon Mobil paid about 7%-8% in U.S. federal income tax, last year,) I think most tax economists would say that the “net flow of money” is actually going to the energy companies, out of other taxpayers’ pockets.

But don’t keep avoiding the main question: Why hasn’t the fossil fuel industry disproved any of the main findings in climate science?

If there were truly damning evidence in the emails, then it ought to be easy, and disproving it should not have taken a lot of money.

Is that why you also need the excuse that the data isn’t available? — but this turns out to be a lie. It’s all available, and what isn’t, you can purchase for yourself from the weather bureaus.
Philip Machanick says

9 Dec 2009 at 7:25 PM

Some people are arguing as if this is a complete archive (especially on meaning extracted from “HARRY_READ_ME.txt”). I extracted the dates from the emails and sorted them. The distribution looks very suss. There are months with no emails at all, and one month with over 50. Mean 6.5 per month, stddev 7.3 (you can get the mean if you are lazy by counting the files and dividing by the total months, but I wanted to plot the distribution as well).
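A rough sketch of how such per-month statistics could be computed once the dates have been extracted. The dates below are made up for illustration and are not taken from the archive, and `monthly_counts` is a hypothetical helper. Whether the 7.3 figure above is a population or a sample standard deviation is not stated; the population form is used here as an assumption.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def monthly_counts(dates):
    """Count items per calendar month from the first to the last date.

    Months with no items are kept as zeros so they still pull the mean down.
    """
    counts = Counter((d.year, d.month) for d in dates)
    first, last = min(dates), max(dates)
    year, month = first.year, first.month
    result = []
    while (year, month) <= (last.year, last.month):
        result.append(counts.get((year, month), 0))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return result


# Made-up dates purely for illustration.
sample = [date(2009, 1, 5), date(2009, 1, 20), date(2009, 3, 2), date(2009, 4, 9)]
counts = monthly_counts(sample)
print("per-month counts:", counts)
print("mean:", mean(counts), "stddev:", pstdev(counts))
```

Keeping the empty months in the list matters: dropping them would inflate the mean and hide exactly the gaps that look suspicious.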

I would expect a very busy research group to have a fairly constant flow of emails, slowly increasing since the 1990s as other modes of communication phased out. Obviously there will be some bursts of activity around deadlines but the pattern I see here doesn’t look right (seriously, a whole research group averaging 6.5 emails per month?) unless a lot of emails have been (possibly selectively) omitted.

My own work inbox has 1750 mails since September last year, and I delete inconsequential emails. That’s about 120 emails per month, and I have nowhere near the level of collaborations going on that this group has.

If a lot of emails have been left out, arguing over how well a coding exercise was documented because the only record of that is a single copy of a coder’s personal log is silly. How do we know this is the only documentation for that project? It’s one thing not to let the facts spoil a good argument. It’s quite another to argue on the basis of a lack of facts.
Ray Ladbury says

9 Dec 2009 at 8:10 PM

JLS, WRONG!!

CO2 was identified as a greenhouse gas in 1824 by Joseph Fourier.

Global warming due to anthropogenic CO2 has been predicted at various times going back to around the turn of the last century (Svante Arrhenius).

Warming was observed unmistakably from 1975 through the present, thus confirming the prediction. We have both correlation AND a mechanism, not to mention that the warming has an unmistakable greenhouse signature (e.g. simultaneous tropospheric warming and stratospheric cooling).
John Mashey says

9 Dec 2009 at 8:28 PM

re: #645 Kevin King

You still haven’t said how long you’ve been at this, which I asked to figure out where to start. As it happens, some of the tools and techniques you tell me about are *specifically* descended from ones I invented/helped invent/popularize in the 1970s and early 1980s, to the point they are taken for granted.
I suspect this might precede your writing much code.

By now, most software development is done on UNIX variants (including MacOS & Linux) or Windows, regardless of the target environment, which didn’t use to be true. People are used to having trouble-report systems and source-code control systems (like CVS). People are used to having scripting languages of various kinds to automate procedural tasks. People are used to having Makefiles. Some people are used to having powerful test-frame systems for regression & performance tests.

But in the early 1970s, these rarely existed outside a few of the biggest, most sophisticated programming shops. There was quite a bit of manual labor, with people recommending programming teams with “program librarians.” Most larger projects were on mainframes.

So where did some of this come from? As in science, things build on others’ work, BUT:

{UNIX, C, make} came from Bell Labs/Murray Hill, but the early 1970s efforts to make UNIX more useful for more audiences were centered around one particular project:

Programmer’s Workbench, dreamed up by Evan Ivie, done to support a 1000+-programmer staff writing code for IBM, Univac, and XDS computers, among others.

“PWB/Unix Shell”. “Shell programming” enabled large numbers of programmers to whip up procedural automation of all sorts without spending too much time doing it. This is also the origin of features that {Steve Bourne, Dennis Ritchie, and I} later generalized into Environment Variables for UNIX 7th Edition. That’s what lets one avoid hard-coded pathnames, which was one of the reasons for creating it in the first place. The PWB shell was also where variable search path ($PATH) started. Of course, all this (more-or-less) has been used in every UNIX variant since then, plus Windows besides.
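To illustrate the point about avoiding hard-coded pathnames, here is a small modern sketch in Python rather than in the PWB shell. The variable name `PROJECT_DATA_DIR` and both function names are hypothetical, and the search function is a simplified stand-in for what a shell does with $PATH.

```python
import os
from pathlib import Path


def data_dir(env=os.environ):
    """Return the data directory named by PROJECT_DATA_DIR, else a default.

    PROJECT_DATA_DIR is a made-up variable name used only for this example.
    """
    return Path(env.get("PROJECT_DATA_DIR", "data"))


def find_on_path(program, env=os.environ):
    """Return the first file named `program` in a PATH directory, or None."""
    for directory in env.get("PATH", "").split(os.pathsep):
        if directory and (Path(directory) / program).is_file():
            return Path(directory) / program
    return None


print("data directory:", data_dir())
print("with override: ", data_dir({"PROJECT_DATA_DIR": "/srv/shared/data"}))
```

The same program can then run unchanged on another machine or for another user by changing the environment instead of the source. For real use, the standard library's `shutil.which` already performs the $PATH search, including the executable check omitted here.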

SCCS was the progenitor of most modern source code versioning systems, i.e., SCCS=>RCS=>CVS, for example. Marc Rochkind (next office) wrote the first one, and later, he and Alan Glasser did more (My office-mate, so we had a lot of discussions, with delta-charts all over our whiteboard. That was 1974-1976.)

PWB/UNIX had a big emphasis on document preparation, and sometimes on embedding documentation in source code in ways that programs could pull it out and turn it into documents as need be, i.e., source code and documents were all part of combined repositories.
That’s part of what the “MM macros” were about.

LEAP was a PWB subsystem that let a minicomputer fake being a bunch of mainframe terminals to run regression and performance tests. (That’s a piece I had nothing to do with.)

We talked about this publicly for the first time in late 1976, at which time we were running the largest UNIX computer center, and had done a lot of work on reliability and security. Of the 6-paper set we presented Oct 1976, here was the Introduction. While this may seem mundane now, at the time it was pretty far ahead of normal practice, particularly at its relatively low cost.

BUT STILL, at Bell Labs, even though we had this technology in the 1970s, and big software engineering projects used it, our research scientists didn’t use much, for all the reasons others have mentioned. We had multiple projects in the 300-programmer size range, and some of them had far more stringent requirements for performance, real-time-response, and reliability than most programmers are used to. Bell Labs had everything from one-off code written by researchers for their own use, to massive codes that had to run big databases in complex networks, or run switching machines that simply *could not go down* (target: 2 hours in 40 years), and that had incredibly complex update strategies, because switches had to get new OS software releases while they were running.

The appropriate level of software engineering methodology varied hugely by project, and we used to make sure (I helped teach the internal course for software project management, sometimes) that people picked appropriate levels.

Agile programming is very nice where it works, but it is hardly new. See Small is Beautiful and Software Army on the March, two talks created around 1977 and 1982, respectively. Thousands of people heard these talks.

But, anyway, your comments seem just fine for doing certain classes of software, but you seriously may want to consider the idea that some other posters here just might have a bit broader experience and even know what they are talking about :-)
Timothy Chase says

9 Dec 2009 at 8:32 PM

JLS wrote in 664:

David,

on your(430), agreed these are valid, although statement 4, atmospheric CO2 == AGW, IMHO is grounded as a correlation supporting the only available(ok) yet unfalsifiable(not ok) hypothesis.

If carbon dioxide is anthropogenic — due to the burning of fossil fuel — the ratio of carbon-12 to carbon-14 will increase over time.

Falsifiable? Yes. The result? The ratio of carbon-12 to carbon-14 has increased over time, strongly suggesting that it is anthropogenic in origin.

If carbon dioxide is anthropogenic, then the percentage of oxygen in the atmosphere should be decreasing over time.

Falsifiable? Yes. The result? The percentage of oxygen in the atmosphere has been decreasing over time, strongly suggesting the carbon dioxide is the result of the combustion of fossil fuel.

(NOTE: There are additional arguments — and more exact arguments in terms of carbon accounting, but the above should be easy enough for most people to understand.)

Then again, there is a general problem with the principle of falsifiability in that strictly speaking no modern theory can be tested independently of all other theories. I go into it here.

But one can and should strive to make one’s theories as falsifiable as possible. And unlike the Omphalosian argument that so many “creationist theories” resemble, the “theory” that the major source of additional carbon dioxide is anthropogenic in origin (like the “theory” that there is a sun which illuminates the world — which I see when I stick my head out the window) certainly meets that requirement.

However if we keep emitting carbon dioxide it is quite possible that the major source of carbon dioxide will be “natural” (in a sense) — due to positive feedback from the carbon cycle, e.g., if the ocean temperature rises enough then it will begin to degas the carbon dioxide that is currently suspended in it, if the temperature of permafrost rises sufficiently then it will emit methane that will decompose, resulting in more carbon dioxide, etc.

I hope this helps…

PS

I strongly suspect David knew all this already. In fact his understanding of the science is probably just as good as mine if not better — but I wanted to respond just in case David didn’t see your comment right away. After all, it has been about four days since he wrote what you just responded to.
JLS says

9 Dec 2009 at 10:06 PM

Barton,

on your (641), JLS: “Nuclear Winter” was neither good modelled climate science.

BPL: Actually, it was. The famous 1984 “Nuclear Autumn” paper righties still cite as “disproving” it got the plume heights wrong by a factor of three. See Turco et al. 1991.

I based this on cites of Carl Sagan conceding the actual climatic effects of the 1991 Gulf War did not meet predictions derived from work done by the TTAPS group. It could be the plumes from the burning Kuwaiti fields were supposedly not lofted high enough, but the point is much in this field remains disputed or unconfirmed. That said, I’m aware that Pinatubo did validate other of the scenarios.
Jason says

9 Dec 2009 at 10:25 PM

Re: 645 Kevin King:

Just have to totally disagree with 631.
If you don’t test basic code you’ve got a problem.
You’ve evidently never worked in a production environment.
Exception handling is the thing. Ah ha..the CRU doesn’t write
code for production environments. Okay but It still seems to
me the need to handle exceptions correctly and consistently is
important. Exceptions are not handled correctly in some of the code from the CRU.

Exception handling is a means to an end, not an end in itself.

I don’t bother including exception handling in the little tools that I write for my own personal use, either.

Why? Because I’m writing a program, not creating a product. A program is something that works if you get the inputs just right and know how to use it; a product will still do something sensible no matter what inputs you give it and no matter what you do to it at each step.

It is poor engineering (in the sense of not optimising resource use) to put the effort required into making a product when all you needed was a program because a product easily costs ten times as much to create.

What really matters in this context is whether it gives the right result, and so far nobody seems to have found a bug in the released code that, when fixed, gets rid of the hockey stick or anything else — just as no serious bugs were found in the GISS code when that was released years ago. This should not be surprising, because if there were serious bugs in the code, they would have led it to produce results inconsistent with all the other temperature reconstructions out there, and the programmers would have noticed this and fixed them.

Instead, they’re complaining that CRU wasn’t using software development methodologies that even the biggest commercial software houses don’t always use!

And John Mashey 633. The emails talk about different releases of
the software. There are multiple versions. Are you seriously suggesting no releases are made and
not stored in a repository that is regularly backed up? Are they
stored on the developer’s laptop?

Even as a computer science postgraduate many years ago I didn’t use a “proper” version control system. I just created a tarball of my current source code whenever a milestone was reached (e.g. a paper that relied on it was published) or I was about to embark on a major modification.

Combined with nightly server backups that would allow me to go back to an earlier date at will (but with some effort) it was fine and minimally intrusive. (I don’t know why you are suggesting that the code might not even have been backed up and was perhaps stored on the developer’s laptop since it seems to have been stolen from a central server. You seem to think that if they didn’t implement Best Practices they must have gone out of their way to do the opposite…)
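The milestone-tarball habit described above can be sketched as a small Python helper. The function name, the archive naming scheme, and the labels are assumptions made for illustration, not a description of what any research group actually ran.

```python
import tarfile
from datetime import date
from pathlib import Path


def snapshot(source_dir, archive_dir, label):
    """Write source_dir into a dated, labelled .tar.gz and return its path.

    The archive directory should sit outside source_dir, otherwise earlier
    snapshots would be swept into later ones.
    """
    source = Path(source_dir).resolve()
    target_dir = Path(archive_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    name = f"{source.name}-{date.today().isoformat()}-{label}.tar.gz"
    target = target_dir / name
    with tarfile.open(target, "w:gz") as archive:
        # arcname keeps paths inside the archive relative to the project folder
        archive.add(source, arcname=source.name)
    return target
```

Calling something like `snapshot("recon", "snapshots", "paper-submitted")` at each milestone gives a set of dated archives that can be unpacked and compared later, which is crude next to a version control system but needs no setup.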

Things have come a long way since then but I don’t know why you are so shocked that scientists in other fields aren’t using formal version control systems. People did manage to get by before CVS was invented, you know. (In fact, I did try out RCS — the precursor to CVS — for a while, but actually went back to my old system fairly quickly.) Frankly, your comments about version control sound too much like those who ridicule anybody still using Subversion instead of Git.

You really don’t seem to understand that practices that can be justified in a multi-developer team releasing commercial software products over a long period of time and supporting customers running different versions of those products just aren’t required for simple tools that are run by their authors to transform some data based on some mathematical formulas.

I would actually have to advertise agile development here
and simple unit testing. Okay I have no experience of coding Fortran
modules but let’s assume the following.
I write a piece of code that runs a collection of combined algorithms
that I put together to combine data from 2 different datasets.
I now add another piece of code that uses the above that I’ve
packed off undocumented in some file I forgot to name correctly. Let’s
call it fudge.gy.
The new piece of functionality appears to work. It returns apparently
reasonable results. Unfortunately I’m not aware of the fact the
script I forgot to test is silently swallowing exceptions and returning
unreliable data. Had I constructed some unit tests against which
I could compare input and output data I would have picked this up.

Yes, if there was a bug, or if the data being fed into the software failed to meet the assumptions placed upon it by the researcher, then adding in unit tests and exception handling could well have told them right away that the bug in the code or the data existed.

But if the bug had no impact on the result (allowing it to go undetected), or if it did have an impact on the result but the cause took longer to find because of the lack of testing, then this may still have been the right tradeoff compared to putting in all the extra time and effort to include the unit tests and exception handling code.

I mean okay, let’s skip the continuous builds. Let’s just
have a simple reproducible set of unit tests where I can test input data
against output data. Surely that’s not too much to ask.

Don’t forget that they would no doubt be comparing the output from a freshly modified version of the program to the output of the previous version and if the result was different to what they expected based on the theoretical basis of the modification they made, they would investigate. There’s a big difference between “adopting the latest agile development fad holus-bolus” and “completely failing to check the results”.
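The comparison described above, checking a freshly modified program's output against the previous version's, can be as simple as the following sketch. The function name and the tolerance are hypothetical choices, and the two lists stand in for whatever numeric series the program writes out.

```python
import math


def compare_outputs(previous, current, tolerance=1e-6):
    """Return the indices where two output series differ by more than tolerance."""
    if len(previous) != len(current):
        raise ValueError("outputs have different lengths")
    return [
        i
        for i, (old, new) in enumerate(zip(previous, current))
        if not math.isclose(old, new, rel_tol=0.0, abs_tol=tolerance)
    ]


# Stand-in numbers: output before and after a modification.
before = [0.12, 0.15, 0.11, 0.20]
after = [0.12, 0.15, 0.31, 0.20]
print("changed positions:", compare_outputs(before, after))
```

Any position reported here is either the intended effect of the modification or something to investigate, which is the informal check the comment describes.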

And 635. Of course I have seen worse code. But trillions of dollars
rest on code like this. Not much of an excuse.

No, they don’t. As I said. Nobody is saying policy should be decided on the back of what this code produces. It is one tiny piece of a much larger picture, and if it didn’t fit then it would be questioned.

Think about it: GISS released all their source code and data years ago. It shows much the same thing as CRU’s code, with the differences explainable by the differences between their methodologies. Nobody has found any bugs in the GISS code that materially change the results, and so far I haven’t seen any reports of bugs in the CRU code that materially change the results.

People can, and do, write working programs without all the latest large scale software development Best Practices. Criticising the development methodology without any evidence that the results are wrong or that it caused the software to be more expensive to develop seems odd to me.
Steve Fish says

9 Dec 2009 at 10:29 PM

For all the programming experts commenting on the stolen code:

Without further information from the folks involved, you have absolutely no way of knowing what the importance of the code snippet in the stolen e-mails was. Gavin has suggested that it was not involved in any finished product (published study). If you doubt this, or are just concerned, there is a simple solution.

Much of the results published from the CRU data has been replicated using other databases and code that is freely available. You can obtain the code and run your own verification on the data, or just analyze the code for accurate output. All of this speculation is very, very empty, especially if you are actually a programmer. So, prove your point by making a contribution to the science. If you have said that openness and replication is fundamental to science, and the outcome is important to our society, prove it.

Steve
Steve Fish says

9 Dec 2009 at 10:46 PM

Comment by JLS — 9 December 2009 @ 3:21 PM:

You appear to be saying that CO2-greenhouse warming, described by climate science, is just the result of a descriptive statistical model and simple correlation. It is not. If I have misunderstood what you have said, never mind, otherwise go study.

Steve
Philip Machanick says

9 Dec 2009 at 11:48 PM

RaymondT #614: more on the McLean et al. 2009 paper here, specifically a claim from Carter that temperature lags ENSO by 7 months, which is probably not far off. What is odd is that Carter tries to claim that the paper is not about trends yet claims to have eliminated the possibility of CO2 as a major driver of temperature. Since his analysis method eliminates the trend, his paper certainly is not about trends, but then he cannot claim to have accounted for a factor that is linearly increasing.
Kevin McKinney says

10 Dec 2009 at 12:35 AM

Ray, a rare slip on your part, I’m afraid.

Fourier’s 1824 paper was the first to enunciate the conceptual framework of greenhouse warming, identifying (and quantifying) it as a normal part of the Earth’s climate system.

But attribution to CO2 & water vapor had to wait for Tyndall, 1860 or 61.

Needless to say, your larger point stands.

Those interested can access these classic GW papers and many others here.
James A says

10 Dec 2009 at 12:47 AM

Have you checked http://www.informationisbeautiful.net/? Go there, you might like what you see (And may want to link that picture, too)
MR SH says

10 Dec 2009 at 1:14 AM

Gavin, I read the papers again and concluded that the ‘trick’ Phil wrote about is no more than the exclusion of the outliers of the series reconstructed from tree rings.
If so, most empirical statistical analysts do something similar daily, like applying a low-pass filter (LPF) in signal processing.

Is my understanding correct?

thanks

[Response: No. He was trying to construct a long term smoothed temperature record without odd end-point effects in the middle. – gavin]
manacker says

10 Dec 2009 at 2:36 AM

Robert Butler (670)

You wrote:

“the libertarians, conservatives and anarchists are universally skeptics on the issue of climate change. Their way of looking at the world sees big government as a problem… and not without reason.”

This line of reasoning attempts to limit the debate on AGW to a strictly political one, rather than one about the validity of the supporting science. This is incorrect, of course. Rational skepticism is an integral part of the scientific process, which really has nothing to do with politics.

It is an insistence on empirical data based on actual physical observations to support a theory (where model simulations are not empirical data).

But back to the statement on “big government as a problem”, since you are “more at home on an internet forum that discusses history and politics than science”.

Two quotations by H.L. Mencken, plus one by C.S. Lewis illustrate this point fairly well, in context with the political debate surrounding the AGW premise.

H.L. Mencken

“The whole aim of practical politics is to keep the populace alarmed (and hence clamorous to be led to safety) by menacing it with an endless series of hobgoblins, all of them imaginary.”

“The urge to save humanity is always a false front for the urge to rule it.”

C.S. Lewis

“Of all tyrannies a tyranny sincerely exercised for the good of its victim may be the most oppressive. It may be better to live under robber barons than under omnipotent moral busybodies. The robber baron’s cruelty may sometimes sleep, his cupidity may at some point be satiated, but those who torment us for our own good will torment us without end for they do so with the approval of their own conscience.”

Although these statements were written before the current AGW debate started, they apply very well for today’s political situation surrounding the AGW debate.

Max

[Response: Marvellous. So only politicians who explicitly say they are in it for the money can be trusted, and no possible idea for improving the lot of society can possibly be sincere or effective. Tosh. Take that public education, unemployment insurance, social security, medicare, public transport etc. etc. Be careful Max, your true colours are showing. – gavin]
CM says

10 Dec 2009 at 3:42 AM

Timothy Chase (#682), I think you meant carbon-13, not carbon-14 (though fossil fuel emissions should decrease the 14C ratio too).
Jason says

10 Dec 2009 at 4:04 AM

#650, John:

Let me just quote a bit about the scientific method from Wikipedia:

“Another basic expectation is to document, archive and share all data and methodology so they are available for careful scrutiny by other scientists, thereby allowing other researchers the opportunity to verify results by attempting to reproduce them. This practice, called full disclosure, also allows statistical measures of the reliability of these data to be established.”

The key question is: “Are the results reproducible?”

Note the use of the word “methodology” above rather than “source code”. The algorithm needs to be documented. The source code is simply an expression of that algorithm, and if the result is important, others will attempt to implement the same algorithm and, if they do not get the same results, they will publish papers saying so (or, at the very least, ask the original authors why they can’t get it to work). There is far more reputation to be gained from finding a serious flaw in an important result than simply saying “me, too”!

Many years ago I was doing a PhD in Computer Science, and if there is one field where you might expect source code to be available, that would have to be it — and often it is. But, also quite often, it is not. Many times I crafted my own implementation of an algorithm published in a paper to try out the ideas and it never once occurred to me that the authors weren’t adhering to the scientific method by not giving me their source code. Quite frankly, even in the cases where source code was available, I would prefer to implement it myself because it is often faster to write my own implementation of the published algorithm than to try to figure out what their source code is doing. This is because the former is targeted towards a human audience, while the latter is written for a computer, and may be written in a language or for a target environment that is different to yours. It also ensures that what they have claimed about the algorithm is correct, because if you simply use their source code you can’t be certain that the source code accurately reflects the algorithm presented in the paper — you might overlook the same discrepancy that they (intentionally or not) did.

The only advantage to having everything is that you can quickly prove that they did not lie about the output of the code they wrote, and, quite frankly, that is the least likely scenario. Reimplementing the published algorithm from scratch, ignoring their implementation completely, is far more likely to detect any bugs they might have in their code than analysing their source in painstaking detail would, because the chance of both of you independently making the same mistake when translating the algorithm into source is pretty slim.

In cases like temperature reconstructions from station records, not knowing what stations they used is also beneficial, because if you can make an independent selection of stations and data and still reproduce their results then their results are pretty robust. If you get different results to them then questions need to be asked about how they selected their stations — questions that it may not be obvious need asking if you simply used the same data that they did, inheriting the same station selection bias, and then generate the same output.

And this is where the climate scientists at large fail. You cannot scrutinize the data fully. You cannot look into the details of the methodology since the little unreported details are hidden in the source code and you plain and simple cannot determine if the scientist is producing poor and error prone code. And this state of affairs will of course give scientist producing poor code a competitive advantage in the rush for pushing out papers.

And we need to fix this now! Open the data and open the code. Otherwise there will always be a cloud of mistrust against your results.

As Gavin said, the GISS source code and data have been available for years and are consistent with CRU’s.

It is far easier to detect “little unreported details hidden in the source code” if you try to reproduce their results solely from the published methodologies; if it wasn’t hard to detect them in source code then software in general would be far less buggy and open-source products with thousands of developers, like Linux, would never have them.
Mick says

10 Dec 2009 at 5:20 AM

Can someone critique this analysis of the ice core records in Greenland and Antarctica please?

http://www.foresight.org/nanodot/?p=3553

[Response: It’s a single record – not a hemispheric reconstruction, and the piece is full of misrepresentations and strawmen arguments. Who has said that CO2 is the only effect on climate? – gavin]
Nick Barnes says

10 Dec 2009 at 5:55 AM

Kevin King at 645, and others:

If you think it is vitally important for climate science software to be better, come over to the Clear Climate Code project and make it better.
Ray Ladbury says

10 Dec 2009 at 6:02 AM

Kevin @688 My bad. Thanks for the correction.
JBowers says

10 Dec 2009 at 7:46 AM

670: Robert Butler asks: “All that philosophical rumbling aside, I’ve got problems in the latest skirmish of the moment. People might want to do a search on “darwin zero smoking gun.” The skeptics have started another skirmish, and there isn’t much available on the net yet to counter.”

One point has been raised by this blogger who lives close to Yamba, and queries if it’s the same Yamba:

Perplexed by Smoking Gun
Santa never made it to Yamba?
http://broadcast.oreilly.com/2009/12/perplexed-by-smoking-gun.html

Quote: “3000km, to give some context, is a little more than the distance between London England and Ankara Turkey. More than the distance from Detroit Michigan and Kingston Jamaica.”
JBowers says

10 Dec 2009 at 7:52 AM

Addendum to my last post on Yamba:

Try this link as well:

http://scienceblogs.com/deltoid/2009/12/willis_eschenbach_caught_lying.php?utm_source=sbhomepage&utm_medium=link&utm_content=channellink
Deech56 says

10 Dec 2009 at 9:01 AM

RE Dendrite

Thanks to Timothy Chase (posts 594, 602, 627) for the info.

I second that thanks. Timothy, I linked to your posts in our local rag.

Much of this is known to us, but it’s nice to see it all put together. I think it would be interesting to see how the message surrounding the hack has spread; the spin from the echo chamber seems to have resonated with journalists. With the increased visibility there may be new opportunities to educate, but we may have to wait for another annual temperature record for people to be receptive to the scientific message.
