
New rule for high profile papers

Filed under: — gavin @ 4 January 2008

New rule: When declaring that climate models are misleading in a high profile paper, maybe looking at some model output first would be a good idea.

This is a reference to an otherwise interesting paper in Nature this week (Graversen et al) on the vertical structure of heating in the Arctic in recent decades. One of the key results is that during the summer, when temperatures near the surface are constrained to be close to zero by the presence of open water and sea ice, the troposphere heats up anyway. The mechanism for this heating is hypothesised to be related to changes in atmospheric heat transport. So far so good.

But towards the end, there is this curious line:

Our results do not imply that studies based on models forced by anticipated future CO2 levels are misleading when they point to the importance of the snow and ice feedbacks. …. Much of the present warming, however, appears to be linked to other processes, such as atmospheric energy transports.

The clear implication is that climate models don’t suggest that atmospheric heat transports will change and that all polar amplification in those possibly misleading models is driven by snow and ice feedbacks. But is this correct? Well, it’s hard to tell from this paper because they don’t look at any model results!

This didn’t stop the AP from declaring the heat transports to be part of some “natural and cyclical increase”! For National Geographic it was just ‘mysteriously occurring’….

But in order to see what models have to say, all one has to do is look. With the easy availability of the CMIP3 archive, it’s not too difficult to do the analysis for all the IPCC AR4 simulations for this exact period. As a short cut (and just because there is an easy interface) you can also go to the GISS archive and pull down the figure for the summertime (Jun-Aug) temperature changes in the “all forcings” run for the same time period (1979-2001). If you do so, you’ll see that in the Arctic, the models also suggest that summertime surface changes are small and that there is heating aloft – similar to the analysis in this paper. The match to the ERA-40 analysis isn’t perfect by any means (but the match between different analysis products is not that great either). More analysis would need to be done to work out what was forced and how large the weather noise is etc., but the basic phenomenon seems to be quite universal and not mysterious at all.

The point is that this isn’t difficult stuff, and it should be standard practice to at least give a cursory look at what models actually show before accusing them of being misleading.


183 Responses to “New rule for high profile papers”

  1. 151
    Aaron Lewis says:

    Re Comment on 137

    Doing substantial GCM work in Python would require pushing policy makers, managers, and scientists up the object-oriented programming/Python learning curve. Currently, they may not actually program in FORTRAN, but FORTRAN is familiar to them. They would need to become comfortable with the new environment. Once they are up there, they will love the view. My feeling is that the best investment in climate modeling that we could make would be to spend $600,000 training managers and scientists at GISS and NCAR on Python programming. It could mostly be done online, using materials similar to what raypierre has set up and some online tutors. A smart policy maker in the organization could make this happen very fast.

    Once all those folks get up that curve, then the time and cost to answer a question will be less. That is efficiency. Any measure of efficiency needs to include: formulating the question, writing the grant proposal, funding, writing the code, running the code, and analyzing the results to get the answer to the original question that stimulated the grant proposal. With Python, a physical scientist might be able to skip the whole proposal and funding delay, build the model modifications in a long weekend, and have the answer in time for the next AGU conference. That is efficiency! Almost real time climate science. It is about the only way to deal with the data from very non-linear climate responses.

    The inflexibility of FORTRAN has limited how many questions were answered for the money and time that we spent. Are we getting best value for the money that we spend on climate science? Are we answering questions fast enough? More flexible models mean more questions answered.

    (Every program should be written in clean, maintainable code. Work paid for by tax dollars should be professional quality.)

  2. 152

    Mr. Lewis —

    Have you noticed that Python is interpreted and Fortran is compiled? That means in math-intensive calculation loops, like those used in atmosphere or climate simulations, the Fortran program will run about 20 times faster. That’s probably the main reason most scientific programming isn’t done in Python.

    If someone comes out with a native-code Python compiler, you might begin to have a case. But you’d be up against 50 years of compiler writers learning to optimize Fortran code for various processors and operating systems.

  3. 153
    Mike Boucher says:

    Attempting to come up with a better language for future highly parallel scientific computing by choosing among existing languages designed in the past for serial general computing may work, but one can imagine at least one other probable outcome.

    Another approach underway by my former employer is to create a new language specifically designed for the purpose:
    http://www.gcn.com/print/26_03/43058-1.html
    http://research.sun.com/projects/plrg/Publications/1.02_steele.pdf
    http://research.sun.com/minds/2005-0302/

    There are undoubtedly other language development efforts underway, but this is the one of which I know. Ordinarily, I would be skeptical in the extreme about an effort to develop a language that needs to be supported by such an extensive and sophisticated infrastructure (IMSL, MPI, graphics libs, compilers, users’ groups, docs, etc.). However, whatever you think of Java, I think it’s indisputable that it came with an enormous infrastructure of at least reasonable quality and the organization that did that could do the same here.

  4. 154

    Re: #151 vs #152:

    Barton, I have to agree with Aaron here. Sure Python is interpreted, but did you notice that SciPy is based around NumPy, containing all the primitives for numerical analysis in precompiled form?

    I have unfortunately no personal experience with Scientific Python, but I do with Octave, which has standard numerics packages linked in — as Fortran libraries ;-) So all those time-consuming inner loops you worry about are (or can be) optimised by the best in compiler technology…
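A minimal sketch of the point being made here, assuming NumPy is available (the formula is the standard Magnus-style approximation for saturation vapour pressure, used purely as an illustrative workload): the vectorized expression hands the whole loop to precompiled code, so the interpreter is only invoked once per array operation, not once per element.

```python
import numpy as np

# A pure-Python inner loop: every iteration pays interpreter overhead.
def saturation_pressure_loop(temps_c):
    out = []
    for t in temps_c:
        out.append(6.112 * np.exp(17.67 * t / (t + 243.5)))
    return out

# The vectorized version: one expression, executed in precompiled loops.
def saturation_pressure_vec(temps_c):
    return 6.112 * np.exp(17.67 * temps_c / (temps_c + 243.5))

temps = np.linspace(-40.0, 40.0, 1_000_000)
vec = saturation_pressure_vec(temps)  # runs in compiled code, element by element
```

On a grid of a million points the vectorized form typically runs orders of magnitude faster than the explicit loop, while producing identical values.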

  5. 155
    Hank Roberts says:

    > Fortress

    http://fortress.sunsource.net/

    Searching ‘climate’ in the project’s mail archive, I see email addresses indicating they’ve attracted some climate modelers.

    Nice pointer, thanks Mike. Tamino, do you know these folks?

  6. 156

    Martin Vermeer writes:

    [[I have unfortunately no personal experience with Scientific Python, but I do with Octave, which has standard numerics packages linked in — as Fortran libraries So all those time consuming inner loops you worry about are (or can be) optimised by the best in compiler technology…]]

    If they’re calling subroutines rather than computing in-line they’re still going to be slower.

  7. 157

    Re #156

    If they’re calling subroutines rather than computing in-line they’re still going to be slower.

    Yeah, sure… knowing what you’re doing is the working assumption here ;-)

    It’s called vectorization. I used to do this myself in a previous life, coding the inner loop of what was then a massive gravity field inversion problem in PDP-11 assembler (so now you know how old I am). Enter both vectors from Fortran, allow the assembler routine to do a dot product, and fetch the result from memory in Fortran again.

    It isn’t really any different with Python/Fortran as it was with Fortran/assembler.
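The Fortran/assembler split described above maps directly onto today's Python/Fortran split; a sketch, assuming NumPy: `np.dot` dispatches the entire inner loop to a compiled BLAS routine in one call, exactly as the PDP-11 assembler routine did for Fortran.

```python
import numpy as np

def dot_python(a, b):
    # The interpreted inner loop: one bytecode dispatch per multiply-add.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

a = np.arange(1000, dtype=float)
b = np.ones(1000)

# np.dot hands the whole loop to compiled BLAS code, so the per-element
# cost is machine-code speed regardless of which language calls it.
result = np.dot(a, b)
```

The caller pays interpreter overhead once per call, not once per element, which is why the host language's speed hardly matters for the total.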

  8. 158
    dhogaza says:

    If they’re calling subroutines rather than computing in-line they’re still going to be slower.

    So you would recommend, say, that every program that needs to compute the sin of an angle do so inline rather than calling the sin() subprogram?

  9. 159

    Except that Python is interpreted and Fortran is compiled, so any given line of code will be slower for Python, including subroutine calls.

  10. 160

    Speaking of noise, NOAA will calculate temperatures for you using a thermometric method you may not be familiar with:
    http://www.srh.noaa.gov/elp/wxcalc/cricketconvert.html

  11. 161

    Re #158:

    Good example. Unlikely to make a difference where you call it from, sin(x) is going to take a while to evaluate, and inlining wouldn’t help much either. But the Chebyshev polynomial doing the evaluation will be of assembler efficiency nevertheless.

    OTOH there are situations where you can efficiently evaluate a large number of different function values of the same argument x using Clenshaw summation. This applies to exp, sin/cos and, e.g., Legendre functions. You need to have a recursion relation. This often happens when doing Fourier analysis or the like, such as spherical harmonics. And FFT is again a whole story of its own. All serious languages for numerics should have these things as primitives.
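A sketch of the Clenshaw summation just mentioned, for a Chebyshev series (the coefficients below are made up purely for illustration): the backward recurrence uses the relation between successive Chebyshev polynomials, so the loop contains only multiplies and adds, no trig calls.

```python
def clenshaw_chebyshev(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) by Clenshaw's backward recurrence.

    Uses the Chebyshev recursion T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x),
    so the inner loop is pure multiply-add arithmetic.
    """
    b1 = b2 = 0.0
    for c in coeffs[:0:-1]:          # c_n down to c_1
        b1, b2 = c + 2.0 * x * b1 - b2, b1
    return coeffs[0] + x * b1 - b2

# Cross-check against direct evaluation: T_0 = 1, T_1 = x, T_2 = 2x^2 - 1
c = [1.0, 2.0, 3.0]
x = 0.5
direct = c[0] + c[1] * x + c[2] * (2 * x * x - 1)
```

The same recurrence works elementwise on a whole array of x values at once, which is exactly the kind of "primitive" a serious numerics language should supply precompiled.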

    Re #159:

    At this point I usually stop arguing. I made my point and cannot do so any clearer ;-)

  12. 162

    Let me give an example.

    0001 do 10 i = 1, 10
    0002    v = sin(x)
    0003    v = incrediblyComplexRoutineInAssembler(v)
    0004    c = exp(-v)
    0005 10 continue

    Line 3 will take about the same amount of time whichever language calls it. Lines 1-2 and 4-5 may take 20 times longer. So the loop will still take about 20 times longer to execute in an interpreted language than it will in a compiled language.

  13. 163
    Dylan says:

    Re: 162, only true if the assembly language function takes roughly the same amount of time as the rest of the loop. More typically, that function would take up 19/20ths of the loop execution time, so whether the rest of the loop was in a low-level compiled language or a high-level interpreted language wouldn’t make much difference. Especially so with modern, optimising interpreters that recognise loops and pre-compile them. Further, in most real-life cases, the time-savings programmers gain from using higher-level languages allow them to concentrate on the sort of optimisations that computers don’t do well automatically: choice of algorithm, use of heuristic “hints”, better feedback so that less time is wasted running algorithms that are returning obviously incorrect results, etc.
    There really is very little reason these days to use such a low-level language as Fortran to write an entire application.
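The arithmetic behind this point is just Amdahl's law; a back-of-envelope sketch with the numbers used above (19/20 of the time in the compiled routine, a hypothetical 20x interpreter penalty on the glue code):

```python
def overall_slowdown(compiled_fraction, interpreter_penalty):
    """Total runtime relative to an all-compiled baseline of 1 unit.

    The compiled fraction runs at full speed; only the remaining
    glue code pays the interpreter penalty.
    """
    return compiled_fraction + (1.0 - compiled_fraction) * interpreter_penalty

# 95% of the loop in the assembler/Fortran routine, glue code 20x slower:
slowdown = overall_slowdown(0.95, 20.0)  # 0.95 + 0.05 * 20 = 1.95
```

So with the compiled routine dominating, the whole loop runs about 2x slower in the interpreted host, not 20x, and the gap shrinks further as the compiled fraction grows.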

  14. 164

    Re #162: That depends on how “incredibly complex” line 3 is, doesn’t it? Already if it’s an FFT on a 1024×1024 point grid, it will dominate the CPU budget — unless your interpreter is really badly written. Not even Microsoft Basic was that bad.

    Ah, I see Dylan making the same point already.

  15. 165
    dhogaza says:

    BPL, I made my living writing compilers from 1971 until 1999. Your blanket assumptions are wrong, starting with the declaration that “FORTRAN is a compiled language while Python is an interpreted one”.

  16. 166

    Really??? Fortran isn’t compiled and Python isn’t interpreted??? What did I download from Salford Software — a Fortran compiler or a Fortran interpreter? I could have sworn a compiler produced native code which could be run directly from the operating system, while an interpreter translated code and ran it essentially one line at a time. Where am I wrong there? I hope I’m not too old to learn. I certainly don’t have your experience, as I was only a professional computer programmer from 1984 to 1992 and occasionally thereafter, and have only written a few compilers.

  17. 167
    Tim McDermott says:

    The difference between compiled and interpreted is not all that large these days. The interesting issue in computing’s near future is “who is going to harness the power of multiple processors in an easy-to-use way?” The current price for an 8-core Mac Pro (2 quad-core 2.8 GHz Xeons, 2 GB memory, 320 GB disk, etc.) is 2800 USD.

    To make that kind of power easily accessible probably needs a new language/OS combination. Stand by, we are at another pivot point in computing.

  18. 168
    dhogaza says:

    Really??? Fortran isn’t compiled and Python isn’t interpreted??? What did I download from Salford Software — a Fortran compiler or a Fortran interpreter?

    I love it when people get snarky about something they know so little about.

    Cheers. Enjoy your “expertise”!

  19. 169
    dhogaza says:

    Well, snarky-Barty, here’s a FORTRAN product that includes both a compiler AND an interpreter:

    http://www.hicest.com/

    Still want to argue that compilation/interpretation is an intrinsic attribute of a language itself, rather than a description of one particular implementation of a language?

    while an interpreter translated code and ran it essentially one line at a time. Where am I wrong there?

    You’re right! Bingo! Congratulations!

    However, the standard Python implementation isn’t an interpreter according to your (correct, classic) definition. It compiles Python to a virtual machine, a byte-code virtual machine to be precise. That virtual machine is interpreted but in theory one could build a computer with an instruction set that would execute the byte-code directly, as Burroughs did with its mainframes that executed an Algol virtual machine back in the late 1960s/1970s. Or Symbolics (I think) did with Lisp.

    There’s at least one JIT project for Python, and another Python compiler project which compiles to a lower-level language (presumably C, like the old CFront) which would then compile to machine code.

    And pseudo-code compilers for FORTRAN – just like the standard Python compiler in principle – were not unknown back in the days of smaller, slower computers. For instance, the FORTRAN compiler for the PDP-8 compiled to a virtual machine, just like the standard Python implementation does today.

    “interpreted” vs. “pseudo-code compiler” vs. “native compiler” is an attribute of the implementation, not the language.

    Now it’s true that there are some language/computer-architecture combinations that are very difficult to write native code generators for. Note that it’s the combination, not the language itself, that makes the problem difficult.

    and have only written a few compilers.

    Yeah, sounds like it, uh-huh.

    Let’s see, a couple of other things …

    Even if the standard Python implementation was a true interpreter, it wouldn’t interpret “line by line” because statements can span lines in Python. Python is not Basic…but now I’m just being mean.

  20. 170

    dhogaza, running on a virtual machine which is in turn running on a real machine is still going through an interpreter as far as I’m concerned. Intermediate code is nothing new. And yes, I’m familiar with Fortran interpreters. It doesn’t change the fact that more than 99% of Fortran development engines are native-code compilers and I don’t know of ANY native-code compilers for Python. If you’re such a genius in the field, why don’t you write one?

    I stand by the concept that Fortran is faster than Python for any functionally equivalent code, and no scientist in his right mind would run a climate simulation on an interpreted language. For a global climate model you want speed, speed, and more speed, because you may be simulating climate in 100,000 boxes every three virtual hours for a hundred virtual years. You’d have to be clinically nuts to write such a thing for an interpreted language when you could use a compiler.

  21. 171
    Hank Roberts says:

    Would you programmers look at
    http://fortress.sunsource.net/
    please? I’ve known Mike Boucher a long time; if he says it’s worth a look, it’s worth a look.

    I realize coming into a religious argument between two old religions recommending a new one is fraught with whatever fraught consists of, but nevertheless, Sun’s done OK with Java and a climate programming language could be timely if it’s working out.

    There’s always
    “INTERCAL — the Language From Hell
    Like FORTRAN, INTERCAL uses line numbers which are optional and follow in no … The COME FROM statement enables INTERCAL to do away with nasty GOTO’s, …
    catb.org/~esr/intercal/stross.html”

  22. 172
    raypierre says:

    It’s nice to see an argument break out over programming languages. A refreshing difference from the usual.

    To get performance out of Python, or any interpreted language, you need to get used to writing your program in a non-loopy way. A big loop in Python, as in Matlab, will be slow. The trick is to isolate useful numerically intensive primitives, and optimize/compile those. Python already has a lot of compiled packages available, including the usual array primitives in NumPy. An advantage of Python is that it is easy to build new ones. Heck, we even turned the whole NCAR radiation code into a Python object. Once you encapsulate the Fortran in a Python object, you don’t need to look at it again until it’s time to revise the low level algorithm.
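A sketch of this "non-loopy" style, assuming NumPy (the diffusion stencil here is a generic illustration, not any particular model's code): the same five-point update written first as an explicit double loop, then as whole-array slice arithmetic that runs entirely in compiled code.

```python
import numpy as np

def diffuse_loopy(T, kappa=0.1):
    # Explicit double loop over grid points: interpreted, slow for big grids.
    out = T.copy()
    for i in range(1, T.shape[0] - 1):
        for j in range(1, T.shape[1] - 1):
            out[i, j] = T[i, j] + kappa * (
                T[i+1, j] + T[i-1, j] + T[i, j+1] + T[i, j-1] - 4 * T[i, j])
    return out

def diffuse_vectorized(T, kappa=0.1):
    # Same stencil as whole-array slices: the loops run in compiled code.
    out = T.copy()
    out[1:-1, 1:-1] = T[1:-1, 1:-1] + kappa * (
        T[2:, 1:-1] + T[:-2, 1:-1] + T[1:-1, 2:] + T[1:-1, :-2]
        - 4 * T[1:-1, 1:-1])
    return out

T = np.random.rand(64, 64)
same = np.allclose(diffuse_loopy(T), diffuse_vectorized(T))  # identical results
```

Both functions compute the same field; the slice version simply never enters a Python-level loop, which is the whole trick.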

    Rodrigo’s ClimT doesn’t incorporate parallelism, but Christian Dieterich’s ocean model (soon to be re-named OMlet) runs under PyMPI. After six years, we are finally beginning to know how to do this stuff, but now the money is running out.

  23. 173
    Tim McDermott says:

    Raypierre,

    Have you considered taking modeling open source? There are lots of headaches, like folks getting into flame wars about things less significant than language choice. But a well-run open-source project can turn out high quality code cheaply.

    And think how it would vex certain people who could no longer complain about hidden source code.

    Another side effect could be that the project could be an open tutorial on how climate models work. There is a lot of talent in the world waiting for something meaningful to do. We might even get some folks who would otherwise be taking pictures of weather stations.

  24. 174
    Lawrence Coleman says:

    Just to keep you guys up to date…last year was Australia’s fifth-hottest year on record overall and the HOTTEST year on record for Western Australia, Victoria and New South Wales. Some parts of Adelaide and Melbourne regularly had temps in the low to mid forties which lasted for days at a time. Let’s see what the start of the La Niña will bring and whether it can cool us down some for next summer.

  25. 175
    Abbe Mac says:

    Re #172 & 173

    Raypierre,

    Tim is calling for open source. Is the NCAR radiation code which is now a Python object available? Some of us might like to see it.

    Cheers, Alastair.

    [Response: One of the goals of my whole project was to take GCM's open source. It was too ambitious for the amount of time and money we had, but we got partway there. The whole ClimT modelling package is available at http://mathsci.ucd.ie/~rca/climt/ . That includes the Python-wrapped NCAR radiation package. An older and more limited version is available for download for Mac G4 on my ClimateBook web site, but Rodrigo's official site is the best place to go for the current stuff. ClimT actually is a whole object-oriented system with hooks for dynamics and so forth so it's overkill if you just want the radiation. Still, not hard to use. --raypierre]

  26. 176
    tsk says:

    Raypierre,

    I am a long time python programmer, and know some control theory and physics. I have used python for wrapping models of much lower complexity. Do you know of any educational / simple models for me to experiment with?

    Thanks for your time!

  27. 177

    tsk: Start with the simplest model of all, a zero-dimensional radiative equilibrium model of the Earth:

    F = (S / 4) (1 - A)

    where F is the absorbed solar flux density (in watts per square meter), S the Solar constant at Earth’s orbital distance, and A the Earth’s bolometric Bond albedo. Once you have F, it will tell you the Earth’s radiative equilibrium or “effective” temperature:

    Te = (F / sigma) ^ 0.25

    where sigma is the Stefan-Boltzmann constant.

    The next step after that: Model the atmosphere as a slab which allows sunlight to pass through, but absorbs all infrared from the ground. The ground’s solar input is F from the model above. It also gets H from the atmosphere. The ground puts out G. For energy to be conserved, F + H has to equal G. For the atmosphere, it’s receiving G, and putting out H both upward and downward — G has to equal 2 H. From that, figure out the temperature of the atmosphere and the temperature of the ground. Compare them to the results from the first model.

    For a third model, try using two slabs, both of which pass sunlight but absorb all infrared. See if you can extend it conceptually to N slabs and relate N to the ground’s temperature.
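The three models above can be sketched in a few lines of Python (standard approximate values are assumed for S, A, and sigma; the energy-balance algebra is exactly as described: one slab gives G = 2F, and N slabs give a ground flux of (N + 1)F):

```python
SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
S = 1366.0         # solar constant at Earth's orbit, W m^-2
A = 0.3            # bolometric Bond albedo (approximate)

# Model 1: zero-dimensional radiative equilibrium.
F = (S / 4.0) * (1.0 - A)          # absorbed solar flux, ~239 W m^-2
Te = (F / SIGMA) ** 0.25           # effective temperature, ~255 K

# Model 2: one IR-opaque slab. Atmosphere: G = 2H; ground: G = F + H.
# Solving the pair gives H = F and G = 2F.
Tg_1 = (2.0 * F / SIGMA) ** 0.25   # ground temperature, ~303 K
Ta_1 = (F / SIGMA) ** 0.25         # slab radiates H = F upward, so Ta = Te

# Model N: with N opaque slabs the ground flux generalises to (N + 1) * F.
def ground_temp(n_slabs):
    return ((n_slabs + 1) * F / SIGMA) ** 0.25
```

Comparing the outputs shows the pattern: each added slab raises the ground temperature while the topmost layer always radiates at the effective temperature.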

  28. 178

    Re #175

    Thanks for your response Ray.

    I will have a look at that code and see if I can use it.

    Cheers, Alastair.

  29. 179
    Chuck Booth says:

    Sorry to go off topic and interject politics into RC discussions, but this NPR interview with James Hansen and Mark Bowen, author of Censoring Science: Inside the Political Attack on Dr. James Hansen and the Truth of Global Warming, might be of interest to some:

    http://www.npr.org/templates/story/story.php?storyId=17926941

  30. 180
    Julian Flood says:

    quote Quite clearly from Figure 1(d), in the autumn the whole layer can be affected by what has happened at the surface (they sort of indicate this was by transport, but it could as well be by changes in radiation terms) unquote

    Could you/anyone expand on this? I’m interested in emissivity changes of open Arctic water in particular. Have there been changes? If the emissivity were to fall, what would be the atmospheric signal?

    Lynn Vincentnathan wrote

    quote A mistake on the other side, such as overestimating the danger from GW, would not be harmful, since as we all know mitigating GW (even if it is not happening…and it is) would be of great help in saving people money and boosting the economy, without lowering living standards or productivity. Sort of a win-win-win-win situation, when you factor in better health and wealth and well-being from reducing other enviro problems to boot. unquote

    Would that this were so — there are some bizarre sequestration proposals which are quite capable of doing major damage. Letting loose a crackpot carbon-fixation scheme on the principle that it would not be harmful might well be more damaging than the disease. To buy time — if things are that urgent — I would prefer people looked at Salter and Latham’s albedo enhancement proposal. It uses salt water, windpower, costs relatively little and can be switched off in seconds.

    JF

  31. 181

    JF writes:

    [[To buy time — if things are that urgent — I would prefer people looked at Salter and Latham’s albedo enhancement proposal. It uses salt water, windpower, costs relatively little and can be switched off in seconds.]]

    On the other hand, it allows the oceans to continue getting more and more acidic. Mitigation without emission reductions is not necessarily a good idea, especially if the mitigation makes people think, and fossil fuel producers assert, that they can now take a more leisurely pace toward phasing out fossil fuels.

  32. 182
    dhogaza says:

    dhogaza, running on a virtual machine which is in turn running on a real machine is still going through an interpreter as far as I’m concerned.

    And then, if you build a machine that executes the virtual machine code, the interpreter becomes a compiler, despite not a single line of code being changed.

    Now, there’s a useful distinction between compiler and interpreter.

    Not.

    Nor a usual one.

    And yes, I’m familiar with Fortran interpreters. It doesn’t change the fact that more than 99% of Fortran development engines are native-code compilers

    Which means you’re aware that “compiled” vs. “interpreted” is an attribute of an implementation, not the language, but are too stubborn to admit it.

    If you’re such a genius in the field, why don’t you write one?

    Because after spending 20 years writing multi-language/multi-backend optimizing compilers I got tired of it, and decided to do other things in my life.

    But that doesn’t mean I’ve forgotten everything I learned in those 20 years.

  33. 183

    dhogaza, missing the point as usual, writes:

    [[dhogaza, running on a virtual machine which is in turn running on a real machine is still going through an interpreter as far as I’m concerned.

    And then, if you build a machine that executes the virtual machine code, the interpreter becomes a compiler, despite not a single line of code being changed.]]

    And if your mother had wheels, she’d be a trolley.

    Yes, if you took out the intermediate layer, it would be a compiler. Good observation. But the fact of the intermediate layer being there is what makes it an interpreter, and that’s what makes it slower than a compiled language.

    It’s hard to believe you really don’t understand this. It just seems like more of your fighting for the sake of fighting.

