RealClimate

On replication

Filed under: — gavin @ 8 February 2009

This week has been dominated by questions of replication and of what standards are required to serve the interests of transparency and/or science (not necessarily the same thing). Possibly a recent example of replication would be helpful in showing up some of the real (as opposed to manufactured) issues that arise. The paper I’ll discuss is one of mine, but in keeping with our usual stricture against too much pro-domo writing, I won’t discuss the substance of the paper (though of course readers are welcome to read it themselves). Instead, I’ll focus on the two separate replication efforts I undertook in order to do the analysis. The paper in question is Schmidt (2009, IJoC), and it revisits two papers published in recent years purporting to show that economic activity is contaminating the surface temperature records – specifically de Laat and Maurellis (2006) and McKitrick and Michaels (2007).

Both of these papers were based on analyses of publicly available data – the EDGAR gridded CO2 emissions, UAH MSU-TLT (5.0) and HadCRUT2 in the first paper, UAH MSU-TLT, CRUTEM2v and an eclectic mix of economic indicators in the second. In the first paper (dLM06), no supplementary data were placed online, while the second (MM07) placed the specific data used in the analysis online along with an application-specific script for the calculations. In dLM06 a new method of analysis was presented (though a modification of their earlier work), while MM07 used standard multiple regression techniques. Between them these papers and their replication touch on almost all of the issues raised in recent posts and comments.

Data-as-used vs. pointers to online resources

MM07 posted their data-as-used, and since those data were drawn from dozens of different sources (GDP, Coal use, population etc. as well as temperature), trends calculated and then gridded, recreating this data from scratch would have been difficult to say the least. Thus I relied on their data collation in my own analysis. However, this means that the economic data and their processing were not independently replicated. Depending on what one is looking at this might or might not be an issue (and it wasn’t for me).

On the other hand, dLM06 provided no data-as-used, making do with pointers to the online servers for the three principal data sets they used. Unlike for MM07, the preprocessing of their data for their analysis was straightforward – the data were already gridded, and the only required step was regridding to a specific resolution (from 1ºx1º online to 5ºx5º in the analysis). However, since the data used were not archived, the text in the paper had to be relied upon to explain exactly what data were used. It turns out that the EDGAR emissions are disaggregated into multiple source types, and the language in the paper wasn’t explicit about precisely which source types were included. This became apparent when the total emissions I came up with differed from the number given in the paper. A quick email to the author resolved the issue: they hadn’t included aircraft, shipping or biomass sources in their total. This made sense, and did not affect the calculations materially.
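The regridding step can be sketched in a few lines. This is a minimal illustration rather than the code used for dLM06 or for the replication: the function name `regrid_sum` and the synthetic 180x360 array are assumptions, and each 5ºx5º value is simply the sum of the 25 cells it covers, which is appropriate for per-cell emission totals because it conserves the global total.

```python
import numpy as np

def regrid_sum(field_1deg, factor=5):
    """Aggregate a regular grid of per-cell totals into coarser blocks by summing."""
    nlat, nlon = field_1deg.shape
    if nlat % factor or nlon % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    # Split each axis into (coarse index, position inside the coarse block).
    blocks = field_1deg.reshape(nlat // factor, factor, nlon // factor, factor)
    return blocks.sum(axis=(1, 3))

rng = np.random.default_rng(0)
emissions_1deg = rng.random((180, 360))      # synthetic stand-in, not EDGAR data
emissions_5deg = regrid_sum(emissions_1deg)  # shape (36, 72)
```

For an intensive quantity such as temperature, an area-weighted mean over each block would be the analogous operation instead of a sum.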

Data updates

In all of the data used, there are ongoing updates to the raw data. For the temperature records, there are variations over time in the processing algorithms (satellites as well as surface stations), for emissions and economic data, updates in reporting or estimation, and in all cases the correction of errors is an ongoing process. Since my interest was in how robust the analyses were, I spent some time reprocessing the updated datasets. This involved downloading the EDGAR3 data, the latest UAH MSU numbers, the latest CRUTEM2/HadCRU2v numbers, and alternative versions of the same (such as the RSS MSU data, HadCRUT3v, GISTEMP). In many cases, these updates are in different formats, have different ‘masks’ and require specific and unique processing steps. Given the complexity of (and my unfamiliarity with) economic data, I did not attempt to update that, or even ascertain whether updates had occurred.

In these two papers then, we have two of the main problems often alluded to. It is next-to-impossible to recreate exactly the calculation used in dLM06 since the data sets have changed in the meantime. However, since my scientific interest is in what their analysis says about the real world, any conclusion that was not robust to that level of minor adjustment would not have been interesting. By redoing their calculations with the current data, or with different analyses of analogous data, it is very easy to see that there is no such dependency, and thus reproducing their exact calculation becomes moot. In the MM07 case, it is very difficult for someone coming from the climate side to test the robustness of their analysis to updates in economic data and so that wasn’t done. Thus while we have the potential for an exact replication, we are no wiser about its robustness to possibly important factors. I was, however, able to easily test the robustness of their calculations to changes in the satellite data source (RSS vs. UAH) or to updates in the surface temperature products.


MM07 used an apparently widespread statistics program called STATA and archived a script for all of their calculations. While this might have been useful for someone familiar with this proprietary software, it is next to useless for someone who doesn’t have access to it. STATA scripts are extremely high level, implying they are easy to code and use, but since the underlying code in the routines is not visible or public, they provide no means by which to translate the exact steps taken into a different programming language or environment. However, the calculations mainly consisted of multiple linear regressions, which are a standard technique, and so other packages are relatively easily available. I’m an old-school Fortran programmer (I know, I know), and so I downloaded a Fortran package that appeared to have the same functionality and adapted it to my needs. Someone using Matlab or R could have done something very similar. It was a simple matter to then check that the coefficients from my calculation and those in MM07 were practically the same and that there was a one-to-one match in the nominal significance (which was also calculated differently). This also provides a validation of the STATA routines (which I’m sure everyone was concerned about).
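The kind of cross-check described above can be illustrated with a short sketch. It is not the Fortran package or the MM07 script: the data are synthetic, the helper name `ols` is hypothetical, and the point is only that two independent routes to the same multiple regression (a least-squares solver and the normal equations) should agree on the coefficients.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients and nominal t-statistics.

    X must already contain a column of ones if an intercept is wanted.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # classical coefficient covariance
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(7)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # synthetic predictors
true_beta = np.array([0.5, 1.0, -2.0, 0.0])
y = X @ true_beta + rng.normal(0.0, 0.5, n)

beta_a, t_a = ols(X, y)
beta_b = np.linalg.solve(X.T @ X, X.T @ y)   # independent route: normal equations
agree = np.allclose(beta_a, beta_b)
```

The nominal t-statistics here assume independent residuals, which is exactly the assumption that spatial autocorrelation calls into question.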

The processing in dLM06 was described plainly in their paper. The idea is to define area masks as a function of the emissions data and calculate the average trend – two methods were presented (averaging over the area then calculating the trend, or calculating the trends and averaging them over the area). With complete data these methods are equivalent, but not quite when data are missing, though the uncertainties in the trend are more straightforward in the first case. It was pretty easy to code this up myself so I did. Turns out that the method used in dLM06 was not the one they said, but again, having coded both, it is easy to test whether that was important (it isn’t).
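The difference between the two averaging orders is easy to demonstrate on synthetic data. The sketch below is an illustration under stated assumptions, not the dLM06 code: the function names, the six-cell "area", the common 0.02 trend and the per-cell offsets are all made up, and cells are weighted equally rather than by area.

```python
import numpy as np

def trend(t, y):
    """Least-squares slope of y against t, skipping missing (NaN) values."""
    ok = ~np.isnan(y)
    return np.polyfit(t[ok], y[ok], 1)[0]

def trend_of_mean(t, data):
    """Method 1: average over the cells at each time, then fit one trend."""
    return trend(t, np.nanmean(data, axis=1))

def mean_of_trends(t, data):
    """Method 2: fit a trend in every cell, then average the trends."""
    return np.mean([trend(t, data[:, j]) for j in range(data.shape[1])])

rng = np.random.default_rng(1)
t = np.arange(24.0)
offsets = np.linspace(-1.0, 1.0, 6)              # cells differ in their mean level
complete = 0.02 * t[:, None] + offsets + rng.normal(0.0, 0.1, size=(24, 6))

gappy = complete.copy()
gappy[:8, 0] = np.nan                            # one cell starts late
gappy[15:, 3] = np.nan                           # another cell ends early

same = np.isclose(trend_of_mean(t, complete), mean_of_trends(t, complete))
diff = abs(trend_of_mean(t, gappy) - mean_of_trends(t, gappy))
```

With complete data the two results coincide because the least-squares slope is linear in the data. With gaps, cells entering or leaving the area mean shift its level, so the two estimates separate.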


Given the data from various sources and my own codes for the processing steps, I did a few test cases to show that I was getting basically the same results in the same circumstances as were reported in the original papers. That worked out fine. Had there been any further issues at this point, I would have sent out a couple of emails, but this was not necessary. Jos de Laat had helpfully replied to two previous questions (concerning what was included in the emissions and the method used for the average trend), and I’m sure he or the other authors involved would have been happy to clarify anything else that might have come up.

Are we done? Not in the least.


Much of the conversation concerning replication often appears to be based on the idea that a large fraction of scientific errors, incorrect conclusions or problematic results are the result of errors in coding or analysis. The idealised implication is that if we could just eliminate coding errors, then science would be much more error free. While there are undoubtedly individual cases where this has been the case (this protein folding code for instance), the vast majority of papers that turn out to be wrong or non-robust are so because of incorrect basic assumptions, overestimates of the power of a test, some wishful thinking, or a failure to take account of other important processes (It might be a good idea for someone to tally this in a quantitative way – any ideas for how that might be done?).

In the cases here, the issues that I thought worth exploring from a scientific point of view were not whether the arithmetic was correct, but whether the conclusions drawn from the analyses were. To test that I varied the data sources, the time periods used, the importance of spatial auto-correlation on the effective number of degrees of freedom, and most importantly, I looked at how these methodologies stacked up in numerical laboratories (GCM model runs) where I knew the answer already. That was the bulk of the work and where all the science lies – the replication of the previous analyses was merely a means to an end. You can read the paper to see how that all worked out (actually even the abstract might be enough).

Bottom line

Despite minor errors in the printed description of what was done and no online code or data, my replication of the dLM06 analysis and its application to new situations was more thorough than I was able to do with MM07 despite their more complete online materials. Precisely because I recreated the essential tools myself, I was able to explore the sensitivity of the dLM06 results to all of the factors I thought important. While I did replicate the MM07 analysis, the fact that I was dependent on their initial economic data collation means that some potentially important sensitivities did not get explored. In neither case was replication trivial, though neither was it particularly arduous. In both cases there was enough information to scientifically replicate the results despite very different approaches to archiving. I consider that both sets of authors clearly met their responsibilities to the scientific community to have their work be reproducible.

However, the bigger point is that reproducibility of an analysis does not imply correctness of the conclusions. This is something that many scientists clearly appreciate, and probably lies at the bottom of the community’s slow uptake of online archiving standards since they mostly aren’t necessary for demonstrating scientific robustness (as in these cases for instance). In some sense, it is a good solution to an unimportant problem. For non-scientists, this point of view is not necessarily shared, and there is often an explicit link made between any flaw in a code or description however minor and the dismissal of a result. However, it is not until the “does it matter?” question has been fully answered that any conclusion is warranted. The unsatisfying part of many online replication attempts is that this question is rarely explored.

To conclude? Ease of replicability does not correlate to the quality of the scientific result.

And oh yes, the supplemental data for my paper are available here.

295 Responses to “On replication”

  1. 251
    Hank Roberts says:

    > Ferrall
    Was that the NPR econ commentator? Australian economist at QED? The gaming blogger? Two out of three? There are so many.

    > Economists
    Yep. Deltoid quotes Pooley on this cultural trait

    “Journalists have missed the economic consensus partly because economists are such a querulous bunch–they argue bitterly among themselves even when they agree….. That sort of quarrelling masks the underlying consensus and communicates a greater degree of discord and uncertainty than actually exists.” (Pooley, quoted at Deltoid)

    A solitary economist will argue with himself. Recall Harry Truman’s plea:

    The people in ‘ecological economics’ seem to work better in groups.
    Perhaps that reflects an understanding of ecology?

  2. 252


    As I noted above Dr. McKitrick has posted a follow up to his paper specifically dealing with spatial correlation. So while the statement that his original paper assumed no spatial correlation might be true, it is no longer true. In the follow up posting he has attempted to show that spatial correlation is not an issue with his results. This is quite different than assuming it doesn’t exist.

    So that I can learn more about this topic I would like to hear your comments on his follow up posting on spatial correlation.

  3. 253
    tamino says:

    Re #252 (Nicolas Nierenberg)

    I’ve downloaded the data and duplicated the regression, and it sure seems to me that there’s spatial correlation aplenty in the residuals, contrary to claims in the follow-up paper. I’ve downloaded that and am digesting it. There’s quite a bit more to do to understand it completely.

    But I can tell you this: the more I get into the details of MM07, the more screwball this analysis seems to be. It’s not just failure to deal with correlations correctly, there are lots of reasons to be suspicious of their conclusions. Whatever I find, whether I’m right or wrong, I’ll report in due time.

  4. 254


    Maybe it would be more prudent to just have written the last line. To me the rest of your post didn’t carry much information content. I look forward to your analysis.

  5. 255

    Mr Nierenberg (#252): there is a way to try to answer this question for yourself. What I suggest is the following. The MM07 follow-up offers the equation:

    u = lambda W u + e

    with e gaussian independently distributed.

    Use this eq. to generate 10,000 samples of u, using the gaussian generator for e, for each of the proposed W1, W2, W3.

    Then, look up the semivariogram plot in Gavin’s paper. The semivariogram is directly related to the autocorrelation function (in fact it’s that turned upside down) and computable from it.

    Now, plot the semivariograms for your above generated synthetic data, and see if it resembles any of Gavin’s coloured curves, e.g. the one for the surface temp data. This is what Gavin did himself, though for GCM-generated data.

    Hope this is of help.
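The recipe in this comment can be sketched as follows. It is an illustration only: the weight matrix below is a hypothetical inverse-distance matrix on a 1-D transect of 40 points, not any of the W1, W2, W3 from the MM07 follow-up, and lambda = 0.8 is an arbitrary choice. Solving u = lambda W u + e for u gives u = (I - lambda W)^-1 e, which is what `simulate_sar` computes.

```python
import numpy as np

def simulate_sar(W, lam, n_samples, rng):
    """Draw samples of u satisfying u = lam * W @ u + e, with e ~ N(0, I)."""
    n = W.shape[0]
    e = rng.standard_normal((n, n_samples))
    return np.linalg.solve(np.eye(n) - lam * W, e)   # one column per sample

def semivariogram(u, max_lag):
    """Empirical semivariogram 0.5 * E[(u_i - u_{i+h})^2] on a regular transect."""
    return np.array([0.5 * np.mean((u[h:] - u[:-h]) ** 2)
                     for h in range(1, max_lag + 1)])

# Hypothetical weights: inverse distance out to 3 neighbours, rows summing to 1.
n = 40
idx = np.arange(n)
dist = np.abs(idx[:, None] - idx[None, :])
W = np.where((dist > 0) & (dist <= 3), 1.0 / np.maximum(dist, 1), 0.0)
W /= W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
gamma_sar = semivariogram(simulate_sar(W, 0.8, 10_000, rng), max_lag=10)
gamma_iid = semivariogram(simulate_sar(W, 0.0, 10_000, rng), max_lag=10)
```

Plotting `gamma_sar` against lag gives the synthetic curve to compare with an empirical semivariogram. With lambda = 0 the curve is flat at the noise variance, which is a quick sanity check on the code.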

  6. 256
    Hank Roberts says:

    > more prudent to just have written the last line

    Nah. Remember he’s not just writing for you.

    Those of us who have learned over the long term to trust Tamino know he gets busy sometimes with his day job and doesn’t have much to say for a while. It’s good to have been given some information about what he’s working on to tide us over.

  7. 257

    Mr. Vermeer,

    I am actually going down a different path. It appears from reading the literature, and Dr. McKitrick’s paper that there are well accepted tests for spatial autocorrelation. They seem to yield a test result that either accepts or rejects the hypothesis. I am looking at how to transform the data to fit the algorithms that are already implemented in R.

    Also, given that Dr. McKitrick has already done computations using some of these tests, I am interested in learning why his choice of tests was incorrect, or why his implementation of the tests was incorrect.

  8. 258
    Martin Vermeer says:

    Correction to #255: Gavin’s semivariogram is for the full data, not the residuals (IIUC). So the comparison would have to be against a variogram of the empirical residuals instead.

    I don’t think the tests chosen by Dr McKitrick are incorrect, the question is how realistic the three W models chosen are. That is the issue that my proposed test would address.

    Testing for autocorrelation is fine, but remember that even autocorrelation that does not show up as being significant in such a test, nevertheless may lead to a “significant” test result for data that is compatible with the null hypothesis, as ignoring it on computing variances will make the error bounds too optimistic. Monte Carlo is good in that it shows you this kind of pathology.
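A small Monte Carlo experiment shows the pathology. The sketch below uses serial (AR(1)) correlation in place of spatial correlation purely for simplicity, and the function names, sample size and critical value (about 1.98 for a two-sided 5% test with 98 degrees of freedom) are assumptions made for illustration. Two series are generated independently, so the null hypothesis is true by construction, and the naive regression t-test is applied as if the data were independent.

```python
import numpy as np

def ar1(n, phi, rng):
    """Stationary AR(1) series with unit-variance innovations."""
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1.0 - phi**2)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.standard_normal()
    return x

def naive_rejection_rate(phi, n=100, trials=2000, t_crit=1.98, seed=0):
    """Fraction of trials in which a naive slope t-test rejects a true null."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        x = ar1(n, phi, rng)
        y = ar1(n, phi, rng)                 # independent of x
        xc, yc = x - x.mean(), y - y.mean()
        slope = (xc @ yc) / (xc @ xc)
        resid = yc - slope * xc
        se = np.sqrt(resid @ resid / (n - 2) / (xc @ xc))   # assumes independence
        rejections += abs(slope / se) > t_crit
    return rejections / trials

rate_white = naive_rejection_rate(phi=0.0)   # no autocorrelation
rate_red = naive_rejection_rate(phi=0.9)     # strong autocorrelation
```

Without autocorrelation the rejection rate sits near the nominal 5%. With strong autocorrelation it is far higher, because the standard error is computed as if there were many more independent samples than there really are.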

  9. 259

    Mr. Roberts,

    With all due respect, remarks like “there’s spatial correlation aplenty” and “the more screwball…it seems to me” don’t give me any idea what he is working on. They seem like placeholders for “I know something is wrong but I haven’t figured it out yet.”

    My own preliminary work shows that the residuals are very slightly positively correlated with location. Right on the edge of significance. It is my impression at this time that this slight positive correlation doesn’t affect the results, but I’m still looking into that.

  10. 260
    david_a says:

    Dr Schmidt,

    As a relative newcomer to climate science I have a few observations and a few questions. First, let me say thank you for spending the time and effort to host this site and to make available important information regarding the current state of the art in this realm of the science.

    In keeping with the thread topic I would strongly side with those calling for the routine publication of code/data in the most clear and transparent form possible. As someone upthread has suggested doing so in something like google code would be a great place to start.

    The reasonable arguments against this effort seem to be centered around the idea that the cost to the scientist in terms of time would be greater than the benefits. My only estimation of the cost side comes from my own modeling work in an unrelated field (computational finance). I am quite sympathetic to the state of your desktop and the myriad of code fragments which may reside there. Mine is similar.

    You made the point above that when things rise to a certain level of importance you spend more time on what one might call the end to end reproducibility of both code and data. For myself that level would be reached whenever any capital is to be committed to an idea. In my world the cost of error can be catastrophic (on a personal level) so there is almost no level of ‘proving’ which is not beneficial. I would posit that in your field where the public policy decisions will have global scale effects the cost to getting it wrong is infinitely higher. From this perspective I would suggest that anything which is being submitted for publication rises to the level of importance requiring the highest level of transparency and clarity.

    A secondary benefit of such activity would be to make the work accessible to many people not in the field but nevertheless capable of understanding the math and physics were it presented in a manner designed to facilitate understanding. While some here argue that that would encourage ‘crackpots’ to nitpick and take shots, thus somehow ‘wasting’ the scientist’s time, I don’t see how that can be the case. The scientist is always free to ignore any criticism, justified or not. To the argument that it would affect the political process because it would give ammunition to the ‘deniers’ I’d say that would be the worst reason of all. A lack of transparency is far far more disturbing than some ill formed opinion of a paper.


  11. 261
    david_a says:

    Having had the opportunity to at least read some of the papers linked by your site and having also perused the papers of those antagonistic to the magnitude of the AGW hypothesis I’d appreciate some help with the following, as I understand it:

    The amount of solar radiation hitting the planet is relatively stable at least from a total energy standpoint in the time frame of say a century or so.

    At a time of energetic equilibrium the net radiative flux at the atmospheric/space boundary would be zero, so that the amount of energy entering the system would be equal to the amount of energy leaving the system.

    Adding a large fraction of additional CO2 to the atmosphere increases the radiative forcing first by retransmission of long wave back to the surface of the planet and secondly by the positive feedback mechanism of increasing temperature and then water vapor in the atmosphere which will have again the same retransmission effects.

    Absent any other forcings these effects would monotonically increase the temperature of the system until such time that the increase in outbound radiative flux due to higher surface/atmospheric temperature would cause the net radiative flux to again be zero, but now with the system at a higher equilibrium temperature.

    The atmosphere, shallow ocean (1000m ?) and land will tend to reach equilibrium significantly (?) faster than the shallow ocean with the deep ocean.

    In equilibrium between land, atmosphere and shallow ocean, the energy stored in the shallow ocean is much greater (10x ?) than that of the other two components.

    Looking at the time series of ocean heat content over the last 20 years when the CO2/water vapor forcing should have been in effect there is first a rise in OHC then a flattening over the last 6-8 years or so depending on the dataset used for OHC.

    Since the physics of the forcing imply a monotonically increasing OHC (absent xfer to the deep ocean and the minor xfer to other smaller sinks) it would seem that over at least the near time the forcing is being counterbalanced by some other forcing which is causing the net radiative flux to be in balance so there is no accumulated energy in the system.

    Given your detailed understanding of the various forcing factors could you guess which ones would be most likely to be increasing in the near term to offset the background CO2/H2O forcing?

    Relatedly, do the GCMs provide any probabilistic outputs as to the magnitude of the forcings and their couplings so that monte carlo simulations could assess the relative probability of departure from the model predictions?


  12. 262
    dhogaza says:

    “Since the physics of the forcing imply a monotonically increasing OHC (absent xfer to the deep ocean and the minor xfer to other smaller sinks) it would seem that over at least the near time the forcing is being counterbalanced by some other forcing which is causing the net radiative flux to be in balance so there is no accumulated energy in the system.”

    Where did anyone claim that ENSO and other heat transfer mechanisms have screeched to a halt?

  13. 263
    Mark says:

    David, ensembles are one method of providing “probabilistic” forecasts. It is more useful still in climatology to test sensitivity. E.g. if you run 10 programs that use the same code but change, say, cloud coverage fitting constants, you can see if the output of the model is particularly sensitive to that figure being wrong. If your model is not sensitive, then it is likely (unless you left something important out) that the real world is not particularly sensitive to you getting that feature wrong.

    I have heard a lot recently about ensembles wrt climate (wrt weather it’s kind of old hat) but storage becomes a HUGE problem. Add to that skeptics and denialists want the raw data (whether they do anything with it is unknown) so you can’t even reduce storage. And you have to keep it for decades maybe and you have a LOT of data to handle.

    And that costs.

    A lot.

    But, funnily enough, people don’t want to pay for that through extra taxes.


    Can’t live with them, can’t disintegrate them.

  14. 264
    david_a says:

    I’m sorry that I don’t directly understand what you are saying.

    My guess is that you are saying that ENSO and other heat transfer mechanisms are moving the heat to the deep ocean which is why it is not accumulating in the upper ocean. Since I am somewhat new at this could you explain more of this or point me towards a paper where the transfer of energy to the deep ocean is discussed and what the variance of the process might be?

  15. 265
    Hank Roberts says:

    > over the last 6-8 years
    Not enough to determine a trend. Noisy planet.

  16. 266
    Chris Colose says:

    #261 dave_a

    You are correct about the incoming solar radiation. There is a secular trend in solar irradiance from 1900-1950 or so, but none after that. Aside from longer-term changes, there is an 11-year solar cycle which has very small implications for surface temperature, and does not contribute to the long-term warming trend.

    You are right about equilibrium conditions

    The addition of further CO2 not only serves to “retransmit” radiation from the atmosphere to the surface, but other energy fluxes as well which are non-radiative. It is probably more useful to think of the enhanced greenhouse effect as working through the heat loss side of the equation at the top of the atmosphere than a radiative heat gain term at the surface. It just happens that heat is mixed very well throughout the troposphere so that warming will be realized. See An Analysis of Radiative Equilibrium, Forcings, and Feedbacks as well as RealClimate’s A saturated gassy argument

    I don’t think I really agree with the terminology about land equilibrating faster than the ocean…the whole planet is out of equilibrium and the oceans are what causes a significant lag between the forcing and full response. Obviously land heats faster (and obviously oceans have a higher heat capacity) but the full warming will not be realized even on land until full equilibrium is established.

    //”Since the physics of the forcing imply a monotonically increasing OHC “//

    This is wrong. Inter and intra-annual variability does not go away with more CO2, and the trend is small compared to the weather over short time intervals. 6 year “trends” are meaningless since there is no such thing as “6 year climate.” You need to focus on a longer and more suitable timeframe to address the question of climate change. The multidecadal trend in OHC is rising (Domingues et al. 2008). As such, there is no necessity to invoke a “forcing” which is “counteracting” CO2/WV. And water vapor is a feedback, not a forcing.

  17. 267


    There is no need to save the output of multiple runs if the input data is provided as well as the software itself and the parameters used. If people properly kept versions of data there would be no need to provide input data either. But since this isn’t done it is necessary.

    In the situation that started this thread it actually makes a difference whether you used the original crut data or the updated crut data in combination with the RSS data. Dr. Schmidt emphasized the effect of RSS but I was a bit confused until I saw that the surface temperature set was changed as well. If Dr. McKitrick hadn’t provided the original surface temperature data there would have been no way to confirm that this data change mattered. In this case I don’t think it is very important, but I could imagine other cases where it would be.

  18. 268
    David B. Benson says:

    david_a (264) — Start here:

    THC primer

  19. 269
    david_a says:

    Chris, David,

    Thanks for the links to the papers. I’ll try and read them tonight.

  20. 270
    Mark says:

    re #267.

    Nope, for some, unless every byte is reproduced, someone will shout loud and long (and be picked up in the tabloids) about how the data is being massaged to “prove” AGW.

    Heck, if you check the BBC website you’ll see people saying that Gavin is bullsitting everyone and this can be proven because he hasn’t given out the source code to one of his programs used in a paper.

  21. 271
    dhogaza says:

    david_a, just remember …

    Before you earn your Nobel Prize by overturning all that’s known about climatology, you’re going to have to learn some climatology.

    Now, of course, as a guy in computational finance we’re all sure you’ll be able to master the field in a day or two, and the day following, overturn the work of thousands of professional scientists.

    But pardon us if we wait until you actually accomplish the feat, before we adorn you with global honors deserved of those that trigger a scientific revolution.

  22. 272
    david_a says:

    Hi Chris,
    The paper is very clear. It’s a great jumping off point.

    On page 4 there is an illustration due to Trenberth et al. 2009 which shows a snapshot of the global energy balance. As per the illustration, the incoming solar is 341.3 W/m2, the outgoing reflected shortwave is 101.9 W/m2 and the outgoing longwave 238.5 W/m2 for a total of 340.4 W/m2 outgoing, leaving a net imbalance of 0.9 W/m2 which is shown on the bottom as ‘net absorbed’. By your remarks above with respect to 6 years not being long enough for this signal to dominate I take it that the variance of the outgoing energy budget processes must easily be high enough to mask this signal in the short term, since the incoming solar side is relatively static over the short term.

    My intuition is that the outgoing longwave process is more invariant than the reflected solar because the former is relying on what would generally be statistically aggregated properties of molecular and atomic interactions while the latter is dependent upon much more macro level effects.

    If you have any insight into both the magnitudes and the sources of these variances or could point me towards some papers that would be great.

    One bit of confusion — On page 10 of the paper there is a paragraph dealing with direct radiative forcing due to a doubling of CO2. The number given is 3.7 W/m2. The next line goes on to say that this would be similar to a 2% rise in solar irradiance.
    3.7 / 341 ~ 1.1%. Is this a typo or am I missing something?

    Thanks again

    [Response: Forcing from a 2% increase in solar is 0.02*1366*0.7/4 = 4.8W/m2 – gavin]

  23. 273
    Chris Colose says:


    No typo. Gavin’s response gave you a back-of-envelope for solar forcing. The 1366*0.02 part comes from the 2% change and the 0.7/4 part comes from the Earth’s albedo and geometry. Although not often done or reported in the literature, if the forcing is defined at the tropopause, the RF is even a bit less since UV is absorbed above that layer.

    Keep in mind that a 2% increase in solar irradiance is absurdly large. Most ideas invoking a strong solar influence on climate change generally involve indirect effects as opposed to simple increases in total solar irradiance (e.g., UV effects on circulation patterns, cosmic rays).
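The arithmetic in this exchange can be written out explicitly. The numbers are those quoted in the thread (1366 W/m2 for the solar constant, 0.7 for the absorbed fraction, a factor of 4 for geometry, 3.7 W/m2 for doubled CO2, 341.3 W/m2 for mean incoming solar); the variable names are illustrative only.

```python
S0 = 1366.0            # W/m2, solar constant used in the inline response
absorbed = 0.7         # fraction not reflected (albedo of roughly 0.3)
geometry = 4.0         # sphere surface area / intercepting disc area

# Forcing from a 2% increase in solar irradiance.
forcing_2pct = 0.02 * S0 * absorbed / geometry             # about 4.78 W/m2

# Solar change that matches the 3.7 W/m2 from doubled CO2.
forcing_2xco2 = 3.7
equivalent_change = forcing_2xco2 / (S0 * absorbed / geometry)   # about 1.5%

# The ratio computed in the question divides by incoming solar with no albedo factor.
naive_ratio = forcing_2xco2 / 341.3                        # about 1.1%

print(f"{forcing_2pct:.2f} W/m2, {equivalent_change:.1%}, {naive_ratio:.1%}")
```

The gap between the 1.1% and 1.5% figures comes entirely from the albedo factor: only the absorbed fraction of any change in irradiance counts as forcing.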


  24. 274
    Glenn says:

    “David, ensembles are one method of providing “probabilistic” forecasts. It is more useful still in climatology to test sensitivity. E.g. if you run 10 programs that use the same code but change, say, cloud coverage fitting constants, you can see if the output of the model is particularly sensitive to that figure being wrong. If your model is not sensitive, then it is likely (unless you left something important out) that the real world is not particularly sensitive to you getting that feature wrong.

    I have heard a lot recently about ensembles wrt climate (wrt weather it’s kind of old hat) but storage becomes a HUGE problem. Add to that skeptics and denialists want the raw data (whether they do anything with it is unknown) so you can’t even reduce storage. And you have to keep it for decades maybe and you have a LOT of data to handle.”

    What is a “cloud coverage fitting constant”, how does one quantify or qualify a claim of “likely” with respect in general to models and has that been done or done specific to any specific model, why is storage of “ensembles”, presumably raw data/code/docs used in any particular single study, a “huge problem” and why would you necessarily “have” to keep that data for “decades”?

  25. 275
    Hank Roberts says:


    “We will be happy to load our data onto your storage devices, and contribute floor space for them, provided they will thereafter remain physically on our site and be accessible to the public on the same terms you request we offer you. You will have to confirm you have made longterm contract arrangements for electricity and air conditioning with the local utility company, for connectivity with the local Internet Service Provider, and for system operations staff on your own account. To arrange delivery of your storage hardware to our facility, after these support contracts are confirmed, please contact us ….”


    We regret that the pen drive and reel of mag tape you offered to send will only store 0.000001 and 0.00001 percent of the data you requested, respectively, and without backup.

  26. 276
    Mark says:

    Glenn, I don’t know.

    But probably still they just figure that if you take the values you have in a cell like relative humidity and make cloud cover A when it’s less than 40%, B when it’s less than 70, C under 90 and D over 90 then you get the right sort of weather.

    A B C and D and when they swap could easily change and be completely wrong.

    But if the model that uses this is run with different values of A B C D and show little sensitivity to having these factors changed, you know that the determination of these factors is not a problem.
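
    The sensitivity test described above can be sketched roughly as follows. This is only a toy: `toy_model`, the cloud fractions per class and the size of the perturbation are all made up for illustration, and the humidity thresholds (40, 70, 90) are the ones quoted in this comment, not values from any real model.

```python
import random

def cloud_class(rel_humidity, thresholds):
    """Map relative humidity (percent) to a cloud class index 0..3 (A..D)."""
    t1, t2, t3 = thresholds
    if rel_humidity < t1:
        return 0
    if rel_humidity < t2:
        return 1
    if rel_humidity < t3:
        return 2
    return 3

def toy_model(thresholds, seed=0, n_cells=10_000):
    """Hypothetical stand-in for a model run: mean cloud fraction over random cells."""
    rng = random.Random(seed)          # same seed for every run, so only thresholds differ
    cover = (0.0, 0.3, 0.6, 0.9)       # made-up cloud fraction for classes A..D
    total = 0.0
    for _ in range(n_cells):
        rel_humidity = rng.uniform(0.0, 100.0)
        total += cover[cloud_class(rel_humidity, thresholds)]
    return total / n_cells

def sensitivity(base=(40.0, 70.0, 90.0), delta=5.0):
    """Perturb each threshold by +/- delta and return the min and max model output."""
    outputs = [toy_model(base)]
    for i in range(len(base)):
        for sign in (-1.0, 1.0):
            perturbed = list(base)
            perturbed[i] += sign * delta
            outputs.append(toy_model(tuple(perturbed)))
    return min(outputs), max(outputs)

low, high = sensitivity()
print(f"output range across perturbed runs: {low:.3f} to {high:.3f}")
```

    A narrow range suggests the result does not hinge on getting the thresholds exactly right; a wide range would flag the parameterisation as something to pin down.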

    I don’t do climate modelling, remember. I’ve read some stuff. You can do it yourself.

    likely: run 10 different models with the same figures. If they diverge widely, it’s likely a difficult-to-forecast system or your parameterisations are wrong. Not certain, it could just be bad luck. Even if they all track, they could be doing that through very good luck.

    Likely it’s merely that it is good if they track, bad if they don’t.


    And the storage of the raw data is the only thing that will shut up the denialists. If any of the data is not kept, they will say that the data removed was done because it proved AGW was wrong or the model was bad.

    And that cannot be disproved to the public (it will NEVER be disproved to the denialist) without keeping ALL the data.

    That keeping it all is otherwise useless doesn’t matter when you are dealing with someone at the “banging on the table” stage of debate.

    Got it?

  27. 277
    david_a says:

    Hi Chris,
    Seek and ye shall find :)

    I found this paper which is a study of TOA SW radiation budget for 1984 – 1997. It is fascinating. The variability both spatially and temporally is quite high, and would be capable of swamping an average linear forcing signal of .6 w/m2. If I read it correctly, over the 14 year period there was annual trend forcing of 2.3w/m2 which is obviously a pretty big number. Again from my first read it appears that the big driver of the variability is clouds both in quantity, type, and structure. The general direction is more tropical clouds more reflected short wave.

    I am searching for one which extends the study to present time as the last 8 years have been relatively cooler.


  28. 278

    david_a wrote in 277:

    I found this paper which is a study of TOA SW radiation budget for 1984 – 1997. It is fascinating. The variability both spatially and temporally is quite high, and would be capable of swamping an average linear forcing signal of .6 w/m2. If I read it correctly, over the 14 year period there was annual trend forcing of 2.3w/m2 which is obviously a pretty big number. Again from my first read it appears that the big driver of the variability is clouds both in quantity, type, and structure. The general direction is more tropical clouds more reflected short wave.

    We did see cloud cover decrease over the last fifteen years of the twentieth century in the tropics, which was not predicted by some models. Whether this is due to diminished aerosol load, global warming or natural variability (e.g., ENSO) is still an open question inasmuch as the trend has been relatively short.

    But the net effect at the top of the atmosphere has been a reduction in reflected sunlight which almost exactly matches the increase in outgoing longwave radiation, implying no net warming or cooling as the result of diminished cloud cover, and over the same period, the temperature of the tropics has continued to rise.

    I will refer you to the same authors’ later paper:

    A significant decreasing trend in OSR [outgoing solar radiation] anomalies, starting mainly from the late 1980s, was found in tropical and subtropical regions (30° S-30° N), indicating a decadal increase in solar planetary heating equal to 1.9±0.3Wm-2/decade, reproducing well the features recorded by satellite observations, in contrast to climate model results. This increase in solar planetary heating, however, is accompanied by a similar increase in planetary cooling, due to increased outgoing longwave radiation, so that there is no change in net radiation. The model computed OSR trend is in good agreement with the corresponding linear decadal decrease of 2.5±0.4Wm-2/decade in tropical mean OSR anomalies derived from ERBE S-10N non-scanner data (edition 2). An attempt was made to identify the physical processes responsible for the decreasing trend in tropical mean OSR.

    A. Fotiadi, et al, Analysis of the decrease in the tropical mean outgoing shortwave radiation at the top of atmosphere for the period 1984-2000, Atmos. Chem. Phys., 5, 1721-1730, 2005

    Incidentally, that was actually a reduction in outgoing shortwave for that period being balanced by an increase in outgoing longwave, as the result of a reduction in cloud cover.

  29. 279

    I have taken a look at issues of spatial autocorrelation discussed in S09 (that started this thread) referring to MM07. My conclusions are that the main results of MM07 are not affected by spatial autocorrelation, which is in agreement with Dr. McKitrick’s follow up article. I also found that the spurious correlations reported by Dr. Schmidt using the Model E data were indeed caused by spatial autocorrelation, which is what he hypothesized in S09.

    My analysis can be found here.

    It has been quite interesting looking into this and learning about R and about spatial analysis. I look forward to comments, suggestions, and criticisms.

  30. 280
    david_a says:

    Hi Tim,
    Thanks for the link.

    The group did a lot of papers on radiation budget. The links to them are here

    There is one on tropical longwave budget here that I am just starting to read

    Though I do not have the paper reference I recall that ocean heat content rose during the period, which would imply that even though the trend in radiation balance had not changed there was some difference in the integrals or things were out of balance to begin with and that just stayed put. As a first guess the integral hypothesis would be better since after the fact OHC has declined a bit and though it is only a short term measurement it still implies current radiation balance.

    One thing I completely don’t get in this field is the apparent lag in the models, which are working from what seems to be real-time data. Or more simply, why does the paper stop in 2000 when it is published in 2008? If the satellites are still orbiting and producing data, why wouldn’t you update and post the results as they became available? Or is this done and it is just emailed around within the community and not evident to the random Google searcher?

    If you have any links to ocean heat content papers that would be great. This measurement would seem to be a pretty key one in the whole of the science, as it appears to be the best integrator of all of the data and so would be far more robust than any of the trend data.


  31. 281
    Mark says:

    “Or more simply, why does the paper stop in 2000 when it is published in 2008.”

    Why is Vista 7 still in Beta when they finished the coding in 2007?

    Why are the Texaco financials only until April 3 when they published them in May?

  32. 282
    Chris Colose says:


    The Earth system doesn’t have to be in equilibrium on short time scales (even in the absence of an underlying trend) – there is year-to-year variability all the time, and the ocean and atmosphere are always exchanging heat back and forth; changes in ocean circulation cause changes in atmospheric circulation, which cause changes in clouds and water vapor, which change shortwave and longwave radiation. The oceans exhibit their own variability, so you do not expect increases year after year, so you really need to look at the trends. Trying to evaluate climate change using time periods 1/3 or 1/5 as long as a standard climatology is like declaring that summer has ended because there is a cold week in July (sorry to neglect any of the SH folks in here).

    Following the World Meteorological Organisation (WMO), 30 years is the classical period for performing the statistics used to define climate. This of course depends on context– people talking about the “climate of the last ice age” don’t talk about hundreds of 30-year segments, so this is used in a broader context, but the standard is well suited for studying recent decades, because such an analysis requires a reasonable amount of data and still provides a good sample of the different types of weather that can occur in a particular area.

    A persistent global warming signal over such a suitable timeframe is an indication that there is a top of the atmosphere net energy imbalance, and as you suggest, a rise in OHC. This has been discussed at RC here and here, as well as other posts if you scroll through the archives.

  33. 283
    david_a says:

    Hi Chris,

    Yes, it is quite clear that there is a continual exchange of energy within the Earth’s climate system as well as continual change at TOA. However, I do not think that it is in principle impossible to decompose the various effects to lend more or less credence to a hypothesized underlying trend. I would also guess that as time goes on and measurement accuracy increases it will become easier to extract signals on a shorter and shorter time period. A lot of the noise appears to be much more a function of imprecise data sets than any limitations imposed by the physics.

    I’ve had the chance now to read a paper by Lyman et al. about recent cooling in the upper ocean which can be found here:

    One of the interesting points of the paper was their estimation of how the error bars (1sd) around one year avg OHC in the upper 750 meters had changed from about 3.7×10^22 joules down to around 0.6×10^22 joules from 1955 data to 2005 data. The latter number I believe corresponds to a yearly radiative imbalance of about .5 w/m2 at TOA. So at least from a pure measurement standpoint a .9 w/m2 imbalance is well within the realm of detection in fairly short order given the current state of the measurement system. Now of course if the number and variance of non trend components of the energy balance are large then their ability to mask the trend component would increase over any finite time frame.
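
    As a rough check of the conversion from joules per year to W/m2, here is a short sketch. The comment does not say which area the figure is normalised by, so both the whole-Earth surface and the ocean surface are tried; the two areas are round approximate values and are assumptions, not numbers from the Lyman et al. paper.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # about 3.156e7 s
EARTH_SURFACE_M2 = 5.1e14               # approximate total surface area of the Earth
OCEAN_SURFACE_M2 = 3.6e14               # approximate ocean surface area

def flux_w_per_m2(joules_per_year, area_m2):
    """Average flux needed to deliver joules_per_year over area_m2."""
    return joules_per_year / (SECONDS_PER_YEAR * area_m2)

# One-sigma uncertainties on annual upper-ocean heat content quoted in the comment.
for label, sigma_joules in (("1955", 3.7e22), ("2005", 0.6e22)):
    per_earth = flux_w_per_m2(sigma_joules, EARTH_SURFACE_M2)
    per_ocean = flux_w_per_m2(sigma_joules, OCEAN_SURFACE_M2)
    print(f"{label}: {per_earth:.2f} W/m2 (whole Earth), {per_ocean:.2f} W/m2 (ocean only)")
```

    With these areas the 2005 figure comes out at roughly 0.4 W/m2 per unit of the Earth’s surface, or roughly 0.5 W/m2 per unit of ocean surface, so the “about .5 w/m2” above is consistent with normalising by ocean area.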

    To put my idea another way, suppose you had a bunch of differently denominated coins and you flipped the whole bunch at once. Heads increases your bank account, tails decreases it. If the set of coins contained a two-headed coin it would be pretty easy to see its trend in your bank account over time. The time it would take to determine its existence to any degree of confidence would depend on the value of the two-headed coin and on the number and values of all the other coins. The bigger the other coins, the longer it would take to ‘know’ to some degree of certainty whether there was indeed a two-headed one in the bunch.

    I’m just trying to understand the sizes and quantity of the various coins to figure out how long it should take to decide if a two headed one is there or not and how big it might be.
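
    The coin picture can be made quantitative with a small sketch, assuming every ordinary coin is fair and independent (an assumption of this toy, not a claim about climate). After n rounds the two-headed coin has contributed n times its value, while the fair coins contribute a random sum whose standard deviation grows like the square root of n times the sum of their squared values. The function names below are hypothetical.

```python
import math
import random

def rounds_needed(fair_values, biased_value, z=2.0):
    """Rounds until the drift n * biased_value reaches z standard deviations
    of the accumulated fair-coin sum, sqrt(n * sum(v**2))."""
    noise_var = sum(v * v for v in fair_values)
    return math.ceil(z * z * noise_var / (biased_value ** 2))

def simulate_balance(fair_values, biased_value, n_rounds, rng):
    """Bank balance after n_rounds of flipping every coin once per round."""
    balance = 0.0
    for _ in range(n_rounds):
        balance += biased_value                      # the two-headed coin always pays
        for v in fair_values:
            balance += v if rng.random() < 0.5 else -v
    return balance

small_coins = [1.0] * 10
large_coins = [5.0] * 10
print(rounds_needed(small_coins, 1.0))   # 40
print(rounds_needed(large_coins, 1.0))   # 1000

rng = random.Random(0)
print(simulate_balance(small_coins, 1.0, 40, rng))
```

    Under these assumptions the waiting time scales with the sum of the squares of the other coins’ values and inversely with the square of the two-headed coin’s value, so making the other coins five times bigger makes the wait twenty-five times longer.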


  34. 284
    Mark says:

    David, all that is done is to take a 30-year period and average all the March temperatures, all the April temperatures, and so on.

    30 years is picked because the known cyclic variations that we can consider null in comparison to human-scale climate change have periods much shorter than 30 years, so you get at least a *few* repeats of each cycle. This tends to reduce the noise.

    Longer periodicities can be seen by collecting several 30-year averages together and this is done to see what the longer term natural variations are by proxy measurements (since we don’t have 200,000 year old met stations and log books).

    They are selected because of the physics we know, not because we are throwing bent coins, where the uncertainties can be calculated from abstract mathematics, something the messy real world doesn’t let us do with climate, so we HAVE to measure it. And to remove the known periodicities and short term (hence not predictive over the long term future) outliers, 30 years is about what is needed.

    Purely so the smaller scale periodicities are averaged out over at least a few cycles.
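
    A minimal sketch of that averaging, on made-up data: a fixed seasonal cycle plus one oscillation with a period of 3.7 years. Both the period and the amplitudes are assumptions chosen only to illustrate the point, and real records contain several such cycles plus weather noise.

```python
import math

PERIOD_YEARS = 3.7   # made-up period of a short natural oscillation

def seasonal(month):
    """Hypothetical true seasonal cycle for calendar month 0..11."""
    return 10.0 * math.cos(2.0 * math.pi * month / 12.0)

def synthetic_series(n_years):
    """Monthly values: seasonal cycle plus a unit-amplitude oscillation."""
    series = []
    for k in range(12 * n_years):
        oscillation = math.sin(2.0 * math.pi * k / (12.0 * PERIOD_YEARS))
        series.append(seasonal(k % 12) + oscillation)
    return series

def monthly_normals(series, n_years):
    """Average each calendar month over the first n_years of the series."""
    normals = []
    for month in range(12):
        values = [series[12 * year + month] for year in range(n_years)]
        normals.append(sum(values) / n_years)
    return normals

def worst_error(n_years):
    """Largest gap between the computed normals and the true seasonal cycle."""
    normals = monthly_normals(synthetic_series(n_years), n_years)
    return max(abs(normals[m] - seasonal(m)) for m in range(12))

for n in (3, 30):
    print(f"{n:2d}-year normals: worst monthly error {worst_error(n):.3f}")
```

    With only 3 years the oscillation leaks into the monthly averages; over 30 years it has gone through about eight cycles and largely cancels.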

  35. 285
  36. 286
    Chris Colose says:


    Perfect measurements or not, the climate system is characterized by noise and (possibly) an underlying signal. Better instruments will not make El Ninos or La Ninas go away, and these things operate on timescales as long as, or longer than, the timescales over which you think we can get a coherent signal, and remain a key source of climatic fluctuations about the mean.

    The ability to detect a signal against background noise depends on the system (and statistic) you are analyzing, and it’s not obvious that perfect measurements should make detection abilities suitable at still shorter time intervals. During glacial times for instance, the climate was not only colder but also more variable, so it was likely more difficult to distinguish between possible trends and background variability (e.g. due to ocean circulation changes). There’s no argument by intuition which suggests the climate shouldn’t be more variable, but observations and models show relative stability over Holocene-like conditions, with no simulation of the coupled atmosphere-ocean system that can spontaneously produce persistent changes as strong as a doubling of CO2.

    The equilibrium conditions and any possible secular changes depend on the TOA energy (im)balance. As such, this is the key factor in climate prediction since it serves to define the basic boundary conditions which constrain the global climate. But there’s always going to be weather superimposed on the long-term trend, so a handful of data points just won’t cut it for trend analysis.
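
    To see why a handful of points is not enough, here is a sketch that fits a straight line to many synthetic series of different lengths. The trend (0.02 per year) and the year-to-year scatter (0.15) are made-up illustrative values, and the scatter is assumed to be uncorrelated white noise, which real climate variability is not; persistence from year to year would generally widen the spread further.

```python
import math
import random

def ols_slope(y):
    """Ordinary least squares slope of y against 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def slope_spread(n_years, trend=0.02, noise_sd=0.15, trials=2000, seed=1):
    """Mean and standard deviation of fitted slopes over many synthetic series."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(trials):
        y = [trend * t + rng.gauss(0.0, noise_sd) for t in range(n_years)]
        slopes.append(ols_slope(y))
    mean = sum(slopes) / trials
    sd = math.sqrt(sum((s - mean) ** 2 for s in slopes) / (trials - 1))
    return mean, sd

for n in (8, 30):
    mean, sd = slope_spread(n)
    print(f"{n:2d} years: mean fitted slope {mean:.4f}, spread {sd:.4f}")
```

    For white noise the spread of the fitted slope falls roughly as the window length to the power 3/2, so under these assumptions an 8-year window gives a spread comparable to the trend itself, while a 30-year window resolves it clearly.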

  37. 287
    sidd says:

    Re; Lyman paper

    Wasn’t there a difficulty with this paper, already addressed in this forum ?
    aha, the magic of the Memex reveals

    followed by the discussion of the Domingues paper

  38. 288
    dhogaza says:

    dave_a is obviously mining denialist sites for evidence, as evidenced by his dragging up the Lyman paper which, as the authors themselves later agreed, was based on spurious results.

    dave_a, we know you didn’t find this all by yourself. ‘fess up, dude. Then tell us why you’re mining denialist sites for what you imagine is superior science.

  39. 289


    david_a is not the “Dave A” known and loved by all of us ;-)

  40. 290
    Mark says:

    You’re quite correct, Martin.

    He spells his name differently and doesn’t use capitals.

  41. 291
    david_a says:


    I would be very surprised if, at the level we are measuring, the climate system is characterized by ‘noise’. The only fundamental noise in physics is that due to quantum mechanical uncertainty. And while aggregating up from an atomic level can give macro effects, black body radiation spectra being a big one here, at 10^-34 Js h is small enough so that it is not going to get in the way of forcings 50 orders of magnitude greater. Perhaps we are only having a semantic disagreement but I have a very different understanding of noise, at least as it applies to the measurement of physical systems. From a purely semantic standpoint I believe it would be more correct to say that there are processes we do not understand or that we are not measuring to a fine enough degree to be able to separate them out from those that we do.

    I agree that better instruments will not make El Ninos or La Ninas go away, but I do not believe that is the issue nor am I suggesting that it is. The issue is whether or not ENSO, PDO, NAO etc are truly random (emergent properties of quantum effects) or appear random because we don’t have a handle on what forces them. But to bring this back around to my real contention, even if we cannot answer the prior question, can we isolate their effects on the Earth’s radiation budget to some degree? There seems to be an accepted wisdom that 30 years is somehow a magic number below which the process level uncertainty makes trend identification inherently impossible. Perhaps this is true, but shouldn’t there be some mathematical/physical reason for it to be so? If there is, then by all means point me towards it.

    The equilibrium conditions and any possible secular changes depend on the TOA energy (im)balance. As such, this is the key factor in climate prediction since it serves to define the basic boundary conditions which constrain the global climate.

    Perfectly said.

    I would add to that by saying that by far the best integrator of the energy (im)balance over the relevant time scales for measuring GHG, or any other, forcing/feedback would be the heat content of the world’s oceans. The added bonus of the oceans is that the measurement tools that we currently have (thousands of thermometers, tide gauges bobbing around all over the place plus a few satellites) are now reaching a measurement resolution high enough to begin giving statistically strong global heat content data for annual and intra-annual time frames.

    We also know to a high degree of precision the amount of solar radiation reaching our planet so we can easily remove this variance from the global energy budget.

    This leaves the outgoing shortwave and the outgoing longwave as the two missing pieces to close the system. And in fact if we have one of them then the other just falls out though from the standpoint of error checking and validation it would be far more preferable to have them both.

    As a first order approximation the GHG forcing/feedback effects would affect only the longwave side of the equation. So in principle if you had a good handle on the variance of the outbound shortwave, and measurement of the energy balance, you could then produce a function which related the probability of any sequence of energy imbalances with the variability in the OLR process. One could then compare the variance in the OLR predicted by the GCMs and plug it into this function to begin to get estimates on the reliability of their forecasting skill given a particular sequence of OSR and OHC measurements. Since this data is available I can go and do this, cool.
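
    The bookkeeping behind “the other just falls out” can be sketched as below, assuming the net top-of-atmosphere imbalance equals the rate of ocean heat uptake spread over the whole Earth (that is, ignoring heat going into land, ice and atmosphere). The function names and the input numbers are illustrative placeholders, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
EARTH_SURFACE_M2 = 5.1e14   # approximate total surface area of the Earth

def net_imbalance(delta_ohc_joules, years=1.0):
    """TOA imbalance in W/m2 implied by an ocean heat content change (assumed
    to capture all of the stored energy)."""
    return delta_ohc_joules / (years * SECONDS_PER_YEAR * EARTH_SURFACE_M2)

def implied_olr(incoming_sw, outgoing_sw, imbalance):
    """Close the budget: incoming_sw - outgoing_sw - OLR = imbalance."""
    return incoming_sw - outgoing_sw - imbalance

# Illustrative round numbers only (W/m2 for the fluxes, joules for the OHC change).
imbalance = net_imbalance(0.8e22)
olr = implied_olr(340.0, 100.0, imbalance)
print(f"imbalance {imbalance:.2f} W/m2, implied OLR {olr:.2f} W/m2")
```

    Note that the implied OLR inherits the measurement errors of all three inputs, which is one reason an independent OLR measurement is preferable for validation.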


  42. 292
    dhogaza says:

    I realize he’s not, but I do think he’s clearly mining the denialist ore. In a more sophisticated manner than cap Dave cap A.

  43. 293
    David B. Benson says:

    david_a (291) — Rather than the term noise, the phrase internal variability is sometimes used to describe the effects of ocean oscillations and so forth. It is shorter to just write “noise” and agrees with common practice in many sciences in distinguishing “signal” from “noise”.

  44. 294
    Marcus says:

    I’m confused about this new meme that seems to be spreading through the denialist community that “noise” is not a good description of certain aspects of the system.

    Google search for “data noise” pops up this definition:
    “Analysis of interactions with a site commonly involve data sets that include ‘noise’ that may affect results. Noise is data that does not typically reflect the main trends thus making these trends more difficult to identify.”

    Or as the common phrase goes: “One man’s noise is another man’s signal”.

    Clearly, noise in most contexts does not mean “only quantum noise”. Nor does it mean that there isn’t an underlying explanation for the noise. In the climate context, I think it is perfectly valid to characterize the internal shifting of heat around the system (as typified by El Nino and La Nina) as “noise” compared to the long term signal of heat accumulation. And indeed, because of the chaotic nature of weather, it may be that perfectly predicting El Ninos and La Ninas is indeed impossible – possibly due to your quantum noise! Heck, climate models, much simpler than the earth system, have “internal variability” which can be well described by “noise”, and there was a time when due to some cheap chips and a broken air conditioner that I couldn’t repeat some model runs reliably: one bit flipping in the middle of a 36 hour run would lead, inexorably, to significant changes in the year-to-year variability – though not the long term trend. Eg, “noise” or “chaos” or whatever you want to call it.
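
    The bit-flip anecdote can be imitated with a toy, assuming a one-line chaotic map (the logistic map with r = 3.9) as a stand-in for a model run. This is in no sense a climate model; it only shows how a tiny perturbation can change the detailed sequence while leaving the long-run average almost untouched.

```python
def logistic_run(x0, steps, r=3.9):
    """Iterate the logistic map x -> r * x * (1 - x) and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

reference = logistic_run(0.4, 100_000)
perturbed = logistic_run(0.4 + 1e-12, 100_000)   # stands in for a single flipped bit

# Step-by-step values diverge completely after a few hundred iterations...
late_gap = max(abs(a - b) for a, b in zip(reference[200:400], perturbed[200:400]))
# ...but the long-run averages stay close.
mean_ref = sum(reference) / len(reference)
mean_pert = sum(perturbed) / len(perturbed)

print(f"largest gap between runs over steps 200-400: {late_gap:.3f}")
print(f"long-run means: {mean_ref:.4f} vs {mean_pert:.4f}")
```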

    ps. Ocean heat content would be a great way to monitor long term flux imbalances. But looking at the variability in recent papers and research by Domingues, Willis, Levitus, and Gouretski, I am stunned that anyone thinks we are anywhere close to having an ocean record with reliability as good as the surface temperature record. Of course, denialists (WattsUp and Pielke) like claiming that the surface temperature record is faulty – but I thought the resolution of the satellite trend/SAT trend in favor of SAT was a fairly impressive validation of SAT methodology to determine accurate trends (at least for the last 40 years).

  45. 295
    Mark says:

    Marcus, I have just had an idea about how to explain this “noise”. If it isn’t reliable, it’s noise.

    E.g. in a warm summer August, any one day will be cooler or warmer than the best guess of what that day’s temperature would be based SOLELY on the records (so no computer models, just observations), which is what “average” really means.

    However, I cannot rely on the 5th August to be the cooler next time we have a 5th August just because this one was.

    The data for this one specific day is not reliable.

    However, without any forcing of the weather to change (which is what climate change is), it is reliably consistent with the past mean and daily variations.