  1. Nice work Jared! It will be good to have a single repository of all the world’s temperature records for folks to work with.

    Comment by Zeke Hausfather — 6 Jul 2014 @ 12:54 PM

  2. Thank you Jared Rennie for taking the time to write an informative post.

    Will the results of this project only make a more robust record of surface air temperatures with maximum transparency, which is in itself a good thing? Do you and your collaborators expect anything that may be different enough from previous work on this subject that could lead to new conclusions about the temperature record?

    Comment by Joseph O'Sullivan — 6 Jul 2014 @ 1:45 PM

  3. Thank you, multinational group of scientists. Thank you RC for a new article, finally. It has been a long time.

    Comment by Edward Greisch — 6 Jul 2014 @ 3:06 PM

  4. Zeke, thanks very much! We have worked really hard on this, and will continue to do so in the future.

    Joseph, that is what we are hoping for. The databank is designed so the community can use the data to discover features of the temperature record that may not have been known before. We are really excited to see what comes down the road!

    Comment by Jared Rennie — 6 Jul 2014 @ 7:41 PM

  5. #2 Joseph O’Sullivan

    Thanks for your interest. We expect better informed estimates of surface temperature changes from global to regional scales and from interannual to centennial timescales. The global mean is important, but people live not in the global mean but in their home town / city / village / region. Providing better, more local, estimates will aid society, industry, planners and others to make more informed decisions. Having a better handle on the plausible range of the true evolution of surface temperatures across multiple space and timescales will provide undoubted benefits. This is why we need new analyses as a crucial next step. We need to explore potential solutions for addressing the undoubted non-climatic issues in these basic data holdings to get better estimates of the true nature of the evolution of surface temperatures.

    As to what may be different: it’s highly implausible that it will change the ‘direction of travel’ … the global mean will almost certainly still be going up (confidence in this would be about as high as that the sun will rise in the morning). Even if we had never invented the thermometer, the evidence from multiple other facets would mean we would still conclude this (see e.g. FAQ 2.1 in IPCC AR5 WG1). But some important details may change. The changes will be larger regionally and locally than in the global mean, particularly where before we had no or very little direct observational data and now we actually have enough data to perform meaningful analyses.


    Comment by Peter Thorne — 6 Jul 2014 @ 11:48 PM

  6. #4 Jared Rennie
    I agree, it is exciting to see what this new project will lead to!

    #5 Peter Thorne
    The area that I am particularly interested in is how AGW will affect ecosystems around the world, which, if I understand correctly, will require better knowledge of local temperature changes. It will be great to see further research based on this project.

    This is great stuff, thanks Jared and Peter for answering my questions!

    Comment by Joseph O'Sullivan — 7 Jul 2014 @ 11:16 AM

  7. Jared Rennie wrote:
    All processing code is provided, for openness and transparency. Users are encouraged to experiment with the techniques used in these algorithms. The programs are designed to be modular, so that individuals have the option to develop and implement other methods that may be more robust than described here. We will remain open to releases of new versions should such techniques be constructed and verified.

    Such openness, although requiring extra effort, is a good tool to counter a large fraction of skeptics. Hidden data and hidden models invite suspicion. This does just the opposite. Congratulations and thanks.


    Comment by bob — 7 Jul 2014 @ 4:58 PM

  8. OT, but mighty topical: the Heartland Las Vegas Conference has to be seen to be disbelieved!

    Comment by Russell — 7 Jul 2014 @ 10:37 PM

  9. > Hidden data and hidden models invite suspicion

    Yep. The petroleum industry uses them, to know where and when petroleum started to form; continental drift has moved them, so they rely on data and models.

    I recall decades ago reading about injury to whales along the US East Coast from illegal oil prospecting (air cannon and explosion for seismic surveying damaging hearing) — that was being done in advance of the industry’s acquiring the North Carolina legislature for leases and the federal government for permission, which it appears they have done.

    They’re not going to make those data and models public, suspicion or not. Yet illegally obtained and hidden data and models, well, what do you think?

    Comment by Hank Roberts — 8 Jul 2014 @ 9:29 AM

  10. Question — this is about land surface temperature records, I understand.
    Do you connect these with data on ocean temperature/circulation/depth?

    Comment by Hank Roberts — 8 Jul 2014 @ 10:03 AM

  11. I followed the last link in the main post for the FTP site.
    The top there has this “Welcome Message” posted:

    ** This is a United States Department of Commerce computer **
    ** system, which may be accessed and used only for **
    ** official Government business by authorized personnel. **
    ** Unauthorized access or use of this computer system may **
    ** subject violators to criminal, civil, and/or administrative **
    ** action. All information on this computer system may be **
    ** intercepted, recorded, read, copied, and disclosed by and **
    ** to authorized personnel for official purposes, including **
    ** criminal investigations. Access or use of this computer **
    ** system by any person, whether authorized or unauthorized, **
    ** constitutes consent to these terms. **

    So, this OK?

    Comment by Hank Roberts — 8 Jul 2014 @ 10:06 AM

  12. Hank,

    With regard to your first question, I will say that one of the long-term plans for ISTI is to incorporate other forms of temperature, including sea surface temperature.

    In response to your second question, that is a generic warning that is placed on the NCDC FTP site. All of the data that are on this FTP are in the public domain and available for use by anyone. NCDC is aware of our efforts and knows about our openness and transparency.

    In fact, all the source code we provide has this disclaimer:

    ! **COPYRIGHT**

    I hope this helps.

    Comment by Jared Rennie — 8 Jul 2014 @ 9:51 PM

  13. Great achievement, very good.

    It seems that the priority in this project has been on quantity, traceability and availability of data, and also on flexibility in the analysis tool. There is still uncertainty about the long-term quality and drift of each individual temperature measurement station. However, a few hundred (say 500 with a daily reading) carefully selected temperature measurement stations will provide an average value with sufficiently low uncertainty to see trends over decades. A plain simple average will be a well defined measurand for the selected stations. In this way no algorithms will add uncertainty or remove traceability for the results.

    Why not invest in manual quality assurance of the siting, equipment, uncertainty and recorded temperatures for a few hundred high quality temperature measurement stations?

    The European Union Regulation on Monitoring and Reporting of greenhouse gas emissions puts forward quite stringent requirements on each of the many thousands of measurement stations for measurement of CO2 emissions:

    “Article 6 Consistency, comparability and transparency
    1. Monitoring and reporting shall be consistent and comparable over time. To that end, operators and aircraft operators shall use the same monitoring methodologies and data sets subject to changes and derogations approved by the competent authority.
    2. Operators and aircraft operators shall obtain, record, compile, analyse and document monitoring data, including assumptions, references, activity data, emission factors, oxidation factors and conversion factors, in a transparent manner that enables the reproduction of the determination of emissions by the verifier and the competent authority.

    Article 7 Accuracy
    Operators and aircraft operators shall ensure that emission determination is neither systematically nor knowingly inaccurate.
    They shall identify and reduce any source of inaccuracies as far as possible.
    They shall exercise due diligence to ensure that the calculation and measurement of emissions exhibit the highest achievable accuracy.

    Article 8 Integrity of methodology
    The operator or aircraft operator shall enable reasonable assurance of the integrity of emission data to be reported. They shall determine emissions using the appropriate monitoring methodologies set out in this Regulation.
    Reported emission data and related disclosures shall be free from material misstatement, avoid bias in the selection and presentation of information, and provide a credible and balanced account of an installation’s or aircraft operator’s emissions.
    In selecting a monitoring methodology, the improvements from greater accuracy shall be balanced against the additional costs. Monitoring and reporting of emissions shall aim for the highest achievable accuracy, unless this is technically not feasible or incurs unreasonable costs.”

    Why not select a few hundred high quality temperature measurement stations, subject them to similar requirements for evaluation, verification, traceability and documentation, and finally use these validated measurement stations to create a global average temperature record?

    Comment by DF — 9 Jul 2014 @ 4:03 AM
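
The claim in comment 13 that a plain average of a few hundred stations has a low uncertainty can be sketched numerically. The following is a minimal illustration only, assuming each station carries an independent random standard uncertainty and, optionally, a systematic component shared by all stations; the 0.25 K and 0.1 K figures are invented for the example and do not come from the thread or from any dataset. It ignores spatial sampling and station drift entirely.

```python
import math


def uncertainty_of_plain_average(random_u_k: float, n_stations: int,
                                 shared_systematic_u_k: float = 0.0) -> float:
    """Standard uncertainty (in K) of a plain average over n_stations.

    Assumptions (hypothetical, for illustration only):
    - each station has an independent random standard uncertainty random_u_k
    - shared_systematic_u_k is common to all stations and does not average out
    """
    if n_stations <= 0:
        raise ValueError("n_stations must be positive")
    random_part = random_u_k / math.sqrt(n_stations)
    # Combine the two components in quadrature.
    return math.sqrt(random_part ** 2 + shared_systematic_u_k ** 2)


if __name__ == "__main__":
    for n in (1, 100, 500):
        independent_only = uncertainty_of_plain_average(0.25, n)
        with_shared_bias = uncertainty_of_plain_average(0.25, n, 0.1)
        print(n, round(independent_only, 4), round(with_shared_bias, 4))
```

Under these assumptions the random part shrinks with the square root of the number of stations, while any error shared across stations sets a floor that averaging cannot remove.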

  14. I am excited to see the following approach to verify the various methods used to combine the individual temperature data series into some kind of average or even “anomaly”:
    “To help ascertain what works and what doesn’t the benchmarking working group are developing and will soon release a set of analogs to the databank. These will share the space and time sampling of the holdings but contain a set of known (to the originators) data issues that require removing. When analysts apply their methods to the analogs we can infer something meaningful about their methods.”

    As far as I know there are only two methods available to ensure that a measurement (like global average temperature) is accurate and repeatable within stated levels of uncertainty:
    1. Make sure that all individual measurements, which are combined to form the measurand, are calibrated and traceable to international standards for weights and measures (temperature). Verify by an independent calculation control that the individual measurements (each temperature measurement) are combined in a mathematically correct way. And verify by an independent calculation control that the algorithms performed to combine the individual measurements into the measurand are performed correctly in accordance with an internationally accepted standard method for that measurement.

    2. Calibrate the measurement method against a reference method. The reference will then have to be a well defined measurand with a well known uncertainty. If the reference method is performed by combination of a synthetic data series, the calculations in the reference method will have to be verified by independent calculation control. It will also have to be verified that the algorithms performed in the reference method are correctly performed in accordance with the internationally accepted standard method for that measurement.

    My point is, if the algorithms performed to combine the individual measurements are anything more than a simple mathematical combination you will need an internationally accepted standard for the combination of these into a measurement result. As no such standard currently exists, how do you propose to verify that the various methods for homogenization, calculation of anomalies, construction of temperature fields, calculation of average temperature etc. are accurate and repeatable over time within some level of stated uncertainty?

    Comment by DF — 17 Jul 2014 @ 2:50 AM

  15. #14 – “… you will need an internationally accepted standard for the combination of these into a measurement result.”

    Why? All sorts of complicated calculations are developed and used in scientific literature all the time, without the need for “international agreement.” They are vetted by normal scientific process–peer review, replication, critique and so on. Maybe I’m missing something here, but it appears that your proposal would be an extreme (and extremely cumbersome) departure from normal practice.

    Comment by Kevin McKinney — 17 Jul 2014 @ 2:44 PM

  16. Having reviewed the paper I realize that you are in the planning stage of creating a reference towards which temperature data products can be benchmarked. I would like to make a few comments to some quoted parts of your plan:

    “For the benchmarking process, Global Climate Models (GCMs) can provide gridded values of l (and possibly v) for monthly mean temperature. GCMs simulate the global climate using mathematical equations representing the basic laws of physics. GCMs can therefore represent the short and longer-term behaviour of the climate system resulting from solar variations, volcanic eruptions and anthropogenic changes (external forcings). They can also represent natural large-scale climate modes (e.g. El Niño–Southern Oscillation – ENSO) and associated teleconnections (internal variability).”

    Global Climate Models have no merits at all in performing these things within reasonable level of uncertainties.

    “By necessity, homogenisation algorithms have to make an assumption that a given station is at least locally representative at some point in its record.” … “Conceptually, for any analog-station x as denoted by Eq. (1) a d term can be added to represent an inhomogeneity at time t and location l to give an observed value x′ which differs from the true value (x): $x' = c_{t,s} + l_{t,s} + v_{t,s} + m_{t,s} + d_{t,s}$. (2)”

    “These d elements should be physically plausible representations of known causes of inhomogeneity (e.g. station moves, instrument malfunctions or changes, screen/shield changes, changes to observing practice over time) as summarised in Table 1.” … “A range of frequencies and magnitudes should be explored. Ideally, they should take into account the effect on temperature from the change in climate covariates (e.g. rainfall, humidity, radiation, windspeed and direction) as accurately as possible at present, accepting that in the current state of knowledge this will in many respects be an assumption based on expert judgement.”

    Can you please explain how you intend to produce an impeccable reference for algorithms by using Global Climate Models and expert judgement?

    “Worlds should incorporate a mix of inhomogeneity types discussed above and the set of worlds should be broad, covering a realistic range of possibilities so as not to unduly penalise or support any one type of algorithm or too narrowly confine us to one a priori hypothesis as to real-world error structures.”

    Can you please explain how Global Climate Models and expert judgement will not confine you to several a priori hypotheses?

    “Any data-product creators utilising the ISTI databank and undertaking homogenisation will be encouraged to take part in the benchmarking as a means of improving the uncertainty estimation (specifically homogenisation uncertainty) of their product. This will involve running their homogenisation algorithms on the blind analog-error-worlds to create adjusted analog-error-worlds, just as they have done for the real ISTI databank stations.” … “There are two components of assessment: how well are individual change points located and their inhomogeneity characterised and how similar is the adjusted analog-error-world to its corresponding analog-clean-world?” … “For Level 1 assessment of large scale features (e.g. c, l and v in Eq. 1), a perfect algorithm would return the analog-clean-world features across a range of space and time scales. Algorithms should, ideally, at least make the analog-error-worlds more similar to their analog-clean-worlds.”

    If I understand this correctly, you expect the temperature data-product creators to replicate the assumptions you have made by using Global Climate Models and expert judgement, and to remove the artifacts which are implemented in the analogue error world. And if they use other models and other expert judgements which do not fully remove these artifacts, their model will be regarded as uncertain. How do you know that your model and your expert judgement will produce reasonable artifacts? What about uncertainty in the reference you have created?

    I think there is a fundamental flaw in your plan: you create artifacts using models and expert judgement and then expect other models and expert judgement to replicate them. Can you please explain how your model and your expert judgement can be regarded as an acceptable reference?

    Comment by DF — 17 Jul 2014 @ 5:40 PM
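
The benchmarking idea quoted in comment 16 (an analog-clean world, an analog-error world with an added d term, and an adjusted world scored against the clean one) can be sketched in a few lines. This is a toy illustration under loud assumptions, not the ISTI benchmarking code: the clean series is just an invented linear trend plus Gaussian noise rather than GCM output, the d term is a single step, and `toy_adjust` is a hypothetical stand-in for a real homogenisation algorithm. All function names and numbers are made up for the example.

```python
import random


def make_clean_world(n_months, trend_per_month, noise_sd, rng):
    """Analog-clean series (toy): linear trend plus random variability."""
    return [trend_per_month * t + rng.gauss(0.0, noise_sd) for t in range(n_months)]


def add_step_break(series, break_index, step):
    """Analog-error series: clean series plus a step inhomogeneity d from break_index on."""
    return [x + (step if t >= break_index else 0.0) for t, x in enumerate(series)]


def toy_adjust(series, break_index, estimated_step):
    """Hypothetical 'algorithm': subtract an estimated step after the break."""
    return [x - (estimated_step if t >= break_index else 0.0) for t, x in enumerate(series)]


def rmse(a, b):
    """Root-mean-square difference between two equal-length series."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is repeatable
    clean = make_clean_world(240, 0.001, 0.3, rng)
    error = add_step_break(clean, 120, 0.5)   # true step known only to the benchmark creator
    adjusted = toy_adjust(error, 120, 0.4)    # the toy algorithm underestimates the step
    print("error world vs clean:   ", round(rmse(error, clean), 3))
    print("adjusted world vs clean:", round(rmse(adjusted, clean), 3))
```

Because the inserted step is known to whoever built the benchmark, the residual difference between the adjusted and clean worlds is a direct score for the algorithm. Whether such synthetic worlds are realistic enough is exactly the question the comment raises, and the sketch does not address it.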

  17. DF:

    Your pontifications on traceability are not news. The World Meteorological Organization (WMO) has guidelines and standards for maintaining the calibration accuracy of instruments used in meteorological and climatological monitoring networks. Most, if not all, major national agencies will follow those guidelines. The WMO, via various national members, also organizes many field tests to understand the behaviour of various instrumentation methodologies.

    As for transforming a series of local temperature measurements into a global average: this is a question of statistical sampling. Although there is no One True Standard, the scientific literature has many papers on the subject and the various issues that present (coverage, discontinuities, etc.). A good recent paper is Cowtan and Way.

    There are several major, well-known data sets that do such global trend analysis, and they all exhibit very similar behaviour. Try going to Wood for Trees to look at them. One standard methodology for assessing uncertainty in measurement is to try a variety of measurement methods and see how much they differ by. The similarity of the various global temperature sets puts some pretty tight bounds on how much uncertainty there is in the results.

    Comment by Bob Loblaw — 17 Jul 2014 @ 7:01 PM

  18. In the paper you state:
    “The crucial next step is to maximise the value of the collated data through a robust international framework of benchmarking and assessment for product intercomparison and uncertainty estimation.”

    There exists an international standard which is highly relevant when you intend to provide testing and benchmarking the way you describe:

    “ISO/IEC 17025 General requirements for the competence of testing and calibration laboratories is the main ISO standard used by testing and calibration laboratories” … “And it applies directly to those organizations that produce testing and calibration results.” … “Laboratories use ISO/IEC 17025 to implement a quality system aimed at improving their ability to consistently produce valid results. It is also the basis for accreditation from an accreditation body. Since the standard is about competence, accreditation is simply formal recognition of a demonstration of that competence.”

    You should closely observe this international standard if you aim to receive recognition of your ability to perform the tests (benchmarks) you intend to provide.

    Comment by DF — 18 Jul 2014 @ 12:42 AM

  19. DF,

    Regarding your first post, we have also been calling for almost a decade for a global surface reference network. But this will only help us moving forwards – we don’t have a Back to the Future DeLorean handy to reverse instigate such a capability. This would build upon the principles in the US Climate Reference Network and the GCOS Reference Upper Air Network. Given the significant costs involved in setting up and maintaining such a network, it’s not a case of snapping your fingers and it happening. There are a number of people committed to trying to make this happen.

    Regarding your second post we are following exactly metrological best practice for the historical measurements where we have no traceability. Metrologists would term this ‘software testing’ and it is covered with the GUM. We have, on the steering committee, two members of the Consultative Committee on Thermometry including the head of the newly constituted group on environmental thermometry, on which I also sit as a ‘stakeholder’.

    So, yes we are to some extent a hostage to historical measurements and economic realities but we are doing our best within these constraints and we are working with metrology and statistics communities to do the very best that is possible.

    Apologies for the delay – we were at a meeting with SAMSI and IMAGe (two statistics and applied maths groups) trying to entrain additional groups. A number of the presentations are available in delayed mode streaming. If interested these are linked from

    Comment by Peter Thorne — 18 Jul 2014 @ 1:58 AM

  20. # 19 Peter Thorne

    Thank you for your reply. I understand that it will be impossible to perform quality assurance of all historical temperature measurement stations. Simply because we lack the information required to do so. And as you point out, we cannot travel in time. :) I also recognize that modern design and operation will provide very reliable temperature measurements. But it is a long time to wait until modern design has created a sufficient record.

    However, I have the basic assumption that it has been possible to measure temperature with low uncertainty (say 0,5 K @ 95% confidence level) for quite a while. I also have the basic assumption that a metrological network should be able to identify a few hundred measurement stations that have produced accurate air temperature measurements in a basically unchanged physical local environment for, let us say, a hundred years. If we can’t point out reliable measurement records we are left with only models and algorithms. Is it really the case that we are not able to identify a few hundred reliable measurement stations which have produced reliable records for a hundred years? Records that should not be adjusted by models and algorithms?

    Comment by DF — 18 Jul 2014 @ 6:17 PM

  21. #15 Kevin

    You cannot quantify the uncertainty and systematic errors of a complicated calculation without performing a calibration.
    See the following standards for definitions of terms: the International Vocabulary of Metrology (VIM) or the Guide to the Expression of Uncertainty in Measurement (GUM).
    Peer review, replication and critique are not relevant terms in accordance with these standards.

    To be strict, benchmark does not seem to be a relevant term in accordance with the International Vocabulary of Metrology.
    However, in normal use of the word, the term benchmark can have several meanings:
    1. Relative comparison of the performance of similar products.
    2. Evaluation of a product by comparison with a standard.

    In the first case the standard is not important at all. It is not needed, since the benchmark is a relative comparison. The relative comparison can tell you something about how one measurement compares to other measurements. But you cannot quantify uncertainty or systematic errors from a relative comparison.

    In the second case the standard is extremely important. The standard must have the same definition of the measurand and the same unit as the product you calibrate, the uncertainty of the standard should be lower than the uncertainty of the product under test, and the test case must be relevant.

    People tend to have strong belief in their own products. The moment a product owner is presented with a discrepancy between his product and a standard, he will start to put the uncertainty of the standard and the relevance of the standard or test case in doubt. Imagine the current discrepancy between the temperature predicted by global climate models and the average temperature. If I had generated a test record looking like the late temperature record, I imagine that the product owner for the global climate model would blame my synthetic temperature record for not being realistic. While a product owner can blame a synthetic world for not being realistic, he cannot blame nature for not being realistic. If you generate an analogue world and analog error worlds you will risk that the product owners get a reason to blame the synthetic worlds for not being realistic. Hence the analog world should be real measurements.

    Also, you cannot compare apples and oranges. The various temperature products do not have the same definition and unit for their measurands. They do not provide the same output. Hence you will need to agree on a definition and unit for the measurand (average temperature). You will also have to agree on how to arrive at the measurand from the individual temperature measurements. If not, product owners will question the standard and how the standard arrives at the measurand from the individual measurements.

    The model and algorithms used to arrive at the measurand should be as simple as possible. Complexity will add uncertainty and weaken the traceability. This is why I tend to think that a simple average will be valuable because it will be traceable to the individual measurements, it will be verifiable and it will not involve complex algorithms and models that will have to be agreed upon. Any complex algorithm or model involved to arrive at the measurand from the individual measurements will add uncertainty and reduce the traceability.

    Comment by DF — 19 Jul 2014 @ 4:55 AM

  22. # 19 Peter Thorne

    No need to apologize for the delay. I also understand that normal working hours bring more than these comments. I’m glad for your reply. Thanks for the links, I will certainly have a look. Good to hear that you are working with metrology and statistics communities.

    Regarding the following quote from your reply:
    “Regarding your second post we are following exactly metrological best practice for the historical measurements where we have no traceability. Metrologists would term this ‘software testing’ and it is covered with the GUM.”
    It is not clear exactly which best practice you are referring to. Can you please provide a link to the best practice(s) you use for historical measurement where you have no traceability?
    Further, I cannot find the term “software testing” in Guide to the expression of Uncertainty in Measurement (Hope this is the correct interpretation of the acronym GUM). Can you please provide a link or a little more information also on this term?

    Comment by DF — 19 Jul 2014 @ 4:00 PM

  23. DF,

    Answering these the wrong way round chronologically (sorry).

    You are correct, it isn’t in the GUM – my apologies.

    We had come in to the initiation meeting with some white papers and we had a fairly interesting meeting where one of the major challenges was understanding one another’s vocabulary. The metrologists from NIST and NPL pointed out that what we were talking about as benchmarking was interchangeable with what they would term ‘software testing’ whereby when the measurement is unknown at least you can verify independently how algorithms that purport to address suspected issues do actually perform. I don’t see it explicitly in the GUM but there are many links from a straight google search on software testing and metrology. Of the first page links that are freely accessible this NPL powerpoint looks reasonable:

    One other thing to state on the software testing is that a key aspect of benchmarking will be our providing a range of benchmarks with a range of assumptions viz. underlying change, variability, and structure of non-climatic artefacts so we avoid over-tuning algorithms. Further, the whole thing will be cyclical. So, if you dumb luck out the first cycle with your algorithm you’ll be found out the second when the benchmarks change. Whereas if you are truly skilful in your algorithm that will come through over consecutive cycles. The whole thing should, if subscribed to, help to sort out the wheat from the chaff of algorithms leading to a more robust set of estimates (other aspects may be maturity, documentation, software provision etc.). That does not, of course, mean they will be correct. But at least we can weed out obviously sub-optimal contributions that do not stand up to reasonable scrutiny.

    On the few hundred stations, yes, in theory, if such a subset existed and the sole interest was the global mean then we could just select them and be done. The issue is that I really do not know, at least a priori and with any certainty, what that subset is. In the meeting in Boulder we were shown a Canadian example where the instrument shelter was moved just three metres horizontally with negligible vertical displacement (mm’s) and yet there was an obvious demonstrable break in minimum temperature series (nothing ‘detectable’ in maximum). No instrument change, observer change, or time of obs change. And the siting was reasonable, not right up against a house or next to a road. Even small changes may impart biases and a station with truly no changes at all in 100+ years seems somewhat implausible. USCRN will get around this by constant annual checking and recalibration, triple redundancy and active monitoring / management. Sadly I know of no current long-running sites (century + records) that have benefitted from such careful curation such that I could hang my hat on the series and say it is truth within its plus or minus 2k-coverage factor.

    Also, most people live locally and not globally and what matters to most people is local temperatures, local change, local extremes etc. So, even if we had a few hundred truly homogeneous sites capable of getting at the global mean evolution arguably for many applications we’d still want to consider additional data to get at that rich local data.

    Comment by Peter Thorne — 20 Jul 2014 @ 2:01 PM
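
The Canadian example in comment 23, where a three-metre shelter move produced a break in the minimum temperature series, is the kind of step change that break-detection methods look for. The sketch below is a deliberately naive illustration, assuming a single step in an otherwise stationary series; it is not the method used by any ISTI contributor, and real homogenisation algorithms typically compare a station against its neighbours. The function name, the scoring rule and the toy data are all assumptions made for the example.

```python
def find_step_break(series, min_segment=5):
    """Locate the single most likely step change in a series.

    Tries every split point, compares the means of the two segments, and
    weights the difference by segment lengths so very short segments do not
    dominate. Returns (index_of_first_value_after_break, estimated_shift),
    or (None, 0.0) if the series is too short.
    """
    n = len(series)
    best_index, best_shift, best_score = None, 0.0, -1.0
    for k in range(min_segment, n - min_segment + 1):
        left, right = series[:k], series[k:]
        shift = sum(right) / len(right) - sum(left) / len(left)
        score = abs(shift) * (len(left) * len(right) / n) ** 0.5
        if score > best_score:
            best_index, best_shift, best_score = k, shift, score
    return best_index, best_shift


if __name__ == "__main__":
    # Toy anomaly series: small repeating wiggle, then a 0.6 K step after month 30.
    wiggle = [0.1 * ((i % 3) - 1) for i in range(60)]
    toy_series = [w + (0.6 if i >= 30 else 0.0) for i, w in enumerate(wiggle)]
    print(find_step_break(toy_series))
```

A detector like this only finds a break if it is told there is exactly one, which is part of why benchmarking against worlds with known, varied error structures is proposed in the thread.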

  24. “You cannot quantify the uncertainty and systematic errors of a complicated calculation without performing a calibration.”

    International agreements about calibration exist; but you scarcely need a new agreement in order to perform calibrations.

    “Global Climate Models have no merits at all in performing these things within reasonable level of uncertainties.”

    And that is what we technically term an “unsupported assertion.” It’s also one quite at odds with the published literature, which contains numerous examples of GCMs doing just what you claim they can’t. As a pretty much random example–top of search–here’s an April 2014 presentation from the AMS investigating tropical storms using reanalysis (i.e., GCM ‘data’.) As the abstract says, “The reanalysis data is constrained by observations, and can therefore be treated as ‘truth’.”

    Of course, just what you deem ‘reasonable uncertainties’ may come into play here.

    “If I had generated a test record looking like the late temperature record, I imagine that the product owner for the global climate model would blame my synthetic temperature record for not being realistic.”

    I think your imagination is incorrect; individual model runs exist that are pretty close to the observations, and indeed there are runs that show still less warming. As far as I know, no-one has ever said that they were somehow ‘unrealistic.’ For instance, see graph 1 in the link below:

    Comment by Kevin McKinney — 21 Jul 2014 @ 6:56 AM
