I was surprised by the shrill headlines from a British newspaper with the old-fashioned name, the Telegraph: “The fiddling with temperature data is the biggest science scandal ever”. So what is this all about?
I still remember the first time I was asked how climate change affects El Niño. It was given as a group exercise during a winter school in Les Houches (in France) back in February 1996. Since then, I have kept thinking about this question, and I have not been the only one wondering about it. Now I had my hopes up, as a new study had just been published on the evolution and forcing mechanisms of El Niño over the past 21,000 years (Liu et al., 2014).
- Z. Liu, Z. Lu, X. Wen, B.L. Otto-Bliesner, A. Timmermann, and K.M. Cobb, "Evolution and forcing mechanisms of El Niño over the past 21,000 years", Nature, vol. 515, pp. 550-553, 2014. http://dx.doi.org/10.1038/nature13963
I recently received a joint email from the World Meteorological Organization and the World Health Organization (WMO & WHO), which I would like to bring to the attention of our readers, both because it shows the direction of some new developments and because the WMO and WHO are inviting people to share their experience with health and climate. We wrote a post on the subject of climate and health in 2011, based on a book by Paul Epstein (who sadly passed away in November 2011) and Dan Ferber (Health on a Changing Planet), and are glad to see an increased emphasis on this topic. The call from WMO/WHO goes as follows:
Guest commentary from Joy Shumake-Guillemot
CALL FOR CASE STUDIES
Climate Services for Health
Enhancing Decision Support for Climate Risk Management and Adaptation
Climate services for health are an emerging technical field for both the health and climate communities. In 2012, WHO and WMO jointly published the Atlas on Climate and Health, drawing attention to the key linkages between climate and health, and how climate information can be used to understand and manage climate sensitive health risks. A new follow-up publication of Case Studies on Climate Services for Health is in preparation, and will take a next step to outline with greater detail how a wide range of health applications can benefit from using climate and weather information; what steps and processes can be used to co-develop and use climate and weather information in the health sector; and showcase how such partnerships and services can really make a difference to the health community.
Submission Guidance – Deadline October 31, 2014
We invite you to share your experiences and call attention to the increasing opportunities to solve health problems with climate service solutions. Case studies should highlight existing partnerships and good practices that demonstrate the broad range of possible applications and the value of using climate information to inform health decisions. Case studies from across health science and practice are welcomed, including examples of climate services for integrated surveillance, disease forecasting, early warning systems, risk mapping, health service planning, risk communication, research, evaluation, infrastructure siting, etc. Additionally, the publication aims to highlight the full range of climate-related health issues and risks (e.g. nutrition, NCDs, air pollution, allergens, infectious diseases, water and sanitation, extreme temperatures and weather, etc.) where health decision-making can benefit from climate and weather knowledge at historic, immediate, seasonal, or long-term time scales.
Case studies should be short (~600 words, 2 pages incl. images/diagrams and references) and designed to highlight the added-value that climate services have made for managing climate risks to health. Please find additional guidance on the structure and four elements to be included at http://www.gfcs-climate.org/node/579.
For questions and submission please contact
Dr. Joy Shumake-Guillemot firstname.lastname@example.org
WHO/WMO Climate and Health Office, World Meteorological Organization, Geneva, Switzerland
Release of the International Surface Temperature Initiative’s (ISTI’s) Global Land Surface Databank, an expanded set of fundamental surface temperature records
Guest post by Jared Rennie, Cooperative Institute for Climate and Satellites, North Carolina on behalf of the databank working group of the International Surface Temperature Initiative
In the 21st Century, when multi-billion dollar decisions are being made to mitigate and adapt to climate change, society rightly expects openness and transparency in climate science to enable a greater understanding of how climate has changed and how it will continue to change. Arguably the very foundation of our understanding is the observational record. Today a new set of fundamental holdings of land surface air temperature records stretching back deep into the 19th Century has been released as a result of several years of effort by a multinational group of scientists.
The International Surface Temperature Initiative (ISTI) was launched by an international and multi-disciplinary group of scientists in 2010 to improve understanding of the Earth’s climate from the global to local scale. The Databank Working Group, under the leadership of NOAA’s National Climatic Data Center (NCDC), has produced an innovative data holding that largely leverages off existing data sources, but also incorporates many previously unavailable sources of surface air temperature. This data holding provides users a way to better track the origin of the data from its collection through its integration. By providing the data in various stages that lead to the integrated product, by including data origin tracking flags with information on each observation, and by providing the software used to process all observations, the processes involved in creating the observed fundamental climate record are completely open and transparent to the extent humanly possible.
The databank includes six data Stages, starting from the original observation to the final quality controlled and bias corrected product (Figure 1). The databank begins at Stage Zero holdings, which contain scanned images of digital observations in their original form. These images are hosted on the databank server when third party hosting is not possible. Stage One contains digitized data, in its native format, provided by the contributor. No effort is required on their part to convert the data into any other format. This reduces the possibility that errors could occur during translation. We collated over 50 sources ranging from single station records to holdings of several tens of thousands of stations.
Once data are submitted as Stage One, all data are converted into a common Stage Two format. In addition, data provenance flags are added to every observation to provide a history of that particular observation. Stage Two files are maintained in ASCII format, and the code to convert all the sources is provided. After collection and conversion to a common format, the data are then merged into a single, comprehensive Stage Three dataset. The algorithm that performs the merging is described below. Development of the merged dataset is followed by quality control and homogeneity adjustments (Stages Four and Five, respectively). These last two stages are not the responsibility of the Databank Working Group; see the discussion of the broader context below.
Merge Algorithm Description
The following is an overview of the process in which individual Stage Two sources are combined to form a comprehensive Stage Three dataset. A more detailed description can be found in a manuscript accepted and published by Geoscience Data Journal (Rennie et al., 2014).
The algorithm attempts to mimic the decisions an expert analyst would make manually. Given the fractured nature of historical data stewardship, many sources will inevitably contain records for the same station. It is therefore necessary to create a process for identifying and removing duplicate stations, merging some sources to produce a longer station record, and in other cases determining when a station should be brought in as a new distinct record.
The merge process is accomplished in an iterative fashion, starting from the highest priority data source (target) and running progressively through the other sources (candidates). A source hierarchy has been established which prioritizes datasets that have better data provenance, extensive metadata, and long, consistent periods of record. In addition, it prioritizes holdings derived from daily data to allow consistency between daily holdings and monthly holdings. Every candidate station read in is compared to all target stations, and one of three possible decisions is made. First, when a station match is found, the candidate station is merged with the target station. Second, if the candidate station is determined to be unique, it is added to the target dataset as a new station. Third, if the available information is insufficient, conflicting, or ambiguous, the candidate station is withheld.
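The iteration described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the databank's actual code: the Station class, the merge_sources function, and the toy by_name comparator are all hypothetical names, and the rule that target values take precedence when months overlap is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """Hypothetical station record; data maps (year, month) to temperature."""
    name: str
    source: str
    data: dict = field(default_factory=dict)


def merge_sources(sources, compare):
    """Merge station lists ordered from highest to lowest priority.

    compare(candidate, targets) must return one of
    ("merge", matching_target), ("unique", None) or ("withhold", None).
    """
    targets = list(sources[0])      # highest priority source seeds the target
    withheld = []
    for candidate_source in sources[1:]:
        for candidate in candidate_source:
            decision, match = compare(candidate, targets)
            if decision == "merge":
                # Assumption: keep target values, add only the missing months.
                for key, value in candidate.data.items():
                    match.data.setdefault(key, value)
            elif decision == "unique":
                targets.append(candidate)
            else:
                withheld.append(candidate)
    return targets, withheld


def by_name(candidate, targets):
    """Toy comparator: exact name match, and withhold unnamed stations."""
    if not candidate.name:
        return "withhold", None
    for target in targets:
        if target.name == candidate.name:
            return "merge", target
    return "unique", None


a = Station("OSLO", "src1", {(1900, 1): -4.0})
b = Station("OSLO", "src2", {(1900, 1): -3.5, (1900, 2): -2.0})
c = Station("BERGEN", "src2", {(1900, 1): 1.0})
d = Station("", "src2", {})
merged, withheld = merge_sources([[a], [b, c, d]], by_name)
```

In the real algorithm the comparator is the metadata and data comparison described in the text, rather than a simple name match.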
Stations are first compared through their metadata to identify matching stations. Four tests are applied: geographic distance, height difference, station name similarity, and the date when the data record began. Non-missing metrics are then combined to create a metadata metric, and it is determined whether to move on to data comparisons or to withhold the candidate station. If a data comparison is deemed necessary, overlapping data between the target and candidate station are tested for goodness-of-fit using the Index of Agreement (IA). At least five years of overlap are required for a comparison to be made. A lookup table is used to provide two data metrics, the probability of station match (H1) and the probability of station uniqueness (H2). These are then combined with the metadata metric to create posterior metrics of station match and uniqueness. These are used to determine if the station is merged, added as unique, or withheld.
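As a rough illustration of the data comparison step, the sketch below computes an index of agreement on the overlapping months of two station records. It assumes that IA refers to Willmott's index of agreement and that "five years of overlap" means at least 60 shared monthly values; both are assumptions for this example, and the function name index_of_agreement is hypothetical. The lookup table that turns IA into H1 and H2 is not reproduced here.

```python
def index_of_agreement(target, candidate, min_months=60):
    """Willmott-style index of agreement on the overlapping months.

    target and candidate map (year, month) to temperature.
    Returns None when the overlap is shorter than min_months.
    """
    common = sorted(set(target) & set(candidate))
    if len(common) < min_months:
        return None                 # too little overlap to compare
    obs = [target[key] for key in common]
    pred = [candidate[key] for key in common]
    mean_obs = sum(obs) / len(obs)
    numerator = sum((p - o) ** 2 for o, p in zip(obs, pred))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(obs, pred)
    )
    if denominator == 0:
        return 1.0                  # both series are constant and identical
    return 1.0 - numerator / denominator


# Five years of synthetic monthly values (illustrative numbers only).
series = {(1900 + i // 12, i % 12 + 1): float(i % 12) for i in range(60)}
shifted = {key: value + 5.0 for key, value in series.items()}
short = {key: series[key] for key in list(series)[:24]}

ia_same = index_of_agreement(series, series)
ia_shifted = index_of_agreement(series, shifted)
ia_short = index_of_agreement(series, short)
```

With this definition, identical records give a value of 1, and the value falls towards 0 as the two records disagree.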
Stage Three Dataset Description
The integrated data holding recommended and endorsed by ISTI contains over 32,000 global stations (Figure 2), over four times as many stations as GHCN-M version 3. Although station coverage varies spatially and temporally, there are adequate stations with decadal and century periods of record at local, regional, and global scales. Since 1850, there are consistently more stations in the recommended merge than in GHCN-M (Figure 3). In GHCN-M version 3, there was a significant drop in stations in 1990, reflecting the dependency on the decadal World Weather Records collection as a source. This drop is ameliorated by many of the new sources, which can be updated much more rapidly and will enable better real-time monitoring.
Many thresholds are used in the merge and can be set by the user before running the merge program. Changing these thresholds can significantly alter the overall result of the program. Changes will also occur when the source priority hierarchy is altered. In order to characterize the uncertainty associated with the merge parameters, seven different variants of the Stage Three product were developed alongside the recommended merge. This uncertainty reflects the importance of data rescue. While a major effort has been undertaken through this initiative, more can be done to include areas that are lacking on both spatial and temporal scales, or lacking maximum and minimum temperature data.
Version 1.0.0 of the Global Land Surface Databank has been released and data are provided from a primary ftp site hosted by the Global Observing Systems Information Center (GOSIC) and World Data Center A at NOAA NCDC. The Stage Three dataset has multiple formats, including a format approved by ISTI, a format similar to GHCN-M, and netCDF files adhering to the Climate and Forecast (CF) convention. The data holding is version controlled and will be updated frequently in response to newly discovered data sources and user comments.
All processing code is provided, for openness and transparency. Users are encouraged to experiment with the techniques used in these algorithms. The programs are designed to be modular, so that individuals have the option to develop and implement other methods that may be more robust than described here. We will remain open to releases of new versions should such techniques be constructed and verified.
ISTI’s online directory provides further details on the merging process and other aspects associated with the full development of the databank as well as all of the data and processing code.
We are always looking to increase the completeness and provenance of the holdings. Data submissions are always welcome and strongly encouraged. If you have a lead on a new data source, please contact email@example.com with any information which may be useful.
The broader context
It is important to stress that the databank is a release of fundamental data holdings – holdings which contain myriad non-climatic artefacts arising from instrument changes, siting changes, time-of-observation changes, etc. To gain maximum value from these improved holdings, it is imperative that as a global community we now analyze them in multiple distinct ways to ascertain better estimates of the true evolution of surface temperatures locally, regionally, and globally. Interested analysts are strongly encouraged to develop innovative approaches to the problem.
To help ascertain what works and what doesn’t, the benchmarking working group are developing and will soon release a set of analogs to the databank. These will share the space and time sampling of the holdings but contain a set of known (to the originators) data issues that require removing. When analysts apply their methods to the analogs, we can infer something meaningful about their methods. Further details are available in a discussion paper under peer review [Willett et al., submitted].
Rennie, J.J. and coauthors, 2014, The International Surface Temperature Initiative Global Land Surface Databank: Monthly Temperature Data Version 1 Release Description and Methods. Accepted, Geoscience Data Journal.
Willett, K. M. et al., submitted, Concepts for benchmarking of homogenisation algorithm performance on the global scale. http://www.geosci-instrum-method-data-syst-discuss.net/4/235/2014/gid-4-235-2014.html
“These results are quite strange”, my colleague told me. He had analysed some of the recent climate model results from an experiment known by the cryptic name ‘CMIP5’. It turned out that the results were OK, but we had made an error when reading and processing the model output. The particular climate model that initially gave the strange results had used a different calendar set-up to the previous models we had examined.
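To see how a calendar mismatch can produce strange-looking results, consider a time axis stored as "days since" a start date. The sketch below decodes the same offset under two fixed-length calendars, using the names "360_day" and "noleap" from the CF convention. The function decode_year_month and the start year of 1850 are assumptions made for this illustration; the post does not say which calendars were involved.

```python
def decode_year_month(days_since_start, calendar, start_year=1850):
    """Convert whole days since 1 January of start_year to (year, month).

    Only the fixed-length calendars "360_day" and "noleap" are handled.
    """
    if calendar == "360_day":
        month_lengths = [30] * 12
    elif calendar == "noleap":
        month_lengths = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    else:
        raise ValueError("unsupported calendar: " + calendar)
    years, day_of_year = divmod(days_since_start, sum(month_lengths))
    for month, length in enumerate(month_lengths, start=1):
        if day_of_year < length:
            return start_year + years, month
        day_of_year -= length


# The same stored offset lands on different dates in the two calendars.
as_360_day = decode_year_month(36000, "360_day")   # (1950, 1)
as_noleap = decode_year_month(36000, "noleap")     # (1948, 8)
```

A value that one model labels as January 1950 would be read as August 1948 if the wrong calendar were assumed, which is enough to scramble a seasonal cycle or a trend analysis.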