

Data rescue projects

Filed under: — gavin @ 17 August 2017

It’s often been said that while we can only gather new data about the planet at the rate of one year per year, rescuing old data can add far more data more quickly. Data rescue is, however, extremely labor intensive. Nonetheless, there are multiple data rescue projects and citizen science efforts ongoing, some of which we have highlighted here before. For those looking for an intro to the subject, this 2014 article is a great place to start.



Weather diary from the Observatoire de Paris, written by Giovanni Cassini on 18th January 1789.

I was asked this week whether there was a list of these projects, and with a bit of help from Twitter, we came up with the following:

(If you know of any more, please add them in the comments, and I’ll try and keep this list up to date).

21 Responses to “Data rescue projects”

  1. 1
    Koen Hufkens says:

    Hi,

    You may list my project as well. The “Congo basin eco-climatological data recovery and valorisation” (COBECORE) project will digitize and transcribe old weather and ecologically relevant documents from the Institut National pour l’Etude Agronomique du Congo belge (INEAC), the agronomical research body in what is now the DR Congo, then the Belgian Congo.

    You can find the website and project blog at the link below. We are just getting started but it should get interesting in the next few years.

    http://cobecore.org

    Cheers
    Koen

  2. 2

    These scientific delvings are, of course, distinguished from the more recent efforts, however imperfect, to shield U.S. government data sets from actions which a hostile administration might take.

  3. 3
    Philip Brohan says:

    I am involved with several of these projects, and the most obvious difficulty with the work is that it is extremely labour intensive: reading and interpreting historical documents takes a lot of person time.

    It is likely that we could speed up the work dramatically by training a machine learning algorithm to extract temperature, pressure, etc. observations from (photographs of) documents. We’ve done enough tests to be confident that this is possible, but it’s HARD (not just a matter of deploying standard character recognition tools).

    If any reader of this blog is tempted by a data-science problem that would make a difference in the world, why not give this a go? I can provide a million or so digital images that need their data extracted, and a few hundred thousand images marked up with manually extracted data (suitable for training an extraction algorithm).

  4. 4
    Tim Osborn says:

    It’s not a data rescue project as such, but some readers may be interested in data we hold on paper in the Climatic Research Unit (CRU) library at UEA. Many have been (partially) digitised already, but there is scope for more data digitisation in some instances.

    The list is available here:
    https://crudata.uea.ac.uk/cru/library/datasets.htm

  5. 5
  6. 6
    Mal Adapted says:

    I offer the following example of a productive data-rescue project:

    Historical Vegetation of the Willamette Valley, Oregon, circa 1850:

    Land survey data recorded by the General Land Office between 1851 and 1910 were used to map historical vegetation in the Willamette Valley, Oregon.

  7. 7
    Mark McCarthy says:

    Historic drought project for UK. http://historicdroughts.ceh.ac.uk/

  8. 8
    Simon Wisch says:

    Thanks for the overview!
    Wouldn’t that be a perfect task for CAPTCHAs? I think Google already used CAPTCHAs for digitized Google Books when the letters couldn’t be recognized automatically. Doing the same with weather data would probably speed up this process a lot. And if values are only accepted when 10 or 20 users enter the same number, it could also reduce errors considerably compared to a completely manual process.
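
The agreement rule suggested in comment 8 could be sketched roughly as follows. This is only an illustration, not how any of the listed projects works: the function name `consensus_value`, the `min_agree` parameter and the whitespace normalisation are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_value(entries: Sequence[str], min_agree: int = 10) -> Optional[str]:
    """Return the transcription entered by at least `min_agree` volunteers.

    Hypothetical helper: returns None when no value reaches the threshold,
    or when the two most common values are tied.
    """
    if not entries:
        return None
    # Strip surrounding whitespace so " 29.92" and "29.92 " count as one entry.
    counts = Counter(entry.strip() for entry in entries)
    top = counts.most_common(2)
    value, votes = top[0]
    # A tie between the two leading values is treated as "no consensus".
    if len(top) > 1 and top[1][1] == votes:
        return None
    return value if votes >= min_agree else None
```

A value that fails the check would presumably go back into the queue for more volunteers, or to an expert, rather than being discarded.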

  9. 9
    Florence Fetterer says:

    Sea ice back to 1850: https://nsidc.org/data/g10010

    Data set citation: Walsh, J. E., W. L. Chapman, and F. Fetterer. 2015. Gridded Monthly Sea Ice Extent and Concentration, 1850 Onward, Version 1. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: http://dx.doi.org/10.7265/N5833PZ5.

    Also see:
    Walsh, John E., Florence Fetterer, J. Scott Stewart, and William L. Chapman. 2016. A database for depicting Arctic sea ice variations back to 1850. Geographical Review. doi: 10.1111/j.1931-0846.2016.12195.x.

    Here’s an article about it:
    https://www.carbonbrief.org/guest-post-piecing-together-arctic-sea-ice-history-1850

    Glacier photographs, also as far back as the 1850s: https://nsidc.org/data/glacier_photo/

    Data collection citation: National Snow and Ice Data Center, compiler. 2002, updated 2015. Glacier Photograph Collection. Boulder, Colorado USA: National Snow and Ice Data Center. http://dx.doi.org/10.7265/N5/NSIDC-GPC-2009-12.

    Many of these were rescued through the NOAA Climate Database Modernization Program. Now, a Council on Library and Information Resources grant is helping us finish the scanning project.

  10. 10
    Linden Ashcroft says:

    A nice overview! The South Eastern Australian Recent Climate History project is not active at the moment, but we hope to get the data rescue portal open again soon: https://ozdocs.climatehistory.com.au/

    Another great source of data for the Australian region in the late 19th Century is the Todd Folios, painstakingly rescued by the Australian Meteorological Association: http://www.charlestodd.net/Todd_Folios/

  11. 11
    Russell says:

    I hope these worthwhile preservation efforts extend to data on man’s impact on the hydrosphere as well: anthropogenic sea lane fallout may be a useful, and retrievable, source of proxy data for palaeoclimatology.

  12. 12

    The German Weather Service (Deutscher Wetterdienst) has several projects for data rescue from ship logs, from light vessel observations, from overseas stations and from coastal signal stations.
    In English: http://www.dwd.de/EN/climate_environment/climatemonitoring/climatedatamanagement/datarescue/data_rescue_node.html
    In German: http://www.dwd.de/DE/klimaumwelt/klimaueberwachung/klimadatenverarbeitung/datenrettung/datenrettung_node.html

  13. 13
    barry says:

    The data rescue list is a great place to point skeptics concerned with data scarcity, coverage etc. These endeavours are a natural progression for the ex-surfacestations.org enthusiasts (et al). Anthony Watts could make a clarion call for skeptics to sign up, fill in the gaps and keep the bastards honest. They’d contribute, learn some things, and hopefully become utterly absorbed with it, benefiting everyone.

  14. 14
    Mac Benoy says:

    SEARCH (SE Australia…) died over 5 years ago. The principals were more interested in paleo climate than written history climate. We are the pre-eminent data rescue event in Australia along with the great work of Dr Christa Pudmenzky running Weather Detective.
    As a CitSci project we have been operating 2 days a week for the last 11 years. We have created 110,000 images of Australasian climate records containing 10,000,000 plus data items covering 1844-1957. Tied in with project ACRE (met-ACRE), we digitise SLP readings for the International Surface Pressure Databank for use by the 20CR Reanalysis System. Over 350,000 data points have been sent so far. met-acre/MERIT is the proof of concept system for centralising and hosting Project ACRE images globally.

    FYI, comment 10 by Linden Ashcroft refers to the Todd Folios – that website has been superseded (and will be closed) by http://www.charlestodd.net/MERIT/ which is in the process of being moved to the met-acre.org/Merit website.

    Mac Benoy, Volunteer Project Manager
    Citizen Science Team
    Australian Meteorological Association
    working with the Bureau of Meteorology

  15. 15
    Frank Kaspar says:

    For Germany you could add:
    “Data rescue of German and international meteorological observations at DWD” with a link to this summary: https://doi.org/10.5194/asr-12-57-2015

    Background: We continuously work on the digitization of historic weather observations that we have in our own archives (on paper, handwritten, etc.). These archives contain observations not only from Germany, but also from the oceans and from land stations in many parts of the world.

    The newly digitized data are then integrated into our national climate data archive. We have an open data policy: Time series from German stations are available online: ftp://ftp-cdc.dwd.de/pub/CDC/

  16. 16

    Suggest listing our new historic hydrometeorological data digitization site…
    http://weatherwizards.org

    The site is the only known site that digitizes precipitation strip charts with volunteers. We have just been funded by the Copernicus program to modify our program to digitize barograms, thermograms and alpha-numeric forms. Our site requests volunteers to register and start digitizing. My 10-year-old granddaughter is one of our most prolific digitizers. This is an excellent opportunity for retirees to help the world. This effort can be especially good for retirees confined to nursing homes or without transportation possibilities. Rick

  17. 17
    Kristen says:

    The International Soil Carbon Network is having a Hack-a-thon before AGU where people work together to try and write R scripts to import digital soil carbon data into the ISCN database. So it’s more recent data, but vulnerable to loss nonetheless. http://iscn.fluxdata.org/2017/07/24/agu2017/

  18. 18
    Titus says:

    This one might be of interest. Written initially in 1838, reprinted in the early 1900s and recently reproduced for preservation. There’s a whole section on environmental type disasters:

    “The New Tablet of Memory: Or, Mirror of Chronology, History, Statistics, Arts and Science”
    https://www.amazon.com/New-Tablet-Memory-Chronology-Statistics/dp/1147184275

  19. 19
  20. 20
    Rob Allan says:

    With the international ACRE initiative, you might like to include this link to the profiles of all of the ACRE regional data rescue foci: http://www.met-acre.net/chapters.htm These are on the ‘new’ ACRE WWW site: http://www.met-acre.net/

    I am now also leading on an EU-Copernicus C3S Data Rescue Service (DRS) that began in April and will run for 4 years initially, with the intention from Copernicus for it to be sustainable in the long term. A WWW site for C3S DRS should be made widely available soon, and I will pass that on when it is. In the meantime, this link will provide access to details of the ACRE/C3S DRS ACRE Argentina, ACRE South Africa and ACRE Antarctica regional data rescue foci activities: https://sites.google.com/a/met-acre.org/acre/acre-media-profile/C3S%20Data%20Rescue%20Service-SH.docx?attredirects=0&d=1

    I’d note that the 1st C3S DRS Capacity Building and 11th ACRE workshops will be held at NIWA in New Zealand in the week of the 4th of December 2017 (https://www.niwa.co.nz/c3s-acre-2017). The 12th ACRE Workshop is tentatively scheduled for October or November 2018 at the Minami-Osawa campus of Tokyo Metropolitan University in Tokyo, Japan, and the 2nd C3S DRS Capacity Building and 13th ACRE workshops will be held in Argentina around mid-March to April 2019.

  21. 21

    I think you have stated the wrong year in the caption of your example. Giovanni Cassini lived about a century earlier.