Date: 2017-08-17
It’s often been said that while we can only gather new data about the planet at the rate of one year per year, rescuing old data can add far more data more quickly. Data rescue is, however, extremely labor intensive. Nonetheless, there are multiple data rescue projects and citizen science efforts ongoing, some of which we have highlighted here before. For those new to the subject, this 2014 article is a great introduction.
Weather diary from the Observatoire de Paris, written by Giovanni Cassini on 18th January 1789.
I was asked this week whether there was a list of these projects, and with a bit of help from Twitter, we came up with the following:
- Old Weather (@oldweather)
- Weather Detective (closing soon)
- Weather Rescue (coming soon) [Link to be added when it goes live]
- NOAA Climate Database Modernization Program
- New Zealand (@DeepSouth_NZ)
- The International Environmental Data Rescue Organization (IEDRO)
- Atmospheric Circulation Reconstruction over the Earth (@met_acre)
- The International Data Rescue Portal (i-Dare)
- Met Éireann (poster)
- Historical Climatology (list of more databases)
- Data Rescue at home
- Historical Canadian data
- SE Australia Recent Climate History (no longer active?)
- Congo basin eco-climatological data recovery and valorisation (COBECORE)
- The climate and environmental history collaborative research environment (Tambora)
(If you know of any more, please add them in the comments, and I’ll try and keep this list up to date).
You may list my project as well. The “Congo basin eco-climatological data recovery and valorisation” (COBECORE) project will digitize and transcribe old weather and ecologically relevant documents from the Institut National pour l’Etude Agronomique du Congo belge (INEAC), the agronomic research body of the then Belgian Congo, now the DR Congo.
You can find the website and project blog at the link below. We are just getting started but it should get interesting in the next few years.
http://cobecore.org
These scientific delvings are, of course, distinct from the more recent efforts, however imperfect, to shield U.S. government data sets from actions that a hostile administration might take.
I am involved with several of these projects, and the most obvious difficulty with the work is that it is extremely labour intensive: reading and interpreting historical documents takes a lot of person time.
It is likely that we could speed up the work dramatically by training a machine learning algorithm to extract temperature, pressure, etc. observations from (photographs of) documents. We’ve done enough tests to be confident that this is possible, but it’s HARD (not just a matter of deploying standard character recognition tools).
If any reader of this blog is tempted by a data-science problem that would make a difference in the world, why not give this a go? I can provide a million or so digital images that need their data extracted, and a few hundred thousand images marked up with manually extracted data (suitable for training an extraction algorithm).
It’s not a data rescue project as such, but some readers may be interested in data we hold on paper in the Climatic Research Unit (CRU) library at UEA. Many have been (partially) digitised already, but there is scope for more data digitisation in some instances.
The list is available here:
https://crudata.uea.ac.uk/cru/library/datasets.htm
http://www.meteofrance.fr/climat-passe-et-futur/l-etude-du-climat-passe
and data portal https://donneespubliques.meteofrance.fr/
I offer the following example of a productive data-rescue project:
Historical Vegetation of the Willamette Valley, Oregon, circa 1850:
Historic drought project for UK. http://historicdroughts.ceh.ac.uk/
Wouldn’t that be a perfect task for CAPTCHAs? I think Google already used CAPTCHAs for digitized Google Books when the letters couldn’t be recognized automatically. Doing the same with weather data would probably speed up this process a lot, and if values are only accepted when 10 or 20 users enter the same number, it could also reduce errors considerably compared to a completely manual process.
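To make the agreement rule concrete, here is a minimal sketch in Python. It assumes each observation has been typed in by several independent users and that their entries are held as a list of strings. The function name `consensus_value` and the `min_agreeing` threshold are hypothetical names chosen for illustration, not taken from any existing data rescue project.

```python
from collections import Counter
from typing import Optional, Sequence


def consensus_value(entries: Sequence[str], min_agreeing: int = 10) -> Optional[str]:
    """Return the most common entry if enough users agree on it, else None.

    `entries` holds the raw strings typed by different users for one
    observation; `min_agreeing` is an assumed threshold (10 here).
    """
    if not entries:
        return None
    # Strip surrounding whitespace so " 1013.2" and "1013.2" count as one value.
    counts = Counter(entry.strip() for entry in entries)
    value, votes = counts.most_common(1)[0]
    # Accept the value only once the required number of users have entered it.
    return value if votes >= min_agreeing else None
```

With the default threshold, ten users typing `1013.2` and three typing `1018.2` would yield `1013.2`, while nine matching entries would still return `None` and the image would stay in the queue for more users.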
Sea ice back to 1850: https://nsidc.org/data/g10010
Data set citation: Walsh, J. E., W. L. Chapman, and F. Fetterer. 2015. Gridded Monthly Sea Ice Extent and Concentration, 1850 Onward, Version 1. Boulder, Colorado USA. NSIDC: National Snow and Ice Data Center. doi: http://dx.doi.org/10.7265/N5833PZ5.
Also see:
Walsh, John E., Florence Fetterer, J. Scott Stewart, and William L. Chapman. 2016. A database for depicting Arctic sea ice variations back to 1850. Geographical Review. doi: 10.1111/j.1931-0846.2016.12195.x.
Here’s an article about it:
https://www.carbonbrief.org/guest-post-piecing-together-arctic-sea-ice-history-1850
Glacier photographs, also as far back as the 1850s: https://nsidc.org/data/glacier_photo/
Data collection citation: National Snow and Ice Data Center, compiler. 2002, updated 2015. Glacier Photograph Collection. Boulder, Colorado USA: National Snow and Ice Data Center. http://dx.doi.org/10.7265/N5/NSIDC-GPC-2009-12.
Many of these were rescued through the NOAA Climate Database Modernization Program. Now, a Council on Library and Information Resources grant is helping us finish the scanning project.
A nice overview! The South Eastern Australian Recent Climate History project is not active at the moment, but we hope to get the data rescue portal open again soon: https://ozdocs.climatehistory.com.au/
Another great source of data for the Australian region in the late 19th Century is the Todd Folios, painstakingly rescued by the Australian Meteorological Association: http://www.charlestodd.net/Todd_Folios/
I hope these worthwhile preservation efforts extend to data on man’s impact on the hydrosphere as well: anthropogenic sea lane fallout may be a useful, and retrievable, source of proxy data for palaeoclimatology.