Good thing? Of course.*
I was invited to give a short presentation to a committee at the National Academies last week on issues of reproducibility and replicability in climate science for a report they have been asked to prepare by Congress. My slides give a brief overview of the points I made, but basically the issue is not that there isn’t enough data being made available, but rather there is too much!
A small selection of climate data sources is given on our (cleverly named) “Data Sources” page and these and others are enormously rich repositories of useful stuff that climate scientists and the interested public have been diving into for years. Claims that have persisted for decades that “data” aren’t available are mostly bogus (to save the commenters the trouble of angrily demanding it, here is a link for data from the original hockey stick paper. You’re welcome!).
The issues worth talking about are, however, a little more subtle. First off, what definitions are being used here? This committee has decided that formally:
- Reproducibility is the ability to test a result using independent methods and alternate choices in data processing. This is akin to a different laboratory testing an experimental result or a different climate model showing the same phenomena etc.
- Replicability is the ability to check and rerun the analysis and get the same answer.
[Note that these definitions are sometimes swapped in other discussions.] The two ideas are probably best described as checking the robustness of a result, or rerunning the analysis. Both are useful in different ways. Robustness is key if you want to make a case that any particular result is relevant to the real world (though that is necessary, not sufficient) and if a result is robust, there’s not much to be gained from rerunning the specifics of one person’s/one group’s analysis. For sure, rerunning the analysis is useful for checking that the conclusions stemmed from the raw data, and is a great platform for subsequently testing its robustness (by making different choices for input data, analysis methods, etc.) as efficiently as possible.
So what issues are worth talking about? First, the big success in climate science with respect to robustness/reproducibility is the Coupled Model Intercomparison Project – all of the climate models from labs across the world running the same basic experiments with an open data platform that makes it easy to compare and contrast many aspects of the simulations. However, this data set is growing very quickly and the tools to analyse it have not scaled as well. So, while everything is testable in theory, bandwidth and computational restrictions make it difficult to do so in practice. This could be improved with appropriate server-side analytics (which are promised this time around) and the organized archiving of intermediate and derived data. Analysis code sharing in a more organized way would also be useful.
One minor issue is that while climate models are bit-reproducible at the local scale (something essential for testing and debugging), the environments for which that is true are fragile. Compilers, libraries, and operating systems change over time and preclude taking a code from, say, 2000 and the input files and getting exactly the same results (bit-for-bit) with simulations that are sensitive to initial conditions (like climate models). The emergent properties should be robust, and that is worth testing. There are ways to archive the run environment in digital ‘containers’, so this isn’t necessarily always going to be a problem, but this has not yet become standard practice. Most GCM codes are freely available (for instance, GISS ModelE, and the officially open source DOE E3SM).
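To make the bit-for-bit point concrete, here is a minimal sketch in Python. It is not taken from any GCM: the logistic map is just a standard toy chaotic system, and the `logistic_run` helper, the parameter `r=3.9`, and the `1e-12` perturbation are all assumptions chosen for illustration. The first part shows that merely regrouping a floating-point sum (the kind of thing a new compiler or library can do) changes the last bits; the second shows how a difference that small grows in a system that is sensitive to initial conditions.

```python
def logistic_run(x0, steps, r=3.9):
    """Iterate the logistic map, a toy chaotic system (hypothetical stand-in, not a climate model)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Same three numbers, different grouping: floating-point addition is not associative.
sum_left = (0.1 + 0.2) + 0.3
sum_right = 0.1 + (0.2 + 0.3)
rounding_differs = sum_left != sum_right

# A tiny perturbation (assumed 1e-12) stands in for accumulated rounding differences.
run_a = logistic_run(0.3, 200)
run_b = logistic_run(0.3 + 1e-12, 200)

# Largest step-by-step gap between the two runs.
max_gap = max(abs(a - b) for a, b in zip(run_a, run_b))

# Long-run averages are the kind of emergent statistic one would compare instead.
mean_a = sum(run_a) / len(run_a)
mean_b = sum(run_b) / len(run_b)

print("grouping changes the sum:", rounding_differs)
print("largest gap between runs:", max_gap)
print("run means:", mean_a, mean_b)
```

The individual trajectories stop matching, which is why exact replication needs the exact environment, while comparing statistics such as the two means is closer to the robustness test described above.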
There is more to climate science than GCMs of course. There are operational products (like GISTEMP – which is both replicable and reproducible), and paleo-climate records (such as are put together in projects like PAGES2K). The right standards for those projects are being actively discussed (see this string of comments or the LiPD project for instance).
In all of the real discussions, the issue is not whether to strive for R&R, but how to do it efficiently, usably, and without unfairly burdening data producers. The costs (if any) of making an analysis replicable are borne by the original scientists, while the benefits are shared across the community. Conversely, the costs of reproducing research are borne by the community, while benefits accrue to the original authors (if the research is robust) or to the community (if it isn’t).
One aspect that is perhaps under-appreciated is that if research is done knowing from the start that there will be a code and data archive, it is much easier to build that into your workflow. Creating usable archives as an afterthought is much harder. This lesson is one that is also true for specific communities – if we build an expectation for organized community archives and repositories it’s much easier for everyone to do the right thing.
[Update: My fault I expect, but for folks not completely familiar with the history here, this is an old discussion – for instance, “On Replication” from 2009, a suggestion for an online replication journal last year, multiple posts focused on replicating previously published work (e.g.) etc…]
* For the record, this does not imply support for the new EPA proposed rule on ‘transparency’**. This is an appallingly crafted ‘solution’ in search of a problem, promoted by people who really think that the science of air pollution impacts on health can be disappeared by adding arbitrary hoops for researchers to jump through. They are wrong.
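As one illustration of building the archive into the workflow from the start, the sketch below writes a checksum manifest for a project directory, so that anyone rerunning the analysis later can confirm they have the same code and input files. This is only one possible approach under stated assumptions: the helper names `sha256_of` and `build_manifest`, the file names, and the dummy file contents are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative path) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


# Demo on a throwaway directory with placeholder files (not real data).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "analysis.py").write_bytes(b"print('hello')\n")
    (Path(tmp) / "input.csv").write_bytes(b"col_a,col_b\n1,2\n")
    manifest = build_manifest(tmp)
    print(json.dumps(manifest, indent=2))
```

Regenerating the manifest at publication time and comparing it with the one stored alongside the results is a cheap check that nothing drifted in between.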
** Obviously this is my personal opinion, not an official statement.
The issue is not whether to strive for R&R, but how to make scientific progress efficiently.
The main advantage of sharing data and code is that it saves the rest of the scientific community time when building on your work. It is possible when building on someone’s work that you notice that something does not work and you have to go back and reproduce or replicate the results to understand why, but this is very rare in the natural sciences. (In contrast to more empirical fields like social psychology that do not have a firm theoretical foundation.)
The person most likely to build on your work is your future self. With increasingly good tools for Open Science the main beneficiary of good science practices is likely you.
I am best known for my validation study from 2012 of homogenization algorithms in the project HOME. Several climate “sceptics” have asked me where my data is. When I gave them the URL they were no longer interested.
As far as I know only one colleague has used the dataset afterwards and also just as one of many validation datasets, so it would not have been missed much. But I did make a new analysis on the data myself this week. Even after 6 years this was easy because everything was well organised to be able to publish the data. I was the main beneficiary.
Gavin, I am glad you agree with transparency.
How can this be an issue?
Why did it take Trump to force this to happen?
[Response: Huh? This is nothing to do with Trump, and everything to do with increasing community expectations, larger data volumes, journal policies, better and more usable tools being developed etc. See our previous discussions on the topic going back years. – gavin]
You might want to have a word with the NAS https://www.nas.org/projects/irreproducibility_report
A cross one perhaps.
[Response: They are not a serious operation, and the only case where they mention climate is factually wrong in almost every particular. – gavin]
1 Victor
Quote: “(In contrast to more empirical fields like social psychology that do not have a firm theoretical foundation.)”
Where is the “firmness” in climate science? Climate science has huge unknowns and the only way to run test scenarios is with models. Sure it has a firm theoretical foundation at some level, it’s just that no one knows how the pieces interact or which are most important. Where is the “firmness” in water vapor, clouds, aerosols, and ocean heat transfer/storage? We may very well understand more about the social interactions of people than we do about interactions in climate. How do I dare say these things? Scientists have written them.
https://eos.org/opinions/climate-models-are-uncertain-but-we-can-do-something-about-it
Hello Dan,
What is your provable background in climate research and where have you looked for answers to your questions?
“WattsUp…” is not a source with any credibility nor integrity.
Dan DaSilva: “Why did it take Trump to force this to happen?”
It took Trump’s corrupt cronies to convert a scientific movement to adapt to the new possibilities of the internet into a way to protect the political donors from inconvenient science showing that the donors are hurting the health and lives of Americans.
The abuse of the Trump administration should not be a reason to go back on something that is basically a good idea. If only because the world is a lot bigger than the USA.
Dan DaSilva: “Where is the “firmness” in climate science? Climate science has huge unknowns and the only way to run test scenarios is with models.”
Your buddies at WUWT & Co. manage to write blog posts about stuff that is physically impossible on a fairly regular basis. Many fundamentalists even deny the existence of the greenhouse effect itself. So clearly physics is helpful in weeding out bad ideas, at least within science.
If you hold a rock in your hand and you open it, the rock will fall down. If you hold a living organism in your hand and you open it, it may fly away. Nearly any empirical result is possible in the life sciences and you will thus have to put a lot of effort into understanding the processes behind it before you get to firmer physical grounds. That firmer ground includes quantified confidence intervals/uncertainties.
Eli Rabett: “You might want to have a word with the NAS”
For the innocent: This astroturf NAS, The National Association of Scholars, should not be confused with National Associations of Science, which are legitimate organisations having the interests of science in mind.
DDS,
Huge unknowns, huh? Such as? There are error bars on all the critical inputs. And for climate change not to be a serious problem, all those errors have to line up on your side. And based on the data we are seeing, it ain’t lookin’ too good for your side.
This is why we call you luckwarmers.
Seems to me a related issue is whether uncertainty is being biased low by the way the intercomparison project is run. One source of uncertainty is related to parameter choices. If there is no credible observational constraint for the choices, one should I think include a range of runs with a range of credible choices. This looks like a potential form of selection bias that is very common in CFD for example. You run the model varying the parameters until you get a credible result that you like. Then you publish that result, sometimes omitting the rest of the results because they “were not the best way to run the code.” The result is that since parameter choices are often not fully specified, replication is difficult and the overall literature is biased in the positive direction. A few papers have appeared recently on this issue. Zhao et al. had one on cloud microphysics models I think.
I don’t know if CMIP5 had such multiple runs of the same model. The number of parameters is in reality quite large ranging from grid density and type to numerical stabilization choices to time steps to turbulence model parameters.
But we’re not about to have another all-DDS-all-the-time thread, are we?
We’ve been through that before with his cheerfully verbose predecessors.
Same chant, different speaker.
Thanks for the pointer to NAS, digging through their website was fascinating:
“NAS was founded to confront the rise of campus political correctness.”
“1992 – NAS proposes and helps establish the American Association for Liberal Education (AALE), an accreditation body focused on liberal education, now located in Washington, D.C.” By liberal they don’t mean “liberal” but the old meaning of teaching philosophy, history, Latin – that kind of stuff – rather than an engineering, science or trade school curriculum. The site is a regular potpourri of Alice in Wonderland Humpty-Dumpty word re-purposing.
Their board is chock full of scientists: political science, linguistics, anthropology. “Prior to working at NAS he was the sole librarian at the John McEnroe Library..” We have “Professor of Philosophy, University of Texas at Austin, where he specializes in the application of logic to metaphysics, the philosophy of mind, political philosophy, and philosophy of religion.” There are no hard scientists on their board or staff.
If they can do it, so can I; I’ll start calling myself a NOAA scientist (national association of amateur anthropomorphisers …)
Dan DaSilva,
Trump had nothing to do with the push for reproducibility in climate or any other science. As Gavin pointed out, it is an age-old conversation that has been going on for years – history didn’t start on January 20, 2017.
As per your assertions that nothing in climate science is firm, that would require a few things – 1) having a flagrant disregard for learning and research (all the information you spoke of is out there and available), 2) an unwillingness to confront complex ideas in venues which are not built to simply regurgitate information in a way which conforms with your preconceived notions and 3) a vast and unsupported assertion that everything you hear (say from climate science) that you don’t like is a lie made up by those outside your “tribe”.
If you really want to understand climate science and not just have your nonsense reflected on you (which I doubt), the resources are out there (including this site). You just have to actually look and learn.
About reproducibility and replicability:
Sorry, did I say something wrong when submitting this url with comments? Or was it only lost?
about comments at #2 and #13 and so on
The awareness of issues around reproducing scientific data has been driven by the political nature of climate science, said Andrea Dutton, a geologist at the University of Florida and expert in sea-level rise.
“Climate science has undergone a lot of public scrutiny as we’re all aware,” she said. “And I think dealing with that has really increased our awareness as a community of being very rigorous about quantifying our uncertainties and being transparent in reporting, being transparent in data archiving.”
A group of researchers from the academies is reviewing the issue at the behest of Rep. Lamar Smith (R-Texas), chairman of the House Science, Space and Technology Committee. The National Academies will produce a report by the end of the year that explores the issue.
Smith has accused federal climate scientists of committing fraud and misrepresenting humanity’s role in driving climate change. He was also instrumental in helping shape a new rule proposed by EPA Administrator Scott Pruitt that would require research used by EPA to craft regulations to have data that are public and transparent.
https://www.scientificamerican.com/article/climate-science-can-be-more-transparent-researchers-say/
there seems to be so little shared reality in the human world these days. almost none.
about uncertainties and unknowns.
are the impacts from positive feedbacks for a summer ice free arctic circa 2030 or after included in any forecast climate model or the rcp 8.5 scenario (closest trend to current real trend) used in the ipcc ar5 and other papers?
Given the ar5 said there were no known changes in the AMOC at that time, is any slowing of the AMOC at 15% or more as shown @ http://www.realclimate.org/index.php/archives/2018/04/stronger-evidence-for-a-weaker-atlantic-overturning-circulation/ included in any published climate models?
Any examples where they are? Would like to see them. thank you
ab@14, You’ve got Tyndall’s experiment backwards. The rays entering the tube represent IR radiation from the Earth. The rays exiting the tube represent IR radiation to space. See https://en.wikipedia.org/wiki/John_Tyndall for more information.
Any well accepted credible climate science papers showing the extent of regional weather and climate impacts from a summer ice free arctic from 2030 or thereafter, say starting a sea ice free period of under 1 month out to 3 months?
Any well accepted credible climate science papers showing the impacts of arctic sea with only very small areas with MYI no more than 2 years old and with a summer ice free arctic period?
Any well accepted climate papers or IPCC reports projecting non-summer anomalies consistently above 10C as high as 20C in the high arctic?
Some refs would be appreciated. It’s a difficult task when over 250 new papers on climate science on average are now being published each week.
I cannot imagine how the new batch of ar6 volunteer reviewers and authors are supposed to effectively and accurately get through the huge numbers of new climate papers since the ar5 at an ever increasing rate to today but with only the same numbers of scientists and ipcc staff and resources of previous assessment reports. The stress levels must be very high while they all try to hold down their day jobs. No one can do the impossible over an extended period of time. Something has got to give.
Ray:
“For climate change not to be a serious problem, all those errors have to line up on your side. And based on the data we are seeing, it ain’t lookin’ too good for your side. This is why we call you luckwarmers.”
True believers in the Precautionary Principle who concatenate dozens of worst-case assumptions are equally at risk of Murphy’s Second Law:
If everything must go wrong, don’t bet on it.
DDS, #4–
Dan, you are getting carried away. While it is true, of course, that there are substantial uncertainties in understanding and/or modeling “water vapor, clouds, aerosols, and ocean heat transfer/storage”, let’s not forget that all of these are in fact modeled with considerable skill all the time.
You can’t say the same for basically anything in social psychology, as far as I know. Nor are the underlying mechanisms in social psychology nearly as well-understood as the physical mechanisms involved in climate studies.
Moreover, there is, contrary to your assertion, considerable understanding of “how the pieces interact [and] which are most important.”
For example, here’s a search of the water vapor feedback:
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C34&q=water+vapor+feedback+and+global+warming
There’s a lifetime of reading right there. But one indicator is to take the top search results ‘as they come’, then compare results filtered by year. Subjectively, it seems to me that more recent papers tend to be much more tightly focused in their scope, which I take to indicate that the ‘big picture’ questions are somewhat settled. (E.g., “The role of the water vapor feedback in the ITCZ response to hemispherically asymmetric forcings,” (2018) as opposed to “Water vapor feedback and global warming” (2000).)
Seriously, guys, read the Eos article, op. cit. above, if you want to follow the scientific conversation.
If you’d rather keep banging on the political spin, there’s https://undark.org/article/national-association-of-scholars-reproducibility/
If you are talking about Rayleigh scattering, the scattering is a bit smaller for water vapor than for N2 and O2 (~ 5 x 10^-27 cm2). The CO2 cross-section is maybe 2 times larger. While the scattering is significant throughout the atmosphere, as long as the pressure does not change much there would not be a significant difference. It’s doubtful that such a small scattering would have been seen by Tyndall. Ballpark Rayleigh scattering in his system would be of the order of 1 part in a hundred thousand.
If you are talking about water aerosols, that would make more sense.
See https://www.sciencedirect.com/science/article/pii/S0022407304002973
ab @16
You do realize that the burning of fossil fuels does not add gas molecules to the atmosphere. For every carbon atom combusted one O2 molecule is removed from the atmosphere to form every CO2 molecule that is returned. Net gain to the atmosphere is zero molecules.
AB @16
“For each molecule of the atmosphere, whatever is its chemical composition (so including CO2 and all GHG or non GHG as well), there is as much probability for a descending SW sun ray than for an upwelling IR ground ray to collide with the molecule, and both rays also have the same probability to be scattered….Conclusion: by adding gases in the atmosphere, there is less incoming SW energy and less upwelling IR energy, thus a net cooling of the global mean temperature. ”
Except only CO2 and water vapour etc absorb the IR, and this makes all the difference and creates a net warming effect. So the rest of your commentary is wrong.
“So the measured global warming in open lands can only be attributed to changes in convection, and thus to global deforestation, and not to any addition of gases within the atmosphere,”
So how do you explain warming oceans? A cooling stratosphere? More warming at night etc. More crank junk science I suppose.
AB @16
Are you coming up with this error yourself, or are you channeling the confused guy who blogs at “principia-scientific.org”?
P.S. for AB: Google really wants to be your friend. I searched on terms from your question. Here’s a relevant discussion:
https://www.researchgate.net/post/Greenhouse_gases-are_they_cooling_or_heating_the_earth
I think more equanimity would be an improvement in discussions here.
4 Dan DaSilva says: “Where is the “firmness” in climate science? Climate science has huge unknowns and the only way to run test scenarios is with models.”
22 Hank Roberts says: “Seriously, guys, read the Eos article, op. cit. above, if you want to follow the scientific conversation.”
Quoting the EOS article
“Model simulations of many climate phenomena remain highly uncertain despite scientific advances and huge amounts of data.”
“Scientists must do more to tackle model uncertainty head-on.”
“Model uncertainty is one of the biggest challenges we face in Earth system science, yet comparatively little effort is devoted to fixing it”.
“aerosol radiative forcing of climate, for which the uncertainty range has remained essentially unchanged through all IPCC assessment reports since 1995”
“Without such reductions in uncertainty, the science we do will not, by itself, be sufficient to provide robust information for governments, policy makers, and the public at large.”
There is no difference between the meaning of “Climate science has huge unknowns” versus climate scientists saying “Model simulations of many climate phenomena remain highly uncertain”.
adjective: uncertain – not able to be relied on; not known or definite.
adjective: unknown – not known or familiar.
Even if DDS is seen as a contrary denier that’s no justification to distort or deny the obvious meaning of his words when it is no different to what climate scientists have said for decades.
One reason we try to reproduce work.
http://www.bbc.com/news/magazine-22223190
Carrie,
In science, “uncertain”=/=”unknown”. Uncertainty implies error bars. Error bars make things manageable. Aunt Judy is being disingenuous with her “uncertainty monster”. DDS is being either disingenuous himself or selectively credulous.
Carrie #28,
Sure there is– a (“huge”) difference.
This is the old business of accuracy v precision, rhetorically framed to be misleading. There’s a “huge” difference between saying “GMST will increase by zero +/- 1C” and “GMST will increase by 2C +/- 1C”.
And even “…be sufficient to provide robust information for governments, policy makers, and the public at large.” is essentially rhetorical.
Steven Emmerson @ 18,
250 new papers per week is a lot to review indeed, and sadly, most of them are clearly based on the incorrect assumption or belief in GHG having the ability to increase the global mean temperature, from the nineteenth century, when they physically can’t. The science is clear, but many scientists are not looking at facts, clear, unambiguous and experimentally demonstrated.