How do we know what caused climate to change – or even if anything did?
This is a central question with respect to recent temperature trends, but of course it is much more general and applies to a whole range of climate changes over all time scales. Judging from comments we receive here and discussions elsewhere on the web, there is a fair amount of confusion about how this process works and what can (and cannot) be said with confidence. For instance, many people appear to (incorrectly) think that attribution is just based on a naive correlation of the global mean temperature, or that it is impossible to do unless a change is ‘unprecedented’, or that the answers are based on our lack of imagination about other causes.
In fact the process is more sophisticated than these misconceptions imply and I’ll go over the main issues below. But the executive summary is this:
- You can’t do attribution based only on statistics
- Attribution has nothing to do with something being “unprecedented”
- You always need a model of some sort
- The more distinct the fingerprint of a particular cause is, the easier it is to detect
Note that it helps enormously to think about attribution in contexts that don’t have anything to do with anthropogenic causes. For some reason that allows people to think a little bit more clearly about the problem.
First off, think about the difference between attribution in an observational science like climatology (or cosmology etc.) compared to a lab-based science (microbiology or materials science). In a laboratory, it’s relatively easy to demonstrate cause and effect: you set up the experiments – and if what you expect is a real phenomenon, you should be able to replicate it over and over again and get enough examples to demonstrate convincingly that a particular cause has a particular effect. Note that you can’t demonstrate that a particular effect can have only that cause, but should you see that effect in the real world and suspect that your cause is also present, then you can make a pretty good (though not 100%) case that a specific cause is to blame.
Why do you need a laboratory to do this? It is because the real world is always noisy – there is always something else going on that makes our (reductionist) theories less applicable than we’d like. Outside, we don’t get to perfectly stabilise the temperature and pressure, we don’t control the turbulence in the initial state, and we can’t shield the apparatus from cosmic rays etc. In the lab, we can do all of those things and ensure that (hopefully) we can boil the experiment down to its essentials. There is of course still ‘noise’ – imprecision in measuring instruments etc. and so you need to do it many times under slightly different conditions to be sure that your cause really does give the effect you are looking for.
The key to this kind of attribution is repetition, and this is where it should become obvious that for observational sciences, you are generally going to have to find a different way forward, since we don’t generally get to rerun the Holocene, or the Big Bang or the 20th Century (thankfully).
Repetition can be useful when you have repeating events in Nature – the ice age cycles, tides, volcanic eruptions, the seasons etc. These give you a chance to integrate over any unrelated confounding effects to get at the signal. For the impacts of volcanic eruptions in general, this has definitely been a useful technique (from Robock and Mao (1992) to Shindell et al (2004)). But many of the events that have occurred in geologic history are singular, or perhaps they’ve occurred more frequently but we only have good observations from one manifestation – the Paleocene-Eocene Thermal Maximum, the KT impact event, the 8.2 kyr event, the Little Ice Age etc. – and so another approach is required.
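The idea of integrating over repeated events can be sketched with a simple composite (sometimes called superposed epoch analysis). This is only an illustration on synthetic data, not the method of the papers cited above: the event times, the assumed `response` shape, and the noise level are all invented for the example.

```python
import random

def composite(series, event_indices, window):
    """Average the series over equal-length windows that each start at an event."""
    segments = [series[i:i + window] for i in event_indices
                if i + window <= len(series)]
    return [sum(vals) / len(vals) for vals in zip(*segments)]

rng = random.Random(0)
n, window = 5000, 5
response = [0.0, -0.5, -0.3, -0.1, 0.0]   # assumed (made-up) response to one event
events = list(range(50, n - window, 50))  # hypothetical, evenly spaced event times

# Unrelated "noise" with the same amplitude as the peak response,
# so that no single event stands out clearly on its own.
series = [rng.gauss(0.0, 0.5) for _ in range(n)]
for e in events:
    for k, r in enumerate(response):
        series[e + k] += r

# Averaging across events suppresses the unrelated noise and keeps the common signal.
comp = composite(series, events, window)
print([round(v, 2) for v in comp])
```

With roughly a hundred events the noise in the composite shrinks by about a factor of ten, so the assumed response shape should emerge even though it is buried in any individual case.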
In the real world we attribute singular events all the time – in court cases for instance – and so we do have practical experience of this. If the evidence linking specific bank-robbers to a robbery is strong, prosecutors can get a conviction without the crimes needing to have been ‘unprecedented’, and without having to specifically prove that everyone else was innocent. What happens instead is that prosecutors (ideally) create a narrative for what they think happened (let’s call that a ‘model’ for want of a better word), work out the consequences of that narrative (the suspect should have been seen by that camera at that moment, the DNA at the scene will match a suspect’s sample, the money will be found in the freezer etc.), and they then try and find those consequences in the evidence. It’s obviously important to make sure that the narrative isn’t simply a ‘just-so’ story, in which circumstances are strung together to suggest guilt, but for which no further evidence is found to back up that particular story. Indeed these narratives are much more convincing when there is ‘out of sample’ confirmation.
We can generalise this: what is required is a model of some sort that makes predictions for what should and should not have happened depending on some specific cause, combined with ‘out of sample’ validation of the model against events or phenomena that were not known about or used in the construction of the model.
Models come in many shapes and sizes. They can be statistical, empirical, physical, numerical or conceptual. Their utility is predicated on how specific they are, how clearly they distinguish their predictions from those of other models, and the avoidance of unnecessary complications (“Occam’s Razor”). If all else is equal, a more parsimonious explanation is generally preferred as a working hypothesis.
The overriding requirement however is that the model must be predictive. It can’t just be a fit to the observations. For instance, one can fit a Fourier series to a data set that is purely random, but however accurate the fit is, it won’t give good predictions. Similarly a linear or quadratic fit to a time series can be a useful form of descriptive statistics, but without any reason to think that there is an underlying basis for such a trend, it has very little predictive value. In fact, any statistical fit to the data is necessarily trying to match observations using a mathematical constraint (i.e. trying to minimise the mean square residual, or the gradient, using sinusoids, or wavelets, etc.) and since there is no physical reason to assume that any of these constraints apply to the real world, no purely statistical approach is going to be that useful in attribution (despite it being attempted all the time).
To be clear, defining any externally forced climate signal as simply the linear, quadratic, polynomial or spline fit to the data is not sufficient. The corollary which defines ‘internal climate variability’ as the residual from that fit doesn’t work either.
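The point about fitting a Fourier series to purely random data can be checked directly. The following is a minimal sketch under stated assumptions: the "observations" are synthetic Gaussian noise, and the helper names `fourier_fit` and `mse`, the choice of 40 harmonics, and the 100/100 split are arbitrary choices for illustration.

```python
import math
import random

def fourier_fit(y, n_harmonics):
    """Least-squares Fourier series for evenly spaced data.

    Requires n_harmonics < len(y) / 2, so that the sine and cosine terms are
    orthogonal and simple projections give the least-squares coefficients.
    """
    n = len(y)
    mean = sum(y) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a = 2.0 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(y))
        b = 2.0 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(y))
        coeffs.append((k, a, b))

    def model(t):
        return mean + sum(a * math.cos(2 * math.pi * k * t / n)
                          + b * math.sin(2 * math.pi * k * t / n)
                          for k, a, b in coeffs)
    return model

def mse(model, times, values):
    """Mean square difference between the model and the given values."""
    return sum((model(t) - v) ** 2 for t, v in zip(times, values)) / len(values)

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]  # purely random "observations"
train, later = data[:100], data[100:]

model = fourier_fit(train, n_harmonics=40)        # fit the first half only
mse_in = mse(model, range(100), train)            # how well the fit matches what it saw
mse_out = mse(model, range(100, 200), later)      # how well it predicts what it did not
print(f"in-sample MSE {mse_in:.2f}, out-of-sample MSE {mse_out:.2f}")
```

The fit tracks the first half closely, but its extrapolation to the second half should do worse than simply predicting zero, since there was never any structure in the data to find.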
So what can you do? The first thing to do is to get away from the idea that you can only be using single-valued metrics like the global temperature. We have much more information than that – patterns of changes across the surface, through the vertical extent of the atmosphere, and in the oceans. Complex spatial fingerprints of change can do a much better job at discriminating between competing hypotheses than simple multiple linear regression with a single time-series. For instance, a big difference between solar forced changes compared to those driven by CO2 is that the stratosphere changes in tandem with the lower atmosphere for solar changes, but they are opposed for CO2-driven change. Aerosol changes often have specific regional patterns of change that can be distinguished from changes from well-mixed greenhouse gases.
The expected patterns for any particular driver (the ‘fingerprints’) can be estimated from a climate model, or even a suite of climate models with the differences between them serving as an estimate of the structural uncertainty. If these patterns are robust, then one can have confidence that they are a good reflection of the underlying assumptions that went into building the models. Given these fingerprints for multiple hypothesised drivers (solar, aerosols, land-use/land cover change, greenhouse gases etc.), we can then examine the real world to see if the changes we see can be explained by a combination of them. One important point to note is that it is easy to account for some model imperfections – for instance, if the solar pattern is underestimated in strength we can test for whether a multiplicative factor would improve the match. We can also apply some independent tests on the models to try and make sure that only the ‘good’ ones are used, or at least demonstrate that the conclusions are not sensitive to those choices.
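The multiplicative-factor test described above can be sketched as a regression of the observed pattern onto the model fingerprints. This is a deliberately simplified illustration using ordinary least squares with two fingerprints; published detection and attribution studies use more elaborate regressions that also account for internal variability. All numbers below are invented: the six entries of each pattern are meant as four lower-atmosphere boxes followed by two stratospheric boxes, and the "observations" are constructed so that the real solar response is twice as strong as the modelled one.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scaling_factors(obs, fp1, fp2):
    """Ordinary least squares for obs ~ b1*fp1 + b2*fp2 via the 2x2 normal equations."""
    a11, a12, a22 = dot(fp1, fp1), dot(fp1, fp2), dot(fp2, fp2)
    r1, r2 = dot(fp1, obs), dot(fp2, obs)
    det = a11 * a22 - a12 * a12
    return (r1 * a22 - r2 * a12) / det, (a11 * r2 - a12 * r1) / det

# Hypothetical model fingerprints (arbitrary units).
ghg = [1.0, 0.8, 1.2, 0.9, -0.5, -0.8]      # stratosphere opposed to lower atmosphere
solar = [0.3, 0.2, 0.3, 0.25, 0.2, 0.3]     # stratosphere in tandem with lower atmosphere

# Synthetic "observations": modelled GHG response, a solar response twice
# as strong as modelled, plus a little unrelated noise.
rng = random.Random(3)
obs = [1.0 * g + 2.0 * s + rng.gauss(0.0, 0.05) for g, s in zip(ghg, solar)]

beta_ghg, beta_solar = scaling_factors(obs, ghg, solar)
print(f"GHG scaling {beta_ghg:.2f}, solar scaling {beta_solar:.2f}")
```

A scaling factor near one suggests the modelled amplitude is about right, while a factor well above one (as constructed here for the solar pattern) suggests the model underestimates the strength of that response. The two factors can only be separated because the patterns differ, which is why distinct fingerprints matter.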
These techniques, of course, make some assumptions. Firstly, that the spatio-temporal pattern associated with a particular forcing is reasonably accurate (though the magnitude of the pattern can be too large or small without causing a problem). To a large extent this is the case – the stratospheric cooling/tropospheric warming pattern associated with CO2 increases is well understood, as are the qualitative land vs. ocean/Northern vs. Southern/Arctic amplification features. The exact value of polar amplification is quite uncertain, but this affects all the response patterns and so is not a crucial factor. More problematic are results that indicate that specific forcings might impact existing regional patterns of variability, like the Arctic Oscillation or El Niño. In those cases, clearly distinguishing internal natural variability from the forced change is more difficult.
In all of the above, estimates are required of the magnitude and patterns of internal variability. These can be derived from model simulations (for instance in their pre-industrial control runs with no forcings), or estimated from the observational record. The latter is problematic because there is no ‘clean’ period where there was only internal variability occurring – volcanoes, solar variability etc. have been affecting the record even prior to the 20th Century. Thus the most straightforward estimates come from the GCMs. Each model has a different expression of the internal variability – some have too much ENSO activity for instance while some have too little, or the timescale for multi-decadal variability in the North Atlantic might vary from 20 to 60 years. Conclusions about the magnitude of the forced changes need to be robust to these different estimates.
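One simple way to use an unforced control run is to ask how often internal variability alone produces a trend as large as the one in question. The sketch below assumes a stand-in for a control run: a synthetic AR(1) series whose parameters, segment length, and `candidate` trend are all made up for illustration and do not come from any GCM or observational data set.

```python
import random

def linear_trend(y):
    """Least-squares slope of y against its index (units of y per step)."""
    n = len(y)
    tbar = (n - 1) / 2
    ybar = sum(y) / n
    num = sum((t - tbar) * (v - ybar) for t, v in enumerate(y))
    den = sum((t - tbar) ** 2 for t in range(n))
    return num / den

def control_trends(control, length):
    """Trends in consecutive, non-overlapping segments of an unforced run."""
    return [linear_trend(control[i:i + length])
            for i in range(0, len(control) - length + 1, length)]

# Stand-in for a control run: AR(1) noise with no forcing at all.
rng = random.Random(2)
control, x = [], 0.0
for _ in range(3000):
    x = 0.5 * x + rng.gauss(0.0, 0.1)
    control.append(x)

trends = control_trends(control, 30)
candidate = 0.02  # hypothetical trend to be tested, same units per step
frac = sum(abs(t) >= candidate for t in trends) / len(trends)
print(f"{len(trends)} control segments, fraction with |trend| >= candidate: {frac:.2f}")
```

Repeating the same check with variability estimates from several different models is one way to see whether a conclusion is robust to the choice of model.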
So how might this work in practice? Take the impact of the Pinatubo eruption in 1991. Examination of the temperature record over this period shows a slight cooling, peaking in 1992-1993, but these temperatures were certainly not ‘unprecedented’, nor did they exceed the bounds of observed variability, yet it is well accepted that the cooling was attributable to the eruption. Why? First off, there was a well-observed change in the atmospheric composition (a layer of sulphate aerosols in the lower stratosphere). Models ranging from 1-dimensional radiative transfer models to full GCMs all suggest that these aerosols were sufficient to alter the planetary energy balance and cause global cooling in the annual mean surface temperatures. They also suggest that there would be complex spatial patterns of response – local warming in the lower stratosphere, increases in reflected solar radiation, decreases in outgoing longwave radiation, dynamical changes in the northern hemisphere winter circulation, decreases in tropical precipitation etc. These changes were observed in the real world too, and with very similar magnitudes to those predicted. Indeed many of these changes were predicted by GCMs before they were observed.
I’ll leave it as an exercise for the reader to apply the same reasoning to the changes related to increasing greenhouse gases, but for those interested the relevant chapter in the IPCC report is well worth reading, as are a couple of recent papers by Santer and colleagues.