At first glance this seems like a strange question. Isn’t science precisely the quantification of observations into a theory or model, which is then used to make predictions? Yes. And are those predictions then tested against observations again and again, in different cases, to either validate those models or generate ideas for potential improvements? Yes, again. So the fact that climate modelling was recently singled out as being somehow non-scientific seems absurd.
Granted, the author of the statement in question has little idea of what climate modelling is, or how or why it’s done. However, that his statement can be quoted in a major US newspaper says much about the level of public knowledge concerning climate change and the models used to try and understand it. So I will try here to demonstrate how the science of climate modelling works, and yes, why it is a little different from some other kinds of science (not that there’s anything wrong with that!).
Climate is complex. Since climatologists don’t have access to hundreds of Earths to observe and experiment with, they need virtual laboratories that allow ideas to be tested in a controlled manner. The huge range of physical processes involved is encapsulated in what are called General Circulation Models (or GCMs). These models consist of connected sub-modules that deal with radiative transfer, the circulation of the atmosphere and oceans, the physics of moist convection and cloud formation, sea ice, soil moisture and the like. They contain our best current understanding of how the physical processes interact (for instance, how evaporation depends on the wind and surface temperature, or how clouds depend on the humidity and vertical motion) while conserving basic quantities like energy, mass and momentum. These estimates are based on physical theories and empirical observations made around the world. However, some processes occur at scales too small to be captured at the grid size available in these (necessarily global) models. These so-called ‘sub-gridscale’ processes therefore need to be ‘parameterised’.
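To give a feel for how the pieces fit together, here is a deliberately minimal sketch of a time-stepping loop that sums the tendencies contributed by each sub-module. Everything in it (the State container, the function names, the toy formulas) is an invented placeholder for illustration; real GCMs are vastly larger and are typically written in Fortran.

```python
from dataclasses import dataclass

@dataclass
class State:
    """Stand-in for the full 3-D model state (temperature, humidity, ice, etc.)."""
    temperature: float = 288.0   # global-mean surface temperature (K), toy value
    humidity: float = 0.6        # relative humidity (fraction), toy value

def radiative_transfer(s: State) -> float:
    """Toy heating rate (K per step) from solar and infrared radiation."""
    return -0.01 * (s.temperature - 288.0)

def moist_convection(s: State) -> float:
    """Toy stand-in for parameterised sub-gridscale convection and clouds."""
    return 0.001 * s.humidity

def step(s: State, dt: float = 1.0) -> State:
    """Advance one time step: each sub-module contributes a tendency, and the
    sum is applied so the overall energy budget can be tracked."""
    heating = radiative_transfer(s) + moist_convection(s)
    return State(temperature=s.temperature + dt * heating, humidity=s.humidity)

state = State()
for _ in range(100):      # integrate forward in time
    state = step(state)
print(round(state.temperature, 2))
```

A real model adds ocean and atmospheric dynamics, sea ice, soil moisture and many other components to the same basic loop, exchanging fluxes of energy, mass and momentum between them.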
A good example is related to clouds. Obviously, in an actual cloud, the relative humidity is close to 100%, but at a grid-box scale of hundreds of km, the mean humidity (even if there are quite a few clouds) will be substantially less. Thus a parameterisation is needed that relates the large-scale mean values to the distribution of clouds one would expect within a grid box. There are of course many different ways to do that, and the many modelling groups (in the US, Europe, Japan, Australia etc.) may each make different assumptions and come up with slightly different results.
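As a concrete (and deliberately over-simplified) illustration, one common class of diagnostic schemes estimates the cloud fraction in a grid box from how far the grid-box mean relative humidity exceeds some critical threshold. The functional form and the threshold value below are illustrative assumptions, not taken from any particular group's model:

```python
import numpy as np

def cloud_fraction(rh, rh_crit=0.8):
    """Diagnose grid-box cloud fraction from the grid-box mean relative humidity.

    Below the (tunable, here arbitrary) threshold rh_crit the box is assumed
    cloud-free; at rh = 1 it is fully overcast. Different modelling groups make
    different choices for both the threshold and the functional form.
    """
    rh = np.clip(rh, 0.0, 1.0)
    frac = 1.0 - np.sqrt((1.0 - rh) / (1.0 - rh_crit))
    return np.clip(frac, 0.0, 1.0)

# e.g. grid boxes at 50%, 85%, 95% and 100% mean relative humidity
print(cloud_fraction(np.array([0.50, 0.85, 0.95, 1.00])))
# -> approximately [0, 0.13, 0.5, 1]
```

The square-root form here is simply one way of making the cloud fraction rise smoothly from zero at the threshold to one at saturation; other defensible choices exist, which is part of why different models give slightly different answers.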
It’s important to note what these models are not good for. They aren’t any good for your local weather, the temperature of the water at the nearest beach, or the wind in downtown Manhattan, because these are small-scale features affected by very local conditions. However, if you go up to the regional scale and beyond (e.g. Western Europe as a whole, or the continental US) you start to expect better correlations.
One of the most important features of complex systems is that most of their interesting behaviour is emergent. It’s often found that the large-scale behaviour is not a priori predictable from the small-scale interactions that make up the system. So it is with climate models. If a change is made to the cloud parameterisation, it is difficult to tell ahead of time what impact that will have on, for instance, the climate sensitivity, because the number of possible feedback pathways (both positive and negative) is far too large to enumerate. You just have to put the change in, let the physics work itself out, and see what the effect is.
This means that validating these models is quite difficult. (NB. I use the term ‘validating’ not in the sense of ‘proving true’ (an impossibility), but in the sense of ‘being good enough to be useful’.) In essence, the validation must be done for the whole system if we are to have any confidence in the predictions about the whole system in the future. This validation is what most climate modellers spend almost all their time doing. First, we look at the mean climatology (i.e. are the large-scale features of the climate reasonably modelled? Does it rain where it should? Is there snow where there should be? Are the ocean currents and winds going the right way?), then at the seasonal cycle (what does the sea ice advance and retreat look like? Does the inter-tropical convergence zone move as it should?). Generally we find that the models actually do a reasonable job (see here or here for examples of different groups’ model validation papers). There are of course problematic areas (such as eastern boundary regions of the oceans, circulation near large mountain ranges etc.) where important small-scale processes may not be well understood or modelled, and these are the chief targets for further research by model developers and observationalists.
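For a sense of what the first of these checks involves in practice, a minimal sketch might compare a model's long-term mean field against a gridded observational estimate using an area-weighted error metric. The fields below are random placeholders standing in for real model output and an observational product:

```python
import numpy as np

def area_weighted_rmse(model, obs, lats):
    """Root-mean-square model-minus-observations error, weighted by cos(latitude)
    so that each grid box counts according to its actual area on the sphere."""
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(model)
    return np.sqrt(np.average((model - obs) ** 2, weights=weights))

# Placeholder climatologies on a coarse latitude-longitude grid; in practice these
# would be, e.g., annual-mean surface air temperature from a model run and from a
# station-based or reanalysis product.
lats = np.linspace(-89, 89, 90)
rng = np.random.default_rng(0)
model_clim = rng.normal(288.0, 15.0, size=(90, 180))
obs_clim = model_clim + rng.normal(0.0, 1.5, size=(90, 180))

print("area-weighted RMSE (K):", round(area_weighted_rmse(model_clim, obs_clim, lats), 2))
```

Maps of the bias (model minus observations), season by season, are examined in much the same way, and it is those maps that point to the problematic regions mentioned above.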
Then we look at climate variability. This step is key, but it is also quite subtle. There are two forms of variability: intrinsic variability (that occurs purely as a function of the internal chaotic dynamics of the system), and forced variability (changes that occur because of some external change, such as solar forcing). Note that ‘natural’ variability includes both intrinsic and forced components due to ‘natural’ forcings, such as volcanoes, solar or orbital changes. A clean comparison relies on either being able to isolate just one reasonably known forcing, or having enough data to be able to average over many examples and thus isolate the patterns associated solely with that forcing, even though in any particular case, more than one thing might have been happening. (A more detailed discussion of these points is available here).
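As a sketch of the ‘average over many examples’ approach, one can composite the response across several events of the same type, say large volcanic eruptions, so that the intrinsic variability tends to cancel while the forced signal remains. The time series and the event dates below are entirely synthetic placeholders:

```python
import numpy as np

def composite_response(series, event_indices, window=36, baseline=12):
    """Average a monthly series over a window following each event, expressed as an
    anomaly from the year before the event, so unforced wiggles tend to cancel."""
    segments = []
    for i in event_indices:
        reference = series[i - baseline:i].mean()
        segments.append(series[i:i + window] - reference)
    return np.mean(segments, axis=0)

# Synthetic example: internal 'weather' noise plus an imposed cooling after each event.
rng = np.random.default_rng(42)
temperature = rng.normal(0.0, 0.2, 1200)     # 100 years of monthly anomalies (made up)
eruptions = [100, 350, 620, 900]             # made-up eruption months
for month in eruptions:
    temperature[month:month + 36] -= 0.4 * np.exp(-np.arange(36) / 12.0)

response = composite_response(temperature, eruptions)
print("peak composite cooling (deg C):", round(response.min(), 2))
```

With only one event the forced signal would be hard to separate from the noise; averaging over several makes the common, forced part stand out, which is the same logic applied to the real forcing records.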
While there is good data over the last century, there were many different changes to the planet’s radiation balance (greenhouse gases, aerosols, solar forcing, volcanoes, land use changes etc.), some of which are difficult to quantify (for instance the indirect aerosol effects) and whose history is not well known. Earlier periods, say from 1850 back to the 1500s or so, have reasonable coverage from paleo-proxy data, and have only solar and volcanic forcing to consider. In my own group’s work, we have used the spatial patterns available from proxy reconstructions of this period to look at both solar and volcanic forcing in the pre-industrial period. In both cases, despite uncertainties (particularly in the magnitude of the solar forcing), the comparisons are encouraging.
Recent volcanoes have also provided very good tests of the models’ water vapour feedbacks (Soden et al., 2002), dynamical feedbacks (Graf et al., 1994; Stenchikov et al., 2002), and overall global cooling (Hansen et al., 1992). In fact, the Hansen et al. (1992) paper actually predicted the temperature impact of Pinatubo (around 0.5 deg C) before it was measured.
The mid-Holocene (6000 years ago) and the Last Glacial Maximum (~20,000 years ago) are also attractive targets for model validation, and while some successes have been noted (e.g. Joussaume et al., 1999; Rind and Peteet, 1985), there is still some uncertainty in the forcings and the response. Other periods such as the 8.2 kyr event or the Paleocene-Eocene Thermal Maximum are also useful, but clearly, the further back in time one goes, the more uncertain the test becomes.
The 20th Century, though, still provides the test that appears to be most convincing. That is to say, the models are run over the whole period with our best estimates of what the forcings were, and the results are compared to the observed record. If leaving out the anthropogenic effects fails to match the observed record, while including them succeeds, you have a quick-and-dirty way to do ‘detection and attribution’. (There is a much bigger literature that discusses more subtle and powerful ways to do D&A, so this isn’t the whole story by any means.) The most quoted example of this is from the Stott et al. (2000) paper shown in the figure. Similar results can be found in simple models (Crowley, 2000) and in more up-to-date models (Meehl et al., 2004).
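In code, the quick-and-dirty version of this comparison amounts to nothing more than asking which ensemble-mean history lies closer to the observations. The three series below are synthetic placeholders standing in for the observed global-mean temperature record and the two sets of model runs (natural forcings only, and natural plus anthropogenic):

```python
import numpy as np

def misfit(simulated, observed):
    """Root-mean-square difference between a simulated and an observed series."""
    return np.sqrt(np.mean((simulated - observed) ** 2))

# Synthetic placeholders for annual global-mean temperature anomalies, 1900-2000.
years = np.arange(1900, 2001)
rng = np.random.default_rng(7)
observed = 0.006 * (years - 1900) + rng.normal(0.0, 0.1, years.size)  # trend plus noise
natural_only = np.zeros(years.size)            # ensemble mean with no long-term trend
all_forcings = 0.006 * (years - 1900)          # ensemble mean including the anthropogenic trend

print("misfit, natural forcings only:", round(misfit(natural_only, observed), 3))
print("misfit, all forcings:         ", round(misfit(all_forcings, observed), 3))
```

The run that includes the anthropogenic forcings tracks the synthetic ‘observations’ far more closely, which is the essence of the argument; the real analyses cited above do the same thing with full spatial patterns and proper statistical treatment.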
It’s important to note that if the first attempt to validate the model fails (e.g. the signal is too weak (or too strong), or the spatial pattern is unrealistic), this leads to a re-examination of the physics of the model. This may then lead to additional changes, for example the incorporation of ozone feedbacks to solar changes, or the calculation of vegetation feedbacks to orbital forcing, which in each case improved the match to the observations. Sometimes, though, it is the observations that turn out to be wrong. For instance, for the Last Glacial Maximum, the model-data mismatches in tropical sea surface temperatures highlighted by Rind and Peteet (1985) have subsequently been more or less resolved in favour of the models.
So, in summary, the model results are compared to data, and if there is a mismatch, both the data and the models are re-examined. Sometimes the models can be improved; sometimes it is the data that turn out to have been misinterpreted. Every time this happens and we get improved matches between them, we have a little more confidence in their projections for the future, and we go out and look for better tests. That is in fact pretty close to the textbook definition of science.