On mismatches between models and observations

When a comparison is being made in a specific experiment though, there are a few additional considerations. Any particular simulation (and hence diagnostic from it) arises as a result from a collection of multiple assumptions – in the model physics itself, the forcings of the simulation (such as the history of aerosols in a 20th Century experiment), and the initial conditions used in the simulation. Each potential source of the mismatch needs to be independently examined.

3. Flawed Comparisons

Even with a near-perfect model and accurate observations, model-observation comparisons can show big discrepancies because the diagnostics being compared while similar in both cases, actually end up be subtly (and perhaps importantly) biased. This can be as simple as assuming an estimate of the global mean surface temperature anomaly is truly global when it in fact has large gaps in regions that are behaving anomalously. This can be dealt with by masking the model fields prior to averaging, but it isn’t always done. Other examples have involved assuming the MSU-TMT record can be compared to temperatures at a specific height in the model, instead of using the full weighting profile. Yet another might be comparing satellite retrievals of low clouds with the model averages, but forgetting that satellites can’t see low clouds if they are hiding behind upper level ones. In paleo-climate, simple transfer functions of proxies like isotopes can often be complicated by other influences on the proxy (e.g. Werner et al, 2000). It is therefore incumbent on the modellers to try and produce diagnostics that are commensurate with what the observations actually represent.

Flaws in comparisons can be more conceptual as well – for instance comparing the ensemble mean of a set of model runs to the single realisation of the real world. Or comparing a single run with its own weather to a short term observation. These are not wrong so much as potentially misleading – since it is obvious why there is going to be a discrepancy, albeit one that doesn’t have much implications for our understanding.


The implications of any specific discrepancy therefore aren’t immediately obvious (for those who like their philosophy a little more academic, this is basically a rephrasing of the Quine/Duhem position on scientific underdetermination). Since any actual model prediction depends on a collection of hypotheses together, as do the ‘observation’ and the comparison, there are multiple chances for errors to creep in. It takes work to figure out where though.

The alternative ‘Popperian’ view – well encapsulated by Richard Feynman:

… we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong.

actually doesn’t work except in the purest of circumstances (and I’m not even sure I can think of a clean example). A recent obvious counter-example in physics was the fact that the ‘faster-than-light’ neutrino experiment has not falsified special relativity – despite Feynman’s dictum.

But does this exposition help in any current issues related to climate science? I think it does – mainly because it forces one to think about the other ancillary hypotheses are. For three particular mismatches – sea ice loss rates being much too low in CMIP3, tropical MSU-TMT rising too fast in CMIP5, or the ensemble mean global mean temperatures diverging from HadCRUT4 – it is likely that there are multiple sources of these mismatches across all three categories described above. The sea ice loss rate seems to be very sensitive to model resolution and has improved in CMIP5 – implicating aspects of the model structure as the main source of the problem. MSU-TMT trends have a lot of structural uncertainty in the observations (note the differences in trends between the UAH and RSS products). And global mean temperature trends are quite sensitive to observational products, masking, forcings in the models, and initial condition sensitivity.

Working out what is responsible for what is, as they say, an “active research question”.

Update: From the comments:

Page 2 of 3 | Previous page | Next page


  1. M. Werner, U. Mikolajewicz, M. Heimann, and G. Hoffmann, "Borehole versus isotope temperatures on Greenland: Seasonality does matter", Geophysical Research Letters, vol. 27, pp. 723-726, 2000. http://dx.doi.org/10.1029/1999GL006075