
Verification of regional model trends

Filed under: — rasmus @ 15 April 2013

Guest post by Geert Jan van Oldenborgh, Francisco Doblas-Reyes, Sybren Drijfhout and Ed Hawkins

Climate information for the future is usually presented in the form of scenarios: plausible and consistent descriptions of future climate without probability information. This suffices for many purposes, but for the near term, say up to 2050, scenarios of emissions of greenhouse gases do not diverge much and we could work towards climate forecasts: calibrated probability distributions of the climate in the future.

This would be a logical extension of the weather, seasonal and decadal forecasts in existence or being developed (Palmer, BAMS, 2008). In these fields a fundamental forecast property is reliability: when the forecast probability of rain tomorrow is 60%, it should rain on 60% of all days with such a forecast.
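The reliability property can be illustrated with a toy check on synthetic data (the probabilities and sample size below are made up for illustration; this is not from any real forecast system):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy example: 4000 days with forecast rain probabilities and synthetic outcomes.
# For a reliable system, when the forecast says 60% it should rain on roughly
# 60% of those days.
p_forecast = rng.choice([0.2, 0.4, 0.6, 0.8], size=4000)
rained = rng.random(4000) < p_forecast  # outcomes drawn consistent with the forecasts

for p in [0.2, 0.4, 0.6, 0.8]:
    observed_freq = rained[p_forecast == p].mean()
    print(f"forecast {p:.0%}: observed frequency {observed_freq:.0%}")
```

Grouping forecasts by their stated probability and comparing with the observed frequency in each group is exactly the check applied to re-forecasts of past periods.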

This is routinely checked: before a new model version is introduced a period in the past is re-forecast and it is verified that this indeed holds. In seasonal forecasting a reliable forecast is often constructed on the basis of a multi-model ensemble, as forecast systems tend to be overconfident (they underestimate the actual uncertainties).

As the climate change signal is now emerging from the noise in many regions of the world, the verification of regional past trends in climate models has become possible. The question is whether the recent CMIP5 multi-model ensemble, interpreted as a probability forecast, is reliable.

As there is only one trend estimate per grid point, necessarily the verification has to be done spatially, over all regions of the world. The CMIP3 ensemble was analysed in this way by Räisänen (2007) and Yokohata et al. (2012). In the last few months three papers have appeared that approach this question for the CMIP5 ensemble with different methodologies: Bhend and Whetton (2013), van Oldenborgh et al. (2013) and Knutson et al (J. Climate, to appear).

All these studies reach similar conclusions. For temperature: the ensemble is reliable if one considers the full signal, but this is due to the differing global mean temperature responses (Total Climate Responses, TCR).

When the global mean temperature trend is factored out, the ensemble becomes overconfident: the spatial variability is too low. For annual mean precipitation the ensemble is also found to be overconfident. Precipitation trends in 3-month seasons have so much natural variability compared to the trends that the overconfidence is no longer visible.

These conclusions match with earlier work using the Detection and Attribution framework showing that the continental-averaged temperature trends can be attributed to anthropogenic factors (eg Stott et al, 2003), but zonally-averaged precipitation trends are not reproduced correctly by climate models (Zhang et al, 2007).

The spatial patterns for annual mean temperature and precipitation are shown in figure 1 below. The trends are defined as regressions on the modelled global mean temperature, i.e., we plot B(x,y) in

(1) T(x,y,t) = B(x,y) Tglobal,mod(t) + η(x,y,t)

This definition excludes the TCR and minimises the noise η(x,y,t) better than a trend that is linear in time.
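Eq.(1) amounts to a least-squares regression of each grid point's temperature on the global mean temperature. A minimal sketch with synthetic fields (the grid size, trend and noise amplitudes are all hypothetical, chosen only to make the estimate visible):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic illustration of Eq.(1): regress local annual-mean temperature
# T(x,y,t) on the global mean temperature T_global(t) to estimate B(x,y).
nt, ny, nx = 62, 4, 5                                   # years 1950-2011, toy grid
t_global = 0.012 * np.arange(nt) + 0.1 * rng.standard_normal(nt)
b_true = rng.uniform(0.5, 1.5, size=(ny, nx))           # local amplification factors
noise = 0.2 * rng.standard_normal((nt, ny, nx))         # weather noise eta(x,y,t)
t_local = b_true[None, :, :] * t_global[:, None, None] + noise

# Least-squares slope: covariance with T_global divided by its variance
tg = t_global - t_global.mean()
tl = t_local - t_local.mean(axis=0)
b_est = np.tensordot(tg, tl, axes=(0, 0)) / (tg ** 2).sum()
print(b_est.round(2))
```

Because the regressor is the global mean temperature rather than time, the estimated B(x,y) excludes the global response and absorbs less weather noise, as the text notes.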

Figure 1: Panels a and b show the trend in the annual mean GISTEMP temperature analysis and the GPCC precipitation analysis over 1950–2011, defined by Eq.(1). Panels c and d show the same for the CMIP5 multi-model mean (historical+RCP4.5). Panels e and f show the percentile of the observed trend in the CMIP5 ensemble of trends. Coloured areas denote where the observed trend is in the tails of the ensemble. Panels g and h collect these percentiles (north of 45ºS) in rank histograms. The top and bottom 5% should each occur only 5% of the time, but in fact two to four times more of the map is in these percentiles. The grey area in the rank histograms denotes the 90% confidence interval.

The conclusion that the ensemble is somewhat overconfident is based on the bottom two panels. These show that over 10%–20% of the map the observed trends are in the top or bottom 5% of the ensemble. For a reliable ensemble this should be 5% for each tail. The deviations are larger than we obtain from the differences between the models (the grey area).
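The tail-counting behind the rank histograms can be sketched as follows, using synthetic Gaussian trends (the ensemble size, number of grid points and the overdispersion of the "observations" are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch of the verification in panels g and h: at each grid point, find the
# percentile of the "observed" trend within the ensemble of model trends, then
# count how often it falls in the top or bottom 5%.
n_models, n_points = 40, 2000
ens_trends = rng.standard_normal((n_models, n_points))  # model ensemble trends
obs_trend = 1.3 * rng.standard_normal(n_points)         # overconfident case:
                                                        # obs spread > ensemble spread

# Rank of the observation among the ensemble members, as a percentile
percentile = (ens_trends < obs_trend).mean(axis=0)
tail_fraction = ((percentile < 0.05) | (percentile > 0.95)).mean()
print(f"fraction in the 5% tails: {tail_fraction:.1%}  (reliable ensemble: ~10%)")
```

When the observations have more spread than the ensemble, as in this toy setup, the combined tail fraction rises well above the roughly 10% a reliable ensemble would give, which is the signature of overconfidence described above.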

On the maps above the areas where the observed trends fall in the tails of the ensemble are coloured. In part of these areas the discrepancies are due to random weather fluctuations, but a large fraction has to be ascribed to forecast system biases. (The results do not depend strongly on the observational dataset used: with HadCRUT4.1.1.0, NCDC LOST and CRU TS 3.1 we obtain very similar figures, see the Supplementary Material of van Oldenborgh et al).

These forecast system biases can arise in three ways.

First, the models may underestimate low-frequency natural variability. Knutson et al show that natural variability in the warm pool around the Maritime Continent is indeed underestimated up to time scales of >10 years, contributing to the discrepancy there in Fig.1e. In most other regions the models have the correct or too large variability.

Another cause may be the incorrect specification of local forcings such as aerosols or land use. As an example, visibility observations suggest that aerosol loadings in Europe were higher in winter in the 1950s than assumed in CMIP5. This influences temperature via mist and fog (Vautard et al, 2009) and other mechanisms.

Finally, the model response to the changes in greenhouse gases, aerosols and other forcings may be incorrect. The trend differences in Asia and Canada are mainly in winter and could be due to problems in simulating the stable boundary layers there.

To conclude, climate models can and have been verified against observations in a property that is most important for many users: the regional trends. This verification shows that many large-scale features of climate change are being simulated correctly, but smaller-scale observed trends are in the tails of the ensemble more often than predicted by chance fluctuations. The CMIP5 multi-model ensemble can therefore not be used as a probability forecast for future climate. We have to present the useful climate information in climate model ensembles in other ways until these problems have been resolved.


  1. T.N. Palmer, F.J. Doblas-Reyes, A. Weisheimer, and M.J. Rodwell, "Toward Seamless Prediction: Calibration of Climate Change Projections Using Seasonal Forecasts", Bulletin of the American Meteorological Society, vol. 89, pp. 459-470, 2008.
  2. J. Räisänen, "How reliable are climate models?", Tellus A: Dynamic Meteorology and Oceanography, vol. 59, pp. 2-29, 2007.
  3. T. Yokohata, J.D. Annan, M. Collins, C.S. Jackson, M. Tobis, M.J. Webb, and J.C. Hargreaves, "Reliability of multi-model and structurally different single-model ensembles", Climate Dynamics, vol. 39, pp. 599-616, 2012.
  4. J. Bhend, and P. Whetton, "Consistency of simulated and observed regional changes in temperature, sea level pressure and precipitation", Climatic Change, vol. 118, pp. 799-810, 2013.
  5. G.J. van Oldenborgh, F.J. Doblas Reyes, S.S. Drijfhout, and E. Hawkins, "Reliability of regional climate model trends", Environmental Research Letters, vol. 8, pp. 014055, 2013.
  6. P.A. Stott, "Attribution of regional-scale temperature changes to anthropogenic and natural causes", Geophysical Research Letters, vol. 30, 2003.
  7. X. Zhang, F.W. Zwiers, G.C. Hegerl, F.H. Lambert, N.P. Gillett, S. Solomon, P.A. Stott, and T. Nozawa, "Detection of human influence on twentieth-century precipitation trends", Nature, vol. 448, pp. 461-465, 2007.
  8. R. Vautard, P. Yiou, and G.J. van Oldenborgh, "Decline of fog, mist and haze in Europe over the past 30 years", Nature Geoscience, vol. 2, pp. 115-119, 2009.

19 Responses to “Verification of regional model trends”

  1. 1
    Bob Tisdale says:

    Geert Jan van Oldenborgh’s video abstract of the “Reliability of regional climate model trends” paper is very well done:

    Thanks, Geert Jan.

  2. 2
    John Parsons says:

    A layperson here, anxiously awaiting a translation. JP

  3. 3
    Louis Hooffstetter says:

    This is a very interesting post. Please elaborate:

    “…the climate change signal is now emerging from the noise in many regions of the world…”
    Exactly how is the “climate change signal” separated from the noise, and is this “climate change signal” considered to be anthropogenic?

    “the verification of regional past trends in climate models has become possible.”
    Exactly how are “regional past trends in climate models” verified? Is this through “hindcasting”?

  4. 4
    MikeH says:

    @2 John Parsons. Follow Bob Tisdale’s link and watch the video.

  5. 5
    SCM says:

    Interesting. I looked at some regional climate projections for Australia a while back and had noticed the depressingly large error bars on precipitation and temperature projections. Are the issues solely dependent on missing physics in the models or could spatial resolution be an issue?

  6. 6
    T Marvell says:

    The full article:

    In general, I wish posts would give links to complete articles when possible, since some readers would have to pay to download from publishers’ sites.

  7. 7

    John Parsons: we checked whether past trends on the regional scale (which, in the jargon, means a few hundred kilometres still) are correctly simulated by the set of climate models that the IPCC is using to make the Fifth Assessment Report, due in September. You cannot demand that the models completely reproduce the trend, because it is also influenced by unpredictable weather. Also, we know that climate models are not perfect, so we ask rather that they fall in the range of models. Taking these two uncertainties into account we still find that the trends are more often than expected not simulated well by the climate models. We do not yet know why, but it means that we cannot take the climate model output for the future simply as a numerical climate forecast, but have to use the useful information in the climate model output in more sophisticated ways.

  8. 8

    Louis Hooffstetter: the climate change signal is separated from the noise using a linear regression on the global mean temperature. This gives the best signal to noise ratio. We compare that to the modelled trends, computed the same way but using the modelled global mean temperature. These models include all forcings: natural (solar variability, volcanic & natural aerosols) and anthropogenic (greenhouse gases, anthropogenic aerosols). However, it is clear from the maps that indeed the anthropogenically forced signal dominates (as is also shown by the detection and attribution studies).

    Indeed, past trends are verified by considering the historical runs of these climate models as hindcasts. We have also done this for decadal prediction hindcasts with very similar results (see

  9. 9
    Paul S says:

    John Parsons,

    I think the essence is:

    – the CMIP5 multi-model ensemble seems to encapsulate surface temperature change at the global scale fairly well.

    – However, when looking at smaller spatial scales different model runs seem to be too samey in what they predict. There are fairly robust spatial change patterns in the models which aren’t well reflected in observations. This means any probabilistic prediction based on the robustness of trends across the model ensemble is unlikely to provide an accurate forecast for what will be observed. In the words of our guest hosts: ‘the ensemble is somewhat overconfident’.

    – That being the case they suggest that the multi-model ensemble would not be useful for making weather forecast style probabilistic predictions for climate.

    I wonder how this finding relates to the Deser 2012 paper featured on here a few months ago? That paper described a model experimental setup where they produced a large number of realisations from a single model, paying special attention to varying initial conditions and found large differences in regional trends. Is there some reason that CMIP5 models would be initialised in similar ways that produce false robustness?

  10. 10

    @T.Marvel: this is an open access publisher, so there should be no problem downloading the PDF from their site.

  11. 11

    Paul S: a same-model ensemble like Clara Deser’s (or our ESSENCE ensemble) only captures the natural variability, not (part of) the model uncertainty, so these ensembles are in general hugely overconfident. Yokohata et al (2012) in their rank histograms also include perturbed-physics ensembles, where a single model is run with different parameter settings, which should include all natural variability and some of the model uncertainty. Even with a larger spread than a single-model ensemble, these ensembles are still hugely overconfident. Of course these statements depend strongly on the realism of the natural variability; Tom Knutson’s paper has some good graphs on this.

  12. 12
    Matthew R Marler says:

    My thanks to Geert Jan van Oldenborgh for this post.

  13. 13
    John Parsons says:

    Geert Jan, Thank You so much for that synopsis. I went to the video and found it to be tremendously helpful. It looks to be important work, which will doubtlessly be very helpful to model development.

    I hope you and your colleagues know how much we amateur scientists appreciate R/C.

    Best Regards, JP

  14. 14
    John Parsons says:

    Many Thanks to Bob T., Mike H. and Paul S. for your very helpful comments. JP

  15. 15
    GlenF says:

    SCM@#5 re Australia:

    The Fig 1C pattern for Australia troubled me on sight. It’s correct; you can reproduce something similar here (linear time basis):

    Problem is, most of that is probably not an AGW signal, at least in the E and SE (it surely is in the far SW). Compare, for example, the pattern for 100 years, here: Or have a look at the last 40 years.

    Aus rainfall is just really really variable. Expecting any sort of GCM ensemble to reproduce something like the 60-year pattern of change is a step too far IMO (except in the SW). That is in annual means; extremes may be another matter.

  16. 16
    Tim M. says:

    The acronym TCR usually refers to ‘transient climate response’, which is defined in terms of an idealized CO2 emissions scenario (1% per year increase to doubled CO2):

  17. 17

    GlenF: The method we use takes the natural variability into account (to the extent that it is correctly simulated by the models). We do not look at the multi-model mean, but at the full spread of the ensemble, and ask whether trends are in the top or bottom 5% more frequently than expected by chance. Part of the coloured regions on Fig.1c are due to chance and our method cannot tell which ones are and which ones are not. We can only state that there are too many areas with colours compared with chance alone. A detailed study on SE Australia would start with a check whether the models correctly reproduce the large natural variability there. We are planning to do work in this direction, but first I have to finish some other obligations.

  18. 18

    GlenF: yes, you are correct, I guess this shows that I come from seasonal forecasting and not climate change research. It should be Total Global Response (TGR).

  19. 19

    Mr. Geert Jan van Oldenborgh, I saw your video. Your explanations are clear and convincing. The conclusion, that the large-scale patterns are very similar but the ensemble is overconfident at regional scales, is very meaningful. It is not possible to reduce the uncertainties to zero. Local variability has a significant impact on the precision of forecasts derived from the general scales.