We add some of the CMIP6 models to the updateable MSU [and SST] comparisons.
After my annual update, I was pointed to some MSU-related diagnostics for many of the CMIP6 models (24 of them at least) from Po-Chedley et al. (2022), courtesy of Ben Santer. These are slightly different to what we have shown for CMIP5 in that the diagnostic is the tropical corrected-TMT (following Fu et al., 2004), which is a better representation of the mid-troposphere than the classic TMT diagnostic through an adjustment using the lower stratosphere record.
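For readers who want to see the shape of that adjustment, here is a minimal sketch. It assumes the commonly used weighted-difference form with a tropical weight of 1.1 on TMT; that coefficient and the function name `corrected_tmt` are assumptions for illustration, not values taken from this post.

```python
def corrected_tmt(tmt, tls, a=1.1):
    """Combine TMT and TLS anomalies into a corrected-TMT anomaly.

    a=1.1 is an assumed tropical weight, not a value given in this post.
    """
    # The weights sum to one; the negative TLS weight removes part of the
    # stratospheric signal that leaks into the TMT channel.
    return a * tmt + (1.0 - a) * tls


# Hypothetical trends in ºC/decade: with a cooling stratosphere, the
# corrected trend comes out larger than the raw TMT trend.
print(corrected_tmt(0.10, -0.30))
```

Because the function is a plain linear combination, it can be applied to trends or to anomaly values alike.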
These data cover the historical and SSP3-70 scenarios (135 simulations) for the region 20ºS-20ºN. This allows us to provide an updateable comparison to the equivalent satellite temperature diagnostics from RSS v4, UAH v6 and the new NOAA STAR v5. As with the earlier CMIP6 comparisons, I’ll compare the observations against both the full ensemble and the ensemble screened by the transient climate response (TCR), as we recommended in Hausfather et al. (2022), showing both the time series and a trend histogram.
Two things are clear. First, the 24-model ensemble as a whole is warming faster than the observations, but the histogram shows that this ensemble is heavily skewed by the inclusion of 53 ensemble members from CanESM5/CanESM5-CanOE (green in the histogram), which unfortunately has a very high climate sensitivity (ECS 5.6ºC, TCR 2.7ºC). Second, the TCR-screened ensemble (only including the 15 models that have 1.4ºC < TCR < 2.2ºC), shown in red, is closer to the observations in terms of trends, but only 7 simulations (from 6 models) out of 53 simulations have trends within the uncertainties of the observations.
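The screening step itself is simple enough to sketch. The snippet below is only an illustration: the TCR range and the CanESM5 value come from the text, while the function names, the "ModelA" entry and all the trend values are made up.

```python
TCR_MIN, TCR_MAX = 1.4, 2.2  # ºC, the screening range quoted in the text


def passes_tcr_screen(tcr, lo=TCR_MIN, hi=TCR_MAX):
    """True if a model's TCR lies strictly inside the screening range."""
    return lo < tcr < hi


def screen_runs(runs, tcr_by_model):
    """Keep only the runs whose parent model passes the TCR screen.

    runs is a list of (model_name, trend) tuples. Models with no known
    TCR are dropped rather than guessed at.
    """
    return [
        (model, trend)
        for model, trend in runs
        if model in tcr_by_model and passes_tcr_screen(tcr_by_model[model])
    ]


# CanESM5's TCR is from the text; "ModelA" and every trend are hypothetical.
tcr_by_model = {"CanESM5": 2.7, "ModelA": 1.8}
runs = [("CanESM5", 0.40), ("CanESM5", 0.38), ("ModelA", 0.22)]
screened = screen_runs(runs, tcr_by_model)
print(screened)
```

Note that the screen acts on models, not on individual runs, so every member of a high-TCR model is removed together.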
The above selection of CMIP6 models does not include the range of configurations of the GISS coupled models that we looked at in Casas et al. (2023). Since this is a somewhat differently designed ensemble, I’ll plot it similarly (45 simulations). Note that these are global means, again for the corrected-TMT product, using the historical simulations continued with the SSP2-45 scenario after 2014. This ensemble samples model structural variability (vertical resolution, model top, interactive composition) and some aspects of forcing uncertainty (notably for aerosols and ozone), as well as the initial-condition (‘weather’) variability we are used to seeing.
As above, the GISS ensemble diverges slightly from the observations. I’ve also included a line for the AMIP ensemble mean (red), i.e. simulations that use the observed sea surface temperatures as an additional forcing, which shows that the specifics of the interannual variability and observed trend can be matched if the sequence of El Niño and La Niña events is matched. For the 1979-2022 trends, the GISS ensemble is a closer match to the observations than the 24-model selection shown above, particularly the GISS-E2.2 simulations, all of which are within the uncertainties of the observational spread.
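For concreteness, a trend over a fixed window like 1979-2022 can be computed with an ordinary least-squares fit to annual means. This is a generic sketch, not necessarily how the figures here were made; the function name and the synthetic series are hypothetical.

```python
def decadal_trend(years, anomalies, start=1979, end=2022):
    """Least-squares linear trend in units per decade over [start, end]."""
    pts = [(y, a) for y, a in zip(years, anomalies) if start <= y <= end]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points inside the window")
    y_bar = sum(y for y, _ in pts) / n
    a_bar = sum(a for _, a in pts) / n
    # Slope = covariance(year, anomaly) / variance(year), per year.
    s_xy = sum((y - y_bar) * (a - a_bar) for y, a in pts)
    s_xx = sum((y - y_bar) ** 2 for y, _ in pts)
    return 10.0 * s_xy / s_xx


# Synthetic series warming at exactly 0.02 ºC per year.
years = list(range(1975, 2025))
anoms = [0.02 * (y - 1979) for y in years]
print(decadal_trend(years, anoms))
```

Applying the same function to each simulation and each observational record gives the values that go into a trend histogram.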
The point of this exercise is, first, to include CMIP6 in the comparisons. We know that this is a trickier ensemble to work with because of the broad (and unrealistic) spread in climate sensitivity. Second, the point in highlighting the GISS model efforts here too is that we are starting to do a better job of sampling different kinds of uncertainty. The CMIP ensembles are still ‘ensembles of opportunity’, but increasingly we are able to take slices through the ensemble to isolate different kinds of sensitivity that are perhaps orthogonal to what has been possible before, and that make a difference to many observational comparisons, not just the MSU records.
Since I’m making new graphs, here are the SST comparisons as well. The SST files come from the U. Melbourne collation and, when I downloaded them, had 135 simulations from 13 separate models (historical runs continued by SSP2-45). Unfortunately, a large part of the ensemble is again the CanESM5 model (52 runs), but there are 73 simulations from 9 models that pass the TCR screen used above. I’m plotting the HadSST4 and ERSSTv5 global means for the observations. The ensemble subsets don’t quite overlap (there are two additional models here, CIESM and GISS-E2.1-G, plus 14 missing ones and 11 in common), but the overall picture is very similar. There is a subset of models with high TCR/ECS that warm too quickly, but the bulk of the remaining models are doing well.
The SST comparisons have popped up on Twitter in the last few months, led by Roy Spencer, who didn’t point out the obvious bifurcation in models, and then repeated by a number of wannabe contrarians who don’t know what they are posting and care even less. Maybe these graphs will be useful for adding some clarity?
The contrast between the excellent agreement of the screened ensemble for global SST and the slight overestimate (on average) for the tropical or global corrected-TMT is interesting. A couple of things will likely play into that. First, the MSU TMT diagnostics are more dominated by changes in the tropics than the surface fields because of the effects of convection, and so the exceptional nature of the La Niña-like trend in the Eastern Pacific (Lee et al., 2022) is going to be magnified aloft. Second, the impact of forcings is slightly different at different layers, notably for aerosols and ozone changes, and so uncertainties there may be playing different roles at different levels.
One final caveat: I’ve been rather lazy in plotting these ensembles so that I can show the impact of both forced changes and the spread due to internal variability and structural uncertainty. Unfortunately, when you have an ensemble that has fifty runs from a single model and then a handful of models with only one or two runs, it’s hard to know what’s best. Some papers have taken a single run from each model, which seems fair, but actually confuses structural uncertainty and internal variability. Some take the ensemble mean of each model and then plot the average of the averages, which might be a good estimate of the forced change, but loses the information from the weather. Maybe these things need to be estimated separately and put together artificially. Another post though…
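To make the difference between those choices concrete, here is a toy sketch with made-up trends for three hypothetical models, one of which has many more runs than the others. None of the names or numbers correspond to real simulations.

```python
from statistics import mean


def pooled_mean(trends_by_model):
    """Average over every run; models with many runs dominate."""
    return mean(t for runs in trends_by_model.values() for t in runs)


def one_run_per_model(trends_by_model):
    """Average of the first run from each model; each value mixes
    structural differences with that run's internal variability."""
    return mean(runs[0] for runs in trends_by_model.values())


def mean_of_model_means(trends_by_model):
    """Average of per-model ensemble means; closer to the forced change,
    but the run-to-run ('weather') spread is discarded."""
    return mean(mean(runs) for runs in trends_by_model.values())


# Hypothetical trends (ºC/decade), purely for illustration.
trends = {
    "BigModel": [0.30, 0.32, 0.28, 0.30],
    "SmallModel1": [0.18],
    "SmallModel2": [0.20, 0.22],
}
for summary in (pooled_mean, one_run_per_model, mean_of_model_means):
    print(summary.__name__, round(summary(trends), 3))
```

In this toy case the pooled mean is pulled towards the model with the most runs, while the other two summaries weight each model equally.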
- S. Po-Chedley, J.T. Fasullo, N. Siler, Z.M. Labe, E.A. Barnes, C.J.W. Bonfils, and B.D. Santer, "Internal variability and forcing influence model–satellite differences in the rate of tropical tropospheric warming", Proceedings of the National Academy of Sciences, vol. 119, 2022. http://dx.doi.org/10.1073/pnas.2209431119
- Q. Fu, C.M. Johanson, S.G. Warren, and D.J. Seidel, "Contribution of stratospheric cooling to satellite-inferred tropospheric temperature trends", Nature, vol. 429, pp. 55-58, 2004. http://dx.doi.org/10.1038/nature02524
- Z. Hausfather, K. Marvel, G.A. Schmidt, J.W. Nielsen-Gammon, and M. Zelinka, "Climate simulations: recognize the ‘hot model’ problem", Nature, vol. 605, pp. 26-29, 2022. http://dx.doi.org/10.1038/d41586-022-01192-2
- M.C. Casas, G.A. Schmidt, R.L. Miller, C. Orbe, K. Tsigaridis, L.S. Nazarenko, S.E. Bauer, and D.T. Shindell, "Understanding Model‐Observation Discrepancies in Satellite Retrievals of Atmospheric Temperature Using GISS ModelE", Journal of Geophysical Research: Atmospheres, vol. 128, 2023. http://dx.doi.org/10.1029/2022JD037523
- S. Lee, M. L’Heureux, A.T. Wittenberg, R. Seager, P.A. O’Gorman, and N.C. Johnson, "On the future zonal contrasts of equatorial Pacific climate: Perspectives from Observations, Simulations, and Theories", npj Climate and Atmospheric Science, vol. 5, 2022. http://dx.doi.org/10.1038/s41612-022-00301-2