“The result is totally incompatible (at >95% confidence)…”
Is that a completely fair statement? The two pdfs are different, presumably at 95% confidence, and clearly the match between Steig and Orsi is MUCH better than O’Donnell and Orsi, but eyeballing it, the overlap of the two pdfs looks much greater than 5%… if blue were a model and red were observations, I would not say that the model trend was falsified at 95% confidence. But maybe this is a different kind of comparison?
[Response: What matters here is the paired probability. The chances of the real values in the Orsi distribution being at the LOW end, and simultaneously O’Donnell et al. being at the high end is the product of the two probabilities together. For example, two overlapping Gaussian PDFs that overlap at 1-sigma are compatible only at (.33/2)^2 = 0.03 probability. –eric]
Thanks, Eric, this is fascinating, and fills gaps in our knowledge of the Southern Hemisphere.
The key, of course, is the trend line, with nothing on the horizon to reverse it. We know now that if we wait until the danger is much more obvious, it will be too late. This knowledge should be driving climate scientists to be more proactive, via community meetings, pressure on political leaders and, especially, insistent approaches to editorial boards and media companies. They are humans, too, and if they are presented with the facts we might have a chance. Mandia, Mann, and Hansen have shown leadership here. The time is right to take this to the next level.
The Orsi paper is compatible with O’Donnell when you consider the location of the borehole (WAIS Divide). The drill site is located in an area depicted by the reddish color on the O’Donnell map, which corresponds to a ~0.2C/decade temperature rise.
The Orsi paper also shows that temperatures during the LIA were only half as cold in the Southern Hemisphere as the Northern.
[Response: That is incorrect. I used the data from the specific location of WAIS Divide in the calculation I show. As O’Donnell et al. show in their supplement, they *can* get agreement with the borehole data by using slightly different — and well justified — truncation parameters in their truncated least squares regressions. As for the LIA, *that* is where the specificity of location is relevant. Nothing can be said about “the Southern Hemisphere” in general during the LIA from these results alone.–eric]
Climate change is well attested to. However, if CO2 (greenhouse effect) is the main cause, should not the main warming occur in the winter, and not in the summer?
[Response: No one said anything about the rapid warming in West Antarctica being due to direct radiative forcing from CO2! Incidentally, though, the West Antarctic warming *is* dominated by the cold season — not by summer. Read the referenced papers. –eric]
“For example, two overlapping Gaussian PDFs that overlap at 1-sigma are compatible only at (.33/2)^2 = 0.03 probability”
Thanks. My mental guesstimate on that pdf multiplication step was obviously a bit off, the overlap looked larger than the math states.
Though… I’m still a little bothered by this comparison method. Say we take a Gaussian pdf estimate of a trend, .2±.1, and then later God comes down and tells us that the trend is .2±.00001. If I multiply the pdfs together, I get a very small number, despite the fact that my original pdf had the same best estimate, just much more uncertainty.
(or is the size of the pdf in this case part of the information: eg, there is some year to year noisiness in temperature, such that I can know the trend perfectly and it should still have a pdf?)
(I really need to go back and learn more statistics some day)
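The worry in this comment can be checked numerically. Below is a purely illustrative sketch (not from the post): the overlap integral of two Gaussian densities with the same best estimate depends only on the two widths, and it is a density-valued quantity that can exceed 1, so a raw product of pdfs cannot by itself be read as a probability of agreement.

```python
import math

def gauss(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def overlap_integral(mu1, s1, mu2, s2, lo=-2.0, hi=2.0, n=100001):
    """Numerically integrate f1(x)*f2(x) dx by the trapezoid rule."""
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * gauss(x, mu1, s1) * gauss(x, mu2, s2)
    return total * h

# Same best estimate (0.2), different uncertainties:
print(overlap_integral(0.2, 0.1, 0.2, 0.1))    # ~2.82 -- not a probability
print(overlap_integral(0.2, 0.1, 0.2, 0.001))  # ~3.99 -- grows as one pdf narrows
```

For two same-mean Gaussians the integral is 1/sqrt(2*pi*(s1^2 + s2^2)), so narrowing one pdf makes the number larger, not smaller — another hint that it is not measuring compatibility.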
Can you specify your response further? Your graph for O’Donnell shows a peak frequency at ~0.08C/decade. The O’Donnell paper shows that central West Antarctica varies from -0.1 to 0.3C/decade, which very likely matches your graph (no argument there). However, the drill site from the Orsi paper appears to reside in the area that O’Donnell et al. show to be ~0.2C/decade. I am looking at the specific locations of both papers, and they appear to show similar temperature rises at similar sites. O’Donnell et al. do show a large gradient in their temperature-rise plot, so precise location is critical for comparison. If I am incorrect, please explain.
[Response: I used the grid box closest to WAIS Divide (79.5 S, 112 W) in both cases. If you average over a few nearby grid boxes, same result — see comment from Ned #2 below. –eric]
I see how you get the high confidence that currently we are warmer than the past 1000 years. I have a couple of questions, though.
Because the accuracy and precision decrease going back in time, does not the average/mean/mode/median (whichever best describes the centre line) become, in essence, smoothed? Does this not mean that the actual highs and lows become represented by lower numbers as we go back? I.e., the temperature range OF THE MEDIAN is inversely proportional to time? If so, then we can only say that the MWP was MORE than the depicted temperature.
During the MWP the northern hemisphere portion representing western Europe over to Greenland and Newfoundland was clearly warmer than present. Do you have to consider this portion non-representative of the world back then to say that the MWP was cooler than today? If the MWP northern portion as I described was non-representative of the world back then, why would it be today? Could today’s Arctic/Northern be unusually warm while the Antarctic/Southern is warm today, as it was unusually warm then while, presumably, the Antarctic/Southern was cool then?
I’m trying to understand how global “global” has to be to be non-regional, and also how representative of “reality” old data numbers are relative to new data numbers.
Dan H. writes: The drill site is located in an area depicted by the reddish color on the O’Donnell map, which corresponds to ~0.2C/decade temperature rise.
and also: the drill site from the Orsi paper appears to reside in the area that O’Donnell, et. al. show to be ~0.2C/decade.
Are you just eyeballing that? I did a rough georeferencing of the O’Donnell map, plotted the location of the drill site, measured the RGB values, and compared them to the RGB values in the map legend. My plot had the site falling near the corners of four grid cells, so I’m not sure what the actual value is, but it appears to be in the range +0.10 to +0.15C/decade. It is definitely lower than Orsi’s +0.23C/decade.
I don’t see any reason to doubt the histograms shown in the figure in this post.
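For what it’s worth, the nearest-color lookup step in that kind of georeferencing exercise is simple to sketch. Everything below is hypothetical: the legend colors and trend values are made up for illustration and are not taken from the actual O’Donnell figure.

```python
# Hypothetical legend: RGB -> trend (C/decade). These colors and values
# are illustrative placeholders, not sampled from the real map.
legend = {
    (180, 40, 30): 0.25,
    (230, 120, 90): 0.15,
    (250, 200, 170): 0.05,
    (200, 220, 250): -0.05,
}

def nearest_trend(rgb, legend):
    """Match a sampled pixel to the closest legend color (Euclidean in RGB)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return legend[min(legend, key=lambda c: dist2(c, rgb))]

# A pixel sampled near the drill site (again, made-up numbers):
print(nearest_trend((235, 130, 100), legend))  # falls in the 0.15 bin
```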
“Because the accuracy and precision decrease going back in time, does not the average/mean/mode/median (whichever best describes the centre line) become, in essence, smoothed? Does this not mean that the actual highs and lows become represented by lower numbers as we go back?”
No. To the extent your logic is right about the smoothing, the *highs* would be ‘represented by lower numbers’, but the *reverse* would be true for the lows. There’s no reason to think the mean (or median) would shift–absent systemic bias, of course.
If you are going to calculate the probability that two things are incompatible, then you need to provide a working technical definition of what constitutes incompatibility in this situation. What exactly is your definition?
It appears that you have done the following (please correct me if I have misinterpreted any of the steps).
-Take four “independent” estimates of the temperature trend for central West Antarctica along with their standard errors.
-Construct four Gaussian distributions whose means and standard deviations are those particular values.
-Interpret these distributions as representing genuine probability density functions of the actual temperature trend. It should be noted that in reality the actual trend being estimated is a fixed (i.e., non-random) unknown value.
-Declare that the O’Donnell estimate is “totally incompatible” with various others.
When a reasonable question is raised on this in comment 1, you provide an inline response containing what is claimed to be a probability calculation (bold mine):
The chances of the real values in the Orsi distribution being at the LOW end, and simultaneously O’Donnell et al. being at the high end is the product of the two probabilities together. For example, two overlapping Gaussian PDFs that overlap at 1-sigma are compatible only at (.33/2)^2 = 0.03 probability. –eric]
Ignoring my previous objections to using the probability densities to calculate realistic probabilities, there are some further issues with what you have done.
From the number, .33, given in the example, it appears that for each curve you have calculated a one-sided area of the portion of that curve “under” the other distribution and then multiplied these values together to calculate the probability of the event defined in the bolded text. From your conclusion, you imply that that event constitutes “compatibility”.
How exactly does one being larger than usual AND at the same time the other being smaller than usual constitute compatibility? One would think that the two values would be more compatible if EITHER the larger one was smaller and the smaller one remained smaller OR the smaller one was larger and the larger one remained the same. The calculation for your example would then be .325*.675 + .675*.325 = 0.439, considerably different from the result given in your comment.
In case you are having difficulty accepting this, try doing your calculation on two identical Gaussians shifted by only .01 of a standard deviation. Do you really think that the compatibility of those two distributions is a mere 0.25?
[Response: I defined “incompatible” very precisely. NB: I think you are right that I made the wrong calculation though — see Tamino’s comment below. But your calculation is wrong for the example of moving two identical Gaussians by 0.01 std. devs. You need to calculate the joint probability in the overlapping part of the distributions, which in the example you gave is essentially all of the distribution. You’ll find the probability of incompatibility in that case is very, very low. In fact, about 99.6% of the distributions will overlap, so the probability that they are the same is .996^2 = 99.2%. That’s just a wee bit different from the probability for O’Donnell vs. reality. –eric]
“We find that in response to a mid-range increase in atmospheric greenhouse-gas concentrations, the subsurface oceans surrounding the two polar ice sheets at depths of 200 – 500 m warm substantially compared with the observed changes thus far. Model projections suggest that over the course of the twenty-first century, the maximum ocean warming around Greenland will be almost double the global mean, with a magnitude of 1.7–2.0 C. By contrast, ocean warming around Antarctica will be only about half as large as global mean warming, with a magnitude of 0.5–0.6 C. A more detailed evaluation indicates that ocean warming is controlled by different mechanisms around Greenland and Antarctica. We conclude that projected subsurface ocean warming could drive significant increases in ice-mass loss, and heighten the risk of future large sea-level rise.”
I’m gonna agree with RomanM about the incompatibility issue. I too get the impression that the plotted pdf’s are based on a central estimate and standard deviation estimate. If two identical Gaussians (i.e., same standard deviation) intersect at the 1-sigma point, then the difference of their means is 2-sigma. But the standard deviation of the difference is sigma times sqrt(2), so if we compute a z-statistic for the difference it’s only 2/sqrt(2) = sqrt(2). The two-sided p-value for that is 0.157. Am I missing something?
I’ll add that using the “joint probability” argument can be tricky (it has to be done just right), and the definition of “overlappy” is unclear.
[Response: Tamino, mea culpa, you are right (though RomanM is still wrong in his claim about two distributions that are only .01 sigma off in their means). In fact the distributions overlap at slightly more than 1 std. dev., so what the results actually show is that there is a >80% chance that O’Donnell is wrong compared with reality as measured by the boreholes. I would still call that incompatible. The irony should not be lost that the blogosphere was alive with shrill claims of fraud etc. on the grounds that O’Donnell et al. got a *different* result from Steig et al. (which is identical to Orsi at very high confidence). Now it’s going to be alive with claims that O’Donnell cannot be rejected at better than 80% confidence? –eric]
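Tamino’s arithmetic is easy to verify. This is a minimal sketch (not code from the post) of the standard z-test for the difference of two independent Gaussian estimates:

```python
import math

def two_sided_p_for_difference(mu1, s1, mu2, s2):
    """Two-sided p-value for the difference of two independent
    Gaussian estimates (z-test)."""
    z = abs(mu1 - mu2) / math.sqrt(s1 ** 2 + s2 ** 2)
    # two-sided tail probability of a standard normal
    return math.erfc(z / math.sqrt(2))

# Two identical Gaussians whose pdfs cross at their 1-sigma points:
# means are 2*sigma apart, so z = 2/sqrt(2) = sqrt(2).
p = two_sided_p_for_difference(0.0, 1.0, 2.0, 1.0)
print(round(p, 3))  # ~0.157, as in the comment
```

The two-sided p-value for z = sqrt(2) reduces to erfc(1) ≈ 0.157, matching the figure quoted in the thread.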
I defined “incompatible” very precisely. Your calculation is wrong for the example of moving two identical Gaussians by 0.01 std. devs. You need to calculate the joint probability in overlappy part of the distributions, which in the example you gave is essentially all of the distribution. You’ll find the probability of incompatibility in that case is very very low.–eric
I still don’t see where there is an operational definition. Please point it out to me.
From what I infer reading your example, the “overlappy” portion is the part under the function min(f1(x), f2(x)) where f1 and f2 are the two densities. In the example, this turns out to be the area outside of one standard deviation for the Gaussian which is about .318. In the example I cite, it is for all practical purposes almost 1.
From the description in your inline comment 1, you then do the calculation for the simultaneous occurrence of two specific independent events (the choice of which really does not make sense to me): (.318/2)^2 = 0.025. In my example, this would be (1/2)^2 = 0.25.
Perhaps you could please do the “correct” calculation in the latter case if I am not understanding you properly.
[Response:See the response I already gave. I think Tamino is right; I will get out the actual data and make sure I have the precise location of overlap of the distributions. The answer won’t change much: there is “only” about an 80% probability that O’Donnell et al. got the trend wrong.–eric]
“The chances of the real values in the Orsi distribution being at the LOW end, and simultaneously O’Donnell et al. being at the high end is the product of the two probabilities together.”
I’m still a bit confused as to what this means. When I try to experiment with continuous distributions, I get nonsensical answers. For example, if I take a uniform distribution from 0 to 1 (i.e., density 1) and multiply it by a second uniform distribution from 0 to 1, I get back a distribution from 0 to 1 which integrates to 1. Great, they are 100 percent compatible. But if I do the same operation with a uniform distribution from 0 to 2 (i.e., density one half), multiplying by a second uniform distribution from 0 to 2, I get back a uniform distribution from 0 to 2 with density one quarter… which integrates to 50 percent… which means that I’m doing something wrong here…
To make it simpler, I thought about a six-sided die. The pdf of a normal die is a discrete function with 1/6 probability at each of the integers from 1 to 6. If I take a second normal die, then I can take the product of the two probabilities, and I get 1/36 from 1 to 6… which sums to 1/6… which is the probability that the second die gives me the same number as the first die. Which suggests that the product of pdfs is actually telling me something about the probability that random draws from each pdf will be the same, rather than the probability that the first pdf is compatible with the second. Or am I doing this wrong?
As an alternate way of looking at it: assuming Orsi is “truth”, can I integrate the Orsi pdf to the right of the intersection of the green and red lines and state that is the probability that Steig et al. is superior to O’Donnell? This makes more intuitive sense to me… eg, Orsi indicates some possibility that reality falls in a region where O’Donnell has higher probability density than Steig, in which case O’Donnell would have been a better bet. But most (80 to 90 percent?) of the Orsi probability density is to the right of that point…
[Response: This doesn’t make any sense. No statistical fiddling is going to wind up with the conclusion that two distributions with identical means (mine and Orsi’s) are LESS likely to be the same than two distributions with means separated by a full standard deviation (O’Donnell’s and Orsi’s)! –eric]
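The die example a few comments above checks out. A quick illustrative sketch (not from the post) confirming that summing the pointwise product of two pdfs gives the probability that independent draws coincide, not a measure of compatibility:

```python
from fractions import Fraction

# pdf of a fair six-sided die
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Summing the pointwise product of two identical pdfs gives the
# probability that independent draws from each *coincide*:
p_match = sum(die[k] * die[k] for k in die)
print(p_match)  # 1/6

# Same construction for a uniform density on [0, 2] (density 1/2):
# the integral of (1/2)*(1/2) over [0, 2] is 1/2 -- not 1, even though
# the two distributions are identical.
width = 2.0
p_uniform = (1 / width) * (1 / width) * width
print(p_uniform)  # 0.5
```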
The answer won’t change much: there is “only” about an 80% probability that O’Donnell et al. got the trend wrong.–eric
So you know what the true value of the parameter is. Who needs statistics? ;)
[Response:Actually, yes! The borehole data is a direct measure of the true value. The statistical reconstruction approach I originally used — and the mean trend of which is validated by Orsi et al. — is not needed anymore, since now we have direct measurements of temperature from the borehole. The O’Donnell et al. results are not needed anymore either.–eric]
I didn’t say that my answer was “right”. What I said was that .25 would be the answer if the method that you used was applied to the case of a small shift sideways.
[Response:No, it wouldn’t.]
This brings me back to my earlier criticism of the misconception that the curves drawn somehow represent a probability distribution for an unknown constant value. This is tantamount to the interpretation of a confidence interval as “the probability that the unknown parameter is in that given interval is 95%” or whatever the confidence level might be. It is simply not a true interpretation.
On a constructive note, let me suggest an off-the-cuff definition of “compatibility” for estimates, not for questionable distributions defined around them:
Two estimates for a parameter are incompatible at a given significance level α1 if for each value in the parameter space, the null hypothesis is rejected at the α2 level by at least one of the estimates.
The two levels of significance may possibly be different depending on the underlying situation. For a Gaussian mean, this reduces to the two single sample confidence intervals overlapping.
[Response: Actually, the confidence interval for Orsi et al. is indeed the probability (given the assumptions of their inverse modeling, of course) that the unknown parameter truly lies in a particular place under the curve. In reality, this is not really the same for Steig et al. or O’Donnell et al., where in fact the ‘true’ value is known and the uncertainties reflect the magnitude of the trend vs. the magnitude of the variance — that is, the likelihood of this particular estimate of the trend changing with e.g. one more year of data. So I agree with you there – strictly speaking this is somewhat apples and oranges. However, it’s what we have to work with. I would certainly be interested if you have a different suggestion for how to use this information, but ultimately the question is whether O’Donnell et al. underestimate the trend significantly. Do you have some clever way of showing they do NOT, while maintaining the claim that O’Donnell et al. is different from Steig et al.? –eric]
[Response: Minor update, and then I really do need to get back to work: the correct calculation, as you pointed out, is to include the possibility that while one ‘true’ value lies in the overlapping tail, the other stays where it is (and all such permutations). What’s then needed is the cumulative distribution function for the O’Donnell et al. results, up to the overlap point (~1.6), and the same for Orsi et al. Then, following Tamino’s point, the result is p ~ 0.19. So my statement — on this basis — of a >80% chance that O’Donnell et al. were wrong is valid. Oh, and the chance that Steig et al. (2009) was right is, umm… well, you do the math. ;) –eric]
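One possible reading of the recipe described in this response can be sketched as follows. The crossing-point formula is the standard result for where two Gaussian densities are equal; the trend means and standard deviations in the final line are placeholders, since the post does not give the exact values, so the printed number is not a reproduction of the p ~ 0.19 figure.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def crossing_point(mu1, s1, mu2, s2):
    """x where the two Gaussian pdfs have equal density (root between the means)."""
    if abs(s1 - s2) < 1e-12:
        return 0.5 * (mu1 + mu2)
    a = 0.5 / s2**2 - 0.5 / s1**2
    b = mu1 / s1**2 - mu2 / s2**2
    c = 0.5 * mu2**2 / s2**2 - 0.5 * mu1**2 / s1**2 + math.log(s2 / s1)
    disc = math.sqrt(b * b - 4 * a * c)
    for root in ((-b + disc) / (2 * a), (-b - disc) / (2 * a)):
        if min(mu1, mu2) <= root <= max(mu1, mu2):
            return root
    raise ValueError("no crossing between the means")

def compatibility(mu_lo, s_lo, mu_hi, s_hi):
    """P(lower estimate's true value lies above the crossing while the higher
    one stays put, or vice versa) -- the 'all permutations' accounting."""
    x = crossing_point(mu_lo, s_lo, mu_hi, s_hi)
    p_lo_above = 1.0 - norm_cdf((x - mu_lo) / s_lo)
    p_hi_below = norm_cdf((x - mu_hi) / s_hi)
    return p_lo_above * (1 - p_hi_below) + p_hi_below * (1 - p_lo_above)

# Illustrative numbers only (the post does not give the exact estimates):
print(compatibility(0.06, 0.10, 0.23, 0.10))
```

With equal standard deviations the crossing point is simply the midpoint of the two means; the quadratic handles the unequal-width case.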
If you ignore the blade of the stick, and examine only the period from 1000 to 1900 it seems to me that there is a slight cooling trend (with decadal and century scale variations superimposed). Is this correct? Is the trend significant?
The high northern latitude decline in temperatures is not particularly surprising, as this is what one would expect (at least for summertime temperatures) from insolation changes. But these precessional insolation changes should only cool in the northern hemisphere, not the southern. So, to see the same cooling trend at sites in the southern hemisphere is interesting.
[Response: My view is that this is a conundrum only if one assumes that the only thing that matters is seasonal insolation intensity. Peter Huybers has done nice work showing that that is by no means likely to be the case, and shows that S and N high latitudes actually OUGHT to be in phase (both cooling through the Holocene). See Huybers and Denton, 2008. Also, a few model runs I’ve looked at for the late Holocene show this too. –eric]
Comment by Halldór Björnsson — 23 May 2012 @ 3:33 PM
I thought the decline over the past some thousands of years, before human changes became important, was the expected one from the post-ice-age peak.
The decline from the “post ice age peak” you refer to is driven by the orbital changes, in this case the precession of the equinoxes. These changes are expected to be most clearly apparent in the summertime temperatures in the northern hemisphere, which is what is observed.
However, this particular insolation signal should enhance summertime temperatures in the southern hemisphere; currently we are heading toward a time when the NH winter solstice (and SH summer solstice) is at perihelion. This will happen about 1,500 years from now. When that happens, SH summertime temperatures should be at their warmest, and NH summertime temperatures at their coldest. If other orbital factors (obliquity and eccentricity of the orbit) were “favorable”, a glaciation might ensue. Of course, with anthropogenic global warming this is unlikely to happen (see Archer’s The Long Thaw for details).
But while we are in an interglacial, it is not a given that an insolation change that lowers NH summer temperatures should also be expressed as a cooling trend in the southern hemisphere proxies.
I can think of several ways in which this might happen but before embarking on explanations, I’d like to know if the cooling trend is significant.
[Response:I can’t speak to the cooling trend in the Gergis et al reconstruction (having not carefully considered that result yet) but the cooling trend in the last 2000 years in Antarctic (West Antarctica) is unambiguous. See e.g. Fegyveresi et al., 2011 in Journal of Glaciology. –eric]
Comment by Halldór Björnsson — 23 May 2012 @ 5:39 PM
When comparing the two pdf’s, you assume they vary independently, presumably due to sampling noise, etc. I’m not saying that is unjustified, but it is worth noting that the actual overlapping distribution could be much lower (or higher) if the two measurements suffered from correlated (or anti-correlated) biases.
[Response:Very true. However, they are completely independent. 100% Which is what makes the comparison so worth doing.–eric]
If someone has access to high-quality image files of the NH hockey stick from Mann (2008) and the Australasia hockey stick from Gergis (2012), it would be interesting to see the images displayed one above the other, on the same timescale. I used the Gergis image from this site and the Mann image from the SkS site, and there seem to be some interesting periods of agreement. For example, the sharp drops in proxy temperatures around 1350 and 1460 are present in both records, as near as I could see.
From what I understand, sometimes the NH and SH temperatures will march together, and other times be out of phase. A discussion of the two proxy records comparing and discussing the temperature swings, by someone knowledgeable of likely temperature swings over the last 1200 years, would be interesting to read.
I see “the relative importance of GHG and insolation on the warmth intensity varies from one interglacial to another.” http://dx.doi.org/10.1007/s00382-011-1013-5
Climate Dynamics, Volume 38, Issue 3-4, pp. 709-724 2/2012
[Response: My own view is that while the research on ozone and its influence on Antarctic temperatures is solid, its importance is vastly overstated. It accounts for temperature change only in summer; see David Thompson’s excellent review: here.
Note also that the notion of Antarctic cooling was based on the short ~1970-2000 record. The full data show that even at South Pole this is not the trend — it’s neutral to slightly warming, as of course we showed back in 2009 (and as O’Donnell confirmed). It *has* been cooling over the Halley Station area, but that is about it. –eric]
The 0.23C/dec rise must be caused to some extent by CO2 radiative forcing, though maybe not as clearly as at Arctic latitudes, since the rise of CO2 to ~400ppm is a global phenomenon. The ice melt in the Antarctic is caused by a number of factors: warmer winds and ocean currents, black soot deposits, etc. Don’t forget that even if the ice-extent graph doesn’t change significantly with a warmer Antarctic ocean, there is a heightened hydrologic cycle and thus more snowfall year round, giving the impression that the area under ice is constant. What is happening right now, and has been for the past 30+ years, is significant accelerated thinning of the ice shelves. Another factor which needs a lot more research, but seems very alarming, is the Antarctic bottom water, which is 60% less today than it was 30 years ago. This frigid water, at around 0C, is heavily saline and travels north along the bottom, where it eventually warms and rises; it is essential to moderating the southern ocean currents and temperatures, and hence Southern Hemisphere climatic patterns. These factors all lead to potential and imminent tipping points which probably will change the global climate faster and more suddenly than even most climate scientists can conceive of.
Comment by Lawrence Coleman — 25 May 2012 @ 6:46 AM
[Response:Indeed, one of our very first RC posts was about this. See here.]
Comment by John E. Pearson — 25 May 2012 @ 7:59 AM
The original HS uptick/blade unambiguously predated the supposed era of man-made warming (1980 onwards). I’ve never understood why this isn’t discussed more. Anyone have any ideas?
[Response: Anthropogenic warming started much earlier than 1980 – it is rather that it wasn’t unambiguously detectable until the 1980s (i.e. the signal to noise ratio was small). However, any proxy reconstruction is attempting to match the climate changes from whatever cause. The instrumental record shows warming since the 1900s and so you expect the reconstructions to show the same thing. The issue of attribution – why the changes are what they are – is a separate issue. – gavin]
I’ve been looking for a response by the authors to Post 37 above, to the questions in this paragraph: “During the MWP the northern hemisphere portion representing western Europe over to Greenland and Newfoundland was clearly warmer than present. Do you have to consider this portion non-representative of the world back then to say that the MWP was cooler than today? If the MWP northern portion as I described was non-representative of the world back then, why would it be today? Could today’s Arctic/Northern be unusually warm while the Antarctic/Southern is warm today, as it was unusually warm then while, presumably, the Antarctic/Southern was cool then?” While I understand we can do little else but rely on proxy records for much of the SH history over this recent past, I think these questions need consideration.
The thing is that the anomalous warmth of the North Atlantic during the putative MWP is not at all a speculation. We have measurements from other portions of the globe that show the rest of the planet (with the possible exception of China) was not anomalously warm. There simply was no global MWP.
In contrast, warming today is global. Yes, the arctic has warmed more than the rest of the planet, but that is in fact what the models predict.
[Response:I am not sure I’d go so far as to say that it’s definite that the MWP wasn’t globally warm. However, the evidence certainly does strongly support that. Indeed, borehole data from Taylor Dome in Antarctica (never published unfortunately) was used by Broecker to argue that the MWP was an ocean-circulation driven “seesaw” phenomenon (cold in the South when warm in the North). Not sure I buy that — indeed, the Orsi et al. data show that at least for West Antarctica, it is wrong — but neither do I think I know the answer. We’ll do a post on this some time soon. –eric]
Back in the post on O’Donnell et al., Eric thought aloud about doing a new paper taking their valid suggestions on board (but getting it right). There isn’t one yet, is there? Or did I miss something?
[Response: Have simply not had the time. I probably will do something about it this summer. It’s more compelling with the Orsi et al. results because they offer independent validation. That way it is not just me and O’Donnell arguing with one another. In the end, physics trumps statistics. –eric]
The link to the Gergis et al paper at J. Climate is broken, and the paper no longer appears to be accessible. We have no information as to why or whether it’s temporary or not, but we’ll update if necessary when we have news.
[Response: Thanks, but we’ve just been sent this from one of the authors:
Print publication of scientific study put on hold
An issue has been identified in the processing of the data used in the study, “Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium” by Joelle Gergis, Raphael Neukom, Stephen Phipps, Ailie Gallant and David Karoly, accepted for publication in the Journal of Climate.
We are currently reviewing the data and results.
…which of course implies that the reconstruction, conclusions etc. will need to be revisited. What impact it will have is a little unclear at this point. – gavin]
The authors publicly thanked them. Why shouldn’t that information be disseminated? The fact that you don’t like them is irrelevant. There are many people I dislike and of whom I disapprove. However, if such people have done something worthy I’m not going to pretend that it didn’t happen and not acknowledge it.
The fact is that the error was reported at Climate Audit. In amongst the supposed innuendo etc., there was very useful quantitative work going on in reproducing results.
[Response: My impression is actually that the particular error was not first identified at Climate Audit. Karoly’s letter says “also”. But whatever, this is a perfectly good example of science working properly. My initial impression is that Gergis et al.’s results will not wind up changing much, if at all. –eric]
Perhaps now the climate community will move towards better publication standards as has been seen in econometrics (eg archiving of code and data) and medical research (eg pre-registration of drug trials).
[Response: I have nothing against code archiving (the GISS GCM code is open source, and I prepared turnkey code for our recent paper in AOAS for instance), and this can be useful. But in this case, people are doing independent replication – at least of some elements – and I have often argued that this is more significant. I don’t see how ‘pre-registration of drug trials’ has any relevance here though. Paleo-climate is an observational science – like cosmology or archeology – and there are very few possibilities for real world experiments such as you would set up for a double-blind drug trial. You are stuck with the data that people have collected and you need to make the best of that. – gavin]
There is a difference between an error being reported at Climate Audit and being discovered by Climate Audit. It appears that the Journal was aware of the problem before McIntyre posted about it. Also, I’m not sure the authors “publicly thanked” him, that was an editor at the journal who wrote the email to McIntyre? Either way, a private email which the recipient posts on his site isn’t the same thing as a public statement. Have the authors actually released a public statement of thanks? As Roger points out, let’s not pretend. I think it is important to get the relevant details before we hand out accolades.
To Gavin’s point, even McIntyre admits the stick may still exist even in the updated results.
Comment by Unsettled Scientist — 8 Jun 2012 @ 6:39 PM
[edit – please try and stay focused on issues, not people’s feelings or motivations]
If you write an email to a blogger, he will post it. This is going public. Whether it was discovered independently or not is irrelevant. The fact is that he and others at CA discovered it through detailed reproduction work and were thanked accordingly.
They put in the hours and did something good for science. It’s churlish to nitpick and not to acknowledge it, regardless of what one may think of them in other areas.
Re: Unsettled Scientist at 8 Jun 2012 6:39PM
Rather than an editor at the journal, it was the senior author of the paper, Dr. Karoly, who wrote the email to Stephen McIntyre. I haven’t seen any statement about this issue from the Journal itself.
Comment by Armand MacMurray — 9 Jun 2012 @ 2:18 AM
Nothing wrong with an acknowledgement of course, but Karoly’s email mentions that they found the error by themselves as well. It’s entirely unimportant in the end.
Note what McIntyre says about it though:
“As readers have noted in comments, it’s interesting that Karoly says that they had independently discovered this issue on June 5 – a claim that is distinctly shall-we-say Gavinesque (See the Feb 2009 posts on the Mystery Man.)
I urge readers not to get too wound up about this”
So, clearly dog-whistling with something “interesting” and “Gavinesque”, and then, just for form’s sake, saying “let’s not get too wound up about this” (so that he can defend himself against any claim that he made a big deal out of it).
Jeez, I wonder which of these two mutually exclusive messages will end up being most influential with his followers.
‘In the end, physics trumps statistics.–eric’
Hmm; ‘In The End’, maybe, but not in the real world and in the present :-)
Man: Ah. I’d like to have an argument, please.
Receptionist: Certainly sir. Have you been here before?
Man: No, I haven’t, this is my first time.
Receptionist: I see. Well, do you want to have just one argument, or were you thinking of taking a course?
Eric, you say “My initial impression is that Gergis et al.’s results will not wind up changing much, if at all.–eric”
That seems an odd thing to say. Only a few of the 27 proxies used in the study satisfy the described selection process. A new inspection, by the intended selection process, of all the proxies originally considered may find a number of others which should have been included. However that may turn out, the group of proxies will be quite different to the original 27 considered in the paper.
I wonder how anyone could have a reasonable opinion that this will change very little or nothing.
Of course, a similar result could probably be achieved by changing the method, but that would be a very odd thing to do and would leave the authors susceptible to the criticism that they were seeking some particular outcome, whether or not this was in fact the case.
[Response:I don’t know what it will change, because we don’t know exactly what the significance of this “error” was yet. Basing calibrations on data detrended during the instrumental period is not common practice, but neither is it unheard of. If their “mistake” was to have actually used non-detrended data to establish their calibration relationships, as is standard practice, then two commenters at Climate Audit itself have already provided data, from their calculations, showing that a high percentage, though not all, of the 27 sites used by Gergis et al., are significantly related to warm season temperatures at p = .05. –Jim]
Comment by Geoff Cruickshank — 9 Jun 2012 @ 4:36 AM
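For readers unfamiliar with what “detrended” versus “non-detrended” calibration screening means in practice (as discussed in Jim’s response above), here is a minimal, hypothetical sketch. This is not the Gergis et al. code: the fixed correlation threshold `r_crit` merely stands in for a proper p = .05 significance test, and all names are made up.

```python
# Hypothetical sketch of proxy screening against instrumental temperature,
# with and without detrending (NOT the Gergis et al. code; the fixed
# correlation threshold r_crit stands in for a proper p = .05 test).
import math

def linear_detrend(x):
    # Remove the least-squares linear trend from a series.
    n = len(x)
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return [v - (x_mean + slope * (t - t_mean)) for t, v in enumerate(x)]

def pearson_r(x, y):
    # Pearson correlation coefficient of two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen(proxies, temperature, detrended=False, r_crit=0.3):
    # Keep proxies whose |r| with temperature exceeds r_crit over the
    # calibration period; detrended=True removes the linear trend from
    # both series first, so only year-to-year covariation counts.
    kept = []
    for i, series in enumerate(proxies):
        if detrended:
            r = pearson_r(linear_detrend(series), linear_detrend(temperature))
        else:
            r = pearson_r(series, temperature)
        if abs(r) > r_crit:
            kept.append(i)
    return kept
```

The detrended variant is the stricter test: a proxy cannot pass merely by sharing the twentieth-century trend with the instrumental record.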
The ClimateAudit claim to having first identified the “issue” could be valid, but things aren’t entirely clear.
ClimateAudit was thanked (not publicly) ‘on behalf of all the authors of the Gergis et al (2012) study‘ thus – “We would like to thank you and the participants at the ClimateAudit blog for your scrutiny of our study, which also identified this data processing issue.” The ‘discovery’ of the “issue” was dated the same day as the comment at ClimateAudit that first expressed concern on the matter (although when the “issue” was truly expressed within ClimateAudit is something else again).
If we assume the ClimateAudit site as the discoverer, should we be mentioning it here? (Perhaps we should at the same time be acknowledging the contribution made by Limin Xiong to Gergis et al 2012, which, who knows, could have been just as essential to the eventual findings of the paper.)
I am no expert on what goes down at ClimateAudit but I see an irony in denialist websites (whose main ingredient is wanton ignorance melded with rich portions of error) being applauded for their very occasional successes. Yet websites such as RealClimate here (whose main ingredient is the evidence-based science) are continually sniped at and derided by those self-same denialist sites. (The sniping fire of course is not one way.)
So, is ClimateAudit like a good traffic warden in a car park – a hated but essential functionary in preventing parking chaos? Or does ClimateAudit go further, like a bad traffic warden – terrorising motorists by continually trying to have ‘offending’ automobiles towed away and crushed for the most minor infringement of petty regulation? I am no expert here.
As for “losing nothing by acknowledging McI,” will McI or the wider denialist community be ‘dining out’ on the strength of this ‘success’? If so, would any ‘acknowledging’ then simply become some trophy for the fellow diners to gloat over?
[Response: There is a very important asymmetry here. For the most part, scientists are actually interested in finding out stuff about the real world – mistakes made in analyses and bugs in code, while an unfortunate fact of life – are impediments to that, and so have to be fixed when found. The idea that scientists are inerrant plays no role in real discussions of the issues. However, there is a group that sees every error, regardless of its significance, as more ‘proof’ that all the science on climate change can be discarded. So each reported error becomes part of a litany of oft-repeated mantras designed to reinforce that central idea. The point of such exercises is to underline that scientists are imperfect (which they are), though it too-often morphs into statements that scientists are deliberately deceiving (which they are not). (It is the latter statements that are offensive, not the former). On the contrary, despite personally feeling bad after making a mistake, fixing errors is a big part of making progress (though the more interesting errors are usually more subtle than the one being discussed today). The way forward is not to fret about having made mistakes or worry about giving ‘different sides’ ammunition, but to see what actually changes and what it implies for anyone’s understanding of the real world. – gavin]
Actually I think this episode illustrates beautifully the difference between the scientists and the “auditors”. The auditors are crowing over a victory and suggesting that the changes will favor their position. The scientists just want to know the new result.
The described methodology passed review, but it has now been discovered that it wasn’t followed, so the paper has been withdrawn. The obvious response now is to redo the analysis with the original methodology correctly applied to the same data and publish whatever the result is. Of course “unprecedented warming” wasn’t being sought, so it doesn’t matter whether it is there or not. Even without climate change, a historical temperature reconstruction of this sort provides a useful addition to our scientific knowledge. The only possible hiccup is if insufficient correlating proxies can be found, in which case the paper should be shelved and a retraction of the reported claims tendered (this paper made news headlines).
The way forward is not to fret about having made mistakes or worry about giving ‘different sides’ ammunition, but to see what actually changes and what it implies for anyone’s understanding of the real world.
@Armand MacMurray, thanks for that info. I haven’t followed this issue closely. As Roger pointed out, Climate Audit is full of innuendo, and that’s just not my cup of tea when it comes to science. I tried reading through McIntyre’s posting, but even while admitting that this may not impact the results much, he puts words like “get” in scare quotes, in phrases such as, “They might still claim to “get” a Stick using the reduced population of proxies that pass their professed test.” Using that kind of language isn’t exactly a gracious way to accept thanks, privately or publicly.
I’m still not sure it was a public thanks. In my line-of-work I’ve recently seen some individuals publishing on their website the details of a private correspondence from a colleague from another institution, and that is just scummy. So that kind of thing just leaves a bad taste in my mouth.
[Response:Yes, I agree. This is exactly the kind of behavior by which he repeatedly shoots himself in the foot, weakens his cause and creates unnecessary bad blood.–Jim]
I’d consider it a public thanks if it were made public by the author, not by the recipient. As it stands, this was a private thanks; a classy way to handle a public thanks is simply to link to it.
To Ray’s point, I agree. Give credit where credit is due. I just don’t know the full details and I find McIntyre’s innuendo off-putting. I’ll definitely give credit to the journal and the authors for being good scientists, acknowledging the problems in their own work, and moving forward to address them.
Comment by Unsettled Scientist — 9 Jun 2012 @ 8:44 AM
#51 Ray Ladbury
You say “Actually I think this episode illustrates beautifully the difference between the scientists and the “auditors”. The auditors are crowing over a victory and suggesting that the changes will favor their position. The scientists just want to know the new result”.
Ray, you could equally well have said “Actually I think this episode illustrates beautifully the difference between scientists A and scientists B. Scientists B are crowing [shame on them] over a victory and suggesting that the changes will favor their position. Scientists A just want to know the new result” (their enthusiasm unabated).
@ Comment by Unsettled Scientist — 9 Jun 2012
Paragraph 1: “Climate Audit is full of innuendo, and that’s just not my cup of tea when it comes to science.”
Paragraph 2 (referencing Climate Audit): “I’m still not sure it was a public thanks. In my line-of-work I’ve recently seen some individuals publishing on their website the details of a private correspondence from a colleague from another institution, and that is just scummy. So that kind of thing just leaves a bad taste in my mouth.”
McIntyre has been “auditing” climate-science for something like a decade now. And it looks like this will be the first time that he has actually unearthed significant errors in a published paper. Science will never be error-free, but the fact that it has taken McIntyre (someone who is obsessed with uncovering scientific malfeasance/wrongdoing and who has had nearly unlimited spare time to pursue his obsession) on the order of a decade to find nontrivial errors in a paper really is a testament to the robustness of climate-science as a whole.
This whole thing seems reminiscent of the IPCC Himalayan glacier brouhaha. A real error uncovered, and an embarrassing one at that — but nonetheless an isolated one that did not have any impact on the science as a whole.
And has McIntyre ever acknowledged that he erred when he used tree-ring data to train his “hockey-stick” red-noise generator without de-trending the tree-ring data first (to remove the long-term climate signal)? Because he failed to de-trend the tree-ring data before using it as a noise model, the “random noise” that he claimed would produce “hockey sticks” was itself contaminated with “hockey stick” signal statistics. That was definitely a major screwup/oversight on his part. It seems to me that when it comes to acknowledging errors, there’s more than a bit of a double standard here.
[Response:He didn’t unearth them, one of his readers did. But it is not at all clear that this mistake by Gergis et al is in fact even a real problem–not nearly enough is known about exactly what the error was, much less what kind of impact on their study it will have. It has to do with how they did their calibrations. I’ve made several points over there since he raised this issue; the latest one is here: http://climateaudit.org/2012/06/08/gergis-et-al-put-on-hold/#comment-337241 Until Steve McIntyre addresses this “Screening Fallacy” issue (as he calls it) with conclusive, detailed analysis, he just doesn’t have a case, period. As for the detrending, I think you’re mixing up two different detrending steps. The detrending of the tree cores to remove the age/size effect is not the issue; it’s rather the detrending, during the instrumental data period, of the (1) climate data and (2) site-level chronologies (mean ring value for each year, after removing the age/size effect).–Jim]
“This whole thing seems reminiscent of the IPCC Himalayan glacier brouhaha. A real error uncovered, and an embarrassing one at that — but nonetheless an isolated one that did not have any impact on the science as a whole.”
A while back, out of curiosity, I looked up the *physical* IPCC AR4 – that is, the books themselves – in a library. Took a few photos of exactly what glaciergate amounts to. Feel free to link these if they’re useful.
What defines science is its ability to make correct predictions: eclipses, Trinity, Apollo, for example. (That something happens to be published doesn’t necessarily confer scientific credibility, far from it).
For those who study climate to arrogate to themselves the title “climate scientist” is in my view sadly hubristic and premature. Radiative physics is one thing but understanding the actual behaviour of clouds quite another. For me a field of study needs observable and verifiable results to qualify as science. In climatology there just aren’t any. Not any.
Sorry Ray, until the climate community can make some convincing predictions about climate development and evolution they’re just wannabe climate scientists. Maybe one day, but not nearly yet.
[Response: Utter nonsense. There have been skillful predictions of all sorts – for future events: the impact of Pinatubo, the impact of ozone depletion, long term trends in temperature, etc. – for resolutions of apparent inconsistencies in the observed data – CLIMAP LGM SST, MSU temperatures, etc. – out-of-sample hindcasts for paleoclimate – the LGM, 8.2kyr event, the mid-holocene etc. That you choose not to acknowledge any of this is far more a commentary on your perception than it is the reality. – gavin]
Comment by simon abingdon — 10 Jun 2012 @ 12:31 AM
Considering that the Gergis et al paper has been “put on hold,” would it not be proper to indicate the same under the chart prominently displayed at the beginning of this section, or remove the chart in its entirety?
Regardless of one’s views/opinions on the overall subject of global warming, this episode does highlight one aspect that frustrates sceptics.
They make the charge, speaking in generalities here, that the MSM blindly and unquestioningly report stories, with lots of coverage, where a summary might fall under headings such as “it’s worse than we thought”, “it’s worse than it’s ever been”, etc. – and that this is the nature of the MSM. Their frustration is then inflated because, no matter what happens afterwards, those headlines stick, and any re-reporting as a result of other evidence, retractions, etc., never gets the same degree of coverage.
’twas ever thus.
[Response: The media does have biases of course, but the biases are towards things that are ‘news’ – controversy, sensation, man-bites-dog etc. But this is not bias towards any particular side in the climate issue – indeed, anyone of us can list dozens of high profile stories that got things horribly mangled or where marginal issues and people are erroneously elevated to the central stage. Generic paranoia about media coverage is not useful, one is better off trying to understand the fault lines in the whole enterprise and trying to avoid them. – gavin]
What writers of, and posters to, Real Climate do not acknowledge is that the MSM seizes on papers such as that of Gergis et al to promulgate the position of the proponents of anthropogenic climate change/global warming but does not run subsequent stories announcing that the results may not be entirely as initially presented. Consequently, although scientists, and possibly even auditors, may be aware that there could be a modification of the results, this is not made clear to the general public and could generate a bias in public opinion. That said, I agree entirely with Gavin that “giving different sides ammunition” is not the way forward, and I wish it were possible for both sides to discuss their differences without rancour which, unfortunately, stems more from posters to, rather than writers of, blogs addressing climate science.
Ah, I see James and Ian have received their talking points.
What horsecrap! Dudes, do you think the level of emphasis for the scientific position versus the “skeptic” position even remotely approaches the 97% level of the actual scientific consensus? I do not know of a single other scientific topic where the level of consensus and its portrayal in the media are so divergent.
Ray, I think you’re confusing signal and noise wrt the media coverage. Note how the instant a comment was made on this thread about one study being reconsidered out of 3 under discussion, the thread swerved into political perceptions of climate science/scientists – ignoring the “signal” represented by the other 2 studies and further ones cited in the topic post. We’ve now over 50% of the comments being irrelevant to the topic at hand.
I see news articles on climate science regularly, and very little “anti-climate science” – outside the political arena [might be a Canadian slant]. In addition, the “media problem” James and Ian bring up is irrelevant here as the study wasn’t really even published yet, let alone promoted in the general press, just fodder for bloviators. Please resume the previously scheduled program. ;)
#61 Simon, you may be referring to the good folks at Accuweather, some of whom have derided AGW as a joke. However, as I note on my blog and website, they are terribly wrong most times with long-term forecasts precisely because they don’t “believe” in AGW. They have a problem conceptualizing the impacts of trace gases. There are large successes in prediction among those who include every trace gas, even water vapour, which makes clouds…
Regardless of the state of this paper, anthropogenic climate change and global warming are both real. What your post highlights is the misconception in the general public that any new paper could change that. The scientific consensus isn’t swayed by the latest & greatest paper to be published. The consensus is built by taking the totality of evidence into account. When multiple independent avenues of investigation over the course of many years and decades point towards the same conclusion, it becomes the consensus understanding. Whether or not ACC or GW exist is not the interesting scientific question. We are now refining our knowledge of how they work. Any paper showing ACC or GW to not exist, no matter how well done, would need to be backed up by a plethora of papers to follow it, using multiple angles of viewing the world, before it would change the consensus.
Comment by Unsettled Scientist — 10 Jun 2012 @ 10:09 AM
Ray:Ah, I see James and Ian have received their talking points.
Yes. Trundle over to WUWT and you can read Ian and James’ lessons as delivered by the charismatic authority figure in charge.
A bobble like Gergis’ is an opportunity to gauge the desperate hunger for real meat on the part of contrarians. They don’t require much substance to stay alive but when the rare treat is offered they become embarrassingly excited.
In any case it’s going to be interesting to see the Gergis rework.
The detrending of the tree cores to remove the age/size effect is not the issue… –Jim
Didn’t word my post as well as I should have — the “detrending” I was referring to (not the best terminology) was in reference to McIntyre’s failure to remove the long-term (long autocorrelation time) climate signal from the tree-ring data before he ran it through the hosking.sim() R procedure to generate his red noise.
So it appears that McIntyre’s “trendless red noise” was contaminated by long-term “hockey stick” climate signal because of his failure to pre-process the tree-ring data properly before using that tree-ring data as a red-noise “template”.
To my non-expert eyes, this seems like a pretty major oversight on McIntyre’s part — If I’m not totally off-base here, this seems like something that he should acknowledge, especially given the fact that he used his results to imply that Dr. Mann was so inept that he would confuse a random noise artifact with a coherent climate signal.
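The contamination argument above can be illustrated with a toy example. The sketch below is a simple AR(1) stand-in, not McIntyre’s actual hosking.sim() procedure, and all numbers are invented: it shows how fitting a noise model to a series that still contains a long-term trend inflates the fitted persistence, so surrogates drawn from that model wander far more than genuine noise would.

```python
# Toy illustration (not McIntyre's hosking.sim() procedure): a "red noise"
# model fitted to a series that still contains a long-term trend absorbs
# the trend into its persistence estimate, so surrogates drawn from it
# are far more "hockey-stick prone" than genuine noise would be.
import random

def lag1_autocorr(x):
    # Lag-1 autocorrelation, the parameter of a simple AR(1) noise model.
    n = len(x)
    m = sum(x) / n
    num = sum((x[i] - m) * (x[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

def remove_linear_trend(x):
    # Subtract the least-squares linear fit.
    n = len(x)
    tm = (n - 1) / 2.0
    xm = sum(x) / n
    b = (sum((t - tm) * (v - xm) for t, v in enumerate(x))
         / sum((t - tm) ** 2 for t in range(n)))
    return [v - (xm + b * (t - tm)) for t, v in enumerate(x)]

def ar1_surrogate(rho, n, rng):
    # Draw one AR(1) surrogate series with persistence rho.
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
    return x

rng = random.Random(0)
n = 500
# White noise plus a slow trend: a stand-in for "climate signal + noise".
series = [0.01 * i + rng.gauss(0.0, 1.0) for i in range(n)]

rho_raw = lag1_autocorr(series)                         # trend left in
rho_clean = lag1_autocorr(remove_linear_trend(series))  # trend removed
# rho_raw is strongly inflated relative to rho_clean, and surrogates
# built from it, e.g. ar1_surrogate(rho_raw, n, rng), drift accordingly.
```

The fitted persistence from the raw series is much larger than from the detrended series, which is the whole objection: surrogates drawn from the raw fit inherit part of the signal they were supposed to be independent of.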
Re Ian @ 70. You are correct. No one involved in the climate change debate will change their mind on the basis of one paper. But fundamental to the case that current climate change is a major concern is evidence that current temperatures are exceptional, both in absolute level and rate of rise. The Gergis et al paper is an important addition to that case. ‘IF’ the Gergis et al paper is eventually rejected, the case that current temperatures are exceptional GLOBALLY is severely weakened. It would also indicate that the current peer-review process is inadequate.
[Response: No. Concern about human caused climate change rests on the six points we’ve outlined many times. The relation of today’s temperatures with respect to past periods is certainly of interest – but note that the UNFCCC was signed in 1992 – years before the first multi-proxy reconstruction, and when the anthropogenic signal was barely out of the noise. You are indulging in some serious revisionism here, designed merely to make hay from this particular episode. Secondly, since the Gergis reconstruction was only for Australasia, and indeed was the first for the region, whatever comes of the revisions this will still be the only information from that region. Since this didn’t exist last year, even if you discount this to zero (which won’t be the case), you aren’t any worse off than you were last year. What then has been weakened? And as for peer review, you appear to be under a serious mis-apprehension that peer review is a guarantee of correctness – this is simply not so. Peer review is merely the first step in evaluating any new idea – it is a minimum condition and not sufficient in itself. But this should be obvious, even complete garbage like McClean et al or Soon and Baliunas can get past peer review with a little ‘luck’. – gavin]
Re Accuweather, it might be worthwhile to note that Brett Anderson (no relation), who has run their climate change blog forever, is excellent (dare I say superb?) and does not burke the facts. In addition, it appears that over time there are others over there who are acknowledging reality. Yes, I know they have Bastardi and doubtless others, but it has changed for the better significantly IMO.
But fundamental to the case that current climate change is a major concern is evidence that current temperatures are exceptional both in absolute level and rate of rise.
Restated, as the physics of this situation is quite clear at a nearly intuitive level and extending through outlandishly bulky calculations, it’s essential to contrarians to have alternative means of somehow discounting what’s painfully obvious. A myriad of jackalope hypotheses depend on maintaining the illusion of ignorance of the temperature record as well as being able to dismiss startling observations.
One more reason that Jim didn’t mention for expecting the Gergis et al. result not to change much, is a couple of simple reality checks that can be done by inspecting the actual manuscript.
See figure S2.2 page 56. Whether you use 27 proxies for the reconstruction, or 14, the resulting curves over the common period are highly similar. They tell the same story, in spite of only half of the proxies in one also being present in the other. Even using only four proxies doesn’t really change the storyline. To me this suggests that the story must be true. And if such large differences in the choice of proxies used produce such minor changes in outcome, why would changes in screening strategy be any more consequential? This stuff is way more robust than most people imagine.
Or look at figure 2 page 47. Here, the reconstruction reproduces every single twist and turn in the instrumental data — not photorealistically, but well enough. Sure, the reconstruction was calibrated against the instrumental record — but, every gory detail? That only makes sense if those proxies really follow temperature. Remember that the recon is just a weighted sum of contributing proxies; calibration determines the weights. The only way the recon can follow the bumps in the instrumental record is if some or all of those proxies already do.
So, this reconstruction may have its problems and may well not be the best that can be extracted from this data, but warts and all it is definitely a valid one.
Thanks for the correction. Joe Bastardi’s departure does not change the substance of the rest of what I said. AccuWeather has been gradually altering, and Brett Anderson has never been anything but solid on the facts. I still believe they are moving towards a more reality-based position on climate change and global warming, in line with developments in the world.
#77 Robert, it’s nice to know that someone there has some understanding of AGW; it would be infinitely better if they actually refuted those who claimed that trace gases are of no significance. Herein lies the difference, and especially the great confusion: contrarians love and live by this confusion. If one representative says something and another says the complete opposite, how can they represent the same science? How is the lay audience supposed to judge all this, especially when one view clashes violently with the other? Freedom to speak is great, and it reveals the point of view behind the person, but the institution practicing meteorology is equally free to speak. I haven’t heard Accuweather refuting Bastardi while he confidently broadcast to millions that a trace gas is of no import. I am waiting for this meteorological company to defend correct science in order to clear the air.
Without this, the good work of Mr Anderson is much diluted by junk science with no predictive prowess.
Martin Vermeer — 10 Jun 2012 @ 3:59 PM:
This stuff is way more robust than most people imagine.
(Yeah, I know that I’m gonna sound like a broken record by posting these results here *yet* again — but I’ll risk boring the regulars in the hopes of catching the attention of some of the new lurkers who happen to pop in here.)
Now, this doesn’t have anything to do with the Gergis et al. data, but it’s still a nice example of what kind of results you can get from a totally $#@!ing robust data-set.
Some time ago, I tried generating global-average temperature estimates from very small numbers of GHCN stations (using *raw* data).
Divided up the planet into just 4 grid-cells, picked a single *rural* station per grid cell (i.e., the station with the longest temperature record in that grid cell), and computed average temperature anomaly results from just those 4 *rural* stations via a brain-dead-simple anomaly-averaging procedure.
Repeated the process several more times by successively increasing the number of grid-cells (whilst keeping the grid-cell areas as close to constant as possible) and selecting 1 *rural* station per grid-cell as described above.
That would be *raw* data from exclusively *rural* stations, lurkers — so issues involving UHI or data “homogenization” have been taken off the table here.
As you can see, data from only a few temperature stations is needed to see a clear global-warming signal. And by the time you get to ~70 (out of thousands) of stations, you get results that track the official NASA/GHCN results pretty darned closely.
The take-home lesson for new lurkers here is: A brain-dead-simple averaging procedure, applied to raw data collected from a very tiny percentage of the GHCN stations, still produces global-average temp results that match the official NASA/GISS results *very* closely.
The take-home lesson for lurkers here is: In spite of what Anthony Watts and other deniers have been telling you for gawd-knows how many years, the surface temperature record is way more robust than most people imagine.
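For lurkers who want to try this themselves, the “brain-dead-simple” anomaly-averaging idea described above is roughly the following hypothetical sketch (the station data are invented, and this is not the exact procedure used for the results above): convert each station to anomalies relative to its own baseline-period mean, then average the anomalies across stations year by year.

```python
# Hypothetical sketch of the anomaly-averaging procedure described above
# (station data here are invented): each station is converted to
# anomalies relative to its own baseline-period mean, then anomalies are
# averaged across stations for each year.

def station_anomalies(temps, baseline):
    # temps: {year: temperature}; baseline: (first_year, last_year) inclusive.
    base = [t for y, t in temps.items() if baseline[0] <= y <= baseline[1]]
    m = sum(base) / len(base)
    return {y: t - m for y, t in temps.items()}

def average_anomaly(stations, baseline):
    # Average per-station anomalies for each year that has any data.
    anoms = [station_anomalies(s, baseline) for s in stations]
    out = {}
    for y in sorted({y for a in anoms for y in a}):
        vals = [a[y] for a in anoms if y in a]
        out[y] = sum(vals) / len(vals)
    return out

# Two made-up stations with very different absolute temperatures but the
# same warming: the anomaly average recovers the common signal, and a
# station dropping out does not bias it the way a raw average would.
warm = {2000: 25.0, 2001: 25.2, 2002: 25.4}
cold = {2000: -5.0, 2001: -4.8}  # record ends early
signal = average_anomaly([warm, cold], (2000, 2000))
```

Working in anomalies removes each station’s absolute offset, which is why even a sparse network of stations can recover the large-scale signal.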
What writers of, and posters to, Real Climate do not acknowledge is that the MSM seizes on papers such as that of Gergis et al to promulgate the position of the proponents of anthropogenic climate change/global warming but does not run subsequent stories announcing that the results may not be entirely as initially presented. Consequently, although scientists, and possibly even auditors, may be aware that there could be a modification of the results, this is not made clear to the general public and could generate a bias in public opinion.
The reason the paper was so important was because it purported to show an “Australian hockey stick”. They started with 62 proxies, discarded 35 leaving 27 and have a whopping 2 proxies going back 1000 years and 0 of them actually from Australia.
They made a big deal about screening de-trended data to eliminate the obvious statistical problems of pre-screening for your desired signal. Now it seems likely they will remove that statement and screen the paleo-climate way instead (“well, this proxy matches our signal, so we are going to use it to independently confirm our signal; this one doesn’t match our signal, so we’re not going to use it to disprove our signal”).
The paper originally showed it was less than one tenth of a degree C warmer today than 1000 years ago with an uncertainty of 2 tenths of a degree (nice). It’s not going to be a ‘stronger more robust’ paper when it’s re-submitted without the de-trending statement. The uncertainty will increase.
[Response:No, you have a number of things factually incorrect. First, their reconstruction is for the entire Australasian region, the boundaries of which they explicitly define (by lat/long). Second, they most emphatically did not make a “big deal” about their detrending of the data before calibration–indeed they under-stated it, describing it with only a couple of sentences. You wouldn’t make such a statement if you had actually read their manuscript. Rather, Climate Audit has made the very big deal about it, as is their wont. Third, your assessment of how and why screening of proxies is done is just plain wrong, I absolutely guarantee that. And you have no basis for knowing whether the paper will be stronger or weaker after their re-assessment of the situation. You are simply guilty of believing what they promulgate endlessly at Climate Audit, without making your own investigation of the issue.–Jim]
D. Robinson: The reason the paper was so important…
So, to be all Perry Mason about it, you stipulate that if the paper is corrected and the results do not change, you will continue to find it important? That is to say, your belief about the importance of the recent change in temperature will be swayed?
Important, or not important? I’m guessing that for a lot of folks the answer is, “it depends.”
Re: Jim (response to Martin – 85). What everyone can reliably predict, on both sides of the debate, is that the paper will be re-submitted relatively unchanged, but with reference to ‘screening de-trended data’ removed from the description of the methodology. Your own comments, here and in another place, have indicated this would be acceptable to your side of the debate.
[Response:Negative. Nobody can predict what their re-examination of their paper will produce, nor what the Methods section will state. And I have never said anything remotely like what you claim, anywhere. Furthermore I don’t have a “side”, regardless of what you imagine–Jim]
Amazing how quickly these discussions swerve out of any rational basis. There’s a wealth of information in D. Robinson and oakwood’s remarks but it mostly doesn’t pertain to Gergis et al.
The prejudice seems to be that Gergis et al is a mini-conspiracy dove-tailed with a larger plot. The results of the paper are predefined as described by CA and crew. The two recent comments by D. Robinson and oakwood neatly follow McIntyre’s narrative:
Looking ahead, the easiest way for Gergis et al to paper over their present embarrassment will be to argue (1) that the error was only in the description of their methodology and (2) that using detrended correlations was, on reflection, not mandatory. This tactic could be implemented by making only the following changes:
[McIntyre imaginary edits as repeated here by D. Robinson and oakwood]
“Had they done this in the first place, if it had later come to my attention, I would have objected that they were committing a screening fallacy (as I had originally done), but no one on the Team or in the community would have cared. Nor would IPCC.
So my guess is that they’ll resubmit on these lines and just tough it out. If the community is unoffended by upside-down Mann or Gleick’s forgery, then they won’t be offended by Gergis and Karoly “using the same data for selection and selective analysis”.”
I’ll hazard a guess that McIntyre’s expecting the conclusions of the paper to remain unchanged. Whether his remarks are inoculation against disappointment or sincere belief remains even more speculative.
BTW, McIntyre’s use of words such as “forgery” is exactly why his followers shouldn’t be surprised when other people find his language distracting and counterproductive. If you’ve not already soaked in that style of debased hyperbole it’s quite conspicuous.
Good grief, all you have to do is just print the words “hockey stick” and the crazies come out in force.
[Response:Disturbing is the word for it.–Jim]
Comment by SecularAnimist — 11 Jun 2012 @ 12:39 PM
Jim, simply give us here a succinct statement of how you would handle the issue. Your statements have been unclear. I recall you seeming to agree with the Gergis screening process and to insist on the detrended procedure. Later you said that either selection process, detrended or not detrended, could be used.
[Response:How I would handle what issue? I’ve already made my position clear through numerous comments, many of which I’ve had to repeat over and over again at Climate Audit, because of all the misdirections and misinterpretations they engage in over there. I’ve already said there are arguments that can be made either way regarding what time scale of data to use during calibration, and I have never even remotely insisted on their detrending approach. You almost surely got such an interpretation from what’s been said at Climate Audit by somebody or other. I don’t have time to launch into some extended discussion of calibration method choices here. This stuff is all discussed in the literature and in dendrochronology core texts like Fritts et al. and Cook and Kairiukstis, but if I get the time, I’ll do a post on it. UPDATE: I further note that your comment is a verbatim quote of a question posed by someone at Climate Audit. I mean, you can’t even come up with your own question/demand/accusation??–Jim]
D. Robinson, Gergis and colleagues appear to have used an abundance of caution in constructing their record, as is plain when reading their paper (manuscript is still available for those interested; see upthread for link). For folks who’ve bothered to read or even just skim the paper, Gergis’ conclusion is obviously not based solely on two proxies, though that’s apparently what some of us are led to repeat. We don’t even have to go past the abstract to get that:
“Applying eight stringent reconstruction ‘reliability’ metrics identified post A.D. 1430 as the highest quality section of the reconstruction, but also revealed a skilful reconstruction is possible over the full A.D. 1000–2001 period.”
Notice that there’s a distinction drawn between the later and earlier portion of the record?
One of the striking things about McIntyre’s critique is how he’s able to invert Gergis’ meticulous approach to data quality and recast it as nefarious.
If you’re very bothered by a diminishing number of proxies go ahead and toss the earliest 400 years, but remember you can’t fill in those missing years with your imagination. You’re left with a record that’s worse, if you’re keen on “flat.”
On the other hand, if you prefer more proxies, drape the entire Gergis record on the ensemble of other similar reconstructions and watch it vanish; the Gergis result is hardly a surprise and fussing over the number of early proxies in the reconstruction isn’t going to make things pleasingly unbent. More proxies whether they come in ones or twos or by the dozen only help cement what’s fairly obvious at this point.
As usual, the lesson here is to actually look at the paper in question– look at everything– not just follow instructions.
#90 Jim, a post laying out the real issues would be a useful antidote, but I can understand you might have better things to do.
It seems to me that Gergis et al have at least three choices.
a) They could stick with the screening methodology exactly as described. But this would apparently result in only 8 proxies in the reconstruction, as far as I can tell.
b) They could adjust the methodology to what was actually done (i.e. not detrend proxies and temp series before screening). This seems to be what McIntyre expects. But I would be surprised if this were done without at least exploring other approaches, given that this was a key, explicitly justified, element of the original methodology.
c) They could relax some of the other screening criteria, but still apply to detrended series. Here Mann et al 2008 gives some possible changes.
The screening process requires a statistically significant (p < 0.10) correlation with local instrumental surface temperature data during the calibration interval. Where the sign of the correlation could a priori be specified (positive for tree-ring data, ice-core oxygen isotopes, lake sediments, and historical documents, and negative for coral oxygen-isotope records), a one-sided significance criterion was used …
Presumably using p < 0.1 and one-sided significance screening (eminently justified, IMHO) would result in a proxy set somewhere between 8 and 28 in size (although at least one tree-ring and one coral proxy would get tossed).
Or they could do all of the above (or other variations) and compare the robustness of the resulting reconstructions.
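As an illustration of option (c), the one-sided screening step quoted from Mann et al. 2008 might be sketched as follows. This is a toy version only: the proxy names, amplitudes, noise levels, and 70-year window are invented, and the p-value is computed from the standard t-statistic for a Pearson correlation.

```python
import numpy as np
from scipy import stats

def screen_proxies(proxies, target, expected_signs, alpha=0.10):
    """Keep a proxy only if its correlation with the instrumental target
    is significant (p < alpha) in the a-priori expected direction."""
    n = len(target)
    kept = []
    for name, series in proxies.items():
        r, _ = stats.pearsonr(series, target)
        # t-statistic for Pearson r; one-sided test in the expected direction
        t = r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2)
        if stats.t.sf(expected_signs[name] * t, df=n - 2) < alpha:
            kept.append(name)
    return kept

rng = np.random.default_rng(0)
years = 70  # a 1921-1990-style calibration window
temp = np.linspace(0.0, 0.7, years) + rng.normal(0, 0.2, years)

proxies = {
    "tree_ring": 1.5 * temp + rng.normal(0, 0.3, years),    # positive responder
    "coral_d18O": -1.2 * temp + rng.normal(0, 0.3, years),  # negative responder
    "no_signal": rng.normal(0, 1.0, years),                 # unrelated noise
}
signs = {"tree_ring": +1, "coral_d18O": -1, "no_signal": +1}

print(screen_proxies(proxies, temp, signs))  # the two genuine responders are retained
```

Relaxing `alpha` from 0.05 to 0.10, as in the Mann et al. quote, is exactly the kind of change that would grow the retained set.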
[Response:Common practice is to use non-detrended data and to calibrate on annual-scale data over the instrumental period, as validated by split calibration and validation periods and some specific validation (aka “verification”) statistics. So, if that’s what happened, that’s really all they have to say, and then report the results. The other, broader misconceptions in dendro-climatology, particularly wrt screening of proxies, are what need to be written about, given the enormous amount of confusion and outright false statements that have been generated at Climate Audit on this topic.–Jim]
[JB: … Common practice is to use non-detrended data and to calibrate on annual-scale data over the instrumental period.]
I don’t disagree with that, of course, but the authors initially wrote:
For predictor selection, both proxy climate and instrumental data were linearly detrended over the 1921–1990 period to avoid inflating the correlation coefficient due to the presence of the global warming signal present in the observed temperature record.
The “on hold” statement reads:
While the paper states that “both proxy climate and instrumental data were linearly detrended over the 1921–1990 period”, we discovered on Tuesday 5 June that the records used in the final analysis were not detrended for proxy selection, making this statement incorrect. Although this is an unfortunate data processing issue, it is likely to have implications for the results reported in the study. The journal has been contacted and the publication of the study has been put on hold.
“Likely to have implications for the results” suggests the authors are at least considering alternatives to simply specifying screening with non-detrended data.
[Response: Yes, thanks for clarifying that. I didn’t mean to imply that their results themselves wouldn’t change, only that they don’t need to do some kind of massive re-working of their entire analysis, if in fact they had simply followed standard procedures but mistakenly reported otherwise.–Jim]
D. Robinson, we know from the instrumental record that temperatures have trended upward over Australasia as they have globally. This, what you call “the signal”, is something we do not need proxies to tell us; we knew it already. The interesting thing is what happens further back, where we don’t have instrumental data. But tell me, would you actually trust as temperature proxies ones that fail to respond to this clear signal?
[Response:The logic of many of those questioning tree ring analysis goes like this: (1) tree rings are maybe marginally OK as a climatic proxy, as long as you don’t do any type of screening of supposedly “bad” sites; you have to take all the data, or none of it, based on an a priori decision of whether or not a certain species is a climatic responder, from which you can’t deviate after you’ve collected the data, (2) a sizeable fraction of your sites don’t pass a given screening criterion, therefore you can’t use those sites, and therefore, any sites of that species, (3) therefore tree rings can’t be used for paleo-climatic estimates at all and if you try to do so, you are guilty of believing in, and executing, the “Screening Fallacy”, which they suppose, with nebulous definition and no actual demonstration of validity, is some kind of inviolate law.–Jim]
The application of the “screening fallacy” to this scenario somewhat escapes me. I’m not inclined to try for an explanation at McIntyre’s site as it’s such an emotional mess.
[Response:My experience is that you are very unlikely to get an answer that means anything. I’ve already tried, several times, very directly and the lack of a clear and definite answer of what he means by the term, supported with a clear conceptual formulation and specific code and other evidence, has been complete. Accordingly, I don’t believe he himself knows what he means–Jim]
McIntyre’s central objection seems to be this:
“In the terminology of the above articles, screening a data set according to temperature correlations and then using the subset for temperature reconstruction quite clearly qualifies as Kriegeskorte “double dipping” – the use of the same data set for selection and selective analysis. Proxies are screened depending on correlation to temperature (either locally or teleconnected) and then the subset is used to reconstruct temperature. “
[Response:It’s just a foolish argument. It’s this simple: trees respond to multiple potential external inputs and you want to find that set of trees that is responding most strongly and clearly to that external input in which you are most interested. This necessarily involves a selection process, and that selection process is based on the correlation between an observed process (radial tree growth) and a variable (temperature) that we know from many other lines of evidence is a legitimate physical driver of that observed process. The fact that other drivers can also affect that process, and might sometimes even be correlated with that observed process in a statistically confounding way, does not negate the legitimacy of this procedure.–Jim]
I’m not sure how this does not actually rule out calibration as applied by McIntyre. For instance, if we have an assembly line churning out thermometers and we select samples for various brackets of accuracy against an independent standard, does that mean we can’t obtain useful measurements from thermometers found to be performing within certain brackets of accuracy? Would our measurements be better if we ignored a known good standard and simply included all the thermometers coming off the line?
Gergis and crew didn’t use tree rings or coral for samples for screening; rather as in the case of the thermometer production line example they used an independent standard:
“Our instrumental target was calculated as the September–February (SONDJF) spatial mean of the HadCRUT3v 5[deg]x5[deg] monthly combined land and ocean temperature grid (Brohan et al., 2006; Rayner et al., 2006) for the Australasian domain over the 1900–2009 period.”
McIntyre’s objection seems to come down to Gergis’ rejection of measurements that don’t hold up against an independent standard, as described in Gergis et al:
“Only records that were significantly ( p [less than] 0.05) correlated with the detrended instrumental target over the 1921–1990 period were selected for analysis.”
Ok. So looking at this through McIntyre’s lens, how would one obtain a proxy record calibrated against an independent standard? As far as I know, all thermometers are proxies (unless there’s an experimental physicist who can say otherwise?) so it seems that if we test McIntyre’s logic to the breaking point, we don’t really know the temperature of anything because all of our measurement devices have been tested, with under-performing devices rejected, thus leading to a “screening fallacy.”
[Response:Right. As I noted in my response to Martin, his arguments if taken to their logical extreme, will eventually close off all use of tree rings as a climatic proxy. He says otherwise (that tree rings are not completely worthless as a proxy) but this conclusion is where his arguments ultimately take you if you follow them.–Jim]
Or is the “screening fallacy” a “get out of jail free” card?
I’ve got a question for the dendro-folks…It’s probably already discussed in a paper I can be pointed to.
In other sampling-type studies, a certain number needs to be taken so that the power of the sample can accurately describe the whole with a manageable margin of error. Does this sort of thing not play a role at all in dendrochronology after selecting a site a priori for its determined ability to signal global temperature (ie, ‘treeline’, etc.)?
I’m just thinking there are hundreds of thousands of trees in a region, and the cores per site can be 20-30. I’m assuming since that’s not enough to be representative of the whole that there’s got to be a set of arguments why:
(a) out of an entire population of, say 10,000 trees, 10 cores out of 20-30 are said to effectively model the instrumental temperature record (regardless of what they display in other eras), and have that be certainly NOT a product of chance, but rather a lead-pipe lock that it is describing the signal, or
[Response:Why you say “10 cores out of 20-30″ I don’t know, because all the cores of a site, not just some fraction thereof, are used to provide the climate signal of interest. But to your point specifically: please point me to any verbal concept, any algorithm, any model code, ANY WAY in which a stochastic process can lead to the types of high inter-series (i.e. between cores) correlations typically seen in the tree ring data archived at the ITRDB. Just go there and start randomly looking at the mean interseries correlations documented in the COFECHA output files of each site, and tell me exactly how you would generate such high numbers with any type of stochastic process that’s not in fact related to climate. And then how you would, on top of that, further relate the mean chronology of those sites to local temperature data at the levels documented in many large scale reconstructions. I want to see that model. And if I don’t see that model, I’m not buying these bogus “screening fallacy” arguments. Make sense?–Jim]
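For illustration only (this is not the counterexample model Jim is demanding, but the converse): a toy Monte Carlo showing why independent red noise does not produce the high mean interseries correlations that COFECHA reports for real sites, while a shared climate signal does. Every parameter here is invented.

```python
import numpy as np

def mean_interseries_r(series):
    """Mean of all pairwise correlations among rows, in the spirit of
    the COFECHA mean interseries correlation."""
    c = np.corrcoef(series)
    return c[np.triu_indices_from(c, k=1)].mean()

rng = np.random.default_rng(42)
n_cores, years = 30, 200

def ar1(n, phi=0.5):
    """Red noise: each value carries over a fraction phi of the previous one."""
    e = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x

# 30 independent red-noise "cores": no shared driver at all
independent = np.array([ar1(years) for _ in range(n_cores)])

# 30 cores recording one common "climate" history plus individual noise
climate = ar1(years)
shared = np.array([climate + 0.7 * ar1(years) for _ in range(n_cores)])

print(round(mean_interseries_r(independent), 3))  # near zero
print(round(mean_interseries_r(shared), 3))       # far above zero
```

The point of the sketch: high pairwise agreement between cores is the signature of a common driver, which purely stochastic, unrelated processes do not reproduce.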
(b) that it doesn’t matter how many trees exist in a region, nor how many cores are taken relative to the total number of trees– it’s simply about discovering any individual trees that evidence the temperature signal and discovering as many co-located as possible. It doesn’t matter how many there are, but instead how well they match the signal.
[Response:At the risk of having this statement completely misunderstood and mangled by the usual suspects…if you only had one tree out of 10K that responded well to temperature, and you found and cored that one tree, you would have legitimate evidence of a temperature signal. Fortunately, the situation is nowhere remotely so extreme as that, and that’s because, in fact, that many trees respond in this way, and therefore you only need some couple dozen or similar to get a signal that emerges strongly from the noise at any given location. And why do many trees respond this way? Because, lo and behold….temperature is a fundamental determinant of tree radial growth in general, i.e. a fundamental tenet of tree biology.–Jim]
OK, there are thousands of trees. However, if the same physical conditions persist over the entire range, then trees of a particular species ought to respond in a comparable fashion, no? And remember, the time series consists of rings for a single individual. You are looking at multiple time series to average out extraneous effects–e.g. microclimate.
“…if you only had one tree out of 10K that responded well to temperature”
But how do you know that one tree is not just more noise, which by coincidence just happens to match roughly to 20th century temperatures? You have not answered this question.
(expecting the typical blockage of comments you don’t like, a response would still be nice, nonetheless…)
[Response:Yeah, no kidding, because that’s not the question that was asked. It’s a hypothetical given; the point is that if you knew you had one good thermometer and 9,999 bad ones, the one is all you need to make your estimates. The answer to the question you are trying to ask is the one I already answered: the mean interseries correlations and their relationship to local temperature are waaaaaaaaaaaaaaaaaay too strong and too frequent to be explained by chance. It’s not even a topic for legitimate consideration frankly; indeed, it’s game-playing–Jim]
Rephrasing Jim’s remarks, I suppose we might say that if we had 100,000 monkeys sitting in 100,000 trees banging away on typewriters and one of ‘em turned out Moby-Dick while the rest were spewing gibberish that singular primate would indeed turn out to be Melville. The probability of an actual lower-functioning primate authoring a single chapter let alone 135 is effectively nil.
In this case there are even fewer trees so the chances of an accidental masterpiece are rather less, even though the plot still involves monkeys and obsession.
“Moby Dick seeks thee not. It is thou, thou, that madly seekest him!”
I think there’s a minor, legitimate but well-understood point underlying the selection fallacy talk at CA that could be better observed in scientific publication. It is actually true that if you select proxies according to their correlation with instrumental temperature in some training period, then you don’t have an independent measure of the temperature in that period. If you have ten clocks in a shop and set them going according to your watch, then the fact that 11 timepieces show the same does not improve your knowledge of the time, though each clock does tell the time.
The lack of real info in the training period isn’t serious – we have more reliable instruments.
When a graph shows instrumental and proxy temps for the last 1000 years, it’s intended to convey the best temp estimate. But it usually includes the 20Cen proxy curves, and for that purpose, they don’t belong. Instead, their proper rationale is to show how well the proxies did in fact align with instrumental – ie how well they were chosen.
This confounding underlies a lot of recent troubles. With “hide the decline”, I imagine post-1960 divergent temps sometimes aren’t shown because we know they aren’t right, and nobody, most of CA included, seriously believes that they are. The legitimate objection is to hiding the divergence.
But I think the proper remedy is not to try to show both things on one plot. To show temperatures, just show proxies pre-training period and instrumental after. Or to show how effectively the proxies correlated, just show instrumental and proxies in the training period (and include divergence).
A great deal of the fussing at CA is over suspicions that someone is favoring proxies that show hockey sticks. But the hockey stick, correctly understood, has a proxy shaft and an instrument blade. The proxies only tell you about the shaft (because of “selection fallacy”). So there should be nothing to achieve by selecting for HS-ness. If the proxies were not plotted post 1900, that would be made clearer.
[Response: This is a fair enough point, but the real test to prevent overfitting (which is the real accusation) is cross-validation and out-of-sample verification. This is something that has been done in many papers but which is generally completely ignored by McI et al when they start talking about fallacies. There are multiple ways to do CV and I wouldn’t pretend to be an expert on all the different methods, but it seems to me that this would be the most positive contribution that the critics could make. If they don’t think that is done properly, what would they suggest instead, and what impact would it have? – gavin]
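A minimal sketch of the split calibration/verification idea gavin mentions, using the RE (reduction of error) statistic common in the dendro literature. The synthetic "instrumental" series, proxy, and simple linear calibration are all invented for illustration; real studies use more elaborate regression schemes.

```python
import numpy as np

def reduction_of_error(obs, pred, calib_mean):
    """RE statistic: skill relative to always predicting the
    calibration-period mean. RE > 0 indicates out-of-sample skill."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - calib_mean) ** 2)

rng = np.random.default_rng(1)
years = 100
temp = np.cumsum(rng.normal(0, 0.1, years))     # synthetic "instrumental" record
proxy = 2.0 * temp + rng.normal(0, 0.3, years)  # proxy tracking it with noise

# Split: calibrate on the first half, verify on the withheld second half
cal, ver = slice(0, 50), slice(50, 100)
slope, intercept = np.polyfit(proxy[cal], temp[cal], 1)
pred = slope * proxy[ver] + intercept

print(round(reduction_of_error(temp[ver], pred, temp[cal].mean()), 2))
```

A proxy that was merely overfit to the calibration window would tend to fail this withheld-period test, which is why verification statistics, not screening alone, carry the weight.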
But surely you need to know why discarded trees don’t respond to temperature like they should. Because what happened to them in the 20th century might have happened to the ‘chosen’ ones 500 years ago?
Or is it just a kind of ‘genetic’ thing?
[Response:There’s always a genetic element in trees due in large part to their outcrossing nature and consequent generally high genetic diversity, but that’s not the primary consideration here. All trees, indeed all plants, respond to temperature in some way; the questions are ones of “how?” and “how much?” and “when?”. Yes, it’s important to know why a given set of trees may not be responding to T in a simple and stable way, given that unimodal responses to T are very common across a wide range of response variables in the biological world, and hence, these nonlinearities can create real errors in the paleo-estimates. I’d recommend reading carefully (among many others) the following set of articles:
Kingsolver, J. (2009) The Well-Temperatured Biologist. American Naturalist 174:755-768.
D’Arrigo, R., et al (2008) On the ‘Divergence Problem’ in Northern Forests… Global and Planetary Change 60:289–305
Loehle, C. (2009) A mathematical analysis of the divergence phenomenon. Climatic Change 94:233–245
And why do many trees respond this way? Because, lo and behold….temperature is a fundamental determinant of tree radial growth in general, i.e. a fundamental tenet of tree biology
In fact the timber industry depends on this, the entire point of planting even-aged stands is that they’ll all grow at very close to the same rate within a given stand which grows within a uniform microclimate, so you can go in and harvest them all in one fell swoop 40, 60, or 80 years later and end up with a very uniform product.
Romain #107: redundancy, redundancy, redundancy. Yes, anything might have happened 500 years ago, but hardly to all trees or tree species at the same time, or to trees and corals and lake sediments and speleothems at the same time (and if it did, let’s call it ‘climate’). Compare subsets and look for outliers.
Screening is just a way to get rid of obvious problem trees / problem sites right away. Like not hiring a CFO with a conviction for embezzlement, though lack of a criminal record doesn’t make people honest. Trust, but verify.
> why discarded trees don’t respond to temperature like they should.
You’re assuming “they should” — but why are you assuming that?
Look into the reasons.
Basic Principles and Methods of Dendrochronological … – BioOne
http://www.bioone.org/doi/abs/10.3959/2011-2.1
P.P. Creasman, 2011: “The adjustment for cockchafer beetle outbreaks (Melolontha melolontha L. and M. hippocastani F.) in the Hohenheim University oak and pine tree-ring …”
The “McIntyre team” needs to spend some time down on the farm. Every farmer knows that his field of corn responds almost uniformly to annual weather conditions. The corn farmer doesn’t need to go out and survey a thousand stalks to see how his corn is growing. Just a glance here and there, and a careful examination of a few plants gives him a really good “statistically meaningful” sampling. When I grew up on the farm, our local farmers had a saying about how fast the corn had to grow to get a good crop… the corn had to be “knee high by the 4th of July” (in western Pennsylvania and eastern Ohio, and in general across the American Midwest).
Too bad plant biology doesn’t play by the rules the McIntyre team pretends hold sway. Farming would be far less risky, if a significant proportion of the corn field grew well in bad years and matured to produce a reasonable crop. But like the unfortunate farmers in Texas found out last summer, in a bad year the entire field grows badly, and there isn’t a salable crop. They don’t call ‘em “crop failures” for no reason.
[Response:Paul, that’s different. A corn crop in a local area is a system with very high levels of genetic and environmental uniformity. Therefore, you can as you state, get a pretty good idea of performance from a pretty small sample of plants (I had a job inspecting corn fields for a while). But perennial, woody plants growing on the fringes of their growth tolerance envelope are a different beast. There’s a lot more variability that has to be accounted for and signal to noise is harder to discern. What we need to do, and what I’m trying to emphasize without explicitly stating it, is, if we are to approach this problem fairly, to neither under-sell, nor over-sell, the real analytical challenges that exist with estimating former climates from tree rings. No need to mountainize molehills, nor vice versa. Indeed, no legitimacy to either.–Jim]
Exactly! That is why different proxy studies yield different results. The following paper shows that some tree rings (Pilgerodendron uviferum) respond positively to temperature in one area, and negatively in another. Oftentimes, the trees respond most significantly to minimum temperatures.
[Response: “Exactly!”… err.. not quite. Actually, not at all. Rather, different places with identical trees often limited by different factors – temperature, rainfall, competition, soil nutrients etc. and so will have different patterns of change over time. Read the papers Jim suggested. – gavin]
“But surely you need to know why discarded trees don’t respond to temperature like they should. Because what happened to them in the 20th century might have happened to the ‘chosen’ ones 500 years ago?”
I think you’re still missing Jim’s point.
Why certain trees diverge from the record is a question, but it’s a question that does not pertain here.
An airplane crashes. The DFDR is mangled in the wreck but the QAR is not. The crash can still be reconstructed because the conditions that destroyed the DFDR did not ruin the recording on the QAR. The exact circumstances of the DFDR’s destruction are interesting but not relevant to understanding why the crash happened. Meanwhile the chances of the QAR inadvertently mimicking the record of the flight are nil.
Follow Jim’s response to Salamano at 103 above. “I doubt it” isn’t a useful argument; if you choose not to believe Jim’s description you’ll need to do the work to show with numbers how a proxy temperature record can match an independent record in detail, by chance.
112 Dan H. “Treeline and high elevation sites in the central and southern Chilean Andes (32°39ʼ to 55°S) have shown to be an excellent source of paleoenvironmental records because their physical and biological systems are highly sensitive to climatic and environmental variations. In addition, most of these sites have been less disturbed by logging and other human induced disturbances, which enhances the climatic signals present in the proxy records (Luckman 1990; Villalba et al. 1997).” Did you actually read the paper?
Yes, I did. The question is did you? Your quote from the report simply states that region is an excellent source of records, because of their sensitivity and the undisturbed environment. Had you read further, you would have read that not all trees respond similarly. The trick is that sometimes, you have to read past the first paragraph.
For instance, “Sites located in the southern portion of this range show a strong negative response to summer temperatures. Conversely, a positive response to summer and annual temperatures is reported for two sites at or near treeline in the Coastal Archipelagoes (46°10ʼ, 700 m asl). A marked increase in tree-growth since the mid-20th century, attributed to an increase in mean annual temperature in this period, is reported (Szeics et al. 2000). However, none of the other Pilgerodendron sites show evidence of a warming trend during the 20th century, following a similar pattern as the one described for the 3,622-year temperature reconstruction from Fitzroya tree-rings (Lara and Villalba 1993).”
This is evidently not the only species showing this response.
Notice how we’re no longer discussing the droids we were looking for?
Pilgerodendron uviferum has zero to do with the point being made by Jim regarding the impossibility of a proxy temperature history as recorded by tree rings accidentally resembling the instrumental record.
You have to watch the hands and the ball, or you’ll miss the instant when you’ve been had.
[Response:Thanks Hank. The Kingsolver paper is not freely accessible unfortunately, which is too bad, because it’s excellent. Anybody with literature access, interested in a truly synthetic and insightful discussion of the effects of temperature on biological systems, should grab it and read it–Jim]
All the authors need to do is show that the selected proxies correlate with each other outside the screening period just as well as they do during it; then you have firm evidence that the temperature correlation is maintained throughout the reconstruction. (The alternative, that they correlate through some non-temperature variable and just happened to correlate with temperature during the screening period, is unlikely, especially since we chose these proxies based on an a priori understanding that they should correlate with temperature.)
Is it typical to show this?
[Response:If you are referring to the within-site scale of analysis, which I think you are, then your fundamental point is sound and I agree with it. All data submitted for archive at the NOAA Paleo ITRDB are evaluated for a number of basic quantitative characteristics using a program called COFECHA. One of these metrics is the mean inter-series correlation, the average of all pairwise correlations on the annual resolution data over the common interval. This is similar to what you are suggesting, but not broken out by instrumental vs pre-instrumental period (obviously, since these periods are not defined given that the relevant climate data are not part of the submitted tree ring data), nor by each individual pair of ring series. It would be easy to write an R script or function to do this once the climate data are in hand. As for existing studies, I’m pretty sure Ed Cook et al did analyses along these lines some few years ago, but don’t have exact refs. at fingertips–Jim]
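The per-period comparison described here (Jim suggests an R script; an equivalent sketch in Python follows) is straightforward once the series and the calibration window are in hand. The synthetic common signal and noise levels below are invented stand-ins for real archived series.

```python
import numpy as np

def mean_interseries_r(block):
    """Mean of all pairwise correlations among the rows of `block`."""
    c = np.corrcoef(block)
    return c[np.triu_indices_from(c, k=1)].mean()

rng = np.random.default_rng(7)
n_proxies, years, calib_start = 20, 500, 430   # last 70 "years" instrumental

# A common signal every proxy records, plus proxy-specific noise
t = np.arange(years)
signal = np.sin(2 * np.pi * t / 70.0)
series = np.array([signal + rng.normal(0, 0.5, years) for _ in range(n_proxies)])

r_in = mean_interseries_r(series[:, calib_start:])   # screening period
r_out = mean_interseries_r(series[:, :calib_start])  # pre-instrumental period
print(round(r_in, 2), round(r_out, 2))  # comparable values: coherence holds up
```

If `r_out` collapsed relative to `r_in`, that would be the warning sign the commenter is asking about.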
If correlation fails outside the screening period then it must be said that the temperature correlation is failing also in that period – to the extent indicated. In this case the noise outside the screening period will have a tendency to cancel and we will see the instrumental record given undue emphasis (relative to the rest) over the screening period and hence claims about “unprecedented [whatever characterised the screening period]” become rather dubious.
(above true for decadal – higher resolution subject to regional differences)
[Response: It isn’t the case that proxies need to correlate with themselves – because of course the actual temperatures don’t either (assuming you are looking at proxies of local temperatures). For a local or regional reconstruction where you expect more coherence, this might be a more useful test. – gavin]
“Response: It isn’t the case that proxies need to correlate with themselves – because of course the actual temperatures don’t either (assuming you are looking at proxies of local temperatures).”
But isn’t it so that areal temperature reconstructions (e.g., for the NH) based on subsets of proxies (like tree rings vs. non- tree rings, or why not even-numbered vs. odd-numbered) each with acceptable geographical coverage, can be meaningfully intercompared?
“Temperature proxies are millennia-long series of suspected but unknown temperature sensitivity among other things (noise). Other things include moisture, soil condition, CO2, weather pattern changes, disease and unexpected local unpredictable events. Proxies are things like tree growth rates, sediment rates/types, boreholes, isotope measures etc.
In an attempt to detect a temperature signal in noisy proxy data, today’s climatologists use math to choose data which most closely match the recent thermometer-measured temperature (the calibration range, 1850–present).
The series are scaled and/or eliminated according to their best match to measured temperature which has an upslope. The result of this sorting is a preferential selection of noise in the calibration range that matches the upslope, whereas the pre-calibration time has both temperature and unsorted noise. Unsorted noise naturally cancels randomly (the flat handle), sorted noise (the blade) is additive and will average to the signal sorted for.”
That to me was a helpful explanation of what their central criticism is about. It is the same as what McIntyre describes as “screening fallacy”, afaict.
“To me it sounds entirely reasonable to weigh the proxies based on how well they reproduce the instrumental temperature record. You seem to assume a good correlation over this period is based on noise, i.e. coincidence? Or at least, that noise could have contributed to the good correlation, which is fair enough. (With sorting, you mean weighing, right? (giving it more weight, i.e. importance, in the final reconstruction) )
You argue that insofar as by coincidence the noise correlated with the measured temperature increase since 1850, that takes care of the upswing in the proxy reconstruction (the blade), whereas the pre-1850 proxies have random noise which causes the flat shaft.
Is that good paraphrasing of your position? [to which Jeff later replied “yes it is very close.”]
In that case the critical point is really: to what extent is a good correlation between measured temperature and proxies coincidence (i.e. not due to a causal relation between the proxy and temperature), and to what extent is it due to a real causal relationship? Insofar as the latter dominates, there shouldn’t be a problem.
This could be – and I think has been – investigated by people studying the actual dynamics of the proxies involved, e.g. plant physiologists for tree proxies.
It also shows that in the end, finding statistical relation still has to rest on physics (or chem or biology) in order to be properly interpreted.”
In trying to understand where the other is coming from, this was -to me at least- a very useful discussion.
[Response:Bart, the problem is that these explanations are just fundamentally wrong, for a number of reasons. Let’s start with the last one you mentioned, because these “correlation by chance” arguments implicitly assume that we have no fundamental understanding (i.e. at the population, individual plant, tissue, cellular or molecular levels of analysis) of the effect of temperature on tree growth. This is so ludicrous that it hardly deserves mention; you have to close your eyes to the mountains of evidence at these levels of analysis to make those arguments, and is why I cited the Kingsolver article below.]
[Continued: Second, knowledge of tree ring sampling and statistics provide another set of arguments. Each site is sampled by, typically, 20 to 30 tree cores (and sometimes many more, as in Gergis et al here). A robust mean (to downweight extreme outliers) of each year’s set of sampled rings is then taken (after “detrending” out the age/size effect from each core), to give the site “chronology” (i.e. a single estimate of the climate parameter of interest, for each year). Even with extremely high levels of auto-correlation in each such resulting series (much higher than typically observed), what is the chance of getting this robust mean to correlate at say p = .05 or .10 over a typical 100 year instrumental calibration period, with the climate variable of interest? Remember, the red noise process that is hypothesized as the cause of the supposed spurious correlations, has to be operating on each tree individually. This, not even counting that such relationships once computed, typically have to pass verification tests based on e.g. two 50 year intervals, in both directions?]
[Continued: So, to summarize this… Notwithstanding these two extremely strong and entirely independent arguments, and also not even including the previously mentioned fact that the typical ITRDB tree ring site has much higher inter-series correlations over the full chronology length than could ever possibly result by chance, we have people who are neither biologists nor tree ring experts making these kinds of nebulous, unclear, and just flat out unfounded arguments that fly under vague and concocted terms such as “pre-selection bias” and “screening fallacy”. What do they expect, that nobody’s going to know enough about these topics to counter these bogus arguments? Well, good luck on that.–Jim [edited for clarity]]
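One way to put a number on the question Jim poses is a direct Monte Carlo experiment: generate sites consisting of nothing but independent red noise per core, take the robust site mean, and count how often it passes the screen. The Python sketch below shows the shape of such a test; every parameter – 25 cores, an AR(1) coefficient of 0.7, a 100-year calibration window, same-sign verification on two 50-year halves, and the hard-coded two-sided p≈.05 critical r of 0.197 for n=100 – is an illustrative assumption of mine, not a value from any particular study, and the printed fraction depends strongly on the assumed noise model:

```python
import numpy as np

def ar1_panel(n_series, n_years, phi, rng):
    """n_series independent AR(1) red-noise series, one per tree core."""
    e = rng.normal(size=(n_series, n_years))
    x = np.empty_like(e)
    x[:, 0] = e[:, 0]
    for t in range(1, n_years):
        x[:, t] = phi * x[:, t - 1] + e[:, t]
    return x

def passes_screening(chron, target, r_crit=0.197):
    """|r| above the two-sided p = .05 critical value for n = 100,
    plus same-sign verification on each 50-year half."""
    r = np.corrcoef(chron, target)[0, 1]
    ra = np.corrcoef(chron[:50], target[:50])[0, 1]
    rb = np.corrcoef(chron[50:], target[50:])[0, 1]
    return bool(abs(r) > r_crit and ra * r > 0 and rb * r > 0)

rng = np.random.default_rng(42)
target = np.linspace(0.0, 1.0, 100)        # stand-in instrumental trend
trials = 500
hits = sum(
    passes_screening(np.median(ar1_panel(25, 100, 0.7, rng), axis=0), target)
    for _ in range(trials))
print(hits / trials)   # chance a pure-noise 25-core site passes the screen
```

Varying the noise model and verification rules here is exactly how one would probe how conservative (or not) a given screening procedure is.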
Jim — inline response needs paragraph breaks, please, for us older readers.
Good explanation there from you and from Bart.
[Response:Thanks Hank; have to break into separate “Responses” to make paragraphs, which I did.–Jim]
Good summary by Bart of the idea put forward by Jeff that the data could just happen to give a slope recently and a flatline earlier.
[Response:An important point I forgot to make relates to that: the selection of sites to include is not influenced by the multi-decadal trend during the calibration interval, it depends only on the yearly scale correlation probability; i.e. there is no selection bias for chronologies that have positive slopes during the instrumental period.–Jim]
Seems to me it’s like the argument about how likely it is that all the molecules that are warmer go to the north side of the room and all the cooler ones go to the south side — seems like it could happen. Understanding the probability, though, makes it something you aren’t likely to expect to see.
[Response:Right, could happen, but at such a low likelihood as to be meaningless.–Jim]
You seem to argue the same as what I’m arguing: That there are physical reasons for a relationship between the variables to exist. I agree. But that doesn’t seem to be the crux of the argument with McIntyre. His argument is hardly ever physics based, but rather maths based, as it is in this case.
[Response:I agree Bart, that’s why I added the second argument based purely on statistical considerations, to counter that.–Jim]
If I understand his criticism correctly, it is that just from the methodology, one cannot distinguish whether the hockeystick shape is due to such a shape existing in the underlying data or due to the screening process. That means that even if one expects a physics-based relation, the fact that a relation arises is no proof of anything.
[Response:I think it’s better not to think in terms of “proof” Bart, but in terms of strength of evidence. The fact that we know that temperature affects radial growth, and the processes that lead to radial growth, lends evidence to the idea that when we see radial growth corresponding to temperature change, that there is indeed a relationship. To ignore such biophysical evidence is completely unwarranted.–Jim]
You say that the probability that the relation with the instrumental period is by chance, is very small. Ok, I accept that. As I accept that there are sound physical and biological reasons to expect such a relation.
But the methodology would still create a hockeystick shape out of random noise. Lucia’s example is quite convincing, as is Jeff’s explanation of why that would be. That’s a problem, because it lowers trust in the results.
[Response:Only very rarely in any realistic situation Bart. That’s the overall point. Think perhaps a little more about the replication issue mentioned.–Jim]
Bart’s comment and synopsis, along with Jim’s response, could well serve as the basis for a fuller treatment as an RC posting. Might save an awful lot of continued grinding on the issue to have a fully developed popular treatment available? The half-life of the issue shows signs of being long (after all, it’s an isotope of something that’s been going on for a full 14 years).
Jim, in comment #125, you said: “Even with extremely high levels of auto-correlation in each such resulting series (much higher than typically observed), what is the chance of getting this robust mean to correlate at say p = .05 or .10 over a typical 100 year instrumental calibration period, with the climate variable of interest?”
Can you point out any specific examples of data which had a correlation in the range of p=.05 to .10?
[Response:Of course…almost every reconstruction study uses a p value in that range. Indeed, point me to one that uses some higher value–Jim]
SecularAnimist — can you give examples of theories proven or disproven by empirical observation?
My normal SOP as a scientist would be to publish results with an error analysis to show that my experimental data made theory X (un)likely, very (un)likely etc. Look at particle physics for example and the hunt for the Higgs. When they find a peak with a 5-sigma deviation from the background they might say they’ve found it — but really they are saying it is extremely likely this is it. Most areas of science won’t reach statistical levels of certainty as high as this, but theories will still be accepted as the framework for understanding the data.
SecularAnimist, Hank Roberts & Bart Verheggen — Outside of mathematics (and deductive logic) there are no proofs. Proofs only exist in deductive logic, which is what mathematicians use. [So do physicists, economists, etc., but when acting in the role of establishing some axioms and deducing consequences. For a fine example, see http://en.wikipedia.org/wiki/Arrow%27s_impossibility_theorem]
In science we use inductive logic, making decisions based on the weight of the evidence. IMO the best (but not only) approach is Bayesian reasoning, for which please read the highly informative and entertaining Probability Theory: The Logic of Science by the late E.T. Jaynes.
Comment by David B. Benson — 14 Jun 2012 @ 2:49 PM
“Science is organized common sense where many a beautiful theory was killed by an ugly fact.” Thomas Henry Huxley
Gator wrote: “can you give examples of theories proven or disproven by empirical observation?”
That’s not what I said. I said that predictions from theory can be proven or disproven by empirical observation.
David B. Benson wrote: “Outside of mathematics (and deductive logic) there are no proofs.”
Einstein’s theory of relativity predicted that the path of light would be altered by the gravitational field of a star. Are you saying that it is impossible to prove that prediction correct or incorrect by empirical observation?
I understand that the point of a scientific theory is not to be “true” or “false” in the abstract mathematical sense, but to make correct predictions about the results of observation. Which requires the ability to prove whether or not those predictions are correct through empirical observation.
If there is no such thing as “proof” outside of the abstract mathematical sense, then there is no possibility of testing predictions against empirical observation to prove those predictions correct or incorrect, and there is no possibility of doing science.
SecularAnimist wrote “Einstein’s theory of relativity predicted that the path of light would be altered by the gravitational field of a star. Are you saying that it is impossible to prove that prediction correct or incorrect by empirical observation?”
Like Hank said above: you’re half right! You can prove a theory or prediction is incorrect with empirical study. But with successful empirical tests of a theory, all you’ve shown is that the predicted behavior holds in the instances you’ve tested it. Once you’ve tested it enough times and in enough different settings, you become more confident in the theory (possibly to the extent that you stop evaluating it for most purposes), but it’s still not proven in the formal sense of the word.
There’s the rub. When you talk to scientists, they’re likely to insist on the formal sense of proof. You are using proof in a much more casual way.
I *think* we’re running into that hoary old problem of how scientists use specific words, though I couldn’t *prove* it in the scientific sense – couldn’t resist the silly turn of speech.
Scientists, in order to communicate, use words to which they assign more precise meanings than we generally apply in day-to-day life. In this case, I think proof is overdefined, but that’s just me. I ran into an example in Oreskes, I think, about how we couldn’t prove that the sun would rise, but we are quite sure it will, something along those lines. Or, say, if one sees someone on the road using a cell phone, one gives them a wide berth, being convinced without high-level scientific proof of the danger of distraction (not saying there isn’t statistical information, just that on the roads we use a much looser definition to avoid trouble).
I am not entirely convinced that this stringent and careful practice is serving us well in this day of ignorant attacks being taken as gospel by the public, but we’re stuck with it.
MartinJB wrote: “When you talk to scientists, they’re likely to insist on the formal sense of proof. You are using proof in a much more casual way.”
Which is exactly why scientists are generally so ineffective at communicating with the public about global warming and climate change.
John Q. Public asks, “Has it been proved that humanity’s emissions of CO2 are causing the Earth to heat up?”
The scientist answers, “Well, it’s really impossible to ‘prove’ anything in science, because, you see, in a strict formal sense of the word ‘proof’, you can only ‘prove’ things in abstract mathematics, so … blah, blah, blah …”
SecularAnimist — The overwhelming weight of the evidence establishes beyond a reasonable doubt that humans have added more than enough carbon dioxide to cause the climate to warm.
Longer than a simple “yes,” but absolutely factual without the slightest hedging. Take your pick.
Comment by David B. Benson — 14 Jun 2012 @ 5:29 PM
MartinJB @138 — That is Karl Popper’s notion of falsifiability as the hallmark of science. Unfortunately it is actually not always applicable, and towards the end of his days Sir Karl came to agree that falsifiability is too strong a notion for all the sciences. I suspect that in parts of biology and the veterinary & medical sciences, for example, that criterion can only rarely be met.
Instead one uses a form of http://en.wikipedia.org/wiki/Bayes_factor
to determine, of two hypotheses H0 and H1 say, whether or not there is enough evidence to distinguish the two (often called models in this context). That is the hard part, in that simply collecting more evidence may be out of the question. Even so, there is a certain subjective aspect to determining whether the weight of the evidence clearly supports one of the two hypotheses. There are various tests such as http://en.wikipedia.org/wiki/Akaike_information_criterion
to assist in the determination. Nonetheless, determining whether treatment B is actually ‘better’ than treatment A remains quite difficult; I’ll just say it is much easier than it was 15 years ago.
Comment by David B. Benson — 14 Jun 2012 @ 5:41 PM
Bart & Jim: you’re talking past each other.
Steve’s (and Lucia’s, and JeffID’s) argument only appears convincing because they omit the other half of the “Mannian” method: the part where the final reconstruction is actually tested against new, unseen data that was not used in the calibration / selection process.
I believe the reason why Lucia and Jeff don’t “register” this bit is due to unfamiliarity with basic machine learning concepts. Mann is basically applying standard machine learning to climate reconstructions (that’s pretty much the defining contribution of his career). People with a “hard” engineering background have usually not been exposed to machine learning. So when they see the “training” bit, they go berserk and ignore the “testing” bit. Or they see it just as a distraction, rather than the entire point of the method. The punchline of the paper isn’t the reconstruction itself, it’s the p-value of the fit on the (unseen) test data. It’s a completely different approach, so people can get confused.
As for Steve McIntyre, well, I suppose he has his reasons.
You’re talking about pure dendrochronology, where the situation is different. There, you can find matches so strong that the probability of getting them by chance vanishes (example), so you don’t really need machine learning methods – though of course you want to have some external validation, especially in the distant past where uniformity is far from guaranteed (independent ENSO reconstructions or volcanic eruptions provide important sanity checks).
The “vanishing p-value” argument does not apply to massive multiproxy reconstructions a la Mann (or Gergis), where the match of each individual proxy to temperature is relatively weak. That’s what Bart is talking about. In this case, simply selecting proxies based on match with historical records *would* be a mistake – if you did *just* that. Which Mann doesn’t (neither did Gergis, apparently).
[Response:I don’t agree with this assessment. I’ve already mentioned that any calibration relationship has to pass validation/verification tests–that idea’s been around a long time; it’s not a “completely different approach” but a fundamental tenet of model testing and evaluation, something you learn as an undergraduate student. And the point of these papers is indeed the reconstruction itself–that’s the whole point. As for the relationship between proxy and climate, that can be strong, weak or non-existent; you can’t make the sweeping statement you did. Whatever fundamental relationship exists between ring response and environment is what will emerge in the final estimate. And please, don’t refer this whole issue back to Mike; that’s the skeptic approach of trying to focus and blame everything on him. The approaches used in this field have long histories and many people and viewpoints are involved–Jim]
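The calibrate-then-verify workflow invoked by both the comment and the response – fit during a training interval, then score on withheld data – can be sketched in a few lines. This is a toy Python illustration of the general idea (the RE “reduction of error” statistic is a standard verification score in dendroclimatology, but the function and the simple half/half split here are my own simplifications, not anyone’s actual reconstruction code):

```python
import numpy as np

def calibrate_and_verify(proxy, temp, split=50):
    """Least-squares fit of proxy -> temperature on the calibration half,
    scored on the held-out verification half with the RE statistic."""
    a, b = np.polyfit(proxy[:split], temp[:split], 1)
    resid = temp[split:] - (a * proxy[split:] + b)
    # RE compares squared errors against simply predicting the
    # calibration-period mean; RE > 0 means skill on unseen data.
    baseline = temp[split:] - temp[:split].mean()
    return 1.0 - np.sum(resid ** 2) / np.sum(baseline ** 2)
```

A proxy selected purely by chance during calibration will, on average, fail this held-out test, which is why the verification step matters to the argument.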
While I totally get your point, it has its problems. If a scientist says “yes, it’s proven” when speaking to the public, someone who wants to cause trouble will say “look, this scientist said it’s proven, but in his publication he has these error bars. Liar!!!!” I’ll take David Benson’s formulation.
Popper’s falsifiability is certainly a powerful notion, and one that works very well where it works. However, where it works is for relatively simple logical systems. A hypothesis can certainly be proven false. However, a theory that has a long track record of successful predictions, but suddenly gets one wrong…not so much. It is much more likely that the theory will be “tweaked,” or that some previously unconsidered influence will be found to be important for that situation. That is why the whole notion of “falsifying” anthropogenic warming or evolution is just silly. The theories will certainly change–I doubt either Arrhenius or Darwin would recognize their respective creations. The probabilistic/Bayesian approach does come a lot closer, and when joined with the information theoretic approach is pretty powerful.
as a former biologist (I did fisheries growth and population modeling), I absolutely agree. The idea of reducing the complexity of a system sufficiently or collecting enough data to even approach falsifying a model is, well, amusing. Almost as amusing as when my current colleagues (I’m in finance now) speak with certainty about the conclusions of economic models….
> something you learn as an undergraduate student.
Since when, youngster? :-)
Agreeing with Jim to watch the ‘founding father’ myth — statistical methods change a lot still, it’s a very new field, and new ideas get passed around and used by one group and adopted by others if they’re useful and if the statisticians consulted agree it may be worth applying new methods to new kinds of data.
I’ve seen some mighty complicated statistical analysis plans.
For climate change, there are no groups, no test and control planets, no pre- and post-treatment comparisons.
But the data are often paleo and point source records for comparisons.
I think y’all could restate the comparisons in really simple terms, here or somewhere, and begin to teach the rest of us some statistics.
Does John Q. Public respond to statements that sound like they were made by Dr. Sheldon Cooper, and do they run to the scientific lit so they can jibber jabber about error bars?
How about something like, we know way beyond a reasonable doubt that human beings are causing the climate to warm too quickly. We know this has the potential to turn very ugly. Etc.
Or “proof enough”…
You ought to turn on your TVs and listen to the language aimed at certain segments of John Q. Public for technical matters: The Doctors, Dr. Oz… If they misestimated their audience by much, they’d probably be off the air already. And yes I know, it’s depressing.
Ray Ladbury @146 — In the strict sense of proof, a hypothesis can only be proven false via deduction from some prior axioms. A hypothesis can be demonstrated to be false in this universe by an appropriate experiment, but that is not a proof sensu stricto.
Could you write a bit more about what you mean by information theoretic methods? These are recent enough that many here on Real Climate would probably appreciate a short description.
Comment by David B. Benson — 14 Jun 2012 @ 9:50 PM
toto #144: actually the Gergis et al. paper already contains an example like the one you show, in Figure 2 on page 47. I would like to see that one duplicated using random synthetic proxies only…
I did some numerical experimenting myself (result and Octave code). It is simply not true that ‘screening’ selects for hockey-stick-like proxies. Yes, it selects for proxies having the same trend as that screened against (artificial trend line in green, 1900–2000), and the effect of this propagates back to about 1850 due to the assumed autocorrelation; but before that the proxies are unaffected. They are exactly the same as without the trend screening, centering on zero. There is no ‘screening bias’ or ‘screening fallacy’.
Note that in my figure the ‘stick’ of the hockey stick points to the middle of the ‘blade’, i.e., it’s more like a bishop’s staff! Not at all like real reconstructions (e.g. Gergis et al. Figure 4 on page 49 – or indeed all serious hockey sticks since the dawn of time, like MBH98/99), where the pre-instrumental temperatures average well below the mean of the instrumental period, which is precisely the point.
Comment by Martin Vermeer — 14 Jun 2012 @ 10:48 PM
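Martin’s experiment is easy to reproduce for readers without Octave. The Python sketch below does the analogous thing (the AR(1) coefficient, series count, and screening threshold are my own illustrative choices, not Martin’s): screened red-noise proxies acquire an upswing inside and shortly before the calibration window, while the pre-calibration “handle” still centres on zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, n_years, phi = 2000, 300, 0.7        # say, years 1700-1999
e = rng.normal(size=(n_series, n_years))
proxies = np.empty_like(e)
proxies[:, 0] = e[:, 0]
for t in range(1, n_years):                    # independent AR(1) red noise
    proxies[:, t] = phi * proxies[:, t - 1] + e[:, t]

trend = np.linspace(0.0, 1.0, 100)             # "instrumental" trend, last 100 yr
r = np.array([np.corrcoef(p[-100:], trend)[0, 1] for p in proxies])
kept = proxies[r > 0.2]                        # screen on calibration correlation

recon = kept.mean(axis=0)
print(recon[:150].mean())    # handle: centres near zero, as Martin found
print(recon[-50:].mean())    # blade: pulled upward by the screening
```

The contrast between the two printed means is the “bishop’s staff” shape Martin describes: selection affects the calibration window (and a short autocorrelation tail before it), not the distant past.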
Comment by Susan Anderson — 14 Jun 2012 @ 11:18 PM
Gavin said above “despite personally feeling bad after making a mistake, fixing errors is a big part of making progress”; well said indeed. But Lotharsson, in linking to a recent attack on me by Tim Lambert at Deltoid, which picked up on Tamino’s pointing out an error in my recent paper in TSWJ (also noted here), did not mention that I (1) admitted the error and then (2) explained why I was in error: because I overlooked that the independent variable in my Table 1 (radiative forcing by the aggregate of all GHGs) was unlikely to be autocorrelated, as for example CH4 and CO2 are not correlated (see Table 2.1 in AR4 WG1). Tamino did not find I repeated my error in the many other regressions I reported in my paper and its SI.
More generally, why is it that the expert econometricians here and at Lambert’s and Tamino’s never themselves undertake and report regressions rejecting the null that increases in GHGs since 1958 do NOT explain temperature anomalies?
And, AFAIK, nobody here has commented on what seems to me the basic problem with the Gergis et al paper, given its sweeping title,
“Evidence of unusual late 20th century warming from an Australasian temperature reconstruction spanning the last millennium”.
That is its total absence of any proxy records for mainland Australia, as shown in its Fig. 1. A more modest title would have been appropriate and would have attracted less attention.
[Response: There are only a few TR records of multi-centennial length on the mainland (Neukom and Gergis, 2012) and, not surprisingly given the climate and topography of Australia, they are apparently not good T proxies. To me, it appears that the authors have a pretty good spatial distribution of proxies across their defined region–Jim]
Further to Susan’s mention of the DotEarth mention of McM, what struck me about Revkin’s piece was his generalization about the improving effects of “blog review” on science, while failing to note that this “blog review” activity is confined almost, if not completely, exclusively to those provinces of investigation that collide with revenue streams and other worldly matters not having to do with science.
As Revkin feels free to make his unsupported claims, I’ll offer that the confusion and retardation of public understanding of science is not worth the apparently sparse benefits of “blog review.”
Scientific progress is pretty much an accidental effect of “blog review,” rather like being “saved” from driving over a cliff by crashing into a telephone pole.
It’s also interesting to note that even as Oransky warningly waves the Dutch finger at Gergis, cautioning about retractions, he uses Wegman as an example without mentioning the silent, post hoc “redo” of one of Wegman’s papers. Revkin doesn’t help his readers with additional information, a spot-on example of what journalists call “phoning it in.”
By information theoretic methods, I mean the methods that allow comparison of models and characterization of the efficiency of models in accommodating new information/data. Hirotugu Akaike’s Information Criterion was the first entry in this regard – for although AIC is looked on by most as a useful tool, it is actually an unbiased estimator of the Kullback-Leibler divergence (which can be viewed as an expectation of the logarithmic divergence between two distributions) between the “real” distribution and the model being examined. Similar quantities approach the problem from a Bayesian viewpoint (e.g. BIC/SIC) or from a more general viewpoint (DIC). All of these quantities are useful for comparing predictive power of statistical models, but what do they mean? As it turns out, you can look at the behaviors of quantities like AIC as measuring not just the goodness of fit of the model, but also how sharply the data define the parameters of the model.
A further step in this direction–which I haven’t seen applied too much yet–is information geometry, in which spaces defined by the parameters of a model have a metric defined by the Fisher Information matrix. This is a very elegant approach, and has some interesting features when you look at different models–e.g. the fact that the exponential distribution is a limiting case of both the Weibull and the Beta distribution means that the parameter spaces of these distributions coincide along a line in their spaces. I’ve used this somewhat in my own research–though I haven’t explicitly used the information geometric point of view as nobody in my field would understand it.
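As a toy illustration of the AIC comparison described above (the quadratic data-generating process and the least-squares form of AIC below are my own textbook-style choices, not from any paper under discussion):

```python
import numpy as np

def aic_ls(y, yhat, k):
    """AIC for a least-squares fit with k parameters, up to an additive
    constant, under the usual Gaussian-error assumption."""
    n = len(y)
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 0.5 * x + 0.3 * x ** 2 + rng.normal(size=200)  # truth: quadratic

aic = {}
for deg in (1, 2, 9):
    coef = np.polyfit(x, y, deg)
    aic[deg] = aic_ls(y, np.polyval(coef, x), deg + 1)
print(aic)   # the underfit line loses badly on fit; the degree-9
             # polynomial typically loses to the quadratic via the 2k penalty
```

This is the trade-off the criterion formalizes: goodness of fit enters through the RSS term, and model complexity through the 2k penalty.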
There is nothing unintentional about the escalating and rather successful effort to derail scientific communication and scientific progress. We are moving from the naming of RealClimate, Mann et al. as charlatans and traitors to actual threats of violence, and I find it disturbing, to put it mildly. Inasmuch as Andy appears to be unable to do as much science as I can in reviewing these claims, and that’s not much, he should acknowledge that he does not know what he does not know, and stop stoking the fires of ignorance and hatred. This is not a high school social and merit is merit. That’s why I persist in a discussion here that is largely above my head (though I still work hard and read almost all of it), in the hope that there is some way to penetrate the fog before it’s way too late.
Comment by Susan Anderson — 15 Jun 2012 @ 10:10 AM
#159–In a way, the disgusting developments you rightly deplore, Susan, are encouraging: the famous dictum Isaac Asimov put in the mouth of Salvor Hardin, that “Violence is the last refuge of the incompetent,” is fairly intuitive. Most of us know that the loudness of one’s shouts is not correlated positively with the soundness of one’s knowledge–and that in fact the need to scream often screams most loudly of intellectual impotence.
One hopes at least that the recent over-reach of the Heartland Institute will prove to be something of a pattern–or a deterrent.
Susan: Inasmuch as Andy appears to be unable to do as much science as I can in reviewing these claims, and that’s not much, he should acknowledge that he does not know what he does not know, and stop stoking the fires of ignorance and hatred.
Or at least ponder on the meaning of the word “promoter.” Revkin’s not the only journalist who seems to fall prey to the role of being a bucket on a conveyor belt raising what is low to a higher level.
Serving as Steve McIntyre’s volunteer editor to clean up his blog review speech habits and pass McIntyre’s opinions upward after being stripped of revealing context is a form of promotion.
In post #133, I asked if you can point out any specific examples of data which had a correlation in the range of p=.05 to .10?
You said: “[Response:Of course…almost every reconstruction study uses a p value in that range. Indeed, point me to one that uses some higher value–Jim]”
But let me rephrase the question, because what I really want to know is whether the data has beat the odds (in support of your “…relationship to local temperature are waaaaaaaaaaaaaaaaaay too strong ….” argument above.)
[Response:My “waaaay too strong” statement addresses strength of relationship, rather than frequency of significant relationship, just to clarify a little, but I’m following.–Jim]
As I understand it, the p value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed. So obtaining one result with p value of .1 out of, say, 10 tree rings (10%) could hardly be considered remarkable. Or obtaining around 100 correlations out of 1000 with p=.1 could not be considered remarkable. I would argue that in those cases, a p=.1 could easily be attributable to random noise.
[Response:Yes, I agree, but with a potentially important caveat depending on exactly what you mean by “random noise”.–Jim]
So my rephrased question is: Can you point to any specific examples where the data show that much more than 10% of the tree-ring samples matched temperatures with a correlation of p=.1 or better?
[Response:Thanks for clarifying, wasn’t clear to me before. Yes, I can point you to such. But have you had a look yourself first?–Jim]
I had hoped that Andy would take as a caveat the fact that Heartland considered him to be sympathetic. It appears that instead he is rushing obliviously on into a status worthy of parody on Comedy Central.
To rise weakly to his defense, I really think Andy desperately wants to believe in the bona fides of all players in the so-called “climate debate”. However, to persist in such a belief in the face of the behavior exhibited by the denialati betrays a naivete so extreme that I just want to reach out and comfort him by inviting him to a p-o-ker game.
David B. Benson wrote: “The overwhelming weight of the evidence establishes beyond a reasonable doubt that humans have added more than enough carbon dioxide to cause the climate to warm.”
I feel like I’m getting pedantic here, but this is an important point about the semantics of effectively communicating the realities of AGW to the general public.
As I understand it, you are arguing that it’s wrong to say that anthropogenic warming has been “proved” — and indeed, wrong to ask whether it’s been “proved” — because, as you wrote earlier, “outside of mathematics (and deductive logic) there are no proofs”.
So it’s interesting that you use the phrase “beyond a reasonable doubt” in suggesting an alternative way of asserting anthropogenic causation — since “beyond a reasonable doubt” is the standard of proof required of the prosecution in a criminal trial.
As it happens, I’ve served on a jury in a criminal trial. When the judge instructed the jury before we went to deliberate, he very clearly and carefully stated that the standard of proof (his word) required for a conviction was “beyond a reasonable doubt” — and explicitly distinguished that from proof “to a mathematical certainty” (his exact words). In effect, he told us “the word ‘proof’ has multiple meanings, and this is the meaning that applies here”.
So, I think your very use of the phrase “beyond a reasonable doubt” serves to point out that there are other “formal” uses of the word “proof” that are different from the use of that word in mathematics and logic, and that in context are entirely legitimate, appropriate, and well-understood.
My point again is this:
When scientists are asked “Has it been proved that humanity’s emissions of CO2 are causing the Earth to heat up?” they have two choices.
They can launch into a lecture about how the one and only “correct” use of the word “proved” is the way it’s used in formal mathematics, and that outside of that context nothing can ever be “proved”. Which is both counterproductive as it leaves the questioner confused and doubtful about the reality of anthropogenic causation, and a semantic fallacy since there are, in fact, other legitimate, formal and well-understood meanings of the word “proof” — in law, for example.
Or, they can recognize that the questioner is using the word “proved” in something much closer to its legal sense than its mathematical sense — that the questioner is a juror and not a mathematician — and simply answer, “Yes, it has been proved beyond a reasonable doubt”.
Comment by SecularAnimist — 15 Jun 2012 @ 12:32 PM
Susan @ 158
As to “…some way to penetrate the fog before it’s way too late” I can only suggest a re-examination of the roles played by deniers and clueless “promoters” (very apt) and how they end up shaping the conversation to either contain or destroy opposition. Distraction, screening, intimidation, bluffing, and perhaps more than anything else time-wasting end up defining what takes place for those trapped in the denier arena – out of all proportion to their actual numbers or scientific acumen.
And always, always, always ask who exactly is your audience? Take it to them.
You’re right, I overstated: a bad, unhelpful habit. Promoting, or just giving a pass and cleaning up, is not stoking fires, though the end effect is the same. The framers of the hate talk make it an almost irresistible force. Since the planet is an immovable object, there are fireworks ahead.
Andy’s desire to find good in people permits advantage taking. I think many indulge in wishful thinking when they assume it’s meant well; people are to be prevented from thinking for themselves.
Looking beyond to the real audience, this seems a point to make as often as possible: people can and must think for themselves.
Comment by David B. Benson — 15 Jun 2012 @ 9:25 PM
SecularAnimist @163 — On the other hand, scientists require unambiguously defined terms (to the greatest possible extent). I realize not all follow the high standards which the mathematicians must maintain, but increasingly that is (uniformly) so. Thus for STEM folk (scientists, technicians, engineers, and mathematicians, but also philosophers), proof has only one meaning: the one used in deductive logic.
I fully realize that it has other meanings in other settings, including jurisprudence and potable alcohol testing. But climatology is a STEM subject, one which, being science, has to make do with less than certainty in matching observations and measurements to calculations from models.
However that may be, I see nothing wrong with an answer of “beyond a reasonable doubt,” which almost all will understand correctly.
Comment by David B. Benson — 15 Jun 2012 @ 9:34 PM
> can and must think for themselves
well, most people “might could” think for themselves,
on some subjects, but not so likely able to think on other subjects;
“if y’can think for yourself y’ought to;
if not, think hard on who you trust.”
In most people’s heads, I think,
it’s a zoo in there.
Going only by my own.
One of the aspects of science that I think is hard for laymen to appreciate is the fact that the truth doesn’t lie in the middle. Even scientific consensus is not so much an “averaging” effect as it is an assessment of which theories/viewpoints are making the most rapid progress. Truth is where the evidence points.
This makes science different from other disciplines–like journalism or theology–that at least purport to seek truth. They try to find balance between extremes. Science tries to minimize entropy.
BTW, one of the clearest statements of the 2nd law of thermo applies here:
If you add a teaspoon of wine to a gallon of sewage, you get sewage. If you add a teaspoon of sewage to a gallon of wine, you get sewage.
Thanks Ray. Time for me to brush up on those laws. I am not your typical layperson, I’m afraid, as it would be nice if most laypeople gave science its proper value. It took a lot of hard work for me to know as little as I do, but understanding how science works is deeply embedded in my life. Dad continues to produce physics, and we wonder if he will be alive by the time it is recognized, or if he is suffering from emeritusitis. He recently attended a meeting about new developments with cold atoms and such and said it was just like his early work, only different (ot, maundering on as usual).
layman; have you checked, e.g., the doctoral thesis of H. Grudd, where he states: “The Torneträsk density record is the longest in the world and with a temperature signal that is exceptionally strong: The correlation to instrumental summer (June – August) temperature is 0.79.”
I’m wondering. Dendrochronology work aimed at temperature reconstruction sometimes picks places where temperature change has a stronger effect on tree ring growth than, say, drought, in order to get a clearer signal.
When I read about the ice borehole results here, I thought: well, you can constrain earlier temperatures to some extent, since NYC would not be built where it is if it had been much warmer than now for very long (say 300 years), owing to the induced sea level rise. But that is not very high temporal resolution. So I wonder if, just as a dendrochronologist might choose a forest edge to gain sensitivity, a glaciologist might use ice on different slopes to set an upper limit on past temperature, by arguing that this or that configuration could not have survived warming similar to the present for three decades, or six, etc. Combined with the age of the ice, some fairly robust limits might perhaps be worked out for a region.
Thanks for the link. The coldest summer in the past 1500 years was 1904, which appears to match the glacial extent of the LIA.
[Response: Not according to anyone else’s definition of the term. – gavin]
The warmest summers were centered around 1000, which showed both higher and more prolonged warmth than the recent century.
Of particular note is the increased sensitivity of the growth to temperature in the past two centuries. While no explanation could be given for this change, could it be the combined effect of increasing temperatures and atmospheric CO2 concentrations?
[Response: So is your contention that any cold period at any point on the Earth at any time in the past 500 years is the Little Ice Age? Regardless of how coherent that is with anything else? Good to know. Generally speaking, things are defined so that they can be discussed as distinct entities – people who then subsequently extend them to everything are kind of missing the point. – gavin]
I’m guessing Gavin means “summer” — redefined in this thesis for this location explicitly.
This is not the summer you were thinking of.
Dan H. uses several terms in his sentence, besides the one unattributed quotation he takes out of context without quotes. The rest is his usual.
The thesis (by a student of Briffa’s) describes four statistical approaches used to reconstruct temperature for a particular location: “… an area of about 100×100 km … mountains of alpine type …. The climate is continental subarctic ….”
It seems this author had to redefine “summer” to get the thesis result.
That’s reasonable, I suppose, to document a small local area with unique conditions. Perhaps it’s why this didn’t get published, as it’s not directly comparable to other work?
Oh, heck, I meant to give up on Dan H.
I’m swearing off.
I think you added another zero: the span from the mid-19th century to 1900 is 50 years, not 500. An event like the LIA, which lasted several centuries, could very easily have local ending times which vary by 50 years (not 500). Actually reading the report would shed some light on these definitions, instead of guessing.
The fuss made me look for a definition or explanation of “screening fallacy”. There were three useful things on the first six Google pages; then the topic ran out. One piece by a stats guy explained things to me.
The interesting thing is that the “fallacy” only exists at CA. Other climate parrot sites repeat McIntyre’s stuff (incoherent to me), and the Blackboard has a couple of attempts.
My conclusion is that the “fallacy” doesn’t exist. It’s just made up by McI. He seems to be saying that if you check data for relevance you’re cheating, but if you don’t check data you’re a fraud.
[Response: Ha! You post a link to the Nesje et al paper that says “The maximum ‘Little Ice Age’ glacial extent in different parts of southern Norway varied considerably, from the early 18th century to the late 19th century.”, with a comment in which you misleadingly claimed that they actually said the LIA extended ‘after 1900’. Now instead of defending your previous post, you have googled around to find something else entirely. Your first link doesn’t mention the ‘Little Ice Age’ at all, while the second manages to use the phrase 22 times without ever defining it. Their conclusion is clear though: “Maximum glaciation occurred during the ’Little Ice Age’ starting with a pronounced glacial advance in the thirteenth or fourteenth centuries, and culminating at the most distal moraines during the nineteenth century.” So, yet again we have you supposedly citing a conclusion or statement from a paper that on inspection doesn’t actually support your statement, and when challenged on this, you produce links to other papers, none of which support the original quote either. While this might be tremendous fun for you, it is mildly irritating to everyone else, so let me propose a new rule – just for you. If you want to post here, any actual scientific claims need to be backed by a real citation, and the claim has to be actually backed by the citation you give. If you make claims with no reference, they will get binned. If you make a claim that is not supported by your reference, it will get binned. So the way not to get binned is to make scientific claims that can be supported by the literature you cite. Should be easy, no? – gavin]
> Actually reading the report would shed some light on these definitions, instead of guessing.
Oh the irony! Thank you Gavin for the service you (and other RC contributors) provide those actually wishing to learn and discuss the science in an honest fashion. I lack your ability to stay focused in the face of this kind of nonsense. I have learned a lot in the past 2 years here — not just about climate science, but about the relationship of science and society as well.
Comment by Unsettled Scientist — 19 Jun 2012 @ 5:02 PM
[In no description does after 1900 fall within the Little Ice Age. – gavin]
That doesn’t make it incorrect. Glaciers in much of the Arctic (on Baffin Island, western Greenland, and Bylot Island) had their LIA maximum dates between 1880 and 1910… If you need references I can provide them; my work is reconstructing former glacier advances during the LIA.
One would think that this late advance probably has something to do with the significant impact of Krakatau and Novarupta on temperatures along the North Atlantic basin during this time period.
[Response: I’m not saying that there were not glacier advances in some places in the early 1900s, rather that it makes no sense to associate them with the ‘Little Ice Age’. – gavin]
[Response:I agree with Gavin here, but in point of fact it is quite common to refer — in published literature — to glacier moraines reflecting advances in the late 1800s and early 20th century as “LIA moraines”. It is equally common to refer to moraines in the 1600s as “LIA moraines.” Hence the total confusion on this topic. It’s time we did an LIA post, methinks.–eric]
Gavin, I’m curious as to what your thoughts are regarding this quote from the Gergis paper: “Although the Mount Read record from Tasmania extends as long as 3602 years, in this study we only examine data spanning the last 1000 years which contains the better replicated sections of the Silver Pine chronology from New Zealand (Cook et al., 2002b; Cook et al., 2006) and is the key period for which model simulations have been run for comparison with palaeoclimate reconstructions (e.g. Schmidt et al., 2012).”
Can you explain why the longer record was not ‘examined’? The original study was supposed to cover 2000 years, but apparently somewhere it was chopped in half. How/why did that happen? And when did your ‘model simulation’ for comparison with this palaeoclimate reconstruction begin? Before or after they decided to examine only the past 1000 years rather than the past 2000? And does their reanalysis of the data have any effect on your model simulations?
[Response: The specifications for the PMIP last millennium runs were worked out in ~2010 and were a compromise that took into account the existence of suitable forcing time histories (solar, volcanic, land use etc), length of simulations, scientific interest etc. We decided to start the runs in 850AD so that we had multiple estimates for the solar forcings and have enough time before the high medieval period to see any structure around then. See Schmidt et al (2011) for more details. The decisions had nothing to do with the existence or not of a few very long annually resolved paleo records. Since Gergis et al are looking at a multiproxy reconstruction, they need multiple proxies, and even at 1000AD they only have two, so extending back beyond that adds little or nothing. People will hopefully use the simulations in all sorts of analyses – and comparisons with regional reconstructions are likely to be part of that. But there is nothing specific about any one reconstruction that influenced the simulations (many of which have been completed and are downloadable from the CMIP5 database). – gavin]
Eric #187, perhaps the more generic subject of easily misunderstood jargon. My favourite nit is the use of “orbital” and “sub-orbital” by climatologists for time periods corresponding to, or shorter than, Milankovitch periods. In fact, changes in the Earth’s orbit are only one contributing factor here; the orientation of the Earth’s axis of rotation is actually more important.
Lost in all the discussion about the ending of the LIA is the question of why tree rings have been more sensitive to temperature recently. Does anyone have any thoughts other than higher CO2 concentrations?
I doubt there is a need to say this, but I’m a skeptic, or denier, or whatever you want to call me. Ignorant, stupid, whatever.
Am I correct in understanding that the effects of CO2 are so well understood that many, some, or most hold that without change, massive warming will occur, changing the planetary climate system, etc.?
Given that the climate system is so well understood, shouldn’t the cooling agents, such as sulfur compounds, also be understood to a certainty? And if they are, why aren’t those who think CO2 is such a danger offering up the cheap alternative of injecting sulfur compounds into the atmosphere?
In other words, help me understand: if CO2 is going to cause these problems, and yet curtailing CO2 emissions would be so incredibly difficult, and the climate system is so well understood, why are there no advocates of sulfur compounds?
[Response:There ARE advocates for sulfur compounds. Bizarrely, many of them are “skeptics”, which contradicts the logic you very reasonably lay out. Don’t ask me why — ask Bjorn Lomborg.
There are at least two fundamental differences between CO2 and sulfur in this context. First, CO2 (being a gas) is well mixed through the atmosphere, and so we can know to very high accuracy how much there will be at any location as long as we know how much there is on average. Second, CO2 is CO2 is CO2 — its radiative properties in the lab are the same as anywhere else. The “sulfur” you refer to is really sulfate, which is a solid, and a hygroscopic solid besides. SO2 gas is emitted to the atmosphere but quickly becomes oxidized to sulfate (SO4(2-)) before it has a chance to become well mixed. So it’s not uniformly distributed, which means its global radiative properties can’t be known (for example, lots of sulfate over Antarctica during winter, when there is no sunlight, will have no effect). And sulfate in the lab isn’t sulfate in the atmosphere. Sulfate particles can be various sizes, and they can have various compositions (since it isn’t just sulfate in the aerosol – it’s sulfate and water and organic nitrogen compounds and …etc.). So its radiative properties cannot be known very precisely in the real atmosphere just from lab measurements.
Now, some experts on aerosol radiative properties will tell you we know all this well enough to start pumping SO2 into the atmosphere. Note that this can’t just be done once – it has to be done continuously because the darned stuff keeps glomming onto water and falling out as (acid) rain.
Most (all?) of us at RC think this is a silly idea, despite the fact that, yes, the CO2 part is very, very certain.
I strongly encourage you to read Alan Robock’s excellent RC article on this, here. –eric]
Gavin, thanks so much for your reply. Now I have some silly questions: Do authors contact other authors before referencing them in their papers? Does that need approval, or is it just done? In this case it just seems so out of the blue…
[Response: No, it’s like links on the Internet- once something is published, any one can reference it. Sometimes people send out preprints, but that is optional and varies widely. – gavin]