Extreme Madness: A critique of Pall et al (Part 1: General comments on the paper and discussion of use of statistics)

Do as I say, not as I do. Refrain from seeking out papers in scientific journals, because they inevitably create more questions than answers. Jobs for the boys, I suppose.

I first read about Pall et al last Thursday when a headline on guardian.co.uk caught my eye: "Climate change doubled likelihood of devastating UK floods of 2000". What could that possibly mean? The point is we know the floods occurred.

Are we saying there would have been a 50% chance of them happening if global warming hadn't occurred? That would at least make sense, but seems to me extremely unlikely, since autumn 2000 was apparently the wettest since records began in 1766. The chances of an entirely different set of weather events in a parallel universe coming together to produce something as extreme are clearly much less than one in two.

I was just mulling over this when a Realclimate post notification popped into my Inbox. Nature, which this week splashed on rain (ho, ho), had of course caught the eye of Gavin Schmidt, who reported on Pall et al and another paper in the same issue. I immediately dived in where professional scientists with people to upset fear to tread and voiced some of my concerns. Gavin responded (I stand by the points I made that he disagrees with, btw) and the debate went on: a Mathieu chipped in, violently agreeing with me, as I pointed out, and I similarly responded to some remarks by a Thomas.

At this point I started to get serious about the issue. The rest of this post is a more systematic critique of Pall et al.

What a way to conduct a debate

It is absurd that we are attempting to formulate policy on the basis of information that is not in the public domain, particularly since a weekly scientific news cycle has developed as the main journals try to grab headlines. As well as the main Guardian article, George Monbiot also commented soberly on Pall et al, remarking that:

“[Pall et al] gives us a clear warning that more global heating is likely to cause more floods here.”

though when he says:

“They found that, in nine out of 10 cases, man-made greenhouse gases increased the risks of flooding…”

he (or the dreaded sub-editor) has in fact lost the sense of Pall et al’s Abstract, which went on to say:

“…by more than 20%”.

so George’s “nine out of 10” is in fact an understatement.

The science news cycle process does rather allow a bit of spin. I hate to say it, but the main Guardian piece does have the feel of having been planned in advance – hey, journo, here’s three quotes for the price of one. As well as Myles Allen (the leader of the Pall et al team, and one of the paper’s authors), a Richard Lord QC is also quoted. It’s not immediately obvious, but Lord appears to be a long-time collaborator of Allen in what has to be described as a political project to use the legal system to tackle the global warming problem. I’m not at all sure about the “blame game” in general. It seems if anything to put obstacles in the way of reaching international agreement on emissions cuts.

It wasn’t until Friday afternoon that I was able to read the whole of Pall et al, rather than just the Abstract (thanks Ealing Central Library). Nature is a good journal, but I don’t think they paid for the work that went into Pall et al. In fact the climate modelling was actually executed by volunteers at climateprediction.net. This is an exciting initiative, but, as someone who once participated (I was pleased my model showed an extreme result of something like 11C 21st century warming!), it would be much better – and I’d be much more likely to take the trouble to participate again – if the results were presented in an open manner, rather than held back (it seems) for scientific papers that appear a year after they’re submitted, so well after the experiment. Much more could be done to at least explain the findings of all the experiments to date on the site.

Anyway, here I finally am with a cup of tea, a hot-cross bun and my dissecting kit, so let’s proceed…

The Pall et al method

It turns out that what Pall et al did was initialise the state of climate models to April 2000. They ran one set of 2268 simulations (their A2000) with the actual conditions, and four other sets (of 2158, 2159, 2170 and 2070 simulations), each based on one of 4 counterfactuals (each with 10 "equally probable" variants, so 40 scenarios in all) with global warming stripped out.

They fed the climate model output into a flood (run-off) model to determine run-off, and considered the floods to have been predicted if the average daily run-off was equal to or greater than the 0.41mm recorded in autumn 2000.

The result was a set of graphs showing the results with and without global warming. Basically these consist of a bunch of results from the global warming case and each of the 4 models, shown as cumulative frequency distributions, such that 100% of the global warming case (a line of dots on the log scale they use) result in run-off above 0.3mm/day, 13% (about 1 in 7.5) above 0.4mm, maybe (the graphs are quite small) 12% (about 1 in 8.5) above the actual flood level of 0.41mm/day, and so on, with around 1.2% (about 1 in 80) as high as 0.55mm a day (which presumably is a Biblical level). Actually, I've just realised that the graphs (Pall et al's Fig 3) are printed with the horizontal logarithmic axis marked with the same subdivisions for both occurrence frequency (as I carelessly read it before my final Realclimate post) and its inverse, return time (which is actually the log scale) – you'd think a peer reviewer or someone at Nature would have spotted that in the 10 and a half months between submission and publication.
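To be clear about what those cumulative curves show, here's a rough Python sketch of the convention – entirely my own illustration, with invented run-off numbers, not the authors' code. The occurrence (exceedance) frequency at a threshold is just the fraction of simulations at or above it, and the return time is its inverse:

import numpy as np

# Hypothetical ensemble of modelled mean daily run-offs (mm/day) -- invented
# numbers, purely to illustrate how the cumulative curves in Fig 3 are read.
runoff = np.random.default_rng(0).gamma(shape=20.0, scale=0.018, size=2268)

for threshold in (0.30, 0.40, 0.41, 0.55):
    p_exceed = (runoff >= threshold).mean()                            # occurrence frequency
    return_time = float("inf") if p_exceed == 0 else 1.0 / p_exceed    # its inverse
    print(f"{threshold:.2f} mm/day: {p_exceed:.1%} of runs, return time ~{return_time:.0f} model-years")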

The other cases (the A2000Ns in Pall et al’s terminology) are each 10 similar lines of dots, so appropriately enough they appear as a spray, running below the A2000 line, except in 2 cases which manage to nip above the A2000 line.

Call me naive, but I think this shows that in >95% of cases (that is, except for two out of 40, part of the time) the 2000 floods were worse than they would have been without global warming. That is, according to the modelling, the exercise has shown, statistically significantly, that the flooding was worse as a result of global warming. All we need to do is assume the same model errors affected all the scenarios approximately equally. This seems an intelligent conclusion.

But that's not what the authors do. They randomly select from the A2000s and each of the 4 sets of A2000Ns to produce graphs of the probability distribution of the increase in risk of the run-off exceeding the threshold of 0.41mm/day (the actual level). They also produce a combined graph, and this is where the aforementioned increased risk of greater than 20% in 9 out of 10 cases comes from, as well as an increased risk of 90% in 2 out of 3 cases and the Guardian headline of approximately double the risk at the median.

The point is that Pall et al don’t want to just say “flooding will be more severe”, they want to be able to calculate the fraction of attributable risk (FAR) for anthropogenic global warming (AGW) for the particular event. Why? So they can take people to court, that’s why.
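For what it's worth, the FAR itself is a one-liner once you have the two exceedance probabilities. A minimal sketch (my own, using the sort of illustrative values I read off the graphs further down, not figures quoted in the paper):

def fraction_attributable_risk(p_with_agw, p_without_agw):
    # FAR = 1 - P0/P1: the fraction of the event's risk attributed to the forcing
    return 1.0 - p_without_agw / p_with_agw

# e.g. exceedance of ~12% with warming and ~5% without gives FAR ~0.58,
# i.e. a risk ratio of 2.4 (these are my eyeball readings, see below)
print(fraction_attributable_risk(0.12, 0.05))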

As I noted in my final Realclimate post on the topic, it seems to me that Pall et al are trying to push things just a little too far.

About this 0.41mm threshold

This wasn’t where I intended to start, but it seems logical. Why define the flood event in this way? Why not say anything over say 0.4mm/day would count as a flood? Floods aren’t threshold types of things anyway.

Further, why are we including runs with very high runoffs? These types of models are known to sometimes “go wild”. Surely we’re interested in forecasting the actual flood event, not some other extreme.

One effect of choosing the 0.41mm threshold is that it makes the flood reasonably rare. But as I argued repeatedly on Realclimate, the flood definitely happened; one reason it's rare in the modelling experiment is that the model (and/or the initial data it was supplied with) is not good enough to forecast it in more than about 1 in 8.5 cases, or about 12% of the time. We'll have to come back to this.

Now here's another pet hate. The fact that the flood is rare in both the A2000 and A2000N model runs means that the result can be (and is) expressed as a % increase in risk, even if George Monbiot (or his sub-editor) managed to leave this out. If the occurrence in both sets of data had been higher then these percentages would have been considerably lower.

For example, Fig 3b (using GFDLR30 data in purple, for those with access to the paper) is the easiest to read as the A2000 series is much better than the purple set of A2000Ns at predicting the flood. For the “best” (probably warmest) of the purple A2000N series, I can therefore read off intersection data together with that for the A2000 series. For 0.41mm/day A2000 predicts the flood about 12% of the time (1 run in every 8.5) whilst the A2000N predicts it 5% of the time (one year in 20). We’d conclude on the basis of this data that the increased risk of the flood because of AGW is around 140% (i.e. 12/5 = 2.4 times what it was before).

But for 0.35mm I get 50% (1 in 2) and 33% (1 in 3) respectively, so the flood risk is only about 50% greater!

As a check, if I go even higher to 0.46mm I get 5% (1 in 20) and about 1.5% (around 1 in 70), so the flood risk is 233% greater.
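Pulling those three eyeball readings together, the sensitivity of the "increased risk" to the chosen threshold is easy to tabulate. A quick sketch (the probabilities are my readings from Fig 3b, not values quoted by Pall et al):

# (threshold in mm/day, P(exceed) with AGW, P(exceed) without AGW) -- my eyeball readings
readings = [(0.35, 0.50, 0.33), (0.41, 0.12, 0.05), (0.46, 0.05, 0.015)]
for threshold, p_agw, p_no_agw in readings:
    ratio = p_agw / p_no_agw
    print(f"{threshold:.2f} mm/day: risk ratio {ratio:.1f}x, i.e. ~{(ratio - 1) * 100:.0f}% greater")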

It’s well known, as discussed in the other paper in this week’s Nature, Min et al, that climate models tend to underestimate extreme precipitation events, so choosing a lower runoff threshold for the flood might have made some sense. On the other hand, exceptionally extreme events become much more likely with AGW.

I can’t find any calibration between the models used by Pall et al and actual rainfall (e.g. by trying to simulate other years) – maybe they’re just not very good at forecasting rainfall in flood years or maybe they forecast the same rainfall every year, regardless of the initial condition in April.

Criticism 1: The paper should have included the real-world distribution of run-offs with which the modelling is supposedly correlated.

Criticism 2: The paper should have included validation of the model against actual run-offs over a number of years. Some model runs should have been initialised to the conditions in April 1999, 2001 etc.

If I’d been editor of Nature (and I never will be if this upsets the wrong people – the sacrifices I make for truth), I might have asked for such a calibration or at least a sensitivity analysis between the “increased risks” and the flood threshold value chosen.

Criticism 3: The results should have been presented as a graph of increased risk of floods of different severity (and therefore different return times).

About this computing time

As I mentioned earlier, Pall et al ran over 10,000 simulations of the autumn 2000 weather. Yet whilst their mean case is that the floods in the AGW case were about 2.5 times as likely as without AGW, they are only 90% confident that the floods were 20% more likely to occur.

Huh?

If I do an opinion poll – as I happen to have done – I can tell you to within a small % how the nation will vote.

So I stared at Pall et al's method, and the more I think about it the more bizarre it seems. They've only gone and sampled the samples! In their Fig. 4 they've presented a Monte Carlo distribution of samples of pairs from each set of simulations, plotting the probability in each case of the floods being worse because of AGW. They don't give the sample size – 43, say – of each of these Monte Carlo samples, but unless I've gone completely mad, these plots are sensitive to the sample size. That is, if they'd taken a sample size of, say, 87 random pairs of simulations, the certainty (that the floods are 2.5 times as likely to occur in the AGW case) would have been greater (probably by a factor of the square root of 2, but that's just an educated guess). This is basically an example of how what we used to call "technowank" in the IT trade can go badly wrong.
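Here's a toy demonstration of the sample-size point – again entirely my own sketch, assuming underlying exceedance rates of roughly 12% (AGW) and 4.8% (no AGW) read off the graphs. Resample repeatedly at different sample sizes and watch the spread of the estimated risk ratio shrink as the sample grows:

import numpy as np

rng = np.random.default_rng(1)
P_AGW, P_NO_AGW = 0.12, 0.048   # assumed exceedance rates, my readings off the graphs

def risk_ratio_spread(sample_size, n_trials=10_000):
    # standard deviation of the estimated risk ratio over repeated Monte Carlo draws
    ratios = []
    for _ in range(n_trials):
        k_agw = rng.binomial(sample_size, P_AGW)
        k_no = rng.binomial(sample_size, P_NO_AGW)
        if k_no > 0:                 # small samples often see no non-AGW exceedances at all
            ratios.append(k_agw / k_no)
    return float(np.std(ratios))

for n in (43, 87, 2268):
    print(f"sample size {n}: spread of risk-ratio estimate ~{risk_ratio_spread(n):.2f}")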

If I'm right – and I think I am – Pall et al have not only presented the wrong headline finding (the world should have been informed that the floods, according to their modelling exercise, were 2.5x as likely because of AGW, +/- not very much), they've also thrown away the advantage of using so much computer time – I read somewhere that those 10,000+ simulations would have cost £120m if run commercially rather than as volunteers' screensavers!

They say it's better to understand how to do something simple than to misunderstand something complex. Well, they don't actually, I just made that up. Anyway, here's some schoolboy stats Pall et al could have employed:

From their graphs, about 12% of the AGW simulations were greater than their 0.41mm threshold for the flood. With a sample size of 2268, the standard deviation of this estimate of the whole (infinite) population of simulations – what my textbook calls the STandard Error of Percentages (STEP) – is given by:

SQRT((12*(100-12))/2268) = 0.68%

That is, it’s likely (within 1 SD) that the actual risk of flooding in the AGW case (according to our model) is 12+/-0.68%.

Similarly for the counterfactual ensemble (all 40 sets combined, 8557 simulations), it's likely – based on inspection of their Fig. 4, which suggests the number of AGW simulations exceeding the 0.41mm threshold is about 2.5x the number of non-AGW ones doing so – that the flood risk without AGW is 4.8% +/-:

SQRT((4.8*(100-4.8))/8557) = 0.23%

There's probably some clever stato way of combining these estimates, but all I'm going to do is crudely compare the top estimate of each with the bottom estimate of the other – that gives us roughly 2 standard deviations. On this basis, according to our modelling, the likelihood of the floods occurring has, because of AGW, very likely increased by a factor of between 11.32/5.03 = 2.2 and 12.68/4.57 = 2.8, with a best estimate of 2.5 times.
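In code, the whole back-of-envelope calculation above is just a few lines (assuming, as above, exceedance rates of roughly 12% and 4.8% and ensemble sizes of 2268 and 8557):

import math

def step(p_percent, n):
    # STandard Error of Percentages: sd of a sample percentage for sample size n
    return math.sqrt(p_percent * (100.0 - p_percent) / n)

p_agw, n_agw = 12.0, 2268      # exceedance % and ensemble size with AGW
p_no, n_no = 4.8, 8557         # exceedance % and combined counterfactual ensemble size
se_agw, se_no = step(p_agw, n_agw), step(p_no, n_no)       # ~0.68% and ~0.23%
print(f"AGW: {p_agw} +/- {se_agw:.2f}%; no AGW: {p_no} +/- {se_no:.2f}%")
# crude ~2 sd bounds: top of one estimate over bottom of the other, and vice versa
print(f"risk ratio roughly between {(p_agw - se_agw) / (p_no + se_no):.1f} and "
      f"{(p_agw + se_agw) / (p_no - se_no):.1f}, best estimate {p_agw / p_no:.1f}")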

This is an important conclusion because the problem with global warming is not just or even mainly the increase in averages, in this case of precipitation. That may not be noticeable.

I think I’ll stop here and consider in another post my more philosophical arguments as to how the methodology of the Pall et al study is dubious.

In the meantime:

Criticism 4: Pall et al’s statistical approach understated the certainty of their modelling result. In fact the study provides some evidence that:

Even the limited warming over the 20th century is very likely, according to a comparative modelling exercise, to have made flooding of the severity of that in 2000 between 2.2 and 2.8 times as likely as in 1900. Historical records suggest the 2000 floods were around a once in 400 year event before global warming, but as a result of the warming up to 2000 they are, according to this modelling exercise, a once in 140 to 180 year event.
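The return-time arithmetic there is easily checked: take my rough once-in-400-years historical figure and divide by the risk multipliers from the STEP calculation above.

historical_return_time = 400.0      # years: my rough pre-warming figure for the 2000 floods
for factor in (2.2, 2.5, 2.8):      # risk multipliers from the STEP calculation above
    print(f"{factor}x as likely -> roughly once in {historical_return_time / factor:.0f} years")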

Criticism 5: The study should have run ensembles with the increases in temperature expected by (say) 2030 and 2050.

(to be continued)

22/2/11, 11:13: Correction of typo and minor mods for clarity.
22/2/11, 16:07: Corrected another typo and clarified the meaning of the STEP calculations.