Extreme Madness: A Critique of Pall et al (Part 2: On Risk and Uncertainty)

Keeping my promises? Whatever next! I said on Sunday that I had more to say on Pall et al, and, for once, I haven’t lost interest. Good job, really – after all, Pall et al does relate directly to the E3 project on Rapid Decarbonisation.

My difficulties centre around the way Pall et al handle the concepts of risk and uncertainty. I’m going to have to start at the beginning, since I doubt Pall et al is fundamentally different in many respects from other pieces of research. They’re no doubt at least trying to follow standard practice, so I need to start by considering the thinking underlying that. I feel like the Prime alien in Peter Hamilton’s Commonwealth Saga (highly recommended) trying to work out how humans think from snippets of information!

Though I should add that Pall et al does have the added spice of trying to determine the risk of an event that has already occurred. That’s one aspect that really does my head in.

Let’s first recap the purpose of the exercise. The idea is to try to determine the fraction of the risk of the 2000 floods in the UK that is attributable to anthropogenic global warming (AGW), that is, the FAR. This is principally of use in court cases and for propaganda purposes, though it may also be useful to policy-makers as it implies the risk of flooding going forward, relative to past experience.

Now, call me naive, but it seems to me that, in order to determine the damages to award against Exxon or the UK, those crazy, hippy judges are going to want a single number:
– What, Mr Pall et al, is your best estimate of the increased risk of the 2000 autumn floods due to this AGW business?
– Um, we’re 90% certain that the risk was at least 20% greater and 66% certain that the risk was 90% greater…
– I’m sorry, Mr Pall et al, may we have a yes or no answer please.
– Um…
– I mean a single number.
– Sorry, your honour, um… {shuffles papers} here it is! Our best estimate is that the 2000 floods were 150% more likely because of global warming, that is, 2 and a half times as likely, that is, the AGW FAR was 60%.
– Thank you.
– Yes?
– How certain is Mr um {consults notes} Pall et al of that estimate.
– Mr Pall et al?
– Let’s see… here it is… yes, we spent £120 million running our climate model more than 10,000 times, so our best estimate is tightly constrained. We have calculated that 95% of such suites of simulations would give the result that the floods were between 2.2 and 2.8 times more likely because of global warming [see previous post for this calculation].

But Pall et al don’t provide this number at all! This is what Nature’s own news report says:

“The [Pall et al] study links climate change to a specific event: damaging floods in 2000 in England and Wales. By running thousands of high-resolution seasonal forecast simulations with or without the effect of greenhouse gases, Myles Allen of the University of Oxford, UK, and his colleagues found that anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods… The rise in extreme precipitation in some Northern Hemisphere areas has been recognized for more than a decade, but this is the first time that the anthropogenic contribution has been nailed down… The findings mean that Northern Hemisphere countries need to prepare for more of these events in the future. ‘What has been considered a 1-in-100-years event in a stationary climate may actually occur twice as often in the future,’ says Allen.” [my stress]

When Nature writes that “anthropogenic climate change may have almost doubled the risk of the extremely wet weather that caused the floods” [my stress] what they are actually referring to is the “66% certain that the risk was 90% greater”, mentioned by Pall et al in court (and as “two out of three cases” in the Abstract of Pall et al even though the legend of Fig 4 in the text clearly states that we’re talking about the 66th percentile, i.e. 66, not 66.66666… but I’m beginning to think we’ll be here all day if we play spot the inaccuracy – the legend in their Fig 2 should read mm per day not mm^2, that would get you docked a mark in your GCSE exam).

We could have a long discussion now about the semantics and usage in science of the words “may” and “almost” as in the translation of “66% certain that the risk was 90% greater” into “may have almost doubled”, but let’s move on. The point is that in the best scientific traditions a monster has been created, in this case a chimera of risk and uncertainty that the rest of the human race is bound to attack impulsively with pitch-forks.

So how did we get to this point?

Risk vs uncertainty

It’s critical to understand what is meant by these two terms in early 21st century scientific literature.

Risk is something quantifiable. For example, the risk that an opponent may have been dealt a pair of aces in a game of poker is perfectly quantifiable.

First, why, then, do poker players of equal competence sometimes win and sometimes not? Surely the best players should win all the time, because after all, all they’re doing is placing bets on the probability of their opponent holding certain cards. One reason is statistical uncertainty. There’s always a chance in a poker session that one player will be dealt better cards than another. Such uncertainty can be quantified statistically.

But there’s more to poker than this. Calculating probabilities is the easy part. The best poker players can all do this. So the second question is why, then, are some strong poker players better than others? And why do the strongest human players still beat the best computer programs – which can calculate the odds perfectly – in multi-player games? The answer is that there’s even more uncertainty, because you don’t know what the opponent is going to do when he has or does not have two aces. Some deduction of the opponent’s actions is possible, but this requires understanding the opponent’s reasoning. Sometimes he may simply be bluffing. Either way, to be a really good poker player you have to get inside your opponent’s head. The best poker players are able to assess this kind of uncertainty: the uncertainty as to how far the statistical rules apply in any particular case, the uncertainty as to basic assumptions.

Expressing risk and uncertainty as PDFs

PDF in this case doesn’t stand for Portable Document Format, but for Probability Density (or Distribution) Function.

The PDF represents the probability (y-axis) of the risk (x-axis) of an event, that is, the y-axis is a measure of uncertainty. Pall et al’s Fig 4 is an example of a PDF. It’s where their statement in court that they were 90% sure that the risk of flooding was greater than 20% higher because of AGW (and so on) came from.

The immediate issue is that risk is a probability function. Our best estimate of the increase in risk because of AGW is 150% (a FAR of 60%), so we’re already uncertain whether the 2000 floods were caused by global warming (the probability is 60% or 3/5). So we have a probability function of a probability function. The only difference between these probability functions is that the one is deemed to be calculable, the other not. Though it has in fact been calculated! Furthermore, as we’ll see, some aspects of the uncertainty in the risk can be reduced, and other aspects cannot – the PDF includes both statistical uncertainty and genuine “we don’t know what we know” uncertainty (and I’m not even discussing “unknown unknowns” here, both types of uncertainty are unknown knowns).
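The conversion between "x% more likely", "so many times as likely" and the FAR is easy to get muddled, so here is a minimal sketch of the arithmetic. It assumes the usual definition of the FAR as 1 minus the ratio of the risk without AGW to the risk with it; the function names are mine, not Pall et al’s.

```python
def far_from_risk_ratio(risk_ratio: float) -> float:
    """FAR for an event that is `risk_ratio` times as likely with AGW as without."""
    return 1.0 - 1.0 / risk_ratio


def risk_ratio_from_far(far: float) -> float:
    """Inverse conversion: how many times as likely the event is, given the FAR."""
    return 1.0 / (1.0 - far)


# The best estimate discussed in this post: floods 150% more likely.
percent_increase = 150.0
risk_ratio = 1.0 + percent_increase / 100.0  # 2.5 times as likely
far = far_from_risk_ratio(risk_ratio)        # 0.6, i.e. 3/5

print(f"{percent_increase:.0f}% more likely = {risk_ratio:.1f}x as likely = FAR {far:.2f}")

# The two headline thresholds, expressed as FARs for comparison.
for increase in (20.0, 90.0):
    print(f"{increase:.0f}% greater risk -> FAR {far_from_risk_ratio(1 + increase / 100):.3f}")
```

Note how compressed the FAR scale is: a 20% increase in risk and a 150% increase both land between 0 and 1, which is part of why the headline statements are so hard to compare by eye.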

Risk and uncertainty in Pall et al

What Pall et al have done is assume their model is able to assess risks correctly. Everything else, it seems, is treated as uncertainty.

Their A2000 series is straightforward enough. They set sea surface temperatures (SSTs) and the sea-ice state to those observed in April 2000 and roll the model (with minor perturbations to ensure the runs aren’t all identical).

For the A2000N series they use the same conditions, but set GHG concentrations to 1900 levels, subtract observed 20th century warming from SSTs and project sea-ice conditions accordingly. There’s one hint of trouble, though: they note that the SSTs are set “accounting for uncertainty”. I’m not clear what this means, but it doesn’t seem to be separated out in the results in the way that, as will be seen, other sources of uncertainty are.

They then add on the warming over the 20th century that would have occurred without AGW, i.e. with natural forcings only, according to 4 different models, giving 4 different patterns of warming in terms of SSTs etc. As will be seen, for each of these 4 different patterns they used 10 different “equiprobable” temperature increase amplitudes.

First cause of uncertainty: 4 different models of natural 20th century warming

As Pall et al derive the possible 20th century natural warming using 4 different models giving 4 different patterns of natural warming, there are 4 different sets of results, giving 4 separate PDFs of the AGW FAR of flooding in 2000. Now, listen carefully. They don’t know which of these models gives the correct result, so – quite reasonably – they are uncertain. Their professional judgement is to weight them all equally, so that means that so far, they’ll only be able to say at best something like: we’re 25% certain the FAR is only x; 25% certain it’s y; 25% certain it’s z; and, crikey, there’s a 25% possibility it could be as much as w!

Trouble is, they can only run 2,000 or so of each of 4 non AGW simulations. So for each of the 4 there’ll be a sampling error. They treat this statistical uncertainty in exactly the same way as what we might call their professional judgement uncertainty, which certainly gives me pause for thought. So what happens is they smear the 4 estimates x, y, z and w and combine them into one “aggregate histogram” (see their Fig 4). That’s how they’re able to say we’re 90% certain the FAR is >20% and so on.

Nevertheless, their Fig 4 also includes the 4 separate histograms for our estimates x, y, z and w. It’s therefore possible for another expert to come along and say, “well, x has been discredited so I’m just going to ignore the pink histogram and look at the risk of y, z and w” or “z is far and away the most thorough piece of work, I’ll take my risk assessment from that”, or even to weight them other than evenly.
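To make the re-weighting idea concrete, here is a rough sketch of how a percentile of the aggregate changes when the four sets of results are weighted differently. The FAR values below are invented purely for illustration (they are not read off Fig 4), and `weighted_percentile` is my own helper, not anything from the paper.

```python
def weighted_percentile(samples_by_model, weights, q):
    """q-th percentile (0-100) of a mixture in which each model gets weights[name]
    in total, shared equally among that model's samples."""
    points = []
    for name, samples in samples_by_model.items():
        if weights[name] > 0:
            per_sample = weights[name] / len(samples)
            points.extend((value, per_sample) for value in samples)
    points.sort()
    total = sum(w for _, w in points)
    cumulative = 0.0
    for value, w in points:
        cumulative += w
        if cumulative >= q / 100.0 * total - 1e-12:
            return value
    return points[-1][0]


# Hypothetical FAR estimates for the four natural-warming models x, y, z and w.
far_samples = {
    "x": [0.05, 0.15, 0.25, 0.35],  # the low outlier
    "y": [0.45, 0.55, 0.60, 0.65],
    "z": [0.55, 0.60, 0.65, 0.70],
    "w": [0.60, 0.70, 0.75, 0.80],
}

equal_weights = {"x": 1, "y": 1, "z": 1, "w": 1}
without_x = {"x": 0, "y": 1, "z": 1, "w": 1}

p10_equal = weighted_percentile(far_samples, equal_weights, 10)
p10_without_x = weighted_percentile(far_samples, without_x, 10)
print("10th percentile, all four weighted equally:", p10_equal)
print("10th percentile, x discarded:", p10_without_x)
```

With these made-up numbers the 10th percentile is set entirely by model x, so the "90% certain" statement changes completely the moment another expert decides to discount it.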

One of the 4 models may be considered an outlier, as in fact the pink (NCARPCM1) one is in this case. It’s the only one with a most likely (and median) FAR below the overall median value (or the overall most likely value which happens to be higher than the overall median). Further investigation might suggest it should be discarded.

Another critical point: x, y, z and w can be determined as accurately as we want by running more simulations, because the statistical uncertainty falls in inverse proportion to the square root of the number of data items (see Part 1).

I’m not going to argue any more as to whether the 4 models introduce uncertainty. Clearly they do. I have no way of determining which of the 4 models most correctly estimates natural warming between 1900 and 2000. It’s a question of professional judgement.

However, I will point out that if uncertainty between the models is not going to be combined statistically (as in the previous post) I am uneasy about combining them at all:

Criticism 6: The headline findings against each of the 4 models of natural warming over the 20th century should have been presented separately in a similar way to the IPCC scenarios (for example as in the figure in my recent post, On Misplaced Certainty and Misunderstood Uncertainty).

Second cause of uncertainty: 10 different amounts of warming from each of the 4 models of natural 20th century warming

But Pall et al didn’t stop at 4 models of natural 20th century warming. They realised that each of the 4 models has statistical uncertainty in its modelling of the amount of natural warming to 2000. The models in particular each noted a risk of greater than the mean warming. This has to be accounted for in the initial data to our flood modelling. Never mind, you’d have thought, let’s see how often floods occur overall, because what we’re interested in is the overall risk of flooding.

But Pall et al didn’t simply initialise their model with a range of initial values for the amplitude of warming for each of their 4 scenarios. They appear to have created 10 different warming amplitudes for each of the 4 scenarios and treated each of these as different cases. This leaves me bemused, as the 4 scenarios must also have had different patterns of warming, so why not create different cases from these? Similarly, they seem to have varied initial SST conditions in their AGW model since they “accounted for uncertainty” in that data. Why, then, were these not different cases?

I must admit that even after spending last Sunday morning slobbing about pondering Pall et al, rather than just slobbing about as usual, I am still uncertain(!) whether Pall et al did treat each of the 10 sub-scenarios as separate cases. If not, they did something else to reduce the effective sample size and therefore increase the statistical uncertainty surrounding their FAR estimates. Their Methods Summary section talks about “Monte Carlo” sampling, which makes no sense to me in this case as we can simply use Statistics 101 methods (as shown in Part 1).

The creation of 10 sub-scenarios of each scenario (or the Monte Carlo sampling) effectively means that, instead of 4 tightly constrained estimates of the risk, we have 4 wide distributions. Remember (see previous post) the formula for the statistical uncertainty (Standard Deviation (SD)) with which a percentage measured in a sample represents the percentage in the overall population:

SQRT((sample %)*(100-sample%)/sample size) %

so the uncertainty varies inversely with the square root of the sample size. In this case the sample size for each of the 4 scenarios was 2000+, so that of each of the 10 subsets was only around 200. The square root of 10, obviously, is 3 and a bit, so the error associated with a sample of 200 is about 3 times as large as if the sample size were 2000.

For example, one of the yellow runs is an outlier: it predicts floods about 15% of the time. How confident can we be in this figure?:

SQRT((15*85)/200) = ~2.5

So it’s likely (within 1 SD either way) that the true risk is between 12.5 and 17.5%, but we can only say it is very likely (2 SD either way) that it is between 10 and 20%.
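For anyone who wants to check the sums, the formula above can be sketched in a few lines. The function name `se_percent` is mine; the 15%, 200 and 2000 figures are the ones used in the text.

```python
import math


def se_percent(sample_percent: float, sample_size: int) -> float:
    """SD (in percentage points) of a percentage estimated from `sample_size` runs."""
    return math.sqrt(sample_percent * (100.0 - sample_percent) / sample_size)


se_200 = se_percent(15, 200)    # one sub-scenario of about 200 runs: about 2.5
se_2000 = se_percent(15, 2000)  # a whole scenario of about 2000 runs
ratio = se_200 / se_2000        # square root of 10, i.e. 3 and a bit

print(f"SD with 200 runs:  {se_200:.2f} percentage points")
print(f"SD with 2000 runs: {se_2000:.2f} percentage points")
print(f"Ratio: {ratio:.2f}")

# Rounding the SD to 2.5, as in the text, gives the quoted ranges.
sd = 2.5
print(f"1 SD either way: {15 - sd} to {15 + sd}%")
print(f"2 SD either way: {15 - 2 * sd} to {15 + 2 * sd}%")
```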

So if we ran enough models we might find that that particular yellow sub-scenario only implied a flood risk of somewhere around 10%. Or maybe it was even more. The trouble is, in salami-slicing our data into small chunks and saying we’re uncertain which represents the true state of affairs, we’ve introduced statistical uncertainty. And this affects our ability to be certain, since it is bound to increase the number of extreme results in our suite of 40 scenarios, disproportionately affecting our ability to make statements as to what we are certain or very certain of.
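The salami-slicing effect is easy to demonstrate with a toy simulation. This is only a sketch under loudly stated assumptions: every "run" here is a coin toss with the same made-up 10% flood risk, which has nothing to do with the actual climate model, and the seed and names are arbitrary.

```python
import random

TRUE_RISK = 0.10          # hypothetical flood risk, identical for every run
RUNS_PER_SCENARIO = 2000  # roughly the size of one non-AGW scenario
N_SLICES = 10             # the 10 sub-scenarios

rng = random.Random(2000)  # fixed seed so the illustration is repeatable

# Each run either floods (True) or doesn't (False).
runs = [rng.random() < TRUE_RISK for _ in range(RUNS_PER_SCENARIO)]

# Estimate from all 2000 runs pooled together.
pooled = sum(runs) / len(runs)

# Estimates from 10 slices of 200 runs each.
slice_size = RUNS_PER_SCENARIO // N_SLICES
slices = [
    sum(runs[start:start + slice_size]) / slice_size
    for start in range(0, RUNS_PER_SCENARIO, slice_size)
]

print(f"Pooled estimate: {pooled:.3f}")
print(f"Slice estimates: lowest {min(slices):.3f}, highest {max(slices):.3f}")
```

All 10 slices share exactly the same underlying risk, yet their estimates scatter around the pooled figure. Treat each slice as a separate "case" and the scatter looks like uncertainty about the risk, when it is nothing more than sampling noise.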

Criticism 7: The design of the Pall et al modelling experiment ensures poor determination of the extremes of likely true values of the FAR – yet it is the extreme value that was presumably required, since that was presented to the world in the form of the statement in the Abstract that AGW has increased the risk of floods “in 9 out of 10 cases” by “more than 20%”. The confidence in the 20% figure is in fact very low!

Note that if the April 2000 temperature change amplitude variability had been treated as a risk, instead of as uncertainty, the risks in each case would have been tightly constrained and the team would have been able to say it was very likely (>90%) that the increased flood risk due to AGW exceeds 60% (since all the 4 scenarios would yield an increased risk of more than that) and likely it is greater than 150% (since 3 of the 4 scenarios suggest more than that).

The problem of risks within risks

Consider how the modelling could have been done differently, at least in principle. Instead of constructing April 2000 temperatures based on previous modelling exercises and running the model from there, they could have modelled the whole thing (or at least the natural forcing representations) from 1900 to autumn 2000 and output rainfall data for England. Without the intermediate step of exporting April 2000 temperatures from one model to another there’d be no need to treat the variable as “uncertainty” rather than “risk”.

Similarly, say we were interested in flooding in one particular location. Say it’s April 2011 and we’re concerned about this autumn since the SSTs look rather like those in 2000. Maybe we’re concerned about waterlogging of Reading FC’s pitch on the day of the unmissable local derby with Southampton in early November. Should we take advantage of a £10 advance offer for train tickets for a weekend away in case the match is postponed or wait until the day and pay £150 then if the match is off?

In this case we’d want to feed the aggregate rainfall data from Pall et al’s model into a local rainfall model. By Pall et al’s logic everything prior to our model would count as “uncertainty”. We’d input a number of rainfall scenarios into our local rainfall model and come up with a wide range of risks of postponement of the match, none of which we had a great deal of confidence in. I might want to be 90% certain there was a 20% chance of the match being postponed before I spent my tenner. I’d have to do a lot more modelling to eliminate statistical uncertainty if I use 10 separate cases than if I treat them all the same.
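The train ticket decision itself is simple enough to sketch, which shows why a single risk figure is what the decision-maker actually needs. This assumes I’m only interested in the expected cost, and that the £10 advance ticket is simply wasted if the match goes ahead; both assumptions are mine.

```python
ADVANCE_PRICE = 10.0      # pounds, bought now whether or not the match is off
ON_THE_DAY_PRICE = 150.0  # pounds, paid only if the match is postponed


def break_even_probability(advance: float, on_the_day: float) -> float:
    """Postponement probability above which buying in advance is cheaper on average."""
    return advance / on_the_day


def expected_cost_of_waiting(p_postponed: float, on_the_day: float) -> float:
    """Average cost of waiting until the day, given the chance of postponement."""
    return p_postponed * on_the_day


threshold = break_even_probability(ADVANCE_PRICE, ON_THE_DAY_PRICE)
print(f"Buy in advance if the chance of postponement exceeds {threshold:.1%}")

# With a 20% chance of postponement, waiting costs far more on average.
print(f"Expected cost of waiting at 20%: £{expected_cost_of_waiting(0.20, ON_THE_DAY_PRICE):.2f}")
```

The calculation takes one probability as its input. A statement of the form "90% certain the risk is at least so much" has to be boiled down to a single number before it can be used at all.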

How Pall et al could focus on improving what we know

If we inspect Pall et al’s Fig 3, it looks first of all as if very few – perhaps just 1 yellow and 1 pink – of the 40 non-AGW cases result in floods more than 10% of the time (this includes the yellow run that predicts 15%). About 12% of the AGW runs result in floods. Yet we’re only able to say we are 90% certain that the flood risk is 20% greater because of AGW. This would imply at most 4 non AGW runs within 20% of the AGW flood risk (i.e. predicting a greater than 10% flood risk).

If we look at Pall et al’s Fig 4, we see that, first:
– the “long tail” where the risk of floods is supposedly somewhat (FAR <-0.25!) greater “without AGW” is almost entirely due to the yellow outlier case. If just 10 runs in this case had not predicted flooding instead of predicting it then the long tail of the entire suite of 10,000 runs would have practically vanished.
– the majority of the risk of the FAR being below its 10th percentile (giving rise to the statement of 90% probability of a FAR of greater than (only) 20%) is attributable to pink cases.

It would have been possible to investigate these cases further, simply by running more simulations of the critical cases to eliminate the statistical uncertainty. I can hear people screaming “cheat!”. But this simply isn’t cheating. Obviously if 10x as many runs of the critical cases as non-critical ones are done, they’d have to be scaled down when the statistical data is combined (but this must have been done anyway as the sample sizes for the different scenarios were not the same). It’s not cheating. In fact, it’s good scientific investigation of the critical cases. If we want to be able to quote the increased risk of flooding because of AGW at the 10 percentile level (i.e. that we’re 90% sure of) with more certainty then that’s what our research should be aimed at.
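Here is a sketch of the scaling down I have in mind. The flood counts are invented for illustration: 39 ordinary cases at 5% and one critical case at 15%, which is then re-run with 10 times as many simulations. Weighting each case equally is my reading of how the cases would be combined, not a description of Pall et al’s actual procedure.

```python
def equal_weight_risk(cases):
    """Average the per-case flood fractions, giving every case the same weight."""
    return sum(floods / runs for floods, runs in cases) / len(cases)


def naive_pooled_risk(cases):
    """Lump all runs together, so cases with more runs count for more."""
    return sum(floods for floods, _ in cases) / sum(runs for _, runs in cases)


# Hypothetical (floods, runs) pairs.
ordinary_cases = [(10, 200)] * 39  # 5% flood risk each
critical_before = (30, 200)        # the 15% outlier, original batch
critical_after = (300, 2000)       # same case, 10x as many runs, still 15%

before = ordinary_cases + [critical_before]
after = ordinary_cases + [critical_after]

print(f"Equal weights, before extra runs: {equal_weight_risk(before):.4f}")
print(f"Equal weights, after extra runs:  {equal_weight_risk(after):.4f}")
print(f"Naive pooling, after extra runs:  {naive_pooled_risk(after):.4f}")
```

With equal case weights the extra runs don’t bias the combined figure at all; they only pin down the critical case more tightly. It’s lumping all the runs together without scaling that would be the cheat.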

Of course, if we find that the yellow sub-scenario really does suggest a risk of flooding of 15%, somewhat more than with AGW on top, and we don’t see regression to the mean, that might also tell us something interesting. Maybe the natural variability is more than we thought and that April 2000 meteorological conditions (principally SSTs) were possible that would have left the UK prone to even more flooding than actually occurred with more warming.

Criticism 8: Having introduced unnecessary uncertainty in the design of their modelling experiment, Pall et al did not make use of the opportunities available to eliminate such uncertainty by running a final targeted batch of simulations.

Preliminary conclusion

It looks like there’s going to have to be a Part 3 as I have a couple more points to make about Pall et al and will need a proper summary.

Nevertheless, I understand a lot better than I did at the outset why they are only able to say we’re 90% certain the FAR is at least 20% etc.

But I still don’t agree that’s what they should be doing.

We want to use the outputs of expensive studies like this to make decisions. Part of Pall et al’s job should be to eliminate statistical uncertainty, not introduce it.

They should have provided one headline figure of the increased risk due to global warming, about 2.5 times as much, taking into account all their uncertainties.

And the only real uncertainties in the study should have been between the 4 different patterns of natural warming. These are the only qualitative differences between their modelling runs. Everything else was statistical and should have been minimised by virtue of the large sample sizes.

If we just label everything as uncertainty and not as risk, we’re not really saying anything.

After all, it might be quite useful for policy-makers to know that flood risks are already 2.5 times what they were in 1900. This might allow the derivation of some kind of metric as to how much should be spent on flood defences in the future, or even on relocation of population and/or infrastructure away from vulnerable areas. Knowing that the scientists are 90% certain the increased risk is greater than 20% really isn’t quite as useful.

The aim of much research in many domains, including the study of climate, and in particular that of Pall et al, should be to quantify risks and eliminate uncertainties. It rather seems they’ve done neither satisfactorily.

(to be continued)

23/2/11, 16:22: Fixed typo, clarified remarks about the value of Pall et al’s findings to policy-makers.