The epidemiological concept of the basic reproduction number for an infection, R0, is so context-dependent as to be almost useless, and should only be used indicatively, for example when ranking viruses and pathogens in order of infectiousness. R, the effective reproduction number, can be useful shorthand for the level of control of an epidemic, but may encourage too much of a focus on whole-population measures when it may be more appropriate to target particular groups of people and those social interactions most likely to transmit the infection.
For several decades computer chess programs have evaluated positions in terms of pawn-equivalents. For example, the computer (or a human) might assess that White has an advantage of 0.75, that is, three-quarters of a pawn. In actual fact, White might have a positional advantage, or be a pawn up with a slight positional disadvantage, or a pawn down with a large positional advantage, and so on. Then the AI application AlphaZero, currently the strongest chess-playing entity on the planet, demonstrated, among other things, that it is more effective to evaluate chess positions in terms of the probability of winning – or rather the “expected score”, since draws, as well as losses, are possible.
Why didn’t earlier computer chess programs use the expected score to assess positions? Because that was simply impractical on machines capable of looking only a few moves ahead. But the use of pawn-equivalent position evaluations continued for much longer than necessary. I predict that the old method will gradually disappear from use over the next decade or so.
I wonder if something similar has occurred with the epidemiological concept of the reproduction number, R. Use of the concept implicitly assumes the population is homogeneous, whereas modern epidemiological models (presumably) represent the population in a sophisticated way, or, at least, the computing resources to do so are readily available.
The Reproduction Number Concept
The reproduction number of an infection, such as Covid-19, is simply the number of people each infected individual passes the disease on to, on average. For example, if, on average, everyone with Covid-19 infects 3 other people before they either recover or die, the prevalence of the disease will increase exponentially. The rate at which the number of cases will increase depends on the life-cycle of the disease, such as how long it takes to become infectious and how long someone is infectious for.
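To make the arithmetic concrete, here is a minimal Python sketch of generation-by-generation growth. It assumes every case infects exactly R others after one fixed generation interval; the function names and the 5-day interval are illustrative assumptions, not figures from any particular study.

```python
import math


def cases_after_generations(initial_cases: float, r: float, generations: int) -> float:
    """New cases in generation n, if each case infects r others on average."""
    return initial_cases * r ** generations


def doubling_time_days(r: float, generation_time_days: float) -> float:
    """Days for cases to double, assuming one fixed generation interval (needs r > 1)."""
    return generation_time_days * math.log(2) / math.log(r)


if __name__ == "__main__":
    # With R = 3, one case becomes 3, 9, 27, 81... in successive generations.
    for n in range(5):
        print(n, cases_after_generations(1, 3, n))
    # The generation interval of 5 days is a hypothetical value for illustration.
    print(doubling_time_days(3, 5.0))
```

The same R gives a fast or slow epidemic depending on the generation interval, which is why the life-cycle of the disease matters as much as R itself.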
The basic reproduction number is known as R0 (“R zero” – the 0 is often written as a subscript rather than in normal text, as I’ve shown it for convenience). R0 supposedly measures the infectiousness of a pathogen. For example, one of the most infectious diseases is measles, for which R0 is quoted as “around 15” (or often “12-18”). Influenza is somewhat lower, anywhere from 0.9-2.8, depending on the source. And for Covid-19, values of 1.4-5.7 have been estimated.
You can see the problem. Whilst we can say that measles is more infectious than Covid-19, which is (probably) more infectious than flu, the range of values puts huge error bars on any numerical analysis.
The reason is obvious. The infectiousness of a pathogen is not solely a property of the pathogen itself. It depends, crucially, on the social context, that is, the behaviour of the host, which varies greatly. A disease may spread rapidly in an urban setting, but much more slowly in a rural area where people rarely meet each other. Transmission will depend on how people interact and on social norms for response to disease (do people isolate themselves as soon as they feel under the weather? do they avoid people who seem to be ill?). Sanitation standards may be crucial. The list goes on and on, and not only that, the behaviour of individuals in a population may vary dramatically.
If we’re unable to establish the basic reproduction number with any accuracy, modelling an epidemic in advance is far from straightforward. We can’t, for example, assume that if the Chinese estimated R0 to be 3, it will also be 3 in the UK. And we can’t assume R0 will be the same in London as in the Highlands and Islands of Scotland.
Which brings us to a semantic problem. The epidemiologists will no doubt claim I misunderstand. They’ll say that what I need is the effective reproduction number, R, for London and for Scotland.
Indeed I do. But in that case, what exactly does R0 represent? How do you define the base case where R0 applies?
And, I’ve noticed, there’s a tendency to assume R0 is the maximum possible reproduction number. That is, the “science” of epidemiological modelling tries to establish what interventions are possible to reduce the reproduction number from R0 (e.g. see Fig 1, below).
I’ve not seen anyone note, for example, that R might be higher than the Chinese estimation of R0 in denser urban environments than Wuhan, more reliant on overcrowded public transport, and with different attitudes to the habitual wearing of face-masks, such as London and New York. This is not a trivial matter. Covid-19 spread rapidly in both cities. If it had been realised earlier that this would happen, they might have locked down earlier, saving many thousands of lives.
Let’s put the problems with R0 to one side.
The effective reproduction number, R, is the number of new infections per case in a specific population. In particular, as an epidemic progresses, more and more of the population will have had the disease and hopefully recovered and acquired immunity.
A vaccine would also confer immunity, of course, reducing R by an amount depending on the proportion of the population vaccinated and the effectiveness of the vaccine.
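The effect of vaccination on R can be sketched with the usual textbook approximation, in which R is scaled by the fraction of the population left unprotected. This assumes homogeneous mixing and all-or-nothing protection, which is exactly the kind of simplification questioned elsewhere in this post; the function names are my own.

```python
def r_after_vaccination(r_before: float, coverage: float, efficacy: float) -> float:
    """R once a fraction `coverage` has had a vaccine of the given efficacy.

    Assumes a homogeneously mixing population (a strong simplification).
    """
    if not (0 <= coverage <= 1 and 0 <= efficacy <= 1):
        raise ValueError("coverage and efficacy must lie between 0 and 1")
    return r_before * (1 - coverage * efficacy)


def coverage_needed(r_before: float, efficacy: float) -> float:
    """Coverage at which R falls to 1; a result above 1 means it is unattainable."""
    return (1 - 1 / r_before) / efficacy


if __name__ == "__main__":
    # Hypothetical numbers: R of 3, half the population vaccinated, 90% efficacy.
    print(r_after_vaccination(3.0, 0.5, 0.9))
    print(coverage_needed(3.0, 0.9))
```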
Using R to Manage Epidemics
At least we can, in principle, establish R empirically. If an epidemic is in progress we can measure case numbers, estimate the average time from infection to transmission and take a stab at R for the infection in question and the population in question.
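As a rough illustration of "taking a stab at R", the sketch below fits an exponential growth rate to daily case counts and converts it to R using R = exp(growth rate × generation time). That conversion holds only if every transmission happens after exactly one fixed generation interval; other assumptions about the interval give different answers. The case counts in the usage example are synthetic, not real data.

```python
import math
from typing import Sequence


def growth_rate(daily_cases: Sequence[float]) -> float:
    """Least-squares slope of log(cases) against day index (per-day growth rate)."""
    n = len(daily_cases)
    xs = list(range(n))
    ys = [math.log(c) for c in daily_cases]  # requires every count to be > 0
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def estimate_r(daily_cases: Sequence[float], generation_time_days: float) -> float:
    """Crude R estimate, assuming a single fixed generation interval."""
    return math.exp(growth_rate(daily_cases) * generation_time_days)


if __name__ == "__main__":
    # Synthetic series that doubles every 5 days, with an assumed 5-day interval.
    cases = [100 * 2 ** (day / 5) for day in range(10)]
    print(estimate_r(cases, 5.0))
```

Even in this toy version, the answer is only as good as the case counts and the assumed generation time, both of which are uncertain early in an epidemic.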
What can we actually do, though, if we have no vaccine or effective treatment as was the case for foot and mouth disease in 2001 and 2007, swine flu in 2009 and now Covid-19?
Well, we might be able to make “non-pharmaceutical interventions” (NPIs), such as, for human epidemics, behaviour changes mandated or encouraged by government.
Computer models are used to try to estimate R after such interventions, perhaps based on experience from previous epidemics, or at least (as for Covid-19 in the UK) educated guesses. We might imagine each intervention reduces the number of transmission events by a certain percentage. Here’s an illustration of what we might hope to achieve:
I’m not sure you need a large research grant to produce something like this, even if you talk about “Bayesian statistics” and “credible intervals” rather than “confidence intervals”.
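The idea that each intervention removes a certain percentage of transmission events can be sketched in a few lines. The labels A to D follow the measures in Fig 1, but the percentages are hypothetical placeholders, and treating the reductions as independent and multiplicative is itself an assumption.

```python
from typing import Mapping


def r_after_interventions(r_start: float, reductions: Mapping[str, float]) -> float:
    """Apply each intervention's fractional reduction in transmission in turn.

    Assumes the interventions act independently, so their effects multiply.
    """
    r = r_start
    for fraction in reductions.values():
        if not 0 <= fraction <= 1:
            raise ValueError("each reduction must lie between 0 and 1")
        r *= 1 - fraction
    return r


if __name__ == "__main__":
    # Hypothetical reductions for four interventions labelled as in Fig 1.
    measures = {"A": 0.10, "B": 0.20, "C": 0.15, "D": 0.50}
    print(r_after_interventions(3.0, measures))
```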
Problems with Modelling R
What don’t I like about Fig 1?
Well, first, it’s not what I’d expect to see. I’d expect to see Rt declining with time as the population to whom the disease can be transmitted declines. This should happen without any intervention as people recover from Covid-19 and become immune. This effect is the main driver of basic epidemiological models.
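The decline I have in mind is the one produced by the simplest textbook SIR model, where the effective reproduction number is R0 multiplied by the fraction of the population still susceptible. The sketch below uses crude daily time steps and hypothetical parameters; it is an illustration of the effect, not a reconstruction of the model behind Fig 1.

```python
from typing import List


def sir_rt(r0: float, infectious_days: float, population: float,
           initial_infected: float, days: int) -> List[float]:
    """Daily effective reproduction number from a basic SIR model (Euler steps)."""
    gamma = 1.0 / infectious_days   # daily recovery rate
    beta = r0 * gamma               # daily transmission rate
    susceptible = population - initial_infected
    infected = float(initial_infected)
    rt = []
    for _ in range(days):
        # R_t falls as the susceptible share of the population shrinks.
        rt.append(r0 * susceptible / population)
        new_infections = beta * susceptible * infected / population
        recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - recoveries
    return rt


if __name__ == "__main__":
    # Hypothetical values: R0 of 3, 5 infectious days, a city of one million.
    series = sir_rt(3.0, 5.0, 1_000_000, 10, 200)
    print(series[0], series[-1])
```

With no intervention at all, R_t in this model starts near R0 and falls below 1 once enough people have been infected.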
This decline should happen after interventions too, in fact even more dramatically. For example, after the “lockdown” measures A, B, C and D suggested in Fig 1 (which actually happened more or less simultaneously), most people were confined to their households. For a short while those infected would pass the virus on to their families and other housemates. After that, though, far fewer people would be exposed. So we’d expect a plateau and then a steady decline, exactly as observed for London from about 4th-12th April (although “people in hospital” is only a very approximate proxy for the rate of infections around 10-14 days previously):
This observation makes me wonder how well the model captures interactions in closed household groups as distinct from random population interactions.
Second, and much more important, Fig 1 implies that there is a single R for the whole country. There clearly is not. As Fig 2 indicates (e.g. by the slope of the lines from 17th-30th March), London has been hit much harder, or at least earlier (since there’s some evidence London started to lock down earlier than elsewhere), than other regions.
Third, and even more important, infection occurs between individuals, so what you need to capture is the interactions between individuals taking part in different types of behaviour. For example, if you use public transport to travel to work there are more opportunities for you to catch Covid-19 and later to infect someone else than if you drive.
A model should, of course, capture the fact that in some regions more people behave in particular ways than in other parts of the country. For example, more people use public transport in London.
So, if we had a model that started off with a set of behaviours where transmission is possible and how likely, and then mapped these behaviours to groups of people (e.g. a million London commuters), regional variation in R would arise as an emergent property of the system.
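A toy version of this idea: give each behaviour an expected number of onward infections per infected person, give each region a share of people engaging in that behaviour, and let the regional R fall out of the sum. Every number and behaviour name below is a made-up placeholder, and a real model would track individuals and their contacts rather than averages.

```python
from typing import Dict

# Hypothetical expected onward infections per infected person, by behaviour.
TRANSMISSIONS_PER_CASE: Dict[str, float] = {
    "household": 0.8,
    "public_transport": 1.5,
    "car_commute": 0.1,
    "workplace": 0.9,
}

# Hypothetical share of each region's population engaging in each behaviour.
REGION_SHARES: Dict[str, Dict[str, float]] = {
    "London": {"household": 1.0, "public_transport": 0.5,
               "car_commute": 0.2, "workplace": 0.7},
    "Highlands": {"household": 1.0, "public_transport": 0.05,
                  "car_commute": 0.65, "workplace": 0.7},
}


def regional_r(shares: Dict[str, float],
               per_case: Dict[str, float] = TRANSMISSIONS_PER_CASE) -> float:
    """Regional R as the behaviour-weighted sum of onward infections."""
    return sum(share * per_case[behaviour] for behaviour, share in shares.items())


if __name__ == "__main__":
    for region, shares in REGION_SHARES.items():
        print(region, round(regional_r(shares), 2))
```

Nothing region-specific is fed into the transmission figures; the difference between regions comes entirely from who does what, which is the sense in which regional R is emergent.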
You really want to predict infection hotspots because that tells you where you need to focus your resources and enforce lockdown sooner.
But it might be even more important to model specific behaviour. I was thinking about this when writing yet another email to my MP over the weekend.
I’ll save the email itself for the Covid-19 files, but suffice to say that I fretted about, among other things, supermarket staff and bus drivers. Both of these groups have a greater than average risk of catching Covid-19, so need to be protected. But what tends to be stressed less is that, once infected, the risk is that they will pass the virus on to more people than would the average person with Covid-19.
The point is that supermarket staff and bus drivers interact with people all day, whereas the rest of us are either staying at home, or in the case of some essential workers, using the bus only maybe twice a day.
It’s therefore important for all of us that supermarket staff and bus drivers are not only protected but also prevented as much as possible from passing an infection on, if the protection fails.
Epidemiological models could capture the risks and demonstrate the value of very specific interventions. For example, the use of face-masks by store-workers might reduce the risk of them becoming infected, but it would much more significantly reduce the risk of them infecting others.
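The point about high-contact workers can be illustrated with back-of-envelope arithmetic: expected onward infections scale with the number of contacts, so an intervention that cuts outward transmission matters most for those who meet the most people. The contact counts, transmission probability and mask effect below are all hypothetical values chosen for illustration.

```python
def expected_onward_infections(contacts_per_day: float, infectious_days: float,
                               p_transmit: float, outward_reduction: float = 0.0) -> float:
    """Expected people infected by one case over their infectious period.

    `outward_reduction` is the assumed fraction of outward transmission
    blocked by a measure such as a face-mask worn by the infected person.
    """
    return contacts_per_day * infectious_days * p_transmit * (1 - outward_reduction)


if __name__ == "__main__":
    # Hypothetical: a store worker meets 200 people a day, someone at home meets 10.
    worker = expected_onward_infections(200, 5, 0.002)
    worker_masked = expected_onward_infections(200, 5, 0.002, outward_reduction=0.5)
    at_home = expected_onward_infections(10, 5, 0.002)
    print(worker, worker_masked, at_home)
```

On these made-up numbers, masking one infected store worker prevents more infections than would be caused by several infected people who are staying at home.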
Similarly, if the models captured interactions in hospitals and care home settings, they might have been able to show the crucial importance of ensuring Covid-19 could not be transmitted between hospital staff and hence to non-coronavirus patients. At the very least the models might have indicated that care home residents should be tested or quarantined after hospital visits to close down that route of transmission of the virus.
I don’t know how sophisticated the epidemiological models in use in the UK are “under the bonnet”, but it does seem that the way they are being used and the emphasis on R has led to a focus on general broad-brush interventions. Would infections be reduced further if more attention was paid to routes of transmission either by modelling “risky” behaviour or simply by qualitative analysis?
After the lockdown 4 weeks ago now, I would have expected to see much more attention paid than has been apparent to those settings where the virus continues to be transmitted, such as in supermarkets and on public transport. It seems to have taken publicity around high mortality of staff before procedures have been changed.