Over the past 15 years, outbreaks caused by viruses such as Ebola, SARS, and Zika have cost governments billions of US dollars. Combined with a perception among scientists, health workers and citizens that responses to outbreaks have been inadequate, this has fueled what seems like a compelling idea: that if researchers can identify the next pandemic virus before the first case appears, communities could drastically improve strategies for control, and even stop a virus from taking hold. Indeed, since 2009, the US Agency for International Development has spent US$170 million on evaluating the “feasibility of preemptively mitigating pandemic threats” using biodiversity-based sequencing surveys. Proponents, spearheaded by the Global Virome Project (GVP), suggest that by combining large-scale sequencing-based biodiversity surveys with machine learning, we will be able to predict the next pandemic virus before it emerges.
In a commentary in Nature, we describe how this is misguided. In addition to the immense cost involved (e.g., GVP has requested $1.2 billion), such efforts are destined to fail in their proposed goals (preventing the next pandemic), and trust is undermined when scientists make overblown promises about disease prevention that cannot be kept. No amount of biodiversity-based sequencing will allow us to predict what the next pandemic virus might be. Instead, we urge those working on infectious disease to focus funds and efforts on a much simpler and more cost-effective way to mitigate outbreaks – proactive, real-time surveillance of human populations.
In our commentary, we don’t discount the importance of biodiversity-based sequencing for understanding basic mechanisms of virus diversity and evolution. We also don’t discount the importance of modeling approaches for predicting and forecasting outbreaks of known viruses.
To further clarify some of our points, I wanted to lay out the various levels at which one might predict whether a virus (or bacteria/parasites) could cause a pandemic. I consider four different levels below – some of which are doable, some of which – as our commentary argues – are simply not possible.
Level 1 – Predicting whether a virus that is already endemic in certain areas of the world, and is already capable of causing outbreaks, could also cause outbreaks in other parts of the world.
- Examples include Ebola and Zika.
- For Ebola, scientists actually predicted that West Africa was at risk, but sadly, those studies were not published until after the 2013-2016 epidemic started (e.g., Pigott et al., eLife, 2014 and 2016).
- It is reasonable to think that we will be able to predict new endemic areas for viruses that we already know are capable of causing outbreaks/epidemics.
- So in summary – we’re quite close to being able to predict Level 1, but we’re not quite there yet (although, I believe Pigott did a very good job with the Ebola papers).
Level 2 – Predicting whether a known virus that is already capable of infecting humans, but not currently causing outbreaks, could become pandemic.
- Examples include H5N1 ‘bird’ flu and other influenza types.
- A lot of research is going into this question and, as with Level 1, we’re making gains. We’re still far from there, but presumably we can get a lot better at this.
Level 3 – Predicting whether a known virus – not currently able to infect humans – could become pandemic.
- These are viruses that circulate in animals, but we haven’t yet seen human infections. Examples again include certain types of flu and many animal-specific viruses.
- Here we’re kinda left in the dark (since human infections have yet to be observed) and it’s unlikely that we’ll ever be able to predict whether such viruses might have pandemic potential.
Level 4 – Predicting whether a yet-to-be-discovered virus could have pandemic potential (by sequencing viruses in wildlife and the environment).
- Examples include the first outbreaks of e.g. Ebola, Marburg, MERS, SARS – viruses we didn’t know existed until they caused human outbreaks.
- This is what GVP (and similar initiatives) are trying to do. What we call ‘biodiversity-based prediction’.
- You would survey the environment and animals (using e.g. metagenomic sequencing) to discover new viruses (this part is doable – and is being done. You will find new viruses. Lots).
- It is never going to be possible to predict whether any of these could have pandemic potential.
It is reasonable to expect that within the next ten to twenty years, we will be able to perform Level 1 prediction – essentially figure out where the next Ebola or Zika outbreak (or epidemic) might occur. The reason is that these viruses have pretty specific ‘requirements’. If we perform sophisticated analyses of the factors (environmental, climatic, social, infrastructural, etc.) that drive epidemics of ‘outbreak prone’ viruses such as Ebola and Zika, then we can look for where else in the world similar environments exist. For Zika, for example, it appears that places with (likely year-round) Aedes aegypti mosquitoes could be at risk of future outbreaks.
For Level 2 prediction, it gets a lot harder because there are so many potential candidate viruses, yet very few of these will eventually end up being capable of causing epidemics, let alone pandemics.
For Level 3 – and especially Level 4 – it’s simply impossible, and it’s all down to the numbers. There are so many undiscovered viruses out there; the percentage of viruses that remain undiscovered is likely 99.99%. And we’re only talking about viruses, so not including bacteria, parasites, and other potential pathogens. Adding to this is the fact that humans come into contact with known and unknown zoonotic viruses all the time (probably billions of encounters annually – although this number and the following are all guesses – we really have no idea), yet only in exceptionally few cases (one in tens of millions?) do these encounters lead to outbreaks – and even more rarely, to epidemics and pandemics. Almost every single time outbreaks do occur, it’s down to viruses we already know about (chikungunya, dengue, Zika, Ebola, flu, yellow fever, etc.). Now, imagine you try to do the same with unknown viruses. What you’re trying to predict is likely something that happens maybe once out of tens of billions of encounters, with one virus out of millions of potential viruses. If you have a basic understanding of statistics, it’s easy to see that this is simply not going to be possible. No machine learning algorithm – no matter how smart – will get you there (plus you would need thousands – if not tens of thousands – of labeled examples of known viruses and the outbreaks they caused to train your model – which do not exist).
So in short, while you can survey the global virusphere using e.g. metagenomic sequencing to discover new viruses, you will lose your fight against the numbers if you’re trying to extrapolate that into predicting whether the discovered viruses could have pandemic potential. While some of them might, the vast majority of them won’t (let alone be able to infect humans).
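To make the numbers argument above a bit more concrete, here is a minimal back-of-the-envelope sketch in Python. It is a standard base-rate (Bayes) calculation, not the method behind any particular tool or project, and every number in it is a made-up assumption for illustration: the count of discovered viruses, the handful that truly have pandemic potential, and a hypothetical classifier that is 99% sensitive and 99% specific (far better than anything we could realistically train). The function name `screen` is also just an illustrative choice.

```python
def screen(n_viruses, n_true, sensitivity, specificity):
    """Expected outcome of screening a catalogue of viruses with a classifier.

    n_viruses   -- total viruses screened (assumed, illustrative)
    n_true      -- how many truly have pandemic potential (assumed, illustrative)
    sensitivity -- probability a truly dangerous virus is flagged
    specificity -- probability a harmless virus is correctly not flagged
    Returns (expected true positives, expected false positives, PPV).
    """
    true_pos = n_true * sensitivity
    false_pos = (n_viruses - n_true) * (1.0 - specificity)
    flagged = true_pos + false_pos
    # Positive predictive value: the chance that a flagged virus is a real threat.
    ppv = true_pos / flagged if flagged else 0.0
    return true_pos, false_pos, ppv


# Hypothetical scenario: one million newly discovered viruses, ten of which
# could actually cause a pandemic, screened by an implausibly good classifier.
tp, fp, ppv = screen(n_viruses=1_000_000, n_true=10,
                     sensitivity=0.99, specificity=0.99)

print(f"expected true positives:  {tp:.1f}")
print(f"expected false positives: {fp:.1f}")
print(f"chance a flagged virus is a real threat: {ppv:.4%}")
```

Under these assumed inputs, the classifier flags roughly ten thousand viruses, and only about one in a thousand of those is a real threat. Make the true threats rarer or the classifier more realistic, and the picture only gets worse.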
To wrap up, I wanted to give some examples of what can actually be done when we’re trying to prevent pandemics. It has to do with early diagnosis. Again, there are a couple of different levels here, and we’re making great strides in some of them. When it comes to diagnosis/detection there are at least four levels in decreasing degree of complexity:
Detecting an unknown virus before it jumps into humans, causing an outbreak:
- This sounds great, right? Why wait for the outbreak to happen, when you could potentially stop it before the virus even jumps? This is Level 4 prediction, so not possible.
Detecting a known virus before it jumps into humans, causing an outbreak:
- Two versions of this – one is Level 3 prediction, which again isn’t possible.
- The other version, however, is when you know a virus can already cause epidemics (e.g., Zika, Ebola), you can survey the environment to look for presence of the virus. For example, look for Zika in mozzies, look for Ebola in bats and great apes. Technically this is complicated, but we will be able to eventually do it (in various degrees of complication – e.g., Zika surveillance can already be done, but Ebola surveillance is much harder).
Detecting a known virus as soon as it jumps into humans – stop the virus in ‘Patient Zero’:
- Currently not really feasible in most cases (think Ebola epidemic in West Africa), but with the development of portable sequencing (which will eventually – in the next ten to twenty years – lead to a single diagnostic test that can test for everything) and multiplex rapid diagnostic tests, this is going to be possible in future (probably ~20 years down the line for most cases; for pathogens that we frequently test for, it will be sooner). Eventually this will allow us to stop some pandemics even before they get started.
Detecting the first cluster of cases:
- This is the crux and is really the part that’s just on the cusp of being feasible. Forget about detecting the virus before it jumps, even forget about detecting the first patient – detect the first cluster. If we again use Ebola in West Africa as an example: had we detected that first cluster in Meliandou, we would have been able to prevent the ensuing epidemic. Sadly, a three-month delay in diagnosis meant that the outbreak spun out of control before we even knew it was occurring. For Zika, the surveillance delay was likely more than a year. We already have the technologies to ensure that such an event would be highly unlikely to occur again in future, but it’s all about implementation and investment in prevention, instead of response (something – for whatever reason – humans appear to be really bad at). This is what we’re advocating in our commentary – spend the money where it’s proven it’ll make a difference!
You can read the rest of our commentary here: https://www.nature.com/articles/d41586-018-05373-w
Andrew Rambaut also made a very nifty calculator to estimate the futility of biodiversity-based pandemic prediction methods: https://rambaut.github.io/rareEventScreening/