On this page we link to various materials related to our investigations of the origin and evolution of SARS-CoV-2, the virus responsible for the COVID-19 pandemic.
Paper - The Proximal Origin of SARS-CoV-2
This short peer-reviewed article in Nature Medicine summarizes our scientific investigations into the proximal origins of SARS-CoV-2. Due to constraints on article format, many references could not be included, and experimental details were limited. We investigated both natural and artificial scenarios for the proximal origin of SARS-CoV-2 in depth. Our analyses strongly support a natural origin of the virus, and all the data are fully consistent with evolutionary processes having led to the emergence of SARS-CoV-2 in the human population.
This study was motivated by high-level concerns about potential lab origins of the virus, decade-long research in Wuhan on bat coronaviruses, and, of course, scientific curiosity about how this virus likely came to be. There was also a separate call from the White House OSTP to understand this question better, which was initiated in response to our early analyses.
The two main features that caught our eye as we started these analyses were the receptor binding domain (RBD) in SARS-CoV-2 and the polybasic (furin) cleavage site between the S1 and S2 subunits (and associated predicted O-linked glycans).
The RBD appeared to be a high-affinity binder of the human ACE2 receptor (based on structural modeling we performed; we now know this to be true), which was puzzling when compared to SARS-CoV and other coronaviruses. This particular RBD had not previously been observed in nature. However, as we were conducting our analyses, several groups reported that this exact RBD can also be found in coronaviruses sampled in pangolins, conclusively proving that the SARS-CoV-2 RBD is present in nature and the result of natural evolution. It's important to understand that finding it in a pangolin coronavirus does not necessarily mean that SARS-CoV-2 (or the RBD) came from pangolins. We can only sample a tiny fraction of the entire virosphere, so this RBD may also be present in all sorts of other (unsampled) coronaviruses, including in bats.
When we performed our analyses, the polybasic cleavage site we identified was only predicted, but we now know it's functional and likely plays a role in SARS-CoV-2 pathogenesis. This site is novel among SARS-like coronaviruses. However, such sites can be found all across the coronavirus family, including the exact PRRAR sequence (found in feline coronaviruses). When we wrote our paper, we predicted that related viruses may be found with insertions (if not necessarily polybasic sites) at the S1/S2 junction, and indeed RmYN02, a SARS-like coronavirus found in bats, has an insertion at this junction. Similar to the SARS-CoV-2 RBD, the polybasic cleavage site is therefore a common feature observed during the evolution of coronaviruses.
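To make the idea of a polybasic site concrete, here is a minimal Python sketch that scans a protein fragment for the minimal furin-style motif R-X-X-R (arginine, any two residues, arginine). This is only an illustration and not the method used in the paper: the function name is hypothetical, both fragments were typed in by hand for this example (the first contains the PRRAR sequence mentioned above, the second is a made-up control without it), and a real analysis would work from aligned reference sequences and use a dedicated cleavage-site predictor.

```python
import re

# Minimal furin-style motif: Arg, any residue, any residue, Arg (R-X-X-R).
# A lookahead is used so that overlapping matches are all reported.
POLYBASIC_PATTERN = re.compile(r"(?=(R..R))")


def find_polybasic_sites(protein_seq):
    """Return (0-based index, matched motif) for every R-X-X-R hit."""
    seq = protein_seq.upper()
    return [(m.start(), m.group(1)) for m in POLYBASIC_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Illustrative fragments only; neither was retrieved from a database.
    with_site = "TNSPRRARSVAS"   # contains the PRRAR sequence from the text
    without_site = "TNSRSVAS"    # hypothetical control lacking the insertion

    print(find_polybasic_sites(with_site))
    print(find_polybasic_sites(without_site))
```

A pattern match like this only flags candidate sites. Whether a site is actually cleaved depends on structural context, which is why the site was described as predicted until functional data became available.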
You can access the paper at Nature Medicine here.
An earlier version is also available on Virological.
Very relevant (and technical) Virological post by Bill Gallaher about the molecular mechanism of how SARS-CoV-2 gained its furin/polybasic cleavage site.
Another very relevant Virological post by Spyros Lytras about the evolution of the furin/polybasic cleavage site.
A detailed dissection of the evolution of furin cleavage sites across the coronavirus family from Bob Garry and Bill Gallaher on Virological.
A follow-up study on Virological looking at the many SARS-CoV-2-related viruses identified in diverse populations of bats, spread over large geographical areas.
Summary of the WHO report on the "Global Study of Origins of SARS-CoV-2"
I reanalyzed a lot of the data from the WHO report, which provided additional details of the early epidemic in Wuhan, including showing that the Huanan seafood market was the early epicenter and that SARS-CoV-2 was not widely circulating prior to the detection of the pandemic. I summarized all the main findings in this longer Twitter thread.
Refuting the "Yan Report"
A report from Dr. Li-Meng Yan and colleagues came out in September 2020 proposing that SARS-CoV-2 had been created in the lab. This report is one of many examples of conspiracy theories circulating about lab-based scenarios for the proximal origin of SARS-CoV-2. While such scenarios should not be dismissed out of hand and deserve careful scientific inquiry (which we undertook in our "Proximal" paper), this particular report is entirely unsupported and non-scientific, and appears to have been created to support baseless conspiracy theories.
Twitter thread outlining the issues with the Yan Report.
Authenticity of recent SARS-like coronavirus genomes
We downloaded raw data and reassembled relevant genomes from recently discovered SARS-like coronaviruses from bats and pangolins to assess their authenticity. The analyses show that all the genomes are authentic and as reported in the relevant manuscripts describing their sequencing and assembly.
Data available via Google Cloud.
Summary of the findings with further discussion on Virological.
In response to concerns raised about a paper published in PLOS Pathogens (Liu et al.), I performed a separate (and more in-depth) analysis for MP789 with the same results. Data and conclusions can be found via Google Cloud.
Snakes not the reservoir of SARS-CoV-2
Early on in the pandemic it was suggested that snakes served as the reservoir of SARS-CoV-2. This conclusion was based on very naive and entirely wrong analyses. More in-depth analyses clearly show that snakes are unlikely to be the reservoir for the virus.
Post with analyses and discussion on Virological.
Also very relevant post by David Robertson on Virological.
Early evolutionary estimates of SARS-CoV-2 molecular clock and the timing of the COVID-19 pandemic
In January 2020, we performed one of the earliest estimates of the evolutionary rate and timing of SARS-CoV-2, based on 27 sequences shared publicly on GISAID by Chinese investigators. The clock rate of ~1E-3 substitutions/site/year (s/s/y) and the estimated timing of mid-November to early December 2019 have both remained remarkably stable as more genomes have been produced (more than 100k as of this writing).
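As a rough illustration of what a clock rate of ~1E-3 s/s/y means in practice, the short Python sketch below converts it into expected substitutions per genome. It assumes a genome length of 29,903 nucleotides (an assumption not stated above) and a simple strict clock. The function names are hypothetical, and this back-of-the-envelope arithmetic is not the phylogenetic analysis used to produce the original estimates.

```python
# Assumed genome length in nucleotides; not taken from the text above.
GENOME_LENGTH_NT = 29_903


def expected_substitutions(rate_per_site_per_year, genome_length, years):
    """Expected substitutions across the whole genome under a strict clock."""
    return rate_per_site_per_year * genome_length * years


def years_to_accumulate(n_substitutions, rate_per_site_per_year, genome_length):
    """Expected time (in years) for a lineage to accumulate n substitutions."""
    return n_substitutions / (rate_per_site_per_year * genome_length)


if __name__ == "__main__":
    rate = 1e-3  # substitutions/site/year, the approximate rate quoted above

    per_year = expected_substitutions(rate, GENOME_LENGTH_NT, 1.0)
    per_month = per_year / 12
    print(f"Expected substitutions per genome per year:  {per_year:.1f}")
    print(f"Expected substitutions per genome per month: {per_month:.1f}")
```

Under these assumptions a lineage accumulates roughly 30 substitutions per year, or two to three per month, which gives a feel for how little genetic diversity was available in the earliest sampled genomes.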
Post on Virological describing the findings, including important caveats and criticism of our analyses, etc.
The Scripps Research Institute
La Jolla, CA, USA