Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20, 1.68), 0.35 (0.30, 0.41) and 0.133 (0.129, 0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. Its origin and direct ancestral viruses have not been ... We compiled a dataset including 27 human coronavirus OC43 virus genomes and ten related animal virus genomes (six bovine, three white-tailed deer and one canine virus). The Pango dynamic nomenclature is a popular system for classifying and naming genetically distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. A reduced sequence set of 25 sequences chosen to capture the breadth of diversity in the sarbecoviruses (obvious recombinants not involving the SARS-CoV-2 lineage were also excluded) was used because GARD is computationally intensive. However, formal testing using marginal likelihood estimation (ref. 41) does provide some evidence of a temporal signal, albeit with limited log Bayes factor support of 3 (NRR1), 10 (NRR2) and 3 (NRA3); see Supplementary Table 1. Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003).
3) clusters with viruses from provinces in the centre, east and northeast of China. BFRs were concatenated if no phylogenetic incongruence signal could be identified between them. Time-measured phylogenetic reconstruction was performed using a Bayesian approach implemented in BEAST (ref. 42) v.1.10.4. Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. When the genomic data included both coding and non-coding regions we used a single GTR+Γ substitution model; for concatenated coding genes we partitioned the alignment by codon position and specified an independent GTR+Γ model for each partition with a separate gamma model to accommodate inter-site rate variation. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. The fact that they are geographically relatively distant is in agreement with their somewhat distant TMRCA, because the spatial structure suggests that migration between their locations may be uncommon. GARD identified eight breakpoints that were also within 50 nt of those identified by 3SEQ. Region B showed no PI signals within the region, except one including sequence SC2018 (Sichuan), and thus this sequence was also removed from the set. The consistency of the posterior rates for the different prior means also implies that the data do contribute to the evolutionary rate estimate, despite the fact that a temporal signal was visually not apparent (Extended Data Fig. ...). SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies.
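The agreement check between GARD and 3SEQ breakpoints (breakpoints from one method lying within 50 nt of those from the other) can be illustrated with a minimal sketch. The function name and the breakpoint positions below are hypothetical and chosen only for illustration; they are not the positions reported in the analysis, and the inclusive treatment of the 50 nt boundary is an assumption.

```python
def concordant_breakpoints(gard, threeseq, tolerance=50):
    """Return the GARD breakpoints that lie within `tolerance` nt
    (inclusive, an assumption) of at least one 3SEQ breakpoint."""
    return [g for g in gard if any(abs(g - t) <= tolerance for t in threeseq)]


# Hypothetical breakpoint positions (nucleotide coordinates in an alignment).
gard_breakpoints = [1000, 5020, 9000]
threeseq_breakpoints = [1040, 5100, 8960]

matches = concordant_breakpoints(gard_breakpoints, threeseq_breakpoints)
print(matches)  # breakpoints supported by both methods
```

With these made-up positions, the first and third GARD breakpoints fall within 50 nt of a 3SEQ breakpoint, while the second (80 nt away) does not.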
Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. ...) ... The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans (ref. 47). Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. Andersen, K. G., Rambaut, A., Lipkin, W. I., Holmes, E. C. & Garry, R. F. The proximal origin of SARS-CoV-2. In the variable-loop region, RaTG13 diverges considerably with the TMRCA, now outside that of SARS-CoV-2 and the Pangolin Guangdong 2019 ancestor, suggesting that RaTG13 has acquired this region from a more divergent and undetected bat lineage. Genetic lineages of SARS-CoV-2 have been emerging and circulating around the world since the beginning of the COVID-19 pandemic. Preprint at https://doi.org/10.1101/2020.02.10.942748 (2020). These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig. ...). Even before the COVID-19 pandemic, pangolins had been making headlines. Posterior means with 95% HPDs are shown in Supplementary Information Table 2. The pango-designation repository collects suggestions for new lineages that should be added to the current scheme, and the pangolin software package assigns SARS-CoV-2 genome sequences to global lineages.
Root-to-tip divergence as a function of sampling time for non-recombinant regions NRR1 and NRR2 and recombination-masked alignment set NRA3. Sequences are colour-coded by province according to the map. The shaded region corresponds to the S protein. Decimal years are shown on the x axis for the 1.2 years of SARS sampling in c. d, Mean evolutionary rate estimates plotted against sampling time range for the same three datasets (represented by the same colour as the data points in their respective RtT divergence plots), as well as for the comparable NRA3 using the two different priors for the rate in the Bayesian inference (red points). In region A, we removed subregion A1 (nt positions 3,872–4,716 within region A) and subregion A4 (nt 1,642–2,113) because both showed PI signals with other subregions of region A. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. Epidemiology, genetic recombination, and pathogenesis of coronaviruses. This study provides an integration of existing classifications and describes evolutionary trends of the SARS-CoV ... The DRAGEN COVID Lineage App on BaseSpace Sequence Hub performs k-mer based detection, mapping and alignment, variant calling, consensus sequence generation, and lineage and clade analysis using Pangolin and NextClade. Nevertheless, the viral population is largely spatially structured according to provinces in the south and southeast on one lineage, and provinces in the centre, east and northeast on another (Fig. ...). While pangolins could be acting as intermediate hosts for bat viruses to get into humans (they develop severe respiratory disease, ref. 38, and commonly come into contact with people through trafficking), there is no evidence that pangolin infection is a requirement for bat viruses to cross into humans.
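Root-to-tip (RtT) divergence plotted against sampling time, as in the figure description above, is usually summarized by a simple linear regression: the slope approximates the evolutionary rate and the x-intercept approximates the date of the root. The following is a minimal sketch of that regression under stated assumptions. The function name and the four data points are hypothetical, and this is not the tool used in the study.

```python
def root_to_tip_regression(dates, divergences):
    """Ordinary least-squares fit of divergence (subs/site) on sampling date.

    Returns (rate, root_date, r_squared), where rate is the slope in
    substitutions per site per year and root_date is the x-intercept.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in divergences)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate if rate != 0 else float("nan")
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return rate, root_date, r_squared


# Hypothetical, perfectly clock-like data: 0.001 subs/site/yr from a root in 1990.
sampling_dates = [2000.0, 2005.0, 2010.0, 2015.0]
rtt_divergence = [0.010, 0.015, 0.020, 0.025]

rate, root_date, r2 = root_to_tip_regression(sampling_dates, rtt_divergence)
print(f"rate={rate:.5f} subs/site/yr, root date={root_date:.1f}, R^2={r2:.3f}")
```

A low R² or a non-positive slope in such a fit is the usual sign of a weak temporal signal, which is the situation in which informative rate priors become necessary.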
Forni, D., Cagliani, R., Clerici, M. & Sironi, M. Molecular evolution of human coronavirus genomes. This underscores the need for a global network of real-time human disease surveillance systems, such as that which identified the unusual cluster of pneumonia in Wuhan in December 2019, with the capacity to rapidly deploy genomic tools and functional studies for pathogen identification and characterization. acknowledges support by the Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen; nos. ...). performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. On first examination this would suggest that SARS-CoV-2 is a recombinant of an ancestor of Pangolin-2019 and RaTG13, as proposed by others (refs. 11, 22). Two exceptions can be seen in the relatively close relationship of Hong Kong viruses to those from Zhejiang Province (with two of the latter, CoVZC45 and CoVZXC21, identified as recombinants) and a recombinant virus from Sichuan for which part of the genome (region B of SC2018 in Fig. ... Our results indicate the presence of a single lineage circulating in bats with properties that allowed it to infect human cells, as previously described for bat sarbecoviruses related to the first SARS-CoV lineage (refs. 29, 30, 31). We extracted a total of 2189 full-length SARS-CoV-2 viral genomes from various states of India from the EpiCov repository of the GISAID initiative on 12 June 2020. The presence of SARS-CoV-2-related viruses in Malayan pangolins, in silico analysis of the ACE2 receptor polymorphism and sequence similarities between the receptor-binding domain (RBD) of the spike proteins of pangolin and human sarbecoviruses led to the proposal of the pangolin as an intermediary.
As a proxy, it would be possible to model the long-term purifying selection dynamics as a major source of time-dependent rates (refs. 43, 44, 52), but this is beyond the scope of the current study. A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the Spike protein. Unlike other viruses that have emerged in the past two decades, coronaviruses are highly recombinogenic (refs. 14, 15, 16). Biazzo et al. (2021) obtained the genome sequences of 10 SARS-CoV-2 virus strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genomes with the pangolin software; the strains were assigned to the B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe. The estimated divergence times for the pangolin virus most closely related to the SARS-CoV-2/RaTG13 lineage range from 1851 (1730–1958) to 1877 (1746–1986), indicating that these pangolin ... The red and blue boxplots represent the divergence time estimates for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV (blue) from their most closely related bat virus, with the light- and dark-coloured versions based on the HCoV-OC43 and MERS-CoV centred priors, respectively. Developed by the Centre for Genomic Pathogen Surveillance. Coronavirus Disease 2019 (COVID-19) Situation Report 51 (World Health Organization, 2020). The research leading to these results received funding (to A.R. and P.L.) ... The genetic distances between SARS-CoV-2 and Pangolin Guangdong 2019 are consistent across all regions except the N-terminal domain, implying that a recombination event between these two sequences in this region is unlikely. with an alignment on which an initial recombination analysis was done.
This is not surprising for diverse viral populations with relatively deep evolutionary histories. Open reading frames are shown above the breakpoint plot, with the variable-loop region indicated in the S protein. Using both prior distributions, this results in six highly similar posterior rate estimates for NRR1, NRR2 and NRA3, centred around 0.00055 substitutions per site per year. The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus genome sequences. Note that six of these sequences fall under the terms of use of the GISAID platform. Indeed, the rates reported by these studies are in line with the short-term SARS rates that we estimate (Fig. ...). It is RaTG13 that is more divergent in the variable-loop region (Extended Data Fig. ...). SARS-CoV-2 itself is not a recombinant of any sarbecoviruses detected to date, and its receptor-binding motif, important for specificity to human ACE2 receptors, appears to be an ancestral trait shared with bat viruses and not one acquired recently via recombination. from the European Research Council under the European Union's Horizon 2020 research and innovation programme (grant agreement no. ...). Avian influenza A virus (H7N7) epidemic in The Netherlands in 2003: course of the epidemic and effectiveness of control measures. We compiled a set of 69 SARS-CoV genomes including 58 sampled from humans and 11 sampled from civets and raccoon dogs. This genotype pattern led to the creation of a new Pangolin lineage named B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage, which was renamed B.1.640.1. Since the release of Version 2.0 in July 2020, however, PANGOLIN has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes.
performed S recombination analysis. Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus. 11, 12, 13, 22, 28), a signal that suggests recombination, the divergence patterns in the S protein do not show evidence of recombination between the lineage leading to SARS-CoV-2 and known sarbecoviruses. All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. The sizes of the black internal node circles are proportional to the posterior node support. The web application was developed by the Centre for Genomic Pathogen Surveillance. By 2009, however, rapid genomic analysis had become a routine component of outbreak response. First, we took an approach that relies on identification of mosaic regions (via 3SEQ v.1.7; ref. 14) that are also supported by PI signals (ref. 19). Sarbecovirus, HCoV-OC43 and SARS-CoV data were assembled from GenBank to be as complete as possible, with sampling year as an inclusion criterion. We thank T. Bedford for providing M.F.B. The coverage threshold and consensus sequence generation threshold were set to 20 and 90, respectively. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. Because 3SEQ is the most statistically powerful of the mosaic methods (ref. 61), we used it to identify the best-supported breakpoint history for each potential child (recombinant) sequence in the dataset.
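Whole-genome identity figures such as the 91.02% and 90.55% quoted above are percentages of matching positions between aligned sequences. The sketch below shows one common way to compute such a value. How the quoted figures were actually calculated is not stated in the text, so the handling of gaps and ambiguous bases here (columns containing `-` or `N` are skipped) is an assumption, and the two short sequences are made up.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two aligned sequences.

    Columns where either sequence has a gap ('-') or an unknown base ('N')
    are excluded from the comparison (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 9 comparable columns, 8 of them identical.
toy_a = "ATGCCGTA-A"
toy_b = "ATGTCGTAGA"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.2f}% identical")
```

Different gap conventions can shift the result by a noticeable margin on divergent genomes, so identity values are only comparable when computed the same way.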
62, 63), the GTR+Γ model and 100 bootstrap replicates, was inferred for each BFR >500 nt. Regions B and C span nt 3,625–9,150 and 9,261–11,795, respectively. To estimate non-synonymous over synonymous rate ratios for the concatenated coding genes, we used the empirical Bayes 'Renaissance counting' procedure (ref. 67). Originally, PANGOLIN used a maximum-likelihood-based assignment algorithm to assign each query SARS-CoV-2 sequence its most likely lineage. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. The coronavirus genome that these researchers had assembled, from pangolin lung-tissue samples, contained some gene regions that were ninety-nine per cent similar to equivalent parts of the SARS ... In March, when covid cases began spiking around India, Bani Jolly went hunting for answers in the virus's genetic code. Rambaut, A., Lam, T. T., Carvalho, L. M. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). The relatively fast evolutionary rate means that it is most appropriate to estimate shallow nodes in the sarbecovirus evolutionary history. Nature Microbiology (Nat Microbiol). The assumption of long-term purifying selection would imply that coronaviruses are in endemic equilibrium with their natural host species, horseshoe bats, to which they are presumably well adapted. To avoid artefacts due to recombination, we focused on NRR1 and NRR2 and the recombination-masked alignment NRA3 to infer time-measured evolutionary histories.
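The restriction to breakpoint-free regions (BFRs) longer than 500 nt can be illustrated with a short sketch: given a set of breakpoints, the alignment is cut into segments and only segments above the length threshold are kept for tree inference. The function name, the coordinate convention (a breakpoint is taken to be the last position of a segment, and regions are reported as 1-based inclusive ranges) and the example positions are all assumptions made for illustration.

```python
def breakpoint_free_regions(breakpoints, alignment_length, min_length=500):
    """Split an alignment at the given breakpoints and keep the segments
    strictly longer than `min_length` nt.

    Returns a list of (start, end) tuples, 1-based and inclusive.
    """
    edges = [0] + sorted(breakpoints) + [alignment_length]
    regions = []
    for left, right in zip(edges, edges[1:]):
        if right - left > min_length:
            regions.append((left + 1, right))
    return regions


# Hypothetical breakpoints in a 5,000 nt alignment.
bfrs = breakpoint_free_regions([1200, 1500, 4000], alignment_length=5000)
print(bfrs)  # the 300 nt segment between 1,201 and 1,500 is dropped
```

Short segments are dropped because they carry too few informative sites to support a reliable tree.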
In the absence of a strong temporal signal, we sought to identify a suitable prior rate distribution to calibrate the time-measured trees by examining several coronaviruses sampled over time, including HCoV-OC43, MERS-CoV, and SARS-CoV virus genomes. Scientists defined the pangolin lineage of this variant to be B.1.1.523 and it was originally recognized as a variant under monitoring on July 14, 2021. If stopping an outbreak in its early stages is not possible, as was the case for the COVID-19 epidemic in Hubei, identification of origins and point sources is nevertheless important for containment purposes in other provinces and prevention of future outbreaks. Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019). Zhou et al. (ref. 2) concluded from the genetic proximity of SARS-CoV-2 to RaTG13 that a bat origin for the current COVID-19 outbreak is probable. Concatenated region ABC is NRR1. & Holmes, E. C. A genomic perspective on the origin and emergence of SARS-CoV-2. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses, indicating that recombination probably played a role in the evolutionary history of these viruses (refs. 5, 7). Concurrent evidence also proposed pangolins as a potential intermediate species for SARS-CoV-2 emergence and suggested them as a potential reservoir species (refs. 11, 12, 13).
Because 3SEQ identified ten BFRs >500 nt, we used GARD's (v.2.5.0) inference on 10, 11 and 12 breakpoints. This provides compelling support for the SARS-CoV-2 lineage being the consequence of a direct or nearly-direct zoonotic jump from bats, because the key ACE2-binding residues were present in viruses circulating in bats. & Andersen, K. G. Pandemics: spend on surveillance, not prediction. These datasets were subjected to the same recombination masking approach as NRA3 and were characterized by a strong temporal signal (Fig. ...). Split diversity in constrained conservation prioritization using integer linear programming. Membrebe, J. V., Suchard, M. A., Rambaut, A., Baele, G. & Lemey, P. Bayesian inference of evolutionary histories under time-dependent substitution rates. is funded by the MRC (no. ...). Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n = 27 sequences spread over 4 years (MERS-CoV) and n = 27 sequences spread over 49 years (HCoV-OC43). Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. Schierup, M. H. & Hein, J. Recombination and the molecular clock. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. Regions A–C had similar phylogenetic relationships among the southern China bat viruses (Yunnan, Guangxi and Guizhou provinces), the Hong Kong viruses, northern Chinese viruses (Jilin, Shanxi, Hebei and Henan provinces, including Shaanxi), pangolin viruses and the SARS-CoV-2 lineage. Virological.org http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339 (2020).