TY - JOUR
T1 - A framework for automated scalable designation of viral pathogen lineages from genomic data
AU - McBroome, Jakob
AU - de Bernardi Schneider, Adriano
AU - Roemer, Cornelius
AU - Wolfinger, Michael T.
AU - Hinrichs, Angie S.
AU - O’Toole, Aine Niamh
AU - Ruis, Christopher
AU - Turakhia, Yatish
AU - Rambaut, Andrew
AU - Corbett-Detig, Russell
N1 - Accession Number: WOS:001163665400002
N1 - PubMed ID: 38316930
PY - 2024/2
Y1 - 2024/2
N2 - Pathogen lineage nomenclature systems are a key component of effective communication and collaboration for researchers and public health workers. Since February 2021, the Pango dynamic lineage nomenclature for SARS-CoV-2 has been sustained by crowdsourced lineage proposals as new isolates were sequenced. This approach is vulnerable to time-critical delays as well as regional and personal bias. Here we developed a simple heuristic approach for dividing phylogenetic trees into lineages, including the prioritization of key mutations or genes. Our implementation is efficient on extremely large phylogenetic trees consisting of millions of sequences and produces similar results to existing manually curated lineage designations when applied to SARS-CoV-2 and other viruses including chikungunya virus, Venezuelan equine encephalitis virus complex and Zika virus. This method offers a simple, automated and consistent approach to pathogen nomenclature that can assist researchers in developing and maintaining phylogeny-based classifications in the face of ever-increasing genomic datasets.
UR - http://www.scopus.com/inward/record.url?scp=85184170227&partnerID=8YFLogxK
U2 - 10.1038/s41564-023-01587-5
DO - 10.1038/s41564-023-01587-5
M3 - Article
C2 - 38316930
AN - SCOPUS:85184170227
VL - 9
SP - 550
EP - 560
JO - Nature Microbiology
JF - Nature Microbiology
SN - 2058-5276
IS - 2
ER -