TY - JOUR
T1 - SpecieScan
T2 - Semi-automated taxonomic identification of bone collagen peptides from MALDI-ToF MS
AU - Vegh, Emese
AU - Douka, Katerina
N1 - Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - Motivation: Zooarchaeology by Mass Spectrometry (ZooMS) is a palaeoproteomics method for the taxonomic determination of collagen, which traditionally involves challenging manual spectra analysis with limitations in quantitative results. As the ZooMS reference database expands, a faster and reproducible identification tool is necessary. Here we present SpecieScan, an open-access algorithm for automating taxa identification from raw MALDI-ToF mass spectrometry (MS) data. Results: SpecieScan was developed using R (pre-processing) and Python (automation). The algorithm’s output includes identified peptide markers, closest matching taxonomic group (taxon, family, order), correlation scores with the reference databases, and contaminant peaks present in the spectra. Testing on original MS data from bones discovered at Palaeolithic archaeological sites, including Denisova Cave in Russia, as well as using publicly available, externally produced data, we achieved >90% accuracy at the genus level and ~92% accuracy at the family level for mammalian bone collagen previously analysed manually. Availability and implementation: The SpecieScan algorithm, along with the raw data used in testing, results, reference database, and common contaminants lists, is freely available on GitHub (https://github.com/mesve/SpecieScan).
AB - Motivation: Zooarchaeology by Mass Spectrometry (ZooMS) is a palaeoproteomics method for the taxonomic determination of collagen, which traditionally involves challenging manual spectra analysis with limitations in quantitative results. As the ZooMS reference database expands, a faster and reproducible identification tool is necessary. Here we present SpecieScan, an open-access algorithm for automating taxa identification from raw MALDI-ToF mass spectrometry (MS) data. Results: SpecieScan was developed using R (pre-processing) and Python (automation). The algorithm’s output includes identified peptide markers, closest matching taxonomic group (taxon, family, order), correlation scores with the reference databases, and contaminant peaks present in the spectra. Testing on original MS data from bones discovered at Palaeolithic archaeological sites, including Denisova Cave in Russia, as well as using publicly available, externally produced data, we achieved >90% accuracy at the genus level and ~92% accuracy at the family level for mammalian bone collagen previously analysed manually. Availability and implementation: The SpecieScan algorithm, along with the raw data used in testing, results, reference database, and common contaminants lists, is freely available on GitHub (https://github.com/mesve/SpecieScan).
KW - species identification
KW - bioinformatics
KW - Phylogenetics
KW - MALDI-TOF-MS
KW - ZooMS
UR - https://cris.vub.be/en/publications/speciescan-semiautomated-taxonomic-identification-of-bone-collagen-peptides-from-malditof-ms(9bf4ce04-1ace-402c-8ad5-8dd1d779db1b).html
UR - http://www.scopus.com/inward/record.url?scp=85187200168&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btae054
DO - 10.1093/bioinformatics/btae054
M3 - Article
C2 - 38337062
SN - 1367-4803
VL - 40
SP - 1
EP - 12
JO - Bioinformatics
JF - Bioinformatics
IS - 3
M1 - btae054
ER -