
Statistical approaches enabling technology-specific assay interference prediction from large screening data sets

  • Vincenzo Palmacci
  • Steffen Hirte
  • Jorge Enrique Hernández González
  • Floriane Montanari
  • Johannes Kirchmair

Publications: Contribution to journal, Article, Peer Reviewed

Abstract

High-throughput screening (HTS) technologies allow the biological testing of hundreds of thousands of compounds per day. Typically, a substantial proportion of the initial hits obtained by HTS are artifacts caused by assay interference. Therefore, global and technology-specific in silico models for identifying and predicting compounds interfering with biological assays have been developed. The global models benefit from training on large screening data sets, while the specialized models benefit from training on assay technology-specific experimental data. In this work, we develop and explore strategies for generating better predictors of technology-specific assay interference by utilizing the large bioactivity data matrices global models are trained on and employing partially new compound labeling approaches to maintain the assay technology awareness of specialized models. We demonstrate the utility of the statistically derived interference labels in machine learning using fluorescence-based assay interference as a representative example. Our random forest and multi-layer perceptron classifiers showed improved performance compared to existing models, achieving Matthews correlation coefficients (MCCs) of up to 0.47 on holdout data and up to 0.45 on an external test set. These results demonstrate that accurate assay-specific interference labels can be derived from large bioactivity data matrices, enabling the development of new machine-learning models without the need for further experimental data.
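
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a Matthews correlation coefficient is computed from the four confusion-matrix counts of a binary classifier (here: interfering vs. non-interfering compounds). The function name `mcc_from_counts` and the example counts are hypothetical and chosen only for illustration; they are not taken from the article, and this is not the authors' evaluation code.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. Returns 0.0 when the denominator is zero (a common
    convention when a whole row or column of the matrix is empty).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical, imbalanced example: few interfering compounds among many clean ones.
example_mcc = mcc_from_counts(tp=40, tn=900, fp=30, fn=30)
print(f"MCC = {example_mcc:.3f}")
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction). Because it uses all four counts, it tends to be more informative than accuracy on imbalanced screening data, where non-interfering compounds typically dominate.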

Original language: English
Article number: 100099
Journal: Artificial Intelligence in the Life Sciences
Volume: 5
Publication status: Published - Jun 2024

Funding

VP and JK gratefully acknowledge the support from the European Commission's Horizon 2020 Framework Programme (AIDD; grant no. 956832). JEHG thanks the Sao Paulo Research Foundation (Grant: 2022/03901-8) for financial support. JK gratefully acknowledges the financial support received for the Christian Doppler Laboratory for Molecular Informatics in the Biosciences by the Austrian Federal Ministry of Labour and Economy, the Austrian National Foundation for Research, Technology and Development, the Christian Doppler Research Association, Boehringer-Ingelheim RCV GmbH & Co KG and BASF SE. We thank Roxane Jacob and Matthias Welsch from the University of Vienna, Christian Doppler Laboratory for Molecular Informatics in the Biosciences, and Vincent-Alexander Scholtz, also from the University of Vienna, for their insightful discussions regarding the development of machine learning models. We thank Djork-Arné Clevert from Bayer AG Berlin for providing valuable feedback, and Stefan Mundt and Michael Koch from Bayer AG Wuppertal for enabling us to work with their measured data and providing expert advice. Furthermore, we thank the anonymous reviewers for their valuable feedback and suggestions. During the preparation of this work, the authors used ChatGPT to improve the readability of the manuscript. After using this tool/service, the authors reviewed and edited the content as needed and took full responsibility for the content of the publication.

Austrian Fields of Science 2012

  • 102004 Bioinformatics
  • 301207 Pharmaceutical chemistry

Keywords

  • Assay interfering compounds
  • Biological assays
  • Fluorescence
  • High-throughput screening
  • Machine learning
