Dealing with a data-limited regime: Combining transfer learning and transformer attention mechanism to increase aqueous solubility prediction performance

Magdalena Wiercioch (Corresponding author), Johannes Kirchmair

Publications: Contribution to journal › Article › Peer Reviewed

Abstract

Aqueous solubility is a key chemical property that drives various processes in chemistry and biology. Its computational prediction is challenging, as evidenced by the fact that it has been a subject of considerable interest for several decades. Recent work has explored fingerprint-based, feature-based and graph-based representations with different machine learning and deep learning methodologies. In general, many traditional methods have been proposed, but they rely heavily on the quality of the rule-based, hand-crafted features. On the other hand, limitations in the quality of aqueous solubility data become a handicap when training deep models. In this study, we have developed a novel structure-aware method for the prediction of aqueous solubility by introducing a new deep network architecture and then employing a transfer learning approach. The model was proven to be competitive, obtaining an RMSE of 0.587 during both cross-validation and a test on an independent dataset. To be more precise, the method is evaluated on molecules downloaded from the Online Chemical Database and Modeling Environment (OCHEM). Beyond aqueous solubility prediction, the strategy presented in this work may be useful for modeling any kind of (chemical or biological) properties for which there is a limited amount of data available for model training.

Original language: English
Article number: 100021
Journal: Artificial Intelligence in the Life Sciences
Volume: 1
Publication status: Published - Dec 2021

Austrian Fields of Science 2012

  • 106005 Bioinformatics
  • 301207 Pharmaceutical chemistry
  • 102019 Machine learning
  • 102018 Artificial neural networks

Keywords

  • Aqueous solubility
  • Cheminformatics
  • Deep Learning
  • Drug discovery
  • Regression
  • Transformer model
