
Deciphering Molecular Embeddings with Centered Kernel Alignment

Publications: Contribution to journal, Article, Peer Reviewed

Abstract

Analyzing machine learning models, especially nonlinear ones, poses significant challenges. In this context, centered kernel alignment (CKA) has emerged as a promising model analysis tool that assesses the similarity between two embeddings. CKA's efficacy depends on selecting a kernel that adequately captures the underlying properties of the compared models. CKA was designed for neural networks (NNs), with their invariance to data rotation in mind, and has been successfully employed in various scientific domains. However, CKA has rarely been adopted in cheminformatics, partly because of the popularity of the random forest (RF) machine learning algorithm, which is not rotationally invariant. In this work, we present an adaptation of CKA that builds on the RF kernel to match the properties of RF. As part of the method validation, we show that CKA with the RF kernel correlates well with the prediction similarity of RF models. Furthermore, we demonstrate how CKA with the RF kernel can be used to analyze and explain the behavior of RF models derived from molecular and rooted fingerprints.
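As a rough illustration of the quantities the abstract refers to, the sketch below computes CKA from two kernel matrices and builds two example kernels. It is not the authors' implementation. It assumes the common definition of the RF kernel as the fraction of trees in which two samples reach the same leaf, which the abstract itself does not spell out. The function names (`center`, `cka`, `linear_kernel`, `rf_kernel`) and the toy data are hypothetical, and plain Python is used so the example runs without dependencies.

```python
import math

def center(K):
    """Double-center a square kernel matrix (subtract row and column means, add grand mean)."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def frobenius_inner(A, B):
    """Sum of element-wise products of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def cka(K, L):
    """CKA of two kernel matrices over the same samples: normalized alignment of the centered kernels."""
    Kc, Lc = center(K), center(L)
    return frobenius_inner(Kc, Lc) / math.sqrt(
        frobenius_inner(Kc, Kc) * frobenius_inner(Lc, Lc)
    )

def linear_kernel(X):
    """Dot-product kernel; unchanged when every sample is rotated by the same rotation."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def rf_kernel(leaves):
    """Assumed RF kernel: leaves[i][t] is the leaf that sample i reaches in tree t;
    the kernel value is the fraction of trees in which two samples share a leaf."""
    n_trees = len(leaves[0])
    return [[sum(a == b for a, b in zip(li, lj)) / n_trees for lj in leaves]
            for li in leaves]

# Toy embedding and the same embedding rotated by 90 degrees: (x, y) -> (-y, x).
X = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 1.5]]
X_rot = [[-y, x] for x, y in X]
cka_rotated = cka(linear_kernel(X), linear_kernel(X_rot))

# Hypothetical leaf assignments of four samples in three trees, for two forests.
leaves_a = [[0, 1, 2], [0, 1, 3], [1, 2, 2], [1, 2, 3]]
leaves_b = [[5, 7, 9], [5, 7, 9], [6, 8, 9], [6, 8, 4]]
cka_forests = cka(rf_kernel(leaves_a), rf_kernel(leaves_b))
```

In this sketch `cka_rotated` equals 1 up to floating-point error, which reflects the rotation invariance the abstract attributes to the NN setting, while `cka_forests` depends only on which samples share leaves and not on the input coordinates. With a fitted scikit-learn forest, a leaf-index matrix of the shape expected by `rf_kernel` can be obtained from the forest's `apply` method.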

Original language: English
Pages (from-to): 7303-7312
Number of pages: 10
Journal: Journal of Chemical Information and Modeling
Volume: 64
Issue number: 19
DOIs
Publication status: Published - 14 Oct 2024

Austrian Fields of Science 2012

  • 102004 Bioinformatics
  • 102001 Artificial intelligence
  • 102019 Machine learning

Keywords

  • Machine Learning
  • Neural Networks, Computer
  • Algorithms
  • Cheminformatics/methods
  • Models, Molecular
