Abstract
Using the growing wealth of chemical reaction data can boost synthesis planning and increase success rates. Yet the effectiveness of machine learning tools for retrosynthesis planning and forward reaction prediction relies on accessible, well-curated data presented in a structured format. Although some public and licensed reaction databases exist, they often lack essential information about reaction conditions. To address this issue and promote the principles of findable, accessible, interoperable, and reusable (FAIR) data reporting and sharing, we introduce the Simple User-Friendly Reaction Format (SURF). SURF standardizes the documentation of reaction data through a structured tabular format that requires only a basic understanding of spreadsheets. It enables chemists to record the synthesis of molecules in a form that both humans and machines can read, which facilitates seamless sharing and integration directly into machine learning pipelines. SURF files are designed to be interoperable, easily imported into relational databases, and convertible into other formats. SURF thus complements existing initiatives like the Open Reaction Database (ORD) and Unified Data Model (UDM). At Roche, SURF plays a crucial role in democratizing FAIR reaction data sharing and expediting the chemical synthesis process.
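As a rough illustration of what "tabular and easily imported into relational databases" can mean in practice, the sketch below parses a small tab-delimited reaction table and copies it into an in-memory SQLite table. The column names (`rxn_id`, `solvent`, `temperature_c`, `product_smiles`, `yield_percent`), the sample rows, and the helper functions are hypothetical and chosen only for this example; they are not taken from the SURF specification.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited reaction table; these column names and rows are
# illustrative assumptions, not the published SURF specification.
SAMPLE = (
    "rxn_id\tsolvent\ttemperature_c\tproduct_smiles\tyield_percent\n"
    "r001\tTHF\t25\tCCO\t82\n"
    "r002\tDMF\t80\tCC(=O)O\t\n"
)


def load_reactions(text: str) -> list[dict]:
    """Parse tab-delimited text into one dict per reaction row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def to_sqlite(rows: list[dict]) -> sqlite3.Connection:
    """Copy the parsed rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reactions (rxn_id TEXT PRIMARY KEY, solvent TEXT, "
        "temperature_c REAL, product_smiles TEXT, yield_percent REAL)"
    )
    conn.executemany(
        "INSERT INTO reactions VALUES "
        "(:rxn_id, :solvent, :temperature_c, :product_smiles, :yield_percent)",
        # Store empty cells as NULL so missing values stay explicit.
        [{k: (v if v != "" else None) for k, v in row.items()} for row in rows],
    )
    return conn


rows = load_reactions(SAMPLE)
conn = to_sqlite(rows)
```

Once the rows are in a table, ordinary SQL applies, for example `conn.execute("SELECT rxn_id FROM reactions WHERE yield_percent IS NULL")` to list reactions with no recorded yield. Keeping one reaction per row and one property per column is what makes this kind of import straightforward.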
| Original language | English |
|---|---|
| Article number | e202400361 |
| Journal | Molecular Informatics |
| Volume | 44 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 23 Jan 2025 |
| Externally published | Yes |
Austrian Fields of Science 2012
- 102019 Machine learning
- 102033 Data mining
- 102035 Data science
- 104015 Organic chemistry
Keywords
- chemical reactions
- machine learning
- FAIR data