AI in chemistry and beyond: Machine learning for reactivity using expert descriptors and mechanistic information

Event details

Date 09.05.2023
Hour 15:15 - 16:15
Speaker Kjell Jorner has been an Assistant Professor of Digital Chemistry at ETH Zurich since January 2023. His work focuses on accelerating chemical discovery with digital tools, with a special emphasis on reactivity and catalysis. His group does interdisciplinary research, drawing on computational chemistry, cheminformatics and machine learning. Before joining ETH Zurich, he was a postdoctoral researcher with Alán Aspuru-Guzik (2021-2022) and at AstraZeneca UK (2018-2020). Kjell holds a PhD from Uppsala University (2018) on computational physical organic chemistry for the photochemistry of aromatic compounds.
Location
Category Conferences - Seminars
Event Language English

Deep learning based on string or graph representations of molecules has made great progress in the last few years. Important applications include synthesis prediction, protein structure prediction and machine learning potentials. All of these applications benefit from an abundance of well-curated datasets on the order of at least hundreds of thousands of points. For many applications in chemistry, datasets are much smaller, on the order of tens or hundreds of datapoints. Here, machine learning with classical methods based on expert-picked descriptors has been the gold standard. These descriptors are mostly problem-specific and often calculated with quantum-chemical software. During the last few years, we have developed the open-source Morfeus Python package for calculating descriptors, mainly related to catalysis and reactivity. Morfeus was, for example, used to calculate descriptors for the Kraken database of phosphine ligand properties.

While most reactivity models include only information on reactants and/or products, increased accuracy can be obtained by including information from high-energy intermediates and transition states along the reaction path. We will highlight our work on using mechanistic information to predict activation energies and selectivities for the nucleophilic aromatic substitution reaction, which represents ~9% of all reactions carried out in the pharmaceutical industry.

Although including mechanistic information can improve model performance, obtaining the required high-energy structures is time-consuming and not robust. We have therefore worked on faster and more robust methods based on force fields in the Polanyi package. Using Polanyi, we recently created the first reactivity benchmark task for generative models in the Tartarus suite. Tartarus allows comparison of generative models on tasks which are more chemically realistic and challenging than conventional tasks such as logP or QED optimization.