"Machine learning in chemistry and beyond" (ChE-651) seminar by Cory Simon "Classifying the toxicity of pesticides to honey bees via support vector machines with random walk graph kernels"


Event details

Date 07.06.2022 17:15-18:15  
Speaker Cory Simon hails from a small town in Ohio. He earned his B.S. in Chemical Engineering from the University of Akron. He then studied mathematics at the University of British Columbia in Vancouver, Canada for two years. In 2016, he earned his Ph.D. in Chemical Engineering from the University of California, Berkeley. He conducted scientific research at Virginia Tech, Okinawa Institute of Science and Technology, Lawrence Berkeley National Laboratory, École Polytechnique Fédérale de Lausanne, and Altius Institute for Biomedical Sciences, and interned in industry at Bridgestone Research (chemical engineering) and Stitch Fix (data science). Since 2017, Cory has been an assistant professor at Oregon State University in the School of Chemical, Biological, and Environmental Engineering. His research group employs molecular models and simulations, machine learning, and statistical mechanics to discover nanoporous materials for gas storage, separations, and sensing. Cory digs hiking/backpacking in scenic places, snowboarding, wine, and going on walks with his dog, Oslo.
Location Online
Category Conferences - Seminars
Event Language English

Pesticides benefit agriculture by increasing crop yield, quality, and security. However, pesticides may inadvertently harm bees, which are valuable as pollinators. Thus, candidate pesticides in development pipelines must be assessed for toxicity to bees. 

Leveraging a data set of 382 molecules with toxicity labels from honey bee exposure experiments, we train a support vector machine (SVM) to predict the toxicity of pesticides to honey bees. We compare two representations of the pesticide molecules: (i) a random walk feature vector listing counts of length-L walks on the molecular graph with each vertex- and edge-label sequence and (ii) the MACCS structural key fingerprint (FP), a bit vector indicating the presence/absence of a list of pre-defined subgraph patterns in the molecular graph. We explicitly construct the MACCS FPs, but rely on the fixed-length-L random walk graph kernel (RWGK) in place of the dot product for the random walk representation. 
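To make the random walk representation concrete, here is a minimal Python sketch of the explicit feature map: it counts length-L walks by their vertex- and edge-label sequence and takes the dot product of two such count vectors. This is only an illustration under stated assumptions, not the speaker's implementation: the function names, the toy graphs, and the convention of counting a walk once per traversal direction are all hypothetical, and a practical kernel would usually avoid enumerating walks explicitly.

```python
from collections import Counter

def walk_label_counts(vertex_labels, edges, L):
    """Count length-L walks, grouped by their vertex/edge label sequence.

    vertex_labels: list of labels, indexed by vertex id
    edges: dict mapping an undirected edge (u, v) to its label
    Assumption: walks may revisit vertices, and each traversal
    direction is counted as a separate walk.
    """
    # Adjacency list storing (neighbor, edge label) in both directions.
    adj = {u: [] for u in range(len(vertex_labels))}
    for (u, v), lab in edges.items():
        adj[u].append((v, lab))
        adj[v].append((u, lab))
    # A partial walk is (current vertex, label sequence so far).
    frontier = [(v, (vertex_labels[v],)) for v in adj]
    for _ in range(L):
        frontier = [(w, seq + (lab, vertex_labels[w]))
                    for v, seq in frontier
                    for w, lab in adj[v]]
    return Counter(seq for _, seq in frontier)

def rw_kernel(g1, g2, L):
    """Dot product of the two walk-count feature vectors."""
    c1 = walk_label_counts(*g1, L)
    c2 = walk_label_counts(*g2, L)
    return sum(n * c2[seq] for seq, n in c1.items())

# Toy heavy-atom graphs (hypothetical examples): C-C-O and C-O-C,
# with every bond labeled 1 for "single".
ethanol = (["C", "C", "O"], {(0, 1): 1, (1, 2): 1})
ether = (["C", "O", "C"], {(0, 1): 1, (1, 2): 1})

print(rw_kernel(ethanol, ether, 1))    # 4
print(rw_kernel(ethanol, ethanol, 1))  # 6
```

With L=1 the features are simply labeled bonds, so the two molecules share only their C-O walks. Larger L captures longer label sequences, at the cost of a feature space that grows quickly, which is one reason to evaluate the kernel without building the vectors.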

The L-RWGK-SVM achieves an accuracy, precision, recall, and F1 score (mean over 2000 runs) of 0.81, 0.68, 0.71, and 0.69 on the test data set, with L=4 the modal (most frequently optimal) walk length. The MACCS-FP-SVM performs on par with or marginally better than the L-RWGK-SVM and lends more interpretability, but varies more in performance. We interpret the MACCS-FP-SVM by illuminating which subgraph patterns in the molecules tend to strongly push them towards the toxic/non-toxic side of the separating hyperplane. 
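For readers less familiar with the four reported metrics, the following Python sketch shows how each is computed from binary predictions on a single test set. The function name and the toy labels are hypothetical and do not come from the study; treating "toxic" as the positive class (encoded as 1) is also an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

# Toy labels (made up for illustration): 1 = toxic, 0 = non-toxic.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(acc, prec, rec, f1)
```

In a repeated-run setting such as the one described above, these values would be computed for each run and then averaged.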

Practical information

  • Informed public
  • Free


  • Kevin Maik Jablonka, Solène Oberli, Puck van Gerwen