Conditional enzyme design with unsupervised language models


Date 04.09.2024
Hour 15:30-16:30
Speaker Noelia Ferruz
CH G1 495
Event Language English

Artificial Intelligence (AI) methods are emerging as powerful tools in fields such as Natural Language Processing (NLP) and Computer Vision (CV), impacting the applications we use in our daily lives. Language models have recently shown remarkable performance at understanding and generating human text, often producing text indistinguishable from that written by humans. Inspired by these advances, we trained ZymCTRL, a language model trained on enzyme sequences and their associated Enzyme Commission (EC) numbers. By combining each sequence with its respective catalytic function, the model has learned a joint distribution of the sequence patterns that govern function. To assess the quality of the generated sequences and their validity in real-life scenarios, we thoroughly tested the model using carbonic anhydrases and lactate dehydrogenases. In all cases, the model generated enzymes whose activities aligned with those of their natural counterparts, even at sequence identities as low as 40%. Lastly, we have trained REXzyme, a translation machine capable of designing enzyme sequences for user-defined chemical reactions.

Protein Design, Artificial Intelligence, Enzymes

