In a study published in May 2023 in the journal Nature Communications, researchers from the Institute for Protein Design at the University of Washington and the Howard Hughes Medical Institute demonstrated that deep learning methods can substantially improve computational protein design. By augmenting existing energy-based physical models, the researchers achieved a ten-fold increase in the success rate of designed proteins binding to their target proteins. This advance opens up new possibilities in drug development, especially in designing more effective treatments against diseases like cancer and COVID-19.
Understanding the protein search space challenge
Proteins, particularly those related to diseases, play a crucial role in medical research. However, their enormous complexity presents a significant challenge. For instance, a typical protein studied in the lab is composed of 65 amino acids, and with 20 different amino acid choices at each position, there are 20 to the power of 65 possible sequences (roughly 10 to the power of 84), exceeding the estimated number of atoms in the universe. This vast search space makes it computationally infeasible to design proteins by exhaustively evaluating every candidate sequence.
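The size of this search space can be checked with a few lines of arithmetic. The sketch below is illustrative only: the figure of about 10 to the power of 80 atoms in the observable universe is a commonly cited rough estimate and is an assumption here, not a number taken from the study.

```python
import math

# Figures from the text: 20 amino acid choices at each of 65 positions.
AMINO_ACID_CHOICES = 20
SEQUENCE_LENGTH = 65

# Assumption: a commonly cited rough estimate of the number of atoms
# in the observable universe is about 10**80.
ATOMS_IN_UNIVERSE = 10 ** 80

# Each position is chosen independently, so the counts multiply.
n_sequences = AMINO_ACID_CHOICES ** SEQUENCE_LENGTH

# Order of magnitude of the search space: 65 * log10(20).
log10_sequences = SEQUENCE_LENGTH * math.log10(AMINO_ACID_CHOICES)

print(f"Possible sequences: about 10^{log10_sequences:.1f}")
print(f"Exceeds atom estimate: {n_sequences > ATOMS_IN_UNIVERSE}")
```

Python integers have arbitrary precision, so the exact count can be computed directly rather than approximated with floating point.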
The role of deep learning
Deep learning, a subset of artificial intelligence, uses algorithms that analyze patterns in data and extract higher-level features from raw input. In this study, researchers used deep learning to learn iterative transformations of protein sequences and structures, rapidly converging on highly accurate models. With deep learning methods, the team evaluated the overall quality of interfaces where hydrogen bonds and hydrophobic interactions formed, rather than attempting to enumerate all the energies individually.
The deep learning-augmented de novo protein binder design protocol
The researchers developed an AI-augmented pipeline for protein design that integrated deep learning tools such as AlphaFold2 and RoseTTAFold. They used the Frontera supercomputer, funded by the National Science Foundation, to run the protein design trajectories in parallel, significantly increasing computational efficiency. The RifDock docking program generated millions of protein ‘docks’ representing interactions between potentially bound protein structures. These docks were then split into smaller chunks and assigned to Frontera’s compute nodes for processing.
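The idea of splitting a large set of docks into chunks and handing them to compute nodes can be sketched as follows. This is a minimal illustration, not the researchers' actual pipeline: the function names `chunked` and `assign_round_robin`, the round-robin policy, the chunk size, and the placeholder dock identifiers are all hypothetical.

```python
from typing import Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `chunk_size` entries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])


def assign_round_robin(chunks: List[List[str]], n_nodes: int) -> Dict[int, List[List[str]]]:
    """Assign chunks to node indices 0..n_nodes-1 in rotating order (hypothetical policy)."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    assignment: Dict[int, List[List[str]]] = {node: [] for node in range(n_nodes)}
    for index, chunk in enumerate(chunks):
        assignment[index % n_nodes].append(chunk)
    return assignment


# Hypothetical stand-ins for dock records; real docks would be structure files.
docks = [f"dock_{i}" for i in range(10)]
chunks = list(chunked(docks, chunk_size=3))
assignment = assign_round_robin(chunks, n_nodes=2)

for node, node_chunks in assignment.items():
    print(f"node {node}: {sum(len(c) for c in node_chunks)} docks in {len(node_chunks)} chunks")
```

Because each chunk can be processed independently, this kind of workload scales naturally across many nodes. A real cluster would typically hand the chunks to a job scheduler rather than assign them statically.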
Improving computational efficiency
The team also used ProteinMPNN, a neural network-based software tool developed by the Institute for Protein Design, which made generating protein sequences more than 200 times more computationally efficient than previous methods.
Data and experimental validation
The modeling data used in the study were derived from yeast surface display binding data collected by the Institute for Protein Design. By introducing different strands of DNA into yeast, the researchers expressed various designed proteins on the yeast cell surface, allowing them to sort the cells based on their binding abilities. The success rate of the designed structures binding to their target protein was verified experimentally and showed a ten-fold increase.
Although the study results demonstrated significant progress, the researchers acknowledge that there is still much work to be done. They aim to increase the success rate even further, especially when dealing with more challenging targets such as viruses and cancer T-cell receptors. To achieve this, they plan to optimize their software tools and explore increased sampling strategies.
The successful integration of deep learning methods into computational protein design offers a promising avenue for revolutionizing drug development. By leveraging AI technologies and powerful supercomputers like Frontera, researchers have unlocked new possibilities in understanding proteins and designing more effective drugs against diseases like cancer and COVID-19. As advancements in AI continue, the future holds even more potential to transform the field of medicine and save countless lives with improved treatments.