Scientists from the University of Eastern Finland have used machine learning to sharply reduce the time required for virtual drug screening. The work was carried out in partnership with industry experts and drew on the computational capacity of supercomputers.
This article outlines the methodology and its implications for the pharmaceutical sector and the broader scientific community.
Pioneering virtual screening
Virtual screening, a pivotal phase in the search for novel drug compounds, is being transformed by the integration of machine learning. Researchers have long relied on rapid computer-aided techniques to sift through extensive compound libraries in search of agents that act on a specific drug target.
However, these libraries have grown faster than the capabilities of even the most advanced supercomputers. As a result, screening a billion-scale compound library against a single drug target can take several months or even years, a formidable challenge for the industry.
The landmark study
In a recent publication in the Journal of Chemical Information and Modeling, Dr. Ina Pöhner and her colleagues from the School of Pharmacy at the University of Eastern Finland, in partnership with CSC – IT Center for Science Ltd., Finland's foremost supercomputing provider, and industrial collaborators from Orion Pharma, set out to harness machine learning for giga-scale virtual screening.
The researchers first established a baseline before introducing artificial intelligence into their workflow. They ran a virtual screening campaign that assessed 1.56 billion drug-like molecules against two pharmacologically relevant targets. This exhaustive undertaking took almost half a year and was executed on the supercomputers Mahti and Puhti using molecular docking, a computational method that evaluates how well a small molecule fits a target’s binding site.
Machine learning at the forefront
Following the intensive docking phase, the results were compared with a machine learning-boosted screening process run through HASTEN, a tool developed by Dr. Tuomo Kalliokoski of Orion Pharma. HASTEN uses machine learning to learn how the properties of molecules affect their docking scores. Trained on data generated by conventional docking, the model can then quickly predict docking scores for the other compounds in the library, which is far faster than the resource-intensive brute-force approach of docking every compound.
The results were striking. With just 1% of the complete library docked and used as training data, HASTEN identified 90% of the highest-scoring compounds in under ten days. This is a substantial gain in efficiency over the brute-force campaign and shows how machine learning can speed up drug discovery.
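The idea described above (dock a small sample, train a model on those scores, then rank the rest of the library by predicted score) can be illustrated with a toy sketch. This is not HASTEN's code or the study's pipeline: the descriptors and "docking scores" are synthetic, a plain least-squares fit stands in for the machine learning model, and the function names `recall_of_top_hits` and `run_toy_screen`, the library size, and the 5% selection budget are all assumptions made for illustration.

```python
import numpy as np


def recall_of_top_hits(true_scores, predicted_scores, n_top, n_selected):
    """Fraction of the n_top best true scores that appear among the
    n_selected compounds with the best predicted scores.
    Assumption: lower score means better, as is common for docking scores."""
    true_top = set(np.argsort(true_scores)[:n_top].tolist())
    selected = set(np.argsort(predicted_scores)[:n_selected].tolist())
    return len(true_top & selected) / n_top


def run_toy_screen(n_compounds=20000, n_features=16, train_fraction=0.01, seed=0):
    """Toy version of ML-boosted screening on purely synthetic data."""
    rng = np.random.default_rng(seed)

    # Synthetic molecular descriptors and synthetic "docking scores".
    # In a real campaign the scores would come from a docking program.
    features = rng.normal(size=(n_compounds, n_features))
    weights = rng.normal(size=n_features)
    scores = features @ weights + rng.normal(scale=0.5, size=n_compounds)

    # "Dock" a random 1% of the library and use it as training data.
    n_train = int(n_compounds * train_fraction)
    train_idx = rng.choice(n_compounds, size=n_train, replace=False)

    # Stand-in model: linear least squares with an intercept column.
    design = np.column_stack((features[train_idx], np.ones(n_train)))
    coef, *_ = np.linalg.lstsq(design, scores[train_idx], rcond=None)

    # Predict scores for the whole library without docking it.
    full_design = np.column_stack((features, np.ones(n_compounds)))
    predicted = full_design @ coef

    # How many of the true top 1% are in the predicted top 5%?
    n_top = n_compounds // 100
    n_selected = n_compounds // 20
    return recall_of_top_hits(scores, predicted, n_top, n_selected)


if __name__ == "__main__":
    print(f"Recall of true top 1%: {run_toy_screen():.2f}")
```

Calling `run_toy_screen()` returns the share of the true top 1% of compounds recovered within the 5% ranked best by the model, which is the same kind of recall figure as the 90% quoted above. Only the selected compounds would then need to be docked for real, which is where the time saving comes from.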
The success of this research reflects close cooperation between academia and industry. Professor Antti Poso, who leads the computational drug discovery group within the University of Eastern Finland’s DrugTech Research Community, underscored the pivotal role that CSC’s computational resources played in achieving the project's goals. The collaboration shows how combining ideas, resources, and advanced technology can move scientific research forward.
Open data: a catalyst for future innovation
Recognizing the importance of sharing knowledge, the study’s authors have made substantial datasets publicly available. These include a ready-to-use screening library for docking, which lets other researchers speed up their own screening efforts.
In addition, the full set of docking results for 1.56 billion compounds against the two targets has been released as benchmarking data. This openness not only fosters collaboration but also supports the development of tools that streamline processes, conserve resources, and advance the field of computational drug discovery.