Rapid traversal of vast chemical space using machine learning-guided docking screens

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Medicina Molecular e Enfermidades Crónicas (CiMUS)
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Farmacoloxía, Farmacia e Tecnoloxía Farmacéutica
dc.contributor.author: Luttens, Andreas
dc.contributor.author: Cabeza de Vaca, Israel
dc.contributor.author: Sparring, Leonard
dc.contributor.author: Brea Floriani, José Manuel
dc.contributor.author: Martínez Rodríguez, Antón Leandro
dc.contributor.author: Kahlous, Nour Aldin
dc.contributor.author: Radchenko, Dmytro S.
dc.contributor.author: Moroz, Yurii S.
dc.contributor.author: Loza García, María Isabel
dc.contributor.author: Norinder, Ulf
dc.contributor.author: Carlsson, Jens
dc.date.accessioned: 2026-01-29T13:04:04Z
dc.date.available: 2026-01-29T13:04:04Z
dc.date.issued: 2025-03-13
dc.description.abstract: The accelerating growth of make-on-demand chemical libraries provides unprecedented opportunities to identify starting points for drug discovery with virtual screening. However, these multi-billion-scale libraries are challenging to screen, even for the fastest structure-based docking methods. Here we explore a strategy that combines machine learning and molecular docking to enable rapid virtual screening of databases containing billions of compounds. In our workflow, a classification algorithm is trained to identify top-scoring compounds based on molecular docking of 1 million compounds to the target protein. The conformal prediction framework is then used to make selections from the multi-billion-scale library, reducing the number of compounds to be scored by docking. The CatBoost classifier showed an optimal balance between speed and accuracy and was used to adapt the workflow for screens of ultralarge libraries. Application to a library of 3.5 billion compounds demonstrated that our protocol can reduce the computational cost of structure-based virtual screening by more than 1,000-fold. Experimental testing of predictions identified ligands of G protein-coupled receptors and demonstrated that our approach enables discovery of compounds with multi-target activity tailored for therapeutic effect.
dc.description.peerreviewed: Yes
dc.description.sponsorship: A.L. was supported by a postdoctoral scholarship from the Knut and Alice Wallenberg Foundation (KAW2022.0347). J.C. received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement 715052), the Swedish Cancer Society, the Swedish Research Council and the Olle Engkvist Foundation. This research was partially supported by the project AI4Research at Uppsala University. I.C.d.V. was funded by a postdoctoral fellowship provided by the Sven och Lilly Lawski foundation. The computations were enabled using resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS) (partially funded by the Swedish Research Council through grant agreement number 2022-06725) and the supercomputing resource Berzelius provided by the National Supercomputer Centre at Linköping University and the Knut and Alice Wallenberg Foundation. J.B., A.L.M. and M.I.L. were funded by Agencia Estatal de Investigación (PID2020-119428RB-I00), Xunta de Galicia (ED431C 2022/20) and European Regional Development Fund (ERDF). A.L., I.C.d.V. and J.C. thank OpenEye Scientific Software for the use of OEToolkits at no cost. We thank J. Zhang for providing the initial deep neural network code.
dc.identifier.citation: Luttens, A., Cabeza de Vaca, I., Sparring, L. et al. Rapid traversal of vast chemical space using machine learning-guided docking screens. Nat Comput Sci 5, 301–312 (2025). https://doi.org/10.1038/s43588-025-00777-x
dc.identifier.doi: 10.1038/s43588-025-00777-x
dc.identifier.essn: 2662-8457
dc.identifier.uri: https://hdl.handle.net/10347/45594
dc.issue.number: 5
dc.journal.title: Nature Computational Science
dc.language.iso: eng
dc.page.final: 312
dc.page.initial: 301
dc.publisher: Springer Nature
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/715052/
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-119428RB-I00/ES/NUEVA APROXIMACION EXPERIMENTAL PARA LA IDENTIFICAICON DE ANTIPSICOTICOS ACTIVOS FRENTE AL DEFICIT COGNITIVO EN ESQUIZOFRENIA
dc.relation.publisherversion: https://doi.org/10.1038/s43588-025-00777-x
dc.rights: © The Author(s) 2025. This article is licensed under a Creative Commons Attribution 4.0 International License
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Cheminformatics
dc.subject: Computational chemistry
dc.subject: Machine learning
dc.subject: Structure-based drug design
dc.subject: Virtual drug screening
dc.title: Rapid traversal of vast chemical space using machine learning-guided docking screens
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 67b19be7-64a8-45c8-a6e4-ed48a4410ef8
relation.isAuthorOfPublication: efe7f464-2f77-4a92-915f-fda4128451fa
relation.isAuthorOfPublication: 7765cb9b-b630-44dc-9477-dd266a62bb3c
relation.isAuthorOfPublication.latestForDiscovery: 67b19be7-64a8-45c8-a6e4-ed48a4410ef8
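
Illustrative note on the selection step described in the abstract: the sketch below shows one way a conformal predictor could choose which library compounds to send to docking. It is not the authors' code. It assumes a class-conditional (Mondrian) inductive conformal predictor, which the record does not specify, and it takes the classifier's "top-scoring" probabilities as plain lists; in practice these could come from a trained model such as the CatBoost classifier named in the abstract. The function names (calibrate, p_values, select_for_docking) and the selection rule (keep a compound only when "top-scoring" is the sole label retained at significance epsilon) are hypothetical choices made for illustration.

```python
from bisect import bisect_left


def nonconformity(p_active, label):
    """Higher value = the compound looks less like the given class."""
    return 1.0 - p_active if label == 1 else p_active


def calibrate(cal_probs, cal_labels):
    """Collect sorted nonconformity scores separately for each class."""
    scores = {0: [], 1: []}
    for p_active, label in zip(cal_probs, cal_labels):
        scores[label].append(nonconformity(p_active, label))
    return {label: sorted(values) for label, values in scores.items()}


def p_values(cal_scores, p_active):
    """Conformal p-value of each class label for one library compound."""
    result = {}
    for label in (0, 1):
        alpha = nonconformity(p_active, label)
        ranked = cal_scores[label]
        # Count calibration compounds at least as nonconforming as this one.
        n_ge = len(ranked) - bisect_left(ranked, alpha)
        result[label] = (n_ge + 1) / (len(ranked) + 1)
    return result


def select_for_docking(cal_scores, library_probs, epsilon):
    """Indices of compounds whose prediction set is exactly {1} (assumed rule)."""
    selected = []
    for index, p_active in enumerate(library_probs):
        pv = p_values(cal_scores, p_active)
        if pv[1] > epsilon and pv[0] <= epsilon:
            selected.append(index)
    return selected


if __name__ == "__main__":
    # Toy calibration data: predicted probability of "top-scoring" and true label.
    cal = calibrate(
        [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
        [1, 1, 1, 0, 1, 0, 0, 0],
    )
    print(select_for_docking(cal, [0.95, 0.5, 0.05], epsilon=0.25))
```

Only the selected indices would then be scored by docking, which is where the reduction in computational cost described in the abstract would arise. The significance level epsilon controls how permissive the selection is.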

Files

Original bundle

Name: 2025_NatCompSci_Luttens_Rapid.pdf
Size: 3.33 MB
Format: Adobe Portable Document Format