How AI is Transforming New Drug Discovery

Traditional drug discovery is immensely difficult. Developing a new drug can take 13-15 years and billions of dollars in R&D investment, and still fewer than 10% of Phase I candidates go on to receive FDA approval. AI-driven advances in computational biology have the potential to greatly accelerate this process, delivering better patient outcomes and reducing research costs. However, significant obstacles still face AI in drug discovery.

AI can analyze the spatial geometry and structure of molecules to evaluate their suitability for a specific target protein. Advances in AI could even model off-target interactions with other proteins, which can cause side effects in patients that stall or even stop development in the clinical phase. These advancements require large amounts of high-quality data. While there are extensive public databases like ChEMBL, this data is still imperfect. Because experimental methods and data collection methods are not standardized across these datasets, there are limits to the algorithms and models that can be trained on them.
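To make the standardization problem concrete, here is a minimal sketch of one common cleanup step: converting potency values reported in different concentration units onto a single scale (pIC50, the negative base-10 logarithm of the IC50 in molar) and merging replicate measurements. The record layout, the compound and target names, and the choice to average replicates on the log scale are all assumptions made for illustration; this is not how ChEMBL or any particular company structures or curates its data.

```python
import math

# Conversion factors from common concentration units to molar (M).
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}

def to_pic50(value, unit):
    """Convert an IC50 in a supported unit to pIC50 = -log10(IC50 in M)."""
    if unit not in UNIT_TO_MOLAR:
        raise ValueError(f"unsupported unit: {unit!r}")
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])

def harmonize(records):
    """Group pIC50 values by (compound, target) and average the replicates."""
    grouped = {}
    for rec in records:
        key = (rec["compound"], rec["target"])
        grouped.setdefault(key, []).append(to_pic50(rec["ic50"], rec["unit"]))
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}

# Hypothetical records, as if two labs reported the same compound in different units.
records = [
    {"compound": "cmpd-1", "target": "kinase-A", "ic50": 50.0, "unit": "nM"},
    {"compound": "cmpd-1", "target": "kinase-A", "ic50": 0.05, "unit": "uM"},
    {"compound": "cmpd-2", "target": "kinase-A", "ic50": 1.0, "unit": "uM"},
]
result = harmonize(records)
for key, pic50 in result.items():
    print(key, round(pic50, 2))
```

Unit conversion is the easy part. Differences in assay conditions, cell lines, and protocols between labs are much harder to reconcile, which is why curated in-house datasets are so valuable.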

The Challenges with Raw Datasets

Large biotech companies recognize these data quality problems and curate their own databases, which can encompass tens of thousands of compounds. These databases are often more standardized than public data, which is aggregated from many different labs and experimental procedures. Understandably, companies are unwilling to share these datasets publicly, as they represent enormous investments of both time and money and are significant assets. There has been some effort to encourage data sharing between corporations, notably Melloddy in the EU, which concluded in 2023. Melloddy brought ten pharmaceutical companies together to train machine learning models on a cross-pharma dataset of more than 2.6 billion confidential experimental activity data points. The resulting models, which relate molecular structure to biological activity, were significantly more accurate than many of the individual companies’ models. While academic researchers do not expect companies to allow open access to their datasets, projects like Melloddy serve as a proof of concept that collaboration could benefit everyone in the AI space.
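One general technique for this kind of collaboration is federated averaging, in which each partner trains on its own private data and only model parameters are shared and combined. The sketch below is a toy illustration of that idea using a one-variable linear model; the function names, the data, and the hyperparameters are all hypothetical, and it is not a description of Melloddy's actual platform or models.

```python
def local_update(weights, data, lr=0.1, epochs=50):
    """Run gradient descent for y ~ w*x + b on one partner's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / n
        grad_b = sum((w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

def federated_average(updates, sizes):
    """Combine partner models, weighting each by its number of data points."""
    total = sum(sizes)
    w = sum(u[0] * s for u, s in zip(updates, sizes)) / total
    b = sum(u[1] * s for u, s in zip(updates, sizes)) / total
    return (w, b)

# Hypothetical private datasets; every point follows y = 2x + 1.
partners = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
    [(0.5, 2.0), (1.5, 4.0)],
]

global_model = (0.0, 0.0)
for _ in range(20):
    # Each partner starts from the shared model and trains locally;
    # only the resulting parameters leave the partner, never the raw data.
    updates = [local_update(global_model, d) for d in partners]
    global_model = federated_average(updates, [len(d) for d in partners])

print(global_model)
```

In this toy setup the shared model approaches the underlying relationship (a slope near 2 and an intercept near 1) even though no partner ever sees another partner's data points.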

AI Models Generating New Compounds

Beyond training AI on data gathered from databases, there is also the possibility of AI models generating new compounds that are neither existing drugs nor part of an existing compound library. These de novo compounds do not have to be limited to small molecules either; proteins, antibodies, RNA therapies, and gene therapies are all possibilities under the de novo umbrella, as are structures that are not currently known. Any de novo compound identified by an AI model would still need its characteristics predicted, namely toxicity and potency, and these are areas in which AI models could also be the answer. The FDA Modernization Act 2.0, signed into law in December 2022, opened the door to non-animal testing in the preclinical phase, using systems like organs-on-chips, where AI models could prove critical to developing more ethical experimental methods.

AI could hold the key to the future of drug development. Overcoming the obstacles of gathering large, high-quality, and accessible datasets, discovering new compounds, and validating models to enable more ethical testing would pave the way for drug discovery that is safer, faster, and cheaper, lowering the barrier to introducing transformative new therapies that improve patient outcomes.

Reach out to Medelis if you’d like to learn more.