How to improve AI and Large Language Model outputs to enhance drug discovery, according to large pharma experts

Louise King, Product Manager, Biorelate
March 12, 2025

How do we enable AI models to produce high-quality outputs? Access to an abundance of high-quality scientific data is a crucial element. The challenge then lies in ensuring that these outputs are accessible, usable and meaningful to the multi-disciplinary audience involved in the drug development process, so that efficiency increases in both the early and later stages of drug discovery. Data accessibility and efficiency within large pharma companies were issues raised by multiple pharma leaders at the recent Festival of Genomics conference; here, we dive into their insights.

Balancing generalisability with task specificity

While some commercial AI tools provide friendlier user interfaces (UIs) than those typically developed in-house, they often disappoint when it comes to specific research use cases. As Guglielmo Iozzia (MSD) discussed, most commercial LLMs have a number of issues, including a lack of transparency (closed-source models, proprietary third-party services, private training datasets), a lack of explainability in their results and concerns about IP leakage. Furthermore, general-use LLMs often fall short on tasks that require deep domain knowledge. As Kelly Zalocusky (Recursion) stated, “we don’t want AI to generalise its response” when detail and nuance are so crucial in the drug discovery domain.

How does developing bespoke in-house solutions compare to commercial AI models, according to pharma experts? In-house options require significant technical and scientific know-how as well as the right infrastructure to ensure they not only function correctly, but are accessible across multiple disciplines (Guglielmo Iozzia, MSD). There is also a significant maintenance cost to factor in.

The need to drive value and actionability

Victor Neduva (Roche) discussed the “need to get the basics right” in taking large, multi-modal datasets and making them mineable across multiple (physical) sites. These basics include high-performance infrastructure, harmonised data access points, and analytics and visualisation tools that cater to both experts and non-experts. Crucially, what many applications fail to deliver is the ability to extract genuinely actionable knowledge, with easy access to the underlying, up-to-date evidence.

Using AI models that do not fully deliver accurate, actionable insights can have a significant impact on later stages of the drug development pipeline, as Tom Diethe (AstraZeneca) nicely summarised: “AI can impact early discovery work but a lot of the pain is in the later stages”. Toby Johnson (GSK) affirmed this sentiment, adding that drug discovery and development is a very long endeavour, and that later stages of the pipeline will fail if quality and detail are neglected during earlier stages.

Collaboration is key to optimising the value of AI tools

A key theme throughout many of the talks was the absolute necessity for collaboration at all stages. Different departments within large pharma companies interact with data in different ways. AI-based tools that help find, understand and create actionable insights from data are becoming increasingly prevalent, but to extract true value, AI has to be accessible to non-data scientists. Zhihao Ding (Boehringer Ingelheim) mentioned that computer scientists need to mix with other disciplines, bringing machine learning scientists together with research scientists; in doing so, we can define clearer use cases for AI models to solve. We need to keep the “lab in the loop” by defining intermediate proof points, rather than making predictions and waiting until later-stage clinical trials before feeding back to the research team (Krishna Bulusu, AstraZeneca). This has the potential to improve not only the LLM output, but also overall efficiency in the drug development process.

At Biorelate, we help pharma get maximum value from advanced Artificial Intelligence technologies by curating the highest-quality data from unstructured sources (literature, patents, etc.), providing the critical context needed to train AI models effectively. Our causal models embed explainable, mechanistic biology, ensuring AI delivers real impact in drug discovery programmes. For example, AstraZeneca recently used Biorelate’s causal knowledge graph to enhance survival predictions and biomarker discovery for non-small cell lung cancer patients. Read more on this recent publication in this blog post or see the full publication here.

Start a conversation with us about accelerating your drug discovery programmes with higher-quality, more explainable data by contacting us at info@biorelate.com, and explore more at www.biorelate.com.
