The what, why and how of LLMs in drug discovery - what top pharma thought leaders have to say

Mira Nair, Head of Marketing

August 16, 2024

On July 11, 2023, Dr Daniel Jamieson, CEO of Biorelate, chaired a virtual panel with four thought leaders: Dr Ashar Ahmad, Data Science & Advanced Analytics, Associate Director at Grünenthal; Dr Shameer Khader, Global Head of Data Science, Data Engineering and Computational Biology at Sanofi; Jon Hill, Principal Scientist at Boehringer Ingelheim; and Dr Andrew Davis, Director of Knowledge Management at Novartis.

To get beyond the hype around ChatGPT, Generative Artificial Intelligence (AI), and Large Language Models (LLMs) that is now ubiquitous across all industries, we brought together biopharma thought leaders to separate what is hallucination from what is real.

What could the applications of LLMs be for data science and bioinformatics? What change could they really bring? What are the key elements that need to be in place to ensure the transformative potential of LLMs is not a hallucination but something achievable over the next few years?

Read our first blog post summarising webinar highlights here.

Watch the whole webinar recording here after you read some of the insightful takeaways below (ironically, we did not use ChatGPT in any part of the curation or writing of this blog post).

1. Humans are the key to unlocking the value of Large Language Models.

We, as human researchers, need to define how we are going to work alongside LLMs to achieve our research goals. That is the crux of getting value out of these technologies.

“The future of generative AI is a 'human in the loop' future. That human in the loop is your scientists, your drug discovery scientists, or your data scientists. That is what's going to drive the value.” - Dr Shameer Khader

“There are a lot of different examples of using generative AI today, whether it is in generative chemistry, generative protein structures, or looking at new directions of new relationships between human diseases, but who is going to validate all of this? How much of this is noise? You need to make sure that you have the capabilities internally within a company to support these large language models.” - Dr Shameer Khader

“As a scientist or as any human being, when we think about stuff, when we reason about it, we are reasoning to some degree with the help of natural language. So, a lot of these ideas that these LLMs are a paradigm shift in fact come from this idea that language is somehow special, human language is somehow special, so if you can build a foundation model on natural language, emergent intelligence can come up just because language is so complex. If you can build certain foundational models around that, it can really be revolutionary.” - Dr Ashar Ahmad

2. The time LLMs will save humans in cleaning and curating the data could be re-directed to richer, more empowered data analysis.

No expert on the panel raised fears about LLMs or AI taking jobs away from people in biopharma. Instead, all the thought leaders expressed excitement about how new technologies could help them make even better use of their expertise and time.

“I think as a scientist, I can integrate better. I'm not just that guy who's told, 'Here is the data, go and find me the hypothesis.' But rather I'm playing a more interactive role in coming up with that hypothesis.” - Dr Ashar Ahmad

3. Without data quality, all the new tools to curate, analyse, or generate data are useless.

The panelists all agreed that data quality is, and always will be, a hygiene factor that determines whether data generated by NLP, GenAI, or any other tool is useful or reliable.

“FAIR data is really, really important in the omics space.” - Jon Hill

“If you want to do this GraphML, you need a good knowledge graph for that specific therapeutic area or even a good underlying working ontology for that. This is also where LLMs could play a role in trying to compile this in a way that is less resource intensive but data quality is a fundamental issue in that.” - Dr Ashar Ahmad

4. Stop being distracted by 'shiny' interfaces and get into the meat of the data.

Some of the panelists shared their experiences of working with new data tools and stressed the importance of examining what new data these tools actually create, rather than getting distracted by a fancy interface or visualisation.

“The key thing for a team like mine is when a vendor has something to offer, we can't be drawn in by the wow factor, how nice the UI looks, how intuitive it is, we really have to start digging into the data and seeing, does it give us something extra.” - Dr Andrew Davis

See our upcoming webinars: https://webinars.biorelate.com/webinars

Follow us on LinkedIn for the latest updates on what is possible for biomedical literature search with the latest LLM, AI and NLP technology: https://www.linkedin.com/company/biorelate-limited/
