The Genetic Roadmap to Drug Discovery

There is little doubt that we live in an era of life-saving medicines. Researchers around the globe are working diligently to pioneer the next wave of breakthroughs, but the reality is that new treatments may not work for all patients. Variations within an individual’s genome can affect therapeutic response by altering the proteins a drug targets. This effect can be more acute for biologics, which display exquisite target specificity.

But what if there were a way to design more efficacious therapies by knowing in advance which genetic forms of a target protein are most prevalent within a patient population?

In new research published in Nature Communications, our scientists partnered with Eagle Genomics and EMBL’s European Bioinformatics Institute (EMBL-EBI) to take an important first step in doing just that. By using available data and the latest in genomic computing tools, the team has provided the first population-level, genome-wide study of the distribution of common protein haplotypes that can potentially impact drug binding.

Why do protein haplotypes matter?

It should come as no surprise that each of us has an individual genome composed of maternal and paternal allelic sequences; family resemblance doesn’t just happen by accident. This combination creates what is called a diploid genome, meaning that each gene in our individual sequence has two gene haplotypes. A protein haplotype is the translation of the spliced RNA transcript from one of these gene haplotypes, and the resulting pairs (or diplotypes) are ultimately responsible for fundamental protein function.

In other words, protein haplotypes are landmarks on a genetic map, signaling to researchers how patient populations may respond to a given treatment.

Until now, however, researchers have had only small pieces of that map, because most frequency-based protein haplotype selection in drug development has relied on in-house DNA sequencing, with limited data and at extensive cost.

Completing the map

To address the lack of accessible tools and resources for the study of protein haplotypes, MedImmune scientists and our collaborators developed Haplosaurus, a bioinformatics tool for computing protein haplotype sequences from pre-existing genotypes. The tool helps inform drug discovery by combining reference genome sequences, gene models and phased variation data to present a comprehensive overview of predicted protein haplotypes and their frequencies across populations.
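To make the idea concrete, here is a minimal Python sketch of how protein haplotypes and their frequencies could be derived from phased genotypes. It is an illustration only, not Haplosaurus’s actual implementation: the function names, the toy coding sequence, and the sample data are all hypothetical, and the sketch assumes an already-spliced, forward-strand coding sequence with single-base substitutions only. A real pipeline would also need to use gene models to splice exons and would need to handle insertions and deletions.

```python
from collections import Counter

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def apply_variants(ref_cds, variants, alleles):
    """Build one haplotype: use the alt base wherever the phased allele is 1."""
    seq = list(ref_cds)
    for (pos, alt), allele in zip(variants, alleles):
        if allele == 1:
            seq[pos] = alt
    return "".join(seq)


def protein_haplotype_frequencies(ref_cds, variants, phased_genotypes):
    """Count each distinct protein sequence across both haplotypes of every sample."""
    counts = Counter()
    for first_hap, second_hap in phased_genotypes:
        for alleles in (first_hap, second_hap):
            counts[translate(apply_variants(ref_cds, variants, alleles))] += 1
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


# Hypothetical toy data: a 4-codon reference CDS (Met-Ala-Glu-stop).
REF_CDS = "ATGGCTGAATGA"
# (0-based position in the CDS, alternate base)
VARIANTS = [(4, "T"),   # GCT -> GTT, Ala -> Val (changes the protein)
            (8, "G")]   # GAA -> GAG, Glu -> Glu (synonymous)
# One entry per sample: (alleles on haplotype 1, alleles on haplotype 2)
PHASED_GENOTYPES = [
    ((0, 0), (1, 0)),
    ((0, 1), (0, 0)),
    ((1, 1), (0, 0)),
]

if __name__ == "__main__":
    freqs = protein_haplotype_frequencies(REF_CDS, VARIANTS, PHASED_GENOTYPES)
    for protein, freq in sorted(freqs.items()):
        print(protein, round(freq, 3))
```

In this toy example the synonymous variant does not create a new protein haplotype, so six gene haplotypes collapse into only two protein haplotypes. Phasing matters because it tells us which variants sit together on the same copy of the gene, and therefore in the same protein.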

The team used internal and external gene targets to validate the benefits of Haplosaurus: one from the available literature, C5, and two from MedImmune’s in-house drug discovery pipeline, TLR4 and FPR1. Each example took a different approach to managing haplotype variability: C5 did not address it at all, TLR4 addressed it once a development problem arose, and FPR1 addressed it from the start to avoid any problem. Yet in each case, a priori data from Haplosaurus would have been valuable.

What’s more, our collaborators at EMBL-EBI have used the tool to build a database of protein haplotypes from the 1000 Genomes dataset, and have made new Haplosaurus-based views available on the Ensembl website to provide convenient and rapid access to protein haplotypes on a per-gene/isoform basis. This means that researchers outside of MedImmune can now design therapeutics while improving their analysis time, data confidence, and relevance to multiple populations.

Within MedImmune, scientists plan to leverage Haplosaurus as part of their standard research process moving forward, and we believe the tool will enable the broader scientific community to better tailor medicines to the people who need them.

For more on this work, check out:

Nature Communications