
Approved Research

Analyzing unmappable ("camouflaged") genome regions in UK Biobank exome sequencing data using the LPA gene as a model.

Principal Investigator: Dr Sebastian Schoenherr
Approved Research ID: 62905
Approval date: July 27th 2020

Lay summary

The human genome still includes thousands of regions that cannot be properly analysed (so-called camouflaged or dark regions). The affected genes belong to pathways important to human health, development and reproduction, and they account for nearly one third of all human protein-coding genes.

In this project, we will analyse in depth one such region that has a very large impact on human health: the LPA gene. This gene regulates lipoprotein(a) [Lp(a)] concentrations, which are among the strongest genetic risk factors for cardiovascular disease, myocardial infarction and aortic valve calcification. Lp(a) concentrations have also been implicated as a risk factor for type 2 diabetes. Lp(a) levels are determined almost exclusively by genetic variation in the LPA gene, but most of the gene lies in a region called the "KIV-2 repeat" that could not be analysed until now. Therefore, despite the tremendous importance of LPA for cardiovascular disease and human health, little is known in detail about how genetic variation in LPA regulates Lp(a) concentrations. About one fifth of the general population (more than 100 million people in Europe) has genetically increased Lp(a) levels, so a better understanding of the mechanisms that regulate Lp(a) could have a tremendous impact on public health.

In this project, we will apply to this gene new bioinformatic and statistical approaches that we developed in previous, smaller studies. We aim to evaluate these approaches in order to (1) search for novel variants in the LPA gene that influence Lp(a) concentrations, cardiovascular risk and human health, (2) extend our approach to other clinically relevant camouflaged protein-coding regions, and (3) provide the scientific community with a computational pipeline to efficiently analyse such complex regions in very large studies such as UK Biobank. We expect to provide results to the community within 3 years.