
Approved research

Research using UK Biobank Data to identify key parameters that affect disease states using Machine Learning and Network Sensitivity Analysis

Principal Investigator: Dr Mark Lane
Approved Research ID: 51980
Approval date: July 31st 2019

Lay summary

UK Biobank has collected a significant set of healthcare data on approximately 500,000 participants over a period of time. The data collected include genetic data (that is, DNA from cell samples), accelerometry (that is, daily movement and activity monitoring), cognitive tests, lifestyle information, and other data such as mental health, diet, and occupation. The objective of this research study is to determine how well each data set, individually and in concert with other data sets, can predict the presence of certain disease states, such as dementia, heart attacks, stroke, diabetes, asthma, Parkinson's disease, chronic lung diseases, depression, schizophrenia, bipolar disorder, eye diseases, and fractures. It is hoped that the effect of key parameters and data sets on predicting certain disease states will be identified and ranked in importance relative to other parameters and other data sets.

Our team has developed an integrated analytics platform that uses machine-learning algorithms to weigh the contributions of different data, processing each data set by itself and together with other data. Analytic platforms typically make trade-offs between adaptability and transparency. For example, the simplest predictive approaches use decision trees. Decision trees are easy to understand, but they are not very adaptable. Deep-learning neural networks can adapt to complex underlying structures in the data, but the details of what is happening, and the sensitivity of the resulting solution to particular parameters or data elements, are not easily understood. The solution comes from an approach that our team calls Network Sensitivity Analysis (NSA), which creates transparency and traceability for even the most complex neural network topologies.
A by-product of this approach is that the sensitivity of any insight to a particular data element can be determined and, with a rigorous economic modeling approach, the value of the insights generated from entire data sets can be properly estimated. The objective is to determine not only how accurately certain disease states can be predicted, but also to what extent key parameters and entire data sets are necessary for the predictive results produced.
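
As a rough illustration of what "sensitivity of an insight to a particular data element" can mean, the sketch below nudges each input of a small predictive model and measures how much the predicted risk changes. It is not the NSA method, which this summary does not describe; it is a generic perturbation-based sensitivity check. The model, its weights, the feature names, and the input rows are all made-up assumptions for illustration and are not UK Biobank data.

```python
import math


def predict(features, weights, bias):
    """Toy logistic model: returns a probability between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def sensitivities(rows, weights, bias, delta=0.01):
    """Mean absolute change in prediction per unit change of each feature."""
    n_features = len(weights)
    totals = [0.0] * n_features
    for row in rows:
        base = predict(row, weights, bias)
        for j in range(n_features):
            bumped = list(row)
            bumped[j] += delta  # perturb one feature, hold the others fixed
            totals[j] += abs(predict(bumped, weights, bias) - base) / delta
    return [t / len(rows) for t in totals]


# Hypothetical feature names, weights, and inputs (illustrative only).
FEATURES = ["activity", "age", "diet_score"]
WEIGHTS = [-1.5, 0.8, 0.0]
BIAS = 0.1
ROWS = [[0.2, 0.5, 0.9], [0.7, 0.1, 0.3], [0.4, 0.9, 0.6]]

scores = sensitivities(ROWS, WEIGHTS, BIAS)
ranking = sorted(zip(FEATURES, scores), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the feature with zero weight gets a sensitivity of zero, which is the kind of signal that would suggest a data element is not needed for the prediction. A real neural network would need the same idea applied to a trained model rather than hand-picked weights.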