Computational Modelling Group

Developing mathematical tools to identify the genes underlying disease

28th September 2015
Research Team
Alejandra Vergara Lope
Andrew Collins

Graph representing the metadata of thousands of archive documents, documenting the social network of hundreds of League of Nations personnel. Published in: Grandjean, Martin (2014).

Next-generation sequencing (NGS) of deoxyribonucleic acid (DNA) from patients with disease is revolutionising medical research and stimulating a rapid transition towards ‘personalised’ treatment. However, progress in both research and clinical settings is hard to achieve because interpreting the relationships between genetic variation and disease phenotypes is extremely challenging: the genomic and functional properties involved are numerous, complex and heterogeneous. To overcome this, simulation is an essential tool for establishing the models required for genomic and functional data.

The aim of this research is to build a model that discriminates the genes most likely to contain disease variation from those less likely to, by identifying the genetic factors that predispose to disease. The project incorporates information on genetic recombination, selection and mutation, together with topological data, into NGS variant filtering by developing machine learning methods, in particular Bayesian networks, for ranking plausible disease-causing candidates. Successful analyses will be extremely valuable for identifying disease candidate genes in NGS data, with implications for ‘personalised’ medicine.
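As a rough illustration of the ranking idea, the sketch below scores genes with a naive Bayes classifier, which is the simplest form of Bayesian network (one class node whose feature children are conditionally independent). It is not the project's model. The feature names, gene names, prior and conditional probabilities are all hypothetical values chosen for the example.

```python
import math

# Hypothetical P(feature present | gene class); illustrative values, not estimates from data.
P_FEATURE_GIVEN_DISEASE = {"low_recombination": 0.7, "under_selection": 0.8, "network_hub": 0.6}
P_FEATURE_GIVEN_NEUTRAL = {"low_recombination": 0.4, "under_selection": 0.3, "network_hub": 0.2}
PRIOR_DISEASE = 0.05  # assumed prior probability that a gene carries disease variation


def posterior_disease(features):
    """Return P(disease gene | features) under the naive Bayes assumption."""
    log_disease = math.log(PRIOR_DISEASE)
    log_neutral = math.log(1.0 - PRIOR_DISEASE)
    for name, present in features.items():
        p_d = P_FEATURE_GIVEN_DISEASE[name]
        p_n = P_FEATURE_GIVEN_NEUTRAL[name]
        log_disease += math.log(p_d if present else 1.0 - p_d)
        log_neutral += math.log(p_n if present else 1.0 - p_n)
    # Normalise in log space to avoid underflow when many features are used.
    top = max(log_disease, log_neutral)
    w_disease = math.exp(log_disease - top)
    w_neutral = math.exp(log_neutral - top)
    return w_disease / (w_disease + w_neutral)


def rank_genes(genes):
    """Order gene names from most to least plausible disease candidate."""
    return sorted(genes, key=lambda g: posterior_disease(genes[g]), reverse=True)


# Hypothetical candidate genes with binary evidence for each feature.
GENES = {
    "GENE_C": {"low_recombination": False, "under_selection": False, "network_hub": False},
    "GENE_A": {"low_recombination": True, "under_selection": True, "network_hub": True},
    "GENE_B": {"low_recombination": True, "under_selection": False, "network_hub": False},
}

if __name__ == "__main__":
    for gene in rank_genes(GENES):
        print(gene, round(posterior_disease(GENES[gene]), 4))
```

A full Bayesian network would also model dependencies between the features themselves (for example between recombination rate and selection), which the independence assumption here ignores.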


Life sciences simulation: Bioinformatics, Biomathematics, Epidemiology, NextGen Sequencing

Algorithms and computational methods: Graph Theory, Machine learning

Visualisation and data handling methods: Data Management

Simulation software: Gaussian

Programming languages and libraries: AWK, C, IPython/Jupyter Notebook, MPI, Pandas, Perl, R, Stata

Computational platforms: Iridis

Transdisciplinary tags: NGCM