Mathematical tools for analysis of genome function, linkage disequilibrium structure and disease gene prediction
- Started: 28th September 2015
- Investigators: Mahesan Niranjan, Andrew Collins, Reuben Pengelly
Next-generation sequencing (NGS) of deoxyribonucleic acid (DNA) from patients with disease is revolutionising medical research and stimulating a rapid transition towards ‘personalised’ treatment. However, progress in both research and clinical settings is hard to achieve because interpreting the relationships between genetic variation and disease phenotypes is extremely challenging: many of the relevant genomic and functional properties are complex and heterogeneous. Simulation is therefore an essential tool for establishing the models required for genomic and functional data.
The aim of this research is to build a model that discriminates genes most likely to contain disease variation from those less likely to, by identifying the genetic factors that predispose to disease. The project also incorporates genetic recombination, selection and mutation information, together with topological data, into NGS variant filtering by developing machine learning methods, in particular Bayesian networks, for ranking plausible disease-causing candidates. Successful analyses will be extremely valuable for identifying disease candidate genes in NGS data, with implications for ‘personalised’ medicine.
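As a rough illustration of what ranking candidates with a Bayesian network could look like, the sketch below uses the simplest possible network structure: one binary "disease gene" node with three conditionally independent binary features. It is not the project's model. The feature names, gene names, prior and conditional probabilities are all hypothetical values chosen only to make the example run.

```python
from math import prod

# ASSUMPTION: every number and name below is invented for illustration.
PRIOR_DISEASE = 0.05  # hypothetical P(gene harbours disease variation)

# Hypothetical conditional probabilities P(feature is True | disease status),
# keyed by feature name, then by disease status (True / False).
CPT = {
    "low_recombination": {True: 0.6, False: 0.4},
    "under_selection": {True: 0.7, False: 0.3},
    "network_hub": {True: 0.5, False: 0.2},
}


def posterior_disease(features):
    """Return P(disease | features) by Bayes' rule.

    The features are treated as conditionally independent given the
    disease node, i.e. a naive-Bayes-structured network.
    """
    def likelihood(is_disease):
        return prod(
            CPT[name][is_disease] if value else 1.0 - CPT[name][is_disease]
            for name, value in features.items()
        )

    joint_disease = PRIOR_DISEASE * likelihood(True)
    joint_normal = (1.0 - PRIOR_DISEASE) * likelihood(False)
    return joint_disease / (joint_disease + joint_normal)


def rank_genes(genes):
    """Sort genes from most to least plausible disease candidate."""
    scored = [(name, posterior_disease(feats)) for name, feats in genes.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidate genes with made-up feature observations.
GENES = {
    "GENE_A": {"low_recombination": True, "under_selection": True, "network_hub": True},
    "GENE_B": {"low_recombination": False, "under_selection": True, "network_hub": False},
    "GENE_C": {"low_recombination": False, "under_selection": False, "network_hub": False},
}

ranking = rank_genes(GENES)
for gene, score in ranking:
    print(f"{gene}: {score:.3f}")
```

A realistic model would learn the network structure and its probabilities from data and would allow dependencies between features. The fixed table here only shows how evidence from several sources combines into a single ranking score.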
Categories
Life sciences simulation: Bioinformatics, Epidemiology, Epigenetics, NextGen Sequencing
Algorithms and computational methods: Graph Theory, Machine learning, Maximum Likelihood
Software Engineering Tools: Git, RStudio
Programming languages and libraries: AWK, C, IPython/Jupyter Notebook, OpenMP, Pandas, Perl, Python, R, Stata
Computational platforms: NGS
Transdisciplinary tags: NGCM