Bioinformatics
Bioinformatics is the storage, manipulation and processing of biological data using computers. This topic includes all modelling of "omics" data - genomics (DNA), proteomics (protein), transcriptomics (mRNA), metabolomics (metabolites) etc. - as well as "classical" sequence analysis (multiple sequence alignment, homology searching, phylogenetics etc.). Bioinformatics projects will usually make use of in vitro and/or in vivo experimental data. For informal bioinformatics discussion/networking at Southampton, please visit the UoS Bioinformatics Facebook group.
For queries about this topic, contact Richard Edwards.
Projects
A composite likelihood approach to genome-wide data analyses.
Andrew Collins (Investigator), Jane Gibson, Ioannis Politopoulos
We describe composite likelihood-based analysis of a genome-wide breast cancer case-control sample by determining genome regions of fixed size on a linkage disequilibrium map which delimit comparable levels of linkage disequilibrium. Analysis of findings suggests further validation in more samples from other cohorts as well as the exploitation of novel computationally-intensive methods such as next-generation sequencing.
Antimicrobial Peptide and E. coli Membrane Interactions
Syma Khalid (Investigator), Thomas Piggot, Nils Berglund
Antimicrobial peptides (AMPs) are known to disrupt the membranes of bacterial cells such as E. coli. I work on investigating the nature of these interactions using molecular dynamics (MD) simulations.
Application of RNA-Seq for gene fusion identification in blood cancers
William Tapper (Investigator), Marcin Knut
Gene fusions are often the cause of different blood cancers. Identifying them accurately provides information on the underlying cause of a cancer and helps ensure an appropriate choice of treatment. However, because of shortcomings in the methods currently applied to gene fusion identification, some fusions go undetected. We are employing RNA-Seq, a cutting-edge method for sequencing RNA, the messenger of genetic information, to investigate gene fusions.
Benchmarking methods of identifying intrinsically unstructured proteins for de novo prediction of Short Linear Motifs
Richard Edwards (Investigator), William Anderson
Benchmarking the GOPHER orthologue prediction algorithm.
Richard Edwards, Shaun Maguire
Generation of Orthologous Proteins from High-throughput Evolutionary Relationships (GOPHER) is an orthologue prediction algorithm. This experiment aims to benchmark this algorithm.
Bioinformatic identification and physiological analysis of ethanol-related genes in C. elegans
Richard Edwards, Vincent O'Connor, Lindy Holden-Dye (Investigators), Ben Ient
Investigating the broad molecular, cellular and systems level impacts of acute and chronic ethanol in the nematode, Caenorhabditis elegans, as a model.
Can we calculate the pKa of new drugs, based on their structure alone?
Chris-Kriton Skylaris (Investigator), Chris Pittock, Jacek Dziedzic
The pKa of an active compound in a pharmaceutical drug affects how it is absorbed and distributed around the human body. While there are various computational methods to predict pKa using only molecular structure data, these tend to be specialised to only one class of drug - we aim to generate a more generalised prediction method using quantum mechanics.
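As background to why pKa matters for absorption, the Henderson-Hasselbalch relationship links pKa and pH to the fraction of a compound that is ionised. The sketch below is a minimal illustration only and is not the quantum-mechanical prediction method this project aims to develop; the function names and the example pKa and pH values are hypothetical.

```python
def fraction_ionised_acid(pka, ph):
    """Fraction of a monoprotic acid in its deprotonated (ionised) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_ionised_base(pka, ph):
    """Fraction of a monoprotic base in its protonated (ionised) form.

    Here pka is the pKa of the conjugate acid of the base.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Hypothetical weak acid with pKa 4.5 at two illustrative pH values.
for ph in (2.0, 7.4):
    print(f"pH {ph}: {fraction_ionised_acid(4.5, ph):.3f} ionised")
```

When pH equals pKa, exactly half of the compound is ionised, which is why a shift of one or two pKa units can change the ionised fraction substantially at a fixed pH.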
Centre for Doctoral Training in Next Generation Computational Modelling
Hans Fangohr, Ian Hawke, Peter Horak (Investigators), Susanne Ufermann Fangohr, Thorsten Wittemeier, Kieran Selvon, Alvaro Perez-Diaz, David Lusher, Ashley Setter, Emanuele Zappia, Hossam Ragheb, Ryan Pepper, Stephen Gow, Jan Kamenik, Paul Chambers, Robert Entwistle, Rory Brown, Joshua Greenhalgh, James Harrison, Jonathon Waters, Ioannis Begleris, Craig Rafter
The £10 million Centre for Doctoral Training was launched in November 2013 and is jointly funded by EPSRC, the University of Southampton, and its partners.
The NGCM brings together world-class simulation modelling research activities from across the University of Southampton and hosts a 4-year doctoral training programme that is the first of its kind in the UK.
Characterisation of the Genomic Landscape in Splenic Marginal Zone Lymphoma
Sarah Ennis, Jane Gibson, Jon Strefford (Investigators), Carolina Jaramillo Oquendo, Helen Parker
This project aims to expand the catalogue of mutated genes in splenic marginal zone lymphoma (SMZL) and construct a detailed characterisation of the genetic landscape of this disease. Using next generation sequencing, we aim to identify somatic mutations in over 100 samples, and enrich clinical data with this information to improve patient treatment and prognosis.
Deep Optimisation
Jamie Caldwell
The project will develop the implementation and application of a new optimisation technique. 'Deep optimisation' combines deep learning techniques in neural networks with distributed optimisation methods to create a dynamically re-scalable optimisation process. This project will develop this technique to better understand its capabilities and limitations, and will develop GPU implementations. The protein structure prediction problem will be used as the main test application.
Effects of Sample Contamination on Alternate Allele Frequency
Jane Gibson (Investigator), Roshan Sood
Accurate calling of genetic variants relies on the purity of samples; contamination reduces the accuracy of results. Currently few programs are able to identify contamination in samples, so contaminated data can misinform a researcher or clinician. To better understand the changes caused by sample contamination, in silico simulations were performed in which a known percentage of DNA sequence reads from a contaminating file were added. Understanding these changes will assist the development of a new method and program to detect sample contamination.
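The effect of mixing reads on alternate allele frequency can be sketched with a toy model. The code below is a hedged illustration, not the simulation pipeline used in this project: it assumes a single variant site, reads drawn independently, and a fixed contamination fraction. All function names and parameter values are hypothetical.

```python
import random


def expected_aaf(sample_aaf, contaminant_aaf, contamination):
    """Expected alternate allele frequency after mixing reads from two sources.

    contamination is the fraction of reads that come from the contaminant.
    """
    if not 0.0 <= contamination <= 1.0:
        raise ValueError("contamination must be between 0 and 1")
    return (1.0 - contamination) * sample_aaf + contamination * contaminant_aaf


def simulate_aaf(sample_aaf, contaminant_aaf, contamination, depth, seed=0):
    """Draw `depth` reads at one site and return the observed alternate allele fraction."""
    rng = random.Random(seed)
    alt_reads = 0
    for _ in range(depth):
        # Pick the source of this read, then decide whether it carries the alternate allele.
        from_contaminant = rng.random() < contamination
        source_aaf = contaminant_aaf if from_contaminant else sample_aaf
        if rng.random() < source_aaf:
            alt_reads += 1
    return alt_reads / depth


# Hypothetical heterozygous site (0.5) contaminated by a homozygous-reference source (0.0).
for fraction in (0.0, 0.05, 0.2):
    print(fraction, expected_aaf(0.5, 0.0, fraction), simulate_aaf(0.5, 0.0, fraction, depth=1000))
```

Under this simple model a heterozygous site drifts away from 0.5 in proportion to the contamination fraction, which is the kind of shift a contamination detector could look for.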
Genetic studies to characterise the role of genetic factors in early-onset breast cancer
Andrew Collins (Investigator), Rosanna Upstill-Goddard
Breast cancer is a highly heterogeneous disease, with many distinct subtypes. In the majority of breast cancer cases the causative genetic component is poorly characterised. This study aims to explore both rare and common mutations in early-onset breast cancer patients and the contribution of such variants to disease using a variety of analytic approaches.
Identification of Gene-Modules Associated to a Predisposition of Post-Traumatic Stress Disorder
Christopher Woelk (Investigator), Michael Breen
The predisposing genetic factors associated with Post-Traumatic Stress Disorder (PTSD) are altogether unknown. Since not all trauma survivors later develop PTSD, it has been hypothesized that differences in transcript abundance prior to trauma exposure could be associated with risk of, and resilience to, PTSD. We apply a systems-level approach to investigate changes in transcript abundance (gene expression profiles) in whole blood of U.S. Marines sampled prior to deployment to the battlefield and followed throughout a seven-month deployment to obtain disorder-related outcomes.
Identification of novel Crustacean Pathogen Receptor Proteins
Richard Edwards, Chris Hauton, Timothy Elliott (Investigators), Oyindamola Lawal, Lloyd Mushambadzi
We are mining EST libraries (sequence fragments of expressed genes) for novel proteins that might play a role in the immune response of crustaceans.
Identification of phage DNA, common insertion sites and their effect on genes within S.pneumoniae
Richard Edwards, Amy Dean
This study seeks to determine whether there are common phage insertion sites across different strains of S. pneumoniae, to discover genes that undergo frequent mutation due to phages, and to establish whether these mutations can be linked to the virulence of the strains.
Identifying factors required for DNA methylation using the imprinting control protein ZFP57
Deborah Mackay (Investigator)
Mutation of ZFP57 in humans is associated with widespread loss of DNA methylation at imprinted genes, and clinical features including congenital anomalies and developmental delay (Mackay et al, 08). This indicates that ZFP57 is required for DNA methylation of imprinted genes necessary for normal development.
We propose to identify the DNA sequences targeted by ZFP57, and its protein cofactors. This work will give insight into the biology of imprinting, indicate mechanisms of disease, and identify novel imprinted genes.
Identifying variants in next generation sequencing data of 61 paediatric Inflammatory Bowel Disease patients
Sarah Ennis (Investigator), Gaia Andreoletti
This study aims to characterise mutations in genes known to predispose to inflammatory bowel disease in 61 paediatric patients using next generation sequencing analysis. Our aim is to identify the relative impact of known genes in individual case presentations of disease and to correlate matches with clinical manifestation.
Immunotherapy Research: Modelling MHC Class I Complex Assembly
Timothy Elliott, Jorn Werner (Investigators), Alistair Bailey
This project uses mathematical modelling and simulation to investigate mechanisms by which our cells process and present biological information that is used by our immune system to distinguish between healthy and diseased cells.
Integrated in silico prediction of protein-protein interaction motifs
Richard Edwards (Investigator), Nicolas Palopoli, Kieren Lythgow
Many vital protein-protein interactions are mediated by Short Linear Motifs (SLiMs), which are short stretches of protein sequence, typically 5-15 amino acids long, containing only a few positions crucial to function. This project integrates a number of leading computational techniques to predict novel SLiMs and add crucial detail to protein-protein interaction networks.
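Because only a few positions in a motif are crucial to function, a known motif is often written as a pattern with fixed and wildcard positions. The sketch below scans a sequence for such a pattern with a regular expression. It illustrates motif matching only and is not the prediction method used in this project; the pattern `P.L.P`, the toy sequence and the function name are hypothetical.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every match of a motif regex, including overlapping ones."""
    # A lookahead with a capture group reports matches that overlap each other.
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


# Hypothetical motif: proline, any residue, leucine, any residue, proline.
toy_sequence = "MKPALAPSTPTLDPQR"
for start, text in find_motif(toy_sequence, "P.L.P"):
    print(start, text)
```

Short, degenerate patterns like this match by chance very often, which is why de novo SLiM prediction needs statistical and evolutionary evidence on top of simple pattern matching.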
Interactome-wide prediction of short linear protein interaction motifs in humans
Richard Edwards (Investigator)
Short Linear Motifs (SLiMs) are important in many protein-protein interactions. In previous work, we have developed a computational tool, SLiMFinder, which places the interpretation of evidence for motifs within a statistical framework with high specificity, and subsequently enhanced sensitivity through application of conservation-based sequence masking. We are now applying these tools to a comprehensive set of human protein-protein interactions in order to predict novel human SLiMs in silico.
Mass Spec identification of proteins utilising EST libraries
Richard Edwards, Maria Debora Iglesias-Rodriguez (Investigators), Bethan Jones
Expressed Sequence Tag (EST) data presents a particular challenge for the identification of proteins using mass spectrometry (MS): it is often redundant (multiple copies of the same gene), consists primarily of short fragments of coding sequence, contains many sequencing errors and is generally poorly annotated. We are developing computational pipelines to maximise robust protein identifications from EST data despite these challenges.
Mathematical tools for analysis of genome function, linkage disequilibrium structure and disease gene prediction
Mahesan Niranjan, Andrew Collins, Reuben Pengelly (Investigators)
This iPhD project uses a Gaussian Bayesian network framework with machine learning methods to predict which genes are involved in the development of different diseases.
Mathematical tools for analysis of genome function, linkage disequilibrium structure and disease gene prediction
Andrew Collins, Mahesan Niranjan, Reuben Pengelly (Investigators), Alejandra Vergara Lope
This iPhD project uses a Gaussian Bayesian network framework with a machine learning approach to predict which genes are involved in the development of different diseases.
Mathematical tools for analysis of genome function, linkage disequilibrium structure and disease gene prediction
Mahesan Niranjan, Andrew Collins, Reuben Pengelly (Investigators)
This PhD project uses a Monte Carlo molecular simulation approach to predict which genes are involved in the development of different diseases.
Metagenomics: Understanding the impacts of environmental change on soil biodiversity
Richard Edwards, Gail Taylor (Investigators), Joseph Jenkins
Drought is expected to increase in prevalence by 2050. Similarly, the use of biochar (a charcoal based soil amendment) has been suggested as a method to sequester carbon and fertilize soils without need of mineral fertilizers, and its use is increasing. We are using next generation DNA sequencing technology and bioinformatics to determine bacterial genetic diversity from soil samples which have been subject to drought or biochar amendment, to further our understanding of the impacts of environmental change on microbial communities.
Modelling Macro-Nutrient Release & Fate Resulting from Sediment Resuspension in Shelf Seas
Chris Wood
This study involves adapting a previously-published model to take into account the effect resuspension events (both natural and anthropogenic) may have on nutrient dynamics at the sediment-water interface, and hence produce better estimates for the total nutrient budgets for shelf seas.
Multi-scale simulations of bacterial outer-membrane proteins
Syma Khalid (Investigator), Jamie Parkin
Using Iridis to run multiple simulations, I aim to simulate the outer membrane proteins of Pseudomonas aeruginosa, using X-ray crystal structures of proteins only recently resolved by Bert van den Berg, University of Massachusetts. By modelling the proteins in a realistic P. aeruginosa outer membrane, I aim to gain insight into the binding of these proteins to specific substrates and their function.
Multiscale Modelling of Cellular Calcium Signalling
Hans Fangohr, Jonathan Essex (Investigators), Dan Mason
Calcium ions play a vitally important role in signal transduction and are key to many cellular processes including muscle contraction and cell apoptosis (cell death). This importance has made calcium an active area in biomedical science and mathematical modelling.
New Forest Cicada Project
Alexander Rogers, Geoff Merrett (Investigators), Davide Zilli, Oliver Parson
Rediscover the critically endangered New Forest cicada with crowdsourced smartphone biodiversity monitoring techniques.
Predicting Relative Protein Abundance via Sequence-Based Information
Gregory Parkes (Investigator), Mahesan Niranjan
Understanding the complex interactions between transcriptome and proteome is essential in uncovering cellular mechanisms both in health and disease contexts. The limited correlations between corresponding transcript and protein abundance suggest that regulatory processes tightly govern information flow surrounding transcription and translation, and beyond.
In this study we adopt an approach which expands the feature scope that models the human proteome: we develop machine learning models that incorporate sequence-derived features (SDFs), sometimes in conjunction with corresponding mRNA levels. We develop a large resource of sequence-derived features which cover a significant proportion of the H. sapiens proteome, demonstrate which of these features are significant in prediction on multiple cell lines, and suggest insights into which biological processes can be explained using these features.
We (a) show that SDFs are significantly better at protein abundance prediction across multiple cell lines in both steady-state and dynamic contexts, (b) show that SDFs can cover the domain of translation with relative efficiency but struggle with cell-line specific pathways, and (c) provide a resource which can be plugged into many subsequent protein-centric analyses.
Sample tracking in whole-exome sequencing projects
Andrew Collins, Sarah Ennis (Investigators), Reuben Pengelly
Whole-exome sequencing is entering clinical use for genetic investigations, and it is therefore essential that robust quality control is utilised. We therefore designed and validated a tool that unambiguously ties sequence data to the patient it came from, in order to identify, and thus prevent, errors such as the switching of samples during processing.
Sensitivity of the critical depth to the choice of particle movement rules in Lagrangian models and the consequences for the predicted timing of the spring bloom
Tom Anderson (Investigator), Melissa Saeland
Individual-based (Lagrangian) models lend themselves to the study of the controls of the spring bloom in the ocean, due to their ability to represent both the turbulence and the phytoplankton motion. Here, we use a Lagrangian phytoplankton model to test some of the most prevalent hypotheses (e.g. critical depth and critical turbulence).
Simulation of biological systems at long length and distance scales
Jonathan Essex (Investigator), Kieran Selvon
This project aims to shed light on cell membrane mechanisms which are difficult to probe experimentally, in particular drug permeation across the cell membrane. With a full understanding of the mechanism, drugs could be designed to target particular embedded proteins to improve their efficacy. The viability of nano-based medicines and materials could also be assessed, for example by testing for toxicity.
Studying microevolution in clinical isolates of Neisseria lactamica
Robert Read (Investigator), Jay Laver, Anish Pandey
We intranasally infected and successfully colonised six volunteers with Neisseria lactamica, a commensal species genetically similar to Neisseria meningitidis. A bioinformatics approach was then used to understand the microevolution of this bacterium and its adaptations to the nasopharynx.
Tag based transcriptome analysis of gene expression in a promising green algae
Richard Edwards (Investigator), Andreas Johansson
We use SuperSAGE in combination with next-generation sequencing to compare differences in gene expression between selected mutants and the wild type of a green alga. The data, in the form of millions of 26 bp tags representing short stretches of expressed genes, will be analysed to find patterns of variation in gene expression under different conditions.
The application of next-generation sequencing to unresolved familial disease
Andrew Collins, Sarah Ennis (Investigators), Jane Gibson, Reuben Pengelly
Next-generation sequencing (NGS) allows us to sequence individual patients cost-effectively, opening a new era of genomic medicine. The level of genetic detail that we can access through these methods is unprecedented, making NGS suitable for clinical molecular diagnostics.
The tarsal intersegmental reflex control system in the locust hind leg
David Simpson, Philip Newland (Investigators), Alicia Costalago Meruelo
Locomotion is vital for vertebrates and invertebrates to survive. Although feet are responsible for stability and agility in most animals, research on foot movements and their reflexes is scarce.
In this thesis, the tarsal reflex responses of the locust will be studied and modelled with artificial neural networks (ANNs) to achieve a deeper understanding of how stability and agility are accomplished.
The choice of ANNs is linked to the applicability of the method into other fields, such as technological designs or medical treatment.
Transgenerational inheritance of allergy in a multi generational cohort
John Holloway (Investigator)
The aim of this project is to determine the vertical transmission of DNA methylation. Microarray analysis of 450,000 CpG sites in 252 women of the IoW cohort will be used to identify CpG sites that are associated with allergic sensitization. We will then test whether the identified methylation patterns are vertically transmitted to their offspring and whether modifiable environmental conditions during gestation affect DNA methylation.
Uncovering extensive post-translation regulation during human cell cycle progression by integrative multi-’omics analysis
Gregory Parkes (Investigator), Mahesan Niranjan
Analysis of high-throughput multi-’omics interactions across the hierarchy of expression has wide interest in making inferences with regard to biological function and biomarker discovery. Expression levels across different scales are determined by robust synthesis, regulation and degradation processes, and hence transcript (mRNA) measurements made by microarray/RNA-Seq only show modest correlation with corresponding protein levels.
In this work we are interested in quantitative modelling of correlation across such gene products. Building on recent work, we develop computational models spanning transcript, translation and protein levels at different stages of the H. sapiens cell cycle. We enhance this analysis by incorporating 25+ sequence-derived features which are likely determinants of cellular protein concentration and quantitatively select for relevant features, producing a vast dataset with thousands of genes. We reveal insights into the complex interplay between expression levels across time, using machine learning methods to highlight outliers with respect to such models as proteins associated with post-translationally regulated modes of action.
We uncover quantitative separation between modified and degraded proteins that have roles in cell cycle regulation, chromatin remodelling and protein catabolism according to Gene Ontology; and highlight the opportunities for providing biological insights in future model systems.
Water Molecules in Protein Binding Sites
Jonathan Essex (Investigator), Michael Bodnarchuk
Water molecules are commonplace in protein binding sites, although their true location can often be hard to determine from crystallographic methods. We are developing tools which enable the location and affinity of water molecules to be found.
Whole exome sequencing identifies novel FLNA mutation in familial Ebstein's anomaly
Jane Gibson, Andrew Collins, Sarah Ennis (Investigators), Gaia Andreoletti
We describe the application of whole-exome sequencing in a family in which eight people in three generations presented with Ebstein's anomaly.
People
Professor, Medicine (FM)
Professor, Medicine (FM)
Professor, Medicine (FM)
Professor, Chemistry (FNES)
Professor, Engineering Sciences (FEE)
Professor, Biological Sciences (FNES)
Professor, Medicine (FM)
Professor, Biological Sciences (FNES)
Professor, Electronics and Computer Science (FPAS)
Professor, Medicine (FM)
Professor, Medicine (FM)
Professor, Medicine (FM)
Professor, Biological Sciences (FNES)
Professor, Medicine (FM)
Reader, Optoelectronics Research Centre
Reader, Medicine (FM)
Reader, Biological Sciences (FNES)
Reader, Engineering Sciences (FEE)
Reader, Biological Sciences (FNES)
Reader, Biological Sciences (FNES)
Senior Lecturer, Institute of Sound & Vibration Research (FEE)
Senior Lecturer, Medicine (FM)
Senior Lecturer, Biological Sciences (FNES)
Senior Lecturer, Medicine (FM)
Senior Lecturer, Institute of Sound & Vibration Research (FEE)
Lecturer, Biological Sciences (FNES)
Lecturer, Electronics and Computer Science (FPAS)
Lecturer, Biological Sciences (FNES)
Lecturer, Mathematics (FSHS)
Lecturer, Ocean & Earth Science (FNES)
Lecturer, Electronics and Computer Science (FPAS)
Lecturer, Electronics and Computer Science (FPAS)
Lecturer, Chemistry (FNES)
Principal Research Fellow, National Oceanography Centre (FNES)
Principal Research Fellow, Chemistry (FNES)
Senior Research Fellow, Ocean & Earth Science (FNES)
Senior Research Fellow, Medicine (FM)
Senior Research Fellow, Biological Sciences (FNES)
Research Fellow, Medicine (FM)
Research Fellow, Chemistry (FNES)
Research Fellow, Medicine (FM)
Postgraduate Research Student, Mathematics (FSHS)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Chemistry (FNES)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Civil Engineering & the Environment (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Institute of Sound & Vibration Research (FEE)
Postgraduate Research Student, University of Southampton
Postgraduate Research Student, Biological Sciences (FNES)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Chemistry (FNES)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Electronics and Computer Science (FPAS)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Biological Sciences (FNES)
Postgraduate Research Student, National Oceanography Centre (FNES)
Postgraduate Research Student, National Oceanography Centre (FNES)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Biological Sciences (FNES)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Medicine (FM)
Postgraduate Research Student, Electronics and Computer Science (FPAS)
Postgraduate Research Student, Chemistry (FNES)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Electronics and Computer Science (FPAS)
Postgraduate Research Student, Chemistry (FNES)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Mathematics (FSHS)
Postgraduate Research Student, Chemistry (FNES)
Postgraduate Research Student, National Oceanography Centre (FNES)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Ocean & Earth Science (FNES)
Postgraduate Research Student, Engineering Sciences (FEE)
Postgraduate Research Student, Electronics and Computer Science (FPAS)
Postgraduate Research Student, Management (FBL)
Undergraduate Research Student, Biological Sciences (FNES)
Undergraduate Research Student, Biological Sciences (FNES)
Technical Staff, iSolutions
Administrative Staff, Research and Innovation Services
Administrative Staff, Civil Engineering & the Environment (FEE)
Enterprise staff, Medicine (FM)
Alumnus, Biological Sciences (FNES)
Alumnus, former UG, Biological Sciences
Alumnus, University of New South Wales, Australia
Alumnus, Biological Sciences (FNES)
Alumnus, University of Southampton
Alumnus, Health Protection Agency
Alumnus, University of Southampton
Alumnus, former UG, Biological Sciences
Alumnus, Biological Sciences (FNES)
Alumnus, Electronics and Computer Science (FPAS)
Alumnus, Chemistry (FNES)
External Member, Korea Institute of Science and Technology