Computational Modelling Group

Genetic studies to characterise the role of genetic factors in early-onset breast cancer

1st September 2011
Research Team
Rosanna Upstill-Goddard
Andrew Collins

Early-onset breast cancer (in women < 40 years old) is relatively rare, but tumours in these women are generally more aggressive and survival rates are poor. Characterising the underlying cause of breast cancer cases is difficult and often requires a range of analysis methods. We performed several analyses, exploring both rare and common genetic variation, to characterise the causes of early-onset breast cancer. Samples were selected from the Prospective study of Outcome in Sporadic versus Hereditary breast cancer (POSH) cohort, which contains patients with an early age of onset. Because of the high computational demands of data processing and analysis, the Iridis supercomputer was used in all studies.

Rare variation was analysed in a whole-exome sequencing study of eight patients with a particularly young age of onset (< 30 years) and severe disease subtypes. The spectrum of rare variation was unique to each case, demonstrating the distinct genetic makeup of individual breast cancer cases.

The role of common variation in breast cancer was assessed using single nucleotide polymorphism (SNP) data. Two studies were carried out: 1) exploration of gene-gene interactions by comparing breast cancer cases with non-disease controls; 2) determination of disease subtype (based on estrogen receptor tumour status) using a machine learning classification technique (a support vector machine). The gene-gene interaction study identified a number of potential interactions between pairs of SNPs in breast cancer susceptibility genes; however, verification in a much larger sample cohort is necessary. The machine learning study found that with only 200 SNPs in ~140 genes it was possible to discriminate between two major estrogen receptor subtypes; over 90% of samples were correctly classified. The gene set was enriched for genes in immune pathways, suggesting a role for immune system variation in tumour outcomes.
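
As a rough illustration of the classification idea described above, the sketch below trains a linear support vector machine on SNP genotypes coded as minor-allele counts. It is not the study's pipeline. The summary does not state the kernel, software, feature selection or parameters that were used, so everything here is an assumption: the data are synthetic, the training routine is a simple stochastic subgradient (Pegasos-style) solver for the linear SVM objective, and all function and variable names are hypothetical.

```python
import random


def simulate_genotypes(n_samples, n_informative, n_noise, rng):
    """Simulate centred genotypes (minor-allele count minus 1) and +/-1 labels.

    Purely synthetic (an assumption for illustration): informative SNPs have
    allele frequency 0.8 in one class and 0.2 in the other; the remaining
    SNPs have frequency 0.5 in both classes.
    """
    X, y = [], []
    for _ in range(n_samples):
        label = rng.choice([-1, 1])  # stand-in for the two subtypes
        freq = 0.8 if label == 1 else 0.2
        row = []
        for j in range(n_informative + n_noise):
            p = freq if j < n_informative else 0.5
            count = (rng.random() < p) + (rng.random() < p)  # 0, 1 or 2 alleles
            row.append(count - 1)  # centre so that no intercept is needed
        X.append(row)
        y.append(label)
    return X, y


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, n_iters=5000, rng=None):
    """Minimise lam/2 * |w|^2 + mean hinge loss by stochastic subgradient steps."""
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    for t in range(1, n_iters + 1):
        i = rng.randrange(len(X))           # pick one training sample
        eta = 1.0 / (lam * t)               # decreasing step size
        margin = y[i] * dot(w, X[i])
        w = [(1.0 - eta * lam) * wi for wi in w]   # shrink (regularisation)
        if margin < 1.0:                    # hinge loss active: move towards sample
            w = [wi + eta * y[i] * xi for wi, xi in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1


def accuracy(w, X, y):
    correct = sum(1 for xi, yi in zip(X, y) if predict(w, xi) == yi)
    return correct / len(y)


rng = random.Random(42)
X_train, y_train = simulate_genotypes(200, 15, 15, rng)
X_test, y_test = simulate_genotypes(100, 15, 15, rng)
w = train_linear_svm(X_train, y_train, rng=rng)
test_accuracy = accuracy(w, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The accuracy is measured on samples that were not used for training, which is the relevant quantity when reporting the proportion of correctly classified samples. With real genotype data the separation between classes would be far weaker than in this simulation, and any SNP selection would need to be done inside the cross-validation loop to avoid an optimistic estimate.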


Life sciences simulation: Bioinformatics, Epidemiology, NextGen Sequencing

Algorithms and computational methods: Machine learning, Support Vector Machine

Computational platforms: Iridis, Windows