Computational Modelling Group

Characterisation of the Genomic Landscape in Splenic Marginal Zone Lymphoma

Started
1st October 2017
Research Team
Carolina Jaramillo Oquendo, Helen Parker
Investigators
Sarah Ennis, Jane Gibson, Jon Strefford

Word cloud of all genes present in the database. The size of each word is proportional to the number of mutations in that gene. The figure highlights the most recurrently mutated genes, such as NOTCH2, KLF2, TNFAIP3 and TP53.

The development and application of massively parallel sequencing have allowed large amounts of genomic information to be generated from patients with cancer. Splenic Marginal Zone Lymphoma (SMZL) remains less studied than other, more common blood cancers, and a heterogeneous diagnosis combined with a superficial understanding of its molecular pathogenesis makes it difficult to establish appropriate treatment and prognosis. Using next-generation sequencing, this project aims to expand the catalogue of mutated genes in SMZL while optimising a bioinformatics pipeline that can identify variants without paired (normal-tumour) samples.

Our biggest challenge is the lack of paired (control) samples, which are crucial for removing noise from the sequencing data. We will therefore use several computational tools to try to eliminate false positive calls. We will also create, annotate and filter a list of previously identified SMZL variants to establish a high-quality, up-to-date SMZL database that will serve as a reference point.
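As an illustration of what filtering without a matched normal can look like, the sketch below applies three simple criteria that are commonly used in tumour-only analyses: read depth, variant allele fraction, and population allele frequency. It is not the project's pipeline. The `Variant` fields, the function names and all threshold values are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One annotated call. Field names are hypothetical, not a real tool's schema."""
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int            # total reads covering the position
    vaf: float            # variant allele fraction in the tumour sample
    population_af: float  # allele frequency in a population database (0.0 if absent)


def is_likely_somatic(v, min_depth=100, min_vaf=0.02, max_population_af=0.001):
    """Return True if the call passes all three illustrative filters.

    The default thresholds are placeholders, not recommended values.
    """
    return (
        v.depth >= min_depth                       # enough coverage to trust the call
        and v.vaf >= min_vaf                       # above an assumed noise floor
        and v.population_af <= max_population_af   # rare in the general population
    )


def filter_tumour_only(variants, **thresholds):
    """Keep only the calls that pass is_likely_somatic, preserving input order."""
    return [v for v in variants if is_likely_somatic(v, **thresholds)]
```

A variant that is common in the general population is more likely to be germline than somatic, which is why the population frequency filter partly stands in for the missing control sample. In practice the thresholds would need to be tuned to the panel and sequencing depth.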

Tumour samples from different centres across Europe will undergo targeted high-throughput sequencing using the HaloPlex HS Target Enrichment System (Agilent), with a panel of 55 target genes relevant in B-cell malignancies. The raw data will be run through an in-house pipeline on Iridis 4, where all samples will be aligned to the latest reference genome. The variants in each sample will then be called using several tools (VarScan, GATK, Pisces, MuTect2) and annotated with predictive scores and information from known databases. Once a list of mutations is established, it will be linked to clinical data with the aim of improving diagnosis, developing staging systems and supporting the development of targeted drugs against this disease.
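One plausible way to combine the output of several variant callers is to keep only the variants reported by a minimum number of them. The sketch below shows that idea on plain Python sets. It assumes each caller's output has already been parsed into `(chrom, pos, ref, alt)` tuples; the function name `consensus_calls` and the `min_callers` parameter are hypothetical and do not describe how the in-house pipeline merges calls.

```python
from collections import defaultdict


def consensus_calls(calls_by_caller, min_callers=2):
    """Return variants supported by at least `min_callers` callers.

    calls_by_caller maps a caller name to a set of (chrom, pos, ref, alt) tuples.
    The result maps each retained variant to the sorted list of supporting callers.
    """
    support = defaultdict(set)
    for caller, variants in calls_by_caller.items():
        for key in variants:
            support[key].add(caller)
    return {
        key: sorted(callers)
        for key, callers in support.items()
        if len(callers) >= min_callers
    }


# Toy input with made-up coordinates, for illustration only.
example = {
    "VarScan": {("chr1", 100, "A", "T"), ("chr1", 200, "C", "G")},
    "GATK": {("chr1", 100, "A", "T")},
    "Pisces": {("chr1", 100, "A", "T"), ("chr2", 300, "G", "A")},
    "MuTect2": {("chr1", 200, "C", "G")},
}
retained = consensus_calls(example, min_callers=2)
```

Raising `min_callers` trades sensitivity for specificity: fewer false positives survive, but variants detected by only one tool are lost. Recording which callers support each variant keeps that trade-off visible for later review.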

Categories

Life sciences simulation: Bioinformatics, NextGen Sequencing

Visualisation and data handling methods: Database

Software Engineering Tools: RStudio

Programming languages and libraries: AWK, Python, R

Computational platforms: Iridis, Linux

Transdisciplinary tags: Quantitative Biology