Identifying variants in next-generation sequencing data of 61 paediatric Inflammatory Bowel Disease patients

1st October 2012
Research Team
Gaia Andreoletti
Sarah Ennis

Structure of NOD2. The frameshift mutation that showed altered allelic frequency in our small cohort is depicted by an arrow.

Ulcerative colitis, Crohn’s disease and indeterminate colitis are forms of inflammatory bowel disease (IBD), an inflammatory auto-immune disorder of the gastrointestinal tract. The incidence of IBD in the UK paediatric population is 5.2 per 100,000 children per year. Next-generation sequencing has become a feasible method for studying the missing heritability not explained by genome-wide association studies. This study aims to identify low-frequency variants that predispose to IBD within a panel of 163 genes, using exome data from paediatric patients. Sixty-one patients were selected for exome sequencing. Genomic DNA was extracted using the salting-out method, and the samples were exome sequenced at the Wellcome Trust Centre in Oxford. On Iridis, the large amount of data generated by exome sequencing, about 30 gigabases of DNA sequence per exome, was aligned and quality controlled using an in-house pipeline and customised scripts. Variants differing from the reference genome (hg19) were annotated for each individual. Across the panel, a total of 681 unique variants were found in 145 genes. We selected 10 genes with high frequencies of variants in our case series. Variants for which genotypic classes (common homozygote, heterozygote and rare homozygote) were observed in more than three individuals were tested for a significantly different allele frequency distribution compared with our Southampton reference database of alleles (n = 97). A subset of these 10 genes shows allele frequency distributions that differ significantly from those of the reference controls, even when tested in a relatively small case series (n = 61). As expected, multiple variants in NOD2, a gene known to be associated with Crohn’s disease, show altered allelic frequency in our paediatric patients.
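
The comparison of allele frequencies between cases and the reference set can be illustrated with a minimal Python sketch. The abstract does not name the statistical test that was used, so the two-sided Fisher's exact test here is an assumption chosen because it suits small cohorts. The function names and all genotype counts are hypothetical and are not results from the study.

```python
from math import comb


def allele_counts(hom_ref, het, hom_alt):
    """Convert genotype counts into (reference, alternate) allele counts."""
    return 2 * hom_ref + het, 2 * hom_alt + het


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical genotype counts for one variant (illustration only):
# (common homozygote, heterozygote, rare homozygote)
case_ref, case_alt = allele_counts(45, 14, 2)   # 61 cases
ctrl_ref, ctrl_alt = allele_counts(90, 7, 0)    # 97 reference individuals

p_value = fisher_exact_two_sided(case_ref, case_alt, ctrl_ref, ctrl_alt)
print(f"case alt freq = {case_alt / (case_ref + case_alt):.3f}")
print(f"ctrl alt freq = {ctrl_alt / (ctrl_ref + ctrl_alt):.3f}")
print(f"Fisher exact p = {p_value:.4g}")
```

Working on allele counts rather than genotype counts doubles the effective sample size, which helps in a small case series. In practice the p-values would also need correcting for the number of variants tested.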


Life sciences simulation: Bioinformatics, Epidemiology, NextGen Sequencing

Programming languages and libraries: Perl, R

Computational platforms: Iridis