Computational Modelling Group

Sample tracking in whole-exome sequencing projects

Homepage: http://genomemedicine.com/content/5/9/89
Started: 1st January 2013
Ended: 31st December 2013
Research Team: Reuben Pengelly
Investigators: Andrew Collins, Sarah Ennis

Relationship between size of simulated datasets and the occurrence of non-unique profiles for several populations. Datasets were simulated from known information about markers included in the panel.

Next-generation sequencing is revolutionising genetic research and has matured to the point where it is entering clinical practice, where the validity of results is paramount. Reporting an incorrect conclusion to a patient's clinical team may misdirect the patient's treatment and, depending on the nature of the error, cause unnecessary stress.

Whole-exome sequencing (WES), in which a subset of a patient's DNA is investigated, is currently a particularly popular methodology. We have designed a panel of intrinsic genetic markers that allows samples to be tracked throughout WES sample processing and that is also applicable to whole-genome sequencing. This panel can be used to identify and prevent errors in linking patients to their data.
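As an illustration of how such a panel might be used to catch a sample mix-up, the sketch below compares a genotype profile obtained independently for a sample against the profile called from its WES data. This is a minimal Python sketch and not the project's own code: the marker identifiers, the `Profile` representation, and the `min_compared` and `max_mismatches` thresholds are all hypothetical choices made for the example.

```python
from typing import Dict, Optional, Tuple

# A profile maps a (hypothetical) marker ID to an unphased genotype call
# such as "AG", or to None when no call could be made at that marker.
Profile = Dict[str, Optional[str]]


def normalise(genotype: str) -> str:
    """Make unphased calls comparable, so that "GA" and "AG" are equal."""
    return "".join(sorted(genotype.upper()))


def concordance(reference: Profile, sequenced: Profile) -> Tuple[int, int]:
    """Return (matching, compared) over markers called in both profiles."""
    matching = 0
    compared = 0
    for marker, ref_call in reference.items():
        seq_call = sequenced.get(marker)
        if ref_call is None or seq_call is None:
            continue  # skip markers missing from either profile
        compared += 1
        if normalise(ref_call) == normalise(seq_call):
            matching += 1
    return matching, compared


def same_sample(reference: Profile, sequenced: Profile,
                min_compared: int = 20, max_mismatches: int = 1) -> bool:
    """Decide whether two profiles plausibly come from the same individual.

    Both thresholds are illustrative assumptions, not validated values.
    """
    matching, compared = concordance(reference, sequenced)
    return compared >= min_compared and (compared - matching) <= max_mismatches
```

Allowing a small number of mismatches leaves room for occasional genotyping errors, while requiring a minimum number of compared markers avoids declaring a match on too little evidence. Suitable values would need to be chosen for the assay and sequencing depth in use.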

Suitable markers were selected by comprehensive filtering of 28,440,783 markers available in public databases. A maximally informative panel of 24 markers was then constructed. The power of this final panel to provide a unique profile for an individual was assessed by simulating 14 populations, which required approximately 28 CPU-days in total on IRIDIS 3. The panel's ability to uniquely identify more than 85,000 individuals in every population was validated, giving a large margin of redundant power for routine use.
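The kind of simulation described above can be sketched as follows. This is a simplified Python illustration, not the published method: it assumes biallelic markers that are independent of one another and in Hardy-Weinberg equilibrium, and the function names and allele frequencies are hypothetical. Each individual's profile is drawn from per-marker allele frequencies, and the number of individuals whose profile is shared with at least one other individual is counted.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Profile = Tuple[int, ...]


def simulate_profiles(n_individuals: int,
                      allele_freqs: Sequence[float],
                      rng: random.Random) -> List[Profile]:
    """Draw one genotype profile per individual.

    Each genotype is the count (0, 1 or 2) of one allele at a marker,
    obtained from two independent draws at that allele's frequency.
    Independence between markers and Hardy-Weinberg proportions are
    assumptions of this sketch.
    """
    profiles = []
    for _ in range(n_individuals):
        profile = tuple(
            int(rng.random() < p) + int(rng.random() < p)
            for p in allele_freqs
        )
        profiles.append(profile)
    return profiles


def count_non_unique(profiles: Sequence[Profile]) -> int:
    """Count individuals whose profile is shared with someone else."""
    counts = Counter(profiles)
    return sum(c for c in counts.values() if c > 1)


if __name__ == "__main__":
    rng = random.Random(2013)          # fixed seed for repeatability
    freqs = [0.5] * 24                 # hypothetical frequencies for 24 markers
    cohort = simulate_profiles(10_000, freqs, rng)
    print(count_non_unique(cohort))
```

Under this idealised model, two unrelated individuals share a genotype at a marker with allele frequency p with probability p^4 + (2p(1-p))^2 + (1-p)^4, which reaches its minimum of 3/8 at p = 0.5. That is one way to see why markers with allele frequencies near 0.5 are the most informative for a panel of this kind. Repeating the simulation over increasing dataset sizes, with frequencies specific to each population, would give a curve of non-unique profiles against dataset size.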

This work has now been published:

Pengelly et al., 2013. A SNP profiling panel for sample tracking in whole-exome sequencing studies. Genome Medicine 5:89. DOI: 10.1186/gm492

Categories

Life sciences simulation: Bioinformatics, Biomedical, NextGen Sequencing

Socio-technological System simulation: Human population

Visualisation and data handling methods: Data Management

Programming languages and libraries: Perl

Computational platforms: Iridis

Transdisciplinary tags: Quantitative Biology, Scientific Computing