Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
BMC Bioinformatics 2012, 13:305 doi:10.1186/1471-2105-13-305
Published: 17 November 2012
Abstract (provisional)
Background
DNA capture technologies combined with high-throughput sequencing now enable cost-effective,
deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP
discovery and genotyping. However, there has been little attention devoted to Copy
Number Variation (CNV) detection from exome capture datasets despite the potentially
high impact of CNVs in exonic regions on protein function.
Results
As members of the 1000 Genomes Project analysis effort, we investigated 697 samples
in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing.
We developed a rigorous Bayesian method to detect CNVs in the genes, based on read
depth within target regions. Despite substantial variability in read coverage across
samples and targeted exons, we were able to identify 107 heterozygous deletions in
the dataset. The experimentally determined false discovery rate (FDR) of the cleanest
dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially
improve the FDR in a subset of gene deletion candidates that were adjacent to another
gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%.
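
The abstract does not spell out the Bayesian model, so the following is only a minimal sketch of how a read-depth posterior over copy number could look. It assumes a Poisson likelihood whose mean scales with copy number, plus a hypothetical prior and background rate; the function names and all numeric values are illustrative and not taken from the paper.

```python
import math


def log_poisson(k, lam):
    """Log of the Poisson probability of observing k reads with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def copy_number_posterior(observed_reads, expected_diploid_reads,
                          prior=None, background=0.01):
    """Posterior over copy number for one gene in one sample.

    Assumption: reads follow a Poisson distribution with mean
    expected_diploid_reads * (cn / 2). The prior and the background
    fraction (which keeps the mean above zero for cn = 0) are hypothetical.
    """
    if prior is None:
        prior = {0: 0.001, 1: 0.01, 2: 0.978, 3: 0.01, 4: 0.001}
    log_post = {}
    for cn, p in prior.items():
        lam = expected_diploid_reads * max(cn / 2.0, background)
        log_post[cn] = math.log(p) + log_poisson(observed_reads, lam)
    # Normalise in log space to avoid underflow.
    peak = max(log_post.values())
    weights = {cn: math.exp(v - peak) for cn, v in log_post.items()}
    total = sum(weights.values())
    return {cn: w / total for cn, w in weights.items()}


# A gene with half the expected depth favours a heterozygous deletion (cn = 1).
posterior = copy_number_posterior(observed_reads=50, expected_diploid_reads=100.0)
best_call = max(posterior, key=posterior.get)
```

Given the substantial coverage variability reported above, a plain Poisson model is likely too narrow for real capture data; an overdispersed likelihood would be a natural substitute in this sketch.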
Conclusions
This study demonstrates that exonic sequencing datasets, collected in both population-based
and medical sequencing projects, will be a useful substrate for detecting genic
CNV events, particularly deletions. Based on the number of events we found and the
sensitivity of the methods in the present dataset, we estimate an average of 16 genic
heterozygous deletions per individual genome. Our power analysis informs ongoing and
future projects about sequencing depth and uniformity of read coverage required for
efficient detection.
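
To illustrate why sequencing depth matters for detection, here is a rough sketch of a power calculation. It is not the paper's power analysis: it assumes Poisson-distributed read counts with no sample-to-sample or exon-to-exon variability, and the names `poisson_cdf`, `deletion_power` and the significance level `alpha` are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space; 0.0 when k < 0."""
    total = 0.0
    for i in range(k + 1):
        total += math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
    return min(total, 1.0)


def deletion_power(expected_diploid_reads, alpha=0.001):
    """Chance of flagging a heterozygous deletion at a given diploid depth.

    The read-count threshold is the largest t for which a normal diploid
    gene falls at or below t with probability <= alpha (assumed model).
    """
    t = -1
    while poisson_cdf(t + 1, expected_diploid_reads) <= alpha:
        t += 1
    if t < 0:
        return 0.0  # depth too low for any threshold to meet alpha
    # A heterozygous deletion halves the expected read count.
    return poisson_cdf(t, expected_diploid_reads / 2.0)


# Power at a few expected per-gene read counts.
power_by_depth = {depth: deletion_power(depth) for depth in (20, 50, 100, 200)}
```

Under these idealised assumptions power rises steeply with depth; uneven coverage would shift the curve, which is why uniformity is mentioned alongside depth.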