Targeted DNA Sequencing
The cost per base of DNA sequence has plummeted over the past decade, enabling the proverbial “$1,000” genome for research and discovery. At the same time, targeted DNA sequencing has become increasingly useful in the clinical marketplace. This is especially true for somatic variation in tumors: because of tumor heterogeneity, a mutation may be present in only a small fraction of the sequenced molecules, so greater sequencing depth is needed to detect low-frequency mutations. The preferred method of targeted sequencing in the clinic today uses multiplex PCR or its derivatives. The major problem with this approach is that it cannot detect novel, complex variation, because each amplicon requires both primers to bind known sequence. Examples include translocations, gene fusions, mobile element rearrangements, and length-altering insertions and deletions. iGenomX has modified its core workflow to provide an amplicon-based targeted sequencing technology that can detect complex variation. The technology uses a two-primer system to retain the specificity of PCR, but allows sequencing from “known to unknown”.
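To make the depth argument concrete, the sketch below estimates the chance of seeing a low-frequency variant at a given depth. It assumes, for illustration only, that variant-supporting reads follow a simple binomial model with no sequencing error, and that a call requires a hypothetical minimum number of supporting reads (`min_alt_reads`); the function name and threshold are illustrative, not part of the iGenomX method.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least min_alt_reads variant reads) under a Binomial(depth, vaf) model.

    Assumption: reads are sampled independently and sequencing error is ignored.
    """
    # Probability of observing fewer than min_alt_reads variant-supporting reads
    p_miss = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_miss


if __name__ == "__main__":
    # Hypothetical 2% variant allele fraction, requiring 5 supporting reads
    for depth in (50, 200, 1000):
        p = detection_probability(depth, vaf=0.02, min_alt_reads=5)
        print(f"depth={depth:5d}  P(detect)={p:.3f}")
```

Under these assumptions, detection probability rises steeply with depth, which is why heterogeneous tumor samples are sequenced more deeply.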
Primer sets are designed on one strand of the DNA double helix, spaced every 50-100 bp across a target locus. The process follows the same three-step workflow as the core iGenomX technology: (1) primer extension and termination, (2) capture and library conversion, and (3) amplification. The workflow is shown below:
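As a rough illustration of single-strand tiling, the sketch below only computes evenly spaced anchor positions across a locus. The `PrimerSite` type, `tile_primer_sites` function, and naming scheme are hypothetical; real primer design would also have to check melting temperature, specificity, and secondary structure, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerSite:
    """One hypothetical primer anchor position on the designed strand."""
    name: str
    start: int   # 0-based coordinate within the reference
    strand: str  # "+" or "-": the single strand all primers are designed on


def tile_primer_sites(locus_start: int, locus_end: int, spacing: int,
                      strand: str = "+", prefix: str = "P") -> list[PrimerSite]:
    """Place evenly spaced anchors on one strand over [locus_start, locus_end)."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if locus_end <= locus_start:
        raise ValueError("locus_end must be greater than locus_start")
    return [
        PrimerSite(f"{prefix}{i}", pos, strand)
        for i, pos in enumerate(range(locus_start, locus_end, spacing), start=1)
    ]


if __name__ == "__main__":
    # A 1,000 bp toy locus tiled at 75 bp, inside the 50-100 bp range quoted above
    for site in tile_primer_sites(0, 1000, 75)[:3]:
        print(site)
```

Because every anchor sits on the same strand, each primer reads outward into sequence that only needs to be known at the priming site.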
Figure 1: iGenomX employs an Assisted De Novo assembly pipeline for data analysis. Synthetic primer sequences are first identified in Read 1, and duplicate reads are removed. Reads are then binned by the primer set of each contiguous locus, the synthetic sequences are trimmed, and the binned reads are assembled de novo.
The example data below come from the TTR gene, which encodes transthyretin, a carrier protein that transports thyroid hormones in plasma and cerebrospinal fluid. The gene is 9,564 bp long. iGenomX primer sets were designed every 300-600 bp across the forward strand, and MiSeq 2 x 75 bp paired-end sequencing was performed to an average depth greater than 200x. A depth of 1x or greater covered 97% of the gene; 86% of the gene was covered by more than 10 reads and 82% by more than 20 reads. The read-level view in IGV is shown below.
Figure 2: Reads from 17 targeted primer pairs across 9.6 kb. Read 1 (red) aligns to the reverse strand and Read 2 (blue) aligns to the forward strand.
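The coverage figures quoted for TTR (97% at 1x or greater, 86% at more than 10 reads, 82% at more than 20 reads) are breadth-of-coverage fractions: the share of gene positions whose read depth passes a threshold. The sketch below shows one way to compute them, assuming alignments are supplied as hypothetical 0-based, half-open `(start, end)` intervals in gene coordinates; it does not read real BAM files.

```python
from itertools import accumulate


def per_base_depth(length: int, alignments: list[tuple[int, int]]) -> list[int]:
    """Per-position read depth from 0-based, half-open [start, end) alignments."""
    diff = [0] * (length + 1)
    for start, end in alignments:
        start, end = max(0, start), min(length, end)  # clip to the gene
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # Running sum of the difference array gives the depth at each position
    return list(accumulate(diff[:length]))


def coverage_summary(depths: list[int]) -> dict[str, float]:
    """Mean depth plus breadth of coverage at the thresholds used for TTR."""
    if not depths:
        raise ValueError("depths must be non-empty")
    n = len(depths)
    return {
        "mean_depth": sum(depths) / n,
        ">=1x": sum(d >= 1 for d in depths) / n,
        ">10x": sum(d > 10 for d in depths) / n,
        ">20x": sum(d > 20 for d in depths) / n,
    }
```

For the TTR example, `length` would be 9,564 and the alignments would come from the mapped reads. The difference-array approach keeps the depth calculation linear in gene length plus read count.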
Figure 3: Local view of a single target locus shows “known to unknown” sequence coverage. Complex variation downstream of the primer pair can be assembled de novo.
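The “known to unknown” idea can be illustrated with a toy greedy seed-and-extend assembly: start from sequence known at the priming site and repeatedly append the read with the longest suffix-prefix overlap. This is a named, swapped-in technique for illustration only, not the iGenomX Assisted De Novo assembler; it assumes error-free reads in a single orientation and does not resolve repeats (the `max_rounds` cap only prevents endless extension through periodic sequence).

```python
def best_extension(contig: str, reads: list[str], min_overlap: int) -> str:
    """Return the bases added by the read with the longest contig-suffix overlap."""
    best, best_olap = "", 0
    for read in reads:
        # Overlap must leave at least one new base to add
        max_olap = min(len(contig), len(read) - 1)
        for olap in range(max_olap, min_overlap - 1, -1):
            if contig.endswith(read[:olap]):
                if olap > best_olap:
                    best_olap, best = olap, read[olap:]
                break  # longest overlap for this read found
    return best


def extend_from_anchor(anchor: str, reads: list[str],
                       min_overlap: int = 4, max_rounds: int = 1000) -> str:
    """Greedily extend known anchor sequence into unknown sequence."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    contig = anchor
    for _ in range(max_rounds):
        ext = best_extension(contig, reads, min_overlap)
        if not ext:
            break  # no read extends the contig further
        contig += ext
    return contig
```

For example, starting from the hypothetical anchor `"GATTACA"` with reads `"TACAGGCT"`, `"GGCTTAAC"`, and `"TAACCGT"`, the contig grows into the downstream sequence one overlap at a time, which is the same principle that lets sequence beyond the primer be reconstructed without a reference.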