Single Molecule Assembly
Because of the complexity of the human genome, the current standard practice is to align sequencer reads to a human reference genome. This approach, called re-sequencing, reduces the capacity requirements of both the sequencer and the computer, and therefore the cost. However, re-sequencing does not capture long-range information such as structural variation and phase. Linked reads have been used for nearly two decades to improve human genome assemblies built from short-read sequencing, and technologies such as mate-pair library generation, synthetic long reads, chromatin proximity ligation and micro-droplet technology have been developed for this purpose. Micro-droplet technology allows a long distance between linked reads (>100 kb) and more linked reads per long DNA molecule (10-40). However, current technologies are limited to use as a scaffold for a standard short-read genome assembly because of sequence bias, error propagation and sequencing artifacts. As a result, linked-read assembly requires combining two data sets from two different libraries and two sequencing runs, at nearly three times the cost of a standard short-read genome assembly. In contrast, when iGenomX master mix is used with high-throughput micro-droplet technology for linked-read analysis, the result is a high-quality genome assembly from a single technology that provides both variant identification and linked-read information.
Table 1: Variant calling of NA12878 following GATK Best Practices.
Table 2: Phasing statistics on NA12878.
Table 3: Variant calling and phase statistics on additional samples.