Whole Genome Sequencing

Whole genome sequencing provides the most comprehensive view of an organism's genetic code. The size and complexity of the genome being studied determine the capacity required of both the sequencer and the computer. Small genomes with little repetitive sequence are easier to assemble; larger, more complex genomes require more capacity. Uniformity and completeness of sequence coverage are therefore extremely important because they directly affect cost and data quality.

Small genomes (e.g. bacteria)

E. coli strain ATCC 11303 has a 4,641,652 bp genome with 50.67% GC content. Technical replicates with varying amounts of input DNA were sequenced on a MiSeq with 2x75 paired-end reads.
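For readers who want to reproduce a GC figure like the one quoted above, the following is a minimal Python sketch. The function name `gc_percent` is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption, not something stated in the source.

```python
def gc_percent(sequence):
    """Return the GC content of a DNA sequence as a percentage.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so
    ambiguous symbols such as N do not affect the result.
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence for illustration only (not E. coli data).
toy_gc = gc_percent("ATGCatgcNN")
```

In practice the sequence would be read from a FASTA file of the assembled or reference genome.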

Table 1: Coverage statistics

Figure 1: Percent of genome covered at a fraction of mean coverage (coverage uniformity).
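The uniformity metric in Figure 1 can be computed from per-base read depths. The sketch below is one way to do this in Python, assuming the depths are already available as a list of integers (for example, parsed from the output of a tool such as `samtools depth -a`). The function name `percent_covered` and the chosen fractions are illustrative assumptions, not the method used to produce the figure.

```python
def percent_covered(depths, fractions=(0.1, 0.2, 0.5, 1.0)):
    """Map each fraction f to the percentage of positions whose depth
    is at least f times the mean depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        # No reads at all: nothing is covered at any fraction of the mean.
        return {f: 0.0 for f in fractions}
    result = {}
    for f in fractions:
        threshold = f * mean_depth
        covered = sum(1 for d in depths if d >= threshold)
        result[f] = 100.0 * covered / len(depths)
    return result


# Toy per-base depths for a 10 bp "genome" (mean depth is 10).
toy_depths = [0, 5, 10, 10, 10, 10, 10, 10, 15, 20]
uniformity = percent_covered(toy_depths)
```

A perfectly uniform library would cover 100% of positions at every fraction up to 1.0; the faster the curve drops as the fraction rises, the less uniform the coverage.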

Customer experience:
"Company" is an agricultural genomics company. The team at "Company" applied iGenomX whole genome library construction to 22 different bacterial strains of varying GC composition, including organisms with extreme GC content. 50 ng of input DNA was used as starting material. The average N50 contig length was 171,578 bp.

"No GC bias was observed (better than Nextera). Coverage and uniformity are similar to Nextera. The process is very simple and would be easy to automate."

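The N50 contig length reported in the customer results above is the length of the contig at which, going from longest to shortest, the running total first reaches half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the toy contig lengths are hypothetical and are not the customer's data.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths in bp."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy assembly: total 300 bp, so the half-way point is 150 bp.
toy_n50 = n50([100, 80, 60, 40, 20])
```

A higher N50 indicates that more of the assembly sits in long contigs.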
Large Genome Sequencing:

The human genome contains 3.2 billion base pairs. An input of 2 ng of genomic DNA from the human Coriell sample NA12878 was sequenced on a single NextSeq 500 flow cell with 2x150 paired-end reads. Genotypes for the iGenomX data were called with FreeBayes and compared to the NIST “Genome in a Bottle” (GIAB) high-quality variant set.

Table 2: Coverage Statistics

Table 3: Variant Statistics

Call rate is defined as the percentage of known high-quality variants in the NA12878 GIAB data set that were correctly called by iGenomX technology.
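The call rate definition above can be sketched in Python as follows. This assumes both variant sets have already been loaded into dictionaries keyed by (chromosome, position, ref, alt) with a genotype string as the value, and it treats "correctly called" as a matching site with a matching unphased genotype. That interpretation, and the names `normalize_genotype` and `call_rate`, are assumptions for illustration rather than the exact comparison used for Table 3.

```python
def normalize_genotype(genotype):
    """Treat '0/1', '1/0' and '0|1' as the same unphased genotype."""
    return tuple(sorted(genotype.replace("|", "/").split("/")))


def call_rate(truth, calls):
    """Percentage of truth-set variants present in `calls` with the
    same (unphased) genotype.

    truth, calls: dict mapping (chrom, pos, ref, alt) -> genotype string.
    """
    if not truth:
        raise ValueError("truth set must not be empty")
    correct = 0
    for site, genotype in truth.items():
        called = calls.get(site)
        if called is not None and normalize_genotype(called) == normalize_genotype(genotype):
            correct += 1
    return 100.0 * correct / len(truth)


# Toy data for illustration only (not NA12878 variants).
toy_truth = {
    ("chr1", 100, "A", "G"): "0/1",
    ("chr1", 200, "C", "T"): "1/1",
    ("chr2", 300, "G", "A"): "0/1",
    ("chr2", 400, "T", "C"): "0/1",
}
toy_calls = {
    ("chr1", 100, "A", "G"): "1|0",  # same genotype, phased notation
    ("chr1", 200, "C", "T"): "1/1",
    ("chr2", 300, "G", "A"): "1/1",  # site found, genotype differs
    ("chr2", 999, "T", "C"): "0/1",  # not in the truth set
}
toy_call_rate = call_rate(toy_truth, toy_calls)
```

Extra calls that are absent from the truth set do not change the call rate; they would be counted by a separate false-positive metric.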

1. Ti/Tv Reference: Liu, Qi et al. “Steps to Ensure Accuracy in Genotype and SNP Calling from Illumina Sequencing Data.” BMC Genomics 13.Suppl 8 (2012): S8. PMC. Web. 7 Oct. 2015
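The Ti/Tv figure referenced in the footnote above is the ratio of transitions (A<->G and C<->T substitutions) to transversions (all other single-base substitutions) among the called SNVs. A minimal Python sketch, assuming the calls have been reduced to (ref, alt) base pairs; the function name `ti_tv_ratio` and the toy input are hypothetical.

```python
# Transitions swap a purine for a purine or a pyrimidine for a pyrimidine.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ti_tv_ratio(snvs):
    """Return transitions / transversions for an iterable of (ref, alt) pairs."""
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        pair = (ref.upper(), alt.upper())
        if pair[0] == pair[1]:
            continue  # not a substitution; skip
        if pair in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; ratio is undefined")
    return transitions / transversions


# Toy SNV list: three transitions and two transversions.
toy_ti_tv = ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")])
```

A Ti/Tv value that departs strongly from the expected range for the organism is commonly read as a sign of false-positive calls.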

Because of the complexity of the human genome, the current standard practice is to align sequencer reads to a human reference genome. This approach, called re-sequencing, reduces the capacity requirements of both the sequencer and the computer, and therefore the cost. However, re-sequencing does not capture long-range information such as structural variation and phase information.

Learn how iGenomX technology applies to single molecule assembly.