NEXT GENERATION GENOTYPING USING RIPTIDE™
Performance specifications when best practices are applied
Keith Brown (3), Azeem Siddique (1,3), Gaia Suckow (1,3), Nils Homer (2), Jay Carey (2), Phillip Ordoukhanian (1,3), Steve Head (1,3), Joseph Pickrell (5), Ryan Kim (5)
(1) The Scripps Research Institute, La Jolla, CA, USA; (2) Fulcrum Genomics, Somerville, MA, USA; (3) iGenomX, Carlsbad, CA, USA; (4) Macrogen, Rockville, MD, USA; (5) Gencove, New York, NY, USA
In this study, 384 genomic DNA samples were prepared as multiplexed sequencing libraries using the Riptide™ high-throughput library preparation kit and sequenced on four lanes of a NovaSeq S4 flow cell with 2 × 150 bp paired-end reads (96 samples per lane). The samples comprise 96 Japanese (JPT), 96 Yoruba (YRI) and 96 Wellderly Study (WELL) individuals; the JPT samples were processed in duplicate, giving 384 libraries in total. Because the WELL dataset consists mostly of individuals of European descent, the JPT and YRI populations were chosen as a “stress test” of the application.
Data from the sequencer were demultiplexed using FGxTools, and the individual FASTQ files were uploaded to Gencove for VCF generation. VCF and BAM files were then downloaded from the Gencove site to provide the following WGS and genotyping statistics on 38 million bi-allelic variants per sample. VCF concordance was evaluated using RTG Tools vcfeval.
Note: JPT and YRI samples are included in the reference set used for imputation. WELL sample performance will be presented separately.
The mean read count per sample was 46.5M, which resulted in a mean coverage of 1.75x per sample. Of the 38M Gencove variant calls, approximately 146K are on the Illumina GSA array and fall within the NIST GIAB high-confidence regions on genome build GRCh37. Precision, TP/(TP+FP); sensitivity, TP/(TP+FN); and accuracy (F-measure) are plotted below for both the aggregated data and the individual samples across the 288 YRI and JPT samples. Precision by allele frequency is also shown.
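The precision and sensitivity definitions above, together with the F-measure, can be sketched in a few lines of Python. This is a minimal illustration and not the RTG Tools implementation: the function name `concordance_metrics` and the example counts are hypothetical, and the F-measure is assumed to be the harmonic mean of precision and sensitivity.

```python
def concordance_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, sensitivity and F-measure from variant counts.

    tp: true positives (calls that match the truth set)
    fp: false positives (calls absent from the truth set)
    fn: false negatives (truth-set variants that were not called)
    """
    # Precision: fraction of called variants that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity (recall): fraction of true variants that were called.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: assumed here to be the harmonic mean of the two.
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


# Hypothetical counts for illustration only; not taken from this study.
metrics = concordance_metrics(tp=9500, fp=250, fn=500)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The guards against zero denominators return 0.0 for samples with no calls in the evaluated regions, so the function never raises a division error; whether 0.0 is the right convention for such samples is a design choice.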
Table 1: Concordance values on approximately 3.5M called variant positions in two replicate samples.
• To achieve the best balance of throughput, cost and data quality, we recommend sequencing 96 samples per NovaSeq lane (384 per flow cell) when genotyping human genomes with this approach.
• For SNP discovery, GWAS and applications involving non-human sample types, please contact us to discuss your study design and sequencer requirements.