Figure 1.

A comparison of actual sequencing capacity (orange) with what would be expected if sequencing technology were following Moore’s Law (blue). The significant decrease in 2007 coincides roughly with the introduction of next-generation sequencing technology. Data are from the National Human Genome Research Institute (Wetterstrand, 2012).
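The Moore's Law reference curve in a comparison of this kind can be sketched as a simple exponential. The snippet below is only an illustration: the two-year doubling period, the function name `moore_expected`, and the baseline value are assumptions, not values taken from the NHGRI data.

```python
def moore_expected(baseline: float, years_elapsed: float,
                   doubling_period: float = 2.0) -> float:
    """Value expected after `years_elapsed` years if the quantity doubles
    every `doubling_period` years (assumed here to be 2 years)."""
    if doubling_period <= 0:
        raise ValueError("doubling_period must be positive")
    return baseline * 2 ** (years_elapsed / doubling_period)


# Hypothetical baseline of 1 unit of capacity in the starting year.
for years in (0, 2, 4, 10):
    print(years, moore_expected(1.0, years))
```

Plotting measured values against this curve on a logarithmic axis makes any departure from the Moore's Law trend appear as a change in slope.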

 


Figure 2.

Schematic overview of steps in genotyping-by-sequencing (GBS) library construction, sequencing, and analysis. (1) Genomic DNA (gDNA) is quantified using a fluorescence-based method. (2) The gDNA is normalized in a new plate. Normalization is needed to ensure equal representation of all samples and equal molarity of gDNA and adapters. (3) A master mix with restriction enzyme(s) and buffer is added to the plate and incubated. (4) The DNA barcoded adapters are added along with ligase and ligation buffers. (5) Samples are pooled and cleaned. (6) The GBS library is amplified by polymerase chain reaction (PCR). (7) The amplified library is cleaned and evaluated on a capillary sizing system. (8) Libraries are sequenced. Data analysis: Following a sequencing run, FASTQ files containing raw data from the run are used to assign sequencing reads to samples using the DNA barcode sequence. Once assigned to individual samples, the reads are aligned to a reference genome. In the case of species without a complete reference genomic sequence, reads are internally aligned (alignment of all sequence reads with all other reads from that library) and single nucleotide polymorphisms (SNPs) are identified from 1 or 2 bp sequence mismatches. Various filtering algorithms can then be used to distinguish true biallelic SNPs from sequencing errors.
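The data analysis steps described in this caption can be illustrated with a minimal sketch. It is not the pipeline used by the authors: the function names, the exact-prefix barcode matching, and the all-against-all comparison are assumptions chosen for clarity, and real pipelines also handle sequencing errors in barcodes, restriction-site remnants, and quality filtering.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    `barcodes` maps barcode sequence -> sample name. Longer barcodes are tried
    first so that a short barcode that is a prefix of a longer one does not
    capture the read. Reads with no matching barcode are dropped.
    """
    ordered = sorted(barcodes.items(), key=lambda kv: len(kv[0]), reverse=True)
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in ordered:
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
    return dict(by_sample)


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def candidate_snp_pairs(tags, max_mismatch=2):
    """Return pairs of equal-length tags that differ at 1..max_mismatch positions.

    This stands in for internal alignment in a species without a reference
    genome; it is quadratic in the number of tags, so it suits small examples only.
    """
    pairs = []
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if len(a) == len(b) and 1 <= mismatches(a, b) <= max_mismatch:
                pairs.append((a, b))
    return pairs


# Hypothetical reads and barcodes, for illustration only.
reads = ["ACGTTTGACC", "ACGTTTGACA", "GGATTTGACC", "TTTTAAAA"]
barcodes = {"ACGT": "sample1", "GGA": "sample2"}
print(demultiplex(reads, barcodes))
print(candidate_snp_pairs(["TTGACC", "TTGACA", "AAAAAA"]))
```

Pairs returned by `candidate_snp_pairs` are only candidates; as the caption notes, further filtering is needed to separate true biallelic SNPs from sequencing errors.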

 


Figure 3.

Integration of genotyping-by-sequencing (GBS) in the context of plant breeding and genomics for a species without a completed reference genome.

 


Figure 4.

Removal of missing data in genotyping-by-sequencing by increasing coverage of the library via resequencing. In a set of international wheat breeding germplasm, several lines (samples) were replicated across two or more libraries. Replicating a sample two times increased the coverage of single nucleotide polymorphisms (SNPs) to 60%, while five replications increased the coverage to over 90%. Although very effective as a means of removing missing data, replicated sequencing increases the per-sample cost. The average per-sample cost is $15. In this situation for wheat, the number of replications is roughly equivalent to the sequencing coverage of the library (i.e., 5 replications give approximately 5x coverage). Data from J. Poland (unpublished data, 2012).