The prominence of quinoa in organic food markets has led to increasing attention by scientists to the crop’s unique nutritional benefits, including an ideal amino acid balance in the seed, lack of gluten, and potentially novel abiotic stress tolerance mechanisms (Vacher, 1998; Maughan et al., 2009a; Morales et al., 2011). Heightened awareness of quinoa’s role in food security issues in Andean South America, specifically as a major protein source for subsistence farmers and as a cash crop for marginal highland soils, has led to an increased interest in quinoa and the establishment of several new breeding programs in Andean South America. In addition to germplasm conservation, principal objectives of these programs include enhancing grain yield, disease resistance, drought tolerance, and heat tolerance during anthesis and modifying saponin content (Ochoa et al., 1999). Breeders in these programs recognize that the development and use of molecular markers are critical to meeting these objectives (A. Gandarillas, personal communication, 2011). Notable is the 2011 declaration of the Food and Agriculture Organization that 2013 is the official International Year of Quinoa with the main objective to “promote the benefits, characteristics and potential use of quinoa in the fight against hunger and malnutrition, as a contribution to a global strategy on food security” (United Nations, 2011).
Genetic markers are essential tools for modern plant breeding research programs (Eathington et al., 2007). They are particularly important for germplasm conservation and core-collection characterization and in breeding applications, such as marker-assisted selection (Ganal et al., 2009). In quinoa, Mason et al. (2005) developed the first set of 208 polymorphic simple sequence repeat (SSR) markers while Jarvis et al. (2008) reported the development of an additional 216 new SSR markers and a linkage genetic map consisting of 275 molecular markers, including 200 SSR markers. Unfortunately the map was incomplete as it consisted of 38 linkage groups (n = 18) covering just over 900 cM.
Single nucleotide polymorphisms (SNPs), defined as single-base changes, are quickly becoming the marker system of choice in plant breeding programs. Single nucleotide polymorphisms are the most abundant type of sequence variation in eukaryotic genomes (Garg et al., 1999; Batley et al., 2003). They can be cost-effectively discovered and genotyped using several next-generation technologies, including bead arrays (Shen et al., 2005), nano-fluidic devices (Wang et al., 2009), and genotyping-by-sequencing (Miller et al., 2007; Maughan et al., 2010; Elshire et al., 2011). Indeed, SNPs have already been used in a wide array of research areas, ranging from genomewide association studies (Aranzana et al., 2005; Huang et al., 2010; Poland et al., 2011) to dense linkage map development (Troggio et al., 2007; Close et al., 2009).
The goals of this project were to (i) identify the first large scale set of SNPs for quinoa, (ii) develop functional SNP assays for quinoa, (iii) evaluate the informativeness (minor allele frequency) of the SNPs on a diversity panel consisting of accessions from the Internal Potato Center (CIP) international quinoa nursery, USDA germplasm collection, and the Brigham Young University Cheonopodium collection, and (iv) construct the first SNP based genetic linkage map for quinoa.
Materials and Methods
Plant Materials and DNA Extraction
Eight quinoa accessions were used in the initial genomic reduction experiment to identify SNPs (Table 1). The accessions represent the broad geographical distribution of quinoa (Andean, Valley, and coastal ecotypes) and include the parents of five quinoa mapping populations (designated as population [Pop] 1, Pop39, Pop40, PopM3, and PopGO; Table 2). Population 1 and Pop39 are the most advanced populations and were used for linkage map development. Both populations segregated for the presence of an easily identifiable dominant maker (stem color) that facilitated the identification of true hybrid F1 plants and were selfed to F2:8 recombinant inbred lines (RILs). The germplasm diversity panel included 113 quinoa accessions representing the USDA quinoa collection, CIP international quinoa nursery, and eight additional accessions representing several related Chenopodium taxa (i.e., C. hircinum Schrad., C. berlandieri, C. watsonii A. Nelson, and C. ficifolium Sm.). Accessions for the diversity panel from external sources were kindly provided by David Brenner (USDA, Iowa State University, Ames, IA), Angel Mujica (Universidad del Altiplano, Puno, Peru), Daniel Bertero (University of Buenos Aires, Argentina), Eulogio de la Cruz (National Institute of Nuclear Research [ININ], Ocoyoacac, Mexico), and Helena Storchova (Institute of Experimental Botany, Prague, Czech Republic). All plants were greenhouse grown at 25°C under broad-spectrum halogen lamps with a 12-h photoperiod in 15-cm pots using Sunshine Mix II (Sun Grow) supplemented with Osmocote fertilizer (Scotts) at Brigham Young University, Provo, UT.
|Name||Species†||Coded name||Passport origin||Source‡|
|PI 614881||C. quinoa||A1||Jujuy, Argentina||USDA-NPGS|
|PI 614883||C. quinoa||A2||Jujuy, Argentina||USDA-NPGS|
|PI 614884||C. quinoa||A3||Jujuy, Argentina||USDA-NPGS|
|PI 587173||C. quinoa||A4||Jujuy, Argentina||USDA-NPGS|
|Jujuy||C. quinoa||A_Jujuy||Jujuy, Argentina||CIP-FAO|
|PI 614902||C. quinoa||B2||Oruro, Bolivia||USDA-NPGS|
|PI 614904||C. quinoa||B3||Oruro, Bolivia||USDA-NPGS|
|PI 614905||C. quinoa||B4||Oruro, Bolivia||USDA-NPGS|
|PI 614907||C. quinoa||B6||Oruro, Bolivia||USDA-NPGS|
|PI 614909||C. quinoa||B8||Oruro, Bolivia||USDA-NPGS|
|PI 614910||C. quinoa||B9||Oruro, Bolivia||USDA-NPGS|
|PI 614911||C. quinoa||B10||Oruro, Bolivia||USDA-NPGS|
|PI 614912||C. quinoa||B11||Oruro, Bolivia||USDA-NPGS|
|PI 614915||C. quinoa||B13||Oruro, Bolivia||USDA-NPGS|
|PI 614916||C. quinoa||B14||Oruro, Bolivia||USDA-NPGS|
|PI 614919||C. quinoa||B15||Oruro, Bolivia||USDA-NPGS|
|PI 614920||C. quinoa||B16||Oruro, Bolivia||USDA-NPGS|
|PI 614927||C. quinoa||B23||La Paz, Bolivia||USDA-NPGS|
|PI 614928||C. quinoa||B24||La Paz, Bolivia||USDA-NPGS|
|PI 614929||C. quinoa||B25||La Paz, Bolivia||USDA-NPGS|
|PI 614931||C. quinoa||B27||Oruro, Bolivia||USDA-NPGS|
|PI 614932||C. quinoa||B28||Oruro, Bolivia||USDA-NPGS|
|PI 614933||C. quinoa||B29||Oruro, Bolivia||USDA-NPGS|
|PI 614934||C. quinoa||B30||Oruro, Bolivia||USDA-NPGS|
|PI 614935||C. quinoa||B31||Oruro, Bolivia||USDA-NPGS|
|PI 614936||C. quinoa||B32||Oruro, Bolivia||USDA-NPGS|
|PI 614937||C. quinoa||B33||Oruro, Bolivia||USDA-NPGS|
|PI 614938||C. quinoa||B34||Oruro, Bolivia||USDA-NPGS|
|PI 478415||C. quinoa||B35||La Paz, Bolivia||USDA-NPGS|
|PI 478418||C. quinoa||B36||Potosi, Bolivia||USDA-NPGS|
|PI 478410||C. quinoa||B37||La Paz, Bolivia||USDA-NPGS|
|PI 478414||C. quinoa||B38||La Paz, Bolivia||USDA-NPGS|
|PI 614002||C. quinoa||B39||Cochabamba, Bolivia||USDA-NPGS|
|Ames 13215||C. quinoa||B40||La Paz, Bolivia||USDA-NPGS|
|PI 478408||C. quinoa||B41||La Paz, Bolivia||USDA-NPGS|
|Ames 13217||C. quinoa||B42||La Paz, Bolivia||USDA-NPGS|
|Ames 13218||C. quinoa||B43||La Paz, Bolivia||USDA-NPGS|
|Ames 13219||C. quinoa||B44||La Paz, Bolivia||USDA-NPGS|
|Jaccha Grano||C. quinoa||B_JacchaGrano||Bolivia||PROINPA|
|Real||C. quinoa||B_Real||Oruro, Bolivia||CIP|
|Ames 22153||C. quinoa||C1||Pichilemu, Chile||USDA-NPGS|
|Ames 22154||C. quinoa||C2||Cajon, Chile||USDA-NPGS|
|Ames 22155||C. quinoa||C3||Pichaman, Chile||USDA-NPGS|
|Ames 22156||C. quinoa||C4||Cajon, Chile||USDA-NPGS|
|Ames 22157||C. quinoa||C5||Lo Valdivia, Chile||USDA-NPGS|
|Ames 22158||C. quinoa||C6||Llico, Chile||USDA-NPGS|
|Ames 22159||C. quinoa||C7||Bucalemu, Chile||USDA-NPGS|
|Ames 22160||C. quinoa||C8||Iloca, Chile||USDA-NPGS|
|Ames 22161||C. quinoa||C9||Llico, Chile||USDA-NPGS|
|PI 614880||C. quinoa||C10||Los Lagos, Chile||USDA-NPGS|
|PI 614880||C. quinoa||C10b||Los Lagos||USDA-NPGS|
|PI 614882||C. quinoa||C11||La Araucania, Chile||USDA-NPGS|
|PI 614885||C. quinoa||C12||Bio-Bio, Chile||USDA-NPGS|
|PI 614886||C. quinoa||C13||Maule, Chile||USDA-NPGS|
|PI 614887||C. quinoa||C14||Bio-Bio, Chile||USDA-NPGS|
|PI 614888||C. quinoa||C15||Bio-Bio, Chile||USDA-NPGS|
|PI 614889||C. quinoa||C16||Bio-Bio, Chile||USDA-NPGS|
|PI 433232||C. quinoa||C17||Groben, Chile||USDA-NPGS|
|PI 584524||C. quinoa||C18||Chillan, Chile||USDA-NPGS|
|RU-2||C. quinoa||CENG_RU-2||England–Chilean origin||CIP|
|C. quinoa||CHOL_NL6||Holland–Chilean origin||CIP|
|PI 596293||C. quinoa||CO1||Colorado, U.S.||USDA-NPGS|
|G-205-95DK||C. quinoa||C_G20595DK||Denmark–Chilean origin||PROINPA|
|Ames 13228||C. quinoa||E1||Otavalo, Equador||USDA-NPGS|
|NSL 86628||C. quinoa||MD1||Maryland, U.S.||USDA-NPGS|
|PI 510532||C. quinoa||P1||Puno, Peru||USDA-NPGS|
|PI 510533||C. quinoa||P2||Puno, Peru||USDA-NPGS|
|PI 510536||C. quinoa||P3||Puno, Peru||USDA-NPGS|
|PI 510537||C. quinoa||P4||Puno, Peru||USDA-NPGS|
|PI 510543||C. quinoa||P5||Puno, Peru||USDA-NPGS|
|PI 510547||C. quinoa||P6||Puno, Peru||USDA-NPGS|
|PI 510551||C. quinoa||P8||Puno, Peru||USDA-NPGS|
|PI 596498||C. quinoa||P9||Cusco, Peru||USDA-NPGS|
|PI 510542||C. quinoa||P10||Puno, Peru||USDA-NPGS|
|PI 510540||C. quinoa||P11||Puno, Peru||USDA-NPGS|
|PI 510550||C. quinoa||P12||Puno, Peru||USDA-NPGS|
|PI 510545||C. quinoa||P13||Puno, Peru||USDA-NPGS|
|PI 510548||C. quinoa||P14||Puno, Peru||USDA-NPGS|
|Ames 26191||C. quinoa||P15||Puno, Peru||USDA-NPGS|
|PI 510546||C. quinoa||P16||Puno, Peru||USDA-NPGS|
|C. quinoa||P_0654||Puno, Peru||PROINPA|
|03-21-072RM||C. quinoa||P_0321072RM||Puno, Peru||CIP|
|03-21-079BB||C. quinoa||P_0321079BB||Puno, Peru||CIP|
|CICA-17||C. quinoa||P_CICA17||Cusco, Peru||CIP|
|E-DK-4||C. quinoa||P_EDK4||Denmark–Peruvian origin||CIP|
|C. quinoa||P_G20595||Peruvian origin||CIP|
|Huariponcho||C. quinoa||P_Huariponcho||Puno, Peru||CIP|
|Illpa||C. quinoa||P_Illpa||Puno, Peru||CIP|
|Kancolla||C. quinoa||P_Kancolla||Puno, Peru||CIP|
|Salcedo||C. quinoa||P_Salcedo||Puno, Peru||CIP|
|NSL 86649||C. quinoa||SC1||South Carolina, U.S.||USDA-NPGS|
|Ames 19047||C. quinoa||TX1||Texas, U.S.||USDA-NPGS|
|NSL 92331||C. quinoa||WA1||Washington, U.S.||USDA-NPGS|
|Ames 29207||C. berlandieri var. macrocalycium||Ames 29207||Maine, U.S.||USDA-NPGS|
|BYU 567||C. berlandieri subsp. nuttalliae||BYU 567||Mexico||ININ-Mexico|
|Ames 29307 BYU 802||C. berlandieri var. boscianum||BYU 802||Louisiana, U.S.||USDA-NPGS|
|BYU 652||C. berlandieri var. zschackei||BYU 652||Utah, U.S.||BYU|
|BYU 943||C. ficifolium||BYU 943||Czech Republic||IEB|
|BYU 1101||C. hircinum||BYU 1101||Argentina||UBA|
|BYU 1102||C. hircinum||BYU 1102||Argentina||UBA|
|BYU 839||C. watsonii||BYU 839||New Mexico (2X)||BYU|
|Population (Pop)†||SNPs‡||Unique contigs‡||SNP base coverage||Minor allele frequency range
||SNP type (%)
Total genomic DNA was extracted from 30 mg of freeze-dried leaf tissue from a single plant (diversity panel and RIL populations) as previously described by Sambrook et al. (1989) with modifications described by Todd and Vodkin (1996). Extracted DNA was quantified using a Nanodrop (ND 1000 Spectrophotometer, NanoDrop Technologies Inc., Montchanin, DE) and diluted to 150 ng μL−1 in deoxyribonuclease (DNase)-free water.
The genomic reduction protocol used was described in detail by Maughan et al. (2009b). In brief, a total of 450 ng of total genomic DNA of each the eight DNA samples were separately double digested for 1 h at 37°C with 3 U of the restriction enzymes EcoRI and BfaI (New England Biolabs) in 1x NEB (New England Biolabs) #4 restriction buffer. The resultant DNA fragments were immediately ligated with 1.5 μM 5´-TEG (tetra-ethyleneglycol) biotinylated and 3´-phosporylated EcoRI adapters and 15 μM 3´-phosphorylated BfaI adapters using 3 U of T4 ligase (New England Biolabs) (Supplemental Table S1) at 16°C for 3 h. The DNA fragments smaller than 100 bp were excluded from the reactions via spin chromatography using Chroma Spin-400 columns (ClonTech Laboratories). Non-biotin labeled DNA fragments (BfaI-BfaI restriction fragments) were removed from the reaction using M-280 streptavidin beads (Invitrogen) according to the manufacture’s specifications. The remaining EcoRI-BfaI and EcoRI-EcoRI fragments, with the biotin label still attached, were resuspended in 100 μL of 10:1 Tris-ethylenediaminetetraacetic acid (EDTA) buffer (10 mM Tris and 1 mM EDTA, pH 8.0).
Eight primer pairs, designed to be complementary to the EcoRI and BfaI adaptor sequences and to carry unique 5´ 10-base multiplex identifier (MID) barcode sequences, were synthesized by Integrated DNA Technologies (Supplemental Table S1). A single MID barcode primer pair was used to amplify 1 μL of the streptavidin-cleaned DNA fragments for each of the eight quinoa SNP discovery samples. Amplification of each sample was performed in 50 μL polymerase chain reactions (PCRs) using 1x Advantage HF 2 PCR Master Mix (ClonTech) and 0.2 μM of the MIDX-EcoRI and MIDX-BfaI primer pairs, with the following thermocycling profile: 95°C for 1 min followed by 22 cycles of 95°C for 15 s, 65°C for 30 s, and 68°C for 2 min. The DNA concentrations for each of the PCRs were determined fluorometrically using Quant-iT picogreen dye (Invitrogen) and were used to construct a 5 μg pool with equimolar concentrations of each of the eight quinoa samples. The pooled sample was electrophoresized on a 1.5% Meaphor agarose gel (Cambrex BioScience) in 0.5x Tris-acetate-EDTA at 40 V for 8 h and visualized with ethidium bromide staining. A gel slice representing DNA fragments ranging from ∼500 to 650 bp was removed and the DNA fragments extracted using a Qiaquick column (Qiagen) and sequenced using standard protocols for 454-pyrosequencing as a service at the Brigham Young University DNA Sequencing Center (Provo, UT) using a Roche-454 GS FLX instrument and Titanium reagents without DNA fragmentation.
Single Nucleotide Polymorphism Discovery
The DNA reads from the454-pyrosequencing runs were bioinformatically trimmed and separated into unique MID barcode pools representing the eight accessions used in the SNP discovery experiment using the process-tagged sequences function in CLCBio Workbench (CLC bio, 2012). For SNP discovery, DNA sequence reads were de novo assembled using the Roche Newbler assembler (version 2.3) (Roche, 2010) with the minimum overlap length set to 50 bp, the minimum overlap identity set to 95%, and the minimum contig length set to ≥200 bp. Putative SNPs within contigs were identified from the exported.ace file using custom Perl scripts (Stajich et al., 2002; Maughan et al., 2009b) when (i) coverage depth at the SNP was ≥6, (ii) the minor allele frequency (MAF) represented at least 20% of the reads, and (iii) 100% of the alleles within a parental pool were identical. All SNPs, including 5´ and 3´ sequence information, have been deposited in the Single Nucleotide Polymorphism database (dbSNP) in GenBank. The SNPs are submitted under the handle MAUGHAN in batch number 2012A (GenBank: ss530859297 to ss530860195; build B138).
Single Nucleotide Polymorphism Assay Development and Genotyping
Putative SNP containing contigs that showed no significant homology to the RepeatMasker (version 3.2.9 Arabidopsis) database or to the Arabidopsis thaliana (L.) Heynh. chloroplast (GenBank accession NC_000932) or mitochondrial (NC_001284) genomes as determined by BLASTn (Altschul et al., 1990) analyses (E-value ≥ 1 × 10−5) were then processed by the primer design software PrimerPicker (KBiosciences, 2009) using default design parameters. Primer sequence information for each of the functional KASPar SNP assays is provided in Supplemental Table S1. Single nucleotide polymorphism genotyping was accomplished via competitive allele-specific PCR using KASPar genotyping chemistry (KBioscience Ltd.) with the Fluidigm (Fluidigm Corp.) nanofluidic 96.96 dynamic array (Wang et al., 2009). For genotyping on the 96.96 dynamic array chip, a 5-μL sample mix, consisting of 2.25 μL genomic DNA (20 ng μL−1), 2.5 μL of 2x KASP reagent Mix (KBioscience Ltd.), and 0.25 μL of 20x GT sample loading reagent (Fluidigm Corp.) was prepared for each DNA sample. Similarly, a 4 μL 10x KASP Assay, containing 0.56 μL of the KASP assay primer mix (allele specific primers 12 μM and common reverse primer 30 μM), 2 μL of 2x Assay Loading Reagent (Fluidigm Corp.), and 1.44 μL DNase-free water was prepared for each SNP assay. The assay mix and sample mix were then loaded onto a 96.96 dynamic array chip, mixed, and thermal cycled using an IFC (integrated fluidic circuit) Controller HX and FC1 thermal cycler (Fluidigm Corp.) according to the manufacture’s protocols. Thermal cycling consisted of an initial thermal mix cycle (70°C for 30 min and 25°C for 10 min) and a hot-start Taq polymerase activation step (94°C for 15 min) followed by a touchdown amplification protocol as follows: 10 cycles of 94°C for 20 sec, 65°C for 1 min (decreasing 0.8°C per cycle), 26 cycles of 94°C for 20 sec, and 57°C for 1 min and then hold at 20°C for 30 sec. End-point fluorescent images of the chip were acquired on an EP-1 imager (Fluidigm Corp.) and the data analyzed with Fluidigm SNP Genotyping Analysis Software (Fluidigm, 2011).
Single Nucleotide Polymorphism Diversity Data Analysis
A matrix of pairwise genetic distances was generated from this data using Nei’s distance method (Nei and Li, 1979). This distance matrix was used for the principal coordinate analysis and to create the NeighborNet visualization performed by the program Splitstree (Bryant and Moulton, 2002). The population genetic software programs STRUCTURE version 2.2.3 (Pritchard et al., 2000) and STRUCTURE HARVESTER (Earl and Vonholdt, 2012) were used to infer the number of distinct groups within our panel. Structure analysis was run five times, with 100,000 generations for each K (with K ranging from 1 to 7), with a burn-in period of 10,000.
Genetic Linkage Map Construction
Marker segregation was analyzed for conformity to Mendelian ratios expected for an RIL population using a chi-squared test. Markers were initially grouped based on independence logarithm of the odds (LOD) scores ≥ 4.5 using the G2 statistic as calculated by JoinMap 4 for recombination frequency (Van Ooijen, 2006). Markers within groups were then ordered using the regression mapping algorithms corrected with the Kosambi mapping function as described by Stam (1993), with the modification that the squares of the LOD scores were used as weights to assign more weight to informative loci. Successive rounds of marker placement were used to add loci to the map. After the addition of each locus, a ripple test was applied to test for goodness-of-fit and assure the optimal map order.
Results and Discussion
Genomic Reduction and Single Nucleotide Polymorphism Discovery
The genomic reduction protocol used here is based on the conservation of restriction sites across individuals followed by fragment selection and next-generation sequencing. Multiplex identifier barcode tags are incorporated into the DNA fragments of the individual DNA samples, which are subsequently used to deconvolute the sequencing pool. Assembled fragments from the pool are then examined for SNP identification between accessions (see Maughan et al. [2009b] for a figure detailing the genomic reduction protocol).
Stevens et al. (2006) estimated the genome size of quinoa to be 967 Mb. Assuming a 35% GC content, we calculated that the biotin-stepavidin bead selection of EcoRI (GAATTC) containing fragments should produce 330,397 fragments of which ∼66,079 (∼ 20%) are within the target size range (500–650 bp). Therefore, the genomic reduction process should reduce the complexity of the DNA by nearly 52-fold, leaving approximately 19 Mb of DNA for sequencing of each DNA sample. A single 454-pyrosequencing run produces on average 1.3 million reads with an average read length of ∼400 bp or slightly more than 500 Mb. Hence, we estimate that a single 454-pyrosequencing run with eight pooled quinoa barcoded accessions would result in ∼14x coverage across the 19 Mb of DNA of each pooled individual. Using 1.5 plates of 454-pyrosequencing we obtained a total of 1,851,738 sequence reads producing a total of 518.3 Mb of sequence with an average read length of 284 bp. After removal of the MID barcode and adaptor sequence, assembly of all reads using Newbler assembler (Roche, 2010) created 22,911 large contigs (>200 bp; 13 Mb) with an average read length of 482 bp with 94% of the bases with quality scores above 40.
Since a major objective of this research was to identify SNPs specific to mapping populations, sequence reads were sorted into eight MID barcode pools corresponding to the eight individuals used in genomic reduction experiment. A total of 1,717,000 (438 Mb; 93%) were unambiguously sorted into eight barcode pools. To enter a pool, an exact match to all 10 bases on the MID barcode was required. Although we attempted to mix the samples in equimolar amounts before sequencing, the number of reads in each pool ranged from a high of 262,459 (15% of the total count) to a low of 183,495 (11%), a slight discrepancy likely due to difficulties associated with fluorometric DNA quantification of the PCR samples before pooling.
After partitioning the reads into individual MID barcode pools, the Newbler assembler (Roche, 2010) was used to remove the MID barcode and adaptor sequences and to create reference contigs specific to each of the five mapping populations (see plant materials; Table 1). CLCBio Workbench version 2.6.0 (CLC bio, 2012) was then used to reference map parental reads for each population to the population specific reference contigs created by Newbler for each population. The average reference contig size across all five assemblies was 357 bp, with an average contig read depth of 18x and an average base coverage of 11x.
Population-specific SNPs were identified from each of the biparental assemblies using a minimum base cutoff coverage threshold (≥6x) and a minor allele frequency threshold (≥20%), further filtered based on a 100% within parent uniformity threshold (all reads derived from a single barcode were required to be genotypically identical). This parameter minimizes erroneous SNPs due to faulty assemblies (co-assembly of homeologous regions) as well as SNPs that were heterozygous in the parental lines. Although quinoa displays disomic inheritance, it is an ancient allotetraploid (Ward, 2000), hence co-assembly of some homeologous regions is possible even with the increased stringency assembly parameters (minimum length overlap = 50 bp and minimum overlap identity = 95%). Therefore, at the minimum read depth of six, two of the sequence reads (20%) would need to be called as the minor allele variant, of which all would have to be derived from the same MID barcode pool.
A total of 14,178 SNPs were identified across the five populations, ranging from a high of 3615 SNPs in 1888 contigs in Pop39 to a low of 2092 SNPs in 995 contigs in PopM3 (Table 2) with an average of 2836 SNPs observed in 1462 contigs across all populations. The number of contigs that contained at least one SNP varied from a high of 1128 (6%) in Pop39 to a low of 554 (3%) in PopM3. Over all populations, 5.1% of all contigs contained at least one SNP, with the largest class of contigs containing a single SNP (59%; Fig. 1a). The average base coverage at a SNP across all populations was 7.8x (Fig. 1b) whereas the average SNP density (SNP per base pair) across the five populations was 1 per 2160 bp. The SNPs can also be described by their allele frequency within the sequencing pool. Here the minimum allele frequency threshold was set to 20%, meaning that at a minimum coverage of 6x, at least two reads from the same parental source must have at least two separate but identical reads in the contig assembly (at the same coverage the opposing parental source would have four identical but separate sequence reads in the assembly). Over all populations, 28% of the SNPs fit within the 20 to 29% range and 37% within the 30 to 39% range with the remaining 35% falling within the 40 to 50% range (Table 2). With regards to SNP type, transition mutations (A/G or C/T) were the most numerous, outnumbering transversions (A/T, C/A, G/C, and G/T) by 1.6x margin, which is in accordance with the observation that transition SNPs are the most frequent SNP type reported in both plant and animal genomes and are thought to result from hypermutability effects of CpG (—C—phosphate—G—) dinucleotide sites and deamination of methylated cytosines (Zhang and Zhao, 2004; Morton et al., 2006). Of the 14,178 SNPs identified, 94% (13,262) were unique to just one population while the remaining 7% (916) were identified in at least one other population. The largest overlap of SNPs was between populations Pop1 and Pop39, where 331 SNPs were shared in common; we note that this was not unexpected as both populations share a common paternal parent (0654).
Single Nucleotide Polymorphism Assay Development
Primer sets for 1248 putative SNPs were designed for competitive allele-specific PCR based on the KASPar genotyping chemistry. The SNPs were selected from those that were putatively polymorphic in Pop1 and/or Pop39 with the intention that these populations would be used for linkage map development. All 1248 SNPs were screened using the Fluidigm 96.96 dynamic array chip. A total of 511 (41%) SNPs produced clearly separated genotypic clusters that could be easily scored with the automated Fluidigm SNP genotyping analysis software (Fluidigm, 2011) (Fig. 2). The percent conversion is significantly lower than that identified by Maughan et al. (2011) who reported a nearly 70% conversion rate for KASPar-based SNP assay development in Amaranthus spp., a diploid species, but only slightly higher than the 36% conversion rate reported for tetraploid cotton (Gossypium hirsutum L.). Byers et al. (2012) speculated that the decrease in conversion rate for cotton was the result of increased complexity associated with polyploidy—a problem associated with the dual amplification of homeologous (orthologs) loci using competitive allele-specific PCR (KASPar chemistry). While polyploidy is likely a major reason for SNP assay development failure, other factors, including paralogy, proximity of repeat elements, poor assemblies, and/or sequencing errors, may also contribute to assay design failure.
Diversity Panel and Phenetic Analysis
The diversity panel consisted of 113 accessions of C. quinoa and eight related Chenopodium taxa (Table 1). Of the 511 polymorphic SNP assays developed, 427 were successfully screened on the full diversity panel (SNP assays with greater than 20% missing data were removed from the analysis). Limiting the analysis to just the 113 quinoa accessions, a total of 854 alleles were identified with an average of 5.4% missing data per accessions for a total of 46,043 scored data points. Across the quinoa accessions, MAF ranged from 0.02 to 0.50 with an average MAF of 0.28 per SNP (Supplemental Table S2). Considering a SNP with a MAF ≥ 0.35 as highly polymorphic and a SNP with MAF ≥ 0.10 as polymorphic, 198 (46%) of the SNP loci were highly polymorphic while 385 (90%) were polymorphic (Table 2). We note that the maximum value for a biallelic SNP is 0.5 (both alleles present at equal frequency).
Phenetic analysis of the 113 quinoa accessions using STRUCTURE analysis (Pritchard et al., 2000) clearly separated the accessions into two distinct subgroups according to Evanno delta K values (Earl and Vonholdt, 2012), which are easily visualized using a NeighborNet tree (Bryant and Moulton, 2002) or a principal coordinate analysis (Fig. 3). The two groups agree well with previous morphological and microsatellite studies, which separated the quinoa accessions into two distinct groups: Chilean coastal and Andean ecotypes (Risi and Galwey, 1989; Christensen et al., 2007). Six accessions were placed intermediate to the coastal and Andean ecotypes, including E-DK-4, G205DK, 321079BB, CO1 (PI 596293), E1 (Ames 13228), and C11 (PI 614882) (Fig. 3b). Since E-DK-4, G205DK, 321079BB, and CO1 are breeding lines, it is not surprising that they are positioned in an admixture position. E1 and C11 are the two accessions with the highest percentage of missing data (26% and 27%) and the lowest nodal bootstrap value (54%) in the dendrogram; therefore, their positioning is potentially artificial.
Quinoa is a member of a complex of interfertile New World species. Included in this complex are several weedy and domesticated species, including North American C. berlandieri and C. berlandieri subsp. nuttalliae (Saff.) H. D. Wilson & Heiser and South American C. hircinum. Targeted DNA sequencing (E.N. Jellen and H. Storchova, personal communication, 2011) has revealed a close relationship between this complex and certain diploids, including North American C. watsonii and the Eurasia native C. ficifolium, which appears sporadically as a weed in eastern North America (Jellen et al., 2011). To assess the potential transferability and utility of the SNPs for germplasm characterization in these related species, we tested the functionality of the SNP assays across a Chenopodium species panel, including two accessions C. hircinum, four accessions of C. berlandieri [subsp. nuttalliae, var. macrocalycium (Aellen) Cronquist, var. boscianum (Moq.) Wahl, and var. zschackei (Murr) Murr], and a single accession of both C. watsonii and C. ficifolium (Table 1). One hundred and forty-four (34%) of the SNPs clearly amplified and separated all eight taxa into distinct genotypic clusters while 318 (74%) were clearly genotyped in six of the eight taxa. Seventy-one (17%) failed to amplify in any of the related taxa. Perhaps not surprisingly, the most successful cross-species amplifications were in the two C. hircinum accessions, where 81% of the SNP assays were successful. Both C. hircinum accessions originate from the lowlands of Northwest Argentina, near the center of domestication of quinoa (near lake Titicaca, Bolivia), are allotetraploid, and are presumed potential progenitors of C. quinoa (Jellen et al., 2011). Indeed, quinoa and C. hircinum are difficult to separate systematically based solely on weedy versus domesticated characteristics due to a close crop–weed sympatric relationship where interspecific hybridization between the species is highly likely (Wilson, 1988). Between the two C. hircinum accessions, five SNP assays were polymorphic. Seventy-nine percent of the SNP assays amplified in the four C. berlandieri accessions. Like C. hircinum, the C. berlandieri complex is classified within subsect. Favosa, and North American C. berlandieri is considered the likely ancestor of the New World allotetraploid complex (Wilson, 1990; Wilson and Manhart, 1993). Both C. hircinum and C. berlandieri are interfertile with quinoa and represent potentially important resources for quinoa germplasm enhancement. Perhaps not unexpectedly the two diploid species, C. watsonii and C. ficifolium, successfully amplified the fewest SNP assays, with Eurasian C. ficifolium successfully functioning with the fewest SNP assays (44%).
Inclusion of the eight related Chenopodium species within the phenetic analysis of the 113 quinoa accessions produced three clearly separated clusters (Fig. 3a), representing the coastal and Andean quinoa ecotypes and a third cluster inclusive of the related Chenopodium species. The inability to clearly separate the related species into individual subgroups was not unexpected and is an example of marker ascertainment bias. We should expect C. hircinum and C. berlandieri to cluster separately and at a significant distance from each other; however, the quinoa-biased SNP assays (biased in that the SNP assays were developed to differentiate quinoa accessions, not C. hircinum or C. berlandieri accessions) are unlikely targeting polymorphic loci in the related Chenopodium species, suggesting that heterologous SNP assays should be used with caution in genus-level phylogenetic studies.
To validate the genotyping process, 13 random individuals from the diversity panel were regenotyped and compared across all 511 SNP loci. From 5357 paired data points, processed independently on separate fluidic chips, 5341 (99.7%) were identical matches while 16 (0.30%) were mismatches.
Linkage Map Construction
Two mapping populations, Pop1 and Pop39, were used for linkage map construction. Since both populations were small (n = 61 and n = 67) but share a common paternal parent (0654) the data from both populations were combined to produce an integrated linkage map (n = 128). All 511 SNP loci were genotyped across the entire integrated mapping population using the KASPar genotyping chemistry on a Fluidigm integrated fluidic circuit (IFC). Of the 511 SNP assays, 469 (92%) produced genotypic clusters that could be easily scored; the remaining 42 assays were lost due to issues associate with loading the IFC or eliminated due to extreme segregation distortion (i.e., the SNP showed a pattern consistent with maternal extranuclear inheritance). Of the 469 assays, all produced clearly separated clusters that could be scored with confidence, with each homozygous cluster containing a parental DNA sample (KU-2, NL-6, or 0654, depending on the population) and were scored in a A:H:B fashion, where 0654 was always assigned the A genotype (Fig. 2). Twenty-six (5.5%) SNPs showed segregation distortion (P < 0.01). Since the map is based on crosses between two contrasting quinoa ecotypes (coastal versus Andean), segregation distortion was not unexpected. We note that the distortion was not nearly as high as has been seen in interspecific crosses, where segregation distortion was reported to reach levels as high as 68.5% (Paterson et al., 1988). Skewed markers mapped to a total of nine different linkage groups (Fig. 4) although a majority (77%) mapped to just three chromosomes on linkage groups 4, 7, and 15. Linkage group 15 contained the largest number of distorted markers (11), all skewed to the coastal parental ecotype. The presence of clusters of markers skewed to a single parental genotype is likely associated with chromosomal regions containing gametophytic and/or zygotic factors or selectable quantitative trait loci (QTL) that likely conferred advantage during the development of the mapping populations (Zamir and Tadmor, 1986; Lu et al., 2002). For example, highland quinoas are commonly known to have sterile pollen at high temperatures (Jellen et al., 2011), so QTL for heat tolerance from the coastal parent would likely be favored as a mapping population is advanced to homozygosity. Significant variation in seedling morphology, growth rates and seed production were observed among the RIL plants.
At a minimum LOD score of 4.5, pairwise linkage analysis grouped 451 (96%) SNP markers into 29 linkage groups. Twenty of the linkage groups were populated with ≥9 SNPs while nine were small linkage groups consisting of five or fewer markers (Fig. 4b). The distribution of the markers within the linkage groups varied from 44 to 2 SNPs per linkage group. The total map of 1404 cM was spanned by the 451 SNP loci with individual linkage groups ranging in size from 1 to 112 cM (Fig. 4; Table S2). The largest interval between two linked markers was 22 cM while the average distance between all loci was 3.3 cM. Most intervals (92%) were less than 10 cM apart. Compared to the two previous attempts at linkage map development in quinoa (Maughan et al., 2004; Jarvis et al., 2008), this map consists of nearly double the number of marker loci, spans a greater genetic distance, and is significantly closer to the predicted total length the quinoa genetic map (1700 cM) (Maughan et al., 2004). Relative to the haploid number of chromosomes (n = 18) in quinoa, the linkage groups identified in this study suggest that additional markers are still needed to coalesce linkage groups and provide complete coverage of the genome. To this end we are developing an additional set of expressed sequence tag-based SNPs and several new and much larger RIL populations. These SNP assays and new RIL populations should provide the basis for the first replicated QTL studies in quinoa.
We report the identification of the first large scale set of putative SNP loci for quinoa as well as the development of >500 functional SNP assays for quinoa. To evaluate the informativeness and utility of the SNP assays, we screened a large diversity panel consisting of the 113 quinoa accessions from the CIP international quinoa nursery and USDA germplasm collection. Moreover we report the first genetic linkage map of quinoa based solely on SNP markers. The functional SNP assays were developed using KBioscience KASPar genotyping chemistry detected using a Fluidigm IFC. The combination of the KASPar chemistry with the nano-fluidic chip technology (9.7 nL reaction volume) not only significantly reduces the marker data point genotyping costs (∼US$0.05) but also significantly increase the speed of genotyping. Indeed, a single Fluidigm 96.96 IFC is capable of producing 9216 PCRs in a single run (∼3 h) with little technical expertise. If a Fluidigm system is unavailable, the KASPar SNP assays can be genotyped using a standard fluorescence resonance energy transfer plate reader, an important consideration considering that most quinoa breeding programs in South America may have limited capital equipment resources. Compared to other marker systems (i.e., amplified fragment length polymorphism or SSRs), the SNP assays presented here are relatively inexpensive and easy to use and should greatly enhance future efforts to initiate marker assisted selection for recalcitrant agronomic traits in quinoa, including resistance to downy mildew (Peronospora farinosa f.sp. chenopodii), the most important disease of quinoa.
Supplemental Information Available
Supplemental material is available at http://www.crops.org/publications/tpg.
Supplemental Table S1. Adapters and primer sequences used in the genomic reduction.
Supplemental Table S2. Primer sequence information for each single nucleotide polymorphism (SNP).