
Discovery of SNPs in soybean genotypes mainly used as the parents of mapping populations in the USA and Korea

Kyujung Van1, Eun-Young Hwang2, Moon Young Kim1, Suk-Ha Lee1 and Perry B. Cregan2

1 School of Plant Science, Seoul National University, Seoul, 151-921, Korea, Email sukhalee@snu.ac.kr
2 Soybean Genomics and Improvement Laboratory, USDA-ARS, Beltsville, MD 20705, USA, Email creganp@ba.ars.usda.gov

Abstract

Single nucleotide polymorphisms (SNPs) including insertion/deletions (indels) serve as useful and informative genetic markers. High throughput and inexpensive SNP typing systems are another reason why there is much current interest in the development of SNP markers. Fifteen soybean genotypes from Korea and USA were used for surveying sequence variation. After fragments were amplified with primers derived from 110 soybean ESTs from GenBank using genomic DNA as a template, direct fluorescent dideoxynucleotide sequencing data of PCR products were analysed by SeqScape software to ensure accurate SNP discovery. Among 35 ESTs with at least one SNP in the 15 soybean genotypes, SNPs occurred at a frequency of 1 per 2,038 bp in 16,302 bp of coding sequence and 1 per 191 bp in 16,960 bp non-coding regions (5’ UTR, 3’ UTR and introns). This corresponds to a nucleotide diversity (θ) of 0.00017 and 0.00186, respectively. Of the 97 SNPs discovered, 80.4% were present in the six North American soybean mapping parents Archer, PI 209332, Peking, Minsoy, Noir 1 and Evans. Only 66 (68%) of the SNPs were present among the nine Korean cultivated genotypes. The addition of Pureunkong to the North American mapping parents increased the number of SNPs detected to 84 (86.6%) while the addition of Hwaeomputkong, which originated from Japan, increased the number to 92 or 94.8% of the total SNPs present among the 15 genotypes. Thus, Hwaeomputkong and six North American mapping parents provide a diverse set of soybean genotypes that can be successfully used for SNP discovery in coding DNA and closely associated introns and untranslated regions.

Media summary

Seven genetically diverse soybean genotypes maximize the efficiency of SNP DNA marker discovery.

Key Words

Single nucleotide polymorphisms (SNPs), insertion/deletions (indels), soybean, ESTs, nucleotide diversity

Introduction

The development of DNA-based markers is key to the selection and improvement of varieties in crop breeding programs (Gupta et al. 2001). Single nucleotide polymorphisms (SNPs) including insertion/deletions (indels) have recently received a great deal of attention as useful molecular markers in genetic analysis. Much progress has been made in the discovery of sequence diversity in crops. The frequency of SNPs in maize (Zea mays ssp. mays L.) was reported as one SNP every 27.6 bp (Tenaillon et al. 2001). A total of 112 SNPs was found in 38 of 54 loci in barley (Kanazin et al. 2002). Recently, a total of 280 SNPs was discovered among 25 diverse soybean genotypes in more than 76 kbp of sequence of PCR products amplified using primers designed to 116 genes and 27 non-genic regions (Zhu et al. 2003). Additionally, many SNP detection methods have been developed. Since these high throughput and inexpensive markers are highly stable, SNPs are useful for construction of high-density genetic maps as well as for genetic association studies (Picoult-Newberg et al. 1999; Rafalski 2002). Mutations in coding DNA sequence (cSNPs) may change amino acid sequence and affect gene function and could therefore be valuable as markers (Collins et al. 1998; Brookes 1999; Marth et al. 1999; Picoult-Newberg et al. 1999). Expressed sequence tag (EST) data serve as a useful source of DNA sequence from which to initiate SNP discovery. In soybean, the Soybean EST Project had obtained more than 300,000 ESTs from 84 libraries (http://www.ncbi.nih.gov/dbEST/) as of February 2004. This study reports results obtained by screening 110 soybean ESTs for SNPs in nine cultivated soybean genotypes from Korea and in six parents of North American mapping populations, which were previously identified for their sequence diversity (Zhu et al. 2003). Thus, an important objective of this research was the identification of a subset of soybean genotypes that can maximize the discovery of SNPs in coding and non-coding perigenic DNA.

Methods

Plant materials and genomic DNA extractions

A set of fifteen soybean genotypes was used for SNP discovery: nine Korean cultivated genotypes (Sinpaldalkong 2, SS2-2, Danbaegkong, Taekwangkong, Jinpumkong 2, Pureunkong, Daewonkong, Dongsan 163 and Hwaeomputkong) and six North American lines that are the parents of various mapping populations (Archer, PI 209332, Peking, Minsoy, Noir 1 and Evans). The Korean cultivated genotypes were included in this study because they possess interesting phenotypes and because they are the parents of various recombinant inbred line mapping populations. Zhu et al. (2003) identified the six North American mapping parents as a subset of genotypes for SNP discovery because the sequence analysis of these six genotypes identified 85% of the total and 93% of the common SNPs (frequency > 0.1) found in a group of 25 diverse soybean genotypes. Genomic DNA was isolated from fully expanded leaves of the fifteen homozygous soybean genotypes by the CTAB method (Gelvin et al. 1995).

Designing and testing of PCR primers

Primers were designed with Oligo Lite 6.0 (Molecular Biology Insights, Inc., Cascade, CO, USA) to produce fragments of approximately 500 bp in length using a total of 110 soybean ESTs selected from GenBank (data not shown, supplemental material). Each polymerase chain reaction (PCR) primer set was first tested by amplifying genomic DNA of Sinpaldalkong 2. PCR was performed in a 50 μl volume with 2 units of Taq DNA polymerase (Vivagen, Korea) following the manufacturer’s recommended protocols and cycling conditions. Gel electrophoresis on an ethidium bromide stained 1.0% agarose gel confirmed the presence of products. The primer sets that produced a single amplicon with Sinpaldalkong 2 genomic DNA were selected and used in identical amplification reactions with the other fourteen soybean genotypes under the same conditions.

Purification and sequence analysis of PCR products

After PCR amplicons were purified with NucleoSpin Extract (Macherey-Nagel, Düren, Germany), one of the primers used in the PCR amplification was used as the primer in the sequencing reaction. Sequencing reactions for all fifteen cultivars were performed using BigDye Terminator Cycle Sequencing (Applied Biosystems, Foster City, CA, USA) and run on an ABI3700 sequencer (Applied Biosystems, Foster City, CA, USA).

Single nucleotide polymorphism survey

ABI trace files were aligned and mutations were identified using ABI Prism SeqScape Software version 2.0 (Applied Biosystems, Foster City, CA, USA) with default conditions for the basecaller and ending base, mixed-base settings, clear range methods and filter settings.
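As a rough illustration of what such a survey yields, the sketch below scans already-aligned sequences for polymorphic columns and labels each as a transition, transversion or indel. It is not SeqScape's algorithm; the function names and the toy sequences are hypothetical, and real trace data would also need base-quality filtering.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def polymorphic_sites(aligned):
    """Return {column index: set of alleles} for columns with more than one allele.

    `aligned` maps a genotype name to its aligned sequence; '-' marks a gap
    and 'N' (an uncalled base) is ignored.
    """
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for i, column in enumerate(zip(*aligned.values())):
        alleles = {base.upper() for base in column if base.upper() != "N"}
        if len(alleles) > 1:
            sites[i] = alleles
    return sites


def classify(alleles):
    """Label a polymorphic column by the kind of change it represents."""
    if "-" in alleles:
        return "indel"
    if len(alleles) != 2:
        return "multiallelic"
    if alleles <= PURINES or alleles <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Hypothetical toy alignment of three genotypes (not data from this study).
toy = {"g1": "ACGTAC-T", "g2": "ACATACGT", "g3": "ACGTCCGT"}
sites = polymorphic_sites(toy)
labels = {pos: classify(alleles) for pos, alleles in sites.items()}
```

Dividing the number of transition sites by the number of transversion sites gives a transition/transversion ratio of the kind reported in Table 1.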

Nucleotide diversity (θ)

Nucleotide diversity (θ) was estimated according to Halushka et al. (1999).
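A minimal sketch of this kind of estimate, assuming it corresponds to the standard Watterson estimator θ = K / (a_n · L), where K is the number of polymorphic sites, L the sequence length in bp and a_n the sum of 1/i for i = 1 to n − 1 over n sampled sequences. The sample-size convention actually applied in this study is not stated, so the function may not reproduce the published values exactly; the function name and the example numbers are hypothetical.

```python
def watterson_theta(polymorphic_sites, n_sequences, length_bp):
    """Watterson's estimator: theta = K / (a_n * L).

    a_n = sum_{i=1}^{n-1} 1/i corrects for the number of sequences sampled.
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    if length_bp <= 0:
        raise ValueError("length must be positive")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return polymorphic_sites / (a_n * length_bp)


# Hypothetical example: 10 polymorphic sites among 5 sequences over 1,000 bp.
theta = watterson_theta(10, 5, 1000)
```

Because a_n grows with sample size, the same number of polymorphic sites yields a smaller θ when more sequences are sampled, which is what makes estimates from groups of different size comparable.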

Results

Single nucleotide changes in coding regions can lead to alteration of amino acid sequence or early termination of translation and can therefore affect gene function (Brookes 1999; Collins et al. 1998; Marth et al. 1999; Picoult-Newberg et al. 1999). These functional SNPs would be valuable markers that may allow association of altered gene function with altered phenotype. Thus, our SNP discovery research in soybean was focused on EST sequences using fifteen different genotypes. All of these genotypes are the parents of soybean recombinant inbred line mapping populations in Korea or the USA.

Characteristics of the SNPs discovered in all fifteen cultivars are shown in Table 1, which also compares the nature and frequency of SNPs between the Korean and USA mapping parents in 16,302 bp of coding sequence, 7,372 bp of 5’ UTR, 6,659 bp of intron and 2,929 bp of 3’ UTR. A total of 97 SNPs including 8 indels was discovered in all fifteen genotypes, but fewer SNPs were discovered when the Korean or the North American mapping parents were surveyed separately. SNP frequency was higher in non-coding regions in both the Korean and USA mapping parents. Considering only the Korean lines, one SNP occurred every 3,260 bp in coding sequence and one SNP per 278 bp in non-coding sequence. A total of 66 polymorphisms was discovered and the total frequency of SNPs was one every 504 bp (Table 1). Among the USA soybean mapping parents, frequencies of SNPs in both coding and non-coding regions were greater than in the Korean cultivars.

Compared with the nucleotide diversity (θ) reported for maize (Tenaillon et al. 2001), the fifteen cultivars showed about 9-fold lower diversity (θ = 0.00103) in the 33,262 bp of sequenced amplicons. The frequency of single base substitutions or indels was higher in the fifteen cultivars than within either the Korean or North American sub-populations. This was true in the analysis of both the coding and non-coding sequence. The Korean soybean lines had slightly lower genetic diversity than the six USA mapping parents. In humans, about 2/3 of the SNPs are transitions and about 1/3 are transversions (Brookes 1999; Wang et al. 1998). Our study showed a similar ratio of transitions to transversions. A relatively small number of indels was observed and these were all in non-coding DNA (Table 1).

The number of SNPs identified between pairs of Korean vs. USA mapping parents was determined (Table 2). The highest level of polymorphism (51 SNPs) was observed between Hwaeomputkong and Minsoy, indicating that these two genotypes are the most diverse pair among the set of 15. In contrast, only 21 SNPs were identified between Jinpumkong 2 and Evans. PI 209332 and Minsoy also displayed a high degree of polymorphism. Peking, Daewonkong and Danbaegkong showed moderate levels of polymorphism.

Since the set of six diverse cultivars, Archer, PI 209332, Peking, Minsoy, Noir 1 and Evans, was specifically identified by Zhu et al. (2003) for maximizing the discovery of sequence variation, each of the Korean lines was added to this set to determine if a significant increase in SNP discovery would result. The number of SNPs that was discovered by adding either Pureunkong or Hwaeomputkong to the set of six USA mapping parents identified by Zhu et al. (2003) is presented in Table 3. Pureunkong brought an increase in total SNPs from 78 to 84 (86.6% of the total SNPs). A much greater increase in SNP discovery was obtained by the addition of Hwaeomputkong. In this case, the number of SNPs discovered increased from 78 to 92 (94.8% of the total SNPs). Thus, the addition of Hwaeomputkong as the seventh genotype would clearly increase the efficiency of SNP discovery in soybean.
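The subset comparison summarized in Table 3 amounts to counting the sites at which a chosen group of genotypes carries more than one allele, then asking which additional genotype raises that count the most. The sketch below shows the idea on made-up allele calls; the function names, genotype labels and data are hypothetical and do not come from this study.

```python
def snps_detected(calls, subset):
    """Count sites at which the genotypes in `subset` carry more than one allele.

    `calls` maps a genotype name to a string of allele calls, one character
    per site, with every string the same length.
    """
    n_sites = len(next(iter(calls.values())))
    count = 0
    for i in range(n_sites):
        alleles = {calls[name][i] for name in subset}
        if len(alleles) > 1:
            count += 1
    return count


def best_addition(calls, base, candidates):
    """Return the candidate whose addition to `base` detects the most SNPs."""
    return max(candidates, key=lambda c: snps_detected(calls, list(base) + [c]))


# Hypothetical allele calls at five sites for four genotypes.
toy_calls = {"A": "00000", "B": "11000", "C": "00100", "D": "00011"}

base = ["A", "B"]
print(snps_detected(toy_calls, base))             # 2
print(snps_detected(toy_calls, base + ["C"]))     # 3
print(snps_detected(toy_calls, base + ["D"]))     # 4
print(best_addition(toy_calls, base, ["C", "D"]))  # D
```

A SNP counts as detected only if the subset itself is polymorphic at that site, which is why a genotype carrying alleles absent from the existing set contributes the largest gain.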

Table 1. Summary of SNP analysis.

| Parameters | Korea | USA | Korea + USA |
|---|---|---|---|
| Number of ESTs screened | 110 | 110 | 110 |
| Total length of sequenced amplicons (bp) | 33,262 | 33,262 | 33,262 |
| Number of SNPs (SNPs and indels) | 66 | 78 | 97 |
| Number of nucleotide substitutions | 62 | 71 | 89 |
| Transition/transversion ratio | 2.05 | 2.25 | 2.12 |
| Frequency of polymorphic sites per bp | 1/504 | 1/426 | 1/343 |
| Frequency of polymorphic sites per bp (coding) | 1/3,268 | 1/2,038 | 1/2,038 |
| Frequency of polymorphic sites per bp (non-coding) | 1/278 | 1/242 | 1/191 |
| Number of indels | 4 | 7 | 8 |
| Overall indel frequency | 1/8,316 | 1/4,752 | 1/4,158 |
| Frequency of indels per bp (coding) | - | - | - |
| Frequency of indels per bp (non-coding) | 1/4,240 | 1/2,423 | 1/2,120 |
| Mean nucleotide diversity (θ) | 0.00070 | 0.00083 | 0.00103 |
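The overall frequencies in Table 1 follow from dividing the sequenced length by the number of polymorphisms. A short check using the totals given in the table (the helper name is hypothetical):

```python
import math


def bp_per_polymorphism(length_bp, n_polymorphisms):
    """Average number of bp per polymorphism, rounded half up."""
    return math.floor(length_bp / n_polymorphisms + 0.5)


TOTAL_BP = 33262  # total length of sequenced amplicons (Table 1)

print(bp_per_polymorphism(TOTAL_BP, 66))  # Korea: 504
print(bp_per_polymorphism(TOTAL_BP, 78))  # USA: 426
print(bp_per_polymorphism(TOTAL_BP, 97))  # Korea + USA: 343
print(bp_per_polymorphism(TOTAL_BP, 8))   # indels, Korea + USA: 4158
```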

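Table 2. Number of SNPs between different soybean cultivars.

| | Archer | PI 209332 | Peking | Minsoy | Noir 1 | Evans | Mean |
|---|---|---|---|---|---|---|---|
| Sinpaldalkong 2 | 32 | 41 | 32 | 33 | 32 | 29 | 33.2 |
| SS2-2 | 32 | 39 | 30 | 32 | 37 | 27 | 32.8 |
| Danbaegkong | 35 | 37 | 38 | 33 | 41 | 30 | 35.7 |
| Taekwangkong | 37 | 37 | 37 | 43 | 29 | 29 | 35.3 |
| Jinpumkong 2 | 32 | 35 | 35 | 33 | 34 | 21 | 31.7 |
| Pureunkong | 26 | 40 | 39 | 41 | 28 | 33 | 34.5 |
| Daewonkong | 32 | 38 | 36 | 45 | 33 | 32 | 36.0 |
| Dongsan 163 | 33 | 39 | 32 | 34 | 26 | 25 | 31.5 |
| Hwaeomputkong | 41 | 49 | 48 | 51 | 39 | 46 | 45.7 |
| Mean | 33.3 | 39.4 | 36.3 | 38.3 | 33.2 | 30.2 | 35.2 |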

Table 3. Total SNPs and percentage of total SNPs discovered in 15 soybean genotypes by the analysis of selected subsets of genotypes.

| Genotypes included in subset | SNP no. | Percent of SNPs |
|---|---|---|
| Archer, PI 209332 | 29 | 29.9 |
| Archer, PI 209332, Peking | 50 | 51.5 |
| Archer, PI 209332, Peking, Minsoy | 62 | 63.9 |
| Archer, PI 209332, Peking, Minsoy, Noir 1 | 76 | 78.5 |
| Archer, PI 209332, Peking, Minsoy, Noir 1, Evans | 78 | 80.4 |
| Archer, PI 209332, Peking, Minsoy, Noir 1, Evans, Pureunkong | 84 | 86.6 |
| Archer, PI 209332, Peking, Minsoy, Noir 1, Evans, Hwaeomputkong | 92 | 94.8 |
| All 15 genotypes | 97 | 100.0 |

References

Brookes AJ, (1999). The essence of SNPs. Gene 234:177-186.

Collins FS, Brooks LD, and Chakravarti A, (1998). A DNA polymorphism discovery resource for research on human genetic variation. Genome Res 8:1229-1231.

Gelvin SB and Schilperoort RA, (1995). Plant Molecular Biology Manual. Norwell, MA: Kluwer Academic Publishers.

Gupta PK, Roy JK, and Prasad M, (2001). Single nucleotide polymorphisms: a new paradigm for molecular marker technology and DNA polymorphism detection with emphasis on their use in plants. Curr Sci 80:524-535.

Halushka MK, Fan JB, Bentley K, Hsie L, Shen N, Weder A, Cooper R, Lipshutz R, and Chakravarti A, (1999). Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat Genet 22:239-247.

Kanazin V, Talbert H, See D, DeCamp P, Nevo E, and Blake T, (2002). Discovery and assay of single-nucleotide polymorphisms in barley (Hordeum vulgare). Plant Mol Biol 48:529-537.

Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok P-Y, and Gish WR, (1999). A general approach to single-nucleotide polymorphism discovery. Nat Genet 23:452-456.

Page RD, (1996). TreeView: an application to display phylogenetic trees on personal computers. Comp Appl Biosci 12:357-358.

Picoult-Newberg L, Ideker TE, Pohl MG, Taylor SL, Donaldson MA, Nickerson DA, and Boyce-Jacino M, (1999). Mining SNPs from EST databases. Genome Res 9:167-174.

Rafalski A, (2002). Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol 5:94-100.

Tenaillon MI, Sawkins MC, Long AD, Gaut RL, and Doebley JF, (2001). Patterns of DNA sequence polymorphism along chromosome 1 of maize (Zea mays ssp. mays L.). Proc Natl Acad Sci USA 98:9161-9166.

Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, and Higgins DG, (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25:4876-4882.

Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, Ghandour G, Perkins N, Winchester E, Spencer J, Kruglyak L, Stein L, Hsie L, Topaloglou T, Hubbell E, Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C, Rozen S, Hudson TJ, Lipshutz R, Chee M, and Lander ES, (1998). A large-scale identification, mapping, and genotyping of single nucleotide polymorphisms in the human genome. Science 280:1077-1085.

Zhu YL, Song QJ, Hyten DL, Van Tassell CP, Matukumalli LK, Grimm DR, Hyatt SM, Fickus EW, Young ND, and Cregan PB, (2003). Single-nucleotide polymorphisms in soybean. Genetics 163:1123-1134.
