
Peanut Expressed Sequence Tag (EST) Project and the Marker Development for Cultivated Peanut (Arachis hypogaea)

Baozhu Guo1, Meng Luo1,2, Phat Dang3, G. He4 and Corley C. Holbrook5

1 USDA-ARS, Crop Protection and Management Research Unit, Tifton, GA 31793, USA
2 University of Georgia Coastal Plain Experiment Station, Tifton, GA 31793, USA
3 USDA-ARS, U.S. Horticultural Research Laboratory, Ft. Pierce, FL 34945, USA
4 Center for Plant Biotechnology Research, Tuskegee University, Tuskegee, AL 36088, USA
5 USDA-ARS, Crop Genetics and Breeding Research Unit, Tifton, GA 31793, USA


An expressed sequence tag (EST) project for peanut, part of the U.S. Peanut Genome Initiative supported by U.S. industry and peanut growers, has been initiated at the Crop Protection and Management Research Unit, USDA-ARS, Tifton, GA, to develop genomic resources for peanut research. Despite its economic and nutritional importance, peanut is virtually unexplored at the genomic level because of the size (2800 Mb, or about the size of the human genome) and complexity of its genome. The need for a peanut EST resource is widely recognized, and our EST project has shown the value of developing genomic resources for the peanut research community. EST libraries for cultivated peanut (Arachis hypogaea L.) were developed from two cDNA libraries constructed using mRNA prepared from leaves of peanut line C34-24 (resistant to leaf spots and tomato spotted wilt virus) and immature pods of peanut line A13 (tolerant to drought stress and preharvest aflatoxin contamination). Randomly selected cDNA clones were partially sequenced to generate a total of 1825 ESTs, 769 from the C34-24 cDNA library and 1056 from the A13 cDNA library, in which 536 and 769 unique ESTs were identified, respectively. We have released the first batch of 1345 ESTs (accession numbers CD037499 to CD038843) to GenBank. More recently, 100 (GGC)n simple sequence repeats (SSRs) in peanut (accession numbers AY526357 to AY526456), obtained by sequencing 5000 clones from two SSR-enriched genomic libraries, have also been released to GenBank. Four hundred unigenes have been selected from these ESTs and arrayed on glass slides for gene expression analysis. The application of genomic tools and information in breeding programs would greatly facilitate the genetic enhancement of cultivated peanut.

Media summary

Peanuts are susceptible to Aspergillus infection and aflatoxin contamination. Understanding of the mechanisms of resistance in peanut will improve the selection of safe peanut germplasm.

Key Words

Arachis hypogaea, cultivated peanut, resistance gene, drought tolerance, EST/functional genomics.


Expressed sequence tag (EST) libraries and databases have proven to be powerful tools for gene discovery, gene mapping and the analysis of quantitative traits. ESTs are generated by large-scale sequencing of randomly picked clones from cDNA libraries constructed from mRNA isolated at a particular developmental stage and/or from a particular tissue. Peanut (Arachis hypogaea L.) is an important crop internationally for both direct human food and oil production. Industry and consumers have emphasized the food quality of peanut. However, peanuts are susceptible to Aspergillus infection, which can result in aflatoxin contamination (Holbrook et al. 1994). Peanuts also contain proteins that cause allergic reactions (Li et al. 2000). Improvement of insect resistance, drought tolerance, oil quality and flavor is also a great challenge for breeding programs (Holbrook and Stalker 2003).

Although an abundance of morphological variation within A. hypogaea is known, many agronomical traits are difficult to select using conventional selection techniques. This is in large part because most agronomically important traits in peanut are quantitatively inherited (Wynne and Coffelt 1982), and significant genotype and environmental interactions exist. Molecular genetic research with peanut has mainly focused on molecular marker-assisted selection (Stalker and Mozingo 2001) and genetic transformation (Ozias-Akins and Gill 2001). Molecular markers are useful for crop improvement and studies of crop evolution in many species (Mohan et al. 1997). Unfortunately, little variation has been observed using molecular markers in A. hypogaea (Stalker and Mozingo 2001), resulting in limited application. A reliable regeneration and transformation system in peanut has been established, and several genes for insect and virus resistance have been successfully introduced into peanut (Ozias-Akins and Gill 2001). Although the application of transgenic technologies has enormous potential for enhancing trait improvement, the critical component needed for peanut cultivar development is identification of agronomically useful genes. Functional genomics research may aid in the development of polymorphic molecular markers (He et al. 2003), and provide useable genes for transgenic research.

As of June 25, 2004, the National Center for Biotechnology Information’s (GenBank) dbEST database contained 22,165,266 ESTs. Human ESTs number 5,643,076, with 4,191,008 ESTs for mouse and rat. Among plants, Triticum aestivum (wheat) has the most ESTs deposited with NCBI, at 552,245, and Zea mays (maize) is second at 397,515. Glycine max (soybean) has 334,668, Oryza sativa (rice) has 284,006, Arabidopsis thaliana has 258,825, and Medicago truncatula has 187,763. Peanut (Arachis hypogaea) is very low on the list for such an important crop, with only 1,366 ESTs listed, of which we released 1,345 in May 2003. There is great interest in peanut ESTs from the peanut research community, U.S. industry and USDA. At the Peanut Genomics Research Workshop held in Atlanta, Georgia, on March 23, 2004, a Strategic Plan for Peanut Genomics was developed, and the EST project was at the top of the list. The target is to have at least 50,000 ESTs publicly available in 2006 and 150,000 ESTs in 2008. In the present research, two peanut genotypes were used to construct cDNA libraries and generate ESTs. One genotype (C34-24) was selected for its resistance to leaf spots and tomato spotted wilt virus (TSWV), and the other genotype (A13) was selected for its drought tolerance and reduced preharvest aflatoxin contamination. Simple sequence repeat (SSR) markers have been characterized, and microarray chips with 400 unigenes have been produced for gene expression analysis.


cDNA library construction and transformation

The peanut lines used in this research were C34-24 (leaves) and A13 (immature pods). A total of 24 genotypes were evaluated for SSR polymorphism. Total RNA was isolated from leaves and pods using TRIZOL reagent, and mRNA was purified from total RNA using the PolyATtract mRNA isolation kit. Two cDNA libraries were constructed, one using mRNA from leaves and one using mRNA from immature pods.

Plasmid DNA purification and DNA sequencing

Plasmid DNA was isolated from randomly selected colonies using 96 TurboFilter Miniprep kits and a Qiagen BioRobot 9600. Ninety-six individual plasmids from randomly selected clones were used as templates for PCR amplification of the cloned cDNA with T3 and T7 universal primers. The resulting amplicons were analyzed by agarose gel electrophoresis to determine insert size. The PCR products were concentrated by ethanol precipitation. Pellets were re-suspended in 15 µl of formamide, and amplicons were sequenced using the BigDye Terminator Cycle Sequencing Kit. Sequencing was performed on an ABI Prism 3700 sequencer.

EST processing and sequence analysis

Sequences were edited with SEQUENCHER V4.1.4 and trimmed manually to remove vector sequence. Sequences shorter than 150 nucleotides were removed. The remaining ESTs were compared with the GenBank non-redundant protein database using the BLASTX algorithm (Altschul et al. 1990). A match was considered significant when the score was higher than 100 (optimized similarity score) with an E-value ≤ 10^-10. Novel ESTs were identified by comparison with sequences in the non-redundant nucleotide and EST databases of GenBank using the BLASTn algorithm. In addition, the sequences of each contig were aligned using the fragment assembly program of SEQUENCHER, and consensus sequences were generated with 90% identity over a minimum of 50 nucleotides.
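The length and significance cutoffs described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the workflow actually used (editing and assembly were done in SEQUENCHER); the `BlastHit` record and the function names are hypothetical, and parsing of real BLAST output is left out.

```python
from dataclasses import dataclass

MIN_LENGTH = 150      # sequences shorter than 150 nucleotides are removed
MIN_SCORE = 100       # a significant match needs a score higher than 100
MAX_EVALUE = 1e-10    # ... together with an E-value <= 10^-10


@dataclass
class BlastHit:
    """Hypothetical container for one BLAST match (not a real library type)."""
    query_id: str
    score: float
    evalue: float


def filter_ests(ests):
    """Keep only trimmed sequences that are at least MIN_LENGTH nucleotides long."""
    return {name: seq for name, seq in ests.items() if len(seq) >= MIN_LENGTH}


def is_significant(hit):
    """Apply both cutoffs: score strictly above 100 and E-value at most 1e-10."""
    return hit.score > MIN_SCORE and hit.evalue <= MAX_EVALUE
```

Both conditions must hold for a hit to count, so a high-scoring match with a weak E-value is still rejected.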

EST functional analysis, simple sequence repeat (SSR) characterization, and real-time PCR

EST putative functions were classified according to the Munich Information Center for Protein Sequences (MIPS) functional classification system applied to Arabidopsis (Mewes et al. 2002; Schoof et al. 2002). A total of 1345 ESTs from these two cDNA libraries have been submitted to the GenBank database under accession numbers CD037499 to CD038843. These sequences were searched for SSRs using BLASTn software to identify di- and trinucleotide and some tetranucleotide SSR motifs. SSR motifs with more than seven repeats for dinucleotides, five for trinucleotides, and four for tetranucleotides were counted. Primer pairs were designed for 44 SSRs based on the number of repeats and the sequences of the flanking regions using the Primer3 software (Rozen and Skaletsky 2000). Twenty-four cultivated peanut genotypes were screened for polymorphisms. Four hundred unigenes have been arrayed on glass slides for gene expression analysis. The microarray analysis data are validated by RT-PCR and real-time PCR.
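As an illustration of the repeat-count criteria, the sketch below scans a sequence for di-, tri- and tetranucleotide SSRs with a regular expression. It is not the BLASTn-based search used in this work. The function name `find_ssrs` is hypothetical, and the table treats the stated numbers (seven, five, four) as minimum repeat counts, which is one reading of the thresholds and should be adjusted if a stricter reading is intended.

```python
import re

# Assumed minimum repeat counts per motif length (2 = di-, 3 = tri-, 4 = tetranucleotide).
MIN_REPEATS = {2: 7, 3: 5, 4: 4}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n-1,} demands at least n copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run such as AAAA, not an SSR of this class
            if motif_len == 4 and motif[:2] == motif[2:]:
                continue  # e.g. GAGA is a dinucleotide repeat counted at k = 2
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


example = "TTT" + "GA" * 8 + "TTT" + "GGC" * 5 + "TT"
print(find_ssrs(example))  # [(3, 'GA', 8), (22, 'GGC', 5)]
```

The flanking sequence on either side of each reported position is what a primer-design tool such as Primer3 would then take as input.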


Quality of cDNA libraries and ESTs

Two cDNA libraries were constructed, one from bulked leaf samples of C34-24 and one from bulked immature pod samples of A13. In the leaf cDNA library, the insert size ranged from 200 to 1,500 bp, with an average of 550 bp. In the immature pod cDNA library, the insert size ranged from 400 to 1,500 bp, with an average of 650 bp. Sequence analysis yielded 769 high-quality ESTs from the leaf library and 1056 from the immature-pod library. Three hundred and ninety-one of the 769 leaf ESTs and 378 of the 1056 immature-pod ESTs were assembled into contigs, resulting in 536 and 800 unique ESTs in the two libraries, respectively.

Putative function analysis of ESTs

To identify the putative function of these ESTs, the EST sequences were compared with the sequences in the UniGene (Schuler 1997; Wheeler et al. 2003) database of GenBank using the BLASTx algorithm. The homology search indicated that 52.8% of the clones from the leaf library and 78.6% of the clones from the immature pod library matched genes of known function. A further BLASTn search of the dbEST database of GenBank indicated that 27.3% of the 363 leaf-library clones and 22.1% of the 226 immature-pod-library clones with unknown function matched sequences in dbEST.
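The clone counts quoted for the unknown-function class follow from the library sizes and the matched percentages. The short check below, with hypothetical variable and function names, reproduces them under the assumption that the percentages refer to the 769 leaf and 1056 immature-pod ESTs reported for the two libraries.

```python
# Library sizes and percentage of clones matching known-function genes (from the text).
LIBRARIES = {
    "leaf": {"total_ests": 769, "pct_known": 52.8},
    "immature pod": {"total_ests": 1056, "pct_known": 78.6},
}


def unknown_count(total_ests, pct_known):
    """Number of clones left without a known-function match, rounded to a whole clone."""
    return round(total_ests * (100.0 - pct_known) / 100.0)


for name, lib in LIBRARIES.items():
    print(name, unknown_count(lib["total_ests"], lib["pct_known"]))
# leaf 363
# immature pod 226
```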

Based on the MIPS Functional Catalogue criteria (Mewes et al. 2002; Schoof et al. 2002) and the putative functions, each putative transcript from the two cDNA libraries was assigned to one of 15 functional categories (Fig. 1). The metabolism-related genes accounted for 6.6% of the cDNA clones in the leaf library and 6.8% in the immature-pod library. In the protein synthesis category, cDNA clones encoding various ribosomal proteins and other proteins involved in protein biosynthesis were common in both libraries (Fig. 1). The proportion of cDNA clones related to cellular transport and transport mechanisms was low in both libraries (0.1%).

Figure 1. Distribution of ESTs from leaf and immature-pod cDNA libraries among functional categories. MIPS functional categories: 1-Metabolism; 2-Energy; 3-Cell growth, division and DNA synthesis; 4-Transcription; 5-Protein synthesis; 6-Protein destination; 7-Transport facilitation; 8-Cellular transport and transport mechanisms; 9-Cellular biogenesis; 10-Cellular communication/signal transduction mechanism; 11-Cell rescue, defense and virulence; 12-Ionic homeostasis; 13-Cellular organization; 14-Development; 15-Unclassified protein.

Genes of higher expression

Redundancy of EST sequences reflects the importance of certain genes at different developmental stages and under different environmental conditions. Because of the small number of ESTs generated from these two libraries, ESTs with a redundancy of 4 or above were selected for this analysis. In the leaf library, genes related to photosynthesis were ubiquitous among the redundant genes, and 13.9% of total ESTs coded for Rubisco (ribulose bisphosphate carboxylase). Redundant ESTs for drought-induced genes were observed in both cDNA libraries, but the types of drought-induced genes differed between the leaf and immature-pod libraries. The functions of several redundant ESTs from these two cDNA libraries could not be identified from the public databases. Most of these ESTs with unknown function do not have homologous sequences in GenBank and may represent genes unique to peanut.
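The redundancy cutoff can be expressed as a simple tally. The sketch below assumes each EST has already been given a gene label (for example its best BLAST match). The function name and the sample labels are hypothetical and stand in for real annotation data.

```python
from collections import Counter

MIN_REDUNDANCY = 4  # ESTs with a redundancy of 4 or above are retained


def highly_redundant(annotations, min_count=MIN_REDUNDANCY):
    """Map each gene label seen at least min_count times to its EST count."""
    counts = Counter(annotations)
    return {gene: n for gene, n in counts.most_common() if n >= min_count}


# Made-up labels for illustration only.
labels = ["rubisco"] * 5 + ["drought-induced protein"] * 4 + ["unknown"] * 3
print(highly_redundant(labels))  # {'rubisco': 5, 'drought-induced protein': 4}
```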

Putative adversity tolerance genes

To identify genes that control tolerance to drought and disease resistance traits, the ESTs from these two cDNA libraries were characterized according to their putative functions related to previously reported adversity resistance genes. These genes were sorted into different catalogues and distributed among many defense-related pathways.

EST-derived SSR markers and gene discovery

In the expressed sequences there were only three types of dimeric repeat motifs [(GA)n, (CT)n, and (AT)n]. Trimeric repeat motifs were more frequent than dimeric repeats. Tetrameric and hexameric repeat motifs were also found. EST-derived SSR motifs were identified, and primer pairs were designed for 44 SSRs from the released 1345 ESTs. Nine of the 44 primer pairs (20.5%) detected polymorphism among the 24 genotypes. The number of alleles at each locus ranged from two to five. The rate of detecting polymorphism among peanut lines is higher with EST-derived SSR markers (over 20%) than with SSRs derived from peanut genomic sequence (He, personal communication).

To identify novel genes, 400 unigenes were selected from these two libraries and arrayed on glass slides for gene expression analyses. Some data were validated by RT-PCR and real-time PCR (Fig. 2). The information in genomic research could help geneticists and breeders identify genes expressed in a particular tissue or under a specific condition and could aid in understanding the complexity of gene expression and regulation.

Figure 2. Quantitative expression analysis of genes with putative function of disease resistance and drought tolerance by RT-PCR (A, B, C, D). Total RNA was extracted from peanut C34-24 leaf samples of normal plants (A), drought stressed plants (B), or plants challenged by fungal leaf spots (C), and from peanut A13 immature seeds challenged by fungus Aspergillus flavus (D). Genes examined: 1-Histone H3.3 (positive internal control), 2-Allergen Ara h3/4, 3-10kDa protein, 4-Leucine-rich protein, 5-Bax inhibitor, 6-Heat shock cognate protein, 7-Non-specific lipid transfer protein, 8-drought inducible 22 kDa protein, 9-drought inducible RPR 10, 10-Defensin protein.


Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215, 403-410.

He G, Meng R, Newman M, Gao G, Pittman RN, Prakash CS (2003) Microsatellites as DNA markers in cultivated peanut (Arachis hypogaea L.). BMC Plant Biol 3, 3.

Holbrook CC, Matheron KE, Wilson DM, Anderson WF, Will ME, Norden AJ (1994) Development of a large-scale field system for screening peanut for resistance to preharvest aflatoxin contamination. Peanut Sci 21, 20-22.

Holbrook CC, Stalker HT (2003) Peanut breeding and genetic resources. Plant Breed Rev 22, 297-356.

Li XM, Serebrisky D, Lee SY, Huang CK, Bardina L, Schofield BH, Stanley JS, Burks AW, Bannon GA, Sampson HA (2000) A murine model of peanut anaphylaxis: T- and B-cell responses to a major peanut allergen mimic human responses. J Allergy Clin Immunol 106, 150-158.

Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res 30, 31-34.

Ozias-Akins P, Gill R (2001) Progress in the development of tissue culture and transformation methods applicable to the production of transgenic peanut. Peanut Sci 28, 123-131.

Schoof H, Zaccaria P, Gundlach H, Lemcke K, Rudd S, Kolesov G, Arnold R, Mewes HW, Mayer KF (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res 30, 91-93.

Schuler GD (1997) Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 75, 694-698.

Stalker HT, Mozingo LG (2001) Molecular markers of Arachis and marker-assisted selection. Peanut Sci 28, 117-123.

Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, Wagner L (2003) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 31, 28-33.

Wynne JC, Coffelt TA (1982) Genetics of Arachis hypogaea L. In ‘Peanut Science and Technology’. (Eds. HE Pattee, CT Young) pp. 50-94. (Am. Peanut Res. Educ. Soc. Inc, Publishers: Texas)
