
Can genomics revolutionise genetics and breeding in sugarcane?

John Manners1,4, Lynne McIntyre1,4, Rosanne Casu1,4, Giovanni Cordeiro2,4, Mark Jackson1,4, Karen Aitken1,4, Phillip Jackson3,4, Graham Bonnett1,4, Slade Lee2,4 and Robert Henry2,4

1 CSIRO Plant Industry, Queensland Bioscience Precinct, 306 Carmody Road, St. Lucia, Brisbane, QLD Australia 4067 Email john.manners@csiro.au
2 Centre for Plant Conservation Genetics, Southern Cross University, Lismore, New South Wales, Australia 2480 Email rhenry@scu.edu.au
3 CSIRO Plant Industry, Davies Laboratory, University Drive, Townsville, QLD Australia 4814
4 Cooperative Research Centre for Sugar Industry Innovation through Biotechnology, http://www.crcsugar.com/

Abstract

Sugarcane has the most complex genome of any crop plant. Commercial sugarcane cultivars are the result of a limited series of crosses and backcrosses derived from two Saccharum species and are poly-aneuploid hybrids with chromosome numbers in excess of 100. Almost all traits are quantitatively inherited and genetic mapping is mainly restricted to dominant single dose DNA markers. In 2003, enormous amounts of DNA sequence information became available via the release of 255,000 expressed sequence tags for sugarcane. It is now possible to identify candidate gene sequences that may underpin important traits in sugarcane and to characterise single nucleotide polymorphisms (SNPs) in these genes. Combinations of SNPs in a gene sequence act as signatures for individual gene haplotypes that may be considered as allele equivalents in the sugarcane genome. Techniques for reliably measuring the dosage of SNPs in the sugarcane genome have emerged and can be used to provide information that enables the gene haplotype (allele) content in a sugarcane genotype to be deduced. This paper describes how these genomics-based tools provide new strategies for genetic analysis and plant improvement in sugarcane.

Media summary

Sugarcane has the most complex genome of any crop plant but access to massive quantities of sugarcane gene sequences now makes possible a revolution in sugarcane genetics and breeding.

Key Words

Saccharum, sugarcane, genome, breeding, single nucleotide polymorphism, expressed sequence tag

Introduction

Sugarcane is one of the most important field crops grown in the tropics and sub-tropics. Commercial sugarcane plants are the result of a limited series of crosses and backcrosses derived from the domesticated species Saccharum officinarum L. (2n = 80) and the wild species S. spontaneum (2n = 40-120). As a result of this process, commercial sugarcane plants are inter-specific poly-aneuploid hybrids with chromosome numbers usually in excess of 100 (see Fig. 1). Because of this genomic complexity, most traits in sugarcane are multigenic and/or multi-allelic and are quantitatively inherited, which makes breeding improved cultivars difficult. This paper describes a new strategy for the genetic analysis of genes and traits in sugarcane based on emerging genomics resources. The large-scale availability of sugarcane gene sequence information, together with technologies that measure sequence variants of genes via single nucleotide polymorphisms (SNPs), may now permit a systematic dissection of allelic composition and genetic complexity in sugarcane. This will lead to improved markers and selection strategies for important traits in sugarcane breeding.

Limitations of current DNA marker systems in sugarcane

In recent years, considerable progress has been made in mapping the genomes of sugarcane and its progenitors (Aitken et al., 2004). The original genetic maps for sugarcane were based on restriction fragment length polymorphisms (RFLPs), while the most recent maps have used markers that can be analysed with higher throughput, such as simple sequence repeats (SSRs) and amplified fragment length polymorphisms (AFLPs). These recent sugarcane maps contain >1000 markers and are large compared with those of most other crop species but, because of the genomic complexity of sugarcane, they are still incomplete. Several researchers have combined molecular maps of sugarcane with phenotypic data to localise quantitative trait loci (QTLs). In sugarcane, individual QTLs usually explain only a small proportion of the variation in a trait, typically less than 10%. This probably reflects the genetic redundancy of sugarcane, in which any locus and its allelic complement are represented multiple times because the genome contains many hom(oe)ologous chromosomes.

Figure 1. Diagrammatic representation of the genome of modern sugarcane. The rows represent hom(oe)ologous chromosomes. Light grey indicates Saccharum officinarum chromosomes, darker grey S. spontaneum chromosomes and, where both shades of grey are present, recombinant chromosomes. Note that two pairs of S. officinarum chromosomes correspond to two S. spontaneum chromosomes (lowest rows). The arrow indicates the position of one locus with three alleles of a gene: one allele is represented once (white), one three times (black) and the remaining allele nine times (hatched) in the genome. A single dose DNA marker is equivalent to the white allele and is uninformative about the dosage and nature of all other alleles. Adapted from Grivet and Arruda (2002).

Restricting genetic analysis primarily to dominant single dose DNA markers (which usually segregate 1:1 in F1 progeny) for QTL mapping in sugarcane has frustrated attempts to dissect fully the contribution of all hom(oe)ologous loci to variation in a trait. For example, if one identifies single dose markers that explain a proportion of the variation in a trait around one locus on one chromosome, it is usually not possible to explore associations of the trait with the equivalent loci on the other hom(oe)ologous chromosomes, because single dose marker polymorphisms are not identified for all of these loci (see Figs. 1 and 2). A typical experimental example of an SSR marker in the F1 progeny of a sugarcane cross is shown in Fig. 2, with a segregating single dose marker arrowed. Segregation patterns for the other bands cannot be interpreted unless population sizes are very large.

Figure 2. Example of SSR marker analysis in the progeny of a cross between two sugarcane genotypes. A single dose marker segregating in the expected 1:1 ratio is indicated by an arrow. Note that the segregation of other alleles (bands of other sizes) detected by these markers cannot be interpreted.
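A single dose marker is usually recognised by checking whether the presence:absence counts of its band in the F1 progeny fit the expected 1:1 ratio. The paper does not state which test is used; the minimal sketch below assumes a standard chi-square goodness-of-fit test with one degree of freedom at the 5% level, and the band counts are hypothetical rather than read from Fig. 2.

```python
# Chi-square goodness-of-fit test for 1:1 segregation of a dominant marker band.
# Assumption: a standard chi-square test at the 5% level; the paper does not specify its test.

CHI2_CRITICAL_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 degree of freedom


def chi_square_1to1(present: int, absent: int) -> float:
    """Chi-square statistic for band-present vs band-absent counts against a 1:1 expectation."""
    if present + absent == 0:
        raise ValueError("no progeny scored")
    expected = (present + absent) / 2
    return ((present - expected) ** 2 + (absent - expected) ** 2) / expected


def fits_single_dose(present: int, absent: int) -> bool:
    """True if the counts do not depart significantly (P > 0.05) from 1:1."""
    return chi_square_1to1(present, absent) < CHI2_CRITICAL_1DF_5PCT


# Hypothetical band counts in 100 F1 progeny.
print(chi_square_1to1(52, 48), fits_single_dose(52, 48))  # 0.16 True
print(chi_square_1to1(75, 25), fits_single_dose(75, 25))  # 25.0 False
```

A marker passing this test is only consistent with single dose segregation; it says nothing about the other bands, which is the limitation described above.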

Given the highly polyploid nature of sugarcane, allele dosage and allele composition in any given genotype are likely to be important in determining phenotypic performance. Methods that permit the identification of individual alleles across all hom(oe)ologous chromosomes would revolutionise genetic studies in sugarcane.

Recent advances in gene sequence information in sugarcane

In 2003, a large amount of DNA sequence information for sugarcane was released into the public domain as expressed sequence tags (ESTs) derived from many cDNA libraries (Casu et al. 2004a & b). Most of these ESTs originated from a genomics program in Brazil, with programs in Australia and the USA being the next largest contributors. At the time of writing (May 2004), there were 255,000 sugarcane ESTs in the database, a total exceeded among plants only by wheat, maize, barley, soybean and rice. Importantly, the wide diversity of tissue sources and the >30 independent cDNA libraries that contributed to the sugarcane ESTs provide highly representative coverage of sugarcane gene sequences in the public database.

The large collection of sugarcane ESTs makes it relatively easy to identify sequences of candidate genes that may determine important traits, through their homology to functionally characterised genes from other plants. For example, many genes potentially involved in sucrose biosynthesis, catabolism and transport can be identified in the sugarcane EST collection. In addition, cDNA microarrays for sugarcane have been made to study gene expression on a large scale (Casu et al. 2004a & b). In these experiments, genes have been identified whose expression profiles correlate with the onset of sucrose accumulation in developing stems (Casu et al. 2004a), as well as genes with high transcript levels in progeny with high sucrose content in a segregating population (Casu et al. 2004b). It would be particularly interesting to test whether particular alleles of genes showing these expression patterns are linked to QTLs for sucrose accumulation traits in sugarcane.

SNP discovery and analysis in sugarcane

SNPs represent naturally occurring point mutations in gene sequences. Two avenues are now available for discovering SNPs in sugarcane gene sequences. First, the presence in databases of many overlapping ESTs for a particular gene enables the identification of sequence variants and SNPs. Research in our laboratories has confirmed that SNPs are readily found in overlapping ESTs even when they originate from a single genotype. This is not surprising, as sugarcane is well known to be highly heterozygous. Care must be taken when assigning SNPs from EST information, because each EST is based on a single sequencing read and may contain errors. To avoid artefacts, SNPs are usually assigned only when they are observed in multiple ESTs. Individual SNPs identified from sequence databases can then be used to test for associations between the absence, presence or dosage of the SNP and a trait in sugarcane populations that segregate for the trait.
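The rule of assigning a SNP only when each variant base is seen in multiple ESTs can be sketched as a column-by-column count over ESTs that have already been aligned to a common reference. This is a minimal sketch under stated assumptions: the alignment step itself is not shown, the `-` symbol for gaps or uncovered columns and the default threshold of two supporting ESTs are illustrative choices, and the EST strings are hypothetical.

```python
from collections import Counter


def call_snps(aligned_ests: list[str], min_support: int = 2) -> dict[int, dict[str, int]]:
    """Return {column: {base: count}} for alignment columns where at least two
    different bases are each supported by >= min_support ESTs.

    aligned_ests: equal-length strings already aligned to a common reference;
    '-' marks a gap or a column the EST does not cover (assumed convention).
    """
    snps = {}
    for col in range(len(aligned_ests[0])):
        counts = Counter(est[col] for est in aligned_ests if est[col] != "-")
        # A base seen in fewer than min_support ESTs is treated as a possible sequencing error.
        supported = {base: n for base, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            snps[col] = supported
    return snps


# Hypothetical aligned ESTs from a single genotype.
ests = ["ACGTAC", "ACGTAC", "ACTTAC", "ACTTA-", "ACGTTC"]
print(call_snps(ests))  # {2: {'G': 3, 'T': 2}}; the lone T at column 4 is not called
```

Raising `min_support` trades sensitivity for robustness against single-read errors.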

It is unlikely, however, that a single SNP position will distinguish all the alleles of a gene that are present in a genotype. Comprehensive analysis of a gene will therefore probably require the definition of multiple SNPs in that gene. These combinations of SNPs provide a signature that defines what we term ‘gene haplotypes’, which may be considered the equivalent of alleles. To define ‘gene haplotypes’ we have used a second approach to obtain DNA sequence information. In this method, target gene sequences from sugarcane are aligned with homologues from other plants to identify conserved regions of sequence, against which oligonucleotide primers are designed. When used in the polymerase chain reaction (PCR), these primers amplify the intervening sequences from genomic DNA and presumably capture copies of all gene haplotypes. Cloned copies of the amplified fragments are sequenced and SNPs are identified by sequence alignment; the process can be continued to exhaustion, until all variants have been identified on multiple occasions. A subset of SNPs that distinguishes all the gene haplotypes (alleles) in that gene region in any genotype can then be selected. When investigating a segregating population, one can apply this approach to both parents of the cross to define all the gene haplotypes that may segregate in the progeny. In a recent study of one isogene of sucrose phosphate synthase (SPS), we identified 38 gene haplotypes in the two parental genotypes by this method, defined by combinations of 24 SNPs (see examples in Table 1). Using SNP combinations to define gene haplotypes as markers for alleles may be complicated by gene families, in which very closely related gene sequences may be present in the genome. Some indication of whether ancestral gene duplications have occurred can be obtained by inspecting the gene complement of related plants that are genetically less complex than sugarcane, such as rice, sorghum and maize.
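Once the cloned amplicons have been sequenced and aligned, the two steps described above (finding the variable positions, then choosing a subset of SNPs that still tells every gene haplotype apart) might be sketched as follows. The greedy set-cover heuristic is our assumption, not the authors' stated procedure, and it is not guaranteed to find the smallest possible subset. The example reuses the four SPS sequences listed in Table 1; column indices are 0-based, so column 3 is the fourth base.

```python
from itertools import combinations


def variable_columns(haplotypes: list[str]) -> list[int]:
    """0-based alignment columns at which the aligned sequences do not all share one base."""
    return [i for i in range(len(haplotypes[0])) if len({h[i] for h in haplotypes}) > 1]


def select_snps(haplotypes: list[str]) -> list[int]:
    """Greedily choose SNP columns until every pair of distinct haplotypes differs
    at one or more chosen columns (greedy set cover; not guaranteed minimal)."""
    unique = sorted(set(haplotypes))
    unresolved = set(combinations(range(len(unique)), 2))  # pairs not yet told apart
    candidates = variable_columns(unique) if len(unique) > 1 else []
    chosen = []
    while unresolved:
        # Pick the column that separates the most still-unresolved pairs.
        best = max(candidates,
                   key=lambda c: sum(unique[a][c] != unique[b][c] for a, b in unresolved))
        chosen.append(best)
        unresolved = {(a, b) for a, b in unresolved if unique[a][best] == unique[b][best]}
    return sorted(chosen)


table1 = ["TAGCGTATTGACTTGCAGCT", "TAGAGTATTGACTTGCAGCT",
          "TAGCGTATTGACTTGCTGCT", "TAGCGTATTGACTTGCAGAT"]
print(variable_columns(table1))  # [3, 16, 18]
print(select_snps(table1))       # [3, 16, 18]: all three are needed here
print(select_snps(["AAC", "TTC", "AAG"]))  # [0, 2]: column 1 carries the same information as column 0
```

The last call shows the point of the selection step: SNPs in complete linkage within a gene add no discriminating power, so fewer assays may suffice.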

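| SPS isogene sequence | Frequency in P1 | Frequency in P2 |
|---|---|---|
| TAGCGTATTGACTTGCAGCT | 29.7% | 67.2% |
| TAGAGTATTGACTTGCAGCT | 14.6% | 0% |
| TAGCGTATTGACTTGCTGCT | 31.4% | 0% |
| TAGCGTATTGACTTGCAGAT | 24.3% | 32.8% |

Table 1. Four sequence variants of one isogene of sucrose phosphate synthase (SPS), distinguished by SNPs from among the 24 identified in this isogene. The frequency of each variant differs between the two parents (P1 and P2) of a sugarcane mapping population. Relative to the first sequence, each of the other three differs at a single SNP position (positions 4, 17 and 19 of this 20-base stretch, respectively).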

In human genetics and medical research, SNPs are now the preferred genetic marker, and many techniques have been developed to measure them. Usually, PCR is used to amplify the region surrounding the SNP from genomic DNA, and the base composition at the SNP position is assessed using methods as diverse as electrophoretic fragment separation, specific single nucleotide addition, hybridisation to oligonucleotide arrays, quantitative real-time PCR and simple DNA sequencing. In diploid organisms the requirement for quantitative assessment is not stringent, as one usually needs only to distinguish two alleles in homozygotes and heterozygotes. In a polyploid such as sugarcane, more precision is required, as there may be as many as 13-14 copies of any one gene, even when it is a single copy gene in the basal haploid genome.

We have successfully tested the Pyrosequencing method (Ronaghi et al. 1998) for quantitatively measuring base ratios at SNP positions in sugarcane (G. Cordeiro, unpublished). This method provides highly reproducible measurements of base composition at SNPs in sugarcane: the percentage base composition at SNPs in 5 replicate assays of a single genomic DNA preparation varied by 0-2.3%. Importantly, variation of only 0.2-3% was obtained when using 6 independent genomic DNA preparations (DNA prepared in different laboratories on different occasions from separately grown plant material) of a single genotype. This means that one can confidently assay a SNP in an allele of a gene even if the allele is present on only one hom(oe)ologous chromosome in sugarcane. For example, in Fig. 1 the base constituting the single point mutation that defines the white allele would be expected to represent only 7-8% (1 in 13) of the bases present at that position in the gene sequence. This also suggests that the SPS sequence detected at a frequency of 14.6% in parent P1 (Table 1, second row) may be present on 2 hom(oe)ologous chromosomes in P1 (2 in 13 is about 15%; 2 in 14 is about 14%).
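The arithmetic behind the 7-8% and 14.6% examples can be made explicit. If a base is carried by k of n hom(oe)ologous copies, its expected share of the signal at that SNP is k/n; the minimal sketch below assumes n = 13 or 14 copies, as in the text, and infers k by choosing the nearest dosage class. Adjacent classes are 100/13 ≈ 7.7 percentage points apart, so if the reported assay variation of up to about 3% is read as percentage points (our assumption), it is smaller than half the spacing between classes. The function names are illustrative.

```python
# Assumption: each hom(oe)ologous copy contributes equally to the Pyrosequencing signal.

def expected_percent(copies: int, total_copies: int) -> float:
    """Expected % of a base at a SNP when it is carried by `copies` of `total_copies` hom(oe)ologues."""
    return 100.0 * copies / total_copies


def nearest_dosage(observed_percent: float, total_copies: int) -> int:
    """Copy number k in 0..total_copies whose expected % is closest to the observed %."""
    return min(range(total_copies + 1),
               key=lambda k: abs(expected_percent(k, total_copies) - observed_percent))


# One copy out of 13 (the white allele in Fig. 1):
print(round(expected_percent(1, 13), 1))  # 7.7
# SPS sequence at 14.6% in P1 (Table 1), assuming 13 or 14 copies:
print(nearest_dosage(14.6, 13), nearest_dosage(14.6, 14))  # 2 2
```

Because the total copy number is itself uncertain, running the call under both 13 and 14 copies, as above, is a simple check that the inferred dosage is robust.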

Thus, the quantitative measurement of individual SNPs in sugarcane is possible. Initially, it can be applied to seek associations between the presence and dosage of individual SNPs and the expression of traits, in both structured segregating populations and unstructured genotype collections. Potentially, SNP analysis provides the means to undertake a more detailed dissection of allele complexity and inheritance in sugarcane.
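A first-pass association test of this kind could regress trait values on SNP dosage across individuals. The sketch below assumes ordinary least-squares regression on a single SNP (one common choice; the paper does not prescribe a method) and returns the slope together with R², the proportion of trait variation explained. The dosage and trait values are hypothetical.

```python
# Single-SNP association by ordinary least squares (assumed method, not the authors').

def dosage_association(dosage: list[float], trait: list[float]) -> tuple[float, float]:
    """Least-squares fit of trait on SNP dosage; returns (slope, R^2).

    dosage: per-individual SNP dosage, e.g. measured base % or inferred copy number.
    trait: per-individual phenotype values, in the same order.
    """
    if len(dosage) != len(trait) or len(dosage) < 2:
        raise ValueError("need at least two paired observations")
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and trait must both vary")
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, r_squared


# Hypothetical: copy number of a SNP base (0 or 1) and a trait score in four progeny.
slope, r2 = dosage_association([0, 0, 1, 1], [1.0, 3.0, 2.0, 4.0])
print(slope, r2)  # 1.0 0.2
```

In practice a significance test (for example an F-test on the regression) would accompany R²; it is omitted here to keep the sketch dependency-free.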

The potential of SNPs for high-resolution allele analysis in sugarcane

Variation in a trait in sugarcane may be explained, at least in part, by the absence, presence or dosage of a particular allele. At the DNA sequence level, by using combinations of SNPs to identify ‘gene haplotypes’, it should be possible to resolve the alleles of a single gene even in a complex polyploid such as sugarcane. Genetic analysis in a highly heterozygous polyploid requires a high degree of resolution of allelic variation, so that multiple alleles can be discerned. To achieve this, one needs to be able to deduce which gene haplotypes are present from quantitative measurements of base composition at multiple SNPs. For example, at the locus indicated in the model sugarcane genome in Fig. 1, three alleles (white, black and hatched) are present in 1, 3 and 9 copies, respectively, across the 13 hom(oe)ologous chromosomes depicted. In Table 2, these alleles are distinguished by four hypothetical SNPs in the gene at this locus, and the pattern of SNP ratios obtained is diagnostic of the allelic composition of this genotype.

 

| Allele | SNP1 | SNP2 | SNP3 | SNP4 |
|---|---|---|---|---|
| White allele | C | A | C | G |
| Black allele | T | G | G | G |
| Hatched allele | T | G | C | A |
| SNP ratio | 1:12 (C:T) | 1:12 (A:G) | 10:3 (C:G) | 4:9 (G:A) |

Table 2. SNPs at four positions in the DNA sequence of three alleles indicated in the hypothetical genome of Figure 1. SNPs are represented by the base present at the same position in the equivalent DNA strand in each allele. The SNP ratio of each base at these positions is what would be expected theoretically in a SNP analysis of the DNA of the genome depicted in Figure 1.
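The SNP ratios in Table 2 follow from counting, at each SNP, how many of the 13 copies carry each base. The sketch below reproduces them from the allele sequences in Table 2 and the copy numbers given for Fig. 1 (1 white, 3 black, 9 hatched); the data layout and function name are illustrative.

```python
from collections import Counter

# Bases at SNP1..SNP4 for each allele (Table 2).
ALLELES = {"white": "CACG", "black": "TGGG", "hatched": "TGCA"}


def expected_snp_counts(alleles: dict[str, str], copies: dict[str, int]) -> list[Counter]:
    """For each SNP position, count how many hom(oe)ologous copies carry each base."""
    n_snps = len(next(iter(alleles.values())))
    per_snp = []
    for i in range(n_snps):
        counts = Counter()
        for name, bases in alleles.items():
            k = copies.get(name, 0)
            if k:  # skip alleles absent from this genotype
                counts[bases[i]] += k
        per_snp.append(counts)
    return per_snp


for i, counts in enumerate(expected_snp_counts(ALLELES, {"white": 1, "black": 3, "hatched": 9}), 1):
    print(f"SNP{i}", dict(counts))
# SNP1 {'C': 1, 'T': 12}
# SNP2 {'A': 1, 'G': 12}
# SNP3 {'C': 10, 'G': 3}
# SNP4 {'G': 4, 'A': 9}
```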

With some knowledge of (or assumptions about) the number of hom(oe)ologous chromosomes, together with precise measurement of base ratios at several SNP positions in a gene, it should therefore be possible to deduce the allelic composition of a gene at a locus in any particular sugarcane genotype. By defining the gene haplotypes in the parents of crosses, it should then be possible to deduce their segregation in the progeny.
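One naive way to attempt this deduction, under the assumptions named in the text, is an exhaustive least-squares search: given the haplotype sequences defined in the parents and an assumed total number of copies n, try every way of distributing the n copies among the haplotypes and keep the distribution whose predicted base fractions best match the measured ones. This is our illustrative sketch, not an algorithm from the paper. The search grows quickly with the number of haplotypes, and if different compositions predict identical SNP ratios the data cannot separate them, so the search simply returns the first best match. The "measured" fractions below are hypothetical values near those expected for Table 2.

```python
from typing import Iterator


def compositions(total: int, parts: int) -> Iterator[tuple[int, ...]]:
    """Yield every tuple of `parts` non-negative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def deduce_composition(alleles: dict[str, str], observed: list[dict[str, float]],
                       total_copies: int) -> dict[str, int]:
    """Least-squares search for the allele copy numbers that best explain the
    observed base fractions (observed[i] maps base -> fraction at SNP i)."""
    if not alleles or total_copies < 1:
        raise ValueError("need at least one allele and one copy")
    names = list(alleles)
    best, best_err = None, float("inf")
    for copies in compositions(total_copies, len(names)):
        err = 0.0
        for i, fractions in enumerate(observed):
            for base, frac in fractions.items():
                carried = sum(c for name, c in zip(names, copies) if alleles[name][i] == base)
                err += (carried / total_copies - frac) ** 2
        if err < best_err:
            best, best_err = copies, err
    return dict(zip(names, best))


alleles = {"white": "CACG", "black": "TGGG", "hatched": "TGCA"}  # Table 2
# Hypothetical measured fraction of one base per SNP, with small assay error.
measured = [{"C": 0.08}, {"A": 0.07}, {"C": 0.76}, {"G": 0.30}]
print(deduce_composition(alleles, measured, total_copies=13))
# {'white': 1, 'black': 3, 'hatched': 9}
```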

Conclusion

Genomics technologies have made high resolution analysis of allelic diversity and inheritance a feasible proposition for sugarcane. The next challenge is to develop mathematical solutions and algorithms that resolve sugarcane genotypes from combinations of SNP ratios at single loci. Over time, if such comprehensive genetic analysis of sugarcane is possible, we may move to a position in sugarcane breeding where we can select both qualitatively and quantitatively for desired combinations of favourable alleles using information at the level of gene sequence.

References

Aitken KS, Jackson PA and McIntyre CL (2004) A combination of AFLP and SSR markers provides extensive map coverage and identification of hom(oe)ologous linkage groups in sugarcane. Theoretical and Applied Genetics in press.

Casu RE, Dimmock CM, Chapman SC, Grof CPL, McIntyre CL, Bonnett GD and Manners JM (2004a) Identification of differentially expressed transcripts from maturing stem of sugarcane by in silico analysis of stem expressed sequence tags and gene expression profiling. Plant Molecular Biology in press.

Casu RE, Manners JM, Bonnett GD, Jackson PA, McIntyre CL, Dunne R, Chapman SC, Rae AL and Grof CPL (2004b) Genomics approaches for the identification of genes determining important traits in sugarcane. Field Crops Research in press.

Grivet L and Arruda P (2002) Sugarcane genomics: depicting the complex genome of an important tropical crop. Current Opinion in Plant Biology 5: 122-127.

Ronaghi M, Uhlén M and Nyrén P (1998) A sequencing method based on real-time pyrophosphate. Science 281: 363-365.
