SNP-based analysis of genetic diversity in anther-derived rice by whole genome sequencing

Abstract

Background

Anther culture produces homozygous progeny through induced doubling of haploid chromosomes and improves selection efficiency for valuable agronomical traits. It is therefore widely used to breed new varieties and to induce genetic variation in several crops, including rice. Genome sequencing technologies allow the detection of massive numbers of DNA polymorphisms, such as SNPs and InDels, between closely related cultivars. These DNA polymorphisms permit the rapid identification of genetic diversity among cultivars and of the genomic locations of heritable traits. To estimate sequence diversity derived from anther culturing, we performed whole-genome resequencing of five Korean rice accessions, including three anther culture lines (BLB, HY-04 and HY-08), their progenitor cultivar (Hwayeong), and an additional japonica cultivar (Dongjin).

Results

A total of 1,165 × 10^6 raw reads were generated with over 58× coverage, and 1,154,063 DNA polymorphisms were detected between the Korean rice accessions and Nipponbare. In Hwayeong and its progenies, 0.64 SNP was found per kb of the Nipponbare genome, while Dongjin, bred by a conventional breeding method, had a lower density (0.45 SNP/kb). Among the 1,154,063 DNA polymorphisms, 29,269 non-synonymous SNPs were located in 30,013 genes, and these genes were functionally classified based on gene ontology (GO). We also analyzed line-specific SNPs, which were estimated to be 1 to 3% of the total SNPs. The number of non-synonymous line-specific SNPs in each accession ranged from 26 in Hwayeong to 214 in HY-04.

Conclusions

The genetic differences we detected between the progenies derived from anther culture and their mother cultivar are due to somaclonal variation during the tissue culture process, such as karyotype change, chromosome rearrangement, gene amplification and deletion, transposable element activity, and DNA methylation. Detection of genome-wide DNA polymorphisms by high-throughput sequencing enabled us to identify sequence diversity derived from anther culturing and the genomic locations of heritable traits. Furthermore, it will provide an invaluable resource for identifying molecular markers and genes associated with diverse traits of agronomical importance.

Background

Advances in genome sequencing technologies have aided in the discovery of millions of genome-wide DNA polymorphisms, including single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels). These are invaluable resources for analyzing genetic diversity in a population and for establishing the linkage relationship between genomes and heritable traits (Chen et al. 2011; Osman et al. 2003). Reference genome sequences for several crop species are now available, which permits both rapid identification of candidate genes through bioinformatic analysis and SNP discovery through comparison of the reference sequence with those of various cultivars (Edwards and Batley 2010; Kim et al. 2010).

SNPs are the most common polymorphisms in the genomes of most organisms and are important molecular markers in genetic research for marker-assisted breeding (Ganal et al. 2009; Jena and Mackill 2008; McCouch et al. 2010; Silva et al. 2012). Since the rice genome was sequenced with high accuracy using a japonica rice cultivar, Nipponbare (IRGSP 2005), discovering massive numbers of SNPs by comparison with the Nipponbare reference sequence has become an effective tool. Recently, whole genome resequencing of rice cultivars using Nipponbare as a reference has been performed with high-throughput sequencers. The whole genome resequencing of the japonica rice cultivar Koshihikari, which is closely related to Nipponbare, has been completed (Yamamoto et al. 2010). In total, 67,051 SNPs were identified by a comparison between these two genomes. Historical representative rice cultivars were also analyzed to understand the dynamics of genome compositions using typing arrays based on SNPs. In a landrace cultivar of japonica rice, 168,228 DNA polymorphisms were discovered by whole genome resequencing, and InDels were also validated by actual use as DNA markers (Arai-Kichise et al. 2011). To identify agronomically important genes, the resequencing of 50 accessions of cultivated and wild rice revealed 6.5 million high-quality SNPs and identified thousands of genes with significantly lower diversity based on the obtained SNPs. These candidate genes were considered to have been selected during domestication (Xu et al. 2011).

Anther culturing has the advantages of producing homozygous progeny by induced doubling of haploid chromosomes and of improving selection efficiency for important agronomical plant traits (Janhe et al. 1991). Anther culturing has therefore been used as an efficient method to improve agronomically important crops such as rice and barley by producing useful cultivars (Barchi et al. 2010; Kasha and Kao 1970; Kozik et al. 2002; Zagorska et al. 2004). A number of variants have been detected in anther culture lines in several crops, including rice (Bairu et al. 2011; Doğramaci-Altuntepe et al. 2001; Evans 1989; Reed and Wernsman 1988; Roy and Mandal 2005; Yan et al. 1996). However, the origins and extents of these mutations are not well understood.

In this study, we performed whole genome sequencing to understand the extent of the sequence variation between an anther culture progenitor, Hwayeong, and its progeny lines (BLB, HY-04, and HY-08), which exhibited new agronomically important traits. Also, Dongjin, which is an elite cultivar in Korea, was resequenced to estimate the difference in genomic sequences between a cultivar developed from anther culturing and a cultivar developed by a conventional breeding method. Further genetic research will link sequence diversity with genic factors involved in anther culturing techniques. Also, this study confirms the idea that anther cultures provide valuable resources for developing genetic diversity and for breeding in rice.

Results

Sequencing and mapping of the reads to the Nipponbare genome

We performed whole genome resequencing of five Korean rice accessions including three anther culture lines (BLB, HY-04 and HY-08), their progenitor cultivar (Hwayeong), and an additional Korean japonica rice cultivar (Dongjin). Sequencing yielded 118,243 × 10^6 bp (corresponding to 1,165 × 10^6 reads) and, on average, 61× coverage of the Nipponbare reference genome. The high-quality raw reads (Phred quality value > Q20, Phred+33 encoding), which made up on average 89.9% of the total reads, were used to analyze genetic variations in these five accessions.

We mapped a large number of short reads from each of the five Korean rice accessions onto the genomic sequence of the japonica rice cultivar Nipponbare. The mapping ratio, which is the proportion of reads that mapped uniquely onto the Nipponbare genome, varied from 87% (207 × 10^6 out of 237 × 10^6 reads) in HY-04 to 89% (197 × 10^6 out of 220 × 10^6 reads) in Dongjin (Table 1). The final effective mapping depth averaged > 54× across the whole genome, with a sequencing depth ranging from 53× in Dongjin to 55× in HY-08. The uniquely mapped reads covered approximately 94% of the Nipponbare genome in all five accessions (Table 1). Among chromosomes, chromosome 11 had the lowest genome coverage and mapping depth, > 12% and > 10% lower, respectively, than the average. All three lines regenerated from anther cultures (HY-04, HY-08 and BLB) had their highest coverage, > 99%, on chromosome 5 and their highest depths, from 62× to 68× (approximately 10% higher than average), on chromosome 10. In Dongjin and Hwayeong, the highest coverage was likewise on chromosome 5, but the highest depth was on chromosome 9. Overall, however, there was little difference among the five accessions.

Table 1 Reference assembly of each accession onto Nipponbare genome

Detection of DNA polymorphisms

The total number of DNA polymorphisms between the five accessions and the Nipponbare genome was 1,154,063, including 1,024,202 SNPs, 53,180 insertions and 76,681 deletions (Figure 1b). On average, 230,813 SNPs per accession were detected, which means that 0.6 SNP was found per kb of the Nipponbare genome (382 Mb). All accessions showed a similar composition of DNA polymorphisms, with 88.7% being substitutions, 4.5% being insertions, and 6.6% being deletions (Figure 1b).
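The per-accession average and the density quoted above follow from simple arithmetic on the reported counts. The sketch below is only an illustration of that arithmetic, not the authors' pipeline; the variable and function names are ours, and the per-accession figure is obtained by dividing the total number of polymorphisms by the five accessions.

```python
# Counts and genome size are taken from the text; names are illustrative.
SUBSTITUTIONS = 1_024_202
INSERTIONS = 53_180
DELETIONS = 76_681
GENOME_KB = 382_000      # 382 Mb expressed in kb
N_ACCESSIONS = 5

total = SUBSTITUTIONS + INSERTIONS + DELETIONS

# Average number of polymorphisms per accession and density per kb.
per_accession = total / N_ACCESSIONS
density_per_kb = per_accession / GENOME_KB


def share(count, whole):
    """Return the percentage of `whole` contributed by `count`."""
    return 100.0 * count / whole


print(f"total polymorphisms : {total:,}")
print(f"per accession       : {per_accession:,.0f}")
print(f"density (per kb)    : {density_per_kb:.2f}")
for name, count in [("substitutions", SUBSTITUTIONS),
                    ("insertions", INSERTIONS),
                    ("deletions", DELETIONS)]:
    print(f"{name:<20}: {share(count, total):.1f}%")
```

The three counts sum to the 1,154,063 polymorphisms given in the text, and dividing by five accessions and 382,000 kb gives a density of about 0.6 per kb.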

Figure 1

Distribution of SNP types. (a) Homozygous and heterozygous SNPs from each accession. Homozygous SNPs accounted for approximately 87% of the total potential SNPs. (b) The ratio of SNP types. Substitutions, insertions and deletions were 88.7%, 4.5%, and 6.6%, respectively, among DNA polymorphisms.

An average of 245,776 DNA polymorphisms was detected in Hwayeong, BLB, HY-04 and HY-08. All these lines, including Hwayeong, were developed via anther cultures. Fewer DNA polymorphisms were detected in Dongjin, which was bred by a conventional breeding method. HY-04 and HY-08, which have a high-yield trait, had slightly higher ratios (> 1.5%) of SNPs than BLB. They showed higher frequencies of substitutions but lower frequencies of InDels than Hwayeong and BLB. The total number of SNPs varied among chromosomes. Over 50% of Dongjin’s SNPs were located on chromosomes 11 and 12, while over 50% of the SNPs of Hwayeong and its anther culture-derived lines were located on chromosomes 8 and 11 (Table 2). There were indications of sequence differences between Hwayeong and its anther culture-derived progenies. Hwayeong had its lowest ratio of SNPs (2% of the total) on chromosome 6, but for the three progeny lines the lowest ratio of SNPs was on chromosome 5 (1 to 2%). Potential SNPs were classified into two types, homozygous and heterozygous, based on the mismatch frequency with Nipponbare when there were more than two bases at the identical position. Approximately 87% of the SNPs from all five accessions were homozygous and 13% were heterozygous (Figure 1a).

Table 2 The number of SNPs on individual chromosomes detected between each accession and Nipponbare

Annotation of SNPs and InDels

The Rice Annotation Project Database (RAP-DB) was used to locate the 1,154,063 DNA polymorphisms detected between all five accessions and the Nipponbare genome. A total of 214,799 polymorphisms (SNPs and InDels; 18.6% of the 1,154,063) were found in genic regions, but only 57,146 SNPs (4.95% of the total) occurred in coding regions (Figure 2). Altogether, 29,269 non-synonymous SNPs (2.54% of the total) detected in all five accessions were located in 30,013 genes (Table 3). Among the 42,088 genes annotated in RAP-DB, HY-04 contained the highest number of SNP-containing genes. HY-04 carried SNPs in 7,507 genes (17.8% of the total genes) and HY-08 had SNPs in 6,558 genes (15.6% of the total genes) (Table 3). The annotation of SNPs in each of the five accessions revealed that the number of SNPs per gene ranged from 6.61 in Dongjin to 7.42 in HY-04, with a mean of 7.16. Similarly, the number of non-synonymous SNPs per gene ranged from 0.92 in Dongjin to 1.02 in HY-04 (Table 3). On average, the ratio of non-synonymous to synonymous SNPs was 1.16 in the five accessions (Table 3), which is similar to that found in a previous study (McNally et al. 2009). The ratio is higher than that of Arabidopsis (0.83) (Clark et al. 2007) but lower than that of soybean (1.61) (Lam et al. 2010).
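The distinction between synonymous and non-synonymous SNPs rests on whether a base change alters the encoded amino acid. The following is a minimal sketch of that test using the standard genetic code; it is not the annotation procedure used in this study, the function name is hypothetical, and it assumes the reference codon is already given on the coding strand and in frame.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a single-base change inside a codon.

    ref_codon: three-letter reference codon on the coding strand.
    position:  0-based index of the changed base within the codon.
    alt_base:  the alternative base observed at that position.
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:position] + alt_base.upper() + ref_codon[position + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


print(classify_coding_snp("GAA", 2, "G"))  # GAA (Glu) -> GAG (Glu)
print(classify_coding_snp("GAA", 1, "T"))  # GAA (Glu) -> GTA (Val)
```

Counting the two classes over all coding SNPs of an accession and dividing one count by the other gives a non-synonymous to synonymous ratio of the kind reported in Table 3.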

Figure 2

Classification of SNPs based on locations. Based on the IRGSP annotation, SNPs and InDels were classified as genic or intergenic. Depending on the intra-genic location, ‘genic’ was further separated into CDS, intron, and UTRs. The number and ratio of SNPs in each class are shown. Unknown SNPs were located in the coding region but were not annotated in IRGSP. (a) Dongjin is an elite cultivar in Korea. (b) Hwayeong is the mother cultivar of HY-04, HY-08, and BLB. (c), (d), and (e) were derived from Hwayeong via anther culturing.

Table 3 Distribution of SNPs within genic regions

Comparison analysis between detected SNPs and dbSNP

We also analyzed whether the detected SNPs were novel or had already been reported in NCBI’s dbSNP. The highest percentage of novel SNPs was found in Dongjin, with 70.52% novel SNPs and only 29.48% common SNPs. The ratio of novel SNPs in HY-04 and HY-08 was nearly 4% lower than in Hwayeong and BLB. In HY-04 and HY-08, chromosome 9 had the smallest difference between common SNPs and novel SNPs, at 3.83% and 2.72%, respectively. In Hwayeong and BLB, chromosome 8 showed the smallest difference between common SNPs and novel SNPs (Figure 3). In contrast, the largest differences between the two SNP types were found on chromosome 5 of HY-04 and HY-08, at 68.96% and 68.64%, respectively, and on chromosome 3 of Hwayeong and BLB, at 61.2% and 71.2%, respectively (Figure 3).

Figure 3

Comparisons between novel SNPs and dbSNP. By comparing with NCBI’s dbSNP, SNPs were classified into novel and common. Novel SNPs were more abundant than common ones. In the graph, the x-coordinate and y-coordinate represent each chromosome and the number of SNPs, respectively. (a) – (e) are the same as described in Figure 2.

Line-specific SNP analysis

Line-specific SNPs (lsSNPs) unique to Hwayeong and each of its progeny lines (BLB, HY-04 and HY-08) were identified. These candidate SNPs may be associated with a unique phenotype or agronomical trait in each cultivar or line. The lsSNPs were restricted to those not previously reported in dbSNP. Unique SNPs were detected in each of these lines, and lsSNPs were estimated to make up 1 to 3% of the total SNPs (Table 4, Figure 4). The distribution of non-synonymous SNPs among the lsSNPs varied among lines. In Hwayeong, they were distributed only on chromosomes 5, 7, 8, and 11, and similarly, in BLB they were detected on chromosomes 2, 5, 8, and 12. In both lines, the majority of non-synonymous SNPs were on chromosome 8. In HY-04 and HY-08, however, the distribution among the chromosomes was more even. They also had larger numbers of lsSNPs on chromosome 1 than Hwayeong and BLB. HY-04 had the highest number of lsSNPs, 9,602 (3.4% of the total SNPs). BLB, on the other hand, contained only 2,160 lsSNPs (1.0% of the total SNPs), the lowest among the four lines. Most lsSNPs were located in intergenic regions, ranging from 2,300 SNPs (88.5% of lsSNPs) in Hwayeong to 7,972 SNPs (83.0% of lsSNPs) in HY-04 (Table 4). The number of lsSNPs detected in coding regions varied from 48 SNPs (1.9%) in Hwayeong to 346 SNPs (3.6%) in HY-04. The frequency of non-synonymous SNPs in the coding regions also differed among the accessions. Hwayeong contained 26 SNPs (1.0% of lsSNPs) while HY-04 had 214 SNPs (2.2% of lsSNPs) (Table 4). SNPs common to all four accessions were also identified. A total of 34,710 SNPs were common to all four accessions. Of those, 8,099 SNPs were unique to these four lines and were not reported in dbSNP (Figure 4).

Table 4 Line-specific SNPs not reported in dbSNP

Figure 4

Line-specific SNPs and common SNPs. SNPs were classified as specific to Hwayeong and each of its progeny lines and SNPs common to them all. Line-specific SNPs and common SNPs were further classified into two groups. One group consisted of novel SNPs, which were not reported in the dbSNP (no dbSNP), and the others were listed in the dbSNP. In the graph, the x-coordinate and y-coordinate represent each chromosome and the number of SNPs, respectively.

Functional study

We analyzed the five genes that had the highest number of SNPs within a genic region in each of the five accessions. The genes Os08g0205150 and Os08g0236400 were among the top five SNP-containing genes of Hwayeong and its progeny lines. Both genes have functions including ATP binding, protein serine/threonine kinase activity, and protein amino acid phosphorylation. We also analyzed the SNP frequency in the top five genes in the three accessions developed from anther cultures. HY-04 harbored a total of 122 SNPs in one gene, Os02g0164000, in which no SNP was found in Hwayeong. Only 50 of these were coding SNPs (cSNPs) located in the coding regions. Among these 50 cSNPs, six were non-synonymous SNPs (nsSNPs) and 44 were synonymous SNPs. The lines BLB and HY-08 carried fewer SNPs in this gene, 13 and 14, respectively. The difference in the number of SNPs between HY-04 and BLB or HY-08 is correlated with the difference in the number of SNPs in the coding region: only four and five cSNPs are present in BLB and HY-08, respectively. For the Os07g0645700 gene, 54 cSNPs were detected in the CDS in HY-08, of which 22 were nsSNPs. BLB contained 40 cSNPs in the coding region, but only three were nsSNPs. HY-04 also had five nsSNPs in this gene.

To estimate the functional relationship of SNPs with the genes in which they reside, these genes were functionally classified based on GO. When we examined gene groups that carried one or more nsSNPs, we found that all the accessions had many SNPs in genes closely related to nucleotide binding (GO:0000166) and ATP binding (GO:0005524) (Figure 5). In Hwayeong, 11 genes associated with O-methyltransferase activity (GO:0008171) had one or more SNPs in the coding region, whereas the other four accessions had none. In HY-04 and HY-08 in particular, genes associated with purine nucleotide binding (GO:0017076) and cellular protein metabolism (GO:0044267) carried many SNPs in the coding region, but Hwayeong and BLB did not appear to have these SNPs (Figure 5).

Figure 5

Functional analysis of genes carrying non-synonymous SNPs. Genes that contained one or more non-synonymous SNPs were separated into functional categories to obtain relationships between the gene’s function and potential SNPs by Gene Ontology.

Discussion

Anther culture systems have made a significant impact on plant breeding and genetics (Evans 1989; Sugimoto et al. 2000). Anther culture-derived plants are believed to undergo a spontaneous doubling of the haploid chromosomes of microsporocytes or callus cells. Therefore, anther culturing has been utilized to achieve rapid homozygosity and to enhance selection efficiency for important agronomical traits in plants. Also, as in other tissue culture systems, a number of variants have been reported among anther culture-derived plants, including rice (Bairu et al. 2011; Evans 1989; Roy and Mandal 2005; Yan et al. 1996). The progenies developed from anther culturing differed from their mother plant in traits such as culm length, panicle length, and grain weight (Sohn et al. 1995; Yi et al. 1999). Therefore, genetic and breeding research with anther culture-derived lines has been performed to obtain variation in important agronomic traits, and these lines are valuable genetic resources (Evans 1989; Schaeffer and Sharpe 1981). Even though the significance of anther culturing has been emphasized in terms of genetic variation, there is little information on the origin and extent of mutations arising during anther culturing. Most information has been obtained from the study of epigenetic and genetic activities of endogenous transposable elements (Barret et al. 2006; Kikuchi et al. 2003). Kikuchi et al. (2003) showed in vitro that miniature Ping (mPing) elements, a new class of miniature inverted-repeat transposable elements, are activated in cells derived from anther cultures, where mPing elements are deleted from their original sites and reinserted into new loci. Barret et al. (2006) demonstrated that ZmTPAPong-like in maize displayed homology with the transposase of Pong, and that it could form part of a Zea mays element related to the rice Pong element. They also revealed somaclonal variations among plants regenerated from a doubled haploid line. Recently, it has been demonstrated that somaclonal variations result from mutations newly induced during the tissue culture process and do not pre-exist in the plants before culture (Sato et al. 2011).

To estimate DNA polymorphisms between a mother plant and its descendants developed from anther culturing, we selected Hwayeong and three lines (BLB, HY-04, and HY-08) derived from Hwayeong via anther culturing. HY-08 and HY-04 have a high yielding ability and BLB has resistance to bacterial blight. These lines were subjected to whole-genome sequencing using a high-throughput sequencer. We also performed sequencing on Dongjin, which is an elite cultivar in Korea. All of the lines are in a japonica genetic background.

In the present study, the whole genomes of five accessions were mapped to Nipponbare as a reference genome to discover genome-wide DNA polymorphisms. The uniquely mapped reads from these accessions covered > 95% of the reference genome, providing an average coverage of 54.6× across the genome (Table 1). Among the chromosomes, chromosome 5 had a high mapping ratio, > 99%, while chromosome 11 had the lowest ratio. A notable enrichment of structural variation, including copy number variation (CNV) caused by large insertions, deletions or duplications, has been identified within known R gene clusters in several crop species, such as soybean and rice (McHale et al. 2012; Yu et al. 2011). Therefore, it may be inferred that diverse structural variation occurred on chromosome 11, which is dense in genes and gene families associated with disease resistance and immunity (The Rice Chromosomes 11 and 12 Sequencing Consortia 2005). The relationship between sequencing depth and efficacy in the comprehensive detection of SNPs is a key concern from the perspective of cost-effectiveness. Smith et al. (2008) reported that the redundancy resulting from increasing the sequencing depth from 10× to 15× permits accurate and cost-effective detection of DNA polymorphisms using a Solexa analyzer. As mentioned above, we achieved a final effective mapping depth of > 54.6×. Based on the mapped reads, we detected a total of 1,154,063 DNA polymorphisms, including 1,024,202 SNPs, 53,180 insertions and 76,681 deletions, between the five accessions and the reference genome, with an average density of one SNP per 1.6 kb on Nipponbare. Dongjin, bred by a conventional breeding method, had a lower SNP density (0.45 SNP/kb) than Hwayeong and its progenies obtained from anther culture (average density of 0.64 SNP/kb). SNPs were concentrated (> 50%) on chromosomes 11 and 12 of Dongjin and on chromosomes 8 and 11 of the anther culture lines and Hwayeong.

Among Hwayeong and its progenies, more SNPs were detected in HY-04 and HY-08 than in Hwayeong and BLB. In particular, the ratio of SNPs on chromosome 1 was five times higher in HY-04 and HY-08 than in Hwayeong. HY-04 and HY-08 exhibit high yield, an agricultural trait that distinguishes them from Hwayeong. Miura et al. (2011) and Vikram et al. (2011) identified QTLs for grain yield on rice chromosome 1. We believe that this information will be useful for finding genes associated with this important trait of both lines in further studies. We classified the detected SNPs into two types, homozygous and heterozygous. Since Hwayeong and its progenies were derived by anther culturing and Dongjin was bred by selfing over several generations, the detected SNPs were expected to be predominantly homozygous; however, 13% of the SNPs were heterozygous (Figure 1a). According to Pinson and Rutger (1993), such heterozygosity could arise through several mechanisms, such as regeneration from somatic tissues, mutation occurring during or after a spontaneous doubling event, fusion of genotypically different cells in chimeric callus, and abnormal meioses resulting in heterozygous diploid microspores. Although the heterozygosity of SNPs in these accessions is difficult to explain, further studies should clarify its cause.

If SNPs exert functional effects on phenotypic traits, they are most likely located in intra-genic regions. We therefore classified SNPs based on their genomic locations. Of the potential SNPs, 80% were located in intergenic regions and approximately 5% in coding regions. Hwayeong carried 10,244 SNPs (4.59% of the total) in coding regions. Of these, the number of nsSNP was 5,222 (2.34%). HY-04 contained 14,829 cSNPs (5.23%) of which 7,638 were nsSNPs (2.69%) (Figure 2).

HY-04 and HY-08 carried similar numbers of SNPs across the whole genome but the smallest number of cSNPs among the accessions. Among the 42,088 genes annotated in RAP-DB, 5,471 genes (13%) in Hwayeong contained one or more SNPs, and the total number of SNPs was 39,029, which corresponds to 7.13 SNPs per gene (Table 3). The anther culture progenies of Hwayeong showed slightly higher frequencies of SNPs in genic regions than Hwayeong.

The five genes that had the highest number of cSNPs in a genic region were investigated. As a result, variations were detected in genes related to immunity, such as apoptosis and signal transduction. All accessions included the genes Os08g0205150 and Os08g0236400, which perform the functions of ATP binding, protein serine/threonine kinase activity and protein amino acid phosphorylation. However, the SNP frequency in each of the top five genes varied among accessions. HY-04 revealed 122 cSNPs in the Os02g0164000 gene, while HY-08 and BLB had four cSNPs and three cSNPs, respectively. However, Hwayeong carried no SNPs in the same gene. The detection of DNA polymorphisms in this gene not only verified that HY-04 and HY-08 are anther culture derivatives of Hwayeong but also revealed genetic differences between the progenies. The frequencies of cSNPs and nsSNPs in the Os07g0645700 gene, 54 and 21 on average, respectively, were similar in HY-08 and Hwayeong. BLB, however, contained 40 cSNPs in the coding region of the Os07g0645700 gene, but its number of nsSNPs (3) was seven times lower than in the other two accessions.

Yamamoto et al. (2010) clarified the definition of the pedigree haplotypes of closely related rice cultivars to analyze conserved SNP regions between cultivars by means of genome-wide SNPs. In contrast to Yamamoto et al. (2010), we used lsSNPs to select candidate SNPs that could be associated with the phenotype of each cultivar. Based on the distribution of lsSNPs, we found certain regions and genes that differed between the mother line and its descendants and may therefore influence the phenotype. HY-04 carried 9,602 lsSNPs, which was 3.4% of its total SNPs (Table 4). The distribution pattern of lsSNPs in the genome was similar to that of SNPs in the whole genome. More than 83% of the lsSNPs were located in intergenic regions. HY-04 had the largest number of nsSNPs, 214, and Hwayeong had the smallest (Table 4, Figure 4). The lsSNPs identified in this study will provide valuable information for isolating genes responsible for the unique agronomical traits that arise in almost identical lines generated by anther culture. These lsSNPs will serve as molecular markers for mapping and cloning genes that distinguish the progenitor (mother line) from its anther culture progenies.

Conclusions

The genetic diversity between the mother cultivar and its descendants obtained from anther culture was analyzed by revealing DNA polymorphisms, including single nucleotide polymorphisms, insertions and deletions, among the five Korean rice accessions. The analysis estimated differences in genomic sequences between accessions using the frequency and distribution of SNPs in the genome, the five genes that had the largest number of SNPs in the coding regions, and lsSNPs. The lsSNPs will be useful for selecting candidate SNPs that could be associated with unique phenotypes or agronomical traits in each accession. Furthermore, these DNA polymorphisms will provide an invaluable resource for identifying molecular markers and genes associated with diverse traits of agronomical importance.

Methods

Sample preparation and sequencing

Genomic DNA was extracted from five Korean rice accessions, including three anther culture lines (BLB, HY-04, and HY-08), their progenitor cultivar (Hwayeong), and a Korean elite japonica cultivar (Dongjin), and sequencing libraries were prepared following the manufacturer’s protocols (Illumina). Library fragments were paired-end sequenced using Illumina’s HiSeq 2000. The length of all sequences generated was 101 nucleotides. For Dongjin, we performed whole-genome resequencing on two massively parallel sequencing platforms, Illumina HiSeq 2000 and 454 GS FLX. High-quality raw reads with Phred quality values > Q20 (Sanger quality, ASCII offset +33) were used to analyze genetic variations in the five accessions. A “Q20” value indicates an accuracy of 99% for the base called.
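The relationship between a Phred quality value and base-call accuracy, and the +33 ASCII offset, can be sketched as follows. This only illustrates the standard Phred definition (Q = -10 log10 of the error probability); the function names are ours, and how the Q20 threshold was applied to reads in this study (per base or per read) is not stated here, so the last helper is an assumption.

```python
PHRED_OFFSET = 33  # Sanger-style quality encoding (ASCII offset +33)


def phred_to_accuracy(q):
    """Base-call accuracy implied by Phred score q: 1 - 10^(-q/10)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def decode_qualities(quality_string, offset=PHRED_OFFSET):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def fraction_at_least(quality_string, threshold=20):
    """Fraction of bases in a read with Phred score >= threshold.

    Hypothetical helper: one possible way to summarize a read against Q20.
    """
    scores = decode_qualities(quality_string)
    if not scores:
        return 0.0
    return sum(q >= threshold for q in scores) / len(scores)


print(phred_to_accuracy(20))      # accuracy implied by Q20
print(decode_qualities("!5I"))    # '!' is Q0, '5' is Q20, 'I' is Q40
print(fraction_at_least("!5I5"))  # three of the four bases reach Q20
```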

Reference database

Genomic data

The five Korean rice accessions belong to the japonica rice variety. Therefore, Oryza sativa L. cv. Nipponbare was used as the reference sequence (Pseudomolecules Build 5.0, http://rgp.dna.affrc.go.jp/E/IRGSP/Build5/build5.html, International Rice Genome Sequencing Project 2005). Annotation information from RAP-DB (http://rapdb.dna.affrc.go.jp/) was used to analyze gene structure and function.

dbSNP

The NCBI’s SNP database (dbSNP) provides valuable information from whole-genome sequencing and Next Generation Sequencing (http://www.ncbi.nlm.nih.gov/projects/SNP/).

Mapping of reads and SNP detection

A large number of paired-end reads were mapped onto the genomic sequence of the japonica rice cultivar Nipponbare using CLC Assembly Cell (ver. 3.2.2, http://www.clcbio.com) with the following parameters: alignment mode, local; similarity, 95%; HSP coverage, 100%; gap cost, 3; deletion cost, 3; and mismatch cost, 2. SNPs were detected by comparing the alignments with the Nipponbare reference sequence. To distinguish sequencing errors from genomic variations, parameters were set as follows: minimum depth, 30; minimum variant frequency, 35%; least mismatch count, 20; and homo/heterozygote fold change, 2. RAP-DB was utilized to locate the discovered SNPs. SNPs were annotated as genic or intergenic based on positional information from the genome. DNA polymorphisms in genic regions were classified as coding sequence (CDS), untranslated regions (UTRs), and introns. DNA polymorphisms in the coding region were separated into synonymous and non-synonymous SNPs according to whether they caused amino acid substitutions. Also, SNPs were classified into two types, homozygous and heterozygous, based on the mismatch frequency if more than two bases shared the identical position.
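The filtering thresholds listed above can be illustrated with a small site-level classifier. This is a sketch under stated assumptions rather than the behaviour of CLC Assembly Cell: the function is hypothetical, each site is reduced to a reference-allele count and a single alternative-allele count, and the "homo/heterozygote fold change" of 2 is interpreted here as requiring the alternative allele to be at least twice as frequent as the reference allele for a homozygous call.

```python
def call_site(ref_count, alt_count,
              min_depth=30,        # minimum depth (from the text)
              min_freq=0.35,       # minimum variant frequency (from the text)
              min_mismatch=20,     # least mismatch count (from the text)
              fold_change=2.0):    # homo/heterozygote fold change (from the text)
    """Return 'homozygous', 'heterozygous', or None for a filtered-out site.

    ASSUMPTION: the fold-change rule below is our interpretation only;
    the source does not define how the parameter is applied.
    """
    depth = ref_count + alt_count
    if depth < min_depth:
        return None                      # too few reads to trust the site
    if alt_count < min_mismatch:
        return None                      # too few mismatching reads
    if alt_count / depth < min_freq:
        return None                      # variant allele too rare
    if alt_count >= fold_change * ref_count:
        return "homozygous"
    return "heterozygous"


# Toy read counts (not data from this study).
print(call_site(ref_count=2, alt_count=48))   # mostly alternative reads
print(call_site(ref_count=25, alt_count=25))  # balanced alleles
print(call_site(ref_count=10, alt_count=10))  # below the minimum depth
```

Under this interpretation, a site supported almost entirely by mismatching reads is called homozygous, a balanced site is called heterozygous, and low-depth or low-frequency sites are discarded as likely sequencing errors.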

Comparison between SNPs and dbSNP

To obtain specific variation information, we compared the potential SNPs in four accessions with dbSNP. Because the reference SNP (refSNP) position information for O. sativa provided in dbSNP is based on genome build 3, we redefined the SNP position information based on build 5. To update the refSNPs to genome build 5, we reconstructed the position information for the 4,521,605 refSNPs reported in dbSNP (Table 5). In total, 3,985,423 refSNPs (88%) were updated with unique positions in the genome sequence, while about 12% of the refSNP positions could not be confirmed because they mapped to multiple locations or were not mappable (Table 6). We consider the update to genome build 5 successful, given that the approximately 12% of undefined rsSNPs also had no unique genome position in genome build 3. Using the redefined dbSNP, we analyzed whether the detected SNPs were novel SNPs or common SNPs already reported in dbSNP.
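Once the dbSNP positions are expressed in the same genome build as the detected SNPs, separating novel from common SNPs amounts to a position lookup. The sketch below assumes that SNPs are matched by chromosome and position only (alleles are ignored) and uses invented coordinates; the function name and the data are hypothetical and are not taken from this study.

```python
def classify_against_dbsnp(detected, dbsnp_positions):
    """Split detected SNPs into (novel, common) lists.

    detected:         iterable of (chromosome, position) tuples.
    dbsnp_positions:  set of (chromosome, position) tuples already in dbSNP,
                      assumed to be on the same genome build as `detected`.
    """
    novel, common = [], []
    for site in detected:
        (common if site in dbsnp_positions else novel).append(site)
    return novel, common


# Toy example with made-up coordinates.
dbsnp = {("chr01", 1200), ("chr05", 88)}
found = [("chr01", 1200), ("chr01", 3400), ("chr08", 15)]

novel, common = classify_against_dbsnp(found, dbsnp)
print("novel :", novel)
print("common:", common)
print(f"novel fraction: {100 * len(novel) / len(found):.1f}%")
```

Keeping the known positions in a set makes each lookup constant time on average, which matters when several million refSNPs are compared.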

Table 5 Statistics of dbSNP
Table 6 The number of rsSNP according to genome version

Functional study

To estimate the functional relationship between the SNPs and genes, we performed three analyses. First, the five genes with the highest number of SNPs within the genic region were selected, and their functions were compared among the accessions. Second, genes were functionally classified based on Gene Ontology (GO; http://www.geneontology.org/). Finally, the lsSNPs were defined as those not previously reported in dbSNP, and the SNPs unique to each accession were detected.
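The first and third analyses can be sketched as follows. This is an illustrative reading of the text, not the authors' code: it assumes each SNP has already been assigned to a gene identifier, treats a line-specific SNP as one found in exactly one accession and absent from dbSNP, and uses hypothetical function names and placeholder gene labels.

```python
from collections import Counter


def top_genes_by_snp_count(snp_genes, n=5):
    """Return the n genes carrying the most SNPs as (gene, count) pairs.

    snp_genes holds one gene identifier per genic SNP.
    """
    return Counter(snp_genes).most_common(n)


def line_specific_snps(snps_by_accession, dbsnp_positions=frozenset()):
    """Map each accession to the SNPs seen only in that accession.

    SNPs present in dbsnp_positions are excluded as well, following the
    assumed definition of lsSNPs given in the lead-in.
    """
    specific = {}
    for name, snps in snps_by_accession.items():
        seen_elsewhere = set()
        for other, other_snps in snps_by_accession.items():
            if other != name:
                seen_elsewhere |= other_snps
        specific[name] = snps - seen_elsewhere - dbsnp_positions
    return specific
```

A call such as `top_genes_by_snp_count(["geneA", "geneB", "geneA"], n=1)` returns `[("geneA", 2)]`, where the gene labels are placeholders.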

Accession codes

The resequencing data from the five Korean rice accessions have been submitted to EMBL-EBI (http://www.ebi.ac.uk) under the following accession numbers: Dongjin [ERP001605, ERP001678], Hwayeong [ERP001620], BLB [ERP001655], HY-04 [ERP001653], and HY-08 [ERP001654].

Authors’ information

IS, UH, GS, HS, HJ, JH, TH: Genomics Division, National Academy of Agricultural Science, Rural Development Administration, Suwon 441–707, Republic of Korea. CD: Department of Biochemistry, Gyeongsang National University, Jinju 660–701, Republic of Korea. GA: Department of Plant Molecular Systems Biotechnology and Crop Biotech Institute, Kyung Hee University, Yongin 446–701, Republic of Korea.

Abbreviations

refSNP: Reference SNP

lsSNP: Line-specific SNP

cSNP: SNP within coding region

nsSNP: Non-synonymous SNP

References

  • Arai-Kichise Y, Shiwa Y, Nagasaki H, Ebana K, Yoshikawa H, Yano M, Wakasa K: Discovery of genome-wide DNA polymorphisms in a landrace cultivar of japonica rice by whole-genome sequencing. Plant Cell Physiol 2011, 52: 274–282. 10.1093/pcp/pcr003

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Bairu MW, Aremu AO, Staden JV: Somaclonal variation in plants: causes and detection methods plant growth regulation. Plant Growth Regul 2011, 63: 147–173. 10.1007/s10725-010-9554-x

    Article  CAS  Google Scholar 

  • Barchi L, Lanteri S, Portis E, Stagel A, Vale G, Toppino L, Rotino GL: Segregation distortion and linkage analysis in eggplant (Solanum melongena L.). Genome 2010, 53: 805–815. 10.1139/G10-073

    Article  CAS  PubMed  Google Scholar 

  • Barret P, Brinkman M, Beckert M: A sequence related to rice Pong transposable element displays transcriptional activation by in vitro culture and reveals somaclonal variations in maize. Genome 2006, 49: 1399–407. 10.1139/g06-109

    Article  CAS  PubMed  Google Scholar 

  • Chen H, He H, Zou Y, Chen W, Yu R, Liu X, Yang Y, Gao YM, Xu JL, Fan LM, Li Y, Li ZK, Deng XW: Development and application of a set of breeder-friendly SNP markers for genetic analyses and molecular breeding of rice (Oryza sativa L.). Theor Appl Genet 2011, 123: 869–79. 10.1007/s00122-011-1633-5

    Article  PubMed  Google Scholar 

  • Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P, Warthmann N, Hu TT: Common sequence polymorphisms shaping genetic diversity in Arabidopsis thaliana. Science 2007, 317: 338–342. 10.1126/science.1138632

    Article  CAS  PubMed  Google Scholar 

  • Doğramaci-Altuntepe M, Peterson TS, Jauhar PP: Anther culture-derived regenerants of durum wheat and their cytological characterization. J Hered 2001, 92: 56–64. 10.1093/jhered/92.1.56

    Article  PubMed  Google Scholar 

  • Edwards D, Batley J: Plant genome sequencing: applications for crop improvement. Plant Biotechnol J 2010, 8: 2–9. 10.1111/j.1467-7652.2009.00459.x

    Article  CAS  PubMed  Google Scholar 

  • Evans DA: Somaclonal variation – genetic basis and breeding applications. Trends Genet 1989,5(2):46–50.

    Article  CAS  PubMed  Google Scholar 

  • Ganal MW, Altmann T, Röder MS: SNP identification in crop plants. Curr Opin Plant Biol 2009, 12: 211–217. 10.1016/j.pbi.2008.12.009

    Article  CAS  PubMed  Google Scholar 

  • International Rice Genome Sequencing Project: The map-based sequence of the rice genome. Nature 2005, 436: 793–800. 10.1038/nature03895

    Article  Google Scholar 

  • Janhe A, Hazze PA, Lorz H: Regeneration of fertile plants from protoplast derived from embryogenic suspension of barley (Hordeum vulgare L.). Plant Cells Rep 1991, 10: 1–6.

    Google Scholar 

  • Jena KK, Mackill DJ: Molecular markers and their use in marker-assisted selection in rice. Crop Sci 2008, 48: 1266–1276. 10.2135/cropsci2008.02.0082

    Article  Google Scholar 

  • Kasha KJ, Kao KN: High frequency haploid production in barley (Hordeum vulgare L.). Nature 1970, 225: 874–876. 10.1038/225874a0

    Article  CAS  PubMed  Google Scholar 

  • Kim MY, Lee SH, Van KJ, Kim TH, Jeong SC, Choi IY, Kim DS, Lee YS, Park D, Ma J: Whole-genome sequencing and intensive analysis of the undomesticated soybean (Glycine soja Sieb. and Zucc.) genome. Proc Natl Acad Sci USA 2010, 107: 22032–22037. 10.1073/pnas.1009526107

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Kikuchi K, Terauchi K, Wada M, Hirano HY: The plant MITE mPing is mobilized in anther culture. Nature 2003,9(421):167–70.

    Article  Google Scholar 

  • Kozik EU, Nowak R, Kłosińska U, Górecka K, Krzyzanowska D, Gorecki R: Morphological diversity of androgenic carrot plants. J Appl Genet 2002, 43: 49–53.

    PubMed  Google Scholar 

  • Lam HM, Xu X, Liu X, Chen W, Yang G, Wong FL, Li MW, He W, Qin N: Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat Genet 2010, 42: 1053–1059. 10.1038/ng.715

    Article  CAS  PubMed  Google Scholar 

  • McCouch SR, Zhao K, Wright M, Tung C, Ebana K, Thomson M: Development of genome-wide SNP assays for rice. Breed Sci 2010, 60: 524–535. 10.1270/jsbbs.60.524

    Article  Google Scholar 

  • McHale LK, Haun WJ, Xu WW, Bhaskar PB, Anderson JE, Hyten DL, Gerhardt DJ, Jeddeloh JA, Stupar RM: Structural variants in the soybean genome localize to clusters of biotic stress-response genes. Plant Physiol 2012, 159: 1295–1308. 10.1104/pp.112.194605

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • McNally KL, Childs KL, Bohnert R, Davidson RM, Zhao K, Ulat VJ, Zeller G, Clark RM: Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. Proc Natl Acad Sci USA 2009, 106: 12273–12278. 10.1073/pnas.0900992106

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Miura K, Ashikari M, Matsuoka M: The role of QTLs in the breeding of high-yielding rice. Trends Plant Sci 2011, 16: 319–326. 10.1016/j.tplants.2011.02.009

    Article  CAS  PubMed  Google Scholar 

  • Osman A, Jordan B, Lessard PA, Muhammad N, Haron MR, Riffin NM, Sinskey AJ, Rha C, Housman DE: Genetic diversity of eurycoma longifolia inferred from single nucleotide polymorphisms. Plant Physiol 2003, 131: 1294–1301. 10.1104/pp.012492

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Pinson SRM, Rutge JN: Heterozygous diploid plants regenerated from anther culture of F1 rice plants. In Vitro Cell Dev Biol 1993, 29: 174–179.

    Article  Google Scholar 

  • Reed SM, Wernsman EA: DNA amplification among anther-derived doubled haploid lines of tobacco and its relationship to agronomic performance. Crop Sci 1988, 29: 1072–1076.

    Article  Google Scholar 

  • Roy B, Mandal AB: Anther culture response in indica rice and variations in major agronomic characters among the androclones of a scented cultivar, Karnal local. Afr J Biotechnol 2005, 4: 235–240.

    Google Scholar 

  • Sato M, Hosokawa M, Motoaki Doi M: Somaclonal variation is induced De novo via the tissue culture process: a study quantifying mutated cells in saintpaulia. PLoS One 2011, 6: e23541. 10.1371/journal.pone.0023541

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Schaeffer GW, Sharpe FT: Lysine in seed protein from S-aminoethyl-l-cysteine resistant anther-derived tissue cultures of rice. In Vitro Cell Develop Biol 1981, 17: 345–352.

    Article  CAS  Google Scholar 

  • Silva J, Scheffler B, Sanabria Y, De Guzman C, Galam D, Farmer A, Woodward J, May G, Oard J: Identification of candidate genes in rice for resistance to sheath blight disease by whole genome sequencing. Theor Appl Genet 2012, 124: 63–74. 10.1007/s00122-011-1687-4

    Article  CAS  PubMed  Google Scholar 

  • Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, Shen L, Donahue WF, Tusneem N, Stromberg M: Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res 2008, 18: 1638–1642. 10.1101/gr.077776.108

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Sohn JK, Yi GH, Oh BG, Lim SJ: Variation of some agronomic traits in anther-derived rice plants. Korean J Breed 1995, 27: 404–408.

    Google Scholar 

  • Sugimoto K, Miyake H, Takeoka Y: Genetic diversity of regeneration ability in anther culture of rice (Oryza sativa L.). Plant Prod Sci 2000, 3: 387–391. 10.1626/pps.3.387

    Article  Google Scholar 

  • The Rice Chromosomes 11 and 12 Sequencing Consortia: The sequence of rice chromosomes 11 and 12, rich in disease resistance genes and recent gene duplications. BMC Biol 2005, 3: 20.

    Article  PubMed Central  Google Scholar 

  • Xu X, Liu X, Ge S, Jensen JD, Hu F, Li X, Dong Y, Gutenkunst RN, Fang L, Huang L, Li J: Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes. Nat Biotechnol 2011, 30: 105–11. 10.1038/nbt.2050

    Article  CAS  PubMed  Google Scholar 

  • Yamamoto T, Nagasaki H, Yonemaru J, Ebana K, Nakajima M, Shibaya T, Yano M: Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms. BMC Genomics 2010, 11: 267. 10.1186/1471-2164-11-267

    Article  PubMed Central  PubMed  Google Scholar 

  • Yan J, Xue Q, Zhu J: Genetic studies of anther culture ability in rice (Oryza sativa). Plant Cell Tiss Organ Cult 1996, 45: 253–258. 10.1007/BF00043638

    Article  CAS  Google Scholar 

  • Yi GH, Nam MH, Oh BG, Choi HC, Kim SC, Sohn JK: Genetic behaviors of variants derived from rice cell culture. Korean J Breed 1999, 31: 280–285.

    Google Scholar 

  • Yu P, Wang C, Xu Q, Feng Y, Yuan X, Yu H, Wang Y, Tang S, Wei X: Detection of copy number variations in rice using array-based comparative genomic hybridization. BMC Genomics 2011, 12: 372. 10.1186/1471-2164-12-372

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Vikram P, Swamy BP, Dixit S, Ahmed HU, Teresa Sta Cruz M, Singh AK, Kumar A: qDTY1.1, a major QTL for rice grain yield under reproductive-stage drought stress with a consistent effect in multiple elite genetic backgrounds. BMC Genet 2011, 12: 89.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  • Zagorska NA, Shtereva LA, Kruleva MM, Sotirova VG, Baralieva DL, Dimitrov BD: Induced androgenesis in tomato (Lycopersicon esculentum Mill.). III. Characterization of the regenerants. Plant Cell Rep 2004, 22: 449–456. 10.1007/s00299-003-0720-8

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgments

This work was supported by grants from the Next Generation BG21 program (PJ008215) and the National Academy of Agricultural Science (PJ006817) of the Rural Development Administration, Republic of Korea. We thank Insilicogen, Inc. for help with the informatics analyses.

Author information

Corresponding author

Correspondence to Tae-Ho Kim.

Additional information

Competing interests

The authors declare no potential competing interests.

Authors’ contributions

TH and UH conceived the study and participated in its design. IS, GS and HS performed the experiments and their analysis. GS and HS prepared samples and were involved in the phenotyping. JH, IS and HJ ran the bioinformatic analysis tools. GA and CD analyzed the data and helped to draft the manuscript. IS wrote the paper. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Jeong, IS., Yoon, UH., Lee, GS. et al. SNP-based analysis of genetic diversity in anther-derived rice by whole genome sequencing. Rice 6, 6 (2013). https://doi.org/10.1186/1939-8433-6-6

  • DOI: https://doi.org/10.1186/1939-8433-6-6

Keywords