De novo Genome Assembly and Single Nucleotide Variations for Soybean Mosaic Virus Using Soybean Seed Transcriptome Data

  • Jo, Yeonhwa
  • Choi, Hoseong
  • Bae, Miah
  • Kim, Sang-Min
  • Kim, Sun-Lim
  • Lee, Bong Choon
  • Cho, Won Kyong
  • Kim, Kook-Hyung
  • Received: 2017.03.20
  • Accepted: 2017.06.27
  • Published: 2017.10.01


Soybean is the most important legume crop in the world. Several diseases cause serious yield losses in the major soybean-producing countries, and soybean can be infected by diverse viruses. Recently, we carried out a large-scale screen of available soybean transcriptome data to identify viruses infecting soybean. Among the screened transcriptomes, one generated for an analysis of soybean seed development contained several virus-associated sequences. In this study, we identified five viruses infecting soybean, including soybean mosaic virus (SMV), by de novo transcriptome assembly followed by BLAST searches. Using the transcriptome data, we assembled a nearly complete consensus genome sequence of SMV China. Phylogenetic analysis showed that the SMV China consensus genome was closely related to SMV isolates from South Korea. We examined single nucleotide variations (SNVs) of SMV in the soybean seed transcriptome and identified 780 SNVs, which were evenly distributed across the SMV genome. Four substitution types, C-U, U-C, A-G, and G-A, were identified most frequently. This result demonstrates quasispecies variation of the SMV genome. Taken together, this study used bioinformatics analyses of soybean transcriptome data to identify viruses, and demonstrated the application of such data to virus genome assembly and SNV analysis.
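The SNV summary described above (classifying each change by substitution type, such as C-U or A-G, and checking how evenly SNVs fall along the genome) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes SNVs have already been called (for example, parsed from a VCF file) and are supplied as hypothetical `(position, ref, alt)` tuples. The function names, the conversion of sequenced T to U for an RNA virus, and the equal-width windowing are all assumptions made for this sketch.

```python
from collections import Counter

# Assumption: reads come from cDNA, so a reported T corresponds to U in the
# RNA genome of SMV. Map both spellings onto RNA bases.
_TO_RNA = {"A": "A", "C": "C", "G": "G", "T": "U", "U": "U"}


def substitution_type(ref: str, alt: str) -> str:
    """Return a label such as 'C-U' for a single-nucleotide change."""
    ref_base, alt_base = _TO_RNA[ref.upper()], _TO_RNA[alt.upper()]
    if ref_base == alt_base:
        raise ValueError(f"not a substitution: {ref}>{alt}")
    return f"{ref_base}-{alt_base}"


def tally_substitutions(snvs):
    """Count SNVs by substitution type; snvs is an iterable of (pos, ref, alt)."""
    return Counter(substitution_type(ref, alt) for _, ref, alt in snvs)


def positional_histogram(snvs, genome_length: int, n_bins: int = 10):
    """Count SNVs in equal-width windows along a genome (1-based positions)."""
    counts = [0] * n_bins
    for pos, _, _ in snvs:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} outside genome")
        # Map positions 1..genome_length onto window indices 0..n_bins-1.
        counts[(pos - 1) * n_bins // genome_length] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy SNVs (position, reference base, alternative base);
    # these are NOT data from the study.
    toy = [(120, "C", "T"), (950, "T", "C"), (2400, "A", "G"),
           (5100, "G", "A"), (7300, "C", "T"), (9000, "A", "C")]
    print(tally_substitutions(toy).most_common())
    print(positional_histogram(toy, genome_length=10000, n_bins=5))
```

Equal-width windows are only one simple way to eyeball whether SNVs are evenly distributed; a formal test (for example, comparing window counts against a uniform expectation) would be a separate, additional step.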


de novo genome assembly; single nucleotide variation; soybean mosaic virus


Supported by: National Research Foundation of Korea (NRF), Rural Development Administration (RDA), Vegetable Breeding Research Center