Could Decimal-binary Vector be a Representative of DNA Sequence for Classification?

  • Sanjaya, Prima (Department of Ubiquitous IT, Dongseo University)
  • Kang, Dae-Ki (Department of Computer & Information Engineering, Dongseo University)
  • Received: 2016.07.04
  • Accepted: 2016.07.30
  • Published: 2016.09.30

Abstract

In recent years, the Deep Belief Network (DBN), a deep learning model formed by stacking restricted Boltzmann machines in a greedy fashion, has been widely used for classification and recognition. With its ability to extract features at a high level of abstraction and to handle high-dimensional data, this model has achieved outstanding results in image and speech recognition. In this research, we assess the applicability of deep learning to DNA sequence classification. Since the training phase of a DBN is computationally expensive, especially when it deals with DNA sequences with thousands of variables, we introduce a new encoding method that uses a decimal-binary vector to represent the sequence as input to the model, and we compare it with one-hot vector encoding on two datasets. We evaluated the proposed model with different contrastive algorithms and achieved a significant improvement in training speed with comparable classification results. This result shows the potential of using decimal-binary vectors with a DBN on DNA sequences to solve other sequence problems in bioinformatics.
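The abstract does not define the decimal-binary encoding, so the following is only a minimal sketch of one plausible reading: each nucleotide is assigned a decimal index and that index is written as a fixed-width binary number. The mapping (A=0, C=1, G=2, T=3), the 2-bit width, and the function names are assumptions for illustration, not the authors' published scheme. Under these assumptions, the sketch shows why such an encoding could shorten the input vector relative to one-hot encoding.

```python
# Hypothetical mappings: the abstract does not specify them.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}
DECIMAL = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed decimal index per base


def one_hot_encode(seq):
    """Concatenate a 4-bit one-hot vector for every nucleotide."""
    return [bit for base in seq.upper() for bit in ONE_HOT[base]]


def decimal_binary_encode(seq, width=2):
    """Write each nucleotide's assumed decimal index as `width` binary digits."""
    bits = []
    for base in seq.upper():
        code = DECIMAL[base]
        # format(2, "02b") gives "10"; each character becomes one input unit
        bits.extend(int(b) for b in format(code, f"0{width}b"))
    return bits


if __name__ == "__main__":
    seq = "ACGT"
    print(one_hot_encode(seq))         # 16 binary inputs for 4 bases
    print(decimal_binary_encode(seq))  # 8 binary inputs for 4 bases
```

With this mapping, a sequence of n nucleotides yields 4n input units under one-hot encoding and 2n under the decimal-binary sketch, so the visible layer of the first restricted Boltzmann machine would be half as wide. Symbols outside A, C, G, and T are not handled here and would raise a `KeyError`.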

Keywords

Bioinformatics; DNA Sequence Classification; Deep Learning; Deep Belief Network; Restricted Boltzmann Machine

Acknowledgement

Supported by: National Research Foundation of Korea (NRF)
