Creating Songs Using Note Embedding and Bar Embedding and Quantitatively Evaluating Methods

  • Young-Bae Lee (Department of Future Convergence Consulting, Graduate School of Knowledge Service & Consulting, Hansung University);
  • Sung-Hoon Jung (School of Mechanical and Electronic Engineering, Hansung University)
  • Received : 2021.09.30
  • Accepted : 2021.10.18
  • Published : 2021.11.30

Abstract

To train an artificial neural network on an existing song and have it generate a new one, the song must first be converted, as a preprocessing step, into numerical data that the network can process, and one-hot encoding has been used for this purpose until now. In this paper, we proposed a note embedding method that uses the note as its basic unit and a bar embedding method that uses the bar as its basic unit, and compared their performance with that of the existing one-hot encoding. The comparison was based on a quantitative evaluation of which method generates songs more similar to those written by the composer, using quantitative evaluation metrics from the field of natural language processing. In this evaluation, songs generated with bar embedding scored best, followed by those generated with note embedding. This result is significant in that the proposed note embedding and bar embedding generate songs more similar to the composer's songs than the existing one-hot encoding does.
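
The preprocessing choice described above is easiest to see in code. The sketch below contrasts one-hot encoding of note tokens with a learned note-embedding lookup; it is a minimal illustration, not the authors' implementation, and the toy pitch/duration vocabulary, the embedding dimension of 16, and the use of PyTorch's nn.Embedding are all assumptions made for the example. A bar embedding follows the same lookup idea, with each whole bar (a group of notes) treated as a single token.

    # Minimal sketch (not the paper's code): one-hot encoding vs. a learned
    # note embedding over a toy vocabulary of pitch/duration note tokens.
    import torch
    import torch.nn as nn

    note_vocab = ["C4_8", "D4_8", "E4_4", "G4_4", "REST_8"]   # hypothetical tokens
    note_to_id = {tok: i for i, tok in enumerate(note_vocab)}

    song = ["C4_8", "D4_8", "E4_4", "G4_4"]                   # toy note sequence
    ids = torch.tensor([note_to_id[t] for t in song])         # integer indices

    # (a) One-hot encoding: sparse vectors whose size equals the vocabulary size.
    one_hot = nn.functional.one_hot(ids, num_classes=len(note_vocab)).float()

    # (b) Note embedding: dense, trainable vectors (dimension 16 is an assumption);
    #     these vectors are learned jointly with the song-generation network.
    embed = nn.Embedding(num_embeddings=len(note_vocab), embedding_dim=16)
    dense = embed(ids)

    print(one_hot.shape, dense.shape)   # torch.Size([4, 5]) torch.Size([4, 16])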

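The quantitative evaluation borrowed from natural language processing compares a generated note sequence with the composer's sequence much as machine-translation output is compared with a reference sentence. Below is a minimal sketch of that kind of comparison, assuming BLEU as the metric, NLTK's implementation, and the toy note tokens above; none of these specifics are prescribed by the paper.

    # Minimal sketch (not the paper's evaluation pipeline): BLEU between a
    # composer's note sequence and a generated one, treating notes like words.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = [["C4_8", "D4_8", "E4_4", "G4_4", "E4_4", "D4_8"]]  # composer (toy)
    generated =  ["C4_8", "D4_8", "E4_4", "E4_4", "G4_4", "D4_8"]   # model output (toy)

    # Smoothing avoids a zero score when higher-order n-grams have no matches.
    score = sentence_bleu(reference, generated,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.3f}")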

Keywords

Acknowledgement

This research was supported by a Hansung University academic research grant.
