Acknowledgement
This study was supported by a grant (NRF KSN1824130) from the Korea Institute of Oriental Medicine.
References
- E. B. Lacerda and C. A. B. Mello, "Automatic classification of laryngeal mechanisms in singing based on the audio signal," Procedia Computer Science, vol. 112, pp. 2204-2212, Feb. 2017. DOI: 10.1016/j.procs.2017.08.115.
- A. Zysk and P. Badura, "An Approach for Vocal Register Recognition Based on Spectral Analysis of Singing," International Journal of Cognitive and Language Sciences, vol. 11, no. 2, pp. 207-212, Jan. 2017. DOI: 10.5281/zenodo.1128825.
- R. K. Shosted, "Vocalic context as a condition for nasal coda emergence: aerodynamic evidence," Journal of the International Phonetic Association, vol. 36, no. 1, pp. 39-58, May 2006. DOI: 10.1017/S0025100306002350.
- P. Fabre, "Percutaneous electric process registering glottic union during phonation: glottography at high frequency; first results," Bulletin de L'academie Nationale de Medecine, vol. 141, no. 3-4, pp. 66-69, Jan. 1957. DOI: 10.1007/BF02991550.
- V. Hampala, M. Garcia, J. G. Svec, R. C. Scherer, and C. T. Herbst, "Relationship between the electroglottographic signal and vocal fold contact area," Journal of Voice, vol. 30, no. 2, pp. 161-171, Mar. 2016. DOI: 10.1016/j.jvoice.2015.03.018.
- F. M. B. La and J. Sundberg, "Contact quotient versus closed quotient: a comparative study on professional male singers," Journal of Voice, vol. 29, no. 2, pp. 148-154, Mar. 2015. DOI: 10.1016/j.jvoice.2014.07.005.
- K. Verdolini, R. Chan, I. R. Titze, M. Hess, and W. Bierhals, "Correspondence of electroglottographic closed quotient to vocal fold impact stress in excised canine larynges," Journal of Voice, vol. 12, no. 4, pp. 415-423, Dec. 1998. DOI: 10.1016/S0892-1997(98)80050-7.
- J. Y. Lim, S. E. Lim, S. H. Choi, J. H. Kim, K. M. Kim, and H. S. Choi, "Clinical characteristics and voice analysis of patients with mutational dysphonia: clinical significance of diplophonia and closed quotients," Journal of Voice, vol. 21, no. 1, pp. 12-19, Jan. 2007. DOI: 10.1016/j.jvoice.2005.10.002.
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas: US, pp. 770-778, 2016. DOI: 10.1109/CVPR.2016.90.
- J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," in Proceedings of the 33rd Advances in Neural Information Processing Systems, Vancouver: CA, pp. 6840-6851, 2020. DOI: 10.48550/arXiv.2006.11239.
- D. Silver, A. Huang, C. J. Maddison, A. Guez, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, pp. 484-489, Jan. 2016. DOI: 10.1038/nature16961.
- T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, "Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor," in Proceedings of the International Conference on Machine Learning, Stockholm: SE, pp. 1861-1870, 2018. DOI: 10.48550/arXiv.1801.01290.
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," in Proceedings of the 30th Advances in Neural Information Processing Systems, Long Beach: US, pp. 6000-6010, 2017. DOI: 10.48550/arXiv.1706.03762.
- T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei, "Language models are few-shot learners," in Proceedings of the 33rd Advances in Neural Information Processing Systems, Vancouver: CA, pp. 1877-1901, 2020. DOI: 10.48550/arXiv.2005.14165.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, Nov. 1997. DOI: 10.1162/neco.1997.9.8.1735.
- J. Chen and X. Xue, "A transfer learning-based long short-term memory model for the prediction of river water temperature," Engineering Applications of Artificial Intelligence, vol. 133, pp. 108605, 2024. DOI: 10.1016/j.engappai.2024.108605.
- C. Qin, D. Qin, Q. Jiang, and B. Zhu, "Forecasting carbon price with attention mechanism and bidirectional long short-term memory network," Energy, vol. 299, pp. 131410, 2024. DOI: 10.1016/j.energy.2024.131410.
- R. Dey and F. M. Salem, "Gate-variants of gated recurrent unit (GRU) neural networks," in Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems, Boston: US, pp. 1597-1600, Aug. 2017. DOI: 10.1109/MWSCAS.2017.8053243.
- Y. Yevnin, S. Chorev, I. Dukan, and Y. Toledo, "Short-term wave forecasts using gated recurrent unit," Ocean Engineering, vol. 268, no. 15, pp. 113389, 2023. DOI: 10.1016/j.oceaneng.2022.113389.
- L. Zhang, J. Zhang, W. Gao, F. Bai, N. Li, and N. Ghadimi, "A deep learning outline aimed at prompt skin cancer detection utilizing gated recurrent unit networks and improved orca predation algorithm," Biomedical Signal Processing and Control, vol. 90, pp. 105858, 2024. DOI: 10.1016/j.bspc.2023.105858.
- E. Terhardt, G. Stoll, and M. Seewann, "Algorithm for extraction of pitch and pitch salience from complex tonal signals," The Journal of the Acoustical Society of America, vol. 71, no. 3, pp. 679-688, Mar. 1982. DOI: 10.1121/1.387544.
- D. Talkin and W. B. Kleijn, "A robust algorithm for pitch tracking (RAPT)," Speech Coding and Synthesis, pp. 497-518, 1995.
- D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, Dec. 2014. DOI: 10.48550/arXiv.1412.6980.
- C. Ittichaichareon, S. Suksri, and T. Yingthawornsuk, "Speech recognition using MFCC," in Proceedings of the International Conference on Computer Graphics, Simulation and Modeling, Pattaya: TH, vol. 9, pp. 135-138, Jul. 2012.
- M. Müller and S. Ewert, "Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features," in Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011.
- B. P. Das and R. Parekh, "Recognition of isolated words using features based on LPC, MFCC, ZCR and STE, with neural network classifiers," International Journal of Modern Engineering Research, vol. 2, no. 3, pp. 854-858, May-Jun. 2012.
- H. Patni, A. Jagtap, V. Bhoyar, and A. Gupta, "Speech emotion recognition using MFCC, GFCC, chromagram and RMSE features," in Proceedings of the 2021 8th International Conference on Signal Processing and Integrated Networks (SPIN), Noida: IN, Aug. 2021. DOI: 10.1109/SPIN52536.2021.9566046.
- L. Breiman, "Random forests," Machine Learning, vol. 45, pp. 5-32, 2001. DOI: 10.1023/A:1010933404324.
- J. Hu and S. Szymczak, "A review on longitudinal data analysis with random forests," Briefings in Bioinformatics, vol. 24, no. 2, pp. 1-11, 2023. DOI: 10.1093/bib/bbad002.
- T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco: US, pp. 785-794, Aug. 2016. DOI: 10.1145/2939672.2939785.