Acknowledgement
This research was supported by the National Research Foundation of Korea under grant no. NRF-2021S1A5A2A03064795.