A Study on DNN-based STT Error Correction

  • Received : 2023.10.18
  • Accepted : 2023.10.29
  • Published : 2023.12.31

Abstract

This study presents a speech recognition error correction system that detects and corrects speech-to-text (STT) errors before natural language processing, with the aim of raising the success rate of intent analysis efficiently across various service domains. An encoder is constructed to embed each correct utterance token, together with the one or more error utterance tokens corresponding to it, so that they all lie close to one another, with similar vector values, in a dense vector space. An error detector then detects utterance tokens that lie within a preset Manhattan distance of a correct utterance token in that space. Each detected error utterance token is corrected by extracting the correct utterance token closest to it in Manhattan distance as the correct answer.
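
The detection and correction step described above can be illustrated with a minimal sketch. It assumes the encoder has already produced the embeddings: the two-dimensional vectors, the tokens, the threshold value, and the names `manhattan` and `correct_token` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, Optional, Sequence

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(x - y) for x, y in zip(a, b))


def correct_token(
    embedding: Vector,
    correct_embeddings: Dict[str, Vector],
    threshold: float,
) -> Optional[str]:
    """Return the nearest correct token if it lies within `threshold`.

    Returns None when no correct token is close enough, i.e. the
    utterance token is not treated as a known error variant.
    """
    best_token: Optional[str] = None
    best_dist = float("inf")
    for token, reference in correct_embeddings.items():
        dist = manhattan(embedding, reference)
        if dist < best_dist:
            best_token, best_dist = token, dist
    return best_token if best_dist <= threshold else None


# Hypothetical embeddings of correct utterance tokens (assumed encoder output).
CORRECT_EMBEDDINGS: Dict[str, Vector] = {
    "weather": [0.0, 0.0],
    "music": [4.0, 4.0],
}

# Hypothetical preset Manhattan distance used by the error detector.
THRESHOLD = 1.0

if __name__ == "__main__":
    # Assumed embedding of a misrecognized token that lands near "weather".
    print(correct_token([0.5, -0.25], CORRECT_EMBEDDINGS, THRESHOLD))
    # A token far from every correct token is left uncorrected.
    print(correct_token([2.0, 2.0], CORRECT_EMBEDDINGS, THRESHOLD))
```

In this sketch the first call yields `weather`, since the distance is 0.75, and the second yields `None`, since both candidates are at distance 4.0. How the encoder is trained so that error tokens land near their correct token is outside the scope of the example.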
