• Title/Summary/Keyword: 콘포머 (Conformer)


A Korean speech recognition based on conformer (콘포머 기반 한국어 음성인식)

  • Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.488-495 / 2021
  • We propose a speech recognition system based on the conformer. The conformer is a convolution-augmented transformer: it combines a transformer model, which captures global information, with a Convolutional Neural Network (CNN), which exploits local features effectively. The baseline system is a transformer-based speech recognizer with a Long Short-Term Memory (LSTM)-based language model. The proposed system replaces the transformer with a conformer and uses a transformer-based language model. Evaluated on the Electronics and Telecommunications Research Institute (ETRI) speech corpus in AI-Hub, the proposed system yields a Character Error Rate (CER) of 5.7 %, while the baseline system yields a CER of 11.8 %. Even when the speech corpus is extended to another AI-Hub domain, such as the NHNdiguest speech corpus, the proposed system performs robustly across both domains. These experiments validate the proposed system.
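The CER figures quoted in this abstract can be made concrete with a short sketch. The code below is not from the paper; it assumes the common definition of CER as character-level edit distance divided by the number of reference characters, and it assumes spaces are ignored before scoring (the paper's exact normalization is not stated here). The function names `edit_distance`, `cer`, and `relative_reduction` are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate; spaces are dropped (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def relative_reduction(baseline, proposed):
    """Relative error reduction of the proposed system over the baseline."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    # One substituted character out of four reference characters.
    print(cer("abcd", "abed"))
    # CER values reported in the abstract: 11.8 % (baseline), 5.7 % (proposed).
    print(round(relative_reduction(11.8, 5.7), 3))
```

Under these assumptions, going from 11.8 % to 5.7 % CER corresponds to a relative error reduction of roughly 52 %.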

Conformer-based Elderly Speech Recognition using Feature Fusion Module (피쳐 퓨전 모듈을 이용한 콘포머 기반의 노인 음성 인식)

  • Minsik Lee;Jihie Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.39-43 / 2023
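  • Automatic Speech Recognition (ASR) is a technology by which a computer converts human speech into text. ASR systems are used in a wide range of applications, including voice command and control, voice search, text transcription, and automatic speech translation. Despite the progress made in ASR, the difficulty of Elderly Speech Recognition (ESR) has not diminished. This study proposes an elderly speech recognition model based on the Conformer and a Feature Fusion Module (FFM). Training and evaluation use the VOTE400 (Voice Of The Elderly 400 Hours) dataset. The study is significant in that it presents a deep learning model for elderly speech recognition using the Conformer together with fused features, a combination that has rarely been explored. In addition, by achieving higher accuracy than the Conformer model alone, it contributes to research on deep learning models for elderly speech recognition.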


A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.359-366 / 2022
  • In this paper, we present a Korean menu-ordering sentence Text-to-Speech (TTS) system using a conformer-based FastSpeech2. The Conformer is a convolution-augmented transformer that was originally proposed for speech recognition. By combining two different structures, the Conformer extracts better local and global features. It comprises two half-step Feed Forward modules, one at the front and one at the end, sandwiching a Multi-Head Self-Attention module and a Convolution module. We introduce the Conformer to Korean TTS because it is known to work well in Korean speech recognition. To compare a transformer-based TTS model with a Conformer-based one, we train both FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used it to train our models. This corpus comprises not only general conversation but also menu-ordering conversation consisting mainly of loanwords, and it addresses the degradation of current Korean TTS models on loanwords. When waveforms were generated with Parallel WaveGAN, the Conformer-based FastSpeech2 achieved the better Mean Opinion Score (MOS) of 4.04. We confirm that model performance improved when the same structure was changed from transformer to Conformer in Korean TTS.
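The module ordering described in this abstract (a half-step Feed Forward module, then Multi-Head Self-Attention, then Convolution, then a second half-step Feed Forward module) can be sketched structurally. This is a toy illustration, not the authors' implementation: the submodules are passed in as plain callables over a list of floats, whereas real modules operate on sequences of frame features. The residual connections and the closing layer normalization follow the original Conformer design and are assumptions here, since the abstract does not spell them out. All names are hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Module = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum, used for residual connections."""
    return [x + y for x, y in zip(a, b)]


def scale(a: Vector, factor: float) -> Vector:
    """Multiply every element by a constant."""
    return [factor * x for x in a]


def conformer_block(x: Vector,
                    ffn_in: Module,
                    self_attention: Module,
                    convolution: Module,
                    ffn_out: Module,
                    layer_norm: Module) -> Vector:
    """Compose the four modules in the order described in the abstract."""
    x = add(x, scale(ffn_in(x), 0.5))   # first half-step feed-forward
    x = add(x, self_attention(x))       # multi-head self-attention (global context)
    x = add(x, convolution(x))          # convolution module (local context)
    x = add(x, scale(ffn_out(x), 0.5))  # second half-step feed-forward
    return layer_norm(x)                # final normalization (assumed)


if __name__ == "__main__":
    identity: Module = lambda v: v  # placeholder for every submodule
    print(conformer_block([1.0, 2.0], identity, identity,
                          identity, identity, identity))
```

With identity placeholders the block only exercises the residual wiring; swapping in real feed-forward, attention, and convolution layers would require a tensor library and is outside the scope of this sketch.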