• Title/Summary/Keyword: 음성분석 및 변환

Search Result 65, Processing Time 0.025 seconds

Real-time implementation of the 2.4kbps EHSX Speech Coder Using a $TMS320C6701^TM$ DSPCore ($TMS320C6701^TM$을 이용한 2.4kbps EHSX 음성 부호화기의 실시간 구현)

  • 양용호;이인성;권오주
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.7C
    • /
    • pp.962-970
    • /
    • 2004
  • This paper presents an efficient implementation of the 2.4 kbps EHSX(Enhanced Harmonic Stochastic Excitation) speech coder on a TMS320C6701$^{TM}$ floating-point digital signal processor. The EHSX speech codec is based on a harmonic and CELP(Code Excited Linear Prediction) modeling of the excitation signal respectively according to the frame characteristic such as a voiced speech and an unvoiced speech. In this paper, we represent the optimization methods to reduce the complexity for real-time implementation. The complexity in the filtering of a CELP algorithm that is the main part for the EHSX algorithm complexity can be reduced by converting program using floating-point variable to program using fixed-point variable. We also present the efficient optimization methods including the code allocation considering a DSP architecture and the low complexity algorithm of harmonic/pitch search in encoder part. Finally, we obtained the subjective quality of MOS 3.28 from speech quality test using the PESQ(perceptual evaluation of speech quality), ITU-T Recommendation P.862 and could get a goal of realtime operation of the EHSX codec.c.

Pitch Period Detection Algorithm Using Rotation Transform of AMDF (AMDF의 회전변환을 이용한 피치 주기 검출 알고리즘)

  • Seo, Hyun-Soo;Bae, Sang-Bum;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • v.9 no.2
    • /
    • pp.1019-1022
    • /
    • 2005
  • As recent information communication technology is rapidly developed, a lot of researches related to speech signal processing have been processed. So pitch period is applied as important factor to many application fields such as speech recognition, speaker identification, speech analysis and synthesis. Therefore, many algorithms related to pitch detection have been proposed in time domain and frequency domain and AMDF(average magnitude difference function) which is one of pitch detection algorithms in time domain chooses time interval from valley to valley as pitch period. But, in selection of valley point to detect pitch period, complexity of the algorithm is increased. So in this paper we proposed pitch detection algorithm using rotation transform of AMDF, that taking the global minimum valley point as pitch period and established a threshold about the phoneme in beginning portion, to exclude pitch period selection. and compared existing methods with proposed method through simulation.

  • PDF

Template-based Auto Social Magazine and Video Creation Service (템플릿 기반의 자동 소셜 매거진 및 영상 합성 서비스)

  • Lee, Jae-Won;Jang, Dal-Won;Kim, Mi-Ji;Kim, Ji-Su;Kim, Seo-Yul;Lee, Jong-Seol
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2019.06a
    • /
    • pp.129-132
    • /
    • 2019
  • 최근 자연어 처리 기술에 대한 중요도가 높아지고, 발전 속도가 빨라지면서, 산업 전반에 걸쳐 챗봇에 대한 수요가 증가하고 있다. 본 논문은 챗봇을 이용한 소셜 매거진 생성 및 배포, 그리고 이를 활용하여 사용자에게 텍스트를 음성으로 변환하여 동영상의 형태로 전달해 주는 시스템을 다루고 있다. 챗봇이 사용자 대화를 수집, 분석하여 상황에 맞는 키워드를 추출하고, 중복 콘텐츠 제거, 텍스트 요약 등 일련의 과정을 거쳐 소셜 매거진을 생성 및 배포하는 서비스와, 매거진의 각 콘텐츠를 구성하는 이미지, 텍스트 정보를 가지고 음성 합성, 자막 생성, 영상 효과 등을 이용하여 영상을 합성하는 서비스에 관한 것이다. 본 논문에서 제안한 시스템에 대한 성능은 실험을 통하여 검증하였다.

  • PDF

Speech Feature Extraction based on Spikegram for Phoneme Recognition (음소 인식을 위한 스파이크그램 기반의 음성 특성 추출 기술)

  • Han, Seokhyeon;Kim, Jaewon;An, Soonho;Shin, Seonghyeon;Park, Hochong
    • Journal of Broadcast Engineering
    • /
    • v.24 no.5
    • /
    • pp.735-742
    • /
    • 2019
  • In this paper, we propose a method of extracting speech features for phoneme recognition based on spikegram. The Fourier-transform-based features are widely used in phoneme recognition, but they are not extracted in a biologically plausible way and cannot have high temporal resolution due to the frame-based operation. For better phoneme recognition, therefore, it is desirable to have a new method of extracting speech features, which analyzes speech signal in high temporal resolution following the model of human auditory system. In this paper, we analyze speech signal based on a spikegram that models feature extraction and transmission in auditory system, and then propose a method of feature extraction from the spikegram for phoneme recognition. We evaluate the performance of proposed features by using a DNN-based phoneme recognizer and confirm that the proposed features provide better performance than the Fourier-transform-based features for short-length phonemes. From this result, we can verify the feasibility of new speech features extracted based on auditory model for phoneme recognition.

Pitch Period Detection Algorithm Using Modified AMDF (변형된 AMDF를 이용한 피치 주기 검출 알고리즘)

  • Seo Hyun-Soo;Bae Sang-Bum;Kim Nam-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.1
    • /
    • pp.23-28
    • /
    • 2006
  • Pitch period that is a important factor in speech signal processing is used in various applications such as speech recognition, speaker identification, speech analysis and synthesis. So many pitch detection algorithms have been studied until now. AMDF which is one of pitch period detection algorithms chooses the time interval from valley point to valley point as pitch period. In selection of valley point to detect pitch period, complexity of the algorithm is increased. So in this paper we proposed the simple algorithm using rotation transform of AMDF that detects global minimum valley point as pitch period of speech signal and compared it with existing methods through simulation.

Moderating Effects of User Gender and AI Voice on the Emotional Satisfaction of Users When Interacting with a Voice User Interface (음성 인터페이스와의 상호작용에서 AI 음성이 성별에 따른 사용자의 감성 만족도에 미치는 영향)

  • Shin, Jong-Gyu;Kang, Jun-Mo;Park, Yeong-Jin;Kim, Sang-Ho
    • Science of Emotion and Sensibility
    • /
    • v.25 no.3
    • /
    • pp.127-134
    • /
    • 2022
  • This study sought to identify the voice user interface (VUI) design parameters that evoked positive user emotions. Six VUI design parameters that could affect emotional user satisfaction were considered. The moderating effects of user gender and the design parameters were analyzed to determine the appropriate conditions for user satisfaction when interacting with the VUI. An interactive VUI system that could modify the six parameters was implemented using the Wizard of OZ experimental method. User emotions were assessed from the users' facial expression data, which was then converted into a valence score. The frequency analysis and chi-square test found that there were statistically significant moderating gender and AI effects. These results implied that it is beneficial to consider the users' gender when designing voice-based interactions. Adult/male/high-tone voices for males and adult/female/mid-tone voices for females are recommended as general guidelines for future VUI designs. Future analyses that consider various human factors will be able to more delicately assess human-AI interactions from a UX perspective.

Implementation & Usability Evaluation of Math Expression Reader for Domestic Reading Disables (국내 독서장애인을 위한 Math Expression Reader의 구현 및 사용성 평가)

  • Lee, Jae-Hwa;Lee, Jong-Woo;Lim, Soon-Bum
    • Journal of Korea Multimedia Society
    • /
    • v.15 no.7
    • /
    • pp.951-961
    • /
    • 2012
  • E-books produced in the country provide limited audio service for reading disables. The reason is that those books cannot translate the mathematical expressions and symbols in the context. In this paper, the 'Math Expression Reader' was implemented that can translate the expressions and symbols in the document into Korean speech for those who have reading disabilities. The math to speech generated by this program has been tested to both the public and reading disables and the results of this test has been compared whether they can exactly understand the speech and evaluated the reading rules.

Tree-based Modeling of Prosodic Phrasing and Segmental Duration (운율구 추출 및 음소 지속 시간의 트리 기반 모델링)

  • 이상호;오영환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.6
    • /
    • pp.43-53
    • /
    • 1998
  • 본 논문에서는 한국어 TTS시스템을 위한 운율구 추출, 운율구 사이의 휴지 기간, 음소의 지속 시간 모델링 방법을 설명한다. 실험을 위해 여러 장르로 구성된 400문장을 선 정하고, 이를 전문 여성 아나운서가 발성하였다. 녹음된 음성 신호에 대해 음소 및 운율구 경계를 결정하고, 문장에 대해서는 형태소 분석, 발음표기 변환, 구문 분석을 수행하였다. 400문장(약33분) 중 240문장(약20분)을 이용하여 결정 트리 및 회귀 트리를 학습시킨 후, 160분장(약13분)에 대해 실험하였다. 운율 모델링을 위한 특징들이 제안되었고, 학습된 트리 들을 해석함으로써 특징들의 유효성이 평가되었다. 실험 문장에 대해 운율구 경계의 유무를 결정하는 결정 트리의 오류율은 14.46%이었고, 운율구 사이의 휴지 기간과 음소 지속 시간 을 예측하기 위한 회귀 트리들의 평균 제곱 오류근(RMSE)이 각각 132msec, 22msec이었다. 수집된 모든 자료(400문장)로 학습한 결과, 운율구 경계 결정 오류율, 휴지 기간 및 지속시 간 RMSE의 10-fold cross-validation 추정치가 각각 13.77%, 127.91msec, 21.54msec이었다.

  • PDF

Restoration for Speech Records Managed by the National Archives of Korea (국가기록원 음성 기록물의 복원과 분석)

  • Oh, Sejin;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.3
    • /
    • pp.269-278
    • /
    • 2013
  • The speech recording of the National Archives of Korea contains very important traces which represent modern times of Korea. But the way to be recorded by analogue is easily contaminated as time goes by. So it has to be digitalized for management and services. Consequently, restoration method of distorted speech is needed. We propose the four classes for each distortion kind and apply restoration algorithms for the cases of speech level, stationary noise and abrupt noise. As a result, speech volume adjusts to -26 dBov for only on the speech region and SNR improves above 10dB. Especially, conventional way to remove the noise is almost impossible because we need to listen to all of them but it can be more effective by adaptation of auto restoration algorithm.

Automatic Dubbing System for Remote Personalized Basketball Feedback Video (원격 개인 농구 기술 피드백 영상 자동 더빙 시스템)

  • Jong-Uk Lim;Ray Kim;Young Yoon
    • Annual Conference of KIPS
    • /
    • 2024.05a
    • /
    • pp.466-467
    • /
    • 2024
  • 본 논문은 전문 스킬 트레이너들의 개인 농구 기술 분석 및 피드백 영상에 더빙을 자동으로 적용하는 시스템을 제안한다. 이 시스템은 농구 용어집 기반 번역, 음성-텍스트 변환 모델 간의 비교 분석, 영상과 더빙 트랙 동기화 알고리즘을 통해 다양한 언어로의 신속한 자동 번역과 더빙을 가능하게 함으로써 선수와 코치 간의 언어 장벽 없는 소통을 지원한다. 본 연구는 자동 더빙 기술에 힘입어 원격 농구 교육 효율성과 질의 재고 및 저변 확산에 기여하고자 한다.