• Title/Summary/Keyword: 음향음성학 (acoustic phonetics)


A New Morphological Analysis for the Spoken Language Translation System (음성언어 번역 시스템을 위한 새로운 형태소 분석)

  • 양승원;김재훈
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.4
    • /
    • pp.17-22
    • /
    • 1999
  • Integrating speech processing systems and a machine translation system into a spoken language translation system is difficult because each system uses its own data and basic processing unit. A common I/O unit shared by the whole system is therefore needed. In this paper, we propose the pseudo-morpheme as the interface between the speech processing systems and the language translation system, and we implement a morphological analysis system for pseudo-morphemes. A speech processing system using the pseudo-morpheme can obtain better results than systems using the phrase or the general morpheme, so the quality of the whole spoken language translation system can be improved. The analysis ratio of our implemented system is 98.9%, which is similar to that of common morphological analysis systems.


Design of Wideband Speech Coder Compatible with CS-ACELP (CS-ACELP와 호환성을 갖는 광대역 음성 부호화기 설계)

  • 김동주;이인성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.4
    • /
    • pp.52-57
    • /
    • 2000
  • In this paper, we design a 16 kbps speech coder that is compatible with the CS-ACELP algorithm (G.729). The speech signal is sampled at 16 kHz, divided into two narrowband signals by a QMF filterbank, and decimated to 8 kHz. The lower-band signal is encoded by CS-ACELP and the upper-band signal is encoded by the Adaptive Transform Coding (ATC) algorithm. At the receiver, the two band signals are synthesized by the CS-ACELP and ATC decoders, respectively. The reconstructed output is obtained by passing them through the QMF synthesis bank. The proposed wideband coder is evaluated against the ITU-T G.722 coder through a Mean Opinion Score (MOS) test.
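
The band-splitting step in this abstract can be illustrated with a minimal sketch. It assumes the simplest two-tap (Haar) QMF pair, which is almost certainly shorter than the filters used in the paper, and it omits the CS-ACELP and ATC coders entirely. The function names `qmf_analysis` and `qmf_synthesis` are hypothetical.

```python
import numpy as np

def qmf_analysis(x):
    """Split an even-length signal into lower and upper bands, each decimated by 2.

    Assumption: a two-tap Haar QMF pair, chosen only because it is the
    shortest filter pair that gives perfect reconstruction.
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # lower band (e.g. 0-4 kHz for 16 kHz input)
    high = (even - odd) / np.sqrt(2.0)  # upper band (e.g. 4-8 kHz for 16 kHz input)
    return low, high

def qmf_synthesis(low, high):
    """Recombine the two half-rate bands into a full-rate signal."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    out = np.empty(2 * len(low))
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return out
```

In a real coder each band would be quantized between analysis and synthesis; here the two bands are passed through unchanged, so the output equals the input.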


Audio Mixer Algorithm for Enhancing Speech Quality of Multi-party Audio Telephony (다자간 음성통화 품질 향상을 위한 오디오 믹서 알고리즘)

  • Ryu, Sang-Hyeon;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.6
    • /
    • pp.541-547
    • /
    • 2013
  • The speech quality of multi-party audio telephony among two, three or more participants is degraded by audio volume imbalance, audio volume saturation and increased noise level. To solve this issue, this paper proposes an advanced audio mixing algorithm for a software-based multi-point control unit. Our approach is based on a combined voice activity detection and gain control technique consisting of a set of algorithms that classify audio signals, estimate audio volumes, adjust gain factors and mix the audio signals of all channels. The proposed audio mixing algorithm is computationally efficient, delivers high-quality speech, and is suitable for use in any practical multi-party audio telephony.
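
The classify, estimate, adjust and mix pipeline named in this abstract might look roughly like the sketch below. It is not the paper's algorithm: the energy-threshold activity detector, the RMS-based volume estimate, the peak limiter, and the parameters `target_rms` and `vad_threshold` are all assumptions made for illustration.

```python
import numpy as np

def mix_frames(frames, target_rms=0.1, vad_threshold=0.01):
    """Mix one frame from each channel into a single output frame.

    frames: array of shape (channels, samples), values nominally in [-1, 1].
    Assumptions: activity is decided by an RMS threshold, and each active
    channel is scaled to the same target RMS before summation.
    """
    frames = np.asarray(frames, dtype=float)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))   # per-channel volume estimate
    active = rms > vad_threshold                  # crude voice activity decision
    gains = np.zeros(len(frames))
    gains[active] = target_rms / rms[active]      # equalize volume of active channels
    mixed = np.sum(gains[:, None] * frames, axis=0)
    peak = np.max(np.abs(mixed)) if mixed.size else 0.0
    if peak > 1.0:                                # avoid saturation of the mix
        mixed = mixed / peak
    return mixed, gains
```

Muting inactive channels keeps their background noise out of the mix, and equalizing the active channels addresses volume imbalance; a practical mixer would also smooth the gains across frames to avoid audible jumps.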

Transmission of Channel Error Information over Voice Packet (음성 패킷을 이용한 채널의 에러 정보 전달)

  • 박호종;차성호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.4
    • /
    • pp.394-400
    • /
    • 2002
  • In digital speech communications, the quality of service can be improved by a speech coding scheme that adapts to the error rate of voice packet transmission. However, current communication protocols in cellular and Internet communications do not provide a function for transmitting channel error information. To solve this problem, this paper proposes a new method for real-time transmission of channel error information, in which the channel error information is embedded in the voice packet. The proposed method utilizes the pulse positions of the codevector in the ACELP speech codec, which results in little degradation in speech quality and a low false alarm rate. Simulations with various speech data show that the proposed method meets the requirements for speech quality, detection rate, and false alarm rate.

Transformer-based transfer learning and multi-task learning for improving the performance of speech emotion recognition (음성감정인식 성능 향상을 위한 트랜스포머 기반 전이학습 및 다중작업학습)

  • Park, Sunchan;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.5
    • /
    • pp.515-522
    • /
    • 2021
  • It is hard to prepare sufficient training data for speech emotion recognition due to the difficulty of emotion labeling. In this paper, we apply transfer learning with large-scale training data for speech recognition on a transformer-based model to improve the performance of speech emotion recognition. In addition, we propose a method to utilize context information without decoding by multi-task learning with speech recognition. According to the speech emotion recognition experiments using the IEMOCAP dataset, our model achieves a weighted accuracy of 70.6 % and an unweighted accuracy of 71.6 %, which shows that the proposed method is effective in improving the performance of speech emotion recognition.
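
For readers unfamiliar with the two metrics reported in this abstract, the sketch below uses the definitions common in speech emotion recognition: weighted accuracy as the overall fraction of correct predictions, and unweighted accuracy as the mean of per-class recalls. That the paper uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
import numpy as np

def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: every utterance counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every emotion class counts equally."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

The two values diverge when classes are imbalanced: a model that always predicts the majority class can score a high weighted accuracy but a low unweighted accuracy.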

Voice Activity Detection Based on Discriminative Weight Training with Feedback (궤환구조를 가지는 변별적 가중치 학습에 기반한 음성검출기)

  • Kang, Sang-Ick;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.8
    • /
    • pp.443-449
    • /
    • 2008
  • One of the key issues in practical speech processing is achieving Voice Activity Detection (VAD) that is robust against background noise. Most statistical model-based approaches employ equally weighted likelihood ratios (LRs), an assumption that deviates from real observations. Furthermore, voice activity in adjacent frames is strongly correlated; in other words, the current frame is highly correlated with the previous frame. In this paper, we propose an effective VAD approach based on a minimum classification error (MCE) method, which differs from previous work in that different weights are assigned to both the likelihood ratios of the current frame and the decision statistic of the previous frame.
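
The decision rule with feedback described in this abstract can be sketched as follows. The sketch only shows how a weighted sum of current-frame log likelihood ratios might be combined with the previous frame's decision statistic; the MCE training that would produce the weights is not shown, and `weights`, `feedback_weight` and `threshold` are hypothetical fixed values.

```python
import numpy as np

def vad_decisions(log_lrs, weights, feedback_weight, threshold=0.0):
    """Frame-wise speech/non-speech decisions with a feedback term.

    log_lrs: array of shape (frames, bins) of per-bin log likelihood ratios.
    weights: per-bin weights (assumed given; the paper trains them via MCE).
    feedback_weight: weight on the previous frame's decision statistic.
    """
    log_lrs = np.asarray(log_lrs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    prev = 0.0
    stats = []
    for frame in log_lrs:
        stat = float(weights @ frame) + feedback_weight * prev
        stats.append(stat)
        prev = stat  # fed back into the next frame's statistic
    stats = np.array(stats)
    return stats > threshold, stats
```

With a positive feedback weight, a frame with weak evidence that follows a strongly voiced frame can still be classified as speech, which reflects the inter-frame correlation the abstract mentions.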

Prosodic Characteristics of Korean Distant Speech (한국어 원거리 음성의 운율적 특성)

  • Kim Sun-Hee;Kim Jong-Jin;Lee Sook-Hyang
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.3
    • /
    • pp.137-143
    • /
    • 2006
  • The aim of this paper is to investigate the prosodic characteristics of Korean distant speech. Four speakers (2 males and 2 females) produced 36 two-syllable words in both distant-talking and normal environments, totaling 288 spoken two-syllable words. The results showed that the ratios of the second syllable to the first syllable in vowel duration and vowel energy were significantly larger in the distant-talking environment than in the normal environment, and that the f0 range was also larger in the distant-talking environment. In addition, an 'HL%' contour boundary tone in the second syllable and/or an 'L+H' contour tone in the first syllable were used in the distant-talking environment.

Speech Intelligibility Analysis on the Laser Detected Sound of the Glass Windows (유리창의 레이저 탐지음에 대한 음성명료도 분석)

  • Kim, Seock-Hyun;Lee, Hyun-Woo;Kim, Hee-Dong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.2
    • /
    • pp.127-134
    • /
    • 2009
  • In this study, the possibility of laser eavesdropping is investigated on window glasses of various thicknesses. The glass windows are excited by a maximum length sequence (MLS) signal and the vibration sound is detected by a laser Doppler vibrometer. From the detected sound, speech intelligibility is objectively estimated. The speech transmission index (STI), which is based on the modulation transfer function (MTF), is calculated for the estimation. Finally, the effect of disturbing waves on speech intelligibility is analysed using an outside speaker and a window shaker attached to the glass window. The purpose of the study is to estimate the possibility of remote eavesdropping by the laser sensor and to evaluate the performance of the homemade window shaker in protecting against remote eavesdropping.
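
The relation between the MTF and the STI mentioned in this abstract can be sketched in simplified form. Each modulation transfer value m is converted to an apparent signal-to-noise ratio, clipped to ±15 dB, and mapped to a transmission index between 0 and 1. The sketch assumes a plain unweighted average and leaves out the octave-band weighting and masking corrections of the full standardized procedure, so it is not necessarily the computation used in the study.

```python
import numpy as np

def sti_from_mtf(m):
    """Simplified STI from modulation transfer values in (0, 1).

    Assumption: all values are averaged with equal weight; the standardized
    method applies octave-band weights and further corrections.
    """
    m = np.clip(np.asarray(m, dtype=float), 1e-6, 1.0 - 1e-6)  # avoid log(0) and 1/0
    snr_db = 10.0 * np.log10(m / (1.0 - m))   # apparent signal-to-noise ratio
    snr_db = np.clip(snr_db, -15.0, 15.0)     # limit to the +/-15 dB range
    ti = (snr_db + 15.0) / 30.0               # transmission index in [0, 1]
    return float(np.mean(ti))
```

A modulation transfer value of 0.5 corresponds to an apparent SNR of 0 dB and therefore to a transmission index of 0.5.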

A Probabilistic Combination Method of Minimum Statistics and Soft Decision for Robust Noise Power Estimation in Speech Enhancement (강인한 음성향상을 위한 Minimum Statistics와 Soft Decision의 확률적 결합의 새로운 잡음전력 추정기법)

  • Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.4
    • /
    • pp.153-158
    • /
    • 2007
  • This paper presents a new approach to noise estimation to improve speech enhancement in non-stationary noisy environments. The proposed method combines the two separate noise power estimates provided by minimum statistics (MS) for speech presence and soft decision (SD) for speech absence in accordance with the SAP (Speech Absence Probability) for each separate frequency bin. The performance of the proposed algorithm is evaluated by a subjective test under various noise environments and yields better results than the conventional MS-based or SD-based schemes.
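
One plausible reading of the probabilistic combination in this abstract is a per-bin convex mix of the two estimates, with the speech absence probability as the mixing weight. The sketch below shows that reading only; the exact combination rule in the paper may differ, and the function name is hypothetical.

```python
import numpy as np

def combine_noise_estimates(ms_estimate, sd_estimate, sap):
    """Blend two per-bin noise power estimates using the speech absence probability.

    Assumption: where speech is likely absent (SAP near 1) the soft-decision
    estimate dominates; where speech is likely present (SAP near 0) the
    minimum-statistics estimate dominates.
    """
    ms_estimate = np.asarray(ms_estimate, dtype=float)
    sd_estimate = np.asarray(sd_estimate, dtype=float)
    sap = np.clip(np.asarray(sap, dtype=float), 0.0, 1.0)
    return sap * sd_estimate + (1.0 - sap) * ms_estimate
```

For example, with an MS estimate of 1.0 and an SD estimate of 3.0 in a bin, an SAP of 0.5 yields a combined estimate of 2.0.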

User-Identification on WINDOWS Environment by Using the Speech (윈도우 환경에서 음성을 이용한 사용자 확인에 관한 연구)

  • 정종순;배재옥;배명진
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.5
    • /
    • pp.3-11
    • /
    • 1998
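  • In this paper, personal identity verification is performed using DTW in a multimedia environment such as Windows 95. Whereas the conventional method of identity verification takes a password entered from the keyboard, this paper uses speech. The main features of this paper are as follows. (1) The F1/F0 ratio is computed and used to update the reference to the most recent speech pattern. This is intended to minimize the degradation of the recognition rate over time. (2) A weighted cepstrum is used to maximize discrimination between speakers. That is, the weighted cepstrum finds the cepstral orders that are useful for each speaker and places weights on those orders, using the F-ratio value. In experiments with the proposed method, the recognition rate improved by more than 5% over the conventional DTW method. The results thus show the feasibility of using speech instead of a password in the Windows environment.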
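
As a rough illustration of DTW matching with per-coefficient weights, the sketch below computes a DTW distance between two sequences of cepstral vectors. The weights stand in for the F-ratio-based weighting the abstract describes; how the paper actually derives and applies them is not shown, and the local distance and step pattern used here are assumptions.

```python
import numpy as np

def dtw_distance(ref, test, weights=None):
    """DTW distance between two feature sequences of shape (frames, coeffs).

    weights: optional per-coefficient weights (hypothetical stand-in for
    F-ratio-derived weights); defaults to equal weighting.
    """
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    if weights is None:
        weights = np.ones(ref.shape[1])
    weights = np.asarray(weights, dtype=float)
    n, m = len(ref), len(test)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = ref[i - 1] - test[j - 1]
            local = np.sqrt(np.sum(weights * diff ** 2))  # weighted Euclidean distance
            cost[i, j] = local + min(cost[i - 1, j],      # stretch the test sequence
                                     cost[i, j - 1],      # stretch the reference
                                     cost[i - 1, j - 1])  # advance both
    return float(cost[n, m])
```

A verification system would accept the speaker when the distance to the enrolled reference falls below a threshold; choosing that threshold is outside the scope of this sketch.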
