• Title/Summary/Keyword: 화자 종속 방식

Search Result 18, Processing Time 0.064 seconds

Recognizing Five Emotional States Using Speech Signals (음성 신호를 이용한 화자의 5가지 감성 인식)

  • Kang Bong-Seok;Han Chul-Hee;Woo Kyoung-Ho;Yang Tae-Young;Lee Chungyong;Youn Dae-Hee
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.101-104
    • /
    • 1999
  • 본 논문에서는 음성 신호를 이용해서 화자의 감정을 인식하기 위해 3가지 시스템을 구축하고 이들의 성능을 비교해 보았다. 인식 대상으로 하는 감정은 기쁨, 슬픔, 화남, 두려움, 지루함, 평상시의 감정이고, 각 감정에 대한 감정 음성 데이터베이스를 직접 구축하였다. 피치와 에너지 정보를 감성 인식의 특징으로 이용하였고, 인식 알고리듬은 MLB(Maximum-Likelihood Bayes)분류기, NN(Nearest Neighbor)분류기 및 HMM(Hidden Markov Model)분류기를 이용하였다. 이 중 MLB 분류기와 NN 분류기에서는 특징벡터로 피치와 에너지의 평균과 표준편차, 최대값 등 통계적인 정보를 이용하였고, TMM 분류기에서는 각 프레임에서의 델타 피치와 델타델타 피치, 델타 에너지와 델타델타 에너지 등 시간적 정보를 이용하였다. 실험은 화자종속, 문장독립형 방식으로 하였고, 인식 실험 결과는 MLB를 이용해서 $68.9\%, NN을 이용해서 $66.7\%를 얻었고, HMM 분류기를 이용해서 $89.30\%를 얻었다.

  • PDF

Fast Algorithm for Recognition of Korean Isolated Words (한국어 고립단어인식을 위한 고속 알고리즘)

  • 남명우;박규홍;정상국;노승용
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.1
    • /
    • pp.50-55
    • /
    • 2001
  • This paper presents a korean isolated words recognition algorithm which used new endpoint detection method, auditory model, 2D-DCT and new distance measure. Advantages of the proposed algorithm are simple hardware construction and fast recognition time than conventional algorithms. For comparison with conventional algorithm, we used DTW method. At result, we got similar recognition rate for speaker dependent korean isolated words and better it for speaker independent korean isolated words. And recognition time of proposed algorithm was 200 times faster than DTW algorithm. Proposed algorithm had a good result in noise environments too.

  • PDF

A Study on Korean Phoneme Classification using Recursive Least-Square Algorithm (Recursive Least-Square 알고리즘을 이용한 한국어 음소분류에 관한 연구)

  • Kim, Hoe-Rin;Lee, Hwang-Su;Un, Jong-Gwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.6 no.3
    • /
    • pp.60-67
    • /
    • 1987
  • In this paper, a phoneme classification method for Korean speech recognition has been proposed and its performance has been studied. The phoneme classification has been done based on the phonemic features extracted by the prewindowed recursive least-square (PRLS) algorithm that is a kind of adaptive filter algorithms. Applying the PRLS algorithm to input speech signal, precise detection of phoneme boundaries has been made, Reference patterns of Korean phonemes have been generated by the ordinery vector quantization (VQ) of feature vectors obtained manualy from prototype regions of each phoneme. In order to obtain the performance of the proposed phoneme classification method, the method has been tested using spoken names of seven Korean cities which have eleven different consonants and eight different vowels. In the speaker-dependent phoneme classification, the accuracy is about $85\%$ considering simple phonemic rules of Korean language, while the accuracy of the speaker-independent case is far less than that of the speaker-dependent case.

  • PDF

Efficient TTS Database Compression Based on AMR-WB Speech Coder (AMR-WB 음성 부호화기를 이용한 TTS 데이터베이스의 효율적인 압축 기법)

  • Lim, jong-Wook;Kim, Ki-Chul;Kim, Kyeong-Sun;Lee, Hang-Seop;Park, Hae-Young;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3
    • /
    • pp.290-297
    • /
    • 2009
  • This paper presents an improved adaptive multi-rate wideband (AMR-WB) algorithm for the efficient Text-To-Speech (TTS) database compression. The proposed algorithm includes unnecessary common bit-stream (CBS) removal and parameter delta coding combined with speaker-dependent huffman coding to reduce the required bit-rate without any quality degradation. We also propose lossy coding schemes to produce the maximum bit-rate reduction with negligible quality degradation. The proposed lossless algorithm including CBS removal can reduce bit-rate by 12.40% without quality degradation compared with the 12.65 kbps AMR-WB mode. The proposed lossy algorithm can reduce bit-rate by 20.00% with 0.12 PESQ degradation.

A embodiment of mouse pointing system using 3-axis accelerometer and sound-recognition module (3축 가속도센서 및 음성인식 모듈을 이용한 마우스 포인팅 시스템의 구현)

  • Lee, Seung-Joon;Shin, Dong-Hwan;Kasno, Mohamad Afif B.;Kim, Joo-Woong;Park, Jin-Woo;Eom, Ki-Hwan
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2010.05a
    • /
    • pp.934-937
    • /
    • 2010
  • In this paper, we did pursue the embodiment of a mouse pointing system which help the handicapped and people of not familiar with using electronics use electronic devices easily. Speech Recognition and 3-axis acceleration sensors in conjunction with a headset, a new mouse pointing system is constructed. We used speaker dependent system module which are generating the BCD code by recognizing human voices because it has high recognition rate rather than speaker independent system. Head-set mouse system is organized by 3-axis accelerometer, sound recognition module and TMS320F2812 processor. The main controller, TMS320F2812 DSP-processor is communicated with main computer by using SCI communications. The system is operated by Visual Basic in PC.

  • PDF

Signal Processing for Speech Recognition in Noisy Environment (잡음 환경에서 음성 인식을 위한 신호처리)

  • Kim, Weon-Goo;Lim, Yong-Hoon;Cha, Il-Whan;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea
    • /
    • v.11 no.2
    • /
    • pp.73-84
    • /
    • 1992
  • This paper studies noise subtraction methods and distance measures for speech recognition in a noisy environment, and investigates noise robustness of the distance measures applied to the problem of isolated word recognition in white Gaussian and colored noise (vehicle noise) environments. Noise subtraction methods which can be used as a pre-processor for the speech recognition system, such as the spectral subtraction method, autocorrelation subtraction method, adaptive noise cancellation and acoustic beamforming are studied, and distance measures such and Log Likelihood Ratio ($d_{LLR}$), cepstral distance measure ($d_{CEP}$), weighted cepstral distance measure ($d_{WCEP}$), spectral slope distance measure ($d_{RPS}$) and cepstral projection distance measure ($d_{CP},\;d_{BCP},\;d_{WCP},\;d_{BWCP}$) are also investigated. Testing of the distance measures for speaker-dependent isolated word recognition in a noisy environment indicate that $d_{RPS}\;and\;d_{WCEP}$ which weigh higher order cepstral coefficients more heavily give considerable performance improvement over $d_{CEP}and\;d_{LLR}$. In addition, when no pre-emphasis is performed, the recognizer can maintain higher performance under high noise conditions.

  • PDF

The suppression of noise-induced speech distortions for speech recognition (음성인식을 위한 잡음하의 음성왜곡제거)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.35S no.12
    • /
    • pp.93-102
    • /
    • 1998
  • In noisy environments, human speech productions are influenced by noises(Lombard effect), and speech signals are contaminated. These distortions dramatically reduce the performance of speech recognition systems. This paper proposes a method of the Lombard effect compensation and noise suppression in order to improve speech recognition performance in noise environments. To estimate the intensity of the Lombard effect which is a nonlinear distortion depending on the ambient noise levels, speakers, and phonetic units, we formulate the measure of the Lombard effect level based on the acoustic speech signal, and the measure is used to compensate the Lombard effect. The distortions of speech under noisy environments are cancelled out as follows. First, spectral subtraction and band-pass filtering are used to cancel out noise. Second, energy nomalization is proposed to cancel out the variation of vocal intensity by the Lombard effect. Finally, the Lombard effect level controls the transform which converts Lombard speech cepstrum to clean speech cepstrum. The proposed method was validated on 50 korean word recognition. Average recognition rates were 82.6%, 95.7%, 97.6% with the proposed method, while 46.3%, 75.5%, 87.4% without any compensation at SNR 0, 10, 20 dB, respectively.

  • PDF

Types and Functions of English Hedges at a syntax-pragmatics Interface (통사화용의 접합면에서 본 영어 헤지표현의 유형과 기능)

  • Hong, Sungshim
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.1
    • /
    • pp.381-388
    • /
    • 2020
  • This paper discusses English Hedges or Hedging Expressions on the basis of their morphosyntactic-pragramatic properties within the perspective of sociolinguistics. The term, 'Hedges' for the past decades since Lakoff(1973), has received little attention from the English grammar circles such as morphosyntax and the generative grammar theories. This paper presents a more comprehensive approach to the identification, distributions, functions, and the morphosyntactic properties of English Hedges. The earlier research on English Hedges in the 70's show that hedges are metalinguistic or mitadiscourse expressions which constitute a means for executing Politeness strategy in pragmatics. Nonetheless, research from the interface of syntactic-pragmatics has been scarce. This article suggests a more complex body of English hedges that have not been extensively discussed in the literature. Additionally, their configurational domain is to be proposed as part of the PolP with [±hedged] above CP+ (or CP beyond). The ramifications of the current study are suggested in terms of comparative linguistics, EFL/ESL studies of English for global communication, and pragmatics-sensitive machine translation studies in the forseeable future.