• Title/Summary/Keyword: parts-of-speech


Robust Speech Recognition in the Car Interior Environment having Car Noise and Audio Output (자동차 잡음 및 오디오 출력신호가 존재하는 자동차 실내 환경에서의 강인한 음성인식)

  • Park, Chul-Ho;Bae, Jae-Chul;Bae, Keun-Sung
    • MALSORI
    • /
    • no.62
    • /
    • pp.85-96
    • /
    • 2007
  • In this paper, we carried out recognition experiments on noisy speech containing various levels of car noise and audio-system output, using the proposed speech interface. The speech interface consists of three parts: pre-processing, an acoustic echo canceller, and post-processing. First, a high-pass filter is employed in the pre-processing stage to remove some of the engine noise. Then, an echo canceller implemented as an FIR filter with an NLMS adaptive algorithm is used to remove the music or speech coming from the audio system in the car. Finally, the MMSE-STSA based speech enhancement method is applied to the output of the echo canceller to further suppress the residual noise. For the recognition experiments, we generated test signals by adding music to the noisy car speech from the Aurora 2 database. An HTK-based continuous HMM system was constructed as the recognition system. Experimental results show that the proposed speech interface is very promising for robust speech recognition in a noisy car environment. (A minimal sketch of the NLMS echo-cancellation stage is given below.)

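The echo-cancellation stage described above is an FIR filter adapted with NLMS. A minimal numpy sketch of that update follows; the filter length, step size, and signal names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=256, mu=0.5, eps=1e-6):
    """Suppress the audio-system echo in the microphone signal with an
    NLMS-adapted FIR filter.  far_end is the reference signal sent to the
    car loudspeakers; mic is the microphone signal (speech + echo + noise).
    Returns the error signal, i.e. the echo-suppressed speech estimate."""
    w = np.zeros(filter_len)                  # FIR filter taps
    out = np.zeros(len(mic))
    for n in range(filter_len, len(mic)):
        x = far_end[n - filter_len:n][::-1]   # most recent reference samples
        y = np.dot(w, x)                      # echo estimate
        e = mic[n] - y                        # residual: near-end speech + noise
        w += (mu / (eps + np.dot(x, x))) * e * x   # normalized LMS update
        out[n] = e
    return out
```

In the pipeline described above, the output of this stage would then be passed to the MMSE-STSA enhancement step to remove the residual noise.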

On the Control of Energy Flow between the Connection Parts of Syllables for the Korean Multi-Syllabic Speech Synthesis in the Time Domain Using Mono-syllables as a Synthesis Unit (단음절 합성단위음을 사용한 시간영역에서의 한국어 다음절어 규칙합성을 위한 음절간 접속구간에서의 에너지 흐름 제어에 관한 연구)

  • 강찬희;김윤석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.24 no.9B
    • /
    • pp.1767-1774
    • /
    • 1999
  • This paper synthesizes Korean multi-syllabic words in the time domain using mono-syllables as the synthesis unit. In particular, it controls the shape of the speech energy flow across the connection region between syllables when mono-syllables are concatenated. To this end, the energy flow is controlled with prosody parameters extracted from the speech waveforms in the time domain, and experimental results are presented in which the energy flow is controlled by concatenation rules induced from the Korean syllable shapes in the connection regions. The experiments show that the energy-flow discontinuities produced by concatenating mono-syllables in the time domain are removed, and that the quality and naturalness of the synthesized speech are improved. (A generic smoothing sketch follows below.)

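The paper's induced concatenation rules are not reproduced in the abstract; as a generic stand-in, the sketch below smooths the energy flow at a syllable joint with an RMS-matched linear cross-fade. The overlap length and scaling rule are assumptions for illustration only.

```python
import numpy as np

def join_syllables(left, right, overlap=160):
    """Concatenate two mono-syllable waveforms with an energy-matched
    cross-fade over the connection region (overlap given in samples).
    This is a generic smoothing rule, not the paper's induced rules."""
    # scale the right syllable so its RMS in the overlap matches the left one
    rms_left = np.sqrt(np.mean(left[-overlap:] ** 2) + 1e-12)
    rms_right = np.sqrt(np.mean(right[:overlap] ** 2) + 1e-12)
    right = right * (rms_left / rms_right)
    # a linear cross-fade removes the energy discontinuity at the joint
    fade_out = np.linspace(1.0, 0.0, overlap)
    joint = left[-overlap:] * fade_out + right[:overlap] * (1.0 - fade_out)
    return np.concatenate([left[:-overlap], joint, right[overlap:]])
```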

Detection and Synthesis of Transition Parts of The Speech Signal

  • Kim, Moo-Young
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.3C
    • /
    • pp.234-239
    • /
    • 2008
  • For efficient coding and transmission, the speech signal can be classified into three distinct classes: voiced, unvoiced, and transition. At low bit rates below 4 kbit/s, conventional sinusoidal transform coders synthesize high-quality speech for the purely voiced and unvoiced classes, but not for the transition class. The transition class, which includes plosive sounds and abrupt voiced onsets, lacks periodicity and is therefore often classified and synthesized as unvoiced. In this paper, an efficient algorithm for transition-class detection is proposed, which shows superior detection performance not only for clean speech but also for noisy speech. For detected transition frames, phase information is transmitted instead of magnitude information for speech synthesis. A listening test showed that the proposed algorithm produces better speech quality than the conventional one. (An illustrative frame-classification heuristic is sketched below.)
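The abstract does not spell out the detection algorithm, so the sketch below is only an illustrative heuristic for the three-way split it describes: frames with an abrupt energy rise are labelled transition, and the remaining frames are split into voiced and unvoiced by zero-crossing rate. All thresholds are assumptions and presume a signal normalized to [-1, 1].

```python
import numpy as np

def classify_frames(x, frame_len=160, zcr_thresh=0.25, onset_ratio=4.0):
    """Toy per-frame voiced / unvoiced / transition labelling.
    A frame whose energy jumps sharply over the previous frame (plosive or
    abrupt voiced onset) is marked 'transition'; otherwise a zero-crossing
    rate test separates voiced from unvoiced frames."""
    labels, prev_energy = [], 1e-12
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        energy = np.mean(frame ** 2)
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
        if energy > onset_ratio * prev_energy and energy > 1e-4:
            labels.append('transition')
        elif zcr < zcr_thresh:
            labels.append('voiced')
        else:
            labels.append('unvoiced')
        prev_energy = energy
    return labels
```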

Noise Cancellation Algorithm of Bone Conduction Speech Signal using Feature of Noise in Separated Band (밴드 별 잡음 특징을 이용한 골전도 음성신호의 잡음 제거 알고리즘)

  • Lee, Jina;Lee, Gihyoun;Na, Sung Dae;Seong, Ki Woong;Cho, Jin Ho;Kim, Myoung Nam
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.2
    • /
    • pp.128-137
    • /
    • 2016
  • In mobile communication, the air-conduction (AC) speech signal has commonly been used, but it is easily affected by ambient noise in environments such as emergencies, military action, and rescue operations. To overcome this weakness of the AC speech signal, the bone-conduction (BC) speech signal has been used. The BC speech signal is transmitted through bone vibration, so it is less affected by background noise. In this paper, we propose a noise-cancellation algorithm for the BC speech signal that uses the noise characteristics of the decomposed bands. The proposed algorithm consists of three steps. First, the BC speech signal is divided into 17 bands using perceptual wavelet packet decomposition. Second, a threshold is computed from the noise characteristics of a short segment of each separated band and compared with the absolute average of each signal frame, so that the speech and noise parts are detected. Last, the detected noise parts are removed and the noise-suppressed bands are resynthesized. To confirm the efficiency of the proposed algorithm, we compared it with a conventional algorithm, and the proposed algorithm showed better performance. (A simplified band-wise gating sketch follows below.)
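As a rough sketch of the band-wise processing described above, the code below uses PyWavelets to decompose the signal into wavelet-packet bands, gates short segments in each band against a per-band threshold estimated from an assumed noise-only leading stretch, and resynthesizes. A uniform level-4 tree (16 bands) stands in for the paper's 17-band perceptual decomposition, and the threshold rule is an assumption.

```python
import numpy as np
import pywt

def bandwise_noise_gate(x, wavelet='db4', level=4, noise_len=512, seg_len=64):
    """Band-wise noise gating of a (bone-conduction) signal x.
    Each level-4 wavelet-packet band gets its own threshold, estimated from
    its leading coefficients, which are assumed to contain noise only."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode='symmetric', maxlevel=level)
    for node in wp.get_level(level, order='freq'):
        coeffs = node.data
        thr = 1.5 * np.mean(np.abs(coeffs[:noise_len]))     # per-band threshold
        for s in range(0, len(coeffs), seg_len):            # short-time gate
            if np.mean(np.abs(coeffs[s:s + seg_len])) < thr:
                coeffs[s:s + seg_len] = 0.0                  # treat segment as noise
        node.data = coeffs
    return wp.reconstruct(update=True)[:len(x)]             # resynthesize the bands
```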

Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model (혼합여기모델을 이용한 대역 확장된 음성신호의 음질 개선)

  • Choi Mu Yeol;Kim Hyung Soon
    • MALSORI
    • /
    • no.52
    • /
    • pp.133-144
    • /
    • 2004
  • The quality of narrowband speech can be enhanced by bandwidth extension technology. This paper proposes a mixed-excitation model and an energy-compensation method based on a Gaussian Mixture Model (GMM). First, we employ a mixed-excitation model that has both periodic and aperiodic characteristics in the frequency domain. We use a filter bank to extract periodicity features from the filtered signals and model them with a GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate more accurately for the energy difference between the narrowband speech and the reconstructed highband (or lowband) speech. Objective and subjective evaluations show that the quality of wideband speech reconstructed by the proposed method is superior to that of the conventional bandwidth-extension method. (A small GMM-based sketch of the acoustic-space split is given below.)

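The second step above splits the acoustic space into voiced and unvoiced regions before compensating the energy. The sketch below shows one way such a split could look with scikit-learn GMMs: one mixture per region and a posterior-weighted blend of two high-band gains. The features, gains, and training data are placeholders, not the paper's.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Placeholder per-frame narrowband features for the two regions of the
# acoustic space (random stand-ins; real features would be e.g. MFCCs).
voiced_feats = np.random.randn(1000, 12) + 1.0
unvoiced_feats = np.random.randn(1000, 12) - 1.0

gmm_voiced = GaussianMixture(n_components=4, covariance_type='diag',
                             random_state=0).fit(voiced_feats)
gmm_unvoiced = GaussianMixture(n_components=4, covariance_type='diag',
                               random_state=0).fit(unvoiced_feats)

def highband_gain(frame_feat, gain_voiced=0.8, gain_unvoiced=0.3):
    """Soft voiced/unvoiced decision from the two GMM log-likelihoods,
    then a posterior-weighted blend of two hypothetical high-band gains."""
    ll_v = gmm_voiced.score_samples(frame_feat.reshape(1, -1))[0]
    ll_u = gmm_unvoiced.score_samples(frame_feat.reshape(1, -1))[0]
    p_voiced = 1.0 / (1.0 + np.exp(ll_u - ll_v))   # posterior under equal priors
    return p_voiced * gain_voiced + (1.0 - p_voiced) * gain_unvoiced
```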

Implementation of a Single-chip Speech Recognizer Using the TMS320C2000 DSPs (TMS320C2000계열 DSP를 이용한 단일칩 음성인식기 구현)

  • Chung, Ik-Joo
    • Speech Sciences
    • /
    • v.14 no.4
    • /
    • pp.157-167
    • /
    • 2007
  • In this paper, we implemented a single-chip speech recognizer using the TMS320C2000 DSPs. For this implementation, we developed a very small speaker-dependent recognition engine based on dynamic time warping, which is especially suited to embedded systems where system resources are severely limited. We carried out several optimizations, including speed optimization by programming time-critical functions in assembly language, code-size optimization, and effective memory allocation. On the TMS320F2801 DSP, which has 12 Kbytes of SRAM and 32 Kbytes of flash ROM, the recognizer can recognize 10 commands. On the TMS320F2808 DSP, which has 36 Kbytes of SRAM and 128 Kbytes of flash ROM, it can additionally output the speech sound corresponding to the recognition result. The speech sounds for the responses, captured when the user trains the commands, are encoded using ADPCM and saved to flash ROM. The single-chip recognizer needs few parts other than the DSP itself and an op amp for amplifying the microphone output and anti-aliasing. Therefore, this recognizer can play a role similar to that of dedicated speech-recognition chips. (The DTW matching core is sketched below.)

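The recognition engine above is based on dynamic time warping over stored templates. The core of such a matcher fits in a few lines; the sketch below shows the standard DTW recursion in Python (the DSP implementation would be fixed-point C or assembly, and the feature type is left unspecified here).

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences of shape
    (frames, dims), the matching core of a small speaker-dependent
    command recognizer."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_feats, templates):
    """Return the command name whose stored template is closest under DTW."""
    return min(templates, key=lambda name: dtw_distance(test_feats, templates[name]))
```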

Focal Parts of Utterance in Busan Korean

  • Cho, Yong-Hyung
    • Speech Sciences
    • /
    • v.9 no.4
    • /
    • pp.149-163
    • /
    • 2002
  • Focal parts of an utterance can be determined by new/contrastive information, a focus particle, a contrastive topic marker, or a nominative case marker in Busan Korean. Among these factors, new or contrastive information is the most important element in determining the intonational nucleus of an utterance. However, unlike in Seoul Korean, when a focus particle, a topic marker, or a case marker contributes to the placement of the most prominent peak of an utterance, the peak falls on the noun to which it is attached. Moreover, the case marker -ga produces a more prominent pitch on the preceding noun than the topic marker -nun does when -ga is used emphatically or contrastively. This is one of the major obstacles for Busan Korean speakers in mastering natural and fluent Seoul Korean intonation, even when they use the standard written form of Seoul Korean in their speech.


Study about Windows System Control Using Gesture and Speech Recognition (제스처 및 음성 인식을 이용한 윈도우 시스템 제어에 관한 연구)

김주홍;진성일;이남호;이용범
    • Proceedings of the IEEK Conference
    • /
    • 1998.10a
    • /
    • pp.1289-1292
    • /
    • 1998
  • HCI (human-computer interface) technologies have often been implemented using a mouse, keyboard, and joystick. Because the mouse and keyboard can be used only in limited situations, more natural HCI methods such as speech-based and gesture-based methods have recently attracted wide attention. In this paper, we present a multi-modal input system to control the Windows system for practical use of a multimedia computer. Our multi-modal input system consists of three parts. The first is a virtual-hand mouse, which replaces mouse control with a set of gestures. The second is Windows control using speech recognition. The third is Windows control using gesture recognition. We introduce neural network and HMM methods to recognize speech and gestures. The outputs of the three parts interface directly to the CPU and through Windows.


Parts-based Feature Extraction of Speech Spectrum Using Non-Negative Matrix Factorization (Non-Negative Matrix Factorization을 이용한 음성 스펙트럼의 부분 특징 추출)

  • 박정원;김창근;허강인
    • Proceedings of the IEEK Conference
    • /
    • 2003.11a
    • /
    • pp.49-52
    • /
    • 2003
  • In this paper, we propose a new speech feature parameter using NMF (Non-Negative Matrix Factorization). NMF can represent multi-dimensional data through effective dimensionality reduction via matrix factorization under a non-negativity constraint, and the reduced data represent parts-based features of the input data. In this paper, we verify the usefulness of the NMF algorithm for speech feature extraction by applying a feature parameter obtained with NMF to the Mel-scaled filter-bank output. According to the recognition experiments, the proposed feature parameter outperforms the commonly used MFCC (mel-frequency cepstral coefficient) features in recognition performance. (A minimal NMF feature-extraction sketch follows below.)

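A minimal sketch of parts-based feature extraction in the spirit of the abstract, assuming librosa for the Mel-scaled filter-bank output and scikit-learn for NMF; the file name, number of Mel bands, and number of components are illustrative assumptions.

```python
import librosa
from sklearn.decomposition import NMF

# Mel-scaled filter-bank output of an utterance (file name is a placeholder).
y, sr = librosa.load('utterance.wav', sr=16000)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)   # (40 bands, frames)

# NMF factorizes the non-negative spectrogram into a spectral basis W of
# "parts" and an activation matrix H; the columns of H serve as the reduced,
# parts-based feature vectors per frame.
nmf = NMF(n_components=12, init='nndsvd', max_iter=400, random_state=0)
W = nmf.fit_transform(mel)     # (40, 12) spectral basis
H = nmf.components_            # (12, frames) frame-wise activations used as features
```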

Vocabulary Analyzer Based on CEFR-J Wordlist for Self-Reflection (VACSR) Version 2

  • Yukiko Ohashi;Noriaki Katagiri;Takao Oshikiri
    • Asia Pacific Journal of Corpus Research
    • /
    • v.4 no.2
    • /
    • pp.75-87
    • /
    • 2023
  • This paper presents a revised version of the vocabulary analyzer for self-reflection (VACSR), called VACSR v.2.0. The initial version of the VACSR automatically analyzes the occurrences and the level of vocabulary items in the transcribed texts, indicating the frequency, the unused vocabulary items, and those not belonging to either scale. However, it overlooked words with multiple parts of speech due to their identical headword representations. It also needed to provide more explanatory result tables from different corpora. VACSR v.2.0 overcomes the limitations of its predecessor. First, unlike VACSR v.1, VACSR v.2.0 distinguishes words that are different parts of speech by syntactic parsing using Stanza, an open-source Python library. It enables the categorization of the same lexical items with multiple parts of speech. Second, VACSR v.2.0 overcomes the limited clarity of VACSR v.1 by providing precise result output tables. The updated software compares the occurrence of vocabulary items included in classroom corpora for each level of the Common European Framework of Reference-Japan (CEFR-J) wordlist. A pilot study utilizing VACSR v.2.0 showed that, after converting two English classes taught by a preservice English teacher into corpora, the headwords used mostly corresponded to CEFR-J level A1. In practice, VACSR v.2.0 will promote users' reflection on their vocabulary usage and can be applied to teacher training.