• Title/Summary/Keyword: Non-speech

Performance analysis of speaker verification system adopting the ACHARF ANC (ACHARF ANC를 채용한 화자인증시스템의 성능분석)

  • Lee Hyun Seung; Choi Hong Sub; Shin Yoon Ki
    • Proceedings of the KSPS conference / 2002.11a / pp.179-182 / 2002
  • The development of noise-robust speech processing systems is becoming increasingly important as speech technology is now widely used in real-world applications. Recently, adaptive noise cancellers (ANCs), which are based on adaptive filters, have frequently been used to resolve such noise problems. Adaptive recursive filters perform better than adaptive non-recursive filters because of their added poles, but their stability can be severely threatened. These problems of adaptive recursive filters were solved by the ACHARF algorithm. This paper presents a method that combines a speaker verification system with an ANC (Adaptive Noise Canceller) using the ACHARF algorithm. In the front-end stage, the ANC suppresses the additive noise imposed on the speech signal. The results show that the speaker verification system performs better with the ANC front end than without it.
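The abstract does not describe the ACHARF algorithm itself, so the sketch below shows only the conventional structure it builds on: a non-recursive (FIR) adaptive noise canceller trained with the LMS rule. It assumes a primary microphone carrying speech plus noise and a reference microphone carrying noise correlated with it; `lms_anc`, `num_taps`, `mu` and the synthetic signals are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def lms_anc(primary, reference, num_taps=8, mu=0.01):
    """Non-recursive (FIR) adaptive noise canceller with the LMS update.

    primary:   speech + noise picked up by the main microphone
    reference: noise-only signal correlated with the noise in `primary`
    Returns the error signal, which serves as the enhanced speech estimate.
    """
    w = np.zeros(num_taps)          # adaptive FIR weights
    buf = np.zeros(num_taps)        # latest reference samples, newest first
    enhanced = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        noise_est = w @ buf               # filter output: estimated noise
        e = primary[n] - noise_est        # error = speech estimate
        w += 2 * mu * e * buf             # LMS weight update
        enhanced[n] = e
    return enhanced


# Synthetic usage (assumed setup): a sinusoid stands in for speech.
rng = np.random.default_rng(0)
n = 5000
t = np.arange(n)
speech = 0.5 * np.sin(2 * np.pi * 0.01 * t)
ref_noise = rng.standard_normal(n)
# Noise at the primary mic is a filtered version of the reference noise.
primary_noise = 0.8 * ref_noise + 0.3 * np.concatenate(([0.0], ref_noise[:-1]))
enhanced = lms_anc(speech + primary_noise, ref_noise)
```

An adaptive recursive (IIR) canceller would also feed back past outputs; that feedback introduces the poles, and with them the stability risk, that the abstract says ACHARF addresses.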

Acoustic Evidence for the Development of Aspiration Feature in Putonghua Stops

  • Han, Ji-Yeon
    • Speech Sciences / v.12 no.3 / pp.201-209 / 2005
  • This study investigated the development of temporal features in Putonghua-speaking children. A total of 212 children in Shanghai, aged between 2;6 and 6;5, participated. Speech materials were constructed according to the aspiration feature of Putonghua stops, and six words were selected. Voice onset time (VOT) was measured, and non-parametric procedures were used for all analyses. For each age group, VOT differed significantly between aspirated and unaspirated stops at the bilabial, alveolar, and velar places of articulation. The effect of age was significant for unaspirated stops. Each of the Putonghua stops showed a decreasing mean and standard deviation of VOT. An overshoot phenomenon in VOT was apparent from age group 2;6-2;11 to age group 4;6-4;11. There was high variability in the production of lag time for aspirated stops.

Implementation of Korean TTS System based on Natural Language Processing (자연어 처리 기반 한국어 TTS 시스템 구현)

  • Kim Byeongchang; Lee Gary Geunbae
    • MALSORI / no.46 / pp.51-64 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from text using natural language processing. Robust preprocessing of non-Korean characters is also required. In this paper, we analyzed Korean text using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. For unlimited-vocabulary Korean TTS, we present a new grapheme-to-phoneme conversion method for Korean that uses a hybrid of a phonetic pattern dictionary and CCV (consonant vowel) LTS (letter-to-sound) rules. We constructed a prosody model using a probabilistic method and a decision tree-based method. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness, so we adopted tree-based error correction to overcome these training data limitations.
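The hybrid idea, dictionary first and rules as a fallback, can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' CCV LTS rule set: `g2p` and its `pattern_dict` argument are hypothetical, only one Korean phonological rule is implemented (liaison of a final consonant into a following null onset ㅇ, e.g. 음악 → [으막]), and non-Hangul characters are passed through unchanged. Syllables are split and rebuilt with the standard Unicode Hangul arithmetic, syllable = 0xAC00 + (initial × 21 + medial) × 28 + final.

```python
# Compatibility jamo listed in Unicode Hangul-syllable index order.
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch):
    """Split a precomposed Hangul syllable into [initial, medial, final]."""
    idx = ord(ch) - 0xAC00
    if not 0 <= idx < 11172:
        return None  # not a Hangul syllable
    return [INITIALS[idx // 588], MEDIALS[(idx % 588) // 28], FINALS[idx % 28]]


def compose(initial, medial, final):
    """Rebuild a syllable: 0xAC00 + (initial * 21 + medial) * 28 + final."""
    code = (INITIALS.index(initial) * 21 + MEDIALS.index(medial)) * 28 + FINALS.index(final)
    return chr(0xAC00 + code)


def g2p(word, pattern_dict):
    """Hypothetical hybrid G2P: dictionary lookup first, then rule fallback."""
    # 1) Phonetic pattern dictionary: listed exceptions are returned verbatim.
    if word in pattern_dict:
        return pattern_dict[word]
    # 2) Letter-to-sound fallback; non-Hangul characters stay plain strings.
    sylls = [decompose(ch) or ch for ch in word]
    for cur, nxt in zip(sylls, sylls[1:]):
        if isinstance(cur, list) and isinstance(nxt, list):
            final = cur[2]
            # Liaison: a simple final (not ㅇ or ㅎ) moves into a null onset.
            if final in INITIALS and final not in ("ㅇ", "ㅎ") and nxt[0] == "ㅇ":
                nxt[0], cur[2] = final, ""
    return "".join(compose(*s) if isinstance(s, list) else s for s in sylls)


print(g2p("음악", {}))      # 으막
print(g2p("TTS 음악", {}))  # TTS 으막
```

Putting the dictionary first lets irregular pronunciations override the rules without complicating them, which is the usual motivation for this kind of hybrid.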

Rao-Blackwellized Particle Filtering for Sequential Speech Enhancement (Rao-Blackwellized particle filter를 이용한 순차적 음성 강조)

  • Park Sun-Ho; Choi Seun-Jin
    • Proceedings of the Korean Information Science Society Conference / 2006.06b / pp.151-153 / 2006
  • We present a method of sequential speech enhancement in which the clean speech signal is inferred from a noise-contaminated observed signal using a Rao-Blackwellized particle filter (RBPF). In contrast to Kalman filtering-based methods, we consider a non-Gaussian speech generative model based on the generalized auto-regressive (GAR) model. Model parameters are learned by sequential Newton-Raphson expectation maximization (SNEM), incorporating the RBPF. An empirical comparison with the Kalman filter confirms the high performance of the proposed method.
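To make the Rao-Blackwellization step concrete, here is a deliberately simplified sketch, not the authors' method: it replaces the non-Gaussian GAR model with a Gaussian AR(1) speech model whose coefficient drifts as a random walk, and it uses fixed, known noise variances instead of SNEM learning. Each particle samples the AR coefficient, while the clean-speech sample, which is linear-Gaussian given that coefficient, is integrated out exactly with a per-particle Kalman filter. The name `rbpf_ar1` and all parameter values are illustrative assumptions.

```python
import numpy as np


def rbpf_ar1(y, num_particles=200, q=0.1, r=1.0, coef_step=0.01, seed=0):
    """RBPF for x_t = a_t x_{t-1} + v_t, y_t = x_t + w_t, a_t = a_{t-1} + e_t.

    q, r: process and observation noise variances (assumed known).
    Returns the filtered (MMSE) estimate of the clean signal x_t.
    """
    rng = np.random.default_rng(seed)
    a = rng.uniform(-0.99, 0.99, num_particles)  # sampled AR coefficients
    m = np.zeros(num_particles)                  # Kalman means of x_t
    P = np.ones(num_particles)                   # Kalman variances of x_t
    x_hat = np.empty(len(y))
    for t, yt in enumerate(y):
        # 1) Propagate the sampled (nonlinear) part: random walk on a_t.
        a = np.clip(a + coef_step * rng.standard_normal(num_particles), -0.99, 0.99)
        # 2) Kalman prediction of x_t, conditioned on each particle's a_t.
        m_pred = a * m
        P_pred = a ** 2 * P + q
        # 3) Weight by the predictive likelihood N(y_t; m_pred, P_pred + r).
        S = P_pred + r
        log_w = -0.5 * (np.log(2 * np.pi * S) + (yt - m_pred) ** 2 / S)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # 4) Exact Kalman update of the marginalized (linear-Gaussian) part.
        K = P_pred / S
        m = m_pred + K * (yt - m_pred)
        P = (1.0 - K) * P_pred
        x_hat[t] = np.sum(w * m)
        # 5) Multinomial resampling of the particle set.
        idx = rng.choice(num_particles, size=num_particles, p=w)
        a, m, P = a[idx], m[idx], P[idx]
    return x_hat


# Synthetic check (assumed setup): AR(1) "speech" with a = 0.95 in white noise.
rng = np.random.default_rng(1)
T = 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.95 * x[t - 1] + np.sqrt(0.1) * rng.standard_normal()
y = x + rng.standard_normal(T)
x_hat = rbpf_ar1(y)
```

Marginalizing the linear part analytically means the particles only have to cover the low-dimensional coefficient space, which is the variance-reduction argument behind Rao-Blackwellization.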

Effects of base token for stimuli manipulation on the perception of Korean stops among native and non-native listeners

  • Oh, Eunjin
    • Phonetics and Speech Sciences / v.12 no.1 / pp.43-50 / 2020
  • This study investigated whether listeners' perceptual patterns varied according to the base token selected for stimulus manipulation. Voice onset time (VOT) and fundamental frequency (F0) values were orthogonally manipulated, each in seven steps, using naturally produced words containing a lenis (/kan/) and an aspirated (/khan/) stop in Seoul Korean. Both the native and non-native groups gave significantly more aspirated responses to the stimuli constructed from /khan/, evidencing the use of minor cues left in the stimuli after manipulation. For the native group, the use of the VOT and F0 cues in stop categorization did not differ depending on whether the base token contained the lenis or the aspirated stop. This indicates that the results of previous studies, which investigated the relative importance of the acoustic cues in native listeners' perception of the Korean stop contrasts using a single base token for stimulus manipulation, remain tenable. For the non-native group, the pattern of F0 cue use differed as a function of the base token selected. Some findings indicated that listeners used alternative cues to identify the stop contrast when the major cues sounded ambiguous. The non-native group's use of the manipulated VOT and F0 cues was not native-like, suggesting that non-native listeners may have perceived the minor cues as stable in the context of the manipulated cue combinations.
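The orthogonal design crosses every VOT step with every F0 step, giving 7 × 7 = 49 stimuli per base token and 98 across the two base tokens. A minimal sketch of that bookkeeping follows; the endpoint values are placeholders chosen for illustration, not the values used in the study.

```python
import itertools

import numpy as np

# Placeholder continua: seven equally spaced steps per cue (assumed endpoints).
VOT_STEPS_MS = np.linspace(20.0, 80.0, 7)
F0_STEPS_HZ = np.linspace(150.0, 250.0, 7)
BASE_TOKENS = ["kan", "khan"]  # lenis and aspirated base words

# Orthogonal manipulation: every VOT step paired with every F0 step.
stimuli = [
    {"base": base, "vot_ms": float(vot), "f0_hz": float(f0)}
    for base, vot, f0 in itertools.product(BASE_TOKENS, VOT_STEPS_MS, F0_STEPS_HZ)
]
print(len(stimuli))  # 98
```

Because the same 49 cue combinations are built on each base token, any difference in responses between the two sets can be attributed to residual cues in the base recordings rather than to the manipulated VOT and F0 values.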

Voice Activity Detection in Noisy Environment using Speech Energy Maximization and Silence Feature Normalization (음성 에너지 최대화와 묵음 특징 정규화를 이용한 잡음 환경에 강인한 음성 검출)

  • Ahn, Chan-Shik; Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.6 / pp.169-174 / 2013
  • In speech recognition, performance degrades because of the mismatch between the model training and recognition environments. Silence feature normalization is one method for reducing this environmental mismatch. However, with the existing silence feature normalization method, the energy level of silence intervals rises at low signal-to-noise ratios, which lowers the accuracy of voice/non-voice classification and thereby degrades recognition performance. This paper proposes a robust speech detection method for noisy environments using silence feature normalization and voice energy maximization. At high signal-to-noise ratios, the proposed method maximizes the voice energy so that the features are less affected by noise. At low signal-to-noise ratios, it uses the cepstral feature distributions of voice and non-voice frames to improve recognition performance. In the recognition experiments, the proposed method improved recognition performance compared with the conventional method.
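The voice/non-voice classification that silence feature normalization relies on can be illustrated with a plain frame-energy detector. This is a generic sketch, not the proposed energy-maximization method: the noise floor is assumed to be the 10th percentile of frame log energies, and `energy_vad`, `frame_len`, `hop` and `margin_db` are illustrative names and values.

```python
import numpy as np


def energy_vad(signal, frame_len=256, hop=128, margin_db=6.0):
    """Label frames as voice (True) or non-voice (False) by log energy."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    # Assumed noise-floor estimate: a low percentile of the frame energies.
    noise_floor = np.percentile(energy_db, 10)
    return energy_db > noise_floor + margin_db


rng = np.random.default_rng(0)
silence = 0.01 * rng.standard_normal(4096)
tone = np.sin(2 * np.pi * 0.05 * np.arange(4096))  # stand-in for voiced speech
labels = energy_vad(np.concatenate([silence, tone, silence]))
```

The sketch also shows why low SNR hurts such a classifier: as noise raises the energy of silence frames, the gap between the noise floor and the speech frames shrinks, so a fixed margin separates them less reliably.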

Development of Autonomous Mobile Robot with Speech Teaching Command Recognition System Based on Hidden Markov Model (HMM을 기반으로 한 자율이동로봇의 음성명령 인식시스템의 개발)

  • Cho, Hyeon-Soo; Park, Min-Gyu; Lee, Hyun-Jeong; Lee, Min-Cheol
    • Journal of Institute of Control, Robotics and Systems / v.13 no.8 / pp.726-734 / 2007
  • Generally, a mobile robot moves according to its original input programs. However, it is very hard for a non-expert to change the program that generates the moving path of a mobile robot, because such a user hardly knows the teaching commands and operating method for driving the robot. Therefore, a speech-command teaching method that allows a person who cannot use his or her hands, or a non-expert without expert knowledge, to generate the path is increasingly required. In this study, an autonomous mobile robot with a speech recognition function is developed so that its moving path can be taught easily. Using the human voice as the teaching method provides a more convenient user interface for the mobile robot. To implement the teaching function, the robot system is composed of three separate control modules: a speech preprocessing module, a DC servo motor control module, and a main control module. We design and implement a speaker-dependent isolated word recognition system for creating the moving path of an autonomous mobile robot in an unknown environment. The system uses word-level Hidden Markov Models (HMMs) for the designated command vocabulary to control the mobile robot, and it applies neural-network postprocessing according to a condition based on the confidence score. As the spectral analysis method, we use a filter-bank analysis model to extract voice features. The proposed word recognition system is tested using 33 Korean words for controlling mobile robot navigation, and we also evaluate the navigation performance of a mobile robot using only voice commands.
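Isolated-word recognition with word-level HMMs scores the utterance under each word model and picks the most likely word. The sketch below assumes discrete (vector-quantized) observation symbols rather than the paper's filter-bank features, uses a hypothetical two-word vocabulary, and reports the log-likelihood gap between the best and second-best words as a simple stand-in for a confidence score; the neural-network postprocessing is not shown.

```python
import numpy as np


def log_forward(obs, log_pi, log_A, log_B):
    """log p(obs | HMM) computed with the forward algorithm in log space."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j <- logsumexp_i(alpha_i + log A_ij) + log B_j(o)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)


def recognize(obs, models):
    """Return (best word, log-likelihood margin over the runner-up)."""
    scores = {word: log_forward(obs, *params) for word, params in models.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    margin = scores[ranked[0]] - scores[ranked[1]] if len(ranked) > 1 else np.inf
    return ranked[0], margin


def make_model(pi, A, B):
    """Convert probabilities to logs; log(0) -> -inf marks forbidden moves."""
    with np.errstate(divide="ignore"):
        return np.log(pi), np.log(A), np.log(B)


# Hypothetical 2-state left-to-right word models over 3 VQ symbols.
A = np.array([[0.7, 0.3], [0.0, 1.0]])
pi = np.array([1.0, 0.0])
models = {
    "go": make_model(pi, A, np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]])),
    "stop": make_model(pi, A, np.array([[0.1, 0.1, 0.8], [0.1, 0.1, 0.8]])),
}
word, margin = recognize([0, 0, 1, 1], models)
```

Working in log space avoids the numerical underflow that products of many small probabilities would cause on real utterances of several hundred frames.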

DNN based Speech Detection for the Media Audio (미디어 오디오에서의 DNN 기반 음성 검출)

  • Jang, Inseon; Ahn, ChungHyun; Seo, Jeongil; Jang, Younseon
    • Journal of Broadcast Engineering / v.22 no.5 / pp.632-642 / 2017
  • In this paper, we propose a DNN-based speech detection system that uses the acoustic characteristics and context information of media audio. Speech detection, which discriminates between the speech and non-speech contained in media audio, is a necessary preprocessing technique for effective speech processing. However, because media audio signals contain various types of sound sources, it has been difficult to achieve high performance with conventional signal processing techniques. The proposed method improves speech detection performance by separating the harmonic and percussive components of the media audio and constructing a DNN input vector that reflects the acoustic characteristics and context information of the media audio. To verify the performance of the proposed system, a speech detection data set was built from more than 20 hours of drama, and a publicly available 8-hour Hollywood movie data set was also acquired and used in the experiments. Cross-validation on the two data sets shows that the proposed system outperforms the conventional method.
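One common way to give a frame-level DNN context information is to concatenate each frame's features with those of its neighbours. The sketch below assumes per-frame feature vectors are already available (for instance, features computed separately from the harmonic and percussive components and concatenated); `stack_context` and the window half-width `k` are illustrative choices, not the paper's configuration.

```python
import numpy as np


def stack_context(features, k=2):
    """Concatenate each frame with its k left and k right neighbours.

    features: array of shape (T, D); returns shape (T, D * (2k + 1)).
    Edge frames are padded by repeating the first/last frame.
    """
    T = features.shape[0]
    padded = np.pad(features, ((k, k), (0, 0)), mode="edge")
    # Column block i holds the frame at offset i - k from the centre frame.
    return np.hstack([padded[i : i + T] for i in range(2 * k + 1)])


feats = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2 features each
dnn_input = stack_context(feats, k=2)
print(dnn_input.shape)  # (6, 10)
```

Edge padding keeps one output row per input frame, so the stacked matrix stays aligned with frame-level speech/non-speech labels.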

Effect of Listening to Music on Speech Anxiety among Middle-school Female Students (음악청취가 중학생의 발표불안에 미치는 영향)

  • Oh Yoo-Suk; Sohn Jin-Hun; Jang Eun-Hye; Suk Ji-A; Lee Ok-Hyun
    • Science of Emotion and Sensibility / v.7 no.4 / pp.43-49 / 2004
  • This study investigated the effect of listening to music on reducing public speech anxiety among middle-school female students. The subjects were 66 female first-year students selected from two classes of C middle school in Chonan city: 33 in the experimental group and 33 in the control group. The subjects rated their level of anxiety (SAS) by self-report, and a teacher evaluated their speech behavior (SBES) before, one week after, and two weeks after the speech trials. Sixty-four pieces of music were selected based on music therapy references. Of these, 23 pieces were selected for a preliminary experiment, and after the 23 pieces were evaluated by another class in the same grade, 7 pieces judged to have positive effects were finally selected. Subjects listened to the music for 40 minutes twice a day for two weeks on a cassette player in the classroom. The results were as follows: 1) Self-reported public speech anxiety decreased in both the experimental and control groups, but the decrease was greater in the music-listening group than in the non-listening group. 2) Public speech behavior improved in both the experimental and control groups, with no difference between the two groups. This suggests that the SBES may not be an accurate measure of speech anxiety. We conclude that two weeks of listening to music reduces speech anxiety.

Multi-channel input-based non-stationary noise canceller for mobile devices (이동형 단말기를 위한 다채널 입력 기반 비정상성 잡음 제거기)

  • Jeong, Sang-Bae; Lee, Sung-Doke
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.7 / pp.945-951 / 2007
  • Noise cancellation is essential for devices that use speech as an interface. In real environments, speech quality and recognition rates are degraded by additive noise arriving at the microphone. In this paper, we propose a noise cancellation algorithm that is basically built on stereo microphones. The advantage of using multiple microphones is that directional information about the target source can be exploited. The proposed noise canceller is based on the Wiener filter. To estimate the filter, the frequency responses of the noise and the target speech must be known; they are estimated by spectral classification in the frequency domain. The performance of the proposed algorithm is compared with that of the well-known Frost algorithm and the generalized sidelobe canceller (GSC) with an adaptation mode controller (AMC). As performance measures, we adopt speech recognition rates and the perceptual evaluation of speech quality (PESQ), which is the most widely used of the various objective speech quality measures.
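A Wiener filter applies, in each frequency bin, the gain H = S / (S + N), where S and N are the speech and noise power spectra. The sketch assumes both spectra are already available (in the paper they come from the stereo spectral classification, which is not shown); here the speech spectrum is approximated by subtracting the noise estimate from the noisy spectrum, and a gain floor with an assumed value prevents bins from being zeroed entirely. `wiener_gain` is an illustrative name.

```python
import numpy as np


def wiener_gain(noisy_psd, noise_psd, floor=0.05):
    """Per-bin Wiener gain H = S / (S + N), with S estimated as max(Y - N, 0)."""
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    return np.maximum(gain, floor)  # floor avoids fully zeroing a bin


# Example bins (assumed values): moderate SNR, noise-only, speech-dominated.
noisy_psd = np.array([2.0, 1.0, 10.0])
noise_psd = np.array([1.0, 1.0, 0.0])
gain = wiener_gain(noisy_psd, noise_psd)
```

In a full system the gain would be multiplied onto each complex STFT frame of the noisy signal before the inverse transform, so only magnitudes are modified and the noisy phase is reused.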