• Title/Summary/Keyword: Non-speech


A New Variable Bit Rate Scheme for Waveform Interpolative Coders (파형보간 코더에서 파라미터간 거리차를 이용한 가변비트율 기법)

  • Yang, Hee-Sik;Jeong, Sang-Bae;Hahn, Min-Soo
    • MALSORI / no.65 / pp.81-91 / 2008
  • In this paper, we propose a new variable bit-rate speech coder based on the waveform interpolation concept. After the coder extracts all parameters, it measures, for each parameter, the distortion between the current value and a predicted value obtained by extrapolation from the past two values. A parameter is transmitted only if its distortion exceeds a preset threshold. At the decoder side, each non-transmitted parameter is reconstructed by extrapolation from the past two parameters used to synthesize the signal. In this way, the total bit rate is reduced by 26% while the speech quality degradation is kept below 0.1 in PESQ score.

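The transmit-or-skip rule described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' coder: it assumes scalar parameters, linear extrapolation from the two most recent values, absolute difference as the distortion measure, and an encoder that tracks the decoder's reconstructed history so the two sides stay in step. The function names `extrapolate`, `encode`, and `decode` are hypothetical.

```python
from typing import List, Optional


def extrapolate(prev2: float, prev1: float) -> float:
    """Linear extrapolation from the two most recent values (an assumption)."""
    return 2.0 * prev1 - prev2


def encode(params: List[float], threshold: float) -> List[Optional[float]]:
    """Return one entry per frame: the value if transmitted, None if skipped."""
    stream: List[Optional[float]] = []
    history: List[float] = []  # values the decoder will hold after each frame
    for x in params:
        if len(history) < 2:
            # Not enough history to predict, so always transmit.
            stream.append(x)
            history.append(x)
            continue
        predicted = extrapolate(history[-2], history[-1])
        if abs(x - predicted) > threshold:
            # Distortion exceeds the threshold: transmit the true value.
            stream.append(x)
            history.append(x)
        else:
            # Prediction is close enough: skip and keep the predicted value.
            stream.append(None)
            history.append(predicted)
    return stream


def decode(stream: List[Optional[float]]) -> List[float]:
    """Rebuild the parameter track, extrapolating wherever a frame was skipped."""
    out: List[float] = []
    for value in stream:
        if value is None:
            out.append(extrapolate(out[-2], out[-1]))
        else:
            out.append(value)
    return out


if __name__ == "__main__":
    track = [1.0, 2.0, 3.0, 4.0, 10.0, 16.0]
    sent = encode(track, threshold=0.5)
    print(sent)
    print(decode(sent))
```

In this toy track, only the frames that break the linear trend are transmitted; the bit-rate saving in a real coder would depend on the parameter statistics and the chosen thresholds.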

Trade-off between Model Complexity and Performance in Intra-frame Predictive Vector Quantization of Wideband Speech (광대역 음성에 대한 프레임내 잔차 벡터 양자화에 있어서 모델 복잡도와 성능 사이의 교환관계)

  • Song, Geun-Bae;Hahn, Hern-Soo
    • The Journal of Korea Robotics Society / v.5 no.1 / pp.70-76 / 2010
  • This paper addresses the design issue of the trade-off between model complexity and performance when bandwidth extension (BWE) methods are applied to the intra-frame predictive vector quantization problem of wideband speech. It discusses model-based linear and non-linear prediction methods and presents a comparative study of them in terms of prediction gain. Experiments show a general trend of performance saturation as model complexity increases. In particular, no significant difference is observed between HMM-based and GMM-based BWE functions.

Effects of attention on the perception of L2 phonetic contrast

  • Lee, Hyunjung
    • Phonetics and Speech Sciences / v.6 no.4 / pp.47-52 / 2014
  • This study investigated how the degree of attention modulates English learners' perception of Korean stop contrasts. The contributions of VOT and F0 in perceiving Korean stops were examined while availability of attentional resources was manipulated using a dual-task paradigm. Results demonstrated the attentional modulation in the use of VOT, but not in F0: under less attention, the contribution of VOT to the perception of aspirated stops decreased, whereas that of lenis stops increased, which suggests more native-like performance. This implies that the role of attention in perceiving non-native contrasts might differ depending on how equivalent the acoustic and perceptual cues are between L1 and target L2 contrasts.

The identification of Korean vowels /o/ and /u/ by native English speakers

  • Oh, Eunhae
    • Phonetics and Speech Sciences / v.8 no.1 / pp.19-24 / 2016
  • The Korean high back vowels /o/ and /u/ have been reported to be in a state of near-merger, especially among young female speakers. Along with cross-generational changes, the vowel position within a word has been reported to yield different phonetic realizations. The current study examines native English speakers' ability to attend to the phonetic cues that distinguish the two merging vowels, and the effect of position (word-initial vs. word-final) on identification accuracy. Twenty-eight two-syllable words containing /o/ or /u/ in either initial or final position were produced by native female Korean speakers. The CV part of each target word was excised and presented to six native English speakers. The results showed that although identification accuracy was lowest for /o/ in word-final position (41%), it rose to 80% in word-initial position. Acoustic analyses of the target vowels showed that /o/ and /u/ were differentiated on the height dimension only in word-initial position, suggesting that English speakers may have perceived the distinctive F1 difference retained in the prominent position.

The Evaluation of the Fuzzy-Chaos Dimension and the Fuzzy-Lyapunov Dimension (화자인식을 위한 퍼지-상관차원과 퍼지-리아프노프차원의 평가)

  • Yoo, Byong-Wook;Park, Hyun-Sook;Kim, Chang-Seok
    • Speech Sciences / v.7 no.3 / pp.167-183 / 2000
  • In this paper, we propose two kinds of chaos dimensions, the fuzzy correlation dimension and the fuzzy Lyapunov dimension, for speaker recognition. The proposal rests on the observation that chaos analysis captures the non-linear information contained in an individual's speech signal and thereby offers superior discrimination capability. We confirm that the proposed fuzzy chaos dimensions play an important role in improving the speaker recognition rate by absorbing the variations between the reference and test pattern attractors. To evaluate the proposed dimensions, we apply them to speaker recognition. That is, we assess the validity of the speaker recognition parameters by estimating the recognition error according to the discrimination error of an individual speaker from the reference pattern.


Noise suppressor Using Psychoacoustic Model and Wavelet Packet Transform (심리음향 모델과 웨이블릿 패킷 변환을 이용한 잡음제거기)

  • Kim, Mi-Seon;Kim, Young-Ju;Lee, In-Sung
    • Proceedings of the IEEK Conference / 2006.06a / pp.345-346 / 2006
  • In this paper, we propose a noise suppressor based on a psychoacoustic model and the wavelet packet transform. The objective of the scheme is to enhance speech corrupted by colored or non-stationary noise. When the corrupting noise is colored, a subband approach is more efficient than a whole-band one. To avoid serious residual noise and speech distortion, the wavelet coefficient threshold (WCT) must be adjusted. In the proposed scheme, the subbands are designed to match the critical bands, and the WCT is adapted using the noise masking threshold (NMT) and the segmental signal-to-noise ratio (seg_SNR). As a result, the PESQ-MOS improves by about 0.23 in the case of coded speech.


POSTTS : Corpus Based Korean TTS based on Natural Language Analysis (POSTTS : 자연어 분석을 통한 코퍼스 기반 한국어 TTS)

  • Ha Ju-Hong;Zheng Yu;Kim Byeongchang;Lee Geunbae
    • Proceedings of the KSPS conference / 2003.05a / pp.87-90 / 2003
  • To produce high-quality synthesized speech, it is very important to obtain accurate grapheme-to-phoneme conversion and an accurate prosody model from text using natural language processing. Robust preprocessing of non-Korean characters is also required. In this paper, we analyze Korean texts using a morphological analyzer, a part-of-speech tagger, and a syntactic chunker. We present a new grapheme-to-phoneme conversion method, a hybrid of dictionary-based and rule-based approaches, for unlimited-vocabulary Korean TTS. We construct a prosody model using a probabilistic method and a decision tree-based method.


A Study on the Rhythm of Korean English Learners' Interlanguage Talk (타언어 화자와의 담화 상에 나타난 한국인 영어 학습자의 리듬)

  • Chung, Hyunsong
    • Phonetics and Speech Sciences / v.5 no.3 / pp.3-10 / 2013
  • This study investigated the rhythmic accommodation of Korean English learners' interlanguage talk. Twelve Korean speakers, 6 native English speakers, and 6 non-native English speakers in London participated in multiple conversations on different topics, which produced 36 conversational data sets in interlanguage talk (ILT) settings. A total of 190 utterances from these 36 data sets were analyzed to investigate the rhythmic patterns of Korean English learners when they communicated with English speakers of different language backgrounds. Excluding the final syllable, the normalized durations of consecutive syllables were compared to derive a variability index (VI). No significant variability was found in the syllable-to-syllable duration measurements for the utterances of Korean English learners, regardless of their interlocutor's language background. Conversely, there was evidence that Korean English learners showed rhythmic accommodation in ILT when they conversed with non-native English speakers. Their speaking rate became significantly slower when they talked to non-native English speakers than when they talked to other Korean English learners. Furthermore, there was a negative correlation between speaking rate and the VI in the utterances of Korean English learners in ILT.
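The abstract above does not give the VI formula, so the following is only a sketch of one common formulation: drop the final syllable, normalize each remaining syllable duration by the utterance's mean syllable duration, and average the absolute differences between consecutive normalized durations. The function name `variability_index` and the choice of normalization are assumptions, not details taken from the paper.

```python
from typing import Sequence


def variability_index(durations: Sequence[float]) -> float:
    """Mean absolute difference between consecutive normalized syllable durations.

    Assumed formulation: the final syllable is excluded, and durations are
    normalized by the mean duration of the remaining syllables.
    """
    core = list(durations[:-1])  # exclude the final syllable
    if len(core) < 2:
        raise ValueError("need at least three syllables to compute the index")
    mean_duration = sum(core) / len(core)
    normalized = [d / mean_duration for d in core]
    diffs = [abs(b - a) for a, b in zip(normalized, normalized[1:])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Perfectly even syllables (the lengthened final syllable is ignored).
    print(variability_index([0.2, 0.2, 0.2, 0.5]))
    # Alternating short and long syllables give a higher index.
    print(variability_index([0.1, 0.3, 0.1, 0.3, 0.9]))
```

Under this formulation, a lower value indicates more even syllable durations, and a higher value indicates stronger alternation between short and long syllables.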

Non-word repetition may reveal different errors in naive listeners and second language learners

  • Holliday, Jeffrey J.;Hong, Minkyoung
    • Phonetics and Speech Sciences / v.12 no.1 / pp.1-9 / 2020
  • The perceptual assimilation of a nonnative phonological contrast can change with linguistic experience, resulting in naïve listeners and novice second language (L2) learners potentially assimilating the members of a nonnative contrast to different native (L1) categories. While it has been shown that this sort of change can affect the discrimination of the nonnative contrast, it has not been tested whether such a change could have consequences for the production of the contrast. In this study, L1 speakers of Mandarin Chinese who were (1) naïve to Korean, (2) novice L2 learners, or (3) advanced L2 learners participated in a Korean non-word repetition task using word-initial sibilants. The initial CVs of their repetitions were then played to L1 Korean listeners who categorized the initial consonant. The naïve talkers were more likely to repeat an initial /sha/ as an affricate, whereas the L2 learners repeated it as a fricative, in line with how these listeners have been shown to assimilate Korean sibilants to Mandarin categories. This result suggests that errors in the production of new words presented auditorily to nonnative listeners may be driven by how they perceptually assimilate the nonnative sounds, emphasizing the need to better understand what drives changes in perceptual assimilation that accompany increased linguistic experience.

Semantic Ontology Speech Information Extraction using Non-parametric Correlation Coefficient (비모수적 상관계수를 이용한 시맨틱 온톨로지 음성 정보 추출)

  • Lee, Byungwook
    • Journal of Digital Convergence / v.11 no.9 / pp.147-151 / 2013
  • When retrieving high-frequency keywords in an information retrieval system, mismatches with the user's request are a problem because keywords carry various meanings in existing ontology configurations. This paper constructs a personnel selection ontology and rules for personnel management, composed of various concepts and knowledge based on semantic web technology, and suggests selection procedures to support these rules, together with a knowledge retrieval system to verify the suitability of selection results. The system extracts speech features using a non-parametric correlation coefficient. The proposed method was validated by an experimental evaluation in which the average SNR decreased by 0.752 dB.