• Title/Summary/Keyword: phonetic level


Noise Effects on Foreign Language Learning (소음이 외국어 학습에 미치는 영향)

  • Lim, Eun-Su;Kim, Hyun-Gi;Kim, Byung-Sam;Kim, Jong-Kyo
    • Speech Sciences / v.6 / pp.197-217 / 1999
  • In a noisy classroom, the acoustic-phonetic features of the teacher's speech and the perceptual features of learners change in comparison with a quiet environment. Acoustic analyses were carried out on a set of French monosyllables consisting of 17 consonants and the three vowels /a, e, i/, produced by one male speaker talking in quiet and in 50, 60, and 70 dB SPL of masking noise presented over headphones. The results of the acoustic analyses showed consistent differences in the energy and formant center frequency amplitude of consonants and vowels, the $F_1$ frequency of vowels, and the duration of voiceless stops, suggesting an increase in vocal effort. The perceptual experiments, in which 18 female undergraduate students learning French served as subjects, were conducted in quiet and in 50 and 60 dB of masking noise. The identification scores for consonants were higher in Lombard speech than in normal speech, suggesting that the speaker's vocal effort helps to overcome the masking effect of noise. With increased noise level, the perceptual responses to the French consonants tended to become more complex, and the subjective reaction scores for the noise, rated with vocabulary representing an 'unpleasant' sensation, tended to be higher. From the point of view of L2 (second language) acquisition, the influence of L1 (first language) on L2 observed in the perceptual results supports the interference theory.


A Study on Performance Evaluation of HM-Net Adaptation System Using the State Level Sharing (상태레벨 공유를 이용한 HM-Net 적응화 시스템의 성능평가에 관한 연구)

  • 오세진;김광동;노덕규;황철준;김범국;김광수;성우창;정현열
    • Proceedings of the IEEK Conference / 2003.11a / pp.397-400 / 2003
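  • In this study, we introduce the MLLR (Maximum Likelihood Linear Regression) adaptation method into an HM-Net (Hidden Markov Network) speech recognition system so that HM-Net can be applied to a variety of tasks and can represent speaker characteristics effectively, and we propose a regression class generation method based on an improved HM-Net training algorithm. The proposed method uses the state-level sharing produced by contextual-domain state splitting in the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm. This makes it possible to freely control the many adaptation parameters (means and variances), which are determined by the context information and the amount of adaptation speech obtained from a new speaker. To verify the effectiveness of the proposed method, we carried out recognition experiments on the Center for Korean Language Engineering (KLE) 452 speech data and on continuous speech related to flight reservations. Overall, recognition performance improved by 34-37% on average for phoneme recognition, 9% on average for word recognition, and 7-8% on average for continuous speech recognition. In a comparison of recognition performance by amount of adaptation data, the recognition system using the proposed method showed improved recognition rates even when little adaptation data was available. It also showed improved recognition performance in adaptation experiments on speech with added noise, which is consistent with the characteristics of MLLR adaptation. We therefore confirmed that the proposed regression class generation method is effective for an HM-Net speech recognition system that uses MLLR adaptation.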


Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech (음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성)

  • Kim, Seung-Won;Zheng, Yu;Lee, Gary-Geunbae;Kim, Byeong-Chang
    • MALSORI / no.53 / pp.75-92 / 2005
  • Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on the Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge for transcribing events in an utterance. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems with direct prosody generation. However, corpus preparation is very expensive for practical-level performance, because a ToBI-labeled corpus has to be constructed manually by many prosody experts and a large amount of data is normally required for accurate statistical prosody modeling. This paper proposes a new method that transcribes C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to it. We empirically verify the usefulness of various natural language and phonology features in order to build well-integrated features for the ME framework.


A Range Dependent Structural HRTF Model for 3-D Sound Generation in Virtual Environments (가상현실 환경에서의 3차원 사운드 생성을 위한 거리 변화에 따른 구조적 머리전달함수 모델)

  • Lee, Young-Han;Kim, Hong-Kook
    • MALSORI / no.59 / pp.89-99 / 2006
  • This paper proposes a new structural head-related transfer function (HRTF) model to produce sounds in a virtual environment. The proposed HRTF model generates 3-D sounds by using a head model, a pinna model, and the proposed distance model for azimuth, elevation, and distance, respectively, which are the three aspects of 3-D sound. In particular, the proposed distance model consists of a level normalization block, a distal region model, and a proximal region model. To evaluate the performance of the proposed model, we set up an experimental procedure in which each listener identifies the distance of 3-D sound sources generated by the proposed method at a predefined distance. The tests show that the proposed model yields an average distance error of 0.13 to 0.31 m when the sound source is generated as if it were 0.5 to 2 m away from the listener. This result is comparable to the average distance error of human listening for an actual sound source.


Improving LD-CELP using frame classification and modified synthesis filter (프레임 분류와 합성필터의 변형을 이용한 적은 지연을 갖는 음성 부호화기의 성능)

  • 임은희;이주호;김형명
    • The Journal of Korean Institute of Communications and Information Sciences / v.21 no.6 / pp.1430-1437 / 1996
  • A low-delay code-excited linear predictive (LD-CELP) speech coder at bit rates under 8 kbps is considered. We try to improve the performance of the speech coder with a frame-type-dependent modification of the synthesis filter. We first classify frames into three groups: voiced, unvoiced, and onset. For voiced and unvoiced frames, the spectral envelope of the synthesis filter is adapted to the phonetic characteristics. For transition frames from unvoiced to voiced, a synthesis filter that has been interpolated with the bias filter is used. The proposed vocoder produced clearer sound than existing LD-CELP vocoders at a similar delay level.


Behavioral Function of the Anomalous Song in the Bush Warbler, Cettia diphone

  • Park, Shi-Ryong;Cheong, Seok-Wan;Chung, Hoon
    • Animal cells and systems / v.8 no.2 / pp.89-95 / 2004
  • Bush warblers (Cettia diphone) are recognized as having two types of song: a normal song that plays a role in attracting mates and in territorial defense, and an anomalous song. The present study suggests that the anomalous song functions as an alarm signal as well as carrying other, unknown signals. Field observations and playback experiments on the anomalous song of the bush warbler were conducted in order to investigate the contextual information exchanged between sender and receiver. In the field observations, males frequently emitted anomalous songs toward potential predators. Males also responded with an anomalous song to stuffed potential predators. The distance between the place where the anomalous song was produced and the stimulating source varied depending on the kind of stimulus. Male bush warblers may thus vary their anomalous-song response depending on the level of danger. When the anomalous song was played back to terrestrial males and females, no distinctive behavior was observed. The anomalous song may be sung to defend the territory against predators or to distract invaders from the nest and the female, because the male and female behaviors were related to the anomalous song and its phonetic characteristics.

Long Term Average Spectral Analysis for Acoustical Discrimination of Korean Nasal Consonants (한국어 비음의 음향학적 구분을 위한 장구간 스펙트럼(LTAS) 분석)

  • Choi, Soon-Ai;Seong, Cheol-Jae
    • MALSORI / no.60 / pp.67-84 / 2006
  • The purpose of this study is to find acoustic parameters in the frequency domain that distinguish the Korean nasals /m, n, ŋ/ from each other. The new parameters are devised on the basis of the LTAS (Long Term Average Spectrum). The maximum peak amplitude and the corresponding formant frequency are measured in the low and high frequency ranges, respectively. The frequency of the spectral valley and its energy level are also obtained in a specific frequency range of the spectrum. Spectral slope, the total energy value in a specific frequency range, and statistics of the spectral energy distribution such as centroid, skewness, and kurtosis are suggested as new parameters as well. The parameters that show statistically significant differences across nasals are summarized as follows. 1) In syllable-initial position: the total energy value from 1,500 to 2,200 Hz (zeroENG); 2) in syllable-final position: the peak amplitude of the first formant (peak1_a), the formant frequency with maximum peak amplitude from 4,000 to 8,000 Hz (peak2_f), the maximum peak amplitude of the formant frequency from 4,000 to 8,000 Hz (peak2_a), and the total energy value from 1,500 to 2,200 Hz (zeroENG).


The Neighborhood Effect in Korean Visual Word Recognition (한국어 시각단어재인에서 나타나는 이웃효과)

  • Kwon, You-An;Cho, Hyae-Suk;Kim, Choong-Myung;Nam, Ki-Chun
    • MALSORI / no.60 / pp.29-45 / 2006
  • We investigated whether the first syllable plays an important role in lexical access in Korean visual word recognition. To do so, one lexical decision task (LDT) experiment and two form-primed LDT experiments examined the nature of the syllabic neighborhood effect. In Experiment 1, the syllabic neighborhood density and the syllabic neighborhood frequency were manipulated. The results showed that lexical decision latencies were influenced only by the syllabic neighborhood frequency. The purpose of Experiment 2 was to confirm the results of Experiment 1 with a form-primed LDT. The lexical decision latency was slower in the form-related condition than in the form-unrelated condition. The effect of syllabic neighborhood density was significant only in the form-related condition. This means that the first syllable plays an important role in the sub-lexical process. In Experiment 3, we conducted another form-primed LDT, manipulating the number of syllabic neighbors for words with a higher-frequency neighbor. The interaction of syllabic neighborhood density and form relation was significant. This result confirmed that, at the lexical level, words with a higher-frequency neighbor are more inhibited by neighbors sharing the first syllable than words with no higher-frequency neighbor. These findings suggest that the first syllable is the unit of neighborhood and that the unit of sub-lexical representation in Korean is the syllable.


Interaction of native language interference and universal language interference on L2 intonation acquisition: Focusing on the pitch range variation (L2 억양에서 나타나는 모국어 간섭과 언어 보편적 간섭현상의 상호작용: 피치대역을 중심으로)

  • Yune, Youngsook
    • Phonetics and Speech Sciences / v.13 no.4 / pp.35-46 / 2021
  • In this study, we examined the interaction between pitch reduction, which is considered a language-universal phenomenon, and native language interference in the production of L2 intonation by Chinese learners of Korean. To investigate this interaction, we conducted an acoustic analysis using measures such as pitch span, pitch level, pitch dynamic quotient, skewness, and kurtosis. In addition, the correlation between text comprehension and pitch was examined. The analyzed material consisted of four Korean discourses containing five and seven sentences of varying difficulty. Seven native Korean speakers and thirty Chinese learners who differed in their Korean proficiency participated in the production test. The comparison between languages showed that Chinese had a more expanded pitch span and a higher pitch level than Korean. The analysis between groups showed that pitch reduction was prominent at the beginner and intermediate levels, i.e., the learners' Korean was characterized by a compressed pitch span, a low pitch level, and less sentence-internal pitch variation. In contrast, the pitch use of advanced speakers was the most similar to that of native Korean speakers. There was no significant correlation between text difficulty and pitch use. Through this study, we observed that pitch reduction was more pronounced than native language interference in the phonetic layer.

Prosodic Boundary Effects on the V-to-V Lingual Movement in Korean

  • Cho, Tae-Hong;Yoon, Yeo-Min;Kim, Sa-Hyang
    • Phonetics and Speech Sciences / v.2 no.3 / pp.101-113 / 2010
  • The present study investigated how the kinematics of the /a/-to-/i/ tongue movement in Korean is influenced by prosodic boundaries. The /a/-to-/i/ sequence was used as 'transboundary' test material occurring across a prosodic boundary, as in /ilnjəʃʰa/ # /minsakwae/ ('일년차#민사과에', 'the first year worker' # 'dept. of civil affairs'). The study also tested whether the V-to-V tongue movement would be further influenced by syllable structure, with /m/ placed either in the coda condition (/am#i/) or in the onset condition (/a#mi/). Results of an EMA (Electromagnetic Articulography) study showed that kinematic parameters such as the movement distance (displacement), the movement duration, and the movement velocity (speed) all varied as a function of boundary strength, showing an articulatory strengthening pattern of a "larger, longer and faster" movement. Interestingly, however, the larger, longer, and faster pattern associated with boundary marking in Korean has often been observed with stress (prominence) marking in English. It was proposed that language-specific prosodic systems induce different ways in which phonetics and prosody interact: Korean, as a language without lexical stress and pitch accent, has more degrees of freedom to express prosodic strengthening, while languages such as English have constraints, so that some strengthening patterns are reserved for lexical stress. The V-to-V tongue movement was also found to be influenced by the syllable affiliation of the intervening consonant /m/, showing more preboundary lengthening of the tongue movement when /m/ was part of the preboundary syllable (/am#i/). Together, the results show that fine-grained phonetic details do not simply arise as low-level physical phenomena, but reflect higher-level linguistic structures, such as syllable and prosodic structures. It was also discussed how the boundary-induced kinematic patterns could be accounted for in terms of the task dynamic model and the theory of the prosodic gesture ($\pi$-gesture).
