• Title/Summary/Keyword: Non-speech

Noise Robust Speaker Verification Using Subband-Based Reliable Feature Selection (신뢰성 높은 서브밴드 특징벡터 선택을 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • MALSORI / no.63 / pp.125-137 / 2007
  • Recently, many techniques have been proposed to improve the noise robustness of speaker verification. In this paper, we consider the feature recombination technique in the multi-band approach. In conventional feature recombination for speaker verification, all feature components are used to compute the likelihoods of the speaker models or the universal background model (UBM). This computation is ineffective from the viewpoint of the multi-band approach. To address this, we introduce a subband likelihood computation and propose a modified feature recombination using subband likelihoods. In the decision step of a speaker verification system in noisy environments, a few very low likelihood scores from a speaker model or the UBM can cause the system to make a wrong decision. To overcome this problem, a reliable feature selection method is proposed: the low likelihood scores of unreliable features are replaced by the likelihood scores of an adaptive noise model. This adaptive noise model is estimated by maximum a posteriori (MAP) adaptation using noise features obtained directly from the noisy test speech. The proposed method using subband-based reliable feature selection outperforms the conventional feature recombination system, with an error reduction rate of more than 31% relative to the feature recombination-based speaker verification system.
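
The subband scoring and reliable feature selection described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs trained separately for each subband, hypothetical subband boundaries given as index ranges, and an assumed reliability rule in which a subband's score is replaced by the noise model's score whenever the noise model explains that subband better. The MAP adaptation of the noise model is not shown; the noise model is simply taken as given.

```python
import math

# A GMM is a list of (weight, mean_vector, variance_vector) tuples (diagonal covariance).

def log_gauss_diag(x, mean, var):
    """Log density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def gmm_loglik(x, gmm):
    return logsumexp([math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm])

def subband_logliks(frame, bands, models):
    """One log-likelihood per subband; models[b] is the GMM for subband b."""
    return [gmm_loglik(frame[start:end], models[b]) for b, (start, end) in enumerate(bands)]

def frame_score(frame, bands, models, noise_models, select_reliable=True):
    """Sum of subband log-likelihoods; unreliable subbands take the noise model's score."""
    target = subband_logliks(frame, bands, models)
    if not select_reliable:
        return sum(target)
    noise = subband_logliks(frame, bands, noise_models)
    # Assumed rule: a subband is unreliable if the noise model scores it higher.
    return sum(max(t, n) for t, n in zip(target, noise))

def verification_llr(frames, bands, speaker, ubm, noise, select_reliable=True):
    """Average per-frame log-likelihood ratio between the speaker model and the UBM."""
    total = sum(frame_score(f, bands, speaker, noise, select_reliable)
                - frame_score(f, bands, ubm, noise, select_reliable) for f in frames)
    return total / len(frames)

# Toy example: 4-dim features split into two 2-dim subbands, single-Gaussian models.
bands = [(0, 2), (2, 4)]

def single_gaussian(mean):
    return [(1.0, mean, [1.0, 1.0])]

speaker = [single_gaussian([0.0, 0.0]), single_gaussian([0.0, 0.0])]
ubm = [single_gaussian([1.0, 1.0]), single_gaussian([1.0, 1.0])]
noise = [single_gaussian([5.0, 5.0]), single_gaussian([5.0, 5.0])]
frames = [[0.0, 0.0, 5.0, 5.0]]  # the second subband is dominated by noise

llr_plain = verification_llr(frames, bands, speaker, ubm, noise, select_reliable=False)
llr_reliable = verification_llr(frames, bands, speaker, ubm, noise)
print(llr_plain, llr_reliable)
```

In this toy frame, the noise-dominated second subband pulls the plain score down to -8.0, whereas with the substitution both models receive the same noise score in that subband and the score (1.0) is decided by the clean subband alone.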

A Neural Network Based Korean Segmental Duration Modeling Using Tonal Information of Phonemes (음소별 성조 정보를 이용한 신경망 기반의 한국어 음소 지속시간 모델링)

  • 김은경;이상호;오영환
    • The Journal of the Acoustical Society of Korea / v.18 no.6 / pp.84-88 / 1999
  • Accurate estimation of segmental duration is crucial for natural-sounding text-to-speech synthesis. To predict Korean segmental durations, conventional methods have used phonemic context, part-of-speech context, and position within the prosodic phrase. In this paper, the tonal information of phonemes is employed for more accurate prediction. After defining two non-boundary tones and six boundary tones, we annotated each syllable of 400 sentences with a tonal label. To predict segmental duration using tonal information, we constructed neural networks with a single real-valued output node that predicts phonemic duration, and trained them with the backpropagation algorithm. Experimental results showed that the proposed features are effective for predicting Korean segmental durations, yielding a correlation coefficient of 0.863 between the observed and predicted durations.
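
A network of the kind described, with one real-valued output node trained by backpropagation, can be sketched in pure Python. This is only an illustration under assumptions: the single tanh hidden layer, the learning rate, the one-hot encoding of the eight tone labels (two non-boundary plus six boundary tones) and the toy target durations are all hypothetical, and the paper's other input features (phonemic context, part of speech, position in the prosodic phrase) are omitted. The Pearson correlation at the end corresponds to the evaluation measure reported in the abstract.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

class DurationNet:
    """One tanh hidden layer and a single linear output node (predicted duration)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def train_step(self, x, target, lr):
        """One backpropagation step on the squared error 0.5 * (y - target)**2."""
        h, y = self.forward(x)
        err = y - target
        # Hidden-layer gradients use the output weights *before* they are updated.
        grad_h = [err * w * (1.0 - hj * hj) for w, hj in zip(self.w2, h)]
        self.w2 = [w - lr * err * hj for w, hj in zip(self.w2, h)]
        self.b2 -= lr * err
        for j, g in enumerate(grad_h):
            self.w1[j] = [w - lr * g * xi for w, xi in zip(self.w1[j], x)]
            self.b1[j] -= lr * g
        return 0.5 * err * err

def one_hot(index, size=8):
    return [1.0 if i == index else 0.0 for i in range(size)]

# Toy data: one-hot tone label -> duration in units of 100 ms (hypothetical values).
data = [(one_hot(k), 0.5 + 0.1 * k) for k in range(8)]

def mean_loss(net):
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)

net = DurationNet(n_in=8, n_hidden=4)
initial_loss = mean_loss(net)
for epoch in range(2000):
    for x, t in data:
        net.train_step(x, t, lr=0.05)
final_loss = mean_loss(net)

predicted = [net.forward(x)[1] for x, _ in data]
observed = [t for _, t in data]
print(initial_loss, final_loss, pearson(observed, predicted))
```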

Mieko Han and her Works on Korean Phonetics (Mieko Han의 한국어 음성학 연구)

  • Ko, Do-Heung
    • Speech Sciences / v.1 / pp.213-223 / 1997
  • This paper presents a general review of the work of Mieko S. Han, who made a significant contribution to the study of Korean phonetics during the 1960s and early 1970s. As both a sole and a joint author, Dr. Han published papers important in both quantity and quality, which are still cited by Korean phoneticians today. Before the work of Dr. M. Han, a professor in the Department of East Asian Languages & Cultures at USC, there were only a few phonetics-related publications in Korea, most of which were papers or books based on a non-experimental, traditional approach. Traditionalism and structuralism are known to have coexisted in the field of Korean linguistics during this period. It was, however, fortunate that two important phoneticians (M. Han and Chin-W Kim) were working abroad at that time. Mieko Han's concern was to investigate the experimental characteristics of the Korean vowel and consonant system using the spectrograph, which was then the single most important tool for analyzing phonetic data. Dr. Han conducted her experimental studies on Korean phonetics, mostly funded by the Office of Naval Research, in terms of duration, fundamental frequency, voice onset time (VOT), intensity, and so on. This paper aims to re-appreciate Dr. Han's specific contribution to the study of Korean phonetics, since she played an important role as a pioneer of early Korean phonetics. Further, Dr. Han's works are highly recommended as a first step for graduate students who seriously wish to specialize in Korean phonetics.

Differential Effect for Neural Activation Processes according to the Proficiency Level of Code Switching: An ERP Study (이중언어환경에서의 언어간 부호전환 수준에 따른 차별적 신경활성화 과정: ERP연구)

  • Kim, Choong-Myung
    • Phonetics and Speech Sciences / v.2 no.4 / pp.3-10 / 2010
  • The present study aims to investigate neural activation according to the level of code switching in English-proficient bilinguals, and to find the relationship between language-switching performance and proficiency level using event-related potentials (ERPs). First, when high-proficiency (HP) and low-proficiency (LP) bilinguals were compared in a native-language environment, the N2 activation level was higher in the HP group than in the LP group, but only under two conditions: 1) the language-switching (between-language) condition, which is known to index attention to code switching, and 2) the inhibition of the current language (L1). In addition, an N400 effect was observed in both groups, but only in the non-switching (within-language) condition. This suggests that both groups, regardless of high or low performance, completed the semantic acceptability task well in their native-language environment when free of the burden of language switching. N400 latencies were about 100 ms earlier in the HP group than in the LP group, which can be interpreted as facilitation of the given task. These results suggest that, in the L1-to-L2 switching condition, the HP group showed differential activation of the inhibitory system for L1, in contrast to the inactivation of the inhibitory system in the LP group. Despite the absence of an N400 effect for the given task in both groups, the differing peak latencies were attributed to differences in the efficiency of semantic processing.
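
The latency comparison above relies on locating a component's peak within a time window of the averaged ERP waveform. A minimal sketch of one common way to do this (picking the most negative sample in a search window for a negative-going component such as N2 or N400) is shown here; the 300-500 ms window, the 1 kHz sampling rate and the synthetic waveforms are assumptions for illustration, not the study's parameters.

```python
import math

def peak_latency(waveform, fs_hz, t0_ms, window_ms, polarity=-1):
    """Return (latency_ms, amplitude) of the peak inside window_ms.

    waveform: averaged ERP samples; fs_hz: sampling rate in Hz;
    t0_ms: time of the first sample relative to stimulus onset;
    polarity=-1 picks the most negative point, +1 the most positive.
    """
    start, end = window_ms
    best = None
    for i, amp in enumerate(waveform):
        t = t0_ms + 1000.0 * i / fs_hz
        if start <= t <= end and (best is None or polarity * amp > polarity * best[1]):
            best = (t, amp)
    return best

def synthetic_erp(peak_ms, fs_hz=1000, t0_ms=-100, length=900, width_ms=40.0):
    """Hypothetical negative-going Gaussian deflection centred at peak_ms."""
    return [-math.exp(-((t0_ms + 1000.0 * i / fs_hz - peak_ms) / width_ms) ** 2)
            for i in range(length)]

# Two hypothetical group averages whose N400-like peaks differ by 100 ms.
hp = peak_latency(synthetic_erp(350), 1000, -100, (300, 500))
lp = peak_latency(synthetic_erp(450), 1000, -100, (300, 500))
print(hp, lp, lp[0] - hp[0])
```

Restricting the search to a window keeps earlier components (for example an N2) from being mistaken for the later peak; the window itself is a design choice that should follow the study's own analysis plan.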

Pitch trajectories of English vowels produced by American men, women, and children

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.10 no.4 / pp.31-37 / 2018
  • Pitch trajectories reflect the continuous variation of vocal fold movements over time. This study examined the pitch trajectories of English vowels produced by 139 American English speakers and analyzed them statistically using generalized additive mixed models (GAMMs). First, Praat was used to read the sound data of Hillenbrand et al. (1995). A pitch analysis script was then prepared, and six pitch values at corresponding time points within each vowel segment were collected and checked. The results showed that the men produced the lowest pitch trajectories, followed by the women, boys, and girls. The density plot of the pitch values showed a bimodal distribution. The pitch values at the six time points formed a single dip across the vowel segment, changing gradually from 204 to 193 to 196 Hz. Normality tests on the pitch data rejected the null hypothesis of normality, so nonparametric tests were conducted to identify significant differences among the four groups. The GAMMs, applied to all the pitch data, yielded significant results for the pitch values at the six time points, but not for the difference between the boys and girls. The GAMMs also revealed that these two groups differed significantly only at the first and second time points. Accordingly, the methodology and findings of this study may be applicable to future studies comparing curvilinear data sets elicited under experimental conditions.
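
The step of collecting six pitch values per vowel can be sketched outside Praat as sampling a pitch track at proportional time points. This is a hypothetical Python equivalent, not the study's Praat script: it assumes the pitch track is given as (time, F0) pairs with 0 marking unvoiced frames, places the six points at k/7 of the vowel duration (k = 1..6), and uses linear interpolation between the nearest voiced frames.

```python
from bisect import bisect_left

def pitch_at(times, f0, t):
    """Linearly interpolate F0 (Hz) at time t using voiced frames only (f0 > 0)."""
    voiced = [(ti, fi) for ti, fi in zip(times, f0) if fi > 0]
    if not voiced:
        return None
    k = bisect_left([ti for ti, _ in voiced], t)
    if k == 0:
        return voiced[0][1]
    if k == len(voiced):
        return voiced[-1][1]
    (ta, fa), (tb, fb) = voiced[k - 1], voiced[k]
    return fa + (fb - fa) * (t - ta) / (tb - ta)

def six_point_pitch(times, f0, vowel_start, vowel_end, n_points=6):
    """Sample F0 at n_points equally spaced interior points of the vowel (k / (n_points + 1))."""
    dur = vowel_end - vowel_start
    return [pitch_at(times, f0, vowel_start + dur * k / (n_points + 1))
            for k in range(1, n_points + 1)]

# Toy pitch track: 10 ms frames, F0 falling linearly from 210 Hz, one unvoiced frame.
times = [i * 0.01 for i in range(36)]
f0 = [210.0 - 20.0 * t / 0.35 for t in times]
f0[5] = 0.0  # unvoiced frame: skipped, so the value near 0.05 s is interpolated
values = six_point_pitch(times, f0, 0.0, 0.35)
print([round(v, 1) for v in values])
```

Using interior points rather than the segment edges avoids sampling right at the vowel boundaries, where pitch tracking is often least reliable; whether the study used this spacing is not stated.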

Perception of Spanish /l/ - /r/ distinction by native Japanese

  • Miguelina Guirao;Jorge A. Gurlekian;Maria A. Garcia Jurado
    • Proceedings of the KSPS conference / 1996.10a / pp.337-342 / 1996
  • In previous works, we have reported phonetic similarities between Japanese and Spanish vowels and syllabic sounds (1)(2)(3)(4). In the present communication, we explore the relative importance of the duration of the consonantal segment in eliciting the Spanish /l/-/r/ distinction by native Japanese talkers. Three Argentine and three trained native Japanese talkers recorded /l/ and /r/ combined with /a/ in VCV sequences. Consonant duration and vowel context with transitions were modified by editing natural /ala/ sounds. Mixed VCV items were produced by combining sounds of both languages. Perceptual tests were performed by presenting the speech material to trained and untrained native Japanese listeners. In the first session, a discrimination procedure was applied: the items were arranged in pairs, and listeners were told to indicate the pair that sounded different. In the following session, they were asked to identify each item and type the corresponding letter. Responses are examined in terms of the critical duration of the interval between vowels. Preliminary results indicate that the duration of the intervocalic interval was a relevant cue for the identification of /l/ and /r/. It seems that, to differentiate the two sounds, Japanese listeners required relatively longer interval steps than the Argentine subjects. There was a tendency to confuse /l/ for /r/ more frequently than vice versa.
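
One way to express a "critical duration" from the identification session is the interval duration at which responses cross 50%. A minimal sketch using linear interpolation between adjacent duration steps follows; the step values and response proportions are invented for illustration, and it assumes /l/ responses increase with the intervocalic interval (a longer interval favouring /l/ over the tap /r/). Comparing the crossover points of two listener groups is one way to quantify a finding such as Japanese listeners needing longer interval steps than Argentine listeners.

```python
def crossover_duration(durations_ms, prop_l):
    """Duration at which the proportion of /l/ responses crosses 0.5.

    Uses linear interpolation between adjacent steps; returns None if no crossing.
    """
    points = sorted(zip(durations_ms, prop_l))
    for (d0, p0), (d1, p1) in zip(points, points[1:]):
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            return d0 + (0.5 - p0) * (d1 - d0) / (p1 - p0)
    return None

steps = [20, 40, 60, 80, 100]            # hypothetical interval durations (ms)
group_a = [0.0, 0.2, 0.8, 1.0, 1.0]      # hypothetical /l/ response proportions
group_b = [0.0, 0.0, 0.2, 0.8, 1.0]
print(crossover_duration(steps, group_a), crossover_duration(steps, group_b))
```

Linear interpolation is the simplest choice; fitting a logistic psychometric function to the same data would give a smoother estimate of the boundary.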

Statistical Model-Based Voice Activity Detection Using the Second-Order Conditional Maximum a Posteriori Criterion with Adapted Threshold (적응형 문턱값을 가지는 2차 조건 사후 최대 확률을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.29 no.1 / pp.76-81 / 2010
  • In this paper, we propose a novel approach to improving the performance of statistical model-based voice activity detection (VAD) based on the second-order conditional maximum a posteriori (CMAP) criterion. In our approach, the VAD decision rule is expressed as the geometric mean of the likelihood ratios (LRs), compared against a threshold that is adapted according to the speech presence probability conditioned on both the current observation and the speech activity decisions in the previous two frames. Experimental results show that the proposed approach outperforms both the statistical model-based VAD and the CMAP-based VAD using the LR test.
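
The decision rule can be illustrated with the Gaussian statistical model commonly used in this family of VADs, where the per-bin log likelihood ratio is log LR_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k), with gamma_k the a posteriori SNR and xi_k the a priori SNR; the log of the geometric mean of the LRs is then the average of these terms. The second-order conditioning is approximated here by a threshold that depends on the decisions of the previous two frames. This is a sketch under assumptions: the threshold values are hypothetical, and the paper's derivation of the adapted threshold from the conditional speech presence probability is not reproduced.

```python
import math

def mean_log_lr(gamma, xi):
    """Log of the geometric mean of per-bin likelihood ratios (Gaussian model).

    gamma: a posteriori SNR per frequency bin; xi: a priori SNR per bin.
    log LR_k = gamma_k * xi_k / (1 + xi_k) - log(1 + xi_k)
    """
    terms = [g * x / (1.0 + x) - math.log1p(x) for g, x in zip(gamma, xi)]
    return sum(terms) / len(terms)

# Hypothetical thresholds indexed by (decision at n-1, decision at n-2):
# lower after speech, higher after non-speech.
THRESHOLDS = {(1, 1): 0.1, (1, 0): 0.2, (0, 1): 0.3, (0, 0): 0.5}

def vad(frames, thresholds=THRESHOLDS):
    """frames: iterable of (gamma, xi) lists; returns 1 (speech) or 0 (non-speech) per frame."""
    prev1, prev2 = 0, 0
    decisions = []
    for gamma, xi in frames:
        d = 1 if mean_log_lr(gamma, xi) > thresholds[(prev1, prev2)] else 0
        decisions.append(d)
        prev1, prev2 = d, prev1  # shift the two-frame decision history
    return decisions

speech = ([10.0, 8.0], [1.0, 1.0])
noise = ([1.0, 1.0], [0.0, 0.0])
borderline = ([1.9], [1.0])  # mean log LR of about 0.257
print(vad([speech, speech, borderline]), vad([noise, noise, borderline]))
```

The same borderline frame is kept as speech after two speech frames (lower threshold) but rejected after two non-speech frames (higher threshold), which is the smoothing that conditioning on previous decisions is meant to provide.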

A perception-based analysis of voice onset time (VOT) dissimilation in Korean

  • Hijo Kang;Mira Oh
    • Phonetics and Speech Sciences / v.16 no.1 / pp.25-31 / 2024
  • This study examines the perceptual motivation behind dissimilation. Consistent with previous arguments that dissimilation originates from perception rather than production (Coetzee, 2005; Kiparsky, 2003; Scheer, 2013), we hypothesized that an oral stop with a short voice onset time (VOT) would be recognized as non-aspirated more often when followed by an aspirated stop with a long VOT. This hypothesis was tested in a perception experiment in which 32 Korean listeners judged the first consonant of C1VC2V words in which the C1 VOT and the C2 type were manipulated. The results revealed that aspirated-based C1 was recognized as aspirated or tense depending on the VOT duration, while lenis-based C1 was consistently recognized as lenis. The dissimilatory effect of aspirated C2 was confirmed as anticipated; furthermore, tense C2 increased the proportion of tense responses more than aspirated C2 did. These results provide evidence of a perceptual bias against recurrent aspirated stops, which may play a role in activating a dissimilatory rule or constraint in a language. The assimilatory effect of tense C2 is consistent with findings that word-initial tensification is facilitated by a following tense stop in Korean (Kang & Oh, 2016; H. Kim, 2016).

A Study On Generation and Reduction of the Notation Candidate for the Notation Restoration of Korean Phonetic Value (한국어 음가의 표기 복원을 위한 표기 후보 생성 및 감소에 관한 연구)

  • Rhee, Sang-Burm;Park, Sung-Hyun
    • The KIPS Transactions:PartB / v.11B no.1 / pp.99-106 / 2004
  • Syllable restoration is the process of restoring a phonetic value recognized by a speech recognizer to the written (notation) form from which it was pronounced. In this paper, syllable restoration rules were composed based on the standard pronunciation rules of Korean, and these rules were used to develop a method for generating a set of notation candidates. We also studied how to reduce the number of generated notation candidates, proposing a three-phase reduction process that removes candidates containing non-notation syllables, non-vocabulary syllables, and non-stem syllables. In the experiments, the number of notation candidates decreased by 74% on average.
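
Candidate generation and the three-phase reduction can be pictured as a Cartesian product of per-syllable spelling options followed by successive filters. In this sketch the romanized syllables, the spelling options, the three filter sets and what each filter tests are hypothetical stand-ins for the paper's rule base, chosen only to show the mechanics and how a reduction rate like the reported 74% would be computed.

```python
from itertools import product

def generate_candidates(options_per_syllable):
    """Every spelling obtained by choosing one option for each recognized syllable."""
    return [tuple(combo) for combo in product(*options_per_syllable)]

def reduce_candidates(candidates, phases):
    """Apply filter phases in order; return survivors and the count after each phase."""
    counts = [len(candidates)]
    for keep in phases:
        candidates = [c for c in candidates if keep(c)]
        counts.append(len(candidates))
    return candidates, counts

def reduction_rate(counts):
    """Fraction of the initial candidates removed by all phases together."""
    return 1.0 - counts[-1] / counts[0] if counts[0] else 0.0

# Hypothetical romanized syllables and filter sets, for illustration only.
options = [["ga", "gak"], ["gi", "i"], ["da", "tta"]]
written = {"ga", "gak", "gi", "i", "da"}   # phase 1: syllables that occur in spelling
vocabulary = {"ga", "gak", "i", "da"}      # phase 2: syllables that occur in the lexicon
stems = {"gak"}                            # phase 3: syllables that can start a stem

phases = [
    lambda c: all(s in written for s in c),
    lambda c: all(s in vocabulary for s in c),
    lambda c: c[0] in stems,
]
survivors, counts = reduce_candidates(generate_candidates(options), phases)
print(survivors, counts, reduction_rate(counts))
```

Ordering the phases from cheapest to most expensive check is a natural design choice, since each phase only has to examine the candidates the previous one left.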

Experiments on Extraction of Non-Parametric Warping Functions for Speaker Normalization (화자 정규화를 위한 비정형 워핑함수 도출에 관한 실험)

  • Shin, Ok-Keun
    • The Journal of the Acoustical Society of Korea / v.24 no.5 / pp.255-261 / 2005
  • In this paper, experiments are conducted to extract a set of non-parametric warping functions in order to examine the characteristics of warping among speakers' utterances. For this purpose, we used MFCC and LP spectra of vowels to choose a reference spectrum for each vowel as well as representative spectra for each speaker. These spectra are compared by dynamic time warping (DTW) to obtain the warping functions of each speaker. The set of warping functions is then defined by clustering the warping functions of all speakers. Noting that the male and female warping functions have shapes similar to a piecewise linear function and a power function, respectively, we define a new hybrid set of warping functions. The effectiveness of the extracted warping functions is evaluated through phone-level recognition experiments, and improvements in accuracy are observed with both sets of warping functions.
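
Deriving a warping function by comparing a reference spectrum with a speaker's spectrum through DTW can be sketched as follows. Here DTW runs along the frequency axis, and the warping function is read off the optimal path by averaging the speaker-side bin indices aligned to each reference bin. The absolute-difference local cost, the step pattern and the toy spectra are assumptions for illustration; the paper's exact DTW settings and its clustering step are not shown.

```python
def dtw_path(a, b):
    """Optimal monotonic alignment of sequences a and b (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]

def warping_function(ref_spectrum, spk_spectrum):
    """Map each reference frequency bin to the mean speaker bin aligned with it."""
    sums = [0.0] * len(ref_spectrum)
    counts = [0] * len(ref_spectrum)
    for i, j in dtw_path(ref_spectrum, spk_spectrum):
        sums[i] += j
        counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

ref = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0]  # toy reference vowel spectrum (peak at bin 2)
spk = [0.0, 0.0, 1.0, 5.0, 1.0, 0.0]  # same shape shifted up by one bin
print(warping_function(ref, spk))
```

Collecting such per-speaker warping functions and clustering them would give a set of non-parametric warps of the kind the paper describes; plotting them against the reference bin index is a simple way to see whether they look piecewise linear or power-like.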