• Title/Summary/Keyword: speech database

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.61-74 / 2004
  • In this paper, a contort- and speaker-dependent cepstrum extraction method and a channel normalization method that minimizes the loss of speaker characteristics in the cepstrum are proposed for speaker recognition that is robust to channel effects. The proposed extraction method creates a cepstrum through pitch synchronous analysis using the inherent pitch of the speaker. The resulting cepstrum, called the "pitch synchronous cepstrum" (PSC), therefore represents the impulse response of the vocal tract more accurately in voiced speech. The PSC can also compensate for channel distortion because pitch is more robust in a channel environment than the speech spectrum. The proposed channel normalization method, the "formant-broadened pitch synchronous CMS" (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared text-independent closed-set speaker identification on 56 female and 112 male speakers using the TIMIT and NTIMIT databases, respectively. The results show that the pitch synchronous cepstrum improves the error reduction rate by up to 7.7% compared with the conventional short-time cepstrum, and that the error rates of FBPSCMS are more stable and lower than those of pole-filtered CMS.
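
For orientation, the plain cepstral mean subtraction (CMS) that the normalization methods above build on can be sketched as follows. This is conventional CMS only, not the formant-broadened or pitch synchronous variant proposed in the paper; the function name and the toy cepstra are illustrative assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Plain CMS: subtract the per-coefficient mean taken over all frames."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data (assumed): a stationary channel adds a constant offset
# to every cepstral frame.
clean = [[1.0, -0.5], [0.0, 0.5], [-1.0, 0.0]]
channel = [0.3, -0.2]
distorted = [[c + h for c, h in zip(frame, channel)] for frame in clean]

normalized = cepstral_mean_subtraction(distorted)
```

Because the toy `clean` frames already have zero mean, `normalized` recovers them up to rounding, which illustrates why a constant channel offset is removed by mean subtraction.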

Bridging the Gap between Research in Linguistics and English Teaching Pedagogy: Focusing on English Pronunciation Education

  • Kwon, Bo-Young
    • Phonetics and Speech Sciences / v.1 no.4 / pp.73-84 / 2009
  • Despite the growing interest among researchers in the field of second language (L2) phonological acquisition and its apparent contribution to linguistic and acquisition theories, there have been concerns about the lack of pedagogical application of the research findings in L2 classrooms (Levis, 1999; Derwing & Munro, 2005). Based on the belief that meeting an existing pedagogic need should receive primary attention in SLA, this study attempts to bridge the gap between L2 pronunciation research and pronunciation pedagogy. To that end, it provides a narrative literature review of papers on L2 pronunciation published in Korea from 1994 to 2008. The articles for review were retrieved from five database search engines. In addition, the six journals in which relevant articles appeared most frequently were selected and searched electronically. A total of 117 articles that met the selection criteria were collected and reviewed to answer the following three research questions: a) What are the current research trends in L2 pronunciation in Korea? b) Do the research trends reflect a shift of focus in L2 pronunciation teaching? and c) What is the range of research practices in L2 pronunciation? The review indicates that the number of studies on L2 pronunciation increased sharply from 1999 to 2003. Some changes in research topics were also noticed. Research on segmental features of English was dominant from 1994 to 1998, but became more balanced with research on suprasegmentals from 2004 to 2008. This review also discusses the range of research practices in L2 pronunciation and makes suggestions for future directions in L2 pronunciation research.

A Study on the Korean Syllable As Recognition Unit (인식 단위로서의 한국어 음절에 대한 연구)

  • Kim, Yu-Jin;Kim, Hoi-Rin;Chung, Jae-Ho
    • The Journal of the Acoustical Society of Korea / v.16 no.3 / pp.64-72 / 1997
  • In this paper, we study and run experiments to find a recognition unit suitable for large-vocabulary recognition systems. Specifically, we select the phoneme, which is the currently used recognition unit, and the syllable, which characterizes Korean well. Through comparative recognition experiments, we examine whether the syllable can serve as the recognition unit of a Korean recognition system. To report objective comparison results, we collected speech data from one male speaker and hand-segmented and labeled the phoneme boundaries to construct a speech database. For HMM-based training and recognition, we used HTK (HMM Tool Kit) 2.0, a commercial tool from Entropic Co., so that both units were tested under the same conditions. We applied two continuous-HMM topologies, 5 states with 3 emitting states and 8 states with 6 emitting states, to train each recognition unit. We used 3 sets of PBW (Phonetically Balanced Words) and 1 set of POW (Phonetically Optimized Words) for training and another set of PBW for recognition, that is, a speaker-dependent, medium-vocabulary recognition task. The experiments show a recognition rate of 95.65% for the phoneme unit and 94.41% for the syllable unit, while decoding with the syllable unit is 25% faster than with the phoneme unit.

News Data Analysis Using Acoustic Model Output of Continuous Speech Recognition (연속음성인식의 음향모델 출력을 이용한 뉴스 데이터 분석)

  • Lee, Kyong-Rok
    • The Journal of the Korea Contents Association / v.6 no.10 / pp.9-16 / 2006
  • In this paper, the acoustic model output of CSR (Continuous Speech Recognition) was used to analyze news data. The news database used in the experiment consisted of 2,093 articles. Because of the low efficiency of the language model, conventional Korean CSR is not suited to news data analysis. This problem was handled successfully by post-processing the recognition results of the acoustic model, which is more robust than the language model in the Korean environment. The post-processing result was stored as a KIF (keyword information file). With the threshold on the acoustic model's output level set to 100, 86.9% of all target morphemes were included in the post-processing result. Under the same condition, with length-based normalization applied, 81.25% of all target morphemes were recognized. The purpose of the normalization was to compensate for long morphemes. According to the experimental results, 75.13% of all target morphemes were recognized. The KIF (314 MB) was produced from the original news data (5,040 MB), a reduction in the absolute amount of information of approximately 93.8%.
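
The 93.8% figure follows directly from the two sizes quoted in the abstract. A minimal check, where the function name is an illustrative assumption and only the two sizes come from the source:

```python
def reduction_rate(original_mb, reduced_mb):
    """Fraction of the original size that was removed."""
    return (original_mb - reduced_mb) / original_mb


# Sizes quoted in the abstract: 5,040 MB of news data, 314 MB of KIF.
rate = reduction_rate(5040, 314)
print(f"{rate:.1%}")  # prints 93.8%
```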

Isolated Digit and Command Recognition in Car Environment (자동차 환경에서의 단독 숫자음 및 명령어 인식)

  • 양태영;신원호;김지성;안동순;이충용;윤대희;차일환
    • The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.11-17 / 1999
  • This paper proposes an observation probability smoothing technique to improve the robustness of a speech recognizer based on the discrete hidden Markov model (DHMM). It also suggests, based on experimental results, noise-robust processing appropriate for the car environment. Noisy speech is often mislabeled during vector quantization. To reduce the effect of such mislabeling, the proposed technique increases the observation probability of similar codewords. For noise-robust processing in the car environment, liftering of the distance measure between feature vectors, high-pass filtering, and spectral subtraction are examined. Recognition experiments were performed on 14 isolated words consisting of Korean digits and command words. The database was recorded in stopped-car and moving-car environments. The baseline recognizer achieved recognition rates of 97.4% in the stopped car and 59.1% in the moving car. With the proposed observation probability smoothing, liftering, high-pass filtering, and spectral subtraction, the recognition rates rose to 98.3% in the stopped car and 88.6% in the moving car.
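
The abstract does not give the smoothing formula, so the following is only one possible sketch of the idea of raising the observation probability of similar codewords. The Gaussian similarity kernel, the `sigma` parameter, and all names are assumptions made for illustration, not the authors' method.

```python
import math


def smooth_observation_probs(probs, codebook, sigma=1.0):
    """Share each codeword's probability with nearby codewords, then renormalize.

    probs:    observation probabilities of one DHMM state, one per codeword.
    codebook: codeword vectors in the same order as probs.
    sigma:    assumed kernel width; larger values smooth more strongly.
    """
    n = len(probs)
    smoothed = []
    for j in range(n):
        total = 0.0
        for k in range(n):
            dist2 = sum((a - b) ** 2 for a, b in zip(codebook[j], codebook[k]))
            total += math.exp(-dist2 / (2 * sigma ** 2)) * probs[k]
        smoothed.append(total)
    norm = sum(smoothed)
    return [s / norm for s in smoothed]


# Toy case (assumed): codewords 0 and 1 are close, codeword 2 is far away.
codebook = [[0.0], [0.1], [5.0]]
probs = [1.0, 0.0, 0.0]
smoothed = smooth_observation_probs(probs, codebook, sigma=0.5)
```

In the toy case, codeword 1 receives a substantial share of the probability because it lies close to codeword 0, while the distant codeword 2 stays near zero. A noisy frame mislabeled as codeword 1 is therefore no longer assigned zero probability.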

An Implementation of Rejection Capabilities in the Isolated Word Recognition System (고립단어 인식 시스템에서의 거절기능 구현)

  • Kim, Dong-Hwa;Kim, Hyung-Soon;Kim, Young-Ho
    • The Journal of the Acoustical Society of Korea / v.16 no.6 / pp.106-109 / 1997
  • A practical isolated word recognition system must be able to reject out-of-vocabulary (OOV) words. In this paper, we present a rejection method that combines clustered phoneme modeling with postprocessing by likelihood ratio scoring. Our baseline speech recognition system was based on whole-word continuous HMMs. Six clustered phoneme models were generated by a statistical method from 45 context-independent phoneme models, which were trained on a phonetically balanced speech database. A test of rejection performance on a speaker-independent isolated word recognition task with 22 section names shows that our method is superior to the conventional postprocessing method, which performs rejection according to the likelihood difference between the first and second candidates. Furthermore, these clustered phoneme models do not require retraining for other isolated word recognition systems with different vocabulary sets.
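
The two rejection rules contrasted in the abstract can be sketched as simple threshold tests. The per-frame normalization, the threshold values, and all names below are illustrative assumptions; the abstract does not specify them.

```python
def accept_by_likelihood_ratio(word_loglik, filler_loglik, n_frames, threshold):
    """Accept when the per-frame log-likelihood ratio exceeds the threshold.

    word_loglik:   log-likelihood of the best whole-word model.
    filler_loglik: log-likelihood from the clustered phoneme (filler) models.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (word_loglik - filler_loglik) / n_frames > threshold


def accept_by_candidate_difference(first_loglik, second_loglik, threshold):
    """Conventional rule: accept when the best candidate clearly beats the second."""
    return first_loglik - second_loglik > threshold


# Hypothetical scores for a 100-frame utterance.
in_vocab = accept_by_likelihood_ratio(-500.0, -560.0, 100, threshold=0.5)
out_of_vocab = accept_by_likelihood_ratio(-500.0, -510.0, 100, threshold=0.5)
```

With these made-up scores the first utterance is accepted and the second is rejected, since its word model barely outscores the filler models.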

Speaker Identification Using Higher-Order Statistics In Noisy Environment (고차 통계를 이용한 잡음 환경에서의 화자식별)

  • Shin, Tae-Young;Kim, Gi-Sung;Kwon, Young-Uk;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea / v.16 no.6 / pp.25-35 / 1997
  • Most speech analysis methods developed to date are based on second-order statistics, and one of their biggest drawbacks is dramatic performance degradation in noisy environments. In contrast, methods using higher-order statistics (HOS), which suppress Gaussian noise, enable robust feature extraction in noisy environments. In this paper, we propose a text-independent speaker identification system using higher-order statistics and compare its performance with that of the conventional method based on second-order statistics in both white and colored noise environments. The proposed speaker identification system is based on the vector quantization approach and employs an HOS-based voiced/unvoiced detector to extract feature parameters for voiced speech only, which has a non-Gaussian distribution and is known to contain most speaker-specific characteristics. Experimental results on a 50-speaker database show that the HOS-based method gives better identification performance than the conventional second-order-statistics-based method in noisy environments.
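
The vector quantization approach mentioned above is commonly realized by scoring a test utterance against one codebook per speaker and choosing the speaker with the lowest average distortion. The sketch below shows only that generic decision rule. The squared Euclidean distance, the toy codebooks, and all names are assumptions, and the HOS-based feature extraction of the paper is not modeled.

```python
def average_distortion(features, codebook):
    """Mean squared distance from each feature vector to its nearest codeword."""
    total = 0.0
    for x in features:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook)
    return total / len(features)


def identify_speaker(features, codebooks):
    """Return the speaker whose codebook yields the lowest average distortion."""
    return min(codebooks, key=lambda spk: average_distortion(features, codebooks[spk]))


# Hypothetical two-speaker setup with two-dimensional features.
codebooks = {
    "speaker_a": [[0.0, 0.0], [1.0, 1.0]],
    "speaker_b": [[5.0, 5.0], [6.0, 6.0]],
}
test_features = [[0.1, 0.0], [0.9, 1.2]]
best = identify_speaker(test_features, codebooks)
```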

Speaker Verification System Using Continuants and Multilayer Perceptrons (지속음 및 다층신경망을 이용한 화자증명 시스템)

  • Lee, Tae-Seung;Park, Sung-Won;Hwang, Byong-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.10a / pp.1015-1020 / 2003
  • Among biometric techniques for protecting private information, speaker verification is expected to be widely used because it is convenient to use and inexpensive to implement. A speaker verification system should achieve a high degree of reliability in the verification score, flexibility in speech text usage, and efficiency in system complexity. Continuants have excellent speaker-discriminant power and form a category with a modest number of phonemes, and multilayer perceptrons (MLPs) have superior recognition ability and fast operation speed. Consequently, the two provide viable ways for a speaker verification system to obtain the above properties. This paper implements a system to which continuants and MLPs are applied, and evaluates the system using a Korean speech database. The results of the experiment show that continuants and MLPs enable the system to acquire the three properties.

Improving Speaker Enrolling Speed for Speaker Verification Systems Based on Multilayer Perceptrons by Using a Qualitative Background Speaker Selection (정질적 기준을 이용한 다층신경망 기반 화자증명 시스템의 등록속도 단축방법)

  • 이태승;황병원
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.360-366 / 2003
  • Although multilayer perceptrons (MLPs) offer several advantages over other pattern recognition methods, MLP-based speaker verification systems suffer from slow enrollment because many background speakers are needed to achieve a low verification error. To solve this problem, the quantitative discriminative cohort speakers (QnDCS) method introduced the cohort speakers method into such systems and reduced the number of background speakers required to enroll a speaker. Although QnDCS achieved this goal to some extent, the improvement in enrollment speed was still unsatisfactory. To improve the enrollment speed further, this paper proposes the qualitative DCS (QlDCS) method, which introduces a qualitative criterion to select fewer background speakers. Both methods are evaluated in an experiment using a speaker verification system based on MLPs and continuants, and a speech database. The results show that the proposed QlDCS method enrolls speakers twice as fast as QnDCS with the online error backpropagation (EBP) method.

A Design and Implementation of Online Exhibition Application for Disabled Artists

  • Seung Gyeom Kim;Ha Ram Kang;Tae Hun Kim;Jun Hyeok Lee;Won Joo Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.8 / pp.77-84 / 2024
  • In this paper, we design and implement an Android-based online exhibition application that showcases the artistic works of disabled artists. The application is designed for the convenience of disabled artists and, in particular, provides STT (Speech-to-Text) and TTS (Text-to-Speech) features for visually and hearing impaired individuals. In addition, the application verifies disability status during registration using disability certificates and registration numbers, ensuring that only authenticated disabled artists can exhibit their works. The database storing the artists' personal information and information about the art pieces is implemented in MySQL. The server module uses a REST API to transmit data in JSON format. Because art piece data is large, it is stored in Firebase Storage, which removes data size limitations on the server. This application can alleviate issues such as the lack of exhibition space for disabled artists and the lack of communication with the general public.