• Title/Summary/Keyword: vowel recognition

Sentence design for speech recognition database

  • Zu Yiqing
    • Proceedings of the KSPS conference / 1996.10a / pp.472-472 / 1996
  • The material of a speech recognition database should cover as many phonetic phenomena as possible. At the same time, such material should be phonetically compact, with low redundancy [1, 2]. The phonetic phenomena of continuous speech are the key problem in speech recognition. This paper describes the processing of a set of sentences collected from the 1993 and 1994 "People's Daily" (Chinese newspaper) database, covering news, politics, economics, arts, sports, etc. These sentences include both phonetic phenomena and sentence patterns. In continuous speech, phonemes always appear as allophones, which arise from co-articulatory effects. Designing a speech database should address both intra-syllabic and inter-syllabic allophone structures. In our experiments, read speech contains 404 syllables, 415 inter-syllabic diphones, 3050 merged inter-syllabic triphones and 2161 merged final-initial structures. Statistics on the "People's Daily" database provide an evaluation of all possible phonetic structures. In this sentence set, we first consider the phonetic balance among syllables, inter-syllabic diphones, inter-syllabic triphones and semi-syllables with their junctures. Syllabic balance ensures coverage of intra-syllabic phenomena such as phonemes, initials/finals and consonants/vowels; the remaining units describe the inter-syllabic juncture. The 1560 sentences cover 96% of toneless syllables (the absent syllables are used only in spoken language), 100% of inter-syllabic diphones, and 67% of inter-syllabic triphones (87% of the triphones that appear in People's Daily). There are roughly 17 kinds of sentence patterns in our sentence set. By taking the transitions between syllables into account, Chinese speech recognition systems have achieved significantly higher recognition rates [3, 4]. The following figure shows the process of collecting sentences.
[People's Daily database] -> [sentence segmentation] -> [word-group segmentation] -> [conversion of text into Pinyin] -> [statistics of phonetic phenomena & selection of useful paragraphs] -> [manual modification of selected sentences] -> [phonetically compact sentence set]
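The selection step in the figure (choosing sentences that are phonetically balanced yet compact) can be illustrated with a greedy coverage heuristic. This is a minimal Python sketch under stated assumptions, not the authors' procedure: each sentence is assumed to be already converted to toneless Pinyin syllables, and adjacent syllable pairs stand in for inter-syllabic diphones (the paper's diphone and triphone units are finer-grained). The names `coverage_units` and `greedy_select` are hypothetical.

```python
def coverage_units(syllables: list[str]) -> set:
    """Units covered by one sentence: its syllables plus adjacent-syllable pairs.

    Adjacent pairs are a simplified stand-in for inter-syllabic diphones (assumption).
    """
    return set(syllables) | set(zip(syllables, syllables[1:]))


def greedy_select(sentences: dict[str, list[str]], max_sentences: int) -> list[str]:
    """Repeatedly pick the sentence that adds the most not-yet-covered units."""
    unit_sets = {sid: coverage_units(sylls) for sid, sylls in sentences.items()}
    covered: set = set()
    chosen: list[str] = []
    while len(chosen) < max_sentences:
        candidates = [sid for sid in unit_sets if sid not in chosen]
        if not candidates:
            break
        # Ties are broken by dictionary order (the first candidate wins).
        best = max(candidates, key=lambda sid: len(unit_sets[sid] - covered))
        if not unit_sets[best] - covered:
            break  # no remaining sentence adds anything new
        covered |= unit_sets[best]
        chosen.append(best)
    return chosen


sentences = {
    "s1": ["ni", "hao", "ma"],
    "s2": ["ni", "hao"],          # fully redundant once s1 is chosen
    "s3": ["ma", "ma", "hao"],
}
print(greedy_select(sentences, max_sentences=10))  # ['s1', 's3']
```

Greedy selection is a standard approximation for set-cover style problems; stopping when no sentence adds a new unit is what keeps the resulting set compact.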

A Review on the Models of Letter Transposition Effect and Exploration of Hangul Model (단어재인에 있어서 글자교환 효과와 한글 처리 모형 탐색)

  • Lee, Chang H.;Lee, Yoonhyoung
    • Korean Journal of Cognitive Science / v.25 no.1 / pp.1-24 / 2014
  • A growing body of studies focuses on the letter transposition effect, since it provides information on how letters are coded and which variables are involved in word recognition. This review examines various models of the letter transposition effect. While most proposed models rely mainly on bottom-up processes, evidence from various studies suggests that top-down variables based on the cognitive processing mechanism are necessary. In particular, empirical evidence suggests that a Hangul model should include a position-specific processing mechanism based on the onset, vowel, and coda of the Korean character.

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea / v.32 no.2 / pp.167-173 / 2013
  • This work examines the classification of diphthongs as part of a distinctive-feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and in 4-way classification, error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, respectively. Adding the acoustic features to the widely used Mel-frequency cepstral coefficients also improves classification.
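The abstract reports a balanced error rate without defining it. A common definition, assumed here, is the unweighted mean of the per-class error rates, which keeps a larger class from hiding errors on a smaller one. The Python sketch below computes it; the class labels "D" and "M" are hypothetical.

```python
from collections import Counter


def balanced_error_rate(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class error rates (assumed definition)."""
    totals = Counter(y_true)
    errors = Counter(t for t, p in zip(y_true, y_pred) if t != p)
    return sum(errors[c] / totals[c] for c in totals) / len(totals)


# Hypothetical 2-way example: 4 tokens of class "D", 8 tokens of class "M".
y_true = ["D"] * 4 + ["M"] * 8
y_pred = ["D", "D", "D", "M"] + ["M"] * 7 + ["D"]
print(balanced_error_rate(y_true, y_pred))  # (1/4 + 1/8) / 2 = 0.1875
```

With one error in each class, the plain error rate is 2/12 (about 0.167), while the balanced rate weights the smaller class's 25% error equally with the larger class's 12.5%.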

Experiments on Extraction of Non-Parametric Warping Functions for Speaker Normalization (화자 정규화를 위한 비정형 워핑함수 도출에 관한 실험)

  • Shin, Ok-Keun
    • The Journal of the Acoustical Society of Korea / v.24 no.5 / pp.255-261 / 2005
  • In this paper, experiments are conducted to extract a set of non-parametric warping functions in order to examine the characteristics of warping among speakers' utterances. For this purpose, we use the MFCC and LP spectra of vowels to choose a reference spectrum for each vowel as well as representative spectra for each speaker. These spectra are compared by DTW to obtain the warping functions of each speaker. The set of warping functions is then defined by clustering the warping functions of all speakers. Noting that male and female warping functions have shapes similar to a piecewise linear function and a power function, respectively, we define a new hybrid set of warping functions. The effectiveness of the extracted warping functions is evaluated through phone-level recognition experiments, and improvements in accuracy are observed for both sets of warping functions.
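The DTW comparison of spectra can be sketched as follows. This is an illustration under assumptions the abstract does not state: each spectrum is a 1-D array of magnitudes over frequency bins, the local cost is the absolute difference, and the step pattern is the basic one (diagonal, vertical, horizontal). The returned path maps each reference frequency bin to the speaker's bins, which is the kind of mapping a non-parametric frequency warping function records. The function name `dtw_warping` is hypothetical.

```python
import numpy as np


def dtw_warping(ref, spk):
    """Align two spectra along the frequency axis with dynamic time warping.

    Returns (total_cost, path), where path is a list of (ref_bin, spk_bin) pairs.
    """
    ref = np.asarray(ref, dtype=float)
    spk = np.asarray(spk, dtype=float)
    n, m = len(ref), len(spk)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(ref[i - 1] - spk[j - 1])
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor cell.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(cost[n, m]), path


ref = [0, 1, 5, 1, 0, 0]  # reference spectrum, peak at bin 2
spk = [0, 0, 1, 5, 1, 0]  # speaker spectrum, same peak shifted up to bin 3
total, path = dtw_warping(ref, spk)
print(total)  # 0.0
print(path)
```

Because the speaker spectrum is a shifted copy, the optimal alignment costs nothing and pairs reference bin 2 with speaker bin 3; clustering many such paths is what the abstract describes next.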

A Study on Printed Hangeul Recognition with Dynamic Jaso Segmentation and Neural Network (동적자소분할과 신경망을 이용한 인쇄체 한글 문자인식기에 관한 연구)

  • 이판호;장희돈;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.11 / pp.2133-2146 / 1994
  • In this paper, we present a method for dynamic Jaso segmentation and Hangeul recognition using a neural network. It uses feature vectors extracted from a mesh that depends on the segmentation result. First, each character is converted to a 256-dimensional feature vector using four-direction contributivity and an $8\times8$ mesh. Next, the character is classified into one of 6 classes by a neural network and is segmented into Jaso using the classification result, statistical vowel-location information, and structural information. After Jaso segmentation, Hangeul recognition is performed by a neural network. We experiment on four fonts: three are used to train the neural network and the remaining one is used for testing. Each font contains the 2350 characters defined in KS C 5601. The overall recognition rates for the training data and the testing data are 97.4% and approximately 94%, respectively. These results show the effectiveness of the proposed method.
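The 256-dimensional vector follows from 64 mesh cells times 4 directions. The Python sketch below shows one simplified way to build such a vector; the exact contributivity definition and normalization used in the paper are not given in the abstract. Here, as an assumption, each black pixel's directional value is the length of the black run through that pixel in each of four directions (horizontal, vertical, two diagonals), normalized to unit length, and these values are summed per mesh cell. The names `run_length` and `mesh_features` are hypothetical.

```python
import numpy as np

# Four line directions as (row step, column step): horizontal, vertical, two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def run_length(img, r, c, dr, dc):
    """Length of the black run through pixel (r, c) along direction (dr, dc), both ways."""
    h, w = img.shape
    length = 1
    for sign in (1, -1):
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < h and 0 <= cc < w and img[rr, cc]:
            length += 1
            rr += sign * dr
            cc += sign * dc
    return length


def mesh_features(img, mesh=8):
    """Sum unit-normalized 4-direction run lengths of black pixels per mesh cell."""
    img = np.asarray(img, dtype=bool)
    h, w = img.shape
    feats = np.zeros((mesh, mesh, len(DIRECTIONS)))
    for r, c in zip(*np.nonzero(img)):
        d = np.array([run_length(img, r, c, dr, dc) for dr, dc in DIRECTIONS], dtype=float)
        feats[r * mesh // h, c * mesh // w] += d / np.linalg.norm(d)
    return feats.ravel()  # mesh * mesh * 4 values (256 for an 8 x 8 mesh)


char = np.zeros((32, 32), dtype=bool)
char[5, :] = True  # a single horizontal stroke
vec = mesh_features(char)
print(vec.shape)  # (256,)
```

For the horizontal stroke, the horizontal component dominates every affected cell, and only mesh row 1 (image rows 4 to 7) receives any weight.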

A Study on the Preprocessing Method Using Construction of Watershed for Character Image segmentation

  • Nam Sang Yep;Choi Young Kyoo;Kwon Yun Jung;Lee Sung Chang
    • Proceedings of the IEEK Conference / 2004.08c / pp.814-818 / 2004
  • Off-line handwritten character recognition suffers from incomplete preprocessing because it lacks dynamic and timing information, and it must cope with varied handwriting, extreme overlap between consonants and vowels, and many erroneous stroke images. Consequently, off-line handwritten character recognition requires the study of various preprocessing methods such as binarization and thinning. This paper considers the running time of the watershed algorithm and the quality of the resulting image as preprocessing for off-line handwritten Korean character recognition. It therefore proposes applying an effective watershed algorithm to separate the character region from the background region in gray-level character images, together with a segmentation function for binarization based on the extracted watershed image. It also proposes a thinning method that effectively extracts the skeleton through a conditional test mask, taking running time and skeleton quality into account, and evaluates the efficiency of existing methods and the proposed methods in terms of running time and quality. The watershed image conversion uses the Prewitt operator to compute the gradient image and extracts local minima considering the 8-neighborhood of each pixel. A method based on the difference of mean values is used in the region-merging step. When the segmentation function is applied, the watershed image converted in this way effectively separates the character region from the background region. The average execution time was 2.16 seconds for the previous method and 1.72 seconds for the proposed method. We show that the proposed method removes noise in overlapping strokes more effectively than the previous method.
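The first two steps of the watershed conversion described above (a Prewitt gradient image, then local minima over each pixel's 8-neighborhood) can be sketched in Python with NumPy. This is an illustration, not the authors' code: the flooding, the mean-difference region merging and the segmentation function are omitted. Two choices are assumptions: border pixels are handled by edge replication, and plateau pixels count as minima because the comparison is non-strict. The function names are hypothetical.

```python
import numpy as np


def prewitt_gradient(img):
    """Gradient magnitude from the 3x3 Prewitt operator (edge-replicated borders)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    # Horizontal derivative: right column sum minus left column sum.
    gx = (p[:-2, 2:] + p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + p[1:-1, :-2] + p[2:, :-2])
    # Vertical derivative: bottom row sum minus top row sum.
    gy = (p[2:, :-2] + p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def local_minima_8(grad):
    """True where a pixel is <= all 8 neighbours (plateaus count as minima)."""
    grad = np.asarray(grad, dtype=float)
    h, w = grad.shape
    p = np.pad(grad, 1, mode="constant", constant_values=np.inf)
    is_min = np.ones((h, w), dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            is_min &= grad <= p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
    return is_min


step = np.array([[0, 0, 10, 10]] * 4)  # dark-to-bright vertical edge
print(prewitt_gradient(step)[0])  # gradient row: 0, 30, 30, 0
```

The minima found this way would serve as the seeds from which a watershed flooding starts.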

Lip-reading System based on Bayesian Classifier (베이지안 분류를 이용한 립 리딩 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Society of Industrial Information Systems / v.25 no.4 / pp.9-16 / 2020
  • Pronunciation recognition systems that use only video information and ignore voice information can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels via lip shapes in images. We extract feature vectors from the lip shapes of facial images and apply them to the designed machine learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and the system's average recognition rate is approximately 84%, which is higher than that of the CNN tested for comparison. Our results show that our Bayesian classification method with feature values from lip region landmarks is efficient on a small training set. Therefore, it can be used for application development on limited hardware such as mobile devices.
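The abstract does not specify the form of the Bayesian classifier. A Gaussian naive Bayes model is one common choice for small training sets and is assumed in the Python sketch below. The two features per image (mouth width and mouth opening, normalized by face width), the vowel labels and the training values are hypothetical stand-ins for the lip-landmark feature vectors described in the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Per class: log prior plus per-feature means and variances (features assumed independent)."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    model = {}
    for label, rows in groups.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical features: (mouth width, mouth opening), normalized by face width.
X = [[0.50, 0.40], [0.52, 0.42], [0.48, 0.38],   # 'a': wide open
     [0.70, 0.10], [0.72, 0.12], [0.68, 0.08],   # 'i': spread, nearly closed
     [0.30, 0.20], [0.32, 0.22], [0.28, 0.18]]   # 'u': rounded, narrow
y = ["a"] * 3 + ["i"] * 3 + ["u"] * 3
model = fit_gaussian_nb(X, y)
print(predict(model, [0.51, 0.39]))  # a
```

Such a model has only a mean and a variance per feature and class to estimate, which fits the abstract's point about small training sets and limited hardware.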

Improvements on Speech Recognition for Fast Speech (고속 발화음에 대한 음성 인식 향상)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.25 no.2 / pp.88-95 / 2006
  • In this paper, a method for improving the performance of an automatic speech recognition (ASR) system for conversational speech is proposed, which mainly focuses on increasing robustness against rapidly spoken utterances. The proposed method does not require an additional speech recognition pass to quantify the speaking rate. The energy distribution in specific frequency bands is used to detect vowel regions, and the number of vowels per second is then computed as the speaking rate. To improve performance on fast speech, previous methods expand the sequence of feature vectors by a given scaling factor, computed as the ratio between the standard phoneme duration and the measured one. In the method proposed here, however, utterances are classified by speaking rate, and the scaling factor is determined individually for each class using a maximum likelihood criterion. ASR experiments designed around 10-digit mobile phone numbers confirm that the proposed method reduces the overall error rate by $17.8\%$.
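The rate estimate and the feature-sequence expansion can be sketched in Python with NumPy. Assumptions: vowel regions have already been detected (the band-energy detector is not shown), expansion is done by linear interpolation along time, and the rate-class boundaries and per-class scaling factors below are hypothetical placeholders for the values the paper selects with a maximum likelihood criterion.

```python
import bisect

import numpy as np


def speaking_rate(num_vowels: int, duration_s: float) -> float:
    """Speaking rate as the number of detected vowels per second."""
    return num_vowels / duration_s


def class_scale(rate: float, boundaries: list[float], scales: list[float]) -> float:
    """Look up the scaling factor of the speaking-rate class that `rate` falls into."""
    return scales[bisect.bisect_right(boundaries, rate)]


def expand_features(feats, scale: float):
    """Stretch a (T, D) feature sequence to round(T * scale) frames by linear interpolation."""
    feats = np.asarray(feats, dtype=float)
    t = feats.shape[0]
    new_t = max(1, int(round(t * scale)))
    src = np.linspace(0.0, t - 1, new_t)  # fractional source-frame positions
    return np.stack([np.interp(src, np.arange(t), feats[:, d])
                     for d in range(feats.shape[1])], axis=1)


# Hypothetical classes: below 4, 4 to 6, and 6 or more vowels per second.
BOUNDARIES = [4.0, 6.0]
SCALES = [1.0, 1.15, 1.3]

rate = speaking_rate(num_vowels=14, duration_s=2.0)  # 7.0 vowels/s, fastest class
scale = class_scale(rate, BOUNDARIES, SCALES)        # 1.3
frames = np.arange(20, dtype=float).reshape(10, 2)   # 10 frames, 2-dim features
print(expand_features(frames, scale).shape)          # (13, 2)
```

In the earlier methods the abstract mentions, `scale` would instead be the ratio of standard to measured phoneme duration, for example 100 ms / 80 ms = 1.25.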

A Study on the Formant Comparison of Korean Monophthongs according to Age and Gender -A Survey on Patients in Oriental Hospitals- (연령 및 성별에 따른 한국인 단모음 포먼트 비교에 관한 연구 -한방병원 내원환자를 중심으로-)

  • Kim, Young-Su;Kim, Keun Ho;Kim, Jong Yeol;Jang, Jun-Su
    • Phonetics and Speech Sciences / v.5 no.1 / pp.73-80 / 2013
  • Formants are among the essential vocal features for research on voice production, recognition and synthesis. Numerous studies have been conducted on foreign languages, including English vowels. However, studies of Korean have used only a limited amount of voice data. In this study, we compare four formants according to age and gender using a large number of Korean monophthongs. A total of 2614 Korean speakers participated in our experiments. We summarize the statistical results as the mean and standard deviation of each formant for five monophthongs. The results show notable differences between age and gender groups. Quantitative studies based on large datasets are suggested for future research on Korean speech sounds.
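The per-group summary (mean and standard deviation of each formant) amounts to a grouped computation. The Python sketch below assumes a hypothetical record layout of (age group, gender, vowel, F1, F2, F3, F4) in Hz; the values are made up for illustration and are not data from the study.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_formants(records):
    """Mean and sample standard deviation of F1-F4 per (age group, gender, vowel)."""
    groups = defaultdict(list)
    for age_group, gender, vowel, *formants in records:
        groups[(age_group, gender, vowel)].append(formants)
    summary = {}
    for key, rows in groups.items():
        summary[key] = [
            (mean(col), stdev(col) if len(col) > 1 else 0.0)  # stdev needs >= 2 values
            for col in zip(*rows)
        ]
    return summary


# Made-up example values in Hz (not from the study).
records = [
    ("20s", "F", "a", 800, 1400, 2800, 4000),
    ("20s", "F", "a", 900, 1500, 2900, 4100),
    ("60s", "M", "i", 300, 2100, 2900, 3600),
]
summary = summarize_formants(records)
print(summary[("20s", "F", "a")][0])  # F1 mean 850 Hz, SD about 70.7 Hz
```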

The Development of New Hangul Code "Truecode" and Its Applications (새로운 한글코드 “Truecode”의 개발과 응용)

  • 이문형;김기두
    • Journal of the Korean Institute of Telematics and Electronics B / v.30B no.5 / pp.43-51 / 1993
  • A new Hangul code called Truecode is developed to accommodate future computing environments based on graphical user interfaces and multimedia, and to correspond to the invention principle of Hangul. Truecode is not a forced two-byte syllable-unit code, like the completion-type or combination-type codes currently in use, but a one-byte phoneme-unit code that can represent the initial consonant, the vowel, and the final consonant each. It is quite different from a three-byte syllable-unit code, and it does not require the fill code used by three-byte codes. We expect Truecode to contribute greatly to Hangul culture through the following important features. It can express every Korean character we may imagine and does not cause any problems in communication. In addition to allowing direct-connection fonts, it allows a one-to-one correspondence between Truecode and a three-set keyboard. Truecode offers a clear advantage for developing Hangul application software, and it can be applied well to speech recognition and to artificial intelligence using natural language.
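The abstract does not give Truecode's code table, so the Python sketch below only illustrates the general idea of a one-byte-per-phoneme code. It decomposes precomposed Hangul syllables with the standard Unicode arithmetic (19 initial consonants, 21 vowels, 28 final slots including "none") and emits one byte each for the initial consonant, the vowel and the optional final consonant. The byte ranges `ONSET_BASE`, `VOWEL_BASE` and `CODA_BASE` are hypothetical and are not Truecode's actual values. Because the three ranges are disjoint, a syllable without a final consonant needs no fill code.

```python
# Unicode Hangul syllable arithmetic (U+AC00..U+D7A3).
S_BASE, V_COUNT, T_COUNT, S_COUNT = 0xAC00, 21, 28, 11172

# Hypothetical, disjoint one-byte ranges (NOT Truecode's real code table).
ONSET_BASE, VOWEL_BASE, CODA_BASE = 0x01, 0x20, 0x40


def encode(text: str) -> bytes:
    """Encode Hangul syllables as one byte per phoneme (onset, vowel, optional coda)."""
    out = bytearray()
    for ch in text:
        index = ord(ch) - S_BASE
        if not 0 <= index < S_COUNT:
            raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
        onset, rest = divmod(index, V_COUNT * T_COUNT)
        vowel, coda = divmod(rest, T_COUNT)
        out.append(ONSET_BASE + onset)
        out.append(VOWEL_BASE + vowel)
        if coda:  # coda 0 means "no final consonant": emit nothing, no fill code
            out.append(CODA_BASE + coda - 1)
    return bytes(out)


def decode(data: bytes) -> str:
    """Rebuild precomposed syllables from the phoneme byte stream."""
    chars = []
    i = 0
    while i < len(data):
        onset = data[i] - ONSET_BASE
        vowel = data[i + 1] - VOWEL_BASE
        i += 2
        coda = 0
        if i < len(data) and data[i] >= CODA_BASE:
            coda = data[i] - CODA_BASE + 1
            i += 1
        chars.append(chr(S_BASE + (onset * V_COUNT + vowel) * T_COUNT + coda))
    return "".join(chars)


print(encode("한글").hex())  # 132043013247
print(decode(encode("한글")))  # 한글
```

Here "한" becomes three bytes (ㅎ, ㅏ, ㄴ) while "가" becomes only two (ㄱ, ㅏ), which is the variable-length, phoneme-unit behaviour the abstract contrasts with fixed syllable-unit codes.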
