Search | Korea Science

Variation Analysis of Feature Parameters According to the Channel Distortion of Korean Telephone Digit Speech (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석)

정성윤;손종목;김민성;배건성
- Proceedings of the IEEK Conference / 2002.06d / pp.191-194 / 2002
The ultimate goal of this paper is to improve the speech recognition rate when the telephone environment of the training data matches that of the test data. To analyze the effect of telephone channel distortion, which changes with every call, MFCCs are used as the feature parameters, and cepstral mean normalization (CMN), RTCN, and RASTA are used as channel compensation techniques. For each case, the variation of the feature parameters of all phones is analyzed. We then measure the recognition rate for each compensation method using a continuous HMM recognizer and examine the relationship between feature variation and recognition rate.
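Of the three compensation methods, CMN is the simplest to illustrate. A fixed convolutive channel becomes an additive constant in the cepstral domain, so subtracting each coefficient's per-utterance mean removes it. The sketch below assumes MFCCs are stored as a NumPy array of shape (frames, coefficients) and uses synthetic data; the function name `cmn` is illustrative, and RTCN and RASTA are not shown.

```python
import numpy as np


def cmn(mfcc: np.ndarray) -> np.ndarray:
    """Cepstral mean normalization over one utterance.

    mfcc: array of shape (num_frames, num_coeffs) (assumed layout).
    Returns the features with each coefficient's utterance mean removed.
    """
    return mfcc - mfcc.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))               # synthetic stand-in for MFCC frames
channel = rng.normal(scale=2.0, size=(1, 13))    # fixed channel = constant cepstral offset
distorted = clean + channel

# After CMN the fixed channel offset disappears, so both versions coincide.
print(np.allclose(cmn(clean), cmn(distorted)))
```

Note that CMN only cancels a channel that stays constant over the utterance; a channel that varies within a call is what motivates filtering approaches such as RASTA.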

A Study on the On-Line Handwritten Hangeul Pattern Recognition Using WLD with Parallelism (병렬성을 갖는 WLD 알고리즘을 이용한 온라인 필기체 한글, 영문자 및 숫자 패턴인식)

김은원;조원경
- Journal of the Korean Institute of Telematics and Electronics B / v.28B no.10 / pp.747-754 / 1991
In this paper, we study the on-line recognition of handwritten characters using a weighted Levenshtein distance (WLD) algorithm with parallelism. Hangeul is segmented into phoneme units, and alphanumerics are segmented into character units. We also study the parallelism and concurrency of the WLD algorithm for implementation on a special-purpose processor. In simulations on 10,000 characters from practical sentences, the stroke recognition rate was 96.57% and the segmentation rate for phonemes and characters was 95.4%.
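A weighted Levenshtein distance is the classic edit-distance recurrence with a separate cost for substitution, insertion, and deletion. The sketch below is a generic version, not the paper's implementation: the three constant weights are illustrative assumptions (a stroke recognizer would more likely use symbol-dependent costs), and the input strings stand in for hypothetical stroke codes. It fills the table along anti-diagonals to show where the parallelism comes from.

```python
def weighted_levenshtein(a, b, w_sub=1.0, w_ins=1.0, w_del=1.0):
    """Weighted Levenshtein distance between sequences a and b."""
    m, n = len(a), len(b)
    # d[i][j] = minimum cost of turning a[:i] into b[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * w_del
    for j in range(1, n + 1):
        d[0][j] = j * w_ins
    # Sweep anti-diagonals k = i + j. Each cell depends only on the two
    # previous diagonals, so the cells of one diagonal are mutually
    # independent and could be computed simultaneously in hardware.
    for k in range(2, m + n + 1):
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(
                d[i - 1][j] + w_del,        # delete a[i-1]
                d[i][j - 1] + w_ins,        # insert b[j-1]
                d[i - 1][j - 1] + sub,      # match or substitute
            )
    return d[m][n]


print(weighted_levenshtein("kitten", "sitting"))
```

With unit weights this reduces to the ordinary Levenshtein distance; raising `w_sub` above `w_ins + w_del` makes a delete-plus-insert pair cheaper than a substitution.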

Improved speech emotion recognition using histogram equalization and data augmentation techniques (히스토그램 등화와 데이터 증강 기법을 이용한 개선된 음성 감정 인식)

Heo, Woon-Haeng;Kwon, Oh-Wook
- Phonetics and Speech Sciences / v.9 no.2 / pp.77-83 / 2017
We propose a new method to reduce emotion recognition errors caused by variation in speaker characteristics and speech rate. First, to reduce variation in speaker characteristics, we use the histogram equalization (HE) algorithm to adjust the features of a test speaker to fit the distribution of all training data. Second, to deal with variation in speech rate, we augment the training data with speech generated at various speech rates. In computer experiments using EMO-DB, KRN-DB, and eNTERFACE-DB, the proposed method improves weighted accuracy by a relative 34.7%, 23.7%, and 28.1%, respectively.
https://doi.org/10.13064/KSSS.2017.9.2.077
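Histogram equalization of features maps each test value through the test speaker's own empirical CDF and then through the inverse CDF of the training data. The sketch below is a per-dimension, rank-based version under stated assumptions: features are NumPy arrays of shape (frames, dim), the empirical CDF uses the `(rank + 0.5) / n` convention, and `histogram_equalize` is an illustrative name rather than the authors' code. Speech-rate augmentation is not shown.

```python
import numpy as np


def histogram_equalize(test: np.ndarray, train: np.ndarray) -> np.ndarray:
    """Map each test feature dimension onto the training distribution.

    test: (n_test_frames, dim); train: (n_train_frames, dim).
    Each test value is replaced by the training quantile found at that
    value's empirical CDF position within the test speaker's own data.
    """
    out = np.empty_like(test, dtype=float)
    n = test.shape[0]
    for k in range(test.shape[1]):
        # Rank of each test value (0..n-1), turned into a CDF position in (0, 1).
        ranks = np.argsort(np.argsort(test[:, k]))
        cdf = (ranks + 0.5) / n
        # Inverse training CDF evaluated at those positions.
        out[:, k] = np.quantile(train[:, k], cdf)
    return out


rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(5000, 2))   # pooled training features
test = rng.normal(5.0, 3.0, size=(1000, 2))    # one mismatched test speaker
eq = histogram_equalize(test, train)
print(eq.mean(), eq.std())                     # both close to the training statistics
```

Because the mapping is monotonic, the ordering of a speaker's feature values is preserved; only their distribution is reshaped.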

Proposal of Image Detection Algorithm to Implement Hand Gestures

Woo, Eun-Ju;Moon, Yu-Sung;Choi, Ung-Se;Kim, Jung-Won
- Journal of IKEEE / v.22 no.4 / pp.1222-1225 / 2018
This paper proposes an image detection algorithm for implementing hand gesture recognition. Using a camera sensor, we experimentally verified the performance of the image extraction algorithm based on gesture patterns, and the experiments also confirmed that the proposed method is feasible to implement. For efficient image detection, we applied a segmentation technique based on image transitions that divides the image into small units. To improve gesture recognition, the proposed method not only achieves a high recognition rate and a low false acceptance rate in a real gesture environment but also includes an algorithm that efficiently finds the optimal thresholds to apply.
https://doi.org/10.7471/ikeee.2018.22.4.1222
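The abstract does not say how the optimal threshold is chosen. One common criterion that balances a high recognition rate against a low false acceptance rate is to maximize their difference (Youden's J statistic), sketched below as a swapped-in technique, not the authors' algorithm. The scores, labels, and the `best_threshold` name are hypothetical.

```python
import numpy as np


def best_threshold(scores: np.ndarray, is_gesture: np.ndarray) -> float:
    """Pick the threshold maximizing recognition rate minus false acceptance rate.

    scores: detector confidence per sample; is_gesture: boolean ground truth.
    A sample is accepted as a gesture when score >= threshold.
    """
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):            # only observed scores can change the outcome
        accepted = scores >= t
        recognition_rate = accepted[is_gesture].mean()    # true positive rate
        false_acceptance = accepted[~is_gesture].mean()   # false acceptance rate
        j = recognition_rate - false_acceptance
        if j > best_j:
            best_t, best_j = float(t), j
    return best_t


scores = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
labels = np.array([False, False, False, True, True, True])
print(best_threshold(scores, labels))
```

An exhaustive sweep over observed scores is O(n^2) as written; sorting the scores once and updating counts incrementally would be the efficient variant.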

Speech Emotion Recognition with SVM, KNN and DSVM

Hadhami Aouani;Yassine Ben Ayed
- International Journal of Computer Science & Network Security / v.23 no.8 / pp.40-48 / 2023
Speech emotion recognition has become an active research topic in speech processing and in applications based on human-machine interaction. Our system follows a two-stage approach: feature extraction followed by a classification engine. First, two feature sets are investigated: the first consists of only 13 Mel-frequency cepstral coefficients (MFCCs) extracted from emotional speech samples, and the second fuses three features, Zero Crossing Rate (ZCR), Teager Energy Operator (TEO), and Harmonic-to-Noise Ratio (HNR), with the MFCC features. Second, we compare the performance of two classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbors (k-NN). We also investigate the value of recent advances in machine learning, including deep kernel learning. Extensive experiments are conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset covering seven emotions, and the results show good accuracy compared with previous studies.
https://doi.org/10.22937/IJCSNS.2023.23.8.6
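Two of the fused features have short textbook definitions: ZCR is the fraction of adjacent samples whose signs differ, and the discrete Teager energy operator is psi[n] = x[n]^2 - x[n-1] x[n+1]. The sketch below computes both on a single frame; framing, windowing, HNR, and the exact fusion scheme used in the paper are not shown, and the function names are illustrative.

```python
import numpy as np


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


def teager_energy(frame: np.ndarray) -> np.ndarray:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return frame[1:-1] ** 2 - frame[:-2] * frame[2:]


n = np.arange(100)
tone = 2.0 * np.cos(0.3 * n + 0.2)
# For A*cos(w*n + phi) the TEO is constant and equals A^2 * sin(w)^2.
print(teager_energy(tone)[:3], 4.0 * np.sin(0.3) ** 2)
```

Frame-level values such as these could then be concatenated with the 13 MFCCs to form a fused vector; concatenation is an assumption here, since the abstract does not specify the fusion method.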

Digital enhancement of pronunciation assessment: Automated speech recognition and human raters

Miran Kim
- Phonetics and Speech Sciences / v.15 no.2 / pp.13-20 / 2023
This study explores the potential of automated speech recognition (ASR) for assessing English learners' pronunciation. We employed ASR technology, acknowledged for its impartiality and consistent results, to analyze speech audio files, including synthesized speech (both native-like English and Korean-accented English) and recordings from a native English speaker. Through this analysis, we established baseline values for the word error rate (WER). These were then compared with the judgments of human raters in perception experiments that assessed the speech of 30 first-year college students before and after a pronunciation course. Our sub-group analyses revealed positive training effects as measured by both Whisper, an ASR tool, and the human raters, and identified distinct human rater strategies across assessment aspects such as proficiency, intelligibility, accuracy, and comprehensibility that were not observed in ASR. Despite challenges such as recognizing accented speech traits, our findings suggest that digital tools such as ASR can streamline the pronunciation assessment process. With ongoing advances in ASR technology, its potential not only as an assessment aid but also as a self-directed learning tool for pronunciation feedback merits further exploration.
https://doi.org/10.13064/KSSS.2023.15.2.013
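WER is the word-level edit distance between an ASR transcript and a reference, divided by the number of reference words. The sketch below implements that standard definition; it is not the toolkit used in the study, and text normalization (casing, punctuation) is deliberately omitted, which is an assumption. It requires a non-empty reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion of a reference word
                         cur[j - 1] + 1,           # insertion of a hypothesis word
                         prev[j - 1] + (r != h))   # substitution, or 0 on a match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.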

A Spoken Korean-Digits Recognition System Based on Linear Prediction Spectra (선형예측에 의한 숫자음성 자동인식)

;安居院猛
- Journal of the Korean Institute of Telematics and Electronics / v.17 no.3 / pp.12-19 / 1980
A speech recognition system for separately pronounced Korean digits is described. The system consists of four stages: parameter extraction, segmentation by voiced/unvoiced analysis, formant tracking, and pattern matching. Digit speech is segmented into an unvoiced segment and/or a voiced segment using zero-crossing rate (ZCR) and energy measurements; a relatively simple formant tracking scheme is then applied to the raw formant data extracted from linear prediction spectra to estimate the first three formant frequencies. Finally, pattern matching is performed using dynamic programming. A recognition experiment on 150 digit utterances spoken by three male speakers yielded a recognition rate of 94%.
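Dynamic-programming pattern matching of this kind is usually dynamic time warping (DTW): the utterance is aligned against each stored digit template, and the template with the lowest accumulated distance wins. The sketch below assumes each frame is a feature vector (for example, the first three formant frequencies), a Euclidean local distance, and a simple symmetric step pattern; these choices and the template labels are illustrative, not taken from the paper.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """DTW distance between two feature sequences of shape (frames, dims)."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])   # local Euclidean distance
            cost[i, j] = d + min(cost[i - 1, j],      # x advances, y repeats
                                 cost[i, j - 1],      # y advances, x repeats
                                 cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])


def recognize(utterance: np.ndarray, templates: dict) -> str:
    """Return the label of the template closest to the utterance under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


templates = {"one": np.array([[0.0], [1.0], [2.0]]), "two": np.array([[5.0], [5.0]])}
slow_one = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])   # time-stretched "one"
print(recognize(slow_one, templates))
```

The warping absorbs differences in speaking rate, which is why a time-stretched utterance can still match its template with zero distance.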

Studies on image recognition of human sperms using a neural network

Kitamura, S.;Tanaka, K.;Kurematsu, Y.;Takeshima, M.;Iwahara, H.;Teraguchi, T.
- Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 1989.10a / pp.1135-1139 / 1989
A three-layer neural network was applied to the pattern recognition of human spermatozoa in clinical tests. The recognition rate was studied in relation to the number of hidden-layer cells and output-layer cells. The proposed method gave better results than the conventional template matching technique. Parallel processing of the backpropagation learning algorithm was also studied using transputers, and its performance was evaluated.
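A three-layer (input, hidden, output) network trained by backpropagation can be sketched in a few lines of NumPy. This is a generic illustration under assumptions not stated in the abstract: sigmoid units, squared-error loss, full-batch gradient descent, and XOR as toy data in place of sperm images. The transputer parallelization is not shown, and `train_mlp` is an illustrative name.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(x, t, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train an input-hidden-output network by batch backpropagation.

    x: (samples, inputs); t: (samples, outputs) targets in [0, 1].
    Returns the weights and the per-epoch squared-error loss history.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=(n_hidden, t.shape[1]))
    b2 = np.zeros(t.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass through the hidden and output layers.
        h = sigmoid(x @ w1 + b1)
        y = sigmoid(h @ w2 + b2)
        losses.append(0.5 * float(np.sum((y - t) ** 2)))
        # Backward pass: error terms for the output and hidden layers.
        delta_out = (y - t) * y * (1 - y)
        delta_hid = (delta_out @ w2.T) * h * (1 - h)
        # Gradient-descent weight updates.
        w2 -= lr * h.T @ delta_out
        b2 -= lr * delta_out.sum(axis=0)
        w1 -= lr * x.T @ delta_hid
        b1 -= lr * delta_hid.sum(axis=0)
    return (w1, b1, w2, b2), losses


x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)
params, losses = train_mlp(x, t)
print(losses[0], losses[-1])
```

`n_hidden` and the output width correspond to the hidden-layer and output-layer cell counts whose effect on recognition rate the study examined.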

Speaker Recognition using PCA in Driving Car Environments (PCA를 이용한 자동차 주행 환경에서의 화자인식)

Yu, Ha-Jin
- Proceedings of the KSPS conference / 2005.04a / pp.103-106 / 2005
The goal of our research is to build a text-independent speaker recognition system that can be used in any condition without an additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that principal component analysis (PCA) without dimension reduction can greatly increase performance, to a level close to that of the matched condition. The error rate is reduced further by the proposed augmented PCA, which adds an axis to the feature vectors of the most confusable speaker pairs before applying PCA.
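PCA without dimension reduction keeps every principal axis, so it acts as an orthogonal rotation that decorrelates the features while preserving their dimension and Euclidean distances. The sketch below estimates the rotation from training features and applies it; it assumes features shaped (frames, dim), uses synthetic correlated data, and does not implement the paper's augmented PCA. The function names are illustrative.

```python
import numpy as np


def fit_pca_rotation(features: np.ndarray):
    """Estimate a full-rank PCA rotation; no dimensions are dropped.

    Returns the feature mean and a (dim, dim) matrix whose columns are
    eigenvectors of the covariance, sorted by decreasing eigenvalue.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues for symmetric input
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]


def apply_pca(features: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project onto every principal axis; the output keeps the input dimension."""
    return (features - mean) @ basis


rng = np.random.default_rng(2)
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.3]])
x = rng.normal(size=(2000, 3)) @ mix           # correlated synthetic features
mean, basis = fit_pca_rotation(x)
y = apply_pca(x, mean, basis)
print(np.round(np.cov(y, rowvar=False), 6))    # approximately diagonal
```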

Implementation and Enhancement of GMM Face Recognition System using Flatness Measure (평탄도 측정을 이용한 GMM 얼굴인식기 구현 및 성능향상)

천영하;고대영;김진영;백성준
- Proceedings of the IEEK Conference / 2003.07e / pp.2004-2007 / 2003
This paper describes a method of improving the performance of Gaussian Mixture Model (GMM) face recognition systems using a flatness measure (FM). Using this measure, we discard low-information frames before training and testing. As a result, performance improves by about 9% with a small number of mixtures, and the computational load decreases. The recognition error rate also decreases under changing illumination. We use 2D DCT coefficients as face feature vectors, and experiments are carried out on the Olivetti Research Laboratory (ORL) face database.
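A flatness measure is commonly defined as the geometric mean of a power spectrum divided by its arithmetic mean: it is 1 for a perfectly flat spectrum and approaches 0 when energy concentrates in a few components. The sketch below applies that definition to a frame's coefficients and filters frames by a threshold. Applying it to 2D DCT coefficients, the threshold value, and which side of the threshold counts as "low information" are all assumptions, so the direction is left as an explicit argument; `select_frames` is hypothetical.

```python
import numpy as np


def flatness_measure(coeffs: np.ndarray, eps: float = 1e-12) -> float:
    """Geometric mean over arithmetic mean of squared coefficient magnitudes.

    Returns a value in (0, 1]: 1 for a perfectly flat spectrum, near 0 when
    the energy is concentrated in a few coefficients. eps avoids log(0).
    """
    power = np.abs(np.ravel(coeffs)) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))


def select_frames(frames, threshold: float, keep_below: bool):
    """Hypothetical frame filter: keep frames on one side of a flatness threshold."""
    fm = np.array([flatness_measure(f) for f in frames])
    mask = fm < threshold if keep_below else fm >= threshold
    return [f for f, keep in zip(frames, mask) if keep]


flat = np.ones(8)
peaked = np.array([10.0, 0.0, 0.0, 0.0])
print(flatness_measure(flat), flatness_measure(peaked))
```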
