Search | Korea Science

A Study on Recognition Units and Methods to Align Training Data for Korean Speech Recognition (한국어 인식을 위한 인식 단위와 학습 데이터 분류 방법에 대한 연구)

황영수
- Journal of the Institute of Convergence Signal Processing / v.4 no.2 / pp.40-45 / 2003
This study addresses recognition units and phoneme segmentation. When building a large-vocabulary speech recognition system, it is better to use the segment, rather than the syllable or the word, as the recognition unit. In this paper, we examine suitable recognition units and phoneme segmentation for Korean speech recognition. The experiments use the OGI speech toolkit (U.S.A.). The results show that the recognition rate is higher when a diphthong is treated as a single unit than when it is treated as two units, i.e. a glide plus a vowel. A recognizer trained on manually aligned data is slightly better than one trained on automatically aligned data. The recognition rate is also higher when the biphone is used as the recognition unit than when the mono-phoneme is used.

A Real-Time Embedded Speech Recognition System

Nam, Sang-Yep;Lee, Chun-Woo;Lee, Sang-Won;Park, In-Jung
- Proceedings of the IEEK Conference / 2002.07a / pp.690-693 / 2002
With the growth of the communications business, the embedded market is developing rapidly at home and abroad. Embedded systems are used in many ways, for example in wired and wireless communication equipment and in information products, and there is a great deal of development work applying speech recognition to embedded systems such as PDA, PCS, CDMA-2000, and IMT-2000 devices. This study implements a speech recognition engine and database with minimal memory for use in a real-time embedded system. The speech recognizer is fitted to the embedded system as follows. First, the DC component is removed from the input voice, high frequencies are compensated by pre-emphasis with a coefficient of 0.97, and the signal is divided into equal frames of 256 samples by overlapping shifts. Linear predictive coefficients are obtained from these frames with the Levinson-Durbin algorithm, and feature vectors are then obtained by cepstrum transformation. For HMM training, we used the Baum-Welch re-estimation algorithm to train each word, and the recognition result is obtained by evaluating the likelihood of each word. The speech data consist of 40 spoken commands and 10 digits for embedded-system menu control, collected from 15 male and 15 female speakers. Because ARM CPUs are widely adopted in embedded systems, the speech recognition engine was ported to an ARM core evaluation board. After several tests with five proposed recognition parameter sets, sets 1 and 3, which gave good recognition rates on commands and digits, were selected for the recognition test. The recognition engine achieved a recognition rate of 95%, the speech command recognizer 96%, and the digit recognizer 94%.
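The front end described in this abstract (DC removal, pre-emphasis with 0.97, overlapping 256-sample frames, Levinson-Durbin, cepstrum transformation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, and the frame shift of 128 samples and the LPC order of 10 are assumptions, since the abstract gives neither. No window function is applied because the abstract does not mention one.

```python
def remove_dc(samples):
    """Subtract the mean so the signal carries no DC component."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]

def pre_emphasize(samples, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; 0.97 is the value stated in the abstract."""
    return [samples[0]] + [samples[n] - coeff * samples[n - 1]
                           for n in range(1, len(samples))]

def split_frames(samples, size=256, shift=128):
    """Overlapping frames of `size` samples. shift=128 is an assumption."""
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, shift)]

def autocorrelation(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] with
    x[n] ~ sum_k a[k] * x[n-k]; returns (coefficients, residual energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a):
    """LPC cepstrum c[1..p] via the recursion
    c[n] = a[n] + sum_{k=1}^{n-1} (k/n) * c[k] * a[n-k]."""
    c = []
    for n in range(1, len(a) + 1):
        acc = a[n - 1]
        for k in range(1, n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def extract_features(samples, order=10):
    """One cepstral feature vector per frame. order=10 is an assumption."""
    emphasized = pre_emphasize(remove_dc(samples))
    features = []
    for frame in split_frames(emphasized):
        coeffs, _ = levinson_durbin(autocorrelation(frame, order), order)
        features.append(lpc_to_cepstrum(coeffs))
    return features
```

Each feature vector would then feed the HMM training and likelihood scoring that the abstract mentions, which are outside this sketch.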

A Study On Three-dimensional Optimized Face Recognition Model : Comparative Studies and Analysis of Model Architectures (3차원 얼굴인식 모델에 관한 연구: 모델 구조 비교연구 및 해석)

Park, Chan-Jun;Oh, Sung-Kwun;Kim, Jin-Yul
- The Transactions of The Korean Institute of Electrical Engineers / v.64 no.6 / pp.900-911 / 2015
In this paper, a 3D face recognition model is designed using a polynomial-based RBFNN (Radial Basis Function Neural Network) and a PNN (Polynomial Neural Network), and its recognition rate is evaluated. In existing 2D face recognition models, the recognition rate may degrade under external conditions such as the brightness of the image. 3D face recognition is therefore performed with a 3D scanner to overcome this disadvantage of 2D face recognition. In the preprocessing stage, the 3D face images obtained at various poses are converted to frontal images by pose compensation. The depth data of the face shape are extracted using the multiple point signature method, and depth information for the whole face area is obtained using the tip of the nose as a reference point. Parameters are optimized with both ABC (Artificial Bee Colony) and PSO (Particle Swarm Optimization) for effective training and recognition. The experimental data consist of face images of students and researchers in the IC&CI Lab of Suwon University. Using these 3D face images, the performance of 3D face recognition is evaluated and compared across the two types of models, as well as for the point signature method based on two kinds of depth data.
https://doi.org/10.5370/KIEE.2015.64.6.900

Research on Robust Face Recognition against Lighting Variation using CNN (CNN을 적용한 조명변화에 강인한 얼굴인식 연구)

Kim, Yeon-Ho;Park, Sung-Wook;Kim, Do-Yeon
- The Journal of the Korea institute of electronic communication sciences / v.12 no.2 / pp.325-330 / 2017
Face recognition technology has been studied for decades and is used in areas such as security, entertainment, and mobile services. Its main problem is that the recognition rate drops significantly with environmental factors such as brightness, illumination angle, and image rotation. In this paper, we therefore propose a face recognition method that is robust against lighting variation, using a CNN, an approach that has recently been re-evaluated thanks to computer hardware and algorithms capable of handling large amounts of computation. For performance verification, the proposed method was compared with the conventional face recognition algorithms PCA, LBP, and DCT; the recognition rate improved by 9.82%, 11.6%, and 4.54%, respectively. An improvement of 5.24% was also recorded in comparison with an existing neural-network-based face recognition result, and the final recognition rate was 99.25%.
https://doi.org/10.13067/JKIECS.2017.12.2.325

Vocabulary Recognition Post-Processing System using Phoneme Similarity Error Correction (음소 유사율 오류 보정을 이용한 어휘 인식 후처리 시스템)

Ahn, Chan-Shik;Oh, Sang-Yeob
- Journal of the Korea Society of Computer and Information / v.15 no.7 / pp.83-90 / 2010
In a vocabulary recognition system, the recognition rate drops because of non-recognition errors caused by similar-phoneme recognition and by inaccurately provided vocabulary. When inaccurate vocabulary is input, feature extraction results in either no recognition or recognition of a similar phoneme. Features also cannot be extracted properly when a phoneme is recognized as a similar phoneme. This paper proposes a post-processing error correction system for vocabulary recognition that uses phoneme likelihood based on phoneme features. The phoneme likelihood is obtained from monophone-trained phoneme data using the MFCC and LPC feature extraction methods. Similar phonemes are guided toward recognition as the correct phoneme, which reduces the non-recognition error rate caused by inaccurately provided vocabulary. During vocabulary recognition, errors are identified using phoneme likelihood and confidence, and error correction is applied to the vocabulary found to be in error. In a performance comparison with a system using error patterns and a system using semantics, recognition improved by 7.5% with MFCC and 5.3% with LPC.
https://doi.org/10.9708/jksci.2010.15.7.083

Speech Intelligibility of Alaryngeal Voices and Pre/Post Operative Evaluation of Voice Quality using the Speech Recognition Program(HUVOIS) (음성인식프로그램을 이용한 무후두 음성의 말 명료도와 병적 음성의 수술 전후 개선도 측정)

Kim, Han-Su;Choi, Seong-Hee;Kim, Jae-In;Lee, Jae-Yol;Choi, Hong-Shik
- Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.2 / pp.92-97 / 2004
Background and Objectives : The purpose of this study was to evaluate objectively pre- and post-operative voice quality and the intelligibility of alaryngeal voice using the speech recognition program HUVOIS. Materials and Methods : Two laryngologists and one speech pathologist rated 'G', 'R', and 'B' on the GRBAS scale and rated speech intelligibility on the NTID rating scale from a standard paragraph. Acoustic estimates such as jitter, shimmer, and HNR were also obtained with Lx Speech Studio. Results : The speech recognition rate did not differ significantly between pre- and post-operative pathological voice samples, although voice quality (G, B) and acoustic values (jitter, HNR) improved significantly after the operation. Among alaryngeal voices, the reed-type electrolarynx 'Moksori' scored highest in both speech intelligibility and speech recognition rate, whereas esophageal speech scored lowest. A correlation between speech intelligibility and speech recognition rate was found in alaryngeal voices, but not in pathological voices. Conclusion : The current study did not show that the speech recognition program HUVOIS, used as a telephone program, is an objective and efficient method for assisting the subjective GRBAS scale.

Pose-invariant Face Recognition using a Cylindrical Model and Stereo Camera (원통 모델과 스테레오 카메라를 이용한 포즈 변화에 강인한 얼굴인식)

노진우;홍정화;고한석
- Journal of KIISE:Software and Applications / v.31 no.7 / pp.929-938 / 2004
This paper proposes a pose-invariant face recognition method using a cylindrical model and a stereo camera. The paper has two parts: the single input image case and the stereo input image case. In the single-image case, we normalize the face's yaw pose using the cylindrical model. In the stereo case, we normalize the face's pitch pose using the cylindrical model together with a pitch angle estimated beforehand from the stereo geometry. Because two images acquired at the same time are available, overall recognition performance can also be increased by decision-level fusion. In representative experiments, the yaw pose transform raised the recognition rate from 61.43% to 94.76%, and the proposed method performs as well as the more complicated 3D face model. The stereo camera system raised the recognition rate by a further 5.24% for upward face poses, and decision-level fusion added 3.34%.

Learning-based approach for License Plate Recognition System (학습 기반의 자동차 번호판 인식 시스템)

김종배;김갑기;김광인;박민호;김항준
- Journal of the Institute of Convergence Signal Processing / v.2 no.1 / pp.1-11 / 2001
This paper presents a learning-based approach for constructing a license plate recognition system. The system consists of three modules: a car detection module, a license plate segmentation module, and a recognition module. The car detection module detects a car in the image sequence obtained from the camera using a simple color-based approach. The segmentation module extracts the license plate from the detected car image using neural networks as filters that analyze the color and texture properties of license plates. The recognition module then reads the characters in the detected license plate with a support vector machine (SVM)-based character recognizer. The system has been tested at parking lots, tollgates, and similar sites, and showed the following average performance: car detection rate 100%, segmentation rate 97.5%, and character recognition rate about 97.2%. Overall system performance is 94.7%, and processing time is one second. The proposed system therefore works well in real-world conditions.
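As a quick consistency check on these figures, the overall rate can be compared with the product of the three stage rates. This assumes, as the abstract does not state, that a plate is read correctly only when all three stages succeed and that the stages fail independently; the variable names are illustrative.

```python
# Stage rates as reported in the abstract.
car_detection_rate = 1.000
segmentation_rate = 0.975
character_recognition_rate = 0.972

# Assumption: overall success requires every stage to succeed, independently.
overall = car_detection_rate * segmentation_rate * character_recognition_rate
print(f"Estimated overall rate: {overall:.2%}")
```

Under that assumption the estimate is roughly 94.8%, which is in line with the reported overall performance of 94.7%.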

Improvement of User Recognition Rate using Multi-modal Biometrics (다중생체인식 기법을 이용한사용자 인식률 향상)

Geum, Myung-Hwan;Lee, Kyu-Won;Lee, Bong-Hwan
- Journal of the Korea Institute of Information and Communication Engineering / v.12 no.8 / pp.1456-1462 / 2008
In general, personal authentication based on a single biometric is known to be limited in how far its recognition rate can be improved, because of the weaknesses of each individual recognition scheme. The recognition rate of a face recognition system can be reduced by environmental factors such as illumination, while a speaker verification system does not perform well when surrounding noise is added. In this paper, a multi-modal biometric system composed of face and voice recognition systems is proposed in order to improve on the performance of the individual authentication systems. The proposed empirical weighted sum rule, based on the reliability of each individual authentication system, is applied to improve the performance of multi-modal biometrics. Since the proposed system is implemented as a Java applet with security functions, it can be used for user authentication on the general Web.
https://doi.org/10.6109/jkiice.2008.12.8.1456
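A weighted sum rule of the kind this abstract describes can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say how reliability is measured or how the weights are set, so the sketch simply normalizes two reliability values into weights, and the function name, the score range of 0 to 1, and the example values are all hypothetical.

```python
def fuse_scores(face_score, voice_score, face_reliability, voice_reliability):
    """Weighted sum of two match scores (each assumed to lie in [0, 1]).

    The weights are the reliabilities normalized to sum to 1, so the more
    reliable matcher contributes more to the fused score.
    """
    total = face_reliability + voice_reliability
    if total <= 0:
        raise ValueError("at least one reliability must be positive")
    w_face = face_reliability / total
    w_voice = voice_reliability / total
    return w_face * face_score + w_voice * voice_score

# Example: the face matcher is treated as three times as reliable as the
# voice matcher (illustrative numbers only).
fused = fuse_scores(face_score=0.8, voice_score=0.4,
                    face_reliability=3.0, voice_reliability=1.0)
```

The fused score would then be compared against an acceptance threshold, which is also not given in the abstract.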

Efficient Continuous Vocabulary Clustering Modeling for Tying Model Recognition Performance Improvement (공유모델 인식 성능 향상을 위한 효율적인 연속 어휘 군집화 모델링)

Ahn, Chan-Shik;Oh, Sang-Yeob
- Journal of the Korea Society of Computer and Information / v.15 no.1 / pp.177-183 / 2010
In a continuous vocabulary recognition system based on statistical methods, vocabulary recognition is performed using probability distributions, and modeling uses phoneme clustering to estimate probability parameters from samples. During vocabulary search, a low recognition rate arises because the estimated probability parameters represent vocabulary with undefined and inserted phonemes, and a Gaussian model has the drawback that accuracy is not assured with a single clustering model. To improve on this, we propose mixed clustering modeling in which a Gaussian mixture probability distribution is optimized using similarity measured by the Euclidean and Bhattacharyya distances, so that phoneme probability models are searched within the clustered model. The system achieved a vocabulary-dependent recognition rate of 98.63% and a vocabulary-independent recognition rate of 97.91%.
https://doi.org/10.9708/jksci.2010.15.1.177
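The two distance measures named in this abstract can be illustrated with the standard textbook formulas. The sketch below assumes Gaussians with diagonal covariance, each given as a list of means and a list of variances. How the paper combines the two distances for clustering is not described in the abstract, so that step is not shown, and the function names are hypothetical.

```python
import math

def euclidean_distance(mean_a, mean_b):
    """Euclidean distance between two mean vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_a, mean_b)))

def bhattacharyya_distance(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    D = 1/8 * sum((ma - mb)^2 / v) + 1/2 * sum(ln(v / sqrt(va * vb))),
    where v = (va + vb) / 2 is the averaged variance in each dimension.
    """
    mean_term = 0.0
    cov_term = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        v = (va + vb) / 2.0
        mean_term += (ma - mb) ** 2 / v
        cov_term += math.log(v / math.sqrt(va * vb))
    return mean_term / 8.0 + cov_term / 2.0
```

Unlike the Euclidean distance between means, the Bhattacharyya distance also grows when two models share a mean but differ in variance, which is one reason the two measures are often used together.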
