• Title/Summary/Keyword: recognition-rate


A study on the voice command recognition at the motion control in the industrial robot (산업용 로보트의 동작제어 명령어의 인식에 관한 연구)

  • 이순요;권규식;김홍태
    • Journal of the Ergonomics Society of Korea / v.10 no.1 / pp.3-10 / 1991
  • The teach pendant and keyboard have been used as input devices for control commands in human-robot systems. However, many problems occur when the user is a novice, so a speech recognition system is required for communication between a human and the robot. In this study, Korean voice commands, eight robot commands, and ten digits are described based on broad phonetic analysis. Applying broad phonetic analysis, the phonemes of the voice commands are divided into phoneme groups with similar features, such as plosive, fricative, affricate, nasal, and glide sounds. The feature parameters and their ranges for detecting the phoneme groups are then found by the minimax method. The classification rules consist of combinations of the feature parameters, such as zero crossing rate (ZCR), log energy (LE), up and down (UD), and formant frequency, together with their ranges. Voice commands were recognized by the classification rules. The recognition rate was over 90 percent in this experiment. The experiment also showed that the recognition rate for digits was better than that for robot commands.

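Two of the feature parameters named in the abstract above, zero crossing rate and log energy, can be sketched for a single frame of samples as follows. This is a minimal illustration only: the function names, the thresholds, and the group labels in `classify_frame` are assumptions made for the example and are not the ranges or rules found by the paper's minimax method.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, eps=1e-10):
    """Base-10 log of the summed squared samples (eps avoids log(0))."""
    return math.log10(sum(x * x for x in frame) + eps)


def classify_frame(frame, zcr_threshold=0.3, le_threshold=0.0):
    """Toy range rule; both thresholds are illustrative assumptions."""
    zcr = zero_crossing_rate(frame)
    le = log_energy(frame)
    if zcr > zcr_threshold and le < le_threshold:
        return "fricative-like"   # many sign changes, low energy
    if le >= le_threshold:
        return "voiced-like"      # high energy
    return "silence-like"         # low energy, few sign changes
```

A rule-based classifier of the kind described would combine several such parameters, each checked against its own range, rather than the two thresholds used here.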

A Study on Korean Digit Recognition by Using Phoneme Boundary Information (음소경계 정보를 이용한 한국어 숫자음 인식에 관한 연구)

  • Choi Goan Mook;Lim Dong Chul;Lee Haing Sei
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.117-120 / 2001
  • The recognition rate of Korean digits is lower than that of other words because the digits are composed of similar phonemes. In this paper, a new method is proposed to improve the recognition rate by using phoneme boundary information. The proposed method adds little cost because the phoneme boundary is found by a simple method. We experimented with speech data from one male speaker and obtained an improved speech recognition rate.


The small scale Voice Dialing System using TMS320C30 (TMS320C30을 이용한 소규모 Voice Dialing 시스템)

  • 이항섭
    • Proceedings of the Acoustical Society of Korea Conference / 1991.06a / pp.58-63 / 1991
  • This paper describes the development of a small-scale voice dialing system using the TMS320C30. The recognition vocabulary consists of 50 department names within a university. For vocabularies of medium scale or smaller, word-unit recognition is more practical than phoneme-unit or syllable-unit recognition. In this paper, we performed recognition and model generation using DMS (Dynamic Multi-Section) and implemented the voice dialing system on the TMS320C30. We achieved a 98% recognition rate with 22 sections and a weight of 0.6, and recognition took 4 seconds.


Optical Character Recognition for Hindi Language Using a Neural-network Approach

  • Yadav, Divakar;Sanchez-Cuadrado, Sonia;Morato, Jorge
    • Journal of Information Processing Systems / v.9 no.1 / pp.117-140 / 2013
  • Hindi is the most widely spoken language in India, with more than 300 million speakers. Because characters in Hindi text are not separated as they are in English, Optical Character Recognition (OCR) systems developed for Hindi have a very poor recognition rate. In this paper we propose an OCR system for printed Hindi text in Devanagari script that uses an Artificial Neural Network (ANN) to improve efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in scanned documents further complicates the segmentation process, creating a major problem when designing an effective character segmentation technique. Preprocessing, character segmentation, feature extraction, and finally classification and recognition are the major steps followed by a general OCR system. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and then basic symbols. The basic symbols, obtained as the fundamental unit from the segmentation process, are recognized by the neural classifier. In this work, three feature extraction techniques (histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing) are used to improve the recognition rate. These feature extraction techniques are powerful enough to extract features even from distorted characters or symbols. The neural classifier is a back-propagation neural network with two hidden layers. The classifier is trained and tested on printed Hindi texts. A correct recognition rate of approximately 90% is achieved.
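Two of the feature types named in the abstract above can be illustrated on a small binary symbol image. The sketch below assumes that "histogram of projection based on pixel value" means counting foreground pixels per column and that "vertical zero crossing" means counting 0/1 transitions along each column; both readings, and the function names, are assumptions for illustration rather than the paper's definitions.

```python
def column_projection(image):
    """Number of foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def vertical_crossings(image):
    """Number of 0->1 or 1->0 transitions going down each column."""
    return [
        sum(1 for a, b in zip(col, col[1:]) if a != b)
        for col in zip(*image)
    ]


# A 3x3 binary symbol, given as a list of rows.
symbol = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 1],
]
features = column_projection(symbol) + vertical_crossings(symbol)
```

Concatenating such per-column values gives a fixed-length feature vector that a neural classifier could take as input once symbols are normalized to a common size.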

A Study on the Improvement of Isolated Word Recognition for Telephone Speech (전화음성의 격리단어인식 개선에 관한 연구)

  • Do, Sam-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.9 no.4 / pp.66-76 / 1990
  • In this work, the effect of telephone channel noise and distortion on speech recognition is studied, and methods to improve the recognition rate are proposed. Computer simulation is done using 100-word test data made by pronouncing 100 phonetically balanced Korean isolated words ten times in a speaker-dependent mode. First, a spectral subtraction method is suggested to improve recognition of noisy speech. Then, the effect of bandwidth limiting and channel distortion is studied. Bandwidth limiting and amplitude distortion are found to lower the recognition rate significantly, whereas phase distortion has little effect. To reduce the channel effect, we modify the reference pattern according to some training data. When both channel noise and distortion exist, the recognition rate without the proposed method is merely 7.7~26.4%, but with the proposed method it increases drastically to 76.2~92.3%.

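The abstract above does not give the details of its spectral subtraction method, so the following is only a generic magnitude-domain sketch of the idea: estimate the noise spectrum from noise-only frames, then subtract it from each noisy frame bin by bin. The function names and the `floor` parameter are assumptions made for this example.

```python
def estimate_noise(noise_frames):
    """Average magnitude per frequency bin over noise-only frames."""
    n = len(noise_frames)
    return [sum(bin_values) / n for bin_values in zip(*noise_frames)]


def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Subtract the noise estimate from each bin of a magnitude spectrum.

    Bins that would go negative are clamped to a small fraction of the
    noisy magnitude (a spectral floor) instead of being set below zero.
    """
    return [
        max(noisy - noise, floor * noisy)
        for noisy, noise in zip(noisy_mag, noise_mag)
    ]


noise = estimate_noise([[1.0, 2.0], [3.0, 4.0]])
cleaned = spectral_subtraction([5.0, 1.0], noise)
```

In a full pipeline the magnitude spectra would come from a short-time Fourier transform of the speech signal, and the cleaned magnitudes would be recombined with the noisy phase before recognition features are computed.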

Comparison of Male/Female Speech Features and Improvement of Recognition Performance by Gender-Specific Speech Recognition (남성과 여성의 음성 특징 비교 및 성별 음성인식에 의한 인식 성능의 향상)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.5 no.6 / pp.568-574 / 2010
  • In an effort to improve the speech recognition rate, we compared the performance of speaker-independent and gender-specific speech recognition. For this purpose, 20 male and 20 female speakers each pronounced 300 isolated Korean words, and the speech data were divided into 4 groups: female, male, and two mixed-gender groups. To examine the validity of gender-specific speech recognition, Fourier spectra and MFCC feature vectors averaged separately over male and female speakers were examined. The results showed a distinction between the two genders, which supports the motivation for gender-specific speech recognition. In the recognition experiments, the error rate for the gender-specific case was less than 50% of that for the speaker-independent case. These results suggest that hierarchical recognition, with gender recognition followed by speech recognition, might yield better performance than the current method of speech recognition.

New Template Based Face Recognition Using Log-polar Mapping and Affine Transformation (로그폴라 사상과 어파인 변환을 이용한 새로운 템플릿 기반 얼굴 인식)

  • Kim, Mun-Gab;Choi, Il;Chien, Sung-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.39 no.2 / pp.1-10 / 2002
  • This paper presents new template-based human face recognition methods that improve recognition performance under scale and in-plane rotation variations of face images. To enhance recognition performance, the templates are generated by linear or nonlinear operations on multiple images that include different scales and rotations of faces. As invariant features that allow for scale and rotation variations of face images, we adopt the affine transformation, the log-polar mapping, and the log-polar image based FFT. The proposed recognition methods are evaluated in terms of recognition rate and processing time. Experimental results show that the proposed template-based methods achieve a higher recognition rate than the single-image-based one. The affine transformation based method shows a marginally higher recognition rate than the log-polar mapping based method and the log-polar image based FFT, while in terms of processing time the log-polar mapping based method is the fastest.
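The reason log-polar mapping helps with scale and in-plane rotation can be shown with a coordinate-level sketch: scaling a point about the center shifts only its log-radius, and rotating it shifts only its angle. The function below is a hypothetical helper written for this illustration, not the paper's implementation, and it assumes the point does not coincide with the center.

```python
import math


def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log radius, angle) about a center.

    The point must differ from the center, since log(0) is undefined.
    """
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)


rho, theta = to_log_polar(1.0, 0.0)
rho_scaled, theta_scaled = to_log_polar(2.0, 0.0)    # same point scaled by 2
rho_rotated, theta_rotated = to_log_polar(0.0, 1.0)  # same point rotated 90 degrees
```

Because both variations become translations in the log-polar image, a translation-tolerant comparison (for example, the FFT magnitude mentioned in the abstract) can then be applied to the mapped image.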

A Study on the Multilingual Speech Recognition using International Phonetic Language (IPA를 활용한 다국어 음성 인식에 관한 연구)

  • Kim, Suk-Dong;Kim, Woo-Sung;Woo, In-Sung
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.7 / pp.3267-3274 / 2011
  • Recently, speech recognition technology has developed dramatically, driven by the growing use of various mobile devices and the influence of a variety of speech recognition software. However, in multi-language speech recognition, a lack of understanding of multi-language lexical models and the limited capacity of systems hinder improvement of the recognition rate. It is not easy to represent speech expressed in multiple languages in a single acoustic model, and systems that use several acoustic models have a lower speech recognition rate. It is therefore necessary to research and develop a multi-language speech recognition system that represents speech in various languages in a single acoustic model. Building on research into using a multi-language acoustic model in mobile devices, this paper studies a system that can recognize Korean and English using International Phonetic Language (IPA). Focusing on finding an IPA model that covers both Korean and English phonemes, we obtain a voice recognition rate of 94.8% in Korean and 95.36% in English.

Development of a Visitor Recognition System Using Open APIs for Face Recognition (얼굴 인식 Open API를 활용한 출입자 인식 시스템 개발)

  • Ok, Kisu;Kwon, Dongwoo;Kim, Hyeonwoo;An, Donghyeok;Ju, Hongtaek
    • KIPS Transactions on Computer and Communication Systems / v.6 no.4 / pp.169-178 / 2017
  • Recently, as interest in and the need for security grow, demand for visitor recognition systems is increasing. Visitor recognition systems use various biometric methods to recognize a visitor. In this paper, we propose a visitor recognition system based on face recognition. The system improves face recognition performance by integrating several open APIs into a single algorithm and by ensembling their recognition results. For the performance evaluation, we collected face data for about five months and measured the performance of the visitor recognition system. The measurements show that the visitor recognition system achieves a higher face recognition rate than a single face recognition API and meets the performance requirements.
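The abstract above does not say how the results of the individual APIs are combined, so the sketch below shows just one plausible ensemble: a confidence-weighted vote. The input format of (label, confidence) pairs, the function name, and the example names and scores are all hypothetical.

```python
from collections import defaultdict


def ensemble_identify(results):
    """Pick the identity with the highest summed confidence.

    `results` is a list of (label, confidence) pairs, one per API.
    An empty list raises ValueError, since there is nothing to vote on.
    """
    scores = defaultdict(float)
    for label, confidence in results:
        scores[label] += confidence
    return max(scores, key=scores.get)


# Made-up outputs from three hypothetical face recognition APIs.
api_results = [("alice", 0.6), ("bob", 0.9), ("alice", 0.5)]
winner = ensemble_identify(api_results)
```

Here two moderately confident votes outweigh one highly confident vote, which is the kind of behavior that can let an ensemble outperform any single API when the individual errors are not strongly correlated.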

Variation of the Verification Error Rate of Automatic Speaker Recognition System With Voice Conditions (다양한 음성을 이용한 자동화자식별 시스템 성능 확인에 관한 연구)

  • Hong Soo Ki
    • MALSORI / no.43 / pp.45-55 / 2002
  • High reliability of automatic speaker recognition regardless of voice conditions is necessary for forensic applications. Audio recordings in real cases are not consistent in voice conditions such as duration, time interval of recording, given text or conversational speech, and transmission channel. In this study, the variation of the verification error rate of an automatic speaker recognition (ASR) system with voice conditions was investigated. The results indicate that, to decrease both the false rejection rate and the false acceptance rate, various voices should be used for training, and the training voices should be longer in duration than the test voices.

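The two verification error rates mentioned in the abstract above can be computed from match scores as in the following sketch. It assumes a score-threshold verifier in which higher scores mean a better match; the function name and all score values are made up for illustration and are not data from the study.

```python
def verification_error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false rejection rate, false acceptance rate).

    A trial is accepted when its score is at or above the threshold.
    Genuine trials that are rejected count as false rejections;
    impostor trials that are accepted count as false acceptances.
    """
    false_rejections = sum(1 for s in genuine_scores if s < threshold)
    false_acceptances = sum(1 for s in impostor_scores if s >= threshold)
    return (
        false_rejections / len(genuine_scores),
        false_acceptances / len(impostor_scores),
    )


# Illustrative scores only.
frr, far = verification_error_rates(
    genuine_scores=[0.9, 0.8, 0.4, 0.7],
    impostor_scores=[0.1, 0.6, 0.3, 0.2],
    threshold=0.5,
)
```

Raising the threshold lowers the false acceptance rate at the cost of more false rejections, so reducing both at once, as the study aims to, requires better-separated scores rather than a different threshold.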