• Title/Summary/Keyword: Recognition Improvement

The FE-SM/SONN for Recognition of the Car Skid Mark (자동차 스키드마크 인식을 위한 FE-SM/SONN)

  • Koo, Gun-Seo
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.125-132 / 2012
  • This paper proposes FE-SM/SONN for recognizing blurred and smeared skid mark images caused by the sudden braking of a vehicle. In blurred and smeared skid marks, the tread pattern image is ambiguous. To improve recognition of such images, FE-SM/SONN reads skid marks using fuzzy logic and distinguishes tread patterns with a SONN (Self Organization Neural Networks) recognizer. To validate the approach, 48 tire models and 144 skid marks were compared, and the overall recognition ratio was 89%. This is a 13.51% improvement in recognition over an existing back propagation recognizer and an 8.78% improvement over FE-MCBP. The expected benefit of this research is the recognition of ambiguous images by extracting distinguishing features; the study concludes that fuzzy logic makes the tread pattern recognizable even when the tread pattern image is in grayscale.

A Study on Error Correction Using Phoneme Similarity in Post-Processing of Speech Recognition (음성인식 후처리에서 음소 유사율을 이용한 오류보정에 관한 연구)

  • Han, Dong-Jo;Choi, Ki-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.6 no.3 / pp.77-86 / 2007
  • Systems based on speech recognition interfaces, such as telematics terminals, have recently been under development. However, speech recognition still produces many errors, and error correction is being actively studied. This paper proposes an error correction method for the post-processing of speech recognition based on the features of Korean phonemes. To support this algorithm, we use a phoneme similarity that reflects these features. The phoneme similarity is trained on data at the mono-phoneme level and uses MFCC and LPC to extract features for each Korean phoneme. It uses the Bhattacharyya distance to measure the similarity between one phoneme and another. Using the phoneme similarity, errors in eo-jeol that cannot be morphologically analyzed can be corrected, after which syllable recovery and morphological analysis are performed again. Experimental results show improvements of 7.5% and 5.3% for MFCC and LPC, respectively.
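
The abstract names the Bhattacharyya distance as its phoneme similarity measure but does not say how each phoneme is modeled. As a minimal sketch, assuming each phoneme is summarized by a Gaussian with diagonal covariance over its feature vectors (an assumption made here, not a detail from the paper), the distance could be computed as follows. The function name `bhattacharyya_diag` is hypothetical.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    Each argument is a list with one entry per feature dimension.
    Assumption: phonemes are modeled as diagonal Gaussians, which the
    abstract does not state.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v_avg = 0.5 * (v1 + v2)
        # Term driven by the separation of the means.
        dist += 0.125 * (m1 - m2) ** 2 / v_avg
        # Term driven by the mismatch of the variances.
        dist += 0.5 * math.log(v_avg / math.sqrt(v1 * v2))
    return dist


# Identical models give distance 0; a larger distance means less similar phonemes.
same = bhattacharyya_diag([0.0], [1.0], [0.0], [1.0])
apart = bhattacharyya_diag([0.0], [1.0], [2.0], [1.0])
```

A distance can be turned into a similarity in (0, 1] with `math.exp(-distance)`, which is the Bhattacharyya coefficient for the two Gaussians.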

LSTM RNN-based Korean Speech Recognition System Using CTC (CTC를 이용한 LSTM RNN 기반 한국어 음성인식 시스템)

  • Lee, Donghyun;Lim, Minkyu;Park, Hosung;Kim, Ji-Hwan
    • Journal of Digital Contents Society / v.18 no.1 / pp.93-99 / 2017
  • A hybrid approach using a Long Short Term Memory (LSTM) Recurrent Neural Network (RNN) has shown great improvement in speech recognition accuracy. Training an acoustic model with the hybrid approach requires forced alignment of the HMM state sequence from a Gaussian Mixture Model (GMM)-Hidden Markov Model (HMM), and training the GMM-HMM takes considerable computation time. This paper proposes an end-to-end approach for LSTM RNN-based Korean speech recognition to improve learning speed. The Connectionist Temporal Classification (CTC) algorithm is used to implement this approach. The proposed method showed an almost equal recognition rate, while learning was 1.27 times faster.
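
CTC removes the need for frame-level forced alignment because the network emits one label (or a blank) per frame, and the output sequence is obtained by collapsing that frame sequence. The sketch below shows only this standard collapsing rule (merge repeated labels, then drop blanks). It is not the paper's system, and the blank symbol `"_"` and the function name are illustrative choices.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this example only


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels into an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank symbol.
    A blank between two equal labels keeps them as separate outputs.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != blank]


decoded = ctc_collapse(list("__aa_a_bb__"))
```

Here the two runs of `a` are separated by a blank, so both survive, while the repeated `b` frames merge into one label.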

Improvement of Pattern Recognition Capacity of the Fuzzy ART with the Variable Learning (가변 학습을 적용한 퍼지 ART 신경망의 패턴 인식 능력 향상)

  • Lee, Chang Joo;Son, Byounghee;Hong, Hee Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.38B no.12 / pp.954-961 / 2013
  • In this paper, we propose a new learning method that uses a variable learning rate to improve pattern recognition in the FCSR (Fast Commit Slow Recode) learning method of the Fuzzy ART. Traditional learning methods use a fixed learning rate when updating the weight vector (representative pattern): the weight vector is updated at the same rate regardless of how similar the input pattern is to the representative pattern of the category. As a result, the updated weight vector is strongly influenced by input patterns that lie on the boundary of the category. In noisy environments, this creates unnecessary categories and reduces pattern recognition capacity. In the proposed method, the lower the similarity between the representative pattern and the input pattern, the less the input pattern contributes to the weight vector update. This suppresses unnecessary category proliferation and improves the pattern recognition capacity of the Fuzzy ART in noisy environments.
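
The contrast between a fixed and a similarity-dependent learning rate can be illustrated with the standard Fuzzy ART weight update, w_new = rate * (I AND w) + (1 - rate) * w. The abstract does not give the paper's scaling rule, so the sketch below assumes the simplest choice: the base rate `beta` is multiplied by the usual match degree |I AND w| / |I|. Both that scaling and the function names are assumptions for illustration.

```python
def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]


def match_degree(pattern, weight):
    """|I AND w| / |I|: similarity of the input to the category weight.

    Assumes a non-zero input pattern (complement coding guarantees this).
    """
    return sum(fuzzy_and(pattern, weight)) / sum(pattern)


def update_weight(pattern, weight, beta=0.5, variable=True):
    """Fuzzy ART update: w_new = rate * (I AND w) + (1 - rate) * w.

    With variable=True the rate shrinks as similarity drops. This
    scaling is an illustrative assumption, not the paper's formula.
    """
    rate = beta * match_degree(pattern, weight) if variable else beta
    target = fuzzy_and(pattern, weight)
    return [rate * t + (1.0 - rate) * w for t, w in zip(target, weight)]


pattern, weight = [0.2, 0.8], [0.6, 0.6]
fixed = update_weight(pattern, weight, variable=False)
scaled = update_weight(pattern, weight, variable=True)
```

With these values the match degree is 0.8, so the variable rate is 0.4 rather than 0.5 and the weight moves less toward the boundary pattern.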

Acoustic model training using self-attention for low-resource speech recognition (저자원 환경의 음성인식을 위한 자기 주의를 활용한 음향 모델 학습)

  • Park, Hosung;Kim, Ji-Hwan
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.483-489 / 2020
  • This paper proposes acoustic model training using self-attention for low-resource speech recognition. In low-resource speech recognition, it is difficult for the acoustic model to distinguish certain phones, for example the plosives /d/ and /t/, the plosives /g/ and /k/, and the affricates /z/ and /ch/. In acoustic model training, self-attention generates attention weights from the deep neural network model. In this study, these weights are used to handle errors between similar pronunciations in low-resource speech recognition. When the proposed method was applied to a Time Delay Neural Network-Output gate Projected Gated Recurrent Unit (TDNN-OPGRU)-based acoustic model, the proposed model showed a word error rate of 5.98%, an absolute improvement of 0.74% over the TDNN-OPGRU model.
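
The two reported figures imply a baseline word error rate and a relative improvement that the abstract does not state explicitly. The short calculation below derives them, taking the 0.74% at face value as an absolute WER difference, as the abstract describes it. The helper name `relative_reduction` is illustrative.

```python
def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error removed by the new system."""
    return (baseline_wer - new_wer) / baseline_wer


proposed_wer = 5.98   # percent, from the abstract
absolute_gain = 0.74  # percentage points, from the abstract

# Implied WER of the TDNN-OPGRU baseline, in percent.
baseline_wer = proposed_wer + absolute_gain

# Relative WER reduction, in percent of the baseline error.
relative_pct = 100.0 * relative_reduction(baseline_wer, proposed_wer)
```

Under that reading, the baseline WER is 6.72% and the relative reduction is roughly 11%.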

Implementation of a Speech Recognition System for a Car Navigation System (차량 항법용 음성인식 시스템의 구현)

  • Lee, Tae-Han;Yang, Tae-Young;Park, Sang-Taick;Lee, Chung-Yong;Youn, Dae-Hee;Cha, Il-Hwan
    • Journal of the Korean Institute of Telematics and Electronics S / v.36S no.9 / pp.103-112 / 1999
  • In this paper, a speaker-independent isolated word recognition system for a car navigation system is implemented using a general-purpose digital signal processor. The paper presents a noise processing method that combines SNR normalization with RAS. A semi-continuous hidden Markov model is adopted, and a TMS320C31 is used to implement the real-time system. The recognition word set consists of 69 command words for a car navigation system. Experimental results showed that recognition performance reached a maximum of 93.62% with a combination of SNR normalization and spectral subtraction, and that the performance improvement rate of the system was 3.69%. The presented noise processing method showed good speech recognition performance at 5 dB SNR in a car environment.
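
Spectral subtraction, one of the two noise processing steps named above, can be sketched in its basic magnitude-domain form: a noise estimate is subtracted from each frequency bin, and the result is clamped to a small spectral floor so it never goes negative. The abstract gives no parameters, so the over-subtraction factor `alpha`, the `floor` value, and the function name are assumptions for illustration only.

```python
def spectral_subtract(noisy_mag, noise_mag, alpha=1.0, floor=0.01):
    """Basic magnitude spectral subtraction for one frame.

    noisy_mag: magnitude spectrum of the noisy speech frame.
    noise_mag: estimated noise magnitude spectrum (e.g. from silence).
    alpha:     over-subtraction factor (assumed value).
    floor:     fraction of the noisy magnitude kept as a minimum,
               which avoids negative magnitudes (assumed value).
    """
    return [
        max(y - alpha * n, floor * y)
        for y, n in zip(noisy_mag, noise_mag)
    ]


# First bin: speech dominates, so the noise estimate is subtracted.
# Second bin: noise estimate exceeds the signal, so the floor applies.
cleaned = spectral_subtract([1.0, 0.2], [0.25, 0.5])
```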

Speech Recognition Performance Improvement using a convergence of GMM Phoneme Unit Parameter and Vocabulary Clustering (GMM 음소 단위 파라미터와 어휘 클러스터링을 융합한 음성 인식 성능 향상)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology / v.10 no.8 / pp.35-39 / 2020
  • Although DNNs produce fewer errors than conventional speech recognition systems, they are difficult to train in parallel, involve a large amount of computation, and require a large amount of data. To address this problem efficiently, this paper estimates GMM parameters at the phoneme level, using the parameters of each phoneme model from the GMM. It also proposes improving performance by clustering a specific vocabulary so that these parameters can be applied effectively. To this end, vocabulary models were built from three types of word speech databases, and Wiener filters were used for noise processing during feature extraction in the speech recognition experiments. The proposed method achieved a recognition rate of 97.9%. Further study is needed to address the overfitting problem.

Face Recognition under Varying Pose using Local Area obtained by Side-view Pose Normalization (측면 포즈정규화를 통한 부분 영역을 이용한 포즈 변화에 강인한 얼굴 인식)

  • Ahn, Byeong-Doo;Ko, Han-Seok
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.4 s.304 / pp.59-68 / 2005
  • This paper proposes a face recognition method for varying poses that uses local areas obtained by side-view pose normalization. General normalization methods for face recognition under varying pose suffer from missing information about the invisible areas of the face. This problem is generally solved by compensation, but in many cases compensation distorts the image or loses features. To solve this problem, we normalize the face pose to the side view, which reduces the distortion that occurs mainly in areas with large depth variation. We use only the undistorted area, removing the area distorted by normalization. We consider two cases, yaw pose variation and pitch pose variation, and confirm the improvement in recognition performance through experiments.

Performance Improvement of Connected Digit Recognition by Considering Phonemic Variations in Korean Digit and Speaking Styles (한국어 숫자음의 음운변화 및 화자 발성특성을 고려한 연결숫자 인식의 성능향상)

  • 송명규;김형순
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.401-406 / 2002
  • Each Korean digit consists of a single syllable, so recognizers, as well as Korean speakers, often have difficulty recognizing it. When digit strings are pronounced, the original pronunciation of each digit changes considerably due to the co-articulation effect. In addition, distortion caused by various channels and noise degrades the recognition performance for Korean connected digit strings. This paper deals with techniques to improve this recognition performance, including defining a set of PLUs that takes phonemic variations in Korean digits into account and constructing a recognizer that handles speakers' various speaking styles. In speaker-independent connected digit recognition experiments using telephone speech, the proposed techniques with 1 Gaussian per state gave a string accuracy of 83.2%, i.e., a 7.2% error rate reduction relative to the baseline system. With 11 Gaussians per state, we achieved the highest string accuracy of 91.8%, i.e., a 4.7% error rate reduction.

Performance Improvement of Traffic Signal Lights Recognition Based on Adaptive Morphological Analysis (적응적 형태학적 분석에 기초한 신호등 인식률 성능 개선)

  • Kim, Jae-Gon;Kim, Jin-soo
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.9 / pp.2129-2137 / 2015
  • Much research and development, both locally and globally, has focused on self-driving vehicles. Implementing self-driving vehicles requires many fundamental core technologies, and a traffic light detection and recognition system is an essential part of their computer vision technology. To date, most conventional algorithms for detecting and recognizing traffic lights have been based mainly on color signal analysis, but the performance improvement these approaches can achieve is limited by color signal noise and environmental conditions. To overcome these limits, this paper introduces morphological analysis for traffic light recognition. By considering color component analysis and shape analysis, such as rectangles and circles, simultaneously, the efficiency of traffic light recognition can be greatly increased. Several simulations show that the proposed method substantially improves both the recognition rate and the mis-recognition rate.