• Title/Summary/Keyword: recognition-rate

Camera-based Music Score Recognition Using Inverse Filter

  • Nguyen, Tam;Kim, SooHyung;Yang, HyungJeong;Lee, GueeSang
    • International Journal of Contents / v.10 no.4 / pp.11-17 / 2014
  • The influence of the acquisition environment on music score images captured by a camera has not yet been seriously examined. All existing Optical Music Recognition (OMR) systems attempt to recognize music score images captured by a scanner under ideal conditions. Therefore, when such systems process images affected by distortion, different viewpoints, or suboptimal illumination, their performance, in terms of recognition accuracy and processing time, is unacceptable for deployment in practice. In this paper, a novel, lightweight but effective approach for dealing with the issues caused by camera-based music scores is proposed. Based on the staff line information, musical rules, run-length code, and projection, all regions of interest are determined. Templates created from an inverse filter are then used to recognize the music symbols. Therefore, all fragmentation and deformation problems, as well as missed recognition, can be overcome using the developed method. The system was evaluated on a dataset consisting of real images captured by a smartphone. The achieved recognition rate and processing time were relatively competitive with state-of-the-art work. In addition, the system was designed to be lightweight compared with the other approaches, which mostly adopt machine learning algorithms, to allow further deployment on portable devices with limited computing resources.
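
The abstract above names projection and run-length code as cues for locating staff lines. A minimal sketch of those two primitives follows, assuming a binary image stored as a list of rows with 1 for ink; the function names, the `min_fill` threshold, and the toy image are illustrative assumptions, not the authors' implementation.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def staff_line_rows(image, min_fill=0.8):
    """Return indices of rows whose ink count covers at least min_fill of the width.

    min_fill is a hypothetical threshold chosen for this sketch.
    """
    width = len(image[0])
    return [y for y, count in enumerate(horizontal_projection(image))
            if count >= min_fill * width]


def run_lengths(column):
    """Run-length encode a sequence of pixels as (value, length) pairs."""
    runs = []
    for pixel in column:
        if runs and runs[-1][0] == pixel:
            runs[-1] = (pixel, runs[-1][1] + 1)
        else:
            runs.append((pixel, 1))
    return runs


# Toy image: full-width lines at rows 1 and 4, a small blob between them.
image = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
first_column = [row[0] for row in image]
```

On the toy image, `staff_line_rows(image)` picks out the two full-width rows, and `run_lengths(first_column)` gives the black and white run lengths from which line thickness and spacing could be estimated.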

Recurrent Neural Network with Backpropagation Through Time Learning Algorithm for Arabic Phoneme Recognition

  • Ismail, Saliza;Ahmad, Abdul Manan
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 2004.08a / pp.1033-1036 / 2004
  • Speech recognition and understanding have been studied for many years. In this paper, we propose a new type of recurrent neural network architecture for speech recognition, in which each output unit is connected to itself and is also fully connected to the other output units and all hidden units [1]. We also propose a learning algorithm for the recurrent neural network, Backpropagation Through Time (BPTT), which is well suited to it. The aim of the study is to observe the differences between the letters of the Arabic alphabet, from "alif" to "ya". The purpose of this research is to improve people's knowledge and understanding of the Arabic alphabet and Arabic words by using a Recurrent Neural Network (RNN) and the Backpropagation Through Time (BPTT) learning algorithm. The network is trained on four speakers (a mixture of male and female) recorded in a quiet environment. Neural networks are well known as a technique with the ability to classify nonlinear problems. Today, much research applies neural networks to speech recognition [2], including for Arabic. The Arabic language offers a number of challenges for speech recognition [3]. Even though positive results have been obtained from continued study, research on minimizing the error rate is still attracting considerable attention. This research uses the Recurrent Neural Network, one of the neural network techniques, to observe the differences between the letters "alif" to "ya".

Pre-Processing for Performance Enhancement of Speech Recognition in Digital Communication Systems (디지털 통신 시스템에서의 음성 인식 성능 향상을 위한 전처리 기술)

  • Seo, Jin-Ho;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.24 no.7 / pp.416-422 / 2005
  • Speech recognition in digital communication systems has very low performance due to the spectral distortion caused by speech codecs. In this paper, the spectral distortion caused by speech codecs is analyzed, and a pre-processing method that compensates for the spectral distortion is proposed to enhance speech recognition performance. Three standard speech codecs, IS-127 EVRC, ITU G.729 CS-ACELP, and IS-96 QCELP, are considered for algorithm development and evaluation, and a single method that can be applied to all of them is developed. The performance of the proposed method is evaluated for the three codecs. By using the speech features extracted from the compensated spectrum, the recognition rate is improved by up to $15.6\%$ compared with that obtained using the degraded speech features.

Facial Shape Recognition Using Self Organized Feature Map (SOFM)

  • Kim, Seung-Jae;Lee, Jung-Jae
    • International journal of advanced smart convergence / v.8 no.4 / pp.104-112 / 2019
  • This study proposes a robust detection algorithm that detects faces more stably under changes in lighting and rotation for the identification of face shape. The proposed algorithm uses the face shape as input information in a single-camera environment and separates only the face area through a preprocessing step. However, the face area is sensitive to lighting changes and has a large degree of freedom, so it is not easy to recognize accurately, and the error range is large. In this paper, we separate the background and the face area using the brightness difference between two images to increase the recognition rate. The brightness difference between the two images is the difference between an image taken under bright light and an image taken under dark light. After only the face region has been separated, the face shape is recognized using the self-organizing feature map (SOFM) algorithm. SOFM first selects an initial top neuron through the learning process. Second, the top neuron is renewed through the competition process, in which it competes again with its neighboring neurons. Third, the final top neuron is selected by repeating the learning process and the competition process. In addition, the competition goes through a three-step learning process to ensure that the top neurons are updated well among the neurons. By using this SOFM neural network algorithm, we intend to implement a stable and robust real-time face shape recognition system.
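
The competition between a top neuron and its neighbors can be illustrated with the textbook Kohonen update. The sketch below is a generic self-organizing map step on a one-dimensional chain of neurons, not the three-step procedure of the paper; the function names, the learning rate `lr`, and the neighborhood `radius` are assumptions made for illustration.

```python
def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    dists = [sum((w_i - x_i) ** 2 for w_i, x_i in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def som_step(weights, x, lr=0.5, radius=1):
    """One competitive-learning step; updates weights in place, returns the winner.

    The winner and every neuron within `radius` positions of it on the chain
    are moved a fraction `lr` of the way toward the input x.
    """
    winner = best_matching_unit(weights, x)
    for j, w in enumerate(weights):
        if abs(j - winner) <= radius:
            weights[j] = [w_i + lr * (x_i - w_i) for w_i, x_i in zip(w, x)]
    return winner
```

Repeating `som_step` over many inputs while shrinking `lr` and `radius` is the usual way such a map is trained; the schedule is left out here.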

Speed Sign Recognition by Using Hierarchical Application of Color Segmentation and Normalized Template Matching (컬러 세그멘테이션 및 정규화 템플릿 매칭의 계층적 적용에 의한 속도 표지판 인식)

  • Lee, Kang-Ho;Lee, Kyu-Won
    • The KIPS Transactions:PartB / v.16B no.4 / pp.257-262 / 2009
  • A method for extracting and recognizing the region of a speed sign in a real road environment is proposed. The speed sign region is extracted using color information, and the numbers are then segmented within the region. We improve the recognition rate by compensating for the incline of the speed sign in both the clockwise and counterclockwise directions. In image sequences of a real road environment, robust recognition results are achieved for speed signs in normal as well as inclined conditions.
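
Normalized template matching, named in the title of this entry, is commonly implemented as zero-mean normalized cross-correlation (NCC). The sketch below assumes that formulation and equal-size grayscale patches stored as lists of rows; the function names and the template labels are hypothetical, and the paper may normalize differently.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size grayscale patches."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    # A flat patch has no structure to correlate; treat it as no match.
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom


def classify(patch, templates):
    """Return the label of the template with the highest NCC score."""
    return max(templates, key=lambda label: ncc(patch, templates[label]))
```

Because the mean is subtracted and the result is divided by both norms, a uniformly brighter or higher-contrast copy of a template still scores close to 1, which is the reason this measure is often preferred under varying illumination.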

Wheelchair System Design on Speech Recognition Function (음성인식 기능을 탑재한 다기능 휠체어 시스템 설계 및 구현)

  • 김정훈;류홍석;강재명;강성인;김관형;이상배
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2002.05a / pp.1-5 / 2002
  • The purpose of this paper is to develop a speech recognition module for a wheelchair for the convenience of people with disabilities. For this system, we used the TMS320C32 as the main processor, eliminated noise in the pre-processing stage by applying a Wiener filter that takes the characteristics of the noise environment into account, and extracted 12 feature patterns per frame using LPC and cepstrum analysis. Then, in the recognition part, we implemented a hybrid form that combines DTW (Dynamic Time Warping), which conventional algorithms generally use for isolated words, with an NN (neural network) to prevent recognition errors. In this research, we achieved a recognition rate of more than 96% on isolated words when the DTW and hybrid forms were each tested in a noisy environment.
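
DTW aligns two utterances of different lengths before comparing them. Below is a minimal sketch of the classic dynamic-programming recurrence, assuming scalar feature sequences and absolute difference as the local cost; the system described above compares 12-dimensional frame vectors, so this is a simplification, and the function name is illustrative.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

An isolated-word recognizer built this way would compute `dtw_distance` between the input and each stored reference word and pick the smallest value; a slower repetition of the same pattern, such as `[1, 2, 2, 3]` against `[1, 2, 3]`, costs nothing.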

Robust Speech Recognition with Car Noise based on the Wavelet Filter Banks (웨이블렛 필터뱅크를 이용한 자동차 소음에 강인한 고립단어 음성인식)

  • Lee, Dae-Jong;Kwak, Keun-Chang;Ryu, Jeong-Woong;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.2 / pp.115-122 / 2002
  • This paper proposes a robust speech recognition algorithm based on wavelet filter banks. Because the proposed algorithm adopts a multiple-band decision-making scheme, it is robust to noise, whose presence severely degrades the performance of speech recognition systems. To evaluate the performance of the proposed scheme, we compared it with a conventional VQ-based speech recognizer on 10 isolated Korean digits with car noise. The proposed method improved the recognition rate by 9 to 27% over the conventional VQ algorithm across the various car-noise environments.

Acoustic and Pronunciation Model Adaptation Based on Context dependency for Korean-English Speech Recognition (한국인의 영어 인식을 위한 문맥 종속성 기반 음향모델/발음모델 적응)

  • Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
    • MALSORI / v.68 / pp.33-47 / 2008
  • In this paper, we propose a hybrid acoustic and pronunciation model adaptation method based on context dependency for Korean-English speech recognition. The proposed method proceeds as follows. First, to derive pronunciation variant rules, an n-best phoneme sequence is obtained by phone recognition. Second, we classify each rule as either context independent (CI) or context dependent (CD). To this end, it is assumed that the differences in phoneme structure between Korean and English give rise to CI pronunciation variabilities, while coarticulation effects are related to CD pronunciation variabilities. Finally, we perform acoustic model adaptation for the CI pronunciation variabilities and pronunciation model adaptation for the CD pronunciation variabilities. The Korean-English speech recognition experiments show that the average word error rate (WER) is decreased by 36.0% compared with a baseline that does not include any adaptation. In addition, the proposed method has a lower average WER than either the acoustic model adaptation or the pronunciation model adaptation alone.
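
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below assumes that standard definition and whitespace tokenization; `relative_reduction` shows how a relative decrease between two WER values would be computed. Both function names are illustrative, and the abstract does not state which form of decrease it reports.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length.

    Assumes a non-empty reference and whitespace-separated words.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, adapted):
    """Fractional decrease of `adapted` relative to `baseline`."""
    return (baseline - adapted) / baseline
```

For instance, recognizing "turn light off" for the reference "turn the light on" involves one deletion and one substitution over four reference words.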

A Comparison of Artificial Neural Networks and Statistical Pattern Recognition Methods for Rotation Machine Condition Classification (회전기계 고장 진단에 적용한 인공 신경회로망과 통계적 패턴 인식 기법의 비교 연구)

  • Kim, Chang-Gu;Park, Kwang-Ho;Kee, Chang-Doo
    • Journal of the Korean Society for Precision Engineering / v.16 no.12 / pp.119-125 / 1999
  • This paper gives an overview of the various approaches to designing statistical pattern recognition schemes based on the Bayes discrimination rule and artificial neural networks for rotating machine condition classification. For the Bayes discrimination rule, this paper covers the linear discrimination rule, applied to classification into several multivariate normal distributions with a common covariance matrix, and the quadratic discrimination rule for different covariance matrices. We also describe the k-nearest neighbor method, which directly estimates the posterior probability of each class. Five features are extracted from time-domain vibration signals. Using these five features, a statistical pattern classifier and neural networks were built to detect defects in a rotating machine. Four different rotating machine conditions were observed. The effects of the value of k and of the neural network structure on monitoring performance were also investigated. To compare the diagnostic performance of the two methods, their recognition success rates were calculated from the test data. The experimental results for classifying the rotating machine conditions show that the neural networks achieve the highest recognition rate.
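
The k-nearest neighbor estimate of a posterior probability mentioned above is simply the share of each class among the k closest training samples. A minimal sketch, assuming Euclidean distance and (feature vector, label) training pairs; the function name, the two-dimensional toy features, and the class labels in the usage note are hypothetical, whereas the paper uses five time-domain features.

```python
import math
from collections import Counter


def knn_posteriors(train, query, k=3):
    """Estimate P(class | query) as the class fractions among the k nearest samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: n / k for label, n in counts.items()}
```

With three "normal" samples near the origin and two "fault" samples near (5, 5), a query at (4, 4) with k = 3 yields roughly two thirds for "fault" and one third for "normal"; the predicted class is the one with the largest estimate.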

Appearance-based Object Recognition Using Higher Order Local Auto Correlation Feature Information (고차 국소 자동 상관 특징 정보를 이용한 외관 기반 객체 인식)

  • Kang, Myung-A
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.7 / pp.1439-1446 / 2011
  • This paper describes an algorithm that lowers the dimension, maintains object recognition performance, and significantly reduces the eigenspace configuration time by combining higher-order correlation feature information with Principal Component Analysis. Because the suggested method requires less computation than methods that use existing geometric information or stereo images, experiments show that it is well suited to building a real-time system. In addition, because the existing point-to-point method, which is a simple distance calculation, produces many errors, this paper reduces recognition errors and improves the recognition rate by using several successive input images as the unit of recognition with K-Nearest Neighbor, an improved class-to-class method.