• Title/Summary/Keyword: Recognition Improve

Search Result 2,186, Processing Time 0.032 seconds

Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems (입 모양 인식 시스템 전처리를 위한 관심 영역 추출과 이중 선형 보간법 적용)

  • Jae Hyeok Han;Yong Ki Kim;Mi Hye Kim
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.4
    • /
    • pp.189-198
    • /
    • 2024
  • Lipreading is one of the important parts of speech recognition, and several studies have been conducted to improve the performance of lipreading in lipreading systems for speech recognition. Recent studies have used method to modify the model architecture of lipreading system to improve recognition performance. Unlike previous research that improve recognition performance by modifying model architecture, we aim to improve recognition performance without any change in model architecture. In order to improve the recognition performance without modifying the model architecture, we refer to the cues used in human lipreading and set other regions such as chin and cheeks as regions of interest along with the lip region, which is the existing region of interest of lipreading systems, and compare the recognition rate of each region of interest to propose the highest performing region of interest In addition, assuming that the difference in normalization results caused by the difference in interpolation method during the process of normalizing the size of the region of interest affects the recognition performance, we interpolate the same region of interest using nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation, and compare the recognition rate of each interpolation method to propose the best performing interpolation method. Each region of interest was detected by training an object detection neural network, and dynamic time warping templates were generated by normalizing each region of interest, extracting and combining features, and mapping the dimensionality reduction of the combined features into a low-dimensional space. The recognition rate was evaluated by comparing the distance between the generated dynamic time warping templates and the data mapped to the low-dimensional space. In the comparison of regions of interest, the result of the region of interest containing only the lip region showed an average recognition rate of 97.36%, which is 3.44% higher than the average recognition rate of 93.92% in the previous study, and in the comparison of interpolation methods, the bilinear interpolation method performed 97.36%, which is 14.65% higher than the nearest neighbor interpolation method and 5.55% higher than the bicubic interpolation method. The code used in this study can be found a https://github.com/haraisi2/Lipreading-Systems.

A Study on the Recognition of Face Based on CNN Algorithms (CNN 알고리즘을 기반한 얼굴인식에 관한 연구)

  • Son, Da-Yeon;Lee, Kwang-Keun
    • Korean Journal of Artificial Intelligence
    • /
    • v.5 no.2
    • /
    • pp.15-25
    • /
    • 2017
  • Recently, technologies are being developed to recognize and authenticate users using bioinformatics to solve information security issues. Biometric information includes face, fingerprint, iris, voice, and vein. Among them, face recognition technology occupies a large part. Face recognition technology is applied in various fields. For example, it can be used for identity verification, such as a personal identification card, passport, credit card, security system, and personnel data. In addition, it can be used for security, including crime suspect search, unsafe zone monitoring, vehicle tracking crime.In this thesis, we conducted a study to recognize faces by detecting the areas of the face through a computer webcam. The purpose of this study was to contribute to the improvement in the accuracy of Recognition of Face Based on CNN Algorithms. For this purpose, We used data files provided by github to build a face recognition model. We also created data using CNN algorithms, which are widely used for image recognition. Various photos were learned by CNN algorithm. The study found that the accuracy of face recognition based on CNN algorithms was 77%. Based on the results of the study, We carried out recognition of the face according to the distance. Research findings may be useful if face recognition is required in a variety of situations. Research based on this study is also expected to improve the accuracy of face recognition.

A Real-Time Implementation of Speech Recognition System Using Oak DSP core in the Car Noise Environment (자동차 환경에서 Oak DSP 코어 기반 음성 인식 시스템 실시간 구현)

  • Woo, K.H.;Yang, T.Y.;Lee, C.;Youn, D.H.;Cha, I.H.
    • Speech Sciences
    • /
    • v.6
    • /
    • pp.219-233
    • /
    • 1999
  • This paper presents a real-time implementation of a speaker independent speech recognition system based on a discrete hidden markov model(DHMM). This system is developed for a car navigation system to design on-chip VLSI system of speech recognition which is used by fixed point Oak DSP core of DSP GROUP LTD. We analyze recognition procedure with C language to implement fixed point real-time algorithms. Based on the analyses, we improve the algorithms which are possible to operate in real-time, and can verify the recognition result at the same time as speech ends, by processing all recognition routines within a frame. A car noise is the colored noise concentrated heavily on the low frequency segment under 400 Hz. For the noise robust processing, the high pass filtering and the liftering on the distance measure of feature vectors are applied to the recognition system. Recognition experiments on the twelve isolated command words were performed. The recognition rates of the baseline recognizer were 98.68% in a stopping situation and 80.7% in a running situation. Using the noise processing methods, the recognition rates were enhanced to 89.04% in a running situation.

  • PDF

Selective Adaptation of Speaker Characteristics within a Subcluster Neural Network

  • Haskey, S.J.;Datta, S.
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.464-467
    • /
    • 1996
  • This paper aims to exploit inter/intra-speaker phoneme sub-class variations as criteria for adaptation in a phoneme recognition system based on a novel neural network architecture. Using a subcluster neural network design based on the One-Class-in-One-Network (OCON) feed forward subnets, similar to those proposed by Kung (2) and Jou (1), joined by a common front-end layer. the idea is to adapt only the neurons within the common front-end layer of the network. Consequently resulting in an adaptation which can be concentrated primarily on the speakers vocal characteristics. Since the adaptation occurs in an area common to all classes, convergence on a single class will improve the recognition of the remaining classes in the network. Results show that adaptation towards a phoneme, in the vowel sub-class, for speakers MDABO and MWBTO Improve the recognition of remaining vowel sub-class phonemes from the same speaker

  • PDF

Estimation of gender and age using CNN-based face recognition algorithm

  • Lim, Sooyeon
    • International journal of advanced smart convergence
    • /
    • v.9 no.2
    • /
    • pp.203-211
    • /
    • 2020
  • This study proposes a method for estimating gender and age that is robust to various external environment changes by applying deep learning-based learning. To improve the accuracy of the proposed algorithm, an improved CNN network structure and learning method are described, and the performance of the algorithm is also evaluated. In this study, in order to improve the learning method based on CNN composed of 6 layers of hidden layers, a network using GoogLeNet's inception module was constructed. As a result of the experiment, the age estimation accuracy of 5,328 images for the performance test of the age estimation method is about 85%, and the gender estimation accuracy is about 98%. It is expected that real-time age recognition will be possible beyond feature extraction of face images if studies on the construction of a larger data set, pre-processing methods, and various network structures and activation functions have been made to classify the age classes that are further subdivided according to age.

Improved object recognition performance of UWB radar according to different window functions

  • Nguyen, Trung Kien;Hong, Ic-Pyo
    • Journal of IKEEE
    • /
    • v.23 no.2
    • /
    • pp.395-402
    • /
    • 2019
  • In this paper, we implemented an Ultra-Wideband radar system using Stripmap Synthetic Apertrure Radar algorithm to recognize objects inside a box. Different window functions such as Hanning, Hamming, Kaiser, and Taylor functions to improve image recognition performance are applied and implemented to radar system. The Ultra-Wideband radar system with 3.1~4.8 GHz broadband and UWB antenna were implemented to recognize the conductor plate located inside 1m3 box. To obtain the image, we use the propagation data in the time domain according to the 1m movement distance and use the Range Doppler algorithm. The effect of different window functions to improve the recognition performance of the image are analyzed. From the compared results, we confirmed that the Kaiser window function can obtain a relatively good image.

A Study on the Improvement of Isolated Word Recognition for Telephone Speech (전화음성의 격리단어인식 개선에 관한 연구)

  • Do, Sam-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.9 no.4
    • /
    • pp.66-76
    • /
    • 1990
  • In this work, the effect of noise and distortion of a telephone channel on the speech recognition is studied, and methods to improve the recognition rate are proposed. Computer simulation is done using the 100-word test data whichwere made by pronouncing ten times 100-phonetically balanced Korean isolated words in a speaker dependent mode. First, a spectral subtraction method is suggested to improve the noisy speech recognition. Then, the effect of bandwidth limiting and channel distortion is studied. It has been found that bandwidth limiting and amplitude distortion lower the recognition rate significantly, but phase distortion affects little. To reduce the channel effect, we modify the reference pattern according to some training data. When both channel noise and distortion exist, the recognition rate without the proposed method is merely 7.7~26.4%, but the recognition rate with the proposed method is drastically increased to 76.2~92.3%.

  • PDF

Speech Recognition by Integrating Audio, Visual and Contextual Features Based on Neural Networks (신경망 기반 음성, 영상 및 문맥 통합 음성인식)

  • 김명원;한문성;이순신;류정우
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.41 no.3
    • /
    • pp.67-77
    • /
    • 2004
  • The recent research has been focused on fusion of audio and visual features for reliable speech recognition in noisy environments. In this paper, we propose a neural network based model of robust speech recognition by integrating audio, visual, and contextual information. Bimodal Neural Network(BMNN) is a multi-layer perception of 4 layers, each of which performs a certain level of abstraction of input features. In BMNN the third layer combines audio md visual features of speech to compensate loss of audio information caused by noise. In order to improve the accuracy of speech recognition in noisy environments, we also propose a post-processing based on contextual information which are sequential patterns of words spoken by a user. Our experimental results show that our model outperforms any single mode models. Particularly, when we use the contextual information, we can obtain over 90% recognition accuracy even in noisy environments, which is a significant improvement compared with the state of art in speech recognition. Our research demonstrates that diverse sources of information need to be integrated to improve the accuracy of speech recognition particularly in noisy environments.

Speech Recognition Performance Improvement using Gamma-tone Feature Extraction Acoustic Model (감마톤 특징 추출 음향 모델을 이용한 음성 인식 성능 향상)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence
    • /
    • v.11 no.7
    • /
    • pp.209-214
    • /
    • 2013
  • Improve the recognition performance of speech recognition systems as a method for recognizing human listening skills were incorporated into the system. In noisy environments by separating the speech signal and noise, select the desired speech signal. but In terms of practical performance of speech recognition systems are factors. According to recognized environmental changes due to noise speech detection is not accurate and learning model does not match. In this paper, to improve the speech recognition feature extraction using gamma tone and learning model using acoustic model was proposed. The proposed method the feature extraction using auditory scene analysis for human auditory perception was reflected In the process of learning models for recognition. For performance evaluation in noisy environments, -10dB, -5dB noise in the signal was performed to remove 3.12dB, 2.04dB SNR improvement in performance was confirmed.

A Black Ice Recognition in Infrared Road Images Using Improved Lightweight Model Based on MobileNetV2 (MobileNetV2 기반의 개선된 Lightweight 모델을 이용한 열화도로 영상에서의 블랙 아이스 인식)

  • Li, Yu-Jie;Kang, Sun-Kyoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.25 no.12
    • /
    • pp.1835-1845
    • /
    • 2021
  • To accurately identify black ice and warn the drivers of information in advance so they can control speed and take preventive measures. In this paper, we propose a lightweight black ice detection network based on infrared road images. A black ice recognition network model based on CNN transfer learning has been developed. Additionally, to further improve the accuracy of black ice recognition, an enhanced lightweight network based on MobileNetV2 has been developed. To reduce the amount of calculation, linear bottlenecks and inverse residuals was used, and four bottleneck groups were used. At the same time, to improve the recognition rate of the model, each bottleneck group was connected to a 3×3 convolutional layer to enhance regional feature extraction and increase the number of feature maps. Finally, a black ice recognition experiment was performed on the constructed infrared road black ice dataset. The network model proposed in this paper had an accurate recognition rate of 99.07% for black ice.