• Title/Summary/Keyword: recognition-rate

Search Results: 2,809

Object-Action and Risk-Situation Recognition Using Moment Change and Object Size's Ratio (모멘트 변화와 객체 크기 비율을 이용한 객체 행동 및 위험상황 인식)

  • Kwak, Nae-Joung;Song, Teuk-Seob
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.5
    • /
    • pp.556-565
    • /
    • 2014
  • This paper proposes a method for tracking an object in real-time video transmitted through a single web camera and for recognizing human actions and risk situations. The proposed method recognizes basic actions that a person performs in daily life and detects risk situations such as fainting or falling down, classifying input into usual actions and risk situations. The method models the background, obtains the difference image between the input image and the modeled background, extracts the human object from the input image, tracks the object's motion, and recognizes the action. Object tracking uses the moment information of the extracted object, and the recognition features are the change of moments and the ratio of the object's size between frames. The classified actions are four of the most common daily actions, walking, walking diagonally, sitting down, and standing up, while a sudden fall is classified as a risk situation. To test the proposed method, we applied it to webcam video of eight participants, classified their actions, and recognized risk situations. The test results showed a recognition rate of more than 97 percent for each action and 100 percent for risk situations.
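
As an illustration of the kind of pipeline this abstract describes, here is a minimal Python sketch assuming OpenCV's MOG2 subtractor as the background model and a single webcam stream; the centroid (moment) change and bounding-box size ratio between frames stand in for the paper's features, and the fall threshold is purely illustrative.

```python
import cv2
import numpy as np

# Background model (stand-in for the paper's background-modelling step).
bg = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=32)

cap = cv2.VideoCapture(0)          # single web camera, as in the paper
prev_area, prev_cy = None, None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Difference between input frame and modelled background -> foreground mask.
    mask = bg.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        continue

    person = max(contours, key=cv2.contourArea)      # largest blob = human object
    m = cv2.moments(person)
    if m["m00"] == 0:
        continue
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # centroid from moments
    x, y, w, h = cv2.boundingRect(person)

    if prev_area is not None:
        size_ratio = (w * h) / prev_area     # object-size ratio between frames
        dy = cy - prev_cy                    # vertical centroid (moment) change
        # Illustrative heuristic only: a sudden large drop of the centroid
        # together with a sharp size change is flagged as a fall.
        if dy > 0.25 * h and abs(size_ratio - 1.0) > 0.4:
            print("risk situation: possible fall")
    prev_area, prev_cy = w * h, cy
```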

Human hand gesture identification framework using SIFT and knowledge-level technique

  • Muhammad Haroon;Saud Altaf;Zia-ur-Rehman;Muhammad Waseem Soomro;Sofia Iqbal
    • ETRI Journal
    • /
    • v.45 no.6
    • /
    • pp.1022-1034
    • /
    • 2023
  • In this study, the impact of varying lighting conditions on recognition and decision-making was considered. A luminosity approach was presented to increase gesture recognition performance under varied lighting. An efficient framework was proposed for sensor-based sign language gesture identification, covering image acquisition, data preparation, feature extraction, and recognition. Depth images were collected using multiple Microsoft Kinect devices, and data were acquired at varying resolutions to demonstrate the idea. A case study was designed to attain acceptable accuracy in gesture recognition under variant lighting. A dataset based on American Sign Language (ASL) was created and analyzed under various lighting conditions. Significant feature points in the ASL images were selected using the scale-invariant feature transform (SIFT). Finally, an artificial neural network (ANN) classified the hand gestures using the selected features for validation. The suggested method was successful across a variety of illumination conditions and image sizes. The overall effectiveness of the ANN architecture was shown by a 97.6% recognition accuracy, with only a 2.4% error rate, on the 26-letter alphabet dataset.
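
A rough sketch of the SIFT-plus-ANN stage, assuming OpenCV's SIFT implementation and scikit-learn's MLPClassifier as the neural network; averaging the keypoint descriptors into one fixed-length vector is a simplification of the paper's feature-selection step.

```python
import cv2
import numpy as np
from sklearn.neural_network import MLPClassifier

sift = cv2.SIFT_create()

def sift_feature(img_gray, dims=128):
    """Pool the SIFT descriptors of one gesture image into a fixed-length vector."""
    _, desc = sift.detectAndCompute(img_gray, None)
    if desc is None:                      # no keypoints found
        return np.zeros(dims, dtype=np.float32)
    return desc.mean(axis=0)              # simple average pooling (illustrative)

def train_classifier(gesture_images, labels):
    """gesture_images: grayscale ASL images; labels: letter classes 0..25."""
    X = np.stack([sift_feature(im) for im in gesture_images])
    ann = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500)
    ann.fit(X, labels)
    return ann
```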

A Study on the Effective Command Delivery of Commanders Using Speech Recognition Technology (국방 분야에서 전장 소음 환경 하에 음성 인식 기술 연구)

  • Yeong-hoon Kim;Hyun Kwon
    • Convergence Security Journal
    • /
    • v.24 no.2
    • /
    • pp.161-165
    • /
    • 2024
  • Recently, speech recognition models have been advancing, accompanied by the development of various speech processing technologies for obtaining high-quality data. In the defense sector, efforts are being made to integrate technologies that effectively remove noise from speech data in noisy battlefield situations and enable efficient speech recognition. This paper proposes a method for effective speech recognition amid the diverse noise of a battlefield scenario, allowing commanders to convey orders. The proposed method removes noise from the noisy speech and then converts it to text using OpenAI's Whisper model. Experimental results show that the proposed method reduces the Character Error Rate (CER) by 6.17% compared to the existing method that does not remove noise. Additionally, potential applications of the proposed method in the defense sector are discussed.
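
A hedged sketch of the denoise-then-transcribe idea: the abstract does not name the noise-removal method, so the noisereduce package is used here as a stand-in, with OpenAI's open-source whisper package for transcription and a small Levenshtein-based CER function.

```python
import numpy as np
import soundfile as sf
import noisereduce as nr     # stand-in denoiser; the paper's exact method is not specified
import whisper

def transcribe_denoised(path):
    """Denoise a recording, then transcribe it with Whisper."""
    audio, sr = sf.read(path)                        # mono speech recording
    clean = nr.reduce_noise(y=audio, sr=sr)          # suppress background noise
    sf.write("clean.wav", clean, sr)
    model = whisper.load_model("base")               # OpenAI Whisper
    return model.transcribe("clean.wav")["text"]

def cer(ref, hyp):
    """Character Error Rate via Levenshtein distance."""
    d = np.zeros((len(ref) + 1, len(hyp) + 1), dtype=int)
    d[:, 0] = np.arange(len(ref) + 1)
    d[0, :] = np.arange(len(hyp) + 1)
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[len(ref), len(hyp)] / max(len(ref), 1)
```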

A Study on Measuring the Speaking Rate of Speaking Signal by Using Line Spectrum Pair Coefficients

  • Jang, Kyung-A;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3E
    • /
    • pp.18-24
    • /
    • 2001
  • Speaking rate indicates how many phonemes a speech signal contains within a limited time. It varies with the speaker and with the characteristics of each phoneme. In current speech recognition systems, preprocessing to remove the effect of speaking-rate variation is necessary before recognition, so if the speaking rate can be estimated in advance, recognition performance can be improved. Conventional speech vocoders, however, decide the transmission rate by analyzing fixed periods regardless of the rate at which phonemes change; if the speaking rate can be estimated in advance, it is also valuable information for speech coding, since it improves sound quality in the vocoder and allows a variable transmission rate. In this paper, we propose a method for representing the speaking rate as a parameter in a speech vocoder. To estimate the speaking rate, the variation of phonemes is estimated using Line Spectrum Pairs. Comparing the speaking rate obtained by the proposed algorithm with a manual method performed by eye, the error between the two methods was 5.38% for fast utterances and 1.78% for slow utterances, and the accuracy between the two methods was 98% for slow utterances and 94% for fast utterances at 30 dB SNR and 10 dB SNR, respectively.
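
A sketch of the underlying idea under stated assumptions: Line Spectrum Pairs are computed from frame-wise LPC coefficients, and the frame-to-frame LSP change serves as a proxy for phoneme variation; librosa supplies the LPC analysis, and the change threshold is illustrative, not the paper's.

```python
import numpy as np
import librosa

def lpc_to_lsf(a):
    """Line spectral frequencies (LSP angles) from LPC coefficients a = [1, a1, ..., ap]."""
    a_ext = np.concatenate([a, [0.0]])
    P = a_ext + a_ext[::-1]          # sum polynomial
    Q = a_ext - a_ext[::-1]          # difference polynomial
    lsf = []
    for poly in (P, Q):
        for ang in np.angle(np.roots(poly)):
            if 0 < ang < np.pi:      # keep roots on the upper unit semicircle
                lsf.append(ang)
    return np.sort(np.array(lsf))

def speaking_rate_proxy(y, sr, order=10, frame=0.03, hop=0.01):
    """Count frames whose LSF pattern changes sharply (a phoneme-change proxy).

    y: mono speech signal (e.g., from librosa.load); returns changes per second.
    """
    n, h = int(frame * sr), int(hop * sr)
    lsfs = []
    for start in range(0, len(y) - n, h):
        a = librosa.lpc(y[start:start + n], order=order)
        lsf = lpc_to_lsf(a)
        if len(lsf) == order:        # guard against numerically odd frames
            lsfs.append(lsf)
    lsfs = np.stack(lsfs)
    change = np.linalg.norm(np.diff(lsfs, axis=0), axis=1)
    threshold = change.mean() + change.std()      # illustrative threshold
    return (change > threshold).sum() / (len(y) / sr)
```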

Selecting Good Speech Features for Recognition

  • Lee, Young-Jik;Hwang, Kyu-Woong
    • ETRI Journal
    • /
    • v.18 no.1
    • /
    • pp.29-41
    • /
    • 1996
  • This paper describes a method for selecting suitable features for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a portion of frequency components, cepstrum, mel-cepstrum, energy, and their time differences of the speech waveform as their speech features. However, these systems cannot perform well if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable a selected feature is. To solve this problem, it is essential to analyze the feature itself and measure how good it is. Good speech features should contain all of the class-related information and as little class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of class-irrelevant variation based on Shannon's information theory. Using this method, we compare the mel-scaled FFT, cepstrum, mel-cepstrum, and wavelet features of the TIMIT speech data. The results show that, among these features, the mel-scaled FFT is the best feature for speech recognition according to the proposed measure.
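
The paper derives its own Shannon-theory measure; as a simpler stand-in, the sketch below ranks candidate feature sets by the average mutual information between feature dimensions and phone labels using scikit-learn.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def class_information(features, labels):
    """Average mutual information between feature dimensions and phone classes.

    features: (n_frames, n_dims) array for one candidate representation
    labels:   (n_frames,) integer phone classes
    A higher score means the representation carries more class-related information.
    """
    mi = mutual_info_classif(features, labels, discrete_features=False)
    return mi.mean()

# Usage sketch: compare candidate speech features extracted from the same frames
# (variable names are placeholders for precomputed feature matrices).
# scores = {name: class_information(feat, labels)
#           for name, feat in {"mel_fft": X_mel, "cepstrum": X_cep,
#                              "mel_cepstrum": X_mfcc}.items()}
```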

Human Motion Recognition Based on Spatio-temporal Convolutional Neural Network

  • Hu, Zeyuan;Park, Sange-yun;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.8
    • /
    • pp.977-985
    • /
    • 2020
  • To address the problems of complex feature extraction and low accuracy in human action recognition, this paper proposes a network structure combining the batch normalization algorithm with the GoogLeNet network model. Applying the batch normalization idea from image classification to action recognition, the algorithm normalizes the network's input training samples by mini-batch. For the convolutional network, the RGB image is the spatial input and stacked optical flow is the temporal input; the spatial and temporal networks are then fused to obtain the final action recognition result. The architecture was trained and evaluated on the standard video action benchmarks UCF101 and HMDB51, achieving accuracies of 93.42% and 67.82%, respectively. The results show that the improved convolutional neural network significantly improves the recognition rate and has clear advantages in action recognition.
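
A compact PyTorch sketch of the two-stream idea, with small batch-normalized CNNs standing in for the GoogLeNet streams; the spatial stream takes an RGB frame, the temporal stream takes stacked optical flow, and the class scores are fused by averaging.

```python
import torch
import torch.nn as nn

def stream(in_channels, num_classes):
    """Small batch-normalised CNN standing in for one GoogLeNet stream."""
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, num_classes),
    )

class TwoStreamNet(nn.Module):
    def __init__(self, num_classes=101, flow_stack=10):
        super().__init__()
        self.spatial = stream(3, num_classes)               # RGB frame input
        self.temporal = stream(2 * flow_stack, num_classes) # stacked optical flow (x, y)

    def forward(self, rgb, flow):
        # Late fusion: average the per-stream class scores.
        return (self.spatial(rgb).softmax(-1) + self.temporal(flow).softmax(-1)) / 2

# Smoke test with random tensors shaped like one UCF101-style sample.
net = TwoStreamNet()
scores = net(torch.randn(1, 3, 224, 224), torch.randn(1, 20, 224, 224))
```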

A Study on Design and Implementation of Embedded System for speech Recognition Process

  • Kim, Jung-Hoon;Kang, Sung-In;Ryu, Hong-Suk;Lee, Sang-Bae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.2
    • /
    • pp.201-206
    • /
    • 2004
  • This study developed a speech recognition module applied to a wheelchair for the physically handicapped. In the proposed speech recognition module, a TMS320C32 was used as the main processor, and a 12th-order mel-cepstrum was applied in the preprocessing step to increase the recognition rate in a noisy environment. DTW (Dynamic Time Warping) was used for the speaker-dependent recognition part and proved to give excellent results. To use this algorithm more efficiently, the reference data were compressed to 1/12 of their size using vector quantization to reduce memory. The diverse technologies required to run the speech recognition system in real time (end-point detection, DMA processing, etc.) are also covered in this paper.
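
A desktop Python sketch of the recognition path (the paper itself runs on a TMS320C32 DSP): 12th-order mel-cepstral features, a vector-quantized reference template to cut memory, and plain DTW for speaker-dependent matching; librosa and SciPy stand in for the embedded code.

```python
import numpy as np
import librosa
from scipy.cluster.vq import kmeans2

def features(y, sr):
    """12th-order mel-cepstral features, one row per frame."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12).T

def compress(ref_feats, codebook_size=16):
    """Vector-quantise a reference template to shrink its memory footprint."""
    codebook, codes = kmeans2(ref_feats.astype(float), codebook_size, minit="++")
    return codebook[codes]                      # quantised reference sequence

def dtw_distance(a, b):
    """Plain DTW between two feature sequences (rows = frames)."""
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return d[n, m]

# Recognition sketch: pick the command whose compressed template is closest.
# word = min(templates, key=lambda w: dtw_distance(features(test_y, sr), templates[w]))
```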

Speech Recognition Using HMM Based on Fuzzy (피지에 기초를 둔 HMM을 이용한 음성 인식)

  • 안태옥;김순협
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.28B no.12
    • /
    • pp.68-74
    • /
    • 1991
  • This paper proposes an HMM model based on fuzzy logic as a method for speaker-independent speech recognition. In this recognition method, multi-observation sequences are obtained by assigning probabilities with a fuzzy rule according to the rank of distances to the VQ codebook. An HMM model is then generated using these multi-observation sequences, and at recognition time the word with the highest probability is selected as the recognized word. The vocabulary for the recognition experiments consists of 146 DDD area names, and the feature parameters are 10th-order LPC cepstrum coefficients. In addition to the recognition experiments with the proposed model, experiments with DP, MSVQ, and a conventional HMM were performed under the same conditions and data for comparison. The experimental results show that the proposed fuzzy-based HMM model is superior to the DP method, MSVQ, and the conventional HMM model in recognition rate and computation time.
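
A sketch of the fuzzy multi-observation step as described: each frame's k nearest VQ codewords receive membership weights derived from their distances, and these weighted symbols would then feed a discrete HMM (the HMM training itself is omitted here).

```python
import numpy as np

def fuzzy_observations(frame_feats, codebook, k=3):
    """For each feature frame, return the k nearest codeword indices and
    fuzzy membership weights (closer codewords get larger weights, summing to 1)."""
    obs = []
    for x in frame_feats:
        dists = np.linalg.norm(codebook - x, axis=1)
        nearest = np.argsort(dists)[:k]                  # k closest codewords
        inv = 1.0 / (dists[nearest] + 1e-8)
        weights = inv / inv.sum()                        # fuzzy memberships
        obs.append((nearest, weights))
    return obs

# In a discrete HMM, the emission probability of a frame would then be the
# weighted sum of the k codeword emission probabilities instead of a single symbol.
```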

A Recognition System for Multi-Form Korean Characters Based on Hierarchical Temporal Memory

  • Haibao, Nan;Bae, Sun-Gap;Bae, Jong-Min;Kang, Hyun-Syug
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.12
    • /
    • pp.1718-1727
    • /
    • 2009
  • Traditional character recognition systems usually target characters with simple variation. With the development of multimedia technology, printed characters may appear in more diverse forms, and existing recognition technologies cannot handle Hangul recognition effectively in such diverse environments. This paper presents a recognition system for multi-form Korean characters, called RSMFK, based on the Hierarchical Temporal Memory (HTM) model. The system can effectively recognize printed Korean characters with different fonts, scales, rotations, noise, and backgrounds. HTM is a model that simulates the human neocortex to recognize and memorize intelligently. Experimental results show that RSMFK achieves a good recognition rate of 97.8% on average, a clear improvement over conventional methods.
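
The abstract does not detail the HTM internals, so the sketch below only shows how "multi-form" test images (different fonts, scales, rotations, noise) might be generated with Pillow and NumPy; the font file names are placeholders and the HTM network itself is not shown.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_variants(char, font_paths, sizes=(32, 48, 64), angles=(0, 10, -10),
                    noise_std=12, canvas=96):
    """Render one Hangul character in several fonts, scales and rotations,
    then add Gaussian pixel noise. font_paths are placeholder .ttf files."""
    out = []
    for path in font_paths:
        for size in sizes:
            font = ImageFont.truetype(path, size)
            img = Image.new("L", (canvas, canvas), color=255)
            ImageDraw.Draw(img).text((canvas // 4, canvas // 4), char, font=font, fill=0)
            for angle in angles:
                arr = np.asarray(img.rotate(angle, fillcolor=255), dtype=np.float32)
                arr += np.random.normal(0, noise_std, arr.shape)   # background noise
                out.append(np.clip(arr, 0, 255).astype(np.uint8))
    return out

# variants = render_variants("한", ["NanumGothic.ttf", "NanumMyeongjo.ttf"])
```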

Intelligent Activity Recognition based on Improved Convolutional Neural Network

  • Park, Jin-Ho;Lee, Eung-Joo
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.6
    • /
    • pp.807-818
    • /
    • 2022
  • To further improve the accuracy and time efficiency of behavior recognition in intelligent monitoring scenarios, a human behavior recognition algorithm based on YOLO combined with LSTM and CNN is proposed. Using the real-time nature of YOLO target detection, the specific behavior in the surveillance video is first detected in real time, and deep feature extraction is performed after obtaining the target's size, location, and other information; noise data from irrelevant areas of the image are then removed; finally, combining LSTM modeling of the time series, the final behavior discrimination is made for the action sequence in the surveillance video. Experiments on the MSR and KTH datasets show that the average recognition rate for each behavior reaches 98.42% and 96.6%, and the average recognition time reaches 210 ms and 220 ms. The method in this paper performs well for intelligent behavior recognition.
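
A PyTorch sketch of the detect-then-classify idea: a person detector (not shown here) crops the actor in each frame, a small CNN extracts per-frame features, and an LSTM classifies the sequence; the network sizes and class count are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class ActionLSTM(nn.Module):
    """Per-frame CNN features -> LSTM over time -> action class scores."""
    def __init__(self, num_classes=6, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, clips):
        # clips: (batch, time, 3, H, W) person crops from the detector stage
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])          # classify using the last time step

# Smoke test: 16 cropped frames of a detected person, resized to 112x112.
scores = ActionLSTM()(torch.randn(1, 16, 3, 112, 112))
```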