• Title/Summary/Keyword: recognition-rate

Search Results: 2,809

Research on Methods to Increase Recognition Rate of Korean Sign Language using Deep Learning

  • So-Young Kwon;Yong-Hwan Lee
    • Journal of Platform Technology / v.12 no.1 / pp.3-11 / 2024
  • Deaf people who use sign language as their first language sometimes have difficulty communicating because they do not know spoken Korean. Deaf people are also members of society, so we must work to create a society in which everyone can live together. In this paper, we present a method to increase the recognition rate of Korean sign language using a CNN model. When the original image was used as input to the CNN model, the accuracy was 0.96; when only the skin region extracted in the YCbCr color space was used as input, the accuracy was 0.72. This confirmed that feeding in the original image itself leads to better results. In other studies, a combined Conv1d-LSTM model reached an accuracy of 0.92, as did an AlexNet model. The CNN model proposed in this paper achieves 0.96 and is shown to be helpful in recognizing Korean sign language.

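The paper does not state its exact YCbCr thresholds, but the skin-region input it compares against can be sketched roughly as below. The BT.601 conversion is standard; the Cb/Cr skin ranges are commonly cited defaults, not the authors' values:

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 uint8 RGB image to YCbCr (ITU-R BT.601, full range)."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of pixels falling inside a typical Cb/Cr skin-tone box."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

Masking out everything but this region is what reduced the model's input to the 0.72-accuracy variant in the paper's comparison.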
A Study on the Real Time Processing Technique of speech Signal (음성신호의 실시간 처리기법에 관한 연구)

  • Lee, Taek-Soo;Rhn, Chang;Kim, Sung-Nak;Rhee, Sang-Burm
    • Proceedings of the KIEE Conference / 1987.07b / pp.1094-1096 / 1987
  • Zero-crossing analysis techniques have been applied to speech recognition. The zero-crossing rate, level-crossing rate, and differentiated zero-crossing rate in the time domain were used to analyze speech signals. Speech samples could be stored in a memory buffer in real time.

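The two main measures named in the abstract are simple frame statistics; a minimal sketch of both:

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(np.asarray(frame, dtype=float))
    signs[signs == 0] = 1          # count exact zeros as positive
    return float(np.mean(signs[:-1] != signs[1:]))

def level_crossing_rate(frame, level):
    """Same idea, but crossings of a nonzero threshold instead of zero."""
    return zero_crossing_rate(np.asarray(frame, dtype=float) - level)
```

Voiced speech typically has a low zero-crossing rate and unvoiced fricatives a high one, which is why these measures work as cheap time-domain features.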
Method of the Analysis and the Visualization of Urban Landscape in Seoul : Focus on the Difference of Cognitions between Korean and Japanese (서울의 도시경관 이미지 분석 및 시각화 방법 : 한국인과 일본인의 인식차이를 중심으로)

  • Kim, Jung-Seop;Lee, Kyoo-Hwang
    • The Journal of the Korea Contents Association / v.12 no.10 / pp.148-158 / 2012
  • Globalization is prompting many cities around the world to try to enhance their awareness and image by establishing a distinct identity. In line with this trend, various studies have been conducted in Seoul to establish the identity of its urban landscape. However, most of these studies have considered only particular problems of the urban landscape and their improvement; the current built urban landscape and its own identity have not yet been studied. In this study, we propose a framework for analyzing and visualizing the urban landscape through an Internet survey, based on differences in cognition and recognition between Korean and Japanese respondents. The core of the framework, which analyzes these differences, is the correct rate and the clear rate extracted from the survey answers; visualization is performed with diagrams of an accuracy rate derived from the correlation between the correct rate and the clear rate.

A Design on Face Recognition System Based on pRBFNNs by Obtaining Real Time Image (실시간 이미지 획득을 통한 pRBFNNs 기반 얼굴인식 시스템 설계)

  • Oh, Sung-Kwun;Seok, Jin-Wook;Kim, Ki-Sang;Kim, Hyun-Ki
    • Journal of Institute of Control, Robotics and Systems / v.16 no.12 / pp.1150-1158 / 2010
  • In this study, Polynomial-based Radial Basis Function Neural Networks (pRBFNNs) are proposed as the recognition part of an overall face recognition system consisting of two parts: a preprocessing part and a recognition part. The design methodology and procedure of the proposed pRBFNNs are presented as a solution to this high-dimensional pattern recognition problem. First, in the preprocessing part, a CCD camera is used to obtain picture frames in real time. Histogram equalization partially enhances images distorted by natural as well as artificial illumination. The AdaBoost algorithm proposed by Viola and Jones is exploited to separate the facial image area from the non-facial area, and PCA is then used as the feature extraction algorithm to reduce the dimensionality of the high-dimensional facial image area. Second, pRBFNNs identify each person's ID by recognizing his or her unique pattern. The proposed pRBFNNs architecture consists of three functional modules, the condition part, the conclusion part, and the inference part, expressed as fuzzy rules in 'If-then' format. In the condition part, the input space is partitioned with Fuzzy C-Means clustering. In the conclusion part, the connection weights of the pRBFNNs are represented as three kinds of polynomials (constant, linear, and quadratic) whose coefficients are identified by back-propagation using the gradient descent method. The output of the pRBFNNs model is obtained by the fuzzy inference method in the inference part. The essential design parameters of the networks (including the learning rate, momentum coefficient, and fuzzification coefficient) are optimized by means of Particle Swarm Optimization. The proposed pRBFNNs are applied to a real-time face recognition system and evaluated in terms of output performance and recognition rate.
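The condition-part partitioning named above uses standard Fuzzy C-Means; a generic sketch of the textbook update rules (m is the fuzzification coefficient the abstract mentions), not the authors' implementation:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Standard FCM: returns (centers V, membership matrix U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0)                               # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)           # center update
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        Dp = D ** (2.0 / (m - 1.0))
        U = 1.0 / (Dp * (1.0 / Dp).sum(axis=0))                # membership update
    return V, U
```

In the pRBFNNs setting, the resulting memberships play the role of the radial basis activations in the condition part of each fuzzy rule.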

Importance of Dynamic Cue in Silhouette-Based Gait Recognition (실루엣 기반 걸음걸이 인식 방법에서 동적 단서의 중요성)

  • Park Hanhoon;Park Jong-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.3 s.303 / pp.23-30 / 2005
  • As a human identification technique, gait recognition has recently gained significant attention, and silhouette-based gait recognition is one of the most popular methods. This paper investigates the features that determine walking style in silhouette-based gait recognition. Gait can be represented using two cues: a static (shape) cue and a dynamic (motion) cue. Recent results reported in the literature claim that the characteristics of gait are mainly determined by the static cue and are not affected by the dynamic cue. In contrast, the experimental results in this paper verify that the dynamic cue is as important as, and in many cases more important than, the static cue. For the experiments, we use two well-known gait databases: the UBC DB and the Southampton Small DB. The images of the UBC DB correspond to an 'ordinary' style of walking; those of the Southampton Small DB correspond to a 'disguised' style (not ordinary, due to special clothes or bags). For the UBC DB, the recognition rate was 100% using the static cue and 95.2% using the dynamic cue; for the Southampton Small DB, it was 50.0% using the static cue and 55.8% using the dynamic cue. The risk against correct recognition was 0.91 (static) and 0.97 (dynamic) for the UBC DB, and 0.98 for both cues for the Southampton Small DB. Consequently, the characteristics of ordinary gait are mainly determined by the static cue, but those of disguised gait by the dynamic cue.

An Implementation of Gaze Direction Recognition System using Difference Image Entropy (차영상 엔트로피를 이용한 시선 인식 시스템의 구현)

  • Lee, Kue-Bum;Chung, Dong-Keun;Hong, Kwang-Seok
    • The KIPS Transactions:PartB / v.16B no.2 / pp.93-100 / 2009
  • In this paper, we propose a gaze direction recognition system based on Difference Image Entropy. The Difference Image Entropy is computed from the histogram levels, spanning −255 to +255 to prevent information loss, of the difference between the current image and reference images or average images. There are two methods. 1) The first computes the Difference Image Entropy between an input image and the average image of 45 images for each gaze location, and recognizes the direction of the user's gaze. 2) The second computes the Difference Image Entropy between an input image and each of the 45 reference images, and recognizes the direction of the user's gaze. The average image is created from the 45 images for each gaze location after acquiring images for the four directions. To evaluate the performance of the proposed system, we conduct a comparison experiment with a PCA-based gaze direction system. The recognized directions are left-top, right-top, left-bottom, and right-bottom, and the experiments vary whether the 45 reference images or the average image is used for recognition. The experimental results show a recognition rate of 97.00% for the Difference Image Entropy based system and 95.50% for the PCA-based system, so the Difference Image Entropy based system is 1.50 percentage points higher.
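
The entropy measure described above can be sketched as the Shannon entropy of the difference-image histogram over the full −255 to +255 range (a generic sketch; the exact binning is an assumption):

```python
import numpy as np

def difference_image_entropy(img, ref):
    """Shannon entropy (bits) of the histogram of (img - ref), bins -255..+255."""
    diff = img.astype(np.int16) - ref.astype(np.int16)   # signed differences
    hist, _ = np.histogram(diff, bins=511, range=(-255.5, 255.5))
    p = hist / hist.sum()
    p = p[p > 0]                                         # 0·log 0 := 0
    return float(-(p * np.log2(p)).sum())
```

Identical images give zero entropy; the more the input departs from a direction's reference or average image, the larger the entropy, so the direction with the minimum value is the natural match.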

Building robust Korean speech recognition model by fine-tuning large pretrained model (대형 사전훈련 모델의 파인튜닝을 통한 강건한 한국어 음성인식 모델 구축)

  • Changhan Oh;Cheongbin Kim;Kiyoung Park
    • Phonetics and Speech Sciences / v.15 no.3 / pp.75-82 / 2023
  • Automatic speech recognition (ASR) has been revolutionized by deep learning-based approaches, among which self-supervised learning methods have proven particularly effective. In this study, we aim to enhance the performance of OpenAI's Whisper model, a multilingual ASR system, on the Korean language. Whisper was pretrained on a large corpus (around 680,000 hours) of web speech data and has demonstrated strong recognition performance for major languages. However, it faces challenges in recognizing languages such as Korean, which was not a major language during training. We address this issue by fine-tuning the Whisper model with an additional dataset comprising about 1,000 hours of Korean speech. We also compare its performance against a Transformer model trained from scratch on the same dataset. Our results indicate that fine-tuning the Whisper model significantly improved its Korean speech recognition capabilities in terms of character error rate (CER). Specifically, performance improved with increasing model size. However, the Whisper model's performance on English deteriorated after fine-tuning, emphasizing the need for further research to develop robust multilingual models. Our study demonstrates the potential of a fine-tuned Whisper model for Korean ASR applications. Future work will focus on multilingual recognition and optimization for real-time inference.

A Study on the Improvement of the Facial Image Recognition by Extraction of Tilted Angle (기울기 검출에 의한 얼굴영상의 인식의 개선에 관한 연구)

  • 이지범;이호준;고형화
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.7 / pp.935-943 / 1993
  • In this paper, a robust recognition system for tilted facial images was developed. First, a standard facial image and a tilted facial image are captured by a CCTV camera and transformed into binary images. Each binary image is processed with a Laplacian edge operator to obtain a contour image. We trace and delete the outermost edge line and use the inner contour lines. We label four inner contour lines in order, extract the left and right eyes using a known distance relationship, and calculate the slope from the two eye coordinates. Finally, we rotate the tilted image according to the slope information and calculate ten element-to-element distance features. To make the system invariant to image scale, these features are normalized by the distance between the left and right eyes. Experimental results show an 88% recognition rate for twenty-five face images when the tilt angle is considered and a 60% recognition rate when it is not.

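The slope and normalization steps described above reduce to simple geometry; a sketch under the assumption that eye centers are given as (x, y) pixel coordinates (the function names are illustrative):

```python
import math

def tilt_angle(left_eye, right_eye):
    """Rotation angle (degrees) of the line through the two eye centers."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def normalize_features(distances, left_eye, right_eye):
    """Scale-invariant features: divide each distance by the inter-eye distance."""
    d = math.dist(left_eye, right_eye)
    return [x / d for x in distances]
```

Rotating the image by the negative of this angle levels the eye line, after which the ten distance features can be measured in a canonical pose.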
CNN-based Image Rotation Correction Algorithm to Improve Image Recognition Rate (이미지 인식률 개선을 위한 CNN 기반 이미지 회전 보정 알고리즘)

  • Lee, Donggu;Sun, Young-Ghyu;Kim, Soo-Hyun;Sim, Issac;Lee, Kye-San;Song, Myoung-Nam;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.1 / pp.225-229 / 2020
  • Recently, convolutional neural networks (CNNs) have shown outstanding performance in image recognition, image processing, computer vision, and related fields. In this paper, we propose a CNN-based image rotation correction algorithm as a solution to the image rotation problem, one of the factors that reduces the recognition rate in CNN-based image recognition systems. We trained our deep learning model on the Leeds Sports Pose dataset to extract the rotation angle, which is set randomly within a specific range. The trained model is evaluated by the mean absolute error (MAE) over 100 test images, obtaining an MAE of 4.5951.

Sound Model Generation using Most Frequent Model Search for Recognizing Animal Vocalization (최대 빈도모델 탐색을 이용한 동물소리 인식용 소리모델생성)

  • Ko, Youjung;Kim, Yoonjoong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.1 / pp.85-94 / 2017
  • In this paper, I propose a sound model generation algorithm and a most frequent model search algorithm for recognizing animal vocalizations. The sound model generation algorithm generates an optimal set of models by repeating the training process, the Viterbi search process, and the most frequent model search process while adjusting the HMM (Hidden Markov Model) structure to improve the global recognition rate. The most frequent model search algorithm searches the list of models produced by the Viterbi search for the most frequent model and makes it the final decision of the recognition process. The system is implemented using MFCCs (Mel Frequency Cepstral Coefficients) for the sound features, HMMs for the models, and the C# programming language. To evaluate the algorithm, a set of animal sounds for 27 species was prepared; the experiment showed that the sound model generation algorithm generates 27 HMM models with a recognition rate of 97.29 percent.
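
The final decision step, picking the model that appears most often among per-segment Viterbi results, is a majority vote; a sketch in Python rather than the paper's C# (the list-of-labels input format is an assumption):

```python
from collections import Counter

def most_frequent_model(viterbi_results):
    """Return the model label appearing most often in per-segment Viterbi decisions."""
    counts = Counter(viterbi_results)
    model, _ = counts.most_common(1)[0]
    return model
```

Voting over many short segments makes the final decision robust to individual segments that a single HMM happens to misclassify.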