• Title/Summary/Keyword: recognition-rate

Building a Korean conversational speech database in the emergency medical domain (응급의료 영역 한국어 음성대화 데이터베이스 구축)

  • Kim, Sunhee;Lee, Jooyoung;Choi, Seo Gyeong;Ji, Seunghun;Kang, Jeemin;Kim, Jongin;Kim, Dohee;Kim, Boryong;Cho, Eungi;Kim, Hojeong;Jang, Jeongmin;Kim, Jun Hyung;Ku, Bon Hyeok;Park, Hyung-Min;Chung, Minhwa
    • Phonetics and Speech Sciences / v.12 no.4 / pp.81-90 / 2020
  • This paper describes a method of building Korean conversational speech data in the emergency medical domain and proposes an annotation method for the collected data in order to improve speech recognition performance. To suggest future research directions, baseline speech recognition experiments were conducted using the portion of the data that had been collected and annotated. All speech was recorded at 16-bit resolution and a 16 kHz sampling rate. A total of 166 conversations were collected, amounting to 8 hours and 35 minutes. Various kinds of information, such as orthography, pronunciation, dialect, noise, and medical information, were manually transcribed using Praat. The baseline speech recognition experiments were used to illustrate problems related to speech recognition in the emergency medical domain. The Korean conversational speech data presented in this paper are first-stage data in the emergency medical domain and are expected to be used as training data for developing conversational systems for emergency medical applications.

Robust Speech Recognition Algorithm of Voice Activated Powered Wheelchair for Severely Disabled Person (중증 장애우용 음성구동 휠체어를 위한 강인한 음성인식 알고리즘)

  • Suk, Soo-Young;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea / v.26 no.6 / pp.250-258 / 2007
  • Current speech recognition technology has achieved high performance with the development of hardware devices; however, it is insufficient for some applications where high reliability is required, such as voice control of powered wheelchairs for disabled persons. For a system that aims to operate powered wheelchairs safely by voice in a real environment, non-voice inputs such as the user's coughing, breathing, and spark-like mechanical noise should be rejected, and the wheelchair system needs to recognize speech commands affected by disability, which have specific pronunciation speeds and frequencies. In this paper, we propose a non-voice rejection method that performs voice/non-voice classification using both YIN-based fundamental frequency (F0) extraction and reliability in preprocessing. We adopted a multi-template dictionary and acoustic-modeling-based speaker adaptation to cope with the pronunciation variation of inarticulately uttered speech. In recognition tests conducted with data collected in a real environment, the proposed YIN-based fundamental frequency extraction showed a recall-precision rate of 95.1%, better than the 62% of the cepstrum-based method. A recognition test of the new system with the multi-template dictionary and MAP adaptation also showed a much higher accuracy of 99.5%, compared with 78.6% for the baseline system.
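The voice/non-voice decision described in this abstract can be illustrated with a simplified YIN-style pitch detector. The sketch below is not the authors' implementation: the function name `yin_f0`, the search range of 60 to 400 Hz, the threshold of 0.1, and the definition of reliability as one minus the normalized difference are all assumptions chosen for illustration.

```python
import numpy as np

def yin_f0(frame, sample_rate, fmin=60.0, fmax=400.0, threshold=0.1):
    """Simplified YIN: return (f0_hz or None, reliability in roughly [0, 1])."""
    frame = np.asarray(frame, dtype=float)
    tau_min = int(sample_rate / fmax)   # shortest lag to test
    tau_max = int(sample_rate / fmin)   # longest lag to test
    window = len(frame) - tau_max       # samples compared at every lag
    if window <= 0:
        raise ValueError("frame is too short for the requested fmin")

    # Step 1: squared difference between the frame and its lagged copy.
    diff = np.zeros(tau_max + 1)
    for tau in range(1, tau_max + 1):
        delta = frame[:window] - frame[tau:tau + window]
        diff[tau] = np.dot(delta, delta)

    # Step 2: cumulative mean normalized difference (close to 0 at a true period).
    cmnd = np.ones(tau_max + 1)
    running_sum = np.cumsum(diff[1:])
    lags = np.arange(1, tau_max + 1)
    cmnd[1:] = diff[1:] * lags / np.maximum(running_sum, 1e-12)

    # Step 3: take the first dip below the threshold and follow it to its minimum.
    for tau in range(tau_min, tau_max + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sample_rate / tau, 1.0 - cmnd[tau]

    # No periodic structure found: treat the frame as non-voice.
    return None, 1.0 - cmnd[tau_min:].min()
```

A frame for which the function returns `None` (for example a cough or mechanical noise without clear periodicity) would be rejected before recognition; a real system would also need frame energy checks and smoothing across frames.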

Development of a Ship's Logbook Data Extraction Model Using OCR Program (OCR 프로그램을 활용한 선박 항해일지 데이터 추출 모델 개발)

  • Dain Lee;Sung-Cheol Kim;Ik-Hyun Youn
    • Journal of the Korean Society of Marine Environment & Safety / v.30 no.1 / pp.97-107 / 2024
  • Despite the rapid advancement of image recognition technology, achieving perfect digitization of tabular and handwritten documents remains a challenge. The purpose of this study is to improve the accuracy of logbook digitization by correcting errors using the associated rules that are followed during logbook entries. This is expected to enhance the accuracy and reliability of data extracted from logbooks through OCR programs. The model is intended to improve the accuracy of digitizing the logbook of the training ship "Saenuri" at the Mokpo Maritime University by correcting errors identified after recognition by an Optical Character Recognition (OCR) program. The model identified and corrected errors by applying these associated rules. To evaluate the effect of the model, the data before and after correction were divided by feature, and comparisons were made between the same sailing number and the same feature. Using this model, approximately 10.6% of errors were identified out of a total estimated error rate of about 11.8%, and 56 out of 123 errors were corrected. A limitation of this study is that it focuses only on the Dist.Run to Stand Course sections of the logbook, which contain navigational information. Future research will aim to correct more information from the logbook, including weather information, to overcome this limitation.
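The idea of correcting OCR output with entry rules can be sketched as follows. The abstract does not list the actual rules, so everything here is hypothetical: the character confusion table, the function name `correct_course`, and the rule that a course value must be an integer from 0 to 359 degrees are assumptions used only to show the pattern of "normalize, then validate against a rule".

```python
# Hypothetical look-alike characters that OCR may produce in a numeric field.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

def correct_course(raw):
    """Return (value, status) for a course field assumed to be 0-359 degrees."""
    original = raw.strip()
    cleaned = original.translate(CONFUSIONS)   # replace look-alike letters with digits
    if not cleaned.isdigit():
        return None, "unreadable"              # still not numeric: flag for manual review
    value = int(cleaned)
    if not 0 <= value <= 359:
        return None, "out_of_range"            # violates the assumed rule: flag it
    status = "ok" if cleaned == original else "corrected"
    return value, status

print(correct_course("27O"))   # letter O read in place of zero
print(correct_course("4l5"))   # becomes 415, which breaks the range rule
```

Values that fail the rule are flagged rather than guessed, which matches the abstract's distinction between errors that were identified and errors that were actually corrected.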

Novel License Plate Detection Method Based on Heuristic Energy

  • Sarker, Md.Mostafa Kamal;Yoon, Sook;Lee, Jaehwan;Park, Dong Sun
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.12 / pp.1114-1125 / 2013
  • License Plate Detection (LPD) is a key component of automatic license plate recognition systems. Despite the success of License Plate Recognition (LPR) methods in the past decades, the problem remains challenging due to the diversity of plate formats and the varied outdoor illumination conditions during image acquisition. This paper aims at automatic detection of car license plates via image processing techniques. We propose a real-time and robust method for license plate detection using a Heuristic Energy Map (HEM). In a vehicle image, the license plate region contains many components or edges. We obtain the edge energy values of an image by using a box filter and search for the license plate region with high energy values. Using this energy information, the Heuristic Energy Map (HEM), we can detect the license plate region in a vehicle image with very high probability. The proposed method consists of two main steps: Region of Interest (ROI) Detection and License Plate Detection. The method performs better in speed and accuracy than most existing methods for license plate detection. It can detect a license plate within 130 milliseconds, with a detection rate of 99.2%, on a 3.10-GHz Intel Core i3-2100 personal computer with 4.00 GB of RAM.
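The step of summing edge energy with a box filter and searching for the highest-energy region can be sketched in a few lines. This is a minimal illustration, not the paper's HEM: the use of a horizontal gradient as the edge measure, the integral-image box filter, the fixed plate-sized window, and the names `box_filter` and `find_plate_region` are assumptions.

```python
import numpy as np

def box_filter(image, height, width):
    """Sum over every height x width window (valid positions only) via an integral image."""
    integral = np.pad(image, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    return (integral[height:, width:] - integral[:-height, width:]
            - integral[height:, :-width] + integral[:-height, :-width])

def find_plate_region(gray, plate_height, plate_width):
    """Return (top, left, energy_map) of the window with the largest edge energy."""
    gray = np.asarray(gray, dtype=float)
    edges = np.zeros_like(gray)
    # Horizontal intensity changes respond strongly to the vertical strokes of characters.
    edges[:, 1:] = np.abs(np.diff(gray, axis=1))
    energy = box_filter(edges, plate_height, plate_width)
    top, left = np.unravel_index(np.argmax(energy), energy.shape)
    return int(top), int(left), energy

# Synthetic example: a striped 10 x 30 patch stands in for the characters on a plate.
image = np.zeros((40, 80))
image[20:30, 30:60:2] = 255
top, left, _ = find_plate_region(image, 10, 30)
print(top, left)
```

The integral image makes each window sum cost four lookups regardless of window size, which is one reason box filtering suits real-time detection.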

A Microphone Array Beamformer for the Performance Enhancement of Speech Recognizer in Car (차량환경에서 음성인식 성능 향상을 위한 마이크로폰 어레이 빔형성 기법)

  • Han Chul-Hee;Kang Hong-Goo;Hwang Youngsoo;Youn Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.24 no.7 / pp.423-430 / 2005
  • In this paper, a microphone array beamforming algorithm that reduces the signal distortion caused by reverberation and the near-field effect in a car environment is proposed. When reverberation or the near-field effect is present, an optimum beamformer should be constructed with a steering vector consisting of the transfer functions between the source and the microphones, but it is generally difficult to estimate these transfer functions on-line without knowledge of the source signal. Instead, a sub-optimal beamforming algorithm that reduces signal distortion is proposed. It is constructed with steering vectors consisting of relative transfer functions between a reference sensor and the other sensors. To evaluate the performance of the proposed algorithm, we recorded a noisy speech database in a car and performed speech recognition experiments with the HMM Toolkit (HTK) released by Cambridge University. In the best case, the recognition rate of the proposed algorithm was 15 percent higher than that of conventional far-field beamformers.
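To make the role of a steering vector of relative transfer functions concrete, the sketch below builds distortionless weights for a single frequency bin. The abstract does not give the paper's weight formula, so the matched-filter choice w = h / (h^H h) is a stand-in technique, and the function names and example values are assumptions.

```python
import numpy as np

def rtf_beamformer_weights(rtf):
    """Weights w = h / (h^H h), so that w^H h = 1 (no distortion of the target)."""
    rtf = np.asarray(rtf, dtype=complex)
    return rtf / np.vdot(rtf, rtf).real

def apply_beamformer(weights, mic_spectra):
    """Combine microphone spectra of shape (mics, frames) into one output per frame."""
    return np.conj(weights) @ mic_spectra

# Hypothetical relative transfer functions for one bin; the reference microphone is 1.
h = np.array([1.0, 0.8 * np.exp(-0.3j), 0.6 * np.exp(0.5j)])
source_at_reference = np.array([1 + 2j, -0.5j, 3.0])
observed = np.outer(h, source_at_reference)   # noise-free microphone signals
output = apply_beamformer(rtf_beamformer_weights(h), observed)
print(np.allclose(output, source_at_reference))
```

Because the steering vector is expressed relative to the reference sensor, the output reproduces the speech as observed at that sensor, which is why only relative (not absolute) transfer functions are needed.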

A Study on the Channel Normalized Pitch Synchronous Cepstrum for Speaker Recognition (채널에 강인한 화자 인식을 위한 채널 정규화 피치 동기 켑스트럼에 관한 연구)

  • 김유진;정재호
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.61-74 / 2004
  • In this paper, a context- and speaker-dependent cepstrum extraction method and a channel normalization method that minimizes the loss of speaker characteristics in the cepstrum are proposed for a speaker recognition system that is robust to channel effects. The proposed extraction method creates a cepstrum based on pitch synchronous analysis using the inherent pitch of the speaker. The resulting cepstrum, called the "pitch synchronous cepstrum" (PSC), therefore represents the impulse response of the vocal tract more accurately in voiced speech. The PSC can also compensate for channel distortion because the pitch is more robust in a channel environment than the spectrum of speech. The proposed channel normalization method, the "formant-broadened pitch synchronous CMS" (FBPSCMS), applies the Formant-Broadened CMS to the PSC and improves the accuracy of the intraframe processing. We compared text-independent closed-set speaker identification on 56 females and 112 males using the TIMIT and NTIMIT databases, respectively. The results show that the pitch synchronous cepstrum improves the error reduction rate by up to 7.7% in comparison with the conventional short-time cepstrum, and that the error rates of the FBPSCMS are more stable and lower than those of pole-filtered CMS.
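The channel normalization methods named in this abstract build on cepstral mean subtraction (CMS). The sketch below shows only plain CMS, not the pitch synchronous or formant-broadened variants the paper proposes; the function name and the synthetic data are assumptions. It relies on the fact that a fixed linear channel adds a constant vector to every frame in the cepstral domain.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over frames; input shape is (frames, coefficients)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.standard_normal((50, 13))     # stand-in cepstral features
channel = rng.standard_normal(13)         # constant offset added by a fixed channel
distorted = clean + channel

# After CMS the channel offset is gone, so both versions give the same features.
print(np.allclose(cepstral_mean_subtraction(clean),
                  cepstral_mean_subtraction(distorted)))
```

Plain CMS also removes the long-term average of the speaker's own spectrum, which is the loss of speaker characteristics that the abstract says its normalization method aims to minimize.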

Face Recognition based on Weber Symmetrical Local Graph Structure

  • Yang, Jucheng;Zhang, Lingchao;Wang, Yuan;Zhao, Tingting;Sun, Wenhui;Park, Dong Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1748-1759 / 2018
  • Weber Local Descriptor (WLD) is a stable and effective feature extraction algorithm based on Weber's Law. It calculates the differential excitation information and the direction information, and then integrates them to obtain the feature information of the image. However, WLD considers only the center pixel and its contrast with the surrounding pixels when calculating the differential excitation information. As a result, it is relatively sensitive to illumination variation, and the selected neighbor area is rather small. This may divide the overall information into small pieces, making recognition difficult. To overcome this problem, this paper proposes the Weber Symmetrical Local Graph Structure (WSLGS), which constructs the graph structure based on the $5 \times 5$ neighborhood. The information obtained is then regarded as the differential excitation information. Finally, we demonstrate the effectiveness of the proposed method on the ORL and JAFFE databases and on our own database of high-definition infrared faces. The experimental results show that WSLGS provides a higher recognition rate and a shorter image processing time than traditional algorithms.
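The differential excitation of the baseline WLD, which the abstract describes as using only the center pixel and its surrounding pixels, can be sketched as below. This illustrates the standard 3x3 formulation, arctan of the summed neighbor differences divided by the center intensity, and not the proposed WSLGS; the edge padding and the small `eps` guard against division by zero are assumptions.

```python
import numpy as np

def differential_excitation(image, eps=1e-6):
    """WLD differential excitation: arctan(sum(neighbors - center) / center) per pixel."""
    img = np.asarray(image, dtype=float)
    height, width = img.shape
    padded = np.pad(img, 1, mode="edge")   # repeat border pixels so every pixel has 8 neighbors
    neighbor_sum = np.zeros_like(img)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue                   # skip the center pixel itself
            neighbor_sum += padded[1 + dy:1 + dy + height, 1 + dx:1 + dx + width]
    return np.arctan((neighbor_sum - 8.0 * img) / (img + eps))

patch = np.ones((3, 3))
patch[1, 1] = 2.0                          # center brighter than its neighbors
print(differential_excitation(patch)[1, 1])
```

Dividing by the center intensity is what ties the descriptor to Weber's Law: the same absolute difference counts for less on a bright background than on a dark one.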

A Study on Vision Based Gesture Recognition Interface Design for Digital TV (동작인식기반 Digital TV인터페이스를 위한 지시동작에 관한 연구)

  • Kim, Hyun-Suk;Hwang, Sung-Won;Moon, Hyun-Jung
    • Archives of design research / v.20 no.3 s.71 / pp.257-268 / 2007
  • The development of human-computer interfaces has relied on the development of technology. Mice and keyboards are the most popular HCI devices for personal computing. However, device-based interfaces are quite different from human-to-human interaction and are very artificial. Developing more intuitive interfaces that mimic human-to-human interaction has been a major research topic among HCI researchers and engineers. Also, technology in the TV industry has developed rapidly, and the market penetration rate of large-screen TVs has increased rapidly. HDTV and digital TV broadcasting are being tested. These changes in the TV environment require changes in the human-to-TV interface. A gesture recognition-based interface with a computer vision system can replace the remote control-based interface because of its immediacy and intuitiveness. This research focuses on how people use their hands or arms for command gestures. A set of gestures for controlling a TV was sampled through focus group interviews and surveys. The results of this paper can be used as a reference for designing a computer vision-based TV interface.

The Effect of Subjective Body Type Recognition on Weight Change in Women with Normal BMI (체질 량 지수가 정상인 여성의 주관적 체형 인식이 체중변화에 미치는 영향)

  • Park, Seo-Yeon
    • Journal of the Korea Convergence Society / v.9 no.4 / pp.313-320 / 2018
  • This study was conducted to establish the effects of subjective body shape perception on weight control behavior and weight loss, and to suggest the need for proper information and education. Using data from the 6th period (2013-2015) of the National Health and Nutrition Examination Survey, 6,238 women aged 19 and over with a body mass index of 18.5-25 kg/m$^2$ were analyzed. The higher the level of education and income, and the more the women perceived themselves to be obese, the more often they chose exercise and diet in order to lose weight. The weight loss effort rate was higher in the obese body type recognition group, but the body weight type was the highest in the one year body weight change group (p < .001). As a result, subjective perception of body shape affected not only weight control behavior but also weight change (p < .001). Accordingly, systematic education on healthy weight control behaviors, proper body image, and healthy body type recognition is necessary.

Motion Recognitions Based on Local Basis Images Using Independent Component Analysis (독립성분분석을 이용한 국부기저영상 기반 동작인식)

  • Cho, Yong-Hyun
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.617-623 / 2008
  • This paper presents a human motion recognition method using both centroid shift and local basis images. The centroid shift, based on the 1st moment balance technique, is applied to obtain motion images that are robust against position or size changes. The extraction of local basis images based on independent component analysis (ICA) is applied to find a set of statistically independent motion features included in each motion. In particular, ICA with a fixed-point (FP) algorithm based on Newton's method is used to extract the local basis images of motions quickly. The proposed method has been applied to the problem of recognizing 160 (1 person * 10 animals * 16 motions) sign language motion images of 240*215 pixels. Three distances (city-block, Euclidean, and negative angle) are used as measures when matching the probe images to the nearest gallery images. The experimental results show that the proposed method has superior recognition performance (speed, rate) compared with both the method using local eigen images and the method using local basis images without centroid shift.
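The matching step with the three distance measures can be sketched as a nearest-neighbor search over feature vectors. The abstract does not define "negative angle", so interpreting it as the negative cosine of the angle between two vectors is an assumption, as are the function names and the toy feature vectors.

```python
import numpy as np

def city_block(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.abs(a - b).sum())

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return float(np.sqrt(((a - b) ** 2).sum()))

def negative_angle(a, b):
    """Negative cosine of the angle between a and b (assumed definition)."""
    return float(-np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_gallery(probe, gallery, distance):
    """Index of the gallery vector with the smallest distance to the probe."""
    return int(np.argmin([distance(probe, item) for item in gallery]))

# Toy feature vectors standing in for projections onto local basis images.
gallery = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
probe = np.array([0.1, 0.9, 0.2])
for measure in (city_block, euclidean, negative_angle):
    print(measure.__name__, nearest_gallery(probe, gallery, measure))
```

The angle-based measure ignores vector length, so it can rank gallery images differently from the other two when feature magnitudes vary between images.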