Search | Korea Science

Fast offline transformer-based end-to-end automatic speech recognition for real-world applications

Oh, Yoo Rhee;Park, Kiyoung;Park, Jeon Gue
- ETRI Journal / v.44 no.3 / pp.476-490 / 2022
With recent advances in technology, automatic speech recognition (ASR) has been widely used in real-world applications. Converting large amounts of speech into text accurately and efficiently with limited resources has become more vital than ever. In this study, we propose a method to rapidly recognize a large speech database via a transformer-based end-to-end model. Transformers have improved the state-of-the-art performance in many fields; however, they are not easy to use for long sequences. Various techniques to accelerate the recognition of real-world speech are proposed and tested, including decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments are conducted with the LibriSpeech dataset and real-world Korean ASR tasks to verify the proposed methods. In the experiments, the proposed system converts 8 h of speech recorded at real-world meetings into text in less than 3 min with a 10.73% character error rate, a 27.1% relative reduction compared with conventional systems.
https://doi.org/10.4218/etrij.2021-0106
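The abstract reports its result as a character error rate (CER) and a relative reduction. The following is a minimal sketch of how those two quantities are commonly computed, assuming CER is the character-level Levenshtein distance divided by the reference length; the function names are illustrative and are not taken from the paper.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as edit distance over reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline


if __name__ == "__main__":
    print(character_error_rate("kitten", "sitting"))
    print(relative_reduction(20.0, 15.0))
```

A relative reduction is measured against the baseline error rate, so it differs from a difference in percentage points.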

An Improved LBP-based Facial Expression Recognition through Optimization of Block Weights (블록가중치의 최적화를 통해 개선된 LBP기반의 표정인식)

Park, Seong-Chun;Koo, Ja-Young
- Journal of the Korea Society of Computer and Information / v.14 no.11 / pp.73-79 / 2009
This paper proposes a method that improves facial expression recognition based on template matching of Local Binary Pattern (LBP) histograms. The face image is segmented into blocks, and an LBP histogram is constructed as the feature of each block. A block dissimilarity is calculated between each block of the input image and the corresponding block of the model image, and the image dissimilarity is defined as the weighted sum of the block dissimilarities. In conventional methods, the block weights are assigned by intuition. The proposed method instead optimizes the weights from training samples. An experiment shows that the proposed method improves the recognition rate.
https://doi.org/10.9708/jksci.2009.14.11.073
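The weighted sum of block dissimilarities described in the abstract can be sketched as follows. This is an illustration under assumptions: it uses the basic 8-neighbour LBP operator and a chi-square histogram distance, neither of which is confirmed by the abstract, and it takes the block weights as given rather than reproducing the paper's weight optimization.

```python
from typing import List, Sequence

Image = List[List[int]]  # grayscale image as rows of pixel intensities

# Clockwise 8-neighbourhood, starting at the top-left neighbour.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_image(img: Image) -> Image:
    """Basic LBP code (0..255) for every interior pixel."""
    height, width = len(img), len(img[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= img[y][x]:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def block_histograms(codes: Image, blocks_y: int, blocks_x: int) -> List[List[float]]:
    """Normalized 256-bin LBP histogram for each block, in row-major block order."""
    block_h, block_w = len(codes) // blocks_y, len(codes[0]) // blocks_x
    histograms = []
    for by in range(blocks_y):
        for bx in range(blocks_x):
            hist = [0.0] * 256
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    hist[codes[y][x]] += 1.0
            total = sum(hist) or 1.0
            histograms.append([v / total for v in hist])
    return histograms


def chi_square(p: Sequence[float], q: Sequence[float]) -> float:
    """Chi-square distance between two histograms (assumed block dissimilarity)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def image_dissimilarity(input_img: Image, model_img: Image,
                        weights: Sequence[float],
                        blocks_y: int = 2, blocks_x: int = 2) -> float:
    """Weighted sum of per-block dissimilarities between input and model images."""
    h_in = block_histograms(lbp_image(input_img), blocks_y, blocks_x)
    h_model = block_histograms(lbp_image(model_img), blocks_y, blocks_x)
    return sum(w * chi_square(a, b) for w, a, b in zip(weights, h_in, h_model))
```

With uniform weights this reduces to plain template matching; learning the weights from training samples, as the paper proposes, changes only the `weights` argument.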

The Design of Keyword Spotting System based on Auditory Phonetical Knowledge-Based Phonetic Value Classification (청음 음성학적 지식에 기반한 음가분류에 의한 핵심어 검출 시스템 구현)

Kim, Hack-Jin;Kim, Soon-Hyub
- The KIPS Transactions: Part B / v.10B no.2 / pp.169-178 / 2003
This study addresses two topics: the classification of phone-like units (PLUs), which are the foundation of Korean large-vocabulary speech recognition, and the effectiveness of Chiljongseong (7 final consonants) and Paljongseong (8 final consonants) in the Korean language. PLUs classify phonemes phonetically according to the place and manner of articulation, and about 50 PLUs are typically used in Korean speech recognition. In this study, auditory phonetic knowledge was applied to the PLU classification to produce a set of 45 PLUs. The vowels 'ㅔ, ㅐ' were classified as the PLU [ee]; 'ㅒ, ㅖ' as [ye]; and 'ㅚ, ㅙ, ㅞ' as [we]. Second, the Chiljongseong system of the Draft for a Unified Spelling System, which is currently in use, and the Paljongseonggajokyong of the Korean script Haerye are described. Whether 'ㄷ' and 'ㅅ' have the same phonetic value as final consonants in the Korean language has long been debated in academia. In this study, the historical development of Korean consonants was investigated, Chiljongseong and Paljongseonggajokyong were applied to speech recognition, and their effectiveness was verified. The experiments were divided into isolated word recognition and continuous speech recognition. PBW452 was used for the isolated word recognition test, which was conducted with about 50 men and women, divided into 5 groups, who each spoke 50 words. For the continuous speech recognition experiment, intended for the implemented stock exchange system, a text corpus of 71 stock exchange sentences and a speech corpus of those sentences were collected; 5 men and women each spoke every sentence twice. In the results, when Paljongseonggajokyong was used for the final consonants, recognition performance improved by about 1.45% on average, and when the PLU set combining Paljongseonggajokyong and auditory phonetics was applied, the recognition rate increased by 1.5% to 2.02% on average. In the continuous speech recognition experiment, recognition performance improved by about 1% to 2% on average compared with the existing sets of 49 or 56 PLUs.
https://doi.org/10.3745/KIPSTB.2003.10B.2.169
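The vowel merging described in the abstract can be illustrated with a small sketch that decomposes precomposed Hangul syllables using standard Unicode arithmetic and maps the listed vowels to shared units. The final-consonant handling is a deliberate simplification and an assumption: it only toggles whether 'ㅅ' is kept distinct from 'ㄷ' (an 8-final view) or folded into it (a 7-final view), and it does not reproduce the paper's full 45-unit inventory.

```python
from typing import Tuple

HANGUL_BASE = 0xAC00
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ",
             "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ",
             "ㅋ", "ㅌ", "ㅍ", "ㅎ"]  # 28 finals, index 0 means no final

# Vowel groups quoted in the abstract, mapped to the unit labels it gives.
VOWEL_UNITS = {"ㅔ": "ee", "ㅐ": "ee",
               "ㅒ": "ye", "ㅖ": "ye",
               "ㅚ": "we", "ㅙ": "we", "ㅞ": "we"}


def decompose(syllable: str) -> Tuple[str, str]:
    """Return (vowel, final consonant) of one precomposed Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    return JUNGSEONG[(index // 28) % 21], JONGSEONG[index % 28]


def vowel_unit(syllable: str) -> str:
    """Merged vowel unit; vowels outside the listed groups are returned unchanged."""
    vowel, _ = decompose(syllable)
    return VOWEL_UNITS.get(vowel, vowel)


def final_unit(syllable: str, eight_finals: bool) -> str:
    """Final-consonant unit; only the 'ㅅ' versus 'ㄷ' distinction is modeled here."""
    _, final = decompose(syllable)
    if final == "ㅅ" and not eight_finals:
        return "ㄷ"  # 7-final view: 'ㅅ' shares the unit of 'ㄷ' (simplified)
    return final


if __name__ == "__main__":
    print(vowel_unit("개"), vowel_unit("게"))
    print(final_unit("옷", eight_finals=True), final_unit("옷", eight_finals=False))
```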

Adaptable Center Detection of a Laser Line with a Normalization Approach using Hessian-matrix Eigenvalues

Xu, Guan;Sun, Lina;Li, Xiaotao;Su, Jian;Hao, Zhaobing;Lu, Xue
- Journal of the Optical Society of Korea / v.18 no.4 / pp.317-329 / 2014
In vision measurement systems based on structured light, the key point of detection precision is to determine accurately the central position of the projected laser line in the image. The purpose of this research is to extract laser line centers based on a decision function generated to distinguish the real centers from candidate points with a high recognition rate. First, preprocessing of an image adopting a difference image method is conducted to realize image segmentation of the laser line. Second, the feature points in an integral pixel level are selected as the initiating light line centers by the eigenvalues of the Hessian matrix. Third, according to the light intensity distribution of a laser line obeying a Gaussian distribution in transverse section and a constant distribution in longitudinal section, a normalized model of Hessian matrix eigenvalues for the candidate centers of the laser line is presented to balance reasonably the two eigenvalues that indicate the variation tendencies of the second-order partial derivatives of the Gaussian function and constant function, respectively. The proposed model integrates a Gaussian recognition function and a sinusoidal recognition function. The Gaussian recognition function estimates the characteristic that one eigenvalue approaches zero, and enhances the sensitivity of the decision function to that characteristic, which corresponds to the longitudinal direction of the laser line. The sinusoidal recognition function evaluates the feature that the other eigenvalue is negative with a large absolute value, making the decision function more sensitive to that feature, which is related to the transverse direction of the laser line. In the proposed model the decision function is weighted for higher values to the real centers synthetically, considering the properties in the longitudinal and transverse directions of the laser line. 
Moreover, this method provides a decision value from 0 to 1 for arbitrary candidate centers, which yields a normalized measure for different laser lines in different images. The normalized results of pixels close to 1 are determined to be the real centers by progressive scanning of the image columns. Finally, the zero point of a second-order Taylor expansion in the eigenvector's direction is employed to refine further the extraction results of the central points at the subpixel level. The experimental results show that the method based on this normalization model accurately extracts the coordinates of laser line centers and obtains a higher recognition rate in two group experiments.
https://doi.org/10.3807/JOSK.2014.18.4.317
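The last step in the abstract, refining a candidate center by finding the zero of a second-order Taylor expansion along the eigenvector direction, can be sketched as below. This follows the widely used Steger-style formulation and assumes the image gradient and Hessian entries at the candidate pixel are already available; the paper's normalized decision function (the Gaussian and sinusoidal recognition functions) is not reproduced, and all names are illustrative.

```python
import math
from typing import Optional, Tuple


def strongest_eigenpair(hxx: float, hxy: float, hyy: float
                        ) -> Tuple[float, float, Tuple[float, float]]:
    """Eigen-decomposition of the symmetric 2x2 Hessian.

    Returns (strong, weak, unit_vector): the eigenvalue with the larger
    absolute value, the other eigenvalue, and the unit eigenvector of `strong`.
    """
    mean = (hxx + hyy) / 2.0
    radius = math.hypot((hxx - hyy) / 2.0, hxy)
    lam_a, lam_b = mean + radius, mean - radius
    strong, weak = (lam_a, lam_b) if abs(lam_a) >= abs(lam_b) else (lam_b, lam_a)
    if hxy != 0.0:
        vx, vy = strong - hyy, hxy
    elif abs(strong - hxx) <= abs(strong - hyy):
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return strong, weak, (vx / norm, vy / norm)


def subpixel_center(x: float, y: float, gx: float, gy: float,
                    hxx: float, hxy: float, hyy: float
                    ) -> Optional[Tuple[float, float]]:
    """Refine a candidate center at pixel (x, y) to subpixel precision.

    Along the transverse direction n, the Taylor expansion
    I(t) = I0 + t*(g.n) + 0.5*t^2*(n^T H n) has zero derivative at
    t = -(g.n) / (n^T H n). Returns None when the strong eigenvalue is not
    negative, since a bright line has negative curvature across its width.
    """
    strong, _weak, (nx, ny) = strongest_eigenpair(hxx, hxy, hyy)
    if strong >= 0.0:
        return None
    curvature = hxx * nx * nx + 2.0 * hxy * nx * ny + hyy * ny * ny
    t = -(gx * nx + gy * ny) / curvature
    return (x + t * nx, y + t * ny)
```

For a vertical line whose true center lies 0.3 px to the right of pixel (10, 5), with `gx = 0.6` and `hxx = -2.0`, the sketch returns approximately `(10.3, 5.0)`.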

Design of Curve Road Detection System by Convergence of Sensor (센서 융합에 의한 곡선차선 검출 시스템 설계)

Kim, Gea-Hee;Jeong, Seon-Mi;Mun, Hyung-Jin;Kim, Chang-Geun
- Journal of Digital Convergence / v.14 no.8 / pp.253-259 / 2016
Research on lane recognition has continued in order to enable autonomous vehicle navigation and to prevent traffic accidents, and lane recognition and detection have advanced remarkably as new algorithms have appeared. Those studies were based on vision systems and improved the recognition rate. However, when driving at night or in rain, the recognition rate has not reached a satisfactory level. To address this weakness of vision-based lane recognition and detection, and to apply sensor convergence technology to post-accident response, this study focuses on curved road detection within lane detection research. Road detection should cover curved roads as well as straight roads, and the results can be used in traffic accident investigations. With the curvature threshold, which indicates the degree of the curve, set from 0.001 to 0.06, the study shows that curved roads can be computed.
https://doi.org/10.14400/JDC.2016.14.8.253
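As a rough illustration of thresholding on curvature, the sketch below estimates curvature from three lane points and flags a curve when the value falls within the 0.001 to 0.06 range quoted in the abstract. The abstract does not state how curvature is computed or in which units, so the three-point (Menger) curvature and the coordinate units used here are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

CURVATURE_MIN = 0.001  # lower threshold quoted in the abstract
CURVATURE_MAX = 0.06   # upper threshold quoted in the abstract


def menger_curvature(a: Point, b: Point, c: Point) -> float:
    """Curvature (1/radius) of the circle through three points; 0 if collinear."""
    twice_area = abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    sides = math.dist(a, b) * math.dist(b, c) * math.dist(c, a)
    if sides == 0.0:
        raise ValueError("points must be distinct")
    return 2.0 * twice_area / sides  # equals 4 * area / (|ab| * |bc| * |ca|)


def is_curved_road(a: Point, b: Point, c: Point) -> bool:
    """True when the estimated curvature lies inside the assumed curve range."""
    return CURVATURE_MIN <= menger_curvature(a, b, c) <= CURVATURE_MAX


if __name__ == "__main__":
    # Three points on a circle of radius 100 give a curvature of about 0.01.
    print(is_curved_road((100.0, 0.0), (0.0, 100.0), (-100.0, 0.0)))
```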

An Analysis of Recognition on Personal Information Protection among Healthcare Administration Students in the Information Society (정보사회에서 보건행정 전공 대학생들의 개인정보보호에 대한 인지 분석)

Kim, Ji-On;Park, Ji-Kyeong
- Journal of Digital Convergence / v.12 no.5 / pp.325-334 / 2014
The purpose of this study was to examine the awareness of personal information protection among health administration students, in order to help raise that awareness and encourage students to practice personal information protection properly. The subjects were 687 college students majoring in health administration. A survey was conducted from December 3, 2012, to June 21, 2013. The results show that only 17.2 percent were aware of the Personal Information Protection Act. In the awareness and practice domains of personal information protection, students who were aware of the act scored significantly higher than those who were not in every area, and there was a positive correlation between awareness and practice: better awareness led to better practice. The awareness rate for information that can identify an individual was 57.0 percent, and the awareness rate for personal information subject to management was 53.7 percent, both at an intermediate level. To raise awareness of the Personal Information Protection Act among health administration students, a separate course covering the act should be offered so that they can understand personal information protection correctly and practice it properly.
https://doi.org/10.14400/JDC.2014.12.5.325

Feature-based Image Analysis for Object Recognition on Satellite Photograph (인공위성 영상의 객체인식을 위한 영상 특징 분석)

Lee, Seok-Jun;Jung, Soon-Ki
- Journal of the HCI Society of Korea / v.2 no.2 / pp.35-43 / 2007
This paper presents a system for image matching and recognition based on image feature detection and description techniques applied to satellite photographs. We propose several parameters derived from the various environmental factors that arise during image processing. The key point of the experiment is to analyze how changes in each parameter affect the match rate and recognition accuracy. The proposed system is inspired by Lowe's SIFT (Scale-Invariant Feature Transform) algorithm. Descriptors extracted from local affine-invariant regions are stored in a database whose entries are defined by k-means clustering of the 128-dimensional descriptor vectors from satellite photographs taken from Google Earth. A label is then attached to each cluster of the feature database and serves as a guide to information about buildings that appear in the camera scene. The experiment varies the parameters and compares their effect on the image matching and recognition process. Finally, the implementation and experimental results for several requests are shown.
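The clustering step mentioned in the abstract, k-means over descriptor vectors with a label attached to each cluster, can be sketched in plain Python as follows. It is a toy version under assumptions: centroids are initialized from the first k vectors for determinism, the vectors in the usage example are 2-dimensional stand-ins for 128-dimensional SIFT descriptors, and the building labels are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: Sequence[Vector], vector: Vector) -> int:
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], vector))


def kmeans(vectors: Sequence[Vector], k: int,
           iterations: int = 20) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm with a simple deterministic initialization."""
    centroids = [list(v) for v in vectors[:k]]  # assumption: first k vectors as seeds
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [nearest(centroids, v) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def lookup_label(centroids: Sequence[Vector], cluster_names: Dict[int, str],
                 descriptor: Vector) -> str:
    """Return the label attached to the cluster nearest to a query descriptor."""
    return cluster_names[nearest(centroids, descriptor)]


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    centroids, labels = kmeans(data, k=2)
    names = {labels[0]: "building A", labels[2]: "building B"}  # hypothetical labels
    print(lookup_label(centroids, names, [9.0, 9.0]))
```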

Sign Language recognition Using Sequential Ram-based Cumulative Neural Networks (순차 램 기반 누적 신경망을 이용한 수화 인식)

Lee, Dong-Hyung;Kang, Man-Mo;Kim, Young-Kee;Lee, Soo-Dong
- The Journal of the Institute of Internet, Broadcasting and Communication / v.9 no.5 / pp.205-211 / 2009
The Weightless Neural Network (WNN) has advantages in processing speed and requires less computation than weighted neural networks, which must readjust their weights. Behavioral information such as sequential gestures has strong serial correlation, so recognizing it requires substantial computation and processing time. To address this problem, many algorithms add preprocessing and hardware interface devices to reduce computation and processing time. In this paper, we propose the RAM-based Sequential Cumulative Neural Network (SCNN) model, a sign language recognition system that requires neither preprocessing nor a hardware interface. We experimented with compound words in continuous Korean sign language, input as binary edge-detected images from a camera. Without preprocessing, the sign language recognition system achieved a 93% recognition rate.
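To make the idea of a RAM-based (weightless) network concrete, here is a generic sketch in the style of an n-tuple classifier with cumulative counters. It is not the paper's SCNN model, whose sequential structure the abstract does not describe; the class names, the tuple size, and the toy bit patterns are all illustrative assumptions.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


class RamDiscriminator:
    """One class's set of RAM nodes, each addressed by a tuple of input bits."""

    def __init__(self, input_bits: int, tuple_size: int, seed: int = 0) -> None:
        order = list(range(input_bits))
        random.Random(seed).shuffle(order)  # fixed pseudo-random input mapping
        self.tuples: List[List[int]] = [order[i:i + tuple_size]
                                        for i in range(0, input_bits, tuple_size)]
        # Each RAM maps an address to a cumulative count instead of a weight.
        self.rams: List[Dict[Tuple[int, ...], int]] = [{} for _ in self.tuples]

    def _addresses(self, bits: Sequence[int]) -> Iterator[Tuple[int, ...]]:
        for indices in self.tuples:
            yield tuple(bits[i] for i in indices)

    def train(self, bits: Sequence[int]) -> None:
        """Increment the counter at every addressed location (no weight updates)."""
        for ram, address in zip(self.rams, self._addresses(bits)):
            ram[address] = ram.get(address, 0) + 1

    def score(self, bits: Sequence[int]) -> int:
        """Number of RAM nodes that have seen this input's address before."""
        return sum(1 for ram, address in zip(self.rams, self._addresses(bits))
                   if ram.get(address, 0) > 0)


class RamClassifier:
    """One discriminator per class; prediction picks the highest score."""

    def __init__(self, input_bits: int, tuple_size: int) -> None:
        self.input_bits, self.tuple_size = input_bits, tuple_size
        self.discriminators: Dict[str, RamDiscriminator] = {}

    def train(self, label: str, bits: Sequence[int]) -> None:
        if label not in self.discriminators:
            self.discriminators[label] = RamDiscriminator(self.input_bits,
                                                          self.tuple_size)
        self.discriminators[label].train(bits)

    def predict(self, bits: Sequence[int]) -> str:
        return max(self.discriminators,
                   key=lambda label: self.discriminators[label].score(bits))
```

Training only writes to memory locations, which is why this family of models avoids the iterative weight adjustment of conventional networks.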

sEMG Signal based Gait Phase Recognition Method for Selecting Features and Channels Adaptively (적응적으로 특징과 채널을 선택하는 sEMG 신호기반 보행단계 인식기법)

Ryu, J.H.;Kim, D.H.
- Journal of rehabilitation welfare engineering & assistive technology / v.7 no.2 / pp.19-26 / 2013
This paper proposes a surface EMG (sEMG) signal-based gait phase recognition method that selects features and channels adaptively. The proposed method can be used to control powered prostheses for lower-limb amputees and can reduce the overhead of real-time pattern recognition on an embedded device by selecting channels and features adaptively. Because EMG signal patterns may vary with each subject's locomotion habits, the method improves classification accuracy by selecting channels and features based on the sensitivity and specificity measured for each subject. In the experiments, we found that the muscles with the highest recognition rate differ between subjects. The results also show that the average accuracy of the proposed method is about 91%, whereas that of existing methods using all channels and/or features is about 50%. We therefore conclude that sEMG signal-based gait phase recognition using a small number of adaptively selected muscles and corresponding features can be applied to control powered prostheses for lower-limb amputees.
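Per-subject channel selection based on sensitivity and specificity can be sketched as below. The abstract does not say how the two measures are combined, so ranking channels by their average (balanced accuracy) is an assumption, and the channel names and counts in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

# Confusion counts per channel for one subject: (tp, fn, tn, fp).
Counts = Tuple[int, int, int, int]


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = tp / (tp + fn); specificity = tn / (tn + fp)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def select_channels(counts: Dict[str, Counts], k: int) -> List[str]:
    """Pick the k channels with the highest mean of sensitivity and specificity."""
    scored = []
    for name, (tp, fn, tn, fp) in counts.items():
        sensitivity, specificity = sensitivity_specificity(tp, fn, tn, fp)
        scored.append(((sensitivity + specificity) / 2.0, name))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by name
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    subject_counts = {  # hypothetical validation counts for one subject
        "ch1": (90, 10, 80, 20),
        "ch2": (50, 50, 50, 50),
        "ch3": (95, 5, 90, 10),
    }
    print(select_channels(subject_counts, k=2))
```

Running the selection separately for each subject reflects the abstract's observation that the most informative muscles differ between people.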

Motion-Understanding Cell Phones for Intelligent User Interaction and Entertainment (지능형 UI와 Entertainment를 위한 동작 이해 휴대기기)

Cho, Sung-Jung;Choi, Eun-Seok;Bang, Won-Chul;Yang, Jing;Cho, Joon-Kee;Ki, Eun-Kwang;Sohn, Jun-Il;Kim, Dong-Yoon;Kim, Sang-Ryong
- Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2006.02a / pp.684-691 / 2006
As functions such as cameras and MP3 players converge into mobile phones, more intuitive and interesting interaction methods become essential. In this paper, we present applications and their enabling technologies for gesture-interactive cell phones. They employ gesture recognition and real-time shake detection algorithms to support motion-based user interfaces and entertainment applications, respectively. The gesture recognition algorithm classifies a user's movement into one of a set of predefined gestures by modeling the basic components of acceleration signals and their relationships. Recognition performance is further enhanced by discriminating frequently confused classes with support vector machines. The shake detection algorithm detects in real time the exact moment when the phone is shaken significantly, using the variance and mean of the acceleration signals. The gesture interaction algorithms show performance reliable enough for commercialization: with 100 novice users, the average recognition rate was 96.9% on 11 gestures (digits 1-9, O, X), and users' movements were detected in real time. We have applied the motion understanding technologies to Samsung cell phones in the Korean, American, Chinese, and European markets since May 2005.
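Shake detection from the variance and mean of acceleration signals can be sketched with a sliding window, as below. The window length, both thresholds, and the use of acceleration magnitude in units of g are assumptions made for illustration; the abstract does not give the actual parameters or decision rule.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def detect_shakes(magnitudes: Sequence[float], window: int = 8,
                  variance_threshold: float = 4.0,
                  mean_threshold: float = 1.5) -> List[int]:
    """Return sample indices at which the trailing window looks like a shake.

    A window is flagged when both the variance and the mean of the
    acceleration magnitude exceed their (hypothetical) thresholds.
    """
    events = []
    for end in range(window, len(magnitudes) + 1):
        recent = magnitudes[end - window:end]
        if pvariance(recent) > variance_threshold and fmean(recent) > mean_threshold:
            events.append(end - 1)  # index of the newest sample in the window
    return events


if __name__ == "__main__":
    still = [1.0] * 20                       # phone at rest, about 1 g
    shaken = [1.0] * 10 + [6.0, 0.0] * 4     # rest followed by strong oscillation
    print(detect_shakes(still))
    print(detect_shakes(shaken))
```

Because each decision uses only the most recent samples, the check can run as new accelerometer readings arrive, which matches the real-time requirement stated in the abstract.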

Search Result 2,809, Processing Time 0.032 seconds
