• Title/Summary/Keyword: Voice recognition rate

Search Results: 137

Automatic Detection of Korean Accentual Phrase Boundaries

  • Lee, Ki-Yeong;Song, Min-Suck
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.1E
    • /
    • pp.27-31
    • /
    • 1999
  • Recent linguistic research has brought into focus the relations between prosodic structure and syntactic, semantic, or phonological structure. Most of this work shows that prosodic information is useful for understanding syntactic, semantic, and discourse structure, but these results have not yet been integrated into Korean speech recognition or understanding systems. As a step toward integrating prosodic information into speech recognition, this study proposes a technique for automatically detecting Korean accentual phrase boundaries using one-stage DP and a normalized pitch pattern. To build the normalized pitch pattern, the study proposes a modified normalization method for spoken Korean. The experiment uses 192 sentences of speech data spoken in standard Korean by 12 male speakers, containing 720 accentual phrases; 74.4% of the accentual phrase boundaries were correctly detected, with a false detection rate of 14.7%.
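The abstract combines two ingredients: speaker normalization of the pitch contour, and DP matching of the normalized pattern against boundary templates. As a rough sketch only (the paper's "modified normalization" and its templates are not specified here, so the log/z-score normalization, the toy contours, and the plain DTW recursion below are invented stand-ins for the one-stage DP described):

```python
import numpy as np

def normalize_pitch(f0):
    """Speaker-normalize a pitch contour: log scale, then zero mean / unit
    variance. A simplified stand-in for the paper's modified normalization."""
    logf0 = np.log(np.asarray(f0, dtype=float))
    return (logf0 - logf0.mean()) / logf0.std()

def dp_match(pattern, template):
    """Dynamic-programming (DTW-style) distance between two normalized pitch
    patterns; a low distance suggests a match to a boundary template."""
    n, m = len(pattern), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(pattern[i - 1] - template[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy example: a rising-falling contour vs. a similar boundary template.
contour = normalize_pitch([120, 140, 180, 160, 130])
template = normalize_pitch([115, 150, 175, 155, 125])
distance = dp_match(contour, template)
```

In the paper the DP runs in one stage over the whole utterance against a set of boundary templates; the pairwise distance above only illustrates the matching step.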

  • PDF

Classification of Three Different Emotion by Physiological Parameters

  • Jang, Eun-Hye;Park, Byoung-Jun;Kim, Sang-Hyeob;Sohn, Jin-Hun
    • Journal of the Ergonomics Society of Korea
    • /
    • v.31 no.2
    • /
    • pp.271-279
    • /
    • 2012
  • Objective: This study classified three different emotional states (boredom, pain, and surprise) using physiological signals. Background: Emotion recognition studies have tried to recognize human emotion from physiological signals; applying such recognition to human-computer interaction systems is an important goal. Method: 122 college students participated in the experiment. Three emotional stimuli were presented to the participants, and physiological signals, i.e., EDA (Electrodermal Activity), SKT (Skin Temperature), PPG (Photoplethysmogram), and ECG (Electrocardiogram), were measured for 1 minute as a baseline and for 1 to 1.5 minutes during the emotional state. The signals were analyzed for 30 seconds each from the baseline and emotional-state periods, and 27 features were extracted. Emotion classification was performed by discriminant function analysis (DFA, SPSS 15.0) on the difference values obtained by subtracting the baseline values from the emotional-state values. Results: Physiological responses during the emotional states differed significantly from baseline, and the emotion classification accuracy was 84.7%. Conclusion: The study showed that emotions can be classified from various physiological signals. However, future work should obtain additional signals from other modalities, such as facial expression, facial temperature, or voice, to improve the classification rate, and should examine the stability and reliability of this result by comparing it with the accuracy of other classification algorithms. Application: This work can help emotion recognition studies recognize various human emotions from physiological signals and can be applied to human-computer interaction systems for emotion recognition. It can also be useful for developing emotion theory, profiling emotion-specific physiological responses, and establishing a basis for emotion recognition systems in human-computer interaction.
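The key preprocessing step in the abstract is classifying on the *difference* between emotional-state and baseline features. A minimal sketch of that idea, with entirely synthetic data (the 27 features, the class shifts, and the nearest-centroid classifier below are invented; the paper used discriminant function analysis in SPSS, not this classifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 27 physiological features (EDA, SKT, PPG, ECG statistics)
# per trial, measured at baseline and during the emotional state.
n_trials, n_features = 60, 27
labels = rng.integers(0, 3, n_trials)        # 0=boredom, 1=pain, 2=surprise
baseline = rng.normal(0.0, 1.0, (n_trials, n_features))
# Simulate emotion-specific shifts so the synthetic classes are separable.
shifts = np.array([[0.0] * 27, [1.5] * 27, [-1.5] * 27])
emotional = baseline + shifts[labels] + rng.normal(0.0, 0.3, (n_trials, n_features))

# The step from the abstract: classify on (emotional state - baseline).
diff = emotional - baseline

# Nearest-centroid classifier as a simple stand-in for DFA
# (evaluated on the training data, for brevity of the sketch).
centroids = np.stack([diff[labels == k].mean(axis=0) for k in range(3)])
pred = np.argmin(((diff[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
accuracy = (pred == labels).mean()
```

Subtracting the baseline removes stable between-subject offsets so the classifier sees only the emotion-induced change, which is the rationale the abstract relies on.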

On Pattern Kernel with Multi-Resolution Architecture for a Lip Print Recognition (구순문 인식을 위한 복수 해상도 시스템의 패턴 커널에 관한 연구)

  • 김진옥;황대준;백경석;정진현
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.26 no.12A
    • /
    • pp.2067-2073
    • /
    • 2001
  • Biometric systems are technologies that use unique human physical characteristics to automatically identify a person. They use sensors to capture a physical characteristic, convert it into a digital pattern, and compare it with stored patterns for individual identification. However, lip-print recognition is less developed than recognition of other physical attributes such as fingerprints, voice patterns, retinal blood-vessel patterns, or the face. Lip-print recognition with a CCD camera has the merit of being easily combined with other recognition systems such as retina/iris and face recognition. A new method using a multi-resolution architecture is proposed to recognize a lip print from pattern kernels. A set of pattern kernels is a function of local lip-print masks; this function converts the information in a lip print into digital data. Recognition with the multi-resolution system is more reliable than with a single-resolution system: the multi-resolution architecture reduces the false recognition rate from 15% to 4.7%. This paper shows that the lip print can serve as a usable biometric measurement.
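The paper's pattern kernels are specific to its method, but the multi-resolution comparison itself can be sketched generically: compare two images at several scales and combine the per-level distances. Everything below (the image pyramid, the MSE distance, the synthetic images) is an invented illustration, not the paper's algorithm:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging (one pyramid level)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    im = img[:h, :w]
    return (im[0::2, 0::2] + im[1::2, 0::2] + im[0::2, 1::2] + im[1::2, 1::2]) / 4.0

def multires_distance(a, b, levels=3):
    """Compare two lip-print images at several resolutions and average the
    per-level mean squared differences; coarse levels tolerate small noise."""
    total = 0.0
    for _ in range(levels):
        total += np.mean((a - b) ** 2)
        a, b = downsample(a), downsample(b)
    return total / levels

rng = np.random.default_rng(1)
enrolled = rng.random((32, 32))
same = enrolled + rng.normal(0.0, 0.02, (32, 32))   # same print, slight noise
other = rng.random((32, 32))                         # a different print
d_same = multires_distance(enrolled, same)
d_other = multires_distance(enrolled, other)
```

The intuition matching the abstract's result: pixel noise that dominates a single-resolution comparison averages out at coarser pyramid levels, so combining levels lowers the false recognition rate.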

  • PDF

A Study on Comparison of Pronunciation Accuracy of Soprano Singers

  • Song, Uk-Jin;Park, Hyungwoo;Bae, Myung-Jin
    • International journal of advanced smart convergence
    • /
    • v.6 no.2
    • /
    • pp.59-64
    • /
    • 2017
  • Female vocalists' voices are classified into three types by vocal range: soprano, mezzo-soprano, and contralto, of which the soprano has the highest range. Since the voice is generated through the human vocal tract according to the voice generation model, it is strongly influenced by the vocal tract. The structure of the vocal organs differs from person to person, and the formant characteristics of vocalization differ accordingly. A formant is a frequency band that appears distinctly because of resonance in the vocal tract during phonation. Formant characteristics carry individual traits arising from the throat, jaw, lips, and teeth, as well as the phonological properties of phonemes: the first formant is associated with the throat, the second with the jaw, and the third and fourth with resonance at the lips and teeth. Pronunciation is thus influenced not only by phonological information but also by the jaw, lips, and teeth; when the mouth opens little or the jaw is stiff, pronunciation becomes unclear. The higher the accuracy of pronunciation, the more clearly the formants appear in the spectrogram. However, many soprano singers cannot open their mouths fully because the jaw, lips, teeth, and facial muscles are held rigid to sustain high tones while singing, which makes the pronunciation, and hence the formant structure, unclear. In this paper, to assess the pronunciation accuracy of soprano singers, five Korean soprano singers A, B, C, D, and E were selected as the experimental group; their spectrograms were analyzed, and a MOS test of pronunciation intelligibility was conducted. Soprano singer B showed clear formants from F1 to F5 and obtained the highest MOS score, 4.6 points. Singers A, C, and D showed formants from F1 to F3, but formants above 2 kHz were difficult to find. Finally, for singer E it was difficult to find formants at all, and the MOS test gave the lowest score, 2.1 points. We therefore confirmed that soprano singer B, who exhibits the most distinct formant characteristics in the spectrogram, has the best pronunciation accuracy.
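Formant frequencies of the kind analyzed above are commonly estimated with linear predictive coding (LPC), which models the vocal tract as an all-pole filter and reads the formants off the pole angles. The sketch below is a generic LPC illustration, not the paper's analysis: the synthetic "vowel" (noise driven through one resonance near 700 Hz) and the model order are chosen purely for demonstration.

```python
import numpy as np

def lpc_formants(signal, order, fs):
    """Estimate formant frequencies via autocorrelation LPC: solve the
    normal equations for the AR coefficients, then convert the angles of
    the complex poles to frequencies in Hz."""
    s = signal * np.hamming(len(signal))
    r = np.correlate(s, s, mode="full")[len(s) - 1:len(s) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])           # AR coefficients
    poles = np.roots(np.concatenate(([1.0], -a)))
    poles = poles[np.imag(poles) > 0]                # one of each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    return np.sort(freqs)

# Synthetic "vowel": white noise through a single resonance near 700 Hz,
# y[n] = 2 r cos(theta) y[n-1] - r^2 y[n-2] + x[n].
fs, f_res, r0 = 8000, 700.0, 0.97
theta = 2 * np.pi * f_res / fs
rng = np.random.default_rng(2)
x = rng.normal(size=4000)
y = np.zeros_like(x)
for n in range(2, len(x)):
    y[n] = 2 * r0 * np.cos(theta) * y[n - 1] - r0 ** 2 * y[n - 2] + x[n]
formants = lpc_formants(y, order=2, fs=fs)
```

For real speech one would use a higher order (roughly fs in kHz plus 2) to capture F1 through F4 at once; order 2 suffices here because the synthetic signal has a single resonance.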

On the speaker's position estimation using TDOA algorithm in vehicle environments (자동차 환경에서 TDOA를 이용한 화자위치추정 방법)

  • Lee, Sang-Hun;Choi, Hong-Sub
    • Journal of Digital Contents Society
    • /
    • v.17 no.2
    • /
    • pp.71-79
    • /
    • 2016
  • This study compares the performance of sound source localization methods, used for stable automobile control by improving the voice recognition rate in the automobile environment, and suggests how to improve them. Sound source localization generally employs the TDOA algorithm, which can be computed in two ways: with a cross-correlation function in the time domain, or with GCC-PHAT in the frequency domain. GCC-PHAT is known to be more robust to echo and noise than the plain cross-correlation function. This study compared the two methods in an automobile environment full of echo and vibration noise, and additionally proposed applying a median filter. We found that the median filter improves both estimation methods and decreases the variance of the estimates. According to the experimental results, the two methods perform almost identically on voice signals; however, on a song signal, GCC-PHAT's recognition rate is 10% higher than the cross-correlation function's. When the median filter was added, the cross-correlation function's recognition rate improved by up to 11%. In terms of variance, both methods showed stable performance.

Efficient Iris Recognition using Deep-Learning Convolution Neural Network (딥러닝 합성곱 신경망을 이용한 효율적인 홍채인식)

  • Choi, Gwang-Mi;Jeong, Yu-Jeong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.15 no.3
    • /
    • pp.521-526
    • /
    • 2020
  • This paper presents an improved HOLP neural network that adds 25 average values to a typical HOLP neural network that takes as input 25 feature-vector values obtained with the higher-order local autocorrelation function, which is well suited to extracting invariant features of iris images. Comparing deep-learning structures of different types, we measured the iris recognition rate of a back-propagation neural network, which performs well in the voice and image fields, against that of a convolutional neural network, which integrates the feature extractor and the classifier.
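Higher-order local autocorrelation features, mentioned above as the input to the HOLP network, average products of neighboring pixels so the result is invariant to image translation. The sketch below computes only a small illustrative subset of such features (the full standard set uses 25 fixed 3x3 masks, which are not reproduced here):

```python
import numpy as np

def local_autocorr_features(img):
    """A few shift-invariant local autocorrelation features: the image mean
    plus averaged products of each pixel with a displaced neighbour.
    An illustrative subset only; the full HLAC set has 25 fixed masks."""
    def corr(dy, dx):
        h, w = img.shape
        return float(np.mean(img[:h - dy, :w - dx] * img[dy:, dx:]))
    return np.array([img.mean(), corr(0, 1), corr(1, 0), corr(1, 1)])

rng = np.random.default_rng(4)
iris = rng.random((64, 64))
feats = local_autocorr_features(iris)
# Translating the image leaves the features (almost) unchanged, which is
# why such features suit iris textures whose position varies between shots.
shifted = local_autocorr_features(np.roll(iris, 5, axis=1))
```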

Design of a User authentication Protocol Using Face Information (얼굴정보를 이용한 사용자 인증 프로토콜 설계)

  • 지은미
    • Journal of the Korea Computer Industry Society
    • /
    • v.5 no.1
    • /
    • pp.157-166
    • /
    • 2004
  • Substantial research has been done on biometric recognition methods, as well as technical research in the field of authentication. Biometric recognition uses personal and unique information such as fingerprints, voice, face, iris, hand geometry, and vein patterns. Among these, a face-image system for biometric recognition and information authentication reduces user resistance because it is non-contact: it operates through a PC camera attached to the computer, which makes it economical as well as user-friendly. Conversely, a face-image system is very sensitive to illumination, hair style, and appearance, and therefore easily produces recognition errors, so a stable authentication system must be built that is not overly sensitive to changes in appearance and lighting. In this study, I propose a user authentication protocol that provides confidentiality and integrity and achieves a low equal error rate (EER) to minimize false authentication.

  • PDF

Korean Digit Speech Recognition Dialing System using Filter Bank (필터뱅크를 이용한 한국어 숫자음 인식 다이얼링 시스템)

  • 박기영;최형기;김종교
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.37 no.5
    • /
    • pp.62-70
    • /
    • 2000
  • In this study, speech recognition for Korean digits is performed using a filter bank together with software-implemented discrete HMM and DTW. Spectral analysis reveals speech-signal features that are mainly due to the shape of the vocal tract, and spectral features of speech are generally obtained as the outputs of filter banks, which integrate the spectrum over defined frequency ranges. A set of 8 band-pass filters is commonly used, since it simulates human auditory processing. The defined frequency ranges are 320-330, 450-460, 640-650, 840-850, 900-1000, 1100-1200, 2000-2100, and 3900-4000 Hz, at an 8 kHz sampling rate; the frame width is 20 ms and the frame period is 10 ms. The experimental results show that DTW yields a better recognition rate than HMM for Korean digit speech. The recognition accuracy of Korean digit speech using the filter bank is 93.3% with 24 band-pass filters, 89.1% with 16, and 88.9% with 8 in the hardware realization of the voice-dialing system.
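The filter-bank front end described above can be approximated digitally by summing FFT power within each band of one frame. The sketch below uses the 8 band edges listed in the abstract but is otherwise an invented stand-in for the paper's (hardware) filters; note that at 20 ms frame length the FFT resolution is 50 Hz, so the 10 Hz-wide bands may contain no bins at all:

```python
import numpy as np

# Band edges (Hz) from the paper's 8-filter bank, at fs = 8 kHz.
BANDS = [(320, 330), (450, 460), (640, 650), (840, 850),
         (900, 1000), (1100, 1200), (2000, 2100), (3900, 4000)]

def filterbank_energies(frame, fs=8000):
    """Per-band spectral energy of one frame, computed with an FFT as a
    simple digital stand-in for analog band-pass filters."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in BANDS])

fs = 8000
t = np.arange(int(0.02 * fs)) / fs          # one 20 ms frame (160 samples)
tone = np.sin(2 * np.pi * 950 * t)          # energy falls in the 900-1000 Hz band
energies = filterbank_energies(tone, fs)
```

A digit utterance becomes a sequence of such 8-dimensional energy vectors (one per 10 ms frame step), which is the sequence the DTW and HMM recognizers then compare.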

  • PDF

Lip-reading System based on Bayesian Classifier (베이지안 분류를 이용한 립 리딩 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.25 no.4
    • /
    • pp.9-16
    • /
    • 2020
  • Pronunciation recognition systems that use only video information, ignoring the audio, can be applied to various customized services. In this paper, we develop a system that applies a Bayesian classifier to distinguish Korean vowels from lip shapes in images. We extract feature vectors from the lip shapes in facial images and apply them to the designed machine-learning model. Our experiments show that the system's recognition rate is 94% for the pronunciation of 'A', and its average recognition rate is approximately 84%, higher than that of a CNN tested for comparison. The results show that our Bayesian classification method, using feature values from lip-region landmarks, is efficient on a small training set, and it can therefore be used for application development on limited hardware such as mobile devices.
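A Bayesian classifier over landmark features of the kind described above is often realized as Gaussian naive Bayes: fit one normal distribution per feature per vowel class, then pick the class with the highest posterior. A minimal sketch on synthetic data (the two "landmark" features and three classes below are invented; the paper's actual features and vowel set are not specified here):

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: per class, fit an independent normal
    to each feature; classify by maximum log-posterior."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.stack([X[y == c].mean(0) for c in self.classes])
        self.var = np.stack([X[y == c].var(0) + 1e-9 for c in self.classes])
        self.prior = np.array([(y == c).mean() for c in self.classes])
        return self

    def predict(self, X):
        # log p(x|c) + log p(c), under the per-feature independence assumption
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None]
                     + (X[:, None, :] - self.mu[None]) ** 2 / self.var[None]).sum(-1)
        return self.classes[np.argmax(ll + np.log(self.prior)[None], axis=1)]

# Hypothetical lip-landmark features (e.g., mouth width/height ratios) for
# three vowel classes; entirely synthetic, not the paper's data.
rng = np.random.default_rng(5)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y = np.repeat([0, 1, 2], 40)
X = centers[y] + rng.normal(0.0, 0.4, (120, 2))
model = GaussianNB().fit(X, y)
acc = (model.predict(X) == y).mean()
```

Because the model only stores per-class means and variances, it trains well on small datasets and runs cheaply, which matches the abstract's argument for mobile deployment.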

Hand Biometric Information Recognition System of Mobile Phone Image for Mobile Security (모바일 보안을 위한 모바일 폰 영상의 손 생체 정보 인식 시스템)

  • Hong, Kyungho;Jung, Eunhwa
    • Journal of Digital Convergence
    • /
    • v.12 no.4
    • /
    • pp.319-326
    • /
    • 2014
  • As more mobile-security users experience authentication failure after forgetting passwords, user names, or the answers to knowledge-based questions, biological information such as hand geometry, fingerprints, and voice is increasingly preferred for personal identification and authentication. Biometric verification for mobile security thus provides assurance to both the customer and the seller on the internet. Our study focuses on a hand-biometric recognition system for personal identification and authentication, using the hand's shape, palm features, and the lengths and widths of the fingers, taken from mobile phone photographs (e.g., iPhone 4 and Galaxy S2). The system consists of six processing steps: image acquisition, preprocessing, noise removal, standard hand feature extraction, individual feature-pattern extraction, and hand-biometric recognition for personal identification and authentication from the input images. The validity of the proposed system on mobile phone images is demonstrated by a 93.5% successful recognition rate on 250 experimental hand-shape and palm-information images from 50 subjects.
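The six-step pipeline above can be sketched as a chain of small functions. Everything below is an invented toy (threshold segmentation, two crude geometry features, a synthetic rectangular "hand"); it only illustrates how the stages compose, not the paper's actual feature extraction:

```python
import numpy as np

def preprocess(img):
    """Normalize intensities to [0, 1] (preprocessing step)."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-9)

def segment(img, threshold=0.5):
    """Separate hand from background by thresholding (noise removal /
    segmentation steps, heavily simplified)."""
    return img > threshold

def geometry_features(mask):
    """Toy hand-geometry features: foreground-area ratio and bounding-box
    aspect, as stand-ins for finger lengths/widths and palm features."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return np.array([mask.mean(), h / w])

def authenticate(probe_img, enrolled_feats, tol=0.1):
    """Final step: match probe features against the enrolled template."""
    feats = geometry_features(segment(preprocess(probe_img)))
    return bool(np.linalg.norm(feats - enrolled_feats) < tol)

# Synthetic "hand": a bright rectangle on a dark background.
img = np.zeros((40, 30)); img[5:35, 10:20] = 200.0
enrolled = geometry_features(segment(preprocess(img)))
accepted = authenticate(img, enrolled)            # same hand -> accept
wider = np.zeros((40, 30)); wider[5:35, 5:25] = 200.0
rejected = authenticate(wider, enrolled)          # different geometry -> reject
```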