Search | Korea Science

Voice Recognition Performance Improvement using a convergence of Voice Energy Distribution Process and Parameter (음성 에너지 분포 처리와 에너지 파라미터를 융합한 음성 인식 성능 향상)

Oh, Sang-Yeob
- Journal of Digital Convergence / v.13 no.10 / pp.313-318 / 2015
Traditional speech enhancement methods distort the sound spectrum when they estimate the residual noise, or they estimate the noise incorrectly, and either problem lowers speech recognition performance. In this paper, we propose a speech detection method that combines a sound energy distribution process with sound energy parameters. The proposed method reduces the influence of noise so as to maximize voice energy. In addition, among the feature parameters of the speech signal, the log energy features of intervals with small log energy values are increased more than those of regions with large energy, so that they resemble the log energy features of a speech signal containing noise; this reduces the mismatch between the training and recognition environments. Recognition experiments confirmed improved recognition performance compared with the conventional method. In a car noise environment, the pause hit rate showed accuracies of 97.1% and 97.3% in the low SNR region (0 dB and 5 dB) and 98.3% and 98.6% in the high SNR region (10 dB and 15 dB).
https://doi.org/10.14400/JDC.2015.13.10.313
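
The log energy features mentioned in the abstract above can be illustrated with a minimal sketch. It only computes a per-frame log energy; the function name `frame_log_energy`, the frame and hop lengths, and the energy floor are assumptions made for illustration and are not taken from the paper, whose energy distribution processing is not reproduced here.

```python
import math


def frame_log_energy(samples, frame_length, hop_length, floor=1e-10):
    """Return the natural-log energy of each full frame of `samples`.

    `floor` is an assumed lower bound that keeps log() defined for
    silent frames; it is not a value from the paper.
    """
    energies = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        energy = sum(s * s for s in frame)  # sum of squared samples
        energies.append(math.log(max(energy, floor)))
    return energies


# Toy signal: four silent samples followed by four samples of amplitude 1.
log_energies = frame_log_energy([0.0] * 4 + [1.0] * 4, frame_length=4, hop_length=4)
print(log_energies)
```

Frames with low log energy are candidates for pauses; how the paper turns these values into a pause decision is not stated in the abstract.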

Speech Recognition using HMM over the WWW (웹상에서의 HMM을 이용한 한국에 음성인식)

Choi Kwang-Kook;Lee Jae-Wang;Kim Cheol;Choi Seung-Ho
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.77-80 / 1999
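In this paper, to implement a speech recognition system on the web, we performed word-level recognition using a Java applet and continuous-density HMMs. The system follows a browser-embedded model: on the client computer, an applet processes the speech and sends the feature parameters over the Internet to the server computer, and the server's speech recognizer applies the forward algorithm and returns the recognized result to the client, which displays it as text. The training DB was built from 22 words used in car navigation systems.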
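
The forward algorithm named in the abstract above can be sketched as follows. This is a generic textbook version, not the paper's server-side recognizer: the function name `forward_likelihood` and the toy two-state model are assumptions, and the per-frame emission likelihoods are passed in directly rather than being evaluated from continuous densities.

```python
def forward_likelihood(initial, transition, emission_likelihoods):
    """Return P(observations | model) with the HMM forward algorithm.

    initial[i]                 -- probability of starting in state i
    transition[i][j]           -- probability of moving from state i to j
    emission_likelihoods[t][j] -- likelihood of frame t under state j
    """
    n_states = len(initial)
    # Initialization: alpha_0(j) = pi_j * b_j(o_0)
    alpha = [initial[j] * emission_likelihoods[0][j] for j in range(n_states)]
    # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
    for frame in emission_likelihoods[1:]:
        alpha = [
            sum(alpha[i] * transition[i][j] for i in range(n_states)) * frame[j]
            for j in range(n_states)
        ]
    # Termination: sum over all final states
    return sum(alpha)


# Hypothetical left-to-right model with two states and two frames.
likelihood = forward_likelihood(
    initial=[1.0, 0.0],
    transition=[[0.5, 0.5], [0.0, 1.0]],
    emission_likelihoods=[[0.9, 0.1], [0.2, 0.8]],
)
print(likelihood)
```

For word-level recognition, one such likelihood would be computed per word model and the best-scoring word chosen. Practical systems usually work with log probabilities or scaling to avoid underflow on long utterances.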

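A Study on Supraglottic Movement and Electromyography of the Extrinsic Laryngeal Muscles of the Neck (상후두운동과 경부외후두근의 근전도검사에 관한 연구)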

An, Chul-Min;Jang, Hoon
- Proceedings of the KSLP Conference / 1997.11a / pp.265-265 / 1997
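Voice is produced by regular, harmonious vibration of the true vocal folds, driven by the movement of the intrinsic and extrinsic laryngeal muscles; the voice changes when these movements are disturbed or when organic changes occur in the true vocal folds. However, when the larynx of such patients is examined for diagnosis, various patterns of movement can be seen not only in the true vocal folds but also in the supraglottis. Unlike the true vocal folds, the supraglottis is a structure without well-developed muscles of its own to move it, yet it shows diverse movements depending on the type of phonation. These movements are thought to result from the influence of the intrinsic laryngeal muscles connected to the true vocal folds or of the extrinsic laryngeal muscles attached to the outside of the larynx, and they may in turn affect vocal fold vibration primarily or secondarily. To examine the relationship between supraglottic movement during phonation and the extrinsic laryngeal muscles, the authors trained subjects to produce various shapes of supraglottic movement, confirmed the movement with a stroboscope, and performed electromyography using noninvasive surface electrodes, which allows the degree of muscle contraction to be compared in each case.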

A Survey of the Korean Learner's Problems in Mastering English Pronunciation (한국인의 영어 발음 학습상 문제점 개관)

Youe Hansa MahnGunn
- MALSORI / no.42 / pp.47-56 / 2001
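This article is a slightly revised version of the keynote lecture given at the 2nd Seoul international phonetics conference (SICOPS 2000). It examines, vowel by vowel and consonant by consonant, the pronunciation errors that Korean learners of English tend to make, and discusses remedies. For vowels, the main problem is confusion of i:l, u:$-\sigma$, (equation omitted); identifying the schwa, which appears under more than 90 different spellings, and pronouncing it accurately is also a major problem. For consonants, many errors arise because learners carry over into English phenomena peculiar to Korean, such as the consonant assimilation that occurs when phonemes are joined. The article also points out that in English sp-, st-, sk-, /p t k/ are lenis sounds, [(equation omitted)], but are often mistaken for tense sounds. It stresses that learners of English should not guess pronunciation from spelling alone but should check the pronunciation of every word in a dictionary, together with phonetic training, and it concludes that knowing accurate pronunciation is essential not only for actual listening and speaking in English but also for establishing a foundation for linguistic research.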

Performance Improvement of Speech Recognition System Based on Speaker Normalization Through Linear Warping Function (선형워핑함수의 화자정규화에 의한 음성 인식시스템의 성능향상)

Choi, Seok-Yong;Chung, Kyoung-Yong;Lee, Jung-Hyun
- Proceedings of the Korea Information Processing Society Conference / 2000.10b / pp.879-882 / 2000
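Speaker-dependent speech recognition systems are known to perform better than speaker-independent systems when the training data can sufficiently model the acoustic variation among speakers. Speaker normalization techniques reduce the variation among speakers by modifying the spectrum of the input speech. Recent successful speaker normalization algorithms have incorporated speaker-specific frequency warping into the signal processing stage. Such algorithms do not use all of the acoustic features contained in the input speech. In this paper, we use three formant frequencies as the acoustic features of a speaker and propose a speaker normalization method that uses linear regression to define the warping function from the collected formant frequencies. Using this method, we were able to improve recognition performance.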
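
Defining a linear warping function from formant frequencies by linear regression, as described in the abstract above, might look like the sketch below. The abstract does not give the regression targets, so mapping a speaker's three formants onto reference formants is an assumption, as are the function names `fit_linear_warp` and `warp_frequency` and all the frequency values.

```python
def fit_linear_warp(speaker_formants, reference_formants):
    """Least-squares fit of reference = slope * speaker + intercept."""
    n = len(speaker_formants)
    mean_x = sum(speaker_formants) / n
    mean_y = sum(reference_formants) / n
    sxx = sum((x - mean_x) ** 2 for x in speaker_formants)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(speaker_formants, reference_formants)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def warp_frequency(frequency_hz, slope, intercept):
    """Apply the fitted linear warping function to one frequency."""
    return slope * frequency_hz + intercept


# Hypothetical F1-F3 values in Hz for one speaker and for a reference.
slope, intercept = fit_linear_warp([500.0, 1500.0, 2500.0], [550.0, 1650.0, 2750.0])
print(slope, intercept, warp_frequency(1000.0, slope, intercept))
```

With only three formants per speaker the fit is determined by very few points, so in practice formants would presumably be collected over many frames before fitting.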

A study on Auto-Segmentation Improvement for a Large Speech DB (대용량 음성 D/B 구축을 위한 AUTO-SEGMENTATION에 관한 연구)

Lee Byong-soon;Chang Sungwook;Yang Sung-il;Kwon Y.
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.209-212 / 2000
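This paper concerns improving auto-segmentation for building the large speech DB needed for speech recognition. We defined 50 Korean phonemes (including noise and silence), extracted 39-dimensional speech features consisting of MFCC (Mel Frequency Cepstral Coefficients), $\Delta$MFCC, and $\Delta\Delta$MFCC, and then performed auto-segmentation using HMM training and the CCS (Constrained Clustering Segmentation) algorithm(1). In this process most phonemes were segmented within the error range $(\pm25ms)$, but boundaries frequently fell outside the error range for short silences and for vowel + voiced consonant ('ㅁ', 'ㄴ', 'ㄹ', 'ㅇ') sequences. These boundary errors, which depend on the phonological environment, were corrected interval by interval using the MLR (Maximum Likelihood Ratio) value of the wavelet-transformed signal, which reduced the error range and improved auto-segmentation performance.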
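
The 39-dimensional MFCC, $\Delta$MFCC, $\Delta\Delta$MFCC feature vector in the abstract above can be sketched as follows. The split into 13 static coefficients plus 13 delta and 13 delta-delta coefficients is a common convention assumed here, not something the abstract states; the regression window of 2 frames, the edge handling, and the function names are likewise assumptions. The static MFCCs are taken as given.

```python
def delta(frames, window=2):
    """Regression-based delta coefficients for a list of feature frames.

    Frames beyond the edges are replaced by the first or last frame.
    """
    n_frames = len(frames)
    denominator = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            total = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                total += n * (ahead - behind)
            row.append(total / denominator)
        deltas.append(row)
    return deltas


def stack_with_deltas(frames):
    """Concatenate static, delta, and delta-delta features per frame."""
    first = delta(frames)
    second = delta(first)
    return [c + d + dd for c, d, dd in zip(frames, first, second)]


# Six hypothetical frames of 13 static coefficients forming a linear ramp.
static = [[float(t)] * 13 for t in range(6)]
features = stack_with_deltas(static)
print(len(features), len(features[0]))
```

For the ramp input, the delta of an interior frame equals the slope of the ramp, which is a quick sanity check on the regression formula.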

A study on the recognition performance of connected digit telephone speech for MFCC feature parameters obtained from the filter bank adapted to training speech database (훈련음성 데이터에 적응시킨 필터뱅크 기반의 MFCC 특징파라미터를 이용한 전화음성 연속숫자음의 인식성능 향상에 관한 연구)

Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kang Jeom Ja
- Proceedings of the KSPS conference / 2003.05a / pp.119-122 / 2003
In general, triangular filters are used in the filter bank when MFCCs are computed from the spectrum of a speech signal. In [1], a new feature extraction approach is proposed that uses specific filter shapes in the filter bank, obtained from the spectrum of the training speech data. In this approach, principal component analysis is applied to the spectrum of the training data to obtain the filter coefficients. In this paper, we carry out speech recognition experiments using the approach given in [1] on a large amount of telephone speech data, namely the Korean connected-digit telephone speech database released by SITEC. Experimental results are discussed along with our findings.

Adaptive Speech Emotion Recognition Framework Using Prompted Labeling Technique (프롬프트 레이블링을 이용한 적응형 음성기반 감정인식 프레임워크)

Bang, Jae Hun;Lee, Sungyoung
- KIISE Transactions on Computing Practices / v.21 no.2 / pp.160-165 / 2015
Traditional speech emotion recognition techniques recognize emotions using a general training model based on the voices of various people. These techniques cannot accurately account for each person's individual speech characteristics, so the recognition results differ widely from person to person. This paper proposes an adaptive speech emotion recognition framework that uses a prompted labeling technique to collect the user's immediate feedback data, builds a personal adaptive recognition model from it, and applies the model to each user in a mobile device environment. The proposed framework recognizes emotions by building a personalized recognition model. In three comparative experiments, it performed better than traditional techniques. The proposed framework can be applied to healthcare, emotion monitoring, and personalized services.
https://doi.org/10.5626/KTCP.2015.21.2.160

Analysis of the Time Delayed Effect for Speech Feature (음성 특징에 대한 시간 지연 효과 분석)

Ahn, Young-Mok
- The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.100-103 / 1997
In this paper, we analyze the time-delayed effect of speech features. Here, the time-delayed effect means that the current speech feature vector is influenced by the previous feature vectors. We use a set of LPC-derived cepstral coefficients and evaluate the time-delayed effect of the cepstrum through the performance of a speech recognition system. For the experiments, we used a speech database consisting of 22 words uttered by 50 male speakers. The speech uttered by 25 of the male speakers was used for training, and the remaining set was used for testing. The experimental results show that the time-delayed effect is large in the lower orders of the feature vector but small in the higher orders.

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

Kim, Min Sik;Kim, Hyung Soon
- Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions through the normalization of statistical characteristics of acoustic feature parameters. It demonstrates excellent performance improvement in the traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition system. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute the cause of this phenomenon to information loss due to excessive feature normalization. We investigate whether there is a feature normalization method that maximizes the speech recognition performance by properly reducing the impact of environmental mismatch, while preserving useful information for training acoustic models. To this end, we introduce the mean and exponentiated variance normalization (MEVN), which is a compromise between the mean normalization (MN) and the mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with the MEVN over the MN and the MVN, depending on the degree of variance normalization.
https://doi.org/10.13064/KSSS.2020.12.3.065
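
One way to read the compromise between MN and MVN described in the abstract above is to subtract the mean and divide by the standard deviation raised to an exponent between 0 and 1. The sketch below follows that reading; the exact formula, the per-utterance statistics, the name `mevn`, and the small floor on the standard deviation are assumptions and are not taken from the paper.

```python
import math


def mevn(frames, alpha, floor=1e-8):
    """Normalize each feature dimension as (x - mean) / std ** alpha.

    alpha = 0 reduces to mean normalization (MN) and alpha = 1 to mean
    and variance normalization (MVN); values in between normalize the
    variance only partially. `floor` avoids division by zero.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dims)]
    stds = [
        math.sqrt(sum((f[k] - means[k]) ** 2 for f in frames) / n)
        for k in range(dims)
    ]
    return [
        [(f[k] - means[k]) / (max(stds[k], floor) ** alpha) for k in range(dims)]
        for f in frames
    ]


# Hypothetical one-dimensional features for three frames.
utterance = [[1.0], [3.0], [5.0]]
for alpha in (0.0, 0.5, 1.0):
    print(alpha, mevn(utterance, alpha))
```

Sweeping `alpha` between 0 and 1 corresponds to varying the degree of variance normalization that the abstract compares.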

Search Result 278, Processing Time 0.12 seconds
