Search | Korea Science

Distance Measures Based Upon Adaptive Filtering For Robust Speech Recognition In Noise (잡음 환경하에서 음성 인식을 위한 적응필터링 거리 척도에 관한 연구)

정원국;은종관
- The Journal of the Acoustical Society of Korea
- v.11 no.1E
- pp.15-22
- 1992
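Speech recognition performance degrades markedly in noisy environments. This paper proposes a distance measure that is robust to the effects of such noise. We assume that the feature vector of noise-corrupted speech is the feature vector of clean speech transformed by an FIR system, where the FIR system models the effect of the noise. The unknown FIR system coefficients are obtained with the RLS adaptive algorithm. The proposed distance measure is expressed in terms of the prediction error of the adaptive filter. Among the adaptive filter structures considered, the single-channel first-order FIR structure gives the best recognition performance, and in this case an efficient distance-measure algorithm can be derived. Speaker-independent isolated-word recognition experiments using the DTW algorithm at various signal-to-noise ratios show that the proposed distance measure performs well at almost all of the signal-to-noise ratios tested.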
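
The idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `rls_prediction_error`, the forgetting factor `lam`, the initialization constant `delta`, the reading of "first-order FIR" as two taps, and the synthetic test sequences are all assumptions made for illustration. The sketch runs a standard RLS adaptive filter that predicts the "noisy" sequence from the "clean" one and uses the accumulated squared prediction error as a distance.

```python
import math

def rls_prediction_error(clean, noisy, taps=2, lam=0.99, delta=100.0):
    """Return (sum of squared a priori RLS errors, final filter weights)."""
    w = [0.0] * taps                                   # FIR coefficients
    P = [[delta if i == j else 0.0 for j in range(taps)] for i in range(taps)]
    total = 0.0
    for n in range(len(noisy)):
        # Input vector: current and past clean samples (zeros before the start).
        u = [clean[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        Pu = [sum(P[i][j] * u[j] for j in range(taps)) for i in range(taps)]
        denom = lam + sum(u[i] * Pu[i] for i in range(taps))
        gain = [v / denom for v in Pu]
        err = noisy[n] - sum(w[i] * u[i] for i in range(taps))   # a priori error
        w = [w[i] + gain[i] * err for i in range(taps)]
        # P <- (P - gain * u^T P) / lam; u^T P equals Pu^T because P is symmetric.
        P = [[(P[i][j] - gain[i] * Pu[j]) / lam for j in range(taps)]
             for i in range(taps)]
        total += err * err
    return total, w

# Synthetic sequences (assumed data, for illustration only).
clean = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(200)]
matched = [0.8 * clean[n] + 0.2 * (clean[n - 1] if n > 0 else 0.0)
           for n in range(200)]
unrelated = [math.sin(0.7 * n + 1.0) for n in range(200)]

d_matched, w_matched = rls_prediction_error(clean, matched)
d_unrelated, _ = rls_prediction_error(clean, unrelated)
```

In this sketch, a sequence that really is an FIR-filtered version of the reference yields a much smaller accumulated error than an unrelated sequence, which is the property a distance measure of this kind relies on.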

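Changes in the Acoustic Characteristics of Nasal Consonants and in the Nasalance Ratio According to the Site of Nasal Obstruction (비폐색 부위에 따른 비강자음의 음향학적 특성과 비음비율의 변화)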

손영익;정유석;윤영선;이은경
- Proceedings of the KSLP Conference
- 1997.11a
- pp.253-253
- 1997
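When nasal obstruction is present, a change in the voice is easily perceived, but little is known about the acoustic characteristics of speech during nasal obstruction. The authors artificially induced nasal obstruction in order to identify how the acoustic characteristics of nasal consonants change with the site of obstruction and to compare the degree of change in nasalance. In 10 adult men and 10 adult women with normal nasalance, Merocel$^{\circledR}$ packed into a surgical glove to a volume of 2 ml was used to induce artificial nasal obstruction at four sites (anterior, posterior, superior, and inferior) around the ostiomeatal unit (OMU), and the differences by site before and after obstruction were compared. The utterance /나나/ was produced three times in each condition. A portion of the intervocalic /ㄴ/ (CVCV) showing a stable spectrogram was selected, and the mean values of the first, second, and third formants and of their bandwidths in that interval were compared by sex. The nasalance ratio was compared using the rabbit, baby, and mama sentences, for which standard nasalance ratios are known. In both men and women, blocking the area anterior to the OMU produced the most marked decrease in the first formant relative to the unobstructed condition, together with a significant decrease in the nasalance ratio. With nasal obstruction, the main changes in the nasal consonant /ㄴ/ occur around the first formant, and the degree of decrease in the nasalance ratio and in the first formant differs with the site of obstruction.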

DNN based Speech Detection for the Media Audio (미디어 오디오에서의 DNN 기반 음성 검출)

Jang, Inseon;Ahn, ChungHyun;Seo, Jeongil;Jang, Younseon
- Journal of Broadcast Engineering
- v.22 no.5
- pp.632-642
- 2017
In this paper, we propose a DNN based speech detection system using acoustic characteristics and context information of media audio. The speech detection for discriminating between speech and non-speech included in the media audio is a necessary preprocessing technique for effective speech processing. However, since the media audio signal includes various types of sound sources, it has been difficult to achieve high performance with the conventional signal processing techniques. The proposed method improves the speech detection performance by separating the harmonic and percussive components of the media audio and constructing the DNN input vector reflecting the acoustic characteristics and context information of the media audio. In order to verify the performance of the proposed system, a data set for speech detection was made using more than 20 hours of drama, and an 8-hour Hollywood movie data set, which was publicly available, was further acquired and used for experiments. In the experiment, it is shown that the proposed system provides better performance than the conventional method through the cross validation for two data sets.
https://doi.org/10.5909/JBE.2017.22.5.632

Speech Recognition based on Environment Adaptation using SNR Mapping (SNR 매핑을 이용한 환경적응 기반 음성인식)

Chung, Yong-Joo
- The Journal of the Korea institute of electronic communication sciences
- v.9 no.5
- pp.543-548
- 2014
The multiple-model based speech recognition (MMSR) framework is known to be very successful in speech recognition. Since it uses multiple hidden Markov models (HMMs) that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can closely match the noisy test speech. However, since the number of HMM sets is limited in practical use, acoustic mismatch remains a problem. In this study, we experimentally determined the optimal SNR mapping between the noisy test speech and the HMM set to mitigate the mismatch between them. Performance improved when the SNR mapping was used instead of the SNR estimated from the noisy test speech. When the proposed method was applied to the MMSR, experimental results on the Aurora 2 database showed relative word error rate reductions of 6.3% and 9.4% compared to a conventional MMSR and multi-condition training (MTR), respectively.
https://doi.org/10.13067/JKIECS.201.9.5.543

Multimedia Traffic Analysis using Markov Chain Model in CDMA Mobile Communication Systems (CDMA 이동통신 시스템에서 멀티미디어 트래픽에 대한 마르코프 체인 해석)

김백현;김철순;곽경섭
- Journal of Korea Multimedia Society
- v.6 no.7
- pp.1219-1230
- 2003
We analyze an integrated voice/data CDMA system in which the channels are divided into voice-prioritized channels and voice non-prioritized channels. For real-time voice service, preemptive priority is granted in the voice-prioritized channels. For delay-tolerant data service, the use of a buffer is considered. In addition, the transmission permission probability in best-effort packet-data service is controlled by estimating the residual capacity available to users. We build a two-dimensional Markov chain for the prioritized-voice and stream-data services and carry out a numerical analysis in combination with packet-data traffic based on the residual capacity equation.

A study on the robust context-dependent acoustic models by considering the state splitting and the time variant of speech (음성의 시간변이와 상태분할을 고려한 강건한 문맥의존 음향모델에 관한 연구)

오세진;김광동;노덕규;정현열
- Proceedings of the Korean Information Science Society Conference
- 2003.04c
- pp.229-231
- 2003
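Speech is generally expressed as a function of time, and building the reference models is a very important problem in speech recognition. When syllables, words, and continuous speech are uttered, duration differs between consonants and vowels, and modeling this well is also an important problem in speech recognition. In this study, to train robust acoustic models, we therefore built initial models with various structures that take into account variation over time and changes in the model during state splitting. The HM-Net context-dependent acoustic models for each initial model were built with the phonetic-decision-tree-based SSS algorithm (PDT-SSS). To handle unseen context information, the PDT-SSS algorithm builds a model by splitting states in the contextual and temporal directions until the target number of states is reached. To verify the effectiveness of each model structure set up for building robust context-dependent acoustic models that account for the temporal variation of speech, phoneme and word recognition experiments were carried out on 452 words from the Center for Korean Language Engineering (국어공학센터). For phoneme recognition with 2,000 states, the 4-state structure improved recognition performance by about 11.4% and shortened recognition time by 39.2 seconds compared with the 2-state structure. For word recognition with 2,000 states, the 4-state structure improved recognition performance by about 5% compared with the 1-state structure, and recognizing one word with the 4-state structure took 0.8 seconds on average. These results confirm that the study of initial model structure carried out to build robust context-dependent acoustic models is useful for building future speech recognition systems.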

Voice Activity Detection Algorithm Based on the Power Spectral Deviation of Teager Energy in Noisy Environment (잡음환경에서 Teager 에너지의 전력 스펙트럼 편차에 기반한 음성 검출 알고리즘)

Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
- The Journal of the Acoustical Society of Korea
- v.30 no.7
- pp.396-401
- 2011
In this paper, we propose a novel voice activity detection (VAD) algorithm to effectively distinguish speech from nonspeech in various noisy environments. The presented VAD uses the power spectral deviation (PSD) based on Teager energy (TE) instead of the conventional PSD scheme to improve the decision on speech segments. In addition, the speech absence probability (SAP) is derived in each frequency subband to modify the PSD for further VAD. The proposed VAD algorithm is evaluated by objective tests under various environments and yields better results than the conventional methods.
https://doi.org/10.7776/ASK.2011.30.7.396
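
For readers unfamiliar with Teager energy, the discrete Teager energy operator is commonly defined as x(n)^2 - x(n-1)x(n+1). The sketch below shows only that operator. The function name `teager_energy` and the test tone are assumptions for illustration, and the power spectral deviation and speech absence probability steps described in the abstract are not reproduced here.

```python
import math

def teager_energy(x):
    """Discrete Teager energy: x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# For a pure tone A*cos(w*n + phi), the operator gives the constant A^2 * sin(w)^2,
# so it reflects both the amplitude and the frequency of the signal.
A, w = 2.0, 0.3
tone = [A * math.cos(w * n + 0.5) for n in range(50)]
te = teager_energy(tone)
expected = (A * math.sin(w)) ** 2
```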

Multi-party video telephony of audio gain control for low computation voice classification method (다자간 영상통화의 오디오 게인콘트롤을 위한 저연산 음성분류방식)

Ryu, Sang-Hyeon;Kim, Hyoung-Gook
- Proceedings of the Korea Multimedia Society Conference
- 2012.05a
- pp.349-350
- 2012
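This paper proposes a low-computation voice classification method for audio gain control in multi-party video telephony. The proposed method classifies the input speech signal as silence, unvoiced, or voiced according to its characteristics. The energy of the input signal is used to distinguish speech segments from non-speech segments. Segments judged to be speech are then classified as voiced or unvoiced using the ZCR (Zero Crossing Rate). To evaluate the proposed method, classification accuracy and computation time were measured.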
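
The two-stage decision described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function name `classify_frame`, both threshold values, the frame length, and the synthetic frames are assumptions.

```python
import math

def classify_frame(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Label a frame as 'silence', 'unvoiced', or 'voiced'."""
    # Stage 1: mean energy separates speech from non-speech.
    energy = sum(s * s for s in frame) / len(frame)
    if energy < energy_threshold:
        return "silence"
    # Stage 2: zero crossing rate separates unvoiced (high ZCR) from voiced (low ZCR).
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return "unvoiced" if zcr > zcr_threshold else "voiced"

# Synthetic 20 ms frames at an assumed 8 kHz sampling rate.
silent_frame = [0.0] * 160
voiced_frame = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
unvoiced_frame = [0.3 if n % 2 == 0 else -0.3 for n in range(160)]
labels = [classify_frame(f) for f in (silent_frame, voiced_frame, unvoiced_frame)]
```

Both stages need only one pass over the frame with additions, multiplications, and comparisons, which is consistent with the low-computation goal stated in the abstract.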

Speech Coding Algorithms for Mobile Communication (이동통신을 위한 음성 부호화 방식)

이황수
- Proceedings of the Acoustical Society of Korea Conference
- 1998.08a
- pp.3-11
- 1998
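With the development of information and communication culture, new industries that use speech, such as digital mobile communication, multimedia, and voice mail systems, are growing rapidly. Research on digital mobile communication is especially active, because digital systems compress the speech signal with a coder and can therefore increase channel capacity compared with analog systems. Because speech coders are so closely tied to practical commercialization, they are being actively researched. This paper first reviews general speech coding methods, then describes standardization trends for the full-rate and half-rate speech coders used in current digital cellular systems, as well as the speech coders that have recently come into wide use in various applications. It also surveys ITU-T standardization trends and research trends in speech coders with bit rates of 4 kbps or below.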

Speech Active Interval Detection Method in Noisy Speech (잡음음성에서의 음성 활성화 구간 검출 방법)

Lee, Kwang-Seok;Choo, Yeon-Gyu;Kim, Hyun-Deok
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- 2008.10a
- pp.779-782
- 2008
Detecting speech-active intervals in noisy speech is important for speech communication and speech recognition. In this research, we propose a characteristic parameter incorporating spectral entropy to detect speech-active intervals in noisy speech, and we compare its performance with energy-based detection. The results show that the proposed characteristic parameter performs better than the other methods in noisy environments.
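
Spectral entropy, the quantity this abstract builds on, can be sketched as below. The abstract does not give the exact form of the proposed parameter, so this shows only a common definition: the Shannon entropy of the normalized power spectrum of a frame. The function name `spectral_entropy`, the naive DFT, and the two test frames are assumptions for illustration.

```python
import cmath
import math

def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalized power spectrum of a frame."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):                      # non-negative frequency bins
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0                                   # all-zero frame: define entropy as 0
    probs = [p / total for p in power]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A pure tone concentrates power in one bin (low entropy);
# an impulse spreads power evenly over all bins (maximum entropy).
tone_frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
impulse_frame = [1.0] + [0.0] * 63
h_tone = spectral_entropy(tone_frame)
h_flat = spectral_entropy(impulse_frame)
```

Frames with a structured, peaky spectrum give low entropy, while frames with a flat, noise-like spectrum give high entropy, which is why the measure is useful for separating speech from background noise.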

