Search | Korea Science

A Comparison of Front-Ends for Robust Speech Recognition

Kim, Doh-Suk;Jeong, Jae-Hoon;Lee, Soo-Young;Kil, Rhee M.
- The Journal of the Acoustical Society of Korea / v.17 no.3E / pp.3-11 / 1998
The zero-crossings with peak amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by incorporating conventional speech processing techniques into the model output. Spectral and cepstral representations of the ZCPA model output are compared, and the incorporation of dynamic features with several different lengths of time-derivative window is evaluated. Comparative evaluations against other front-ends in real-world noisy environments are also performed, and the results show the superiority of the ZCPA model.
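The "dynamic features" mentioned in this abstract are commonly computed as a regression-based time derivative over a window of neighboring frames. The following is a minimal sketch of that standard formulation, not the authors' code; the function name `delta_features`, the `half_window` parameter, and the edge handling (repeating the first and last frame) are assumptions made for illustration.

```python
def delta_features(frames, half_window=2):
    """Regression-based time derivative of a feature sequence.

    frames: list of per-frame feature vectors (lists of floats).
    half_window: frames used on each side of the current frame
    (hypothetical parameter; the paper compares several window lengths).
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    num_frames = len(frames)
    # Normalizer of the least-squares slope estimate.
    denom = 2.0 * sum(n * n for n in range(1, half_window + 1))
    deltas = []
    for t in range(num_frames):
        delta = [0.0] * len(frames[t])
        for n in range(1, half_window + 1):
            # Assumption: clamp at the edges by repeating the end frames.
            ahead = frames[min(t + n, num_frames - 1)]
            behind = frames[max(t - n, 0)]
            for k in range(len(delta)):
                delta[k] += n * (ahead[k] - behind[k])
        deltas.append([d / denom for d in delta])
    return deltas
```

For a feature that rises by one unit per frame, the interior frames get a delta of 1.0, and a constant feature gets a delta of 0.0. A longer `half_window` smooths the derivative at the cost of time resolution, which is presumably the trade-off behind comparing window lengths.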

The Performance Comparison of Digital Modulations for Underwater Data Communication (수중 데이터 통신을 위한 변조방식의 성능 비교)

Son Geun-young;Ro Yong-ju;Yoon Jong-rak
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.429-432 / 2000
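Much research has been conducted on underwater data communication for the accurate, high-speed transmission of AUV signals and image data. In underwater data communication, overcoming characteristics of the ocean environment such as reverberation and background noise is important for achieving reliable communication. In particular, in shallow-water environments bounded by the sea surface and the seabed, the effect of surface- and bottom-reflected waves is known to be one of the key factors determining communication performance. To minimize these environmental effects and provide high-performance communication, a modulation scheme that is less affected by multipath must be selected. Modulation schemes commonly used in underwater data communication include FSK, PSK, and DPSK. In this study, the performance of these three modulation schemes in an ocean communication channel bounded by the sea surface and the seabed is compared and analyzed through numerical simulation. In the simulation, the shallow-water channel was constructed using the image source method, and the performance of each modulation scheme is expressed as the BER (Bit Error Ratio).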
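The BER metric and the differential encoding that underlies DPSK can be illustrated with a short sketch. This is not the paper's simulation: the image-source channel model is not reproduced, and the function names and the `reference` parameter are hypothetical.

```python
def differential_encode(bits, reference=0):
    """Each encoded symbol is the XOR of the previous symbol and the current bit."""
    encoded = []
    previous = reference
    for bit in bits:
        previous ^= bit
        encoded.append(previous)
    return encoded


def differential_decode(symbols, reference=0):
    """Recover each bit as the XOR of the current and the previous symbol."""
    decoded = []
    previous = reference
    for symbol in symbols:
        decoded.append(symbol ^ previous)
        previous = symbol
    return decoded


def bit_error_ratio(sent, received):
    """Fraction of bit positions at which the two sequences differ."""
    if not sent or len(sent) != len(received):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(1 for a, b in zip(sent, received) if a != b)
    return errors / len(sent)


# Demo: one corrupted symbol produces two bit errors after decoding.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
symbols = differential_encode(bits)
corrupted = list(symbols)
corrupted[3] ^= 1  # flip a single symbol to mimic a channel error
demo_ber = bit_error_ratio(bits, differential_decode(corrupted))
```

Because each decoded bit depends on two adjacent symbols, a single symbol error shows up as two bit errors, so `demo_ber` here is 2/8.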

Comparison of Recognition Performance of Noisy Speech Depending on Preprocessing Methods (전처리 기법에 따른 잡음음성의 인식성능 비교)

Son Jong Mok;Lee Yong Ju;Bae Keun Sung
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.31-34 / 2000
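In this study, various speech enhancement techniques are introduced as preprocessors to counter the distortion of speech signals by additive noise, and the recognition performance of an HMM (Hidden Markov Model)-based speech recognition system is evaluated. The speech enhancement techniques applied are the MMSE (Minimum Mean Square Error) STSA (Short-Time Spectral Amplitude) estimator and, in the wavelet domain, UWD (Undecimated Wavelet Denoising) and CWD (Conventional Wavelet Denoising). Noisy speech was fed to a recognition system trained on noise-free data, with each enhancement technique used as a preprocessor, and recognition performance was compared as a function of signal-to-noise ratio (SNR).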

The Comparison of features for Speech/Music Discrimination (음성/음악 분류를 위한 특징 비교)

Lee Kyong Rok;Seo Bong Su;Kim Jin Young
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.157-160 / 2000
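In this paper, we carried out speech/music classification experiments. Such classification serves as the preprocessing stage of audio indexing, a branch of multimedia indexing, which extracts desired information from multimedia content. In audio indexing, the speech/music classifier separates the information-bearing speech portions from the original audio signal. The experiments used the mel cepstrum, normalized log energy, and zero-crossings, all widely used in speech/music classification, as feature parameters [1, 2, 3]. The feature space was modeled with a GMM (Gaussian Mixture Model), and audio signals were classified into either three classes (speech, music, speech+music) or two classes (speech, music). In both the three-class and the two-class case, the mel cepstrum gave the best results.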
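Two of the features named in this abstract, zero-crossings and normalized log energy, are simple enough to sketch per frame. The code below is an illustration under stated assumptions, not the authors' feature extractor: the function names are hypothetical, and normalizing by subtracting the maximum log energy over the utterance is one common convention that the abstract does not confirm.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def log_energy(frame, floor=1e-10):
    """Natural log of the mean squared amplitude, floored to avoid log(0)."""
    if not frame:
        raise ValueError("frame must not be empty")
    energy = sum(x * x for x in frame) / len(frame)
    return math.log(max(energy, floor))


def normalized_log_energies(frames):
    """Assumed normalization: subtract the utterance maximum, so the peak is 0."""
    energies = [log_energy(frame) for frame in frames]
    peak = max(energies)
    return [e - peak for e in energies]
```

A frame that alternates in sign on every sample has a zero-crossing rate of 1.0, while a frame with constant sign has a rate of 0.0. Noise-like and high-frequency content pushes the rate up, which is why it is a cheap cue for separating signal classes.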

Study on Normal and Random incidence Absorption Coefficient (수직 및 랜덤입사 흡음률에 관한 연구)

Kang Hyun-Ju;Kim Bong-Ki;Kim Sang-Ryul
- Proceedings of the Acoustical Society of Korea Conference / spring / pp.283-286 / 2000
Various empirical models of normal-incidence absorption were compared with one another and with experiments. The comparison indicates that the Voronina model, which is a function of fiber diameter and porosity, is more suitable than the other models. The correlation between normal- and random-incidence absorption was investigated by experiment and analysis. At low frequencies the random-incidence absorption is higher than the normal-incidence absorption, whereas at high frequencies the random-incidence absorption decreases due to the effect of grazing-incidence components.

Performance Comparison and Verification of Lip Parameter Selection Methods in the Bimodal Speech Recognition System (입술 파라미터 선정에 따른 바이모달 음성인식 성능 비교 및 검증)

박병구;김진영;임재열
- The Journal of the Acoustical Society of Korea / v.18 no.3 / pp.68-72 / 1999
The choice of parameters from the available lip information and the robustness of lip parameter extraction play important roles in the performance of a bimodal speech recognition system. In this paper, lip parameters are extracted with an automatic extraction algorithm, and inner lip parameters are found to affect the recognition rate more than outer lip parameters. The robustness of the automatic extraction method is evaluated by comparison with manual extraction.

A Study on Tunneling Effect in Sound Transmission Loss Measurement (차음성능 측정시 터널링 효과에 관한 연구)

Kim, Bong-Ki;Kim, Jae-Seung;Kim, Hyun-Sil;Kang, Hyun-Ju;Kim, Sang-Ryul
- The Journal of the Acoustical Society of Korea / v.23 no.1E / pp.24-30 / 2004
This study aims to evaluate the tunneling effect in laboratory measurements of sound transmission loss. Based on a formulation for the sound transmission loss of a finite panel in the presence of a tunnel, the variation of the sound transmission loss with panel location and tunnel depth is investigated. Compared with the transmission loss of a finite plate in an infinite rigid baffle, the maximum difference in the laboratory measurement occurs when the panel is placed at the center of the tunnel, while a better estimate of the true transmission loss is obtained when the panel is located at either end.

Dialog System Using Multimedia Techniques for the Elderly with Dementia

Kim, Sung-Ill;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.21 no.4E / pp.170-177 / 2002
The goal of the present research is to improve the quality of life of elderly people with dementia. In this paper, this is pursued by developing a dialog system controlled by three modules: a speech recognition engine, a graphical agent, and a database classified by nursing schedule. The system was evaluated in the actual environment of a nursing facility by introducing it to an older male patient with dementia. A comparison study between the dialog system and professional caregivers was then carried out at the nursing home for 5 days in each case. The evaluation results showed that the dialog system was more responsive to the needs of the dementia patient than the professional caregivers. Moreover, the proposed system led the patient to talk more than the caregivers did.

A Variable Step-Size NLMS Algorithm with Low Complexity

Chung, Ik-Joo
- The Journal of the Acoustical Society of Korea / v.28 no.3E / pp.93-98 / 2009
In this paper, we propose a new VSS-NLMS algorithm obtained through a simple modification of the conventional NLMS algorithm, which yields a low-complexity algorithm with enhanced performance. The step size of the proposed algorithm becomes smaller as the error signal becomes orthogonal to the input vector. We also show that the proposed algorithm is an approximate normalized version of the KZ-algorithm and requires less computation than the KZ-algorithm. We compared the performance of the proposed algorithm with the conventional NLMS and other VSS algorithms using an adaptive channel equalization model. The proposed algorithm shows good convergence characteristics in both stationary and non-stationary environments despite its low complexity.
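For context, the conventional fixed-step NLMS update that this abstract takes as its baseline can be sketched as follows. This is only the textbook baseline, not the proposed variable step-size algorithm, whose exact step-size rule is not given in the abstract. The demo uses a noise-free system identification setup rather than the paper's channel equalization model, and the function name, tap values, and parameters are assumptions made for illustration.

```python
import random


def nlms(inputs, desired, num_taps, step_size=0.5, eps=1e-8):
    """Conventional NLMS with a fixed step size.

    Returns the final weight vector and the a priori error sequence.
    """
    weights = [0.0] * num_taps
    buffer = [0.0] * num_taps  # most recent input sample first
    errors = []
    for x, d in zip(inputs, desired):
        buffer = [x] + buffer[:-1]
        output = sum(w * u for w, u in zip(weights, buffer))
        error = d - output
        # Normalize by the input energy; eps guards against division by zero.
        norm = sum(u * u for u in buffer) + eps
        weights = [
            w + step_size * error * u / norm for w, u in zip(weights, buffer)
        ]
        errors.append(error)
    return weights, errors


# Demo: identify a hypothetical 2-tap system from noise-free data.
rng = random.Random(0)
true_taps = [0.5, -0.3]
x = [rng.uniform(-1.0, 1.0) for _ in range(500)]
d = [
    true_taps[0] * x[n] + true_taps[1] * (x[n - 1] if n > 0 else 0.0)
    for n in range(len(x))
]
estimated, err = nlms(x, d, num_taps=2)
```

With a fixed step size there is a trade-off between fast convergence (large step) and low steady-state error in noise (small step). Variable step-size schemes such as the one proposed in this paper aim to ease that trade-off.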

A perceptual and acoustical study of /ㅅ/ in children's speech (아동이 산출한 치조마찰음 /ㅅ/에 대한 청지각적·음향학적 연구)

Kim, Jiyoun;Seong, Cheoljae
- Phonetics and Speech Sciences / v.10 no.3 / pp.41-48 / 2018
This study examined the acoustic characteristics of Korean alveolar fricatives produced by typically developing children. Children aged 3 and 7 produced two types of nonsense syllables containing the alveolar fricative, /sV/ and /VsV/ sequences, where V was one of the three corner vowels (/i/, /a/, and /u/). Stimuli drawn from the speech materials recorded in the production experiment were presented in random order to 12 speech-language pathologists (SLPs) for a perception test. The SLPs responded by selecting one of seven alternative sounds. Acoustic measures such as duration of frication noise, normalized intensity, skewness, and center of gravity were examined. There were significant differences in the acoustic measures across vowels. Comparison of syllable structures showed statistically significant differences in duration of frication noise and normalized intensity. The acoustic parameters could account for the perceptual data. Relating the acoustic and perception data by means of logistic regression suggests that duration of frication noise and normalized intensity are the primary cues for perceiving Korean fricatives.
https://doi.org/10.13064/KSSS.2018.10.3.041
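Two of the acoustic measures listed in this abstract, center of gravity and skewness, are spectral moments. A minimal sketch of how they can be computed from a sampled spectrum is given below. It treats the normalized spectrum as a probability distribution over frequency; whether the study weighted by magnitude or by power is not stated in the abstract, so the weighting is left to the caller, and the function name is hypothetical.

```python
def spectral_moments(freqs_hz, weights):
    """Return (center_of_gravity, skewness) of a sampled spectrum.

    freqs_hz: frequency of each spectral bin in Hz.
    weights: non-negative bin weights (magnitude or power; an assumption
    left to the caller).
    """
    if len(freqs_hz) != len(weights):
        raise ValueError("freqs_hz and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("spectrum must have positive total weight")
    probs = [w / total for w in weights]
    # First moment: weighted mean frequency.
    center = sum(f * p for f, p in zip(freqs_hz, probs))
    variance = sum(p * (f - center) ** 2 for f, p in zip(freqs_hz, probs))
    if variance == 0:
        return center, 0.0
    third = sum(p * (f - center) ** 3 for f, p in zip(freqs_hz, probs))
    # Skewness: third central moment normalized by variance to the 3/2 power.
    return center, third / variance ** 1.5
```

A spectrum that is symmetric around its peak has zero skewness. Energy concentrated at low frequencies with a tail toward high frequencies gives positive skewness and a lower center of gravity.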
