MALSORI (Journal of the Korean Society of Phonetic Sciences and Speech Technology: Malsori)

2008

Issue 68
Issue 67
Issue 66
Issue 65

2007

Issue 64
Issue 63
Issue 62
Issue 61

2006

Issue 60
Issue 59
Issue 58
Issue 57

2005

Issue 56
Issue 55
Issue 54
Issue 53

2004

Issue 52
Issue 51
Issue 50
Issue 49

2003

Issue 48
Issue 47
Issue 46
Issue 45

2002

Issue 44
Issue 43
Issue spc1

2001

Issue 42
Issue 41

2000

Issue 40
Issue 39

1999

Issue 38
Issue 37

1998

Issue 35_36

1997

Issue 33_34

1996

Issue 31_32

1995

Issue 29_30

1994

Issue 27_28

1993

Issue 25_26

1992

Issue 21_24

1990

Issue 19_20

1989

Issue 15_18

1987

Issue 11_14

1985

Issue 9_10

1984

Issue 7_8

1983

Issue 6

1982

Issue 5
Issue 4

1981

Issue 3
Issue 2

1980

Issue 1

Issue 54

A Study on the Durational Characteristics of Korean Distant-Talking Speech

Kim, Sun-Hee (p. 1)

This paper presents the durational characteristics of Korean distant-talking speech, using speech data consisting of 500 distant-talking utterances and 500 normal utterances from 10 speakers (5 male and 5 female). Each file was segmented and labeled manually, and the duration of each segment and each word was extracted. The durational changes in distant-talking speech relative to normal speech were then analyzed statistically. The results show that words are longer in distant-talking speech than in normal speech, and that the average duration of unvoiced consonants decreases while the average vocalic duration increases. Female speakers show a stronger tendency to lengthen durations in distant-talking speech. Finally, the study shows that speakers of distant-talking speech can be classified according to their differing duration rates.

Analysis of Lexical Effect on Spoken Word Recognition Test

Yoon, Mi-Sun; Yi, Bong-Won (p. 15)

The aim of this paper was to analyze lexical effects on the spoken word recognition of Korean monosyllabic words. The lexical factors examined were word frequency, density, and lexical familiarity. The analysis showed that frequency was a significant predictor of the spoken word recognition score for monosyllabic words, whereas the other factors were not significant. This result suggests that word frequency should be considered in speech perception tests.

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

Lee, Gil-Ho; Yoon, Jae-Sam; Oh, Yoo-Rhee; Kim, Hong-Kook (p. 27)

Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCCs instead of LPCs to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCCs is developing an efficient MFCC quantization scheme at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to that of the 8 kbps G.729 coder, while speech recognition performance with the proposed coder is better than with G.729.

Korean Prosody Generation Based on Stem-ML

Han, Young-Ho; Kim, Hyung-Soon (p. 45)

In this paper, we present a method of generating intonation contours for a Korean text-to-speech (TTS) system and a method of synthesizing emotional speech, both based on the Soft Template Mark-up Language (Stem-ML), a novel prosody generation model that combines mark-up tags and pitch generation in one. The evaluation shows that the intonation contour generated by Stem-ML is better than that of our previous work. Stem-ML is also found to be a useful tool for generating emotional speech by controlling a limited number of tags. A large emotional speech database is crucial for more extensive evaluation.

Robust Speech Recognition Using Real-Time Higher Order Statistics Normalization

Jeong, Ju-Hyun; Song, Hwa-Jeon; Kim, Hyung-Soon (p. 63)

The performance of speech recognition systems is degraded by the mismatch between training and test environments. Many studies have proposed compensating for noise components in the cepstral domain. Recently, a higher-order cepstral moment normalization method was introduced to improve recognition accuracy. In this paper, we present a real-time higher-order moment normalization method with a post-processing smoothing filter that reduces the parameter estimation error in higher-order moment computation. In experiments on the Aurora2 database, the proposed algorithm achieved an error rate reduction of 44.7% compared with the baseline system.

Speaker Identification Using Augmented PCA in Unknown Environments

Yu, Ha-Jin (p. 73)

The goal of our research is to build a text-independent speaker identification system that can be used in any condition without an additional adaptation process. The performance of speaker recognition systems can be severely degraded under unknown, mismatched microphone and noise conditions. In this paper, we show that principal component analysis (PCA) can improve performance in such situations. We also propose an augmented PCA process, which adds class-discriminative information to the original feature vectors before the PCA transformation and selects the best direction for each pair of highly confusable speakers. The proposed method achieved a relative reduction in recognition error of 21%.