Search | Korea Science

Decision of the Korean Speech Act using Feature Selection Method (자질 선택 기법을 이용한 한국어 화행 결정)

김경선;서정연
- Journal of KIISE:Software and Applications
- v.30 no.3_4
- pp.278-284
- 2003
A speech act is the speaker's intention expressed through an utterance. Recognizing it is important for understanding natural language dialogues and generating responses. This paper proposes a two-stage method that improves Korean speech act decision. The first stage selects features from the part-of-speech (POS) tagging results of the sentence and from the context given by previous speech acts. To select features, we use the $\chi^2$ statistic (CHI), which has shown high performance for feature selection in text categorization. The second stage determines the speech act from the selected features with a neural network. The proposed method shows that speech acts can be decided automatically using only POS results, achieves good performance by using more informative features, and speeds up processing by reducing the number of features. We tested the system on a Korean dialogue corpus transcribed from recordings made in real settings; the corpus consists of 10,285 utterances labeled with 17 speech acts. We trained the system on 8,349 utterances and tested it on 1,936 utterances, of which 1,709 (88.3%) were assigned the correct speech act. This accuracy is about 8% higher than that obtained without feature selection.
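The CHI statistic used in the first stage is computed per (feature, speech act) pair from a 2×2 contingency table. The following is a minimal Python sketch, assuming each training utterance is represented as a set of string features (for example POS-tagged morphemes or the previous speech act) plus its speech act label. Scoring each feature by its maximum CHI over all speech acts is a common convention in text categorization and is an assumption here, not necessarily the paper's exact choice; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """CHI statistic for one (feature, speech act) pair.

    a: utterances with the feature and the speech act
    b: utterances with the feature but a different speech act
    c: utterances without the feature but with the speech act
    d: utterances with neither
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the feature carries no information
    return n * (a * d - c * b) ** 2 / denom


def select_features(data: List[Tuple[Set[str], str]], k: int) -> List[str]:
    """Rank features by their maximum CHI score over speech acts; keep the top k."""
    n = len(data)
    feat_count: Counter = Counter()  # utterances containing each feature
    act_count: Counter = Counter()   # utterances labeled with each speech act
    joint: Counter = Counter()       # (feature, act) co-occurrence counts
    for feats, act in data:
        act_count[act] += 1
        for f in feats:
            feat_count[f] += 1
            joint[(f, act)] += 1

    scores: Dict[str, float] = {}
    for f in feat_count:
        best = 0.0
        for act in act_count:
            a = joint[(f, act)]
            b = feat_count[f] - a
            c = act_count[act] - a
            d = n - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[f] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

A feature that occurs in every utterance scores zero, while one that perfectly separates a speech act scores highest; the selected features would then be the input units of the neural network in the second stage.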

A Study on the Bul-woo-heon-ga by Jeong Geuk-in (정극인의 <불우헌가>에 나타난 시조성 연구)

김성기
- Sijohaknonchong
- v.19 no.1
- pp.155-177
- 2003
Jeong Geuk-in was a poet of the early Joseon period. He lived for 45 years before Hangeul was promulgated and for 35 years afterwards, so he wrote poetry in both Chinese and Korean. He was a creative writer of Korean poems and songs; only a few works had been written in Korean up to his time. His Korean poems are , and . He created Korean poems and songs by unifying three literary forms: Sijo, Gyeong-gi-che-ga and Gasa. This study examines the Bul-woo-heon-ga, which is written in Korean. For the study, the form of the Bul-woo-heon-ga was analyzed, and for genre classification it was classified as Saseolsijo (a form of sijo with no restriction on the length of the first two verses). However, Saseolsijo is generally thought to have appeared in the seventeenth century, so this study explains why the Bul-woo-heon-ga is nevertheless classified as Saseolsijo. Another problem is authorship: the writer of the Bul-woo-heon-ga is not Jeong Geuk-in, because the speaker in the poem admires Jeong Geuk-in, and people do not generally admire themselves. Since Jeong Geuk-in is the one being admired in the work, its writer is thought to be one of his pupils or friends.

An Efficient Lipreading Method Based on Lip's Symmetry (입술의 대칭성에 기반한 효율적인 립리딩 방법)

Kim, Jin-Bum;Kim, Jin-Young
- Journal of the Institute of Electronics Engineers of Korea SP
- v.37 no.5
- pp.105-114
- 2000
In this paper, we concentrate on an efficient way to reduce the large amount of pixel data that must be processed in image-transform-based automatic lipreading. The image-transform-based approach, which obtains a compressed representation of the speaker's mouth, has been reported to yield better lipreading performance than the lip-contour-based approach. However, this approach produces many lip feature parameters, which means a large amount of data and long computation time for recognition. To reduce the data to be computed, we propose a simple method that folds the lip image at its vertical center, based on the symmetry of the lip. In addition, principal component analysis (PCA) is used for a fast algorithm, and hidden Markov model (HMM) word recognition results are reported. Compared with the conventional method using $16{\times}16$ lip images, the proposed method using folded lip images reduces the number of feature parameters by $22{\sim}47\%$ and improves HMM word recognition rates by $2{\sim}3\%$.
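The folding step and the PCA projection can be sketched in a few lines of NumPy. The abstract does not say how the two halves are combined after folding, so this sketch assumes the right half is mirrored onto the left half and the two are averaged; the function names are hypothetical, and the HMM recognition stage is omitted.

```python
import numpy as np


def fold_lip_image(img: np.ndarray) -> np.ndarray:
    """Fold a lip image at its vertical center line (assumed left-right symmetry).

    The right half is mirrored onto the left half and the two are averaged,
    so a 16x16 image becomes 16x8. For odd widths the center column is dropped.
    """
    img = np.asarray(img, dtype=float)  # avoid uint8 overflow when adding
    _, w = img.shape
    half = w // 2
    left = img[:, :half]
    right_mirrored = img[:, w - half:][:, ::-1]
    return (left + right_mirrored) / 2.0


def pca_features(images: np.ndarray, n_components: int) -> np.ndarray:
    """Project flattened images onto their top principal components."""
    x = np.asarray(images, dtype=float).reshape(len(images), -1)
    x = x - x.mean(axis=0)  # PCA requires mean-centered data
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T
```

Folding halves the number of raw pixels before PCA; the reduction in feature parameters reported above refers to the paper's own pipeline, not to this sketch.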

Korean Speech Act Tagging using Previous Sentence Features and Following Candidate Speech Acts (이전 문장 자질과 다음 발화의 후보 화행을 이용한 한국어 화행 분석)

Kim, Se-Jong;Lee, Yong-Hun;Lee, Jong-Hyeok
- Journal of KIISE:Software and Applications
- v.35 no.6
- pp.374-385
- 2008
Speech act tagging, which recognizes the speaker's intentions expressed in natural language utterances, is an important step in various dialogue applications. Previous approaches, such as rule-based and statistics-based methods, use the speech acts of previous utterances and the sentence features of the current utterance. This paper proposes a method that determines the speech act of the current utterance using the speech acts of the following utterances as well as those of previous ones. Using the features of following utterances yields an accuracy of 95.27%, an improvement of 3.65% over previous methods. Moreover, sentence features of the previous utterances are employed to make maximal use of the information available for the current utterance. By applying an appropriate probability model for each speech act, a final accuracy of 97.97% is achieved.
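The idea of letting the following utterance influence the current decision can be illustrated with a simple probabilistic score. This is a minimal sketch under assumed inputs, not the paper's model: it multiplies a hypothetical sentence-feature score for each candidate act, a transition probability from the previous act, and the expected transition probability into the candidate acts of the following utterance. All table names and speech act labels are hypothetical.

```python
from typing import Dict, Optional


def tag_current(
    prev_act: str,
    sentence_scores: Dict[str, float],        # score of current features given each act (hypothetical)
    transition: Dict[str, Dict[str, float]],  # transition[a][b] ~ P(next act b | act a)
    next_candidates: Dict[str, float],        # candidate acts of the following utterance with weights
) -> Optional[str]:
    """Choose the current speech act using previous and following context."""
    best_act: Optional[str] = None
    best_score = -1.0
    for act, p_feat in sentence_scores.items():
        p_prev = transition.get(prev_act, {}).get(act, 0.0)
        # How well does `act` lead into the candidates for the next utterance?
        p_next = sum(
            transition.get(act, {}).get(nxt, 0.0) * weight
            for nxt, weight in next_candidates.items()
        )
        score = p_feat * p_prev * p_next
        if score > best_score:
            best_act, best_score = act, score
    return best_act
```

When the current utterance's own features are ambiguous, the candidate acts of the following utterance break the tie: a likely "response" next favors a question now.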

Applying an Auxiliary Filter in the Adaptive Echo Canceller for Performance Improvement of Double-Talk Detection (음향반향제거기에서 동시통화 검출 성능 개선을 위한 보조필터 적용)

Kim Siho;Bae Keunsung
- Journal of the Institute of Electronics Engineers of Korea SP
- v.42 no.1
- pp.65-70
- 2005
This paper deals with the problem of double-talk (DT) detection in an acoustic echo canceller (AEC). In DT detection algorithms based on the correlation coefficient, detection errors occasionally occur because it is hard to set a threshold that distinguishes DT from an echo path change (EPC). When an EPC is erroneously detected as DT at its starting point, the adaptive filter stops updating its coefficients. In addition, if the echo path changes during a DT period, detection of the end of the DT period fails, so the AEC cannot update its filter coefficients for a while even after the DT period ends. To solve these problems, we propose a novel AEC that employs an auxiliary filter. It is based on the observation that the error signal cannot be estimated from the reference signal during DT, but it can be during an EPC. The experimental results verify that the proposed method can solve the problems caused by DT detection errors and by echo path changes during DT periods.
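The baseline behavior described above, where adaptation is frozen whenever a correlation-based detector declares DT, can be sketched with an NLMS echo canceller. Everything here is an illustrative assumption: the detection statistic (normalized correlation between the microphone signal and the echo estimate over a sliding window), the threshold, step size, and warm-up length are not taken from the paper, and the proposed auxiliary filter is not implemented.

```python
import numpy as np


def nlms_aec_with_dtd(x, d, taps=64, mu=0.5, rho_threshold=0.9,
                      win=256, warmup=1024, eps=1e-8):
    """NLMS echo canceller that freezes adaptation when DT is declared.

    x: far-end (reference) signal; d: microphone signal (echo + near-end speech).
    Returns the error signal e and a boolean array of samples flagged as DT.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    w = np.zeros(taps)          # adaptive filter coefficients
    buf = np.zeros(taps)        # most recent reference samples, newest first
    y_hist = np.zeros(len(d))   # echo estimates, kept for the detector window
    e = np.zeros(len(d))
    dt = np.zeros(len(d), dtype=bool)
    for n in range(len(d)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        y = w @ buf             # echo estimate
        y_hist[n] = y
        e[n] = d[n] - y
        lo = max(0, n - win + 1)
        dy, yy = d[lo:n + 1], y_hist[lo:n + 1]
        rho = abs(dy @ yy) / (np.linalg.norm(dy) * np.linalg.norm(yy) + eps)
        # Low correlation between mic signal and echo estimate suggests DT.
        # An EPC also lowers rho, which is the ambiguity the paper addresses.
        dt[n] = n >= warmup and rho < rho_threshold
        if not dt[n]:
            w += mu * e[n] * buf / (buf @ buf + eps)  # NLMS update
    return e, dt
```

Because adaptation stays frozen while DT is flagged, an echo path change that lowers the correlation can keep this baseline stuck, which illustrates the failure mode the auxiliary filter is meant to resolve.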

Adaptation of Classification Model for Improving Speech Intelligibility in Noise (음성 명료도 향상을 위한 분류 모델의 잡음 환경 적응)

Jung, Junyoung;Kim, Gibak
- Journal of Broadcast Engineering
- v.23 no.4
- pp.511-518
- 2018
This paper deals with improving speech intelligibility by applying a binary mask to the time-frequency units of noisy speech. Each element of the binary mask is set to "1" if speech is dominant or "0" if noise is dominant, by comparing the unit's signal-to-noise ratio with a predefined threshold. A Bayesian classifier built from Gaussian mixture models (GMMs) is used to estimate the binary mask for each time-frequency unit. However, a binary-mask-based noise suppressor improves speech intelligibility only in the noise conditions included in the training data. In this paper, speaker adaptation techniques from speech recognition are applied to adapt the GMMs to a new noise environment. Experiments with noise-corrupted speech demonstrate that the adaptation techniques improve speech intelligibility in a new noise environment.
https://doi.org/10.5909/JBE.2018.23.4.511
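The mask definition above can be written directly when the speech and noise powers of each time-frequency unit are known. That oracle ("ideal") mask is the training target; in the paper, the GMM-based Bayesian classifier has to estimate it from the noisy signal alone. This sketch assumes power spectrograms of equal shape, and the threshold parameter name `lc_db` (after the "local criterion" term used in the binary-masking literature) is an assumption.

```python
import numpy as np


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Return 1 where the local SNR of a T-F unit exceeds lc_db (speech-dominant), else 0."""
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (snr_db > lc_db).astype(np.uint8)


def apply_mask(noisy_tf, mask):
    """Keep speech-dominant units of a noisy T-F representation and zero the rest."""
    return np.asarray(noisy_tf) * mask
```

The masked representation would then be resynthesized to a waveform; that step is not shown.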

A Study on Spatio-temporal Features for Korean Vowel Lipreading (한국어 모음 입술독해를 위한 시공간적 특징에 관한 연구)

오현화;김인철;김동수;진성일
- The Journal of the Acoustical Society of Korea
- v.21 no.1
- pp.19-26
- 2002
This paper defines visemes, the basic visual units of speech, and investigates various visual features of the lip for effective Korean lipreading. First, we analyzed the visual characteristics of Korean vowels using a database of lip image sequences recorded from multiple speakers, and on that basis defined seven Korean vowel visemes. Various spatio-temporal lip features are extracted from feature points located on both the inner and outer lip contours in the image sequences, and their classification performance is evaluated with a hidden Markov model (HMM) based classifier. The experimental results for recognizing Korean visemes demonstrate that a feature vector containing information from both the inner and outer lip contours can be applied effectively to lipreading, and that the direction and magnitude of the movement of a lip feature point over time are quite useful for Korean lipreading.
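The spatial and temporal features mentioned above can be illustrated with two small helpers: the extent of a contour in one frame, and the frame-to-frame movement (magnitude and direction) of a tracked feature point. This sketch assumes the feature points have already been located and tracked as (x, y) pixel coordinates; the helper names are hypothetical, and in image coordinates y usually grows downward, which flips the sign convention of the angle.

```python
import numpy as np


def contour_extent(points):
    """Width and height of a lip contour given as an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    width = pts[:, 0].max() - pts[:, 0].min()
    height = pts[:, 1].max() - pts[:, 1].min()
    return float(width), float(height)


def point_motion_features(track):
    """Per-frame movement of one lip feature point.

    track: (T, 2) array with the (x, y) position in each frame.
    Returns (magnitude, angle) arrays of length T-1; angle is in radians
    measured from the +x axis.
    """
    track = np.asarray(track, dtype=float)
    delta = np.diff(track, axis=0)                 # displacement between frames
    magnitude = np.hypot(delta[:, 0], delta[:, 1])
    angle = np.arctan2(delta[:, 1], delta[:, 0])
    return magnitude, angle
```

Per-frame values like these, computed for inner and outer contour points, could be stacked into the observation vectors of an HMM classifier.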

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

이기승;김정수
- The Journal of the Acoustical Society of Korea
- v.22 no.2
- pp.135-144
- 2003
A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a text-to-speech (TTS) synthesis system. The main issue is refining the initial estimates of phone boundaries provided by an alignment based on a hidden Markov model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, we propose a technique that trains individual MLPs according to the phonetic transition. The optimal partitioning of the entire phonetic transition space is constructed so as to minimize the overall deviation from hand-labeled positions. On single-speaker data, the experimental results show that more than 95% of all phone boundaries deviate from the reference position by less than 20 ms, and that the boundary refinement reduces the root mean square error by about 25%.
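The two evaluation figures quoted above, the share of boundaries within 20 ms of the hand label and the root mean square error, can be computed as follows. This sketch assumes predicted and reference boundaries are already paired one-to-one (same phone sequence) and given in milliseconds; the function name is hypothetical, and the MLP refinement itself is not shown.

```python
import numpy as np


def boundary_accuracy(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Return (fraction of boundaries with |deviation| < tolerance, RMSE in ms)."""
    p = np.asarray(predicted_ms, dtype=float)
    r = np.asarray(reference_ms, dtype=float)
    dev = p - r
    within = float(np.mean(np.abs(dev) < tolerance_ms))
    rmse = float(np.sqrt(np.mean(dev ** 2)))
    return within, rmse
```

A relative RMSE reduction such as the reported 25% would be `(rmse_initial - rmse_refined) / rmse_initial`, comparing the HMM alignment with the refined boundaries.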

The Effect of Strong Syllables on Lexical Segmentation in English Continuous Speech by Korean Speakers (강음절이 한국어 화자의 영어 연속 음성의 어휘 분절에 미치는 영향)

Kim, Sunmi;Nam, Kichun
- Phonetics and Speech Sciences
- v.5 no.2
- pp.43-51
- 2013
English native listeners tend to treat strong syllables in a speech stream as potential initial syllables of new words, since the majority of lexical words in English have word-initial stress. The current study investigates whether Korean (L1)-English (L2) late bilinguals, like native English listeners, perceive strong syllables in English continuous speech as word onsets. In Experiment 1, word-spotting was slower when the word-initial syllable was strong, indicating that Korean listeners do not perceive strong syllables as word onsets. Experiment 2 was conducted to rule out the possibility that the results of Experiment 1 arose because the strong-initial targets used in Experiment 1 were themselves slower to recognize than the weak-initial targets. We employed the gating paradigm in Experiment 2 and measured the Isolation Point (IP, the point at which participants correctly identify a word without subsequently changing their minds) and the Recognition Point (RP, the point at which participants correctly identify the target with 85% or greater confidence) for the targets excised from the non-words in the two conditions of Experiment 1. Both the mean IPs and the mean RPs were significantly earlier for the strong-initial targets, which means that the results of Experiment 1 reflect the difficulty of segmentation when the word-initial syllable is strong. These results are consistent with Kim & Nam (2011), indicating that strong syllables are not perceived as word onsets by Korean listeners and interfere with lexical segmentation in English running speech.
https://doi.org/10.13064/KSSS.2013.5.2.043

Development of a Lipsync Algorithm Based on Audio-visual Corpus (시청각 코퍼스 기반의 립싱크 알고리듬 개발)

김진영;하영민;이화숙
- The Journal of the Acoustical Society of Korea
- v.20 no.3
- pp.63-69
- 2001
A corpus-based lip sync algorithm for synthesizing natural face animation is proposed in this paper. To obtain the lip parameters, markers were attached to the speaker's face, and their positions were extracted with image processing methods. The spoken utterances were also labeled with HTK, and prosodic information (duration, pitch and intensity) was analyzed. An audio-visual corpus was constructed by combining the speech and image information. The basic unit in our approach is the syllable. Based on this audio-visual corpus, lip information represented by the marker positions was synthesized; that is, the best syllable units are selected from the corpus, and the visual information of the selected units is concatenated. Obtaining the best units involves two processes. The first selects the N-best candidates for each syllable. The second selects the smoothest unit sequence, which is done with the Viterbi decoding algorithm. For these processes, two distance measures between syllable units are proposed: a phonetic environment distance measure and a prosody distance measure. Computer simulation results showed that the proposed algorithm performs well. In particular, pitch and intensity information was shown to be as important as duration information for lip sync.
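The second process, choosing one unit per syllable so that the whole sequence is smooth, is a Viterbi search over the N-best candidate lists. This is a generic sketch, assuming the candidates have already been pruned and that the paper's phonetic-environment and prosody distances are supplied as two hypothetical cost functions: `target_cost(t, unit)` for how well a unit fits syllable `t`, and `concat_cost(prev_unit, unit)` for how smoothly two units join.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    concat_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one candidate per syllable minimizing total target + concatenation cost."""
    # cost[j]: best total cost of a path ending in candidate j of the current syllable
    cost = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[t-1][k]: best predecessor of candidate k at t
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_cost, ptr = [], []
        for u in candidates[t]:
            j_best = min(range(len(prev)), key=lambda j: cost[j] + concat_cost(prev[j], u))
            new_cost.append(cost[j_best] + concat_cost(prev[j_best], u) + target_cost(t, u))
            ptr.append(j_best)
        cost = new_cost
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)]
```

For example, with scalar lip-opening values as units and `concat_cost=lambda a, b: abs(a - b)`, the search favors sequences whose lip shape changes least between syllables, while `target_cost` can pull the choice toward units whose context or prosody matches the input.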
