• Title/Abstract/Keywords: parts-of-speech


연속 음성에서의 신경회로망을 이용한 화자 적응 (Speaker Adaptation Using Neural Network in Continuous Speech Recognition)

  • 김선일
    • 한국음향학회지 / Vol. 19, No. 1 / pp. 11-15 / 2000
  • Speaker-adaptive continuous speech recognition was performed using the RM speech corpus. HMMs for a reference speaker were trained on the RM training data, and speaker-adapted recognition was evaluated on the evaluation data. Part of the training data was used for speaker adaptation. DTW was used to time-align the target speaker's data with the reference speaker's data, and an error back-propagation neural network transformed the target speaker's spectra so that they took on the spectral characteristics of the reference speaker. To achieve optimal speaker adaptation, experiments were carried out while varying several elements of the neural network, and the results are presented. Adapting to the reference speaker with a trained network carrying appropriate weights increased the word recognition rate by up to 2.1 times and the correct word recognition rate by up to 4.7 times.

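The adaptation step described above, time-aligning the target speaker's frames to the reference speaker with DTW before training the spectral-mapping network, can be sketched as follows. This is a minimal illustration; the feature dimensions and frame values are invented for the example, not taken from the paper.

```python
import numpy as np

def dtw_align(ref, tgt):
    """Align target-speaker frames to reference-speaker frames.

    ref, tgt: (n_frames, n_dims) spectral feature matrices.
    Returns the list of (ref_idx, tgt_idx) pairs on the optimal path.
    """
    n, m = len(ref), len(tgt)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(ref[i - 1] - tgt[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1],
                                 cost[i - 1, j],
                                 cost[i, j - 1])
    # Backtrack from the end of both sequences to recover the path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

The aligned (target, reference) frame pairs would then serve as input-target training pairs for the back-propagation network that maps the target speaker's spectra onto the reference speaker's.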

KL 변환을 이용한 multilayer perceptron에 의한 한국어 연속 숫자음 인식 (Korean continuous digit speech recognition by multilayer perceptron using KL transformation)

  • 박정선;권장우;권정상;이응혁;홍승홍
    • 전자공학회논문지B / Vol. 33B, No. 8 / pp. 105-113 / 1996
  • In this paper, a new Korean digit speech recognition technique using a multilayer perceptron (MLP) is proposed. Despite its weakness in recognizing dynamic signals, the MLP was adopted for this model because Korean syllables can provide static features, and it is simple in structure and fast in computation. The MLP's input vectors were transformed by the Karhunen-Loève transformation (KLT), which compresses the signal without losing its separability, although its physical properties are changed. Because the suggested technique extracts static features unaffected by changes in syllable length, it is effective for a Korean numeric recognition system. Using the KLT, computation time and memory can be saved without decreasing classification rates. The proposed feature extraction technique extracts the same number of features from the two parts of a syllable, its front and its end, forming the frames from which features are extracted with fixed-size windows. It can therefore be applied to continuous speech recognition, which is not easy for a standard neural network recognition system.

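The KLT compression step can be sketched with a plain eigendecomposition of the feature covariance (in this discrete setting the Karhunen-Loève transform coincides with PCA). The data shapes and values below are illustrative, not the paper's.

```python
import numpy as np

def klt(features, k):
    """Project feature vectors onto the top-k Karhunen-Loeve basis.

    features: (n_samples, n_dims) matrix; returns (n_samples, k)
    compressed features ordered by decreasing variance.
    """
    centered = features - features.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    basis = eigvecs[:, ::-1][:, :k]          # keep top-k directions
    return centered @ basis

# Synthetic demo: one dimension dominates and survives compression.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 10))
x[:, 0] *= 5.0
y = klt(x, 3)
```

Compressing to k dimensions is what saves the computation time and memory the abstract mentions, while the variance ordering preserves class separability as far as possible.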

Dyadic Wavelet Transform 방식의 Pitch 주기결정 (A Stable Pitch Determination via Dyadic Wavelet Transform (DyWT))

  • 김남훈;윤기범;고한석
    • 한국음향학회:학술대회논문집 / 한국음향학회 2000년도 학술발표대회 논문집 Vol. 19, No. 2 / pp. 197-200 / 2000
  • This paper presents a time-based pitch determination algorithm (PDA) for reliable estimation of the pitch period (PP) in speech signals. The proposed method uses the Dyadic Wavelet Transform (DyWT), which detects the presence of glottal closure instants (GCI) and uses that information to determine the pitch period. The method also exploits the periodicity property of the DyWT to detect unsteady GCIs. To evaluate performance, the proposed method is compared with other DyWT-based PDAs. Its effectiveness is tested on real speech signals containing transitions between voiced and unvoiced intervals, where the energy of the voiced signal is unsteady. The results show that the proposed method performs well in estimating both the unsteady GCI positions and the steady parts.

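A minimal sketch of the idea, using a simple Haar-style dyadic wavelet rather than the spline wavelets common in DyWT work; the scale and threshold are illustrative assumptions, not the paper's settings. Sharp glottal-closure-like events produce wavelet maxima, and the median spacing of those maxima estimates the pitch period.

```python
import numpy as np

def haar_dywt(x, scale):
    """Haar-style dyadic wavelet coefficients of x at width `scale`."""
    kernel = np.concatenate([np.ones(scale), -np.ones(scale)]) / scale
    return np.convolve(x, kernel, mode="same")

def pitch_period(x, scale=8, thresh=0.5):
    """Estimate the pitch period (in samples) as the median spacing
    of wavelet maxima, which track GCI-like events in the signal."""
    w = haar_dywt(x, scale)
    level = thresh * w.max()
    peaks = [n for n in range(1, len(w) - 1)
             if w[n] >= level and w[n] > w[n - 1] and w[n] >= w[n + 1]]
    spacings = np.diff(peaks)
    return float(np.median(spacings)) if len(spacings) else 0.0
```

Using the median spacing rather than a single interval is one simple way to stay robust when individual GCIs are unsteady, which is the situation the abstract targets.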

음성신호기반의 감정분석을 위한 특징벡터 선택 (Discriminative Feature Vector Selection for Emotion Classification Based on Speech)

  • 최하나;변성우;이석필
    • 전기학회논문지 / Vol. 64, No. 9 / pp. 1363-1368 / 2015
  • Recently, computers have become smaller thanks to advances in computing technology, and many wearable devices have appeared. Consequently, a computer's ability to recognize human emotion is considered important, and research on analyzing emotional states is increasing. The human voice carries much information about emotion. This paper proposes a discriminative feature vector selection method for emotion classification based on speech. Feature vectors such as pitch, MFCC, LPC, and LPCC are extracted from voice signals divided into four emotion classes (happy, normal, sad, angry), and the separability of the extracted feature vectors is compared using the Bhattacharyya distance. The more effective feature vectors are then recommended for emotion classification.
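The separability comparison can be sketched with the standard Gaussian form of the Bhattacharyya distance; the feature values below are synthetic stand-ins for the extracted pitch/MFCC/LPC/LPCC vectors, not the paper's data.

```python
import numpy as np

def bhattacharyya(a, b):
    """Bhattacharyya distance between two Gaussian-modelled feature sets.

    a, b: (n_samples, n_dims) feature matrices for two emotion classes.
    Larger values mean the feature separates the two classes better.
    """
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)
    cov_b = np.cov(b, rowvar=False)
    cov = (cov_a + cov_b) / 2.0
    diff = mu_a - mu_b
    # Mean-separation term plus covariance-mismatch term.
    term1 = diff @ np.linalg.solve(cov, diff) / 8.0
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov_a) * np.linalg.det(cov_b)))
    return term1 + term2
```

Ranking candidate features by this distance between each pair of emotion classes is one straightforward way to realize the selection the abstract describes.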

영어 억양음운론에 의한 영어 억양 의미 분석 (The Analysis of Intonational Meaning Based on the English Intonational Phonology)

  • 김기호
    • 음성과학 / Vol. 7, No. 3 / pp. 109-125 / 2000
  • The purpose of this paper is to analyse the intonational meaning of various sentences based on English Intonational Phonology, and to show the superiority of Intonational Phonology in explaining intonational meaning compared with other existing intonational theories. The American structuralists and the British school, which attempt to describe intonation in terms of 'levels' and 'configurations' respectively, analyse intonational meaning from a holistic perspective in which an utterance cannot be divided into smaller parts. Intonational Phonology, on the other hand, treats English intonation as composed of a series of High and Low tones, so that intonational meaning is interpreted compositionally as sets of H and L. This paper discusses the phonological relations between intonation and its meaning in terms of the compositions of pitch accents, phrase accents, and boundary tones that constitute an intonational tune.


CSL를 이용한 한국인의 프랑스어 운율학습 방안 (A Learning Method of French Prosodic Rhythm for Korean Speakers using CSL)

  • 이은영;이문규;이정현
    • 음성과학 / Vol. 6 / pp. 83-101 / 1999
  • The aim of this study is to provide a method for speakers of the Taegu North Kyungsang dialect of Korean to learn French prosodic rhythm more effectively. The rhythmic properties of spoken French and the Taegu North Kyungsang dialect differ from each other, so we provide a basic rhythmic model of the two languages divided into three parts: syllable, rhythmic unit and accent, and intonation. To do so, we recorded the French of Taegu Kyungsang Korean speakers, then analysed and compared the rhythmic properties of Korean and French with a spectrograph. We identified rhythmic mistakes in their French pronunciation and established a learning model to correct them. After training with the CSL Macro learning model, we observed the output. However, even when learners understand the proposed method, an effective method based on repeated practice must be arranged before it can be used in direct verbal communication within a well-developed learning programme. This study may therefore serve as preparation for designing an effective rhythm-learning programme.


한국어 음소분포에 대한 계량언어학적 연구 - "소"와 "고도를 기다리며"를 중심으로 - (A Quantitative Study for the Distribution of Korean Phonemes in the two parts: The Ox and Waiting for Godot)

  • 배희숙;구동욱;윤영선;오영환
    • 음성과학 / Vol. 7, No. 4 / pp. 27-40 / 2000
  • The goal of quantitative linguistics is to describe the quantitative behavior of linguistic units. Several studies have examined the frequency of Korean phonemes, which is important for understanding the internal workings of these linguistic units. However, frequency information drawn from the purely phonological level, without any consideration of rhythmic groups, cannot adequately represent linguistic phenomena. To provide useful information, the phonological transcription must therefore be carried out at the level of the rhythmic group. In this paper, we made such a transcription to analyze Korean phonology. We were not satisfied with merely investigating phoneme frequencies, but also examined whether the distribution of Korean phonemes follows the binomial distribution within linguistic constraints.

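The counting side of such a study can be sketched as below: tally relative phoneme frequencies from a transcription and compute the binomial probability of a phoneme occurring k times in a block of n phonemes. The romanized transcription string used here is a made-up example, not data from the paper.

```python
from collections import Counter
from math import comb

def phoneme_frequencies(transcription):
    """Relative frequency of each phoneme symbol in a transcription."""
    counts = Counter(transcription)
    total = sum(counts.values())
    return {ph: c / total for ph, c in counts.items()}

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): the expected probability of a
    phoneme occurring k times in a block of n phonemes when its overall
    relative frequency is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)
```

Comparing observed per-block counts of a phoneme against this pmf is the simplest form of the binomial check the abstract describes.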

묵음 구간의 평균 켑스트럼 차감법을 이용한 채널 보상 기법 (Channel Compensation Technique Using Silence Cepstral Mean Subtraction)

  • 우승옥;윤영선
    • 대한음성학회:학술대회논문집 / 대한음성학회 2005년도 춘계 학술대회 발표논문집 / pp. 49-52 / 2005
  • Cepstral Mean Subtraction (CMS) effectively compensates for channel distortion, but it has shortcomings, such as distortion of the feature parameters and the need to wait for the whole utterance. By assuming that the silence parts carry the channel characteristics, we consider channel normalization by subtracting cepstral means obtained only from the silence regions. If this technique compensates for the channel successfully, the proposed method can be used in real-time processing environments or time-critical applications. In the experiments, however, the performance of our method was not as good as that of the CMS technique. From an analysis of the results, we found potential in the proposed method and will try to find techniques that reduce the gap between CMS and our method.

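The silence-based variant of CMS described above can be sketched as follows; the toy cepstra and silence mask are illustrative stand-ins for real features and a real voice-activity decision.

```python
import numpy as np

def silence_cms(cepstra, is_silence):
    """Subtract the mean cepstrum of the silence frames from every frame.

    cepstra: (n_frames, n_ceps) cepstral feature matrix.
    is_silence: boolean mask marking frames judged to be silence.
    In silence, the speech component is absent, so the mean cepstrum
    there approximates the channel characteristics alone.
    """
    channel = cepstra[is_silence].mean(axis=0)
    return cepstra - channel
```

Unlike whole-utterance CMS, this estimate is available as soon as a short leading silence has been observed, which is what makes the method attractive for real-time use; the trade-off, reflected in the abstract's results, is a noisier channel estimate than a full-sentence mean.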

도착시간지연 특성행렬을 이용한 휴머노이드 로봇의 공간 화자 위치측정 (Spatial Speaker Localization for a Humanoid Robot Using TDOA-based Feature Matrix)

  • 김진성;김의현;김도익;유범재
    • 로봇학회논문지 / Vol. 3, No. 3 / pp. 237-244 / 2008
  • Nowadays, research on human-robot interaction is receiving increasing attention, and within this field speech signal processing in particular attracts much interest. In this paper, we report a speaker localization system with six microphones for MAHRU, a humanoid robot from KIST, and propose a time delay of arrival (TDOA)-based feature matrix together with an algorithm based on the minimum sum of absolute errors (MSAE) for sound source localization. The TDOA-based feature matrix is defined as a simple database matrix calculated from pairs of microphones installed on the robot. Using this matrix and the MSAE-based algorithm, the proposed method localizes a sound source without needing to solve approximate nonlinear equations. To verify the performance of our speaker localization system, we present experimental results for speech sources in all directions within a 5 m distance, with the height divided into three parts.

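The lookup step of the proposed localization can be sketched as a nearest-row search over the precomputed TDOA feature matrix under the MSAE criterion; the matrix values below are invented for illustration, not measured on the robot.

```python
import numpy as np

def localize_msae(observed_tdoa, feature_matrix):
    """Pick the direction whose stored TDOA row best matches the observation.

    feature_matrix: (n_directions, n_pairs) precomputed TDOAs for each
    candidate direction over each microphone pair.
    observed_tdoa: (n_pairs,) delays measured from the live signal.
    Returns the index of the direction minimising the sum of
    absolute errors (MSAE).
    """
    errors = np.abs(feature_matrix - observed_tdoa).sum(axis=1)
    return int(np.argmin(errors))
```

Because the matrix is built once offline, the online step is just this table comparison, which is why no approximate nonlinear equations need to be solved at localization time.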

Japanese Vowel Sound Classification Using Fuzzy Inference System

  • Phitakwinai, Suwannee;Sawada, Hideyuki;Auephanwiriyakul, Sansanee;Theera-Umpon, Nipon
    • 한국융합학회논문지 / Vol. 5, No. 1 / pp. 35-41 / 2014
  • Automatic speech recognition is a popular research problem, and many research groups work in this field for different languages, including Japanese. Vowel recognition is an important part of a Japanese speech recognition system. In this research, a vowel classification system based on the Mamdani fuzzy inference system was developed. We tested the system on a blind test data set collected from one male native Japanese speaker and four male non-native Japanese speakers; none of the subjects in the blind test set appeared in the training set. The classification rate on the training data set is 95.0%. In the speaker-independent experiments, the classification rate for the native speaker is around 70.0%, whereas that for the non-native speakers is around 80.5%.
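A toy sketch of the min-max (Mamdani-style) inference idea for vowel classification over two formant inputs. The membership shapes, formant ranges, and rules here are invented assumptions for illustration; the abstract does not specify the paper's actual fuzzy system.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b over the support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical (not measured) formant ranges for two Japanese vowels.
VOWEL_RULES = {
    "a": {"f1": (600.0, 800.0, 1000.0), "f2": (1000.0, 1300.0, 1600.0)},
    "i": {"f1": (200.0, 300.0, 450.0),  "f2": (1900.0, 2300.0, 2700.0)},
}

def classify_vowel(f1, f2):
    """Mamdani-style inference: min (AND) over a rule's antecedents,
    then the vowel with the maximum firing strength wins."""
    strengths = {v: min(tri(f1, *r["f1"]), tri(f2, *r["f2"]))
                 for v, r in VOWEL_RULES.items()}
    return max(strengths, key=strengths.get)
```

A full Mamdani system would also aggregate rule consequents and defuzzify; for a hard classification decision, comparing firing strengths as above is the usual shortcut.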