Search | Korea Science

Context-adaptive Phoneme Segmentation for a TTS Database (문자-음성 합성기의 데이터 베이스를 위한 문맥 적응 음소 분할)

이기승;김정수
- The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.135-144 / 2003
A method for the automatic segmentation of speech signals is described. The method is intended for building a large database for a text-to-speech (TTS) synthesis system. The main issue addressed is the refinement of initial estimates of phone boundaries, which are provided by an alignment based on a hidden Markov model (HMM). A multi-layer perceptron (MLP) is used as the phone boundary detector. To improve segmentation performance, a technique is proposed that trains a separate MLP for each class of phonetic transition. The optimum partitioning of the entire phonetic transition space is constructed so as to minimize the overall deviation from hand-labelled positions. In experiments with single-speaker stimuli, more than 95% of all phone boundaries deviated from the reference position by less than 20 ms, and the boundary refinement reduced the root mean square error by about 25%.
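The two figures quoted in this abstract (share of boundaries within 20 ms and root mean square error) can be computed as in the following minimal sketch. The function name `boundary_metrics` and the sample boundary times are hypothetical and are not taken from the paper.

```python
import math

def boundary_metrics(predicted_ms, reference_ms, tolerance_ms=20.0):
    """Return (fraction of boundaries within tolerance, RMS error in ms)."""
    if len(predicted_ms) != len(reference_ms) or not predicted_ms:
        raise ValueError("need two equal-length, non-empty boundary lists")
    # Signed deviation of each automatic boundary from its hand label.
    deviations = [p - r for p, r in zip(predicted_ms, reference_ms)]
    within = sum(1 for d in deviations if abs(d) < tolerance_ms)
    rmse = math.sqrt(sum(d * d for d in deviations) / len(deviations))
    return within / len(deviations), rmse

# Made-up boundary times in milliseconds, for illustration only.
reference = [100.0, 250.0, 400.0, 580.0]
predicted = [104.0, 247.0, 430.0, 580.0]
accuracy, rmse = boundary_metrics(predicted, reference)
```

Comparing `rmse` before and after refinement would give a relative reduction of the kind the abstract reports.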

Robust Speech Recognition Using Missing Data Theory (손실 데이터 이론을 이용한 강인한 음성 인식)

김락용;조훈영;오영환
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.56-62 / 2001
In this paper, we apply missing data theory to speech recognition. It can be used to maintain the high performance of a speech recognizer when missing data occur. In general, a hidden Markov model (HMM) is used as a stochastic classifier for the speech recognition task. In a continuous density HMM (CDHMM), acoustic events are represented by continuous probability density functions. Missing data theory has the advantage that it can be applied easily to such a CDHMM. A marginalization method is used to process missing data because it has low complexity and is easy to apply to automatic speech recognition (ASR). Spectral subtraction is used to detect missing data: if the difference between the energy of the speech and that of the background noise is below a given threshold, the data are judged to be missing. We propose a new method that examines the reliability of the detected missing data using the voicing probability. The voicing probability is used to find voiced frames, so that missing data are processed in voiced regions, which carry more redundant information than consonants. Experimental results show that our method outperforms a baseline system that uses spectral subtraction only. In a 452-word isolated word recognition experiment, the proposed method using the voicing probability reduced the average word error rate by 12% in a typical noise situation.
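The detection and marginalization steps described in this abstract can be illustrated with a minimal sketch. It assumes per-channel energies, a threshold expressed in dB, and a single diagonal-covariance Gaussian. The names `reliability_mask` and `marginal_log_likelihood` and all numeric values are hypothetical. The voicing-probability check proposed in the paper is not modelled here.

```python
import math

def reliability_mask(noisy_energy, noise_energy, threshold_db=3.0):
    """Mark a channel reliable (True) when the speech estimate left after
    subtracting the noise estimate exceeds the noise by threshold_db."""
    mask = []
    for y, n in zip(noisy_energy, noise_energy):
        speech = max(y - n, 1e-12)  # spectral subtraction, floored to stay positive
        snr_db = 10.0 * math.log10(speech / max(n, 1e-12))
        mask.append(snr_db >= threshold_db)
    return mask

def marginal_log_likelihood(x, mean, var, mask):
    """Diagonal-Gaussian log-likelihood over reliable dimensions only.
    With a diagonal covariance, integrating out a missing dimension
    contributes a factor of 1, so that dimension is simply skipped."""
    total = 0.0
    for xi, mu, v, ok in zip(x, mean, var, mask):
        if ok:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total

# Made-up energies: the middle channel is dominated by noise.
mask = reliability_mask([10.0, 1.2, 5.0], [1.0, 1.0, 1.0])
score = marginal_log_likelihood([0.0, 5.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], mask)
```

In a CDHMM the same skipping would apply to each mixture component of each state. The sketch shows only one component.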

Study on the Improvement of Speech Recognizer by Using Time Scale Modification (시간축 변환을 이용한 음성 인식기의 성능 향상에 관한 연구)

이기승
- The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.462-472 / 2004
In this paper, a method is proposed for compensating for the performance degradation of automatic speech recognition (ASR) that is mainly caused by speaking rate variation. Before the new method is proposed, the performance of an HMM-based ASR system is first analyzed quantitatively as a function of speaking rate. This analysis shows that significant performance degradation often occurs for rapidly spoken speech. A quantitative measure that represents the speaking rate is then introduced. Time scale modification (TSM) is employed to compensate for the difference in speaking rate between the input speech and the training speech. Finally, a method is proposed in which TSM is applied selectively according to the speaking rate. ASR experiments on 10-digit mobile phone numbers confirm that the error rate is reduced by 15.5% when the proposed method is applied to speech with a high speaking rate.

A New Test Data Generation for SPRT in Speaker Verification (화자 확인에서 SPRT를 위한 새로운 테스트 데이터 생성)

서창우;이기용
- The Journal of the Acoustical Society of Korea / v.22 no.1 / pp.42-47 / 2003
This paper proposes a method for generating new test data by shifting the sample at which the first frame starts, for use with the sequential probability ratio test (SPRT) in speaker verification. The SPRT is an effective algorithm that can reduce the computational complexity of testing. However, the SPRT decision procedure assumes that the input samples are independent and identically distributed (i.i.d.) samples from a probability density function (pdf), and it is not well suited to short utterances. The proposed method can perform the SPRT regardless of the utterance length of the test data because it generates new test data by shifting the start sample of the first frame. In addition, the correlation in the data, which must be considered in the SPRT, can be removed effectively by principal component analysis. Experimental results show that the proposed method slightly increases the computational complexity because of the sample shift, but improves on a conventional method by 0.7% on average in equal error rate (EER).
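For readers unfamiliar with the SPRT, the following minimal sketch shows Wald's sequential decision rule on a stream of per-frame log-likelihood ratios. It assumes i.i.d. frames, as the abstract notes. The function name `sprt`, the error rates `alpha` and `beta`, and the sample ratios are hypothetical. The paper's sample-shift data generation and PCA decorrelation are not modelled.

```python
import math

def sprt(log_likelihood_ratios, alpha=0.01, beta=0.01):
    """Wald's sequential probability ratio test.
    alpha: tolerated false-acceptance rate, beta: tolerated false-rejection rate.
    Returns (decision, number of frames consumed)."""
    upper = math.log((1.0 - beta) / alpha)  # accept the claimed speaker at or above this
    lower = math.log(beta / (1.0 - alpha))  # reject the claimed speaker at or below this
    total = 0.0
    used = 0
    for used, llr in enumerate(log_likelihood_ratios, start=1):
        total += llr
        if total >= upper:
            return "accept", used
        if total <= lower:
            return "reject", used
    # Ran out of frames before either threshold was crossed.
    return "undecided", used

accepted = sprt([1.0] * 10)
rejected = sprt([-2.0] * 10)
short = sprt([0.1] * 5)
```

The "undecided" outcome is what makes short utterances a problem: the test can run out of frames before reaching a threshold, which is the limitation the abstract addresses by generating additional test data.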

A New Mobility Management Scheme Using Pointer Forwarding in Proxy Mobile IPv6 Networks (Proxy Mobile IPv6 네트워크에서 포인터 포워딩을 이용한 이동성 관리기법)

Yi, Myung-Kyu;Kim, Hyung-Heon;Park, Seok-Cheon;Yang, Young-Kyu
- The KIPS Transactions:PartC / v.17C no.1 / pp.109-118 / 2010
Proxy Mobile IPv6 (PMIPv6) is a network-based mobility management protocol that supports mobility for IPv6 nodes without host involvement. In PMIPv6, the Mobile Access Gateway (MAG) incurs a high signaling cost to update the location of a mobile node at the remote Local Mobility Anchor (LMA) if the node moves frequently. This increases the network overhead on the LMA, wastes network resources, and lengthens the delay. We therefore propose a new mobility management scheme that minimizes signaling cost by using pointer forwarding. Our proposal reduces signaling costs by registering with the neighboring MAG instead of the remote LMA. A cost analysis using an embedded Markov chain shows that our proposal achieves performance superior to that of the PMIPv6 scheme.
https://doi.org/10.3745/KIPSTC.2010.17C.1.109

Bayesian Clustering of Prostate Cancer Patients by Using a Latent Class Poisson Model (잠재그룹 포아송 모형을 이용한 전립선암 환자의 베이지안 그룹화)

Oh Man-Suk
- The Korean Journal of Applied Statistics / v.18 no.1 / pp.1-13 / 2005
Latent class models have recently been considered by many researchers and practitioners as a tool for identifying heterogeneous segments or groups in a population and for grouping objects into those segments. In this paper, we consider data on prostate cancer patients from the Korean National Cancer Institute and propose a method for grouping the patients using a latent class Poisson model. A Bayesian approach with a Markov chain Monte Carlo method is used to overcome the limitations of classical likelihood approaches. The advantages of the proposed Bayesian method are easy estimation of parameters together with their standard errors, segmentation of objects into groups, and provision of uncertainty measures for the segmentation. In addition, we provide a method for determining an appropriate number of segments for the given data, so that the method automatically chooses the number of segments and partitions the objects into heterogeneous segments.
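As a rough illustration of how a latent class Poisson model assigns an object to a segment, the sketch below computes the posterior class probabilities for one observed count, given class weights and Poisson rates that are assumed to be known. In an MCMC scheme this calculation would be one conditional step and the parameters would themselves be sampled, which is not shown. The function names and all numbers are hypothetical.

```python
import math

def poisson_log_pmf(y, rate):
    """Log of the Poisson probability of count y at the given rate."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)

def class_posteriors(y, weights, rates):
    """Posterior probability of each latent class for one observed count y."""
    log_terms = [math.log(w) + poisson_log_pmf(y, r)
                 for w, r in zip(weights, rates)]
    peak = max(log_terms)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(t - peak) for t in log_terms]
    norm = sum(unnormalized)
    return [u / norm for u in unnormalized]

# Two made-up classes with equal weight and rates 1 and 10.
high_count = class_posteriors(9, [0.5, 0.5], [1.0, 10.0])
low_count = class_posteriors(0, [0.5, 0.5], [1.0, 10.0])
```

The posterior vector itself serves as an uncertainty measure for the assignment, in the spirit of the uncertainty measures the abstract mentions.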
https://doi.org/10.5351/KJAS.2005.18.1.001

A Study for Complexity Improvement of Automatic Speaker Verification in PDA Environment (PDA 환경에서 자동화자 확인의 계산량 개선을 위한 연구)

Seo, Chang-Woo;Lim, Young-Hwan;Jeon, Sung-Chae;Jang, Nam-Young
- Journal of the Institute of Convergence Signal Processing / v.10 no.3 / pp.170-175 / 2009
In this paper, we propose a real-time automatic speaker verification (ASV) system to protect personal information on personal digital assistant (PDA) devices. Recently, the capacity of PDAs has grown and they have become popular, especially in mobile environments such as mobile commerce (M-commerce). However, many difficulties remain in the practical application of ASV to PDA devices because it requires too much computation. To solve this problem, we relieve the computational burden by performing preprocessing, such as spectral subtraction and speech detection, while the utterance is being spoken. In addition, by applying hidden Markov model (HMM) optimal state alignment and the sequential probability ratio test (SPRT), we obtain much faster processing. The whole system implementation is simple and compact enough to fit within a PDA device's limited memory and low CPU speed.

Analyzing Human's Motion Pattern Using Sensor Fusion in Complex Spatial Environments (복잡행동환경에서의 센서융합기반 행동패턴 분석)

Tark, Han-Ho;Jin, Taeseok
- Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.597-602 / 2014
We propose a hybrid sensing system for human tracking. The system uses laser scanners and image sensors and is applicable to wide, crowded areas such as a university hallway. Specifically, human tracking is based on the laser scanners, and the image sensors are used for human identification when the laser scanners lose track of a person because of occlusion or because the person enters a room or goes up stairs. We developed a human identification method for this system. The method is as follows: 1. Best-shot images (human images that show human features clearly) are obtained with the help of the human position and direction data obtained by the laser scanners. 2. Human identification is conducted by calculating the correlation between the color histograms of the best-shot images. Estimating best-shot images makes human identification possible even in crowded scenes. An experiment in a station showed that the method is effective to some extent.
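Step 2 of the method, correlating color histograms, can be sketched as follows. The sketch assumes a single color channel with 8-bit values and uses the Pearson correlation coefficient. The abstract does not say which correlation measure or binning the authors used, so `histogram`, `correlation`, and the pixel values are hypothetical.

```python
import math

def histogram(values, bins=8, max_value=256):
    """Count 8-bit channel values into equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(v * bins // max_value, bins - 1)] += 1
    return counts

def correlation(h1, h2):
    """Pearson correlation between two histograms with the same bin count."""
    n = len(h1)
    m1, m2 = sum(h1) / n, sum(h2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(h1, h2))
    s1 = math.sqrt(sum((a - m1) ** 2 for a in h1))
    s2 = math.sqrt(sum((b - m2) ** 2 for b in h2))
    if s1 == 0.0 or s2 == 0.0:
        return 0.0  # a flat histogram carries no shape information
    return cov / (s1 * s2)

# Made-up hue samples from two best-shot images.
person_a = histogram([0, 31, 32, 255])
person_b = histogram([250, 251, 252, 253])
same = correlation(person_a, person_a)
different = correlation(person_a, person_b)
```

A value near 1 suggests the same clothing colors, while a low or negative value suggests a different person.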
https://doi.org/10.5391/JKIIS.2014.24.6.597

An Extraction Method of Meaningful Hand Gesture for a Robot Control (로봇 제어를 위한 의미 있는 손동작 추출 방법)

Kim, Aram;Rhee, Sang-Yong
- Journal of the Korean Institute of Intelligent Systems / v.27 no.2 / pp.126-131 / 2017
In this paper, we propose a method for extracting the meaningful motion from the various hand gestures made when giving commands to a robot. When a person gives a command to a robot, the hand gesture can be divided into a preparation motion, a main motion, and a finishing motion. The main motion is the meaningful one that conveys the command to the robot; the other motions are meaningless auxiliary motions that serve only to perform the main motion. It is therefore necessary to extract only the main motion from continuous hand gestures. In addition, people may move their hands unconsciously, and the robot must also judge these movements to be meaningless. In this study, we extract human skeleton data from a depth image obtained with a Kinect v2 sensor and extract the hand location data from them. We track the location of the hand using a Kalman filter, distinguish whether a hand motion is meaningful or meaningless, and recognize the hand gesture using a hidden Markov model.
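The hand-tracking step can be illustrated with a minimal scalar Kalman filter applied to one coordinate of the hand position. It assumes a random-walk state model. The abstract does not give the state model or noise settings used by the authors, so `kalman_track`, `process_var`, `measurement_var`, and the sample measurements are hypothetical.

```python
def kalman_track(measurements, process_var=1e-2, measurement_var=1.0):
    """Scalar Kalman filter with a random-walk state model.
    Returns the filtered estimate for every measurement."""
    if not measurements:
        return []
    estimate = measurements[0]   # initialise from the first observation
    error_var = measurement_var
    filtered = [estimate]
    for z in measurements[1:]:
        # Predict: the state is assumed unchanged, uncertainty grows.
        error_var += process_var
        # Update: blend prediction and measurement by the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        filtered.append(estimate)
    return filtered

# Made-up x-coordinates of a hand joint with one noisy spike.
smoothed = kalman_track([0.50, 0.52, 0.90, 0.53, 0.51])
```

A full tracker would run this over all three coordinates or use a vector state with velocity. The scalar form is kept here for brevity.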
https://doi.org/10.5391/JKIIS.2017.27.2.126

Factored MLLR Adaptation for HMM-Based Speech Synthesis in Naval-IT Fusion Technology (인자화된 최대 공산선형회귀 적응기법을 적용한 해양IT융합기술을 위한 HMM기반 음성합성 시스템)

Sung, June Sig;Hong, Doo Hwa;Jeong, Min A;Lee, Yeonwoo;Lee, Seong Ro;Kim, Nam Soo
- The Journal of Korean Institute of Communications and Information Sciences / v.38C no.2 / pp.213-218 / 2013
One of the most popular approaches to parameter adaptation in hidden Markov model (HMM) based systems is the maximum likelihood linear regression (MLLR) technique. In our previous study, we proposed factored MLLR (FMLLR), in which each MLLR parameter is defined as a function of a control vector. We presented a method for training the FMLLR parameters based on the general framework of the expectation-maximization (EM) algorithm. With the proposed algorithm, supplementary information that cannot be included in the models is effectively reflected in the adaptation process. In this paper, we apply the FMLLR algorithm to the pitch sequence as well as to the spectrum parameters. In a series of experiments on the artificial generation of expressive speech, we evaluate the performance of the FMLLR technique and compare it with other approaches to parameter adaptation in HMM-based speech synthesis.
https://doi.org/10.7840/kics.2013.38C.2.213
