• Title/Summary/Keyword: 음성적 거리 (phonetic distance)


Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.3
    • /
    • pp.150-159
    • /
    • 2005
  • This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's. In the proposed method, a person's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. Maximum Likelihood (ML) estimation is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate serve as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method in objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In the subjective test, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained, owing to spectral contours that evolve smoothly over time.
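The MMSE mapping described in this abstract can be written out as a worked formula. The abstract does not give the exact parameterization, so what follows is a sketch of the standard GMM-based conversion function and a guess at the general shape of the method. All symbols are assumptions introduced here: $x$ is a source LPC cepstrum vector, $\alpha_i, \mu_i, \Sigma_i$ are the weight, mean, and covariance of the $i$-th source mixture component, and $\nu_i, \Gamma_i$ are the target-side mean and cross-covariance terms estimated from aligned training data. The last line sketches the ratio-of-averages prosody rule for the pitch period $T$, with $\bar{T}_{\mathrm{src}}$ and $\bar{T}_{\mathrm{tgt}}$ the speakers' average pitch periods.

```latex
\hat{y} = E[\,y \mid x\,]
        = \sum_{i=1}^{M} P(i \mid x)\,\bigl[\nu_i + \Gamma_i \Sigma_i^{-1}(x - \mu_i)\bigr],
\qquad
P(i \mid x) = \frac{\alpha_i\,\mathcal{N}(x;\mu_i,\Sigma_i)}
                   {\sum_{j=1}^{M} \alpha_j\,\mathcal{N}(x;\mu_j,\Sigma_j)}

\hat{T} = T \cdot \frac{\bar{T}_{\mathrm{tgt}}}{\bar{T}_{\mathrm{src}}}
```

Because the posterior weights $P(i \mid x)$ vary continuously with $x$, the converted cepstra change smoothly from frame to frame, which is consistent with the smooth spectral contours the abstract reports.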

Wireless Speech Recognition System using Psychoacoustic Model (심리음향 모델을 이용한 무선 음성인식 시스템)

  • Noh, Jin-Soo;Rhee, Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.6 s.312
    • /
    • pp.110-116
    • /
    • 2006
  • In this paper, we implement a speech recognition system to support ubiquitous sensor network application services, such as switch control and authentication, using wireless audio sensors. The proposed system consists of a wireless audio sensor, a speech recognition algorithm using a psychoacoustic model, and LDPC (low-density parity-check) coding for error correction. The speech recognition system is installed on a host PC to use the sensor energy effectively, and an FEC (forward error correction) system is used to improve the accuracy of speech recognition. We also optimized the simulation coefficients and test environment to effectively remove wireless channel noise and correct wireless channel errors. As a result, when the distance between the sensor and the voice source is less than 1.0 m, the FAR and FRR are 0.126% and 7.5%, respectively.
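For readers unfamiliar with the two error rates quoted above, the following minimal sketch shows how a false acceptance rate (FAR) and false rejection rate (FRR) are conventionally computed from labeled trials. The function name `far_frr` and the trial format are hypothetical, chosen for illustration; the abstract does not describe the authors' evaluation code.

```python
def far_frr(trials):
    """Compute (FAR, FRR) from an iterable of (is_genuine, accepted) pairs.

    FAR = accepted impostor attempts / all impostor attempts
    FRR = rejected genuine attempts / all genuine attempts
    """
    trials = list(trials)
    impostor = [accepted for is_genuine, accepted in trials if not is_genuine]
    genuine = [accepted for is_genuine, accepted in trials if is_genuine]
    far = sum(impostor) / len(impostor)
    frr = sum(not accepted for accepted in genuine) / len(genuine)
    return far, frr


# Hypothetical data: 10 genuine attempts (1 rejected), 20 impostor attempts (1 accepted).
example = ([(True, True)] * 9 + [(True, False)]
           + [(False, False)] * 19 + [(False, True)])
far, frr = far_frr(example)
```

Both rates depend on the decision threshold, so they are normally reported together, as in the abstract.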

A Study on Augmented Driving System (ADS) Technology Development for Useful Driving Information (운전자 정보 극대화를 위한 Augmented Driving System (ADS) 기술에 관한 연구)

  • Yang, Seung-Hun;Kim, Dong-Joong;Kim, Han-Ul;Lee, Su-Min;Hwang, Ji-Hwan;Kim, Byung-Gyu
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2012.04a
    • /
    • pp.836-839
    • /
    • 2012
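  • To ensure driver safety, this paper proposes a next-generation interface technology that detects road information using image processing and reports it to the driver, and that replaces physical interfaces requiring buttons to be pressed by hand. The proposed technology detects the distance to the vehicle ahead, lanes, and traffic signs by applying the proposed algorithm to video from a single camera. It also provides an interface that combines speech recognition and gesture recognition, based on a camera watching the vehicle interior and a microphone that picks up the driver's voice. In real-world tests of the developed technology, the recognition rates for traffic sign recognition, lane detection, and detection of the distance to the vehicle ahead were about 90% or higher. These technical elements are expected to greatly reduce the probability of traffic accidents by giving the driver appropriate information even in situations the driver fails to notice.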

Nasal Consonants Recognition Based on the Perceptual Representation (지각적 표현에 기초한 비음 인식에 관한 연구)

  • Kim, Ki-Chul;Cho, Jung-Wan
    • Annual Conference on Human and Language Technology
    • /
    • 1989.10a
    • /
    • pp.120-125
    • /
    • 1989
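  • A speech signal carries information from many sources besides linguistic information, so it is difficult to accurately detect segments that correspond one-to-one with letters. This study proposes a binary spectrum that emphasizes the peaks of the linear predictive coding (LPC) spectrum and, based on it, extracts acoustic features by integrating the steady and transition regions of a sound. The features of each region are obtained by accumulating binary spectra, and the integrated feature is expressed as a relational feature that combines the features of each region. Using the trajectory of the second formant frequency as the relational feature to distinguish bilabial nasals from alveolar nasals gave recognition results that were relatively independent of vowel context and speaker. To examine whether the binary spectrum retains the information contained in the original spectrum, recognition experiments were also run with the same distance measure; the binary spectrum actually performed better, and the relational binary spectrum varied even less across speakers. Noisy speech was created by adding Gaussian white noise to the speech, and experiments with the same method gave similar recognition results, confirming the validity of the proposed binary spectrum.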


Retrieving English Words with a Spoken Word Transliteration (입말 표기를 이용한 영어 단어 검색)

  • Kim Ji-Seoung;Kim Kwang-Hyun;Lee Joon-Ho
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.39 no.3
    • /
    • pp.93-103
    • /
    • 2005
  • Users searching an Internet English dictionary sometimes do not know the correct spelling of the word they have in mind and remember only its pronunciation. To help these users, we propose a method for retrieving English words effectively with a spoken word transliteration, that is, a Korean transliteration of the English word's pronunciation. We develop KONIX codes and transform both the spoken word transliteration and the English words into them. We then calculate the phonetic similarity between KONIX codes using edit distance and 2-gram methods. Experimental results show that the proposed method is very effective for retrieving English words with a spoken word transliteration.
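The two similarity measures named in this abstract can be sketched as follows. The KONIX encoding itself is specific to the paper and is not reproduced here; the functions below operate on arbitrary code strings. The Dice coefficient for 2-gram overlap is an assumption, since the abstract does not say which 2-gram score the authors use.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    """Set of adjacent character pairs (2-grams) in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over 2-gram sets (assumed scoring, see lead-in)."""
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))
```

A retrieval system along these lines would encode the query and every dictionary entry, then rank entries by low edit distance or high 2-gram similarity.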

Environment Adaptation by Discriminative Noise Adaptive Training Methods (잡음적응 변별학습 방식을 이용한 환경적응)

  • Kang, Byung-Ok;Jung, Ho-Young;Lee, Yun-Keun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.397-398
    • /
    • 2007
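  • This paper proposes an environment adaptation method that combines noise adaptive training with discriminative training, aimed at speech recognition systems that operate robustly under changing environments. Noise adaptive training, which combines multi-condition training with noise reduction, has two limitations: it is removed from the minimum classification error (MCE) objective of speech recognition, and it is not practical to reflect every environment in which a speech recognition system is used. We therefore propose a training method based on noise adaptive discriminative training that compensates for these shortcomings: a baseline acoustic model trained with noise adaptive training is converted into an environment-adapted model through discriminative training on a small amount of data collected in the target environment.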


Variable Time-Scale Modification of Speech Using Transient Information based on LPC Cepstral Distance (LPC 켑스트럼 거리 기반의 천이구간 정보를 이용한 음성의 가변적인 시간축 변환)

  • Lee, Sung-Joo;Kim, Hee-Dong;Kim, Hyung-Soon
    • Speech Sciences
    • /
    • v.3
    • /
    • pp.167-176
    • /
    • 1998
  • Conventional time-scale modification methods have the problem that, as the modification rate gets higher, the time-scale modified speech signal becomes less intelligible, because they ignore the effect of articulation rate on speech characteristics. Research on speech perception shows that the timing information of the transient portions of a speech signal plays an important role in discriminating among different speech sounds. Inspired by this fact, we propose a novel scheme for modifying the time-scale of speech. In the proposed scheme, the timing information of the transient portions of speech is preserved, while the steady portions are compressed or expanded somewhat excessively to maintain the overall time-scale change. To identify the transient and steady portions of a speech signal, we employ a simple method using the LPC cepstral distance between neighboring frames. The results of a subjective preference test indicate that the proposed method performs better than the conventional SOLA method, especially for very fast playback.
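The transient/steady labeling step described above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the Euclidean form of the cepstral distance, the fixed threshold, and the function names are all assumptions, and each frame is assumed to be already represented by a list of LPC cepstral coefficients.

```python
import math


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def label_frames(cepstra, threshold):
    """Label each frame 'transient' or 'steady' from the distance to its predecessor.

    The first frame has no predecessor and is labeled 'steady' by convention.
    """
    if not cepstra:
        return []
    labels = ["steady"]
    for prev, curr in zip(cepstra, cepstra[1:]):
        is_transient = cepstral_distance(prev, curr) > threshold
        labels.append("transient" if is_transient else "steady")
    return labels


# Hypothetical 2-coefficient frames: a jump occurs between the 2nd and 3rd frame.
frames = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 1.0]]
labels = label_frames(frames, threshold=0.5)
```

A time-scale modifier could then leave the frames labeled `transient` at their original rate and apply a stronger compression or expansion factor to the `steady` frames so that the overall duration target is still met.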


Voice Conversion using Generative Adversarial Nets conditioned by Phonetic Posterior Grams (Phonetic Posterior Grams에 의해 조건화된 적대적 생성 신경망을 사용한 음성 변환 시스템)

  • Lim, Jin-su;Kang, Cheon-seong;Kim, Dong-Ha;Kim, Kyung-sup
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.369-372
    • /
    • 2018
  • This paper proposes a non-parallel voice conversion network that converts voices between unpaired source and target voices. Conventional voice conversion studies used learning methods that minimize the distance error between spectrograms. These studies not only lose spectrogram resolution, because the methods average pixels, but also rely on parallel data, which is hard to collect. This research uses PPGs, which are phonetic data of the input voice, and a GAN learning method to generate clearer voices. To evaluate the proposed method, we conduct a MOS test against a GMM-based model. We found that performance improves compared with the conventional methods.


Study on User Experience design in Gesture Interaction as a Product Trigger - Focusing on Product Design - (제품 트리거로서 행동인식의 사용자 경험 디자인 연구 - 제품디자인을 중심으로 -)

  • Min, Sae-yan;Lee, Cathy Yeonchoo
    • Journal of Digital Convergence
    • /
    • v.17 no.5
    • /
    • pp.379-384
    • /
    • 2019
  • The purpose of this study is to investigate the problems of the rapidly spreading voice interface, to find out what results are obtained when a new gesture interaction is applied to a product, and to suggest improvements for a better user experience. Through a literature review, the study theoretically examined the changes in product interfaces and the differences between them, and then conducted in-depth interviews with 20-30 users who used voice recognition as a product trigger. It concluded that declining confidence in accuracy leads to a lower preference for voice recognition interactions and to a need for an appropriate interface with respect to the functional aspect of non-relevancy in physical distance as a product trigger. This study is meaningful in that it identified a problem in the study of product trigger interfaces and suggested improvement measures, and it is hoped to be helpful in follow-up studies.

A new Implementation of Perceptual LPC Cepstrum and its Application to Speech Recognition (인지 LPC cepstrum의 새로운 구현 및 음성인식에의 적용)

  • Kim, Jin-Young;Choi, Seong-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.5
    • /
    • pp.61-64
    • /
    • 1996
  • To improve the performance of a recognition system, namely the recognition rate, we propose a new implementation of perceptual distance using the LPC cepstrum (perceptual cepstrum, PLC). The PLC is calculated by convolving a conventional LPC cepstrum with a perceptual lifter (PL). To calculate the PL, we define a new weighting function in the linear frequency domain that takes the characteristics of the Bark frequency scale into account. The PL is the inverse Fourier transform of the exponents of the weighting function. We verified our method through speech recognition experiments, in which the performance of the PLC was compared with that of the raised sine liftering method.
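As background for the baseline mentioned in this abstract, the sketch below shows raised sine liftering in its commonly used form, w(n) = 1 + (L/2) sin(πn/L) for n = 1..L, applied as a per-coefficient weight on the cepstrum. It illustrates only the comparison method; the paper's perceptual lifter is not specified in the abstract and is not reproduced here. The function names and the choice of L equal to the number of coefficients are assumptions.

```python
import math


def raised_sine_lifter(num_coeffs):
    """Weights w(n) = 1 + (L/2) * sin(pi * n / L) for n = 1..L, with L = num_coeffs."""
    L = num_coeffs
    return [1.0 + (L / 2.0) * math.sin(math.pi * n / L) for n in range(1, L + 1)]


def lifter_cepstrum(cepstrum):
    """Apply the raised sine lifter to cepstral coefficients c(1)..c(L)."""
    weights = raised_sine_lifter(len(cepstrum))
    return [w * c for w, c in zip(weights, cepstrum)]


# Hypothetical 12-coefficient cepstrum of all ones, so the output equals the weights.
liftered = lifter_cepstrum([1.0] * 12)
```

The weighting peaks at the middle coefficient and de-emphasizes the lowest and highest quefrencies, which are the most sensitive to spectral tilt and to noise, respectively.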
