• Title/Summary/Keyword: 음성적 거리 (phonetic distance)

Search Result 135, Processing Time 0.029 seconds

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that rely on small-sized speech data still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should contain the natural prosody of the source speech and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect the fine pitch period using laryngograph signals, which is used for prosodic feature extraction. For break strength allocation, four levels of break index are assigned according to pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequence is selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain synthetic speech of high enough quality for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In subjective evaluation, the new Korean corpus-based TTS system shows better naturalness than the conventional demisyllable-based one.
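
The Viterbi selection step mentioned in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify the features or cost terms, so everything here is an assumption: each candidate unit is a hypothetical `(unit_id, start_features, end_features)` tuple, the join cost is the Euclidean distance between the end features of one unit and the start features of the next, and `select_units` is an illustrative name, not the authors' implementation. Every target position is assumed to have at least one candidate.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_units(candidates):
    """Pick one candidate per target position so that the accumulated
    join cost (Euclidean distance at each concatenation point) is minimal.

    candidates[t] is a non-empty list of (unit_id, start_feat, end_feat).
    Returns (total_cost, [unit_id, ...]).
    """
    # cost[j]: best accumulated cost of any path ending in candidate j
    cost = [0.0 for _ in candidates[0]]
    backpointers = []
    for t in range(1, len(candidates)):
        prev, cur = candidates[t - 1], candidates[t]
        new_cost, pointers = [], []
        for _, start_feat, _ in cur:
            # Cost of reaching this candidate from each previous candidate
            joins = [cost[i] + euclidean(prev[i][2], start_feat)
                     for i in range(len(prev))]
            best = min(range(len(prev)), key=joins.__getitem__)
            new_cost.append(joins[best])
            pointers.append(best)
        cost = new_cost
        backpointers.append(pointers)
    # Trace back from the cheapest final candidate
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, [candidates[t][j][0] for t, j in enumerate(path)]
```

The dynamic program keeps only the best path into each candidate, so the work grows linearly with the number of target positions rather than exponentially with the number of candidate combinations.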

A New Objective Speech Quality Measure Over Mobile Communication Using Bark Coherence Function (바크 코히어런스 함수를 이용한 이동 전화 음질 평가)

  • 박상옥;류승균;박영철;윤대희
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.4B / pp.437-446 / 2001
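  • Speech quality can be assessed subjectively or objectively. Subjective assessment reflects the quality that listeners actually perceive, because people listen to and rate the speech directly. However, because many listeners must rate the speech in person, it consumes considerable cost and time. Objective assessment compares the similarity of the original and distorted speech by mathematical computation; it is fast and inexpensive but departs from the quality actually perceived. This paper proposes the BCF (Bark Coherence Function) as an objective speech quality measure. The BCF defines the coherence function in the psychoacoustic domain; compared with existing objective measures, it correlates more highly with subjective quality and requires less computation. Regression analysis on speech data from a CDMA mobile telephone system demonstrated that the BCF correlates more highly with subjective quality than PSQM (Perceptual Speech Quality Measure) and MNB (Measuring Normalizing Block) of the ITU-T standards.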

An Autoregressive Parameter Estimation from Noisy Speech Using the Adaptive Predictor (적응예측기를 이용하여 잡음섞인 음성신호로부터 autoregressive 계수를 추산하는 방법)

  • Koo, Bon-Eung
    • The Journal of the Acoustical Society of Korea / v.14 no.3 / pp.90-96 / 1995
  • A new method for autoregressive parameter estimation from a noisy observation sequence is presented. This method, termed the AP method, results from an attempt to make use of the adaptive predictor, which is a simple and reliable means of parameter estimation. It is shown theoretically that, for noisy input, the parameter vector computed from the prediction sequence is closer to that of the original sequence than the parameter vector of the noisy input sequence is, under the spectral distortion criterion. Simulation results with the Kalman filter as a noise reduction filter and real speech data supported the theory. Roughly speaking, the parameter set obtained by the AP method performs better than the one obtained from the noisy input but worse than the EM iteration results. Given its simplicity, it could provide a useful alternative to more complicated parameter estimation methods in some applications.
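
As a rough illustration of the adaptive predictor this abstract builds on, the sketch below runs a plain LMS one-step linear predictor over a sequence; for an autoregressive input its weights drift toward the AR coefficients. This is a generic LMS predictor under stated assumptions, not the paper's AP method, whose details the abstract does not give. The function name `lms_predictor`, the step size `mu`, and the predictor order are all illustrative choices.

```python
def lms_predictor(x, order=2, mu=0.002):
    """One-step LMS linear predictor.

    Predicts x[n] from the previous `order` samples and adapts the
    weights with the LMS rule. Returns (final_weights, predictions).
    `mu` must be small relative to the input power for stability.
    """
    w = [0.0] * order
    predictions = []
    for n in range(order, len(x)):
        # Most recent sample first: x[n-1], x[n-2], ...
        past = [x[n - k] for k in range(1, order + 1)]
        y = sum(wk * pk for wk, pk in zip(w, past))   # prediction of x[n]
        e = x[n] - y                                  # prediction error
        w = [wk + mu * e * pk for wk, pk in zip(w, past)]
        predictions.append(y)
    return w, predictions
```

For a synthetic sequence generated as `x[n] = 1.2*x[n-1] - 0.5*x[n-2] + noise`, the returned weights should settle near `[1.2, -0.5]` after enough samples; the returned prediction sequence is what a method of the kind described above would analyze further.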

Investigation of the listening environment for lower grade students in elementary school using subjective tests (주관적 평가법을 이용한 초등학교 저학년 교실의 청취환경 조사)

  • Park, Chan-Jae;Haan, Chan-Hoon
    • The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.201-212 / 2021
  • The present study was conducted as a pilot investigation to suggest acoustic performance standards for classrooms suited to listeners whose hearing is not yet fully developed, such as children under 9 years of age. Subjective evaluations, namely a questionnaire and a speech intelligibility test, were conducted with 264 students at two elementary schools in Cheong-ju in order to analyze the characteristics of the listening environment in lower-grade elementary classrooms. The questionnaire investigated the students' satisfaction with the classroom listening environment. Students responded that the most helpful type of information for understanding class content is the teacher's voice. They also rated the current volume of the teacher's voice as normal and its clarity as highly satisfactory. As for the acoustic performance of the classroom, the dominant opinions regarding overall satisfaction with the listening environment were that the noise was normal and the reverberation was very short. Meanwhile, the speech intelligibility test, which used a word list selected for lower-grade elementary students, suggested that for 8-year-olds the longitudinal distance from the sound source is a factor that affects speech recognition.

Clinical Importance of the Resection Margin Distance in Gastric Cancer Patients (위암환자에서 위절제술 시 근위부 절제연거리의 임상적 중요성)

  • Ha, Tae-Kyung;Kwon, Sung-Joon
    • Journal of Gastric Cancer / v.6 no.4 / pp.277-283 / 2006
  • Purpose: It is not known how the resection margin distance in gastric cancer patients who undergo a gastric resection influences the recurrence rate, the pattern of recurrence, and the prognosis according to the characteristics of the tumor. From this point of view, we aim to find a standard for tailor-made treatment by identifying patients who need a more sufficient resection margin. Materials and Methods: A retrospective study was done on 1,472 patients who underwent a gastrectomy due to gastric cancer at our hospital from 1992 to 2005. The median follow-up period was 37 months. Results: There were no significant differences in the recurrence rate, the pattern of recurrence, or the 5-year survival rate between early gastric cancer (EGC) patients with a resection margin distance of less than 2 cm and EGC patients with a resection margin distance of greater than 2 cm. However, significant differences in the survival rate were found in advanced gastric cancer (AGC) patients when the patients were classified into groups with resection margin distances less than or greater than 3 cm (P=0.02). Significant differences were noted especially in cases of diffuse histologic-type tumors located in the lower third of the stomach and in cases with Borrmann type-3 and -4 tumors. Conclusion: The distance between the tumor and the proximal gastric resection margin has no significant influence on the survival rate in EGC patients if the resection margin is negative. However, to improve a patient's survival rate, it is important to guarantee a resection margin of more than 3 cm in AGC patients, especially when the tumor is a diffuse histologic type located in the lower third of the stomach or a Borrmann type 3 or 4.

Visualization of Korean Speech Based on the Distance of Acoustic Features (음성특징의 거리에 기반한 한국어 발음의 시각화)

  • Pok, Gou-Chol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.3 / pp.197-205 / 2020
  • In Korean, the pronunciation of phoneme units such as vowels and consonants is fixed and the pronunciation associated with a notation does not change, so foreign learners can approach the language rather easily. However, when one pronounces words, phrases, or sentences, the pronunciation changes in widely varying and complex ways at syllable boundaries, and the association between notation and pronunciation no longer holds. Consequently, it is very difficult for foreign learners to study standard Korean pronunciation. Despite these difficulties, systematic analysis of pronunciation errors in Korean words is believed to be possible, because the relationship between Korean notation and pronunciation can be described as a set of firm rules without exceptions, unlike in other languages including English. In this paper, we propose a visualization framework that shows the differences between standard and erroneous pronunciations as quantitative measures on the computer screen. Previous studies show only color representations and 3D graphics of speech properties, or an animated view of the changing shapes of the lips and mouth cavity. Moreover, the features used in those analyses are only point data, such as the average over a speech range. In this study, we propose a method that uses the time-series data directly instead of summarized or distorted data. This was realized with a deep learning-based technique that combines a self-organizing map, a variational autoencoder model, and a Markov model, and it achieved a clear performance improvement over the method using point-based data.

A Range Dependent Structural HRTF Model for 3-D Sound Generation in Virtual Environments (가상현실 환경에서의 3차원 사운드 생성을 위한 거리 변화에 따른 구조적 머리전달함수 모델)

  • Lee, Young-Han;Kim, Hong-Kook
    • MALSORI / no.59 / pp.89-99 / 2006
  • This paper proposes a new structural head-related transfer function (HRTF) model to produce sounds in a virtual environment. The proposed HRTF model generates 3-D sounds by using a head model, a pinna model, and the proposed distance model for azimuth, elevation, and distance, respectively, which are the three aspects of 3-D sound. In particular, the proposed distance model consists of a level normalization block, a distal region model, and a proximal region model. To evaluate the performance of the proposed model, we set up an experimental procedure in which each listener identifies the distance of 3-D sound sources that are generated by the proposed method at a predefined distance. The tests show that the proposed model yields an average distance error of 0.13 to 0.31 m when the sound source is generated as if it were 0.5 to 2 m away from the listener. This result is comparable to the average distance error of human listening for an actual sound source.

Design and Implementation of PDA-Based Busan Culture and Tourism Guide System (PDA 기반의 부산경남 문화 관광 안내 시스템의 설계 및 구현)

  • Cha, Jong-Woo;Kim, Hyun-Soo;Ann, Chul-Jun;Cho, Mi-Gyung
    • Proceedings of the Korea Information Processing Society Conference / 2003.11b / pp.837-840 / 2003
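  • Portable computers such as PDAs have the advantage that they can be used anywhere, regardless of location. Exploiting this advantage, this paper develops a portable culture and tourism guide system that lets travelers receive information on culture, tourism, and accommodation in the Busan-Gyeongnam region from anywhere while they travel. The system uses ADOCE (Microsoft ActiveX Data Objects for Windows CE) to connect to a database and provides information on Busan-Gyeongnam culture and tourism, themed travel, accommodation, transportation, and food, not only as plain text but also as video, audio, and map information. It gives travelers the approximate geographic information and travel distance they need when moving from their current location to another. It also lets travelers record travel notes about tourist sites by pen and voice and transfer them to a PC.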

Effects of carbendazim on DNA, gene and chromosome (살균제 carbendazim이 DNA, 유전자 및 염색체에 미치는 영향)

  • Lee, Je-Bong;Sung, Pil-Nam;Jeong, Mi-Hye;Shin, Jin-Sup;Kang, Kyu-Young
    • The Korean Journal of Pesticide Science / v.8 no.4 / pp.288-298 / 2004
  • The benzimidazole pesticide carbendazim, which is effective against a wide range of fungal plant pathogens, is a protective, eradicant, and systemic fungicide. To evaluate the genetic toxicity of carbendazim to DNA, genes, and chromosomes, we performed a chromosome aberration test, a bacterial reverse mutation test, a micronucleus test in mouse bone marrow, and a DNA damage assay by single-cell microgel electrophoresis. Substitution and frameshift mutations were not induced at various concentrations of carbendazim in the Ames test, with or without rat liver microsomal activation. In the chromosome aberration test, numerical changes of chromosomes were detected at concentrations higher than 4.0 μg/mL, but structural aberration was not induced. The positive controls, mitomycin C and captafol, caused structural aberration, but numerical changes of chromosomes did not appear. Carbendazim was negative in the micronucleus test in mouse bone marrow but weakly positive in the DNA damage assay by single-cell microgel electrophoresis, because the DNA migration length increased by 20% relative to the control.

Formant-broadened CMS Using the Log-spectrum Transformed from the Cepstrum (켑스트럼으로부터 변환된 로그 스펙트럼을 이용한 포먼트 평활화 켑스트럴 평균 차감법)

  • 김유진;정혜경;정재호
    • The Journal of the Acoustical Society of Korea / v.21 no.4 / pp.361-373 / 2002
  • In this paper, we propose a channel normalization method to improve the performance of CMS (cepstral mean subtraction), which is widely adopted to normalize channel variation for speech and speaker recognition. CMS, which estimates the channel effect by averaging the long-term cepstrum, has the weakness that the estimated channel is biased by the formants of voiced speech, which carry useful speech information. The proposed Formant-broadened Cepstral Mean Subtraction (FBCMS) is based on two facts: the formants can be found easily in the log spectrum obtained from the cepstrum by Fourier transform, and the formants correspond to the dominant poles of the all-pole model that is usually used to model the vocal tract. The FBCMS evaluates only the poles to be broadened from the log spectrum, without polynomial factorization, and produces a formant-broadened cepstrum by broadening the bandwidths of the formant poles. We can estimate the channel cepstrum effectively by averaging the formant-broadened cepstral coefficients. We performed experiments to compare FBCMS with CMS and PFCMS using 4 simulated telephone channels. In the channel estimation experiment, we evaluated the cepstral distance between the real channel and the estimated channel and found that the mean cepstrum came closer to the channel cepstrum because the bias of the mean cepstrum toward speech was softened. In the text-independent speaker identification experiment, the proposed method was superior to conventional CMS and comparable to pole-filtered CMS. Consequently, we showed that the proposed method can efficiently normalize channel variation, building on conventional CMS.
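
For orientation, the conventional CMS baseline that this abstract starts from can be sketched in a few lines of Python. A time-invariant channel adds a constant vector to every cepstral frame, so subtracting the long-term mean removes it. This sketch covers only plain CMS, not the formant-broadening step of FBCMS, which the abstract does not specify in enough detail to reproduce; the function name and the list-of-lists frame layout are illustrative assumptions.

```python
def cepstral_mean_subtraction(frames):
    """Conventional CMS over a non-empty list of cepstral frames.

    frames: list of equal-length lists of cepstral coefficients.
    Returns (normalized_frames, mean), where mean is the long-term
    average cepstrum taken as the channel estimate.
    """
    n = len(frames)
    dim = len(frames[0])
    # Long-term average of each cepstral coefficient
    mean = [sum(f[k] for f in frames) / n for k in range(dim)]
    # Remove the channel estimate from every frame
    normalized = [[f[k] - mean[k] for k in range(dim)] for f in frames]
    return normalized, mean
```

Because the mean is taken over speech frames, it also absorbs the average speech cepstrum, which is the bias toward formants that the abstract says FBCMS is designed to reduce.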