• Title/Abstract/Keywords: voice color

Search results: 60 (processing time: 0.024 s)

음성합성시스템을 위한 음색제어규칙 연구 (A Study on Voice Color Control Rules for Speech Synthesis System)

  • 김진영;엄기완
    • 음성과학 / Vol. 2 / pp. 25-44 / 1997
  • When listening to the various speech synthesis systems developed and used in our country, we find that although their quality has improved, they lack naturalness. Moreover, since the voice color of each system is limited to a single recorded speech DB, another speech DB must be recorded to create a different voice color. 'Voice color' is an abstract concept that characterizes voice personality, so speech synthesis systems need a voice color control function to create various voices. The aim of this study is to examine several factors of voice color control rules for a text-to-speech system that produces natural and varied voice types in synthetic speech. To find such rules from natural speech, glottal source parameters and the frequency characteristics of the vocal tract were studied for several voice colors. In this paper, voice colors were categorized as deep, sonorous, thick, soft, harsh, high tone, shrill, and weak. The LF-model was used as the voice source model, and formant frequencies, bandwidths, and amplitudes were used for the frequency characteristics of the vocal tract. These acoustic parameters were tested through multiple regression analysis to obtain the general relation between the parameters and voice colors.
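
The multiple regression step mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the two features stand in for acoustic parameters (for example a glottal source parameter and a formant bandwidth), the ratings are synthetic values generated from a known linear rule, and the helper names `solve_linear_system` and `fit_multiple_regression` are hypothetical. It is not the authors' analysis.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(features, ratings):
    """Ordinary least squares via the normal equations.

    Returns [intercept, b1, b2, ...] for rating ~ intercept + b1*x1 + b2*x2 + ...
    """
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic data (not from the paper): rating = 1 + 2*x1 - 0.5*x2.
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
ratings = [1.0 + 2.0 * x1 - 0.5 * x2 for x1, x2 in features]
coefficients = fit_multiple_regression(features, ratings)
```

With real data, each row of `features` would hold the measured acoustic parameters of one utterance and `ratings` a perceptual score for one voice color; the fitted coefficients then suggest which parameters move with that voice color.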

HMM 기반 감정 음성 합성기 개발을 위한 감정 음성 데이터의 음색 유사도 분석 (Analysis of Voice Color Similarity for the development of HMM Based Emotional Text to Speech Synthesis)

  • 민소연;나덕수
    • 한국산학기술학회논문지 / Vol. 15, No. 9 / pp. 5763-5768 / 2014
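  • When a single synthesizer produces both neutral speech without emotion and several kinds of emotional speech, maintaining voice color becomes important. If the synthesizer is built from recordings in which emotion is expressed excessively, voice color is not maintained and each synthesized voice may sound as if it came from a different speaker. In this paper, we analyzed the voice color changes in a speech database constructed to implement an HMM-based speech synthesizer with adjustable emotion levels. Building a speech synthesizer requires recording speech to construct a database, and the recording process is especially important for an emotional speech synthesizer. Because defining emotions and maintaining their levels is very difficult, the recording must be monitored carefully. The speech database consists of neutral speech and emotional speech for Happiness, Sadness, and Anger, with each emotion recorded at two levels, High and Low. To measure the voice color similarity between neutral and emotional speech, the spectra of representative vowels were accumulated to obtain an average spectrum, and F1 (the first formant) was measured from the average spectrum. The voice color similarity between emotional and neutral speech was higher for the Low-level emotional data than for the High-level data, and we confirmed that the proposed method can serve as a way to monitor such voice color changes in emotional speech.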
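
The measurement described above (accumulate vowel spectra, average them, read F1 from the average) can be sketched as follows. This is a simplified illustration under stated assumptions: the frames are synthetic sinusoid mixtures rather than recorded vowels, F1 is approximated as the strongest peak of the average spectrum between 200 and 1000 Hz, and the function names, sample rate, and frame length are hypothetical choices, not details from the paper.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT for bins 0 .. N/2."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def average_spectrum(frames):
    """Accumulate the magnitude spectra of all frames and divide by their count."""
    spectra = [magnitude_spectrum(f) for f in frames]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def estimate_f1(avg_spectrum, sample_rate, frame_length, low_hz=200.0, high_hz=1000.0):
    """Assumed F1 estimate: frequency of the largest bin inside [low_hz, high_hz]."""
    bin_hz = sample_rate / frame_length
    candidates = [k for k in range(len(avg_spectrum)) if low_hz <= k * bin_hz <= high_hz]
    peak = max(candidates, key=lambda k: avg_spectrum[k])
    return peak * bin_hz


SAMPLE_RATE = 8000
FRAME_LENGTH = 256


def synthetic_vowel_frame(phase):
    """Stand-in for a vowel frame: a strong 750 Hz and a weaker 1250 Hz component."""
    return [
        math.sin(2 * math.pi * 750 * t / SAMPLE_RATE + phase)
        + 0.5 * math.sin(2 * math.pi * 1250 * t / SAMPLE_RATE)
        for t in range(FRAME_LENGTH)
    ]


frames = [synthetic_vowel_frame(p) for p in (0.0, 0.7, 1.4)]
f1_hz = estimate_f1(average_spectrum(frames), SAMPLE_RATE, FRAME_LENGTH)
```

Running the same estimate on neutral and on emotional recordings and comparing the two F1 values gives one simple way to track how far an emotional recording drifts from the neutral voice.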

포먼트 이동과 스펙트럼 기울기의 변환을 이용한 음색 변환 (Voice Color Conversion Based on the Formants and Spectrum Tilt Modification)

  • 손성용;한민수
    • 대한음성학회지:말소리 / No. 45 / pp. 63-77 / 2003
  • The purpose of voice color conversion is to change the speaker identity perceived from the speech signal. In this paper, we propose a new voice color conversion algorithm based on formant shifting and spectrum-tilt modification in the frequency domain. The basic idea is to move the source formants to the positions of the target speaker's formants through interpolation and decimation, and to modify the spectrum tilt using the spectral envelopes of both speakers. The LPC spectrum is used to estimate the formant positions and the spectrum tilt. Because it modifies speech waveforms directly in the frequency domain, the algorithm converts the speaker identity rather successfully while maintaining good speech quality.
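
One way to picture formant shifting through interpolation and decimation is a piecewise-linear warp of the spectral envelope: each segment between neighbouring formants is stretched or compressed so that the source formant bins land on the target formant bins. The sketch below is only a reading of that idea under assumptions, not the authors' algorithm; the function `warp_envelope`, the toy envelope, and the formant bin positions are all hypothetical, and formant bins are assumed to be strictly increasing.

```python
def warp_envelope(envelope, source_formants, target_formants):
    """Resample an envelope so that source formant bins move to target formant bins.

    Segments between anchor points are stretched (interpolation) or
    compressed (decimation) with linear interpolation between bins.
    """
    last = len(envelope) - 1
    src = [0] + list(source_formants) + [last]
    tgt = [0] + list(target_formants) + [last]
    warped = []
    for k in range(len(envelope)):
        # Locate the target-axis segment that contains output bin k.
        for i in range(len(tgt) - 1):
            if tgt[i] <= k <= tgt[i + 1]:
                break
        fraction = (k - tgt[i]) / (tgt[i + 1] - tgt[i])
        position = src[i] + fraction * (src[i + 1] - src[i])  # matching source bin
        low = int(position)
        high = min(low + 1, last)
        weight = position - low
        warped.append((1.0 - weight) * envelope[low] + weight * envelope[high])
    return warped


# Toy envelope with two triangular "formant" peaks at bins 10 and 30.
envelope = [max(0.0, 5.0 - abs(k - 10)) + max(0.0, 5.0 - abs(k - 30)) for k in range(64)]
warped = warp_envelope(envelope, source_formants=[10, 30], target_formants=[12, 34])
```

In a full system the envelope would come from an LPC analysis and the spectrum tilt would be adjusted as a separate step; both are outside this sketch.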

음성 분석을 통한 최근 보이스피싱의 음성 특징 규명 (Identification of Voice Features for Recently Voice Fishing by Voice Analysis)

  • 이범주;조동욱;정연만
    • 한국통신학회논문지 / Vol. 41, No. 10 / pp. 1276-1283 / 2016
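  • Despite national and social efforts to prevent harm, the scale of financial damage caused by voice phishing has hardly decreased. This is because recent voice phishing uses polished speech and technical terminology, which makes the criminal difficult to recognize. Furthermore, criminals obtain personal information and use the specialized terminology that staff at actual government offices use in the field, and they concentrate on deceiving young people. As a result, victims are increasing sharply among people in their 20s and 30s, who have better judgment than the elderly. To address this, in this paper we compared and analyzed audio from actual voice phishing calls against the voices of ordinary young people of the same age as the voice phishing criminals reading the same sentences. The experiment examined the vocal characteristics of criminals who committed voice phishing from 2011 to the present, based on pitch and its bandwidth, the energy carried in the voice, speech rate, voice color, and other features. The results showed meaningful differences in the energy carried in the voice and in speech rate.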

전통적 벨칸토 발성훈련법에 기초한 음성전문직업인 발성훈련의 표준화 (Standardization Voice Training Method for Professional Voice User Based on Traditional Bel Canto)

  • 김철준
    • 대한후두음성언어의학회지 / Vol. 28, No. 1 / pp. 17-19 / 2017
  • Opera singers train their vocal organs to achieve a good vocal timbre. They train repeatedly to gain strong resonance, a wide vocal range, a homogeneous voice color, and a voice that carries far, and to avoid vocal disorders. This article analyzes this training from a scientific and medical perspective, which could bring us closer to the secret of a great art with 400 years of history: bel canto. Furthermore, standardizing a voice training method based on bel canto will facilitate the training, therapy, and care of professional voice users and patients with voice disorders.

임신돈의 분만 감시 및 예측 시스템 개발 (Development of a Monitoring and Forecasting System for the Delivery of Pregnant Sow)

  • 임영일
    • 한국축산시설환경학회지 / Vol. 6, No. 1 / pp. 15-22 / 2000
  • A monitoring and forecasting system for swine delivery was developed using a CCD camera, a multi-function board, a microphone, and a data recorder installed on a personal computer. Four monitoring and forecasting factors were selected: genitalia, body shape, breast color, and sound. Images of the physical variation in body shape, the shape and color of the genitalia area, and the breast color of the pregnant sow were captured using the CCD color camera and multi-function board, and variation in the sow's voice was acquired using the microphone and data recorder. The acquired image and voice information was analyzed using a custom-developed algorithm and program. The forecasting efficiency for swine delivery was 89%, 71%, and 100% using the variation in the genitalia area, the body shape, and the voice of the pregnant sow, respectively. The efficiency of image processing for delivery detection was 100% once half of the piglet's body had emerged from the genitalia of the pregnant sow. The monitoring and forecasting system informed the farm manager of the estimated delivery time immediately if the estimated time matched or was earlier than the time set by the farm manager and the system detected the delivery.

음원 모델에 기초한 합성음의 피치 조절 (Pitch Modification based on a Voice Source Model)

  • 최용진;여수진;김진영;성굉모
    • 음성과학 / Vol. 3 / pp. 132-147 / 1998
  • Previously developed pitch modification methods have not been based on a voice source model. As a result, the synthesized speech often sounds unnatural even though it may be highly intelligible. The purpose of this paper is to analyze how the voice source signal changes with pitch period and to establish a pitch modification rule based on that analysis. Using the excitation waveform, we examine how the intervals of the closing phase, closed phase, and open phase change as the pitch increases. Compared with previous methods, which operate directly on the speech signal, the pitch modification method based on a voice source model shows high intelligibility and naturalness. The proposed method is therefore expected to provide high-quality synthetic speech, and this study may also benefit speaker identification and voice color conversion.

Traffic Signal Recognition System Based on Color and Time for Visually Impaired

  • P. Kamakshi
    • International Journal of Computer Science & Network Security / Vol. 23, No. 4 / pp. 48-54 / 2023
  • Nowadays, blind people find it very difficult to cross roads and must be vigilant with every step they take. To address this problem, Convolutional Neural Networks (CNNs) are a well-suited method for analysing the data and automating the model without human intervention. In this work, a traffic signal recognition system for the visually impaired is designed using a CNN. To provide a safe walking environment, a voice message is given according to the light state and timer state at that instant. The developed model consists of two phases. In the first phase, the CNN model is trained to classify different images captured from traffic signals. The labelled Common Objects in Context (COCO) dataset is used, which includes images of different classes such as traffic lights, bicycles, and cars. The traffic light object is detected using this labelled dataset with the help of an object detection model. The CNN model detects the color of the traffic light and the timer displayed in the traffic image. In the second phase, a text message is generated from the detected light color and timer value and sent to the text-to-speech conversion model to produce voice guidance for the blind person. The developed model recognizes the traffic light color and the countdown timer displayed on the signal for safe crossing. Existing models did not consider the countdown timer, which is very useful. The proposed model gave accurate results in different scenarios when compared with other models.
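
The second phase described above, turning a detected light color and countdown value into text for a text-to-speech model, might look roughly like the sketch below. The function name `guidance_message`, the message wording, and the `min_crossing_seconds` threshold are all hypothetical; the abstract does not specify the message format or any threshold.

```python
def guidance_message(light_color, seconds_remaining, min_crossing_seconds=10):
    """Build the text that would be handed to a text-to-speech model.

    min_crossing_seconds is an assumed safety margin, not a value from the paper.
    """
    color = light_color.strip().lower()
    if color == "green" and seconds_remaining >= min_crossing_seconds:
        advice = "You may cross."
    elif color == "green":
        advice = "Not enough time left, please wait for the next green."
    elif color in ("red", "yellow"):
        advice = "Please wait."
    else:
        # Unknown detector output: fall back to the cautious instruction.
        return "The signal could not be recognized. Please wait."
    return f"The light is {color} with {seconds_remaining} seconds remaining. {advice}"


message = guidance_message("Green", 12)
```

Using the countdown value in the decision, rather than only reading it aloud, is one possible way to exploit the timer that the abstract highlights.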

남녀 음성 변환 기술연구 (A Study On Male-To-Female Voice Conversion)

  • 최정규;김재민;한민수
    • 한국음향학회:학술대회논문집 / Proceedings of the Acoustical Society of Korea 2000 Summer Conference, Vol. 19, No. 1 / pp. 115-118 / 2000
  • Voice conversion technology is essential for TTS systems because constructing a speech database takes much effort. In this paper, male-to-female voice conversion in a Korean LPC TTS system is studied. In general, the parameters for voice color conversion are categorized into acoustic and prosodic parameters. This paper adopts LSF (Line Spectral Frequency) as the acoustic parameter, and pitch period and duration as the prosodic parameters. For the conversion, the pitch period is halved, the duration is shortened by 25%, and the LSFs are shifted linearly. The synthesized speech is then post-filtered by a bandpass filter. The proposed algorithm is simpler than other algorithms, for example VQ and neural-network-based methods, and does not even require estimating formant information. The MOS (Mean Opinion Score) test shows 2.25 for naturalness and 3.2 for female closeness. In conclusion, with the proposed algorithm, a male-to-female voice conversion system can be implemented simply with relatively successful results.
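
The parameter changes listed in this abstract can be written down as a small sketch. The halving of the pitch period and the 25% shorter duration come from the abstract; the form of the linear LSF shift does not, so `lsf_scale` and `lsf_offset` below are hypothetical placeholders, as is the function name `male_to_female`. LSFs are assumed to be in radians within (0, pi), and the bandpass post-filter is omitted.

```python
import math


def male_to_female(pitch_period_ms, duration_ms, lsfs, lsf_scale=1.1, lsf_offset=0.0):
    """Apply the prosodic and acoustic changes to one set of parameters."""
    new_pitch_period = pitch_period_ms * 0.5  # pitch period halved (pitch doubles)
    new_duration = duration_ms * 0.75         # duration shortened by 25%
    upper = math.pi - 1e-3                    # keep LSFs below pi
    # Assumed linear shift f' = scale * f + offset; the paper's values are not given.
    # Clipping can collapse the ordering of very high LSFs; a real system must check this.
    new_lsfs = [min(lsf_scale * f + lsf_offset, upper) for f in lsfs]
    return new_pitch_period, new_duration, new_lsfs


period, duration, shifted_lsfs = male_to_female(8.0, 200.0, [0.3, 0.6, 1.2, 2.0])
```

Because every step is a fixed scaling of existing parameters, no formant estimation is needed, which matches the simplicity the abstract claims for the approach.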

음성검사 중 공기역학적 검사에서 한국인의 정상 평균치 (Mean Value of Aerodynamic Study in Normal Korean)

  • 서장수;송시연;권오철;김준우;이희경;정옥란
    • 대한후두음성언어의학회지 / Vol. 8, No. 1 / pp. 27-32 / 1997
  • Recently, many people suffering from voice color change have been visiting otolaryngologists. However, there are no specific data with which voice color change in Koreans can be evaluated objectively. In this aerodynamic study, maximum phonation time, mean air flow rate, phonatory flow volume, and subglottal pressure were measured in Koreans using the Aerophone II voice function analyzer. A total of 112 males and 122 females aged 10 to 69 years were randomly selected. Maximum phonation time was 20.8 ± 6.4 sec in males and 17.2 ± 4.1 sec in females. Mean air flow rate was 167.1 ± 61.4 ml/sec in males and 129.6 ± 49.3 ml/sec in females. Phonatory flow volume was 3184.5 ± 646.0 ml in males and 2122.1 ± 670.5 ml in females. Subglottal pressure was 4.1 ± 1.8 cmH2O in males and 3.5 ± 1.4 cmH2O in females. There was no statistically significant difference among age groups in any of the above results.
