• Title/Summary/Keyword: speech effort


Packet Loss Concealment Algorithm Based on Speech Characteristics (음성신호의 특성을 고려한 패킷 손실 은닉 알고리즘)

  • Yoon Sung-Wan;Kang Hong-Goo;Youn Dae-Hee
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.31 no.7C
    • /
    • pp.691-699
    • /
    • 2006
  • Despite in-depth efforts to control variability in IP networks, quality of service (QoS) is still not guaranteed. Thus, it is necessary to deal with the audible artifacts caused by packet losses. To overcome the packet loss problem, most speech coding standards have their own embedded packet loss concealment (PLC) algorithms, which adopt extrapolation methods that exploit the dependency on adjacent frames. However, since many low-bit-rate CELP coders use predictive schemes to increase coding efficiency, error propagation occurs even if a single packet is lost. In this paper, we propose an efficient PLC algorithm that takes into account the speech characteristics of lost frames. To design it, we perform several experiments investigating the error propagation effect of lost frames in a predictive coder. We then summarize the impact of packet loss on the speech characteristics and analyze the importance of the encoded parameters for each speech class. Based on the experimental results, we propose a new PLC algorithm that mainly focuses on reducing the error propagation time. Experimental results show that its performance is much higher than that of conventional extrapolation methods over various frame erasure rate (FER) conditions. The difference is especially remarkable under high-FER conditions.

A Study On Male-To-Female Voice Conversion (남녀 음성 변환 기술연구)

  • Choi Jung-Kyu;Kim Jae-Min;Han Min-Su
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.115-118
    • /
    • 2000
  • Voice conversion technology is essential for TTS systems because constructing a speech database takes much effort. In this paper, male-to-female voice conversion technology in a Korean LPC TTS system is studied. In general, the parameters for voice color conversion are categorized into acoustic and prosodic parameters. This paper adopts LSF (Line Spectral Frequency) as the acoustic parameter, and pitch period and duration as the prosodic parameters. For the voice conversion, the pitch period is halved, the duration is shortened by 25%, and the LSFs are shifted linearly. The synthesized speech is then post-filtered by a bandpass filter. The proposed algorithm is simpler than other algorithms, for example VQ- and neural-net-based methods, and it does not even need to estimate formant information. The MOS (Mean Opinion Score) test shows 2.25 for naturalness and 3.2 for female closeness. In conclusion, by using the proposed algorithm, a male-to-female voice conversion system can be implemented simply with relatively successful results.
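
The parameter changes named in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the units, and in particular the reading of "shifted linearly" as a constant additive offset (`lsf_shift_hz`) are assumptions made only for illustration, and the bandpass post-filter is omitted.

```python
def convert_male_to_female(pitch_period_ms, duration_ms, lsfs_hz, lsf_shift_hz):
    """Apply the three parameter changes described in the abstract.

    Assumptions (not from the source): millisecond/Hz units, and a
    constant additive LSF offset as the meaning of "shifted linearly".
    """
    new_pitch_period = pitch_period_ms * 0.5   # pitch period is halved
    new_duration = duration_ms * 0.75          # duration shortened by 25%
    new_lsfs = [f + lsf_shift_hz for f in lsfs_hz]  # hypothetical linear shift
    return new_pitch_period, new_duration, new_lsfs


# Example with made-up input values.
result = convert_male_to_female(10.0, 200.0, [300.0, 1200.0], 100.0)
```

Halving the pitch period doubles the fundamental frequency, which is why the same operation raises the perceived pitch by one octave.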


Phonetic Factors Conditioning the Release of English Sentence-Final Stops (영어 문장 말 폐쇄음의 파열 양상)

  • Kim, Da-Hee
    • MALSORI
    • /
    • no.53
    • /
    • pp.1-16
    • /
    • 2005
  • This experimental study tests the hypothesis that the occurrence of English sentence-final stop release is at least partly predictable from its phonetic context. Ten native speakers of American English (5 male and 5 female) recorded, in a sound-proof booth, sentences excerpted from novels and from natural documents on the World Wide Web. Based on the waveforms and spectrograms of the recorded sentences, judgments about the release of each sentence-final stop were made. If the aperiodic energy of a given final stop lasted more than 0.015 seconds, it was considered to be "released." The results reveal that English sentence-final stops tend to be released when they are 1) velar consonants, 2) preceded by tense vowels, and 3) coda consonants of content words. The phonetic environment in which final stops are often released can be characterized by articulatory ease and by the need for release burst noise, without which the final stops may not be correctly perceived. By examining the release of English final stops, it is concluded that phonological events that had been considered to occur rather "randomly" in fact reflect a universal tendency of human speech: to minimize the speakers' and hearers' effort.
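
The release criterion stated in the abstract above is a simple duration threshold. A minimal sketch follows; the function and constant names are hypothetical, and it assumes the duration of aperiodic energy has already been measured from the waveform or spectrogram.

```python
RELEASE_THRESHOLD_S = 0.015  # threshold stated in the abstract, in seconds


def is_released(aperiodic_duration_s):
    """Return True if the final stop counts as "released".

    The abstract says "more than" 0.015 s, so the comparison is strict.
    """
    return aperiodic_duration_s > RELEASE_THRESHOLD_S


# Example with made-up measurements (in seconds).
labels = [is_released(d) for d in (0.020, 0.015, 0.004)]
```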


Acoustic Characteristics of Korean Compounds and Phrases (한국어 복합어와 구의 음향 음성학적 특성)

  • Yi, So-Pae
    • Phonetics and Speech Sciences
    • /
    • v.4 no.1
    • /
    • pp.49-54
    • /
    • 2012
  • Recent studies on acoustic correlates of stress in English compounds and phrases have revealed that their acoustic manifestations change differently under different intonation patterns. However, little effort has been made to compare Korean compounds and Korean phrases in different intonational environments. Therefore, this study analyzes the acoustic characteristics of Korean compounds and Korean phrases produced in different intonational sentence patterns (Subject, Question, Clause-Final, and Statement-Final). Measurements of vowel duration, intensity (dB), and pitch (in semitones) were compared. In the experiment, 30 native speakers of Korean pronounced Korean compounds and Korean phrases (obtained from $8 \times 30$ sentences) in controlled prosodic and intonational environments. The results reveal clear patterns that distinguish Korean compounds from Korean phrases and support the evidence of acoustic salience for phrases. Duration differences turned out to be a significant cue for distinguishing Korean compounds from Korean phrases in all but the Clause-Final position. According to the effect sizes, the duration ratio is the most reliable cue for distinguishing Korean compounds from Korean phrases, followed by the pitch difference between the first and second syllables and then the intensity ratio. Implications for Korean and English intonation training are also discussed.
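
Since the abstract above reports pitch in semitones, the following sketch shows the standard conversion of a frequency ratio to semitones, 12 · log2(f2 / f1). The function name and the example frequencies are illustrative assumptions; the abstract does not state how the study computed its pitch values.

```python
import math


def semitone_difference(f1_hz, f2_hz):
    """Pitch difference from f1 to f2 in semitones (12 semitones per octave)."""
    return 12.0 * math.log2(f2_hz / f1_hz)


# Example with made-up values: a doubling of frequency is one octave.
octave = semitone_difference(100.0, 200.0)
```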

Implementation of interactive Stock Trading System Using VoiceXML

  • Shin Jeong-Hoon;Cho Chang-Su;Hong Kwang-Seok
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.387-390
    • /
    • 2004
  • In this paper, we design and implement a practical application service using VoiceXML, and, based on that implementation, we suggest new solutions to problems that can occur when implementing new systems with VoiceXML. Until now, speech-related services have been developed using APIs (Application Program Interfaces) and programming languages, methods that depend on the system architecture. As a result, reusing contents and resources was very difficult. To solve these problems, companies now develop their applications using VoiceXML. The advantages of using VoiceXML when developing services are as follows. First, we can use web development technologies and technologies for transmitting web contents. Second, we can save the labor of low-level programming in languages such as C or assembly. Third, we can save the labor of managing resources. As a result of these advantages, we can reduce the development time of application services and solve the problem of compatibility between systems. However, the actual problems that can occur when implementing services with VoiceXML are poorly understood. To address this, we implemented an interactive stock trading system using VoiceXML and concentrated our effort on finding the problems that arise when using VoiceXML. We then proposed solutions to these problems and analyzed the strengths and weaknesses of the suggested system.


Learning Conversation in Conversational Agent Using Knowledge Acquisition based on Speech-act Templates and Sentence Generation with Genetic Programming (화행별 템플릿 기반의 지식획득 기법과 유전자 프로그래밍을 이용한 문장 생성 기법을 통한 대화형 에이전트의 대화 학습)

  • Lim Sungsoo;Hong Jin-Hyuk;Cho Sung-Bae
    • Korean Journal of Cognitive Science
    • /
    • v.16 no.4
    • /
    • pp.351-368
    • /
    • 2005
  • The manual construction of a knowledge base takes much time and effort, and it is hard to adjust intelligent systems to dynamic and flexible environments. Thus, mental development in such systems has been investigated in recent years. Autonomous mental development is a new paradigm for developing autonomous machines that are adaptive and flexible to the environment. Learning conversation, a kind of mental development, is an important aspect of conversational agents. In this paper, we propose a learning conversation method for conversational agents that uses two promising techniques: speech-act templates and genetic programming. Knowledge acquisition of conversational agents is implemented by finite state machines and templates, and dynamic sentence generation is implemented by genetic programming. Several illustrations and usability tests show the usefulness of the proposed method.


An Experimental Study of Comfortable Pitch and Loudness with Target Matching: Effects on Electroglottographic and Acoustic Measures

  • Choi, Seong Hee
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.139-146
    • /
    • 2012
  • This study was designed to examine comfort levels of pitch and loudness with target matching and their effects on electroglottographic (EGG) and acoustic measures. Twelve speakers, six males and six females, were instructed to produce the sustained vowel /a/ for three seconds at a comfortable pitch and loudness level, first without any instruction and then with a target matching procedure for either a given f0 or a given SPL, separately, with visual and auditory feedback. The pitch targets for females and males were presented in random ascending and descending order at intervals of 5 Hz, from 150 Hz to 310 Hz (33 frequency targets in total) and from 85 Hz to 190 Hz (22 frequency targets in total), respectively. The loudness levels were 65, 75, 85, and 95 dB (four intensity targets in total) for both males and females. Subjective estimations of comfort level were obtained using a 10-point equal-appearing interval rating scale following each phonation. The results showed that males and females demonstrated similar trends in loudness levels, with greatest comfort at 75 dB, whereas pitch comfort ratings showed greater variability, with females having a wider range with target matching. At the individual level, most male and female speakers rated soft phonations as more comfortable than loud ones. On the other hand, most male speakers perceived the highest comfort at pitch levels below the comfortable pitch they produced under natural conditions. Most female speakers, however, perceived higher frequency ranges to be more comfortable than those of the natural condition, although the comfortable pitch levels in spontaneous phonations were within the comfort level ranges determined by targeted phonations. When acoustic measures (%jitter, %shimmer, SNR) and EGG measures (CQ%) were compared between spontaneous comfortable phonations and targeted phonations produced by the same subject at similar f0 and intensity, no significant differences were observed (p>0.05). Thus, target matching procedures may be considered a compatible alternative method for reducing the variability of comfortable pitch and loudness levels by eliciting consistent comfortable phonations.
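
The target counts given in the abstract above follow directly from the stated ranges and the 5 Hz step, as this short check illustrates. The function and variable names are hypothetical, and the random presentation order used in the study is not modeled.

```python
def target_frequencies(low_hz, high_hz, step_hz=5):
    """List the pitch targets from low_hz to high_hz inclusive, in Hz."""
    return list(range(low_hz, high_hz + 1, step_hz))


female_targets = target_frequencies(150, 310)  # 150, 155, ..., 310
male_targets = target_frequencies(85, 190)     # 85, 90, ..., 190

# (310 - 150) / 5 + 1 = 33 and (190 - 85) / 5 + 1 = 22
counts = (len(female_targets), len(male_targets))
```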

A Study on Cockpit Voice Command System for Fighter Aircraft (전투기용 음성명령 시스템에 대한 연구)

  • Kim, Seongwoo;Seo, Mingi;Oh, Yunghwan;Kim, Bonggyu
    • Journal of the Korean Society for Aeronautical & Space Sciences
    • /
    • v.41 no.12
    • /
    • pp.1011-1017
    • /
    • 2013
  • The human voice is the most natural means of communication, and the need for speech recognition technology is gradually increasing as a way to ease the human-machine interface. With the growth of digital technology, the functions of avionics equipment are becoming more varied and complicated, so the workload of fighter pilots increases: they must not only concentrate on the attack function but also operate complicated avionics equipment. Accordingly, if speech recognition technology is applied to operating the avionics equipment in the aircraft cockpit, pilots can spend their time and effort on the mission of the fighter aircraft. In this paper, a cockpit voice command system applicable to fighter aircraft is developed, and the function and performance of the system are verified.

Implementation of TTS Engine for Natural Voice (자연음 TTS(Text-To-Speech) 엔진 구현)

  • Cho Jung-Ho;Kim Tae-Eun;Lim Jae-Hwan
    • Journal of Digital Contents Society
    • /
    • v.4 no.2
    • /
    • pp.233-242
    • /
    • 2003
  • A TTS (Text-To-Speech) system is a computer-based system that should be able to read any text aloud. Producing a natural voice requires general knowledge of the language and a lot of time and effort. Furthermore, the sound pattern of English is variable, involving both phonemic and morphological analysis, and it is very difficult to maintain consistency of pattern. To handle these problems, we present a system based on phonemic analysis of vowels and consonants. By analyzing phonological variations frequently found in spoken English, we have derived the phonemic contexts that trigger the multilevel application of the corresponding phonological processes, which consist of phonemic and allophonic rules. As a result, we have rule data organized by phoneme and an engine that economizes on system resources. The proposed system can be used not only in communication systems but also in office automation and other applications.


ACOUSTIC CHARACTERISTICS OF KOREAN TRADITIONAL SINGING VOICE: A PRELIMINARY REPORT

  • Moon, Seung-Jae
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.367-371
    • /
    • 1996
  • Most Koreans agree that the Korean traditional singing voice has a very peculiar sound compared to the Western singing voice. The goal of this paper is to investigate the acoustic characteristics of the Korean traditional singing voice called 'Pansori'. Materials from 3 male and 4 female professional singers are analyzed. Their singing was compared with their own conversation and with other non-singers' conversation. Long-term average spectra indicated that all the singers showed much less spectral tilt than non-singers. This phenomenon was present for professional singers not only in their singing but also in their conversation, which suggests that it is not the result of a temporary effort but may involve a certain permanent change in their physiological configuration. (To assess this hypothesis, the voice source should be examined directly. Therefore, the use of a Rothenberg mask (Rothenberg, 1973) is strongly recommended in further research.) In addition to the long-term average spectra, individual vowel formants will be studied later.
