• Title/Summary/Keyword: Speech Area

Normalization in Collection Procedures of Emotional Speech by Scriptual Context (대본 내용에 의한 정서음성 수집과정의 정규화에 대하여)

  • Jo Cheol-Woo
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.123-125
    • /
    • 2006
  • One of the biggest unsolved problems in emotional speech acquisition is how to create or find a situation that is close to the natural or desired emotional state in humans. We propose a method for collecting emotional speech data using scripted context. Several contexts from drama scripts were chosen by experts in the area. The contexts were divided into 6 classes according to their content. Two actors, one male and one female, read the text after recognizing the emotional situations in the script.

A Study of Speech Control Tags Based on Semantic Information of a Text (텍스트의 의미 정보에 기반을 둔 음성컨트롤 태그에 관한 연구)

  • Chang, Moon-Soo;Chung, Kyeong-Chae;Kang, Sun-Mee
    • Speech Sciences
    • /
    • v.13 no.4
    • /
    • pp.187-200
    • /
    • 2006
  • Speech synthesis technology is widely used, and its application area is broadening to include automatic response services, learning systems for people with disabilities, etc. However, the sound quality of speech synthesizers has not yet reached a level that satisfies users. To produce synthesized speech, existing synthesizers generate rhythm only from interval information such as spaces and commas, or from a few punctuation marks such as question marks and exclamation marks, so it is not easy to generate natural human rhythm even with a large speech database. One way to address this problem is to select the rhythm after processing the language using higher-level information. This paper proposes a method for generating tags that control rhythm by analyzing the meaning of a sentence together with speech situation information. We use Systemic Functional Grammar (SFG) [4], which analyzes the meaning of a sentence with speech situation information, considering the preceding sentence, the situation of the conversation, the relationships among the people in the conversation, etc. In this study, we generate Semantic Speech Control Tags (SSCT) from the results of SFG's meaning analysis and voice waveform analysis.

Review And Challenges In Speech Recognition (ICCAS 2005)

  • Ahmed, M.Masroor;Ahmed, Abdul Manan Bin
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2005.06a
    • /
    • pp.1705-1709
    • /
    • 2005
  • This paper presents a review of, and the challenges in, speech recognition, taking into account the different classes of recognition mode. The recognition mode can be either speaker independent or speaker dependent. The size of the vocabulary and the input mode are two crucial factors for a speech recognizer. The input mode refers to whether the system recognizes continuous or isolated speech, and the vocabulary size can be small (fewer than a hundred words) or large (fewer than a few thousand words). This varies according to system design and objectives [2]. The paper is organized as follows: first it covers various fundamental methods of speech recognition, then it considers various deficiencies in existing systems, and finally it presents various probable application areas.

Performance Comparison between the PMC and VTS Method for the Isolated Speech Recognition in Car Noise Environments (자동차 잡음환경 고립단어 음성인식에서의 VTS와 PMC의 성능비교)

  • Chung, Yong-Joo;Lee, Seung-Wook
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.251-261
    • /
    • 2003
  • There have been many research efforts to overcome the problems of speech recognition in noisy conditions. Among noise-robust speech recognition methods, model-based adaptation approaches have been shown to be quite effective. In particular, the PMC (parallel model combination) method is very popular and has been shown to give considerably better recognition results than conventional methods. In this paper, we experimented with the VTS (vector Taylor series) algorithm, which is also based on model parameter transformation but has not attracted much interest from researchers in this area. To verify its effectiveness, we applied the algorithm to the continuous density HMM (hidden Markov model). We compared the performance of the VTS algorithm with that of the PMC method and found that it gave better results than the PMC method.
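
As a rough illustration of what a VTS-style model parameter transformation can look like, the sketch below adapts the mean and variance of a single diagonal Gaussian in the log-spectral domain. It assumes the commonly used additive-noise mismatch function y = x + log(1 + exp(n - x)) and a first-order Taylor expansion around the clean-speech and noise means. The function name `vts_compensate`, the log-spectral feature domain, and the omission of channel effects are assumptions made for this example; the abstract does not state the exact formulation used in the paper.

```python
import math

def vts_compensate(mu_x, var_x, mu_n, var_n):
    """First-order VTS update of one diagonal Gaussian (log-spectral domain).

    Assumed mismatch function: y = x + log(1 + exp(n - x)),
    linearized around the clean-speech mean mu_x and the noise mean mu_n.
    All arguments are equal-length lists (one value per spectral channel).
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        # Partial derivative dy/dx at the expansion point; dy/dn equals 1 - g.
        g = 1.0 / (1.0 + math.exp(mn - mx))
        # Zeroth-order term: the mismatch function evaluated at the means.
        mu_y.append(mx + math.log1p(math.exp(mn - mx)))
        # First-order term: variances propagated through the linearization.
        var_y.append(g * g * vx + (1.0 - g) * (1.0 - g) * vn)
    return mu_y, var_y

# Hypothetical two-channel example: channel 0 is noise-dominated,
# channel 1 is speech-dominated.
noisy_mean, noisy_var = vts_compensate(
    mu_x=[1.0, 6.0], var_x=[0.5, 0.5], mu_n=[4.0, 0.0], var_n=[0.2, 0.2]
)
```

In this sketch, a channel where the noise mean is far below the speech mean keeps its clean parameters almost unchanged, while a noise-dominated channel moves toward the noise statistics.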

A Shift in the Point of Articulation for Korean Consonants with an Increase of their Tensity (한국어 자음의 긴장도 증가에 따른 조음점 이동에 관한 연구)

  • Baik, Woon-Il
    • Speech Sciences
    • /
    • v.8 no.3
    • /
    • pp.115-120
    • /
    • 2001
  • This study aims to show why and how the point of articulation for Korean consonants shifts as their tensity increases. From an EPG study on the point of articulation for Korean alveolar stops, fricatives, and velar stops, it was concluded that the shift occurs because the increase in the tensity of Korean consonants requires more contact area between the tongue and the roof of the mouth, and that the direction of the shift depends on the surface shape of the tongue for the plain consonant.

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features (음질, 운율, 발음 특징을 이용한 마비말장애 중증도 자동 분류)

  • Yeo, Eun Jung;Kim, Sunhee;Chung, Minhwa
    • Phonetics and Speech Sciences
    • /
    • v.13 no.2
    • /
    • pp.57-66
    • /
    • 2021
  • This study focuses on automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure that is affected by features of multiple speech dimensions. However, most previous studies are restricted to features from a single speech dimension. To effectively capture the characteristics of the speech disorder, we extracted features from multiple speech dimensions: voice quality, prosody, and pronunciation. Voice quality consists of jitter, shimmer, Harmonic to Noise Ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody includes speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/25th percentile/75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation contains Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and degree of vowel distortion (Vowel Space Area, Formant Centralized Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted using various feature combinations. The experimental results indicate that using features from all three speech dimensions gives the best result, with an F1-score of 80.15, compared to using features from just one or two speech dimensions. The result implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
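
Three of the rhythm features named in the abstract above (%V, rPVI, nPVI) have standard definitions over segment durations, which the following sketch computes. The function names and the example durations are hypothetical, and the abstract does not say how the authors segmented the speech or extracted these values, so this is only an illustration of the usual formulas.

```python
def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total

def rpvi(durations):
    """Raw Pairwise Variability Index: mean absolute difference
    between successive interval durations (same unit as the input)."""
    diffs = [abs(a - b) for a, b in zip(durations, durations[1:])]
    return sum(diffs) / len(diffs)

def npvi(durations):
    """Normalized PVI: each successive difference is divided by the mean
    of the two durations, then averaged and scaled by 100."""
    terms = [abs(a - b) / ((a + b) / 2.0)
             for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical interval durations in seconds (not data from the study).
vowel_durations = [0.12, 0.08, 0.15, 0.09]
consonant_durations = [0.07, 0.10, 0.06, 0.11]

features = {
    "%V": percent_v(vowel_durations, consonant_durations),
    "rPVI_C": rpvi(consonant_durations),
    "nPVI_V": npvi(vowel_durations),
}
```

Both PVI functions need at least two intervals. By convention the raw index is often reported for consonantal intervals and the normalized index for vocalic ones, which is why the example pairs them that way.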

A study of the understanding about speech therapy and the satisfaction about counseling for mothers who have children with disability (언어치료에 대한 장애아동 어머니의 이해도와 상담 만족도)

  • Park, Jin-Won
    • Journal of Korean Clinical Health Science
    • /
    • v.9 no.1
    • /
    • pp.1469-1477
    • /
    • 2021
  • Purpose: This study investigates the understanding of speech therapy and the satisfaction with speech therapy counseling according to the characteristics of mothers of children with disabilities, and aims to devise clinical instruction methods for providing effective speech therapy by identifying the correlation between the two variables. Methods: A survey was conducted with 78 mothers of children with disabilities who use university speech therapy labs. Seventeen questions measured the degree of understanding of speech therapy, and 24 questions measured the degree of satisfaction with speech therapy counseling. Results: First, mothers with a higher education level had a higher degree of understanding of language (p<0.01). Second, mothers with a higher education level had a lower degree of satisfaction with the counseling process (p<0.5). In terms of job status, mothers who have a job had a higher degree of satisfaction with counseling time (p<0.5). Third, regarding mothers' understanding of speech therapy and satisfaction with counseling, mothers with a higher degree of understanding of language, speech therapy tools, and the speech therapy area had a higher degree of satisfaction with counseling. Conclusions: This study showed the need to understand the subjects' needs exactly and to communicate actively with mothers. In addition, concrete and varied methods should be devised to increase the understanding of speech therapy and to increase satisfaction with counseling regarding the clinical practice environment and the language therapy process.

The Influence of Non-Linear Frequency Compression on the Perception of Speech and Music in Patients with High Frequency Hearing Loss

  • Ahn, Jungmin;Choi, Ji Eun;Kang, Ju Yong;Choi, Ik Joon;Lee, Myung-Chul;Lee, Byeong-Cheol;Hong, Sung Hwa;Moon, Il Joon
    • Korean Journal of Audiology
    • /
    • v.25 no.2
    • /
    • pp.80-88
    • /
    • 2021
  • Background and Objectives: Non-linear frequency compression (NLFC) technology compresses and shifts higher frequencies into a lower frequency area that has better residual hearing. Because consonants are uttered in the high-frequency area, NLFC could provide better speech understanding. The aim of this study was to investigate the clinical effectiveness of NLFC technology on the perception of speech and music in patients with high-frequency hearing loss. Subjects and Methods: Twelve participants with high-frequency hearing loss were tested in a counter-balanced order, and had two weeks of daily experience with NLFC set on/off prior to testing. Performance was repeatedly evaluated with consonant tests in quiet and noise environments, speech perception in noise, music perception, and sound quality acceptability rating tasks. Additionally, two questionnaires (the Abbreviated Profile of Hearing Aid Benefit and the Korean version of the International Outcome Inventory-Hearing Aids) were administered. Results: Consonant and speech perception improved with hearing aids (NLFC on/off conditions), but there was no significant difference between NLFC on and off states. Music perception performance revealed no notable difference among the unaided, NLFC-on, and NLFC-off states. The benefits and satisfaction ratings between NLFC on and off conditions were also not significantly different, based on questionnaires; however, great individual variability in preferences was noted. Conclusions: Speech perception as well as music perception both in quiet and noise environments was similar between NLFC on and off states, indicating that real world benefits from NLFC technology may be limited in Korean adult hearing aid users.
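
To make the idea of compressing and shifting higher frequencies concrete, the sketch below shows one common way to describe an NLFC input-output mapping: frequencies at or below a cutoff pass through unchanged, and frequencies above it are compressed on a logarithmic scale by a fixed ratio. The function name `nlfc_map`, the cutoff of 2000 Hz, and the ratio of 2.0 are assumptions chosen for illustration. The abstract does not report the device settings or the exact algorithm used in the study.

```python
def nlfc_map(freq_hz, cutoff_hz=2000.0, ratio=2.0):
    """Map an input frequency to its output frequency under an assumed
    NLFC rule with log-domain compression above the cutoff.

    Below or at the cutoff the frequency is left untouched. Above it,
    the distance from the cutoff (in octaves) is divided by `ratio`.
    """
    if freq_hz <= cutoff_hz:
        return freq_hz
    return cutoff_hz * (freq_hz / cutoff_hz) ** (1.0 / ratio)

# Hypothetical probe frequencies in Hz (not values from the study).
probe = [500.0, 2000.0, 4000.0, 8000.0]
mapped = [nlfc_map(f) for f in probe]
```

With these assumed settings, an input two octaves above the cutoff lands one octave above it, so high-frequency consonant energy is moved into a region where residual hearing may be better, at the cost of altering the spacing between high-frequency components.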

The Influence of Non-Linear Frequency Compression on the Perception of Speech and Music in Patients with High Frequency Hearing Loss

  • Ahn, Jungmin;Choi, Ji Eun;Kang, Ju Yong;Choi, Ik Joon;Lee, Myung-Chul;Lee, Byeong-Cheol;Hong, Sung Hwa;Moon, Il Joon
    • Journal of Audiology & Otology
    • /
    • v.25 no.2
    • /
    • pp.80-88
    • /
    • 2021
  • Background and Objectives: Non-linear frequency compression (NLFC) technology compresses and shifts higher frequencies into a lower frequency area that has better residual hearing. Because consonants are uttered in the high-frequency area, NLFC could provide better speech understanding. The aim of this study was to investigate the clinical effectiveness of NLFC technology on the perception of speech and music in patients with high-frequency hearing loss. Subjects and Methods: Twelve participants with high-frequency hearing loss were tested in a counter-balanced order, and had two weeks of daily experience with NLFC set on/off prior to testing. Performance was repeatedly evaluated with consonant tests in quiet and noise environments, speech perception in noise, music perception, and sound quality acceptability rating tasks. Additionally, two questionnaires (the Abbreviated Profile of Hearing Aid Benefit and the Korean version of the International Outcome Inventory-Hearing Aids) were administered. Results: Consonant and speech perception improved with hearing aids (NLFC on/off conditions), but there was no significant difference between NLFC on and off states. Music perception performance revealed no notable difference among the unaided, NLFC-on, and NLFC-off states. The benefits and satisfaction ratings between NLFC on and off conditions were also not significantly different, based on questionnaires; however, great individual variability in preferences was noted. Conclusions: Speech perception as well as music perception both in quiet and noise environments was similar between NLFC on and off states, indicating that real world benefits from NLFC technology may be limited in Korean adult hearing aid users.

Possibility of the Rough Set Approach in Phonetic Distinctions

  • Lee, Jae-Ik;Kim, Jeong-Kuk;Jo, Heung-Kuk;Suh, Kap-Sun
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1996.10a
    • /
    • pp.66-69
    • /
    • 1996
  • The goal of automatic speech recognition is to study techniques and systems that enable agents such as computers and robots to accept speech input. However, this paper does not provide a concrete speech recognition technology but proposes a possible mathematical tool to be employed in that area. We introduce rough set theory and suggest the possibility of the rough set approach in phonetic distinctions.
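
The core construct of rough set theory is the pair of lower and upper approximations of a target set under an indiscernibility relation. The sketch below computes both for a toy table of phones. The feature table, the function names, and the choice of "stops" as the target set are illustrative assumptions and are not taken from the paper.

```python
def equivalence_classes(objects, attributes):
    """Group objects that are indiscernible on the chosen attributes."""
    classes = {}
    for name, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes.setdefault(key, set()).add(name)
    return list(classes.values())

def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    lower: union of classes entirely inside the target (certain members).
    upper: union of classes that overlap the target (possible members).
    """
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:
            lower |= block
        if block & target:
            upper |= block
    return lower, upper

# Toy phone table (hypothetical feature coding for illustration).
phones = {
    "p": {"voiced": 0, "place": "labial"},
    "b": {"voiced": 1, "place": "labial"},
    "m": {"voiced": 1, "place": "labial"},
    "t": {"voiced": 0, "place": "alveolar"},
}
stops = {"p", "b", "t"}

lower, upper = approximations(phones, ["voiced", "place"], stops)
boundary = upper - lower
```

Here "b" and "m" share the same values for voicing and place, so these two attributes cannot separate the stop from the nasal, and both end up in the boundary region. Adding a further attribute, such as nasality, would shrink the boundary.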
