• Title/Abstract/Keywords: Korean speech

Search results: 5,286 records

The Effect of the Speech Enhancement Algorithm for Sensorineural Hearing Impaired Listeners

  • Kim, Dong-Wook;Lee, Young-Woo;Lee, Jong-Shill;Chee, Young-Joon;Lee, Sang-Min;Kim, In-Young;Kim, Sun-I.
    • Journal of Biomedical Engineering Research (Korean Society of Medical and Biological Engineering) / Vol. 28, No. 6 / pp. 732-743 / 2007
  • Background noise is one of the major complaints not only of hearing-impaired persons but also of listeners with normal hearing. This paper describes the results of two experiments that measured speech recognition performance in noisy environments for listeners with normal hearing and listeners with sensorineural hearing loss. First, we compared speech enhancement algorithms by evaluating speech recognition ability at various speech-to-noise ratios and with various types of noise. Next, speech enhancement algorithms that reduce background noise were presented and evaluated as a means of improving speech intelligibility for listeners with sensorineural hearing impairment. We tested three single-microphone noise reduction methods: spectral subtraction with companding, Wiener filtering, and maximum likelihood envelope estimation. Their responses in background noise were investigated and compared with those of the speech enhancement algorithm proposed in this paper. The methods improved speech recognition test scores for the listeners with sensorineural hearing impairment, but not for listeners with normal hearing. The results suggest that the speech enhancement algorithm with loudness compression can improve speech intelligibility for listeners with sensorineural hearing loss.
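The abstract names spectral subtraction and Wiener filtering but gives no formulas. The sketch below uses the textbook versions and is not the authors' proposed algorithm, which also adds loudness compression. Both functions take per-bin power spectra for one frame, the noisy power |Y|^2 and a noise estimate |N|^2, and return per-bin amplitude gains. The spectral floor value and the maximum-likelihood SNR estimate are illustrative assumptions.

```python
import math


def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Per-bin amplitude gain from power spectral subtraction.

    The clean power is estimated as |Y|^2 - |N|^2 but never allowed to
    fall below floor * |Y|^2, which limits "musical noise".
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if y <= 0.0:
            gains.append(0.0)
            continue
        clean = max(y - n, floor * y)
        gains.append(math.sqrt(clean / y))  # power ratio -> amplitude gain
    return gains


def wiener_gain(noisy_power, noise_power):
    """Per-bin Wiener gain SNR / (1 + SNR).

    The a priori SNR is estimated by maximum likelihood as
    max(|Y|^2 / |N|^2 - 1, 0); a zero noise estimate passes the bin.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        if n <= 0.0:
            gains.append(1.0)
            continue
        snr = max(y / n - 1.0, 0.0)
        gains.append(snr / (1.0 + snr))
    return gains
```

For a bin with noisy power 4 and noise power 1, spectral subtraction gives a gain of about 0.87 and the Wiener rule gives 0.75. A bin containing only noise is attenuated to the floor (gain 0.1) by subtraction and to 0 by the Wiener rule.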

Performance Evaluation of Speech Onset Representation Characteristic of Cochlear Implants Speech Processor using Spike Train Decoding

  • 김두희;김진호;김경환
    • Journal of Biomedical Engineering Research (Korean Society of Medical and Biological Engineering) / Vol. 28, No. 5 / pp. 694-702 / 2007
  • The adaptation effect, which originates in the chemical synapse between the inner hair cell and the auditory nerve, helps the auditory system accurately represent temporal cues of incoming speech, such as speech onset. It is therefore expected that modifying conventional cochlear implant (CI) speech processing strategies to incorporate the adaptation effect will considerably improve speech perception performance, for example consonant perception scores. The purpose of this paper was to evaluate our new CI speech processing strategy, which incorporates the adaptation effect, by observing auditory nerve responses. By classifying the presence or absence of speech from the auditory nerve responses (i.e., spike trains), we quantitatively compared the speech onset detection performance of the conventional and improved strategies. The results verified that the adaptation effect improves speech onset representation.
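The abstract does not specify its adaptation model. The sketch below is an illustrative assumption: a generic first-order synaptic-depression (neurotransmitter depletion) model with made-up rate constants. It shows why adaptation emphasizes onsets. When a step of input arrives, release is largest at the onset and then decays to a lower steady state as the available transmitter is depleted.

```python
def adapted_release(stimulus, release_rate=0.3, recovery_rate=0.05):
    """Toy synaptic-depression model of hair cell / auditory nerve adaptation.

    stimulus holds per-step drive values in [0, 1]; the return value is the
    per-step transmitter release. Both rate constants are hypothetical.
    """
    resource = 1.0  # fraction of transmitter currently available
    output = []
    for drive in stimulus:
        release = release_rate * drive * resource
        output.append(release)
        # Recover toward full availability, minus what was just released.
        resource += recovery_rate * (1.0 - resource) - release
        resource = min(max(resource, 0.0), 1.0)
    return output


# Silence followed by a sustained "speech" segment.
response = adapted_release([0.0] * 10 + [1.0] * 50)
```

With these constants, the release is 0.3 at the onset step and settles near 0.043 during the sustained segment. That sharp onset peak is the kind of cue an onset detector working on spike trains could exploit.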

A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting

  • 김종국;조왕래;배명진
    • Speech Sciences / Vol. 10, No. 2 / pp. 85-95 / 2003
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. An exactly detected pitch can be used in several ways:
    • In analysis, to obtain the vocal tract parameters properly.
    • In speech synthesis, to change or maintain naturalness and intelligibility easily.
    • In speech recognition, to remove speaker-specific characteristics for speaker independence.
  In this paper, we propose a new pitch detection algorithm. First, positive center clipping is performed using the slope of the speech signal. This emphasizes the pitch period through the glottal component, from which the vocal tract characteristic has been removed in the time domain. Next, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal in the frequency domain. A smoothed formant envelope is then obtained from the rough envelope by linear interpolation. The flattened harmonics are obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Computing the inverse fast Fourier transform (IFFT) of these flattened harmonics yields a residual signal from which the vocal tract component has been removed. Performance was compared with the LPC, cepstrum, and ACF methods. The proposed algorithm improved pitch detection accuracy and reduced the gross error rate both in voiced speech regions and in transition regions where the phoneme changes.
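ACF is one of the baselines the abstract compares against. The sketch below shows that classical baseline, not the proposed peak-fitting method. It applies positive center clipping with a fixed 30% threshold, an assumption, since the paper derives its clipping from the signal slope. It then searches the normalized autocorrelation over an assumed 60-400 Hz pitch range.

```python
import math


def center_clip(frame, ratio=0.3):
    """Positive center clipping: keep only the part of each sample that
    exceeds ratio * max(frame); everything else becomes zero."""
    peak = max(frame)
    if peak <= 0.0:
        return [0.0] * len(frame)
    threshold = ratio * peak
    return [x - threshold if x > threshold else 0.0 for x in frame]


def acf_pitch(frame, fs, f_min=60.0, f_max=400.0, ratio=0.3):
    """Estimate F0 in Hz from the normalized autocorrelation of a
    center-clipped frame; returns 0.0 if no lag can be evaluated."""
    y = center_clip(frame, ratio)
    n = len(y)
    best_lag, best_score = 0, -1.0
    for lag in range(int(fs / f_max), min(int(fs / f_min), n - 1) + 1):
        a, b = y[: n - lag], y[lag:]
        energy = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
        if energy == 0.0:
            continue  # no clipped energy overlaps at this lag
        score = sum(p * q for p, q in zip(a, b)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag if best_lag else 0.0
```

Normalizing by the energy of the two overlapping segments stops short lags from winning simply because more samples overlap. For a 50 ms frame of a synthetic 100 Hz harmonic tone sampled at 8 kHz, the best lag is 80 samples, which corresponds to 100 Hz.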


Voice transformation for HTS using correlation between fundamental frequency and vocal tract length

  • 유효근;김영관;서영주;김회린
    • Phonetics and Speech Sciences / Vol. 9, No. 1 / pp. 41-47 / 2017
  • The main advantage of statistical parametric speech synthesis is its flexibility in changing voice characteristics. A personalized text-to-speech (TTS) system can be implemented by combining a speech synthesis system with a voice transformation system, and such systems are widely used in many application areas. It is known that the fundamental frequency and the spectral envelope of a speech signal can be modified independently to convert voice characteristics. It is also important to maintain the naturalness of the transformed speech. In this paper, an HMM-based speech synthesis system (HTS) using the STRAIGHT vocoder is constructed, and voice transformation is performed by modifying the fundamental frequency and the spectral envelope. The fundamental frequency is transformed by scaling, and the spectral envelope is transformed by frequency warping to control the speaker's vocal tract length. In particular, this study proposes a voice transformation method that uses the correlation between fundamental frequency and vocal tract length. Subjective evaluations were conducted to assess preference and mean opinion scores (MOS) for the naturalness of the synthetic speech. Experimental results showed that the proposed voice transformation method achieved higher preference than the baseline systems while maintaining the naturalness of the speech.
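The two operations named in the abstract can be sketched as F0 scaling plus linear frequency warping of a spectral envelope frame. The coupling `warp = alpha ** gamma` between the F0 scale factor and the warping factor is a hypothetical stand-in for the paper's F0 to vocal tract length correlation, and `gamma` is not a value from the paper. The sketch assumes positive factors and envelopes sampled on a uniform frequency grid.

```python
def scale_f0(f0_contour, alpha):
    """Scale voiced F0 values by alpha; unvoiced frames (0 Hz) stay 0."""
    return [f * alpha if f > 0.0 else 0.0 for f in f0_contour]


def warp_envelope(envelope, warp):
    """Linear frequency warping of one spectral-envelope frame.

    Output bin k takes the linearly interpolated input value at bin
    k / warp, so warp > 1 moves formants up (a shorter vocal tract) and
    warp < 1 moves them down. Bins past the input range reuse the last bin.
    """
    n = len(envelope)
    out = []
    for k in range(n):
        src = k / warp
        if src >= n - 1:
            out.append(envelope[-1])
        else:
            i = int(src)
            frac = src - i
            out.append((1.0 - frac) * envelope[i] + frac * envelope[i + 1])
    return out


def transform_voice(f0_contour, envelopes, alpha, gamma=0.5):
    """Scale F0 by alpha and warp every envelope by alpha ** gamma.

    gamma is a hypothetical coupling exponent chosen for illustration.
    """
    warp = alpha ** gamma
    return scale_f0(f0_contour, alpha), [warp_envelope(e, warp) for e in envelopes]
```

Tying both factors to a single control mirrors the paper's idea that a higher-pitched voice usually also comes with a shorter vocal tract. The exact functional form here is only a placeholder.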

An acoustical analysis of speech of different speaking rates and genders using intonation curve stylization of English

  • 이서배
    • Phonetics and Speech Sciences / Vol. 6, No. 4 / pp. 79-90 / 2014
  • Intonation curve stylization was used for an acoustical analysis of English speech. For the analysis, acoustical feature values were extracted from 1,848 utterances produced at normal and fast speech rates by 28 native speakers of English (12 women and 16 men). Men were found to speak faster than women at the normal speech rate, but no gender difference was found at the fast speech rate.

  Pitch point features:
    • Fast speech has greater Pt (pitch point movement time), Pr (pitch point pitch range), and Pd (pitch point distance) than normal speech, but smaller Ps (pitch point slope).
    • Men show greater Pt, Pr, and Pd than women.

  Sentence-level features:
    • Fast speech has smaller Sr (sentence-level pitch range), Sd (sentence duration), and Max (maximum pitch) than normal speech, but greater Ss (sentence slope).
    • Women show greater Sr, Ss, Sp (pitch difference between the first and last pitch points), Sd, MaxNr (normalized Max), and MinNr (normalized Min) than men.

  As speech rate increases, women speak with greater Ss and Sr than men.
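Several of the sentence-level features can be computed directly from a list of stylized pitch points. The abstract defines only Sp explicitly. The other definitions below are assumptions for illustration, not the paper's formulas:
- Sd is the time spanned by the pitch points.
- Sr is the maximum minus the minimum pitch.
- Ss is the first-to-last slope Sp / Sd.
- Sp is taken as the last pitch minus the first.

```python
def sentence_features(points):
    """Sentence-level features from stylized pitch points.

    points: (time_s, pitch) pairs in time order; pitch may be in Hz or
    semitones. All definitions except Sp are illustrative assumptions.
    """
    if len(points) < 2:
        raise ValueError("need at least two pitch points")
    times = [t for t, _ in points]
    pitches = [p for _, p in points]
    sd = times[-1] - times[0]      # Sd: duration spanned by the points
    sp = pitches[-1] - pitches[0]  # Sp: last minus first pitch point
    return {
        "Sd": sd,
        "Sp": sp,
        "Sr": max(pitches) - min(pitches),  # Sr: sentence pitch range
        "Max": max(pitches),                # Max: maximum pitch
        "Ss": sp / sd if sd > 0 else 0.0,   # Ss: overall slope (assumed)
    }


# A rise-fall contour: 200 Hz -> 250 Hz -> 150 Hz over one second.
features = sentence_features([(0.0, 200.0), (0.5, 250.0), (1.0, 150.0)])
```

For this contour, Sr is 100 Hz and Ss is -50 Hz/s. The falling end gives the negative slope even though the contour peaks in the middle.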

Speech Interactive Agent on Car Navigation System Using Embedded ASR/DSR/TTS

  • Lee, Heung-Kyu;Kwon, Oh-Il;Ko, Han-Seok
    • Speech Sciences / Vol. 11, No. 2 / pp. 181-192 / 2004
  • This paper presents an efficient speech interactive agent that provides smooth car navigation and telematics services while enabling safe driving. It uses embedded automatic speech recognition (ASR), distributed speech recognition (DSR), and text-to-speech (TTS) modules. A speech interactive agent is essentially a conversational tool that gives drivers command and control functions, such as navigation tasks, audio/video control, and e-commerce services, through natural voice interaction between the user and the interface. The benefits of automatic speech recognition and speech synthesis are well known. However, the hardware resources involved are often limited, and the internal communication protocols needed for real-time responses are complex. As a result, performance degradation is unavoidable in embedded hardware systems. To implement a speech interactive agent that meets user commands in real time, we propose optimizing the hardware-dependent architectural code for speed. In particular, we propose a composite solution suited to operation under limited resources: memory reconfiguration, efficient conversion of arithmetic operations, and an effective out-of-vocabulary rejection algorithm.
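The abstract does not describe its out-of-vocabulary rejection algorithm. One common approach, shown here as an assumption rather than the authors' method, compares the recognizer's best in-vocabulary score with the score of a generic filler (garbage) model. The input is rejected unless the per-frame log-likelihood ratio clears a threshold. The threshold value is hypothetical and would be tuned on held-out data.

```python
def accept_keyword(keyword_loglik, filler_loglik, num_frames, threshold=0.5):
    """Out-of-vocabulary rejection by a frame-normalized log-likelihood ratio.

    keyword_loglik: log-likelihood of the best in-vocabulary hypothesis.
    filler_loglik:  log-likelihood of a generic filler (garbage) model.
    The utterance is accepted only if the keyword beats the filler by at
    least `threshold` per frame (a hypothetical tuning value).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    llr = (keyword_loglik - filler_loglik) / num_frames
    return llr >= threshold
```

Normalizing by the number of frames keeps a single threshold usable for both short and long commands. On embedded hardware this costs only one subtraction and one division per utterance.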


A Study of Speech Control Tags Based on Semantic Information of a Text

  • 장문수;정경채;강선미
    • Speech Sciences / Vol. 13, No. 4 / pp. 187-200 / 2006
  • Speech synthesis technology is widely used, and its applications are broadening to areas such as automatic response services and learning systems for people with disabilities. However, the sound quality of speech synthesizers has not yet reached a level that satisfies users. Existing synthesizers generate rhythm only from interval information such as spaces and commas, or from a few punctuation marks such as question marks and exclamation marks. As a result, it is difficult to generate natural, human-like rhythm even when the synthesizer is built on a large speech database. One way to address this problem is to select rhythm after processing higher-level linguistic information. This paper proposes a method for generating rhythm-control tags by analyzing the meaning of a sentence together with speech situation information. We use Systemic Functional Grammar (SFG) [4], which analyzes sentence meaning using situational information such as the preceding sentence, the context of the conversation, and the relationships among the participants. In this study, we generate Semantic Speech Control Tags (SSCT) from the results of the SFG meaning analysis and from voice waveform analysis.
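The last step, turning a semantic analysis result into a prosody-control tag, can be sketched as a lookup table. This sketch assumes the SFG analysis has already labeled each sentence with one of Halliday's four primary speech functions. The mapping values are invented for illustration. The output uses the W3C SSML `<prosody>` element with its named `pitch` and `rate` values instead of the paper's SSCT format, which is not reproduced here.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from an SFG speech function to SSML prosody values.
PROSODY_BY_FUNCTION = {
    "statement": {"pitch": "medium", "rate": "medium"},
    "question": {"pitch": "high", "rate": "medium"},
    "command": {"pitch": "low", "rate": "fast"},
    "offer": {"pitch": "high", "rate": "slow"},
}


def tag_sentence(text, speech_function):
    """Wrap a sentence in an SSML <prosody> element chosen from its
    (already analyzed) speech function; unknown labels get defaults."""
    attrs = PROSODY_BY_FUNCTION.get(
        speech_function, {"pitch": "default", "rate": "default"}
    )
    return '<prosody pitch="{pitch}" rate="{rate}">{body}</prosody>'.format(
        body=escape(text), **attrs
    )
```

Escaping the sentence text keeps characters such as `<` and `&` from breaking the generated markup.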


Acoustic Characteristics of 'Short Rushes of Speech' using Alternate Motion Rates in Patients with Parkinson's Disease

  • 김선우;윤지혜;이승진
    • Phonetics and Speech Sciences / Vol. 7, No. 2 / pp. 55-62 / 2015
  • It is widely accepted that Parkinson's disease (PD) is the most common cause of hypokinetic dysarthria, and its characteristic 'short rushes of speech' become more evident as the motor disorder becomes more severe. Speech alternate motion rates (AMRs) are particularly useful for observing both rate abnormalities and deviant speech. However, apart from their perceptual characteristics, relatively little is known about 'short rushes of speech' in the AMRs of PD. The purpose of this study was to determine which acoustic features of 'short rushes of speech' in AMRs are robust indicators of Parkinsonian speech. The syllable repetitions (/pə/, /tə/, /kə/) produced in AMR tasks by 9 patients with PD were analyzed acoustically by observing spectrograms in the Computerized Speech Lab. Three acoustic characteristics of 'short rushes of speech' were found:
    1) vocalized consonants without closure duration (VC), 76.3%;
    2) no consonant segmentation (NC), 18.6%;
    3) no vowel formant frequency (NV), 5.1%.
  These results suggest that 'short rushes of speech' may reflect a failure to reach and maintain phonatory targets. To achieve the therapeutic goals and make treatment as efficacious as possible, it is important to incorporate training methods based on both phonation and articulation.

A Study of Locke's Concept of Freedom of Speech as Proprietorship

  • 문종대
    • Korean Journal of Communication & Information / Vol. 17 / pp. 7-36 / 2001
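  • Locke's theory of natural rights and his natural-law thought have greatly influenced modern thinking about freedom of the press. Above all, they offer many insights for understanding the liberal position on the press. Focusing on Locke's social contract theory, this paper analyzes four questions:
    • the nature of the freedom of speech that Locke derived from natural rights;
    • the relationship between property rights and freedom of speech;
    • how freedom of speech can be realized in Locke's civil society;
    • to what extent the state may intervene in the press.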


Speech Stimuli on the Diagnostic Evaluation of Speech with Cleft Lip and Palate: Clinical Use and Literature Review

  • 최성희;최재남;남도현;최홍식
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 16, No. 1 / pp. 33-48 / 2005
  • Differential diagnosis of articulation and resonance problems in cleft lip and palate speech is required to evaluate the various factors that contribute to speech problems, such as VPI, dental occlusion, palatal fistulae, and learning. However, the validity of speech stimuli for accurately evaluating each problem in cleft speech is a current issue. This study investigated the speech stimuli used in clinical settings and reviewed the literature and articles published from 1990 to 2005 to help develop standardized speech samples. The resulting recommendations for properly evaluating velopharyngeal function in a diagnostic evaluation are as follows:
    1) To identify hypernasality, speech stimuli should include low-pressure consonants to eliminate the effects of nasal emission and compensatory articulation.
    2) Speech stimuli should consist of visible, front sounds to eliminate compensatory articulation and to make them easy to elicit.
    3) For early diagnosis and treatment, speech stimuli need to be developed for infants and preschoolers.
    4) Stimulus length for nasalance scores should be at least 6 syllables.
    5) For the phonetic context of nasalance scores, the /i/ vowel should be taken into consideration, excluding paragraphs.
    6) Connected speech stimuli should be developed for evaluating intelligibility and VP function.
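Recommendations 4 and 5 concern nasalance scores. Nasalance is conventionally defined as nasal acoustic energy divided by the sum of nasal and oral acoustic energy, multiplied by 100. The sketch below computes it under stated assumptions: two time-aligned microphone channels, with RMS amplitude as the energy measure. Commercial instruments apply their own filtering, which is not modeled here.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def nasalance(nasal_channel, oral_channel):
    """Nasalance score (%) = nasal / (nasal + oral) * 100, using the RMS
    amplitudes of two time-aligned microphone channels (an assumption)."""
    n, o = rms(nasal_channel), rms(oral_channel)
    if n + o == 0.0:
        raise ValueError("both channels are silent")
    return 100.0 * n / (n + o)
```

Because the score is a ratio averaged over the whole stimulus, very short stimuli let a single nasal or oral segment dominate it. That is one reason a minimum stimulus length, such as the 6 syllables recommended above, matters.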
