• Title/Summary/Keyword: Speech


Microphone Array Based Speech Enhancement Using Independent Vector Analysis (마이크로폰 배열에서 독립벡터분석 기법을 이용한 잡음음성의 음질 개선)

  • Wang, Xingyang;Quan, Xingri;Bae, Keunsung
    • Phonetics and Speech Sciences / v.4 no.4 / pp.87-92 / 2012
  • Speech enhancement aims to improve speech quality by removing background noise from noisy speech. Independent vector analysis is a frequency-domain independent component analysis method that is known to be free from the frequency-bin permutation problem in blind source separation from multi-channel inputs. This paper proposes a new method of microphone-array-based speech enhancement that combines independent vector analysis and beamforming techniques. Independent vector analysis is used to separate speech and noise components from multi-channel noisy speech, and delay-sum beamforming is used to determine which of the separated signals is the enhanced speech. To verify the effectiveness of the proposed method, experiments were carried out on computer-simulated multi-channel noisy speech with various signal-to-noise ratios, and both PESQ and output signal-to-noise ratio were obtained as objective speech quality measures. Experimental results show that the proposed method outperforms conventional microphone-array-based noise removal approaches such as GSC beamforming in speech enhancement.
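The abstract above reports output signal-to-noise ratio as one of its objective measures but does not give the formula. A minimal sketch, assuming the common definition (energy of the clean reference over energy of the residual error, in dB) and assuming the two signals are already time-aligned; the function name `output_snr_db` and the sample values are hypothetical:

```python
import math

def output_snr_db(clean, enhanced):
    """Output SNR in dB, assuming time-aligned signals of equal length.

    Defined here as 10*log10(clean energy / residual energy), where the
    residual is the sample-wise difference between enhanced and clean.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in clean)
    if signal_energy == 0.0:
        raise ValueError("clean reference has zero energy")
    noise_energy = sum((e - s) ** 2 for s, e in zip(clean, enhanced))
    if noise_energy == 0.0:
        return math.inf  # enhanced signal matches the reference exactly
    return 10.0 * math.log10(signal_energy / noise_energy)

# Hypothetical toy signals: the residual is 0.1 on every sample.
clean = [1.0, -1.0, 1.0, -1.0]
enhanced = [1.1, -0.9, 1.1, -0.9]
snr = output_snr_db(clean, enhanced)  # roughly 20 dB
```

In a simulation like the one described, the clean reference is available by construction, which is what makes this measure computable.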

Common Speech Database Collection for Telecommunications (통신망환경 한국어 공통음성 DB 구축)

  • Kim Sanghun;Park Moonwhan;Kim Hyunsuk
    • Proceedings of the KSPS conference / 2003.05a / pp.23-26 / 2003
  • This paper presents a common speech database collection for telecommunication applications. Over a three-year project, we will construct very large-scale speech and text databases for speech recognition, speech synthesis, and speaker identification. The common speech database was designed with various communication environments in mind, as well as the distribution of speakers' sex, age, and region. It consists of Korean continuous digits, isolated words, and sentences that reflect Korean phonetic coverage. In addition, it covers various pronunciation styles such as read speech, dialogue speech, and semi-spontaneous speech. With the common speech databases, duplicated investment of resources across the Korean speech industry can be avoided. The databases are expected to encourage the domestic speech industry and stimulate the domestic speech technology market.


The Comparison of Prosodic Phrasing in Spontaneous Speech and Read Speech (자유 발화와 낭독 발화의 운율 경계 형성 비교)

  • Noh, Seok-Eun
    • Proceedings of the KSPS conference / 2006.11a / pp.19-23 / 2006
  • This paper compares prosodic phrasing in Korean spontaneous speech and read speech. For this comparison, the subjects read transcriptions of their own spontaneous speech. The number of IPs is greater in spontaneous speech than in read speech, while the number of APs does not differ between them. An accentual phrase in spontaneous speech has fewer syllables than one in read speech.


Differences in High Pitch Accents between News Speech and Natural Speech (영어 뉴스와 자연발화에 나타나는 고성조 피치액센트의 차이점)

  • Choi, Yun-Hui;Lee, Joo-Kyeong
    • Speech Sciences / v.12 no.2 / pp.17-28 / 2005
  • This paper argues that news speech has an intonational pattern distinct from that of natural speech, reflecting its primary focus on providing new information. We conducted a phonetic experiment to compare the tonal contours of news speech and natural speech, examining the distributions of pitch accents and the overall pitch ranges. We used 70 American Press (AP) radio news utterances and 70 natural utterances extracted from TV dramas. Results show that news speech involves 3.38 H*'s (including L+H* and !H*) within an intonational phrase (IP) or intermediate phrase (ip), whereas natural speech involves 1.8 on average. The most frequent number of IP/ip's per sentence is 3 in news speech, accounting for 32.07% of the news speech, whereas it is only 1 in natural speech, accounting for 41.42%. Next, declination tends to be prevented in news speech, and the pitch range is much greater in news speech than in natural speech. Finally, a secondary-stress syllable is given a pitch accent comparatively frequently in news speech, in clear contrast to natural speech. These results can be interpreted as showing that news has the particular purpose of providing new information; every content word tends to be given an H* or a related pitch accent such as L+H* or !H* because news speech assumes that every word conveys new information. This brings about more IP/ip's per sentence due to a human physiological constraint; that is, more H*'s will cause more respiratory breaks. Also, the greater pitch ranges and the pitch accents imposed on secondary stress may be attributed to the exaggeration of new information.


Self-Reported Speech Problems in Adolescents and Young Adults with 22q11.2 Deletion Syndrome: A Cross-Sectional Cohort Study

  • Spruijt, Nicole E.;Vorstman, Jacob A.S.;Kon, Moshe;Mink van der Molen, Aebele B.
    • Archives of Plastic Surgery / v.41 no.5 / pp.472-479 / 2014
  • Background: Speech problems are a common clinical feature of the 22q11.2 deletion syndrome. The objectives of this study were to inventory the speech history and current self-reported speech rating of adolescents and young adults, and examine the possible variables influencing the current speech ratings, including cleft palate, surgery, speech and language therapy, intelligence quotient, and age at assessment. Methods: In this cross-sectional cohort study, 50 adolescents and young adults with the 22q11.2 deletion syndrome (ages, 12-26 years, 67% female) filled out questionnaires. A neuropsychologist administered an age-appropriate intelligence quotient test. The demographics, histories, and intelligence of patients with normal speech (speech rating=1) were compared to those of patients with different speech (speech rating>1). Results: Of the 50 patients, a minority (26%) had a cleft palate, nearly half (46%) underwent a pharyngoplasty, and all (100%) had speech and language therapy. Poorer speech ratings were correlated with more years of speech and language therapy (Spearman's correlation=0.418, P=0.004; 95% confidence interval, 0.145-0.632). Only 34% had normal speech ratings. The groups with normal and different speech were not significantly different with respect to the demographic variables; a history of cleft palate, surgery, or speech and language therapy; and the intelligence quotient. Conclusions: All adolescents and young adults with the 22q11.2 deletion syndrome had undergone speech and language therapy, and nearly half of them underwent pharyngoplasty. Only 34% attained normal speech ratings. Those with poorer speech ratings had speech and language therapy for more years.

A User-friendly Remote Speech Input Method in Spontaneous Speech Recognition System

  • Suh, Young-Joo;Park, Jun;Lee, Young-Jik
    • The Journal of the Acoustical Society of Korea / v.17 no.2E / pp.38-46 / 1998
  • In this paper, we propose a remote speech input device, a new method of user-friendly speech input for spontaneous speech recognition systems. We define user friendliness in terms of hands-free operation and microphone independence in speech recognition applications. Our method adopts two algorithms: automatic speech detection and microphone array delay-and-sum beamforming (DSBF)-based speech enhancement. The automatic speech detection algorithm is composed of two stages: detection of a speech portion candidate, and classification of speech and nonspeech using pitch information for the detected candidate. The DSBF algorithm adopts the time-domain cross-correlation method for its time delay estimation. In the performance evaluation, the speech detection algorithm shows within-200 ms start point accuracy of 93%, 99% under 15dB, 20dB, and 25dB signal-to-noise ratio (SNR) environments, respectively, and the corresponding end point accuracies are 72%, 89%, and 93%, respectively. The classification of speech and nonspeech for the region of the input signal detected at the start point is performed by the pitch information-based method. The percentages of correct classification for speech and nonspeech input are 99% and 90%, respectively. The eight-microphone array-based speech enhancement using the DSBF algorithm shows a maximum SNR gain of 6dB over a single microphone and an error reduction of more than 15% in the spontaneous speech recognition domain.
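The DSBF step described above, with time delays estimated by time-domain cross-correlation, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: delays are whole samples, the first channel serves as the reference, and the function names and toy signals are hypothetical.

```python
def estimate_delay(reference, signal, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means `signal` is delayed relative to `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(signal):
                score += reference[i] * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_and_sum(channels, max_lag):
    """Align every channel to the first one, then average the channels."""
    reference = channels[0]
    output = [0.0] * len(reference)
    for channel in channels:
        lag = estimate_delay(reference, channel, max_lag)
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(channel):
                output[i] += channel[j]  # undo the estimated delay
    return [value / len(channels) for value in output]

# Hypothetical two-microphone example: the second channel lags by 2 samples.
source = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
mic0 = list(source)
mic1 = [0.0, 0.0] + source[:-2]
lag = estimate_delay(mic0, mic1, max_lag=3)
beamformed = delay_and_sum([mic0, mic1], max_lag=3)
```

Averaging after alignment adds the speech coherently across channels, while uncorrelated noise adds incoherently, which is the source of the SNR gain such a beamformer provides.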


A Study of Korean Literature Review Related to Speech Characteristics and Speech Therapy in Patients with Parkinson Disease (파킨슨병 환자의 말 특성과 언어치료 관련 국내문헌연구)

  • Kang, Ha Neul;Yoo, Jae Yeon
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.30 no.2 / pp.87-94 / 2019
  • The purpose of this study was to investigate the speech characteristics and speech therapy of patients with Parkinson disease (PD). This study selected 28 papers published in Korea from 1998 to 2018 after searching the terms 'Parkinson voice' and 'Parkinson speech therapy.' The literature review was conducted from two aspects: speech characteristics and speech therapy. The speech characteristics were divided into respiration, phonation, articulation, prosody, vowel production, and voice questionnaire. Speech therapy was divided into Lee Silverman Voice Treatment (LSVT) and other voice therapy. PD patients did not differ in respiration function compared to normal elderly people, but their speech and articulation function were poorer. There was also a difference in the speech rate, frequency of pauses, and accuracy of vowel production compared with normal elderly people. PD patients had a lower VHI score, and their voice-related quality of life was slightly poorer. The LSVT was typically used in speech therapy for PD. The methods of speech therapy for PD have been shown to improve respiration and phonation. Future studies need to establish voice norms for PD patients and develop effective speech therapy.

The Effects of Pitch Increasing Training (PIT) on Voice and Speech of a Patient with Parkinson's Disease: A Pilot Study

  • Lee, Ok-Bun;Jeong, Ok-Ran;Shim, Hong-Im;Jeong, Han-Jin
    • Speech Sciences / v.13 no.1 / pp.95-105 / 2006
  • The primary goal of therapeutic intervention in dysarthric speakers is to increase speech intelligibility. Determining the critical features that increase intelligibility is very important in speech therapy. The purpose of this study is to examine the effects of pitch increasing training (PIT) on the speech of a subject with Parkinson's disease (PD). The PIT program focuses on increasing pitch while a vowel is sustained at the same loudness. The loudness level is somewhat higher than the habitual loudness. A 67-year-old female with PD participated in the study. Speech therapy was conducted in 4 sessions (200 minutes) over one week. Before and after the treatment, acoustic, perceptual, and speech naturalness evaluations were performed for data analysis. A speech and voice satisfaction index (SVSI) was obtained after the treatment. Results showed improvements in voice quality and speech naturalness. In addition, the satisfaction ratings (SVSI) indicated a positive relationship between improved speech production and the satisfaction of the patient and caregivers.


Developing a Korean Standard Speech DB (한국인 표준 음성 DB 구축)

  • Shin, Jiyoung;Jang, Hyejin;Kang, Younmin;Kim, Kyung-Wha
    • Phonetics and Speech Sciences / v.7 no.1 / pp.139-150 / 2015
  • The purpose of this study is to develop a speech corpus for standard Korean speech. For the samples to viably represent the state of spoken Korean, demographic factors were considered to ensure a balanced spread of age, gender, and dialects. Nine separate regional dialects were categorized, and five age groups were established, covering individuals from their 20s to their 60s. A speech-sample collection protocol was developed for this study in which each speaker performs five tasks: two reading tasks, two semi-spontaneous speech tasks, and one spontaneous speech task. This configuration of sample collection allows rich and well-balanced speech samples to be gathered across various speech types, and is expected to improve the utility of the speech corpus developed in this study. Samples from 639 individuals were collected using the protocol. Speech samples were also collected from other sources, for a combined total of samples from 1,012 individuals. The data accumulated in this database will be used to develop a speaker identification system. It may also be applied in, but is not limited to, the fields of phonetic studies, sociolinguistics, and language pathology. We plan to supplement the large-scale speech corpus next year, in terms of research methodology and content, to better answer the needs of diverse fields.

Speech Intelligibility and Vowel Space Characteristics of Alaryngeal Speech (무후두음성의 말 명료도와 모음 공간 특성)

  • Shim, Hee-Jeong;Jang, Hyo-Ryung;Ko, Do-Heung
    • Phonetics and Speech Sciences / v.5 no.4 / pp.17-24 / 2013
  • This study aims to identify the speech characteristics associated with different voice rehabilitation techniques used by twenty-six male patients who had undergone total or partial laryngectomy. The speech intelligibility of standard esophageal speech (SE), tracheoesophageal speech (TE), and electrolarynx speech (EL) was measured using the CSL, and eleven listeners were instructed to rate the speech on a 5-point scale. Vowel space parameters such as vowel space, VAI, FCR, and F2 ratio were measured by averaging 5 repetitions of each vowel (/a/, /e/, /i/, /u/) and entering the results into the parameter formulas. The results showed statistically significant differences in speech intelligibility and vowel space between SE and TE. The speech intelligibility and vowel space of TE were higher than those of SE or EL, and there was a high correlation between speech intelligibility and some parameters (vowel space, VAI, F2 ratio). The results also showed that the speech characteristics of TE were the most similar to those of the normal group when compared with SE and EL, but still deviated considerably from laryngeal speech. This was due to insufficient airflow intake into the esophagus when producing sounds, and because articulation movement was carried out differently among groups. These findings will contribute to establishing a baseline for speech characteristics in voice rehabilitation for patients with alaryngeal speech.
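The abstract above names the vowel space parameters but does not state the formulas. The sketch below assumes the definitions commonly used in the literature: FCR = (F2u + F2a + F1i + F1u) / (F2i + F1a), VAI as its reciprocal, F2 ratio = F2i / F2u, and vowel space as the area of the /i/-/e/-/a/-/u/ quadrilateral in the F1-F2 plane. The formant values are hypothetical and are not data from the study.

```python
def polygon_area(points):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    for k in range(len(points)):
        x1, y1 = points[k]
        x2, y2 = points[(k + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def vowel_metrics(formants):
    """Compute vowel space metrics from mean (F1, F2) values in Hz per vowel."""
    f1a, f2a = formants["a"]
    f1i, f2i = formants["i"]
    f1u, f2u = formants["u"]
    return {
        # Formant centralization ratio: rises as vowels centralize.
        "FCR": (f2u + f2a + f1i + f1u) / (f2i + f1a),
        # Vowel articulation index: reciprocal of FCR.
        "VAI": (f2i + f1a) / (f1i + f1u + f2u + f2a),
        "F2_ratio": f2i / f2u,
        # Quadrilateral area, vertices ordered around the perimeter.
        "area": polygon_area([formants[v] for v in ("i", "e", "a", "u")]),
    }

# Hypothetical mean formants in Hz, e.g. averaged over 5 repetitions per vowel.
formants = {
    "a": (800.0, 1300.0),
    "e": (500.0, 2000.0),
    "i": (300.0, 2300.0),
    "u": (350.0, 900.0),
}
metrics = vowel_metrics(formants)
```

With these definitions, a more centralized vowel system yields a higher FCR and a lower VAI, F2 ratio, and area.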