• Title/Summary/Keyword: Korean speech

Current Status and Perspectives of Telepractice in Voice and Speech Therapy (비대면 음성언어치료의 현황과 전망)

  • Lee, Seung Jin
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.33 no.3 / pp.130-141 / 2022
  • Voice and speech therapy is generally performed face to face, although it can be delivered in various ways depending on the situation. Telepractice refers to the provision of specialized voice and speech therapy by speech-language pathologists for assessment, therapy, and counseling from a remote location by means of telecommunication technology. Recently, due to the pandemic and the active use of non-face-to-face platforms, interest in telepractice for voice and speech therapy has increased. Moreover, a growing body of literature supports its clinical usefulness and its non-inferiority to traditional face-to-face intervention. In this review, the existing discussions, guidelines, and preliminary studies on non-face-to-face voice and speech therapy were summarized, and recommendations on tools for telepractice were provided.

Automatic Detection of Intonational and Accentual Phrases in Korean Standard Continuous Speech (한국 표준어 연속음성에서의 억양구와 강세구 자동 검출)

  • Lee, Ki-Young;Song, Min-Suck
    • Speech Sciences / v.7 no.2 / pp.209-224 / 2000
  • This paper proposes a method for automatically detecting intonational and accentual phrases in continuous speech in standard Korean. We use pauses longer than 150 msec to detect intonational phrases, and extract accentual phrases from the intonational phrases by analyzing syllables and pitch contours. The speech data for the experiment consist of seven male voices and two female voices reading the fable 'the ant and the grasshopper' and a newspaper article 'manmulsang' at normal speed in standard Korean. The results of the experiment show that the detection rate is 95% on average for intonational phrases and 73% for accentual phrases. These detection rates imply that continuous speech can be segmented into smaller units (i.e., prosodic phrases) using prosodic information, so that the objects of speech recognition can be narrowed down to words or phrases in continuous speech.
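
The pause-based step described in this abstract can be illustrated with a minimal sketch. Only the 150 msec pause length comes from the abstract; the function name `split_on_pauses`, the per-frame energy input, the 10 msec frame length, and the silence threshold are assumptions made for illustration, and the paper's actual pause detector may differ.

```python
from typing import List, Tuple


def split_on_pauses(
    frame_energy: List[float],
    frame_ms: float = 10.0,           # assumed frame length, not from the paper
    silence_threshold: float = 0.01,  # assumed energy floor, not from the paper
    min_pause_ms: float = 150.0,      # pause length quoted in the abstract
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, of stretches of speech
    separated by pauses longer than min_pause_ms."""
    segments: List[Tuple[int, int]] = []
    seg_start = None    # first frame of the phrase currently being built
    last_speech = None  # index of the most recent non-silent frame
    silent_run = 0      # number of consecutive silent frames seen so far
    for i, energy in enumerate(frame_energy):
        if energy > silence_threshold:
            if seg_start is None:
                seg_start = i
            last_speech = i
            silent_run = 0
        else:
            silent_run += 1
            # Close the phrase once the silence exceeds the pause length.
            if seg_start is not None and silent_run * frame_ms > min_pause_ms:
                segments.append((seg_start, last_speech + 1))
                seg_start = None
    if seg_start is not None:
        segments.append((seg_start, last_speech + 1))
    return segments
```

In this sketch a 100 msec silence is kept inside a phrase, while a 200 msec silence closes the phrase at its last speech frame. Extracting accentual phrases from syllables and pitch contours is a separate step that is not covered here.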

Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech (HMM 기반의 한국어 합성음에 대한 PESQ 및 MOS 평가의 상관도 분석)

  • Lin, Cang-Song;Bae, Keun-Sung
    • Phonetics and Speech Sciences / v.2 no.1 / pp.71-75 / 2010
  • PESQ is an objective speech quality measure that is known to correlate highly with subjective speech quality measures such as MOS. To examine whether it could be useful as an objective quality measure for synthetic speech, we carried out subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ on HMM-based Korean synthetic speech signals, and analyzed the correlation between them. Experimental results show that PESQ has correlations of 0.87 with MOS and 0.92 with DMOS. This suggests that PESQ holds much promise for evaluating the quality of synthetic Korean speech.
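
The kind of correlation analysis reported here can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption, and the scores in the example are hypothetical placeholders, not data from the paper.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-utterance scores, for illustration only.
pesq_scores = [1.8, 2.3, 2.9, 3.4, 3.9]
mos_scores = [2.0, 2.4, 3.1, 3.3, 4.2]
r = pearson_r(pesq_scores, mos_scores)
```

A value of r close to 1 would indicate that the objective scores track the listeners' ratings closely.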

A Preliminary Study on Voice Symptoms and Korean Voice Handicap Index of Speech Language Pathologists (언어치료사의 음성증상 및 한국어판 음성장애지수에 대한 예비연구)

  • Song, Yun-Kyung;Pyo, Hwa-Young
    • Phonetics and Speech Sciences / v.2 no.2 / pp.123-133 / 2010
  • Speech language pathologists depend on their voice for their livelihood and are a high-risk group for voice disorders, but few studies have examined the prevalence of voice symptoms or the voice handicap index in this group. This study aimed to evaluate the prevalence of voice symptoms and the Korean voice handicap index in 86 speech language pathologists and 90 individuals employed in other occupations. We analyzed self-reported voice symptoms and the voice handicap index using a questionnaire. The results showed that the prevalence of voice symptoms among speech language pathologists was 60.5%, and that the voice handicap index scores of the speech language pathologist group were significantly higher than those of the control group in the physical and total scores. We also found that alcohol history was a risk factor for voice symptoms. These findings indicate that a special vocal hygiene program for speech language pathologists is necessary, as are follow-up studies comparing the prevalence of voice symptoms and the voice handicap index with those of other professional voice users.

The Korean Text-to-speech Using Syllable Units (음절 단위를 이용한 한국어 음성 합성)

  • 김병수;윤기선;박성한
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.1 / pp.143-150 / 1990
  • In this paper, a rule-based method for improving the intelligibility of synthetic speech is proposed. A 12-pole linear predictive coding (LPC) method is used to model syllable speech signals. A syllable concatenation rule for pauses and frame rejection between syllables is developed to improve the naturalness of the synthetic speech. In addition, a phonological structure transformation rule and a prosody rule are applied to the LPC-synthesized speech. The illustrative results demonstrate that the synthetic speech obtained by applying these rules is more natural than speech synthesized by LPC alone.
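
As a rough illustration of what a 12-pole LPC model involves, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. Only the order of 12 comes from the abstract; the choice of estimation method, the function names, and the omission of windowing and pre-emphasis are assumptions, and the paper's analysis procedure may differ.

```python
from typing import List, Sequence


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(x: Sequence[float], order: int = 12) -> List[float]:
    """Return a[1..order] such that x[n] is approximated by
    sum over k of a[k] * x[n - k] (Levinson-Durbin recursion)."""
    r = autocorrelation(x, order)
    if r[0] == 0:
        raise ValueError("frame has zero energy")
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy at the current order
    for i in range(1, order + 1):
        if err <= 0:
            break  # frame is already perfectly predicted; leave the rest at 0
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]
```

For a frame that decays as 0.9 to the power n, a second-order fit returns a first coefficient close to 0.9 and a second close to 0, which is a quick sanity check on the recursion.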

Phonetic Evaluation in Speech Sciences and Issues in Phonetic Transcription (음성 평가의 다학문적 현황과 표기의 과제)

  • Kim, Jong-Mi
    • Speech Sciences / v.10 no.2 / pp.259-280 / 2003
  • The paper discusses how speech sounds are evaluated and transcribed in various fields of the speech sciences, and suggests ways to achieve more accurate transcription. The academic fields explored are phonetics, speech processing, speech pathology, and foreign language education. The discussion centers on the International Phonetic Alphabet (IPA), the convention most commonly used in these fields, and on other less widely accepted transcription conventions such as the TOnes and Break Indices (ToBI), the Speech Assessment Methods Phonetic Alphabet (SAMPA), an extension of the official Korean Romanization (KORBET), and the American-English transcription system in the TIMIT database (TIMITBET). These transcription conventions are examined for Korean, English, and Korean-accented English. The paper demonstrates that each transcription convention can be recommended for a specific need in a particular academic field. Because it is so widely known, the IPA is best suited for phonetic evaluation in the fields of phonetics, speech pathology, and foreign language education. The other conventions are useful for keyboard entry of phonetically evaluated data from all these fields, as well as for sound transcription in speech engineering, because they use letter symbols that are convenient for typing, searching, and programming. Several practical suggestions are made for maintaining transcriptional efficiency and consistency in the face of intra- and inter-transcriber variability.

Comparison of HMM models and various cepstral coefficients for Korean whispered speech recognition (은닉 마코프 모델과 켑스트럴 계수들에 따른 한국어 속삭임의 인식 비교)

  • Park, Chan-Eung
    • 전자공학회논문지 IE / v.43 no.2 / pp.22-29 / 2006
  • The use of whispered speech has recently increased because of mobile phones, and with it the need for whispered speech recognition. To find a suitable recognition system for whispered speech, various feature vectors commonly used in speech recognition are applied to three kinds of HMMs: normal speech models, whispered speech models, and integrated models with normal speech and whispered speech. The recognition experiments show that the recognition rate for whispered speech applied to normal speech models is too low for practical applications, whereas separate whispered speech models recognize whispered speech with the highest rates, at least 85%. Integrated models with normal speech and whispered speech also achieve acceptable recognition rates, but more study is needed to increase them. MFCC and PLCC feature vectors score higher recognition rates when applied to separate whispered speech models, but PLCC is the best when applied to integrated models with normal speech and whispered speech.

Adaptive Band Selection for Robust Speech Detection In Noisy Environments

  • Ji Mikyong;Suh Youngjoo;Kim Hoirin
    • MALSORI / no.50 / pp.85-97 / 2004
  • One of the important problems in speech recognition is to accurately detect the presence of speech in adverse environments. The speech detection problem becomes more severe when recognition systems are used over the telephone network, especially over a wireless network and in a noisy environment. In this paper, we propose a robust speech detection algorithm that detects speech boundaries accurately by selecting useful bands adaptively according to the noisy environment. The bands in which noise is mainly distributed, called noise-centric bands, are introduced. We compare two different speech detection algorithms with the proposed algorithm and evaluate them in noisy environments. The experimental results show that the proposed speech detection algorithm performs best.
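
The idea of discarding noise-dominated bands before making a speech decision can be sketched as below. The abstract does not give the selection rule, so everything in this example is an assumption for illustration: the function name `detect_speech`, the use of leading frames as a noise-only estimate, the number of bands dropped, and the threshold ratio.

```python
from typing import List


def detect_speech(
    band_energy: List[List[float]],  # band_energy[t][b]: energy of band b in frame t
    noise_frames: int = 5,           # assumed: the leading frames contain noise only
    drop_bands: int = 1,             # assumed: how many noise-centric bands to discard
    ratio: float = 3.0,              # assumed: decision threshold over the noise floor
) -> List[bool]:
    """Flag each frame as speech (True) or non-speech (False)."""
    if len(band_energy) < noise_frames or noise_frames < 1:
        raise ValueError("not enough frames to estimate the noise")
    n_bands = len(band_energy[0])
    if not 0 <= drop_bands < n_bands:
        raise ValueError("drop_bands must leave at least one band")
    # Average energy per band over the leading noise-only frames.
    noise = [sum(frame[b] for frame in band_energy[:noise_frames]) / noise_frames
             for b in range(n_bands)]
    # Treat the bands carrying the most noise energy as noise-centric.
    noisiest = sorted(range(n_bands), key=lambda b: noise[b], reverse=True)[:drop_bands]
    kept = [b for b in range(n_bands) if b not in noisiest]
    floor = sum(noise[b] for b in kept)
    # A frame counts as speech when the kept bands rise well above the noise floor.
    return [sum(frame[b] for b in kept) > ratio * floor for frame in band_energy]
```

In this sketch, when one band carries most of the noise, dropping it lets a modest rise in the remaining bands cross the threshold, whereas the same rise can be masked if all bands are summed.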

Accurate Speech Detection based on Sub-band Selection for Robust Keyword Recognition (강인한 핵심어 인식을 위해 유용한 주파수 대역을 이용한 음성 검출기)

  • Ji Mikyong;Kim Hoirin
    • Proceedings of the KSPS conference / 2002.11a / pp.183-186 / 2002
  • Speech detection is one of the important problems in real-time speech recognition, and the accurate detection of speech boundaries is crucial to the performance of a speech recognizer. In this paper, we propose a speech detector based on Mel-band selection through training. To evaluate the proposed algorithm, we compare it with a conventional one, called EPD-VAA (EndPoint Detector based on Voice Activity Detection). The proposed speech detector is trained to extract keyword speech better than other speech. EPD-VAA usually works well at high SNR but no longer works well at low SNR. The proposed algorithm, in contrast, pre-selects useful bands through keyword training and decides the speech boundary according to the energy level of the previously selected sub-bands. The experimental results show that the proposed algorithm outperforms EPD-VAA.

The Effects of Speaking Mode on Intelligibility of Dysarthric Speech (뇌성마비 성인의 발화유형에 따른 명료도)

  • Kim, Soo-Jin;Ko, Hyun-Ju
    • Phonetics and Speech Sciences / v.1 no.4 / pp.171-176 / 2009
  • Intelligibility measurement is one criterion for assessing the severity of speech disorders, especially in persons with dysarthria. Rate control, usually rate reduction, is used with many dysarthric speakers to improve their intelligibility. The purpose of this study is to compare how the intelligibility of speech produced by speakers with cerebral palsy changes across three speaking conditions. Speech samples were collected from 10 adults with cerebral palsy, who were asked to speak under three conditions: (1) naturally (control), (2) more slowly (rate control), and (3) louder and more accurately (clear speech). In a perception test, a group of three judges listened to the speech samples and wrote down whatever they heard. The results showed that the subjects fell into two subgroups according to how their intelligibility changed across the three speaking conditions. For some subjects, speech intelligibility increased greatly when they were asked to speak 'louder and more accurately', while the others showed no difference in intelligibility across the speaking conditions. This study suggests that it would be clinically useful to find the instruction that best improves intelligibility for each speaker with cerebral palsy.
