Title/Summary/Keyword: Speech Recording

Search results: 97

The Korean Corpus of Spontaneous Speech

  • Yun, Weonhee; Yoon, Kyuchul; Park, Sunwoo; Lee, Juhee; Cho, Sungmoon; Kang, Ducksoo; Byun, Koonhyuk; Hahn, Hyeseung; Kim, Jungsun
  • Phonetics and Speech Sciences, v.7 no.2, pp.103-109, 2015
  • This paper describes the development of the Korean corpus of spontaneous speech, also called the Seoul corpus. The corpus contains audio recordings of interview-style spontaneous speech from 40 native speakers of Seoul Korean. The talkers are divided into four age groups: teens, twenties, thirties, and forties. Each age group has ten talkers, five male and five female. The method used to elicit and record the speech is described. The corpus, containing around 220,000 phrasal words, was phonemically labeled along with boundary information for Korean phrasal words and utterances, which were additionally romanized. In a test of labeling consistency, inter-labeler agreement on phoneme identification was 98.1% and the mean deviation in boundary placement was 9.04 msec. The corpus will be made available for free to the research community in March 2015.
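
The two consistency figures reported above (phoneme agreement and mean boundary deviation) are straightforward to compute from paired labeler output; a minimal sketch, with hypothetical label sequences and boundary times rather than actual Seoul corpus data:

```python
def label_agreement(labels_a, labels_b):
    """Fraction of positions where two labelers chose the same phoneme."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

def mean_boundary_deviation(bounds_a, bounds_b):
    """Mean absolute difference (seconds) between paired boundary marks."""
    diffs = [abs(a - b) for a, b in zip(bounds_a, bounds_b)]
    return sum(diffs) / len(diffs)

# Hypothetical output from two labelers for one short utterance.
labels_a = ["k", "a", "m", "s", "a"]
labels_b = ["k", "a", "m", "s", "e"]          # one disagreement
bounds_a = [0.00, 0.08, 0.15, 0.24, 0.31]     # boundary times (s)
bounds_b = [0.00, 0.09, 0.15, 0.25, 0.30]
```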

Noise Spectrum Estimation Using Line Spectral Frequencies for Robust Speech Recognition

  • Jang, Gil-Jin; Park, Jeong-Sik; Kim, Sang-Hun
  • The Journal of the Acoustical Society of Korea, v.31 no.3, pp.179-187, 2012
  • This paper presents a novel method for estimating a reliable noise spectral magnitude for acoustic background noise suppression when only a single microphone recording is available. The proposed method finds noise estimates from spectral magnitudes measured at line spectral frequencies (LSFs), based on the observation that adjacent LSFs lie near the peak frequencies and isolated LSFs lie close to the relatively flat valleys of LPC spectra. The parameters used in the proposed method are the LPC coefficients, their corresponding LSFs, and the gain of the LPC residual signal, so the method is well suited to LPC-based speech coders.
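
The underlying observation (adjacent LSFs cluster near LPC spectral peaks, isolated LSFs near the flat valleys) can be illustrated with a toy LPC-to-LSF conversion. This is a generic sketch, not the authors' estimator, and the analysis frame below is synthetic:

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients A(z) via the autocorrelation (normal-equations) method."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))        # A(z) = 1 - sum_k a_k z^{-k}

def lsf(a):
    """Line spectral frequencies (radians) of an LPC polynomial A(z):
    angles of the unit-circle roots of the sum and difference polynomials."""
    ext = np.concatenate((a, [0.0]))
    angles = []
    for poly in (ext + ext[::-1], ext - ext[::-1]):
        for root in np.roots(poly):
            ang = np.angle(root)
            if 1e-3 < ang < np.pi - 1e-3:     # skip the fixed roots at z = +/- 1
                angles.append(ang)
    return np.sort(angles)

# Hypothetical single-microphone frame: two "formants" plus weak noise.
rng = np.random.default_rng(0)
n = np.arange(400)
frame = np.sin(0.3 * n) + 0.5 * np.sin(1.1 * n) + 0.05 * rng.standard_normal(400)
freqs = lsf(lpc(frame, 8))
```

For an order-8 predictor the conversion yields eight LSFs in (0, π); closely spaced pairs should appear near the two sinusoid frequencies (0.3 and 1.1 rad/sample), with the remaining, more isolated LSFs in the flatter regions between them.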

Speech Corpus for Korean as a Foreign Language and the Aspects of the Foreign Learners' Acquisition of the Phonetic and Phonological Systems in the Korean Language (외국어로서의 한국어 음성 코퍼스 구축과 이를 통한 외국인의 한국어 음성·음운체계 습득 양상 연구)

  • Rhee, Seok-Chae; Kim, Jeong-Ah; Chang, Chae-Woong
  • Proceedings of the KSPS conference, 2005.04a, pp.29-33, 2005
  • This study aims to establish a speech corpus of Korean as a foreign language (L2 Korean Speech Corpus, L2KSC) and to examine how foreign learners acquire the phonetic and phonological systems of the Korean language. In the first year of the project, L2KSC will be built through reading-list organization, recording, and slicing; the second year includes an in-depth study of foreign learners' acquisition of Korean and a contrastive analysis of phonetic and phonological systems. This project is expected to provide a significant basis for a variety of fields such as Korean language education, academic research, and the technological development of phonetic information.

Unit Generation Based on Phrase Break Strength and Pruning for Corpus-Based Text-to-Speech

  • Kim, Sang-Hun; Lee, Young-Jik; Hirose, Keikichi
  • ETRI Journal, v.23 no.4, pp.168-176, 2001
  • This paper discusses two important issues in corpus-based synthesis: synthesis unit generation based on phrase break strength information, and pruning of redundant synthesis unit instances. First, a new sentence set for recording was designed to build an efficient synthesis database reflecting the characteristics of the Korean language. To obtain prosodic-context-sensitive units, we graded major prosodic phrases into five distinct levels according to pause length and then discriminated intra-word triphones using these levels. Using the synthesis units with phrase break strength information, synthetic speech was generated and evaluated subjectively. Second, a new pruning method based on weighted vector quantization (WVQ) was proposed to eliminate redundant synthesis unit instances from the synthesis database. WVQ takes the relative importance of each instance into account when clustering similar instances with the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective evaluations of synthetic speech quality: one that simply limits the maximum number of instances, and another based on normal VQ-based clustering. For the same reduction in instance count, the proposed method showed the best performance. Synthetic speech at a 45% reduction rate showed almost no perceptible degradation compared with synthetic speech generated without instance reduction.
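
A minimal sketch of the WVQ pruning idea under stated assumptions: instances are clustered with importance weights folded into the centroid update, and the instance nearest each weighted centroid is retained. The feature vectors and weights below are hypothetical, and the initialization is deterministic for illustration:

```python
import numpy as np

def wvq_prune(instances, weights, n_keep, iters=20):
    """Prune unit instances by weighted vector quantization: cluster into
    n_keep groups, weighting each instance by its importance, then keep
    the instance nearest each weighted centroid."""
    X = np.asarray(instances, float)
    w = np.asarray(weights, float)
    centroids = X[:n_keep].copy()          # deterministic init for illustration
    for _ in range(iters):
        # assign each instance to its nearest centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # weighted centroid update: important instances pull harder
        for k in range(n_keep):
            mask = labels == k
            if mask.any():
                centroids[k] = np.average(X[mask], axis=0, weights=w[mask])
    # keep one representative instance per cluster
    keep = []
    for k in range(n_keep):
        idx = np.where(labels == k)[0]
        if len(idx):
            local = ((X[idx] - centroids[k]) ** 2).sum(1)
            keep.append(int(idx[local.argmin()]))
    return sorted(keep)

# Hypothetical 2-D features for six instances of one synthesis unit.
feats = [[0, 0], [0.1, 0], [5, 5], [5.1, 5], [10, 0], [10, 0.2]]
kept = wvq_prune(feats, weights=[1, 2, 1, 1, 3, 1], n_keep=3)
```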

Speech recognition rates and acoustic analyses of English vowels produced by Korean students

  • Yang, Byunggon
  • Phonetics and Speech Sciences, v.14 no.2, pp.11-17, 2022
  • English vowels play an important role in verbal communication. However, Korean students tend to experience difficulty pronouncing a certain set of vowels despite extensive education in English. The aim of this study is to apply speech recognition software to evaluate Korean students' pronunciation of English vowels in minimal-pair words and then to examine the acoustic characteristics of the pairs in order to identify their pronunciation problems. Thirty female Korean college students participated in the recording. Speech recognition rates were obtained to examine which English vowels were pronounced correctly. To compare with and verify the recognition results, acoustic measurements such as the first and second formant trajectories and durations were also collected using Praat. The results showed an overall recognition rate of 54.7%. Some students incorrectly switched tense and lax counterparts and produced the same vowel sound for qualitatively different English vowels. In the acoustic analysis of the vowel formant trajectories, some of these vowel pairs almost overlapped or exhibited only slight acoustic differences at the majority of the measurement points. On the other hand, statistical analyses of the first formant trajectories of three vowel pairs revealed significant differences throughout the measurement points, a finding that requires further investigation. Durational comparisons revealed a consistent pattern among the vowel pairs. The author concludes that speech recognition and analysis software can be useful for diagnosing the pronunciation problems of English-language learners.
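
One way to quantify how far apart a minimal pair sits acoustically, in the spirit of the trajectory measurements above, is to sample each formant track at fixed proportional time points and take the mean separation. A sketch with hypothetical F1 values (Hz), not the study's data:

```python
import numpy as np

def trajectory_points(track, n_points=5):
    """Sample a formant track at n_points equally spaced proportional
    positions over the vowel's duration."""
    track = np.asarray(track, float)
    idx = np.linspace(0, len(track) - 1, n_points).round().astype(int)
    return track[idx]

def mean_separation(track_a, track_b, n_points=5):
    """Mean absolute formant difference (Hz) at the measurement points;
    values near zero suggest the vowel pair is merged."""
    pa = trajectory_points(track_a, n_points)
    pb = trajectory_points(track_b, n_points)
    return float(np.abs(pa - pb).mean())

# Hypothetical F1 tracks (Hz) for one tense/lax minimal pair.
f1_a = [300, 310, 320, 330, 340]
f1_b = [420, 430, 440, 450, 455]
sep = mean_separation(f1_a, f1_b)
```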

Effects of Age and Type of Stimulus on the Cortical Auditory Evoked Potential in Healthy Malaysian Children

  • Mukari, Siti Zamratol-Mai Sarah; Umat, Cila; Chan, Soon Chien; Ali, Akmaliza; Maamor, Nashrah; Zakaria, Mohd Normani
  • Korean Journal of Audiology, v.24 no.1, pp.35-39, 2020
  • Background and Objectives: The cortical auditory evoked potential (CAEP) is a useful objective test for diagnosing hearing loss and auditory disorders. Prior to its clinical applications in the pediatric population, the possible influences of fundamental variables on the CAEP should be studied. The aim of the present study was to determine the effects of age and type of stimulus on the CAEP waveforms. Subjects and Methods: Thirty-five healthy Malaysian children aged 4 to 12 years participated in this repeated-measures study. The CAEP waveforms were recorded from each child using a 1 kHz tone burst and the speech syllable /ba/. Latencies and amplitudes of P1, N1, and P2 peaks were analyzed accordingly. Results: Significant negative correlations were found between age and speech-evoked CAEP latency for each peak (p<0.05). However, no significant correlations were found between age and tone-evoked CAEP amplitudes and latencies (p>0.05). The speech syllable /ba/ produced a higher mean P1 amplitude than the 1 kHz tone burst (p=0.001). Conclusions: The CAEP latencies recorded with the speech syllable became shorter with age. While both tone-burst and speech stimuli were appropriate for recording the CAEP, significantly bigger amplitudes were found in speech-evoked CAEP. The preliminary normative CAEP data provided in the present study may be beneficial for clinical and research applications in Malaysian children.
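
The age-latency relationship reported above is a correlation analysis; a minimal sketch computing a Pearson coefficient on hypothetical ages and P1 latencies (the study's raw data are not reproduced here):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two variables."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum()))

# Hypothetical ages (years) and speech-evoked P1 latencies (ms):
# latency shortening with age gives a strong negative correlation.
ages = [4, 5, 6, 7, 8, 9, 10, 11, 12]
p1_ms = [130, 128, 125, 124, 120, 119, 116, 115, 112]
r = pearson_r(ages, p1_ms)
```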

Speech Coarticulation Database of Korean and English (한·영 동시조음 데이터베이스의 구축)

  • ;Stephen A. Dyer; Dwight D. Day
  • The Journal of the Acoustical Society of Korea, v.18 no.3, pp.17-26, 1999
  • We present the first speech coarticulation database of Korean, English, and Konglish, named "SORIDA", which is designed to cover the maximum number of representations of coarticulation in these languages [1]. SORIDA is a compact database designed to contain a maximum number of triphones in a minimum number of prompts. It contains all consonantal triphones and vowel allophones in 682 Korean word-length prompts and 717 English prompt words, spoken five times each by speakers balanced for gender, dialect, and age. The Korean prompts are synthesized lexical items that maximize coarticulation variation while disregarding stress phenomena, whereas the English prompts are natural words that fully reflect stress effects on coarticulation variation; the prompts are designed differently because English phonology has stress while Korean does not. An intermediate language, Konglish, was also modeled by two Korean speakers reading the 717 English prompt words. Recording was done in a controlled laboratory environment with an AKG Model C-100 microphone and a Fostex D-5 digital audio tape (DAT) recorder; the total recording time was four hours. The SORIDA CD-ROM is available on one disc at a 22.05 kHz sampling rate with 16-bit samples, and the SORIDA digital audio tapes are available on four 124-minute tapes at a 48 kHz sampling rate. SORIDA's list of phonetically rich words is also available in English and Korean.
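
The design goal of a maximum number of triphones in a minimum number of prompts can be checked by counting triphone coverage over a prompt list; a toy sketch with made-up romanized prompts, not SORIDA's actual inventory:

```python
def triphones(phones):
    """All consecutive phone triples occurring in one prompt."""
    return {tuple(phones[i:i + 3]) for i in range(len(phones) - 2)}

def coverage(prompts, target):
    """Fraction of a target triphone inventory covered by the prompts."""
    covered = set()
    for prompt in prompts:
        covered |= triphones(prompt)
    return len(covered & target) / len(target)

# Made-up prompts and a tiny target inventory.
prompts = [["k", "a", "m", "s", "a"], ["s", "a", "r", "a", "m"]]
target = {("k", "a", "m"), ("a", "m", "s"), ("s", "a", "r"), ("t", "o", "k")}
cov = coverage(prompts, target)
```

Prompt design then becomes a set-cover problem: greedily pick the prompt that adds the most uncovered triphones until the target inventory is exhausted.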

An acoustical analysis method of numeric sounds by Praat (Praat를 이용한 숫자음의 음향적 분석법)

  • Yang, Byung-Gon
  • Speech Sciences, v.7 no.2, pp.127-137, 2000
  • This paper presents a macro script for analyzing numeric sounds with the speech analysis software Praat, and analyzes such sounds produced by three students who were born and raised in Pusan. Recording was done in a quiet office. To make meaningful comparisons, dynamic time points relative to the total duration of the voiced segments were determined for measuring acoustic values. Results showed a strong correlation between repeated productions of the numeric sounds within and across speakers. Very high coefficients were observed for the diphthongal numbers (0 and 6), which usually show wide formant variation; this suggests that each speaker produced the numbers quite consistently. The frequency differences between the three subjects also fell within a perceptually similar range, so identifying a speaker among others may require finding subtle individual differences within this range. Perceptual experiments with synthesized numeric sounds may help resolve this issue.
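
The measurement scheme described above, with time points fixed as proportions of each voiced segment's duration so that tokens of different lengths remain comparable, reduces to a one-line calculation; the segment times below are hypothetical:

```python
def measurement_times(t_start, t_end, proportions=(0.2, 0.5, 0.8)):
    """Measurement times at fixed proportions of a voiced segment,
    so productions of different durations can be compared point by point."""
    duration = t_end - t_start
    return [t_start + p * duration for p in proportions]

# A hypothetical numeric sound voiced from 0.10 s to 0.60 s.
times = measurement_times(0.10, 0.60)
```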

Introduction to the Spectrum and Spectrogram (스팩트럼과 스팩트로그램의 이해)

  • Jin, Sung-Min
  • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.19 no.2, pp.101-106, 2008
  • Once the speech signal has been put into a form suitable for storage and analysis by computer, several different operations can be performed. Filtering, sampling, and quantization are the basic operations in digitizing a speech signal. The waveform can be displayed, measured, and even edited, and spectra can be computed using methods such as the Fast Fourier Transform (FFT), Linear Predictive Coding (LPC), cepstral analysis, and filtering. The digitized signal can also be used to generate spectrograms, and the spectrograph provides major advantages for the study of speech. The author therefore introduces the basic techniques of acoustic recording and digital signal processing, and the principles of the spectrum and spectrogram.
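
The spectrogram generation step described above is a short-time FFT; a minimal sketch (Hann window, magnitude in dB) using a synthetic 1 kHz tone rather than recorded speech:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: Hann-windowed frames -> FFT -> dB."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = np.abs(np.fft.rfft(signal[start:start + frame_len] * window))
        frames.append(20 * np.log10(spectrum + 1e-12))   # avoid log(0)
    return np.array(frames)        # shape: (n_frames, frame_len // 2 + 1)

# A 1 kHz tone at an 8 kHz sampling rate: bin width is 8000/256 = 31.25 Hz,
# so the energy concentrates in bin 1000 / 31.25 = 32.
fs = 8000
t = np.arange(fs) / fs
S = spectrogram(np.sin(2 * np.pi * 1000 * t))
```

A real analysis would plot S with time on one axis and frequency (bin index × fs / frame_len) on the other.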
