• 제목/요약/키워드: Speech Corpus

검색결과 300건 처리시간 0.021초

한국인 표준 음성 DB 구축 (Developing a Korean Standard Speech DB)

  • 신지영;장혜진;강연민;김경화
    • 말소리와 음성과학
    • /
    • 제7권1호
    • /
    • pp.139-150
    • /
    • 2015
  • The data accumulated in this database will be used to develop a speaker identification system. This may also be applied towards, but not limited to, fields of phonetic studies, sociolinguistics, and language pathology. We plan to supplement the large-scale speech corpus next year, in terms of research methodology and content, to better answer the needs of diverse fields. The purpose of this study is to develop a speech corpus for standard Korean speech. For the samples to viably represent the state of spoken Korean, demographic factors were considered to modulate a balanced spread of age, gender, and dialects. Nine separate regional dialects were categorized, and five age groups were established from individuals in their 20s to 60s. A speech-sample collection protocol was developed for the purpose of this study where each speaker performs five tasks: two reading tasks, two semi-spontaneous speech tasks, and one spontaneous speech task. This particular configuration of sample data collection accommodates gathering of rich and well-balanced speech-samples across various speech types, and is expected to improve the utility of the speech corpus developed in this study. Samples from 639 individuals were collected using the protocol. Speech samples were collected also from other sources, for a combined total of samples from 1,012 individuals.

한국인의 영어 음성 코퍼스 설계 및 구축 (Design and Construction of Korean-Spoken English Corpus(K-SEC))

  • 이석재;이숙향;강석근;이용주
    • 대한음성학회지:말소리
    • /
    • 제46호
    • /
    • pp.159-174
    • /
    • 2003
  • K-SEC (Korean-Spoken English Corpus) is a kind of speech database that is being under construction by the authors of this paper This article discusses the needs of the K-SEC from various academic disciplines and industrial circles, and it introduces the characteristics of the K-SEC design, its catalogues and contents of the recorded database, exemplifying what are being considered from both Korean and English languages' phonetics and phonologies. The K-SEC can be marked as a beginning of a parallel speech corpus, and it is suggested that a similar corpus should be enlarged for the future advancements of the experimental phonetics and the speech information technology.

  • PDF

벅아이 코퍼스를 이용한 영어 무성파열음의 VOT 연구 (A Study on the Voice Onset Time of English Voiceless Stops in the Buckeye Corpus)

  • 윤규철
    • 말소리와 음성과학
    • /
    • 제4권2호
    • /
    • pp.33-40
    • /
    • 2012
  • The purpose of this paper is to investigate the voice onset time (VOT) of the English voiceless stops [p, t, k] found in the Buckeye Corpus of Conversational Speech [1]. Three young female speakers were chosen for this study and their VOT values were semi-automatically extracted along with other factors. The factors used for the analysis were place of articulation, location in word, syllabic stress, content word or not, word frequency calculated from the corpus, and the speech rate expressed in syllables per second. Results showed that, for the three places of articulation of each speaker, all the factors had a statistically significant effect on the VOT values. This paper has significance in that the materials used for the analysis were from a corpus of spontaneous natural English speech.

Prosodic Contour Generation for Korean Text-To-Speech System Using Artificial Neural Networks

  • Lim, Un-Cheon
    • The Journal of the Acoustical Society of Korea
    • /
    • 제28권2E호
    • /
    • pp.43-50
    • /
    • 2009
  • To get more natural synthetic speech generated by a Korean TTS (Text-To-Speech) system, we have to know all the possible prosodic rules in Korean spoken language. We should find out these rules from linguistic, phonetic information or from real speech. In general, all of these rules should be integrated into a prosody-generation algorithm in a TTS system. But this algorithm cannot cover up all the possible prosodic rules in a language and it is not perfect, so the naturalness of synthesized speech cannot be as good as we expect. ANNs (Artificial Neural Networks) can be trained to learn the prosodic rules in Korean spoken language. To train and test ANNs, we need to prepare the prosodic patterns of all the phonemic segments in a prosodic corpus. A prosodic corpus will include meaningful sentences to represent all the possible prosodic rules. Sentences in the corpus were made by picking up a series of words from the list of PB (phonetically Balanced) isolated words. These sentences in the corpus were read by speakers, recorded, and collected as a speech database. By analyzing recorded real speech, we can extract prosodic pattern about each phoneme, and assign them as target and test patterns for ANNs. ANNs can learn the prosody from natural speech and generate prosodic patterns of the central phonemic segment in phoneme strings as output response of ANNs when phoneme strings of a sentence are given to ANNs as input stimuli.

벅아이 코퍼스에서의 젊은 성인 남성의 모음 포먼트 분석 (An Analysis of the Vowel Formants of the Young Males in the Buckeye Corpus)

  • 윤규철;노혜욱
    • 말소리와 음성과학
    • /
    • 제4권2호
    • /
    • pp.41-49
    • /
    • 2012
  • The purpose of this paper is to extract the vowel formants of the ten young male speakers from the Buckeye Corpus of Conversational Speech [1] and to analyze them in comparison to earlier works in terms of various phonetic factors that are expected to affect the realization of the formant distribution. The first two formant frequency values were automatically extracted with a Praat script along with such factors as the place of articulation, the content versus function word information, syllabic stress information, the location in a word, location in utterance, speech rate of three consecutive words, and the word frequency in the corpus. The results indicated that the formant patterns from the corpus were very different from those of earlier works although the overall pattern was similar and that the factors were strongly responsible for the realization of the two formants. The purpose of this paper is to extract the vowel formants of the ten young male speakers from the Buckeye Corpus of Conversational Speech [1] and to analyze them in comparison to earlier works in terms of various phonetic factors that are expected to affect the realization of the formant distribution. The first two formant frequency values were automatically extracted with a Praat script along with such factors as the place of articulation, the content versus function word information, the syllabic stress information, the location in a word, the location in an utterance, the speech rate of the three consecutive words, and the word frequency in the corpus. The result indicated that the formant patterns from the corpus were very different from those of earlier works although the overall pattern was similar and that the factors were strongly responsible for the realization of the two formants.

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal
    • /
    • 제46권1호
    • /
    • pp.71-81
    • /
    • 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data supplemented with additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.

A Comparison of Korean EFL Learners' Oral and Written Productions

  • Lee, Eun-Ha
    • 영어어문교육
    • /
    • 제12권2호
    • /
    • pp.61-85
    • /
    • 2006
  • The purpose of the present study is to compare Korean EFL learners' speech corpus (i.e. oral productions) with their composition corpus (i.e. written productions). Four college students participated in the study. The composition corpus was collected through a writing assignment, and the speech corpus was gathered by audio-taping their oral presentations. The results of the data analysis indicate that (i) As for error frequency, young adult low-intermediate Korean EFL learners showed high frequency in determiners (mostly, indefinite articles), vocabulary (mostly, semantic errors), and prepositions. The frequency order did not show much difference between the speech corpus and the composition corpus; and (ii) When comparing the oral productions with the written productions, there were not many differences between them in terms of the contents, a style (i.e., colloquial vs. literary), vocabulary selection, and error types and frequency. Therefore, it is assumed that the proficiency in oral presentation of EFL learners at this learning stage heavily depends on how much/how well they are able to write. In other words, EFL learners' writing and speaking skills are closely co-related. It implies that the teacher does not need to separate teaching how to speak from teaching how to write. The teacher may use the same methods or strategies to help the learners improve their English speaking and writing skills. Furthermore, it will be more effective to teach writing before speaking since they have more opportunities to write than speak in the EFL contexts.

  • PDF

서울 코퍼스 20대 남성의 성대진동 개시시간 연구 (A study on the voice onset times of the Seoul Corpus males in their twenties)

  • 이유리;윤규철
    • 말소리와 음성과학
    • /
    • 제8권4호
    • /
    • pp.1-8
    • /
    • 2016
  • The purpose of this work is to examine the voice onset times (VOTs) of the three types of plosives from the Seoul Corpus male speakers in their twenties. In addition, the factors known to affect VOTs were analyzed, including the place and manner of articulation, speakers, location in words, type of following vowels and speech rates calculated from the three consecutive words. Much of the findings agreed with those from earlier studies on Korean and other languages and new discoveries were made.

The fundamental frequency (f0) distribution of American speakers in a spontaneous speech corpus

  • Byunggon Yang
    • 말소리와 음성과학
    • /
    • 제16권1호
    • /
    • pp.11-16
    • /
    • 2024
  • The fundamental frequency (f0), representing an acoustic measure of vocal fold vibration, serves as an indicator of the speaker's emotional state and language-specific pattern in daily conversations. This study aimed to examine the f0 distribution in an English corpus of spontaneous speech, establishing normative data for American speakers. The corpus involved 40 participants engaging in free discussions on daily activities and personal viewpoints. Using Praat, f0 values were collected filtering outliers after removing nonspeech sounds and interviewer voices. Statistical analyses were performed with R. Results indicated a median f0 value of 145 Hz for all the speakers. The f0 values for all speakers exhibited a right-skewed, pointy distribution within a frequency range of 216 Hz from 75 Hz to 339 Hz. The female f0 range was wider than that of males, with a median of 113 Hz for males and 181 Hz for females. This spontaneous speech corpus provides valuable insights for linguists into f0 variation among individuals or groups in a language. Further research is encouraged to develop analytical and statistical measures for establishing reliable f0 standards for the general population.

벅아이 코퍼스의 모음 길이 연구 (A Study on the Vowel Duration of the Buckeye Corpus)

  • 정혜정;윤규철
    • 말소리와 음성과학
    • /
    • 제7권4호
    • /
    • pp.103-110
    • /
    • 2015
  • The purpose of this study is to assess the vowel property by examining the vowel duration of the American English vowles found in the Buckeye corpus[6]. The vowel durations were analyzed in terms of various linguistic factors including the number of syllables of the word containing the vowel, the location of the vowel in a word, types of stress, function versus content word, the word frequency in the corpus and the speech rate calculated from the three consecutive words. The findings from this work agreed mostly with those from earlier studies, but with some exceptions. The relationship between the speech rate and the vowel duration proved non-linear.