• Title/Summary/Keyword: Speech Annotation

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • Speech Sciences / v.9 no.1 / pp.111-135 / 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of predictable linguistic information in native speech by the dictionary entry and several predefined types of error specification found in native language transfer. In that sense, the proposed model differs from previously explored annotation models in the literature, most of which are based on native speech. The validity of the newly proposed model is shown by its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the data newly collected in this study. The annotation method in this model adopts two widely accepted conventions: the Speech Assessment Methods Phonetic Alphabet (SAMPA) and Tones and Break Indices (ToBI). In the proposed annotation model, SAMPA is employed exclusively for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess the speaking ability of English as a Foreign Language (EFL) learners.

Standardization for Annotation Information Description of Speech Database (음성 DB 부가 정보 기술방안 표준화를 위한 제안)

  • Kim Sanghun;Lee Youngjik;Hahn Minsoo
    • MALSORI / no.47 / pp.109-120 / 2003
  • This paper presents the speech database standardization activities at ETRI. Recently, with government support, ETRI and SiTEC have been gathering large speech corpora for domestic speech-related companies. First, because knowledge of speech database specifications is not shared, the distributed speech databases have different formats, so a common format is needed as soon as possible. ETRI and SiTEC are trying to find a better representation format for speech databases. Second, we introduce a new method for describing the annotation information of a speech database. As a structured description method, an XML-based description will be applied to represent the metadata of the speech database. It will be continuously revised through the speech technology standard forum during this year.

Meta-data Standardization of Speech Database (음성 DB의 메타데이타 표준화)

  • Kim Sanghun
    • Proceedings of the KSPS conference / 2003.10a / pp.61-64 / 2003
  • In this paper, we introduce a new method for describing the annotation information of a speech database. As a structured description method, an XML-based description, standardized by the W3C, will be applied to represent the metadata of the speech database. It will be continuously revised through the speech technology standard forum during this year.

Segmentation and Labeling in Creation of Speech Corpus (음성 코퍼스 구축에서 분절과 레이블링의 문제)

  • Um Yongnam;Lee Yong-Ju
    • Proceedings of the KSPS conference / 2002.11a / pp.27-32 / 2002
  • This paper discusses what should be taken into consideration with respect to segmentation and labeling when creating a speech corpus: what levels of annotation and what kinds of content should be included, what kinds of acoustic information are checked during segmentation, and so on.

XML Based Meta-data Specification for Industrial Speech Databases (산업용 음성 DB를 위한 XML 기반 메타데이터)

  • Joo Young-Hee;Hong Ki-Hyung
    • MALSORI / v.55 / pp.77-91 / 2005
  • In this paper, we propose an XML-based metadata specification for industrial speech databases. Building speech databases is very time-consuming and expensive. Recently, with government support, a huge amount of speech data has been collected into speech databases. However, the formats and metadata of these databases differ depending on the institution that built them. To improve the reusability and portability of speech databases, a standard representation scheme should be adopted by all institutions that construct them. ETRI proposed an XML-based annotation scheme [5] for speech databases, but that scheme has an overly simple and flat modeling structure and may lead to duplicated information. To overcome these disadvantages of the previous scheme, we first define the speech database more formally and then identify the objects that appear in speech databases. We then design the data model for speech databases in an object-oriented way. Based on the designed data model, we develop the metadata specification for industrial speech databases.
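
The contrast between a flat scheme and an object-oriented data model can be illustrated with a minimal sketch. The class names, element names, and attributes below (`SpeechDatabase`, `Speaker`, `Utterance`, `speechDatabase`, `samplingRateHz`, and so on) are hypothetical and chosen only for illustration; they are not taken from the ETRI scheme or from the specification proposed in the paper. The idea shown is simply that nesting utterances under a speaker stores speaker information once instead of repeating it per utterance.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Utterance:
    utt_id: str
    transcript: str


@dataclass
class Speaker:
    speaker_id: str
    gender: str
    utterances: list[Utterance] = field(default_factory=list)


@dataclass
class SpeechDatabase:
    name: str
    sampling_rate_hz: int
    speakers: list[Speaker] = field(default_factory=list)


def to_xml(db: SpeechDatabase) -> ET.Element:
    """Serialize the object model as nested XML (hypothetical element names)."""
    root = ET.Element(
        "speechDatabase",
        name=db.name,
        samplingRateHz=str(db.sampling_rate_hz),
    )
    for spk in db.speakers:
        # Speaker attributes are written once, not once per utterance.
        spk_el = ET.SubElement(root, "speaker", id=spk.speaker_id, gender=spk.gender)
        for utt in spk.utterances:
            utt_el = ET.SubElement(spk_el, "utterance", id=utt.utt_id)
            utt_el.text = utt.transcript
    return root


# Hypothetical sample data, for illustration only.
db = SpeechDatabase(
    name="demo-db",
    sampling_rate_hz=16000,
    speakers=[
        Speaker(
            speaker_id="spk001",
            gender="F",
            utterances=[Utterance("utt0001", "hello"), Utterance("utt0002", "goodbye")],
        )
    ],
)

xml_text = ET.tostring(to_xml(db), encoding="unicode")
print(xml_text)
```

In a flat layout, each utterance record would carry its own copy of the speaker's gender and the database's sampling rate; the nested form above keeps each fact in one place.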

Prosodic Annotation in a Thai Text-to-speech System

  • Potisuk, Siripong
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.405-414 / 2007
  • This paper describes preliminary work on the prosody modeling aspect of a text-to-speech system for Thai. Specifically, the model is designed to predict symbolic markers from text (i.e., prosodic phrase boundaries, accent, and intonation boundaries) and then to use these markers to generate pitch, intensity, and durational patterns for the synthesis module of the system. In this paper, a novel method for annotating the prosodic structure of Thai sentences, based on a dependency representation of syntax, is presented. The goal of the annotation process is to predict from text the rhythm of the input sentence when spoken according to its intended meaning. The encoding of the prosodic structure is established by minimizing speech disrhythmy while maintaining congruency with syntax. That is, each word in the sentence is assigned a prosodic feature called strength dynamic, which is based on the dependency representation of syntax. The strength dynamics assigned are then used to obtain rhythmic groupings in terms of a phonological unit called the foot. Finally, the foot structure is used to predict the durational pattern of the input sentence. The aforementioned process has been tested on a set of ambiguous sentences that represent various structural ambiguities involving five types of compounds in Thai.

Detecting and correcting errors in Korean POS-tagged corpora (한국어 품사 부착 말뭉치의 오류 검출 및 수정)

  • Choi, Myung-Gil;Seo, Hyung-Won;Kwon, Hong-Seok;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology / v.37 no.2 / pp.227-235 / 2013
  • The quality of the part-of-speech (POS) annotation in a corpus plays an important role in developing POS taggers. However, there are several kinds of errors in Korean POS-tagged corpora such as the Sejong Corpus. These errors vary and include annotation errors, spelling errors, and the insertion and/or deletion of unexpected characters. In this paper, we propose a method for detecting annotation errors using error patterns, and we also develop a tool for correcting them effectively. Based on the proposed method, we hand-corrected annotation errors in the Sejong POS Tagged Corpus using the developed tool. As a result, correction was at least 9 times faster than without any tools. We therefore conclude that the proposed method is effective for correcting annotation errors in POS-tagged corpora.

WalkieTagging : Efficient Speech-Based Video Annotation Method for Smart Devices (워키태깅 : 스마트폰 환경에서 음성기반의 효과적인 영상 콘텐츠 어노테이션 방법에 관한 연구)

  • Park, Joon Young;Lee, Soobin;Kang, Dongyeop;Seok, YoungTae
    • Journal of Information Technology Services / v.12 no.1 / pp.271-287 / 2013
  • The rapid growth and dissemination of touch-based mobile devices such as smartphones and tablet PCs gives numerous benefits to people using a variety of multimedia content. Because of their portability, these devices enable users to watch a soccer game, search for video on YouTube, and sometimes tag content on the road. However, the limited screen size of mobile devices and the touch-based character input methods that depend on it are still major problems for searching and tagging multimedia content. In this paper, we propose WalkieTagging, which provides a much more intuitive way of tagging than previous methods. Like previous video tagging services, WalkieTagging, as a voice-based annotation service, supports inserting detailed annotation data, including start time, duration, and tags, with little user effort. To evaluate our method, we developed an Android-based WalkieTagging application and performed a two-week user study. In our experiments with a total of 46 people, participants found our system more convenient and useful than a touch-based one. Consequently, we found that voice-based annotation methods can provide users with more convenience and satisfaction than touch-based methods in mobile environments.

Building a Korean conversational speech database in the emergency medical domain (응급의료 영역 한국어 음성대화 데이터베이스 구축)

  • Kim, Sunhee;Lee, Jooyoung;Choi, Seo Gyeong;Ji, Seunghun;Kang, Jeemin;Kim, Jongin;Kim, Dohee;Kim, Boryong;Cho, Eungi;Kim, Hojeong;Jang, Jeongmin;Kim, Jun Hyung;Ku, Bon Hyeok;Park, Hyung-Min;Chung, Minhwa
    • Phonetics and Speech Sciences / v.12 no.4 / pp.81-90 / 2020
  • This paper describes a method of building Korean conversational speech data in the emergency medical domain and proposes an annotation method for the collected data in order to improve speech recognition performance. To suggest future research directions, baseline speech recognition experiments were conducted by using partial data that were collected and annotated. All voices were recorded at 16-bit resolution at 16 kHz sampling rate. A total of 166 conversations were collected, amounting to 8 hours and 35 minutes. Various information was manually transcribed such as orthography, pronunciation, dialect, noise, and medical information using Praat. Baseline speech recognition experiments were used to depict problems related to speech recognition in the emergency medical domain. The Korean conversational speech data presented in this paper are first-stage data in the emergency medical domain and are expected to be used as training data for developing conversational systems for emergency medical applications.
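
As a rough sense of scale, the recording parameters quoted above (16-bit samples, 16 kHz, 8 hours and 35 minutes) can be turned into an approximate raw data size. The sketch below assumes single-channel, uncompressed PCM audio with no file headers; the abstract does not state the channel count or storage format, so the result is an estimate only, and the function name `pcm_size_bytes` is hypothetical.

```python
def pcm_size_bytes(duration_s: int, sample_rate_hz: int,
                   bits_per_sample: int, channels: int = 1) -> int:
    """Size of uncompressed PCM audio in bytes (headers not included)."""
    return duration_s * sample_rate_hz * (bits_per_sample // 8) * channels


# 8 hours 35 minutes, as reported in the abstract.
duration_s = 8 * 3600 + 35 * 60

# Assumption: mono recordings (not stated in the abstract).
size = pcm_size_bytes(duration_s, sample_rate_hz=16_000, bits_per_sample=16)

print(f"{duration_s} s of audio, about {size / 2**30:.2f} GiB of raw PCM")
```

Under these assumptions the corpus comes to a little under 1 GiB of raw audio; stereo recording would double that figure.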

Acoustic correlates of prosodic prominence in conversational speech of American English, as perceived by ordinary listeners

  • Mo, Yoon-Sook
    • Phonetics and Speech Sciences / v.3 no.3 / pp.19-26 / 2011
  • Previous laboratory studies have shown that prosodic structures are encoded in the modulations of phonetic patterns of speech, including suprasegmental as well as segmental features. Drawing on prosodically annotated large-scale speech data from the Buckeye corpus of conversational speech of American English, the current study first evaluated the reliability of prosody annotation by a large number of ordinary listeners and then examined whether and how prosodic prominence influences the phonetic realization of multiple acoustic parameters in everyday conversational speech. The results showed that all measures of the acoustic parameters, including pitch, loudness, duration, and spectral balance, increase when a word is heard as prominent. These findings suggest that prosodic prominence enhances the phonetic characteristics of the acoustic parameters. The results also showed that the degree of phonetic enhancement varies depending on the type of acoustic parameter. With respect to formant structure, the findings support the Sonority Expansion Hypothesis more consistently than the Hyperarticulation Hypothesis, showing that lexically stressed vowels are hyperarticulated only when hyperarticulation does not interfere with sonority expansion. Taking all this into account, the present study showed that prosodic prominence modulates the phonetic realization of the acoustic parameters in the direction of phonetic strengthening in everyday conversational speech, and that ordinary listeners are attentive to such phonetic variation associated with prosody in speech perception. However, the present study also showed that in everyday conversational speech there is no single dominant acoustic measure signaling prosodic prominence, and listeners must attend to such small acoustic variation or integrate acoustic information from multiple acoustic parameters in prosody perception.
