• Title/Abstract/Keywords: Speech Texts

44 search results (processing time: 0.027 s)

Readability Enhancement of English Speech Recognition Output Using Automatic Capitalisation Classification

  • 김지환
    • 대한음성학회지:말소리 / No. 61 / pp. 101-111 / 2007
  • A modified speech recogniser has been proposed for automatic capitalisation generation to improve the readability of English speech recognition output. In this modified speech recogniser, every word in the vocabulary is duplicated: once in a de-capitalised form and again in the capitalised forms. In addition, its language model is re-trained on mixed-case texts. To evaluate the performance of the proposed system, automatic capitalisation generation experiments were performed on 3 hours of Broadcast News (BN) test data using the modified HTK BN transcription system. The proposed system produced an F-measure of 0.7317 for automatic capitalisation generation, with an SER of 48.55, a precision of 0.7736 and a recall of 0.6942.
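
The reported figures can be cross-checked with a short calculation. The sketch below assumes the F-measure is the balanced harmonic mean of precision and recall; the abstract does not state the weighting, so this is an assumption, and the function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f_measure(0.7736, 0.6942)
print(f"{f1:.4f}")
```

Under that assumption the result agrees with the reported 0.7317 to within rounding of the published precision and recall.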

A Low-Cost Speech to Sign Language Converter

  • Le, Minh;Le, Thanh Minh;Bui, Vu Duc;Truong, Son Ngoc
    • International Journal of Computer Science & Network Security / Vol. 21, No. 3 / pp. 37-40 / 2021
  • This paper presents the design of a speech to sign language converter for deaf and hard of hearing people. The device is low-cost and low-power, and it can work entirely offline. Speech recognition is implemented using an open-source API, the Pocketsphinx library. In this work, we propose a context-oriented language model, which measures the similarity between the recognized speech and the predefined speech to decide the output. The output speech is selected from the recommended speech stored in the database as the best match to the recognized speech. The proposed context-oriented language model can improve the speech recognition rate by 21% when working entirely offline. A decision module, which determines the similarity between the two texts using the Levenshtein distance, decides the output sign language. The output sign language corresponding to the recognized speech is generated as a set of sequential images. The speech to sign language converter is deployed on a Raspberry Pi Zero board for low-cost deaf assistive devices.
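
The matching step described in this abstract can be illustrated with a minimal sketch of Levenshtein-based selection. The phrase list and function names below are hypothetical; the paper's actual database and decision thresholds are not given in the abstract.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_match(recognized: str, phrases: list[str]) -> str:
    """Return the stored phrase with the smallest edit distance to the input."""
    return min(phrases, key=lambda p: levenshtein(recognized.lower(), p.lower()))


# Hypothetical phrase database, for illustration only.
PHRASES = ["how are you", "thank you", "good morning"]
print(best_match("good mourning", PHRASES))
```

A misrecognized input such as "good mourning" is one edit away from "good morning", so that stored phrase is selected. A real device would presumably also reject inputs whose best distance is too large.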

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition

  • 오유리;윤재삼;김홍국
    • 대한음성학회:학술대회논문집 / Proceedings of the 2005 Fall Conference of 대한음성학회 / pp. 103-106 / 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on an active learning framework that helps to select a text corpus from the large amount of text data required for language modeling. Perplexity is used as the measure for corpus selection in the active learning. In recognition experiments on a continuous Korean speech task, the speech recognition system employing the language model built with the proposed approach reduces the word error rate by about 6.6%, with less computational complexity, compared with a system using a language model constructed from randomly selected texts.
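
Perplexity-based ranking of candidate text can be sketched as follows. This is a toy illustration that assumes an add-one-smoothed unigram model trained on a small seed text. The abstract does not specify the model order, the smoothing, or whether low- or high-perplexity text is kept, so the code only ranks candidates and leaves that choice to the caller. All names and sentences are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count words in the seed text; returns (counts, total tokens, vocab size)."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values()), len(counts)


def perplexity(sentence, model):
    """Per-word perplexity under an add-one-smoothed unigram model.

    Assumes the sentence contains at least one word.
    """
    counts, total, vocab = model
    words = sentence.split()
    # vocab + 1 reserves one slot of probability mass for unseen words.
    log_prob = sum(math.log((counts[w] + 1) / (total + vocab + 1)) for w in words)
    return math.exp(-log_prob / len(words))


def rank_by_perplexity(candidates, model):
    """Sort candidate sentences from lowest to highest perplexity."""
    return sorted(candidates, key=lambda s: perplexity(s, model))


seed = ["the weather is fine today", "the news is on today"]
model = train_unigram(seed)
candidates = ["quarterly bond yields fell sharply", "the weather is on the news"]
ranked = rank_by_perplexity(candidates, model)
print(ranked[0])
```

Here the sentence that shares vocabulary with the seed text ranks first because the model assigns it a lower perplexity.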

Prosodic Features at "Sentence Boundaries" in Oral Presentations

  • Umesaki, Atsuko-Furuta
    • 대한음성학회지:말소리 / No. 41 / pp. 83-96 / 2001
  • It is generally said that falling intonation is used at the end of a declarative sentence. However, this is not the case with all stretches of spontaneous speech which are marked in transcription as sentences. The present paper examines intonation patterns appearing at the end of declarative sentences in oral presentations, and discusses instances where falling intonation does not appear. The texts used for analysis are eight oral presentations collected at international conferences in the field of physics. Quantitative and qualitative analyses are carried out. Three major factors related to discourse structure have been found for non-occurrence of falling intonation at sentence boundaries.

Prosodic Features at "Sentence Boundaries" in Oral Presentations

  • Umesaki, Atsuko-Furuta
    • 대한음성학회:학술대회논문집 / Proceedings of the July 2000 Conference of 대한음성학회 / pp. 149-164 / 2000
  • It is generally said that falling intonation is used at the end of a declarative sentence. However, this is not the case with all stretches of spontaneous speech which are marked in transcription as sentences. The present paper examines intonation patterns appearing at the end of declarative sentences in oral presentations, and discusses instances where falling intonation does not appear. The texts used for analysis are eight oral presentations collected at international conferences in the field of physics. Quantitative and qualitative analyses are carried out. Three major factors related to discourse structure have been found for nonoccurrence of falling intonation at sentence boundaries.

Perceptual, Acoustical, and Physiological Tools in Ataxic Dysarthria Management: A Case Report

  • 김향희
    • 대한음성학회:학술대회논문집 / Proceedings of the February 1996 Conference of 대한음성학회 / pp. 9-22 / 1996
  • Among the various dysarthric subtypes, a diagnosis of ataxic dysarthria is rendered when the speech characteristics include imprecise and irregular articulatory breakdowns, a marked degree of speech rate impairment, overall monopitch and monoloudness, and respiratory-articulatory incoordination. Traditionally, speech pathologists have relied only upon their ‘ears’ to describe and evaluate dysarthric speech. A statement of the percentage of correct words identified by a listener does not provide much more than an index of severity. Within the same perceptual dimension, a carefully constructed speech intelligibility test can specify patterns of errors. These patterns can have diagnostic value as well as provide strategies for remediation. The phonetically transcribed texts of single words and a standard passage, 'kail', produced by a speaker with ataxic dysarthria are presented in this report, with an emphasis on articulatory error analysis. Furthermore, acoustic tools [e.g., spectrography to measure formant transitions, segment durations, consonant spectra, etc.] are utilized to serve as basic measures that objectively document patients' speech intelligibility. Finally, treatment methods [e.g., spectrography as visual feedback, gestural reorganization using a pacing method, DAF (Delayed Auditory Feedback)] to modify the dysarthric behaviors are presented.

The Implicational Meaning and Prosody of Conjunctive Marker '-ko' in Korean

  • 김미란
    • 음성과학 / Vol. 8, No. 4 / pp. 289-305 / 2001
  • The conjunctive marker '-ko' in Korean can be interpreted as meaning either conjunctive 'and' or ordering 'and then'. The interpretation of '-ko' is ambiguous in written texts but not in spoken texts. This is because the meaning of an utterance is determined by the combination of the text with its prosody. The two meanings of '-ko' can be explained by the theory of implicature, which was introduced by Grice (1973, 1981). This paper examines the meaning of the marker '-ko' with respect to the relation between its meaning and prosody. The results of the experiments in this paper show that prosodic phrasing in Korean influences the interpretation of the marker '-ko'. When two constituents combined by '-ko' are realized in the same accentual phrase, the marker can be interpreted as meaning 'exactly be orderly'. This meaning can be classified as a Particularized Conversational Implicature (PCI) in Gricean theory. In the other cases of phrasing, the marker '-ko' can mean either 'conjunctive' or 'be orderly' by Generalized Conversational Implicature (GCI). The fact that phrasing determines the interpretation of the marker '-ko' can be seen as supporting the view that prosody interacts with various levels of linguistic phenomena, from phonology to pragmatics.

Aspects of Prosodic Phrases' Formation Produced by Chinese Speakers in the Reading of Korean Text

  • 윤영숙
    • 음성과학 / Vol. 15, No. 4 / pp. 29-41 / 2008
  • The purpose of this paper is to examine how Chinese speakers realize Korean prosodic phrases when reading Korean texts. A prosodic phrase, in this study, is defined as a basic unit of spoken language that both hearer and speaker can perceive as a purely separate phonetic unit, and that is realized with a coherent intonational configuration. The prosodic phrase plays an important role in both speech production and perception. In second language acquisition, prosody influences the accuracy and fluency of spoken language. The main purpose of this study is to describe the syntagmatic operation of prosody that produces prosodic phrases. We specifically examined the relation between a prosodic phrase's boundary and its syntactic status. Furthermore, we examined the internal syntactic structure of each prosodic phrase. The results of each analysis were compared with the prosodic phrase formation of native Korean speakers. The results show that Chinese speakers tend to align prosodic phrases with syntactic structure more than native Korean speakers do.

The Acoustic Characteristics of the Korean Accusative Marker {-lil} in Discourse

  • 김기호;김화영;김민정
    • 음성과학 / Vol. 6 / pp. 55-82 / 1999
  • The purpose of this paper is to investigate the acoustic characteristics of the Korean accusative marker {-lil}, which functions as a discourse marker in discourse. Generally, in written texts or read speech, it is seldom omitted and it clearly seems to serve a grammatical function. In ordinary discourse, however, speakers often do not use it. That is, the environments in which speakers use {-lil} differ from those in which they do not. According to the semantic interpretations, {-lil} functions as a pragmatic factor and adds to the meaning of the object in an utterance. In this paper, by comparing the acoustic characteristics of utterances that contain the marker {-lil} with those of utterances that do not, based in particular on Korean Intonational Phonology, we demonstrate that the Korean accusative marker {-lil} clearly shows acoustic characteristics related to the pragmatic factors that reflect speakers' special intention.

Corpus Based Unrestricted Vocabulary Mandarin TTS

  • ;하주홍;김병창;이근배
    • 대한음성학회:학술대회논문집 / Proceedings of the October 2003 Conference of 대한음성학회 / pp. 175-179 / 2003
  • To produce high-quality (intelligible and natural) synthesized speech, it is very important to have accurate grapheme-to-phoneme conversion and an accurate prosody model. In this paper, we analyze Chinese texts using segmentation, POS tagging and unknown word recognition. We present grapheme-to-phoneme conversion using a dictionary-based and rule-based method. We construct a prosody model using a probabilistic method and a decision-tree-based error correction method. Based on the results of the above analysis, we can successfully select and concatenate the exact syllable synthesis units from the Chinese Synthesis DB.
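
The abstract names dictionary-based and rule-based grapheme-to-phoneme conversion without giving details. The sketch below illustrates only the dictionary half, assuming greedy longest-match lookup. That is an assumption for illustration and not necessarily the authors' method. The toy lexicon and the function name are hypothetical.

```python
# Hypothetical toy lexicon (word -> pinyin syllables with tone numbers).
LEXICON = {
    "银行": ["yin2", "hang2"],  # multi-character entry fixes the reading of 行
    "我": ["wo3"],
    "去": ["qu4"],
    "行": ["xing2"],            # default single-character reading
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def to_pinyin(text: str) -> list[str]:
    """Greedy longest-match lookup; unknown characters are kept in angle brackets."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in LEXICON:
                out.extend(LEXICON[chunk])
                i += n
                break
        else:
            # No entry matched: mark the spot where a rule-based step would apply.
            out.append(f"<{text[i]}>")
            i += 1
    return out


print(to_pinyin("我去银行"))
```

Matching the longest entry first lets the two-character word 银行 select the reading hang2 for 行, whereas the character on its own falls back to xing2.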
