• Title/Summary/Keyword: Korean speech

Search Result 5,307, Processing Time 0.03 seconds

A Phoneme-based Approximate String Searching System for Restricted Korean Character Input Environments (제한된 한글 입력환경을 위한 음소기반 근사 문자열 검색 시스템)

  • Yoon, Tai-Jin;Cho, Hwan-Gue;Chung, Woo-Keun
    • Journal of KIISE:Software and Applications / v.37 no.10 / pp.788-801 / 2010
  • Mobile devices are advancing rapidly, so research on mobile input methods is becoming increasingly important. Many input methods exist, such as numeric keypads, QWERTY keypads, touch screens, and speech recognizers, but none is as convenient as a typical desktop keyboard, so input strings usually contain many typing errors. Such errors rarely hinder communication between people, but they are a critical problem when searching a database such as a dictionary or an address book, because the correct results cannot be retrieved. Hangeul in particular has more than 10,000 distinct characters, because each Hangeul character is a combination of consonants and vowels, so errors are more frequent than in English. The suffix tree is the data structure most widely used to handle query errors, but it cannot cope with the full variety of errors. In this paper, we propose a fast approximate Korean word search system that tolerates a variety of typing errors. The system includes several algorithms for applying general approximate string searching to Hangeul. We also present a profanity filter built on the proposed system, which filters more than 90% of coined profanities.
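
As a rough illustration of phoneme-level matching for Hangeul, the sketch below splits each precomposed syllable into its consonant and vowel letters (jamo) with the standard Unicode arithmetic and then computes an ordinary edit distance over the resulting sequences. The function names `to_jamo`, `edit_distance`, and `jamo_distance` are hypothetical, and this is not the algorithm of the paper, whose details the abstract does not give.

```python
# Minimal sketch (assumed helper names): jamo-level edit distance for Hangeul.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
N_SYLLABLES = 11172          # 19 leading consonants * 21 vowels * 28 finals
PER_LEAD, PER_VOWEL = 588, 28


def to_jamo(text: str) -> str:
    """Decompose precomposed Hangeul syllables into conjoining jamo."""
    out = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < N_SYLLABLES:
            lead, rest = divmod(idx, PER_LEAD)
            vowel, tail = divmod(rest, PER_VOWEL)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:                      # tail == 0 means "no final consonant"
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)                # leave non-Hangeul characters as-is
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution or match
        prev = cur
    return prev[-1]


def jamo_distance(a: str, b: str) -> int:
    """Edit distance measured in jamo rather than whole syllables."""
    return edit_distance(to_jamo(a), to_jamo(b))
```

Measured this way, a single mistyped vowel costs one edit out of the five or six jamo in a two-syllable word rather than one out of two syllables, so near-misses can be ranked more finely. Note that a leading and a final consonant of the same shape are distinct code points here; a real system might choose to treat them as equal.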

An Emotion Scanning System on Text Documents (텍스트 문서 기반의 감성 인식 시스템)

  • Kim, Myung-Kyu;Kim, Jung-Ho;Cha, Myung-Hoon;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.12 no.4 / pp.433-442 / 2009
  • People increasingly buy products over the Internet rather than in stores. Some consumers post feedback online, such as reviews, replies, comments, and blog posts, after purchasing a product. People also tend to look for information on the Internet. Because many consumers are interested in others' opinions when they are about to make a purchase, companies and public institutions need to collect and analyze reviews and public opinion. However, the reviews on web sites are numerous, short, and redundant. Under these circumstances, emotion scanning systems for text documents on the web are attracting attention. For extracting a writer's opinions or subjective ideas from text, labeled word resources such as GI (General Inquirer) and LKB (Lexical Knowledge Base of near-synonym differences) exist for English, but no comparable resource is yet available for Korean. In this paper, we labeled words of four parts of speech (POS) in a Korean dictionary (nouns, adjectives, verbs, and adverbs) as positive, negative, or neutral. We extract the construction patterns of emotional words and the relationships among words in sentences from a large training set and learn them. Based on this knowledge, comments and reviews about products are classified as positive or negative using SO-PMI, with the optimal condition found from combinations of the four POS. Lastly, the system provides a flexible user interface for adding or editing the emotional words, the emotion-related construction patterns, and the relationships among words.
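
For readers unfamiliar with SO-PMI (semantic orientation from pointwise mutual information), the sketch below shows the usual form of the score: the summed PMI of a word with positive seed words minus its summed PMI with negative seed words, estimated here from document co-occurrence counts. The function name `so_pmi`, the smoothing constant `eps`, and the toy data are assumptions for illustration; the abstract does not describe the authors' exact estimator.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, eps=0.01):
    """Semantic orientation of `word`; > 0 leans positive, < 0 leans negative.

    docs is a list of token sets; eps is an assumed smoothing constant that
    keeps the logarithm defined when two words never co-occur.
    """
    n = len(docs)

    def count(*terms):
        # Number of documents that contain every given term.
        return sum(1 for d in docs if all(t in d for t in terms))

    def pmi(a, b):
        # log2( P(a, b) / (P(a) * P(b)) ) with additive smoothing.
        joint = count(a, b) + eps
        return math.log2(joint * n / ((count(a) + eps) * (count(b) + eps)))

    return (sum(pmi(word, p) for p in pos_seeds)
            - sum(pmi(word, q) for q in neg_seeds))


# Toy usage with made-up review tokens.
toy_docs = [
    {"battery", "good"},
    {"battery", "good"},
    {"screen", "bad"},
    {"screen", "bad"},
    {"battery", "great"},
]
battery_score = so_pmi("battery", toy_docs, ["good"], ["bad"])
screen_score = so_pmi("screen", toy_docs, ["good"], ["bad"])
```

A review can then be labeled positive or negative by the sign of the summed scores of its words, optionally restricted to selected parts of speech.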


Pivot Discrimination Approach for Paraphrase Extraction from Bilingual Corpus (이중 언어 기반 패러프레이즈 추출을 위한 피봇 차별화 방법)

  • Park, Esther;Lee, Hyoung-Gyu;Kim, Min-Jeong;Rim, Hae-Chang
    • Korean Journal of Cognitive Science / v.22 no.1 / pp.57-78 / 2011
  • Paraphrasing is the act of rewriting a text in other words without altering its meaning. Paraphrases can be used in many fields of natural language processing. In particular, they can be incorporated into machine translation to improve the coverage and quality of translation. Recent approaches to paraphrase extraction use bilingual parallel corpora, which consist of aligned sentence pairs. In these approaches, paraphrases are identified from the word alignment result by means of pivot phrases, which are phrases in one language to which two or more phrases in the other language are connected. However, word alignment is itself a very difficult task, so there can be many alignment errors, and these errors can lead to the selection of incorrect pivot phrases. In this study, we propose a paraphrase extraction method that discriminates good pivot phrases from bad ones. Each pivot phrase is weighted according to its reliability, which is scored using lexical and part-of-speech information. The experimental results show that the proposed method achieves higher precision and recall in paraphrase extraction than the baseline. We also show that the extracted paraphrases can increase the coverage of Korean-English machine translation.
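
The pivot idea can be sketched as follows: the score of a candidate paraphrase is the sum, over shared pivot phrases, of the probability of reaching the pivot and coming back, optionally scaled by a per-pivot reliability weight. The function name `paraphrase_scores`, the table layout, and the weight dictionary are hypothetical; how the paper computes its reliability weights from lexical and part-of-speech information is not specified in the abstract.

```python
from collections import defaultdict


def paraphrase_scores(src_to_pivot, pivot_to_src, pivot_weight=None):
    """Score paraphrase candidates through pivot phrases.

    src_to_pivot[e][f] = p(f | e) and pivot_to_src[f][e] = p(e | f), both
    assumed to come from word alignment. pivot_weight[f] is an assumed
    reliability weight in [0, 1]; pivots without an entry get weight 1.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for e1, pivots in src_to_pivot.items():
        for f, p_f_given_e1 in pivots.items():
            w = 1.0 if pivot_weight is None else pivot_weight.get(f, 1.0)
            for e2, p_e2_given_f in pivot_to_src.get(f, {}).items():
                if e2 != e1:  # a phrase is not its own paraphrase
                    scores[e1][e2] += w * p_f_given_e1 * p_e2_given_f
    return {e: dict(cands) for e, cands in scores.items()}


# Toy tables with placeholder phrase names.
src_to_pivot = {"a1": {"p1": 0.5, "p2": 0.5}}
pivot_to_src = {
    "p1": {"a1": 0.5, "a2": 0.5},
    "p2": {"a1": 0.5, "a3": 0.5},
}
plain = paraphrase_scores(src_to_pivot, pivot_to_src)
weighted = paraphrase_scores(src_to_pivot, pivot_to_src, {"p2": 0.2})
```

Down-weighting the pivot `p2` lowers the score of `a3` while leaving `a2` untouched, which is the effect a pivot discrimination step aims for when `p2` is the product of an alignment error.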


The Effects of Physical Function Level and Intensity of Treatment for Rehabilitation on Improvement of Physical Function in Children with Cerebral Palsy: Follow-up Study for 6 Months (뇌성마비 아동의 신체 기능수준과 재활 목적 치료 강도가 신체 기능향상에 미치는 영향: 6개월간 추적연구)

  • Kim, Bu-Young;Yun, Young-Ju;Shin, Yong-Beom;Kim, Soo-Yeon;Oh, Tae-Young
    • Journal of the Korean Society of Physical Medicine / v.13 no.1 / pp.27-38 / 2018
  • PURPOSE: The purpose of this study was to identify the treatment patterns of children with cerebral palsy and to analyze the effect of physical function level and treatment intensity on improvement of physical function over six months. METHODS: Participants were 126 children (83 boys, 43 girls) diagnosed with cerebral palsy, with a mean age of 33 months (range, 8 to 77 months). Over six months, we collected data on demographic and disability characteristics and treatment patterns from caregivers using a questionnaire that we constructed. The treatment pattern included the type, frequency, and institution of treatment. To assess improvement of physical function, we administered the Gross Motor Function Measure (GMFM) and the Pediatric Evaluation of Disability Inventory (PEDI) before and after the six-month period. We analyzed the effects of physical function level (as measured by the Gross Motor Function Classification System), age, and treatment intensity on physical function using repeated measures ANOVA in SPSS PC ver. 22.0. RESULTS: The average treatment frequency was 5.74 times per week for physical therapy, 3.96 times for occupational therapy, 2.96 times for speech therapy, and 3.12 times for treatment of accompanying disabilities. Physical function level and age were significant factors affecting improvement of physical function, whereas there was no significant difference according to treatment intensity. CONCLUSION: We suggest that physical function level and age may be important factors in the improvement of physical function, and that the professional rehabilitation team must consider the treatment type appropriate to each child.

Hanja Information in the Entries of Korean Unabridged Dictionary (국어대사전의 표제어에 나타나는 한자 정보)

  • Kim, Cheol-Su
    • The Journal of the Korea Contents Association / v.10 no.4 / pp.438-446 / 2010
  • For language information processing that includes both Hangul and Hanja, an electronic dictionary that supports Hangul and Hanja simultaneously is necessary. This paper examines statistical information on the Hanja entries of the Korean Unabridged Dictionary, such as the number of entries that include Hanja based on the KSC-5601 character set, the frequency of the pronunciation and meaning of each Hanja character in the entries, the frequency of Hanja per part of speech, and the average number of Hanja characters per entry. At least one Hanja character appears in 303,951 of the 440,594 entries, accounting for 68.99% of the total. The 440,594 entries contain 858,595 Hanja characters, or 1.95 per entry. As the average syllable length of the entries is 3.56 and the average number of Hanja characters per entry is 1.95, 54.7% of all characters in the entries are Hanja. Of the 4,888 Hanja character codes, 4,660 are used at least once, whereas 228 never appear in any entry. Five characters appear more than 4,000 times. The 858,595 Hanja characters used in all the entries correspond to 471 Hangeul codes.
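
The percentages in this abstract follow directly from the raw counts it quotes. The short check below recomputes them; the variable names are ours, and only figures stated in the abstract are used.

```python
# Recompute the abstract's ratios from the counts it reports.
entries_total = 440_594          # all dictionary entries
entries_with_hanja = 303_951     # entries containing at least one Hanja
hanja_chars = 858_595            # Hanja characters across all entries
avg_syllables_per_entry = 3.56   # average entry length in syllables
codes_total, codes_used, codes_unused = 4_888, 4_660, 228

share_with_hanja = entries_with_hanja / entries_total        # about 0.6899
hanja_per_entry = hanja_chars / entries_total                # about 1.95
hanja_char_share = hanja_per_entry / avg_syllables_per_entry # about 0.547

print(f"entries with Hanja: {share_with_hanja:.2%}")
print(f"Hanja per entry:    {hanja_per_entry:.2f}")
print(f"Hanja share:        {hanja_char_share:.1%}")
```

The used and unused code counts also add up to the stated total of 4,888 Hanja codes.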

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal (성대신호 기반의 명령어인식기를 위한 특징벡터 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • Journal of KIISE:Software and Applications / v.34 no.3 / pp.226-234 / 2007
  • In this paper, we develop a speech recognition system using a throat microphone. This kind of microphone minimizes the impact of environmental noise. However, because high frequencies are absent and formant frequencies are partially lost, previous systems built on such devices have shown lower recognition rates than systems that use standard microphone signals. This has led researchers to use throat microphone signals only as a supplementary data source supporting standard microphone signals. In this paper, we present a high-performance ASR system that uses only a throat microphone, taking advantage of Korean Phonological Feature Theory and a detailed analysis of the throat signal. Analyzing the spectrum and the FFT result of the throat microphone signal, we find that the conventional MFCC feature vector, which uses a critical pass filter, does not characterize throat microphone signals well. We also describe the conditions that make a feature extraction algorithm best suited to throat microphone signal analysis: (1) a sensitive band-pass filter and (2) a feature vector suitable for voice/non-voice classification. We show experimentally that the ZCPA algorithm, designed to meet these conditions, improves the recognizer's performance by approximately 16%. We also find that an additional noise-canceling algorithm such as RASTA yields a further 2% improvement.
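
To give a feel for ZCPA (zero-crossings with peak amplitudes), the sketch below implements a heavily simplified single-band version: each interval between successive upward zero crossings gives a frequency estimate, and the log-compressed peak amplitude within that interval is added to the matching frequency bin. The band-pass filter bank that normally precedes this step is omitted, and the function name `zcpa_histogram`, the bin edges, and the test tone are assumptions, not the configuration used in the paper.

```python
import math


def zcpa_histogram(signal, sample_rate, bin_edges):
    """Simplified single-band ZCPA feature (assumes a band-passed input)."""
    hist = [0.0] * (len(bin_edges) - 1)
    # Indices where the signal crosses zero going upward.
    ups = [i for i in range(1, len(signal)) if signal[i - 1] < 0 <= signal[i]]
    for start, end in zip(ups, ups[1:]):
        freq = sample_rate / (end - start)   # inverse of the crossing interval
        peak = max(signal[start:end])        # peak amplitude in the interval
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += math.log1p(peak)  # log compression of the peak
                break
    return hist


# Toy usage: a 100 Hz tone sampled at 8 kHz lands entirely in the lowest bin.
rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / rate + 0.3) for n in range(800)]
edges = [0, 250, 500, 1000, 2000, 4000]
features = zcpa_histogram(tone, rate, edges)
```

Because the frequency estimate comes from crossing intervals rather than from spectral magnitudes, the feature depends mainly on timing information, which is one reason zero-crossing features are often described as robust to additive noise.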

A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 배철수
    • Journal of the Korea Institute of Information and Communication Engineering / v.3 no.1 / pp.59-68 / 1999
  • Recent research and development in communication systems is moving toward using voice data together with face images of the speaker to achieve a higher recognition rate than voice data alone. We therefore present a method of lipreading from speech image sequences using a 3-D facial shape model. The method uses feature information from the face image, such as the degree of lip opening, the movement of the jaw, and the protrusion height of the lips. First, we fit the 3-D face model to the image sequence of the speaking face. Then, to obtain feature information, we compute the amount of variation in the fitted 3-D shape model across the image sequence and use it as the recognition parameters. We use the intensity gradient values obtained from the variation of the 3-D feature points to separate recognition units in the image sequence. For recognition, we use a discrete HMM algorithm with multiple observation sequences that fully reflect the variation of the 3-D feature points. In a recognition experiment with 8 Korean vowels and 2 Korean consonants, we obtained a recognition rate of about 80% for the plosives and vowels. Because the recognition experiment on these recognition parameters with the 10 Korean vowels yielded a high recognition rate, we propose that the feature vector is usable as a visual distinguishing factor.
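
Recognition with discrete HMMs usually means scoring the quantized observation sequence under one model per recognition unit and choosing the most likely unit. The sketch below shows that step with the forward algorithm; the two toy models, their parameters, and the names `forward_likelihood` and `classify` are invented for illustration and are not taken from the paper.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM via the forward algorithm.

    obs is a non-empty list of symbol indices; start[s], trans[p][s], and
    emit[s][o] are the usual initial, transition, and emission probabilities.
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


def classify(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Two hypothetical left-to-right models over a 2-symbol codebook.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "unit_A": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "unit_B": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}
label = classify([0, 0, 1, 1], models)
```

In a lipreading setting the symbol indices would come from vector-quantizing the per-frame 3-D feature measurements; for long sequences the forward pass is normally done with scaling or in log space to avoid underflow.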


Joke-Related Aspects and their Significance in Traditional Korean Funny Performing Arts (한국 전통연희에서의 재담의 양상과 그 의의)

  • Son, Tae-do
    • Journal of Korean Classical Literature and Education / no.32 / pp.29-61 / 2016
  • A joke (才談, 재담) is "the most interesting and witty language unit" in our speech. However, research on jokes is still in its early stages. Although jokes are related to witty and interesting talk, stories, songs, and plays, the actual object of a joke is only witty and interesting talk. A joke is witty talk that is interesting or laughter-inducing. Many jokes can be found in the traditional Korean funny performing arts (演戱, 연희). This is because these art forms were performed in open yards, which made it necessary to amuse the audience, and amusement in turn required jokes. Jokes in the traditional funny performing arts can generally be classified as follows: 1) Jokes related to a situation: These include words that fit a given situation, exaggerating words, diminishing words, deviant words, and cause-effect words. 2) Jokes related to discourse: These include enumerating words, amplificatory words, contrasting words, fluently lying words, undeniable words, purposely unknowing words, and deliberately incorrect words. 3) Jokes related to vocabulary: These include synonyms, similar words, words with changed word order, and incorrect words. 4) Jokes related to pronunciation: These include homonyms and anti-homonyms. Although there may be other kinds of jokes, those presented above are the typical ones. A joke is "the result that human being can achieve when he/she has overcome natural and social difficulties and is left with only a free and creative spirit." Jokes are necessary in all ages and everywhere. Today, more varied and higher-level jokes can be created by developing the diversity of jokes found in the traditional funny performing arts. I also expect new sorts of jokes to emerge, because a joke always demands a creative spirit.

An Hwak's Recognition of 'Joseon' and 'Joseon Cheolhak' (안확의 '조선' 인식과 '조선철학')

  • Lee, Haeng Hoon
    • The Journal of Korean Philosophical History / no.50 / pp.171-200 / 2016
  • The full-scale study of Joseon conducted by Japan in the 1910s was part of its colonial policy, while the native Joseon studies that opposed it carried a political aspiration to recover national rights and independence. Accordingly, the conceptual meaning of 'Joseon' varied with the speaker. The establishment of a modern nation-state failed with the demise of the Korean Empire, but 'Joseon' was newly discovered within national ideology. It became a historical concept in which the experience of the past and the expectation of the future could be united. The so-called 'Joseon Studies' was limited to intellectuals in academic circles, but 'Joseon' embraced articulations from more diverse social agents. Furthermore, considering the connection between concept and discourse, it is only natural that 'Joseon Studies' should be interpreted within the historical semantics of 'Joseon'. In his The History of Joseon Civilization, An Hwak encompassed history from the age of ancient mythology to contemporary times under the banner of 'Joseon'. Opposing the Japanese distortion of history carried out in the name of historical positivism, he idealized Joseon history as comparable to that of Western democracy. He extended the study of 'Joseon' into culture at large, foreshadowing a kind of Joseon philosophy. In his An Overview of Joseon Philosophical Ideas, the first description of 'Joseon philosophy' as an independent field, he proposed philosophy as one of three sources of pride in Joseon and asserted its uniqueness and originality compared to the West. It was an attempt to grasp the peculiarity of Joseon ideas from the perspective of the history of universal human civilization. He considered 'Jong' (倧) an ideological foundation held from ancient to modern times, and the acceptance of Buddhism and Confucianism beneficial to 'Joseon philosophy'. The birth of 'Joseon philosophy', the modern transformation of the traditional knowledge system, was an intellectual experiment in applying traditional knowledge to the modern disciplinary classification system.

Analyzing Vocabulary Characteristics of Colloquial Style Corpus and Automatic Construction of Sentiment Lexicon (구어체 말뭉치의 어휘 사용 특징 분석 및 감정 어휘 사전의 자동 구축)

  • Kang, Seung-Shik;Won, HyeJin;Lee, Minhaeng
    • Smart Media Journal / v.9 no.4 / pp.144-151 / 2020
  • In a mobile environment, communication takes place via SMS text messages. The vocabulary used in SMS texts can be expected to belong to different classes from the vocabulary used in general literary-style Korean sentences. For example, in a typical literary style, sentences begin and end correctly and are well constructed, whereas SMS texts often omit components or replace them with abbreviated expressions. To analyze these characteristics of vocabulary use, existing colloquial-style and literary-style corpora are used. The experiment compares and analyzes the vocabulary use of two colloquial corpora, an SMS text corpus and the Naver Sentiment Movie Corpus, and a written Korean corpus. To compare the vocabulary of each corpus, the part-of-speech tag for adjectives (VA) was used as the standard, and distinctive collexeme analysis was used to measure collostructional strength. The results confirmed that adjectives related to emotional expression, such as 'good-', 'sorry-', and 'joy-', were preferred in the SMS text corpus, while adjectives related to evaluative expressions were preferred in the Naver Sentiment Movie Corpus. Word embeddings were then used to automatically construct a sentiment lexicon from the extracted adjectives with high collostructional strength, yielding a total of 343,603 sentiment expressions.
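
Distinctive collexeme analysis compares how often a word occurs in one corpus against another using a 2x2 contingency table, and the association strength is commonly reported as the negative base-10 logarithm of a Fisher exact test p-value. The sketch below follows that common convention with a one-sided test; the abstract does not state which association measure the authors used, so the measure, the function names, and the toy counts are all assumptions.

```python
import math


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    upper = min(row1, col1)
    tail = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / denom


def distinctive_strength(count_a, total_a, count_b, total_b):
    """-log10 of the p-value that the word is over-represented in corpus A.

    count_x is the word's frequency in corpus X and total_x is the number
    of tokens of the relevant class (for example, all adjectives) in X.
    """
    p = fisher_one_sided(count_a, total_a - count_a,
                         count_b, total_b - count_b)
    return -math.log10(p)


# Toy counts: a word seen 30 times per 100 adjectives in corpus A but only
# 5 times per 100 in corpus B, versus a word that is evenly distributed.
skewed = distinctive_strength(30, 100, 5, 100)
even = distinctive_strength(10, 100, 10, 100)
```

Words can then be ranked by this score within each corpus, and the top-ranked adjectives used as seeds for expanding a sentiment lexicon.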