• Title/Summary/Keyword: National Language


Building a Korean conversational speech database in the emergency medical domain (응급의료 영역 한국어 음성대화 데이터베이스 구축)

  • Kim, Sunhee;Lee, Jooyoung;Choi, Seo Gyeong;Ji, Seunghun;Kang, Jeemin;Kim, Jongin;Kim, Dohee;Kim, Boryong;Cho, Eungi;Kim, Hojeong;Jang, Jeongmin;Kim, Jun Hyung;Ku, Bon Hyeok;Park, Hyung-Min;Chung, Minhwa
    • Phonetics and Speech Sciences / v.12 no.4 / pp.81-90 / 2020
  • This paper describes a method of building Korean conversational speech data in the emergency medical domain and proposes an annotation method for the collected data in order to improve speech recognition performance. To suggest future research directions, baseline speech recognition experiments were conducted by using partial data that were collected and annotated. All voices were recorded at 16-bit resolution at 16 kHz sampling rate. A total of 166 conversations were collected, amounting to 8 hours and 35 minutes. Various information was manually transcribed such as orthography, pronunciation, dialect, noise, and medical information using Praat. Baseline speech recognition experiments were used to depict problems related to speech recognition in the emergency medical domain. The Korean conversational speech data presented in this paper are first-stage data in the emergency medical domain and are expected to be used as training data for developing conversational systems for emergency medical applications.

Unpaired Korean Text Style Transfer with Masked Language Model (마스크 언어 모델 기반 비병렬 한국어 텍스트 스타일 변환)

  • Bae, Jangseong;Lee, Changki;Noh, Hyungjong;Hwang, Jeongin
    • Annual Conference on Human and Language Technology / 2021.10a / pp.391-395 / 2021
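  • Text style transfer is the task of converting text written in a source style into text in a target style while preserving its content. It can be treated as a sequence-to-sequence problem and solved with existing machine learning models, but the parallel corpora needed to train such models, with aligned text for each style, are hard to obtain. Recent work has therefore studied methods that perform text style transfer using non-parallel corpora. Because these studies mostly use generative models with an encoder-decoder architecture, content in the input sentence may be dropped, or a sentence with different content may be generated. In this paper, we propose a text style transfer method that uses a masked language model to change the input text to the desired style while preserving its content, and we apply it to Korean positive-negative and chat-style to written-style transfer.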


A Preprocessor for English-to-Korean Machine Translation of Web Pages (웹용 영한 기계번역을 위한 문서 전처리기의 설계 및 구현)

  • An, Dong-Un;Ryu, Hong-Jin;Seo, Jin-Won;Lee, Young-Woo;Jeong, Sung-Jong;Yuh, Sang-Hwa;Kim, Tae-Wan;Park, Dong-In
    • Annual Conference on Human and Language Technology / 1997.10a / pp.249-254 / 1997
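  • Machine-translating English web documents into Korean requires separating HTML tags from the sentences to be translated. Rather than simply removing the HTML tags, a method is needed to restore a Korean web document of the same form once machine translation of the target sentences is complete. In addition, to improve the performance of the English morphological analyzer, the document preprocessor normalizes the document by handling recognition and separation of sentences (the translation units), title processing, processing of enumerated words, hyphen processing, proper noun recognition, special character processing, case normalization, and date recognition.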
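
The tag separation and restoration step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' preprocessor: the function names `split_html` and `restore_html` are hypothetical, the regular expression is a naive tag matcher that ignores comments, scripts, and `>` inside attribute values, and `str.upper` merely stands in for a translation engine.

```python
import re

# Naive tag pattern (an assumption for illustration, not a full HTML parser).
TAG_RE = re.compile(r"(<[^>]+>)")

def split_html(html):
    """Split HTML into a template (tags kept, text replaced by None) and text segments."""
    template, segments = [], []
    for piece in TAG_RE.split(html):
        if not piece:
            continue  # re.split yields empty strings between adjacent tags
        if TAG_RE.fullmatch(piece) or not piece.strip():
            template.append(piece)   # tags and pure whitespace are kept verbatim
        else:
            template.append(None)    # placeholder for a translatable segment
            segments.append(piece)
    return template, segments

def restore_html(template, translated):
    """Rebuild the document by filling each placeholder with its translated segment."""
    it = iter(translated)
    return "".join(next(it) if piece is None else piece for piece in template)

template, segments = split_html("<p>Hello <b>world</b></p>")
# str.upper is a stand-in for the machine translation step.
restored = restore_html(template, [s.upper() for s in segments])
print(restored)
```

One limitation of this simple design is that inline tags such as `<b>` split a sentence into several segments, so a real preprocessor would need to reassemble full sentences before translation.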


Assessment and Treatment of the Cleft Palate Speech Disorder by Use of the Nasometer (비음측정기를 사용한 구개열 언어의 평가 및 치료)

  • Shin, Hyo-Keun;Leem, Dae-Ho;Whang, Sang-Jun;Kim, Dong-Chil;Kim, Hyun-Gi
    • Korean Journal of Cleft Lip And Palate / v.11 no.1 / pp.1-12 / 2008
  • In cleft palate patients, the characteristic speech disorder is a resonance disorder resulting from velopharyngeal incompetence (VPI). Clinically, VPI is caused by congenital factors such as congenital palatal incompetence and submucosal cleft palate, and by acquired factors such as CNS damage, tumor, and palatal palsy. Clinicians are more concerned than language pathologists about speech disorders after cleft palate surgery. Resonance disorders are divided into hypernasality, hyponasality, and nasal emission, but as a rule hypernasality is the typical form. Traditionally, clinicians and language pathologists rated hypernasality on a four-stage or five-stage scale by subjective assessment. Even among well-trained language pathologists, the resulting ratings can differ. In the late 1980s, Kay Elemetrics Corp. developed the nasometer, which derives from the Tonar I and II instruments developed by Fletcher and yields objective nasalance measurements consistent with the judgments of well-trained language pathologists. Objective nasalance testing thus became possible, and the nasometer has been used in hospitals, colleges, and speech clinics both at home and abroad. Standardization of cleft palate speech assessment must be settled without delay, because results differ across languages with different characteristics and across dialects within the same language. In this study, we provide a database for standardizing cleft palate speech assessment by reporting objective assessment methods, speech therapy effects, and problems arising in interdisciplinary teamwork when the nasometer is used in the treatment of cleft palate patients.


Displacement of the Korean Language and the Aesthetics of the Korean Diaspora (한국어의 탈지역과 한국적 이산의 미학)

  • Yim, Jin-Hee
    • Journal of English Language & Literature / v.54 no.1 / pp.149-167 / 2008
  • Korea has persisted in the notion of "ethnic nationalism," that is, "one race, one people, one language" as a homogeneous entity. This social ideal of unity prevails even in overseas Korean communities formed by voluntary and involuntary displacement in the turmoil of modern history: communities made intermittent with the Japanese colonial occupation and with postcolonial encounters with the West. Given that the Korean people suffered the trauma of being deprived of their language through the loss of the nation, the nation has been equated with the language. Accordingly, "these bearers of a homeland" are also firm holders of the Korean language. The linguistic patriotism of unity based on the intertwining of "mother tongue" and "father country" has become prevalent in the collective memory of the people of the Korean diaspora. Korean American literature has grappled with this concept of the national history of Korea and the Korean language. The aesthetics of Korean American literature has been marked by an influx of literary resources of 'Korea' in sensibilities and structures of feeling: Korean myth, folklore, songs, humor, traditional stories, manners, customs, and historic moments. An experimental use of the Korean alphabet, Hangeul, written down as pronounced, provides an ethnic flavor in the midst of the English texts. Despite its national frame of mind, however, Korean American literature as an interstitial art reveals a keen awareness of in-betweenness and transnational hybrid identities. By exploring the complex interrelationships of cultural and linguistic boundary-crossing practices in Korean American literature, this paper argues that the poetics of the Korean diaspora challenges the closed structure of identity formation and offers a transnational sphere in which to deconstruct a rigidly demarcated national ideology of "one race, one people, one language" for world literary history.

Preservice Teachers' Difficulties with Statistical Writing

  • Park, Min-Sun;Park, Mimi;Lee, Eun-Jung;Lee, Kyeong Hwa
    • Research in Mathematical Education / v.16 no.4 / pp.265-276 / 2012
  • These days, with the emphasis on statistical literacy, the importance of communication is the focus of attention. Communication about statistics is important since it is a way of describing the understanding of concepts and the interpretation of data. However, students usually have trouble with expressing what they understand, especially through writing. In this paper, we examined preservice teachers' difficulties when they wrote about statistical concepts. By comparing preservice teachers' written responses and interview transcripts of the variance concept task, we could find the missing information in their written language compared to their verbal language. From the results, we found that preservice teachers had difficulty in connecting terms contextually and conceptually, presenting various factors of the concepts that they considered, and presenting the problem solving strategies that they used.

Part-of-Speech Tagging System Using Rules/Statistics Extracted by Unsupervised Learning (규칙과 비감독 학습 기반 통계정보를 이용한 품사 태깅 시스템)

  • Lee Donghun;Kang Mi-young;Hwang Myeong-jin;Hwon Hyuk-chul
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.445-447 / 2005
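  • This paper uses rule-based and statistical methods together so that the strengths and weaknesses of each complement the other. The optimal part-of-speech sequence for a sentence is selected with the Viterbi algorithm on an HMM. The parameter values use rule-derived weights and statistical information. Weights are obtained by applying rules built from a minimal set of general rules; where no rule applies, the parameter values are computed from weights based on statistical information extracted by unsupervised learning. The optimal statistics-based weights are obtained by training this basic model over several iterations. The word-phrase (eojeol) level accuracy of this part-of-speech tagging system, which uses rules and statistics extracted by unsupervised learning, is 97.78%.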
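
The decoding step named in this abstract, Viterbi search over an HMM, can be sketched in Python as follows. This is a plain HMM decoder, not the paper's system: the rule-derived weights are not modeled, the tag set (`N`, `V`), the toy words, and all probabilities are invented for illustration, and the observation sequence is assumed to be non-empty.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state (tag) sequence for a non-empty observation list."""
    def lg(p):
        # Log probability; zero probability maps to negative infinity.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: best log score of any path that ends in state s at step t
    best = [{s: lg(start_p[s]) + lg(emit_p[s].get(observations[0], 0.0)) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Choose the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + lg(trans_p[p].get(s, 0.0))) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + lg(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model with made-up probabilities (hypothetical, for illustration only).
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dogs": 0.8, "bark": 0.2}, "V": {"dogs": 0.1, "bark": 0.9}}

tags = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(tags)
```

In a hybrid tagger of the kind the abstract describes, the transition and emission scores would presumably be replaced or adjusted by rule-based weights where a rule applies; that combination is left out of this sketch.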


KOREAN TOPIC MODELING USING MATRIX DECOMPOSITION

  • June-Ho Lee;Hyun-Min Kim
    • East Asian mathematical journal / v.40 no.3 / pp.307-318 / 2024
  • This paper explores the application of matrix factorization, specifically CUR decomposition, in the clustering of Korean language documents by topic. It addresses the unique challenges of Natural Language Processing (NLP) in dealing with the Korean language's distinctive features, such as agglutinative words and morphological ambiguity. The study compares the effectiveness of Latent Semantic Analysis (LSA) using CUR decomposition with the classical Singular Value Decomposition (SVD) method in the context of Korean text. Experiments are conducted using Korean Wikipedia documents and newspaper data, providing insight into the accuracy and efficiency of these techniques. The findings demonstrate the potential of CUR decomposition to improve the accuracy of document clustering in Korean, offering a valuable approach to text mining and information retrieval in agglutinative languages.

Examining Generalizability of Kang's (1999) Model of Structural Relationships between ESL Learning Strategy Use and Language Proficiency

  • Kang, Sung-Woo
    • English Language & Literature Teaching / v.7 no.2 / pp.55-75 / 2002
  • The present study examined whether Kang's (1999) model of the relationships between language learning strategy use and language proficiency for Asian students could be applied to a more heterogeneous group. In his study, Kang used a questionnaire to collect information on the language learning strategies of 957 foreign students learning English as a second language in American colleges. He also measured the subjects' language proficiency with the Institutional Testing Program TOEFL (Test of English as a Foreign Language). This study analyzed the same data without the limitation of cultural identity. Structural equation modeling was used to model the relationships between strategy use and language proficiency. The model of the present study was then descriptively compared with Kang's (1999) model for Asian students. The overall flow of the relationship paths appeared to vary very little across the two models, which suggests that the generalizability of Kang's (1999) model could be extended beyond the group originally examined.
