• Title/Summary/Keyword: Corpus-based


MTReadable: Arabic Readability Corpus for Medical Tests Information

  • Alahmdi, Dimah;Alghamdi, Athir Saeed;Almuallim, Neda'a;Alarifi, Suaad
    • International Journal of Computer Science & Network Security / v.21 no.5 / pp.84-89 / 2021
  • Medical tests are a very important part of the health monitoring process. They are performed for various reasons, such as diagnosing diseases and determining the effectiveness of medications. Patients should therefore be able to read and understand the available online tests and results in order to make proper decisions regarding their health condition. People vary in their educational levels and health backgrounds, which has long made providing such information in a format easily readable by the majority of people a challenge in the health domain. This paper describes the MTReadable corpus, which was constructed for evaluating the readability of online medical tests. It covers 32 basic periodic check-up tests with over 36k words. The test information was annotated and labelled by three non-specialist native Arabic speakers according to three readability levels: easy, neutral, and difficult. This paper contributes to the Arabic health research community an investigation of the readability level of online medical tests, and it is intended as a baseline for further work on more complex online health reports and information.

Translating English By-Phrase Passives into Korean: A Parallel Corpus Analysis (영한 병렬 코퍼스에 나타난 영어 수동문의 한국어 번역)

  • Lee, Seung-Ah
    • Journal of English Language & Literature / v.56 no.5 / pp.871-905 / 2010
  • This paper is motivated by Watanabe's (2001) observation that English by-phrase passives are sometimes translated into Japanese object topicalization constructions. That is, the original English sentence in the passive may be translated into the active voice with the logical object topicalized. A number of scholars, including Chomsky (1981) and Baker (1992), have remarked that languages have various ways to avoid focusing on the logical subject. The aim of the present study is to examine the translation equivalents of the English by-phrase passives in an English-Korean parallel corpus compiled by the author. A small sample of articles from Newsweek magazine and its published Korean translation reveals that there are indeed many ways to translate English by-phrase passives, including object topicalization (12.5%). Among the 64 translated sentences analyzed and classified, 12 (18.8%) examples were problematic in terms of agent defocusing, which is the primary function of passives. Of these 12 instances, five cases were identified where an alternative translation would be more suitable. The results suggest that the functional characteristics of English by-phrase passives should be highlighted in translator training as well as language teaching.

ToBI and beyond: Phonetic intonation of Seoul Korean ani in Korean Intonation Corpus (KICo)

  • Ji-eun Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.1-9 / 2024
  • This study investigated the variation in the intonation of Seoul Korean interjection ani across different meanings ("no" and "really?") and speech levels (Intimate and Polite) using data from Korean Intonation Corpus (KICo). The investigation was conducted in two stages. First, IP-final tones in the dataset were categorized according to the K-ToBI convention (Jun, 2000). While significant relationships were observed between the meaning of ani and its IP-final tones, substantial overlap between groups was notable. Second, the F0 characteristics of the final syllable of ani were analyzed to elucidate the apparent many-to-many relationships between intonation and meaning/speech level. Results indicated that these seemingly overlapping relationships could be significantly distinguished. Overall, this study advocates for a deeper analysis of phonetic intonation beyond ToBI-based categorical labels. By examining the F0 characteristics of the IP-final syllable, previously unclear connections between meaning/speech level and intonation become more comprehensible. Although ToBI remains a valuable tool and framework for studying intonation, it is imperative to explore beyond these categories to grasp the "distinctiveness" of intonation, thereby enriching our understanding of prosody.

A Situation-Based Dialogue Management with Dialogue Examples (대화 예제를 이용한 상황 기반 대화 관리 시스템)

  • Lee, Cheong-Jae;Jung, Sang-Keun;Lee, Geun-Bae
    • MALSORI / no.56 / pp.185-194 / 2005
  • In this paper, we present POSSDM (POSTECH Situation-Based Dialogue Manager) for a spoken dialogue system, using a new example- and situation-based dialogue management technique for effective generation of appropriate system responses. A spoken dialogue system should generate cooperative responses to smoothly control dialogue flow with the users. We introduce a new dialogue management technique incorporating dialogue examples and situation-based rules for the EPG (Electronic Program Guide) domain. For system response inference, we automatically construct and index a dialogue example database from a dialogue corpus, and the best dialogue example for a proper system response is retrieved with a query built from the dialogue situation, which includes the current user utterance, dialogue act, and discourse history. When the dialogue corpus is not sufficient to cover the domain, we also apply manually constructed situation-based rules, mainly for meta-level dialogue management.


Comparison of Level Set-based Active Contour Models on Subcortical Image Segmentation

  • Vongphachanh, Bouasone;Choi, Heung-Kook
    • Journal of Korea Multimedia Society / v.18 no.7 / pp.827-833 / 2015
  • In this paper, we compare three level set-based active contour (LSAC) methods for segmenting inhomogeneous MR images, a task that plays an important role in the early diagnosis and treatment of brain diseases. MR images often suffer from similar intensities and weak boundaries, which cause problems for many segmentation methods. LSAC methods, however, are able to segment the targets; the three compared here are level sets based on the local image fitting energy, the local binary fitting energy, and the local Gaussian distribution fitting energy. We implemented and tested them on subcortical image segmentation of the corpus callosum and hippocampus and demonstrated their effectiveness. The level set based on the local Gaussian distribution fitting energy proved to be the most accurate and robust model for subcortical image segmentation.

A Situation-Based Dialogue Management with Dialogue Examples (대화 예제를 이용한 상황 기반 대화 관리 시스템)

  • Lee, Cheon-Jae;Jung, Sang-Keun;Lee, Geun-Bae
    • Proceedings of the KSPS conference / 2005.11a / pp.113-115 / 2005
  • In this paper, we present POSSDM (POSTECH Situation-Based Dialogue Manager) for a spoken dialogue system, using a new example- and situation-based dialogue management technique for effective generation of appropriate system responses. A spoken dialogue system should generate cooperative responses to smoothly control dialogue flow with the users. We introduce a new dialogue management technique incorporating dialogue examples and situation-based rules for the EPG (Electronic Program Guide) domain. For system response inference, we automatically construct and index a dialogue example database from a dialogue corpus, and the best dialogue example for a proper system response is retrieved with a query built from the dialogue situation, which includes the current user utterance, dialogue act, and discourse history. When the dialogue corpus is not sufficient to cover the domain, we also apply manually constructed situation-based rules, mainly for meta-level dialogue management.


A Reconsideration of Asymmetries of Bracketing Paradoxes in English Derivation: a Corpus-based Approach

  • Kim, Jin-hyung
    • Journal of English Language & Literature / v.55 no.3 / pp.475-495 / 2009
  • In this paper, I discuss some asymmetries of bracketing paradoxes from a corpus-based perspective. Through a critical examination of previous analyses of bracketing paradoxes, it is demonstrated that the cases of apparent asymmetries of bracketing paradoxes are consistently accounted for when combined with frequency-based parsability in morphological processing. Based on relative frequency, this paper argues that bracketing paradoxes are well-attested when their immediate bases are frequent and productive enough to be accessed as a unit and stored as such in memory. This is an extension of Hay (2002), which conducted a comprehensive survey of differential frequency effects in suffix pairs. The frequency-based approach to bracketing paradoxes adopted in this paper can challenge conventional formal theory by assuming a major role for language use, and it has the potential to significantly advance our understanding of the asymmetries observed in real language use.

Phospholipids from Bombycis corpus and Their Neurotrophic Effects

  • Yeon Jung;Kwon, Hak-Cheol;Cho, Se-Yeon;Cho, Ock-Ryun;Yang, Min-Cheol;Kim, Sun-Yeou;Lee, Kang-Ro
    • Proceedings of the Korean Society of Sericultural Science Conference / 2003.10a / pp.58-65 / 2003
  • This study was carried out to investigate the active constituents of Bombycis corpus that affect neurite outgrowth from PC12 cells. Three phospholipids (4-6) and three aromatic amines (1-3) were obtained from the methanol extract of Bombycis corpus. Based on spectral data, their structures have been elucidated as nicotinamide (1), cytidine (2), adenine (3), 1-O-(9Z-octadecenoyl)-2-O-(8Z,11Z-octadecadienoyl)-sn-glycero-3-phosphorylcholine (4), 1,2-di-O-hexadecanoyl-sn-glycero-3-phosphorylcholine (5) and 1,2-di-O-9Z-octadecenoyl-sn-glycero-3-phosphorylcholine (6). (omitted)


Generating a Category Set of Words Using a Hierarchical Part-of-speech System and Tagged Corpus

  • Kojima, Takeyuki;Kotani, Yoshiyuki
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.217-226 / 2002
  • In this paper, we propose a method of generating a proper categorization of morphemes, given a hierarchical part-of-speech system and a corpus tagged using this part-of-speech system. Our method uses hierarchical information in the part-of-speech system and statistical information in the corpus to generate a category set. The statistical information is based on the context of occurrence of categories. First, we specify the format of the given information. Then, we describe an algorithm to generate a proper categorization. Finally, we present the results of our experiments in applying this method. We obtained a moderately proper categorization and found several candidates for improvement.


Corpus-based analysis of the usage of Korean markers -(n)un and -i/ka in editorial texts

  • Kim, Kyoung-Young
    • Language and Information / v.19 no.2 / pp.19-36 / 2015
  • The aim of this paper is to investigate the usage of the Korean markers -(n)un and -i/ka in editorial texts, focusing on information structure. Noun phrases ending with the markers -(n)un and -i/ka were annotated semi-automatically using a corpus obtained from an online newspaper. Two important factors determining the choice of marker were examined with the annotated data: referential givenness/newness and position in a sentence. Referential givenness and newness were adopted as indicators of the information-structure categories topic and focus, respectively. In addition to quantitative analysis, qualitative analysis was conducted on the selected data. The results suggest that both -(n)un and -i/ka could carry a topic and a focus reading. Sentence position also played a crucial role in determining the marker, and the marker -i/ka was used more frequently in a later position of a sentence than the marker -(n)un.
