• Title/Summary/Keyword: Lexical model

Search Result 100, Processing Time 0.025 seconds

Generate of OCL on XML Sechma Meta Model (XML 스키마 메타모델에서 OCL 생성)

  • Lee Don-Yang;Choi Han-Yong
    • The Journal of the Korea Contents Association
    • /
    • v.6 no.6
    • /
    • pp.42-49
    • /
    • 2006
  • XML used rapid method of meta language representation in internet for information transmission. In addition to XML Schema used frequency specification to variety data type. This thesis designed to Simple Type meta model of XML schema using UML. But because structure of XML schema complicate and suppose variety data type we can recognize many difficult matter to user's apprehension and application of model properties that appeared UML. To way out of this matter this study could specified clearly to structured expression in XML schema meta model that is applied OCL specification and together, come up with method of detailed design to parse tree and token generation for lexical and symmentics analysis in compile step on this study foundation.

  • PDF

KorLexClas 1.5: A Lexical Semantic Network for Korean Numeral Classifiers (한국어 수분류사 어휘의미망 KorLexClas 1.5)

  • Hwang, Soon-Hee;Kwon, Hyuk-Chul;Yoon, Ae-Sun
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.1
    • /
    • pp.60-73
    • /
    • 2010
  • This paper aims to describe KorLexClas 1.5 which provides us with a very large list of Korean numeral classifiers, and with the co-occurring noun categories that select each numeral classifier. Differently from KorLex of other POS, of which the structure depends largely on their reference model (Princeton WordNet), KorLexClas 1.0 and its extended version 1.5 adopt a direct building method. They demand a considerable time and expert knowledge to establish the hierarchies of numeral classifiers and the relationships between lexical items. For the efficiency of construction as well as the reliability of KorLexClas 1.5, we use following processes: (1) to use various language resources while their cross-checking for the selection of classifier candidates; (2) to extend the list of numeral classifiers by using a shallow parsing techniques; (3) to set up the hierarchies of the numeral classifiers based on the previous linguistic studies; and (4) to determine LUB(Least Upper Bound) of the numeral classifiers in KorLexNoun 1.5. The last process provides the open list of the co-occurring nouns for KorLexClas 1.5 with the extensibility. KorLexClas 1.5 is expected to be used in a variety of NLP applications, including MT.

Prosodic Boundary Effects on the V-to-V Lingual Movement in Korean

  • Cho, Tae-Hong;Yoon, Yeo-Min;Kim, Sa-Hyang
    • Phonetics and Speech Sciences
    • /
    • v.2 no.3
    • /
    • pp.101-113
    • /
    • 2010
  • The present study investigated how the kinematics of the /a/-to-/i/ tongue movement in Korean would be influenced by prosodic boundary. The /a/-to-/i/ sequence was used as 'transboundary' test materials which occurred across a prosodic boundary as in /ilnjəʃ$^h$a/ # / minsakwae/ ('일년차#민사과에' 'the first year worker' # 'dept. of civil affairs'). It also tested whether the V-to-V tongue movement would be further influenced by its syllable structure with /m/ which was placed either in the coda condition (/am#i/) or in the onset condition (/a#mi). Results of an EMA (Electromagnetic Articulagraphy) study showed that kinematical parameters such as the movement distance (displacement), the movement duration, and the movement velocity (speed) all varied as a function of the boundary strength, showing an articulatory strengthening pattern of a "larger, longer and faster" movement. Interestingly, however, the larger, longer and faster pattern associated with boundary marking in Korean has often been observed with stress (prominence) marking in English. It was proposed that language-specific prosodic systems induce different ways in which phonetics and prosody interact: Korean, as a language without lexical stress and pitch accent, has more degree of freedom to express prosodic strengthening, while languages such as English have constraints, so that some strengthening patterns are reserved for lexical stress. The V-to-V tongue movement was also found to be influenced by the intervening consonant /m/'s syllable affiliation, showing a more preboundary lengthening of the tongue movement when /m/ was part of the preboundary syllable (/am#i/). The results, together, show that the fine-grained phonetic details do not simply arise as low-level physical phenomena, but reflect higher-level linguistic structures, such as syllable and prosodic structures. It was also discussed how the boundary-induced kinematic patterns could be accounted for in terms of the task dynamic model and the theory of the prosodic gesture ($\pi$-gesture).

  • PDF

A Study on the Automatic Lexical Acquisition for Multi-lingustic Speech Recognition (다국어 음성 인식을 위한 자동 어휘모델의 생성에 대한 연구)

  • 지원우;윤춘덕;김우성;김석동
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.6
    • /
    • pp.434-442
    • /
    • 2003
  • Software internationalization, the process of making software easier to localize for specific languages, has deep implications when applied to speech technology, where the goal of the task lies in the very essence of the particular language. A greatdeal of work and fine-tuning has gone into language processing software based on ASCII or a single language, say English, thus making a port to different languages difficult. The inherent identity of a language manifests itself in its lexicon, where its character set, phoneme set, pronunciation rules are revealed. We propose a decomposition of the lexicon building process, into four discrete and sequential steps. For preprocessing to build a lexical model, we translate from specific language code to unicode. (step 1) Transliterating code points from Unicode. (step 2) Phonetically standardizing rules. (step 3) Implementing grapheme to phoneme rules. (step 4) Implementing phonological processes.

Empirical Study for Automatic Evaluation of Abstractive Summarization by Error-Types (오류 유형에 따른 생성요약 모델의 본문-요약문 간 요약 성능평가 비교)

  • Seungsoo Lee;Sangwoo Kang
    • Korean Journal of Cognitive Science
    • /
    • v.34 no.3
    • /
    • pp.197-226
    • /
    • 2023
  • Generative Text Summarization is one of the Natural Language Processing tasks. It generates a short abbreviated summary while preserving the content of the long text. ROUGE is a widely used lexical-overlap based metric for text summarization models in generative summarization benchmarks. Although it shows very high performance, the studies report that 30% of the generated summary and the text are still inconsistent. This paper proposes a methodology for evaluating the performance of the summary model without using the correct summary. AggreFACT is a human-annotated dataset that classifies the types of errors in neural text summarization models. Among all the test candidates, the two cases, generation summary, and when errors occurred throughout the summary showed the highest correlation results. We observed that the proposed evaluation score showed a high correlation with models finetuned with BART and PEGASUS, which is pretrained with a large-scale Transformer structure.

Integration of WFST Language Model in Pre-trained Korean E2E ASR Model

  • Junseok Oh;Eunsoo Cho;Ji-Hwan Kim
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.6
    • /
    • pp.1692-1705
    • /
    • 2024
  • In this paper, we present a method that integrates a Grammar Transducer as an external language model to enhance the accuracy of the pre-trained Korean End-to-end (E2E) Automatic Speech Recognition (ASR) model. The E2E ASR model utilizes the Connectionist Temporal Classification (CTC) loss function to derive hypothesis sentences from input audio. However, this method reveals a limitation inherent in the CTC approach, as it fails to capture language information from transcript data directly. To overcome this limitation, we propose a fusion approach that combines a clause-level n-gram language model, transformed into a Weighted Finite-State Transducer (WFST), with the E2E ASR model. This approach enhances the model's accuracy and allows for domain adaptation using just additional text data, avoiding the need for further intensive training of the extensive pre-trained ASR model. This is particularly advantageous for Korean, characterized as a low-resource language, which confronts a significant challenge due to limited resources of speech data and available ASR models. Initially, we validate the efficacy of training the n-gram model at the clause-level by contrasting its inference accuracy with that of the E2E ASR model when merged with language models trained on smaller lexical units. We then demonstrate that our approach achieves enhanced domain adaptation accuracy compared to Shallow Fusion, a previously devised method for merging an external language model with an E2E ASR model without necessitating additional training.

PC-KIMMO-based Description of Mongolian Morphology

  • Jaimai, Purev;Zundui, Tsolmon;Chagnaa, Altangerel;Ock, Cheol-Young
    • Journal of Information Processing Systems
    • /
    • v.1 no.1 s.1
    • /
    • pp.41-48
    • /
    • 2005
  • This paper presents the development of a morphological processor for the Mongolian language, based on the two-level morphological model which was introduced by Koskenniemi. The aim of the study is to provide Mongolian syntactic parsers with more effective information on word structure of Mongolian words. First hand written rules that are the core of this model are compiled into finite-state transducers by a rule tool. Output of the compiler was edited to clarity by hand whenever necessary. The rules file and lexicon presented in the paper describe the morphology of Mongolian nouns, adjectives and verbs. Although the rules illustrated are not sufficient for accounting all the processes of Mongolian lexical phonology, other necessary rules can be easily added when new words are supplemented to the lexicon file. The theoretical consideration of the paper is concluded in representation of the morphological phenomena of Mongolian by the general, language-independent framework of the two-level morphological model.

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal
    • /
    • v.44 no.3
    • /
    • pp.413-425
    • /
    • 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to be interpreted even by humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37 717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

An analysis on streetscape using the Model of Emotion Evaluation (가로경관에 대한 감성평가모형 적용 분석 연구)

  • Lee, Jin-Sook;Kim, Ji-Hye
    • Science of Emotion and Sensibility
    • /
    • v.16 no.2
    • /
    • pp.149-156
    • /
    • 2013
  • In this study, the Model of Emotion Evaluation, an emotional analysis actively applied in environmental assessment, was divided into two parts, the abbreviated model and the inferential model, through pilot study and experiment. In addition, an analysis was conducted through the experiment on the attributes of the evaluation vocabularies of two additional types of representative models, the EPA Model and PAD Model, and the results show a huge difference in the development approach and lexical constitution of the two models. It was also identified through factor analysis that the vocabularies were abbreviated according to the respective models. Similarity relationships were analyzed using multidimensional scaling and the results show that mutual relationship was established to some degree. Based on this, we can conclude that, rather than a biased use of the Model of Emotion Evaluation in emotion evaluation, a more objective image analysis is possible by analyzing the characteristics of the model before applying it. In this study, the evaluation target was confined only to the environmental assessment of streetscape and continuous research on the Model of Emotion Evaluation that allows for the comparison of evaluation models in various areas is needed.

  • PDF

Ontology-based models of legal knowledge

  • Sagri, Maria-Teresa;Tiscornia, Daniela
    • 한국디지털정책학회:학술대회논문집
    • /
    • 2004.11a
    • /
    • pp.111-127
    • /
    • 2004
  • In this paper we describe an application of the lexical resource JurWordNet and of the Core Legal Ontology as a descriptive vocabulary for modeling legal domains. It can be viewed as the semantic component of a global standardisation framework for digital governments. A content description model provides a repository of structured knowledge aimed at supporting the semantic interoperability between sectors of Public Administration and the communication processes towards citizen. Specific conceptual models built from this base will act as a cognitive interface able to cope with specific digital government issues and to improve the interaction between citizen and Public Bodies. As a Case study, the representation of the click-on licences for re-using Public Sector Information is presented.

  • PDF