• Title/Abstract/Keywords: computational linguistic research

Search results: 9

An Algorithm for Predicting the Relationship between Lemmas and Corpus Size

  • Yang, Dan-Hee;Gomez, Pascual Cantos;Song, Man-Suk
    • ETRI Journal
    • /
    • Vol. 22, No. 2
    • /
    • pp.20-31
    • /
    • 2000
  • Much research on natural language processing (NLP), computational linguistics, and lexicography has relied on linguistic corpora. In recent years, many organizations around the world have been constructing their own large corpora to achieve corpus representativeness and/or linguistic comprehensiveness. However, there is no reliable guideline as to how large machine-readable corpus resources should be compiled to develop practical NLP software and/or complete dictionaries for human and computational use. In order to shed some new light on this issue, we shall reveal the flaws of several previous studies aiming to predict corpus size, especially those using pure regression or curve-fitting methods. To overcome these flaws, we shall contrive a new mathematical tool, a piecewise curve-fitting algorithm, and then suggest how to determine the tolerance error of the algorithm for good prediction, using a specific corpus. Finally, we shall illustrate experimentally that the algorithm presented is valid, accurate, and very reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, compiling methodology, corpus representativeness, and linguistic comprehensiveness.

  • PDF
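The piecewise curve-fitting idea described in the abstract above can be illustrated with a short sketch. This is not the authors' actual algorithm, only a minimal reading of it: each segment is fitted with a Heaps'-law-style power curve k·N^β (an assumption; the abstract does not commit to this functional form), and a new segment is started once the worst relative fitting error exceeds a tolerance. The names `fit_segment` and `piecewise_fit` and the tolerance value are hypothetical.

```python
import numpy as np

def fit_segment(sizes, lemmas):
    """Fit lemmas ~ k * sizes**beta by linear regression in log space
    (a simple stand-in for the paper's curve-fitting step)."""
    beta, logk = np.polyfit(np.log(sizes), np.log(lemmas), 1)
    return np.exp(logk), beta

def piecewise_fit(sizes, lemmas, tolerance=0.02):
    """Grow each segment until the worst relative error of its fitted
    curve exceeds `tolerance`, then start a new segment (a simplified
    reading of the paper's tolerance-error idea, not its exact algorithm)."""
    segments, start = [], 0
    while start < len(sizes) - 2:
        end, accepted = start + 3, None   # at least 3 points per segment
        while end <= len(sizes):
            k, beta = fit_segment(sizes[start:end], lemmas[start:end])
            pred = k * sizes[start:end] ** beta
            err = np.max(np.abs(pred - lemmas[start:end]) / lemmas[start:end])
            if accepted is not None and err > tolerance:
                break                     # tolerance exceeded: close segment
            accepted = (end, k, beta)
            end += 1
        end, k, beta = accepted
        segments.append((sizes[start], sizes[end - 1], k, beta))
        start = end - 1                   # segments share their boundary point
    return segments

# Synthetic, exactly power-law lemma growth: one segment should suffice.
sizes = np.linspace(1_000, 100_000, 20)
lemmas = 5 * sizes ** 0.7
print(piecewise_fit(sizes, lemmas))
```

On exact power-law data the routine returns a single segment; on real lemma-growth curves the tolerance governs how many segments are produced.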

보조용언 '하다' 구성의 전산언어학적 연구 (Computational Linguistics Study of the Construction of the Auxiliary Verb 'hada')

  • 홍혜란
    • 언어사실과 관점
    • /
    • Vol. 47
    • /
    • pp.495-535
    • /
    • 2019
  • The purpose of this study is to investigate the distributional characteristics of the auxiliary verb construction 'hada' in its morphological and semantic aspects, in a corpus composed of four language registers (academic prose, newspaper, fiction, and spoken language), by applying computational linguistic research methodology. It also aims to analyze how the discourse function of the 'hada' construction expressing the meaning of 'condition/assumption' is performed, and from that, to investigate the mechanism by which the construction performs various semantic functions at the discourse level. The study shows that the 'hada' construction realizes its primary grammatical meaning through the added meanings of linguistic features such as the connective ending combining with the preceding verb, final endings, particles, and formulaic expressions, and that this meaning performs various discourse functions according to contextual conditions such as formality, the relationship between the participants in the utterance, the content of the utterance, and the speaker's attitude. From this, it can be seen that the discourse function is not fixed: it is a new, additional meaning obtained at the discourse level, which includes various contexts, and it is characterized by a contextual dependency that can change if some of these conditions differ.

Differentiation of Aphasic Patients from the Normal Control Via a Computational Analysis of Korean Utterances

  • Kim, HyangHee;Choi, Ji-Myoung;Kim, Hansaem;Baek, Ginju;Kim, Bo Seon;Seo, Sang Kyu
    • International Journal of Contents
    • /
    • Vol. 15, No. 1
    • /
    • pp.39-51
    • /
    • 2019
  • Spontaneous speech provides rich information defining the linguistic characteristics of individuals. As such, computational analysis of speech would enhance the efficiency of evaluating patients' speech. This study aims to provide a method to differentiate persons with and without aphasia based on language usage. Ten aphasic patients and their counterpart normal controls participated, and they were all tasked with describing a set of given words. Their utterances were linguistically processed and compared to each other. Computational analyses from PCA (Principal Component Analysis) to machine learning were conducted to select the relevant linguistic features and, consequently, to classify the two groups based on the features selected. It was found that functional words, not content words, were the main differentiator of the two groups. The most viable discriminators were demonstratives, function words, sentence-final endings, and postpositions. The machine learning classification model was found to be quite accurate (90%) and impressively stable. This study is noteworthy as the first attempt to use computational analysis to characterize the word usage patterns of Korean aphasic patients, thereby discriminating them from the normal group.
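As a toy illustration of the PCA-plus-machine-learning pipeline the abstract describes, the sketch below classifies two small groups from four feature counts. The feature means, the scikit-learn pipeline, and the logistic-regression classifier are all assumptions for illustration; the paper's actual features, model, and data are not reproduced here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Hypothetical per-speaker counts of the four discriminators the paper
# reports (demonstratives, function words, sentence-final endings,
# postpositions); real values would come from morphological analysis.
rng = np.random.default_rng(0)
controls = rng.normal([12, 40, 15, 30], 3, size=(10, 4))
aphasics = rng.normal([4, 20, 6, 12], 3, size=(10, 4))
X = np.vstack([controls, aphasics])
y = np.array([0] * 10 + [1] * 10)        # 0 = control, 1 = aphasic

# Dimensionality reduction followed by a simple linear classifier.
clf = make_pipeline(PCA(n_components=2), LogisticRegression())
scores = cross_val_score(clf, X, y, cv=5)
print(round(scores.mean(), 2))
```

With well-separated synthetic means the cross-validated accuracy is high; the 90% figure in the abstract refers to the authors' own data, not this toy setup.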

교량구조시스템의 유지관리를 위한 퍼지 신뢰성해석 모델 (Fuzzy Reliability Analysis Models for Maintenance of Bridge Structure Systems)

  • 김종길;손용우;이증빈;이채규;안영기
    • 한국전산구조공학회:학술대회논문집
    • /
    • Proceedings of the 2003 Fall Conference of the Computational Structural Engineering Institute of Korea
    • /
    • pp.103-114
    • /
    • 2003
  • This paper aims to propose a method that helps maintenance engineers evaluate the damage states of bridge structure systems by using Fuzzy Fault Tree Analysis. Fuzzy Fault Tree Analysis can be very useful for the systematic and rational fuzzy reliability assessment of real bridge structure systems, because the approach is able to deal effectively with all the related bridge structural element damages in terms of linguistic variables that systematically incorporate experts' experience and subjective judgment. This paper considers these uncertainties by providing a fuzzy reliability-based framework and shows that the identification of the optimum maintenance scenario is a straightforward process. This is achieved by using the computer program LIFETIME, which can consider the effects of various types of actions on the fuzzy reliability index profile of deteriorating structures. Only the effect of maintenance interventions is considered in this study; however, any environmental or mechanical action affecting the fuzzy reliability index profile can be considered in LIFETIME. Numerical examples of deteriorating bridges are presented to illustrate the capability of the proposed approach. Further development and implementation of this approach are recommended for future research.

  • PDF
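The linguistic damage grades mentioned in the abstract are typically encoded as fuzzy numbers and combined through fault-tree gates. The snippet below is a minimal sketch of that idea, not the paper's model: triangular fuzzy numbers are stored as (a, m, b) vertices, and possibilistic AND/OR gates are taken vertex-wise, which coincides with the fuzzy min/max of triangular numbers. The element names and grade values are invented.

```python
# Triangular fuzzy number as (a, m, b). Possibilistic fault-tree gates:
# AND = fuzzy min, OR = fuzzy max; for triangular numbers these reduce
# to vertex-wise min/max (a simplified model, not the paper's exact one).
def fuzzy_and(x, y):   # series logic: both events must occur
    return tuple(min(a, b) for a, b in zip(x, y))

def fuzzy_or(x, y):    # parallel logic: either event suffices
    return tuple(max(a, b) for a, b in zip(x, y))

# Hypothetical fuzzy damage grades elicited from expert judgment.
girder  = (0.2, 0.4, 0.6)
bearing = (0.1, 0.3, 0.5)
deck    = (0.3, 0.5, 0.7)

# Top event: (girder AND bearing) OR deck.
top_event = fuzzy_or(fuzzy_and(girder, bearing), deck)
print(top_event)  # → (0.3, 0.5, 0.7)
```

Here the deck damage dominates the top event, which is the kind of diagnostic signal a maintenance engineer would read off the tree.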

Robust Syntactic Annotation of Corpora and Memory-Based Parsing

  • Hinrichs, Erhard W.
    • 한국언어정보학회:학술대회논문집
    • /
    • Language, Information, and Computation: Proceedings of the 16th Pacific Asia Conference (2002)
    • /
    • pp.1-1
    • /
    • 2002
  • This talk provides an overview of current work in my research group on the syntactic annotation of the Tübingen corpus of spoken German and of the German Reference Corpus (Deutsches Referenzkorpus: DEREKO) of written texts. Morpho-syntactic and syntactic annotation, as well as annotation of function-argument structure, for these corpora is performed automatically by a hybrid architecture that combines robust symbolic parsing with finite-state methods ("chunk parsing" in the sense of Abney) and with memory-based parsing (in the sense of Daelemans). The resulting robust annotations can be used by theoretical linguists, who are interested in large-scale empirical data, and by computational linguists, who need training material for a wide range of language technology applications. To aid retrieval of annotated trees from the treebank, a query tool, VIQTORYA, with a graphical user interface and a logic-based query language has been developed. VIQTORYA allows users to query the treebanks for linguistic structures at the word level, at the level of individual phrases, and at the clausal level.

  • PDF

한국어 문장분석과 어휘정보의 연결에 관한 연구 (A Study of the Interface between Korean Sentence Parsing and Lexical Information)

  • 최병진
    • 한국언어정보학회지:언어와정보
    • /
    • Vol. 4, No. 2
    • /
    • pp.55-68
    • /
    • 2000
  • The efficiency and stability of an NLP system depend crucially on how its lexicon is organized. The lexicon ought to encode linguistic generalizations and exceptions thereof. Nowadays many computational linguists tend to construct such lexical information in an inheritance hierarchy, and DATR is well suited for this purpose. In this research I construct a DATR lexicon in order to parse Korean sentences using QPATR, which is implemented on the basis of a unification-based grammar developed in Düsseldorf. In this paper I want to show the interface between a syntactic parser (QPATR) and the DATR formalism representing lexical information. The QPATR parser can extract the lexical information from the DATR lexicon, which is organized hierarchically.

  • PDF
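The default-inheritance idea behind a DATR lexicon can be mimicked in a few lines. The sketch below is not DATR (which uses path equations), just a dictionary-based analogue with invented Korean verb entries: an attribute is looked up locally first and otherwise inherited from the parent node, so an irregular entry can override a default inherited by regular verbs.

```python
# Minimal default-inheritance lexicon echoing the DATR idea
# (illustrative only; entries and affixes are invented examples).
LEXICON = {
    "VERB":  {"parent": None,   "cat": "verb", "past": "-ess-"},      # defaults
    "mekta": {"parent": "VERB", "stem": "mek"},                       # regular
    "hada":  {"parent": "VERB", "stem": "ha", "past": "-yess-"},      # override
}

def lookup(entry, attr):
    """Return `attr` for `entry`, walking up the inheritance chain."""
    node = LEXICON[entry]
    while node is not None:
        if attr in node:
            return node[attr]
        node = LEXICON.get(node["parent"])  # climb to the parent node
    return None

print(lookup("mekta", "past"))  # → -ess- (inherited from VERB)
print(lookup("hada", "past"))   # → -yess- (local override)
```

This is the generalization-plus-exception encoding the abstract attributes to inheritance hierarchies: regular behavior lives once at the parent node, and only exceptions are stated locally.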

초등학교 영어교과를 적용한 프로그래밍 교육 모델 개발 (A Study on the Development of Programming Education Model Applying English Subject in Elementary School)

  • 허미연;김갑수
    • 정보교육학회논문지
    • /
    • Vol. 21, No. 5
    • /
    • pp.497-507
    • /
    • 2017
  • Research on linking software education with other subjects has so far been concentrated on mathematics and science. This cannot satisfy students' diverse subject preferences and learning-style types and may therefore widen learning gaps. It is also undesirable given that software education should deal with solving diverse convergent problems to which computational thinking can be applied. This study therefore links software education with the English subject, a linguistic approach departing from the existing mathematical-scientific one, so as to embrace students' varied dispositions and preferences, and seeks to improve educational effectiveness by drawing on the similarities in process and method between learning English and learning the new language of software education. To this end, based on an analysis of teaching-learning models for elementary English and software education, the existing models were modified to develop an instructional model suitable for this linkage. Learning elements applicable to software education were then extracted from the elementary English curriculum, and a program applying them to the developed instructional model was designed to explore practical uses in the classroom.

토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구 (The Stream of Uncertainty in Scientific Knowledge using Topic Modeling)

  • 허고은
    • 정보관리학회지
    • /
    • Vol. 36, No. 1
    • /
    • pp.191-213
    • /
    • 2019
  • Scientific knowledge is obtained through researchers' work. Researchers deal with scientific uncertainty and build up the certainty of scientific knowledge; uncertainty is thus recognized as a necessary stage through which scientific knowledge must pass. Existing work characterizing uncertainty has been introduced through linguistic research on hedging, and computational linguistics has built uncertainty-word corpora by hand. Previous studies, however, have only characterized the uncertainty of particular academic domains based on the simple occurrence frequency of uncertainty words. This study therefore examines the patterns of scientific knowledge based on uncertainty words over time in the biomedical literature, where biomedical claims within sentences play an important role. To this end, biomedical propositions were analyzed on the basis of the semantic predications provided by the biomedical ontology UMLS, and DMR topic modeling, which is well suited to identifying patterns across research fields, was applied to comprehensively identify trends in uncertainty-based topics for biomedical entities. We confirmed that, over time, the expression of scientific knowledge develops in a pattern of decreasing uncertainty.
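Before any topic modeling, studies like the one above start from surface counts of hedging cues. The sketch below shows only that preliminary step, with an invented six-word cue lexicon and three toy "abstracts"; the paper's actual pipeline (UMLS semantic predications plus DMR topic modeling) is far richer and is not reproduced here.

```python
# Tiny hedging-cue lexicon (an illustrative subset; real studies use
# hand-built uncertainty corpora as mentioned in the abstract).
HEDGES = {"may", "might", "suggest", "possibly", "appear", "likely"}

def hedge_ratio(text):
    """Fraction of tokens that are hedging cues (crude whitespace tokenizer)."""
    tokens = text.lower().split()
    return sum(t.strip(".,") in HEDGES for t in tokens) / len(tokens)

# Invented one-sentence 'abstracts' mimicking decreasing hedging over time.
docs_by_year = {
    2000: "the protein may possibly regulate growth",
    2010: "results suggest the protein regulates growth",
    2019: "the protein regulates growth in all trials",
}
ratios = {y: round(hedge_ratio(d), 2) for y, d in docs_by_year.items()}
print(ratios)  # → {2000: 0.33, 2010: 0.17, 2019: 0.0}
```

The declining ratio across the three toy documents mirrors the paper's finding that the expression of scientific knowledge grows less uncertain over time.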

Memory Organization for a Fuzzy Controller.

  • Jee, K.D.S.;Poluzzi, R.;Russo, B.
    • 한국지능시스템학회:학술대회논문집
    • /
    • Fifth International Fuzzy Systems Association World Congress (1993)
    • /
    • pp.1041-1043
    • /
    • 1993
  • Fuzzy logic based control theory has gained much interest in the industrial world, thanks to its ability to formalize and solve in a very natural way many problems that are very difficult to quantify at an analytical level. This paper shows a solution for treating membership functions inside hardware circuits. The proposed hardware structure optimizes the memory size by using a particular form of vectorial representation. The process of memorizing fuzzy sets, i.e. their membership functions, has always been one of the more problematic issues for hardware implementation, due to the quite large memory space that is needed. To simplify such an implementation, it is common practice [1,2,8,9,10,11] to limit the membership functions either to those having a triangular or trapezoidal shape, or to a pre-defined shape. These kinds of functions are able to cover a large spectrum of applications with a limited usage of memory, since they can be memorized by specifying very few parameters (height, base, critical points, etc.). This, however, results in a loss of computational power due to computation on the intermediate points. A solution to this problem is obtained by discretizing the universe of discourse U, i.e. by fixing a finite number of points and memorizing the values of the membership functions at such points [3,10,14,15]. Such a solution provides a satisfying computational speed and a very high precision of definition, and gives users the opportunity to choose membership functions of any shape. However, a significant memory waste can occur as well: it is indeed possible that, for each of the given fuzzy sets, many elements of the universe of discourse have a membership value equal to zero. It has also been noticed that in almost all cases the common points among fuzzy sets, i.e. points with non-null membership values, are very few.
More specifically, in many applications, for each element u of U there exist at most three fuzzy sets for which the membership value is not null [3,5,6,7,12,13]. Our proposal is based on such hypotheses. Moreover, we use a technique that, even though it does not restrict the shapes of the membership functions, strongly reduces the computational time for the membership values and optimizes the function memorization. Figure 1 represents a term set whose characteristics are common for fuzzy controllers and to which we will refer in the following. The above term set has a universe of discourse with 128 elements (so as to have a good resolution), 8 fuzzy sets that describe the term set, and 32 levels of discretization for the membership values. Clearly, the numbers of bits necessary for the given specifications are 5 for 32 truth levels, 3 for 8 membership functions, and 7 for 128 levels of resolution. The memory depth is given by the dimension of the universe of discourse (128 in our case) and is represented by the memory rows. The length of a word of memory is defined by: Length = nfm × (dm(m) + dm(fm)), where nfm is the maximum number of non-null values for any element of the universe of discourse, dm(m) is the dimension of a membership value m, and dm(fm) is the dimension of the word representing the index of the membership function. In our case, then, Length = 3 × (5 + 3) = 24. The memory dimension is therefore 128 × 24 bits. If we had chosen to memorize all values of the membership functions, we would have needed to memorize on each memory row the membership value of each element: the fuzzy-set word dimension is 8 × 5 bits, so the dimension of the memory would have been 128 × 40 bits. Coherently with our hypothesis, in fig. 1 each element of the universe of discourse has a non-null membership value on at most three fuzzy sets.
Focusing on the elements 32, 64, and 96 of the universe of discourse, they will be memorized as follows. The computation of the rule weights is done by comparing the bits that represent the index of the membership function with the word of the program memory. The output bus of the Program Memory (μCOD) is given as input to a comparator (combinatory net). If the index is equal to the bus value, then one of the non-null weights derives from the rule and is produced as output; otherwise the output is zero (fig. 2). It is clear that the memory dimension of the antecedent is reduced in this way, since only non-null values are memorized. Moreover, the time performance of the system is equivalent to the performance of a system using vectorial memorization of all weights. The dimensioning of the word is influenced by some parameters of the input variable. The most important parameter is the maximum number of membership functions (nfm) having a non-null value in each element of the universe of discourse. From our study in the field of fuzzy systems, we see that typically nfm ≤ 3 and there are at most 16 membership functions. At any rate, such a value can be increased up to the physical dimensional limit of the antecedent memory. A less important role in the optimization process of the word dimension is played by the number of membership functions defined for each linguistic term. The table below shows the required word dimension as a function of such parameters and compares our proposed method with the method of vectorial memorization [10]. Summing up, the characteristics of our method are: users are not restricted to membership functions with specific shapes; the number of fuzzy sets and the resolution of the vertical axis have a very small influence on memory space; and weight computations are done by a combinatorial network, so the time performance of the system is equivalent to that of the vectorial method.
The number of non-null membership values on any element of the universe of discourse is limited. Such a constraint is usually not very restrictive, since many controllers obtain a good precision with only three non-null weights. The method briefly described here has been adopted by our group in the design of an optimized version of the coprocessor described in [10].

  • PDF
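The word-length arithmetic in the abstract above can be checked directly. The sketch below recomputes Length = nfm × (dm(m) + dm(fm)) and the two memory totals for the stated configuration (128 elements, 8 fuzzy sets, 32 truth levels, nfm = 3); the function names are ours, not from the paper.

```python
import math

def word_length(n_sets, n_levels, nfm):
    """Bits per memory row in the sparse scheme: nfm non-null entries,
    each storing a membership value plus the index of its fuzzy set."""
    dm_m = math.ceil(math.log2(n_levels))   # bits per membership value
    dm_fm = math.ceil(math.log2(n_sets))    # bits for the fuzzy-set index
    return nfm * (dm_m + dm_fm)

def sparse_memory_bits(n_elements, n_sets, n_levels, nfm):
    return n_elements * word_length(n_sets, n_levels, nfm)

def vectorial_memory_bits(n_elements, n_sets, n_levels):
    # Baseline: store every set's membership value for every element.
    return n_elements * n_sets * math.ceil(math.log2(n_levels))

print(word_length(8, 32, 3))               # → 24, as in the abstract
print(sparse_memory_bits(128, 8, 32, 3))   # → 3072 (128 x 24 bits)
print(vectorial_memory_bits(128, 8, 32))   # → 5120 (128 x 40 bits)
```

The sparse scheme thus saves 2048 bits over vectorial memorization for this configuration, which is the trade-off the abstract quantifies.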