• Title/Summary/Keyword: noun phrase (명사구)

Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis (구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성)

  • Park, Mi-Hwa;Won, Hyeong-Seok;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1219-1229 / 1999
  • It is generally known that search experts achieve high retrieval performance with boolean queries, because queries using operators such as AND, OR, and NOT can express a user's search intent precisely; ordinary users, however, are not used to expressing their information needs in boolean form. To achieve both better retrieval performance and user convenience, this paper proposes a method that automatically converts a user's natural language query into an extended boolean query. First, the natural language query is parsed with a parser based on categorial grammar to produce syntactic trees, and each tree is simplified by extracting operator and keyword information. Next, noun phrases are composed from the simplified tree, weights are assigned to the keywords, and a boolean query is generated and used for retrieval. In addition, to minimize the loss of retrieval performance caused by parsing errors, we propose an N-BEST average method that generates a boolean query from each of the top N syntactic trees and searches with each of them. In experiments on KTSET2.0, a test collection for information retrieval, the proposed method performed 8% better than manually constructed boolean queries and 23% better than an existing natural language query system based on the vector space model.
Abstract: There is considerable evidence that trained users can achieve good search effectiveness with boolean queries, because a structured boolean query containing operators such as AND, OR, and NOT can represent a user's information need more accurately. However, it is not easy for ordinary users to construct a boolean query with appropriate boolean operators. In this paper, we propose a boolean query formulation method that automatically transforms a user's natural language query into an extended boolean query, for both effectiveness and user convenience. First, the user's natural language query is syntactically analyzed with the KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are structurally simplified by a tree-simplifying mechanism in order to capture the logical relationships between keywords. Next, plausible noun phrases are identified in the simplified tree and added to the same tree as new keywords. Finally, the simplified syntactic tree is automatically converted into a boolean query using mapping rules and linguistic heuristics. We also propose an N-BEST average method that uses the top N syntactic trees to compensate for the bad effects of a single incorrect top-ranked syntactic tree.
In experiments using KTSET2.0, we showed that the proposed method outperformed a traditional vector space model by 23% and, surprisingly, also outperformed manually constructed boolean queries by 8%.
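The last two steps of the pipeline (turning a simplified tree into a boolean query, and combining the top-N trees) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the KCCG parse has already been simplified into operator nodes and weighted keyword leaves, the `term^weight` notation and the names `Keyword`, `OpNode`, `to_query`, and `n_best_average` are hypothetical, and it does not implement extended boolean (for example p-norm) scoring.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Keyword:
    """Leaf of a simplified syntactic tree: a query term and its weight."""
    term: str
    weight: float = 1.0


@dataclass
class OpNode:
    """Internal node carrying a boolean operator: "AND", "OR", or "NOT"."""
    op: str
    children: List["Node"] = field(default_factory=list)


Node = Union[Keyword, OpNode]


def to_query(node: Node) -> str:
    """Serialize a simplified tree into a weighted boolean query string."""
    if isinstance(node, Keyword):
        # term^weight is a hypothetical weighting syntax used only for illustration
        return f"{node.term}^{node.weight:g}"
    if node.op == "NOT":
        return f"NOT {to_query(node.children[0])}"
    joined = f" {node.op} ".join(to_query(child) for child in node.children)
    return f"({joined})"


def n_best_average(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average document scores over the queries built from the top-N trees.

    Assumption: a document missing from a run scores 0 in that run.
    """
    totals: Dict[str, float] = {}
    for scores in runs:
        for doc, score in scores.items():
            totals[doc] = totals.get(doc, 0.0) + score
    return {doc: total / len(runs) for doc, total in totals.items()}


tree = OpNode("AND", [
    Keyword("information", 2.0),
    OpNode("OR", [Keyword("retrieval"), Keyword("search")]),
])
print(to_query(tree))  # (information^2 AND (retrieval^1 OR search^1))
print(n_best_average([{"d1": 0.9, "d2": 0.3}, {"d1": 0.5}]))
```

Averaging over several parses means a single wrong top-ranked tree only pulls a document's score part of the way down, which is the motivation the abstract gives for N-BEST average.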

Domain Analysis on the Field of Open Access by Co-Word Analysis: Based on Published Journals of Library and Information Science during 2013 to 2018 (동시출현단어 분석을 활용한 오픈액세스 분야의 지적구조 분석: 2013년부터 2018년까지 출판된 문헌정보학 저널을 기반으로)

  • Kim, Sun-Kyum;Kim, Wan-Jong;Seo, Tae-Sul;Choi, Hyun-Jin
    • Journal of Korean Library and Information Science Society / v.50 no.1 / pp.333-356 / 2019
  • Open access has emerged as an alternative for overcoming the crisis in scholarly communication caused by commercial publishers. The purpose of this study is to present an intellectual structure that reflects the newest research trends in the field of open access, to identify how the subject area is structured by using co-word analysis, and to compare and analyze the results with an existing study. To do so, a dataset of 761 library and information science papers published from January 2012 to November 2018 was collected from Web of Science, and 2,321 noun-phrase keywords were extracted from their titles and abstracts. To analyze the intellectual structure of open access, 13 topic clusters were extracted through network analysis, and keywords with higher centrality were identified by visualizing the intellectual relationships. In addition, after cluster analysis, the relationships were analyzed by plotting the results on a multidimensional scaling map. The study is expected to help guide the future direction of open access research.
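Co-word analysis begins by counting how often two keywords appear in the same paper; those counts define the keyword network on which clustering and centrality are computed. A minimal sketch, assuming each paper has already been reduced to its noun-phrase keywords. The keyword lists are invented, and degree centrality is used only as an example, since the abstract does not say which centrality measure was computed.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set


def cooccurrence(docs: List[List[str]]) -> Counter:
    """Count, for each unordered keyword pair, the papers in which both occur."""
    pairs: Counter = Counter()
    for keywords in docs:
        # set() ignores repeats within one paper; sorted() gives a canonical pair order
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def degree_centrality(pairs: Counter) -> Dict[str, float]:
    """Share of the other keywords each keyword co-occurs with at least once."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}


# Hypothetical keyword lists for three papers.
docs = [
    ["open access", "repository", "journal"],
    ["open access", "journal"],
    ["open access", "citation"],
]
pairs = cooccurrence(docs)
print(pairs.most_common(1))                        # [(('journal', 'open access'), 2)]
print(degree_centrality(pairs)["open access"])     # 1.0
```

A real study would usually also weight edges by the pair counts and normalize them (for example with cosine or association strength) before clustering; the sketch stops at the raw network.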

A perceptual study of the wh-island constraint in Seoul Korean (서울말의 wh-섬 제약 지각 연구)

  • Yun, Weonhee
    • Phonetics and Speech Sciences / v.13 no.2 / pp.27-35 / 2021
  • This study investigated the status of the wh-island constraint in Seoul Korean. Syntactic movement of a wh-phrase out of an embedded sentence to take wide scope at LF is known to be invalid because it violates the wh-island constraint, but some studies have found such movement possible when the sentence is read with a wh-intonation. We conducted perceptual tests in which subjects listened to each of four types of interrogative sentences and selected an answer. Three types contained 'Nugu-leul', which is the accusative form both of the wh-phrase 'who' and of the indefinite 'someone'. The fourth type contained a person's name. 'Nugu-leul' and the name were placed in the same embedded sentence to see whether subjects accepted a matrix-scope interpretation of the wh-phrase. Response times were converted to normalized log response times and examined for differences in the time taken to select an answer across interrogative sentence types. The results showed that subjects clearly preferred the matrix-scope interpretation for sentences read with a wh-intonation, and that selecting the matrix-scope interpretation took longer than selecting an answer for any other type of interrogative sentence. We concluded that the wh-island constraint in Seoul Korean is weak.
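The abstract mentions "normalized log response time" without defining the normalization. A common choice, assumed here and not confirmed by the source, is to log-transform each raw response time and then z-score the logs within each subject, so that fast and slow listeners become comparable before averaging by sentence type. The condition labels and timings below are invented.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Trial = Tuple[str, float]  # (sentence type, raw response time in ms)


def normalize_log_rt(data: Dict[str, List[Trial]]) -> Dict[str, List[Trial]]:
    """Log-transform each subject's RTs, then z-score them within that subject."""
    result: Dict[str, List[Trial]] = {}
    for subject, trials in data.items():
        logs = [math.log(rt) for _, rt in trials]
        mu, sd = mean(logs), pstdev(logs)
        result[subject] = [
            # A subject whose RTs are all identical gets z = 0 instead of dividing by 0.
            (cond, (lg - mu) / sd if sd > 0 else 0.0)
            for (cond, _), lg in zip(trials, logs)
        ]
    return result


def mean_by_type(normalized: Dict[str, List[Trial]]) -> Dict[str, float]:
    """Average normalized log RT per sentence type, pooled over subjects."""
    groups: Dict[str, List[float]] = {}
    for trials in normalized.values():
        for cond, z in trials:
            groups.setdefault(cond, []).append(z)
    return {cond: mean(zs) for cond, zs in groups.items()}


# Hypothetical data: two subjects, one trial per sentence type.
data = {
    "s1": [("wh-intonation", 1800.0), ("name", 900.0)],
    "s2": [("wh-intonation", 1500.0), ("name", 1000.0)],
}
res = mean_by_type(normalize_log_rt(data))
print({cond: round(z, 3) for cond, z in res.items()})  # {'wh-intonation': 1.0, 'name': -1.0}
```

A positive mean for a sentence type then reads as "slower than that subject's own average", which is the kind of comparison the abstract draws for the matrix-scope answers.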

A study about the aspect of translation on 'Hu(怖)' in novel 『Kokoro』 - Focusing on novels translated in Korean and English - (소설 『こころ』에 나타난 감정표현 '포(怖)'에 관한 번역 양상 - 한국어 번역 작품과 영어 번역 작품을 중심으로 -)

  • Yang, Jung-soon
    • Cross-Cultural Studies / v.53 / pp.131-161 / 2018
  • Emotional expressions are expressions that show the internal state of the mind or consciousness. They include vocabulary that describes emotion; sentence types that express emotion, such as exclamatory sentences and rhetorical questions; interjections; forms of address; causatives; passives; attitudinal adverbs; and writing style. This study focuses on vocabulary that describes emotion and analyzes how expressions of the emotion 'Hu(怖)' (fear) in "Kokoro" were translated. Translation patterns were analyzed in three categories: part of speech, handling of subjects, and classification of meanings. The results showed that expressions of 'Hu(怖)' were in some cases translated with the equivalents suggested in dictionaries, but not always. Words describing the emotion 'Hu(怖)' in Japanese sentences were mostly translated into the corresponding parts of speech in Korean, although some adverbs required an added verb, and other vocabulary was also added or substituted to heighten the emotion. In English, however, the part-of-speech correspondence differed from Korean. Japanese sentences that expressed 'Hu(怖)' with verbs were in many cases translated into English as passive participial expressions of verbs such as 'fear', 'dread', 'worry', and 'terrify'. Idioms were translated with a focus on the function of the sentence rather than its form. Where 'Hu(怖)' was expressed with adverbs, the translations did not use the corresponding verbs of 'Hu(怖)'; instead, they were mostly rendered as passive participial expressions or adjectives built on words such as 'dread', 'worry', and 'terrify'. In simple sentences, the main agents of emotion appeared in the first person and the third person.
When the main agent of emotion was in the first person, the Korean translations preserved the basic Japanese word order, although adverbs of time and adverbs of degree tended to be added. In the English translations, the first-person main agent of emotion was placed in the subject position; in some cases, however, things or the causes of events were placed in the subject position to convey the degree of 'Hu(怖)' the main agent experienced. When the main agent of emotion was in the third person, expressions of conjecture and supposition, or of particular visual and auditory evidence, were added in translation. In simple sentences without an overt main agent of emotion, Korean allowed the subject to be omitted even though it was an essential component, because it could be recovered from context. These omitted subjects were made explicit in the English translations, and they were not necessarily the humans who were the main agents of emotion; they could also be things or causes of events that specified the expression of emotion.

A Harmful Site Judgement Technique based on Text (문자 기반 유해사이트 판별 기법)

  • Jung, Kyu-Cheol;Lee, Jin-Kwan;Lee, Taehun;Park, Kihong
    • The Journal of Korean Association of Computer Education / v.7 no.5 / pp.83-91 / 2004
  • This research set up a system that distinguishes 'harmful information sites', which adversely affect teenagers' emotional health, from 'general sites'. To block such harmful sites, it uses content-based filtering. Instead of existing methods, it selects the most frequently used compound keywords on a site and sums the harmfulness degree points of those words, using the harmful-word classification suggested by the ICEC (Information Communication Ethics Committee). To test the blocking system, the harmfulness threshold for classifying a site as harmful was set to 3.5 based on the results of a preliminary study; 100 'harmful information sites' and 100 'general sites' were then selected at random and classified with the new system. The system classified 78% of the harmful sites as 'harmful information sites' and 96% of the general sites as 'general sites', confirming the validity of the new classification system.
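The scoring rule described above (take a page's most frequent keywords, sum their harmfulness points, compare with 3.5) can be sketched as follows. Only the 3.5 threshold comes from the abstract; the `top_k` cutoff, the inclusive `>=` comparison, and the word weights are hypothetical, since the ICEC word list and its point values are not reproduced in the source.

```python
from collections import Counter
from typing import Dict, List

THRESHOLD = 3.5  # harmfulness cutoff reported in the abstract


def harmfulness_score(tokens: List[str], degrees: Dict[str, float], top_k: int = 20) -> float:
    """Sum the harmfulness points of the top_k most frequent tokens on a page."""
    frequent = [word for word, _ in Counter(tokens).most_common(top_k)]
    # Words absent from the harmful-word list contribute 0 points.
    return sum(degrees.get(word, 0.0) for word in frequent)


def classify(tokens: List[str], degrees: Dict[str, float],
             top_k: int = 20, threshold: float = THRESHOLD) -> str:
    """Label a page 'harmful' when its summed score reaches the threshold."""
    score = harmfulness_score(tokens, degrees, top_k)
    return "harmful" if score >= threshold else "general"


# Hypothetical harmful-word weights and page contents.
degrees = {"casino": 1.5, "gamble": 2.0, "bet": 1.0}
page_a = ["casino", "casino", "bet", "gamble", "news"]
page_b = ["news", "weather", "casino"]
print(classify(page_a, degrees))  # harmful  (1.5 + 1.0 + 2.0 = 4.5)
print(classify(page_b, degrees))  # general  (1.5)
```

With 100 sites in each class, the reported rates correspond to 78 + 96 = 174 of 200 sites classified correctly, or 87% overall.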

e-Learning Course Reviews Analysis based on Big Data Analytics (빅데이터 분석을 이용한 이러닝 수강 후기 분석)

  • Kim, Jang-Young;Park, Eun-Hye
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.2 / pp.423-428 / 2017
  • Today, large and varied volumes of educational information are growing and spreading rapidly through the Internet and smart devices. As e-learning use increases, many instructors and students (learners) need to set goals for maximizing learning outcomes and the efficiency of the education system, based on big data analytics of education history data recorded online. In this paper, the author applied the Word2Vec algorithm (a neural network algorithm) to find similarities among education-related words, and a clustering algorithm to classify them, in order to recognize and analyze recorded online education history data objectively. When Word2Vec was applied to education words, words with related meanings could be found and classified, and they obtained similar vector values through repeated training. In addition, the experimental results with the clustering algorithm showed that the parts of speech (noun, verb, adjective, and adverb) have the same shortest distance from the centroid.
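Once Word2Vec has produced a vector per word, the two analyses in the abstract reduce to measuring similarity between vectors and clustering them. A minimal pure-Python sketch under stated assumptions: the 2-D vectors below are invented stand-ins for trained embeddings (no Word2Vec training is done here), similarity is cosine similarity, and the clustering is plain k-means with Euclidean distance, which may differ from the algorithm the author used.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def kmeans(vectors: Dict[str, Vector], k: int, iters: int = 20,
           seed: int = 0) -> Tuple[Dict[str, int], List[Vector]]:
    """Plain k-means: returns word -> cluster index, plus the final centroids."""
    rng = random.Random(seed)
    words = list(vectors)
    # Initialize centroids with k distinct words chosen at random.
    centroids = [list(vectors[w]) for w in rng.sample(words, k)]
    assign: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each word goes to its nearest centroid.
        for w in words:
            assign[w] = min(range(k), key=lambda c: math.dist(vectors[w], centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assign[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Hypothetical 2-D vectors standing in for trained Word2Vec embeddings.
vecs = {"lecture": [1.0, 0.1], "class": [0.9, 0.2], "easy": [0.1, 1.0], "hard": [0.2, 0.9]}
print(cosine(vecs["lecture"], vecs["class"]) > cosine(vecs["lecture"], vecs["easy"]))  # True
assign, centroids = kmeans(vecs, k=2)
print(assign["lecture"] == assign["class"], assign["easy"] == assign["hard"])  # True True
```

The distance from each word to its cluster centroid, which the abstract compares across parts of speech, can be read off with `math.dist(vecs[w], centroids[assign[w]])`.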

Geographical Cognition and the literary Geography Figuration of the 'Dong Hae' in Joseon Dynasty (조선시대의 '동해'에 대한 지리인식과 문학적 형상)

  • Lee, Seung-Su;Oh, Il-Whan
    • Journal of the Korean association of regional geographers / v.16 no.5 / pp.441-456 / 2010
  • While the name Dong Hae originally referred to a compass direction, it became a proper noun carrying knowledge accumulated over a long history. The Dong Hae has been the setting for many historic events and cultural meanings, and it is also a natural geographical space. This study examined our ancestors' geographical cognition of the Dong Hae, along with its historical and cultural meanings and the variety of figurations described in literature. Intellectuals of the Joseon period understood global geography at a higher level and recognized the geography and topography of the land within the structure they had identified. From this perspective, Mt. Baekdu, one of the two mountains that dominated the world, governed the geography of the Liaodong plain, the Korean peninsula, and Japan. Historical geographers have long mentioned the Dong Hae whenever they discussed Mt. Baekdu and the Baekdudaegan (Great Range). Together, the Baekdudaegan and the Dong Hae form a complete symbol of the national territory, with great size, depth, height, breadth, dignity, and magnanimity. The cultural figurations of the Dong Hae were classified into four categories. First, in Korean literature, the Dong Hae strengthened the spirit of those who lived on the Korean peninsula. Second, the Dong Hae served as a basis for perceiving and reflecting on the principles of life and the world. Third, the view of the Dong Hae from the Baekdudaegan changed people's cognition of the land. Finally, the Dong Hae was where the hard lives of people in fishing villages unfolded and where a variety of customs and trades developed dynamically.

Sprachtypologische Fehleranalyse - Im Vergleich der deutschen und koreanischen Sprache - (독일어와 한국어를 비교한 언어 유형적 분석)

  • Park, Jin-Gil
    • Koreanische Zeitschrift für Deutsche Sprachwissenschaft / v.7 / pp.1-24 / 2003
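  • So far we have discussed error analysis between German and Korean, attempting in particular to identify and analyze several error types from a language-typological perspective. The results can be broadly summarized as follows. German and Korean exhibit fundamentally opposing linguistic phenomena along with some commonalities, and these form regular patterns. This presumably stems mainly from the determinism (Determinismus) of language acquisition rooted in the human language acquisition device, and the systematicity, regularity, or consistency of linguistic features and problems reflects it. It is highly significant that mastering a tiny fraction of the vast body of language data, namely the linguistic minimum, can maximize its usefulness; if this is researched and put to use, enormous benefits and possibilities can be expected.
(1) The most essential point in learning German and Korean and in analyzing errors is the typologically revealed contrast between head-initial ordering (German/English) and head-final ordering (Korean). Systematizing the opposing aspects and shared problems that arise from this contrast is likewise the key to error analysis. It is also significant that German, in the course of developing from a head-final language (like Korean, TXV) toward a head-initial language (like English, SVX), still remains at the intermediate stage TVX, because this can be seen as a clue connecting the polarity and the similarity of the two languages.
(2) Agreement (Kongruenz), correlation (Korrelation), and repetition (Wiederholung), together with case inflection and person inflection, serve alongside word order as morphological means of grammatical confirmation. Because these mostly appear systematically and structurally, learners tend to make errors of the same type repeatedly. To overcome this, learners need to understand typological error analysis and, through it, study in ways that reduce or prevent errors.
(3) Nouns inflecting for gender, number, and case within determiner phrases, and verbs inflecting for the person and number of the subject within verb phrases, are very unfamiliar phenomena for Korean speakers. This can be seen as the inevitable relationship of adaptation, and its outcome, between German, a strongly form-oriented language that uses these devices to confirm its two-sided modification structure and its valency, and Korean, a head/minimal-sentence language characterized by a single pre-head modification structure and non-finite verbs. These two types can likewise be schematized and formalized according to their linguistic characteristics.
(4) The bracket phenomenon, in which the main head (Hauptkopf) is separated from the secondary head (Nebenkopf), is a very unfamiliar linguistic phenomenon for Korean speakers, because Korean has no head movement. Forgetting or misplacing the secondary head in long constructions is a predictable outcome in all bracket constructions. This can, however, be understood as a transitional process between head-initial and head-final languages, since it feels as though the main head has left its original position and leapt to the front of its phrase or clause.
(5) German, in which pre- and post-modification operate flexibly, has a more complex but more balanced structure than Korean, which is fixed to pre-modification only. With such modification structures, Korean learners frequently make errors in form and word order, as well as in translation.
(6) Most important, however, is making proper use of what one knows. Until one can speak and write the native language or L2 freely, that is, for language acquisition itself, there is no need whatsoever to use grammar, charts, or schemas. This was the consistent warning of seventeenth-century Western rationalist philosophers, and it is only natural that it shines all the more in today's high-speed scientific civilization. Although language acquisition is the process of mastering the grammatical system contained in a language, this warning eloquently shows that the way to achieve it, or the shortcut to it, lies in authentic language data and experience, never in grammar or abstract conceptual approaches. The key issue is to guarantee as many opportunities as possible for language practice and conversation through dialogue, which is the greatest strength even of native-language education. An environment that encourages language interference should also be avoided. In this sense, developing teaching methods is likely the key to success in a foreign language or L2.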
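(7) To overcome errors in language learning, what matters first is conversational practice that matches real situations, together with effective contact with language data, namely reading and imitation. Secondarily and more directly, learners must continually practice utterances and sentences through which syntax (Syntax) and morphology (Morphologie) can be acquired. This is also the shortcut to satisfying the linguistic minimum and achieving acquisition. Whether in naturalistic native-language learning or in guided, institutional language learning, texts built around schemas and charts and bound to grammatical systems that do not fit real situations offer only a temporary hope, because they result from a failure to understand the human language acquisition device. A grammar- and concept-oriented approach requires extensive explanation and therefore inevitably takes away the urgently needed data and experience. Moreover, ignoring learners' level and relying solely on the native language for this purpose, thereby producing learners who can neither speak nor read the language, is truly foolish. In the age of knowledge, information, and globalization, education and language policy are above all the driving force of national development. In particular, the ability and methods for learning foreign languages, including English, are very important, and the most urgent task is to ensure that learners have the basic ability to speak and write. One task toward this goal is cross-linguistic error analysis. Just as the acquisition and use of language are systematic, error analysis is also quite systematic; hence determinism (Determinismus) is the dominant view of human language acquisition and the language acquisition device. In this sense, it is a decisive fact that a language can be mastered only through the three key elements of language acquisition: being born as a human equipped with a language acquisition device, consistent experience and learning in that language, and thereby satisfying the linguistic minimum. Preparing for the opening of the education market through accountable education that reaches out to students as customers is, without doubt, both a demand of the times and a decisive step.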

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan;Jeon, Ho-Cheol;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. Since its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the deep web alone currently contains over 500 billion pages. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck; in fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. Retrieval effectiveness, however, is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user's query. Furthermore, the most relevant documents do not necessarily appear at the top of the results. Current search tools also cannot retrieve, from the gigantic collection, the documents related to a retrieved document. The most important problem for many current search systems is therefore to improve search quality, that is, to provide related documents and to keep the number of unrelated documents in the results as low as possible. To address this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are useful for a number of purposes, including literature search and analysis of the academic literature. In this scheme, references contained in academic articles give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articles with the cited works.
Citation indexes were originally designed mainly for information retrieval. Citation links allow the literature to be navigated in unique ways: papers can be located independently of language and of the words in the title, keywords, or document. A citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). However, CiteSeer cannot index links between articles that researchers have not made, because it indexes only the links researchers create when they cite other articles. For the same reason, CiteSeer does not scale easily. These problems motivate the design of a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. Each document is converted into a table in which each extracted predicate is checked against its possible subjects and objects. Using these tables, we build a hierarchical graph for each document and then integrate the graphs of all documents. From the graph of the entire collection, the area of each document is calculated relative to the integrated documents, and relations among documents are marked by comparing these areas. The paper also proposes a method for the structural integration of documents that retrieves documents from the graph, making it easier for the user to find information. We compared the performance of the proposed approach with the Lucene search engine using ranking formulas. The F-measure was about 60%, an improvement of about 15%.
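The title refers to Formal Concept Analysis (FCA), which derives a concept hierarchy from a table of objects and the attributes they have, much like the document-by-predicate table described above. The sketch below is a generic textbook construction of formal concepts, not the authors' integration or area computation: the context (documents mapped to extracted terms) is hypothetical, and the naive enumeration is exponential, so it only suits toy inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# A formal concept: (extent = set of documents, intent = set of shared attributes).
Concept = Tuple[FrozenSet[str], FrozenSet[str]]


def common_attributes(objs, context: Dict[str, Set[str]], all_attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = all_attrs
    for obj in objs:
        shared = shared & context[obj]
    return shared


def objects_having(attrs: FrozenSet[str], context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Objects that carry every attribute in attrs."""
    return frozenset(obj for obj, have in context.items() if attrs <= have)


def formal_concepts(context: Dict[str, Set[str]]) -> List[Concept]:
    """Naive enumeration: close every subset of objects into a concept."""
    all_attrs = frozenset().union(*context.values())
    objs = list(context)
    found = set()
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            intent = common_attributes(subset, context, all_attrs)
            found.add((objects_having(intent, context), intent))
    # Order from most specific (smallest extent) to most general.
    return sorted(found, key=lambda c: (len(c[0]), sorted(c[0])))


def is_subconcept(a: Concept, b: Concept) -> bool:
    """a lies below b in the concept hierarchy iff a's extent is a subset of b's."""
    return a[0] <= b[0]


# Hypothetical context: documents described by terms extracted from their sentences.
ctx = {
    "doc1": {"search", "index"},
    "doc2": {"search", "citation"},
    "doc3": {"search", "index", "citation"},
}
for extent, intent in formal_concepts(ctx):
    print(sorted(extent), sorted(intent))
```

For this toy context the loop prints four concepts, from `['doc3']` with all three terms up to `['doc1', 'doc2', 'doc3']` sharing only `'search'`; `is_subconcept` gives the edges of the hierarchy. Practical systems use incremental algorithms such as NextClosure rather than enumerating every subset.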

A Study on Korean Language Translation of Chinese Traditional Hansi in the 1910s and 1920s (1910~20년대 시인의 전통 한시 국역 양상과 의미 연구 - 최남선, 김소월, 김억, 이광수를 중심으로 -)

  • Chung, So-yeon
    • Journal of Korean Classical Literature and Education / no.34 / pp.149-191 / 2017
  • This study examines Korean translations of traditional Chinese hansi in the 1910s and 1920s. In the 20th century, many poets translated Chinese and Korean traditional hansi into Korean. In the early 20th century, Korean began to be used as the national public language, and at that time not only hansi but also poetry in several other languages was being translated into Korean. Choi Nam-sun in the 1910s, and Kim So-woel, Kim Eok, and Lee Kwang-su in the 1920s, translated traditional Chinese hansi, focusing on famous Tang dynasty poetry by poets such as Tu Fu and Li Bai. Choi Nam-sun's translations in the 1910s treated poetry as written literature. In contrast, Kim So-woel, Kim Eok, and Lee Kwang-su believed that modern Korean verse should be song as well as poetry, and their translations in the 1920s aimed to create songs as spoken literature by emphasizing orality and universality. Although Korean is now the national language, the literary history of hansi continues in modern poetry.