• Title/Summary/Keyword: Compound Noun Analysis

Chunking of Contiguous Nouns using Noun Semantic Classes (명사 의미 부류를 이용한 연속된 명사열의 구묶음)

  • Ahn, Kwang-Mo;Seo, Young-Hoon
    • The Journal of the Korea Contents Association / v.10 no.3 / pp.10-20 / 2010
  • This paper presents a strategy for chunking contiguous noun sequences using semantic classes. We call a sequence of contiguous nouns that can be treated as a single noun a compound noun phrase. To chunk compound noun phrases, we use noun pairs extracted from a syntactically tagged corpus together with their semantic class pairs. For reliability, the noun pairs and semantic classes are built from the syntactically tagged corpus and the detailed dictionary of the Sejong corpus. Compound noun phrases of arbitrary length can also be chunked with this information. In total, 38,940 pairs of 'left noun - right noun', 65,629 pairs of 'left noun - semantic class of right noun', 46,094 pairs of 'semantic class of left noun - right noun', and 45,243 pairs of 'semantic class of left noun - semantic class of right noun' are used for compound noun phrase chunking. The test data are 1,000 sentences not used in training, each containing contiguous nouns of length greater than 2, randomly selected from the Sejong morphologically tagged corpus. The experimental result is 86.89% precision, 80.48% recall, and an F-measure of 83.56%.
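
The reported scores can be checked against each other. The following is a minimal sketch that assumes the F-measure in the abstract is the balanced F1 (the harmonic mean of precision and recall); the abstract does not say so explicitly, and the function name `f_measure` is chosen here for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
print(round(f_measure(86.89, 80.48), 2))  # prints 83.56
```

Under that assumption, the three reported figures are mutually consistent.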

Integrated Indexing Method using Compound Noun Segmentation and Noun Phrase Synthesis (복합명사 분할과 명사구 합성을 이용한 통합 색인 기법)

  • Won, Hyung-Suk;Park, Mi-Hwa;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.27 no.1 / pp.84-95 / 2000
  • In this paper, we propose an integrated indexing method that combines compound noun segmentation with noun phrase synthesis. Compound noun segmentation uses statistical information, and noun phrase synthesis uses natural language processing techniques. First, we choose index terms from simple words using the results of morphological analysis and part-of-speech tagging. Second, noun phrases are automatically synthesized from the syntactic analysis results. If syntactic analysis fails, only the morphological analysis and tagging results are applied. Third, we select compound nouns from the tagging results and then segment and re-synthesize them using statistical information. In this way, segmented and synthesized terms are used together as index terms to supplement the single terms. We demonstrate the effectiveness of the proposed integrated indexing method for Korean compound noun processing using KTSET2.0 and KRIST SET, which are standard test collections for Korean information retrieval.

Morphological Analysis of the Korean Language (한국어의 형태소해석)

  • Lee, Soo-Hyon;Ozawa, S.;Lee, Joo-Keun
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.4 / pp.53-61 / 1989
  • This paper describes a morphological analysis that extracts the information required for syntactic and semantic analysis of the Korean language. Nouns and particles are separated within a noun phrase, selection conditions are specified for analyzing compound nouns, and a restoring rule is given for processing irregular compound nouns. Stems and endings are separated in regular verbals, and a logical representative form is proposed for anomalously inflected words and contracted vowels. The logical representation is composed of an attribute value and an analyzing rule. Noun redundancy in the dictionary is reduced because a verb of the "Nounformed HA-" type is processed as "noun" and "HA-" separately, and the predicative "IDA" is analyzed by a Q parameter. A processing form for negation is also derived, and the morphemes and basic structure of compound predicative parts are presented.

Error-driven Noun-Connection Rule Extraction for Morphological Analysis (오류에 기반한 복합명사 좌우접속규칙 사전 구축)

  • Lee, Kong Joo;Lee, Songwook
    • Journal of Advanced Marine Engineering and Technology / v.36 no.8 / pp.1123-1128 / 2012
  • The goal of this research is to develop error-driven noun-connection rules for breaking up compound nouns in a Korean morphological analysis module. We collected compound nouns from Web sites and analyzed them with CnuMa. Whenever we found errors in the analyzer's output, we wrote noun-connection rules to correct them. The noun-connection rules are devised by considering the left and right contexts within compound nouns. The error-driven noun-connection rules improve the precision and recall of the Korean morphological analyzer CnuMa by 2.8% and 10.8%, respectively.

A Method for Compound Noun Extraction to Improve Accuracy of Keyword Analysis of Social Big Data

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.55-63 / 2021
  • Since social big data often includes new words or proper nouns, statistical morphological analysis methods, which are based on the frequency of occurrence of each word, have been widely used to process it. However, these methods do not properly recognize compound nouns, which lowers the accuracy of keyword extraction. This paper presents a method to extract compound nouns in keyword analysis of social big data. The proposed method creates a candidate group of compound nouns by combining the words obtained through the morphological analysis step, and extracts compound nouns by examining their frequency of appearance in a given review. Two algorithms are proposed, differing in how the candidate group is constructed, and the performance of each algorithm is expressed and compared with formulas. The comparison is verified through experiments on real data collected online, and the results also show that the proposed method is suitable for real-time processing.
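
The candidate-and-frequency idea in this abstract can be illustrated with a small sketch. It is not either of the paper's two algorithms, which the abstract does not specify. It assumes the input is already a list of noun tokens per review (the output of some morphological analyzer), builds candidates by joining contiguous tokens, and keeps those whose count reaches a threshold. The names `extract_compound_nouns`, `min_count`, and `max_len`, and the sample data, are hypothetical.

```python
from collections import Counter


def extract_compound_nouns(token_lists, min_count=2, max_len=3):
    """Join contiguous tokens (2..max_len) into candidates and keep frequent ones.

    token_lists: one list of noun tokens per review (assumed pre-analyzed).
    Returns a dict mapping each kept candidate to its count.
    """
    counts = Counter()
    for tokens in token_lists:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                # Korean compound nouns are written without internal spaces.
                counts["".join(tokens[i:i + n])] += 1
    return {cand: c for cand, c in counts.items() if c >= min_count}


# Hypothetical noun tokens from three short reviews.
reviews = [
    ["배터리", "수명", "만족"],  # battery, life, satisfaction
    ["배터리", "수명", "부족"],  # battery, life, shortage
    ["화면", "밝기", "만족"],    # screen, brightness, satisfaction
]
print(extract_compound_nouns(reviews, min_count=2, max_len=2))
# prints {'배터리수명': 2}
```

Only "배터리수명" (battery life) occurs twice, so it is the only candidate kept at `min_count=2`.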

Segmentation of Korean Compound Nouns Using Semantic Category Analysis of Unregistered Nouns (미등록어의 의미 범주 분석을 이용한 복합명사 분해)

  • Kang Yu-Hwan;Seo Young-Hoon
    • Journal of Information Technology Applications and Management / v.11 no.4 / pp.95-102 / 2004
  • This paper proposes a method of segmenting compound nouns that include unregistered nouns into the correct combination of unit nouns, using characteristics of person names, loanwords, and location names. A Korean person name is generally composed of 3 syllables, only a relatively small number of syllables are used as last names, and the combination of the second and third syllables is somewhat restricted. Many person names also appear with clue words in compound nouns. Most loanwords have one or more syllables that cannot appear in Korean words, or have syllable sequences different from those of usual Korean words. Location names are generally used with clue words designating districts in compound nouns. Using these characteristics to analyze compound nouns not only makes segmentation more accurate but also helps natural language systems use the semantic categories of those unregistered nouns. Experimental results show that the precision of our method is approximately 98% on average. The precision of person name and loanword recognition is about 94% and about 92%, respectively.
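
The person-name cues described in this abstract (three syllables, a small set of surname syllables, an accompanying clue word) can be sketched as a simple check. This is an illustration only, not the paper's method. The surname and clue-word lists below are small hypothetical samples, the restriction on second and third syllables is omitted, and the input is assumed to use precomposed Hangul syllables so that one character equals one syllable.

```python
# Illustrative samples only; a real system would use much larger lists.
COMMON_SURNAMES = ("김", "이", "박", "최", "정")
CLUE_WORDS = ("교수", "사장", "씨")  # professor, president, Mr./Ms.


def looks_like_person_name(candidate: str) -> bool:
    """Three syllables, the first of which is a known surname syllable."""
    return len(candidate) == 3 and candidate[0] in COMMON_SURNAMES


def split_name_and_clue(compound: str):
    """Split 'name + clue word' compounds; return None if no cue applies."""
    for clue in CLUE_WORDS:
        if compound.endswith(clue):
            head = compound[: -len(clue)]
            if looks_like_person_name(head):
                return [head, clue]
    return None


print(split_name_and_clue("김철수교수"))  # prints ['김철수', '교수']
print(split_name_and_clue("컴퓨터교수"))  # prints None
```

The second call returns `None` because "컴" is not in the sample surname list, so the three-syllable head is not treated as a name.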

Modifiers and Compound Sentences Processing of a Korean-Japanese Machine Translation System (한국어-일본어 기계번역 시스템의 수식어 처리와 중문처리)

  • Joo, I.S.;Paik, M.H.;Jin, J.H.;Lim, S.T.;Lim, I.C.
    • Proceedings of the KIEE Conference / 1987.07b / pp.1046-1049 / 1987
  • This paper proposes a Korean-Japanese machine translation system that processes unregistered words, modifiers, and compound sentences. In morphological analysis, unregistered words are handled by an unregistered word processing algorithm. Modifiers are processed by consulting noun attributes and grammar rules. The compound sentence processing algorithm determines whether a sentence that includes commas is a compound sentence. The system runs on IBM-PC/AT DOS using Prolog-1.

An Analysis of Korean Word Spacing Errors Made by Chinese Learners (중국인 한국어 학습자의 글쓰기에 나타난 띄어쓰기 오류 양상 및 지도 방향)

  • Wang, Yuan
    • Korean Educational Research Journal / v.40 no.1 / pp.59-79 / 2019
  • The purpose of this study is to analyze, through questionnaires and interviews, spacing errors in Chinese students' Korean writing and, by analyzing the causes of those errors, to propose changes to the teaching methods used for Chinese learners. Analysis of the learners' writing samples found a total of 148 spacing errors. The rate of errors made by combining separate words (77.6%) is much higher than the rate of errors made by placing a space within a compound word (22.4%). Among the error types, "noun + noun," "observer (type) + dependent noun," and postpositional particle errors occur most frequently. In this paper, we propose directions for teaching spacing for nouns and particles, from both the deductive side and the inductive side.

Compound Noun Analysis Strengthened Unknown Noun Processing (미등록어 처리가 강화된 복합명사 분해)

  • Kim, Eung-Gyun;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology / 2003.10d / pp.40-46 / 2003
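  • This paper proposes a compound noun segmentation method that combines a reuse segmentation algorithm based on segmentation patterns with unregistered word estimation through loanword recognition, person name recognition, and place name recognition. In the reuse segmentation algorithm, segmentation is performed by reusing the segmentation methods already applied to syllable sequences shorter than the one currently being segmented. Loanword recognition exploits the fact that loanwords are made up of syllables that are used relatively infrequently in Korean. Person names are recognized using the fact that, because Korean names are formed by borrowing the readings of Chinese characters, a limited set of syllables is used repeatedly. Place name recognition analyzes the patterns in which place names appear and recognizes them by searching a place name dictionary. Using dictionary-based place name recognition together with algorithmic recognition of loanwords and person names improves the accuracy of unregistered word estimation and contributes to higher segmentation accuracy. In experiments, accuracy was about 98% on roughly 1,500 eojeol (spacing units) containing unregistered words, and about 99% after all of the unregistered words had been added to the dictionary.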
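
The idea of reusing results obtained for shorter syllable sequences resembles memoized dictionary segmentation, which can be sketched as follows. This is a standard technique substituted for illustration, not the paper's pattern-based reuse algorithm. The dictionary contents, the name `make_segmenter`, and the preference for the fewest unit nouns are assumptions.

```python
from functools import lru_cache


def make_segmenter(dictionary):
    """Build a segmenter over a fixed set of unit nouns (hypothetical data)."""
    words = frozenset(dictionary)

    @lru_cache(maxsize=None)
    def segment(text: str):
        """Return the split with the fewest unit nouns, or None if impossible.

        Results for each suffix are cached, so a shorter sequence that was
        already segmented is reused rather than recomputed.
        """
        if not text:
            return ()
        best = None
        for i in range(1, len(text) + 1):
            head = text[:i]
            if head in words:
                rest = segment(text[i:])
                if rest is not None and (best is None or len(rest) + 1 < len(best)):
                    best = (head,) + rest
        return best

    return segment


segment = make_segmenter({"정보", "검색", "시스템"})  # information, retrieval, system
print(segment("정보검색시스템"))  # prints ('정보', '검색', '시스템')
```

A compound containing a substring that is not in the dictionary returns `None`, which is the point at which an unregistered-word estimator such as the one described above would take over.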

Workbench for Constructing Dictionary for Semantic Analysis of Compound Noun (합성명사 의미해석용 사전 구축을 위한 워크벤치)

  • 이경순;김도완;최기선
    • Proceedings of the Korean Society for Cognitive Science Conference / 2000.06a / pp.149-155 / 2000
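  • In this paper, we design and implement a workbench for the semantic interpretation of compound nouns, which occur frequently in Korean. To support building a dictionary for compound noun semantic interpretation, the workbench defines semantic relation patterns that identify the semantic relation by which the nouns making up a compound noun are combined. Compound nouns are extracted automatically using the defined semantic relation patterns. Using the extracted compound noun dictionary, the workbench provides an environment in which the semantic relations of compound nouns can be interpreted by also applying those relations to the hypernyms of each noun.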
