• Title/Summary/Keyword: Compound words

Concept-based Compound Keyword Extraction (개념기반 복합키워드 추출방법)

  • Lee, Sangkon;Lee, Taehun
    • The Journal of Korean Association of Computer Education / v.6 no.2 / pp.23-31 / 2003
  • In general, people use a keyword or a phrase as the name of a field or as a subject word in a document. This paper focuses on keyword extraction. First, we examine cases in which an author suggests keywords that do not occur as content words in the text, and present generation rules that combine compound keywords based on the concepts in lexical information. Moreover, we present a new importance measure to filter out useless keywords that are unrelated to a document's content. To verify the validity of the extraction results, we collect titles and abstracts from research papers on natural language and/or speech processing, and obtain 96% precision at the top rank of the extraction results.

A Study on the Origin of the Clothing Terms and Their Interpretations -Focusing on the Misused Foreign Languages- (의류용어의 원류와 그 의미분석 -오용되는 외래어를 중심으로-)

  • 조규회
    • Journal of the Korean Society of Clothing and Textiles / v.19 no.6 / pp.933-945 / 1995
  • The purpose of this study is to examine foreign-derived clothing terms that are currently misused, clarify their meanings, and suggest unified terms. The results are as follows. First, English and Japanese account for most of the origins of the misused foreign clothing terms, followed by French, German, Portuguese, and Spanish terms that arrived via English and Japanese. In particular, the misused foreign terms for clothing styles and materials also came via English and Japanese. Many of them are Japanese compound words, misused Japanese, or Japanese terms derived from English, French, German, Dutch, Spanish, or Portuguese, and the origins of some terms could not be found (e.g., 색채, 컬러, 카라). For the colors of clothing, the terms follow English notation rules and Japanese pronunciation, and some unified terms are given in Korean, English, and Chinese characters (e.g., 빨강, 레드, 적색). There are many misused foreign terms among sewing terms. In each case, the corresponding English and Japanese words were given for ease of understanding, and most of the unified terms were suggested in Korean (e.g., 하찌사시 → 하자시; padding stitch, 팔자뜨기). In clothing construction, there were many misused Japanese terms and corrupted forms of Japanese terms, so explanations and unified terms were suggested (e.g., 구세토리, 몸새맞춤, 나찌, 가위집 (내기)). Finally, the origins of terms in the history of Western costume were clarified and their meanings analyzed: ① robe; ② jacket, gipon, pourpoint, doublet, justaucorps, habit, frock (coat), cutaway, swallow-tail coat, 배광, lounge suit; ③ coat. The robe is a gown-style garment worn by men and women from the Middle Ages, the jacket is a short, coat-like garment, and the coat is a long outer garment. Their origins differ; however, 'jacket' and 'coat' were used confusedly in the middle of the 19th century.

HMM-based Korean Named Entity Recognition (HMM에 기반한 한국어 개체명 인식)

  • Hwang, Yi-Gyu;Yun, Bo-Hyun
    • The KIPS Transactions:PartB / v.10B no.2 / pp.229-236 / 2003
  • Named entity recognition is a process indispensable to question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method that uses the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns in an NE, and between an NE and its surrounding words. In this paper, we classify words into a word that is an NE in itself, a word in an NE, and/or a word adjacent to an NE, and train an HMM based on NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to handle the variable length of NEs. However, the trigram HMM has a serious data sparseness problem. To solve this problem, we use multi-level back-offs. Experimental results show that our NER system achieves an F-measure of 87.6% on economic articles.
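
The abstract does not spell out its multi-level back-off scheme, so the following is only a minimal sketch of the general idea, assuming a fixed-discount back-off in the style of "stupid back-off" (trigram, then bigram, then unigram). The padding symbol `BOS`, the function names, and the discount `alpha` are all hypothetical, and the returned values are scores rather than normalized probabilities.

```python
from collections import Counter

BOS = "<s>"  # hypothetical start-of-sequence padding symbol

def train(label_sequences):
    """Collect n-gram and context counts from sequences of HMM state labels."""
    counts = {"tri": Counter(), "bi": Counter(), "uni": Counter(),
              "ctx2": Counter(), "ctx1": Counter(), "total": 0}
    for seq in label_sequences:
        padded = [BOS, BOS] + list(seq)
        for i in range(2, len(padded)):
            a, b, c = padded[i - 2], padded[i - 1], padded[i]
            counts["tri"][(a, b, c)] += 1
            counts["ctx2"][(a, b)] += 1   # how often (a, b) served as a context
            counts["bi"][(b, c)] += 1
            counts["ctx1"][b] += 1        # how often b served as a context
            counts["uni"][c] += 1
            counts["total"] += 1
    return counts

def backoff_score(counts, a, b, c, alpha=0.4):
    """Score label c after labels (a, b), backing off when counts are missing."""
    if counts["tri"][(a, b, c)] > 0:      # level 1: trigram evidence
        return counts["tri"][(a, b, c)] / counts["ctx2"][(a, b)]
    if counts["bi"][(b, c)] > 0:          # level 2: discounted bigram
        return alpha * counts["bi"][(b, c)] / counts["ctx1"][b]
    if counts["uni"][c] > 0:              # level 3: doubly discounted unigram
        return alpha ** 2 * counts["uni"][c] / counts["total"]
    # Unseen label: small floor value (an assumption of this sketch).
    return alpha ** 3 / (len(counts["uni"]) + 1)
```

Each level is consulted only when the more specific one has no evidence, which is how a back-off model keeps sparse trigram counts from producing zero transition scores.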

The Effects of Korean Lexical Characteristics on Memory Span (한국어 어휘특성들이 기억폭에 미치는 효과)

  • Park Tae-Jin;Park Sun-Hee;Kim Tae-Ho
    • Korean Journal of Cognitive Science / v.17 no.1 / pp.15-27 / 2006
  • The effects on memory span of the number of syllables in a Hangul word, the number and location of batchim in the word, and whether the word is compound or noncompound were examined. The results were that (1) the more syllables a word had, the lower its memory span was; (2) the more batchims a two-syllable word had, the lower its memory span was (the Korean batchim effect on memory span); and (3) noncompound words had a higher memory span than compound words. The reading speed of the above-mentioned words was also measured, and the results were that (1) the more syllables a word had, the slower its reading speed was; (2) a two-syllable word was read fastest when it had a batchim on the second syllable, compared with when it had no batchim, a batchim on the first syllable, or batchims on both syllables (the Korean ending-batchim effect on reading speed); and (3) noncompound words were read faster than compound words. The Korean ending-batchim effect on reading speed was not compatible with an explanation based on the articulatory loop but was compatible with an explanation based on the visual cache, where orthographic information is represented. The results suggest that memory span is influenced not only by phonological information but also by orthographic information.

Korean Base-Noun Extraction and its Application (한국어 기준명사 추출 및 그 응용)

  • Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.15B no.6 / pp.613-620 / 2008
  • Noun extraction plays an important part in fields such as information retrieval and text summarization. In this paper, we present a Korean base-noun extraction system and apply it to text summarization in order to handle a huge amount of text effectively. A base noun is an atomic noun, not a compound noun, and we use two techniques: filtering and segmenting. The filtering technique removes non-nominal words from the text before base nouns are extracted, and the segmenting technique separates a particle from a nominal and divides a compound noun into base nouns. We show that both the recall and the precision of the proposed system are about 89% on average under experimental conditions on the ETRI corpus. The proposed system has been applied to a Korean text summarization system and shows satisfactory results.

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

  • Heo, Jeong;Seo, Hee-Cheol;Jang, Myung-Gil
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1073-1089 / 2006
  • The goal of Natural Language Processing (NLP) is to make a computer understand natural language and deliver the meaning of natural language to humans. Word Sense Disambiguation (WSD) is a very important technology for achieving this goal. In this paper, we describe a technology for automatic homonym disambiguation that uses both Mutual Information (MI) and a Sense-Tagged Compound Noun Dictionary. Previous research that used word definitions in a dictionary suffered from data sparseness because it relied on exact word matching. Our work overcomes this problem by using MI, which is a measure of association between words. To reflect language features, the rate of word pairs with MI values, sense frequency, and the size of word definitions are used as weights in our system. We constructed a Sense-Tagged Compound Noun Dictionary for high-frequency compound nouns and used it for homonym disambiguation. The experimental data for testing and evaluating our system were constructed from QA (Question Answering) test data consisting of about 200 query sentences and answer paragraphs. We performed four types of experiments. Using MI alone, the experiment showed a precision of 65.06%. With the weighted values, we achieved a precision of 85.35%, and with the Sense-Tagged Compound Noun Dictionary, we achieved a precision of 88.82%.
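
As a rough illustration of why an association measure helps where exact word matching fails, the sketch below estimates pointwise mutual information from sentence-level co-occurrence and uses it to pick the sense whose definition words are most associated with the context. It assumes the pointwise form of MI and a plain sum of positive scores; the weights described in the abstract are not reproduced, and all function and variable names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def build_mi(sentences):
    """Return a function mi(x, y) estimated from sentence co-occurrence."""
    n = len(sentences)
    word_count, pair_count = Counter(), Counter()
    for sent in sentences:
        words = set(sent)  # count each word once per sentence
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))

    def mi(x, y):
        joint = pair_count[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # the pair never co-occurred
        # log2( P(x, y) / (P(x) * P(y)) ) with probabilities as counts / n
        return math.log2(joint * n / (word_count[x] * word_count[y]))

    return mi

def choose_sense(context_words, sense_definitions, mi):
    """Pick the sense whose definition words associate best with the context."""
    def score(definition_words):
        return sum(max(mi(c, d), 0.0)
                   for c in context_words for d in definition_words)
    return max(sense_definitions, key=lambda s: score(sense_definitions[s]))
```

Because the score relies on association rather than identity, a context word can support a sense even when it never appears verbatim in that sense's definition.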

A Study on Ma Je Kai Shi(麻帝核試) (麻帝核試의 硏究)

  • 김진구
    • The Research Journal of the Costume Culture / v.5 no.4 / pp.6-11 / 1997
  • The purpose of this study was to identify and trace the origins of 麻帝核試, which appears in Kei Rim Yu Sa (鷄林類事). Comparative linguistic analysis was employed for this study. The results revealed that madi (마디) survives as a dialect word for məri [머리 (頭)] in Kyung Sang Province. Thus, the dialect word madi (마디) is considered a survival of 마디 (麻帝) of Koryo. Words similar to 核試 of Koryo were found in Hebrew and Japanese: Heb. k-u-tsi(zi) means locks of hair, and Japanese ku-shi (くシ) has several meanings: comb, head, and the hair of the head. The Koryo word 麻帝核試 is a compound word of madi (麻帝), head, and kəshi (그시) 核試, locks of hair (hair of the head). 核試 of Koryo, Jap. ku shi (くシ), and Heb. k-u-tsi(zi) showed close relationships to one another. The word ku shi(si) 그시 核試 was derived from Heb. k-u-tsi(zi), and Jap. ku shi (くシ) originated from 核試 of Koryo. Kor. ku shi(si) 그시 核試 is a transliteration of Heb. k-u-tsi(zi), and Jap. ku shi (くシ) is a transliteration of Kor. ku shi 그시 核試.

Analysis of Compound Nouns Containing Korean or Foreign Unknown Words (한국어 및 외래어 미등록어를 포함한 복합명사 분석)

  • Kim, Myoung-Sun;Ra, Dong-Yul
    • Proceedings of the Korean Society for Cognitive Science Conference / 2006.06a / pp.73-79 / 2006
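  • This paper presents a compound noun analysis technique with strengthened handling of unknown words. The analysis basically assumes that any compound noun may contain Korean or foreign unknown words. Consequently, decomposition candidates that contain unknown words may be generated even for compound nouns composed entirely of registered words, which greatly increases the number of decomposition candidates. To cope with this problem, we judge, according to the class of the unknown word, whether a segment is plausible as an unknown word and eliminate it if it is not, and we also eliminate candidates through mutual checking among decomposition candidates. This process also affects the selection of the answer candidate: it prevents incorrect decomposition candidates from being selected and has the advantage of reducing processing time. Experiments confirmed that the proposed techniques are very effective.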
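
The candidate explosion described in this abstract can be seen in a small sketch. The code below is not the authors' method: it simply enumerates every split of a compound into lexicon words and unknown segments, applies one crude pruning rule (a hypothetical minimum length for unknown segments), and prefers the candidate with the fewest unknown characters. The function names, the length threshold, and the ranking key are all assumptions, and strings are assumed to use precomposed Hangul syllables so that one character is one syllable.

```python
def decompositions(compound, lexicon, min_unknown_len=2):
    """Yield every split of compound as a list of (segment, is_known) pairs."""
    if not compound:
        yield []
        return
    for end in range(1, len(compound) + 1):
        piece = compound[:end]
        known = piece in lexicon
        # Crude pruning: a very short segment is unlikely to be an unknown word.
        if not known and len(piece) < min_unknown_len:
            continue
        for rest in decompositions(compound[end:], lexicon, min_unknown_len):
            yield [(piece, known)] + rest

def best_decomposition(compound, lexicon):
    """Prefer fewer unknown characters, then fewer segments."""
    candidates = list(decompositions(compound, lexicon))
    if not candidates:  # nothing survived pruning: keep the whole word as unknown
        return [(compound, False)]
    return min(candidates,
               key=lambda cand: (sum(len(seg) for seg, known in cand if not known),
                                 len(cand)))

# 정보 "information" and 검색 "retrieval" are registered; 시스템 "system" is not.
lexicon = {"정보", "검색"}
print(best_decomposition("정보검색시스템", lexicon))
```

Even for 정보검색, which consists only of registered words, the enumeration also produces the whole string as a single unknown segment, which is the kind of spurious candidate the pruning steps in the abstract are meant to remove.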

Comparison of GC-Profile on Tobacco Smoke Components (담배 연기성분의 GC-Profile 특성비교)

  • 나효환;한상빈;복진영;이운철;백순옥;장기철;양광규
    • Journal of the Korean Society of Tobacco Science / v.16 no.2 / pp.152-162 / 1994
  • This study was designed to establish an analytical method for the properties of leaf tobacco smoke. Lyophilized TPM from leaf tobacco smoke was extracted with MeOH, dried under reduced pressure, and trimethylsilylated (TMS). Gas chromatography of the material using an SPB-5 column showed 120 quantifiable peaks. Among those, 26 compounds, including hydrocarbons, neophytadiene, and levulinic acid, could be identified by GC-MS. The smoke properties of 5 manufacturing grades and 2 oriental cultivars of domestic and imported leaf tobacco, including AB3O-1, were analyzed. For flue-cured tobacco, the content of the compounds in the smoke was generally higher in American leaf tobacco, except for glycerol compounds. For burley tobacco, domestic leaves were found to have much higher amounts of smoke compounds than imported leaves. Among oriental tobaccos, Izmir contained slightly higher amounts of smoke compounds than Basma. Key words: GC-profile, TPM, TMS, leaf tobacco.

Extended document format map service for mobile device (모바일 기기를 위한 확장 문서 포맷의 맵 서비스)

  • Kim, Jung Sook
    • Journal of Korea Society of Digital Industry and Information Management / v.6 no.4 / pp.83-94 / 2010
  • Mobile network infrastructure is maturing along with the development of hardware and software for mobile devices. Networking on mobile devices has evolved toward telematics, which extends well beyond its original concept. Telematics is a compound word formed from the words "telecommunication" and "informatics". It refers to control and monitoring services that use mobile device resources. These services respond to users' requests over wired or wireless networks from mobile devices, and the server that offers content and network services collects management information from the mobile devices. Map service is one of the services preferred by many telematics users. However, mobile map service is limited in terms of traffic and information sharing. Therefore, it is very important to supply information to both the service provider and the terminal user. In this paper, we design a new interactive sketch map that uses routes and spatial information effectively, and provide an extended document format, defined with an extensible and dynamic clustering scheme, to make the map service portable across mobile devices.