• Title/Summary/Keyword: 한글목록


The Classification of Korean Adjectives using Case Frame Set (격틀집합을 이용한 한국어 형용사 유형 분류)

  • Jeon, Ji-Eun;Choe, Jae-Woong
    • Annual Conference on Human and Language Technology / 2006.10e / pp.254-261 / 2006
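  • Several studies have argued that case frames play an important role in classifying adjectives. To examine more systematically how case frames contribute to semantic classification, this study uses the "case frame set", defined as the set of case frames that a single lexical item can take. The study aims to test the hypothesis that classifying adjectives by case frame set divides them into groups with high semantic relatedness, and it presents a concrete methodology for testing that hypothesis. Case frame set information is extracted from the per-word case frame information in the Sejong Electronic Dictionary. Of the 101 case frame sets obtained, 12 remain as the main objects of analysis after excluding types with only one case frame and types with fewer than five lexical items; six of these are analyzed in detail. Although each case frame set includes some words whose semantic relatedness is not apparent, most of the words were found to be semantically related. The hypothesis and methodology of this study can be applied to classifying Korean adjectives as a whole and, further, Korean predicates in general.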
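
The grouping step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the toy lexicon, and the case frame labels (`N1-i`, `N1-i N2-ege`, and so on) are all hypothetical, and the real input would come from the Sejong Electronic Dictionary. The two thresholds mirror the exclusion rules stated in the abstract (drop sets with a single case frame, drop sets with fewer than five words).

```python
from collections import defaultdict


def group_by_case_frame_set(lexicon, min_frames=2, min_words=5):
    """Group words that share exactly the same set of case frames.

    lexicon: dict mapping a word to an iterable of case frame labels.
    Returns a dict mapping each frozenset of frames to a sorted word list,
    keeping only sets with at least `min_frames` frames and `min_words` words.
    """
    groups = defaultdict(list)
    for word, frames in lexicon.items():
        # frozenset ignores frame order and duplicates, so it can act as a key
        groups[frozenset(frames)].append(word)
    return {
        frame_set: sorted(words)
        for frame_set, words in groups.items()
        if len(frame_set) >= min_frames and len(words) >= min_words
    }


# Hypothetical toy lexicon; labels and words are placeholders only.
toy_lexicon = {
    "adj_1": ["N1-i", "N1-i N2-ege"],
    "adj_2": ["N1-i N2-ege", "N1-i"],
    "adj_3": ["N1-i"],
    "adj_4": ["N1-i", "N1-i N2-wa"],
    "adj_5": ["N1-i"],
}

# min_words is lowered to 2 here only because the toy lexicon is tiny.
toy_groups = group_by_case_frame_set(toy_lexicon, min_frames=2, min_words=2)
for frame_set, words in toy_groups.items():
    print(sorted(frame_set), words)
```

Using the whole set of frames as the grouping key, rather than any single frame, is what distinguishes this from a per-frame index: two words land in the same group only if their frame inventories match exactly.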


A Study on the Performances of Korean WWW Search Tools (국내 웹 검색도구의 특성 및 탐색 기능 평가에 관한 연구)

  • Lee Lan-Ju;Choi Kyung-Hwa
    • Journal of the Korean Society for Library and Information Science / v.31 no.3 / pp.75-108 / 1997
  • The purpose of this article is to help users select appropriate Korean WWW search tools and use them effectively to retrieve Web documents that meet their information needs. It analyzes the characteristics, functions, advantages, and disadvantages of each search tool, dividing the tools broadly into two categories: subject directory services (Kor-seek, DIR, ZIP, Netquest21, Simmany) and keyword search engines (Simmany, Kachine, Jungbotamjung, Anysearch, Unifinder, Webglider, Missdachanni). It provides criteria for selecting a search tool according to the user's information needs. In addition, by investigating the shortcomings of these tools, the study attempts to contribute to improving the design of Korean WWW search tools.


A Method of Classification of Overseas Direct Purchase Product Groups Based on Transfer Learning (언어모델 전이학습 기반 해외 직접 구매 상품군 분류)

  • Kyo-Joong Oh;Ho-Jin Choi;Wonseok Cha;Ilgu Kim;Chankyun Woo
    • Annual Conference on Human and Language Technology / 2022.10a / pp.571-575 / 2022
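  • This paper builds a classification model for overseas direct-purchase product groups, using a training methodology based on language-model transfer learning. The model is intended to process the e-commerce import list-clearance data provided by the Korea Customs Service, in support of the Online Shopping Survey compiled monthly by Statistics Korea. Using a BERT-based language model of the kind widely applied to recent text classification tasks, the paper shows that overseas direct sales and purchase product groups can be classified with a high prediction accuracy of 94%, without intermediate steps such as analyzing index-term information or building a case dictionary.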


A Comparative Analysis of Cataloging Records Related to Korea in the Major Asia-Pacific University Libraries (아태지역 주요 대학도서관의 한국관련 목록레코드 비교 분석)

  • Kim, Jeong-Hyen
    • Journal of Korean Library and Information Science Society / v.46 no.3 / pp.301-323 / 2015
  • This study analyzed the characteristics of Korea-related records in the cataloging records of 10 major Asia-Pacific university libraries. The results are as follows. First, apart from a few libraries, most hold very few Korea-related records, generally about half as many as their Japan-related records, and in 2 libraries only about one sixth as many. Second, most libraries organize records in MARC 21 format rather than UNIMARC, and apply the subject headings of their national library, either alone or together with LCSH. Third, Korean materials are usually recorded in romanized Korean, but 5 libraries also record the original Korean script and support Hangeul searching. Fourth, an examination of the subject distribution of Korea-related records shows that subjects related to 'history', 'economy', and 'politics' account for the highest share. Fifth, among Korea-related subject headings, terms such as 'Taekwondo', 'Kimchi', 'Dokdo', 'Donghae', 'Duman-gang', and 'Baekdu-san' have different meanings in different libraries. However, these terms agree with LCSH in most libraries, except for those in neighboring countries.

The Effect of Syllable Frequency, Syllable Type and Final Consonant on Hangeul Word and Pseudo-word Lexical Decision: An Analysis of the Korean Lexicon Project Database (한글 두 글자 단어와 비단어의 어휘판단에 글자 빈도, 글자 유형, 받침이 미치는 영향: KLP 자료의 분석)

  • Myong Seok Shin;ChangHo Park
    • Korean Journal of Cognitive Science / v.34 no.4 / pp.277-297 / 2023
  • This study examined how lexical decision on two-syllable words and pseudo-words is affected by syllabic information, such as syllable frequency, syllable (i.e., vowel) type, and the presence of a final consonant (i.e., batchim), through an analysis of the Korean Lexicon Project Database (KLP-DB). Hierarchical regression of RT data showed that lexical decision on words was influenced by the frequency of the first syllable, the syllable type of the first and second syllables, and batchim in the first and second syllables, as well as by the interaction of the two syllable types and the interaction of syllable frequency and batchim in the second syllable. For pseudo-words, lexical decision was influenced by the frequency of the first and second syllables, the syllable type of the first syllable, and batchim in the first and second syllables, as well as by the interaction of the two syllable frequencies, the interaction of the two syllable types, and the interaction of syllable frequency and batchim in the first syllable. Word frequency had a strong effect on lexical decision on words, while syllabic information had a stable effect on lexical decision on pseudo-words. These results indicate that syllabic information should be taken seriously when constructing word and pseudo-word lists and when interpreting lexical decision times. Understanding the effect of syllabic information will also contribute to understanding the word recognition process.

The Effectiveness of Hierarchic Clustering on Query Results in OPAC (OPAC에서 탐색결과의 클러스터링에 관한 연구)

  • Ro, Jung-Soon
    • Journal of the Korean Society for Library and Information Science / v.38 no.1 / pp.35-50 / 2004
  • This study evaluated the applicability of the static hierarchic clustering model to clustering query results in an OPAC. Two clustering methods (Between Average Linkage (BAL) and Complete Linkage (CL)) and two similarity coefficients (Dice and Jaccard) were tested on the query results retrieved from 16 title-based keyword searches. The precision of the optimal clusters improved by more than 100% compared with title-word searching. Optimal cluster effectiveness did not differ between the similarity coefficients, but it did differ between the clustering methods. The CL method is better in precision ratio and BAL is better in recall ratio at the optimal top-level and bottom-level clusters. However, the differences are not significant, except for the higher recall ratio of BAL at the top-level cluster. The small number of clusters and the long hierarchy chains that BAL produced for the optimal cluster may be neither desirable nor efficient.
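
For readers unfamiliar with the two similarity coefficients named in this abstract, the sketch below computes Dice and Jaccard similarity between two records represented as sets of title words. The function names and the sample word sets are illustrative assumptions; the abstract does not say how records were tokenized or weighted.

```python
def dice(a, b):
    """Dice coefficient between two sets: 2|A∩B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient between two sets: |A∩B| / |A∪B|."""
    if not a and not b:
        return 0.0  # same convention as above
    return len(a & b) / len(a | b)


# Hypothetical title-word sets for two catalog records.
record_1 = {"korean", "library", "catalog"}
record_2 = {"korean", "catalog", "records", "analysis"}

d = dice(record_1, record_2)
j = jaccard(record_1, record_2)
print(f"Dice={d:.4f}  Jaccard={j:.4f}")
```

The two coefficients are monotonically related by J = D / (2 - D), so they always rank pairs of records in the same order. That is at least consistent with the reported finding that the choice of coefficient made no difference, although the abstract itself does not give this as the reason.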

Cultural Adaptation and Reliability Testing of Korean Version of the World Health Organization Disability Assessment Schedule 2.0 : 12-item versions (세계보건기구 기능제약평가목록 2.0 : 12항목-버전의 한글도구 개발과 신뢰도 검사)

  • Lee, Hae-Jung;Kim, Da-Jeong
    • Journal of the Korean Society of Physical Medicine / v.6 no.4 / pp.475-488 / 2011
  • Purpose: The aims of the study were to develop Korean versions of the World Health Organization Disability Assessment Schedule 2.0 (KWHODAS 2.0), namely the 12-item self (12-self) and 12-item interviewer (12-interviewer) versions, and to establish their reliability. Methods: The KWHODAS 2.0 12-item versions were developed in idiomatic modern Korean through a process involving independent translation, synthesis of the translations, independent back translation, and review by an expert committee to achieve equivalence with the original English. Eighty-eight participants were included in the study. Thirty-three participants completed the 12-self version twice to examine test-retest reliability, and 55 participants were assessed simultaneously by four interviewers using the 12-interviewer version. Intra-rater reliability was evaluated using the intra-class correlation coefficient (ICC), and inter-rater reliability was evaluated using both the ICC and the kappa statistic. Results: Test-retest reliability for the 12-self version was excellent, with $ICC_{(2,1)}$ values ranging from 0.94 (CI 0.88-0.98) to 0.96 (CI 0.90-0.98). Inter-rater reliability for the 12-interviewer version showed excellent agreement, with $ICC_{(2,1)}$ from 0.94 (CI 0.91-0.96) to 1 (CI 1.0-1.0). Kappa values ranged from 0.95 to 1. Conclusion: The KWHODAS 2.0 12-self and 12-interviewer versions were successfully translated, and both scales showed excellent reliability. They are now suitable for use in clinical and research applications.
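
As a rough illustration of the $ICC_{(2,1)}$ statistic reported in this abstract, the sketch below computes it from a subjects-by-raters table. It assumes the conventional reading of the (2,1) subscript, that is, the two-way random-effects, absolute-agreement, single-rater form; the abstract does not spell out the computation, and the function name and the toy ratings are illustrative only (they are not data from the study).

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per subject and one column per rater.

    Uses the two-way ANOVA mean squares:
      ICC = (BMS - EMS) / (BMS + (k - 1) * EMS + k * (JMS - EMS) / n)
    where BMS is between-subjects, JMS between-raters, EMS residual.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Toy ratings: 6 subjects scored by 4 raters (illustrative values only).
toy_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(toy_ratings), 2))
```

Because this form penalizes systematic differences between raters, a table in which every rater gives each subject the same score yields exactly 1, whereas raters who rank subjects identically but use different baselines score lower.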

Use of Text Processing Technologies in a Semantic Web Application (시맨틱 웹 응용 서비스에서의 텍스트 처리 기술 적용)

  • Jung, Han-Min;Kang, In-Su;Koo, Hee-Kwan;Lee, Seung-Woo;Kim, Pyung;Sung, Won-Kyung
    • Annual Conference on Human and Language Technology / 2006.10e / pp.189-196 / 2006
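  • This paper examines what role text processing technologies can play in efficiently building ontology instances, an essential requirement when implementing Semantic Web application services, through their application to OntoFrame-K®, a Semantic Web-based information distribution framework. The text processing technologies introduced here are applied to concept instantiation through entity identification, to metadata extension through topic and field assignment, and to the construction of object relation properties through citation information extraction and citation relation building. Entity identification used metadata comparison and merging, and manual construction based on it secured URIs for 8,543 persons. For topic and field assignment, index terms were matched against thesaurus concept terms mapped to field classification names, which yielded a variety of features for assignment: the TF (term frequency) of each index term, the TF of each concept term matched with an index term, the depth in the thesaurus of each matched concept term, the concept facet of each matched concept term, and the list of field classification names attached to each matched concept term. For citation information extraction and citation relation building, automatically extracted citation information was applied on the basis of object URIs and person URIs, and a total of 135 citation network groups were obtained automatically from 7,237 documents. We hope that the uses of text processing technology presented in this study will be applied in many ways in future Semantic Web application services and infrastructure.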


A Study on Vascular Plants, Distribution Status and Management Plans of the Cactus Habitat (No. 429 Natural Monument) in Wolryung-ri, Jeju Island (제주 월령리 선인장군락지(천연기념물 제429호)의 관속식물상, 분포실태, 관리방안에 관한 연구)

  • Lee, Cheol-Ho;Jang, Gye-Hyun;Ryu, Tae-Bok;Choi, Byoung-Ki
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.36 no.1 / pp.55-66 / 2018
  • The cactus habitat in Jeju Island has a phytogeographically specific distribution in the East Asian region, and forms a unique landscape as the only native cactus (Opuntia ficus-indica (L.) Mill.) habitat in Korea. However, there has been no detailed investigation of the distribution of cacti in the habitat, nor of the diversity of the plants growing with them and its correlation with the distribution of cactus populations. This study investigated the diversity of vascular plants in the Wollyeong-ri cactus habitat and recorded the actual distribution and distribution trends of the cacti. In addition to the distribution characteristics of cacti, we discuss the species among the co-occurring plants that reflect the characteristics of the habitat, as well as the biological and environmental factors that threaten the maintenance of cactus populations or require management for their preservation. Taking phenology into account, we conducted field surveys for flora identification six times between June 2015 and September 2017. The Engler classification system was used for the arrangement sequence and names of plants, and the Korean Plant Names Index was adopted for the Korean names of the species. The results showed that the Wolryung-ri cactus habitat in Jeju Island has the characteristic physiognomy of an area dominated by cactus. A total of 125 vascular plant taxa were identified, comprising 53 families, 104 genera, 109 species, 15 varieties, and 1 forma. No endangered plants designated by the Ministry of Environment were found. Two fern species, Cyrtomium falcatum and Asplenium incisum, were identified, and no gymnosperms were found. In addition, 123 taxa of angiosperms were identified: 91 taxa of dicotyledons and 32 taxa of monocotyledons.
Cacti were confirmed in 289 meshes, corresponding to 59.3% of the total 487 meshes in the cactus protected area, with coverage ranging from 5% to 95%. Most of the meshes where no cacti were found are coastal areas with exposed basalt rocks, where soil depth has not developed or is extremely restricted because of repeated wave action, or areas where artificial facilities, grasslands, and observation paths have been constructed. On the other hand, there were 71 lattice points, covering 14.5% of the total area, where cacti showed 70% or higher dominance. Cacti are randomly distributed in these areas. They have adapted to the microhabitat environment and are found to be opportunistically distributed wherever growth is possible. Considering that the reproduction of cacti in the habitat depends mostly on parthenogenesis, the present distribution seems to reflect the regions of the habitat where cacti can potentially be distributed. Based on the results of the field surveys, a management plan for the conservation and protection of the protected area is proposed.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.109-125 / 2019
  • As interest in intelligent search engines increases, various studies have been conducted to extract and use product-related features intelligently. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes it. It is therefore necessary to handle the synonyms of color terms in order to return accurate results for users' color-related queries. Previous studies have suggested a dictionary-based approach to processing synonyms for color features. However, the dictionary-based approach has the limitation that it cannot handle unregistered color-related terms in user queries. To overcome this limitation of the conventional methods, this research proposes a model that extracts RGB values from an internet search engine in real time and outputs similar color names based on the designated color information. First, a color term dictionary was constructed for basic color search, containing color names and the R, G, B values of each color from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more robust by adding 138 color names, converted from English color names into Korean loanwords, with their corresponding RGB values. The final color dictionary therefore includes a total of 671 color names and corresponding RGB values. The proposed method starts from the specific color a user searched for. It then checks whether the searched color is present in the built-in color dictionary. If the color exists in the dictionary, its RGB values in the dictionary are used as the reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results for the searched color are crawled and average RGB values are extracted from a certain middle area of each image.
To extract the RGB values from images, several approaches were tried, since simply averaging the RGB values of the center area of an image has limitations. As a result, clustering the RGB values in a certain area of the image and taking the average value of the cluster with the highest density as the reference values showed the best performance. The reference RGB values of the searched color are then compared with the RGB values of all the colors in the previously constructed color dictionary. A color list is created from the colors within the range of ${\pm}50$ for each of the R, G, and B values. Finally, using the Euclidean distance between these candidates and the reference RGB values of the searched color, the color with the highest similarity among up to five colors becomes the final outcome. To evaluate the usefulness of the proposed method, we performed an experiment in which 300 color names and their corresponding RGB values were obtained through questionnaires. These were used to compare the RGB values obtained from four different methods, including the proposed method. The average Euclidean distance in CIE-Lab using our method was about 13.85, a relatively low distance compared with 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. The variant of the proposed method without clustering showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering in the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with real-time synonym processing for new color names. The method removes the limitation of the dictionary-based approach, the conventional synonym processing method.
This research can contribute to improving the intelligence of e-commerce search systems, especially their color search feature.
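
The matching step described in this abstract (keep dictionary colors within ±50 of the reference on each of R, G, and B, then rank them by Euclidean distance and keep up to five) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the parameter names, and the five-entry toy dictionary are hypothetical, whereas the paper's dictionary holds 671 names. The sketch also measures distance in RGB space for simplicity, and it omits the image crawling and DBSCAN steps that produce the reference values for unregistered colors.

```python
import math


def similar_colors(reference, color_dict, tolerance=50, top_n=5):
    """Return up to `top_n` color names closest to `reference`.

    reference: (R, G, B) tuple for the searched color.
    color_dict: dict mapping color name to an (R, G, B) tuple.
    A color is a candidate only if every channel is within `tolerance`.
    """
    candidates = []
    for name, rgb in color_dict.items():
        # Box filter: each channel must be within the tolerance band
        if all(abs(c - r) <= tolerance for c, r in zip(rgb, reference)):
            candidates.append((math.dist(rgb, reference), name))
    candidates.sort()  # smallest Euclidean distance first
    return [name for _, name in candidates[:top_n]]


# Toy dictionary for illustration only.
toy_colors = {
    "red": (255, 0, 0),
    "scarlet": (255, 36, 0),
    "crimson": (220, 20, 60),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
}

matches = similar_colors((250, 10, 10), toy_colors)
print(matches)
```

The per-channel box filter is a cheap first pass that discards most of the dictionary before any distance is computed; the Euclidean ranking then orders the survivors. Here "tomato" is rejected by the filter alone because its green channel differs from the reference by more than 50.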