• Title/Summary/Keyword: Korean text classification

Search Result 413, Processing Time 0.032 seconds

Thesaurus Development for HiTEL Service (하이텔 메뉴검색용 시소러스의 개발에 관한 연구)

  • 최석두
    • Journal of the Korean Society for Information Management / v.13 no.1 / pp.227-241 / 1996
  • We present the results of developing a Hangul thesaurus intended to improve the performance of an intelligent information retrieval system. We describe the important stages and methods in term acquisition, classification, and creation of the consistency-effectiveness relationship, using the HiTEL menu and dictionary text. To carry out the study we built a thesaurus management system, and we also describe its utility functions.

A Study on dart manipulation of women's front bodice by CAD System (I) - the comparison of automatic dart manipulating functions in CAD systems and the classification of the darts of women's front bodice - (CAD시스템을 이용한 앞길의 다트변형에 관한 연구(I)-CAD의 다트 자종변환기능의 비교분석 및 앞길 다트 분류를 중심으로-)

  • 조영아
    • Journal of the Korean Home Economics Association / v.34 no.5 / pp.249-264 / 1996
  • The purpose of this study was to investigate the automatic dart manipulating functions of CAD systems and to classify the darts of women's front bodice. The results are as follows: 1. Three CAD systems were compared in their automatic dart manipulating functions. The Gerber and Investronica systems were based on the pivot method of dart manipulation, while the Yuka system was based on the slash method. 2. Darts that appear as examples of dart manipulation in pattern design texts and references were classified and compiled into a dart-design chart. 3. For teaching dart manipulation, the classified dart-design chart provides variations of a basic pattern through dart manipulation.

Improving performance of Binary Text Classification Using the EM algorithm (EM 알고리즘을 이용한 이진 분류 문서 범주화의 성능 향상)

  • 한형동;고영중;서정연
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.790-792 / 2004
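  • When binary classification is applied to multi-class text categorization, the One-Against-All method is generally used. However, One-Against-All has one problem: the documents in the positive set were assigned their category directly by a person, but the documents in the negative set were not, so the negative set may contain erroneous documents. To solve this problem, this paper proposes applying a Sliding Window technique and the EM algorithm to binary-classification-based text categorization. The Sliding Window technique is first used to extract erroneous documents from the training data, and the EM algorithm then reassigns categories to these documents, which improves the performance of binary-classification-based text categorization.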
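
The One-Against-All setup described in this abstract can be illustrated with a minimal Python sketch. It only shows how the binary training sets are built; the function name `one_against_all` and the toy data are assumptions for illustration, and the Sliding Window and EM steps are not reproduced because the abstract does not give their details.

```python
def one_against_all(samples, categories):
    """Build one binary training set per category from multi-class samples.

    samples: list of (document, label) pairs with human-assigned labels.
    Returns {category: [(document, 1 or 0), ...]}.
    """
    binary_sets = {}
    for cat in categories:
        # Positives carry a human-assigned label. Negatives are only
        # "not labeled cat", which is where erroneous documents can enter.
        binary_sets[cat] = [(doc, 1 if label == cat else 0)
                            for doc, label in samples]
    return binary_sets


# Toy data (hypothetical): three documents, two categories.
samples = [("d1", "sports"), ("d2", "politics"), ("d3", "sports")]
sets = one_against_all(samples, ["sports", "politics"])
print(sets["sports"])
```

In the "sports" set, `d2` becomes a negative example even though nobody explicitly judged it to be unrelated to sports, which is the labeling asymmetry the abstract points to.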

Text Classification to Analyze the Effect of Positive Similarity in Series Reviews on the Box Office Performance (시리즈물 리뷰의 긍정 유사도가 흥행에 미치는 영향을 분석하기 위한 텍스트 분류)

  • Kim, Sujin;Cho, Hyungmin
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.843-846 / 2022
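  • Today the Internet has become ubiquitous, and recently, as people spend more time at home because of the COVID-19 pandemic, interest in watching movies, dramas, and other programs through online platforms has grown. In line with this trend, consumer demand for higher-quality content in the form of season-based series is also increasing. Because the earlier and later installments of a series are organically connected, it appears important to analyze reviews of the earlier installment to identify audience needs and reflect them in the sequel. This study therefore uses text classification to compare the positive similarity of reviews of a series' earlier installment and its sequel, and further examines whether positive similarity has a significant effect on box office performance.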

Data Mining Research on Maehwado Painting Poetry in the Early Joseon Dynasty

  • Haeyoung Park;Younghoon An
    • Journal of Information Processing Systems / v.19 no.4 / pp.474-482 / 2023
  • Data mining is a technique for extracting valuable information from vast amounts of data by analyzing statistical and mathematical operations, rules, and relationships. In this study, we employed data mining technology to analyze data on the painting poetry of Maehwado (plum blossom paintings) from the early Joseon Dynasty. The data was extracted from the Hanguk Munjip Chonggan (Korean Literary Collections in Classical Chinese) in the Hanguk Gojeon Jonghap database (Korea Classics DB). Using computer information processing techniques, we carried out web scraping and classification of the painting poetry in the Hanguk Munjip Chonggan. We then narrowed our focus to the painting poetry specifically related to Maehwado in the early Joseon Dynasty. Based on this refined dataset, we conducted an in-depth analysis and interpretation of the text data at the syllable corpus level. As a result, we found a direct correlation between the corpus statistics for each syllable in Maehwado painting poetry and the symbolic meaning of plum blossoms.
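
Syllable-level corpus statistics of the kind mentioned in this abstract can be sketched as a simple frequency count. The sketch below assumes that each character of a classical Chinese poem counts as one syllable; the function name `syllable_frequencies` and the sample strings are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def syllable_frequencies(poems):
    """Count how often each character (treated as one syllable) occurs."""
    counts = Counter()
    for poem in poems:
        # Skip whitespace and punctuation; keep letters and ideographs.
        counts.update(ch for ch in poem if ch.isalnum())
    return counts


# Hypothetical input lines, for illustration only.
freq = syllable_frequencies(["梅花 梅", "花開"])
for syllable, n in freq.most_common():
    print(syllable, n)
```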

A Study on Clinical Classification and Characteristic of Children with Recurrent Abdominal Pain (만성(慢性) 반복성(反復性) 복통(腹痛)을 주증(主症)으로 하는 환아(患兒)의 임상적(臨床的) 특징(特徵)에 관한 연구(硏究) -기능성 복통을 중심으로-)

  • Kim, Sung-Hee;Park, Sang-Wook;Lee, Seung-Yeon
    • The Journal of Pediatrics of Korean Medicine / v.16 no.2 / pp.1-22 / 2002
  • Purpose: This study was conducted to evaluate the clinical characteristics of children with recurrent abdominal pain (RAP), to classify them into the six subtypes given in the Oriental Pediatric Text Book, and to examine the relationship with the Western classification. Methods: Patients who visited Dong-Eui Oriental Medical Hospital from August 2001 to October 2002 because of RAP were included. Based on a questionnaire and history taking, RAP was classified into six subtypes according to Oriental medical theory. Results: 1. Patients with RAP were more internalized, had a close relationship with their parents, and had a strong desire for success, but their social interaction was low. 2. 76% of patients had a reduced appetite and 67% had diarrhea or constipation. 3. According to the questionnaire, the most common age at first abdominal pain was 3 to 5 years; the most common cause of occurrence was eating cold foods; abdominal pain (AP) most often occurred 1 to 2 hours after eating or at no characteristic time; the most common site of AP was the umbilicus; the most common type of AP was impotent pain; and the most common relieving factors were abdominal massage and defecation. 4. By type of RAP, AP caused by diet (食積腹痛) accounted for 45.5%, AP caused by cold (寒腹痛) for 29.1%, AP caused by cold in internal organs of deficiency (臟腑虛冷腹痛) for 12.7%, stagnation of qi and stasis of blood (氣滯血瘀腹痛) for 10.9%, and AP caused by internal diet and external cold (內食外寒腹痛) for 1.8%. There were no cases of AP caused by parasites (蟲腹痛). 5. In the clinical classification of RAP, the cause of occurrence was the most important factor; the factors relieving pain and defecation habits were helpful for diagnosis, but the type and site of AP were not. 6. Regarding the relationship between the Oriental and Western classifications, AP caused by diet is similar to dysmotility-like dyspepsia and irritable bowel syndrome; AP caused by cold is similar to irritable bowel syndrome; and AP caused by cold in internal organs of deficiency is similar to unspecified dyspepsia. Stagnation of qi and stasis of blood, and AP caused by internal diet and external cold, have no counterpart in the Western classification. Conclusion: RAP in childhood is most often caused by food and cold; AP caused by stagnation of qi and stasis of blood, or by internal diet and external cold, is rare. Further study on the subclassification and clinical manifestations of RAP in childhood is needed.

An Emotion Scanning System on Text Documents (텍스트 문서 기반의 감성 인식 시스템)

  • Kim, Myung-Kyu;Kim, Jung-Ho;Cha, Myung-Hoon;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.12 no.4 / pp.433-442 / 2009
  • People increasingly buy products through the Internet rather than in stores. After a purchase, some consumers give feedback online in the form of reviews, replies, comments, and blogs. People are also likely to look for information on the Internet. Companies and public institutions therefore need to collect and analyze reviews and public opinion, because many consumers are interested in others' opinions when they are about to make a purchase. However, the reviews on web sites are numerous, short, and redundant. Under these circumstances, emotion scanning systems for text documents on the web are attracting attention. For extracting writers' opinions or subjective ideas from text, labeled word resources such as GI (General Inquirer) and LKB (Lexical Knowledge Base of near-synonym differences) exist for English, but no such resource is yet available for Korean. In this paper, we labeled the words of four parts of speech (POS) in a Korean dictionary, namely nouns, adjectives, verbs, and adverbs, with positive, negative, or neutral attributes. We extracted construction patterns of emotional words and relationships among words in sentences from a large training set and learned them. Based on this knowledge, comments and reviews about products are classified into two polarity classes, positive and negative, using SO-PMI, with the optimal condition found from combinations of the four POS. Finally, the system provides a flexible user interface for adding or editing the emotional words, the emotion-related construction patterns, and the relationships among the words.
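
As a rough illustration of SO-PMI, the sketch below scores a word by its pointwise mutual information with positive seed words minus its PMI with negative seed words, using document-level co-occurrence. This is the commonly used general formulation, not necessarily the exact variant in the paper; the seed words, the smoothing constant `eps`, and the toy English tokens are all assumptions for illustration.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, eps=0.01):
    """Semantic orientation of `word`: positive if > 0, negative if < 0.

    docs: list of token lists; co-occurrence is counted per document.
    eps: additive smoothing so that zero counts do not produce log(0).
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)

    def prob(*terms):
        hits = sum(1 for d in doc_sets if all(t in d for t in terms))
        return (hits + eps) / n

    def pmi(a, b):
        return math.log2(prob(a, b) / (prob(a) * prob(b)))

    positive = sum(pmi(word, seed) for seed in pos_seeds)
    negative = sum(pmi(word, seed) for seed in neg_seeds)
    return positive - negative


# Toy tokenized reviews (hypothetical).
docs = [["good", "fast"], ["good", "fast"], ["bad", "slow"], ["bad", "slow"]]
print(so_pmi("fast", docs, ["good"], ["bad"]) > 0)
print(so_pmi("slow", docs, ["good"], ["bad"]) < 0)
```

A review can then be assigned a polarity by, for example, summing the scores of its words; how the scores are combined across the four POS is specific to the paper and is not reproduced here.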

A Train Ticket Reservation Aid System Using Automated Call Routing Technology Based on Speech Recognition (음성인식을 이용한 자동 호 분류 철도 예약 시스템)

  • Shim Yu-Jin;Kim Jae-In;Koo Myung-Wan
    • MALSORI / no.52 / pp.161-169 / 2004
  • This paper describes automated call routing for a train ticket reservation aid system based on speech recognition. We focus on automatically routing telephone calls based on the user's fluently spoken response, rather than touch-tone menus, in an interactive voice response system. A vector-based call routing algorithm is investigated and a mapping table for key terms is proposed. The Korail database collected by KT is used for the call routing experiments. We run call-classification experiments on transcribed text from the Korail database. With a small amount of training data, an average call routing error reduction rate of 14% is observed when the mapping table is used.
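
The general idea of vector-based call routing with a key-term mapping table can be sketched as follows: each destination is represented as a term-count vector, an utterance is mapped to canonical key terms, and the call goes to the destination with the highest cosine similarity. Everything concrete here (the `MAPPING` entries, the route names, the training phrases) is hypothetical; the abstract does not describe the actual vectors or table used.

```python
import math
from collections import Counter

# Hypothetical mapping table: surface variants -> canonical key term.
MAPPING = {"tickets": "ticket", "book": "reserve",
           "booking": "reserve", "refund": "cancel"}

# Hypothetical training phrases per destination.
TRAINING = {
    "reservation": "reserve ticket seat train",
    "cancellation": "cancel ticket",
    "timetable": "train time schedule departure",
}


def vectorize(text, mapping=None):
    """Turn text into a term-count vector, normalizing terms via the table."""
    tokens = text.lower().split()
    if mapping:
        tokens = [mapping.get(t, t) for t in tokens]
    return Counter(tokens)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing terms
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(utterance, routes, mapping=None):
    """Return the destination whose vector is most similar to the utterance."""
    query = vectorize(utterance, mapping)
    return max(routes, key=lambda name: cosine(query, routes[name]))


ROUTES = {name: vectorize(text) for name, text in TRAINING.items()}
print(route("i want a refund", ROUTES, MAPPING))
```

Without the mapping table, "refund" shares no term with any destination vector in this toy setup, which hints at why such a table may help most when training data is small.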

Online Social Media Review Mining for Living Items with Probabilistic Approach: A Case Study

  • Li, Shuai;Hao, Fei;Kim, Hee-Cheol
    • Smart Media Journal / v.2 no.2 / pp.20-27 / 2013
  • Social media is at the top of the agenda for many business executives and decision makers, and consultants are trying to identify ways in which companies can make profitable use of applications such as Netflix and Flixster. Social media is playing an increasingly important role as an information source for customers making product choices. With the flourishing of Web 2.0 technology, customer reviews are becoming an increasingly useful and important information resource that helps people save time and energy when purchasing the products they want. This paper proposes a Bayesian probabilistic classification algorithm to mine social media reviews, and evaluates it on a real data set using different splits and cross-validation. The experimental results show the robustness and effectiveness of the proposed approach for mining social media reviews.
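
The abstract does not specify its Bayesian classifier, so the following is only a stand-in: a minimal multinomial naive Bayes with Laplace smoothing, a standard Bayesian approach to review classification. The function names `train_nb` and `predict_nb` and the toy reviews are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """samples: list of (tokens, label). Returns the fitted count tables."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the label with the highest posterior log-probability."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for t in tokens:
            # Laplace (add-one) smoothing over the vocabulary.
            score += math.log((word_counts[label][t] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy reviews (hypothetical).
train = [(["great", "movie"], "pos"), (["loved", "it"], "pos"),
         (["boring", "movie"], "neg"), (["hated", "it"], "neg")]
model = train_nb(train)
print(predict_nb(model, ["great"]))
```

Evaluating with different splits or cross-validation, as the abstract describes, would mean repeating `train_nb` and `predict_nb` over several train/test partitions of the labeled reviews.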

Multi-channel CNN for Korean Sentiment Analysis (Multi-channel CNN을 이용한 한국어 감성분석)

  • Kim, Min;Byun, Jeunghyun;Lee, Chunghee;Lee, Yeonsoo
    • Annual Conference on Human and Language Technology / 2018.10a / pp.79-83 / 2018
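  • This paper proposes a Multi-channel CNN that classifies the sentiment of a Korean sentence by passing its morphemes, syllables, and graphemes (jaso) simultaneously through separate convolutional layers. For colloquial sentences that contain typos, features that a morpheme-based CNN cannot extract can be extracted from syllables or graphemes. Morpheme-based CNNs are widely used for Korean sentiment analysis, but the proposed Multi-channel CNN classifies sentence sentiment more accurately by considering morphemes, syllables, and graphemes together. The proposed model classified sentence sentiment about 4.8% more accurately than a morpheme-based CNN on baseball comment data and about 1.3% more accurately on movie review data.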
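
The syllable and grapheme channels can be illustrated with a short sketch that splits Korean text into syllables and then into jamo, using the standard arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3). The function names are hypothetical, the morpheme channel is omitted because it needs a morphological analyzer, and the CNN itself is not reproduced.

```python
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"   # 21 medial vowels
JONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals


def to_syllables(text):
    """Syllable channel: one token per character, whitespace dropped."""
    return [ch for ch in text if not ch.isspace()]


def to_jamo(text):
    """Grapheme channel: decompose each Hangul syllable into its jamo."""
    out = []
    for ch in to_syllables(text):
        code = ord(ch) - 0xAC00
        if 0 <= code <= 0xD7A3 - 0xAC00:
            out.append(CHO[code // 588])          # 588 = 21 * 28
            out.append(JUNG[(code % 588) // 28])
            final = JONG[code % 28]
            if final:
                out.append(final)
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return out


print(to_syllables("한국어"))
print(to_jamo("한국어"))
```

Because a typo usually changes only one jamo of a syllable, the grapheme sequence of a misspelled word stays close to that of the intended word, which is consistent with the abstract's motivation for adding these channels.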
