• Title/Summary/Keyword: Text features


Text Categorization using Topic Signature and Co-occurrence Features (Topic Signature와 동시 출현 단어 쌍을 이용한 문서 범주화)

  • Bae, Won-Sik;Han, Yo-Sub;Cha, Jeong-Won
    • Proceedings of the Korean Information Science Society Conference / 2008.06c / pp.262-267 / 2008
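  • This paper describes a text categorization system that uses pairs of words co-occurring within a document as its feature extraction unit. The feature unit is defined as a word pair on the assumption that words which frequently co-occur in documents are strongly related, and that a pair of strongly related words is more likely than a single word to appear only in documents of a particular category, which should improve classification ability. Features are extracted with the Topic Signature Term Extraction method, which is based on the log-likelihood ratio and was proposed in the field of document summarization, and documents are classified with a Naive Bayes classifier. The system showed good results in a performance evaluation on the Reuters-21578 document collection, and we expect it to contribute to future research.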

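The feature scoring described in the abstract above (co-occurring word pairs ranked by a log-likelihood ratio) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the binomial form of the ratio is the commonly used formulation, assumed here because the abstract does not give the exact formula.

```python
import math
from itertools import combinations


def cooccurring_pairs(tokens):
    """Return the set of unordered word pairs that co-occur in one document."""
    return {tuple(sorted(pair)) for pair in combinations(set(tokens), 2)}


def _log_binom(k, n, p):
    """Log-likelihood of k successes in n trials with success probability p."""
    eps = 1e-12  # clamp so that log(0) never occurs
    p = min(max(p, eps), 1 - eps)
    return k * math.log(p) + (n - k) * math.log(1 - p)


def log_likelihood_ratio(k1, n1, k2, n2):
    """-2 log(lambda) for a feature seen in k1 of n1 in-category documents
    and in k2 of n2 out-of-category documents (assumes n1 > 0 and n2 > 0)."""
    p = (k1 + k2) / (n1 + n2)   # shared rate under the null hypothesis
    p1, p2 = k1 / n1, k2 / n2   # separate rates under the alternative
    return 2 * (_log_binom(k1, n1, p1) + _log_binom(k2, n2, p2)
                - _log_binom(k1, n1, p) - _log_binom(k2, n2, p))
```

Pairs with the highest scores for a category would then serve as that category's features for the Naive Bayes classifier; the cut-off is a tuning choice the abstract does not specify.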

Introduction to Graphic-based Power System Simulator (Graphic-based Power System Simulator 소개)

  • Shin, M.C.;Kim, K.J.;Eum, J.S.;Rhee, B.;Park, C.W.;Jang, J.C.
    • Proceedings of the KIEE Conference / 2001.11b / pp.133-136 / 2001
  • This paper introduces the Graphic-based Power System Simulator (GPSS) program. GPSS is a power system simulator designed to provide a friendly and highly interactive Graphical User Interface (GUI). Its main features are free graphical editing, quick and fine system drawing, and load-flow analysis. Above all, mapping power system data (pure text information only) into graphic data is of great practical use to power system designers and analysts.


A Study on the Hypertext Characteristics of Contemporary Architecture space (현대건축공간에 나타난 하이퍼텍스트의 특성에 관한 연구)

  • Lee, Sun-Mi;Shim, Eun-Ju
    • Proceedings of the Korean Institute of Interior Design Conference / 2007.11a / pp.128-133 / 2007
  • Modern society changes so fast that rapid flows of networks and new media blur the boundaries among all elements of the physical environment, as well as those of culture and the economy. These flows of change also appear and collide everywhere at the same time, continuously generating heterogeneous environmental factors. Architecture is therefore required to respond to the circumstances of the day, but it cannot keep up with the speed of social change because it consists of physically fixed construction. As a response, this research proposes new directions and possibilities for architectural space elements using the pluralistic and de-centering attributes of hypertext, and examines how architectural space should respond to the shifting environment of modern society.


A study on the application of chinese traditional roof style to the modern architecture (중국 고대 건축 지붕 양식의 현대적 변용에 관한 연구)

  • Tang, Jie;Lee, Dong Hun
    • Proceedings of the Korea Contents Association Conference / 2008.05a / pp.689-693 / 2008
  • In ancient China, the roof of a building was referred to as "the big roof". This paper studies roof styles, identifying representative contemporary buildings and illustrating the distinctive features of the various roof styles with examples. Using pictures and text, and contrasting the cases, it explains why ancient roof forms can be widely used in modern architecture. It concludes that the wide use of ancient traditional roof styles in contemporary architecture inherits and promotes traditional culture.


Automatic Conversion of Machining Data by the Feature Recognition of Press Mold (프레스 금형의 특징형상 인식에 의한 가공데이타 자동변환)

  • Choi, Hong-Tae;Bahn, Kab-Soo;Lee, Seok-Hee
    • IE interfaces / v.7 no.3 / pp.181-191 / 1994
  • This paper presents an automatic conversion of machining data from the orthographic views of a press mold using feature recognition rules. The system includes the following 6 modules: separation of views, function support, dimension text check and feature processing modules. Its distinguishing characteristic is that, with minimal user intervention, it recognizes basic features such as holes, slots, pockets and clamping parts, and thus automatically converts the CAD drawing details of a press mold into machining data using a 2D CAD system instead of an expensive 3D modeler. The system was developed on an IBM PC in the environment of AutoCAD R12, AutoLISP and MetaWare High C. Its performance as an interface between CAD and CAM was verified by applying it to many sample drawings.


Images Automatic Annotation: Multi-cues Integration (영상의 자동 주석: 멀티 큐 통합)

  • Shin, Seong-Yoon;Ahn, Eun-Mi;Rhee, Yang-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.589-590 / 2010
  • Together, these images constitute a considerable database. Moreover, the semantic meanings of images are well represented by the surrounding text and links. However, only a small minority of these images have precisely assigned keyphrases, and manually assigning keyphrases to existing images is very laborious. It is therefore highly desirable to automate the keyphrase extraction process. In this paper, we first introduce WWW image annotation methods based on low-level features, page tags, overall word frequency and local word frequency. We then put forward our method of multi-cue integration for image annotation. We also show through an experiment that the multi-cue image annotation method is superior to the other methods.


Rich Transcription Generation Using Automatic Insertion of Punctuation Marks (자동 구두점 삽입을 이용한 Rich Transcription 생성)

  • Kim, Ji-Hwan
    • MALSORI / no.61 / pp.87-100 / 2007
  • A punctuation generation system that combines prosodic information with acoustic and language model information is presented. Experiments were first conducted on reference text transcriptions. In these experiments, prosodic information was shown to be more useful than language model information. When these information sources were combined, an F-measure of up to 0.7830 was obtained for adding punctuation to a reference transcription. This method of punctuation generation can also be applied to the 1-best output of a speech recogniser. The 1-best output is first time-aligned. Based on the time alignment information, prosodic features are generated. As in the approach applied to reference transcriptions, the best sequence of punctuation marks for this 1-best output is found using the prosodic feature model and a language model trained on texts that contain punctuation marks.


A Study of Hyungsang Medicine's definition and features (형상의학(形象醫學)의 정의(定義)와 특징(特徵)에 대한 고찰(考察))

  • Park, Jun Gyu;Kim, Nam Il
    • The Journal of Korean Medical History / v.20 no.2 / pp.87-92 / 2007
  • Hyungsang Medicine is a medical theory that the Korean traditional medical doctor Park In Kyu established based on the contents of "Donguibogam", and it is commonly practiced today in Korean TKM clinics. Although it is based on "Donguibogam", the most basic text of TKM, it differs from "Donguibogam" in that its diagnosis and treatment of illnesses are based on the person's appearance. This study organizes the medical theory by considering its relation to other medical theories and by examining the documents on which it is based.


Semantic Word Categorization using Feature Similarity based K Nearest Neighbor

  • Jo, Taeho
    • Journal of Multimedia Information System / v.5 no.2 / pp.67-78 / 2018
  • This article proposes a modified KNN (K Nearest Neighbor) algorithm that takes feature similarity into account and applies it to word categorization. The texts given as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and a synergy effect between word categorization and text categorization is expected from combining the two. In this research, we define a similarity metric between two vectors that includes the feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is empirically validated as the better approach for categorizing words in news articles and opinions. The significance of this research is that it improves classification performance by utilizing feature similarities.
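
The idea of a KNN whose similarity metric accounts for similarity between features can be illustrated with a small sketch. The abstract does not state the metric, so the example below substitutes the soft cosine measure, in which a feature-similarity matrix `S` weights every pair of features; `soft_cosine`, `knn_predict` and the toy matrix are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def soft_cosine(x, y, S):
    """Cosine-style similarity in which S[i][j] is the similarity between
    feature i and feature j (the identity matrix gives the plain cosine)."""
    def bilinear(a, b):
        return sum(S[i][j] * a[i] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0


def knn_predict(query, examples, S, k=3):
    """Majority vote among the k labelled examples most similar to query.
    examples is a list of (vector, label) pairs."""
    ranked = sorted(examples,
                    key=lambda ex: soft_cosine(query, ex[0], S),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With an identity matrix, two vectors that share no features have similarity zero; once the off-diagonal entries of `S` are positive, the same two vectors can still count as similar, which is the effect the abstract attributes to considering feature similarity.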

Text Categorization Features Automatic Extraction Method Using Chi-squared Statistic (카이제곱 통계량을 이용한 문서분류 자질 자동추출 방법)

  • Park, Jong-Hyun;Park, So-Young;Chang, Ju-No;Kihl, Tae-Suk
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.695-697 / 2010
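  • Because the vocabulary contained in a document carries information for classifying it, analyzing documents to extract useful words is highly valuable and can be linked to a variety of services. In automatic document classification, classification accuracy can vary with the method used to select classification features, and the field recognized for a document can vary with the useful words extracted from it. This paper therefore proposes and develops a method that uses the chi-squared statistic score of each word contained in each document to evaluate that word's quality as a document classification feature and to automatically extract the useful words for each document category.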

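Chi-squared scoring of words as classification features, as described in the abstract above, might look like the following sketch. It uses the standard 2x2 contingency form of the statistic; the function names are hypothetical, and keeping only positively associated terms is an assumption added here, since the statistic alone does not distinguish a word typical of a category from one typically absent from it.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for a term/category 2x2 table:
    a = in-category documents containing the term,
    b = out-of-category documents containing the term,
    c = in-category documents without the term,
    d = out-of-category documents without the term."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def top_terms(docs, category, k=10):
    """Rank terms for one category; docs is a list of (tokens, label) pairs."""
    n_in = sum(1 for _, label in docs if label == category)
    n_out = len(docs) - n_in
    vocab = {term for tokens, _ in docs for term in tokens}
    scores = {}
    for term in vocab:
        a = sum(1 for tokens, label in docs
                if label == category and term in tokens)
        b = sum(1 for tokens, label in docs
                if label != category and term in tokens)
        c, d = n_in - a, n_out - b
        if a * d > c * b:  # keep only terms positively tied to the category
            scores[term] = chi_squared(a, b, c, d)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Calling `top_terms(docs, "energy", k=2)` on a small labelled corpus returns the two words most strongly tied to the hypothetical "energy" category.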