• Title/Summary/Keyword: 담화단위 (discourse unit)

Search Result 16, Processing Time 0.022 seconds

Performance Improvement by a Virtual Documents Technique in Text Categorization (문서분류에서 가상문서기법을 이용한 성능 향상)

  • Lee, Kyung-Soon;An, Dong-Un
    • The KIPS Transactions: Part B / v.11B no.4 / pp.501-508 / 2004
  • This paper proposes a virtual relevant document technique for the learning phase of text categorization. The method applies a simple transformation to relevant documents: it creates virtual documents by combining pairs of documents in the training set. A virtual document produced this way has an enriched term vector space, with greater weights for terms that co-occur in the two relevant documents. The experimental results showed a significant improvement over the baseline, which demonstrates the usefulness of the proposed method: a 71% improvement on the TREC-11 filtering test collection and an 11% improvement on the Reuters-21578 test set, for topics with fewer than 100 relevant documents, in micro-averaged F1. The result analysis indicates that adding virtual relevant documents contributes to a steady improvement in performance.
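The pairing step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the whitespace tokenizer, the function names `term_vector` and `make_virtual_documents`, and the choice to add the two term vectors are hypothetical and are not taken from the paper, which does not give its weighting scheme here.

```python
from collections import Counter
from itertools import combinations

def term_vector(text):
    """Bag-of-words term frequencies (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())

def make_virtual_documents(relevant_docs):
    """Combine every pair of relevant documents into one virtual document.

    Adding the two term vectors is an illustrative choice: a term that occurs
    in both documents ends up with a larger weight than a term found in one.
    """
    vectors = [term_vector(doc) for doc in relevant_docs]
    return [a + b for a, b in combinations(vectors, 2)]

# Toy relevant documents for a single topic (made-up data).
relevant = [
    "oil prices rise sharply",
    "crude oil prices fall",
    "oil exports rise",
]
virtual = make_virtual_documents(relevant)
```

With three relevant documents this yields three virtual documents; in the first one, `oil` and `prices` carry twice the weight of `crude`, because they appear in both source documents.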

A Study on the Construction of the Automatic Summaries - on the basis of Straight News in the Web - (자동요약시스템 구축에 대한 연구 - 웹 상의 보도기사를 중심으로 -)

  • Lee, Tae-Young
    • Journal of the Korean Society for Information Management / v.23 no.4 s.62 / pp.41-67 / 2006
  • A writing frame and various rules based on discourse structure and knowledge-based methods were applied to build an automatic Ext/sums (extracts & summaries) system for straight news on the web. The frame contains slots and facets, represented by the roles of paragraphs, sentences, and clauses in news articles, together with the rules that determine the slot type. Candidate summary sentences were rearranged through unification, separation, and synthesis while maintaining coherence of meaning, using rules derived from similarity measurement, syntactic information, discourse structure, and knowledge-based methods, as well as context plots defined by the syntactic/semantic signatures of nouns and verbs and the categories of verb suffixes. An attempt was also made to insert critical sentences into the summary.

Abductive Rules for Text Cohesion (글의 응집성을 포착하기 위한 개연규칙)

  • Kim Gon;Yang Jae-Gun;Kim Min-Chan;Bae Jae-Hak
    • Proceedings of the Korea Information Processing Society Conference / 2004.11a / pp.517-520 / 2004
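  • This paper uses abductive rules to capture text cohesion. Abductive rules represent the plausible connections between sentence constituents across sentences, and they reflect the causal tendencies and discourse functions of a text. Cohesion, which gives a text its tightness, is one of the representative properties used in understanding a text. A representative linguistic tool or knowledge source for identifying cohesion is the lexical chain. This paper therefore compares the lexical chains of a given example text with the abductive chains found by abductive rules. As a result, abductive chains corresponding to the most important lexical chains were found. Abductive chains not only cover the function of conventional lexical chains but also capture semantic relatedness between sentences by detecting plot units, cue phrase usage, and plausibility between sentences. This shows that abductive rules can be used to select the topic sentences of a text effectively.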

Development of Language Rehabilitation Program Using the Smart Device-based Application (스마트 기기 기반 언어재활 프로그램 개발)

  • Hwang, Yu Mi;Park, Kinam;Jung, Young Hee;Pyun, Sung-Bom
    • Journal of Digital Convergence / v.17 no.10 / pp.321-327 / 2019
  • The purpose of this study is to develop a smart device-based Language Rehabilitation Program (LRP) to improve the communication ability of patients with language disabilities. The LRP content covers a variety of semantic categories and grammatical elements and consists of 17 semantic categories, 29 tasks, and 3,780 items designed to improve comprehension and production at the word, semantic category, sentence, and discourse levels. We developed the LRP as a Windows-based management program and an Android-based language rehabilitation application. The smart device application provides real-time delivery of training content, measurement and database storage of training task results, and monitoring of patient progress. A follow-up study will verify the language rehabilitation effect of the LRP in patients with language disabilities.

The relation between Movement working as a Grouping clue in Moving Picture and Semantic structure forming (동영상에서 그룹핑(grouping) 단서로 작용하는 움직임(Movement)과 의미구조 형성의 관계)

  • Lee, Soo-Jin
    • Archives of Design Research / v.19 no.5 s.67 / pp.119-128 / 2006
  • The scope of visual expression has expanded from still images to moving pictures as media have developed. In moving pictures such as animation, films, TV commercials, and GUIs, movement becomes a necessary formative element, unlike in still images, as the apparent movement phenomenon and unit structures such as shots and scenes appear. Therefore, among formative elements such as shape, color, space, size, and movement, movement is especially prominent in the moving image. The expression and form of an image, understood through the relationship between signified and signifier described by Saussure, are accepted as a sign through mutual complementation, even though they constrain the content. This makes it possible to infer that the formal feature of movement participates in the message content. To verify this, a visual perception experiment on moving pictures based on the gestalt grouping principle was carried out; the results show that 70-80 percent of subjects regard 'movement' as an important grouping clue in perception. When the meaning structure of moving pictures is analyzed in terms of structural features, movement helps maintain the context of the message content in the communication process. First, identity can be maintained when movement has a similar direction, even if the color and shape of people, objects, and background change. Second, the clarity of the content increases because movement distinguishes an object as a figure. Third, movement acts as a knowledge representation that makes it possible to predict a similar movement process in subsequent information processing. Fourth, movement gives the content consistency even when two or more scenes are switched quickly or edited in a complicated structure such as cross-cutting. Movement thus becomes a clue for grouping the information received through visual perception. It also gives order to visual expression that might otherwise be used indiscriminately, by forming a structural frame for the image message, and it raises the clarity of signification. Because a moving picture fundamentally contains time, its discourse consists of several mixed unit structures, and media-mix circumstances call for expression that is both common and distinctive. Therefore, when the gestalt grouping principle is applied to moving pictures, movement is more prominent than the other formative elements and affects the formation of meaning structure. This study proposes a viewpoint for developing structural formative beauty and new image expression in the media image field.

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.53-77 / 2012
  • This study analyzes differences in content and tone of argument among three major Korean newspapers: the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly convey their own tone of argument when they cover sensitive issues and topics. This can be problematic if readers are unaware of a newspaper's tone of argument, because content and tone can easily influence readers. A tool that informs readers of a newspaper's tone of argument is therefore desirable. This study presents the results of clustering and classification techniques as part of a text mining analysis. We focus on six main subjects in newspapers (Culture, Politics, International, Editorial-opinion, Eco-business, and National issues) and attempt to identify differences and similarities among the newspapers. The basic unit of the text mining analysis is a paragraph of a news article. The study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean integrated news database system, which preserves articles from these three newspapers and makes them open to the public. About 3,030 articles from 2008 to 2012 were used. Articles in the International, National issues, and Politics sections were gathered for specific issues: the International section with the keyword 'Nuclear weapon of North Korea,' the National issues section with the keyword '4-major-river,' and the Politics section with the keyword 'Tonghap-Jinbo Dang.' All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected. All of the collected data were processed and edited into paragraphs. We removed stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the list of keyword pairs co-occurring in a paragraph and built a co-occurrence matrix from that list. Once the co-occurrence matrix was built, we used the cosine coefficient matrix as input for PFNet (Pathfinder Network). To analyze the three newspapers and find the significant keywords in each, we examined the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords, looking closely at the relationships among keywords and producing a detailed network map. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was first performed to identify how the tone of argument of one newspaper differs from the others. Then, to analyze tones of argument, all paragraphs were divided into two types, Positive and Negative. A supervised learning technique was used to identify and classify the tones of all collected paragraphs and articles: the Naïve Bayesian classifier provided in the MALLET package classified all the paragraphs in the articles. After classification, Precision, Recall, and F-value were used to evaluate the results. The results show that three subjects (Culture, Eco-business, and Politics) exhibited some differences in content and tone of argument among the three newspapers. In addition, for National issues, the tones of argument on the 4-major-rivers project differed from one another. The three newspapers appear to have their own specific tone of argument in those sections. Keyword networks also had different shapes for the same period and the same section. This means that the keywords appearing frequently in the articles differ and that the content is composed of different keywords. The Positive-Negative classification showed that it is possible to distinguish a newspaper's tone of argument from that of the others. These results indicate that the approach in this study is promising as the basis for a new tool to identify the different tones of argument of newspapers.
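The co-occurrence and cosine steps mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy keyword lists, and the choice to compute cosine similarity between rows of the co-occurrence matrix are all assumptions, and the abstract does not state which cosine formulation was used. Stop-word removal, PFNet, and NodeXL are outside the sketch.

```python
from collections import Counter
from itertools import combinations
import math

def cooccurrence_counts(paragraph_keywords):
    """Count how many paragraphs contain each unordered keyword pair."""
    counts = Counter()
    for keywords in paragraph_keywords:
        for pair in combinations(sorted(set(keywords)), 2):
            counts[pair] += 1
    return counts

def cooccurrence_matrix(counts, vocab):
    """Build a symmetric keyword-by-keyword matrix from the pair counts."""
    index = {kw: i for i, kw in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return matrix

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def cosine_matrix(matrix):
    """Pairwise cosine similarity between the rows of the co-occurrence matrix."""
    return [[cosine(row_i, row_j) for row_j in matrix] for row_i in matrix]

# Toy input: keywords already extracted per paragraph (made-up data).
paragraphs = [
    ["river", "project", "budget"],
    ["river", "project"],
    ["budget", "election"],
]
vocab = sorted({kw for p in paragraphs for kw in p})
counts = cooccurrence_counts(paragraphs)
similarity = cosine_matrix(cooccurrence_matrix(counts, vocab))
```

A matrix such as `similarity` is the kind of input a network-pruning method like Pathfinder would take; here it is simply a nested list indexed in the order of `vocab`.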