• Title/Summary/Keyword: Sentence Weight

Event Sentence Extraction for Online Trend Analysis (온라인 동향 분석을 위한 이벤트 문장 추출 방안)

  • Yun, Bo-Hyun
    • The Journal of the Korea Contents Association / v.12 no.9 / pp.9-15 / 2012
  • Conventional event sentence extraction research does not learn the 3W features in the learning step; it only applies a rule on whether the 3W features are present in the extraction step. This paper presents a sentence-weight-based event sentence extraction method that calculates the weights of the 3W features in the learning step and applies those weights in the extraction step. The experimental results show that keeping the top 30% of features ranked by $TF{\times}IDF$ weighting works well for feature filtering. In the real estate domain of public issues, the performance of the sentence-weight-based method is improved by the who and when features of 3W. Moreover, in the same domain, the sentence-weight-based method outperforms the other machine-learning-based extraction methods.
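The feature-filtering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `top_tfidf_features`, the toy documents, and the choice to sum length-normalized TF x IDF over documents are all assumptions made for illustration. Only the 30% cut-off comes from the abstract.

```python
import math
from collections import Counter

def top_tfidf_features(docs, keep_ratio=0.3):
    """Return the top `keep_ratio` share of terms ranked by summed TF x IDF.

    `docs` is a list of token lists. The scoring scheme is an assumed,
    textbook-style TF x IDF, not the exact formula used in the paper.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            idf = math.log(n_docs / df[term])
            scores[term] += (count / len(doc)) * idf
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    n_keep = max(1, round(len(ranked) * keep_ratio))
    return ranked[:n_keep]

# Hypothetical tokenized sentences from a real estate news domain.
docs = [
    ["ministry", "announced", "housing", "policy", "monday"],
    ["housing", "prices", "rose", "seoul"],
    ["ministry", "housing", "supply", "plan"],
]
selected = top_tfidf_features(docs)
print(selected)
```

A term such as "housing" that occurs in every document gets an IDF of zero and is filtered out, which is the intended effect of this kind of weighting.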

A Text Summarization Model Based on Sentence Clustering (문장 클러스터링에 기반한 자동요약 모형)

  • 정영미;최상희
    • Journal of the Korean Society for Information Management / v.18 no.3 / pp.159-178 / 2001
  • This paper presents an automatic text summarization model that selects representative sentences from sentence clusters to create a summary. Summary generation experiments were performed on two sets of test documents after learning the optimum environment from a training set. The centroid clustering method turned out to be the most effective for clustering sentences, and sentence weight was found to be more effective than the similarity value between the sentence and cluster centroid vectors for selecting a representative sentence from each cluster. The experimental results also show that inverse sentence weight, as well as title word weight for terms and location weight for sentences, is effective in improving summarization performance.

A Development of MiTS Network Protocol based on Light-Weight Ethernet (Light-Weight Ethernet 기반 MiTS 네트워크 프로토콜 개발)

  • Hwang, Hun-Gyu;Yoon, Jin-Sik;Lee, Seong-Dae;Seo, Jeong-Min;Jang, Kil-Woong;Lee, Jang-Se;Park, Hyu-Chan
    • Journal of Advanced Marine Engineering and Technology / v.34 no.8 / pp.1172-1179 / 2010
  • In this paper, we analyze and design the requirements of the Network Function (NF) block and the System Function (SF) block of the MiTS network protocol based on Light-Weight Ethernet, and we implement and test the protocol and its library files. The Light-Weight Ethernet protocol consists of the NF block and the SF block. NF sends and receives datagrams over UDP multicast communication. SF processes messages after distinguishing Sentence data from Binary Image data.

The Study of Developing Korean SentiWordNet for Big Data Analytics : Focusing on Anger Emotion (빅데이터 분석을 위한 한국어 SentiWordNet 개발 방안 연구 : 분노 감정을 중심으로)

  • Choi, Sukjae;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.19 no.4 / pp.1-19 / 2014
  • Efforts to identify users' perceptions latent in big data are actively under way. These efforts try to score people's views on products, movies, and social issues by analyzing statements posted on Internet bulletin boards or SNS. This study therefore addresses the problem of how to find emotional vocabulary and determine the degree of each emotional word. The method uses the results of previous studies for the basic emotional vocabulary and its degrees, and infers the extended emotional vocabulary from dictionary glosses. The result is four emotional word lists: a basic emotional list, an extended stratum-1 level-1 list derived from the glosses of the basic vocabulary, an extended stratum-2 level-1 list derived from the glosses of non-emotional words, and an extended stratum-2 level-2 list derived from the glosses of glosses. We obtained the emotional degrees by applying sentence weights and emphasis multiplier values on the basis of the basic emotional list. The experimental results indicate that AND and OR sentences take as their weight the average degree of the included words, and that MULTIPLY sentences take a weight of 1.2 to 1.5 depending on the type of adverb. NOT sentences are assumed to take a degree obtained by reducing and reversing the original word's emotional degree. The emphasis multiplier is taken to be 2 for stratum 1 and 3 for stratum 2.
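The sentence-level rules reported in this abstract can be sketched roughly as follows. The function names and the example degrees are hypothetical. The `reduction` factor for NOT sentences is also an assumption, since the abstract only says the degree is reduced and reversed without giving a value.

```python
def average_degree(degrees):
    """AND / OR sentence: weight is the mean degree of the included emotional words."""
    return sum(degrees) / len(degrees)

def intensify(degree, multiplier):
    """MULTIPLY sentence: an adverb scales the degree by a factor of 1.2 to 1.5."""
    if not 1.2 <= multiplier <= 1.5:
        raise ValueError("multiplier outside the 1.2-1.5 range reported in the abstract")
    return degree * multiplier

def negate(degree, reduction=0.5):
    """NOT sentence: shrink the degree and flip its sign.

    The reduction factor of 0.5 is an assumed placeholder, not a reported value.
    """
    return -degree * reduction

# Hypothetical anger degrees for illustration only.
print(average_degree([0.8, 0.4]))  # AND/OR sentence with two emotional words
print(intensify(0.5, 1.5))         # strongly intensifying adverb
print(negate(0.8))                 # negated emotional word
```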

Word sense disambiguation using dynamic sized context and distance weighting (가변 크기 문맥과 거리가중치를 이용한 동형이의어 중의성 해소)

  • Lee, Hyun Ah
    • Journal of Advanced Marine Engineering and Technology / v.38 no.4 / pp.444-450 / 2014
  • Most research on word sense disambiguation has used a static-sized context regardless of sentence patterns. This paper proposes using a dynamic-sized context that considers sentence patterns and the distance between words for word sense disambiguation. We evaluated our system on 12 words in 32,735 sentences from the Sejong POS- and sense-tagged corpus, and the dynamic-sized context achieved 92.2% average accuracy for predicates, which is better than the accuracy of the static-sized context.

Summarizing the Differences in Chinese-Vietnamese Bilingual News

  • Wu, Jinjuan;Yu, Zhengtao;Liu, Shulong;Zhang, Yafei;Gao, Shengxiang
    • Journal of Information Processing Systems / v.15 no.6 / pp.1365-1377 / 2019
  • Summarizing the differences in Chinese-Vietnamese bilingual news plays an important supporting role in the comparative analysis of news views between China and Vietnam. To address cross-language problems in analyzing the differences between Chinese and Vietnamese bilingual news, we propose a new method of summarizing the differences based on an undirected graph model. The method extracts elements to represent the sentences and builds a bridge between the two languages based on Wikipedia's multilingual concept description pages. First, we calculate the similarity between Chinese and Vietnamese news sentences and filter the bilingual sentences accordingly. Then we use the filtered sentences as nodes and the similarity grade as the edge weight to construct an undirected graph model. Finally, using the random walk algorithm, the weight of each node is calculated from the weights of its edges, and the sentences with the highest weights are extracted as the difference summary. The experimental results show that the proposed approach achieved the highest score of 0.1837 on the annotated test set, outperforming state-of-the-art summarization models.
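The graph-ranking step described in this abstract (sentences as nodes, similarity as edge weight, node weight from a random walk) can be sketched as below. This is a generic PageRank-style iteration, not the authors' exact algorithm. The function name, the damping factor, the iteration count, and the toy similarity matrix are all assumptions.

```python
def random_walk_scores(sim, damping=0.85, iterations=50):
    """Score nodes of an undirected weighted graph by a damped random walk.

    `sim` is a symmetric matrix of edge weights (sentence similarities).
    The damping factor and fixed iteration count are assumed values.
    """
    n = len(sim)
    # Row sums turn edge weights into transition probabilities.
    totals = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for j in range(n):
            incoming = sum(
                scores[i] * sim[i][j] / totals[i]
                for i in range(n)
                if totals[i] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores

# Hypothetical similarities among three filtered sentences.
sim = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.3],
    [0.1, 0.3, 0.0],
]
scores = random_walk_scores(sim)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

The sentence with the largest total edge weight (index 1 in this toy matrix) receives the highest score and would be the first candidate for the summary.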

Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation

  • Jeon, Hyung-Bae;Lee, Soo-Young
    • ETRI Journal / v.38 no.3 / pp.487-493 / 2016
  • Two new methods are proposed for an unsupervised adaptation of a language model (LM) with a single sentence for automatic transcription tasks. At the training phase, training documents are clustered by a method known as Latent Dirichlet allocation (LDA), and then a domain-specific LM is trained for each cluster. At the test phase, an adapted LM is presented as a linear mixture of the now trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize a trained LDA model for the estimation of weight values, which are then to be assigned to the now trained domain-specific LMs; therefore, the clustering and weight-estimation algorithms of the trained LDA model are reliable. For the continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting.
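The linear mixture described in this abstract can be written as $P_{adapted}(w) = \sum_k \lambda_k P_k(w)$, where $P_k$ is the domain-specific LM for cluster $k$ and $\lambda_k$ is its weight. The sketch below shows that combination only. The toy unigram models, the weight values, and the function name are assumptions, and the LDA-based weight estimation proposed in the paper is not reproduced here.

```python
def mixture_probability(word, models, weights):
    """Probability of `word` under a linear mixture of domain-specific LMs.

    `models` are toy unigram distributions (dicts); `weights` stand in for
    the per-domain weights and must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * m.get(word, 0.0) for m, w in zip(models, weights))

# Hypothetical domain LMs and hypothetical weights for one test sentence.
finance_lm = {"stock": 0.75, "goal": 0.25}
sports_lm = {"stock": 0.25, "goal": 0.75}
weights = [0.75, 0.25]

p_stock = mixture_probability("stock", [finance_lm, sports_lm], weights)
p_goal = mixture_probability("goal", [finance_lm, sports_lm], weights)
print(p_stock, p_goal)
```

Because each component is a proper distribution and the weights sum to 1, the mixture is again a proper distribution.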

An Analysis of Capacity and Weight in the Elementary Mathematics Textbooks (들이와 무게의 단위에 대한 초등학교 수학 교과서 분석)

  • Kwon, MiSun;Pang, JeongSuk
    • School Mathematics / v.19 no.2 / pp.385-403 / 2017
  • This paper analyzed the units of capacity and weight in the mathematics textbooks in terms of units, relationship between the units, and the need of standard units. The results of this study showed that there were differences in the representation of the units, the notation of units, the types of units, and the representation of basic amount of units in the mathematics textbooks developed by the 5th, 6th, 7th, 2007 revised, and 2009 revised national mathematics curriculum respectively. The mathematics textbook developed by the 2009 revised mathematics curriculum was found to be generally consistent with the International System of Units (SI). However, the way of defining 1kg through the weight of water was found to be different from the SI. The relationship between the units was consistently described only with a written sentence in the previous mathematics textbooks, but the 2009 revised mathematics textbook presents questions and pictures to foster students' understanding of the relationship. Finally, activities and questions are required for students to recognize the universality and convenience of the standard units through the inconvenience of arbitrary units. Based on these results, this paper provides implications for the development of mathematics textbooks in the units of capacity and weight.

Content Analysis of Food & Nutrition Section in Middle School Textbooks -Home Economics, Physical Education and Science- (중학교 교과서 식생활 내용분석 -가정, 체육, 과학을 중심으로-)

  • 이영숙;김영남
    • Journal of Korean Home Economics Education Association / v.12 no.3 / pp.53-63 / 2000
  • The purpose of this study was to analyze, quantitatively and qualitatively, the content of the food and nutrition sections in middle school textbooks for home economics, physical education, and science. As a quantitative approach, the numbers of sentence lines, tables, figures, photos, activities, and exercises were counted. As a qualitative approach, types of explanations were categorized by 7 criteria, and the commonalities and differences in the contents of those subjects were compared. The conclusions of this study are summarized as follows: 1) The contents of the food and nutrition sections were divided into nutrients, water, energy, food groups, and nutritional problems. When the average numbers of sentence lines were compared, those for nutrients were the longest in all 3 subjects. 2) When the numbers of tables, figures, and photos in the textbooks of the 3 subjects were compared, there were more figures in home economics and science, and more tables in physical education. 3) There were more activities and exercises in home economics and science than in physical education. 4) The D type (sentences with table) or E type (sentences with figure) was adopted for the explanation of nutrient functions, recommended dietary allowance, food sources, food groups, eating habits, and weight control in home economics; nutrient functions and energy metabolism in physical education; and digestion, body constituents, energy metabolism, and detection of nutrients in science. 5) Contents about the classification and functions of nutrients, food sources, deficiency, water, energy contents of nutrients, and obesity appeared in all 3 subjects. Food groups and eating habits were explained in detail in home economics, whereas digestion of nutrients in the digestive tract was explained in detail in science. The recommended dietary allowance for Koreans and the basic food groups revised in 1995 were presented in home economics, whereas those revised in 1989 were presented in physical education. To avoid confusion, the recommended dietary allowance for Koreans and the food groups presented in physical education textbooks should be updated.

Automatic Extractive Summarization of Newspaper Articles using Activation Degree of 5W1H (육하원칙 활성화도를 이용한 신문기사 자동추출요약)

  • 윤재민;정유진;이종혁
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.505-515 / 2004
  • In a newspaper, 5W1H information is the most fundamental and important element for writing and understanding articles. Focusing on such a relation between a newspaper article and the 5W1H, we propose a summarization method based on the activation degree of 5W1H. To overcome problems of the lead-based and the title-based methods, both of which are known to be the most effective in newspaper summarization, sufficient 5W1H information is extracted from both a title and a lead sentence. Moreover, for each sentence, its weight is computed by considering various factors, such as activation degree of 5W1H, the number of 5W1H categories, and its length and position. These factors make a great contribution to the selection of more important sentences, and thus to the improvement of readability of the summarized texts. In an experimental evaluation, the proposed method achieved a precision of 74.7% outperforming the lead-based method. In sum, our 5W1H approach was shown to be promising for automatic summarization of newspaper articles.