• Title/Summary/Keyword: Sentence Importance


A Korean Text Summarization System Using Aggregate Similarity (도합유사도를 이용한 한국어 문서요약 시스템)

  • 김재훈;김준홍
    • Korean Journal of Cognitive Science / v.12 no.1_2 / pp.35-42 / 2001
  • In this paper, a document is represented as a weighted graph called a text relationship map. In the graph, a node represents the vector of nouns in a sentence, an edge completely connects a node to the other nodes, and the weight on an edge is the similarity between the two nodes it joins. The similarity is based on the word overlap between the corresponding nodes. The importance of a node, called the aggregate similarity in this paper, is defined as the sum of the weights on the edges connecting it to the other nodes on the map. We present a Korean text summarization system using this aggregate similarity. To evaluate the system, we used two test collections: one (PAPER-InCon) consists of 100 papers in the field of computer science; the other (NEWS) is composed of 105 newspaper articles and was built by KORDIC. At a compression rate of 20%, we achieved a recall of 46.6% (PAPER-InCon) and 30.5% (NEWS) and a precision of 76.9% (PAPER-InCon) and 42.3% (NEWS).

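
The aggregate-similarity scoring described in the entry above lends itself to a compact sketch: sentences are graph nodes, edge weights are an overlap similarity, and a sentence's importance is the sum of its edge weights. The sketch below is a minimal illustration under simplifying assumptions (whitespace tokens stand in for extracted nouns, and a Jaccard overlap stands in for the paper's similarity measure); it is not the authors' implementation.

```python
# Minimal sketch of aggregate-similarity extractive summarization (not the
# authors' exact formulas): sentences are graph nodes, edge weights are a
# word-overlap similarity, and each sentence's importance is the sum of its
# edge weights. Nouns are approximated by whitespace tokens for illustration.
from itertools import combinations

def overlap_similarity(a, b):
    """Jaccard-style word overlap between two sentences (an assumed stand-in
    for the paper's noun-vector similarity)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def summarize(sentences, compression_rate=0.2):
    # Aggregate similarity: sum of edge weights to every other sentence.
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        w = overlap_similarity(sentences[i], sentences[j])
        scores[i] += w
        scores[j] += w
    k = max(1, int(len(sentences) * compression_rate))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]   # keep original order

doc = ["the cat sat on the mat",
       "the cat chased the mouse",
       "a dog barked at the mailman",
       "the mouse hid under the mat",
       "weather was sunny all day"]
print(summarize(doc, compression_rate=0.4))
```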

A Recognition Method for Main Characters Name in Korean Novels (한국어 소설에서 주요 인물명 인식 기법)

  • Kim, Seo-Hee;Park, Tae-Keun;Kim, Seung-Hoon
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.9 no.1 / pp.75-81 / 2016
  • The main characters play leading roles in novels. Previous studies recognized the main characters in a novel mainly using dictionaries built beforehand. In English, names begin with capital letters and co-occur with certain words. In this paper, we propose a method for recognizing main characters' names in Korean novels using predicates, rules, and weights. We first recognize candidate character names from predicates and propose rules to exclude candidates that cannot be characters. We then assign an importance to each candidate, using weights determined by the number of candidates appearing in a sentence. Finally, if a candidate's importance exceeds a threshold, it is judged to be one of the main characters. Experiments on 300 novels show an average accuracy of 85.97%. The recognized main characters' names may be used to grasp relationships among characters and to analyze characters' actions and tendencies.
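
The weighting-and-threshold step described in the entry above can be illustrated with a small sketch. The predicate-based candidate detection and the exclusion rules are assumed to have already produced the candidate list; the per-sentence weighting and the threshold value below are illustrative assumptions, not the paper's exact scheme.

```python
# Illustrative sketch of the thresholding step described above: candidate
# character names (assumed to be already extracted by predicate patterns and
# filtered by exclusion rules) receive per-sentence weights that depend on how
# many candidates appear in that sentence, and candidates whose accumulated
# importance exceeds a threshold are reported as main characters.
from collections import defaultdict

def main_characters(sentences, candidates, threshold=1.0):
    importance = defaultdict(float)
    for sent in sentences:
        present = [c for c in candidates if c in sent]
        if not present:
            continue
        weight = 1.0 / len(present)   # assumed weighting: credit shared per sentence
        for name in present:
            importance[name] += weight
    return {name: score for name, score in importance.items() if score >= threshold}

sentences = ["철수는 영희를 바라보았다.",
             "영희가 웃으며 말했다.",
             "철수는 집으로 돌아갔다.",
             "낯선 남자가 마을에 나타났다."]
print(main_characters(sentences, candidates=["철수", "영희", "남자"], threshold=1.0))
```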

An Improved Automatic Text Summarization Based on Lexical Chaining Using Semantical Word Relatedness (단어 간 의미적 연관성을 고려한 어휘 체인 기반의 개선된 자동 문서요약 방법)

  • Cha, Jun Seok;Kim, Jeong In;Kim, Jung Min
    • Smart Media Journal / v.6 no.1 / pp.22-29 / 2017
  • Due to the recent rapid advancement and spread of smart devices, document data on the Internet is increasing sharply. This growth of information on the Web, including a massive number of documents, makes it increasingly difficult for users to digest the corresponding data. Accordingly, various studies on automatic document summarization are under way. This study uses the TextRank algorithm to summarize documents efficiently. TextRank represents sentences or keywords as a graph and estimates the importance of sentences from the graph's vertices and edges, which capture semantic relations between words and sentences. It extracts high-ranking keywords and, based on those keywords, extracts important sentences. To extract important sentences, the algorithm first groups related words; each group is scored on a specific weight scale, sentences with higher scores on that scale are selected, and the selected sentences form the summary of the document. Experiments confirmed improved performance over summarization methods reported in previous research, showing that the algorithm can summarize documents more efficiently.
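
The entry above describes a TextRank-style pipeline. The sketch below shows a generic TextRank sentence ranking with power iteration over a similarity graph; the word-grouping step and the specific weight scale of the paper are not reproduced, and the overlap similarity is an assumed stand-in.

```python
# Generic TextRank-style sentence ranking (an illustrative sketch, not this
# paper's exact grouping/weighting scheme): build a sentence-similarity graph,
# run PageRank-style power iteration, and return the top-ranked sentences.
def similarity(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / (len(sa) + len(sb)) if sa and sb else 0.0

def textrank(sentences, d=0.85, iterations=30):
    n = len(sentences)
    W = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) or 1.0 for row in W]      # guard against isolated nodes
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - d) + d * sum(W[j][i] / out_sum[j] * scores[j]
                                    for j in range(n))
                  for i in range(n)]
    return scores

docs = ["TextRank builds a graph of sentences.",
        "Each sentence is a vertex in the graph.",
        "Edges carry similarity between sentences.",
        "Important sentences receive high scores.",
        "The summary keeps the highest scoring sentences."]
scores = textrank(docs)
summary = [s for _, s in sorted(zip(scores, docs), reverse=True)[:2]]
print(summary)
```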

The Method of Deriving Japanese Keyword Using Dependence (의존관계에 기초한 일본어 키워드 추출방법)

  • Lee, Tae-Hun;Jung, Kyu-Cheol;Park, Ki-Hong
    • The KIPS Transactions: Part B / v.10B no.1 / pp.41-46 / 2003
  • This paper segments the words in Japanese text for keyword extraction and proposes extracting compound-noun index keywords using word and sentence information together with rules over the sentences. Unlike previous approaches, it constructs generative rules for compound nouns based on dependency relations, derived from an analysis of the characteristics of keywords in text. It also suggests how to decide keyword importance, considering restrictions on and the repetition of the words involved in the generative rules, so that other candidate keywords and the content of the sentences are retained. To verify the validity of the keyword extraction, we used the titles and abstracts of 65 Japanese papers on natural language and/or speech processing and obtained 63% accuracy for the top-ranked output.
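
The abstract above does not detail the dependency-based generative rules, so they cannot be reproduced here; the sketch below only illustrates the general idea of forming compound-noun keyword candidates (assumed here to be runs of adjacent nouns in pre-tagged text) and ranking them by repetition.

```python
# Very rough sketch of ranking compound-noun keyword candidates by repetition.
# The actual method uses dependency-based generative rules for Japanese; as a
# stand-in assumption, adjacent noun tokens (pre-tagged) are joined into
# compound candidates and ranked by how often they recur in the text.
from collections import Counter

def compound_candidates(tagged_sentences):
    """tagged_sentences: list of [(token, pos), ...]; POS tags are assumed given."""
    for sent in tagged_sentences:
        run = []
        for token, pos in sent + [("", "EOS")]:
            if pos == "NOUN":
                run.append(token)
            else:
                if len(run) >= 2:            # keep only compounds of 2+ nouns
                    yield "".join(run)
                run = []

def rank_keywords(tagged_sentences, top_k=5):
    counts = Counter(compound_candidates(tagged_sentences))
    return counts.most_common(top_k)         # importance ~ repetition count

tagged = [[("音声", "NOUN"), ("認識", "NOUN"), ("の", "PART"), ("研究", "NOUN")],
          [("音声", "NOUN"), ("認識", "NOUN"), ("システム", "NOUN"), ("を", "PART"),
           ("評価", "NOUN"), ("する", "VERB")]]
print(rank_keywords(tagged))
```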

The connection between illustrations and contents in elementary mathematics textbooks (초등학교 수학교과서 그림과 내용의 연계성)

  • Hong, Gap Ju
    • The Mathematical Education / v.58 no.2 / pp.225-237 / 2019
  • Illustrations in mathematics textbooks must play the complex role of easing the difficulty of mathematics while conveying the core of the mathematical content well. This study examined prior discussions of the harmony between pictures and text and of the importance of emotional expression. The literature on children's picture books became an important reference in this process: the understanding of children's psychology and cognitive characteristics accumulated over the long history of picture books, and the insight into the relationship between text and pictures, provide important guidelines for elementary school textbooks. Based on these previous studies, this study found some notable examples in Chinese, Japanese, Indian, and American textbooks concerning the complementary relationship between pictures and text and the emotional expressiveness of pictures. Where necessary, we compared these textbooks with Korean textbooks. Through this analysis, the study draws some implications for the illustrations in Korean textbooks and for the textbook production process. That is, the process of reading a picture and interpreting its meaning should be treated as part of studying mathematics; the mathematical concepts to be dealt with, or the sentence description of a problem, should be developed concurrently with the design of the pictures; and the monotonous expressions and dialogues of textbook characters should be avoided, with the personalities and emotions of the characters expressed more richly and freely.

Frequency and Social Network Analysis of the Bible Data using Big Data Analytics Tools R (빅데이터 분석도구 R을 이용한 성경 데이터의 빈도와 소셜 네트워크 분석)

  • Ban, ChaeHoon;Ha, JongSoo;Kim, Dong Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.166-171 / 2020
  • Big data processing technology, which can store and analyze data to obtain new knowledge, is recognized as important in many fields of society. Big data is emerging as an important issue in the field of information and communication technology, and interest in the underlying technologies continues to grow. R, a tool for analyzing big data, is a language and environment for statistics-based information analysis. In this paper, we use R to analyze Bible data, specifically the four Gospels of the New Testament. We collect the Bible data and filter it for analysis. R is used to investigate how frequently each word occurs in the text and to analyze the Bible through social network analysis, in which words appearing in the same sentence are paired and the relations between words are analyzed for accurate data analysis.
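
The paper performs its analysis in R; the following Python sketch mirrors the two steps described above, counting word frequencies and pairing words that co-occur in the same sentence as edges of a word network. The sample verses and whitespace tokenization are illustrative assumptions, not the paper's filtered Gospel corpus.

```python
# Analogous sketch in Python of the two analyses described above (the paper
# itself uses R): word-frequency counts, and word co-occurrence pairs within a
# sentence as edges of a word network.
from collections import Counter
from itertools import combinations

verses = ["in the beginning was the word",
          "the word became flesh",
          "blessed are the poor in spirit"]

# 1) Frequency analysis: how often each word occurs across the corpus.
freq = Counter(word for verse in verses for word in verse.split())

# 2) Network analysis: pair up words that appear in the same sentence; the
#    pair counts serve as edge weights of the word co-occurrence network.
edges = Counter()
for verse in verses:
    words = sorted(set(verse.split()))
    for a, b in combinations(words, 2):
        edges[(a, b)] += 1

print(freq.most_common(3))
print(edges.most_common(3))
```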

Multi-Dimensional Emotion Recognition Model of Counseling Chatbot (상담 챗봇의 다차원 감정 인식 모델)

  • Lim, Myung Jin;Yi, Moung Ho;Shin, Ju Hyun
    • Smart Media Journal / v.10 no.4 / pp.21-27 / 2021
  • Recently, the importance of counseling has been increasing due to the "Corona Blue" caused by COVID-19. In addition, with the growth of non-face-to-face services, research on chatbots as a new counseling medium is being actively conducted. In non-face-to-face counseling through a chatbot, it is most important to accurately understand the client's emotions. However, since there is a limit to recognizing emotions only from the sentences written by the client, it is necessary to recognize the dimensional emotions embedded in those sentences for more accurate emotion recognition. Therefore, in this paper, we propose a multi-dimensional emotion recognition model: after correcting the original data according to its characteristics, word vectors are generated by training a Word2Vec model, and the sentence-level VAD (Valence, Arousal, Dominance) values are then learned with a deep learning algorithm. Comparing three deep learning models to verify the usefulness of the proposed model, the attention-based model showed the best performance with an R-squared of 0.8484.
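
As a rough illustration of the approach described above, the sketch below regresses sentence-level VAD values from word vectors using a simple attention-pooling model in PyTorch. The layer sizes, 300-dimensional embeddings, and toy data are assumptions for illustration, not the authors' actual architecture or data.

```python
# Hypothetical sketch: predicting sentence-level VAD (valence, arousal,
# dominance) from pre-trained word vectors with a simple attention pooling
# layer. Dimensions and the toy data are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionVADRegressor(nn.Module):
    def __init__(self, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.attn = nn.Linear(embed_dim, 1)         # scores each token
        self.regressor = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 3),               # V, A, D
        )

    def forward(self, word_vecs, mask):
        # word_vecs: (batch, seq_len, embed_dim); mask: (batch, seq_len)
        scores = self.attn(word_vecs).squeeze(-1)            # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)              # attention weights
        sent_vec = (weights.unsqueeze(-1) * word_vecs).sum(dim=1)
        return self.regressor(sent_vec)                      # (batch, 3)

# Toy usage with random vectors standing in for real Word2Vec embeddings.
model = AttentionVADRegressor()
word_vecs = torch.randn(4, 12, 300)          # 4 sentences, 12 tokens each
mask = torch.ones(4, 12)
vad_pred = model(word_vecs, mask)
loss = nn.MSELoss()(vad_pred, torch.rand(4, 3))   # VAD targets in [0, 1]
loss.backward()
```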

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis to grasp the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In the sentiment analysis of English texts by deep learning, natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence at space characters. There are several ways to derive word vectors, one of which is Word2Vec, used for producing the 300-dimensional Google word vectors from about 100 billion words of Google News data. These have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for the word '예쁘고', the morphemes are '예쁘' (adjective) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use a 'morpheme vector' as the input to a deep learning model rather than the 'word vector' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. Here, several questions arise. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as the following three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors derived from grammatically correct texts of a domain other than the analysis target, or morpheme vectors derived from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the questions and then compare the classification accuracy of a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. As training and test datasets, Naver Shopping's 17,260 cosmetics product reviews are used.
    To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million Naver Shopping cosmetics product reviews and 520,000 Naver News articles, roughly corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data sources: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing, namely only sentence splitting, or additional spelling and spacing corrections after sentence separation. Third, they vary in the form of input fed into the word vector model: whether the morphemes themselves are entered, or the morphemes with their POS tags attached. The morpheme vectors further vary depending on the considered range of POS tags, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model, using a context window of 5 and a vector dimension of 300. The results suggest that using same-domain text even with lower grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not appear to have any definite influence on the classification accuracy.
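
A compact sketch of the pipeline described above, under illustrative assumptions: morpheme-level CBOW vectors (context window 5, dimension 300, as stated in the abstract) are trained with gensim and used to initialize a small non-static CNN classifier whose embeddings are fine-tuned during training. The toy morpheme sequences, vocabulary handling, and layer sizes are assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of the pipeline described above: morpheme-level CBOW
# vectors (window 5, dimension 300) feeding a small "non-static" CNN text
# classifier. Toy data and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn
from gensim.models import Word2Vec

# 1) Morpheme vectors via CBOW (sg=0), matching the stated configuration.
morpheme_sentences = [["이", "제품", "정말", "예쁘", "고", "좋", "다"],
                      ["배송", "이", "너무", "느리", "다"]]
w2v = Word2Vec(morpheme_sentences, vector_size=300, window=5, sg=0, min_count=1)

# 2) Non-static CNN: embeddings initialized from Word2Vec and fine-tuned.
class MorphemeCNN(nn.Module):
    def __init__(self, weights, num_classes=2, kernel_sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(weights, freeze=False)  # non-static
        self.convs = nn.ModuleList(
            [nn.Conv1d(weights.size(1), n_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(n_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)        # (batch, dim, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))         # logits per class

weights = torch.tensor(w2v.wv.vectors)                   # vocab_size x 300
model = MorphemeCNN(weights)
ids = torch.tensor([[w2v.wv.key_to_index[m] for m in morpheme_sentences[0]]])
logits = model(ids)
print(logits.shape)   # (1, 2): positive/negative scores for one review
```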

PRELIMINARY STUDY FOR ADHD TREATMENT GUIDELINE (ADHD 최적치료 지침을 위한 예비연구)

  • Kim, Eun-Young;Ra, Chul;Lee, Young-Sik
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.13 no.1 / pp.129-138 / 2002
  • Objectives: In order to develop a treatment guideline for ADHD, the current clinical practice of child psychiatrists and their opinions on optimal intervention were evaluated. Methods: Structured questionnaire items about diagnostic workup, drug choice in 5 different situations according to different co-morbid disorders, and non-pharmacological treatment were administered to 32 child psychiatrists working at university and general hospitals. We compared the data with the Texas Algorithm Project (TAP) guideline. Results: (1) An intelligence test, the Sentence Completion Test, a sustained attention test, and the Conners questionnaire were the basic routine tests that must be performed. (2) The main trend of medication in this study was not different from the TAP guideline. (3) In the case of co-morbid tic disorder, the first recommended drug in the TAP guideline is still a psychostimulant, but in this study initial psychostimulant prescription was not the main trend. (4) In the case of MPH-non-responsive co-morbid disruptive behavior disorder, MPH combined with another drug was more common than switching to another drug as suggested by the TAP guideline. (5) Regarding non-pharmacological treatment, most child psychiatrists reported the importance of parent management. Conclusion: There were some differences in medication trends in this study compared with the TAP guideline. Further studies and conferences are needed to reach expert consensus in Korea.


Growth of Felonies after the 1997 Financial Crisis in Korea (외환위기 이후 흉악범죄의 증가와 정부의 범죄억지정책)

  • Kim, Duol;Kim, Jee Eun
    • KDI Journal of Economic Policy / v.31 no.2 / pp.155-194 / 2009
  • The Korean economy successfully overcame the macroeconomic downturn brought on by the Asian financial crisis in a very short period of time. The economic shock, however, generated a variety of social problems, one of which was the increase in felonies (homicide, robbery, rape, and arson), that is, the degradation of public safety. We argue that Korean criminal policy has not been effective in ameliorating the rising trend in crime caused by the financial crisis. In order to substantiate this claim, we assess the effectiveness of criminal policy in three areas: policing, sentencing, and corrections. First, there has been a resource shortage in policing since the 1997 financial crisis. For the past ten years, the investment of human resources and budget in the police, as well as in prosecutors' investigation activities, has been virtually stagnant. This insufficient resource allocation in policing caused a large decline in arrest rates and prosecution rates. Second, the Korean judicial system has not increased the severity of punishment: comparing the pre- and post-crisis periods, the average length of prison sentences handed down by the courts has declined. Given the worsening severity of crime and the decreasing inputs into policing and prosecution, the government should have increased the severity of punishment to deter crime. Third, we find that the government hired more officers and allocated a larger budget to prisons and probation. However, it is difficult to conclude that the increased resources for correctional programs have been effective in preventing released prisoners from committing further crimes, because the number of repeat offenders convicted of a third or subsequent offense has increased dramatically since 1997, pushing felonies upward. In sum, the relevant government organizations failed to respond individually or to coordinate their actions, eventually allowing a dramatic increase in crime. This research carries explicit policy implications. In order to prevent further degradation of public safety, the government must put more effort into increasing the effectiveness of these policies and invest more resources in them. We also emphasize the importance of institutional mechanisms that foster policy coordination among the Police, the Prosecutor's Office, the Ministry of Justice, and other relevant government organizations.
