• Title/Summary/Keyword: Text data


A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.261-278 / 2009
  • Identifying optimal feature sets is crucial to improving the effectiveness of text categorization (TC). In this study, feature expansion experiments were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used to expand the feature sets was WordNet, a lexical database of English words. For the given data set and lexical tool, the study showed that feature expansion based on synonym relationships significantly improved TC results. The experimental results indicated that when feature sets were expanded with synonyms based on classifier names, the effectiveness of TC improved considerably regardless of word sense disambiguation.
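The synonym-based feature expansion described in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the `TOY_SYNONYMS` table is a hypothetical stand-in for WordNet lookups, and `expand_features` is an illustrative name. Every listed synonym is added without any word sense disambiguation.

```python
from typing import Dict, List, Set

# Hypothetical toy synonym table standing in for WordNet (an assumption for
# illustration; a real system would query the lexical database instead).
TOY_SYNONYMS: Dict[str, Set[str]] = {
    "car": {"automobile", "auto"},
    "retrieval": {"recovery"},
}


def expand_features(tokens: List[str], synonyms: Dict[str, Set[str]]) -> List[str]:
    """Return the original tokens followed by any synonyms not already present."""
    expanded = list(tokens)
    seen = set(tokens)
    for token in tokens:
        # Sort synonyms so the output order is deterministic.
        for syn in sorted(synonyms.get(token, ())):
            if syn not in seen:
                seen.add(syn)
                expanded.append(syn)
    return expanded


if __name__ == "__main__":
    print(expand_features(["car", "retrieval"], TOY_SYNONYMS))
```

The expanded list would then be passed to whatever classifier is in use in place of the original keyword or title tokens.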

An Exploratory Study of Happiness and Unhappiness Among Koreans based on Text Mining Techniques (텍스트마이닝 기법을 활용한 한국인의 행복과 불행 탐색연구)

  • Park, Sanghyeon; Do, Kanghyuk; Kim, Hakyeong; Park, Gaeun; Yun, Jinhyeok; Kim, Kyungil
    • The Journal of the Korea Contents Association / v.18 no.7 / pp.10-27 / 2018
  • The purpose of this study is to explore the meaning of happiness and unhappiness in Korean society through text mining analysis. Words similar to the keywords (happiness/unhappiness) are extracted from an online news portal using Word2Vec and TF-IDF. We also use the K-LIWC dictionary to perform sentiment analysis of the words associated with happiness and unhappiness. In the TF-IDF analysis, happiness and unhappiness are closely related to the social factors and social issues of each year. In the Word2Vec analysis, 'hope' remained similar to happiness over six years. In the K-LIWC analysis, 'money/financial issues', 'school', and 'communication' are closely related to both happiness and unhappiness. In addition, 'physical condition and symptom' is closely related to unhappiness. Implications, limitations, and suggestions for future research are also discussed.
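As a rough illustration of the TF-IDF weighting mentioned in this abstract, the sketch below scores terms in tokenized documents using only the standard library. It assumes the plain formulation (term frequency times the log of the inverse document frequency, with no smoothing). The study's exact variant is not stated, and the function name `tf_idf` is illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute unsmoothed TF-IDF scores for each term in each tokenized document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    # Hypothetical toy documents, not data from the study.
    toy_docs = [["happiness", "hope", "family"], ["unhappiness", "money", "family"]]
    for doc_scores in tf_idf(toy_docs):
        print(doc_scores)
```

Under this formulation a term that occurs in every document scores zero, which is why terms specific to a given year or topic stand out.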

A Study on Text Mining Analysis of Presidential Maritime Concept in KOREA (텍스트마이닝을 이용한 한국 대통령의 해양관에 관한 연구)

  • Kim, Sung-Kuk; Lee, Tae-Hwee
    • Journal of Korea Port Economic Association / v.36 no.3 / pp.39-54 / 2020
  • In a presidential political system, the words of the president have great influence on the formation of national policy and on the decision-making process. Policy priorities are determined by the president's ideology and core values, and policies are established and executed according to those priorities. This paper therefore analyzes the content of presidential speeches. Because a president's speech is semantic, unstructured text data, it is analyzed with big data techniques based on machine learning and deep learning. In this study, the presidents' speeches at the "National Sea Day" commemorations from 1996 onward were collected and analyzed using topic modeling. The analysis showed that every president's speeches presented a view of the ocean consistent with the direction of his or her administration. It was also confirmed that the ocean-industry-resource topics, which represent the intrinsic values of the ocean, were not diminished and were consistently emphasized by all presidents.

Research Trend Analysis in Fashion Design Studies in Korea using Topic Modeling (토픽모델링을 이용한 국내 패션디자인 연구동향 분석)

  • Jang, Namkyung; Kim, Min-Jeong
    • Journal of Digital Convergence / v.15 no.6 / pp.415-423 / 2017
  • This study explored research trends by investigating articles published in the Journal of Korean Society of Fashion Design from 2001 through 2015. English keywords and abstracts were analyzed using text mining and topic modeling techniques. The findings are as follows. Using text mining, 183 core terms that appeared more than 30 times were derived from the 7,137 words used in the keywords and abstracts of 338 articles. 'Fashion' and 'design' had the highest frequencies. Next, LDA, a widely used topic modeling technique, was applied to the collected data. The extracted topics formed several distinct research subdomains strongly tied to the established fashion design field, with the exception of topics such as fashion brand marketing and digital technology. Both growing and declining trends were observed among the research topics. Based on the findings, implications, limitations, and future research questions were presented.

Examining the Intellectual Structure of Records Management & Archival Science in Korea with Text Mining (텍스트 마이닝을 이용한 국내 기록관리학 분야 지적구조 분석)

  • Lee, Jae-Yun; Moon, Ju-Young; Kim, Hee-Jung
    • Journal of the Korean Society for Library and Information Science / v.41 no.1 / pp.345-372 / 2007
  • In this study, the intellectual structure of Records Management & Archival Science in Korea was analyzed using document clustering, a widely used text mining method, and document similarity network analysis. The data were 145 articles on Records Management & Archival Science selected from five major journals in the field of Library & Information Science in Korea, published from 2001 to 2006. The results of the cluster analysis show that the core subject areas are "electronic records management and digital preservation," "records management policy and institution," "records description and catalogues," and "records management domain and education." The results of the document analysis, which is more detailed than the cluster analysis, show that "digital archiving," a specialized subject in digital preservation, plays a central role. The results of the serial analysis, which proceeds along a timeline, show the emergence of "archival services" as a new subject area.
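A document similarity network of the kind mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: documents are represented as raw term-count vectors, similarity is cosine similarity, and an edge is kept when the similarity reaches a chosen threshold. The study's actual weighting and threshold are not given in the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_network(docs: Dict[str, List[str]], threshold: float) -> List[Tuple[str, str, float]]:
    """Return (doc_a, doc_b, similarity) edges whose similarity is at least threshold."""
    vectors = {name: Counter(tokens) for name, tokens in docs.items()}
    edges = []
    for x, y in combinations(sorted(vectors), 2):
        sim = cosine(vectors[x], vectors[y])
        if sim >= threshold:
            edges.append((x, y, sim))
    return edges


if __name__ == "__main__":
    # Hypothetical toy documents, not data from the study.
    toy = {
        "d1": ["records", "archive", "digital"],
        "d2": ["records", "archive", "policy"],
        "d3": ["metadata", "catalogue"],
    }
    print(similarity_network(toy, threshold=0.5))
```

Documents that end up with many strong edges in such a network are candidates for the "central" subjects the abstract refers to.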

Perceived Social Presence in the Text-Based Media: Mobile Communication Case (문자기반 매체에서 느끼는 사회적 현존감: 모바일 커뮤니케이션의 사례)

  • Lee, Hae-Kyung; Lee, Hyejung; Lee, Jungwoo
    • The Journal of the Korea Contents Association / v.13 no.1 / pp.164-174 / 2013
  • Since the advent of the smartphone, a variety of text-based media have been developed and have become popular communication tools in the mobile phone environment. This study explores differences in users' perceived level of social presence across text-based media, specifically KakaoTalk, Facebook, and SMS. A survey was conducted using items adopted from previous studies on social presence, and 203 data points were collected and used for analysis. Across the whole sample, KakaoTalk was perceived as having the highest level of social presence, followed by Facebook and SMS. In addition, users with a higher level of sociality tended to report higher perceived social presence across all the media, and younger and/or student users also tended to perceive higher social presence across all the media. Further studies are needed to investigate which features of a specific medium increase or decrease the perceived level of social presence.

Inferring Disease-related Genes using Title and Body in Biomedical Text (생물학 문헌 데이터의 제목과 본문을 이용한 질병 관련 유전자 추론 방법)

  • Kim, Jeongwoo; Kim, Hyunjin; Yeo, Yunku; Shin, Mincheol; Park, Sanghyun
    • KIISE Transactions on Computing Practices / v.23 no.1 / pp.28-36 / 2017
  • Since the genome projects of the 1990s, a vast number of gene studies have been stored in online databases, and several biological relationships can be inferred from these databases. In this study, we propose a method to infer disease-gene relationships using the title and body of biomedical texts. The title is used to extract hub genes from each article, whereas the body is used to extract sub genes that are related to the hub genes. Through these steps, we construct a local gene network for each article. By integrating the local gene networks, we then construct a global gene network. Subsequent analysis of the global gene network allows disease-related genes to be inferred with high rank. We validated the proposed method by comparing it with previous methods. The results indicate that the proposed method is a meaningful approach to inferring disease-related genes.
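The pipeline in this abstract (hub genes from titles, sub genes from bodies, local networks merged into a global network, then ranking) can be sketched roughly as below. The gene lists are assumed to be extracted already. Linking each hub gene to each sub gene and ranking by weighted degree are assumptions made for illustration, not the authors' scoring method, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def local_network(hub_genes: Iterable[str], sub_genes: Iterable[str]) -> Set[Edge]:
    """Link every hub gene (from a title) to every sub gene (from the body)."""
    subs = list(sub_genes)
    edges = set()
    for h in hub_genes:
        for s in subs:
            if h != s:
                a, b = sorted((h, s))  # store edges as undirected pairs
                edges.add((a, b))
    return edges


def global_network(local_networks: Iterable[Set[Edge]]) -> Dict[Edge, int]:
    """Merge local networks; an edge's weight is the number of articles containing it."""
    weights: Dict[Edge, int] = defaultdict(int)
    for edges in local_networks:
        for edge in edges:
            weights[edge] += 1
    return dict(weights)


def rank_genes(weights: Dict[Edge, int]) -> List[Tuple[str, int]]:
    """Rank genes by weighted degree (an assumed scoring rule), ties broken by name."""
    score: Dict[str, int] = defaultdict(int)
    for (a, b), w in weights.items():
        score[a] += w
        score[b] += w
    return sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical articles; the gene symbols are illustrative only.
    article_1 = local_network(["BRCA1"], ["TP53", "ATM"])
    article_2 = local_network(["TP53"], ["BRCA1", "MDM2"])
    print(rank_genes(global_network([article_1, article_2])))
```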

An Exploratory Study of Platform Government in Korea : Topic Modeling and Network Analysis of Public Agency Reports (한국 플랫폼 정부의 방향성 모색 : 공공기관 연구보고서에 대한 토픽 모델링과 네트워크 분석)

  • Nam, Hyun-Dong; Nam, Taewoo
    • Journal of Digital Convergence / v.18 no.2 / pp.139-149 / 2020
  • New platform governments will draw on intelligent information technology to drive government innovation and sustainable development in a new ecosystem in which the government and the people work together. To help establish the foundation of platform government, this study examines recent research trends and lays the groundwork for future policy directions and research. Text data were collected using text mining, and topic modeling and network analysis were conducted on the collected data. The results show that, for the latent topics, the stronger the connection centrality, the weaker the relationship. Through this study, we hope that varied discussion will take place to improve understanding of the supply and demand approach of Korea's platform government and to implement appropriate change management methods, such as the public service base and service provision, in accordance with the value and potential topics of platform government.

Natural Scene Text Binarization using Tensor Voting and Markov Random Field (텐서보팅과 마르코프 랜덤 필드를 이용한 자연 영상의 텍스트 이진화)

  • Choi, Hyun Su; Lee, Guee Sang
    • Smart Media Journal / v.4 no.4 / pp.18-23 / 2015
  • In this paper, we propose a method for detecting the number of clusters. By using tensor voting, this method can improve the performance of the Gaussian mixture model used in the conventional Markov random field method. The key point of the proposed method is that it extracts the number of cluster centers from the continuity of the saliency map of the input tensor voting tokens. First, we separate the foreground and background region candidates in a given natural image. Next, we extract the appropriate number of clusters for each candidate region by applying tensor voting. Using the detected number of clusters, we can model the Gaussian mixture model accurately. We then obtain the binarized text image of the natural scene by calculating the unary term and the pairwise term of the Markov random field. The experiments confirm that the proposed method returns the optimal number of clusters and improves the text binarization results.

Analyzing Architectural History Terminologies by Text Mining and Association Analysis (텍스트 마이닝과 연관 관계 분석을 이용한 건축역사 용어 분석)

  • Kim, Min-Jeong; Kim, Chul-Joo
    • Journal of Digital Convergence / v.15 no.1 / pp.443-452 / 2017
  • Architectural history traces the changes in architecture through various traditions, regions, overarching stylistic trends, and dates. This study identified terminologies in the field of architectural history in terms of their proximity and frequency, using text mining and association analysis. It explored terminologies by investigating articles published in the "Journal of Architectural History", the sole journal for architectural history studies. First, key terminologies that appeared frequently were extracted from papers that had titles, keywords, and abstracts. Then, we analyzed typical and specific key terminologies that appear frequently or only in part, depending on the research area. Finally, association analysis was used to find frequent patterns among the key terminologies. This research can be used as fundamental data for understanding issues and trends in the field of architectural history.
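The association analysis step in this abstract can be illustrated with a small sketch that finds frequently co-occurring term pairs and reports their support and confidence. It treats each paper's set of key terms as one transaction and looks only at pairs. The study's actual algorithm and thresholds are not stated, so `frequent_pairs` and `min_support` are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def frequent_pairs(transactions: List[List[str]], min_support: float) -> List[Rule]:
    """Return pairwise rules whose pair support is at least min_support."""
    n = len(transactions)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for terms in transactions:
        unique = sorted(set(terms))  # count each term once per paper
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    rules: List[Rule] = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Confidence of a -> b is P(b | a); emit both directions.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical key-term sets, not data from the study.
    papers = [
        ["temple", "pagoda", "wood"],
        ["temple", "pagoda"],
        ["temple", "palace"],
        ["palace", "wood"],
    ]
    for rule in frequent_pairs(papers, min_support=0.5):
        print(rule)
```

Restricting the search to pairs keeps the example short; a full analysis would typically extend to larger term sets with an algorithm such as Apriori.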