• Title/Summary/Keyword: Text data


Improving the Performance of Korean Text Chunking by Machine learning Approaches based on Feature Set Selection (자질집합선택 기반의 기계학습을 통한 한국어 기본구 인식의 성능향상)

  • Hwang, Young-Sook;Chung, Hoo-jung;Park, So-Young;Kwak, Young-Jae;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications / v.29 no.9 / pp.654-668 / 2002
  • In this paper, we present an empirical study on improving Korean text chunking through machine learning and feature set selection. We focus on two issues: selecting a feature set for Korean chunking, and alleviating data sparseness. To select a proper feature set, we use a heuristic method that searches the space of feature sets, using the estimated performance of a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. To smooth data sparseness, we suggest using a general part-of-speech tag set together with selective lexical information, taking the characteristics of the Korean language into account. Experimental results showed that chunk tags and lexical information within a given context window are important features, whereas spacing unit information is less important than the others; these findings are independent of the machine learning technique. Furthermore, using selective lexical information not only has a smoothing effect but also reduces the feature space compared with using all lexical information. Korean text chunking based on memory-based learning and decision tree learning with the selected feature space achieved precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
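
The feature set search described in this abstract can be illustrated with greedy forward selection, one common heuristic of this kind. The abstract does not say which search strategy the authors used, so this is only a sketch under that assumption. The feature names, the gain values, and `toy_evaluate` (a stand-in for a learner's estimated performance) are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Tuple


def greedy_forward_selection(
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[FrozenSet[str], float]:
    """Add one feature at a time, keeping the addition that most improves the score."""
    selected: FrozenSet[str] = frozenset()
    best_score = evaluate(selected)
    remaining = set(candidates)
    while remaining:
        # Score every one-feature extension of the current set.
        scored = [(evaluate(selected | {f}), f) for f in sorted(remaining)]
        top_score, top_feature = max(scored)
        if top_score <= best_score:
            break  # no remaining candidate is incrementally useful
        selected = selected | {top_feature}
        best_score = top_score
        remaining.remove(top_feature)
    return selected, best_score


# Hypothetical per-feature gains; illustrative numbers, not results from the paper.
TOY_GAIN = {"chunk_tag": 0.30, "lexical": 0.20, "pos_tag": 0.10, "spacing_unit": -0.02}


def toy_evaluate(features: FrozenSet[str]) -> float:
    """Stand-in for the held-out performance of a learner trained on `features`."""
    return 0.5 + sum(TOY_GAIN[f] for f in features)


chosen, score = greedy_forward_selection(TOY_GAIN, toy_evaluate)
print(sorted(chosen), round(score, 2))
```

In practice `evaluate` would train and validate the actual learner (for example a memory-based or decision tree model) on each candidate feature set, which makes the search expensive but ties selection directly to the performance being optimized.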

Analysis of News Regarding New Southeastern Airport Using Text Mining Techniques (텍스트 마이닝 기법을 활용한 동남권 신공항 신문기사 분석)

  • Han, Mu Moung Cho;Kim, Yang Sok;Lee, Choong Kwon
    • Smart Media Journal / v.6 no.1 / pp.47-53 / 2017
  • Social issues are important factors that decide government policy, and newspapers are critical channels that reflect them. Analyzing news articles can contribute to understanding social issues, but it is very difficult to analyze large volumes of unstructured news data manually. Therefore, this study aims to analyze the different views among stakeholders of a specific social issue by using text analysis, word cloud analysis, and associative analysis methods, which systematically transform unstructured news data into structured data. We analyzed a total of 115 news articles and 6,772 comments, collected from the selected newspapers (Chosun-Il-bo, Joongang-Il-bo, Donga-Il-bo, Maeil Newspaper, Busan-Il-bo) over two weeks. We found significant differences in tone between newspapers. While nationwide daily newspapers focus on political relations with local areas, local daily newspapers tend to write articles that represent local governments' interests.

A Study on Domestic Research Trends (2001-2020) of Forest Ecology Using Text Mining (텍스트마이닝을 활용한 국내 산림생태 분야 연구동향(2001-2020) 분석)

  • Lee, Jinkyu;Lee, Chang-Bae
    • Journal of Korean Society of Forest Science / v.110 no.3 / pp.308-321 / 2021
  • The purpose of this study was to analyze domestic research trends over the past 20 years and the future direction of forest ecology using text mining. A total of 1,015 academic papers and keyword data related to forest ecology were collected by the "Research and Information Service Section" and analyzed using big data analysis programs such as Textom and UCINET. From the results of word frequency and N-gram analyses, we found that domestic studies on forest ecology have increased rapidly since 2011. The most common research topic over the past 20 years was "species diversity," and "climate change" has become a major topic since 2011. Based on CONCOR analysis, study subjects were grouped into eight categories: "species diversity," "environmental policy," "climate change," "management," "plant taxonomy," "habitat suitability index," "vascular plants," and "recreation and welfare." Consequently, species diversity and climate change will remain important topics in the future, and domestic research topics need to be diversified and expanded in line with global research trends.
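
As a rough illustration of the word frequency and N-gram analyses mentioned in this abstract, the sketch below counts unigrams and bigrams with the Python standard library. It does not reproduce Textom or UCINET, and the keyword lists are invented toy data, not the study's corpus.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token windows of `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical keyword sequences, one list per paper.
keywords_per_paper = [
    ["species", "diversity", "forest"],
    ["climate", "change", "species", "diversity"],
    ["climate", "change", "forest"],
]

# Word frequency: how often each keyword occurs across all papers.
word_freq = Counter(w for kws in keywords_per_paper for w in kws)

# Bigram frequency: how often each adjacent keyword pair occurs.
bigram_freq = Counter(bg for kws in keywords_per_paper for bg in ngrams(kws, 2))

print(word_freq.most_common(3))
print(bigram_freq.most_common(2))
```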

Sentiment Analysis of Foot-and-Mouth Disease Using Tweet Text-Mining Technique (트윗 텍스트 마이닝 기법을 이용한 구제역의 감성분석)

  • Chae, Heechan;Lee, Jonguk;Choi, Yoona;Park, Daihee;Chung, Yongwha
    • KIPS Transactions on Software and Data Engineering / v.7 no.11 / pp.419-426 / 2018
  • Every year, foot-and-mouth disease (FMD) causes enormous damage to domestic animal husbandry and related industries. Although various academic studies related to FMD are ongoing, engineering studies on the social effects of FMD are very limited. In this study, we propose a systematic methodology for analyzing the emotional responses of ordinary citizens to FMD using text mining techniques. The proposed system first collects FMD-related data from tweets posted on Twitter and performs polarity classification using a deep-learning technique. Second, keywords are extracted from the tweets using LDA, a typical topic modeling technique, and a keyword network is constructed from the extracted keywords. Finally, we analyze the various social effects of FMD on ordinary citizens through the keyword network. As a case study, we performed a sentiment analysis experiment on ordinary citizens' responses to FMD in Korea from July 2010 to December 2011.
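
The keyword network step can be sketched as a co-occurrence graph: two keywords are linked whenever they appear in the same tweet, and the edge weight counts how often that happens. The abstract does not say how the authors built their network, so this is one plausible construction under that assumption. The keyword sets are invented, and the LDA extraction step is not shown.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def cooccurrence_edges(keyword_sets: Iterable[List[str]]) -> Counter:
    """Count, for each unordered keyword pair, the number of sets containing both."""
    edges: Counter = Counter()
    for kws in keyword_sets:
        # Sort the de-duplicated keywords so each pair has one canonical order.
        for a, b in combinations(sorted(set(kws)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical keywords extracted from three tweets.
tweet_keywords = [
    ["vaccine", "outbreak", "cattle"],
    ["vaccine", "outbreak"],
    ["cattle", "price"],
]

edges = cooccurrence_edges(tweet_keywords)
strongest: List[Tuple[Tuple[str, str], int]] = edges.most_common(1)
print(strongest)
```

The resulting weighted edge list can be passed to any graph library for centrality or community analysis.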

A Study on the Trend Analysis Based on Personal Information Threats Using Text Mining (텍스트 마이닝을 활용한 개인정보 위협기반의 트렌드 분석 연구)

  • Kim, Young-Hee;Lee, Taek-Hyun;Kim, Jong-Myoung;Park, Won-Hyung;Koo, Kwang-Ho
    • Convergence Security Journal / v.19 no.2 / pp.29-38 / 2019
  • Trend research has been actively conducted to identify and analyze the key topics in large amounts of data and information. In the personal information protection field as well, efforts to identify prospects and trends in advance are increasing, so that organizations can respond preemptively. However, most existing research is technology-based, covering topics such as trends in the information security field and personal information protection solutions. In this study, threat-based trends in the personal information protection field are analyzed using text mining. This will be key to deducing undiscovered issues and providing visibility into current and future trends. The results can support policy formulation at companies that handle personal information, and are therefore expected to help set the direction of strategies for effective response.

Millennial parents' perception of babywearing products: A text analysis approach (밀레니얼 세대의 Babywearing 제품에 대한 인식: 텍스트 분석 접근)

  • Lee, Wan-Gee;Park, Myung-Ja;Lee, Kyu-Hye
    • Journal of the Korea Fashion and Costume Design Association / v.23 no.2 / pp.17-28 / 2021
  • The baby-tech industry, which combines IT with existing parenting products, is attracting increasing attention. Consequently, various types of baby products incorporating functionality and design are being launched. In recent years, particularly as the market segment for babywearing products grows, there is demand for parenting products that account for both the child's comfort and the parents' convenience. Therefore, this study examines the characteristics and consumer perception of babywearing products, which are important for the emotional stability, development, and rearing of children. The study collects unstructured text data and applies text mining and network analysis. An examination of the network, based on the frequency of keywords for each babywearing product and the degree of connection measured by the centrality index, revealed that consumers value convenience and price when purchasing products. The consumer perceptions and consideration factors specific to each product were also identified. In addition, studying body parts with high TF-IDF values revealed that the body parts consumers consider differ by product. Lastly, visualization of the keywords that appeared was used to examine commonly appearing keywords and those that appeared for individual products. Social media (SNS) data confirmed product characteristics as well as a new parenting culture of sharing child-rearing routines. This study suggests planning and marketing directions for the development of babywearing products that meet consumer needs.

Analysis of Symptoms-Herbs Relationships in Shanghanlun Using Text Mining Approach (텍스트마이닝 기법을 이용한 『상한론』 내의 증상-본초 조합의 탐색적 분석)

  • Jang, Dongyeop;Ha, Yoonsu;Lee, Choong-Yeol;Kim, Chang-Eop
    • Journal of Physiology & Pathology in Korean Medicine / v.34 no.4 / pp.159-169 / 2020
  • Shanghanlun (Treatise on Cold Damage Diseases) is the oldest document in the literature on clinical records of Traditional Asian medicine (TAM), on which TAM theories about symptoms-herbs relationships are based. In this study, we aim to quantitatively explore the relationships between symptoms and herbs in Shanghanlun. The text of Shanghanlun was converted into structured data. Using the structured data, Term Frequency - Inverse Document Frequency (TF-IDF) scores of symptoms and herbs were calculated for each chapter to derive the major symptoms and herbs in each chapter. To understand the structure of the entire document, principal component analysis (PCA) was performed on the 6-dimensional chapter space. Bipartite network analysis was conducted, focusing on Jaccard scores between symptoms and herbs and on the eigenvector centralities of nodes. TF-IDF scores showed the characteristics of each chapter through its major symptoms and herbs. Principal components drawn by PCA suggested the overall structure of Shanghanlun. The network analysis revealed a 'multi herbs - multi symptoms' relationship. Common symptoms and herbs were identified by the high eigenvector centralities of their nodes, while specific symptoms and herbs were identified by low centralities. The symptoms each herb is expected to treat were also derived. Using measurable metrics, we conducted a computational study on patterns in Shanghanlun. Quantitative research on TAM theories will contribute to improving the clarity of those theories.
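
The two scores named in this abstract, TF-IDF and the Jaccard score, can be sketched in a few lines. TF-IDF has several variants and the abstract does not say which one was used, so this sketch assumes relative term frequency multiplied by the natural log of inverse document frequency. The chapter contents and clause indices are invented toy data.

```python
import math
from typing import List, Set


def tf_idf(term: str, doc: List[str], docs: List[List[str]]) -> float:
    """Relative frequency of `term` in `doc`, weighted by log inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)  # number of documents containing the term
    return tf * math.log(len(docs) / df) if df else 0.0


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical chapters, each a list of symptom terms.
chapters = [
    ["fever", "chills", "fever"],
    ["thirst", "fever"],
    ["thirst", "vomiting"],
]
chills_score = tf_idf("chills", chapters[0], chapters)

# Hypothetical clause indices in which a symptom and a herb are mentioned.
clauses_with_symptom = {1, 2, 3}
clauses_with_herb = {2, 3, 4}
overlap = jaccard(clauses_with_symptom, clauses_with_herb)
print(round(chills_score, 3), overlap)
```

A term that appears in every chapter gets an IDF of zero under this variant, which is why high-scoring terms characterize individual chapters rather than the whole text.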

A Study on the Construction of a Car Camping Map and Recommendation of Car Camping based on SNS Text Mining Analysis for the Post-Corona Era (SNS 텍스트 마이닝 기반 포스트 코로나 신트렌드 차박 여행 지도 제작 및 차박지 추천에 관한 연구)

  • Kim, Minjeong;Kim, Soohyun;Oh, Jihye;Eom, Jiyoon;Kang, Juyoung
    • Journal of Information Technology Services / v.20 no.5 / pp.11-28 / 2021
  • As contact-free ("untact") travel has become a new trend in leisure culture due to the spread of COVID-19, the car camping market is growing rapidly. Sales of car camping-related goods increased by up to 600 percent, and SUV sales in Korea increased about fourfold. Despite this growth, there is a lack of research on the actual state of the car camping market and on the user's perspective. Therefore, in this study, we surveyed actual camping users to derive the factors they consider important in camping, and used these to produce a car camping map. Two types of maps were produced: a map of car camping sites in Gangwon-do and the convenience facilities closest to each site, and a hashtag-themed map based on keywords for each car camping site. We gathered data from portal sites and social media to obtain information on camping sites and analyzed it using text mining. In addition, we extracted keywords using network analysis techniques and selected key themes that represent them. This allows users to choose a car camping site by selecting keywords that suit their tastes. We hope that this research will serve car camping researchers as a prior study and provide a foundation for a clean camping culture through clean camping campaigns. We also hope that it will help car camping users enjoy quality trips.

Topic Analysis of the "Right to be Forgotten" Using Text Mining (텍스트마이닝을 활용한 "잊힐 권리"의 토픽 분석)

  • Lee, So-Hyun;Koo, Bon-Jin
    • Journal of the Korean Society for information Management / v.39 no.2 / pp.275-298 / 2022
  • This study used text mining to examine the issues and characteristics that appeared in news and journal articles related to the 'right to be forgotten'. Data for analysis were collected from 2010 to 2020 with the keyword 'right to be forgotten', and keyword analysis and topic modeling were performed on the collected data. The results show that, over the last 10 years, the issues surrounding the 'right to be forgotten' did not differ much between news and journal articles, and the approaches were also similar. However, the comparison confirmed both common issues and partial differences between news and journal articles. Therefore, Archives and Records Management Studies needs to discuss the issues derived in this study. In particular, common issues should be considered first, and where issues differ, they need to be discussed from various perspectives. This study is meaningful in that it clarifies the meaning of the 'right to be forgotten' and identifies issues that may arise in the future. The results of this study will contribute to wider discussion of the 'right to be forgotten' in Archives and Records Management Studies.

A Text Content Classification Using LSTM For Objective Category Classification

  • Noh, Young-Dan;Cho, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information / v.26 no.5 / pp.39-46 / 2021
  • AI is deeply embedded in algorithms that assist us, not only in everyday technologies such as translators and Face ID but also in countless industrial fields. In this research, we aim to provide convenience through AI-based categorization: rather than having users check all of the immense amount of content on the internet, objective classification extracts only the data they need. We propose a model using LSTM (Long Short-Term Memory network), which stands out in text classification, and compare its performance with models based on RNN (Recurrent Neural Network) and BiLSTM (Bidirectional LSTM), which is a structure suited to natural language processing. The performance of the three models is compared in terms of accuracy, precision, and recall. The LSTM model shows the best performance, so this research recommends LSTM for text classification.
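
The three evaluation measures named in this abstract can be computed from predicted and true labels as follows. This sketch assumes the binary case with 1 as the positive class. The abstract does not describe the label set or how scores were averaged, and the label vectors are invented.

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical gold labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

For a multi-class task, the same counts are typically computed per class and then averaged, and the choice of macro or micro averaging changes the reported numbers.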