• Title/Summary/Keyword: Text data


WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society / v.19 no.1 / pp.51-58 / 2018
  • As the number of SNS users and the volume of SNS data have increased explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic modeling technique, is used to identify the similarity of texts in large volumes of unclassified SNS text data and to extract trends from them. However, LDA has difficulty deducing high-level topics from short sentences because infrequent word occurrences make the data semantically sparse. The biterm topic model (BTM) addressed this limitation of LDA by modeling combinations of two words. However, BTM also has a limitation: because it is influenced more by the higher-frequency word in each word pair, it cannot compute weights that reflect each word's relation to a topic. In this paper, we propose a technique that improves the accuracy of the existing BTM by reflecting the semantic relations between words.
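The "combination of two words" that BTM relies on can be illustrated with a minimal sketch. It assumes that a biterm is any unordered pair of words co-occurring in the same short text; the function names and sample documents are hypothetical and are not taken from the paper, and the sketch covers only biterm extraction, not topic inference or the proposed weighting.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sorting each pair makes ("phone", "launch") and ("launch", "phone") identical.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Count biterms over a corpus of tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical tokenized SNS posts, for illustration only.
docs = [
    ["apple", "phone", "launch"],
    ["phone", "launch", "event"],
]
biterm_counts = count_biterms(docs)
```

Because biterms are pooled over the whole corpus, a pair that recurs across posts accumulates evidence even when each individual post is very short.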

A Study on Information Linkage Service for Disaster Situation Management : Focusing on Earthquake (재난 상황관리를 위한 재난안전정보 연계 서비스 방안 연구 : 지진을 중심으로)

  • Yu, Eun-Ji;Shim, Hyoung Seop
    • Journal of Digital Contents Society / v.19 no.1 / pp.67-73 / 2018
  • Researchers have become increasingly interested in effectively managing disasters that occur on a large scale and in complex forms. There are two types of disaster information: unstructured text data and structured data. Unstructured text data usually refers to text documents consulted by disaster management personnel, such as disaster manuals and related regulations, while structured data refers to the various disaster information built into the systems of disaster-related organizations. This paper proposes a methodology for constructing a disaster information sharing system that enables joint use by disaster-related organizations, by establishing a mutual linkage system that utilizes both unstructured and structured disaster information. In particular, based on the linkage between structured earthquake information in earthquake-related systems and earthquake manuals and countermeasures against earthquake disasters, we propose a service that provides the information needed for earthquake management. It is expected that task managers will be able to manage earthquake situations effectively by acquiring the integrated structured and unstructured earthquake information of the ministries and related organizations.

Topic modeling for automatic classification of learner question and answer in teaching-learning support system (교수-학습지원시스템에서 학습자 질의응답 자동분류를 위한 토픽 모델링)

  • Kim, Kyungrog;Song, Hye jin;Moon, Nammee
    • Journal of Digital Contents Society / v.18 no.2 / pp.339-346 / 2017
  • There is increasing interest in text analysis based on unstructured data such as articles, comments, and questions and answers. This is because unstructured text data, which reflects people's opinions, can be used to identify, evaluate, predict, and recommend features. The same holds true for TEL, where the MOOC service has evolved to automate debating, questioning, and answering services based on the teaching-learning support system, in order to generate question topics and to automatically classify the topics relevant to new questions based on the question and answer data accumulated in the system. Therefore, in this study, we propose topic modeling using LDA to automatically classify the topics of new queries. The proposed method enables the generation of a dictionary of question topics and the automatic classification of topics relevant to new questions. Experiments showed high automatic classification results of over 0.7 for some queries. The more new queries were included in the various topics, the better the automatic classification results.

A Study on the Music Therapy Management Model Based on Text Mining (텍스트 마이닝 기반의 음악치료 관리 모델에 관한 연구)

  • Park, Seong-Hyun;Kim, Jae-Woong;Kim, Dong-Hyun;Cho, Han-Jin
    • Journal of the Korea Convergence Society / v.10 no.8 / pp.15-20 / 2019
  • Music therapy has shown many benefits in treating children with disabilities and in psychological treatment. At present, however, no specific treatment system for music therapy has been established. For a music therapist to provide accurate treatment, various music therapy cases and treatment history data must be analyzed. Although the client or patient should receive the most appropriate treatment, in practice several factors make this difficult. In this paper, we propose a music therapy knowledge management model that combines existing therapy data with text mining technology. Using the proposed model, similar cases can be searched, and accurate and effective treatment can be provided based on specific and reliable data related to the patient or client. This is expected to bring out the original purpose and effect of music therapy to the fullest and to be useful for treating more patients.

Regional Image Change Analysis using Text Mining and Network Analysis (텍스트 마이닝과 네트워크 분석을 이용한 지역 이미지 변화 분석)

  • Jeong, Eun-Hee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.15 no.2 / pp.79-88 / 2022
  • Social media big data contains a great deal of information that can reveal not only consumers' consumption patterns but also the images of local regions. This paper collected data containing the keyword 'Samcheok' for each year from 2015 to 2019 from the blogs and cafes of Naver and Daum, two domestic portal sites, and analyzed changes in the regional image by performing text mining and network analysis after refining the keywords that form the regional image. According to the results, the regional image in 2015 was expressed through image cognitive elements such as nearby place names and places, for example 'Jangho Port', 'Donghae', and 'Beach'. However, in both 2016 and 2019 the regional image shifted to image cognitive elements centered on 'SamcheokSolbich', a special place within the region. Because the keywords related to the local image include 'Jangho Port' and Resort, which are representative attractions of Samcheok, infrastructure appears to play a large role in forming the local image. The significance of the network data was tested using the bootstrap technique; the p-values for 2015, 2016, and 2019 were 0.0002, 0.0006, and 0.0002, respectively, all statistically significant at the 5% significance level.

A Big Data Analysis of Public Interest in Defense Reform 2.0 and Suggestions for Policy Completion

  • Kim, Tae Kyoung;Kang, Wonseok
    • Journal of East Asia Management / v.4 no.1 / pp.1-22 / 2023
  • This study used text mining and semantic network analysis on big data to explore public perception of Defense Reform 2.0. The top 70 keywords in the collected data were analyzed, as an appropriate range for network visualization. Through word frequency analysis, connection centrality analysis, and N-gram analysis, we identified issues that received much attention, such as troop reduction, shortening of the military service period, dismantling of border area units, and the return of wartime operational control. In particular, clustering words through CONCOR analysis showed great interest in pursuing the technical group, concerns about reduced military capacity, and the reorganization of the manpower structure. The results of the text mining analysis are as follows. First, although troop reduction under Defense Reform 2.0 received much attention, there was a lack of awareness of the measures to reinforce the reduced forces. Second, active communication with local communities is needed, because the dismantling and relocation of border area units can lead to regional population decline and the collapse of local commercial districts. Third, it was judged necessary to introduce active policies and show substantial results in promoting barracks culture and the defense industry, which drew less public interest than military structure and defense operations. Through this study, we analyzed the public's interest in Defense Reform 2.0, a representative defense policy, and suggested a plan to draw support for national policy.
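As a rough illustration of the N-gram frequency step mentioned in this abstract, the sketch below counts contiguous word sequences over tokenized documents and keeps the most frequent ones. The function names and sample tokens are hypothetical; the default cutoff of 70 only mirrors the "top 70 keywords" range and is not the study's actual pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word sequences in one tokenized document."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n=2, k=70):
    """Return the k most frequent n-grams across all documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical tokenized posts, for illustration only.
docs = [
    ["troop", "reduction", "plan"],
    ["troop", "reduction", "concern"],
]
top = top_ngrams(docs, n=2, k=1)
```

The resulting counts could then serve as edge weights between adjacent words when building a semantic network.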

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Usually, text data consists of many variables, and some of them are closely correlated. Such multi-collinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. On the other hand, for unsupervised learning, since target variables are absent, one cannot use such a feature selection procedure as in supervised learning. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms which show high relevance for each topic. Applying the procedure to real data, we found that the proposed word selection procedure can give clear topic interpretation by removing high-frequency words prevalent in various topics. In addition, we observed that, by applying the selected variables to the classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gives results comparable to those obtained by using class label information.
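The idea of using topics in place of target variables can be sketched as follows. The abstract does not state which relevance measure the authors use, so this sketch assumes a simple lift score: a word's probability within a topic divided by its average probability across topics, with equal topic weights. The function name, vocabulary, and probabilities are all hypothetical.

```python
def select_terms(topic_word, vocab, top_n=2):
    """Pick the top_n words per topic by lift (an assumed relevance score)."""
    n_topics = len(topic_word)
    n_words = len(vocab)
    # Average probability of each word across topics (equal topic weights assumed).
    mean_prob = [sum(row[j] for row in topic_word) / n_topics for j in range(n_words)]
    selected = {}
    for k, row in enumerate(topic_word):
        # Lift is high when a word is much more probable in this topic than on average.
        lift = [row[j] / mean_prob[j] if mean_prob[j] > 0 else 0.0 for j in range(n_words)]
        ranked = sorted(range(n_words), key=lambda j: lift[j], reverse=True)
        selected[k] = [vocab[j] for j in ranked[:top_n]]
    return selected


# Hypothetical topic-word probabilities, e.g. taken from a fitted topic model.
vocab = ["the", "goal", "match", "vote", "party"]
topic_word = [
    [0.40, 0.30, 0.25, 0.03, 0.02],  # topic 0
    [0.40, 0.02, 0.03, 0.30, 0.25],  # topic 1
]
selected = select_terms(topic_word, vocab)
```

In this toy example the word "the" is equally probable in both topics, so its lift is 1 and it is not selected, which matches the abstract's point about removing high-frequency words that are prevalent across topics.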

Trend Analysis of FinTech and Digital Financial Services using Text Mining (텍스트마이닝을 활용한 핀테크 및 디지털 금융 서비스 트렌드 분석)

  • Kim, Do-Hee;Kim, Min-Jeong
    • Journal of Digital Convergence / v.20 no.3 / pp.131-143 / 2022
  • Focusing on FinTech keywords, this study analyzes newspaper articles and Twitter data using text mining in order to understand trends in the domestic digital financial services industry. Frequency analysis was performed around four important points in the FinTech growth lifecycle: Mobile Payment Service, Internet Primary Bank, Data 3 Act, and MyData Businesses. Frequency analysis combining the keywords 'China', 'USA', and 'Future' with 'FinTech' was used to assess the current and future position of the FinTech industry. Next, sentiment analysis was conducted on Twitter data to quantify consumers' expectations and concerns about FinTech services. By combining frequency analysis and sentiment analysis, this study offers a meaningful perspective in that it presents strategic directions that the government and companies can use to understand the future FinTech market.

Trends in FTA Research of Domestic and International Journal using Paper Abstract Data (초록데이터를 활용한 국내외 FTA 연구동향: 2000-2020)

  • Hee-Young Yoon;Il-Youp Kwak
    • Korea Trade Review / v.45 no.5 / pp.37-53 / 2020
  • This study aims to draw implications for research development by comparing domestic and international studies on the subject of FTAs. To this end, papers written between 2000 and July 23, 2020 whose titles matched a search for FTA (Free Trade Agreement) were selected as research data. For domestic research, 1,944 papers were selected from the Korean Citation Index (KCI), and for international research, 970 papers were selected from the Web of Science and SCOPUS; research trends were analyzed through keywords and abstracts. Frequency analysis and word embedding (Word2vec) were used to analyze the data, and the results were visualized using t-SNE and Scattertext. The results of the analysis are as follows. First, 16 of the top 30 keywords were the same in domestic and international research. In domestic research, many studies analyzed the outcomes or expected effects for countries that have concluded or discussed FTAs with Korea, whereas international research covers a diverse range of subjects. Second, in the word embedding analysis, t-SNE was used to visually represent the research connections among the top 60 keywords. Finally, Scattertext was used to visually indicate which keywords were frequently used in studies from 2000 to 2010 and from 2011 to 2020. This study is the first to draw implications for academic development through abstract and keyword analysis by applying various text mining approaches to FTA-related research papers. Further in-depth research is needed, including collecting a variety of FTA-related text data and comparing and analyzing FTA studies in different countries.

Trends in Deep Learning-based Medical Optical Character Recognition (딥러닝 기반의 의료 OCR 기술 동향)

  • Sungyeon Yoon;Arin Choi;Chaewon Kim;Sumin Oh;Seoyoung Sohn;Jiyeon Kim;Hyunhee Lee;Myeongeun Han;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.10 no.2 / pp.453-458 / 2024
  • Optical Character Recognition (OCR) is a technology that recognizes text in images and converts it into digital format. Deep learning-based OCR is being used in many industries with large quantities of recorded data due to its high recognition performance. To improve medical services, deep learning-based OCR has been actively introduced in the medical industry. In this paper, we discuss trends in OCR engines and medical OCR and provide a roadmap for the development of medical OCR. By applying natural language processing to detected text data, current medical OCR has improved its recognition performance. However, recognition performance remains limited, especially for non-standard handwriting and modified text. To develop advanced medical OCR, building databases of medical data, image pre-processing, and natural language processing are necessary.