• Title/Summary/Keyword: Text data


A Study of Data Mining Application in Information Management Field (정보관리분야의 데이터 마이닝 기법 적용에 대한 연구)

  • Choi, Hee-Yoon
    • Journal of Information Management / v.31 no.3 / pp.1-20 / 2000
  • A variety of efforts are being made to select necessary and valuable information from the rapidly increasing volume of data, and data mining is one method of interest. This methodology is increasingly applied to the information management field, which involves efficiently processing and systematizing a growing number of digital documents for user services. This article analyzes the theoretical background and empirical case studies of data mining and assesses the possibility of its application to the information management area.


A study on changes in domestic tourism trends using social big data analysis - Comparison before and after COVID19 -

  • Yoo, Kyoung-mi;Choi, Youn-hee
    • International Journal of Internet, Broadcasting and Communication / v.14 no.2 / pp.98-108 / 2022
  • In this study, social network analysis was performed to compare changes in domestic tourism trends before and after the outbreak of COVID-19, at a time when damage to the tourism industry from COVID-19 was mounting. Using Textom, a big data analysis service, data were collected for the keywords "travel destination" and "travel trend" over 2019 and 2020, the period when the epidemic spread worldwide. After a total of 80 keywords were extracted through text mining, centrality was analyzed using NetDraw in Ucinet6, and the keywords were clustered into four groups through CONCOR analysis. Through this, we compared changes in domestic tourism trends before and after the outbreak of COVID-19; the results are expected to provide basic data for tourism marketing strategies and tourism product development in the post-COVID-19 era.
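The centrality step described above can be sketched in a few lines. NetDraw/Ucinet6 are GUI tools, so this is only a minimal stand-in showing the underlying degree-centrality computation; the keywords and co-occurrence counts below are hypothetical, not the study's data.

```python
import numpy as np

# Hypothetical keywords and a symmetric co-occurrence matrix
# (counts of how often two keywords appear in the same post).
keywords = ["travel", "camping", "untact", "hotel", "overseas"]
cooc = np.array([
    [0, 4, 3, 2, 1],
    [4, 0, 5, 1, 0],
    [3, 5, 0, 0, 0],
    [2, 1, 0, 0, 2],
    [1, 0, 0, 2, 0],
])

# Degree centrality: each keyword's total tie strength,
# normalized by the largest value so scores fall in [0, 1].
degree = cooc.sum(axis=1)
centrality = degree / degree.max()

for kw, c in sorted(zip(keywords, centrality), key=lambda x: -x[1]):
    print(f"{kw}: {c:.2f}")
```

CONCOR then groups keywords whose tie profiles (rows of the matrix) are highly correlated, which the study uses to form its four clusters.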

The study of the restaurant start-up chatbot system using big data

  • Sung-woo Park;Gi-Hwan Ryu
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.52-57 / 2023
  • In the restaurant industry, the Fourth Industrial Revolution and IT development have fueled a food-technology boom. In addition, the number of prospective restaurant founders is increasing because restaurant start-ups have relatively low entry barriers, and ChatGPT has sparked a surge of interest in chatbots. The purpose of this paper is therefore to analyze restaurant start-up factors with big data and to implement a system that makes it easier to recommend suitable restaurant start-ups to prospective founders and, further, to increase their success rate. This paper is thus meaningful in that it analyzes the start-up factors desired by prospective restaurant founders with big data, converts them into text, and designs the factors revealed by the big data into a restaurant start-up chatbot system.

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.47-73 / 2020
  • A KTX rolling stock is a system consisting of several machines, electrical devices, and components, and its maintenance requires considerable expertise and experience. In the event of a failure, the knowledge and experience of the maintainer determine how quickly and how well the problem is solved, so the resulting availability of the vehicle varies. Although problem solving is generally based on fault manuals, experienced and skilled professionals can diagnose and act quickly by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into a data-driven resource. Nonetheless, research on the KTX rolling stock, the type most commonly used on main lines, and on systems that extract the meaning of text and search for similar cases is still lacking. Therefore, this study proposes an intelligent support system that provides an action guide for newly occurring failures by using the know-how of rolling stock maintenance experts as problem-solving examples. For this purpose, a case base was constructed by collecting rolling stock failure data generated from 2015 to 2017, and an integrated dictionary covering essential terminology and failure codes was built separately to reflect the specialized vocabulary of the railway rolling stock sector. Given a new failure, the most similar past cases were retrieved from the deployed case base, and the actions taken in the top three cases were proposed as a diagnostic guide.
In this study, various dimensionality reduction techniques were applied to calculate similarity while taking into account the semantic relationships among failure descriptions, in order to overcome the limitations of keyword-matching case retrieval in earlier case-based rolling stock expert systems, and their usefulness was verified through experiments. Three algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, were applied to extract the characteristics of each failure, and similar cases were retrieved by measuring the cosine distance between the resulting vectors. Precision, recall, and F-measure were used to assess the quality of the proposed actions. To compare the dimensionality reduction techniques, analysis of variance confirmed that the performance differences among five algorithms, including a baseline that randomly retrieves failure cases with identical failure codes and a baseline that applies cosine similarity directly to word vectors, were statistically significant. In addition, optimal settings for practical application were derived by examining how performance varied with the number of reduced dimensions. The analysis showed that word-based cosine similarity outperformed NMF and LSA, while the Doc2Vec-based algorithm performed best. Furthermore, performance improved as the number of dimensions increased to an appropriate level.
Through this study, we confirmed the usefulness of effective methods for extracting data characteristics and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are recorded as text. Text mining is being studied for use in many areas, but such studies remain scarce in environments like ours, with numerous specialized terms and limited access to data. In this regard, it is significant that this study is the first to present an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques that extract failure characteristics, complementing keyword-based case search. It is expected to provide implications as a basic study for developing diagnostic systems that can be used immediately in the field.
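The retrieval core of the study, ranking past cases by cosine similarity and returning the top three, can be sketched as follows. This is only the word-based baseline the paper compares against (NMF, LSA, and Doc2Vec would replace the raw term counts with dense vectors); the fault descriptions are hypothetical English stand-ins for the Korean maintenance logs.

```python
import numpy as np

# Hypothetical past failure cases and a new failure query.
cases = [
    "brake pressure sensor fault during departure",
    "door open sensor fault at platform",
    "brake pressure drop in main line operation",
    "air conditioning compressor failure in summer",
]
query = "brake pressure sensor error on main line"

vocab = sorted({w for text in cases + [query] for w in text.split()})

def vectorize(text):
    # Simple term-count vector over the shared vocabulary.
    counts = np.zeros(len(vocab))
    for w in text.split():
        counts[vocab.index(w)] += 1
    return counts

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

q = vectorize(query)
sims = [cosine(q, vectorize(c)) for c in cases]
top3 = np.argsort(sims)[::-1][:3]   # indices of the 3 most similar cases
for i in top3:
    print(f"{sims[i]:.2f}  {cases[i]}")
```

In the actual system, the actions recorded for these top-three cases would be shown to the maintainer as the diagnostic guide.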

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information / v.21 no.8 / pp.77-84 / 2016
  • General statistical analysis requires a normality assumption; if this assumption is not satisfied, we cannot expect good results from statistical data analysis. Most statistical methods for handling outliers and noise also rely on this assumption, but it is not satisfied in big data because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis, one that does not depend on the normality assumption. We select patent documents as the target domain because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis: patent data collected from patent databases around the world are preprocessed and analyzed by text mining and statistics. However, most research on patent big data analysis has not considered the outlier and noise problem, which decreases prediction accuracy and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data using box plots and smoothing visualization, and we use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can detect noise in the retrieved patent big data.
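The paper's two distribution-free steps can be sketched as below: box-plot (IQR) fences to flag outliers and a moving average to smooth noise, neither of which assumes normality. The series is illustrative, not actual patent keyword counts.

```python
import numpy as np

# Illustrative series with one obvious spike at index 6.
x = np.array([3., 4., 5., 4., 6., 5., 40., 5., 4., 6., 5., 4.])

# Box-plot rule: points beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR are outliers.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = (x < lower) | (x > upper)

def moving_average(series, window=3):
    # Simple smoothing: average over a sliding window.
    return np.convolve(series, np.ones(window) / window, mode="valid")

smoothed = moving_average(x[~outliers])
print("outliers at indices:", np.where(outliers)[0])
print("smoothed:", np.round(smoothed, 2))
```

Because the fences come from quartiles rather than a fitted distribution, the same rule works on the skewed, heterogeneous counts typical of patent text data.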

The Big Data Analytics Regarding the Cadastral Resurvey News Articles

  • Joo, Yong-Jin;Kim, Duck-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.32 no.6 / pp.651-659 / 2014
  • With the popularization of the big data environment, big data have been highlighted as a key information strategy for establishing national spatial data infrastructure, supporting a scientific land policy and the growth of the creative economy. Of particular interest here, cadastral information is a core national information source that forms the basis of spatial information underpinning people's daily lives, including the production and consumption of real estate information. The purpose of this paper is to suggest a big data analytics scheme for news articles on the cadastral resurvey project, as a way to approach cadastral information in terms of spatial data integration. As the research method, the tm (Text Mining) package in R was used to read news reports in various formats as text, and nouns were extracted using the KoNLP package. That is, we searched for the main keywords regarding the cadastral resurvey, performed compound-noun extraction and data mining analysis, and visualized the results. News reports related to the cadastral resurvey between 2012 and 2014 were collected, and nouns were extracted from them for the data mining analysis of cadastral information. Furthermore, public approval, reliability, and suggestions for improving regulations were examined through correlation analyses among the extracted compound nouns. The correlation analysis among the most frequent nouns produced five groups of data consisting of 133 keywords. The most frequent words were "cadastral resurvey," "civil complaint," "dispute," "cadastral survey," "lawsuit," "settlement," "mediation," "discrepant land," and "parcel." In conclusion, the cadastral resurvey carried out in some local governments has been proceeding smoothly, with positive results.
On the other hand, disputes raised by landowners over parcel surveying have provoked a stream of complaints about the cadastral resurvey. Through such keyword analysis, public opinion and the types of civil complaints related to the cadastral resurvey project can be identified and addressed pre-emptively, for example through a dedicated call center for cadastral surveying, electronic civil services, and customer counseling, so that high-quality cadastral information services can be provided. This study therefore provides a stepping stone toward a big data analytics approach that can comprehensively examine and visualize the wide range of news reports and opinions surrounding the cadastral resurvey project. This, in turn, will help establish a framework for information utilization that enables fast, accurate, and scientific decision making.
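The keyword-correlation step can be sketched as follows. The original analysis uses R's tm and KoNLP packages on Korean nouns; this is a minimal Python equivalent in which the articles and keywords are hypothetical English stand-ins, shown only to illustrate how correlated keywords form groups.

```python
import numpy as np

# Hypothetical article texts standing in for the collected news reports.
articles = [
    "cadastral resurvey dispute complaint dispute",
    "cadastral resurvey settlement mediation",
    "dispute lawsuit complaint parcel",
    "resurvey parcel settlement",
]
keywords = ["resurvey", "dispute", "complaint", "settlement"]

# Document-term matrix: rows = articles, columns = keyword counts.
dtm = np.array([[a.split().count(k) for k in keywords] for a in articles])

# Pearson correlation between keyword columns; keywords that rise and
# fall together across articles would be merged into one group
# (the paper reports five such groups over 133 keywords).
corr = np.corrcoef(dtm.T)
print(np.round(corr, 2))
```

Clustering the rows of this correlation matrix (e.g. with hierarchical clustering) yields the keyword groups used to summarize public opinion.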

Topic Modeling-Based Domestic and Foreign Public Data Research Trends Comparative Analysis (토픽 모델링 기반의 국내외 공공데이터 연구 동향 비교 분석)

  • Park, Dae-Yeong;Kim, Deok-Hyeon;Kim, Keun-Wook
    • Journal of Digital Convergence / v.19 no.2 / pp.1-12 / 2021
  • With the recent Fourth Industrial Revolution, the volume and value of big data are continuously increasing, and the government is actively working to open up and promote the use of public data. However, the supply still falls short of citizens' demand for public data, so it is necessary to identify research trends in the public data field and chart directions for development. In this study, to understand research trends related to public data, we performed topic modeling, a technique widely used in text mining. To this end, we collected papers containing the keyword 'public data' among domestic and foreign research papers (1,437 domestic, 9,607 overseas), performed topic modeling based on the LDA algorithm, compared domestic and foreign public data research trends, and presented policy implications. Looking at the topics over time, research in the fields of 'personal information protection', 'public data management', and 'urban environment' has increased in Korea, while overseas, research in 'urban policy', 'cell biology', 'deep learning', and 'cloud and security' is active.

A Study on Audio-Visual Expression of Biometric Data Based on the Polysomnography Test (수면다원검사에 기반한 생체데이터 시청각화 연구)

  • Kim, Hee Soo;Oh, Na Yea;Park, Jin Wan
    • Korea Science and Art Forum / v.35 / pp.145-155 / 2018
  • The goal of this study is to provide a new audio-visualization method, through case analysis and the production of artworks, for polysomnography (PSG) data that are difficult to interpret and unfamiliar to the public. Most artworks are produced through conscious actions during waking hours; during sleep, by contrast, we enter the world of the unconscious. Through this experiment, we therefore wanted to discover whether something new could be obtained from the unconscious state and, if so, what kind of art could be made from it. The study first considers the definitions of sleep and sleep data. The sleep data were classified into a normal group and narcolepsy, insomnia, and sleep apnea groups, focusing on the sleep-disorder graphs measured by polysomnography. The acquired biometric data were then refined and converted into a text-based script. The sleep stages recorded in the script were rendered as 3D animated images in Maya, the heart rate data were transformed into MIDI format and sonified in GarageBand, and After Effects combined the images and sound into four single-channel videos of 3 minutes and 20 seconds each. As a result, the work gives anyone an opportunity to understand, through art rather than difficult medical terminology, how the disorder data differ from the normal data. It also showed the possibility of artistic expression even in the absence of conscious action. Through these results, we expect an expansion and diversification of the artistic audiovisual expression of biometric data.

Agriculture Big Data Analysis System Based on Korean Market Information

  • Chuluunsaikhan, Tserenpurev;Song, Jin-Hyun;Yoo, Kwan-Hee;Rah, Hyung-Chul;Nasridinov, Aziz
    • Journal of Multimedia Information System / v.6 no.4 / pp.217-224 / 2019
  • As the world's population grows, maintaining the food supply is becoming a bigger problem. Now and in the future, big data will play a major role in decision making in the agriculture industry; the challenge is how to obtain valuable information to guide future decisions. Big data help us see history more clearly, uncover hidden value, and make the right decisions for the government and for farmers. To contribute to solving this challenge, we developed the Agriculture Big Data Analysis System, which consists of agricultural big data collection, analysis, and visualization. First, we collected structured data such as prices, climate, and yields, and unstructured data such as news, blogs, and TV programs. Using the collected data, we implemented prediction algorithms such as ARIMA, Decision Tree, LDA, and LSTM and presented the results as data visualizations.
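The forecasting step can be illustrated with a minimal stand-in: an AR(1) model, the autoregressive core of ARIMA, fitted by least squares to a toy weekly price series. The prices are invented, and the paper's full setup (ARIMA orders, exogenous news features, the Decision Tree and LSTM models) is beyond this sketch.

```python
import numpy as np

# Hypothetical weekly produce prices.
prices = np.array([100., 102., 101., 105., 107., 106., 110., 112.])

# AR(1): predict p[t] from p[t-1] via least squares on lagged pairs.
x, y = prices[:-1], prices[1:]
A = np.vstack([x, np.ones_like(x)]).T
(phi, c), *_ = np.linalg.lstsq(A, y, rcond=None)

# One-step-ahead forecast for the next week.
next_price = phi * prices[-1] + c
print(f"p[t] ~ {phi:.3f} * p[t-1] + {c:.2f}; next ~ {next_price:.1f}")
```

A production system would instead fit a full ARIMA (e.g. via a statistics library), difference the series for stationarity, and validate against held-out weeks.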

The Functional Requirements of Core Elements for Research Data Management and Service (연구 데이터 관리 및 서비스를 위한 핵심요소의 기능적 요건)

  • Kim, Juseop;Kim, Suntae;Choi, Sangki
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.317-344 / 2019
  • The increasing value of data, paradigm shifts in research methods, and the concrete manifestations of open science all indicate that research is no longer text-centric but data-driven. In this study, we analyzed the services of DCC, ICPSR, ANDS, and DataONE to derive the key elements and functional requirements for research data management and services, which remain insufficient in domestic research. The key elements derived include DMP writing support, data description, data storage, data sharing and access, data citation, and data management training. By specifying functional requirements for these key elements, this study can be applied to building and operating RDM services in the future.