• Title/Summary/Keyword: Big Data Analysis Techniques (빅데이터 분석 기법)

Search results: 596

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has driven the vigorous development of related technologies and tools. At the same time, advances in IT and the rising penetration of smart devices are producing enormous amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis continue to increase, which means big data analysis will only grow more important across industries in the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who requested it. However, growing interest in big data analysis has spurred computer programming education and the development of many analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering, analysis technology is spreading, and analysis is increasingly expected to be performed by the demanders themselves. Along with this, interest in unstructured data, especially text data, keeps rising. New web-based platforms and techniques are producing text data in large volumes and prompting active attempts to analyze it, and the results of text analysis are being used in many fields. Text mining is an umbrella concept for the theories and techniques of text analysis, and among the many text mining techniques, topic modeling is one of the most widely used and studied. Topic modeling extracts the major issues from a large set of documents, identifies the documents corresponding to each issue, and returns the identified documents as clusters. It is regarded as very useful in that it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire corpus, so the whole corpus must be analyzed at once to identify the topic of each document. This makes the analysis slow when topic modeling is applied to many documents and raises a scalability problem: processing time increases sharply as the number of analysis targets grows. The problem is especially noticeable when the documents are distributed across multiple systems or locations. To overcome it, a divide-and-conquer approach can be applied to topic modeling: divide the documents into sub-units and derive topics by running topic modeling on each unit. This makes topic modeling feasible on a large corpus with limited system resources, improves processing speed, and can significantly reduce analysis time and cost because documents can be analyzed where they reside without first being combined. Despite these advantages, the approach has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the whole corpus is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology must be established; that is, taking the global topics as the ideal answer, the deviation of the local topics from the global topics needs to be measured. Because of these difficulties, this approach has been studied comparatively little relative to other topic modeling research. In this paper, we propose a topic modeling approach that addresses both problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirm that the proposed methodology produces results similar to topic modeling on the entire corpus, and we also propose a reasonable method for comparing the results of the two approaches.
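The divide-and-conquer idea above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: scikit-learn's LDA is fit globally and on two local sub-clusters over a shared vocabulary, and each local topic is mapped to its most similar global topic by cosine similarity of the topic-word distributions. The corpus, topic counts, and argmax mapping rule are all illustrative assumptions.

```python
# Divide-and-conquer topic modeling sketch: local LDA models are mapped to
# a global model via similarity of topic-word distributions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "stock market price trading investor",
    "market investor stock fund price",
    "soccer match goal player team",
    "team player soccer league goal",
    "election vote candidate policy party",
    "party candidate election policy vote",
]

# Shared vocabulary so topic-word matrices are comparable across models.
vec = CountVectorizer()
X = vec.fit_transform(corpus)

def topic_word(model):
    comp = model.components_
    return comp / comp.sum(axis=1, keepdims=True)  # normalise rows to distributions

# Global model on the full corpus (the "ideal answer").
global_lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)

# Local models on two sub-clusters (divide and conquer).
local_models = [
    LatentDirichletAllocation(n_components=2, random_state=0).fit(X[:3]),
    LatentDirichletAllocation(n_components=2, random_state=0).fit(X[3:]),
]

def map_local_to_global(local, global_model):
    """For each local topic, return the index of the most similar global topic."""
    L, G = topic_word(local), topic_word(global_model)
    sim = (L @ G.T) / (np.linalg.norm(L, axis=1)[:, None]
                       * np.linalg.norm(G, axis=1)[None, :])
    return sim.argmax(axis=1)

for i, local in enumerate(local_models):
    print(f"local set {i} -> global topics {map_local_to_global(local, global_lda)}")
```

In the paper's terms, the global model here plays the role of the RGS model, and the argmax mapping is a stand-in for whatever mapping criterion the authors actually use.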

A Study on AI Evolution Trend based on Topic Frame Modeling (인공지능발달 토픽 프레임 연구 -계열화(seriation)와 통합화(skeumorph)의 사회구성주의 중심으로-)

  • Kweon, Sang-Hee; Cha, Hyeon-Ju
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.66-85 / 2020
  • The purpose of this study is to explain and predict trends in the AI development process based on AI technology patents and AI reporting frames in major newspapers. To that end, summaries of South Korean and U.S. technology patents filed over the past nine years and the AI (Artificial Intelligence) news texts of major domestic newspapers were analyzed. The study used topic modeling and time-series regression analysis on big data, along with network agenda correlation and regression analysis techniques. First, in the AI patent summaries, topics were confirmed in the order of artificial intelligence, algorithms, and 5G (hot AI technologies), while in the news reports, AI industrial application and data-analysis market application led, indicating the trend of reporting on AI's social and cultural aspects. Second, the time-series regression analysis showed that the social and cultural use of AI and the start of industrial application were the rising topics, while the downward trend centered on system and hardware technology. Third, QAP analysis using correlation and regression showed a high correlation between AI technology patents and news reporting frames. This suggests that AI technology patents and news reporting frames have been socially constructed by the determinants of media discourse during AI development.
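The rising/falling topic classification can be illustrated with a least-squares trend fitted to a topic's yearly share. The numbers below are synthetic, not the study's data:

```python
# Fit an OLS trend line to a topic's yearly share to label it rising or falling.
import numpy as np

years = np.arange(2011, 2020)                    # nine-year window
share = np.array([0.05, 0.06, 0.08, 0.09, 0.12,  # yearly share of one topic
                  0.14, 0.15, 0.18, 0.21])

slope, intercept = np.polyfit(years, share, 1)   # least-squares trend
print("rising" if slope > 0 else "falling", round(slope, 4))
```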

Analysis of global trends on smart manufacturing technology using topic modeling (토픽모델링을 활용한 주요국의 스마트제조 기술 동향 분석)

  • Oh, Yoonhwan; Moon, HyungBin
    • Journal of Korea Society of Industrial Information Systems / v.27 no.4 / pp.65-79 / 2022
  • This study identified smart manufacturing technologies using patent data and topic modeling, and compared technology development trends in the United States, Japan, Germany, China, and South Korea. To this end, the study collected patents filed in the United States and Europe between 1991 and 2020, processed the patent abstracts, and identified topics by applying the latent Dirichlet allocation model to the data. As a result, smart manufacturing technologies were divided into seven categories. At the global level, the proportion of patents in 'data processing system' and 'thermal/fluid management' technologies was found to be increasing. Given that South Korea has relative competitiveness in thermal/fluid management technologies related to smart manufacturing, promoting smart manufacturing in the heavy and chemical industries would be a promising strategy for South Korea. This study is significant in that it overcomes the limitations of quantitative technology-level evaluation and proposes a new methodology that applies text mining.
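The cross-country trend comparison rests on a simple normalization: yearly patent counts per topic are converted to shares so that growth can be compared across topics of different sizes. A sketch with made-up counts:

```python
# Convert yearly patent counts per topic to shares and compare share growth.
import numpy as np

topics = ["data processing system", "thermal/fluid management", "other"]
counts = np.array([                 # rows: years 2018-2020, cols: topics (toy data)
    [120,  80, 300],
    [160,  95, 290],
    [210, 115, 280],
])

shares = counts / counts.sum(axis=1, keepdims=True)  # each row sums to 1
growth = shares[-1] - shares[0]     # change in share over the window
for name, g in zip(topics, growth):
    print(f"{name}: {g:+.3f}")
```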

Study for Analyzing Defense Industry Technology using Datamining technique: Patent Analysis Approach (데이터마이닝을 통한 방위산업기술 분석 연구: 특허분석을 중심으로)

  • Son, Changho
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.101-107 / 2018
  • Recently, Korea's defense industry has advanced significantly, and the defense R&D budget is gradually increasing within the overall defense budget. However, without objective analysis of defense industry technology, effective defense R&D is limited and the defense budget can be used inefficiently. Therefore, rather than relying only on expert opinion, this paper aims to analyze defense industry technology objectively with quantitative methods and to support efficient use of the defense budget. We propose a patent analysis method that applies big data analysis, one of the keywords of the 4th industrial revolution, to grasp the characteristics of defense industry technologies and identify vacant technologies objectively and systematically. The proposed method is applied to firepower technology, one of several defense industrial technologies, as a case study. In the process, patents of 10 domestic firepower-related companies, drawn from the defense industry company classification of the Korea Defense Industry Association (KDIA), were collected through KIPRIS, and the data were preprocessed into a matrix of their IPC codes. We then implemented association rule mining, a data mining technique that captures relationships among items, using the R program. The results are presented through interpretation of the support, confidence, and lift indices produced by the proposed approach. This paper thus suggests a way to help use the massive national defense budget efficiently and enhance the competitiveness of defense industry technology.
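The support, confidence, and lift indices interpreted in the paper can be computed directly. Below is a toy example (shown in Python rather than the paper's R) with hypothetical IPC-code transactions, not the paper's patent data:

```python
# Association-rule metrics for a rule A -> B over a set of patents,
# where each "transaction" is the set of IPC codes on one patent.
transactions = [                    # illustrative IPC codes per patent
    {"F41A", "F41G"},
    {"F41A", "F42B"},
    {"F41A", "F41G", "F42B"},
    {"F41G"},
    {"F42B"},
]

def support(itemset):
    """Fraction of patents containing every code in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(a, b):
    """P(B | A): how often B's codes appear given A's codes appear."""
    return support(a | b) / support(a)

def lift(a, b):
    """Confidence relative to B's baseline frequency; > 1 means positive association."""
    return confidence(a, b) / support(b)

A, B = {"F41A"}, {"F41G"}
print(f"support={support(A | B):.2f} confidence={confidence(A, B):.2f} lift={lift(A, B):.2f}")
```

A lift above 1 indicates that patents classified under F41A carry F41G more often than independence would predict, which is the kind of relationship the paper reads off to find technology clusters and vacant technologies.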

Big Data Analysis of Software Performance Trend using SPC with Flexible Moving Window and Fuzzy Theory (가변 윈도우 기법을 적용한 통계적 공정 제어와 퍼지추론 기법을 이용한 소프트웨어 성능 변화의 빅 데이터 분석)

  • Lee, Dong-Hun; Park, Jong-Jin
    • Journal of Institute of Control, Robotics and Systems / v.18 no.11 / pp.997-1004 / 2012
  • In enterprise software projects, performance issues have become more critical over recent decades. While developing software products, many performance tests are executed early in development against newly added code to detect possible performance regressions. In our previous research, we introduced a framework that enables automated performance anomaly detection and reduces the analysis overhead of identifying root causes, and showed that Statistical Process Control (SPC) can be successfully applied to anomaly detection. In this paper, we describe a particular performance trend that the existing anomaly detection system can hardly detect: a performance regression that is introduced and then recovered again a while later. Within a fixed sampling window, the fluctuation worsens and the lower and upper control limits become so relaxed that the system sometimes misses the noticeable performance change. To resolve the issue, we apply a sampling window whose size is dynamically tuned based on the performance trend, using fuzzy theory to find an appropriate size for the moving window.
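The failure mode described above can be reproduced with a simplified SPC rule. The response times below are synthetic and the window size is fixed for brevity; the paper tunes the window size with fuzzy inference:

```python
# SPC over a moving window: points outside mean +/- 3 sigma of the preceding
# window are flagged as anomalies.
import statistics

samples = [100, 101, 99, 100, 102, 140, 141, 139, 100, 101, 100, 99]
window = 5

def anomalies(series, w):
    flagged = []
    for i in range(w, len(series)):
        ref = series[i - w:i]                       # preceding window only
        mu, sigma = statistics.mean(ref), statistics.pstdev(ref)
        ucl, lcl = mu + 3 * sigma, mu - 3 * sigma   # control limits
        if not (lcl <= series[i] <= ucl):
            flagged.append(i)
    return flagged

print(anomalies(samples, window))
```

Only the onset of the regression (index 5) is flagged; the recovery back to around 100 is missed, because the regression samples inflate the window's standard deviation and relax the control limits, which is exactly the issue an adaptively sized window is meant to fix.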

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti; Kang, Young Ok
    • Spatial Information Research / v.23 no.2 / pp.69-81 / 2015
  • If the residential areas of SNS users can be inferred by analyzing SNS big data, this can serve as an alternative for spatial big data research that suffers from location sparsity and ecological error. In this study, we developed a way to infer the residential areas of Twitter users from the daily activity patterns found in their timeline data. We recognized users' daily activity patterns from their movement patterns and from the regional cognition words they use in tweets; the models based on movement and text are named the daily movement pattern model and the daily activity field model, respectively. We then selected the variables to be used in each model, defining the dependent variable as 0 if the area users mainly tweet from is their home location (HL) and 1 otherwise. According to our discriminant analysis results, the hit ratios of the two models were 67.5% and 57.5%, respectively. We tested both models using the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 of 48,235 users and obtained 9,606 stress-related tweets with a residential area, about 44 times more than the count of geo-tagged tweets. The methodology used in this study can be applied not only to secure more location data in SNS big data studies, but also to link SNS big data with regional statistics in order to analyze regional phenomena.
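The discriminant-analysis step can be sketched as follows. The two features and their distributions are hypothetical stand-ins (the paper derives its variables from movement patterns and regional cognition words), so only the mechanics, not the numbers, reflect the study:

```python
# Linear discriminant analysis: classify whether a user's most-tweeted area
# is their home location (0) or not (1), then report the hit ratio.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 200
# Hypothetical features: night-tweet ratio and share of tweets in the modal area.
home = np.column_stack([rng.normal(0.6, 0.1, n), rng.normal(0.7, 0.1, n)])
away = np.column_stack([rng.normal(0.4, 0.1, n), rng.normal(0.5, 0.1, n)])
X = np.vstack([home, away])
y = np.array([0] * n + [1] * n)

lda = LinearDiscriminantAnalysis().fit(X, y)
hit_ratio = lda.score(X, y)          # share of correctly classified users
print(f"hit ratio: {hit_ratio:.3f}")
```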

Text Mining-Based Analysis for Research Trends in Vocational Studies (텍스트 마이닝을 활용한 직업학 연구동향 분석)

  • Yook, Dong-In
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.3 / pp.586-599 / 2017
  • This study attempts to understand the overall research trends in Vocational Studies using text mining, a method for analyzing big data. The findings show that Vocational Studies in Korea has been directly influenced by global economic crises, as evidenced by its exponential growth after the 1997 foreign exchange crisis that led to an IMF bailout. In addition, research topics have been shifting from macro subjects such as government policies and systems to micro topics such as individual career development, and the research perspective is moving from the socially vulnerable, including women and the disabled, to the economically marginalized, including retirees and the unemployed. As for research targets, college students overwhelmingly outnumbered primary and secondary school students, while few studies analyzed the clinical outcomes of career counseling or attempted to process job information and study the history of jobs. This research is limited in that it analyzed journal abstracts only. Nonetheless, it is meaningful in that it applied topic analysis, one of the text mining methods, to a complete enumeration of all searchable articles, thereby crafting a quantitative analysis framework for Vocational Studies, and in that it is the first attempt to analyze themes at every stage of the development of Vocational Studies.

How to use attack cases and intelligence of Korean-based APT groups (한국어 기반 APT 그룹의 공격사례 및 인텔리전스 활용 방안)

  • Lee Jung Hun; Choi Youn Sung
    • Convergence Security Journal / v.24 no.3 / pp.153-163 / 2024
  • Although hacking and security threats keep increasing as IT advances and many companies adopt security solutions, cyberattacks still persist for years. An APT (Advanced Persistent Threat) attack selects a specific target and attacks it continuously, using every available means over the network for years: zero-day exploits, malicious code distribution, and social engineering techniques, some of which directly penetrate companies. These techniques have been in use since 2000, and the social engineering techniques in particular are similarly used in voice phishing. Countermeasures against APT attacks therefore need to be studied. This study analyzes the attack cases of Korean-based APT groups in Korea and suggests a sound method of using intelligence to analyze APT attack groups.

A Study on Establishing a Market Entry Strategy for the Satellite Industry Using Future Signal Detection Techniques (미래신호 탐지 기법을 활용한 위성산업 시장의 진입 전략 수립 연구)

  • Sehyoung Kim; Jaehyeong Park; Hansol Lee; Juyoung Kang
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.249-265 / 2023
  • Recently, the satellite industry has been paying attention to the private-led 'New Space' paradigm, a departure from the traditional government-led industry. The space industry, regarded as a future growth engine, still receives relatively little attention in Korea compared to the global market. The purpose of this study is therefore to explore future signals that can inform the market entry strategies of private companies in the domestic satellite industry. To this end, the study draws on the theoretical background of future signal theory and the Keyword Portfolio Map method to analyze keyword potential in patent document data based on keyword growth rate and keyword occurrence frequency. In addition, news data were collected to categorize future signals into first symptoms and early information, respectively, which serve as an interpretive indicator of how keywords reveal their actual potential outside patent documents. This study describes the data collection and analysis process for exploring future signals and, through keyword map visualizations, traces how each keyword in the collected documents evolves from a weak signal to a strong signal. The process contributes methodologically to, and expands the scope of, existing future signal research, and the results can contribute to new industry planning and research directions in the satellite industry.
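The keyword-potential classification can be sketched as a two-axis rule over occurrence frequency and growth rate, with low-frequency but fast-growing keywords read as weak signals and high-frequency, fast-growing ones as strong signals. The keywords, counts, and median thresholds below are illustrative assumptions, not the study's data:

```python
# Classify keywords into future-signal categories by frequency and growth rate.
keywords = {
    # keyword: (total occurrence frequency, yearly growth rate)
    "launch vehicle":    (950, 0.02),
    "cubesat":           (120, 0.35),
    "satellite imagery": (800, 0.30),
    "ground station":    (140, 0.01),
}

# Median thresholds split the keyword map into quadrants.
freq_med = sorted(f for f, _ in keywords.values())[len(keywords) // 2]
growth_med = sorted(g for _, g in keywords.values())[len(keywords) // 2]

def label_of(kw):
    freq, growth = keywords[kw]
    if growth >= growth_med and freq < freq_med:
        return "weak signal"        # rare but growing fast
    if growth >= growth_med:
        return "strong signal"      # common and still growing
    return "well-known / latent"

for kw in keywords:
    print(f"{kw}: {label_of(kw)}")
```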

Development of Heat Demand Forecasting Model using Deep Learning (딥러닝을 이용한 열 수요예측 모델 개발)

  • Seo, Han-Seok; Shin, KwangSup
    • The Journal of Bigdata / v.3 no.2 / pp.59-70 / 2018
  • To provide a stable district heating service to a limited residential area, it is most important to forecast short-term future demand accurately and to produce and supply heat efficiently. However, it is very difficult to develop a universal heat demand forecasting model applicable to general situations, because the factors affecting heat consumption are diverse and consumption patterns vary with individual consumers and regional characteristics. In particular, considering every variable that can affect heat demand does not help improve performance in terms of accuracy or versatility. Therefore, this study aims to develop a demand forecasting model using deep learning based only on limited information that can be acquired in real time. The model was developed by training an artificial neural network in TensorFlow on past data consisting only of the area's outdoor temperature and the date as input variables. The performance of the proposed model was evaluated by comparing its forecast accuracy against a previous regression model. The proposed heat demand forecasting model shows that accuracy can be improved using only the limited variables obtainable in real time. For demand forecasting in a given region, the model can be customized by adding features that reflect the regional characteristics.
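The forecasting setup can be sketched with a small feedforward network trained on outdoor temperature and a date feature only. This is a plain-numpy sketch on synthetic data, not the paper's TensorFlow model, and the demand function is invented for illustration:

```python
# Tiny feedforward regressor: heat demand from temperature and month only,
# trained by gradient descent on mean squared error.
import numpy as np

rng = np.random.default_rng(1)
n = 256
temp = rng.uniform(-10, 25, n)                    # outdoor temperature (C)
month = rng.integers(1, 13, n)                    # date feature
X = np.column_stack([temp / 25.0, month / 12.0])  # scaled inputs
y = (30 - temp) / 40.0 + rng.normal(0, 0.02, n)   # demand falls as temp rises

# One hidden tanh layer; weights updated by backpropagation.
W1, b1 = rng.normal(0, 0.5, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.5, (8, 1)), np.zeros(1)
lr, losses = 0.1, []
for _ in range(500):
    h = np.tanh(X @ W1 + b1)
    pred = (h @ W2 + b2).ravel()
    err = pred - y
    losses.append((err ** 2).mean())
    g_pred = 2 * err[:, None] / n                 # dL/dpred
    gW2, gb2 = h.T @ g_pred, g_pred.sum(axis=0)
    g_h = g_pred @ W2.T * (1 - h ** 2)            # tanh derivative
    gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The point mirrors the paper's: even with only two inputs obtainable in real time, the network fits the demand curve, and regional features could be appended as extra input columns to customize the model.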