• Title/Abstract/Keyword: Big data modeling


Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.69-94
    • /
    • 2017
  • The recent increase in demand for big data analysis has driven vigorous development of related technologies and tools. At the same time, advances in IT and the growing penetration of smart devices are producing large volumes of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis continue to increase, which suggests that big data analysis will grow more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has stimulated computer programming education and the development of many analysis programs. Accordingly, the entry barriers to big data analysis are gradually falling and data analysis technology is spreading, so big data analysis is increasingly expected to be performed by the analysis demanders themselves. Alongside this trend, interest in unstructured data continues to grow, with particular attention focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling extracts the major issues from a large set of documents, identifies the documents that correspond to each issue, and provides the identified documents as clusters. It is regarded as a very useful technique in that it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set, so identifying the topic of each document requires analyzing all documents at once. This makes the analysis time-consuming when topic modeling is applied to a large number of documents, and it raises a scalability problem: processing time increases sharply as the number of analysis objects grows. The problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large document collection is divided into sub-units, and topics are derived by repeatedly performing topic modeling on each unit. This method enables topic modeling over a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, since documents can be analyzed in each location without first combining them. Despite these advantages, however, the method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from the entire document set is unclear; local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established: assuming that the global topics are the ideal answer, the deviation of the local topics from the global topics must be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other work on topic modeling. In this paper, we propose a topic modeling approach that addresses both problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirm that the proposed methodology produces results similar to topic modeling over the entire document set, and we propose a reasonable method for comparing the results of the two approaches.
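The local-to-global topic mapping described above can be viewed as a nearest-neighbor assignment over topic term distributions. The Python sketch below (with toy, hypothetical term weights; the paper's actual mapping procedure may differ) assigns each local topic to the most similar RGS topic by cosine similarity:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity over sparse term-weight dictionaries
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def map_local_to_global(local_topics, global_topics):
    # Assign each local topic to the most similar global (RGS) topic
    mapping = {}
    for lid, lvec in local_topics.items():
        best = max(global_topics, key=lambda gid: cosine(lvec, global_topics[gid]))
        mapping[lid] = best
    return mapping

# Hypothetical term distributions for two RGS topics and two local topics
global_topics = {
    "G1": {"economy": 0.5, "market": 0.3, "stock": 0.2},
    "G2": {"election": 0.6, "policy": 0.4},
}
local_topics = {
    "L1": {"stock": 0.5, "market": 0.5},
    "L2": {"policy": 0.7, "election": 0.3},
}
print(map_local_to_global(local_topics, global_topics))
# {'L1': 'G1', 'L2': 'G2'}
```

With this mapping in hand, a document assigned to a local topic inherits a global topic label, which is what makes local results comparable to the global baseline.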

A Leading Study of Data Lake Platform based on Big Data to support Business Intelligence (Business Intelligence를 지원하기 위한 Big Data 기반 Data Lake 플랫폼의 선행 연구)

  • Lee, Sang-Beom
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2018.01a
    • /
    • pp.31-34
    • /
    • 2018
  • We live in the digital era, and the characteristics of customers in this era are constantly changing. Understanding business requirements and converting them into technical requirements is therefore essential, which in turn requires understanding the data model behind the business layout. Moreover, BI (Business Intelligence) is at the crux of transforming an enterprise so that it minimizes losses and maximizes profits. In this paper, we describe a leading study on desktop BI (software products and programming languages) on the front-end side, and on a Big Data-based Data Lake platform built through data modeling on the back-end side, to support business intelligence.


Probabilistic Modeling of Photovoltaic Power Systems with Big Learning Data Sets (대용량 학습 데이터를 갖는 태양광 발전 시스템의 확률론적 모델링)

  • Cho, Hyun Cheol;Jung, Young Jin
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.23 no.5
    • /
    • pp.412-417
    • /
    • 2013
  • Analytical modeling of photovoltaic power systems has received significant attention in recent years because it is readily applied to predicting system dynamics and to fault detection and diagnosis in advanced engineering technologies. This paper presents a novel probabilistic modeling approach for such power systems with big data sequences. First, we express the input/output function of photovoltaic power systems, in which solar irradiation and ambient temperature are the input variables and electric power is the output variable. Based on this functional relationship, the conditional probability over these three random variables (irradiation, temperature, and electric power) is mathematically defined, and it is estimated from the ratio of the number of cases related to the two input variables to the total number of samples, which is efficient in particular for big data sequences from photovoltaic power systems. Finally, we predict the output values from the probabilistic model by taking the expectation. Two case studies are carried out to test the reliability of the proposed modeling methodology.
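The count-based conditional model described above can be illustrated with a minimal sketch: discretize the two inputs into bins, collect the observed power values per bin, and predict by taking the empirical expectation within the matching bin. The bin sizes and data here are hypothetical, not taken from the paper:

```python
from collections import defaultdict

def fit_conditional(samples, bin_irr=100, bin_temp=5):
    # samples: (irradiation, temperature, power) tuples.
    # Group observed power values by discretized input bins,
    # which amounts to counting cases per input combination.
    table = defaultdict(list)
    for irr, temp, power in samples:
        table[(irr // bin_irr, temp // bin_temp)].append(power)
    return table, bin_irr, bin_temp

def predict(model, irr, temp):
    # Expected power: the mean of observed outputs in the matching bin,
    # i.e. the expectation under the empirical conditional distribution
    table, bin_irr, bin_temp = model
    powers = table.get((irr // bin_irr, temp // bin_temp))
    return sum(powers) / len(powers) if powers else None

# Hypothetical (irradiation W/m^2, temperature C, power kW) samples
data = [(450, 22, 3.1), (470, 23, 3.3), (480, 24, 3.2), (900, 30, 6.0)]
model = fit_conditional(data)
print(predict(model, 460, 21))  # mean of the three powers sharing that bin
```

Counting per bin rather than fitting a parametric model is what makes the approach attractive for large data sequences: fitting is a single pass over the samples.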

Big Data Analysis and Prediction of Traffic in Los Angeles

  • Dauletbak, Dalyapraz;Woo, Jongwook
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.2
    • /
    • pp.841-854
    • /
    • 2020
  • This paper explains a method to process, analyze, and predict traffic patterns in Los Angeles County using Big Data and machine learning. The dataset comes from a popular navigation platform in the USA, which tracks road information through connected users' devices and also collects reports shared by users through its app. The dataset mainly consists of traffic jams and user-reported traffic incidents such as road closures, hazards, and accidents. The major contribution of this paper is a clear view of how large-scale road traffic data can be stored and processed using the Big Data system Hadoop and its ecosystem (Hive). In addition, the analysis is illustrated with visuals produced using Business Intelligence tools, and prediction with a classification machine learning model on the sampled traffic data is presented using Azure ML. The modeling process and its results are interpreted using the metrics accuracy, precision, and recall.
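The three metrics the paper reports are all derived from confusion-matrix counts. A minimal Python sketch, with made-up jam/no-jam labels standing in for the paper's classifier output:

```python
def evaluate(y_true, y_pred, positive=1):
    # Confusion-matrix counts for the positive class
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 1 = traffic jam, 0 = free flow
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(evaluate(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Reporting precision and recall alongside accuracy matters for traffic data, where jam and free-flow classes are typically imbalanced.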

Big Data Analytics of Construction Safety Incidents Using Text Mining (텍스트 마이닝을 활용한 건설안전사고 빅데이터 분석)

  • Jeong Uk Seo;Chie Hoon Song
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.27 no.3
    • /
    • pp.581-590
    • /
    • 2024
  • This study aims to extract key topics through text mining of incident records (incident history, post-incident measures, preventive measures) from construction safety accident case data available on the public data portal. It also seeks to provide fundamental insights for establishing disaster-prevention manuals by identifying correlations between these topics. After pre-processing the input data, we used LDA-based topic modeling to derive the main topics, obtaining five topics related to incident history and four topics each for post-incident measures and preventive measures. Although no dominant patterns emerged from the topic pattern analysis, the study is significant in that it provides quantitative information on the follow-up actions associated with each incident history, suggesting practical implications for a preventive decision-making system that links accident history with subsequent measures for recurrence prevention.
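The topic-correlation step described above can be approximated by counting how often an incident-history topic co-occurs with a follow-up-measure topic across accident records. The topic labels below are hypothetical stand-ins; the study's actual topics come from LDA:

```python
from collections import Counter
from itertools import product

def topic_cooccurrence(records):
    # records: (incident_topics, measure_topics) per accident case.
    # Count every cross pairing to surface correlated topic pairs.
    counts = Counter()
    for incident_topics, measure_topics in records:
        for pair in product(incident_topics, measure_topics):
            counts[pair] += 1
    return counts

# Hypothetical accident cases with assigned topic labels
records = [
    (["fall"], ["guardrail", "training"]),
    (["fall"], ["guardrail"]),
    (["struck-by"], ["signage"]),
]
print(topic_cooccurrence(records).most_common(1))
# [(('fall', 'guardrail'), 2)]
```

The most frequent pairs are candidates for linking an accident type to its typical countermeasure in a prevention manual.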

Developing a Big Data Analytics Platform Architecture for Smart Factory (스마트공장을 위한 빅데이터 애널리틱스 플랫폼 아키텍쳐 개발)

  • Shin, Seung-Jun;Woo, Jungyub;Seo, Wonchul
    • Journal of Korea Multimedia Society
    • /
    • v.19 no.8
    • /
    • pp.1516-1529
    • /
    • 2016
  • While global manufacturing is becoming more competitive due to the variety of customer demands, rising production costs, and uncertainty in resource availability, the future competitiveness of manufacturing industries depends upon the implementation of the Smart Factory. By converging new information and communication technologies, the Smart Factory enables manufacturers to respond quickly to customer demand and to minimize resource usage while maximizing productivity. This paper presents the development of a big data analytics platform architecture for the Smart Factory. As this platform represents a conceptual software structure needed to implement data-driven decision-making on shop floors, it enables the creation and use of diagnosis, prediction, and optimization models through data analytics and big data. Implementing the platform will help manufacturers: 1) acquire advanced technology toward manufacturing intelligence, 2) implement a cost-effective analytics environment through standardized data interfaces and open-source solutions, 3) obtain a technical reference for implementing an analytics modeling environment time-efficiently, and 4) ultimately improve productivity in manufacturing systems. This paper also presents a technical architecture for big data infrastructure, which we are implementing, and a case study demonstrating energy-predictive analytics in a machine tool system.

Method for Selecting a Big Data Package (빅데이터 패키지 선정 방법)

  • Byun, Dae-Ho
    • Journal of Digital Convergence
    • /
    • v.11 no.10
    • /
    • pp.47-57
    • /
    • 2013
  • Big data analysis requires new decision-making tools that can cope with data volume, velocity, and variety. Many global IT enterprises are announcing a variety of Big Data products that promise ease of use, rich functionality, and modeling capability. Big data packages are defined here as solutions, represented by analytic tools, infrastructures, and platforms including hardware and software, that can acquire, store, analyze, and visualize Big Data. There are many such products, with varied and complex functionalities. Because of the inherent characteristics of Big Data, selecting the best Big Data package requires expertise and an appropriate decision-making method, compared with the selection of other software packages. The objective of this paper is to suggest a decision-making method for selecting a Big Data package. We compare product characteristics and functionalities through a literature review and suggest selection criteria. To evaluate the feasibility of adopting packages, we develop two Analytic Hierarchy Process (AHP) models: in one, the goal node consists of costs and benefits; in the other, it consists of the selection criteria. We show with a numerical example how the best package is evaluated by combining the two models.
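The core computation in any AHP model is turning a pairwise comparison matrix into priority weights. A common approximation of the principal eigenvector is the geometric-mean method, sketched below; the 3x3 comparison matrix is hypothetical, not one of the paper's models:

```python
from math import prod

def ahp_weights(matrix):
    # Approximate the AHP priority vector with the geometric-mean method:
    # take the geometric mean of each row, then normalize to sum to 1.
    n = len(matrix)
    gmeans = [prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gmeans)
    return [g / total for g in gmeans]

# Hypothetical pairwise comparison of three selection criteria
# (e.g. cost vs functionality vs scalability); reciprocal by construction
matrix = [
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
]
weights = ahp_weights(matrix)
print([round(w, 3) for w in weights])
```

The resulting weights rank the criteria; in a full AHP model the same procedure is applied at each level of the hierarchy and the weights are multiplied down to score the alternatives (here, the candidate packages).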

Urban Big Data: Social Costs Analysis for Urban Planning with Crowd-sourced Mobile Sensing Data (도시 빅데이터: 모바일 센싱 데이터를 활용한 도시 계획을 위한 사회 비용 분석)

  • Shin, Dongyoun
    • Journal of KIBIM
    • /
    • v.13 no.4
    • /
    • pp.106-114
    • /
    • 2023
  • In this study, we developed a method to quantify urban social costs using mobile sensing data, providing a novel approach to urban planning. By collecting and analyzing extensive mobile data over time, we transformed travel patterns into measurable social costs. Our findings highlight the effectiveness of big data in urban planning, revealing key correlations between transportation modes and their associated social costs. This research not only advances the use of mobile data in urban planning but also suggests new directions for future studies to enhance data collection and analysis methods.

The Impact of Exploration and Exploitation Activities and Market Agility on the Relationship between Big Data Analytics Capability and Firms' Performance (빅 데이터 분석능력과 기업 성과 간의 관계에서 혁신 및 개선 활동과 시장 민첩성의 영향)

  • Jung, He-Kyung;Boo, Jeman
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.3
    • /
    • pp.150-162
    • /
    • 2022
  • This study investigated the impact of big data analytics capability (BDAC) on firm performance. BDAC has the power to innovate existing management practices; nevertheless, its impact on firm performance has not yet been fully elucidated. BDAC relates to the flexibility of infrastructure as well as the skills of management and of a firm's personnel. Most studies have explored the phenomenon from a theoretical perspective or based on factors such as organizational characteristics. This study extends that line of research by proposing and testing a model that examines whether organizational exploration, exploitation, and market agility mediate the relationship between BDAC and firm performance. The proposed model was tested using survey data collected from employees with more than 10 years of tenure at 250 companies. The results, analyzed through structural equation modeling, show that a strong BDAC can help improve firm performance: an organization's ability to analyze big data affects its exploration and exploitation, thereby affecting market agility and, consequently, firm performance. These results also confirm the powerful mediating role of exploration, exploitation, and market agility in improving insights into big data utilization and improving firm performance.