Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, the realization of flexible storage expansion functions for processing a massive amount of unstructured log data and executing a considerable number of functions to categorize and analyze the stored unstructured log data is difficult in existing computer environments. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for processing unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management system. The proposed system uses the IaaS (Infrastructure as a Service) cloud environment to provide a flexible expansion of computing resources and includes the ability to flexibly expand resources such as storage space and memory under conditions such as extended storage or rapid increase in log data. Moreover, to overcome the processing limits of the existing analysis tool when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. 
Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions that let it continue operating after it recovers from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides a means of effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are ill-suited to processing unstructured log data. Further, databases with strict schemas, such as relational databases, cannot easily add nodes when the stored data must be distributed across multiple nodes as the amount of data rapidly increases. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database by distributing data across nodes when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented types. Of these, MongoDB, a representative document-oriented database with a schema-free structure, is used in the proposed system. MongoDB is adopted because its flexible schema structure makes it easy to process unstructured log data, it facilitates flexible node expansion when the amount of data is rapidly increasing, and it provides an Auto-Sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. 
When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects the data, classifies them by log type, and distributes them to the MongoDB module and the MySQL module. The log graph generator module renders the log analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module by analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in a graph according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. The proposed system is compared with a log data processing system that uses only MySQL in terms of log data insert and query performance; this evaluation demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a log data insert performance evaluation of MongoDB for various chunk sizes.
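
The data flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LogRecord` fields, the `realtime` flag used as the routing rule, and the in-memory lists standing in for the MySQL and MongoDB modules are all assumptions made for illustration, and the map/reduce functions only imitate the style of processing a Hadoop-based module would distribute across nodes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class LogRecord:
    # All field names here are hypothetical; the abstract does not specify a schema.
    timestamp: int          # seconds since some epoch
    log_type: str           # category assigned during classification
    realtime: bool          # True if the record needs real-time analysis
    payload: dict = field(default_factory=dict)  # free-form, unstructured body


class LogCollector:
    """Classifies incoming records and routes each one to a store.

    The two lists are in-memory stand-ins for the MySQL module
    (real-time logs) and the MongoDB module (logs aggregated per unit time).
    """

    def __init__(self):
        self.mysql_store = []
        self.mongo_store = []

    def collect(self, record):
        # Assumed routing rule: real-time records go to the MySQL stand-in,
        # everything else goes to the MongoDB stand-in.
        target = self.mysql_store if record.realtime else self.mongo_store
        target.append(record)


def map_phase(records, unit_seconds):
    """Emit ((time bucket, log type), 1) for every record."""
    for r in records:
        bucket = r.timestamp - r.timestamp % unit_seconds
        yield (bucket, r.log_type), 1


def reduce_phase(pairs):
    """Sum the emitted counts per (time bucket, log type) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    collector = LogCollector()
    for rec in [
        LogRecord(0, "login", True),
        LogRecord(10, "transfer", False, {"amount": 5}),
        LogRecord(20, "transfer", False),
        LogRecord(70, "transfer", False),
    ]:
        collector.collect(rec)

    # Count stored logs per 60-second unit and log type.
    print(reduce_phase(map_phase(collector.mongo_store, 60)))
```

In a real deployment the map and reduce steps would run on separate nodes over data read from the document store; here they run in a single process purely to show the shape of the computation.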

An Empirical Study on the Influencing Factors for Big Data Intented Adoption: Focusing on the Strategic Value Recognition and TOE Framework (빅데이터 도입의도에 미치는 영향요인에 관한 연구: 전략적 가치인식과 TOE(Technology Organizational Environment) Framework을 중심으로)

  • Ka, Hoi-Kwang;Kim, Jin-soo
    • Asia pacific journal of information systems / v.24 no.4 / pp.443-472 / 2014
  • To survive in a globally competitive environment, enterprises must be able to solve various problems and find optimal solutions effectively. Big data is perceived as a tool for solving enterprise problems effectively and improving competitiveness through its varied problem-solving and advanced predictive capabilities. Owing to its remarkable performance, the implementation of big data systems has increased among enterprises around the world. Big data is now called the 'crude oil' of the 21st century and is expected to provide competitive superiority. Big data is in the limelight because, while conventional IT has fallen well short of its potential, big data has gone beyond technological possibility and can be used to create new value, such as business optimization and new business creation, through data analysis. However, because big data has often been introduced hastily, without considering the strategic value and achievements obtainable through it, enterprises have difficulty deriving strategic value and utilizing the data. According to a survey of 1,800 IT professionals from 18 countries worldwide, only 28% of corporations were utilizing big data well, and many respondents reported difficulties in deriving strategic value from big data and in operating it. To introduce big data, the strategic value should be derived and environmental aspects, such as internal and external regulations and systems, should be considered, but these factors have not been well reflected. The failures turned out to stem from big data being introduced hastily, in response to IT trends and the surrounding environment, before the conditions for introduction were properly in place. 
For big data to be introduced successfully, the strategic value obtainable through it must be clearly understood and its applicability must be assessed through systematic environmental analysis; however, because corporations consider only partial achievements and technological aspects, successful introduction is not being achieved. Previous studies show that most big data research focuses on concepts, cases, and practical suggestions, without empirical study. The purpose of this study is to provide a theoretically and practically useful implementation framework and strategies for big data systems by conducting a comprehensive literature review, identifying the factors that influence successful big data system implementation, and analyzing empirical models. To this end, the factors that can affect the intention to introduce big data were derived by reviewing information system success factors, strategic value perception factors, factors related to the information system introduction environment, and the big data literature, and a structured questionnaire was developed. The questionnaire was then administered to the people in charge of big data within corporations, and the responses were analyzed statistically. The statistical analysis showed that the strategic value perception factor and the intra-industry environmental factors positively affected the intention to introduce big data. The theoretical, practical, and policy implications derived from the results are as follows. 
The first theoretical implication is that this study has proposed the factors that affect the intention to introduce big data by reviewing strategic value perception, environmental factors, and prior big data studies, and has proposed variables and measurement items that were empirically analyzed and verified. The study is meaningful in that it measured the influence of each variable on introduction intention by verifying the relationships between the independent and dependent variables through a structural equation model. Second, this study defined the independent variables (strategic value perception, environment), the dependent variable (introduction intention), and the moderating variables (type of business and corporate size) for big data introduction intention, and laid a theoretical foundation for future empirical research on big data by developing measurement items with established reliability and validity. Third, by verifying the significance of the strategic value perception factors and environmental factors proposed in prior studies, this study can support future empirical research on the factors affecting big data introduction. The operational implications are as follows. First, this study laid a foundation for empirical research on big data by investigating the causal relationship between the strategic value perception and environmental factors and introduction intention, and by proposing measurement items with established legitimacy, reliability, and validity. Second, this study found that the strategic value perception factor positively affects big data introduction intention, which highlights the importance of strategic value perception. 
Third, the study proposed that a corporation introducing big data should do so based on a precise analysis of the industry's internal environment. Fourth, by showing that the factors affecting big data introduction differ with the corporation's size and type of business, this study proposed that size and type of business should be considered when introducing big data. The policy implications are as follows. First, more varied utilization of big data is needed. The strategic value of big data can be approached in various ways, in products and services, productivity, decision making, and other areas, and can be utilized across all business fields on that basis, but major domestic corporations are considering only some parts of the product and service fields. Accordingly, when introducing big data, it will be necessary to review the utilization aspects in detail and to design the big data system in a form that maximizes the utilization rate. Second, the study identifies the cost burden of system introduction, the difficulty of using the system, and a lack of trust in suppliers as issues corporations face in the big data introduction phase. Because global IT corporations dominate the big data market, domestic corporations cannot help but depend on foreign corporations when introducing big data. Considering that our country has no global IT corporations even though it is a world IT power, big data can be seen as an opportunity to foster world-class corporations. Accordingly, the government needs to foster star corporations through active policy support. Third, corporations lack internal and external professional manpower for introducing and operating big data. 
Big data is a system in which how valuable information can be derived from data matters more than the construction of the system itself. This requires talent equipped with academic knowledge and experience in various fields such as IT, statistics, strategy, and management, and such talent should be trained through systematic education. This study laid a theoretical foundation for empirical research on big data by identifying and verifying the main variables that affect big data introduction intention, and by empirically analyzing that foundation it is expected to offer useful guidelines for corporations and policy makers who are considering big data implementation.