• Title/Summary/Keyword: Big Data Log Analysis System

Design of Extended Real-time Data Pipeline System Architecture (확장형 실시간 데이터 파이프라인 시스템 아키텍처 설계)

  • Shin, Hoseung;Kang, Sungwon;Lee, Jihyun
    • Journal of KIISE / v.42 no.8 / pp.1010-1021 / 2015
  • Big data systems are widely used to collect large-scale log data, so it is very important for these systems to operate with a high level of performance. However, the current Hadoop-based big data system architecture has a problem in that its performance is low as a result of redundant processing. This paper solves this problem by improving the design of the Hadoop system architecture. The proposed architecture uses the batch-based data collection of the existing architecture in combination with a single processing method. A high level of performance can be achieved by analyzing the collected data directly in memory to avoid redundant processing. The proposed architecture guarantees system expandability, which is an advantage of using the Hadoop architecture. This paper confirms that the proposed architecture is approximately 30% to 35% faster in analyzing and processing data than existing architectures and that it is also extendable.

Analysis of Network Log based on Hadoop (하둡 기반 네트워크 로그 시스템)

  • Kim, Jeong-Joon;Park, Jeong-Min;Chung, Sung-Taek
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.5 / pp.125-130 / 2017
  • Field control equipment such as PLCs has no function for logging key event information, which makes accidents difficult to analyze. It is therefore necessary to log the main event information of field control equipment such as PLCs and IEDs, so that a cyber accident can be analyzed when it occurs. Event logging requires a protocol analyzer that can analyze the communication protocols of field control devices (embedded devices). However, conventional analyzers such as Wireshark have difficulty identifying and extracting data across the large variety of protocols involved, and with analyzing and classifying the payload data. In this paper, we developed a big data based system that extracts payload data from field control device communication protocols for large-scale event logging.

Implementation of Customer Behavior Evaluation System Using Real-time Web Log Stream Data (실시간 웹로그 스트림데이터를 이용한 고객행동평가시스템 구현)

  • Lee, Hanjoo;Park, Hongkyu;Lee, Wonsuk
    • The Journal of Korean Institute of Information Technology / v.16 no.12 / pp.1-11 / 2018
  • The online shopping market continues to grow rapidly, which makes it important to provide customized services based on the evaluation of customer behavior. Existing systems only provide analysis data on the profiles and behaviors of consumers, and their real-time processing is limited by disk-based mining. Applying them to web services that require real-time processing and analysis therefore raises problems of accuracy and system performance. The system proposed in this paper analyzes web click log streams generated in real time to calculate the concentration level for specific products, finds interested customers who are likely to purchase those products, and provides intensive promotions to them. We also verify the efficiency and accuracy of the proposed system.
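The abstract does not define how the concentration level is computed. As a minimal sketch only, the Python below assumes it is the share of a customer's most recent clicks that went to one product, kept in memory over a sliding window. The class name `ClickConcentration`, the window size, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


class ClickConcentration:
    """Tracks, per customer, the share of recent clicks that went to each product.

    Assumption: "concentration level" = clicks on a product / clicks in the window.
    """

    def __init__(self, window_size=100):
        self.window_size = window_size
        self.recent = {}  # customer -> deque of the most recent product ids
        self.counts = {}  # customer -> Counter of product ids inside the window

    def observe(self, customer, product):
        """Consume one click event from the stream."""
        window = self.recent.setdefault(customer, deque())
        counts = self.counts.setdefault(customer, Counter())
        window.append(product)
        counts[product] += 1
        if len(window) > self.window_size:
            # Evict the oldest click so memory per customer stays bounded.
            oldest = window.popleft()
            counts[oldest] -= 1
            if counts[oldest] == 0:
                del counts[oldest]

    def concentration(self, customer, product):
        """Fraction of the customer's windowed clicks that hit this product."""
        window = self.recent.get(customer)
        if not window:
            return 0.0
        return self.counts[customer][product] / len(window)

    def interested_customers(self, product, threshold=0.5):
        """Customers whose concentration on the product meets the threshold."""
        return sorted(
            c for c in self.recent if self.concentration(c, product) >= threshold
        )
```

Each click updates only one customer's window, so every event is handled in constant time without touching disk, which is the property the abstract contrasts with disk-based mining.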

The Analysis of the APT Prelude by Big Data Analytics (빅데이터 분석을 통한 APT공격 전조 현상 분석)

  • Choi, Chan-young;Park, Dea-woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.317-320 / 2016
  • The NH-NongHyup network and servers were paralyzed in 2011, the 3.20 cyber attack occurred in 2013, and classified documents of Korea Hydro & Nuclear Power Co. Ltd were leaked in December 2015. All of these attacks were conducted by a foreign country. Compared with script-kiddie attacks, they were planned over a long period, and the techniques used were complex and sophisticated. However, no successful solution for defending against APT attacks has been implemented thus far. We therefore use big data analytics to determine whether an APT attack has occurred, in order to defend against such attackers. This research is based on data collected through ISAC monitoring within Korea's three-level hierarchical defense system. First, we introduce related research on big data analytics and machine learning. Then, we design two big data analytics models to detect an APT attack and evaluate the models' accuracy and other results. Lastly, we present an effective response method for a detected APT attack.

A study on the MD&A Disclosure Quality in real-time calculated and provided By Programming Technology

  • Shin, YeounOuk
    • International Journal of Internet, Broadcasting and Communication / v.11 no.3 / pp.41-48 / 2019
  • The Management Discussion and Analysis (MD&A) gives investors an opportunity to gain insight into the company from a manager's perspective and enables short-term and long-term analysis of the business. MD&A is also an important channel through which companies and investors can communicate, and it is a useful source of information for analyzing financial statements. MD&A is measured by the quality of its disclosure, and many previous studies have examined the usefulness of disclosure information. For financial analysts, the representative group of information users in the capital market, it is therefore very important that MD&A disclosure quality be measured in real time with the help of information technology and provided to them in a timely manner. In this study, we propose a method in which MD&A disclosure is combined with information technology, converted into digitalized data in real time, and delivered to the financial analyst's information environment in real time. The real-time information provided by MD&A can support financial analysts' activities and reduce information asymmetry.

A Design and Development of Big Data Indexing and Search System using Lucene (루씬을 이용한 빅데이터 인덱싱 및 검색시스템의 설계 및 구현)

  • Kim, DongMin;Choi, JinWoo;Woo, ChongWoo
    • Journal of Internet Computing and Services / v.15 no.6 / pp.107-115 / 2014
  • Growing use of the internet, driven by social media, convergence among industries, and a variety of smart devices, has generated large and diverse types of data. Managing and analyzing these data with earlier data processing techniques is difficult because the volume is huge and the form of the data varies and evolves rapidly. A new approach is therefore needed. Many approaches to this issue are being studied, and we describe an effective design and development of an indexing engine for a big data platform. Our goal is to build a system that can effectively manage data sets too large for earlier data processing techniques and that can reduce data analysis time. We used large SNMP log data for an experiment and tried to reduce data analysis time through a fast indexing and searching approach. We also expect that our approach can help in analyzing user data through visualization of the analyzed data.
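The fast indexing and searching approach mentioned above rests on an inverted index, the structure Lucene builds internally. The sketch below is plain Python that illustrates the idea only; it does not use the Lucene API, and the class name `InvertedIndex`, the tokenizer, and the sample log lines in the usage note are hypothetical.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of log-line ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of lines containing it
        self.docs = {}                    # id -> original log line

    @staticmethod
    def tokenize(text):
        # Lower-case and split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        """Index one log line under every term it contains."""
        self.docs[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return the sorted ids of lines that contain every query term."""
        terms = self.tokenize(query)
        if not terms:
            return []
        result = None
        for term in terms:
            ids = self.postings.get(term, set())
            result = ids if result is None else result & ids
        return sorted(result)
```

For example, after `index.add(1, "router1 linkDown ifIndex 3")` and `index.add(2, "router2 linkUp ifIndex 3")`, the call `index.search("ifIndex 3")` intersects two posting sets instead of scanning every log line, which is where the reduction in analysis time comes from.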

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.93-110 / 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Various recommender systems have been developed based on automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in many domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors liked most. CF therefore works properly only when customers have provided a sufficient number of ratings on common products. When customer ratings are scarce, CF forms an inaccurate neighborhood, which results in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies have come to collect more data and to use a wide variety of large-scale information. Many companies now recognize the importance of utilizing big data because it helps them improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is an emerging issue, because personal big data allow the preferences and behaviors of users to be identified more accurately. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data in order to capture the preferences and behavior of users from various viewpoints. We derive five user profiles based on personal information: rating, site preference, demographic, Internet usage, and topic in text.
Next, the similarity between users is calculated from the profiles, and each user's neighbors are found from the results. One of three ensemble approaches is applied to calculate the similarity: the similarity of the combined profile, the average similarity of each profile, or the weighted average similarity of each profile. Finally, the products that the neighbors prefer most are recommended to the target users. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R was used to implement the proposed recommender system, and SAS E-miner was used to conduct the topic analysis using keyword search. To evaluate recommendation performance, we used 60% of the data for training and 40% for testing. 5-fold cross validation was also conducted to enhance the reliability of our experiments. For evaluation we employed the F1 metric, a widely used combined metric that gives equal weight to recall and precision. The proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: the improvement in F1 is 16.9 percent for the ensemble approach using weighted average similarity and 8.1 percent for the ensemble approach using the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively.
However, our methodology should be studied further with a view to real-world application. We need to compare the differences in recommendation accuracy when the proposed method is applied to different recommendation algorithms, and then identify which combination shows the best performance.
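The three ensemble approaches named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation (which used R): each profile is assumed to be a numeric vector, cosine similarity is assumed as the per-profile measure, and the function names, profile names, and weights are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_similarity(profiles_a, profiles_b):
    """Approach 1: concatenate all profiles into one vector, then compare once."""
    names = sorted(profiles_a)
    vec_a = [x for name in names for x in profiles_a[name]]
    vec_b = [x for name in names for x in profiles_b[name]]
    return cosine(vec_a, vec_b)


def average_similarity(profiles_a, profiles_b):
    """Approach 2: compare profile by profile, then take the plain mean."""
    sims = [cosine(profiles_a[n], profiles_b[n]) for n in profiles_a]
    return sum(sims) / len(sims)


def weighted_similarity(profiles_a, profiles_b, weights):
    """Approach 3: compare profile by profile, then take a weighted mean."""
    total = sum(weights[n] for n in profiles_a)
    return sum(
        weights[n] * cosine(profiles_a[n], profiles_b[n]) for n in profiles_a
    ) / total
```

With two users who agree fully on a `rating` profile and not at all on a `topic` profile, the plain average is 0.5, while weights of 3 and 1 raise it to 0.75. How the weights were chosen in the paper is not stated in the abstract, so they are left as an input here.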

Security Log collection and analysis System Design Using Big Data System (빅 데이터 시스템을 이용한 보안 로그 수집 및 분석 시스템 설계)

  • Kim, Du-Hoe;Shin, Dong-Kyoo;Shin, Dong-Il
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.321-323 / 2016
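  • As new technologies such as SNS, cloud services, and IoT have developed, interest in personal information protection and security has grown. As a result, building security solutions to protect customer information has become indispensable for companies. To meet this need, a security management system called ESM emerged, and the trend has recently been shifting toward SIEM. Because SIEM relies on administrators monitoring logs, it is limited when a large volume of logs is generated or when accumulated logs must be analyzed. This paper therefore proposes an automated system that accumulates logs using a big data system and analyzes the accumulated logs using Mahout.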

A Study on the Measurement of Voluntary Disclosure Quality Using Real-Time Disclosure By Programming Technology

  • Shin, YeounOuk;Kim, KiBum
    • International journal of advanced smart convergence / v.7 no.2 / pp.86-94 / 2018
  • This study presents an IT program module that provides real-time forecasting and a database of the voluntary disclosure quality measure, in order to address the cost-of-capital problem caused by information asymmetry between external investors and corporate executives. It suggests an algorithmic model by which an IT program can provide the quality of real-time voluntary disclosure to all investors immediately, in order to deliver meaningful value in the domestic capital market. The method generates and analyzes real-time or non-real-time prediction models by transferring the predicted estimates, delivered to the Big Data Log Analysis System through the statistical DB, to the statistical forecasting engine.

Hacking Detection Mechanism of Cyber Attacks Modeling (외부 해킹 탐지를 위한 사이버 공격 모델링)

  • Cheon, Yang-Ha
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.9 / pp.1313-1318 / 2013
  • To respond actively to cyber attacks, it is preferable to deploy not only security systems such as IDS, IPS, and firewalls but also ESM, a system that detects cyber attacks by analyzing various log data. However, as attacks become more elaborate and advanced, existing signature-based detection methods are reaching their limits. In response, research on symptom detection technology, which models attacks by means of big data analysis, is actively under way. Symptom detection is effective when it can accurately extract the features of attacks and use them to build the attack model. We propose ways to extract the attack features that serve as the basis of the modeling, and to detect intelligent threats through scenario-based modeling.
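The abstract does not give the details of the scenario-based modeling. One simple reading, sketched below as an assumption rather than the paper's method, treats a scenario as an ordered list of event types and flags any source whose log events complete those steps in order within a time window. The function name, event type names, and window length are hypothetical.

```python
from collections import defaultdict


def detect_scenario(events, scenario, window_seconds):
    """Return the sorted sources whose events complete every scenario step in order.

    events: iterable of (timestamp, source, event_type), sorted by timestamp.
    scenario: ordered list of event types that make up the attack scenario.
    window_seconds: maximum time allowed between the first and last step.
    """
    if not scenario:
        return []
    progress = defaultdict(lambda: (0, None))  # source -> (next step, start time)
    matched = set()
    for timestamp, source, event_type in events:
        step, start = progress[source]
        # Drop a partial match that has outlived the time window.
        if start is not None and timestamp - start > window_seconds:
            step, start = 0, None
        if event_type == scenario[step]:
            if step == 0:
                start = timestamp
            step += 1
            if step == len(scenario):
                matched.add(source)
                step, start = 0, None
        progress[source] = (step, start)
    return sorted(matched)
```

Unlike a signature that matches a single log line, this keeps per-source state across events, so unrelated events interleaved between the steps do not hide the sequence. The matcher is greedy and deliberately simple: a repeated first step does not restart the window.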