• Title/Summary/Keyword: HADOOP

Design of Splunk Platform based Big Data Analysis System for Objectionable Information Detection (Splunk 플랫폼을 활용한 유해 정보 탐지를 위한 빅데이터 분석 시스템 설계)

  • Lee, Hyeop-Geon;Kim, Young-Woon;Kim, Ki-Young;Choi, Jong-Seok
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.1 / pp.76-81 / 2018
  • The Internet of Things (IoT), which is emerging as a future economic growth engine, has been actively introduced into areas close to our daily lives. However, IoT security threats remain to be resolved. In particular, with the spread of smart homes and smart cities, an explosive number of closed-circuit televisions (CCTVs) have been installed. The Internet protocol (IP) information and even the port numbers assigned to CCTVs are open to the public via web-portal search engines and social media platforms such as Facebook and Twitter, and with simple tools this information can easily be exploited. For this reason, a big-data analytics system is needed that can support quick responses to data potentially containing security risk factors, or to illegal websites that may cause social problems, by helping to analyze the data collected from the search engines and social media platforms frequently used by Internet users, as well as data on illegal websites.

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructures for building a driving-environment analysis platform that uses various kinds of big data, such as vehicle sensing data and public data. First, on the hardware side, a small platform server with a parallel structure for distributed big data processing was developed. Next, on the software side, programs for big data collection/storage, processing/analysis, and information visualization were developed. The collection software was implemented as a collection interface using Kafka, Flume, and Sqoop. The storage software was divided into the Hadoop distributed file system and a Cassandra DB according to how the data are used. The processing software performs spatial-unit matching and time-interval interpolation/aggregation of the collected data by applying the grid index method. The analysis software was developed as an analytical tool based on the Zeppelin notebook for applying and evaluating the developed algorithms. Finally, the information visualization software was developed as a Web GIS engine program for providing and visualizing various kinds of driving-environment information. The performance evaluation derived the optimal number of executors, memory capacity, and number of cores for the development server, and the computation performance was superior to that of other cloud computing environments.
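As a rough illustration of the grid index method mentioned above (not the authors' implementation), the following Python sketch assigns vehicle sensing records and road links to grid cells and matches records that fall into the same cell; the cell size and record layout are assumed.

```python
# Illustrative sketch (assumptions: cell size, record layout): grid-index
# matching that groups GPS records and road links sharing a spatial cell.
from collections import defaultdict

CELL_SIZE = 0.001  # assumed grid resolution in degrees (roughly 100 m)

def grid_key(lat, lon, cell=CELL_SIZE):
    """Map a coordinate to an integer grid-cell index."""
    return (int(lat // cell), int(lon // cell))

def match_by_grid(sensor_records, road_links):
    """Pair vehicle records with road links that fall into the same cell."""
    index = defaultdict(list)
    for link_id, lat, lon in road_links:
        index[grid_key(lat, lon)].append(link_id)
    matched = []
    for rec_id, lat, lon, speed in sensor_records:
        for link_id in index.get(grid_key(lat, lon), []):
            matched.append((rec_id, link_id, speed))
    return matched
```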

Design of a MapReduce-Based Mobility Pattern Mining System for Next Place Prediction (다음 장소 예측을 위한 맵리듀스 기반의 이동 패턴 마이닝 시스템 설계)

  • Kim, Jongwhan;Lee, Seokjun;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.3 no.8 / pp.321-328 / 2014
  • In this paper, we present a MapReduce-based mobility pattern mining system that can efficiently predict the next place of mobile users. It learns each user's mobility pattern model, represented by a Hidden Markov Model (HMM), from a large-scale trajectory dataset, and then predicts the next place the user will visit by applying the learned model to the current trajectory. Our system consists of two parts: the back-end, in which the mobility pattern models are learned for individual users, and the front-end, where the next place for a given user is predicted based on those models. The back-end comprises three distinct MapReduce modules for POI extraction, trajectory transformation, and mobility pattern model learning, while the front-end has two modules for candidate route generation and next place prediction. The map and reduce functions of each module were designed to fully utilize the underlying Hadoop infrastructure and maximize parallel processing. We evaluated the proposed system on a large-scale open benchmark dataset, GeoLife, and the experimental results confirmed its high performance.
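A minimal Hadoop Streaming-style sketch in Python, in the spirit of the POI extraction MapReduce module (not the authors' code): the mapper emits a coarse location cell per trajectory point and the reducer counts visits per user and cell. The tab-separated input layout is assumed.

```python
# Illustrative sketch (assumed input: user_id, lat, lon, timestamp per line).
import sys

def mapper():
    for line in sys.stdin:
        user, lat, lon, _ts = line.rstrip("\n").split("\t")
        # Coarse location cell as a stand-in for stay-point detection.
        cell = f"{round(float(lat), 3)},{round(float(lon), 3)}"
        print(f"{user}\t{cell}\t1")

def reducer():
    counts = {}
    for line in sys.stdin:
        user, cell, n = line.rstrip("\n").split("\t")
        counts[(user, cell)] = counts.get((user, cell), 0) + int(n)
    for (user, cell), n in counts.items():
        print(f"{user}\t{cell}\t{n}")

if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "map":
        mapper()
    else:
        reducer()
```

With Hadoop Streaming, such a script would be passed via the -mapper and -reducer options (file name here is hypothetical).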

Digital Forensic Investigation of HBase (HBase에 대한 디지털 포렌식 조사 기법 연구)

  • Park, Aran;Jeong, Doowon;Lee, Sang Jin
    • KIPS Transactions on Computer and Communication Systems / v.6 no.2 / pp.95-104 / 2017
  • As smart device technology grows and Social Network Services (SNS) become more common, data that are difficult to process with existing RDBMSs are increasing. As a result, NoSQL databases are becoming popular as an alternative for processing massive, unstructured data generated in real time. Although database forensic techniques have been researched mainly for RDBMSs, demand for digital investigation techniques for NoSQL databases is growing as more businesses introduce them into their systems. New digital forensic investigation techniques are needed because a NoSQL database has no schema to normalize and its storage method differs depending on the type of database and the operating environment. Research on document-based NoSQL databases has been done, but it is not directly applicable to other types of NoSQL databases. Therefore, this paper presents the operation method and data model of HBase, a column-based NoSQL database, along with methods for identifying its operating environment, collecting and analyzing artifacts, and recovering deleted data. The proposed digital forensic investigation technique for HBase is also verified with an experimental scenario.
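As a hedged illustration of collecting artifacts from a live HBase instance (not the paper's procedure), the sketch below scans a table through the Thrift gateway using the happybase client; the host, port, and table name are assumptions.

```python
# Illustrative sketch (assumed host/table): dump current rows from an HBase
# table via Thrift for later analysis.
import happybase

connection = happybase.Connection("localhost", port=9090)  # assumed Thrift server
table = connection.table("sns_messages")                   # assumed table name

for row_key, columns in table.scan():
    print(row_key, {k.decode(): v.decode(errors="replace")
                    for k, v in columns.items()})

connection.close()
# Note: a live scan returns only current cell versions; recovering deleted
# data, as the paper discusses, requires examining HFiles and WAL files.
```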

A MapReduce-based kNN Join Query Processing Algorithm for Analyzing Large-scale Data (대용량 데이터 분석을 위한 맵리듀스 기반 kNN join 질의처리 알고리즘)

  • Lee, HyunJo;Kim, TaeHoon;Chang, JaeWoo
    • Journal of KIISE / v.42 no.4 / pp.504-511 / 2015
  • Recently, the amount of data has been increasing rapidly with the popularity of SNS and the development of mobile technology, so effective analysis schemes for such large amounts of data have been actively studied. One typical scheme is a Voronoi-diagram-based kNN join algorithm (VkNN-join) using MapReduce. For two datasets R and S, VkNN-join can reduce the join query processing time on big data because it selects the corresponding subset Sj for each Ri and processes the query only with them. However, VkNN-join requires a high computational cost for constructing the Voronoi diagram. Moreover, its computational overhead is high because the number of candidate cells increases as the value of k increases. To solve these problems, we propose a MapReduce-based kNN-join query processing algorithm for analyzing large amounts of data. Using seed-based dynamic partitioning, our algorithm reduces the overhead of constructing the index structure. It also reduces the computational overhead of finding candidate partitions by selecting the corresponding partitions using the average distance between two seeds. We show that our algorithm outperforms the existing scheme in terms of query processing time.
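A simplified Python sketch of the seed-based partitioning idea (not the authors' MapReduce implementation): points are assigned to their nearest seed, and candidate partitions for a query partition are chosen by the distance between seeds; seed selection and the number of candidates are assumed.

```python
# Illustrative sketch (assumptions: 2-D points, pre-chosen seeds, candidate count).
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def partition_by_seeds(points, seeds):
    """Assign each point to the index of its nearest seed."""
    parts = {i: [] for i in range(len(seeds))}
    for p in points:
        i = min(range(len(seeds)), key=lambda i: dist(p, seeds[i]))
        parts[i].append(p)
    return parts

def candidate_partitions(query_seed_idx, seeds, num_candidates=3):
    """Pick the partitions whose seeds lie closest to the query partition's seed."""
    order = sorted(range(len(seeds)),
                   key=lambda i: dist(seeds[query_seed_idx], seeds[i]))
    return order[:num_candidates]
```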

Irregular Bigdata Analysis and Considerations for Civil Complaint Based on Design Thinking (비정형 빅데이터 분석 및 디자인씽킹을 활용한 민원문제 해결에 대한 고찰)

  • Kim, Tae-Hyung;Park, Byung-Jae;Suh, Eung-Kyo
    • The Journal of Industrial Distribution & Business / v.9 no.8 / pp.51-60 / 2018
  • Purpose - Civil complaints are increasing in various forms, but the civil servants who handle them lack the time to reduce complaints and to extract keywords that would help in the future. While various ideas are presented and implemented as policies for resolving civil affairs, many of them are not policies that citizens can sympathize with. Therefore, it is necessary to analyze complaints accurately and to present correct solutions based on the analyzed civil complaint data. Research design, data, and methodology - We analyzed the complaint data of the last three years to find ways to solve the problems of Yongin City and to alleviate the burden on civil servants. To do this, the Hadoop platform and the Design Thinking process were reviewed, and a new process fusing them was proposed. The big data analysis stage consists of civil complaint data extraction, civil data analysis, and categorization by keyword and year; by analyzing these, the needs of citizens were identified. The insight-derivation stage consists of innovation case study, idea derivation, idea evaluation, prototyping, and case analysis. Results - Through this process, a creative idea was discovered: providing free transportation cards to address the major issues of construction, apartment, installation, and vehicle problems. How to provide such services to specific areas remains a concrete problem, but there is a pressing need for a policy that contributes as much as possible to citizens who are currently suffering from various problems. Conclusions - In the past, free transportation cards were issued mainly to the elderly or the disabled. In other countries, free transportation cards may also be given to visitors from other areas who stay for accommodation. In such cases, the local government can set up a framework that presents a win-win scenario in various ways. Future studies should reorganize the process so that the actual solution is adopted, civil complaints are reduced, future policy-making is supported, and the approach can be applied in other cities as well.
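As a small, hypothetical illustration of the keyword categorization step (not from the paper), the following Python snippet counts assumed complaint keywords in sample complaint texts.

```python
# Illustrative sketch (assumed keywords and sample complaint texts).
from collections import Counter

complaints = [
    "Construction noise near the apartment complex continues at night",
    "Illegal vehicle parking blocks the fire lane",
    "Request for installation of a crosswalk signal",
]
keywords = ["construction", "apartment", "installation", "vehicle"]

counts = Counter()
for text in complaints:
    lowered = text.lower()
    for kw in keywords:
        if kw in lowered:
            counts[kw] += 1

print(counts.most_common())  # e.g. [('construction', 1), ('apartment', 1), ...]
```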

Bio-Sensing Convergence Big Data Computing Architecture (바이오센싱 융합 빅데이터 컴퓨팅 아키텍처)

  • Ko, Myung-Sook;Lee, Tae-Gyu
    • KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.43-50 / 2018
  • Biometric information computing, based on bio-information systems that combine bio-signal sensors with bio-information processing, is greatly influencing both computing systems and big data systems. Unlike conventional data formats such as text, images, and video, biometric information takes several forms: text-based values that give meaning to a bio-signal, image formats that store important event moments, and complex formats such as video that are constructed for prediction and analysis through time-series analysis. Such a complex data structure may be requested separately as text, image, or video depending on the characteristics of the data required by an individual biometric information application service, or the formats may be requested simultaneously depending on the situation. Because previous bio-information processing systems depend on conventional computing components, structures, and data processing methods, they have many inefficiencies in terms of data processing performance, transmission capability, storage efficiency, and system safety. In this study, we propose an improved bio-sensing convergence big data computing architecture as a platform that effectively supports biometric information processing. The proposed architecture effectively supports data storage and transmission efficiency, computing performance, and system stability, and it can lay the foundation for system implementation and service optimization in future biometric information computing.

Data Processing Architecture for Cloud and Big Data Services in Terms of Cost Saving (비용절감 측면에서 클라우드, 빅데이터 서비스를 위한 대용량 데이터 처리 아키텍쳐)

  • Lee, Byoung-Yup;Park, Jae-Yeol;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.15 no.5 / pp.570-581 / 2015
  • In recent years, many institutions have predicted that cloud services and big data will be popular IT trends in the near future, and a number of leading IT vendors are focusing on practical solutions and services for them. The cloud has the advantage of unrestricted selection of resources for a business model based on a variety of Internet-based technologies, which is why provisioning and virtualization technologies for active resource expansion have attracted attention above all other technologies. Big data took data prediction models to another level by providing the basis for analyzing unstructured data that could not be analyzed in the past. Since what cloud services and big data have in common are services and analysis based on massive amounts of data, the efficient operation and design of mass data have been critical issues from the early stages of development. Thus, in this paper, we establish a data processing architecture based on the technological requirements of mass data for cloud and big data services. In particular, we introduce the requirements that a distributed file system must meet to participate in cloud computing, the requirements for efficient compression of mass data for big data and cloud computing in terms of cost saving, and the technological requirements of open-source systems, such as the Hadoop ecosystem's distributed file system and memory databases, that are available in cloud computing.
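To make the cost-saving role of compression concrete, here is a minimal sketch (not from the paper) comparing Python's standard-library codecs on a repetitive, log-like payload; the sample data and the choice of codecs are assumptions.

```python
# Illustrative sketch (assumed sample payload): compare stdlib compression codecs.
import bz2, gzip, lzma

payload = b"2015-05-01,sensor-17,23.4;" * 10_000  # repetitive, log-like data

for name, compress in [("gzip", gzip.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    out = compress(payload)
    print(f"{name:5s} {len(out):8d} bytes "
          f"({len(out) / len(payload):.1%} of original)")
```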

Energy Big Data Pre-processing System for Energy New Industries (에너지신산업을 위한 에너지 빅데이터 전처리 시스템)

  • Yang, Soo-Young;Kim, Yo-Han;Kim, Sang-Hyun;Kim, Won-Jung
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.5 / pp.851-858 / 2021
  • Due to the increase in renewable energy and distributed resources, not only traditional data but also various energy-related data are being generated in the new energy industry: renewable energy facility and power generation data, system operation data, metering and rate-related data, as well as the weather and energy-efficiency data needed for new services and analysis. Energy big data processing technology can systematically analyze and diagnose the data generated across the power production and consumption infrastructure, including distributed resources, systems, and AMI, and in doing so it can support the creation of new businesses at the convergence of the ICT and energy industries. To this end, research is needed on the data analysis system, including itemized characteristic analysis of the collected data, correlation sampling, categorization of each feature, and element definition. Research on data purification technology for handling data loss and abnormal states should also be conducted, and NiFi, Spark, and HDFS systems need to be developed and structured so that energy data can be stored and managed in real time. This study proposes the overall energy data processing technology and system for the various power transactions described above.
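A minimal PySpark sketch of the purification step described above (not the authors' system): it drops records with missing values and filters abnormal meter readings before writing the cleaned data back to HDFS. The paths, column names, and thresholds are assumed.

```python
# Illustrative sketch (assumed paths, columns, and thresholds).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("energy-preprocess").getOrCreate()

raw = spark.read.csv("hdfs:///energy/ami_raw.csv", header=True, inferSchema=True)

cleaned = (raw
           .dropna(subset=["meter_id", "kwh"])                      # handle data loss
           .filter((F.col("kwh") >= 0) & (F.col("kwh") < 10_000)))  # drop abnormal states

cleaned.write.mode("overwrite").parquet("hdfs:///energy/ami_clean")
```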

Design and Implementation of a Real-time Web Crawling Distributed Monitoring System (실시간 웹 크롤링 분산 모니터링 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.9 no.1 / pp.45-53 / 2019
  • In this rapidly changing information era, we face problems from the excessive information served by websites: little of it is useful, much of it is useless, and we spend a lot of time selecting the information we need. Many websites, including search engines, use web crawling to keep their data up to date. Web crawling is usually used to generate copies of all the pages of visited sites, and search engines index those pages for faster searching. For collecting wholesale and order information that changes in real time, however, keyword-oriented web data collection is not adequate, and no alternative for the selective real-time collection of web information has been suggested. In this paper, we propose a method of collecting information from a restricted set of websites using a real-time web crawling distributed monitoring system (R-WCMS), estimating the collection time through detailed analysis of the data, and storing the data in a parallel system. Experimental results show that applying the proposed model to website information retrieval reduces collection time by 15-17%.
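A small Python sketch in the spirit of monitoring a restricted list of target pages rather than crawling by keyword (not the R-WCMS implementation): it fetches an assumed set of URLs in parallel and records the per-page collection time.

```python
# Illustrative sketch (assumed target URLs and worker count).
import time
from concurrent.futures import ThreadPoolExecutor

import requests

TARGETS = [
    "https://example.com/wholesale/prices",
    "https://example.com/orders/today",
]

def fetch(url):
    start = time.monotonic()
    resp = requests.get(url, timeout=10)
    elapsed = time.monotonic() - start
    return url, resp.status_code, len(resp.content), elapsed

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        for url, status, size, elapsed in pool.map(fetch, TARGETS):
            print(f"{url} status={status} bytes={size} took={elapsed:.2f}s")
```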