• Title/Summary/Keyword: Big Data Processing

A study on the security policy improvement using the big data (빅데이터를 이용한 보안정책 개선에 관한 연구)

  • Kim, Song-Young;Kim, Joseph;Lim, Jong-In;Lee, Kyung-Ho
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.5 / pp.969-976 / 2013
  • Corporate information protection systems are intended to detect weak points, intrusions, and document leaks; every action taken inside the company can be recorded and audited continuously. On the other hand, analyzing the security logs these systems generate is becoming more difficult. Most staff who manage the security systems and analyze the logs understand the information flows of field operations and the management procedures for critical information less well than the users, or the leakers, do. This reality makes the problem of internal information leakage even more serious. Recently, research on big data has been proceeding actively, and successful cases have been announced in various areas. This study presents improved big data processing techniques and a case study of their application to the security field.

A GPU-enabled Face Detection System in the Hadoop Platform Considering Big Data for Images (이미지 빅데이터를 고려한 하둡 플랫폼 환경에서 GPU 기반의 얼굴 검출 시스템)

  • Bae, Yuseok;Park, Jongyoul
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.20-25 / 2016
  • With the advent of the era of digital big data, the Hadoop platform has become widely used in various fields. However, the Hadoop MapReduce framework suffers from the growth of the name node's main memory usage and of the number of map tasks when processing a large number of small files. In addition, a method for running C++-based tasks in the MapReduce framework is required in order to exploit GPUs, which support hardware-based data parallelism, within MapReduce. Therefore, in this paper, we present a face detection system that packs images into a sequence file to process image big data on the Hadoop platform. The system also runs GPU-based face detection tasks in the MapReduce framework using Hadoop Pipes. We demonstrate a performance increase of around 6.8-fold compared to a single CPU process.
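
The small-files mitigation described above, packing many images into a single Hadoop sequence file, can be sketched without Hadoop itself. The following minimal Python sketch is an illustration of the idea only: the `pack_images`/`unpack_images` names and the length-prefixed record layout are assumptions, not Hadoop's actual SequenceFile format.

```python
import struct

def pack_images(images):
    """Pack {name: bytes} pairs into one container, sequence-file style.

    Each record is: key length, key, value length, value. One large
    container replaces many small files, so a name-node-like index
    tracks a single object instead of thousands.
    """
    out = bytearray()
    for name, data in images.items():
        key = name.encode("utf-8")
        out += struct.pack(">I", len(key)) + key
        out += struct.pack(">I", len(data)) + data
    return bytes(out)

def unpack_images(blob):
    """Stream (name, bytes) records back out of the container."""
    images, pos = {}, 0
    while pos < len(blob):
        (klen,) = struct.unpack_from(">I", blob, pos); pos += 4
        name = blob[pos:pos + klen].decode("utf-8"); pos += klen
        (vlen,) = struct.unpack_from(">I", blob, pos); pos += 4
        images[name] = blob[pos:pos + vlen]; pos += vlen
    return images
```

A mapper would then receive one packed container (or a split of it) instead of thousands of tiny files.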

A Method for Compound Noun Extraction to Improve Accuracy of Keyword Analysis of Social Big Data

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.55-63 / 2021
  • Since social big data often includes new words and proper nouns, statistical morphological analysis methods, based on the frequency of occurrence of each word, have been widely used to process it. However, these methods do not properly recognize compound nouns, which lowers the accuracy of keyword extraction. This paper presents a method to extract compound nouns in keyword analysis of social big data. The proposed method creates a candidate set of compound nouns by combining words obtained in the morphological analysis step, and extracts compound nouns by examining their frequency of appearance in a given review. Two algorithms are proposed, differing in how the candidate set is constructed, and the performance of each is expressed and compared with formulas. The comparison is verified through experiments on real data collected online, whose results also show that the proposed method is suitable for real-time processing.
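
The candidate-then-frequency idea in the abstract can be sketched in a few lines. This is a simplified illustration only: real morphological analysis and the paper's two candidate-construction algorithms are not reproduced; adjacent nouns stand in for the candidate set.

```python
from collections import Counter

def extract_compound_nouns(tokenized_reviews, min_freq=2):
    """Form candidate compound nouns from adjacent nouns across reviews
    and keep those whose frequency of appearance reaches a threshold."""
    candidates = Counter()
    for nouns in tokenized_reviews:
        for left, right in zip(nouns, nouns[1:]):
            candidates[left + " " + right] += 1
    return {c for c, freq in candidates.items() if freq >= min_freq}
```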

Suggestion of BigData Processing System for Enhanced Data Processing on ETL (ETL 상에서 처리속도 향상을 위한 빅데이터 처리 시스템 제안)

  • Lee, Jung-Been;Park, Seok-Cheon;Kil, Gi-Beom;Chun, Seung-Tea
    • Annual Conference of KIPS / 2015.04a / pp.170-171 / 2015
  • With the recent exponential growth of digital information, large-scale data known as big data has emerged. Big data is generated in real time at very high speed, takes diverse forms, and yields new knowledge through its collection, processing, and analysis. However, existing ETL (Extract/Transform/Load) approaches suffer performance degradation when processing such big data and cannot manage unstructured data. Therefore, to overcome the limitations of conventional ETL processing, this paper proposes a big data processing system that uses Hadoop to raise processing speed on the ETL path and to handle unstructured data.
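
The Extract/Transform/Load flow the abstract contrasts with Hadoop-based processing can be sketched as three composable steps. The function names below are illustrative assumptions; in a Hadoop-based version the transform step would be distributed across mappers rather than run in one process.

```python
import json

def extract(raw_lines):
    """Extract: parse heterogeneous input; lines that are not JSON are
    wrapped as unstructured records instead of being dropped."""
    for line in raw_lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            yield {"raw_text": line}  # keep unstructured data

def transform(records):
    """Transform: normalize keys (this is the step Hadoop parallelizes)."""
    for rec in records:
        yield {k.lower(): v for k, v in rec.items()}

def load(records):
    """Load: collect into the target store (a list stands in for it)."""
    return list(records)

def etl(raw_lines):
    return load(transform(extract(raw_lines)))
```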

Location Recommendation Customize System Using Opinion Mining (오피니언마이닝을 이용한 사용자 맞춤 장소 추천 시스템)

  • Choi, Eun-jeong;Kim, Dong-keun
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.11 / pp.2043-2051 / 2017
  • Recently, along with increased interest in the big data field, interest in application fields based on big data processing is also growing. Opinion mining is a big data processing technique widely used to provide personalized services to users. Based on this, in this paper, users' textual reviews of places are processed with opinion mining, and user sentiment is analyzed through k-means clustering. Users whose sentiment falls into the same cluster are assigned the same numerical value. We propose a method that predicts preferences using a collaborative filtering recommendation system with the assigned values, shows recommended content to users in descending order of predicted score, and marks the recommended places on a map.
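
The clustering step, grouping users by sentiment score so that each cluster's members share one numerical value, can be sketched with a tiny one-dimensional k-means. This assumes the opinion-mining stage has already reduced each user's reviews to a score; the collaborative filtering stage is not shown.

```python
def kmeans_1d(scores, k, iters=20):
    """Cluster 1-D sentiment scores with plain k-means and return, for
    each score, the index of its cluster."""
    # Spread initial centroids across the observed range.
    lo, hi = min(scores), max(scores)
    centroids = [lo + (hi - lo) * i / max(k - 1, 1) for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for s in scores:
            nearest = min(range(k), key=lambda c: abs(s - centroids[c]))
            buckets[nearest].append(s)
        # Move each centroid to the mean of its bucket (keep it if empty).
        centroids = [sum(b) / len(b) if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return [min(range(k), key=lambda c: abs(s - centroids[c])) for s in scores]
```

Users in the same cluster would then feed the same value into the collaborative filtering model.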

An Optimization Method for the Calculation of SCADA Main Grid's Theoretical Line Loss Based on DBSCAN

  • Cao, Hongyi;Ren, Qiaomu;Zou, Xiuguo;Zhang, Shuaitang;Qian, Yan
    • Journal of Information Processing Systems / v.15 no.5 / pp.1156-1170 / 2019
  • In recent years, the problem of data drift in the smart grid caused by manual operation has been widely studied by researchers in related areas. Effectively and reliably finding the reasonable data needed by the Supervisory Control and Data Acquisition (SCADA) system has become an important research topic. This paper analyzes the data composition of the smart grid and explains the power model in two smart grid applications, followed by an analysis of each parameter of the density-based spatial clustering of applications with noise (DBSCAN) algorithm. A comparison is then carried out among the boxplot method, the probability weight analysis method, and the DBSCAN clustering algorithm on big-data-driven power grid data. According to the comparison results, the DBSCAN algorithm outperforms the other methods. Experimental verification shows that the DBSCAN clustering algorithm can effectively screen power grid data, thereby significantly improving the accuracy and reliability of the calculated theoretical line loss of the main grid.
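
The screening idea, keeping dense clusters of plausible measurements and flagging isolated points as noise, can be sketched with a minimal DBSCAN over 2-D points. This is a generic sketch of the algorithm, not the paper's SCADA-specific parameterization.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point: -1 for noise,
    otherwise a cluster id. Neighborhoods use Euclidean distance."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps ** 2]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # provisionally noise
            continue
        cluster += 1                  # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # noise reached later is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # expand only from core points
    return labels
```

In the paper's setting, points surviving with a cluster label would be kept for the line-loss calculation and `-1` points screened out.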

An Analysis of Big Video Data with Cloud Computing in Ubiquitous City (클라우드 컴퓨팅을 이용한 유시티 비디오 빅데이터 분석)

  • Lee, Hak Geon;Yun, Chang Ho;Park, Jong Won;Lee, Yong Woo
    • Journal of Internet Computing and Services / v.15 no.3 / pp.45-52 / 2014
  • The Ubiquitous-City (U-City) is a smart, intelligent city that satisfies the desire to enjoy IT services with any device, anytime, anywhere. It is a future city model based on the Internet of Everything or Things (IoE or IoT). It includes many networked video cameras, which, together with sensors, serve as one of the main input sources for many U-City services. They continuously generate a huge amount of video information: real big data for the U-City. The U-City is usually required to manipulate this big data in real time, which is not easy at all. It is also often necessary to analyze the accumulated video data to detect an event or find a person, which demands a lot of computational power and usually takes a long time. Research efforts to reduce the processing time of big video data exist, and cloud computing is a good way to address this matter. Among the many applicable cloud computing methodologies, MapReduce is an attractive one: it has many advantages and is gaining popularity in many areas. Video cameras evolve day by day and their resolution improves sharply, leading to exponential growth of the data produced by networked video cameras; we face real big data when dealing with video produced by high-quality cameras. Video surveillance systems were of limited use until cloud computing, but are now being widely deployed in U-Cities thanks to these methodologies. However, because video data are unstructured, good research results on analyzing them with MapReduce are hard to find. This paper presents an analysis system for video surveillance, a cloud-computing-based video data management system that is easy to deploy, flexible, and reliable.
It consists of the video manager, the video monitors, the storage for the video images, the storage client, and the streaming-IN component. The "video monitor" consists of the "video translator" and the "protocol manager", and the "storage" contains the MapReduce analyzer. All components were designed according to the functional requirements of a video surveillance system. The "streaming IN" component receives video data from the networked video cameras and delivers it to the "storage client"; it also manages network bottlenecks to smooth the data stream. The "storage client" receives the video data from the "streaming IN" component, stores it in the storage, and helps other components access the storage. The "video monitor" component streams the video data smoothly and manages the protocols: the "video translator" sub-component lets users manage the resolution, codec, and frame rate of the video images, while the "protocol manager" sub-component handles the Real Time Streaming Protocol (RTSP) and the Real Time Messaging Protocol (RTMP). We use the Hadoop Distributed File System (HDFS) as the cloud computing storage: Hadoop stores the data in HDFS and provides a platform that can process the data with the simple MapReduce programming model. We suggest our own methodology for analyzing the video images using MapReduce; the workflow of video analysis is presented and explained in detail in this paper. A performance evaluation was conducted, and we found that our proposed system worked well; the results are presented with analysis. On our cluster system, we used compressed 1920×1080 (FHD) resolution video data, the H.264 codec, and HDFS as video storage, and measured the processing time according to the number of frames per mapper. 
Tracing the optimal split size of the input data and the processing time according to the number of nodes, we found that the system performance scales linearly.
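
The frames-per-mapper workflow described above can be illustrated with an in-process MapReduce skeleton. This is a sketch of the programming model only: the "frames" below are plain dicts with a pre-computed face count standing in for detection; a real system would decode H.264 frames and run a detector inside the mapper.

```python
from collections import defaultdict

def mapreduce(records, mapper, reducer):
    """In-process MapReduce skeleton: map every record, group the emitted
    key/value pairs by key (Hadoop's shuffle), then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

def detect_mapper(frame):
    # Hypothetical per-frame analysis: emit detected faces and a frame count.
    yield ("faces", frame["faces"])
    yield ("frames", 1)

def sum_reducer(key, values):
    return sum(values)
```

With this shape, changing the number of frames handled by each mapper call corresponds to the splitting-size experiment the abstract describes.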

A Development on a Predictive Model for Buying Unemployment Insurance Program Based on Public Data (공공데이터 기반 고용보험 가입 예측 모델 개발 연구)

  • Cho, Minsu;Kim, Dohyeon;Song, Minseok;Kim, Kwangyong;Jeong, Chungsik;Kim, Kidae
    • The Journal of Bigdata / v.2 no.2 / pp.17-31 / 2017
  • With the development of the big data environment, public institutions have also been providing big data infrastructure. Public data is a typical example, and numerous applications using it have been provided. One such case concerns employment insurance: all employers must enroll all employees in employment insurance to protect their rights, yet there are abundant cases where employers avoid buying the insurance. Overcoming this challenge requires a data-driven approach; however, methodologies to integrate, manage, and analyze the public data are lacking. In this paper, we propose a methodology for building a predictive model that identifies, based on public data, whether employers have made employment insurance contracts. The methodology covers collection, integration, pre-processing, and analysis of data, and generates prediction models based on process mining and data mining techniques. We also verify the methodology with case studies.
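
The integration step of such a methodology, joining employer records from separate public datasets before prediction, can be sketched as a key-based merge. All field names (`business_id`, `employees`) and the rule-based `suspect` flag below are hypothetical stand-ins for the paper's actual datasets and mined model.

```python
def integrate(business_registry, insurance_records):
    """Join two public datasets on a shared business id and flag
    employers that report employees but have no insurance record."""
    insured_ids = {rec["business_id"] for rec in insurance_records}
    merged = []
    for biz in business_registry:
        merged.append({
            **biz,
            "insured": biz["business_id"] in insured_ids,
            # rule-based stand-in for a trained predictive model
            "suspect": biz["employees"] > 0
                       and biz["business_id"] not in insured_ids,
        })
    return merged
```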

Design and Implementation of Reed-Solomon Code for 2-Dimensional Bar Code System (Reed-Solomon 알고리즘을 이용한 2차원 바코드 시스템에서 오류 극복 기능 설계 및 구현)

  • Jang, Seung-Ju
    • The Transactions of the Korea Information Processing Society / v.7 no.5 / pp.1491-1499 / 2000
  • This paper designs and implements a data recovery mechanism for a 2-D (two-dimensional) bar code system. The data recovery algorithm uses a modified Reed-Solomon algorithm implemented in the 2-D bar code system. There are seven types of 2-D bar code symbols, including 21x21, 25x25, 41x41, 73x73, 101x101, and 177x177. This paper experiments with how much data can be stored in the various 2-D bar code types and how much data can be recovered. In the first experiment, the large 2-D bar code symbols contain many ECC codewords, so the original data cannot all be assigned to the symbol. In the second experiment, even with 35-40% data loss, the 2-D bar code system could recover the original data.
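
The Reed-Solomon idea behind such ECC codewords can be sketched with a minimal GF(2^8) encoder and syndrome check, in the style commonly used by QR-type 2-D bar codes (primitive polynomial 0x11d). This is a generic sketch of the error-correction principle, not the paper's modified algorithm, and it shows error *detection* via syndromes rather than full decoding.

```python
# GF(2^8) exp/log tables for fast multiplication.
PRIM = 0x11d
EXP, LOG = [0] * 512, [0] * 256
_x = 1
for _i in range(255):
    EXP[_i], LOG[_x] = _x, _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] ^= gf_mul(pi, qj)
    return r

def generator_poly(nsym):
    """(x - a^0)(x - a^1)...(x - a^(nsym-1)) over GF(2^8)."""
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])
    return g

def rs_encode(msg, nsym):
    """Append nsym parity bytes so the codeword is divisible by the
    generator polynomial (systematic encoding via polynomial division)."""
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    for i in range(len(msg)):
        coef = buf[i]
        if coef:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + buf[len(msg):]

def syndromes(codeword, nsym):
    """Evaluate the codeword at a^0..a^(nsym-1); all zero iff no
    error is detected."""
    out = []
    for i in range(nsym):
        y = 0
        for c in codeword:
            y = gf_mul(y, EXP[i]) ^ c
        out.append(y)
    return out
```

A decoder would use nonzero syndromes to locate and correct errors; with `nsym` parity bytes, up to `nsym` known erasures (or `nsym // 2` unknown errors) are recoverable, which is the property the 35-40% loss experiment relies on.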

Analysis of Freight Big Data using R-Language (화물 배차 빅데이터 분석)

  • Selvaraj, Suganya;Choi, Eunmi
    • Annual Conference of KIPS / 2018.05a / pp.320-322 / 2018
  • Data analysis is a process of generating useful information by evaluating real-world raw data in order to make better decisions in business development. In freight transport logistics companies, the analysis of freight data is gaining considerable importance among users for making better decisions about freight cost reduction. Consequently, in this study, we used the R programming language to analyze freight data collected from a freight transport logistics company. The freight rate usually varies with the chosen day of the week. Here, we analyzed and visualized results such as the frequency of cost vs. day, frequency of requested goods (in tons) vs. day, frequency of orders vs. day, and frequency of order status vs. day over the last one year of freight data. These analysis results are beneficial to users in the ordering process.
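
The day-of-week aggregation described above was done in R; the same grouping can be sketched in Python for consistency with the other examples here. The `date` field is an assumed record layout, not the company's real schema.

```python
from collections import Counter
from datetime import date

def orders_per_weekday(orders):
    """Count freight orders per day of the week, the grouping behind
    the frequency-of-orders-vs-day analysis."""
    names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    return dict(Counter(names[order["date"].weekday()] for order in orders))
```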