• Title/Abstract/Keyword: Hadoop system


A Hadoop-based System of Analyzing Real-time Advertisement Effectiveness in Social Network (소셜 네트워크에서의 Hadoop 기반 실시간 광고 효과 분석 시스템 설계)

  • Bang, Jiseon;Lee, A-Reum;Ock, YoonJung;Kim, Yoonhee
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.73-76 / 2014
  • The growth of social network services has not only made it easier to collect and analyze individuals' interests, but has also made large amounts of information available, and big data analysis is accordingly being proposed in many fields. For measuring advertisement effectiveness, however, big data analysis has had the drawbacks of poor accuracy in many respects and long processing times. This system therefore parses social network data, analyzes people's reactions to TV advertisements, and presents the effects as graphs. With this system, advertisement effectiveness analysis has become faster than before, and various kinds of analysis have become possible.
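
As an illustration of the kind of reaction counting such a system performs, here is a minimal Hadoop MapReduce sketch; the class names, tracked keyword list, and one-post-per-line input layout are assumptions, not the paper's code:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical mapper: emits (adKeyword, 1) for every SNS post mentioning a tracked TV ad.
public class AdMentionMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final String[] TRACKED_ADS = {"adA", "adB"}; // assumed keyword list
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text post, Context ctx)
            throws IOException, InterruptedException {
        String text = post.toString().toLowerCase();
        for (String ad : TRACKED_ADS) {
            if (text.contains(ad)) {
                ctx.write(new Text(ad), ONE);
            }
        }
    }
}

// Reducer sums mention counts per ad; the totals would feed the effectiveness graphs.
class AdMentionReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text ad, Iterable<IntWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        ctx.write(ad, new IntWritable(sum));
    }
}
```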

Design and implementation of a Large-Scale Security Log Collection System based on Hadoop Ecosystem (Hadoop Ecosystem 기반 대용량 보안로그 수집 시스템 설계 및 구축)

  • Lee, Jong-Yoon;Lee, Bong-Hwan
    • Proceedings of the Korea Information Processing Society Conference / 2014.04a / pp.461-463 / 2014
  • As network attacks have become more diverse and frequent, a variety of security solutions have emerged to identify the types of hacking attacks. One of them, the enterprise security management (ESM) system, can establish security policies through the management and analysis of various logs and thereby prepare for future attacks; however, most existing ESM systems use relational databases and cannot keep up with rapidly growing data. Since a way to process large volumes of log data has become necessary to prevent the loss of information-rich log data and to avoid system degradation, we propose a Hadoop-based log collection system that uses the Hadoop ecosystem, which is specialized for distributed processing, to cope flexibly with growing data. Going beyond the existing NoSQL storage approach, it applies normalization at the log storage stage to improve processing and storage capability, and offers real-time processing and storage with excellent scalability.
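
A minimal sketch of what normalization at the log storage stage could look like as a Hadoop mapper; the raw log layout and the chosen fields are assumptions, not the authors' schema:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical normalization step at log-storage time: raw firewall/IDS lines are
// rewritten into one fixed, tab-separated schema before being stored.
public class LogNormalizeMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    @Override
    protected void map(LongWritable offset, Text rawLine, Context ctx)
            throws IOException, InterruptedException {
        // Assumed raw format: "timestamp srcIP dstIP action ..." separated by whitespace.
        String[] f = rawLine.toString().trim().split("\\s+");
        if (f.length < 4) return;              // drop malformed records
        String normalized = String.join("\t", f[0], f[1], f[2], f[3]);
        ctx.write(NullWritable.get(), new Text(normalized));
    }
}
```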

Development of Real-time High-Fidelity Video Processing System using Hadoop and Spark (하둡 및 스파크를 이용한 초고품질 영상 실시간 처리 시스템 개발)

  • Huh, Jingang;Kim, Yonghwan
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.06a / pp.326-328 / 2018
  • As interest has recently focused on services for 4K/8K ultra-high-quality content, research on streaming services is also being actively conducted. However, software-based video processing struggles with the performance limits of a single PC. This paper proposes a system that enables real-time video processing through distributed processing. The proposed system consists of video packet analysis and splitting, distributed transcoding, and packet merging stages, and supports real-time distributed processing using Hadoop and Spark. Experimental results show an average transcoding speed of 74.47 fps for an ultra-high-quality input video (3840×2160@60 Hz, YCbCr 4:2:2, 10-bit).
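
A minimal Spark (Java API) sketch of the split-transcode-merge flow described above; the segment-list input and the transcode placeholder are assumptions, not the authors' implementation:

```java
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical driver: segment paths are produced by the packet-analysis/split stage;
// each segment is transcoded in parallel, then the outputs are merged in order.
public class DistributedTranscode {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("uhd-transcode");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Assumed input: one HDFS path per video segment, one per line.
            JavaRDD<String> segments = sc.textFile("hdfs:///video/segments.txt");
            List<String> done = segments.map(seg -> transcode(seg)).collect();
            done.forEach(System.out::println); // transcoded segment paths
            // The packet-merge stage would then concatenate these in segment order.
        }
    }

    // Placeholder for the per-segment transcoder (e.g. an external encoder process).
    static String transcode(String segmentPath) {
        return segmentPath + ".out";
    }
}
```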


Implementation on Online Storage with Hadoop (하둡을 이용한 온라인 대용량 저장소 구현)

  • Eom, Se-Jin;Lim, Seung-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2013.05a / pp.56-58 / 2013
  • Recently, the processing and analysis of large-scale big data, including data from social network services such as Facebook and Twitter, have been treated as important issues; with users constantly pouring out data, how to handle it, and how to analyze and refine it into something meaningful and valuable, are regarded as key concerns. As a big data management tool, Hadoop is rated as the tool that comes closest to solving big data processing and analysis. In this paper, we design, build, and implement an online data store, the most basic element of an online large-scale storage system based on HDFS (Hadoop Distributed File System), Hadoop's main component, and Java, and thereby examine the issues involved in implementing large-scale storage.
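
Since the store is built on the HDFS Java API, here is a minimal sketch of writing one object into HDFS with that API; the NameNode URI and paths are assumptions (in a real deployment they come from core-site.xml):

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal upload: connect to the NameNode and write one object into HDFS.
public class HdfsStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
        try (FSDataOutputStream out = fs.create(new Path("/storage/user1/hello.txt"))) {
            out.write("hello online storage".getBytes(StandardCharsets.UTF_8));
        }
        fs.close();
    }
}
```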

Scalable RDFS Reasoning using Logic Programming Approach in a Single Machine (단일머신 환경에서의 논리적 프로그래밍 방식 기반 대용량 RDFS 추론 기법)

  • Jagvaral, Batselem;Kim, Jemin;Lee, Wan-Gon;Park, Young-Tack
    • Journal of KIISE / v.41 no.10 / pp.762-773 / 2014
  • As the web of data increasingly produces large RDFS datasets, building scalable reasoning engines over large triple sets has become essential. Many studies have used expensive distributed frameworks, such as Hadoop, to reason over large RDFS triple sets. In many cases, however, we only need to handle millions of triples, and deploying an expensive distributed system is unnecessary because a logic-program-based reasoner on a single machine can deliver reasoning performance similar to that of a distributed reasoner using Hadoop. In this paper, we propose a scalable RDFS reasoner that uses logic programming methods on a single machine and compare our empirical results with those of distributed systems. We show that our logic-programming-based reasoner on a single machine performs about as well as an expensive distributed reasoner for up to 200 million RDFS triples. In addition, we designed a metadata structure that decomposes the ontology triples into separate sectors; instead of loading all the triples into a single model, we select an appropriate subset of the triples for each ontology reasoning rule. Unification makes it easy to handle the conjunctive queries needed for RDFS schema reasoning, so we designed and implemented the RDFS axioms using logic programming unification and an efficient conjunctive query handling mechanism. The throughput of our approach reached 166K triples/sec on LUBM1500 with 200 million triples, which is comparable to that of WebPIE, a distributed reasoner using Hadoop and MapReduce, at 185K triples/sec. We show that a distributed system is unnecessary for up to 200 million triples and that the performance of a logic-programming-based reasoner on a single machine is comparable to that of an expensive distributed reasoner employing the Hadoop framework.
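
For reference, two of the standard RDFS entailment rules (rdfs7, subproperty propagation; rdfs9, subclass propagation) that such a reasoner materializes, written as inference rules over triples:

```latex
\[
\frac{(p,\ \mathrm{rdfs\!:\!subPropertyOf},\ q)\qquad (s,\ p,\ o)}
     {(s,\ q,\ o)}
\qquad
\frac{(C,\ \mathrm{rdfs\!:\!subClassOf},\ D)\qquad (x,\ \mathrm{rdf\!:\!type},\ C)}
     {(x,\ \mathrm{rdf\!:\!type},\ D)}
\]
```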

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee;Cho, Young-bok;Kim, So Young;Park, Jong-bae;Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we provide an Apache Hadoop-based cloud environment with flexible scalability of computing resources for the analysis of public medical information big data. In particular, it includes the ability to quickly and flexibly extend storage, memory, and other resources as log data accumulates and grows over time. In addition, when real-time analysis of the accumulated unstructured log data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools, providing fast and reliable parallel distributed processing of large amounts of log data. Frequency analysis and chi-square tests are performed for the big data analysis. In addition, multivariate logistic regression analysis was performed at the 0.05 significance level on the significant variables (p<0.05), for each of the three models.
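
The abstract does not state the exact test variant; for reference, the standard Pearson chi-square statistic for an r×c contingency table, with observed counts O and expected counts E and grand total N, is:

```latex
\[
\chi^2 \;=\; \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij}-E_{ij})^2}{E_{ij}},
\qquad
E_{ij} \;=\; \frac{(\text{row } i \text{ total})\,(\text{column } j \text{ total})}{N}
\]
```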

The Efficient Method of Parallel Genetic Algorithm using MapReduce of Big Data (빅 데이터의 MapReduce를 이용한 효율적인 병렬 유전자 알고리즘 기법)

  • Hong, Sung-Sam;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.5 / pp.385-391 / 2013
  • Big data refers to data so large that it cannot be collected, stored, searched, processed, or analyzed by existing database management systems. A parallel genetic algorithm (GA) for big data is easily realized by implementing the GA with MapReduce on the Hadoop distributed system. Previous studies proposed genetic algorithms suitably transformed for MapReduce, but they did not perform well because of frequent data input and output. In this paper, we propose MRPGA (MapReduce Parallel Genetic Algorithm), which improves the Map and Reduce processes and exploits the parallel processing characteristics of MapReduce. The optimal solution can be found by using the topology and migration of the parallel genetic algorithm together with a local search algorithm. The convergence speed of the proposed method is 1.5 times faster than that of the existing MapReduce SGA, and the optimal solution can be found quickly within a small number of sub-generation iterations. In addition, MRPGA can improve the processing and analysis performance of big data technology.
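
A minimal sketch of the Map side of an island-model GA on MapReduce, in the spirit of (but not identical to) MRPGA; the island count, key scheme, and toy OneMax fitness are assumptions:

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical island-model GA mapper: scores each individual and routes it to
// its island; a paired reducer would then run selection/crossover per island
// (one sub-generation), with migration between islands across MapReduce rounds.
public class GaFitnessMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
    private static final int NUM_ISLANDS = 4; // assumed island count

    @Override
    protected void map(LongWritable key, Text individual, Context ctx)
            throws IOException, InterruptedException {
        String chromosome = individual.toString().trim();
        long fitness = fitness(chromosome);
        int island = Math.floorMod(chromosome.hashCode(), NUM_ISLANDS);
        ctx.write(new IntWritable(island), new Text(fitness + "\t" + chromosome));
    }

    // Toy fitness: count of '1' genes (OneMax); the real objective is problem-specific.
    static long fitness(String c) {
        return c.chars().filter(ch -> ch == '1').count();
    }
}
```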

The Design of Method for Efficient Processing of Small Files in the Distributed System based on Hadoop Framework (하둡 프레임워크 기반 분산시스템 내의 작은 파일들을 효율적으로 처리하기 위한 방법의 설계)

  • Kim, Seung-Hyun;Kim, Young-Geun;Kim, Won-Jung
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.10 no.10 / pp.1115-1122 / 2015
  • The Hadoop framework was designed to be suitable for processing very large files. When processing small files, however, it wastes the resources of the distributed system and suffers performance degradation, which becomes more noticeable as the number of small files grows. Since this problem is caused by the small files themselves, it can be mitigated by merging associated small files, but existing merging approaches have limitations. In this paper, we examine the limits of existing merging methods and design a method for merging small files for efficient processing.
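
One well-known merging technique, shown here as a hedged sketch rather than the paper's specific method, is to pack many small files into a single SequenceFile keyed by file name, so HDFS stores one large file instead of many tiny blocks:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs every file in a directory into one SequenceFile: key = original file name,
// value = file bytes. Relieves NameNode memory pressure from many small files.
public class SmallFileMerger {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]); // directory of small files
        Path merged   = new Path(args[1]); // output SequenceFile

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(merged),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {
            for (FileStatus st : fs.listStatus(inputDir)) {
                byte[] buf = new byte[(int) st.getLen()];
                try (FSDataInputStream in = fs.open(st.getPath())) {
                    IOUtils.readFully(in, buf, 0, buf.length);
                }
                writer.append(new Text(st.getPath().getName()), new BytesWritable(buf));
            }
        }
    }
}
```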

A Design of SNS and Web Data Analysis System for Company Marketing Strategy (기업 마케팅 전략을 위한 SNS 및 Web 데이터 분석 시스템 설계)

  • Lee, ByungKwan;Jeong, EunHee;Jung, YiNa
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.6 no.4 / pp.195-200 / 2013
  • This paper proposes an SNS and Web data analysis system that a company can use for its marketing strategy by analyzing negative SNS and Web data that could seriously damage its image. It consists of a Data Collection Module that collects SNS and Web data, an HBase Module that stores the collected data, a Data Analysis Module that estimates and classifies the meaning of the data after semantic analysis, and a PHS Module that performs an optimized MapReduce using the SNS and Web data related to the business. With these modules, the analysis results can be used for a company's marketing strategy through efficient management of SNS and Web data.

Performance Improvement of Data Replication in Cloud Computing (Cloud Computing에서의 데이터 복제 성능 개선)

  • Lee, Joon-Kyu;Lee, Bong-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.53-56 / 2008
  • Recently, distributed systems have been evolving into a new paradigm, named cloud computing, which provides users with efficient computing resources and services from data centers. Cloud computing reduces the potential risks of grid computing, which relies on resource sharing, by building centralized data centers. In this paper, a new data replication scheme for the Hadoop Distributed File System is proposed that changes 1:1 data transmission to 1:N. The proposed scheme considerably reduced the data transmission delay compared to the current mechanism.
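
A schematic of the difference, under the assumption that "1:N" means the sender pushes a block to all replicas in parallel instead of through HDFS's chained replication pipeline; this is illustrative plain-socket code, not HDFS internals:

```java
import java.io.OutputStream;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Schematic 1:N replication: stock HDFS forwards a block client -> DN1 -> DN2 -> DN3
// (chained 1:1); here the sender streams the same block to every replica concurrently.
public class FanOutReplicator {
    public static void replicate(byte[] block, List<String> replicaHosts, int port)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(replicaHosts.size());
        for (String host : replicaHosts) {
            pool.submit(() -> {
                try (Socket s = new Socket(host, port);
                     OutputStream out = s.getOutputStream()) {
                    out.write(block);    // push the whole block to this replica
                } catch (Exception e) {
                    e.printStackTrace(); // real code would retry / report the failure
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```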
