• Title/Summary/Keyword: HADOOP

A Design of SNS and Web Data Analysis System for Company Marketing Strategy (기업 마케팅 전략을 위한 SNS 및 Web 데이터 분석 시스템 설계)

  • Lee, ByungKwan;Jeong, EunHee;Jung, YiNa
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.6 no.4 / pp.195-200 / 2013
  • This paper proposes an SNS and Web data analytics system that supports business marketing strategy by analyzing negative SNS and Web data that can seriously damage a business's image. It consists of the Data Collection Module, which collects SNS and Web data; the HBase Module, which stores the collected data; the Data Analysis Module, which estimates and classifies the meaning of the data after a semantic analysis; and the PHS Module, which performs an optimized MapReduce using SNS and Web data related to a business. By efficiently managing SNS and Web data with these modules, the analysis results can be used for a business marketing strategy.

Basic Prototype Design and Verification of Hadoop Cluster based on Private Cloud Infrastructure for SMB (중소기업을 위한 프라이빗 클라우드 인프라 기반 하둡 클러스터의 기본 프로토타입 설계 및 실증)

  • Cha, Byung-Rae;Kim, Hyeong-Gyun;Kim, Dae-Gue;Kim, Jong-Won;Kim, Yong-Il
    • Journal of Advanced Navigation Technology / v.17 no.2 / pp.225-233 / 2013
  • Recently, Cloud Computing and Big Data have become buzzwords in the field of IT. In this paper, as part of special efforts to support small and medium-sized businesses (SMB) in this situation, we designed basic prototypes ver. 0.1, 0.2, and 0.5 of a Hadoop cluster based on private cloud infrastructure and implemented part of them. We verified the performance of the basic prototypes using the ASA Dataset.

Cloud-based Intelligent Management System for Photovoltaic Power Plants (클라우드 기반 태양광 발전단지 통합 관리 시스템)

  • Park, Kyoung-Wook;Ban, Kyeong-Jin;Song, Seung-Heon;Kim, Eung-Kon
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.7 no.3 / pp.591-596 / 2012
  • Recently, an efficient management system for photovoltaic power plants has become necessary due to the continuously increasing construction of such plants. In this paper, we propose a cloud-based intelligent management system for many photovoltaic power plants. The proposed system stores the measured data of power plants in Hadoop HBase, a column-oriented database, and computes performance, efficiency, and the predicted amount of power generation in parallel using the MapReduce model. In addition, a Web-based data visualization module presents information to the administrator in various forms.

Design and Implementation of Cloud-based Sensor Data Management System (클라우드 기반 센서 데이터 관리 시스템 설계 및 구현)

  • Park, Kyoung-Wook;Kim, Kyong-Og;Ban, Kyeong-Jin;Kim, Eung-Kon
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.5 no.6 / pp.672-677 / 2010
  • Recently, an efficient management system for large-scale sensor data has become necessary due to the increasing deployment of large-scale sensor networks. In this paper, we propose a cloud-based sensor data management system with low cost, high scalability, and efficiency. Sensor data in sensor networks are transmitted to the cloud through a cloud gateway, where outlier detection and event processing are performed. The transmitted sensor data are stored in Hadoop HBase, a distributed column-oriented database, and processed in parallel by a query processing module designed on the MapReduce model. Because processed results are provided through a REST-based web service, the proposed system can work with applications on a variety of platforms.

RHadoop platform for K-Means clustering of big data (빅데이터 K-평균 클러스터링을 위한 RHadoop 플랫폼)

  • Shin, Ji Eun;Oh, Yoon Sik;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.27 no.3 / pp.609-619 / 2016
  • RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. In this paper, we implement the K-Means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large-scale data. The main idea is to introduce a combiner that operates on the map output to decrease the amount of data that reducers must process. We showed that our K-Means algorithm using RHadoop with a combiner was faster than the regular algorithm without a combiner as the size of the data set increased. We also implemented the Elbow method with MapReduce to find the optimum number of clusters for K-Means clustering on large datasets. A comparison between our MapReduce implementation of the Elbow method and the classical kmeans() in R on small data showed similar results.
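
The combiner idea in the abstract above can be illustrated with a minimal, single-process sketch. This is not the authors' RHadoop code: it is written in Python rather than R, and the function names (`map_phase`, `combine`, `reduce_phase`, `kmeans_iteration`) and the in-memory "splits" are assumptions made only for illustration. Each map task emits `(cluster, (point, 1))` pairs, the combiner collapses them into one partial `(sum, count)` per cluster, and the reducer merges the partials into new centroids.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points, centroids):
    """Map step: emit (cluster_index, (vector, count)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def combine(pairs):
    """Combiner: sum vectors and counts per cluster, so each map task
    sends at most one record per cluster to the reducers."""
    acc = {}
    for key, (vec, n) in pairs:
        if key in acc:
            total, count = acc[key]
            acc[key] = ([a + b for a, b in zip(total, vec)], count + n)
        else:
            acc[key] = (list(vec), n)
    return list(acc.items())


def reduce_phase(partials, centroids):
    """Reduce step: merge partial sums and divide by the count to get new centroids.
    Clusters that received no points keep their previous centroid."""
    merged = dict(combine(partials))
    updated = list(centroids)
    for key, (total, count) in merged.items():
        updated[key] = tuple(x / count for x in total)
    return updated


def kmeans_iteration(splits, centroids):
    """One K-Means iteration over several input splits (one simulated map task each)."""
    partials = []
    for split in splits:
        partials.extend(combine(map_phase(split, centroids)))
    return reduce_phase(partials, centroids)
```

Because sums and counts are associative, running the combiner on each split does not change the resulting centroids; it only reduces the number of records passed to the reduce step.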

An Efficient Data Transmission to Cloud Storage using USB Hijacking (USB 하이재킹을 이용한 클라우드 스토리지로의 효율적인 데이터 전송 기법)

  • Eom, Hyun-Chul;No, Jae-Chun
    • Journal of the Institute of Electronics Engineers of Korea CI / v.48 no.6 / pp.47-55 / 2011
  • The performance of data transmission from mobile devices to cloud storage is limited by the amount of data being transferred, the communication speed, and the battery consumption of the mobile devices. In particular, when large-scale data communication takes place on mobile devices such as smartphones, performance fluctuation and power consumption become obstacles to establishing a reliable communication environment. In this paper, we present an efficient data transmission method using USB Hijacking. In our approach, the synchronization that transfers a large amount of data between mobile devices and the user's PC is executed using USB Hijacking, so there is no need to be concerned about data capacity and battery consumption during data communication. We present several experimental results to verify the effectiveness and suitability of our approach.

Performance Improvement of Data Replication in Cloud Computing (Cloud Computing에서의 데이터 복제 성능 개선)

  • Lee, Joon-Kyu;Lee, Bong-Hwan
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.53-56 / 2008
  • Recently, distributed systems have been evolving into a new paradigm, named cloud computing, which provides users with efficient computing resources and services from data centers. By constructing a centralized data center, cloud computing would reduce the potential danger of Grid computing, which relies on resource sharing. In this paper, a new data replication scheme is proposed for the Hadoop distributed file system by changing 1:1 data transmission to 1:N. The proposed scheme considerably reduced the data transmission delay compared to the current mechanism.

An Adaptively Speculative Execution Strategy Based on Real-Time Resource Awareness in a Multi-Job Heterogeneous Environment

  • Liu, Qi;Cai, Weidong;Liu, Qiang;Shen, Jian;Fu, Zhangjie;Liu, Xiaodong;Linge, Nigel
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.2 / pp.670-686 / 2017
  • MapReduce (MRV1), a popular programming model proposed by Google, has been widely used to process large datasets in Hadoop, an open source cloud platform. Its new version, MapReduce 2.0 (MRV2), developed along with the emergence of YARN, has achieved obvious improvement over MRV1. However, MRV2 suffers from long finishing times on certain types of jobs. Speculative Execution (SE) has been presented as an approach to this problem, backing up delayed tasks from low-performance machines on higher-performance ones. In this paper, an adaptive SE strategy (ASE) is presented in Hadoop-2.6.0. Experimental results show that the ASE duplicates tasks according to real-time resource usage among worker nodes in a cloud. In addition, the ASE strategy largely improves the performance of MRV2 in terms of job execution time and resource consumption, including in a multi-job environment.

On Implementing a Learning Environment for Big Data Processing using Raspberry Pi (라즈베리파이를 이용한 빅 데이터 처리 학습 환경 구축)

  • Hwang, Boram;Kim, Seonggyu
    • Journal of Digital Convergence / v.14 no.4 / pp.251-258 / 2016
  • Big data processing is a broad term for processing data sets so large or complex that traditional data processing applications are inadequate. The widespread use of smart devices has a huge impact on the way we process data, and many organizations are contemplating how to incorporate or integrate those devices into their enterprise data systems. We propose a way to process big data by integrating Raspberry Pi devices into a Hadoop cluster as a computational grid. We then show the efficiency and ease of scaling of the proposed system through several experiments.

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model for phone call data is composed of the PVPF (Parallel Variable-length Phrase Finding) algorithm for identifying verbal phrases of natural language and the word count algorithm for measuring the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model based on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data, extracting useful results through analysis of keyword correlation and usage frequency.
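
The word count step mentioned in the abstract above follows the standard MapReduce pattern, which can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: a simple regular-expression tokenizer stands in for the paper's PVPF phrase identification (which is not reproduced here), the data is held in memory rather than on HDFS, and the function names `map_words`, `shuffle`, `reduce_counts`, and `word_count` are hypothetical.

```python
import re
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for each token in a transcript line.
    The regex tokenizer is a placeholder for PVPF, not an implementation of it."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: sum the occurrences of one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(w, c) for w, c in shuffle(pairs).items())
```

For example, `word_count(["call me back", "Call back later"])` counts "call" and "back" twice each and "me" and "later" once each.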