• Title/Summary/Keyword: MapReduce model

Hadoop and MapReduce (하둡과 맵리듀스)

  • Park, Jeong-Hyeok;Lee, Sang-Yeol;Kang, Da Hyun;Won, Joong-Ho
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1013-1027 / 2013
  • As the need for large-scale data analysis rapidly increases, Hadoop, a platform for large-scale data processing, and MapReduce, the internal computational model of Hadoop, are receiving great attention. This paper reviews the basic concepts of Hadoop and MapReduce that data analysts familiar with statistical programming need, through examples that combine the R programming language and Hadoop.
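
A minimal single-machine sketch of the MapReduce model itself (the word-count task and all function names below are illustrative assumptions, not taken from the paper): the map step emits key-value pairs, a shuffle groups values by key, and the reduce step aggregates each group, which is what Hadoop distributes across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (key, value) pairs: one (word, 1) pair per word in the line.
    for word in line.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group all values by key, as Hadoop does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Aggregate the values for one key: here, a simple sum (word count).
    return key, sum(values)

lines = ["hadoop runs mapreduce jobs", "mapreduce jobs scale out"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 1, 'runs': 1, 'mapreduce': 2, 'jobs': 2, 'scale': 1, 'out': 1}
```

The same three-phase structure underlies the other MapReduce entries in this listing; only the record format and the aggregation change.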

An Improved Hybrid Canopy-Fuzzy C-Means Clustering Algorithm Based on MapReduce Model

  • Dai, Wei;Yu, Changjun;Jiang, Zilong
    • Journal of Computing Science and Engineering / v.10 no.1 / pp.1-8 / 2016
  • Fuzzy c-means (FCM) is a widely used clustering algorithm, but its clustering quality and convergence rate depend on the initial cluster centers. This paper therefore proposes an improved FCM algorithm that uses the canopy clustering concept to analyze a dataset quickly: taking advantage of the canopy algorithm's rapid acquisition of cluster centers, the canopy clustering results are used as the input to FCM, which accelerates its convergence. A MapReduce scheme for the proposed algorithm is also designed for a cloud environment. Experimental results demonstrate that the hybrid canopy-FCM clustering algorithm implemented with MapReduce achieves better clustering quality and higher processing speed.
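
A compact single-machine sketch of the hybrid idea (the distance threshold, synthetic data, and function names are illustrative assumptions, not the paper's implementation): canopy clustering cheaply proposes initial centers, which then seed the standard fuzzy c-means updates.

```python
import numpy as np

def canopy_centers(points, t2=1.5):
    # One cheap pass over the data: pick a point as a center and drop
    # every point closer than the tight threshold t2; repeat until empty.
    remaining = [np.asarray(p, float) for p in points]
    centers = []
    while remaining:
        center = remaining.pop(0)
        centers.append(center)
        remaining = [p for p in remaining if np.linalg.norm(p - center) > t2]
    return np.array(centers)

def fcm(points, centers, m=2.0, iters=20):
    # Standard fuzzy c-means updates, seeded with the canopy centers.
    X, C = np.asarray(points, float), np.asarray(centers, float)
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-9
        u = 1.0 / dist ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)            # membership matrix
        um = u ** m
        C = (um.T @ X) / um.sum(axis=0)[:, None]     # weighted center update
    return C, u

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
centers, memberships = fcm(data, canopy_centers(data))
print(centers.round(2))
```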

Design and Implementation of a Book Recommendation System based on the MapReduce Model (MapReduce Model에 기반한 도서 추천 시스템의 설계 및 구현)

  • Lim, Chan-Shik;Lee, Won-Jae;Lee, Ha-Na;Lee, Se-Hwa;Lee, Sang-Jun
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.201-204 / 2010
  • With countless books published every day, it is difficult for users to find and read books that match their purposes. In this paper, based on a large volume of book data, we use the MapReduce model to extract association relationships between books. Using the extracted association database, we aim to develop a system that recommends related books to users.
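
A minimal sketch of the association-extraction step described above (the record layout and the simple pair-counting rule are assumptions for illustration, not the authors' code): the map step emits a pair for every two books that co-occur in one user's record, and the reduce step sums the pair counts into an association table that a recommender can query.

```python
from collections import defaultdict
from itertools import combinations

def map_record(user_books):
    # Emit ((book_a, book_b), 1) for every two books that appear
    # together in a single user's record.
    for a, b in combinations(sorted(set(user_books)), 2):
        yield (a, b), 1

def reduce_counts(pairs):
    # Sum the co-occurrence counts per book pair (the association table).
    table = defaultdict(int)
    for key, value in pairs:
        table[key] += value
    return table

records = [["B1", "B2", "B3"], ["B1", "B3"], ["B2", "B3"]]
associations = reduce_counts(p for r in records for p in map_record(r))
ranked = sorted(((count, pair) for pair, count in associations.items()), reverse=True)
print(ranked)  # [(2, ('B2', 'B3')), (2, ('B1', 'B3')), (1, ('B1', 'B2'))]
```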

Chip Load Control Using a NC Verification Model Based on Z-Map (Z-map 기반 가공 검증모델을 이용한 칩부하 제어기)

  • Baek Dae Kyun;Ko Tae Jo;Park Jung Whan;Kim Hee Sool
    • Journal of the Korean Society for Precision Engineering / v.22 no.4 / pp.68-75 / 2005
  • This paper presents a new method for optimizing the feed rate in sculptured surface machining. An NC verification model based on the Z-map is used to obtain the chip load corresponding to the feed per tooth. The optimization method regenerates a new NC program from the commanded cutting conditions and the NC program generated by the CAM system. The regenerated NC program retains the data of the original program but updates the feed rate in every block. The new NC data can reduce cutting time and produce precision parts with a nearly uniform chip load per tooth. The method can also reduce tool chipping and keep tool wear constant.
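
The quantity being controlled can be sketched in a few lines (the numbers below are illustrative assumptions): chip load is the feed per tooth, f_z = v_f / (n · z), where v_f is the feed rate (mm/min), n the spindle speed (rpm), and z the number of teeth, so once the Z-map verification reports the actual cut, the feed rate in each NC block can be rescaled to hold f_z at a target value.

```python
def rescale_feed(feed_mm_min, rpm, teeth, target_chip_load_mm):
    # Current chip load (feed per tooth) commanded by this NC block.
    chip_load = feed_mm_min / (rpm * teeth)
    # Scale the feed rate so the chip load matches the target value.
    return feed_mm_min * (target_chip_load_mm / chip_load)

# Example: 1000 mm/min at 8000 rpm with a 2-flute tool is 0.0625 mm/tooth;
# holding a 0.05 mm/tooth target reduces the block's feed to 800 mm/min.
print(rescale_feed(1000.0, 8000.0, 2, 0.05))  # 800.0
```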

The Method of Analyzing Firewall Log Data using MapReduce based on NoSQL (NoSQL기반의 MapReduce를 이용한 방화벽 로그 분석 기법)

  • Choi, Bomin;Kong, Jong-Hwan;Hong, Sung-Sam;Han, Myung-Mook
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.4 / pp.667-677 / 2013
  • As the firewall is a typical piece of network security equipment, it is usually installed at the boundary of internal and external networks and handles large volumes of inbound and outbound packets, so analyzing the logs it stores can provide important and fundamental data for network security research. However, along with the development of communications technology, internet speeds have increased and the amount of log data has grown into massive data, or big data. Under these conditions there are limits to analyzing log data with the traditional RDBMS model. In this paper, through our method of analyzing firewall log data using MapReduce based on NoSQL, we show that introducing a NoSQL database model can analyze massive log data more effectively than the traditional approach. We demonstrate the excellent performance of NoSQL by comparing its data-processing performance with an existing RDBMS. The proposed method is also evaluated in experiments that detect three attack patterns and is shown to be highly effective.
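
A toy sketch of the map/reduce pattern involved (the log format, field layout, and threshold below are assumptions for illustration): the map step parses each firewall log line and emits the source IP of denied packets, and the reduce step counts them so that sources exceeding a threshold can be flagged as a candidate attack pattern.

```python
from collections import Counter

def map_log_line(line):
    # Assumed log format: "<timestamp> <action> <src_ip> <dst_ip> <dst_port>"
    ts, action, src, dst, port = line.split()
    if action == "DENY":
        yield src, 1

def reduce_counts(pairs):
    # Count denied packets per source IP.
    counts = Counter()
    for src, n in pairs:
        counts[src] += n
    return counts

logs = [
    "2013-07-01T10:00:01 DENY 10.0.0.5 192.168.1.10 22",
    "2013-07-01T10:00:02 DENY 10.0.0.5 192.168.1.10 23",
    "2013-07-01T10:00:03 ALLOW 10.0.0.7 192.168.1.20 443",
    "2013-07-01T10:00:04 DENY 10.0.0.5 192.168.1.10 445",
]
counts = reduce_counts(p for line in logs for p in map_log_line(line))
print([ip for ip, n in counts.items() if n >= 3])  # ['10.0.0.5']
```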

Cutting Force Prediction in NC Machining Using a ME Z-map Model (ME Z-map 모델을 이용한 NC 가공의 절삭력 예측)

  • 이한울;고정훈;조동우
    • Proceedings of the Korean Society of Precision Engineering Conference / 2002.05a / pp.86-89 / 2002
  • In NC machining, the ability to automatically generate an optimal process plan is an essential step toward automation, higher productivity, and better accuracy. This requires a system capable of simulating the actual machining process. In this paper, a milling process simulation system for general NC machining is presented. The system first needs to compute the cutting configuration accurately. ME Z-map (Moving Edge node Z-map) was developed to reduce the entry/exit angle calculation error in cutting force prediction, and it is shown to drastically improve the conventional Z-map model. Experimental results for pocket machining show the accuracy of the milling process simulation system.
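
For context, the plain Z-map that the ME Z-map refines can be sketched briefly (the grid size, tool radius, and toolpath below are illustrative assumptions): the workpiece is a grid of heights, a flat end mill lowers every cell under its footprint to the tool bottom, and the cells actually removed at each step indicate the instantaneous engagement region whose entry/exit angles the ME Z-map computes more accurately.

```python
import numpy as np

def cut_step(zmap, cx, cy, radius, tool_z, cell=1.0):
    # Lower every grid cell under the flat end mill's footprint to the
    # tool bottom; return a mask of the cells actually removed.
    ny, nx = zmap.shape
    ys, xs = np.mgrid[0:ny, 0:nx]
    footprint = (xs * cell - cx) ** 2 + (ys * cell - cy) ** 2 <= radius ** 2
    removed = footprint & (zmap > tool_z)
    zmap[removed] = tool_z
    return removed

zmap = np.full((40, 40), 10.0)           # flat stock, 10 mm high
for cx in np.arange(5.0, 35.0, 1.0):     # one straight pass at 8 mm depth
    removed = cut_step(zmap, cx, 20.0, radius=4.0, tool_z=8.0)
print(int(removed.sum()), "cells newly removed on the last step")
```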

The Analysis Framework for User Behavior Model using Massive Transaction Log Data (대규모 로그를 사용한 유저 행동모델 분석 방법론)

  • Lee, Jongseo;Kim, Songkuk
    • The Journal of Bigdata / v.1 no.2 / pp.1-8 / 2016
  • User activity logs contain a great deal of hidden information, but because they are unstructured and too massive to process directly, much of that information remains unexplored. In particular, they contain time-series data that can reveal a great deal, yet raw log data cannot be used directly to analyze user behavior: analyzing a user activity model requires a transformation process supported by an additional framework. We therefore need to define a user activity model analysis framework first and then approach the data. In this paper, we propose a novel framework for analyzing user activity models effectively. The framework includes a MapReduce process for quickly analyzing massive data in a distributed environment and a data architecture designed for user activity modeling. We also explain the data model in detail based on the log design of a real online service, and describe which analysis model fits which data model. This improves understanding of how to process massive logs and design analysis models.
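
A condensed sketch of the kind of transformation such a framework performs (the event schema and the 30-minute session gap are illustrative assumptions): the map step keys each transaction-log event by user, and the reduce step orders one user's events in time and cuts them into sessions, yielding a structure that a behavior model can consume.

```python
from collections import defaultdict

def map_event(event):
    # event: (user_id, timestamp_seconds, action) -> keyed by user.
    user, ts, action = event
    yield user, (ts, action)

def reduce_sessions(user, events, gap=1800):
    # Sort one user's events by time and start a new session whenever
    # the silence between consecutive events exceeds `gap` seconds.
    events = sorted(events)
    sessions, current = [], [events[0]]
    for prev, cur in zip(events, events[1:]):
        if cur[0] - prev[0] > gap:
            sessions.append(current)
            current = []
        current.append(cur)
    sessions.append(current)
    return user, sessions

log = [("u1", 0, "login"), ("u1", 120, "search"), ("u1", 4000, "purchase"),
       ("u2", 60, "login")]
grouped = defaultdict(list)
for user, value in (p for e in log for p in map_event(e)):
    grouped[user].append(value)
print({u: reduce_sessions(u, evs)[1] for u, evs in grouped.items()})
# u1 -> [[(0,'login'), (120,'search')], [(4000,'purchase')]]; u2 -> one session
```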

Efficient Computation of Data Cubes Using MapReduce (맵리듀스를 사용한 데이터 큐브의 효율적인 계산 기법)

  • Lee, Ki Yong;Park, Sojeong;Park, Eunju;Park, Jinkyung;Choi, Yeunjung
    • KIPS Transactions on Software and Data Engineering / v.3 no.11 / pp.479-486 / 2014
  • MapReduce is a programming model used for processing a large amount of data in parallel. For analyzing large amounts of data, the data cube is widely used; it is an operator that computes group-bys for all possible combinations of the given dimension attributes. When the number of dimension attributes is n, the data cube computes 2^n group-bys. In this paper, we propose an efficient method for computing data cubes using MapReduce. The proposed method partitions the 2^n group-bys into C(n, ⌈n/2⌉) batches and computes those batches in stages using ⌈n/2⌉ MapReduce jobs. Compared to existing methods, the proposed method significantly reduces the amount of intermediate data generated by the mappers, so the cost of sorting and transferring that intermediate data drops accordingly, and the total processing time for computing a data cube is reduced. Experiments show the efficiency of the proposed method over existing methods.
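
To make the counts concrete: with n = 3 dimension attributes there are 2^3 = 8 group-bys, which the proposed scheme would partition into C(3, ⌈3/2⌉) = 3 batches computed by ⌈3/2⌉ = 2 MapReduce jobs. The single-machine sketch below (the toy sales data and the naive per-group-by aggregation are assumptions, and it does not implement the paper's batching) simply enumerates the 2^n grouping sets so the counts can be checked.

```python
from collections import defaultdict
from itertools import combinations
from math import ceil, comb

def cube(rows, dims, measure):
    # Compute every group-by (2**len(dims) grouping sets) by plain
    # in-memory aggregation; '*' marks a rolled-up dimension.
    results = {}
    for k in range(len(dims) + 1):
        for keys in combinations(dims, k):
            agg = defaultdict(int)
            for row in rows:
                group = tuple(row[d] if d in keys else "*" for d in dims)
                agg[group] += row[measure]
            results[keys] = dict(agg)
    return results

rows = [{"year": 2014, "region": "KR", "item": "A", "sales": 3},
        {"year": 2014, "region": "KR", "item": "B", "sales": 5},
        {"year": 2013, "region": "US", "item": "A", "sales": 2}]
dims = ("year", "region", "item")
print(len(cube(rows, dims, "sales")))        # 8 group-bys = 2**3
print(comb(len(dims), ceil(len(dims) / 2)))  # 3 batches in the paper's scheme
```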

The Design of Blog Network Analysis System using Map/Reduce Programming Model (Map/Reduce를 이용한 블로그 연결망 분석 시스템 설계)

  • Joe, In-Whee;Park, Jae-Kyun
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.9B / pp.1259-1265 / 2010
  • Recently, online social networks have grown along with the development of the internet, and the most representative service is the blog. A blog is a type of personal web site, usually maintained by an individual with regular entries of commentary. Blogs are linked to each other, forming what this paper calls a blog network. In a blog network, posts in one blog can diffuse to other blogs. Analyzing information diffusion in the blog world is a useful research problem, with applications to predicting diffusion, anomaly detection, marketing, and revitalizing the blog world. Existing studies on network analysis do not consider the passage of time and measure a node's activity only by its number of direct connections. As one solution, this paper suggests a new method of measuring blog network activity using a logistic curve model and the cosine similarity of keywords, implemented with the Map/Reduce programming model.
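
A small sketch of the two measures the method combines (the keyword vectors and curve parameters are illustrative assumptions): cosine similarity between two blogs' keyword-frequency vectors, and a logistic curve L / (1 + e^(-k(t - t0))) that can model how activity grows slowly, spreads rapidly, and then saturates over time.

```python
import math

def cosine_similarity(a, b):
    # a, b: keyword -> frequency dictionaries for two blogs.
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def logistic(t, L=1.0, k=1.0, t0=0.0):
    # Logistic growth: slow start, rapid spread, saturation at L.
    return L / (1.0 + math.exp(-k * (t - t0)))

blog_a = {"mapreduce": 4, "hadoop": 2, "network": 1}
blog_b = {"mapreduce": 1, "network": 3, "diffusion": 2}
print(round(cosine_similarity(blog_a, blog_b), 3))                    # keyword similarity
print([round(logistic(t, k=1.2, t0=5), 2) for t in range(0, 11, 2)])  # diffusion curve
```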

MapReduce-based Localized Linear Regression for Electricity Price Forecasting (전기 가격 예측을 위한 맵리듀스 기반의 로컬 단위 선형회귀 모델)

  • Han, Jinju;Lee, Ingyu;On, Byung-Won
    • The Transactions of the Korean Institute of Electrical Engineers P / v.67 no.4 / pp.183-190 / 2018
  • Predicting electricity prices accurately is an important task in the electricity trading market. Various approaches to the electricity price forecasting problem have been proposed, and linear regression-based approaches are known to be the best. However, the use of such methods is limited by low accuracy and performance. With traditional linear regression it is not practical to find a nonlinear regression model that explains the training data well: if the training data is complex (i.e., few individual records but many features), it is difficult to find a polynomial with n terms that fits the data, while a linear model approximating the nonlinear one loses considerable accuracy because it does not reflect the characteristics of the training data. To cope with this problem, we propose a new electricity price forecasting method that divides the entire dataset into multiple splits and finds the best linear regression model for each split. To improve performance, we further implement the proposed localized linear regression in the map and reduce style, a framework for processing data stored in a Hadoop distributed file system in parallel. Our experimental results show that the proposed model outperforms the existing linear regression model: its accuracy is improved by 45% and it runs five times faster than the existing linear regression-based model.
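
A single-machine sketch of the localized idea (the splitting rule, synthetic data, and ordinary least-squares fit are illustrative assumptions, not the authors' implementation): the data is divided into splits along the input, each split gets its own linear model in the map step, and the reduce step collects the per-split models so each region keeps the model that fits it locally.

```python
import numpy as np

def fit_split(x, y):
    # Map step for one split: ordinary least-squares fit y ~ a*x + b.
    a, b = np.polyfit(x, y, 1)
    return a, b

def localized_fit(x, y, n_splits=8):
    # Divide the data along x and fit one linear model per split;
    # collecting the (range, model) pairs plays the role of the reduce step.
    order = np.argsort(x)
    models = []
    for idx in np.array_split(order, n_splits):
        lo, hi = x[idx].min(), x[idx].max()
        models.append((lo, hi, fit_split(x[idx], y[idx])))
    return models

def predict(models, x0):
    # Use the linear model whose split covers x0.
    for lo, hi, (a, b) in models:
        if lo <= x0 <= hi:
            return a * x0 + b
    lo, hi, (a, b) = models[-1]
    return a * x0 + b

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)   # a nonlinear "price" curve
models = localized_fit(x, y)
print(round(predict(models, 2.0), 2), "vs true", round(np.sin(2.0), 2))
```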