• Title/Summary/Keyword: 맵-리듀스 (MapReduce)

Improving the Map/Reduce Model through Data Distribution and Task Progress Scheduling (데이터 분배 및 태스크 진행 스케쥴링을 통한 맵/리듀스 모델의 성능 향상)

  • Hwang, In-Sung; Chung, Kyung-Yong; Rim, Kee-Wook; Lee, Jung-Hyun
    • The Journal of the Korea Contents Association / v.10 no.10 / pp.78-85 / 2010
  • Map/Reduce is a programming model for implementing cloud computing, which has recently attracted attention. The model runs application programs that process large amounts of data on many computers. To use the computing nodes efficiently, it is important to design a mechanism that splits the data into appropriately sized pieces and distributes them efficiently across a cluster of computing nodes. In addition, how the Map and Reduce phases are planned also affects Map/Reduce performance. This paper proposes an efficient distribution scheme that splits large data and runs Map tasks while taking into account the performance of each computing node and the network status. We also tune the mechanism by which Map and Reduce tasks are run so that Reduce tasks can be processed quickly. We evaluated the proposal with two sample Map/Reduce applications, examining how it affects Map/Reduce performance.
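The following is a minimal sketch of performance-aware input splitting, not the authors' scheme. It assumes each node is described by a hypothetical `cpu_score` and a measured `bandwidth_mbps`, blends them with an arbitrary 0.7/0.3 weighting, and gives each node a share of the input proportional to the resulting capacity score.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_score: float       # hypothetical relative compute speed
    bandwidth_mbps: float  # hypothetical measured network bandwidth


def node_weight(node: Node, cpu_share: float = 0.7) -> float:
    """Blend compute speed and bandwidth into one capacity score.

    The 0.7/0.3 blend and the /100 bandwidth scaling are illustrative
    assumptions, not the formula from the paper.
    """
    return cpu_share * node.cpu_score + (1.0 - cpu_share) * node.bandwidth_mbps / 100.0


def split_input(total_bytes: int, nodes: list[Node]) -> dict[str, int]:
    """Assign each node a number of bytes proportional to its capacity score."""
    weights = [node_weight(n) for n in nodes]
    total_weight = sum(weights)
    shares = {n.name: int(total_bytes * w / total_weight) for n, w in zip(nodes, weights)}
    # Give the rounding remainder to the strongest node so every byte is assigned.
    remainder = total_bytes - sum(shares.values())
    strongest = max(zip(nodes, weights), key=lambda pair: pair[1])[0].name
    shares[strongest] += remainder
    return shares


cluster = [Node("fast", 2.0, 1000.0), Node("slow", 1.0, 100.0)]
shares = split_input(5400, cluster)
```

Under these assumed weights, the node with twice the compute score and ten times the bandwidth receives about 4.4 times as much data as the other.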

Design of a Sentiment Analysis System to Prevent School Violence and Student's Suicide (학교폭력과 자살사고를 예방하기 위한 감성분석 시스템의 설계)

  • Kim, YoungTaek
    • The Journal of Korean Association of Computer Education / v.17 no.6 / pp.115-122 / 2014
  • One of the problems of the current youth generation is the rising rate of violence and suicide in school life, and this study designs a sentiment analysis system that uses big data processing to help prevent suicide. The main design goals are economical implementation and easy, fast processing for users, so the experiments use the open-source Hadoop system, running MapReduce on HDFS (Hadoop Distributed File System). To avoid expensive and complex statistical analysis methods, the study performs sentiment analysis with a text-mining word count method applied to informal data from SNS communications containing various kinds of violent words.
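A minimal Python sketch of the word-count approach, written as explicit map and reduce functions rather than as a Hadoop job. The lexicon `VIOLENT_WORDS` and the sample posts are hypothetical placeholders; the paper's word list and SNS data are not reproduced here.

```python
from collections import Counter
from itertools import chain

# Hypothetical lexicon of violence-related words (not the paper's list).
VIOLENT_WORDS = {"hit", "punch", "threat", "bully", "die"}


def map_phase(post: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every lexicon word found in one SNS post."""
    words = (tok.strip(".,!?") for tok in post.lower().split())
    return [(w, 1) for w in words if w in VIOLENT_WORDS]


def reduce_phase(pairs) -> Counter:
    """Reduce: sum the counts per word, like a WordCount reducer."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


posts = [
    "They said they would hit me again tomorrow.",
    "Another threat today. I want to die.",
    "Nice weather, went to the park.",
]
counts = reduce_phase(chain.from_iterable(map_phase(p) for p in posts))
```

In a real deployment the map function would run per input split on HDFS and the framework would group pairs by word before the reducer; here `chain` simply concatenates the mapper outputs.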

Clustering Algorithm using the DFP-Tree based on the MapReduce (맵리듀스 기반 DFP-Tree를 이용한 클러스터링 알고리즘)

  • Seo, Young-Won; Kim, Chang-soo
    • Journal of Internet Computing and Services / v.16 no.6 / pp.23-30 / 2015
  • As big data has become a major issue, many applications that operate on the results of data analysis have been developed; typical examples are product recommendation services in e-commerce systems, search services in search engines, and friend recommendation systems in social network services. In this paper, we propose a decision frequent pattern tree (DFP-Tree), which combines the original frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree, which is grounded in computer science theory. The DFP-Tree algorithm addresses a problem of the frequent pattern tree: it generates so many patterns that the data becomes hard to analyze. We also propose a model for the MapReduce framework, a programming model that supports operation in distributed environments.
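The DFP-Tree is not described here in enough detail to reproduce, so the sketch below shows only the standard FP-tree construction it builds on, with item counting split into a map step (per partition) and a reduce step (merging counts). The partition layout, `min_support`, and all names are illustrative assumptions.

```python
from collections import Counter
from functools import reduce


class FPNode:
    """One FP-tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def map_count(partition):
    """Map: count how many transactions in one partition contain each item."""
    return Counter(item for tx in partition for item in set(tx))


def reduce_count(a, b):
    """Reduce: merge partial item counts from two partitions."""
    return a + b


def build_fp_tree(transactions, item_counts, min_support):
    """Insert each transaction's frequent items in descending-frequency order."""
    frequent = {i: c for i, c in item_counts.items() if c >= min_support}
    root = FPNode(None, None)
    for tx in transactions:
        # Ties in frequency are broken alphabetically so the order is deterministic.
        ordered = sorted((i for i in set(tx) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


partitions = [[["a", "b"], ["b", "c", "d"]], [["a", "b", "c"], ["b", "d"]]]
counts = reduce(reduce_count, map(map_count, partitions), Counter())
transactions = [tx for part in partitions for tx in part]
root, frequent = build_fp_tree(transactions, counts, min_support=2)
```

Shared prefixes collapse into one path: all four transactions start with the most frequent item `b`, so the root has a single child with count 4.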

Design and Implementation of a Large-Scale Spatial Reasoner Using MapReduce Framework (맵리듀스 프레임워크를 이용한 대용량 공간 추론기의 설계 및 구현)

  • Nam, Sang Ha; Kim, In Cheol
    • KIPS Transactions on Software and Data Engineering / v.3 no.10 / pp.397-406 / 2014
  • To answer questions successfully in place of humans in DeepQA environments such as the American quiz show Jeopardy!, a computer needs fast temporal and spatial reasoning over a large-scale commonsense knowledge base. In this paper, we present a scalable spatial reasoning algorithm that efficiently derives new directional and topological relations using the MapReduce framework, a well-known parallel distributed computing environment. The algorithm takes as input a large-scale spatial knowledge base containing CSD-9 directional relations and RCC-8 topological relations. To infer new directional and topological relations from this knowledge base, it performs cross-consistency checks as well as path-consistency checks. To maximize the parallelism of the reasoning computations in line with the MapReduce principle, the algorithm partitions the large knowledge base into smaller ones and distributes them over multiple computing nodes in the map phase. Then, in the reduce phase, it infers new knowledge from the distributed spatial knowledge bases. Experiments on a sample knowledge base with a MapReduce-based implementation of the algorithm demonstrated the high performance of the large-scale spatial reasoner.
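To illustrate how a map phase can route facts so that a reduce phase can compose them, here is a hedged sketch of composition-based inference repeated over MapReduce-style rounds until no new fact appears. It uses a hypothetical five-entry composition table that keeps only unambiguous CSD-9/RCC-8 compositions, and it performs only the composition step of path consistency: there is no intersection of alternative relations and no cross-consistency check between directional and topological relations.

```python
from collections import defaultdict

# Deliberately tiny composition table: only entries whose result is a single
# base relation. The real CSD-9 and RCC-8 tables are much larger, and many
# entries are disjunctions of relations.
COMPOSE = {
    ("NTPP", "NTPP"): "NTPP",
    ("TPP", "NTPP"): "NTPP",
    ("NTPP", "TPP"): "NTPP",
    ("N", "N"): "N",  # north of something north of c is north of c
    ("S", "S"): "S",
}


def map_fact(fact):
    """Map: key fact (a, rel, b) by both endpoints so chains a-b-c meet at key b."""
    a, rel, b = fact
    yield b, ("in", a, rel)   # the fact arrives at b
    yield a, ("out", b, rel)  # the fact leaves a


def reduce_key(values):
    """Reduce: compose each incoming relation with each outgoing one at the shared key."""
    incoming = [(other, rel) for tag, other, rel in values if tag == "in"]
    outgoing = [(other, rel) for tag, other, rel in values if tag == "out"]
    for a, r1 in incoming:
        for c, r2 in outgoing:
            composed = COMPOSE.get((r1, r2))
            if composed is not None and a != c:
                yield a, composed, c


def infer_relations(facts):
    """Run map/shuffle/reduce rounds until a round derives nothing new."""
    known = set(facts)
    while True:
        groups = defaultdict(list)  # stands in for the shuffle phase
        for fact in known:
            for key, value in map_fact(fact):
                groups[key].append(value)
        new = {f for values in groups.values() for f in reduce_key(values)} - known
        if not new:
            return known
        known |= new


facts = {
    ("room", "NTPP", "floor"),
    ("floor", "NTPP", "building"),
    ("building", "NTPP", "campus"),
    ("A", "N", "B"),
    ("B", "N", "C"),
}
closure = infer_relations(facts)
```

Because each round composes only chains of length two, the fact `room NTPP campus` appears in the second round, after `room NTPP building` has been derived in the first.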

A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories (분산 클러스터 메모리 기반 대용량 OWL Horst Lite 온톨로지 추론 기법)

  • Kim, Je-Min; Park, Young-Tack
    • Journal of KIISE / v.42 no.3 / pp.307-319 / 2015
  • Current ontology studies use the Hadoop distributed storage framework to perform MapReduce-based reasoning over large-scale ontologies. In this paper, by contrast, we propose a novel approach to scalable Web Ontology Language (OWL) Horst Lite ontology reasoning based on distributed cluster memories. Rule-based reasoning, which is commonly used for large-scale ontologies, repeatedly applies ontology rules over triples until no new data can be inferred. As a result, when large-scale ontology reasoning is performed on hard drives, the reasoner suffers from performance limitations. To overcome this drawback, we load the ontologies into distributed cluster memories using Spark, a memory-based distributed computing framework, which then executes the ontology reasoning. To implement OWL Horst Lite reasoning on Spark, our method divides the large ontology into blocks, loads each block onto the cluster nodes, and processes the data in distributed memory. We evaluated the proposed methods experimentally with the Lehigh University Benchmark (LUBM), which measures ontology inference and search speed, on LUBM8000 (1.1 billion triples, 155 gigabytes). Compared with WebPIE, a representative MapReduce-based scalable ontology reasoner, the proposed approach achieved a throughput of 62k/s, about 3.2 times that of WebPIE (19k/s).
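The "apply rules until nothing new is inferred" loop can be sketched in plain Python, with an in-memory set standing in for cluster memory. Only two RDFS-style rules from the OWL Horst rule set are included (`rdfs9` and `rdfs11`), the block partitioning across nodes is omitted, and the class names are made-up, LUBM-like examples.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def apply_rules(triples):
    """One round of rdfs11 and rdfs9 over an in-memory set of (s, p, o) triples."""
    superclasses = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            superclasses[s].add(o)
    derived = set()
    for s, p, o in triples:
        if p == SUBCLASS:
            # rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
            for c in superclasses.get(o, ()):
                derived.add((s, SUBCLASS, c))
        elif p == TYPE:
            # rdfs9: (x type A), (A subClassOf B) -> (x type B)
            for b in superclasses.get(o, ()):
                derived.add((s, TYPE, b))
    return derived


def materialize(triples):
    """Apply the rules repeatedly until a round infers no new triple (a fixpoint)."""
    closure = set(triples)
    rounds = 0
    while True:
        rounds += 1
        new = apply_rules(closure) - closure
        if not new:
            return closure, rounds
        closure |= new


data = {
    ("GraduateStudent", SUBCLASS, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("alice", TYPE, "GraduateStudent"),
}
closure, rounds = materialize(data)
```

The last round infers nothing and only confirms termination; when every round must reread the triples from disk, these repeated passes are what make disk-based reasoning slow.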

Design of a MapReduce-Based Mobility Pattern Mining System for Next Place Prediction (다음 장소 예측을 위한 맵리듀스 기반의 이동 패턴 마이닝 시스템 설계)

  • Kim, Jongwhan; Lee, Seokjun; Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.3 no.8 / pp.321-328 / 2014
  • In this paper, we present a MapReduce-based mobility pattern mining system that efficiently predicts the next place a mobile user will visit. It learns a mobility pattern model for each user, represented as a Hidden Markov Model (HMM), from a large-scale trajectory dataset, and then predicts the user's next place by applying the learned model to the current trajectory. The system consists of two parts: a back end, in which mobility pattern models are learned for individual users, and a front end, in which the next place for a given user is predicted from those models. The back end comprises three MapReduce modules, for POI extraction, trajectory transformation, and mobility pattern model learning, while the front end has two modules, for candidate route generation and next place prediction. The map and reduce functions of each module were designed to make full use of the underlying Hadoop infrastructure and maximize parallel processing. Experiments on GeoLife, a large-scale open benchmark dataset, confirmed the high performance of the proposed system.
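As a simplified stand-in for the paper's HMM-based models, the sketch below learns a first-order Markov chain over visited places: the map step emits consecutive place pairs per user, the reduce step counts them, and prediction picks the most frequent successor. The HMM, POI extraction, and candidate route generation are not modeled, and the place names are hypothetical.

```python
from collections import Counter, defaultdict


def map_trajectory(user, places):
    """Map: emit ((user, prev_place, next_place), 1) for each consecutive visit pair."""
    for prev, nxt in zip(places, places[1:]):
        yield (user, prev, nxt), 1


def reduce_counts(pairs):
    """Reduce: sum the transition counts per (user, prev, next) key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def build_model(counts):
    """Turn counts into a per-user, per-place table of successor frequencies."""
    model = defaultdict(Counter)
    for (user, prev, nxt), n in counts.items():
        model[(user, prev)][nxt] += n
    return model


def predict_next(model, user, current):
    """Return the most frequent successor of `current`, or None if never seen."""
    options = model.get((user, current))
    if not options:
        return None
    return options.most_common(1)[0][0]


trajectories = {"u1": ["home", "cafe", "office", "home", "cafe", "office", "home", "gym"]}
pairs = (pair for user, places in trajectories.items() for pair in map_trajectory(user, places))
model = build_model(reduce_counts(pairs))
```

Keying by user keeps each person's model separate, matching the per-user learning described in the abstract; an HMM would additionally model hidden states behind the observed places.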

Big Data Analysis Platform Technology R&D Trend through Patent Analysis (특허분석을 통한 빅데이터 분석 플랫폼 기술 개발 동향)

  • Rho, Seungmin
    • Journal of Digital Convergence / v.12 no.9 / pp.169-175 / 2014
  • The ICT (information and communication technology) paradigm shift, including the rapid spread of mobile devices, SNS, and smart devices, has led to an explosion of data along with changes in lifestyle, and we have arrived at the age of big data. Meanwhile, using large quantities of data has raised a number of cost and technical difficulties. However, big data began to attract attention with the advent of efficient big data technologies such as Hadoop. In this paper, we analyze patents on big data platform technology research and development in major countries. In particular, we analyzed 2,568 patent applications and registered patents from four countries as of December 2010.

Technical Entity Recognition System based on Distributed Parallel Processing (분산병렬처리 기반 기술개체 인식 시스템)

  • Choi, Yun-Soo; Lee, Won-Goo; Lee, Min-Ho; Choi, Dong-Hoon; Yoon, Hwa-Mook; Cho, Min-Hee; Jeong, Han-Min
    • Proceedings of the Korean Information Science Society Conference / 2012.06a / pp.242-244 / 2012
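  • Research on technical entity recognition in scientific and technical literature has been pursued as groundwork for information extraction, text mining, and question answering, using various statistical methods to improve recognition accuracy. However, because existing studies were carried out on a single core or a single machine, they cannot meet the demand for real-time analysis of an explosively growing body of literature. This paper therefore identifies the bottleneck tasks in technical entity recognition as the language-processing part of the "candidate entity extraction" step and the part of the "entity weight assignment" step that aggregates statistics, and examines a distributed parallel technical entity recognition method that handles these tasks with Hadoop map tasks and reduce tasks, respectively.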
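The split described in this abstract (language processing in the map task, statistics aggregation in the reduce task) can be sketched as follows. Candidate extraction here is a naive unigram/bigram filter with English regex tokenization and a hypothetical stopword list, and the TF-IDF-style weight is an assumed stand-in for the paper's weighting scheme.

```python
import math
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "of", "for", "and", "in", "on", "is", "to", "with"}  # hypothetical


def map_document(doc_id, text):
    """Map: extract unigram/bigram candidates without stopwords; emit (candidate, doc_id)."""
    tokens = re.findall(r"[a-z0-9-]+", text.lower())
    for n in (1, 2):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if any(t in STOPWORDS for t in gram):
                continue
            yield " ".join(gram), doc_id


def reduce_candidates(pairs, num_docs):
    """Reduce: aggregate term and document frequency, then weight each candidate."""
    tf = defaultdict(int)
    docs = defaultdict(set)
    for cand, doc_id in pairs:
        tf[cand] += 1
        docs[cand].add(doc_id)
    # Assumed TF-IDF-style weight; the paper's actual formula is not given here.
    return {c: tf[c] * math.log(1 + num_docs / len(docs[c])) for c in tf}


corpus = {
    "d1": "Hadoop MapReduce for text mining",
    "d2": "text mining with Hadoop MapReduce",
}
pairs = (p for doc_id, text in corpus.items() for p in map_document(doc_id, text))
weights = reduce_candidates(pairs, num_docs=len(corpus))
```

The map step needs only one document at a time, so it parallelizes freely; the reduce step needs every occurrence of a candidate, which is why the statistics aggregation sits after the shuffle.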

A MapReduce-Based Workflow BIG-Log Clustering Technique (맵리듀스기반 워크플로우 빅-로그 클러스터링 기법)

  • Jin, Min-Hyuck; Kim, Kwanghoon Pio
    • Journal of Internet Computing and Services / v.20 no.1 / pp.87-96 / 2019
  • In this paper, we propose a MapReduce-supported clustering technique that collects and classifies distributed workflow enactment event logs as a preprocessing tool. We call these distributed workflow enactment event logs Workflow BIG-Logs, because they fit the 5V properties of big data: Volume, Velocity, Variety, Veracity, and Value. The clustering technique is designed specifically for the preprocessing phase of a workflow process mining and analysis algorithm that operates on Workflow BIG-Logs. In other words, it uses the MapReduce framework as the processing platform for Workflow BIG-Logs, supports the IEEE XES standard data format, and is dedicated to the preprocessing phase of the ρ-Algorithm, a typical workflow process mining algorithm based on structured information control nets. More precisely, Workflow BIG-Logs can be clustered into two types of patterns, activity-based and performer-based, and we implement an activity-based clustering pattern algorithm on the MapReduce framework. Finally, we verify the proposed clustering technique through an experimental study on the workflow enactment event log dataset released by the BPI Challenges.
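A minimal sketch of activity-based grouping over an XES-style log: the map step emits each event keyed by its activity name (`concept:name`), and the reduce step collects one cluster per activity. The log is a tiny hypothetical, namespace-free XES snippet (files that declare an XML namespace need namespace-qualified tag names), and this simple grouping is only an assumed reading of "activity-based clustering pattern", not the ρ-Algorithm preprocessing itself.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical XES snippet: two traces (cases) with activity and performer attributes.
XES = """<log>
  <trace>
    <string key="concept:name" value="case-1"/>
    <event><string key="concept:name" value="Register"/><string key="org:resource" value="Kim"/></event>
    <event><string key="concept:name" value="Approve"/><string key="org:resource" value="Lee"/></event>
  </trace>
  <trace>
    <string key="concept:name" value="case-2"/>
    <event><string key="concept:name" value="Register"/><string key="org:resource" value="Park"/></event>
  </trace>
</log>"""


def attrs(elem):
    """Collect the direct <string key=... value=...> children of an XES element."""
    return {s.get("key"): s.get("value") for s in elem.findall("string")}


def map_trace(trace):
    """Map: emit (activity, (case_id, performer)) for every event in one trace."""
    case_id = attrs(trace).get("concept:name")
    for event in trace.findall("event"):
        a = attrs(event)
        yield a.get("concept:name"), (case_id, a.get("org:resource"))


def reduce_by_activity(pairs):
    """Reduce: group the emitted events into one cluster per activity."""
    clusters = defaultdict(list)
    for activity, occurrence in pairs:
        clusters[activity].append(occurrence)
    return dict(clusters)


root = ET.fromstring(XES)
clusters = reduce_by_activity(
    pair for trace in root.findall("trace") for pair in map_trace(trace)
)
```

Keying the same map output by `org:resource` instead of `concept:name` would give the performer-based variant mentioned in the abstract.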

Large Scale Incremental Reasoning using SWRL Rules in a Distributed Framework (분산 처리 환경에서 SWRL 규칙을 이용한 대용량 점증적 추론 방법)

  • Lee, Wan-Gon; Bang, Sung-Hyuk; Park, Young-Tack
    • Journal of KIISE / v.44 no.4 / pp.383-391 / 2017
  • As we enter a new era of big data, the amount of semantic data has increased rapidly. To derive meaningful information from this large volume of semantic data, studies using SWRL (Semantic Web Rule Language) are being actively conducted. SWRL rules are built from users' empirical knowledge. However, conventional reasoning systems developed for single machines cannot process large-scale data, and multi-node reasoning systems suffer performance degradation due to network shuffling. This paper therefore proposes more efficient distributed inference methods that overcome the limitations of existing systems. It introduces data partitioning strategies that minimize network shuffling, and it describes a method for optimizing the incremental reasoning process through data selection and rule ordering. To evaluate the proposed methods, experiments were conducted on WiseKB, which consists of 200 million triples, with 83 user-defined rules; the overall reasoning task completed in 32.7 minutes. In addition, experiments on the LUBM benchmark datasets showed that the approach performs reasoning twice as fast as MapReduce-based reasoning systems.
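The partitioning and incremental ideas can be sketched with one hypothetical SWRL-style rule, `hasParent(?x, ?y) ∧ hasBrother(?y, ?z) → hasUncle(?x, ?z)`. Routing both body atoms by the join variable `?y` places every matching pair in the same partition, so the join needs no further shuffle; the incremental step joins only new triples against the full data. This is an illustrative sketch, not the paper's partitioning strategy or rule ordering.

```python
from collections import defaultdict


def partition_for_rule(triples, num_partitions):
    """Hash-partition triples on the join variable ?y so the join stays local."""
    parts = defaultdict(lambda: {"hasParent": [], "hasBrother": []})
    for s, p, o in triples:
        if p == "hasParent":     # hasParent(?x, ?y): ?y is the object
            parts[hash(o) % num_partitions]["hasParent"].append((s, o))
        elif p == "hasBrother":  # hasBrother(?y, ?z): ?y is the subject
            parts[hash(s) % num_partitions]["hasBrother"].append((s, o))
    return parts


def join_partition(part):
    """Apply hasParent(?x,?y) ^ hasBrother(?y,?z) -> hasUncle(?x,?z) in one partition."""
    brothers = defaultdict(list)
    for y, z in part["hasBrother"]:
        brothers[y].append(z)
    return {(x, "hasUncle", z) for x, y in part["hasParent"] for z in brothers[y]}


def infer(triples, num_partitions=4):
    """Full (non-incremental) evaluation: join every partition independently."""
    out = set()
    for part in partition_for_rule(triples, num_partitions).values():
        out |= join_partition(part)
    return out


def infer_incremental(existing, delta, num_partitions=4):
    """Derive only facts whose rule body uses at least one triple from delta."""
    everything = existing | delta
    parents = {t for t in everything if t[1] == "hasParent"}
    brothers = {t for t in everything if t[1] == "hasBrother"}
    new_parents = {t for t in delta if t[1] == "hasParent"}
    new_brothers = {t for t in delta if t[1] == "hasBrother"}
    # (new parents joined with all brothers) union (all parents joined with new brothers)
    return (infer(new_parents | brothers, num_partitions)
            | infer(parents | new_brothers, num_partitions))


existing = {("tom", "hasParent", "ann"), ("ann", "hasBrother", "bob")}
```

Python randomizes string hashing per process, so partition numbers can differ between runs, but both atoms that share a `?y` value always land in the same partition within a run, so the derived facts are the same.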