• Title/Summary/Keyword: Map-Reduce

LDBAS: Location-aware Data Block Allocation Strategy for HDFS-based Applications in the Cloud

  • Xu, Hua;Liu, Weiqing;Shu, Guansheng;Li, Jing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.204-226 / 2018
  • Big data processing applications have gradually migrated to the cloud because of the advantages of cloud computing. Hadoop Distributed File System (HDFS) is one of the fundamental support systems for big data processing on MapReduce-like frameworks such as Hadoop and Spark. Because HDFS is not aware of the co-location of virtual machines in the cloud, its default block allocation scheme fits cloud environments poorly in two respects: loss of data reliability and performance degradation. In this paper, we present a novel location-aware data block allocation strategy (LDBAS). LDBAS jointly optimizes data reliability and performance for upper-layer applications by allocating data blocks according to the locations and differing processing capacities of virtual nodes in the cloud. We apply LDBAS to two stages of data allocation in HDFS in the cloud (initial data allocation and data recovery) and design the corresponding algorithms. Finally, we implement LDBAS in an actual Hadoop cluster and evaluate its performance with the benchmark suite BigDataBench. The experimental results show that LDBAS guarantees the designed data reliability while reducing the job execution time of I/O-intensive applications in Hadoop by 8.9% on average and by up to 11.2% compared with the original Hadoop in the cloud.

Processing large-scale data with Apache Spark (Apache Spark를 활용한 대용량 데이터의 처리)

  • Ko, Seyoon;Won, Joong-Ho
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1077-1094 / 2016
  • Apache Spark is a fast, general-purpose cluster computing package. It provides a new abstraction, the resilient distributed dataset, which supports fault tolerance while keeping data in memory. This abstraction yields a significant speedup over the legacy large-scale data framework MapReduce. In particular, Spark is well suited to iterative machine learning applications such as logistic regression and K-means clustering, and to interactive data querying. Thanks to its versatility, Spark also supports high-level libraries for applications such as machine learning, streaming data processing, database querying, and graph data mining. In this work, we introduce the concept and programming model of Spark and show implementations of some simple statistical computing applications. We also review the machine learning package MLlib and the R language interface SparkR.

Route Optimization Scheme in Nested NEMO Environment based on Prefix Delegation (프리픽스 할당에 기반한 중첩된 NEMO 환경에서의 경로최적화 기법)

  • Rho, Kyung-Taeg;Kang, Jeong-Jin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.8 no.5 / pp.95-103 / 2008
  • The Network Mobility (NEMO) basic support protocol extends the operation of Mobile IPv6 to provide uninterrupted Internet connectivity to the communicating nodes of mobile networks. In nested mobile networks the protocol is inefficient: it incurs delays in data delivery and higher overhead because it uses sub-optimal routing and multiple encapsulation of data packets. In this paper, we propose a scheme that combines Hierarchical Mobile IPv6 (HMIPv6) functionality with the Hierarchical Prefix Delegation (HPD) protocol for IPv6 to provide more effective route optimization and to reduce packet header overhead and the burden of location registration at handoff. The scheme also uses hierarchical mobile network prefix (HMNP) assignment and a tree-based routing mechanism to allocate the location addresses of mobile network nodes (MNNs) and to support micro-mobility and intra-domain data communication. The performance is evaluated using NS-2.

A Design of Parallel Turbo Decoder based on Double Flow Method Using Even-Odd Cross Mapping (짝·홀 교차 사상을 이용한 Double Flow 기법 기반 병렬 터보 복호기 설계)

  • Jwa, Yu-Cheol;Rim, Chong-Suck
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.7 / pp.36-46 / 2017
  • The turbo code, an error correction code, needs a long decoding time because the same decoding process must be repeated several times to obtain good BER performance. Parallel processing can reduce the decoding time, but it may cause memory contention that requires additional buffers. QPP interleaving has been proposed to avoid such contention, but memory contention is still possible when a decoder is built using the so-called double flow technique. In this paper, we propose an even-odd cross mapping technique that avoids memory conflicts even when decoding with the double flow technique. The method exploits the address generation characteristic of QPP interleaving and can be used to implement the interleaving circuit between the decoding blocks and the LLR memory blocks. Compared with a decoder built with the conventional MDF technique, a decoder that applies the double flow technique and the proposed method reduces decoding time by up to 32% at the cost of an 8% increase in total area.
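
The contention-free behavior of QPP interleaving referred to in this abstract can be illustrated with a small sketch. It assumes the standard QPP form pi(i) = (f1*i + f2*i^2) mod K, with K = 40, f1 = 3, f2 = 10 as example parameters; the function names and the window-to-bank layout are illustrative assumptions and are not taken from the paper. It covers only the basic single-flow case, not the double flow technique or the proposed even-odd cross mapping.

```python
def qpp_interleave(i, K, f1, f2):
    """Interleaved address for index i under pi(i) = (f1*i + f2*i^2) mod K."""
    return (f1 * i + f2 * i * i) % K


def is_contention_free(K, f1, f2, num_decoders):
    """Check that parallel decoders never read the same memory bank in one step.

    Assumed layout (illustrative): the K addresses are split into
    num_decoders windows of size W = K // num_decoders, and bank b holds
    addresses [b*W, (b+1)*W). At step t, decoder p reads address
    pi(t + p*W); the step is contention-free when all banks differ.
    """
    if K % num_decoders != 0:
        raise ValueError("num_decoders must divide K")
    W = K // num_decoders
    for t in range(W):
        banks = {
            qpp_interleave(t + p * W, K, f1, f2) // W
            for p in range(num_decoders)
        }
        if len(banks) != num_decoders:
            return False  # two decoders hit the same bank at step t
    return True
```

Calling `is_contention_free(40, 3, 10, 4)` checks the four-decoder split of a 40-bit block; other block sizes need their own (f1, f2) pair.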

Yet Another BGP Archive Forensic Analysis Tool Using Hadoop and Hive (하둡과 하이브를 이용한 BGP 아카이브 데이터의 포렌직 분석 툴)

  • Lee, Yeonhee;Lee, YoungSeok
    • Journal of KIISE / v.42 no.4 / pp.541-549 / 2015
  • A large, continuously growing volume of BGP data files raises two technical challenges: scalability and manageability. With the recent development of Hadoop, the open-source distributed computing infrastructure, it has become feasible to handle large amounts of data in a scalable manner. In this paper, we present a new Hadoop-based BGP tool (BGPdoop) that provides scale-out performance as well as extensible and agile analysis capability. In particular, BGPdoop realizes a query-based BGP record exploration function using Hive on a partitioned BGP data structure, which enables flexible and versatile analytics of BGP archive files. In scalability experiments on a Hadoop cluster of 20 nodes, we demonstrate that BGPdoop achieves 5 times higher performance, and we demonstrate its user-defined analysis capability by expressing diverse BGP routing analytics as Hive queries.

GIS Landslide Hazard Mapping Using Root Strength Reinforcement Model (GIS 기반의 뿌리보강모델을 활용한 산사태 위험도 작성방법 연구)

  • Kim, Min-Gu;Yoon, Young-Jin;Chang, Hoon
    • Journal of Korean Society for Geospatial Information Science / v.13 no.1 s.31 / pp.45-53 / 2005
  • About 70% of Korea is mountainous, and landslides cause damage that increases annually with industrial development and the expansion of urban areas. Serious damage occurred in the Kangreng area of Gangwondo during Typhoon Rusa in August 2002. Landslide hazard mapping is needed to forecast and manage landslide hazard areas. GIS suits landslide hazard mapping because it reduces time, manpower, and cost, allows a broad area to be investigated, and supports spatial forecast modeling at the study site. Various methods exist for landslide hazard mapping. This study uses a root strength reinforcement model, which takes factors such as slope topography, soil depth, soil properties, and slope vegetation. A landslide hazard map is produced with this model.

On the Height Growth of Several Species growing in the Middle Korea (국내주요수종의 수고생장에 대하여)

  • Ma, Sang-Kyu
    • Journal of Korean Society of Forest Science / v.21 no.1 / pp.39-45 / 1974
  • The objectives of this study are to derive an empirical equation suitable for estimating the height growth process and to obtain information that could reduce the percent error in determining the site index. The height growth process for each sample plot was constructed from the height-age curve obtained through stem analysis of the dominant tree. Eleven equations were fitted as functional forms relating age to dominant height for Alnus hirsuta, Larix leptolepis, L. gmelini, P. koraiensis, Abies holophylla, P. rigida, Q. acutissima, and Q. aliena. The suitable equation for estimating the guide curve was selected as the one with the highest correlation and a low standard error, by comparison on the age-height distribution map shown in Fig. 2. According to the calculated results, the suitable equation differs among species. The height growth process is occasionally found to be polymorphic in its growth pattern, as shown in Fig. 3. As a result, the site index is found in practice not to be constant but to vary with age within the same plot, as in Example 5. It is therefore concluded that estimating the site index from the height-growth curve over age drawn by stem analysis, or from the polymorphic curve, is suitable for the study of site evaluation.

Lineament analysis in the Euiseong area using automatic lineament extraction algorithm (자동 선구조 추출 알고리즘을 이용한 경북 의성지역의 선구조 분석)

  • Kim, Sang-Wan
    • Economic and Environmental Geology / v.32 no.1 / pp.19-31 / 1999
  • In this study, we estimated lineaments in the Euiseong area, Kyungbuk Province, from Landsat TM imagery by applying the algorithm developed by Kim and Won et al., which effectively reduces the look direction bias associated with the Sun's azimuth angle. Fractures over the study area were also mapped in the field at 57 selected sites for comparison with the results from the satellite image. The lineaments estimated from the Landsat TM images show trends of $N50^{\circ}$~$70^{\circ}$W, NS~$N10^{\circ}$W, and $N10^{\circ}$~$60^{\circ}$E. The spatial distribution of lineaments was also studied using a circular grid, and the results show that the area can be divided into two domains: domain A, in which the NS~$N20^{\circ}$E direction is dominant, and domain B, in which the west-north-west direction is prominent. The lineament trends can also be classified into seven groups. Among them, only the C, D, and G trends are found to be dominant, based on Donnelly's nearest neighbor analysis and correlations of lineament densities. In the color composite image produced by overlaying the lineament density maps of the C, D, and G trends, the G trend is developed over the whole study area, while the eastern part of the area is dominated by the D trend. The C trend is developed extensively over the whole area except the southeastern part. The orientations of fractures measured at 35 points in the field show major trends of NS~$N30^{\circ}$E, $N50^{\circ}$~$80^{\circ}$W, and $N80^{\circ}$E~EW, which agree relatively well with the lineaments estimated from the satellite image. The rose diagram analysis of the field data shows that WNW-ESE trending discontinuities are developed over the whole area, while discontinuities of NS~$N20^{\circ}$E are developed only in the eastern part, which also coincides with the result from the satellite image. The combined results of lineaments from the satellite image and fracture orientations from field data at 22 points, including 18 minor faults in the Sindong Group, imply that the WNW-ESE trend is so prominent that the Gumchun and Gaum faults possibly extend up to the lower Sindong Group in the study area.

A Border Line-Based Pruning Scheme for Shortest Path Computations

  • Park, Jin-Kyu;Moon, Dae-Jin;Hwang, Een-Jun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.5 / pp.939-955 / 2010
  • With the progress of IT and mobile positioning technologies, various types of location-based services (LBS) have been proposed and implemented. Finding a shortest path between two nodes is one of the most fundamental tasks in many LBS-related applications, and there have been many research efforts on the shortest path finding problem. For instance, the A* algorithm evaluates neighboring nodes using a heuristic function and selects the minimum-cost node as the one closest to the destination. The pruning method, which is known to outperform the A* algorithm, improves routing performance by avoiding unnecessary exploration of the search space. For pruning, shortest paths for all node pairs in a map need to be pre-computed, and from these a shortest path container is generated for each edge. The container for an edge consists of all the destination nodes whose shortest path passes through the edge, and possibly some unnecessary nodes. These containers are used during routing to prune unnecessary node visits. However, this method performs poorly as the number of unnecessary nodes included in the containers increases. In this paper, we focus on this problem and propose a new border line-based pruning scheme for path routing that can significantly reduce the number of unnecessary node visits. Through extensive experiments on randomly generated maps of varying complexity, we empirically determine the optimal number of border lines for clipping containers and compare the scheme's performance with that of other methods.
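
The container-based pruning described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's method: it stores each container as an exact set of destinations (so it holds no unnecessary nodes) and does not implement the proposed border-line clipping. The function names, the `{node: {neighbor: weight}}` graph layout, and the use of integer edge weights (so that equality tests are exact) are all assumptions made for the example.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source; graph is {node: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def build_containers(graph):
    """Pre-compute, for each directed edge (u, v), the set of destinations t
    for which some shortest path from u to t starts with that edge."""
    all_dist = {u: dijkstra(graph, u) for u in graph}
    containers = {}
    for u in graph:
        for v, w in graph[u].items():
            containers[(u, v)] = {
                t
                for t, d_vt in all_dist[v].items()
                if w + d_vt == all_dist[u].get(t)
            }
    return containers


def pruned_shortest_path(graph, containers, source, target):
    """Dijkstra that skips every edge whose container lacks the target.

    Returns (distance, visited_count), or (None, visited_count) when the
    target is unreachable from the source.
    """
    dist = {source: 0}
    heap = [(0, source)]
    visited = 0
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        visited += 1
        if u == target:
            return d, visited
        for v, w in graph[u].items():
            if target not in containers[(u, v)]:
                continue  # pruned: no shortest path from u to target starts here
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, visited
```

Exact sets make the pre-computed data large on real maps, which is presumably why compact containers that admit some unnecessary nodes are used in practice; the trade-off between container size and pruning power is the problem the abstract addresses.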

TeT: Distributed Tera-Scale Tensor Generator (분산 테라스케일 텐서 생성기)

  • Jeon, ByungSoo;Lee, JungWoo;Kang, U
    • Journal of KIISE / v.43 no.8 / pp.910-918 / 2016
  • A tensor is a multi-dimensional array that can represent many kinds of data, such as (user, user, time) records in a social network system. A tensor generator is an important tool for multi-dimensional data mining research, with applications including simulation, multi-dimensional data modeling and understanding, and sampling and extrapolation. However, existing tensor generators cannot generate sparse tensors that, like real-world tensors, obey a power law. They are also limited in the tensor sizes they can process, and they require additional time to upload a generated tensor to a distributed system for further analysis. In this study, we propose TeT, a distributed tera-scale tensor generator, to solve these problems. TeT generates sparse random tensors as well as sparse R-MAT and Kronecker tensors without any limitation on tensor size. In addition, a TeT-generated tensor is immediately ready for further tensor analysis on the same distributed system. The careful design of TeT yields nearly linear scalability with the number of machines.