• Title/Summary/Keyword: model-based cluster

An Analytical Approach to Evaluation of SSD Effects under MapReduce Workloads

  • Ahn, Sungyong; Park, Sangkyu
    • JSTS: Journal of Semiconductor Technology and Science / v.15 no.5 / pp.511-518 / 2015
  • As the cost-per-byte of SSDs dramatically decreases, introducing SSDs to Hadoop becomes an attractive choice for high-performance data processing. In this paper, the cost-per-performance of an SSD-based Hadoop cluster (SSD-Hadoop) and an HDD-based Hadoop cluster (HDD-Hadoop) is evaluated. For this, we propose a MapReduce performance model that uses a queuing network to simulate the execution time of a MapReduce job with varying cluster size. To achieve an accurate model, the execution time distribution of MapReduce jobs is carefully profiled. The developed model predicts the execution time of MapReduce jobs with less than 7% difference in most cases. Simulations with a varying number of cluster nodes also show that SSD-Hadoop is 20% more cost efficient than HDD-Hadoop, because SSD-Hadoop needs fewer nodes than HDD-Hadoop to achieve comparable performance.

Cluster Based Fuzzy Model Tree Using Node Information (상호 노드 정보를 이용한 클러스터 기반 퍼지 모델트리)

  • Park, Jin-Il; Lee, Dae-Jong; Kim, Yong-Sam; Cho, Young-Im; Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.41-47 / 2008
  • A cluster-based fuzzy model tree has the drawback that its performance on testing data decreases when the training data are over-fitted. To reduce this sensitivity to over-fitting, we propose a modified cluster-based fuzzy model tree that uses node information. To construct the model tree, cluster centers are first calculated by a fuzzy clustering method using all input and output attributes. Linear models are then constructed at internal nodes using the fuzzy membership values between the centers and the input attributes. In the prediction step, membership values are calculated from the fuzzy distance between the input attributes and all centers on the path from the root to the leaf nodes. Finally, prediction is performed by a weighted average of the linear models and fuzzy membership values. To show the effectiveness of the proposed method, we applied it to various datasets. In these experiments, the proposed method performs better than the conventional cluster-based fuzzy model tree.
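
The prediction step described in this abstract (membership values from distances to cluster centers, then a weighted average of linear models) can be illustrated with a minimal sketch. The abstract does not give the exact membership formula, so the standard fuzzy c-means membership with fuzzifier `m` is assumed here; the function names, the flat list of centers, and the `(weights, bias)` model layout are hypothetical and not taken from the paper.

```python
import math

def fuzzy_memberships(x, centers, m=2.0):
    """Fuzzy c-means style memberships of x to each center (they sum to 1).

    Assumption: Euclidean distance and fuzzifier m; the paper's own
    "fuzzy distance" may be defined differently.
    """
    dists = [math.dist(x, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]

def predict(x, centers, models, m=2.0):
    """Membership-weighted average of per-center linear models.

    models[i] is a hypothetical (weights, bias) pair for center i.
    """
    u = fuzzy_memberships(x, centers, m)
    outputs = [sum(w * xi for w, xi in zip(weights, x)) + bias
               for weights, bias in models]
    return sum(ui * yi for ui, yi in zip(u, outputs))

# Usage: a point midway between two centers gets memberships 0.5 and 0.5,
# so the prediction is the plain average of the two model outputs (1 and 3).
centers = [(0.0, 0.0), (2.0, 0.0)]
models = [((1.0, 0.0), 0.0), ((0.0, 0.0), 3.0)]
print(predict((1.0, 0.0), centers, models))  # → 2.0
```

In the tree described above, only the centers on the root-to-leaf path would be passed in; the sketch leaves that selection to the caller.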

Traffic based Estimation of Optimal Number of Super-peers in Clustered P2P Environments

  • Kim, Ju-Gyun; Lee, Jun-Soo
    • Journal of Korea Multimedia Society / v.11 no.12 / pp.1706-1715 / 2008
  • In a super-peer based P2P network, the network is clustered and each cluster is managed by a special peer called a super-peer. A super-peer holds information about all the peers in its cluster. This type of clustered P2P model is known to offer more efficient information search and a lower traffic load than an unclustered P2P model. In this paper, we compute the message traffic cost incurred by peers' query, join, and update actions within a cluster as well as between clusters. With these values, we estimate the optimal number of super-peers that minimizes the traffic cost for super-peer based P2P networks of various sizes.

A Study of Library Grouping using Cluster Analysis Methods (군집분석 기법을 이용한 공공도서관 그룹화에 대한 연구)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.31 no.3 / pp.79-99 / 2020
  • The purpose of this study is to investigate cluster analysis models for grouping public libraries and to analyze their characteristics. Statistical data on public libraries from the National Library Statistics System were used, and three models of cluster analysis were applied. Cluster analysis based on the size of public libraries divided them largely into two clusters, and the cluster sizes were heavily skewed to one side. For grouping based on size, the Ward method of hierarchical cluster analysis and the k-means cluster analysis model were suitable. Three suggestions are presented as implications for grouping public libraries. First, it is necessary to collect library service-related data in addition to statistical data. Second, an analysis model suitable for the data set to be analyzed must be applied. Third, it is necessary to study the possibility of using cluster analysis techniques in fields other than library grouping.

Privacy Protection Model for Location-Based Services

  • Ni, Lihao; Liu, Yanshen; Liu, Yi
    • Journal of Information Processing Systems / v.16 no.1 / pp.96-112 / 2020
  • Solving the disclosure problem of sensitive information with the k-nearest neighbor query, the location dummy technique, or interfering data in location-based services (LBSs) is a new research topic. Although previous studies reduced security threats, they are ineffective in the case of sparse users or K-successive privacy, and their additional calculations deteriorate the performance of LBS application systems. Therefore, a model is proposed herein, which is based on geohash-encoding technology instead of latitude and longitude, a memcached server cluster, encryption and decryption, and authentication. Simulation results based on PHP and MySQL show that the model offers approximately a 10× speedup over the conventional approach. Two problems are solved using the model: sensitive information in the LBS application is not disclosed, and the relationship between an individual and a track is not leaked.

A Re-Ranking Retrieval Model based on Two-Level Similarity Relation Matrices (2단계 유사관계 행렬을 기반으로 한 순위 재조정 검색 모델)

  • 이기영; 은희주; 김용성
    • Journal of KIISE: Software and Applications / v.31 no.11 / pp.1519-1533 / 2004
  • When Web-based specialized retrieval systems for scientific fields severely restrict how users can express their information requests, the process of information content analysis and that of information acquisition become inconsistent. In this paper, we apply the fuzzy retrieval model and address the high time complexity of the retrieval system by constructing a reduced term set based on the relative importance of terms. Furthermore, we perform cluster retrieval so that the user's query is reflected exactly, through a similarity relation matrix that satisfies the characteristics of the fuzzy compatibility relation. We demonstrate the performance of the proposed re-ranking model, which is based on the similarity union of the fuzzy retrieval model and the document cluster retrieval model.

A Study on Cluster Lifetime in Multi-Hop Wireless Sensor Networks with Cooperative MISO Scheme

  • Huang, Zheng; Okada, Hiraku; Kobayashi, Kentaro; Katayama, Masaaki
    • Journal of Communications and Networks / v.14 no.4 / pp.443-450 / 2012
  • For cluster-based wireless sensor networks (WSNs), cluster lifetime is one of the most important subjects in recent research. Besides reducing the energy consumption of the clusters, it is necessary to make the clusters achieve equal lifetimes so that the whole network can survive longer. In this paper, we focus on the cluster lifetimes in multi-hop WSNs with a cooperative multi-input single-output scheme. With a simplified model of multi-hop WSNs, we vary the transmission schemes and the sizes and transmission distances of clusters to investigate their effects on the cluster lifetimes. Furthermore, linear and uniform data aggregations are considered in our model. As a result, we analyze the cluster lifetimes in different situations and discuss the requirements on the sizes and transmission distances of clusters for equal lifetimes.

Importance of Clusters in Industry Development: A Case of Singapore's Petrochemical Industry

  • Pillai Jayarethanam
    • Journal of Technology Innovation / v.14 no.2 / pp.1-27 / 2006
  • This paper rejuvenates the existing discussion on the importance of the cluster approach to industry development strategies. Current evidence suggests that the shape of economic policy and practice is changing significantly around the world. Governments continually search for new tools and policy formulas to improve economic performance and create economic prosperity for all citizens. In this context, a more proactive and strategic role for government in support of the cluster-based economic development model has emerged. This paper uses Singapore's petrochemical industry as an example to study the cluster approach to industry development. In doing so, it finds much optimism about the significant role that the state and its institutions can play in industry development. Nevertheless, the study also raises doubts about whether the cluster-based strategy is due to the concept itself or due to other important factors.

Cluster analysis by month for meteorological stations using a gridded data of numerical model with temperatures and precipitation (기온과 강수량의 수치모델 격자자료를 이용한 기상관측지점의 월별 군집화)

  • Kim, Hee-Kyung; Kim, Kwang-Sub; Lee, Jae-Won; Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1133-1144 / 2017
  • Cluster analysis of meteorological data makes it possible to segment meteorological regions based on their meteorological characteristics. However, observed meteorological data are not adequate for cluster analysis because the meteorological stations that record them are not uniformly located. Therefore, clustering of observed data cannot properly reflect the climate characteristics of South Korea. Clustering of $5\,\mathrm{km} \times 5\,\mathrm{km}$ gridded data derived from a numerical model, on the other hand, reflects them evenly. In this study, we analyzed long-term gridded temperature and precipitation data using cluster analysis. Because climate characteristics differ by month, clustering was performed by month. Because the result of K-Means cluster analysis is sensitive to initial values, we took the initial values from the Ward method, a hierarchical cluster analysis method. Based on the clustering of the gridded data, the clusters of meteorological stations were determined. As a result, the clustering yields a spatio-temporal segmentation of meteorological stations in South Korea.
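
The idea of seeding K-Means with the result of Ward clustering, as this abstract describes, can be sketched in a few lines. This is a naive pure-Python illustration under assumptions, not the authors' procedure: the function names are hypothetical, the Ward step is the textbook merge rule (join the pair of clusters whose merge increases the within-cluster sum of squares the least), and the toy points stand in for gridded temperature and precipitation values.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def ward_clusters(points, k):
    """Naive Ward agglomeration down to k clusters (cubic time, for illustration)."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                d2 = math.dist(centroid(a), centroid(b)) ** 2
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(a) * len(b) / (len(a) + len(b)) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        updated = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if updated == centers:
            break
        centers = updated
    return centers, [nearest(p, centers) for p in points]

def ward_initialized_kmeans(points, k):
    """Use Ward cluster centroids as deterministic K-Means seeds."""
    seeds = [centroid(c) for c in ward_clusters(points, k)]
    return kmeans(points, seeds)

# Usage on two well-separated toy groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = ward_initialized_kmeans(data, 2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Seeding this way removes the run-to-run variation of random initialization, which matches the sensitivity to initial values that the abstract cites as motivation.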

THE UNUSUAL STELLAR MASS FUNCTION OF STARBURST CLUSTERS

  • Dib, Sami
    • Journal of The Korean Astronomical Society / v.40 no.4 / pp.157-160 / 2007
  • I present a model to explain the mass segregation and shallow mass functions observed in the central parts of starburst stellar clusters. The model assumes that the initial pre-stellar core mass function resulting from the turbulent fragmentation of the proto-cluster cloud is significantly altered by the coalescence of cores before they collapse to form stars. With appropriate, yet realistic parameters, this model based on the competition between core coalescence and collapse reproduces the mass spectra of the well-studied Arches cluster, namely the slopes at the intermediate and high mass ends, as well as the peculiar bump observed at $6\,M_{\odot}$. This coalescence-collapse process occurs on a short timescale of the order of the free-fall time of the proto-cluster cloud (i.e., a few $10^4$ years), suggesting that mass segregation in Arches and similar clusters is primordial. The best-fitting model implies that the total mass of the Arches cluster is $1.45 \times 10^5\,M_{\odot}$, which is slightly higher than the often quoted, but completeness-affected, observational value of a few $10^4\,M_{\odot}$. The model implies a star formation efficiency of ${\sim}30$ percent, which implies that the Arches cluster is likely to be a gravitationally bound system.