• Title/Summary/Keyword: 최적의 클러스터 수 (optimal number of clusters)

Workflow-based Bio Data Analysis System for HPC (HPC 환경을 위한 워크플로우 기반의 바이오 데이터 분석 시스템)

  • Ahn, Shinyoung;Kim, ByoungSeob;Choi, Hyun-Hwa;Jeon, Seunghyub;Bae, Seungjo;Choi, Wan
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.97-106 / 2013
  • Since the Human Genome Project was completed, the cost of human genome analysis has decreased very rapidly, which has sharply increased the amount of human genome data to be analyzed. As the need for fast analysis of very large bio data such as the human genome grows, non-IT researchers such as biologists should be able to run many kinds of bio applications, each with different characteristics, quickly and effectively in an HPC environment. To this end, a biologist needs to be able to define a sequence of bio applications as a workflow easily, because bio applications generally have to be combined and executed in a particular order. This bio workflow should be executed as distributed and parallel computation by allocating computing resources efficiently on an HPC cluster system. Doing so can be expected to improve the performance and response time of very large bio data analysis. This paper proposes a workflow-based data analysis system specialized for bio applications. Using this system, non-IT scientists and researchers can easily analyze very large bio data in an HPC environment.

Time Synchronization Robust to Topology Change Through Reference Node Re-Election (기준노드의 재선정을 통한 토폴로지 변화에 강인한 시간 동기화)

  • Jeon, Young;Kim, Taehong;Kim, Taejoon;Lee, Jaeseang;Ham, Jae-Hyun
    • KIPS Transactions on Computer and Communication Systems / v.8 no.8 / pp.191-200 / 2019
  • In an ad hoc network, all nodes can be time-synchronized around a single reference node. A representative reference-node-based algorithm is the Flooding Time Synchronization Protocol (FTSP). Sending and receiving messages introduces both predictable and unpredictable delays, which should be removed because they hinder accurate time synchronization. In multi-hop communication, hop delays occur when a packet traverses a number of hops, and these hop delays significantly degrade synchronization performance among nodes. A method is therefore needed to reduce hop delays and increase synchronization performance. In FTSP, hop delays can grow considerably depending on the position of the reference node. Moreover, FTSP elects the node with the smallest node ID as the reference node, so the position of the reference node is in effect determined arbitrarily. In this paper, we propose an optimal reference node election algorithm that reduces hop delays, and we compare the performance of the proposed scheme with FTSP using the network simulator OPNET. We also verify that the proposed scheme improves synchronization performance and is robust to topology changes.
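
The abstract above does not state which criterion the proposed election algorithm uses. As a rough illustration only, the sketch below assumes one plausible criterion: elect the node whose worst-case hop count to any other node is smallest, breaking ties by smallest node ID. The function names, the adjacency-list input, and the criterion itself are assumptions made for this example, not the authors' method.

```python
from collections import deque


def hop_counts(adjacency, source):
    """Breadth-first search: hop count from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def elect_reference_node(adjacency):
    """Assumed criterion: minimise the worst-case hop count (ties: smallest ID)."""
    def worst_case(node):
        hops = hop_counts(adjacency, node)
        if len(hops) < len(adjacency):
            return float("inf")  # node cannot reach the whole network
        return max(hops.values())

    # sorted() makes min() return the smallest ID among equally good nodes
    return min(sorted(adjacency), key=worst_case)


# Hypothetical 5-node line topology: 1 - 2 - 3 - 4 - 5
line = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

smallest_id_choice = min(line)               # FTSP-style rule: smallest node ID
centered_choice = elect_reference_node(line)  # assumed hop-minimising rule

print(smallest_id_choice, max(hop_counts(line, smallest_id_choice).values()))
print(centered_choice, max(hop_counts(line, centered_choice).values()))
```

On this line topology the smallest-ID rule picks an end node, so the farthest node is four hops away, while the assumed rule picks the middle node and halves the worst-case hop count.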

Initialization of Fuzzy C-Means Using Kernel Density Estimation (커널 밀도 추정을 이용한 Fuzzy C-Means의 초기화)

  • Heo, Gyeong-Yong;Kim, Kwang-Baek
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.8 / pp.1659-1664 / 2011
  • Fuzzy C-Means (FCM) is one of the most widely used clustering algorithms and has been applied successfully in many areas. However, FCM has some shortcomings, and the selection of initial prototypes is one of them. Because FCM is only guaranteed to converge to a local optimum, different initial prototypes result in different clusterings, so the initial prototypes must be chosen with care. In this paper, a new initialization method for FCM using kernel density estimation (KDE) is proposed to resolve the initialization problem. KDE estimates a data distribution non-parametrically and is useful for estimating local density. After KDE, the proposed method places one initial point at the densest region and then reduces the density of that region. Iterating this process yields the initial prototypes. Experimental results showed that the initial prototypes obtained in this way gave better results than the randomly selected ones commonly used in FCM.
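
The iterative procedure described above (pick the densest point, lower the density around it, repeat) can be sketched as follows. The abstract does not say how the density is reduced, so this example assumes a Gaussian kernel evaluated at the data points and subtracts a Gaussian bump scaled by the selected peak; the function names and the `bandwidth` parameter are illustrative assumptions, not the paper's formulation.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian(squared_distance, bandwidth):
    """Unnormalised Gaussian kernel value for a squared distance."""
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def kde_init(points, n_prototypes, bandwidth):
    """Choose initial prototypes at successive density peaks (assumed scheme)."""
    # Unnormalised kernel density estimate, evaluated at each data point
    density = [
        sum(gaussian(sq_dist(p, q), bandwidth) for q in points) for p in points
    ]
    prototypes = []
    for _ in range(n_prototypes):
        best = max(range(len(points)), key=lambda i: density[i])
        peak = density[best]
        prototypes.append(points[best])
        # Assumption: reduce density near the chosen point by a scaled bump
        density = [
            d - peak * gaussian(sq_dist(p, points[best]), bandwidth)
            for p, d in zip(points, density)
        ]
    return prototypes


# Hypothetical data: two tight groups around (0, 0) and (5, 5)
sample = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
initial_prototypes = kde_init(sample, n_prototypes=2, bandwidth=0.5)
print(sorted(initial_prototypes))
```

Because the density around the first prototype is suppressed before the second is chosen, the two prototypes land in different groups rather than both falling in the denser one. The resulting points would then be passed to FCM as its starting prototypes.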

Performance Optimization of Numerical Ocean Modeling on Cloud Systems (클라우드 시스템에서 해양수치모델 성능 최적화)

  • JUNG, KWANGWOOG;CHO, YANG-KI;TAK, YONG-JIN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.27 no.3 / pp.127-143 / 2022
  • Recently, there have been many attempts to run numerical ocean models in cloud computing environments. A cloud computing environment can be an effective means of running numerical ocean models that require large-scale resources, or of quickly preparing a modeling environment for global or large-scale grids. Many commercial and private cloud computing systems provide technologies for High Performance Computing (HPC) such as virtualization, high-performance CPUs and instances, Ethernet-based high-performance networking, and remote direct memory access. These features facilitate ocean modeling experiments on commercial cloud computing systems. Many scientists and engineers expect cloud computing to become mainstream in the near future. Analyzing the performance and features of commercial cloud services for numerical modeling is essential for selecting appropriate systems, as this can help to minimize execution time and the amount of resources used. Cache memory has a large effect on numerical ocean models, which process input and output data in multidimensional arrays, and network speed is important because large amounts of data are exchanged. In this study, the performance of the Regional Ocean Modeling System (ROMS), the High Performance Linpack (HPL) benchmarking software package, and the STREAM memory benchmark was evaluated and compared on commercial cloud systems to provide information for moving other ocean models to cloud computing. By analyzing actual performance data and configuration settings obtained from virtualization-based commercial clouds, we evaluated the efficiency of computing resources for various model grid sizes. We found that cache hierarchy and capacity are crucial to the performance of ROMS, which uses a large amount of memory. Memory latency is also important for performance. Increasing the number of cores to reduce the running time of numerical modeling is more effective with large grid sizes than with small grid sizes. Our results will be helpful as a reference for constructing the best computing system in the cloud to minimize the time and cost of numerical ocean modeling.

A Genetic Algorithm with a New Encoding Method for Bicriteria Network Designs (2기준 네트워크 설계를 위한 새로운 인코딩 방법을 기반으로 하는 유전자 알고리즘)

  • Kim Jong-Ryul;Lee Jae-Uk;Gen Mituso
    • Journal of KIISE:Software and Applications / v.32 no.10 / pp.963-973 / 2005
  • Increasing attention has recently been devoted to various problems inherent in the topological design of network systems. The topological structure of these networks can be based on service centers, terminals (users), and connection cables. Lately, these network systems have been designed with fiber optic cable because user requirements have increased. Considering the high cost of fiber optic cable, however, it is more desirable for the network architecture to be composed of a spanning tree. In this paper, we present a GA (Genetic Algorithm) for solving bicriteria network topology design problems of wide-band communication networks connected with fiber optic cable, considering the connection cost, average message delay, and network reliability. We also employ the Prüfer number (PN) and a cluster string to represent chromosomes. Finally, we report experiments verifying that the proposed GA is more effective and efficient in terms of computation time as well as Pareto optimality.
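
A Prüfer number encodes a labeled spanning tree on n nodes as a sequence of n - 2 node labels, which is why it is convenient as a chromosome: every such sequence decodes to a valid tree. The sketch below shows the standard decoding step only; the function name is an assumption for this example, and the cluster-string part of the encoding is not covered because the abstract does not describe it.

```python
import heapq


def prufer_to_tree(sequence):
    """Decode a Prüfer sequence over labels 1..n into the n - 1 tree edges."""
    n = len(sequence) + 2
    # Each node's degree is 1 plus the number of times it appears
    degree = [1] * (n + 1)  # index 0 is unused
    for label in sequence:
        degree[label] += 1

    # Min-heap of current leaves, so the smallest leaf is always joined first
    leaves = [v for v in range(1, n + 1) if degree[v] == 1]
    heapq.heapify(leaves)

    edges = []
    for label in sequence:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, label))
        degree[label] -= 1
        if degree[label] == 1:
            heapq.heappush(leaves, label)

    # Exactly two leaves remain; they form the final edge
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


# Illustrative chromosome for a 6-node network
tree_edges = prufer_to_tree([4, 4, 4, 5])
print(tree_edges)  # [(1, 4), (2, 4), (3, 4), (4, 5), (5, 6)]
```

Crossover and mutation applied to such a sequence always produce another sequence of labels in 1..n, so no repair step is needed to keep offspring valid spanning trees.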

A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.465-472 / 2017
  • In the past, researchers mainly used supervised machine learning techniques to analyze power data and investigated pattern identification through data mining. Today, however, the older data classification and analysis techniques are reaching their limits, because the volume of electric power data has grown now that data can be provided in real time. This study therefore proposes a clustering architecture for analyzing large-scale electric power data. The proposed clustering process compensates for the problems of the K-means algorithm, an unsupervised learning technique, and can automate the entire process from the collection of electric power data to its analysis. In this study, power data were categorized and analyzed at three levels in total: the raw data level, the clustering level, and the user interface level. In addition, the authors determined K, the ideal number of clusters, based on principal component analysis and the normal distribution, and proposed a modified K-means algorithm that reduces the data that would be categorized as ideal points in order to increase clustering efficiency.

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data and public data. First, as hardware, a small platform server with a parallel structure for distributed big data processing was developed. Next, as software, programs for big data collection and storage, processing and analysis, and information visualization were developed. The collection software was developed as a collection interface using Kafka, Flume, and Sqoop. The storage software was divided between a Hadoop distributed file system and a Cassandra DB according to how the data are used. The processing software was developed for spatial unit matching and time interval interpolation and aggregation of the collected data by applying the grid index method. The analysis software was developed as an analytical tool based on the Zeppelin notebook for applying and evaluating developed algorithms. Finally, the information visualization software was developed as a Web GIS engine program for providing and visualizing various driving environment information. In the performance evaluation, the number of executors, the optimal memory capacity, and the number of cores for the development server were derived, and the computation performance was superior to that of the other cloud computing systems compared.

Partially Evaluated Genetic Algorithm based on Fuzzy Clustering (퍼지 클러스터링 기반의 국소평가 유전자 알고리즘)

  • Yoo Si-Ho;Cho Sung-Bae
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1246-1257 / 2004
  • To find an optimal solution with a genetic algorithm, it is desirable to keep the population size as large as possible. In some cases, however, the cost of evaluating each individual is relatively high, and it is difficult to maintain a large population. To solve this problem, we propose a novel genetic algorithm based on fuzzy clustering, which considerably reduces the number of evaluations without any significant loss of performance by evaluating only one representative for each cluster. The fitness values of the other individuals are estimated indirectly from the fitness values of the representatives. We used the fuzzy c-means algorithm and distributed the fitness using the membership matrix, since it is hard for a hard clustering method to distribute precise fitness values to individuals that belong to multiple groups. Nine benchmark functions were investigated, and the results were compared with six hard clustering algorithms that use Euclidean distance and Pearson correlation coefficients as fitness distribution methods.
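
The abstract above says fitness is distributed through the membership matrix but does not give the formula. The sketch below assumes the simplest reading: each unevaluated individual receives the membership-weighted average of the representatives' fitness values, with memberships computed by the standard fuzzy c-means rule. Both function names and the weighted-average rule are assumptions for illustration.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM memberships from each individual's distances to the centres.

    distances[i][k] is the distance from individual i to cluster centre k;
    m > 1 is the fuzzifier. Each returned row sums to 1.
    """
    power = 2.0 / (m - 1.0)
    memberships = []
    for row in distances:
        if any(d == 0.0 for d in row):
            # Individual coincides with a centre: full membership there
            hits = [1.0 if d == 0.0 else 0.0 for d in row]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        inverse = [d ** -power for d in row]
        total = sum(inverse)
        memberships.append([v / total for v in inverse])
    return memberships


def estimate_fitness(memberships, representative_fitness):
    """Assumed rule: membership-weighted average of representative fitness."""
    return [
        sum(u * f for u, f in zip(row, representative_fitness))
        for row in memberships
    ]


# Hypothetical case: three individuals, two clusters whose representatives
# were actually evaluated and scored 10.0 and 20.0
distances = [[1.0, 3.0], [0.0, 2.0], [2.0, 2.0]]
memberships = fcm_memberships(distances)
estimated = estimate_fitness(memberships, [10.0, 20.0])
print(estimated)
```

Only the two representatives need a real fitness evaluation here; an individual equidistant from both centres receives the midpoint of their fitness values, which is the kind of graded estimate a hard clustering assignment cannot express.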

Merge Algorithm of Maximum weighted Independent Vertex Pair at Maximal Weighted Independent Set Problem (최대 가중치 독립집합 문제의 최대 가중치 독립정점 쌍 병합 알고리즘)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.4 / pp.171-176 / 2020
  • This paper proposes a polynomial-time algorithm for the maximum weighted independent set (MWIS) problem, which is well known to be NP-hard. Known algorithms for the MWIS problem are polynomial-time methods specialized for particular graph types, distributed methods, or clustering methods, but there is no unified algorithm suitable for all kinds of graph types. Therefore, this paper suggests a single polynomial-time algorithm that is suitable for all kinds of graph types. The proposed algorithm merges the maximum weighted vertex $v_i$ with the maximum weighted vertex $v_j$ that is not adjacent to $v_i$. When applied to undirected graphs and trees, the algorithm obtained the optimal solution. It also improved a previously known solution to a new optimal solution.

Oxygen Permeation Characteristics of Nano-silica Hybrid Thin Films (나노 실리카 하이브리드 박막의 산소 투과 특성)

  • Kim, Seong-Woo
    • Journal of the Korean Applied Science and Technology / v.24 no.2 / pp.174-181 / 2007
  • In this study, $SiO_2$/poly(ethylene-co-vinyl alcohol) (EVOH) hybrid coating materials with gas barrier properties were produced using the sol-gel method. A surface-pretreated biaxially oriented polypropylene (BOPP) substrate was coated by spin coating with the prepared hybrid sols, which contained various amounts of the inorganic silicate component. The crystallization behavior of the hybrids was investigated by X-ray diffraction and by cooling thermograms from DSC experiments. Morphological observation of the $SiO_2$/EVOH hybrid gel confirmed that there is an optimum content of the inorganic silicate precursor, tetraethylorthosilicate (TEOS), for producing hybrid materials with a dense microstructure and uniformly dispersed silica particles with an average size below 100 nm. When TEOS was added below or above the optimum content, particle clusters with large domains were observed, resulting in phase separation. This morphological result was in good agreement with the oxygen permeability of the hybrid-coated films. For films coated with hybrids prepared by adding 0.01-0.02 mol of TEOS, a remarkable improvement in barrier properties was obtained. With the addition of more than 0.04 mol of TEOS, however, the barrier properties were dramatically reduced because of phase separation and micro-crack formation on the film surface.