• Title/Summary/Keyword: Parallel/Distributed Computing Environments (병렬/분산 컴퓨팅 환경)

Workflow-based Bio Data Analysis System for HPC (HPC 환경을 위한 워크플로우 기반의 바이오 데이터 분석 시스템)

  • Ahn, Shinyoung;Kim, ByoungSeob;Choi, Hyun-Hwa;Jeon, Seunghyub;Bae, Seungjo;Choi, Wan
    • KIPS Transactions on Software and Data Engineering / v.2 no.2 / pp.97-106 / 2013
  • Since the Human Genome Project finished, the cost of human genome analysis has fallen rapidly, causing a sharp increase in the volume of genome data to be analyzed. As the need for fast analysis of very large bio data such as human genomes grows, non-IT researchers such as biologists must be able to run many kinds of bio applications, each with different characteristics, quickly and effectively in an HPC environment. To this end, a biologist needs an easy way to define a sequence of bio applications as a workflow, because bio applications generally must be combined and executed in a particular order. Such a bio workflow should then be executed as a distributed, parallel computation by allocating computing resources efficiently on an HPC cluster, which yields better performance and faster response times for very large bio data analysis. This paper proposes a workflow-based data analysis system specialized for bio applications, with which non-IT scientists and researchers can easily analyze very large bio data in an HPC environment.
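
As a concrete illustration of the workflow idea, here is a minimal sketch of a bio pipeline defined as a dependency graph and dispatched in topological order. The tool commands and file names are hypothetical placeholders, and a real HPC deployment would submit each ready step to a cluster scheduler rather than a local shell.

```python
# Minimal sketch: a bio workflow as a dependency graph, run in topological
# order. Commands and paths are illustrative placeholders only.
import subprocess
from graphlib import TopologicalSorter  # Python 3.9+

workflow = {
    # step name -> (shell command, set of steps it depends on)
    "align":    ("bwa mem ref.fa reads.fq > aln.sam", set()),
    "sort":     ("samtools sort aln.sam -o aln.bam", {"align"}),
    "index":    ("samtools index aln.bam", {"sort"}),
    "variants": ("bcftools mpileup -f ref.fa aln.bam | bcftools call -mv > out.vcf",
                 {"index"}),
}

def run_workflow(wf, dry_run=True):
    order = TopologicalSorter({name: deps for name, (_, deps) in wf.items()})
    for step in order.static_order():        # respects declared dependencies
        cmd = wf[step][0]
        print(f"[submit] {step}: {cmd}")
        if not dry_run:                      # a real system would hand this
            subprocess.run(cmd, shell=True, check=True)  # to the cluster

if __name__ == "__main__":
    run_workflow(workflow)
```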

Implementation of Massive FDTD Simulation Computing Model Based on MPI Cluster for Semi-conductor Process (반도체 검증을 위한 MPI 기반 클러스터에서의 대용량 FDTD 시뮬레이션 연산환경 구축)

  • Lee, Seung-Il;Kim, Yeon-Il;Lee, Sang-Gil;Lee, Cheol-Hoon
    • The Journal of the Korea Contents Association / v.15 no.9 / pp.21-28 / 2015
  • In the semiconductor process, simulation is performed to detect defects by analyzing the behavior of impurities through physical-quantity calculations for the inner elements. The Finite-Difference Time-Domain (FDTD) algorithm is used to perform this simulation. As semiconductors improve and their elements shrink to the nanoscale, the simulation size keeps growing, which raises two problems: a single processor such as a CPU or GPU cannot run the simulation because the matrices are too large, and a computer composed of multiple processors of one kind still cannot handle a massive FDTD. Parallel/distributed computing studies have addressed these problems, but past work used only a single type of processor: a GPU computes fast but has limited memory, while a CPU computes more slowly than a GPU. To solve this, we implemented a computing model that can handle an FDTD simulation of any size on a cluster of heterogeneous processors. We tested the simulation on these processors using MPI libraries based on point-to-point communication and verified that it operates correctly regardless of the number and type of nodes. We also analyzed performance by measuring the total execution time and the time spent in specific parts of the simulation for each test.
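
The point-to-point communication pattern described here can be sketched with mpi4py: a 1-D FDTD domain is split across ranks, and ghost cells are exchanged with neighbours each time step. The field sizes, coefficients, and source term are illustrative, not the paper's configuration.

```python
# Minimal sketch: 1-D FDTD slab per rank with point-to-point halo exchange.
# Run with e.g. `mpiexec -n 4 python fdtd1d.py`; MPI.PROC_NULL makes the
# edge exchanges no-ops, so the same code works for any node count.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N_LOCAL, STEPS = 1000, 200
ez = np.zeros(N_LOCAL + 2)   # local slab plus one ghost cell at each end
hy = np.zeros(N_LOCAL + 2)
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(STEPS):
    # Exchange the E-field ghost cell, then update H from the fresh E values.
    comm.Sendrecv(ez[1:2], dest=left, recvbuf=ez[-1:], source=right)
    hy[1:-1] += 0.5 * (ez[2:] - ez[1:-1])
    # Exchange the H-field ghost cell, then update E (leapfrog scheme).
    comm.Sendrecv(hy[-2:-1], dest=right, recvbuf=hy[0:1], source=left)
    ez[1:-1] += 0.5 * (hy[1:-1] - hy[:-2])
    if rank == 0:
        ez[1] += np.exp(-((step - 30) / 10) ** 2)  # Gaussian source pulse

print(f"rank {rank}: max |Ez| = {np.abs(ez).max():.3f}")
```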

A Design and Implementation of a Grid Job Monitoring Service Based on the OGSA(Open Grid Service Architecture) (OGSA(Open Grid Service Architecture)에 기반한 그리드 작업 모니터링 서비스 설계 및 구현)

  • Hahm, Jae-Gyoon;Kwon, Ok-Kyoung;Kim, Sang-Wan;Park, Hyoung-Woo
    • Proceedings of the Korea Information Processing Society Conference / 2003.11a / pp.213-216 / 2003
  • Grid middleware, which plays a core role in grid computing, must be convenient for users. When a user wants to run a computation, resources should be allocated autonomously even if the user has no knowledge of the location or availability of the resources to be used. In particular, most grid jobs are parallel jobs that use multiple distributed resources simultaneously, so job monitoring in this environment must provide an integrated service designed around user convenience. OGSA (Open Grid Service Architecture) introduces the web-service concept to the grid, greatly improving the extensibility of grid services and the ease of implementing them. Developing a grid service with OGSA not only makes the middleware easier for users to work with directly, but also makes it easier to build user applications. In this paper, we therefore implemented a grid job monitoring service that provides users with an integrated monitoring service using OGSA.
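
The integrated view such a service provides can be illustrated with a small polling client that fans one status query out to per-resource job managers and merges the answers. The endpoints and JSON shape below are assumptions; an actual OGSA implementation would expose this through a grid service interface rather than plain HTTP.

```python
# Minimal sketch: aggregate the state of one parallel grid job across the
# distributed resources it runs on. URLs and response fields are hypothetical.
import json
from urllib.request import urlopen

RESOURCE_ENDPOINTS = [
    "http://cluster-a.example.org/jobs",   # hypothetical per-resource managers
    "http://cluster-b.example.org/jobs",
]

def job_status(job_id: str) -> dict:
    """Return one integrated status for a job spread over many resources."""
    states = {}
    for url in RESOURCE_ENDPOINTS:
        with urlopen(f"{url}/{job_id}") as resp:   # e.g. {"state": "RUNNING"}
            states[url] = json.load(resp)["state"]
    # The job counts as RUNNING while any sub-task is still running.
    overall = "RUNNING" if "RUNNING" in states.values() else "DONE"
    return {"job": job_id, "overall": overall, "per_resource": states}
```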

Design and Implementation of HDFS Data Encryption Scheme Using ARIA Algorithms on Hadoop (하둡 상에서 ARIA 알고리즘을 이용한 HDFS 데이터 암호화 기법의 설계 및 구현)

  • Song, Youngho;Shin, YoungSung;Chang, Jae-Woo
    • KIPS Transactions on Computer and Communication Systems / v.5 no.2 / pp.33-40 / 2016
  • Due to the growth of social network services (SNS), big data became a reality, and Hadoop was developed as a distributed platform for analyzing it. Enterprises use Hadoop to analyze data containing users' sensitive information and utilize the results for marketing, so research on data encryption has been done to prevent the leakage of sensitive data stored in Hadoop. However, existing work supports only the AES encryption algorithm, the international data-encryption standard, whereas the Korean government has chosen the ARIA algorithm as its standard. In this paper, we propose an HDFS data encryption scheme using the ARIA algorithm on Hadoop. First, the proposed scheme provides an HDFS block-splitting component that performs ARIA encryption and decryption under the distributed computing environment of Hadoop. Second, it provides a variable-length data processing component that performs encryption and decryption by adding dummy data when the last block does not contain a full 128 bits of data. Finally, we show through performance analysis that our scheme can be used effectively for both text-string processing applications and scientific data analysis applications.
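
The variable-length component's padding logic can be sketched as follows. AES stands in for ARIA here (both are 128-bit block ciphers, and the Python `cryptography` package does not ship ARIA); the key, IV, and cipher mode are illustrative choices, not the paper's.

```python
# Minimal sketch: pad the final block with dummy bytes so every block is a
# full 128 bits before block-cipher encryption, then reverse it on decrypt.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 16  # 128 bits

def pad(data: bytes) -> bytes:
    """PKCS#7-style dummy padding: always append 1..16 bytes."""
    n = BLOCK - (len(data) % BLOCK)
    return data + bytes([n]) * n

def unpad(data: bytes) -> bytes:
    return data[: -data[-1]]

key, iv = os.urandom(16), os.urandom(16)  # illustrative key material

def encrypt_block(plaintext: bytes) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return enc.update(pad(plaintext)) + enc.finalize()

def decrypt_block(ciphertext: bytes) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    return unpad(dec.update(ciphertext) + dec.finalize())

assert decrypt_block(encrypt_block(b"hdfs record")) == b"hdfs record"
```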

Design And Implementation Real-Time Load Balancing Using TMO Replica Of LTMOS In Distributed Environment (분산 환경에서 LTMOS의 TMO 리플리카를 이용한 실시간 로드 밸런싱의 설계 및 구현)

  • Joo Koonho;Lim Bosub;Heu Shin;Kim Jungguk
    • Proceedings of the Korean Information Science Society Conference / 2005.07a / pp.829-831 / 2005
  • A real-time system is one in which the correctness of a computed result is guaranteed under time constraints; such systems are classified as hard or soft real-time. TMO, a real-time object model that has recently come into wide use as a new paradigm in distributed real-time computing, was first proposed by Kane Kim and Kopetz. The TMO model can be used for hard or soft real-time applications and for parallel computing applications, and it can clearly specify both the functional behavior of a system and its timing behavior. Several TMO execution engines have been developed for running TMOs in a distributed real-time environment over a network; among them, LTMOS, a Linux-based soft real-time middleware engine, was developed at the RTDCS lab of Hankuk University of Foreign Studies. However, LTMOS cannot sustain a real-time system when deadlines are violated under work overload, or when channel traffic in the distributed IPC between systems becomes heavy. To resolve these problems and maintain a more efficient real-time system, we use a technique that achieves real-time load balancing by migrating only the ODS (Object Data Store), which holds a TMO program's resource information, to a replica of that TMO program on another node. In this paper, we add a Node Monitor that detects deadline violations and channel-traffic overload in TMO programs and a Migration Manager that selects the best target node, and we implement the additional migration functionality in WRMT, the thread scheduler.
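
A rough sketch of the Node Monitor / Migration Manager decision described above, with entirely hypothetical thresholds and metrics; in LTMOS the migrated state would be the TMO's ODS rather than a Python object.

```python
# Minimal sketch: detect an overloaded node and pick the replica host with
# the lightest load. All thresholds and fields are made-up illustrations.
from dataclasses import dataclass

DEADLINE_MISS_LIMIT = 3
TRAFFIC_LIMIT = 0.8          # fraction of channel capacity

@dataclass
class Node:
    name: str
    deadline_misses: int
    channel_load: float      # 0.0 .. 1.0

def overloaded(node: Node) -> bool:
    return (node.deadline_misses > DEADLINE_MISS_LIMIT
            or node.channel_load > TRAFFIC_LIMIT)

def pick_target(replicas: list[Node]) -> Node:
    # Migration Manager: choose the least loaded replica host.
    return min(replicas, key=lambda n: (n.channel_load, n.deadline_misses))

def node_monitor(local: Node, replicas: list[Node]) -> None:
    if overloaded(local):
        target = pick_target(replicas)
        print(f"migrating ODS from {local.name} to replica on {target.name}")

node_monitor(Node("n1", 5, 0.9), [Node("n2", 0, 0.2), Node("n3", 1, 0.5)])
```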

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are used in many processes, from computer system inspection and process optimization to customized services for users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for handling the massive log data of banks. Most of the log data generated during banking operations come from handling clients' business, so a dedicated system is needed to gather, store, categorize, and analyze the log data generated while processing it. However, existing computing environments make it difficult both to realize the flexible storage expansion needed for massive amounts of unstructured log data and to execute the many functions required to categorize and analyze them. In this study, we therefore use cloud computing technology to build a cloud-based log data processing system for unstructured log data that the existing infrastructure's analysis tools and management systems cannot readily process. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, such as storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of massive log data. Because HDFS (Hadoop Distributed File System) stores data by replicating blocks of the aggregated log data, the system also offers automatic restore functions that let it continue operating after recovering from a malfunction. Finally, by establishing a distributed database with the NoSQL-based MongoDB, the system processes unstructured log data effectively. Relational databases such as MySQL have strict schemas that are inappropriate for unstructured log data and cannot expand nodes when rapidly growing data must be distributed across them. NoSQL does not provide the complex computations of relational databases, but it can easily expand through node dispersion as data grow rapidly; it is a non-relational database with a structure appropriate for unstructured data. NoSQL data models are usually classified as Key-Value, column-oriented, or document-oriented. Of these, the proposed system uses MongoDB, the representative document-oriented model with a free schema structure, because its flexible schema makes unstructured log data easy to process, it facilitates node expansion as data grow rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies them by log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module produces the results of the log analysis from the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of aggregated log data, and provides them to the user through a web interface. Log data requiring real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted as graphs according to the user's analysis conditions, and are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log insert and query performance, confirms the proposed system's superiority. Moreover, an optimal chunk size is identified through MongoDB log-insert performance evaluations over various chunk sizes.
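
The MongoDB side of such a system can be sketched with pymongo: schema-free log inserts and a per-minute aggregation of the kind the log graph generator would plot. The connection string, collection, and field names are assumptions.

```python
# Minimal sketch: unstructured log inserts plus a per-minute, per-type
# aggregation for graphing. Database and field names are hypothetical.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["banklogs"]["events"]

# Schema-free insert: each record may carry different fields.
logs.insert_one({"ts": datetime.now(timezone.utc), "type": "login",
                 "branch": "A01", "latency_ms": 42})

# Count events per minute and type, as a graph generator might request.
pipeline = [
    {"$group": {
        "_id": {"minute": {"$dateToString": {"format": "%Y-%m-%dT%H:%M",
                                             "date": "$ts"}},
                "type": "$type"},
        "count": {"$sum": 1}}},
    {"$sort": {"_id.minute": 1}},
]
for row in logs.aggregate(pipeline):
    print(row)
```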

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.669-678 / 2018
  • This study developed information technology infrastructure for building a driving-environment analysis platform using various big data, such as vehicle sensing data and public data. First, on the hardware side, a small platform server with a parallel structure for distributed big data processing was built. Next, on the software side, programs for big data collection/storage, processing/analysis, and information visualization were developed. The collection software was implemented as a collection interface using Kafka, Flume, and Sqoop. The storage software was divided between the Hadoop distributed file system and a Cassandra DB according to how the data are used. The processing software performs spatial-unit matching and time-interval interpolation/aggregation of the collected data by applying the grid index method. The analysis software was built as an analytical tool based on the Zeppelin notebook for applying and evaluating the developed algorithms. Finally, the information visualization software was developed as a Web GIS engine program that provides various driving-environment information and visualizations. The performance evaluation derived the number of executors, the optimal memory capacity, and the number of cores for the development server, and showed computation performance superior to that of other cloud computing services.
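
The grid index method mentioned for spatial-unit matching can be sketched as follows: GPS points are mapped to fixed-size cells so that sensing records are matched to a location by cell lookup rather than geometric search. The 0.001-degree cell size and record fields are illustrative, not the paper's parameters.

```python
# Minimal sketch: bucket vehicle sensing records into a lat/lon grid so that
# spatial-unit matching becomes a dictionary lookup. Values are made up.
from collections import defaultdict

CELL = 0.001  # grid cell size in degrees (illustrative choice)

def cell_of(lat: float, lon: float) -> tuple[int, int]:
    return (int(lat // CELL), int(lon // CELL))

index = defaultdict(list)  # grid cell -> sensing records in that cell
for rec in [{"lat": 37.5665, "lon": 126.9780, "speed": 42.0},
            {"lat": 37.5666, "lon": 126.9781, "speed": 40.5}]:
    index[cell_of(rec["lat"], rec["lon"])].append(rec)

# Spatial-unit matching: all records sharing the query point's cell.
print(index[cell_of(37.5665, 126.9780)])
```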

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

  • Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
    • KIISE Transactions on Computing Practices / v.21 no.7 / pp.494-499 / 2015
  • As the size of data increases, it becomes important to identify its properties by analyzing big data. In this paper, we propose an efficient k-Means-based clustering technique, called MCSK-Means (Multi Centroid Set k-Means), using the distributed parallel processing framework MapReduce. A known problem with the k-Means algorithm is that the accuracy of clustering depends on the randomly created initial centroids. To alleviate this problem, MCSK-Means reduces the dependency on initial centroids by using multiple sets of k centroids. In addition, we apply agglomerative hierarchical clustering to derive the final k centroids from the centroids in the m centroid sets produced by the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
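
One k-Means iteration in map/reduce form, the building block MCSK-Means runs over its m centroid sets, might look like the following sketch; a real deployment would emit these key/value pairs through Hadoop rather than Python dictionaries.

```python
# Minimal sketch of one k-Means iteration as map -> shuffle -> reduce.
from collections import defaultdict
import math

def mapper(point, centroids):
    # Emit (nearest-centroid-id, point) for each input point.
    dists = [math.dist(point, c) for c in centroids]
    return dists.index(min(dists)), point

def reducer(cid, points):
    # Recompute a centroid as the mean of its assigned points.
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]   # one initial centroid set

for _ in range(5):                        # iterate until (near) convergence
    groups = defaultdict(list)            # the "shuffle" step
    for p in points:
        cid, pt = mapper(p, centroids)
        groups[cid].append(pt)
    centroids = [reducer(cid, pts) for cid, pts in sorted(groups.items())]

print(centroids)  # ~[(1.25, 1.5), (8.5, 8.25)]
```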

Development of Network Based MT Data Processing System (네트워크에 기반한 MT자료의 처리기술 개발 연구)

  • Lee Heuisoon;Kwon Byung-Doo;Chung Hojoon;Oh Seokhoon
    • Geophysics and Geophysical Exploration / v.3 no.2 / pp.53-60 / 2000
  • A server/client system using the web protocol and a network-based distributed computing environment was applied to MT data processing using Java technology. With this network-based system, users obtain consistent and stable results because the system applies standardized analysis methods and has been tested by many users over the internet. Users can check the MT data processing at any time and obtain results during exploration, reducing exploration time and cost. Pure and enterprise Java technology provides the facilities to develop such a network-based MT data processing system; web-based socket communication and RMI were each tested to produce an effective and practical client application. Intrinsically, interpreting MT data, with its inversion and data processing, requires heavy computation, so we adopt MPI parallel processing to meet the needs of in-situ users and to ease the control and upgrading of the program code.
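
The MPI work split for such a computation can be sketched with mpi4py: per-frequency forward modelling is independent, so frequencies are scattered across ranks and the responses gathered back on the root. `forward_model` below is a placeholder, not the paper's actual kernel (the paper's own implementation is Java-based).

```python
# Minimal sketch: scatter MT sounding frequencies across ranks, compute each
# response independently, gather results on the root rank.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def forward_model(freq: float) -> float:
    # Placeholder: a real code would compute the MT response at this frequency.
    return 1.0 / np.sqrt(freq)

freqs = np.logspace(-3, 3, 60) if rank == 0 else None
chunks = np.array_split(freqs, size) if rank == 0 else None
my_freqs = comm.scatter(chunks, root=0)          # each rank gets its share
my_resp = [forward_model(f) for f in my_freqs]
all_resp = comm.gather(my_resp, root=0)

if rank == 0:
    responses = [r for part in all_resp for r in part]
    print(f"computed {len(responses)} responses on {size} ranks")
```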

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee;Cho, Young-bok;Kim, So Young;Park, Jong-bae;Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we provide flexible scalability of computing resources in an Apache Hadoop-based cloud environment for the analysis of public medical-information big data. This includes the ability to quickly and flexibly extend storage, memory, and other resources as the data accumulate and grow over time. In addition, when real-time analysis of the accumulated unstructured data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools, performing parallel-distributed processing of large amounts of data quickly and reliably. For the big data analysis, frequency analysis and chi-square tests were performed, followed by multivariate logistic regression at the 0.05 significance level; a further multivariate logistic regression on the significant variables (p<0.05) was then carried out for each of the three models.
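
The statistical steps named here can be sketched with SciPy and statsmodels: a chi-square test on a contingency table followed by a multivariate logistic regression whose significant variables (p<0.05) are reported. All counts and variables below are synthetic placeholders, not the study's data.

```python
# Minimal sketch: chi-square test on a screening-by-factor table, then a
# logistic regression with a p<0.05 significance screen. Data are synthetic.
import numpy as np
from scipy.stats import chi2_contingency
import statsmodels.api as sm

# Rows: screened / not screened; columns: factor present / absent (made up).
table = np.array([[320, 180],
                  [240, 260]])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}, dof={dof}")

# Multivariate logistic regression: screened ~ age + insured (synthetic).
rng = np.random.default_rng(0)
X = sm.add_constant(np.column_stack([rng.normal(50, 10, 500),    # age
                                     rng.integers(0, 2, 500)]))  # insured
y = rng.integers(0, 2, 500)                                      # screened
fit = sm.Logit(y, X).fit(disp=0)
significant = [i for i, pv in enumerate(fit.pvalues) if pv < 0.05]
print("significant coefficient indices (p<0.05):", significant)
```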