• Title/Summary/Keyword: Distributed Data Analysis

Search Results: 2,340

Design and Implementation of a Grid System META for Executing CFD Analysis Programs on Distributed Environment (분산 환경에서 CFD 분석 프로그램 수행을 위한 그리드 시스템 META 설계 및 구현)

  • Kang, Kyung-Woo;Woo, Gyun
    • The KIPS Transactions:PartA / v.13A no.6 s.103 / pp.533-540 / 2006
  • This paper describes the design and implementation of META (Metacomputing Environment using Test-run of Application), a grid system that facilitates the execution of CFD (Computational Fluid Dynamics) analysis programs in a distributed environment. META allows CFD program developers to access computing resources distributed over the network as if they were a single computer system. Research issues in grid computing include fault tolerance, computing resource selection, and user-interface design. In this paper, we exploit an automatic resource selection scheme for executing parallel SPMD (Single Program Multiple Data) applications written with MPI (Message Passing Interface). The proposed scheme is informed by the network latency and by the elapsed time of the kernel loop measured during a test-run. Network latency strongly influences performance when a parallel program is distributed and executed over several systems. The elapsed time of the kernel loop can serve as an estimator of the whole execution time of a CFD program because of a characteristic common to CFD programs: the kernel loop consumes over 90% of the total execution time.
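The selection idea can be illustrated with a small Python sketch. It is not META's actual algorithm: the cost model (the slowest selected host's per-iteration kernel time divided evenly over k hosts, plus a per-iteration communication term driven by the worst pairwise latency) and the names `estimate_time`, `select_resources`, `kernel_time`, `latency`, and `msgs_per_iter` are assumptions for illustration. The test-run is assumed to supply `kernel_time`; because the kernel loop dominates a CFD run, per-iteration time multiplied by the iteration count stands in for total runtime.

```python
from itertools import combinations


def estimate_time(hosts, kernel_time, latency, iterations, msgs_per_iter):
    """Hypothetical runtime estimate for an SPMD job spread evenly over `hosts`.

    kernel_time[h]: test-run seconds for one kernel-loop iteration of the full
    problem on host h. latency[frozenset({a, b})]: measured latency in seconds.
    """
    k = len(hosts)
    # Even data decomposition: each host gets 1/k of the work, so the slowest host dominates.
    compute = max(kernel_time[h] for h in hosts) / k
    # Synchronizing SPMD iterations wait on the worst link among the selected hosts.
    comm = max((latency[frozenset(pair)] for pair in combinations(hosts, 2)), default=0.0)
    return iterations * (compute + msgs_per_iter * comm)


def select_resources(candidates, k, kernel_time, latency, iterations, msgs_per_iter):
    """Exhaustively pick the k-host subset with the smallest estimated runtime."""
    return min(
        combinations(candidates, k),
        key=lambda hosts: estimate_time(hosts, kernel_time, latency, iterations, msgs_per_iter),
    )


kernel_time = {"A": 1.0, "B": 1.0, "C": 1.2}
latency = {
    frozenset({"A", "B"}): 1.0,  # A and B are fast but far apart
    frozenset({"A", "C"}): 0.001,
    frozenset({"B", "C"}): 0.001,
}
best = select_resources(["A", "B", "C"], 2, kernel_time, latency, iterations=100, msgs_per_iter=10)
print(best)  # ('A', 'C'): the slightly slower C wins because its links are fast
```

The exhaustive search over subsets grows combinatorially with the number of candidate hosts; for a large pool, a greedy or heuristic search would be needed instead.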

The impact of Marketing Communication Content Distributed on Social Networks on Electronic Word-of-Mouth

  • VO, Minh Sang;HUYNH, Dung Quoc Vu;NGUYEN, Giang Huong;DANG, Giang Ha Nguyen;HUYNH, Duong Dai;LE, Bao Quang;DANG, Nhut Minh
    • Journal of Distribution Science / v.20 no.5 / pp.65-74 / 2022
  • Purpose: This paper evaluates which characteristics of marketing communication content distributed on social networks affect electronic word-of-mouth (e-WOM). Research design, data, and methodology: A quantitative survey was conducted with 637 Vietnamese people aged 18 and over who had been exposed to the marketing communication programs of fashion brands. The data were analyzed using scale reliability analysis, multivariate regression analysis, and analysis of variance. Results: The findings identify four characteristics of social media content that positively affect e-WOM: entertainment, interaction, trendiness, and customization. Participants aged 30 and under rated media content and e-WOM more highly than those aged 31 and over. Conclusion: To promote e-WOM, marketing communication content distributed on social networks should focus on the following characteristics: (1) entertaining content should evoke positive emotions, fun, and enjoyment; (2) interactive content should emphasize discussion and exchange, encourage sharing, and support two-way interaction; (3) for trending content, marketers should consider communicating the latest brand-related information, up-to-date information, and hot discussion topics; and (4) when creating customized content, brands should make it interesting, customized (information, product, price), and unique.

Performance Improvement of BLAST using Grid Computing and Implementation of Genome Sequence Analysis System (그리드 컴퓨팅을 이용한 BLAST 성능개선 및 유전체 서열분석 시스템 구현)

  • Kim, Dong-Wook;Choi, Han-Suk
    • The Journal of the Korea Contents Association / v.10 no.7 / pp.81-87 / 2010
  • This paper proposes G-BLAST (BLAST using Grid Computing), an integrated software package for BLAST searches in a heterogeneous distributed environment. G-BLAST is a basic local alignment search tool for DNA sequences that uses grid computing; it employs a 'database splicing' method to improve BLAST search performance using existing computing resources. G-BLAST improves on the performance of existing BLAST searches in gene sequence analysis. G-BLAST also implements a pipeline and a data management method so that users can easily manage and analyze BLAST search results. The speed and efficiency of BLAST searches with the proposed G-BLAST system were confirmed in a heterogeneous distributed computing environment.
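The general idea of splitting a sequence database across workers, searching each part in parallel, and merging the hits can be sketched in Python. This is an illustration under stated assumptions, not G-BLAST's implementation: `kmer_score` is a toy shared-k-mer similarity standing in for a real BLAST search, threads stand in for grid nodes, and all function names are hypothetical. Sequences are assigned longest-first to the currently lightest partition so that each worker scans a similar number of residues.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_database(seqs, n_parts):
    """Greedy balancing: longest sequence first, always into the lightest partition."""
    parts = [[] for _ in range(n_parts)]
    heap = [(0, i) for i in range(n_parts)]  # (residues assigned, partition index)
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        load, i = heapq.heappop(heap)
        parts[i].append((name, seq))
        heapq.heappush(heap, (load + len(seq), i))
    return parts


def kmer_score(query, subject, k=3):
    """Toy similarity (number of shared k-mers), standing in for a real BLAST search."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s)


def search_part(query, part):
    return [(kmer_score(query, seq), name) for name, seq in part]


def distributed_search(query, seqs, n_parts=2, top=3):
    parts = split_database(seqs, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:  # one worker per partition
        per_part = pool.map(lambda part: search_part(query, part), parts)
        hits = [hit for part_hits in per_part for hit in part_hits]
    # Merge: highest score first, ties broken by sequence name.
    return sorted(hits, key=lambda h: (-h[0], h[1]))[:top]


db = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "ACGTTT", "s4": "GGGCCC"}
print(distributed_search("ACGTACGT", db))  # [(4, 's1'), (2, 's3'), (0, 's2')]
```

With real BLAST, E-values depend on the size of the database searched, so a split search generally has to compute its statistics against the full database size rather than each part's size.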

Throughput Analysis and Optimization of Distributed Collision Detection Protocols in Dense Wireless Local Area Networks

  • Choi, Hyun-Ho;Lee, Howon;Kim, Sanghoon;Lee, In-Ho
    • Journal of Communications and Networks / v.18 no.3 / pp.502-512 / 2016
  • The wireless carrier sense multiple access with collision detection (WCSMA/CD) and carrier sense multiple access with collision resolution (CSMA/CR) protocols are considered representative distributed collision detection protocols for fully connected, dense wireless local area networks. These protocols identify collisions through additional short sensing within a collision detection (CD) period after data transmission begins. In this study, we numerically analyze their throughput and show that it exhibits a trade-off that depends on the length of the CD period. We then derive, in closed form, the optimal CD period length that maximizes throughput. Analysis and simulation results show that the throughput of distributed collision detection protocols improves considerably when the optimal CD period is allocated according to the number of stations and the length of the transmitted packet.
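The trade-off can be reproduced with a toy slotted model in Python. The model is an assumption, not the paper's analysis: each of n stations transmits in a slot with probability `tau`; within a CD period of length L a collision is detected with hypothetical probability 1 - exp(-L/s), ending the slot after L instead of the full packet time T; and each successful transmission pays an extra `beta * L` for the short-sensing gaps. A short CD period misses collisions and a long one wastes airtime, so throughput peaks in between. The paper derives its optimum in closed form; this sketch uses a simple grid search instead.

```python
import math


def slot_probabilities(n, tau):
    """Idle, success, and collision probabilities when n stations each transmit with prob. tau."""
    p_idle = (1 - tau) ** n
    p_succ = n * tau * (1 - tau) ** (n - 1)
    return p_idle, p_succ, 1 - p_idle - p_succ


def throughput(L, n, tau, T, sigma, s, beta):
    """Fraction of airtime carrying payload for CD period length L (toy model)."""
    p_idle, p_succ, p_coll = slot_probabilities(n, tau)
    detect = 1 - math.exp(-L / s)                   # hypothetical detection probability
    collision_time = detect * L + (1 - detect) * T  # detected collisions stop after L
    success_time = T + beta * L                     # sensing overhead on successful frames
    mean_slot = p_idle * sigma + p_succ * success_time + p_coll * collision_time
    return p_succ * T / mean_slot


def optimal_cd_period(n, tau, T, sigma=1.0, s=5.0, beta=0.1, step=0.01):
    """Grid search for the CD period (0 < L <= T) that maximizes throughput."""
    grid = [step * i for i in range(1, int(round(T / step)) + 1)]
    return max(grid, key=lambda L: throughput(L, n, tau, T, sigma, s, beta))


for n in (5, 50):
    L_opt = optimal_cd_period(n, tau=0.05, T=100.0)
    print(n, round(L_opt, 2), round(throughput(L_opt, n, 0.05, 100.0, 1.0, 5.0, 0.1), 3))
```

In this toy model the optimal CD period grows with the number of stations, because collisions become more frequent relative to successes and catching them early matters more; it also depends on the packet length T.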

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing (빅데이터 수집 처리를 위한 분산 하둡 풀스택 플랫폼의 설계)

  • Lee, Myeong-Ho
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.45-51 / 2021
  • With the rapid shift to non-face-to-face environments and mobile-first strategies, the explosive annual growth of structured and unstructured data demands new decision-making and services based on big data in all fields. However, there have been few reference cases in which the Hadoop ecosystem is used to collect rapidly growing big data, load it into a standard platform applicable in practical environments, and then store and process the refined data in a relational database. Therefore, in this study, we designed and implemented a platform that collects unstructured data retrieved by keyword searches from social network services, using three Hadoop 2.0-based virtual machine servers in a Spring Framework environment; loads the collected data into the Hadoop Distributed File System (HDFS) and HBase; and then uses a morpheme analyzer to store the standardized big data in a relational database. Future work should pursue clustering, classification, and machine-learning analysis with Hive or Mahout for deeper data analysis.
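The last stage described above, morpheme analysis followed by loading standardized results into a relational database, can be sketched in Python with the standard-library `sqlite3` module. The sketch is hypothetical: the regex tokenizer merely stands in for a real Korean morpheme analyzer, SQLite stands in for the production relational database, the table name `term_freq` is invented, and the collection and HDFS/HBase stages are omitted.

```python
import re
import sqlite3
from collections import Counter


def toy_morpheme_analyzer(text):
    """Stand-in for a real morpheme analyzer: lowercase word tokens only."""
    return re.findall(r"\w+", text.lower())


def load_term_frequencies(posts, conn):
    """Aggregate term counts per search keyword and store them in a relational table."""
    counts = Counter()
    for keyword, text in posts:
        for term in toy_morpheme_analyzer(text):
            counts[(keyword, term)] += 1
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term_freq ("
        "keyword TEXT, term TEXT, freq INTEGER, PRIMARY KEY (keyword, term))"
    )
    conn.executemany(
        "INSERT INTO term_freq (keyword, term, freq) VALUES (?, ?, ?)",
        [(kw, term, n) for (kw, term), n in counts.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
posts = [("hadoop", "Hadoop stores big data"), ("hadoop", "Big data on HDFS")]
load_term_frequencies(posts, conn)
print(conn.execute("SELECT term, freq FROM term_freq ORDER BY freq DESC, term").fetchall())
# [('big', 2), ('data', 2), ('hadoop', 1), ('hdfs', 1), ('on', 1), ('stores', 1)]
```

Because the table has a primary key on (keyword, term), loading a second batch for the same keyword would need an upsert or a merge step rather than a plain INSERT.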

The Qualifications for the Application of the Rainfall Spatial Distribution Analysis Technique (강우량 공간분포 분석기법의 적용조건에 관한 연구)

  • Hwang Sye-Woon;Park Seung-Woo;Cho Young-Kyoung
    • Proceedings of the Korea Water Resources Association Conference / 2005.05b / pp.943-947 / 2005
  • This study was intended to raise an objection to analyzing rainfall spatial distribution without a proper standard, and to offer an improved approach based on geostatistical analysis. To this end, spatially distributed daily rainfall data sets were collected from 41 weather stations in the study area, and variogram and correlation analyses were conducted. The correlation analysis showed that the correlation of rainfall data decreases as the distance between stations increases, and that this distance dependence shapes the characteristics of the rainfall spatial distribution. The variogram analysis showed that the correlation range was less than 50 km for 17 of the 91 daily rainfall data sets. This indicates that choosing an application method for rainfall spatial distribution without proper qualifications involves some risk; hence, application standards for rainfall spatial distribution analysis techniques are essential, and they depend on the characteristics of the rainfall and the landscape.
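The variogram analysis mentioned above rests on the empirical semivariogram, gamma(h) = (1 / 2N(h)) * sum of (z_i - z_j)^2 over the N(h) station pairs whose separation falls in lag bin h. A minimal Python sketch follows; the function name, the equal-width binning, and the example coordinates and rainfall values are illustrative assumptions, not data from the study. The correlation range corresponds roughly to the lag at which gamma(h) levels off.

```python
import math
from itertools import combinations


def empirical_semivariogram(coords, values, bin_width, max_lag):
    """Return (lag centre, semivariance, pair count) for each non-empty distance bin.

    coords: station (x, y) positions in km; values: rainfall at each station.
    """
    n_bins = int(max_lag // bin_width)
    sq_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):
        b = int(math.dist(coords[i], coords[j]) // bin_width)
        if b < n_bins:
            sq_sums[b] += (values[i] - values[j]) ** 2
            counts[b] += 1
    return [
        ((b + 0.5) * bin_width, sq_sums[b] / (2 * counts[b]), counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


stations = [(0.0, 0.0), (5.0, 0.0), (15.0, 0.0)]  # hypothetical positions (km)
rain = [1.0, 3.0, 7.0]                            # hypothetical daily rainfall (mm)
print(empirical_semivariogram(stations, rain, bin_width=10.0, max_lag=20.0))
# [(5.0, 2.0, 1), (15.0, 13.0, 2)]
```

In practice, a model such as a spherical or exponential variogram would then be fitted to these points to estimate the range.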

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are used in many processes, from computer system inspection and process optimization to customized user optimization. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log data processing system is needed to gather, store, categorize, and analyze the log data generated while processing that business. However, in existing computing environments it is difficult to provide flexible storage expansion for a massive amount of unstructured log data and to execute the many functions needed to categorize and analyze it. Thus, in this study, we use cloud computing technology to build a cloud-based system for processing unstructured log data that are difficult to handle with the analysis tools and management system of the existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment so that computing resources such as storage space and memory can be expanded flexibly when storage needs grow or log data increase rapidly. Moreover, to overcome the processing limits of the existing analysis tool when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for fast and reliable parallel-distributed processing of the massive amount of log data.
Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions so that it can continue operating after recovering from a malfunction. Finally, by building a distributed database with the NoSQL database MongoDB, the proposed system provides effective methods for processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Moreover, their strict schemas make it difficult to add nodes and distribute the stored data across them when the amount of data increases rapidly. NoSQL databases do not provide the complex computations that relational databases offer, but they can easily be expanded by distributing data over additional nodes when the amount of data increases rapidly; they are non-relational databases with structures suited to unstructured data. NoSQL data models are usually classified into key-value, column-oriented, and document-oriented types. Of these, the proposed system uses MongoDB, a representative document-oriented database with a schema-free structure. MongoDB was chosen because its flexible schema makes unstructured log data easy to process, it allows nodes to be added flexibly when the amount of data is increasing rapidly, and it provides an Auto-Sharding function that expands storage automatically. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated throughout each bank's client business process are sent to the cloud server, the log collector module collects and classifies them by log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module produces the log analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module for each analysis time and type of aggregated log data, and presents them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. Log data aggregated per unit time are stored in the MongoDB module and plotted according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance against a log processing system that uses only MySQL demonstrates the superiority of the proposed system. Moreover, an optimal chunk size is identified by evaluating MongoDB's log data insertion performance for various chunk sizes.
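The collector's routing rule (real-time log types to MySQL, per-unit-time aggregates to MongoDB) can be sketched in plain Python. Everything here is hypothetical: the record fields `type` and `ts`, the set `REALTIME_TYPES`, the one-minute aggregation unit, and the returned lists, which stand in for inserts into the MySQL and MongoDB modules; no database drivers are used.

```python
from collections import Counter
from datetime import datetime

REALTIME_TYPES = {"error", "transaction"}  # hypothetical types needing real-time analysis


def route_logs(records):
    """Split records into real-time rows (MySQL module) and per-minute aggregates (MongoDB module)."""
    realtime_rows = []
    buckets = Counter()
    for rec in records:
        if rec["type"] in REALTIME_TYPES:
            realtime_rows.append(rec)
        else:
            # Truncate the timestamp to the aggregation unit (here: one minute).
            slot = rec["ts"].replace(second=0, microsecond=0)
            buckets[(rec["type"], slot)] += 1
    # One schema-free document per (log type, time slot), ready for a document store.
    documents = [
        {"type": t, "slot": slot.isoformat(), "count": n}
        for (t, slot), n in sorted(buckets.items())
    ]
    return realtime_rows, documents


logs = [
    {"type": "login", "ts": datetime(2013, 5, 1, 9, 0, 12)},
    {"type": "login", "ts": datetime(2013, 5, 1, 9, 0, 48)},
    {"type": "query", "ts": datetime(2013, 5, 1, 9, 1, 5)},
    {"type": "error", "ts": datetime(2013, 5, 1, 9, 1, 7)},
]
rows, docs = route_logs(logs)
print(len(rows), docs)
```

Storing aggregates as documents rather than fixed-schema rows means new log types or extra fields can be added without a schema migration, which is the flexibility the abstract attributes to MongoDB.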

Development of Distributed Hydrological Analysis Tool for Future Climate Change Impacts Assessment of South Korea (전국 기후변화 영향평가를 위한 분포형 수문분석 툴 개발)

  • Kim, Seong Joon;Kim, Sang Ho;Joh, Hyung Kyung;Ahn, So Ra
    • Journal of The Korean Society of Agricultural Engineers / v.57 no.2 / pp.15-26 / 2015
  • The purpose of this paper is to develop a software tool, PGA-CC (Projection of hydrology via Grid-based Assessment for Climate Change), to evaluate the present hydrologic cycle and future watershed hydrology under climate change. PGA-CC is composed of a grid-based input data pre-processing module, a hydrologic cycle calculation module, an output analysis module, and an output data post-processing module. The grid-based hydrological model was coded in Fortran and compiled with Compaq Fortran 6.6c, and the graphical user interface was developed in Visual C#. Most other elements, such as tables, graphs, and GIS functions, were implemented with MapWindow. The applicability of PGA-CC was tested by assessing the future hydrology of South Korea under the HadCM3 SRES B1 and A2 climate change scenarios. For the whole country, the tool successfully assessed the input data and the future hydrological components, such as evapotranspiration, soil moisture, surface runoff, lateral flow, and base flow. The spatial outputs revealed hydrological changes both seasonally and regionally.
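PGA-CC's hydrologic cycle calculation module is written in Fortran, and its equations are not given in the abstract. Purely to illustrate what a grid-based, cell-by-cell daily water balance looks like, here is a hypothetical single-bucket model in Python. The parameters `capacity` and `k_base` and the update order (rain, evapotranspiration, saturation-excess surface runoff, linear-reservoir base flow) are assumptions, and lateral flow and routing between cells are omitted.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    soil: float      # current soil moisture storage (mm)
    capacity: float  # maximum storage before surface runoff (mm)
    k_base: float    # fraction of storage released as base flow per day


def step_cell(cell, rain, pet):
    """Advance one cell by one day; returns (ET, surface runoff, base flow) in mm."""
    cell.soil += rain
    et = min(pet, cell.soil)                      # evapotranspiration limited by available water
    cell.soil -= et
    runoff = max(0.0, cell.soil - cell.capacity)  # saturation excess becomes surface runoff
    cell.soil -= runoff
    base = cell.k_base * cell.soil                # linear-reservoir base flow
    cell.soil -= base
    return et, runoff, base


def step_grid(grid, rain_grid, pet_grid):
    """Apply the daily balance to every cell of a 2-D grid (list of rows)."""
    return [
        [step_cell(c, r, p) for c, r, p in zip(row, rrow, prow)]
        for row, rrow, prow in zip(grid, rain_grid, pet_grid)
    ]


grid = [[Cell(50.0, 100.0, 0.1), Cell(20.0, 100.0, 0.1)]]
print(step_grid(grid, [[80.0, 0.0]], [[5.0, 5.0]]))
```

Each cell conserves water: initial storage plus rain equals evapotranspiration, runoff, base flow, and the final storage.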

Study on Flood Prediction System Based on Radar Rainfall Data (레이더 강우자료에 의한 홍수 예보 시스템 연구)

  • Kim, Won-Il;Oh, Kyoung-Doo;Ahn, Won-Sik;Jun, Byong-Ho
    • Journal of Korea Water Resources Association / v.41 no.11 / pp.1153-1162 / 2008
  • Using radar rainfall for hydrological appraisal has been challenging because of limitations in raw data generation and the complex analysis required for precise data interpretation. In this study, RAIDOM (RAdar Image DigitalizatiOn Method) was developed to convert synthetic radar CAPPI (Constant Altitude Plan Position Indicator) image data from the Korea Meteorological Administration into digital format, yielding more practical and useful radar data. RAIDOM was used to examine a severe local rainstorm that occurred in July 2006, as well as two other events that caused heavy floods in the upper and middle parts of the Han River basin. A distributed model was developed based on the available radar rainfall data. The simulated flood hydrographs were consistent with observed values. The results show the potential of RAIDOM and the distributed model as tools for flood prediction. Furthermore, these findings are expected to extend the usefulness of radar rainfall data in hydrological appraisal.
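RAIDOM's internals are not described in the abstract, but one straightforward way to turn a colour-coded CAPPI image into numbers is to match each pixel to the nearest colour in the image legend. The Python sketch below shows that approach under stated assumptions: the four-entry `LEGEND` (RGB to rain rate in mm/h) and the white background are invented, not KMA's actual colour scale, and the image is a plain list of RGB tuples rather than a decoded image file.

```python
# Hypothetical legend: RGB colour -> rain rate (mm/h). Not KMA's real colour scale.
LEGEND = {
    (0, 0, 255): 1.0,
    (0, 255, 0): 5.0,
    (255, 255, 0): 10.0,
    (255, 0, 0): 30.0,
}
BACKGROUND = (255, 255, 255)  # assumed "no echo" colour


def pixel_to_rain(rgb, legend=LEGEND, background=BACKGROUND):
    """Map one pixel to the rain rate of the nearest legend colour (squared RGB distance)."""
    if rgb == background:
        return 0.0
    nearest = min(legend, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, rgb)))
    return legend[nearest]


def digitize(image):
    """Convert a 2-D list of RGB tuples into a 2-D list of rain rates."""
    return [[pixel_to_rain(px) for px in row] for row in image]


image = [
    [(255, 255, 255), (250, 5, 5)],
    [(0, 250, 10), (0, 0, 255)],
]
print(digitize(image))  # [[0.0, 30.0], [5.0, 1.0]]
```

Nearest-colour matching tolerates slight colour shifts from image compression; a real pipeline would also need to georeference each pixel to map coordinates before feeding a distributed model.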

Development of the korea spatial data infrastructure based on the open GIS component architecture (개방형 GIS 컴포넌트를 이용한 국가공간정보유통체계의 구축)

  • Seo, Young-Won;Lee, Deuk-Woo;Jin, Heui-Chae;Lee, Sang-Moo
    • Journal of Korea Spatial Information System Society / v.2 no.2 s.4 / pp.49-58 / 2000
  • With the growing realization that GIS data management is more important than ever, a great deal of spatial data is being digitized through various GIS projects, such as the NGIS project. However, integrated search and analysis of GIS data are severely constrained because the databases are heterogeneous and distributed across organizations. This paper introduces a system developed to solve the problem of interoperability in heterogeneous, distributed database environments. We present the system architecture, which is composed of spatial data servers, nodes, and a gateway. The paper also provides implementation details of the client application that accesses and analyzes the distributed, heterogeneous spatial data through a standardized interface. Finally, the Korea spatial data infrastructure is explained on the basis of this technical architecture.
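The interoperability idea (heterogeneous spatial data servers hidden behind one standardized interface, with a gateway that fans queries out to them) can be sketched in Python. The interface `SpatialDataServer.query_bbox`, the two backend adapters, and the `Gateway` class are hypothetical illustrations; they do not reproduce the open GIS component interfaces the paper's system is based on.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    source: str
    name: str
    x: float
    y: float


class SpatialDataServer(ABC):
    """Hypothetical standardized interface that every server adapter implements."""

    @abstractmethod
    def query_bbox(self, xmin, ymin, xmax, ymax): ...


class TupleStoreServer(SpatialDataServer):
    """Backend that stores (name, x, y) tuples."""

    def __init__(self, source, rows):
        self.source, self.rows = source, rows

    def query_bbox(self, xmin, ymin, xmax, ymax):
        return [Feature(self.source, n, x, y) for n, x, y in self.rows
                if xmin <= x <= xmax and ymin <= y <= ymax]


class DictStoreServer(SpatialDataServer):
    """Backend that stores {"label", "lon", "lat"} records with different field names."""

    def __init__(self, source, records):
        self.source, self.records = source, records

    def query_bbox(self, xmin, ymin, xmax, ymax):
        return [Feature(self.source, r["label"], r["lon"], r["lat"]) for r in self.records
                if xmin <= r["lon"] <= xmax and ymin <= r["lat"] <= ymax]


class Gateway:
    """Fans a bounding-box query out to every registered server and merges the results."""

    def __init__(self, servers):
        self.servers = servers

    def query_bbox(self, *bbox):
        results = []
        for server in self.servers:
            results.extend(server.query_bbox(*bbox))
        return results


gateway = Gateway([
    TupleStoreServer("A", [("road", 1.0, 1.0), ("river", 10.0, 10.0)]),
    DictStoreServer("B", [{"label": "park", "lon": 2.0, "lat": 2.0}]),
])
print([f.name for f in gateway.query_bbox(0, 0, 5, 5)])  # ['road', 'park']
```

The client talks only to the gateway and the shared interface, so adding another organization's database means writing one more adapter rather than changing the client.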