• Title/Abstract/Keywords: Distributed Data Analysis

Search results: 2,350 items (processing time: 0.033 seconds)

Distributed Processing Method of Hotspot Spatial Analysis Based on Hadoop and Spark

  • 김창수;이주섭;황규문;성효진
    • Journal of KIISE
    • /
    • Vol. 45, No. 2
    • /
    • pp.99-105
    • /
    • 2018
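  • Hotspot analysis, one of the spatial statistical analysis techniques, makes it easy to identify the spatial patterns of spatial attributes or events by following the law that "near things are more related than distant things." Because spatial adjacency must be taken into account, however, it is not easy to process in a distributed manner. This paper describes a distributed processing method for hotspot analysis and evaluates its performance on Hadoop and on the in-memory-based Spark. Compared with a single system, Hadoop-based processing showed a performance improvement of 625.89% and Spark-based processing an improvement of 870.14%. In the comparison between Hadoop and Spark, the performance improvement rate of Spark grew as larger data sets were processed.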
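The abstract above does not name the hotspot statistic it distributes. As an illustration only, the sketch below assumes the widely used Getis-Ord Gi* statistic and a hypothetical one-dimensional neighbourhood (`band_weights`). It shows why adjacency matters: each location's score needs the values of its neighbours, which in a partitioned data set may live on another node.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every location.

    values  : list of attribute values x_j
    weights : n x n list of spatial weights w_ij (w_ii is included, as Gi* requires)
    Assumes the values are not all equal and no location neighbours every other one.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the attribute
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w_sum = sum(weights[i])
        w_sq_sum = sum(w * w for w in weights[i])
        weighted = sum(w * x for w, x in zip(weights[i], values))
        numerator = weighted - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

def band_weights(n, radius=1):
    # Hypothetical neighbourhood: locations on a line, neighbours within `radius` steps
    return [[1.0 if abs(i - j) <= radius else 0.0 for j in range(n)] for i in range(n)]

# Toy input (not data from the paper): a run of high values in the middle
values = [1, 1, 1, 10, 10, 10, 1, 1, 1]
z = getis_ord_gi_star(values, band_weights(len(values)))
```

In this toy input the centre of the high-value run gets a clearly positive score and the edges get negative ones. A distributed version would have to ship boundary neighbours between partitions before scoring, which is the difficulty the abstract points to.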

A Development of Proactive Application Service Engine Based on the Distributed Object Group Framework

  • 신창선;서종성
    • Journal of Internet Computing and Services
    • /
    • Vol. 11, No. 1
    • /
    • pp.153-165
    • /
    • 2010
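  • From the viewpoint of distributed applications, this paper proposes a proactive application service engine that provides user-customized distributed application services. The engine is based on the distributed object group framework, which efficiently manages the distributed objects that make up an application on a network. It consists of a physical layer, a middleware layer, and an application layer, and provides the following services: a group service that, at the user's request, manages as groups the data collected from hardware devices and the attribute information of the objects composing an application; a security service that manages access to the collected data and objects according to each user's permissions; a filtering service that extracts and processes the collected data and supplies it to applications; a statistics service that uses historical data; a diagnosis service that diagnoses the current operating state from the collected data; and a prediction service that uses the statistics and diagnosis services to predict situations that may occur in the future. Finally, to verify that the services provided by the engine work as intended, the engine was applied to an automatic greenhouse control application in the ubiquitous agriculture field and the results were confirmed.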

A Study on the Analysis of Correlation Decay Distance (CoDecDist) Model for Enhancing Spatial Prediction Outputs of Spatially Distributed Wind Farms

  • 허진
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • Vol. 29, No. 7
    • /
    • pp.80-86
    • /
    • 2015
  • As wind farm outputs depend on natural wind resources that vary over space and time, spatial correlation analysis is needed to estimate the power outputs of wind generation resources. Geographic information such as latitude and longitude therefore plays a key role in estimating the power outputs of spatially distributed wind farms. In this paper, we introduce spatial correlation analysis to estimate the power outputs produced by geographically distributed wind farms. We present a spatial correlation analysis of empirical power output data for the Jeju Island and ERCOT ISO (Texas) wind farms and propose the Correlation Decay Distance (CoDecDist) model, based on geographic correlation analysis, to enhance the estimation of wind power outputs.
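The abstract does not spell out the CoDecDist model itself. The sketch below covers only the generic ingredients it mentions: a great-circle distance computed from latitude and longitude, and the Pearson correlation between the output series of two farms. The function names, farm labels, coordinates, and output values are all made up for illustration and are not the paper's data or method.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def correlation_by_distance(farms):
    """farms: name -> (lat, lon, outputs). Returns (distance_km, correlation, a, b) sorted by distance."""
    names = sorted(farms)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            lat_a, lon_a, out_a = farms[a]
            lat_b, lon_b, out_b = farms[b]
            pairs.append((haversine_km(lat_a, lon_a, lat_b, lon_b), pearson(out_a, out_b), a, b))
    return sorted(pairs)

# Invented toy inputs, for illustration only
farms = {
    "A": (33.3, 126.3, [10, 20, 30, 40]),
    "B": (33.4, 126.4, [12, 19, 33, 41]),
    "C": (33.5, 126.9, [40, 10, 35, 5]),
}
pairs = correlation_by_distance(farms)
```

Plotting correlation against distance for all pairs is one way to see how quickly the correlation decays. Fitting a decay curve to those points would be a further modelling choice that the abstract does not specify.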

Web Services Based Biological Data Analysis Tool

  • Kim, Min Kyung;Choi, Yo Hahn;Yoo, Seong Joon;Park, Hyun Seok
    • Genomics & Informatics
    • /
    • Vol. 2, No. 3
    • /
    • pp.142-146
    • /
    • 2004
  • Biological data and analysis tools are accumulated in distributed databases and web servers. For this reason, biologists who want to find information on the web must be aware of the various resources where it is located and how it is retrieved. Integrating the data from heterogeneous biological resources will enable biologists to discover new knowledge across specific domain boundaries, from sequences to expression, structure, and pathway. Biological databases also inevitably contain noisy data, so consensus among databases helps confirm the reliability of their contents. We have developed WeSAT, which integrates distributed and heterogeneous biological databases and analysis tools and provides them through Web Services protocols. In WeSAT, biologists can retrieve specific entries in SWISS-PROT/EMBL, PDB, and KEGG, which hold annotated information about sequence, structure, and pathway. Further analysis is carried out by integrated services, for example homology search and multiple alignment. WeSAT makes it possible to retrieve real-time updated data and analyses from the scattered databases on a single platform through Web Services.

Performance Comparison of Python and Scala APIs in Spark Distributed Cluster Computing System

  • 지경엽;권영미
    • Journal of Korea Multimedia Society
    • /
    • Vol. 23, No. 2
    • /
    • pp.241-246
    • /
    • 2020
  • Hadoop is a framework for processing large data sets in a distributed way across clusters of nodes. It has been a popular platform for processing big data, but in recent years other platforms have become competitive, depending on the characteristics of the application. Spark is a distributed platform that enables real-time data processing and improves overall processing performance over Hadoop by introducing in-memory processing instead of disk I/O. Whereas Hadoop is designed to work with Java and data analysis is processed using the Java API, Spark provides a variety of APIs in Scala, Python, Java, and R. In this paper, the goal is to find out whether the APIs of different programming languages affect performance in Spark. We chose two popular APIs: Python and Scala. Python is easy to learn and is widely used in the AI domain. Scala is a programming language with advantages in parallelism. Our experiment shows much faster processing with the Scala API than with the Python API. For the performance issues in AI-based analysis, further study is needed.

An Adaptive Fault Tolerant and QoS-Enabled Middleware Support in Distributed Systems

  • 조바니 카가라반;김석수
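    • Korea Academia-Industrial cooperation Society: Conference Proceedings
    • /
    • Korea Academia-Industrial cooperation Society 2009 Fall Conference Proceedings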
    • /
    • pp.461-465
    • /
    • 2009
  • Normally, a distributed computing environment is flexible in controlling complex embedded systems, but the software components are becoming complex as these systems are equipped with several platforms and attached to various electronic devices, sensors, and actuators. These systems require inter-object communication mechanisms to provide fault-tolerant and QoS-enabled middleware service support in a distributed system. Generally, a middleware analyzes system parameters to ensure the availability and reliability of data dissemination. This paper focuses in particular on designing an application middleware for a specific scenario to improve the high availability and fault tolerance of data, thus improving the QoS (Quality of Service) of a distributed system. The performance of an adaptive and highly reliable middleware can depend significantly on the selection of the vital parameters of the system.

A Study on Knowledge Sharing in Distributed Environment

  • Lee, Hong-Girl;Lee, Cheol-Yeong
    • Journal of Navigation and Port Research
    • /
    • Vol. 27, No. 6
    • /
    • pp.683-691
    • /
    • 2003
  • This exploratory study aims to investigate issues that, according to Nonaka's theoretical model, are believed to hold significant ramifications for the effectiveness of creating and sharing organizational knowledge among distributed workers. These include changes in the accessibility of knowledge with different levels of implicitness, and the choice of communication media as a knowledge management channel. Related data were gathered from distributed workers in Japan through interviews and a survey questionnaire. Data analysis revealed changes in the dynamics of internal and external interactivity, in the accessibility of necessary knowledge, and in the reliance on electronic media for knowledge exchange. The implications of the findings are discussed from the perspective of knowledge creation and sharing, and suggestions are made for the direction of future research efforts.

A Study on the Distributed Lag Model by Bayesian Decision Making Method

  • 이필령
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • Vol. 8, No. 11
    • /
    • pp.27-34
    • /
    • 1985
  • Distributed lag models for time series data have recently been used in several quantitative analyses. However, time series analyses with serially correlated error terms and lagged values of the dependent variable violate the assumptions of the OLS method. This paper suggests that Bayesian inference should be applied to estimate the parameters of a distributed lag model with serial correlation. To apply the distributed lag model by Bayesian analysis, data on monthly consumption expenditure per household by commodity item from 1972 to 1981 are used to estimate the lagged coefficient of processed food and the regression coefficient of food and beverage.
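For orientation, a finite distributed lag model with first-order serially correlated errors is commonly written as below. The abstract does not give the paper's exact specification, so the notation here (lag length $k$, coefficients $\beta_i$, autocorrelation $\rho$) is assumed rather than taken from the paper.

```latex
y_t = \alpha + \sum_{i=0}^{k} \beta_i \, x_{t-i} + u_t,
\qquad
u_t = \rho \, u_{t-1} + \varepsilon_t,
\quad |\rho| < 1,
\quad \varepsilon_t \sim \text{i.i.d. } (0, \sigma^2)
```

When $\rho \neq 0$, the errors $u_t$ are correlated across time, which breaks the uncorrelated-errors assumption that OLS relies on.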

An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment

  • Kim, Myoungjin;Cui, Yun;Lee, Hanku
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 9, No. 8
    • /
    • pp.3182-3202
    • /
    • 2015
  • Flexibly expanding the storage capacity required to process a large amount of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system that can categorize and analyze unstructured log data is extremely difficult. To overcome such limitations, we propose and design a MongoDB-based unstructured log processing system (MdbULPS) for collecting, categorizing, and analyzing log data generated from banks. The proposed system includes a Hadoop-based analysis module for reliable parallel-distributed processing of massive log data. Furthermore, because the Hadoop distributed file system (HDFS) stores data by generating replicas of collected log data in block units, the proposed system offers automatic system recovery against system failures and data loss. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. To evaluate the proposed system, we conducted three different performance tests on a local test bed of twelve nodes: comparing our system with a MySQL-based approach, comparing it with an HBase-based approach, and changing the chunk size option. From the experiments, we found that our system showed better performance in processing unstructured log data.

Analysis of Head shape of college students for the Headgears

  • 이진희
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • Vol. 28, No. 1
    • /
    • pp.182-188
    • /
    • 2004
  • The purpose of the study was to provide scientific and accurate data on men's head shape. The study was carried out on 214 men, and factor analysis, cluster analysis, and Duncan analysis were performed on the data using 15 variables. A 3D scanner was used to obtain visual results of head shape. The results were as follows. First, factor analysis of the variables extracted three factors based on factor scores: the first factor described thickness, the second described width, and the third described vertical length. Four clusters represented the characteristics of men's head types. Type 1 had a larger head thickness, type 2 had a smaller thickness and smaller width, and type 4 had a generally larger head. In the distribution of the four clusters, type 1 accounted for 34% and type 4 for 23%. According to the results, type 1, the thicker and narrower head, was dominant among men's head types.