• Title/Summary/Keyword: Distributed Data Analysis

Assessment of Small Mountainous Catchment Runoff at Yongdam-dam Guryang (산지 소규모 유역의 유출 특성 평가-용담 구량천)

  • Kim, Seong-Goo;Chang, Hyung-Joon;Lee, Hyo-Sang
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.4
    • /
    • pp.633-641
    • /
    • 2018
  • The risk of disasters such as floods and droughts has increased. Reliable hydrological data are important for analyzing water resources and designing hydraulic structures to manage these risks. The Yongdam Guryang river catchment, located in central Korea, is a research catchment of K-water and UNESCO IHP, where hydrological data such as rainfall, runoff, and evapotranspiration have been observed. The aim of this study was to assess the runoff characteristics of a small mountainous catchment in Korea based on the observed hydrological data, and the Probability Distributed Model (PDM) was applied as the rainfall-runoff model for the Yongdam Guryang river catchment. Based on data analysis, the hydrological data were divided into a wet period from June to September and a dry period from October to May. The runoff ratio was 0.27~0.41 in the wet period and 0.30~0.45 in the dry period. Calibration of the PDM yielded different model parameters for the two periods. In addition, the model simulated the runoff accurately except for the dry period of 2015, which indicates the applicability of the PDM. This study characterized the runoff of the small mountainous catchment by dividing the hydrological data into dry and wet periods.
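
The wet/dry runoff ratios reported in this abstract can be illustrated with a minimal sketch. It assumes the ratio is total runoff depth divided by total rainfall depth over each period; the function name, record layout, and sample values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

WET_MONTHS = {6, 7, 8, 9}  # June to September, as stated in the abstract


def runoff_ratio_by_period(records):
    """Return {"wet": ratio, "dry": ratio} for (month, rainfall_mm, runoff_mm) records.

    Assumption: ratio = total runoff depth / total rainfall depth per period.
    """
    rain = defaultdict(float)
    flow = defaultdict(float)
    for month, rainfall_mm, runoff_mm in records:
        period = "wet" if month in WET_MONTHS else "dry"
        rain[period] += rainfall_mm
        flow[period] += runoff_mm
    # Skip periods with no rainfall to avoid dividing by zero.
    return {p: flow[p] / rain[p] for p in rain if rain[p] > 0}


# Hypothetical monthly depths in mm (illustrative only, not observed data).
sample = [(7, 300.0, 120.0), (8, 200.0, 80.0), (1, 40.0, 16.0), (11, 60.0, 14.0)]
ratios = runoff_ratio_by_period(sample)
```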

A Method for Analyzing Web Log of the Hadoop System for Analyzing a Effective Pattern of Web Users (효과적인 웹 사용자의 패턴 분석을 위한 하둡 시스템의 웹 로그 분석 방안)

  • Lee, Byungju;Kwon, Jungsook;Go, Gicheol;Choi, Yonglak
    • Journal of Information Technology Services
    • /
    • v.13 no.4
    • /
    • pp.231-243
    • /
    • 2014
  • Of the various data that corporations can access, web log data are important for the analysis needed to implement customer relationship management strategies. As the volume of accessible data has increased exponentially with the Internet and the popularization of smartphones, web log data have also grown considerably. As a result, it has become difficult to expand storage flexibly enough to process large amounts of web log data, and extremely hard to implement a system capable of categorizing, analyzing, and processing web log data accumulated over a long period of time. This study therefore applied Hadoop, a distributed processing system that has recently come into the spotlight for its capacity to process large volumes of data, and proposed an efficient analysis plan for large amounts of web logs. The study checked the forms of web logs by effective web log collection method and the web log levels by using Hadoop, and proposed analysis techniques and Hadoop organization designs accordingly. The study resolved the difficulty of processing large amounts of web log data and derived user activity patterns through web log analysis, thus demonstrating its value as a new means of marketing.

Simulation Studies on Monitoring System for Interconnection of Distributed Generation to Power Grid (분산전원 계통 연계 상황 모니터링을 위한 시뮬레이션 연구)

  • Oh, Sung-Nam;Han, Chul-Wan;Yoon, Ki-Don;Son, Young-Ik;Kim, Kab-Il
    • Proceedings of the KIEE Conference
    • /
    • 2005.05a
    • /
    • pp.96-98
    • /
    • 2005
  • This paper describes a monitoring system that monitors power quality and undesirable accidents when distributed generation is connected to the power grid. Before developing and operating a physical monitoring system, we built a simulation device to simulate the monitoring process for these situations. The simulation system consists of a server and a client connected by a communication line. The system provides functions for monitoring power quality and the connection status: generation, transmission, acquisition, and analysis of simulated power data. This research is important for obtaining a reliable and intelligent connection algorithm from the results of the simulated monitoring system. Furthermore, because the system uses remote monitoring over a network and builds a database (DB), it is expected to play an important role in automating power systems efficiently and systematically.

Pattern mining for large distributed dataset: A parallel approach (PMLDD)

  • Pal, Amrit;Kumar, Manish
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.11
    • /
    • pp.5287-5303
    • /
    • 2018
  • Handling the vast amount of data found in large transactional datasets is an obvious challenge for conventional data mining algorithms. To address this challenge, our paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed approach, Pattern Mining for Large Distributed Dataset (PMLDD), ensures minimum dependencies as well as minimum communication among sub-problems. It establishes a linear aggregation of the intermediate results so that it can be adapted to large-scale programming models like MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD guarantees efficient load balancing among the sub-problems through a specific selection criterion. Further, it reduces the number of iterations over the dataset required for mining frequent patterns compared with existing approaches. Finally, we believe that our approach is scalable enough to handle larger datasets, and the performance evaluation and result analysis support these claims.
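
The idea of independent sub-problems whose intermediate results are aggregated linearly can be sketched in a MapReduce style. This is not the PMLDD algorithm itself, whose decomposition and selection criterion are not given in the abstract; it is a generic partition-then-sum illustration, and all function names and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, max_size=2):
    """Map step: count every itemset of up to max_size items in one partition."""
    counts = Counter()
    for items in transactions:
        unique = sorted(set(items))  # canonical order so itemsets match across partitions
        for k in range(1, max_size + 1):
            for itemset in combinations(unique, k):
                counts[itemset] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: partial counts are simply summed, so aggregation is linear."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def frequent_itemsets(partitions, min_support, max_size=2):
    """Keep itemsets whose global count reaches min_support."""
    total = reduce_counts(map_partition(p, max_size) for p in partitions)
    return {s: c for s, c in total.items() if c >= min_support}


# Two hypothetical partitions of a transactional dataset.
partitions = [
    [["a", "b"], ["a", "c"]],
    [["a", "b", "c"], ["b"]],
]
result = frequent_itemsets(partitions, min_support=2)
```

Because each partition is counted without reference to the others, the map step needs no communication between workers; only the small count tables are exchanged in the reduce step.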

An Extraction Method of Sentiment Information from Unstructured Big Data on SNS (SNS상의 비정형 빅데이터로부터 감성정보 추출 기법)

  • Back, Bong-Hyun;Ha, Ilkyu;Ahn, ByoungChul
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.6
    • /
    • pp.671-680
    • /
    • 2014
  • With the recent remarkable growth of social network services (SNS), it has become necessary to extract useful information from the large volume of data about individual opinions and preferences on SNS. Sentiment information can be applied to various fields of society such as politics, public opinion, economics, personal services, and entertainment. Extracting sentiment information requires processing techniques that store a large amount of SNS data, extract meaningful data from it, and search for the sentiment information. This paper proposes an efficient method to extract sentiment information from various unstructured big data on social networks using the HDFS (Hadoop Distributed File System) platform and MapReduce functions. In experiments, the proposed method collects and accumulates data steadily as the amount of data increases. When the proposed functions are applied to sentiment analysis, the system maintains load balancing and the analysis results are very close to the results of manual work.

Distributed RFID Information Service Architecture for Ubiquitous Logistics (유비쿼터스 물류를 위한 분산형 RFID 정보서비스 구조)

  • Lee, Jae-Won;Lee, Young-Koo
    • Journal of Intelligence and Information Systems
    • /
    • v.11 no.2
    • /
    • pp.105-121
    • /
    • 2005
  • To realize a ubiquitous logistics management system using smart objects with Electronic Product Code (EPC) enabled RFID tags, the design and management of the RFID Information Service is very important. The RFID Information Service searches, transfers, and responds to other parties' Physical Markup Language (PML) requests, but PML data management between trading system elements raises issues of standardizing PML data description and processing, as well as problems of data traffic and communication time overload caused by its inherently distributed nature. As a complementary study, this research analyzes the usage patterns and data types of PML. Based on that analysis, we provide a design of a distributed RFID Information Service architecture for PML data management that uses DB middleware. Standalone and integrated types of RFID IS are proposed.

Analysis of a Communication Network for Control Systems in Nuclear Power Plants and a Case Study

  • Lee, Sung-Woo;Gwak, Kwi-Yil
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 2005.06a
    • /
    • pp.338-341
    • /
    • 2005
  • In this paper, a real-time communication method using PICNET-NP (Plant Instrumentation and Control Network for Nuclear Power Plant) is proposed, together with an analysis of the control network requirements of a DCS (Distributed Control System) in nuclear power plants. The method meets deadlines under worst-case data traffic by considering periodic and aperiodic real-time data as well as other data. In addition, the method was used to analyze the data characteristics of the DCS in an existing nuclear power plant. The results show that the method meets the response time requirement (100 ms).

Performance Analysis of Building Change Detection Algorithm (연합학습 기반 자치구별 건물 변화탐지 알고리즘 성능 분석)

  • Kim, Younghyun
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.19 no.3
    • /
    • pp.233-244
    • /
    • 2023
  • Although artificial intelligence and machine learning technologies have been used in various fields, centralized data collection and processing have raised problems with personal information protection. Federated learning has been proposed to solve this problem. Federated learning is a process in which clients that own data in a distributed data environment train a model on their own data, and an artificial intelligence model is created collectively by centrally collecting the training results. Unlike the centralized method, federated learning has the advantage of not having to send the clients' data to the central server. In this paper, we quantitatively present the performance improvement obtained when federated learning is applied to building change detection training data. The results confirm that performance with federated learning was about 29% higher on average than without it. In future work, we plan to propose a method that effectively reduces the number of federated learning rounds to improve the convergence time of federated learning.
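
The federated learning process described in this abstract (local training, central aggregation, no raw data sent to the server) can be sketched with a weighted-averaging round in the style of federated averaging. This is an assumption about the aggregation rule, not the paper's method: the one-parameter model y = w * x, the learning rate, and the client data are all hypothetical and stand in for the building change detection model.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one client's private (x, y) pairs."""
    for _ in range(epochs):
        # Gradient of the mean squared error of y = w * x with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, clients):
    """One round: clients train locally, the server averages the returned
    weights, weighted by each client's sample count. Raw data never leaves
    the client; only the scalar weight is shared."""
    total = sum(len(d) for d in clients)
    return sum(local_update(global_w, d) * len(d) for d in clients) / total


# Two hypothetical clients whose private data both follow y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (1.0, 2.0)],
]
w = 0.0
for _ in range(20):  # number of federated rounds
    w = federated_round(w, clients)
```

The loop count corresponds to the number of federated learning rounds, which is the quantity the authors plan to reduce in future work.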

Analysis of Traffic Card Big Data by Hadoop and Sequential Mining Technique (하둡과 순차패턴 마이닝 기술을 통한 교통카드 빅데이터 분석)

  • Kim, Woosaeng;Kim, Yong Hoon;Park, Hee-Sung;Park, Jin-Kyu
    • Journal of Information Technology Applications and Management
    • /
    • v.24 no.4
    • /
    • pp.187-196
    • /
    • 2017
  • It is urgent to prepare countermeasures for traffic congestion in Korea's metropolitan area, where central functions such as the economy, society, culture, and education are excessively concentrated. Most users of public transportation in metropolitan areas, including Seoul, use traffic cards. If various information is extracted from the big traffic data produced by traffic cards, it can provide basic data for transport policies, land use, or facility plans. Therefore, in this study, we extract valuable information such as subway passengers' frequent travel patterns from the big traffic data provided by the Seoul Metropolitan Government Big Data Campus. To do so, we use Hadoop (High-Availability Distributed Object-Oriented Platform) to preprocess the big data and store it in a Mongo database, and then analyze it with a sequential pattern mining technique. Since we analyze actual big data, namely the traffic card data provided by the Seoul Metropolitan Government Big Data Campus, the results can serve as important reference data when the Seoul government plans metropolitan traffic policies.

Retrieval of High-Resolution Grid Type Visibility Data in South Korea Using Inverse Distance Weighting and Kriging

  • Kang, Taeho;Suh, Myoung-Seok
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.1
    • /
    • pp.97-110
    • /
    • 2021
  • Fog can cause large-scale human and economic damage, including to traffic systems and agriculture. The Korea Meteorological Administration therefore operates about 290 visibility meters to improve the observation of fog. However, this is still insufficient to detect very localized fog. In this study, high-resolution grid-type visibility data were retrieved from irregularly distributed visibility data across the country. To this end, three objective analysis techniques were used: Inverse Distance Weighting (IDW), Ordinary Kriging (OK), and Universal Kriging (UK). To find the best method and parameters, sensitivity tests were performed for the effective radius, power parameter, and variogram model, which affect the level of objective analysis. The effect of data distribution characteristics (level of normality) on the performance of objective analysis was also evaluated. IDW showed a relatively high level of objective analysis in terms of bias, RMSE, and correlation, and its performance was inversely proportional to the effective radius and power parameter. The two Kriging methods, however, showed a relatively low level of objective analysis and in particular greatly weakened the variability of the variables, although the level of output differed depending on the variogram model used. As the level of objective analysis is greatly influenced by the distribution characteristics of the data, the power parameter, and the models used, care should be taken when selecting objective analysis techniques and parameters.
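
Inverse Distance Weighting with an effective radius and a power parameter, the two settings tested in this abstract, can be sketched as follows. This is the textbook form of IDW on planar coordinates, not the authors' implementation; the function name, station values, and grid are hypothetical.

```python
import math


def idw(stations, target, power=2.0, radius=float("inf")):
    """Estimate a value at target (x, y) from (x, y, value) stations.

    Only stations within `radius` (the effective radius) contribute, each
    with weight 1 / distance**power. Returns None if no station is in range.
    """
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # target coincides with a station
        if d <= radius:
            weight = 1.0 / d ** power
            num += weight * value
            den += weight
    return num / den if den else None


# Hypothetical visibility readings in km at two stations.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint = idw(stations, (1.0, 0.0))

# Gridded retrieval: evaluate the estimator at every node of a regular grid.
grid = [[idw(stations, (float(x), float(y))) for x in range(3)] for y in range(2)]
```

A larger power parameter concentrates weight on the nearest stations, and a smaller effective radius excludes distant ones, which is why both settings affect the result.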