• Title/Summary/Keyword: Spark Computing Framework


A Scalable OWL Horst Lite Ontology Reasoning Approach based on Distributed Cluster Memories (분산 클러스터 메모리 기반 대용량 OWL Horst Lite 온톨로지 추론 기법)

  • Kim, Je-Min;Park, Young-Tack
    • Journal of KIISE / v.42 no.3 / pp.307-319 / 2015
  • Current ontology studies use the Hadoop distributed storage framework to perform MapReduce-based reasoning over scalable ontologies. In this paper, we instead propose a novel approach to scalable Web Ontology Language (OWL) Horst Lite ontology reasoning based on distributed cluster memories. Rule-based reasoning, which is frequently used for scalable ontologies, iteratively executes triple-format ontology rules until no new data is inferred. Consequently, when scalable ontology reasoning is performed on hard drives, the reasoner suffers from performance limitations. To overcome this drawback, we propose an approach that loads the ontologies into distributed cluster memories using Spark, a memory-based distributed computing framework, and performs the reasoning there. To implement an OWL Horst Lite reasoning system on Spark, our method divides the scalable ontologies into blocks, loads each block into the cluster nodes, and subsequently handles the data in the distributed memories. We evaluated the proposed methods experimentally with the Lehigh University Benchmark (LUBM), which is used to measure ontology inference and search speed, applying them to LUBM8000 (1.1 billion triples, 155 gigabytes). Compared with WebPIE, a representative MapReduce-based scalable ontology reasoner, the proposed approach showed a throughput of 62k/s, about 320% of WebPIE's 19k/s.
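
The fixpoint iteration described in this abstract (apply the rules repeatedly until no new triples are inferred) can be illustrated with a minimal single-machine sketch. It is written in plain Python rather than Spark, and the two rules, the constants `SUBCLASS` and `TYPE`, and the functions `apply_rules` and `infer_closure` are illustrative assumptions, not the authors' OWL Horst Lite rule set or implementation.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def apply_rules(triples):
    """Apply two illustrative rules once and return the derived triples."""
    derived = set()
    subclass_pairs = [(s, o) for s, p, o in triples if p == SUBCLASS]
    for a, b in subclass_pairs:
        for s, p, o in triples:
            # (a subClassOf b), (b subClassOf o) => (a subClassOf o)
            if p == SUBCLASS and s == b:
                derived.add((a, SUBCLASS, o))
            # (s type a), (a subClassOf b) => (s type b)
            if p == TYPE and o == a:
                derived.add((s, TYPE, b))
    return derived


def infer_closure(triples):
    """Iterate the rules until a round infers nothing new (the fixpoint)."""
    closure = set(triples)
    rounds = 0
    while True:
        new = apply_rules(closure) - closure
        if not new:
            return closure, rounds
        closure |= new
        rounds += 1


example = {
    ("GradStudent", SUBCLASS, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("alice", TYPE, "GradStudent"),
}
closure, rounds = infer_closure(example)
```

Each round rescans the whole triple set, which is why keeping the data in memory rather than on disk matters for this style of reasoning.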

Development of Big Data System for Energy Big Data (에너지 빅데이터를 수용하는 빅데이터 시스템 개발)

  • Song, Mingoo
    • KIISE Transactions on Computing Practices / v.24 no.1 / pp.24-32 / 2018
  • This paper proposes a Big Data system for energy Big Data aggregated in real time from industrial and public sources. The system is built on Hadoop, and the Spark framework, which supports in-memory distributed computing, is applied alongside it for Big Data processing. We focus on Big Data in the form of heat energy for district heating, and describe methodologies for storing, managing, processing, and analyzing the aggregated data in real time while considering the properties of energy input and output. At present, incoming Big Data is stored and managed according to a relational database schema designed within the system, and the stored data is processed and analyzed according to set objectives. As examples, the paper presents a number of heat demand plants involved in district heating as industrial sources of heat energy Big Data gathered in real time, together with the proposed system.

An Approach of Scalable SHIF Ontology Reasoning using Spark Framework (Spark 프레임워크를 적용한 대용량 SHIF 온톨로지 추론 기법)

  • Kim, Je-Min;Park, Young-Tack
    • Journal of KIISE / v.42 no.10 / pp.1195-1206 / 2015
  • Managing a knowledge system requires systems that automatically infer and manage scalable knowledge. Most of these systems use ontologies to exchange knowledge between machines and to infer new knowledge, so approaches are needed that infer new knowledge for scalable ontologies. In this paper, we propose an approach to rule-based reasoning for scalable SHIF ontologies in the Spark framework, which works similarly to MapReduce but in distributed memories on a cluster. To perform efficient reasoning in distributed memories, we focus on three areas. First, we define a data structure for splitting scalable ontology triples into small sets according to each reasoning rule and loading these triple sets into distributed memories. Second, we define a rule execution order and iteration conditions based on dependencies and correlations among the SHIF rules. Finally, we explain the operations adapted to execute the rules, which are based on reasoning algorithms. To evaluate the proposed methods, we run a comparative experiment against WebPIE, a representative cluster-based ontology reasoner, using the LUBM set, a standard data set for evaluating ontology inference and search speed. The results show that the proposed approach improves throughput on LUBM by 28,400% (157k/sec) over WebPIE (553/sec).
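
The first of the three areas in this abstract, splitting triples into small sets according to each reasoning rule, can be sketched roughly as follows. This is plain Python under stated assumptions: the rule table `RULE_INPUTS`, its two rule names, and the function `split_by_rule` are hypothetical and much simpler than the SHIF rules and the data structure defined in the paper.

```python
from collections import defaultdict

# Hypothetical rule table: rule name -> predicates that the rule reads.
RULE_INPUTS = {
    "subclass_transitivity": {"rdfs:subClassOf"},
    "type_inheritance": {"rdfs:subClassOf", "rdf:type"},
}


def split_by_rule(triples, rule_inputs=RULE_INPUTS):
    """Group triples into one set per rule, keeping only what each rule reads."""
    buckets = defaultdict(set)
    for triple in triples:
        predicate = triple[1]
        for rule, predicates in rule_inputs.items():
            if predicate in predicates:
                buckets[rule].add(triple)
    return dict(buckets)


triples = [
    ("GradStudent", "rdfs:subClassOf", "Student"),
    ("alice", "rdf:type", "GradStudent"),
    ("alice", "ex:name", "Alice"),
]
buckets = split_by_rule(triples)
```

In this sketch a triple that no rule reads (here the `ex:name` triple) lands in no bucket, so each rule only scans the data relevant to it.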

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu;Lee, Wan-Gon;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE / v.43 no.1 / pp.87-95 / 2016
  • Recently, due to the development of the Internet and electronic devices, there has been an enormous increase in the amount of available knowledge and information. As this growth has proceeded, studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or a knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contains inherent uncertainty, and reasoning over such data can make the reasoning results vague. To address this uncertainty, we propose an RDFS reasoning approach that uses confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark to compute confidence values for the data inferred through RDFS-based reasoning by applying uncertainty estimation methods. The computed confidence values thus represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data set with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system is capable of running over the largest data set, LUBM3000, in 1179 seconds, inferring 350K triples.
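
The abstract does not state which uncertainty estimation methods are applied, so the following is only a sketch of one common choice, stated here as an assumption: the confidence of an inferred triple is the product of its premises' confidences, and when a triple can be derived in several ways the highest value is kept. The function name `infer_types_with_confidence` and the single rule used are hypothetical, and the code is plain Python rather than Spark.

```python
def infer_types_with_confidence(facts):
    """facts maps (subject, predicate, object) -> confidence in [0, 1].

    Assumed rule: (x type a), (a subClassOf b) => (x type b),
    with confidence = product of the premise confidences (an assumption).
    """
    result = dict(facts)
    changed = True
    while changed:
        changed = False
        for (x, p1, a), c1 in list(result.items()):
            if p1 != "rdf:type":
                continue
            for (a2, p2, b), c2 in list(result.items()):
                if p2 != "rdfs:subClassOf" or a2 != a:
                    continue
                derived = (x, "rdf:type", b)
                confidence = c1 * c2
                # Keep the best-supported derivation of each triple.
                if confidence > result.get(derived, 0.0):
                    result[derived] = confidence
                    changed = True
    return result


facts = {
    ("alice", "rdf:type", "GradStudent"): 0.9,
    ("GradStudent", "rdfs:subClassOf", "Student"): 0.8,
    ("Student", "rdfs:subClassOf", "Person"): 1.0,
}
scored = infer_types_with_confidence(facts)
```

Under these assumptions an inferred triple is never more certain than any premise it was derived from.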

Simulation and Analysis of Wildfire for Disaster Planning and Management

  • Yang, Fan;Zhang, Jiansong
    • International conference on construction engineering and project management / 2022.06a / pp.443-449 / 2022
  • With climate change and global population growth, the frequency and scope of wildfires are constantly increasing, threatening people's lives and property. For example, according to the California Department of Forestry and Fire Protection, a total of 9,917 wildfire-related incidents were reported in California in 2020, with an estimated burned area of 4,257,863 acres, resulting in 33 fatalities and 10,488 structures damaged or destroyed. At the same time, the ongoing development of technology provides new tools to simulate and analyze the spread of wildfires. How to use new technology to reduce the losses caused by wildfires is an important research topic. A potentially feasible strategy is to simulate and analyze the spread of wildfires through computing technology in order to explore the impact of different factors (such as weather and terrain) on their spread, to determine how to take preemptive or responsive measures that minimize potential losses, and as a result to achieve better wildfire management support. In preparation for pursuing these goals, the authors used a powerful computing framework, Spark, developed by the Commonwealth Scientific and Industrial Research Organisation (CSIRO), to study the effects of different weather factors (wind speed, wind direction, air temperature, and relative humidity) on the spread of wildfires. The test results showed that wind is a key factor in determining the spread of wildfires. Stable weather (stable wind and air conditions) helps limit the spread of wildfires. Joint consideration of weather factors and environmental obstacles can help limit the threat of wildfires.


Big Data based Tourist Attractions Recommendation - Focus on Korean Tourism Organization Linked Open Data - (빅데이터 기반 관광지 추천 시스템 구현 - 한국관광공사 LOD를 중심으로 -)

  • Ahn, Jinhyun;Kim, Eung-Hee;Kim, Hong-Gee
    • Management & Information Systems Review / v.36 no.4 / pp.129-148 / 2017
  • Conventional exhibition management information systems recommend tourist attractions that are close to the place where an exhibition is held. Some attractions recommended by such location-based recommendation can be meaningless when they have nothing to do with the exhibition's topic. Our goal is to recommend attractions that are related to the content presented in the exhibition, which can be termed content-based recommendation. Even though human exhibition curators can do this, the quality is limited by their manual effort and knowledge. We propose an automatic way of discovering attractions relevant to an exhibition of interest. Language resources are incorporated to discover more meaningful attractions. Because a typical single machine is unable to deal with such large-scale language resources efficiently, we implemented the algorithm on top of Apache Spark, a well-known distributed computing framework. As a user interface prototype, we implemented a web-based system that provides users with a list of relevant attractions while they browse exhibition information, available at http://bike.snu.ac.kr/WARP. We carried out a case study based on Korean Tourism Organization Linked Open Data with Korean Wikipedia as a language resource. Experimental results demonstrate the efficiency and effectiveness of the proposed system. The effectiveness was evaluated against well-known exhibitions. The proposed approach is expected to contribute to the development of both the exhibition and tourism industries by motivating exhibition visitors to become active tourists.
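
The idea of content-based recommendation in this abstract, ranking attractions by how closely their descriptions match an exhibition's content rather than by distance, can be sketched as follows. The abstract does not describe the similarity measure, so the bag-of-words cosine similarity used here is an assumption, as are the function names `cosine` and `recommend` and the toy attraction descriptions; the code is plain Python, not the authors' Spark implementation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend(exhibition_text, attractions, top_k=2):
    """Return up to top_k attraction names ranked by similarity to the text."""
    query = Counter(exhibition_text.lower().split())
    scored = [
        (cosine(query, Counter(text.lower().split())), name)
        for name, text in attractions.items()
    ]
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


attractions = {
    "Ceramics Museum": "traditional korean ceramics pottery",
    "Water Park": "slides pools summer fun",
    "Folk Village": "traditional korean houses crafts",
}
ranked = recommend("korean ceramics exhibition", attractions)
```

An attraction that shares no terms with the exhibition text scores zero and is never recommended, regardless of how close it is geographically.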


Operational Big Data Analytics platform for Smart Factory (스마트팩토리를 위한 운영빅데이터 분석 플랫폼)

  • Bae, Hyerim;Park, Sanghyuck;Choi, Yulim;Joo, Byeongjun;Sutrisnowati, Riska Asriana;Pulshashi, Iq Reviessay;Putra, Ahmad Dzulfikar Adi;Adi, Taufik Nur;Lee, Sanghwa;Won, Seokrae
    • The Journal of Bigdata / v.1 no.2 / pp.9-19 / 2016
  • Since ICT convergence became a major issue, the German government has pursued the 'Industry 4.0' policy, which triggered the convergence of ICT with manufacturing. This trend is now gaining momentum, and it is expected to bring a major leap toward high quality at low cost. Recently, the Korean government has also been carrying out its 'Manufacturing 3.0' policy to upgrade the Korean manufacturing industry, accelerated by many related technologies. In this paper, we developed a custom operational big data analysis platform that implements operational intelligence to improve industrial capability. Our platform is designed as a web application based on the Spring framework. In addition, HDFS and Spark architectures help our system analyze massive field data, with streamed data processed by a process mining algorithm. Knowledge extracted from the data will support the enhancement of manufacturing performance.
