A Keyword-based Filtering Technique of Document-centric XML using NFA Representation (NFA 표현을 사용한 문서-중심적 XML의 키워드 기반 필터링 기법)

  • Lee, Kyoung-Han;Park, Seog
    • Journal of KIISE:Databases / v.33 no.5 / pp.437-452 / 2006
  • In this paper, we propose an extended XPath specification that includes the special matching character '%', as used in the LIKE operation of SQL, to address the difficulty of writing queries that filter element contents well with the existing XPath specification. We also present a novel technique for filtering a collection of document-centric XML documents, called Pfilter, which exploits the extended XPath specification. By sharing the common prefix characters of the operands in value-based predicates, Pfilter improves the performance of processing those predicates. We report several performance studies that compare Pfilter with Yfilter in terms of efficiency and scalability, using multi-query processing time (MQPT) as the metric, and present results for inserting, deleting, and processing value-based predicates. In conclusion, our approach provides a core algorithm for evaluating the contains() function of XPath queries in the context of previous XML filtering research, and a foundation for building XML-based distributed information systems.
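
The prefix-sharing idea in the abstract above can be illustrated with a character trie. This is a minimal sketch, not Pfilter's actual algorithm: the class name `PrefixTrie`, the query ids, and the restriction to operands that are either exact literals or end in a single trailing '%' are all assumptions made for illustration.

```python
class PrefixTrie:
    """Character trie that shares common prefixes among predicate operands.

    Hypothetical illustration only: supports exact literals ("database")
    and prefix patterns with one trailing '%' ("data%").
    """

    def __init__(self):
        self.children = {}
        self.exact_ids = set()   # queries whose operand ends exactly at this node
        self.prefix_ids = set()  # queries whose operand is "<path to this node>%"

    def insert(self, operand, query_id):
        wildcard = operand.endswith("%")
        literal = operand[:-1] if wildcard else operand
        node = self
        for ch in literal:
            node = node.children.setdefault(ch, PrefixTrie())
        (node.prefix_ids if wildcard else node.exact_ids).add(query_id)

    def match(self, text):
        """Return ids of all registered queries whose operand matches text."""
        node = self
        matched = set(node.prefix_ids)  # a bare '%' matches any text
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return matched
            matched |= node.prefix_ids
        return matched | node.exact_ids


trie = PrefixTrie()
trie.insert("data%", 1)
trie.insert("database", 2)
trie.insert("dat%", 3)
trie.insert("query", 4)
```

Because "data%", "database", and "dat%" share the path d-a-t, one walk over the element text evaluates all three operands at once instead of testing each predicate separately.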

Hazelcast Vs. Ignite: Opportunities for Java Programmers

  • Maxim, Bartkov;Tetiana, Katkova;S., Kruglyk Vladyslav;G., Murtaziev Ernest;V., Kotova Olha
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.406-412 / 2022
  • Storing large amounts of data has been a major problem since the beginning of computing history. Big Data has brought major advances in business processes by identifying customers' needs with prediction models based on web and social media search. The main purpose of big data stream processing frameworks is to allow programmers to query a continuous stream directly without dealing with the lower-level mechanisms. In other words, programmers write the code that processes streams using these runtime libraries (also called Stream Processing Engines). This is achieved by taking large volumes of data and analyzing them with Big Data frameworks. Streaming platforms are an emerging technology that deals with continuous streams of data. Several Big Data streaming platforms are freely available on the Internet. However, selecting the most appropriate one is not easy for programmers. In this paper, we present a detailed description of two state-of-the-art and highly popular streaming frameworks: Apache Ignite and Hazelcast. In addition, the performance of these frameworks is compared using selected attributes. Different types of databases are commonly used to store data. Data streaming technologies have been developed to process data continuously in real time. With the development of today's large-scale distributed applications handling huge volumes of data, these databases are not viable. Consequently, Big Data was introduced to store, process, and analyze data at high speed and to cope with the day-by-day growth in users and data.

PROJECTION OF TRAJECTORY FOR SUPPORTING UNCERTAINTY FUTURE TIME OF MOVING OBJECT

  • Won Ho-Gyeong;Jung Young Jin;Lee Yang Koo;Park Mi;Kim Hak-cheol;Ryu Keun Ho
    • Proceedings of the KSRS Conference / 2005.10a / pp.72-75 / 2005
  • Uncertainty of objects in a Moving Object Database is an inherent property. It has been discussed in much research on modelling and query processing. Previous studies assume that the uncertain future time is determined from the recent speed and direction of vehicles. This method is simple and useful for estimating the time at which an object reaches a near-future location. However, it is not appropriate for estimating the time of a far-future location. Therefore, in this paper, we propose the concept of a planned route. It is used to estimate the uncertain future time at which an object will be located at a given point. If the route of an object is planned beforehand, its locations are uncertainly distributed near that route. A simple projection operation increases the probability that a location lies on the planned route. Moreover, we identify the future time of an object based on the speed for traversing the route, which is provided via a website.
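
The projection step described above might look like the following sketch. It assumes a planned route given as a 2D polyline, a constant traversal speed, and Euclidean distances; the function names and these modelling choices are illustrative assumptions, not the authors' formulation.

```python
import math


def project_onto_segment(p, a, b):
    """Project point p onto segment a-b; return (projected point, fraction t in [0, 1])."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a, 0.0
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp so the projection stays on the segment
    return (ax + t * dx, ay + t * dy), t


def project_onto_route(p, route):
    """Return (nearest point on the route, distance travelled along the route to it)."""
    best = None
    travelled = 0.0
    for a, b in zip(route, route[1:]):
        q, t = project_onto_segment(p, a, b)
        seg_len = math.dist(a, b)
        gap = math.dist(p, q)
        if best is None or gap < best[0]:
            best = (gap, q, travelled + t * seg_len)
        travelled += seg_len
    return best[1], best[2]


def estimated_travel_time(position, target, route, speed):
    """Time to reach target along the planned route at an assumed constant speed."""
    _, s_now = project_onto_route(position, route)
    _, s_target = project_onto_route(target, route)
    return (s_target - s_now) / speed


route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]  # hypothetical planned route
eta = estimated_travel_time((2.0, 1.0), (10.0, 5.0), route, speed=2.0)
```

Here the observed position (2, 1) is snapped to (2, 0) on the route, the target lies 15 units along the route, and the remaining 13 units at speed 2 give an estimate of 6.5 time units.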

Web Service Workflows for Distributed Visual Media Retrieval Framework

  • Nah, Yun-Mook;Lee, Bog-Ju;Kim, Jung-Sun;Kwon, O-Byoung;Suh, Bo-Won;Ahn, Chul-Bum;Shin, Dong-Hoon
    • Journal of Korea Multimedia Society / v.10 no.6 / pp.707-715 / 2007
  • The need for content-based retrieval from visual media, such as image and video data, is increasing rapidly in many applications, such as electronic art museums, internet shopping malls, internet search engines, and medical information systems. In our previous research, we proposed an architecture called HERMES, a Web Service-enabled visual media retrieval framework. In this paper, we propose the Web Service workflows employed in HERMES. We describe how we designed the workflows for service registration and query processing in the framework. In particular, we explain how metadata and ontology can be utilized to realize more intelligent content-based retrieval on visual media data.

A Practical Method of a Distributed Information Resources Based on a Mediator for the u-Learning Environment (유비쿼터스 학습(u-Learning)을 위한 미디에이터 기반의 분산정보 활용방법)

  • Joo, Kil-Hong
    • Journal of The Korean Association of Information Education / v.9 no.1 / pp.79-86 / 2005
  • With the rapid advance of computer and communication technology, the amount of data transferred is increasing more than ever. The recent trend in education systems is to connect related information semantically across different systems in order to improve the utilization of computerized information. Accordingly, Web-based teaching and learning is developing toward ubiquitous learning, in which learners select and organize the contents, time, and order of learning by themselves. That is, it is evolving to provide a teaching-learning environment adapted to individual learners' characteristics (their level of knowledge, pattern of study, and areas of interest). This paper proposes an efficient evaluation method for learning contents in a mediator that integrates heterogeneous information resources. This means that the autonomy of a remote server can be preserved to the highest degree. In addition, this paper proposes an adaptive optimization of learning contents so that the available storage in a mediator can be highly utilized at any time. To differentiate the recent usage of a learning content from past usage, the accumulated usage frequency of a learning content decays as time goes by.
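
A decaying usage counter of the kind mentioned in the last sentence can be sketched as follows. The exponential form, the `decay_rate` parameter, and the class name `DecayedUsage` are assumptions for illustration; the abstract does not state which decay function the paper uses.

```python
class DecayedUsage:
    """Accumulated usage frequency that fades as time passes.

    Assumption: exponential decay, where decay_rate is the fraction of the
    old count that survives one unit of time.
    """

    def __init__(self, decay_rate=0.5):
        self.decay_rate = decay_rate
        self.count = 0.0
        self.last_time = 0.0

    def value(self, now):
        """Decayed count as seen at time now (no new access recorded)."""
        return self.count * self.decay_rate ** (now - self.last_time)

    def record(self, now):
        """Register one access at time now, after decaying the old count."""
        self.count = self.value(now) + 1.0
        self.last_time = now


usage = DecayedUsage(decay_rate=0.5)
usage.record(0)  # count becomes 1.0
usage.record(1)  # old count halves to 0.5, then +1 gives 1.5
```

A mediator could rank cached learning contents by `value(now)` so that recently used items outrank items that were popular only in the past.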

Parallel Spatial Join Method Using Efficient Spatial Relation Partition In Distributed Spatial Database Systems (분산 공간 DBMS에서의 효율적인 공간 릴레이션 분할 기법을 이용한 병렬 공간 죠인 기법)

  • Ko, Ju-Il;Lee, Hwan-Jae;Bae, Hae-Young
    • Journal of Korea Spatial Information System Society / v.4 no.1 s.7 / pp.39-46 / 2002
  • In distributed spatial database systems, users may issue a query that joins two relations stored at different sites. The sheer volume and complexity of spatial data incur expensive CPU and I/O costs during spatial join processing. This paper presents a new spatial join method that joins two spatial relations in parallel. First, the initial join operation is divided into two distinct ones by partitioning one of the two participating relations based on region. The two join operations are then assigned to the respective sites and executed simultaneously. Finally, the intermediate result sets from the two join operations are merged into the final result set. This method reduces the number of spatial objects participating in the spatial operations. It also reduces the scope and the number of spatial index scans. In addition, it does not materialize temporary results, because the join algebra operators are implemented as iterators. The performance test shows that this join method makes efficient use of buffer and disk by narrowing down the joining region and decreasing the number of spatial objects.
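
The partition, parallel join, and merge steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: objects are reduced to bounding boxes `(xmin, ymin, xmax, ymax)`, the partition is a single vertical line `split_x`, two threads stand in for the two sites, and a nested-loop comparison replaces the spatial index scans and iterator-based operators of the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def intersects(a, b):
    """True if two bounding boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def partial_join(r_part, s):
    """Join one partition of R against S (nested loop, for illustration)."""
    return [(rid, sid)
            for rid, rbox in r_part
            for sid, sbox in s
            if intersects(rbox, sbox)]


def parallel_spatial_join(r, s, split_x):
    # Partition R by the x-coordinate of each box centre, so every R object
    # lands in exactly one partition and the merge needs no de-duplication.
    left = [(rid, box) for rid, box in r if (box[0] + box[2]) / 2 < split_x]
    right = [(rid, box) for rid, box in r if (box[0] + box[2]) / 2 >= split_x]
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(partial_join, part, s) for part in (left, right)]
        partial_results = [f.result() for f in futures]
    # Merge the two intermediate result sets into the final result.
    return sorted(partial_results[0] + partial_results[1])


r = [("r1", (0, 0, 2, 2)), ("r2", (8, 8, 9, 9))]
s = [("s1", (1, 1, 3, 3)), ("s2", (7, 7, 8, 8)), ("s3", (20, 20, 21, 21))]
pairs = parallel_spatial_join(r, s, split_x=5)
```

In a real distributed setting each partition would also be joined only against the part of S that falls in its region, which is where the reduction in participating objects comes from.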

An Efficient Data Centric Storage Scheme with Non-uniformed Density of Wireless Sensor Networks (센서의 불균일한 배포밀도를 고려한 효율적인 데이터 중심 저장기법)

  • Seong, dong-ook;Lee, seok-jae;Song, seok-il;Yoo, jae-soo
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.135-139 / 2007
  • Recently, Data Centric Storage (DCS) schemes have been studied for a variety of applications (e.g., natural environment investigation, military application systems, and environmental change monitoring). In a DCS scheme, data is stored by name at nodes within the network. The existing schemes have several drawbacks. The first is inefficient range query processing, because the locality of storage points is not considered. The second is the uneven storage load across sensors when the sensor distribution density is non-uniform. In this paper, we propose a novel data centric storage scheme that takes the sensor distribution density into account and preserves the locality of data storage locations. The scheme divides the whole sensor network area using a grid and distributes a density bitmap that consists of the sensor density information of each cell. Sensors use the density bitmap for storing and searching data. We evaluate our scheme against existing schemes. The results show improved load balancing and more efficient range query processing than existing schemes in environments where sensors are distributed non-uniformly.
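
As a rough illustration of the grid and density bitmap, the sketch below counts sensors per grid cell and marks dense cells with 1. The square area, the `threshold` rule, and the function name are assumptions; the abstract does not specify how the bitmap is encoded or how sensors consult it.

```python
def build_density_bitmap(sensors, area_size, cells_per_side, threshold):
    """Return a grid of 0/1 flags; 1 means the cell holds at least `threshold` sensors.

    Assumptions for illustration: a square area of side area_size, sensor
    positions as (x, y) tuples, and a simple count threshold per cell.
    """
    cell = area_size / cells_per_side
    counts = [[0] * cells_per_side for _ in range(cells_per_side)]
    for x, y in sensors:
        # Clamp so that sensors on the far edge fall into the last cell.
        col = min(int(x // cell), cells_per_side - 1)
        row = min(int(y // cell), cells_per_side - 1)
        counts[row][col] += 1
    return [[1 if c >= threshold else 0 for c in row] for row in counts]


sensors = [(1, 1), (2, 2), (3, 1), (9, 9)]  # hypothetical deployment
bitmap = build_density_bitmap(sensors, area_size=10, cells_per_side=2, threshold=2)
```

A storage scheme could then prefer cells flagged 1 as storage points, so that sparse cells with few sensors are not overloaded.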

Design and Implementation of an Efficient Web Services Data Processing Using Hadoop-Based Big Data Processing Technique (하둡 기반 빅 데이터 기법을 이용한 웹 서비스 데이터 처리 설계 및 구현)

  • Kim, Hyun-Joo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.726-734 / 2015
  • Relational databases, which manage data in structured form, are currently the most widely used in data management. However, in relational databases, service becomes slower as the amount of data increases because of constraints on the read and write operations used to save or query data. Furthermore, when a new task is added, the database grows and consequently requires additional infrastructure, such as parallel configuration of hardware, CPU, memory, and network, to support smooth operation. In this paper, to improve web information services that are slowing down because of data growth in relational databases, we implemented a model that extracts a large amount of data quickly and safely for users by sending data to the Hadoop Distributed File System (HDFS), unifying and reconstructing the data, and then processing the HDFS files. We implemented our model in a Web-based civil affairs system that stores image files, which involves irregular data processing. The data processing of our proposed system was found to be 0.4 sec faster than that of a relational database system. Thus, we found that a Hadoop-based big data processing technique can support Web information services that process a large amount of data, as conventional relational databases do. Furthermore, since Hadoop is open source, our model has the advantage of reducing software costs. The proposed system is expected to serve as a model for Web services that provide fast information processing for organizations that require efficient processing of big data because of the growing size of conventional relational databases.

The e-Business Component Construction based on Distributed Component Specification (분산 컴포넌트 명세를 통한 e-비즈니스 컴포넌트 구축)

  • Kim, Haeng-Gon;Choe, Ha-Jeong;Han, Eun-Ju
    • The KIPS Transactions:PartD / v.8D no.6 / pp.705-714 / 2001
  • Today's computing systems have expanded business trade and distributed business processes over the Internet. More and more systems are developed from components that offer reusability, independence, and portability. Component-based development (CBD) focuses on advanced concepts rather than passive manipulation of source code in class libraries. Primary component construction in CBD, however, leads to additional cost for reconstructing new components with the CBD model. It is also difficult to serve component information rapidly and accurately, because no normalization model has been established and frequent user logins on the Web cause overload. Many difficult issues and aspects of component-based development have to be investigated to develop good component-based products. There is no established normalization model that guarantees proper treatment of components. This paper elaborates on some of those aspects for web applications, in order to adapt to user requirements accurately and rapidly. The distributed components in this paper are used at the smallest size on the network, and we suggest a network-addressable interface based on the business domain. We also discuss internal and external specifications for grasping the internal and external relations of components when analyzing user requirements. The specifications are stored on Servlets after dividing the information between session and entity as EJBs (Enterprise JavaBeans), which are reusable units in the business domain. The reusable units are used in business components, which are obtained through queries. As a major contribution, we propose a system model for component registration, auto-arrangement, search, testing, and download, which covers component reusability and customization.

SSQUSAR : A Large-Scale Qualitative Spatial Reasoner Using Apache Spark SQL (SSQUSAR : Apache Spark SQL을 이용한 대용량 정성 공간 추론기)

  • Kim, Jonghoon;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering / v.6 no.2 / pp.103-116 / 2017
  • In this paper, we present the design and implementation of a large-scale qualitative spatial reasoner that can efficiently derive new qualitative spatial knowledge, representing both topological and directional relationships between two arbitrary spatial objects, using Apache Spark SQL. Apache Spark SQL is well known as a distributed parallel programming environment that provides both efficient join operations and query processing functions over a variety of data in Hadoop cluster computer systems. In our spatial reasoner, the overall reasoning process is divided into six jobs (knowledge encoding, inverse reasoning, equal reasoning, transitive reasoning, relation refining, and knowledge decoding), and the execution order of the reasoning jobs is determined in consideration of both logical causal relationships and computational efficiency. The knowledge encoding job reduces the size of the knowledge base to reason over by transforming the input knowledge in XML/RDF form into a more concise form. Repetition of the transitive reasoning job and the relation refining job usually consumes most of the computation time and storage of the overall reasoning process. To improve these jobs, our reasoner finds the minimal disjunctive relations for qualitative spatial reasoning and, based upon them, not only reduces the composition table used for the transitive reasoning job but also optimizes the relation refining job. In experiments using a large-scale benchmark spatial knowledge base, the proposed reasoner showed high performance and scalability.
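
The inverse and transitive reasoning jobs can be illustrated on a toy scale with plain Python sets. This sketch is not SSQUSAR: it runs in memory rather than on Spark SQL, uses only two directional relations, and its `INVERSE` and `COMPOSE` tables are tiny hypothetical stand-ins for the paper's composition table over disjunctive relations.

```python
# Hypothetical miniature relation tables, for illustration only.
INVERSE = {"north_of": "south_of", "south_of": "north_of"}
COMPOSE = {
    ("north_of", "north_of"): "north_of",
    ("south_of", "south_of"): "south_of",
}


def infer(facts):
    """Repeat inverse and transitive reasoning until no new fact appears."""
    known = set(facts)
    while True:
        # Inverse reasoning: (a, r, b) implies (b, inverse(r), a).
        new = {(b, INVERSE[r], a) for a, r, b in known}
        # Transitive reasoning: a self-join on the shared middle object,
        # looked up in the composition table.
        new |= {(a, COMPOSE[(r1, r2)], c)
                for a, r1, b in known
                for b2, r2, c in known
                if b == b2 and (r1, r2) in COMPOSE}
        if new <= known:  # fixed point reached
            return known
        known |= new


facts = {("A", "north_of", "B"), ("B", "north_of", "C")}
closure = infer(facts)
```

The self-join in the transitive step is the part that a distributed engine would execute as a join over the fact table, which is why join efficiency dominates the cost of the repeated jobs.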