• Title/Summary/Keyword: Data Scalability Problem

PPFP(Push and Pop Frequent Pattern Mining): A Novel Frequent Pattern Mining Method for Bigdata Frequent Pattern Mining (PPFP(Push and Pop Frequent Pattern Mining): 빅데이터 패턴 분석을 위한 새로운 빈발 패턴 마이닝 방법)

  • Lee, Jung-Hun;Min, Youn-A
    • KIPS Transactions on Software and Data Engineering / v.5 no.12 / pp.623-634 / 2016
  • Most existing frequent pattern mining methods focus on time efficiency and rely heavily on primary memory. However, in the era of big data, the size of real-world databases to be mined is increasing exponentially, and primary memory is no longer sufficient to mine frequent patterns from large real-world data sets. Disk-based frequent pattern mining methods have been studied to address this scalability problem, but they are very time consuming compared to memory-based methods. In this paper, we present PPFP, a novel disk-based approach for mining frequent itemsets from big data that reduces the main memory bottleneck. The PPFP algorithm is based on the FP-growth method, one of the most popular and efficient frequent pattern mining approaches. Mining with PPFP consists of two steps. (1) Constructing an IFP-tree: after constructing the FP-tree, we assign an index number to each node using a novel index numbering method, and then store the indexed FP-tree (IFP-tree) on disk as an IFP-table. (2) Mining frequent patterns with PPFP: frequent patterns are mined by expanding patterns using a stack-based PUSH-POP method (the PPFP method). By using only a very small amount of memory for the recursive and time-consuming operations of the mining process, this approach improves both the scalability and the time efficiency of frequent pattern mining, as the reported test results demonstrate.
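
The abstract does not spell out the IFP-table layout or the index numbering method, so the following is only a minimal sketch of the general idea of expanding patterns with an explicit push/pop stack instead of recursion. It is not the PPFP algorithm: it keeps in-memory transaction-id sets (an Eclat-style representation) in place of an FP-tree or a disk-resident table, and the function name `mine_frequent_itemsets` and its parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def mine_frequent_itemsets(
    transactions: Sequence[Sequence[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Enumerate frequent itemsets depth-first using an explicit stack.

    Assumption: patterns are extended with transaction-id set intersections,
    not with the FP-tree / IFP-table structures described in the abstract.
    """
    # Map each item to the set of transaction indices that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            tidsets.setdefault(item, set()).add(tid)

    # Keep only items that are frequent on their own, in a fixed order.
    items: List[str] = sorted(
        item for item, tids in tidsets.items() if len(tids) >= min_support
    )

    result: Dict[FrozenSet[str], int] = {}
    # Each stack entry: (pattern, supporting transaction ids, index of last item).
    stack: List[Tuple[Tuple[str, ...], Set[int], int]] = []

    # PUSH every frequent single item as a starting pattern.
    for idx in range(len(items) - 1, -1, -1):
        stack.append(((items[idx],), tidsets[items[idx]], idx))

    while stack:
        # POP one pattern, record it, then PUSH its frequent extensions.
        pattern, tids, last = stack.pop()
        result[frozenset(pattern)] = len(tids)
        for nxt in range(len(items) - 1, last, -1):
            new_tids = tids & tidsets[items[nxt]]
            if len(new_tids) >= min_support:
                stack.append((pattern + (items[nxt],), new_tids, nxt))

    return result
```

Because only later items in the fixed order are appended to a pattern, each itemset is generated once, and the working state is limited to the stack itself, with no recursive call frames.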

Sequential Pattern Mining Algorithms with Quantities (정량 정보를 포함한 순차 패턴 마이닝 알고리즘)

  • Kim, Chul-Yun;Lim, Jong-Hwa;Ng Raymond T.;Shim Kyu-Seok
    • Journal of KIISE:Databases / v.33 no.5 / pp.453-462 / 2006
  • Discovering sequential patterns is an important problem for many applications. Existing algorithms find sequential patterns in the sense that only items are included in the patterns. However, for many applications, such as business and scientific applications, quantitative attributes are often recorded in the data, which are ignored by existing algorithms but can provide useful insight to the users. In this paper, we consider the problem of mining sequential patterns with quantities. We demonstrate that naive extensions to existing algorithms for sequential patterns are inefficient, as they may enumerate the search space blindly. Thus, we propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions. Experimental results confirm that compared with the naive extensions, these schemes not only improve the execution time substantially but also show better scalability for sequential patterns with quantities.

Match Field based Algorithm Selection Approach in Hybrid SDN and PCE Based Optical Networks

  • Selvaraj, P.;Nagarajan, V.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.5723-5743 / 2018
  • Evolving internet-based services demand high-speed data transmission together with scalability. Next-generation optical networks have to exploit artificial intelligence and cognitive techniques to cope with these emerging requirements. This work proposes a novel way to solve the dynamic provisioning problem in optical networks. Provisioning in an optical network involves the computation of routes and the reservation of wavelengths (Routing and Wavelength Assignment, RWA). This is an extensively studied multi-objective optimization problem whose complexity is known to be NP-complete. Because exact algorithms incur long running times, heuristic-based approaches have been widely preferred. Recently, software-defined networking has changed the way optical pipes are configured and monitored. This work proposes the dynamic selection of path computation algorithms in response to changing service requirements and network scenarios. A software-defined controller mechanism with a novel packet matching feature is proposed to dynamically match traffic demands with the appropriate algorithm. A software-defined controller with a Path Computation Element (PCE) was created in the ONOS tool. A simulation study was performed with a case study of dynamic path establishment in an ONOS (Open Network Operating System) based software-defined controller environment. A Java-based NOX controller was configured with a parent path computation element. The child path computation elements were configured with different path computation algorithms under the control of the parent path computation element. The use case of dynamic bulk path creation was considered. The algorithm selection method is compared with the existing single-algorithm method, and the results are analyzed.

Effective Streaming of XML Data for Wireless Broadcasting (무선 방송을 위한 효과적인 XML 스트리밍)

  • Park, Jun-Pyo;Park, Chang-Sup;Chung, Yon-Dohn
    • Journal of KIISE:Databases / v.36 no.1 / pp.50-62 / 2009
  • In wireless and mobile environments, data broadcasting is recognized as an effective means of data dissemination because of its bandwidth efficiency, energy efficiency, and scalability. In this paper, we address the problem of delayed query processing caused by tree-based index structures in wireless broadcast environments, which increases the access time of mobile clients. We propose a novel distributed index structure and a clustering strategy for streaming XML data that enable energy-efficient and latency-efficient broadcast of XML data. We first define the DIX node structure to implement a fully distributed index structure; a DIX node contains the tag name, attributes, and text content of an element as well as its corresponding indices. By exploiting the index information in the DIX node stream, a mobile client can access the wireless stream with lower latency. We also suggest a method of clustering DIX nodes in the stream, which further enhances the performance of query processing over the stream on mobile clients. Through extensive performance experiments, we demonstrate that our approach is effective for wireless broadcasting of XML data and outperforms previous methods.

A Multi-Agent framework for Distributed Collaborative Filtering (분산 환경에서의 협력적 여과를 위한 멀티 에이전트 프레임워크)

  • Ji, Ae-Ttie;Yeon, Cheol;Lee, Seung-Hun;Jo, Geun-Sik;Kim, Heung-Nam
    • Journal of Intelligence and Information Systems / v.13 no.3 / pp.119-140 / 2007
  • Recommender systems help users decide which information is interesting and valuable in a world of information overload. As research on distributed computing environments has progressed, recommender systems, most of which were centralized, have been moving toward a peer-to-peer approach. Collaborative Filtering (CF), one of the most successful technologies in recommender systems, has several limitations despite its popularity, namely sparsity, scalability, cold start, and the shilling problem. The move from centralized systems to distributed approaches can partially alleviate issues such as distrust of recommendations and abuse of personal information. However, distributed systems can be vulnerable to attackers, who may inject biased profiles to force the system to serve their objectives. In this paper, we consider both effective CF in a P2P environment, to improve overall system performance, and an efficient solution to the problems of personal data abuse and attacks by malicious users. To deal with these issues, we propose a multi-agent framework for distributed CF that focuses on the trust relationships between individuals, i.e., a web of trust. We employ an agent-based approach to improve the efficiency of distributed computing and to propagate trust information among users effectively. The experimental evaluation shows that the proposed method brings significant improvement in the distributed computation of similarity model building and in the robustness of the system against malicious attacks. In future work, we plan to study trust propagation mechanisms that take the trust decay problem into consideration.

Item Recommendation Technique Using Spark (Spark를 이용한 항목 추천 기법에 관한 연구)

  • Yun, So-Young;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.5 / pp.715-721 / 2018
  • With the spread of mobile devices, the number of users of social network services and e-commerce sites has increased dramatically, and the amount of data produced by those users has increased exponentially. E-commerce companies face the task of extracting useful information from the vast amount of data their users produce. To solve this problem, various studies have applied big data processing techniques. In this paper, we propose a collaborative filtering method that applies tag weights on the Apache Spark platform. To raise recommendation accuracy, the proposed method refines the tag data during preprocessing, categorizes the items, and then applies period information and tag weights to the estimated ratings of the items. After generating an RDD, we calculate item similarity and prediction values and recommend items to users. The experimental results indicate that the proposed method processes large amounts of data quickly and improves the appropriateness of recommendations.
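
The "item similarity and prediction values" step can be illustrated with a generic item-based collaborative filter. This is a minimal sketch under stated assumptions, not the authors' method: it runs in plain Python instead of on Spark RDDs, it uses cosine similarity with a similarity-weighted average (the abstract does not name the similarity measure), and it omits the tag weights and period information because their formulas are not given. All function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors: Dict[str, Dict[str, float]] = {}
    for user, row in ratings.items():
        for item, rating in row.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings: Ratings, user: str, target: str) -> Optional[float]:
    """Predict the user's rating for `target` from the items they already rated.

    Returns None when the target item is unknown or no rated item is
    positively similar to it.
    """
    vectors = item_vectors(ratings)
    if target not in vectors:
        return None
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in ratings.get(user, {}).items():
        if item == target:
            continue
        similarity = cosine(vectors[target], vectors[item])
        if similarity > 0.0:
            weighted_sum += similarity * rating
            weight_total += similarity
    return weighted_sum / weight_total if weight_total else None
```

In a distributed setting the pairwise similarity computation is the part that would be parallelized; the prediction itself is a small weighted average per user and item.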

A Scalable and Effective DDS Participant Discovery Mechanism (확장성과 효율성 고려한 DDS 참여자 디스커버리 기법)

  • Kwon, Ki-Jung;You, Yong-Duck;Choi, Hoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.7 / pp.1344-1356 / 2009
  • DDS (Data Distribution Service) is a data-centric communication technology that provides an efficient communication service. It supports dynamic plug & play by using the DDS discovery technique to automatically set the location information of the participants for each data item (Topic). This paper compares and analyzes the existing DDS discovery techniques in terms of performance and problem areas, and proposes a hierarchically structured DDS discovery technique (SPDP-TBF) suitable for large-scale distributed systems. The proposed SPDP-TBF uses separate hierarchical managers that take charge of the registration and search of participants, so that periodic discovery is performed only among the participants involved: a participant sends its information only to related participants, which makes message transfer more efficient. Moreover, because discovery is performed hierarchically through the manager nodes, SPDP-TBF provides improved scalability and can be applied to large-scale distributed systems.

Development of a Distributed Computing Framework for Implementing Multidisciplinary Design Optimization (다분야통합최적설계를 지원하는 분산환경 기반의 설계 프레임워크 개발)

  • Chu M. S.;Lee S. J.;Choi D.-H.
    • Korean Journal of Computational Design and Engineering / v.10 no.2 / pp.143-150 / 2005
  • A design framework for applying multidisciplinary design optimization (MDO) technologies on a computer system has been developed and named the Extensible Multidisciplinary Design Integration and Optimization System (EMDIOS). The framework can not only solve complex system design problems effectively but also handle MDO problems conveniently. Because EMDIOS exploits both state-of-the-art computing capabilities and sophisticated optimization techniques, it can overcome many scalability and complexity problems. Even users who are not familiar with optimization technology can easily use EMDIOS to solve their design problems. The EMDIOS client provides a front end through which engineers communicate with the EMDIOS engine, and the server controls and manages various resources such as the scheduler, analysis codes, and user interfaces. The client supports data monitoring, design problem definition, requests for analyses, and other user tasks. The three main components of EMDIOS are the Engineering Design Object Model, which is the basic concept on which EMDIOS is built; the EMDIOS Language (EMDIO-L), a script language for representing design problems; and visual modeling tools, which help engineers define design problems through a graphical user interface. Several example problems are solved, and EMDIOS demonstrates capabilities such as ease of use, process integration, and optimization monitoring.

Framework for End-to-End Optimal Traffic Control Law Based on Overlay Mesh

  • Liu, Chunyu;Xu, Ke
    • Journal of Communications and Networks / v.9 no.4 / pp.428-437 / 2007
  • As networks develop, users require more and more functions and services, and traditional networks fail to support all of them. Although overlays are a good solution to some of these demands, using them in an efficient, scalable way is still a problem. This paper puts forward a framework for constructing an efficient, scalable overlay mesh in a real network. The main difference between our overlay mesh and other overlays is that ours possesses features including class-of-service (CoS) and traffic engineering (TE). It embeds the end-to-end optimal traffic control law, which can distribute traffic in an optimal way. An example is then given to illustrate the framework. Besides good scalability and failure recovery, the overlay mesh possesses other characteristics such as routing simplicity and self-organization. In such an overlay mesh, an applicable source routing scheme called hierarchical source routing is used to transmit data packets over UDP. Finally, a guideline derived from a number of simulations is proposed for setting the various parameters of the overlay mesh, which makes the overlay more efficient.

Performance Enhancement of High-Speed TCP Protocols using Pacing (Pacing 적용을 통한 High-Speed TCP 프로토콜의 성능 개선 방안)

  • Choi Young Soo;Lee Gang Won;Cho You Ze;Han Tae Man
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.12B / pp.1052-1062 / 2004
  • Recent studies have pointed out that existing high-speed TCP protocols have severe unfairness and TCP-friendliness problems. Because the congestion window achieved by a high-speed TCP connection can be quite large, there is a strong possibility that the sender will transmit a large burst of packets. As such, the current congestion control mechanisms of high-speed TCP can lead to bursty traffic flows in high-speed networks, with a negative impact on both TCP friendliness and RTT unfairness. The proposed solution to these problems is to space the data sent into the network evenly over an entire round-trip time. Accordingly, this paper evaluates the approach on a network with a high bandwidth-delay product and shows that pacing offers better TCP friendliness and fairness without degrading bandwidth scalability.
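
The core idea of pacing, spacing one congestion window of packets evenly over a round-trip time, reduces to a simple calculation: the inter-packet gap is the RTT divided by the window size. The following is a minimal sketch of that arithmetic only; the function name and parameters are hypothetical, and it does not model the specific high-speed TCP variants evaluated in the paper.

```python
from typing import List


def pacing_schedule(cwnd_packets: int, rtt_seconds: float) -> List[float]:
    """Return send offsets (seconds from the start of one RTT) for one window.

    Without pacing, all cwnd_packets could leave back to back as a burst.
    With pacing, packet i is sent at i * (rtt / cwnd).
    """
    if cwnd_packets <= 0:
        raise ValueError("cwnd_packets must be positive")
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    interval = rtt_seconds / cwnd_packets  # gap between consecutive packets
    return [i * interval for i in range(cwnd_packets)]
```

For example, a window of 4 packets and an RTT of 100 ms gives a 25 ms gap between sends, so the last packet of the window leaves 75 ms into the round trip.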