• Title/Summary/Keyword: 슬라이딩 윈도우 기법 (sliding window technique)

Sharing Multiple Continuous MJoins for Window Queries over Data Streams (데이터 스트림 윈도우 질의를 위한 다중의 연속 MJoin 연산자 공유 처리)

  • Lee, Hun-Joo; Park, Seog
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.43-48 / 2007
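  • In a data stream management system, the join operator is one of the most computationally expensive operators a query can contain, and it is indispensable in environments such as sensor networks, where limited pieces of information arrive separately. Because a data stream is potentially unbounded, the join operator must be subject to a sliding-window constraint, and to obtain comprehensive results it must be able to take multiple inputs. The MJoin operator with sliding windows makes this possible. This paper assumes an environment in which several such MJoin operators are registered in the system and addresses the problem of building and processing a globally shared query execution plan that reflects the sliding-window constraint and the characteristics of MJoin. We prove that building a globally shared query execution plan for multiple MJoins is NP-hard and propose an approximation approach. We also propose a processing technique that executes the globally shared query execution plan correctly. This work can serve as a foundation for efficient multi-query optimization and processing techniques in data stream environments.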
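
To make the sliding-window constraint concrete, here is a minimal Python sketch of a single MJoin-style operator: a symmetric multi-way equi-join in which each arriving tuple first expires old tuples, is then stored in its own stream's window, and finally probes every other window. The class `SlidingWindowMJoin`, the single join key, the shared time-based window length, and in-order timestamps are all assumptions for illustration; the sketch does not reproduce the paper's globally shared plan or its approximation algorithm.

```python
from collections import deque
from itertools import product


class SlidingWindowMJoin:
    """Hypothetical multi-way sliding-window equi-join (one operator, not a shared plan)."""

    def __init__(self, n_streams, window):
        self.window = window  # time-based window length, same for every stream (assumption)
        self.buffers = [deque() for _ in range(n_streams)]  # per stream: (ts, key, value)

    def _expire(self, now):
        # Keep only tuples inside (now - window, now]; assumes timestamps arrive in order.
        for buf in self.buffers:
            while buf and buf[0][0] <= now - self.window:
                buf.popleft()

    def insert(self, stream, ts, key, value):
        """Process one arriving tuple and return the new join results it produces."""
        self._expire(ts)
        self.buffers[stream].append((ts, key, value))
        per_stream = []
        for j, buf in enumerate(self.buffers):
            if j == stream:
                per_stream.append([value])  # the new tuple itself
                continue
            matches = [v for (_, k, v) in buf if k == key]  # probe window j
            if not matches:
                return []  # one empty match set means no multi-way result
            per_stream.append(matches)
        # Results are ordered by stream index, one value per stream.
        return list(product(*per_stream))
```

For example, with three streams and a window of 10, a tuple arriving at time 12 can no longer join with a tuple from time 0, because the latter has expired.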

Incremental Frequent Pattern Detection Scheme Based on Sliding Windows in Graph Streams (그래프 스트림에서 슬라이딩 윈도우 기반의 점진적 빈발 패턴 검출 기법)

  • Jeong, Jaeyun; Seo, Indeok; Song, Heesub; Park, Jaeyeol; Kim, Minyeong; Choi, Dojin; Bok, Kyoungsoo; Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.18 no.2 / pp.147-157 / 2018
  • Recently, with advances in network technologies and the spread of IoT and social network services, a large amount of graph stream data has been generated. Because the relationships between objects in graph streams change dynamically, studies have been conducted to detect or analyze changes in the graph. In this paper, we propose a scheme that incrementally detects frequent patterns by using information about the frequent patterns detected in previous sliding windows. For each frequent pattern detected in previous sliding windows, the proposed scheme calculates a value representing in how many future sliding windows the pattern will remain frequent. Using these values, the scheme performs only the necessary calculations in the next sliding window, which reduces the overall amount of computation. In addition, only patterns that are connected to each other are recognized as a single pattern, so that only more significant patterns are detected. We conduct various performance evaluations to show the superiority of the proposed scheme. The proposed scheme is faster than an existing similar scheme when the amount of duplicated data is large.
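
One way to compute a value of the kind described above is sketched below, assuming a batch-based window in which each slide expires exactly the oldest batch, with patterns identified by hypothetical string labels. Because new occurrences can only raise a pattern's count, the count that survives from the current window is a lower bound, so the pattern is guaranteed to stay frequent for that many slides without recounting. This is an illustrative formulation, not the paper's formula, and it ignores the connected-pattern step.

```python
def guaranteed_future_windows(batch_counts, min_support):
    """batch_counts: occurrences of one pattern in each batch of the current window, oldest first.

    Returns how many upcoming slides the pattern stays frequent even if it never occurs
    again, assuming each slide expires exactly the oldest batch (hypothetical window model).
    """
    remaining = sum(batch_counts)
    windows = 0
    for count in batch_counts:  # the oldest batch expires first
        remaining -= count
        if remaining < min_support:
            break
        windows += 1
    return windows


def slide(skip_counters):
    """After one slide, decrement the counters; patterns already at 0 must be re-checked."""
    recheck = [p for p, k in skip_counters.items() if k == 0]
    updated = {p: k - 1 for p, k in skip_counters.items() if k > 0}
    return updated, recheck
```

With per-batch counts `[3, 1, 2, 4]` and a minimum support of 5, the pattern keeps counts 7 and 6 after the first two slides but drops to 4 after the third, so it can be skipped for two windows.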

Processing Sliding Window Multi-Joins using a Graph-Based Method over Data Streams (데이터 스트림에서 그래프 기반 기법을 이용한 슬라이딩 윈도우 다중 조인 처리)

  • Zhang, Liang; Ge, Jun-Wei; Kim, Gyoung-Bae; Lee, Soon-Jo; Bae, Hae-Young; You, Byeong-Seob
    • Journal of Korea Spatial Information System Society / v.9 no.2 / pp.25-34 / 2007
  • Existing approaches that select a join order for three or more data streams have always relied on simple heuristics. Because these heuristics consider only a single factor, either join selectivity or arrival rate, they lead to poor performance and inefficiency in some applications. This paper proposes a graph-based sliding window multi-join algorithm with an optimal join sequence. In this method, a sliding window join graph is first constructed, in which a vertex represents a join operator and an edge indicates the join relationship among sliding windows; the vertex weight and the edge weight represent the cost of a join and the reciprocity of join operators, respectively. The optimal join order is then found in the graph by using an improved MVP algorithm. The final result is produced by executing the join plan with a nested loop join procedure. The advantages of our algorithm are demonstrated by a performance comparison with existing join algorithms.
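
The following sketch illustrates why a single factor can mislead: it scores probe orders for a newly arriving tuple under a simplified nested-loop cost model that uses both arrival rate and selectivity, then picks the cheapest order by exhaustive search. The cost model, the per-stream selectivities in `sel`, and the function names are assumptions; exhaustive search stands in for, and is not, the paper's join graph and improved MVP algorithm.

```python
from itertools import permutations


def probe_cost(order, rate, window, sel):
    """Comparisons per arriving tuple for a nested-loop probe sequence (simplified model).

    rate[s] * window[s] approximates the number of tuples held in stream s's window;
    sel[s] is an assumed selectivity of joining the partial result with stream s.
    """
    cost, partial = 0.0, 1.0  # partial = expected intermediate results before each probe
    for s in order:
        size = rate[s] * window[s]
        cost += partial * size  # nested loop: every partial result scans window s
        partial *= sel[s] * size
    return cost


def best_order(streams, rate, window, sel):
    # Exhaustive search over probe orders; feasible only for a handful of streams.
    return min(permutations(streams), key=lambda o: probe_cost(o, rate, window, sel))
```

With `rate = {"B": 10, "C": 1}`, unit windows, and `sel = {"B": 0.01, "C": 0.5}`, ordering by selectivity alone would probe `B` first at a cost of 10.1 comparisons, whereas the search finds that probing `C` first costs 6.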

A Study on Sliding Window based Machine Learning for Web Shell Detection (슬라이딩윈도우 기반 머신러닝을 활용한 웹쉘탐지 방안 연구)

  • Kim, Kihwan; Lee, DongGeun; Yi, Hyoung; Shin, Yongtae
    • Proceedings of the Korean Society of Computer Information Conference / 2019.07a / pp.121-122 / 2019
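  • This paper proposes using sliding-window-based machine learning as one method for detecting web shells. Web shells are widely used in web attacks, and the proposed sliding-window-based detection technique provides more accurate detection against web shell evasion techniques that evolve over time; on this basis, it can also detect various web shell variants. Because the proposed approach measures and reports the risk of each part of the code, it is expected to enable a more effective response.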
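
A sliding-window scan over source code can be sketched as follows: the file is cut into overlapping windows of lines, each window receives a risk score, and the per-window scores show which parts of the code are dangerous. The `SUSPICIOUS` token weights are a hypothetical stand-in for the trained machine-learning model the paper uses, and the window size and stride are also assumed values.

```python
# Hypothetical token weights standing in for a trained model's per-window score.
SUSPICIOUS = {"eval(": 0.4, "base64_decode(": 0.3, "shell_exec(": 0.4, "$_POST": 0.2}


def sliding_windows(lines, size, stride):
    """Yield (start_line, chunk); with stride > 1 a short tail may be skipped (kept simple)."""
    for start in range(0, max(len(lines) - size, 0) + 1, stride):
        yield start, lines[start:start + size]


def window_risk(chunk):
    # Sum the weights of suspicious tokens present in the window, capped at 1.0.
    text = "\n".join(chunk)
    return min(sum(w for tok, w in SUSPICIOUS.items() if tok in text), 1.0)


def scan(source, size=3, stride=1):
    """Return (start_line, risk) for every window so risky regions of a file can be reported."""
    lines = source.splitlines()
    return [(start, window_risk(chunk)) for start, chunk in sliding_windows(lines, size, stride)]
```

Scoring windows instead of the whole file means an obfuscated payload hidden among benign lines still produces one high-scoring window, and the window's start line points to where it is.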

Performance Analysis of Sliding Window based Stream High Utility Pattern Mining Methods (슬라이딩 윈도우 기반의 스트림 하이 유틸리티 패턴 마이닝 기법 성능분석)

  • Ryang, Heungmo; Yun, Unil
    • Journal of Internet Computing and Services / v.17 no.6 / pp.53-59 / 2016
  • Recently, huge amounts of stream data have been generated in real time by applications such as wireless sensor networks, Internet of Things services, and social network services. For this reason, developing efficient methods has become a significant issue: such data must be processed and analyzed to discover useful information, which can then support better decision making. Because stream data are generated continuously and rapidly, they need to be handled with minimal access. In addition, analyzing stream data in resource-limited environments requires methods that process data quickly with low power consumption. To address this issue, the sliding window model has been proposed and studied. Meanwhile, pattern mining, one of the data mining techniques for finding meaningful information in large data, extracts such information in the form of patterns. Traditional frequency-based pattern mining can process only binary databases and treats all items in them as equally important. As a result, although frequent pattern mining has played an essential role in data mining, it cannot reflect the characteristics of real databases. High utility pattern mining has therefore been proposed to discover more meaningful information from non-binary databases by considering the characteristics and relative importance of items. However, general high utility pattern mining methods designed for static databases are not suitable for stream data. To address this, sliding window based high utility pattern mining has been proposed to find significant information from stream data in resource-limited environments by taking the data's characteristics into account and processing it efficiently.
In this paper, we conduct various experiments on several datasets to evaluate the performance of sliding window based high utility pattern mining algorithms, and we analyze the results to study their characteristics and directions for improvement.
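
The quantity these methods mine can be illustrated with a deliberately naive sketch: the window keeps the last `window_size` transactions (item to purchased quantity), each item has an external utility (unit profit), and an itemset's utility is the sum, over the window's transactions containing it, of quantity times profit for its items. The class name and data layout are assumptions, and the exhaustive enumeration is exponential; practical sliding window algorithms avoid it with specialized data structures and utility-bound pruning.

```python
from collections import deque
from itertools import combinations


class SlidingWindowHUI:
    """Naive high utility itemset mining over the last `window_size` transactions."""

    def __init__(self, profits, window_size):
        self.profits = profits  # item -> external utility (unit profit)
        self.window = deque(maxlen=window_size)  # the oldest transaction is evicted automatically

    def add(self, transaction):
        # transaction: dict item -> purchased quantity (internal utility)
        self.window.append(transaction)

    def utility(self, itemset):
        total = 0
        for t in self.window:
            if all(i in t for i in itemset):
                total += sum(t[i] * self.profits[i] for i in itemset)
        return total

    def high_utility_itemsets(self, min_util):
        items = sorted({i for t in self.window for i in t})
        result = {}
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):  # exhaustive: exponential in len(items)
                u = self.utility(itemset)
                if u >= min_util:
                    result[itemset] = u
        return result
```

With profits `a=5, b=2, c=1` and a window of two transactions, `('a', 'c')` can have a higher utility than `('a',)` alone, which shows that utility is not anti-monotone; this is why the support-based pruning of frequent pattern mining cannot be reused directly.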

Comparison between k-means and k-medoids Algorithms for a Group-Feature based Sliding Window Clustering (그룹특징기반 슬라이딩 윈도우 클러스터링에서의 k-means와 k-medoids 비교 평가)

  • Yang, Ju-Yon; Shim, Junho
    • The Journal of Society for e-Business Studies / v.23 no.3 / pp.225-237 / 2018
  • The demand for processing large data streams is growing rapidly as generating and processing large volumes of data becomes more common, and a variety of large-scale data processing technologies are being developed to meet this demand. One technology that has drawn particular attention from researchers is data stream clustering with sliding windows, which may create a new set of clusters whenever the window moves. Previous sliding-window data stream clustering techniques exploit coresets, also known as group features, which summarize the data. In this paper, we identify elements of a group-feature-based algorithm that can be modified and propose an algorithm that modifies the clustering algorithm used by the original. We compare the performance of the two algorithms under different parameter values. Finally, we provide guidelines for choosing between the algorithms according to the parameter values and their impact on performance.
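
Group features can be sketched in one dimension as triples (N, LS, SS): the count, linear sum, and squared sum of the summarized points. They are additive, so batches can be merged, and dropping the oldest batch's feature slides the window. The sketch then clusters the summaries both ways: weighted k-means moves each center to the weighted mean of its summaries, while k-medoids must choose centers among the summaries themselves. The 1-D setting, absolute-distance costs, fixed initial centers, and brute-force medoid search are assumptions for illustration, not the paper's algorithm.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class GroupFeature:
    n: int     # number of summarized points
    ls: float  # linear sum of the points
    ss: float  # squared sum of the points

    @classmethod
    def from_points(cls, pts):
        return cls(len(pts), sum(pts), sum(p * p for p in pts))

    def centroid(self):
        return self.ls / self.n

    def merge(self, other):
        # Group features are additive, so two summaries combine component-wise.
        return GroupFeature(self.n + other.n, self.ls + other.ls, self.ss + other.ss)


def weighted_kmeans(gfs, centers, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for g in gfs:
            c0 = g.centroid()
            j = min(range(len(centers)), key=lambda c: abs(c0 - centers[c]))
            groups[j].append(g)
        # New center = mean of all underlying points, recovered from LS and N.
        centers = [sum(g.ls for g in grp) / sum(g.n for g in grp) if grp else c
                   for grp, c in zip(groups, centers)]
    return centers


def weighted_kmedoids(gfs, k):
    candidates = [g.centroid() for g in gfs]  # medoids must be actual summaries

    def cost(medoids):
        return sum(g.n * min(abs(g.centroid() - m) for m in medoids) for g in gfs)

    return list(min(combinations(candidates, k), key=cost))
```

On summaries of `[1, 2, 3]`, `[3]`, and `[10, 12]`, k-means places a center at 2.25, a point that no summary represents, whereas k-medoids picks the existing centroid 2.0.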

MMJoin: An Optimization Technique for Multiple Continuous MJoins over Data Streams (데이타 스트림 상에서 다중 연속 복수 조인 질의 처리 최적화 기법)

  • Byun, Chang-Woo; Lee, Hun-Zu; Park, Seog
    • Journal of KIISE:Databases / v.35 no.1 / pp.1-16 / 2008
  • Join queries, though costly, are necessary in data stream management systems for sensor networks, where many short pieces of information are generated. Because a data stream is potentially infinite, it is reasonable for each join operator to have a sliding-window constraint to avoid disk I/O. In addition, the join operator should be able to take multiple inputs to produce overall results, which the MJoin operator with sliding windows makes possible. In this paper, we consider a data stream environment in which multiple MJoin operators are registered and propose MMJoin, which addresses building and processing a globally shared query plan while considering the characteristics of the MJoin operator with sliding windows. First, we propose a solution for building the globally shared query execution plan. Second, we solve the problems of updating window sizes and routing join results. Our study can serve as fundamental research for optimization techniques for multiple continuous joins in data stream environments.
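
A common way to share one join among queries that differ only in window size, sketched here as an assumption rather than MMJoin's actual plan, is to run the shared operator with the largest window and then route each result to the queries whose window it satisfies. A multi-way result satisfies window `w` when its tuples' timestamps all lie within `w` of one another, that is, when the timestamp span is less than `w`. The functions and data layout are hypothetical.

```python
def shared_window(query_windows):
    # The shared operator keeps the largest window any registered query needs.
    return max(query_windows.values())


def route(results, query_windows):
    """Send each shared join result only to the queries whose window it satisfies.

    results: list of (timestamps_of_joined_tuples, payload);
    query_windows: dict query_name -> window length.
    """
    routed = {q: [] for q in query_windows}
    for timestamps, payload in results:
        span = max(timestamps) - min(timestamps)  # all tuples within `span` of each other
        for q, w in query_windows.items():
            if span < w:
                routed[q].append(payload)
    return routed
```

For example, with windows of 6 and 10, a result whose tuples span 5 time units reaches both queries, while one spanning 9 units reaches only the query with the larger window.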

A Sliding Window-based Multivariate Stream Data Classification (슬라이딩 윈도우 기반 다변량 스트림 데이타 분류 기법)

  • Seo, Sung-Bo; Kang, Jae-Woo; Nam, Kwang-Woo; Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.33 no.2 / pp.163-174 / 2006
  • In a distributed wireless sensor network, it is difficult to transmit and analyze all of the stream data because of limited network bandwidth, power, and processing capacity. Therefore, classifying the continuous stream data first and then processing the classified data is a suitable alternative. We propose a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a standard text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms: for supervised classification we tested a Bayesian classifier and SVM, and for unsupervised classification we tested Jaccard, TFIDF, Jaro, and Jaro-Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, we observed that classification accuracy improves when the correlation between attributes is considered along with the n-gram tokens of symbols.
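
The two steps above can be sketched as follows: each attribute's values in a window become a string of U/D/S symbols (up, down, same), the strings are split into attribute-tagged n-gram tokens, and the window is labeled by its most Jaccard-similar labeled example. The symbol alphabet, the bigram size, the attribute tags, and the nearest-neighbor use of Jaccard similarity are assumptions; the paper's exact discretization and its attribute-correlation features are not reproduced.

```python
def symbolize(values, eps=0.0):
    """Discretize one attribute's series into U (up), D (down), or S (same) per step."""
    out = []
    for prev, cur in zip(values, values[1:]):
        d = cur - prev
        out.append("U" if d > eps else "D" if d < -eps else "S")
    return "".join(out)


def ngrams(s, n):
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def window_tokens(window, n=2):
    """window: one list of values per attribute; tokens are tagged with the attribute index."""
    tokens = set()
    for a, series in enumerate(window):
        tokens |= {f"{a}:{g}" for g in ngrams(symbolize(series), n)}
    return tokens


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def classify(window, labeled):
    """labeled: list of (window, label); returns the label of the most similar example."""
    toks = window_tokens(window)
    return max(labeled, key=lambda item: jaccard(toks, window_tokens(item[0])))[1]
```

Because only the shape of the signal survives discretization, a window rising from 0 to 5 matches a labeled window rising from 1 to 4 exactly, even though the raw values differ.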

A Text Categorization Method Improved by Removing Noisy Training Documents (오류 학습 문서 제거를 통한 문서 범주화 기법의 성능 향상)

  • Han, Hyoung-Dong; Ko, Young-Joong; Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.32 no.9 / pp.912-919 / 2005
  • When binary classification is applied to multi-class text categorization, the One-Against-All method is generally used. However, this method has a problem: the documents in a negative set are not labeled by humans, so the negative set can include many noisy documents in the training data. In this paper, we propose applying the Sliding Window technique and the EM algorithm to binary text classification to solve this problem. We improve binary text classification by extracting noisy documents from the training data with the Sliding Window technique and re-assigning their categories with the EM algorithm.
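
The One-Against-All construction that causes the problem can be sketched briefly: for each category, its own documents become positives and every other document becomes a negative, so a misfiled document silently enters the negative set of its true category. The function name and data layout are assumptions, and the sketch does not implement the paper's sliding-window noise extraction or EM re-assignment.

```python
def one_against_all(labeled_docs):
    """labeled_docs: list of (doc, category); returns category -> (positives, negatives)."""
    categories = sorted({label for _, label in labeled_docs})
    tasks = {}
    for c in categories:
        pos = [doc for doc, label in labeled_docs if label == c]
        # Negatives are never checked by a human for category c, only inferred.
        neg = [doc for doc, label in labeled_docs if label != c]
        tasks[c] = (pos, neg)
    return tasks
```

If `d3` were truly a politics document but filed under sports, it would appear among the negatives of the politics classifier, which is the kind of noise the paper targets.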

Effective Subsequence Matching Supporting Time Warping in Sequence Databases (시퀸스 데이터베이스를 위한 타임 워핑을 지원하는 효과적인 서브시퀸스 매칭)

  • 박상현; 김상옥; 조준서
    • Proceedings of the Korean Information Science Society Conference / 2001.10a / pp.181-183 / 2001
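  • This paper discusses index-based subsequence matching that supports time warping in large sequence databases. Time warping makes it possible to find sequences with similar patterns even when the sequences have different lengths. A recent study proposed an effective whole-matching technique that supports time warping. In this work, we propose a new technique that combines that earlier work with the concept of sliding windows. For indexing, we extract a feature vector from the subsequence corresponding to each sliding window and build a multidimensional index that uses these feature vectors as indexing attributes. For query processing, we perform an index search using the feature vectors of the query prefixes that satisfy the conditions. The proposed technique supports effective subsequence matching even in large databases. We prove that the proposed technique does not cause false dismissals and demonstrate its superiority through experiments.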
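
The filter-and-refine idea can be sketched with a fixed window length: each sliding window is summarized by the feature vector (first, last, maximum, minimum), and the largest component-wise difference between two such vectors lower-bounds their time warping (DTW) distance. Filtering on that bound therefore cannot cause false dismissals before exact DTW verification. This feature vector is an assumption (a well-known time warping lower bound, not necessarily the paper's), a linear scan stands in for the multidimensional index, and query prefixes and variable-length matches are omitted.

```python
def dtw(a, b):
    """Time warping distance with absolute-difference cost (O(len(a) * len(b)))."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def features(seq):
    # Invariant under time warping: stretching a sequence keeps these four values.
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(fa, fb):
    # Each component difference is at most the DTW distance, so the maximum is too.
    return max(abs(x - y) for x, y in zip(fa, fb))


def search(data, query, w, eps):
    """Return start offsets of length-w windows within DTW distance eps of the query."""
    qf = features(query)
    # A real system would store these feature vectors in a multidimensional index.
    index = [(s, features(data[s:s + w])) for s in range(len(data) - w + 1)]
    candidates = [s for s, f in index if lower_bound(qf, f) <= eps]  # filter step
    return [s for s in candidates if dtw(query, data[s:s + w]) <= eps]  # refine step
```

For instance, the query `[1, 2, 2, 3]` matches the window `[1, 2, 3, 3]` at distance 0 because time warping lets the repeated 2 and the repeated 3 align, while windows whose endpoints differ by more than `eps` are discarded without computing DTW.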