• Title/Summary/Keyword: Streaming Data Mining

Streaming Decision Tree for Continuity Data with Changed Pattern (패턴의 변화를 가지는 연속성 데이터를 위한 스트리밍 의사결정나무)

  • Yoon, Tae-Bok;Sim, Hak-Joon;Lee, Jee-Hyong;Choi, Young-Mee
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.1 / pp.94-100 / 2010
  • Data mining is mainly used to extract patterns and discover information from collected data. However, previous methods have difficulty reflecting patterns that change over time. In this paper, we introduce the Streaming Decision Tree (SDT), which analyzes continuous, large-scale data with changing patterns. SDT defines continuous data as blocks and extracts rules using a decision tree learning method. The extracted rules are combined by considering their time of occurrence, frequency, and contradictions. In experiments, we applied SDT to time series data and confirmed reasonable results.

Mining Frequent Itemsets with Normalized Weight in Continuous Data Streams

  • Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
    • Journal of Information Processing Systems / v.6 no.1 / pp.79-90 / 2010
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. The continuous characteristic of streaming data necessitates the use of algorithms that require only one scan over the stream for knowledge discovery. Data mining over data streams should support a flexible trade-off between processing time and mining accuracy. In many application areas, it has been suggested that important frequent itemsets be found by considering the weight of itemsets. In this paper, we present an efficient algorithm, WSFI-Mine (Weighted Support Frequent Itemsets Mine), with normalized weight over data streams. Moreover, we propose a novel tree structure, called the Weighted Support FP-Tree (WSFP-Tree), that stores compressed crucial information about frequent itemsets. Empirical results show that our algorithm outperforms comparative algorithms under the windowed streaming model.
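
The idea of weighted support over a windowed stream can be illustrated with a minimal Python sketch. It is not the WSFI-Mine algorithm or the WSFP-Tree from the paper: it assumes a fixed-size sliding window, weights normalized by dividing by the largest raw weight, and a weighted support defined as the average item weight times the itemset's relative frequency in the window. The function names, the toy weights, and the `max_size` limit are all hypothetical.

```python
from collections import deque
from itertools import combinations


def normalize_weights(raw):
    """Scale raw item weights into (0, 1] by dividing by the largest weight."""
    top = max(raw.values())
    return {item: w / top for item, w in raw.items()}


def weighted_frequent_itemsets(window, weights, min_wsup, max_size=2):
    """Return {itemset: weighted support} for itemsets that reach min_wsup.

    Assumed definition (not taken from the paper):
    weighted support = average item weight * (count in window / window size).
    """
    counts = {}
    for txn in window:
        items = sorted(txn)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1

    result = {}
    for itemset, count in counts.items():
        avg_weight = sum(weights[i] for i in itemset) / len(itemset)
        wsup = avg_weight * count / len(window)
        if wsup >= min_wsup:
            result[itemset] = wsup
    return result


# Toy stream: the deque keeps only the four most recent transactions.
window = deque(maxlen=4)
weights = normalize_weights({"a": 10, "b": 5, "c": 2})
for txn in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b"}]:
    window.append(txn)

result = weighted_frequent_itemsets(window, weights, min_wsup=0.25)
for itemset, wsup in sorted(result.items()):
    print(itemset, round(wsup, 3))
```

This brute-force version re-enumerates every itemset in the window on each call; a compressed tree structure such as the one the abstract describes would presumably avoid that cost.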

Mining of Frequent Structures over Streaming XML Data (스트리밍 XML 데이터의 빈발 구조 마이닝)

  • Hwang, Jeong-Hee
    • The KIPS Transactions:PartD / v.15D no.1 / pp.23-30 / 2008
  • Internet technologies and XML form the basis of research on context awareness in ubiquitous environments. XML data in the form of continuous streams are popular in network applications over the Internet, and there is also research on query processing for streaming XML data. As basic research toward efficient querying, we propose both a labeled ordered tree model for representing XML and a mining method for extracting frequent structures from streaming XML data. That is, continuously arriving XML data are modeled as a stream tree, called the XFP_tree, and to mine recent data we extract the exact frequent structures from the XFP_tree of the current window. The proposed method can serve as a basis for query processing and indexing methods for XML stream data.

Research of Knowledge Management and Reusability in Streaming Big Data with Privacy Policy through Actionable Analytics (스트리밍 빅데이터의 프라이버시 보호 동반 실용적 분석을 통한 지식 활용과 재사용 연구)

  • Paik, Juryon;Lee, Youngsook
    • Journal of Korea Society of Digital Industry and Information Management / v.12 no.3 / pp.1-9 / 2016
  • The current meaning of "Big Data" covers all the techniques for value extraction and actionable analytics, as well as the management tools. In particular, advances in wireless sensor networks yield diverse patterns of digital records. These records are mostly semi-structured and unstructured data, which are usually beyond the capabilities of the management tools. Such data are growing rapidly because of their complex data structures. The complex type effectively supports data exchangeability and heterogeneity, which is the main reason their volumes keep growing in sensor networks. However, applications suffer from many errors and problems because management solutions for the complex data model are rarely available in current big data environments. To solve these problems, we aim to provide a solution for actionable analytics and semantic reusability in sensor-web-based streaming big data with a new data structure, and thereby to strengthen competitiveness.

The Impact of Comments on Music Download and Streaming: A Text Mining Analysis (댓글이 음원 판매량에 미치는 차별적 영향에 관한 텍스트마이닝 분석)

  • Park, Myeong-Seok;Kwon, Young-Jin;Lee, Sang-Yong Tom
    • Knowledge Management Research / v.19 no.2 / pp.91-108 / 2018
  • This study measures the impact of comments about a particular song on its number of streams and downloads. We modeled multiple regression equations to perform this analysis. We chose the digital music market as the object of analysis because of its inherent characteristics, such as being an experience good and having a strong bandwagon effect. We used a text mining technique based on the Naïve Bayes classifier to determine whether a comment on a piece of music should be regarded as positive or negative. In addition, we used 'size of agency' and 'existence of hit song' as moderating variables, because they are assumed to affect users' decisions when selecting a particular song to download or stream on music sites. We found empirical evidence that positive comments on a particular song increase the number of both downloads and streams. However, positive comments may decrease the number of downloads when the artist's agency is large. We conclude that a positive comment on a particular song functions as a 'word-of-mouth' effect, inducing a behavioral response from other users. We also found that other features of an artist, such as the size of the agency the artist belongs to, function as an external factor alongside the features of the song itself.
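
As an illustration of the classification step, here is a minimal Python sketch of a multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the whitespace tokenizer, the function names, and the four toy training comments are assumptions made purely for demonstration.

```python
import math
from collections import Counter


def train(samples):
    """Build per-class document and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy comments, not data from the study.
model = train([
    ("love this song", "pos"),
    ("great voice love it", "pos"),
    ("boring song", "neg"),
    ("hate this boring track", "neg"),
])
print(classify("love this voice", model))
print(classify("boring track", model))
```

In a study like this one, the predicted labels would then be aggregated per song (for example, as a count or share of positive comments) before entering the regression.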

Novel Push-Front Fibonacci Windows Model for Finding Emerging Patterns with Better Completeness and Accuracy

  • Akhriza, Tubagus Mohammad;Ma, Yinghua;Li, Jianhua
    • ETRI Journal / v.40 no.1 / pp.111-121 / 2018
  • To find the emerging patterns (EPs) in streaming transaction data, the stream is first divided into time windows, each containing a number of transactions. Itemsets are generated from the transactions in each window, and then the emergence of itemsets is evaluated between two windows. In the tilted-time windows model (TTWM), it is assumed that people need support data with finer accuracy from the most recent windows, while accepting coarser accuracy from older windows. Therefore, an array with a limited number of elements is used to maintain all support data in a way that condenses old windows by merging them inside one element. The capacity of each element, that is, the number of windows it accommodates, is modeled using a particular number sequence. However, as new data arrive in a stream, the current array-updating mechanisms lead to many null elements in the array and cause data incompleteness and inaccuracy problems. Two models derived from TTWM, the logarithmic TTWM and the Fibonacci windows model, inherit the same problems. This article proposes a novel push-front Fibonacci windows model as a solution, and experiments demonstrate its superiority in finding more EPs compared to other models.
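
The condensing of old windows described above can be sketched in a few lines of Python. This is a simplified logarithmic-style scheme chosen for illustration, not the paper's push-front Fibonacci model or the exact TTWM update rule: it assumes that whenever three buckets cover the same number of windows, the two oldest are merged, so bucket spans grow roughly as powers of two with age. The name `push_window` and the (span, support) bucket layout are hypothetical.

```python
def push_window(buckets, support):
    """Insert the newest window's support and condense older buckets.

    buckets is a list of (span, support) pairs, newest first, where span is
    the number of original windows merged into that bucket.
    """
    buckets.insert(0, (1, support))
    i = 0
    while i + 2 < len(buckets):
        same_span = buckets[i][0] == buckets[i + 1][0] == buckets[i + 2][0]
        if same_span:
            # Merge the two oldest of the three equal-span buckets.
            span = buckets[i + 1][0] + buckets[i + 2][0]
            total = buckets[i + 1][1] + buckets[i + 2][1]
            buckets[i + 1:i + 3] = [(span, total)]
        else:
            i += 1
    return buckets


# Feed eight windows whose supports are 1, 2, ..., 8.
buckets = []
for support in range(1, 9):
    push_window(buckets, support)

print(buckets)
```

Recent windows keep their individual support values while older ones are only available as sums, which is the finer-recent, coarser-old trade-off the abstract attributes to TTWM. This sketch uses a growing list rather than a fixed array, so it does not reproduce the null-element problem the article addresses.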

Improving Video Quality by Diversification of Adaptive Streaming Strategies

  • Biernacki, Arkadiusz
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.1 / pp.374-395 / 2017
  • Users quite often experience volatile channel conditions, which negatively influence multimedia transmission. HTTP adaptive streaming has emerged as a promising new technology in which the video quality can be adjusted to variable network conditions. Nevertheless, the new technology is not without drawbacks. As has been observed, multiple video players sharing the same network link often have problems achieving good efficiency and stability of play-out because of mutual interference and competition among the players. Our investigation indicates that there may be another cause for the under-performance of the streamed video. In an emulated environment, we implemented three algorithms of adaptive video play-out based on bandwidth or buffer assessment. As we show, traffic generated by players employing the same or similar play-out strategies is positively correlated and synchronised (clustered), whereas traffic originating from different play-out strategies shows negative or no correlation. However, when some of the parameters of the play-out strategies are randomised, the correlation and synchronisation diminish, which has a positive impact on the smoothness of the traffic and on the video quality perceived by end users. Our research shows that non-correlated traffic flows generated by play-out strategies improve the efficiency and stability of streamed adaptive video.

The Study of Comparing Korean Consumers' Attitudes Toward Spotify and MelOn: Using Semantic Network Analysis

  • Namjae Cho;Bao Chen Liu;Giseob Yu
    • Journal of Information Technology Applications and Management / v.30 no.5 / pp.1-19 / 2023
  • This study examines Korean users' attitudes and emotions toward MelOn and Spotify, which lead the music streaming market. We used Text Mining, Semantic Network Analysis, TF-IDF, Centrality, CONCOR, and Word2Vec analysis. The results show that MelOn is used in users' daily lives. MelOn's advantage of providing varied content is judged to give it considerable competitiveness beyond the limits of a streaming app. However, MelOn users had negative emotions such as anger, repulsion, and pressure. In contrast, Spotify users were highly interested in the music content. In particular, interest in foreign music was high, and users were also interested in stock investment. In addition, positive emotions such as interest and pleasure were higher than among MelOn users, which can be interpreted as Spotify providing attractive services to Korean users. While previous studies have mainly focused on technical or personal factors, this study focuses on consumer reactions (online reviews) to corporate strategies, which differentiates it from prior work.

An Efficient Method for Mining Frequent Patterns based on Weighted Support over Data Streams (데이터 스트림에서 가중치 지지도 기반 빈발 패턴 추출 방법)

  • Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.8 / pp.1998-2004 / 2009
  • Recently, owing to technical developments in storage devices and networks, the amount of data has been increasing rapidly. The large volume of data streams poses unique space and time constraints on the data mining process. The continuous characteristic of streaming data necessitates the use of algorithms that require only one scan over the stream for knowledge discovery. Most support-based research is concerned with frequent itemsets and ignores infrequent itemsets even when they are crucial. In this paper, we propose an efficient method, WSFI-Mine (Weighted Support Frequent Itemsets Mine), to mine all frequent itemsets from the data stream in one scan. This method can discover the closed frequent itemsets using DCT (Data Stream Closed Pattern Tree). We compare the performance of our algorithm with DSM-FI and THUI-Mine under different minimum supports. The results show that WSFI-Mine not only runs significantly faster but also consumes less memory.

A Study on the DC Resistivity Method to Image the Underground Structure Beneath River or Lake Bottom (하저 지반특성 규명을 위한 수상 전기비저항 탐사에 관한 연구)

  • Kim Jung-Ho;Yi Myeong-Jong;Song Yoonho;Choi Seong-Jun;Lee Seoung Kon;Son Jeong-Sul;Chung Seung-Hwan
    • Geophysics and Geophysical Exploration / v.5 no.4 / pp.223-235 / 2002
  • Since weak zones or geological lineaments are likely to be eroded, weak zones may develop beneath rivers, and a careful evaluation of ground conditions is important when constructing structures that pass through a river. The DC resistivity method, however, has seldom been applied to the investigation of water-covered areas, possibly because of difficulties in data acquisition and interpretation. Acquiring high-quality data may be the most important factor, and it is more difficult than in a land survey because of the water layer overlying the underground structure to be imaged. Through numerical modeling and the analysis of a case history, we studied the resistivity survey method for water-covered areas, from the characteristics of the measured data, through the data acquisition method, to the interpretation method. We organized our discussion according to the installed locations of the electrodes, i.e., floating them on the water surface or installing them at the water bottom, because the methods of data acquisition and interpretation vary depending on the electrode location. Through this study, we confirmed that the DC resistivity method can provide fairly reasonable subsurface images. We also showed that installing electrodes at the water bottom can give a subsurface image with much higher resolution than floating them on the water surface. Since data acquired in water-covered areas have much lower sensitivity to the underground structure than data acquired on land, and can be contaminated by stronger noise such as streaming potential, it is very important to select an acquisition method and electrode array that provide data with a higher signal-to-noise ratio (S/N ratio) as well as high resolving power. Some of the modified electrode arrays provide data with a reasonably high S/N ratio and do not require remote electrodes to be installed, and thus they may be suitable for resistivity surveys in water-covered areas.