• Title/Summary/Keyword: Data stream mining


Finding the time sensitive frequent itemsets based on data mining technique in data streams (데이터 스트림에서 데이터 마이닝 기법 기반의 시간을 고려한 상대적인 빈발항목 탐색)

  • Park, Tae-Su;Chun, Seok-Ju;Lee, Ju-Hong;Kang, Yun-Hee;Choi, Bum-Ghi
    • Journal of The Korean Association of Information Education / v.9 no.3 / pp.453-462 / 2005
  • Recently, owing to technical improvements in storage devices and networks, the amount of data has been increasing rapidly. In addition, the knowledge embedded in a data stream must be found as quickly as possible. Huge volumes of data in a data stream are created continuously and change quickly. Various algorithms for finding frequent itemsets in a data stream have been actively proposed. However, current research does not offer an appropriate method for finding frequent itemsets that reflect the flow of time; it provides only frequent items based on total aggregate values. In this paper, we propose a novel algorithm for finding relative frequent itemsets according to time in a data stream. We also propose a method for storing frequent and sub-frequent items that takes limited memory into account, and a method for updating time-variant frequent items. The performance of the proposed method is analyzed through a series of experiments. The proposed method can find both frequent itemsets and relative frequent itemsets using only the action patterns of students in each time slot. Thus, our method can enhance the effectiveness of learning and help make the best plan for individual learning.
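
The idea of reporting frequent and sub-frequent items per time slot, rather than from one total count, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it counts single items instead of itemsets, keeps all counts in memory, and the names `per_slot_frequent`, `min_sup` and `sub_sup` are hypothetical.

```python
from collections import Counter, defaultdict


def per_slot_frequent(stream, min_sup, sub_sup):
    """stream: iterable of (slot, items) transactions.

    Returns {slot: (frequent, sub_frequent)}, where an item is frequent in a
    slot if its relative support there is >= min_sup, and sub-frequent if its
    support lies in [sub_sup, min_sup). Thresholds are illustrative assumptions.
    """
    counts = defaultdict(Counter)  # slot -> item -> number of transactions
    totals = Counter()             # slot -> number of transactions
    for slot, items in stream:
        totals[slot] += 1
        counts[slot].update(set(items))  # count each item once per transaction

    result = {}
    for slot, counter in counts.items():
        n = totals[slot]
        frequent = {i for i, c in counter.items() if c / n >= min_sup}
        sub_frequent = {i for i, c in counter.items() if sub_sup <= c / n < min_sup}
        result[slot] = (frequent, sub_frequent)
    return result
```

An item that is rare over the whole stream can still be reported as frequent in the one slot where it is concentrated, which is the effect a single total count hides.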


Efficient Generation of a Feature Profile in a Set of Similar Video Data (유사 비디오 데이터 집합에서 효율적인 특성정보 프로파일 생성 기법)

  • Park Dong Cheol;Chang Joong-Hyuk;Lee Won-Suk
    • The KIPS Transactions:PartD / v.12D no.2 s.98 / pp.219-232 / 2005
  • With the rapid progress of computer technology in recent years, digital video data has come to be used in many applications. As a result, various technologies have been introduced to discover knowledge embedded in a video database. However, few studies on data mining for video databases have been performed, because the boundaries of information in a large video stream are unclear. This paper proposes an efficient method for generating a feature profile from a set of similar video data. To extract the information embedded in the original video data efficiently, several refinement techniques are also presented. These include merging pixels, restricting preferred areas, removing noise below a minimum repeat factor, extracting important key frames, generating derived blocks, and applying different weights to dynamic and static areas. Finally, the performance of these methods is evaluated by comparing the result profile obtained by the data mining process with the original video streams.

Geochemical Study on Pollution of Heavy Metals in Soils, Plants and Streams in the Vicinity of Abandoned Metal Mines -Dalseong and Kyeongsan Mines- (금속폐광산주변의 토양, 식물 및 하천의 중금속오염에 대한 지화학적 연구 -달성 및 경산광산-)

  • Lee, Jae Yeong;Lee, In Ho;Lee, Sun Yeong
    • Economic and Environmental Geology / v.29 no.5 / pp.597-613 / 1996
  • Taehan Tungsten Mining Company produced 48,704 tons (M/T) of 4 wt.% Cu and 1,620 tons (S/T) of 70 wt.% WO3 at Dalseong mine from 1961 to 1971, but the mine was closed in 1974. Kyeongsan mine is a small abandoned cobalt mine with no production data. To investigate the pollution level of the mine areas, samples of soils, plants (Ohwi and Pampanini), stream waters and stream sediments were taken, and Fe, Mn, Cu, Pb, Zn, Ni, Co, Cd and Cr were analysed by ICP. Soils are considerably contaminated by heavy metals related to the ore deposits. The heavy metal contents in plants vary with the species and the part of the plant. Stream waters are anomalously high in heavy metals in the vicinity of the mines, but the contents decrease downstream through dilution and precipitation. However, heavy metal contents become very high in stream sediments owing to precipitation. To prevent environmental damage caused by acid mine drainage, wetlands must be constructed outside the pits, and the pits need to be filled with water, limestone chips and organic materials, which give reducing and alkaline conditions to the ores. Under these conditions, pyrite is protected from oxidation and aqueous iron sulphates precipitate to form stable secondary pyrite.


A Technique for Detecting Companion Groups from Trajectory Data Streams (궤적 데이터 스트림에서 동반 그룹 탐색 기법)

  • Kang, Suhyun;Lee, Ki Yong
    • KIPS Transactions on Software and Data Engineering / v.8 no.12 / pp.473-482 / 2019
  • Several studies have analyzed the trajectories of objects from data streams of moving objects. Some of them aim to discover groups of objects that move together, called companion groups. Most of these studies use existing clustering techniques to find groups of objects close to each other. However, clustering-based methods often have difficulty finding the right companion groups because the number of clusters cannot be predicted in advance or the shape or size of clusters is hard to control. In this study, we propose a new method that discovers companion groups based on a distance specified by the user. The proposed method does not apply existing clustering techniques but periodically determines the groups of objects close to each other, using a technique that efficiently finds the groups of objects that lie within the user-specified distance. Furthermore, unlike existing methods that return only companion groups and their trajectories, the proposed method also returns their appearance and disappearance times. Through various experiments, we show that the proposed method can detect companion groups correctly and very efficiently.
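
A distance-based grouping with appearance and disappearance times can be sketched as follows. This is only an illustration under stated assumptions, not the authors' technique: groups are taken to be connected components of objects linked by pairs at most `max_dist` apart, a group is identified by its exact member set, and the pairwise comparison is a naive O(n²) stand-in for the efficient search the abstract mentions. The function names are hypothetical.

```python
from itertools import combinations
from math import dist


def groups_within(positions, max_dist):
    """Return sets of object ids linked by chains of pairs at most max_dist apart."""
    parent = {obj: obj for obj in positions}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(positions, 2):
        if dist(positions[a], positions[b]) <= max_dist:
            parent[find(a)] = find(b)

    members = {}
    for obj in positions:
        members.setdefault(find(obj), set()).add(obj)
    # Assumption: a companion group needs at least two objects.
    return {frozenset(m) for m in members.values() if len(m) >= 2}


def track_companions(snapshots, max_dist):
    """snapshots: list of (time, {object_id: (x, y)}) in time order.

    Returns a list of (group, appeared_at, disappeared_at); disappeared_at is
    None for groups still together at the last snapshot.
    """
    active = {}    # group -> time it first appeared
    finished = []
    for t, positions in snapshots:
        current = groups_within(positions, max_dist)
        for group in list(active):
            if group not in current:
                finished.append((group, active.pop(group), t))
        for group in current:
            active.setdefault(group, t)
    finished.extend((group, start, None) for group, start in active.items())
    return finished
```

Because the only parameter is the distance threshold, the caller does not have to guess a number of clusters in advance, which is the difficulty the abstract attributes to clustering-based methods.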

A Method of Realtime Mining for Summarization and Discovery of a Casual Relationship based on Multidimensional Stream Data (다차원 스트림 데이터 요약 및 인과 관계 탐사를 위한 실시간 데이터 마이닝 기법)

  • Song, Myung-Jin;Kim, Dae-In;Hwang, Bu-Hyun
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.152-155 / 2010
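  • Real-time data mining techniques can discover meaningful information that exists among multidimensional stream data collected from various kinds of sensors. Because mining techniques in traditional database systems are based on static databases, stream data collected in real time must be summarized as interval events with a time attribute. This paper proposes a real-time data mining technique that summarizes stream data and discovers the causal relationships among them in a multidimensional stream data environment. Considering that most of the data collected from sensors are normal-state data of objects, the proposed technique selects and transmits meaningful abnormal events. It also takes the continuity of stream data into account, summarizes the stream data into events of three states, and discovers causal relationship rules. By finding the cause events that influence the occurrence of events over time, causal relationship rules make it possible to predict the occurrence of events in advance.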


Digital TV personalization system based on Data Stream Mining (지능적 사용자 맞춤형 DTV 방송 서비스 시스템)

  • Shin, Se Jung;Lee, Won Suk
    • Proceedings of the Korea Information Processing Society Conference / 2009.11a / pp.901-902 / 2009
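  • Recently, the project to convert terrestrial TV broadcasting to digital has been proceeding in earnest. Because the increase in broadcast programs across multiple media and channels and interactive TV broadcasting services give users a wide choice of programs and opportunities for personalized viewing, digital broadcasting services require a new broadcasting service environment. In this paper, we propose an intelligent user-customized DTV broadcasting service system that analyzes viewing patterns, including the user's viewing context, and generates viewing pattern profiles and association rules for viewing preferences. In addition, we implement a user interface based on an embedded system that provides users with appropriate program recommendations, and the system includes a function that automatically controls the viewing context according to information about the program being watched.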

Data Streams classification using Local Concept-adapted IOLIN System (지역적 컨셉트 적응형 IOLIN시스템을 사용한 데이터 스트림의 분류)

  • Kim, Jae-Woo;Song, Jae-Won;Lee, Ju-Hong
    • Journal of the Korea Society of Computer and Information / v.13 no.1 / pp.37-44 / 2008
  • Data streams tend to change in their patterns over time. Known as concept drift, this problem can reduce the predictive performance of a classification model. CVFDT and IOLIN tried to solve the problem of concept drift through incremental updates of the classification model. However, they were found to be unable to resolve local concept drift, in which local changes in patterns influence the overall classification results. In this paper, we propose an adapted IOLIN system that improves predictive performance by detecting local concept drift. The experimental results show that the proposed method, adaptive IOLIN, is about 2.8% more accurate than IOLIN and about 11.2% more accurate than CVFDT.


Securing a Cyber Physical System in Nuclear Power Plants Using Least Square Approximation and Computational Geometric Approach

  • Gawand, Hemangi Laxman;Bhattacharjee, A.K.;Roy, Kallol
    • Nuclear Engineering and Technology / v.49 no.3 / pp.484-494 / 2017
  • In industrial plants such as nuclear power plants, system operations are performed by embedded controllers orchestrated by Supervisory Control and Data Acquisition (SCADA) software. A targeted attack (also termed a control aware attack) on the controller/SCADA software can lead a control system to operate in an unsafe mode or, in some cases, to a complete shutdown of the plant. Such malware attacks can result in tremendous cost to the organization for recovery, cleanup, and maintenance activity. SCADA systems in operational mode generate huge log files. These files are useful for analyzing plant behavior and for diagnostics during an ongoing attack. However, they are bulky and difficult to inspect manually. Data mining techniques such as least squares approximation and computational methods can be used to analyze logs and to take proactive action when required. This paper explores methodologies and algorithms for developing an effective monitoring scheme against control aware cyber attacks. It also explains soft computation techniques, such as the computational geometric method and least squares approximation, that can be effective in monitor design. The paper provides insights into diagnostic monitoring and its effectiveness through attack simulations on a four-tank model, using computation techniques to diagnose the attacks. Cyber security of instrumentation and control systems used in nuclear power plants is of paramount importance, and these systems are therefore a possible target for such applications.
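
One way least squares approximation can support log monitoring is to fit a trend to readings recorded during normal operation and flag later readings that stray from it. The sketch below is a minimal illustration under that assumption, not the monitor described in the paper; the straight-line model, the `tolerance` threshold and the function names are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_deviations(baseline, observed, tolerance):
    """baseline, observed: lists of (time, value) log samples.

    Fits a line to the baseline samples and returns the times of observed
    samples whose distance from the fitted line exceeds tolerance.
    """
    slope, intercept = fit_line(*zip(*baseline))
    return [t for t, v in observed if abs(v - (slope * t + intercept)) > tolerance]


# Example: a tank level rising steadily, then jumping unexpectedly.
baseline = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]
observed = [(4, 9.0), (5, 11.2), (6, 20.0)]
suspicious_times = flag_deviations(baseline, observed, tolerance=0.5)
```

A real plant signal would rarely follow a single straight line, so a practical monitor would need a model fitted to the process dynamics and a threshold derived from normal-operation noise.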

Predicting Movie Revenue by Online Review Mining: Using the Opening Week Online Review (영화 흥행성과 예측을 위한 온라인 리뷰 마이닝 연구: 개봉 첫 주 온라인 리뷰를 활용하여)

  • Cho, Seung Yeon;Kim, Hyun-Koo;Kim, Beomsoo;Kim, Hee-Woong
    • Information Systems Review / v.16 no.3 / pp.113-134 / 2014
  • Because a movie is an experience good, the purchase decision may depend on preliminary information and evaluations. There is ongoing research on the impact that online reviews have on movie revenues. Past research focused on the effect of online reviews. The influence of online reviews appears to be significant for products like movies because it is difficult to evaluate their features prior to "consuming" the product. Because online reviews are regarded as objective, consumers find them more trustworthy. In contrast to prior research, which focused on movie review ratings and volume, we focus on reviews related to specific movie features. This research proposes a predictive model for movie revenue. We defined 15 criteria for classifying the movie features collected from online reviews through online review mining and compiled a feature keyword list for each criterion. In addition, we performed data preprocessing and dimensionality reduction for data mining through factor analysis. The proposed movie revenue prediction model was tested using discriminant analysis. From the discriminant analysis, we found that online review factors can be used to predict movie popularity and revenue stream. We also expect that, using this predictive model, marketers and strategic decision makers can allocate their resources more parsimoniously.

A Study on the Development and Implementation of a Data-mining Based Prototype for Hospital Bill Claim Reduction System (데이터마이닝 기법을 활용한 의료보험 진료비청구 삭감분석시스템 개발 및 구현에 관한 연구)

  • Yoo, Sang-Jin;Park, Mun-Ro
    • Information Systems Review / v.7 no.1 / pp.275-295 / 2005
  • Changes in the business environment caused by the globalization of the world economy and the beginning of the knowledge society have forced hospitals to equip themselves with tools for enhanced competitiveness. In other words, hospitals must simultaneously pursue three targets: acquiring advanced medical skills and equipment, improving the level of service for patients, and achieving superior managerial performance. This study suggests a way to reduce the possibility of hospital bill claim reductions as a means of achieving superior managerial performance. If the reduction rate of hospital bill claims is high, it will have a negative impact on the hospital's revenue stream and reliability. Thus, to stay competitive, hospitals need to devise ways to cut the reduction rate as much as possible. In this study, a prototype system was developed and implemented to examine the possibility of cutting the reduction rate through in-depth analysis of the causes of reduction. The prototype was first developed using data mining techniques and the relation rules algorithm. Its performance was then tested using live data from hospital D.