• Title/Summary/Keyword: data set

A Fuzzy Window Mechanism for Information Differentiation in Mining Data Streams (데이터 스트림 마이닝에서 정보 중요성 차별화를 위한 퍼지 윈도우 기법)

  • Chang, Joong-Hyuk
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.9 / pp.4183-4191 / 2011
  • Because the elements of a data stream are generated continuously and may change over time, many techniques have been proposed to differentiate the importance of data elements by their generation time. Conventional techniques efficiently produce analysis results that focus on the recent information in a data stream, but they are limited in how flexibly they can differentiate the importance of information. An information differentiation technique based on the notion of a fuzzy set can compensate for this limitation. Fuzzy sets have been widely used in various data mining fields; they overcome the sharp boundary problem and yield analysis results that better reflect the requirements of real-world applications. In this paper, a fuzzy window mechanism is proposed that adopts the notion of a fuzzy set and can be used efficiently to differentiate the importance of information in mining data streams. Basic concepts, including fuzzy calendars, are described first, followed by the details of mining weighted patterns over data streams using a fuzzy window technique.
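As a rough illustration of how a fuzzy window could weight stream elements by generation time, the sketch below uses a trapezoidal membership function. The abstract does not give the paper's actual definitions, so the function shape, the names `trapezoid` and `weighted_support`, and the sample stream are all assumptions made for illustration.

```python
def trapezoid(t, a, b, c, d):
    """Membership of time t in a trapezoidal fuzzy window with a <= b <= c <= d.

    Rises linearly on (a, b), equals 1 on [b, c], falls linearly on (c, d).
    """
    if t <= a or t >= d:
        return 0.0
    if t < b:
        return (t - a) / (b - a)
    if t <= c:
        return 1.0
    return (d - t) / (d - c)


def weighted_support(stream, item, window):
    """Weighted support of `item`: the membership-weighted fraction of
    transactions that contain it. `stream` is a list of (time, items) pairs."""
    total = 0.0
    hits = 0.0
    for t, items in stream:
        w = trapezoid(t, *window)
        total += w
        if item in items:
            hits += w
    return hits / total if total else 0.0


# Hypothetical stream: (generation time, set of items in the transaction).
stream = [(1, {"a"}), (3, {"a", "b"}), (5, {"b"}), (7, {"a"}), (9, {"b"})]
window = (2, 4, 6, 8)  # hypothetical window parameters a, b, c, d

support_a = weighted_support(stream, "a", window)
support_b = weighted_support(stream, "b", window)
```

Unlike a sharp sliding window, elements near the window edges (times 3 and 7 here) contribute with partial weight instead of being fully counted or fully ignored.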

Trading rule extraction in stock market using the rough set approach

  • Kim, Kyoung-jae;Huh, Jin-nyoung;Ingoo Han
    • Proceedings of the Korea Inteligent Information System Society Conference / 1999.10a / pp.337-346 / 1999
  • In this paper, we propose a rough set approach to extract trading rules that can discriminate between bullish and bearish markets in the stock market. The rough set approach is well suited to extracting trading rules. First, it makes no assumption about the distribution of the data. Second, it not only handles noise well but also eliminates irrelevant factors. In addition, the rough set approach is appropriate for detecting stock market timing because it does not generate a trading signal when the market pattern is uncertain. The experimental results are encouraging and demonstrate the usefulness of the rough set approach for stock market analysis with respect to profitability.

Nonparametric confidence intervals for quantiles based on a modified ranked set sampling

  • Morabbi, Hakime;Razmkhah, Mostafa;Ahmadi, Jafar
    • Communications for Statistical Applications and Methods / v.23 no.2 / pp.119-129 / 2016
  • A new sampling method is introduced based on the idea of a ranked set sampling scheme in which the samples taken in each set depend on the previous ones. Some theoretical results are presented, and distribution-free confidence intervals are derived for the quantiles of any continuous population. It is shown numerically that the proposed sampling scheme may lead to 95% confidence intervals (especially for extreme quantiles) that cannot be found based on the ordinary ranked set sampling scheme presented by Chen (2000) and Balakrishnan and Li (2006). Optimality aspects of this scheme are investigated for both the coverage probability and the minimum expected length criteria. A real data set is also used to illustrate the proposed procedure. Conclusions are stated at the end.

Design and Implementation of the Data Broadcasting System using Data Piping (데이터 파이핑을 이용한 데이터 방송 시스템의 설계 및 구현)

  • Kim, Kyoung-Ill;Mah, Pyeong-Soo;Lee, Kyu-Chul
    • The KIPS Transactions:PartD / v.10D no.2 / pp.301-308 / 2003
  • In this paper, we propose a prototype digital data broadcasting system based on the ATSC data broadcasting standard. The prototype uses data piping as a mechanism for delivering arbitrary user-defined data inserted directly into the payload of MPEG-2 Transport Stream packets. This data includes URL or HTML content. After the content is inserted into the MPEG-2 Transport Stream, it can be delivered by broadcast to the DTV set-top receiver. The 75 packets received in real time during the TV broadcast are used to start display or switch content. The prototype shows how to achieve common design goals and integrate digital TV and web pages based on the ATSC data broadcasting standard. It can be used to display digital data content (HTML, images) on existing TV or digital TV set-tops.
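To make the idea of placing user-defined data directly into a Transport Stream packet payload concrete, here is a minimal sketch that builds a single 188-byte MPEG-2 TS packet with a 4-byte header and a 184-byte payload. It is not the authors' implementation: the PID value, the sample URL, the function names, and the use of 0xFF bytes to pad a short payload are assumptions made for illustration.

```python
TS_PACKET_SIZE = 188   # fixed MPEG-2 Transport Stream packet length
TS_PAYLOAD_SIZE = 184  # packet length minus the 4-byte header
SYNC_BYTE = 0x47


def make_ts_packet(pid, payload, continuity_counter=0, payload_unit_start=False):
    """Build one TS packet carrying `payload` (payload only, no adaptation field)."""
    if not 0 <= pid <= 0x1FFF:
        raise ValueError("PID must fit in 13 bits")
    if len(payload) > TS_PAYLOAD_SIZE:
        raise ValueError("payload does not fit in one packet")
    header = bytes([
        SYNC_BYTE,
        (0x40 if payload_unit_start else 0x00) | (pid >> 8),  # PUSI flag + PID high 5 bits
        pid & 0xFF,                                           # PID low 8 bits
        0x10 | (continuity_counter & 0x0F),  # adaptation_field_control=01, 4-bit counter
    ])
    # Assumption: pad short payloads with 0xFF so the packet is always 188 bytes.
    return header + payload.ljust(TS_PAYLOAD_SIZE, b"\xff")


def parse_pid(packet):
    """Extract the 13-bit PID from a TS packet header."""
    return ((packet[1] & 0x1F) << 8) | packet[2]


# Hypothetical PID and content for a piped data stream.
packet = make_ts_packet(0x1ABC, b"http://example.com/", payload_unit_start=True)
```

A receiver would filter packets by PID and reassemble the payload bytes; how the piped bytes are framed inside the payload is left to the application.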

Incremental Multi-classification by Least Squares Support Vector Machine

  • Oh, Kwang-Sik;Shim, Joo-Yong;Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.965-974 / 2003
  • In this paper we propose an incremental classification method for multi-class data sets using the least squares support vector machine (LS-SVM). By encoding the output variable in the training data set appropriately, we obtain new output vectors for the training data sets. Online LS-SVM is then applied to each newly encoded output vector. The proposed method reduces the computational cost and allows training to be performed incrementally. With the incremental formulation of the inverse matrix, the current information and new input data are used to build a new inverse matrix for estimating the optimal bias and Lagrange multipliers, so the computational difficulties of large-scale matrix inversion can be avoided. The performance of the proposed method is shown via numerical studies and compared with an artificial neural network.
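The idea of updating an inverse matrix when a new data point arrives, rather than inverting from scratch, can be illustrated with the standard bordering (Schur complement) identity for a symmetric matrix. This is a generic sketch and not necessarily the formulation used in the paper; the function name `bordered_inverse` and the tiny example are assumptions.

```python
def bordered_inverse(a_inv, b, c):
    """Inverse of the symmetric block matrix [[A, b], [b^T, c]], given A^{-1}.

    a_inv: n x n list of lists holding A^{-1} (A symmetric)
    b:     new column of length n
    c:     new diagonal entry (scalar)
    """
    n = len(a_inv)
    # u = A^{-1} b
    u = [sum(a_inv[i][j] * b[j] for j in range(n)) for i in range(n)]
    # Schur complement s = c - b^T A^{-1} b; it must be nonzero.
    s = c - sum(b[i] * u[i] for i in range(n))
    if s == 0:
        raise ValueError("bordered matrix is singular")
    # Top block rows: A^{-1} + u u^T / s, with -u / s as the new last column.
    top = [[a_inv[i][j] + u[i] * u[j] / s for j in range(n)] + [-u[i] / s]
           for i in range(n)]
    # New last row: -u^T / s followed by 1 / s.
    bottom = [-u[j] / s for j in range(n)] + [1.0 / s]
    return top + [bottom]


# Grow A = [[2]] to M = [[2, 1], [1, 1]] without re-inverting from scratch.
m_inv = bordered_inverse([[0.5]], [1.0], 1.0)
```

Each update costs O(n^2) operations instead of the O(n^3) needed to invert the enlarged matrix directly, which is the kind of saving the abstract refers to.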

A Study on Partial Pattern Estimation for Sequential Agglomerative Hierarchical Nested Model (SAHN 모델의 부분적 패턴 추정 방법에 대한 연구)

  • Jang, Kyung-Won;Ahn, Tae-Chon
    • Proceedings of the KIEE Conference / 2005.10b / pp.143-145 / 2005
  • This paper presents an empirical study of a pattern estimation method intended to reveal underlying data patterns at a relatively reduced computational cost. The presented method performs crisp clustering of n given data samples by means of the sequential agglomerative hierarchical nested (SAHN) model. Conventional SAHN-based clustering requires a large computation time in the initial step of the algorithm. To address this, we modified the overall process with a partial approach: the given data set is first divided into several subgroups by uniform sampling, and the SAHN-based method is then applied to each subgroup. The advantage of this method is that it reduces the computation time of the original process while giving similar results. The proposed method is applied to several test data sets, and simulation results with a conceptual analysis are presented.

Rule Generation using Rough set and Hierarchical Structure (러프집합과 계층적 구조를 이용한 규칙생성)

  • Kim, Ju-Young;Lee, Chul-Heui
    • Proceedings of the KIEE Conference / 2002.11c / pp.521-524 / 2002
  • This paper deals with rule generation from data for control systems and data mining using rough sets. If the cores and reducts are searched for without considering the frequency of data belonging to the same equivalence class, unnecessary attributes may not be discarded, and the resulting rules do not represent the characteristics of the data well. To improve this, we handle inconsistent data with a probability measure defined by support. As a result, the effect of uncertainty in knowledge reduction can be reduced to some extent. We also construct the rule base in a hierarchical structure by applying the core as the classification criterion at each level. If more than one core exists, the coverage degree is used to select an appropriate one among them to increase the classification rate. The proposed method gives a more appropriate and effective rule base in terms of compatibility and size. Simulations on some data mining examples are performed to show the effectiveness of the proposed method.

Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia pacific journal of information systems / v.31 no.2 / pp.185-196 / 2021
  • The paper aims to analyze and predict liquor sales in the state of Iowa by applying machine learning algorithms to build predictive models. We use Azure ML and Spark ML for our predictive analysis, which are a legacy machine learning (ML) system and a Big Data ML system, respectively. We work on the Iowa liquor sales dataset, comprising records from 2012 to 2019 in 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales using both Azure ML and Spark ML. We find that the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time with the sample data set using the legacy Azure ML system. The Decision Tree Regression model in Spark ML has the highest accuracy with the quickest computing time for the entire data set using the Big Data Spark system.

Riding a Bike Not Owned by Me in Bad Air: Big Data Analysis on Bike Sharing

  • Taekyung Kim
    • Asia pacific journal of information systems / v.29 no.3 / pp.414-427 / 2019
  • The sharing economy has significantly changed the way people live over the years. The emergence and expansion of the sharing economy, empowered by mobile information technologies and intelligent algorithms, reconfigure how people use means of transportation. In this paper, the bike sharing phenomenon is highlighted. Combining a big data set of user logs provided by the Seoul government with an air quality data set, the empirical findings reveal that temperature change is tightly associated with bike sharing activities. Also, the concentration of particulate matter is weakly related to bike sharing, but the trend should be carefully examined. This work is differentiated by its consideration of environmental factors external to bike sharing businesses. To further understand the empirical data, data mining methods and econometric approaches were adopted.

Cache memory system for high performance CPU with 4GHz (4Ghz 고성능 CPU 위한 캐시 메모리 시스템)

  • Jung, Bo-Sung;Lee, Jung-Hoon
    • Journal of the Korea Society of Computer and Information / v.18 no.2 / pp.1-8 / 2013
  • In this paper, we propose a high-performance L1 cache structure for a high-clock 4GHz CPU. The proposed cache memory consists of three parts: a direct-mapped cache to support fast access time, a two-way set-associative buffer to exploit temporal locality, and a buffer-select table. The most recently accessed data is stored in the direct-mapped cache. If data with a high probability of repeated reference is replaced from the direct-mapped cache, it is selectively stored in the two-way set-associative buffer. For high performance and low power consumption, only one of the two ways of the set-associative buffer is selectively accessed, based on the buffer-select table (BST). According to simulation results, the Energy $\times$ Delay product improves by about 45%, 70%, and 75% compared with a direct-mapped cache, a four-way set-associative cache, and a victim cache with twice the space, respectively.
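The interplay between a direct-mapped cache and a two-way set-associative buffer can be sketched with a small simulation. This is a simplified illustration and not the proposed design: the abstract does not specify the selection policy or the buffer-select table, so here every block evicted from the direct-mapped cache is moved into the buffer unconditionally, both ways are searched, and all sizes and names are assumptions.

```python
class DirectMappedWithBuffer:
    """Direct-mapped cache backed by a two-way set-associative buffer.

    Simplification (assumption): every evicted block goes to the buffer;
    the paper's selective policy and buffer-select table are not modeled.
    """

    def __init__(self, dm_lines=4, buf_sets=2, block_size=16):
        self.block_size = block_size
        self.dm = [None] * dm_lines               # one block number per line
        self.buf = [[] for _ in range(buf_sets)]  # per set: up to 2 blocks, oldest first

    def access(self, addr):
        block = addr // self.block_size
        idx = block % len(self.dm)
        if self.dm[idx] == block:
            return "dm_hit"
        # Look for the block in its buffer set.
        buf_set = self.buf[block % len(self.buf)]
        result = "miss"
        if block in buf_set:
            buf_set.remove(block)  # promoted back into the direct-mapped cache
            result = "buf_hit"
        # Install the block; move the displaced block into the buffer.
        victim = self.dm[idx]
        self.dm[idx] = block
        if victim is not None:
            victim_set = self.buf[victim % len(self.buf)]
            if len(victim_set) == 2:
                victim_set.pop(0)  # drop the oldest entry of a full set
            victim_set.append(victim)
        return result


cache = DirectMappedWithBuffer()
# Addresses 0 and 64 map to the same direct-mapped line and conflict.
trace = [cache.access(a) for a in (0, 64, 0, 0)]
```

In this trace the third access misses the direct-mapped cache but is recovered from the buffer, which is how the buffer exploits temporal locality for conflicting blocks.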