• Title/Summary/Keyword: huge data

Search Results: 1,411 (processing time: 0.024 seconds)

A Real-Time Stock Market Prediction Using Knowledge Accumulation (지식 누적을 이용한 실시간 주식시장 예측)

  • Kim, Jin-Hwa;Hong, Kwang-Hun;Min, Jin-Young
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.109-130 / 2011
  • One of the major problems in data mining is the sheer size of modern data sets. Streams of data are continuously accumulated into storage systems or databases; transactions on the Internet, on mobile devices, and in ubiquitous environments produce data streams without pause. Some data sets lie buried, unused, inside huge data stores because of their size, while others are lost as soon as they are created because they are never saved. How to use such large data sets, and how to mine streaming data efficiently, are challenging questions in data mining research. Stream data are accumulated continuously from a data source, and in many cases the resulting data set grows ever larger over time. Mining information from such massive data consumes excessive resources: storage, money, and time. These characteristics make it difficult and expensive to store all the stream data accumulated over time; yet if one mines only recent or partial data, valuable information may be lost. To avoid these problems, this study proposes a method that efficiently accumulates information over time in the form of a rule set. A rule set is mined from each data set in the stream and merged into a master rule set, which also serves as a model for real-time decision making. One main advantage of this method is that it requires far less storage than the traditional approach of saving the entire data set. Another is that the accumulated rule set itself acts as a prediction model: because the rule set is always ready to use, the system can respond promptly to user requests at any time, making real-time decision making possible, which is the greatest advantage of this method.
Following the theory of ensemble approaches, combining many different models can yield a better-performing prediction model. The consolidated rule set covers the entire data set, whereas the traditional sampling approach covers only part of it. This study uses stock market data, a heterogeneous data set whose characteristics vary over time: stock market indexes fluctuate whenever an event influences the market, so the variance of each variable is large compared with that of a homogeneous data set. Prediction on heterogeneous data is naturally much harder than on homogeneous data, as it amounts to predicting in unpredictable situations. This study tests two general mining approaches and compares their prediction performance with that of the proposed method. The first approach induces a rule set from only the most recent data; the second induces a rule set from all data accumulated since the beginning, every time a new data set must be predicted. Neither matched the performance of the accumulated rule set. The study further experiments with two prediction models: one built only from the more important rule sets, and one using all rule sets with weights assigned to each rule based on its performance. The weighted approach performed better. The experiments show that the proposed method can be an efficient approach to mining information and patterns from stream data. A limitation is that its application here is bounded to stock market data; a more dynamic real-time stream data set would be desirable for further application. Another open problem is rule management: as the number of rules grows over time, redundant and conflicting rules must be handled efficiently.
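The rule accumulation and performance weighting described above can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the rule representation (a condition set plus a predicted label), the matching logic, and the multiplicative weight update are all assumptions.

```python
# Minimal sketch: accumulate mined rules into a master rule set and
# predict by performance-weighted voting over matching rules.

class MasterRuleSet:
    def __init__(self):
        # rule -> weight; a rule is (frozenset of (feature, value), label)
        self.weights = {}

    def accumulate(self, mined_rules):
        """Merge rules mined from the latest stream window into the master set."""
        for rule in mined_rules:
            self.weights.setdefault(rule, 1.0)

    def matches(self, rule, record):
        conditions, _label = rule
        return all(record.get(f) == v for f, v in conditions)

    def predict(self, record):
        """Weighted vote of all rules whose conditions match the record."""
        votes = {}
        for rule, w in self.weights.items():
            if self.matches(rule, record):
                votes[rule[1]] = votes.get(rule[1], 0.0) + w
        return max(votes, key=votes.get) if votes else None

    def update_weights(self, record, actual, lr=0.1):
        """Reward rules that predicted correctly, penalize the rest."""
        for rule in self.weights:
            if self.matches(rule, record):
                self.weights[rule] *= (1 + lr) if rule[1] == actual else (1 - lr)


master = MasterRuleSet()
master.accumulate([(frozenset({("trend", "up")}), "buy"),
                   (frozenset({("trend", "down")}), "sell")])
master.predict({"trend": "up"})  # -> "buy"
```

Because prediction is just a vote over the stored rules, the master rule set doubles as the always-ready real-time decision model the abstract describes.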

Detailed-information Browsing Technology based on Level of Detail for 3D Cultural Asset Data (3D 문화재 데이터의 LOD 기반 상세정보 브라우징 기술)

  • Jung, Jung-Il;Cho, Jin-Soo;WhangBo, Tae-Keun
    • The Journal of the Korea Contents Association / v.9 no.10 / pp.110-121 / 2009
  • In this paper, we propose a new method that offers detailed information to the user while relaxing the system-memory limitations imposed by large 3D models. The method is based on building an LOD (Level of Detail) model from the huge 3D data of architectural cultural assets. Using a modified AOSP algorithm, it first creates a hierarchical spatial structure over the 3D data and builds the LOD model by surface simplification. It then extracts the user's ROI (Region of Interest) from the simplified LOD model, locally refines the extracted region, and renders it with the same surface detail as the original model. To evaluate the proposed method, we ran experiments on precise 3D scan data of architectural cultural assets. The method offers the same detailed information as existing methods while, in our experiments, reducing memory consumption by 45%, because a detailed mesh structure is formed only for the ROI of the simplified LOD model. Huge architectural cultural assets can therefore be examined in detail in an ordinary computing environment.
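The ROI-driven refinement can be pictured as a simple selection rule: regions overlapping the user's ROI are kept at full resolution while the rest stay at a coarse LOD. This is an illustration only; the actual method builds its hierarchy with a modified AOSP algorithm and surface simplification.

```python
# Sketch: pick an LOD level per region; 0 = original mesh, higher = coarser.
# Bounds are axis-aligned 2D boxes (x0, y0, x1, y1) for simplicity.

def select_lod(region_bounds, roi_bounds, max_level=4):
    """Full detail inside the ROI, coarsest level everywhere else."""
    def overlaps(a, b):
        (ax0, ay0, ax1, ay1), (bx0, by0, bx1, by1) = a, b
        return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1
    return 0 if overlaps(region_bounds, roi_bounds) else max_level
```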

A Comparison of Performance between STMP/MST and Existing Spatio-Temporal Moving Pattern Mining Methods (STMP/MST와 기존의 시공간 이동 패턴 탐사 기법들과의 성능 비교)

  • Lee, Yon-Sik;Kim, Eun-A
    • Journal of Internet Computing and Services / v.10 no.5 / pp.49-63 / 2009
  • The performance of spatio-temporal moving pattern mining depends on how the huge set of spatio-temporal data is analyzed and processed. Several methods have been presented to address the problems of existing spatio-temporal moving pattern mining methods [1-10], such as growing execution time and memory requirements during mining, but these problems have not yet been properly solved. We therefore proposed the STMP/MST method [11] in a preceding study to extract sequential and/or periodic frequent moving patterns effectively from huge sets of spatio-temporal moving data. The method reduces pattern mining execution time by using a moving sequence tree based on a hash tree, and it minimizes the required memory space by generalizing detailed historical data with spatio-temporal attributes into real-world scopes of space and time through a spatio-temporal concept hierarchy. In this paper, to verify the effectiveness of the STMP/MST method, we compare and analyze its performance against existing spatio-temporal moving pattern mining methods with respect to the quantity of mining data and the minimum support factor.

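The two ideas in the STMP/MST abstract above, generalizing raw locations through a concept hierarchy and then counting frequent movement sequences, can be sketched as follows. The hierarchy, the trajectory format, and the consecutive-pair counting are illustrative assumptions; the actual method uses a hash-tree-based moving sequence tree.

```python
from collections import Counter

# Illustrative concept hierarchy: raw cell id -> named real-world area.
HIERARCHY = {"cell_17": "Station", "cell_18": "Station", "cell_42": "Mall"}

def generalize(trajectory):
    """Map each raw location to its higher-level concept, collapsing repeats."""
    out = []
    for cell in trajectory:
        area = HIERARCHY.get(cell, "Other")
        if not out or out[-1] != area:
            out.append(area)
    return tuple(out)

def frequent_moves(trajectories, min_support=2):
    """Count generalized consecutive moves; keep those meeting the support."""
    counts = Counter()
    for t in trajectories:
        g = generalize(t)
        for a, b in zip(g, g[1:]):
            counts[(a, b)] += 1
    return {move: c for move, c in counts.items() if c >= min_support}
```

Generalizing first means two trajectories through different cells of the same station contribute to the same pattern, which is how the concept hierarchy shrinks both the pattern space and the memory footprint.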

Method to Limit The Spread of Data in Wireless Content-Centric Network (무선 Content-Centric Network에서 Data 확산 제한 방법)

  • Park, Chan-Min;Kim, Byung-Seo
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.1 / pp.9-14 / 2016
  • As devices such as laptops, tablets, and smartphones have developed, a flood of huge data that can be classified as content has poured into the network. In line with this change in Internet usage, Content-Centric Networking (CCN), a new concept of Internet architecture, has appeared. Initially CCN was studied on wired networks, but recently it has also been studied on wireless networks. Because the characteristics of wireless environments differ from those of wired environments, wireless CCN raises its own issues. In this paper, we discuss a method to limit the spread of Data packets in wireless CCN. The proposed scheme uses the MAC addresses of nodes when Interest and Data packets are forwarded. With this scheme, we reduce the spread of Data packets, give forwarding priority to nodes on the shortest path, and reduce delay by modifying the retransmission waiting time.
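A forwarding filter of the kind described might look like the following sketch. The packet fields and the halved waiting time are assumptions for illustration; the paper's actual packet handling and timer values are not reproduced here.

```python
# Sketch: instead of rebroadcasting to all neighbors, a node forwards a
# packet only when its own MAC address is recorded as the intended next
# hop, and shortest-path nodes wait less before retransmitting, which
# gives them forwarding priority.

def should_forward(packet, my_mac):
    """Forward only when this node is named as the next hop."""
    return packet.get("next_hop_mac") == my_mac

def retransmission_wait(base_ms, on_shortest_path):
    """Shorter waiting time prioritizes nodes on the shortest path."""
    return base_ms / 2 if on_shortest_path else base_ms
```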

Questionnaire Survey and Analysis Using Data Mining (데이터마이닝을 이용한 설문조사 및 분석)

  • 박만희;채화성;신완선
    • Journal of Korean Society of Industrial and Systems Engineering / v.25 no.5 / pp.46-52 / 2002
  • With the development of Internet-based information technology, today's database systems collect huge volumes of questionnaire responses, and these must be manageable. However, finding analytic data or useful information in such high-capacity databases is difficult. Data mining can solve these problems and put the database to use. Questionnaire analysis using data mining draws out relevant patterns that previously went unseen or tended to be overlooked, and such patterns can be applied as new business rules. The purpose of this research is to analyze questionnaire results with data mining and to present results that support easier decision making. A review of data mining techniques shows which types of questionnaire survey suit them; this research focuses on the present form of questionnaire composition and on a model of questionnaires suitable for such analysis. The actual questionnaire results are also compared with a conventional statistical analysis.
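As one concrete example of the kind of pattern such mining can surface, co-occurring answer pairs can be counted across respondents. This is a minimal sketch; the paper does not specify its mining algorithm, and the response format below is assumed.

```python
from collections import Counter
from itertools import combinations

def answer_pairs(responses, min_support=2):
    """Find answer pairs that co-occur in at least min_support responses.

    Each response is a dict mapping question id -> chosen answer.
    """
    counts = Counter()
    for resp in responses:
        items = sorted(resp.items())  # canonical order so pairs are comparable
        for a, b in combinations(items, 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}
```

A pair such as (("q1", "yes"), ("q2", "high")) appearing with high support is exactly the sort of previously overlooked association that can become a new business rule.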

A Knowledge Discovery Framework for Spatiotemporal Data Mining

  • Lee, Jun-Wook;Lee, Yong-Joon
    • Journal of Information Processing Systems / v.2 no.2 / pp.124-129 / 2006
  • With the explosive increase in the generation and utilization of spatiotemporal data sets, many research efforts have focused on efficiently handling these large volumes of data. With the remarkable growth of ubiquitous computing technology, mining huge volumes of spatiotemporal data is regarded as a core technology for providing real-world applications with intelligence. In this paper, we propose a 3-tier knowledge discovery framework for spatiotemporal data mining. This framework provides a foundation model both for defining spatiotemporal knowledge discovery problems and for representing new knowledge and its relationships. Using the proposed framework, spatiotemporal data mining problems can be formalized easily. The representation model is very useful for modeling the basic elements of, and the relationships between, the objects in spatiotemporal data sets, information, and knowledge.

Databases and tools for constructing signal transduction networks in cancer

  • Nam, Seungyoon
    • BMB Reports / v.50 no.1 / pp.12-19 / 2017
  • Traditionally, biologists have devoted their careers to studying individual biological entities of interest, partly for lack of available data on those entities. Large, high-throughput data too complex for conventional processing methods (i.e., "big data") has accumulated in cancer biology and is freely available in public data repositories. Such challenges urge biologists to inspect their biological entities of interest with novel approaches, starting with repository data retrieval. Essentially, these revolutionary changes demand new interpretations of huge datasets at the systems level, by so-called "systems biology". One representative application of systems biology is the generation of a biological network from high-throughput big data, providing a global map of the molecular events associated with specific phenotype changes. In this review, we introduce repositories of cancer big data and cutting-edge systems biology tools for network generation and improved identification of therapeutic targets.

A Study on the Comparison Between Full-3D and Quasi-1D Supercompact Multiwavelets (Full-3D와 Quasi-1D Supercompact Multiwavelets의 비교 연구)

  • Park, June-Pyo;Lee, Do-Hyung;Kwon, Do-Hoon
    • Transactions of the Korean Society of Mechanical Engineers B / v.28 no.12 / pp.1608-1615 / 2004
  • CFD data compression methods based on Full-3D and Quasi-1D supercompact multiwavelets are presented. The supercompact wavelet method offers the advantage of higher-order accurate representation with compact support, so it avoids unnecessary interaction with remotely located data across singularities such as shocks. Full-3D wavelets entail appropriate cross-derivative scaling functions and wavelets, and thus allow highly accurate multi-spatial data representation. The Quasi-1D method instead applies 1D multiresolution analysis along alternating directions, rather than solving the huge transformation matrix of the Full-3D method, so data processing is efficient and comparatively simple. Several numerical tests show swift data processing as well as high compression ratios for CFD simulation data.
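The alternating-direction idea behind the Quasi-1D approach can be illustrated with an ordinary one-level Haar transform applied along each axis of a 3D field in turn. This is a simplification for illustration: supercompact multiwavelets are higher-order constructions, not Haar, but the axis-by-axis structure is the same.

```python
import numpy as np

def haar_1d(a):
    """One level of the 1D Haar transform: pair averages, then differences."""
    a = a.reshape(-1, 2)
    return np.concatenate([(a[:, 0] + a[:, 1]) / 2, (a[:, 0] - a[:, 1]) / 2])

def quasi_1d_transform(volume):
    """Apply the 1D transform along each axis of a 3D array in turn,
    instead of building one large 3D transformation matrix."""
    out = volume.astype(float)
    for axis in range(3):
        out = np.apply_along_axis(haar_1d, axis, out)
    return out
```

On smooth data most difference coefficients come out near zero, and discarding those small coefficients is what yields the compression the abstract reports.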

Study on Query Type and Data Structure for Mobile Meteorological Services

  • Choi, Jin-Oh
    • Journal of information and communication convergence engineering / v.9 no.4 / pp.457-460 / 2011
  • For mobile meteorological services, sensed data must be gathered at a server over wireless networks from various clients, such as ubiquitous sensor networks, mobile phones, and public transport vehicles. The data gathered at the server is huge in volume and grows continuously, so special query methods and data structures must be considered. This paper studies the possible query types on this data and the processing steps needed for mobile meteorological services, and discusses several query spaces. It then proposes an effective data structure for the sensed data to support these query types.
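One simple data-structure choice consistent with this description is to bucket readings by region and time window so that range queries touch only a few buckets rather than scanning everything. The bucketing scheme below is an assumption for illustration; the paper's proposed structure is not reproduced here.

```python
from collections import defaultdict

class SensorStore:
    """Bucket readings by (region, hour) so range queries touch few buckets."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def insert(self, region, timestamp, value):
        # Timestamps are seconds; bucket key is the containing hour.
        self.buckets[(region, timestamp // 3600)].append((timestamp, value))

    def query(self, region, t_start, t_end):
        """Return readings for a region within [t_start, t_end]."""
        result = []
        for hour in range(t_start // 3600, t_end // 3600 + 1):
            for ts, v in self.buckets.get((region, hour), []):
                if t_start <= ts <= t_end:
                    result.append((ts, v))
        return result
```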

A Study on Satisfaction Survey Based on Regression Analysis to Improve Curriculum for Big Data Education (빅데이터 양성 교육 교과과정 개선을 위한 회귀분석 기반의 만족도 조사에 관한 연구)

  • Choi, Hyun
    • Journal of the Korean Society of Industry Convergence / v.22 no.6 / pp.749-756 / 2019
  • Big data is structured and unstructured data whose huge volume makes it difficult to collect, store, and process. Many institutions, including universities, are building convergence programs to foster talent in data science and AI, but research on what kind of education students actually need is severely lacking. In this paper, to improve the curriculum by grasping the satisfaction and demands of participants in the "2019 Big Data Youth Talent Training Course" held at K University, correlation analysis and then regression analysis were performed on a questionnaire covering a basic survey and the individual courses. The study found that higher satisfaction with the classes, with job connections, and with self-development was associated with a more positive evaluation of program efficiency.
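The kind of regression analysis used in such a satisfaction study can be sketched with ordinary least squares on toy scores. The data below are invented for illustration; the paper's variables and coefficients are not reproduced.

```python
import numpy as np

# Toy 1-5 satisfaction scores per respondent: class, job connection,
# self-development, plus the overall program-efficiency rating to explain.
X = np.array([[4, 3, 4], [5, 4, 5], [2, 2, 3], [3, 3, 3], [5, 5, 4]], float)
y = np.array([4.0, 5.0, 2.0, 3.0, 5.0])

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
predicted = A @ coef
```

With an intercept column included, the fitted residuals sum to zero, which is a quick sanity check on the fit.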