• Title/Summary/Keyword: Large data

Search Results: 14,088

Review on statistical methods for large spatial Gaussian data

  • Park, Jincheol
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.2
    • /
    • pp.495-504
    • /
    • 2015
  • The Gaussian geostatistical model has been widely used for modeling spatial data. However, this model suffers from a severe computational difficulty because inference requires inverting a large covariance matrix when evaluating the log-likelihood. Three strategies have been employed to address this computational challenge: likelihood approximation, lower-dimensional space approximation, and Markov random field approximation. In this paper, we review the statistical approaches that attack the computational challenge. As an illustration, we also apply integrated nested Laplace approximation (INLA), one of the Markov random field approximation approaches, to real data to provide an example of its use in practice when dealing with large spatial data.
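The computational bottleneck this review describes can be made concrete with a short sketch. The code below (an illustration, not from the paper) evaluates the exact Gaussian log-likelihood via a Cholesky factorization, the O(n³) step that motivates all three approximation strategies; the exponential covariance function and the one-dimensional grid of sites are assumptions made only for the toy example.

```python
import numpy as np

def gaussian_loglik(y, mu, Sigma):
    """Exact Gaussian log-likelihood. The Cholesky factorization of the
    n x n covariance matrix costs O(n^3), which is the bottleneck for
    large spatial data sets."""
    n = len(y)
    L = np.linalg.cholesky(Sigma)                 # the O(n^3) step
    r = np.linalg.solve(L, y - mu)                # solve L z = (y - mu)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))     # log|Sigma| from L
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ r)

# Toy example: exponential covariance on a 1-D grid of n sites
n = 200
s = np.linspace(0, 1, n)
Sigma = np.exp(-np.abs(s[:, None] - s[None, :]) / 0.2)
y = np.random.default_rng(0).multivariate_normal(np.zeros(n), Sigma)
print(gaussian_loglik(y, np.zeros(n), Sigma))
```

Doubling n roughly multiplies the cost of the Cholesky step by eight, which is why direct evaluation becomes infeasible for large spatial data.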

An Efficient Visualization Technique of Large-Scale Nodes Structure with Linked Information

  • Mun Su-Youl;Ha Seok-Wun
    • Journal of information and communication convergence engineering
    • /
    • v.3 no.1
    • /
    • pp.49-55
    • /
    • 2005
  • This study suggests a visualization technique for displaying the relations among associated data in an optimal way when a large amount of linked data must be shown in a limited space. For example, if you track an IP address through several steps and display the data on a screen, or if you visualize human gene information in a three-dimensional space, the data flow becomes much easier to understand. To simulate the proposed technique, the algorithm was applied to a large number of randomly generated nodes and the result was observed visually. The results show that the proposed technique is more efficient than previous methods in terms of visualization and use of space, and, because it organizes the nodes into sub-groups, it makes the whole structure easier to understand.

Flow Efficiency in Multi-Louvered Fins Having Large Louver-to-Fin Pitch Ratio

  • Kim, Nae-Hyun;Cho, Jin-Pyo;Kim, Do-Young;Kim, Hyun-Jin
    • International Journal of Air-Conditioning and Refrigeration
    • /
    • v.15 no.4
    • /
    • pp.156-162
    • /
    • 2007
  • Flow visualization experiments were conducted for two louver arrays having a large louver-to-fin pitch ratio ($L_p/F_p=1.0$ and 1.4). Flow efficiencies and critical Reynolds numbers were obtained from the data and compared with existing correlations. The correlations failed to predict the present flow efficiency data adequately; some correlations overpredicted the data, while others underpredicted it. The large louver pitch ratio of the present model, which is outside the applicable range of the correlations, may be partly responsible. The critical Reynolds numbers obtained from the present flow visualization data were in close agreement with those obtained from heat transfer tests on actual flat-tube heat exchangers. Existing correlations for the critical Reynolds number generally overpredicted the present data.

Parameter Tuning in Support Vector Regression for Large Scale Problems (대용량 자료에 대한 서포트 벡터 회귀에서 모수조절)

  • Ryu, Jee-Youl;Kwak, Minjung;Yoon, Min
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.1
    • /
    • pp.15-21
    • /
    • 2015
  • In support vector machines, the values of the parameters included in the kernels strongly affect generalization ability, and it is often difficult to determine appropriate values for those parameters in advance. Our studies have shown that the burden of choosing these parameter values in support vector regression can be reduced by utilizing ensemble learning. However, the straightforward application of this method to large-scale problems is too time consuming. In this paper, we propose a method in which the original data set is decomposed into a number of sub-data sets in order to reduce the burden of parameter tuning in support vector regression for large-scale, and particularly imbalanced, data sets.
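The decomposition idea can be sketched generically: split a large training set into sub-sets, tune kernel parameters on each sub-set independently, and aggregate the per-subset choices. In the sketch below, kernel ridge regression stands in for SVR so that only NumPy is needed; the parameter grids, the train/validation split, and the median aggregation are illustrative choices, not the paper's method.

```python
import numpy as np

def rbf(X, Z, gamma):
    """RBF (Gaussian) kernel matrix between two point sets."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def val_error(Xtr, ytr, Xva, yva, gamma, lam):
    """Kernel ridge fit on (Xtr, ytr), mean squared error on (Xva, yva)."""
    K = rbf(Xtr, Xtr, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(Xtr)), ytr)
    pred = rbf(Xva, Xtr, gamma) @ alpha
    return np.mean((pred - yva) ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(1200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 1200)

# Tune (gamma, lambda) on each of 6 sub-sets instead of the full data.
chosen = []
for idx in np.array_split(rng.permutation(len(X)), 6):
    tr, va = idx[:150], idx[150:]
    best = min(((g, l) for g in (0.1, 1.0, 10.0) for l in (1e-3, 1e-1)),
               key=lambda p: val_error(X[tr], y[tr], X[va], y[va], *p))
    chosen.append(best)

# Aggregate the per-subset choices (here: the median).
gamma_med = float(np.median([g for g, _ in chosen]))
print(gamma_med)
```

Each tuning run now touches only a fraction of the data, which is the source of the speed-up for large-scale problems.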

A Cell-based Clustering Method for Large High-dimensional Data in Data Mining (데이타마이닝에서 고차원 대용량 데이타를 위한 셀-기반 클러스터 링 방법)

  • Jin, Du-Seok;Chang, Jae-Woo
    • Journal of KIISE:Databases
    • /
    • v.28 no.4
    • /
    • pp.558-567
    • /
    • 2001
  • Recently, data mining applications have required large amounts of high-dimensional data. Most algorithms for data mining applications, however, do not work efficiently on high-dimensional large data because of the so-called curse of dimensionality [1] and the limitation of available memory. To overcome these problems, this paper proposes a new cell-based clustering method that is more efficient than the existing algorithms for high-dimensional large data. Our clustering method provides a cell construction algorithm for dealing with high-dimensional large data and an index structure based on filtering. We compare the performance of our cell-based clustering method with that of the CLIQUE method in terms of clustering time, precision, and retrieval time. The experimental results show that our cell-based clustering method outperforms the CLIQUE method.
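A minimal sketch of the grid idea behind cell-based clustering (in the spirit of methods like CLIQUE, not the paper's algorithm): hash each point into a grid cell, keep cells whose point count exceeds a density threshold, and connect face-adjacent dense cells into clusters. The bin count and the threshold below are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def cell_clusters(X, n_bins=10, density_threshold=5):
    """Grid each dimension, keep dense cells, and merge face-adjacent
    dense cells into clusters (returned as lists of point indices)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    width = np.where(hi > lo, (hi - lo) / n_bins, 1.0)

    cells = defaultdict(list)                 # cell key -> point indices
    for i, x in enumerate(X):
        key = tuple(np.minimum(((x - lo) / width).astype(int), n_bins - 1))
        cells[key].append(i)
    dense = {k for k, idx in cells.items() if len(idx) >= density_threshold}

    clusters, seen = [], set()                # union face-adjacent cells
    for start in dense:
        if start in seen:
            continue
        group, stack = [], [start]
        seen.add(start)
        while stack:
            c = stack.pop()
            group.extend(cells[c])
            for d in range(len(c)):           # neighbours along each axis
                for step in (-1, 1):
                    nb = c[:d] + (c[d] + step,) + c[d + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(group)
    return clusters

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))])
print(len(cell_clusters(X)))  # two well-separated blobs
```

Because each point is hashed to one cell, clustering touches each point once, which is what makes grid methods attractive for large data; the curse of dimensionality shows up as the exponential growth in the number of possible cells.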

An XPDL-Based Workflow Control-Structure and Data-Sequence Analyzer

  • Kim, Kwanghoon Pio
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.3
    • /
    • pp.1702-1721
    • /
    • 2019
  • A workflow process (or business process) management system helps to define, execute, monitor, and manage workflow models deployed on a workflow-supported enterprise, and the system is, in general, compartmentalized into a modeling subsystem and an enacting subsystem. The modeling subsystem's functionality is to discover and analyze workflow models via a theoretical modeling methodology like ICN, to graphically define them via a graphical representation notation like BPMN, and to systematically deploy those graphically defined models onto the enacting subsystem by transforming them into textual models represented by a standardized workflow process definition language like XPDL. Before deploying the defined workflow models, it is very important to inspect their syntactical correctness as well as their structural properness, to minimize the loss of effectiveness and the depreciation of efficiency in managing the corresponding workflow models. In this paper, we are particularly interested in verifying very large-scale and massively parallel workflow models, and so we need a sophisticated analyzer to automatically analyze these specialized and complex styles of workflow models. The analyzer devised in this paper is able to analyze not only structural complexity but also data-sequence complexity. The structural complexity is based upon combinational usages of control-structure constructs such as subprocesses, exclusive-OR, parallel-AND, and iterative-LOOP primitives, with preserving matched pairing and proper nesting properties, whereas the data-sequence complexity is based upon combinational usages of the relevant data repositories, such as data definition sequences and data use sequences. 
Through the analyzer devised and implemented in this paper, we are eventually able to achieve systematic verification of the syntactical correctness as well as effective validation of the structural properness of these complicated and large-scale styles of workflow models. As an experimental study, we apply the implemented analyzer to an exemplary large-scale and massively parallel workflow process model, the Large Bank Transaction Workflow Process Model, and show the structural complexity analysis results via a series of operational screens captured from the implemented analyzer.
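The matched-pairing and proper-nesting property described above can be sketched with a simple stack check: split and loop constructs are treated as open/close tokens and verified the way balanced parentheses are. The token names below are illustrative placeholders, not the XPDL schema, and the sketch is not the paper's implementation.

```python
# Open tokens and the close token each must be matched by.
OPEN = {"AND-split": "AND-join", "XOR-split": "XOR-join",
        "LOOP-begin": "LOOP-end", "SUB-begin": "SUB-end"}
CLOSE = set(OPEN.values())

def well_nested(tokens):
    """True iff every split/begin is closed by its own join/end,
    in properly nested (last-opened, first-closed) order."""
    stack = []
    for t in tokens:
        if t in OPEN:
            stack.append(OPEN[t])          # remember the required close
        elif t in CLOSE:
            if not stack or stack.pop() != t:
                return False               # mismatched or extra close
    return not stack                       # no unclosed constructs left

print(well_nested(["AND-split", "XOR-split", "XOR-join", "AND-join"]))  # True
print(well_nested(["AND-split", "XOR-split", "AND-join", "XOR-join"]))  # False: crossed pairs
```

A real analyzer would walk the XPDL activity graph rather than a token list, but the invariant it enforces is the same.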

An Experimental Study on the Thermal Performance Measurement of a Large Diameter Borehole Heat Exchanger (LD-BHE) with Triple-U Pipe Spacers Applied (3중관용 스페이서를 적용한 대구경 지중열교환기의 성능측정에 관한 연구)

  • Lee, Sang-Hoon;Park, Jong-Woo;Lim, Kyoung-Bin
    • The Korean Society for New and Renewable Energy: Conference Proceedings
    • /
    • 2009.11a
    • /
    • pp.581-586
    • /
    • 2009
  • Knowledge of ground thermal properties is most important for the proper design of large-scale BHE (borehole heat exchanger) systems. The type, pipe size, and thermal performance of the BHE strongly influence the efficiency and construction cost of the ground-source heat pump system. Thermal response tests with mobile measurement devices were developed primarily for in-situ determination of design data for a large-diameter BHE with triple-U pipes and spacers applied. The main purpose has been to determine in-situ values of the effective ground thermal conductivity and thermal resistance, including the effect of groundwater flow and natural convection in the boreholes. The test rig is set up on a trailer and contains a circulation pump, an inline heater, temperature sensors, a flow meter, a power analysis meter, and a data logger for recording the temperature and fluid flow data. A constant heat power is injected into the borehole through the triple-U pipe system of the test rig, and the resulting temperature change in the borehole is recorded. The recorded temperature data are analysed with a line-source model, which gives the effective in-situ values of the rock thermal conductivity and the borehole thermal resistance of the large-diameter BHE with spacers applied.
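The line-source analysis mentioned above can be sketched in a few lines: at late times the mean fluid temperature grows linearly in ln(t), and the effective ground conductivity follows from the slope via k = q / (4π · slope). This is a generic illustration, not the authors' analysis: the heat rate q, the simulated conductivity, and the time window are assumed values, and the data are noiseless so the fit recovers the conductivity exactly.

```python
import numpy as np

q = 50.0                      # heat injection rate per borehole length, W/m (assumed)
k_true = 2.5                  # ground conductivity, W/(m K), used only to simulate data
t = np.linspace(10, 72, 200) * 3600.0        # 10 h .. 72 h, in seconds

# Late-time line-source model: T(t) = T0 + (q / (4 pi k)) * ln(t)
slope_true = q / (4.0 * np.pi * k_true)
T = 12.0 + slope_true * np.log(t)            # noiseless synthetic record

# Fit a straight line in ln(t) and invert the slope for conductivity.
slope, _ = np.polyfit(np.log(t), T, 1)
k_est = q / (4.0 * np.pi * slope)
print(round(k_est, 2))  # → 2.5
```

With real test data the early hours are discarded (the line-source approximation holds only at late times) and the residual intercept carries the borehole thermal resistance.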

Efficient Continuous Skyline Query Processing Scheme over Large Dynamic Data Sets

  • Li, He;Yoo, Jaesoo
    • ETRI Journal
    • /
    • v.38 no.6
    • /
    • pp.1197-1206
    • /
    • 2016
  • Performing continuous skyline queries over dynamic data sets has become more challenging as data sets grow and become more volatile due to frequent dynamic updates. Although previous work proposed support for such queries, their efficiency was restricted to small or uniformly distributed data sets. In a production database with many concurrent queries, executing continuous skyline queries degrades query performance because updates must acquire exclusive locks, possibly blocking other query threads, and so the computational costs increase. To minimize the computational requirements, we propose a method based on a multi-layer grid structure. First, the relational data objects, the elements of the initial data set, are processed to obtain the corresponding multi-layer grid structure and the skyline influence regions over the data. Then, the dynamic data are processed only when they are identified within the skyline influence regions. Therefore, a large amount of computation can be pruned by adopting the proposed multi-layer grid structure. A performance evaluation using a variety of data sets confirms the efficiency of the proposed method.
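The dominance relation underlying skyline queries can be stated in a few lines. The sketch below (assuming minimization on every attribute, with illustrative points) computes a skyline by brute force; it shows what the multi-layer grid structure prunes, not how the proposed method works.

```python
def dominates(p, q):
    """p dominates q if p is no worse on every attribute and
    strictly better on at least one (minimization on all attributes)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """A point survives iff no other point dominates it (O(n^2) baseline)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1, 9), (3, 3), (5, 5), (9, 1), (4, 4)]
print(skyline(pts))  # → [(1, 9), (3, 3), (9, 1)]
```

A dynamic update only matters if it lands where it could dominate or join the current skyline, which is exactly the "skyline influence region" intuition: points outside those regions can be ignored without recomputation.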

The uniform laws of large numbers for the chaotic logistic map

  • Bae, Jongsig;Hwang, Changha;Jun, Doobae
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.6
    • /
    • pp.1565-1571
    • /
    • 2017
  • The standard logistic map is an iterative function that forms a discrete-time dynamical system. The chaotic logistic map is an ergodic map defined on the unit interval. In this paper we study the limiting behaviors of several processes induced by the chaotic logistic map. We derive the law of large numbers for the process induced by the chaotic logistic map, and we also derive the uniform law of large numbers for this process. In deriving the uniform law of large numbers, we study the role of bracketing for the indexed class of functions associated with the process, applying the idea of DeHardt (1971) for the bracketing method to the process induced by the logistic map. We finally illustrate an application to Monte Carlo integration.
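The law of large numbers for the logistic map can be illustrated numerically. The sketch below (an illustration, not the paper's example) iterates x_{n+1} = 4 x_n (1 - x_n) and averages f(x) = x along the orbit; under the invariant arcsine density 1/(π√(x(1−x))) the space average of x is 1/2, so the time average should approach 0.5, which is the Monte Carlo integration idea.

```python
# Time average of f(x) = x along one orbit of the chaotic logistic map.
# By ergodicity this converges to the integral of x against the
# invariant arcsine density on [0, 1], which equals 1/2 by symmetry.
x, total, n = 0.3, 0.0, 100_000   # seed 0.3 is an arbitrary generic start
for _ in range(n):
    x = 4.0 * x * (1.0 - x)
    total += x
print(round(total / n, 2))  # close to 0.5
```

Replacing f(x) = x with another integrand gives a Monte Carlo estimate of its integral against the arcsine density, using the deterministic chaotic orbit in place of random samples.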

Large Sample Test for Independence in the Bivariate Pareto Model with Censored Data

  • Cho, Jang-Sik;Lee, Jea-Man;Lee, Woo-Dong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.2
    • /
    • pp.377-383
    • /
    • 2003
  • In this paper, we consider a two-component system in which the lifetimes follow the bivariate Pareto model with randomly censored data. We assume that the censoring time is independent of the lifetimes of the two components. We develop large-sample tests for independence between the two components. We also present a simulation study of the test based on the asymptotic normal distribution.
