• Title/Summary/Keyword: methods of data analysis


A Comparison of the Methods for Estimating the Missing Precipitation Values Ungauged (미계측 결측 강수자료 보완 방법의 비교)

  • Yoo, Ju-Hwan;Choi, Yong-Joon;Jung, Kwan-Sue
    • Proceedings of the Korea Water Resources Association Conference / 2009.05a / pp.1427-1430 / 2009
  • The amount and continuity of the precipitation data used in a hydrological analysis can strongly affect the reliability of the analysis. Estimating data missing because of, for example, a breakdown of the rainfall recorder, or extending a short rainfall record, is therefore a fundamental step. In this study, eight widely used estimation methods are compared. The data used are the annual precipitation totals over 17 years at the Cheolwon station, which include an ungauged period of 15 years, and at its five surrounding stations. Using the method judged best in the comparison, the ungauged precipitation values at the Cheolwon station are estimated and the areal average of annual precipitation over 32 years for the Han River basin is calculated.
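
One classic estimator for this task, the normal ratio method, scales each neighbor's observation by the ratio of long-term normals before averaging; a minimal sketch follows, with illustrative station values rather than the study's data (the paper compares eight methods, which are not named in the abstract):

```python
# Sketch of the normal ratio method for filling a missing annual
# precipitation value at a target station from surrounding stations.
# All numbers are illustrative (mm/year), not the study's data.

def normal_ratio(target_normal, neighbor_normals, neighbor_values):
    """Estimate the missing value at the target station as the mean of
    neighbor observations scaled by the ratio of long-term normals."""
    n = len(neighbor_values)
    return sum(
        (target_normal / nn) * pv
        for nn, pv in zip(neighbor_normals, neighbor_values)
    ) / n

target_normal = 1300.0                    # long-term mean at the ungauged station
neighbor_normals = [1200.0, 1250.0, 1400.0]
neighbor_values = [1100.0, 1300.0, 1500.0]  # observations in the missing year

estimate = normal_ratio(target_normal, neighbor_normals, neighbor_values)
```

The scaling by normals corrects for systematic wetness differences between stations, which is why this method is usually preferred over a plain arithmetic mean when normals differ by more than about 10%.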


Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.383-392 / 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only for regression and related methods but also for various multivariate methods. Because the results of a multivariate analysis can depend heavily on a small number of observations, conclusions must be drawn with care. Similar phenomena may also occur with incomplete data. In this research we study such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is examined with a numerical example.
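
The general idea of influence analysis for PCA can be sketched with a leave-one-out perturbation of the first principal component; the mean imputation below is only a simplification to make PCA computable on incomplete data, not the paper's method, and the data are synthetic:

```python
import numpy as np

# Sketch: flag influential observations in PCA of incomplete data via
# leave-one-out change in the first principal component direction.
# Mean imputation is a stand-in to handle the missing entries.

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
X[5, 2] = np.nan                      # incomplete entries
X[12, 0] = np.nan
X[0] = [8.0, -8.0, 8.0, -8.0]         # a deliberately extreme row

col_means = np.nanmean(X, axis=0)
Xi = np.where(np.isnan(X), col_means, X)   # mean-imputed matrix

def first_pc(M):
    M = M - M.mean(axis=0)
    _, _, vt = np.linalg.svd(M, full_matrices=False)
    return vt[0]

v_full = first_pc(Xi)
influence = np.empty(len(Xi))
for i in range(len(Xi)):
    v_i = first_pc(np.delete(Xi, i, axis=0))
    # angle-based influence; the sign of a PC is arbitrary, hence abs()
    influence[i] = 1.0 - abs(v_full @ v_i)

most_influential = int(np.argmax(influence))
```

Deleting the extreme row rotates the leading component far more than deleting any ordinary row, which is exactly the behavior an influence diagnostic is meant to expose.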


A Visualization System for Multiple Heterogeneous Network Security Data and Fusion Analysis

  • Zhang, Sheng;Shi, Ronghua;Zhao, Jue
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2801-2816 / 2016
  • Owing to their low scalability, weak support for big data, insufficient collaborative data analysis, and inadequate situational awareness, traditional methods fail to meet the needs of security data analysis. This paper proposes visualization methods to fuse multi-source security data and grasp the network situation. First, data sources are classified by their collection positions, with security data drawn from three different layers. Second, a heatmap is adopted to show host status, a treemap is used to visualize Netflow logs, and a radial node-link diagram is employed to express IPS logs. Finally, a labeled treemap is introduced for data-level fusion, and time-series features are extracted for feature-level fusion. Comparative analyses with prize-winning works show that this method offers substantial advantages for network analysts: it facilitates data feature fusion and supports a unified, convenient, and accurate understanding of the network security situation.
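
The feature-level fusion step can be sketched as reducing heterogeneous log streams to aligned per-window counts and concatenating them; the record layouts below are illustrative stand-ins, not the paper's schema:

```python
from collections import Counter

# Sketch of feature-level fusion: reduce two heterogeneous log streams
# (Netflow-like and IPS-like records) to aligned per-minute event
# counts, then zip them into one feature vector per time window.

netflow = [  # (timestamp_minute, src_ip) -- illustrative records
    (0, "10.0.0.1"), (0, "10.0.0.2"), (1, "10.0.0.1"), (2, "10.0.0.3"),
]
ips_alerts = [  # (timestamp_minute, severity) -- illustrative records
    (0, "high"), (2, "low"), (2, "high"),
]

def counts_per_minute(records, horizon):
    """Count events falling into each minute of the observation window."""
    c = Counter(t for t, _ in records)
    return [c.get(m, 0) for m in range(horizon)]

horizon = 3
flow_series = counts_per_minute(netflow, horizon)
alert_series = counts_per_minute(ips_alerts, horizon)

# One fused feature vector per minute: (flow count, alert count)
fused = list(zip(flow_series, alert_series))
```

Once the sources share a common time axis, any visualization (heatmap, treemap, node-link diagram) can be driven from the same fused table.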

Mineral Resources Potential Mapping using GIS-based Data Integration

  • Lee Hong-Jin;Chi Kwang-Hoon;Park Maeng-Eon
    • Proceedings of the KSRS Conference / 2004.10a / pp.662-663 / 2004
  • In general, mineral resources prospecting is performed with several methods, including geological survey, geological structure analysis, geochemical exploration, airborne geophysical exploration, and remote sensing, but the data collected through these methods are usually analyzed separately rather than integrated. We therefore compared various data integration techniques and generated a final mineral resources potential map.
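
One simple integration technique of the kind such comparisons include is a weighted linear overlay of evidence layers on a common grid; the layers, weights, and values below are illustrative, not the paper's:

```python
import numpy as np

# Sketch of a weighted linear overlay: each evidence layer is
# normalized to [0, 1] on a common grid, then combined with expert
# weights into a single potential map. All values are illustrative.

geochem   = np.array([[0.2, 0.8], [0.5, 0.9]])  # geochemical evidence
geophys   = np.array([[0.1, 0.7], [0.4, 0.6]])  # airborne geophysics
structure = np.array([[0.3, 0.9], [0.2, 0.7]])  # structural evidence

weights = {"geochem": 0.5, "geophys": 0.3, "structure": 0.2}

potential = (weights["geochem"] * geochem
             + weights["geophys"] * geophys
             + weights["structure"] * structure)

# Grid cell with the highest integrated potential
best_cell = np.unravel_index(np.argmax(potential), potential.shape)
```

Alternatives compared in this literature (e.g. weights-of-evidence, fuzzy logic) differ mainly in how the per-layer scores and the combination rule are derived; the overlay above is the simplest baseline.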


Comparison of Five Single Imputation Methods in General Missing Pattern

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.945-955 / 2004
  • Complete-case analysis is easy to carry out and may be acceptable with a small amount of missing data. In general, however, it is not recommended, because the resulting estimates are usually biased and inefficient. There are numerous alternatives to complete-case analysis; one is single imputation. Some of the most common single imputation methods are reviewed and their performances are compared by simulation studies.
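
Two of the most common single-imputation methods, mean and median imputation, can be contrasted in a few lines; the data are illustrative (the paper's simulation covers five methods under a general missing pattern):

```python
import statistics

# Sketch: mean vs. median single imputation on one variable.
# An outlier in the observed values shows why the two fills differ.

data = [2.0, None, 3.0, 4.0, None, 100.0]   # None marks a missing entry
observed = [x for x in data if x is not None]

mean_fill = statistics.mean(observed)       # pulled up by the outlier
median_fill = statistics.median(observed)   # robust to the outlier

mean_imputed = [mean_fill if x is None else x for x in data]
median_imputed = [median_fill if x is None else x for x in data]
```

Both methods shrink the variable's variance (every filled value is identical), which is one of the biases the paper's simulations quantify.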


An Investigation of a Sensibility Evaluation Method Using Big Data in the Field of Design -Focusing on Hanbok Related Design Factors, Sensibility Responses, and Evaluation Terms- (디자인 분야에서 빅데이터를 활용한 감성평가방법 모색 -한복 연관 디자인 요소, 감성적 반응, 평가어휘를 중심으로-)

  • An, Hyosun;Lee, Inseong
    • Journal of the Korean Society of Clothing and Textiles / v.40 no.6 / pp.1034-1044 / 2016
  • This study seeks a method to objectively evaluate sensibility based on Big Data in the field of design. To this end, it examined the public's sensibility responses to design factors through a network analysis of texts posted on social media. Hanbok, the formal clothing that represents Korea, was selected as the subject of the research. We collected 47,677 keywords related to Hanbok from 12,000 posts on Naver blogs dated January 1st to December 31st, 2015, and analyzed them using Social Matrix (a Big Data analysis software) rather than conventional survey methods. We then derived 56 keywords related to design elements and sensibility responses of Hanbok. Centrality analysis and CONCOR analysis were conducted using Ucinet6. The visualization of the network text analysis allowed the main design factors of Hanbok to be categorized together with evaluation terms expressing positive, negative, and neutral sensibility responses. The key evaluation factors derived for Hanbok were fitting, rationality, trend, and uniqueness. The evaluation terms, extracted with natural language processing of unstructured data, are valid as an evaluation scale and are expected to be suitable for a sensibility evaluation index that supplements the limits of previous surveys and statistical analyses. The network text analysis method used in this study provides new guidelines for using Big Data in sensibility evaluation methods in the field of design.
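
The core network-text-analysis step, building a keyword co-occurrence network from posts and ranking keywords by centrality, can be sketched as follows; the posts and keywords below are illustrative stand-ins for the blog corpus, and degree centrality stands in for the centrality measures Ucinet6 provides:

```python
from collections import Counter
from itertools import combinations

# Sketch: keyword co-occurrence network from posts, ranked by degree
# centrality (number of distinct co-occurring keywords).

posts = [  # illustrative keyword sets, one per post
    ["hanbok", "traditional", "wedding"],
    ["hanbok", "modern", "fitting"],
    ["hanbok", "wedding", "fitting"],
    ["modern", "design"],
]

# Undirected edges: every pair of keywords appearing in the same post
edges = Counter()
for keywords in posts:
    for a, b in combinations(sorted(set(keywords)), 2):
        edges[(a, b)] += 1

# Degree centrality: count each node's distinct neighbors
degree = Counter()
for (a, b), _count in edges.items():
    degree[a] += 1
    degree[b] += 1

top_keyword, top_degree = degree.most_common(1)[0]
```

On a real corpus the edge weights (co-occurrence counts) would also feed weighted centrality and block-modeling steps such as CONCOR.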

The Comparison of Singular Value Decomposition and Spectral Decomposition

  • Shin, Yang-Gyu
    • Journal of the Korean Data and Information Science Society / v.18 no.4 / pp.1135-1143 / 2007
  • The singular value decomposition and the spectral decomposition are useful matrix computation methods for multivariate techniques such as principal component analysis and multidimensional scaling, which aim to find a simpler geometric structure for the data points. In this paper, the two decompositions are compared.
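
The basic relationship between the two decompositions is that the right singular vectors of a real matrix A are eigenvectors of the symmetric matrix AᵀA, with singular values equal to the square roots of its eigenvalues; a small numerical check, on an illustrative matrix:

```python
import numpy as np

# SVD of A versus the spectral (eigen) decomposition of A.T @ A.

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # singular values, descending
eigvals, eigvecs = np.linalg.eigh(A.T @ A)         # eigenvalues, ascending

# Square roots of the eigenvalues of A.T A equal the singular values of A
sqrt_eigs_desc = np.sqrt(eigvals[::-1])
```

The spectral decomposition applies only to symmetric (more generally, normal) matrices, while the SVD exists for any rectangular matrix, which is why the SVD is the more general tool in these multivariate techniques.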


Semiparametric accelerated failure time model for the analysis of right censored data

  • Jin, Zhezhen
    • Communications for Statistical Applications and Methods / v.23 no.6 / pp.467-478 / 2016
  • The accelerated failure time model, or accelerated life model, relates the logarithm of the failure time linearly to the covariates, so its parameters have a direct interpretation. In this paper, we review some newly developed, practically useful estimation and inference methods for the model in the analysis of right censored data.
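
The model's structure, log(T) = β₀ + β₁x + ε, can be illustrated with simulated data; for illustration only, β is recovered here by least squares on fully observed log-times, since the whole point of the estimators the paper reviews is to handle the right censoring that makes this naive fit invalid:

```python
import numpy as np

# Sketch of the accelerated failure time (AFT) model structure:
# log(T) = beta0 + beta1 * x + error, fitted here WITHOUT censoring.

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 1.0, n)             # a single covariate
beta0, beta1 = 1.0, -2.0                 # true parameters
log_t = beta0 + beta1 * x + rng.normal(scale=0.1, size=n)

# Ordinary least squares on the uncensored log failure times
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, log_t, rcond=None)
```

Under right censoring, only min(T, C) and a censoring indicator are observed, and rank-based or least-squares-type estimators adapted to censoring (the subject of the review) replace the plain regression above.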

Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

  • Kadarmideen, Haja N.;Ilahi, H.
    • Asian-Australasian Journal of Animal Sciences / v.18 no.8 / pp.1088-1097 / 2005
  • The main objectives of this study were to investigate the accuracy, bias, and power of linear and threshold model segregation analysis methods for detecting major genes affecting categorical traits in farm animals. Maximum Likelihood Linear Model (MLLM), Bayesian Linear Model (BALM), and Bayesian Threshold Model (BATM) methods were applied to simulated data on normal, categorical, and binary scales, as well as to disease data in pigs. Simulated data on the underlying normally distributed liability (NDL) were used to create the categorical and binary data. The MLLM method was applied to data on all scales (normal, categorical, and binary); the BATM method was developed and applied only to binary data. The MLLM analyses underestimated parameters for binary as well as categorical traits compared to normal traits, with the bias being very severe for binary traits. The accuracy of major gene and polygene parameter estimates was also very low for binary data compared with categorical data; the latter gave results similar to normal data. When disease incidence (on the binary scale) is close to 50%, segregation analysis is more accurate and less biased than for diseases with rare incidence. NDL data were always better than categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently and unusually high (while the opposite is expected, owing to the loss of information in categorical data), indicating high false discovery rates for major genes if linear models are applied to categorical traits. With Bayesian segregation analysis, the 95% highest probability density regions of the major gene variances were checked for inclusion of zero (a boundary parameter); by the nature of this difference between likelihood and Bayesian approaches, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in accuracy.
Based on these results, threshold models are recommended when the trait distributions are discontinuous. Further, segregation analysis could be used in an initial scan of the data for evidence of major genes before embarking on molecular genome mapping.
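
The liability-threshold idea underlying the threshold model, and the incidence effect the study reports, can be sketched by thresholding a simulated normal liability; thresholds and sample size are illustrative:

```python
import numpy as np

# Sketch of the liability-threshold model: a normally distributed
# liability (NDL) is observed only as a binary trait via a fixed
# threshold. The information lost in this discretization is what
# biases linear-model analysis of binary traits.

rng = np.random.default_rng(42)
liability = rng.normal(size=10_000)      # underlying NDL variable

threshold = 0.0                          # incidence close to 50%
binary = (liability > threshold).astype(int)
incidence = binary.mean()

rare_threshold = 2.0                     # a rare "disease"
rare = (liability > rare_threshold).astype(int)
rare_incidence = rare.mean()
```

At 50% incidence the binary trait retains the most information about the liability; at rare incidence almost all observations fall in one class, matching the study's finding that segregation analysis is more accurate and less biased near 50% incidence.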

Review of Data-Driven Multivariate and Multiscale Methods

  • Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing / v.4 no.2 / pp.89-96 / 2015
  • In this paper, two time-frequency analysis algorithms, empirical mode decomposition and local mean decomposition, are reviewed, and their applications to nonlinear and nonstationary real-world data are discussed. In addition, their generic extensions to the complex domain are addressed for the analysis of multichannel data. Simulations of these algorithms on synthetic data illustrate their fundamental structure and how they are designed for the analysis of nonlinear and nonstationary data. Applications of the complex versions of the algorithms to synthetic data also demonstrate their benefit for the accurate frequency decomposition of multichannel data.
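
The core operation of empirical mode decomposition, one sifting step, subtracts the mean of the upper and lower extrema envelopes from the signal; the sketch below uses linear interpolation for the envelopes purely to stay self-contained (real EMD uses cubic splines and iterates to a stopping criterion), and the signal is synthetic:

```python
import numpy as np

# One sifting step of empirical mode decomposition on a synthetic
# two-component signal: fast 20 Hz oscillation plus slow 2 Hz trend.

t = np.linspace(0.0, 1.0, 500)
signal = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 2 * t)

def extrema_indices(x, comparator):
    """Indices of strict local extrema (interior points only)."""
    idx = np.where(comparator(x[1:-1], x[:-2])
                   & comparator(x[1:-1], x[2:]))[0] + 1
    return idx

maxima = extrema_indices(signal, np.greater)
minima = extrema_indices(signal, np.less)

# Linear-interpolation envelopes (cubic splines in real EMD)
grid = np.arange(len(signal))
upper = np.interp(grid, maxima, signal[maxima])
lower = np.interp(grid, minima, signal[minima])
local_mean = (upper + lower) / 2.0

# One sifting step toward the first intrinsic mode function (IMF)
candidate_imf = signal - local_mean
```

After this step the local mean tracks the slow 2 Hz component while the candidate IMF carries the fast oscillation, which is how EMD separates scales without a fixed basis.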