• Title/Summary/Keyword: Statistics analysis

On Sensitivity Analysis in Principal Component Regression

  • Kim, Soon-Kwi;Park, Sung H.
    • Journal of the Korean Statistical Society / v.20 no.2 / pp.177-190 / 1991
  • In this paper, we discuss and review various measures that have been proposed for studying outliers, high-leverage points, and influential observations when principal component regression is adopted. We suggest several diagnostic measures for use with principal component regression. A numerical example is given for illustration. Some individual data points may be flagged as outliers, high-leverage points, or influential points.

Performance Comparison of Classification Methods with the Combinations of the Imputation and Gene Selection Methods

  • Kim, Dong-Uk;Nam, Jin-Hyun;Hong, Kyung-Ha
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1103-1113 / 2011
  • Gene expression data is obtained through many stages of an experiment, and errors produced during the process may cause missing values. Because the data has the so-called 'small n, large p' structure, genes have to be selected before statistical analysis such as classification. For this reason, imputation and gene selection are important in microarray data analysis. In the literature, imputation, gene selection, and classification analysis have been studied separately. In practice, however, they are applied sequentially. We therefore compare the performance of classification methods after imputation and gene selection methods are applied to microarray data. Numerical simulations are carried out to evaluate the classification methods under various combinations of imputation and gene selection methods.

Derivation and Application of Influence Function in Discriminant Analysis for Three Groups (세 집단 판별분석 상황에서의 영향함수 유도 및 그 응용)

  • Lee, Hae-Jung;Kim, Hong-Gie
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.941-949 / 2011
  • The influence function is used to develop criteria for detecting outliers in discriminant analysis. We derive the influence function of observations on the estimated misclassification probability in discriminant analysis for three groups. The proposed measures are applied to facial image data to identify outliers, and the discriminant analysis is redone with the outliers excluded. The study shows that the derived influence function is more efficient than the discriminant probability approach.

A Classification Method Using Data Reduction

  • Uhm, Daiho;Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.1 / pp.1-5 / 2012
  • Data reduction has been widely used in data mining to make analysis more convenient. Principal component analysis (PCA) and factor analysis (FA) are popular techniques. PCA and FA reduce the number of variables to avoid the curse of dimensionality, under which computing time grows exponentially with the number of variables. Many methods have therefore been published for dimension reduction. Data augmentation is another approach to analyzing data efficiently. The support vector machine (SVM) algorithm is a representative technique for dimension augmentation: it maps the original data to a high-dimensional feature space to obtain the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method using data reduction. We carry out comparative experiments to verify the performance of the proposed method.

Sensitivity Analysis in Latent Root Regression

  • Shin, Jae-Kyoung;Tomoyuki Tarumi;Yutaka Tanaka
    • Communications for Statistical Applications and Methods / v.1 no.1 / pp.102-111 / 1994
  • We propose a method of sensitivity analysis in latent root regression analysis (LRRA). For this purpose we derive the quantities $\hat{\beta}_{LRR}^{(1)}$, which correspond to the theoretical influence function $I(x, y;\, \hat{\beta}_{LRR})$ for the regression coefficient $\hat{\beta}_{LRR}$ based on LRRA. We give a numerical example for illustration and also investigate numerically the relationship between the estimated values of $\hat{\beta}_{LRR}^{(1)}$ and the values of another measure, the sample influence curve (SIC), which is based on recomputation for the data with a single observation deleted. We also discuss the comparison among the results of LRRA, ordinary least squares regression analysis (OLSRA), and ridge regression analysis (RRA).
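
The single-observation-deletion idea behind the sample influence curve can be sketched in a few lines. This is only an illustration for the slope of a simple least-squares line, not the LRRA estimator of the paper. The function names `ols_slope` and `sample_influence_curve`, the toy data, and the convention SIC_i = (n - 1)(estimate on all data - estimate with case i deleted) are assumptions made for this example.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of y on x (simple regression with intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def sample_influence_curve(xs, ys):
    """SIC_i = (n - 1) * (estimate on all data - estimate with case i deleted)."""
    n = len(xs)
    full = ols_slope(xs, ys)
    sic = []
    for i in range(n):
        # Recompute the estimate with the i-th observation removed.
        xs_del = xs[:i] + xs[i + 1:]
        ys_del = ys[:i] + ys[i + 1:]
        sic.append((n - 1) * (full - ols_slope(xs_del, ys_del)))
    return sic

# Hypothetical data: the first five points lie on y = x, the last one does not.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
sic = sample_influence_curve(xs, ys)

# Index of the observation whose deletion changes the slope the most.
most_influential = max(range(len(sic)), key=lambda i: abs(sic[i]))
print(most_influential, sic)
```

Each SIC value requires refitting the model once, which is why an analytic influence function is attractive as a cheaper approximation when the estimator is expensive to recompute.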

Beta Processes and Survival Analysis (베타과정과 베이지안 생존분석)

  • Kim, Yongdai;Chae, Minwoo
    • The Korean Journal of Applied Statistics / v.27 no.6 / pp.891-907 / 2014
  • This article is concerned with one of the most important prior distributions for Bayesian analysis of survival and event history data, the beta process proposed by Hjort (1990). We review the current state of the art of beta processes and their application to survival analysis. The methodological and practical areas of research that we touch on include constructions, posterior distributions, large-sample properties, Bayesian computations, and mixtures of beta processes.

Factor analysis of the trend of stream quality in Nakdong River

  • Kim, Kyong-Mu;Lee, In-Rak;Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1201-1210 / 2008
  • The goal of this paper is to investigate the trend of stream quality and the quality of water in the Nakdong River by factor analysis. It uses fourteen monthly time series, including pH, BOD, COD, SS, and TN, from thirty-four measurement points on the Nakdong River from Jan. 1998 to Dec. 2006. The factor analysis shows that factor 1, which reflects organic water pollution (BOD, COD, TN, and EC), accounts for 29.288%, and factor 2, which reflects sewage and seasonal variation (SS), accounts for 16.467%.
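
The two percentages in the abstract above are most naturally read as shares of total variance explained by each factor. As a rough sketch under that reading, and further assuming the analysis was run on the fourteen standardized variables (the abstract does not say so), the shares can be converted to implied eigenvalues and a cumulative share. The variable names are illustrative only.

```python
# Reported shares for the first two factors (taken from the abstract).
shares_percent = [29.288, 16.467]
n_variables = 14  # fourteen monitored series; standardization is an assumption

# With standardized variables, total variance equals the number of variables,
# so a share s corresponds to an eigenvalue of (s / 100) * n_variables.
implied_eigenvalues = [round(s / 100 * n_variables, 3) for s in shares_percent]

# Share of total variance covered by the two factors together.
cumulative_percent = round(sum(shares_percent), 3)

print(implied_eigenvalues)
print(cumulative_percent)
```

Under these assumptions, both implied eigenvalues exceed 1, and the two factors together leave more than half of the total variance unexplained.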

Blind Source Separation via Principal Component Analysis

  • Choi, Seung-Jin
    • Journal of KIEE / v.11 no.1 / pp.1-7 / 2001
  • Various methods for blind source separation (BSS) are based on independent component analysis (ICA), which can be viewed as a nonlinear extension of principal component analysis (PCA). Most existing ICA methods require certain nonlinear functions (which lead to higher-order statistics) that depend on the probability distributions of the sources, whereas PCA is a linear learning method based on second-order statistics. In this paper we show that PCA can be applied to the task of BSS, provided that the sources are spatially uncorrelated but temporally correlated. Since the resulting method is based only on second-order statistics, it avoids the nonlinear function and is able to separate mixtures of several colored Gaussian sources, in contrast to conventional ICA methods.

Statistical network analysis for epilepsy MEG data

  • Haeji Lee;Chun Kee Chung;Jaehee Kim
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.561-575 / 2023
  • Brain network analysis has attracted the interest of neuroscience researchers studying brain diseases. Magnetoencephalography (MEG) is especially well suited to analyzing functional connectivity because of its high temporal and spatial resolution. The application of graph theory to functional connectivity analysis has been studied widely, but research on network modeling for MEG remains limited. The temporal exponential random graph model (TERGM) accounts for temporal dependencies of networks. We performed brain network analysis, including static and temporal network statistics, on two groups of epilepsy patients who had the left (LT) or right (RT) part of the brain removed and on healthy controls (HC). We investigate network differences between epilepsy patients and healthy controls using multiset canonical correlation analysis (MCCA) and TERGM. The brain network of healthy controls had fewer temporal changes than those of the patient groups. In the TERGM results on the simulated networks, LT and RT had a less stable state than HC in the network connectivity structure, while HC had a stable brain network state.

A Study on the Performance Evaluation of the College-Entrance Processes (대학 입학전형별 학업성취도 연구)

  • Oh, Jung-Hyun;Jung, Jae-Yoon;Hong, Young-Hoon;Park, Sang-Gue;Kim, S.
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.987-996 / 2010
  • The goal of entrance examination models is to select promising students with potential who are suited to post-secondary education. Recently, a selection system based on admissions officers has played a major role in selecting students. Various statistical models and methods should be applied for better and more reasonable selection of promising Korean and international students. In this study, we apply appropriate statistical methods and present meaningful results on the performance evaluation of several entrance examination models for a university in Seoul, Korea.