• Title/Summary/Keyword: Performance-based Statistics

Modified Recursive PC (수정된 반복 주성분 분석 기법에 대한 연구)

  • Kim, Dong-Gyu;Kim, Ah-Hyoun;Kim, Hyun-Joong
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.963-977 / 2011
  • Principal component analysis (PCA) is a well-studied statistical technique and an important tool for handling multivariate data. Although many algorithms exist for PCA, most of them are unsuitable for real-time applications or high-dimensional problems. Since it is desirable to avoid extensive matrix operations in such cases, alternative methods are required to calculate the eigenvalues and eigenvectors of the sample covariance matrix. Erdogmus et al. (2004) proposed recursive PCA (RPCA), a fast adaptive online solution for PCA based on first-order perturbation theory. It facilitates the real-time implementation of PCA by recursively approximating the updated eigenvalues and eigenvectors. However, the performance of the RPCA method becomes questionable as the size of the newly added data increases. In this paper, we modify the RPCA method by taking advantage of the mathematical relation between the eigenvalues and eigenvectors of the sample covariance matrix. We compare the performance of the proposed algorithm with that of RPCA and find that the proposed method is remarkably more accurate.
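
The idea of approximating updated eigenpairs without a full decomposition can be illustrated with the textbook first-order perturbation formulas for a symmetric matrix. This is a minimal sketch, not the algorithm of Erdogmus et al. (2004) nor the modification proposed in the paper; the function name `perturb_eigs` and the example matrices are assumptions made for illustration, and the formulas assume well-separated eigenvalues.

```python
import numpy as np

def perturb_eigs(eigvals, eigvecs, delta):
    """First-order update of the eigenpairs of a symmetric matrix C
    after a small symmetric perturbation C -> C + delta.

    eigvals: shape (n,), eigvecs: shape (n, n) with eigenvectors as columns.
    Assumes the eigenvalues are distinct and well separated.
    """
    # proj[j, i] = v_j^T delta v_i (perturbation in the current eigenbasis)
    proj = eigvecs.T @ delta @ eigvecs
    # Eigenvalue correction: lambda_i + v_i^T delta v_i
    new_vals = eigvals + np.diag(proj)
    # gaps[j, i] = lambda_i - lambda_j; the diagonal is set to infinity
    # so that an eigenvector receives no correction along itself.
    gaps = eigvals[None, :] - eigvals[:, None]
    np.fill_diagonal(gaps, np.inf)
    coeffs = proj / gaps
    # v_i + sum_{j != i} coeffs[j, i] * v_j, then renormalise each column
    new_vecs = eigvecs + eigvecs @ coeffs
    new_vecs /= np.linalg.norm(new_vecs, axis=0)
    return new_vals, new_vecs

# Hypothetical example: a covariance-like matrix and a small update.
C = np.diag([1.0, 2.0, 4.0])
delta = 1e-3 * np.array([[1.0, 0.5, 0.2],
                         [0.5, -1.0, 0.3],
                         [0.2, 0.3, 0.5]])
vals, vecs = np.linalg.eigh(C)
approx_vals, approx_vecs = perturb_eigs(vals, vecs, delta)
exact_vals = np.linalg.eigvalsh(C + delta)
print(np.abs(approx_vals - exact_vals).max())  # error is second order in delta
```

Because the approximation error grows with the size of the perturbation, a sketch like this is only reasonable for small updates, which is consistent with the abstract's remark that accuracy suffers as more new data are added at once.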

A CUSUM Chart for Detecting Mean Shifts of Oscillating Pattern (진동 패턴의 평균 변화 탐지를 위한 누적합 관리도)

  • Lee, Jae-June;Kim, Duk-Rae;Lee, Jong-Seon
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1191-1201 / 2009
  • Cumulative sum (CUSUM) control charts are typically used for detecting small level shifts in process control. To control an autocorrelated process, model-based control methods can be employed, in which the residuals from fitting a time series model are applied to the CUSUM chart. However, persistent level shifts in the original process may lead to varying mean shifts in the residuals, which may significantly degrade detection performance. In this paper, focusing on the ARMA(1,1) model, we propose a new CUSUM-type control method that can effectively detect dynamic mean shifts in the residuals, especially those with an oscillating pattern. Through a simulation study, we evaluate its performance by comparing it with various other CUSUM-type control methods introduced so far.
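
For context, a standard two-sided tabular CUSUM applied to residuals can be sketched as follows. This is the conventional chart, not the method proposed in the paper; the function name `cusum_alarm` and the default reference value `k` and decision limit `h` are assumptions chosen for illustration.

```python
def cusum_alarm(residuals, k=0.5, h=5.0, target=0.0):
    """Return the index of the first CUSUM signal, or None if there is none.

    k is the reference (slack) value and h the decision limit, both in the
    same units as the residuals. These defaults are illustrative only.
    """
    upper = 0.0  # accumulates evidence of an upward mean shift
    lower = 0.0  # accumulates evidence of a downward mean shift
    for t, r in enumerate(residuals):
        upper = max(0.0, upper + (r - target) - k)
        lower = max(0.0, lower - (r - target) - k)
        if upper > h or lower > h:
            return t
    return None

# A sustained shift of +1.5 starting at index 10 is eventually flagged.
sustained = [0.0] * 10 + [1.5] * 10
# A shift of the same size that alternates in sign never accumulates.
oscillating = [1.5, -1.5] * 10

print(cusum_alarm(sustained))
print(cusum_alarm(oscillating))
```

In this sketch the alternating sequence resets each one-sided sum before it can grow, which gives some intuition for why oscillating mean shifts in residuals are hard for a conventional CUSUM chart to detect.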

Deep learning-based speech recognition for Korean elderly speech data including dementia patients (치매 환자를 포함한 한국 노인 음성 데이터 딥러닝 기반 음성인식)

  • Jeonghyeon Mun;Joonseo Kang;Kiwoong Kim;Jongbin Bae;Hyeonjun Lee;Changwon Lim
    • The Korean Journal of Applied Statistics / v.36 no.1 / pp.33-48 / 2023
  • In this paper, we consider automatic speech recognition (ASR) for Korean speech data in which elderly persons randomly speak a sequence of words, such as names of animals and vegetables, for one minute. Most of the speakers are over 60 years old, and some of them are dementia patients. The goal is to compare deep-learning-based ASR models for such data and to find models with good performance. ASR is a technology by which computers recognize spoken words and convert them into written text. Recently, many deep-learning models with good performance have been developed for ASR. Training data for such models mostly consist of sentences, and the speakers in the data are generally expected to pronounce words accurately. In our data, however, most of the speakers are over the age of 60 and often pronounce words incorrectly. Moreover, the data consist of Korean speech in which speakers randomly say series of words, not sentences, for one minute. Therefore, models pre-trained on typical training data may not be suitable for our data, and hence we train deep-learning-based ASR models from scratch using our data. We also apply several data augmentation methods because the data set is small.

Bivariate ROC Curve (이변량 ROC곡선)

  • Hong, C.S.;Kim, G.C.;Jeong, J.A.
    • Communications for Statistical Applications and Methods / v.19 no.2 / pp.277-286 / 2012
  • For credit assessment models, ROC curves evaluate classification performance using two univariate cumulative distribution functions, which give the false positive rate and the true positive rate. In this paper, this approach is extended to two bivariate normal distribution functions of default and non-default borrowers. In addition, bivariate ROC curves are proposed to represent the joint cumulative distribution functions by making use of the linear function that passes through the mean vectors of the two score random variables. We explore the classification performance based on these ROC curves obtained from various bivariate normal distributions, and analyze it with the corresponding AUROC. The optimal threshold can be derived from the bivariate ROC curve using many well-known classification criteria, and it is possible to establish an optimal cut-off criterion for bivariate mixture distribution functions.
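
As a point of reference for the univariate case that the paper extends, an empirical ROC curve and its AUROC can be computed from two samples of scores. This is a minimal sketch of the ordinary univariate ROC curve only, not the bivariate construction proposed in the paper; the function names and the convention that the "positive" group tends to score higher are assumptions for illustration.

```python
def roc_points(pos_scores, neg_scores):
    """Empirical ROC points (FPR, TPR), classifying score >= cutoff as positive."""
    cutoffs = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every score: nothing is flagged
    for c in cutoffs:
        tpr = sum(s >= c for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= c for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points

def auroc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores for the two groups.
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.2]
curve = roc_points(pos, neg)
print(auroc(curve))
```

With no tied scores, the trapezoidal area equals the fraction of (positive, negative) pairs in which the positive case has the higher score.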

Development of Updateable Model Output Statistics (UMOS) System for the Daily Maximum and Minimum Temperature (일 최고 및 최저 기온에 대한 UMOS (Updateable Model Output Statistics) 시스템 개발)

  • Hong, Ki-Ok;Suh, Myoung-Seok;Kang, Jeon-Ho;Kim, Chansoo
    • Atmosphere / v.20 no.2 / pp.73-89 / 2010
  • An updateable model output statistics (UMOS) system for daily maximum and minimum temperature ($T_M$ and $T_m$) over South Korea, based on the Canadian UMOS system, was developed and validated. RDAPS (regional data assimilation and prediction system) and KWRF (Korea WRF), which have quite different physics and dynamics, were used for the development of the UMOS system. The 20 most frequently selected predictors for each season, station, and forecast projection time, out of the 68 potential predictors of the MOS system, were used as potential predictors of the UMOS system. The UMOS equations were developed through weighted blending of the new and old model data, with weights chosen to emphasize the new model data while including enough old model data to ensure stable equations and a smooth transition of dependency from the old model to the new model. The UMOS equations are updated every 7 days. The validation results for $T_M$ and $T_m$ showed that the seasonal mean bias, RMSE, and correlation coefficients over all forecast projection times are -0.41 to 0.17 K, 1.80 to 2.46 K, and 0.80 to 0.97, respectively. The performance is slightly better in autumn and winter than in spring and summer. The performance of the UMOS system is also clearly dependent on location, being better in coastal regions than in inland areas. As in the MOS system, the performance of the UMOS system degrades as the forecast day increases.
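
The three validation scores quoted above (mean bias, RMSE, and correlation coefficient) can be computed from paired forecasts and observations as in the following sketch. The function names and the sample values are assumptions for illustration and are not taken from the paper.

```python
import math

def mean_bias(forecast, observed):
    """Average of forecast minus observation (negative means under-forecasting)."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)

def rmse(forecast, observed):
    """Root mean squared error of the forecasts."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast))

def correlation(forecast, observed):
    """Pearson correlation coefficient between forecasts and observations."""
    n = len(forecast)
    mf = sum(forecast) / n
    mo = sum(observed) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(forecast, observed))
    var_f = sum((f - mf) ** 2 for f in forecast)
    var_o = sum((o - mo) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)

# Hypothetical temperatures in kelvin.
fcst = [281.0, 282.0, 283.0]
obs = [281.0, 282.0, 284.0]
print(mean_bias(fcst, obs), rmse(fcst, obs), correlation(fcst, obs))
```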

A comparison study of Bayesian high-dimensional linear regression models (베이지안 고차원 선형 회귀분석에서의 비교연구)

  • Shin, Ju-Won;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.491-505 / 2021
  • We consider linear regression models in high-dimensional settings (p ≫ n) and compare various classes of priors. The spike and slab prior is one of the most widely used priors for Bayesian regression models, but its model space is vast, resulting in poor performance in finite samples. As an alternative, various continuous shrinkage priors, including the horseshoe prior and its variants, have been proposed. Although each of these priors has been investigated separately, exhaustive comparative studies of their performance have rarely been conducted. In this study, we compare the spike and slab prior, the horseshoe prior, and its variants in various simulation settings. The performance of each method is assessed in terms of regression coefficient estimation and variable selection. Finally, some remarks and suggestions are given based on the comprehensive simulation studies.

Comparison of term weighting schemes for document classification (문서 분류를 위한 용어 가중치 기법 비교)

  • Jeong, Ho Young;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.265-276 / 2019
  • The document-term frequency matrix is a common data object in text mining. In this study, we introduce the traditional term weighting scheme TF-IDF (term frequency-inverse document frequency), which is applied to the document-term frequency matrix and used for text classification. In addition, we introduce and compare the TF-IDF-ICSDF and TF-IGM schemes, which have recently become well known. This study also provides a method for extracting keywords that enhances the quality of text classification. Based on the extracted keywords, we apply a support vector machine for text classification. To compare the performance of the term weighting schemes, we use performance metrics such as precision, recall, and F1-score. The results show that the TF-IGM scheme yielded high performance metrics and was the best of the compared schemes for text classification.
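
The basic TF-IDF weighting mentioned in the abstract can be sketched in a few lines. This uses one common variant (raw term count times the natural log of N over document frequency) as an assumption; the paper may use a different normalisation, and the TF-IDF-ICSDF and TF-IGM schemes are not shown. The function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of tokenised documents (lists of strings).
    Weight = raw term count * ln(number of documents / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return weights

# Toy corpus: "a" occurs in every document, so its weight is zero.
docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
for row in tf_idf(docs):
    print(row)
```

A term that appears in every document receives zero weight under this variant, which reflects the intuition that such terms do not help distinguish classes.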

Penalizing the Negative Exponential Disparity in Discrete Models

  • Sahadeb Sarkar;Song, Kijoung-Song;Jeong, Dong-Bin
    • Communications for Statistical Applications and Methods / v.5 no.2 / pp.517-529 / 1998
  • When the sample size is small, the robust minimum Hellinger distance (HD) estimator can have substantially lower relative efficiency at the true model. Similarly, approximating the exact null distributions of the ordinary Hellinger distance tests with the limiting chi-square distributions can be quite inappropriate in small samples. To overcome these problems, Harris and Basu (1994) and Basu et al. (1996) recommended using a modified HD called the penalized Hellinger distance (PHD). Lindsay (1994) and Basu et al. (1997) showed that another density-based distance, namely the negative exponential disparity (NED), is a major competitor to the Hellinger distance in producing an asymptotically fully efficient and robust estimator. In this paper, we investigate the small-sample performance of the estimates and tests based on the NED and the penalized NED (PNED). Our results indicate that, in the settings considered here, the NED, unlike the HD, produces estimators that perform very well in small samples, and penalizing the NED does not help. However, in hypothesis testing, the deviance test based on the PNED appears to achieve the best small-sample level compared with tests based on the NED, HD, and PHD.
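
To make the two disparities concrete, the sketch below evaluates the Hellinger distance and the negative exponential disparity between an empirical probability vector d and a model probability vector f on a finite support. The scaling constants follow one common convention in the disparity literature and are an assumption here, since they vary across references; the penalized versions (PHD, PNED) are not shown. The function names and example vectors are hypothetical.

```python
import math

def hellinger_distance(d, f):
    """HD(d, f) = 2 * sum_x (sqrt(d(x)) - sqrt(f(x)))^2 (scaling is an assumption)."""
    return 2.0 * sum((math.sqrt(dx) - math.sqrt(fx)) ** 2 for dx, fx in zip(d, f))

def negative_exponential_disparity(d, f):
    """NED(d, f) = sum_x (exp(-delta(x)) - 1) * f(x), with the Pearson residual
    delta(x) = (d(x) - f(x)) / f(x). Requires f(x) > 0 on the support."""
    return sum((math.exp(-(dx - fx) / fx) - 1.0) * fx for dx, fx in zip(d, f))

# Hypothetical relative frequencies and model probabilities on two cells.
d = [0.7, 0.3]
f = [0.5, 0.5]
print(hellinger_distance(d, f), negative_exponential_disparity(d, f))
```

Both quantities are zero when d equals f and positive otherwise (for probability vectors), so either can serve as the objective that a minimum-disparity estimator minimises over the model parameters.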

Improvement of Genetic Programming Based Nonlinear Regression Using ADF and Application for Prediction MOS of Wind Speed (ADF를 사용한 유전프로그래밍 기반 비선형 회귀분석 기법 개선 및 풍속 예보 보정 응용)

  • Oh, Seungchul;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers / v.64 no.12 / pp.1748-1755 / 2015
  • Linear regression is widely used for prediction problems, but it has difficulty handling the irregular nature of nonlinear systems. Although nonlinear regression methods have been adopted, most of them are suited only to problems with a simple, limited structure and a small number of independent variables. However, real-world problems such as weather prediction require complex nonlinear regression with a large number of variables. A GP (Genetic Programming) based evolutionary nonlinear regression method is an efficient approach to attacking this challenging problem. This paper introduces an improvement of a GP-based nonlinear regression method using ADFs (Automatically Defined Functions). ADFs are believed to allow the evolution of modular solutions and, consequently, to improve the performance of the GP technique. The suggested ADF-based GP nonlinear regression methods are compared with UM, MLR, and a previous GP method for 3-day prediction of wind speed using MOS (Model Output Statistics) for part of South Korea. UM and KLAPS data from 2007-2009 and 2011-2013 are used in the experiments.

Depth tracking of occluded ships based on SIFT feature matching

  • Yadong Liu;Yuesheng Liu;Ziyang Zhong;Yang Chen;Jinfeng Xia;Yunjie Chen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.4 / pp.1066-1079 / 2023
  • Detector-based multi-target tracking is an active and important research topic in target tracking. It mainly comprises two closely related processes, namely target detection and target tracking: target detection is responsible for locating the exact position of the target, while target tracking monitors the temporal and spatial changes of the target. With improvements in detectors, tracking performance has reached a new level. A persistent problem in target tracking research is handling a target that reappears after being occluded during tracking. To address this problem, this paper proposes a DeepSORT model based on SIFT features to improve ship tracking. Unlike previous feature extraction networks, the SIFT algorithm does not require pre-training to learn the characteristics of the target and can be applied to ship tracking quickly. At the same time, we improve and test the matching method of our model to find a balance between tracking accuracy and tracking speed. Experiments show that the model achieves better results.