• Title/Summary/Keyword: Performance-based Statistics

Outlier Detection Using Support Vector Machines (서포트벡터 기계를 이용한 이상치 진단)

  • Seo, Han-Son;Yoon, Min
    • Communications for Statistical Applications and Methods / v.18 no.2 / pp.171-177 / 2011
  • To construct approximation functions for real data, outliers must be removed from the measured raw data before the model is built. Conventionally, visualization and maximum residual error have been used for outlier detection, but they often fail to detect outliers for nonlinear functions with multidimensional input. Although standard support vector regression based outlier detection methods for nonlinear functions with multidimensional input have achieved good performance, they have practical issues with computational cost and parameter adjustment. In this paper we propose a practical approach to outlier detection using support vector regression that reduces computational time and sets a suitable outlier threshold. We apply this approach to real data examples to validate it.

Selection probability of multivariate regularization to identify pleiotropic variants in genetic association studies

  • Kim, Kipoong;Sun, Hokeun
    • Communications for Statistical Applications and Methods / v.27 no.5 / pp.535-546 / 2020
  • In genetic association studies, pleiotropy is a phenomenon where a variant or a genetic region affects multiple traits or diseases. There have been many studies identifying cross-phenotype genetic associations. However, most statistical approaches for detecting pleiotropy are based on individual tests, where the association of a single variant with multiple traits is tested one variant at a time. These approaches fail to account for relations among correlated variants. Recently, multivariate regularization methods have been proposed to detect pleiotropy in the analysis of high-dimensional genomic data. However, they suffer from the problem of tuning parameter selection, which often results in either too many false positives or too few true positives. In this article, we applied selection probability to multivariate regularization methods in order to identify pleiotropic variants associated with multiple phenotypes. Selection probability was applied to individual elastic-net, unified elastic-net and multi-response elastic-net regularization methods. In simulation studies, the selection performance of the three multivariate regularization methods was evaluated while varying the total number of phenotypes, the number of phenotypes associated with a variant, and the correlations among phenotypes. We also applied the regularization methods to a wild bean dataset consisting of 169,028 variants and 17 phenotypes.
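
The selection-probability idea in this abstract can be illustrated with a minimal sketch: refit a variable selector on many random subsamples and record how often each variable is chosen. Everything here is an assumption for illustration. The half-sample scheme, the function names, and the simulated data are hypothetical, and the selector is a simple top-k covariance screen standing in for the elastic-net fits used in the paper.

```python
import random

def top_k_by_covariance(X, y, k):
    """Stand-in selector (NOT elastic-net): indices of the k columns with the
    largest absolute sample covariance with y."""
    n, p = len(X), len(X[0])
    y_bar = sum(y) / n
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_bar = sum(col) / n
        cov = sum((c - c_bar) * (v - y_bar) for c, v in zip(col, y))
        scores.append(abs(cov))
    return set(sorted(range(p), key=lambda j: scores[j], reverse=True)[:k])

def selection_probability(X, y, selector, n_subsamples=100, seed=0):
    """Fraction of random half-samples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        chosen = selector([X[i] for i in idx], [y[i] for i in idx])
        for j in chosen:
            counts[j] += 1
    return [c / n_subsamples for c in counts]

# Hypothetical data: 200 samples, 10 variants; only variant 0 drives the trait.
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]
y = [2.0 * row[0] + rng.gauss(0, 1) for row in X]

probs = selection_probability(X, y, lambda A, b: top_k_by_covariance(A, b, 2))
```

Ranking variables by `probs` rather than by a single fit is what removes the dependence on one particular tuning choice; a truly associated variable should be selected in nearly every subsample, while noise variables are selected only sporadically.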

Modeling Clustered Interval-Censored Failure Time Data with Informative Cluster Size (군집의 크기가 생존시간에 영향을 미치는 군집 구간중도절단된 자료에 대한 준모수적 모형)

  • Kim, Jinheum;Kim, Youn Nam
    • The Korean Journal of Applied Statistics / v.27 no.2 / pp.331-343 / 2014
  • We propose two estimating procedures, based on a marginal model, for analyzing clustered interval-censored data with an informative cluster size, and we investigate their asymptotic properties. One is an extension of Cong et al. (2007) to interval-censored data, and the other uses the within-cluster resampling method proposed by Hoffman et al. (2001). Simulation results suggest that, when the cluster size is related to survival time, the proposed estimators perform better in terms of bias and coverage rate of the true value than an estimator with no adjustment for informative cluster size. Finally, they are applied to lymphatic filariasis data adopted from Williamson et al. (2008).
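
The within-cluster resampling step mentioned above can be sketched as follows: draw one observation per cluster, compute an estimate on the resampled data, repeat, and average. This is a minimal sketch only. The sample mean is a hypothetical stand-in for the paper's marginal-model estimator for interval-censored data, and the function name and toy clusters are assumptions.

```python
import random
from statistics import mean

def wcr_estimate(clusters, estimator, n_resamples=2000, seed=0):
    """Within-cluster resampling: average an estimator over data sets
    built by drawing exactly one observation from each cluster."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # One randomly chosen member per cluster, so every cluster
        # contributes equally regardless of its size.
        resample = [rng.choice(cluster) for cluster in clusters]
        estimates.append(estimator(resample))
    return mean(estimates)

# Hypothetical data: the large cluster has systematically larger values,
# i.e. cluster size is informative.
clusters = [[1.0], [2.0], [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]]

pooled = mean(x for c in clusters for x in c)  # dominated by the big cluster
wcr = wcr_estimate(clusters, mean)             # each cluster weighted equally
```

In this toy case the pooled mean is pulled toward the large cluster, while the resampled estimate weights the three clusters equally, which is the adjustment for informative cluster size that the abstract refers to.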

A Reconfiguration Method for Preserving Network Bandwidth and Nodes Energy of Wireless Sensor Networks

  • Jung, Hyunjun;Jeong, Dongwon;On, Byung-Won;Baik, Doo-Kwon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.5 / pp.2181-2202 / 2016
  • In Wireless Sensor Networks (WSNs) and even in the Internet of Things (IoT) ecosystem, the reconfiguration of sensor variables is an important problem when the role of a system (or application) program's sensor nodes needs to be adjusted in a particular situation. For example, the outdoor temperature in a volcanic zone, which is usually updated in a system every 10 s, should be updated every 1 s during an emergency situation. To solve this problem, this paper proposes a novel approach based on changing only a set of sensor variables in a part of a program, rather than modifying the entire program, in order to reduce both network congestion and the sensor nodes' battery consumption. To validate our approach, we demonstrate an implementation of a proof-of-concept prototype system and also present results of comparative studies showing the performance and effectiveness of our proposed method.

Effects of Parameter Estimation in Phase I on Phase II Control Limits for Monitoring Autocorrelated Data (자기상관 데이터 모니터링에서 일단계 모수 추정이 이단계 관리한계선에 미치는 영향 연구)

  • Lee, Sungim
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.1025-1034 / 2015
  • Traditional Shewhart control charts assume that the observations are independent over time. Recent progress in measurement and data collection technology has led to autocorrelated process data, which can degrade the performance of statistical process control. One of the most popular approaches for autocorrelated data is to model the correlation structure with an appropriate time series model and apply a control chart to the sequence of residuals. Since the model parameters are usually unknown in practice, they are estimated from an in-control Phase I reference sample. This paper deals with the effects of parameter estimation on Phase II control limits for monitoring autocorrelated data.
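
The residual-chart procedure described above can be sketched in a few lines: estimate a time series model from Phase I data, then apply fixed limits to the Phase II residuals. This is an illustrative sketch under stated assumptions, not the paper's procedure. It assumes an AR(1) model, least-squares estimation, 3-sigma limits, and simulated data; all function names are hypothetical.

```python
import random
from statistics import mean, pstdev

def fit_ar1(x):
    """Estimate the mean and lag-1 coefficient of an AR(1) model by least squares."""
    mu = mean(x)
    num = sum((x[t] - mu) * (x[t - 1] - mu) for t in range(1, len(x)))
    den = sum((x[t - 1] - mu) ** 2 for t in range(1, len(x)))
    return mu, num / den

def residuals(x, mu, phi):
    """One-step-ahead prediction errors e_t = (x_t - mu) - phi * (x_{t-1} - mu)."""
    return [(x[t] - mu) - phi * (x[t - 1] - mu) for t in range(1, len(x))]

def simulate_ar1(n, phi, rng, shift=0.0):
    """Simulate an AR(1) series with unit-variance noise and an optional mean shift."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev + shift)
    return x

rng = random.Random(1)
phase1 = simulate_ar1(500, 0.6, rng)              # in-control reference sample
mu_hat, phi_hat = fit_ar1(phase1)                 # Phase I estimation
sigma_hat = pstdev(residuals(phase1, mu_hat, phi_hat))
lcl, ucl = -3 * sigma_hat, 3 * sigma_hat          # Phase II limits from estimates

phase2 = simulate_ar1(200, 0.6, rng, shift=5.0)   # hypothetical shifted process
signals = [t for t, e in enumerate(residuals(phase2, mu_hat, phi_hat), start=1)
           if not (lcl <= e <= ucl)]
```

Because `lcl` and `ucl` depend on `mu_hat`, `phi_hat`, and `sigma_hat`, a different Phase I sample would give different limits; that estimation effect is what the paper studies.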

Regularization Parameter Selection for Total Variation Model Based on Local Spectral Response

  • Zheng, Yuhui;Ma, Kai;Yu, Qiqiong;Zhang, Jianwei;Wang, Jin
    • Journal of Information Processing Systems / v.13 no.5 / pp.1168-1182 / 2017
  • In the past decades, various image regularization methods have been introduced. Among them, the total variation model has drawn much attention because of its low computational complexity and well-understood mathematical behavior. However, estimating the regularization parameter of the total variation model is still an open problem. To address it, this paper proposes a novel adaptive regularization parameter selection scheme based on the local spectral response, which selects the regularization parameters locally in a content-aware way and thereby adaptively adjusts the weights between the two terms of the total variation model. Experimental results on simulated and real noisy images show the good performance of the proposed method in terms of visual improvement and peak signal-to-noise ratio.
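
The peak signal-to-noise ratio used as the evaluation measure above can be computed as in this minimal sketch. It assumes 8-bit images (peak value 255) stored as lists of rows; the function name and the tiny example images are hypothetical.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images
    given as lists of rows of pixel intensities."""
    diffs = [(r - e) ** 2
             for row_r, row_e in zip(reference, estimate)
             for r, e in zip(row_r, row_e)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

clean = [[100, 100], [100, 100]]
noisy = [[110, 90], [100, 100]]      # hypothetical noisy observation
denoised = [[102, 98], [100, 100]]   # hypothetical denoising result

gain_db = psnr(clean, denoised) - psnr(clean, noisy)
```

A positive `gain_db` means the denoised image is closer to the clean reference than the noisy input was.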

Evaluation of the classification method using ancestry SNP markers for ethnic group

  • Lee, Hyo Jung;Hong, Sun Pyo;Lee, Soong Deok;Rhee, Hwan seok;Lee, Ji Hyun;Jeong, Su Jin;Lee, Jae Won
    • Communications for Statistical Applications and Methods / v.26 no.1 / pp.1-9 / 2019
  • Various probabilistic methods have been proposed for using interpopulation allele frequency differences to infer the ethnic group of a DNA specimen. The selection of the statistical method is critical because the accuracy of the statistical classification results varies. For ancestry classification, we propose a new ancestry evaluation method that estimates a combined ethnicity index, and we compare its performance with various classical classification methods using two real data sets. We selected 13 SNPs that are useful for the inference of ethnic origin. These single nucleotide polymorphisms (SNPs) were analyzed by restriction fragment mass polymorphism assay, followed by classification among ethnic groups. We genotyped 400 individuals from four ethnic groups (100 African-American, 100 Caucasian, 100 Korean, and 100 Mexican-American) for 13 SNPs with allele frequencies that differed among the four ethnic groups. Additionally, we applied our new method to HapMap SNP genotypes for 1,011 samples from 4 populations (African, European, East Asian, and Central-South Asian). Our proposed method yielded the highest accuracy among the statistical classification methods. Our ethnic group classification system, based on the analysis of ancestry informative SNP markers, can provide a useful statistical tool to identify ethnic groups.

A new extended alpha power transformed family of distributions: properties, characterizations and an application to a data set in the insurance sciences

  • Ahmad, Zubair;Mahmoudi, Eisa;Hamedani, G.G.
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.1-19 / 2021
  • Heavy tailed distributions are useful for modeling actuarial and financial risk management problems. Actuaries often search for distributions that provide the best fit to heavy tailed data sets. In the present work, we introduce a new class of heavy tailed distributions and a special sub-model of the proposed family, called the new extended alpha power transformed Weibull distribution, which is useful for modeling heavy tailed data sets. Mathematical properties along with certain characterizations of the proposed distribution are presented. Maximum likelihood estimates of the model parameters are obtained. A simulation study is provided to evaluate the performance of the maximum likelihood estimators. Actuarial measures such as Value at Risk and Tail Value at Risk are also calculated. Further, a simulation study based on the actuarial measures is carried out. Finally, an application of the proposed model to a heavy tailed data set is presented. The proposed distribution is compared with some well-known (i) two-parameter models, (ii) three-parameter models and (iii) four-parameter models.
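
For reference, the two actuarial measures named above have the following standard definitions for a loss variable X with distribution function F. The Weibull line uses the ordinary two-parameter Weibull with scale λ and shape k as an assumed illustration; it is not the extended model proposed in the paper.

```latex
\begin{align*}
\mathrm{VaR}_q(X)  &= F^{-1}(q) = \inf\{x : F(x) \ge q\}, \qquad 0 < q < 1, \\
\mathrm{TVaR}_q(X) &= \frac{1}{1-q}\int_q^1 \mathrm{VaR}_u(X)\,du
                    = E\bigl[X \mid X > \mathrm{VaR}_q(X)\bigr]
                    \quad \text{(for continuous } F\text{)}, \\
\text{Weibull: } F(x) &= 1 - e^{-(x/\lambda)^k}
  \;\Longrightarrow\;
  \mathrm{VaR}_q(X) = \lambda\,\bigl(-\ln(1-q)\bigr)^{1/k}.
\end{align*}
```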

A Comparison Study of Forecasting Time Series Models for the Harmful Gas Emission (유해가스 배출량에 대한 시계열 예측 모형의 비교연구)

  • Jang, Moonsoo;Heo, Yoseob;Chung, Hyunsang;Park, Soyoung
    • Journal of the Korean Society of Industry Convergence / v.24 no.3 / pp.323-331 / 2021
  • With global warming and pollution problems, accurate forecasting of harmful gases would serve as an essential warning in daily life. In this paper, we forecast the emissions of five gases (SOx, NO2, NH3, H2S, CH4) using the ARIMA time series model and two learning algorithms, random forest and LSTM. We find that the gas emission data depend on short-term memory and behave like a random walk. We compare RMSE, MAE, and MAPE as measures of prediction performance, with the same conditions given to all three models. We find that ARIMA forecasts the gas emissions more precisely than the two learning-based methods. In addition, the ARIMA model is more suitable for real-time forecasting of gas emissions because it is faster to fit than the two learning algorithms.
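
The three error measures used for the comparison above can be written out as in this minimal sketch. The function names and the small example series are hypothetical, and MAPE is reported in percent under the assumption that no actual value is zero.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [10.0, 20.0, 40.0]      # hypothetical emissions
predicted = [12.0, 18.0, 40.0]   # hypothetical forecasts

scores = {"RMSE": rmse(actual, predicted),
          "MAE": mae(actual, predicted),
          "MAPE": mape(actual, predicted)}
```

Evaluating every model on the same held-out `actual` series is what makes the three scores comparable across ARIMA, random forest, and LSTM.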

Combined Time Synchronization And Channel Estimation For MB-OFDM UWB Systems

  • Kareem, Aymen M.;El-Saleh, Ayman A.;Othman, Masuri
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.7 / pp.1792-1801 / 2012
  • Symbol timing error causes major degradation in system performance. Conventionally, the timing error is estimated using a predefined preamble known to both transmitter and receiver, and the maximum of the correlation result is taken as the start of the OFDM symbol. A problem arises when the prime path is not the strongest one. In this paper, we propose a new combined time and channel estimation method for multi-band OFDM ultra wide-band (MB-OFDM UWB) systems. It is assumed that a coarse timing has been obtained at a stage before the proposed scheme. Based on the coarse timing, a search interval (a set of time candidates) is set. Exploiting channel statistics that are assumed to be known by the receiver, we derive a maximum a posteriori (MAP) estimate of the channel impulse response. Based on this estimate, we determine the timing error. Timing estimation performance is compared with that of the least squares (LS) channel estimate in terms of mean squared error (MSE). It is shown that the proposed timing scheme achieves a lower MSE than the LS method.
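
The conventional correlation-peak timing described above, and the way it fails when the first path is not the strongest, can be illustrated with a minimal noise-free sketch. It does not implement the proposed MAP scheme. The length-7 Barker code stands in for a real MB-OFDM preamble, and the channel taps, function names, and signal lengths are all hypothetical.

```python
def correlation_timing(received, preamble):
    """Return the lag that maximizes |cross-correlation| with the preamble."""
    n = len(preamble)
    best_lag, best_val = 0, -1.0
    for lag in range(len(received) - n + 1):
        c = abs(sum(received[lag + k] * preamble[k] for k in range(n)))
        if c > best_val:
            best_lag, best_val = lag, c
    return best_lag

def through_channel(signal, taps, start, length):
    """Place the signal at `start` and convolve it with real-valued
    channel taps (noise-free)."""
    out = [0.0] * length
    for d, h in enumerate(taps):
        for k, s in enumerate(signal):
            out[start + d + k] += h * s
    return out

preamble = [1, 1, 1, -1, -1, 1, -1]  # length-7 Barker code as a stand-in preamble
true_start = 5

# Single-path channel: the correlation peak marks the true symbol start.
single = correlation_timing(through_channel(preamble, [1.0], true_start, 20), preamble)

# Two-path channel whose first path is weaker than a path two samples later:
# the peak locks onto the stronger, delayed path.
multi = correlation_timing(
    through_channel(preamble, [0.5, 0.0, 1.0], true_start, 20), preamble)
timing_error = multi - true_start
```

The nonzero `timing_error` in the second case is the situation that motivates estimating the channel impulse response jointly with the timing instead of relying on the correlation maximum alone.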