• Title/Summary/Keyword: Sample Vector


Modified Multivariate $T^2$-Chart based on Robust Estimation

  • 성웅현;박동련
    • Journal of Korean Society for Quality Management, v.29 no.1, pp.1-10, 2001
  • We consider the problem of detecting special variations in a multivariate $T^2$-control chart when two or more multivariate outliers are present. Since a multivariate outlier may reflect slippage in mean, variance, or correlation, it can distort the sample mean vector and sample covariance matrix. A distorted sample mean vector and sample covariance matrix make it difficult to identify special variations clearly. An alternative for detecting outliers or special variations is to use robust estimators of the mean vector and covariance matrix that are less sensitive to extreme observations than the standard estimators $\bar{x}$ and $\textbf{S}$. We applied the popular minimum volume ellipsoid (MVE) and minimum covariance determinant (MCD) methods to estimate the mean vector and covariance matrix, and compared the results with the standard $T^2$-control chart using simulated multivariate data with outliers. We found that the modified $T^2$-control charts based on these robust methods were more effective in detecting special variations than the standard $T^2$-control chart.
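As a rough illustration of the idea in the abstract above, the following Python sketch compares $T^2$-type distances computed from the standard estimators with distances computed from a raw MCD estimate on a small bivariate data set. It is not the authors' procedure: the data, the subset size `h`, and all function names are assumptions made for illustration, the MCD search is exhaustive and only feasible for tiny samples, and no consistency correction or control limit is applied.

```python
from itertools import combinations

def mean_cov(points):
    """Sample mean vector and 2x2 sample covariance (divisor n - 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det(cov):
    """Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2

def t2(point, mean, cov):
    """Quadratic form (x - m)' S^{-1} (x - m) for bivariate data."""
    sxx, sxy, syy = cov
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det(cov)

def raw_mcd(points, h):
    """Exhaustive raw MCD: mean and covariance of the h-subset
    whose sample covariance has the smallest determinant."""
    best = min(combinations(points, h), key=lambda s: det(mean_cov(s)[1]))
    return mean_cov(best)

# Hypothetical data: eight in-control points followed by two outliers.
data = [(10.0, 20.1), (10.2, 19.8), (9.9, 20.3), (10.1, 20.0),
        (9.8, 19.9), (10.3, 20.2), (10.0, 19.7), (9.7, 20.1),
        (13.0, 23.0), (13.2, 22.8)]

std_mean, std_cov = mean_cov(data)
rob_mean, rob_cov = raw_mcd(data, h=7)
classical = [t2(p, std_mean, std_cov) for p in data]
robust = [t2(p, rob_mean, rob_cov) for p in data]

for p, c, r in zip(data, classical, robust):
    print(p, round(c, 2), round(r, 2))
```

With the standard estimators every distance is bounded by $(n-1)^2/n$ (8.1 for these ten points), so the two outliers cannot stand out very far, whereas the MCD-based distances for them are far larger.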


The Convergence Characteristics of The Time-Averaged Distortion in Vector Quantization: Part I. Theory Based on The Law of Large Numbers

  • 김동식
    • Journal of the Korean Institute of Telematics and Electronics B, v.33B no.7, pp.107-115, 1996
  • The average distortion of a vector quantizer is calculated using a probability function F of the input source for a given codebook. However, since the input source is unknown in general, a time-average operation over sample vectors realized from a random vector with probability function F is employed to approximate the average distortion. In this case the sample set should be large so that the sample vectors represent the true F reliably. The approximation, however, has not been examined rigorously in theory, so one might use the time-averaged distortion without any verification of the approximation. In this paper, the convergence characteristics of the time-averaged distortion are theoretically investigated as the number of sample vectors or the codebook size gets large. It is shown that if the codebook size is large enough, a small sample set suffices to approximate the average distortion by the calculated time-averaged distortion. Experimental results on synthetic data, which support the analysis, are also provided and discussed.
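The time-averaged distortion described above can be sketched in a few lines of Python. This is only an illustration under assumed conditions, not the paper's experiment: the source is taken to be uniform on the unit square, the distortion is squared error, the codebook is a fixed 2x2 grid, and the function names and seed are hypothetical. For this particular setup the true average distortion is 1/24, which gives a reference value for the estimate.

```python
import random

def nearest_distortion(x, codebook):
    """Squared-error distortion between x and its nearest codeword."""
    return min(sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in codebook)

def time_averaged_distortion(samples, codebook):
    """Average of the per-sample distortions over the sample set."""
    return sum(nearest_distortion(x, codebook) for x in samples) / len(samples)

# Assumed codebook: centers of the four quadrants of the unit square.
codebook = [(0.25, 0.25), (0.25, 0.75), (0.75, 0.25), (0.75, 0.75)]

def estimate(n, seed=0):
    """Time-averaged distortion from n sample vectors of the assumed source."""
    rng = random.Random(seed)
    samples = [(rng.random(), rng.random()) for _ in range(n)]
    return time_averaged_distortion(samples, codebook)

true_value = 1 / 24  # 2 * (0.5 ** 2) / 12 for this source and codebook
for n in (100, 1000, 20000):
    print(n, estimate(n), true_value)
```

Increasing `n` moves the estimate toward the reference value, which is the law-of-large-numbers behavior the abstract refers to.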


Change of population density of tobacco whitefly (Bemisia tabaci, Aleyrodidae, Hemiptera) by RNAi

  • Ko, Na-Yeon;Youn, Young-Nam
    • Korean Journal of Agricultural Science, v.42 no.1, pp.7-13, 2015
  • Ninety genes randomly selected from a tobacco whitefly (Bemisia tabaci) cDNA library were studied to select target genes for controlling tobacco whitefly by RNAi using a TRV-VIGS vector (tobacco rattle virus-virus induced gene silencing vector). First, agro-infiltration of TRV alone produced no significant difference in the occurrence of B. tabaci adults, whereas TRV carrying inserted tobacco whitefly cDNA showed significant differences among samples. P CV and N CV of more than 80% were confirmed in 5 samples: wh11, wh36, wh46, wh50 and wh71. Lastly, the occurrence of nymphs and eggs also differed significantly among samples. This was confirmed in 11 samples, for example wh01, wh09, wh10, wh15, wh16, wh23, wh24, wh48, wh64 and wh66. For samples wh46, wh50 and wh71, the occurrence of B. tabaci adults was high, but the occurrence of nymphs and eggs was low. Samples that showed a favorable physioecological effect for whitefly control therefore need further investigation of the variation in gene expression in the whitefly body, using qRT-PCR in individual tests.

Estimators Shrinking towards Projection Vector for Multivariate Normal Mean Vector under the Norm with a Known Interval

  • Baek, Hoh Yoo
    • Journal of Integrative Natural Science, v.11 no.3, pp.154-160, 2018
  • Consider the problem of estimating a $p{\times}1$ mean vector ${\theta}$ $(p-r{\geq}3)$, $r=\mathrm{rank}(K)$, with a projection matrix $K$ under the quadratic loss, based on a sample $Y_1$, $Y_2$, ${\cdots}$, $Y_n$. In this paper a James-Stein type estimator of shrinkage form is given when its variance distribution is specified and when the norm ${\parallel}{\theta}-K{\theta}{\parallel}$ is constrained to a known interval, where $K$ is an idempotent and symmetric matrix with $\mathrm{rank}(K)=r$. A minimal complete class of James-Stein type estimators is characterized in this case, and the subclass of James-Stein type estimators that dominate the sample mean is derived.
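For orientation, here is a minimal Python sketch of the textbook James-Stein estimator that shrinks the sample mean towards its projection $K\bar{Y}$. It assumes the simplest setting (normal observations with known common variance `sigma2`), which is narrower than the setting of the abstract above and ignores the interval restriction on the norm; the function names and the example numbers are hypothetical.

```python
def project(K, x):
    """Matrix-vector product K x, with K given as a list of rows."""
    return [sum(kij * xj for kij, xj in zip(row, x)) for row in K]

def js_toward_projection(ybar, K, r, sigma2, n):
    """K*ybar + (1 - (p - r - 2) * sigma2 / (n * ||ybar - K*ybar||^2)) * (ybar - K*ybar).

    Assumes p - r >= 3 and ybar != K*ybar (otherwise the norm is zero).
    """
    p = len(ybar)
    k_ybar = project(K, ybar)
    resid = [a - b for a, b in zip(ybar, k_ybar)]
    norm2 = sum(e * e for e in resid)
    shrink = 1.0 - (p - r - 2) * sigma2 / (n * norm2)
    return [b + shrink * e for b, e in zip(k_ybar, resid)]

# Example: K projects onto the all-ones direction (rank r = 1),
# so the estimator shrinks each coordinate towards the grand mean.
p = 5
K = [[1.0 / p] * p for _ in range(p)]
ybar = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate = js_toward_projection(ybar, K, r=1, sigma2=1.0, n=4)
print(estimate)
```

In this example the residual norm squared is 10, so the shrinkage factor is 1 - 2/40 = 0.95 and each coordinate moves 5% of the way towards the grand mean of 3.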

A Note on the Small-Sample Calibration

  • So, Beong-Soo
    • Journal of Korean Society for Quality Management, v.22 no.2, pp.89-97, 1994
  • We consider the linear calibration model $y_i={\alpha}+{\beta}x_i+{\sigma}{\varepsilon}_i$, $i=1,{\cdots},n$, $y={\alpha}+{\beta}x+{\sigma}{\varepsilon}$, where $(y_1,{\cdots},y_n,y)$ is the observation vector, $\{x_i\}$ is a fixed design vector, $({\alpha},{\beta})$ is the vector of regression parameters, $x$ is the unknown true value of interest, and $\{{\varepsilon}_i\}$, ${\varepsilon}$ are mutually uncorrelated measurement errors with zero mean and unit variance but otherwise unknown distributions. On the basis of a simple small-sample, low-noise approximation, we introduce a new method of comparing the mean squared errors of the various competing estimators of the true value $x$ for finite sample size $n$. We then show that a class of estimators including the classical and the inverse estimators is consistent and first-order efficient within the class of all regular consistent estimators, irrespective of the type of measurement errors.
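The two estimators named in the abstract can be sketched as follows. This is a plain least-squares illustration with made-up calibration data, not the paper's analysis: the classical estimator inverts the fitted regression of $y$ on $x$, while the inverse estimator regresses $x$ on $y$ and predicts directly. The function names and the data are assumptions.

```python
def ols(u, v):
    """Least-squares fit v = a + b * u; returns (a, b)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    b = (sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
         / sum((ui - mu) ** 2 for ui in u))
    return mv - b * mu, b

def classical_estimate(x, y, y_new):
    """Fit y on x, then solve y_new = a + b * x for x."""
    a, b = ols(x, y)
    return (y_new - a) / b

def inverse_estimate(x, y, y_new):
    """Fit x on y, then predict x at y_new."""
    c, d = ols(y, x)
    return c + d * y_new

# Hypothetical calibration data and a new reading y_new.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_new = 7.0

x_classical = classical_estimate(x, y, y_new)
x_inverse = inverse_estimate(x, y, y_new)
print(x_classical, x_inverse)
```

The inverse estimate always lies between the classical estimate and the design mean $\bar{x}$: its deviation from $\bar{x}$ equals the classical deviation multiplied by the squared sample correlation between $x$ and $y$, so the two agree closely when the noise is low.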


Sparse vector heterogeneous autoregressive model with nonconvex penalties

  • Shin, Andrew Jaeho;Park, Minsu;Baek, Changryong
    • Communications for Statistical Applications and Methods, v.29 no.1, pp.53-64, 2022
  • High-dimensional time series have been gaining considerable attention in recent years. The sparse vector heterogeneous autoregressive (VHAR) model proposed by Baek and Park (2020) uses the adaptive lasso and a debiasing procedure in estimation, and showed superb forecasting performance for realized volatilities. This paper extends the sparse VHAR model by considering nonconvex penalties such as SCAD and MCP for possible bias reduction from their penalty design. Finite-sample performances of the three estimation methods are compared through Monte Carlo simulation. Our study shows first that taking cross-sectional correlations into account reduces bias. Second, nonconvex penalties perform better when the sample size is small. On the other hand, the adaptive lasso with debiasing performs well as the sample size increases. An empirical analysis based on 20 multinational realized volatilities is also provided.
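To make the "penalty design" remark concrete, the sketch below writes out the standard SCAD and MCP penalty functions next to the lasso penalty. It illustrates only the penalties, not the VHAR estimation procedure of the paper; the default tuning values `a=3.7` and `gamma=3.0` are conventional choices assumed here, and the function names are hypothetical.

```python
def lasso_penalty(beta, lam):
    """L1 penalty: grows linearly without bound."""
    return lam * abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (requires a > 2): linear, then quadratic, then flat."""
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2

def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty (requires gamma > 1): concave up to gamma * lam, then flat."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2

for b in (0.5, 1.0, 3.0, 10.0):
    print(b, lasso_penalty(b, 1.0), scad_penalty(b, 1.0), mcp_penalty(b, 1.0))
```

Both nonconvex penalties level off for large coefficients while the lasso penalty keeps growing, which is why they penalize large coefficients less and can reduce estimation bias.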

Imbalanced SVM-Based Anomaly Detection Algorithm for Imbalanced Training Datasets

  • Wang, GuiPing;Yang, JianXi;Li, Ren
    • ETRI Journal, v.39 no.5, pp.621-631, 2017
  • Abnormal samples are usually difficult to obtain in production systems, resulting in imbalanced training sample sets. Namely, the number of positive samples is far less than the number of negative samples. Traditional Support Vector Machine (SVM)-based anomaly detection algorithms perform poorly for highly imbalanced datasets: the learned classification hyperplane skews toward the positive samples, resulting in a high false-negative rate. This article proposes a new imbalanced SVM (termed ImSVM)-based anomaly detection algorithm, which assigns a different weight for each positive support vector in the decision function. ImSVM adjusts the learned classification hyperplane to make the decision function achieve a maximum GMean measure value on the dataset. The above problem is converted into an unconstrained optimization problem to search the optimal weight vector. Experiments are carried out on both Cloud datasets and Knowledge Discovery and Data Mining datasets to evaluate ImSVM. Highly imbalanced training sample sets are constructed. The experimental results show that ImSVM outperforms over-sampling techniques and several existing imbalanced SVM-based techniques.
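The GMean measure mentioned above is commonly defined as the geometric mean of sensitivity and specificity; assuming that definition, a short Python sketch shows why it suits imbalanced data better than accuracy. The labels, predictions, and function names are hypothetical, and the sketch does not implement ImSVM itself.

```python
from math import sqrt

def gmean(y_true, y_pred, positive=1):
    """sqrt(sensitivity * specificity); assumes both classes occur in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sqrt(sensitivity * specificity)

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical imbalanced set: 2 abnormal (positive) and 98 normal samples.
y_true = [1, 1] + [0] * 98
always_normal = [0] * 100                  # misses every abnormal sample
some_alarms = [1, 0] + [1] * 5 + [0] * 93  # one hit, five false alarms

print(accuracy(y_true, always_normal), gmean(y_true, always_normal))
print(accuracy(y_true, some_alarms), gmean(y_true, some_alarms))
```

A classifier that never reports an anomaly scores high accuracy but a GMean of zero, so maximizing GMean forces the decision function to trade some false alarms for detected anomalies.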

James-Stein Type Estimators Shrinking towards Projection Vector When the Norm is Restricted to an Interval

  • Baek, Hoh Yoo;Park, Su Hyang
    • Journal of Integrative Natural Science, v.10 no.1, pp.33-39, 2017
  • Consider the problem of estimating a $p{\times}1$ mean vector ${\theta}$ $(p-q{\geq}3)$, $q=\mathrm{rank}(P_V)$, with a projection matrix $P_V$ under the quadratic loss, based on a sample $X_1$, $X_2$, ${\cdots}$, $X_n$. We find a James-Stein type decision rule which shrinks towards a projection vector when the underlying distribution is that of a variance mixture of normals and when the norm ${\parallel}{\theta}-P_V{\theta}{\parallel}$ is restricted to a known interval, where $P_V$ is an idempotent and projection matrix and $\mathrm{rank}(P_V)=q$. In this case, we characterize a minimal complete class within the class of James-Stein type decision rules. We also characterize the subclass of James-Stein type decision rules that dominate the sample mean.

Regression Quantile Estimations on Censored Survival Data

  • Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society, v.13 no.2, pp.31-38, 2002
  • In this paper we study regression quantile estimation for the case of multiple survival times, possibly censored, at each covariate vector. The estimation is based on the empirical distribution functions of the censored times and the sample quantiles of the observed survival times at each covariate vector, and the weighted least squares method is applied to estimate the regression quantile. The estimators are shown to be asymptotically normally distributed under some regularity conditions.


A Machine Learning-based Customer Classification Model for Effective Online Free Sample Promotions

  • Won, Ha-Ram;Kim, Moo-Jeon;Ahn, Hyunchul
    • The Journal of Information Systems, v.27 no.3, pp.63-80, 2018
  • Purpose: The purpose of this study is to build a machine learning-based customer classification model that enhances the customer expansion effect of free sample promotions. Specifically, the proposed model identifies potential target customers who are expected to purchase the products included in the free sample promotion after receiving the free samples. Design/methodology/approach: This study proposes a customer classification model for determining which customers are suitable for receiving free samples, using various machine learning techniques such as logistic regression, multiple discriminant analysis, case-based reasoning, decision tree, artificial neural network, and support vector machine. To validate the usefulness of the proposed model, we apply it to a real-world free sample-based target marketing case of a major Korean cosmetics retail company. Findings: Experimental results show that a machine learning-based customer classification model achieves satisfactory accuracy, ranging from 70% to 75%. In particular, the support vector machine is found to be the most effective machine learning technique for the free sample-based target marketing model. Our study sheds light on customer relationship management strategies using free sample promotions.