• Title/Summary/Keyword: 최소 표본수 (minimum sample size)

A Study on Face Recognition based on Partial Least Squares (부분 최소제곱법을 이용한 얼굴 인식에 관한 연구)

  • Lee Chang-Beom;Kim Do-Hyang;Baek Jang-Sun;Park Hyuk-Ro
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.393-400 / 2006
  • There are many feature extraction methods for face recognition. Face image data pose the small sample size problem, in which the number of feature variables exceeds the sample size, so a new method is needed to overcome it. This paper considers partial least squares (PLS) as a new dimension reduction technique for feature vectors. Principal component analysis (PCA), a conventional dimension reduction method, selects the components with maximum variability irrespective of the class information, so PCA does not necessarily extract features that are important for discriminating between classes. PLS, on the other hand, constructs the components so that their correlation with the class variable is maximized. PLS components are therefore more predictive than PCA components in classification. Experimental results on the Manchester and ORL databases show that PLS is preferable to PCA when classification is the goal and dimension reduction is needed.
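The contrast between the two projections can be sketched in a few lines. This is a minimal illustration rather than the authors' procedure: it uses the standard fact that, for centered data, the first PLS weight vector is proportional to X^T y (the direction of maximal sample covariance with the response), while the first PCA direction is the leading eigenvector of the covariance matrix. The helper names and the toy data are hypothetical.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def pca_first_direction(X, iters=200):
    """Leading eigenvector of the sample covariance, by power iteration."""
    Xc = center(X)
    p = len(Xc[0])
    cov = [[sum(r[i] * r[j] for r in Xc) / (len(Xc) - 1) for j in range(p)]
           for i in range(p)]
    w = unit([1.0] * p)
    for _ in range(iters):
        w = unit([sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)])
    return w

def pls_first_direction(X, y):
    """First PLS weight vector: proportional to X_c^T y_c."""
    Xc = center(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    p = len(Xc[0])
    return unit([sum(r[j] * t for r, t in zip(Xc, yc)) for j in range(p)])

# Hypothetical toy data: feature 0 has large variance but is unrelated to
# the class; feature 1 has small variance but separates the two classes.
spread = [-10.0, -5.0, 0.0, 5.0, 10.0]
X = [[s, (1.0 if label else -1.0) + 0.01 * s]
     for label in (0, 1) for s in spread]
y = [float(label) for label in (0, 1) for _ in spread]

pca_w = pca_first_direction(X)  # dominated by the high-variance feature 0
pls_w = pls_first_direction(X, y)  # dominated by the discriminative feature 1
```

On this toy data the PCA direction follows the high-variance feature and the PLS direction follows the class-separating one, which is the behaviour the abstract describes.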

Statistical methods for testing tumor heterogeneity (종양 이질성을 검정을 위한 통계적 방법론 연구)

  • Lee, Dong Neuck;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.331-348 / 2019
  • Understanding tumor heterogeneity, which arises from differences in the growth pattern and rate of change of metastatic tumors, is important for understanding the sensitivity of tumor cells to drugs and for finding appropriate therapies. When the N samples fall into distinct groups, differences in population means can often be tested using a t-test or ANOVA. However, these methods cannot be used when the groups are not distinguished, as with the data considered in this paper. Statistical methods have been studied to test heterogeneity between samples, and the minimum combination t-test method is one of them. In this paper, we propose a maximum combination t-test method that takes into account combinations that bisect the data at different ratios. We also propose a method based on the idea that examining the heterogeneity of a sample is equivalent to testing whether the optimal number of clusters in a cluster analysis is one. Through a simulation study, we verified that the proposed methods, the maximum combination t-test method and the gap statistic, have better type I error and power than the previously proposed method, and we also present results from a real data analysis.

A Modified W Statistic for Testing the Exponential Distribution (지수분포의 검정을 위한 수정된 W-통계량)

  • 김남현
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.141-146 / 2000
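  • Shapiro and Wilk (1972) proposed a test statistic for the exponential distribution when the location and scale parameters are unknown. It is formed as the ratio of the generalized least squares estimator of the scale parameter to the sample variance. However, this test statistic is not consistent. In this paper, we consider a statistic composed of two asymptotically efficient estimators of the scale parameter and derive its limiting distribution. A comparison of the power of the two statistics also shows that the proposed statistic has better power for distributions whose coefficient of variation is greater than or equal to 1.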

Unrelated question model with quantitative attribute by simple cluster sampling (단순집락추출법에 의한 양적속성의 무관질문모형)

  • 이기성;홍기학
    • The Korean Journal of Applied Statistics / v.11 no.1 / pp.141-150 / 1998
  • In this paper, we developed a one-stage cluster randomized response model for obtaining quantitative data, based on the Greenberg et al. (1971) model, for a population made up of clusters with a sensitive quantitative attribute. We obtained the minimum variance by calculating the cluster size and the optimum number of sample clusters under a given fixed cost. We compared the efficiency of our model with that of the Greenberg et al. model under simple random sampling.

Local Linear Logistic Classification of Microarray Data Using Orthogonal Components (직교요인을 이용한 국소선형 로지스틱 마이크로어레이 자료의 판별분석)

  • Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics / v.19 no.3 / pp.587-598 / 2006
  • The number of variables exceeds the number of samples in microarray data. We propose a nonparametric local linear logistic classification procedure using orthogonal components for classifying high-dimensional microarray data. The proposed method is based on the local likelihood and can be applied to multi-class classification. We applied the local linear logistic classification method, using PCA, PLS, and factor analysis components as new features, to the Leukemia data and the colon data, and compared its performance with that of conventional statistical classification procedures. The proposed method outperforms the conventional ones for each type of component, and among the three orthogonal components PLS showed the best performance when embedded in the proposed method.

A Study on the Estimation of Diameter Distribution and Volumetric Frequency of Joint Discs Using the Least Square Method (최소자승법을 이용한 원판형 절리의 직경분포와 체적빈도 추정에 관한 연구)

  • Song Jae-Joon
    • Tunnel and Underground Space / v.15 no.2 s.55 / pp.137-144 / 2005
  • A technique for estimating the joint diameter distribution using the least squares method is suggested. With the technique of Song and Lee, the diameter distribution can be obtained only from the trace length distribution defined in an infinite window, which must first be estimated from the contained trace length distribution. With the new technique, however, the diameter distribution can be obtained directly from the sample histogram of the contained trace lengths. Compared with the previous technique, it gives more accurate results for small joint samples and also provides the volumetric frequency, a joint geometry parameter. The new technique was verified using Monte Carlo simulations.

Uncertainty Estimation of AR Model Parameters Using a Bayesian technique (Bayesian 기법을 활용한 AR Model 매개변수의 불확실성 추정)

  • Park, Chan-Young;Park, Jong-Hyeon;Park, Min-Woo;Kwon, Hyun-Han
    • Proceedings of the Korea Water Resources Association Conference / 2016.05a / pp.280-280 / 2016
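  • The AR model, that is, the autoregressive model, is widely used to estimate forecasts of a given data series over time. An AR model expresses the current value of a variable as a function of its past values, and using such a time series model necessarily requires estimating its parameters. Common parameter estimation methods include stochastic approximation, the method of least squares, the method of autocorrelation, and the method of maximum likelihood. The method of maximum likelihood, the one most often used with AR models, is regarded as the most efficient method when the sample size is sufficiently large, but the numerical solution procedure is often complicated and sometimes fails to yield a solution. In addition, when the sample size is small, it generally gives results that are not very consistent. Because rainfall, streamflow, and similar records in Korea are often short, parameter estimates obtained by maximum likelihood carry inherent uncertainty, yet there are limits to presenting that uncertainty quantitatively. In this study, a Bayesian technique is used to provide the posterior distribution of the AR model parameters, so that the uncertainty interval of the parameters is expressed quantitatively; this is expected to yield more reliable forecasts from time series analysis.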
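To make the idea of a parameter posterior concrete, here is a minimal sketch for the simplest case. It is not the study's model: it assumes an AR(1) process, a known noise variance, a normal prior on the coefficient, and conditioning on the first observation, under which the posterior is normal in closed form. The function names and the short series are hypothetical.

```python
import math

def ar1_posterior(series, noise_var=1.0, prior_mean=0.0, prior_var=1e6):
    """Posterior mean and variance of phi in x_t = phi * x_{t-1} + e_t.

    Assumes e_t ~ N(0, noise_var) with noise_var known, a
    N(prior_mean, prior_var) prior on phi, and conditioning on the
    first observation (all simplifying assumptions for illustration).
    """
    prev, curr = series[:-1], series[1:]
    s_xx = sum(a * a for a in prev)
    s_xy = sum(a * b for a, b in zip(prev, curr))
    precision = s_xx / noise_var + 1.0 / prior_var
    mean = (s_xy / noise_var + prior_mean / prior_var) / precision
    return mean, 1.0 / precision

def credible_interval(mean, var, z=1.96):
    """Central interval of a normal posterior (z=1.96 gives about 95%)."""
    half = z * math.sqrt(var)
    return mean - half, mean + half

series = [1.0, 0.5, 0.25, 0.125]  # hypothetical short record
post_mean, post_var = ar1_posterior(series)
low, high = credible_interval(post_mean, post_var)

# A shorter record gives a wider posterior, i.e. more parameter uncertainty.
_, shorter_var = ar1_posterior(series[:3])
```

The width of the interval `(low, high)` is the quantitative statement of parameter uncertainty that a point estimate alone does not provide, and it grows as the record gets shorter.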

Plot Size for Investigating Forest Community Structure (IV) - Adequate Number of Plots for Shrub Stratum in a Mixed Forest Community of Abies holophylla and Broad-leaved Trees at Odaesan National Park - (삼림군집구조 조사를 위한 조사구 크기에 관한 연구(IV) - 오대산 국립공원지역 젓나무-활엽수 혼효림군집 관목층의 적정 조사구수 -)

  • 박인협;문광선
    • Korean Journal of Environment and Ecology / v.9 no.2 / pp.197-201 / 1996
  • A mixed forest community of Abies holophylla and broad-leaved trees in Odaesan National Park was studied to determine the adequate number of plots in the shrub stratum for investigating forest community structure. Thirty 5 m × 5 m plots were set up in the shrub stratum, and a species-area curve and a performance curve were drawn. The minimum number of plots at which a given percentage increase in the number of plots produced less than the same percentage increase in the number of species was six. The minimum number of plots at which a given percentage increase in the number of plots produced less than half of that percentage increase in the number of species was eleven. The minimum number of plots at which the dominant species was distinguished from the subdominant species was five. The minimum number of plots at which the first subdominant species was distinguished from the other subdominant species was ten. The difference in species diversity (H') between five or more plots and the total of thirty plots was less than 0.05. The similarity index was more than 70% between five or more plots and the total of thirty plots, and more than 80% between ten or more plots and the total of thirty plots. The conclusion is that the adequate number of 5 m × 5 m plots for the shrub stratum is about 5 in the general case and about 10 when more accuracy is required.

Forest Thematic Maps and Forest Statistics Using the k-Nearest Neighbor Technique for Pyeongchang-Gun, Gangwon-Do (kNN 기법을 이용한 강원도 평창군의 산림 주제도 작성과 산림통계량 추정)

  • Yim, Jong-Su;Kong, Gee Su;Kim, Sung Ho;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.96 no.3 / pp.259-268 / 2007
  • This study was conducted to produce forest thematic maps and estimate forest statistics for Pyeongchang-Gun using the kNN technique, which is applied in forest inventories to produce thematic maps of variables of interest, including at unobserved locations, by combining field plot data, remotely sensed data, and other digital map data. The estimation errors for three horizontal reference areas (HRAs), with radii of 20, 40, and 60 km respectively, were compared. Although the precision for the 40 km radius was lower than that for the 60 km radius, the 40 km radius was found to be an efficient HRA because the difference in precision was modest. With k=5 nearest neighbors for the selected HRA, the overall accuracy was high. As a result, thematic maps of number of trees, basal area, and growing stock per hectare were generated using k=5 neighbors within the HRA of 40 km radius. Compared with the forest statistics based on field sample plots, the means of each parameter estimated from the produced maps were underestimates.
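The estimation step described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: neighbors are chosen by Euclidean distance in a feature (for example spectral) space, only field plots inside the horizontal reference area are eligible, and the estimate is the unweighted mean of the k nearest plots. The function name, data layout, and numbers are all hypothetical.

```python
import math

def knn_estimate(target, plots, k=5, radius_km=40.0):
    """Estimate a forest variable at `target` from its k nearest field plots.

    target: dict with "xy" (km coordinates) and "features".
    plots:  list of dicts with "xy", "features", and an observed "value".
    Only plots within radius_km of the target (the horizontal reference
    area) are candidates; nearness is measured in feature space.
    """
    candidates = [p for p in plots
                  if math.dist(p["xy"], target["xy"]) <= radius_km]
    if not candidates:
        raise ValueError("no field plots inside the reference area")
    candidates.sort(key=lambda p: math.dist(p["features"], target["features"]))
    nearest = candidates[:k]
    # Unweighted mean of the neighbors (an assumption; weights are possible).
    return sum(p["value"] for p in nearest) / len(nearest)

# Hypothetical field plots: location, feature vector, growing stock value.
plots = [
    {"xy": (0.0, 0.0), "features": (0.10, 0.20), "value": 100.0},
    {"xy": (10.0, 5.0), "features": (0.20, 0.10), "value": 110.0},
    {"xy": (30.0, 0.0), "features": (0.80, 0.90), "value": 300.0},
    # Same features as the target, but 85 km away.
    {"xy": (90.0, 0.0), "features": (0.15, 0.15), "value": 500.0},
]
target = {"xy": (5.0, 0.0), "features": (0.15, 0.15)}

within_hra = knn_estimate(target, plots, k=2, radius_km=40.0)
wide_hra = knn_estimate(target, plots, k=1, radius_km=100.0)
```

In the toy data, the distant plot is excluded by the 40 km reference area even though its features match the target exactly, which shows how the choice of HRA radius changes which neighbors contribute to the estimate.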

Analysis of internet addiction in Korean adolescents using sparse partial least-squares regression (희소 부분 최소 제곱법을 이용한 우리나라 청소년 인터넷 중독 자료 분석)

  • Han, Jeongseop;Park, Soobin;Lee, onghwan
    • The Korean Journal of Applied Statistics / v.31 no.2 / pp.253-263 / 2018
  • Internet addiction in adolescents is an important social issue. In this study, sparse partial least-squares regression (SPLS) was applied to internet addiction data in Korean adolescent samples. The internet addiction score and various clinical and psychopathological features were collected and analyzed from self-reported questionnaires. We considered three PLS methods and compared the performance in terms of prediction and sparsity. We found that the SPLS method with the hierarchical likelihood penalty was the best; in addition, two aggression features, AQ and BSAS, are important to discriminate and explain latent features of the SPLS model.