• Title/Summary/Keyword: robust statistic


A Test on a Specific Set of Outlier Candidates in a Linear Model (선형모형에서 특정 이상치 후보군에 대한 검정)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics, v.27 no.2, pp.307-315, 2014
  • An exact distribution of the test statistic for multiple outlier candidates does not generally exist; therefore, tests of individual outliers (or tests using simulated critical values) are usually conducted instead of tests for groups of outliers. This article concerns procedures for testing outlying observations. We suggest a method that can be applied to arbitrary observations or to multiple outlier candidates flagged by an outlier detection method. A Monte Carlo study is used to compare the performance of the proposed method with that of other methods.

Does Individual's Income always Matter Happiness?: Evidence from China

  • HE, Yugang;WU, Renhong
    • Journal of Wellbeing Management and Applied Psychology, v.3 no.1, pp.21-31, 2020
  • As people's income rises dramatically, their happiness seems not to be as high as expected. In fact, there are two competing arguments about the relationship between income level and happiness. The debate centers on whether the correlation between income and the probability of happiness is positive or negative. We therefore hypothesize that the relationship between income and the probability of happiness follows an inverted U-shaped curve. This paper takes China as an example to explore the effect of income on happiness. Data from the 2015 Chinese General Social Survey (CGSS) are used to conduct empirical analyses under the Probit model and the zero-inflated Poisson model. The empirical findings indicate that the effect of income on happiness follows an inverted U-shaped curve and is statistically significant. Meanwhile, spouse's income, educational level, marriage time and house property have positive and significant effects on happiness. Conversely, age and local living standards have negative and significant effects on happiness. Although registered residence and children have negative effects on happiness, these effects are not statistically significant. To check the robustness of the empirical results, we re-estimate the models after adjusting the sample size. The robustness tests confirm that the empirical results are robust. This paper also makes a small contribution to the current literature with a sample from China.

A Probabilistic Combination Method of Minimum Statistics and Soft Decision for Robust Noise Power Estimation in Speech Enhancement (강인한 음성향상을 위한 Minimum Statistics와 Soft Decision의 확률적 결합의 새로운 잡음전력 추정기법)

  • Park, Yun-Sik;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea, v.26 no.4, pp.153-158, 2007
  • This paper presents a new approach to noise estimation to improve speech enhancement in non-stationary noisy environments. The proposed method combines two separate noise power estimates, one provided by minimum statistics (MS) for speech presence and one by soft decision (SD) for speech absence, in accordance with the speech absence probability (SAP) in each frequency bin. The performance of the proposed algorithm is evaluated by a subjective test under various noise environments and yields better results than the conventional MS-based or SD-based schemes.

Transmission and Disequilibrium Tests Based on Sibship Data (형제 및 자매의 유전자형 자료에 기초한 전달불균형 검정법에 관한 연구)

  • Kim, Jin-Heum;Jang, Yang-Soo
    • The Korean Journal of Applied Statistics, v.21 no.1, pp.81-94, 2008
  • Family-based tests such as the transmission and disequilibrium tests (TDT) have proved to be powerful tools in the search for disease genes. Unlike case-control studies, these tests are not affected by population admixture, which can lead to spurious association of multiple highly linked markers with disease-susceptibility genes. These tests have largely required knowledge of parental marker genotypes. However, parental data are often not available for late-onset diseases. In this article we propose sib-TDTs that overcome this problem by using marker data from unaffected sib(s) instead of parents. To this end, we first define a Mantel-Haenszel-type statistic for each haplotype and then propose two tests based on this statistic. Simulation studies suggest that the proposed tests are robust to population admixture and are monotonically increasing as the relative risk increases, irrespective of the mode of inheritance. We also illustrate the proposed tests with data from the Yonsei Cardiovascular Genome Center.

(A Question Type Classifier based on a Support Vector Machine for a Korean Question-Answering System) (한국어 질의응답시스템을 위한 지지 벡터기계 기반의 질의유형분류기)

  • 김학수;안영훈;서정연
    • Journal of KIISE:Software and Applications, v.30 no.5_6, pp.466-475, 2003
  • To build an efficient Question-Answering (QA) system, a question type classifier is needed. It classifies users' queries into predefined categories regardless of the surface form of a question. In this paper, we propose a question type classifier using a Support Vector Machine (SVM). The question type classifier first extracts features such as lexical forms, parts of speech and semantic markers from a user's question. The system uses the $\chi^2$ statistic to select important features. The selected features are represented as a vector. Finally, an SVM categorizes questions into predefined categories according to the extracted features. In the experiment, the proposed system achieved 86.4% accuracy. The system precisely classifies question types without using any rules such as lexico-syntactic patterns. Therefore, the system is robust and easily portable to other domains.
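The chi-square feature selection step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common 2x2 contingency-table form of the statistic for one feature and one question category; the function names, the feature names and the counts are hypothetical and are not taken from the paper.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/category contingency table.

    n11: questions in the category that contain the feature
    n10: questions outside the category that contain the feature
    n01: questions in the category that lack the feature
    n00: questions outside the category that lack the feature
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the feature carries no information
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(tables, top_k):
    """Return the top_k feature names ranked by chi-square score."""
    scores = {name: chi_square(*counts) for name, counts in tables.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical counts for three lexical features and one question category.
tables = {
    "who": (40, 5, 10, 45),
    "the": (25, 25, 25, 25),
    "when": (5, 30, 45, 20),
}
print(select_features(tables, top_k=2))
```

A feature that is spread evenly across categories (here "the") scores zero and is dropped, while features concentrated inside or outside the category are kept.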

Performance Analysis of Projection Statistics through Method of Clutter Covariance Matrix Estimation for STAP (STAP를 위한 간섭 공분산 행렬의 예측 방법에 따른 Projection Statistics의 성능 분석)

  • Kang, Sung-Yong;Kim, Kyung-Soo;Jeong, Ji-Chai
    • The Journal of Korean Institute of Electromagnetic Engineering and Science, v.22 no.1, pp.89-97, 2011
  • We analyze the performance of various techniques for overcoming the performance degradation of STAP caused by nonhomogeneous clutter. The performance of the NHD, which is used to eliminate outliers from nonhomogeneous clutter, is improved by using projection statistics (PS), which are robust to multiple outliers. A method of clutter covariance matrix estimation using a median value and the conventional method are also investigated and compared. In the STAP simulation results, clutter covariance matrix estimation using a median value outperforms the conventional method in terms of SINR loss and MSMI, for both single and multiple targets, regardless of the NHD method.

Multi-Level based Application Traffic Classification Method (멀티 레벨 기반의 응용 트래픽 분석 방법)

  • Oh, Young-Suk;Park, Jun-Sang;Yoon, Sung-Ho;Park, Jin-Wan;Lee, Sang-Woo;Kim, Myung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences, v.35 no.8B, pp.1170-1178, 2010
  • As the number of users and the volume of application traffic on high-speed networks increase, application traffic classification is becoming increasingly important for efficient network resource management. Although a number of methods and algorithms for traffic classification have been introduced, they have limitations in terms of accuracy and completeness. In this paper we propose a multi-level architecture for application traffic classification that integrates several signature-based methods and a behavior algorithm, and analyzes traffic using the correlation among traffic flows. By combining the strengths and compensating for the weaknesses of the individual methods, we construct a flexible and robust multi-level classification system. Experiments with our campus network traffic demonstrate the performance and validity of the proposed mechanism.

Mean-shortfall optimization problem with perturbation methods (퍼터베이션 방법을 활용한 평균-숏폴 포트폴리오 최적화)

  • Won, Hayeon;Park, Seyoung
    • The Korean Journal of Applied Statistics, v.34 no.1, pp.39-56, 2021
  • Much research has been done on portfolio optimization since Markowitz (1952) published a diversified investment model. Markowitz's mean-variance portfolio optimization problem is established under the assumption that returns follow a normal distribution. However, in practice, returns do not follow a normal distribution, and variance is not a robust statistic because it is heavily influenced by outliers. To overcome these issues, the mean-shortfall portfolio model was proposed, which uses downside risk (shortfall) as a risk index. In this paper, we propose a perturbation method that uses the shortfall as the risk index of the portfolio. The proposed portfolio uses an adaptive Lasso to obtain a sparse and stable asset selection, which can reduce management and transaction costs. The proposed optimization is easy to apply because it can be solved efficiently by linear programming. In our real data analysis, we show the validity of the proposed perturbation method.
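The claim above that variance is not a robust statistic can be checked with a small sketch. It compares the variance with the median absolute deviation (MAD, a standard robust measure of spread that the abstract does not mention) on toy data, and adds one illustrative tail-based risk measure. The `tail_shortfall` definition and all data are assumptions made for illustration; the paper's exact shortfall definition may differ.

```python
import statistics


def mad(values):
    """Median absolute deviation: a robust measure of spread."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def tail_shortfall(returns, alpha=0.05):
    """Average loss over the worst alpha fraction of returns.

    Illustrative definition only (assumed, not taken from the paper).
    """
    k = max(1, int(alpha * len(returns)))
    worst = sorted(returns)[:k]
    return -sum(worst) / k


clean = [1, 2, 3, 4, 5]
contaminated = [1, 2, 3, 4, 100]  # a single outlier replaces the largest value

print(statistics.pvariance(clean), statistics.pvariance(contaminated))
print(mad(clean), mad(contaminated))
print(tail_shortfall([-0.10, -0.02, 0.01, 0.03, 0.04], alpha=0.4))
```

One outlier inflates the variance by several orders of magnitude while the MAD is unchanged, which is the sense in which variance is "heavily influenced by outliers".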

Firefighting and Cancer: A Meta-analysis of Cohort Studies in the Context of Cancer Hazard Identification

  • Nathan L. DeBono;Robert D. Daniels;Laura E. Beane Freeman;Judith M. Graber;Johnni Hansen;Lauren R. Teras;Tim Driscoll;Kristina Kjaerheim;Paul A. Demers;Deborah C. Glass;David Kriebel;Tracy L. Kirkham;Roland Wedekind;Adalberto M. Filho;Leslie Stayner;Mary K. Schubauer-Berigan
    • Safety and Health at Work, v.14 no.2, pp.141-152, 2023
  • Objective: We performed a meta-analysis of epidemiological results for the association between occupational exposure as a firefighter and cancer as part of the broader evidence synthesis work of the IARC Monographs program. Methods: A systematic literature search was conducted to identify cohort studies of firefighters followed for cancer incidence and mortality. Studies were evaluated for the influence of key biases on results. Random-effects meta-analysis models were used to estimate the association between ever-employment and duration of employment as a firefighter and risk of 12 selected cancers. The impact of bias was explored in sensitivity analyses. Results: Among the 16 included cancer incidence studies, the estimated meta-rate ratio, 95% confidence interval (CI), and heterogeneity statistic ($I^2$) for ever-employment as a career firefighter compared mostly to general populations were 1.58 (1.14-2.20, 8%) for mesothelioma, 1.16 (1.08-1.26, 0%) for bladder cancer, 1.21 (1.12-1.32, 81%) for prostate cancer, 1.37 (1.03-1.82, 56%) for testicular cancer, 1.19 (1.07-1.32, 37%) for colon cancer, 1.36 (1.15-1.62, 83%) for melanoma, 1.12 (1.01-1.25, 0%) for non-Hodgkin lymphoma, 1.28 (1.02-1.61, 40%) for thyroid cancer, and 1.09 (0.92-1.29, 55%) for kidney cancer. Ever-employment as a firefighter was not positively associated with lung, nervous system, or stomach cancer. Results for mesothelioma and bladder cancer exhibited low heterogeneity and were largely robust across sensitivity analyses. Conclusions: There is epidemiological evidence to support a causal relationship between occupational exposure as a firefighter and certain cancers. Challenges persist in the body of evidence related to the quality of exposure assessment, confounding, and medical surveillance bias.
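To make the reported quantities concrete, the sketch below shows one common way to obtain a pooled rate ratio, its 95% confidence interval and the heterogeneity statistic from per-study estimates. It assumes the DerSimonian-Laird random-effects estimator and the Cochran's Q form of the heterogeneity statistic; the abstract does not say which estimator the authors used, and the input numbers are hypothetical.

```python
import math


def random_effects_meta(log_rr, se):
    """Pool log rate ratios with a DerSimonian-Laird random-effects model.

    log_rr: per-study log rate ratios
    se:     per-study standard errors of the log rate ratios
    """
    w = [1.0 / s ** 2 for s in se]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))  # Cochran's Q
    df = len(log_rr) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation due to heterogeneity
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in se]
    pooled = sum(wi * y for wi, y in zip(w_re, log_rr)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "rr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se_pooled),
               math.exp(pooled + 1.96 * se_pooled)),
        "i2": i2,
    }


# Hypothetical study-level inputs, not taken from the meta-analysis.
result = random_effects_meta(
    log_rr=[math.log(1.1), math.log(1.4), math.log(1.2)],
    se=[0.10, 0.15, 0.08],
)
print(result)
```

A heterogeneity value near 0% means the studies agree about as well as sampling error allows, which is why the abstract singles out the mesothelioma and bladder cancer results.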

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems, v.22 no.1, pp.139-157, 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
    The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem by using a real dataset from Korean companies. The research data included 1800 externally non-audited firms, of which 900 filed for bankruptcy and 900 did not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy on the latter portion was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
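The random subspace KNN ensemble described in this abstract can be sketched in a few lines. In the sketch, each member gets a randomly drawn feature subset and a randomly drawn k, and predictions are combined by majority vote. The random draws stand in for the genetic algorithm that the paper uses to optimize both choices, so this is the baseline idea rather than the proposed method; all names and data are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k, features):
    """Classify x by majority vote of its k nearest neighbors on a feature subset."""
    def dist(a, b):
        return sum((a[f] - b[f]) ** 2 for f in features)

    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def build_ensemble(n_features, n_members, subspace_size, k_choices, seed=0):
    """Draw a random feature subset and a random k for every ensemble member."""
    rng = random.Random(seed)
    return [
        (rng.sample(range(n_features), subspace_size), rng.choice(k_choices))
        for _ in range(n_members)
    ]


def ensemble_predict(ensemble, train_x, train_y, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(
        knn_predict(train_x, train_y, x, k, feats) for feats, k in ensemble
    )
    return votes.most_common(1)[0][0]


# Toy data: four hypothetical financial ratios, label 1 = bankrupt.
train_x = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1],
           [10, 10, 10, 10], [9, 9, 9, 9], [10, 9, 10, 9]]
train_y = [0, 0, 0, 1, 1, 1]

ensemble = build_ensemble(n_features=4, n_members=5, subspace_size=2, k_choices=[1, 3])
print(ensemble_predict(ensemble, train_x, train_y, [0.5, 0.5, 0.5, 0.5]))
print(ensemble_predict(ensemble, train_x, train_y, [9.5, 9.5, 9.5, 9.5]))
```

Replacing the random draws in `build_ensemble` with a search over feature subsets and k values, scored on held-out accuracy, is the part the abstract assigns to the genetic algorithm.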