• Title/Summary/Keyword: statistical approach

Data Mining Model Approach for The Risk Factor of BMI - By Medical Examination of Health Data -

  • Lee Jea-Young;Lee Yong-Won
    • Communications for Statistical Applications and Methods, v.12 no.1, pp.217-227, 2005
  • Data mining is a new approach to extracting useful information through effective analysis of very large data sets in numerous fields. We applied this technique to the medical records of 35,671 people. The data were sorted by BMI score and divided into two groups. We sought to identify BMI risk factors in the overweight group by analyzing the raw data with a data mining approach. The results of the C5.0 decision tree method showed that the important risk factors for BMI score are triglyceride, gender, age, and HDL cholesterol. Odds ratios of the major risk factors were calculated to show the individual effect of each factor.
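
The odds ratio mentioned in the abstract above can be illustrated with a minimal sketch. The 2x2 counts below are hypothetical, not taken from the paper, and the function name `odds_ratio` and the Wald-type 95% interval are assumptions made for illustration only.

```python
import math


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Return the odds ratio of a 2x2 table with a Wald 95% confidence interval."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = 1.96  # approximate 97.5% quantile of the standard normal
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: high-triglyceride vs. normal-triglyceride subjects,
# split into overweight (cases) and non-overweight (controls).
or_hat, ci_low, ci_high = odds_ratio(40, 60, 20, 80)
print(f"OR = {or_hat:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

An odds ratio above 1 with an interval that excludes 1 would suggest that the factor is associated with higher odds of being in the overweight group.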

Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods, v.23 no.4, pp.355-362, 2016
  • Ensemble methods often improve prediction in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance prediction accuracy in kernel ridge regression and kernel logistic regression classification. We apply bagging and random forests to these two kernel-based predictive models and present the procedure for embedding them. Our proposals are tested on numerous synthetic and real datasets and compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.

A comparative study of the Gini coefficient estimators based on the regression approach

  • Mirzaei, Shahryar;Borzadaran, Gholam Reza Mohtashami;Amini, Mohammad;Jabbari, Hadi
    • Communications for Statistical Applications and Methods, v.24 no.4, pp.339-351, 2017
  • Resampling approaches were the first techniques employed to compute a variance for the Gini coefficient; however, many authors have shown that the Gini coefficient and its corresponding variance can be obtained from a regression model. Despite the simplicity of the regression approach for computing a standard error of the Gini coefficient, the use of the proposed regression model has been challenging in economics. Therefore, in this paper, we present a comparative study of the regression approach and resampling techniques. The regression method is shown to overestimate the standard error of the Gini index. The simulations show that the Gini estimator based on the modified regression model is also consistent and asymptotically normal, with less divergence from the normal distribution than the resampling techniques.
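
As a rough illustration of the resampling baseline that the abstract above compares against, the sketch below computes the sample Gini coefficient from sorted data and a bootstrap standard error. It does not implement the paper's regression approach; the function names, the income values, and the number of bootstrap replicates are all assumptions chosen for illustration.

```python
import random


def gini(values):
    """Sample Gini coefficient: 2*sum(i*x_(i)) / (n*sum(x)) - (n+1)/n."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or xs[0] < 0 or total <= 0:
        raise ValueError("need non-negative values with a positive sum")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def bootstrap_se(values, n_boot=500, seed=0):
    """Bootstrap standard error of the Gini coefficient (values must be > 0)."""
    rng = random.Random(seed)
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations with replacement and re-estimate.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(gini(resample))
    mean = sum(estimates) / n_boot
    variance = sum((g - mean) ** 2 for g in estimates) / (n_boot - 1)
    return variance ** 0.5


# Hypothetical incomes, strictly positive so every resample has a positive sum.
incomes = [12.0, 15.5, 18.0, 22.0, 30.0, 41.0, 55.0, 90.0]
print(f"Gini = {gini(incomes):.4f}, bootstrap SE = {bootstrap_se(incomes):.4f}")
```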

Estimating Fuzzy Regression with Crisp Input-Output Using Quadratic Loss Support Vector Machine

  • Hwang, Chang-Ha;Hong, Dug-Hun;Lee, Sang-Bock
    • Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집), 2004.10a, pp.53-59, 2004
  • The support vector machine (SVM) approach to regression can be found in the information science literature. SVM implements the regularization technique, which was introduced as a way of controlling the smoothness properties of the regression function. In this paper, we propose a new estimation method based on quadratic loss SVM for Tanaka's linear fuzzy regression model, and we further propose an estimation method for nonlinear fuzzy regression. This approach is very attractive for evaluating nonlinear fuzzy models with crisp input and output data.

Multinomial Kernel Logistic Regression via Bound Optimization Approach

  • Shim, Joo-Yong;Hong, Dug-Hun;Kim, Dal-Ho;Hwang, Chang-Ha
    • Communications for Statistical Applications and Methods, v.14 no.3, pp.507-516, 2007
  • Multinomial logistic regression is probably the most popular representative of probabilistic discriminative classifiers for multiclass classification problems. In this paper, a kernel variant of multinomial logistic regression is proposed by combining Newton's method with a bound optimization approach. This formulation allows us to apply highly efficient approximation methods that effectively overcome the conceptual and numerical problems of standard multiclass kernel classifiers. We also provide an approximate cross validation (ACV) method for choosing the hyperparameters that affect the performance of the proposed approach. Experimental results are then presented to indicate the performance of the proposed procedure.

CONSISTENCY AND ASYMPTOTIC NORMALITY OF A MODIFIED LIKELIHOOD APPROACH CONTINUAL REASSESSMENT METHOD

  • Kang, Seung-Ho
    • Journal of the Korean Statistical Society, v.32 no.1, pp.33-46, 2003
  • The continual reassessment method (CRM) provides a Bayesian estimation of the maximum tolerated dose (MTD) in phase I clinical trials. The CRM has been proposed as an alternative to the standard design. The CRM has been modified to improve practical feasibility and, recently, the likelihood approach CRM has been proposed. In this paper, we investigate the consistency and asymptotic normality of the modified likelihood approach CRM, in which the maximum likelihood estimate is used instead of the posterior mean. Small-sample properties of the consistency are examined using complete enumeration. Both the asymptotic results and their small-sample properties show that the modified CRML outperforms the standard design.

A two-step approach for variable selection in linear regression with measurement error

  • Song, Jiyeon;Shin, Seung Jun
    • Communications for Statistical Applications and Methods, v.26 no.1, pp.47-55, 2019
  • It is important to identify informative variables in high-dimensional data analysis; however, this becomes a challenging task when covariates are contaminated by measurement error, because of the bias that the error induces. In this article, we present a two-step approach for variable selection in the presence of measurement error. In the first step, we directly select important variables from the contaminated covariates as if there were no measurement error. In the second step, we apply orthogonal regression to obtain unbiased estimates of the regression coefficients identified in the first step. In addition, we propose a modification of the two-step approach to further enhance the variable selection performance. Various simulation studies demonstrate the promising performance of the proposed method.

A Study on the Analysis of Population Dynamics and the Model of Population Relocation (人口過程의 分析과 人口配置計劃의 모델模索)

  • 박찬계;함종욱
    • Journal of the Korean Statistical Society, v.10, pp.145-157, 1981
  • Regional relocation of the population in Korea is strongly required, on natural and environmental grounds, for substantial economic growth and for a vigorous revival of the national economy, especially in the face of internationalization. This paper aims to analyze the population distribution by the regional and special characteristics of inter-regional migration and to indicate the direction of population policy through model building. Methods of relocating the population by region are examined, proceeding from the approach based on Haurin's production function to the approach based on the utility function. The development model is also examined; how useful these approaches are depends on giving scientific and comprehensive planning for population problems precedence over coercive policy.

Maximum product of spacings under a generalized Type-II progressive hybrid censoring scheme

  • Jeon, Young Eun;Kang, Suk-Bok;Seo, Jung-In
    • Communications for Statistical Applications and Methods, v.29 no.6, pp.665-677, 2022
  • This paper proposes a new estimation method based on the maximum product of spacings for estimating the unknown parameters of the three-parameter Weibull distribution under a generalized Type-II progressive hybrid censoring scheme, which guarantees a constant number of observations and an appropriate experiment duration. The proposed approach is appropriate for situations where maximum likelihood estimation is invalid, especially when the shape parameter is less than unity. Furthermore, Monte Carlo simulation shows that it achieves improved performance in terms of bias. In particular, the superiority of this approach is revealed even under conditions where maximum likelihood estimation satisfies the classical asymptotic properties. Finally, to illustrate the practical application of the proposed approach, a real data analysis is conducted, and the superiority of the proposed method is demonstrated through a simple goodness-of-fit test.
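
The maximum product of spacings idea in the abstract above can be sketched for the simplest case. This is not the paper's estimator: it assumes a complete (uncensored) sample, fixes the Weibull location parameter at 0, and uses a coarse grid search in place of a proper optimizer. The data values and function names are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale, location=0.0):
    """CDF of the Weibull distribution with the given shape, scale, location."""
    if x <= location:
        return 0.0
    return 1.0 - math.exp(-(((x - location) / scale) ** shape))


def log_spacing_score(sample, shape, scale, location=0.0):
    """Mean log spacing: the quantity the MPS estimator maximizes."""
    xs = sorted(sample)
    # Pad with F = 0 and F = 1 so that n observations give n + 1 spacings.
    cdf_values = [0.0] + [weibull_cdf(x, shape, scale, location) for x in xs] + [1.0]
    spacings = [high - low for low, high in zip(cdf_values, cdf_values[1:])]
    if min(spacings) <= 0.0:
        return float("-inf")  # ties or numerically degenerate spacings
    return sum(math.log(s) for s in spacings) / len(spacings)


# Hypothetical complete sample of lifetimes.
data = [0.8, 1.1, 1.9, 2.4, 3.0, 4.2]

# Coarse grid over shape and scale (an assumption made to keep the sketch short).
shapes = [s / 10 for s in range(5, 31)]
scales = [c / 10 for c in range(5, 51)]
best_score, best_shape, best_scale = max(
    (log_spacing_score(data, k, lam), k, lam) for k in shapes for lam in scales
)
print(f"shape = {best_shape}, scale = {best_scale}, score = {best_score:.4f}")
```

Because every spacing is bounded between 0 and 1, the objective stays bounded above even when the likelihood does not, which is one reason the approach is attractive when the shape parameter is below one.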

Detecting outliers in segmented genomes of flu virus using an alignment-free approach

  • Daoud, Mosaab
    • Genomics & Informatics, v.18 no.1, pp.2.1-2.11, 2020
  • In this paper, we propose a new approach to detecting outliers in a set of segmented genomes of the flu virus, a data set with a heterogeneous set of sequences. The approach has the following computational phases: feature extraction, which is a mapping into feature space; an alignment-free distance measure that quantifies the distance between any two segmented genomes; and a mapping into distance space to analyze a quantum of distance values. The approach is implemented using supervised and unsupervised learning modes. The experiments show robustness in detecting outliers among the segmented genomes of the flu virus.
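
The phases listed in the abstract above (feature extraction, alignment-free distance, analysis in distance space) can be sketched generically. The abstract does not specify the features or the distance measure, so the k-mer frequency profiles, the Euclidean distance, and the "largest mean distance" outlier rule below are all assumptions, and the sequences are made up.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=2, alphabet="ACGT"):
    """Map a sequence to a vector of relative k-mer frequencies."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def profile_distance(seq_a, seq_b, k=2):
    """Euclidean distance between k-mer profiles; no alignment is needed."""
    pa, pb = kmer_profile(seq_a, k), kmer_profile(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


# Made-up sequences; "c" is deliberately unlike the other two.
sequences = {"a": "ACGTACGTAC", "b": "ACGTACGTTC", "c": "GGGGGGGGGG"}

# Distance space: mean distance from each sequence to all the others.
mean_distance = {
    name: sum(profile_distance(seq, other)
              for other_name, other in sequences.items() if other_name != name)
    / (len(sequences) - 1)
    for name, seq in sequences.items()
}
outlier = max(mean_distance, key=mean_distance.get)
print(outlier, mean_distance)
```

For a segmented genome, one possible extension would be to compute such a distance per segment and combine the results, but how the paper combines segments is not stated in the abstract.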