• Title/Summary/Keyword: Ordinal variable

Ordinal Variable Selection in Decision Trees (의사결정나무에서 순서형 분리변수 선택에 관한 연구)

  • Kim Hyun-Joong
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.149-161 / 2006
  • The most important component of a decision tree algorithm is the rule for split variable selection. Many earlier algorithms, such as CART and C4.5, use a greedy search algorithm for variable selection. Recently, many methods have been developed to cope with the weaknesses of the greedy search algorithm. Most algorithms have different selection criteria depending on the type of variable: continuous or nominal. However, ordinal variables are usually treated as continuous ones. This approach did not cause any trouble for the methods using a greedy search algorithm. However, it may cause problems for the newer algorithms because they use statistical methods valid for continuous or nominal types only. In this paper, we propose an ordinal variable selection method that uses the Cramér-von Mises testing procedure. We performed comparisons among CART, C4.5, QUEST, CRUISE, and the new method. The new method was shown to have good variable selection power for ordinal variables.

A Study on the Scoring Method of the Ordinal Variable

  • Chung, Sung-S.;Chun, Young-M.;Oh, Seon-J.
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.95-105 / 2004
  • The main characteristic of the ordinal scale is that its categories have a logically or continuously ordered relationship to each other. A continuous type permits measuring degrees of differences among categories. Also, the specific amount of differences is important. In this paper we consider the scoring method using a dummy variable based on distance among categories.

Goodness-of-fit tests for a proportional odds model

  • Lee, Hyun Yung
    • Journal of the Korean Data and Information Science Society / v.24 no.6 / pp.1465-1475 / 2013
  • The chi-square type test statistic is the most commonly used statistic for testing goodness of fit of a multinomial logistic regression model, whose data are either grouped (binomial) or ungrouped (binary) according to covariate pattern. The chi-square type statistic is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic does not adhere well to the chi-square distribution and is also not a satisfactory form of measurement in itself. Currently, goodness of fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which rows consist of all possible cross-classifications of the model covariates, and columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for a proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared these with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses, usually with more than two categories, by connecting the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a latent variable that follows a normal distribution; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, for modeling ordinal responses and predicting outcomes more flexibly. We illustrate the proposed methods with simulation studies in comparison with existing methods and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016, investigating the nonparametric relationship between smoking behavior and coffee intake.
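The latent-variable description above can be illustrated with a minimal sketch. It covers only the plain parametric ordinal probit link, not the BSAR semiparametric model; the function names, the linear predictor value `eta`, and the cut-off points are hypothetical choices for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probit_probs(eta, cutpoints):
    """Category probabilities when a latent z ~ N(eta, 1) is cut at sorted cutpoints.

    Category k is observed when z falls in (c[k-1], c[k]], so
    P(y = k) = Phi(c[k] - eta) - Phi(c[k-1] - eta).
    """
    # Pad with -inf and +inf so the first and last categories are intervals too.
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - eta) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]

# Hypothetical example: three ordered categories, two cut-off points.
probs = ordinal_probit_probs(eta=0.3, cutpoints=[-0.5, 0.8])
print([round(p, 3) for p in probs])
```

Raising `eta` shifts probability mass toward the higher categories, which is how covariates act on the response in this kind of model.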

DISCRIMINATION OF IN-ORDINAL STATE IN ROOM TEMPERATURE BASED ON STATISTICAL ANALYSIS

  • Takanashi, Ken-ichi;Daisuke Kozeki;Yoshiyuki Matsubara
    • Proceedings of the Korea Institute of Fire Science and Engineering Conference / 1997.11a / pp.484-491 / 1997
  • In this paper, an approach based on multivariate analysis is proposed to determine the in-ordinal condition of a room. In this approach, the distance of a state from the ordinal condition is evaluated by the Mahalanobis distance. The temperature changes of a room were measured, and their statistical characteristics, such as distribution type, mean value, and standard deviation, were studied. The applicability of the method to fire detection is also investigated.
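As a rough illustration of scoring a state by its Mahalanobis distance from a reference condition, the sketch below uses two features. Reading "ordinal condition" as the normal state of the room is an assumption, as are the feature choice (temperature and its rate of change), the sample readings, and all function names; none of these come from the paper.

```python
import math

def mean_and_cov_2d(samples):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_2d(point, mean, cov):
    """Mahalanobis distance of a 2-D point, with the 2x2 inverse written out."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical readings in the normal state: (temperature in C, change in C/min).
normal_state = [(20.0, 0.1), (20.5, -0.1), (21.0, 0.0), (20.2, 0.2), (20.8, -0.2)]
mean, cov = mean_and_cov_2d(normal_state)

typical = mahalanobis_2d((20.6, 0.0), mean, cov)
suspect = mahalanobis_2d((35.0, 3.0), mean, cov)
print(typical < suspect)
```

A detector built this way would flag a reading when its distance exceeds a threshold; how that threshold is chosen is left open here.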

Optimal Process Condition for Products with Multi-Categorical Ordinal Quality Characteristic (다범주 순서형 품질특성을 갖는 제품의 최적 공정조건 결정에 관한 연구)

  • Kim Sang-Cheol;Yun Won-Young;Chun Young-Rok
    • Journal of Korean Society for Quality Management / v.32 no.3 / pp.109-125 / 2004
  • This paper deals with an optimal process control problem in the production of hull structural steel plate with a high defect rate. The main quality characteristic (dependent variable) is the internal quality (defect) of plates, which depends on process parameters (independent variables). The dependent variable (quality characteristic) has three ordered categories, and there are 35 independent variables (29 continuous variables and 6 categorical variables). First, we determine the main factors and develop a mathematical model relating the predicted probabilities of internal quality to the main factors. Second, we find the optimal process condition of the main factors through analysis of variance (ANOVA) using simulation. We consider three models to obtain the main factors and the optimal process condition: linear, quadratic, and error models.

Treatment of Missing Data by Decomposition and Voting with Ordinal Data

  • Chun, Young-M.;Son, Hong-K.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.585-598 / 2007
  • In practice, it is difficult to obtain complete data when conducting a questionnaire survey, and ignoring missing values in statistical analysis yields inefficient results. Therefore, we use imputation methods that evaluate the quality of the data. This study proposes an imputation method by decomposition and voting for ordinal data. First, the data are sorted by each variable. Next, imputation methods are applied at each decomposition level. The last step is the selection of values by voting. The proposed method is evaluated by accuracy and RMSE. In conclusion, missing values are related to each variable, and the median imputation method using decomposition and voting is powerful.
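The two evaluation measures named above can be sketched on a toy example. This is not the decomposition-and-voting procedure, whose details the abstract does not give; it shows only plain median imputation of a single ordinal variable, scored by accuracy and RMSE on the masked positions. The data and function names are hypothetical.

```python
import math
import statistics

def median_impute(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    # median_low always returns a value that occurs in the data,
    # so the imputed value is a valid ordinal category.
    fill = statistics.median_low(observed)
    return [fill if v is None else v for v in values]

def accuracy(true, imputed, missing_idx):
    """Share of masked positions where the imputed category is exactly right."""
    hits = sum(1 for i in missing_idx if true[i] == imputed[i])
    return hits / len(missing_idx)

def rmse(true, imputed, missing_idx):
    """Root mean squared error over the masked positions."""
    sq = sum((true[i] - imputed[i]) ** 2 for i in missing_idx)
    return math.sqrt(sq / len(missing_idx))

# Hypothetical 5-point responses with two values masked for evaluation.
true_values = [1, 2, 3, 3, 4, 5, 3, 3]
missing_idx = [2, 5]
with_gaps = [None if i in missing_idx else v for i, v in enumerate(true_values)]

imputed = median_impute(with_gaps)
print(accuracy(true_values, imputed, missing_idx), rmse(true_values, imputed, missing_idx))
```

Accuracy rewards only exact category matches, while RMSE also reflects how far off a wrong category is, which is why the two measures are reported together for ordinal data.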

Applications of proportional odds ordinal logistic regression models and continuation ratio models in examining the association of physical inactivity with erectile dysfunction among type 2 diabetic patients

  • Mathew, Anil C.;Siby, Elbin;Tom, Amal;Kumar R, Senthil
    • Korean Journal of Exercise Nutrition / v.25 no.1 / pp.30-34 / 2021
  • [Purpose] Many studies have observed a high prevalence of erectile dysfunction among individuals who perform less leisure-time physical activity. However, this relationship in patients with type 2 diabetes is not well studied. In exposure-outcome studies with ordinal outcome variables, investigators often make the outcome variable dichotomous and lose information by collapsing categories. Several statistical models have been developed to make full use of all information in ordinal response data, but they have not been widely used in public health research. In this paper, we discuss the application of two statistical models to determine the association of physical inactivity with erectile dysfunction among patients with type 2 diabetes. [Methods] A total of 204 married men aged 20-60 years with a diagnosis of type 2 diabetes at the outpatient unit of the Department of Endocrinology at PSG hospitals during the months of May and June 2019 were studied. We examined the association between physical inactivity and erectile dysfunction using proportional odds ordinal logistic regression models and continuation ratio models. [Results] The proportional odds model revealed that patients with diabetes who perform leisure-time physical activity for over 40 minutes per day have reduced odds of erectile dysfunction (odds ratio=0.38) across the severity categories of erectile dysfunction after adjusting for age and duration of diabetes. [Conclusion] The present study suggests that physical inactivity has a negative impact on erectile function. We observed that the simple logistic regression model had only 75% efficiency compared to the proportional odds model used here; hence, more valid estimates were obtained here.
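What "reduced odds across the severity categories" means under a proportional odds model can be shown with a small sketch. Only the odds ratio of 0.38 comes from the abstract; the number of severity levels, the threshold values, the 0/1 coding of activity, and the function names are hypothetical, and the age and diabetes-duration adjustments are omitted.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def prob_exceeds(x, beta, thresholds):
    """P(Y > k | x) at each threshold, using logit P(Y > k) = beta * x - theta_k."""
    return [logistic(beta * x - theta) for theta in thresholds]

def odds(p):
    return p / (1.0 - p)

beta = math.log(0.38)          # log of the odds ratio reported in the abstract
thresholds = [-1.0, 0.2, 1.5]  # hypothetical cut-offs between four severity levels

# x = 1: over 40 minutes of leisure-time activity per day; x = 0: otherwise.
inactive = prob_exceeds(0, beta, thresholds)
active = prob_exceeds(1, beta, thresholds)

# The same odds ratio applies at every split of the severity scale.
ratios = [odds(a) / odds(i) for a, i in zip(active, inactive)]
print([round(r, 2) for r in ratios])
```

Because a single coefficient governs every cut-off, the model summarizes the exposure effect with one odds ratio instead of one per dichotomization of the outcome.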

A Study on the Determinants of Organizational Level for the Advancement of Smart Factory (스마트공장 고도화 수준의 조직수준 결정요인에 대한 연구)

  • Chi-Ho Ok
    • Asia-Pacific Journal of Business / v.14 no.1 / pp.281-294 / 2023
  • Purpose - The purpose of this study is to explore organizational-level determinants of smart factory advancement. We suggested three organizational-level determinants: the CEO's entrepreneurship, high-involvement human resource management, and cooperative industrial relations. Design/methodology/approach - The survey population was manufacturing SMEs, from which we drew a sample and surveyed 232 companies. Since the dependent variable, the level of smart factory advancement, was measured on an ordinal scale, ordinal logistic regression analysis was used to test the hypotheses. Findings - The higher the level of high-involvement human resource management, the higher the level of smart factory advancement. As the level of high-involvement human resource management increases by one unit, the probability of smart factory advancement increases by 22.8%. On the other hand, the CEO's entrepreneurship did not significantly affect the level of smart factory advancement. Interestingly, cooperative industrial relations negatively affected the level of smart factory advancement, contrary to the hypothesized prediction. Research implications or Originality - This study explored organizational-level determinants that affect the advancement of smart factories. Based on these findings, various implications are presented for related research and policy fields.

Impact of Regional Emergency Medical Access on Patients' Prognosis and Emergency Medical Expenditure (지역별 응급의료 접근성이 환자의 예후 및 응급의료비 지출에 미치는 영향)

  • Kim, Yeonjin;Lee, Tae-Jin
    • Health Policy and Management / v.30 no.3 / pp.399-408 / 2020
  • Background: The purpose of this study was to examine the impact of regional characteristics on the accessibility of emergency care and the impact of emergency medical accessibility on patients' prognosis and emergency medical expenditure. Methods: This study used the 13th beta version 1.6 annual data of the Korea Health Panel and statistics from the Korean Statistical Information Service. The sample included 8,119 patients who visited emergency centers between 2013 and 2017. The arrival time, which indicated medical access, was used as the dependent variable for multi-level analysis. For ordinal logistic regression and multiple regression, the arrival time was used as the independent variable, while patients' prognosis and emergency medical expenditure were used as dependent variables. Results: The multi-level analysis with both individual and regional variables showed that as the number of emergency medical institutions per 100 km² increased, the time required to reach emergency centers significantly decreased. Ordinal logistic regression and multiple regression results showed that as the arrival time increased, the patients' prognosis significantly worsened and the emergency medical expenses significantly increased. Conclusion: Access to emergency care was affected by regional characteristics, and it in turn affected patient outcomes and emergency medical expenditure.