• Title/Summary/Keyword: Ordinal data


Relation for the Measure of Association and the Criteria of Association Rule in Ordinal Database

  • Park, Hee-Chang;Lee, Ho-Soon
    • 한국데이터정보과학회:학술대회논문집 / 2003.10a / pp.197-213 / 2003
  • One of the well-studied problems in data mining is the search for association rules. The goal of association rule mining is to find all rules whose support and confidence exceed user-specified thresholds. In this paper we consider the relation between measures of association and the criteria of association rules for ordinal data.

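The support and confidence criteria mentioned in the abstract above can be illustrated with a minimal sketch. The function names and the toy transactions are assumptions made for illustration only; they are not taken from the paper, which concerns ordinal data and measures of association.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


def passes_thresholds(transactions, antecedent, consequent, min_sup, min_conf):
    """True if the rule meets both user-specified thresholds."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return (joint >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A rule is kept only when both values clear their thresholds, which is the filtering step the abstract describes.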

OrdinalEncoder based DNN for Natural Gas Leak Prediction (천연가스 누출 예측을 위한 OrdinalEncoder 기반 DNN)

  • Khongorzul, Dashdondov;Lee, Sang-Mu;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.10 no.10 / pp.7-13 / 2019
  • Natural gas (NG), which is mostly methane, leaks into the air and is a serious problem for the climate. NG leaks were detected under U.S. city streets and the data were collected. In this paper, we introduce a Deep Neural Network (DNN) classifier for predicting the level of an NG leak. The proposed method combines OrdinalEncoder (OE) based K-means clustering with a Multilayer Perceptron (MLP) to predict NG leaks. The 15 features serve as the input neurons, and the network is trained using backpropagation. We propose the OE method for labeling target data using k-means clustering and compare the performance of normalization methods for NG leak prediction. Five normalization methods were used. Our proposed OE-based MLP method achieves an accuracy of 97.7% and an F1-score of 96.4%, which is relatively higher than the other methods. The system was implemented in SPSS and Python, and its performance was tested on real open data.

Assessing Classification Accuracy using Cohen's kappa in Data Mining (데이터 마이닝에서 Cohen의 kappa를 이용한 분류정확도 측정)

  • Um, Yonghwan
    • Journal of the Korea Society of Computer and Information / v.18 no.1 / pp.177-183 / 2013
  • In this paper, Cohen's kappa and weighted kappa are applied to measure classification accuracy in data mining. Cohen's kappa compensates for classifications that may be due to chance and is used for data on nominal or ordinal scales. For ordinal data in particular, weighted kappa is used, which measures classification accuracy by quantifying classification errors as weights. We used two weighting schemes (linear and quadratic) to calculate weighted kappa. To calculate and compare kappa and weighted kappa, we used a real data set, the fat-liver data.
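As a rough illustration of the quantities named in the abstract above, the sketch below computes unweighted, linearly weighted, and quadratically weighted kappa from a square confusion table using the standard disagreement-weight form, kappa = 1 - (weighted observed disagreement) / (weighted expected disagreement). The function name and the 3x3 table are hypothetical; the fat-liver data from the paper are not reproduced here.

```python
def weighted_kappa(confusion, weights="none"):
    """Cohen's kappa from a k x k confusion table (k >= 2 assumed).

    weights: "none" (plain kappa), "linear" or "quadratic".
    Rows are one rating (e.g. the true class), columns the other
    (e.g. the predicted class), both in the same ordinal order.
    """
    if weights not in ("none", "linear", "quadratic"):
        raise ValueError("weights must be 'none', 'linear' or 'quadratic'")
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Penalty for assigning category j when the other rating is i.
        if weights == "none":
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    observed = sum(disagreement(i, j) * confusion[i][j]
                   for i in range(k) for j in range(k)) / n
    # Expected disagreement if the two ratings were independent.
    expected = sum(disagreement(i, j) * row_tot[i] * col_tot[j]
                   for i in range(k) for j in range(k)) / (n * n)
    return 1.0 - observed / expected


# Hypothetical 3-category ordinal confusion table.
table = [
    [20, 5, 0],
    [3, 15, 2],
    [0, 4, 11],
]

for scheme in ("none", "linear", "quadratic"):
    print(scheme, round(weighted_kappa(table, scheme), 4))
```

In this toy table every error is between adjacent categories, so the weighted versions penalize the errors less than plain kappa does.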

Bayesian ordinal probit semiparametric regression models: KNHANES 2016 data analysis of the relationship between smoking behavior and coffee intake (베이지안 순서형 프로빗 준모수 회귀 모형 : 국민건강영양조사 2016 자료를 통한 흡연양태와 커피섭취 간의 관계 분석)

  • Lee, Dasom;Lee, Eunji;Jo, Seogil;Choi, Taeryeon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.25-46 / 2020
  • This paper presents ordinal probit semiparametric regression models using the Bayesian Spectral Analysis Regression (BSAR) method. Ordinal probit regression models ordinal responses (usually with more than two categories) by connecting the probability of falling into each category to a combination of available covariates through a probit link (the inverse of the normal cumulative distribution function). The Bayesian probit model facilitates posterior sampling by introducing a latent variable that follows a normal distribution; the responses are then categorized by cut-off points according to the values of the latent variable. In this paper, we extend the latent variable approach to a semiparametric model for Bayesian ordinal probit regression with nonparametric functions, using the BSAR method, which is based on a spectral representation of Gaussian processes. The latent variable is decomposed into a parametric component and a nonparametric component, with or without a shape constraint, for modeling ordinal responses and predicting outcomes more flexibly. We illustrate the proposed methods with simulation studies, in comparison with existing methods, and with a real data analysis of the Korean National Health and Nutrition Examination Survey (KNHANES) 2016 that investigates the nonparametric relationship between smoking behavior and coffee intake.
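The latent-variable mechanism described in the abstract above can be sketched as follows: with a latent value z = eta + e, e ~ N(0, 1), and ordered cut-off points c_1 < ... < c_{K-1}, the probability of category k is Phi(c_k - eta) - Phi(c_{k-1} - eta). This covers only the plain ordinal probit link; it is not the BSAR method, and the names `eta` and `cutpoints` and the numeric values are illustrative assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordinal_probit_probs(eta, cutpoints):
    """Category probabilities for an ordinal probit model.

    eta:       value of the predictor (parametric plus any nonparametric part).
    cutpoints: increasing cut-off points; K - 1 of them give K categories.
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [normal_cdf(b - eta) for b in bounds]
    # Probability that the latent variable falls between consecutive cut-offs.
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]


# Hypothetical example: four ordered categories, three cut-off points.
probs = ordinal_probit_probs(eta=0.3, cutpoints=[-1.0, 0.0, 1.2])
print([round(p, 4) for p in probs])
```

Raising `eta` moves probability mass toward the higher categories, which is how covariates act on the ordinal response in this kind of model.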

A Study on the Scoring Method of the Ordinal Variable

  • Chung, Sung-S.;Chun, Young-M.;Oh, Seon-J.
    • Journal of the Korean Data and Information Science Society / v.15 no.1 / pp.95-105 / 2004
  • The main characteristic of the ordinal scale is that its categories have a logically or continuously ordered relationship to each other. A continuous type permits measuring degrees of differences among categories. Also, the specific amount of differences is important. In this paper we consider the scoring method using a dummy variable based on distance among categories.


Nonparametric Procedure for Identifying the Minimum Effective Dose with Ordinal Response Data

  • Kang, Jongsook;Kim, Dongjae
    • Communications for Statistical Applications and Methods / v.11 no.3 / pp.597-607 / 2004
  • The primary interest of drug development studies is identifying the lowest dose level that produces a desirable effect over that of the zero-dose control, which is referred to as the minimum effective dose (MED). In this paper, we suggest a nonparametric procedure for identifying the MED with binary or ordered categorical response data. The proposed test and Williams' test are compared in a Monte Carlo simulation study and discussed.

Sample Size Determination of Univariate and Bivariate Ordinal Outcomes by Nonparametric Wilcoxon Tests (단변량 및 이변량 순위변수의 비모수적 윌콕슨 검정법에 의한 표본수 결정방법)

  • Park, Hae-Gang;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1249-1263 / 2009
  • The power function in sample size determination has to be characterized by an appropriate statistical test for the hypothesis of interest. Nonparametric tests are suitable for the analysis of ordinal data or frequency data with ordered categories, which appear frequently in the biomedical research literature. In this paper, we study sample size calculation methods for the Wilcoxon-Mann-Whitney test for one- and two-dimensional ordinal outcomes. While the sample size formula for the univariate outcome, which is based on the variances of the test statistic under both the null and alternative hypotheses, performs well, it requires additional information on probability estimates that appear in the variance of the test statistic under the alternative hypothesis, and the values of these probabilities are generally unknown. We study the advantages and disadvantages of different sample size formulas with simulations. Sample sizes are calculated for the two-dimensional ordinal outcomes of efficacy and safety, for which the bivariate Wilcoxon-Mann-Whitney test is more appropriate than the multivariate parametric test.

Bayesian hierarchical model for the estimation of proper receiver operating characteristic curves using stochastic ordering

  • Jang, Eun Jin;Kim, Dal Ho
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.205-216 / 2019
  • Diagnostic tests in medical fields detect or diagnose a disease with results measured as continuous or discrete ordinal data. The performance of a diagnostic test is summarized using the receiver operating characteristic (ROC) curve and the area under the curve (AUC). The diagnostic test is considered clinically useful if the outcomes in actually-positive cases are higher than those in actually-negative cases and the ROC curve is concave. In this study, we apply the stochastic ordering method in a Bayesian hierarchical model to estimate the proper ROC curve and AUC when the diagnostic test results are measured as discrete ordinal data. We compare the conventional binormal model and the binormal model under stochastic ordering. The simulation results and a real data analysis for breast cancer indicate that the binormal model under stochastic ordering can be used to estimate the proper ROC curve with a small bias even when the sample sizes are small or the sample size of actually-negative cases differs from that of actually-positive cases. Therefore, it is appropriate to consider the binormal model under stochastic ordering when there are large differences in sample size between the actually-negative and actually-positive groups.
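For orientation, the AUC for discrete ordinal test results can be estimated nonparametrically as P(positive score > negative score) + 0.5 * P(tie). The sketch below shows this empirical estimator only; it is not the Bayesian binormal model under stochastic ordering studied in the paper, and the ratings are made up for illustration.

```python
def empirical_auc(neg_scores, pos_scores):
    """Empirical AUC for ordinal scores, counting ties as half a win."""
    wins = 0
    ties = 0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1
            elif p == q:
                ties += 1
    return (wins + 0.5 * ties) / (len(pos_scores) * len(neg_scores))


# Hypothetical 5-point ordinal ratings (1 = definitely normal, 5 = definitely diseased).
negatives = [1, 1, 2, 2, 3, 1, 2]
positives = [3, 4, 5, 2, 5]

print(round(empirical_auc(negatives, positives), 4))
```

An AUC of 0.5 corresponds to a test that cannot separate the two groups, and values near 1 indicate that positive cases receive systematically higher ratings.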

Treatment of Missing Data by Decomposition and Voting with Ordinal Data

  • Chun, Young-M.;Son, Hong-K.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.585-598 / 2007
  • In practice, it is difficult to obtain complete data when conducting a questionnaire survey, and statistical tests that ignore missing values give inefficient results. Therefore, we use imputation methods that evaluate the quality of the data. This study proposes an imputation method based on decomposition and voting for ordinal data. First, the data are sorted by each variable. Next, imputation methods are applied at each decomposition level. The last step is the selection of values by voting. The proposed method is evaluated by accuracy and RMSE. In conclusion, when missing values are related to each variable, the median imputation method using decomposition and voting is powerful.


Development of R&D Project Selection Model and Web-based R&D Project Selection System using Hybrid DEA/AHP Model (DEA/AHP 모형을 이용한 R&D 프로젝트 선정모형 및 Web 기반 R&D 프로젝트 선정시스템 개발)

  • Lee, Deok-Joo;Bae, Sungsik;Kang, Jinsoo
    • Journal of Korean Institute of Industrial Engineers / v.32 no.1 / pp.18-28 / 2006
  • Several issues should be considered in an R&D project selection problem. First, quantitative analysis of the efficiencies of R&D projects is required to guarantee objective validity in the evaluation of the projects. For this reason, the methodology for selecting R&D projects should be based on mathematical models that perform quantitative analysis. Second, the data for evaluating R&D projects generally include ordinal factors such as Likert-scale items. Previous research, however, has not suggested explicit methods for incorporating these ordinal factors into models. Third, for R&D project selection problems with limited resources such as budget, it is necessary to decide the perfect ranking of all the projects. This paper develops a mathematical model that is applicable to R&D project selection problems with these features. We improve the original DEA model for evaluating efficiency so that it incorporates ordinal factors, and suggest a new model that can decide the perfect ranking of all projects by merging the improved DEA model with the AHP method. Furthermore, a web-based R&D project selection system using the suggested DEA/AHP model is developed and illustrated.