• Title/Summary/Keyword: Categorical Variables


Nonlinear Canonical Correlation Analysis for Paralysis Disease Data

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.515-521 / 2004
  • Categorical data are common in oriental medical research. Nonlinear canonical correlation analysis does not assume an interval level of measurement. In this paper, we apply nonlinear canonical correlation analysis to quantify paralysis disease data and to explain how similar the sets of variables are to one another.


A Sequence of Models for Categorical Data with Compound Scales

  • Choi, Jae-Sung
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.103-110 / 2001
  • This paper considers a multistage experiment. Response scales can be the same or different from stage to stage. When variables have a nested structure, the response variable at each stage can be defined conditionally. For analysing such data with compound scales, this paper suggests a sequence of dependence models and shows how to set up a sequence of models for driver's license test data.


Conditions For Hyper-EM And Large Graphical Modelling

  • Kim, Seong-Ho;Kim, Sung-Ho
    • Proceedings of the Korean Statistical Society Conference / 2002.11a / pp.293-298 / 2002
  • We propose an improved version of the method of Kim (2000), so that in principle a graphical model of any size can be handled. Kim (2000) proposed a method of estimating parameters for a model of categorical variables that is too large to handle as a single model. We applied the proposed method to a simulated data set of 158 binary variables.


Robust Variable Selection in Classification Tree

  • Jang Jeong Yee;Jeong Kwang Mo
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.89-94 / 2001
  • In this study we focus on variable selection when growing a decision tree. Some of the splitting rules and variable selection algorithms are discussed. We propose a competitive variable selection method based on the Kruskal-Wallis test, which is a nonparametric version of the ANOVA F-test. Through a Monte Carlo study we note that CART has a serious bias in variable selection towards categorical variables having many values, and that QUEST, which uses the F-test, is not very powerful at selecting informative variables under heavy-tailed distributions.
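
As a rough illustration of rank-based variable selection, the sketch below computes the Kruskal-Wallis H statistic of each numeric predictor across the class labels and picks the predictor with the largest H. It is not the authors' algorithm: the function names, the toy data, and the rule "largest H wins" are assumptions made for illustration. Because every predictor is compared against the same class labels, the degrees of freedom are equal and ranking by H matches ranking by p-value.

```python
from collections import Counter, defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(x, labels):
    """Tie-corrected Kruskal-Wallis H of x grouped by labels (needs len(x) >= 2)."""
    n = len(x)
    rank_sums = defaultdict(float)
    counts = defaultdict(int)
    for rank, group in zip(average_ranks(x), labels):
        rank_sums[group] += rank
        counts[group] += 1
    h = 12.0 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / counts[g] for g in counts
    ) - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    correction = 1 - sum(t ** 3 - t for t in Counter(x).values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


def select_split_variable(columns, labels):
    """Hypothetical selection rule: pick the predictor with the largest H."""
    scores = {name: kruskal_wallis_h(col, labels) for name, col in columns.items()}
    return max(scores, key=scores.get), scores


labels = ["a", "a", "a", "b", "b", "b"]
columns = {"informative": [1, 2, 3, 4, 5, 6], "noise": [1, 6, 2, 5, 3, 4]}
best, scores = select_split_variable(columns, labels)
```

Since H depends only on ranks, a single extreme observation cannot dominate the score, which is the motivation for preferring it to an F-test under heavy-tailed distributions.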


A Continuation-Ratio Logits Mixed Model for Structured Polytomous Data

  • Choi, Jae-Sung
    • Journal of the Korean Data and Information Science Society / v.17 no.1 / pp.187-193 / 2006
  • This paper shows how to use continuation-ratio logits for the analysis of structured polytomous data. Here, response categories are considered to have a nested binary structure. Thus, conditionally nested binary random variables can be defined in each step. Two types of factors are considered as independent variables affecting response probabilities. For the purpose of analyzing categorical data with binary nested structures, a continuation-ratio mixed model is suggested. The estimation procedure for the unknown parameters in the suggested model is also discussed in detail through an example.
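
To make the nested binary structure concrete, the sketch below converts category probabilities into continuation-ratio logits and back. It assumes the "forward" convention, logit P(Y = j | Y >= j); the paper may use a different convention, and the mixed-model (random effect) part is not shown. The function names are hypothetical.

```python
import math


def continuation_ratio_logits(probs):
    """Return logit P(Y = j | Y >= j) for j = 1..J-1 from category probabilities."""
    logits = []
    remaining = sum(probs)  # P(Y >= j); equals 1 at the first step
    for p in probs[:-1]:
        tail = remaining - p  # P(Y > j)
        logits.append(math.log(p / tail))
        remaining = tail
    return logits


def probs_from_logits(logits):
    """Invert the transform: rebuild category probabilities step by step."""
    probs = []
    remaining = 1.0
    for eta in logits:
        stop = 1.0 / (1.0 + math.exp(-eta))  # P(Y = j | Y >= j)
        probs.append(remaining * stop)
        remaining *= 1.0 - stop
    probs.append(remaining)  # the last category takes what is left
    return probs


logits = continuation_ratio_logits([0.5, 0.25, 0.25])
recovered = probs_from_logits(logits)
```

Each logit describes one binary decision (stop at category j or continue), conditional on having reached stage j, which is why the conditionally nested binary variables can each be modelled with a separate logistic model.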


A Prediction of Work-life Balance Using Machine Learning

  • Youngkeun Choi
    • Asia Pacific Journal of Information Systems / v.34 no.1 / pp.209-225 / 2024
  • This research aims to use machine learning technology in human resource management to predict employees' work-life balance. The study utilized a dataset from IBM Watson Analytics in the IBM Community for the machine learning analysis. Multinomial dependent variables concerning workers' work-life balance were examined, categorized into continuous and categorical types using the Generalized Linear Model. The complexity of assessing variable roles and their varied impact based on the type of model used was highlighted. The study's outcomes are academically and practically relevant, showcasing how machine learning can offer further understanding of psychological variables like work-life balance through analyzing employee profiles.

Graphical Methods for Hierarchical Log-Linear Models

  • Hong, Chong-Sun;Lee, Ui-Ki
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.755-764 / 2006
  • Most graphical methods for categorical data can describe the structure of data and represent a measure of association among categorical variables. Among them the polyhedron plot represents sequential relationships among hierarchical log-linear models for a multidimensional contingency table. This kind of plot could be explored to describe the differences among sequential models. In this paper we suggest graphical methods, containing all the information, that reflect the relationship among all log-linear models in a certain hierarchical structure. We use the ideas of a correlation diagram.

Monitoring social networks based on transformation into categorical data

  • Lee, Joo Weon;Lee, Jaeheon
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.487-498 / 2022
  • Social network analysis (SNA) techniques have recently been developed to monitor and detect abnormal behaviors in social networks. Control charts, a standard tool for process monitoring, are also useful for network monitoring. In this paper, the degree and closeness centrality measures, which take global and local perspectives, respectively, are applied to an exponentially weighted moving average (EWMA) chart and a multinomial cumulative sum (CUSUM) chart for monitoring undirected weighted networks. In general, EWMA charts monitor only one variable in a single chart, whereas multinomial CUSUM charts can monitor, in a single chart, a categorical variable into which several variables are transformed through classification rules. To monitor both degree centrality and closeness centrality simultaneously, we categorize them based on the average of each measure and then apply them to the multinomial CUSUM chart. In this case, the global and local attributes of the network can be monitored simultaneously with a single chart. We also evaluate the performance of the proposed procedure through a simulation study.
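
The two chart types can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the smoothing constant `lam`, the four-category coding, and the in-control and out-of-control probability vectors `p0` and `p1` are all hypothetical, and the CUSUM uses the common log-likelihood-ratio increment. Control limits are omitted.

```python
import math


def ewma(values, target, lam=0.2):
    """EWMA statistic z_t = lam * x_t + (1 - lam) * z_{t-1}, with z_0 = target."""
    z = target
    path = []
    for x in values:
        z = lam * x + (1 - lam) * z
        path.append(z)
    return path


def categorize(degree, closeness, degree_mean, closeness_mean):
    """Map two centrality values to one of four categories (0..3).

    Each measure is compared with its in-control average; the coding
    2 * (degree high) + (closeness high) is an illustrative choice.
    """
    return 2 * int(degree > degree_mean) + int(closeness > closeness_mean)


def multinomial_cusum(categories, p0, p1):
    """One-sided CUSUM: S_t = max(0, S_{t-1} + log(p1[c_t] / p0[c_t]))."""
    s = 0.0
    path = []
    for c in categories:
        s = max(0.0, s + math.log(p1[c] / p0[c]))
        path.append(s)
    return path


z_path = ewma([1.0, 1.0], target=0.0, lam=0.5)
category = categorize(3.0, 0.1, degree_mean=2.0, closeness_mean=0.5)
s_path = multinomial_cusum([0, 0, 3], p0=[0.25] * 4, p1=[0.1, 0.2, 0.3, 0.4])
```

A signal would be raised when the EWMA statistic leaves its control limits or when the CUSUM statistic exceeds a threshold chosen for the desired in-control run length.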

Development and application of prediction model of hyperlipidemia using SVM and meta-learning algorithm

  • Lee, Seulki;Shin, Taeksoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.111-124 / 2018
  • This study aims to develop a classification model for predicting the occurrence of hyperlipidemia, a chronic disease. Prior studies applying data mining techniques to disease prediction can be divided into studies that design models for predicting cardiovascular disease and studies that compare disease prediction results. In the foreign literature, studies predicting cardiovascular disease predominate among those applying data mining techniques. Domestic studies do not differ much from foreign ones, but they have mainly focused on hypertension and diabetes. Since hyperlipidemia, a chronic disease like hypertension and diabetes, is also of high importance, this study selected hyperlipidemia as the disease to be analyzed. We developed a model for predicting hyperlipidemia using SVM and meta-learning algorithms, which are known to have excellent predictive power. To this end, we used the Korea Health Panel 2012 data set. The Korea Health Panel produces basic data on the level of health expenditure, health level, and health behavior, and has conducted an annual survey since 2008. In this study, 1,088 patients with hyperlipidemia were randomly selected from the hospitalized, outpatient, emergency, and chronic disease data of the Korea Health Panel in 2012, and 1,088 nonpatients were also randomly extracted, for a total of 2,176 subjects. Three methods were used to select input variables for predicting hyperlipidemia. First, a stepwise method was performed using logistic regression. Among the 17 variables, the categorical variables (except for length of smoking) were expressed as dummy variables, which are treated as separate variables relative to a reference group, and these variables were analyzed. 
Six variables (age, BMI, education level, marital status, smoking status, gender), excluding income level and smoking period, were selected at a significance level of 0.1. Second, C4.5, a decision tree algorithm, was used. The significant input variables were age, smoking status, and education level. Finally, genetic algorithms were used. For the SVM, the input variables selected by genetic algorithms consisted of 6 variables: age, marital status, education level, economic activity, smoking period, and physical activity status. For the artificial neural network, they consisted of 3 variables: age, marital status, and education level. Based on the selected variables, we compared SVM, the meta-learning algorithm, and other prediction models for hyperlipidemia patients, and compared the classification performances using TP rate and precision. The main results of the analysis are as follows. First, the accuracy of the SVM was 88.4% and the accuracy of the artificial neural network was 86.7%. Second, the accuracy of classification models using the input variables selected through the stepwise method was slightly higher than that of classification models using all the variables. Third, the precision of the artificial neural network was higher than that of the SVM when only the three input variables selected by the decision tree were used. For classification models based on the input variables selected through the genetic algorithm, the classification accuracy of the SVM was 88.5% and that of the artificial neural network was 87.9%. Finally, this study indicated that stacking, the meta-learning algorithm proposed in this study, performs best when it uses the predicted outputs of SVM and MLP as input variables of an SVM that serves as the meta classifier. The purpose of this study was to predict hyperlipidemia, one of the representative chronic diseases. 
To do this, we used SVM and meta-learning algorithms, which are known to have high accuracy. As a result, the classification accuracy for hyperlipidemia of stacking as a meta learner was higher than that of the other meta-learning algorithms. However, the predictive performance of the meta-learning algorithm proposed in this study is the same as that of the SVM with the best performance (88.6%) among the single models. The limitations of this study are as follows. First, various variable selection methods were tried, but most variables used in the study were categorical dummy variables. With a large number of categorical variables, models such as decision trees may be better suited than general models such as neural networks, so the results may differ if continuous variables are used. Despite these limitations, this study is significant in predicting hyperlipidemia with hybrid models such as meta-learning algorithms, which have not been studied previously. The improvement in model accuracy obtained by applying various variable selection techniques is also meaningful. In addition, we expect that the proposed model will be effective for the prevention and management of hyperlipidemia.
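
The dummy coding of categorical variables against a reference group, as described in the abstract above, can be sketched as follows. The function name, the example variable, and the choice of reference level are assumptions made for illustration; the study does not state which reference groups it used.

```python
def dummy_encode(values, reference):
    """Encode a categorical variable as 0/1 indicator columns.

    The reference level gets no column of its own, so an observation at
    the reference level is represented by a row of zeros.
    """
    levels = sorted(set(values) - {reference})
    rows = [[int(v == level) for level in levels] for v in values]
    return levels, rows


# Hypothetical marital-status variable with "married" as the reference group.
marital_status = ["single", "married", "divorced", "married"]
levels, rows = dummy_encode(marital_status, reference="married")
```

A variable with k levels therefore contributes k - 1 separate inputs, which is why a data set dominated by categorical variables ends up consisting mostly of dummy variables.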

Canonical Correlation of 3D Visual Fatigue between Subjective and Physiological Measures

  • Won, Myeung Ju;Park, Sang In;Whang, Mincheol
    • Journal of the Ergonomics Society of Korea / v.31 no.6 / pp.785-791 / 2012
  • Objective: The aim of this study was to investigate the correlation between 3D visual fatigue and physiological measures using canonical correlation analysis, which allows correlation between categories (sets) of variables. Background: Few studies have investigated the physiological mechanism underlying the visual fatigue caused by processing 3D information, which may overload the cognitive mechanism. Moreover, previous studies lack validation of the correlation between physiological variables and visual fatigue. Method: 9 female and 6 male subjects with a mean age of $22.53 \pm 2.55$ voluntarily participated in this experiment. All participants were asked to report how they felt about their health state after viewing 3D. In addition, Low & Hybrid measurement tests (Event Related Potential, Steady-state Visual Evoked Potential) for evaluating cognitive fatigue were performed before and after viewing 3D. The physiological signals were measured together with the subjective fatigue evaluation before and after watching the 3D content. For this study, which proposes categorical correlation, all measures were categorized into three sets: a Visual Fatigue set (response time, subjective evaluation), an Autonomic Nervous System set (PPG frequency, PPG amplitude, HF/LF ratio), and a Central Nervous System set (ERP amplitude P4, O1, O2; ERP latency P4, O1, O2; SSVEP S/N ratio P4, O1, O2). Canonical correlation analysis was then conducted on the three sets of variables. Results: The results showed a significant correlation between visual fatigue and physiological measures. However, different variables of visual fatigue were highly correlated with the HF/LF ratio and with ERP latency (O2), respectively. Conclusion: Response time was highly correlated with ERP latency (O2), while the subjective evaluation was highly correlated with the HF/LF ratio. Application: This study may provide the most significant variables for the quantitative evaluation of visual fatigue, using the HF/LF ratio and ERP latency, based on human performance and subjective fatigue.