• Title/Summary/Keyword: Categorical Variables

A generalized model for categorical data from epidemiological studies (질병의 범주적 자료에 대한 통계적 분석모형)

  • 최재성
    • The Korean Journal of Applied Statistics
    • /
    • v.9 no.1
    • /
    • pp.1-15
    • /
    • 1996
  • This paper discusses the effect of the infection rate of a certain disease on the immunity rate conferred by protective inoculation. A sequence of dependence models for the infection rate is derived by defining conditionally nested binary random variables for the analysis of polytomous data with a hierarchical response scale. Maximum likelihood estimates based on the marginal log-likelihood function are obtained numerically using the simplex method of Nelder and Mead (1965).

Empirical Bayesian Misclassification Analysis on Categorical Data (범주형 자료에서 경험적 베이지안 오분류 분석)

  • 임한승;홍종선;서문섭
    • The Korean Journal of Applied Statistics
    • /
    • v.14 no.1
    • /
    • pp.39-57
    • /
    • 2001
  • Categorical data sometimes contain misclassification errors. If such data are analyzed, the estimated cell probabilities may be biased and the standard Pearson X² tests may have inflated true type I error rates. On the other hand, if we treat well-classified data as misclassified, we may spend considerable cost and time adjusting for misclassification. Asking whether categorical data are misclassified is therefore a necessary and important step before analysis. In this paper, we consider a two-dimensional contingency table in which one of the two variables is misclassified and the marginal sums of the well-classified variable are fixed. We partition the marginal sums into the individual cells using the Bound and Collapse concepts of Sebastiani and Ramoni (1997). The double sampling scheme (Tenenbein 1970) is used to obtain information on misclassification. We propose test statistics for addressing the misclassification problem and examine their behavior through simulation studies.

Small Sample Characteristics of Generalized Estimating Equations for Categorical Repeated Measurements (범주형 반복측정자료를 위한 일반화 추정방정식의 소표본 특성)

  • 김동욱;김재직
    • The Korean Journal of Applied Statistics
    • /
    • v.15 no.2
    • /
    • pp.297-310
    • /
    • 2002
  • Liang and Zeger proposed generalized estimating equations (GEE) for analyzing repeated data that are discrete or continuous. The GEE model can be extended to repeated categorical data, and its estimator has an asymptotic multivariate normal distribution in large samples. However, GEE is based on large-sample asymptotic theory. In this paper, we study the properties of GEE estimators for repeated ordinal data in small samples. We generate ordinal repeated measurements for two groups using two methods. Through Monte Carlo simulation studies, we investigate the empirical type I error rates, powers, and relative efficiencies of the GEE estimators, the effect of unequal sample sizes in the two groups, and the performance of variance estimators for polytomous ordinal response variables, especially in small samples.

A Study on the Analysis for Life-cycle of Quasi-Market Oriented SOC Public Enterprise and Effective Management (준시장형 SOC 공기업의 수명주기 분석과 효율적 관리방안에 관한 연구)

  • Park, Dong-Sun;Kang, Myung-Soo;Kim, Nam-Jung
    • Land and Housing Review
    • /
    • v.6 no.4
    • /
    • pp.165-175
    • /
    • 2015
  • This study focuses on the need for policy decision making grounded in clear definitions of 'business life cycle' and 'public enterprises' for the proper management of public enterprises. To this end, the study defines categorical variables for the enterprise life cycle and provides basic data for public enterprise management policy. The study examines 'Korea Expressway Corporation', 'K-water', 'Korea Railroad', and 'Korea Land and Housing Corporations', because these public institutions recently underwent the 'management normalization policy' owing to rapidly increasing debt. First, the priority and standard of the categorical variables for the life cycle of quasi-market oriented SOC public enterprises are analyzed using AHP and a frequency study of an expert survey. Next, the study forecasts the 'declining period' through a second expert survey and then investigates and analyzes the enterprises' management plans for that expected period.

Visualizing Large Two-way Crosstabs by PLS Method (PLS 방법에 의한 "큰" 2원 교차표의 시각화)

  • Lee, Yong-Goo;Choi, Youn-Im
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.3
    • /
    • pp.421-428
    • /
    • 2009
  • For visualizing categorical data, Hayashi Quantification Method 3 can be used to visualize the categories of the variables when the number of categories is small. However, the method is known to be unstable because low-frequency categories are quantified more strongly than high-frequency categories. The purpose of this research is to propose a visualization of large two-way cross-tabulation data by PLS methods for checking the relationship between the categories of the row and column variables. We use the PLS visualization methods (Huh et al., 2007), which were proposed for visualizing qualitative data, to visualize the categories of large categorical data. We also compare the two methods by applying them to real data and study the results of the PLS visualization method on real categorized data with many categories.

A Data-Mining-based Methodology for Military Occupational Specialty Assignment (데이터 마이닝 기반의 군사특기 분류 방법론 연구)

  • 민규식;정지원;최인찬
    • Journal of the military operations research society of Korea
    • /
    • v.30 no.1
    • /
    • pp.1-14
    • /
    • 2004
  • In this paper, we propose a new data-mining-based methodology for military occupational specialty assignment. The proposed methodology consists of two phases, feature selection and man-power assignment. In the first phase, the k-means partitioning algorithm and the optimal variable weighting algorithm are used to determine attribute weights. We address limitations of the optimal variable weighting algorithm and suggest a quadratic programming model that can handle categorical variables and non-contributory trivial variables. In the second phase, we present an integer programming model to deal with a man-power assignment problem. In the model, constraints on demand-supply requirements and training capacity are considered. Moreover, the attribute weights obtained in the first phase for each specialty are used to measure dissimilarity. Results of a computational experiment using real-world data are provided along with some analysis.

LAD Estimators for Categorical Data Analysis (범주형 자료 분석을 위한 LAD 추정량)

  • 최현집
    • The Korean Journal of Applied Statistics
    • /
    • v.16 no.1
    • /
    • pp.55-69
    • /
    • 2003
  • In this article, we propose weighted LAD (least absolute deviations) estimators for multi-dimensional contingency tables and derive a method for computing them. To illustrate the robustness of the estimators, simulation results are presented for several models, including log-linear models and models for ordinal variables in multidimensional contingency tables. Examples are also presented.

A numerical study on group quantile regression models

  • Kim, Doyoen;Jung, Yoonsuh
    • Communications for Statistical Applications and Methods
    • /
    • v.26 no.4
    • /
    • pp.359-370
    • /
    • 2019
  • Grouping structures in covariates are often ignored in regression models. Recent statistical developments that account for grouping structure show clear advantages; however, incorporating grouping structure into the quantile regression model has been relatively rare in the literature. Grouping structure is usually handled by employing a group penalty. In this work, we apply the idea of a group penalty to quantile regression models. The grouping structure is assumed to be known, which commonly holds in practice. For example, the group of dummy variables transformed from one categorical variable can be regarded as one group of covariates. We examine group quantile regression models via two real data analyses and simulation studies, which show that group quantile regression models outperform their non-group counterparts when grouping structures exist among the variables.
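
    The dummy-variable example in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `dummy_encode_with_groups` and `group_lasso_penalty`, the toy rows, and the choice of a group-lasso-style penalty (for each group, the square root of the group size times the Euclidean norm of that group's coefficients) are all assumptions made here. The paper may use different group penalties.

```python
from math import sqrt


def dummy_encode_with_groups(rows, categorical):
    """Dummy-code categorical columns and record a group id per output column.

    rows: list of dicts sharing the same keys (hypothetical input format).
    categorical: set of keys to treat as categorical variables.
    The first (sorted) level of each categorical variable is the baseline.
    """
    names, groups, builders = [], [], []
    for g, var in enumerate(rows[0]):
        if var in categorical:
            levels = sorted({r[var] for r in rows})
            for level in levels[1:]:  # drop the baseline level
                names.append(f"{var}={level}")
                groups.append(g)  # all dummies of one variable share a group
                builders.append(lambda r, v=var, l=level: 1.0 if r[v] == l else 0.0)
        else:
            names.append(var)
            groups.append(g)  # a numeric covariate forms its own group
            builders.append(lambda r, v=var: float(r[v]))
    matrix = [[build(r) for build in builders] for r in rows]
    return names, groups, matrix


def group_lasso_penalty(beta, groups):
    """Sum over groups of sqrt(group size) * Euclidean norm of the group's coefficients."""
    by_group = {}
    for coef, g in zip(beta, groups):
        by_group.setdefault(g, []).append(coef)
    return sum(sqrt(len(c)) * sqrt(sum(b * b for b in c)) for c in by_group.values())


# Hypothetical toy data: one numeric covariate and one three-level categorical one.
rows = [
    {"age": 31, "region": "north"},
    {"age": 45, "region": "south"},
    {"age": 27, "region": "west"},
]
names, groups, matrix = dummy_encode_with_groups(rows, categorical={"region"})
print(names, groups)
```

    Because the two `region` dummies share one group id, a penalty of this kind tends to keep or drop them together, which is the sense in which the dummies of one categorical variable act as a single group of covariates.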

A Study on the Operational Performance by the Investment Level of Companies Information Security in the Digital Transformation(DX) Era (디지털 전환(DX) 시대에 기업의 정보보안 투자 수준에 따른 운영성과에 관한 연구)

  • Jung Byoungho;Joo Hyungkun
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.20 no.1
    • /
    • pp.119-131
    • /
    • 2024
  • The purpose of this study is to examine operational performance by the level of investment in information security in companies. The theoretical background summarizes the meaning of information security, management information security, and network security. The research was carried out in four stages. The analysis classified the level of information security into four groups and confirmed differences in operational performance among them. According to the categorical regression analysis of the three dependent variables, independent variables such as network threats, non-network threats, executive information security awareness, industry, organizational size, and information security education all affected information security regulations, in-house information security checks, and information security budget investments. The theoretical implication of this study is that it contributes to updating information security theory. The practical implication is that companies should make rational investments in their level of information security.

Information Theory and Data Visualization Approach to Poll Analysis (정보이론과 시각화 방법에 의한 여론조사 분석의 새로운 접근방법)

  • Huh, Moon-Yul;Cha, Woon-Ock
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.1
    • /
    • pp.61-78
    • /
    • 2007
  • This paper proposes a method for poll analysis using information theory and data visualization. The questions of an opinion poll consist of a target variable and many explanatory variables, each of which is either numerical or categorical. In this study, explanatory variables of mixed types are ranked by the magnitude of their effect on the target variable using mutual information. The ordering of the explanatory variables is likewise evaluated using data visualization. This is the first study to quantify the impact of a specific explanatory variable on the related target variable.
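
    The ranking by mutual information described in the abstract above can be illustrated with a small sketch. It is not the authors' procedure: the function names, the toy poll responses, and the restriction to categorical variables are assumptions made here (a numerical variable would first need to be discretized, and the paper may handle mixed types differently). Mutual information is estimated from empirical frequencies, in bits.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_by_mutual_information(columns, target):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical poll: 'region' determines the answer exactly, 'gender' only partly.
target = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
columns = {
    "gender": ["m", "f", "m", "f", "m", "f", "m", "f"],
    "region": ["A", "A", "B", "B", "A", "B", "A", "B"],
}
ranking = rank_by_mutual_information(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

    In this toy data, `region` is ranked first because knowing it removes all uncertainty about the target, while `gender` carries only partial information.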