• Title/Abstract/Keywords: Categorical Variables

Fluctuation of estimates in an EM procedure

  • Kim, Seong-Ho;Kim, Sung-Ho
    • Korean Statistical Society: Conference Proceedings (한국통계학회:학술대회논문집)
    • /
    • Proceedings of the Korean Statistical Society 2003 Spring Conference
    • /
    • pp.157-162
    • /
    • 2003
  • Estimates from an EM algorithm are somewhat sensitive to their initial values, and this sensitivity is more likely as the model becomes larger and more complicated. In this article, we examine how the estimates fluctuate during an EM procedure for a recursive model of categorical variables. We find that the fluctuation takes place mostly during the first half of the procedure and that it can be subdued by applying the Bayesian method of estimation. Both simulated and real data are used for illustration.
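
The sensitivity to initial values described in this abstract can be seen in a toy setting. The sketch below is not the recursive categorical model studied in the paper; it assumes an equal-weight mixture of two binomials, and the function name `em_two_coins` and the `DATA` values are made up for illustration. It records the estimates at every iteration so their path can be inspected.

```python
def em_two_coins(data, theta_a, theta_b, n_iter=20):
    """Run EM for an equal-weight mixture of two binomials.

    data is a list of (heads, tosses) pairs. Returns the list of
    (theta_a, theta_b) estimates, starting with the initial values.
    Initial values are assumed to lie strictly between 0 and 1.
    """
    history = [(theta_a, theta_b)]
    for _ in range(n_iter):
        num_a = den_a = num_b = den_b = 0.0
        for heads, n in data:
            # E-step: posterior weight of each component for this observation.
            # The binomial coefficient cancels in the ratio, so it is omitted.
            like_a = theta_a ** heads * (1 - theta_a) ** (n - heads)
            like_b = theta_b ** heads * (1 - theta_b) ** (n - heads)
            w_a = like_a / (like_a + like_b)
            w_b = like_b / (like_a + like_b)
            num_a += w_a * heads
            den_a += w_a * n
            num_b += w_b * heads
            den_b += w_b * n
        # M-step: weighted proportion of heads for each component.
        theta_a, theta_b = num_a / den_a, num_b / den_b
        history.append((theta_a, theta_b))
    return history


# Hypothetical data: five runs of ten tosses each.
DATA = [(5, 10), (9, 10), (8, 10), (4, 10), (7, 10)]
```

Calling `em_two_coins(DATA, 0.6, 0.5)` and `em_two_coins(DATA, 0.5, 0.6)` yields mirrored estimates, while starting from `(0.5, 0.5)` leaves both parameters stuck at the pooled proportion, which is one simple way the starting point shapes the result.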

Bayesian Inference with Inequality Constraints

  • 오만숙
    • The Korean Journal of Applied Statistics (응용통계연구)
    • /
    • Vol. 27, No. 6
    • /
    • pp.909-922
    • /
    • 2014
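  • For Bayesian inference involving inequality constraints (>, <, =), this paper reviews existing research, recent research trends, and topics for future research on three subjects: i) comparison of several inequality constraints on parameters; ii) simultaneous multiple testing of the equality of parameters when imposing inequality constraints on them is reasonable; iii) multiple testing of the equality of score parameters in contingency tables for ordinal categorical variables when ordered inequality constraints are assumed on the score parameters.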

FUZZY REGRESSION TOWARDS A GENERAL INSURANCE APPLICATION

  • Kim, Joseph H.T.;Kim, Joocheol
    • Journal of applied mathematics & informatics
    • /
    • Vol. 32, No. 3_4
    • /
    • pp.343-357
    • /
    • 2014
  • In many non-life insurance applications, past data are given in a form known as the run-off triangle. Smoothing such data using parametric crisp regression models has long served as the basis for estimating future claim amounts and the reserves set aside to protect the insurer from future losses. In this article, a fuzzy counterpart of the Hoerl curve, a well-known claim reserving regression model, is proposed to analyze past claim data and to determine the reserves. The fuzzy Hoerl curve is more flexible and general than the one considered in the previous fuzzy literature in that it includes a categorical variable with multiple explanatory variables, which requires the development of the fuzzy analysis of covariance, or fuzzy ANCOVA. Using actual insurance run-off claim data, we show that the suggested fuzzy Hoerl curve based on the fuzzy ANCOVA gives reasonable claim reserves without the stringent assumptions needed for the traditional regression approach in claim reserving.

An Introduction to Logistic Regression: From Basic Concepts to Interpretation with Particular Attention to Nursing Domain

  • Park, Hyeoun-Ae
    • Journal of Korean Academy of Nursing (대한간호학회지)
    • /
    • Vol. 43, No. 2
    • /
    • pp.154-164
    • /
    • 2013
  • Purpose: The purpose of this article is twofold: 1) to introduce logistic regression (LR), a multivariable method for modeling the relationship between multiple independent variables and a categorical dependent variable, and 2) to examine the use and reporting of LR in the nursing literature. Methods: Textbooks on LR and research articles employing LR as the main statistical analysis were reviewed. Twenty-three articles published between 2010 and 2011 in the Journal of Korean Academy of Nursing were analyzed for proper use and reporting of LR models. Results: Logistic regression was presented from basic concepts such as odds, odds ratio, logit transformation, and the logistic curve, through assumptions, fitting, reporting, and interpretation, to cautions. Substantial shortcomings were found in both the use of LR and the reporting of results. For many studies, the sample size was not sufficiently large, calling into question the accuracy of the regression model. Additionally, only one study reported a validation analysis. Conclusion: Nursing researchers need to pay greater attention to guidelines concerning the use and reporting of LR models.
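
The basic concepts named in this abstract (odds, odds ratio, logit transformation, logistic curve) can be written out in a few lines. This is a minimal sketch using the standard definitions, not code from the article; the function names are chosen here for illustration.

```python
import math


def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1 - p)


def logit(p):
    """Logit transformation: the natural log of the odds."""
    return math.log(odds(p))


def logistic(z):
    """Logistic curve: maps a linear predictor z back to a probability."""
    return 1 / (1 + math.exp(-z))


def odds_ratio(p1, p0):
    """Ratio of the odds under two conditions."""
    return odds(p1) / odds(p0)
```

Under a model of the form logit(p) = b0 + b1·x, the odds ratio for a one-unit increase in x equals exp(b1), which is why fitted LR coefficients are usually reported in exponentiated form.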

A Study on Materialism of University Students

  • 송순;신현실
    • Korean Journal of Human Ecology (한국생활과학회지)
    • /
    • Vol. 11, No. 3
    • /
    • pp.223-235
    • /
    • 2002
  • The purpose of this study was to examine the influences on the materialism of university students. The data were collected from 331 university students and analyzed using SPSS. The methods of analysis included basic descriptive categorical analysis (frequencies, means, percentages) as well as t-tests, one-way ANOVA, and multiple regression. To summarize the major findings from the analysis: (1) A significant difference was found in the materialism of university students by socio-economic variables such as the amount of pocket money. (2) A significant difference was found in the materialism of university students by self-esteem more than by life satisfaction. (3) A significant difference was found in the materialism of university students by parents' materialism and competitive achievement pressure. (4) According to the multiple regression analysis, the materialism of university students was influenced, in order, by self-esteem, parents' materialism, and competitive achievement pressure.

Geographic Variation in Shell Morphology of the Rock Shell, Thais clavigera (Gastropoda: Muricidae) According to Environmental Difference in Korean Coasts

  • Son Min Ho
    • Korean Journal of Fisheries and Aquatic Sciences (한국수산과학회지)
    • /
    • Vol. 36, No. 6
    • /
    • pp.632-640
    • /
    • 2003
  • Geographic variation in shell morphology of Thais clavigera (Küster) (Gastropoda: Muricidae) was investigated using samples collected from 24 sites along the Korean coast. Multivariate statistical analysis was applied to 9 morphometric and 4 categorical variables. The shells of T. clavigera were classified into two distinct morph types (Type-W and Type-E). Temperature and salinity of the sampling sites were significantly correlated with the incidence of morph types. Relative abundance of Type-W (thin, yellowish brown shell with triangular nodules) was positively correlated with temperature and negatively correlated with salinity. In contrast, relative abundance of Type-E (thick, dark purple shell with round nodules) was negatively correlated with temperature and positively correlated with salinity. Possible correlations between environmental factors (temperature and salinity) and morphological variations in the shells are discussed.

Classification of High Dimensionality Data through Feature Selection Using Markov Blanket

  • Lee, Junghye;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems
    • /
    • Vol. 14, No. 2
    • /
    • pp.210-219
    • /
    • 2015
  • A classification task requires an exponentially growing amount of computation time and number of observations as the variable dimensionality increases. Thus, reducing the dimensionality of the data is essential when the number of observations is limited. Often, dimensionality reduction or feature selection leads to better classification performance than using the full set of features. In this paper, we study the possibility of utilizing the Markov blanket discovery algorithm as a new feature selection method. The Markov blanket of a target variable is the minimal set of variables that explains the target variable, based on the conditional independence of all the variables connected in a Bayesian network. We apply several Markov blanket discovery algorithms to some high-dimensional categorical and continuous data sets, and compare their classification performance with that of other feature selection methods using well-known classifiers.
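
When the structure of the Bayesian network is already known, the Markov blanket of a node is its parents, its children, and the other parents of its children. The sketch below reads the blanket off a given graph; it is not one of the discovery algorithms the paper evaluates, which must infer the blanket from data. The `dag` representation (a dict mapping each node to the set of its parents) and the function name are assumptions made for this example.

```python
def markov_blanket(dag, target):
    """Return the Markov blanket of target in a known DAG.

    dag maps every node to the set of its parent nodes.
    """
    parents = set(dag.get(target, ()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in dag.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for child in children for p in dag[child]} - {target}
    return parents | children | spouses


# Hypothetical network: A -> T -> C <- B, and C -> D.
EXAMPLE_DAG = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}
```

For `EXAMPLE_DAG`, the blanket of `T` contains `A`, `B`, and `C` but not `D`: once `C` is known, `D` carries no further information about `T`, which is the sense in which the blanket acts as a feature subset.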

RESEARCH ON THE DEVELOPMENT OF COLLEGE STUDENT EDUCATION BASED ON MACHINE LEARNING - TAKE THE PHYSICAL EDUCATION OF YANBIAN UNIVERSITY AS AN EXAMPLE

  • Quan, Yu;Guo, Wei-Jie;He, Lin;Jin, Zhe-Zhi
    • East Asian mathematical journal
    • /
    • Vol. 38, No. 1
    • /
    • pp.65-84
    • /
    • 2022
  • This paper uses Yanbian University's physical test data and statistical analysis methods to study the relationships among college students' physical test scores, with the goal of promoting college physical education. First, using gender as a categorical variable, we conduct a general analysis of students in different majors and grades and identify the strengths and weaknesses of male and female college students; we then use Decision Tree and Random Forest algorithms for modeling analysis to provide useful suggestions for the relevant university departments. The aim of this analysis of undergraduates' physical test results is to give universities targeted suggestions for improving the college graduation rate and promoting the overall development of higher education, laying the foundation for achieving universal health.

Extension of the Mantel-Haenszel test to bivariate interval censored data

  • Lee, Dong-Hyun;Kim, Yang-Jin
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 29, No. 4
    • /
    • pp.403-411
    • /
    • 2022
  • This article presents an independence test between pairs of interval censored failure times. The Mantel-Haenszel test is commonly applied to test the independence between two categorical variables in the presence of a stratification variable. Hsu and Prentice (1996) applied a Mantel-Haenszel test to the sequence of 2 × 2 tables formed at the grids which are composed of failure times. In this article, because the failure times are unknown, suitable grid points must be determined, and the failure and at-risk statuses are estimated at those grid points. We also consider a weighted test statistic to obtain a more powerful test. Simulation studies are performed to evaluate the power of the test statistics under finite samples. The method is applied to analyze two real data sets, mastitis data from milk cows and an age-related eye disease study.
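
For reference, the classical Mantel-Haenszel statistic for a set of stratified 2 × 2 tables can be computed as below. This is a sketch of the standard complete-data version without continuity correction, not the interval-censored or weighted extension proposed in the article; the function name and input layout are assumptions made for this example.

```python
import math


def mantel_haenszel(tables):
    """Mantel-Haenszel chi-square statistic (1 df) and p-value.

    tables is a list of 2x2 tables [[a, b], [c, d]], one per stratum.
    No continuity correction is applied.
    """
    sum_a = sum_expected = sum_variance = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small contributes no information
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        sum_a += a
        # Hypergeometric mean and variance of cell a under independence.
        sum_expected += row1 * col1 / n
        sum_variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if sum_variance == 0:
        raise ValueError("no stratum has variation in both margins")
    stat = (sum_a - sum_expected) ** 2 / sum_variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

Pooling the observed-minus-expected counts across strata before squaring is what lets the test detect a consistent association even when each stratum alone is too small to show it.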

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 30, No. 3
    • /
    • pp.331-341
    • /
    • 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed datatypes in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.
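
To make the idea of imputation on mixed datatypes concrete, the sketch below fills a numerical column with its mean and a categorical column with its most frequent value. This simple baseline is an assumption chosen for illustration; it is not one of the methods compared in the paper, and it does not implement the proposed IPM. Missing values are assumed to be encoded as `None`.

```python
from collections import Counter
from statistics import mean


def impute_column(values):
    """Fill None entries with the column mean (numeric) or mode (categorical)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn from, leave the column as is
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    if is_numeric:
        fill = mean(observed)
    else:
        # Most frequent observed category; ties go to the first one seen.
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]
```

Because every missing entry in a column receives the same value, this baseline shrinks the variability of the data, which is one reason neighbor-based and model-based methods such as those listed in the abstract are usually preferred.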