On Statistical Computing via EM Algorithm in Logistic Linear Models Involving Non-ignorable Missing Data

  • Jun, Yu-Na;Qian, Guoqi;Park, Jeong-Soo
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.181-186 / 2005
  • Many data sets obtained from surveys or medical trials include missing observations. When such data sets are analyzed, it is common to use only the complete cases; however, this can lead to large biases or inefficiency. In this paper, we consider a method for estimating the parameters of logistic linear models under a non-ignorable missing-data mechanism. A binomial response model and a normal explanatory model for the missing data are used. We fit the model using the EM algorithm: the E-step is carried out with the Metropolis-Hastings algorithm, which generates samples of the missing data, combined with a Monte Carlo technique, and the M-step uses Newton-Raphson to maximize the likelihood function. Asymptotic variances of the MLEs are derived, and the parameter estimates and their standard errors are compared.
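
A minimal Monte Carlo EM sketch of the fitting scheme described above, not the authors' code. It assumes a single covariate x that is sometimes missing (coded as NaN), a normal model x ~ N(mu, sd^2), a logistic response model for a binary y, and, as a hypothetical choice not stated in the abstract, a logistic missingness model P(r = 1 | x) that depends on the possibly unobserved x itself, which is what makes the mechanism non-ignorable. The E-step draws the missing x values by random-walk Metropolis-Hastings and averages over the draws; the M-step fits the logistic parts by Newton-Raphson. The asymptotic variance computation is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def newton_logistic(X, y, w, beta, iters=25, tol=1e-8):
    """M-step helper: weighted logistic regression fitted by Newton-Raphson."""
    for _ in range(iters):
        p = sigmoid(X @ beta)
        grad = X.T @ (w * (y - p))                       # score vector
        info = (X * (w * p * (1 - p))[:, None]).T @ X    # negative Hessian
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def log_target(x, y, beta, alpha, mu, sd):
    """Unnormalized log p(x | y, r = 1): normal covariate model, logistic
    response model, and a hypothetical logistic missingness model in x."""
    eta = beta[0] + beta[1] * x
    zeta = alpha[0] + alpha[1] * x
    log_py = y * eta - np.logaddexp(0.0, eta)      # log P(y | x)
    log_pr = zeta - np.logaddexp(0.0, zeta)        # log P(r = 1 | x)
    return -0.5 * ((x - mu) / sd) ** 2 + log_py + log_pr


def mh_draws(y_mis, beta, alpha, mu, sd, m, rng, burn=50):
    """E-step sampler: random-walk Metropolis-Hastings, one chain per missing x."""
    x = np.full(len(y_mis), mu)
    lp = log_target(x, y_mis, beta, alpha, mu, sd)
    draws = []
    for t in range(burn + m):
        prop = x + sd * rng.standard_normal(len(x))
        lp_prop = log_target(prop, y_mis, beta, alpha, mu, sd)
        accept = np.log(rng.uniform(size=len(x))) < lp_prop - lp
        x = np.where(accept, prop, x)
        lp = np.where(accept, lp_prop, lp)
        if t >= burn:
            draws.append(x.copy())
    return np.array(draws)                          # shape (m, n_missing)


def mcem(x, y, n_iter=20, m=50, seed=0):
    """Monte Carlo EM; x uses np.nan for missing covariate values."""
    rng = np.random.default_rng(seed)
    obs = ~np.isnan(x)
    beta, alpha = np.zeros(2), np.zeros(2)
    mu, sd = np.nanmean(x), np.nanstd(x)
    for _ in range(n_iter):
        draws = mh_draws(y[~obs], beta, alpha, mu, sd, m, rng)
        # Pseudo-complete data: observed rows (weight 1, r = 0) plus
        # m imputed copies of each missing row (weight 1/m, r = 1).
        x_all = np.concatenate([x[obs], draws.ravel()])
        y_all = np.concatenate([y[obs], np.tile(y[~obs], m)])
        r_all = np.concatenate([np.zeros(obs.sum()), np.ones(draws.size)])
        w = np.concatenate([np.ones(obs.sum()), np.full(draws.size, 1.0 / m)])
        X = np.column_stack([np.ones_like(x_all), x_all])
        beta = newton_logistic(X, y_all, w, beta)
        alpha = newton_logistic(X, r_all, w, alpha)
        mu = np.sum(w * x_all) / w.sum()
        sd = np.sqrt(np.sum(w * (x_all - mu) ** 2) / w.sum())
    return beta, alpha, mu, sd
```

Weighting each of the m imputed copies by 1/m is the usual Monte Carlo approximation of the conditional expectation in the E-step, so the M-step can reuse an ordinary weighted Newton-Raphson fit.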

Construction of Integrated Agricultural Statistical System Architecture for Effective Policy (농업정책 실효성 증대를 위한 농업통계시스템 아키텍처 구축)

  • Lee, Min-Soo;Chae, Young-Chan;Hong, Hee-Yeon;Kim, Sang-Ho;Kim, Jeong-Seop
    • Journal of Korean Society of Rural Planning / v.11 no.4 s.29 / pp.75-91 / 2005
  • This study designs an integrated data architecture to systematically manage the agricultural statistics database. Managing agricultural statistics is important because they provide data for policy making and for decision making in agribusiness. The Ministry of Agriculture and the National Statistical Office collect the basic agricultural statistics that provide the basis for logical decision making and agricultural policy. However, these data have not been well used: they have not been collected or managed consistently, and the raw data have not been organized or processed to meet various demands. A consistent agricultural statistics system is therefore needed to increase the relevance, accessibility, and efficiency of the data for various users. Because a massive amount of data has accumulated over a long period, introducing a new system and reorganizing the data carries large risks. A systematic method is required to reduce these risks in planning, building, and maintaining the database without hindering administration. This study provides a design of the agricultural statistics system architecture based on a user requirement analysis (URA) and on similar systems abroad. We have also built a prototype to check the implementability of the system design.

Screening Vital Few Variables and Development of Logistic Regression Model on a Large Data Set (대용량 자료에서 핵심적인 소수의 변수들의 선별과 로지스틱 회귀 모형의 전개)

  • Lim, Yong-B.;Cho, J.;Um, Kyung-A;Lee, Sun-Ah
    • Journal of Korean Society for Quality Management / v.34 no.2 / pp.129-135 / 2006
  • With advances in computer technology, it has become possible to store in a database all the information related to monitoring equipment under control, together with huge amounts of real-time manufacturing data. Statistical analysis of large data sets with hundreds of thousands of observations and hundreds of independent variables, some of whose values are missing for many observations, is therefore needed, even though it is a formidable computational task. A tree-structured approach to classification is capable of screening important independent variables and their interactions. In a Six Sigma project handling a large amount of manufacturing data, one of the goals is to screen the vital few variables from among the trivial many. In this paper, we review and summarize the CART, C4.5, and CHAID algorithms and propose a simple method of screening the vital few variables by selecting the variables screened in common by all three algorithms. We also discuss how to develop a logistic regression model on a large data set, illustrated with a large finance data set collected by a credit bureau for the purpose of predicting company bankruptcy.
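
The screening rule above keeps only the variables that all three tree algorithms agree on. Growing full CART, C4.5, and CHAID trees is beyond a short example, so the sketch below is a simplified stand-in under stated assumptions: predictors are categorical, each variable is scored by a single split using each algorithm's characteristic criterion (Gini decrease for CART, gain ratio for C4.5, and the Pearson chi-square statistic in place of CHAID's p-value test), the top k variables per criterion are kept, and their intersection is returned. The function names and the top-k rule are hypothetical.

```python
import numpy as np


def contingency(x, y):
    """Counts table: rows are levels of x, columns are classes of y."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    return table


def gini(counts):
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def entropy(counts):
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))


def split_scores(x, y):
    """One-split scores for a categorical predictor x under three criteria."""
    t = contingency(x, y)
    n = t.sum()
    rows, cols = t.sum(axis=1), t.sum(axis=0)
    w = rows / n
    # CART-style: decrease in Gini impurity after splitting on x
    gini_gain = gini(cols) - sum(wk * gini(tk) for wk, tk in zip(w, t))
    # C4.5-style: information gain divided by split information
    info_gain = entropy(cols) - sum(wk * entropy(tk) for wk, tk in zip(w, t))
    split_info = entropy(rows)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    # CHAID stand-in: Pearson chi-square statistic of the x-by-y table
    expected = np.outer(rows, cols) / n
    chi2 = np.sum((t - expected) ** 2 / expected)
    return gini_gain, gain_ratio, chi2


def screen_vital_few(X, y, k):
    """Column indices ranked in the top k by every criterion."""
    scores = np.array([split_scores(X[:, j], y) for j in range(X.shape[1])])
    top_sets = [set(np.argsort(-scores[:, c])[:k].tolist())
                for c in range(scores.shape[1])]
    return sorted(set.intersection(*top_sets))
```

For example, with six binary predictors where only columns 0 and 1 drive the response, `screen_vital_few(X, y, k=2)` should return `[0, 1]`; a variable that only one criterion favors is dropped by the intersection.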

The Marshall-Olkin generalized gamma distribution

  • Barriga, Gladys D.C.;Cordeiro, Gauss M.;Dey, Dipak K.;Cancho, Vicente G.;Louzada, Francisco;Suzuki, Adriano K.
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.245-261 / 2018
  • Attempts have been made to define new classes of distributions that provide more flexibility for modelling skewed data in practice. In this work, we define a new extension of the generalized gamma distribution (Stacy, The Annals of Mathematical Statistics, 33, 1187-1192, 1962), called the Marshall-Olkin generalized gamma (MOGG) distribution, based on the generator pioneered by Marshall and Olkin (Biometrika, 84, 641-652, 1997). This new lifetime model is very flexible, including twenty-one special models. The main advantage of the new family lies in the fact that practitioners will have a quite flexible distribution to fit real data from several fields, such as engineering, hydrology, and survival analysis. Further, we also define a MOGG mixture model, a modification of the MOGG distribution for analyzing lifetime data in the presence of a cure fraction. This proposed model can be seen as a model of competing causes, where the parameter associated with the Marshall-Olkin distribution controls the activation mechanism of the latent risks (Cooner et al., Statistical Methods in Medical Research, 15, 307-324, 2006). The asymptotic properties of the maximum likelihood estimators of the model parameters are evaluated by means of simulation studies. The proposed distribution is fitted to two real data sets, one arising from measurements of fiber strength and the other from melanoma data.
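
The abstract gives no formulas, so this sketch uses the standard Marshall-Olkin construction: for a baseline CDF F with survival 1 - F, the new survival function is lam(1 - F) / (1 - (1 - lam)(1 - F)), which gives CDF F / (1 - (1 - lam)(1 - F)) and density lam f / (1 - (1 - lam)(1 - F))^2. The baseline is Stacy's generalized gamma with density f(x) = p x^(d-1) exp(-(x/a)^p) / (a^d Gamma(d/p)), whose CDF is the regularized lower incomplete gamma P(d/p, (x/a)^p). The parameter names lam, a, d, p are labels chosen here and may differ from the paper's notation; the cure-fraction mixture is not covered.

```python
import math


def reg_lower_gamma(s, x, tol=1e-14, max_iter=1000):
    """Regularized lower incomplete gamma P(s, x) via its power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / s
    total = term
    for k in range(1, max_iter):
        term *= x / (s + k)
        total += term
        if term < tol * total:
            break
    return math.exp(s * math.log(x) - x - math.lgamma(s)) * total


def gg_pdf(x, a, d, p):
    """Stacy generalized gamma density (scale a, shape parameters d and p)."""
    return p / (a ** d * math.gamma(d / p)) * x ** (d - 1) * math.exp(-(x / a) ** p)


def gg_cdf(x, a, d, p):
    return reg_lower_gamma(d / p, (x / a) ** p)


def mogg_cdf(x, lam, a, d, p):
    """Marshall-Olkin transform of the generalized gamma CDF."""
    F = gg_cdf(x, a, d, p)
    return F / (1.0 - (1.0 - lam) * (1.0 - F))


def mogg_pdf(x, lam, a, d, p):
    """Density obtained by differentiating mogg_cdf."""
    F = gg_cdf(x, a, d, p)
    return lam * gg_pdf(x, a, d, p) / (1.0 - (1.0 - lam) * (1.0 - F)) ** 2
```

Setting lam = 1 returns the generalized gamma itself, and d = p = 1 reduces the baseline to an exponential with scale a, which makes both functions easy to sanity-check.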

Effective education plan of probability and statistics in the H/W curriculum (하드웨어 교과과정에서 효과적인 확률 및 통계 교육방안)

  • Lee, Seung-Woo
    • Journal of the Korean Data and Information Science Society / v.25 no.4 / pp.869-880 / 2014
  • This study aims to present an educational model for the effective application of probability and statistics in the H/W curriculum. To this end, this paper conducts a survey of college students majoring in H/W and then, through an actual class experiment, analyzes how probability and statistics correlate with other core H/W subjects and how knowledge of probability and statistics affects the understanding of H/W majors. Consequently, this study recommends probability and statistics as a prerequisite subject in the H/W curriculum.

Ways to Improve the Government Statistics : Labour Demand Survey (정부통계의 개선 방안: 《노동력수요동향조사》)

  • Kim, Byeong-Jo;Lee, Kun
    • Proceedings of the Korean Association for Survey Research Conference / 2001.04a / pp.61-81 / 2001
  • Government statistics have been treated as the most reliable data sources in Korea. In the last 20 years, the number and scope of government statistics have expanded rapidly. Although government statistics have been widely used in research, the methods used to collect them have rarely been discussed. In 1998, we reviewed the survey questionnaire, data collection methods, and procedures of one government statistical survey, the Labor Demand Survey. Based on this pilot study, we also proposed several survey methods to improve the quality of the survey. The Ministry of Labor adopted some of these suggestions in the 1999 survey. This paper describes how we investigated the Labor Demand Survey, the problems and issues we detected in the survey, our suggestions for improving it, and the survey methods adopted by the Ministry of Labor. However, the effects of the new measures are not included in this paper; they will be studied in the near future when the data become available.

Ways to Improve the Government Statistics : Labour Demand Survey (정부통계의 개선방안 : 《노동력수요동향조사》)

  • Kim, Byeong-Jo;Lee, Kun
    • Survey Research / v.2 no.2 / pp.61-81 / 2001
  • Government statistics have been treated as the most reliable data sources in Korea. In the last 20 years, the number and scope of government statistics have expanded rapidly. Although government statistics have been widely used in research, the methods used to collect them have rarely been discussed. In 1998, we reviewed the survey questionnaire, data collection methods, and procedures of one government statistical survey, the Labor Demand Survey. Based on this pilot study, we also proposed several survey methods to improve the quality of the survey. The Ministry of Labor adopted some of these suggestions in the 1999 survey. This paper describes how we investigated the Labor Demand Survey, the problems and issues we detected in the survey, our suggestions for improving it, and the survey methods adopted by the Ministry of Labor. However, the effects of the new measures are not included in this paper; they will be studied in the near future when the data become available.

A Regression based Unconstraining Demand Method in Revenue Management (수입관리에서 회귀모형 기반 수요 복원 방법)

  • Lee, JaeJune;Lee, Woojoo;Kim, Junghwan
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.467-475 / 2015
  • Accurate demand forecasting is a crucial component of revenue management (RM). The booking data of departed flights are used to forecast the demand for future flights; however, booking requests that were denied are omitted from the departed-flight data. In statistical terms, denied booking requests can be interpreted as censored observations. Thus, unconstraining demand is an important step in forecasting the true demand for future flights. Several unconstraining methods have been introduced, and a method based on expectation maximization is considered superior. In this study, we propose a new unconstraining method based on a regression model that can accommodate such censored data. Through a simulation study, the performance of the proposed method was compared with that of two representative unconstraining methods widely used in RM.
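
The expectation-maximization benchmark mentioned above (not the authors' proposed regression method) can be sketched for the common textbook setting, assumed here: demand is normal, and each flight that hit its booking limit is recorded at that limit with a censoring flag. The E-step replaces each censored value by the first two moments of a normal truncated to D >= limit, using the inverse Mills ratio lam = phi(z) / (1 - Phi(z)) with z = (limit - mu) / sigma; the M-step recomputes the mean and variance from those expected sufficient statistics.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """Upper-tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def em_unconstrain(bookings, censored, n_iter=500, tol=1e-9):
    """EM estimate of the mean and sd of normal demand from censored bookings.

    bookings[i] is the recorded bookings; censored[i] is True when the flight
    hit its booking limit, so true demand was at least bookings[i].
    """
    n = len(bookings)
    mu = sum(bookings) / n                         # naive start: biased low
    var = sum((b - mu) ** 2 for b in bookings) / n
    for _ in range(n_iter):
        sigma = math.sqrt(var)
        s1 = s2 = 0.0
        for b, is_cens in zip(bookings, censored):
            if is_cens:
                z = (b - mu) / sigma
                lam = norm_pdf(z) / norm_sf(z)     # inverse Mills ratio
                m1 = mu + sigma * lam              # E[D | D >= b]
                m2 = mu * mu + var + sigma * lam * (b + mu)   # E[D^2 | D >= b]
            else:
                m1, m2 = b, b * b
            s1 += m1
            s2 += m2
        new_mu = s1 / n
        new_var = s2 / n - new_mu * new_mu
        converged = abs(new_mu - mu) < tol and abs(new_var - var) < tol
        mu, var = new_mu, new_var
        if converged:
            break
    return mu, math.sqrt(var)
```

Because censored flights contribute E[D | D >= limit] rather than the limit itself, the unconstrained mean ends up above the naive average of recorded bookings.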

Comparison of Functions for Filtering Time Course Gene Expression Data with Flat Patterns (무 변화 패턴을 갖는 시간경로 유전자발현자료를 제거하기 위한 함수들의 비교)

  • Kim, Kyung-Sook;Oh, Mi-Ra;Baek, Jang-Sun;Son, Young-Sook
    • The Korean Journal of Applied Statistics / v.20 no.2 / pp.409-422 / 2007
  • Filtering out genes that do not appear to contribute to regulation before the statistical analysis of time course gene expression data can reduce the dimension of the data and the possibility of misinterpretation due to noise or lack of variation. In this paper, we compare six different functions for filtering genes with flat patterns under a percentile criterion, applied both to an observed sample and to a bootstrap sample. Applying them to the yeast cell cycle data shows that the variance function gives the most similar results in both samples.
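
Since the abstract singles out the variance function, here is a minimal sketch of variance-based filtering under a percentile criterion on the observed sample only. It assumes genes are rows and time points are columns, and that genes whose variance is at or below the q-th percentile are treated as flat; q = 25 is a hypothetical default, and the bootstrap comparison is not reproduced.

```python
import numpy as np


def filter_flat_genes(expr, q=25.0):
    """Drop genes whose sample variance over time is at or below the q-th percentile.

    expr: array of shape (n_genes, n_timepoints).
    Returns the retained rows and the boolean keep mask.
    """
    v = np.var(expr, axis=1, ddof=1)      # variance function per gene
    cutoff = np.percentile(v, q)          # percentile criterion
    keep = v > cutoff
    return expr[keep], keep
```

Swapping `np.var` for another per-gene summary (for example a range or mean absolute deviation) is how the other candidate filtering functions would slot into the same percentile rule.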

Comparison between Kriging and GWR for the Spatial Data (공간자료에 대한 지리적 가중회귀 모형과 크리깅의 비교)

  • Kim, Sun-Woo;Jeong, Ae-Ran;Lee, Sung-Duck
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.271-280 / 2005
  • Kriging methods, as traditional spatial data analysis methods, and geographically weighted regression (GWR) models, as statistical analysis methods, are compared. In this paper, we apply data from the Ministry of Environment to a practical spatial analysis. We compare the performance of these methods using monthly carbon monoxide observations taken at 116 air pollution measuring sites in 1999.
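
To make the two compared approaches concrete, here is a minimal sketch of each under assumptions the abstract does not state: GWR fits a local weighted least-squares regression at a target location using a Gaussian distance kernel with a fixed bandwidth, and ordinary kriging solves the usual system with weights constrained to sum to one, using an exponential semivariogram whose sill, range, and nugget values are hypothetical inputs rather than fitted ones.

```python
import numpy as np


def gwr_coefficients(coords, X, y, target, bandwidth):
    """Local GWR coefficients (intercept first) at `target` via weighted least squares."""
    d = np.linalg.norm(coords - target, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
    Xd = np.column_stack([np.ones(len(y)), X])        # add intercept column
    XtW = Xd.T * w                                    # X^T W without forming diag(w)
    return np.linalg.solve(XtW @ Xd, XtW @ y)


def exp_variogram(h, sill, range_, nugget):
    """Exponential semivariogram with gamma(0) = 0."""
    return np.where(h > 0, nugget + sill * (1.0 - np.exp(-h / range_)), 0.0)


def ordinary_kriging(coords, z, target, sill=1.0, range_=1.0, nugget=0.0):
    """Ordinary kriging prediction at `target`."""
    n = len(z)
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(D, sill, range_, nugget)
    A[n, n] = 0.0                                     # unbiasedness constraint row/col
    b = np.ones(n + 1)
    b[:n] = exp_variogram(np.linalg.norm(coords - target, axis=1), sill, range_, nugget)
    sol = np.linalg.solve(A, b)
    return sol[:n] @ z                                # sol[n] is the Lagrange multiplier
```

The structural difference shows up directly: GWR predicts from local covariate relationships, while ordinary kriging is an exact interpolator that reproduces each observed value at its own site.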