• Title/Summary/Keyword: zero-inflated negative binomial regression analysis (영과잉 음이항 회귀분석)

A Bayesian zero-inflated negative binomial regression model based on Pólya-Gamma latent variables with an application to pharmaceutical data (폴랴-감마 잠재변수에 기반한 베이지안 영과잉 음이항 회귀모형: 약학 자료에의 응용)

  • Seo, Gi Tae;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.311-325 / 2022
  • For count responses, excess zeros often occur in various research fields. The zero-inflated model is a common choice for modeling such count data. Bayesian inference for the zero-inflated model has long been recognized as a hard problem because the conditional posterior distribution is not available in closed form. Recently, however, Pillow and Scott (2012) and Polson et al. (2013) proposed a Pólya-Gamma data-augmentation strategy for logistic and negative binomial models, facilitating Bayesian inference for the zero-inflated model. We apply a Bayesian zero-inflated negative binomial regression model to longitudinal pharmaceutical data previously analyzed by Min and Agresti (2005). To facilitate posterior sampling for the longitudinal zero-inflated model, we use the Pólya-Gamma data-augmentation strategy.
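
For orientation, the integral identity behind the Pólya-Gamma strategy of Polson et al. (2013) can be written as below. The symbols $\psi$ (linear predictor), $a$, $b$, $\kappa$ and $\omega$ are notation chosen here for illustration and do not come from the abstract.

```latex
\frac{(e^{\psi})^{a}}{(1+e^{\psi})^{b}}
  = 2^{-b}\, e^{\kappa\psi} \int_{0}^{\infty} e^{-\omega\psi^{2}/2}\, p(\omega)\, d\omega,
\qquad \kappa = a - \tfrac{b}{2}, \quad \omega \sim \mathrm{PG}(b, 0).
```

Conditional on $\omega$, the right-hand side is Gaussian in $\psi$, so regression coefficients that enter $\psi$ linearly have Gaussian conditional posteriors under a Gaussian prior. This is what makes Gibbs sampling tractable for the logistic and negative binomial components.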

The study on the determinants of the number of job changes (중소기업 청년인턴 이직횟수 결정요인 분석)

  • Park, Sungik;Ryu, Jangsoo;Kim, Jonghan;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.387-397 / 2015
  • This paper analyzes the determinants of the number of job changes in the SMEs (small and medium enterprises) youth-intern project, using the SMEs youth-intern DB and the employment insurance DB. Since the number of job changes is count data taking only non-negative integer values, general linear regression analysis is inappropriate. Therefore, four models are fitted to the count data: the Poisson regression model, the zero-inflated Poisson regression model, the negative binomial regression model, and the zero-inflated negative binomial regression model. The zero-inflated negative binomial regression model is selected as the best model. The major results are as follows. First, the number of job changes is significantly smaller in the treatment group than in the control group. Second, the number of job changes is significantly smaller in the young-age group than in the old-age group. Third, the number of job changes is significantly greater for men than for women. Lastly, the number of job changes is significantly smaller in larger firms than in smaller firms.
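
The four candidate models named above differ only in the base count distribution and in whether a point mass at zero is mixed in. A minimal Python sketch of the four probability mass functions follows. The parameter names (`mu`, `alpha`, `pi`), the NB2 parameterization, and the numeric values are assumptions chosen for illustration, not the authors' specification or data.

```python
import math

def poisson_pmf(y, mu):
    """Poisson pmf with mean mu, computed on the log scale for stability."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) pmf: mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha  # size parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zero_inflated(base_pmf, pi):
    """Mix a point mass at zero (probability pi) with a base count pmf."""
    def pmf(y):
        base = (1.0 - pi) * base_pmf(y)
        return pi + base if y == 0 else base
    return pmf

# Illustrative (hypothetical) parameter values.
zip_pmf = zero_inflated(lambda y: poisson_pmf(y, 2.0), 0.3)
zinb_pmf = zero_inflated(lambda y: nb_pmf(y, 2.0, 0.5), 0.3)
```

Setting `pi` to zero recovers the base model, so the Poisson and negative binomial models are nested within their zero-inflated counterparts.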

Bayesian Analysis for the Zero-inflated Regression Models (영과잉 회귀모형에 대한 베이지안 분석)

  • Jang, Hak-Jin;Kang, Yun-Hee;Lee, S.;Kim, Seong-W.
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.603-613 / 2008
  • We often encounter discrete count data with a large proportion of zeros. In this case, it is not appropriate to analyze the data with standard regression models such as the Poisson or negative binomial regression models. In this article, we consider Bayesian analysis for two commonly used models: the zero-inflated Poisson and zero-inflated negative binomial regression models. We use the Bayes factor as a model selection tool, and computation is carried out via Markov chain Monte Carlo methods. Crash count data are analyzed to support the theoretical results.

Forecasting hierarchical time series for foodborne disease outbreaks (식중독 발생 건수에 대한 계층 시계열 예측)

  • In-Kwon Yeo
    • The Korean Journal of Applied Statistics / v.37 no.4 / pp.499-508 / 2024
  • In this paper, we investigate hierarchical time series forecasting, in which predicted values derived from both segmented and aggregated data are made to adhere to a hierarchical structure. The occurrences of food poisoning by a specific pathogen are analyzed using zero-inflated Poisson regression models and negative binomial regression models. The occurrences of major, miscellaneous, and overall food poisoning are analyzed using Poisson regression models and negative binomial regression models. For hierarchical time series forecasting, the MinT estimation proposed by Wickramasuriya et al. (2019) is employed. Negative predicted values resulting from hierarchical adjustments are set to zero, and the remaining lowest-level variables are multiplied by weights so that the hierarchical structure is satisfied. Empirical analysis revealed little difference between predictions by pathogen with and without hierarchical adjustment. However, hierarchical adjustments generally yield superior results for predictions of major, miscellaneous, and overall occurrences. Without hierarchical adjustment, the predicted frequencies of the lowest-level variables may exceed those of major or miscellaneous occurrences. The proposed method, by contrast, yields predictions that adhere to the hierarchical structure.
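
The non-negativity step described above can be sketched as follows. The abstract does not state how the weights are chosen, so this sketch assumes a single proportional weight that rescales the clipped lowest-level forecasts to match their parent total. The function name `clip_and_rescale` and its arguments are hypothetical.

```python
def clip_and_rescale(bottom_forecasts, parent_total):
    """Set negative bottom-level forecasts to zero, then rescale the rest
    so that they sum to the parent-level forecast.

    Assumption: one common proportional weight; the paper's actual
    weighting scheme is not given in the abstract.
    """
    clipped = [max(0.0, f) for f in bottom_forecasts]
    total = sum(clipped)
    if total == 0.0:
        # Nothing to rescale; return the all-zero forecasts unchanged.
        return clipped
    weight = parent_total / total
    return [weight * f for f in clipped]

# Hypothetical reconciled forecasts for three pathogens under one parent.
adjusted = clip_and_rescale([4.0, -1.0, 2.0], parent_total=5.0)
```

After the adjustment, every lowest-level forecast is non-negative and the forecasts sum to the parent total, so no child can exceed its parent.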

Prediction of the Number of Food Poisoning Occurrences by Microbes (원인균별 식중독 발생 건수 예측)

  • Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.26 no.6 / pp.923-932 / 2013
  • This paper proposes a method to predict the number of foodborne disease outbreaks by microbe. The weekly data on food poisoning occurrences by microbe in Korea contain many zero-valued observations and exhibit dependence between outbreaks. To model both phenomena, the total number of food poisonings is predicted by an autoregressive model, and the probabilities of food poisoning occurrences by microbe (given the total number of food poisonings) are estimated by the baseline category logit model. The predicted number of foodborne disease outbreaks caused by a microbe is obtained by multiplying the predicted total number of outbreaks by the estimated probability of food poisoning by the corresponding microbe. The mean squared error and the mean absolute error are evaluated to compare the performance of the proposed method with that of the zero-inflated model.
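
The final multiplication step can be illustrated with a short Python sketch. The function name, the pathogen labels, and the numbers are hypothetical. In the paper the total would come from the autoregressive model and the probabilities from the baseline category logit model, neither of which is reproduced here.

```python
def allocate_by_pathogen(predicted_total, category_probs):
    """Split a predicted total outbreak count across pathogens.

    predicted_total: forecast of the total number of outbreaks.
    category_probs: mapping pathogen -> estimated probability, summing to 1.
    """
    if abs(sum(category_probs.values()) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return {name: predicted_total * p for name, p in category_probs.items()}

# Hypothetical inputs for one week.
per_pathogen = allocate_by_pathogen(
    10.0, {"pathogen_a": 0.5, "pathogen_b": 0.3, "pathogen_c": 0.2}
)
```

Because the probabilities sum to one, the per-pathogen predictions always add up to the predicted total.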

An Analysis of Spatial Determinants of Inventor Networks in Korea (발명자 네트워크의 공간적 결정요인 분석)

  • Jeong, Jun Ho
    • Journal of the Economic Geographical Society of Korea / v.19 no.1 / pp.1-17 / 2016
  • This paper explores the spatial structure of inventor networks and their determinants among 230 shi-gun-gu regions in Korea. It investigates the residences of co-inventors named in Korean patent applications to the Korean Intellectual Property Office and uses a zero-inflated negative binomial model to accommodate the count nature of the dependent variable and its excess of zeros. Several variables are found to affect the spatial linkage of inventor networks. Spatial links extend beyond a region if it has more R&D-related specific assets of its own (private R&D, patent productivity, population, education) and if it is physically close to and technologically similar to the other region. Similarly, the assets of the other region play a positive role if that region has more R&D-related specific assets.

Heat-Wave Data Analysis based on the Zero-Inflated Regression Models (영-과잉 회귀모형을 활용한 폭염자료분석)

  • Kim, Seong Tae;Park, Man Sik
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2829-2840 / 2018
  • A random variable bounded below by some value is called semi-continuous or zero-inflated when its boundary value is observed more frequently than expected, that is, more often in practice than it should be in theory under a given probability distribution. When the distribution considered is continuous, the variable is called semi-continuous; when a discrete distribution is assumed, it is regarded as zero-inflated. In this study, we introduce the two-part model, which consists of one part modelling the binary response and another part modelling values greater than the boundary value. In particular, the zero-inflated regression models are explained using the Poisson and negative binomial distributions. In the real data analysis, we employ the zero-inflated regression models to estimate the number of days under extreme heat-wave conditions during the last 10 years in South Korea. Based on the estimation results, we create prediction maps of the estimated number of days under heat-wave advisory and heat-wave warning using universal kriging, a spatial prediction method.
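
As a concrete case of the discrete setting, the zero-inflated Poisson distribution in its usual mixture form and its first two moments can be written as below. The symbols $\pi$ (probability of a structural zero) and $\lambda$ (Poisson mean) are notation assumed here, not taken from the abstract.

```latex
P(Y = y) =
\begin{cases}
\pi + (1-\pi)\, e^{-\lambda}, & y = 0,\\[4pt]
(1-\pi)\, \dfrac{e^{-\lambda}\lambda^{y}}{y!}, & y = 1, 2, \dots
\end{cases}
\qquad
E[Y] = (1-\pi)\lambda, \quad
\operatorname{Var}(Y) = (1-\pi)\lambda\,(1+\pi\lambda).
```

For $\pi > 0$ the variance exceeds the mean, so zero inflation alone already induces overdispersion relative to the Poisson model.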

Estimation of the Effects of Daily Walking Hours and Days on the Mental Health of Urban Residents - The Case in Seoul - (주거지역 가로환경 및 일상 걷기가 정신 건강에 미치는 영향 - 서울시 대상으로 -)

  • Koo, Bonyu;Baek, Seungjoo;Yoon, Heeyeun
    • Journal of the Korean Institute of Landscape Architecture / v.52 no.1 / pp.87-100 / 2024
  • This study aimed to investigate the impact of the quality of the street environment in residential areas on the mental health of urban residents, considering the frequency of street use. Using a zero-inflated negative binomial regression model, the study analyzed the influence of walking frequency and the street environment on the depressive symptoms of urban residents. The research focused on Seoul, South Korea, in 2017, with depressive symptoms as the dependent variable and street environment variables, walking variables, and individual characteristics as independent variables. Additionally, the study explored the interaction effect of street greenery and walking frequency to analyze the synergistic impact of walking in green spaces on mental health. The findings indicate that a higher ratio of street green areas is associated with fewer depressive symptoms. Increased walking frequency is linked to fewer or weaker depressive symptoms. The interaction effect confirms that more frequent walking in green spaces is associated with weaker depressive symptoms. Lower visual complexity is correlated with reduced depressive symptoms. This study contributes to addressing urban residents' mental health issues at the community level by emphasizing the importance of the street green environment in residential areas.

Technology Competitiveness in the AI-Edutech Field: Using Patent Indice and Hurdle Negative Binomial Model (특허 자료를 활용한 AI-에듀테크 분야 국가 간 기술 경쟁력 분석: 특허 통계 지표와 허들 음이항 모델의 활용)

  • Ilyong Ji;Hyun-young Bae
    • Journal of Industrial Convergence / v.22 no.8 / pp.1-17 / 2024
  • Recently, interest in edutech has focused on its fusion with AI technology, and the market in this field is expanding. This study aims to analyze the technological competitiveness and key technological areas of major countries in the AI-edutech field. Additionally, considering that AI-edutech is a convergence of AI technology and edutech, the study examines the path dependence of AI-edutech in each country to determine whether it is based on existing AI technologies or on existing edutech. To this end, AI-edutech patents were collected and competitiveness was analyzed using patent activity, patent impact, and market acquisition indicators. Path dependence for each country was analyzed using the hurdle negative binomial regression model. The results indicate that the major countries in the AI-edutech field are China, South Korea, the United States, India, and Japan. In terms of patent activity, China had the highest level, followed by South Korea. The United States was high in both patent impact and market acquisition, Japan was high in market acquisition, and South Korea was high in patent impact. The hurdle negative binomial analysis produced distinctive findings. The logit part indicated that possession of existing AI and edutech technologies did not positively affect the emergence of current AI-edutech, whereas the count part showed a positive influence. This suggests that, overall, it is difficult to assert that current AI-edutech technologies are based on past AI and edutech technologies. However, once AI-edutech technologies based on existing AI and edutech emerge, they are influenced by those existing technologies. These findings provide implications for future research and technological strategies in this field.
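
The two parts of a hurdle negative binomial model can be sketched in Python as follows: a binary part decides whether the count is zero, and a zero-truncated negative binomial governs the positive counts. The parameter names (`p_zero`, `mu`, `alpha`), the NB2 parameterization, and the values used are assumptions for illustration. In a regression setting, `p_zero` and `mu` would each depend on covariates.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) pmf: mean mu, variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(y, mu, alpha, p_zero):
    """Hurdle NB pmf.

    Binary (logit) part: P(Y = 0) = p_zero.
    Count part: positive counts follow an NB truncated at zero.
    """
    if y == 0:
        return p_zero
    truncated = nb_pmf(y, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))
    return (1.0 - p_zero) * truncated

# Hypothetical values: 40% zeros, positive counts from a truncated NB.
probs = [hurdle_nb_pmf(y, mu=2.0, alpha=0.5, p_zero=0.4) for y in range(400)]
```

Unlike a zero-inflated model, all zeros here come from the binary part. This is why the logit and count parts can point in different directions, as in the findings summarized above.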

A Study on the Duration of Volunteering (자원봉사활동의 지속성에 관한 연구)

  • Song, Kee-Young;Kim, Wook-Jin
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.444-460 / 2017
  • The duration of volunteering can be analyzed in terms of commitment and attachment. Previous studies have investigated the duration of volunteering predominantly from the perspective of commitment. This study instead focuses on the concept of attachment and investigates the characteristics of those who volunteer habitually over their whole life, regardless of the regularity and intensity of the volunteer work. In so doing, the study attempts to identify factors associated with attachment to volunteering. Data came from a sample of 8,415 participants aged over twenty who responded to all surveys of the Korea Welfare Panel Study, from Wave 1 to Wave 10. A zero-inflated negative binomial regression model was employed to analyze the total number of volunteering activities in the past ten years. Findings show that people with high attachment to volunteering were those with a religion, less education, and a strong sense of reciprocity. Based on the findings, we provide practical implications for improving the operation and management of volunteer organizations.