• Title/Summary/Keyword: 오차모수 (error parameter)


Bayesian Clustering of Prostate Cancer Patients by Using a Latent Class Poisson Model (잠재그룹 포아송 모형을 이용한 전립선암 환자의 베이지안 그룹화)

  • Oh Man-Suk
    • The Korean Journal of Applied Statistics / v.18 no.1 / pp.1-13 / 2005
  • Latent class models have recently been considered by many researchers and practitioners as a tool for identifying heterogeneous segments or groups in a population and for grouping objects into those segments. In this paper we consider data on prostate cancer patients from the Korean National Cancer Institute and propose a method for grouping the patients by using a latent class Poisson model. A Bayesian approach equipped with a Markov chain Monte Carlo method is used to overcome the limitations of classical likelihood approaches. Advantages of the proposed Bayesian method are easy estimation of parameters with their standard errors, segmentation of objects into groups, and provision of uncertainty measures for the segmentation. In addition, we provide a method to determine an appropriate number of segments for the given data, so that the method automatically chooses the number of segments and partitions objects into heterogeneous segments.
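The uncertainty measure for segmentation mentioned in the abstract above can be illustrated with the class-membership probabilities of a Poisson mixture. The following is a minimal sketch, assuming a single count per subject and known mixing weights and class rates; the function names are hypothetical and the paper's actual model and sampler may differ. In a Gibbs-type sampler, class labels would typically be drawn from probabilities of this form.

```python
import math

def poisson_log_pmf(count, rate):
    """Log of the Poisson probability mass function."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)

def membership_probabilities(count, weights, rates):
    """Posterior probability that a subject with the given count belongs to
    each latent class, given mixing weights and class-specific Poisson rates
    (both assumed known here purely for illustration)."""
    log_terms = [math.log(w) + poisson_log_pmf(count, r)
                 for w, r in zip(weights, rates)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_terms)
    unnormalized = [math.exp(t - top) for t in log_terms]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical two-class example: a low-rate and a high-rate class.
probs = membership_probabilities(0, weights=[0.5, 0.5], rates=[1.0, 5.0])
print(probs)
```

A subject with a count of 0 is assigned almost entirely to the low-rate class, and the size of the remaining probability is the kind of segmentation uncertainty the abstract refers to.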

Smoothing parameter selection in semi-supervised learning (준지도 학습의 모수 선택에 관한 연구)

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.993-1000 / 2016
  • Semi-supervised learning makes it easy to use unlabeled data in supervised learning tasks such as classification. Applying semi-supervised learning to regression analysis, we propose two methods for better estimation of the regression function. The proposed methods assume different marginal densities of the independent variables and different smoothing parameters in the unlabeled and labeled data. We show that an overfitted pilot estimator should be used to achieve the fastest convergence rate, and that unlabeled data may help to improve the convergence rate when the smoothing parameters are well estimated. We also find the conditions on the smoothing parameters needed to achieve the optimal convergence rate.

Export Behaviors of the Passenger Cars of Gunsan, Pyeongtaek and Ulsan Port (항만별 승용차 수출 행태: 군산항.평택항.울산항)

  • Mo, Soo-Won
    • Journal of Korea Port Economic Association / v.27 no.2 / pp.27-38 / 2011
  • The paper aims at examining the behavioral characteristics of the passenger car exports of Gunsan, Pyeongtaek, and Ulsan ports. This is accomplished by modelling export demand as a function of the exchange rate and United States industrial production. All series span the period January 2001 to December 2010. I first show that both the series and the residuals are stationary at the 5 percent significance level. The result cannot reject the null hypothesis of a unit root in each of the level variables and of a unit root for the residuals from the cointegration regression at the 5 percent significance level. I then make use of forecast error decomposition and historical decompositions. The forecast error decomposition indicates that car export is endogenous to industrial production and the exchange rate. The historical decompositions for the export show that the entire difference between actual export and the base forecast can be attributed to industrial production shocks, since the exchange rate moves closer to the actual data or the base forecast. This indicates that industrial production outperforms the exchange rate in explaining passenger car exports.

Power study for 2 × 2 factorial design in 4 × 4 latin square design (4 × 4 라틴방격모형 내 2 × 2 요인모형의 검정력 연구)

  • Choi, Young Hun
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1195-1205 / 2014
  • Compared with a single design, the power of the rank-transformed statistic for testing main and interaction effects for a $2 \times 2$ factorial in a $4 \times 4$ Latin square design increases rapidly as the effect size and replication size increase. In general, the power of the rank-transformed statistic is superior regardless of the effect composition and the type of error distribution when the non-tested factors are few and the effect sizes are small. The rank-transformed statistic shows much higher power than the parametric statistic under exponential and double exponential distributions. Further, the power of the rank-transformed statistic is very similar to that of the parametric statistic under normal and uniform distributions.

Neural network analysis using neuralnet in R (R의 neuralnet을 활용한 신경망분석)

  • Baik, Jaiwook
    • Industry Promotion Research / v.6 no.1 / pp.1-7 / 2021
  • We investigated multi-layer perceptrons and supervised learning algorithms, and also examined how to model functional relationships between covariates and response variables using the R package neuralnet. The algorithm applied in this paper is characterized by continuous adjustment of the weights, which are the parameters tuned to minimize an error function based on the comparison between the actual and predicted values of the response variable. In the neuralnet package, the activation and error functions can be selected according to the given situation, and the remaining parameters can be left at their default values. Applying the neuralnet package to the infertility data, we found that, among the four independent variables, age has little influence on infertility. In addition, the weights of the neural network take various values from -751.6 to 7.25, the intercepts of the first hidden layer are -92.6 and 7.25, and the weights for the covariates age, parity, induced, and spontaneous to the first hidden neuron are identified as 3.17, -5.20, -36.82, and -751.6.

Nonparametric compositional data analysis for tourism industry in Gangwon area (강원도 관광산업에 대한 비모수적 구성비 자료 분석)

  • Seongeun Park;Jeong Min Jeon;Young Kyung Lee
    • The Korean Journal of Applied Statistics / v.36 no.5 / pp.473-488 / 2023
  • Gangwon-do is one of Korea's most popular tourist destinations, with varying tourism demands and trends across its subregions. It is crucial to identify the characteristics of tourism in each area and compare the tourism patterns over time to devise policies that revitalize tourism in each local government and promote balanced development across regions. In this paper, we classify the regions in Gangwon-do based on tourism data from the last four years and analyze the tourism pattern of each region using the non-Euclidean additive model proposed by Jeon et al. (2021). The model incorporates the proportions of visitors by age groups and the proportions of navigation searches by destination types as two covariates, and the proportions of tourism expenditure types as a response variable. We estimate the model using the smooth-backfitting method and coordinate-wise bandwidth selection. The results are visualized in ternary plots, and changes in tourism patterns over time are analyzed by comparing the ratios of prediction errors to fitting errors.

Prediction of Wind Damage Risk based on Estimation of Probability Distribution of Daily Maximum Wind Speed (일 최대풍속의 추정확률분포에 의한 농작물 강풍 피해 위험도 판정 방법)

  • Kim, Soo-ock
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.3 / pp.130-139 / 2017
  • Crop damage caused by strong wind was predicted using the wind speed data available from the Korean Meteorological Administration (KMA). Wind speed data measured at 19 automatic weather stations in 2012 were compared with wind data available from the KMA's digital forecast. Linear regression equations were derived using the maximum value of wind speed measurements for the three-hour period prior to a given hour and the digital forecasts at the three-hour interval. Estimates of daily maximum wind speed were obtained from the regression equation by finding the greatest value among the maximum wind speeds at the three-hour interval. The estimation error for the daily maximum wind speed was expressed using the probability density functions of the normal and Weibull distributions. The daily maximum wind speed was compared with the critical wind speed that could cause crop damage to determine the alert stage for wind damage, e.g., "watch" or "warning." Spatial interpolation was performed for the regression coefficient for the maximum wind speed, the standard deviation of the estimation error at the automated weather stations, and the parameters of the Weibull distribution. These interpolated values at the four synoptic weather stations, Suncheon, Namwon, Imsil, and Jangsu, were used to estimate the daily maximum wind speed in 2012. The wind damage risk was determined using a critical wind speed of 10 m/s, under the assumption that the fruit of the pear variety Mansamgil would begin to drop at 10 m/s. The results indicated that the Weibull distribution was more effective than the normal distribution as the probability distribution of the estimation error for assessing wind damage risk.
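Comparing a wind speed distribution against the 10 m/s critical value can be sketched as an exceedance probability. The example below is illustrative only: it assumes the quantity of interest follows either a two-parameter Weibull distribution or a normal distribution with known parameters, and the function names and parameter values are hypothetical rather than taken from the paper.

```python
import math

def weibull_exceedance(critical, shape, scale):
    """P(X > critical) for X ~ Weibull(shape, scale), using the closed-form
    survival function exp(-(x / scale) ** shape)."""
    if critical <= 0:
        return 1.0
    return math.exp(-((critical / scale) ** shape))

def normal_exceedance(critical, mean, sd):
    """P(X > critical) for X ~ Normal(mean, sd), via the complementary
    error function."""
    return 0.5 * math.erfc((critical - mean) / (sd * math.sqrt(2.0)))

# Hypothetical parameters; 10 m/s is the critical speed named in the abstract.
print(weibull_exceedance(10.0, shape=2.0, scale=10.0))
print(normal_exceedance(10.0, mean=10.0, sd=2.0))
```

Thresholds on such a probability could then be mapped to alert stages such as "watch" or "warning"; the specific thresholds are not given in the abstract.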

Impact of Heterogeneous Dispersion Parameter on the Expected Crash Frequency (이질적 과분산계수가 기대 교통사고건수 추정에 미치는 영향)

  • Shin, Kangwon
    • Journal of the Korea Academia-Industrial cooperation Society / v.15 no.9 / pp.5585-5593 / 2014
  • This study tested the hypothesis that the significance of the heterogeneous dispersion parameter in the safety performance function (SPF) used to estimate expected crashes is affected by the endogenous heterogeneous prior distributions, and analyzed the impacts of a mis-specified dispersion parameter on the evaluation results for traffic safety countermeasures. In particular, this study simulated the Poisson means based on heterogeneous dispersion parameters and estimated the SPFs using both the negative binomial (NB) model and the heterogeneous negative binomial (HNB) model to analyze the impacts of model mis-specification on the mean and dispersion functions in the SPF. In addition, this study analyzed the characteristics of errors in the crash reduction factors (CRFs) obtained when the two models are used to estimate the posterior means and variances, which are essentially estimated through the estimated hyper-parameters in the heterogeneous prior distributions. The simulation results showed that mis-estimation of the heterogeneous dispersion parameters through the NB model does not affect the coefficients of the mean function, but the variances of the prior distribution are seriously mis-estimated when the NB model is used to develop SPFs without considering the heterogeneity in dispersion. Consequently, when the NB model is used erroneously to estimate prior distributions with heterogeneous dispersion parameters, the mis-estimated posterior mean can produce errors in CRFs of up to 120%.
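The role of the dispersion parameter in the posterior mean can be illustrated with the standard Poisson-gamma (negative binomial) conjugate update. This is a minimal sketch, assuming the NB variance has the form mu + alpha * mu^2, so that the gamma prior has shape 1/alpha and mean mu; the function name and the numbers are hypothetical, and the paper's simulation design is not reproduced here.

```python
def eb_posterior(observed, predicted, dispersion):
    """Posterior mean and variance of the expected crash count for one site.

    predicted  : prior mean mu from the SPF
    dispersion : NB dispersion alpha (variance assumed to be mu + alpha * mu**2)
    observed   : observed crash count y
    """
    # Weight placed on the SPF prediction; shrinks as alpha * mu grows.
    weight = 1.0 / (1.0 + dispersion * predicted)
    mean = weight * predicted + (1.0 - weight) * observed
    # For the gamma posterior, variance = (1 - weight) * mean.
    variance = (1.0 - weight) * mean
    return mean, variance

# Same site, two assumed dispersion values (hypothetical numbers).
print(eb_posterior(observed=8, predicted=4.0, dispersion=0.25))
print(eb_posterior(observed=8, predicted=4.0, dispersion=1.0))
```

With the same observed count and SPF prediction, the posterior mean moves from about 6.0 to about 7.2 when the assumed dispersion changes from 0.25 to 1.0, which shows how a mis-estimated dispersion parameter can propagate into quantities derived from the posterior mean.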

Documents recommendation using large citation data (거대 인용 자료를 이용한 문서 추천 방법)

  • Chae, Minwoo;Kang, Minsoo;Kim, Yongdai
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.999-1011 / 2013
  • In this research, we propose a document recommendation method that can find documents that are relatively important to a specific document based on citation information. The key idea is parameter tuning in the Neumann kernel, which is an intermediate between a measure of importance (HITS) and a measure of relatedness (co-citation). Our method selects the tuning parameter $\gamma$ in the Neumann kernel by minimizing the prediction error in future citations. We also discuss some computational issues that arise when analysing large citation data. Finally, results of analyzing patent data from the US Patent Office are given.
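The way $\gamma$ interpolates between relatedness and importance can be sketched with a truncated Neumann series. The example assumes the commonly used form $N_\gamma = K \sum_{n \ge 0} (\gamma K)^n$ with co-citation matrix $K = A^\top A$, which converges only when $\gamma$ is below the reciprocal of the largest eigenvalue of $K$. The helper names and the toy citation matrix are hypothetical, and the paper's procedure for tuning $\gamma$ is not reproduced.

```python
def matmul(x, y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

def cocitation(citations):
    """K = A^T A, where citations[i][j] = 1 if document i cites document j."""
    transposed = [list(col) for col in zip(*citations)]
    return matmul(transposed, citations)

def neumann_kernel(k, gamma, terms=50):
    """Truncated Neumann kernel K * (I + gamma*K + ... + (gamma*K)**terms)."""
    n = len(k)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scaled = [[gamma * v for v in row] for row in k]
    power = identity
    total = [row[:] for row in identity]
    for _ in range(terms):
        power = matmul(power, scaled)
        total = [[t + p for t, p in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return matmul(k, total)

# Toy data: document 0 cites documents 1 and 2; document 1 cites document 2.
a = [[0, 1, 1], [0, 0, 1], [0, 0, 0]]
k = cocitation(a)
print(neumann_kernel(k, gamma=0.0))   # reduces to the co-citation matrix
print(neumann_kernel(k, gamma=0.1))   # adds weight from longer citation paths
```

At $\gamma = 0$ the kernel is the co-citation matrix itself; larger admissible values of $\gamma$ give more weight to indirect citation paths, moving the ranking toward a HITS-style importance score.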

A Study for Forecasting Methods of ARMA-GARCH Model Using MCMC Approach (MCMC 방법을 이용한 ARMA-GARCH 모형에서의 예측 방법 연구)

  • Chae, Wha-Yeon;Choi, Bo-Seung;Kim, Kee-Whan;Park, You-Sung
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.293-305 / 2011
  • Volatility is one of the most important parameters in pricing financial derivatives and in measuring risks arising from a sudden change of economic circumstances. We propose a Bayesian approach to estimate the time-varying volatility under a linear model with ARMA(p, q)-GARCH(r, s) errors. This Bayesian estimate of the volatility is compared with the ML estimate. We also present the probability of existence of the unit root in the GARCH model.
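The time-varying volatility in a GARCH model follows a simple recursion, sketched here for the GARCH(1, 1) special case. This is an illustration under stated assumptions, not the paper's estimator: the parameters omega, alpha, and beta are taken as known, the recursion starts from the unconditional variance, and the function name and example values are hypothetical. In a Bayesian analysis, such a recursion would be evaluated for each posterior draw of the parameters.

```python
def garch11_variances(residuals, omega, alpha, beta):
    """Conditional variances for a GARCH(1, 1) process:

        sigma2[t + 1] = omega + alpha * residuals[t] ** 2 + beta * sigma2[t]

    Returns len(residuals) + 1 values; the last one is the one-step-ahead
    variance forecast. Requires alpha + beta < 1 so that the unconditional
    variance used as the starting value exists.
    """
    if alpha + beta >= 1.0:
        raise ValueError("alpha + beta must be below 1 (no unit root)")
    sigma2 = [omega / (1.0 - alpha - beta)]
    for e in residuals:
        sigma2.append(omega + alpha * e * e + beta * sigma2[-1])
    return sigma2

# Hypothetical residuals and parameter values.
path = garch11_variances([0.0, 2.0, -1.0], omega=0.1, alpha=0.1, beta=0.8)
print(path)
```

The check on alpha + beta corresponds to the unit-root boundary mentioned in the abstract: when the sum reaches 1, the unconditional variance is no longer finite.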