• Title/Summary/Keyword: Bayesian posterior

Survival Analysis for White Non-Hispanic Female Breast Cancer Patients

  • Khan, Hafiz Mohammad Rafiqullah;Saxena, Anshul;Gabbidon, Kemesha;Stewart, Tiffanie Shauna-Jeanne;Bhatt, Chintan
    • Asian Pacific Journal of Cancer Prevention / v.15 no.9 / pp.4049-4054 / 2014
  • Background: Race and ethnicity are significant factors in predicting the survival time of breast cancer patients. In this study, we applied advanced statistical methods to predict the survival of White non-Hispanic female breast cancer patients diagnosed between 1973 and 2009 in the United States (U.S.). Materials and Methods: Demographic data from the Surveillance, Epidemiology, and End Results (SEER) database were used for this study. Nine states were randomly selected from 12 U.S. cancer registries, and a stratified random sampling method was used to select 2,000 female breast cancer patients from these nine states. We compared four advanced statistical probability models to identify the best-fit model for the White non-Hispanic female breast cancer survival data. Three model-building criteria were used to measure and compare the goodness of fit of the models: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC). In addition, we used a novel Bayesian method and the Markov Chain Monte Carlo technique to determine the posterior density function of the parameters. After evaluating the model parameters, we selected the model with the lowest DIC value. Using this Bayesian method, we derived the predictive survival density for future survival time and its related inferences. Results: The analytical sample of White non-Hispanic women included 2,000 breast cancer cases from the SEER database (1973-2009). The majority of cases were married (55.2%), the mean age at diagnosis was 63.61 years (SD = 14.24), and the mean survival time was 84 months (SD = 35.01). After comparing the four statistical models, the results suggested that the exponentiated Weibull model (DIC = 19818.220) was the best fit for White non-Hispanic females' breast cancer survival data. This model predicted the survival times (in months) for White non-Hispanic women given precise estimates of the model parameters. Conclusions: Using modern model-building criteria, we determined that the data best fit the exponentiated Weibull model. We incorporated precise estimates of the parameters into the predictive model and evaluated the survival inference for the White non-Hispanic female population. This method of analysis will assist researchers in drawing scientific and clinical conclusions when assessing the survival time of breast cancer patients.
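
As a loose illustration of the model-selection step, the sketch below fits several candidate survival distributions by maximum likelihood and compares AIC/BIC with SciPy. The paper itself estimates posterior densities by MCMC and selects on DIC; the survival times here are simulated placeholders, not SEER records.

```python
# Hypothetical sketch: compare candidate survival models on simulated
# survival times by maximum likelihood and AIC/BIC (the paper instead
# evaluated full Bayesian posteriors and DIC via MCMC).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated survival times in months (placeholder for the SEER sample).
times = stats.exponweib.rvs(a=2.0, c=1.2, loc=0, scale=60, size=2000,
                            random_state=rng)

candidates = {
    "exponentiated Weibull": stats.exponweib,
    "Weibull": stats.weibull_min,
    "log-normal": stats.lognorm,
    "gamma": stats.gamma,
}

for name, dist in candidates.items():
    params = dist.fit(times, floc=0)           # MLE with location fixed at 0
    loglik = np.sum(dist.logpdf(times, *params))
    k = len(params) - 1                        # free parameters (loc fixed)
    aic = 2 * k - 2 * loglik
    bic = k * np.log(len(times)) - 2 * loglik
    print(f"{name:22s}  AIC={aic:10.1f}  BIC={bic:10.1f}")
```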

Recurrent Neural Network Modeling of Etch Tool Data: a Preliminary for Fault Inference via Bayesian Networks

  • Nawaz, Javeria;Arshad, Muhammad Zeeshan;Park, Jin-Su;Shin, Sung-Won;Hong, Sang-Jeen
    • Proceedings of the Korean Vacuum Society Conference / 2012.02a / pp.239-240 / 2012
  • With advancements in semiconductor device technologies, manufacturing processes are becoming more complex and it is increasingly difficult to maintain tight process control. As the number of processing steps for fabricating complex chip structures increases, potential fault-inducing factors become prevalent and their allowable margins are continuously reduced. Therefore, one of the keys to success in semiconductor manufacturing is highly accurate and fast fault detection and classification at each stage, to reduce any undesired variation and identify the cause of a fault. Sensors in the equipment are used to monitor the state of the process. The idea is that whenever there is a fault in the process, it appears as some variation in the output from one of the sensors monitoring the process. These sensors may provide information about pressure, RF power, gas flows, etc. in the equipment. By relating the data from these sensors to the process condition, any abnormality in the process can be identified, though only with some degree of uncertainty. Our approach in this research is to capture the features of equipment condition data from a library of healthy processes. The healthy data can then serve as a reference for upcoming processes, which is made possible by mathematically modeling the acquired data. In this work we demonstrate the use of a recurrent neural network (RNN), a dynamic neural network whose output is a function of previous inputs. In our case we have etch equipment tool-set data consisting of 22 parameters and 9 runs. This data was first synchronized using the Dynamic Time Warping (DTW) algorithm. The synchronized sensor time series were then provided to the RNN, which trains and restructures itself according to the input and then predicts a value one step ahead in time that depends on the past values of the data. Eight runs of process data were used to train the network, and one run was held out as a test input to check the network's performance. Next, a mean-squared-error-based probability-generating function was used to assign a probability of fault to each parameter by comparing the predicted and actual values of the data. In the future we will use Bayesian Networks to classify the detected faults. Bayesian Networks use directed acyclic graphs that relate different parameters through their conditional dependencies in order to draw inferences among them. The relationships between parameters in the data will be used to generate the structure of the Bayesian Network, and the posterior probabilities of different faults will then be calculated using inference algorithms.
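
A minimal sketch of the prediction step under stated assumptions: synthetic stand-ins for the 22-parameter tool data (taken as already DTW-synchronized), a small Elman RNN in PyTorch trained on eight runs for one-step-ahead prediction, and a crude normalized per-parameter MSE score standing in for the paper's probability-generating function.

```python
# Hedged sketch: one-step-ahead prediction of multivariate tool-sensor
# traces with a small Elman RNN (PyTorch), then an MSE-based score per
# parameter. Shapes and data are illustrative, not the authors' set.
import torch
import torch.nn as nn

n_params, seq_len = 22, 200
train = torch.randn(8, seq_len, n_params)   # 8 healthy training runs
test = torch.randn(1, seq_len, n_params)    # 1 held-out test run

class OneStepRNN(nn.Module):
    def __init__(self, n_in, hidden=32):
        super().__init__()
        self.rnn = nn.RNN(n_in, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_in)
    def forward(self, x):
        h, _ = self.rnn(x)
        return self.head(h)                  # prediction of the next step

model = OneStepRNN(n_params)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    pred = model(train[:, :-1])              # inputs t = 0..T-2
    loss = loss_fn(pred, train[:, 1:])       # targets t = 1..T-1
    loss.backward()
    opt.step()

# Per-parameter squared error on the test run; larger error suggests a
# higher chance that the parameter deviates from healthy behavior.
with torch.no_grad():
    err = ((model(test[:, :-1]) - test[:, 1:]) ** 2).mean(dim=(0, 1))
fault_score = err / err.sum()                # crude normalized score
print(fault_score)
```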

Statistical Analysis of Count Rate Data for On-line Seawater Radioactivity Monitoring

  • Lee, Dong-Myung;Cong, Binh Do;Lee, Jun-Ho;Yeo, In-Young;Kim, Cheol-Su
    • Journal of Radiation Protection and Research / v.44 no.2 / pp.64-71 / 2019
  • Background: It is very difficult to distinguish a radioactive contamination source from the background radiation of natural radionuclides in the marine environment by means of an on-line monitoring system. The objective of this study was to investigate a statistical process for flagging abnormal levels in the count rate data measured by our on-line seawater radioactivity monitoring. Materials and Methods: Count rate data sets in time series were collected from 9 monitoring posts. All of the count rate data were measured every 15 minutes from the region of interest (ROI) for ¹³⁷Cs (Eγ = 661.6 keV) on the gamma-ray energy spectrum. The Shewhart (3σ), CUSUM, and Bayesian S-R control chart methods were evaluated, and a comparative analysis of determination methods for count rate data was carried out in terms of the false-positive incidence rate. All statistical algorithms were developed in R by the authors. Results and Discussion: The 3σ, CUSUM, and S-R analyses resulted in average false-positive incidence rates of 0.164 ± 0.047%, 0.064 ± 0.0367%, and 0.030 ± 0.018%, respectively. The S-R method has a lower value than the 3σ and CUSUM methods because the Bayesian S-R method uses the information to evaluate a posterior distribution, whereas the CUSUM control chart only accumulates information from recent data points. Comparing net count rates and gross count rates measured in time series over a full year at one monitoring post using 3σ control charts, the two methods resulted in false-positive incidence rates of 0.142% and 0.219%, respectively. Conclusion: Bayesian S-R and CUSUM control charts are better suited than the 3σ control chart for on-line seawater radioactivity monitoring with count rate data in time series. However, a continuously increasing trend is required to differentiate between a false positive and actual radioactive contamination. For the determination of count rate, the net count method is better than the gross count method because of the relatively small variation in its data points.
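
For illustration, a compact Python sketch of the two classical charts (the study implemented its algorithms in R, and the Bayesian S-R chart is omitted here). The count rates are simulated Poisson background, so every alarm counts as a false positive.

```python
# Hedged sketch: Shewhart 3-sigma and one-sided CUSUM alarms on
# simulated 15-minute background count rates (placeholder data).
import numpy as np

rng = np.random.default_rng(0)
baseline = rng.poisson(lam=50, size=5000).astype(float)
mu, sigma = baseline.mean(), baseline.std()

# Shewhart chart: flag points beyond mu + 3*sigma.
shewhart_alarm = baseline > mu + 3 * sigma

# Upper CUSUM with reference value k and decision interval h.
k, h = 0.5 * sigma, 5 * sigma
s = 0.0
cusum_alarm = np.zeros_like(baseline, dtype=bool)
for i, x in enumerate(baseline):
    s = max(0.0, s + (x - mu) - k)   # accumulate positive deviations
    cusum_alarm[i] = s > h

print("false-positive rate, Shewhart:", shewhart_alarm.mean())
print("false-positive rate, CUSUM   :", cusum_alarm.mean())
```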

Empirical Bayesian Misclassification Analysis on Categorical Data (범주형 자료에서 경험적 베이지안 오분류 분석)

  • 임한승;홍종선;서문섭
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.39-57 / 2001
  • Categorical data sometimes contain misclassification errors. If such data are analyzed as-is, the estimated cell probabilities can be biased and the standard Pearson χ² tests may have inflated true type I error rates. On the other hand, if we treat well-classified data as misclassified, we may spend a great deal of cost and time adjusting for misclassification. It is therefore a necessary and important step to ask whether categorical data are misclassified before analyzing them. In this paper we consider a two-dimensional contingency table in which one of the two variables is subject to misclassification and the marginal sums of the well-classified variable are fixed, and we explore how to partition the marginal sums into cells via the Bound and Collapse concepts of Sebastiani and Ramoni (1997). The double sampling scheme (Tenenbein 1970) is used to obtain information about the misclassification. We propose test statistics to address the misclassification problems and examine the behavior of the statistics through simulation studies.
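
A toy sketch of the double-sampling correction (Tenenbein 1970) for a binary variable, with invented counts and flip rate; the paper's Bound-and-Collapse partitioning is not reproduced here.

```python
# Illustrative sketch of double sampling: a small subsample is
# classified by both the error-prone and the error-free device, giving
# an estimate of the misclassification rate that corrects the
# full-sample proportion. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
n_main, n_sub = 10000, 500
p_true = 0.30                    # true proportion of category 1
flip = 0.10                      # misclassification probability

truth = rng.random(n_main) < p_true
observed = np.where(rng.random(n_main) < flip, ~truth, truth)

# Double sample: both true and fallible labels are known.
theta_hat = np.mean(observed[:n_sub] != truth[:n_sub])  # flip-rate estimate

p_obs = observed.mean()
# Invert E[p_obs] = p(1 - theta) + (1 - p) * theta for p.
p_corrected = (p_obs - theta_hat) / (1 - 2 * theta_hat)
print(f"naive: {p_obs:.3f}, corrected: {p_corrected:.3f}, true: {p_true}")
```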

An Application of Dirichlet Mixture Model for Failure Time Density Estimation to Components of Naval Combat System (디리슈레 혼합모형을 이용한 함정 전투체계 부품의 고장시간 분포 추정)

  • Lee, Jinwhan;Kim, Jung Hun;Jung, BongJoo;Kim, Kyeongtaek
    • Journal of Korean Society of Industrial and Systems Engineering / v.42 no.4 / pp.194-202 / 2019
  • Reliability analysis of components frequently starts with the data that the manufacturer provides. If enough failure data are collected from field operations, the reliability should be recomputed and updated on the basis of the field failure data. However, when the failure time record for a component contains only a few observations, all statistical methodologies are limited. In the case where failure records for multiple identical components are available, a valid alternative is to combine all the data from each component into one data set with an adequate sample size and to utilize the useful information in the censored data. The ROK Navy has been operating multiple Patrol Killer Guided missile (PKG) ships for several years. The Korea Multi-Function Control Console (KMFCC) is one of the key components in the PKG combat system. The maintenance record for the KMFCC contains fewer than ten failure observations and a censored datum. This paper proposes a Bayesian approach with a Dirichlet mixture model to estimate the failure time density for the KMFCC. Trend tests for each component record indicated that the null hypothesis that failure occurrence is a renewal process is not rejected. Since the KMFCCs have been functioning under different operating environments, the failure time distribution may be a composition of a number of unknown distributions, i.e., a mixture distribution, rather than a single distribution. The Dirichlet mixture model was coded as a probabilistic program in Python using PyMC3. The Markov Chain Monte Carlo (MCMC) sampling technique employed in PyMC3 then estimated the posterior distributions of the parameters of the Dirichlet mixture model. The simulation results revealed that the mixture models provide superior fits to the combined data set over single models.
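
A minimal PyMC3 sketch in the spirit of the described model: a Dirichlet-weighted mixture of Weibull components fitted by MCMC. The failure times, the number of components K, and the priors are placeholders, and the censored observation the paper incorporates is omitted for brevity.

```python
# Sketch of a Dirichlet mixture of Weibulls in PyMC3 (assumed setup,
# not the authors' exact model or data).
import numpy as np
import pymc3 as pm

failure_times = np.array([310., 480., 560., 700., 820., 950., 1100., 1400.])
K = 2  # number of mixture components (an assumption)

with pm.Model() as model:
    w = pm.Dirichlet("w", a=np.ones(K))                  # mixture weights
    shape = pm.HalfNormal("shape", sigma=5.0, shape=K)   # Weibull shapes
    scale = pm.HalfNormal("scale", sigma=2000.0, shape=K)
    comps = pm.Weibull.dist(alpha=shape, beta=scale, shape=K)
    pm.Mixture("t", w=w, comp_dists=comps, observed=failure_times)
    trace = pm.sample(2000, tune=2000, target_accept=0.9)

print(pm.summary(trace, var_names=["w", "shape", "scale"]))
```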

Nonignorable Nonresponse Imputation and Rotation Group Bias Estimation on the Rotation Sample Survey (무시할 수 없는 무응답을 가지고 있는 교체표본조사에서의 무응답 대체와 교체그룹 편향 추정)

  • Choi, Bo-Seung;Kim, Dae-Young;Kim, Kee-Whan;Park, You-Sung
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.361-375 / 2008
  • We propose methods to impute item nonresponse in a 4-8-4 rotation sample survey. We consider a nonignorable nonresponse mechanism, which can arise when a survey deals with sensitive questions (e.g., income, labor force status). We utilize a model-based imputation method grounded in a Bayesian approach to avoid the boundary solution problem. We also estimate an interview time bias using the imputed data, and calculate cell expectations and marginal probabilities at a fixed time after removing the estimated bias. We compare the mean squared errors and biases of the maximum likelihood and Bayesian methods using simulation studies.
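
A toy numerical illustration of the boundary-solution problem that the Bayesian approach avoids: with a flat Dirichlet prior, posterior-mean cell probabilities stay strictly interior even when a response category is unobserved. The counts are invented.

```python
# Toy sketch: ML cell-probability estimates can sit on the boundary
# (an exact zero) when a category has no observed respondents, while
# the Dirichlet posterior mean stays strictly positive.
import numpy as np

observed_counts = np.array([40, 0, 25])   # one category unobserved
alpha_prior = np.ones(3)                  # flat Dirichlet prior

p_ml = observed_counts / observed_counts.sum()          # boundary solution
p_bayes = (observed_counts + alpha_prior) / (
    observed_counts.sum() + alpha_prior.sum())
print("ML estimate    :", p_ml)     # contains an exact 0
print("Posterior mean :", p_bayes)  # strictly positive everywhere
```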

Estimating the Interim Rate of Votes Earned Based on the Exit Poll Results during the Coverage of Ballot Results by Broadcasters (선거 개표방송에서 출구조사 자료를 활용한 중간 득표율 추정에 관한 연구)

  • Lee, Yoon-Dong;Park, Jin-Woo
    • Survey Research / v.12 no.1 / pp.141-152 / 2011
  • During major elections, the three terrestrial broadcasting stations in Korea have covered the progress of election results by announcing the simple sum of ballot counts over all ballot-counting stations. The current approach, however, does not properly reflect differences in counting pace across locations, which can cause unnecessary confusion. In addition, the current coverage approach restricts the broadcasters' use of the regional data gained through exit polls: this significant information is used on a one-off basis to announce the initial prediction of the poll results and is then fully disregarded during the ballot-counting process. Based on this understanding, this paper suggests a Bayesian approach that consolidates the exit poll results with the progressive ballot-counting results and announces the combined estimate. The suggested consolidation approach is expected to mitigate or avoid the confusion that may arise from the different counting paces of the ballot-counting stations.
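
A hedged sketch of the consolidation idea: the exit poll enters as a Beta prior on a candidate's vote share, and the ballots counted so far update it. All numbers are illustrative, and the paper's handling of regional counting paces is not modeled.

```python
# Beta-binomial sketch: exit poll as prior, interim counts as data.
from scipy import stats

exit_poll_n, exit_poll_share = 2000, 0.52        # illustrative exit poll
a0 = exit_poll_share * exit_poll_n               # prior "successes"
b0 = (1 - exit_poll_share) * exit_poll_n         # prior "failures"

counted, votes_for = 150_000, 73_500             # interim ballot counts

posterior = stats.beta(a0 + votes_for, b0 + counted - votes_for)
print(f"interim share estimate: {posterior.mean():.4f}")
print("95% credible interval :", posterior.interval(0.95))
```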

Climate Change Scenario Generation and Uncertainty Assessment: Multiple variables and potential hydrological impacts

  • Kwon, Hyun-Han;Park, Rae-Gun;Choi, Byung-Kyu;Park, Se-Hoon
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.268-272 / 2010
  • The research presented here represents a collaborative effort with the SFWMD on developing future climate scenarios for the SFWMD area. The project focuses on developing methodology for simulating precipitation that represents both the natural quasi-oscillatory modes of variability in the climate variables and the secular trends projected by the publicly available IPCC scenarios. This study specifically provides the results for precipitation modeling. The starting point for the modeling was the work of Tebaldi et al., which is considered one of the benchmarks for bias correction and model combination in this context. This model was extended in the framework of a Hierarchical Bayesian Model (HBM) to formally and simultaneously consider biases between the models and observations over the historical period, as well as trends in the observations and models out to the end of the 21st century, in line with the different ensemble model simulations from the IPCC scenarios. The low-frequency variability is modeled using the previously developed Wavelet Autoregressive Model (WARM), with a correction to preserve the variance associated with the full series from the HBM projections. The assumption here is that the IPCC models contain no useful information as to the change in the low-frequency variability of the regional, seasonal precipitation; this assumption is based on a preliminary analysis of the models' historical and future output. Thus, preserving the low-frequency structure of the historical series into the future emerges as a pragmatic goal. We find that there are significant biases between the observations and the base-case scenarios for precipitation. The biases vary across models and are shrunk using posterior maximum likelihood, allowing some models to depart from the central tendency while others cluster and reduce biases by averaging. The projected changes in future precipitation are small compared to the bias between the model base runs and observations, and also relative to the inter-annual and decadal variability in the precipitation.
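
A minimal partial-pooling sketch (PyMC3) of the bias-shrinkage idea: each model's base-run output is truth plus a model-specific bias drawn from a common distribution, so outlying models can depart while the rest shrink together. The values are synthetic stand-ins, not SFWMD ensemble output, and the full HBM with trends and WARM is not attempted.

```python
# Hedged sketch: hierarchical shrinkage of per-model biases toward a
# common distribution, with invented ensemble values.
import numpy as np
import pymc3 as pm

obs_mean = 120.0                      # observed seasonal precipitation (mm)
model_means = np.array([132., 150., 118., 105., 140.])  # 5 GCM base runs

with pm.Model() as hbm:
    truth = pm.Normal("truth", mu=obs_mean, sigma=20.0)
    tau = pm.HalfNormal("tau", sigma=20.0)           # spread of biases
    bias = pm.Normal("bias", mu=0.0, sigma=tau, shape=len(model_means))
    pm.Normal("y", mu=truth + bias, sigma=5.0, observed=model_means)
    trace = pm.sample(2000, tune=2000, target_accept=0.9)

print(pm.summary(trace, var_names=["truth", "tau"]))
```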

Dynamic quantitative risk assessment of accidents induced by leakage on offshore platforms using DEMATEL-BN

  • Meng, Xiangkun;Chen, Guoming;Zhu, Gaogeng;Zhu, Yuan
    • International Journal of Naval Architecture and Ocean Engineering / v.11 no.1 / pp.22-32 / 2019
  • On offshore platforms, oil and gas leaks are apt to be the initial events of major accidents that may result in significant loss of life and property damage. To prevent accidents induced by leakage, it is vital to perform a case-specific and accurate risk assessment. This paper presents an integrated method of Dynamic Quantitative Risk Assessment (DQRA), using the Decision Making Trial and Evaluation Laboratory (DEMATEL) technique and a Bayesian Network (BN), for evaluating system vulnerabilities and predicting the occurrence probabilities of accidents induced by leakage. In the method, three levels of indicators are established to identify the factors, events, and subsystems that may lead to leakage, fire, and explosion. The critical indicators that directly influence the evolution of risk are identified using DEMATEL. A sequential model is then developed to describe the escalation of initial events using an Event Tree (ET), which is converted into a BN to calculate the posterior probabilities of the indicators. Using newly introduced accident precursor data, the failure probabilities of safety barriers and basic factors, and the occurrence probabilities of different consequences, can be updated through the BN. The proposed method overcomes the limitation of traditional methods that cannot effectively utilize the operational data of platforms. This work shows trends of accident risks over time and provides useful information for the risk control of floating marine platforms.
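
A toy Bayes-rule illustration of the BN posterior-update step: observing a fire revises the probability that a leak was the initiating event. All probabilities are invented placeholders, not values from the paper.

```python
# Two-node posterior update, the elementary operation a BN performs.
p_leak = 0.02                 # prior probability of a leak
p_fire_given_leak = 0.30      # ignition given a leak (barrier failure)
p_fire_given_no_leak = 0.001  # fire from other causes

p_fire = (p_fire_given_leak * p_leak
          + p_fire_given_no_leak * (1 - p_leak))
p_leak_given_fire = p_fire_given_leak * p_leak / p_fire
print(f"P(leak | fire) = {p_leak_given_fire:.3f}")
```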

Estimation of genetic relationships between growth curve parameters in Guilan sheep

  • Hossein-Zadeh, Navid Ghavi
    • Journal of Animal Science and Technology / v.57 no.5 / pp.19.1-19.6 / 2015
  • The objective of this study was to estimate variance components and genetic parameters for growth curve parameters in Guilan sheep. The studied traits were the parameters of the Brody growth model: A (asymptotic mature weight), B (initial animal weight), and K (maturation rate). The data set and pedigree information used in this study were obtained from the Agricultural Organization of Guilan province (Rasht, Iran) and comprised 8,647 growth curve records of lambs from birth to 240 days of age during 1994 to 2014. Marginal posterior distributions of the parameters and variance components were estimated using the TM program. The Gibbs sampler was run for 300,000 rounds and the first 60,000 rounds were discarded as a burn-in period. Posterior mean estimates of direct heritabilities for A, B, and K were 0.39, 0.23, and 0.039, respectively. Estimates of the direct genetic correlations between growth curve parameters were 0.57, 0.03, and -0.01 for A-B, A-K, and B-K, respectively. Estimates of direct genetic trends for A, B, and K were positive, with corresponding values of 0.014 ± 0.003 (P < 0.001), 0.0012 ± 0.0009 (P > 0.05), and 0.000002 ± 0.0001 (P > 0.05), respectively. Residual correlations between growth curve parameters varied from -0.52 (A-K) to 0.48 (A-B). Phenotypic correlations likewise varied from -0.49 (A-K) to 0.47 (A-B). The results of this study indicate that improvement of the growth curve parameters of Guilan sheep is feasible in selection programs. It is worthwhile to develop a selection strategy that obtains an appropriate shape of the growth curve by genetically changing the parameters of the growth model.
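
For reference, a least-squares fit of the Brody curve W(t) = A(1 - B·e^(-Kt)) to invented weight-age records; the study itself estimates genetic (co)variances of A, B, and K by Gibbs sampling, which this sketch does not attempt.

```python
# Sketch: fit the Brody growth model to illustrative lamb weight data.
import numpy as np
from scipy.optimize import curve_fit

def brody(t, A, B, K):
    # A: asymptotic mature weight, B: initial-weight fraction, K: maturation rate
    return A * (1.0 - B * np.exp(-K * t))

age = np.array([0, 30, 60, 90, 120, 180, 240], dtype=float)   # days
weight = np.array([4.0, 14.8, 22.4, 27.9, 31.4, 35.9, 38.0])  # kg, invented

(A, B, K), _ = curve_fit(brody, age, weight, p0=[40.0, 0.9, 0.01])
print(f"A (mature wt) = {A:.1f} kg, B = {B:.2f}, K = {K:.4f} /day")
```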