• Title/Summary/Keyword: Bayes factors

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran;Vigneshwari, Srinivasan
    • ETRI Journal / v.44 no.4 / pp.573-587 / 2022
  • The agricultural sector differs from other sectors because it relies entirely on natural and climatic factors. Climate change affects land and agriculture in many ways, including reduced annual rainfall, pests, heat waves, changes in sea level, and fluctuations in global ozone and atmospheric CO2. Climate change also affects the environment. Based on these factors, farmers choose their crops to increase productivity in their fields. Many existing agricultural ontologies are either domain-specific or have been created with minimal vocabulary, and no proper evaluation framework has been implemented. A new agricultural ontology focused on subdomains is designed to assist farmers using the Jaccard relative extractor (JRE) and the Naïve Bayes algorithm. The JRE is used to find the similarity between two sentences and words in the agricultural documents, and the relationship between two terms is identified via the Naïve Bayes algorithm. In the proposed method, the data are preprocessed with natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. The subdomain ontologies of weather, pest, and soil are built separately, and the overall agricultural ontology is built around them. The gold standard for the lexical layer is used to evaluate the proposed technique, and its performance is analyzed by comparing it with different state-of-the-art systems. The performance metrics are precision, recall, F-measure, Matthews correlation coefficient, receiver operating characteristic curve area, and precision-recall curve area. The proposed methodology gives a precision score of 94.40%, compared with 83.94% for the decision tree and 86.89% for the K-nearest neighbor algorithm, for agricultural ontology construction.
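
The abstract does not spell out how the JRE computes similarity, so the following is only a minimal sketch of plain Jaccard similarity between the word sets of two sentences. The tokenizer, the function names `tokens` and `jaccard`, and the example sentences are assumptions made for illustration, not the authors' implementation.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # convention chosen here for two empty sentences
    return len(ta & tb) / len(ta | tb)


# Hypothetical sentences: 5 shared words out of 6 distinct words, i.e. 5/6.
s1 = "heavy rainfall increases pest risk"
s2 = "pest risk increases after heavy rainfall"
print(jaccard(s1, s2))
```

A score of 1.0 means the two sentences use exactly the same vocabulary, and 0.0 means they share no words; word order is ignored.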

Uncertainty assessment of ensemble streamflow prediction method (앙상블 유량예측기법의 불확실성 평가)

  • Kim, Seon-Ho;Kang, Shin-Uk;Bae, Deg-Hyo
    • Journal of Korea Water Resources Association / v.51 no.6 / pp.523-533 / 2018
  • The objective of this study is to analyze the uncertainties of ensemble-based streamflow prediction methods arising from model parameters and input data. ESP (Ensemble Streamflow Prediction) and BAYES-ESP (Bayesian-ESP), both based on the ABCD rainfall-runoff model, were selected as the streamflow prediction methods. GLUE (Generalized Likelihood Uncertainty Estimation) was applied to analyze parameter uncertainty. Input uncertainty was analyzed according to the duration of the meteorological scenarios used for ESP. The results showed that parameter uncertainty was much more significant than input uncertainty for ensemble-based streamflow prediction. They also indicated that more than 20 years of observed meteorological data is appropriate. BAYES-ESP was effective in reducing the uncertainty of the ESP method. It is concluded that this analysis is meaningful for clarifying the characteristics of the ESP method and the error factors of ensemble-based streamflow prediction.

On Flexible Bayesian Test Criteria for Nested Point Null Hypotheses of Multiple Regression Coefficients

  • Jae-Hyun Kim;Hea-Jung Kim
    • Communications for Statistical Applications and Methods / v.3 no.3 / pp.205-214 / 1996
  • As flexible Bayesian test criteria for nested point null hypotheses of multiple regression coefficients, partial and overall Bayes factors are introduced under a class of intuitively meaningful priors. The criteria lead to a simple method for considering different prior beliefs on the subspaces that constitute a partition of the coefficient parameter space. A couple of tests are suggested based on the criteria. It is shown that they enable pairwise comparisons of hypotheses on the partitioned subspaces. Through a Monte Carlo simulation, the performance of the tests based on the criteria is compared with that of the usual Bayesian test (based on the Bayes factor) in terms of their respective powers.
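
For readers unfamiliar with the term, a Bayes factor for a point null hypothesis is the ratio of the marginal likelihoods of the data under the null and under the alternative. The sketch below uses a toy binomial setting, not the regression setting of the paper: the null fixes the success probability at `theta0`, and the alternative places a uniform prior on it, an assumption chosen because its marginal likelihood has the closed form 1 / (n + 1). The function name `bayes_factor_01` is hypothetical.

```python
from math import comb


def bayes_factor_01(k: int, n: int, theta0: float = 0.5) -> float:
    """Bayes factor of H0: theta = theta0 against H1: theta ~ Uniform(0, 1).

    k successes are observed in n binomial trials.
    Values above 1 favor the point null; values below 1 favor the alternative.
    """
    # Marginal likelihood under H0 is the binomial probability at theta0.
    m0 = comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
    # Marginal likelihood under H1: the binomial probability integrated over
    # a uniform prior, which equals 1 / (n + 1) for every k.
    m1 = 1.0 / (n + 1)
    return m0 / m1


print(bayes_factor_01(5, 10))   # balanced data, mild support for theta = 0.5
print(bayes_factor_01(10, 10))  # extreme data, support for the alternative
```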

Bayesian Testing for the Equality of Two Lognormal Populations (로그정규분포의 상등에 관한 베이지안 검정)

  • Moon, Kyoung-Ae;Shin, Im-Hee;Kim, Dal-Ho
    • Journal of the Korean Data and Information Science Society / v.11 no.2 / pp.269-277 / 2000
  • We propose a Bayesian test for the equality of two log-normal population means. Specifically, we use the intrinsic Bayes factors suggested by Berger and Pericchi (1996, 1998), based on noninformative priors for the parameters. To investigate the usefulness of the proposed Bayesian testing procedures, we compare them with classical tests via both real data analysis and simulation.

A Computer Code Development for Updating Reliability Data Using Bayes' Theorem and Its Application (Bayes정리를 이용한 신뢰도 자료 평가용 전산코드 개발 및 응용)

  • Won-Guk Hwang;Kun Joong Yoo
    • Nuclear Engineering and Technology / v.15 no.1 / pp.41-49 / 1983
  • A computer code, BERD (Bayesian Estimation of Reliability Data), has been developed and tested in order to update the data for the reliability analysis of safety-related systems in a specific nuclear power plant. The code has been used to derive the plant-specific data for reliability analysis of the auxiliary feedwater system of a pressurized water reactor. The prior information for the selected components was taken from the U.S. Reactor Safety Study, WASH-1400, and the operating experience from published licensee event reports. The results show that the updated data are well fitted by log-normal distribution curves and that the error factors are reduced significantly.
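
The abstract does not describe BERD's internals, so the following is only a rough sketch of the kind of update it refers to: a log-normal prior on a failure rate combined with a Poisson likelihood for observed failures, evaluated on a grid. The Poisson likelihood, the grid approximation, the function name `lognormal_poisson_update`, and all numbers are assumptions for illustration. The error factor is taken as the ratio of the 95th percentile to the median for the prior and summarized as sqrt(p95 / p05) for the posterior.

```python
import math

Z95 = 1.645  # 95th percentile of the standard normal distribution


def lognormal_poisson_update(median, error_factor, failures, exposure, n_grid=4001):
    """Grid-approximate the posterior of a failure rate.

    Prior: log-normal with the given median and error factor (p95 / median).
    Likelihood: Poisson, with `failures` events seen over `exposure` hours.
    Returns (posterior median, posterior error factor sqrt(p95 / p05)).
    """
    mu = math.log(median)
    sigma = math.log(error_factor) / Z95
    # Evenly spaced grid in u = ln(rate), spanning 6 prior standard deviations.
    lo, hi = mu - 6 * sigma, mu + 6 * sigma
    step = (hi - lo) / (n_grid - 1)
    us = [lo + i * step for i in range(n_grid)]
    # Unnormalized log posterior in u: normal prior plus Poisson log-likelihood.
    logp = [
        -((u - mu) ** 2) / (2 * sigma**2) + failures * u - math.exp(u) * exposure
        for u in us
    ]
    peak = max(logp)
    weights = [math.exp(lp - peak) for lp in logp]
    total = sum(weights)

    def quantile(q):
        acc = 0.0
        for u, w in zip(us, weights):
            acc += w / total
            if acc >= q:
                return math.exp(u)
        return math.exp(us[-1])

    p05, p50, p95 = quantile(0.05), quantile(0.50), quantile(0.95)
    return p50, math.sqrt(p95 / p05)


# Hypothetical component: prior median 1e-4 per hour with error factor 10,
# then 2 failures observed in 50,000 operating hours.
post_median, post_ef = lognormal_poisson_update(1e-4, 10.0, failures=2, exposure=5e4)
print(post_median, post_ef)
```

With plant-specific evidence added, the posterior is narrower than the prior, which is the sense in which an error factor is reduced by updating.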

The Study of Chronic Kidney Disease Classification using KHANES data (국민건강영양조사 자료를 이용한 만성신장질환 분류기법 연구)

  • Lee, Hong-Ki;Myoung, Sungmin
    • Proceedings of the Korean Society of Computer Information Conference / 2020.01a / pp.271-272 / 2020
  • Data mining is known to be useful in the medical field when no evidence favoring a particular treatment option is available. The healthcare field collects a huge volume of structured and unstructured data in order to find unknown information or knowledge for effective diagnosis and clinical decision making. The 5,179 records considered for analysis were collected from the Korean National Health and Nutrition Examination Survey (KHANES) over two years. The data were split into training and test sets to fit the model and make predictions. We predicted chronic kidney disease (CKD) using data mining methods such as naive Bayes, logistic regression, CART, and an artificial neural network (ANN). The results identify significant features and data mining techniques for the lifestyle factors related to CKD.

Performance analysis and comparison of various machine learning algorithms for early stroke prediction

  • Vinay Padimi;Venkata Sravan Telu;Devarani Devi Ningombam
    • ETRI Journal / v.45 no.6 / pp.1007-1021 / 2023
  • Stroke is the leading cause of permanent disability in adults, and it can cause permanent brain damage. According to the World Health Organization, 795 000 Americans experience a new or recurrent stroke each year. Early detection of medical disorders such as stroke can minimize their disabling effects. Thus, in this paper, we consider various risk factors that contribute to the occurrence of stroke and apply machine learning algorithms, for example, the decision tree, random forest, and naive Bayes algorithms, to patient characteristics survey data to achieve high prediction accuracy. We also consider the semisupervised self-training technique to predict the risk of stroke. We then consider the near-miss undersampling technique, which keeps only those instances of the larger classes that lie close to instances of the smaller class. Experimental results demonstrate that the proposed method obtains an accuracy of approximately 98.83% at low cost, which is significantly higher and more reliable than that of the techniques it was compared with.
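
The abstract does not say which near-miss variant was used. As a minimal sketch, the code below implements the common NearMiss-1 rule: keep the majority-class points whose mean distance to their k nearest minority-class points is smallest. The function name `near_miss_1`, the Euclidean distance, and the toy points are assumptions, and the minority list is assumed to be non-empty.

```python
import math


def near_miss_1(majority, minority, n_keep, k=3):
    """Return the n_keep majority points closest, on average, to their
    k nearest minority points (the NearMiss-1 selection rule)."""

    def score(point):
        # Distances from this majority point to every minority point.
        distances = sorted(math.dist(point, m) for m in minority)
        nearest = distances[:k]
        return sum(nearest) / len(nearest)

    # Smaller score means the point sits nearer the minority class.
    return sorted(majority, key=score)[:n_keep]


# Toy 2-D data: five majority points, two minority points.
majority = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (10.0, 10.0)]
minority = [(0.5, 0.5), (1.5, 1.0)]
kept = near_miss_1(majority, minority, n_keep=2, k=2)
print(kept)
```

Discarding the majority points that lie far from the minority class balances the class sizes while retaining the examples nearest the class boundary.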

A Bayesian cure rate model with dispersion induced by discrete frailty

  • Cancho, Vicente G.;Zavaleta, Katherine E.C.;Macera, Marcia A.C.;Suzuki, Adriano K.;Louzada, Francisco
    • Communications for Statistical Applications and Methods / v.25 no.5 / pp.471-488 / 2018
  • In this paper, we propose extending proportional hazards frailty models to allow a discrete distribution for the frailty variable. Having zero frailty can be interpreted as being immune or cured. Thus, we develop a new survival model induced by discrete frailty with a zero-inflated power series distribution, which can account for overdispersion. This proposal also allows for a realistic description of non-risk individuals, since individuals cured due to intrinsic factors (immunes) are modeled by a deterministic fraction of zero-risk while those cured due to an intervention are modeled by a random fraction. We put the proposed model in a Bayesian framework and use a Markov chain Monte Carlo algorithm for the computation of the posterior distribution. A simulation study is conducted to assess the proposed model and the computation algorithm. We also discuss model selection based on pseudo-Bayes factors and develop case influence diagnostics for the joint posterior distribution through $\psi$-divergence measures. The motivating cutaneous melanoma data are analyzed for illustration purposes.

Bayesian Multiple Comparisons for K-Exponential Populations with Type-II Censored Data by Fractional Bayes Factors

  • Mun, Gyeong-Ae;Kim, Dal-Ho
    • Journal of the Korean Data and Information Science Society / v.13 no.1 / pp.67-77 / 2002
  • We propose a Bayesian test for the equality of the means of K exponential populations with Type-II censored data. Specifically, we use the fractional Bayes factors suggested by O'Hagan (1995), based on noninformative priors for the parameters. We investigate the usefulness of the proposed Bayesian testing procedures via both real data analysis and simulations, and compare the classical likelihood ratio (LR) test with the proposed Bayesian test.
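
As background, the fractional Bayes factor of O'Hagan (1995) for comparing model 1 with model 2 can be written as follows. The notation ($\pi_i$ for the prior, $f_i$ for the likelihood, $b$ for the fraction) is chosen here for illustration and is not taken from the paper.

$$
B^{F}_{12}(b) = \frac{q_1(b, x)}{q_2(b, x)}, \qquad
q_i(b, x) = \frac{\int \pi_i(\theta_i)\, f_i(x \mid \theta_i)\, d\theta_i}{\int \pi_i(\theta_i)\, f_i(x \mid \theta_i)^{b}\, d\theta_i}, \qquad 0 < b < 1 .
$$

Because the prior $\pi_i$ appears in both the numerator and the denominator of $q_i$, any arbitrary constant in an improper noninformative prior cancels, which is why the approach suits noninformative priors.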

Clustering and classification to characterize daily electricity demand (시간단위 전력사용량 시계열 패턴의 군집 및 분류분석)

  • Park, Dain;Yoon, Sanghoo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.395-406 / 2017
  • The purpose of this study is to identify patterns of daily electricity demand through clustering and classification. The hourly data were collected by KPX (Korea Power Exchange) between 2008 and 2012. Because electricity demand data are time series data, the time trend was removed before identifying the patterns of daily electricity demand. We considered k-means clustering, Gaussian mixture model clustering, and functional clustering in order to find the optimal clustering method. The classification analysis was conducted to understand the relationship with external factors such as day of the week, holiday, and weather. The data were divided into training data and test data. The training data consisted of the external factors and cluster labels between 2008 and 2011. The test data were the daily external factors in 2012. Decision tree, random forest, support vector machine, and naive Bayes classifiers were used. As a result, Gaussian model-based clustering and random forest showed the best prediction performance when the number of clusters was 8.