• Title/Summary/Keyword: Bayes test

Metabolic Syndrome Prediction Using Machine Learning Models with Genetic and Clinical Information from a Nonobese Healthy Population

  • Choe, Eun Kyung;Rhee, Hwanseok;Lee, Seungjae;Shin, Eunsoon;Oh, Seung-Won;Lee, Jong-Eun;Choi, Seung Ho
    • Genomics & Informatics / v.16 no.4 / pp.31.1-31.7 / 2018
  • The prevalence of metabolic syndrome (MS) in the nonobese population is not low. However, the identification and risk mitigation of MS are not easy in this population. We aimed to develop an MS prediction model using genetic and clinical factors of nonobese Koreans through machine learning methods. A prediction model for MS was designed for a nonobese population using clinical and genetic polymorphism information with five machine learning algorithms, including naïve Bayes classification (NB). The analysis was performed in two stages (training and test sets). Model A was designed with only clinical information (age, sex, body mass index, smoking status, alcohol consumption status, and exercise status), and for model B, genetic information (for 10 polymorphisms) was added to model A. Of the 7,502 nonobese participants, 647 (8.6%) had MS. In the test set analysis, for the maximum sensitivity criterion, NB showed the highest sensitivity: 0.38 for model A and 0.42 for model B. The specificity of NB was 0.79 for model A and 0.80 for model B. In a comparison of the performances of models A and B by NB, model B (area under the receiver operating characteristic curve [AUC] = 0.69, clinical and genetic information input) showed better performance than model A (AUC = 0.65, clinical information only input). We designed a prediction model for MS in a nonobese population using clinical and genetic information. With this model, we might convince nonobese MS individuals to undergo health checks and adopt behaviors associated with a preventive lifestyle.
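The sensitivity and specificity figures quoted in this abstract follow from a classifier's confusion counts. The sketch below shows that arithmetic only; the label vectors are made-up illustrative data, not the study's, and the function name is hypothetical.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative labels only (hypothetical, not taken from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of true cases that are detected, and specificity is the fraction of non-cases correctly left unflagged, which is why the two can trade off against each other when a decision threshold is moved.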

Comparative Study of PSO-ANN in Estimating Traffic Accident Severity

  • Md. Ashikuzzaman;Wasim Akram;Md. Mydul Islam Anik;Taskeed Jabid;Mahamudul Hasan;Md. Sawkat Ali
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.95-100 / 2023
  • Traffic accidents cause health and economic losses around the world. As the population grows, the number of vehicles on the road increases, which leads to congestion in cities. Congestion can increase accident risk as transportation systems expand. Modern cities are adopting various technologies to minimize traffic accidents through mathematical prediction. Traffic accidents cause economic losses and potential deaths. Therefore, to ensure people's safety, the concept of the smart city makes sense. In a smart city, traffic accident factors such as road condition, light condition, and weather condition are important to consider when predicting traffic accident severity. Several machine learning models can be employed to determine and predict traffic accident severity. This paper evaluates the performance of a hybridized neural network and compares it with other machine learning models in terms of accuracy in predicting traffic accident severity. A dataset from the city of Leeds, UK, is used to train and test the models, and the results are then compared with each other. Particle swarm optimization with an artificial neural network (PSO-ANN) gave promising results compared to other machine learning models such as Random Forest, Naïve Bayes, Nearest Centroid, and K-Nearest Neighbor classification. The PSO-ANN model can be adopted in transportation systems to counter traffic accident issues. The nearest centroid model gave the lowest accuracy score, whereas PSO-ANN gave the highest. All the test results and findings obtained in our study can provide valuable information for reducing traffic accidents.

FAFS: A Fuzzy Association Feature Selection Method for Network Malicious Traffic Detection

  • Feng, Yongxin;Kang, Yingyun;Zhang, Hao;Zhang, Wenbo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.240-259 / 2020
  • Analyzing network traffic is the basis for dealing with network security issues. Most network security systems depend on feature selection over network traffic data, and the ability to detect malicious traffic can be improved by a suitable feature selection method. This paper proposes a Fuzzy Association Feature Selection (FAFS) method for detecting malicious network traffic. Association rules, which reflect the relationships among different characteristic attributes of network traffic data, are mined by association analysis. The membership values of the association rules are obtained by fuzzy reasoning. The data features with the highest correlation intensity in a network data set are determined by comparing the membership values of the association rules. The FAFS method reduces the dimension of the data features and improves the detection ability of malicious traffic detection algorithms. To verify the effect of feature selection by FAFS, the method is used to select data features from different datasets. The K-Nearest Neighbor, C4.5 Decision Tree, and Naïve Bayes algorithms are then tested on these datasets. Moreover, FAFS is compared with classical feature selection methods. The experimental results show that FAFS significantly improves the precision and recall of malicious traffic detection, which provides a valuable reference for building network security systems.

Conditional Probability Based Early Termination of Recursive Coding Unit Structures in HEVC (HEVC의 재귀적 CU 구조에 대한 조건부 확률 기반 고속 탐색 알고리즘)

  • Han, Woo-Jin
    • Journal of Broadcast Engineering / v.17 no.2 / pp.354-362 / 2012
  • High Efficiency Video Coding (HEVC) is currently being developed jointly by MPEG and ITU-T as the next international video coding standard. Compared to previous standards, HEVC supports a variety of splitting units, such as the coding unit (CU), prediction unit (PU), and transform unit (TU). Among them, the recursive quadtree structure of the CU is known to improve coding efficiency while significantly increasing encoding complexity. In this paper, a simple conditional probability for predicting the early termination condition of the recursive unit structure is introduced. The proposed conditional probability is estimated using Bayes' formula from local statistics of rate-distortion costs in the encoder. Experimental results show that the proposed method can reduce the total encoding time by about 32% according to the test configuration, while the coding efficiency loss is 0.4%-0.5%. In addition, the encoding time can be reduced by 50% with a 0.9% coding efficiency loss when the proposed method is used jointly with the HM4.0 early CU termination algorithm.
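As a rough illustration of the idea in this abstract, the sketch below uses Bayes' formula to estimate the probability that a CU is not split further, given that its rate-distortion cost falls below some threshold. The counts, the "low cost" event, the 0.85 decision threshold, and the function name are all hypothetical; the abstract does not give the paper's actual statistics or decision rule.

```python
def posterior_no_split(n_no_split, n_split, n_low_no_split, n_low_split):
    """Estimate P(no split | low RD cost) via Bayes' formula from running counts.

    n_no_split / n_split: CUs observed so far that were not split / were split.
    n_low_no_split / n_low_split: how many of each group had a low RD cost.
    """
    total = n_no_split + n_split
    if total == 0 or n_no_split == 0:
        return 0.0
    prior = n_no_split / total                      # P(no split)
    likelihood = n_low_no_split / n_no_split        # P(low cost | no split)
    evidence = (n_low_no_split + n_low_split) / total  # P(low cost)
    if evidence == 0:
        return 0.0
    return likelihood * prior / evidence


# Hypothetical local statistics gathered by an encoder.
p = posterior_no_split(n_no_split=60, n_split=40, n_low_no_split=54, n_low_split=6)
terminate_early = p >= 0.85  # assumed threshold, for illustration only
```

With these made-up counts the posterior is about 0.9, so a sketch encoder using this rule would skip the recursive search of smaller CUs for the current block.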

The use of data mining methods for dystocia detection in Polish Holstein-Friesian Black-and-White cattle

  • Zaborski, Daniel;Proskura, Witold S.;Grzesiak, Wilhelm
    • Asian-Australasian Journal of Animal Sciences / v.31 no.11 / pp.1700-1713 / 2018
  • Objective: The aim of this study was to verify the usefulness of artificial neural networks (ANN), multivariate adaptive regression splines (MARS), naïve Bayes classifier (NBC), general discriminant analysis (GDA), and logistic regression (LR) for dystocia detection in Polish Holstein-Friesian Black-and-White heifers and cows and to indicate the most influential predictors of calving difficulty. Methods: A total of 1,342 and 1,699 calving records including six categorical and four continuous predictors were used. Calving category (difficult vs easy or difficult, moderate and easy) was the dependent variable. Results: The maximum sensitivity, specificity and accuracy achieved for heifers on the independent test set were 0.855 (for ANN), 0.969 (for NBC), and 0.813 (for GDA), respectively, whereas the values for cows were 0.600 (for ANN), 1.000 and 0.965 (for NBC, GDA, and LR), respectively. With the three categories of calving difficulty, the maximum overall accuracy for heifers and cows was 0.589 (for MARS) and 0.649 (for ANN), respectively. The most influential predictors for heifers were an average calving difficulty score for the dam's sire, calving age and the mean yield of the farm, where the heifer was kept, whereas for cows, these additionally included: calf sex, the difficulty of the preceding calving, and the mean daily milk yield for the preceding lactation. Conclusion: The potential application of the investigated models in dairy cattle farming requires, however, their further improvement in order to reduce the rate of dystocia misdiagnosis and to increase detection reliability.

Study of Computer Aided Diagnosis for the Improvement of Survival Rate of Lung Cancer based on Adaboost Learning (폐암 생존율 향상을 위한 아다부스트 학습 기반의 컴퓨터보조 진단방법에 관한 연구)

  • Won, Chulho
    • Journal of rehabilitation welfare engineering & assistive technology / v.10 no.1 / pp.87-92 / 2016
  • In this paper, we improved the classification of benign and malignant lung nodules by including parenchyma features. For small pulmonary nodules (4-10 mm), there are a limited number of CT data voxels within the solid tumor, making them difficult to process with traditional computer aided diagnosis (CAD) tools. Extending feature extraction to include the surrounding parenchyma increases the CT voxel set available for analysis in these very small pulmonary nodule cases and is likely to improve diagnostic performance while keeping the CAD tool flexible with respect to scanner model and parameters. In AdaBoost learning using naive Bayes and SVM weak classifiers, a number of significant features were selected from 304 features. The results from the COPDGene test yielded an accuracy, sensitivity, and specificity of 100%. Therefore, the proposed method can be used effectively for computer aided diagnosis.

Analysis of Elderly Drivers' Accident Models Considering Operations and Physical Characteristics (고령운전자 운전 및 신체특성을 반영한 교통사고 분석 연구)

  • Lim, Sam Jin;Park, Jun Tae;Kim, Young Il;Kim, Tae Ho
    • Journal of Korean Society of Transportation / v.30 no.6 / pp.37-46 / 2012
  • The number of traffic accidents caused by elderly drivers over the age of 65 has surged over the past ten years from 37,000 to 274,000 cases. The proportion of elderly drivers' accidents has jumped 3.1 times from 1.2% to 3.7% out of all traffic accidents, and traffic safety organizations are pursuing diverse measures to address the situation. Above all, connecting safety measures with in-depth research on the behavioral and physical characteristics of elderly drivers will prove vital. This study conducted empirical research linking driving characteristics and traffic accidents by elderly drivers, based on the Driving Aptitude Test items and traffic accident data, which enabled the measurement of behavioral characteristics of elderly drivers. In developing the Influence Model, we applied the zero-inflated Poisson (ZIP) regression model and selected an accident prediction model based on the Bayesian Influence with regard to the ZIP regression model and the zero-inflated negative binomial (ZINB) regression model. According to the results of the AAE analysis, the ZIP regression model was more appropriate, and three variables (prediction of velocity, diversion, and cognitive ability) were found to have a relation of influence with traffic accidents caused by elderly drivers.
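For readers unfamiliar with the zero-inflated Poisson model named in this abstract, the sketch below evaluates its standard probability mass function: with probability `pi` a driver is a structural zero (no accidents), and otherwise the count follows a Poisson distribution with mean `lam`. The parameter values are illustrative assumptions, not fitted values from the study, and the regression part (linking `lam` and `pi` to covariates) is omitted.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson with rate lam and zero-inflation pi."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the structural-zero part and the Poisson part.
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Illustrative parameters only (hypothetical, not from the paper).
lam, pi = 2.0, 0.3
p_zero = zip_pmf(0, lam, pi)
total = sum(zip_pmf(k, lam, pi) for k in range(60))
```

The extra mass at zero is what lets the model accommodate data in which most drivers record no accidents at all, which a plain Poisson model tends to underestimate.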

Development of integrative diagnosis methods for the jaundice through statistical analysis (통합의료적 황달진단법개발을 위한 통계적 접근방법)

  • Shin, Im Hee;Kwak, Sang Gyu;Kim, Sang Gyung;Sohn, Ki Cheul;Jung, Hyun-Jung;Cho, Yoon-Jeong;Lee, A-Jin;Kwon, O Sung
    • Journal of the Korean Data and Information Science Society / v.24 no.3 / pp.515-521 / 2013
  • Western medicine and Korean Traditional Medicine (KTM) approach healthcare differently because of differences in how they understand the human being and in their cultural backgrounds. This fundamental difference in how they approach human pathology has divided the two fields and hindered medical communication between them. Given this difficulty, integrative medical service is regarded as a novel way to provide patients with the best medical care, since it aims to adapt and combine the advantages of the two fields. This paper presents an integrative approach to treating jaundice, in which the symptoms of dampness and heat defined by Korean traditional standards are analyzed using statistical methods based on blood test results. In this way, we can explore an approach to diagnosis and treatment with a comprehensive and integrative medicine algorithm.

Prototype based Classification by Generating Multidimensional Spheres per Class Area (클래스 영역의 다차원 구 생성에 의한 프로토타입 기반 분류)

  • Shim, Seyong;Hwang, Doosung
    • Journal of the Korea Society of Computer and Information / v.20 no.2 / pp.21-28 / 2015
  • In this paper, we propose a prototype-based classification learning method that uses the nearest-neighbor rule. The nearest-neighbor rule is applied to segment the class area of all the training data into spheres, each of which contains data from a single class. The prototypes are the centers of the spheres, and each radius is computed as the midpoint of two distances: the distance to the farthest point of the same class and the distance to the nearest point of another class. We then transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that includes all the training data. The proposed prototype selection method is based on a greedy algorithm that is applied to the training data of each class. The proposed method has low complexity and is well suited to parallel implementation. The prototype-based classifier takes the set of prototypes and predicts the class of test data by the nearest-neighbor rule. In experiments, the generalization performance of our prototype classifier is superior to that of the nearest-neighbor classifier, the Bayes classifier, and another prototype classifier.
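The procedure described in this abstract can be sketched roughly as follows. This is a reading of the abstract under stated assumptions, not the authors' implementation: each training point is treated as a candidate sphere center, "farthest same-class point" is taken to mean the farthest same-class point that is still closer than the nearest other-class point, prediction uses the distance to the sphere surface, and the data contain at least two classes with no identical points carrying different labels. All function names and the toy data are hypothetical.

```python
import math


def build_spheres(points, labels):
    """Build one candidate sphere per training point."""
    spheres = []
    for center, cls in zip(points, labels):
        # Distance to the nearest point of another class.
        d_other = min(math.dist(center, p)
                      for p, l in zip(points, labels) if l != cls)
        # Same-class points strictly closer than that nearest other-class point.
        covered = {j for j, (p, l) in enumerate(zip(points, labels))
                   if l == cls and math.dist(center, p) < d_other}
        # Distance to the farthest covered same-class point.
        d_same = max(math.dist(center, points[j]) for j in covered)
        radius = (d_same + d_other) / 2  # midpoint of the two distances
        spheres.append({"center": center, "cls": cls,
                        "radius": radius, "covered": covered})
    return spheres


def select_prototypes(spheres, labels):
    """Greedy set cover, run separately for each class."""
    prototypes = []
    for cls in sorted(set(labels)):
        uncovered = {i for i, l in enumerate(labels) if l == cls}
        candidates = [s for s in spheres if s["cls"] == cls]
        while uncovered:
            # Pick the sphere covering the most still-uncovered points.
            best = max(candidates, key=lambda s: len(s["covered"] & uncovered))
            prototypes.append(best)
            uncovered -= best["covered"]
    return prototypes


def predict(prototypes, x):
    """Assign the class of the prototype whose sphere surface is nearest to x."""
    best = min(prototypes,
               key=lambda s: math.dist(x, s["center"]) - s["radius"])
    return best["cls"]


# Toy two-class data (hypothetical).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 1, 1, 1]
prototypes = select_prototypes(build_spheres(points, labels), labels)
```

On this toy data one sphere per class is enough to cover every training point, so the classifier stores two prototypes instead of six points. Because the greedy loop runs independently per class, the per-class selections could be computed in parallel, which matches the abstract's remark about parallel implementation.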

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.71-89 / 2016
  • Sentiment analysis is used to identify emotions or sentiments embedded in user-generated data such as customer reviews from blogs, social network services, and so on. Various research fields, such as computer science and business management, can take advantage of it to analyze customer-generated opinions. Previous studies treat the star rating of a review as equivalent to the sentiment embedded in its text. However, the star rating does not always correspond to the sentiment polarity, and this assumption limits the accuracy of those studies. To address this issue, the present study uses a supervised sentiment classification model to measure sentiment polarity more accurately. The study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, the Support Vector Machine (SVM) and the Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier and are analyzed through statistical correlations between movie reviews and box-office success. Movie reviews are collected along with their star ratings. The dataset used in this study consists of 1,258,538 reviews of 175 films gathered from the Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms the Naive Bayes (NB) classifier, with an accuracy about 6% higher than that of NB. Furthermore, the results indicate positive correlations between the star rating and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows a mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent sample t-test was conducted. For this, the movies were divided into two groups using the average of the sentiment scores. The two groups differ significantly in terms of their star ratings.