• Title/Summary/Keyword: bayesian


Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient (환자 IQR 이상치와 상관계수 기반의 머신러닝 모델을 이용한 당뇨병 예측 메커니즘)

  • Jung, Juho;Lee, Naeun;Kim, Sumin;Seo, Gaeun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1296-1301 / 2021
  • With the recent worldwide increase in diabetes incidence, research has been conducted on predicting diabetes with various machine learning and deep learning techniques. In this work, we present a machine learning model for predicting diabetes using data from a hospital in Frankfurt, Germany. We handle outliers with the interquartile range (IQR) technique, apply Pearson correlation analysis, and compare the diabetes prediction performance of Decision Tree, Random Forest, kNN (k-nearest neighbor), SVM (support vector machine), Bayesian Network, and the ensemble techniques XGBoost, Voting, and Stacking. XGBoost showed the best performance, with 97% accuracy across the various scenarios. This study is therefore meaningful in that the model can be used to accurately predict, and help prevent, diabetes, which is prevalent in modern society.
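The two preprocessing steps named in this abstract, IQR outlier handling and Pearson correlation, can be sketched with the standard library. This is a minimal illustration, not the authors' code. It assumes the conventional Tukey fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR) and linearly interpolated quartiles. The helper names and the row layout (a list of dicts keyed by column name) are hypothetical.

```python
from math import sqrt
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the Tukey fences (Q1 - k*IQR, Q3 + k*IQR) for `values`."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies inside the IQR fences."""
    lo, hi = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if lo <= r[column] <= hi]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, with hypothetical rows such as `{"Glucose": 148, "Outcome": 1}`, you would call `drop_iqr_outliers(rows, "Glucose")` and then use `pearson` on each feature against `Outcome` to rank features. How the paper actually used the coefficients (for example, a selection cutoff) is not stated in the abstract.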

Inclusion of bioclimatic variables in genetic evaluations of dairy cattle

  • Negri, Renata;Aguilar, Ignacio;Feltes, Giovani Luis;Machado, Juliana Dementshuk;Neto, Jose Braccini;Costa-Maia, Fabiana Martins;Cobuci, Jaime Araujo
    • Animal Bioscience / v.34 no.2 / pp.163-171 / 2021
  • Objective: Considering the importance of dairy farming and the negative effects of heat stress, more tolerant genotypes need to be identified. The objective of this study was to investigate the effect of heat stress, measured by the temperature-humidity index (THI) and diurnal temperature variation (DTV), on genetic evaluations for daily milk yield of Holstein dairy cattle using random regression models. Methods: The data comprised 94,549 test-day records of 11,294 first-parity Holstein cows from Brazil, collected from 1997 to 2013, and bioclimatic data (THI and DTV) from 18 weather stations. Least-squares linear regression models were used to determine the THI and DTV thresholds for milk yield losses caused by heat stress. In addition to the standard model (SM, without bioclimatic variables), THI and DTV were combined in various ways and tested for different days, totaling 41 models. Results: The THI and DTV thresholds for milk yield losses were THI = 74 (-0.106 kg/d/THI) and DTV = 13 (-0.045 kg/d/DTV). The model that included THI and DTV as fixed effects, considering the two-day average, presented a better fit (-2logL, Akaike information criterion, and Bayesian information criterion). The estimated breeding values (EBVs) and their reliabilities improved when this model was used. Conclusion: Sires are re-ranked when heat stress indicators are included in the model. Genetic evaluation using the two-day mean of THI and DTV as a fixed effect improved EBVs and EBV reliability.
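The abstract does not spell out how the least-squares regressions located the THI and DTV thresholds. One common approach is a "broken-stick" (hinge) model, y = a + b·max(0, x - T), fitted for each candidate threshold T, keeping the T with the smallest residual sum of squares. The sketch below assumes that hinge formulation. The function names are hypothetical, and this may differ from the study's actual procedure.

```python
def fit_hinge(xs, ys, threshold):
    """Least-squares fit of y = a + b * max(0, x - threshold).

    Returns (a, b, sse), or None when no observation lies above the
    threshold (the hinge term is then constant and b is not identifiable).
    """
    zs = [max(0.0, x - threshold) for x in xs]
    n = len(xs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    if szz == 0:
        return None
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    b = szy / szz              # loss per unit above the threshold
    a = my - b * mz            # plateau level below the threshold
    sse = sum((y - (a + b * z)) ** 2 for z, y in zip(zs, ys))
    return a, b, sse


def find_threshold(xs, ys, candidates):
    """Return (threshold, (a, b, sse)) for the candidate with the lowest SSE."""
    best = None
    for t in candidates:
        fit = fit_hinge(xs, ys, t)
        if fit is not None and (best is None or fit[2] < best[1][2]):
            best = (t, fit)
    return best
```

Under this model, the slope `b` at the selected threshold plays the role of the per-unit loss reported above (for example, kg/d per THI unit above 74).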

Phylogenetic analysis of Neottia japonica (Orchidaceae) based on ITS and matK regions

  • SO, Ji-Hyeon;LEE, Nam-Sook
    • Korean Journal of Plant Taxonomy / v.50 no.4 / pp.385-394 / 2020
  • To elucidate the molecular phylogeny of Neottia japonica, a terrestrial orchid distributed in East Asia, we used the internal transcribed spacer (ITS) of nuclear DNA and the matK region of chloroplast DNA. A total of 69 accessions representing 22 species for ITS and 114 accessions representing 21 species for matK were analyzed with maximum parsimony and Bayesian methods. In addition, we sought to relate the distribution and auricle morphology of N. japonica to the phylogenetic data. The phylogenetic results suggest that N. japonica is monophyletic and sister to N. suzukii in the ITS phylogeny, whereas it is paraphyletic with N. suzukii in the matK phylogeny. N. japonica and N. suzukii have similar lip and column morphologies, both flower in April, and both occur sympatrically in Taiwan. It therefore appears clear that N. japonica and N. suzukii are closely related taxa within Neottia, although the nrDNA and cpDNA phylogenies of N. japonica are incongruent. This incongruence between the two datasets may have various causes, so further studies are needed to confirm the evolutionary history of N. japonica. N. kiusiana, which was not included in previous studies, was resolved as sister to N. nidus-avis. Meanwhile, the ITS and matK phylogenies are unsuitable for identifying genetic associations with auricle characteristics. The phylogenetic topologies of Korean, Taiwanese, and mainland Chinese individuals suggest that the Korean populations of N. japonica originated from China's mainland and island areas. Characterizing regional genetic differences could provide useful preliminary data for future studies.
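Maximum parsimony prefers the tree topology that requires the fewest character-state changes. The sketch below shows only the scoring step for one fixed, bifurcating tree, using Fitch's small-parsimony algorithm. It does not cover the search over topologies or the Bayesian analysis, and it is not the software used in the study. The nested-tuple tree encoding and the function names are assumptions made for illustration.

```python
def fitch_site(tree, states):
    """Fitch small-parsimony pass for one alignment column.

    `tree` is a nested tuple of leaf names, e.g. (("a", "b"), ("c", "d")),
    assumed to be strictly bifurcating. `states` maps each leaf name to its
    character at this site. Returns (root_state_set, minimum_changes).
    """
    if isinstance(tree, str):              # leaf: its observed state, no cost
        return {states[tree]}, 0
    left, right = tree
    ls, lc = fitch_site(left, states)
    rs, rc = fitch_site(right, states)
    common = ls & rs
    if common:                             # children agree: no extra change
        return common, lc + rc
    return ls | rs, lc + rc + 1            # disagreement costs one change


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over every column of equal-length aligned sequences."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total
```

A parsimony search would evaluate `parsimony_score` over many candidate topologies and keep the lowest-scoring ones.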

A study on the Filtering of Spam E-mail using n-Gram indexing and Support Vector Machine (n-Gram 색인화와 Support Vector Machine을 사용한 스팸메일 필터링에 대한 연구)

  • 서정우;손태식;서정택;문종섭
    • Journal of the Korea Institute of Information Security & Cryptology / v.14 no.2 / pp.23-33 / 2004
  • With the rapid growth of the Internet environment, the exchange of messages by e-mail is also increasing quickly. Despite the convenience of e-mail, however, the time and cost that individuals and enterprises waste on spam mail have become a big issue. Many kinds of solutions have been studied to counter the harmful effects of spam mail; typical methods include keyword-based pattern matching and probabilistic methods such as Naive Bayes. In this paper, to compensate for the problems of existing research, we propose a method that separates spam mail from normal mail using a Support Vector Machine, which performs excellently on pattern classification problems. In particular, the proposed method performs the learning procedure efficiently with a word dictionary that includes an index generated by n-Grams. Finally, we verify the proposed method by comparing the accuracy of spam mail separation between existing research and the proposed scheme.
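The n-Gram indexing step can be pictured as follows. Each message is broken into character n-grams, every distinct n-gram is given a column number in a dictionary, and each message becomes a sparse count vector that an SVM can consume. This is a minimal sketch assuming per-word character bigrams. The paper's exact n, tokenization, and weighting are not given in the abstract, and the function names are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Character n-grams taken within each whitespace-separated word.

    Words shorter than n are kept whole so they are not silently dropped.
    """
    grams = []
    for token in text.lower().split():
        if len(token) < n:
            grams.append(token)
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def build_index(messages, n=2):
    """Assign a column index to every n-gram seen in the training messages."""
    index = {}
    for msg in messages:
        for g in char_ngrams(msg, n):
            index.setdefault(g, len(index))
    return index


def vectorize(message, index, n=2):
    """Sparse term-frequency vector {column: count}; unseen n-grams are ignored."""
    counts = Counter(char_ngrams(message, n))
    return {index[g]: c for g, c in counts.items() if g in index}
```

In practice, the resulting sparse vectors would be passed to a linear SVM such as scikit-learn's `LinearSVC`. scikit-learn's `CountVectorizer(analyzer="char_wb", ngram_range=(2, 2))` offers a ready-made alternative to these hand-rolled helpers.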

Malicious Code Detection using the Effective Preprocessing Method Based on Native API (Native API 의 효과적인 전처리 방법을 이용한 악성 코드 탐지 방법에 관한 연구)

  • Bae, Seong-Jae;Cho, Jae-Ik;Shon, Tae-Shik;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.4 / pp.785-796 / 2012
  • In this paper, we propose an effective behavior-based technique that uses the frequency of system calls to detect malicious code when the number of training samples is smaller than the number of system-call features. We collect Native APIs, which are Windows kernel data generated by running program code, and adopt the normalized frequency of the Native APIs as the basic features. These basic features are then transformed into new features by GLDA (Generalized Linear Discriminant Analysis), an effective method for discriminating between malicious and normal code even when the number of training samples is smaller than the number of features. To detect malicious code, we use kNN (k-Nearest Neighbor) classification, one of the Bayesian classification techniques. We compare the proposed detection method with other methods on the collected Native APIs to verify its efficiency. The results show that the proposed method has a lower false positive rate than the other methods at the threshold at which the detection rate is 100%.
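Two of the steps described above can be sketched with the standard library: turning a trace of Native API calls into normalized frequency features, and classifying with kNN by majority vote among the nearest training vectors. The GLDA projection that the paper applies between these steps is deliberately omitted. The Euclidean distance, the vocabulary, and the function names are assumptions for illustration.

```python
from collections import Counter
from math import dist


def normalized_frequency(api_calls, vocabulary):
    """Relative frequency of each API name in `vocabulary` within one trace.

    The denominator is the total number of calls in the trace, including
    calls that are not in the vocabulary.
    """
    counts = Counter(api_calls)
    total = len(api_calls) or 1
    return [counts[name] / total for name in vocabulary]


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

For example, a trace `["NtCreateFile", "NtCreateFile", "NtWriteFile", "NtClose"]` with vocabulary `["NtCreateFile", "NtWriteFile", "NtOpenProcess"]` yields `[0.5, 0.25, 0.0]`.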

An Effective Feature Generation Method for Distributed Denial of Service Attack Detection using Entropy (엔트로피를 이용한 분산 서비스 거부 공격 탐지에 효과적인 특징 생성 방법 연구)

  • Kim, Tae-Hun;Seo, Ki-Taek;Lee, Young-Hoon;Lim, Jong-In;Moon, Jong-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.20 no.4 / pp.63-73 / 2010
  • Malicious bot programs, the source of distributed denial of service (DDoS) attacks, are widespread, and the number of PCs infected by them is increasing geometrically. DDoS attacks are launched continually through these bot-infected PCs, and some cases of financial damage have been found recently. Research on responding to DDoS attacks is therefore necessary, so we propose an effective feature generation method for DDoS attack detection using entropy. We apply our method both to the DARPA 2000 datasets and to DDoS attack datasets that we composed and generated ourselves at a university. We then evaluate the usefulness of the proposed method through classification with a Bayesian network classifier.
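Entropy-based features summarize how concentrated or dispersed a traffic attribute is within a time window. A common intuition is that attacks from many (possibly spoofed) sources raise the entropy of source addresses while lowering that of destinations. The sketch below computes the Shannon entropy H = -Σ p·log2(p) of one header field over fixed-size packet windows. Which fields and window sizes the paper actually used is not stated here. The packet dict layout and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def windowed_entropy(packets, field, window):
    """Entropy of one field over consecutive, non-overlapping windows.

    `packets` is a time-ordered list of dicts; `window` is packets per window.
    """
    return [shannon_entropy([p[field] for p in packets[i:i + window]])
            for i in range(0, len(packets), window)]
```

The resulting per-window values (one series per field) would form the feature vectors passed to a classifier such as the Bayesian network classifier mentioned in the abstract.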

Reliability of microarray analysis for studying periodontitis: low consistency in 2 periodontitis cohort data sets from different platforms and an integrative meta-analysis

  • Jeon, Yoon-Seon;Shivakumar, Manu;Kim, Dokyoon;Kim, Chang-Sung;Lee, Jung-Seok
    • Journal of Periodontal and Implant Science / v.51 no.1 / pp.18-29 / 2021
  • Purpose: The aim of this study was to compare the characteristic expression patterns of advanced periodontitis in 2 cohort data sets analyzed using different microarray platforms, and to identify differentially expressed genes (DEGs) through a meta-analysis of both data sets. Methods: Twenty-two patients for cohort 1 and 40 patients for cohort 2 were recruited with the same inclusion criteria. The 2 cohorts were analyzed using different platforms: Illumina and Agilent. A meta-analysis was performed to increase reliability by removing statistical differences between platforms. An integrative meta-analysis based on an empirical Bayesian methodology (ComBat) was conducted. DEGs for the integrated data sets were identified using the limma package, adjusting for age, sex, and platform, and were compared with the results for cohorts 1 and 2. Clustering and pathway analyses were also performed. Results: This study detected 557 and 246 DEGs in cohorts 1 and 2, respectively, with 146 and 42 significantly enriched gene ontology (GO) terms. Cohorts 1 and 2 shared 59 DEGs and 18 GO terms. However, only 6 genes overlapped among the top 30 enriched DEGs, and the top 30 enriched pathways had no GO terms in common. The integrative meta-analysis detected 34 DEGs, of which 10 overlapped in all the integrated data sets of cohorts 1 and 2. Conclusions: The characteristic expression pattern differed between periodontitis and the healthy periodontium, but the consistency between the data sets from different cohorts and the metadata was too low to suggest specific biomarkers for identifying periodontitis.

Major Watershed Characteristics Influencing Spatial Variability of Stream TP Concentration in the Nakdong River Basin (낙동강 유역에서 하천 TP 농도의 공간적 변동성에 영향을 미치는 주요 유역특성)

  • Seo, Jiyu;Won, Jeongeun;Choi, Jeonghyeon;Kim, Sangdan
    • Journal of Korean Society on Water Environment / v.37 no.3 / pp.204-216 / 2021
  • Understanding the factors that influence the temporal and spatial variability of water quality is important for establishing effective, customized management strategies for contaminated aquatic ecosystems. In this study, the spatial variability of the 5-year (2015-2019) average total phosphorus (TP) concentration observed in 40 Total Maximum Daily Loads unit basins in the Nakdong River watershed was analyzed using 50 predictor variables describing watershed, climate, land use, and soil characteristics. Cross-correlation analysis, a two-stage exhaustive search approach, and Bayesian inference were applied to identify the predictors that best explained the time-averaged TP. The finally identified predictors were watershed altitude, fall precipitation, winter precipitation, residential area, public facilities area, paddy field, soil available phosphate, soil magnesium, soil available silicic acid, and soil potassium. Among them, the most influential factors for the spatial differences in TP were watershed altitude among the watershed characteristics, public facilities area among the land use characteristics, and soil available silicic acid among the soil characteristics. This indicates that anthropogenic factors strongly influence the spatial variability of TP. The proposed statistical modeling approach is expected to be applicable to identifying the major factors affecting the spatial variability of the time-averaged state of various water quality parameters.

Data-Driven Modeling of Freshwater Aquatic Systems: Status and Prospects (자료기반 물환경 모델의 현황 및 발전 방향)

  • Cha, YoonKyung;Shin, Jihoon;Kim, YoungWoo
    • Journal of Korean Society on Water Environment / v.36 no.6 / pp.611-620 / 2020
  • Although process-based models have been the preferred approach for modeling freshwater aquatic systems over extended time intervals, data-driven models have become increasingly popular in recent decades as their utility has grown in the big data environment. In this study, international peer-reviewed journals in the relevant fields were searched in the Web of Science Core Collection, and an extensive literature review covering a total of 2,984 articles published during the last two decades (2000-2020) was performed. The review indicated that, since 2010, the number of published studies using data-driven models has increased faster than the number using process-based models. The increased use of data-driven models was partly attributable to the growing availability of data from new sources, e.g., remotely sensed hyperspectral or multispectral data. Throughout the past two decades, South Korea has consistently been one of the top ten countries by number of published studies using data-driven models. The major data-driven approaches, i.e., artificial neural networks, decision trees, and Bayesian models, were illustrated with case studies. Based on the review, this study aims to summarize the current state of knowledge on biogeochemical water quality and ecological models using data-driven approaches, and to outline the remaining challenges and future prospects.

OGLE-2017-BLG-1049: ANOTHER GIANT PLANET MICROLENSING EVENT

  • Kim, Yun Hak;Chung, Sun-Ju;Udalski, A.;Bond, Ian A.;Jung, Youn Kil;Gould, Andrew;Albrow, Michael D.;Han, Cheongho;Hwang, Kyu-Ha;Ryu, Yoon-Hyun;Shin, In-Gu;Shvartzvald, Yossi;Yee, Jennifer C.;Zang, Weicheng;Cha, Sang-Mok;Kim, Dong-Jin;Kim, Hyoun-Woo;Kim, Seung-Lee;Lee, Chung-Uk;Lee, Dong-Joo
    • Journal of The Korean Astronomical Society / v.53 no.6 / pp.161-168 / 2020
  • We report the discovery of a giant exoplanet in the microlensing event OGLE-2017-BLG-1049, with a planet-host star mass ratio of $q = (9.53 \pm 0.39) \times 10^{-3}$ and a caustic-crossing feature in Korea Microlensing Telescope Network (KMTNet) observations. The caustic-crossing feature yields an angular Einstein radius of $\theta_E = 0.52 \pm 0.11$ mas. However, the microlens parallax is not measured because the event time scale, $t_E \simeq 29$ days, is too short. We therefore perform a Bayesian analysis to estimate the physical quantities of the lens system. We find that the lens system consists of a host star of mass $M_h = 0.55^{+0.36}_{-0.29}\,M_\odot$ and a giant planet of mass $M_p = 5.53^{+3.62}_{-2.87}\,M_{\rm Jup}$, at a distance of $D_L = 5.67^{+1.11}_{-1.52}$ kpc. The projected star-planet separation is $a_\perp = 3.92^{+1.10}_{-1.32}$ au, which means that the planet is located beyond the snow line of the host. The relative lens-source proper motion is $\mu_{\rm rel} \sim 7$ mas yr$^{-1}$, so the lens and source will be separated from each other within 10 years. After that, it will be possible to measure the flux of the host star with 30-meter-class telescopes and thereby determine its mass.
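Several of the quantities in this abstract are linked by simple relations that can be checked by hand. The planet mass is $M_p = q\,M_h$. The physical Einstein radius is $r_E = \theta_E D_L$, using the fact that 1 mas at 1 kpc subtends 1 au. The lens-source angular separation after a time $\Delta t$ is $\mu_{\rm rel}\,\Delta t$. The sketch below plugs in the central values quoted above. It assumes a Sun-to-Jupiter mass ratio of about 1047.57 (IAU nominal values), which is not from the source. Because the paper's $M_p$ comes from a Bayesian posterior, the product of medians need not match the reported 5.53 exactly. The projected separation $a_\perp$ additionally depends on the normalized separation $s$ (with $a_\perp = s\,r_E$), which is not reported in the abstract.

```python
# Central values quoted in the abstract.
Q = 9.53e-3          # planet-host mass ratio
M_HOST = 0.55        # host mass, solar masses (posterior median)
THETA_E_MAS = 0.52   # angular Einstein radius, mas
D_L_KPC = 5.67       # lens distance, kpc (posterior median)
MU_REL = 7.0         # relative lens-source proper motion, mas/yr

# Assumed constant (not from the source): Sun/Jupiter mass ratio, IAU nominal.
MSUN_PER_MJUP = 1047.57


def planet_mass_mjup(q, m_host_msun):
    """Planet mass in Jupiter masses from M_p = q * M_h."""
    return q * m_host_msun * MSUN_PER_MJUP


def einstein_radius_au(theta_e_mas, d_l_kpc):
    """Physical Einstein radius r_E = theta_E * D_L (1 mas at 1 kpc = 1 au)."""
    return theta_e_mas * d_l_kpc


def separation_mas(mu_rel_mas_per_yr, years):
    """Angular lens-source separation after `years` of relative motion."""
    return mu_rel_mas_per_yr * years


print(round(planet_mass_mjup(Q, M_HOST), 2))               # 5.49
print(round(einstein_radius_au(THETA_E_MAS, D_L_KPC), 2))  # 2.95
print(separation_mas(MU_REL, 10))                          # 70.0
```

The roughly 70 mas separation after a decade is what makes it plausible to resolve the host star with 30-meter-class telescopes, as the abstract anticipates.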