• Title/Abstract/Keywords: Bayes Factors

106 search results (processing time: 0.033 seconds)

A Prediction Model for the Development of Cataract Using Random Forests: Based on Health Screening Data from a Single University Hospital

  • 한은정;송기준;김동건
    • 응용통계연구 (Korean Journal of Applied Statistics) / Vol. 22, No. 4 / pp. 771-780 / 2009
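  • Cataract is emerging as a serious social and economic problem as the elderly population grows, and its incidence could be greatly reduced through early diagnosis. To build a prediction model for early diagnosis, this study developed a cataract risk prediction model using health screening data from 3,237 men and women aged 30 or older who underwent two or more health screenings at Yonsei University Hospital between 1994 and 2001 and whose cataract status could be confirmed by a physician's diagnosis. The model was developed with Random Forests, a data mining technique, and its performance was compared with logistic regression, discriminant analysis, a decision tree, Naive Bayes, and the ensemble methods bagging and arcing. The Random Forests model achieved an accuracy of 67.16% and a sensitivity of 72.28%, and the main influencing factors were age, blood glucose, white blood cell count (WBC), platelet count, triglycerides, and BMI. These results show that the presence or absence of cataract can be predicted with roughly 70% accuracy from health screening data alone, without a physician's ophthalmic examination, and the model is expected to contribute substantially to the early diagnosis of cataract.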
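
The accuracy and sensitivity figures quoted in this abstract are both read off a confusion matrix. The following is a minimal sketch of the two definitions; the function names are hypothetical and the counts are made up for illustration, since the abstract does not report the underlying matrix.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of true cataract cases that the model flags (recall)."""
    return tp / (tp + fn)


# Hypothetical counts for illustration only; not taken from the paper.
tp, fp, tn, fn = 70, 30, 60, 40
print(f"accuracy    = {accuracy(tp, fp, tn, fn):.2%}")
print(f"sensitivity = {sensitivity(tp, fn):.2%}")
```

A model can have higher sensitivity than accuracy, as reported here, when it catches most positive cases at the cost of more false positives.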

The Prognostic Factors of Solitary Pulmonary Nodule

  • 정윤섭;김주현
    • Journal of Chest Surgery / Vol. 22, No. 3 / pp. 425-435 / 1989
  • The solitary pulmonary nodule is defined as a round or ovoid lesion with sharp, circumscribed borders, surrounded on all sides by normal-appearing lung parenchyma, and found on a simple chest X-ray without any particular symptoms or signs. Solitary pulmonary nodules cover a wide spectrum of pathologic conditions: they may prove to be malignant tumors, either primary or metastatic, and benign granulomas and other benign conditions can also be seen as solitary nodules. Resection of solitary malignant nodules results in a surprisingly high 5-year survival rate. In contrast, most benign nodules do not need to be resected, and a period of prolonged observation and nonsurgical management is usually indicated. Therefore, the best approach to the controversial management of solitary pulmonary nodules depends on finding factors that affect the probability of malignancy. In this article, the clinical records and chest roentgenograms of 60 patients operated on over the past 8 years at the Department of Thoracic and Cardiovascular Surgery, Seoul National University Hospital, were reviewed. There were 15 malignant nodules and 45 benign nodules, for a prevalence of malignancy of 25%. The most common pathologic entity was tuberculoma [21 cases]. The mean age was 55.5 ± 9.6 years in the malignant group and 45.8 ± 12.5 years in the benign group, a statistically significant difference [p < 0.05]. The malignant ratio in each age group increased with advancing age. The average smoking amount was 35.6 ± 12.9 cigarettes per day in malignant smokers and 20.9 ± 12.0 cigarettes per day in benign smokers, also a statistically significant difference [p < 0.05]. The malignant ratio also increased with increasing smoking amount. Comparing the appearance of the nodules on chest films, 6 calcifications and 7 cavitations were found only in benign nodules, not in malignant nodules. Therefore, calcification and cavitation can be considered findings that favor benignity. Previous cancer history was also a significant factor in the prognosis of the nodule [p < 0.05]. The average diameter on chest X-ray was 3.07 ± 0.82 cm in malignant nodules and 3.25 ± 1.04 cm in benign nodules, with no statistically significant difference between the two groups [p > 0.05]. The author used Bayes' theorem to develop a simple method for combining individual clinical or radiological factors of patients with solitary nodules into an overall estimate of the probability that the nodule is malignant. In conclusion, patient age, smoking amount, appearance of the nodule on chest film (such as calcification and cavitation), and previous cancer history were found to be strongly associated with malignancy, but nodule size was not. Since these prognostic factors were identified retrospectively, prospective controlled studies are needed to determine whether they truly have prognostic significance.
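
Combining individual factors with Bayes' theorem is commonly done in odds form: convert the prior probability to odds, multiply by one likelihood ratio per finding, and convert back. The sketch below assumes that approach and assumes the findings are conditionally independent. The abstract does not give the author's exact procedure, and the likelihood ratios here are hypothetical placeholders, not values from the paper. Only the 25% prior comes from the abstract.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with likelihood ratios (Bayes' theorem, odds form).

    Assumes the findings are conditionally independent given the diagnosis.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Prior from the abstract: 15 malignant nodules out of 60 (25%).
prior = 15 / 60

# Hypothetical likelihood ratios for illustration only; not from the paper.
lrs = {"older age": 2.0, "heavy smoking": 1.5}

p = posterior_probability(prior, lrs.values())
print(f"posterior probability of malignancy = {p:.3f}")
```

A finding seen only in benign nodules, such as calcification in this series, would correspond to a likelihood ratio well below 1 and would pull the estimate down.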

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy

  • 김지영;한다현;김종권
    • 정보과학회 논문지 (Journal of KIISE) / Vol. 44, No. 5 / pp. 553-558 / 2017
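  • As the supply of big data grows, academia and industry are actively researching ways to extract useful information from it. In particular, the ultimate goal of most of this research is to filter information by reflecting the searcher's intent at query time together with the characteristics of the analyzed information. Accurately analyzed data increases users' loyalty to a company's services and lets users make use of information more efficiently and effectively. To improve accuracy when searching news articles, the most frequently used search domain, this paper evaluates the relevant data with a variety of measures, including TF-IDF, decision trees, cosine similarity, and the naive Bayes classifier, and analyzes the results. Based on the analysis, it also proposes the most suitable measure.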
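
Two of the measures named in this abstract, TF-IDF and cosine similarity, are often used together to rank documents against a query. The sketch below shows one common textbook variant (term frequency normalized by document length, idf = log(N/df)). The paper's exact weighting is not stated, the function names are hypothetical, and the toy corpus is invented, with the query treated as one more document in the corpus.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy corpus for illustration only; the last entry plays the role of the query.
docs = [
    "bayes classifier for news search".split(),
    "decision tree for weather data".split(),
    "news search accuracy".split(),
]
vecs = tf_idf_vectors(docs)
query = vecs[2]
scores = [cosine_similarity(query, v) for v in vecs[:2]]
print(scores)  # the first document shares terms with the query, the second does not
```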

Development and application of a floor failure depth prediction system based on the WEKA platform

  • Lu, Yao;Bai, Liyang;Chen, Juntao;Tong, Weixin;Jiang, Zhe
    • Geomechanics and Engineering / Vol. 23, No. 1 / pp. 51-59 / 2020
  • In this paper, the WEKA platform was used to mine and analyze measured data of floor failure depth, and a prediction system for floor failure depth was developed in Java. Based on the standardization and discretization of 35 sets of measured floor failure depth data from China, a grey correlation degree analysis was carried out on five factors affecting the floor failure depth. The correlation order from largest to smallest is: mining depth, working face length, floor failure resistance, mining thickness, and dip angle of the coal seam. Naive Bayes, neural network, and decision tree models were used for learning and training, and the accuracy of the confusion matrix, detailed accuracy, and node error rate were analyzed. The artificial neural network was concluded to be the optimal model. Using the Java-based prediction system, predictions from measured data and error analyses were performed for nine sets of data. The results show that the WEKA prediction formula has the smallest relative error and the best prediction effect. The applicability of the WEKA prediction formula was also analyzed: the results show that it is better suited to coal seam mining depths of 110 m to 550 m, coal seam dip angles of 0° to 15°, and working face lengths of 30 m to 135 m.
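
The factor ranking in this abstract comes from a grey correlation (grey relational) degree analysis. A minimal sketch of the standard Deng formulation follows, with distinguishing coefficient rho = 0.5. The min-max scaling, the function names, and the data are assumptions made for illustration; the paper's own preprocessing and its 35 measured data sets are not reproduced here.

```python
def min_max(values):
    """Scale a sequence to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def grey_relational_degrees(reference, factors, rho=0.5):
    """Return {factor name: grey relational degree} against the reference series."""
    ref = min_max(reference)
    deltas = {
        name: [abs(r - x) for r, x in zip(ref, min_max(series))]
        for name, series in factors.items()
    }
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0.0:
        # Every factor tracks the reference exactly.
        return {name: 1.0 for name in deltas}
    return {
        name: sum((d_min + rho * d_max) / (d + rho * d_max) for d in ds) / len(ds)
        for name, ds in deltas.items()
    }


# Hypothetical measurements for illustration only; not the paper's data.
failure_depth = [10.0, 14.0, 18.0, 25.0]
factors = {
    "mining depth": [200.0, 280.0, 360.0, 500.0],
    "dip angle": [12.0, 4.0, 15.0, 6.0],
}
degrees = grey_relational_degrees(failure_depth, factors)
for name in sorted(degrees, key=degrees.get, reverse=True):
    print(f"{name}: {degrees[name]:.3f}")
```

Factors are then ranked by degree, with a value closer to 1 meaning the factor's series moves more closely with the reference series.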

Metabolic Syndrome Prediction Using Machine Learning Models with Genetic and Clinical Information from a Nonobese Healthy Population

  • Choe, Eun Kyung;Rhee, Hwanseok;Lee, Seungjae;Shin, Eunsoon;Oh, Seung-Won;Lee, Jong-Eun;Choi, Seung Ho
    • Genomics & Informatics / Vol. 16, No. 4 / pp. 31.1-31.7 / 2018
  • The prevalence of metabolic syndrome (MS) in the nonobese population is not low. However, the identification and risk mitigation of MS are not easy in this population. We aimed to develop an MS prediction model using genetic and clinical factors of nonobese Koreans through machine learning methods. A prediction model for MS was designed for a nonobese population using clinical and genetic polymorphism information with five machine learning algorithms, including naïve Bayes classification (NB). The analysis was performed in two stages (training and test sets). Model A was designed with only clinical information (age, sex, body mass index, smoking status, alcohol consumption status, and exercise status), and for model B, genetic information (for 10 polymorphisms) was added to model A. Of the 7,502 nonobese participants, 647 (8.6%) had MS. In the test set analysis, for the maximum sensitivity criterion, NB showed the highest sensitivity: 0.38 for model A and 0.42 for model B. The specificity of NB was 0.79 for model A and 0.80 for model B. In a comparison of the performances of models A and B by NB, model B (area under the receiver operating characteristic curve [AUC] = 0.69, clinical and genetic information input) showed better performance than model A (AUC = 0.65, clinical information only input). We designed a prediction model for MS in a nonobese population using clinical and genetic information. With this model, we might convince nonobese MS individuals to undergo health checks and adopt behaviors associated with a preventive lifestyle.
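
The AUC values compared in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC directly from that definition. The function name is hypothetical and the labels and scores are invented for illustration, not drawn from the study.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half. Runs in O(n_pos * n_neg), fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = MS) and predicted risks; not the study's data.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.4, 0.5, 0.2, 0.7]
print(f"AUC = {auc(labels, scores):.3f}")
```

An AUC of 0.5 corresponds to random ranking, which gives context for the reported 0.65 and 0.69.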

Multi-dimensional Analysis and Prediction Model for Tourist Satisfaction

  • Shrestha, Deepanjal;Wenan, Tan;Gaudel, Bijay;Rajkarnikar, Neesha;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 2 / pp. 480-502 / 2022
  • This work assesses the degree of satisfaction tourists receive as final recipients in a tourism destination, based on the fact that satisfied tourists can make a significant contribution to the growth and continuous improvement of a tourism business. The work takes Pokhara, the tourism capital of Nepal, as the study area. A stratified sampling methodology with open-ended survey questions is used as the primary source of data, with a sample of 1,019 international and domestic tourists. The data collected through the survey is processed using a data mining tool to perform multi-dimensional analysis, discover information patterns, and visualize clusters. Further, the supervised machine learning algorithms kNN, decision tree, support vector machine, random forest, neural network, Naive Bayes, and gradient boosting are used to develop models for training and prediction on the survey data. To find the best model for prediction, different performance metrics are used to evaluate each model for performance, accuracy, and robustness. The best model is used in constructing a learning-enabled model for predicting tourists as satisfied, neutral, or unsatisfied visitors. This work is important for tourism business personnel, government agencies, and tourism stakeholders seeking information on tourist satisfaction and the factors that influence it. Though this work was carried out for the city of Pokhara, Nepal, the study is equally relevant to other tourism destinations of a similar nature.

Comparative Study of PSO-ANN in Estimating Traffic Accident Severity

  • Md. Ashikuzzaman;Wasim Akram;Md. Mydul Islam Anik;Taskeed Jabid;Mahamudul Hasan;Md. Sawkat Ali
    • International Journal of Computer Science & Network Security / Vol. 23, No. 8 / pp. 95-100 / 2023
  • Traffic accidents cause health and economic losses around the world. As the population grows, the number of vehicles on the road increases, which leads to congestion in cities. Congestion can in turn increase accident risk as transportation systems expand. Modern cities are adopting various technologies to minimize traffic accidents through mathematical prediction. Because traffic accidents cause economic losses and potential deaths, the concept of the smart city is a sensible way to ensure people's safety. In a smart city, factors such as road condition, light condition, and weather condition are important for predicting traffic accident severity. Several machine learning models can be employed to determine and predict traffic accident severity. This paper evaluates the performance of a hybridized neural network and compares it with other machine learning models in terms of accuracy in predicting traffic accident severity. A dataset from the city of Leeds, UK, is used to train and test the models, and the results are compared with each other. Particle swarm optimization with an artificial neural network (PSO-ANN) gave promising results compared with other machine learning models such as Random Forest, Naïve Bayes, Nearest Centroid, and K-Nearest Neighbor classification. The PSO-ANN model can be adopted in transportation systems to counter traffic accident issues. The nearest centroid model gave the lowest accuracy score, whereas PSO-ANN gave the highest. The test results and findings of this study can provide valuable information for reducing traffic accidents.

Improving SARIMA model for reliable meteorological drought forecasting

  • Jehanzaib, Muhammad;Shah, Sabab Ali;Son, Ho Jun;Kim, Tae-Woong
    • 한국수자원학회:학술대회논문집 (Proceedings of the Korea Water Resources Association Conference) / Korea Water Resources Association 2022 Conference / pp. 141-141 / 2022
  • Drought is a global phenomenon that affects almost all landscapes and causes major damage. Because of the non-linear nature of its contributing factors, drought occurrence and severity are characterized as stochastic. Early warning of impending drought can aid in the development of drought mitigation strategies and measures. Thus, drought forecasting is crucial in the planning and management of water resource systems. The primary objective of this study is to improve existing drought forecasting techniques. We therefore propose an improved version of the Seasonal Autoregressive Integrated Moving Average (SARIMA) model, MD-SARIMA, for reliable drought forecasting with a three-year lead time. We selected four watersheds of the Han River basin in South Korea to validate the performance of the MD-SARIMA model. Meteorological data from 8 rain gauge stations were collected for the period 1973-2016 and converted to watershed scale using Thiessen's polygon method. The Standardized Precipitation Index (SPI) was employed to represent meteorological drought at a seasonal (3-month) time scale. The performance of the MD-SARIMA model was compared with existing models: the Seasonal Naive Bayes (SNB) model, the Exponential Smoothing (ES) model, the Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components (TBATS) model, and the SARIMA model. The results showed that all the models were able to forecast drought, but MD-SARIMA was more robust than the other statistical models, with Willmott Index (WI) = 0.86, Mean Absolute Error (MAE) = 0.66, and Root Mean Square Error (RMSE) = 0.80 for a 36-month lead time forecast. The outcomes of this study indicate that the MD-SARIMA model can be used for drought forecasting.
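
The three skill scores reported in this abstract can be computed from paired observed and forecast values. The sketch below uses Willmott's original (1981) index of agreement, which is an assumption since the abstract does not say which variant of the index was used. The function names are hypothetical and the SPI-like values are invented for illustration.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


def willmott_index(obs, pred):
    """Willmott's index of agreement d (1981 form); 1.0 means perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    if den == 0.0:
        raise ValueError("index undefined when every value equals the observed mean")
    return 1.0 - num / den


# Hypothetical SPI-like values for illustration only; not the study's data.
obs = [-1.0, 0.0, 1.0, 2.0]
pred = [-0.5, 0.5, 0.5, 1.5]
print(f"MAE  = {mae(obs, pred):.2f}")
print(f"RMSE = {rmse(obs, pred):.2f}")
print(f"WI   = {willmott_index(obs, pred):.3f}")
```

MAE and RMSE are in the units of the forecast variable (lower is better), while the index of agreement is bounded by 1 (higher is better).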

Automated Prioritization of Construction Project Requirements using Machine Learning and Fuzzy Logic System

  • Hassan, Fahad ul;Le, Tuyen;Le, Chau;Shrestha, K. Joseph
    • 국제학술발표논문집 (International Conference Proceedings) / The 9th International Conference on Construction Engineering and Project Management / pp. 304-311 / 2022
  • Construction inspection is a crucial stage that ensures that all contractual requirements of a construction project are verified. The construction inspection capabilities of state highway agencies have been greatly affected by budget reductions. As a result, efficient inspection practices such as risk-based inspection are required to optimize the use of limited resources without compromising inspection quality. Automated prioritization of textual requirements according to their criticality would be extremely helpful, since contractual requirements are typically presented in unstructured natural language in voluminous text documents. The current study introduces a novel model for predicting the risk level of requirements using machine learning (ML) algorithms. The ML algorithms tested in this study included naïve Bayes, support vector machines, logistic regression, and random forest. The training data includes sequences of requirement texts labeled with risk levels (very low, low, medium, high, very high) using fuzzy logic systems. The fuzzy model treats the three risk factors (severity, probability, detectability) as fuzzy input variables and applies fuzzy inference rules to determine the labels of requirements. The performance of the model was examined on a labeled dataset created with fuzzy inference rules and three different membership functions. The developed requirement risk prediction model yielded a precision, recall, and F-score of 78.18%, 77.75%, and 75.82%, respectively. The proposed model is expected to provide construction inspectors with a means for the automated prioritization of voluminous requirements by their importance, thus helping to maximize the effectiveness of inspection activities under resource constraints.
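
The fuzzy labeling step described in this abstract starts by mapping each crisp risk-factor score to degrees of membership in fuzzy sets. The sketch below shows that fuzzification step with a triangular membership function. The 0-10 scale, the set boundaries, and the names are hypothetical, since the abstract does not specify the paper's membership functions or inference rules, and the rule evaluation itself is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on a 0-10 scale; the paper's actual sets are not given.
SEVERITY_SETS = {
    "low": (-5.0, 0.0, 5.0),
    "medium": (0.0, 5.0, 10.0),
    "high": (5.0, 10.0, 15.0),
}


def fuzzify(x, sets):
    """Return the degree of membership of x in each labeled fuzzy set."""
    return {label: triangular(x, *abc) for label, abc in sets.items()}


memberships = fuzzify(7.0, SEVERITY_SETS)
print(memberships)
```

A severity score of 7 belongs partly to "medium" and partly to "high", which is what lets inference rules blend neighboring risk levels instead of applying a hard cutoff.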

Effects of Financial College Tuition Support by Korean Parents using a Hierarchical Bayes Model

  • 오만숙;오현숙;오민정
    • 응용통계연구 (Korean Journal of Applied Statistics) / Vol. 26, No. 2 / pp. 267-280 / 2013
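  • To analyze the factors affecting the financial burden of college tuition, which has recently become an economic, political, and social issue in Korea, we carried out an analysis using a Bayesian hierarchical model with region as the hierarchy level, based on data collected in the "2010 Social Survey" conducted by Statistics Korea. A hierarchical probit model was specified for the binary response indicating whether parents cover 70% or more of tuition; factor analysis was applied to reduce the explanatory variables, and the parameters were estimated with Markov chain Monte Carlo. The analysis showed that, in many regions, income and mental stress factors were significantly related to parents' financial support for tuition. Parents with higher income were more likely to support their children's tuition, and students receiving financial support from their parents experienced less mental stress, indicating that parental income significantly affects children's mental health. In contrast, gender, lifestyle health, and school satisfaction were not significantly related to parental tuition support in most regions. Looking at regional differences in how stress and income relate to parental support, students in Gangwon-do experienced the most mental stress when parental support was low, and the tendency for the likelihood of parental support to rise with income was more pronounced in large cities than in the provinces.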
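
In a probit model the probability of the binary outcome is the standard normal CDF of a linear predictor, and in a hierarchical version the coefficients are allowed to vary by region. The sketch below shows only that link for a single region. The function names, coefficient values, and covariates are hypothetical, and the MCMC estimation used in the paper is not shown.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(region_intercept, coefficients, covariates):
    """P(y = 1) under a probit link for one region's parameters."""
    linear = region_intercept + sum(
        b * x for b, x in zip(coefficients, covariates)
    )
    return std_normal_cdf(linear)


# Hypothetical region-level parameters and standardized factor scores
# (income, stress); not estimates from the paper.
p = probit_probability(-0.2, [0.8, -0.3], [1.0, 0.5])
print(f"P(parents cover >= 70% of tuition) = {p:.3f}")
```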