• Title/Summary/Keyword: 10-fold cross-validation


Use of a Machine Learning Algorithm to Predict Individuals with Suicide Ideation in the General Population

  • Ryu, Seunghyong; Lee, Hyeongrae; Lee, Dong-Kyun; Park, Kyeongwoo
    • Psychiatry Investigation / v.15 no.11 / pp.1030-1036 / 2018
  • Objective: In this study, we aimed to develop a model that predicts individuals with suicide ideation within the general population using a machine learning algorithm. Methods: Among 35,116 individuals aged over 19 years from the Korea National Health & Nutrition Examination Survey, we selected 11,628 individuals via random down-sampling, comprising 5,814 suicide ideators and the same number of non-suicide ideators. We randomly assigned the subjects to a training set (n=10,466) and a test set (n=1,162). In the training set, a random forest model was trained with 15 features selected by recursive feature elimination with 10-fold cross-validation. The fitted model was then used to predict suicide ideators in the test set and among all 35,116 subjects. All analyses were conducted in R. Results: The prediction model achieved good performance [area under the receiver operating characteristic curve (AUC)=0.85] in the test set and predicted suicide ideators among the total sample with an accuracy of 0.821, sensitivity of 0.836, and specificity of 0.807. Conclusion: This study shows that a machine learning approach may enable screening for suicide risk in the general population. Further work is warranted to improve prediction accuracy.
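The sampling design in the abstract above (balance the classes by random down-sampling, hold out a test set, then report accuracy, sensitivity, and specificity) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the record layout and the `ideation` label key are hypothetical, and the random forest and recursive feature elimination steps are omitted.

```python
import random


def downsample_balanced(records, label_key="ideation", seed=0):
    """Random down-sampling: keep every minority-class record and an
    equal-sized random sample of the majority class.
    `label_key` is a hypothetical field name."""
    rng = random.Random(seed)
    pos = [r for r in records if r[label_key]]
    neg = [r for r in records if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def holdout_split(records, n_test, seed=0):
    """Randomly assign n_test records to the test set and the rest to training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true-positive rate), and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

For example, down-sampling a cohort in which the 5,814 ideators form the minority class yields 11,628 records, and `holdout_split(balanced, n_test=1162)` then leaves 10,466 for training, matching the set sizes reported above.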

Grading System of Movie Review through the Use of An Appraisal Dictionary and Computation of Semantic Segments (감정어휘 평가사전과 의미마디 연산을 이용한 영화평 등급화 시스템)

  • Ko, Min-Su; Shin, Hyo-Pil
    • Korean Journal of Cognitive Science / v.21 no.4 / pp.669-696 / 2010
  • Assuming that the meaning of a whole document is composed of the meanings of its parts, this paper studies the automatic grading of movie reviews that contain sentiment expressions. Grading is accomplished by calculating the values of semantic segments and classifying each review. The ARSSA (The Automatic Rating System for Sentiment analysis using an Appraisal dictionary) system attempts to model decision-making processes in a manner similar to that of the human mind. It aims to resolve the discontinuity between the numerical rating and the textual justification in the binary structure of the current review rating system, {rate: review}. The model is realized by analyzing the abstract meanings extracted from each review. The system's performance was measured with a 10-fold cross-validation test on 1,000 reviews obtained from the Naver Movie site, where it achieved an F1 score of 85% against the predefined values, using a predefined appraisal dictionary.
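The evaluation protocol described above (10-fold cross-validation on 1,000 reviews, scored by F1) can be sketched as follows. This is a hedged illustration, not ARSSA itself: `fit` and `predict` are hypothetical callables standing in for the grading model, and F1 is computed for a single positive class and averaged over folds, which may differ from how the paper aggregated it.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of (near-)equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_f1(labels, fit, predict, positive, k=10):
    """Each fold serves once as the test set; the other k-1 folds train the model."""
    scores = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit(train_idx)  # hypothetical: train on the given review indices
        preds = [predict(model, i) for i in test_idx]
        scores.append(f1_score([labels[i] for i in test_idx], preds, positive))
    return sum(scores) / len(scores)
```

With 1,000 reviews and k = 10, each fold holds exactly 100 reviews, and every review is tested exactly once.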


Cooperative Robot for Table Balancing Using Q-learning (테이블 균형맞춤 작업이 가능한 Q-학습 기반 협력로봇 개발)

  • Kim, Yewon; Kang, Bo-Yeong
    • The Journal of Korea Robotics Society / v.15 no.4 / pp.404-412 / 2020
  • Many everyday tasks, such as moving tables or beds, involve at least two people, and the balance of the object changes with each person's actions. However, many previous studies performed such tasks with robots alone, without factoring in human cooperation. Therefore, in this paper, we propose a cooperative robot for table balancing using Q-learning that enables cooperative work between a human and a robot. The proposed robot captures images of the table's state with its camera, recognizes the human's action, and performs the corresponding table-balancing action without high-performance equipment. Human actions are classified with a deep learning model, specifically AlexNet, which achieves an accuracy of 96.9% under 10-fold cross-validation. The Q-learning experiment was carried out over 2,000 episodes with 200 trials. The results show that the Q function converged stably within this number of episodes, and this stable convergence determined the Q-learning policies for the robot's actions. Video of the robot cooperating with a human on the table-balancing task using the proposed Q-learning can be found at http://ibot.knu.ac.kr/videocooperation.html.
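The tabular Q-learning update behind a policy like this can be sketched on a toy problem. Everything about the environment here is hypothetical: the table state is reduced to a discretised tilt level in [-2, 2], the robot's actions are lower/hold/raise, and reaching a balanced table ends the episode with reward +1. In the paper the state comes from camera images and AlexNet-based recognition of the human's action, which this sketch does not model.

```python
import random

ACTIONS = (-1, 0, 1)  # hypothetical robot actions: lower, hold, raise its side


def step(tilt, action):
    """Toy environment: tilt is a discretised table-tilt level clamped to [-2, 2]."""
    nxt = max(-2, min(2, tilt + action))
    done = nxt == 0  # a level table ends the episode
    return nxt, (1.0 if done else -1.0), done


def train_q(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(-2, 3) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice([-2, -1, 1, 2])  # start from an unbalanced table
        for _ in range(50):  # cap the episode length
            if rng.random() < epsilon:  # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r, done = step(s, a)
            # Q-learning target: reward plus discounted best value of the next state.
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q


def greedy_policy(q):
    """Best action in each unbalanced state under the learned Q function."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (-2, -1, 1, 2)}
```

After training, the greedy policy moves the tilt toward zero from every unbalanced state; the learning rate, discount factor, and exploration rate are illustrative choices, not values from the paper.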

Mapping Biodiversity through optimized selection of input variables in decision tree models (의사결정나무 변수 선정 방법을 적용한 대축적 생물다양성 지도 구축)

  • Kim, Do Yeon; Heo, Joon; Kim, Chang Jae
    • Journal of Environmental Impact Assessment / v.20 no.5 / pp.663-673 / 2011
  • Given accelerating biodiversity loss and its significance for our coexistence with nature, biodiversity is becoming increasingly important from a sustainable-development perspective. Estimates of future biodiversity provide valuable information for decision-making, especially at the national level, so a quantitative baseline of the present status must first be established. In this study, we developed a large-scale map of Plant Species Richness (PSR, a typical indicator of biodiversity) for the Young-dong and Pyung-chang provinces. Given the availability of appropriate data and advances in modelling techniques, a genetic algorithm was applied to reduce the number of variables without degrading predictive power. Predictive power was evaluated as the number of Correctly Classified Instances (CCI) under 10-fold cross-validation. As a fundamental baseline, this study will benefit future land work as well as ecosystem restoration projects and other relevant decision-making agendas.
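Variable selection with a genetic algorithm, as described above, can be sketched with 0/1 masks over the candidate input variables. In the study the fitness would be the CCI of a decision tree under 10-fold cross-validation; here `fitness` is any callable on a mask, and the population size, generation count, tournament selection, one-point crossover, and mutation rate are illustrative assumptions rather than the authors' settings.

```python
import random


def ga_select(n_vars, fitness, pop_size=30, generations=60, p_mut=None, seed=0):
    """Genetic-algorithm search over variable subsets encoded as 0/1 masks.
    Higher fitness is better (e.g. CCI under cross-validation)."""
    rng = random.Random(seed)
    p_mut = p_mut if p_mut is not None else 1.0 / n_vars
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]

    def tournament():
        # Binary tournament: the fitter of two random masks becomes a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_vars)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each variable is toggled with probability p_mut.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

A mask such as `[1, 0, 1, 1, 0, 0, 1, 0]` keeps the first, third, fourth, and seventh candidate variables; a real fitness function would refit the decision tree on those columns inside each cross-validation fold.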

Spatial Prediction of Soil Carbon Using Terrain Analysis in a Steep Mountainous Area and the Associated Uncertainties (지형분석을 이용한 산지토양 탄소의 분포 예측과 불확실성)

  • Jeong, Gwanyong
    • Journal of The Geomorphological Association of Korea / v.23 no.3 / pp.67-78 / 2016
  • Soil carbon (C) is an essential property for characterizing soil quality, yet understanding of its spatial patterns is particularly limited in mountain areas. This study aims to predict the spatial pattern of soil C using terrain analysis in a steep mountainous area. Specifically, model performance and prediction uncertainty were investigated as functions of the number of resampling repetitions. Important predictors of soil C were also identified, and the spatial distribution of uncertainty was analyzed. A total of 91 soil samples were collected via conditioned Latin hypercube sampling, and a digital soil C map was developed using support vector regression, a powerful machine learning method. Model performance did not differ distinctly with the number of repetitions, except for 10-fold cross-validation. Recursive feature elimination selected elevation and surface curvature as important predictors of soil C. Soil C was higher at higher elevations and on concave slopes. This spatial pattern may reflect the lateral movement of water and materials along the surface configuration of the study area. The higher uncertainty at higher elevations and on concave slopes may be related to the geomorphological characteristics of the research area and to the sampling design. This study is expected to provide a better understanding of the relationship between geomorphology and soil C in mountainous ecosystems.
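Repeating k-fold cross-validation with a fresh shuffle each time, as in the resampling experiment above, yields a spread of error estimates rather than a single number. The sketch below uses a simple least-squares line (`fit_line` and `predict_line`, both hypothetical) as a stand-in for support vector regression, so it illustrates the resampling scheme only, not the paper's model.

```python
import random
import statistics


def fit_line(xs, ys):
    """Stand-in model: ordinary least-squares line y = slope * x + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def repeated_kfold_rmse(xs, ys, fit, predict, k=10, repeats=5, seed=0):
    """Repeat k-fold CV with a new shuffle each time; return one RMSE per repeat."""
    rng = random.Random(seed)
    n = len(xs)
    rmses = []
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        sq_err = 0.0
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(n) if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            sq_err += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx)
        rmses.append((sq_err / n) ** 0.5)
    return rmses


def summarize(rmses):
    """Mean and sample standard deviation of the per-repeat errors."""
    sd = statistics.stdev(rmses) if len(rmses) > 1 else 0.0
    return statistics.fmean(rmses), sd
```

With 91 samples and k = 10, each repeat uses folds of 9 or 10 samples; the standard deviation returned by `summarize` is one simple way to express how much the estimate depends on the particular resampling.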

Using CNN-VGG 16 to detect the tennis motion tracking by information entropy and unascertained measurement theory

  • Zhong, Yongfeng; Liang, Xiaojun
    • Advances in Nano Research / v.12 no.2 / pp.223-239 / 2022
  • Object detection seeks objects with particular properties or representations and predicts details about them, including their positions, sizes, and rotation angles in the current image; it is a very important subject in computer vision. Although vision-based object tracking strategies for analyzing competition videos have been developed, it is still difficult to accurately identify and locate a small, fast-moving ball. In this study, a deep learning (DL) network was developed to address these obstacles in the complex problem of tennis motion tracking and to help understand athletes' performance. This research used CNN-VGG 16 to track the tennis ball in broadcast videos, where the ball's image is distorted, thin, and often invisible; the network not only identifies the ball in a single frame but also learns patterns from consecutive frames. VGG 16 takes 640 × 360 images as input to locate the ball and achieves high accuracy on public videos, with test accuracies of 99.6%, 96.63%, and 99.5%. To avoid overfitting, 9 additional videos and a subset of the previous dataset were partly labelled for 10-fold cross-validation. The results show that CNN-VGG 16 outperforms the standard approach by a wide margin and provides excellent ball-tracking performance.

JAYA-GBRT model for predicting the shear strength of RC slender beams without stirrups

  • Tran, Viet-Linh; Kim, Jin-Kook
    • Steel and Composite Structures / v.44 no.5 / pp.691-705 / 2022
  • Shear failure in reinforced concrete (RC) structures is very hazardous: it is rarely predicted and may occur without prior warning signs. Accurately predicting the shear strength of RC members is challenging, and traditional methods have difficulty doing so. This study develops a JAYA-GBRT model, based on the JAYA algorithm and the gradient boosting regression tree (GBRT), to predict the shear strength of RC slender beams without stirrups. First, 484 tests were carefully collected and divided into training and test sets. Then, the hyperparameters of the GBRT model were determined using the JAYA algorithm and 10-fold cross-validation. The performance of the JAYA-GBRT model was compared with five well-known empirical models, and the comparison shows that the JAYA-GBRT model (R² = 0.982, RMSE = 9.466 kN, MAE = 6.299 kN, µ = 1.018, and CoV = 0.116) outperforms them. Moreover, the predictions of the JAYA-GBRT model are explained globally and locally using the SHapley Additive exPlanations (SHAP) method, which identifies the effective depth as the most influential parameter for shear strength. Finally, a graphical user interface (GUI) tool and a web application (WA) were developed to apply the JAYA-GBRT model for rapidly predicting the shear strength of RC slender beams without stirrups.
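The JAYA algorithm has no algorithm-specific control parameters: each candidate moves toward the current best solution and away from the worst, using the update x' = x + r1·(best − |x|) − r2·(worst − |x|) with r1 and r2 drawn uniformly from [0, 1], and a move is kept only if it improves the objective. A minimal sketch follows; in the paper the objective would be the 10-fold cross-validation error of a GBRT for a given hyperparameter vector, whereas the test below uses a hypothetical sphere function, and the population size and iteration count are illustrative.

```python
import random


def jaya_minimize(objective, bounds, pop_size=20, iterations=200, seed=0):
    """JAYA search: move toward the best solution and away from the worst.
    Returns (best_x, best_score, history of the best score per iteration)."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    history = [min(scores)]
    for _ in range(iterations):
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            cand = []
            for j, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(clip(v, lo, hi))  # keep the candidate inside the bounds
            f = objective(cand)
            if f < scores[i]:  # greedy acceptance: keep only improving moves
                pop[i], scores[i] = cand, f
        history.append(min(scores))
    i_best = scores.index(min(scores))
    return pop[i_best], scores[i_best], history
```

Because of the greedy acceptance step, the best objective value in the population can never get worse from one iteration to the next.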

Surface-Engineered Graphene surface-enhanced Raman scattering Platform with Machine-learning Enabled Classification of Mixed Analytes

  • Jae Hee Cho; Garam Bae; Ki-Seok An
    • Journal of Sensor Science and Technology / v.33 no.3 / pp.139-146 / 2024
  • Surface-enhanced Raman scattering (SERS) enables the detection of various π-conjugated biological and chemical molecules owing to its exceptional sensitivity in obtaining unique spectra, offering nondestructive classification of target analytes. Herein, we demonstrate a strategy for machine learning (ML)-enabled predictive SERS platforms based on surface-engineered graphene, achieved through complementary hybridization with Au nanoparticles (NPs). The hybridized Au NPs/graphene SERS platforms showed exceptional sensitivity (10⁻⁷ M) owing to the strong synergy between the localized electromagnetic effect and the enhanced chemical bonding reactivity. The chemical and physical properties of the SERS platform were systematically investigated using microscopic and spectroscopic analyses. Furthermore, an ML strategy is proposed to predict various analytes based on a database of featured Raman spectra. Using a customized data-preprocessing algorithm, features for ML were extracted from the Raman peak characteristics, such as intensity, position, and width, of the SERS spectra. Various ML classification models were then evaluated using k-fold cross-validation (k = 5), showing 99% prediction accuracy.
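The peak descriptors named above (position, intensity, and width) can be extracted from a spectrum with a simple local-maximum search and a full width at half maximum (FWHM) estimate. This is a hedged sketch, not the paper's customized preprocessing algorithm: it assumes a baseline-corrected spectrum sampled on an ascending wavenumber axis, uses linear interpolation for the half-maximum crossings, and `min_height` is a hypothetical threshold.

```python
def _crossing(xa, ya, xb, yb, level):
    """Linearly interpolate the x at which segment (xa, ya)-(xb, yb) reaches `level`."""
    if ya == yb:
        return xa
    return xa + (level - ya) * (xb - xa) / (yb - ya)


def peak_features(x, y, min_height):
    """Return (position, intensity, FWHM) for each local maximum >= min_height.

    Assumes x is sorted ascending and intensities are baseline-corrected (>= 0).
    """
    features = []
    for i in range(1, len(y) - 1):
        if y[i] < min_height or not (y[i] > y[i - 1] and y[i] >= y[i + 1]):
            continue
        half = y[i] / 2.0
        left = i
        while left > 0 and y[left] > half:  # walk down the left flank
            left -= 1
        right = i
        while right < len(y) - 1 and y[right] > half:  # walk down the right flank
            right += 1
        # Interpolate the half-maximum crossings; fall back to the spectrum edge
        # if a flank never drops to half maximum.
        x_left = (_crossing(x[left], y[left], x[left + 1], y[left + 1], half)
                  if y[left] <= half else x[left])
        x_right = (_crossing(x[right - 1], y[right - 1], x[right], y[right], half)
                   if y[right] <= half else x[right])
        features.append((x[i], y[i], x_right - x_left))
    return features
```

Each tuple can then serve as a feature row for a classifier evaluated with 5-fold cross-validation, as in the study.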

Cloning of Korean Morphological Analyzers using Pre-analyzed Eojeol Dictionary and Syllable-based Probabilistic Model (기분석 어절 사전과 음절 단위의 확률 모델을 이용한 한국어 형태소 분석기 복제)

  • Shim, Kwangseob
    • KIISE Transactions on Computing Practices / v.22 no.3 / pp.119-126 / 2016
  • In this study, we verified the feasibility of a Korean morphological analyzer that uses a pre-analyzed eojeol dictionary and a syllable-based probabilistic model. For the verification, two Korean morphological analyzers, MACH and KLT2000, were cloned with a pre-analyzed eojeol dictionary and a syllable-based probabilistic model, and the analysis results of the cloned analyzers were compared with those of the original MACH and KLT2000. The 10-million-eojeol Sejong corpus was divided into 10 sets for cross-validation. The 10-fold cross-validated precision and recall were 97.16% and 98.31% for the cloned MACH and 96.80% and 99.03% for the cloned KLT2000. The cloned MACH analyzed 308,000 eojeols per second, and the cloned KLT2000 analyzed 436,000 eojeols per second. The experimental results indicate that a Korean morphological analyzer using a pre-analyzed eojeol dictionary and a syllable-based probabilistic model could be used in practical applications.
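Precision and recall for a morphological analyzer can be computed by matching the predicted (morpheme, tag) pairs of each eojeol against the gold analysis. The sketch below assumes multiset matching per eojeol and micro-averaging over the corpus; the paper's exact counting scheme may differ, and the tags in the test follow Sejong conventions for illustration only.

```python
from collections import Counter


def morpheme_precision_recall(gold, predicted):
    """Micro-averaged precision and recall over (morpheme, tag) pairs.

    gold and predicted are parallel lists with one analysis per eojeol,
    each analysis being a list of (morpheme, tag) tuples.
    """
    correct = n_pred = n_gold = 0
    for g, p in zip(gold, predicted):
        g_counts, p_counts = Counter(g), Counter(p)
        correct += sum((g_counts & p_counts).values())  # multiset intersection
        n_pred += sum(p_counts.values())
        n_gold += sum(g_counts.values())
    return correct / n_pred, correct / n_gold
```

Applied fold by fold over the 10 corpus sets, the per-fold values (or the pooled counts) give the cross-validated precision and recall.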

Machine Learning Prediction for the Recurrence After Electrical Cardioversion of Patients With Persistent Atrial Fibrillation

  • Soonil Kwon; Eunjung Lee; Hojin Ju; Hyo-Jeong Ahn; So-Ryoung Lee; Eue-Keun Choi; Jangwon Suh; Seil Oh; Wonjong Rhee
    • Korean Circulation Journal / v.53 no.10 / pp.677-689 / 2023
  • Background and Objectives: There is limited evidence regarding machine-learning prediction of atrial fibrillation (AF) recurrence after electrical cardioversion (ECV). This study aimed to predict AF recurrence after ECV in patients with persistent AF using machine learning on clinical features and electrocardiograms (ECGs). Methods: We analyzed patients who underwent successful ECV for persistent AF. The machine-learning model was designed to identify patients with recurrence at 1 month. Individual 12-lead ECGs were collected before and after ECV, and various clinical features were collected and used to train an extreme gradient boosting (XGBoost)-based model. Ten-fold cross-validation was used to evaluate the model's performance, which was compared with the C-statistics of selected clinical features. Results: Among 718 patients (mean age 63.5±9.3 years, men 78.8%), AF recurred in 435 (60.6%) patients at 1 month. With the XGBoost-based model, the areas under the receiver operating characteristic curves (AUROCs) were 0.57, 0.60, and 0.63 when the model was trained on clinical features, ECGs, and both (the final model), respectively. For the final model, the sensitivity, specificity, and F1-score were 84.7%, 28.2%, and 0.73, respectively. Although AF duration showed the best predictive performance (AUROC, 0.58) among the clinical features, it was significantly lower than that of the final machine-learning model (p<0.001). Additional training on extended monitoring data (15-minute single-lead ECG and photoplethysmography) in the patients for whom it was available (n=261) did not significantly improve the model's performance. Conclusions: Machine learning showed modest performance in predicting AF recurrence after ECV in patients with persistent AF, warranting further validation studies.
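The AUROC (C-statistic) reported above equals the probability that a randomly chosen patient with recurrence receives a higher model score than a randomly chosen patient without, with ties counted as one half. A quadratic-time sketch follows, together with a helper that recovers F1 from sensitivity, specificity, and class counts; both are generic formulas, not the authors' code. As a consistency check, the reported sensitivity (84.7%) and specificity (28.2%) with 435 recurrences among 718 patients give an F1 of about 0.73, matching the abstract.

```python
def auroc(y_true, scores):
    """C-statistic: fraction of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y]
    neg = [s for y, s in zip(y_true, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_from_rates(sensitivity, specificity, n_pos, n_neg):
    """F1 = 2TP / (2TP + FP + FN), with TP, FN, FP reconstructed from the rates."""
    tp = sensitivity * n_pos
    fn = n_pos - tp
    fp = (1 - specificity) * n_neg
    return 2 * tp / (2 * tp + fp + fn)
```

For example, `f1_from_rates(0.847, 0.282, 435, 283)` evaluates to roughly 0.732, which shows how a model with high sensitivity but low specificity can still reach a moderate F1 when the positive class is the majority.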