• Title/Summary/Keyword: Split Ratio

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases / v.86 no.3 / pp.203-215 / 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. In set III, imputation was done after dataset splitting. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R² value was 0.27. In set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
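The abstract above describes a 70:30 train/test split, a set in which missing data are imputed, and evaluation by R² and MSE. A minimal Python sketch of that workflow follows. It is not the authors' pipeline: the helper names, the mean-imputation strategy, and the toy rows are all assumptions made for illustration. The point it illustrates is that fill values are computed from the training rows only, so no test-set information leaks into them.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (train, test). Hypothetical helper."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def column_means(rows):
    """Per-column mean over observed (non-None) values.

    Assumes every column has at least one observed value.
    """
    means = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        means.append(sum(observed) / len(observed))
    return means


def impute(rows, fill_values):
    """Replace None entries with the supplied per-column fill values."""
    return [[fill if v is None else v for v, fill in zip(row, fill_values)]
            for row in rows]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy data (an assumption, not study data): second feature is sometimes missing.
rows = [[float(i), None if i % 5 == 0 else 2.0 * i] for i in range(10)]
train, test = train_test_split(rows, test_fraction=0.3, seed=1)
fill = column_means(train)           # statistics come from the training set only
train_filled = impute(train, fill)
test_filled = impute(test, fill)     # the same fill values are reused on the test set
```

Mean imputation is used here only because it is simple; the abstract does not say which imputation method was applied.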

The Study on Determination of Benefit Factor as Constructing Traffic Facilities Using ANP (ANP기법을 이용한 교통시설 건설사업의 편익항목 선정에 관한 연구)

  • Kim, Man Kyeong;Jung, Hun Young;Lee, Sang Yong
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.1D / pp.41-47 / 2006
  • Planning and evaluating the construction of traffic facilities has raised a variety of problems of equity and efficiency. One reason is the lack of an objective standard for defining benefit items, so evaluation results have not been convincing, whereas the construction and operating costs of traffic facilities are clearly identified. This study therefore sets out to determine the benefit items. Before selecting the items, we carried out a user satisfaction evaluation and an economic analysis. The satisfaction evaluation showed that subway users were more satisfied than road traffic mode users, while the B/C analysis showed that the subway's economic feasibility was lower than that of road facilities. From these inconsistent results, and from comparing Busan Metropolitan City's population with the subway's modal split ratio, we found that the benefit value is somewhat low because the standard for its items is not well defined. Accordingly, we enumerate benefit items for the feasibility evaluation of traffic facility construction and estimate the weight of each evaluation item using ANP. Accessibility had the highest weight, followed by punctuality and then travel time, and benefit items related to improving users' traffic conditions were much more important than those considered in existing economic analyses.

Explainable Artificial Intelligence (XAI) Surrogate Models for Chemical Process Design and Analysis (화학 공정 설계 및 분석을 위한 설명 가능한 인공지능 대안 모델)

  • Yuna Ko;Jonggeol Na
    • Korean Chemical Engineering Research / v.61 no.4 / pp.542-549 / 2023
  • With growing interest in surrogate modeling, there has been continuous research aimed at simulating nonlinear chemical processes using data-driven machine learning. However, the opaque nature of machine learning models, which limits their interpretability, poses a challenge for their practical application in industry. Therefore, this study aims to analyze chemical processes using Explainable Artificial Intelligence (XAI), a concept that improves interpretability while ensuring model accuracy. While conventional sensitivity analysis of chemical processes has been limited to calculating and ranking the sensitivity indices of variables, we propose a methodology that utilizes XAI not only to perform global and local sensitivity analysis, but also to examine the interactions among variables to gain physical insights from the data. For the ammonia synthesis process, which is the target process of the case study, we set the temperature of the preheater leading to the first reactor and the split ratio of the cold shot to the three reactors as process variables. By integrating Matlab and Aspen Plus, we obtained data on ammonia production and the maximum temperatures of the three reactors while systematically varying the process variables. We then trained tree-based models and performed sensitivity analysis using the SHAP technique, one of the XAI methods, on the most accurate model. The global sensitivity analysis showed that the preheater temperature had the greatest effect, and the local sensitivity analysis provided insights for defining the ranges of process variables to improve productivity and prevent overheating. By constructing surrogate models for chemical processes and using XAI for sensitivity analysis, this work contributes to providing both quantitative and qualitative feedback for process optimization.

Non-Contrast Cine Cardiac Magnetic Resonance Derived-Radiomics for the Prediction of Left Ventricular Adverse Remodeling in Patients With ST-Segment Elevation Myocardial Infarction

  • Xin A;Mingliang Liu;Tong Chen;Feng Chen;Geng Qian;Ying Zhang;Yundai Chen
    • Korean Journal of Radiology / v.24 no.9 / pp.827-837 / 2023
  • Objective: To investigate the predictive value of radiomics features based on cardiac magnetic resonance (CMR) cine images for left ventricular adverse remodeling (LVAR) after acute ST-segment elevation myocardial infarction (STEMI). Materials and Methods: We conducted a retrospective, single-center, cohort study involving 244 patients (random-split into 170 and 74 for training and testing, respectively) having an acute STEMI (88.5% males, 57.0 ± 10.3 years of age) who underwent CMR examination at one week and six months after percutaneous coronary intervention. LVAR was defined as a 20% increase in left ventricular end-diastolic volume 6 months after acute STEMI. Radiomics features were extracted from the one-week CMR cine images using the least absolute shrinkage and selection operator regression (LASSO) analysis. The predictive performance of the selected features was evaluated using receiver operating characteristic curve analysis and the area under the curve (AUC). Results: Nine radiomics features with non-zero coefficients were included in the LASSO regression of the radiomics score (RAD score). Infarct size (odds ratio [OR]: 1.04 (1.00-1.07); P = 0.031) and RAD score (OR: 3.43 (2.34-5.28); P < 0.001) were independent predictors of LVAR. The RAD score predicted LVAR, with an AUC (95% confidence interval [CI]) of 0.82 (0.75-0.89) in the training set and 0.75 (0.62-0.89) in the testing set. Combining the RAD score with infarct size yielded favorable performance in predicting LVAR, with an AUC of 0.84 (0.72-0.95). Moreover, the addition of the RAD score to the left ventricular ejection fraction (LVEF) significantly increased the AUC from 0.68 (0.52-0.84) to 0.82 (0.70-0.93) (P = 0.018), which was also comparable to the prediction provided by the combined microvascular obstruction, infarct size, and LVEF with an AUC of 0.79 (0.65-0.94) (P = 0.727). Conclusion: Radiomics analysis using non-contrast cine CMR can predict LVAR after STEMI independently and incrementally to LVEF and may provide an alternative to traditional CMR parameters.

Deep Learning-Assisted Diagnosis of Pediatric Skull Fractures on Plain Radiographs

  • Jae Won Choi;Yeon Jin Cho;Ji Young Ha;Yun Young Lee;Seok Young Koh;June Young Seo;Young Hun Choi;Jung-Eun Cheon;Ji Hoon Phi;Injoon Kim;Jaekwang Yang;Woo Sun Kim
    • Korean Journal of Radiology / v.23 no.3 / pp.343-354 / 2022
  • Objective: To develop and evaluate a deep learning-based artificial intelligence (AI) model for detecting skull fractures on plain radiographs in children. Materials and Methods: This retrospective multi-center study consisted of a development dataset acquired from two hospitals (n = 149 and 264) and an external test set (n = 95) from a third hospital. Datasets included children with head trauma who underwent both skull radiography and cranial computed tomography (CT). The development dataset was split into training, tuning, and internal test sets in a ratio of 7:1:2. The reference standard for skull fracture was cranial CT. Two radiology residents, a pediatric radiologist, and two emergency physicians participated in a two-session observer study on an external test set with and without AI assistance. We obtained the area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity along with their 95% confidence intervals (CIs). Results: The AI model showed an AUROC of 0.922 (95% CI, 0.842-0.969) in the internal test set and 0.870 (95% CI, 0.785-0.930) in the external test set. The model had a sensitivity of 81.1% (95% CI, 64.8%-92.0%) and specificity of 91.3% (95% CI, 79.2%-97.6%) for the internal test set and 78.9% (95% CI, 54.4%-93.9%) and 88.2% (95% CI, 78.7%-94.4%), respectively, for the external test set. With the model's assistance, significant AUROC improvement was observed in radiology residents (pooled results) and emergency physicians (pooled results) with the difference from reading without AI assistance of 0.094 (95% CI, 0.020-0.168; p = 0.012) and 0.069 (95% CI, 0.002-0.136; p = 0.043), respectively, but not in the pediatric radiologist with the difference of 0.008 (95% CI, -0.074-0.090; p = 0.850). Conclusion: A deep learning-based AI model improved the performance of inexperienced radiologists and emergency physicians in diagnosing pediatric skull fractures on plain radiographs.
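The 7:1:2 split into training, tuning, and internal test sets mentioned above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' procedure: the function name `split_by_ratio` is hypothetical, the split is a plain random shuffle (the abstract does not say whether it was stratified or done per patient), and the resulting set sizes are whatever this rounding rule produces rather than counts reported by the study.

```python
import random


def split_by_ratio(items, ratios=(7, 1, 2), seed=0):
    """Randomly partition items into len(ratios) disjoint parts.

    Boundaries are rounded cumulative shares; the last part takes
    whatever remains, so every item lands in exactly one part.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    total = sum(ratios)

    # Cumulative cut points for all parts except the last.
    bounds = []
    running = 0
    for share in ratios[:-1]:
        running += share
        bounds.append(round(n * running / total))

    parts = []
    start = 0
    for end in bounds + [n]:
        parts.append(shuffled[start:end])
        start = end
    return parts


# 149 + 264 cases in the development dataset, per the abstract.
case_ids = range(149 + 264)
training, tuning, internal_test = split_by_ratio(case_ids, ratios=(7, 1, 2))
```

Fixing the seed makes the partition reproducible, which matters when the tuning set is used to pick hyperparameters and the internal test set must stay untouched until the end.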

Development and Validation of MRI-Based Radiomics Models for Diagnosing Juvenile Myoclonic Epilepsy

  • Kyung Min Kim;Heewon Hwang;Beomseok Sohn;Kisung Park;Kyunghwa Han;Sung Soo Ahn;Wonwoo Lee;Min Kyung Chu;Kyoung Heo;Seung-Koo Lee
    • Korean Journal of Radiology / v.23 no.12 / pp.1281-1289 / 2022
  • Objective: Radiomic modeling using multiple regions of interest in MRI of the brain to diagnose juvenile myoclonic epilepsy (JME) has not yet been investigated. This study aimed to develop and validate radiomics prediction models to distinguish patients with JME from healthy controls (HCs), and to evaluate the feasibility of a radiomics approach using MRI for diagnosing JME. Materials and Methods: A total of 97 JME patients (25.6 ± 8.5 years; female, 45.5%) and 32 HCs (28.9 ± 11.4 years; female, 50.0%) were randomly split (7:3 ratio) into a training (n = 90) and a test set (n = 39) group. Radiomic features were extracted from 22 regions of interest in the brain using the T1-weighted MRI based on clinical evidence. Predictive models were trained using seven modeling methods, including a light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree, with radiomics features in the training set. The performance of the models was validated and compared to the test set. The model with the highest area under the receiver operating curve (AUROC) was chosen, and important features in the model were identified. Results: The seven tested radiomics models, including light gradient boosting machine, support vector classifier, random forest, logistic regression, extreme gradient boosting, gradient boosting machine, and decision tree, showed AUROC values of 0.817, 0.807, 0.783, 0.779, 0.767, 0.762, and 0.672, respectively. The light gradient boosting machine with the highest AUROC, albeit without statistically significant differences from the other models in pairwise comparisons, had accuracy, precision, recall, and F1 scores of 0.795, 0.818, 0.931, and 0.871, respectively. Radiomic features, including the putamen and ventral diencephalon, were ranked as the most important for suggesting JME. Conclusion: Radiomic models using MRI were able to differentiate JME from HCs.
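The accuracy, precision, recall, and F1 scores quoted above are all functions of a single confusion matrix, which a short Python sketch can make concrete. The abstract does not report the confusion matrix; the counts used below (27 true positives, 6 false positives, 2 false negatives, 4 true negatives) are an assumption, chosen only because they sum to the 39 test cases and reproduce the four reported values after rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Assumed counts (not from the paper), consistent with n = 39 in the test set.
metrics = classification_metrics(tp=27, fp=6, fn=2, tn=4)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.795, 'precision': 0.818, 'recall': 0.931, 'f1': 0.871}
```

With roughly three JME patients for every healthy control, a high recall alongside a lower accuracy is what one would expect if most errors are healthy controls classified as JME, which is the pattern these assumed counts show.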

Studies on the Environmentally and Ecologically Stable Revegetation Measures on Rock Cut-Slopes - Availability of Forest Topsoil as a Hydroseeding Material in Greenhouse Experiment - (암절취(岩切取) 훼손(毁損)비탈면에 대한 환경생태적(環境生態的)으로 안정(安定)된 녹화공법(綠化工法)에 관(關)한 연구(硏究)(I) -산림표층토(山林表層土)를 이용(利用)한 녹화토(綠化土)의 효능분석(效能分析)을 위한 실내실험(室內實驗)-)

  • Woo, Bo-Myeong;Kim, Kyung-Hoon
    • Journal of Korean Society of Forest Science / v.87 no.2 / pp.308-315 / 1998
  • This study was conducted to evaluate the availability of forest topsoil as a source of "Native-soil" (seed-fertilizer-soil materials) for hydroseeding, an environmentally and ecologically stable revegetation measure for rock cut-slopes. Soil sampling and factorial experiments with a split-plot design (main plot: forest soil type and soil spraying thickness; subplot: seeding rate) were carried out in 1996. The results are summarized as follows. Because of competition between the seeded (introduced) species and the native species, the number of naturally emerged species was $5{\sim}9$ species/$0.07\,m^2$ in the non-seeded plot and $2{\sim}6$ species/$0.07\,m^2$ in the seeded plot. As the seeding rate of introduced species increased, the appearance ratio of naturally emerged species decreased. The total number of individuals was high in the plot that used coniferous forest soil as a seed source; however, the ratio of individuals of naturally emerged species was high (30%) in the plot that used deciduous forest soil. Using forest topsoil as a seed bank source in the "Native-soil" materials for hydroseeding could reduce the seeding rate to $1,000$ seedlings/$m^2$. Considering competition between seeded and naturally emerged species, dryness of soil materials, and seed burial, a spraying thickness of more than 5 cm was suitable for the growth of a variety of plants.

Effects of Application of Controlled Release Fertilizer Blended with Different Nitrogen Releasing Latex Coated Ureas on Rice Growth and Grain Quality (질소 용출속도가 다른 피복요소를 혼합한 완효성비료 시용이 벼 생육 및 쌀 품질에 미치는 영향)

  • Lee, Dong-Wook;Park, Ki-Do;Park, Chang-Young;Kang, Ui-Gum;Son, Il-Soo;Park, Sung-Tae
    • Korean Journal of Crop Science / v.52 no.3 / pp.311-319 / 2007
  • This study was conducted to estimate the effects of applying a controlled release complex fertilizer with latex coated urea (LCU-complex) on the growth and grain quality of rice under direct seeding on dry paddy (DS) and transplanting on flooded paddy (TP). Three types of latex coated urea with different nitrogen (N) release rates were used: LCU40, LCU80 and LCU100. N release from the LCU formulations in water at both 20 and $30^{\circ}C$ was fastest in the order LCU40, LCU80, LCU blend (LCU40, LCU80 and LCU100 mixed in a ratio of 2:2:1), and LCU100. The number of tillers and dry matter weight were greatest in the order LCU-complex 100% > LCU-complex 80% > urea, and plant height did not differ significantly. Grain yields with LCU-complex 80% in both DS and TP plots were similar to those with urea application. N recovery with LCU-complex 80% and 100% was 8 and 6% higher, respectively, than with conventional split application of urea in the DS plot, and 9 and 4% higher in TP. Grain protein content with LCU-complex was 0.8% lower than with urea in DS and $0.1{\sim}0.7\%$ lower in TP. Amylose content and the Mg/K ratio in rice grain did not differ significantly. Consequently, applying an LCU-complex blended from coated ureas with different N release rates can reduce N input by 20% without yield reduction and improve grain quality compared with urea application.

Influence of Growth Location And Cutting Managements on Macro-And Microelements in Temperate Grasses (주요 화본과 목초에 있어서 재배지역 및 예취관리가 다량 및 미량요소 함량에 미치는 영향)

  • 김정갑;황석중
    • Journal of The Korean Society of Grassland and Forage Science / v.6 no.3 / pp.145-150 / 1986
  • The experiments were conducted to study the influence of growth location and cutting management on macro- and microelements in temperate grasses in Korea and West Germany from 1975 to 1979. The field trials were designed as a split plot with three grass species (Dactylis glomerata L., Lolium perenne L. and Festuca pratensis Huds.) under three cutting regimes at grazing stage, silage stage and hay stage. The results obtained are summarized as follows: 1. Concentrations of macro- and microelements in temperate grasses responded differently to growth location and growing season. P concentration in the plants decreased under heat stress in summer, whereas Mg and Na tended to increase. The seasonal changes in K and Zn were not significant. 2. Morphological growth stage was found to be an important factor influencing mineral composition. P and K contents in temperate grasses tended to decrease with morphological development, especially under high temperature in Suweon and Cheju. Ca and Mg were less affected by morphological stage and cutting management. 3. Mean Ca/P ratios in the plants were 1.58, 1.33 and 1.21 for meadow fescue, perennial ryegrass and orchardgrass, respectively. The Ca/P ratio in grasses tended to increase with morphological development. 4. Zn deficiency in the plants occurred in all grass species and experimental sites. Mean Zn concentrations of the plants were 34.2%, 31.2% and 37.8% for Suweon, Cheju and Taekwalyong, respectively. Na deficiency occurred in orchardgrass and meadow fescue, especially in Taekwalyong. Cool temperature resulted in a decrease of Na absorption and accumulation.

Performance of Investment Strategy using Investor-specific Transaction Information and Machine Learning (투자자별 거래정보와 머신러닝을 활용한 투자전략의 성과)

  • Kim, Kyung Mock;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.65-82 / 2021
  • Stock market investors are generally split into foreign investors, institutional investors, and individual investors. Compared to individual investors, professional investor groups such as foreign investors have an advantage in information and financial power and, as a result, foreign investors are known to show good investment performance among market participants. The purpose of this study is to propose an investment strategy that combines investor-specific transaction information and machine learning, and to analyze the portfolio investment performance of the proposed model using actual stock price and investor-specific transaction data. The Korea Exchange provides securities firms with daily information on each investor group's purchase and sale volumes. We developed a data collection program in the C# programming language using an API provided by Daishin Securities Cybosplus, and collected daily opening price, closing price and investor-specific net purchase data for 151 of the 200 KOSPI stocks from January 2, 2007 to July 31, 2017. The self-organizing map model is an artificial neural network that performs clustering by unsupervised learning and was introduced by Teuvo Kohonen in 1984. Artificial neurons within a layer compete with one another, and all connections are non-recursive, running from bottom to top. The network can be expanded to multiple layers, although a single layer is commonly used. Linear functions are used as the activation functions of the artificial neurons, and the learning rules are the Instar rule as well as general competitive learning. The backpropagation model is an artificial neural network that performs classification by supervised learning. We grouped and transformed investor-specific transaction volume data with the self-organizing map model and used the result to train the backpropagation models. Based on the predictions for the verification data after training, the portfolios were rebalanced monthly. For performance analysis, a passive portfolio was designated, and the KOSPI 200 and KOSPI index returns were also obtained as proxies for market returns. Performance analysis was conducted using the equally-weighted portfolio return, compound interest rate, annual return, Maximum Draw Down (MDD), standard deviation, and Sharpe Ratio. Buy-and-hold returns of the top 10 market capitalization stocks were designated as the benchmark, since buy and hold is the best strategy under the efficient market hypothesis. The prediction rate of the backpropagation model was very high on the learning data at 96.61%, and relatively high on the verification data at 57.1%. The performance of the self-organizing map grouping can be judged from the results of the backpropagation model, because poor grouping by the self-organizing map would have led to poor learning by the backpropagation model. On this basis, the machine learning models are judged to have learned better than in previous studies. Our portfolio returned twice as much as the benchmark and outperformed the market returns of the KOSPI and KOSPI 200 indexes. The portfolio risk indicators, MDD and standard deviation, were also better than those of the benchmark. The Sharpe Ratio was higher than those of the benchmark and the stock market indexes. Through this, we presented a direction for portfolio composition programs using machine learning and investor-specific transaction information and showed that they can be used to develop programs for real stock investment. The reported return is the result of composing the portfolio monthly and rebalancing assets to equal proportions. Better outcomes are expected in a monthly portfolio if stocks that continue to be suggested are rebalanced without being sold and re-bought. Therefore, the strategy appears applicable to real transactions.
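Two of the performance measures named in this abstract, Maximum Draw Down and the Sharpe Ratio, are easy to state precisely in code. The Python sketch below uses common textbook definitions and is not taken from the paper: the function names, the zero risk-free rate, the sample standard deviation, and the square-root-of-12 annualization for monthly returns are all assumptions, and the return series is made up.

```python
import math
import statistics


def cumulative_values(returns, start=1.0):
    """Portfolio value path obtained by compounding periodic returns."""
    values = [start]
    for r in returns:
        values.append(values[-1] * (1 + r))
    return values


def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=12):
    """Mean excess return over its sample standard deviation, annualized.

    risk_free is the per-period risk-free rate (assumed 0 by default).
    """
    excess = [r - risk_free for r in returns]
    per_period = statistics.mean(excess) / statistics.stdev(excess)
    return per_period * math.sqrt(periods_per_year)


# Made-up monthly returns for illustration only.
monthly_returns = [0.04, -0.02, 0.03, -0.05, 0.06, 0.01]
path = cumulative_values(monthly_returns)
mdd = max_drawdown(path)
sharpe = sharpe_ratio(monthly_returns)
```

A lower MDD and a higher Sharpe Ratio than the benchmark, as the abstract reports, mean the portfolio both fell less from its peaks and earned more return per unit of volatility.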