• 제목/요약/키워드: elastic net regression

검색결과 29건 처리시간 0.025초

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics
    • /
    • 제17권4호
    • /
    • pp.47.1-47.12
    • /
    • 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.

Predicting Reports of Theft in Businesses via Machine Learning

  • JungIn, Seo;JeongHyeon, Chang
    • International Journal of Advanced Culture Technology
    • /
    • 제10권4호
    • /
    • pp.499-510
    • /
    • 2022
  • This study examines the reporting factors of crime against business in Korea and proposes a corresponding predictive model using machine learning. While many previous studies focused on the individual factors of theft victims, there is a lack of evidence on the reporting factors of crime against a business that serves the public good as opposed to those that protect private property. Therefore, we proposed a crime prevention model for the willingness factor of theft reporting in businesses. This study used data collected through the 2015 Commercial Crime Damage Survey conducted by the Korea Institute for Criminal Policy. It analyzed data from 834 businesses that had experienced theft during a 2016 crime investigation. The data showed a problem with unbalanced classes. To solve this problem, we jointly applied the Synthetic Minority Over Sampling Technique and the Tomek link techniques to the training data. Two prediction models were implemented. One was a statistical model using logistic regression and elastic net. The other involved a support vector machine model, tree-based machine learning models (e.g., random forest, extreme gradient boosting), and a stacking model. As a result, the features of theft price, invasion, and remedy, which are known to have significant effects on reporting theft offences, can be predicted as determinants of such offences in companies. Finally, we verified and compared the proposed predictive models using several popular metrics. Based on our evaluation of the importance of the features used in each model, we suggest a more accurate criterion for predicting var.

Modelling the deflection of reinforced concrete beams using the improved artificial neural network by imperialist competitive optimization

  • Li, Ning;Asteris, Panagiotis G.;Tran, Trung-Tin;Pradhan, Biswajeet;Nguyen, Hoang
    • Steel and Composite Structures
    • /
    • 제42권6호
    • /
    • pp.733-745
    • /
    • 2022
  • This study proposed a robust artificial intelligence (AI) model based on the social behaviour of the imperialist competitive algorithm (ICA) and artificial neural network (ANN) for modelling the deflection of reinforced concrete beams, abbreviated as ICA-ANN model. Accordingly, the ICA was used to adjust and optimize the parameters of an ANN model (i.e., weights and biases) aiming to improve the accuracy of the ANN model in modelling the deflection reinforced concrete beams. A total of 120 experimental datasets of reinforced concrete beams were employed for this aim. Therein, applied load, tensile reinforcement strength and the reinforcement percentage were used to simulate the deflection of reinforced concrete beams. Besides, five other AI models, such as ANN, SVM (support vector machine), GLMNET (lasso and elastic-net regularized generalized linear models), CART (classification and regression tree) and KNN (k-nearest neighbours), were also used for the comprehensive assessment of the proposed model (i.e., ICA-ANN). The comparison of the derived results with the experimental findings demonstrates that among the developed models the ICA-ANN model is that can approximate the reinforced concrete beams deflection in a more reliable and robust manner.

공압기 소비전력에 대한 예측 모형의 비교연구 (A Comparison Study on Forecasting Models for Air Compressor Power Consumption)

  • 김주헌;장문수;김예진;허요섭;정현상;박소영
    • 한국산업융합학회 논문집
    • /
    • 제26권4_2호
    • /
    • pp.657-668
    • /
    • 2023
  • It's important to note that air compressors in the industrial sector are major energy consumers, accounting for a significant portion of total energy costs in manufacturing plants, ranging from 12% to 40%. To address this issue, researchers have compared forecasting models that can predict the power consumption of air compressors. The forecasting models were designed to incorporate variables such as flow rate, pressure, temperature, humidity, and dew point, utilizing statistical methods, machine learning, and deep learning techniques. The model performance was compared using measures such as RMSE, MAE and SMAPE. Out of the 21 models tested, the Elastic Net, a statistical method, proved to be the most effective in power comsumption forecasting.

Drought forecasting over South Korea based on the teleconnected global climate variables

  • Taesam Lee;Yejin Kong;Sejeong Lee;Taegyun Kim
    • 한국수자원학회:학술대회논문집
    • /
    • 한국수자원학회 2023년도 학술발표회
    • /
    • pp.47-47
    • /
    • 2023
  • Drought occurs due to lack of water resources over an extended period and its intensity has been magnified globally by climate change. In recent years, drought over South Korea has also been intensed, and the prediction was inevitable for the water resource management and water industry. Therefore, drought forecasting over South Korea was performed in the current study with the following procedure. First, accumulated spring precipitation(ASP) driven by the 93 weather stations in South Korea was taken with their median. Then, correlation analysis was followed between ASP and Df4m, the differences of two pair of the global winter MSLP. The 37 Df4m variables with high correlations over 0.55 was chosen and sorted into three regions. The selected Df4m variables in the same region showed high similarity, leading the multicollinearity problem. To avoid this problem, a model that performs variable selection and model fitting at once, least absolute shrinkage and selection operator(LASSO) was applied. The LASSO model selected 5 variables which showed a good agreement of the predicted with the observed value, R2=0.72. Other models such as multiple linear regression model and ElasticNet were also performed, but did not present a performance as good as LASSO. Therefore, LASSO model can be an appropriate model to forecast spring drought over South Korea and can be used to mange water resources efficiently.

  • PDF

Machine learning based anti-cancer drug response prediction and search for predictor genes using cancer cell line gene expression

  • Qiu, Kexin;Lee, JoongHo;Kim, HanByeol;Yoon, Seokhyun;Kang, Keunsoo
    • Genomics & Informatics
    • /
    • 제19권1호
    • /
    • pp.10.1-10.7
    • /
    • 2021
  • Although many models have been proposed to accurately predict the response of drugs in cell lines recent years, understanding the genome related to drug response is also the key for completing oncology precision medicine. In this paper, based on the cancer cell line gene expression and the drug response data, we established a reliable and accurate drug response prediction model and found predictor genes for some drugs of interest. To this end, we first performed pre-selection of genes based on the Pearson correlation coefficient and then used ElasticNet regression model for drug response prediction and fine gene selection. To find more reliable set of predictor genes, we performed regression twice for each drug, one with IC50 and the other with area under the curve (AUC) (or activity area). For the 12 drugs we tested, the predictive performance in terms of Pearson correlation coefficient exceeded 0.6 and the highest one was 17-AAG for which Pearson correlation coefficient was 0.811 for IC50 and 0.81 for AUC. We identify common predictor genes for IC50 and AUC, with which the performance was similar to those with genes separately found for IC50 and AUC, but with much smaller number of predictor genes. By using only common predictor genes, the highest performance was AZD6244 (0.8016 for IC50, 0.7945 for AUC) with 321 predictor genes.

평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구 (How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores)

  • 현지연;유상이;이상용
    • 지능정보연구
    • /
    • 제25권1호
    • /
    • pp.219-239
    • /
    • 2019
  • 개인에게 맞춤형 서비스를 제공하는 것이 중요해지면서 개인화 추천 시스템 관련 연구들이 끊임없이 이루어지고 있다. 추천 시스템 중 협업 필터링은 학계 및 산업계에서 가장 많이 사용되고 있다. 다만 사용자들의 평점 혹은 사용 여부와 같은 정량적인 정보에 국한하여 추천이 이루어져 정확도가 떨어진다는 문제가 제기되고 있다. 이와 같은 문제를 해결하기 위해 현재까지 많은 연구에서 정량적 정보 외에 다른 정보들을 활용하여 추천 시스템의 성능을 개선하려는 시도가 활발하게 이루어지고 있다. 리뷰를 이용한 감성 분석이 대표적이지만, 기존의 연구에서는 감성 분석의 결과를 추천 시스템에 직접적으로 반영하지 못한다는 한계가 있다. 이에 본 연구는 리뷰에 나타난 감성을 수치화하여 평점에 반영하는 것을 목표로 한다. 즉, 사용자가 직접 작성한 리뷰를 감성 수치화하여 정량적인 정보로 변환해 추천 시스템에 직접 반영할 수 있는 새로운 알고리즘을 제안한다. 이를 위해서는 정성적인 정보인 사용자들의 리뷰를 정량화 시켜야 하므로, 본 연구에서는 텍스트 마이닝의 감성 분석 기법을 통해 감성 수치를 산출하였다. 데이터는 영화 리뷰를 대상으로 하여 도메인 맞춤형 감성 사전을 구축하고, 이를 기반으로 리뷰의 감성점수를 산출한다. 본 논문에서 사용자 리뷰의 감성 수치를 반영한 협업 필터링이 평점만을 고려하는 전통적인 방식의 협업 필터링과 비교하여 우수한 정확도를 나타내는 것을 확인하였다. 이후 제안된 모델이 더 개선된 방식이라고 할 근거를 확보하기 위해 paired t-test 검증을 시도했고, 제안된 모델이 더 우수하다는 결론을 도출하였다. 본 연구에서는 평점만으로 사용자의 감성을 판단한 기존의 선행연구들이 가지는 한계를 극복하고자 리뷰를 수치화하여 기존의 평점 시스템보다 사용자의 의견을 더 정교하게 추천 시스템에 반영시켜 정확도를 향상시켰다. 이를 기반으로 추가적으로 다양한 분석을 시행한다면 추천의 정확도가 더 높아질 것으로 기대된다.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases
    • /
    • 제86권3호
    • /
    • pp.203-215
    • /
    • 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data: and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM) were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R2 and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R2 value was 0.27 and in set II, LightGBM was the best model with the highest R2 value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R2 value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.

통계적 학습 모형에 기반한 불규칙 맥파 검출 알고리즘 개발 (Development of The Irregular Radial Pulse Detection Algorithm Based on Statistical Learning Model)

  • 배장한;장준수;구본초
    • 대한의용생체공학회:의공학회지
    • /
    • 제41권5호
    • /
    • pp.185-194
    • /
    • 2020
  • Arrhythmia is basically diagnosed with the electrocardiogram (ECG) signal, however, ECG is difficult to measure and it requires expert help in analyzing the signal. On the other hand, the radial pulse can be measured with easy and uncomplicated way in daily life, and could be suitable bio-signal for the recent untact paradigm and extensible signal for diagnosis of Korean medicine based on pulse pattern. In this study, we developed an irregular radial pulse detection algorithm based on a learning model and considered its applicability as arrhythmia screening. A total of 1432 pulse waves including irregular pulse data were used in the experiment. Three data sets were prepared with minimal preprocessing to avoid the heuristic feature extraction. As classification algorithms, elastic net logistic regression, random forest, and extreme gradient boosting were applied to each data set and the irregular pulse detection performances were estimated using area under the receiver operating characteristic curve based on a 10-fold cross-validation. The extreme gradient boosting method showed the superior performance than others and found that the classification accuracy reached 99.7%. The results confirmed that the proposed algorithm could be used for arrhythmia screening. To make a fusion technology integrating western and Korean medicine, arrhythmia subtype classification from the perspective of Korean medicine will be needed for future research.