• Title/Summary/Keyword: lasso


The Prediction Ability of Genomic Selection in the Wheat Core Collection

  • Yuna Kang;Changsoo Kim
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / pp.235-235 / 2022
  • Genomic selection is a promising tool for plant and animal breeding that uses genome-wide molecular marker data to capture quantitative trait loci of both large and small effect and to predict the genetic value of selection candidates. Genomic selection has previously been shown to achieve higher prediction accuracy than conventional marker-assisted selection (MAS) for quantitative traits. In this study, prediction accuracy for 10 agricultural traits was compared in a wheat core collection of 567 accessions. We used a cross-validation approach to train models and validate prediction accuracy in order to evaluate the effects of training population size and training model. Among the six models used (GBLUP, LASSO, BayesA, RKHS, SVM, RF), all except SVM achieved prediction accuracies of 0.4 or higher for almost all traits. For traits such as days to heading and days to maturity, prediction accuracy was very high, above 0.8. With respect to the training population, prediction accuracy increased with training population size for all traits. Prediction accuracy also differed among training populations according to their genetic composition, independent of their size. All training models were validated with 5-fold cross-validation. To verify the predictive ability of the training population built from the wheat core collection, we compared actual phenotypes with genomic estimated breeding values (GEBVs) for 35 individuals from a breeding population. Of the 10 individuals with the earliest heading, 5 were also selected by genomic selection, and of the 10 individuals with the latest heading, 6 were selected by genomic selection. These results indicate that genomic selection makes it possible to select individuals for target traits from genotype data alone, in a shorter period of time.
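The abstract does not state the software, hyperparameters, or accuracy metric used. As a minimal sketch, the code below assumes ridge regression on marker genotypes (rrBLUP-style uniform shrinkage, often treated as a marker-based counterpart of GBLUP) as a stand-in for the six models, measures accuracy as the Pearson correlation between observed and predicted values in 5-fold cross-validation, and counts top-k overlap in the spirit of the 35-individual check. The shrinkage parameter `lam`, the marker count, and the simulated genotypes are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge regression on marker genotypes (rrBLUP-style uniform shrinkage)."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # center marker codes so the intercept is the phenotype mean
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y_mean))
    return x_mean, y_mean, beta


def ridge_predict(model, X):
    """Genomic estimated breeding values (GEBVs) for new genotypes."""
    x_mean, y_mean, beta = model
    return y_mean + (X - x_mean) @ beta


def kfold_accuracy(X, y, lam=10.0, k=5, seed=0):
    """Mean Pearson correlation between observed and predicted values over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = ridge_fit(X[train_idx], y[train_idx], lam)
        pred = ridge_predict(model, X[test_idx])
        accuracies.append(np.corrcoef(pred, y[test_idx])[0, 1])
    return float(np.mean(accuracies))


def top_k_overlap(observed, predicted, k=10, largest=False):
    """Count individuals ranked among the k lowest (or highest) by both phenotype and GEBV."""
    obs_order = np.argsort(observed)
    pred_order = np.argsort(predicted)
    if largest:
        obs_order, pred_order = obs_order[::-1], pred_order[::-1]
    return len(set(obs_order[:k].tolist()) & set(pred_order[:k].tolist()))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 567, 500  # 567 accessions as in the study; 500 markers is hypothetical
    X = rng.integers(0, 3, size=(n, p)).astype(float)  # simulated 0/1/2 genotype codes
    effects = np.zeros(p)
    effects[:20] = rng.normal(0.0, 1.0, 20)  # 20 hypothetical QTL
    y = X @ effects + rng.normal(0.0, 2.0, n)
    print("5-fold accuracy:", round(kfold_accuracy(X, y), 3))
```

For days to heading, "earliest" corresponds to `largest=False` (smallest values) and "latest" to `largest=True`.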


Prediction of Venous Trans-Stenotic Pressure Gradient Using Shape Features Derived From Magnetic Resonance Venography in Idiopathic Intracranial Hypertension Patients

  • Chao Ma;Haoyu Zhu;Shikai Liang;Yuzhou Chang;Dapeng Mo;Chuhan Jiang;Yupeng Zhang
    • Korean Journal of Radiology / v.25 no.1 / pp.74-85 / 2024
  • Objective: Idiopathic intracranial hypertension (IIH) is a condition of unknown etiology associated with venous sinus stenosis. This study aimed to develop a magnetic resonance venography (MRV)-based radiomics model for predicting a high trans-stenotic pressure gradient (TPG) in IIH patients diagnosed with venous sinus stenosis. Materials and Methods: This retrospective study included 105 IIH patients (median age [interquartile range], 35 years [27-42 years]; female:male, 82:23) who underwent MRV and catheter venography complemented by venous manometry. Contrast-enhanced MRV was performed on a 1.5-Tesla system, and the images were reconstructed using a standard algorithm. Shape features were derived from MRV images with the PyRadiomics package and selected using the least absolute shrinkage and selection operator (LASSO) method. A radiomics score for predicting high TPG (≥ 8 mmHg) in IIH patients was formulated using multivariable logistic regression; its discrimination performance was assessed using the area under the receiver operating characteristic curve (AUROC). A nomogram was constructed by incorporating the radiomics scores and clinical features. Results: Data from 105 patients were randomly divided into two distinct datasets for model training (n = 73; 50 and 23 with and without high TPG, respectively) and testing (n = 32; 22 and 10 with and without high TPG, respectively). Three informative shape features were identified in the training dataset: least axis length, sphericity, and maximum three-dimensional diameter. The radiomics score for predicting high TPG in IIH patients demonstrated an AUROC of 0.906 (95% confidence interval, 0.836-0.976) in the training dataset and 0.877 (95% confidence interval, 0.755-0.999) in the test dataset. The nomogram showed good calibration. Conclusion: Our study demonstrates the feasibility of a novel model for predicting high TPG in IIH patients using radiomics analysis of noninvasive MRV-based shape features.
This information may aid clinicians in identifying patients who may benefit from stenting.
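The AUROC reported for the radiomics score equals the probability that a randomly chosen patient with high TPG receives a higher score than a randomly chosen patient without it (the Mann-Whitney formulation). A minimal sketch in Python, assuming binary labels (1 = high TPG) and any real-valued score; it illustrates the metric only and is not the authors' PyRadiomics, LASSO, and logistic regression pipeline.

```python
def auroc(scores, labels):
    """AUROC as P(score of a positive case > score of a negative case).

    Ties count as one half, matching the Mann-Whitney U statistic.
    Runs in O(n_pos * n_neg), which is fine for cohorts of ~100 patients.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```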

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.101-106 / 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is a promising strategy. The problem with existing feature selection approaches is that each method yields a different set of features, which affects model accuracy, and current methods do not perform well on large multidimensional datasets. We introduce a novel model with a feature selection approach that selects optimal features from large multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To ensure the success of the proposed model, we balanced the classes by applying hybrid class-balancing sampling methods to the original dataset, together with data pre-processing and data transformation methods, to provide reliable data for training. We ran and assessed the model on datasets with binary and multi-class labels, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected by a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that these methods select in common. The accuracy on the original dataset before applying the framework is recorded and compared with the accuracy on the reduced attribute set, and the results are shown separately for comparison. Based on the results, we conclude that the proposed model achieved higher accuracy on multi-class datasets than on binary-class datasets.[1]
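The abstract describes a final attribute set obtained by voting over the outputs of six selectors but does not give the voting threshold. The sketch below assumes a simple vote count with a hypothetical `min_votes` parameter; setting `min_votes` to the number of selectors reduces it to a strict intersection. The selector outputs and feature names are illustrative only.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` selectors.

    `selections` maps a selector name to the collection of feature names it kept.
    """
    counts = Counter()
    for chosen in selections.values():
        counts.update(set(chosen))  # each selector votes at most once per feature
    return sorted(f for f, c in counts.items() if c >= min_votes)


if __name__ == "__main__":
    # Hypothetical selector outputs on a diabetes-like dataset.
    selections = {
        "lasso_cv": {"glucose", "bmi", "age"},
        "decision_tree": {"glucose", "insulin", "age"},
        "random_forest": {"glucose", "bmi", "insulin"},
        "gradient_boosting": {"glucose", "age"},
        "adaboost": {"glucose", "bmi"},
        "sgd": {"glucose", "age", "pregnancies"},
    }
    print(vote_features(selections, min_votes=4))  # ['age', 'glucose']
```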

How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals becomes more important, research on personalized recommendation systems is constantly being carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, it has a limitation: recommendations have mostly been based on quantitative information such as users' ratings, which lowers accuracy. To address this problem, many studies have attempted to improve the performance of recommendation systems by using information other than quantitative ratings; a good example is sentiment analysis of customer review text. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. This study therefore aims to reflect the sentiments expressed in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, sentiment scores were calculated with a text-mining sentiment analysis technique. The data consisted of movie reviews, from which a domain-specific sentiment dictionary was constructed. Regression analysis was used to construct the sentiment dictionary: positive and negative dictionaries were each built using Lasso regression, Ridge regression, and ElasticNet. The accuracy of each dictionary was verified with a confusion matrix: 70% for the Lasso-based dictionary, 79% for the Ridge-based dictionary, and 83% for the ElasticNet-based dictionary (α = 0.3).
Therefore, in this study, the sentiment score of each review is calculated with the ElasticNet-based dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that considers only the existing rating. To demonstrate this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and root mean square error (RMSE) were calculated to compare a recommendation system using ratings combined with sentiment scores against a system that considers ratings only. In terms of MAE, performance improved by 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++. In terms of RMSE, the improvements were 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. Thus, ratings that reflect the proposed sentiment score yield better prediction performance than conventional ratings; that is, collaborative filtering that reflects the sentiment scores of user reviews is more accurate than conventional collaborative filtering that considers only quantitative scores. A paired t-test further confirmed that the proposed model is the better approach. To overcome the limitation of previous research that judges users' sentiment only by quantitative rating scores, this study quantified reviews so that users' opinions could be reflected in finer detail in the recommendation system, improving its accuracy.
The findings of this study are expected to have managerial implications for recommendation system developers who need to consider both quantitative and qualitative information. The method for constructing the combined system presented in this paper could be used directly by such developers.
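The abstract reports MAE and RMSE improvements but not the exact rule for merging review sentiment with ratings. The sketch below shows how the two error metrics are computed, together with a hypothetical lexicon-based review score (mean dictionary weight of matched tokens) and a hypothetical additive blend; `weight`, the 1 to 10 rating scale, and clipping are assumptions, not the authors' formula.

```python
import math


def sentiment_score(tokens, lexicon):
    """Mean lexicon weight of the tokens found in the dictionary (0.0 if none match)."""
    weights = [lexicon[t] for t in tokens if t in lexicon]
    return sum(weights) / len(weights) if weights else 0.0


def combined_rating(rating, score, weight=0.5, scale=(1.0, 10.0)):
    """Hypothetical blend: shift the rating by the weighted score, then clip to the scale."""
    lo, hi = scale
    return min(hi, max(lo, rating + weight * score))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error; penalizes large errors more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


if __name__ == "__main__":
    lexicon = {"great": 1.0, "boring": -0.5}  # hypothetical dictionary weights
    score = sentiment_score(["great", "boring", "plot"], lexicon)
    print(combined_rating(8.0, score, weight=2.0))
    print(mae([3, 5], [4, 3]), rmse([3, 5], [4, 3]))
```

A lower MAE or RMSE for predictions built on blended ratings is what the reported improvements for UBCF, IBCF, SVD, and SVD++ refer to.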

Elucidation of the Biosynthetic Pathway of Vitamin B Groups and Potential Secondary Metabolite Gene Clusters Via Genome Analysis of a Marine Bacterium Pseudoruegeria sp. M32A2M

  • Cho, Sang-Hyeok;Lee, Eunju;Ko, So-Ra;Jin, Sangrak;Song, Yoseb;Ahn, Chi-Yong;Oh, Hee-Mock;Cho, Byung-Kwan;Cho, Suhyung
    • Journal of Microbiology and Biotechnology / v.30 no.4 / pp.505-514 / 2020
  • Among complex microbial interactions, the symbiotic relationship between algae and marine bacteria is well studied. Algae and bacteria benefit mutually through the exchange of nutrients and vitamins. Analyzing a bacterium's genome sequence is necessary to predict its symbiotic relationships. In this study, the genome of a marine bacterium, Pseudoruegeria sp. M32A2M, isolated from the south-eastern isles (GeoJe-Do) of South Korea, was sequenced and analyzed. A 5.5 Mb draft genome (91 scaffolds) with a DNA G+C content of 62.4% was obtained. In total, 5,101 features were identified by gene annotation, and 4,927 genes were assigned to functional proteins. We also identified transcription core proteins, RNA polymerase subunits, and sigma factors. In addition, complete flagella-related gene clusters covering the flagellar body, motor, regulator, and other accessory components were detected, even though the genus Pseudoruegeria is known to comprise non-motile bacteria. Examination of annotated KEGG pathways revealed that Pseudoruegeria sp. M32A2M has the metabolic pathways for all seven B vitamins, including thiamin (vitamin B1), biotin (vitamin B7), and cobalamin (vitamin B12), which are necessary for symbiosis with vitamin B-auxotrophic algae. We also identified gene clusters for seven secondary metabolites, including ectoine, homoserine lactone, beta-lactone, terpene, lasso peptide, bacteriocin, and non-ribosomal proteins.

Effects of Herbicide Application on Growth and the Nodulation in Soybean (제초제 처리가 콩의 생육 및 근류형성에 미치는 영향)

  • Jeong-Hae Oh
    • KOREAN JOURNAL OF CROP SCIENCE / v.34 no.3 / pp.303-309 / 1989
  • The present study was conducted to determine the effects of the herbicides Lasso and Devrinol on soybean growth and nodulation under field conditions. Emergence rate decreased in proportion to increasing herbicide concentration regardless of the herbicide used, and was significantly reduced even at the recommended concentration compared with the untreated plot, with marked abnormal symptoms on seedlings. Plant height, fresh weight of the plant, number of internodes, branches, pods, and seeds per plant, and 100-seed weight decreased with increasing herbicide concentration; differences between the untreated plot and the double-concentration plot were highly significant, even though most differences from the recommended concentration were not significant. Nodulation decreased significantly with increasing herbicide concentration. The reduction differed markedly among soybean varieties and appeared consistently from three to six weeks after sowing. A significant correlation was found between the reduction in nodulation and the agronomic characters of soybean, suggesting that reduced nodulation caused by misapplication of the herbicides may be a causal factor in decreased soybean yield.


Investigation on the Key Parameters for the Strengthening Behavior of Biopolymer-based Soil Treatment (BPST) Technology (바이오폴리머-흙 처리(BPST) 기술의 강도 발현 거동에 대한 주요 영향인자 분석에 관한 연구)

  • Lee, Hae-Jin;Cho, Gye-Chum;Chang, Ilhan
    • Land and Housing Review / v.12 no.3 / pp.109-119 / 2021
  • Global warming caused by greenhouse gas emissions has rapidly increased the size and frequency of abnormal climate events and, accordingly, of geotechnical engineering hazards. Biopolymer-based soil treatment (BPST) has been implemented in geotechnical engineering in recent years as an alternative that reduces the carbon footprint. Furthermore, thermo-gelating biopolymers, including agar gum, gellan gum, and xanthan gum, are known to strengthen soils noticeably. However, a detailed evaluation of the factors that significantly influence the strengthening behavior of BPST, and of their correlations, has not yet been carried out. In this study, machine learning regression analysis was performed on laboratory UCS (unconfined compressive strength) data for BPST to evaluate the factors influencing the strengthening behavior of gellan gum-treated soil mixtures. General linear regression, Ridge, and Lasso were used as linear regression methods, and the key factors influencing BPST behavior were identified from RMSE (root mean squared error) and regression coefficient values. The analysis showed that biopolymer concentration and clay content have the most significant influence on the strength of BPST.
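Lasso ranks factors by shrinking weak ones to exactly zero, which is why its coefficients can single out the dominant influences on strength. A minimal NumPy sketch of Lasso by cyclic coordinate descent with soft-thresholding, assuming standardized inputs so coefficient magnitudes are comparable across factors; the factor names, effect sizes, and simulated UCS values are hypothetical, not the study's laboratory data.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=500, tol=1e-8):
    """Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||yc - Xs b||^2 + alpha * ||b||_1, where Xs is X standardized
    column-wise (assumes no constant columns) and yc is y centered, so the intercept is
    mean(y). Returns coefficients on the standardized scale.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    b = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Columns have mean square 1, so this is the univariate fit on the partial residual.
            rho = Xs[:, j] @ resid / n + old
            b[j] = soft_threshold(rho, alpha)
            if b[j] != old:
                resid -= Xs[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 120
    # Hypothetical factors: biopolymer concentration (%), clay content (%), curing days.
    X = np.column_stack([
        rng.uniform(0.5, 3.0, n),
        rng.uniform(0.0, 30.0, n),
        rng.uniform(1.0, 28.0, n),
    ])
    ucs = 400 * X[:, 0] + 15 * X[:, 1] + rng.normal(0, 50, n)  # curing days has no effect here
    for alpha in (0.0, 10.0, 100.0):
        print(alpha, np.round(lasso_cd(X, ucs, alpha), 2))
```

With `alpha=0` the result matches ordinary least squares on the standardized factors; raising `alpha` drives the irrelevant factor to zero first.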

The Spillover Effects of Fluctuations in Apartment Sales Prices in the Capital Region (수도권 아파트 매매가격 변동의 확산효과)

  • Jeong, Jun Ho
    • Journal of the Economic Geographical Society of Korea / v.25 no.1 / pp.147-170 / 2022
  • This article analyzes spillover effects in the weekly rate of return on apartment prices in 70 si-gun-gu (local areas) in the Capital Region over three periods, chosen on the basis of the cycle of fluctuations in apartment sales prices and the timing of the current government's policy interventions: the entire period (April 2008~August 2021), the period before the price surge (April 2008~October 2018), and the period of price surge (November 2018~August 2021). The main results are as follows. Depending on the period, the results of the spillover analysis are either similar to or different from those of existing work. The analysis for the entire period and for the period before the price surge shows that a 'Gangnam' effect exists in the apartment market in the Capital Region. In contrast, the analysis for the period of price surge yields different results. The spillover effect index calculated from a rolling-sample analysis decreases during downturns in the apartment sales price cycle and increases during upturns. Comparing the timing of peaks in the spillover effect index with policy interventions, the government's interventions appear to have taken place after the peak of the index in 2017, before the peaks in 2018 and 2019, and around or after the peaks from 2020 onward.
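The abstract does not name the method behind its spillover effect index. The sketch below assumes a Diebold-Yilmaz style index computed from a forecast error variance decomposition (FEVD), a common construction for spillover indices; estimating the underlying VAR and its FEVD is omitted, and in a rolling-sample analysis the function would simply be applied to each window's FEVD. The three-region matrix is hypothetical.

```python
def spillover_indices(fevd):
    """Total and directional spillover indices (in %) from an N x N FEVD matrix.

    fevd[i][j] is the share of region i's forecast error variance attributed to shocks
    in region j. Rows are normalized to sum to 1, as in the generalized-FEVD variant.
    Returns (total, to_others, from_others).
    """
    n = len(fevd)
    norm = [[v / sum(row) for v in row] for row in fevd]
    # Share of each region's variance coming from all other regions.
    from_others = [100 * (sum(norm[i]) - norm[i][i]) for i in range(n)]
    # Share each region contributes to the variance of all other regions.
    to_others = [100 * (sum(norm[i][j] for i in range(n)) - norm[j][j]) for j in range(n)]
    total = sum(from_others) / n
    return total, to_others, from_others


if __name__ == "__main__":
    # Hypothetical FEVD for three regions (e.g. Gangnam, another Seoul district, a suburb).
    fevd = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.1], [0.2, 0.2, 0.6]]
    total, to_o, from_o = spillover_indices(fevd)
    net = [t - f for t, f in zip(to_o, from_o)]  # positive = net transmitter of shocks
    print(round(total, 2), [round(v, 2) for v in net])
```

A region with a large positive net value, such as a dominant 'Gangnam' transmitter, would show up as a net source of spillovers.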

A Study on the Prediction Models of Used Car Prices for Domestic Brands Using Machine Learning (머신러닝을 활용한 브랜드별 국내 중고차 가격 예측 모델에 관한 연구)

  • Seungjun Yim;Joungho Lee;Choonho Ryu
    • Journal of Service Research and Studies / v.13 no.3 / pp.105-126 / 2023
  • The domestic used car market continues to grow along with used car online platform services, which disclose vehicle specifications, accident history, inspection history, and detailed options to consumers. Most previous studies predicted used car prices using vehicle specifications and a few vehicle options, and found a nonlinear relationship between used car prices and some specification variables. Accordingly, researchers tried to address the nonlinearity with machine learning models. Regression-based machine learning models generally had the advantage of revealing the actual influence and direction of variables, but performed worse on the cost function than decision tree-based machine learning models. This study attempted to predict used car prices for six domestic brands using both vehicle specifications and vehicle options, aiming to combine the advantages of the two types of machine learning models. To this end, we sequentially applied a regression-based machine learning model and a decision tree-based machine learning model. As a result, the practical influence and direction of each variable were identified for each brand, and the best tree-based machine learning model was selected. The implications of this study are as follows. It will help buyers and sellers who use used car online platform services to estimate approximate used car prices, and it is hoped that it will help mitigate problems caused by information asymmetry among users of these platforms.
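The trade-off described above can be illustrated on toy data: a linear fit exposes the direction of an effect through the sign of its slope, while even a single-split regression tree (a stump) can fit a nonlinear price drop more closely. A pure-Python sketch assuming a hypothetical step-shaped relationship between vehicle age and price; it is not the authors' models or data.

```python
def fit_line(x, y):
    """OLS for y = a + b*x; the sign of b gives the direction of the effect."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_stump(x, y):
    """Best single split (a depth-1 regression tree) minimizing squared error.

    Assumes x has at least two distinct values.
    """
    best = None
    for t in sorted(set(x))[:-1]:  # skip the max so the right side is never empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return t, ml, mr


def predict_stump(stump, x):
    t, ml, mr = stump
    return [ml if xi <= t else mr for xi in x]


def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


if __name__ == "__main__":
    age = list(range(1, 11))          # hypothetical vehicle age in years
    price = [20.0] * 5 + [8.0] * 5    # hypothetical price with a sharp drop after year 5
    a, b = fit_line(age, price)
    stump = fit_stump(age, price)
    print("slope:", round(b, 3))
    print("linear MSE:", round(mse(price, [a + b * v for v in age]), 3))
    print("stump MSE:", mse(price, predict_stump(stump, age)))
```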

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases / v.86 no.3 / pp.203-215 / 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models were implemented: the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM). The outcome was the forced expiratory volume in 1 second measured 6 months after surgery. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. In set III, imputation was performed after dataset splitting. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III, and 896 patients in set II. In set I, the R² value was 0.27; in set II, LightGBM was the best model, with the highest R² of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model, with the highest R² of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
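Imputing only after the 70:30 split, with statistics taken from the training portion, keeps test-set information out of the fitted model. The abstract does not say which imputation method was used; the pure-Python sketch below assumes simple per-column mean imputation (with a hypothetical fallback of 0.0 for a column that is entirely missing) and shows how R² and MSE are computed.

```python
import random


def split_70_30(rows, seed=0):
    """Shuffle row indices and split them 70:30 into training and test rows."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * len(rows)))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def fit_mean_imputer(train_rows):
    """Per-column means from the training rows only; None marks a missing value."""
    means = []
    for j in range(len(train_rows[0])):
        observed = [r[j] for r in train_rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace missing values with the training-set column means."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def mse(y, pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)


def r2(y, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

Usage follows the order described in the abstract: call `split_70_30` first, then `fit_mean_imputer` on the training rows, and apply `impute` with those means to both the training and test rows before model fitting and evaluation.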