Search | Korea Science

Identifying the Optimal Machine Learning Algorithm for Breast Cancer Prediction

ByungJoo Kim
- International journal of advanced smart convergence / v.13 no.3 / pp.80-88 / 2024
Breast cancer remains a significant global health burden, necessitating accurate and timely detection for improved patient outcomes. Machine learning techniques have demonstrated remarkable potential in assisting breast cancer diagnosis by learning complex patterns from multi-modal patient data. This study comprehensively evaluates several popular machine learning models, including logistic regression, decision trees, random forests, support vector machines (SVMs), naive Bayes, k-nearest neighbors (KNN), XGBoost, and ensemble methods for breast cancer prediction using the Wisconsin Breast Cancer Dataset (WBCD). Through rigorous benchmarking across metrics like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), we identify the naive Bayes classifier as the top-performing model, achieving an accuracy of 0.974, F1-score of 0.979, and highest AUC of 0.988. Other strong performers include logistic regression, random forests, and XGBoost, with AUC values exceeding 0.95. Our findings showcase the significant potential of machine learning, particularly the robust naive Bayes algorithm, to provide highly accurate and reliable breast cancer screening from fine needle aspirate (FNA) samples, ultimately enabling earlier intervention and optimized treatment strategies.
https://doi.org/10.7236/IJASC.2024.13.3.80
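
The metrics named in the abstract above can be illustrated with a minimal sketch. It assumes the usual confusion-matrix definitions of accuracy, precision, recall, and F1, and a rank-based estimate of AUC. The function names and all counts and scores are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = classification_metrics(tp=40, fp=5, fn=10, tn=45)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(metrics, auc)
```

The pairwise AUC is quadratic in the number of samples, which is fine for a dataset of a few hundred cases but would be replaced by a sort-based version for larger ones.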

Comparison Study of Multi-class Classification Methods

Bae, Wha-Soo;Jeon, Gab-Dong;Seok, Kyung-Ha
- Communications for Statistical Applications and Methods / v.14 no.2 / pp.377-388 / 2007
Among multi-class classification methods, the ECOC (Error Correcting Output Coding) method is known to have a low classification error rate. This paper aims to suggest an effective multi-class classification method (1) by comparing various encoding and decoding methods within ECOC and (2) by comparing ECOC with direct classification. Both the SVM (Support Vector Machine) and the logistic regression model were used as binary classifiers in the comparison.
https://doi.org/10.5351/CKSS.2007.14.2.377
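
As a rough illustration of the encoding and decoding steps that ECOC involves, the sketch below assigns each class a binary codeword and decodes a vector of binary-classifier outputs to the class with the nearest codeword in Hamming distance. The 7-bit code matrix for four classes and the names `CODE_MATRIX`, `hamming`, and `decode` are assumptions made for this example. The abstract does not say which encodings or decoders the paper compared.

```python
# Hypothetical code matrix: one row (codeword) per class, one column per
# binary classifier. Every pair of rows differs in 4 positions, so a single
# wrong classifier output can still be corrected.
CODE_MATRIX = {
    "class1": (1, 1, 1, 1, 1, 1, 1),
    "class2": (0, 0, 0, 0, 1, 1, 1),
    "class3": (0, 0, 1, 1, 0, 0, 1),
    "class4": (0, 1, 0, 1, 0, 1, 0),
}


def hamming(a, b):
    """Number of positions in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(outputs, code_matrix=CODE_MATRIX):
    """Return the class whose codeword is closest to the classifier outputs."""
    return min(code_matrix, key=lambda label: hamming(code_matrix[label], outputs))


# Outputs of the 7 binary classifiers, with the first bit flipped by mistake.
noisy = (1, 0, 0, 0, 1, 1, 1)
print(decode(noisy))
```

Hamming decoding treats every classifier as equally reliable. Decoders that weight each classifier's margin or probability are a common alternative.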

Flicker Measurement based on SVR for Fixed-Speed Wind Generator Systems

Van, Tan Luong;Lee, Dong-Choon
- Proceedings of the KIPE Conference / 2009.11a / pp.117-119 / 2009
This paper presents a simulation model based on support vector regression (SVR) for estimating flicker emission from wind turbines. Training patterns are developed by varying the wind speed and the network parameters that might affect the expected flicker levels. A comparison is carried out for the fixed-speed wind turbine (WT), which leads to the conclusion that the factors mentioned above influence flicker emission differently. The simulation results show that the flicker estimation is accurate.

Estimating software development cost using machine-learning approach (학습이론을 이용한 소프트웨어 개발비 예측 모형)

Park, Chan-Kyoo
- Korea Society of IT Services: Conference Proceedings (한국IT서비스학회:학술대회논문집) / 2005.11a / pp.345-355 / 2005
As the share of the information systems (IS) budget in the total government budget grows, cost estimation for IS development and maintenance projects is recognized as one of the most important problems to be resolved for quantitative and efficient management of the IS budget. The primary concern in the cost estimation of IS projects is estimating software development cost. In this paper, we propose a new method to estimate software cost using support vector regression (SVR), which has attracted considerable attention because of its good performance and theoretical clarity. This paper is the first study to apply SVR to software cost estimation.

APPLICATION OF SUPPORT VECTOR MACHINE TO THE PREDICTION OF GEO-EFFECTIVE HALO CMES

Choi, Seong-Hwan;Moon, Yong-Jae;Vien, Ngo Anh;Park, Young-Deuk
- Journal of The Korean Astronomical Society / v.45 no.2 / pp.31-38 / 2012
In this study we apply the Support Vector Machine (SVM) to the prediction of geo-effective halo coronal mass ejections (CMEs). The SVM, a machine learning algorithm, is used for classification and regression analysis. We use halo and partial halo CMEs from January 1996 to April 2010 in the SOHO/LASCO CME Catalog for training and prediction. We also use their associated X-ray flare classes to identify front-side halo CMEs (stronger than B1 class), and the Dst index to determine geo-effective halo CMEs (stronger than -50 nT). Combinations of the speed and angular width of CMEs and their associated X-ray classes are used as input features of the SVM. We attempt to find the best model by cross-validation over the kernel functions of the SVM and their parameters. As a result, we obtain the following statistical parameters for the best model, which uses the speed of the CME and its associated X-ray flare class as input features: Accuracy=0.66, PODy=0.76, PODn=0.49, FAR=0.72, Bias=1.06, CSI=0.59, TSS=0.25. The statistical parameters obtained with the SVM are much better than those from simple classifications based on constant classifiers.
https://doi.org/10.5303/JKAS.2012.45.2.31
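
The statistical parameters listed in the abstract above are forecast-verification scores. The sketch below shows how such scores are commonly computed from a 2x2 contingency table, assuming the conventional definitions (for example, FAR as the false alarm ratio and TSS as PODy + PODn - 1). The abstract does not state the paper's exact definitions, so these formulas are an assumption, and the function name and counts are hypothetical.

```python
def verification_scores(hits, false_alarms, misses, correct_nulls):
    """Common yes/no forecast-verification scores from a contingency table.

    Assumes every denominator below is non-zero.
    """
    total = hits + false_alarms + misses + correct_nulls
    pody = hits / (hits + misses)                    # probability of detection, "yes"
    podn = correct_nulls / (false_alarms + correct_nulls)  # probability of detection, "no"
    return {
        "accuracy": (hits + correct_nulls) / total,
        "PODy": pody,
        "PODn": podn,
        "FAR": false_alarms / (hits + false_alarms),     # false alarm ratio
        "Bias": (hits + false_alarms) / (hits + misses),  # forecast yes / observed yes
        "CSI": hits / (hits + false_alarms + misses),     # critical success index
        "TSS": pody + podn - 1.0,                         # true skill statistic
    }


# Hypothetical counts, for illustration only.
scores = verification_scores(hits=30, false_alarms=10, misses=10, correct_nulls=50)
for name, value in scores.items():
    print(f"{name}={value:.2f}")
```

TSS is often preferred over accuracy for rare events because it is not inflated by a large number of correct "no" forecasts.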

Feasibility Evaluation of High-Tech New Product Development Projects Using Support Vector Machines

Shin, Teak-Soo;Noh, Jeon-Pyo
- Proceedings of the Korea Intelligent Information System Society Conference / 2005.11a / pp.241-250 / 2005
New product development (NPD) is defined as the transformation of a market opportunity and a set of assumptions about product technology into a product available for sale. Managers charged with project selection decisions in the NPD process, such as go/no-go choices and specific resource allocation decisions, are faced with a complicated problem. Therefore, the ability to develop successful new products has been identified as a major determinant in sustaining a firm's competitive advantage. The purpose of this study is to develop a new evaluation model for NPD project selection in the high-tech industry using support vector machines (SVM). The evaluation model is developed in two phases. In the first phase, a binary (go/no-go) classification prediction model, i.e., an SVM for high-tech NPD project selection, is developed. In the second phase, using the predicted output value of the SVM, a feasibility grade is calculated for the final NPD project decision. In this study, the feasibility grades are also divided into three levels. We assume that the frequency of NPD project cases is symmetrically determined according to the feasibility grades and that misclassification errors are partially minimized by the multiple grades. However, the horizon of grade level can be changed by firms' NPD strategy. Our proposed feasibility grade method is more reasonable for NPD decision problems because it considers, in particular, the risk factor of NPD from the viewpoint of future NPD success probability. In our empirical study using Korean NPD cases, the SVM significantly outperformed ANN and logistic regression as benchmark models in hit ratio. The feasibility grades generated from the predicted output value of the SVM also showed that they can offer a useful guideline for NPD project selection.

Comparative Assessment of Linear Regression and Machine Learning for Analyzing the Spatial Distribution of Ground-level NO₂ Concentrations: A Case Study for Seoul, Korea (서울 지역 지상 NO₂ 농도 공간 분포 분석을 위한 회귀 모델 및 기계학습 기법 비교)

Kang, Eunjin;Yoo, Cheolhee;Shin, Yeji;Cho, Dongjin;Im, Jungho
- Korean Journal of Remote Sensing / v.37 no.6_1 / pp.1739-1756 / 2021
Atmospheric nitrogen dioxide (NO₂) is mainly caused by anthropogenic emissions. It contributes to the formation of secondary pollutants and ozone through chemical reactions, and adversely affects human health. Although ground stations that monitor NO₂ concentrations in real time are operated in Korea, it is difficult to analyze the spatial distribution of NO₂ concentrations from them, especially over areas with no stations. Therefore, this study conducted a comparative experiment on spatial interpolation of NO₂ concentrations for the year 2020, based on two linear-regression methods (i.e., multiple linear regression (MLR) and regression kriging (RK)) and two machine learning approaches (i.e., random forest (RF) and support vector regression (SVR)). The four approaches were compared using leave-one-out cross-validation (LOOCV). The daily LOOCV results showed that MLR, RK, and SVR produced an average daily index of agreement (IOA) of 0.57, which was higher than that of RF (0.50). The average daily normalized root mean square error of RK was 0.9483%, which was slightly lower than those of the other models. MLR, RK, and SVR showed similar seasonal distribution patterns, and the dynamic range of the resultant NO₂ concentrations from these three models was similar, while that from RF was relatively small. The multivariate linear regression approaches are expected to be promising methods for spatial interpolation of ground-level NO₂ concentrations and other parameters in urban areas.
https://doi.org/10.7780/kjrs.2021.37.6.1.21
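
The evaluation procedure named in the abstract above (LOOCV scored with the index of agreement) can be sketched as follows. The sketch assumes Willmott's definition of the IOA and uses a deliberately trivial predictor, the mean of the remaining stations, in place of the MLR, RK, RF, and SVR models. The station values and all function names are hypothetical.

```python
def index_of_agreement(observed, predicted):
    """Willmott's index of agreement; 1.0 means a perfect match."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    num = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(observed, predicted))
    # den is zero only if every value equals the observed mean.
    return 1.0 - num / den if den else 1.0


def loocv_predictions(values, predict):
    """Predict each station's value from all the other stations."""
    predictions = []
    for i in range(len(values)):
        training = values[:i] + values[i + 1:]  # hold out station i
        predictions.append(predict(training))
    return predictions


def mean_predictor(training):
    """Placeholder model: the average of the training stations."""
    return sum(training) / len(training)


# Hypothetical station values, for illustration only.
station_no2 = [10.0, 20.0, 30.0, 40.0]
held_out = loocv_predictions(station_no2, mean_predictor)
print(held_out, index_of_agreement(station_no2, held_out))
```

A real interpolation model would take station coordinates and covariates as inputs. The loop structure stays the same, with `predict` fitting on the training stations and estimating the held-out one.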

A Comparative Study of Estimation by Analogy using Data Mining Techniques

Nagpal, Geeta;Uddin, Moin;Kaur, Arvinder
- Journal of Information Processing Systems / v.8 no.4 / pp.621-652 / 2012
Software estimations provide an inclusive set of directives for software project developers, project managers, and management in order to produce more realistic estimates based on deficient, uncertain, and noisy data. A range of estimation models are being explored in industry, as well as in academia for research purposes, but choosing the best model is quite intricate. Estimation by Analogy (EbA) is a form of case-based reasoning, which uses fuzzy logic, grey system theory, or machine-learning techniques, etc. for optimization. This research compares the estimation accuracy of some conventional data mining models with a hybrid model. Different data mining models are considered, including linear regression models such as ordinary least squares and ridge regression, and nonlinear models such as neural networks, support vector machines, and multivariate adaptive regression splines. A precise and comprehensible predictive model based on the integration of GRA and regression has been introduced and compared. Empirical results have shown that regression, when used with GRA, gives outstanding results, indicating that the methodology has great potential and can be used as a candidate approach for software effort estimation.
https://doi.org/10.3745/JIPS.2012.8.4.621

Predicting the Young's modulus of frozen sand using machine learning approaches: State-of-the-art review

Reza Sarkhani Benemaran;Mahzad Esmaeili-Falak
- Geomechanics and Engineering / v.34 no.5 / pp.507-527 / 2023
Accurate estimation of geo-mechanical parameters in Artificial Ground Freezing (AGF) is one of the most important scientific topics in soil improvement and geotechnical engineering. One approach is to use classical and conventional constitutive models based on different theories such as critical state theory and Hooke's law, which are time-consuming, costly, and troublesome. The other is to apply artificial intelligence (AI) techniques to predict the parameters and behaviors of interest accurately. This study presents a comprehensive data-mining-based model for predicting the Young's Modulus of frozen sand under the triaxial test. For this purpose, several single and hybrid models were considered, including additive regression, bagging, M5-Rules, M5P, random forests (RF), support vector regression (SVR), locally weighted linear (LWL), Gaussian process regression (GPR), and multi-layered perceptron neural network (MLP). In the present study, cell pressure, strain rate, temperature, time, and strain were considered as the input variables, and the Young's Modulus was the target. The results showed that all selected single and hybrid prediction models have acceptable agreement with the measured experimental results. In particular, the hybrid Additive Regression-Gaussian Process Regression and Bagging-Gaussian Process Regression models had the best accuracy based on the model performance assessment criteria.
https://doi.org/10.12989/gae.2023.34.5.507

Prediction on Busan's Gross Product and Employment of Major Industry with Logistic Regression and Machine Learning Model (로지스틱 회귀모형과 머신러닝 모형을 활용한 주요산업의 부산 지역총생산 및 고용 효과 예측)

Chae-Deug Yi
- Korea Trade Review / v.47 no.2 / pp.69-88 / 2022
This paper aims to predict Busan's regional product and employment using logistic regression models and machine learning models. The main findings of the empirical analysis are as follows. First, the OLS regression model shows that main industries such as electricity and electronics, machine and transport, and finance and insurance affect Busan's income positively. Second, the binomial logistic regression models show that Busan's strategic industries contribute to Busan's income in the following descending order: future transport machinery, life-care, and smart marine industries. Third, the multinomial logistic regression models show that Korea's main industries, such as precision machinery, transport equipment, and machinery, influence Busan's economy positively. In addition, Korea's exports and depreciation can affect Busan's economy more positively at higher employment levels. Fourth, the voting ensemble model shows higher predictive power than the artificial neural network and support vector machine models. Furthermore, the gradient boosting model and the random forest show higher predictive power than the voting model, in that order.
https://doi.org/10.22659/KTRA.2022.47.2.69

