• Title/Abstract/Keywords: Confusion matrix

117 search results (processing time: 0.03 s)

KOMPSAT-3A Urban Classification Using Machine Learning Algorithm - Focusing on Yang-jae in Seoul -

  • 윤형진;정종철
    • Korean Journal of Remote Sensing
    • /
    • Vol. 36, No. 6_2
    • /
    • pp.1567-1577
    • /
    • 2020
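  • Land cover classification of urbanized areas is used in urban planning and management, so research on improving classification accuracy for urbanized areas is important. In this study, the high-resolution KOMPSAT-3A satellite image was used to classify an urbanized area with two machine learning methods, Support Vector Machine (SVM) and Artificial Neural Network (ANN). In building the training data, training areas were delineated on a 25 m grid and used to train on the image, and the trained models were then used to classify the test area. For validation, results were presented through an error matrix built from 250 Ground Truth Points (GTP). Among the four SVM methods and two ANN methods, the SVM Polynomial Model showed the highest accuracy, 86%. In comparing the two models using the GTP, the SVM models overall classified the KOMPSAT-3A image more effectively than the ANN models. Among the four classes (building, road, vegetation, and bare land), buildings showed the lowest classification accuracy, mainly because of misclassification caused by shadows cast by high-rise buildings.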
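
The error-matrix evaluation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the counts are invented for the example (they are not the paper's 250 ground truth points), and the function names are hypothetical.

```python
def overall_accuracy(matrix):
    """Overall accuracy: correctly classified points divided by all points."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix):
    """Share of each reference class (row) that was labelled correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical counts, for illustration only.
# Rows are the reference class, columns are the predicted class.
classes = ["building", "road", "vegetation", "bare land"]
example = [
    [20, 3, 1, 1],
    [2, 22, 0, 1],
    [0, 1, 24, 0],
    [2, 1, 1, 21],
]

print("overall accuracy:", overall_accuracy(example))
for name, recall in zip(classes, per_class_recall(example)):
    print(name, recall)
```

In this toy matrix the building row has the most off-diagonal counts, which is the kind of pattern the abstract attributes to building shadows.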

Deep learning improves implant classification by dental professionals: a multi-center evaluation of accuracy and efficiency

  • Lee, Jae-Hong;Kim, Young-Taek;Lee, Jong-Bin;Jeong, Seong-Nyum
    • Journal of Periodontal and Implant Science
    • /
    • Vol. 52, No. 3
    • /
    • pp.220-229
    • /
    • 2022
  • Purpose: The aim of this study was to evaluate and compare the accuracy of dental professionals in classifying different types of dental implant systems (DISs) on panoramic radiographic images with and without the assistance of a deep learning (DL) algorithm. Methods: Using a self-reported questionnaire, the classification accuracy of dental professionals (including 5 board-certified periodontists, 8 periodontology residents, and 31 dentists not specialized in implantology working at 3 dental hospitals) with and without the assistance of an automated DL algorithm was determined and compared. The accuracy, sensitivity, specificity, confusion matrix, receiver operating characteristic (ROC) curves, and area under the ROC curves were calculated to evaluate the classification performance of the DL algorithm and dental professionals. Results: Using the DL algorithm led to a statistically significant improvement in the average classification accuracy of DISs (mean accuracy: 78.88%) compared to that without the assistance of the DL algorithm (mean accuracy: 63.13%, P<0.05). In particular, when assisted by the DL algorithm, board-certified periodontists (mean accuracy: 88.56%) showed higher average accuracy than did the DL algorithm, and dentists not specialized in implantology (mean accuracy: 77.83%) showed the largest improvement, reaching an average accuracy similar to that of the algorithm (mean accuracy: 80.56%). Conclusions: The automated DL algorithm classified DISs with accuracy and performance comparable to those of board-certified periodontists, and it may be useful for dental professionals for the classification of various types of DISs encountered in clinical practice.

Performance of Support Vector Machine for Classifying Land Cover in Optical Satellite Images: A Case Study in Delaware River Port Area

  • Ramayanti, Suci;Kim, Bong Chan;Park, Sungjae;Lee, Chang-Wook
    • Korean Journal of Remote Sensing
    • /
    • Vol. 38, No. 6_4
    • /
    • pp.1911-1923
    • /
    • 2022
  • The availability of high-resolution satellite images provides precise information without direct observation of the research target. The Korea Multi-Purpose Satellite (KOMPSAT), also known as the Arirang satellite, has been developed and utilized for earth observation. Machine learning models have repeatedly proven to be good classifiers of remotely sensed images. This study aimed to compare the performance of the support vector machine (SVM) model in classifying the land cover of the Delaware River port area on high- and medium-resolution images. Three optical images (KOMPSAT-2, KOMPSAT-3A, and Sentinel-2B) were classified into six land cover classes: water, road, vegetation, building, vacant, and shadow. The KOMPSAT images were provided by the Korea Aerospace Research Institute (KARI), and the Sentinel-2B image was provided by the European Space Agency (ESA). The training samples were manually digitized for each land cover class and treated as the reference image. The predicted images were compared to the actual data to assess accuracy using a confusion matrix analysis. In addition, the time taken for training and classification was recorded to evaluate model performance. The results showed that the KOMPSAT-3A image had the highest overall accuracy, followed by KOMPSAT-2 and Sentinel-2B. On the other hand, the model took longer to classify the higher-resolution images than the lower-resolution one. We therefore conclude that the SVM model performed better on the higher-resolution images, at the cost of longer training and classification times. This finding may inform researchers selecting satellite imagery for effective and accurate image classification.

Predicting defects of EBM-based additive manufacturing through XGBoost

  • 정자훈
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • Vol. 26, No. 5
    • /
    • pp.641-648
    • /
    • 2022
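  • This paper uses data analysis to identify the factors that affect defects arising during the electron beam melting (EBM) process, one of the 3D printing methods. Factors identified as major causes of defects were drawn from previous studies, and variables associated with defect occurrence were extracted by analyzing the log files generated during the process. In addition, because the data are time series data, the concept of a window was introduced, and the variables used in the analysis were constructed to include data from the current process layer back to the three preceding layers. Because the dependent variable is whether a defect occurred, the analysis was performed as a binary classification; since the proportion of defective layers was low (about 4%), the SMOTE technique was applied to create a balanced training dataset. XGBoost with Gridsearch CV was used for the analysis, and classification performance was evaluated based on the confusion matrix. Finally, conclusions were drawn from a variable importance analysis using SHAP values.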
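
The window construction mentioned in this abstract (the current layer plus the three preceding layers) might look like the sketch below. It rests on assumptions not stated in the source: each layer is represented as a list of numeric log variables, the first three layers are dropped because they lack a full history, and the function name `build_window_features` and the sample values are hypothetical.

```python
def build_window_features(layers, window=3):
    """Concatenate each layer's variables with those of the `window` preceding layers.

    Layers that do not have `window` predecessors are skipped (an assumption).
    """
    rows = []
    for i in range(window, len(layers)):
        row = []
        for lag in range(window + 1):  # lag 0 is the current layer
            row.extend(layers[i - lag])
        rows.append(row)
    return rows


# Hypothetical per-layer log variables, for illustration only.
layers = [[1.0, 10], [2.0, 11], [3.0, 12], [4.0, 13], [5.0, 14]]
features = build_window_features(layers, window=3)
for row in features:
    print(row)
```

Each output row could then be paired with the defect label of its current layer before any resampling or model fitting.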

A Detecting Technique for the Climatic Factors that Aided the Spread of COVID-19 using Deep and Machine Learning Algorithms

  • Al-Sharari, Waad;Mahmood, Mahmood A.;Abd El-Aziz, A.A.;Azim, Nesrine A.
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 6
    • /
    • pp.131-138
    • /
    • 2022
  • The novel coronavirus (COVID-19) is regarded as one of the main public health threats worldwide. Because of the sudden nature of the outbreak and the infectiousness of the virus, it causes people anxiety, depression, and other stress responses. The prevention and control of novel coronavirus pneumonia have entered a critical stage. Early prediction and forecasting of outbreaks during this difficult time is essential to controlling morbidity and mortality. The entire world is investing enormous effort to fight the spread of this deadly virus. In this paper, we used machine learning and deep learning techniques to analyze the situation using data shared by countries and to detect the climate factors that affect the spread of COVID-19, such as humidity, sunny hours, temperature, and wind speed, in order to understand its behavior and to forecast its future spread around the world. We used data collected and produced by Kaggle and the Johns Hopkins Center for Systems Science. The dataset has 25 attributes and 9566 objects. Our experiment consists of two phases. In phase one, we preprocessed the dataset for the DL model and reduced the features to four (humidity, sunny hours, temperature, and wind speed) using the Pearson Correlation Coefficient technique (correlation attribute feature selection). In phase two, we used six well-known traditional machine learning techniques for numerical datasets and a Dense Net deep learning model to predict and detect the climatic factors that aid disease outbreak. We validated the model using a confusion matrix (CM) and measured performance with four metrics: accuracy, f-measure, recall, and precision.
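
Feature selection with the Pearson correlation coefficient, as named in this abstract, can be sketched roughly as below. The abstract does not say how features were ranked, so ranking by absolute correlation with the target is an assumption, and the feature names and values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (norm_x * norm_y)


def top_k_features(features, target, k):
    """Keep the k feature names with the largest absolute correlation to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical columns, for illustration only.
features = {
    "a": [1, 2, 3, 4],
    "b": [1, -1, 1, -1],
    "c": [4, 3, 2, 1.5],
}
target = [2, 4, 6, 8]
print(top_k_features(features, target, k=2))
```

Using the absolute value keeps strongly negative correlations, such as column `c` here, alongside strongly positive ones.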

Particulate Matter Rating Map based on Machine Learning with Adaboost Algorithm

  • 정종철
    • Journal of Cadastre & Land InformatiX
    • /
    • Vol. 51, No. 2
    • /
    • pp.141-150
    • /
    • 2021
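  • Particulate matter has a major effect on human health, and a variety of related studies have been carried out. Because of its effects on the human body, various studies have sought to predict particulate matter using historical data measured by the Seoul monitoring network. This study focused on particulate matter in Seoul in May 2019, and the variables used for training were air pollutant data such as SO2, CO, NO2, and O3. The prediction model was built on Adaboost, and the training models were separated into PM10 and PM2.5. The accuracy of the Adaboost prediction model was evaluated using an error matrix. Air pollutants showed a higher correlation with ultrafine particulate matter (PM2.5); however, to present more effective distribution grades, a larger amount of data needs to be learned, and additional variables such as traffic volume need to be used to predict the spatial distribution grades of PM10 and PM2.5 effectively.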

Personalized Diabetes Risk Assessment Through Multifaceted Analysis (PD-RAMA): A Novel Machine Learning Approach to Early Detection and Management of Type 2 Diabetes

  • Gharbi Alshammari
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 8
    • /
    • pp.17-25
    • /
    • 2023
  • The alarming global prevalence of Type 2 Diabetes Mellitus (T2DM) has catalyzed an urgent need for robust, early diagnostic methodologies. This study unveils a pioneering approach to predicting T2DM, employing the Extreme Gradient Boosting (XGBoost) algorithm, renowned for its predictive accuracy and computational efficiency. The investigation harnesses a meticulously curated dataset of 4303 samples, extracted from a comprehensive Chinese research study, scrupulously aligned with the World Health Organization's indicators and standards. The dataset encapsulates a multifaceted spectrum of clinical, demographic, and lifestyle attributes. Through an intricate process of hyperparameter optimization, the XGBoost model exhibited an unparalleled best score, elucidating a distinctive combination of parameters such as a learning rate of 0.1, max depth of 3, 150 estimators, and specific colsample strategies. The model's validation accuracy of 0.957, coupled with a sensitivity of 0.9898 and specificity of 0.8897, underlines its robustness in classifying T2DM. A detailed analysis of the confusion matrix further substantiated the model's diagnostic prowess, with an F1-score of 0.9308, illustrating its balanced performance in true positive and negative classifications. The precision and recall metrics provided nuanced insights into the model's ability to minimize false predictions, thereby enhancing its clinical applicability. The research findings not only underline the remarkable efficacy of XGBoost in T2DM prediction but also contribute to the burgeoning field of machine learning applications in personalized healthcare. By elucidating a novel paradigm that accentuates the synergistic integration of multifaceted clinical parameters, this study fosters a promising avenue for precise early detection, risk stratification, and patient-centric intervention in diabetes care. The research serves as a beacon, inspiring further exploration and innovation in leveraging advanced analytical techniques for transformative impacts on predictive diagnostics and chronic disease management.
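
The metrics this abstract reports (accuracy, sensitivity, specificity, precision, F1) all derive from the four cells of a binary confusion matrix. The sketch below shows those standard definitions; the counts are invented for illustration and are not the study's data, and the function name is hypothetical. It assumes every denominator is nonzero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard metrics from a 2x2 confusion matrix (assumes nonzero denominators)."""
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that F1 combines precision and sensitivity only, so it does not reflect true negatives directly.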

Deep learning method for compressive strength prediction for lightweight concrete

  • Yaser A. Nanehkaran;Mohammad Azarafza;Tolga Pusatli;Masoud Hajialilue Bonab;Arash Esmatkhah Irani;Mehdi Kouhdarag;Junde Chen;Reza Derakhshani
    • Computers and Concrete
    • /
    • Vol. 32, No. 3
    • /
    • pp.327-337
    • /
    • 2023
  • Concrete is the most widely used building material, with various types including high- and ultra-high-strength, reinforced, normal, and lightweight concretes. However, accurately predicting concrete properties is challenging due to the geotechnical design code's requirement for specific characteristics. To overcome this issue, researchers have turned to new technologies like machine learning to develop proper methodologies for concrete specification. In this study, we propose a highly accurate deep learning-based predictive model to investigate the compressive strength (UCS) of lightweight concrete with natural aggregates (pumice). Our model was implemented on a database containing 249 experimental records and revealed that water, cement, water-cement ratio, fine-coarse aggregate, aggregate substitution rate, fine aggregate replacement, and superplasticizer are the most influential covariates on UCS. To validate our model, we trained and tested it on random subsets of the database, and its performance was evaluated using a confusion matrix and receiver operating characteristic (ROC) overall accuracy. The proposed model was compared with widely known machine learning methods such as MLP, SVM, and DT classifiers to assess its capability. In addition, the model was tested on 25 laboratory UCS tests to evaluate its predictability. Our findings showed that the proposed model achieved the highest accuracy (accuracy=0.97, precision=0.97) and the lowest error rate with a high learning rate (R2=0.914), as confirmed by ROC (AUC=0.971), which is higher than other classifiers. Therefore, the proposed method demonstrates a high level of performance and capability for UCS predictions.

Prediction of Stunting Among Under-5 Children in Rwanda Using Machine Learning Techniques

  • Similien Ndagijimana;Ignace Habimana Kabano;Emmanuel Masabo;Jean Marie Ntaganda
    • Journal of Preventive Medicine and Public Health
    • /
    • Vol. 56, No. 1
    • /
    • pp.41-49
    • /
    • 2023
  • Objectives: Rwanda reported a stunting rate of 33% in 2020, decreasing from 38% in 2015; however, stunting remains an issue. Globally, child deaths from malnutrition stand at 45%. The best options for the early detection and treatment of stunting should be made a community policy priority, and health services remain an issue. Hence, this research aimed to develop a model for predicting stunting in Rwandan children. Methods: The Rwanda Demographic and Health Survey 2019-2020 was used as secondary data. Stratified 10-fold cross-validation was used, and different machine learning classifiers were trained to predict stunting status. The prediction models were compared using different metrics, and the best model was chosen. Results: The best model was developed with the gradient boosting classifier algorithm, with a training accuracy of 80.49% based on the performance indicators of several models. Based on a confusion matrix, the test accuracy, sensitivity, specificity, and F1 were calculated, yielding the model's ability to classify stunting cases correctly at 79.33%, identify stunted children accurately at 72.51%, and categorize non-stunted children correctly at 94.49%, with an area under the curve of 0.89. The model found that the mother's height, television, the child's age, province, mother's education, birth weight, and childbirth size were the most important predictors of stunting status. Conclusions: Therefore, machine-learning techniques may be used in Rwanda to construct an accurate model that can detect the early stages of stunting and offer the best predictive attributes to help prevent and control stunting in Rwandan children under five.
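
Stratified 10-fold cross-validation, as used in this study, splits the data so that each fold keeps roughly the same class proportions as the whole set. The following is a minimal pure-Python sketch of that idea, not the authors' pipeline; the function name, the seed, and the labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) with class proportions roughly preserved."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train_idx, test_idx


# Hypothetical labels, for illustration only: 30 positive and 70 negative cases.
labels = [1] * 30 + [0] * 70
splits = list(stratified_k_fold(labels, k=10))
for train_idx, test_idx in splits:
    positives = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), positives)
```

A classifier would be fitted on each training split and scored on the matching test split, with the per-fold scores then averaged.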

Image Clustering Using Machine Learning: Study of InceptionV3 with K-means Methods

  • 닌담 솜사우트;이효종
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • Korea Information Processing Society 2021 Fall Conference
    • /
    • pp.681-684
    • /
    • 2021
  • In this paper, we study image clustering without labels using machine learning techniques. We propose an unsupervised machine learning technique to design an image clustering model that automatically categorizes images into groups. Our experiment focuses on the Inception convolutional neural network (Inception V3) combined with the k-means method to cluster images. For this, we collected the public datasets Food-K5, Flowers, Handwritten Digit, and Cats-dogs, our own dataset Rice Germination, and the owner dataset Palm print. The experiment has three parts. First, all images are stripped of their labels and merged into a single dataset. Second, the dataset is loaded into Inception V3 to extract image features, which are passed to k-means to cluster them into six classes. Lastly, model accuracy is evaluated using a confusion matrix based on precision, recall, and F1. With our method, we obtained the following results: 1) Handwritten Digit (precision = 1.000, recall = 1.000, F1 = 1.00), 2) Food-K5 (precision = 0.975, recall = 0.945, F1 = 0.96), 3) Palm print (precision = 1.000, recall = 0.999, F1 = 1.00), 4) Cats-dogs (precision = 0.997, recall = 0.475, F1 = 0.64), 5) Flowers (precision = 0.610, recall = 0.982, F1 = 0.75), and our dataset 6) Rice Germination (precision = 0.997, recall = 0.943, F1 = 0.97). Our experiment showed that the model achieved an accuracy rate of 0.8908; the outcomes indicate that the proposed model is strong enough to differentiate the images and group them into clusters.
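
The F1 values listed in this abstract can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below does that using only the numbers quoted above; the function name is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "Handwritten Digit": (1.000, 1.000),
    "Food-K5": (0.975, 0.945),
    "Palm print": (1.000, 0.999),
    "Cats-dogs": (0.997, 0.475),
    "Flowers": (0.610, 0.982),
    "Rice Germination": (0.997, 0.943),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.2f}")
```

Rounded to two decimals, the recomputed values agree with the F1 figures in the abstract, including the low Cats-dogs score that follows from its recall of 0.475.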