• Title/Summary/Keyword: tree classification

Comparing automated and non-automated machine learning for autism spectrum disorders classification using facial images

  • Elshoky, Basma Ramdan Gamal;Younis, Eman M.G.;Ali, Abdelmgeid Amin;Ibrahim, Osman Ali Sadek
    • ETRI Journal / v.44 no.4 / pp.613-623 / 2022
  • Autism spectrum disorder (ASD) is a developmental disorder associated with cognitive and neurobehavioral disorders. It affects a person's behavior and performance, as well as verbal and non-verbal communication in social interactions. Early screening and diagnosis of ASD are essential for early educational planning and treatment, for family support, and for timely, appropriate medical support for the child. Developing automated methods for diagnosing ASD is therefore becoming essential. Herein, we investigate various machine learning methods for building predictive models that diagnose ASD in children from facial images. To this end, we used an autistic children dataset containing 2936 facial images of children with autism and typical children. We applied classical machine learning methods, such as support vector machine and random forest, as well as deep-learning methods and a state-of-the-art approach, automated machine learning (AutoML), and compared the results obtained from these techniques. We found that AutoML achieved the highest performance, approximately 96% accuracy, using Hyperopt and the tree-based pipeline optimization tool (TPOT). Furthermore, AutoML methods enabled us to easily find the best parameter settings without any human effort for feature engineering.

A Comparative Study on Feature Selection and Classification Methods Using Closed Frequent Patterns Mining (닫힌 빈발 패턴을 기반으로 한 특징 선택과 분류방법 비교)

  • Zhang, Lei;Jin, Cheng Hao;Ryu, Keun Ho
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.148-151 / 2010
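  • Classification is among the best-known data mining techniques and includes methods such as decision trees, support vector machines (SVM), and artificial neural networks (ANN). It builds a model describing the characteristics of each group from multivariate observations that belong to several known, mutually exclusive groups, and then determines which group a new observation of unknown membership should be assigned to. Classification methods basically assume that the feature space is well represented. In real applications, however, a feature space composed of single features is not clear-cut, so classification performs poorly. To address this problem, this paper studies classification based on closed frequent patterns, which carry rich information without losing information about the frequent patterns. In the experiments, meaningful features were selected using the ${\chi}^2$ (chi-square) and information gain attribute selection measures. As a result, feature selection with the measures presented in this study yielded better classification performance than classification methods such as C4.5 and SVM.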
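
The two selection measures named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: each candidate pattern is treated as a binary feature (1 if a record contains the pattern), and the names `pattern_a`, `pattern_b`, and `classes` are hypothetical toy data, not the paper's dataset or implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_counts.items():
        for l_value, l_count in label_counts.items():
            expected = f_count * l_count / total
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: 1 means the record contains the pattern.
pattern_a = [1, 1, 1, 1, 0, 0, 0, 0]   # perfectly aligned with the class
pattern_b = [1, 0, 1, 0, 1, 0, 1, 0]   # independent of the class
classes = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]

for name, pattern in (("pattern_a", pattern_a), ("pattern_b", pattern_b)):
    print(name, information_gain(pattern, classes), chi_square(pattern, classes))
```

Under both measures the pattern aligned with the class scores highest and the independent pattern scores zero, which is the ranking behavior a feature selection step relies on.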

New Approaches to Xerostomia with Salivary Flow Rate Based on Machine Learning Algorithm

  • Yeon-Hee Lee;Q-Schick Auh;Hee-Kyung Park
    • Journal of Korean Dental Science / v.16 no.1 / pp.47-62 / 2023
  • Purpose: We aimed to investigate objective cutoff values of unstimulated salivary flow rate (UFR) and stimulated salivary flow rate (SFR) in patients with xerostomia and to present an optimal machine learning model based on a classification and regression tree (CART) for all ages. Materials and Methods: A total of 829 patients with oral diseases were enrolled (591 females; mean age, 59.29±16.40 years; 8~95 years old), comprising 199 patients with xerostomia and 630 patients without xerostomia. Salivary and clinical characteristics were collected and analyzed. Results: Patients with xerostomia had significantly lower UFR (0.29±0.22 vs. 0.41±0.24 ml/min) and SFR (1.12±0.55 vs. 1.39±0.94 ml/min) than those without xerostomia (P<0.001). The presence of xerostomia had a significant negative correlation with UFR (r=-0.603, P=0.002) and SFR (r=-0.301, P=0.017). In the diagnosis of xerostomia based on the CART algorithm, the presence of stomatitis, candidiasis, halitosis, psychiatric disorder, and hyperlipidemia were significant predictors of xerostomia, and the cutoff ranges for xerostomia for UFR and SFR were 0.03~0.18 ml/min and 0.85~1.6 ml/min, respectively. Conclusion: Xerostomia was correlated with decreases in UFR and SFR, and their cutoff values varied depending on the patient's underlying oral and systemic conditions.
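
A CART cutoff on a continuous variable such as a flow rate is a threshold chosen to minimize impurity in the two resulting groups. The sketch below shows that single step, assuming Gini impurity as the split criterion (the abstract does not state which criterion was used). The values in `flow_rates` and `has_xerostomia` are invented for illustration and are not the study's data.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        # Candidate cutoff: midpoint between two neighboring sorted values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical flow rates (ml/min) and labels (1 = xerostomia).
flow_rates = [0.05, 0.10, 0.15, 0.20, 0.35, 0.40, 0.50, 0.60]
has_xerostomia = [1, 1, 1, 1, 0, 1, 0, 0]

threshold, impurity = best_threshold(flow_rates, has_xerostomia)
print(threshold, impurity)
```

A full CART model repeats this search over every variable at every node, which is how cutoffs can differ between subgroups defined by earlier splits.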

Performance Comparison of Statistics-Based Machine Learning Model for Classification of Technical Documents (기술문서 분류를 위한 통계기반 기계학습 모델 성능비교 및 한계 연구)

  • Kim, Jin-gu;Yu, Heonchang
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.393-396 / 2022
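  • This study trains four statistics-based machine learning models on patents and papers in the defense science and technology field, then analyzes the results of feeding in data from an actual target institution in order to identify limits on practical usability. Previous studies classified documents by patent classification code for special purposes, or served detailed, micro-level work such as exploring research topics and studying features within a narrow research scope; this study instead aims to capture the overall flow and trends of research from a macro-level perspective. To this end, each model was trained on 30,965 patents and papers covering 138 ICT technologies and 23,406 patents and papers covering 192 defense science and technology categories. The statistics-based models compared are Support Vector Machines, Decision Tree, Naive Bayes, and XGBoost. In the validation stage on the training data, performance reached up to 99.4%. However, when 12,824 patents and papers from the actual target institution were fed in and analyzed, we identified per-model bias problems, data preprocessing issues, and multi-class and multi-label problems. We propose solutions to the identified problems and suggest directions for further research.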

The Analysis of the Activity Patterns of Dog with Wearable Sensors Using Machine Learning

  • Hussain, Ali;Ali, Sikandar;Kim, Hee-Cheol
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.141-143 / 2021
  • The activity patterns of animal species are difficult to assess, and the behavior of freely moving individuals cannot be captured by direct observation alone. Understanding the activity patterns of animals such as dogs and cats has therefore become a major challenge. One approach to monitoring these behaviors is the continuous collection of data by human observers. In this study, we instead assess the activity patterns of dogs using wearable sensor data from an accelerometer and a gyroscope. A wearable, sensor-based system is suitable for this purpose and can monitor dogs in real time. The basic purpose of this study was to develop a system that can detect activities from accelerometer and gyroscope signals. We propose a method based on data collected from 10 dogs of nine different breeds, of different sizes and ages, and of both genders. We applied six state-of-the-art classifiers: random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), XGBoost, k-nearest neighbors (KNN), and a decision tree classifier. The random forest showed good classification results, achieving an accuracy of 86.73% in detecting the activities.

An Automatic Approach for the Recommendation of Bug Report Priority Based on the Stack Trace (Stack Trace 기반 Bug report 우선순위 자동 추천 접근 방안)

  • Lee, JeongHoon;kim, Taeyoung;Choi, Jiwon;Kim, SunTae;Ryu, Duksan
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.866-869 / 2020
  • As software development environments change rapidly, system complexity is increasing. As a result, software bugs large and small are unavoidable, and bug reports are used to handle them efficiently. However, having a developer decide the priority of each bug report consumes considerable effort, cost, and time. This paper therefore proposes a technique that automatically recommends bug priority based on the stack trace in the bug report. To this end, we first extracted stack traces from bug reports and defined feature vectors by applying TF-IDF, Word2Vec, and Stack Overflow to the three elements of a stack trace (exception, reason, and stack frame). We then applied four classification algorithms (Random Forest, Decision Tree, XGBoost, and SVM) to build the priority recommendation model. For evaluation, we collected 266,292 bug reports for the JDK library and obtained an accuracy of 68% on the bug reports that contain a stack trace.
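
To make the TF-IDF step concrete, here is a minimal sketch of weighting stack-trace tokens. It assumes each trace has already been tokenized into an exception name and frame names, and it uses the plain unsmoothed formula tf × ln(N / df); the abstract does not say which variant or tokenization the authors used, and the traces below are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {token: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(token for doc in documents for token in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return vectors

# Hypothetical stack traces, tokenized into exception and frame names.
traces = [
    ["NullPointerException", "HashMap.get", "Parser.parse"],
    ["NullPointerException", "ArrayList.add", "Parser.parse"],
    ["IOException", "FileReader.read", "Parser.parse"],
]

vectors = tf_idf(traces)
for vector in vectors:
    print(vector)
```

A frame that appears in every trace (`Parser.parse` here) gets weight zero, while a rare exception gets a high weight, so the resulting vectors emphasize what distinguishes one report from another.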

Machine Learning Based BLE Indoor Positioning Performance Improvement (머신러닝 기반 BLE 실내측위 성능 개선)

  • Moon, Joon;Pak, Sang-Hyon;Hwang, Jae-Jeong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.467-468 / 2021
  • To improve the performance of an indoor positioning system using BLE beacons, we built a receiver that measures the angle of arrival, one of the direction finding technologies supported by BLE 5.1, and analyzed its measurements with machine learning to estimate the optimal position. To create and test the machine learning models, we used k-nearest neighbor classification and regression, logistic regression, support vector machines, decision trees, artificial neural networks, and deep neural networks. With test set 4 produced in this study, the accuracy reached up to 99%.

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people, and the use of machine-learning methods to diagnose the causes of hypertension has grown accordingly. In this study, we improved hypertension prediction by applying Mahalanobis distance-based multivariate outlier removal to the KNHANES database of Korean national health data and a COVID-19 dataset from Kaggle. The study was divided into two modules. The first module, data preprocessing, merged the datasets and applied decision-tree classifier-based feature selection. The second module, predictive analysis, removed multivariate outliers from the experimental dataset using the Mahalanobis distance and then predicted hypertension. We compared the accuracy of each classification model. The best result was obtained by the proposed MAH_RF algorithm, with an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various diseases such as stroke and cardiovascular disease.
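
Mahalanobis distance-based outlier removal can be sketched as below. To stay dependency-free, the sketch is restricted to two features so the covariance matrix can be inverted in closed form; the readings, the function names, and the cutoff of 2.0 are hypothetical choices for illustration, not values from the paper.

```python
def mahalanobis_distances(rows):
    """Mahalanobis distance of each (x, y) row from the sample mean."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in rows:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T with the 2x2 inverse written out.
        squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(squared ** 0.5)
    return distances

def remove_outliers(rows, cutoff):
    """Keep only the rows whose Mahalanobis distance is at most `cutoff`."""
    distances = mahalanobis_distances(rows)
    return [row for row, d in zip(rows, distances) if d <= cutoff]

# Hypothetical (systolic, diastolic) readings; the last one breaks the trend.
readings = [(110, 70), (115, 75), (120, 80), (125, 85),
            (130, 90), (135, 95), (140, 100), (125, 60)]

kept = remove_outliers(readings, cutoff=2.0)
print(kept)
```

Note that the reading (125, 60) is unremarkable on either axis alone; it is flagged only because it departs from the correlation between the two features, which is what distinguishes a multivariate outlier test from per-feature thresholds.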

Overview of Biliary Atresia (담도폐쇄증의 개요)

  • Tae Yeon Jeon
    • Journal of the Korean Society of Radiology / v.83 no.5 / pp.979-990 / 2022
  • Biliary atresia is a progressive, idiopathic, obliterative disease of the extrahepatic biliary tree that presents with biliary obstruction in the neonatal period. It is the most common indication for liver transplantation in children. If untreated, progressive liver cirrhosis leads to death by two years of age. Nowadays, more than 90% of biliary atresia patients survive into adulthood with the development of Kasai portoenterostomy and liver transplantation technology. Early diagnosis is critical since the success rate of the Kasai portoenterostomy decreases with time. This study comprehensively reviews the recent advances in the etiology, classification, prevalence, clinical manifestations, treatment, and prognosis of biliary atresia.

Study on Fault Diagnosis and Data Processing Techniques for Substrate Transfer Robots Using Vibration Sensor Data

  • MD Saiful Islam;Mi-Jin Kim;Kyo-Mun Ku;Hyo-Young Kim;Kihyun Kim
    • Journal of the Microelectronics and Packaging Society / v.31 no.2 / pp.45-53 / 2024
  • The maintenance of semiconductor equipment is crucial for the continuous growth of the semiconductor market. System management is imperative given the anticipated increase in the capacity and complexity of industrial equipment. Ensuring optimal operation of manufacturing processes is essential to maintaining a steady supply of numerous parts. In particular, monitoring the status of substrate transfer robots, which play a central role in these processes, is crucial, and diagnosing failures of their major components is vital for preventive maintenance. Fault diagnosis methods can be broadly categorized into physics-based and data-driven approaches. This study focuses on data-driven fault diagnosis methods because of the limitations of physics-based approaches. We propose a methodology for data acquisition and preprocessing for robot fault diagnosis. Data are gathered from vibration sensors, and the preprocessing method is applied to the vibration signals. Subsequently, gradient tree-based XGBoost classification models are trained on the dataset. The effectiveness of the proposed model is validated through performance evaluation metrics, including accuracy, F1 score, and confusion matrix. The XGBoost classifiers achieve an accuracy of approximately 92.76% and an equivalent F1 score. ROC curves indicate exceptional performance in class discrimination, with 100% discrimination for the normal class and 98% discrimination for abnormal classes.
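
The evaluation metrics listed in this abstract can be computed from a confusion matrix as sketched below. The sketch covers the binary case only, treating "fault" as the positive class; the labels and predictions are hypothetical and do not reproduce the paper's results.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) for one class treated as positive."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    return 2 * tp / (2 * tp + fp + fn)

# Hypothetical ground truth and classifier output.
y_true = ["normal", "normal", "normal", "fault", "fault", "fault", "fault", "normal"]
y_pred = ["normal", "normal", "fault", "fault", "fault", "normal", "fault", "normal"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="fault")
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("F1:", f1_score(tp, fp, fn))
```

Accuracy and F1 coincide in this toy example because false positives and false negatives are balanced and the classes are equally sized; on imbalanced fault data the two can diverge, which is why both are usually reported.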