• Title/Abstract/Keywords: random forest (RF)


Predictive Models for Sasang Constitution Types Using Genetic Factors

  • 반효정;이시우;진희정
    • 사상체질의학회지 / Vol. 32, No. 2 / pp.10-21 / 2020
  • Objectives: The genome-wide association study (GWAS) is a useful method for identifying genetic associations with various phenotypes. The purpose of this study was to develop predictive models for Sasang constitution types using genetic factors. Methods: Genotyping of the 1,999 subjects was performed using the Axiom Precision Medicine Research Array (PMRA) by Life Technologies. All participants were prescribed Sasang constitution-specific herbal remedies for treatment and showed improvement of their original symptoms, as confirmed by a Korean medicine doctor. The genotypes were imputed using the IMPUTE program. Association analysis was conducted using a logistic regression model to discover single nucleotide polymorphisms (SNPs), adjusting for age, sex, and BMI. Results & Conclusions: We developed models to predict Korean medicine constitution types from the identified genetic factors together with sex, age, and BMI, using Random Forest (RF), Support Vector Machine (SVM), and Neural Network (NN). The maximum Area Under the Curve (AUC) for the Taeeum, Soeum, and Soyang types was 0.894, 0.868, and 0.767, respectively. The AUC of each model was 6-17% higher than that of the corresponding model without genetic factors. By developing the predictive models, we confirmed the usefulness of genetic factors related to the constitution types. This demonstrates a mechanism for more accurate prediction using type-related genetic factors.
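The AUC values reported in the abstract above summarize how well a model ranks subjects of one type above the rest. The following is a minimal sketch of that metric, assuming each constitution type is scored as a binary (type vs. not) problem, which the abstract does not state. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """Return the area under the ROC curve for binary labels (1 = positive).

    Uses the rank-based (Mann-Whitney U) formulation: AUC is the probability
    that a randomly chosen positive scores higher than a randomly chosen
    negative, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = subject belongs to the target type, 0 = does not.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation would be the usual choice for larger data.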

A Review of the Methodology for Sophisticated Data Classification

  • 김승재;김성환
    • 통합자연과학논문집 / Vol. 14, No. 1 / pp.27-34 / 2021
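  • Efforts to implement artificial intelligence (AI) are increasing worldwide. In implementing AI, the importance of data cannot be overlooked, including large volumes of data and the classification of data to suit the purpose. The technologies for generating and processing such data include the Internet of Things (IoT) and big data analysis, which can be regarded as the driving force of the fourth industrial revolution. These technologies are widely used at both the national and the individual level. In particular, there is a growing trend of using data concentrated in a specific field for big data analysis in order to discover new models, infer and predict new values with those models, and thereby present a vision of the future. Conclusions drawn from data analysis can change considerably depending on the accuracy of the information the data contain, and such changes can lead to incorrect results. Thus, in data analysis it is very important to classify the data according to the information they carry or the purpose of the analysis. In addition, to obtain reliable and precise statistics from big data analysis, the analysis must take into account the meaning of each variable, the correlations among variables, multicollinearity, and so on. That is, before big data analysis, the data should be classified well to suit the purpose of the analysis. Accordingly, this review classifies data using the decision tree (DT), random forest (RF), linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA) techniques, which belong to classification analysis (CA) among the machine learning (ML) methods used to implement AI, and then evaluates the degree of classification in order to seek ways to improve the classification rate.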

Modeling with Thin Film Thickness using Machine Learning

  • Kim, Dong Hwan;Choi, Jeong Eun;Ha, Tae Min;Hong, Sang Jeen
    • 반도체디스플레이기술학회지 / Vol. 18, No. 2 / pp.48-52 / 2019
  • Virtual metrology (VM), one of the APC techniques, predicts the characteristics of manufactured films using machine learning, saving time and resources. As the CD is reduced, photoresist is no longer suitable as a mask material at high aspect ratios, so hard masks have been introduced to solve this problem. Among the many types of hard mask materials, the amorphous carbon layer (ACL) is widely investigated because of its higher etch selectivity than conventional photoresist, high optical transmittance, easy deposition process, and removability by oxygen plasma. In this study, VM using different machine learning algorithms is applied to predict the thickness of the ACL, and the trained models are evaluated to determine which shows the best prediction performance. ACL specimens are deposited by plasma-enhanced chemical vapor deposition (PECVD) with four different process parameters (pressure, RF power, $C_3H_6$ gas flow, and $N_2$ gas flow). The gradient boosting regression (GBR) algorithm, the random forest regression (RFR) algorithm, and a neural network (NN) are selected for modeling. The model using the gradient boosting algorithm shows the most suitable performance, with a higher R-squared value than the other models. A model for predicting the thickness of the ACL film within the abovementioned conditions has been successfully constructed.

Comparison Analysis of Machine Learning for Concrete Crack Depths Prediction Using Thermal Image and Environmental Parameters

  • 김지형;장아름;박민재;주영규
    • 한국공간구조학회논문집 / Vol. 21, No. 2 / pp.99-110 / 2021
  • This study presents the estimation of crack depth by analyzing temperatures extracted from thermal images together with environmental parameters such as air temperature, air humidity, and illumination. The statistics of all acquired features and the correlation coefficients among the thermal-image and environmental parameters are presented. The concrete crack depths were predicted by four different machine learning models: Multi-Layer Perceptron (MLP), Random Forest (RF), Gradient Boosting (GB), and AdaBoost (AB). The machine learning algorithms are validated by the coefficient of determination, accuracy, and Mean Absolute Percentage Error (MAPE). The AB model performed best among the four models, owing to the non-linearity of the features and its aggregation of weak learners with weights on misclassified data. With a maximum base-estimator depth of 11, the AB model is efficient and achieves high performance, with 97.6% accuracy and a MAPE of 0.07%. Feature importances, permutation importance, and partial dependence are analyzed for the AB model. The results show that air humidity, crack depth, and crack temperature, in that order, have larger marginal effects than the other features.
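Two of the validation metrics named in the abstract above, the coefficient of determination and MAPE, can be written out directly. The following is a minimal sketch using the standard definitions; the function names and the toy values are hypothetical and are not data from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent. Assumes no true value is zero."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical measured and predicted depths (arbitrary units).
toy_true = [10.0, 20.0, 30.0, 40.0]
toy_pred = [11.0, 19.0, 33.0, 38.0]
toy_r2 = r_squared(toy_true, toy_pred)   # 1 - 15/500
toy_mape = mape(toy_true, toy_pred)      # mean of 10%, 5%, 10%, 5%
```

MAPE is undefined when a true value is zero, so this sketch assumes strictly nonzero targets.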

Research on the Lesion Classification by Radiomics in Laryngoscopy Image

  • 박준하;김영재;우주현;김광기
    • 대한의용생체공학회:의공학회지 / Vol. 43, No. 5 / pp.353-360 / 2022
  • Laryngeal disease harms quality of life, and laryngoscopy is critical in identifying causative lesions. This study uses radiomics to extract and analyze quantitative features from lesions in laryngoscopy images, and fits and validates classifiers to find meaningful features. Regions of interest are searched for lesions not classified by the YOLOv5 model, and features are extracted with radiomics. The extracted features are selected through combinations of three feature selectors and three estimator models. Using the selected features, two classification models, Random Forest and Gradient Boosting, were trained and verified to find meaningful features. The combination of SFS, LASSO, and RF shows the highest performance, with an accuracy of 0.90 and an AUROC of 0.96. Models using features selected by SFM or RIDGE showed lower performance than the others. Classification of larynx lesions through radiomics appears effective. However, various feature selection methods should be used, and data loss, such as the loss of color data, should be minimized.

Sentiment Analysis of COVID-19 Vaccination in Saudi Arabia

  • Sawsan Alowa;Lama Alzahrani;Noura Alhakbani;Hend Alrasheed
    • International Journal of Computer Science & Network Security / Vol. 23, No. 2 / pp.13-30 / 2023
  • Since the COVID-19 vaccine became available, people have been sharing their opinions on social media about getting vaccinated, causing discussions of the vaccine to trend on Twitter alongside certain events, making the website a rich data source. This paper explores people's perceptions regarding the COVID-19 vaccine during certain events and how these events influenced public opinion about the vaccine. The data consisted of tweets sent during seven important events that were gathered within 14 days of the first announcement of each event. These data represent people's reactions to these events without including irrelevant tweets. The study targeted tweets sent in Arabic from users located in Saudi Arabia. The data were classified as positive, negative, or neutral in tone. Four classifiers were used: support vector machine (SVM), naïve Bayes (NB), logistic regression (LOGR), and random forest (RF), in addition to a deep learning model using BiLSTM. The results showed that the SVM achieved the highest accuracy, at 91%. Overall perceptions about the COVID-19 vaccine were 54% negative, 36% neutral, and 10% positive.

Form-finding of lifting self-forming GFRP elastic gridshells based on machine learning interpretability methods

  • Soheila, Kookalani;Sandy, Nyunn;Sheng, Xiang
    • Structural Engineering and Mechanics / Vol. 84, No. 5 / pp.605-618 / 2022
  • Glass fiber reinforced polymer (GFRP) elastic gridshells consist of long continuous GFRP tubes that undergo elastic deformation. In this paper, a method for the form-finding of gridshell structures is presented based on interpretable machine learning (ML) approaches. A comparative study is conducted on several ML algorithms, including support vector regression (SVR), K-nearest neighbors (KNN), decision tree (DT), random forest (RF), AdaBoost, XGBoost, category boosting (CatBoost), and light gradient boosting machine (LightGBM). A numerical example is presented using a standard double-hump gridshell, considering two characteristics of deformation as objective functions. A combination of the grid search approach and k-fold cross-validation (CV) is implemented for fine-tuning the parameters of the ML models. The results of the comparative study indicate that the LightGBM model presents the highest prediction accuracy. Finally, interpretable ML approaches, including Shapley additive explanations (SHAP), partial dependence plot (PDP), and accumulated local effects (ALE), are applied to explain the predictions of the ML model, since it is essential to understand the effect of various values of the input parameters on the objective functions. As a result of the interpretability approaches, an optimum gridshell structure is obtained, and new opportunities are verified for form-finding investigation of GFRP elastic gridshells during lifting construction.
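The combination of grid search and k-fold cross-validation mentioned in the abstract above can be sketched with a deliberately small model. This is an illustration only: it substitutes a one-dimensional k-nearest-neighbors regressor for the paper's models, and the function names, the candidate grid, and the toy data are all hypothetical.

```python
def knn_predict(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training points (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds interleaved validation folds."""
    return [list(range(fold, n_samples, n_folds)) for fold in range(n_folds)]


def cv_mse(xs, ys, k, n_folds):
    """Mean squared error of the KNN regressor, averaged over the validation folds."""
    fold_errors = []
    for val_idx in k_fold_indices(len(xs), n_folds):
        val_set = set(val_idx)
        train_x = [x for i, x in enumerate(xs) if i not in val_set]
        train_y = [y for i, y in enumerate(ys) if i not in val_set]
        errors = [(knn_predict(train_x, train_y, xs[i], k) - ys[i]) ** 2 for i in val_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)


def grid_search(xs, ys, k_grid, n_folds):
    """Return (best_k, scores), where scores maps each candidate k to its CV error."""
    scores = {k: cv_mse(xs, ys, k, n_folds) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical toy data: a noiseless quadratic response on evenly spaced inputs.
toy_x = [i / 2 for i in range(20)]
toy_y = [x * x for x in toy_x]
best_k, cv_scores = grid_search(toy_x, toy_y, k_grid=[1, 3, 5, 7], n_folds=5)
```

Every candidate value is scored on the same folds, so the comparison between hyperparameter settings is not affected by how the data happened to be split.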

Development and Evaluation of Machine Learning-based Prediction Models for Wastewater Treatment Plant

  • 심규대;김효상;장근수;김동균;김영모
    • 한국수자원학회:학술대회논문집 / 한국수자원학회 2023 Conference / p.499 / 2023
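  • With recent improvements in computer performance and the development of new machine learning algorithms, researchers in many fields are carrying out a variety of studies that use them. For wastewater treatment facilities, the accumulation of enormous amounts of operational data is accelerating research that applies machine learning. Existing physical models of wastewater treatment plants tended to lose accuracy because various assumptions were made about the influencing factors applied, and to address this problem, prior studies have sought to improve prediction accuracy by using collected plant operational data and machine learning-based prediction models. Using operational data that sensors installed on the site of wastewater treatment plant A store in real time on the central control room server, we applied various machine learning models such as NN (Neural Network), SVM (Support Vector Machine), and RF (Random Forest), and sought insight into which model performs best on wastewater treatment plant operational data. In this study, we developed several machine learning-based prediction models for plant A and were able to optimize the machine learning models by comparing the prediction accuracy of each. The significance of this study is that, when its results are used to optimize prediction models for wastewater treatment plants, machine learning-based prediction models for such plants can be developed in a relatively short time.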


Automated detection of panic disorder based on multimodal physiological signals using machine learning

  • Eun Hye Jang;Kwan Woo Choi;Ah Young Kim;Han Young Yu;Hong Jin Jeon;Sangwon Byun
    • ETRI Journal / Vol. 45, No. 1 / pp.105-118 / 2023
  • We tested the feasibility of automated discrimination of patients with panic disorder (PD) from healthy controls (HCs) based on multimodal physiological responses using machine learning. Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and peripheral temperature (PT) of the participants were measured during three experimental phases: rest, stress, and recovery. Eleven physiological features were extracted from each phase and used as input data. Logistic regression (LoR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) algorithms were implemented with nested cross-validation. Linear regression analysis showed that ECG and PT features obtained in the stress and recovery phases were significant predictors of PD. We achieved the highest accuracy (75.61%) with MLP using all 33 features. With the exception of MLP, applying the significant predictors led to a higher accuracy than using 24 ECG features. These results suggest that combining multimodal physiological signals measured during various states of autonomic arousal has the potential to differentiate patients with PD from HCs.
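The abstract above states that the classifiers were implemented with nested cross-validation. The following is a minimal sketch of that scheme: an inner loop picks a hyperparameter using only the outer-training data, and the outer loop estimates accuracy on data never used for that choice. It uses a one-dimensional k-nearest-neighbors classifier as a stand-in; the function names, fold counts, candidate grid, and toy data are hypothetical and do not reproduce the study's setup.

```python
def knn_classify(train_x, train_y, query, k):
    """Majority vote (labels 0/1) among the k nearest training points (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def make_folds(indices, n_folds):
    """Split a list of indices into n_folds interleaved folds."""
    return [indices[f::n_folds] for f in range(n_folds)]


def accuracy(fit_idx, eval_idx, xs, ys, k):
    """Fraction of eval_idx samples classified correctly when fitting on fit_idx."""
    fit_x = [xs[i] for i in fit_idx]
    fit_y = [ys[i] for i in fit_idx]
    hits = sum(knn_classify(fit_x, fit_y, xs[i], k) == ys[i] for i in eval_idx)
    return hits / len(eval_idx)


def nested_cv(xs, ys, k_grid, outer_folds, inner_folds):
    """Mean outer-fold accuracy, with k chosen inside each outer-training set."""
    all_idx = list(range(len(xs)))
    outer_scores = []
    for test_idx in make_folds(all_idx, outer_folds):
        test_set = set(test_idx)
        train_idx = [i for i in all_idx if i not in test_set]

        def inner_score(k):
            # Inner loop: score k on validation folds drawn only from train_idx.
            scores = []
            for val_idx in make_folds(train_idx, inner_folds):
                val_set = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val_set]
                scores.append(accuracy(fit_idx, val_idx, xs, ys, k))
            return sum(scores) / len(scores)

        best_k = max(k_grid, key=inner_score)
        # Outer loop: evaluate the selected k on the held-out test fold.
        outer_scores.append(accuracy(train_idx, test_idx, xs, ys, best_k))
    return sum(outer_scores) / len(outer_scores)


# Hypothetical toy data: one feature, with class 1 for larger feature values.
toy_x = [i / 10 for i in range(30)]
toy_y = [1 if x >= 1.5 else 0 for x in toy_x]
nested_cv_accuracy = nested_cv(toy_x, toy_y, k_grid=[1, 3, 5], outer_folds=5, inner_folds=4)
```

Because the test fold plays no part in choosing `k`, the outer score is not inflated by hyperparameter selection, which is the usual motivation for nesting.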

Predicting idiopathic pulmonary fibrosis (IPF) disease in patients using machine approaches

  • Ali, Sikandar;Hussain, Ali;Kim, Hee-Cheol
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2021 Spring Conference / pp.144-146 / 2021
  • Idiopathic pulmonary fibrosis (IPF) is one of the most dreadful lung diseases, and it affects the performance of the lung unpredictably. No reliable natural history of the disease has yet been established, and it has been very difficult for physicians to diagnose it. With the advent of artificial intelligence and its related technologies, this task has become somewhat easier. The aim of this paper is to develop and explore machine learning models for the prediction and diagnosis of this mysterious disease. For our study, we obtained an IPF dataset from Haeundae Paik hospital consisting of 2425 patients. This dataset consists of 502 features. We applied different data preprocessing techniques to clean the data and make them fit for machine learning. After preprocessing, 18 features were selected for the experiment. In our experiment, we used different machine learning classifiers, i.e., multilayer perceptron (MLP), support vector machine (SVM), and random forest (RF), and compared the performance of each classifier. The experimental results showed that MLP outperformed all the other compared models, with 91.24% accuracy.
