• Title/Summary/Keyword: 의사결정트리 알고리즘

Search Result 80, Processing Time 0.028 seconds

A Study of Influencing Factors on World Handball Win-Loss using the Decision Tree Analysis (의사결정나무 분석을 통한 세계핸드볼 승패결정요인 분석)

  • Kim, Hyunchul
    • Journal of Digital Convergence
    • /
    • v.19 no.5
    • /
    • pp.461-468
    • /
    • 2021
  • The purpose of this study is to collect official records of the 2019 Men's and Women's Handball World Championships to identify important shooting variables that determine the team's record of winning or losing. After collecting 192 games of men's and women's national teams from 24 countries and verifying the difference in competition records according to the winning and losing groups, the decision tree method, one of the data mining techniques, is analyzed. According to the analysis, the 9m shooting success rate and Near shooting success rate were the most important factors for both men and women. Men win 83.3% if the 9m shooting success rate is 32.5% or higher and the Near shooting success rate is 67.5%, and women win 75% if the 9m shooting success rate is 75% or more and the Near shooting success rate is 51%. Also, the women's yellow cards are considered important variables that determine victory or defeat. In conclusion, both men and women were able to identify the factors of winning and losing decision shooting, but follow-up studies are needed considering the relativity of various record variables and performance in future handball.

Biological Early Warning System for Toxicity Detection (독성 감지를 위한 생물 조기 경보 시스템)

  • Kim, Sung-Yong;Kwon, Ki-Yong;Lee, Won-Don
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.9
    • /
    • pp.1979-1986
    • /
    • 2010
  • Biological early warning system detects toxicity by looking at behavior of organisms in water. The system uses classifier for judgement about existence and amount of toxicity in water. Boosting algorithm is one of possible application method for improving performance in a classifier. Boosting repetitively change training example set by focusing on difficult examples in basic classifier. As a result, prediction performance is improved for the events which are difficult to classify, but the information contained in the events which can be easily classified are discarded. In this paper, an incremental learning method to overcome this shortcoming is proposed by using the extended data expression. In this algorithm, decision tree classifier define class distribution information using the weight parameter in the extended data expression by exploiting the necessary information not only from the well classified, but also from the weakly classified events. Experimental results show that the new algorithm outperforms the former Learn++ method without using the weight parameter.

Crop Yield Estimation Utilizing Feature Selection Based on Graph Classification (그래프 분류 기반 특징 선택을 활용한 작물 수확량 예측)

  • Ohnmar Khin;Sung-Keun Lee
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.18 no.6
    • /
    • pp.1269-1276
    • /
    • 2023
  • Crop estimation is essential for the multinational meal and powerful demand due to its numerous aspects like soil, rain, climate, atmosphere, and their relations. The consequence of climate shift impacts the farming yield products. We operate the dataset with temperature, rainfall, humidity, etc. The current research focuses on feature selection with multifarious classifiers to assist farmers and agriculturalists. The crop yield estimation utilizing the feature selection approach is 96% accuracy. Feature selection affects a machine learning model's performance. Additionally, the performance of the current graph classifier accepts 81.5%. Eventually, the random forest regressor without feature selections owns 78% accuracy and the decision tree regressor without feature selections retains 67% accuracy. Our research merit is to reveal the experimental results of with and without feature selection significance for the proposed ten algorithms. These findings support learners and students in choosing the appropriate models for crop classification studies.

Classification of Very High Concerns HRCT Images using Extended Bayesian Networks (확장 베이지안망을 적용한 고위험성 HRCT 영상 분류)

  • Lim, Chae-Gyun;Jung, Yong-Gyu
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.2
    • /
    • pp.7-12
    • /
    • 2012
  • Recently the medical field to efficiently process the vast amounts of information to decision trees, neural networks, Bayesian Networks, including the application method of various data mining techniques are investigated. In addition, the basic personal information or patient history, family history, in addition to information such as MRI, HRCT images and additional information to collect and leverage in the diagnosis of disease, improved diagnostic accuracy is to promote a common status. But in real world situations that affect the results much because of the variable exists for a particular data mining techniques to obtain information through the enemy can be seen fairly limited. Medical images were taken as well as a minor can not give a positive impact on the diagnosis, but the proportion increased subjective judgments by the automated system is to deal with difficult issues. As a result of a complex reality, the situation is more advantageous to deal with the relative probability of the multivariate model based on Bayesian network, or TAN in the K2 search algorithm improves due to expansion model has been proposed. At this point, depending on the type of search algorithm applied significantly influenced the performance characteristics of the extended Bayesian network, the performance and suitability of each technique for evaluation of the facts is required. In this paper, we extend the Bayesian network for diagnosis of diseases using the same data were carried out, K2, TAN and changes in search algorithms such as classification accuracy was measured. In the 10-fold cross-validation experiment was performed to compare the performance evaluation based on the analysis and the onset of high-risk classification for patients with HRCT images could be possible to identify high-risk data.

Estimation of River Flow Data Using Machine Learning (머신러닝 기법을 이용한 유량 자료 생산 방법)

  • Kang, Noel;Lee, Ji Hun;Lee, Jung Hoon;Lee, Chungdae
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.261-261
    • /
    • 2020
  • 물관리의 기본이 되는 연속적인 유량 자료 확보를 위해서는 정확도 높은 수위-유량 관계 곡선식 개발이 필수적이다. 수위-유량 관계곡선식은 모든 수문시설 설계의 기초가 되며 홍수, 가뭄 등 물재해 대응을 위해서도 중요한 의미를 가지고 있다. 그러나 일반적으로 유량 측정은 많은 비용과 시간이 들고, 식생성장, 단면변화 등의 통제특성(control)이 변함에 따라 구간분리, 기간분리와 같은 비선형적인 양상이 나타나 자료 해석에 어려움이 존재한다. 특히, 국내 하천의 경우 자연적 및 인위적인 환경 변화가 다양하여 지점 및 기간에 따라 세밀한 분석이 요구된다. 머신러닝(Machine Learning)이란 데이터를 통해 컴퓨터가 스스로 학습하여 모델을 구축하고 성능을 향상시키는 일련의 과정을 뜻한다. 기존의 수위-유량 관계곡선식은 개발자의 판단에 의해 데이터의 종류와 기간 등을 설정하여 회귀식의 파라미터를 산출한다면, 머신러닝은 유효한 전체 데이터를 이용해 스스로 학습하여 자료 간 상관성을 찾아내 모델을 구축하고 성능을 지속적으로 향상 시킬 수 있다. 머신러닝은 충분한 수문자료가 확보되었다는 전제 하에 복잡하고 가변적인 수자원 환경을 반영하여 유량 추정의 정확도를 지속적으로 향상시킬 수 있다는 이점을 가지고 있다. 본 연구는 머신러닝의 대표적인 알고리즘들을 활용하여 유량을 추정하는 모델을 구축하고 성능을 비교·분석하였다. 대상지역은 안정적인 수량을 확보하고 있는 한강수계의 거운교 지점이며, 사용자료는 2010~2018년의 시간, 수위, 유량, 수면폭 등 이다. 프로그램은 파이썬을 기반으로 한 머신러닝 라이브러리인 사이킷런(sklearn)을 사용하였고 알고리즘은 랜덤포레스트 회귀, 의사결정트리, KNN(K-Nearest Neighbor), rgboost을 적용하였다. 학습(train) 데이터는 입력자료 종류별로 조합하여 6개의 세트로 구분하여 모델을 구축하였고, 이를 적용해 검증(test) 데이터를 RMSE(Roog Mean Square Error)로 평가하였다. 그 결과 모델 및 입력 자료의 조합에 따라 3.67~171.46로 다소 넓은 범위의 값이 도출되었다. 그 중 가장 우수한 유형은 수위, 연도, 수면폭 3개의 입력자료를 조합하여 랜덤포레스트 회귀 모델에 적용한 경우이다. 비교를 위해 동일한 검증 데이터를 한국수문조사연보(2018년) 내거운교 지점의 수위별 수위-유량 곡선식을 이용해 유량을 추정한 결과 RMSE가 3.76이 산출되어, 머신러닝이 세분화된 수위-유량 곡선식과 비슷한 수준까지 성능을 내는 것으로 확인되었다. 본 연구는 양질의 유량자료 생산을 위해 기 구축된 수문자료를 기반으로 머신러닝 기법의 적용 가능성을 검토한 기초 연구로써, 국내 효율적인 수문자료 측정 및 수위-유량 곡선 산출에 도움이 될 수 있을 것으로 판단된다. 향후 수자원 환경 및 통제특성에 영향을 미치는 다양한 영향변수를 파악하기 위해 기상자료, 취수량 등의 입력 자료를 적용할 필요가 있으며, 머신러닝 내 비지도학습인 딥러닝과 같은 보다 정교한 모델에 대한 추가적인 연구도 수행되어야 할 것이다.

  • PDF

Case Studies on Planning and Learning for Large-Scale CGFs with POMDPs through Counterfire and Mechanized Infantry Scenarios (대화력전 및 기계화 보병 시나리오를 통한 대규모 가상군의 POMDP 행동계획 및 학습 사례연구)

  • Lee, Jongmin;Hong, Jungpyo;Park, Jaeyoung;Lee, Kanghoon;Kim, Kee-Eung;Moon, Il-Chul;Park, Jae-Hyun
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.6
    • /
    • pp.343-349
    • /
    • 2017
  • Combat modeling and simulation (M&S) of large-scale computer generated forces (CGFs) enables the development of even the most sophisticated strategy of combat warfare and the efficient facilitation of a comprehensive simulation of the upcoming battle. The DEVS-POMDP framework is proposed where the DEVS framework describing the explicit behavior rules in military doctrines, and POMDP model describing the autonomous behavior of the CGFs are hierarchically combined to capture the complexity of realistic world combat modeling and simulation. However, it has previously been well documented that computing the optimal policy of a POMDP model is computationally demanding. In this paper, we show that not only can the performance of CGFs be improved by an efficient POMDP tree search algorithm but CGFs are also able to conveniently learn the behavior model of the enemy through case studies in the scenario of counterfire warfare and the scenario of a mechanized infantry brigade's offensive operations.

API Feature Based Ensemble Model for Malware Family Classification (악성코드 패밀리 분류를 위한 API 특징 기반 앙상블 모델 학습)

  • Lee, Hyunjong;Euh, Seongyul;Hwang, Doosung
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.3
    • /
    • pp.531-539
    • /
    • 2019
  • This paper proposes the training features for malware family analysis and analyzes the multi-classification performance of ensemble models. We construct training data by extracting API and DLL information from malware executables and use Random Forest and XGBoost algorithms which are based on decision tree. API, API-DLL, and DLL-CM features for malware detection and family classification are proposed by analyzing frequently used API and DLL information from malware and converting high-dimensional features to low-dimensional features. The proposed feature selection method provides the advantages of data dimension reduction and fast learning. In performance comparison, the malware detection rate is 93.0% for Random Forest, the accuracy of malware family dataset is 92.0% for XGBoost, and the false positive rate of malware family dataset including benign is about 3.5% for Random Forest and XGBoost.

A Study on Analyzing Children's Crossing Behaviors on Non-signalized Crosswalk (비신호 횡단보도에서의 어린이 횡단행태 분석 연구)

  • Lee, Deok Whan;Lee, Yun Suk;Kim, Won Ho;Lee, Back Jin
    • Journal of Korean Society of Transportation
    • /
    • v.31 no.3
    • /
    • pp.19-32
    • /
    • 2013
  • The study aims to find the characteristics of children's crossing behavior on crosswalk in school zones. It considers accident occurrence and physical form of school zones. Seven elementary school zones were investigated. Using data collected by field observation and video recording, statistical analysis, CHAID algorithm analysis, and pattern analysis were performed. As a result, it was found that children's waiting, attention and distraction were related to the accident occurrence. While 69.1% children showed waiting-before-crossing behavior in low-accident occurrence crosswalk, 83.6% children showed non waiting-before-crossing behavior in high-accident occurrence crosswalk. Moreover, the ratio of waiting, attention behavior was found to be higher when the width of the crosswalk was wide and the distance from the school's entrance to the crosswalk was long. These research findings showed that children's behavior-oriented approach was required to improve safety in school zone.

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.4
    • /
    • pp.105-112
    • /
    • 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the realization of the era of big data and artificial intelligence and opening a new chapter in convergence technology. Also, in the past, there are many demands for analysis of data that could not be handled by programs. In this paper, an analysis model was designed and verified for classification of unstructured data, which is often required in the era of big data. Data crawled DBPia's thesis summary, main words, and sub-keyword, and created a database using KoNLP's data dictionary, and tokenized words through morpheme analysis. In addition, nouns were extracted using KAIST's 9 part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining training data and Y values. Finally, The adequacy of classification was measured by applying three analysis algorithms(random forest, SVM, decision tree) to the generated analysis dataset. The classification model technique proposed in this paper can be usefully used in various fields such as civil complaint classification analysis and text-related analysis in addition to thesis classification.

Machine learning model for residual chlorine prediction in sediment basin to control pre-chlorination in water treatment plant (정수장 전염소 공정제어를 위한 침전지 잔류염소농도 예측 머신러닝 모형)

  • Kim, Juhwan;Lee, Kyunghyuk;Kim, Soojun;Kim, Kyunghun
    • Journal of Korea Water Resources Association
    • /
    • v.55 no.spc1
    • /
    • pp.1283-1293
    • /
    • 2022
  • The purpose of this study is to predict residual chlorine in order to maintain stable residual chlorine concentration in sedimentation basin by using artificial intelligence algorithms in water treatment process employing pre-chlorination. Available water quantity and quality data are collected and analyzed statistically to apply into mathematical multiple regression and artificial intelligence models including multi-layer perceptron neural network, random forest, long short term memory (LSTM) algorithms. Water temperature, turbidity, pH, conductivity, flow rate, alkalinity and pre-chlorination dosage data are used as the input parameters to develop prediction models. As results, it is presented that the random forest algorithm shows the most moderate prediction result among four cases, which are long short term memory, multi-layer perceptron, multiple regression including random forest. Especially, it is result that the multiple regression model can not represent the residual chlorine with the input parameters which varies independently with seasonal change, numerical scale and dimension difference between quantity and quality. For this reason, random forest model is more appropriate for predict water qualities than other algorithms, which is classified into decision tree type algorithm. Also, it is expected that real time prediction by artificial intelligence models can play role of the stable operation of residual chlorine in water treatment plant including pre-chlorination process.