• Title/Summary/Keyword: Decision forest

Search Result 429, Processing Time 0.025 seconds

SSP Climate Change Scenarios with 1km Resolution Over Korean Peninsula for Agricultural Uses (농업분야 활용을 위한 한반도 1km 격자형 SSP 기후변화 시나리오)

  • Jina Hur;Jae-Pil Cho;Sera Jo;Kyo-Moon Shim;Yong-Seok Kim;Min-Gu Kang;Chan-Sung Oh;Seung-Beom Seo;Eung-Sup Kim
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.26 no.1
    • /
    • pp.1-30
    • /
    • 2024
  • The international community adopts the SSP (Shared Socioeconomic Pathways) scenario as a new greenhouse gas emission pathway. As part of efforts to reflect these international trends and support for climate change adaptation measure in the agricultural sector, the National Institute of Agricultural Sciences (NAS) produced high-resolution (1 km) climate change scenarios for the Korean Peninsula based on SSP scenarios, certified as a "National Climate Change Standard Scenario" in 2022. This paper introduces SSP climate change scenario of the NAS and shows the results of the climate change projections. In order to produce future climate change scenarios, global climate data produced from 18 GCM models participating in CMIP6 were collected for the past (1985-2014) and future (2015-2100) periods, and were statistically downscaled for the Korean Peninsula using the digital climate maps with 1km resolution and the SQM method. In the end of the 21st century (2071-2100), the average annual maximum/minimum temperature of the Korean Peninsula is projected to increase by 2.6~6.1℃/2.5~6.3℃ and annual precipitation by 21.5~38.7% depending on scenarios. The increases in temperature and precipitation under the low-carbon scenario were smaller than those under high-carbon scenario. It is projected that the average wind speed and solar radiation over the analysis region will not change significantly in the end of the 21st century compared to the present. This data is expected to contribute to understanding future uncertainties due to climate change and contributing to rational decision-making for climate change adaptation.

Predicting the Effects of Agriculture Non-point Sources Best Management Practices (BMPs) on the Stream Water Quality using HSPF (HSPF를 이용한 농업비점오염원 최적관리방안에 따른 수질개선효과 예측)

  • Kyoung-Seok Lee;Dong Hoon Lee;Youngmi Ahn;Joo-Hyon Kang
    • Journal of Wetlands Research
    • /
    • v.25 no.2
    • /
    • pp.99-110
    • /
    • 2023
  • Non-point source (NP) pollutants in an agricultural landuse are discharged from a large area compared to those in other land uses, and thus effective source control measures are needed. To develop appropriate control measures, it is necessary to quantify discharge load of each source and evaluate the degree of water quality improvement by implementing different options of the control measures. This study used Hydrological Simulation Program-FORTRAN (HSPF) to quantify pollutant discharge loads from different sources and effects of different control measures on water quality improvements, thereby supporting decision making in developing appropirate pollutant control strategies. The study area is the Gyeseong river watershed in Changnyeong county, Gyeongsangnam-do, with agricultural areas occupying the largest proportion (26.13%) of the total area except for the forest area. The main pollutant sources include chemical and liquid fertilizers for agricultural activities, and manure produced from small scale livestock facilities and applied to agriculture lands or stacked near the facilities. Source loads of chemical fertilizers, liquid fertilizers and livestock manure of small scale livestock facilities, and point sources such as municipal wastewater treatment plants (WWTPs), community WWTPs, private sewage treament plants were considered in the HSPF model setup. Especially, NITR and PHOS modules were used to simulate detailed fate and transport processes including vegitation uptake, nutrient deposition, adsorption/desorption, and loss by deep percolation. The HSPF model was calibrated and validated based on the observed data from 2015 to 2020 at the outlet of the watershed. The calibrated model showed reasonably good performance in simulating the flow and water quality. Five Pollutants control scenarios were established from three sectors: agriculture pollution management (drainge outlet control, and replacement of controlled release fertilizers), livestock pollution management (liquid fertilizer reduction, and 'manure management of small scale livestock facilities) and private STP management. Each pollutant control measure was further divided into short-term, mid-term, and long-term scenarios based on the potential achievement period. The simulation results showed that the most effective control measure is the replacement of controlled release fertilizers followed by the drainge outlet control and the manure management of small scale livestock facilities. Furthermore, the simulation showed that application of all the control measures in the entire watershed can decrease the annual TN and TP loads at the outlet by 40.6% and 41.1%, respectively, and the annual average concentrations of TN and TP at the outlet by 35.1% and 29.2%, respectively. This study supports decision makers in priotizing different pollutant control measures based on their predicted performance on the water quality improvements in an agriculturally dominated watershed.

Development of disaster severity classification model using machine learning technique (머신러닝 기법을 이용한 재해강도 분류모형 개발)

  • Lee, Seungmin;Baek, Seonuk;Lee, Junhak;Kim, Kyungtak;Kim, Soojun;Kim, Hung Soo
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.4
    • /
    • pp.261-272
    • /
    • 2023
  • In recent years, natural disasters such as heavy rainfall and typhoons have occurred more frequently, and their severity has increased due to climate change. The Korea Meteorological Administration (KMA) currently uses the same criteria for all regions in Korea for watch and warning based on the maximum cumulative rainfall with durations of 3-hour and 12-hour to reduce damage. However, KMA's criteria do not consider the regional characteristics of damages caused by heavy rainfall and typhoon events. In this regard, it is necessary to develop new criteria considering regional characteristics of damage and cumulative rainfalls in durations, establishing four stages: blue, yellow, orange, and red. A classification model, called DSCM (Disaster Severity Classification Model), for the four-stage disaster severity was developed using four machine learning models (Decision Tree, Support Vector Machine, Random Forest, and XGBoost). This study applied DSCM to local governments of Seoul, Incheon, and Gyeonggi Province province. To develop DSCM, we used data on rainfall, cumulative rainfall, maximum rainfalls for durations of 3-hour and 12-hour, and antecedent rainfall as independent variables, and a 4-class damage scale for heavy rain damage and typhoon damage for each local government as dependent variables. As a result, the Decision Tree model had the highest accuracy with an F1-Score of 0.56. We believe that this developed DSCM can help identify disaster risk at each stage and contribute to reducing damage through efficient disaster management for local governments based on specific events.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

  • Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.23-45
    • /
    • 2020
  • Big data is creating in a wide variety of fields such as medical care, manufacturing, logistics, sales site, SNS, and the dataset characteristics are also diverse. In order to secure the competitiveness of companies, it is necessary to improve decision-making capacity using a classification algorithm. However, most of them do not have sufficient knowledge on what kind of classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate depending on the characteristics of the dataset was has been a task that required expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were identified into two factors, (the data structure and the data complexity,) and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration measurement index, in the meta-features to replace IR(Imbalanced Ratio). Also, we developed a new index called Reverse ReLU Silhouette Score into the meta-feature set. Among the UCI Machine Learning Repository data, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality(red), Contraceptive Method Choice) were selected. The class of each dataset was classified by using the classification algorithms (KNN, Logistic Regression, Nave Bayes, Random Forest, and SVM) selected in the study. For each dataset, we applied 10-fold cross validation method. 10% to 100% oversampling method is applied for each fold and meta-features of the dataset is measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, Hub Score. F1-score was selected as the dependent variable. As a result, the results of this study showed that the six meta-features including Reverse ReLU Silhouette Score and HHI proposed in this study have a significant effect on the classification performance. (1) The meta-features HHI proposed in this study was significant in the classification performance. (2) The number of variables has a significant effect on the classification performance, unlike the number of classes, but it has a positive effect. (3) The number of classes has a negative effect on the performance of classification. (4) Entropy has a significant effect on the performance of classification. (5) The Reverse ReLU Silhouette Score also significantly affects the classification performance at a significant level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by the classification algorithms were also consistent. In the regression analysis by classification algorithm, Naïve Bayes algorithm does not have a significant effect on the number of variables unlike other classification algorithms. This study has two theoretical contributions: (1) two new meta-features (HHI, Reverse ReLU Silhouette score) was proved to be significant. (2) The effects of data characteristics on the performance of classification were investigated using meta-features. The practical contribution points (1) can be utilized in the development of classification algorithm recommendation system according to the characteristics of datasets. (2) Many data scientists are often testing by adjusting the parameters of the algorithm to find the optimal algorithm for the situation because the characteristics of the data are different. In this process, excessive waste of resources occurs due to hardware, cost, time, and manpower. This study is expected to be useful for machine learning, data mining researchers, practitioners, and machine learning-based system developers. The composition of this study consists of introduction, related research, research model, experiment, conclusion and discussion.

The Economic Feasibility Analysis of Crop Cultivation Practice Project in Pirganj and Kurigram Districts, Bangladesh (작물재배기술의 경제적 타당성 분석 : 방글라데시 피르간즈군과 쿠리그람군 사례)

  • Tabassum, Nazia;Lim, Jae-Hwan;Gim, Uhn-Soon
    • Korean Journal of Agricultural Science
    • /
    • v.35 no.1
    • /
    • pp.85-100
    • /
    • 2008
  • The United States Department of Agriculture (USDA) funded collaborative project on The Economic Feasibility Analysis of Crop Cultivation Practice Project in Pirganj and Kurigram Districts in Bangladesh will started during 2008-2012, for 4 years with total project cost of US$ 571,270. The project will be implemented in 6 villages; has 1,097 hectares areas which is divided into 948 hectares of agricultural land, 52 hectares of forest land and 345 hectares of other land, covered 1,059 households equal to 5,305 persons in Pirganj and Kurigram districts The project has proposed to be implemented in joint collaboration by Bangladesh Agricultural Research Council (BARC), Bangladesh Agricultural Research Institute (BARI) and Rangpur Dinajpur Rural Service (RDRS) Bangladesh with full participation of the farmers' groups of respective project site. The specific objectives of the project are: (1) to estimate the productivity of paddy, wheat, maize, tobacco and sugarcane (2) to determine the cost of production and returns to the above mentioned crops (3) to study the interrelationship between inputs and output of the above mentioned crops and (4) to examine the resource utilization patterns at farm level. In this project analysis, the net incremental profit is US$33,028. The expected incremental project benefit and incremented production cost are estimated as US$ 219,959 and US$ 186,931 respectively. The financial decision making criteria would be followed in this crop cultivation practice project. After the project implementation, the expected project benefits are assumed to be continued for 15 years. The benefit cost ratio (B/C) of the project is estimated at 1.077 (table 11) when using discount rate of 10% as an opportunity cost of capital in Bangladesh. FIRR of project is estimated at 26.15% which is bigger than the opportunity cost by more than double. So this project is financially feasible and acceptable. Therefore, this project should be extended to other areas to increase the farm income and economic growth of marginal poor farmers in Bangladesh.

  • PDF

Development of 1ST-Model for 1 hour-heavy rain damage scale prediction based on AI models (1시간 호우피해 규모 예측을 위한 AI 기반의 1ST-모형 개발)

  • Lee, Joonhak;Lee, Haneul;Kang, Narae;Hwang, Seokhwan;Kim, Hung Soo;Kim, Soojun
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.5
    • /
    • pp.311-323
    • /
    • 2023
  • In order to reduce disaster damage by localized heavy rains, floods, and urban inundation, it is important to know in advance whether natural disasters occur. Currently, heavy rain watch and heavy rain warning by the criteria of the Korea Meteorological Administration are being issued in Korea. However, since this one criterion is applied to the whole country, we can not clearly recognize heavy rain damage for a specific region in advance. Therefore, in this paper, we tried to reset the current criteria for a special weather report which considers the regional characteristics and to predict the damage caused by rainfall after 1 hour. The study area was selected as Gyeonggi-province, where has more frequent heavy rain damage than other regions. Then, the rainfall inducing disaster or hazard-triggering rainfall was set by utilizing hourly rainfall and heavy rain damage data, considering the local characteristics. The heavy rain damage prediction model was developed by a decision tree model and a random forest model, which are machine learning technique and by rainfall inducing disaster and rainfall data. In addition, long short-term memory and deep neural network models were used for predicting rainfall after 1 hour. The predicted rainfall by a developed prediction model was applied to the trained classification model and we predicted whether the rain damage after 1 hour will be occurred or not and we called this as 1ST-Model. The 1ST-Model can be used for preventing and preparing heavy rain disaster and it is judged to be of great contribution in reducing damage caused by heavy rain.

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.1
    • /
    • pp.47-67
    • /
    • 2017
  • Steel plate faults is one of important factors to affect the quality and price of the steel plates. So far many steelmakers generally have used visual inspection method that could be based on an inspector's intuition or experience. Specifically, the inspector checks the steel plate faults by looking the surface of the steel plates. However, the accuracy of this method is critically low that it can cause errors above 30% in judgment. Therefore, accurate steel plate faults diagnosis system has been continuously required in the industry. In order to meet the needs, this study proposed a new steel plate faults diagnosis system using Simultaneous MTS (S-MTS), which is an advanced Mahalanobis Taguchi System (MTS) algorithm, to classify various surface defects of the steel plates. MTS has generally been used to solve binary classification problems in various fields, but MTS was not used for multiclass classification due to its low accuracy. The reason is that only one mahalanobis space is established in the MTS. In contrast, S-MTS is suitable for multi-class classification. That is, S-MTS establishes individual mahalanobis space for each class. 'Simultaneous' implies comparing mahalanobis distances at the same time. The proposed steel plate faults diagnosis system was developed in four main stages. In the first stage, after various reference groups and related variables are defined, data of the steel plate faults is collected and used to establish the individual mahalanobis space per the reference groups and construct the full measurement scale. In the second stage, the mahalanobis distances of test groups is calculated based on the established mahalanobis spaces of the reference groups. Then, appropriateness of the spaces is verified by examining the separability of the mahalanobis diatances. In the third stage, orthogonal arrays and Signal-to-Noise (SN) ratio of dynamic type are applied for variable optimization. Also, Overall SN ratio gain is derived from the SN ratio and SN ratio gain. If the derived overall SN ratio gain is negative, it means that the variable should be removed. However, the variable with the positive gain may be considered as worth keeping. Finally, in the fourth stage, the measurement scale that is composed of selected useful variables is reconstructed. Next, an experimental test should be implemented to verify the ability of multi-class classification and thus the accuracy of the classification is acquired. If the accuracy is acceptable, this diagnosis system can be used for future applications. Also, this study compared the accuracy of the proposed steel plate faults diagnosis system with that of other popular classification algorithms including Decision Tree, Multi Perception Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger Random Forest, Grid Search (GS), Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The steel plates faults dataset used in the study is taken from the University of California at Irvine (UCI) machine learning repository. As a result, the proposed steel plate faults diagnosis system based on S-MTS shows 90.79% of classification accuracy. The accuracy of the proposed diagnosis system is 6-27% higher than MLPNN, LR, GS, GA and PSO. Based on the fact that the accuracy of commercial systems is only about 75-80%, it means that the proposed system has enough classification performance to be applied in the industry. In addition, the proposed system can reduce the number of measurement sensors that are installed in the fields because of variable optimization process. These results show that the proposed system not only can have a good ability on the steel plate faults diagnosis but also reduce operation and maintenance cost. For our future work, it will be applied in the fields to validate actual effectiveness of the proposed system and plan to improve the accuracy based on the results.

Suggestion of Urban Regeneration Type Recommendation System Based on Local Characteristics Using Text Mining (텍스트 마이닝을 활용한 지역 특성 기반 도시재생 유형 추천 시스템 제안)

  • Kim, Ikjun;Lee, Junho;Kim, Hyomin;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.149-169
    • /
    • 2020
  • "The Urban Renewal New Deal project", one of the government's major national projects, is about developing underdeveloped areas by investing 50 trillion won in 100 locations on the first year and 500 over the next four years. This project is drawing keen attention from the media and local governments. However, the project model which fails to reflect the original characteristics of the area as it divides project area into five categories: "Our Neighborhood Restoration, Housing Maintenance Support Type, General Neighborhood Type, Central Urban Type, and Economic Base Type," According to keywords for successful urban regeneration in Korea, "resident participation," "regional specialization," "ministerial cooperation" and "public-private cooperation", when local governments propose urban regeneration projects to the government, they can see that it is most important to accurately understand the characteristics of the city and push ahead with the projects in a way that suits the characteristics of the city with the help of local residents and private companies. In addition, considering the gentrification problem, which is one of the side effects of urban regeneration projects, it is important to select and implement urban regeneration types suitable for the characteristics of the area. In order to supplement the limitations of the 'Urban Regeneration New Deal Project' methodology, this study aims to propose a system that recommends urban regeneration types suitable for urban regeneration sites by utilizing various machine learning algorithms, referring to the urban regeneration types of the '2025 Seoul Metropolitan Government Urban Regeneration Strategy Plan' promoted based on regional characteristics. There are four types of urban regeneration in Seoul: "Low-use Low-Level Development, Abandonment, Deteriorated Housing, and Specialization of Historical and Cultural Resources" (Shon and Park, 2017). In order to identify regional characteristics, approximately 100,000 text data were collected for 22 regions where the project was carried out for a total of four types of urban regeneration. Using the collected data, we drew key keywords for each region according to the type of urban regeneration and conducted topic modeling to explore whether there were differences between types. As a result, it was confirmed that a number of topics related to real estate and economy appeared in old residential areas, and in the case of declining and underdeveloped areas, topics reflecting the characteristics of areas where industrial activities were active in the past appeared. In the case of the historical and cultural resource area, since it is an area that contains traces of the past, many keywords related to the government appeared. Therefore, it was possible to confirm political topics and cultural topics resulting from various events. Finally, in the case of low-use and under-developed areas, many topics on real estate and accessibility are emerging, so accessibility is good. It mainly had the characteristics of a region where development is planned or is likely to be developed. Furthermore, a model was implemented that proposes urban regeneration types tailored to regional characteristics for regions other than Seoul. Machine learning technology was used to implement the model, and training data and test data were randomly extracted at an 8:2 ratio and used. In order to compare the performance between various models, the input variables are set in two ways: Count Vector and TF-IDF Vector, and as Classifier, there are 5 types of SVM (Support Vector Machine), Decision Tree, Random Forest, Logistic Regression, and Gradient Boosting. By applying it, performance comparison for a total of 10 models was conducted. The model with the highest performance was the Gradient Boosting method using TF-IDF Vector input data, and the accuracy was 97%. Therefore, the recommendation system proposed in this study is expected to recommend urban regeneration types based on the regional characteristics of new business sites in the process of carrying out urban regeneration projects."

A Study on Searching for Export Candidate Countries of the Korean Food and Beverage Industry Using Node2vec Graph Embedding and Light GBM Link Prediction (Node2vec 그래프 임베딩과 Light GBM 링크 예측을 활용한 식음료 산업의 수출 후보국가 탐색 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Seo, Jinny
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.73-95
    • /
    • 2021
  • This study uses Node2vec graph embedding method and Light GBM link prediction to explore undeveloped export candidate countries in Korea's food and beverage industry. Node2vec is the method that improves the limit of the structural equivalence representation of the network, which is known to be relatively weak compared to the existing link prediction method based on the number of common neighbors of the network. Therefore, the method is known to show excellent performance in both community detection and structural equivalence of the network. The vector value obtained by embedding the network in this way operates under the condition of a constant length from an arbitrarily designated starting point node. Therefore, it has the advantage that it is easy to apply the sequence of nodes as an input value to the model for downstream tasks such as Logistic Regression, Support Vector Machine, and Random Forest. Based on these features of the Node2vec graph embedding method, this study applied the above method to the international trade information of the Korean food and beverage industry. Through this, we intend to contribute to creating the effect of extensive margin diversification in Korea in the global value chain relationship of the industry. The optimal predictive model derived from the results of this study recorded a precision of 0.95 and a recall of 0.79, and an F1 score of 0.86, showing excellent performance. This performance was shown to be superior to that of the binary classifier based on Logistic Regression set as the baseline model. In the baseline model, a precision of 0.95 and a recall of 0.73 were recorded, and an F1 score of 0.83 was recorded. In addition, the light GBM-based optimal prediction model derived from this study showed superior performance than the link prediction model of previous studies, which is set as a benchmarking model in this study. The predictive model of the previous study recorded only a recall rate of 0.75, but the proposed model of this study showed better performance which recall rate is 0.79. The difference in the performance of the prediction results between benchmarking model and this study model is due to the model learning strategy. In this study, groups were classified by the trade value scale, and prediction models were trained differently for these groups. Specific methods are (1) a method of randomly masking and learning a model for all trades without setting specific conditions for trade value, (2) arbitrarily masking a part of the trades with an average trade value or higher and using the model method, and (3) a method of arbitrarily masking some of the trades with the top 25% or higher trade value and learning the model. As a result of the experiment, it was confirmed that the performance of the model trained by randomly masking some of the trades with the above-average trade value in this method was the best and appeared stably. It was found that most of the results of potential export candidates for Korea derived through the above model appeared appropriate through additional investigation. Combining the above, this study could suggest the practical utility of the link prediction method applying Node2vec and Light GBM. In addition, useful implications could be derived for weight update strategies that can perform better link prediction while training the model. On the other hand, this study also has policy utility because it is applied to trade transactions that have not been performed much in the research related to link prediction based on graph embedding. The results of this study support a rapid response to changes in the global value chain such as the recent US-China trade conflict or Japan's export regulations, and I think that it has sufficient usefulness as a tool for policy decision-making.