• Title/Summary/Keyword: tree classification

WQI Class Prediction of Sihwa Lake Using Machine Learning-Based Models (기계학습 기반 모델을 활용한 시화호의 수질평가지수 등급 예측)

  • KIM, SOO BIN;LEE, JAE SEONG;KIM, KYUNG TAE
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.27 no.2
    • /
    • pp.71-86
    • /
    • 2022
  • The water quality index (WQI) has been widely used to evaluate marine water quality. In Korea, the WQI is divided into five classes by marine environmental standards. However, calculating the WQI on large datasets is a complex and time-consuming process. The current study therefore proposed machine learning (ML) based models to predict the WQI class from water quality datasets. Sihwa Lake, one of the specially managed coastal zones, was selected as the modeling site. Adaptive boosting (AdaBoost) and tree-based pipeline optimization (TPOT) algorithms were used to train the models, and each model's classification performance was evaluated with accuracy, precision, F1, and log loss. Before training, feature importance and sensitivity analyses were conducted to find the best input combination for each algorithm. The results showed that bottom dissolved oxygen (DOBot) was the most important variable affecting model performance. Conversely, surface dissolved inorganic nitrogen (DINSur) and dissolved inorganic phosphorus (DIPSur) had weaker effects on the prediction of WQI class. In addition, comparing the spatio-temporal and class sensitivities of each best model showed that performance varied across stations, seasons, and WQI classes. In conclusion, the TPOT algorithm performed better than the AdaBoost algorithm without considering feature selection. Moreover, the WQI class of unseen water quality datasets could be reliably predicted using a TPOT model trained on satisfactory training datasets.
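
The abstract names four classification metrics but not how they are averaged over the five WQI classes. As a minimal sketch, assuming macro-averaging (an assumption, not stated by the authors) and hypothetical class labels 1 to 5, the metrics can be computed in plain Python as follows:

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision_f1(y_true, y_pred, classes):
    """Unweighted (macro) mean of per-class precision and F1.

    A class with no predicted samples gets precision 0, and a class with
    precision + recall == 0 gets F1 0 (one common convention).
    """
    pairs = list(zip(y_true, y_pred))
    precisions, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return sum(precisions) / len(classes), sum(f1s) / len(classes)


def log_loss(y_true, proba, eps=1e-15):
    """Mean negative natural-log probability assigned to the true class.

    proba[i] is a dict mapping each class label to its predicted
    probability for sample i; probabilities are clipped to [eps, 1 - eps].
    """
    total = 0.0
    for t, p in zip(y_true, proba):
        total -= math.log(min(max(p.get(t, 0.0), eps), 1 - eps))
    return total / len(y_true)


# Hypothetical WQI class labels for five samples (not data from the study).
y_true = [1, 2, 2, 3, 5]
y_pred = [1, 2, 3, 3, 5]
print(accuracy(y_true, y_pred))  # 0.8
print(macro_precision_f1(y_true, y_pred, [1, 2, 3, 5]))
```

Only classes that actually occur are passed in the usage example, since an absent class would contribute zeros to the macro averages.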

Stock Price Direction Prediction Using Convolutional Neural Network: Emphasis on Correlation Feature Selection (합성곱 신경망을 이용한 주가방향 예측: 상관관계 속성선택 방법을 중심으로)

  • Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review
    • /
    • v.22 no.4
    • /
    • pp.21-39
    • /
    • 2020
  • Recently, deep learning has shown high performance in various applications such as pattern analysis and image classification. Stock market forecasting, known as a particularly difficult task in machine learning research, is an area where many researchers are verifying the effectiveness of deep learning techniques. This study proposed a deep learning Convolutional Neural Network (CNN) model to predict the direction of stock prices and then applied a correlation-based feature selection method to improve the model's performance. We compared the performance of machine learning classifiers against the CNN. The classifiers used in this study are Logistic Regression, Decision Tree, Neural Network, Support Vector Machine, AdaBoost, Bagging, and Random Forest. The results confirmed that, when feature selection was applied, the CNN showed higher performance than the other classifiers. The results show that the CNN model effectively predicted stock price direction by analyzing the embedded values of the financial data.
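
The abstract does not say which correlation-based selection procedure was used. As an illustrative sketch only, the version below assumes Hall's correlation-based feature selection (CFS) merit, which rewards features correlated with the target and penalizes features correlated with each other, combined with a greedy forward search; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def cfs_merit(subset, features, target):
    """Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    if k == 1:
        return r_cf  # no feature-feature pairs yet
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(features, target):
    """Greedily add the feature that most increases merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, target), f) for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected


# Hypothetical indicators: x1 and x2 jointly explain y, x3 is constant noise.
features = {"x1": [1, -1, 1, -1], "x2": [1, 1, -1, -1], "x3": [1, 1, 1, 1]}
y = [2, 0, 0, -2]
print(sorted(cfs_forward(features, y)))  # ['x1', 'x2']
```

Because the merit penalizes inter-feature correlation, a duplicate of an already selected feature is not added, which is the main difference from simply ranking features by their correlation with the target.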

A Study on Korean Local Governments' Operation of Participatory Budgeting System : Classification by Support Vector Machine Technique (한국 지방자치단체의 주민참여예산제도 운영에 관한 연구 - Support Vector Machine 기법을 이용한 유형 구분)

  • Junhyun Han;Jaemin Ryou;Jayon Bae;Chunghyeok Im
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.461-466
    • /
    • 2024
  • Korean local governments operate the participatory budgeting system autonomously. This study aims to classify these entities into clusters. Among the machine learning methods tested (Neural Network, Rule Induction (CN2), KNN, Decision Tree, Random Forest, Gradient Boosting, SVM, Naïve Bayes), the Support Vector Machine technique proved the most effective in analyzing 2022 data on Korean municipalities. The first cluster, C1, is characterized by minimal committee activity but a substantial allocation to participatory budgeting; cluster C3 comprises cities that take a passive stance. The majority of cities fall into the final cluster, C2, which is noted for its proactive engagement. Overall, most Korean local governments operate the participatory budgeting system well, and only a small number of cities are less active in it. We anticipate that analyzing time-series data from the past decade in follow-up studies will further enhance the reliability of classifying local government types regarding participatory budgeting.
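
The abstract does not describe the SVM's kernel, features, or software. For readers unfamiliar with the technique, here is a minimal sketch of a two-class linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss; the linear kernel, the hyperparameters, and the toy data are all assumptions, and separating three clusters such as C1 to C3 would need a multi-class scheme such as one-vs-rest.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    Labels in y must be +1 or -1. Uses full-batch subgradient descent.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: subgradient is -y * x
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature indicators for "active" (+1) and "passive" (-1) cities.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```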

A study on automated soil moisture monitoring methods for the Korean peninsula based on Google Earth Engine (Google Earth Engine 기반의 한반도 토양수분 모니터링 자동화 기법 연구)

  • Jang, Wonjin;Chung, Jeehun;Lee, Yonggwan;Kim, Jinuk;Kim, Seongjoon
    • Journal of Korea Water Resources Association
    • /
    • v.57 no.9
    • /
    • pp.615-626
    • /
    • 2024
  • To accurately and efficiently monitor soil moisture (SM) across South Korea, this study developed an SM estimation model that integrates the cloud computing platform Google Earth Engine (GEE) with Automated Machine Learning (AutoML). Various spatial datasets derived from Terra MODIS (Moderate Resolution Imaging Spectroradiometer) and the global precipitation observation satellite GPM (Global Precipitation Measurement) were used to test optimal input data combinations. The results indicated that GPM-based accumulated dry days, 5-day antecedent average precipitation, NDVI (Normalized Difference Vegetation Index), the sum of nighttime and daytime LST (Land Surface Temperature), soil properties (sand and clay content, bulk density), terrain data (elevation and slope), and seasonal classification had high feature importance. After setting the objective functions (coefficient of determination, R2; Root Mean Square Error, RMSE; Mean Absolute Percentage Error, MAPE) in AutoML for this data combination, machine learning techniques were compared. Tree-based models performed well, with Random Forest performing best (R2: 0.72, RMSE: 2.70 vol%, MAPE: 0.14).
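
The three objective functions can be written out directly. This is a minimal sketch using the standard definitions; it assumes MAPE is reported as a fraction (consistent with the value 0.14 above) and that no observed soil moisture value is zero. The sample numbers are hypothetical.

```python
import math


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def rmse(y, yhat):
    """Root mean square error, in the units of y (e.g. vol%)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mape(y, yhat):
    """Mean absolute percentage error as a fraction (0.14 means 14 %).

    Assumes no observed value is zero.
    """
    return sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)


# Hypothetical observed vs. estimated soil moisture (vol%).
obs = [20.0, 25.0, 30.0]
est = [22.0, 24.0, 30.0]
print(r2(obs, est), rmse(obs, est), mape(obs, est))
```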

Assessment of Slope Failures Potential in Forest Roads using a Logistic Regression Model (로지스틱 회귀분석을 이용한 임도붕괴 위험도 평가)

  • Baek, Seung-An;Cho, Koo-Hyun;Hwang, Jin-Sung;Jung, Do-Hyun;Park, Jin-Woo;Choi, Byoungkoo;Cha, Du-Song
    • Journal of Korean Society of Forest Science
    • /
    • v.105 no.4
    • /
    • pp.429-434
    • /
    • 2016
  • Slope failures on forest roads often result in social and economic losses as well as environmental damage. This study used GIS and logistic regression analysis to assess the susceptibility of forest roads to slope failure in Hongcheon-gun, Gangwon-do, where many slope failures occurred after heavy rainfall in 2013. Among soil texture types, sandy soil (6.616) had the highest susceptibility to slope failure, while the medium tree-diameter class (-3.282) had the lowest. An error matrix for both slope-failure and non-slope-failure areas was constructed, and the resulting model showed a classification accuracy of 74.6%. Non-slope-failure areas along the forest roads were mostly classified in the range >0.7, which is higher than the classification cutoff (0.5) used by the logistic regression model. The results suggest that considering forest environment and site factors related to forest road failures would improve the accuracy of slope failure susceptibility predictions.
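
A logistic regression model turns a linear score into a probability, and the error matrix and classification accuracy follow from comparing that probability with the 0.5 cutoff. The sketch below illustrates these two steps; the dummy-coded variable names and coefficients in the example are hypothetical (not the fitted model from the study), and treating a probability exactly equal to the cutoff as "failure" is an assumption.

```python
import math


def failure_probability(intercept, coefs, x):
    """Logistic model: P(failure) = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefs[k] * v for k, v in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def error_matrix(observed, probs, cutoff=0.5):
    """Tally an error matrix with 1 = slope failure and 0 = no failure.

    A site is predicted as failure when its probability is >= cutoff.
    """
    m = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for obs, p in zip(observed, probs):
        pred = 1 if p >= cutoff else 0
        if obs == 1:
            m["TP" if pred == 1 else "FN"] += 1
        else:
            m["FP" if pred == 1 else "TN"] += 1
    return m


def overall_accuracy(m):
    """Share of correctly classified sites (diagonal of the error matrix)."""
    return (m["TP"] + m["TN"]) / sum(m.values())


# Hypothetical dummy-coded site: sandy soil present, medium diameter absent.
p = failure_probability(-1.0, {"sandy_soil": 2.0, "medium_dbh": -1.5},
                        {"sandy_soil": 1, "medium_dbh": 0})
m = error_matrix([1, 1, 0, 0], [0.9, 0.4, 0.2, 0.6])
print(round(p, 3), m, overall_accuracy(m))
```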

Shifting Cultivation and Environmental Problems of Nam Khane Watershed, Laos (라오스 남칸(Nam Khane)유역분지(流域盆地)의 이동식화전농업(移動式火田農業)과 환경문제(環境問題))

  • Jo, Myung-Hee;Jo, Hwa-Ryong
    • Journal of the Korean association of regional geographers
    • /
    • v.1 no.1
    • /
    • pp.93-101
    • /
    • 1995
  • The Nam Khane watershed in northern Laos consists of a limestone plateau surrounded by steep slopes (above 1000 m), wide piedmont hill land (300-700 m), and a narrow alluvial plain. Opium is cultivated on the plateau and upland rice on the hillsides, but this shifting agricultural activity, which degrades the forest and soil, has caused serious environmental problems. A MOS-1 satellite image and soil samples from 40 points were analyzed to identify the distribution of shifting cultivation and to evaluate the environmental problems of the Nam Khane watershed. The land use classification map is presented in Photo 2, and the area of each land use by elevation level and by soil property is shown in Tables 2 and 3, respectively. Excessive shifting cultivation in the Nam Khane watershed not only decreased the forest area but also converted primary tree forest into secondary shrub woodland. In terms of soil properties, it accelerated soil and gully erosion and acidification. To solve these environmental problems, the most important step is to move agriculture from shifting cultivation to permanent cropping.

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.173-190
    • /
    • 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance problems. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from majority class data. To validate the proposed approach, we applied it to data from H Bank's non-external auditing companies in Korea and compared the performance of classifiers trained on the proposed under-sampled data with that of classifiers trained on randomly sampled data. The empirical results show that the proposed under-sampling approach generally improves the accuracy of classifiers such as logistic regression, discriminant analysis, decision tree, and support vector machines. They also show that the proposed under-sampling approach reduces the risk of false negative errors, which incur higher misclassification costs.
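
The outlier-elimination half of the hybrid approach relies on reverse nearest neighbors: a point that appears in few or no other points' k-nearest-neighbor lists is isolated. The sketch below shows this k-RNN step only (the OCSVM step is not reproduced); the removal threshold `min_count` and the toy points are assumptions, since the abstract does not give the exact rule.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    return [j for _, j in dists[:k]]


def reverse_nn_counts(points, k):
    """For each point, how many other points list it among their k-NN."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in knn_indices(points, i, k):
            counts[j] += 1
    return counts


def remove_outliers_krnn(points, k, min_count=1):
    """Keep points with at least min_count reverse nearest neighbors.

    min_count is a hypothetical parameter; points below it are treated as outliers.
    """
    counts = reverse_nn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Hypothetical majority-class samples with one isolated point.
majority = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(remove_outliers_krnn(majority, k=2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```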

A Study on the Change of the Plant Community Structure for Five years in Puk'ansan National Park (북한산 국립공원 삼림군집구조의 5년간 변화 연구)

  • 최송현;이경재
    • Korean Journal of Environment and Ecology
    • /
    • v.7 no.1
    • /
    • pp.35-48
    • /
    • 1993
  • To compare ecological succession stages between 1987 and 1992, this study was conducted on Mt. Puk'an, where 26 sample plots of 500 m² were set up. The results are summarized as follows: 1. To analyze plant community structure, TWINSPAN classification and CCA, DCA, and RA ordination were applied to the study area. The plant communities of Mt. Puk'an were divided into 4 groups by altitude: Robinia pseudo-acacia-Quercus spp. community, mixed forest community, Q. serrata community, and Q. mongolica community. In the canopy layer, the successional trends of tree species above 500 m seem to be from Pinus densiflora to Q. mongolica, and below 500 m from Robinia pseudo-acacia through Quercus acutissima, Q. mongolica, Prunus sargentii, and Sorbus alnifolia to Q. serrata. In the understory and shrub layers, the successional trends seem to be from Corylus sieboldiana and Zanthoxylum schinifolium through Rhus trichocarpa, Rhododendron mucronulatum, and Rh. schlippenbachii to Acer pseudo-sieboldianum. 2. Compared with the 1987 successional trends, no evidence of advanced succession was obtained in 1992. It was postulated that succession has not progressed because of human disturbance and air pollution.

Forest Vegetation Classification and Species Composition of Mt. Ilwol, Yeongyang-Gun, Korea (일월산 산림식생의 종구성적 특성)

  • Lee Jung-Hyo;Bae Kwan-Ho;Cho Hyun-Je
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.8 no.3
    • /
    • pp.132-140
    • /
    • 2006
  • Forest vegetation classification and species composition of Mt. Ilwol, Yeongyang-Gun, Korea, were studied by combining the Braun-Blanquet approach with numerical syntaxonomic analysis (TWINSPAN). Vegetation types and ecological characteristics such as flora, constancy classes, life-form species ratios, species diversity, and importance value were analyzed. Sixty-eight samples were taken, each from a 100 m² square plot. Forest communities were identified as two major types: arid mountainside landforms (AM) and humid, fertile piedmont and valley sites (HP). The former was divided into 3 communities (Rhododendron mucronulatum, Quercus variabilis, and Hosta capitata communities) and 2 groups, and the latter into 3 communities (Tilia amurensis, Vitis coignetiae, and Philadelphus schrenckii communities) and 2 groups. Vegetation was thus classified into 8 units. Floristically, the most represented family was Compositae with 26 species. The species with a percentage constancy of more than 61% was Quercus mongolica (72.1%, IV); Carex siderosticta (III) and Fraxinus rhynchophylla (III) had 50.0 and 41.1%, respectively. Life-form species ratios for trees, subtrees, shrubs, vines, graminoids, forbs, and ferns were 18.5, 5.7, 14.9, 6.6, 8.8, 42.4, and 3.1%, respectively. In species diversity, the HP type ranged from 1.70 ± 0.50 to 1.97 ± 0.57 and the AM type from 1.40 ± 0.18 to 1.62 ± 0.20; the former type therefore showed higher species diversity than the latter. According to importance value analysis, Pinus densiflora, Quercus mongolica, and Q. variabilis were highest in the tree layer, Q. mongolica in the subtree layer, Fraxinus sieboldiana, R. schlippenbachii, etc. in the shrub layer, and Carex siderosticta, Carex humilis, etc. in the herb layer.
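
The abstract reports species diversity values (for example 1.70 ± 0.50) without naming the index. Assuming it is the Shannon index H' computed with the natural logarithm (an assumption; some vegetation studies use base 10 or base 2), a per-plot value could be computed from species abundances as follows; the abundance lists are hypothetical.

```python
import math


def shannon_diversity(abundances):
    """H' = -sum(p_i * ln p_i) over species with non-zero abundance."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


# Hypothetical plot abundances (e.g. stem counts per species).
even_plot = [10, 10, 10, 10]    # four equally abundant species
uneven_plot = [37, 1, 1, 1]     # same richness, one dominant species
print(shannon_diversity(even_plot), shannon_diversity(uneven_plot))
```

With equal richness, the plot dominated by a single species has the lower H', which is why the index reflects evenness as well as species count.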

The Structure of Plant Community on Orimok, Yongsil and Donnaeko Area in Mt. Halla (한라산 어리목, 영실, 돈내코지역의 식물군집구조)

  • 이경재;류창희;최송현
    • Korean Journal of Environment and Ecology
    • /
    • v.6 no.1
    • /
    • pp.25-43
    • /
    • 1992
  • A survey of the Orimok, Yongsil, and Donnaeko area forests of Mt. Halla was conducted using 71 sample plots of 500 m². In the analysis of actual vegetation, the Carpinus tschonoskii and C. laxiflora community covered 53.7%, the Quercus grosseserrata - Q. serrata community 25.8%, the Pinus densiflora community 8.3%, and the Abies koreana community 4.5%; together these communities covered 92.2% of the Mt. Halla forest. Vegetation with human-disturbance degrees of 8, 9, and 10 covered 64.5, 28.6, and 6.9%, respectively. TWINSPAN classification and DCA ordination were applied to classify the study area into several groups based on woody plants and environmental variables. Both techniques divided the plant communities into groups by aspect and altitude: C. tschonoskii community, C. tschonoskii - Q. serrata community, P. densiflora - C. tschonoskii community, P. densiflora - C. laxiflora community, C. laxiflora community, C. laxiflora - Daphniphyllum macropodum - Eurya japonica community, and P. densiflora community. According to both techniques, the successional trends of tree species seem to be from P. densiflora and Sorbus alnifolia through Q. serrata and Maackia fauriei to C. tschonoskii in the Orimok and Yongsil areas, and from P. densiflora to C. laxiflora in the Donnaeko area. There was no difference between the DCA stand scores and environmental variables.
