• Title/Summary/Keyword: tree classification method

Study on the ensemble methods with kernel ridge regression

  • Kim, Sun-Hwa;Cho, Dae-Hyeon;Seok, Kyung-Ha
    • Journal of the Korean Data and Information Science Society / v.23 no.2 / pp.375-383 / 2012
  • Ensemble methods aim to increase prediction accuracy by combining many classifiers. Recent studies have shown that random forests and forward stagewise regression achieve good accuracy in classification problems. However, they have large prediction errors near the separation boundary because they use decision trees as base learners. In this study, we use kernel ridge regression instead of decision trees in random forests and boosting. The usefulness of the proposed ensemble methods is shown by simulation results on the prostate cancer and Boston housing data.
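The idea of swapping the tree base learner for kernel ridge regression can be illustrated with a minimal sketch. It shows plain bootstrap aggregation (bagging) of kernel ridge regressors in pure Python, not the authors' exact random forest or boosting procedure. The function names, the RBF kernel, and the parameters `lam` and `gamma` are assumptions made for illustration.

```python
import math
import random

def rbf(a, b, gamma):
    """RBF kernel between two feature vectors (kernel choice is an assumption)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_krr(X, y, lam, gamma):
    """Kernel ridge regression: alpha = (K + lam * I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, y)
    return lambda x: sum(a * rbf(x, xi, gamma) for a, xi in zip(alpha, X))

def bagged_krr(X, y, n_models=10, lam=0.1, gamma=1.0, seed=0):
    """Average the predictions of KRR models fitted on bootstrap samples."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_krr([X[i] for i in idx], [y[i] for i in idx], lam, gamma))
    return lambda x: sum(m(x) for m in models) / len(models)
```

Calling `bagged_krr(X, y)` returns a prediction function. For a two-class problem, one option would be to code the classes as -1 and +1 and take the sign of the averaged output.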

Rich Transcription Generation Using Automatic Insertion of Punctuation Marks (자동 구두점 삽입을 이용한 Rich Transcription 생성)

  • Kim, Ji-Hwan
    • MALSORI / no.61 / pp.87-100 / 2007
  • A punctuation generation system that combines prosodic information with acoustic and language model information is presented. Experiments were first conducted on reference text transcriptions. In these experiments, prosodic information was shown to be more useful than language model information. When these information sources were combined, an F-measure of up to 0.7830 was obtained for adding punctuation to a reference transcription. This method of punctuation generation can also be applied to the 1-best output of a speech recogniser. The 1-best output is first time-aligned. Based on the time alignment information, prosodic features are generated. As in the approach used for reference transcriptions, the best sequence of punctuation marks for this 1-best output is found using the prosodic feature model and a language model trained on texts that contain punctuation marks.

Defect structure classification of neutron-irradiated graphite using supervised machine learning

  • Kim, Jiho;Kim, Geon;Heo, Gyunyoung;Chang, Kunok
    • Nuclear Engineering and Technology / v.54 no.8 / pp.2783-2791 / 2022
  • Molecular dynamics simulations were performed to predict the behavior of graphite atoms under neutron irradiation using the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) package with the adaptive intermolecular reactive empirical bond order (AIREBOM) potential. Defect structures of graphite were compared with results from previous studies based on density functional theory (DFT) calculations. The quantitative relation between primary knock-on atom (PKA) energy and irradiation damage in graphite was calculated, and the effect of PKA direction on the number of defects was estimated by counting displaced atoms. Defects are classified into four groups: structural defects, energy defects, vacancies, and near-defect structures. Structural defects are further subdivided into six types by the decision tree method, a supervised machine learning technique.

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.89-95 / 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially with the increasing number of patient files. In this paper, breast cancer data is collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data is imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques are applied to this imbalanced data (resampling, attribute selection, and handling of missing values), and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random Subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy of 95.2797% among all the algorithms, followed by Bagging with J48 (90.559%) and Random Subspace with J48 (84.2657%). The imbalanced breast cancer dataset was classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.
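The abstract does not say which resampling scheme was used. As one common possibility, random oversampling of the minority class can be sketched as follows. The function name `random_oversample` and the toy labels are hypothetical, and this is an illustration of the general technique rather than the paper's pipeline.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many samples as the largest class (assumed resampling scheme)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # draw with replacement
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data: 8 "no-recurrence" records and 2 "recurrence" records.
rows = [[i] for i in range(10)]
labels = ["no-recurrence"] * 8 + ["recurrence"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Resampling would normally be applied to the training split only, so that duplicated records do not leak into the evaluation data.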

Analysis on the Forest Community of Daewon Vally in Mt. Chiri by the Classification and Ordination Techniques (Classification 및 Ordination 방법에 의한 지리산 대원계곡의 삼림군집구조 분석)

  • 이경재;구관효;최재식;조현서
    • Korean Journal of Environment and Ecology / v.5 no.1 / pp.54-67 / 1991
  • To investigate the structure of the plant community of the Daewon valley forest in Mt. Chiri, eighty-nine plots were set up by the clumped sampling method. Classification by TWINSPAN and DCA ordination were applied to the study area in order to classify the plots into several groups based on woody plants and environmental variables. The classification was successfully overlaid on an ordination of the same data using DCA. The plots can be classified into five groups by TWINSPAN and DCA: the Pinus densiflora community, the Quercus variabilis-Q. serrata community, the Carpinus laxiflora community, the Q. mongolica community, and the Cornus controversa-Q. mongolica community. The successional trends of tree species in the canopy layer, according to both techniques, seem to run from P. densiflora through Q. variabilis and Q. serrata to C. laxiflora at low altitude, and from Q. mongolica to C. controversa at high altitude. Analysis of the relationship between the DCA stand scores and environmental variables showed that soil moisture, the amount of soil humus, and soil nutrients tended to increase significantly from the P. densiflora community to the C. laxiflora community.

Detection of Depression Trends in Literary Cyber Writers Using Sentiment Analysis and Machine Learning

  • Faiza Nasir;Haseeb Ahmad;CM Nadeem Faisal;Qaisar Abbas;Mubarak Albathan;Ayyaz Hussain
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.67-80 / 2023
  • Nowadays, psychologists consider social media an important tool for examining mental disorders. Among these disorders, depression is one of the most common yet least cured diseases. Since many writers with extensive followings express their feelings on social media and depression is increasing significantly, exploring the literary text shared on social media may reveal multidimensional features of depressive behaviors: (1) Background: Several studies have observed that depressive data contains certain language styles and self-expressing pronouns, but the current study provides evidence that posts containing self-expressing pronouns and depressive language styles carry high emotional temperatures. Therefore, the main objective of this study is to examine the posts of literary cyber writers to discover symptomatic signs of depression. For this purpose, our research focuses on extracting data from writers' public social media pages, blogs, and communities; (3) Results: To examine the emotional temperatures and sentence usage of the depressive and non-depressive groups, we employed the SentiStrength algorithm as a psycholinguistic method, TF-IDF and N-Gram for ranked phrase extraction, and Latent Dirichlet Allocation for topic modelling of the extracted phrases. The results reveal a strong connection between depression and negative emotional temperatures in writers' posts. Moreover, we used Naïve Bayes, Support Vector Machines, Random Forest, and Decision Tree algorithms to validate the classification into depressive and non-depressive in terms of sentences, phrases, and topics. The results reveal that, compared with the others, the Support Vector Machines algorithm validates the classification with the highest f-score of 79%; (4) Conclusions: Experimental results show that the proposed system performed well in detecting depression trends in literary cyber writers using sentiment analysis.

Analysis of Feature Importance of Ship's Berthing Velocity Using Classification Algorithms of Machine Learning (머신러닝 분류 알고리즘을 활용한 선박 접안속도 영향요소의 중요도 분석)

  • Lee, Hyeong-Tak;Lee, Sang-Won;Cho, Jang-Won;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.2 / pp.139-148 / 2020
  • The most important factor affecting the berthing energy generated when a ship berths is the berthing velocity. Thus, an accident may occur if the berthing velocity is extremely high. Several ship features influence the determination of the berthing velocity. However, previous studies have mostly focused on the size of the vessel. Therefore, the aim of this study is to analyze various features that influence berthing velocity and determine their respective importance. The data used in the analysis was based on the berthing velocity of a ship on a jetty in Korea. Using the collected data, machine learning classification algorithms were compared and analyzed, such as decision tree, random forest, logistic regression, and perceptron. As an algorithm evaluation method, indexes according to the confusion matrix were used. Consequently, perceptron demonstrated the best performance, and the feature importance was in the following order: DWT, jetty number, and state. Hence, when berthing a ship, the berthing velocity should be determined in consideration of various features, such as the size of the ship, position of the jetty, and loading condition of the cargo.
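The abstract does not list which confusion-matrix indexes were used. A minimal sketch of the usual ones (accuracy, precision, recall, F1) for a binary problem is given below. The function name and the toy labels are hypothetical.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Compute common indexes from a binary confusion matrix."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels: 1 = high berthing velocity, 0 = normal (hypothetical coding).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_metrics(y_true, y_pred))
```

Comparing several classifiers on the same held-out labels with a function of this kind is one way to rank algorithms such as the decision tree, random forest, logistic regression, and perceptron named above.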

Spatial Distribution of Major Soil Types in Korea and an Assessment of Soil Predictability Using Soil Forming Factors (한국 주요 토양유형의 공간적 분포와 토양형성요인을 이용한 예측가능성 평가)

  • Park, Soo-Jin;Sonn, Yeon-Kyu;Hong, Suk-Young;Park, Chan-Won;Zhang, Yong-Seon
    • Journal of the Korean Geographical Society / v.45 no.1 / pp.95-118 / 2010
  • This study aims to investigate the spatial distribution of major soil types in Korea and to assess the ability to predict soil distribution using environmental variables. A classification tree method was used to assess soil predictability. While the great soil groups give a more intuitive understanding of spatial distribution, their predictability from environmental factors is much lower than that of the great groups. The most important factor determining the spatial distribution of major soil types is the geomorphological character of Korea, which shows a distinctive morphological difference between mountains and plains. The spatial distribution of climatic variables and the catenary soil sequence along slopes play additional roles in determining the distribution of soil types. The classification tree models achieved prediction accuracies of 35-75%, depending on the combination of environmental variables included in the models. While geomorphological variables are the best predictors for the great groups, climatic variables perform better for the great soil groups.
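A classification tree is grown by repeatedly choosing the split that best separates the classes. The sketch below finds one such split on a single numeric variable using Gini impurity. The abstract does not state the splitting criterion, so Gini is an assumption here, and the elevation values and soil labels are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one
    numeric variable, or (None, parent_gini) if no split improves it."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical elevations (m) and soil classes for six sites.
elevation = [100, 150, 200, 600, 700, 800]
soil = ["plain", "plain", "plain", "mountain", "mountain", "mountain"]
print(best_split(elevation, soil))  # (400.0, 0.0)
```

A full tree would apply `best_split` across all environmental variables and recurse on each side until a stopping rule is met.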

Machine Learning Based Structural Health Monitoring System using Classification and NCA (분류 알고리즘과 NCA를 활용한 기계학습 기반 구조건전성 모니터링 시스템)

  • Shin, Changkyo;Kwon, Hyunseok;Park, Yurim;Kim, Chun-Gon
    • Journal of Advanced Navigation Technology / v.23 no.1 / pp.84-89 / 2019
  • This is a pilot study of a machine learning based structural health monitoring system using flight data of composite aircraft. In this study, the most suitable machine learning algorithm for structural health monitoring was selected, and a dimensionality reduction method for application to actual flight data was evaluated. For these tasks, an impact test was conducted on a cantilever beam with added mass, which simulates damage in an aircraft wing structure, and a classification model for damage states (damage location and level) was trained. Through a vibration test of the cantilever beam with a fiber Bragg grating (FBG) sensor, data for the normal state and 12 damaged states were acquired, and the most suitable algorithm was selected by comparing algorithms such as tree, discriminant, support vector machine (SVM), kNN, and ensemble. In addition, dimensionality reduction, which is necessary to deal with high-dimensional flight data, was performed through neighborhood component analysis (NCA) feature selection. As a result, quadratic SVMs performed best, with 98.7% without NCA and 95.9% with NCA. It is also shown that applying NCA improved prediction speed, training time, and model memory.

Landslide Susceptibility Analysis in Jeju Using Artificial Neural Network(ANN) and GIS (인공신경망기법과 GIS를 이용한 제주도 산사태 취약성분석)

  • Quan, He-Chun;Lee, Byung-Gul;Cho, Eun-Il
    • Journal of Environmental Science International / v.17 no.6 / pp.679-687 / 2008
  • In this study, we analyzed the landslide distribution of Jeju Island using an ANN and GIS. To do this, we first extracted contour lines from a 1:2,5000 digital map and used them to build the DEM for evaluating landslide susceptibility. Next, we derived slope and aspect maps from the DEM and obtained a land use map from Landsat 7 images using the ISODATA classification method. In the computation process of the landslide analysis, we classified the soil map, tree diameter map, isohyet map, geological map, and so on. Finally, we applied the ANN method to the landslide analysis and calculated its weight values. The GIS results were calculated using the ArcView program, and the Jeju landslide susceptibility map was produced using the Weighted Overlay method. Based on our results, the areas relatively vulnerable to landslides were concentrated near the top of Halla Mountain.