• Title/Summary/Keyword: tree classification


Predicting Corporate Bankruptcy using Simulated Annealing-based Random Forest (시뮬레이티드 어니일링 기반의 랜덤 포레스트를 이용한 기업부도예측)

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.155-170 / 2018
  • Predicting a company's financial bankruptcy is traditionally one of the most crucial forecasting problems in business analytics. In previous studies, prediction models have been proposed by applying or combining statistical and machine learning-based techniques. In this paper, we propose a novel intelligent prediction model based on simulated annealing, one of the well-known optimization techniques. Simulated annealing is known to have optimization performance comparable to that of genetic algorithms. Nevertheless, since there has been little research applying simulated annealing to prediction and classification problems in business decision-making, it is meaningful to confirm the usefulness of the proposed model in business analytics. In this study, we use a combined model of simulated annealing and machine learning to select the input features of the bankruptcy prediction model. Typical ways of combining optimization and machine learning techniques are feature selection, feature weighting, and instance selection; this study proposes a combined model for feature selection, the most widely studied of the three. To confirm the superiority of the proposed model, we apply it to real-world financial data of Korean companies and analyze the results. The results show that the predictive accuracy of the proposed model is better than that of the naïve model. Notably, its performance is significantly improved compared with traditional decision trees, random forests, artificial neural networks, SVM, and logistic regression analysis.
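The abstract does not spell out the search procedure. As a rough illustration of simulated annealing applied to feature selection, the Python sketch below treats a candidate solution as a boolean mask over input features, proposes a neighbour by toggling one feature, and accepts a worse mask with probability exp(Δ/T) under geometric cooling. The function name `sa_feature_selection`, the schedule parameters, and the scoring function `toy_fitness` are hypothetical; in the setting described above the score would presumably be a validation metric of the downstream classifier (for example, a random forest) trained on the selected features.

```python
import math
import random
from typing import Callable, List, Tuple

Mask = List[bool]


def sa_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    t_start: float = 1.0,
    t_end: float = 1e-3,
    cooling: float = 0.95,
    steps_per_temp: int = 20,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Maximise `fitness` over feature subsets with simulated annealing."""
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = fitness(current)
    best, best_score = current[:], current_score
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Neighbour move: toggle one randomly chosen feature in or out.
            candidate = current[:]
            i = rng.randrange(n_features)
            candidate[i] = not candidate[i]
            score = fitness(candidate)
            delta = score - current_score
            # Metropolis rule: always accept improvements; accept a worse
            # subset with probability exp(delta / t), which shrinks as t cools.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current, current_score = candidate, score
                if current_score > best_score:
                    best, best_score = current[:], current_score
        t *= cooling  # geometric cooling schedule
    return best, best_score


# Hypothetical stand-in for a classifier's validation score: features 0 and 2
# help, and every other selected feature costs 0.5.
USEFUL = {0, 2}


def toy_fitness(mask: Mask) -> float:
    return sum((1.0 if i in USEFUL else -0.5) for i, on in enumerate(mask) if on)


if __name__ == "__main__":
    print(sa_feature_selection(5, toy_fitness))
```

Tracking the best mask separately from the current one matters because the chain deliberately wanders into worse subsets at high temperature.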

Naming and Object Specifying of Dangsan Forests and Bibo Forests Designated as Natural Monument (천연기념물 지정 당산숲·비보숲의 명칭 부여 및 지정 물량 실태 고찰)

  • Choi, Jai Ung;Kim, Dong Yeob
    • Korean Journal of Heritage: History & Science / v.43 no.1 / pp.28-55 / 2010
  • Currently, Korea's system for naming and designating natural monuments is based on the "Chosun Natural Monument Conservation Acts for Treasure, Ancient Landmark, and Natural Beauty", enacted in 1934 during the Japanese colonial period. The framework of this system is still in effect, which has been pointed out as a problem. Dangsan forests and Bibo forests are traditional Korean cultural resources representative of the Korean countryside. For naming and classifying natural monuments, the Cultural Heritage Administration follows and relies on 'Limsu of Chosun' (1938), a report written by a Japanese author. A Dangsan forest at Yesong-ri was named "Yesong-ri evergreen forest" in 1938, and this "evergreen forest" naming convention has been followed to this day. The objective of this study is to review the issues and problems of 'Limsu of Chosun' and of the natural monument naming system begun during the Japanese occupation period, and to suggest an alternative to the current practice of naming natural monuments by following that system without discretion. Eighteen Dangsan and Bibo forests were selected for examination and analysis, and their names were evaluated to determine whether the various aspects of each forest are reflected in its name. The study suggests that many forests and old trees designated as natural monuments should be named consistently as "~Dangsan forest", "~Dangsan forest Bibo forest", or "~Dangsan tree". The new names would provide momentum for overcoming the limitations of the natural monument naming system that has continued since the Japanese occupation period, and would also enhance the value of Dangsan forests and Bibo forests as traditional Korean cultural landscapes.

Vegetation structure and distribution characteristics of Symplocos prunifolia, a rare evergreen broad-leaved tree in Korea

  • Kim, Yangji;Song, Kukman;Yim, Eunyoung;Seo, Yeonok;Choi, Hyungsoon;Choi, Byoungki
    • Journal of Ecology and Environment / v.44 no.4 / pp.275-285 / 2020
  • Background: In Korea, Symplocos prunifolia Siebold & Zucc. is found only on Jeju Island. Conservation of the species is difficult because little is known about its distribution and natural habitat. The lack of research and survey data on the native vegetation and distribution of this species means that there is insufficient information to guide the management and conservation of the species and its associated vegetation. This study therefore aims to identify the distribution of S. prunifolia and the vegetation associated with it. Results: Field investigations confirmed that native S. prunifolia communities were distributed in four areas on the southern side of Mt. Halla, within the evergreen broad-leaved forest zones. These forest zones lie within the warm temperate zone, and the communities are distributed along valley sides at elevations between 318 and 461 m. S. prunifolia was found only on the south side of Mt. Halla, mainly on south-facing slopes, although small communities were also found growing on northwest-facing slopes. S. prunifolia was confirmed to be a rare but important constituent species of the evergreen broad-leaved forest of Jeju. In the S. prunifolia community, the mean importance percentage was 48.84 for Castanopsis sieboldii, 17.79 for Quercus acuta, and 12.12 for Pinus thunbergii, while S. prunifolia itself ranked ninth (2.6). Conclusions: S. prunifolia grows along the natural streams of Jeju, where there is little anthropogenic influence and where the streams disturb the soil through natural processes of erosion and sediment deposition. The native area of S. prunifolia is currently about 3,300 m², with a confirmed population of 180 individual plants. Because of this small population, the species falls into the category of extremely endangered plants in Korea.
At some native sites an evergreen broad-leaved forest canopy had formed, but the frequency and coverage of the species were low. Factors contributing to the limited distribution of this species include a lack of shade tolerance, low fruiting rates, small native areas, and specialized habitats, as well as a requirement for adequate stream disturbance. Given ongoing climate change, it is currently unclear whether the population and habitat area of this species will increase or whether it will remain endangered in Korea. What is clear, however, is that preserving the present native habitats and population is extremely important if the population is to be maintained and expanded. This is also meaningful for the stable conservation of biodiversity in Korea. Based on the results of this study, therefore, a systematic evaluation of habitat preservation and conservation, and of vegetation management methods for S. prunifolia, should be conducted.

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3 (기계학습과 GPT3를 사용한 조작된 리뷰의 탐지)

  • Chernyaeva, Olga;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.347-364 / 2022
  • Fraudulent companies or sellers strategically manipulate reviews to influence customers' purchase decisions, so the reliability of reviews has become crucial for customer decision-making. Because customers increasingly rely on online reviews for detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. The main problem, however, is the difficulty of obtaining enough manipulated-review data to train machine learning models. Moreover, manipulated reviews are far fewer than non-manipulated ones, which creates a class imbalance problem. The under-represented minority class can hamper a model's accuracy, so solving the class imbalance problem is important for building an accurate model for detecting manipulated reviews. We therefore propose an OpenAI-based review generation model that addresses the imbalance of manipulated reviews and thereby enhances the accuracy of manipulated review detection. In this research, we applied GPT-3, a novel autoregressive language model, to generate reviews based on manipulated reviews. We found that using GPT-3 to oversample manipulated reviews recovers a satisfactory portion of the performance losses and yields better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.
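For contrast with the GPT-3 approach, the SMOTE baseline mentioned above can be sketched in a few lines. This is a simplified variant (the base sample is drawn at random rather than iterating over every minority sample), written in plain Python with hypothetical names `smote` and `_sq_dist`; it assumes reviews have already been turned into numeric feature vectors, which the abstract does not describe.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority vectors by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base` (itself excluded).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors for four manipulated reviews.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote(minority, n_new=3, k=2))
```

Because every synthetic point lies on a segment between two existing minority points, SMOTE can only fill in the feature space the minority class already covers; generating new review text is one way to go beyond that limit.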

Classification and Spatial Distribution of Forest Vegetation Types in Yokjido Island, Korea (욕지도(경남) 산림식생 유형구분과 공간분포 특성)

  • Lee, Bora;Lee, Ho-Sang;Kim, Jun-Soo;Cho, Joon-Hee;Oh, Seung-Hwan;Cho, Hyun-Je
    • Journal of Korean Society of Forest Science / v.111 no.3 / pp.345-356 / 2022
  • Yokjido is a 15 km² inhabited island located at the tip of the southeastern coast of the Korean Peninsula. Its forest is mostly composed of substitutional vegetation. Our aim was to provide the basic information necessary for the conservation and management of the forest vegetation on Yokjido. We classified the existing vegetation types using the methods of the Zurich-Montpellier school of phytosociology. The resulting vegetation map shows the dominant tree species in the top canopy layer. A total of 8 vegetation types were identified and arranged into a vegetation unit hierarchy of 2 communities, 4 sub-communities, 6 variants, and 2 subvariants. Comparisons among the types showed differences of varying magnitude in floristic composition, reflecting anthropogenic influences, site conditions, succession stages, and the establishment period. Moreover, the vegetation types differed significantly in species diversity indices; in particular, overall species richness, species diversity, and species evenness tended to increase significantly with elevation. The herbaceous plant species showed the highest positive correlation with x. These results are consistent with those of McCain, who reported that species diversity increases in mountainous areas of relatively low elevation owing to the mid-domain effect. Forest succession on Yokjido may enter a mixed-forest stage and then proceed to a fully evergreen broad-leaved forest.
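The abstract does not say which diversity and evenness indices were computed. Assuming the commonly used Shannon-Wiener index H′ (natural log) and Pielou's evenness J′ = H′ / ln S, where S is species richness, a per-plot calculation might look like the sketch below; the function name and the input abundances are hypothetical.

```python
import math
from typing import Dict, Sequence


def diversity_indices(abundances: Sequence[float]) -> Dict[str, float]:
    """Species richness S, Shannon-Wiener H' (natural log) and Pielou's J' = H'/ln S."""
    present = [a for a in abundances if a > 0]
    richness = len(present)
    total = sum(present)
    if richness == 0:
        return {"richness": 0.0, "shannon": 0.0, "evenness": 0.0}
    shannon = -sum((a / total) * math.log(a / total) for a in present)
    # J' is undefined for a single species; 0.0 is used here as a convention.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return {"richness": float(richness), "shannon": shannon, "evenness": evenness}


if __name__ == "__main__":
    # Hypothetical abundances (e.g. cover values) of species in one plot.
    print(diversity_indices([12, 5, 5, 1, 0]))
```

Computing these values per plot and correlating them with each plot's elevation is one way to test the elevation trend described above.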

Diagnosis of Residual Tumors after Unplanned Excision of Soft-Tissue Sarcomas: Conventional MRI Features and Added Value of Diffusion-Weighted Imaging

  • Jin, Kiok;Lee, Min Hee;Yoon, Min A;Kim, Hwa Jung;Kim, Wanlim;Chee, Choong Geun;Chung, Hye Won;Lee, Sang Hoon;Shin, Myung Jin
    • Investigative Magnetic Resonance Imaging / v.26 no.1 / pp.20-31 / 2022
  • Purpose: To assess the conventional MRI features associated with residual soft-tissue sarcomas following unplanned excision (UPE), and to compare the diagnostic performance of conventional MRI alone with that of MRI including diffusion-weighted imaging (DWI) for residual tumors after UPE. Materials and Methods: This retrospective study included 103 consecutive patients who underwent UPE of a soft-tissue sarcoma followed by wide excision of the tumor bed between December 2013 and December 2019, and who also underwent conventional MRI and DWI. Two musculoskeletal radiologists reviewed the MRI, including DWI, for focal enhancement, soft-tissue edema, fascial enhancement, fluid collections, and hematoma. We used classification and regression tree (CART) analysis to identify the most significant MRI features, and compared the diagnostic performance of conventional MRI with and without added DWI using the McNemar test. Results: Residual tumors were present in 69 (67.0%) of 103 patients, whereas no tumor was found in 34 (33.0%). CART identified focal enhancement as the most significant predictor of residual tumors and correctly predicted residual tumor status in 81.6% (84/103) and 78.6% (81/103) of patients for Reader 1 and Reader 2, respectively. For Reader 1, adding DWI to conventional MRI improved specificity (32.8% vs. 56%, 33.3% vs. 63.0%, P < 0.05) and decreased sensitivity (96.8% vs. 84.1%, 98.7% vs. 76.7%, P < 0.05), without a difference in diagnostic accuracy (76.7% vs. 74.8%, 72.9% vs. 71.4%), in total and in subgroups. For Reader 2, diagnostic performance did not differ significantly between the two MRI sets (P > 0.05). Conclusion: After UPE of a soft-tissue sarcoma, the presence or absence of focal enhancement was the most significant MRI finding for predicting residual tumors. MRI provided good diagnostic accuracy for detecting residual tumors, and adding DWI to conventional MRI may increase specificity.
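As a generic illustration of the statistics named above (not the authors' analysis code), the sketch below computes sensitivity, specificity, and accuracy from a 2×2 confusion matrix, plus the exact and continuity-corrected McNemar tests for two paired reading protocols. The function names are hypothetical, and it assumes the discordant counts b and c are tallied on the relevant patient subset (for example, only patients with residual tumor when comparing sensitivity); the abstract does not state how the pairs were formed.

```python
import math
from typing import Dict, Tuple


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: cases read correctly with protocol A but not B; c: the reverse.
    Under H0 the discordant pairs split 50/50, so this is a binomial test.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> Tuple[float, float]:
    """Continuity-corrected McNemar chi-square statistic and p-value (1 df)."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    print(diagnostic_metrics(tp=60, fp=15, tn=19, fn=9))
    print(mcnemar_exact(2, 11), mcnemar_chi2(2, 11))
```

Only the discordant pairs enter McNemar's test, which is why two protocols can differ significantly in sensitivity and specificity while their overall accuracy is nearly unchanged.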

Morphogenetic Identification of Eel's Larva (Leptocephalus) Collected by Set net in Namhae, Korea (남해 정치망에서 채집한 엽상자어(Leptocephalus)의 형태 및 유전학적 특성)

  • Chang-Gi Hong;Kyeong-Ho Han
    • Journal of Marine Life Science / v.8 no.2 / pp.128-135 / 2023
  • This study aimed to determine whether eel larvae were most closely related to the conger (Conger myriaster), the pike conger (Muraenesox cinereus), or four species of Anguilla. Experimental fish were collected by set net in the Gulf of Enggang, Namhae, Korea, from May to June, and their morphological characteristics were compared with those of adult conger, pike conger, and four species of Anguilla. For genetic classification, DNA was isolated and amplified using 12S rRNA and 16S rRNA primer sets. The PCR products were directly sequenced in both directions, and the nucleotide sequences were analyzed using sequence analysis software. Morphological measurements of the eel larvae showed that head length and preanal length, as percentages of total length, were similar to those of the conger. The phylogenetic tree based on the nucleotide sequences also revealed a close relationship to the conger. Therefore, the eel larvae caught in Namhae from May to June were identified as conger larvae.
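The abstract names neither the software nor the tree-building method. One basic ingredient of such sequence comparisons is the uncorrected p-distance, the proportion of differing sites between two aligned sequences. The sketch below computes it and assigns a query to its closest reference; the function names are hypothetical, and the sequences in the example are made-up fragments, not real 12S/16S rRNA data. A real analysis would use properly aligned sequences and a tree-building method such as neighbour-joining.

```python
from typing import Dict, Tuple


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps (pairwise deletion)
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def closest_reference(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Name of the reference with the smallest p-distance to `query`, and that distance."""
    distances = {name: p_distance(query, seq) for name, seq in references.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


if __name__ == "__main__":
    # Made-up aligned fragments for illustration only.
    refs = {"Conger myriaster": "ACGTACGTAC", "Muraenesox cinereus": "ACGTTTGTAA"}
    print(closest_reference("ACGTACGTAA", refs))
```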

Investigating the Performance of Bayesian-based Feature Selection and Classification Approach to Social Media Sentiment Analysis (소셜미디어 감성분석을 위한 베이지안 속성 선택과 분류에 대한 연구)

  • Chang Min Kang;Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.24 no.1 / pp.1-19 / 2022
  • Social media-based communication has become a crucial part of our personal and official lives. It is therefore no surprise that social media sentiment analysis has emerged as an important way for all kinds of companies to detect potential customers' sentiment trends. However, social media sentiment analysis suffers from the huge number of sentiment features obtained in the process. To address this, this study proposes a novel method based on a Bayesian network, in which MBFS (Markov Blanket-based Feature Selection) is used to reduce the number of sentiment features. To show the validity of the proposed model, we used online review data from Yelp, a well-known social media platform for evaluating and recommending restaurants, bars, and beauty salons. We used several benchmark feature selection methods, such as correlation-based feature selection, information gain, and gain ratio. Several machine learning classifiers were also used for validation, such as TAN, NBN, Sons & Spouses BN (Bayesian Network), and Augmented Markov Blanket. Furthermore, we conducted a Bayesian network-based what-if analysis to see how the knowledge map between the target node and related explanatory nodes could yield meaningful insight into the sentiments underlying the target dataset.
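The idea behind Markov blanket-based feature selection can be illustrated once a network structure is available. In a Bayesian network, the Markov blanket of the target consists of its parents, its children, and its children's other parents ("spouses"). Given these nodes, the target is conditionally independent of all remaining nodes, so they form a natural reduced feature set. The sketch below only extracts the blanket from a given DAG; it does not learn the structure from data, and the node names are hypothetical sentiment features, not the study's variables.

```python
from typing import Dict, List, Set


def markov_blanket(parents: Dict[str, List[str]], target: str) -> Set[str]:
    """Markov blanket of `target` in a DAG given as {node: [parent, ...]}.

    The blanket is the target's parents, its children, and its children's
    other parents ("spouses"); given these, the target is conditionally
    independent of every other node in the network.
    """
    blanket = set(parents.get(target, []))
    children = {node for node, ps in parents.items() if target in ps}
    blanket |= children
    for child in children:
        blanket |= set(parents[child])  # spouses: co-parents of each child
    blanket.discard(target)
    return blanket


if __name__ == "__main__":
    # Hypothetical network: sentiment-word nodes around a "rating" target.
    dag = {
        "rating": ["delicious"],
        "delicious": ["menu"],
        "recommend": ["rating", "service"],
        "service": [],
        "menu": [],
        "revisit": ["recommend"],
    }
    print(sorted(markov_blanket(dag, "rating")))
```

In this toy graph, "menu" (a grandparent) and "revisit" (a grandchild) are excluded, because their influence on the target is fully mediated by nodes already in the blanket.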

Verification Test of High-Stability SMEs Using Technology Appraisal Items (기술력 평가항목을 이용한 고안정성 중소기업 판별력 검증)

  • Jun-won Lee
    • Information Systems Review / v.20 no.4 / pp.79-96 / 2018
  • This study focuses on internalizing the technology appraisal model into the credit rating model, reflecting those technology appraisal items that relate to the financial stability of enterprises, in order to increase the discriminative power of the credit rating model not only for SMEs but for all companies. Specifically, it aims to verify whether the technology appraisal model can be applied to identify high-stability SMEs in advance. We grouped companies by industry (manufacturing vs. non-manufacturing) and by company age (initial vs. non-initial), and defined a high-stability company as one whose average debt ratio over three years was less than half that of its group. C5.0 was applied to verify the discriminant power of the model. The analysis showed that the importance of sub-items differed by industry type and company age, but at the mid-item level R&D capability was a key variable for discriminating high-stability SMEs. In the early stage after establishment, funding capacity (diversification of funding methods, and capital structure and capital cost taking profitability into account) is an important variable for financial stability. As company age increases, however, technology development infrastructure, which enables continuous performance, becomes an important variable affecting financial stability. The classification accuracy of the model by company age and industry is 71% to 91%, confirming that high-stability SMEs can be identified using technology appraisal items.
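C5.0, like its predecessor C4.5, chooses splits by the information gain ratio: the information gain of an attribute divided by the attribute's own split entropy. This penalizes attributes with many distinct values. The sketch below computes only that criterion for one categorical attribute; it leaves out C5.0's boosting, pruning, and other refinements, and the example item grades and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting `labels` on `attribute`, divided by split info."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(attribute)  # entropy of the partition itself
    return info_gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical grades of one appraisal item vs. high-stability labels.
    grade = ["A", "A", "B", "B", "C", "C"]
    stable = ["yes", "yes", "yes", "no", "no", "no"]
    print(round(gain_ratio(grade, stable), 3))
```

Ranking appraisal items by this criterion at each node is what lets the tree surface variables such as R&D capability as key discriminators.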

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although the marketing department can obtain demographics through online or offline surveys, these approaches are expensive and time-consuming, and they are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics, including search keywords, frequency and intensity by time, day, and month, the variety of websites visited, and text information from the web pages visited. The demographic attributes predicted also vary across papers, covering gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, have been used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job, and, based on previous research, 64 clickstream attributes are used to predict them. The overall predictive model building process consists of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and overfitting, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographics and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results verify that a specific data mining method is well suited to each demographic variable. For example, age is best predicted using decision tree-based dimension reduction with a neural network, whereas gender and marital status are predicted most accurately by SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can thereby be applied to digital marketing.
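The study ran its experiments in IBM SPSS Modeler 17.0. The tool-independent idea behind 5-fold cross-validation can be sketched in plain Python: shuffle the user indices once, deal them into k disjoint folds, and let each fold serve as the test set exactly once while the other folds form the training set. The function name `k_fold_indices` is hypothetical, and this version is not stratified by the demographic label, which the abstract does not specify either way.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits whose test folds partition range(n)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # deal shuffled indices into k folds
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # e.g. 5,000 users as in the experiments described above
    for train, test in k_fold_indices(5000, k=5):
        print(len(train), len(test))
```

The accuracy reported for each candidate model would then be the average over the five held-out folds.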