• Title/Summary/Keyword: tree classification method


Community Structure and Habitat Environment of Genus Liriope Group in Korea (한반도 맥문동속 집단의 자생지 생육환경과 군락구조)

  • Song, Hong-Seon;Lee, Jung-Hoon;Kim, Seong-Min;Shin, Dong-Il;Kim, Chang-Ho;Koo, Han-Mo;Park, Chung-Berm;Park, Yong-Jin
    • Korean Journal of Medicinal Crop Science / v.19 no.1 / pp.24-30 / 2011
  • This study analyzed the vegetation and floristic composition of Liriope platyphylla and Liriope spicata groups in Korea using cluster analysis and phytosociological classification, in order to evaluate their species composition, habitat environment, and community structure. The habitats of L. platyphylla and L. spicata were on southeast-facing slopes with gradients of 6.7 to 8.4%, and habitat altitude differed between L. platyphylla (41.0 m) and L. spicata (114.9 m). L. spicata was distributed over a broader range of habitats than L. platyphylla. A total of 58 and 99 taxa were recorded in the L. platyphylla and L. spicata groups, respectively, and tree-layer coverage was 87.5% and 92.5%, respectively. In the genus Liriope group, the most frequently occurring species were plants of moist valleys, such as Quercus serrata, indicating that Liriope plants grow better in moist shade. The vegetation of the L. platyphylla group was classified into Quercus serrata, Castanopsis sieboldii, Pinus densiflora, and Pinus thunbergii communities, and the L. spicata group was classified into Quercus serrata, Quercus aliena, Quercus acutissima, Prunus verecunda, Robinia pseudoacacia, Pinus densiflora, and Pinus thunbergii communities. In the genus Liriope group, the Quercus serrata and Pinus densiflora communities showed the highest similarity.
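
The abstract reports which communities were most similar but does not state the similarity index used. As a hedged illustration only, the sketch below scores pairs of communities with Sørensen's coefficient on presence/absence species lists, one common choice in vegetation studies; the community labels and species codes are hypothetical placeholders, not data from the study.

```python
from __future__ import annotations

from itertools import combinations


def sorensen_similarity(a: set[str], b: set[str]) -> float:
    """Sørensen (Dice) coefficient 2|A ∩ B| / (|A| + |B|), ranging from 0 to 1."""
    if not a and not b:
        return 1.0  # two empty species lists are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def most_similar_pair(communities: dict[str, set[str]]) -> tuple[str, str, float]:
    """Return the pair of communities with the highest Sørensen similarity."""
    if len(communities) < 2:
        raise ValueError("need at least two communities")
    scored = (
        (x, y, sorensen_similarity(communities[x], communities[y]))
        for x, y in combinations(sorted(communities), 2)
    )
    return max(scored, key=lambda item: item[2])


# Hypothetical presence/absence lists (species codes s1..s6).
communities = {
    "A": {"s1", "s2", "s3", "s4"},
    "B": {"s1", "s2", "s3", "s5"},
    "C": {"s1", "s6"},
}
print(most_similar_pair(communities))  # ('A', 'B', 0.75)
```

Presence/absence data ignore cover values; an abundance-weighted index would be needed if cover matters for the comparison.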

Spam-Filtering by Identifying Automatically Generated Email Accounts (자동 생성 메일계정 인식을 통한 스팸 필터링)

  • Lee Sangho
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.378-384 / 2005
  • In this paper, we describe a novel spam-filtering method that improves the performance of conventional spam-filtering systems. Conventional systems filter email by examining the distribution of words in email headers or bodies. Recently, spammers have begun creating accounts on web-based email services and sending messages that appear not to be spam. Examining the accounts that send such spam, we found a large difference between automatically generated accounts and ordinary ones. Based on this difference, incoming emails are classified as spam or non-spam. To classify emails from account strings alone, we used decision trees, which are widely used for conventional pattern classification problems. We collected about 2.15 million account strings from email service sites, and our account checker achieved an accuracy of 96.3%. Adding the checker to the previous filtering system improved its filtering performance.
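
The abstract does not list the features the account checker extracts from account strings. The following is a minimal sketch, assuming three hypothetical features (string length, digit ratio, and longest run of consonants) and a small CART-style decision tree grown with Gini impurity; the toy accounts and labels are invented for illustration and are not the 2.15 million collected strings.

```python
from __future__ import annotations

from collections import Counter

VOWELS = set("aeiou")


def features(account: str) -> list[float]:
    """Hypothetical features: length, digit ratio, longest consonant run."""
    s = account.lower()
    digits = sum(ch.isdigit() for ch in s)
    longest_run = run = 0
    for ch in s:
        if ch.isalpha() and ch not in VOWELS:
            run += 1
            longest_run = max(longest_run, run)
        else:
            run = 0  # vowels, digits and punctuation break a consonant run
    return [len(s), digits / max(len(s), 1), longest_run]


def gini(labels: list[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (weighted Gini, feature index, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint keeps both sides non-empty
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Leaves are labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:  # all feature vectors are identical
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree


# Toy training data (invented): ordinary vs. generated-looking accounts.
ordinary = ["johnkim", "sarahlee", "minsu.park", "hana_choi"]
generated = ["xkq7r2zt91", "bvtz8841qw", "qwrt5k3x9", "zzkx0912pq"]
X = [features(a) for a in ordinary + generated]
y = ["ham"] * len(ordinary) + ["spam"] * len(generated)
tree = build_tree(X, y)
print(predict(tree, features("mjrk83tvq7")))  # spam
```

On this toy data the root split falls on the digit ratio, so the tree reduces to a "contains digits" rule; real account data would need richer features and a held-out test set to estimate accuracy.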

A Study on the Development of Web-based Expert System for Urban Transit (웹 기반의 도시철도 전문가시스템 개발에 관한 연구)

  • Kim Hyunjun;Bae Chulho;Kim Sungbin;Lee Hoyong;Kim Moonhyun;Suh Myungwon
    • Transactions of the Korean Society of Automotive Engineers / v.13 no.5 / pp.163-170 / 2005
  • Urban transit is a complex system combining electrical and mechanical components, so a maintenance system is needed to ensure safety at high operating speeds and to enable prompt maintenance. An expert system is a computer program that uses numerical or non-numerical domain-specific knowledge to solve problems. In this research, we develop an expert system that quickly diagnoses the causes of failures and displays corrective measures. To develop the expert system, we first standardized the failure code classification system and created a BOM (Bill of Materials). The knowledge base was then constructed by analyzing failure histories and maintenance manuals. To retrieve failure diagnosis and repair procedures linked to the knowledge base, we built an RBR (Rule-Based Reasoning) engine using pattern matching and a CBR (Case-Based Reasoning) engine using similarity search. The system is web-based to maximize accessibility.
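
The abstract names a CBR engine based on similarity search without giving its similarity measure. A minimal sketch, assuming a hypothetical case base of past failures described by three attributes and hand-picked attribute weights, could rank stored cases by weighted exact-match similarity and return the repair measure of the closest case:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Case:
    attributes: dict[str, str]  # e.g. subsystem, fault code, observed symptom
    action: str                 # repair measure recorded for this past failure


# Hypothetical weights: matching subsystem counts most, then fault code, then symptom.
WEIGHTS = {"subsystem": 3.0, "fault_code": 2.0, "symptom": 1.0}


def similarity(query: dict[str, str], case: Case) -> float:
    """Weighted share of attributes whose values match exactly (0 to 1)."""
    matched = sum(
        w for attr, w in WEIGHTS.items()
        if attr in query and query[attr] == case.attributes.get(attr)
    )
    return matched / sum(WEIGHTS.values())


def retrieve(query: dict[str, str], case_base: list[Case], k: int = 1) -> list[Case]:
    """Return the k stored cases most similar to the query."""
    return sorted(case_base, key=lambda c: similarity(query, c), reverse=True)[:k]


# Hypothetical case base; codes and actions are invented for illustration.
case_base = [
    Case({"subsystem": "door", "fault_code": "D-01", "symptom": "does not close"},
         "inspect door motor"),
    Case({"subsystem": "brake", "fault_code": "B-07", "symptom": "slow release"},
         "check brake cylinder pressure"),
    Case({"subsystem": "door", "fault_code": "D-03", "symptom": "does not close"},
         "replace door limit switch"),
]
query = {"subsystem": "door", "fault_code": "D-03", "symptom": "does not close"}
print(retrieve(query, case_base)[0].action)  # replace door limit switch
```

Exact matching keeps the sketch simple; graded similarity (for example, treating fault codes in the same family as partial matches) would be a natural refinement.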

Integrative Analysis of Microarray Data with Gene Ontology to Select Perturbed Molecular Functions using Gene Ontology Functional Code

  • Kim, Chang-Sik;Choi, Ji-Won;Yoon, Suk-Joon
    • Genomics & Informatics / v.7 no.2 / pp.122-130 / 2009
  • A systems biology approach for identifying perturbed molecular functions is required to understand complex progressive diseases such as breast cancer. In this study, we analyze microarray data together with Gene Ontology (GO) molecular function terms to select perturbed molecular functional modules in breast cancer tissues, based on the definition of the Gene Ontology Functional Code. The Gene Ontology consists of three structured vocabularies describing genes and their products in terms of their associated biological processes, cellular components, and molecular functions. It is hierarchically organized as a directed acyclic graph. However, it is difficult to visualize the Gene Ontology as a directed tree, since a GO term may have more than one parent, giving multiple paths from the root. Therefore, we applied the definition of Gene Ontology codes, assigning one or more GO codes to each GO term, to visualize the hierarchical classification of GO terms as a network. The selected molecular functions can be considered perturbed molecular functional modules that putatively contribute to disease progression. We evaluated the method by analyzing a microarray dataset of normal and invasive breast cancer tissues. Using this integrative approach, we selected several interesting perturbed molecular functions implicated in breast cancer progression. Moreover, these selected molecular functions include several known breast cancer-related genes. We conclude that the present strategy can select perturbed molecular functions that putatively play roles in disease progression, and that it improves the interpretability of GO terms based on the definition of Gene Ontology codes.
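
The idea of giving a term one code per path from the root can be illustrated with a small sketch. This assumes a simple dotted-path scheme (the root is `1`, and the i-th child of a term with code `c` gets `c.i`); the paper's actual Gene Ontology Functional Code definition may differ, and the term names are placeholders rather than real GO terms.

```python
from __future__ import annotations


def assign_path_codes(children: dict[str, list[str]], root: str) -> dict[str, list[str]]:
    """Give each term one dotted code per distinct path from the root.

    Assumed scheme: the root gets "1", and the i-th child (1-based) of a
    term with code c gets "c.i". A term reachable through several parents
    therefore receives several codes. The graph must be acyclic.
    """
    codes: dict[str, list[str]] = {}
    stack = [(root, "1")]
    while stack:
        term, code = stack.pop()
        codes.setdefault(term, []).append(code)
        for i, child in enumerate(children.get(term, []), start=1):
            stack.append((child, f"{code}.{i}"))
    return {term: sorted(term_codes) for term, term_codes in codes.items()}


# Placeholder DAG: term "C" has two parents, "A" and "B".
dag = {"root": ["A", "B"], "A": ["C"], "B": ["C", "D"]}
print(assign_path_codes(dag, "root")["C"])  # ['1.1.1', '1.2.1']
```

Because the number of root-to-term paths can grow quickly in a deep DAG, an implementation over the full ontology would need to watch the size of the code lists.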

Morphological Variation and Partial Mitochondrial Sequence Analysis of Echinoid Species from the Coasts of the East Sea (동해 연안에 서식하는 성게의 형태변이와 미토콘드리아 유전자 분석)

  • Shin, Ji-Hye;Kim, Sung-Gyu;Kim, Young-Dae;Sohn, Young-Chang
    • Journal of Aquaculture / v.21 no.3 / pp.139-145 / 2008
  • Morphological classification of echinoid species is difficult because of their phenotypic variation. In the present study, we analyzed the morphotypes and partial mitochondrial 12S rDNA sequences of four sea urchin species identified as Pseudocentrotus depressus, Anthocidaris crassispina, Hemicentrotus pulcherrimus, and Strongylocentrotus nudus, and of four unidentified sea urchins collected from the coasts of the East Sea. Genomic DNA was extracted from the gonads, and mitochondrial 12S rDNA sequences were amplified by polymerase chain reaction (PCR). Sequence identities among the four known species were 87.4-95.6%. Sequence identities among the four unidentified sea urchins were 99.4-99.6%, and they showed the highest homology (99.8%) to S. intermedius. Thus, our phylogenetic tree indicates that the four unidentified sea urchins belong to S. intermedius.
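
Percent identity between aligned sequences is the quantity behind the reported ranges. Definitions vary, for example in how gaps are counted; this sketch assumes the sequences are already aligned to equal length and ignores columns where either sequence has a gap. The sequences are toy strings, not the 12S rDNA data.

```python
from __future__ import annotations

from itertools import combinations


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identical sites between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


def identity_range(aligned: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity over all pairs of sequences."""
    values = [percent_identity(aligned[x], aligned[y]) for x, y in combinations(aligned, 2)]
    return min(values), max(values)


# Toy aligned fragments, not real sequence data.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 1))  # 88.9
print(identity_range({"s1": "AAAA", "s2": "AAAT", "s3": "AATT"}))  # (50.0, 75.0)
```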

A New Latent Class Model for Analysis of Purchasing and Browsing Histories on EC Sites

  • Goto, Masayuki;Mikawa, Kenta;Hirasawa, Shigeichi;Kobayashi, Manabu;Suko, Tota;Horii, Shunsuke
    • Industrial Engineering and Management Systems / v.14 no.4 / pp.335-346 / 2015
  • Electronic commerce (EC) sites have become an important marketing channel where consumers can purchase many kinds of products, and their access logs, including purchase records and browsing histories, are stored in the sites' databases. These log data can be used for web marketing. Customers who purchase many items are good customers, whereas customers who purchase few items are not, even if they browse many items. If the attributes of good customers and of other customers are clarified, this information is valuable input for developing new marketing strategies. Regarding products, the characteristics of good items, those bought by many users, are also valuable. A method is therefore needed to analyze such characteristics efficiently. This paper proposes a new latent class model that analyzes both purchasing and browsing histories to form latent item and user clusters. We demonstrate the proposal with an example of data analysis on an EC site. Using the clusters obtained by the proposed latent class model and classification rules from a decision tree model, new findings are extracted from the purchasing and browsing history data.

A Study on Detection of Small Size Malicious Code using Data Mining Method (데이터 마이닝 기법을 이용한 소규모 악성코드 탐지에 관한 연구)

  • Lee, Taek-Hyun;Kook, Kwang-Ho
    • Convergence Security Journal / v.19 no.1 / pp.11-17 / 2019
  • Recently, the abuse of Internet technology has caused economic and psychological harm to society as a whole. In particular, newly created or modified malicious code bypasses existing information protection systems and serves as a basic means of various application hacking attacks and cyber security threats. However, research on small executable files, which make up a large portion of actual malicious code, is rather limited. In this paper, we propose a model that uses data mining techniques to analyze the characteristics of known small executable files and applies them to detect unknown malicious code. We applied various data mining techniques, including naive Bayes, SVM, decision tree, random forest, and artificial neural network classifiers, and compared their accuracy according to the detection level reported by VirusTotal. As a result, classification accuracy above 80% was verified on 34,646 analyzed files.

A Study on the Walkability Scores in Jeonju City Using Multiple Regression Models (다중 회귀 모델을 이용한 전주시 보행 환경 점수 예측에 관한 연구)

  • Lee, KiChun;Nam, KwangWoo;Lee, ChangWoo
    • Journal of Korea Society of Industrial Information Systems / v.27 no.4 / pp.1-10 / 2022
  • Attempts to interpret human perspectives using computer vision have been made in various fields. In this paper, we propose a method for evaluating the walking environment using semantic segmentation results of road images. First, road images were collected with the Kakao Map API, with images in four directions taken at about 50,000 points in Jeonju. Twenty percent of the collected images were used to build datasets through crowdsourcing-based paired comparisons, and various regression models were trained on the paired-comparison data. To derive a walkability score for each image, ranking scores were calculated with the TrueSkill ranking algorithm, and walkability was then analyzed with various regression models on the constructed data. This study shows that the walkability of Jeonju can be evaluated, and scores derived, from the correlation with pixel-distribution classification information rather than from human vision.
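
TrueSkill turns paired comparisons into ranking scores by updating a Gaussian belief (mean mu, standard deviation sigma) for each item after every comparison. The sketch below implements the standard two-player, no-draw update with commonly used default parameters (mu = 25, sigma = 25/3, beta = sigma/2) and ranks items by the conservative score mu - 3 sigma. The draw margin and the dynamics term tau are omitted, the image IDs and comparisons are hypothetical, and the study's exact settings are not given in the abstract.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Commonly used TrueSkill defaults; assumed here, not taken from the study.
MU0 = 25.0
SIGMA0 = MU0 / 3
BETA = SIGMA0 / 2


@dataclass
class Rating:
    mu: float = MU0
    sigma: float = SIGMA0

    def score(self) -> float:
        """Conservative ranking score mu - 3*sigma."""
        return self.mu - 3 * self.sigma


def _pdf(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def _cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def update(winner: Rating, loser: Rating) -> tuple[Rating, Rating]:
    """Two-player TrueSkill update for a win, with no draw margin and no dynamics."""
    c2 = 2 * BETA**2 + winner.sigma**2 + loser.sigma**2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v = _pdf(t) / _cdf(t)  # mean shift factor
    w = v * (v + t)        # variance shrink factor, between 0 and 1
    new_winner = Rating(
        winner.mu + winner.sigma**2 / c * v,
        winner.sigma * math.sqrt(1 - winner.sigma**2 / c2 * w),
    )
    new_loser = Rating(
        loser.mu - loser.sigma**2 / c * v,
        loser.sigma * math.sqrt(1 - loser.sigma**2 / c2 * w),
    )
    return new_winner, new_loser


def rank(comparisons: list[tuple[str, str]]) -> list[str]:
    """Rate items from (winner, loser) pairs and sort by conservative score."""
    ratings: dict[str, Rating] = {}
    for win_id, lose_id in comparisons:
        w_r = ratings.setdefault(win_id, Rating())
        l_r = ratings.setdefault(lose_id, Rating())
        ratings[win_id], ratings[lose_id] = update(w_r, l_r)
    return sorted(ratings, key=lambda k: ratings[k].score(), reverse=True)


# Hypothetical judgments: "img_a looked more walkable than img_b", and so on.
print(rank([("img_a", "img_b"), ("img_b", "img_c"), ("img_a", "img_c")]))
# ['img_a', 'img_b', 'img_c']
```

Ranking by mu - 3 sigma penalizes images that have been compared only a few times; ranking by mu alone is a simpler alternative when every image receives many comparisons.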

A Hybrid Under-sampling Approach for Better Bankruptcy Prediction (부도예측 개선을 위한 하이브리드 언더샘플링 접근법)

  • Kim, Taehoon;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.173-190 / 2015
  • The purpose of this study is to improve bankruptcy prediction models by using a novel hybrid under-sampling approach. Most prior studies have tried to enhance the accuracy of bankruptcy prediction models by improving the classification methods involved. In contrast, we focus on appropriate data preprocessing as a means of enhancing accuracy. In particular, we aim to develop an effective sampling approach for bankruptcy prediction, since most prediction models suffer from class imbalance. The approach proposed in this study is a hybrid under-sampling method that combines the k-Reverse Nearest Neighbor (k-RNN) and one-class support vector machine (OCSVM) approaches. k-RNN can effectively eliminate outliers, while OCSVM contributes to the selection of informative training samples from the majority-class data. To validate the proposed approach, we applied it to H Bank's data on non-externally audited companies in Korea and compared the performance of classifiers trained on data produced by the proposed under-sampling with that of classifiers trained on randomly sampled data. The empirical results show that the proposed approach generally improves the accuracy of classifiers such as logistic regression, discriminant analysis, decision trees, and support vector machines. They also show that it reduces the risk of false negative errors, which incur higher misclassification costs.
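
The k-RNN step can be sketched by counting, for each point, how many other points list it among their k nearest neighbors; points that no other point selects are treated as outliers. This is a minimal illustration under assumed settings (Euclidean distance, `k=2`, removal when the reverse-neighbor count is zero) and does not include the OCSVM sample-selection stage.

```python
from __future__ import annotations

import math


def reverse_knn_counts(points: list[tuple[float, ...]], k: int) -> list[int]:
    """For each point, count how many other points include it in their k-NN."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        # Sort candidates by Euclidean distance; ties keep index order.
        nearest = sorted(others, key=lambda j: math.dist(p, points[j]))[:k]
        for j in nearest:
            counts[j] += 1
    return counts


def remove_outliers(
    points: list[tuple[float, ...]], k: int = 2, min_count: int = 1
) -> list[tuple[float, ...]]:
    """Keep points that appear in at least `min_count` other points' k-NN lists."""
    counts = reverse_knn_counts(points, k)
    return [p for p, c in zip(points, counts) if c >= min_count]


# Hypothetical 2-D majority-class samples: a tight cluster plus one distant point.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 12.0)]
print(reverse_knn_counts(samples, k=2))  # [2, 3, 2, 3, 0]
print(remove_outliers(samples))          # the first four points
```

The brute-force neighbor search is quadratic in the number of samples; a spatial index would be needed for larger datasets.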

Classification of Forest Vegetation for a Forest Genetic Resource Reserve in Mt. Seondalsan, Bongwha (봉화 선달산 산림유전자원보호구역의 산림식생 유형)

  • Lee, Jeong Eun;Lee, Cheul Ho;Yun, Chung Weon
    • Journal of Korean Society of Forest Science / v.110 no.1 / pp.1-12 / 2021
  • In this study, the structure of forest vegetation in Mt. Seondalsan, Bongwha-gun, was analyzed. Vegetation data were collected in 137 quadrat plots using the Z-M phytosociological method from June to October 2018. These data were analyzed using vegetation classification, importance value, and species diversity. The vegetation was classified as a Quercus mongolica community group, which was divided into four communities: Cornus controversa, Phlomis umbrosa, Pinus densiflora, and Q. mongolica. The C. controversa community was subdivided into Magnolia sieboldii and Parthenocissus tricuspidata groups, and the P. densiflora community was divided into Vaccinium hirtum var. koreanum, Quercus variabilis, and P. densiflora groups. In the C. controversa community, the M. sieboldii group was divided into the Acer mandshuricum and M. sieboldii subgroups, whereas the P. tricuspidata group was divided into the Larix kaempferi, Pinus koraiensis, and P. tricuspidata subgroups. In the P. densiflora community, the V. hirtum var. koreanum group was divided into the Rhododendron micranthum and V. hirtum var. koreanum subgroups. According to the importance value analysis, C. controversa, L. kaempferi, P. koraiensis, Q. mongolica, Acer pictum subsp. mono, P. densiflora, and Q. variabilis had high importance values in the tree layer. The species diversity of Mt. Seondalsan was 1.969, which was higher than that of another Forest Genetic Resource Reserve.
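
The abstract reports a species diversity of 1.969 without naming the index. Shannon's H' = -Σ p_i ln p_i, where p_i is the proportion of species i, is a common choice for values of this size; the sketch below computes it from abundance counts, assuming that is the index intended. The counts in the example are hypothetical.

```python
from __future__ import annotations

import math


def shannon_diversity(abundances: list[float]) -> float:
    """Shannon's H' = -sum(p_i * ln p_i) over species with nonzero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


# Hypothetical counts: four equally abundant species give H' = ln 4.
print(round(shannon_diversity([10, 10, 10, 10]), 3))  # 1.386
```

Using the natural logarithm is a convention; some studies use log base 2 or 10, which rescales H' and makes values from different studies comparable only when the same base is used.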