• Title/Summary/Keyword: tree classification

Phylogenetic Classification and Evaluation of Agronomic Traits of Korean Wheat Landrace (Triticum aestivum L.) (국내 재래종 밀 계통 분리와 농업형질 특성 평가)

  • Yumi Lee;Sejin Oh;Seong-Wook Kang;Chang-Hyun Choi;Jongtae Lee;Seong-Woo Cho
    • KOREAN JOURNAL OF CROP SCIENCE / v.69 no.2 / pp.111-122 / 2024
  • This study was conducted to evaluate the agronomic traits and classify the phylogenetic characteristics of Korean wheat landraces (KWLs) collected in Gyeongnam province. We used the squash method for chromosome observation, image analysis to examine seed characteristics, and genotyping with commercial single-nucleotide polymorphism chips to construct a phylogenetic tree. All KWLs contained 42 chromosomes and two pairs of microsatellites, as observed in Keumgang, a Korean wheat cultivar. All KWLs showed smaller values for seed traits than Keumgang, although KWL-3 had a greater embryo length than Keumgang. Compared with Keumgang, all KWLs except KWL-3 had a later heading date and ripening period; KWL-3 showed the shortest culm and spike length. KWL-1 had the lowest tiller number and the highest floret and grain numbers. All KWLs showed a lower thousand-grain weight than Keumgang because of their smaller seeds. In the analysis of variation by variety and cultivation area, the heading date, ripening period, tiller number, and floret number were affected by the cultivation area, whereas the culm length, spike length, and thousand-grain weight were affected by the variety. Correlation distribution analysis showed differences in agronomic traits according to the cultivation area, and the heading date was positively correlated with the culm length and floret number in three cultivation areas. Principal component analysis showed that the heading date had a positive relationship with the ripening period and floret number and a negative relationship with the tiller number. It also showed that all KWLs had a lower thousand-grain weight than Keumgang. The phylogenetic tree showed that KWL-1 was near KWL-3, while KWL-2 was near KWL-4. All KWLs were genetically near the Korean wheat cultivars Milsung and Saeol, whereas they were genetically far from the Korean wheat cultivars Goso and Olgrue.

A Hybrid SVM Classifier for Imbalanced Data Sets (불균형 데이터 집합의 분류를 위한 하이브리드 SVM 모델)

  • Lee, Jae Sik;Kwon, Jong Gu
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.125-140 / 2013
  • We call a data set in which the records of one class far outnumber the records of the other class an 'imbalanced data set'. Most classification techniques perform poorly on imbalanced data sets. When we evaluate the performance of a classification technique, we need to measure not only 'accuracy' but also 'sensitivity' and 'specificity'. In a customer churn prediction problem, 'retention' records make up the majority class and 'churn' records make up the minority class. Sensitivity measures the proportion of actual retentions that are correctly identified as such. Specificity measures the proportion of actual churns that are correctly identified as such. The poor performance of classification techniques on imbalanced data sets is due to low specificity. Many previous studies on imbalanced data sets employed the 'oversampling' technique, in which members of the minority class are sampled more than those of the majority class in order to make a relatively balanced data set. When a classification model is constructed using this oversampled balanced data set, specificity can be improved but sensitivity decreases. In this research, we developed a hybrid model of a support vector machine (SVM), an artificial neural network (ANN), and a decision tree that improves specificity while maintaining sensitivity. We named this hybrid model the 'hybrid SVM model'. The process of construction and prediction of our hybrid SVM model is as follows. By oversampling from the original imbalanced data set, a balanced data set is prepared. The SVM_I and ANN_I models are constructed using the imbalanced data set, and the SVM_B model is constructed using the balanced data set. The SVM_I model is superior in sensitivity and the SVM_B model is superior in specificity. For a record on which the SVM_I and SVM_B models make the same prediction, that prediction becomes the final solution. If they make different predictions, the final solution is determined by discrimination rules obtained from the ANN and the decision tree: for the records on which the SVM_I and SVM_B models disagree, a decision tree model is constructed using the ANN_I output value as input and actual retention or churn as target. We obtained the following two discrimination rules: 'IF ANN_I output value < 0.285, THEN Final Solution = Retention' and 'IF ANN_I output value $\geq$ 0.285, THEN Final Solution = Churn'. The threshold 0.285 is the value optimized for the data used in this research. The result we present in this research is the structure or framework of our hybrid SVM model, not a specific threshold value such as 0.285. Therefore, the threshold value in the above discrimination rules can be changed to any value depending on the data. To evaluate the performance of our hybrid SVM model, we used the 'churn data set' in the UCI Machine Learning Repository, which consists of 85% retention customers and 15% churn customers. The accuracy of the hybrid SVM model is 91.08%, which is better than that of the SVM_I model or the SVM_B model. The points worth noticing here are its sensitivity, 95.02%, and specificity, 69.24%. The sensitivity of the SVM_I model is 94.65%, and the specificity of the SVM_B model is 67.00%. Therefore, the hybrid SVM model developed in this research improves on the specificity of the SVM_B model while maintaining the sensitivity of the SVM_I model.
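
The combination rule and the two evaluation measures described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the label strings, and the toy inputs are hypothetical, and the three underlying models (SVM_I, SVM_B, ANN_I) are assumed to have been trained elsewhere so that only their outputs are passed in.

```python
RETENTION = "retention"  # majority class in the churn problem
CHURN = "churn"          # minority class


def hybrid_predict(svm_i_pred, svm_b_pred, ann_i_output, threshold=0.285):
    """Combine the SVM_I and SVM_B predictions into one final solution.

    If both SVMs agree, their shared prediction is returned. If they
    disagree, the ANN_I output is compared with a threshold; 0.285 is
    the value reported for the data in the abstract and should be
    re-tuned for any other data set.
    """
    if svm_i_pred == svm_b_pred:
        return svm_i_pred
    return RETENTION if ann_i_output < threshold else CHURN


def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) as defined in the abstract.

    Sensitivity: share of actual retentions predicted as retention.
    Specificity: share of actual churns predicted as churn.
    Assumes both classes occur at least once in `actual`.
    """
    retained = [p for a, p in zip(actual, predicted) if a == RETENTION]
    churned = [p for a, p in zip(actual, predicted) if a == CHURN]
    sensitivity = retained.count(RETENTION) / len(retained)
    specificity = churned.count(CHURN) / len(churned)
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical model outputs: (SVM_I, SVM_B, ANN_I output, actual label)
    records = [
        (RETENTION, RETENTION, 0.10, RETENTION),
        (RETENTION, CHURN, 0.20, RETENTION),
        (RETENTION, CHURN, 0.60, CHURN),
        (CHURN, CHURN, 0.90, CHURN),
    ]
    final = [hybrid_predict(i, b, ann) for i, b, ann, _ in records]
    actual = [label for *_, label in records]
    print(final, sensitivity_specificity(actual, final))
```

Passing the threshold as a parameter reflects the abstract's point that the framework, not the specific value 0.285, is the contribution.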

Landslide Susceptibility Analysis in Jeju Using Artificial Neural Network(ANN) and GIS (인공신경망기법과 GIS를 이용한 제주도 산사태 취약성분석)

  • Quan, He-Chun;Lee, Byung-Gul;Cho, Eun-Il
    • Journal of Environmental Science International / v.17 no.6 / pp.679-687 / 2008
  • In this study, we analyzed the landslide susceptibility of Jeju Island using an ANN and GIS. To do this, we first extracted contour lines from a 1:25,000 digital map and used them to build the DEM for evaluating landslide susceptibility. Next, we derived a slope map and an aspect map from the DEM and obtained a land use map from Landsat 7 images using the ISODATA classification method. For the landslide analysis, we classified the soil map, tree diameter map, isohyet map, geological map, and other layers. Finally, we applied the ANN method to the landslide factors and calculated their weights. The GIS results were computed with the ArcView program, and the Jeju landslide susceptibility map was produced using the Weighted Overlay method. Our results show that the areas relatively susceptible to landslides were concentrated around the top of Mt. Halla.
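
As a rough illustration of the weighted overlay step, the sketch below combines classified raster layers into one susceptibility grid as a weighted sum. It is not the ArcView tool or the authors' workflow: the function name `weighted_overlay`, the layer names, the class scores, and the weights are all hypothetical, whereas in the study the weights come from the ANN.

```python
def weighted_overlay(layers, weights):
    """Combine classified raster layers into one susceptibility grid.

    layers:  dict mapping a layer name to a 2D list of class scores
             (all layers must share the same shape).
    weights: dict mapping the same layer names to non-negative weights;
             they are normalized here so that they sum to 1.
    """
    total = sum(weights[name] for name in layers)
    first = next(iter(layers.values()))
    rows, cols = len(first), len(first[0])
    result = [[0.0] * cols for _ in range(rows)]
    for name, grid in layers.items():
        share = weights[name] / total
        for r in range(rows):
            for c in range(cols):
                result[r][c] += share * grid[r][c]
    return result


if __name__ == "__main__":
    # Hypothetical 2x2 rasters, each cell already reclassified to a 1-9 score.
    layers = {
        "slope": [[1, 3], [5, 9]],
        "land_use": [[2, 2], [4, 8]],
    }
    # Hypothetical factor weights standing in for ANN-derived weights.
    weights = {"slope": 0.6, "land_use": 0.4}
    for row in weighted_overlay(layers, weights):
        print(row)
```

Cells with higher combined scores would be read as more susceptible, provided every layer is scored on the same scale before overlay.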

Impact of Life Style Characteristics on Prevalence Risk of Metabolic Syndrome (생활습관 요인이 대사증후군 유병 위험에 미치는 영향)

  • Yoo, Ji-Soo;Jeong, Jeong-In;Park, Chang-Gi;Kang, Se-Won;Ahn, Jeong-Ah
    • Journal of Korean Academy of Nursing / v.39 no.4 / pp.594-601 / 2009
  • Purpose: The goal of this study was to evaluate the impact of life style characteristics on the prevalence risk of metabolic syndrome (MS). Methods: A total of 581 adults were recruited from a cardiovascular outpatient clinic. A newly developed comprehensive life style evaluation tool for MS patients was used, and patient data related to the MS diagnosis were reviewed from the hospital records. Results: The overall prevalence of MS was 53.2%, and the mean MS score was 2.6 for patients at a cardiovascular outpatient clinic (78% of the patients had hypertension). Among the life style characteristics, dietary habits had a significant influence on the prevalence risk of MS and on MS scores. Interestingly, the classification and regression tree (CART) model suggested that the groups at high prevalence risk for MS were older adults ($61.5 \leq$ age $< 79.4$) and adults between 48.5 and 61.5 yr of age with bad dietary habits. Conclusion: This study indicates that nurses should focus on the dietary habits of patients (especially patients classified as being at high prevalence risk for MS) to improve and prevent MS prevalence risk.
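
The two high-risk groups reported from the CART model can be written as a simple rule, which makes the split points explicit. This is a sketch of the reported rule only, not the fitted tree: the function name is hypothetical, and treating the 48.5 to 61.5 range as closed at the bottom and open at the top is an assumption, since the abstract does not state those boundaries.

```python
def is_high_prevalence_risk(age, bad_dietary_habits):
    """Apply the two high-risk groups reported in the abstract.

    Group 1: older adults with 61.5 <= age < 79.4.
    Group 2: adults between 48.5 and 61.5 years with bad dietary habits
             (boundary handling here is an assumption).
    """
    if 61.5 <= age < 79.4:
        return True
    if 48.5 <= age < 61.5 and bad_dietary_habits:
        return True
    return False


if __name__ == "__main__":
    # Hypothetical patients: (age in years, has bad dietary habits)
    for age, bad_diet in [(70.0, False), (55.0, True), (55.0, False), (40.0, True)]:
        print(age, bad_diet, is_high_prevalence_risk(age, bad_diet))
```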

Plant Community Structure Analysis in Noinbong area of Odaesan National Park (오대산 국립공원 노인봉지역 식물군집구조분석)

  • 최송현;권전오;민성환
    • Korean Journal of Environment and Ecology / v.9 no.2 / pp.156-165 / 1996
  • To investigate the forest structure and to suggest how to manage the vegetation landscape in the Noinbong area of Odaesan National Park, twelve plots were set up and surveyed. According to the classification analysis by TWINSPAN, the vegetation was divided into two groups: a Carpinus laxiflora - Quercus mongolica community and a Betula costata - B. schmidtii - C. laxiflora community. The analysis of species structure, similarity index, and species diversity indicated that the successional stage of the Noinbong forests was climax and introduced-climax. There were about 120 to 130 individuals and 17 species per 100 m$^{2}$. From the analysis of basal area and DBH class distribution, it was estimated that C. laxiflora, B. costata, and B. schmidtii will replace Q. mongolica as the climax species in the tree layer, and that Acer pseudo-sieboldianum will be the dominant species in the subtree layer.

Development of Text-to-Speech System for PC (PC용 Text-to-Speech 시스템 개발)

  • Choi Muyeol;Hwang Cholgyu;Kim Soontae;Kim Junggon;Yi Sopae;Jang Seokbok;Pyo Kyungnan;Ahn Hyesun;Kim Hyung Soon
    • Proceedings of the Acoustical Society of Korea Conference / autumn / pp.41-44 / 1999
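  • In this paper, we developed a high-quality Korean text-to-speech (TTS) synthesis system for PC applications. For the synthesis method, we adopted a sinusoidal-model-based approach, which has advantages over the conventional PSOLA method in pitch control, the handling of joins between adjacent sounds, and timbre control. For natural prosody modeling, we used the classification and regression tree (CART) method, a statistical technique. To reduce discontinuities at phoneme boundaries, we used initial consonant-vowel units and final consonant units as synthesis units, and we provided a timbre control function so that a variety of voice colors can be expressed. In addition, we implemented the system as a TTS engine conforming to the standard Speech Application Program Interface (SAPI), which makes application development on the PC more convenient. Listening tests of the synthesized speech confirmed its high quality and the effectiveness of the timbre control function.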

Visual Semantic Based 3D Video Retrieval System Using HDFS

  • Ranjith Kumar, C.;Suguna, S.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.8 / pp.3806-3825 / 2016
  • This paper presents a new framework for visual semantic based 3D video search and retrieval applications. Recent 3D retrieval applications focus on shape analysis tasks such as object matching, classification, and retrieval, rather than on video retrieval alone. In this context, we explore the 3D-CBVR (Content Based Video Retrieval) concept for the first time. For this purpose, we intend to build on BOVW and MapReduce in a 3D framework. We combine shape, color, and texture for feature extraction, using a combination of geometric and topological features for shape and a 3D co-occurrence matrix for color and texture. After the local descriptors have been extracted, the TB-PCT (Threshold Based-Predictive Clustering Tree) algorithm is used to generate the visual codebook. Matching is then performed using a soft weighting scheme with the L2 distance function. As a final step, the retrieved results are ranked according to the index value and returned. To handle large amounts of data and achieve efficient retrieval, we have incorporated HDFS into our design. Using a 3D video dataset, we evaluate the performance of the proposed system and show that it gives accurate results and also reduces time complexity.

Machine Learning-Based Gesture Recognition Technology for Intelligent IoT Services (지능형 IoT서비스를 위한 기계학습 기반 동작 인식 기술)

  • Choe, Dae-Ung;Jo, Hyeon-Jung
    • The Proceeding of the Korean Institute of Electromagnetic Engineering and Science / v.27 no.4 / pp.19-28 / 2016
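  • With the rapid advance of wireless sensing network technologies such as RFID, sensing devices for object tracking, and diverse computing resources, the Web is naturally evolving from the social Web into the ubiquitous computing Web. In the ubiquitous computing Web, the Internet of Things (IoT) can replace conventional computers, which means that the network connecting a person with surrounding objects expands and, at the same time, the amount of data generated within the network grows exponentially. For more intelligent IoT services, therefore, it must be possible to identify a person's intention and situation accurately and in real time from a vast amount of raw data. Because gesture recognition for interacting with objects does not require direct contact, it has the potential to be applied to future human-object interaction. Meanwhile, recent machine learning algorithms often outperform human cognitive ability on a variety of problems; among them, Decision Forest, which is based on decision trees, shows superior performance across all areas, including classification and regression. This paper therefore reviews various gesture recognition technologies for intelligent IoT services and describes in detail the basic concept of Decision Forest for gesture recognition, along with the training and testing needed to implement it. In particular, we introduce three representative learning methods, bagging, boosting, and Random Forest, and examine their characteristics for gesture recognition on the basis of existing research results.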

Classification and Characteristics of Forest Community in Seodaesan, Geumsan (금산 서대산의 임분 특성 및 군락 분류)

  • Ji, Yun-Ui;Song, Ho-Kyung
    • Journal of the Korean Society of Environmental Restoration Technology / v.7 no.5 / pp.38-46 / 2004
  • This study was carried out to analyze the forest vegetation of Seodaesan in Geumsan, Chungnam Province. Using the relevé method of Braun-Blanquet and the quadrat method, 36 plots were sampled in the forest of Seodaesan. The Quercus mongolica community was classified into Pinus densiflora, Acer pseudosieboldianum, and Carpinus laxiflora sub-communities. The importance values were 77.07 for Quercus mongolica, 40.79 for Pinus densiflora, 17.03 for Fraxinus rhynchophylla, 14.06 for Fraxinus sieboldiana, 13.99 for Quercus serrata, and 12.93 for Acer pseudosieboldianum. The coverage rates were 84.6% in the tree layer, 52.8% in the subtree layer, 29.1% in the shrub layer, and 27.9% in the herb layer. Most of the Quercus mongolica and Pinus densiflora trees had a DBH between 5 cm and 20 cm. Therefore, Quercus mongolica and Pinus densiflora are likely to remain the dominant species in the study area for several decades. The Acer pseudosieboldianum and Carpinus laxiflora sub-communities were distributed mainly in high-altitude areas in the north and north-west. The Pinus densiflora sub-community was distributed mainly in low-altitude areas in the west.

Combined Use of Data Imbalance Resolution Techniques Using a Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Yeong-Sik;Kim, Jong-U;Heo, Jun
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.05a / pp.309-320 / 2007
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that one class has far more or far fewer instances than the other classes. It causes low prediction accuracy for the minority class because classifiers tend to assign instances to the majority classes and ignore the minority class in order to reduce the overall misclassification rate. To solve the data imbalance problem, a number of techniques have been proposed based on resampling with replacement, adjusting decision thresholds, and adjusting the costs of the different classes. In this paper, we study the feasibility of combining the techniques previously proposed for the data imbalance problem, and we suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of the minority class, we determine the combination ratio using the F-value of the minority class as the fitness function of the genetic algorithm. To compare its performance with that of the single techniques and of a matrix-style combination with random percentages, we performed experiments using four public datasets that have been widely used to compare methods for the data imbalance problem. The experimental results demonstrate the usefulness of the proposed method.
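
The fitness function mentioned in this abstract, the F-value of the minority class, can be sketched as follows. This is a minimal illustration assuming the usual F-measure definition (weighted harmonic mean of precision and recall); the function name, the `beta` parameter, and the toy labels are hypothetical, and the genetic algorithm that would call this function is not shown.

```python
def minority_f_value(actual, predicted, minority_label, beta=1.0):
    """F-measure of the minority class, usable as a GA fitness score.

    Precision and recall are computed for `minority_label` only, so a
    classifier that ignores the minority class scores 0 even when its
    overall accuracy is high.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == minority_label and p == minority_label)
    fp = sum(1 for a, p in pairs if a != minority_label and p == minority_label)
    fn = sum(1 for a, p in pairs if a == minority_label and p != minority_label)
    if tp == 0:
        return 0.0  # no minority instance was correctly identified
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical labels: 1 is the minority class, 0 the majority class.
    actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    all_majority = [0] * 10                      # 80% accurate, fitness 0
    balanced = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]    # also 80% accurate
    print(minority_f_value(actual, all_majority, minority_label=1))
    print(minority_f_value(actual, balanced, minority_label=1))
```

The two toy predictions have the same accuracy but different minority-class F-values, which is why the F-value is a more informative fitness score than accuracy on imbalanced data.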
