• Title/Summary/Keyword: tree classification

Search Result 926

Clinicoradiologic Characteristics of Intradural Extramedullary Conventional Spinal Ependymoma (경막내 척수외 뇌실막세포종의 임상 영상의학적 특징)

  • Seung Hyun Lee;Yoon Jin Cha;Yong Eun Cho;Mina Park;Bio Joo;Sang Hyun Suh;Sung Jun Ahn
    • Journal of the Korean Society of Radiology / v.84 no.5 / pp.1066-1079 / 2023
  • Purpose: Distinguishing intradural extramedullary (IDEM) spinal ependymoma from myxopapillary ependymoma is challenging because of the intradural extramedullary location of IDEM spinal ependymoma. This study aimed to investigate the utility of clinical and MR imaging features for differentiating IDEM spinal ependymomas from myxopapillary ependymomas. Materials and Methods: We compared tumor size, longitudinal/axial location, enhancement degree/pattern, tumor margin, signal intensity (SI) of the tumor on T2-weighted images and T1-weighted images (T1WI), increased cerebrospinal fluid (CSF) SI caudal to the tumor on T1WI, and CSF dissemination in 12 IDEM spinal ependymomas and 10 myxopapillary ependymomas, all pathologically confirmed. Classification and regression tree (CART) analysis was also performed to identify the clinical and MR features that differentiate the two tumor types. Results: Patients with IDEM spinal ependymomas were older than those with myxopapillary ependymomas (48 years vs. 29.5 years, p < 0.05). High tumor SI on T1WI was observed more frequently in IDEM spinal ependymomas than in myxopapillary ependymomas (p = 0.02). Conversely, CSF dissemination was seen in myxopapillary ependymomas, and increased CSF SI caudal to the tumor on T1WI was observed more frequently in myxopapillary ependymomas than in IDEM spinal ependymomas (p < 0.05). Dissemination to the CSF space and increased CSF SI caudal to the tumor on T1WI were the most important variables in the CART analysis. Conclusion: Clinical and radiological variables may help differentiate IDEM spinal ependymomas from myxopapillary ependymomas.
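
CART grows a tree by repeatedly choosing the split that most reduces node impurity. As a rough illustration only, the sketch below searches for the best first split over yes/no imaging findings using Gini impurity, CART's usual criterion for classification trees. The feature names (`csf_dissemination`, `t1_high_si`) and the five cases are hypothetical, not the study's patients; a full CART implementation would also recurse on the child nodes and prune the tree.

```python
from typing import Dict, List, Tuple

def gini(labels: List[str]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0.0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[str, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_binary_split(rows: List[Dict[str, bool]], labels: List[str]) -> Tuple[str, float]:
    """Return the yes/no feature whose split gives the lowest size-weighted Gini impurity."""
    n = len(labels)
    best_feature, best_impurity = "", float("inf")
    for feature in rows[0]:
        yes = [y for r, y in zip(rows, labels) if r[feature]]
        no = [y for r, y in zip(rows, labels) if not r[feature]]
        # Impurity of the two child nodes, weighted by their sizes
        impurity = len(yes) / n * gini(yes) + len(no) / n * gini(no)
        if impurity < best_impurity:
            best_feature, best_impurity = feature, impurity
    return best_feature, best_impurity

# Hypothetical toy cases: "IDEM" vs "MPE" (myxopapillary ependymoma)
rows = [
    {"csf_dissemination": True, "t1_high_si": False},
    {"csf_dissemination": True, "t1_high_si": True},
    {"csf_dissemination": False, "t1_high_si": True},
    {"csf_dissemination": False, "t1_high_si": True},
    {"csf_dissemination": False, "t1_high_si": False},
]
labels = ["MPE", "MPE", "IDEM", "IDEM", "IDEM"]
print(best_binary_split(rows, labels))  # ('csf_dissemination', 0.0)
```

In this toy data, splitting on CSF dissemination yields two pure nodes, so it is chosen as the root split.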

Trend Analysis of the Prices and Numbers of Azalea Cultivars for Landscaping in Korea (국내 조경용 철쭉류의 가격 및 종수 추이분석)

  • Choi, Jae-Jin;Park, Seok-Gon
    • Journal of the Korean Institute of Landscape Architecture / v.42 no.4 / pp.30-36 / 2014
  • This study was conducted to determine why azalea cultivars have been priced unreasonably and why so few cultivars have been listed, by analyzing the price trends and the number of azalea cultivars announced over the last 25 years in data from the Public Procurement Service (PPS), the Korea Price Research Center, and the Landscaping Tree Association (LTA) (hereinafter, the official announcing agencies and organizations), which are the major references used in deciding landscape plantings. The prices of azalea cultivars announced by these agencies and organizations have moved in similar patterns over the past 25 years because the other agencies and organizations referred to the LTA's announced prices when setting their own. The PPS set lower official prices for azalea cultivars than the other agencies and organizations, apparently to suppress increases in landscape tree prices in line with the government's policy of restraining price increases. Because azalea cultivars have shorter production periods than other landscape trees, their prices appear to change rapidly in response to imbalances between demand and supply rather than to consumer price fluctuations. The prices announced by the official agencies and organizations have been set higher than actual transaction prices, apparently so that landscaping companies can cover defect costs arising from the practice of subcontracting planting work and so that planting subcontractors can secure a profit. The official agencies and organizations have simply announced the prices of the five to eight main azalea cultivars used in the past.
The names of the azalea cultivars being grown and the criteria for classifying them have not been clear; consequently, landscape designers have customarily not specified cultivar names on planting drawings, and landscapers have planted whichever cultivars were easiest to obtain. As a result, there appears to have been no demand for new azalea cultivars, and a vicious circle has repeated itself in which only the prices of cultivars produced in the past are announced.

Application of Hyperspectral Imagery to Decision Tree Classifier for Assessment of Spring Potato (Solanum tuberosum) Damage by Salinity and Drought (초분광 영상을 이용한 의사결정 트리 기반 봄감자(Solanum tuberosum)의 염해 판별)

  • Kang, Kyeong-Suk;Ryu, Chan-Seok;Jang, Si-Hyeong;Kang, Ye-Seong;Jun, Sae-Rom;Park, Jun-Woo;Song, Hye-Young;Lee, Su Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.4 / pp.317-326 / 2019
  • Salinity, which is often detected on reclaimed land, is a major factor detrimental to crop growth. It would therefore be advantageous to develop a non-destructive approach for assessing salinity and drought damage over large reclaimed areas. The objective of this study was to examine the applicability of a decision tree classifier using hyperspectral imagery to classify spring potatoes (Solanum tuberosum) damaged by salinity or drought at vegetative growth stages. We focused on comparing the overall accuracy (OA) and kappa coefficient (KC) obtained with simple reflectance and with band ratios, which minimize the effect of uneven illumination. Spectral merging to commercial bandwidths with full width at half maximum (FWHM) of 10, 25, and 50 nm was also considered with a view to developing a multispectral image sensor. When classification was based on the original simple reflectance with 5 nm FWHM, 3 to 13 bands were selected, and accuracy was below 66.7% OA and 40.8% KC for all FWHMs. The maximum OA and KC for classifying salinity and drought damage of spring potato were 78.7% and 57.7%, respectively, with 10 nm FWHM. When the classifier was built on band ratios, both OA and KC exceeded 95% regardless of growth stage and FWHM. If a multispectral image sensor were built with six bands (i.e., three band ratios) at 10 nm FWHM, it would be possible to classify spring potatoes damaged by salinity or drought from image reflectance with 91.3% OA and 85.0% KC.
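
Both accuracy measures compared in this abstract, overall accuracy (OA) and the kappa coefficient (KC), are computed from a classification confusion matrix, and band ratios help because an illumination factor that scales two bands equally cancels in their ratio. The sketch below assumes a three-class problem (healthy, salinity-damaged, drought-damaged) and uses a made-up confusion matrix, not the study's results.

```python
from typing import List

def overall_accuracy(cm: List[List[int]]) -> float:
    """OA: correctly classified samples (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa_coefficient(cm: List[List[int]]) -> float:
    """Cohen's kappa (p_o - p_e) / (1 - p_e); p_e is chance agreement from the marginals."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

def band_ratio(r_a: float, r_b: float) -> float:
    """Ratio of two band reflectances; a common illumination scale factor cancels out."""
    return r_a / r_b

# Hypothetical confusion matrix (rows = reference, columns = predicted):
# healthy, salinity-damaged, drought-damaged
cm = [[10, 0, 0],
      [0, 9, 1],
      [0, 2, 8]]
print(overall_accuracy(cm), round(kappa_coefficient(cm), 3))  # 0.9 0.85
```

Kappa is lower than OA because it discounts the agreement expected by chance from the row and column totals.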

A Phylogenetic Analysis of Otters (Lutra lutra) Inhabiting in the Gyeongnam Area Using D-Loop Sequence of mtDNA and Microsatellite Markers (경남지역 수달(Lutra lutra)의 mitochondrial DNA D-loop지역과 microsatellite marker를 이용한 계통유전학적 유연관계 분석)

  • Park, Moon-Sung;Lim, Hyun-Tae;Oh, Ki-Cheol;Moon, Young-Rok;Kim, Jong-Gap;Jeon, Jin-Tae
    • Journal of Life Science / v.21 no.3 / pp.385-392 / 2011
  • The otter (Lutra lutra) is classified in Korea as a Class I endangered species and is managed under state control. We performed a phylogenetic analysis of otters inhabiting the Changnyeong, Jinju, and Geoje areas of Gyeongsangnam-do, Korea, using mtDNA and microsatellite (MS) markers. Analysis of a 676-bp D-loop sequence of mtDNA identified six haplotypes from five single nucleotide polymorphisms. The genetic distance between the Jinju and Geoje areas was greater than the distances within areas, and this separation was especially clear. In the phylogenetic tree estimated by Bayesian Markov chain Monte Carlo analysis with the MrBayes program, two subgroups were clearly identified: one containing samples from Jinju and the other containing samples from the Changnyeong and Geoje areas. A parsimonious median-joining network analysis also showed two clear subgroups, supporting the phylogenetic analysis. In contrast, the consensus tree estimated from genetic distances based on the genotypes of 13 MS markers showed two clear subgroups, one containing samples from the Jinju, Geoje, and Changnyeong areas and the other containing samples from the Jinju area only. Samples were thus not assigned identically to the subgroups defined by mtDNA and by MS markers. This discrepancy is probably due to the different characteristics of the two marker systems: mtDNA traces maternal lineage, whereas MS markers estimate autosomal genetic distances. Nonetheless, both marker systems showed that progressive genetic fixation has occurred according to otter habitat. Further analyses using newly developed MS markers with greater analytical power, as well as the whole mtDNA, are needed.
Expanding the phylogenetic analysis to otter samples collected from the major habitats in Korea should help maintain and preserve otter populations scientifically and efficiently.
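
The first step described above, deriving haplotypes from variable sites in an aligned mtDNA fragment, can be illustrated in a few lines. The sketch below groups identical aligned sequences into haplotypes, lists the segregating (SNP) sites, and computes the uncorrected p-distance between two sequences. The sequences and sample names are hypothetical, and p-distance is chosen only for simplicity; the abstract does not state which distance model was used, and the Bayesian (MrBayes) and median-joining analyses are not reproduced here.

```python
from typing import Dict, List

def haplotypes(seqs: Dict[str, str]) -> Dict[str, List[str]]:
    """Group samples with identical aligned sequences; each key is one haplotype."""
    groups: Dict[str, List[str]] = {}
    for sample, seq in seqs.items():
        groups.setdefault(seq, []).append(sample)
    return groups

def segregating_sites(seqs: List[str]) -> List[int]:
    """0-based alignment positions where not all sequences carry the same base (SNPs)."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical 6-bp aligned fragments (a real D-loop alignment here would be 676 bp)
seqs = {"Jinju1": "ACGTAC", "Jinju2": "ACGTAC", "Geoje1": "ACGTTC", "Changnyeong1": "ATGTTC"}
print(len(haplotypes(seqs)), segregating_sites(list(seqs.values())))  # 3 [1, 4]
print(round(p_distance(seqs["Jinju1"], seqs["Changnyeong1"]), 3))  # 0.333
```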

A Case Study on Forecasting Inbound Calls of Motor Insurance Company Using Interactive Data Mining Technique (대화식 데이터 마이닝 기법을 활용한 자동차 보험사의 인입 콜량 예측 사례)

  • Baek, Woong;Kim, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.99-120 / 2010
  • As customers increasingly access non-face-to-face services, there have been many attempts to improve customer satisfaction using the huge amounts of data accumulated through non-face-to-face channels. A call center is usually regarded as one of the most representative non-face-to-face channels, so it is important that a call center has enough agents to offer a high level of customer satisfaction. However, employing too many agents increases a call center's operational costs through higher labor costs. Predicting and calculating the appropriate staffing level is therefore one of the most critical success factors in call center management. For this reason, most call centers have established a WFM (Work Force Management) department to estimate the appropriate number of agents, and they devote much effort to predicting the volume of inbound calls. In real-world applications, inbound call prediction is usually based on the intuition and experience of a domain expert, who typically calculates the average number of calls over some periods and adjusts that average according to his or her subjective judgment. This approach has fundamental limitations, because the prediction may be strongly affected by the expert's personal experience and competence. Two experts often predict inbound calls quite differently if they disagree about which variables are influential and how to prioritize them. Moreover, it is almost impossible to explain an expert's subjective prediction process logically. To overcome the limitations of subjective call prediction, most call centers are currently adopting a WFMS (Workforce Management System) package that systematizes experts' best practices.
With a WFMS, a user can predict the volume of calls by calculating the average number of calls for each day of the week, excluding eventful days. However, a WFMS requires a large capital outlay at the initial stage of system establishment, and it is hard to reflect new information in the system when the factors affecting call volume change. In this paper, we attempt to devise a new model for predicting inbound calls that is both grounded in theory and easily applicable in real-world settings. Our model was developed mainly with the interactive decision tree technique, one of the most popular techniques in data mining. We therefore expect that our model can predict inbound calls automatically from historical data while making use of experts' domain knowledge during tree construction. To analyze the accuracy of our model, we performed extensive experiments on a real case from one of the largest car insurance companies in Korea. In the case study, the prediction accuracy of the two devised models and of the traditional WFMS was analyzed at various allowable error rates. The experiments reveal that our two data mining-based models outperform the WFMS in predicting the volume of accident calls and fault calls in most of the experimental situations examined.
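
The WFMS-style baseline described above (average calls per weekday, excluding eventful days) and an accuracy score at an allowable error rate can be sketched as follows. This is an illustration under assumptions: the relative-error definition of a "hit" and the sample history are hypothetical, and the authors' interactive decision tree models are not reproduced.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

def weekday_average_forecast(history: List[Tuple[date, int]],
                             event_days: Set[date]) -> Dict[int, float]:
    """Mean call volume per weekday (0 = Monday), skipping days flagged as eventful."""
    sums: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for day, calls in history:
        if day in event_days:
            continue
        sums[day.weekday()] += calls
        counts[day.weekday()] += 1
    return {wd: sums[wd] / counts[wd] for wd in sums}

def hit_rate(actual: List[float], predicted: List[float], tolerance: float) -> float:
    """Share of days whose relative error |predicted - actual| / actual is within tolerance."""
    hits = sum(abs(p - a) / a <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)

# Hypothetical history: three Mondays (one flagged as an event) and one Tuesday
history = [(date(2024, 1, 1), 100), (date(2024, 1, 8), 120),
           (date(2024, 1, 15), 500), (date(2024, 1, 2), 80)]
print(weekday_average_forecast(history, {date(2024, 1, 15)}))  # {0: 110.0, 1: 80.0}
print(hit_rate([100, 200], [105, 260], tolerance=0.10))  # 0.5
```

Raising `tolerance` corresponds to a larger allowable error rate, so hit rates can be compared across models at several tolerances.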

Analysis of Utilization Characteristics, Health Behaviors and Health Management Level of Participants in Private Health Examination in a General Hospital (일개 종합병원의 민간 건강검진 수검자의 검진이용 특성, 건강행태 및 건강관리 수준 분석)

  • Kim, Yoo-Mi;Park, Jong-Ho;Kim, Won-Joong
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.1 / pp.301-311 / 2013
  • This study aims to analyze the utilization characteristics, health behaviors, and health management level of private health examination recipients at one general hospital. To achieve this, we analyzed 150,501 private health examination records covering the 11 years from 2001 to 2011 for 20,696 participants examined in 2011 at the health examination center of a general hospital in Daejeon. Private health examination groups were identified by K-means cluster analysis after z-score standardization. Logistic regression, decision tree, and neural network analyses were used to build models classifying periodic versus non-periodic private health examinees. Among new examinees, 1,000 people with a high probability of becoming non-periodic examinees were selected as a target group for customer management. The private health examinees were categorized into new, periodic, and non-periodic groups. New participants were more often aged 30-39 years than other age groups and were more often suspected of having renal disease. Periodic participants were more often male and more often suspected of having hyperlipidemia. Non-periodic participants were more often smokers and sedentary, and were more often suspected of having anemia and diabetes mellitus. In the decision tree, the variables related to non-periodic participation were sex, age, residence, exercise, anemia, hyperlipidemia, diabetes mellitus, obesity, and liver disease. In particular, 71.4% of non-periodic participants were female, non-anemic, and non-exercising, and were suspected of obesity. Operating a customized customer management program for private health examinations should improve the efficiency of health examination centers.
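
The clustering step above (z-score standardization followed by K-means) can be sketched in plain Python. The abstract does not list the clustering variables, so the two features below (age and number of visits) and their values are hypothetical; the sketch uses population standard deviation and a seeded random initialization, which are also assumptions.

```python
import math
import random
from typing import List

def zscore(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column gets sd 1.0 to avoid division by zero
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points: List[List[float]], k: int, max_iter: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's K-means: alternate nearest-centroid assignment and centroid re-estimation."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return labels

# Hypothetical examinees: [age, visits in the last 5 years]
rows = [[25, 1], [27, 1], [26, 2], [58, 5], [61, 5], [60, 4]]
labels = kmeans(zscore(rows), k=2)
print(labels)
```

Standardizing first keeps a large-valued feature such as age from dominating the Euclidean distance.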

Plant Community Structure of Abies holophylla Community from Sinseongam to Jungdaesa in Odaesan National Park (오대산국립공원 신성암~중대사 전나무림 식물군집구조 특성)

  • Kim, Dong-Wook;Han, Bong-Ho;Kim, Jong-Yup;Yeum, Jung-Hun
    • Korean Journal of Environment and Ecology / v.29 no.6 / pp.895-906 / 2015
  • This study was carried out to analyze the structure of the plant communities from Sinseongam to Jungdaesa in Odaesan National Park and to provide basic data for planning the management of the Abies holophylla forest in the park. To identify the current ecological environment, the actual vegetation was surveyed first, and twenty plots (400 m² each) were set up to analyze the detailed structure of the plant communities. The analysis used TWINSPAN and DCA: TWINSPAN has performed well in several comparisons of classification techniques, and DCA is an ordination technique. The plant communities were classified with TWINSPAN and ordinated with DCA, and their structure was further analyzed in terms of the importance percentage of woody species, DBH class distribution, species diversity index, and growth rate of sample trees. In low-altitude and valley communities, the main vegetation was A. holophylla-Quercus mongolica forest and deciduous broad-leaved forest, whereas in high-altitude and slope communities it was Q. mongolica forest. The plant communities at the research site were classified into four groups. A. holophylla was the dominant species in the canopy layer of all communities, and a next generation of A. holophylla is growing in communities I, II, and III but not in community IV. Communities I, II, and III are likely to remain dominated by A. holophylla, although deciduous broad-leaved species such as Carpinus cordata and Betula schmidtii may expand. In community IV, by contrast, A. holophylla appeared to be losing ground, so this community may shift to a C. cordata-deciduous broad-leaved community in the future.
The ages of the sample trees were 79-128 years for A. holophylla, 75-87 years for Pinus koraiensis, and 190 years for Ulmus davidiana var. japonica. Shannon's species diversity index (H') ranged from 0.3889 to 1.3332 across the communities.
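
Shannon's species diversity index, reported above, is H' = -Σ p_i log p_i, where p_i is species i's share of total abundance. The sketch below computes it from per-species abundances; the plot values are hypothetical, and because studies differ in the logarithm base (natural log or log10) and in the abundance measure (counts or importance values), both are left as parameters rather than assumed to match the authors' choice.

```python
import math
from typing import Dict

def shannon_index(abundance: Dict[str, float], base: float = math.e) -> float:
    """Shannon's diversity H' = -sum(p_i * log(p_i)) over species with nonzero abundance."""
    total = sum(abundance.values())
    h = 0.0
    for value in abundance.values():
        if value > 0:
            p = value / total
            h -= p * math.log(p, base)
    return h

# Hypothetical stem counts in one plot
plot = {"Abies holophylla": 12, "Quercus mongolica": 5, "Carpinus cordata": 3}
print(round(shannon_index(plot), 4), round(shannon_index(plot, base=10), 4))
```

H' is 0 for a single-species plot and is largest when all species are equally abundant.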

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • These days, malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, it has become more important to handle these attacks and hacks appropriately, and there is considerable interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying, and appropriately responding to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. Although they perform very well under normal conditions, they cannot handle new or unknown patterns of network attack. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can respond proactively to unknown threats. For a long time, researchers have adopted and tested various artificial intelligence techniques, such as artificial neural networks, decision trees, and support vector machines, to detect network intrusions. However, most of them have applied these techniques individually, even though combining the techniques may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model combines the prediction results of four different binary classification models, namely logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM), which may complement each other. Genetic algorithms (GA) are used to find the optimal combining weights. The proposed model is built in two steps. In the first step, the optimal integration model, whose prediction error (i.e., misclassification rate) is the lowest, is generated.
In the second step, the model searches for the optimal classification threshold for determining intrusions, i.e., the threshold that minimizes the total misclassification cost. Calculating the total misclassification cost of an intrusion detection system requires understanding its asymmetric error cost scheme. There are generally two common types of error in intrusion detection. The first is the false-positive error (FPE), in which normal activity is wrongly judged as an intrusion; this may result in unnecessary corrective action. The second is the false-negative error (FNE), which mainly misjudges malicious programs as normal. An FNE is more damaging than an FPE, so the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results of our model with those of single techniques to confirm the superiority of the proposed model. LOGIT and DT were run in PASW Statistics v18.0, ANN in Neuroshell R4.0, and SVM in LIBSVM v2.90, a freeware library for training SVM classifiers. The empirical results showed that our GA-based model outperformed all the other comparative models in detecting network intrusions, both in terms of accuracy and in terms of total misclassification cost. Consequently, our study may contribute to building cost-effective intelligent intrusion detection systems.
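
The two-step procedure above can be sketched as follows, under stated assumptions: each of the four models (LOGIT, DT, ANN, SVM) is assumed to output an intrusion score in [0, 1]; the GA operators (truncation selection, uniform crossover, Gaussian mutation), the population size, and the 5:1 FN-to-FP cost ratio are illustrative choices, not the authors' settings; and the scores are made up.

```python
import random
from typing import List, Sequence

def combine(scores: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted average of each sample's model scores; scores[i][m] is model m's score."""
    total = sum(weights)
    return [sum(w * s for w, s in zip(weights, row)) / total for row in scores]

def error_rate(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded prediction (1 = intrusion) differs from the label."""
    return sum((p >= threshold) != (y == 1) for p, y in zip(probs, labels)) / len(labels)

def ga_weights(scores, labels, pop_size: int = 20, generations: int = 40,
               sigma: float = 0.1, seed: int = 0) -> List[float]:
    """Step 1 sketch: evolve positive combining weights that minimize the error rate."""
    rng = random.Random(seed)
    n_models = len(scores[0])

    def fitness(w: Sequence[float]) -> float:
        return error_rate(combine(scores, w), labels)

    # Include equal weights so the result is never worse than a plain average
    population = [[1.0] * n_models] + [[rng.random() + 1e-6 for _ in range(n_models)]
                                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection; keeps the best (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]             # uniform crossover
            child = [max(1e-6, g + rng.gauss(0.0, sigma)) for g in child]  # Gaussian mutation
            children.append(child)
        population = parents + children
    return min(population, key=fitness)

def best_threshold(probs: Sequence[float], labels: Sequence[int],
                   cost_fn: float = 5.0, cost_fp: float = 1.0) -> float:
    """Step 2 sketch: cutoff minimizing cost_fn * FN + cost_fp * FP (intrusion if p >= t)."""
    def cost(t: float) -> float:
        fn = sum(p < t and y == 1 for p, y in zip(probs, labels))
        fp = sum(p >= t and y == 0 for p, y in zip(probs, labels))
        return cost_fn * fn + cost_fp * fp
    candidates = sorted(set(probs)) + [float("inf")]
    return min(candidates, key=cost)

# Hypothetical scores from LOGIT, DT, ANN, SVM for six connections (1 = intrusion)
scores = [[0.9, 0.6, 0.8, 0.7], [0.2, 0.7, 0.3, 0.1], [0.4, 0.3, 0.6, 0.7],
          [0.1, 0.2, 0.2, 0.3], [0.7, 0.9, 0.4, 0.6], [0.3, 0.6, 0.2, 0.4]]
labels = [1, 0, 1, 0, 1, 0]
weights = ga_weights(scores, labels)
probs = combine(scores, weights)
print(error_rate(probs, labels), best_threshold(probs, labels))
```

Because missed intrusions are weighted more heavily than false alarms, the cost-minimizing cutoff tends to sit lower than the 0.5 used in step 1.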

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable than that of any other type of media. However, there are limits to how far service quality can be improved based on opinions on social media platforms. Users express their opinions on social media platforms as text, images, and so on, so the raw data sets from these reviews are unstructured. Moreover, these data sets are too large for people to extract new information and hidden knowledge from them manually. To use them for business intelligence and analytics applications, appropriate big data techniques such as natural language processing and data mining are needed. This study suggests an analytical approach that directly yields insights from these reviews to improve the service quality of hotels. The proposed approach consists of topic mining to extract the topics contained in the reviews and decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to methods for finding groups of words in a collection of documents that represent the documents' topics. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal. However, LDA alone is not enough to find insights that can improve service quality, because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, a kind of decision tree technique.
Through the CART method, we can find which topics are related to positive or negative hotel ratings and visualize the results. This study therefore presents an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we identify the strengths and weaknesses of each hotel's services and suggest improvements to aid customer satisfaction. In particular, positive reviews reveal what these hotels should maintain for service quality; for example, positive reviews show that, compared with the other hotels, one hotel has a good location and good room condition. Conversely, negative reviews reveal what the hotels should change in their services; for example, one hotel should improve room condition with respect to soundproofing. These results indicate that our approach is useful for finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on customer surveys or interviews. However, these methods are often costly and time-consuming, and their results may be biased by sampling or by untrustworthy answers. The proposed approach obtains honest feedback directly from customers' online reviews and draws insights through a type of big data analysis, so it can be a more useful tool for overcoming the limitations of surveys and interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry, because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach should improve if other structured and unstructured data sources are added.
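
The step that relates LDA topics to ratings with CART can be sketched as a search for the single split on one topic's proportion that best separates high and low ratings. The sketch assumes the per-review topic proportions have already been produced by an LDA model, treats the rating as a numeric target (CART's regression variant, which minimizes the sum of squared errors), and uses hypothetical topic names and numbers; the authors' exact CART settings are not given in the abstract.

```python
from typing import List, Tuple

def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (the CART regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_topic_split(topic_props: List[List[float]], ratings: List[float],
                     topic_names: List[str]) -> Tuple[str, float, float]:
    """Return (topic, threshold, SSE) for the split that most reduces rating SSE."""
    best = ("", 0.0, float("inf"))
    for t, name in enumerate(topic_names):
        for threshold in sorted({row[t] for row in topic_props}):
            left = [r for row, r in zip(topic_props, ratings) if row[t] <= threshold]
            right = [r for row, r in zip(topic_props, ratings) if row[t] > threshold]
            if not left or not right:
                continue  # a split must leave reviews on both sides
            total = sse(left) + sse(right)
            if total < best[2]:
                best = (name, threshold, total)
    return best

# Hypothetical LDA topic proportions per review and the reviews' star ratings
topics = ["location", "soundproofing"]
props = [[0.7, 0.1], [0.6, 0.2], [0.2, 0.7], [0.1, 0.8]]
ratings = [5.0, 5.0, 2.0, 1.0]
print(best_topic_split(props, ratings, topics))  # ('location', 0.2, 0.5)
```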

Interspecific relationships of Korean Viola based on RAPD, ISSR and PCR-RFLP analyses (RAPD, ISSR과 PCR-RFLP를 이용한 한국산 제비꽃속(Viola)의 종간 유연관계)

  • Yoo, Ki-Oug;Lee, Woo-Tchul;Kwon, Oh-Keun
    • Korean Journal of Plant Taxonomy / v.34 no.1 / pp.43-61 / 2004
  • Molecular taxonomic studies were conducted to evaluate interspecific relationships among 34 taxa of Korean Viola, including two Japanese populations, using RAPD (randomly amplified polymorphic DNA), ISSR (inter-simple sequence repeat), and PCR-RFLP (restriction fragment length polymorphism) analyses. Of 40 arbitrary primers and 12 ISSR primers, only six and four, respectively, were selected after screening the 34 taxa; these revealed 70 (98.6%) and 28 (96.6%) polymorphic bands, respectively. Fifteen restriction endonucleases produced 80 restriction sites and size variants in the large single-copy region of cpDNA, of which 16 (20%) were polymorphic. Separate analyses of the RAPD, ISSR, and PCR-RFLP data gave incongruent relationships among the 34 taxa, but the combined data agreed with the previous infrageneric classification system based on morphological characters, especially at the subsection and series levels. Section Chamaemelanium, placed between subsect. Patellares and subsect. Vaginatae of section Nomimium, did not form a distinct group. The Viola albida complex, comprising three very closely related taxa, was recognized as an independent group within subsect. Patellares in the combined-data tree. This result strongly suggests that these taxa should be treated as series Pinnatae. RAPD analysis was more useful than ISSR and PCR-RFLP analyses for clarifying the interspecific relationships among species of Korean Viola.
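
RAPD and ISSR results like those above are usually scored as a 0/1 band presence matrix, from which the percentage of polymorphic bands and pairwise similarities are computed. For instance, 70 polymorphic RAPD bands at 98.6% is consistent with about 71 scored bands (70/71 ≈ 0.986). The sketch below uses the Dice (Nei-Li) coefficient, a common choice for such data; the abstract does not state which similarity coefficient or tree-building method was used, and the band matrix is hypothetical.

```python
from typing import Dict, List

def percent_polymorphic(bands: Dict[str, List[int]]) -> float:
    """Percentage of scored bands whose presence (1) / absence (0) varies among taxa."""
    columns = list(zip(*bands.values()))
    polymorphic = sum(1 for col in columns if len(set(col)) > 1)
    return 100.0 * polymorphic / len(columns)

def dice_similarity(x: List[int], y: List[int]) -> float:
    """Dice (Nei-Li) similarity 2a / (2a + b + c) for two band profiles."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # bands shared by both taxa
    b_plus_c = sum(1 for i, j in zip(x, y) if i != j)      # bands present in only one taxon
    return 2 * a / (2 * a + b_plus_c)

# Hypothetical 0/1 band matrix: one row per taxon, one column per scored band
bands = {"taxon_A": [1, 1, 0, 1], "taxon_B": [1, 0, 1, 1], "taxon_C": [1, 1, 1, 0]}
print(percent_polymorphic(bands))  # 75.0
print(round(dice_similarity(bands["taxon_A"], bands["taxon_B"]), 3))  # 0.667
```

A similarity matrix built this way could then feed a clustering method to draw a tree, but that step is not shown here.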