• Title/Abstract/Keywords: tree classification

938 results (processing time: 0.033 s)

Study on Classification Function into Sasang Constitution Using Data Mining Techniques

  • 김규곤;김종원;이의주;김종열;최선미
    • 동의생리병리학회지 / Vol. 18, No. 6 / pp. 1938-1944 / 2004
  • In this study, data mining techniques are applied to find a classification function that improves the accuracy of constitution diagnosis using QSCC Ⅱ (Questionnaire of Sasang Constitution Classification). The data analyzed are the questionnaires of 1051 patients who had been treated at Dong Eui Oriental Medical Hospital and Kyung Hee Oriental Medical Hospital. The criteria for data cleansing are the response pattern on opposing questionnaire items and the proportion of positive responses to specific items within each constitution. The criteria for variable selection are the test of homogeneity in frequency analysis and the coefficients of the linear discriminant function. A discriminant analysis model and a decision tree model are applied to find the classification function into Sasang constitution. The two models achieve similar accuracy on the learning sample, but the discriminant analysis model achieves higher accuracy on the test sample.

The Study on Improving Accuracy of Land Cover Classification using Spectral Library of Hyperspectral Image

  • 박정서;서진재;고제웅;조기성
    • 지적과 국토정보 / Vol. 46, No. 2 / pp. 239-251 / 2016
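  • Hyperspectral images, which have many narrow bands, carry more information per pixel than conventional multispectral images and are therefore regarded as the best-suited imagery for image-based land cover classification. However, the high spectral resolution increases data volume and noise, so techniques designed for multispectral images lose effectiveness when applied unchanged. Among hyperspectral analysis techniques, SAM (Spectral Angle Mapping), which uses the inner product of vectors, is the most common method for interpreting the continuous spectra characteristic of hyperspectral images. This study therefore adopted SAM to perform land cover classification of hyperspectral images using a spectral library, but noise from atmospheric effects led to low accuracy. A Decision Tree technique was proposed to compensate for this, and as a result the classification accuracy was improved.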
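
The inner-product idea behind SAM can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two-entry spectral library, and the four-band reflectance values are all hypothetical, chosen only to show how a pixel is assigned to the library spectrum with the smallest spectral angle.

```python
import math

def spectral_angle(pixel, reference):
    """Angle (radians) between two spectra, from their normalized inner product."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cos_theta)

def classify_pixel(pixel, library):
    """Return the name of the library spectrum with the smallest angle to the pixel."""
    return min(library, key=lambda name: spectral_angle(pixel, library[name]))

# Hypothetical four-band reflectance spectra (illustrative values only).
library = {
    "water": [0.05, 0.04, 0.02, 0.01],
    "vegetation": [0.04, 0.08, 0.05, 0.45],
}
pixel = [0.02, 0.04, 0.025, 0.22]  # roughly a darker copy of "vegetation"
print(classify_pixel(pixel, library))
```

Because the angle ignores vector length, a uniformly darker or brighter pixel still matches its reference spectrum. Additive noise, such as the atmospheric effects the abstract mentions, changes the direction of the vector and can therefore change the result.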

Self Introduction Essay Classification Using Doc2Vec for Efficient Job Matching

  • 김영수;문현실;김재경
    • 한국IT서비스학회지 / Vol. 19, No. 1 / pp. 103-112 / 2020
  • Job seekers make various efforts to find a good company, and companies attempt to recruit good people. Job searching through self-introduction essays is nowadays one of the most active processes. Companies spend time and money reviewing the numerous self-introduction essays of job seekers, and job seekers worry about whether their essays will be accepted. This research builds a classification model and conducts experiments to classify self-introduction essays as pass or fail using deep learning and decision tree techniques. Real-world data were sampled using stratified sampling to alleviate the data imbalance between passed and failed essays. Documents were embedded using the Doc2Vec method, developed from the existing Word2Vec, and classified using logistic regression analysis. The decision tree model was chosen as a benchmark model, and K-fold cross-validation was conducted for performance evaluation. Across several experiments, the area under the curve (AUC) of PV-DM was better than that of the other Doc2Vec models, i.e., PV-DBOW and Concatenate. Furthermore, PV-DM classifies passed essays as well as failed essays, while PV-DBOW cannot classify passed essays even though it classifies failed essays well. In addition, the classification performance of the logistic regression model with PV-DM embeddings is better than that of the decision tree-based classification model. The experimental results imply that companies can reduce the cost of recruiting good job seekers. In addition, the suggested model can help job candidates pre-evaluate their self-introduction essays.
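
As a rough illustration of the AUC metric used to compare the models, the sketch below computes it as the fraction of (passed, failed) pairs in which the passed essay receives the higher score, with ties counting half. The function name, scores, and labels are hypothetical; the paper's data and evaluation code are not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores; label 1 = passed essay, 0 = failed essay.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

A model that scores every essay identically gets an AUC of 0.5 under this definition. That is one way to read the abstract's observation that a model can classify failed essays well yet fail to separate the passed ones.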

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention / Vol. 17, No. 2 / pp. 835-838 / 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in determining lung cancer type or malignant growth degrade treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We attempted to evaluate the effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples, and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples, and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples the task was a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), a naive Bayes classifier assuming either a normal distribution of attributes or a distribution estimated through histograms, a support vector machine, and a C4.5 decision tree. The effectiveness of the machine learning algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine method showed the best results on the data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms except the C4.5 decision tree showed maximum potential effectiveness on the University of Michigan data set. However, the C4.5 decision tree showed the best results on the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
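
For reference, the Matthews correlation coefficient for a binary task can be computed from the four confusion-matrix counts, as in the sketch below. It covers only the binary case (the multi-class Dana-Farber task would need the generalized form). The function name and the example counts are illustrative and are not taken from the paper.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Binary MCC from confusion-matrix counts; ranges from -1 to +1."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention: return 0 when any row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator

# Illustrative counts only.
print(matthews_corrcoef(tp=10, tn=10, fp=0, fn=0))  # perfect agreement
print(matthews_corrcoef(tp=5, tn=5, fp=5, fn=5))    # no better than chance
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.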

A Multibit Tree Bitmap based Packet Classification

  • 최병철;이정태
    • 한국통신학회논문지 / Vol. 29, No. 3B / pp. 339-348 / 2004
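  • Packet classification is an important element for accommodating diverse user services in the Internet, such as QoS (Quality of Service) guarantees and VPNs (Virtual Private Networks). Packet classification basically finds the best matching rule in a rule table by combining several fields of the IP (Internet Protocol) packet header: not only the destination address but also the source address, protocol, TCP (Transmission Control Protocol) port numbers, and others. This paper proposes a packet classification scheme that enables hardware-based rule lookup using a tree bitmap with a multibit trie structure. The fields to be searched and the prefixes that make up the classification rules are divided into multibit units of a fixed bit size, which serve as the unit of comparison, and tree bitmap based rule lookup is performed on these multibit units. The proposed scheme uses a fixed number of upper bits of the prefix as an indexing key to reduce the number of memory accesses needed for rule lookup. In addition, to prevent backtracking, which degrades lookup performance, a method for handling marker prefixes when building the rule table is proposed. Using prefixes from routing tables provided by the IPMA (Internet Performance Measurement Analysis) project, the paper generates random rule sets in two dimensions, i.e., consisting of the two fields destination address and source address, and compares the memory requirements and performance of the proposed scheme.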
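
The multibit lookup idea can be sketched in one dimension as follows. This is a deliberate simplification and not the scheme proposed in the paper: it uses plain dictionaries instead of bitmaps, handles a single field, and omits the indexing key and the marker-prefix handling. The stride of 4 bits, the class and function names, and the rules are all assumptions made for illustration.

```python
STRIDE = 4  # bits consumed per trie level (assumed value)

class Node:
    def __init__(self):
        self.prefixes = {}  # bit string shorter than STRIDE -> rule stored in this node
        self.children = {}  # full STRIDE-bit chunk -> child Node

def insert(root, prefix_bits, rule):
    """Store a rule under a binary prefix given as a string such as '1010'."""
    node = root
    while len(prefix_bits) >= STRIDE:
        chunk, prefix_bits = prefix_bits[:STRIDE], prefix_bits[STRIDE:]
        node = node.children.setdefault(chunk, Node())
    node.prefixes[prefix_bits] = rule  # remainder has 0 to STRIDE-1 bits

def lookup(root, key_bits):
    """Longest-prefix match in a single top-down pass, STRIDE bits at a time."""
    node, best, pos = root, None, 0
    while node is not None:
        chunk = key_bits[pos:pos + STRIDE]
        # Find the longest prefix stored inside this node that matches the chunk.
        for length in range(min(len(chunk), STRIDE - 1), -1, -1):
            rule = node.prefixes.get(chunk[:length])
            if rule is not None:
                best = rule  # deeper nodes hold longer prefixes, so overwrite
                break
        if len(chunk) < STRIDE:
            break  # key exhausted
        node = node.children.get(chunk)
        pos += STRIDE
    return best

# Hypothetical rules keyed by binary prefixes.
root = Node()
insert(root, "10", "A")
insert(root, "1010", "B")
insert(root, "101011", "C")
print(lookup(root, "10101100"), lookup(root, "10100000"), lookup(root, "10010000"))
```

The lookup remembers the best match seen so far while descending, so it never has to return to an earlier node. A real tree bitmap encodes the per-node prefixes and child pointers as bit vectors so that each level can be resolved in hardware with few memory accesses.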

Identification of Cardiovascular Disease Based on Echocardiography and Electrocardiogram Data Using the Decision Tree Classification Approach

  • Tb Ai Munandar;Sumiati;Vidila Rosalina
    • International Journal of Computer Science & Network Security / Vol. 23, No. 9 / pp. 150-156 / 2023
  • For a doctor, diagnosing a patient's heart disease is not easy. It takes skill and extensive experience to accurately diagnose the type of heart disease from the factors present in the patient. Several studies have been carried out to develop tools that identify types of heart disease in patients. However, most focus only on patient answers and lab results, while the rest use only echocardiography data or only electrocardiogram results. This research was conducted to test how accurately heart disease can be classified using two types of medical data, namely echocardiography and electrocardiogram data. Three treatments were applied to the two types of data and analyzed using the decision tree approach. The first treatment built a classification model for types of heart disease based on both echocardiography and electrocardiogram data, the second used only echocardiography data, and the third used only electrocardiogram data. The results showed that the classification of types of heart disease in the first treatment had a higher level of accuracy than in the second and third treatments. The accuracy levels for the first, second, and third treatments were 78.95%, 73.69%, and 50%, respectively. This shows that, in order to diagnose the type of a patient's heart disease, it is advisable to consult both kinds of medical records (echocardiography and electrocardiogram) to obtain diagnosis results that are accurate and accountable.

Application of Decision Tree to Classify Fall Risk Using Inertial Measurement Unit Sensor Data and Clinical Measurements

  • Junwoo Park;Jongwon Choi;Seyoung Lee;Kitaek Lim;Woochol Joseph Choi
    • 한국전문물리치료학회지 / Vol. 30, No. 2 / pp. 102-109 / 2023
  • Background: While efforts have been made to differentiate fall risk in older adults using wearable devices and clinical methodologies, the technologies are still in their infancy. We applied a decision tree (DT) algorithm using inertial measurement unit (IMU) sensor data and clinical measurements to generate high-performance classification models of fall risk in older adults. Objectives: This study aims to develop a classification model of fall risk using IMU data and clinical measurements in older adults. Methods: Twenty-six older adults were assessed and categorized into high and low fall risk groups. IMU sensor data were obtained from each group while walking, and features were extracted for use in a DT algorithm with the Gini index (DT1) and the Entropy index (DT2), which generated classification models to differentiate the high and low fall risk groups. Model performance was compared and presented in terms of accuracy, sensitivity, and specificity. Results: Accuracy, sensitivity, and specificity were 77.8%, 80.0%, and 66.7%, respectively, for DT1; and 72.2%, 91.7%, and 33.3%, respectively, for DT2. Conclusion: Our results suggest that fall risk classification using IMU sensor data obtained during gait has the potential to be developed for practical use. Different machine learning techniques involving larger data sets are warranted for future research and development.
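
The two split criteria named in the abstract differ only in how they score the class mix within a node. The sketch below shows both impurity measures and the weighted impurity of a candidate split. The function names and the example labels are hypothetical and are not the study's data.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class proportions, in bits."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def split_impurity(left, right, impurity):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Hypothetical fall-risk labels on each side of a candidate split.
left = ["high", "high", "high", "low"]
right = ["low", "low", "low", "high"]
print(split_impurity(left, right, gini), split_impurity(left, right, entropy))
```

A tree builder picks the split with the lowest weighted impurity. The two measures usually rank splits similarly but can disagree, which is consistent with DT1 and DT2 producing different models from the same features.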

Development of the Risk Assessment Model for Train Collision and Derailment

  • 최돈범;왕종배;곽상록;박찬우;김민수
    • 한국철도학회:학술대회논문집 / Proceedings of the 2008 Spring Conference of the Korean Society for Railway / pp. 1518-1523 / 2008
  • Train collisions and derailments are accident types with a low probability of occurrence, but they can lead to disastrous consequences, including loss of life and property. A risk assessment model to predict and assess this risk has long been called for. Nevertheless, risk assessment was only recently introduced to the railway system in Korea. The classification of hazardous events and their causes is the starting point of the risk assessment model. In previous research, hazardous events and causes were classified with a focus on outcomes. That classification was simple but might not reveal the root cause of the hazardous events. This study classifies train collisions and derailments based on the relevant hazardous events, including train faults related to the accidents, and investigates the causes of those hazardous events. For the risk assessment model, FTA (fault tree analysis) and ETA (event tree analysis) methods are introduced to assess the risk.
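
To make the FTA step concrete, the sketch below evaluates a small fault tree under the common simplifying assumption that basic events are independent. The tree structure, event names, and probabilities are invented for illustration and do not come from the paper. ETA, which multiplies branch probabilities along each outcome path, is not shown.

```python
from math import prod

def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)

def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical basic-event probabilities (illustrative values only).
track_fault = 0.01
overspeed = 0.1
braking_failure = 0.05

# Hypothetical tree: derailment = track_fault OR (overspeed AND braking_failure)
derailment = or_gate([track_fault, and_gate([overspeed, braking_failure])])
print(round(derailment, 5))
```

Working from the top event down to basic events in this way is what lets the analysis expose root causes rather than only outcomes.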

Comparison of Classification Models for Sequential Flight Test Results

  • 손소영;조용관;최성옥;김영준
    • 대한인간공학회지 / Vol. 21, No. 1 / pp. 1-14 / 2002
  • The main purpose of this paper is to present selection criteria for ROK Air Force pilot training candidates in order to save the costs involved in sequential pilot training. We use classification models such as Decision Tree, Logistic Regression, and Neural Network, based on the aptitude test results of 288 ROK Air Force applicants in 1994-1996. The models are compared in terms of classification accuracy, ROC, and lift value. The neural network is evaluated as the best model for each sequential flight test result, while the logistic regression model outperforms the others in discriminating the last flight test result. We therefore suggest a pilot selection criterion based on this logistic regression. Overall, we find that factors such as Attention Sharing, Speed Tracking, Machine Comprehension, and Instrument Reading Ability have significant effects on the flight results. We expect that the use of our criteria can increase the effectiveness of flight resources.

Selecting Machine Learning Model Based on Natural Language Processing for Shanghanlun Diagnostic System Classification

  • 김영남
    • 대한상한금궤의학회지 / Vol. 14, No. 1 / pp. 41-50 / 2022
  • Objective: The purpose of this study is to explore the most suitable machine learning algorithm for Shanghanlun diagnostic system classification using natural language processing (NLP). Methods: A total of 201 data items were collected from 『Shanghanlun』 and 『Clinical Shanghanlun』; 'Taeyangbyeong-gyeolhyung' and 'Eumyangyeokchahunobokbyeong' were excluded to prevent oversampling or undersampling. Data were preprocessed using a Twitter Korean tokenizer and used to train logistic regression, ridge regression, lasso regression, naive Bayes classifier, decision tree, and random forest algorithms. The accuracy of the models was compared. Results: Ridge regression and the naive Bayes classifier showed an accuracy of 0.843, logistic regression and random forest showed an accuracy of 0.804, the decision tree showed an accuracy of 0.745, and lasso regression showed an accuracy of 0.608. Conclusions: Ridge regression and the naive Bayes classifier are suitable NLP machine learning models for Shanghanlun diagnostic system classification.