• Title/Summary/Keyword: 10-fold cross-validation

Search Result 213, Processing Time 0.021 seconds

Comparison of Survival Prediction of Rats with Hemorrhagic Shocks Using Artificial Neural Network and Support Vector Machine (출혈성 쇼크를 일으킨 흰쥐에서 인공신경망과 지원벡터기계를 이용한 생존율 비교)

  • Jang, Kyung-Hwan;Yoo, Tae-Keun;Nam, Ki-Chang;Choi, Jae-Rim;Kwon, Min-Kyung;Kim, Deok-Won
    • Journal of the Institute of Electronics Engineers of Korea SC
    • /
    • v.48 no.2
    • /
    • pp.47-55
    • /
    • 2011
  • Hemorrhagic shock is a cause of one third of death resulting from injury in the world. Early diagnosis of hemorrhagic shock makes it possible for physician to treat successfully. The objective of this paper was to select an optimal classifier model using physiological signals from rats measured during hemorrhagic experiment. This data set was used to train and predict survival rate using artificial neural network (ANN) and support vector machine (SVM). To avoid over-fitting, we chose the best classifier according to performance measured by a 10-fold cross validation method. As a result, we selected ANN having three hidden nodes with one hidden layer and SVM with Gaussian kernel function as trained prediction model, and the ANN showed 88.9 % of sensitivity, 96.7 % of specificity, 92.0 % of accuracy and the SVM provided 97.8 % of sensitivity, 95.0 % of specificity, 96.7 % of accuracy. Therefore, SVM was better than ANN for survival prediction.

Design of Classifier for Sorting of Black Plastics by Type Using Intelligent Algorithm (지능형 알고리즘을 이용한 재질별 검정색 플라스틱 분류기 설계)

  • Park, Sang Beom;Roh, Seok Beom;Oh, Sung Kwun;Park, Eun Kyu;Choi, Woo Zin
    • Resources Recycling
    • /
    • v.26 no.2
    • /
    • pp.46-55
    • /
    • 2017
  • In this study, the design methodology of Radial Basis Function Neural Networks is developed with the aid of Laser Induced Breakdown Spectroscopy and also applied to the practical plastics sorting system. To identify black plastics such as ABS, PP, and PS, RBFNNs classifier as a kind of intelligent algorithms is designed. The dimensionality of the obtained input variables are reduced by using PCA and divided into several groups by using K-means clustering which is a kind of clustering techniques. The entire data is split into training data and test data according to the ratio of 4:1. The 5-fold cross validation method is used to evaluate the performance as well as reliability of the proposed classifier. In case of input variables and clusters equal to 5 respectively, the classification performance of the proposed classifier is obtained as 96.78%. Also, the proposed classifier showed superiority in the viewpoint of classification performance where compared to other classifiers.

Design of Pattern Classifier for Electrical and Electronic Waste Plastic Devices Using LIBS Spectrometer (LIBS 분광기를 이용한 폐소형가전 플라스틱 패턴 분류기의 설계)

  • Park, Sang-Beom;Bae, Jong-Soo;Oh, Sung-Kwun;Kim, Hyun-Ki
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.26 no.6
    • /
    • pp.477-484
    • /
    • 2016
  • Small industrial appliances such as fan, audio, electric rice cooker mostly consist of ABS, PP, PS materials. In colored plastics, it is possible to classify by near infrared(NIR) spectroscopy, while in black plastics, it is very difficult to classify black plastic because of the characteristic of black material that absorbs the light. So the RBFNNs pattern classifier is introduced for sorting electrical and electronic waste plastics through LIBS(Laser Induced Breakdown Spectroscopy) spectrometer. At the preprocessing part, PCA(Principle Component Analysis), as a kind of dimension reduction algorithms, is used to improve processing speed as well as to extract the effective data characteristics. In the condition part, FCM(Fuzzy C-Means) clustering is exploited. In the conclusion part, the coefficients of linear function of being polynomial type are used as connection weights. PSO and 5-fold cross validation are used to improve the reliability of performance as well as to enhance classification rate. The performance of the proposed classifier is described based on both optimization and no optimization.

Transaction Pattern Discrimination of Malicious Supply Chain using Tariff-Structured Big Data (관세 정형 빅데이터를 활용한 우범공급망 거래패턴 선별)

  • Kim, Seongchan;Song, Sa-Kwang;Cho, Minhee;Shin, Su-Hyun
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.2
    • /
    • pp.121-129
    • /
    • 2021
  • In this study, we try to minimize the tariff risk by constructing a hazardous cargo screening model by applying Association Rule Mining, one of the data mining techniques. For this, the risk level between supply chains is calculated using the Apriori Algorithm, which is an association analysis algorithm, using the big data of the import declaration form of the Korea Customs Service(KCS). We perform data preprocessing and association rule mining to generate a model to be used in screening the supply chain. In the preprocessing process, we extract the attributes required for rule generation from the import declaration data after the error removing process. Then, we generate the rules by using the extracted attributes as inputs to the Apriori algorithm. The generated association rule model is loaded in the KCS screening system. When the import declaration which should be checked is received, the screening system refers to the model and returns the confidence value based on the supply chain information on the import declaration data. The result will be used to determine whether to check the import case. The 5-fold cross-validation of 16.6% precision and 33.8% recall showed that import declaration data for 2 years and 6 months were divided into learning data and test data. This is a result that is about 3.4 times higher in precision and 1.5 times higher in recall than frequency-based methods. This confirms that the proposed method is an effective way to reduce tariff risks.

An Experimental Study on Feature Ranking Schemes for Text Classification (텍스트 분류를 위한 자질 순위화 기법에 관한 연구)

  • Pan Jun Kim
    • Journal of the Korean Society for information Management
    • /
    • v.40 no.1
    • /
    • pp.1-21
    • /
    • 2023
  • This study specifically reviewed the performance of the ranking schemes as an efficient feature selection method for text classification. Until now, feature ranking schemes are mostly based on document frequency, and relatively few cases have used the term frequency. Therefore, the performance of single ranking metrics using term frequency and document frequency individually was examined as a feature selection method for text classification, and then the performance of combination ranking schemes using both was reviewed. Specifically, a classification experiment was conducted in an environment using two data sets (Reuters-21578, 20NG) and five classifiers (SVM, NB, ROC, TRA, RNN), and to secure the reliability of the results, 5-Fold cross-validation and t-test were applied. As a result, as a single ranking scheme, the document frequency-based single ranking metric (chi) showed good performance overall. In addition, it was found that there was no significant difference between the highest-performance single ranking and the combination ranking schemes. Therefore, in an environment where sufficient learning documents can be secured in text classification, it is more efficient to use a single ranking metric (chi) based on document frequency as a feature selection method.

Syllable-based Probabilistic Models for Korean Morphological Analysis (한국어 형태소 분석을 위한 음절 단위 확률 모델)

  • Shim, Kwangseob
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.642-651
    • /
    • 2014
  • This paper proposes three probabilistic models for syllable-based Korean morphological analysis, and presents the performance of proposed probabilistic models. Probabilities for the models are acquired from POS-tagged corpus. The result of 10-fold cross-validation experiments shows that 98.3% answer inclusion rate is achieved when trained with Sejong POS-tagged corpus of 10 million eojeols. In our models, POS tags are assigned to each syllable before spelling recovery and morpheme generation, which enables more efficient morphological analysis than the previous probabilistic models where spelling recovery is performed at the first stage. This efficiency gains the speed-up of morphological analysis. Experiments show that morphological analysis is performed at the rate of 147K eojeols per second, which is almost 174 times faster than the previous probabilistic models for Korean morphology.

Cloud Attack Detection with Intelligent Rules

  • Pradeepthi, K.V;Kannan, A
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.10
    • /
    • pp.4204-4222
    • /
    • 2015
  • Cloud is the latest buzz word in the internet community among developers, consumers and security researchers. There have been many attacks on the cloud in the recent past where the services got interrupted and consumer privacy has been compromised. Denial of Service (DoS) attacks effect the service availability to the genuine user. Customers are paying to use the cloud, so enhancing the availability of services is a paramount task for the service provider. In the presence of DoS attacks, the availability is reduced drastically. Such attacks must be detected and prevented as early as possible and the power of computational approaches can be used to do so. In the literature, machine learning techniques have been used to detect the presence of attacks. In this paper, a novel approach is proposed, where intelligent rule based feature selection and classification are performed for DoS attack detection in the cloud. The performance of the proposed system has been evaluated on an experimental cloud set up with real time DoS tools. It was observed that the proposed system achieved an accuracy of 98.46% on the experimental data for 10,000 instances with 10 fold cross-validation. By using this methodology, the service providers will be able to provide a more secure cloud environment to the customers.

Defect Severity-based Ensemble Model using FCM (FCM을 적용한 결함심각도 기반 앙상블 모델)

  • Lee, Na-Young;Kwon, Ki-Tae
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.12
    • /
    • pp.681-686
    • /
    • 2016
  • Software defect prediction is an important factor in efficient project management and success. The severity of the defect usually determines the degree to which the project is affected. However, existing studies focus only on the presence or absence of a defect and not the severity of defect. In this study, we proposed an ensemble model using FCM based on defect severity. The severity of the defect of NASA data set's PC4 was reclassified. To select the input column that affected the severity of the defect, we extracted the important defect factor of the data set using Random Forest (RF). We evaluated the performance of the model by changing the parameters in the 10-fold cross-validation. The evaluation results were as follows. First, defect severities were reclassified from 58, 40, 80 to 30, 20, 128. Second, BRANCH_COUNT was an important input column for the degree of severity in terms of accuracy and node impurities. Third, smaller tree number led to more variables for good performance.

Analysis of stage III proximal colon cancer using the Cox proportional hazards model (Cox 비례위험모형을 이용한 우측 대장암 3기 자료 분석)

  • Lee, Taeseob;Lee, Minjung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.2
    • /
    • pp.349-359
    • /
    • 2017
  • In this paper, we conducted survival analyses by fitting the Cox proportional hazards model to stage III proximal colon cancer data obtained from the Surveillance, Epidemiology, and End Results program of the National Cancer Institute. We investigated the effect of covariates on the hazard function for death from proximal colon cancer in stage III with surgery performed and estimated the survival probability for a patient with specific covariates. We showed that the proportional hazards assumption is satisfied for covariates that were used to analyses, using a test based on the Schoenfeld residuals and plots of the Schoenfeld residuals and $log[-log\{{\hat{S}}(t)\}]$. We evaluated the model calibration and discriminatory accuracy by calibration plot and time-dependent area under the ROC curve, which were calculated using 10-fold cross validation.

Deep Learning-Based Plant Health State Classification Using Image Data (영상 데이터를 이용한 딥러닝 기반 작물 건강 상태 분류 연구)

  • Ali Asgher Syed;Jaehawn Lee;Alvaro Fuentes;Sook Yoon;Dong Sun Park
    • Journal of Internet of Things and Convergence
    • /
    • v.10 no.4
    • /
    • pp.43-53
    • /
    • 2024
  • Tomatoes are rich in nutrients like lycopene, β-carotene, and vitamin C. However, they often suffer from biological and environmental stressors, resulting in significant yield losses. Traditional manual plant health assessments are error-prone and inefficient for large-scale production. To address this need, we collected a comprehensive dataset covering the entire life span of tomato plants, annotated across 5 health states from 1 to 5. Our study introduces an Attention-Enhanced DS-ResNet architecture with Channel-wise attention and Grouped convolution, refined with new training techniques. Our model achieved an overall accuracy of 80.2% using 5-fold cross-validation, showcasing its robustness in precisely classifying the health states of tomato plants.