• Title/Summary/Keyword: Classifier's Significance


A Detailed Analysis of Classifier Ensembles for Intrusion Detection in Wireless Network

  • Tama, Bayu Adhi;Rhee, Kyung-Hyune
    • Journal of Information Processing Systems / v.13 no.5 / pp.1203-1212 / 2017
  • Intrusion detection systems (IDSs) are crucial given the overwhelming increase in attacks on computing infrastructure. An IDS intelligently detects malicious activity and predicts future attack patterns through classification analysis based on machine learning and data mining techniques. This paper thoroughly evaluates classifier ensembles for IDSs in IEEE 802.11 wireless networks. Two ensemble techniques, voting and stacking, are employed to combine three base classifiers: decision tree (DT), random forest (RF), and support vector machine (SVM). We use the area under the ROC curve (AUC) as the performance metric. Finally, we conduct two statistical significance tests to evaluate the performance differences among classifiers.
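As a rough illustration of the voting and AUC ideas named in this abstract, the sketch below combines three classifiers by majority vote and computes AUC as the fraction of positive/negative pairs that are ranked correctly. It is not the authors' implementation: the function names `majority_vote` and `auc`, the toy labels, and the toy scores are all assumptions made for the example, and stacking is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label list by majority vote."""
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = attack, 0 = normal traffic.
y_true = [1, 0, 1, 1, 0, 0]
dt_labels = [1, 0, 0, 1, 0, 1]
rf_labels = [1, 0, 1, 1, 0, 0]
svm_labels = [0, 0, 1, 1, 1, 0]
voted = majority_vote([dt_labels, rf_labels, svm_labels])

# Hypothetical scores from a single classifier, used only to exercise auc().
dt_scores = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6]
dt_auc = auc(y_true, dt_scores)
```

With an odd number of binary voters there are no ties, so the vote is well defined; each base classifier here makes two mistakes, yet the vote recovers every label of the toy set.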

Classification of Breast Tumor Cell Tissue Section Images (유방 종양 세포 조직 영상의 분류)

  • 황해길;최현주;윤혜경;남상희;최흥국
    • Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.22-30 / 2001
  • In this paper, we propose three classification algorithms to classify breast tumors that occur in the duct into benign, DCIS (ductal carcinoma in situ), and NOS (invasive ductal carcinoma). The general approach to creating a classifier consists of two steps: feature extraction and classification. Feature extraction is especially significant for a good classifier, because classification performance depends on the extracted features. Therefore, in the feature extraction step, we extracted morphology features describing the size of nuclei, and texture features. The internal structures of the tumor are reflected in wavelet-transformed images with 10$\times$ and 40$\times$ magnification. In particular, to find the correlation between correct classification rates and wavelet depths, we applied 1-, 2-, 3-, and 4-level wavelet transforms to the images and extracted texture features from the transformed images. The morphology features used are area, perimeter, width along the X axis, width along the Y axis, and circularity. The texture features used are entropy, energy, contrast, and homogeneity. In the classification step, we created three classifiers from the extracted features using discriminant analysis. The first classifier was built from morphology features. The second and third classifiers were built from texture features of wavelet-transformed images with 10$\times$ and 40$\times$ magnification. Finally, we analyzed and compared the correct classification rates of the three classifiers. In this study, we found that the best classifier was the one built from texture features of 3-level wavelet-transformed images.
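The abstract names its features but does not give formulas. The sketch below uses common textbook definitions as an assumption: circularity as 4πA/P², and entropy, energy, contrast, and homogeneity computed from a normalized co-occurrence matrix. Whether the paper used a co-occurrence matrix, or these exact variants, is not stated in the abstract; `circularity` and `texture_features` are hypothetical names.

```python
import math


def circularity(area, perimeter):
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for less compact shapes."""
    return 4.0 * math.pi * area / (perimeter ** 2)


def texture_features(p):
    """Texture measures of a normalized co-occurrence matrix p (entries sum to 1)."""
    entropy = energy = contrast = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, value in enumerate(row):
            if value > 0:
                entropy -= value * math.log2(value)  # 0 * log(0) is treated as 0
            energy += value * value
            contrast += (i - j) ** 2 * value
            homogeneity += value / (1.0 + abs(i - j))
    return {
        "entropy": entropy,
        "energy": energy,
        "contrast": contrast,
        "homogeneity": homogeneity,
    }


# Hypothetical 2x2 matrix with all mass on the diagonal (no gray-level change).
features = texture_features([[0.5, 0.0], [0.0, 0.5]])
unit_circle = circularity(math.pi, 2.0 * math.pi)  # area and perimeter for r = 1
```

A feature vector of this kind would then be passed to discriminant analysis; that step is omitted here.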

Person Tracking by Detection of Mobile Robot using RGB-D Cameras

  • Kim, Young-Ju
    • Journal of the Korea Society of Computer and Information / v.22 no.12 / pp.17-25 / 2017
  • In this paper, we implement a low-cost mobile robot that supports person tracking by detection using RGB-D cameras and the ROS (Robot Operating System) framework. The mobile robot is based on the Kobuki mobile base, equipped with two Kinect devices and a high-performance controller. One Kinect device is used to detect and track a single person among people in a constrained working area by successively combining point cloud data filtering and clustering, a HOG classifier, and Kalman filter-based estimation; the other performs the SLAM-based navigation supported by the ROS framework. In the performance evaluation, person tracking by detection ran robustly in real time, and the navigation function achieved a mean distance error below 50 mm. The significance of the implemented robot lies in its open-source-based, general-purpose, and low-cost approach.

Selection of markers in the framework of multivariate receiver operating characteristic curve analysis in binary classification

  • Sameera, G;Vishnu, Vardhan R
    • Communications for Statistical Applications and Methods / v.26 no.2 / pp.79-89 / 2019
  • Classification models pertaining to receiver operating characteristic (ROC) curve analysis have been extended from the univariate to the multivariate setup by linearly combining the multiple available markers. One such classification model is multivariate ROC curve analysis. However, not all markers contribute in a real scenario, and some may mask the contribution of other markers in classifying the individuals/objects. This paper addresses this issue by developing an algorithm that helps identify the important markers that are significant and true contributors. The proposed variable selection framework is supported by real datasets and a simulation study; it is shown to provide insight into each individual marker's significance in producing a classifier rule (linear combination) with a good extent of classification.

The earth mover's distance and Bayesian linear discriminant analysis for epileptic seizure detection in scalp EEG

  • Yuan, Shasha;Liu, Jinxing;Shang, Junliang;Kong, Xiangzhen;Yuan, Qi;Ma, Zhen
    • Biomedical Engineering Letters / v.8 no.4 / pp.373-382 / 2018
  • Since epileptic seizures are unpredictable and paroxysmal, an automatic seizure detection system could be of great significance and assistance to patients and medical staff. In this paper, a novel method is proposed for multichannel patient-specific seizure detection that applies the earth mover's distance (EMD) to scalp EEG. First, wavelet decomposition with five scales is applied to the original EEGs; scales 3, 4 and 5 are selected and transformed into histograms, and the pairwise distances between histograms are then computed with the earth mover's distance as effective features. Next, the EMD features are sent to a classifier based on Bayesian linear discriminant analysis (BLDA), and finally an efficient postprocessing procedure is applied to improve the precision of the detection system. To evaluate the performance of the proposed method, the CHB-MIT scalp EEG database, with 958 h of EEG recordings from 23 epileptic patients, is used, and a relatively satisfactory detection rate is achieved, with an average sensitivity of 95.65% and a false detection rate of 0.68/h. The good performance of this algorithm indicates its potential application for seizure monitoring in clinical practice.
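The pairwise histogram distances described above can be sketched as follows. For one-dimensional histograms with equal, unit-spaced bins, the earth mover's distance reduces to the sum of absolute differences between the two cumulative distributions. That simplification, the function names `emd_1d` and `pairwise_emd`, and the toy histograms are assumptions for illustration; the abstract does not specify the binning or the ground distance used.

```python
from itertools import accumulate, combinations


def emd_1d(hist_a, hist_b):
    """EMD between two 1-D histograms on the same unit-spaced bins."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    # Normalize to unit mass, then compare the cumulative distributions.
    cdf_a = accumulate(h / total_a for h in hist_a)
    cdf_b = accumulate(h / total_b for h in hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def pairwise_emd(histograms):
    """EMD for every unordered pair of named histograms."""
    names = sorted(histograms)
    return {
        (m, n): emd_1d(histograms[m], histograms[n])
        for m, n in combinations(names, 2)
    }


# Hypothetical histograms standing in for wavelet scales 3, 4 and 5.
toy = {"scale3": [4, 0, 0], "scale4": [0, 4, 0], "scale5": [0, 0, 4]}
features = pairwise_emd(toy)
```

Three histograms yield three pairwise distances, which would form the feature vector passed to the classifier for one segment of one channel.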

Effects of Field-Grown Genetically Modified Zoysia Grass on Bacterial Community Structure

  • Lee, Yong-Eok;Yang, Sang-Hwan;Bae, Tae-Woong;Kang, Hong-Gyu;Lim, Pyung-Ok;Lee, Hyo-Yeon
    • Journal of Microbiology and Biotechnology / v.21 no.4 / pp.333-340 / 2011
  • Herbicide-tolerant Zoysia grass has previously been developed through Agrobacterium-mediated transformation. We investigated the effects of genetically modified (GM) Zoysia grass and the associated herbicide application on bacterial community structure using culture-independent approaches. To assess possible horizontal gene transfer (HGT) of transgenic DNA to soil microorganisms, total soil DNAs were amplified by PCR with two primer sets for the bar and hpt genes, which had been introduced into the GM Zoysia grass by a callus-type transformation. The transgenes were not detected in the total genomic DNAs extracted from 1.5 g of each rhizosphere soil of GM and non-GM Zoysia grasses. The structures and diversities of the bacterial communities in the rhizosphere soils of GM and non-GM Zoysia grasses were investigated by constructing 16S rDNA clone libraries. Classifier, provided in the RDP II, assigned 100 clones in the 16S rRNA gene sequence library to 11 bacterial phyla. The most abundant phyla in both clone libraries were Acidobacteria and Proteobacteria. The bacterial diversity of the GM clone library was lower than that of the non-GM library: the former contained four phyla, whereas the latter had seven. Phylogenetic trees were constructed to confirm these results. Phylogenetic analyses revealed considerable differences between the two clone libraries. The significance of the difference between the clone libraries was examined with LIBSHUFF statistics. LIBSHUFF analysis revealed that the two clone libraries differed significantly (P<0.025), suggesting alterations in the composition of the microbial community associated with GM Zoysia grass.

Optimal Selection of Classifier Ensemble Using Genetic Algorithms (유전자 알고리즘을 이용한 분류자 앙상블의 최적 선택)

  • Kim, Myung-Jong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.99-112 / 2010
  • Ensemble learning is a method for improving the performance of classification and prediction algorithms. It finds a highly accurate classifier on the training set by constructing and combining an ensemble of weak classifiers, each of which needs only to be moderately accurate on the training set. Ensemble learning has received considerable attention in the machine learning and artificial intelligence fields because of its remarkable performance improvement and flexible integration with traditional learning algorithms such as decision trees (DT), neural networks (NN), and SVM. In this body of research, DT ensemble studies have consistently demonstrated impressive improvements in the generalization behavior of DT, while NN and SVM ensemble studies have not shown performance gains as remarkable as those of DT ensembles. Recently, several works have reported that the performance of an ensemble can be degraded when its classifiers are highly correlated with one another, resulting in a multicollinearity problem. They have also proposed differentiated learning strategies to cope with this performance degradation. Hansen and Salamon (1990) argued that it is necessary and sufficient for the performance enhancement of an ensemble that the ensemble contain diverse classifiers. Breiman (1996) showed that ensemble learning can increase the performance of unstable learning algorithms but does not yield remarkable improvement for stable learning algorithms. Unstable learning algorithms such as decision tree learners are sensitive to changes in the training data, and thus small changes in the training data can yield large changes in the generated classifiers. Therefore, an ensemble of unstable learning algorithms can guarantee some diversity among the classifiers.
In contrast, stable learning algorithms such as NN and SVM generate similar classifiers despite small changes in the training data, and thus the correlation among the resulting classifiers is very high. This high correlation results in a multicollinearity problem, which leads to performance degradation of the ensemble. Kim's work (2009) compared performance in bankruptcy prediction for Korean firms using traditional prediction algorithms such as NN, DT, and SVM. It reports that stable learning algorithms such as NN and SVM have higher predictability than the unstable DT. With respect to ensemble learning, however, the DT ensemble shows greater performance improvement than the NN and SVM ensembles. Further analysis with the variance inflation factor (VIF) empirically shows that the performance degradation of the ensemble is due to the multicollinearity problem. It also proposes that optimization of the ensemble is needed to cope with such a problem. This paper proposes a hybrid system for coverage optimization of NN ensembles (CO-NN) in order to improve the performance of NN ensembles. Coverage optimization is a technique for choosing a sub-ensemble from an original ensemble so as to guarantee the diversity of classifiers. CO-NN uses a genetic algorithm (GA), which has been widely used for various optimization problems, to deal with the coverage optimization problem. The GA chromosomes for coverage optimization are encoded as binary strings, each bit of which indicates an individual classifier. The fitness function is defined as maximization of error reduction, and a constraint on the variance inflation factor (VIF), one of the generally used measures of multicollinearity, is added to ensure the diversity of classifiers by removing high correlation among them. We use Microsoft Excel and the GA software package Evolver.
Experiments on company failure prediction have shown that CO-NN effectively and stably enhances the performance of NN ensembles by choosing classifiers with the correlations within the ensemble taken into account. Classifiers with a potential multicollinearity problem are removed by the coverage optimization process of CO-NN, and CO-NN thereby shows higher performance than a single NN classifier and the NN ensemble at the 1% significance level, and than the DT ensemble at the 5% significance level. However, further research issues remain. First, a decision optimization process to find the optimal combination function should be considered. Second, various learning strategies for dealing with data noise should be introduced in future research.
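The binary-chromosome encoding described in this abstract can be sketched with a very small genetic algorithm. This is an illustration under stated assumptions, not the CO-NN system: the authors used Excel with the Evolver package, whereas the sketch uses the Python standard library; fitness here is simply the negative majority-vote error of the selected sub-ensemble, and the VIF constraint is omitted. The names `vote_error` and `select_sub_ensemble`, the GA settings, and the toy predictions are all hypothetical.

```python
import random


def vote_error(chromosome, predictions, y_true):
    """Error rate of the majority vote over classifiers whose bit is 1."""
    chosen = [p for bit, p in zip(chromosome, predictions) if bit]
    if not chosen:
        return 1.0  # an empty sub-ensemble is treated as the worst case
    errors = 0
    for i, y in enumerate(y_true):
        votes = sum(p[i] for p in chosen)
        label = 1 if 2 * votes > len(chosen) else 0  # ties go to class 0
        errors += label != y
    return errors / len(y_true)


def select_sub_ensemble(predictions, y_true, pop_size=20, generations=30, seed=0):
    """Search binary chromosomes (one bit per classifier) for a low-error sub-ensemble."""
    rng = random.Random(seed)
    n = len(predictions)  # assumed to be at least 2

    def fitness(chromosome):
        return -vote_error(chromosome, predictions, y_true)

    # Start from the full ensemble plus random chromosomes.
    population = [[1] * n]
    population += [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            a, b = rng.sample(population[: pop_size // 2], 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < 0.05 else bit for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Hypothetical toy ensemble: two good classifiers and three highly correlated bad ones.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
predictions = [
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 1, 0, 1],
]
best = select_sub_ensemble(predictions, y_true)
```

Because the full ensemble is seeded into the initial population and the best chromosomes are carried over each generation, the returned sub-ensemble never has a higher vote error than the full ensemble on the data used for fitness.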

Crop Yield Estimation Utilizing Feature Selection Based on Graph Classification (그래프 분류 기반 특징 선택을 활용한 작물 수확량 예측)

  • Ohnmar Khin;Sung-Keun Lee
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1269-1276 / 2023
  • Crop estimation is essential to meeting strong multinational demand for food, and it involves numerous factors such as soil, rain, climate, atmosphere, and their relations. Climate change affects agricultural yields. We use a dataset containing temperature, rainfall, humidity, and other variables. The current research focuses on feature selection with a variety of classifiers to assist farmers and agriculturalists. Crop yield estimation using the feature selection approach achieves 96% accuracy. Feature selection affects a machine learning model's performance. Additionally, the current graph classifier reaches 81.5%. Finally, the random forest regressor without feature selection achieves 78% accuracy, and the decision tree regressor without feature selection achieves 67% accuracy. The merit of our research is that it reports experimental results with and without feature selection for the ten proposed algorithms, revealing the significance of feature selection. These findings support learners and students in choosing appropriate models for crop classification studies.

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future financial price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. The size of the image in which each graph is drawn is $40(pixels){\times}40(pixels)$, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices in order to express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend, the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer to the Softmax function. To validate our model, CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days over eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e. tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry William's %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models on these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
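The data preparation steps of CNN-FG (5-day intervals, next-day direction labels, 80/20 split) can be sketched as below. The abstract does not say whether the 5-day intervals overlap, so the sliding window is an assumption, as are the function names `make_windows` and `split_train_validation` and the toy price series. Graph rendering and CNN training are not shown.

```python
def make_windows(series, width=5):
    """Pair each run of `width` consecutive values with the next day's direction."""
    samples = []
    for end in range(width, len(series)):
        window = series[end - width:end]
        label = 1 if series[end] > series[end - 1] else 0  # 1 = upward, 0 = downward
        samples.append((window, label))
    return samples


def split_train_validation(samples, train_fraction=0.8):
    """Split samples in order into a training part and a validation part."""
    cut = int(round(len(samples) * train_fraction))
    return samples[:cut], samples[cut:]


# Hypothetical toy price series.
prices = [1, 2, 3, 4, 5, 6, 5]
windows = make_windows(prices)

# The sample counts reported in the abstract: 1,950 samples split 80/20.
train, validation = split_train_validation(list(range(1950)))
```

With 1,950 samples the split gives 1,560 training and 390 validation samples, matching the figures in the abstract.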