• Title/Summary/Keyword: negative selection algorithm

GEP-based Framework for Immune-Inspired Intrusion Detection

  • Tang, Wan;Peng, Limei;Yang, Ximin;Xie, Xia;Cao, Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.6 / pp.1273-1293 / 2010
  • Immune-inspired intrusion detection is a promising technology for network security and is well known for its diversity, adaptation, self-tolerance, etc. However, scalability and coverage are two major drawbacks of immune-inspired intrusion detection systems (IIDSes). In this paper, we propose an IIDS framework, named GEP-IIDS, with improved basic system elements to address these two problems. First, an additional bio-inspired technique, gene expression programming (GEP), is introduced for representing detectors (which correspond to detection rules). In addition, inspired by the avidity model of immunology, new avidity/affinity functions that take the priority of attributes into account are given. Based on these two improved elements, we also propose a novel immune algorithm that integrates two bio-inspired mechanisms (i.e., negative selection and positive selection) by using a balance factor. Finally, a pruning algorithm is given to remove redundant detectors that consume memory and detection time but do not contribute to improving performance. Our experimental results show the feasibility and effectiveness of our solution in handling the scalability and coverage problems of IIDS.

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Predicting the success of customer campaigns has long been studied in academia, and prediction models applying various techniques are still being investigated. Recently, as campaign channels have diversified with the rapid growth of online activity, companies are running campaigns of many types at a scale that cannot be compared to the past. However, as fatigue from duplicate exposure increases, customers tend to perceive campaigns as spam. From a corporate standpoint, the effectiveness of campaigns is also decreasing: the cost of investing in campaigns rises while the actual success rate stays low. Accordingly, various studies are ongoing to improve campaign effectiveness in practice. A campaign system ultimately aims to increase the success rate of campaigns by collecting and analyzing customer-related data and using it for campaigns. In particular, there have been recent attempts to predict campaign responses using machine learning. Because campaign data have many features, selecting appropriate features is very important. If all of the input data are used when classifying a large amount of data, learning time grows as the number of classes expands, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained with too many features, prediction accuracy may degrade due to overfitting or correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set. 
Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), SFFS (Sequential Floating Forward Selection), etc. are widely used as traditional feature selection techniques. However, when there are many risks and many features, these techniques are limited by poor classification performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method for searching feature subsets, which underpin machine learning model performance, by using statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first, features that have a negative effect are removed, and then the sequential method is applied, which increases search efficiency and enables generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm. Campaign success prediction was also higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, when predicting campaign success, the improved feature selection algorithm helped in analyzing and interpreting the prediction results by providing the importance of the derived features. These included features such as age, customer rating, and sales, which were already known to be statistically important. 
Unexpectedly, features such as the combined product name, the average 3-month data consumption rate, and the last 3-month wireless data usage, which campaign planners had rarely used to select campaign targets, were also selected as important features for campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign. This makes it possible to analyze and understand the important characteristics of each campaign type.
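
For readers unfamiliar with the greedy baselines named in this abstract, the following is a minimal sketch of plain Sequential Forward Selection (SFS), not the improved SFFS variant the authors propose. The function name, the `score` callback, and the toy per-feature weights are all hypothetical; in practice `score` would be something like cross-validated model accuracy on the chosen feature subset.

```python
def sequential_forward_selection(n_features, score, k):
    """Greedily add the feature that most improves score(subset), up to k features."""
    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one giving the best subset score.
        best = max(sorted(remaining), key=lambda f: score(selected + [f]))
        # Stop early when even the best candidate no longer improves the score.
        if selected and score(selected + [best]) <= score(selected):
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy scoring function (an assumption for illustration): each feature has a
# fixed utility, and feature 3 is harmful.
TOY_WEIGHTS = [0.1, 0.5, 0.3, -0.2]


def toy_score(subset):
    return sum(TOY_WEIGHTS[f] for f in subset)


chosen = sequential_forward_selection(len(TOY_WEIGHTS), toy_score, k=4)
print(chosen)
```

The floating variant (SFFS) additionally tries to drop previously selected features after each forward step; that backtracking is omitted here.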

A Genetic Algorithm for A Cell Formation with Multiple Objectives (다목적 셀 형성을 위한 유전알고리즘)

  • 이준수;정병호
    • Journal of Korean Society of Industrial and Systems Engineering / v.26 no.4 / pp.31-41 / 2003
  • This paper deals with a cell formation problem for a set of m machines and n parts. The cell formation problem is generally known to be NP-complete, and with multiple objectives it is more difficult than the single-objective problem. The paper considers three objectives: minimizing the number of intercell movements, minimizing intracell workload variation, and minimizing intercell workload variation. We propose a multiple objective genetic algorithm (MOGA) that addresses these three objectives. The MOGA procedure adopts Pareto optimal solutions as the selection method for the next generation and uses the Euclidean distance from the ideal and negative ideal solutions as the fitness test of an individual. Because several weights are considered, the decision maker can reflect his or her preferences by assigning high weights to important objectives. A numerical example is given for a comparative analysis with the results of other research.
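
The abstract does not give the fitness formula, so the sketch below is only one plausible reading: a TOPSIS-style relative closeness built from weighted Euclidean distances to the ideal and negative ideal points. The function name, the weighting scheme, and the example objective values are assumptions, not the authors' definition.

```python
import math


def closeness_fitness(objectives, ideal, negative_ideal, weights):
    """Return a value in [0, 1]; 1 means the individual sits at the ideal point."""
    # Weighted Euclidean distance to the ideal solution.
    d_pos = math.sqrt(sum(w * (x - i) ** 2
                          for x, i, w in zip(objectives, ideal, weights)))
    # Weighted Euclidean distance to the negative ideal solution.
    d_neg = math.sqrt(sum(w * (x - n) ** 2
                          for x, n, w in zip(objectives, negative_ideal, weights)))
    if d_pos + d_neg == 0.0:
        return 0.0  # degenerate case: ideal and negative ideal coincide
    return d_neg / (d_pos + d_neg)


# Hypothetical objective vectors: (intercell moves, intracell variation,
# intercell variation), all to be minimized.
ideal = (0.0, 0.0, 0.0)
negative_ideal = (10.0, 4.0, 6.0)
weights = (0.5, 0.25, 0.25)

fitness = closeness_fitness((3.0, 1.0, 2.0), ideal, negative_ideal, weights)
print(round(fitness, 3))
```

Raising the weight of one objective makes distance along that objective dominate the fitness, which is how a decision maker's priorities would enter under this reading.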

Modelling of Artificial Immune System for Development of Computer Immune system and Self Recognition Algorithm (컴퓨터 면역시스템 개발을 위한 인공면역계의 모델링과 자기인식 알고리즘)

  • Sim, Kwee-Bo;Kim, Dae-Su;Seo, Dong-Il;Rim, Kee-Wook
    • Journal of the Korean Institute of Intelligent Systems / v.12 no.1 / pp.52-60 / 2002
  • As more people use computers, damage from computer viruses and hacking by malicious users is rapidly increasing. A computer virus is a program that, like a biological virus, can reproduce itself and cause destruction. Hacking means intruding into a person's computer from outside to steal or delete data. To block hacking and data-destroying computer viruses, research on intrusion detection and virus detection based on the biological immune system is in progress. In this paper, we model positive and negative selection for self recognition, which functions similarly to the cytotoxic T cell that plays an important role in the biological immune system. We implement a self-nonself distinction algorithm on a computer, which is an important part of detecting data infected by a computer virus or modified by an outside intrusion. We show the validity and effectiveness of the proposed self recognition algorithm by computer simulation on various infected data obtained from cell changes and string changes in the self file.

Online Identification for Normal and Abnormal Status of Water Quality on Ocean USN (해양 USN 환경에서 수질환경의 온라인 정상·비정상 상태 구분)

  • Jeoung, Sin-Chul;Ceong, Hee-Taek
    • The Journal of the Korea institute of electronic communication sciences / v.7 no.4 / pp.905-915 / 2012
  • This paper suggests an online method to identify normal and abnormal states of water quality on an ocean USN. To define the normal state of ocean water quality, we use the negative selection algorithm of the artificial immune system, which has self and nonself identification characteristics. To distinguish an abnormal status, the normal state set of the ocean water quality needs to be defined. For this purpose, we generate the normal state set based on mutations of each data item and mutations of the data as a logical product. These mutated normal (or self) sets are used to identify abnormal status of the water quality. We present the experimental results on the mutated self set with the Gaussian function. By installing the method on the ocean sensor logger, we can monitor online whether the ocean water quality is in a normal or abnormal state.
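
Since several entries in this listing rely on negative selection, a minimal sketch of the generic real-valued version may help. It is not the mutation-based self-set construction described in this abstract: the detector radius, the unit-square feature space, and the synthetic "normal" readings are all assumptions chosen for illustration.

```python
import math
import random


def generate_detectors(self_set, n_detectors, radius, dim, rng, max_tries=100_000):
    """Keep random candidates that do NOT match any self sample (negative selection)."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = tuple(rng.random() for _ in range(dim))
        # A candidate that lies within `radius` of a self sample is discarded.
        if all(math.dist(candidate, s) > radius for s in self_set):
            detectors.append(candidate)
    return detectors


def is_nonself(sample, detectors, radius):
    """A sample is flagged as abnormal if any detector matches it."""
    return any(math.dist(sample, d) <= radius for d in detectors)


rng = random.Random(0)
# Synthetic normal (self) readings clustered around (0.5, 0.5), scaled to [0, 1].
self_set = [(0.5 + rng.uniform(-0.05, 0.05), 0.5 + rng.uniform(-0.05, 0.05))
            for _ in range(50)]
RADIUS = 0.1
detectors = generate_detectors(self_set, n_detectors=200, radius=RADIUS, dim=2, rng=rng)

print(len(detectors), any(is_nonself(s, detectors, RADIUS) for s in self_set))
```

By construction no training self sample can be flagged, because every surviving detector is farther than the radius from all of them; how well the detectors cover the nonself region depends on their number and radius.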

A Pre-processing Study to Solve the Problem of Rare Class Classification of Network Traffic Data (네트워크 트래픽 데이터의 희소 클래스 분류 문제 해결을 위한 전처리 연구)

  • Ryu, Kyung Joon;Shin, DongIl;Shin, DongKyoo;Park, JeongChan;Kim, JinGoog
    • KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.411-418 / 2020
  • In the field of information security, an IDS (Intrusion Detection System) is normally classified into one of two categories: signature-based IDS and anomaly-based IDS. Many studies on anomaly-based IDS have analyzed network traffic data generated in cyberspace with machine learning algorithms. In this paper, we study pre-processing methods to overcome performance degradation caused by rare classes. We evaluate the classification performance of a machine learning algorithm by reconstructing the data set based on rare classes and semi-rare classes. After reconstructing the data into three different sets, wrapper and filter feature selection methods are applied in succession. Each data set is scaled by a quantile scaler. A deep neural network model is used for learning and validation. The evaluation results are compared by true positive values and false negative values. We obtained improved classification performance on all three data sets.

Decision of the Node Decomposition Type for the Minimization of OPKFDDs (OPKFDD 최소화를 위한 노드의 확장형 결정)

  • Jung, Mi-Gyoung;Hwang, Min;Lee, Guee-Sang;Kim, Young-Chul
    • The KIPS Transactions:PartA / v.9A no.3 / pp.363-370 / 2002
  • OPKFDD (Ordered Pseudo-Kronecker Functional Decision Diagram) is one of ordered-DDs (Decision Diagrams) in which each node can take one of three decomposition types : Shannon, positive Davio and negative Davio decompositions. Whereas OBDD (Ordered Binary Decision Diagram) uses only the Shannon decomposition in each node, OPKFDD uses the three decompositions and generates representations of functions with smaller number of nodes than other DDs. However, this leads to the extreme difficulty of getting an optimal solution for the minimization of OPKFDD. Since an appropriate decomposition type has to be chosen for each node, the size of the representation is decided by the selection of the decomposition type. We propose a heuristic method to generate OPKFDD efficiently from the OBDD of the given function and the algorithm of the decision of decomposition type for a given variable ordering. Experimental results demonstrate the performance of the algorithm.
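
The three decomposition types named in this abstract are standard identities on the cofactors f0 (x = 0) and f1 (x = 1) of a Boolean function. The sketch below checks them exhaustively on a small example function; the helper names and the example function are illustrative only and say nothing about the authors' heuristic for choosing a type per node.

```python
from itertools import product


def shannon(x, f0, f1):
    # f = (NOT x AND f0) OR (x AND f1)
    return ((1 - x) & f0) | (x & f1)


def positive_davio(x, f0, f1):
    # f = f0 XOR (x AND (f0 XOR f1))
    return f0 ^ (x & (f0 ^ f1))


def negative_davio(x, f0, f1):
    # f = f1 XOR (NOT x AND (f0 XOR f1))
    return f1 ^ ((1 - x) & (f0 ^ f1))


def decompositions_agree(f, n_vars):
    """Check that all three expansions reproduce f for every input and variable."""
    for bits in product((0, 1), repeat=n_vars):
        for i in range(n_vars):
            f0 = f(*bits[:i], 0, *bits[i + 1:])  # cofactor with variable i = 0
            f1 = f(*bits[:i], 1, *bits[i + 1:])  # cofactor with variable i = 1
            x = bits[i]
            values = (shannon(x, f0, f1),
                      positive_davio(x, f0, f1),
                      negative_davio(x, f0, f1))
            if any(v != f(*bits) for v in values):
                return False
    return True


def example(a, b, c):
    return (a & b) ^ c


print(decompositions_agree(example, 3))
```

All three expansions represent the same function, but the sub-functions stored below a node differ (f0 and f1, f0 and f0 XOR f1, or f1 and f0 XOR f1), which is why the choice of type per node changes the diagram size.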

Feature Analysis of Ultrasonic Signals for Diagnosis of Welding Faults in Tubular Steel Tower (관형 철탑 용접 결함 진단을 위한 초음파 신호의 특징 분석)

  • Min, Tae-Hong;Yu, Hyeon-Tak;Kim, Hyeong-Jin;Choi, Byeong-Keun;Kim, Hyun-Sik;Lee, Gi-Seung;Kang, Seog-Geun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.4 / pp.515-522 / 2021
  • In this paper, we present and analyze a method of applying machine learning to ultrasonic test signals for continuous monitoring of welding faults in a tubular steel tower. For the machine learning, feature selection based on a genetic algorithm and fault signal classification using a support vector machine are used. In the feature selection, the peak value, histogram lower bound, and normal negative log-likelihood are selected from 30 features. These features clearly indicate the difference of signals according to the depth of faults. In addition, applying the selected features to the support vector machine made it possible to perfectly distinguish between the regions with and without faults. Hence, it is expected that the results of this study will be useful in the development of an early detection system for fault growth based on ultrasonic signals and in energy transmission related industries in the future.

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require many computations, which can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely reducing noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. Moreover, the representation and selection of text features affect the performance of the classifier for sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of raw data from the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words and can capture semantic and syntactic information from data, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm selects words that are not important, we assume that words similar to the selected words also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification, which conduct selective word elimination under specific rules and construct word embeddings based on Word2Vec embedding. 
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form the word embedding. Second, we additionally select words that are similar to the words with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are applied to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews on Kindle from Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Yelp, however, shows only the number of helpful votes. We extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. We showed that one of the proposed methods performs better than the embeddings with all the words. Removing unimportant words improves performance; however, removing too many words lowers it. Future research should consider diverse preprocessing methods and in-depth analysis of word co-occurrence to measure similarity values among words. Also, we applied the proposed method only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed methods, making it possible to identify the possible combinations of word embedding methods and elimination methods.
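
The two-step elimination described in this abstract (drop low information gain words, then also drop words whose vectors are close to them) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: documents are modeled as sets of words, the thresholds are arbitrary, and the two-dimensional vectors stand in for real Word2Vec embeddings.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(docs, labels, word):
    """Reduction in label entropy from knowing whether `word` occurs in a document."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def words_to_remove(ig, vectors, ig_threshold, sim_threshold):
    """Low-IG words plus any word whose vector is close to a low-IG word."""
    seeds = {w for w, gain in ig.items() if gain < ig_threshold}
    similar = {w for w in vectors
               if any(s in vectors and cosine(vectors[w], vectors[s]) >= sim_threshold
                      for s in seeds)}
    return seeds | similar


# Toy corpus: each document is a set of words; label 1 = positive, 0 = negative.
docs = [{"great", "the"}, {"great", "a"}, {"boring", "the"}, {"boring", "a"}]
labels = [1, 1, 0, 0]
print(information_gain(docs, labels, "great"), information_gain(docs, labels, "the"))

# Hypothetical IG scores and embeddings used only to show the expansion step.
ig = {"the": 0.0, "a": 0.6, "great": 1.0}
vectors = {"the": (1.0, 0.0), "a": (0.9, 0.1), "great": (0.0, 1.0)}
removed = words_to_remove(ig, vectors, ig_threshold=0.5, sim_threshold=0.9)
print(sorted(removed))
```

Here "a" escapes the information gain cut on its own but is removed because its vector is nearly parallel to that of "the", which mirrors the second of the two proposed elimination steps.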

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews through a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques, such as the decision tree, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics that are published on the Web. Regardless of their polarity, emotional reviews are very helpful materials for analyzing the opinions of customers. Sentiment analysis helps with instantly understanding what customers really want through automated text mining techniques. It applies text mining techniques to text on the Web to extract subjective information, and it is used to determine the attitudes or positions of a person who wrote an article and presented an opinion about a particular topic. In this study, we developed a model that selects hot topics from user posts at China's online stock forum by using the k-means algorithm and self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, the decision tree, and SVM. 
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value from a document based on contrast and classification according to the polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movement by analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories to utilize sentiment analysis. One hundred forty-four topics were selected among the 21 categories at online forums about stock. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. The interest index was defined to select the hot topics, and the k-means algorithm and SOM presented equivalent results with this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters chosen by a grid search, to detect the hot topics. The detection of hot topics by using sentiment analysis provides the latest trends and hot topics in the stock forum for investors so that they no longer need to search the vast amounts of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes towards government policy and firms' products and services.