• Title/Abstract/Keywords: Instance classification

91 search results (processing time: 0.025 s)

Classification of Parkinson's Disease Using Defuzzification-Based Instance Selection

  • 이상홍
    • 인터넷정보학회논문지 / Vol. 15, No. 3 / pp. 109-116 / 2014
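  • To improve classification performance, this paper proposes a new instance selection method that uses a Neural Network with Weighted Fuzzy Membership Functions (NEWFM) based on the Takagi-Sugeno (T-S) fuzzy model. The proposed method selects instances by combining the weighted-average defuzzification of the T-S fuzzy model with an interval selection analogous to the confidence interval of a normal distribution in statistics. To evaluate the method, classification performance was compared before and after instance selection; the results were 77.33% and 78.19%, respectively. McNemar's test was then used to show the difference between the two results. Because the test yielded a significance probability (p-value) below 0.05, classification with instance selection was confirmed to outperform classification without it.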
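
The significance check mentioned in the abstract above can be illustrated with a minimal sketch of McNemar's test. The function name `mcnemar_test` and the discordant counts used below are hypothetical; the abstract does not report the underlying contingency table, so this only shows how such a p-value is typically obtained (chi-square approximation with continuity correction, one degree of freedom).

```python
from math import erfc, sqrt

def mcnemar_test(b, c):
    """McNemar's test on the discordant pairs of two classifiers.

    b: cases only the first classifier got right (hypothetical input)
    c: cases only the second classifier got right (hypothetical input)
    Returns (chi-square statistic, p-value).
    """
    n = b + c
    if n == 0:
        # No disagreements: nothing distinguishes the two classifiers.
        return 0.0, 1.0
    # Continuity-corrected statistic, clamped at zero so b == c gives 0.
    stat = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Made-up counts, for illustration only.
stat, p = mcnemar_test(b=10, c=25)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```

A p-value below 0.05 would be read, as in the abstract, as a significant difference between the two classifiers.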

Data Reduction for Classification using Entropy-based Partitioning and Center Instances

  • 손승현;김재련
    • 산업경영시스템학회지 / Vol. 29, No. 2 / pp. 13-19 / 2006
  • Instance-based learning is a machine learning technique that has proven successful over a wide range of classification problems. Despite its high classification accuracy, however, it has a relatively high storage requirement, and because it must search through all instances to classify unseen cases, it is slow at classification. In this paper, we present a new data reduction method for instance-based learning that integrates the strengths of instance partitioning and attribute selection. Experimental results show that reducing the amount of data for instance-based learning reduces data storage requirements, lowers computational costs, minimizes noise, and can facilitate a more rapid search.

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / Vol. 14, No. 2 / pp. 418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods to improve kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets. In addition, their performance has not been examined in the text categorization domain, where the dimensionality and size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
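
As an illustration of what instance selection for kNN can look like, here is a minimal sketch of Hart's condensed nearest neighbor rule, one classical data reduction technique. The abstract above does not say which instance selection methods were evaluated, so this is an assumed example; the function names and the toy data are hypothetical.

```python
from math import dist

def nearest_label(store, x):
    """Label of the stored instance closest to x (1-NN)."""
    return min(store, key=lambda item: dist(item[0], x))[1]

def condensed_nn(data):
    """Keep only the instances needed for 1-NN to classify `data` correctly.

    data: list of (point, label) pairs, where point is a tuple of floats.
    """
    store = [data[0]]
    changed = True
    while changed:
        changed = False
        for point, label in data:
            if (point, label) in store:
                continue  # already kept; also guards against endless re-adding
            if nearest_label(store, point) != label:
                store.append((point, label))
                changed = True
    return store

# Toy data: two well-separated clusters.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b")]
selected = condensed_nn(data)
print(len(selected), "of", len(data), "instances kept")
```

The reduced set is then used in place of the full training set when classifying new instances, which is where the savings in kNN query time come from.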

Hybrid Case-based Reasoning and Genetic Algorithms Approach for Customer Classification

  • Kim Kyoung-jae;Ahn Hyunchul
    • Journal of information and communication convergence engineering / Vol. 3, No. 4 / pp. 209-212 / 2005
  • This study proposes a hybrid case-based reasoning (CBR) and genetic algorithm model for customer classification. The vertical and horizontal dimensions of the research data are reduced through an integrated feature and instance selection process using genetic algorithms. We applied the proposed model to a customer classification model that uses customers' demographic characteristics as inputs to predict their buying behavior for a specific product. Experimental results show that the proposed model may improve classification accuracy and outperform various optimization models of a typical CBR system.

Improving an Ensemble Model Using Instance Selection Method

  • 민성환
    • 산업경영시스템학회지 / Vol. 39, No. 1 / pp. 105-115 / 2016
  • Ensemble classification involves combining individually trained classifiers to yield more accurate predictions than individual models. Ensemble techniques are very useful for improving the generalization ability of classifiers. The random subspace ensemble technique is a simple but effective method for constructing ensemble classifiers; it involves randomly drawing a subset of the features for each classifier in the ensemble. The instance selection technique involves selecting critical instances while removing irrelevant and noisy instances from the original dataset. The instance selection and random subspace methods are both well known in the field of data mining and have proven to be very effective in many applications. However, few studies have focused on integrating the two. Therefore, this study proposes a new hybrid ensemble model that integrates instance selection and random subspace techniques using genetic algorithms (GAs) to improve the performance of a random subspace ensemble model. GAs are used to select optimal (or near-optimal) instances, which are used as input data for the random subspace ensemble model. The proposed model was applied to both Kaggle credit data and corporate credit data, and the results were compared with those of other models in terms of classification accuracy, levels of diversity, and average classification rates of the base classifiers in the ensemble. The experimental results demonstrated that the proposed model outperformed the other models, including the single model, the instance selection model, and the original random subspace ensemble model.

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

  • AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 3 / pp. 1348-1375 / 2018
  • Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper- and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, they are validated on 20 other datasets provided by the UCI data repository. Statistical analysis is also performed, and the experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed. The results also reveal that the wrapper-based techniques improved SVM predictive accuracy in most cases.

Design of Precipitation/non-precipitation Pattern Classification System based on Neuro-fuzzy Algorithm using Meteorological Radar Data : Instance Classifier and Echo Classifier

  • 고준현;김현기;오성권
    • 전기학회논문지 / Vol. 64, No. 7 / pp. 1114-1124 / 2015
  • In this paper, precipitation/non-precipitation pattern classification of meteorological radar data is carried out using a neuro-fuzzy algorithm. The structure of the meteorological radar data is analyzed in order to classify precipitation and non-precipitation effectively. Diverse input variables for designing the pattern classifier are considered by exploiting both the quantitative and the qualitative characteristics of the radar data, and the characteristics of each input variable are analyzed. A preferred pattern classifier can be designed from the essential input variables that have a decisive effect on output performance as well as on the model architecture. As the proposed model architecture, the neuro-fuzzy algorithm is designed using an FCM-based radial basis function neural network (RBFNN). Two classifiers, an instance classifier and an echo classifier, are designed and run serially within the overall system architecture. The instance classifier distinguishes between precipitation and non-precipitation data. Because the precipitation data identified by the instance classifier may still partially contain non-precipitation data, the echo classifier is then used to separate the two. The performance of the proposed classifier is evaluated and compared with that of the existing QC method.

Document Classification of Small Size Documents Using Extended Relief-F Algorithm

  • 박흠
    • 정보처리학회논문지B / Vol. 16B, No. 3 / pp. 233-238 / 2009
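  • Automatic classification of small documents that contain few features rarely achieves good performance. Although the number of features across the whole document collection is large, the number of features within each document is relatively small, so inter-document similarity is too low for even strong classification algorithms to perform well. This is particularly true for classifying web directory documents and for disk recovery work, where unconnected sectors are linked through similarity evaluation and automatic classification. To address these problems, this paper presents, as a preprocessing step for classification, the ERelief-F algorithm, which adapts the instance-based feature filtering method Relief-F to feature filtering within small documents. For comparison, experiments were also run with the existing feature filtering methods Odds Ratio, information gain, and Relief-F, and the classification results were compared. ERelief-F performed far better than information gain, Odds Ratio, and Relief-F, and it also removed many irrelevant features.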
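
For readers unfamiliar with the Relief family, the following is a minimal sketch of the basic Relief weight update for two classes. It is neither Relief-F (which averages over several neighbors and handles multiple classes) nor the ERelief-F extension proposed in the paper, whose details the abstract does not give. The function name, the Manhattan distance, and the toy data are assumptions made for illustration.

```python
def relief_weights(data):
    """Basic Relief feature weights.

    data: list of (features, label); features are tuples scaled to [0, 1].
    A feature gains weight when it differs on the nearest instance of
    another class (miss) and loses weight when it differs on the nearest
    instance of the same class (hit).
    """
    n_features = len(data[0][0])
    weights = [0.0] * n_features

    def distance(a, b):
        # Manhattan distance, an assumption of this sketch.
        return sum(abs(x - y) for x, y in zip(a, b))

    for i, (x, label) in enumerate(data):
        others = [item for j, item in enumerate(data) if j != i]
        hits = [f for f, l in others if l == label]
        misses = [f for f, l in others if l != label]
        if not hits or not misses:
            continue  # cannot update without both a hit and a miss
        hit = min(hits, key=lambda f: distance(x, f))
        miss = min(misses, key=lambda f: distance(x, f))
        for k in range(n_features):
            weights[k] += (abs(x[k] - miss[k]) - abs(x[k] - hit[k])) / len(data)
    return weights

# Toy data: feature 0 separates the classes, feature 1 is noise.
toy = [((0.0, 0.2), 0), ((0.1, 0.9), 0), ((0.9, 0.1), 1), ((1.0, 0.8), 1)]
print(relief_weights(toy))
```

Features with low or negative weights are the candidates for filtering before classification.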

Deterministic and probabilistic analysis of tunnel face stability using support vector machine

  • Li, Bin;Fu, Yong;Hong, Yi;Cao, Zijun
    • Geomechanics and Engineering / Vol. 25, No. 1 / pp. 17-30 / 2021
  • This paper develops a convenient approach for deterministic and probabilistic evaluation of tunnel face stability using support vector machine classifiers. The proposed method comprises two major steps: construction of the training dataset and determination of instance-based classifiers. In step one, an orthogonal design is used to produce representative samples after the ranges and levels of the factors that influence tunnel face stability are specified. The training dataset is then labeled by two-dimensional strength reduction analyses embedded within OptumG2. For any unknown instance, the second step applies the training dataset for classification, which is achieved by an ad hoc Python program. The classification of unknown samples starts with the selection of instance-based training samples using the k-nearest neighbors algorithm, followed by the construction of an instance-based SVM-KNN classifier. The classifier then provides labels for the unknown instances without the need to calculate their corresponding performance functions. Probabilistic evaluations are performed by Monte Carlo simulation based on the SVM-KNN classifier. The ratio of the number of unstable samples to the total number of simulated samples is computed and taken as the failure probability, which is validated against and compared with the response surface method.
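
The Monte Carlo step described in the abstract above (failure probability as the ratio of unstable samples to all simulated samples) can be sketched as follows. The names `failure_probability`, `classify`, and `sample` are hypothetical: `classify` stands in for the paper's SVM-KNN classifier and `sample` for its random input model, neither of which is specified in the abstract. The toy classifier below is made up purely so the sketch runs.

```python
import random

def failure_probability(classify, sample, n, seed=0):
    """Estimate P(failure) as (# unstable samples) / (# simulated samples).

    classify: maps one sample to "stable" or "unstable" (stand-in classifier)
    sample:   draws one random input from a given random.Random instance
    n:        number of Monte Carlo samples
    """
    rng = random.Random(seed)
    unstable = sum(1 for _ in range(n) if classify(sample(rng)) == "unstable")
    return unstable / n

# Toy stand-ins: a single uniform "support pressure ratio" and a threshold rule.
def toy_sample(rng):
    return rng.uniform(0.0, 1.0)

def toy_classify(x):
    return "unstable" if x < 0.25 else "stable"

print(failure_probability(toy_classify, toy_sample, n=20000))
```

Because each simulated sample only needs a classifier call rather than a full numerical stability analysis, a large number of samples remains affordable.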

Improved Sliding Shapes for Instance Segmentation of Amodal 3D Object

  • Lin, Jinhua;Yao, Yu;Wang, Yanjie
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 11 / pp. 5555-5567 / 2018
  • State-of-the-art instance segmentation networks are successful at generating 2D segmentation masks for the region proposals with the highest classification scores, yet the 3D object segmentation task is limited to geocentric embedding or the Sliding Shapes detector. To this end, we propose an amodal 3D instance segmentation network called A3IS-CNN, which extends the Deep Sliding Shapes detector to amodal 3D instance segmentation by adding a new 3D ConvNet branch called the A3IS-branch. The A3IS-branch, which takes a 3D amodal ROI as input and produces 3D semantic instances as output, is a fully convolutional network (FCN) that shares convolutional layers with the existing 3D RPN, which takes a 3D scene as input and produces 3D amodal proposals as output. Because the two branches share computation with each other, our 3D instance segmentation network adds only a small overhead of 0.25 fps to Deep Sliding Shapes, trading off accurate detection and point-to-point segmentation of instances. Experiments show that our 3D instance segmentation network achieves at least 10% to 50% improvement over the state-of-the-art network in running time, and outperforms the state-of-the-art 3D detectors by at least 16.1 AP.