• Title/Summary/Keyword: UCI repository

Ensemble Learning of Region Based Classifiers (지역 기반 분류기의 앙상블 학습)

  • Choi, Sung-Ha;Lee, Byung-Woo;Yang, Ji-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.14B no.4
    • /
    • pp.303-310
    • /
    • 2007
  • In machine learning, ensemble classifiers, which are sets of classifiers, have been introduced to achieve higher accuracy than individual classifiers. We propose a new ensemble learning method that employs a set of region-based classifiers. To show the performance of the proposed method, we compared it with bagging and boosting, which are existing ensemble methods. Since the distribution of data can differ across regions of the feature space, we split the data, generate a classifier for each region, and apply weighted voting among the classifiers. We used 11 data sets from the UCI Machine Learning Repository to compare the performance of our new ensemble method with that of individual classifiers as well as existing ensemble methods such as bagging and boosting. As a result, we found that our method produced improved performance, particularly when the base learner is Naive Bayes or SVM.
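
The split-then-vote idea in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the abstract does not say how regions are formed or how votes are weighted, so the split on feature 0 at fixed `boundaries`, the inverse-distance weight, the `NearestCentroid` stand-in base learner, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

class NearestCentroid:
    """Tiny stand-in base learner (the abstract mentions Naive Bayes and SVM)."""
    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {label: centroid(pts) for label, pts in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))

def fit_region_ensemble(X, y, boundaries):
    """Assumed region scheme: split on feature 0 at `boundaries`, one classifier per region."""
    regions = defaultdict(lambda: ([], []))
    for x, label in zip(X, y):
        idx = sum(x[0] >= b for b in boundaries)  # index of the region containing x
        regions[idx][0].append(x)
        regions[idx][1].append(label)
    # Each ensemble member is (region center, classifier trained on that region).
    return [(centroid(xs), NearestCentroid().fit(xs, ys)) for xs, ys in regions.values()]

def predict_weighted(ensemble, x):
    """Weighted voting; assumed weight is 1 / (1 + distance from x to the region center)."""
    votes = defaultdict(float)
    for center, clf in ensemble:
        votes[clf.predict(x)] += 1.0 / (1.0 + math.dist(x, center))
    return max(votes, key=votes.get)

# Toy data in which the class depends on feature 1 differently in each region.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (4, 0), (4, 1), (5, 0), (5, 1)]
y = ["a", "b", "a", "b", "b", "a", "b", "a"]
ensemble = fit_region_ensemble(X, y, boundaries=[2.5])
```

With this toy data, a query near the left region is dominated by the left region's classifier and a query near the right region by the right one, which is the behaviour that motivates region-specific classifiers.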

An Incremental Rule Extraction Algorithm Based on Recursive Partition Averaging (재귀적 분할 평균에 기반한 점진적 규칙 추출 알고리즘)

  • Han, Jin-Chul;Kim, Sang-Kwi;Yoon, Chung-Hwa
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.1
    • /
    • pp.11-17
    • /
    • 2007
  • One of the popular methods used for pattern classification is the MBR (Memory-Based Reasoning) algorithm. Since it simply computes distances between a test pattern and the training patterns or hyperplanes stored in memory, and then assigns the class of the nearest training pattern, it cannot explain how the classification result is obtained. To overcome this problem, we propose an incremental learning algorithm based on RPA (Recursive Partition Averaging) to extract IF-THEN rules that describe regularities inherent in the training patterns. However, rules generated by RPA eventually overfit, because they depend too strongly on the details of the given training patterns. RPA also produces more rules than necessary, due to over-partitioning of the pattern space. Consequently, we present IREA (Incremental Rule Extraction Algorithm), which overcomes the overfitting problem by removing useless conditions from rules and reduces the number of rules at the same time. We verify the performance of the proposed algorithm using benchmark data sets from the UCI Machine Learning Repository.
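
One way to picture "removing useless conditions from rules" is the sketch below. It is not IREA itself: the abstract does not define when a condition is useless, so the criterion used here (a condition can be dropped if the rule still covers no training pattern of another class) is an assumption, as are the interval-based rule representation and the names `covers` and `prune_conditions`.

```python
def covers(conditions, pattern):
    """True if the pattern satisfies every interval condition of the rule."""
    return all(lo <= pattern[attr] <= hi for attr, (lo, hi) in conditions.items())

def prune_conditions(conditions, label, patterns):
    """Greedily drop conditions whose removal does not make the rule
    cover any training pattern of a different class (assumed criterion)."""
    kept = dict(conditions)
    for attr in list(conditions):
        trial = {a: r for a, r in kept.items() if a != attr}
        misfires = any(covers(trial, x) and y != label for x, y in patterns)
        if not misfires:
            kept = trial
    return kept

# IF 0 <= x <= 3 AND 0 <= y <= 6 THEN pos
rule = {"x": (0, 3), "y": (0, 6)}
patterns = [
    ({"x": 1, "y": 1}, "pos"),
    ({"x": 2, "y": 5}, "pos"),
    ({"x": 8, "y": 2}, "neg"),
]
simplified = prune_conditions(rule, "pos", patterns)
```

Here the condition on `y` is dropped because the condition on `x` alone already separates the classes, leaving the shorter rule IF 0 <= x <= 3 THEN pos.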

Genetic Algorithm for Node Pruning of Neural Networks (신경망의 노드 가지치기를 위한 유전 알고리즘)

  • Heo, Gi-Su;Oh, Il-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.2
    • /
    • pp.65-74
    • /
    • 2009
  • There are two approaches to optimizing neural network structure: the pruning scheme and the constructive scheme. In this paper we use the pruning scheme to optimize the network structure, and a genetic algorithm to find the optimal node pruning. In conventional research, the input and hidden layers were optimized separately. In contrast, we attempt to optimize the two layers simultaneously by encoding both layers in a single chromosome. The offspring networks inherit their weights from the parent. For learning, we used the existing error back-propagation algorithm. In our experiments with various databases from the UCI Machine Learning Repository, we obtained the best performance when the network size was reduced by about 8 to 25%. A t-test showed that the proposed method performed better than other pruning and construction methods under cross-validation.
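
Encoding both layers in one chromosome, with offspring inheriting parent weights, could look roughly like the sketch below. The bit layout (input-node bits followed by hidden-node bits), the hidden-by-input weight matrix, and the names `decode` and `inherit_weights` are assumptions for illustration; the abstract does not give the actual encoding.

```python
def decode(chromosome, n_inputs, n_hidden):
    """Split one bit list into an input-node mask and a hidden-node mask."""
    if len(chromosome) != n_inputs + n_hidden:
        raise ValueError("chromosome length must equal n_inputs + n_hidden")
    return chromosome[:n_inputs], chromosome[n_inputs:]

def inherit_weights(parent_w, chromosome, n_inputs, n_hidden):
    """Build the offspring's input-to-hidden weights from the parent's,
    keeping only entries whose input node and hidden node both survive.
    parent_w is a list of n_hidden rows, each with n_inputs weights."""
    in_mask, hid_mask = decode(chromosome, n_inputs, n_hidden)
    return [
        [w for w, keep_in in zip(row, in_mask) if keep_in]
        for row, keep_hid in zip(parent_w, hid_mask) if keep_hid
    ]

parent_w = [[0.1, 0.2, 0.3],
            [0.4, 0.5, 0.6]]
# Keep inputs 0 and 2, prune input 1; prune hidden node 0, keep hidden node 1.
child_w = inherit_weights(parent_w, [1, 0, 1, 0, 1], n_inputs=3, n_hidden=2)
```

In a full pipeline the inherited weights would then be fine-tuned, for example with error back-propagation as the abstract describes, before the fitness of the pruned network is evaluated.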

A Hybrid Clustering Technique for Processing Large Data (대용량 데이터 처리를 위한 하이브리드형 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • The KIPS Transactions:PartB
    • /
    • v.10B no.1
    • /
    • pp.33-40
    • /
    • 2003
  • Data mining plays an important role in the knowledge discovery process, and various data mining algorithms can be selected for a specific purpose. Most traditional hierarchical clustering methods are suitable for processing small data sets, so they have difficulty handling large data sets because of limited resources and insufficient efficiency. In this study we propose a hybrid neural network clustering technique, called PPC (Pre-Post Clustering), that can be applied to large data sets and find unknown patterns. PPC combines an artificial intelligence method, SOM, with a statistical method, hierarchical clustering, and clusters data in two stages. In the pre-clustering stage, PPC digests large data sets using SOM. Then, in the post-clustering stage, PPC measures similarity values according to cohesive distances, which reflect inner features, and adjacent distances, which reflect external distances between clusters. Finally, PPC clusters the large data sets using the similarity values. Experiments with UCI repository data showed that PPC had better cohesive values than the other clustering techniques.

Formation of Nearest Neighbors Set Based on Similarity Threshold (유사도 임계치에 근거한 최근접 이웃 집합의 구성)

  • Lee, Jae-Sik;Lee, Jin-Chun
    • Journal of Intelligence and Information Systems
    • /
    • v.13 no.2
    • /
    • pp.1-14
    • /
    • 2007
  • Case-based reasoning (CBR) is one of the most widely applied data mining techniques and has proven its effectiveness in various domains. Since CBR is essentially based on the k-Nearest Neighbors (k-NN) method, the value of k directly affects the performance of a CBR model. Once the value of k is set, it is fixed for the lifetime of the CBR model. However, if the value is set greater or smaller than the optimal value, the performance of the CBR model deteriorates. In this research, we propose a new method of composing the NN set using the similarity scores themselves, which we call the s-NN method, rather than using a fixed value of k. In the s-NN method, a different number of nearest neighbors can be selected for each new case. Performance evaluation using data from the UCI Machine Learning Repository shows that the CBR model adopting the s-NN method outperforms the CBR model adopting the traditional k-NN method.
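
The contrast between a fixed k and a similarity threshold can be sketched as follows. This is a minimal illustration, not the authors' implementation: the similarity function (1 / (1 + Euclidean distance)), the threshold value, and the names `similarity`, `knn_neighbors`, `snn_neighbors`, and `majority_label` are all assumptions introduced here.

```python
import math
from collections import Counter

def similarity(a, b):
    """Assumed similarity score in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def knn_neighbors(cases, query, k):
    """Traditional k-NN: always return the k most similar stored cases."""
    ranked = sorted(cases, key=lambda c: similarity(c[0], query), reverse=True)
    return ranked[:k]

def snn_neighbors(cases, query, threshold):
    """s-NN-style selection: keep every case whose similarity meets the threshold,
    so the size of the neighbor set can differ from query to query."""
    return [c for c in cases if similarity(c[0], query) >= threshold]

def majority_label(neighbors):
    """Most common label among the neighbors, or None if there are none."""
    if not neighbors:
        return None
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

cases = [((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((1.0, 1.0), "B"), ((5.0, 5.0), "B")]
dense_query = snn_neighbors(cases, (0.0, 0.0), threshold=0.5)   # two close cases qualify
sparse_query = snn_neighbors(cases, (5.0, 5.0), threshold=0.5)  # only one case qualifies
```

With the threshold fixed, a query in a dense area draws on more neighbors than a query in a sparse area, whereas `knn_neighbors` would return exactly k cases in both situations regardless of how dissimilar the last ones are.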

Output Power Prediction of Combined Cycle Power Plant using Logic-based Tree Structured Fuzzy Neural Networks (로직에 기반 한 트리 구조의 퍼지 뉴럴 네트워크를 이용한 복합 화력 발전소의 출력 예측)

  • Han, Chang-Wook;Lee, Don-Kyu
    • Journal of IKEEE
    • /
    • v.23 no.2
    • /
    • pp.529-533
    • /
    • 2019
  • Combined cycle power plants are often used to produce power. Predicting power plant output from operating parameters is currently a major concern. This paper presents an approach that uses a computational intelligence technique to predict the output power of a combined cycle power plant. Computational intelligence techniques have been developed and applied to many real-world problems. In this paper, tree architectures of fuzzy neural networks are considered to predict the output power. Tree architectures of fuzzy neural networks have the advantage of reducing the number of rules by optimally selecting fuzzy neurons as nodes and relevant inputs as leaves. A two-step method is used to optimize the networks. Genetic algorithms first optimize the binary structure of the networks by selecting the nodes and leaves as binary, and random signal-based learning then further refines the optimized binary connections in the unit interval. To verify the effectiveness of the proposed method, the combined cycle power plant data set obtained from the UCI Machine Learning Repository is considered.

Feature selection and Classification of Heart attack Using NEWFM of Neural Network (뉴럴네트워크(NEWFM)를 이용한 심근경색의 특징추출과 분류)

  • Yoon, Heejin
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.19 no.5
    • /
    • pp.151-155
    • /
    • 2019
  • Recently, heart attacks account for 80% of sudden deaths among the elderly. The causes of a heart attack are complex and sudden, and it is difficult to predict the onset even with prevention or medical examination. Therefore, early diagnosis and proper treatment are most important. In this paper, we report the accuracy of normal and abnormal classification with a neural network using a weighted fuzzy function for accurate and rapid diagnosis of myocardial infarction. The data used in the experiment came from the UCI Machine Learning Repository and consist of 14 features and 303 samples. The feature selection algorithm uses the average of weight method. Two features were selected and removed. Heart attack cases were classified as normal or abnormal (1: normal, 2: abnormal) using the average of weight method. The test result for the diagnosis of heart attack using a weighted fuzzy neural network showed 87.66% accuracy.

A Representative Pattern Generation Algorithm Based on Evaluation And Selection (평가와 선택기법에 기반한 대표패턴 생성 알고리즘)

  • Yih, Hyeong-Il
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.3
    • /
    • pp.139-147
    • /
    • 2009
  • Memory-based reasoning simply stores training patterns, or representative patterns derived from them, in memory and classifies a test pattern by computing its distance to the stored patterns. Because it either stores the entire training set in memory or replaces training patterns with representative patterns, it requires more memory than other machine learning techniques. Moreover, as the number of stored training patterns increases, so does the time required for classification. In this paper, we propose the EAS (Evaluation And Selection) algorithm to minimize memory usage and improve classification performance. After partitioning the training space, the algorithm evaluates each partition using the MDL and PM methods. The partition with the best evaluation result becomes a representative pattern. The remaining partitions are partitioned again and the evaluation is repeated. We verify the performance of the proposed algorithm using benchmark data sets from the UCI Machine Learning Repository.

Distributed Genetic Algorithm using Automatic Migration Control (분산 유전 알고리즘에서 자동 마이그레이션 조절방법)

  • Lee, Hyun-Jung;Na, Yong-Chan;Yang, Ji-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.17B no.2
    • /
    • pp.157-162
    • /
    • 2010
  • We present a new distributed genetic algorithm that can be used to extract useful information from large data distributed over a network. The main idea of the proposed algorithm is to adaptively determine how many and which individuals move between subpopulations at each site. In addition, we present a method that helps individuals arriving from other subpopulations adapt to the new subpopulation rather than be weeded out. We used six data sets from the UCI Machine Learning Repository to compare the performance of our approach with that of a single, centralized genetic algorithm. As a result, the proposed algorithm performed better than the single genetic algorithm in terms of classification accuracy with the selected feature subsets.

SVM based Clustering Technique for Processing High Dimensional Data (고차원 데이터 처리를 위한 SVM기반의 클러스터링 기법)

  • Kim, Man-Sun;Lee, Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.14 no.7
    • /
    • pp.816-820
    • /
    • 2004
  • Clustering is the process of dividing similar data objects in a data set into clusters and acquiring meaningful information from the data. The main issues related to clustering are the effective clustering of high-dimensional data and optimization. This study proposes a method of measuring similarity based on SVM and a new, efficient method of calculating the number of clusters. The high-dimensional data are mapped into a feature space using kernel functions, and then the similarity between neighboring clusters is measured. For the created clusters, the desired number of clusters can be obtained using the measured similarity value and the value of Δd. To verify the proposed methods, the authors used six data sets from the UCI Machine Learning Repository and obtained the expected number of clusters as well as improved cohesiveness compared to the results of previous research.
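
Measuring how far apart two clusters are after a kernel mapping can be done without computing the mapping explicitly, as the sketch below shows. It uses the standard identity that the squared distance between two cluster means in feature space equals mean K(A, A) - 2 mean K(A, B) + mean K(B, B). The choice of an RBF kernel, the `gamma` value, and the function names are assumptions; the abstract does not specify the paper's similarity measure or how Δd is used, so neither is reproduced here.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_kernel(A, B, kernel):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(kernel(a, b) for a in A for b in B) / (len(A) * len(B))

def feature_space_distance_sq(A, B, kernel=rbf):
    """Squared distance between the means of clusters A and B in the
    kernel-induced feature space, computed only from kernel evaluations."""
    return (mean_kernel(A, A, kernel)
            - 2.0 * mean_kernel(A, B, kernel)
            + mean_kernel(B, B, kernel))

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(0.5, 0.5), (0.5, 1.0)]   # close to A
C = [(5.0, 5.0), (5.0, 6.0)]   # far from A
near = feature_space_distance_sq(A, B)
far = feature_space_distance_sq(A, C)
```

A smaller value means the two clusters are more similar in feature space, so a merge criterion based on neighboring clusters could compare such values against a threshold.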