• Title/Summary/Keyword: UCI machine learning repository

Search Results: 57

Diabetes Prediction with the TCN-Prophet model using UCI Machine Learning Repository (UCI machine learning repository 사용한 TCN-Prophet 기반 당뇨병 예측)

  • Tan Tianbo;Inwhee Joe
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.05a
    • /
    • pp.325-327
    • /
    • 2023
  • Diabetes is a common chronic disease that threatens human life and health, and its prevalence remains high because its mechanisms are complex and its etiology remains unclear. According to the International Diabetes Federation (IDF), there are 463 million adults with diabetes worldwide, and the number is growing. This study aims to explore the potential influencing factors of diabetes by learning from the UCI diabetes dataset, which is a multivariate time series dataset. In this paper we propose the TCN-Prophet model for diabetes prediction. The experimental results show that the TCN-Prophet model's predictions of insulin concentration are highly consistent compared to an existing LSTM model.
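
The TCN half of such a model rests on causal, dilated convolution over the time series. The abstract does not specify the architecture, so the following is only a minimal illustration of that core operation, with illustrative function names:

```python
# Sketch of the core TCN building block: a causal, dilated 1-D convolution.
# output[t] depends only on x[t] and earlier samples, never on the future.

def causal_dilated_conv(x, kernel, dilation=1):
    """Convolve x with kernel so that output[t] depends only on x[<= t]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation          # look back i * dilation steps
            if j >= 0:                    # before the series start: implicit zero
                acc += w * x[j]
        out.append(acc)
    return out

series = [1.0, 2.0, 3.0, 4.0, 5.0]
# A two-tap averaging kernel with dilation 2 widens the receptive field
# without adding parameters.
print(causal_dilated_conv(series, [0.5, 0.5], dilation=2))
# → [0.5, 1.0, 2.0, 3.0, 4.0]
```

Stacking such layers with growing dilations is what lets a TCN cover long insulin-concentration histories with few parameters.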

Multi-Sensor Signal based Situation Recognition with Bayesian Networks

  • Kim, Jin-Pyung;Jang, Gyu-Jin;Jung, Jae-Young;Kim, Moon-Hyun
    • Journal of Electrical Engineering and Technology
    • /
    • v.9 no.3
    • /
    • pp.1051-1059
    • /
    • 2014
  • In this paper, we propose an intelligent situation recognition model that collects and analyzes multiple sensor signals. Multiple sensor signals are collected over a fixed time window. A training set of collected sensor data for each situation is provided to the K2 learning algorithm to generate Bayesian networks representing the causal relationships between sensors for that situation. Statistical characteristics of the sensor values and topological characteristics of the generated graphs are learned for each situation. A neural network is designed to classify the current situation based on the features extracted from the collected sensor values. The proposed method is implemented and tested with data from the UCI machine learning repository.
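
The first step of such a pipeline, collecting a fixed time window per sensor and extracting statistical features from it, can be sketched as follows. The exact statistics used by the paper are not given, so mean and standard deviation here are assumptions, and all names are illustrative:

```python
from statistics import mean, pstdev

def window_features(window):
    """window: dict mapping sensor name -> readings in one fixed time window.
    Returns per-sensor statistical features (here: mean and population std)."""
    feats = {}
    for sensor, values in window.items():
        feats[sensor + "_mean"] = mean(values)
        feats[sensor + "_std"] = pstdev(values)
    return feats

w = {"accel": [0.0, 1.0, 2.0], "temp": [20.0, 20.0, 20.0]}
print(window_features(w))
```

Feature vectors built this way, together with topological features of the learned Bayesian networks, would then feed the classifying neural network.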

A Differential Evolution based Support Vector Clustering (차분진화 기반의 Support Vector Clustering)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.5
    • /
    • pp.679-683
    • /
    • 2007
  • Statistical learning theory by Vapnik comprises the support vector machine (SVM), support vector regression (SVR), and support vector clustering (SVC) for classification, regression, and clustering, respectively. Among these, SVC is an effective clustering algorithm that uses support vectors with a Gaussian kernel function. However, like SVM and SVR, SVC requires optimal selection of its kernel parameters and regularization constant. In general, these parameters have been determined by researchers' experience or by grid search, which demands heavy computing time. In this paper, we propose a differential evolution based SVC (DESVC), which incorporates differential evolution into SVC for efficient selection of the kernel parameters and regularization constant. To verify the improved performance of DESVC, we conduct experiments using data sets from the UCI machine learning repository and simulated data.
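
A differential evolution loop of the kind that could tune a kernel parameter and regularization constant looks roughly like the DE/rand/1/bin scheme below. The objective function is a stand-in quadratic; the paper's actual SVC quality criterion is not given in the abstract:

```python
import random

def differential_evolution(obj, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimize obj over box-constrained parameters via DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [obj(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)       # at least one mutated component
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    trial.append(min(max(v, lo), hi))   # clip to bounds
                else:
                    trial.append(pop[i][j])
            s = obj(trial)
            if s <= scores[i]:               # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Stand-in for an SVC quality measure, with optimum at (gamma, C) = (1, 10).
obj = lambda p: (p[0] - 1.0) ** 2 + (p[1] - 10.0) ** 2
best, score = differential_evolution(obj, [(0.01, 5.0), (0.1, 100.0)])
print(best, score)
```

In DESVC the objective would instead score an SVC clustering under candidate (kernel parameter, regularization constant) pairs, replacing the exhaustive grid search.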

Ensemble Learning of Region Based Classifiers (지역 기반 분류기의 앙상블 학습)

  • Choi, Sung-Ha;Lee, Byung-Woo;Yang, Ji-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.14B no.4
    • /
    • pp.303-310
    • /
    • 2007
  • In machine learning, the ensemble classifier, a combination of multiple classifiers, has been introduced to achieve higher accuracy than individual classifiers. We propose a new ensemble learning method that employs a set of region-based classifiers. Since the distribution of data can differ across regions of the feature space, we split the data, generate a classifier for each region, and apply weighted voting among the classifiers. We used 11 data sets from the UCI Machine Learning Repository to compare the performance of our new ensemble method with that of individual classifiers as well as existing ensemble methods such as bagging and boosting. We found that our method produced improved performance, particularly when the base learner is Naive Bayes or SVM.
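
The region idea can be sketched as follows: split the feature space, fit one classifier per region, and combine predictions by weighted voting. The split rule, the majority-class base learner, and the weighting scheme (training accuracy, boosted in a classifier's own region) are all placeholder assumptions, not the paper's exact setup:

```python
from collections import Counter

def region_of(x, feature=0, threshold=0.5):
    """Which side of the (assumed) single-feature split a point falls on."""
    return "low" if x[feature] < threshold else "high"

def fit_region_ensemble(X, y, feature=0, threshold=0.5):
    groups = {"low": [], "high": []}
    for xi, yi in zip(X, y):
        groups[region_of(xi, feature, threshold)].append(yi)
    model = {}
    for key, labels in groups.items():
        majority, hits = Counter(labels).most_common(1)[0]
        # each region classifier's vote weight = its training accuracy
        model[key] = (majority, hits / len(labels))
    return model

def predict(model, x, feature=0, threshold=0.5):
    votes = Counter()
    home = region_of(x, feature, threshold)
    for key, (label, weight) in model.items():
        # a classifier's vote counts double inside its own region
        votes[label] += weight * (2.0 if key == home else 1.0)
    return votes.most_common(1)[0][0]

X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = ["a", "a", "b", "b", "b", "b"]
m = fit_region_ensemble(X, y)
print(predict(m, [0.15]), predict(m, [0.85]))
```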

Proposal of Weight Adjustment Methods Using Statistical Information in Fuzzy Weighted Mean Classifiers (퍼지 가중치 평균 분류기에서 통계 정보를 활용한 가중치 설정 기법의 제안)

  • Woo, Young-Woon;Heo, Gyeong-Yong;Kim, Kwang-Baek
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.7
    • /
    • pp.9-15
    • /
    • 2009
  • The fuzzy weighted mean classifier is one of the most common classification models and can achieve high performance by adjusting its weights. However, the weights have generally been decided based on the experience of experts, which leaves the resulting classifiers lacking consistency and objectivity. To resolve this problem, in this paper, a weight-deciding method based on the statistics of the data is introduced, ensuring that the learned classifiers are consistent and objective. To investigate the effectiveness of the proposed methods, the Iris data set from the UCI machine learning repository is used, and promising results are obtained.
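
One way to derive feature weights from data statistics rather than expert judgment is inverse-variance weighting, combined with a weighted mean distance to class prototypes. This particular statistic is an illustrative assumption; the abstract does not say which statistics the paper uses:

```python
from statistics import mean, pvariance

def fit(X, y):
    """Class prototypes plus data-driven feature weights (inverse variance)."""
    classes = sorted(set(y))
    dim = len(X[0])
    protos = {c: [mean(x[j] for x, t in zip(X, y) if t == c)
                  for j in range(dim)] for c in classes}
    raw = [1.0 / (pvariance([x[j] for x in X]) + 1e-9) for j in range(dim)]
    total = sum(raw)
    weights = [w / total for w in raw]    # normalize weights to sum to 1
    return protos, weights

def classify(protos, weights, x):
    def wdist(p):
        # weighted mean of per-feature squared differences
        return sum(w * (xi - pi) ** 2 for w, xi, pi in zip(weights, x, p))
    return min(protos, key=lambda c: wdist(protos[c]))

X = [[0.0, 5.0], [0.2, 9.0], [1.0, 1.0], [0.8, 7.0]]
y = ["a", "a", "b", "b"]
protos, weights = fit(X, y)
print(classify(protos, weights, [0.15, 2.0]))
```

Here the noisy second feature gets a small weight automatically, so the stable first feature dominates the decision, the kind of consistency the statistical approach is after.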

A Representative Pattern Generation Algorithm Based on Evaluation And Selection (평가와 선택기법에 기반한 대표패턴 생성 알고리즘)

  • Yih, Hyeong-Il
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.3
    • /
    • pp.139-147
    • /
    • 2009
  • Memory-based reasoning stores training patterns, or representative patterns derived from them, in memory and classifies a test pattern by calculating its distance to the stored patterns. Because the technique either stores the entire training set in memory or replaces training patterns with representative patterns, it requires far more memory than other machine learning techniques, and as the number of stored training patterns grows, classification time increases sharply. In this paper, we propose the EAS (Evaluation And Selection) algorithm to minimize memory usage and improve classification performance. After partitioning the training space, EAS evaluates each partition with the MDL and PM methods. The partition with the best evaluation result becomes a representative pattern; the remaining partitions are partitioned again and the evaluation is repeated. We verify the performance of the proposed algorithm using benchmark data sets from the UCI Machine Learning Repository.
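
The evaluate-and-select loop can be sketched roughly as below. Label purity stands in for the paper's MDL and PM criteria, and the median split on the first feature is an assumed partitioning rule; both are simplifications for illustration:

```python
from collections import Counter
from statistics import mean

def purity(labels):
    """Fraction of the majority label: a stand-in evaluation score."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def eas(patterns, n_reps=2):
    """patterns: list of ([features], label). Returns representative patterns."""
    reps, rest = [], patterns
    for _ in range(n_reps):
        if len(rest) < 2:
            break
        mid = sorted(p[0][0] for p in rest)[len(rest) // 2]   # median split
        parts = [[p for p in rest if p[0][0] < mid],
                 [p for p in rest if p[0][0] >= mid]]
        parts = [p for p in parts if p]
        best = max(parts, key=lambda p: purity([lab for _, lab in p]))
        centroid = [mean(f[j] for f, _ in best) for j in range(len(best[0][0]))]
        label = Counter(lab for _, lab in best).most_common(1)[0][0]
        reps.append((centroid, label))       # best partition -> representative
        rest = [p for p in rest if p not in best]   # repartition the remainder
    return reps

def classify(reps, x):
    dist = lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(reps, key=lambda r: dist(r[0]))[1]

reps = eas([([0.0], "a"), ([0.1], "a"), ([0.9], "b"), ([1.0], "b")])
print(reps)
```

Only the few representatives, not the whole training set, are kept in memory, which is the source of the claimed savings.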

Improvement of SOM using Stratification

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.9 no.1
    • /
    • pp.36-41
    • /
    • 2009
  • The self-organizing map (SOM) is an unsupervised method based on competitive learning. Many clustering studies have been performed using SOM, as it offers data visualization of its results, and the visualized results have been used in the decision processes of descriptive data mining as exploratory data analysis. In this paper we propose an improvement of SOM using statistical stratified sampling; the stratification improves the performance of SOM. To verify the improvement, we make comparative experiments using data sets from the UCI machine learning repository and simulated data.
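
Stratified sampling itself is simple: draw from each stratum in proportion to its size, so the sample preserves stratum ratios that plain random sampling may distort. A minimal sketch, with illustrative names (how the paper defines its strata is not stated in the abstract):

```python
import random

def stratified_sample(strata, n, seed=0):
    """strata: dict name -> list of items; returns about n items overall,
    drawn from each stratum by proportional allocation."""
    rng = random.Random(seed)
    total = sum(len(v) for v in strata.values())
    sample = []
    for name, items in strata.items():
        k = round(n * len(items) / total)     # proportional allocation
        sample.extend(rng.sample(items, min(k, len(items))))
    return sample

strata = {"small": list(range(20)), "large": list(range(100, 180))}
s = stratified_sample(strata, 10)
print(len(s))   # → 10
```

SOM would then be trained on the stratified sample instead of the full data set.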

A dominant hyperrectangle generation technique of classification using IG partitioning (정보이득 분할을 이용한 분류기법의 지배적 초월평면 생성기법)

  • Lee, Hyeong-Il
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.1
    • /
    • pp.149-156
    • /
    • 2014
  • NGE (Nested Generalized Exemplar) theory can improve performance on noisy data while reducing model size; it is a distance-based classification method using a matching rule. Hyperrectangles that cross or overlap, generated during NGE learning, have been noted as factors that inhibit performance. In this paper, we propose the DHGen (Dominant Hyperrectangle Generation) algorithm, which avoids overlapping and crossing between hyperrectangles and splits mixed hyperrectangles, using interval weights, based on mutual information. DHGen improves classification performance and reduces the number of hyperrectangles by processing the training set incrementally. On benchmark data sets from the UCI Machine Learning Repository, the proposed DHGen exhibits classification performance comparable to k-NN and better than the EACH system, which implements NGE theory.
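
The NGE-style matching rule these methods share can be sketched as follows: the distance from a point to a hyperrectangle is zero inside it and otherwise the distance to its nearest face, and a query takes the label of the nearest hyperrectangle. Interval weights and the DHGen split rule are omitted here:

```python
def rect_distance(x, lower, upper):
    """Euclidean distance from point x to the axis-aligned box [lower, upper];
    zero when x lies inside the box."""
    d = 0.0
    for xi, lo, hi in zip(x, lower, upper):
        if xi < lo:
            d += (lo - xi) ** 2
        elif xi > hi:
            d += (xi - hi) ** 2        # inside the interval contributes 0
    return d ** 0.5

def classify(rects, x):
    """rects: list of (lower, upper, label) hyperrectangles."""
    return min(rects, key=lambda r: rect_distance(x, r[0], r[1]))[2]

rects = [([0.0, 0.0], [0.4, 0.4], "a"), ([0.6, 0.6], [1.0, 1.0], "b")]
print(classify(rects, [0.2, 0.3]), classify(rects, [0.9, 0.5]))
```

When hyperrectangles overlap, a point can fall inside two boxes with different labels at distance zero, the ambiguity DHGen is designed to remove.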

Support Vector Machine based on Stratified Sampling

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.9 no.2
    • /
    • pp.141-146
    • /
    • 2009
  • The support vector machine is a classification algorithm based on statistical learning theory. It has shown good performance in many data mining applications. However, the algorithm has some problems, one of which is its heavy computing cost, which makes it difficult to use support vector machines in dynamic and online systems. To overcome this problem, we propose the use of stratified sampling from statistical sampling theory. Stratified sampling reduces the size of the training data; in our paper, although the reduced data set is small, prediction accuracy is maintained. We verify the improved performance with experimental results using data sets from the UCI machine learning repository.
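
The training-cost reduction step can be sketched as stratifying by class label and subsampling each class by a fixed fraction, so the costly learner (omitted here) trains on fewer points while class proportions are preserved. Stratifying by class is an assumption; the abstract does not specify the strata:

```python
import random

def reduce_training_set(X, y, fraction, seed=0):
    """Return a per-class stratified subsample of (X, y) of roughly
    fraction * len(X) points, keeping class proportions intact."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    Xs, ys = [], []
    for label, items in by_class.items():
        k = max(1, round(fraction * len(items)))   # keep every class present
        for xi in rng.sample(items, k):
            Xs.append(xi)
            ys.append(label)
    return Xs, ys

X = [[i] for i in range(100)]
y = ["pos"] * 30 + ["neg"] * 70
Xs, ys = reduce_training_set(X, y, 0.2)
print(len(Xs), ys.count("pos"), ys.count("neg"))   # → 20 6 14
```

Since SVM training cost grows faster than linearly in the number of training points, a 5x smaller training set cuts cost substantially, which is what makes the method attractive for dynamic and online systems.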

Fuzzy Clustering Model using Principal Components Analysis and Naive Bayesian Classifier (주성분 분석과 나이브 베이지안 분류기를 이용한 퍼지 군집화 모형)

  • Jun, Sung-Hae
    • The KIPS Transactions:PartB
    • /
    • v.11B no.4
    • /
    • pp.485-490
    • /
    • 2004
  • In data representation, the clustering performs a grouping process which combines given data into some similar clusters. The various similarity measures have been used in many researches. But, the validity of clustering results is subjective and ambiguous, because of difficulty and shortage about objective criterion of clustering. The fuzzy clustering provides a good method for subjective clustering problems. It performs clustering through the similarity matrix which has fuzzy membership value for assigning each object. In this paper, for objective fuzzy clustering, the clustering algorithm which joins principal components analysis as a dimension reduction model with bayesian learning as a statistical learning theory. For performance evaluation of proposed algorithm, Iris and Glass identification data from UCI Machine Learning repository are used. The experimental results shows a happy outcome of proposed model.