• Title/Summary/Keyword: Datasets

Two-Step Filtering Datamining Method Integrating Case-Based Reasoning and Rule Induction

  • Park, Yoon-Joo;Chol, En-Mi;Park, Soo-Hyun
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.05a / pp.329-337 / 2007
  • Case-based reasoning (CBR) methods are applied to various target problems on the supposition that previous cases are sufficiently similar to current target problems and that the results of previous similar cases consistently support the same outcome. However, these assumptions do not hold for some target cases: some have no sufficiently similar previous cases, and others have similar cases whose results are inconsistent. That is, the appropriateness of CBR differs from one target case to another, even within the same problem domain. Thus, applying CBR to an entire dataset in a domain is not reasonable. This paper presents a new hybrid data mining technique called two-step filtering CBR and Rule Induction (TSFCR), which dynamically selects either CBR or rule induction (RI) for each target case, taking into consideration the similarity and consistency of previous cases. We apply this method to three medical diagnosis datasets and one credit analysis dataset to demonstrate that TSFCR outperforms pure CBR and RI.

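
The abstract states that TSFCR chooses CBR or RI per target case from the similarity and consistency of previous cases, but it does not give the exact filtering criteria. The following is a minimal sketch under stated assumptions: similarity is taken as 1 / (1 + Euclidean distance), the two filters use hypothetical thresholds (`sim_threshold`, `cons_threshold`), and the rule-induction model is a hypothetical callable passed in by the caller. It illustrates the two-step selection idea, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Case = Tuple[Sequence[float], str]  # (feature vector, known outcome)

def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

def tsfcr_predict(
    target: Sequence[float],
    cases: List[Case],
    rule_model: Callable[[Sequence[float]], str],  # hypothetical RI stand-in
    k: int = 3,
    sim_threshold: float = 0.5,   # hypothetical step-1 (similarity) filter
    cons_threshold: float = 0.8,  # hypothetical step-2 (consistency) filter
) -> Tuple[str, str]:
    """Return (predicted label, method used), method being 'CBR' or 'RI'."""
    # Retrieve the k most similar previous cases.
    ranked = sorted(cases, key=lambda c: similarity(target, c[0]), reverse=True)[:k]
    # Step 1: keep only neighbours that are sufficiently similar.
    close = [c for c in ranked if similarity(target, c[0]) >= sim_threshold]
    if close:
        # Step 2: use CBR only if the retained neighbours mostly agree.
        label, count = Counter(lbl for _, lbl in close).most_common(1)[0]
        if count / len(close) >= cons_threshold:
            return label, "CBR"
    # Otherwise fall back to the rule-induction model.
    return rule_model(target), "RI"
```

A target surrounded by close, agreeing cases is answered by CBR; a target with no close cases, or with close cases that disagree, is routed to the rule model.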

Facial Expression Recognition using 1D Transform Features and Hidden Markov Model

  • Jalal, Ahmad;Kamal, Shaharyar;Kim, Daijin
    • Journal of Electrical Engineering and Technology / v.12 no.4 / pp.1657-1662 / 2017
  • Facial expression recognition systems using video devices have emerged as an important component of natural human-machine interfaces, contributing to practical applications such as security systems, behavioral science, and clinical practice. In this work, we present a new method to analyze, represent, and recognize human facial expressions from a sequence of facial images. In the proposed framework, the overall procedure begins with accurate face detection, which removes background and noise effects from the raw image sequences, and aligns each image using vertex mask generation. 1D transform features are then extracted, and these features are reduced by principal component analysis. Finally, the features are used to train and test a Hidden Markov Model (HMM). Experimental evaluation on two public datasets of facial expression videos, Cohn-Kanade and AT&T, achieved recognition rates of 96.75% and 96.92%, respectively, and the results show that the proposed approach outperforms state-of-the-art methods.

3D Shape Recovery using Line Fitting

  • Shim, Seong-O;Malik, Aamir Saeed;Choi, Tae-Sun
    • Proceedings of the IEEK Conference / 2008.06a / pp.905-906 / 2008
  • This paper presents a method in which the best-focus points are calculated using line fitting. For each pixel, two datasets are selected based on the maximum value computed with the Laplacian operator. A linear regression model is then used to find lines that approximate these datasets, with the best-fit lines found by the least squares method. After the two lines are approximated, their intersection point is calculated, and weights are assigned to compute the new value for the depth map.

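
The line-fitting step can be sketched for a single pixel as follows. This is a minimal illustration assuming that the "two datasets" are the rising side of the pixel's focus curve (up to and including the peak) and the falling side (from the peak onward); the abstract does not say how the datasets are split, and the final weighting step is not shown. The helper names `fit_line` and `best_focus_position` are hypothetical.

```python
from typing import List, Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + c; returns (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, mean_y - m * mean_x

def best_focus_position(focus: List[float]) -> float:
    """Sub-frame best-focus estimate for one pixel.

    focus[i] is the pixel's focus value (e.g. Laplacian-based) in frame i.
    """
    peak = max(range(len(focus)), key=focus.__getitem__)
    if peak == 0 or peak == len(focus) - 1:
        raise ValueError("peak needs at least one sample on each side")
    frames = list(range(len(focus)))
    # Assumed split: rising side and falling side, both sharing the peak.
    m1, c1 = fit_line(frames[: peak + 1], focus[: peak + 1])
    m2, c2 = fit_line(frames[peak:], focus[peak:])
    # Intersection of y = m1*x + c1 and y = m2*x + c2.
    return (c2 - c1) / (m1 - m2)
```

For a symmetric curve such as `[0, 2, 4, 2, 0]` the two fitted lines meet exactly at frame 2; for asymmetric curves the intersection falls between frames, which is what allows a finer depth estimate than the frame index of the maximum alone.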

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal / v.38 no.3 / pp.540-550 / 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity using a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel spectral clustering algorithms: one based on the number of shared nearest neighbors and one based on their closeness. The proposed algorithms can explore the underlying similarity relationships between data points and are robust to datasets that are not well separated. Moreover, they have only one parameter, k. We evaluated the proposed algorithms on synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only perform well but also outperform traditional spectral clustering algorithms.
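
A count-based shared-neighbor similarity can be sketched as below. This is a generic illustration, assuming that each point's out-neighbors in the directed kNN graph are its k nearest other points by Euclidean distance and that similarity is the overlap of two neighbor sets divided by k; the paper's exact definitions (and its closeness-based variant) may differ. The resulting matrix would then be passed to a standard spectral clustering step, which is not shown.

```python
import math
from typing import List, Sequence, Set

def knn_sets(points: List[Sequence[float]], k: int) -> List[Set[int]]:
    """Out-neighbours of each point in a directed kNN graph (self excluded)."""
    result = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        result.append(set(others[:k]))
    return result

def shared_neighbor_similarity(points: List[Sequence[float]], k: int) -> List[List[float]]:
    """S[i][j] = |N_k(i) & N_k(j)| / k, with S[i][i] = 1 (assumed normalisation)."""
    nbrs = knn_sets(points, k)
    n = len(points)
    return [[1.0 if i == j else len(nbrs[i] & nbrs[j]) / k for j in range(n)]
            for i in range(n)]
```

Because similarity depends only on neighbor-set overlap, points in different groups share no neighbors and get zero similarity regardless of the absolute distances, and k is the only parameter.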

Gender Classification of Low-Resolution Facial Image Based on Pixel Classifier Boosting

  • Ban, Kyu-Dae;Kim, Jaehong;Yoon, Hosub
    • ETRI Journal / v.38 no.2 / pp.347-355 / 2016
  • In face analysis, gender classification (GC) is one of several fundamental tasks. Recent literature on GC primarily uses datasets containing high-resolution face images captured in uncontrolled real-world settings. In contrast, few efforts have focused on using low-resolution face images for GC. We propose a GC method based on pixel classifier boosting with modified census transform features. Experiments are conducted on large datasets, such as Labeled Faces in the Wild and The Images of Groups, following the standard protocols of the GC community. Experimental results show that, despite using low-resolution facial images with a 15-pixel inter-ocular distance, the proposed method achieves a higher classification rate than current state-of-the-art GC algorithms.
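
The abstract names modified census transform (MCT) features. A minimal sketch of the transform in its commonly used form: within each 3x3 neighborhood, every pixel is compared with the neighborhood mean, giving a 9-bit code between 0 and 511. The row-major bit order and the helper name `mct` are assumptions, and the pixel-classifier boosting stage is not shown.

```python
from typing import List

def mct(image: List[List[int]]) -> List[List[int]]:
    """9-bit modified census transform codes for all interior pixels.

    Output has shape (h - 2) x (w - 2); border pixels are skipped.
    """
    h, w = len(image), len(image[0])
    codes = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            # Row-major 3x3 neighbourhood, centre pixel included.
            patch = [image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mean = sum(patch) / 9.0
            code = 0
            for value in patch:  # first pixel becomes the most significant bit
                code = (code << 1) | (1 if value > mean else 0)
            row.append(code)
        codes.append(row)
    return codes
```

Because each code depends only on intensity comparisons within a small patch, it is insensitive to global brightness changes, which suits very small face images where each code can feed a per-pixel weak classifier.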

A Simple Tandem Method for Clustering of Multimodal Dataset

  • Cho C.;Lee J.W.;Lee J.W.
    • Proceedings of the Korean Operations and Management Science Society Conference / 2003.05a / pp.729-733 / 2003
  • The presence of local features within clusters, caused by the multimodal nature of the data, prevents many conventional clustering techniques from working properly. In particular, clustering datasets with non-Gaussian distributions within a cluster can be problematic when a technique that implicitly assumes a Gaussian distribution is used. This study proposes a simple tandem clustering method, composed of a k-means-type algorithm and a hierarchical method, to solve such problems. The multimodal dataset is first divided into many small pre-clusters by the k-means or fuzzy k-means algorithm. The pre-clusters found in the first step are then clustered again by agglomerative hierarchical clustering, with the Kullback-Leibler divergence as the dissimilarity measure. The method is not only effective at extracting multimodal clusters but also fast and simple in terms of computational complexity, and relatively robust in the presence of outliers. The performance of the proposed method was evaluated on three generated datasets and six publicly available real-world datasets.

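
The two steps described above can be sketched on one-dimensional data. This is a simplified illustration, not the authors' code: it assumes 1-D inputs, plain (non-fuzzy) k-means with a deterministic initialization, each pre-cluster modeled as a Gaussian, the symmetric Kullback-Leibler divergence as the dissimilarity, and merging until a requested number of clusters (`n_final`, a hypothetical parameter) remains.

```python
import math
from typing import List, Tuple

def kmeans_1d(xs: List[float], k: int, iters: int = 50) -> List[List[float]]:
    """Plain Lloyd's k-means on 1-D data; returns the non-empty groups."""
    data = sorted(xs)
    # Deterministic initialisation: centres spread over the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    groups: List[List[float]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda c: abs(x - centres[c]))
            groups[nearest].append(x)
        updated = [sum(g) / len(g) if g else centres[i] for i, g in enumerate(groups)]
        if updated == centres:
            break
        centres = updated
    return [g for g in groups if g]

def gaussian_fit(group: List[float], eps: float = 1e-6) -> Tuple[float, float]:
    """Mean and variance (floored by eps) of a group treated as a 1-D Gaussian."""
    mean = sum(group) / len(group)
    var = sum((x - mean) ** 2 for x in group) / len(group) + eps
    return mean, var

def symmetric_kl(a: List[float], b: List[float]) -> float:
    """KL(a||b) + KL(b||a) between the Gaussian fits of two groups."""
    (m1, v1), (m2, v2) = gaussian_fit(a), gaussian_fit(b)
    d2 = (m1 - m2) ** 2
    kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + d2) / v2 - 1.0)
    kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + d2) / v1 - 1.0)
    return kl_ab + kl_ba

def tandem_cluster(xs: List[float], n_pre: int, n_final: int) -> List[List[float]]:
    """Step 1: many small k-means pre-clusters. Step 2: agglomerative merging."""
    clusters = kmeans_1d(xs, n_pre)
    while len(clusters) > n_final:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        # Merge the pair of clusters with the smallest symmetric KL divergence.
        i, j = min(pairs, key=lambda p: symmetric_kl(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return clusters
```

Refitting a single Gaussian to each merged group keeps the sketch short; the merged cluster itself need not be Gaussian, since only the small pre-clusters are assumed to be.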

A Performance Comparison of Cluster Validity Indices based on K-means Algorithm

  • Shim, Yo-Sung;Chung, Ji-Won;Choi, In-Chan
    • Asia Pacific Journal of Information Systems / v.16 no.1 / pp.127-144 / 2006
  • The K-means algorithm is widely used at the initial stage of data analysis in the data mining process, partly because of its low time complexity and ease of implementation. Cluster validity indices are used along with the algorithm to determine the number of clusters and to evaluate the clustering results. In this paper, we present a performance comparison of sixteen indices, selected from forty indices in the literature for their applicability to nonhierarchical clustering algorithms. The datasets used in the experiment are generated from multivariate normal distributions. In particular, four error types (standardization, outlier generation, error perturbation, and noise dimension addition) are considered in the comparison. The experiment also analyzes how varying the number of points, attributes, and clusters affects performance. The simulation results show that the Calinski-Harabasz index performs best across all datasets and that the Davies-Bouldin index becomes a strong competitor as the number of points in a dataset increases.
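
For reference, the Calinski-Harabasz index for n points in k clusters is (B / (k - 1)) / (W / (n - k)), where B is the between-cluster dispersion (the sum over clusters of cluster size times squared distance from the cluster centroid to the overall mean) and W is the within-cluster dispersion (the sum of squared distances from each point to its cluster centroid); larger values indicate better-separated clusters. A minimal, dependency-free sketch of that formula:

```python
from typing import List, Sequence

def calinski_harabasz(points: List[Sequence[float]], labels: List[int]) -> float:
    """(B / (k - 1)) / (W / (n - k)) for a hard clustering."""
    n, dim = len(points), len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 1 < number of clusters < number of points")
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lbl in zip(points, labels) if lbl == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster term: size-weighted squared centroid offset.
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        # Within-cluster term: squared distances to the cluster's own centroid.
        within += sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))
```

In a K-means setting the index is typically computed for each candidate k and the k with the largest value is chosen. scikit-learn's `sklearn.metrics.calinski_harabasz_score` implements the same index and can be used to cross-check results.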

A K-Nearest Neighbor Algorithm for Categorical Sequence Data

  • Oh, Seung-Joon
    • Journal of the Korea Society of Computer and Information / v.10 no.2 s.34 / pp.215-221 / 2005
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data with an inherent sequential nature. In this paper, we study how to classify such sequence datasets. There are several kinds of data classification techniques, such as decision tree induction, Bayesian classification, and K-NN. In our approach, we use a K-NN algorithm to classify sequences. In addition, we propose a new similarity measure for computing the similarity between two sequences, together with an efficient method for measuring it.

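
The abstract does not define the proposed sequence similarity, so the sketch below swaps in a common stand-in: the length of the longest common subsequence (LCS) divided by the length of the longer sequence. It shows only how a K-NN classifier plugs in such a measure for categorical sequences; the names `lcs_len`, `seq_similarity`, and `knn_classify` are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

def lcs_len(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence (O(len(a) * len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def seq_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Stand-in similarity in [0, 1]: LCS length over the longer length."""
    if not a and not b:
        return 1.0
    return lcs_len(a, b) / max(len(a), len(b))

def knn_classify(query: Sequence[Hashable],
                 training: List[Tuple[Sequence[Hashable], str]],
                 k: int = 3) -> str:
    """Majority vote among the k training sequences most similar to query."""
    ranked = sorted(training, key=lambda t: seq_similarity(query, t[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Any sequence type with comparable elements works, for example strings of symbols or lists of web-page identifiers from a session log. Replacing `seq_similarity` with another measure leaves the K-NN part unchanged.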

Contribution to the Development of Global Land Related Dataset from Asia

  • Tateishi, Ryutaro
    • Proceedings of the KSRS Conference / 1998.09a / pp.116-121 / 1998
  • Global land-related datasets, such as land use, land cover, vegetation cover percentage, and forest cover percentage, are among the important global geospatial environmental datasets for global change studies. Since land cover varies from place to place, producing datasets continent by continent is the usual approach. Academically developed Western countries have projects that use remote sensing technology to describe land-cover-related information in digital form for Africa, the American continent, and Oceania. In this paper, the author introduces his initiative to coordinate Asian scientists in developing a land-related dataset of Asia, both for a better understanding of the Asian environment and as a contribution to the development of a global dataset. The paper describes the activities of the Land Cover Working Group (LCWG) of the Asian Association on Remote Sensing (AARS), the Data and Information System (DIS) sub-committee of the Japan national committee for the International Geosphere and Biosphere Program (IGBP), and the International Society for Photogrammetry and Remote Sensing (ISPRS) Working Group IV/6 on Global databases supporting environmental monitoring.


Merging Two Regional Geoid Estimates by Using Optimal Variance Components of Type repro-BIQUUE: An Algorithmic Approach

  • Schaffrin, Burkhard;Mautz, Rainer
    • Korean Journal of Geomatics / v.5 no.1 / pp.1-6 / 2005
  • When merging various datasets, the perennial problem of relative weighting arises. For the case of two datasets, an iterative algorithm has recently been developed that allows rigorous determination of optimal variance components of type repro-BIQUUE, even for large amounts of data, along with estimation of the joint parameters. Here we present this new algorithm and show its versatility in an example: merging two regional geoid estimates (derived from EGM96 and CHAMP) in terms of certain series expansions that have previously been shown to be among the most efficient (e.g., wavelets, Hardy's multiquadrics). Future work will address the sequential merging of altimeter and tide gauge data.
