Improving accessibility and distinction between negative results in biomedical relation extraction

  • Sousa, Diana;Lamurias, Andre;Couto, Francisco M.
    • Genomics & Informatics / v.18 no.2 / pp.20.1-20.4 / 2020
  • Accessible negative results are relevant for researchers and clinicians, not only to limit their search space but also to prevent the costly re-exploration of research hypotheses. However, most biomedical relation extraction datasets do not seek to distinguish between a false and a negative relation between two biomedical entities. Furthermore, datasets created using distant supervision techniques also contain some false negative relations that constitute undocumented/unknown relations (missing from a knowledge base). We propose to improve the distinction between these concepts by revising a subset of the relations marked as false in the phenotype-gene relations corpus and taking the first steps toward automatically distinguishing between false (F), negative (N), and unknown (U) results. Our work resulted in a sample of 127 manually annotated FNU relations and a weighted-F1 of 0.5609 for their automatic distinction. This work was developed during the 6th Biomedical Linked Annotation Hackathon (BLAH6).
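
The weighted-F1 reported above is the support-weighted mean of the per-class F1 scores. The following is a minimal sketch of that metric for the three labels F, N, and U. The function name `weighted_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper or its data.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Return the support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy labels (hypothetical, for illustration only).
y_true = ["F", "F", "N", "N", "U", "U"]
y_pred = ["F", "N", "N", "N", "U", "F"]
toy_score = weighted_f1(y_true, y_pred)
```

Weighting by support means that frequent classes dominate the score, which matters when the F, N, and U classes are imbalanced.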

Spatial Selectivity Estimation Using Wavelet

  • Lee, Jin-Yul;Chi, Jeong-Hee;Ryu, Keun-Ho
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.459-462 / 2003
  • Selectivity estimation of queries not only provides useful information for query processing optimization but may also give users a preview of processing results. In this paper, we investigate the problem of selectivity estimation in the context of a spatial dataset. Although several techniques have been proposed in the literature to estimate spatial query result sizes, most of them share a drawback: a large amount of memory is required to retain accurate selectivity. To eliminate this drawback, we propose a new method called MW Histogram. Our method is based on two techniques: (a) the MinSkew partitioning algorithm, which processes skewed spatial datasets efficiently, and (b) the wavelet transformation, whose compression effect is proven. We evaluate our method on real datasets. The experimental results show that the MW Histogram provides estimates with low relative error and retains similar estimates even when memory space is small.

Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.2 / pp.81-86 / 2016
  • Term weighting is a popular technique that weighs term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform well consistently across different data domains. In this paper, we propose several methods to combine different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and ii) a mini-max classifier that aims for robustness of the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, where they significantly outperform the existing term weighting methods.

Model-based Clustering of DOA Data Using von Mises Mixture Model for Sound Source Localization

  • Dinh, Quang Nguyen;Lee, Chang-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.13 no.1 / pp.59-66 / 2013
  • In this paper, we propose a probabilistic framework for model-based clustering of direction of arrival (DOA) data to obtain stable sound source localization (SSL) estimates. Model-based clustering has been shown to be capable of handling highly overlapped and noisy datasets, such as those involved in DOA detection. Although the Gaussian mixture model is commonly used for model-based clustering, we propose the use of the von Mises mixture model, which fits circular DOA data better than a Gaussian distribution. The EM framework for the von Mises mixture model on a unit hypersphere is reduced to the 2D case and used as such in the proposed method. We also use a histogram of the dataset to initialize the number of clusters and the initial values of the parameters, thereby saving calculation time and improving efficiency. Experiments using simulated and real-world datasets demonstrate the performance of the proposed method.
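
As an illustration of EM for a von Mises mixture on the circle (the 2D case mentioned above), here is a minimal sketch. It is not the authors' implementation: the histogram-based initialization is not reproduced, so initial means are passed in by hand, and the concentration update uses a common closed-form approximation rather than an exact solve. The function names, the cap `kappa_max`, and the toy data are all assumptions.

```python
import math
import random


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 via its power series (adequate for x <= 50)."""
    total, term = 1.0, 1.0
    q = (x * x) / 4.0
    for k in range(1, terms):
        term *= q / (k * k)
        total += term
    return total


def fit_vm_mixture(angles, init_mus, n_iter=50, kappa_max=50.0):
    """Fit a von Mises mixture to angles (radians) with plain EM."""
    k = len(init_mus)
    mus = list(init_mus)
    kappas = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each angle.
        norms = [2.0 * math.pi * bessel_i0(kj) for kj in kappas]
        resp = []
        for t in angles:
            p = [weights[j] * math.exp(kappas[j] * math.cos(t - mus[j])) / norms[j]
                 for j in range(k)]
            z = sum(p)
            resp.append([pj / z for pj in p])
        # M-step: update weight, mean direction, and concentration.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj < 1e-12:
                continue  # component received no mass; leave it unchanged
            c = sum(r[j] * math.cos(t) for r, t in zip(resp, angles))
            s = sum(r[j] * math.sin(t) for r, t in zip(resp, angles))
            weights[j] = rj / len(angles)
            mus[j] = math.atan2(s, c)
            rbar = min(math.hypot(c, s) / rj, 0.999999)
            # Closed-form approximation to the concentration (an assumption here).
            kappas[j] = min(rbar * (2.0 - rbar ** 2) / (1.0 - rbar ** 2), kappa_max)
    return weights, mus, kappas


# Toy data (hypothetical): two sources at 0.5 rad and 2.5 rad.
random.seed(0)
angles = [random.vonmisesvariate(0.5, 8.0) for _ in range(200)]
angles += [random.vonmisesvariate(2.5, 8.0) for _ in range(200)]
weights, mus, kappas = fit_vm_mixture(angles, init_mus=[0.0, 3.0])
```

The mean update uses `atan2` on the weighted sine and cosine sums, which handles wrap-around at 0 and 2π; a Gaussian mean on raw angles would not.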

Vector space based augmented structural kinematic feature descriptor for human activity recognition in videos

  • Dharmalingam, Sowmiya;Palanisamy, Anandhakumar
    • ETRI Journal / v.40 no.4 / pp.499-510 / 2018
  • A vector space based augmented structural kinematic (VSASK) feature descriptor is proposed for human activity recognition. An action descriptor is built by integrating the structural and kinematic properties of the actor using a vector space based augmented matrix representation. Using the local or global information separately may not provide sufficient action characteristics. The proposed action descriptor combines both the local (pose) and global (position and velocity) features using an augmented matrix schema and thereby increases the robustness of the descriptor. A multiclass support vector machine (SVM) is used to learn each action descriptor for the corresponding activity classification and understanding. The performance of the proposed descriptor is experimentally analyzed using the Weizmann and KTH datasets. The average recognition rates for the Weizmann and KTH datasets are 100% and 99.89%, respectively. The computational time for the proposed descriptor learning is 0.003 seconds, which is an improvement of approximately 1.4% over the existing methods.

Improving Real-Time Efficiency of Case Retrieving Process for Case-Based Reasoning

  • Park, Yoon-Joo
    • Asia pacific journal of information systems / v.25 no.4 / pp.626-641 / 2015
  • Conventional case-based reasoning (CBR) does not perform efficiently for high-volume datasets because of case retrieval time. To overcome this problem, previous research suggested clustering a case base into several small groups and retrieving neighbors within a corresponding group to a target case. However, this approach generally produces less accurate predictive performance than the conventional CBR. This paper proposes a new case-based reasoning method called the clustering-merging CBR (CM-CBR). The CM-CBR method dynamically indexes a search pool to retrieve neighbors considering the distance between a target case and the centroid of a corresponding cluster. This method is applied to three real-life medical datasets. Results show that the proposed CM-CBR method produces similar or better predictive performance than the conventional CBR and clustering-CBR methods in numerous cases with significantly less computational cost.

Two-Step Filtering Datamining Method Integrating Case-Based Reasoning and Rule Induction

  • Park, Yoon-Joo;Chol, En-Mi;Park, Soo-Hyun
    • Proceedings of the Korea Inteligent Information System Society Conference / 2007.05a / pp.329-337 / 2007
  • Case-based reasoning (CBR) methods are applied to various target problems on the supposition that previous cases are sufficiently similar to current target problems, and the results of previous similar cases support the same result consistently. However, these assumptions are not applicable for some target cases. There are some target cases that have no sufficiently similar cases, or if they have, the results of these previous cases are inconsistent. That is, the appropriateness of CBR is different for each target case, even though they are problems in the same domain. Thus, applying CBR to whole datasets in a domain is not reasonable. This paper presents a new hybrid datamining technique called two-step filtering CBR and Rule Induction (TSFCR), which dynamically selects either CBR or RI for each target case, taking into consideration similarities and consistencies of previous cases. We apply this method to three medical diagnosis datasets and one credit analysis dataset in order to demonstrate that TSFCR outperforms the genuine CBR and RI.

Facial Expression Recognition using 1D Transform Features and Hidden Markov Model

  • Jalal, Ahmad;Kamal, Shaharyar;Kim, Daijin
    • Journal of Electrical Engineering and Technology / v.12 no.4 / pp.1657-1662 / 2017
  • Facial expression recognition systems using video devices have emerged as an important component of natural human-machine interfaces, which contribute to various practical applications such as security systems, behavioral science, and clinical practice. In this work, we present a new method to analyze, represent, and recognize human facial expressions using a sequence of facial images. Under our proposed facial expression recognition framework, the overall procedure includes accurate face detection to remove background and noise effects from the raw image sequences and alignment of each image using vertex mask generation. Furthermore, these features are reduced by principal component analysis. Finally, these augmented features are trained and tested using a Hidden Markov Model (HMM). Experimental evaluation on two public datasets of facial expression videos, Cohn-Kanade and AT&T, yielded expression recognition rates of 96.75% and 96.92%, respectively. The recognition results also show the superiority of the proposed approach over state-of-the-art methods.

3D Shape Recovery using Line Fitting

  • Shim, Seong-O;Malik, Aamir Saeed;Choi, Tae-Sun
    • Proceedings of the IEEK Conference / 2008.06a / pp.905-906 / 2008
  • This paper presents a method in which the best-focused points are calculated using line fitting. Two datasets are selected for each pixel based on the maximum value calculated using the Laplacian operator. A linear regression model is then used to find lines that approximate these datasets; the best-fit lines are found using the least squares method. After the two lines are approximated, their intersection point is calculated, and weights are assigned to calculate the new value for the depth map.
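
The line-fitting step described above can be sketched for a single pixel as follows. This is an illustrative sketch only: how the paper splits the focus values into two datasets is not stated in the abstract, so the split used here (samples up to and including the maximum on one side, samples after it on the other) is an assumption, and the final weighting step is omitted. The names `fit_line` and `refined_focus_position` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def refined_focus_position(focus):
    """Estimate a sub-frame focus position from per-frame focus values.

    Assumes at least two samples on each side of the split and that the
    two fitted lines are not parallel.
    """
    peak = max(range(len(focus)), key=focus.__getitem__)
    left_x = list(range(0, peak + 1))            # rising side, includes the maximum
    right_x = list(range(peak + 1, len(focus)))  # falling side, after the maximum
    a1, b1 = fit_line(left_x, [focus[i] for i in left_x])
    a2, b2 = fit_line(right_x, [focus[i] for i in right_x])
    # Intersection of y = a1*x + b1 and y = a2*x + b2.
    return (b2 - b1) / (a1 - a2)


# Toy focus curve (hypothetical) whose true peak lies between frames 2 and 3.
toy_focus = [0.0, 2.0, 4.0, 4.0, 2.0, 0.0]
toy_position = refined_focus_position(toy_focus)
```

Intersecting the two fitted lines lets the estimate fall between sampled frames, which a plain maximum over frames cannot do.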

Robust Similarity Measure for Spectral Clustering Based on Shared Neighbors

  • Ye, Xiucai;Sakurai, Tetsuya
    • ETRI Journal / v.38 no.3 / pp.540-550 / 2016
  • Spectral clustering is a powerful tool for exploratory data analysis. Many existing spectral clustering algorithms measure similarity by using a Gaussian kernel function or an undirected k-nearest neighbor (kNN) graph, which cannot reveal the real clusters when the data are not well separated. In this paper, to improve spectral clustering, we consider a robust similarity measure based on the shared nearest neighbors in a directed kNN graph. We propose two novel algorithms for spectral clustering: one based on the number of shared nearest neighbors, and one based on their closeness. The proposed algorithms are able to explore the underlying similarity relationships between data points and are robust to datasets that are not well separated. Moreover, the proposed algorithms have only one parameter, k. We evaluated the proposed algorithms using synthetic and real-world datasets. The experimental results demonstrate that the proposed algorithms not only achieve a good level of performance but also outperform the traditional spectral clustering algorithms.
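
A similarity based on the number of shared nearest neighbors can be sketched as follows. This is the generic shared-nearest-neighbor count over directed kNN lists, given as an assumption about the general idea; the paper's exact definitions (including the closeness-based variant) may differ. The function names and toy points are hypothetical.

```python
import math


def knn_lists(points, k):
    """Directed kNN lists: for each point, the set of its k nearest other points."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(points, k):
    """Similarity matrix whose (i, j) entry counts the neighbors i and j share."""
    nn = knn_lists(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim


# Toy points (hypothetical): two small, well-separated groups.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_sim = snn_similarity(toy_points, k=2)
```

The resulting matrix could be used as the affinity matrix in a standard spectral clustering pipeline; k is the only parameter, matching the abstract's description.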