• Title/Summary/Keyword: datasets

Selectivity Estimation for Spatial Databases

  • Chi, Jeong-Hee;Lee, Jin-Yul;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / 2003.11a / pp.766-768 / 2003
  • Selectivity estimation for spatial queries is crucial in Spatial Database Management Systems (SDBMS). Much work has been done on estimating selectivity accurately. Although existing methods address problems such as false-count and multi-count, which arise from the properties of spatial datasets, they cannot achieve the same results in a small memory space. Therefore, the spatial dataset must be compressed into a small amount of memory. In this paper, we propose a new technique called the MW Histogram, which can compress summary data and still produce reasonable results. Our method is based on two techniques: (a) the MinSkew partitioning algorithm, which deals efficiently with skewed spatial datasets, and (b) the wavelet transformation, whose compression effect is proven. We evaluate our method on real datasets. The experimental results show that the MW Histogram provides estimates with low relative error and retains similar estimates even when memory space is small.

A Study on Classifications of Useful Customer Reviews by Applying Text Mining Approach (텍스트 마이닝을 활용한 고객 리뷰의 유용성 지수 개선에 관한 연구)

  • Lee, Hong Joo
    • Journal of Information Technology Services / v.14 no.4 / pp.159-169 / 2015
  • Customer reviews are an important source for purchase decision making in online stores. Online stores have tried to present useful reviews to customers on product pages. To assess the usefulness of customer reviews before enough users have voted on them, previous studies have used diverse aspects of reviews, in many cases style and semantic information. This study tests diverse algorithms and datasets to identify a suitable classification method and threshold for classifying useful reviews. Most previous research used a ratio-type helpfulness index, as Amazon.com does. However, TripAdvisor.com and Yelp.com use another type of usefulness index, the count-type helpfulness index, for which no suitable threshold for classifying useful reviews has yet been established. This study used reviews of restaurants from Yelp.com, together with their usefulness votes, to construct diverse datasets and applied text mining approaches to classify useful reviews. Random Forest, SVM, and GLMNET showed higher accuracy than the other approaches.

Cooperative Coevolution Differential Evolution Based on Spark for Large-Scale Optimization Problems

  • Tan, Xujie;Lee, Hyun-Ae;Shin, Seong-Yoon
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.155-160 / 2021
  • Differential evolution is an efficient algorithm for solving continuous optimization problems. However, its performance deteriorates rapidly, and the runtime increases exponentially when differential evolution is applied for solving large-scale optimization problems. Hence, a novel cooperative coevolution differential evolution based on Spark (known as SparkDECC) is proposed. The divide-and-conquer strategy is used in SparkDECC. First, the large-scale problem is decomposed into several low-dimensional subproblems using the random grouping strategy. Subsequently, each subproblem can be addressed in a parallel manner by exploiting the parallel computation capability of the resilient distributed datasets model in Spark. Finally, the optimal solution of the entire problem is obtained using the cooperation mechanism. The experimental results on 13 high-benchmark functions show that the new algorithm performs well in terms of speedup and scalability. The effectiveness and applicability of the proposed algorithm are verified.
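
The decompose-then-cooperate steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SparkDECC implementation: the function names (`random_grouping`, `optimize_subproblem`, `cooperative_round`) are hypothetical, the per-group optimizer is a simple random search standing in for differential evolution, and the groups are processed sequentially rather than in parallel on Spark.

```python
import random


def random_grouping(n_dims, group_size, rng):
    """Shuffle the dimension indices and cut them into groups of at most group_size."""
    indices = list(range(n_dims))
    rng.shuffle(indices)
    return [indices[i:i + group_size] for i in range(0, n_dims, group_size)]


def sphere(x):
    """Toy objective (sum of squares), used only for illustration."""
    return sum(v * v for v in x)


def optimize_subproblem(objective, context, group, rng, trials=20, step=0.5):
    """Stand-in optimizer (random search, NOT differential evolution).

    Only the coordinates listed in `group` are perturbed; all other
    coordinates are taken from the shared context vector. A candidate
    is kept only if it improves the objective.
    """
    best = list(context)
    best_val = objective(best)
    for _ in range(trials):
        cand = list(best)
        for d in group:
            cand[d] += rng.uniform(-step, step)
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best


def cooperative_round(objective, context, group_size, rng):
    """One round: regroup the dimensions at random, then improve each group in turn."""
    for group in random_grouping(len(context), group_size, rng):
        context = optimize_subproblem(objective, context, group, rng)
    return context


rng = random.Random(0)
start = [1.0] * 10
result = cooperative_round(sphere, start, group_size=3, rng=rng)
print(sphere(start), sphere(result))
```

Because each subproblem only accepts improving candidates, the objective value of the shared context vector never increases within a round. In a parallel setting the groups would be optimized independently and merged afterwards, which is the part this sketch leaves out.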

Citation-based Article Summarization using a Combination of Lexical Text Similarities: Evaluation with Computational Linguistics Literature Summarization Datasets

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.24 no.7 / pp.31-37 / 2019
  • Citation-based article summarization aims to create a shortened text for an academic article that reflects the content of citing sentences, which contain others' thoughts about the target article. To address this problem, this study introduces an extractive summarization method based on a linear combination of several sentence salience scores, which represent the degree to which a candidate sentence reflects the content of the author's abstract text, the readers' citing text, and the target article itself. In the current study, salience scores are obtained by computing surface-level textual similarities. Experiments using the CL-SciSumm datasets show that the proposed method parallels or outperforms previous approaches in ROUGE evaluations against SciSumm-2017 human summaries and SciSumm-2016/2017 community summaries.
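
The scoring idea in this abstract, a linear combination of surface-level similarities, can be sketched as below. This is an illustration under assumptions rather than the paper's method: Jaccard overlap of lowercase whitespace tokens stands in for whichever lexical similarities the authors combined, the equal weights are arbitrary, and the names `jaccard`, `salience`, and `summarize` are hypothetical.

```python
def tokens(text):
    """Lowercase whitespace tokens as a set (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap of the token sets of two texts; 0.0 if both are empty."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def salience(sentence, abstract, citing_text, article, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the sentence's similarity to abstract, citing text, and article."""
    sims = (
        jaccard(sentence, abstract),
        jaccard(sentence, citing_text),
        jaccard(sentence, article),
    )
    return sum(w * s for w, s in zip(weights, sims))


def summarize(sentences, abstract, citing_text, k=2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Pick the k highest-scoring sentences and return them in document order."""
    article = " ".join(sentences)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: salience(sentences[i], abstract, citing_text, article, weights),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


article_sentences = [
    "wavelets compress histograms",
    "the weather was nice",
    "histograms estimate selectivity",
]
summary = summarize(
    article_sentences,
    abstract="histograms estimate selectivity for queries",
    citing_text="they compress histograms with wavelets",
    k=2,
)
print(summary)
```

In practice the weights would be tuned on held-out data, and the similarity functions could be swapped for other lexical measures without changing the overall structure.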

Enhancing Gene Expression Classification of Support Vector Machines with Generative Adversarial Networks

  • Huynh, Phuoc-Hai;Nguyen, Van Hoa;Do, Thanh-Nghi
    • Journal of information and communication convergence engineering / v.17 no.1 / pp.14-20 / 2019
  • Microarray gene expression data are currently used for the classification of cancers, which helps address problems relating to cancer causes and treatment regimens. However, the sample size of gene expression data is often restricted because the cost of microarray technology in human studies is high. We propose enhancing the gene expression classification of support vector machines with generative adversarial networks (GAN-SVMs). A GAN that generates new data from the original training datasets was implemented and used in conjunction with nonlinear SVMs, which classify gene expression data efficiently. Numerical test results on 20 low-sample-size, very high-dimensional microarray gene expression datasets from the Kent Ridge Biomedical and Array Expression repositories indicate that the model is more accurate than state-of-the-art classification models.

Finding the best suited autoencoder for reducing model complexity

  • Ngoc, Kien Mai;Hwang, Myunggwon
    • Smart Media Journal / v.10 no.3 / pp.9-22 / 2021
  • Machine learning models use input data to produce results. Sometimes the input data is too complicated for the models to learn useful patterns. Feature engineering is therefore a crucial data preprocessing step for constructing a proper feature set and improving the performance of such models. One of the most efficient methods for automating feature engineering is the autoencoder, which transforms the data from its original space into a latent space. However, certain factors, including the datasets, the machine learning models, and the number of dimensions of the latent space (denoted by k), should be considered carefully when using an autoencoder. In this study, we design a framework to compare two data preprocessing approaches, with and without an autoencoder, and to observe the impact of these factors on the autoencoder. We then conduct experiments using autoencoders with classifiers on popular datasets. The empirical results provide a perspective on the best-suited autoencoder for these factors.

UFKLDA: An unsupervised feature extraction algorithm for anomaly detection under cloud environment

  • Wang, GuiPing;Yang, JianXi;Li, Ren
    • ETRI Journal / v.41 no.5 / pp.684-695 / 2019
  • In a cloud environment, performance degradation, or even downtime, of virtual machines (VMs) usually appears gradually along with anomalous states of VMs. To better characterize the state of a VM, all possible performance metrics are collected. For such high-dimensional datasets, this article proposes a feature extraction algorithm based on unsupervised fuzzy linear discriminant analysis with kernel (UFKLDA). By introducing the kernel method, UFKLDA can not only effectively deal with non-Gaussian datasets but also implement nonlinear feature extraction. Two sets of experiments were undertaken. In discriminability experiments, this article introduces quantitative criteria to measure discriminability among all classes of samples. The results show that UFKLDA improves discriminability compared with other popular feature extraction algorithms. In detection accuracy experiments, this article computes accuracy measures of an anomaly detection algorithm (i.e., C-SVM) on the original performance metrics and extracted features. The results show that anomaly detection with features extracted by UFKLDA improves the accuracy of detection in terms of sensitivity and specificity.

Supervised learning-based DDoS attacks detection: Tuning hyperparameters

  • Kim, Meejoung
    • ETRI Journal / v.41 no.5 / pp.560-573 / 2019
  • Two supervised learning algorithms, a basic neural network and a long short-term memory recurrent neural network, are applied to traffic including DDoS attacks. The joint effects of preprocessing methods and machine learning hyperparameters on performance are investigated. Values representing attack characteristics are extracted from datasets and preprocessed by two methods. Binary classification and two optimizers are used. Some hyperparameters are obtained exhaustively for fast and accurate detection, while others are fixed at constant values to account for performance and data characteristics. An experiment is performed via TensorFlow on three traffic datasets. Three scenarios are considered: to investigate the effects of learning former traffic on sequential traffic analysis, to investigate the effects of learning one dataset on application to another dataset, and to determine whether the algorithms can be used for recent attack traffic. Experimental results show that the preprocessing methods, neural network architectures, hyperparameters, and optimizers used are appropriate for DDoS attack detection. The obtained results provide a criterion for the detection accuracy of attacks.

Stochastic vibration analysis of functionally graded beams using artificial neural networks

  • Trinh, Minh-Chien;Jun, Hyungmin
    • Structural Engineering and Mechanics / v.78 no.5 / pp.529-543 / 2021
  • Inevitable source uncertainties in geometry configuration, boundary conditions, and material properties may cause the structural dynamics to deviate from the expected response. This paper examines the influence of these uncertainties on the vibration of functionally graded beams. Finite element procedures are presented for Timoshenko beams and used to generate reliable datasets. Uncertainty quantification of the beam vibration by Monte Carlo simulation requires large datasets, which means executing the numerical procedure many times at high computational cost. Using artificial neural networks to model beam vibration can be a good alternative. First, the optimal network for each beam configuration can be determined based on numerical performance and probabilistic criteria. Then, instead of executing the finite element procedure thousands of times in the stochastic analysis, these optimal networks serve as surrogates with which the convergence of the Monte Carlo simulation and the sensitivity and probabilistic vibration characteristics of each beam exposed to randomness are investigated. The simple procedure presented here is efficient for quantifying the uncertainty of different stochastic behaviors of composite structures.

Medical Image Classification using Pre-trained Convolutional Neural Networks and Support Vector Machine

  • Ahmed, Ali
    • International Journal of Computer Science & Network Security / v.21 no.6 / pp.1-6 / 2021
  • Recently, pre-trained convolutional neural networks (CNNs) have been widely applied to medical image classification. These models can be utilised in three different ways: for feature extraction, by reusing the architecture of the pre-trained model, or by training some layers while freezing others. In this study, a pre-trained ResNet18 CNN model is used for feature extraction, followed by a multi-class support vector machine, used as the main classifier, to classify medical images into multiple classes. Our proposed classification method was implemented on the Kvasir and PH2 medical image datasets. The overall accuracy was 93.38% and 91.67% for the Kvasir and PH2 datasets, respectively. The classification results and performance of our proposed method outperformed some related methods in this area of study.