• Title/Summary/Keyword: large data sets

CBIR-based Data Augmentation and Its Application to Deep Learning (CBIR 기반 데이터 확장을 이용한 딥 러닝 기술)

  • Kim, Sesong;Jung, Seung-Won
    • Journal of Broadcast Engineering / v.23 no.3 / pp.403-408 / 2018
  • Training deep learning models generally requires a large data set. Because large data sets are not easy to create, many techniques enlarge small data sets through data augmentation such as rotation, flipping, and filtering. However, these simple techniques are limited in how far they can extend a data set, because the augmented images cannot escape the features that the original images already possess. To solve this problem, we propose a method that acquires new image data by using existing data: existing images serve as queries for content-based image retrieval (CBIR), and the similar images retrieved are added to the data set. Finally, we compare the performance of the base model with that of the model trained using CBIR.
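
The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes every image has already been reduced to a feature vector and that similarity is measured by cosine similarity; the abstract specifies neither, and the file names, feature values, and function names below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve_similar(query, pool, k=2):
    """Return the names of the k pool images most similar to the query."""
    ranked = sorted(pool,
                    key=lambda name: cosine_similarity(query, pool[name]),
                    reverse=True)
    return ranked[:k]

def augment_with_cbir(train_features, pool, k=2):
    """Collect the k nearest pool images of every training image, without duplicates."""
    added = []
    for query in train_features:
        for name in retrieve_similar(query, pool, k):
            if name not in added:
                added.append(name)
    return added

# Hypothetical 3-dimensional features for an external image pool.
pool = {
    "cat_a.jpg": [0.9, 0.1, 0.0],
    "cat_b.jpg": [0.8, 0.2, 0.1],
    "car_a.jpg": [0.0, 0.1, 0.9],
}
train = [[1.0, 0.0, 0.0]]  # features of one existing training image
new_images = augment_with_cbir(train, pool, k=2)
print(new_images)
```

In practice the feature vectors would come from whatever descriptor the CBIR system uses; the ranking logic stays the same.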

COMPARISON OF GLOBAL SEA SURFACE TEMPERATURE PRODUCTS

  • Kubota, Masahisa;Iwasaki, Shinzuke
    • Proceedings of the KSRS Conference / v.2 / pp.993-996 / 2006
  • The NOAA operational bulk SST product (Reynolds et al., 2002) is a very popular global SST data set and is used extensively in various studies. However, its original time resolution is weekly, which is relatively coarse. Meanwhile, many new global SST data sets are now available. In this study, we compare several global SST data sets, including the NOAA operational bulk SST product, the CAOS OI SST product, Microwave Optimum Interpolation (MWOI) SST, Real Time Global (RTG) SST, and JMA Merged satellite and in situ Global Daily (MGD) SST.

Video augmentation technique for human action recognition using genetic algorithm

  • Nida, Nudrat;Yousaf, Muhammad Haroon;Irtaza, Aun;Velastin, Sergio A.
    • ETRI Journal / v.44 no.2 / pp.327-338 / 2022
  • Classification models for human action recognition require robust features and large training sets for good generalization. Data augmentation methods are therefore employed on imbalanced training sets to achieve higher accuracy. However, because samples generated by data augmentation only reflect existing samples within the training set, their feature representations are less diverse and hence contribute to less precise classification. This paper presents new data augmentation and action representation approaches to grow training sets. The proposed approach is based on two fundamental concepts: virtual video generation for augmentation, and representation of the action videos through robust features. Virtual videos are generated from the motion history templates of action videos, which are passed through a convolutional neural network to generate deep features. Furthermore, guided by the objective function of a genetic algorithm, the spatiotemporal features of different samples are combined to generate the representations of the virtual videos, which are then classified by an extreme learning machine classifier on the MuHAVi-Uncut, iXMAS, and IAVID-1 datasets.

Data-Compression-Based Resource Management in Cloud Computing for Biology and Medicine

  • Zhu, Changming
    • Journal of Computing Science and Engineering / v.10 no.1 / pp.21-31 / 2016
  • With the application and development of biomedical techniques such as next-generation sequencing, mass spectrometry, and medical imaging, the amount of biomedical data has been growing explosively. Processing such data raises the problems of big data, highly intensive computation, and high-dimensional data. Fortunately, cloud computing offers significant advantages in resource allocation, data storage, computation, and sharing, and thus provides a solution to the big data problems of biomedical research. To improve the efficiency of resource management in cloud computing, this paper proposes a clustering method and adopts the Radial Basis Function to compress comprehensive biological and medical data sets with high quality, and stores the compressed data under cloud resource management. Experiments show that with such data-compression-based resource management, large data sets from biology and medicine can be stored in less capacity. Furthermore, by applying the reverse operation of the Radial Basis Function, the compressed data can be reconstructed with high accuracy.

ERS-1 AND CCRS C-SAR Data Integration For Look Direction Bias Correction Using Wavelet Transform

  • Won, J.S.;Moon, Woo-Il M.;Singhroy, Vern;Lowman, Paul D. Jr.
    • Korean Journal of Remote Sensing / v.10 no.2 / pp.49-62 / 1994
  • Look direction bias in a single-look SAR image can often lead to misinterpretation in geological applications of radar data. This paper investigates digital processing techniques for integrating SAR image data and compensating for the look direction bias. The two important approaches for reducing look direction bias and integrating multiple SAR data sets are (1) principal component analysis (PCA) and (2) wavelet transform (WT) integration techniques. These two methods were investigated and tested with the ERS-1 (VV-polarization) and CCRS's airborne (HH-polarization) C-SAR image data sets recorded over the Sudbury test site, Canada. The PCA technique has been very effective for integrating more than two layers of digital image data. When only two sets of SAR data are available, the PCA technique requires at least one more set of auxiliary data for proper rendition of the fine surface features. The WT approach to SAR data integration exploits the fact that the transform decomposes an image into an approximation image (low frequencies), which characterizes the spatially large and relatively distinct structures, and a detail image (high frequencies), which preserves the information on fine structures. The test results with the ERS-1 and CCRS's C-SAR data indicate that the new WT approach is more efficient and robust than the PCA approach in enhancing the fine details of multiple SAR images.
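
The approximation/detail decomposition mentioned in this abstract can be sketched with a one-level Haar transform. This is only an illustration under several assumptions: it works on a single image row rather than a 2-D image, uses the Haar wavelet, and fuses by averaging the approximations and keeping the larger-magnitude detail coefficient. The abstract states none of these choices, and all function names are hypothetical.

```python
def haar_decompose(signal):
    """One-level Haar transform of an even-length sequence.

    Returns (approximation, detail): pairwise means and pairwise half-differences.
    """
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_reconstruct(approx, detail):
    """Invert haar_decompose."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def fuse(row_a, row_b):
    """Fuse two co-registered image rows in the wavelet domain."""
    approx_a, detail_a = haar_decompose(row_a)
    approx_b, detail_b = haar_decompose(row_b)
    # Assumed fusion rule: average the low-frequency parts,
    # keep whichever detail coefficient has the larger magnitude.
    approx = [(x + y) / 2 for x, y in zip(approx_a, approx_b)]
    detail = [x if abs(x) >= abs(y) else y for x, y in zip(detail_a, detail_b)]
    return haar_reconstruct(approx, detail)

fused = fuse([4, 2, 6, 6], [1, 1, 8, 2])
print(fused)
```

A 2-D version would apply the same transform along rows and then columns, giving one approximation sub-image and three detail sub-images per level.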

A Study on Fire Recognition Algorithm Using Deep Learning Artificial Intelligence (딥러닝 인공지능 기법을 이용한 화재인식 알고리즘에 관한 연구)

  • Ryu, Jin-Kyu;Kwak, Dong-Kurl;Kim, Jae-Jung;Choi, Jung-Kyu
    • Proceedings of the KIPE Conference / 2018.07a / pp.275-277 / 2018
  • Recently, large fires have underscored the importance of an early response. The most efficient way to deal with a large fire is to respond early, while the flame is still small. To this end, we propose a fire detection mechanism based on deep learning. In this study, a small data set is enlarged fivefold by image augmentation using rotation, tilting, blurring, and distortion effects, and we study a flame detection algorithm using Faster R-CNN.
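
A fivefold enlargement of this kind can be sketched as follows, assuming it means keeping each original image and adding one copy per effect. The four transforms are deliberately crude stand-ins operating on small grayscale images stored as lists of rows; they are not the authors' implementations, and every function name is hypothetical.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def tilt(img):
    """Crude shear: shift row i right by i pixels, wrapping around."""
    out = []
    for i, row in enumerate(img):
        n = len(row)
        k = i % n
        out.append(list(row[n - k:]) + list(row[:n - k]))
    return out

def blur(img):
    """Horizontal 3-tap mean filter with clamped edges."""
    out = []
    for row in img:
        n = len(row)
        out.append([(row[max(j - 1, 0)] + row[j] + row[min(j + 1, n - 1)]) / 3
                    for j in range(n)])
    return out

def distort(img):
    """Stand-in distortion: resample columns through a quadratic index mapping."""
    out = []
    for row in img:
        n = len(row)
        out.append([row[(j * j) // max(n - 1, 1)] for j in range(n)])
    return out

TRANSFORMS = [rotate90, tilt, blur, distort]

def augment(images):
    """Return each original followed by one transformed copy per transform."""
    out = []
    for img in images:
        out.append(img)
        out.extend(t(img) for t in TRANSFORMS)
    return out

images = [[[1, 2], [3, 4]]]
augmented = augment(images)
print(len(images), "->", len(augmented))
```

For object detection, the bounding-box annotations would have to be transformed along with the pixels, which this sketch omits.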

Classification of large-scale data and data batch stream with forward stagewise algorithm (전진적 단계 알고리즘을 이용한 대용량 데이터와 순차적 배치 데이터의 분류)

  • Yoon, Young Joo
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1283-1291 / 2014
  • In this paper, we propose a forward stagewise algorithm for data that are very large or arrive sequentially in batches over time. In this situation, ordinary boosting algorithms may be greedy and perform worse in the presence of class noise. To overcome these problems and handle large-scale data or data batch streams, we modify the forward stagewise algorithm. On simulated and real data sets, the modified algorithm gives better results than boosting algorithms for both large-scale data and data batch streams, with or without concept drift.

DNA Sequence Classification Using a Generalized Regression Neural Network and Random Generator (난수발생기와 일반화된 회귀 신경망을 이용한 DNA 서열 분류)

  • 김성모;김근호;김병환
    • The Transactions of the Korean Institute of Electrical Engineers D / v.53 no.7 / pp.525-530 / 2004
  • A classifier was constructed using a generalized regression neural network (GRNN) and a random generator (RG) and applied to classify DNA sequences. The three data sets evaluated are eukaryotic and prokaryotic sequences (Data-I), eukaryotic sequences (Data-II), and prokaryotic sequences (Data-III). For each data set, the classifier performance was examined in terms of the total classification sensitivity (TCS), individual classification sensitivity (ICS), total prediction accuracy (TPA), and individual prediction accuracy (IPA). For a given spread, the RG generated a number of sets of spreads for the Gaussian functions in the pattern layer. Compared to the GRNN, the RG-GRNN significantly improved the TCS by more than 50%, 60%, and 40% for Data-I, Data-II, and Data-III, respectively. The RG-GRNN also demonstrated improved TPA for all data types. In conclusion, the proposed RG-GRNN can effectively be used to classify large, multivariable promoter sequences.
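
The idea of pairing a GRNN with randomly generated spreads can be sketched as below. The GRNN output is the standard Gaussian-weighted average of training targets. How the paper samples the spreads and chooses among the candidate sets is not stated in the abstract, so the uniform sampling range and the validation-error selection rule here are assumptions, and all names are hypothetical.

```python
import math
import random

def grnn_predict(x, train_x, train_y, spreads):
    """GRNN output: Gaussian-weighted average of the training targets.

    spreads holds one Gaussian width per pattern-layer unit (training sample).
    """
    num = den = 0.0
    for xi, yi, s in zip(train_x, train_y, spreads):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = math.exp(-dist2 / (2 * s * s))
        num += w * yi
        den += w
    return num / den if den > 0 else sum(train_y) / len(train_y)

def mse(spreads, train_x, train_y, val_x, val_y):
    """Mean squared error of the GRNN on a validation set."""
    return sum((grnn_predict(x, train_x, train_y, spreads) - y) ** 2
               for x, y in zip(val_x, val_y)) / len(val_x)

def random_spread_search(train_x, train_y, val_x, val_y,
                         base_spread, n_sets=20, seed=0):
    """Draw n_sets random spread sets around base_spread and keep the best one."""
    rng = random.Random(seed)
    best = [base_spread] * len(train_x)  # plain GRNN: one shared spread
    best_err = mse(best, train_x, train_y, val_x, val_y)
    for _ in range(n_sets):
        # Assumed sampling scheme: each spread is within +/-50% of the base spread.
        cand = [base_spread * rng.uniform(0.5, 1.5) for _ in train_x]
        err = mse(cand, train_x, train_y, val_x, val_y)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

# Hypothetical 1-D toy data with binary class targets.
train_x, train_y = [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1]
val_x, val_y = [[0.5], [2.5]], [0, 1]
spreads, err = random_spread_search(train_x, train_y, val_x, val_y, base_spread=1.0)
print(err)
```

Because the search starts from the shared-spread GRNN, the returned error can never exceed that of the plain GRNN on the same validation set.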

Subset selection in multiple linear regression: An improved Tabu search

  • Bae, Jaegug;Kim, Jung-Tae;Kim, Jae-Hwan
    • Journal of Advanced Marine Engineering and Technology / v.40 no.2 / pp.138-145 / 2016
  • This paper proposes an improved tabu search method for subset selection in multiple linear regression models. Variable selection is a vital combinatorial optimization problem in multivariate statistics. Selecting the optimal subset of variables is necessary to construct a reliable multiple linear regression model. Its applications range widely, from machine learning, time-series prediction, and multi-class classification to noise detection. Since this problem is NP-complete, it becomes more difficult to find the optimal solution as the number of variables increases. Two typical metaheuristic methods have been developed to tackle the problem: the tabu search algorithm and a hybrid genetic and simulated annealing algorithm. However, these two methods have shortcomings. The tabu search method requires a large amount of computing time, and the hybrid algorithm produces a less accurate solution. To overcome these shortcomings, we propose an improved tabu search algorithm that reduces the moves in the neighborhood and adopts an effective move search strategy. To evaluate the performance of the proposed method, comparative studies are performed on small data sets from the literature and on large simulated data sets. Computational results show that the proposed method outperforms the two metaheuristic methods in terms of computing time and solution quality.
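
For orientation, a basic tabu search over variable subsets might look like the sketch below. It is the plain textbook scheme (single-variable flip moves, a fixed tabu tenure, and an aspiration rule), not the improved method the paper proposes. The cost function is a toy placeholder; in a regression setting it would be a model selection criterion computed from a fitted model. All names and parameter values are hypothetical.

```python
def tabu_search(cost, n_vars, n_iters=50, tenure=3):
    """Minimize cost(subset) over boolean inclusion vectors of length n_vars."""
    current = [False] * n_vars
    best, best_cost = list(current), cost(current)
    tabu_until = [0] * n_vars  # last iteration at which flipping variable j is forbidden
    for it in range(1, n_iters + 1):
        move, move_cost = None, None
        for j in range(n_vars):
            neighbor = list(current)
            neighbor[j] = not neighbor[j]
            c = cost(neighbor)
            # Aspiration rule: a tabu move is allowed only if it beats the best so far.
            if tabu_until[j] >= it and c >= best_cost:
                continue
            if move is None or c < move_cost:
                move, move_cost = j, c
        if move is None:
            continue  # every move is tabu in this iteration
        # Take the best admissible move even if it worsens the current solution.
        current[move] = not current[move]
        tabu_until[move] = it + tenure
        if move_cost < best_cost:
            best, best_cost = list(current), move_cost
    return best, best_cost

# Toy cost: number of variables on which the subset disagrees with a hidden target.
TARGET = [True, False, True, False]

def toy_cost(subset):
    return sum(a != b for a, b in zip(subset, TARGET))

best, best_cost = tabu_search(toy_cost, n_vars=4)
print(best, best_cost)
```

Evaluating every flip in each iteration is what makes plain tabu search expensive as the number of variables grows, which is the cost the abstract says the improved method targets.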

Note on classification and regression tree analysis (분류와 회귀나무분석에 관한 소고)

  • 임용빈;오만숙
    • Journal of Korean Society for Quality Management / v.30 no.1 / pp.152-161 / 2002
  • The analysis of large data sets with hundreds of thousands of observations and thousands of independent variables is a formidable computational task. A less parametric method, capable of identifying important independent variables and their interactions, is the tree-structured approach to regression and classification. It gives a graphical and often illuminating way of looking at data in classification and regression problems. In this paper, we review and summarize the methodology used to construct a tree and multiple trees, and the sequential strategy for identifying active compounds in large chemical databases.
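
The core step in building a classification tree is choosing a split. The sketch below finds the best threshold on one numeric variable by minimizing weighted Gini impurity, a common criterion for classification trees; the abstract does not say which criterion the reviewed methodology uses, and the data values and labels are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(values, labels):
    """Find the threshold on one variable that minimizes weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the unsplit node.
    """
    n = len(values)
    best_threshold, best_score = None, gini(labels)
    candidates = sorted(set(values))
    # Try the midpoint between each pair of adjacent distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical descriptor values for six compounds.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["inactive"] * 3 + ["active"] * 3
threshold, score = best_split(values, labels)
print(threshold, score)
```

A full tree repeats this search over all variables at each node and recurses on the two resulting subsets until a stopping rule is met.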