• Title/Summary/Keyword: Synthetic data set

Human Detection using Real-virtual Augmented Dataset

  • Jongmin, Lee;Yongwan, Kim;Jinsung, Choi;Ki-Hong, Kim;Daehwan, Kim
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.98-102 / 2023
  • This paper studies how augmenting training data with semi-synthetic images improves the performance of human detection algorithms. In object detection, securing a high-quality data set is the most important factor in training deep learning algorithms. Because acquiring real image data has become time consuming and expensive, research using synthesized data has been conducted. Synthetic data have the advantage that a vast amount of data can be generated and labeled accurately. However, the utility of synthetic data in human detection has not yet been demonstrated. Therefore, we use You Only Look Once (YOLO), the most commonly used object detection algorithm, to experimentally analyze the effect of synthetic data augmentation on human detection performance. Training YOLO on the Penn-Fudan dataset showed that the network model trained on a dataset augmented with synthetic data provided high-performance results in terms of the Precision-Recall Curve and F1-Confidence Curve.
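
The F1-Confidence Curve named in the abstract plots the harmonic mean of precision and recall at each confidence threshold. A minimal sketch of one point on that curve follows, assuming detections have already been matched to ground-truth boxes; the function name, the input layout, and the sample values are illustrative assumptions, not taken from the paper.

```python
def f1_at_threshold(detections, num_ground_truth, threshold):
    """Return (precision, recall, F1) for one confidence threshold.

    detections: list of (confidence, is_true_positive) pairs. The matching
    of detections to ground-truth boxes is assumed to be done already.
    """
    kept = [is_tp for confidence, is_tp in detections if confidence >= threshold]
    true_pos = sum(1 for is_tp in kept if is_tp)
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / num_ground_truth if num_ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical detections: (confidence, matched a ground-truth person?)
example = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
precision, recall, f1 = f1_at_threshold(example, num_ground_truth=4, threshold=0.5)
```

Sweeping `threshold` from 0 to 1 and collecting the F1 values would trace the full curve.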

Development of a Converter for Visualizing SEDRIS (SEDRIS 합성 환경 데이터 가시화를 위한 변환기 개발)

  • Kang, Yuna;Kim, Hyungki;Han, Soonhung;Kim, Man Kyu
    • Korean Journal of Computational Design and Engineering / v.18 no.3 / pp.189-199 / 2013
  • The need to reuse synthetic environment data employed in modeling and simulation has recently been rising. SEDRIS (Synthetic Environment Data Representation & Interchange Specification) is a standard for exchanging synthetic environment data, and is the specification used in various military simulations of the Pentagon for representing and exchanging 3D data. SEDRIS represents environmental areas based on a data model; it can represent wind speed, wind direction, weather changes, and building information, as well as terrain data. In some situations, however, synthetic environment data stored in SEDRIS format must be converted to various visualization formats. First, because SEDRIS is a form of super-set, visualization is needed to verify that large-scale SEDRIS files have been stored successfully. Second, the synthetic environment data must be displayed in visualization programs so that the simulation results feel immersive and realistic. In this study, we developed converters that convert SEDRIS data to various visualization formats and visualized the converted results.

Synthetic Data Augmentation for Plant Disease Image Generation using GAN (GAN을 이용한 식물 병해 이미지 합성 데이터 증강)

  • Nazki, Haseeb;Lee, Jaehwan;Yoon, Sook;Park, Dong Sun
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.459-460 / 2018
  • In this paper, we present a data augmentation method that generates synthetic plant disease images using Generative Adversarial Networks (GANs). We propose a training scheme that first uses classical data augmentation techniques to enlarge the training set and then further increases the size and diversity of the data by applying GAN techniques for synthetic data augmentation. Our method is demonstrated on a limited dataset of 2789 images of tomato plant diseases (Gray mold, Canker, Leaf mold, Plague, Leaf miner, Whitefly, etc.).

Synthetic Training Data Generation for Fault Detection Based on Deep Learning (딥러닝 기반 탄성파 단층 해석을 위한 합성 학습 자료 생성)

  • Choi, Woochang;Pyun, Sukjoon
    • Geophysics and Geophysical Exploration / v.24 no.3 / pp.89-97 / 2021
  • Fault detection in seismic data is well suited to the application of machine learning algorithms. Accordingly, various machine learning techniques are being developed. Recent studies focus in particular on deep learning models trained with synthetic data. The use of synthetic training data has many advantages: securing massive amounts of training data becomes easy, and exact fault labels can be generated. To interpret real data with a model trained on synthetic data, the synthetic data used for training should be geologically realistic. In this study, we introduce a method to generate realistic synthetic seismic data. First, reflectivity models are generated to include realistic fault structures, and then a one-way wave equation is applied to efficiently generate seismic stack sections. Next, a migration algorithm is used to remove diffraction artifacts, and random noise is added to mimic actual field data. A convolutional neural network model based on the U-Net structure is used to verify the generated synthetic data set. From the results of the experiment, we confirm that realistic synthetic data effectively produce a deep learning model that can be applied to field data.
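
The pipeline in this abstract (faulted reflectivity model, seismic section, added noise, exact labels) can be illustrated in a much simplified form. The sketch below is not the authors' method: it swaps the one-way wave equation and migration steps for a plain 1D convolution of each trace with a Ricker wavelet, and every name and parameter value in it is a hypothetical choice made for illustration.

```python
import math
import random


def ricker(freq_hz, dt, half_len):
    """Ricker wavelet samples centred on zero time (peak value 1.0)."""
    out = []
    for i in range(-half_len, half_len + 1):
        a = (math.pi * freq_hz * i * dt) ** 2
        out.append((1.0 - 2.0 * a) * math.exp(-a))
    return out


def convolve_same(signal, kernel):
    """Convolution that returns an output the same length as the signal."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + half - j
            if 0 <= idx < len(signal):
                acc += signal[idx] * k
        out.append(acc)
    return out


def faulted_section(n_traces=8, n_samples=64, horizons=(16, 32, 48),
                    fault_trace=4, throw=5, noise_std=0.05, seed=0):
    """Build a toy section with one vertical fault and a per-trace fault label."""
    rng = random.Random(seed)
    wavelet = ricker(25.0, 0.004, 10)
    section, labels = [], []
    for trace_idx in range(n_traces):
        # Horizons on the far side of the fault are shifted down by the throw.
        shift = throw if trace_idx >= fault_trace else 0
        reflectivity = [0.0] * n_samples
        for h in horizons:
            reflectivity[h + shift] = 1.0
        trace = convolve_same(reflectivity, wavelet)
        # Gaussian noise stands in for the random noise added to mimic field data.
        section.append([v + rng.gauss(0.0, noise_std) for v in trace])
        labels.append(1 if trace_idx == fault_trace else 0)
    return section, labels


section, labels = faulted_section()
```

Because the fault position is chosen by the generator, the label for each trace is exact by construction, which is the labeling advantage the abstract describes.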

Development of a method of the data generation with maintaining quantile of the sample data

  • Joohyung Lee;Young-Oh Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.244-244 / 2023
  • Both the frequency and the magnitude of hydrometeorological extreme events such as severe floods and droughts are increasing. To prevent damage from climatic disasters, hydrological models are often simulated under various meteorological conditions. In such simulations, synthetic data generated by time series models that maintain the key statistical characteristics of the sample data are widely applied. However, although synthetic data can easily maintain both the average and the variance of the sample data, the quantile is not maintained well. In this study, we propose a data generation method that maintains the quantile of the sample data well. The equations of the earlier maintenance of variance extension (MOVE) are expanded to maintain the quantile rather than the average or the variance of the sample data. The equations are derived and the coefficients are determined based on the characteristics of the sample data that we aim to preserve. Monte Carlo simulation is used to assess the performance of the proposed data generation method. A time series (data length of 500) is regarded as the sample data, and values are selected randomly from it to create the data set (data length of 30) for simulation. The length of the selected data set is expanded from 30 to 500 by using the proposed method. Then, the differences in the average, the variance, and the quantile between the sample data and the expanded data are evaluated with the relative root mean square error for each simulation. The simulation shows that each equation designed to maintain a given characteristic of the data performs well. Moreover, the expanded data preserve the quantile of the sample data more precisely than data expanded through the conventional time series model.
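
The evaluation loop described here (draw 30 values from a 500-value sample, expand, compare statistics by relative root mean square error) can be sketched as follows. The abstract does not give the expansion equations, so `expand` is a hypothetical placeholder that returns the subset unchanged; the relative RMSE definition (error normalized by the reference value) and the linear-interpolation quantile are also assumptions.

```python
import math
import random


def quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac


def relative_rmse(estimates, reference):
    """Root mean square of the errors, each divided by the reference value."""
    return math.sqrt(
        sum(((e - reference) / reference) ** 2 for e in estimates) / len(estimates)
    )


def monte_carlo_quantile_error(sample, expand=None, q=0.9,
                               subset_len=30, runs=200, seed=0):
    """Relative RMSE of the q-quantile over repeated random subsets."""
    rng = random.Random(seed)
    if expand is None:
        # Placeholder: a real study would expand the subset to len(sample) here.
        expand = lambda subset, target_len: subset
    reference = quantile(sample, q)
    estimates = []
    for _ in range(runs):
        subset = rng.sample(sample, subset_len)
        expanded = expand(subset, len(sample))
        estimates.append(quantile(expanded, q))
    return relative_rmse(estimates, reference)


# Hypothetical sample of length 500 standing in for an observed series.
sample_data = [float(v) for v in range(1, 501)]
error = monte_carlo_quantile_error(sample_data)
```

Passing a different `expand` callable would let two generation methods be compared on the same random subsets, since the seed fixes which values are drawn.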

Template Matching-Based Target Recognition Algorithm Development and Verification using SAR Images (SAR 영상을 이용한 템플릿 매칭 기반 자동식별 알고리즘 구현 및 성능시험)

  • Lim, Ho;Chae, Daeyoung;Yoo, Ji Hee;Kwon, Kyung-Il
    • Journal of the Korea Institute of Military Science and Technology / v.17 no.3 / pp.364-377 / 2014
  • In this paper, we developed a target recognition algorithm based on a template matching technique using Synthetic Aperture Radar (SAR) images. For efficient computation, a Radon transform-based azimuth estimation algorithm was used with the template matching. The MSTAR data set was divided by depression angle into two groups, a training set and a test set. Template data were generated by rotating and cropping chips from the MSTAR training set using the azimuth estimation algorithm. Then the template matching process between test data and template data was performed under various conditions. Performance variation according to contrast enhancement preprocessing, which is rarely reported in the open literature, is also presented. The analysis results show that the target recognition algorithm could be useful for automatic target recognition using SAR images.
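
The matching step can be illustrated with a small sketch in which a test chip is assigned the label of the template that scores highest. The abstract does not state the similarity measure, so normalized cross-correlation is used here as an assumption; the azimuth estimation, rotation, and cropping steps are omitted, and the labels and pixel values are hypothetical.

```python
import math


def ncc(patch, template):
    """Normalized cross-correlation between two equally sized 2D lists."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0


def best_template(chip, templates):
    """Return the label of the best-scoring template and all scores.

    templates: dict mapping a class label to a 2D list the same size as chip.
    """
    scores = {label: ncc(chip, tpl) for label, tpl in templates.items()}
    return max(scores, key=scores.get), scores


# Toy 2x2 "templates" with hypothetical class labels.
templates = {"A": [[0, 1], [1, 0]], "B": [[1, 0], [0, 1]]}
label, scores = best_template([[0.1, 0.9], [0.8, 0.2]], templates)
```

Subtracting the means and dividing by the norms makes the score insensitive to overall brightness and contrast scaling, which is one reason contrast preprocessing can change matching results with other measures.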

Preliminary Application of Synthetic Computed Tomography Image Generation from Magnetic Resonance Image Using Deep-Learning in Breast Cancer Patients

  • Jeon, Wan;An, Hyun Joon;Kim, Jung-in;Park, Jong Min;Kim, Hyoungnyoun;Shin, Kyung Hwan;Chie, Eui Kyu
    • Journal of Radiation Protection and Research / v.44 no.4 / pp.149-155 / 2019
  • Background: A magnetic resonance (MR) image guided radiation therapy system enables real-time MR guided radiotherapy (RT) without additional radiation exposure to patients during treatment. However, MR images lack the electron density information required for dose calculation. An image fusion algorithm with deformable registration between MR and computed tomography (CT) was developed to solve this issue. However, the delivered dose may differ because of volumetric changes during the image registration process. In this respect, synthetic CT generated from the MR image would provide more accurate information required for real-time RT. Materials and Methods: We analyzed 1,209 MR images from 16 patients who underwent MR guided RT. Structures were divided into five tissue types (air, lung, fat, soft tissue, and bone) according to the Hounsfield unit of the deformed CT. Using a deep learning model (U-NET model), synthetic CT images were generated from the MR images acquired during RT. These synthetic CT images were compared to deformed CT generated using deformable registration. A pixel-to-pixel match was conducted to compare the synthetic and deformed CT images. Results and Discussion: In two test image sets, the average pixel match rate per section was more than 70% (67.9 to 80.3% and 60.1 to 79%; synthetic CT pixel/deformed planning CT pixel), and the average pixel match rate in the entire patient image set was 69.8%. Conclusion: The synthetic CT images generated from the MR images were comparable to deformed CT, suggesting possible use for real-time RT. The deep learning model may further improve the match rate of synthetic CT with larger MR imaging data.
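
One plausible reading of the pixel-to-pixel match is that a pixel counts as matched when the synthetic and deformed CT values fall in the same tissue class. The sketch below follows that reading; the Hounsfield unit boundaries are hypothetical placeholders, since the abstract names the five tissue types but not their thresholds.

```python
# Hypothetical upper HU bounds for each class; values at or above the last
# bound are treated as bone. These thresholds are NOT from the paper.
TISSUE_BOUNDS = [(-900, "air"), (-200, "lung"), (-20, "fat"), (150, "soft tissue")]


def tissue_class(hu):
    """Map a Hounsfield unit value to one of five tissue types."""
    for upper, name in TISSUE_BOUNDS:
        if hu < upper:
            return name
    return "bone"


def pixel_match_rate(synthetic, deformed):
    """Percentage of pixels whose tissue class agrees in both images.

    Both arguments are flat lists of HU values in the same pixel order.
    """
    if len(synthetic) != len(deformed) or not synthetic:
        raise ValueError("images must be non-empty and the same size")
    matches = sum(
        tissue_class(a) == tissue_class(b) for a, b in zip(synthetic, deformed)
    )
    return 100.0 * matches / len(synthetic)


# Toy five-pixel example with made-up HU values.
rate = pixel_match_rate([-1000, -500, -100, 40, 400], [-950, -600, 30, 60, 120])
```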

Design of Nonlinear Model Using Type-2 Fuzzy Logic System by Means of C-Means Clustering (C-Means 클러스터링 기반의 Type-2 퍼지 논리 시스템을 이용한 비선형 모델 설계)

  • Baek, Jin-Yeol;Lee, Young-Il;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.842-848 / 2008
  • This paper deals with the uncertainty problem by using Type-2 fuzzy logic sets for nonlinear system modeling. We design a Type-2 fuzzy logic system in which the antecedent and consequent parts of the rules are given as Type-2 fuzzy sets, and we analyze the performance of the resulting nonlinear model under uncertainty. Here, the apexes of the antecedent membership functions of the rules are determined by the C-means clustering algorithm, and the apexes of the consequent membership functions are learned by back-propagation based on the gradient descent method. In addition, the parameters of the fuzzy model are optimized by particle swarm optimization. The proposed model is demonstrated on two representative numerical examples, a mathematical synthetic data set and the Mackey-Glass time series data set, and we discuss the approximation and generalization abilities of the model.
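
The Mackey-Glass benchmark is generated from the delay differential equation dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t). A minimal sketch using a forward Euler step is given below; the parameter values (tau = 17, beta = 0.2, gamma = 0.1, n = 10) are the ones commonly used for this benchmark and are assumed here, since the abstract does not list them. The step size is fixed at one time unit so that the delay is a whole number of samples.

```python
def mackey_glass(length, tau=17, beta=0.2, gamma=0.1, n=10, x0=1.2):
    """Generate a Mackey-Glass series with a forward Euler step of 1 time unit."""
    # Constant initial history covering the delay window.
    history = [x0] * (tau + 1)
    series = []
    for _ in range(length):
        x_now = history[-1]
        x_delayed = history[-(tau + 1)]  # value tau steps before x_now
        x_next = x_now + beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x_now
        history.append(x_next)
        series.append(x_next)
    return series


series = mackey_glass(500)
```

A finer integration scheme (for example a smaller step or Runge-Kutta) would give a more accurate trajectory; the Euler version is kept for brevity.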

A Clustering Algorithm for Sequence Data Using Rough Set Theory (러프 셋 이론을 이용한 시퀀스 데이터의 클러스터링 알고리즘)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.113-119 / 2008
  • The World Wide Web is a dynamic collection of pages that includes a huge number of hyperlinks and huge volumes of usage information. The resulting growth in online information, combined with the almost unstructured nature of web data, necessitates the development of powerful web data mining tools. Recently, a number of approaches have been developed for dealing with specific aspects of web usage mining for the purpose of automatically discovering user profiles. We analyze sequence data, such as web logs, protein sequences, and retail transactions. In our approach, we propose a clustering algorithm for sequence data using rough set theory. We present a simple example and experimental results using a splice dataset and synthetic datasets.

Experimental Analysis of Equilibrization in Binary Classification for Non-Image Imbalanced Data Using Wasserstein GAN

  • Wang, Zhi-Yong;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication / v.11 no.4 / pp.37-42 / 2019
  • In this paper, we explore the details of three classic data augmentation methods and two generative model based oversampling methods. The three classic data augmentation methods are random sampling (RANDOM), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The two generative model based oversampling methods are Conditional Generative Adversarial Network (CGAN) and Wasserstein Generative Adversarial Network (WGAN). In imbalanced data, the instances are divided into a majority class and a minority class, where the majority class accounts for most of the instances in the training set and the minority class includes only a few. Generative models have their own advantages when they are used to generate more plausible samples that follow the distribution of the minority class. We also adopt CGAN to compare its data augmentation performance with that of the other methods. The experimental results show that the WGAN-based oversampling technique is more stable than the other approaches (RANDOM, SMOTE, ADASYN, and CGAN) even with very limited training datasets. However, when the imbalanced ratio is too small, generative model based approaches cannot achieve more satisfactory performance than the conventional data augmentation techniques. These results suggest a direction for future research.
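
Of the classic methods listed, SMOTE is the simplest to show in code: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the implementation used in the paper; the function name, the default `k`, and the toy data are assumptions, and at least two minority samples are required.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation.

    minority: list of equal-length numeric tuples (needs at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (nb - b) for b, nb in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority_points, n_new=10)
```

Because every new point is a convex combination of two existing minority points, the method never leaves the region already covered by the minority class, which is one contrast with generative approaches such as CGAN and WGAN.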