• Title/Summary/Keyword: Generate Data

Search Results: 3,065

Experimental Analysis of Equilibrization in Binary Classification for Non-Image Imbalanced Data Using Wasserstein GAN

  • Wang, Zhi-Yong; Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication / v.11 no.4 / pp.37-42 / 2019
  • In this paper, we explore three classic data augmentation methods and two generative-model-based oversampling methods. The three classic methods are random sampling (RANDOM), the Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN); the two generative-model-based methods are the Conditional Generative Adversarial Network (CGAN) and the Wasserstein Generative Adversarial Network (WGAN). In imbalanced data, the instances are divided into a majority class, which occupies most of the training set, and a minority class, which includes only a few instances. Generative models have an advantage here because they can generate more plausible samples that follow the distribution of the minority class. We also adopt CGAN to compare its data augmentation performance with that of the other methods. The experimental results show that WGAN-based oversampling is more stable than the other approaches (RANDOM, SMOTE, ADASYN, and CGAN), even with very limited training data. However, when the imbalance ratio is too small, the generative-model-based approaches cannot achieve better performance than the conventional data augmentation techniques. These results suggest a direction for future research.
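
A minimal, hedged sketch of the three classic oversampling baselines named above (RANDOM, SMOTE, ADASYN), using the imbalanced-learn package on a synthetic imbalanced binary dataset; the CGAN/WGAN oversamplers studied in the paper are not reproduced here, and the dataset and imbalance ratio are placeholders.

```python
# Classic oversampling baselines on a synthetic imbalanced binary dataset.
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN

# Synthetic non-image data with roughly a 9:1 majority-to-minority ratio.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
print("original class counts:", Counter(y))

for name, sampler in [("RANDOM", RandomOverSampler(random_state=0)),
                      ("SMOTE", SMOTE(random_state=0)),
                      ("ADASYN", ADASYN(random_state=0))]:
    # Each sampler adds minority-class instances until the classes balance.
    X_res, y_res = sampler.fit_resample(X, y)
    print(name, Counter(y_res))
```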

Wi-Fi Fingerprint-based Indoor Movement Route Data Generation Method (Wi-Fi 핑거프린트 기반 실내 이동 경로 데이터 생성 방법)

  • Yoon, Chang-Pyo; Hwang, Chi-Gon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.458-459 / 2021
  • Recently, research using deep learning based on Wi-Fi fingerprints has been conducted to provide accurate indoor location-based services. Among deep learning models, an RNN, which can retain information from the past, can model continuous movement in indoor positioning and thereby reduce positioning errors. This requires continuous, sequential data for training. However, because Wi-Fi fingerprint data are generally managed only as signals for specific locations, they are not directly suitable as training data for an RNN model. This paper proposes a method that extends the fingerprint data to region data through clustering and then predicts movement paths over those regions to generate the sequential input data of the RNN model.
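
An illustrative sketch (not the paper's exact procedure) of the two-step idea: cluster Wi-Fi RSSI fingerprints into region labels, then window the time-ordered labels into sequences an RNN could train on. The array shapes, cluster count, and window length are assumptions.

```python
# Cluster fingerprints into regions, then build sequential training data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Placeholder scans: 500 time-ordered fingerprints x 8 access points (RSSI in dBm).
fingerprints = rng.normal(-70.0, 10.0, size=(500, 8))

# Step 1: extend point fingerprints to region data via clustering.
regions = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(fingerprints)

# Step 2: slide a window over the region-label stream to form
# (input sequence, next region) pairs for a sequence model.
WINDOW = 5
X_seq = np.array([regions[i:i + WINDOW] for i in range(len(regions) - WINDOW)])
y_next = regions[WINDOW:]
print(X_seq.shape, y_next.shape)  # (495, 5) (495,)
```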

Development of a Measurement Data Algorithm of Deep Space Network for Korea Pathfinder Lunar Orbiter mission (달 탐사 시험용 궤도선을 위한 심우주 추적망의 관측값 구현 알고리즘 개발)

  • Kim, Hyun-Jeong; Park, Sang-Young; Kim, Min-Sik; Kim, Youngkwang; Lee, Eunji
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.45 no.9 / pp.746-756 / 2017
  • An algorithm is developed to generate deep space network measurement data for the Korea Pathfinder Lunar Orbiter (KPLO) mission. The algorithm provides corrected measurement data for the Orbit Determination (OD) module in deep space. This study describes how the computed data, such as range, Doppler, azimuth angle, and elevation angle, are generated. The geometric data were obtained from a General Mission Analysis Tool (GMAT) simulation, and the corrected data were calculated with measurement models. The resulting total delay includes the effects of tropospheric delay, ionospheric delay, charged-particle delay, antenna-offset delay, and tropospheric refraction delay. The computed measurement data were validated by comparison with results from the Orbit Determination ToolBoX (ODTBX).
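
A simplified sketch of the "computed" range and Doppler portion of such a measurement model: two-way range and range-rate from station and spacecraft state vectors. Light-time iteration and the tropospheric, ionospheric, charged-particle, and antenna-offset corrections described above are omitted, and all numerical values are illustrative rather than KPLO data.

```python
# Instantaneous two-way range and Doppler from station/spacecraft states.
import numpy as np

C = 299_792.458  # speed of light, km/s

def range_and_doppler(r_sc, v_sc, r_stn, v_stn):
    """Geometric range (km) and range-rate (km/s) between station and spacecraft."""
    rho_vec = r_sc - r_stn
    rho = np.linalg.norm(rho_vec)
    rho_rate = np.dot(rho_vec, v_sc - v_stn) / rho
    return rho, rho_rate

# Illustrative state vectors in an Earth-centered frame (km, km/s).
r_sc  = np.array([300000.0, 150000.0, 50000.0])
v_sc  = np.array([-0.5, 0.9, 0.1])
r_stn = np.array([4000.0, 3000.0, 3500.0])
v_stn = np.array([-0.22, 0.29, 0.0])  # station velocity due to Earth rotation

rho, rho_rate = range_and_doppler(r_sc, v_sc, r_stn, v_stn)
two_way_range = 2.0 * rho
doppler_hz = -2.0 * rho_rate / C * 8.4e9  # two-way shift at an assumed X-band carrier
print(two_way_range, doppler_hz)
```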

Emerging Internet Technology & Service toward Korean Government 3.0

  • Song, In Kuk
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.2 / pp.540-546 / 2014
  • Recently, a new government announced an action plan known as Government 3.0, which aims to provide customized services for individual citizens, generate more jobs, and support the creative economy. Building on previous similar initiatives, the new scheme focuses on openness, sharing, communication, and collaboration. In promoting Government 3.0, the crucial factor is how to align the core services and policies of Government 3.0 with the corresponding technologies. This paper describes the concepts and features of Government 3.0, identifies emerging Internet-based technologies and services relevant to the initiative, and provides improvement plans for Government 3.0. The ten issues brought together are: Smart Phone Applications and Service, Mobile Internet Computing and Application, Wireless and Sensor Network, Security & Privacy in Internet, Energy-efficient Computing & Smart Grid, Multimedia & Image Processing, Data Mining and Big Data, Software Engineering, Internet Business related Policy, and Management of Internet Application.

Layout of simulator for measuring and evaluating human sensibility (감성 측정평가 시뮬레이터의 설비 배치)

  • Kim, Chae-Bok; Park, Se-Jin; Kim, Cheol-Jung
    • Journal of the Ergonomics Society of Korea / v.18 no.2 / pp.121-132 / 1999
  • This paper investigates a methodology for developing the layout of a simulator for measuring and evaluating human sensibility. The simulator layout differs from a general building layout in that it must support systematic communication among the facilities, the laboratories that evaluate human sensibility, and the equipment that supports experiments in the simulator; therefore, two approaches, based on the eigenvector and the cut tree, are applied to develop the layout. Qualitative input data (a relationship chart and the space requirements for each laboratory and piece of equipment) are obtained and transformed into quantitative data. The quantitative data extracted by the eigenvector and cut-tree approaches provide several meaningful clues for generating the simulator layout, and the final layout is presented based on this information.
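
A hypothetical illustration of the eigenvector step: a qualitative relationship chart with A/E/I/O/U closeness ratings is converted to a numeric closeness matrix, and the principal eigenvector scores how strongly each facility should be placed near the others. The numeric rating scale and the four-facility example are assumptions, not data from the paper.

```python
# Turn a qualitative relationship chart into quantitative closeness scores.
import numpy as np

RATING = {"A": 4, "E": 3, "I": 2, "O": 1, "U": 0}  # assumed numeric scale
facilities = ["control room", "lab 1", "lab 2", "equipment store"]
chart = [["A", "A", "E"],  # control room vs lab 1, lab 2, store
         ["E", "I"],       # lab 1 vs lab 2, store
         ["O"]]            # lab 2 vs store

n = len(facilities)
closeness = np.zeros((n, n))
for i, row in enumerate(chart):
    for k, rating in enumerate(row):
        j = i + 1 + k
        closeness[i, j] = closeness[j, i] = RATING[rating]

# Principal eigenvector of the symmetric closeness matrix ranks facilities.
eigvals, eigvecs = np.linalg.eigh(closeness)
principal = np.abs(eigvecs[:, np.argmax(eigvals)])
for name, score in sorted(zip(facilities, principal), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```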

Deformable Surface 3D Reconstruction from a Single Image by Linear Programming

  • Ma, Wenjuan; Sun, Shusen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.6 / pp.3121-3142 / 2017
  • We present a method for 3D shape reconstruction of inextensible deformable surfaces from a single image. The key to our approach is to represent the surface as a 3D triangulated mesh and to formulate the reconstruction problem as a sequence of Linear Programming (LP) problems. Each LP problem consists of data constraints, which are 3D-to-2D keypoint correspondences, and shape constraints, which are designed to retain the original lengths of the mesh edges. We use a closed-form method to generate an initial structure and then refine it by solving the LP problem iteratively. Compared with previous methods, ours involves neither smoothness constraints nor temporal consistency, which enables us to recover the shapes of surfaces with various deformations from a single image. The robustness and accuracy of our approach are evaluated quantitatively on synthetic data and qualitatively on real data.
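
A minimal sketch of the LP pattern behind this kind of formulation: an L1 data term is minimized with slack variables via scipy.optimize.linprog. The actual mesh parameterization, keypoint correspondences, and edge-length constraints from the paper are not modeled; A and b below are random stand-ins.

```python
# L1 minimization ||A x - b||_1 written as a linear program with slacks.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 30, 10                      # 30 "data constraints", 10 unknowns
A = rng.normal(size=(m, n))
b = A @ rng.normal(size=n) + 0.01 * rng.normal(size=m)

# Variables z = [x (n), s (m)]; objective: minimize sum(s).
c = np.concatenate([np.zeros(n), np.ones(m)])
#  A x - b <= s   ->  [ A, -I] z <=  b
# -A x + b <= s   ->  [-A, -I] z <= -b
A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])
b_ub = np.concatenate([b, -b])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + m))
x_hat = res.x[:n]
print("L1 residual:", np.abs(A @ x_hat - b).sum())
```

In the paper this kind of LP would be re-solved iteratively, with the shape constraints re-linearized around the current mesh estimate at each step.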

Sentence-Chain Based Seq2seq Model for Corpus Expansion

  • Chung, Euisok; Park, Jeon Gue
    • ETRI Journal / v.39 no.4 / pp.455-466 / 2017
  • This study focuses on a method for sequential data augmentation to alleviate data sparseness problems. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied to language generation issues; it can generate new sentences from given input sentences. We present a corpus expansion method using a sentence-chain-based seq2seq model. For training the seq2seq model, sentence chains are used as triples: the first two sentences in a triple are fed to the encoder, while the last sentence becomes the target sequence for the decoder. Using only internal resources, the evaluation shows an improvement of approximately 7.6% in relative perplexity over a baseline language model of Korean text. Additionally, compared with a previous study, the sentence-chain approach reduces the size of the training data by 38.4% while generating 1.4 times as many n-grams, with superior performance on English text.
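
A sketch of how sentence-chain triples for the seq2seq model could be assembled: sentences (s1, s2, s3) become an encoder input (s1 + s2) and a decoder target (s3). The toy corpus is a placeholder, and the paper builds chains from related sentences rather than by the simple adjacency assumed here.

```python
# Build (encoder input, decoder target) pairs from sentence triples.
corpus = [
    "the bank approved the loan",
    "the customer signed the contract",
    "funds were transferred the next day",
    "the account balance was updated",
]

# Consecutive triples stand in for the paper's sentence chains.
triples = [(corpus[i], corpus[i + 1], corpus[i + 2])
           for i in range(len(corpus) - 2)]

encoder_inputs = [f"{s1} {s2}" for s1, s2, _ in triples]
decoder_targets = [s3 for _, _, s3 in triples]

for src, tgt in zip(encoder_inputs, decoder_targets):
    print("ENC:", src)
    print("DEC:", tgt)
```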

Automatic Document Title Generation with RNN and Reinforcement Learning (RNN과 강화 학습을 이용한 자동 문서 제목 생성)

  • Cho, Sung-Min; Kim, Wooseng
    • Journal of Information Technology Applications and Management / v.27 no.1 / pp.49-58 / 2020
  • Recently, a large amount of textual data has poured onto the Internet, and technology to refine it is needed. Most of these data are long texts, and they often have no title. In this paper, we therefore propose a technique that combines an RNN sequence-to-sequence model with the REINFORCE algorithm to generate titles for long texts automatically. In addition, the TextRank algorithm is applied to extract a summarized text and minimize information loss, compensating for the shortcoming of the sequence-to-sequence model, which loses information when long texts are used. Experiments show that the proposed technique is superior to existing ones.
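
An illustrative sketch of the TextRank preprocessing step: sentences are ranked by PageRank over a word-overlap similarity graph, and the top-ranked sentences form the summary that would then be fed to the seq2seq title generator. The Jaccard similarity measure and the toy document are assumptions.

```python
# TextRank-style extractive summarization over a word-overlap graph.
import itertools
import networkx as nx

def overlap_similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / (len(wa | wb) or 1)

document = [
    "The city council met on Tuesday to discuss the new transit plan.",
    "The transit plan proposes three new bus routes downtown.",
    "Residents raised concerns about construction noise.",
    "The council will vote on the transit plan next month.",
]

graph = nx.Graph()
graph.add_nodes_from(range(len(document)))
for i, j in itertools.combinations(range(len(document)), 2):
    w = overlap_similarity(document[i], document[j])
    if w > 0:
        graph.add_edge(i, j, weight=w)

scores = nx.pagerank(graph, weight="weight")
top = sorted(scores, key=scores.get, reverse=True)[:2]
summary = [document[i] for i in sorted(top)]
print(summary)  # extractive summary passed on to the seq2seq model
```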

A Design Methodology for XML Applications (XML 응용시스템 개발을 위한 설계방안)

  • 김경수; 주경수
    • Proceedings of the IEEK Conference / 2000.06c / pp.39-42 / 2000
  • The Extensible Markup Language (XML) is fast emerging as the dominant standard for representing data on the World Wide Web. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. While there has been a great deal of recent activity proposing new semi-structured data models and query languages for this purpose, this paper explores the more conservative approach of using traditional relational database engines for processing XML documents that conform to Document Type Descriptors (DTDs). In this paper, we describe how to generate relational schemas from XML DTDs. The main issues that must be addressed include (a) dealing with the complexity of DTD element specifications, (b) resolving the conflict between the two-level nature of relational schemas (table and attribute) and the arbitrary nesting of XML DTD schemas, and (c) dealing with set-valued attributes and recursion. We propose a set of transformations that can be used to "simplify" any arbitrary DTD without undermining the effectiveness of queries over documents conforming to that DTD.
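
A toy sketch of the DTD-to-relational mapping idea: after a DTD has been "simplified", each element becomes a table, scalar children become columns, and set-valued children get a foreign key back to the parent table. The dictionary format describing the simplified DTD and the example schema are assumptions for illustration, not the paper's full algorithm.

```python
# Generate relational DDL from a (toy) simplified DTD description.
simplified_dtd = {
    "book": {"scalar": ["title", "year"], "setvalued": ["author"]},
    "author": {"scalar": ["name"], "setvalued": []},
}

def dtd_to_sql(dtd):
    creates, alters = [], []
    for element, spec in dtd.items():
        cols = [f"{element}_id INTEGER PRIMARY KEY"]
        cols += [f"{col} TEXT" for col in spec["scalar"]]
        creates.append(f"CREATE TABLE {element} ({', '.join(cols)});")
        # Set-valued children point back to the parent with a foreign key.
        for child in spec["setvalued"]:
            alters.append(
                f"ALTER TABLE {child} ADD COLUMN parent_{element}_id INTEGER "
                f"REFERENCES {element}({element}_id);"
            )
    return creates + alters

for stmt in dtd_to_sql(simplified_dtd):
    print(stmt)
```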

RELIABILITY ANALYSIS FOR THE TWO-PARAMETER PARETO DISTRIBUTION UNDER RECORD VALUES

  • Wang, Liang; Shi, Yimin; Chang, Ping
    • Journal of Applied Mathematics & Informatics / v.29 no.5_6 / pp.1435-1451 / 2011
  • In this paper, estimation of the parameters as well as the survival and hazard functions is presented for the two-parameter Pareto distribution, using Bayesian and non-Bayesian approaches under upper record values. Maximum likelihood estimates (MLE) and interval estimates are derived for the parameters. Bayes estimators of the reliability performances are obtained under symmetric (squared error) and asymmetric (Linex and general entropy (GE)) losses, when the two parameters have discrete and continuous priors, respectively. Finally, two numerical examples, one with a real data set and one with simulated data, are presented to illustrate the proposed method. An algorithm is introduced to generate record data, after which a simulation study is performed and the different estimation results are compared.
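
A small sketch of one way to generate upper record values from the two-parameter Pareto distribution: draw an i.i.d. stream and keep every observation that exceeds the current maximum. The parameter values are illustrative, and the shape-parameter estimate shown at the end is the standard i.i.d. MLE, not the record-value estimators derived in the paper.

```python
# Generate upper record values from a classical Pareto(alpha, sigma) stream.
import numpy as np

def pareto_upper_records(alpha, sigma, n_stream, seed=0):
    rng = np.random.default_rng(seed)
    # (pareto + 1) * sigma gives the classical Pareto with support x >= sigma.
    stream = sigma * (1.0 + rng.pareto(alpha, size=n_stream))
    records, current_max = [], -np.inf
    for x in stream:
        if x > current_max:          # a new upper record
            records.append(x)
            current_max = x
    return np.array(records)

records = pareto_upper_records(alpha=2.0, sigma=1.0, n_stream=10_000)
print(len(records), records[:5])

# For reference, the full-sample MLE of the shape parameter (sigma known):
full = 1.0 * (1.0 + np.random.default_rng(1).pareto(2.0, size=10_000))
alpha_hat = full.size / np.log(full / 1.0).sum()
print("alpha MLE from the i.i.d. sample:", alpha_hat)
```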