• Title/Summary/Keyword: Generated Data


Document Image Binarization by GAN with Unpaired Data Training

  • Dang, Quang-Vinh;Lee, Guee-Sang
    • International Journal of Contents / v.16 no.2 / pp.8-18 / 2020
  • Data is critical in deep learning, but data scarcity is common in research, especially in the preparation of paired training data. In this paper, document image binarization with unpaired data is studied by introducing adversarial learning, removing the need for supervised or labeled datasets. However, a simple extension of previous unpaired training to binarization inevitably performs poorly compared to paired-data training. Thus, a new deep learning approach is proposed that introduces a multi-diversity of higher-quality generated images. A two-stage model is proposed that comprises a generative adversarial network (GAN) followed by a U-net. In the first stage, the GAN uses the unpaired image data to create paired image data. In the second stage, the generated paired image data are passed through the U-net for binarization, so the trained U-net becomes the binarization model at test time. The proposed model has been evaluated on the publicly available DIBCO dataset, where it outperforms other techniques trained on unpaired data. The paper shows, for the first time in the literature, the potential of using unpaired data for binarization, which can be further improved to replace paired-data training in the future.
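The two-stage idea above (generate pseudo-pairs first, then train a supervised binarizer on them) can be sketched in miniature. Here a noise-injecting degrader stands in for the stage-1 GAN and a learned global threshold stands in for the stage-2 U-net; all function names and constants are illustrative, not from the paper.

```python
import random

# Miniature stand-ins: a noise-injecting "degrader" plays the role of the
# stage-1 GAN (it manufactures (degraded, clean) pseudo-pairs), and a learned
# global threshold plays the role of the stage-2 U-net.

def degrade(clean, rng):
    """Stage 1 stand-in: turn a clean 0/1 image into a noisy grey image."""
    return [[min(255, max(0, px * 255 + rng.randint(-60, 60))) for px in row]
            for row in clean]

def fit_threshold(pairs):
    """Stage 2 stand-in: pick the grey threshold that best reproduces the
    clean targets over the generated pairs."""
    best_t, best_err = 0, float("inf")
    for t in range(0, 256, 8):
        err = sum(int((px > t) != bool(target))
                  for noisy, clean in pairs
                  for nrow, crow in zip(noisy, clean)
                  for px, target in zip(nrow, crow))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def binarize(img, t):
    return [[1 if px > t else 0 for px in row] for row in img]

rng = random.Random(0)
clean_imgs = [[[rng.randint(0, 1) for _ in range(8)] for _ in range(8)]
              for _ in range(4)]
pairs = [(degrade(c, rng), c) for c in clean_imgs]  # stage 1: pseudo-pairs
t = fit_threshold(pairs)                            # stage 2: supervised fit
restored = binarize(pairs[0][0], t)                 # test time: stage 2 only
```

At test time only the stage-2 model is kept, mirroring how the trained U-net alone performs binarization in the paper.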

Design and Implementation of Security Technique in Electronic Signature System (전자결재 시스템에서 보안기법 설계 및 구현)

  • 유영모;강성수;김완규;송진국
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2001.10a / pp.491-498 / 2001
  • In this paper, we propose an encryption algorithm for security in data communication. The algorithm encrypts data after compressing it, in order to reduce transmission time and storage. An encryption key is generated from a parameter; as soon as the key value is generated, the parameter is transmitted, and the key is re-created after every 26 parameter changes. The random numbers that constitute the encryption key are stored in a table, and the table is reorganized after every 40 key generations in order to strengthen the security of the key. Data is encrypted by an operation on the generated key and the source data, and decryption performs the reverse operation after obtaining the decryption key by looking up the transmitted parameter. As the algorithm executes quickly, it is suitable for practical use.
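A minimal sketch of the scheme as described, with assumed details filled in: the transmitted parameter seeds a PRNG from which the receiver re-derives the same key, a repeating-key XOR stands in for the unspecified key/data operation, and compression precedes encryption to cut transmission size. The re-keying schedule (every 26 parameter changes) and the table reorganization (every 40 key generations) are omitted for brevity.

```python
import random
import zlib

KEY_LEN = 16  # assumed key length; the paper does not specify one

def derive_key(parameter):
    """Both sides re-create the same key from the transmitted parameter."""
    rng = random.Random(parameter)
    return bytes(rng.randrange(256) for _ in range(KEY_LEN))

def xor_stream(data, key):
    """Stand-in for the unspecified key/data operation: repeating-key XOR."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt(plaintext, parameter):
    # Compress first, then encrypt, as the abstract describes.
    return xor_stream(zlib.compress(plaintext), derive_key(parameter))

def decrypt(ciphertext, parameter):
    # Reverse operation: re-derive the key from the parameter, then decompress.
    return zlib.decompress(xor_stream(ciphertext, derive_key(parameter)))

msg = b"electronic approval document" * 10
ct = encrypt(msg, parameter=12345)
assert decrypt(ct, parameter=12345) == msg
assert len(ct) < len(msg)  # compression shrank the payload before encryption
```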

Evaluation of an Enhanced Weather Generation Tool for San Antonio Climate Station in Texas

  • Lee, Ju-Young
    • Water Engineering Research / v.5 no.1 / pp.47-54 / 2004
  • Several computer programs have been developed to generate stochastic weather data from observed daily data, but they require a complete dataset to run. Meteorological records frequently contain sporadic missing values as well as entirely missing variables. The modified WGEN presented here includes a data-filling algorithm for incomplete meteorological datasets, a function that other WGEN models lack. The algorithm is based on the Matalas equation for a first-order autoregressive process in a multidimensional state space with known cross- and auto-correlations among the state variables; the parameters of the equation are derived from the existing dataset and then used to fill the gaps. WGEN (Richardson and Wright, 1984) is one of the most widely used weather generators, but it needs modification and extension: it uses an exponential distribution to generate precipitation amounts, which is easy to work with but has not described observed precipitation well. In this paper, precipitation data generated by WGEN and by the modified WGEN were compared with the corresponding measured data using statistical parameters. The modified WGEN adopts the precipitation formula of CLIGEN from the USDA's WEPP (Water Erosion Prediction Project, 1985). Results for parameters other than precipitation are not presented here; they will be introduced in a forthcoming verification and review study.
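The gap-filling step can be illustrated in the simplest univariate case of the Matalas model, x_t = ρ·x_{t-1} + √(1−ρ²)·ε_t for a standardized series: estimate the lag-1 autocorrelation from the observed values, then fill each gap from its predecessor plus correlated noise. The multivariate form replaces the scalars with cross-/auto-correlation matrices; this toy version is not the paper's code.

```python
import math
import random

def lag1_autocorr(xs):
    """Lag-1 autocorrelation over consecutive pairs with no missing value."""
    pairs = [(a, b) for a, b in zip(xs, xs[1:]) if a is not None and b is not None]
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mb = sum(b for _, b in pairs) / n
    cov = sum((a - ma) * (b - mb) for a, b in pairs) / n
    va = sum((a - ma) ** 2 for a, _ in pairs) / n
    vb = sum((b - mb) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(va * vb)

def fill_gaps(xs, rng):
    """Fill each None as rho * previous + sqrt(1 - rho^2) * noise."""
    rho = lag1_autocorr(xs)
    out = list(xs)
    for i, v in enumerate(out):
        if v is None:
            prev = out[i - 1] if i else 0.0
            out[i] = rho * prev + math.sqrt(max(0.0, 1 - rho ** 2)) * rng.gauss(0, 1)
    return out

# Synthetic standardized AR(1) series with rho = 0.7 and one missing value.
rng = random.Random(1)
series = [rng.gauss(0, 1)]
for _ in range(199):
    series.append(0.7 * series[-1] + math.sqrt(1 - 0.49) * rng.gauss(0, 1))
series[50] = None  # sporadic missing observation
filled = fill_gaps(series, rng)
```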

Economic Valuation of Public Sector Data: A Case Study on Small Business Credit Guarantee Data (공공부문 데이터의 경제적 가치평가 연구: 소상공인 신용보증 데이터 사례)

  • Kim, Dong Sung;Kim, Jong Woo;Lee, Hong Joo;Kang, Man Su
    • Knowledge Management Research / v.18 no.1 / pp.67-81 / 2017
  • As important breakthroughs continue in machine learning and artificial intelligence, there has been growing interest in the analysis and utilization of the big data that underpins these fields. Against this background, while the economic value of data held by corporations and public institutions is well recognized, research on evaluating that economic value is still insufficient. Therefore, in this study, as part of the economic value evaluation of data, we measured the economic value of the data generated through the small business guarantee program of the Korean Federation of Credit Guarantee Foundations (KOREG). To this end, we reviewed previous domestic and international research on measuring the economic value of data and intangible assets, established evaluation methods, and conducted an empirical analysis. For the data value measurements, we used a cost-based approach, a revenue-based approach, and a market-based approach. To secure the reliability of the values measured by each approach, we conducted expert verification with employees. We also derived the major considerations and issues involved in measuring the economic value of data, which can contribute to empirical methods for such measurement in the future.
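The three approaches named above reduce to simple arithmetic, sketched below with entirely hypothetical figures (none taken from the KOREG case): cost-based sums what producing the data cost, revenue-based discounts expected future income, and market-based averages prices of comparable data.

```python
# All figures and the discount rate are hypothetical, for illustration only.

def cost_based(collection_cost, processing_cost, maintenance_cost):
    """Value = total cost of producing and maintaining the data."""
    return collection_cost + processing_cost + maintenance_cost

def revenue_based(annual_cash_flows, discount_rate):
    """Value = present value of the income the data is expected to generate."""
    return sum(cf / (1 + discount_rate) ** (t + 1)
               for t, cf in enumerate(annual_cash_flows))

def market_based(comparable_prices):
    """Value = average price of comparable data traded in the market."""
    return sum(comparable_prices) / len(comparable_prices)

v_cost = cost_based(120.0, 80.0, 40.0)
v_income = revenue_based([100.0, 100.0, 100.0], discount_rate=0.05)
v_market = market_based([200.0, 260.0, 230.0])
```

In practice the three estimates rarely coincide, which is why the study cross-checks them with expert verification.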

Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems / v.13 no.4 / pp.668-676 / 2017
  • Recently, the importance of big data has been emphasized with the spread of smartphones and web/SNS services. As a result, MapReduce, which can process big data efficiently, is receiving worldwide attention for its excellent scalability and stability. Since big data is large in volume, created quickly, and varied in its properties, it is more efficient to process summary information of big data than the data itself. The wavelet histogram, a typical technique for generating data summaries, can produce optimal summary information while preserving the information of the original data, so systems applying MapReduce-based wavelet histogram generation have been actively studied. However, existing approaches are slow because they generate the wavelet histogram through two or more MapReduce jobs, and the error of the data restored from the histogram can become large. The wavelet histogram generation system developed in this paper builds the histogram in a single MapReduce job, greatly increasing the generation speed. In addition, since the histogram is generated subject to a user-specified error bound, the error of the data restored from it can be controlled. Finally, we verified the efficiency of the developed system through performance evaluation.
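The wavelet-histogram idea itself, independent of the MapReduce plumbing, fits in a few lines: Haar-transform the frequency histogram, keep only the largest coefficients as the summary, and reconstruct an approximation whose error shrinks as more coefficients are kept. This is a generic Haar sketch, not the paper's single-job implementation.

```python
def haar_decompose(hist):
    """Full Haar transform: [overall average, coarse-to-fine detail coeffs]."""
    coeffs, cur = [], list(hist)
    while len(cur) > 1:
        avgs = [(cur[i] + cur[i + 1]) / 2 for i in range(0, len(cur), 2)]
        dets = [(cur[i] - cur[i + 1]) / 2 for i in range(0, len(cur), 2)]
        coeffs = dets + coeffs
        cur = avgs
    return cur + coeffs

def haar_reconstruct(coeffs):
    """Invert the transform level by level."""
    cur, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        dets = coeffs[pos:pos + len(cur)]
        pos += len(cur)
        cur = [v for a, d in zip(cur, dets) for v in (a + d, a - d)]
    return cur

def prune(coeffs, k):
    """Summary = the k largest-magnitude detail coeffs (plus the average)."""
    top = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    keep = set(top) | {0}
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]

hist = [12, 10, 9, 11, 2, 3, 30, 28]           # bucket frequencies (power of 2)
summary = prune(haar_decompose(hist), k=3)     # compact data summary
approx = haar_reconstruct(summary)             # restored (approximate) data
max_err = max(abs(a - b) for a, b in zip(hist, approx))
```

Raising `k` tightens `max_err`, which is the knob behind the user-specified error bound the abstract mentions.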

Open Internet of Things Data Service Management (공공 사물 데이터 서비스 구축 방안)

  • Chae, C.J.;Choe, J.;Lee, S.H.;Koo, H.J.
    • Journal of Practical Agriculture & Fisheries Research / v.23 no.1 / pp.105-115 / 2021
  • In this paper, we surveyed the service systems disclosed by the central government and self-governing provinces to analyze the status of IoT data services. The survey focused on data generated by objects such as sensors, on OpenAPIs through which the generated data can be used, and on data with an update cycle of less than one month. As a result, the ratio of IoT data to the data released by the government, self-governing provinces, and the private sector was only 1.2%. Therefore, to increase utilization as IoT technology develops, a dedicated organization that can manage IoT data services is needed.

Mitigating Data Imbalance in Credit Prediction using the Diffusion Model (Diffusion Model을 활용한 신용 예측 데이터 불균형 해결 기법)

  • Sangmin Oh;Juhong Lee
    • Smart Media Journal / v.13 no.2 / pp.9-15 / 2024
  • In this paper, a Diffusion Multi-step Classifier (DMC) is proposed to address the class imbalance in credit prediction. DMC uses a diffusion model to generate the continuous numerical features of credit prediction data and creates the categorical features through a multi-step classifier. Compared with other algorithms for generating synthetic data, DMC produces data whose distribution is closer to that of the real data. In experiments using the generated data, the probability of predicting delinquencies increased by over 20%, and overall predictive accuracy improved by approximately 4%. These findings are expected to contribute significantly to reducing delinquency rates and increasing profits when applied in actual financial institutions.
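DMC's internals are not given in the abstract, but diffusion-based tabular generators build on the standard DDPM forward process, sketched here: a clean numeric feature x₀ is progressively noised via the closed form x_t = √ᾱ_t·x₀ + √(1−ᾱ_t)·ε, and a model would then be trained to invert that process. The schedule and the feature value below are illustrative, not the paper's.

```python
import math
import random

# Linear beta schedule and closed-form q(x_t | x_0); both the schedule and
# the feature value are illustrative.
T = 500
betas = [1e-4 + (0.02 - 1e-4) * t / (T - 1) for t in range(T)]
alpha_bar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)  # cumulative product of (1 - beta_t)

def q_sample(x0, t, rng):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return math.sqrt(alpha_bar[t]) * x0 + math.sqrt(1 - alpha_bar[t]) * rng.gauss(0, 1)

rng = random.Random(0)
x0 = 1.5                                  # a standardized numeric feature
xT = [q_sample(x0, T - 1, rng) for _ in range(2000)]
mean_T = sum(xT) / len(xT)                # signal almost gone at t = T - 1
var_T = sum((x - mean_T) ** 2 for x in xT) / len(xT)   # variance near 1
```

By the final step the samples are close to standard Gaussian noise, which is what lets the learned reverse process generate new synthetic feature values from scratch.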

A Study on the Radioactive Products of Components in Proton Accelerator on Short Term Usage Using Computed Simulation (몬테칼로 시뮬레이션을 활용한 양성자가속기 단기사용 시 구성품의 방사화 평가)

  • Bae, Sang-Il;Kim, Jung-Hoon
    • Journal of radiological science and technology / v.43 no.5 / pp.389-395 / 2020
  • The evaluation of radioactivated components in accelerator facilities affects the safety of radiation management and the exposure dose of workers, and it is an important issue when predicting the disposal cost of waste during maintenance and dismantling of such facilities. In this study, the FLUKA code was used to simulate a proton-therapy nozzle and to classify the radionuclides and total radioactivity generated in each component over a short period of use. The source term was evaluated using NIST reference beam data, and the neutron flux generated in each component was calculated from the evaluated beam data. The radioactive isotopes produced by the generated neutrons were compared and evaluated using nuclide information from the International Radiation Protection Association and the Korea Radioisotope Association. Most of the nuclides produced decay by beta emission or electron capture, and short-lived nuclides dominated. However, in the case of 54Mn, an activation product of iron, the effect of gamma rays should be considered, and tritium generated in low-atomic-number materials requires careful handling due to its long half-life.
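The handling remark above comes down to simple half-life arithmetic, A(t) = A₀·2^(−t/T½). Using textbook half-lives (not outputs of the FLUKA run): after one year the gamma-emitting ⁵⁴Mn has lost over half its activity, while tritium has barely decayed.

```python
# Textbook half-lives in days (not outputs of the FLUKA simulation):
# Mn-54 ~ 312.2 days, H-3 (tritium) ~ 12.32 years.
HALF_LIFE_D = {"Mn-54": 312.2, "H-3": 12.32 * 365.25}

def remaining_fraction(nuclide, days):
    """A(t) / A0 = 2 ** (-t / half_life)."""
    return 0.5 ** (days / HALF_LIFE_D[nuclide])

mn_after_1y = remaining_fraction("Mn-54", 365.25)  # well under half remains
h3_after_1y = remaining_fraction("H-3", 365.25)    # almost all remains
```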

Point Cloud Generation Method Based on Lidar and Stereo Camera for Creating Virtual Space (가상공간 생성을 위한 라이다와 스테레오 카메라 기반 포인트 클라우드 생성 방안)

  • Lim, Yo Han;Jeong, In Hyeok;Lee, San Sung;Hwang, Sung Soo
    • Journal of Korea Multimedia Society / v.24 no.11 / pp.1518-1525 / 2021
  • With the growth of the VR industry and the rise of the digital twin industry, the importance of building 3D data identical to real spaces is increasing; however, doing so requires expert personnel and a great deal of time. In this paper, we propose a system that generates point cloud data with the same shape and color as a real space simply by scanning the space. The proposed system integrates 3D geometric information from a lidar and color information from a stereo camera into one point cloud. Since the number of 3D points generated by the lidar is not enough to express a real space with good quality, some pixels of the 2D image generated by the camera are mapped to their correct 3D coordinates to increase the number of points. Additionally, to minimize storage, overlapping points are filtered out so that only one point exists at the same 3D coordinates. Finally, the 6-DoF pose information generated from the lidar point cloud is replaced with the pose generated from the camera images to place the points more accurately. Experimental results show that the proposed system easily and quickly generates point clouds very similar to the scanned space.
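The two point-level steps described above, back-projecting camera pixels with known depth to densify the lidar cloud and keeping one point per cell to remove overlaps, can be sketched with an assumed pinhole model; the intrinsics, voxel size, and sample points are hypothetical.

```python
# Assumed pinhole intrinsics; focal lengths and principal point are made up.
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0

def pixel_to_point(u, v, depth, color):
    """Back-project one camera pixel with known depth into 3D."""
    x = (u - CX) * depth / FX
    y = (v - CY) * depth / FY
    return (x, y, depth, color)

def dedup(points, voxel=0.05):
    """Keep only the first point that lands in each 5 cm voxel."""
    seen = {}
    for p in points:
        key = tuple(int(c // voxel) for c in p[:3])
        seen.setdefault(key, p)
    return list(seen.values())

lidar_pts = [(0.10, 0.00, 2.0, (90, 90, 90))]              # sparse lidar point
cam_pts = [pixel_to_point(345, 240, 2.0, (200, 30, 30)),   # same spot as lidar
           pixel_to_point(600, 400, 2.0, (30, 200, 30))]   # new densifying point
cloud = dedup(lidar_pts + cam_pts)                         # one point per voxel
```

The first camera pixel lands in the same voxel as the lidar point and is dropped, while the second survives and densifies the cloud.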

Knowledge Transfer Using User-Generated Data within Real-Time Cloud Services

  • Zhang, Jing;Pan, Jianhan;Cai, Zhicheng;Li, Min;Cui, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.1 / pp.77-92 / 2020
  • When automatic speech recognition (ASR) is provided as a cloud service, it is easy to collect voice and application-domain data from users, and harnessing these data facilitates more personalized services. In this paper, we demonstrate a transfer learning-based knowledge service built with the user-generated data collected through our novel system, which delivers a personalized ASR service. First, we discuss the motivation, challenges, and prospects of building such a knowledge-based, service-oriented system. Second, we present a Quadruple Transfer Learning (QTL) method that can learn a classification model from a source domain and transfer it to a target domain. Third, we give an architectural overview of the system, which collects voice data from mobile users, labels the data via crowdsourcing, uses the collected user-generated data to train different machine learning models, and delivers personalized real-time cloud services. Finally, we use the E-Book data collected from our system to train classification models and apply them in the smart TV domain; the experimental results show that our QTL method is effective in two classification tasks, confirming that the knowledge transfer provides a value-added service for upper-layer mobile applications in different domains.
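QTL itself is not specified in the abstract, so the sketch below shows only the generic source-to-target transfer pattern such a service builds on: fit a simple nearest-centroid classifier on plentiful source-domain data, then adapt its centroids with a few labeled target-domain examples instead of training from scratch. All data here is synthetic.

```python
import random

def fit_centroids(data):
    """Per-class mean of the feature vectors (source-domain training)."""
    byclass = {}
    for x, y in data:
        byclass.setdefault(y, []).append(x)
    return {y: tuple(sum(c) / len(xs) for c in zip(*xs)) for y, xs in byclass.items()}

def adapt(cents, target_data, lr=0.5):
    """Transfer step: nudge each source centroid toward target examples."""
    new = dict(cents)
    for x, y in target_data:
        new[y] = tuple(ci + lr * (xi - ci) for ci, xi in zip(new[y], x))
    return new

def predict(cents, x):
    return min(cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(cents[y], x)))

rng = random.Random(0)
# Source domain: classes centered at (0, 0) and (1, 1); target shifted by +1.
source = [((rng.gauss(y, 0.3), rng.gauss(y, 0.3)), y)
          for y in (0, 1) for _ in range(50)]
target = [((rng.gauss(y + 1, 0.3), rng.gauss(y + 1, 0.3)), y)
          for y in (0, 1) for _ in range(5)]
cents = adapt(fit_centroids(source), target)  # reuse source model, adapt cheaply
pred = predict(cents, (1.0, 1.0))             # near the adapted class-0 centroid
```

Only a handful of labeled target examples are needed because most of the structure is carried over from the source domain, which is the economic argument for transfer in such services.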