• Title/Summary/Keyword: Generative Data Augmentation

MAGICal Synthesis: Memory-Efficient Approach for Generative Semiconductor Package Image Construction (MAGICal Synthesis: 반도체 패키지 이미지 생성을 위한 메모리 효율적 접근법)

  • Yunbin Chang;Wonyong Choi;Keejun Han
    • Journal of the Microelectronics and Packaging Society / v.30 no.4 / pp.69-78 / 2023
  • With the rapid growth of artificial intelligence, demand for semiconductors is increasing enormously everywhere. To ensure manufacturing quality and quantity simultaneously, the importance of automatic defect detection during the packaging process has been revisited by adapting various deep learning-based methodologies to automatic packaging defect inspection. Deep learning (DL) models require a large amount of data for training, but because security is important in the semiconductor industry, sharing and labeling the relevant data is challenging, which makes model training difficult. In this study, we propose a new framework for securing sufficient data for DL models with fewer computing resources through a divide-and-conquer approach. The proposed method divides high-resolution images into pre-defined sub-regions and assigns a conditional label to each region, then trains on the individual sub-regions and their boundaries with a boundary loss that induces globally coherent and seamless images. Afterwards, the full-size image is reconstructed by combining the divided sub-regions. The experimental results show that the images obtained through this research have high efficiency, consistency, quality, and generality.
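
The divide-and-recombine step described in this abstract can be illustrated with a minimal sketch. It covers only the splitting of an image into grid-labelled sub-regions and the lossless reassembly; the generative model and the boundary loss from the paper are not reproduced here. The function names, the tile sizes, and the use of the grid position as the conditional label are assumptions made for illustration.

```python
def split_into_tiles(image, tile_h, tile_w):
    """Split a 2D image (list of rows) into tiles keyed by (grid_row, grid_col)."""
    height, width = len(image), len(image[0])
    if height % tile_h or width % tile_w:
        raise ValueError("image size must be a multiple of the tile size")
    tiles = {}
    for r in range(0, height, tile_h):
        for c in range(0, width, tile_w):
            # Assumption: the grid position stands in for the conditional label
            # that the abstract assigns to each pre-defined sub-region.
            label = (r // tile_h, c // tile_w)
            tiles[label] = [row[c:c + tile_w] for row in image[r:r + tile_h]]
    return tiles


def reconstruct(tiles):
    """Reassemble the full-size image by placing each tile at its grid position."""
    n_rows = max(label[0] for label in tiles) + 1
    n_cols = max(label[1] for label in tiles) + 1
    image = []
    for grid_row in range(n_rows):
        tile_height = len(tiles[(grid_row, 0)])
        for y in range(tile_height):
            row = []
            for grid_col in range(n_cols):
                row.extend(tiles[(grid_row, grid_col)][y])
            image.append(row)
    return image


# Toy 4x8 "image" whose pixel values encode their own position.
image = [[y * 8 + x for x in range(8)] for y in range(4)]
tiles = split_into_tiles(image, tile_h=2, tile_w=4)
restored = reconstruct(tiles)
```

In this toy case `restored` equals `image`, which shows only that the split itself loses no information; keeping independently generated tiles seamless is the part the paper handles with its boundary loss.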

Generative Adversarial Network Model for Generating Yard Stowage Situation in Container Terminal (컨테이너 터미널의 야드 장치 상태 생성을 위한 생성적 적대 신경망 모형)

  • Jae-Young Shin;Yeong-Il Kim;Hyun-Jun Cho
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.383-384 / 2022
  • Following the development of technologies such as digital twins, IoT, and AI after the 4th industrial revolution, decision-making problems are increasingly being solved through high-dimensional data analysis. This approach has recently been applied to the port logistics sector, and a number of studies on big data analysis, deep learning prediction, and simulation have been conducted on container terminals to improve port productivity. These high-dimensional data analysis techniques generally require a large amount of data. However, the global port environment changed with the COVID-19 pandemic in 2020. Data from before the outbreak are not appropriate for the current port environment, and not enough data have been collected since the outbreak to support analyses such as deep learning. Therefore, this study presents a port data augmentation method for data analysis as one way to address this problem. To this end, we generate the container stowage situation of the yard with a generative adversarial network model from the perspective of container terminal operation, and verify similarity through statistical distribution verification between real and augmented data.
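
The abstract does not say which statistical test was used to compare real and augmented data. As one plausible illustration, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical cumulative distribution functions. The choice of this statistic, the function name, and the sample values (hypothetical container stack heights) are all assumptions.

```python
def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    i = j = 0
    max_gap = 0.0
    # Walk through every distinct observed value in ascending order and
    # compare the fraction of each sample that is <= that value.
    for value in sorted(set(real_sorted) | set(synth_sorted)):
        while i < n and real_sorted[i] <= value:
            i += 1
        while j < m and synth_sorted[j] <= value:
            j += 1
        max_gap = max(max_gap, abs(i / n - j / m))
    return max_gap


# Hypothetical stack heights per yard bay (illustrative values only).
real_heights = [1, 2, 3, 4]
augmented_heights = [3, 4, 5, 6]
gap = ks_statistic(real_heights, augmented_heights)
```

A value near 0 means the two samples have similar distributions, and a value near 1 means they barely overlap. Turning the statistic into a p-value would need an additional step that is not shown here.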

GAN-based research for high-resolution medical image generation (GAN 기반 고해상도 의료 영상 생성을 위한 연구)

  • Ko, Jae-Yeong;Cho, Baek-Hwan;Chung, Myung-Jin
    • Proceedings of the Korea Information Processing Society Conference / 2020.05a / pp.544-546 / 2020
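  • When conducting AI and machine learning research with medical data, commonly encountered problems include data imbalance and data scarcity; in particular, obtaining enough well-curated data is a major difficulty. To address this, this study aims to develop a GAN (Generative Adversarial Network)-based framework for generating high-resolution medical images. The gradients at each resolution scale are learned simultaneously so that high-resolution images can be generated quickly. We devised a neural network that generates high-resolution images and, through a performance comparison with PGGAN and Style-GAN, confirmed that the proposed model can generate high-quality, high-resolution medical images faster. The approach can be applied to research such as data augmentation and anomaly detection, which can address the data scarcity and data imbalance problems of medical images in AI and machine learning research.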

Semi-Supervised Data Augmentation Method for Korean Fact Verification Using Generative Language Models (자연어 생성 모델을 이용한 준지도 학습 기반 한국어 사실 확인 자료 구축)

  • Jeong, Jae-Hwan;Jeon, Dong-Hyeon;Kim, Seon-Hun;Gang, In-Ho
    • Annual Conference on Human and Language Technology / 2021.10a / pp.105-111 / 2021
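  • Research on Korean fact verification has been hampered by the absence of training data. This paper proposes a method for constructing Korean fact verification data with natural language generation models, starting from manually built training data. We experimented with both a method that generates claims from arbitrary evidence (E2C) and a method that generates evidence from arbitrary claims (C2E). Fact verification classifiers trained with each of these two generated datasets added to the existing training data achieved higher performance on the evaluation data than classifiers built on the existing training data alone or on training data obtained by machine-translating the English fact verification dataset FEVER into Korean. In addition, the C2E method achieved high performance using only existing natural language inference data and HyperCLOVA few-shot examples, without manually built data, which confirms the possibility of constructing fact verification data in an unsupervised manner.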

Analysis of Malware Image Data Augmentation based on GAN (GAN 기반의 악성코드 이미지 데이터 증강 분석)

  • Won-Jun Lee;ChangHoon Kang;Ah Reum Kang
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.99-100 / 2024
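  • The existence of many variants and attacks that exploit little-known vulnerabilities are factors that make malware collection difficult. Some studies have used generative models to augment image-based malware data in order to compensate for the shortage of malware samples. However, no analysis has been carried out on whether generative models can actually generate real malware. In this study, we used a VGG-11 model to perform binary classification of real and generated malware images. The experimental results show that the VGG-11 model distinguishes the two kinds of images with 99.9% accuracy.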

Analysis and Forecast of Venture Capital Investment on Generative AI Startups: Focusing on the U.S. and South Korea (생성 AI 스타트업에 대한 벤처투자 분석과 예측: 미국과 한국을 중심으로)

  • Lee, Seungah;Jung, Taehyun
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.18 no.4 / pp.21-35 / 2023
  • Expectations surrounding generative AI technology and its profound ramifications are sweeping across various industrial domains. Given the anticipated pivotal role of the startup ecosystem in the utilization and advancement of generative AI technology, it is imperative to cultivate a deeper comprehension of the present state and distinctive attributes characterizing venture capital (VC) investments within this domain. The current investigation delves into South Korea's landscape of VC investment deals and prognosticates the projected VC investments by juxtaposing these against the United States, the frontrunner in the generative AI industry and its associated ecosystem. For analytical purposes, a compilation of 286 investment deals originating from 117 U.S. generative AI startups spanning the period from 2008 to 2023, as well as 144 investment deals from 42 South Korean generative AI startups covering the years 2011 to 2023, was amassed to construct new datasets. The outcomes of this endeavor reveal an upward trajectory in the count of VC investment deals within both the U.S. and South Korea during recent years. Predominantly, these deals have been concentrated within the early-stage investment realm. Noteworthy disparities between the two nations have also come to light. Specifically, in the U.S., in contrast to South Korea, the quantum of recent VC deals has escalated, marking an augmentation ranging from 285% to 488% in the corresponding developmental stage. While the interval between disparate investment stages demonstrated a slight elongation in South Korea relative to the U.S., this discrepancy did not achieve statistical significance. Furthermore, the proportion of VC investments channeled into generative AI enterprises, relative to the aggregate number of deals, exhibited a higher quotient in South Korea compared to the U.S. 
Upon a comprehensive sectoral breakdown of generative AI, it was discerned that within the U.S., 59.2% of total deals were concentrated in the text and model sectors, whereas in South Korea, 61.9% of deals centered around the video, image, and chat sectors. Through forecasting, the anticipated VC investments in South Korea from 2023 to 2029 were derived via four distinct models, culminating in an estimated average requirement of 3.4 trillion Korean won (ranging from at least 2.408 trillion won to a maximum of 5.919 trillion won). This research bears pragmatic significance as it methodically dissects VC investments within the generative AI domain across both the U.S. and South Korea, culminating in the presentation of an estimated VC investment projection for the latter. Furthermore, its academic significance lies in laying the groundwork for prospective scholarly inquiries by dissecting the current landscape of generative AI VC investments, a sphere that has hitherto remained void of rigorous academic investigation supported by empirical data. Additionally, the study introduces two innovative methodologies for the prediction of VC investment sums. Upon broader integration, application, and refinement of these methodologies within diverse academic explorations, they stand poised to enhance the prognosticative capacity pertaining to VC investment costs.

An Efficient Wireless Signal Classification Based on Data Augmentation (데이터 증강 기반 효율적인 무선 신호 분류 연구 )

  • Sangsoon Lim
    • Journal of Platform Technology / v.10 no.4 / pp.47-55 / 2022
  • Recently, the number of diverse devices using different wireless technologies has been gradually increasing in the IoT environment. In particular, it is essential to design an efficient feature extraction approach and detect the exact types of radio signals in order to accurately identify various radio signal modulation techniques. However, it is difficult to gather labeled wireless signals in a real environment due to the complexity of the process. In addition, various deep learning-based techniques have been proposed for wireless signal classification. With deep learning, if the training dataset is insufficient, the model frequently overfits, which degrades the performance of wireless signal classification techniques that use deep learning models. In this paper, we propose a generative adversarial network (GAN)-based data augmentation technique to improve classification performance when various wireless signals exist. When there are various types of wireless signals to be classified, and the amount of data representing a specific radio signal is small or unbalanced, the proposed solution is used to increase the amount of data for the required wireless signal. To verify the validity of the proposed data augmentation algorithm, we generated additional data for the specific wireless signal and implemented CNN- and LSTM-based wireless signal classifiers on the balanced data. The experimental results show that the classification accuracy of the proposed solution is higher than when the data is unbalanced.
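
The balancing step in this abstract can be made concrete with a small sketch. It computes how many synthetic samples a generator would need to produce per class so that every class matches the largest one. The abstract does not state the balancing target, so matching the majority class is an assumption, and the function name and modulation labels are hypothetical.

```python
from collections import Counter


def synthetic_samples_needed(labels):
    """Return, per class, how many generated samples would balance the dataset."""
    counts = Counter(labels)
    # Assumption: every class is topped up to the size of the largest class.
    target = max(counts.values())
    return {label: target - count for label, count in counts.items()}


# Hypothetical modulation labels for an unbalanced training set.
labels = ["BPSK"] * 5 + ["QPSK"] * 2 + ["16QAM"] * 3
deficit = synthetic_samples_needed(labels)
```

Here `deficit` maps each label to the number of samples to generate, with 0 for the majority class. The GAN that would produce those samples is outside the scope of this sketch.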

Waste Classification by Fine-Tuning Pre-trained CNN and GAN

  • Alsabei, Amani;Alsayed, Ashwaq;Alzahrani, Manar;Al-Shareef, Sarah
    • International Journal of Computer Science & Network Security / v.21 no.8 / pp.65-70 / 2021
  • Waste accumulation is becoming a significant challenge in most urban areas and, if it continues unchecked, is poised to have severe repercussions on our environment and health. The massive industrialisation in our cities has been followed by a commensurate increase in waste that has become a bottleneck even for waste management systems. While recycling is a viable solution for waste management, accurately classifying waste material for recycling can be daunting. In this study, transfer learning models were proposed to automatically classify waste into six materials (cardboard, glass, metal, paper, plastic, and trash). The tested pre-trained models were ResNet50, VGG16, InceptionV3, and Xception. Data augmentation was done using a Generative Adversarial Network (GAN) with various image generation percentages. It was found that models based on Xception and VGG16 were more robust. In contrast, models based on ResNet50 and InceptionV3 were sensitive to the added machine-generated images, as their accuracy degraded significantly compared to training with no artificial data.

Enhancing CT Image Quality Using Conditional Generative Adversarial Networks for Applying Post-mortem Computed Tomography in Forensic Pathology: A Phantom Study (사후전산화단층촬영의 법의병리학 분야 활용을 위한 조건부 적대적 생성 신경망을 이용한 CT 영상의 해상도 개선: 팬텀 연구)

  • Yebin Yoon;Jinhaeng Heo;Yeji Kim;Hyejin Jo;Yongsu Yoon
    • Journal of radiological science and technology / v.46 no.4 / pp.315-323 / 2023
  • Post-mortem computed tomography (PMCT) is commonly employed in the field of forensic pathology. PMCT has mainly been performed using a whole-body scan with a wide field of view (FOV), which leads to a decrease in spatial resolution due to the increased pixel size. This study aims to evaluate the potential for developing a super-resolution model based on conditional generative adversarial networks (CGAN) to enhance the image quality of CT. 1761 low-resolution images were obtained using a whole-body scan with a wide FOV of the head phantom, and 341 high-resolution images were obtained using the appropriate FOV for the head phantom. The 150 paired images in the total dataset were divided into a training set (96 paired images) and a validation set (54 paired images). Data augmentation was performed to improve the effectiveness of training by implementing rotations and flips. To evaluate the performance of the proposed model, we used the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Deep Image Structure and Texture Similarity (DISTS). We obtained the PSNR, SSIM, and DISTS values for the entire image and for the medial orbital wall, the zygomatic arch, and the temporal bone, where fractures often occur during head trauma. The proposed method demonstrated improvements in PSNR of 13.14%, SSIM of 13.10%, and DISTS of 45.45% compared to low-resolution images. The image quality of the three areas where fractures commonly occur during head trauma also improved compared to low-resolution images.
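
Of the three metrics named in this abstract, PSNR has a simple closed form: 10 · log10(MAX² / MSE). The sketch below computes it for small images stored as lists of rows, together with a relative percentage change of the kind the abstract reports. The default peak value of 255 (8-bit images), the function names, and the toy pixel values are assumptions; the abstract does not state the bit depth used or exactly how its percentages were derived.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized 2D images."""
    flat_ref = [pixel for row in reference for pixel in row]
    flat_test = [pixel for row in test for pixel in row]
    if len(flat_ref) != len(flat_test):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_test)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    # Assumption: max_value is the peak pixel value (255 for 8-bit data).
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_change_percent(before, after):
    """Percentage change of a metric relative to its baseline value."""
    return (after - before) / before * 100.0


# Toy values for illustration only (not data from the study).
reference = [[100, 100]]
degraded = [[110, 90]]
score = psnr(reference, degraded)
```

For a higher-is-better metric such as PSNR or SSIM, a positive relative change indicates improvement. DISTS is a distance, so lower values are better and the sign convention would need to be flipped.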

Generating Sponsored Blog Texts through Fine-Tuning of Korean LLMs (한국어 언어모델 파인튜닝을 통한 협찬 블로그 텍스트 생성)

  • Bo Kyeong Kim;Jae Yeon Byun;Kyung-Ae Cha
    • Journal of Korea Society of Industrial Information Systems / v.29 no.3 / pp.1-12 / 2024
  • In this paper, we fine-tuned KoAlpaca, a large-scale Korean language model, and implemented a blog text generation system based on it. Blogs on social media platforms are widely used as a marketing tool for businesses. We constructed training data of positive reviews through emotion analysis and refinement of collected sponsored blog texts, and applied QLoRA for the lightweight training of KoAlpaca. QLoRA is a fine-tuning approach that significantly reduces the memory usage required for training; in experiments with a 12.8B-parameter model, it showed up to a 58.8% decrease in memory usage compared to LoRA. To evaluate the generative performance of the fine-tuned model, we generated texts from 100 inputs not included in the training data; these contained on average more than twice as many words as those of the pre-trained model, and texts with positive sentiment appeared more than twice as often. In a survey conducted for qualitative evaluation of generative performance, respondents judged the fine-tuned model's outputs to be more relevant to the given topics 77.5% of the time on average. This demonstrates that the positive review generation language model for sponsored content in this paper can improve the time efficiency of content creation and ensure consistent marketing effects. However, to reduce the generation of content that deviates from the category of positive reviews due to elements of the pre-trained model, we plan to continue fine-tuning with augmented training data.