• Title/Abstract/Keywords: Dataset Generation


Counterfactual image generation by disentangling data attributes with deep generative models

  • Jieon Lim;Weonyoung Joo
    • Communications for Statistical Applications and Methods / Vol. 30, No. 6 / pp.589-603 / 2023
  • Deep generative models aim to infer the true underlying data distribution, which has led to great success in generating fake-but-realistic data. From this perspective, data attributes can be a crucial factor in the generation process, since nonexistent counterfactual samples can be generated by altering certain factors. For example, new portrait images can be generated by flipping the gender attribute or altering the hair color attribute. This paper proposes the counterfactual disentangled variational autoencoder generative adversarial network (CDVAE-GAN), specialized for attribute-level counterfactual data generation. The proposed CDVAE-GAN combines a variational autoencoder with a generative adversarial network. Specifically, a Gaussian variational autoencoder extracts low-dimensional disentangled data features, and auxiliary Bernoulli latent variables model the data attributes separately. A generative adversarial network is used to generate high-fidelity data. By combining the variational autoencoder, the additional Bernoulli latent variables, and the generative adversarial network, CDVAE-GAN can control the data attributes, which enables it to produce counterfactual data. Experimental results on the CelebA dataset show qualitatively that the samples generated by CDVAE-GAN are realistic. The quantitative results further indicate that the model can produce data whose altered attributes deceive other machine learning classifiers.
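
A minimal sketch of the attribute-flipping idea described in this abstract, assuming a decoder that takes continuous latent features and a binary attribute vector as separate inputs. The `toy_decoder`, its weight matrices, and all function names are hypothetical stand-ins for illustration, not the CDVAE-GAN architecture from the paper:

```python
def flip_attribute(attributes, index):
    """Return a copy of a binary attribute vector with one attribute flipped."""
    flipped = list(attributes)
    flipped[index] = 1 - flipped[index]
    return flipped


def toy_decoder(z, attributes, w_z, w_a):
    """Hypothetical linear decoder (an assumption, not the paper's network).

    output[i] = sum_j w_z[i][j] * z[j] + sum_k w_a[i][k] * attributes[k]
    """
    return [
        sum(wz * zj for wz, zj in zip(row_z, z))
        + sum(wa * ak for wa, ak in zip(row_a, attributes))
        for row_z, row_a in zip(w_z, w_a)
    ]


def counterfactual(z, attributes, index, w_z, w_a):
    """Decode the same latent features z with one attribute flipped."""
    return toy_decoder(z, flip_attribute(attributes, index), w_z, w_a)
```

Because the continuous features `z` are held fixed and only one binary attribute changes, any difference between the two decoded outputs is attributable to that attribute, which is the sense in which the sample is counterfactual.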

A New Dataset for Ethical Dialogue Generation in Multi-Turn Conversations

  • 장빈;김서현;박규병
    • Korea Information Processing Society: Conference Proceedings / Korea Information Processing Society 2022 Fall Conference / pp.446-448 / 2022
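  • Unlike past attempts to suppress unethical utterances with a separate classification model, this study experiments with instilling ethics at the utterance generation stage by adding data. We introduce a new dialogue dataset composed of multi-turn unethical attacks that are difficult for classification models to detect, and report the defense performance of a chatbot dialogue model improved with this dataset.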

Video Retrieval Algorithm for Building a Dataset for Highlight Video Generation

  • 송기연;이재환
    • Korea Information Processing Society: Conference Proceedings / Korea Information Processing Society 2024 Spring Conference / pp.517-518 / 2024
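  • This study proposes an algorithm that identifies which video a given video clip was extracted from. We collected match videos and highlight videos from the LCK, one of Korea's esports leagues, to test the algorithm's performance. We expect the proposed algorithm to help build the video-highlight clip dataset needed to develop highlight video extraction models.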

Generation Comparison of the Factors Affecting Life Satisfaction of One-person Households

  • 김미숙;김안나
    • The Journal of Korean Society for School & Community Health Education / Vol. 21, No. 1 / pp.15-31 / 2020
  • Background and objectives: One-person households are the fastest-growing family type in Korea. They raise social concerns such as weakened social integration, increasing poverty, and social isolation. They are not a homogeneous group but differ by socio-demographic characteristics, including generation. This study compared the level of life satisfaction and the factors affecting it across three generations of one-person households. Methods: The 13th wave of the Korea Welfare Panel dataset, with 1,187 respondents, was used. The chi-square test, analysis of variance, and hierarchical regression analysis were employed. Generations were divided into three groups: young adults (20-39), the middle-aged (40-64), and the elderly (65 and over). Results: Life satisfaction was highest among young adult one-person households, followed by the middle-aged and the elderly. The factors common to all three generations were physical and mental health, including self-esteem and depression. However, more factors differed from generation to generation. For young adults, age, religion, and smoking were significant. For the middle-aged and the elderly, gender (male) and income were significant. Additionally, age, home ownership, and drinking were significant for the elderly. Conclusions: Because the three generations show differences as well as similarities, policies for one-person households should be devised with these findings in mind. All generations need both physical and mental health policies. For young adults, the major agenda items are strengthening social relations, providing decent jobs, and promoting anti-smoking policy; for the middle-aged and the elderly, they are assisting social capital accumulation (for men), providing stable jobs and diverse leisure activities, and securing income. Additionally, for the elderly, an expanded social security system and housing support are needed.
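
The three-way generation split stated in the Methods can be written as a small helper. This is only an illustrative sketch: the function name and the choice to return `None` for ages below 20 (which the abstract does not cover) are assumptions, not part of the study:

```python
def generation_group(age):
    """Map an age in years to the generation labels used in the abstract.

    Returns None for ages under 20, which the abstract does not classify
    (an assumption made for this sketch).
    """
    if age < 20:
        return None
    if age <= 39:
        return "young adult"
    if age <= 64:
        return "middle-aged"
    return "elderly"
```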

Topographic Information Extraction from Kompsat Satellite Stereo Data Using SGM

  • Jang, Yeong Jae;Lee, Jae Wang;Oh, Jae Hong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / Vol. 37, No. 5 / pp.315-322 / 2019
  • A DSM (Digital Surface Model) is a digital representation of ground surface topography or terrain that is widely used for hydrology, slope analysis, and urban planning. Aerial photogrammetry and LiDAR (Light Detection And Ranging) are the main technologies for urban DSM generation, but high-resolution satellite imagery is the only data source for remote, inaccessible areas. Traditional automated DSM generation relies on correlation-based matching, but recent work shows that a modern pixelwise image matching method, SGM (Semi-Global Matching), can be an alternative. This study therefore investigated the application of SGM to Kompsat satellite data from KARI (Korea Aerospace Research Institute). First, sensor modeling was carried out for precise ground-to-image computation, followed by epipolar image resampling for efficient stereo processing. Second, SGM was applied with different parameterizations. The generated DSM was evaluated against a reference DSM generated from the first pulse returns of a LiDAR reference dataset.
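
To make the SGM step more concrete, the sketch below shows the standard SGM cost-aggregation recurrence along a single scanline direction, with a small penalty `p1` for one-step disparity changes and a larger penalty `p2` for bigger jumps. It is a simplified illustration: full SGM sums such aggregated costs over several path directions, and the function names, the single direction, and the toy cost layout are assumptions rather than the configuration used in the study:

```python
def aggregate_path(cost, p1, p2):
    """Aggregate matching costs along one scanline, left to right.

    cost[x][d] is the matching cost of pixel x at disparity d.
    Implements L(x, d) = C(x, d) + min(L(x-1, d),
                                       L(x-1, d-1) + p1,
                                       L(x-1, d+1) + p1,
                                       min_k L(x-1, k) + p2) - min_k L(x-1, k).
    """
    n_disp = len(cost[0])
    aggregated = [list(cost[0])]
    for x in range(1, len(cost)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < n_disp:
                candidates.append(prev[d + 1] + p1)
            # Subtracting prev_min keeps the aggregated values bounded.
            row.append(cost[x][d] + min(candidates) - prev_min)
        aggregated.append(row)
    return aggregated


def winner_takes_all(aggregated):
    """Pick, for each pixel, the disparity with the lowest aggregated cost."""
    return [min(range(len(row)), key=row.__getitem__) for row in aggregated]
```

The "different parameterizations" mentioned in the abstract would correspond, in this sketch, to varying penalties such as `p1` and `p2`; which parameters the study actually varied is not stated in the abstract.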

Multi-layered attentional peephole convolutional LSTM for abstractive text summarization

  • Rahman, Md. Motiur;Siddiqui, Fazlul Hasan
    • ETRI Journal / Vol. 43, No. 2 / pp.288-298 / 2021
  • Abstractive text summarization produces a summary of a given text by paraphrasing its facts while keeping the meaning intact. Writing summaries manually is laborious and time-consuming. We present a summary generation model based on a multilayered attentional peephole convolutional long short-term memory (LSTM) network, MAPCoL, to generate abstractive summaries of large texts automatically. We added attention to a peephole convolutional LSTM to improve the overall quality of a summary by giving weights to important parts of the source text during training. We evaluated the semantic coherence of MAPCoL on the popular CNN/Daily Mail dataset and found that it outperformed other traditional LSTM-based models. We also found improvements in the performance of MAPCoL under different internal settings when compared with state-of-the-art abstractive text summarization models.
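
The idea of "giving weights to important parts of the source text" can be illustrated with a generic attention step. The sketch below assumes dot-product scoring followed by a softmax; the abstract does not specify MAPCoL's scoring function, so this is a common textbook form rather than the paper's mechanism, and all names are hypothetical:

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, states):
    """Dot-product attention over a list of encoder state vectors.

    Returns the attention weights and the weighted sum of the states.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(dim)
    ]
    return weights, context
```

States that score higher against the query receive larger weights, so they contribute more to the context vector used when generating the next summary token.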

Computer Science Research Ideas Generation Using Neural Networks

  • Maghraby, Ashwag;Assaeed, Joanna
    • International Journal of Computer Science & Network Security / Vol. 22, No. 6 / pp.127-130 / 2022
  • The number of published journals, conferences, and research papers in computer science is increasing rapidly, which has led to a challenge in coming up with new and unique ideas for research. To alleviate the issue, this paper uses artificial neural networks (ANNs) to generate new computer science research ideas. It does so by using a dataset collected from IEEE published journals and conferences to train an ANN model. The results reveal that the model has a 14% success rate in generating usable ideas. The outcome of this paper has implications for helping both new and experienced researchers come up with novel research topics.

Automatic Generation System of Mathematical Learning Tools Using Pretrained Models

  • 노명성
    • Korean Society of Computer Information: Conference Proceedings / Korean Society of Computer Information 2023 68th Summer Conference Proceedings, Vol. 31, No. 2 / pp.713-714 / 2023
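  • This paper proposes a system that uses pretrained models to automatically generate mathematics learning tools. The system uses a pretrained model to generate learning tools automatically, diversified by curriculum, unit, and problem type, and fine-tunes a pretrained model on a self-built dataset to classify learning tools as appropriate or inappropriate for students, thereby raising their quality. The system lays a foundation for providing students with high-quality mathematics learning tools in large quantities, and opens the possibility of future convergence research with AI textbooks.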


Anomaly Detection of Generative Adversarial Networks considering Quality and Distortion of Images

  • 서태문;강민국;강동중
    • The Journal of the Institute of Internet, Broadcasting and Communication / Vol. 20, No. 3 / pp.171-179 / 2020
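  • According to recent research, convolutional neural networks achieve state-of-the-art performance on problems such as image classification, object detection, and image generation. Defect inspection with vision cameras is more economical than other inspection methods, which makes it very important for factory automation, and supervised deep learning has far surpassed the defect inspection performance of traditional machine learning. However, supervised deep learning requires an enormous amount of data annotation, so applying it at real industrial sites is not efficient. This study therefore proposes a neural network architecture for unsupervised anomaly detection that uses variational autoencoders and generative adversarial networks, which have recently shown great success in image generation tasks, and verifies its anomaly detection performance on MNIST and weld defect data.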

Temporal Search Algorithm for Multiple-Pedestrian Tracking

  • Yu, Hye-Yeon;Kim, Young-Nam;Kim, Moon-Hyun
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 5 / pp.2310-2325 / 2016
  • In this paper, we provide a trajectory-generation algorithm that can identify pedestrians in real time. Typically, the contours for the extraction of pedestrians from the foreground of images are not clear due to factors including brightness and shade; furthermore, pedestrians move in different directions and interact with each other. These issues mean that the identification of pedestrians and the generation of trajectories are somewhat difficult. We propose a new method for trajectory generation regarding multiple pedestrians. The first stage of the method distinguishes between those pedestrian-blob situations that need to be merged and those that require splitting, followed by the use of trained decision trees to separate the pedestrians. The second stage generates the trajectories of each pedestrian by using the point-correspondence method; however, we introduce a new point-correspondence algorithm for which the A* search method has been modified. By using fuzzy membership functions, a heuristic evaluation of the correspondence between the blobs was also conducted. The proposed method was implemented and tested with the PETS 2009 dataset to show an effective multiple-pedestrian-tracking capability in a pedestrian-interaction environment.
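
As a rough illustration of how fuzzy membership functions can score the correspondence between two blobs, the sketch below combines a position-closeness membership with a size-similarity membership using the fuzzy AND (minimum). The blob representation `(x, y, area)`, the linear membership shape, and all names are assumptions made for this example; the abstract does not describe the paper's actual membership functions or how they feed the modified A* search:

```python
import math


def closeness(distance, max_distance):
    """Linear fuzzy membership: 1 at distance 0, 0 at or beyond max_distance."""
    if max_distance <= 0:
        raise ValueError("max_distance must be positive")
    return max(0.0, 1.0 - distance / max_distance)


def size_similarity(area_a, area_b):
    """Ratio of the smaller blob area to the larger one, in (0, 1]."""
    return min(area_a, area_b) / max(area_a, area_b)


def correspondence_score(blob_a, blob_b, max_distance):
    """Fuzzy AND (minimum) of position closeness and size similarity.

    Each blob is a hypothetical (x, y, area) tuple.
    """
    distance = math.hypot(blob_a[0] - blob_b[0], blob_a[1] - blob_b[1])
    return min(
        closeness(distance, max_distance),
        size_similarity(blob_a[2], blob_b[2]),
    )
```

A score like this could serve as a heuristic when ranking candidate blob pairings between consecutive frames, with higher values indicating a more plausible match.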