Title/Summary/Keyword: Video Synthesis Network


Style Synthesis of Speech Videos Through Generative Adversarial Neural Networks (적대적 생성 신경망을 통한 얼굴 비디오 스타일 합성 연구)

  • Choi, Hee Jo;Park, Goo Man
    • KIPS Transactions on Software and Data Engineering / v.11 no.11 / pp.465-472 / 2022
  • In this paper, a style synthesis network based on StyleGAN is trained together with a video synthesis network to generate style-synthesized video. To address the problem that gaze and expression do not transfer stably, 3D face reconstruction is applied so that key attributes of the head, such as pose, gaze, and expression, can be controlled from 3D face information. In addition, by training the Head2Head network's discriminators for dynamics, mouth shape, image, and gaze, a stable style-synthesized video that better maintains plausibility and consistency can be generated. Using the FaceForensics and MetFaces datasets, it was confirmed that performance improved: one video is converted into another while the consistent motion of the target face is preserved, and natural results are generated by synthesizing video with 3D face information extracted from the source video's face.
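The style synthesis above builds on StyleGAN-style latent manipulation. As a rough, hedged illustration, the sketch below shows classic StyleGAN "style mixing" in the W+ space: coarse layers keep the source frame's pose and expression while fine layers take the target style. The generator, the 18×512 W+ layout, and all variable names are assumptions typical of StyleGAN2, not details from the paper.

```python
# A minimal style-mixing sketch: coarse W+ layers come from the source video
# frame (pose/expression), fine layers from the style image (texture/color).
import torch

def mix_styles(w_source: torch.Tensor, w_style: torch.Tensor,
               crossover: int = 8) -> torch.Tensor:
    """Combine two W+ codes of shape (num_layers, 512).

    Layers [0, crossover) follow the source (head pose, expression);
    layers [crossover, num_layers) follow the style image (fine detail).
    """
    assert w_source.shape == w_style.shape
    w_mixed = w_style.clone()
    w_mixed[:crossover] = w_source[:crossover]
    return w_mixed

# Toy usage: random codes stand in for real inversions of a video frame
# and a painting (e.g., from MetFaces).
w_video_frame = torch.randn(18, 512)  # hypothetical W+ code of a video frame
w_painting = torch.randn(18, 512)     # hypothetical W+ code of a style image
w_out = mix_styles(w_video_frame, w_painting, crossover=8)
print(w_out.shape)  # torch.Size([18, 512]) -- fed to the generator per frame
```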

5D Light Field Synthesis from a Monocular Video (단안 비디오로부터의 5차원 라이트필드 비디오 합성)

  • Bae, Kyuho;Ivan, Andre;Park, In Kyu
    • Journal of Broadcast Engineering / v.24 no.5 / pp.755-764 / 2019
  • Commercially available light field cameras make it difficult to acquire 5D light field video, since they either capture only still images or are prohibitively expensive. To address this, we propose a deep learning based method for synthesizing light field video from a monocular video. To overcome the lack of light field video training data, we use UnrealCV to acquire synthetic light field data by realistic rendering of 3D graphics scenes and use it for training. The proposed deep learning framework synthesizes light field video with a 9×9 grid of sub-aperture images (SAIs) from the input monocular video. The network consists of a network that predicts the appearance flow from the input image converted to a luminance image, and a network that predicts the optical flow between adjacent light field video frames obtained from the appearance flow.
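To make the appearance-flow step concrete, here is a minimal, hedged sketch of how a predicted flow field can backward-warp the input view into one sub-aperture image using `torch.nn.functional.grid_sample`. The flow network itself is replaced by a random tensor; shapes and names are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def warp_by_flow(image: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `image` (N,C,H,W) by `flow` (N,2,H,W), in pixels."""
    n, _, h, w = image.shape
    # Base sampling grid in pixel coordinates, channel 0 = x, channel 1 = y.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(n, -1, -1, -1)
    coords = grid + flow
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    norm_grid = torch.stack((coords_x, coords_y), dim=-1)  # (N,H,W,2)
    return F.grid_sample(image, norm_grid, align_corners=True)

center_view = torch.rand(1, 3, 128, 128)      # input monocular frame
flow_to_sai = torch.randn(1, 2, 128, 128)     # stand-in for the network output
sai = warp_by_flow(center_view, flow_to_sai)  # one SAI of the 9x9 grid
print(sai.shape)
```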

Study on the Video Stabilizer based on a Triplet CNN and Training Dataset Synthesis (Triplet CNN과 학습 데이터 합성 기반 비디오 안정화기 연구)

  • Yang, Byongho;Lee, Myeong-jin
    • Journal of Broadcast Engineering / v.25 no.3 / pp.428-438 / 2020
  • Jitter in digital video lowers visibility and degrades the efficiency of image processing and compression. In this paper, we propose a video stabilizer architecture based on a triplet CNN, together with a method for synthesizing training datasets via video synthesis. Compared with a conventional deep-learning video stabilization method, the proposed stabilizer reduces wobbling distortion.
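The dataset-synthesis idea can be illustrated with a short sketch: starting from stable frames, apply a smooth random jitter trajectory to produce an "unstable" copy, yielding (unstable, stable) training pairs. The random-walk translation-plus-rotation jitter model below is our assumption, not necessarily the paper's procedure.

```python
import numpy as np
import cv2

def synthesize_shaky(frames, trans_sigma=2.0, rot_sigma=0.2, seed=0):
    """frames: list of HxWx3 uint8 arrays. Returns jittered copies."""
    rng = np.random.default_rng(seed)
    h, w = frames[0].shape[:2]
    dx = np.cumsum(rng.normal(0, trans_sigma, len(frames)))  # random-walk x
    dy = np.cumsum(rng.normal(0, trans_sigma, len(frames)))  # random-walk y
    da = np.cumsum(rng.normal(0, rot_sigma, len(frames)))    # rotation, deg
    shaky = []
    for f, (tx, ty, ang) in zip(frames, zip(dx, dy, da)):
        m = cv2.getRotationMatrix2D((w / 2, h / 2), float(ang), 1.0)
        m[:, 2] += (tx, ty)  # add the translation to the affine matrix
        shaky.append(cv2.warpAffine(f, m, (w, h),
                                    borderMode=cv2.BORDER_REFLECT))
    return shaky

stable = [np.full((240, 320, 3), i, np.uint8) for i in range(8)]  # dummy clip
pairs = list(zip(synthesize_shaky(stable), stable))  # (input, target) pairs
print(len(pairs))
```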

Interaction art using Video Synthesis Technology

  • Kim, Sung-Soo;Eom, Hyun-Young;Lim, Chan
    • International Journal of Advanced Culture Technology / v.7 no.2 / pp.195-200 / 2019
  • Media art, which combines media technology and art, is advancing rapidly in combination with AI, IoT, and VR. This paper aims to meet people's needs by creating video that simulates the dance moves of a figure the user admires, through media art featuring interactive exchange between users and works. The project proposes a universal image synthesis system that minimizes equipment constraints by using a deep learning based skeleton estimation method and a deep neural network architecture for synthesis, rather than a Kinect-based skeleton image. The experimental results showed that the video produced by the deep learning system succeeded in generating, by inferring and synthesizing motions the user never actually performed, the same result as if the user had actually danced.
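As a hedged sketch of the pipeline shape described above, the code below stubs out a deep pose estimator and renders the resulting 2D skeleton as the conditioning image a pix2pix-style generator would consume. The joint list, skeleton edges, and all names are illustrative assumptions, not the paper's components.

```python
import numpy as np
import cv2

EDGES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (1, 6), (6, 7)]  # toy skeleton

def estimate_pose(frame: np.ndarray) -> np.ndarray:
    """Stand-in for a deep skeleton estimator; returns (num_joints, 2)
    pixel coordinates. A real system would run a pose network here."""
    h, w = frame.shape[:2]
    return np.array([[w//2, h//5], [w//2, h//3], [w//3, h//3], [w//4, h//2],
                     [2*w//3, h//3], [3*w//4, h//2], [w//2, 2*h//3],
                     [w//2, 9*h//10]])

def render_skeleton(keypoints: np.ndarray, size=(256, 256)) -> np.ndarray:
    """Draw the skeleton on a blank canvas; this image conditions the GAN."""
    canvas = np.zeros((*size, 3), np.uint8)
    for a, b in EDGES:
        cv2.line(canvas, tuple(map(int, keypoints[a])),
                 tuple(map(int, keypoints[b])), (255, 255, 255), 2)
    return canvas

frame = np.zeros((256, 256, 3), np.uint8)     # stand-in for a camera frame
cond = render_skeleton(estimate_pose(frame))  # generator conditioning input
print(cond.shape)
```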

Implementation Method for DASH-based Free-viewpoint Video Streaming System (DASH 기반 자유시점 비디오 스트리밍 시스템 구현)

  • Seo, Minjae;Paik, Jong-ho
    • Journal of Internet Computing and Services / v.20 no.1 / pp.47-55 / 2019
  • Free-viewpoint video (FVV) service provides multiple viewpoints of content and synthesizes intermediate video for view angles that were not captured, so users can watch from whatever viewpoint they choose. View synthesis is essential to FVV service because video for every possible view angle cannot physically be stored on the content server. For this reason, fast view synthesis can improve the quality of the video service and increase user satisfaction. Among studies on FVV service, a method was proposed to transmit FVV based on DASH (Dynamic Adaptive Streaming over HTTP), which has the major advantage of being a common transport for video services. However, that method was only a conceptual proposal, making it difficult to implement a system from it. In this paper, we propose an implementation method for providing real-time FVV service smoothly. We describe a system structure and operation method for the server and client sides in detail, designed so that views can be synthesized quickly. We also generate an FVV service map that controls the FVV service overall; real-time information for the whole service is managed through this service map, and the service can be controlled so as to reduce delays caused by network conditions.
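To illustrate the service-map idea, here is a minimal, hypothetical sketch of a manifest-like structure that tells a DASH client which viewpoints are captured streams and which must be synthesized from neighboring views. Every field name and URL below is our assumption for illustration; the paper defines its own format.

```python
import json

service_map = {
    "content_id": "fvv-demo",
    "num_views": 5,
    "views": [
        {"id": 0, "type": "captured", "mpd": "http://example.com/view0.mpd"},
        {"id": 1, "type": "synthesized", "refs": [0, 2]},  # built from 0 and 2
        {"id": 2, "type": "captured", "mpd": "http://example.com/view2.mpd"},
        {"id": 3, "type": "synthesized", "refs": [2, 4]},
        {"id": 4, "type": "captured", "mpd": "http://example.com/view4.mpd"},
    ],
}

def mpds_for_view(view_id: int) -> list[str]:
    """Return the MPD URLs the client must fetch to show `view_id`:
    either the view's own stream, or the reference views needed to
    synthesize the intermediate view on the client side."""
    view = service_map["views"][view_id]
    if view["type"] == "captured":
        return [view["mpd"]]
    return [service_map["views"][r]["mpd"] for r in view["refs"]]

print(json.dumps(mpds_for_view(1)))  # view 1 needs the streams of views 0 and 2
```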

Deep Learning Framework for 5D Light Field Synthesis from Single Video (단안 비디오로부터의 5D 라이트필드 비디오 합성 프레임워크)

  • Bae, Kyuho;Ivan, Andre;Park, In Kyu
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2019.06a / pp.150-152 / 2019
  • In this paper, going beyond prior work, we propose a deep learning framework that synthesizes 5D light field video from a monocular video rather than from a single image. Commonly available light field cameras such as the Lytro Illum can acquire video at only 3 frames per second, which makes such footage difficult to use as training data. To solve this problem, we construct a virtual-environment dataset, using UnrealCV to acquire data by realistic graphics rendering and using it for training. The proposed deep learning framework synthesizes a light field video with 5×5 sub-aperture images (SAIs) from two input monocular video frames. The proposed network consists of a flow estimation network, which predicts the appearance flow from the input image converted to a luminance image, and an optical flow estimation network, which predicts the optical flow between the two light field video frames obtained from the appearance flow.
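As a hedged sketch of this two-network structure, the code below wires a stub appearance-flow network (one luminance frame to per-SAI flows over a 5×5 grid) to a stub optical-flow network linking two consecutive light field frames. Both networks are reduced to single convolutions; shapes and names are illustrative assumptions only, not the paper's architecture.

```python
import torch
import torch.nn as nn

NUM_SAI = 5 * 5  # 5x5 sub-aperture grid

class AppearanceFlowNet(nn.Module):
    """Luminance frame (N,1,H,W) -> flows to every SAI (N, 2*NUM_SAI, H, W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1, 2 * NUM_SAI, kernel_size=3, padding=1)

    def forward(self, lum):
        return self.net(lum)

class OpticalFlowNet(nn.Module):
    """Per-SAI flows of two frames -> temporal flow between them (N,2,H,W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(2 * 2 * NUM_SAI, 2, kernel_size=3, padding=1)

    def forward(self, flows_t, flows_t1):
        return self.net(torch.cat([flows_t, flows_t1], dim=1))

lum_t = torch.rand(1, 1, 64, 64)   # luminance of monocular frame t
lum_t1 = torch.rand(1, 1, 64, 64)  # luminance of frame t+1
afn, ofn = AppearanceFlowNet(), OpticalFlowNet()
flows_t, flows_t1 = afn(lum_t), afn(lum_t1)
temporal_flow = ofn(flows_t, flows_t1)  # links the two light field frames
print(flows_t.shape, temporal_flow.shape)
```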


From Multimedia Data Mining to Multimedia Big Data Mining

  • Constantin, Gradinaru Bogdanel;Mirela, Danubianu;Luminita, Barila Adina
    • International Journal of Computer Science & Network Security / v.22 no.11 / pp.381-389 / 2022
  • With the collection of huge volumes of text, image, audio, and video, or combinations of these (in a word, multimedia data), the need arose to explore them in order to discover new, unexpected, and potentially valuable information for decision making. Starting from the already existing field of data mining, but not as a mere extension of it, multimedia mining appeared as a distinct field with increased complexity and many characteristic aspects. Later, the concept of big data was extended to multimedia, resulting in multimedia big data, which in turn gave rise to the multimedia big data mining process. This paper surveys multimedia data mining, starting from the general concept and following the transition from multimedia data mining to multimedia big data mining, through an up-to-date synthesis of work in the field which, to the best of our knowledge, is novel.

Interactive System using Multiple Signal Processing (다중신호처리를 이용한 인터렉티브 시스템)

  • Kim, Sung-Ill;Yang, Hyo-Sik;Shin, Wee-Jae;Park, Nam-Chun;Oh, Se-Jin
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.282-285 / 2005
  • This paper discusses an interactive system for smart home environments. The main emphasis of the paper lies in the description of multiple signal processing based on technologies such as fingerprint recognition, video signal processing, and speech recognition and synthesis. As essential modules of the interactive system, we adopted a motion detector based on changes of brightness in pixels, as well as fingerprint identification for adapting the home environment to its inhabitants. In addition, a real-time speech recognizer based on HM-Net (Hidden Markov Network) and speech synthesis were incorporated into the overall system for interaction between user and system. In experimental evaluation, the results showed that the proposed system was easy to use because it could provide specialized services to specific users in smart home environments, even though the performance of the speech recognizer was worse than in simulation owing to the noisy environment.
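The brightness-based motion detector mentioned above can be sketched in a few lines: threshold the absolute difference of consecutive grayscale frames and report motion when enough pixels change. The thresholds below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def motion_detected(prev_gray: np.ndarray, cur_gray: np.ndarray,
                    pix_thresh: int = 25, ratio_thresh: float = 0.01) -> bool:
    """prev_gray, cur_gray: HxW uint8 grayscale frames. Returns True when
    the fraction of pixels whose brightness changed exceeds ratio_thresh."""
    diff = np.abs(cur_gray.astype(np.int16) - prev_gray.astype(np.int16))
    changed = (diff > pix_thresh).mean()  # fraction of changed pixels
    return changed > ratio_thresh

prev = np.zeros((120, 160), np.uint8)
cur = prev.copy()
cur[40:80, 60:100] = 200           # simulate a moving bright object
print(motion_detected(prev, cur))  # True
```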


Virtual Dialog System Based on Multimedia Signal Processing for Smart Home Environments (멀티미디어 신호처리에 기초한 스마트홈 가상대화 시스템)

  • Kim, Sung-Ill;Oh, Se-Jin
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.2 / pp.173-178 / 2005
  • This paper focuses on a virtual dialog system whose aim is to build more convenient living environments. The main emphasis of the paper lies in the description of multimedia signal processing based on technologies such as speech recognition, speech synthesis, and video and sensor signal processing. As essential modules of the dialog system, a real-time speech recognizer based on HM-Net (Hidden Markov Network) and speech synthesis were incorporated into the overall system. In addition, we adopted a real-time motion detector based on changes of brightness in pixels, as well as a touch sensor used to start the system. In experimental evaluation, the results showed that the proposed system made it relatively easy to control electric appliances while sitting on a sofa, even though the performance of the system was worse than in simulation owing to the noisy environment.
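As a hedged sketch of the dialog system's control flow, the code below shows a touch event triggering recognition, appliance control, and a spoken confirmation. All components are stubs of ours; the paper's HM-Net recognizer and actual device interfaces are not reproduced.

```python
# Hypothetical command vocabulary; a real system would map recognizer output
# to appliance actions through a richer grammar.
COMMANDS = {"turn on the light": ("light", "on"),
            "turn off the light": ("light", "off")}

def recognize(audio: bytes) -> str:
    return "turn on the light"  # stub for the real-time speech recognizer

def speak(text: str) -> None:
    print(f"[TTS] {text}")      # stub for the speech synthesizer

def control(device: str, state: str) -> None:
    print(f"[HOME] {device} -> {state}")  # stub for the appliance interface

def on_touch(audio: bytes) -> None:
    """Entry point triggered by the touch sensor that starts the system."""
    utterance = recognize(audio)
    if utterance in COMMANDS:
        device, state = COMMANDS[utterance]
        control(device, state)
        speak(f"The {device} is now {state}.")
    else:
        speak("Sorry, I did not understand.")

on_touch(b"")  # simulated touch event with captured audio
```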

High Resolution Video Synthesis with a Hybrid Camera (하이브리드 카메라를 이용한 고해상도 비디오 합성)

  • Kim, Jong-Won;Kyung, Min-Ho
    • Journal of the Korea Computer Graphics Society / v.13 no.4 / pp.7-12 / 2007
  • With the advent of digital cinema, more and more movies are digitally produced, distributed via digital media such as hard drives and networks, and finally projected using a digital projector. However, digital cameras capable of shooting at 2K or higher resolution for digital cinema are still very expensive and bulky, which impedes a rapid transition to digital production. As a low-cost solution for acquiring high-resolution digital video, we propose a hybrid camera consisting of a low-resolution CCD for capturing video and a high-resolution CCD for capturing still images at regular intervals. From the output of the hybrid camera, we synthesize high-resolution video in software as follows: for each frame, 1. find pixel correspondences from the current frame to the previous and subsequent keyframes associated with high-resolution still images, 2. synthesize a high-resolution image for the current frame by copying the image blocks associated with the corresponding pixels from the high-resolution keyframe images, and 3. complete the synthesis by filling holes in the synthesized image. This framework can be extended to NPR video effects and HDR video capture.
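The three-step synthesis loop above can be sketched as follows: estimate correspondences from the current low-resolution frame to a downscaled proxy of the nearest high-resolution keyframe, warp the keyframe accordingly, and fill the remaining holes with an upscaled copy of the current frame. Farneback optical flow and the 4× scale factor are stand-ins of ours, not the paper's exact correspondence method.

```python
import numpy as np
import cv2

SCALE = 4  # high-res keyframe is SCALE x the video resolution (assumption)

def synthesize_highres(cur_lo: np.ndarray, key_hi: np.ndarray) -> np.ndarray:
    h, w = cur_lo.shape[:2]
    # Step 1: correspondences, computed on a low-res proxy of the keyframe.
    key_lo = cv2.resize(key_hi, (w, h), interpolation=cv2.INTER_AREA)
    flow = cv2.calcOpticalFlowFarneback(
        cv2.cvtColor(cur_lo, cv2.COLOR_BGR2GRAY),
        cv2.cvtColor(key_lo, cv2.COLOR_BGR2GRAY),
        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    # Step 2: copy high-res content along the (upscaled) correspondences.
    flow_hi = cv2.resize(flow, (w * SCALE, h * SCALE)) * SCALE
    xs, ys = np.meshgrid(np.arange(w * SCALE, dtype=np.float32),
                         np.arange(h * SCALE, dtype=np.float32))
    out = cv2.remap(key_hi.astype(np.float32),
                    xs + flow_hi[..., 0], ys + flow_hi[..., 1],
                    cv2.INTER_LINEAR,
                    borderMode=cv2.BORDER_CONSTANT, borderValue=-1.0)
    # Step 3: hole filling -- fall back to a plain upscale wherever the warp
    # sampled outside the keyframe (marked by the negative border value).
    fallback = cv2.resize(cur_lo, (w * SCALE, h * SCALE),
                          interpolation=cv2.INTER_CUBIC)
    holes = (out < 0).any(axis=2)
    out[holes] = fallback[holes]
    return np.clip(out, 0, 255).astype(np.uint8)

cur = np.random.randint(0, 255, (90, 160, 3), np.uint8)   # low-res frame
key = np.random.randint(0, 255, (360, 640, 3), np.uint8)  # high-res keyframe
print(synthesize_highres(cur, key).shape)  # (360, 640, 3)
```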
