• Title/Abstract/Keywords: Audio-driven

Search results: 20 items (processing time: 0.037 s)

Emotional-Controllable Talking Face Generation on Real-Time System

  • Van-Thien Phan;Hyung-Jeong Yang;Seung-Won Kim;Ji-Eun Shin;Soo-Hyung Kim
    • 한국정보처리학회: Conference Proceedings
    • /
    • 한국정보처리학회 2024 Fall Conference
    • /
    • pp.523-526
    • /
    • 2024
  • Recent progress in audio-driven talking face generation has focused on achieving more realistic and emotionally expressive lip movements, enhancing the quality of virtual avatars and animated characters for applications in entertainment, education, healthcare, and more. Despite these advances, challenges remain in creating natural and emotionally nuanced lip synchronization efficiently and accurately. To address these issues, we introduce a novel method for audio-driven lip-sync that offers precise control over emotional expressions, outperforming current techniques. Our method utilizes a Conditional Deep Variational Autoencoder to produce lifelike lip movements that align seamlessly with audio inputs while dynamically adjusting for various emotional states. Experimental results highlight the advantages of our approach, showing significant improvements in emotional accuracy and in the overall quality of the generated facial animations and video sequences on the CREMA-D dataset [1].
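
As background to the approach described above, the core mechanics of a conditional VAE, conditioning the encoder and decoder on audio features and an emotion label and sampling via the reparameterization trick, can be sketched as follows. The feature dimensions, toy linear weights, and emotion labels are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40-dim audio features, 6 emotion classes,
# 20 lip-landmark parameters, 8-dim latent space.
AUDIO_DIM, N_EMOTIONS, LIP_DIM, LATENT_DIM = 40, 6, 20, 8

# Toy linear "encoder"/"decoder" weights stand in for trained networks.
W_mu  = rng.standard_normal((AUDIO_DIM + N_EMOTIONS + LIP_DIM, LATENT_DIM)) * 0.1
W_lv  = rng.standard_normal((AUDIO_DIM + N_EMOTIONS + LIP_DIM, LATENT_DIM)) * 0.1
W_dec = rng.standard_normal((LATENT_DIM + AUDIO_DIM + N_EMOTIONS, LIP_DIM)) * 0.1

def one_hot(label, n):
    v = np.zeros(n); v[label] = 1.0
    return v

def encode(lips, audio, emotion):
    """Condition the approximate posterior on both audio and emotion label."""
    h = np.concatenate([lips, audio, one_hot(emotion, N_EMOTIONS)])
    return h @ W_mu, h @ W_lv          # mean, log-variance

def reparameterize(mu, logvar):
    """z = mu + sigma * eps, keeping sampling differentiable during training."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z, audio, emotion):
    """Generate lip parameters from the latent code plus the same conditions."""
    h = np.concatenate([z, audio, one_hot(emotion, N_EMOTIONS)])
    return h @ W_dec

audio = rng.standard_normal(AUDIO_DIM)   # one frame of audio features
lips  = rng.standard_normal(LIP_DIM)     # ground-truth lip parameters
mu, logvar = encode(lips, audio, emotion=2)          # e.g. a "happy" label
lip_hat = decode(reparameterize(mu, logvar), audio, emotion=2)
print(lip_hat.shape)                     # (20,)
```

Because the emotion label is concatenated into both encoder and decoder inputs, changing it at inference time steers the generated lip parameters while the audio conditioning keeps them synchronized.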

A Beamforming-Based Video-Zoom Driven Audio-Zoom Algorithm for Portable Digital Imaging Devices

  • Park, Nam In;Kim, Seon Man;Kim, Hong Kook;Kim, Myeong Bo;Kim, Sang Ryong
    • IEIE Transactions on Smart Processing and Computing
    • /
    • Vol. 2, No. 1
    • /
    • pp.11-19
    • /
    • 2013
  • A video-zoom driven audio-zoom algorithm is proposed to provide audio zooming effects according to the degree of video-zoom. The proposed algorithm is designed based on a super-directive beamformer operating with a 4-channel microphone array in conjunction with a soft masking process that uses the phase differences between microphones. The audio-zoom processed signal is obtained by multiplying the audio gain derived from the video-zoom level by the masked signal. The proposed algorithm is then implemented on a portable digital imaging device with a clock speed of 600 MHz after different levels of optimization, such as algorithmic level, C-code and memory optimization. As a result, the processing time of the proposed audio-zoom algorithm occupies 14.6% or less of the clock speed of the device. The performance evaluation conducted in a semi-anechoic chamber shows that the signals from the front direction can be amplified by approximately 10 dB compared to the other directions.
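
The overall signal flow, a front-steered beamformer whose output is scaled by a gain derived from the video-zoom level, can be sketched roughly as follows. The array geometry, the zoom-to-gain mapping, and the plain delay-and-sum weights (standing in for the paper's super-directive design and soft masking) are illustrative assumptions.

```python
import numpy as np

FS = 16_000          # sample rate in Hz (assumption)
C = 343.0            # speed of sound (m/s)
MIC_X = np.array([0.00, 0.02, 0.04, 0.06])   # 4-mic linear array positions (m)

def delay_and_sum(x, steer_deg=0.0):
    """Basic delay-and-sum beamformer toward steer_deg (0 = front/broadside).

    x: (4, n) multichannel signal. A super-directive design would replace the
    uniform averaging with noise-covariance-optimized weights; this shows the
    underlying alignment-and-sum idea only.
    """
    delays = MIC_X * np.sin(np.deg2rad(steer_deg)) / C          # seconds
    n = x.shape[1]
    freqs = np.fft.rfftfreq(n, 1 / FS)
    X = np.fft.rfft(x, axis=1)
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((X * phase).mean(axis=0), n)

def zoom_gain(zoom_level, max_gain_db=10.0, max_zoom=4.0):
    """Map a video-zoom level (1x..max_zoom) to an audio gain, linear in dB."""
    frac = (np.clip(zoom_level, 1.0, max_zoom) - 1.0) / (max_zoom - 1.0)
    return 10 ** (frac * max_gain_db / 20.0)

x = np.random.default_rng(1).standard_normal((4, 1024))
y = zoom_gain(3.0) * delay_and_sum(x)    # audio "zoomed in" on the front
print(y.shape)
```

The 10 dB ceiling mirrors the front-direction amplification reported in the abstract; in the actual system the gain would also be combined with the phase-difference soft mask.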


랜드마크 시퀀스를 기반으로 한 개별 오디오 구동 화자 생성 (Individual Audio-Driven Talking Head Generation based on Sequence of Landmark)

  • ;;양형정;신지은;김승원;김수형
    • 한국정보처리학회: Conference Proceedings
    • /
    • 한국정보처리학회 2024 Fall Conference
    • /
    • pp.553-556
    • /
    • 2024
  • Talking head generation is a highly practical task, closely tied to current technology, with a wide range of applications in everyday life. This technology can be of great help in photography and online conversation, as well as in education and medicine. In this paper, the authors propose a novel approach for individual audio-driven talking head generation by leveraging a sequence of landmarks and employing a diffusion model for image reconstruction. Building upon previous landmark-based methods and advancements in generative models, the authors introduce an optimized noise-addition technique designed to enhance the model's ability to learn temporal information from the input data. The proposed method outperforms recent approaches on metrics such as Landmark Distance (LD) and Structural Similarity Index Measure (SSIM), demonstrating the effectiveness of the diffusion model in this domain. However, challenges remain in optimization; the paper conducts ablation studies to identify these issues and outlines directions for future development.
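
The abstract does not specify its optimized noise-addition technique, but the standard diffusion forward process it builds on, progressively noising a clean landmark sequence, looks like this in a minimal sketch (the frame count, landmark layout, and linear schedule are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Standard DDPM forward (noising) process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal-retention factor

def q_sample(x0, t):
    """Sample x_t given the clean landmark sequence x0 at diffusion step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps, eps

# Hypothetical input: 25 video frames x 68 facial landmarks x (x, y).
x0 = rng.standard_normal((25, 68, 2))
x_t, eps = q_sample(x0, t=500)
print(x_t.shape)    # same shape as x0; progressively noisier as t grows
```

Training the denoiser on whole landmark *sequences* rather than single frames is what lets the model pick up the temporal structure the paper emphasizes.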

다층 진동판으로 구동되는 평판 스피커 (A Flat Loudspeaker Driven by a Multi-layer Diaphragm)

  • 이한량;김병남;오세진
    • 한국음향학회: Conference Proceedings
    • /
    • 한국음향학회 2004 Spring Conference Proceedings, Vol. 23, No. 1
    • /
    • pp.131-136
    • /
    • 2004
  • The driving characteristics of a loudspeaker depend not only on the size and shape of its diaphragm but also on the diaphragm's material and internal structure. In this study, we observed how the loudspeaker's characteristics change when a cavity is formed between two diaphragms driven by the same signal, and when a porous sound-absorbing material is inserted into that cavity. In particular, in the porous-absorber case, the volume of the porous material was held constant and the effects of the diaphragm surface condition and of the vibration transmission path along the diaphragm were eliminated, so that the change in characteristics due to the internal material alone could be measured.


Development of a Real-time Vehicle Driving Simulator

  • Kim, Hyun-Ju;Park, Min-Kyu;Lee, Min-Cheoul;You, Wan-Suk
    • 제어로봇시스템학회: Conference Proceedings
    • /
    • 제어로봇시스템학회 ICCAS 2001
    • /
    • pp.51.2-51
    • /
    • 2001
  • A vehicle driving simulator is a virtual reality device that makes a person feel as if he or she were actually driving a vehicle. The driving simulator is used effectively for studying driver-vehicle interaction and for developing vehicle systems of new concepts. The driving simulator consists of a motion platform, a motion controller, a visual and audio system, a vehicle dynamic analysis system, a vehicle operation system, and so on. The vehicle dynamic analysis system supervises the overall operation of the simulator and also simulates the dynamic motion of a multi-body vehicle model in real time. In this paper, the main procedures for developing the driving simulator are classified into four parts. First, a vehicle motion platform and a motion controller, which generate realistic motion using a hydraulically driven six-degree-of-freedom Stewart platform. Secondly, a visual system generates high-fidelity visual scenes which are displayed on a screen ...
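
As a sketch of the kinematics behind a six-degree-of-freedom Stewart platform like the one described, the actuator (leg) lengths for a given platform pose follow directly from the anchor geometry. The hexagonal layout and dimensions below are hypothetical, not those of the simulator in the paper.

```python
import numpy as np

def leg_lengths(base_pts, plat_pts, pos, rpy):
    """Inverse kinematics of a 6-DOF Stewart platform.

    base_pts, plat_pts: (6, 3) anchor points in the base / platform frames.
    pos: (3,) platform translation; rpy: roll, pitch, yaw in radians.
    Returns the six actuator lengths |R @ p_i + pos - b_i|.
    """
    r, p, y = rpy
    Rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
    Ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    Rz = np.array([[np.cos(y), -np.sin(y), 0], [np.sin(y), np.cos(y), 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx
    legs = (plat_pts @ R.T) + pos - base_pts
    return np.linalg.norm(legs, axis=1)

# Hypothetical geometry: base radius 1.2 m, platform radius 0.8 m.
ang = np.deg2rad(np.arange(6) * 60.0)
base = np.c_[1.2 * np.cos(ang), 1.2 * np.sin(ang), np.zeros(6)]
plat = np.c_[0.8 * np.cos(ang), 0.8 * np.sin(ang), np.zeros(6)]

neutral = leg_lengths(base, plat, pos=np.array([0, 0, 1.0]), rpy=(0, 0, 0))
pitched = leg_lengths(base, plat, pos=np.array([0, 0, 1.0]), rpy=(0, 0.1, 0))
print(neutral.round(3))   # all six legs equal at the neutral pose
```

In a hydraulic implementation these six lengths become the position setpoints of the actuators, recomputed every control cycle from the washed-out motion commands.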


인공 지능 기술을 이용한 음성 인식 기술에 대한 고찰 (A Study on Speech Recognition Technology Using Artificial Intelligence Technology)

  • 이영조;이기승;강성진
    • 반도체디스플레이기술학회지
    • /
    • Vol. 23, No. 3
    • /
    • pp.140-147
    • /
    • 2024
  • This paper explores the recent advancements in speech recognition technology, focusing on the integration of artificial intelligence to improve recognition accuracy in challenging environments, such as noisy or low-quality audio conditions. Traditional speech recognition methods often suffer from performance degradation in noisy settings. However, the application of deep neural networks (DNN) has led to significant improvements, enabling more robust and reliable recognition in various industries, including banking, automotive, healthcare, and manufacturing. A key area of advancement is the use of Silent Speech Interfaces (SSI), which allow communication through non-speech signals, such as visual cues or other auxiliary signals like ultrasound and electromyography, making them particularly useful for individuals with speech impairments. The paper further discusses the development of multi-modal speech recognition, combining both audio and visual inputs, which enhances recognition accuracy in noisy environments. Recent research into lip-reading technology and the use of deep learning architectures, such as CNN and RNN, has significantly improved speech recognition by extracting meaningful features from video signals, even in difficult lighting conditions. Additionally, the paper covers the use of self-supervised learning techniques, like AV-HuBERT, which leverage large-scale, unlabeled audiovisual datasets to improve performance. The future of speech recognition technology is likely to see further integration of AI-driven methods, making it more applicable across diverse industries and for individuals with communication challenges. The conclusion emphasizes the need for further research, especially in languages with complex morphological structures, such as Korean.


한국어 동시조음 모델에 기반한 스피치 애니메이션 생성 (Speech Animation Synthesis based on a Korean Co-articulation Model)

  • 장민정;정선진;노준용
    • 한국컴퓨터그래픽스학회논문지
    • /
    • Vol. 26, No. 3
    • /
    • pp.49-59
    • /
    • 2020
  • In this paper, we propose a model that generates speech animation specialized for Korean using a rule-based co-articulation model. Techniques for generating lip animation synchronized with speech have been studied extensively, mostly for English, and are widely used across the culture industry, including film, animation, and games, wherever natural and realistic motion is required. In many domestic productions, however, speech animation is either omitted or played back as a simple loop unrelated to the audio and then dubbed by a voice actor, which looks very unnatural. Moreover, language-independent approaches that are not specialized for Korean do not yet guarantee the quality needed for domestic content production. This paper therefore proposes a technique that takes speech and text as input and generates natural speech animation reflecting the linguistic characteristics of Korean. Reflecting the fact that mouth shapes in Korean are determined mostly by vowels, we define a co-articulation model that separates the lips and the tongue, resolving the problems of earlier methods in which lip shapes were distorted or the characteristics of some phonemes were lost; furthermore, by reflecting differences due to prosodic factors, more dynamic speech animation can be generated. A user study verified that the proposed model generates natural speech animation, and we expect it to contribute substantially to the future development of the domestic culture industry.
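
The idea that Korean mouth shapes are driven mostly by vowels, with lips and tongue handled as separate channels, can be illustrated with a minimal rule-based sketch. The vowel set, target values, and linear blending rule here are illustrative assumptions, not the paper's actual model.

```python
# Lip openness / rounding targets per vowel (hypothetical values in [0, 1]).
VOWEL_LIPS = {
    "a": {"open": 0.9, "round": 0.1},
    "i": {"open": 0.3, "round": 0.0},
    "o": {"open": 0.5, "round": 0.9},
    "u": {"open": 0.3, "round": 1.0},
}
# Tongue targets are kept in an independent channel, so consonants such as
# /t/ or /n/ can move the tongue without distorting the vowel's lip shape.
CONSONANT_TONGUE = {"t": 0.9, "n": 0.8, "k": 0.2, "": 0.0}

def coarticulate(prev_vowel, next_vowel, alpha):
    """Blend lip targets across a vowel-to-vowel transition (alpha in [0, 1])."""
    a, b = VOWEL_LIPS[prev_vowel], VOWEL_LIPS[next_vowel]
    return {k: (1 - alpha) * a[k] + alpha * b[k] for k in a}

frame = coarticulate("a", "u", alpha=0.5)    # halfway between /a/ and /u/
frame["tongue"] = CONSONANT_TONGUE["n"]      # consonant affects the tongue only
print(frame)    # blended lip targets plus the independent tongue value
```

Separating the channels this way is what prevents a consonant's articulation from overwriting the vowel-driven lip pose, the distortion problem the abstract describes.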

이식형 마이크로폰과 진동체를 갖는 인공중이의 이득 보상을 위한 주파수 특성 고찰 (Study on frequency response of implantable microphone and vibrating transducer for the gain compensation of implantable middle ear hearing aid)

  • 정의성;성기웅;임형규;이장우;김동욱;이정현;김명남;조진호
    • 센서학회지
    • /
    • Vol. 19, No. 5
    • /
    • pp.361-368
    • /
    • 2010
  • The ACROSS device, composed of an implantable microphone, a signal processor, and a vibrating transducer, is a fully implantable middle ear hearing device (F-IMEHD) for patients with hearing loss. Because the microphone is implanted under the skin and tissue at the temporal bone, the amplitude of the incoming sound wave is attenuated by absorption and scattering, and the vibrating transducer attached to the ossicular chain also produces a displacement different from the natural behavior of the stapes. For gain control of the auditory signal, most implantable hearing devices with a digital audio signal processor still apply the fitting rules of conventional hearing aids, without regard to the effects of the implanted microphone and the vibrating transducer; these effects must be taken into account before conventional audio fitting rules can be used. The aim of this study was to measure the gain characteristics caused by the implanted microphone and the vibrating transducer attached to the ossicular chain, for gain compensation of the ACROSS device. Differential floating mass transducers (DFMTs) of the ACROSS device were clipped onto four cadaver temporal bones, and the displacement of the ossicular chain with the DFMT driven by a 1 $mA_{peak}$ current was measured using a laser Doppler vibrometer. The sensitivity of microphones under sampled pig skin and the back skin of three rats was measured with pure-tone stimuli from 0.1 to 8.9 kHz. We confirmed that the microphone implanted under the skin showed a poorer frequency response in the acoustic high-frequency band than in the low- to mid-frequency band, that the resonant frequency of stapes vibration was changed by attaching the DFMT to the incus, and that the displacement of the DFMT driven at 1 $mA_{rms}$ was about 20 dB higher than that of the cadaver's stapes driven by a sound pressure of 94 dB SPL in the resonance frequency range.
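
The compensation implied by these measurements amounts to offsetting a conventional fitting rule by the per-band difference between the implanted path and a reference path. A minimal sketch, with hypothetical band levels rather than the paper's measured data:

```python
import numpy as np

# Hypothetical measured responses (dB) at octave bands: the implanted
# microphone loses sensitivity at high frequencies relative to a
# conventional (unimplanted) reference path.
bands_hz   = np.array([250, 500, 1000, 2000, 4000, 8000])
ref_db     = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])          # reference mic
implant_db = np.array([-2.0, -3.0, -4.0, -8.0, -12.0, -15.0])  # under skin

def compensation_gain(ref, measured):
    """Extra gain (dB) to add to a conventional fitting rule in each band."""
    return ref - measured

comp = compensation_gain(ref_db, implant_db)
for f, g in zip(bands_hz, comp):
    print(f"{f:>5} Hz: +{g:.1f} dB")
```

The transducer-side offset (the roughly 20 dB displacement difference reported above) would be folded in the same way, as a second per-band correction term.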

DSP를 이용한 전류구동 스피커의 저주파 공진 보상 (Compensation of Low-Frequency Resonance in Current-Driven Loudspeakers Using DSP)

  • 박종필;은창수
    • 한국정보통신학회: Conference Proceedings
    • /
    • 한국정보통신학회 2021 Spring Conference
    • /
    • pp.584-588
    • /
    • 2021
  • The impedance of a loudspeaker in an audio system is commonly treated as a fixed value. In reality, however, it varies continuously with the frequency of the input signal, and the variation is very large around the loudspeaker's resonance frequency. The sound pressure level of a loudspeaker is determined by the current flowing through its voice coil, so when the loudspeaker is voltage-driven, the varying impedance distorts the sound pressure level. Current driving solves this problem, but distortion of the sound pressure level still occurs at low frequencies due to resonance, which can degrade the sound quality of the audio system. In this paper, to improve the sound quality of a current-driven audio system, we propose a resonance compensation circuit that corrects the sound-pressure-level distortion using DSP (Digital Signal Processing). Through current-drive simulations of an audio system based on an equivalent loudspeaker model, we verify the frequency-dependent distortion of the sound pressure level and propose a circuit to correct it. The proposed circuit is built around a state-variable filter, and since its frequency and output level are adjustable, it should be applicable to a variety of audio systems.
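
A state-variable filter of the kind the abstract mentions exposes independently tunable frequency and Q, which is what makes it convenient for an adjustable compensation stage. Below is a minimal Chamberlin-style digital sketch; the sample rate, cutoff, and the idea of using the resonant outputs for correction are illustrative assumptions, not the paper's circuit.

```python
import numpy as np

def svf(x, fs, fc, q):
    """Chamberlin digital state-variable filter: low/band/high outputs.

    fc and Q are independently tunable, so the same structure can be retuned
    for different loudspeakers without redesigning the filter.
    """
    f = 2.0 * np.sin(np.pi * fc / fs)    # frequency coefficient
    q1 = 1.0 / q
    low = band = 0.0
    lows, bands, highs = [], [], []
    for s in x:
        low += f * band
        high = s - low - q1 * band
        band += f * high
        lows.append(low); bands.append(band); highs.append(high)
    return np.array(lows), np.array(bands), np.array(highs)

fs = 48_000
t = np.arange(fs // 10) / fs
x = np.sin(2 * np.pi * 50 * t)           # 50 Hz tone near a woofer resonance
low, band, high = svf(x, fs, fc=60.0, q=0.7)
# The low/band outputs could be scaled and subtracted from the drive signal
# to flatten the SPL bump around resonance; the exact correction law is a
# design choice not specified by the abstract.
print(low.shape)
```

In an analog realization the same topology is built from two integrators and a summer, which is why the abstract can speak of a compensation "circuit" with adjustable frequency and output.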


병렬구조형 차량운전 모사장치의 성능평가 및 분석 (Analysis and Performance Evaluation of a Parallel-Type Vehicle Driving Simulator)

  • 박일경;박경균;김정하;이운성
    • 제어로봇시스템학회: Conference Proceedings
    • /
    • 제어로봇시스템학회 1997 Korea Automatic Control Conference Proceedings; 한국전력공사 Seoul Training Center; 17-18 Oct. 1997
    • /
    • pp.1481-1484
    • /
    • 1997
  • A vehicle driving simulator predicts vehicle motion in real time from the driver's steering, accelerating, and braking inputs, and reproduces the vehicle's motion with visual and audio systems and a washout algorithm, giving the driver a vivid sense of reality. A driving simulator with a vehicle integrated control system is used to analyze vehicle controllability, steering capability, and safety in various simulated environments; it also allows analysis of vehicle safety factors and driver reactions, promoting traffic safety without risk to the driver. Development of the vehicle driving simulator proceeds in three main steps. First, a motion base system capable of executing the motion cues is developed. Secondly, real-time vehicle software that computes the vehicle dynamics is constructed. Thirdly, the driving simulator is integrated by interconnecting the visual system with the motion base. In this study, we address the design of the motion base for a vehicle driving simulator and its real-time control, using an extra gyro sensor and accelerometers to find the position and orientation of the moving platform instead of computing the forward kinematics. To drive the motion base, we use National Instruments' LabVIEW software. Furthermore, we use a vehicle motion analysis module and a washout algorithm module to complete a driving simulator that can actually be driven by a human, and we experimentally test various vehicle motion conditions.
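
A washout algorithm returns the platform toward its neutral position during sustained accelerations while preserving onset cues; at its core it is a high-pass filter on the commanded specific force. A simplified single-pole sketch (classical washout schemes use higher-order filters per axis, so this only illustrates the principle):

```python
import numpy as np

def washout_hp(accel, fs, fc=0.5):
    """First-order high-pass 'washout' filter on an acceleration command.

    Sustained accelerations are washed out so the motion platform drifts back
    toward neutral, while onset transients are reproduced. fc is the washout
    cutoff in Hz (an assumed value, tuned per axis in a real simulator).
    """
    dt = 1.0 / fs
    rc = 1.0 / (2 * np.pi * fc)
    alpha = rc / (rc + dt)
    y = np.zeros_like(accel)
    for i in range(1, len(accel)):
        y[i] = alpha * (y[i - 1] + accel[i] - accel[i - 1])
    return y

fs = 100                                  # control rate (Hz)
t = np.arange(5 * fs) / fs
step = np.where(t >= 1.0, 1.0, 0.0)       # sustained 1 m/s^2 acceleration
cmd = washout_hp(step, fs)
print(cmd[int(1.0 * fs)], cmd[-1])        # onset kept, then washed out
```

The washed-out command is what gets converted into platform pose setpoints (and then actuator lengths), keeping the limited-stroke platform within its workspace.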
