• Title/Summary/Keyword: Audio-driven

Search results: 20

Emotional-Controllable Talking Face Generation on Real-Time System

  • Van-Thien Phan;Hyung-Jeong Yang;Seung-Won Kim;Ji-Eun Shin;Soo-Hyung Kim
    • Annual Conference of KIPS
    • /
    • 2024.10a
    • /
    • pp.523-526
    • /
    • 2024
  • Recent progress in audio-driven talking face generation has focused on achieving more realistic and emotionally expressive lip movements, enhancing the quality of virtual avatars and animated characters for applications in entertainment, education, healthcare, and more. Despite these advances, challenges remain in creating natural and emotionally nuanced lip synchronization efficiently and accurately. To address these issues, we introduce a novel method for audio-driven lip-sync that offers precise control over emotional expressions, outperforming current techniques. Our method utilizes a Conditional Deep Variational Autoencoder to produce lifelike lip movements that align seamlessly with audio inputs while dynamically adjusting for various emotional states. Experimental results highlight the advantages of our approach, showing significant improvements in emotional accuracy and in the overall quality of the generated facial animations and video sequences on the CREMA-D dataset [1].
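
For a concrete picture of the conditioning idea described above, here is a minimal sketch of a conditional VAE whose encoder and decoder are conditioned on audio features plus an emotion label. All dimensions, names, and architecture details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

AUDIO_DIM, N_EMOTIONS, LIP_DIM, LATENT = 80, 6, 40, 16  # illustrative sizes

class CondVAE(nn.Module):
    """Conditional VAE: encode lip landmarks with (audio, emotion) as the
    condition, decode lip landmarks from (latent, audio, emotion)."""
    def __init__(self):
        super().__init__()
        cond = AUDIO_DIM + N_EMOTIONS
        self.enc = nn.Sequential(nn.Linear(LIP_DIM + cond, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, LATENT)
        self.to_logvar = nn.Linear(128, LATENT)
        self.dec = nn.Sequential(nn.Linear(LATENT + cond, 128), nn.ReLU(),
                                 nn.Linear(128, LIP_DIM))

    def forward(self, lips, audio, emotion_onehot):
        c = torch.cat([audio, emotion_onehot], dim=-1)
        h = self.enc(torch.cat([lips, c], dim=-1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar
```

At inference time, the same emotion input can be swapped while keeping the audio fixed, which is the mechanism that makes emotion controllable in this family of models.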

A Beamforming-Based Video-Zoom Driven Audio-Zoom Algorithm for Portable Digital Imaging Devices

  • Park, Nam In;Kim, Seon Man;Kim, Hong Kook;Kim, Myeong Bo;Kim, Sang Ryong
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.2 no.1
    • /
    • pp.11-19
    • /
    • 2013
  • A video-zoom driven audio-zoom algorithm is proposed to provide audio zooming effects according to the degree of video zoom. The proposed algorithm is designed around a super-directive beamformer operating on a 4-channel microphone array, in conjunction with a soft masking process that uses the phase differences between microphones. The audio-zoom processed signal is obtained by multiplying the masked signal by an audio gain derived from the video-zoom level (a small sketch of this gain stage follows the entry). The proposed algorithm was then implemented on a portable digital imaging device with a clock speed of 600 MHz after several levels of optimization, including algorithm-level, C-code, and memory optimization. As a result, the proposed audio-zoom algorithm consumes at most 14.6% of the device's clock cycles. A performance evaluation conducted in a semi-anechoic chamber shows that signals from the front direction can be amplified by approximately 10 dB relative to the other directions.

  • PDF
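
As referenced above, the sketch below illustrates the final gain stage: a video-zoom level mapped to a linear gain and applied to the beamformed-and-masked signal. The linear dB mapping and the parameter names are assumptions; the 10 dB ceiling mirrors the front-direction amplification the paper reports.

```python
import numpy as np

def audio_zoom(masked_signal, zoom_level, max_gain_db=10.0):
    """Scale the beamformed-and-masked signal by a video-zoom-driven gain.

    zoom_level: normalized video zoom in [0, 1] (this mapping is assumed);
    max_gain_db: ceiling chosen to echo the ~10 dB reported in the paper.
    """
    gain_db = max_gain_db * np.clip(zoom_level, 0.0, 1.0)
    return masked_signal * 10.0 ** (gain_db / 20.0)  # dB -> linear scale
```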

Individual Audio-Driven Talking Head Generation based on Sequence of Landmark (랜드마크 시퀀스를 기반으로 한 개별 오디오 구동 화자 생성)

  • Son Thanh-Hoang Vo;Quang-Vinh Nguyen;Hyung-Jeong Yang;Jieun Shin;Seungwon Kim;Soo-Hyung Kim
    • Annual Conference of KIPS
    • /
    • 2024.10a
    • /
    • pp.553-556
    • /
    • 2024
  • Talking head generation is a highly practical task, closely tied to current technology, with a wide range of applications in everyday life; it can be of great help in photography and online conversation as well as in education and medicine. In this paper, the authors propose a novel approach for individual audio-driven talking head generation by leveraging a sequence of landmarks and employing a diffusion model for image reconstruction. Building upon previous landmark-based methods and advancements in generative models, the authors introduce an optimized noise-addition technique designed to enhance the model's ability to learn temporal information from the input data. The proposed method outperforms recent approaches on metrics such as Landmark Distance (LD) and Structural Similarity Index Measure (SSIM), demonstrating the effectiveness of the diffusion model in this domain. However, challenges remain in optimization; the paper conducts ablation studies to identify these issues and outlines directions for future development.
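
The abstract does not spell out the optimized noise-addition technique, so the sketch below shows only the vanilla DDPM forward process on a landmark sequence that such a technique would presumably modify; the shapes and schedule are assumptions.

```python
import torch

def ddpm_add_noise(landmarks, t, betas):
    """Vanilla DDPM forward process q(x_t | x_0) applied to a landmark
    sequence, e.g. of shape (frames, 68, 2). The paper's 'optimized noise
    addition' is not public; this is only the standard baseline."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]  # cumulative schedule
    noise = torch.randn_like(landmarks)
    noisy = alpha_bar.sqrt() * landmarks + (1.0 - alpha_bar).sqrt() * noise
    return noisy, noise
```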

A Flat Loudspeaker Driven by a Multi-layer Diaphragm (다층 진동판으로 구동되는 평판 스피커)

  • Yi H.R.;Kim B.N.;Oh S.J.
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • spring
    • /
    • pp.131-136
    • /
    • 2004
  • The driving characteristics of a loudspeaker depend not only on the size and shape of its diaphragm but also on its material and internal structure. In this study, we observed how the loudspeaker's characteristics change when a cavity is formed between two diaphragms driven by the same input signal, and when a porous sound-absorbing material is inserted into that cavity. In particular, for the porous-absorber case, the volume of the porous material was kept constant, and the effects of the diaphragm's surface condition and of the vibration transmission path along the diaphragm were eliminated, so that the measured changes in characteristics could be attributed to the internal material alone.

  • PDF

Development of a Real-time Vehicle Driving Simulator

  • Kim, Hyun-Ju;Park, Min-Kyu;Lee, Min-Cheoul;You, Wan-Suk
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference
    • /
    • 2001.10a
    • /
    • pp.51.2-51
    • /
    • 2001
  • A vehicle driving simulator is a virtual-reality device that makes a person feel as if they were actually driving a vehicle. The driving simulator is used effectively for studying driver-vehicle interaction and for developing new vehicle-system concepts. It consists of a motion platform, a motion controller, a visual and audio system, a vehicle dynamic analysis system, a vehicle operation system, and so on. The vehicle dynamic analysis system supervises overall operation of the simulator and also simulates the dynamic motion of a multi-body vehicle model in real time. In this paper, the main procedures for developing the driving simulator are classified into four parts. First, a motion platform and motion controller generate realistic motion using a hydraulically driven six-degree-of-freedom Stewart platform (a minimal inverse-kinematics sketch follows this entry). Secondly, a visual system generates high-fidelity visual scenes which are displayed on a screen ...

  • PDF
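
To make the Stewart-platform component concrete, here is a minimal inverse-kinematics sketch: the six actuator lengths needed to realize a desired platform pose. The attachment-point geometry and all names are generic assumptions, not taken from this simulator.

```python
import numpy as np

def stewart_leg_lengths(base_pts, plat_pts, pos, rpy):
    """Inverse kinematics of a 6-DOF Stewart platform: return the six
    actuator lengths for a desired platform position `pos` (x, y, z) and
    roll/pitch/yaw angles `rpy` in radians. The 6x3 attachment-point
    arrays `base_pts` and `plat_pts` are device-specific and assumed."""
    r, p, y = rpy
    cr, sr = np.cos(r), np.sin(r)
    cp, sp = np.cos(p), np.sin(p)
    cy, sy = np.cos(y), np.sin(y)
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    R = Rz @ Ry @ Rx                        # yaw-pitch-roll convention
    legs = pos + plat_pts @ R.T - base_pts  # world-frame leg vectors
    return np.linalg.norm(legs, axis=1)
```

The hydraulic controller then servos each actuator to its computed length; the paper's approach of estimating platform pose with a gyro and accelerometers (entry below) sidesteps the harder forward-kinematics problem.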

A Study on Speech Recognition Technology Using Artificial Intelligence Technology (인공 지능 기술을 이용한 음성 인식 기술에 대한 고찰)

  • Young Jo Lee;Ki Seung Lee;Sung Jin Kang
    • Journal of the Semiconductor & Display Technology
    • /
    • v.23 no.3
    • /
    • pp.140-147
    • /
    • 2024
  • This paper explores recent advancements in speech recognition technology, focusing on the integration of artificial intelligence to improve recognition accuracy in challenging environments, such as noisy or low-quality audio conditions. Traditional speech recognition methods often suffer from performance degradation in noisy settings, but the application of deep neural networks (DNNs) has led to significant improvements, enabling more robust and reliable recognition in various industries, including banking, automotive, healthcare, and manufacturing. A key area of advancement is the use of Silent Speech Interfaces (SSIs), which allow communication through non-speech signals, such as visual cues or auxiliary signals like ultrasound and electromyography, making them particularly useful for individuals with speech impairments. The paper further discusses the development of multi-modal speech recognition, combining audio and visual inputs to enhance recognition accuracy in noisy environments (a toy fusion sketch follows this entry). Recent research into lip-reading technology and deep learning architectures such as CNNs and RNNs has significantly improved speech recognition by extracting meaningful features from video signals, even in difficult lighting conditions. Additionally, the paper covers self-supervised learning techniques, like AV-HuBERT, which leverage large-scale unlabeled audiovisual datasets to improve performance. The future of speech recognition technology is likely to see further integration of AI-driven methods, making it more applicable across diverse industries and for individuals with communication challenges. The conclusion emphasizes the need for further research, especially in languages with complex morphological structures, such as Korean.

  • PDF
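
As a toy illustration of the audio-visual combination the survey describes, the sketch below blends per-class scores from the two streams, down-weighting audio as the estimated SNR drops. The sigmoid weighting rule is an illustrative assumption, not the mechanism of AV-HuBERT or any system the survey cites.

```python
import torch

def late_fusion(audio_logits, visual_logits, snr_db):
    """Toy late fusion for audio-visual speech recognition: trust the
    audio stream less as estimated SNR falls, leaning on the visual
    (lip-reading) stream in noisy conditions."""
    w = torch.sigmoid(torch.tensor(snr_db / 10.0))  # audio reliability 0..1
    return w * audio_logits + (1.0 - w) * visual_logits
```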

Speech Animation Synthesis based on a Korean Co-articulation Model (한국어 동시조음 모델에 기반한 스피치 애니메이션 생성)

  • Jang, Minjung;Jung, Sunjin;Noh, Junyong
    • Journal of the Korea Computer Graphics Society
    • /
    • v.26 no.3
    • /
    • pp.49-59
    • /
    • 2020
  • In this paper, we propose a speech animation synthesis method specialized for Korean, based on a rule-based co-articulation model. Speech animation is widely used in cultural industries such as movies, animation, and games that require natural and realistic motion. Because audio-driven speech animation techniques have mainly been developed for English, however, the results for domestic content are often visually very unnatural: for example, dubbing by a voice actor is played with no mouth motion at all, or at best with an unsynchronized loop of simple mouth shapes. Although there are language-independent speech animation models, they are not specialized for Korean and do not yet provide the quality required for domestic content production. Therefore, we propose a natural speech animation synthesis method, driven by input audio and text, that reflects the linguistic characteristics of Korean. Reflecting the fact that vowels largely determine mouth shape in Korean, we define a co-articulation model that separates the lips and the tongue, solving the previous problems of lip distortion and occasionally missing phoneme characteristics. Our model also reflects differences in prosodic features for improved dynamics in the resulting animation. Through user studies, we verify that the proposed model synthesizes natural speech animation.
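
A minimal sketch of the vowel-driven, lips/tongue-separated idea: map a decomposed-jamo sequence to separate lip and tongue tracks, letting vowels drive the lip shape. The tiny rule tables here are hypothetical stand-ins; the paper's actual co-articulation rules are far richer.

```python
# Hypothetical rule tables; the paper's actual model is much richer.
VOWEL_LIPS = {"ㅏ": "open", "ㅗ": "round", "ㅜ": "round",
              "ㅣ": "spread", "ㅡ": "neutral"}
CONS_TONGUE = {"ㄴ": "tip_up", "ㄷ": "tip_up",
               "ㄱ": "back_up", "ㄹ": "tip_flap"}

def split_visemes(jamo_seq):
    """Split a decomposed-jamo sequence into separate lip and tongue
    tracks, with vowels driving the lip shape as the model prescribes."""
    lips = [VOWEL_LIPS[j] for j in jamo_seq if j in VOWEL_LIPS]
    tongue = [CONS_TONGUE[j] for j in jamo_seq if j in CONS_TONGUE]
    return lips, tongue

# e.g. split_visemes("ㅏㄴ") -> (["open"], ["tip_up"])
```

Keeping the two tracks independent is what lets a tongue gesture from a consonant overlap with the lip shape of the surrounding vowels, which is the essence of co-articulation.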

Study on frequency response of implantable microphone and vibrating transducer for the gain compensation of implantable middle ear hearing aid (이식형 마이크로폰과 진동체를 갖는 인공중이의 이득 보상을 위한 주파수 특성 고찰)

  • Jung, Eui-Sung;Seong, Ki-Woong;Lim, Hyung-Gyu;Lee, Jang-Woo;Kim, Dong-Wook;Lee, Jyung-Hyun;Kim, Myoung-Nam;Cho, Jin-Ho
    • Journal of Sensor Science and Technology
    • /
    • v.19 no.5
    • /
    • pp.361-368
    • /
    • 2010
  • The ACROSS device, which is composed of an implantable microphone, a signal processor, and a vibrating transducer, is a fully implantable middle-ear hearing device (F-IMEHD) for patients with hearing loss. Because the microphone is implanted under skin and tissue at the temporal bone, the amplitude of incoming sound is attenuated by absorption and scattering, and the vibrating transducer attached to the ossicular chain also produces a displacement characteristic different from that of the stapes. For gain control of auditory signals, however, most implantable hearing devices with a digital audio signal processor still apply conventional hearing-aid fitting rules without regard to these effects, so the influence of the implanted microphone and the vibrating transducer must be taken into account before conventional fitting rules can be used. The aim of this study was to measure the gain characteristics caused by the implanted microphone and by the vibrating transducer attached to the ossicular chain, for the gain compensation of the ACROSS device. The differential floating mass transducer (DFMT) of the ACROSS device was clipped onto four cadaver temporal bones, and the displacement of the ossicular chain with the DFMT driven by a 1 mA (peak) current was measured using a laser Doppler vibrometer. The sensitivity of microphones placed under sampled pig skin and the back skin of three rats was measured with pure-tone stimuli at frequencies from 0.1 to 8.9 kHz. We confirmed that the microphone implanted under skin showed a poorer frequency response in the acoustic high-frequency band than in the low- to mid-frequency band, that the resonant frequency of stapes vibration was changed by attaching the DFMT to the incus, and that in the resonance frequency range the displacement produced by the DFMT driven at 1 mA (rms) was about 20 dB greater than that of the cadaver stapes driven by a sound pressure of 94 dB SPL.

Compensation of Low-Frequency Resonance in Current-Driven Loudspeakers Using DSP (DSP를 이용한 전류구동 스피커의 저주파 공진 보상)

  • Park, Jong-phil;Eun, Changsoo
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.584-588
    • /
    • 2021
  • Loudspeaker impedance is often assumed to be a fixed value, but it actually varies with frequency, most markedly around the resonant frequency. The sound pressure level of a loudspeaker is determined by the current flowing through its voice coil, so when a loudspeaker is driven by voltage, its sound pressure level is distorted by the impedance variation. Current drive of loudspeakers solves this problem, but sound-pressure distortion still occurs at low frequencies due to resonance, which can degrade the sound quality of the system. To solve this problem, this paper proposes a resonance compensation circuit using a DSP. We simulate the audio system using an equivalent loudspeaker model to verify the sound-pressure distortion caused by impedance variation, and we propose a circuit to compensate for it. The proposed circuit is built around a state-variable filter whose center frequency and output level can be adjusted, so it can be used in various sound systems (a textbook state-variable-filter sketch follows this entry).

  • PDF
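
A textbook Chamberlin state-variable filter, sketched below, provides exactly the tunable center frequency and Q such a compensation circuit needs. The coefficients and structure are standard DSP material, not the paper's exact design.

```python
import numpy as np

def svf_bandpass(x, fs, f0, q):
    """Chamberlin state-variable filter, band-pass output, with tunable
    center frequency f0 (Hz) and quality factor q. Textbook form; stable
    when f0 is well below fs/6."""
    x = np.asarray(x, dtype=float)
    f = 2.0 * np.sin(np.pi * f0 / fs)  # frequency coefficient
    low = band = 0.0
    out = np.empty_like(x)
    for n, s in enumerate(x):
        high = s - low - band / q      # high-pass node
        band += f * high               # band-pass integrator
        low += f * band                # low-pass integrator
        out[n] = band
    return out
```

Because f0 and q enter only through the two coefficients, both can be retuned at run time, which matches the adjustable center frequency and output the abstract describes.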

Analysis and Performance Evaluation of a Parallel-Type Vehicle Driving Simulator (병렬구조형 차량운전 모사장치의 성능평가 및 분석)

  • 박일경;박경균;김정하;이운성
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference
    • /
    • 1997.10a
    • /
    • pp.1481-1484
    • /
    • 1997
  • A vehicle driving simulator predicts vehicle motion through real-time simulation driven by the driver's steering, accelerating, and braking inputs, and renders that motion with visual and audio cues and a washout algorithm, giving the driver a vivid sense of reality. Combined with a vehicle integration control system, a driving simulator is used to analyze vehicle controllability, steering capacity, and safety in various simulated environments; it also allows driver reactions and vehicle safety factors to be analyzed, promoting traffic safety without risk to the driver. The main development procedures are classified into three parts: first, a motion base system that can execute the motion cues must be developed; second, real-time vehicle software that computes the vehicle dynamics must be constructed; and third, the driving simulator must be integrated, interconnecting the visual system with the motion base. In this study, we address the design of the motion base for a vehicle driving simulator and its real-time control, using an extra gyro sensor and accelerometers to find the position and orientation of the moving platform instead of calculating the forward kinematics. To drive the motion base, we use National Instruments' LabVIEW software. Furthermore, we use a vehicle-motion analysis module and a washout-algorithm module to complete a driving simulator that can be driven by a human in reality, and we experiment with various vehicle motion conditions (a minimal washout-filter sketch follows this entry).

  • PDF
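
To illustrate the washout-algorithm module in its simplest classical form, the sketch below high-passes a specific-force signal so the motion base reproduces onset cues and then drifts back to its neutral pose within its limited workspace. The filter order and cutoff are generic textbook choices, not this paper's tuning.

```python
import numpy as np
from scipy.signal import butter, lfilter

def classical_washout(accel, fs, fc=0.5):
    """Classical washout: high-pass filter the vehicle's specific-force
    signal so the platform renders transient accelerations and slowly
    'washes out' sustained ones. fc (Hz) and the 2nd-order Butterworth
    are generic assumptions."""
    b, a = butter(2, fc / (fs / 2.0), btype="highpass")
    return lfilter(b, a, np.asarray(accel, dtype=float))
```

Full washout schemes add tilt coordination, which leans the platform to mimic sustained acceleration via gravity, but the high-pass channel above is the core idea the abstract refers to.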