Search | Korea Science

A Study on RTP-based Lip Synchronization Control for Very Low Delay in Video Communication (초저지연 비디오 통신을 위한 RTP 기반 립싱크 제어 기술에 관한 연구)

Kim, Byoung-Yong;Lee, Dong-Jin;Kwon, Jae-Cheol;Sim, Dong-Gyu
- Journal of Korea Multimedia Society / v.10 no.8 / pp.1039-1051 / 2007
This paper proposes a new lip synchronization control method for achieving very low delay in video communication. Lip synchronization control is as vital in video communication as delay reduction. Conventionally, lip synchronization is controlled using the playtime and capture time calculated from the RTP timestamp. The RTP timestamp is created by the stream sender and sent to the receiver along with the stream; the receiver extracts it from each received packet to calculate the playtime and capture time. The proposed method searches for the frame that most closely corresponds to the audio signal, which is assumed to be played at a uniform speed. The encoding buffer of the stream sender is removed to reduce buffering delay. In addition, the decoder buffer of the receiver, which is used to correct corrupted packets, is limited to handling only 3 frames. These mechanisms achieve an ultra-low delay of less than 100 ms, which is essential to video communication. In simulations, the proposed method kept the delay below 100 ms while maintaining lip synchronization between audio and video.
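The timestamp-based matching described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the clock rates, the function names, and the assumption that both streams start at the same capture instant are all hypothetical, and "most closely corresponding" is interpreted here as nearest in capture time.

```python
from bisect import bisect_left

# Assumed clock rates (not stated in the abstract): 90 kHz is a common RTP
# video clock and 8 kHz a typical narrowband audio clock.
VIDEO_CLOCK_HZ = 90_000
AUDIO_CLOCK_HZ = 8_000


def rtp_to_seconds(timestamp, first_timestamp, clock_hz):
    """Convert an RTP timestamp to seconds since the first packet.

    The modulo handles wrap-around of the 32-bit RTP timestamp field.
    """
    return ((timestamp - first_timestamp) % (1 << 32)) / clock_hz


def nearest_video_frame(audio_time, video_times):
    """Return the index of the video frame closest in time to audio_time.

    video_times must be sorted in ascending order (seconds).
    """
    if not video_times:
        raise ValueError("no video frames available")
    i = bisect_left(video_times, audio_time)
    if i == 0:
        return 0
    if i == len(video_times):
        return len(video_times) - 1
    before, after = video_times[i - 1], video_times[i]
    # Prefer the earlier frame on a tie.
    return i if (after - audio_time) < (audio_time - before) else i - 1


# Hypothetical usage: four video frames at 30 fps and one audio packet.
video_times = [rtp_to_seconds(ts, 0, VIDEO_CLOCK_HZ) for ts in (0, 3000, 6000, 9000)]
audio_time = rtp_to_seconds(560, 0, AUDIO_CLOCK_HZ)
frame_index = nearest_video_frame(audio_time, video_times)
```

Audio drives the search because the abstract assumes audio plays at a uniform speed, so video frames are chosen to follow it rather than the reverse.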

Text-driven Speech Animation with Emotion Control

Chae, Wonseok;Kim, Yejin
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.8 / pp.3473-3487 / 2020
In this paper, we present a new approach to creating speech animation with emotional expressions using a small set of example models. To generate realistic facial animation, two example models called key visemes and expressions are used for lip-synchronization and facial expressions, respectively. The key visemes represent lip shapes of phonemes such as vowels and consonants while the key expressions represent basic emotions of a face. Our approach utilizes a text-to-speech (TTS) system to create a phonetic transcript for the speech animation. Based on a phonetic transcript, a sequence of speech animation is synthesized by interpolating the corresponding sequence of key visemes. Using an input parameter vector, the key expressions are blended by a method of scattered data interpolation. During the synthesizing process, an importance-based scheme is introduced to combine both lip-synchronization and facial expressions into one animation sequence in real time (over 120Hz). The proposed approach can be applied to diverse types of digital content and applications that use facial animation with high accuracy (over 90%) in speech recognition.
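The step of interpolating a sequence of key visemes along a phonetic transcript can be sketched as follows. This is an illustration only: the abstract does not say which interpolation is used, so linear interpolation is assumed, and the shape vectors, timings, and function names are hypothetical.

```python
def lerp_shape(a, b, t):
    """Linearly blend two lip-shape parameter vectors, with t in [0, 1]."""
    return [x + (y - x) * t for x, y in zip(a, b)]


def viseme_at(time, keys):
    """Return the interpolated lip shape at the given time.

    keys is a list of (time_seconds, shape_vector) pairs with strictly
    increasing times, one per phoneme of the transcript.
    """
    if time <= keys[0][0]:
        return list(keys[0][1])
    if time >= keys[-1][0]:
        return list(keys[-1][1])
    for (t0, s0), (t1, s1) in zip(keys, keys[1:]):
        if t0 <= time <= t1:
            return lerp_shape(s0, s1, (time - t0) / (t1 - t0))


# Hypothetical key visemes: [mouth_open, lip_round] at phoneme onsets.
keys = [(0.0, [0.0, 0.0]), (0.2, [1.0, 0.5]), (0.4, [0.0, 1.0])]
shape = viseme_at(0.1, keys)
```

Sampling `viseme_at` once per rendered frame yields the animation sequence; the expression blending and importance-based combination mentioned in the abstract are not covered by this sketch.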
https://doi.org/10.3837/tiis.2020.08.018

Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

Dongryun Yoon;Hyeonjoong Cho
- The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.166-173 / 2024
This study explores lip-sync detection, focusing on the synchronization between lip movements and voices in videos. Typically, lip-sync detection techniques involve cropping the facial area of a given video, utilizing the lower half of the cropped box as input for the visual encoder to extract visual features. To enhance the emphasis on the articulatory region of lips for more accurate lip-sync detection, we propose utilizing a pre-trained visual attention-based encoder. The Visual Transformer Pooling (VTP) module is employed as the visual encoder, originally designed for the lip-reading task, predicting the script based solely on visual information without audio. Our experimental results demonstrate that, despite having fewer learning parameters, our proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% based on five context frames. Moreover, our approach exhibits an approximately 8% superiority over VocaList in lip-sync detection accuracy, even on an untrained dataset, Acappella.
https://doi.org/10.3745/TKIPS.2024.13.4.166

Synchronization of Synthetic Facial Image Sequences and Synthetic Speech for Virtual Reality (가상현실을 위한 합성얼굴 동영상과 합성음성의 동기구현)

최장석;이기영
- Journal of the Korean Institute of Telematics and Electronics S / v.35S no.7 / pp.95-102 / 1998
This paper proposes a method for synchronizing synthetic facial image sequences with synthetic speech. LP-PSOLA synthesizes the speech for each demi-syllable. We provide 3,040 demi-syllables for unlimited synthesis of Korean speech. For synthesis of the facial image sequences, the paper defines a total of 11 fundamental patterns for the lip shapes of the Korean consonants and vowels. The fundamental lip shapes allow us to pronounce all Korean sentences. The image synthesis method assigns the fundamental lip shapes to the key frames according to the initial, middle, and final sounds of each syllable in the Korean input text. The method interpolates the naturally changing lip shapes in the in-between frames. The number of in-between frames is estimated from the duration of each syllable of the synthetic speech. This estimation accomplishes synchronization of the facial image sequences and speech. In speech synthesis, disk memory is required to store the 3,040 demi-syllables. In synthesis of the facial image sequences, however, disk memory is required to store only one image, because all frames are synthesized from the neutral face. The above method realizes a synchronized system that can read Korean sentences with synthetic speech and synthetic facial image sequences.
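Two steps in this abstract lend themselves to a short sketch: splitting a Korean syllable into its initial, middle, and final sounds, and estimating the number of in-between frames from the syllable duration. The decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables. The frame-count formula, the 30 fps rate, and the function names are assumptions made for illustration, not the paper's method.

```python
HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable
INITIAL_COUNT, MEDIAL_COUNT, FINAL_COUNT = 19, 21, 28


def decompose(syllable):
    """Return (initial, medial, final) jamo indices of one Hangul syllable.

    A final index of 0 means the syllable has no final consonant.
    """
    code = ord(syllable) - HANGUL_BASE
    if not 0 <= code < INITIAL_COUNT * MEDIAL_COUNT * FINAL_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(code, MEDIAL_COUNT * FINAL_COUNT)
    medial, final = divmod(rest, FINAL_COUNT)
    return initial, medial, final


def inbetween_count(duration_s, key_frames, fps=30):
    """Estimate in-between frames for a syllable lasting duration_s seconds.

    Assumed rule: total frames at the given rate minus the key frames
    already assigned to the syllable's sounds, never below zero.
    """
    total = round(duration_s * fps)
    return max(total - key_frames, 0)


# Hypothetical usage: a syllable with all three sounds, spoken in 0.3 s.
initial, medial, final = decompose("한")
key_frames = 2 + (1 if final else 0)
frames_between = inbetween_count(0.3, key_frames)
```

Tying the frame count to the synthesized syllable duration is what keeps the image sequence the same length as the speech, which is the synchronization mechanism the abstract describes.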

Wireless Network Synchronization Algorithm based on IEEE 802.11 WLANs (Wireless Local Area Networks) for Multimedia Services (멀티미디어 서비스를 위한 IEEE 802.11 WLANs 기반의 무선 네트워크 동기화 알고리즘)

Yoon, Jong-Won;Joung, Jin-Oo
- Journal of the Korea Society of Computer and Information / v.13 no.6 / pp.225-232 / 2008
When a single source of multimedia content is distributed to multiple playback devices, the audio and video require synchronous playback for multi-channel stereo sound and lip synchronization. In-vehicle multimedia systems, in particular, have been studied with a view to moving from legacy wired environments to wireless ones. This paper proposes an advanced algorithm for providing synchronized services for real-time multimedia traffic in IEEE 802.11 WLANs [1]. To this end, we implement an advanced version of the IEEE 1588 Precision Time Protocol [2] and a simulation environment. We also evaluate and analyze the performance of the algorithm, then port it to wireless LAN devices (Linksys wrt-350n AP network device) and run experiments to characterize timing synchronization accuracy.
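For context, the basic IEEE 1588 delay request-response exchange computes a clock offset and path delay from four timestamps. The sketch below shows only that standard calculation, not the advanced variant the paper proposes. The function name is hypothetical, and the formula assumes a symmetric path, which a WLAN link does not guarantee.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Compute (offset, delay) from one PTP exchange.

    t1: master clock time when Sync is sent
    t2: slave clock time when Sync is received
    t3: slave clock time when Delay_Req is sent
    t4: master clock time when Delay_Req is received
    offset is slave minus master; delay is the one-way path delay,
    assuming it is the same in both directions.
    """
    forward = t2 - t1   # path delay + offset
    backward = t4 - t3  # path delay - offset
    delay = (forward + backward) / 2
    offset = (forward - backward) / 2
    return offset, delay


# Hypothetical exchange: slave runs 5 units ahead, path delay is 2 units.
offset, delay = ptp_offset_and_delay(t1=100, t2=107, t3=120, t4=117)
corrected_slave_time = 120 - offset
```

Any asymmetry between the two directions shows up directly as an offset error of half the difference, which is one reason wireless links are harder to synchronize than wired ones.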

DTV Lip-Sync Test Using Embedded Audio-Video Time Indexed Signals (숨겨진 오디오 비디오 시간 인덱스 신호를 사용한 DTV 립싱크 테스트)

한찬호;송규익
- Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.3 / pp.155-162 / 2004
This paper concentrates on a lip synchronization (lip sync) test for DTV with respect to audio and video signals using a finite digital bitstream. We propose a new lip sync test method that does not affect the current program, using transient effect area test signals (TATS) and audio-video time index lip sync test signals (TILS). The experimental results show that the time difference between the audio and video signals can be easily measured at any time from a captured oscilloscope waveform.

A Study on the Implementation of Realtime Phonetic Recognition and LIP-synchronization (실시간 음성인식 및 립싱크 구현에 관한 연구)

Lee, H.H.;Choi, D.I.;Cho, W.Y.
- Proceedings of the KIEE Conference / 2000.11d / pp.812-814 / 2000
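This paper concerns a method of providing lip-synchronization animation driven by real-time speech recognition. It studies a lip-sync method that recognizes given speech information and changes the mouth shape of the animation to match it, so that the speech information is conveyed visually. To express in real time both lip synchronization that is closer to actual human pronunciation and a lively character face, the model takes input from a microphone or similar device, recognizes the speech in real time using a neural network, and morphs a 2D animation according to the recognition result.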

Multicontents Integrated Image Animation within Synthesis for High Quality Multimodal Video (고화질 멀티 모달 영상 합성을 통한 다중 콘텐츠 통합 애니메이션 방법)

Jae Seung Roh;Jinbeom Kang
- Journal of Intelligence and Information Systems / v.29 no.4 / pp.257-269 / 2023
There is currently a burgeoning demand for image synthesis from photos and videos using deep learning models. Existing video synthesis models solely extract motion information from the provided video to generate animation effects on photos. However, these synthesis models encounter challenges in achieving accurate lip synchronization with the audio and maintaining the image quality of the synthesized output. To tackle these issues, this paper introduces a novel framework based on an image animation approach. Within this framework, upon receiving a photo, a video, and audio input, it produces an output that not only retains the unique characteristics of the individuals in the photo but also synchronizes their movements with the provided video, achieving lip synchronization with the audio. Furthermore, a super-resolution model is employed to enhance the quality and resolution of the synthesized output.
https://doi.org/10.13088/jiis.2023.29.4.257

Human-like Fuzzy Lip Synchronization of 3D Facial Model Based on Speech Speed (발화속도를 고려한 3차원 얼굴 모형의 퍼지 모델 기반 립싱크 구현)

Park Jong-Ryul;Choi Cheol-Wan;Park Min-Yong
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.416-419 / 2006
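This paper proposes a new lip synchronization method that takes speech speed into account. From a database built through experiments, the relationship between speech speed and the shape and size of the mouth was established using a fuzzy algorithm. Existing lip-sync methods do not consider speech speed, so they show a constant lip shape and size regardless of how fast the speech is. The proposed method applies the relationship between speech speed and lip shape, enabling lip synchronization that is closer to that of a human. In addition, using fuzzy theory makes it possible to model ambiguous changes in mouth size and shape that cannot be expressed precisely in numerical terms. To demonstrate this, the proposed lip-sync algorithm is compared with the existing method, and a 3D graphics platform is built and applied to an actual application.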
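A fuzzy mapping from speech speed to mouth size could look roughly like the sketch below. Everything in it is hypothetical: the membership functions, the rule outputs, the syllables-per-second unit, and the assumption that faster speech yields a smaller mouth opening are placeholders, since the abstract does not give the relationship the authors derived from their database.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 at or outside the ends, 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical rules: (membership over syllables per second, mouth scale).
RULES = [
    ((0.0, 2.0, 5.0), 1.0),   # slow speech   -> full mouth opening
    ((2.0, 5.0, 8.0), 0.7),   # normal speech -> moderate opening
    ((5.0, 8.0, 11.0), 0.4),  # fast speech   -> reduced opening
]


def mouth_scale(rate):
    """Defuzzify by the weighted average of the rule outputs."""
    fired = [(triangular(rate, *mf), out) for mf, out in RULES]
    total = sum(weight for weight, _ in fired)
    if total == 0:
        raise ValueError("speech rate outside the modeled range")
    return sum(weight * out for weight, out in fired) / total


# Hypothetical usage: a rate halfway between "slow" and "normal".
scale = mouth_scale(3.5)
```

Because neighboring membership functions overlap, the output changes smoothly with speech rate instead of jumping between fixed mouth sizes, which is the kind of imprecise transition the abstract says fuzzy theory is used to model.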

Lip and Voice Synchronization with SMS Messages for Mobile 3D Avatar (SMS 메시지에 따른 모바일 3D 아바타의 입술 모양과 음성 동기화)

Youn, Jae-Hong;Song, Yong-Gyu;Kim, Eun-Seok;Hur, Gi-Taek
- Proceedings of the Korea Contents Association Conference / 2006.11a / pp.682-686 / 2006
Interest in 3D mobile content services has been increasing with the emergence of terminals equipped with a mobile 3D engine and the growth of the mobile content market. The mobile 3D avatar is the most effective product for displaying the character of a personalized mobile device user. However, previous studies on methods of expressing 3D avatars have mainly focused on natural and realistic expression through changes in the facial expressions and lip shape of a character in PC-based virtual environments. In this paper, we propose a method of synchronizing lip shape with voice by applying an SMS message received in a mobile environment to a 3D mobile avatar. The proposed method enables a natural and effective SMS message reading service for mobile avatars by breaking a received message sentence into syllable units and then synchronizing the lip shape of the 3D avatar with the corresponding voice.

Search Result 15, Processing Time 0.024 seconds
