Title/Summary/Keyword: Lip-Reading Technology

Korean Lip-Reading: Data Construction and Sentence-Level Lip-Reading (한국어 립리딩: 데이터 구축 및 문장수준 립리딩)

  • Sunyoung Cho;Soosung Yoon
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.2 / pp.167-176 / 2024
  • Lip-reading is the task of inferring the speaker's utterance from silent video by learning lip movements. It is very challenging due to inherent ambiguities in lip movement, such as different characters producing the same lip appearance. Recent advances in deep learning models such as the Transformer and the Temporal Convolutional Network have improved lip-reading performance. However, most previous work deals with English lip-reading, which limits direct application to Korean, and there has been no large-scale Korean lip-reading dataset. In this paper, we introduce the first large-scale Korean lip-reading dataset, with more than 120k utterances collected from TV broadcasts including news, documentaries, and dramas. We also present a preprocessing method that uniformly extracts a facial region of interest, and propose a transformer-based model over grapheme units for sentence-level Korean lip-reading. We demonstrate through dataset statistics and experimental results that our dataset and model are appropriate for Korean lip-reading.
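The abstract describes a transformer over grapheme units fed by a fixed facial region of interest. As a rough illustration of that shape of model (not the authors' architecture: the 3D-conv frontend, layer counts, and the ~70-symbol grapheme inventory below are assumptions), a minimal PyTorch sketch:

```python
# Minimal sketch of a grapheme-level, transformer-based sentence lip-reader.
# Layer sizes, the 3D-conv frontend, and the grapheme inventory are
# illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

NUM_GRAPHEMES = 70   # assumed: Korean jamo + punctuation + <sos>/<eos>/<pad>
D_MODEL = 256

class LipReader(nn.Module):
    def __init__(self):
        super().__init__()
        # Spatiotemporal frontend: (B, 1, T, H, W) mouth crops -> per-frame features
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(5, 5, 5), stride=(1, 2, 2), padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 4, 4)),   # keep time, pool space to 4x4
        )
        self.proj = nn.Linear(32 * 4 * 4, D_MODEL)
        self.embed = nn.Embedding(NUM_GRAPHEMES, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True,
        )
        self.out = nn.Linear(D_MODEL, NUM_GRAPHEMES)

    def forward(self, video, tokens):
        # video: (B, 1, T, H, W); tokens: (B, L) grapheme ids (teacher forcing)
        f = self.frontend(video)                   # (B, 32, T, 4, 4)
        f = f.permute(0, 2, 1, 3, 4).flatten(2)    # (B, T, 32*4*4)
        src = self.proj(f)                         # (B, T, D_MODEL)
        tgt = self.embed(tokens)                   # (B, L, D_MODEL)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(h)                         # (B, L, NUM_GRAPHEMES) logits

model = LipReader()
video = torch.randn(2, 1, 25, 64, 64)              # 2 clips, 25 frames, 64x64 mouth ROI
tokens = torch.randint(0, NUM_GRAPHEMES, (2, 12))
logits = model(video, tokens)                      # train with cross-entropy over graphemes
print(logits.shape)                                # torch.Size([2, 12, 70])
```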

Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance (음성인식에서 입술 파라미터 열화에 따른 견인성 연구)

  • Kim Jinyoung;Shin Dosung;Choi Seungho
    • Proceedings of the KSPS conference / 2002.11a / pp.205-208 / 2002
  • Bimodal speech recognition based on lip reading has been studied as a representative approach to speech recognition in noisy environments. There are three ways to integrate the speech and lip modalities: direct identification, separate identification, and dominant recording. In this paper we evaluate the robustness of lip-reading methods under the assumption that lip parameters are estimated with errors. Our lip-reading experiments show that the dominant recording approach is more robust than the other methods. We also propose a measure of lip parameter degradation, which can be used to determine the weighting values for the video information.
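For readers unfamiliar with score-level integration, here is a minimal sketch of how a lip-parameter degradation measure could set the weighting value for the video stream. The log-linear combination rule and the exponential degradation-to-weight mapping are illustrative assumptions, not the paper's formulation:

```python
# Minimal sketch of score-level (late) fusion of audio and lip streams, where a
# lip-parameter degradation measure discounts the video weight. The combination
# rule and the degradation-to-weight mapping are illustrative assumptions.
import numpy as np

def fuse_log_likelihoods(audio_ll, video_ll, video_weight):
    """Weighted log-linear combination over candidate words.

    audio_ll, video_ll: arrays of per-word log-likelihoods.
    video_weight: in [0, 1]; 0 falls back to audio-only recognition.
    """
    return (1.0 - video_weight) * audio_ll + video_weight * video_ll

def video_weight_from_degradation(degradation, max_weight=0.5):
    # Assumed mapping: the more degraded the lip parameter estimates,
    # the less the video stream is trusted.
    return max_weight * np.exp(-degradation)

audio_ll = np.array([-12.0, -9.5, -11.0])   # e.g. HMM scores for 3 candidate words
video_ll = np.array([-8.0, -10.0, -7.5])
w = video_weight_from_degradation(degradation=0.8)
fused = fuse_log_likelihoods(audio_ll, video_ll, w)
print("best word:", int(np.argmax(fused)))
```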

A Study on Lip-reading Enhancement Using RASTA Filter (RASTA 필터를 이용한 립리딩 성능향상에 관한 연구)

  • Shin Dosung;Kim Jinyoung;Choi Seungho;Kim Sanghun
    • Proceedings of the KSPS conference / 2002.11a / pp.191-194 / 2002
  • Lip-reading technology is studied to compensate for speech recognition degradation in noisy environments as part of a bimodal system. The most important step in lip-reading is finding the correct lip area, but stable performance is hard to guarantee in dynamic environments. To compensate, we used a RASTA filter, which performs well at removing noise from speech; it improves performance as a digital filter operating in the time domain. To observe speech recognition performance using image information only, we chose 22 words usable in an in-car service and conducted recognition experiments in a car. We used hidden Markov models as the recognition algorithm to compare the recognition performance of these words.
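The classic RASTA filter is a band-pass IIR filter applied to each parameter's trajectory over time, suppressing slowly varying offsets (e.g. illumination or channel bias) and fast frame-to-frame noise. A minimal sketch with the standard coefficients, applied here to generic per-frame lip parameters since the abstract does not specify the exact feature pipeline:

```python
# Minimal sketch of classic RASTA filtering of feature trajectories (scipy
# assumed). In speech it is usually applied to log-spectral trajectories; the
# paper applies the same idea to lip-reading features.
import numpy as np
from scipy.signal import lfilter

def rasta_filter(features):
    """features: (T, D) matrix of per-frame parameters, filtered along time."""
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])  # classic RASTA numerator
    a = np.array([1.0, -0.98])                       # classic RASTA pole
    return lfilter(b, a, features, axis=0)

feats = np.random.randn(100, 12)   # e.g. 12 lip-shape parameters over 100 frames
smoothed = rasta_filter(feats)     # band-passed parameter trajectories
print(smoothed.shape)              # (100, 12)
```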

An Experimental Multimodal Command Control Interface for Car Navigation Systems

  • Kim, Kyungnam;Ko, Jong-Gook;Choi, SeungHo;Kim, Jin-Young;Kim, Ki-Jung
    • Proceedings of the IEEK Conference / 2000.07a / pp.249-252 / 2000
  • An experimental multimodal system combining natural input modes such as speech, lip movement, and gaze is proposed in this paper. It benefits from novel human-computer interaction (HCI) modalities and from multimodal integration to tackle the HCI bottleneck. The system allows the user to select menu items on the screen by employing speech recognition, lip reading, and gaze tracking components in parallel, with face tracking as a supplementary component to gaze tracking and lip movement analysis. These key components are reviewed, and preliminary results of multimodal integration and user testing on the prototype system are shown. Notably, the system equipped with gaze tracking and lip reading is very effective in noisy environments, where the speech recognition rate is low and unstable. Our long-term interest is to build a user interface embedded in a commercial car navigation system (CNS).
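A minimal sketch of the parallel integration idea: each modality scores the visible menu items and the scores are combined. The weighted log-score rule, the weights, and the menu items are illustrative assumptions, not the prototype's actual method:

```python
# Minimal sketch of parallel multimodal menu selection: speech, lip reading,
# and gaze each produce a distribution over menu items, combined by a weighted
# log-score sum. All numbers here are illustrative assumptions.
import numpy as np

MENU = ["zoom in", "zoom out", "route home", "nearest gas station"]

def combine(scores_by_modality, weights):
    # Each score vector is a normalized distribution over MENU items.
    log_fused = sum(w * np.log(s + 1e-9) for s, w in zip(scores_by_modality, weights))
    return MENU[int(np.argmax(log_fused))]

speech = np.array([0.1, 0.2, 0.6, 0.1])   # speech recognizer posterior
lips   = np.array([0.2, 0.2, 0.5, 0.1])   # lip-reading posterior
gaze   = np.array([0.0, 0.1, 0.8, 0.1])   # dwell-time distribution from gaze tracker

# In noise, down-weight speech and lean on gaze + lips:
print(combine([speech, lips, gaze], weights=[0.2, 0.3, 0.5]))
```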

Korean Lip Reading System Using MobileNet (MobileNet을 이용한 한국어 입모양 인식 시스템)

  • Won-Jong Lee;Joo-Ah Kim;Seo-Won Son;Dong Ho Kim
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.211-213 / 2022
  • Lip Reading (독순술) is the technique of determining what a speaker is saying by watching the movements of the lips. In this paper, we use as data ten sentences spoken in MBC and SBS news closing segments, and we present the results of a sentence recognition study based on the speaker's lip shapes, using MobileNet, a CNN (Convolutional Neural Network) architecture designed to run on mobile devices, as the model. The goal of this study is to recognize Korean lip shapes using MobileNet and LSTM. We split the news closing videos into frames and collect the ten experimental sentences to build a dataset, detect and extract the lips from the input utterance video, and then perform preprocessing. We then train MobileNet and LSTM on the lip shapes uttering the news closing sentences and run experiments to measure the accuracy.
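A minimal sketch of the described pipeline (MobileNet per-frame features, an LSTM over the frame sequence, a 10-way sentence classifier), assuming torchvision's MobileNetV2 and an arbitrary hidden size rather than the authors' exact configuration:

```python
# Minimal sketch of the MobileNet + LSTM lip-reading pipeline: a MobileNet
# backbone encodes each mouth-crop frame, an LSTM aggregates the sequence, and
# a classifier picks one of the 10 closing sentences. Versions and sizes are
# assumptions (torchvision MobileNetV2, hidden size 256).
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

class SentenceLipReader(nn.Module):
    def __init__(self, num_sentences=10, hidden=256):
        super().__init__()
        backbone = mobilenet_v2(weights=None)     # no pretrained download needed
        self.cnn = backbone.features              # (B, 1280, h, w) feature maps
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.lstm = nn.LSTM(1280, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_sentences)

    def forward(self, clips):                     # clips: (B, T, 3, 96, 96)
        b, t = clips.shape[:2]
        x = clips.flatten(0, 1)                   # fold time into batch: (B*T, 3, 96, 96)
        f = self.pool(self.cnn(x)).flatten(1)     # (B*T, 1280)
        f = f.view(b, t, -1)                      # (B, T, 1280)
        _, (h, _) = self.lstm(f)                  # final hidden state summarizes the clip
        return self.head(h[-1])                   # (B, 10) sentence logits

model = SentenceLipReader()
logits = model(torch.randn(2, 16, 3, 96, 96))     # 2 clips of 16 preprocessed frames
print(logits.shape)                               # torch.Size([2, 10])
```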

Isolation and Expression Analysis of a GDSL-like Lipase Gene from Brassica napus L.

  • Ling, Hua;Zhao, Jingya;Zuo, Kaijing;Qiu, Chengxiang;Yao, Hongyan;Qin, Jie;Sun, Xiaofen;Tang, Kexuan
    • BMB Reports / v.39 no.3 / pp.297-303 / 2006
  • As lipolytic enzymes, GDSL lipases play an important role in plant growth and development. In order to identify their functions and roles, the full-length cDNA of a GDSL lipase gene, designated BnLIP2, was isolated from Brassica napus L. BnLIP2 was 1,300 bp long, with a 1,122 bp open reading frame (ORF) encoding 373 amino acid residues. Sequence analysis indicated that BnLIP2 belongs to the GDSL family. Southern blot analysis indicated that BnLIP2 belongs to a small gene family in the rapeseed genome. RT-PCR analysis revealed that BnLIP2 is expressed tissue-specifically during reproductive growth and strongly expressed during seed germination. BnLIP2 expression could not be detected until three days after germination, after which it became stronger. The transcript of this gene was absent in the roots of seedlings at different growth stages. When juvenile seedlings were treated with methyl jasmonate (MeJ), salicylic acid (SA), and naphthalene acetic acid (NAA), BnLIP2 expression could not be induced in the root. Our study suggests that BnLIP2 probably plays an important role in rapeseed germination, morphogenesis, and flowering, independent of root growth and development.
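As a quick consistency check on the reported figures: 1,122 bp ÷ 3 bp per codon = 374 codons, i.e. 373 amino acid residues plus one stop codon, in agreement with the stated 373 residues.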

A Study on Speech Recognition Technology Using Artificial Intelligence Technology (인공 지능 기술을 이용한 음성 인식 기술에 대한 고찰)

  • Young Jo Lee;Ki Seung Lee;Sung Jin Kang
    • Journal of the Semiconductor & Display Technology / v.23 no.3 / pp.140-147 / 2024
  • This paper explores recent advancements in speech recognition technology, focusing on the integration of artificial intelligence to improve recognition accuracy in challenging environments, such as noisy or low-quality audio conditions. Traditional speech recognition methods often suffer from performance degradation in noisy settings, but the application of deep neural networks (DNN) has led to significant improvements, enabling more robust and reliable recognition in various industries, including banking, automotive, healthcare, and manufacturing. A key area of advancement is the use of Silent Speech Interfaces (SSI), which allow communication through non-speech signals, such as visual cues or auxiliary signals like ultrasound and electromyography, making them particularly useful for individuals with speech impairments. The paper further discusses the development of multi-modal speech recognition, combining audio and visual inputs, which enhances recognition accuracy in noisy environments. Recent research into lip-reading technology and the use of deep learning architectures such as CNN and RNN has significantly improved speech recognition by extracting meaningful features from video signals, even in difficult lighting conditions. Additionally, the paper covers self-supervised learning techniques, like AV-HuBERT, which leverage large-scale, unlabeled audiovisual datasets to improve performance. The future of speech recognition technology is likely to see further integration of AI-driven methods, making it more applicable across diverse industries and for individuals with communication challenges. The conclusion emphasizes the need for further research, especially in languages with complex morphological structures, such as Korean.
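To make the AV-HuBERT-style idea mentioned above concrete, here is a minimal sketch of masked prediction over fused audio-visual features; the feature sizes, the 30% masking rate, and the k-means cluster targets are illustrative assumptions, not the actual AV-HuBERT recipe:

```python
# Minimal sketch of the masked-prediction idea behind AV-HuBERT-style
# self-supervised pretraining: mask some fused audio-visual frames and train a
# transformer to predict discrete cluster labels for the masked positions.
# All sizes and targets here are illustrative assumptions.
import torch
import torch.nn as nn

T, D_AUDIO, D_VIDEO, D_MODEL, N_CLUSTERS = 50, 26, 64, 128, 100

fuse = nn.Linear(D_AUDIO + D_VIDEO, D_MODEL)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=4, batch_first=True),
    num_layers=2,
)
predict = nn.Linear(D_MODEL, N_CLUSTERS)

audio = torch.randn(1, T, D_AUDIO)               # e.g. filterbank frames
video = torch.randn(1, T, D_VIDEO)               # e.g. mouth-ROI embeddings
targets = torch.randint(0, N_CLUSTERS, (1, T))   # offline k-means cluster ids

x = fuse(torch.cat([audio, video], dim=-1))
mask = torch.rand(1, T) < 0.3                    # mask ~30% of frames
x = x.masked_fill(mask.unsqueeze(-1), 0.0)       # zero out masked frames

logits = predict(encoder(x))
loss = nn.functional.cross_entropy(              # loss only on masked positions
    logits[mask], targets[mask]
)
loss.backward()
```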

Cloning, Expression, and Characterization of a Cold-Adapted Lipase Gene from an Antarctic Deep-Sea Psychrotrophic Bacterium, Psychrobacter sp. 7195

  • Zhang, Jinwei;Lin, Shu;Zeng, Runying
    • Journal of Microbiology and Biotechnology / v.17 no.4 / pp.604-610 / 2007
  • A psychrotrophic strain, 7195, showing extracellular lipolytic activity towards tributyrin was isolated from deep-sea sediment of Prydz Bay and identified as a Psychrobacter species. By screening a genomic DNA library of Psychrobacter sp. 7195, an open reading frame of 954 bp coding for a lipase gene, lipA1, was identified, cloned, and sequenced. The deduced LipA1 consisted of 317 amino acids with a molecular mass of 35,210 Da. It had one consensus motif, G-N-S-M-G (GXSXG), containing the putative active-site serine, which is conserved in other cold-adapted lipolytic enzymes. The recombinant LipA1 was purified by column chromatography with DEAE Sepharose CL-4B and Sephadex G-75, and by preparative polyacrylamide gel electrophoresis, in sequence. The purified enzyme showed highest activity at 30°C and was unstable at temperatures higher than 30°C, indicating that it is a typical cold-adapted enzyme. The optimal pH for activity was 9.0, and the enzyme was stable between pH 7.0 and 10.0 after 24 h incubation at 4°C. The addition of Ca²⁺ and Mg²⁺ enhanced the activity of LipA1, whereas Cd²⁺, Zn²⁺, Co²⁺, Fe³⁺, Hg²⁺, Fe²⁺, Rb²⁺, and EDTA strongly inhibited it. LipA1 was activated by various detergents, such as Triton X-100, Tween 80, Tween 40, Span 60, Span 40, CHAPS, and SDS, and showed good resistance towards them. Substrate specificity analysis showed a preference for trimyristin and p-nitrophenyl myristate (C14 acyl groups).
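As a quick consistency check: 317 residues at an average of roughly 111 Da per residue gives about 35,200 Da (35.2 kDa), consistent with the reported molecular mass of 35,210 Da.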

Monosyllable Speech Recognition through Facial Movement Analysis (안면 움직임 분석을 통한 단음절 음성인식)

  • Kang, Dong-Won;Seo, Jeong-Woo;Choi, Jin-Seung;Choi, Jae-Bong;Tack, Gye-Rae
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.6 / pp.813-819 / 2014
  • The purpose of this study was to extract accurate parameters of facial movement features using a 3-D motion capture system for lip-reading-based speech recognition. Instead of features obtained from traditional camera images, the 3-D motion system was used to obtain quantitative data on actual facial movements and to analyze 11 variables that exhibit particular patterns, such as nose, lip, jaw, and cheek movements, in monosyllable vocalizations. Fourteen subjects, all in their 20s, were asked to vocalize 11 types of Korean vowel monosyllables three times each, with 36 reflective markers on their faces. The facial movement data were then converted into the 11 parameters and presented as patterns for each monosyllable vocalization. The parameter patterns were learned and recognized for each monosyllable with speech recognition algorithms using a Hidden Markov Model (HMM) and the Viterbi algorithm. The recognition accuracy over the 11 monosyllables was 97.2%, which suggests the possibility of recognizing spoken Korean through quantitative facial movement analysis.
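A minimal sketch of the recognition step described above: one small left-to-right HMM per monosyllable over the 11 facial-movement parameters, scored with the Viterbi algorithm, with the best-scoring model winning. The state count, transition matrix, and unit-variance Gaussian emissions are illustrative assumptions:

```python
# Minimal sketch of HMM + Viterbi monosyllable recognition over facial
# parameters. Model sizes and Gaussian emissions are illustrative assumptions.
import numpy as np

def viterbi_score(obs_loglik, log_trans, log_init):
    """obs_loglik: (T, S) per-frame state log-likelihoods; returns best path score."""
    delta = log_init + obs_loglik[0]
    for t in range(1, len(obs_loglik)):
        delta = np.max(delta[:, None] + log_trans, axis=0) + obs_loglik[t]
    return delta.max()

def gaussian_loglik(x, means):
    # Diagonal unit-variance Gaussian emissions (assumed), up to a constant.
    return -0.5 * ((x[:, None, :] - means[None]) ** 2).sum(-1)

rng = np.random.default_rng(0)
T, S, D = 30, 3, 11                      # frames, HMM states, facial parameters
features = rng.normal(size=(T, D))       # one utterance of parameter vectors

# One 3-state left-to-right HMM per monosyllable (here: 11 random "models"):
log_trans = np.log(np.array([[0.6, 0.4, 0.0],
                             [0.0, 0.6, 0.4],
                             [0.0, 0.0, 1.0]]) + 1e-12)
log_init = np.log(np.array([1.0, 1e-12, 1e-12]))
models = [rng.normal(size=(S, D)) for _ in range(11)]   # per-state emission means

scores = [viterbi_score(gaussian_loglik(features, m), log_trans, log_init)
          for m in models]
print("recognized monosyllable index:", int(np.argmax(scores)))
```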

Subword-based Lip Reading Using State-tied HMM (상태공유 HMM을 이용한 서브워드 단위 기반 립리딩)

  • Kim, Jin-Young;Shin, Do-Sung
    • Speech Sciences / v.8 no.3 / pp.123-132 / 2001
  • In recent years, research on HCI technology has been very active, with speech recognition as its typical method. Recognition performance, however, deteriorates as surrounding noise increases. To solve this problem, multimodal HCI is being actively studied. This paper describes automated lip-reading for bimodal speech recognition based on image and speech information. It employs an audio-visual DB containing 1,074 words from 70 speakers, tri-visemes as the recognition unit, and state-tied HMMs as the recognition model. Recognition performance is evaluated for vocabularies of 22 to 1,000 words, achieving 60.5% word recognition with the 22-word recognizer.
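A minimal sketch of the unit design: expand words into context-dependent tri-visemes and tie rare contexts to a shared context-independent unit so the models stay trainable. The toy phoneme-to-viseme map and the frequency-based tying rule are illustrative assumptions, not the paper's actual state-tying procedure:

```python
# Minimal sketch of tri-viseme units with simple state tying: rare tri-viseme
# contexts back off to their context-independent center viseme. The viseme map
# and the tying rule are illustrative assumptions.
from collections import Counter

PHONEME_TO_VISEME = {"p": "bilabial", "b": "bilabial", "m": "bilabial",
                     "a": "open", "o": "round", "t": "alveolar"}  # toy map

def tri_visemes(phonemes):
    # Context-dependent units: (left, center, right) viseme triples.
    v = [PHONEME_TO_VISEME[p] for p in phonemes]
    padded = ["sil"] + v + ["sil"]
    return [tuple(padded[i:i + 3]) for i in range(len(v))]

def tie_states(corpus, min_count=2):
    counts = Counter(tv for word in corpus for tv in tri_visemes(word))
    # Rare tri-viseme contexts are tied to their center unit:
    return {tv: tv if c >= min_count else ("*", tv[1], "*")
            for tv, c in counts.items()}

corpus = [list("pam"), list("bat"), list("mob"), list("pat")]
for tv, shared in sorted(tie_states(corpus).items()):
    print(tv, "->", shared)
```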
