Search | Korea Science

Multimodal audiovisual speech recognition architecture using a three-feature multi-fusion method for noise-robust systems

Sanghun Jeon;Jieun Lee;Dohyeon Yeo;Yong-Ju Lee;SeungJun Kim
ETRI Journal / v.46 no.1 / pp.22-34 / 2024
Exposure to varied noisy environments impairs the recognition performance of artificial intelligence-based speech recognition technologies. Degraded-performance services can be utilized as limited systems that assure good performance in certain environments, but impair the general quality of speech recognition services. This study introduces an audiovisual speech recognition (AVSR) model robust to various noise settings, mimicking human dialogue recognition elements. The model converts word embeddings and log-Mel spectrograms into feature vectors for audio recognition. A dense spatial-temporal convolutional neural network model extracts features from log-Mel spectrograms, transformed for visual-based recognition. This approach exhibits improved aural and visual recognition capabilities. We assess the signal-to-noise ratio in nine synthesized noise environments, with the proposed model exhibiting lower average error rates. The error rate for the AVSR model using a three-feature multi-fusion method is 1.711%, compared to the general 3.939% rate. This model is applicable in noise-affected environments owing to its enhanced stability and recognition rate.
https://doi.org/10.4218/etrij.2023-0266
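The abstract above evaluates recognition under synthesized noise at controlled signal-to-noise ratios. A minimal sketch of how a noisy test sample might be synthesized at a target SNR follows. The function names `mix_at_snr` and `measure_snr_db`, the 440 Hz tone standing in for speech, and the 5 dB target are all illustrative assumptions, not details taken from the paper.

```python
import math
import random


def measure_snr_db(signal, noise):
    """Return the signal-to-noise power ratio in decibels."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so that speech power / noise power matches the target, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise must have non-zero power")
    # Required noise power is p_speech / 10^(target_snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = [scale * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled_noise)]
    return mixed, scaled_noise


# Toy data (assumed): a 440 Hz tone as a stand-in for speech, white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
mixed, scaled_noise = mix_at_snr(speech, noise, target_snr_db=5.0)
```

Sweeping `target_snr_db` over a range of values with several noise recordings would give a grid of test conditions similar in spirit to the one the abstract describes.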

Scene-based Nonuniformity Correction Algorithm Based on Temporal Median Filter

Geng, Lixiang;Chen, Qian;Qian, Weixian;Zhang, Yuzhen
Journal of the Optical Society of Korea / v.17 no.3 / pp.255-261 / 2013
Scene-based nonuniformity correction techniques for infrared focal-plane arrays are widely regarded as a key technology, and various algorithms have been proposed to compensate for fixed-pattern noise. However, the capability of existing algorithms is restricted by slow convergence and ghosting artifacts. In this paper, an effective scene-based nonuniformity correction method is proposed to solve these problems. The algorithm improves on the constant statistics method, using a temporal median with a Gaussian kernel to estimate the nonuniformity parameters. A theoretical analysis is also conducted to demonstrate that the proposed method eliminates ghosting artifacts effectively and converges faster. Finally, the proposed technique is tested on infrared image sequences with simulated nonuniformity and on infrared imagery with real nonuniformity. The results show that the proposed method estimates each detector's gain and offset reliably and that it converges faster and produces fewer ghosting artifacts than conventional techniques.
https://doi.org/10.3807/JOSK.2013.17.3.255
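The constant statistics idea mentioned in the abstract above assumes that, over a long sequence, every detector sees the same scene statistics, so per-pixel differences in temporal statistics can be attributed to gain and offset nonuniformity. The sketch below is a simplified illustration of that idea using a temporal median and median absolute deviation. It is not the authors' algorithm: the Gaussian kernel is omitted, gain is recovered only up to a common scale, and the function names and simulated data are assumptions.

```python
import random
from statistics import median, pstdev


def temporal_median_nuc(frames):
    """Correct frames (list of T frames, each a flat list of N pixel values).

    Per-pixel offset is estimated by the temporal median and per-pixel gain
    (up to a common scale) by the temporal median absolute deviation.
    """
    n_pixels = len(frames[0])
    offsets, spreads = [], []
    for p in range(n_pixels):
        series = [frame[p] for frame in frames]
        med = median(series)
        mad = median(abs(v - med) for v in series)
        offsets.append(med)
        spreads.append(mad if mad > 0 else 1.0)  # guard against constant pixels
    ref_offset = median(offsets)  # common level shared by all pixels
    ref_spread = median(spreads)  # common contrast shared by all pixels
    corrected = [
        [(frame[p] - offsets[p]) / spreads[p] * ref_spread + ref_offset
         for p in range(n_pixels)]
        for frame in frames
    ]
    return corrected, offsets, spreads


def fixed_pattern_std(frames):
    """Spatial standard deviation of the per-pixel temporal means."""
    n_pixels = len(frames[0])
    means = [sum(frame[p] for frame in frames) / len(frames) for p in range(n_pixels)]
    return pstdev(means)


# Simulated nonuniformity (assumed values): observed = gain * scene + offset.
rng = random.Random(1)
n_frames, n_pixels = 1000, 16
gains = [rng.uniform(0.8, 1.2) for _ in range(n_pixels)]
offsets_true = [rng.uniform(-5.0, 5.0) for _ in range(n_pixels)]
observed = [
    [gains[p] * rng.gauss(100.0, 10.0) + offsets_true[p] for p in range(n_pixels)]
    for _ in range(n_frames)
]
corrected, est_offsets, est_spreads = temporal_median_nuc(observed)
fpn_before = fixed_pattern_std(observed)
fpn_after = fixed_pattern_std(corrected)
```

Comparing `fpn_before` with `fpn_after` gives a rough measure of how much fixed-pattern noise remains. The median is used here because it is less sensitive than the mean to a bright object lingering on a pixel, which is one common source of ghosting.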

Fast Encoder Design for Multi-view Video

Zhao, Fan;Liao, Kaiyang;Zhang, Erhu;Qu, Fangying
KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.7 / pp.2464-2479 / 2014
Multi-view video coding is an international encoding standard that attains good performance by fully utilizing temporal and inter-view correlations. However, it suffers from high computational complexity. This paper presents a fast encoder design to reduce the level of complexity. First, when the temporal correlation of a group of pictures is sufficiently strong, macroblock-based inter-view prediction is not employed for the non-anchor pictures of B-views. Second, when the disparity between two adjacent views is above some threshold, frame-based inter-view prediction is disabled. Third, inter-view prediction is not performed on boundary macroblocks in the auxiliary views, because the references for these blocks may not exist in neighboring views. Fourth, finer partitions of inter-view prediction are cancelled for macroblocks in static image areas. Finally, when estimating the disparity of a macroblock, the search range is adjusted according to the mode size distribution of the neighboring view. Compared with reference software, these techniques produce an average time reduction of 83.65%, while the bit-rate increase and peak signal-to-noise ratio loss are less than 0.54% and 0.05dB, respectively.
https://doi.org/10.3837/tiis.2014.07.015
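The abstract above reports quality loss as peak signal-to-noise ratio (PSNR) in dB. For reference, a minimal sketch of the standard PSNR definition for 8-bit samples follows. The sample values are made up for illustration and are unrelated to the paper's experiments.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length sample sequences."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical pixel values: every decoded sample is off by exactly 1, so MSE = 1.
reference = [10, 50, 90, 130, 170, 210]
decoded = [v + 1 for v in reference]
value = psnr(reference, decoded)
```

A PSNR loss of 0.05 dB, as quoted in the abstract, corresponds to a very small increase in mean squared error relative to the reference encoder.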

Environmental Sound Classification for Selective Noise Cancellation in Industrial Sites (산업현장에서의 선택적 소음 제거를 위한 환경 사운드 분류 기술)

Choi, Hyunkook;Kim, Sangmin;Park, Hochong
Journal of Broadcast Engineering / v.25 no.6 / pp.845-853 / 2020
In this paper, we propose a method for classifying environmental sound for selective noise cancellation in industrial sites. Noise in industrial sites causes hearing loss in workers, and research on noise cancellation has been widely conducted. However, conventional methods block all sounds and cannot provide optimal operation per noise type, because a common cancellation method is applied to all types of noise. To enable selective noise cancellation, we therefore propose a method for environmental sound classification based on deep learning. The proposed method uses new sets of acoustic features consisting of temporal and statistical properties of the Mel-spectrogram, which overcome the limitations of plain Mel-spectrogram features, and uses a convolutional neural network as the classifier. We apply the proposed method to five-class sound classification with three noise classes and two non-noise classes. We confirm that the proposed method improves classification accuracy by 6.6 percentage points compared with conventional Mel-spectrogram features.
https://doi.org/10.5909/JBE.2020.25.6.845

Error Resilient MPEG-4 Encoding Method (오류 내성을 갖는 MPEG-4 부호화 기법)

현기수;문지용;김기두;강동욱
Proceedings of the Korean Society of Broadcast Engineers Conference / 2002.11a / pp.105-109 / 2002
The main idea of hybrid video coding methods is to reduce spatial and temporal redundancy for efficient data compression. If a compressed video stream is transmitted through an error-prone channel, the bitstream can be critically damaged, and the spatio-temporal error propagates through successive frames at the decoder because of drift noise in the references between encoder and decoder. In this paper, we propose a Lagrangian multiplier selection method for error-prone environments. Finally, the performance of the R-D optimized mode decision is compared against the conventional method, and simulation results are given.
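Rate-distortion optimized mode decision, as referred to in the abstract above, generally picks the coding mode that minimizes the Lagrangian cost J = D + λR. The sketch below shows only that generic selection rule. The mode names and the distortion and rate figures are hypothetical, and the paper's actual method for choosing the multiplier under channel errors is not reproduced here.

```python
def choose_mode(candidates, lagrange_multiplier):
    """Return the mode minimizing J = D + lambda * R.

    `candidates` maps a mode name to a (distortion, rate_in_bits) pair.
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return distortion + lagrange_multiplier * rate

    return min(candidates, key=cost)


# Hypothetical per-macroblock measurements: (distortion, rate in bits).
candidates = {
    "intra": (40.0, 300),
    "inter": (55.0, 120),
    "skip": (120.0, 2),
}

low_lambda_choice = choose_mode(candidates, 0.0)   # distortion only
mid_lambda_choice = choose_mode(candidates, 0.1)   # balanced trade-off
high_lambda_choice = choose_mode(candidates, 1.0)  # rate dominates
```

The example illustrates why multiplier selection matters: the same candidate set yields a different mode as λ grows, so a multiplier tuned for an error-free channel may favor modes that propagate errors badly on a lossy one.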

A New Video Coding Algorithm using 3D-Subband Coding and Lattice Vector Quantization

Park, Joong-Han;Lee, Keun-Young
Journal of Electrical Engineering and Information Science / v.2 no.6 / pp.131-137 / 1997
In this paper, we propose an efficient motion-adaptive 3-dimensional (3D) video coding algorithm using 3D subband coding (3D-SBC) and lattice vector quantization (LVQ) for low bit rates. Instead of splitting input video sequences into a fixed number of subbands along the temporal axis, we decompose them into temporal subbands of variable size according to the motion in the frames. Each of the 7 spatio-temporally split subbands is partitioned by a quadtree technique and coded with LVQ. The simulation results show a gain of 0.1 to 4.3 dB over H.261 in peak signal-to-noise ratio (PSNR) at a low bit rate (64 kbps).

Utilization of Visual Context for Robust Object Recognition in Intelligent Mobile Robots (지능형 이동 로봇에서 강인 물체 인식을 위한 영상 문맥 정보 활용 기법)

Kim, Sung-Ho;Kim, Jun-Sik;Kweon, In-So
The Journal of Korea Robotics Society / v.1 no.1 / pp.36-45 / 2006
In this paper, we introduce visual contexts, in terms of their types and utilization methods, for robust object recognition with intelligent mobile robots. One of the core technologies for intelligent robots is visual object recognition. Robust techniques are strongly required because there are many sources of visual variation, such as geometric changes, photometric changes, and noise. To meet these requirements, we define spatial context, hierarchical context, and temporal context. These visual contexts can be selected according to the object recognition domain. We also propose a unified framework that can utilize all of these contexts and validate it in a real working environment. Finally, we discuss future research directions of object recognition technologies for intelligent robots.

A Recognition System for Multi-Form Korean Characters Based on Hierarchical Temporal Memory

Haibao, Nan;Bae, Sun-Gap;Bae, Jong-Min;Kang, Hyun-Syug
Journal of Korea Multimedia Society / v.12 no.12 / pp.1718-1727 / 2009
Traditional character recognition systems usually target characters with simple variation. With the development of multimedia technology, printed characters now appear in more diverse forms, and existing recognition technologies cannot handle Hangul recognition effectively in such diverse environments. This paper presents a recognition system for multi-form Korean characters called RSMFK, which is based on the Hierarchical Temporal Memory (HTM) model. HTM is a model that simulates the neocortex of the human brain to recognize and memorize intelligently. Our system can effectively recognize printed Korean characters with different fonts, scales, rotations, noise, and backgrounds. Experimental results show that RSMFK achieves an average recognition rate of 97.8%, a clear improvement over conventional methods.

Adaptive Reconstruction of Multi-periodic Harmonic Time Series with Only Negative Errors: Simulation Study

Lee, Sang-Hoon
Korean Journal of Remote Sensing / v.26 no.6 / pp.721-730 / 2010
In satellite remote sensing, irregular temporal sampling is a common feature of observations of geophysical and biological processes on the earth's surface. Lee (2008) proposed a feedback system using a single-period harmonic model to adaptively reconstruct observation image series contaminated by noise resulting from mechanical problems or environmental conditions. However, the simple single-period sinusoidal model may not be appropriate for temporal physical processes of the land surface. A complex model with multiple periods would be better suited to representing inter-annual and intra-annual variations of surface parameters. This study extends the adaptive system to use a multi-periodic harmonic model, which is expressed as the sum of a series of sine waves. For the system assessment, simulation data were generated from a model of negative errors, based on the fact that observations are mainly suppressed by bad weather. The experimental results of this simulation study show the potential of the proposed system for real-time monitoring of image series observed by imperfect sensing technology in environments that are frequently affected by bad weather.
https://doi.org/10.7780/kjrs.2010.26.6.721
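A multi-periodic harmonic model of the kind described above can be written as a constant plus one cosine and sine pair per period, which makes it linear in its coefficients. The sketch below fits such a model to irregularly sampled data by ordinary least squares. It illustrates the model form only: the adaptive feedback system and the negative-error handling from the paper are not reproduced, and the periods, coefficients, and function names are assumptions.

```python
import math
import random


def design_row(t, periods):
    """Basis values [1, cos(2*pi*t/P1), sin(2*pi*t/P1), ...] at time t."""
    row = [1.0]
    for period in periods:
        w = 2.0 * math.pi * t / period
        row += [math.cos(w), math.sin(w)]
    return row


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_harmonics(times, values, periods):
    """Least-squares coefficients via the normal equations (A^T A) x = A^T y."""
    rows = [design_row(t, periods) for t in times]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(t, coeffs, periods):
    """Evaluate the fitted harmonic model at time t."""
    return sum(c * x for c, x in zip(coeffs, design_row(t, periods)))


# Assumed example: annual and semi-annual cycles sampled at irregular days.
rng = random.Random(2)
periods = [365.0, 182.5]
true_coeffs = [0.5, 0.2, -0.1, 0.05, 0.08]
times = sorted(rng.uniform(0.0, 730.0) for _ in range(120))
values = [predict(t, true_coeffs, periods) for t in times]
fitted = fit_harmonics(times, values, periods)
```

Because the fit uses explicit sample times rather than a fixed grid, missing or irregular observations need no special handling. One-sided (negative-only) errors would bias a plain least-squares fit downward, which is presumably part of what the paper's adaptive scheme is designed to address.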

Audio Coding Based on an Improved Temporal-Domain Information Reduction Technique (개선된 시간축 정보량 감축 기술 기반 오디오 부호화 기술)

Beack, Seungkwon;Lim, Wootaek;Lee, Taejin
Proceedings of the Korean Society of Broadcast Engineers Conference / 2021.06a / pp.32-35 / 2021
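This paper proposes a technique that improves audio coding efficiency by reducing the amount of temporal-domain information. A representative temporal information reduction method already used in earlier audio codecs is TNS (temporal noise shaping). However, TNS is effective only selectively, in the transient segments of an audio signal, and its efficiency gains appear only intermittently, because it has the structural problem of performing prediction in the MDCT (modified discrete cosine transform) domain. This paper proposes ITES (intensive temporal envelope shaping), which addresses these weaknesses of conventional TNS. The proposed technique predicts and reduces the amount of audio time-domain information more effectively than TNS, and a subjective evaluation verified that it yields improved sound quality.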

Search Result 288, Processing Time 0.023 seconds
