• Title/Summary/Keyword: encoder- decoder

Search Result 454, Processing Time 0.03 seconds

Multi-stage Transformer for Video Anomaly Detection

  • Viet-Tuan Le;Khuong G. T. Diep;Tae-Seok Kim;Yong-Guk Kim
    • Annual Conference of KIPS
    • /
    • 2023.11a
    • /
    • pp.648-651
    • /
    • 2023
  • Video anomaly detection aims to detect abnormal events. Motivated by the power of transformers recently shown in vision tasks, we propose a novel transformer-based network for video anomaly detection. To capture long-range information in video, we employ a multi-scale transformer as an encoder. A convolutional decoder is utilized to predict the future frame from the extracted multi-scale feature maps. The proposed method is evaluated on three benchmark datasets: USCD Ped2, CUHK Avenue, and ShanghaiTech. The results show that the proposed method achieves better performance compared to recent methods.

Multi-label Lane Detection Algorithm for Autonomous Vehicle Using Deep Learning (자율주행 차량을 위한 멀티 레이블 차선 검출 딥러닝 알고리즘)

  • Chae Song Park;Kyong Su Yi
    • Journal of Auto-vehicle Safety Association
    • /
    • v.16 no.1
    • /
    • pp.29-34
    • /
    • 2024
  • This paper presents a multi-label lane detection method for autonomous vehicles based on deep learning. The proposed algorithm can detect two types of lanes: center lane and normal lane. The algorithm uses a convolution neural network with an encoder-decoder architecture to extract features from input images and produce a multi-label heatmap for predicting lane's label. This architecture has the potential to detect more diverse types of lanes in that it can add the number of labels by extending the heatmap's dimension. The proposed algorithm was tested on an OpenLane dataset and achieved 85 Frames Per Second (FPS) in end to-end inference time. The results demonstrate the usability and computational efficiency of the proposed algorithm for the lane detection in autonomous vehicles.

Trace-Back Viterbi Decoder with Sequential State Transition Control (순서적 역방향 상태천이 제어에 의한 역추적 비터비 디코더)

  • 정차근
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • v.40 no.11
    • /
    • pp.51-62
    • /
    • 2003
  • This paper presents a novel survivor memeory management and decoding techniques with sequential backward state transition control in the trace back Viterbi decoder. The Viterbi algorithm is an maximum likelihood decoding scheme to estimate the likelihood of encoder state for channel error detection and correction. This scheme is applied to a broad range of digital communication such as intersymbol interference removing and channel equalization. In order to achieve the area-efficiency VLSI chip design with high throughput in the Viterbi decoder in which recursive operation is implied, more research is required to obtain a simple systematic parallel ACS architecture and surviver memory management. As a method of solution to the problem, this paper addresses a progressive decoding algorithm with sequential backward state transition control in the trace back Viterbi decoder. Compared to the conventional trace back decoding techniques, the required total memory can be greatly reduced in the proposed method. Furthermore, the proposed method can be implemented with a simple pipelined structure with systolic array type architecture. The implementation of the peripheral logic circuit for the control of memory access is not required, and memory access bandwidth can be reduced Therefore, the proposed method has characteristics of high area-efficiency and low power consumption with high throughput. Finally, the examples of decoding results for the received data with channel noise and application result are provided to evaluate the efficiency of the proposed method.

Lightweight video coding using spatial correlation and symbol-level error-correction channel code (공간적 유사성과 심볼단위 오류정정 채널 코드를 이용한 경량화 비디오 부호화 방법)

  • Ko, Bong-Hyuck;Shim, Hiuk-Jae;Jeon, Byeung-Woo
    • Journal of Broadcast Engineering
    • /
    • v.13 no.2
    • /
    • pp.188-199
    • /
    • 2008
  • In conventional video coding, encoder complexity is much higher than that of decoder. However, investigations for lightweight encoder to eliminate motion prediction/compensation claiming most complexity in encoder have recently become an important issue. The Wyner-Ziv coding is one of the representative schemes for the problem and, in this scheme, since encoder generates only parity bits of a current frame without performing any type of processes extracting correlation information between frames, it has an extremely simple structure compared to conventional coding techniques. However, in Wyner-Ziv coding, channel decoding errors occur when noisy side information is used in channel decoding process. These channel decoding errors appear more frequently, especially, when there is not enough correlation between frames to generate accurate side information and, as a result, those errors look like Salt & Pepper type noise in the reconstructed frame. Since this noise severely deteriorates subjective video quality even though such noise rarely occurs, previously we proposed a computationally extremely light encoding method based on selective median filter that corrects such noise using spatial correlation of a frame. However, in the previous method, there is a problem that loss of texture from filtering may exceed gain from error correction by the filter for video sequences having complex torture. Therefore, in this paper, we propose an improved lightweight encoding method that minimizes loss of texture detail from filtering by allowing information of texture and that of noise in side information to be utilized by the selective median filter. Our experiments have verified average PSNR gain of up to 0.84dB compared to the previous method.

Adaptive Hard Decision Aided Fast Decoding Method in Distributed Video Coding (적응적 경판정 출력을 이용한 고속 분산 비디오 복호화 기술)

  • Oh, Ryang-Geun;Shim, Hiuk-Jae;Jeon, Byeung-Woo
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.47 no.6
    • /
    • pp.66-74
    • /
    • 2010
  • Recently distributed video coding (DVC) is spotlighted for the environment which has restriction in computing resource at encoder. Wyner-Ziv (WZ) coding is a representative scheme of DVC. The WZ encoder independently encodes key frame and WZ frame respectively by conventional intra coding and channel code. WZ decoder generates side information from reconstructed two key frames (t-1, t+1) based on temporal correlation. The side information is regarded as a noisy version of original WZ frame. Virtual channel noise can be removed by channel decoding process. So the performance of WZ coding greatly depends on the performance of channel code. Among existing channel codes, Turbo code and LDPC code have the most powerful error correction capability. These channel codes use stochastically iterative decoding process. However the iterative decoding process is quite time-consuming, so complexity of WZ decoder is considerably increased. Analysis of the complexity of LPDCA with real video data shows that the portion of complexity of LDPCA decoding is higher than 60% in total WZ decoding complexity. Using the HDA (Hard Decision Aided) method proposed in channel code area, channel decoding complexity can be much reduced. But considerable RD performance loss is possible according to different thresholds and its proper value is different for each sequence. In this paper, we propose an adaptive HDA method which sets up a proper threshold according to sequence. The proposed method shows about 62% and 32% of time saving, respectively in LDPCA and WZ decoding process, while RD performance is not that decreased.

Design of QDI Model Based Encoder/Decoder Circuits for Low Delay-Power Product Data Transfers in GALS Systems (GALS 시스템에서의 저비용 데이터 전송을 위한 QDI모델 기반 인코더/디코더 회로 설계)

  • Oh Myeong-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.43 no.1 s.343
    • /
    • pp.27-36
    • /
    • 2006
  • Conventional delay-insensitive (DI) data encodings usually require 2N+1 wires for transferring N-bit. To reduce complexity and power dissipation of wires in designing a large scaled chip, an encoder and a decoder circuits, where N-bit data transfer can be peformed with only N+l wires, are proposed. These circuits are based on a quasi delay-insensitive (QDI) model and designed by using current-mode multiple valued logic (CMMVL). The effectiveness of the proposed data transfer mechanism is validated by comparisons with conventional data transfer mechanisms using dual-rail and 1-of-4 encodings through simulation at the 0.25 um CMOS technology. In general, simulation results with wire lengths of 4 mm or larger show that the CMMVL scheme significantly reduces delay-power product ($D{\ast}P$) values of the dual-rail encoding with data rate of 5 MHz or more and the 1-of-4 encoding with data rate of 18 MHz or more. In addition, simulation results using the buffer-inserted dual-rail and 1-of-4 encodings for high performance with the wire length of 10 mm and 32-bit data demonstrate that the proposed CMMVL scheme reduces the D*P values of the dual-rail encoding with data rate of 4 MHz or more and 1-of-4 encoding with data rate of 25 MHz or more by up to $57.7\%\;and\;17.9\%,$ respectively.

CPLD-based Controller for Bi-directional Communication in a Capsule Endoscope (캡슐형 무선 내시경의 양방향 통신을 위한 CPLD 기반의 제어기 설계 및 구현)

  • Lee Jyung Hyun;Moon Yeon Kwan;Park Hee Joon;Won Chul Ho;Lee Seung Ha;Choi Hyun Chul;Cho Jin Ho
    • Journal of Biomedical Engineering Research
    • /
    • v.25 no.6
    • /
    • pp.447-453
    • /
    • 2004
  • In the case of a capsule that can acquire and transmit images from the intestines, the size of the module and the battery capacity in the capsule are subject to restriction. The capsule must be swallowable and the battery must maintain the stable power during the capsule travels in the gastrointestinal tract. Therefore, it is important to control the endoscope using bi-directional wireless communication. In this study, encoder and decoder CPLD modules for bi-directional capsule endoscopes were designed and implemented. The designed controller for capsule endoscope can transmit the images of GI-track from inside to outside of the body and the capsules can be controlled by external controller simultaneously. The designed and implemented controller was verified by an in-vivo animal experiments. From these experiments, it was verified that the CPLD module for bi-directional capsule endoscope satisfied the design specifications.

Pixel-level Crack Detection in X-ray Computed Tomography Image of Granite using Deep Learning (딥러닝을 이용한 화강암 X-ray CT 영상에서의 균열 검출에 관한 연구)

  • Hyun, Seokhwan;Lee, Jun Sung;Jeon, Seonghwan;Kim, Yejin;Kim, Kwang Yeom;Yun, Tae Sup
    • Tunnel and Underground Space
    • /
    • v.29 no.3
    • /
    • pp.184-196
    • /
    • 2019
  • This study aims to extract a 3D image of micro-cracks generated by hydraulic fracturing tests, using the deep learning method and X-ray computed tomography images. The pixel-level cracks are difficult to be detected via conventional image processing methods, such as global thresholding, canny edge detection, and the region growing method. Thus, the convolutional neural network-based encoder-decoder network is adapted to extract and analyze the micro-crack quantitatively. The number of training data can be acquired by dividing, rotating, and flipping images and the optimum combination for the image augmentation method is verified. Application of the optimal image augmentation method shows enhanced performance for not only the validation dataset but also the test dataset. In addition, the influence of the original number of training data to the performance of the deep learning-based neural network is confirmed, and it leads to succeed the pixel-level crack detection.

Visual analysis of attention-based end-to-end speech recognition (어텐션 기반 엔드투엔드 음성인식 시각화 분석)

  • Lim, Seongmin;Goo, Jahyun;Kim, Hoirin
    • Phonetics and Speech Sciences
    • /
    • v.11 no.1
    • /
    • pp.41-49
    • /
    • 2019
  • An end-to-end speech recognition model consisting of a single integrated neural network model was recently proposed. The end-to-end model does not need several training steps, and its structure is easy to understand. However, it is difficult to understand how the model recognizes speech internally. In this paper, we visualized and analyzed the attention-based end-to-end model to elucidate its internal mechanisms. We compared the acoustic model of the BLSTM-HMM hybrid model with the encoder of the end-to-end model, and visualized them using t-SNE to examine the difference between neural network layers. As a result, we were able to delineate the difference between the acoustic model and the end-to-end model encoder. Additionally, we analyzed the decoder of the end-to-end model from a language model perspective. Finally, we found that improving end-to-end model decoder is necessary to yield higher performance.

Low-complexity Local Illuminance Compensation for Bi-prediction mode (양방향 예측 모드를 위한 저복잡도 LIC 방법 연구)

  • Choi, Han Sol;Byeon, Joo Hyung;Bang, Gun;Sim, Dong Gyu
    • Journal of Broadcast Engineering
    • /
    • v.24 no.3
    • /
    • pp.463-471
    • /
    • 2019
  • This paper proposes a method for reducing the complexity of LIC (Local Illuminance Compensation) for bi-directional inter prediction. The LIC performs local illumination compensation using neighboring reconstruction samples of the current block and the reference block to improve the accuracy of the inter prediction. Since the weight and offset required for local illumination compensation are calculated at both sides of the encoder and decoder using the reconstructed samples, there is an advantage that the coding efficiency is improved without signaling any information. Since the weight and the offset are obtained in the encoding prediction step and the decoding step, encoder and decoder complexity are increased. This paper proposes two methods for low complexity LIC. The first method is a method of applying illumination compensation with offset only in bi-directional prediction, and the second is a method of applying LIC after weighted average step of reference block obtained by bidirectional prediction. To evaluate the performance of the proposed method, BD-rate is compared with BMS-2.0.1 using B, C, and D classes of MPEG standard experimental image under RA (Random Access) condition. Experimental results show that the proposed method reduces the average of 0.29%, 0.23%, 0.04% for Y, U, and V in terms of BD-rate performance compared to BMS-2.0.1 and encoding/decoding time is almost same. Although the BD-rate was lost, the calculation complexity of the LIC was greatly reduced as the multiplication operation was removed and the addition operation was halved in the LIC parameter derivation process.