• Title/Summary/Keyword: Skip Structure

A study on the waveform-based end-to-end deep convolutional neural network for weakly supervised sound event detection (약지도 음향 이벤트 검출을 위한 파형 기반의 종단간 심층 콘볼루션 신경망에 대한 연구)

  • Lee, Seokjin;Kim, Minhan;Jeong, Youngho
    • The Journal of the Acoustical Society of Korea / v.39 no.1 / pp.24-31 / 2020
  • In this paper, a deep convolutional neural network for sound event detection is studied. In particular, we study an end-to-end network that produces detection results directly from the input audio waveform, applied to a weakly supervised problem involving weakly labeled and unlabeled data. The proposed system is built on deeply stacked one-dimensional convolutional layers and is enhanced by skip connections and a gating mechanism. In addition, the system is improved by post-processing of the sound event detection output, and a training step using the mean-teacher model is added to handle the weakly supervised data. The proposed system was evaluated on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 Task 4 dataset, and the results show F1-scores of 54% (segment-based) and 32% (event-based).
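The abstract mentions stacked 1-D convolutions on the raw waveform with skip connections and a gating mechanism, but it gives no layer details. For a rough illustration only, the sketch below assumes a WaveNet-style gated unit, tanh(filter conv) × sigmoid(gate conv), wrapped in a residual (skip) connection. The single-channel signal, the odd kernel lengths, and the names `conv1d` and `gated_residual_block` are hypothetical and do not describe the authors' design.

```python
import math
from typing import List


def conv1d(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """'Same'-length 1-D cross-correlation with zero padding (odd kernel length assumed)."""
    pad = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t + i * dilation - pad
            if 0 <= idx < len(x):  # positions outside the signal count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def gated_residual_block(x, filter_kernel, gate_kernel, dilation=1):
    """Hypothetical gated 1-D conv block: x + tanh(conv_f(x)) * sigmoid(conv_g(x))."""
    f = conv1d(x, filter_kernel, dilation)
    g = conv1d(x, gate_kernel, dilation)
    gated = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    # Skip connection: the block's input is added back to the gated output.
    return [xi + zi for xi, zi in zip(x, gated)]


wave = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0]
y = gated_residual_block(wave, [0.2, 0.5, 0.2], [0.1, 0.1, 0.1], dilation=2)
```

The input is added back to the output. When the gated branch outputs zeros, the block therefore reduces to an identity mapping, which is the usual argument for why skip connections make deep stacks easier to train.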

Fast Inter CU Partitioning Algorithm using MAE-based Prediction Accuracy Functions for VVC (MAE 기반 예측 정확도 함수를 이용한 VVC의 고속 화면간 CU 분할 알고리즘)

  • Won, Dong-Jae;Moon, Joo-Hee
    • Journal of Broadcast Engineering / v.27 no.3 / pp.361-368 / 2022
  • The quaternary tree plus multi-type tree (QT+MTT) structure was adopted as the block partitioning tool in the Versatile Video Coding (VVC) standard. QT+MTT provides excellent coding gain, but the flexibility of the binary tree (BT) and ternary tree (TT) splits makes its encoding complexity very high. This paper proposes a fast inter coding unit (CU) partitioning algorithm for the BT and TT split types, based on prediction accuracy functions that use the mean absolute error (MAE). The MAE-based decision model was designed to deliver consistent encoding time savings with a stable coding loss for a practical low-complexity VVC encoder. Experimental results under the random access test configuration showed that the proposed algorithm achieved encoding time savings of 24.0% to 31.7%, with luma Bjøntegaard delta (BD) rate increases of 1.0% to 2.1%.
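The abstract does not give the form of the prediction accuracy functions. The sketch below illustrates only the general idea of an MAE-gated early termination. It computes the MAE between a CU's original samples and its inter prediction, and it skips the BT/TT rate-distortion checks when the prediction is already accurate. The threshold, the direction of the test, and the split labels and function names are all assumptions made for illustration.

```python
from typing import List, Sequence

MTT_SPLITS = ["BT_H", "BT_V", "TT_H", "TT_V"]  # hypothetical labels for the BT/TT split modes


def mae(block: Sequence[Sequence[int]], prediction: Sequence[Sequence[int]]) -> float:
    """Mean absolute error between original CU samples and their inter prediction."""
    total, count = 0, 0
    for row_o, row_p in zip(block, prediction):
        for o, p in zip(row_o, row_p):
            total += abs(o - p)
            count += 1
    return total / count


def mtt_splits_to_test(block, prediction, threshold: float) -> List[str]:
    """Assumed rule: an accurate prediction (low MAE) means BT/TT splits are skipped."""
    if mae(block, prediction) < threshold:
        return []  # early termination: no BT/TT rate-distortion checks for this CU
    return list(MTT_SPLITS)


cu = [[10, 10], [10, 10]]
pred = [[9, 11], [10, 10]]
print(mae(cu, pred), mtt_splits_to_test(cu, pred, threshold=2.0))  # 0.5 []
```

In a real encoder, the threshold would have to be tuned to trade encoding time against BD-rate loss, presumably from the fitted prediction accuracy functions the abstract mentions.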

2-Stage Detection and Classification Network for Kiosk User Analysis (디스플레이형 자판기 사용자 분석을 위한 이중 단계 검출 및 분류 망)

  • Seo, Ji-Won;Kim, Mi-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.5 / pp.668-674 / 2022
  • Machine learning techniques that use visual data are highly useful in industry and services, in applications such as scene recognition, fault detection, security, and user analysis. Among these, analyzing users from CCTV video is one of the practical ways to use vision data. Many studies on lightweight artificial neural networks have also been published to make such models usable in mobile and embedded environments. In this study, we propose a network that combines object detection and classification for a mobile graphics processing unit. The network detects pedestrians and faces, and it classifies the age and gender of each detected face. The proposed network is built on MobileNet, YOLOv2, and skip connections. The detection and classification models are trained separately and then combined into a 2-stage structure. An attention mechanism is also used to improve detection and classification performance. An Nvidia Jetson Nano is used to run and evaluate the proposed system.
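The 2-stage structure can be pictured as a detector whose face boxes are cropped and passed to a separately trained age/gender classifier. The sketch below wires those two stages together with plain Python callables. `fake_detector` and `fake_classifier` are hypothetical stubs standing in for the MobileNet/YOLOv2-based models, whose interfaces the abstract does not specify. The (x, y, width, height) box convention is also an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels, an assumed convention
Image = Sequence[Sequence[int]]


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the box region out of a 2-D image."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def two_stage(image: Image,
              detector: Callable[[Image], List[Tuple[str, Box]]],
              classifier: Callable[[List[List[int]]], Tuple[str, str]]) -> Dict[str, list]:
    """Stage 1 detects pedestrians and faces; stage 2 classifies each face crop."""
    result: Dict[str, list] = {"pedestrians": [], "faces": []}
    for label, box in detector(image):
        if label == "pedestrian":
            result["pedestrians"].append(box)
        elif label == "face":
            age, gender = classifier(crop(image, box))
            result["faces"].append((box, age, gender))
    return result


frame = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]


def fake_detector(img):  # hypothetical stub for the detection network
    return [("pedestrian", (0, 0, 4, 4)), ("face", (1, 1, 2, 2))]


def fake_classifier(patch):  # hypothetical stub for the age/gender network
    return ("20s", "female") if patch == [[5, 6], [9, 10]] else ("unknown", "unknown")


out = two_stage(frame, fake_detector, fake_classifier)
```

The two stages share only the crops, so each model can be trained and replaced independently. This matches the abstract's statement that the models are trained individually.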

Enhanced Deep Feature Reconstruction : Texture Defect Detection and Segmentation through Preservation of Multi-scale Features (개선된 Deep Feature Reconstruction : 다중 스케일 특징의 보존을 통한 텍스쳐 결함 감지 및 분할)

  • Jongwook Si;Sungyoung Kim
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.6 / pp.369-377 / 2023
  • In industrial manufacturing, quality control is pivotal for minimizing defect rates, and inadequate management can result in additional costs and production delays. This study underscores the importance of detecting texture defects in manufactured goods and proposes a more precise defect detection technique. The DFR (Deep Feature Reconstruction) model aggregates and reconstructs feature maps, but this approach had inherent limitations. To overcome them, we introduced a new loss function based on statistical methods, integrated a skip connection structure, and tuned the model parameters. When this enhanced model was applied to the texture categories of the MVTec-AD dataset, it achieved a 2.3% higher defect segmentation AUC than previous methods, and overall defect detection performance also improved. These findings indicate that the proposed method contributes significantly to defect detection based on reconstructing combined feature maps.
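In DFR-style methods, multi-scale feature maps from a pretrained CNN are reconstructed by an autoencoder, and the per-location reconstruction error serves as the anomaly score. The sketch below covers only that scoring step and a thresholded segmentation, applied to features that are assumed to be already extracted. It does not reproduce the feature extraction, the paper's statistical loss, or its threshold. The names `anomaly_map` and `segment` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # H x W grid of C-dimensional feature vectors


def anomaly_map(features: FeatureMap, reconstruction: FeatureMap) -> List[List[float]]:
    """Squared L2 distance between each feature vector and its reconstruction."""
    return [
        [sum((a - b) ** 2 for a, b in zip(f, r)) for f, r in zip(row_f, row_r)]
        for row_f, row_r in zip(features, reconstruction)
    ]


def segment(scores: List[List[float]], threshold: float) -> List[List[int]]:
    """Binary defect mask: 1 where the reconstruction error exceeds the threshold."""
    return [[1 if s > threshold else 0 for s in row] for row in scores]


feats = [[[1.0, 1.0], [0.0, 0.0]]]
recon = [[[1.0, 1.0], [3.0, 4.0]]]
print(segment(anomaly_map(feats, recon), threshold=1.0))  # [[0, 1]]
```

A defect-free region is reconstructed well and scores near zero. A defective region falls outside what the autoencoder learned, so its error, and therefore its mask value, is high.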

A Fast Motion Estimation Algorithm Based on Multi-Resolution Frame Structure (다 해상도 프레임 구조에 기반한 고속 움직임 추정 기법)

  • Song, Byung-Cheol;Ra, Jong-Beom
    • Journal of the Institute of Electronics Engineers of Korea SP / v.37 no.5 / pp.54-63 / 2000
  • We present a multi-resolution block matching algorithm (BMA) for fast motion estimation. At the coarsest level, a motion vector (MV) with the minimum matching error is chosen via a full search. At the same time, the MV with the minimum matching error is found among the MVs of the spatially adjacent blocks. To examine these spatial MVs accurately, we propose an efficient method for searching full-resolution MVs without MV quantization, even at the coarsest level. The two chosen MVs are used as the initial search centers at the middle level. At the middle level, a local search is performed within a much smaller search area around each search center. If the method used at the coarsest level is adopted here as well, the local searches can be done at integer-pel accuracy. The MV with the minimum matching error within the local search areas is selected, and the final-level search would then be performed around this initial search center. Since the middle-level local searches are already performed at integer-pel accuracy, however, the local search at the finest level does not affect the overall performance. We can therefore skip the final-level search without performance degradation, which increases the search speed. Simulation results under a normal MPEG-2 coding environment compare the proposed BMA without the final-level search against the full search BMA. The proposed BMA achieves a speed-up factor of over 200, with a PSNR degradation of at most 0.2 dB. Furthermore, owing to its regular data flow, our scheme is also suitable for hardware implementation.
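The sketch below is a simplified two-level version of the idea, not the authors' three-level algorithm. A full search on 2:1 downsampled frames yields one candidate MV, and a single spatially adjacent block's MV supplies a second. A small integer-pel search at full resolution around each candidate then picks the final MV. The block size, the search radii, the 2×2 averaging filter, the use of only one neighbour MV, and all function names are assumptions.

```python
import math
import random


def sad(cur, ref, bx, by, size, dx, dy):
    """Sum of absolute differences; out-of-frame candidates cost infinity."""
    h, w = len(ref), len(ref[0])
    if bx + dx < 0 or by + dy < 0 or bx + dx + size > w or by + dy + size > h:
        return math.inf
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(size) for i in range(size))


def search(cur, ref, bx, by, size, center, radius):
    """Exhaustive search in a (2*radius+1)^2 window around center; returns (cost, mv)."""
    best = (math.inf, center)
    for dy in range(center[1] - radius, center[1] + radius + 1):
        for dx in range(center[0] - radius, center[0] + radius + 1):
            cost = sad(cur, ref, bx, by, size, dx, dy)
            if cost < best[0]:
                best = (cost, (dx, dy))
    return best


def downsample(frame):
    """2:1 decimation by averaging each 2x2 neighbourhood (integer arithmetic)."""
    return [[(frame[2 * j][2 * i] + frame[2 * j][2 * i + 1]
              + frame[2 * j + 1][2 * i] + frame[2 * j + 1][2 * i + 1]) // 4
             for i in range(len(frame[0]) // 2)]
            for j in range(len(frame) // 2)]


def two_level_bma(cur, ref, bx, by, size, coarse_radius, neighbor_mv, refine_radius=1):
    # Coarse level: full search on half-resolution frames within coarse_radius of (0, 0).
    _, mv_c = search(downsample(cur), downsample(ref), bx // 2, by // 2, size // 2,
                     (0, 0), coarse_radius)
    # Two search centers: the scaled-up coarse MV and a spatial neighbour's MV.
    centers = [(2 * mv_c[0], 2 * mv_c[1]), neighbor_mv]
    # Full resolution: small integer-pel local search around each center; keep the best.
    return min(search(cur, ref, bx, by, size, c, refine_radius) for c in centers)


# Synthetic data: the current frame is the reference displaced by (2, 2).
random.seed(0)
ref = [[random.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[min(y + 2, 31)][min(x + 2, 31)] for x in range(32)] for y in range(32)]
result = two_level_bma(cur, ref, bx=8, by=8, size=8, coarse_radius=2, neighbor_mv=(0, 0))
```

For this synthetic displacement, the search should recover the MV (2, 2) with zero SAD. It evaluates 25 coarse and 18 full-resolution candidates, while a full search of the same ±5-pixel range would evaluate 121. No finer stage runs after the integer-pel refinement, mirroring the abstract's point that the final-level search can be skipped.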

A Network-adaptive Context Extraction Method for JPEG2000 Using Tree-Structure of Coefficients from DWT (DWT 계수의 트리구조를 이용한 네트워크-적응적 JPEG2000 컨텍스트 추출방법)

  • Choi Hyun-Jun;Seo Young-Ho;Kim Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.9C / pp.939-948 / 2005
  • In EBCOT, the context extraction process takes excessive calculation time, and this paper proposes a method to reduce it. If a coefficient is less than a predefined threshold value, the coefficient and its descendants skip the context extraction process. This creates a trade-off between calculation time on one side and image quality or amount of output data on the other. As the threshold value increases, the calculation time and the amount of output data decrease, but the image degradation increases. Therefore, choosing the threshold value according to network environments or conditions yields a network-adaptive context extraction method. The experimental results showed that the threshold values giving acceptable image quality (better than 30 dB) range from 0 to 4. In this range, the resulting reduction in calculation time was 3% to 64% on average, and the reduction in output data was 32% to 73% on average. Large reductions in calculation time and output data can thus be obtained at the cost of an acceptable image quality degradation. The proposed method is therefore expected to be useful in applications such as real-time image/video data communication in wireless environments.
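The following is a minimal sketch of the pruning rule. It assumes a square DWT coefficient array in the usual Mallat (pyramid) layout, where a detail coefficient at (r, c) has its four children at (2r, 2c) through (2r+1, 2c+1). A coefficient whose magnitude falls below the threshold is dropped together with its whole subtree. Comparing magnitudes and always keeping the lowpass (LL) coefficients are assumptions. The function names are hypothetical, and the actual EBCOT context modeling is not shown.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def children(r: int, c: int, n: int) -> List[Coord]:
    """Four children of a detail coefficient in the next finer subband (Mallat layout)."""
    kids = [(2 * r, 2 * c), (2 * r, 2 * c + 1), (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]
    return [(a, b) for a, b in kids if a < n and b < n]


def coefficients_to_model(coeffs: List[List[int]], levels: int, threshold: float) -> List[Coord]:
    """Coordinates that still go through context extraction after subtree pruning."""
    n = len(coeffs)
    s = n >> levels  # side length of the LL band
    keep = [(r, c) for r in range(s) for c in range(s)]  # LL always kept (assumption)
    # Tree roots: detail coefficients of the coarsest decomposition level.
    stack = [(r, c) for r in range(2 * s) for c in range(2 * s) if r >= s or c >= s]
    while stack:
        r, c = stack.pop()
        if abs(coeffs[r][c]) < threshold:
            continue  # skip this coefficient and, implicitly, all its descendants
        keep.append((r, c))
        stack.extend(children(r, c, n))
    return keep


coeffs = [[10] * 4 for _ in range(4)]
coeffs[0][1] = 1  # below threshold: (0, 1) and its four children are pruned
kept = coefficients_to_model(coeffs, levels=2, threshold=5)
print(len(kept))  # 11
```

Raising `threshold` prunes larger subtrees. This is the calculation-time versus image-quality setting that the abstract proposes to adjust according to network conditions.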

Scalable Video Coding using Super-Resolution based on Convolutional Neural Networks for Video Transmission over Very Narrow-Bandwidth Networks (초협대역 비디오 전송을 위한 심층 신경망 기반 초해상화를 이용한 스케일러블 비디오 코딩)

  • Kim, Dae-Eun;Ki, Sehwan;Kim, Munchurl;Jun, Ki Nam;Baek, Seung Ho;Kim, Dong Hyun;Choi, Jeung Won
    • Journal of Broadcast Engineering / v.24 no.1 / pp.132-141 / 2019
  • Although video services over broadband networks are common, there is still a steady need to transmit video over narrow-bandwidth networks. In this paper, we propose a scalable video coding framework for low-resolution video transmission over a very narrow-bandwidth network. Decoded base-layer frames are super-resolved with a convolutional neural network (CNN) based technique, and the result serves as a prediction for the enhancement layer to improve coding efficiency. The conventional scalable high efficiency video coding (SHVC) standard performs upscaling with a fixed filter, whereas the proposed framework replaces that filter with a CNN trained for super-resolution. To this end, we propose a neural network structure with a skip connection and residual learning, and we train it according to the application scenario of the video coding framework. In a scenario where a 352×288 video at 8 fps is encoded at 110 kbps, the proposed framework delivers higher quality than the SHVC framework.
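A skip connection with residual learning usually means that the network predicts only the difference between a simply upscaled frame and the target, and the upscaled frame is added back at the output. The sketch below shows that structure. It assumes nearest-neighbour upscaling as the base interpolation and uses a placeholder `zero_net` in place of the trained CNN. It also computes the enhancement-layer residual left over after this prediction. All names are hypothetical.

```python
from typing import Callable, List

Frame = List[List[int]]


def upscale_nearest(frame: Frame, factor: int) -> Frame:
    """Fixed (non-learned) upscaling by pixel replication."""
    return [[frame[y // factor][x // factor] for x in range(len(frame[0]) * factor)]
            for y in range(len(frame) * factor)]


def residual_sr(frame: Frame, factor: int, residual_net: Callable[[Frame], Frame]) -> Frame:
    """Skip connection: output = upscaled input + residual predicted by the network."""
    base = upscale_nearest(frame, factor)
    residual = residual_net(base)
    return [[b + r for b, r in zip(row_b, row_r)] for row_b, row_r in zip(base, residual)]


def enhancement_residual(original: Frame, prediction: Frame) -> Frame:
    """Difference between the enhancement-layer frame and its super-resolved prediction."""
    return [[o - p for o, p in zip(row_o, row_p)] for row_o, row_p in zip(original, prediction)]


def zero_net(img: Frame) -> Frame:
    """Placeholder for a trained CNN; predicting zero residual leaves plain upscaling."""
    return [[0] * len(img[0]) for _ in img]


base_layer = [[1, 2], [3, 4]]
print(residual_sr(base_layer, 2, zero_net)[0])  # [1, 1, 2, 2]
```

With a zero residual, the output equals the fixed upscaling, which plays the role of SHVC's fixed filter in this analogy. A trained network only has to learn the correction on top of it, which is the usual motivation for residual learning in super-resolution.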

Effect of Long-Term Steeping and Enzyme Treatment of Glutinous Rice on Yukwa Characteristics - II. Physicochemical Characteristics of Enzyme-treated Glutinous Rice Flour - (찹쌀의 장기 수침 및 효소처리가 유과의 특성에 미치는 영향 -제 2 보: 효소처리시킨 찹쌀가루의 이화학적 특성 연구-)

  • Sohn, Kyung-Hee;Park, Jun
    • Korean journal of food and cookery science / v.14 no.3 / pp.225-231 / 1998
  • Enzyme-treated glutinous rice flour, developed to shorten or skip the steeping step in Yukwa preparation, was analyzed for its physicochemical characteristics. It was compared with glutinous rice flour made by a 28-day steeping method. The total sugar content of the 28-day-steeped flour was the highest among all groups, whereas the reducing sugar content was higher in the enzyme-treated flours. The viscosity of the enzyme-treated flours was significantly lower than that of the 28-day-steeped flour and was lowest at 65 °C. The Ca²⁺ and Mg²⁺ contents of the enzyme-treated flours were higher than those of the 28-day-steeped group, whereas the P⁺ content was lower. Glucose was the only free sugar detected in the flour prepared by 28-day steeping. The enzyme-treated flours contained both maltose and glucose, and their total free sugar content was much higher than that of the 28-day-steeped group. Under the microscope, both the 28-day-steeped and the enzyme-treated flours showed reduced particle size and a porous surface on parts of the flour granules.