• Title/Summary/Keyword: convolution model

Search Result 394

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

  • Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.7 / pp.2407-2424 / 2022
  • Recently, the Transformer has made great progress in video retrieval tasks due to its high representation capability. In a Transformer, the cascaded self-attention modules can capture long-distance feature dependencies, but local feature details tend to deteriorate. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model local dependencies and improve the efficiency of local feature extraction. Meanwhile, an augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features and mitigate the learning bias. The proposed model is tested on the MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, a gain of about 2.3% MdR and 6.1% MnR is obtained over recently proposed multimodal-based methods.
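
The MdR and MnR figures quoted above are rank-based retrieval metrics (median rank and mean rank of the correct item; lower is better). The sketch below shows one common way to compute them from a text-to-video similarity matrix. The function name `retrieval_ranks`, the toy scores, and the convention that query i matches video i are assumptions for illustration and are not taken from the paper.

```python
from statistics import mean, median


def retrieval_ranks(similarity):
    """Return the 1-based rank of the correct video for each text query.

    Assumes similarity[i][j] is the score between query i and video j,
    and that the ground-truth match for query i is video i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        # Rank = 1 + number of candidates scored strictly higher than the match.
        ranks.append(1 + sum(1 for score in row if score > row[i]))
    return ranks


# Toy 3x3 similarity matrix (hypothetical values).
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.5, 0.6],
    [0.2, 0.4, 0.7],
]
ranks = retrieval_ranks(sim)
mdr = median(ranks)  # median rank (MdR)
mnr = mean(ranks)    # mean rank (MnR)
print(ranks, mdr, mnr)
```

Because the mean is sensitive to a few badly ranked queries while the median is not, the two numbers are usually reported together.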

Prediction of Wind Power Generation using Deep Learning (딥러닝을 이용한 풍력 발전량 예측)

  • Choi, Jeong-Gon;Choi, Hyo-Sang
    • The Journal of the Korea institute of electronic communication sciences / v.16 no.2 / pp.329-338 / 2021
  • This study predicts the amount of wind power generation to support rational operation planning for wind power generation and capacity calculation for energy storage systems (ESS). For forecasting, we present a method that combines a physical approach and a statistical approach. The factors affecting wind power generation are analyzed and variables are selected. Using historical data collected for the selected variables, the amount of wind power generation is predicted with deep learning. The model is a hybrid that combines a bidirectional long short-term memory (LSTM) network and a convolutional neural network (CNN). To evaluate prediction performance, the model and its error are compared with those of a model based on the multi-layer perceptron (MLP) algorithm, and the results are presented.

Development of ResNet based Crop Growth Stage Estimation Model (ResNet 기반 작물 생육단계 추정 모델 개발)

  • Park, Jun;Kim, June-Yeong;Park, Sung-Wook;Jung, Se-Hoon;Sim, Chun-Bo
    • Smart Media Journal / v.11 no.2 / pp.53-62 / 2022
  • Because global warming has accelerated since industrialization, environmental changes and abnormal climate events are becoming more frequent. Agriculture is an industry that is very sensitive to climate change, and global warming causes problems such as reduced crop yields and shifting growing regions. In addition, environmental changes make the growth period of crops irregular, so that even experienced farmers find it difficult to estimate the growth stage of crops, which causes various problems. Therefore, in this paper, we propose a CNN model for estimating the growth stage of crops. The proposed model modifies the pooling layer of ResNet, and it achieved higher accuracy in growth stage estimation than the ResNet and DenseNet models.

FGW-FER: Lightweight Facial Expression Recognition with Attention

  • Huy-Hoang Dinh;Hong-Quan Do;Trung-Tung Doan;Cuong Le;Ngo Xuan Bach;Tu Minh Phuong;Viet-Vu Vu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.9 / pp.2505-2528 / 2023
  • The field of facial expression recognition (FER) has been actively researched to improve human-computer interaction. In recent years, deep learning techniques have gained popularity for addressing FER, with numerous studies proposing end-to-end frameworks that stack or widen significant convolutional neural network layers. While this has led to improved performance, it has also resulted in larger model sizes and longer inference times. To overcome this challenge, our work introduces a novel lightweight model architecture. The architecture incorporates three key factors: Depth-wise Separable Convolution, Residual Block, and Attention Modules. By doing so, we aim to strike a balance between model size, inference speed, and accuracy in FER tasks. Through extensive experimentation on popular benchmark FER datasets, our proposed method has demonstrated promising results. Notably, it stands out due to its substantial reduction in parameter count and faster inference time, while maintaining accuracy levels comparable to other lightweight models discussed in the existing literature.
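
To make the parameter savings of depth-wise separable convolution concrete, the following sketch compares weight counts for a standard convolution and its depth-wise separable counterpart. The layer sizes (3x3 kernel, 64 input channels, 128 output channels) are hypothetical, biases are ignored, and nothing here reflects the actual FGW-FER layer configuration.

```python
def standard_conv_params(kernel_size, c_in, c_out):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


def depthwise_separable_params(kernel_size, c_in, c_out):
    """Weights in a depth-wise conv followed by a 1x1 point-wise conv (bias ignored)."""
    depthwise = kernel_size * kernel_size * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out                      # 1x1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
std = standard_conv_params(3, 64, 128)
sep = depthwise_separable_params(3, 64, 128)
print(std, sep, sep / std)
```

For this layer the separable form needs roughly 12% of the weights of the standard form, since the ratio is 1/c_out + 1/k^2.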

Image based Concrete Compressive Strength Prediction Model using Deep Convolution Neural Network (심층 컨볼루션 신경망을 활용한 영상 기반 콘크리트 압축강도 예측 모델)

  • Jang, Youjin;Ahn, Yong Han;Yoo, Jane;Kim, Ha Young
    • Korean Journal of Construction Engineering and Management / v.19 no.4 / pp.43-51 / 2018
  • As the stock of aged apartments is expected to increase sharply, maintenance to improve the durability of concrete facilities is becoming more important. Concrete compressive strength is a representative index of the durability of concrete facilities and an important item in the precision safety diagnosis for facility maintenance. However, existing methods for measuring concrete compressive strength and deciding on the maintenance of concrete facilities have limitations such as facility safety problems, high cost, and low reliability. In this study, we propose a model that predicts concrete compressive strength from images using a deep convolutional neural network. Training, validation and testing were conducted on a concrete compressive strength dataset constructed from concrete specimens produced in a laboratory environment. As a result, it was found that concrete compressive strength could be learned from the images, and the validity of the proposed model was confirmed.

A Korean speech recognition based on conformer (콘포머 기반 한국어 음성인식)

  • Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.488-495 / 2021
  • We propose a speech recognition system based on the conformer. The conformer is a convolution-augmented transformer, which combines a transformer model for capturing global information with a convolutional neural network (CNN) for exploiting local features effectively. The baseline system is a transformer-based speech recognizer with a Long Short-Term Memory (LSTM)-based language model. The proposed system uses a conformer instead of the transformer, together with a transformer-based language model. When evaluated on the Electronics and Telecommunications Research Institute (ETRI) speech corpus in AI-Hub, the proposed system yields a Character Error Rate (CER) of 5.7 %, while the baseline system yields 11.8 %. Even when the speech corpus is extended to another AI-Hub domain, the NHNdiguest speech corpus, the proposed system performs robustly across the two domains. These experiments validate the proposed system.
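
The Character Error Rate reported above is conventionally defined as the character-level edit distance between the reference and the recognized text, divided by the reference length. A minimal sketch under that standard definition follows; the function names are illustrative, and the paper's exact text normalization (spacing, punctuation) is not stated in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(cer("abcd", "abxd"))  # one substitution over four reference characters
```

Note that CER can exceed 1.0 when the hypothesis contains many insertions, because the denominator is the reference length only.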

History of the Photon Beam Dose Calculation Algorithm in Radiation Treatment Planning System

  • Kim, Dong Wook;Park, Kwangwoo;Kim, Hojin;Kim, Jinsung
    • Progress in Medical Physics / v.31 no.3 / pp.54-62 / 2020
  • Dose calculation algorithms play an important role in radiation therapy and are the basis for optimizing treatment plans, an important feature in the development of complex treatment technologies such as intensity-modulated radiation therapy. We reviewed the past and current status of dose calculation algorithms used in treatment planning systems for radiation therapy. Dose calculation algorithms can be broadly classified into three main groups based on the mechanisms used: (1) factor-based, (2) model-based, and (3) principle-based. Factor-based algorithms are a type of empirical dose calculation that interpolates or extrapolates the dose from a set of basic measurements. Model-based algorithms, represented by the pencil beam convolution, analytical anisotropic, and collapsed cone convolution algorithms, use a simplified physical process based on a convolution equation that convolves the primary photon energy fluence with a kernel. Model-based algorithms that account for side scattering when beams pass through heterogeneous media provide more precise dose calculation results than correction-based algorithms. Principle-based algorithms, represented by Monte Carlo dose calculations, simulate all real physical processes involving beam particles during transport; therefore, their dose calculations are accurate but time consuming. After approximately 70 years of development in dose calculation algorithms and computing technology, the accuracy of dose calculation seems to have come close to meeting clinical needs. Next-generation dose calculation algorithms are expected to include biologically equivalent doses or biologically effective doses, and doctors expect to be able to use them to improve the quality of treatment in the near future.
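
The model-based idea of convolving the primary fluence with a kernel can be illustrated with a one-dimensional toy example. This is only a sketch: the fluence profile and the kernel weights are made-up numbers, and clinical pencil beam or collapsed cone algorithms work with 2D or 3D kernels and heterogeneity handling that are not modeled here.

```python
def convolve_1d(fluence, kernel):
    """Full discrete convolution: out[n] = sum_k fluence[k] * kernel[n - k]."""
    out = [0.0] * (len(fluence) + len(kernel) - 1)
    for i, f in enumerate(fluence):
        for j, w in enumerate(kernel):
            # Each fluence sample spreads its energy according to the kernel.
            out[i + j] += f * w
    return out


# Hypothetical flat beam profile and a normalized spreading kernel.
fluence = [0.0, 1.0, 1.0, 1.0, 0.0]
kernel = [0.25, 0.5, 0.25]
dose = convolve_1d(fluence, kernel)
print(dose)
```

Because the kernel sums to 1, the total of `dose` equals the total of `fluence`; the convolution only redistributes it, which softens the beam edges.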

DATCN: Deep Attention fused Temporal Convolution Network for the prediction of monitoring indicators in the tunnel

  • Bowen, Du;Zhixin, Zhang;Junchen, Ye;Xuyan, Tan;Wentao, Li;Weizhong, Chen
    • Smart Structures and Systems / v.30 no.6 / pp.601-612 / 2022
  • The prediction of structural mechanical behavior is vitally important for perceiving abnormal conditions early and avoiding disasters. Especially in underground engineering, complex geological conditions make structures more prone to disasters. To address problems in previous studies, such as incomplete consideration of influencing factors and being able to predict only continuous performance, the deep attention fused temporal convolution network (DATCN) is proposed in this paper to predict the spatial mechanical behavior of a structure; it integrates both temporal and spatial effects and realizes cross-time prediction. The temporal convolution network (TCN) and the self-attention mechanism are employed to learn the temporal correlation of each monitoring point and the spatial correlation among different points, respectively. The predictions of DATCN are then compared with those of classical baselines, including SVR, LR, MLP, and RNNs. The parameters involved in DATCN are also discussed to optimize its prediction ability. The results demonstrate that the proposed DATCN model outperforms the state-of-the-art baselines. The prediction accuracy of the DATCN model after 24 hours reaches 90 percent. Moreover, the performance in the last 14 hours plays a dominant role in predicting the short-term behavior of the structure. As a case study, the proposed model is applied to an underwater shield tunnel to predict the spatial stress variation of concrete segments.
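
The building block of a temporal convolution network is the causal dilated convolution, in which each output depends only on current and past inputs spaced `dilation` steps apart. The sketch below shows that operation and the resulting receptive field for a stack with one convolution per layer. The weights, dilation schedule, and function names are illustrative assumptions and are not the DATCN configuration.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], with x treated as 0 before t = 0."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # never look into the future or before the series start
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Time steps seen by one output of a stack with one conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


series = [1, 2, 3, 4, 5]
out = causal_dilated_conv(series, weights=[1.0, 1.0], dilation=2)
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4])
print(out, rf)
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth while the parameter count grows only linearly.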

A Novel Face Recognition Algorithm based on the Deep Convolution Neural Network and Key Points Detection Jointed Local Binary Pattern Methodology

  • Huang, Wen-zhun;Zhang, Shan-wen
    • Journal of Electrical Engineering and Technology / v.12 no.1 / pp.363-372 / 2017
  • This paper presents a novel face recognition algorithm that combines a deep convolutional neural network with key point detection and the local binary pattern method to enhance the accuracy of face recognition. We first propose a modified method for locating key facial feature points, which improves on the traditional localization algorithm and better pre-processes the original face images. We put forward a composite model of local information that combines grey information and color information. Then, we optimize the multi-layer deep learning network using the Fisher criterion as a reference to adjust the network structure more accurately. Furthermore, we modify the local binary pattern texture description operator and combine it with the neural network to overcome the drawback that a deep neural network cannot learn the local characteristics of face images. Simulation results demonstrate that the proposed algorithm obtains stronger robustness and feasibility than other state-of-the-art algorithms. The proposed algorithm also provides a novel paradigm for applying deep learning to face recognition, which sets a milestone for further research.
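
For readers unfamiliar with the local binary pattern operator, the following sketch computes the basic 8-neighbour LBP code for a single 3x3 patch. It shows the textbook operator only; the modified operator proposed in the paper is not described in the abstract, and the bit ordering used here (clockwise from the top-left neighbour) is one of several conventions.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbour coordinates, clockwise starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:  # neighbour at least as bright as the centre
            code |= 1 << bit
    return code


# Hypothetical grey-level patch.
patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))
```

A histogram of these codes over an image region is the usual texture descriptor, and it is insensitive to any monotonic change in brightness because only comparisons with the centre pixel matter.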

Semantic crack-image identification framework for steel structures using atrous convolution-based Deeplabv3+ Network

  • Ta, Quoc-Bao;Dang, Ngoc-Loi;Kim, Yoon-Chul;Kam, Hyeon-Dong;Kim, Jeong-Tae
    • Smart Structures and Systems / v.30 no.1 / pp.17-34 / 2022
  • For steel structures, fatigue cracks are critical damage induced by long-term cyclic loading and distortion effects. Vision-based crack detection can help ensure structural integrity and performance through continuous monitoring and non-destructive assessment. A critical issue is distinguishing cracks from other features in captured images, which may contain complex backgrounds such as handwriting and marks made to record crack patterns and lengths during periodic visual inspections. This study presents a parametric study on image-based crack identification for orthotropic steel bridge decks using captured images with complicated backgrounds. Firstly, a framework for vision-based crack segmentation using the atrous convolution-based Deeplabv3+ network (ACDN) is designed. Secondly, features in crack images are labeled to build three databanks according to the objects in the backgrounds. Thirdly, evaluation metrics computed from the trained ACDN models are used to evaluate the effects of obstacles on crack detection results. Finally, various training parameters, including image sizes, hyper-parameters, and the number of training images, are optimized for the ACDN crack detection model. The results demonstrated that fatigue cracks could be identified by the trained ACDN models, and that the accuracy of crack detection was improved by optimizing the training parameters. This supports the applicability of the vision-based technique to the early detection of tiny fatigue cracks in steel structures.
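
Atrous (dilated) convolution enlarges the field of view of a filter by spacing its taps `rate` pixels apart, without adding weights. The sketch below computes the effective kernel size and the tap positions along one axis. The rates shown are example values chosen for illustration and are not the settings used in the paper, and the helper names are hypothetical.

```python
def effective_kernel_size(kernel_size, rate):
    """Span covered by an atrous kernel: k + (k - 1) * (rate - 1)."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def tap_offsets(kernel_size, rate):
    """Offsets of the kernel taps relative to the centre, for an odd kernel size."""
    half = (kernel_size - 1) // 2
    return [(i - half) * rate for i in range(kernel_size)]


# Example rates (assumed for illustration) applied to a 3-tap kernel.
for rate in (1, 2, 6):
    print(rate, effective_kernel_size(3, rate), tap_offsets(3, rate))
```

With rate 1 the operation reduces to an ordinary convolution; larger rates cover a wider context with the same three weights per axis, which is what lets a segmentation network see both thin cracks and their surroundings.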