• Title/Summary/Keyword: ResNet-Transformer

Enhancing Search Functionality for Website Posts and Product Reviews: Improving BM25 Ranking Algorithm Performance Using the ResNet-Transformer Model

  • Hong-Ju Yang;In-Yeop Choi
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.11
    • /
    • pp.67-77
    • /
    • 2024
  • This paper proposes a method to improve the search functionality for website posts and product reviews by using a ResNet-Transformer model in conjunction with the BM25 ranking algorithm. BM25 is a widely used algorithm in text-based search that ranks documents by evaluating their relevance to user queries. However, it has limitations in capturing local features of words and in understanding the context of a sentence. To address these issues, this study applies a classification approach that combines the ResNet model, which excels at extracting local features, with the Transformer model, known for its strong contextual understanding, and uses the classifier's output as weights for BM25. Experimental results demonstrate that the proposed method improves the nDCG metric by 9.38% and the AP@5 metric by 11.82% compared to BM25 alone. This suggests that implementing this method in search engines across various websites can provide more accurate results for post and review searches.
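
To make the reweighting idea concrete, here is a minimal sketch in which BM25 scores are reweighted by a classifier-derived relevance score. The `classifier_score` function is a hypothetical stand-in for the ResNet-Transformer output, and the toy corpus and weighting scheme are assumptions for illustration, not the paper's implementation.

```python
import math
from collections import Counter

# Minimal BM25 plus a classifier-derived weight, sketching the idea of
# reweighting BM25 rankings with a learned relevance score.

def bm25_scores(query, docs, k1=1.5, b=0.75):
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        df.update(set(d))                      # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def classifier_score(query, doc):
    # Hypothetical placeholder for a ResNet-Transformer relevance
    # probability in [0, 1]; approximated here by query-term coverage.
    return sum(t in doc for t in query) / len(query)

docs = [["fast", "shipping", "great", "product"],
        ["poor", "quality", "product"],
        ["great", "quality", "fast", "delivery"]]
query = ["great", "product"]

bm25 = bm25_scores(query, docs)
weighted = [s * (1 + classifier_score(query, d)) for s, d in zip(bm25, docs)]
ranking = sorted(range(len(docs)), key=lambda i: weighted[i], reverse=True)
print("BM25 scores:", [round(s, 3) for s in bm25])
print("Reweighted ranking:", ranking)
```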

Textile material classification in clothing images using deep learning

  • So Young Lee;Hye Seon Jeong;Yoon Sung Choi;Choong Kwon Lee
    • Smart Media Journal
    • /
    • v.12 no.7
    • /
    • pp.43-51
    • /
    • 2023
  • As online transactions increase, clothing images strongly influence consumer purchasing decisions. Image information about clothing materials has therefore grown in importance, and it matters to the fashion industry to analyze clothing images and identify the materials used. Textile materials used in clothing are difficult to identify with the naked eye, and sorting them consumes considerable time and cost. This study aims to classify textile materials from clothing images using deep learning algorithms. Classifying materials can help reduce clothing production costs, increase the efficiency of the manufacturing process, and support services that recommend products of specific materials to consumers. We used the machine vision-based deep learning algorithms ResNet and Vision Transformer to classify clothing images. A total of 760,949 images were collected and preprocessed to detect abnormal images; in the end, 167,299 clothing images with 19 textile labels and 20 fabric labels were used. We classified clothing materials with ResNet and Vision Transformer and compared the algorithms using the Top-k accuracy score. In this comparison, Vision Transformer outperformed ResNet.
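
As a rough sketch of the comparison setup, the snippet below builds a ResNet and a ViT classifier with a 19-class head (the study's textile labels) and evaluates a Top-k accuracy score. The specific backbones (ResNet50, ViT-B/16), the dummy batch, and k=3 are assumptions; the authors' training pipeline is not reproduced.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 19  # textile labels in the study

# Two backbones with replaced classification heads (untrained here).
resnet = models.resnet50(weights=None)
resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)

vit = models.vit_b_16(weights=None)
vit.heads.head = nn.Linear(vit.heads.head.in_features, NUM_CLASSES)

def top_k_accuracy(logits, targets, k=3):
    # Fraction of samples whose true label is among the k highest logits.
    topk = logits.topk(k, dim=1).indices
    return (topk == targets.unsqueeze(1)).any(dim=1).float().mean().item()

images = torch.randn(8, 3, 224, 224)           # dummy clothing images
labels = torch.randint(0, NUM_CLASSES, (8,))   # dummy material labels

for name, model in [("ResNet50", resnet), ("ViT-B/16", vit)]:
    model.eval()
    with torch.no_grad():
        acc = top_k_accuracy(model(images), labels, k=3)
    print(f"{name} top-3 accuracy (untrained, dummy data): {acc:.3f}")
```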

A Complex Valued ResNet Network Based Object Detection Algorithm in SAR Images

  • Hwang, Insu
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.24 no.4
    • /
    • pp.392-400
    • /
    • 2021
  • Unlike optical sensors, SAR (Synthetic Aperture Radar) has the advantage of acquiring images in all weather conditions, and object detection in SAR images is an important problem. Deep learning-based object detection has generally been performed with real-valued networks that use only the amplitude of the SAR image. Since a SAR image is complex-valued data consisting of amplitude and phase, a complex-valued network is required. In this paper, a complex-valued ResNet network is proposed. SAR image object detection was performed by combining the ROI Transformer detector, which is specialized for aerial image detection, with the proposed complex-valued ResNet. The complex-valued network was confirmed to achieve higher accuracy than the existing real-valued network.
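
A common way to realize complex-valued convolutions, and a plausible building block for such a network, is to apply two real-valued convolutions to the real and imaginary parts following (a+bi)(w+vi) = (aw-bv) + (av+bw)i. The block below is a generic sketch under that assumption, not the paper's architecture; channel counts and input sizes are illustrative.

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    # Complex convolution realized with two real-valued convolutions
    # applied to the real and imaginary parts of the input.
    def __init__(self, in_ch, out_ch, **kwargs):
        super().__init__()
        self.conv_r = nn.Conv2d(in_ch, out_ch, **kwargs)
        self.conv_i = nn.Conv2d(in_ch, out_ch, **kwargs)

    def forward(self, x_r, x_i):
        return (self.conv_r(x_r) - self.conv_i(x_i),
                self.conv_r(x_i) + self.conv_i(x_r))

class ComplexResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.conv1 = ComplexConv2d(ch, ch, kernel_size=3, padding=1)
        self.conv2 = ComplexConv2d(ch, ch, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x_r, x_i):
        y_r, y_i = self.conv1(x_r, x_i)
        y_r, y_i = self.conv2(self.act(y_r), self.act(y_i))
        # Residual connection applied to both parts.
        return self.act(x_r + y_r), self.act(x_i + y_i)

# A SAR pixel is amplitude * exp(i*phase); feed real/imaginary parts.
amp, phase = torch.rand(1, 8, 64, 64), torch.rand(1, 8, 64, 64)
block = ComplexResBlock(8)
out_r, out_i = block(amp * torch.cos(phase), amp * torch.sin(phase))
print(out_r.shape, out_i.shape)
```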

Efficient Recognition of Easily-confused Chinese Herbal Slices Images Using Enhanced ResNeSt

  • Qi Zhang;Jinfeng Ou;Huaying Zhou
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.8
    • /
    • pp.2103-2118
    • /
    • 2024
  • Automated recognition of Chinese herbal slices (CHS) based on computer vision plays a critical role in the practical application of intelligent Chinese medicine. Due to the complexity and similarity of herbal images, identifying Chinese herbal slices is still a challenging task. Easily-confused CHS in particular exhibit high inter-class and intra-class complexity and similarity, and existing deep learning models are poorly suited to identifying them efficiently. To address these problems comprehensively, a novel small dataset of easily-confused CHS was first built, comprising six pairs of twelve categories with about 2,395 samples. Furthermore, we propose a ResNeSt-CHS model that combines multilevel perception fusion (MPF) and perceptive sparse fusion (PSF) blocks for efficiently recognizing easily-confused CHS images. To verify the superiority of ResNeSt-CHS and the effectiveness of our dataset, experiments were conducted, validating that ResNeSt-CHS is well suited to easily-confused CHS recognition, with a 2.1% improvement over the original ResNeSt model. The results also indicate that ResNeSt-CHS achieves high accuracy even though it is trained on a relatively small-scale dataset. The model obtained state-of-the-art easily-confused CHS classification performance, with an accuracy of 90.8%, well beyond other models (EfficientNet, Transformer, ResNeSt, etc.) on the evaluation criteria.
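
The MPF and PSF blocks are the paper's own contributions and are not reproduced here; as a hedged baseline sketch, the snippet below fine-tunes a plain ResNeSt backbone (via the timm library) for the 12 easily-confused CHS categories. The batch contents and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import timm

# Plain ResNeSt transfer learning for 12 CHS categories; the paper's
# MPF/PSF additions are not included in this sketch.
model = timm.create_model("resnest50d", pretrained=False, num_classes=12)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy batch standing in for herbal-slice images.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 12, (4,))

model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"one training step, loss = {loss.item():.3f}")
```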

Disease Diagnosis on Fundus Images: A Cross-Dataset Study

  • Van-Nguyen Pham;Sun Xiaoying;Hyunseung Choo
    • Annual Conference of KIPS
    • /
    • 2024.10a
    • /
    • pp.754-755
    • /
    • 2024
  • This paper presents a comparative study of five deep learning models (ResNet50, DenseNet121, Vision Transformer (ViT), Swin Transformer (SwinT), and CoatNet) on the task of multi-label classification of fundus images for ocular diseases. The models were trained on the Ocular Disease Recognition (ODIR) dataset and validated on the Retinal Fundus Multi-disease Image Dataset (RFMiD), with a focus on five disease classes: diabetic retinopathy, glaucoma, cataract, age-related macular degeneration, and myopia. Performance was evaluated using the area under the receiver operating characteristic curve (AUC-ROC) for each class. CoatNet achieved the best AUC-ROC scores for diabetic retinopathy, glaucoma, cataract, and myopia, while ViT outperformed CoatNet for age-related macular degeneration. Overall, CoatNet exhibited the highest average performance across all classes, highlighting the effectiveness of hybrid architectures in medical image classification. These findings suggest that CoatNet may be a promising model for multi-label classification of fundus images in cross-dataset scenarios.
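
A minimal sketch of the evaluation protocol follows: a sigmoid multi-label head over the five disease classes and per-class AUC-ROC scoring. ResNet50 stands in for any of the five compared backbones, and the random images and labels are placeholders for the ODIR/RFMiD data.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import roc_auc_score

CLASSES = ["diabetic_retinopathy", "glaucoma", "cataract", "amd", "myopia"]

# Multi-label head: one logit per disease, squashed by a sigmoid.
model = models.resnet50(weights=None)
model.fc = nn.Linear(model.fc.in_features, len(CLASSES))

images = torch.randn(16, 3, 224, 224)              # dummy fundus images
targets = torch.randint(0, 2, (16, len(CLASSES)))  # multi-label ground truth

model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(images))

# Per-class AUC-ROC, as in the study's evaluation.
for i, name in enumerate(CLASSES):
    auc = roc_auc_score(targets[:, i].numpy(), probs[:, i].numpy())
    print(f"{name}: AUC-ROC = {auc:.3f}")
```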

Performance Evaluation of Vision Transformer-based Pneumonia Detection Model using Chest X-ray Images

  • Junyong Chang;Youngeun Choi;Seungwan Lee
    • Journal of the Korean Society of Radiology
    • /
    • v.18 no.5
    • /
    • pp.541-549
    • /
    • 2024
  • The various structures of artificial neural networks, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have been extensively studied and have served as the backbone of numerous models. Among these, the transformer architecture has demonstrated its potential for natural language processing and become a subject of in-depth research. With modifications to its internal structure, the technique can be adapted for image processing, leading to the development of Vision Transformer (ViT) models. ViTs have shown high accuracy and performance on large datasets. This study aims to develop a ViT-based model for detecting pneumonia using chest X-ray images and to quantitatively evaluate its performance. Various architectures of the ViT-based model were constructed by varying the number of encoder blocks, and different patch sizes were applied for network training. The performance of the ViT-based model was also compared to CNN-based models, such as VGGNet, GoogLeNet, and ResNet. The results showed that the training efficiency and accuracy of the ViT-based model depended on the number of encoder blocks and the patch size, and the F1 scores of the ViT-based model ranged from 0.875 to 0.919. The training efficiency of the ViT-based model with a large patch size was superior to the CNN-based models, and the pneumonia detection accuracy of the ViT-based model was higher than that of VGGNet. In conclusion, the ViT-based model can potentially be used for pneumonia detection with chest X-ray images, and this study improves its prospects for clinical use.
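
The study's main design axis, varying the number of encoder blocks and the patch size, can be sketched as below using timm's VisionTransformer. The embedding dimension, head count, and the depth/patch grids are assumptions, not the authors' configurations.

```python
import torch
from timm.models.vision_transformer import VisionTransformer

def make_vit(depth, patch_size):
    # Binary head for pneumonia vs. normal; other sizes are assumptions.
    return VisionTransformer(
        img_size=224, patch_size=patch_size, in_chans=3,
        embed_dim=384, depth=depth, num_heads=6,
        num_classes=2,
    )

x = torch.randn(2, 3, 224, 224)  # dummy chest X-ray batch
for depth in (4, 8, 12):         # number of encoder blocks
    for patch in (16, 32):       # patch size
        model = make_vit(depth, patch)
        n_params = sum(p.numel() for p in model.parameters()) / 1e6
        print(f"depth={depth:2d} patch={patch:2d} "
              f"params={n_params:5.1f}M logits={tuple(model(x).shape)}")
```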

Determination of High-pass Filter Frequency with Deep Learning for Ground Motion

  • Lee, Jin Koo;Seo, JeongBeom;Jeon, SeungJin
    • Journal of the Earthquake Engineering Society of Korea
    • /
    • v.28 no.4
    • /
    • pp.183-191
    • /
    • 2024
  • Accurate seismic vulnerability assessment requires high-quality ground motion data in large quantities. Recorded ground motion time series contain not only the seismic signal but also background noise, so it is crucial to determine the high-pass cut-off frequency that suppresses this noise. Traditional methods for determining the high-pass filter frequency rely on human inspection, such as comparing the Fourier amplitude spectra (FAS) of noise and signal, fitting an f² trend line, and inspecting the displacement curve after filtering. These methods are subject to human error and unsuitable for automating the process. This study used a deep learning approach to determine the high-pass filter frequency. We used the Mel-spectrogram for feature extraction and the mixup technique to overcome the lack of data. We selected the convolutional neural network (CNN) models ResNet, DenseNet, and EfficientNet for transfer learning, and ViT and DeiT as transformer-based models. The results showed that ResNet had the highest performance, with a coefficient of determination (R²) of 0.977 and the lowest mean absolute error (MAE) and root mean square error (RMSE) of 0.006 and 0.074, respectively. When applied to a seismic event and compared to the traditional methods, the high-pass filter frequency determined by the deep learning method differed by only 0.1 Hz, demonstrating that it can replace traditional methods. We anticipate that this study will pave the way for automating ground motion processing so that large amounts of data can be handled efficiently.
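
A hedged sketch of the pipeline described above follows: Mel-spectrogram-like features are fed to a CNN that regresses the cut-off frequency, with mixup augmentation for the small dataset. The ResNet18 backbone adaptation, tensor shapes, and target range are assumptions for illustration, not the paper's setup.

```python
import numpy as np
import torch
import torch.nn as nn
from torchvision import models

def mixup(x, y, alpha=0.2):
    # Blend pairs of samples and targets; standard mixup, applied here
    # to a regression target.
    lam = np.random.beta(alpha, alpha)
    idx = torch.randperm(x.size(0))
    return lam * x + (1 - lam) * x[idx], lam * y + (1 - lam) * y[idx]

# ResNet18 adapted to 1-channel spectrogram input and a scalar output (Hz).
model = models.resnet18(weights=None)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 1)

spec = torch.randn(8, 1, 128, 256)   # dummy Mel-spectrograms
f_cut = torch.rand(8, 1) * 0.5       # dummy cut-off frequencies in Hz

x, y = mixup(spec, f_cut)
loss = nn.functional.mse_loss(model(x), y)
print(f"MSE on mixed batch: {loss.item():.4f}")
```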

Knowledge Distillation based-on Internal/External Correlation Learning

  • Hun-Beom Bak;Seung-Hwan Bae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.4
    • /
    • pp.31-39
    • /
    • 2023
  • In this paper, we propose Internal/External Knowledge Distillation (IEKD), which utilizes both external correlations between feature maps of heterogeneous models and internal correlations between feature maps of the same model to transfer knowledge from a teacher model to a student model. To achieve this, we transform feature maps into a sequence format and extract new feature maps suitable for knowledge distillation by considering internal and external correlations through a transformer. Distilling the extracted feature maps lets the student learn both internal and external correlations, and feature matching on the extracted maps further improves its accuracy. To demonstrate the effectiveness of the proposed method, we achieved 76.23% Top-1 image classification accuracy on the CIFAR-100 dataset with the "ResNet-32×4/VGG-8" teacher-student combination, outperforming state-of-the-art KD methods.
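
The exact IEKD wiring is the paper's contribution and is not reproduced here; the snippet below loosely sketches the named ingredients: teacher and student feature maps are flattened into sequences, passed through a transformer encoder to model correlations, and matched with an MSE loss. All module sizes and the shared encoder are assumptions.

```python
import torch
import torch.nn as nn

d_model = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
proj_t = nn.Conv2d(256, d_model, kernel_size=1)  # align teacher channels
proj_s = nn.Conv2d(128, d_model, kernel_size=1)  # align student channels

def to_sequence(fmap):
    # (B, C, H, W) -> (B, H*W, C): each spatial location becomes a token.
    b, c, h, w = fmap.shape
    return fmap.flatten(2).transpose(1, 2)

teacher_feat = torch.randn(4, 256, 8, 8)  # e.g., from a ResNet-32x4 stage
student_feat = torch.randn(4, 128, 8, 8)  # e.g., from a VGG-8 stage

seq_t = encoder(to_sequence(proj_t(teacher_feat)))
seq_s = encoder(to_sequence(proj_s(student_feat)))

# Feature matching: pull student sequences toward (frozen) teacher ones.
distill_loss = nn.functional.mse_loss(seq_s, seq_t.detach())
print(f"feature-matching distillation loss: {distill_loss.item():.3f}")
```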

A Study about Learning Graph Representation on Farmhouse Apple Quality Images with Graph Transformer

  • Ji Hun Bae;Ju Hwan Lee;Gwang Hyun Yu;Gyeong Ju Kwon;Jin Young Kim
    • Smart Media Journal
    • /
    • v.12 no.1
    • /
    • pp.9-16
    • /
    • 2023
  • Recently, convolutional neural network (CNN)-based systems have been developed to overcome the limitations of human resources in apple quality classification on farms. However, since convolutional neural networks only accept images of the same size, preprocessing such as sampling may be required, and oversampling causes information loss in the original image, such as quality degradation and blurring. To minimize these problems, this paper generates an image-patch-based graph from the original image and proposes a random-walk-based positional encoding method for a graph transformer model. The method continuously learns position embedding information for patches, which lack inherent positional information, using the random walk algorithm, and finds the optimal graph structure by aggregating useful node information through the graph transformer's self-attention mechanism. As a result, it is robust and performs well even on new graph structures with random node order and on arbitrary graph structures determined by the location of an object in an image. In experiments on five apple quality datasets, its accuracy was 1.3% to 4.7% higher than other GNN models, and its parameter count of 3.59M was about 15% of the 23.52M of the ResNet18 model. It therefore offers fast inference owing to the reduced computation, demonstrating its effectiveness.
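
Random-walk positional encoding is commonly computed from the k-step return probabilities diag((D⁻¹A)ᵏ) of the graph; the sketch below follows that standard formulation, which may differ in detail from the paper's method. The toy ring graph stands in for an image-patch graph.

```python
import torch

def random_walk_pe(adj, k_steps=4):
    # Transition matrix of a random walk: rows of the adjacency matrix
    # normalized by node degree (D^-1 A).
    deg = adj.sum(dim=1).clamp(min=1)
    M = adj / deg.unsqueeze(1)
    pe, Mk = [], M
    for _ in range(k_steps):
        pe.append(torch.diagonal(Mk))  # probability of returning in k steps
        Mk = Mk @ M
    return torch.stack(pe, dim=1)      # (num_nodes, k_steps) encoding

# 4 patches connected in a ring, standing in for an image-patch graph.
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
print(random_walk_pe(adj))
```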