• Title/Summary/Keyword: Pretrained

Analysis of the effect of class classification learning on the saliency map of Self-Supervised Transformer (클래스분류 학습이 Self-Supervised Transformer의 saliency map에 미치는 영향 분석)

  • Kim, JaeWook;Kim, Hyeoncheol
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.67-70 / 2022
  • As the Transformer model, first widely adopted in NLP, has been applied to the vision domain, it has overcome the stagnating performance of existing CNN-based models and improved results across tasks such as object detection and segmentation. Moreover, a ViT (Vision Transformer) model trained by self-supervised learning on images alone, without label data, can produce saliency maps that locate the important objects in an image, which has spurred active research on object detection and semantic segmentation with self-supervised ViT. In this paper, we attach a classifier to a ViT model and compare, through visualization, the saliency maps of a model trained with ordinary supervised learning and a model transfer-learned from self-supervised pretrained weights. Through this comparison, we identify the effect that class-classification-based transfer learning has on the Transformer's saliency map.

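A minimal sketch of extracting such a saliency map from a self-supervised ViT, assuming the public DINO checkpoint on the Hugging Face hub rather than the paper's own models; the image path is a placeholder.

```python
# Visualize the CLS-token attention of a self-supervised ViT (DINO)
# as a coarse saliency map over image patches.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

image = Image.open("sample.jpg").convert("RGB")  # placeholder image
processor = ViTImageProcessor.from_pretrained("facebook/dino-vits16")
model = ViTModel.from_pretrained("facebook/dino-vits16")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# Last-layer attention has shape (batch, heads, tokens, tokens);
# take the attention paid by the CLS token to each image patch.
attn = outputs.attentions[-1]
cls_attn = attn[0, :, 0, 1:].mean(dim=0)   # average over heads
side = int(cls_attn.numel() ** 0.5)        # 14x14 patches for 224/16
saliency = cls_attn.reshape(side, side)    # coarse saliency map
```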

An Ensemble Model for Credit Default Discrimination: Incorporating BERT-based NLP and Transformer

  • Sophot Ky;Ju-Hong Lee
    • Annual Conference of KIPS / 2023.05a / pp.624-626 / 2023
  • Credit scoring is a technique used by financial institutions to assess the creditworthiness of potential borrowers by evaluating their credit history to predict the likelihood of default. This paper presents an ensemble of two Transformer-based models within a framework for discriminating the default risk of loan applications in credit scoring. The first model is FinBERT, a pretrained NLP model for analyzing the sentiment of financial text. The second is FT-Transformer, a simple adaptation of the Transformer architecture to the tabular domain. Both models are trained on the same underlying data set, the only difference being the representation of the data. This multi-modal approach lets us leverage the unique capabilities of each model and potentially uncover insights that would not be apparent from a single model alone. We compare our model with two well-known ensemble models, Random Forest and Extreme Gradient Boosting.
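
The abstract leaves open how the two branches are combined; one common choice is late fusion of the predicted default probabilities, sketched below with hypothetical outputs standing in for the trained FinBERT and FT-Transformer heads.

```python
# Late-fusion sketch: weighted average of per-applicant default
# probabilities from a text branch and a tabular branch.
import numpy as np

def ensemble_default_risk(p_text: np.ndarray, p_tabular: np.ndarray,
                          w_text: float = 0.5) -> np.ndarray:
    """Combine two probability vectors with a convex weight."""
    assert p_text.shape == p_tabular.shape
    return w_text * p_text + (1.0 - w_text) * p_tabular

# Hypothetical outputs for three loan applications.
p_finbert = np.array([0.72, 0.10, 0.55])         # NLP branch
p_ft_transformer = np.array([0.64, 0.22, 0.41])  # tabular branch
print(ensemble_default_risk(p_finbert, p_ft_transformer))
```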

MAdapter: A Refinement of Adapters by Augmenting Efficient Middle Layers (MAdapter: 효율적인 중간 층 도입을 통한 Adapter 구조 개선)

  • Jinhyeon Kim;Taeuk Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.517-521 / 2023
  • Recently, with the advent of large language models, research on parameter-efficient fine-tuning (PEFT), which trains a large model effectively by updating only a small number of parameters, has been active. Among such methods, the Adapter inserts a few additional bottleneck modules into a pretrained language model (Pretrained Language Models) and trains only those modules; since its introduction it has drawn attention across various research areas. However, some studies have reported better-than-fine-tuning performance by enlarging the bottleneck dimension, leading to the view that the approach is drifting from its original intent. In this context, we propose MAdapter, an improvement on the existing Adapter architecture. MAdapter adds middle layers to the original Adapter while actually reducing the number of trainable parameters: it trains only about 1% of the total parameters and, using roughly half as many parameters as an Adapter, achieves performance comparable to or better than existing results. We further identify the optimal MAdapter configuration through comparisons of bottleneck sizes and analyses of the number of middle layers, thereby presenting an efficient parameter-efficient fine-tuning method.

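A minimal sketch of a bottleneck adapter extended with a middle layer, in the spirit of MAdapter; the hidden, bottleneck, and middle dimensions here are illustrative assumptions, not the paper's configuration.

```python
# Bottleneck adapter with an added middle layer; the residual
# connection preserves the pretrained representation.
import torch
import torch.nn as nn

class MiddleLayerAdapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 32,
                 middle: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)  # down-projection
        self.mid = nn.Linear(bottleneck, middle)   # extra middle layer
        self.up = nn.Linear(middle, hidden)        # up-projection
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.mid(self.act(self.down(x)))))

adapter = MiddleLayerAdapter()
print(sum(p.numel() for p in adapter.parameters()))  # trainable parameters
```

Shrinking the projections while inserting a middle layer is what lets the trainable-parameter count drop below that of a plain Adapter at the same hidden size.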

Multimodal Supervised Contrastive Learning for Crop Disease Diagnosis (멀티 모달 지도 대조 학습을 이용한 농작물 병해 진단 예측 방법)

  • Hyunseok Lee;Doyeob Yeo;Gyu-Sung Ham;Kanghan Oh
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.6 / pp.285-292 / 2023
  • With the widespread adoption of smart farms and advances in IoT technology, additional data beyond crop images are easy to obtain, so deep-learning-based crop disease diagnosis using multimodal data has become important. This study proposes a crop disease diagnosis method based on multimodal supervised contrastive learning, extending multimodal self-supervised learning. The RandAugment method was used to augment crop images and time series of environmental data. The augmented data passed through an encoder and a projection head for each modality, yielding low-dimensional features. The proposed multimodal supervised contrastive loss then pulls features from the same class closer together while pushing apart those from different classes. Finally, the pretrained model was fine-tuned for crop disease diagnosis. t-SNE visualizations and comparative assessments of diagnosis performance substantiate that the proposed method outperforms multimodal self-supervised learning.
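
A compact sketch of a supervised contrastive loss of the kind the method builds on, assuming z holds the fused low-dimensional projections and labels the disease classes; the paper's exact multimodal formulation may differ in detail.

```python
# Supervised contrastive loss: same-class projections attract,
# different-class projections repel.
import torch
import torch.nn.functional as F

def sup_con_loss(z: torch.Tensor, labels: torch.Tensor,
                 tau: float = 0.1) -> torch.Tensor:
    """z: (N, d) projections; labels: (N,) integer class ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau                       # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim.masked_fill_(self_mask, float("-inf"))  # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Mean log-likelihood of each anchor's same-class positives.
    pos_log_prob = log_prob.masked_fill(~pos, 0.0).sum(dim=1)
    per_anchor = -pos_log_prob / pos.sum(dim=1).clamp(min=1)
    return per_anchor[pos.any(dim=1)].mean()

# Toy usage: 8 fused image+environment projections, 3 disease classes.
loss = sup_con_loss(torch.randn(8, 16),
                    torch.tensor([0, 0, 1, 1, 2, 2, 0, 1]))
```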

Transfer-learning-based classification of pathological brain magnetic resonance images

  • Serkan Savas;Cagri Damar
    • ETRI Journal / v.46 no.2 / pp.263-276 / 2024
  • Many different diseases occur in the brain; hereditary and progressive diseases, for instance, affect and degenerate the white matter. Although addressing, diagnosing, and treating complex brain abnormalities is challenging, various strategies have emerged from significant advances in medical research. With state-of-the-art developments in artificial intelligence, new techniques are being applied to brain magnetic resonance images, and deep learning has recently been used for their segmentation and classification. In this study, we classified normal and pathological brain images using pretrained deep models through transfer learning. The EfficientNet-B5 model reached the highest accuracy: 98.39% on real data, 91.96% on augmented data, and 100% on pathological data. To verify the reliability of the model, fivefold cross-validation and a two-tier cross-test were applied. The results suggest that the proposed method performs reasonably well on the classification of brain magnetic resonance images.
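
A minimal transfer-learning sketch in the spirit of the study, assuming torchvision's ImageNet-pretrained EfficientNet-B5 and a binary normal/pathological head; freezing the backbone is one common regime, not necessarily the authors' exact setup.

```python
# Transfer learning: reuse ImageNet features, retrain a 2-class head
# for normal vs. pathological brain MR images.
import torch.nn as nn
from torchvision import models

weights = models.EfficientNet_B5_Weights.IMAGENET1K_V1
model = models.efficientnet_b5(weights=weights)

# Freeze the convolutional backbone (full fine-tuning is the
# other common option).
for p in model.features.parameters():
    p.requires_grad = False

# Swap the 1000-class ImageNet head for a binary head.
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 2)
```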

Transformer-based reranking for improving Korean morphological analysis systems

  • Jihee Ryu;Soojong Lim;Oh-Woog Kwon;Seung-Hoon Na
    • ETRI Journal / v.46 no.1 / pp.137-153 / 2024
  • This study introduces a new approach to Korean morphological analysis that combines dictionary-based techniques with Transformer-based deep learning models. The key innovation is a BERT-based reranking system that significantly enhances the accuracy of traditional morphological analysis. The method generates multiple candidate analysis paths, then employs BERT models for reranking, leveraging their advanced language comprehension. Results show remarkable performance improvements: the first-stage reranking achieves over 20% improvement in error reduction rate compared with existing models, and the second stage, using another BERT variant, further increases this improvement to over 30%. This indicates a significant leap in accuracy, validating the effectiveness of merging dictionary-based analysis with contemporary deep learning. The study suggests future exploration of more refined integrations of dictionary and deep learning methods, as well as probabilistic models, for enhanced morphological analysis. This hybrid approach sets a new benchmark in the field and offers insights for similar challenges in language processing applications.
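
A minimal sketch of BERT-based reranking of candidate analyses; the checkpoint klue/bert-base and the sentence-candidate pairing are assumptions, and the scorer only becomes meaningful after fine-tuning on ranked paths.

```python
# Rerank candidate morphological analyses with a BERT scorer that
# reads (sentence, candidate) pairs and emits a plausibility score.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("klue/bert-base")
scorer = AutoModelForSequenceClassification.from_pretrained(
    "klue/bert-base", num_labels=1)  # single regression-style score

def rerank(sentence: str, candidates: list[str]) -> str:
    # Score every (sentence, candidate) pair in one batch.
    inputs = tokenizer([sentence] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = scorer(**inputs).logits.squeeze(-1)
    return candidates[scores.argmax().item()]
```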

KMSAV: Korean multi-speaker spontaneous audiovisual dataset

  • Kiyoung Park;Changhan Oh;Sunghee Dong
    • ETRI Journal / v.46 no.1 / pp.71-81 / 2024
  • Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2,000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus through evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This gap highlights the need for improvement in AVSR techniques. We expect our corpus to be an instrumental resource for improving AVSR.
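
A minimal sketch of the character error rate (CER) metric used in the evaluation, here computed with the jiwer library; the transcripts are illustrative only.

```python
# Character error rate: character-level edit distance between
# reference and hypothesis, normalized by reference length.
import jiwer

reference = "안녕하세요 반갑습니다"    # ground-truth transcript
hypothesis = "안녕하세요 반갑씁니다"   # ASR output
print(f"CER: {jiwer.cer(reference, hypothesis):.3f}")
```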

Large Multimodal Model for Context-aware Construction Safety Monitoring

  • Taegeon Kim;Seokhwan Kim;Minkyu Koo;Minwoo Jeong;Hongjo Kim
    • International conference on construction engineering and project management / 2024.07a / pp.415-422 / 2024
  • Recent advances in construction automation have led to increased use of deep learning-based computer vision technology for construction monitoring. However, monitoring systems based on supervised learning struggle with recognizing complex risk factors in construction environments, highlighting the need for adaptable solutions. Large multimodal models, pretrained on extensive image-text datasets, present a promising solution with their capability to recognize diverse objects and extract semantic information. This paper proposes a methodology that generates training data for multimodal models, including safety-centric descriptions using GPT-4V, and fine-tunes the LLaVA model using the LoRA method. Experimental results from seven construction site hazard scenarios show that the fine-tuned model accurately assesses safety status in images. These findings underscore the proposed approach's effectiveness in enhancing construction site safety monitoring and illustrate the potential of large multimodal models to tackle domain-specific challenges.
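
A minimal sketch of attaching LoRA adapters with the peft library, as in the fine-tuning step described above; GPT-2 stands in for the actual LLaVA checkpoint, and the rank and target modules are assumptions.

```python
# Attach low-rank adapters so that only a small fraction of weights
# is updated during fine-tuning.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
lora_cfg = LoraConfig(
    r=16,                        # adapter rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],   # GPT-2 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only LoRA weights are trainable
```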

AI Fire Detection & Notification System

  • Na, You-min;Hyun, Dong-hwan;Park, Do-hyun;Hwang, Se-hyun;Lee, Soo-hong
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.63-71 / 2020
  • In this paper, we propose a fire detection technology using YOLOv3 and EfficientDet, two of the most reliable recent AI detection algorithms; an alert service that simultaneously transmits four kinds of notifications (text, web, app, and e-mail); and an AWS system that links fire detection to the notification service. Our pipeline uses two highly accurate fire detection models: a YOLOv3-based detector that runs locally and was trained on more than 2,000 fire images with data augmentation, and an EfficientDet detector that runs in the cloud and was transfer-learned from a pretrained model. The four notification channels were built on AWS and FCM services; web, app, and e-mail notifications arrived immediately after transmission, and text messages routed through the base station were delivered within one second. We verified the accuracy of the fire detection models in experiments on fire videos and measured detection and notification latency. The proposed AI fire detection and notification system is expected to be more accurate and faster than previous fire detection systems, which will greatly help secure golden time in fire accidents.
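
A minimal sketch of fanning an alert out over several channels in parallel, as the notification service above does; the SMTP server, addresses, and webhook endpoint are hypothetical, and the paper's actual AWS and FCM integrations are not reproduced here.

```python
# Fan an alert out over several channels in parallel so that no
# single slow channel delays the others.
import smtplib
from concurrent.futures import ThreadPoolExecutor
from email.message import EmailMessage

import requests

def send_email(alert: str) -> None:
    msg = EmailMessage()
    msg["Subject"] = "FIRE ALERT"
    msg["From"] = "bot@example.com"     # hypothetical addresses
    msg["To"] = "ops@example.com"
    msg.set_content(alert)
    with smtplib.SMTP("smtp.example.com") as smtp:  # hypothetical server
        smtp.send_message(msg)

def send_webhook(alert: str) -> None:
    # Stands in for the web/app push and FCM channels.
    requests.post("https://hooks.example.com/fire",
                  json={"text": alert}, timeout=5)

def notify_all(alert: str) -> None:
    with ThreadPoolExecutor() as pool:
        for channel in (send_email, send_webhook):
            pool.submit(channel, alert)
```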

Building robust Korean speech recognition model by fine-tuning large pretrained model (대형 사전훈련 모델의 파인튜닝을 통한 강건한 한국어 음성인식 모델 구축)

  • Changhan Oh;Cheongbin Kim;Kiyoung Park
    • Phonetics and Speech Sciences / v.15 no.3 / pp.75-82 / 2023
  • Automatic speech recognition (ASR) has been revolutionized by deep-learning-based approaches, among which self-supervised learning methods have proven particularly effective. In this study, we aim to enhance the performance of OpenAI's Whisper model, a multilingual ASR system, on Korean. Whisper was pretrained on a large corpus (around 680,000 hours) of web speech data and has demonstrated strong recognition performance for major languages. However, it struggles with languages such as Korean that were not major languages in its training data. We address this issue by fine-tuning the Whisper model on an additional dataset of about 1,000 hours of Korean speech, and we compare its performance against a Transformer model trained from scratch on the same dataset. Our results indicate that fine-tuning significantly improved Whisper's Korean speech recognition in terms of character error rate (CER), with performance improving as model size increased. However, the Whisper model's performance on English deteriorated after fine-tuning, underscoring the need for further research on robust multilingual models. Our study demonstrates the potential of a fine-tuned Whisper model for Korean ASR applications. Future work will focus on multilingual recognition and optimization for real-time inference.
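
A minimal sketch of running the pretrained Whisper checkpoint on Korean audio with Hugging Face transformers; fine-tuning on the 1,000 hours of Korean speech would start from the same weights. The checkpoint size and audio file are placeholders.

```python
# Transcribe Korean speech with the pretrained multilingual Whisper
# checkpoint; fine-tuning would start from these same weights.
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

speech, sr = librosa.load("korean_sample.wav", sr=16000)  # placeholder file
inputs = processor(speech, sampling_rate=sr, return_tensors="pt")
prompt_ids = processor.get_decoder_prompt_ids(language="korean",
                                              task="transcribe")
with torch.no_grad():
    generated = model.generate(inputs.input_features,
                               forced_decoder_ids=prompt_ids)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```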