• Title/Summary/Keyword: image dataset (이미지 데이터 셋)

Raindrop Removal and Background Information Recovery in Coastal Wave Video Imagery using Generative Adversarial Networks (적대적생성신경망을 이용한 연안 파랑 비디오 영상에서의 빗방울 제거 및 배경 정보 복원)

  • Huh, Dong;Kim, Jaeil;Kim, Jinah
    • Journal of the Korea Computer Graphics Society / v.25 no.5 / pp.1-9 / 2019
  • In this paper, we propose a video enhancement method that uses generative adversarial networks to remove raindrops from coastal wave video imagery distorted during rainfall and to restore the background information in the removed regions. Two experimental models are implemented: the Pix2Pix network, which is widely used for image-to-image translation, and Attentive GAN, which currently performs well for raindrop removal on single images. The models are trained on a public dataset of paired natural images with and without raindrops, and the trained models are evaluated on how well they remove raindrops and recover background information in rain-distorted coastal wave video imagery. To improve performance, we acquired a paired video dataset with and without raindrops at a real coast and applied transfer learning to the pre-trained models using this new dataset. The fine-tuned models outperform the pre-trained models. Performance is evaluated using the peak signal-to-noise ratio and the structural similarity index, and the Pix2Pix network fine-tuned by transfer learning shows the best performance in reconstructing coastal wave video imagery distorted by raindrops.
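
The peak signal-to-noise ratio named in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made only for this example.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are nested lists of pixel intensities (an assumption of this sketch).
    max_value is the largest possible intensity, 255 for 8-bit images.
    """
    if len(reference) != len(restored) or any(
        len(a) != len(b) for a, b in zip(reference, restored)
    ):
        raise ValueError("images must have the same shape")

    pixel_count = sum(len(row) for row in reference)
    # Mean squared error over all pixels.
    mse = sum(
        (a - b) ** 2
        for row_a, row_b in zip(reference, restored)
        for a, b in zip(row_a, row_b)
    ) / pixel_count

    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [[0, 255], [128, 64]]
restored = [[1, 254], [129, 63]]  # every pixel is off by one intensity level
print(round(psnr(clean, restored), 2))
```

A higher value means the restored frame is closer to the raindrop-free reference; the structural similarity index is a separate measure and is not covered by this sketch.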

Object Detection Performance Analysis between On-GPU and On-Board Analysis for Military Domain Images

  • Du-Hwan Hur;Dae-Hyeon Park;Deok-Woong Kim;Jae-Yong Baek;Jun-Hyeong Bak;Seung-Hwan Bae
    • Journal of the Korea Society of Computer and Information / v.29 no.8 / pp.157-164 / 2024
  • In this paper, we discuss the feasibility of deploying a deep learning-based detector on a resource-limited board. Although many studies evaluate detectors on machines with high-performance GPUs, evaluation on boards with limited computational resources is still insufficient. Therefore, in this work, we implement deep learning detectors and deploy them on a compact board by parsing and optimizing each detector. To assess the performance of deep learning-based detectors under limited resources, we monitor the performance of several detectors with different hardware resources. On the COCO detection datasets, we compare and analyze the evaluation results of the on-board and on-GPU detection models in terms of several metrics: mAP, power consumption, and execution speed (FPS). To demonstrate the effect of applying our detectors in the military domain, we evaluate them on our own dataset of thermal images that reflects flight battle scenarios. As a result, we examine the strengths of deep learning-based on-board detectors and show that deep learning-based vision models can contribute in flight battle scenarios.

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea / v.39 no.3 / pp.143-149 / 2020
  • This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNNs). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies. We then scale the data to three distributions. Using the data, we test four CNNs, VGG16, and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved when the unscaled log mel spectrogram is used as the audio feature. Although this result may not hold for all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.

Optimal Algorithm and Number of Neurons in Deep Learning (딥러닝 학습에서 최적의 알고리즘과 뉴론수 탐색)

  • Jang, Ha-Young;You, Eun-Kyung;Kim, Hyeock-Jin
    • Journal of Digital Convergence / v.20 no.4 / pp.389-396 / 2022
  • Deep learning is based on the perceptron and is currently used in various fields such as image recognition, voice recognition, object detection, and drug development. Accordingly, a variety of learning algorithms have been proposed, and the number of neurons constituting a neural network varies greatly among researchers. This study analyzed the learning characteristics of the currently used SGD, momentum, AdaGrad, RMSProp, and Adam methods according to the number of neurons. To this end, a neural network was constructed with one input layer, three hidden layers, and one output layer. ReLU was used as the activation function, cross entropy error (CEE) as the loss function, and MNIST as the experimental dataset. It was concluded that 100-300 neurons, the Adam algorithm, and 200 training iterations would be the most efficient for deep learning training. This study offers implications for algorithms developed in the future and a reference value for the number of neurons when new training data are given.
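
The five update rules compared in the abstract above can be sketched on a single scalar parameter. This is a minimal illustration, not the study's MNIST experiment: the toy objective f(w) = (w - 3)^2, the function names, the hyperparameter defaults, and the learning rate of 0.1 are all assumptions chosen for the example.

```python
import math


def sgd(w, g, state, lr):
    # Plain gradient descent step.
    return w - lr * g


def momentum(w, g, state, lr, beta=0.9):
    # The velocity accumulates an exponentially decaying sum of past steps.
    state["v"] = beta * state.get("v", 0.0) - lr * g
    return w + state["v"]


def adagrad(w, g, state, lr, eps=1e-8):
    # The step is scaled by the root of the running sum of squared gradients.
    state["h"] = state.get("h", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["h"]) + eps)


def rmsprop(w, g, state, lr, rho=0.9, eps=1e-8):
    # Like AdaGrad, but with an exponential moving average of squared gradients.
    state["h"] = rho * state.get("h", 0.0) + (1.0 - rho) * g * g
    return w - lr * g / (math.sqrt(state["h"]) + eps)


def adam(w, g, state, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Moving averages of the gradient and its square, with bias correction.
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1.0 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1.0 - b2) * g * g
    m_hat = state["m"] / (1.0 - b1 ** state["t"])
    v_hat = state["v"] / (1.0 - b2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimize(step, w=0.0, lr=0.1, steps=200):
    """Run one optimizer on the toy objective f(w) = (w - 3)^2 and return the final w."""
    state = {}
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # derivative of (w - 3)^2
        w = step(w, grad, state, lr)
    return w


for rule in (sgd, momentum, adagrad, rmsprop, adam):
    print(rule.__name__, minimize(rule))
```

On a one-dimensional quadratic the differences between the rules are small; the sketch only makes the update equations concrete and says nothing about which rule or neuron count is best on MNIST.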

Skin Disease Classification Technique Based on Convolutional Neural Network Using Deep Metric Learning (Deep Metric Learning을 활용한 합성곱 신경망 기반의 피부질환 분류 기술)

  • Kim, Kang Min;Kim, Pan-Koo;Chun, Chanjun
    • Smart Media Journal / v.10 no.4 / pp.45-54 / 2021
  • The skin is the body's first line of defense against external infection. When a skin disease strikes, the skin's protective role is compromised, necessitating quick diagnosis and treatment. Recently, as artificial intelligence has advanced, it has been applied in a variety of fields, including dermatology, to reduce the rate of misdiagnosis and enable quick treatment. Whereas previous studies have diagnosed skin diseases with low incidence, this paper proposes a method to classify common illnesses such as warts and corns using a convolutional neural network. The dataset used consists of 3 classes and 2,515 images, but it suffers from a lack of training data and class imbalance. We trained the model with a deep metric loss function and with a cross-entropy loss function and analyzed the performance of each. In terms of accuracy, recall, and F1 score, the former performed better.

Utilizing Mean Teacher Semi-Supervised Learning for Robust Pothole Image Classification

  • Inki Kim;Beomjun Kim;Jeonghwan Gwak
    • Journal of the Korea Society of Computer and Information / v.28 no.5 / pp.17-28 / 2023
  • Potholes that occur on paved roads can have severe consequences for vehicles traveling at high speeds and may even lead to fatalities. While manual detection of potholes by human workers is commonly used to prevent pothole-related accidents, it is costly and time-consuming because workers are exposed on the road and potholes in certain categories are difficult to predict. Therefore, completely preventing potholes is nearly impossible, and even preventing their formation is limited by ground conditions closely related to road environments. Additionally, labeling work guided by experts is required for dataset construction. Thus, in this paper, we used the Mean Teacher technique, one of the semi-supervised learning-based knowledge distillation methods, to achieve robust performance in pothole image classification even with limited labeled data. We demonstrated this using performance metrics and GradCAM, showing that with semi-supervised learning, 15 pre-trained CNN models achieved an average accuracy of 90.41%, with a minimum of 2% and a maximum of 9% performance difference compared to supervised learning.
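
The core of the Mean Teacher technique mentioned above is that the teacher's weights are an exponential moving average (EMA) of the student's weights, and that the student is penalized when its predictions on an input disagree with the teacher's. The sketch below illustrates both pieces on plain Python lists. The names `ema_update` and `consistency_loss`, the decay `alpha=0.99`, and the use of mean squared error are assumptions of this example, not details taken from the paper.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher class probabilities."""
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)


# Hypothetical two-weight model: the teacher drifts slowly toward the student.
teacher_weights = [0.0, 0.0]
student_weights = [1.0, 2.0]
for _ in range(3):
    teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.9)
print(teacher_weights)

# The consistency term needs no label, so it can be computed on unlabeled images.
print(consistency_loss([0.2, 0.8], [0.4, 0.6]))
```

In training, only the student receives gradient updates; the teacher is refreshed with `ema_update` after each step, and the consistency term is added to the supervised loss computed on the labeled subset.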

Copyright Protection for Fire Video Images using an Effective Watermarking Method (효과적인 워터마킹 기법을 사용한 화재 비디오 영상의 저작권 보호)

  • Nguyen, Truc;Kim, Jong-Myon
    • KIPS Transactions on Software and Data Engineering / v.2 no.8 / pp.579-588 / 2013
  • This paper proposes an effective watermarking approach for copyright protection of fire video images. The proposed approach efficiently exploits the inherent color and texture characteristics of fire data by using a gray level co-occurrence matrix (GLCM) and fuzzy c-means (FCM) clustering. GLCM is used to generate a texture feature dataset by computing the energy and homogeneity properties of each candidate fire image block. FCM is used to segment the colors of the fire image and to select fire texture blocks for embedding watermarks. Each selected block is then decomposed into a one-level wavelet structure with four subbands [LL, LH, HL, HH] using a discrete wavelet transform (DWT), and LH subband coefficients with a gain factor are selected for embedding the watermark so that the visual quality of the image is not affected. Experimental results show that the proposed approach achieves a high peak signal-to-noise ratio (PSNR) of about 48 dB and low M-singular value decomposition (M-SVD) values of 1.6 to 2.0. In addition, the proposed approach outperforms conventional image watermarking approaches in terms of normalized correlation (NC) values against several image processing attacks, including noise addition, filtering, cropping, and JPEG compression.
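
The GLCM texture features named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the co-occurrence matrix counts only each pixel and its right-hand neighbor, it is not symmetrized, energy is taken as the sum of squared matrix entries (some libraries report the square root of that sum), and homogeneity weights each entry by 1 / (1 + (i - j)^2). The function names and the `levels` parameter are hypothetical.

```python
def glcm(block, levels):
    """Normalized co-occurrence matrix for horizontally adjacent pixel pairs."""
    counts = [[0.0] * levels for _ in range(levels)]
    total = 0
    for row in block:
        for left, right in zip(row, row[1:]):
            counts[left][right] += 1.0
            total += 1
    if total == 0:
        raise ValueError("block needs at least two columns")
    return [[c / total for c in row] for row in counts]


def energy(p):
    """Sum of squared co-occurrence probabilities (angular second moment)."""
    return sum(v * v for row in p for v in row)


def homogeneity(p):
    """Probabilities weighted by closeness of the two gray levels."""
    return sum(
        v / (1.0 + (i - j) ** 2)
        for i, row in enumerate(p)
        for j, v in enumerate(row)
    )


# Hypothetical 2x2 blocks quantized to two gray levels.
flat_block = [[1, 1], [1, 1]]     # uniform texture
striped_block = [[0, 1], [0, 1]]  # every pair jumps from level 0 to level 1
for block in (flat_block, striped_block):
    p = glcm(block, levels=2)
    print(energy(p), homogeneity(p))
```

Both features are high for smooth blocks; homogeneity drops when neighboring pixels differ strongly, which is the kind of cue a block-selection step could use to pick textured regions.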

Comparative Study of Fish Detection and Classification Performance Using the YOLOv8-Seg Model (YOLOv8-Seg 모델을 이용한 어류 탐지 및 분류 성능 비교연구)

  • Sang-Yeup Jin;Heung-Bae Choi;Myeong-Soo Han;Hyo-tae Lee;Young-Tae Son
    • Journal of the Korean Society of Marine Environment & Safety / v.30 no.2 / pp.147-156 / 2024
  • The sustainable management and enhancement of marine resources are becoming increasingly important issues worldwide. This study was conducted in response to these challenges, focusing on the development and performance comparison of fish detection and classification models as part of a deep learning-based technique for assessing the effectiveness of marine resource enhancement projects initiated by the Korea Fisheries Resources Agency. The aim was to select the optimal model by training various sizes of YOLOv8-Seg models on a fish image dataset and comparing each performance metric. The dataset used for model construction consisted of 36,749 images and label files of 12 different species of fish, with data diversity enhanced through the application of augmentation techniques during training. When training and validating five different YOLOv8-Seg models under identical conditions, the medium-sized YOLOv8m-Seg model showed high learning efficiency and excellent detection and classification performance, with the shortest training time of 13 h and 12 min, an of 0.933, and an inference speed of 9.6 ms. Considering the balance between each performance metric, this was deemed the most efficient model for meeting real-time processing requirements. The use of such real-time fish detection and classification models could enable effective surveys of marine resource enhancement projects, suggesting the need for ongoing performance improvements and further research.

Design and Implementation of an HTML Converter Supporting Frame for the Wireless Internet (무선 인터넷을 위한 프레임 지원 HTML 변환기의 설계 및 구현)

  • Han, Jin-Seop;Park, Byung-Joon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.6 / pp.1-10 / 2005
  • This paper describes the implementation of an HTML converter for wireless Internet access in a Wireless Application Protocol (WAP) environment. The implemented HTML converter consists of the contents conversion module, the conversion rule set, the WML file generation module, and the frame contents reformatting module. Plain text contents are converted to WML contents through one-to-one mapping, referring to the conversion rule set in the contents conversion module. For frame contents, the first frameset source is parsed and the request messages are reconstructed with all the file names; the converter then reconnects to the web server once per file to receive each document and appends it to the first document. Finally, after reformatting in the frame contents reformatting module, the frame contents are converted to WML table contents. For image map contents, the image map related tags are parsed, and the names of the HTML documents that are linked to any sites are extracted to be replaced with WML contents data and linked to those contents. The proposed conversion method for frame contents provides a better interface for user convenience and interaction than existing converters. Conversion of image maps is one of the features of our converter that other converters do not currently support.

Quantitative Evaluations of Deep Learning Models for Rapid Building Damage Detection in Disaster Areas (재난지역에서의 신속한 건물 피해 정도 감지를 위한 딥러닝 모델의 정량 평가)

  • Ser, Junho;Yang, Byungyun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.5 / pp.381-391 / 2022
  • This paper aims to find, among the prevailing deep learning models, a type of artificial intelligence (AI), one that helps rapidly detect damaged buildings where disasters occur. The models selected are SSD-512, RetinaNet, and YOLOv3, which have been widely used in object detection in recent years. These models are based on one-stage detector networks that are suitable for rapid object detection. They are often used for object detection because of their structural advantages and high speed, but not for damaged building detection in disaster management. In this study, we first trained each of the algorithms on the xBD dataset, which provides post-disaster imagery with damage classification labels. Next, the three models were quantitatively evaluated with the mAP (mean Average Precision) and the FPS (Frames Per Second). YOLOv3 recorded an mAP of 34.39% and an FPS of 46. RetinaNet recorded an mAP of 36.06%, which is 1.67 percentage points higher than YOLOv3, but its FPS is one-third of YOLOv3's. SSD-512 scored significantly lower than YOLOv3 on both quantitative indicators. In a disaster situation, a rapid and precise investigation of damaged buildings is essential for effective disaster response. Accordingly, the results of this study are expected to be useful for rapid response in disaster management.