• Title/Summary/Keyword: deep similarity

Search Result 227, Processing Time 0.026 seconds

Object Tracking with Histogram weighted Centroid augmented Siamese Region Proposal Network

  • Budiman, Sutanto Edward;Lee, Sukho
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.2
    • /
    • pp.156-165
    • /
    • 2021
  • In this paper, we propose an histogram weighted centroid based Siamese region proposal network for object tracking. The original Siamese region proposal network uses two identical artificial neural networks which take two different images as the inputs and decide whether the same object exist in both input images based on a similarity measure. However, as the Siamese network is pre-trained offline, it experiences many difficulties in the adaptation to various online environments. Therefore, in this paper we propose to incorporate the histogram weighted centroid feature into the Siamese network method to enhance the accuracy of the object tracking. The proposed method uses both the histogram information and the weighted centroid location of the top 10 color regions to decide which of the proposed region should become the next predicted object region.

Similarity Determination of Conversational Utterances Using Field Dataset and Deep Learning Technology (현장 데이터셋과 딥러닝 기술을 이용한 대화 utterance 유사성 판별)

  • Kim, Juhee;Lee, Eunseo;Nam, Jeehee;Koh, Nakyeong;Bae, Sanghwan;Shim, Junho
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.568-570
    • /
    • 2022
  • 객체 유사도를 판별하는 기술은 정보 처리의 여러 분야에서 응용되고 있다. 본 연구에서는 현장 자연어 텍스트 데이터셋과 딥러닝 모델을 이용하여 챗봇 등에서 응용되는 데이터 유사성을 판별하고, 해당 모델의 성능을 측정해보았다.

A Study on the Sea Water DTEC Power Generation System of the FPSO (FPSO의 온배수를 활용한 해수 DTEC 발전시스템에 대한 연구)

  • Song, Young-Uk
    • Journal of Navigation and Port Research
    • /
    • v.42 no.1
    • /
    • pp.9-16
    • /
    • 2018
  • The development of limited petroleum resources for use with mankind inevitably explores and seeks to develop oil fields in the deep sea area, under the rise of the oil prices market situation. The use of Oceanic Thermal Energy Conversion (OTEC) technology, which operates the power generation facility using the temperature differences between the deep water and the surface water, is progressing actively as a trend to follow. In this study, the application of the Discharged Thermal Energy Conversion (DTEC) was designed and analyzed under the condition that the supply condition of seawater used in the FPSO installed in the deep sea area is changed up to 400m depth. In this case, it was confirmed that the design of the system that can generate more electric power according to the depth of water is confirmed, by thus applying the DTEC system by taking the cooling water at a deeper water depth than the existing design water depth. The FPSO considers the similarity of the OTEC power generation facilities, and will apply the DTEC system to FPSO in the deep sea area to accumulate technology and the conversion to further utilize the OTEC power generation facilities after the end of life cycle of oil production, which could be a solution to two important issues, namely, resource development and sustainable development.

Utilization of age information for speaker verification using multi-task learning deep neural networks (멀티태스크 러닝 심층신경망을 이용한 화자인증에서의 나이 정보 활용)

  • Kim, Ju-ho;Heo, Hee-Soo;Jung, Jee-weon;Shim, Hye-jin;Kim, Seung-Bin;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.5
    • /
    • pp.593-600
    • /
    • 2019
  • The similarity in tones between speakers can lower the performance of speaker verification. To improve the performance of speaker verification systems, we propose a multi-task learning technique using deep neural network to learn speaker information and age information. Multi-task learning can improve generalization performances, because it helps deep neural networks to prevent hidden layers from overfitting into one task. However, we found in experiments that learning of age information does not work well in the process of learning the deep neural network. In order to improve the learning, we propose a method to dynamically change the objective function weights of speaker identification and age estimation in the learning process. Results show the equal error rate based on RSR2015 evaluation data set, 6.91 % for the speaker verification system without using age information, 6.77 % using age information only, and 4.73 % using age information when weight change technique was applied.

Applying deep learning based super-resolution technique for high-resolution urban flood analysis (고해상도 도시 침수 해석을 위한 딥러닝 기반 초해상화 기술 적용)

  • Choi, Hyeonjin;Lee, Songhee;Woo, Hyuna;Kim, Minyoung;Noh, Seong Jin
    • Journal of Korea Water Resources Association
    • /
    • v.56 no.10
    • /
    • pp.641-653
    • /
    • 2023
  • As climate change and urbanization are causing unprecedented natural disasters in urban areas, it is crucial to have urban flood predictions with high fidelity and accuracy. However, conventional physically- and deep learning-based urban flood modeling methods have limitations that require a lot of computer resources or data for high-resolution flooding analysis. In this study, we propose and implement a method for improving the spatial resolution of urban flood analysis using a deep learning based super-resolution technique. The proposed approach converts low-resolution flood maps by physically based modeling into the high-resolution using a super-resolution deep learning model trained by high-resolution modeling data. When applied to two cases of retrospective flood analysis at part of City of Portland, Oregon, U.S., the results of the 4-m resolution physical simulation were successfully converted into 1-m resolution flood maps through super-resolution. High structural similarity between the super-solution image and the high-resolution original was found. The results show promising image quality loss within an acceptable limit of 22.80 dB (PSNR) and 0.73 (SSIM). The proposed super-resolution method can provide efficient model training with a limited number of flood scenarios, significantly reducing data acquisition efforts and computational costs.

Efficient CT Image Denoising Using Deformable Convolutional AutoEncoder Model

  • Eon Seung, Seong;Seong Hyun, Han;Ji Hye, Heo;Dong Hoon, Lim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.3
    • /
    • pp.25-33
    • /
    • 2023
  • Noise generated during the acquisition and transmission of CT images acts as a factor that degrades image quality. Therefore, noise removal to solve this problem is an important preprocessing process in image processing. In this paper, we remove noise by using a deformable convolutional autoencoder (DeCAE) model in which deformable convolution operation is applied instead of the existing convolution operation in the convolutional autoencoder (CAE) model of deep learning. Here, the deformable convolution operation can extract features of an image in a more flexible area than the conventional convolution operation. The proposed DeCAE model has the same encoder-decoder structure as the existing CAE model, but the encoder is composed of deformable convolutional layers and the decoder is composed of conventional convolutional layers for efficient noise removal. To evaluate the performance of the DeCAE model proposed in this paper, experiments were conducted on CT images corrupted by various noises, that is, Gaussian noise, impulse noise, and Poisson noise. As a result of the performance experiment, the DeCAE model has more qualitative and quantitative measures than the traditional filters, that is, the Mean filter, Median filter, Bilateral filter and NL-means method, as well as the existing CAE models, that is, MAE (Mean Absolute Error), PSNR (Peak Signal-to-Noise Ratio) and SSIM. (Structural Similarity Index Measure) showed excellent results.

Enhancing CT Image Quality Using Conditional Generative Adversarial Networks for Applying Post-mortem Computed Tomography in Forensic Pathology: A Phantom Study (사후전산화단층촬영의 법의병리학 분야 활용을 위한 조건부 적대적 생성 신경망을 이용한 CT 영상의 해상도 개선: 팬텀 연구)

  • Yebin Yoon;Jinhaeng Heo;Yeji Kim;Hyejin Jo;Yongsu Yoon
    • Journal of radiological science and technology
    • /
    • v.46 no.4
    • /
    • pp.315-323
    • /
    • 2023
  • Post-mortem computed tomography (PMCT) is commonly employed in the field of forensic pathology. PMCT was mainly performed using a whole-body scan with a wide field of view (FOV), which lead to a decrease in spatial resolution due to the increased pixel size. This study aims to evaluate the potential for developing a super-resolution model based on conditional generative adversarial networks (CGAN) to enhance the image quality of CT. 1761 low-resolution images were obtained using a whole-body scan with a wide FOV of the head phantom, and 341 high-resolution images were obtained using the appropriate FOV for the head phantom. Of the 150 paired images in the total dataset, which were divided into training set (96 paired images) and validation set (54 paired images). Data augmentation was perform to improve the effectiveness of training by implementing rotations and flips. To evaluate the performance of the proposed model, we used the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) and Deep Image Structure and Texture Similarity (DISTS). Obtained the PSNR, SSIM, and DISTS values of the entire image and the Medial orbital wall, the zygomatic arch, and the temporal bone, where fractures often occur during head trauma. The proposed method demonstrated improvements in values of PSNR by 13.14%, SSIM by 13.10% and DISTS by 45.45% when compared to low-resolution images. The image quality of the three areas where fractures commonly occur during head trauma has also improved compared to low-resolution images.

Effects of Spatio-temporal Features of Dynamic Hand Gestures on Learning Accuracy in 3D-CNN (3D-CNN에서 동적 손 제스처의 시공간적 특징이 학습 정확성에 미치는 영향)

  • Yeongjee Chung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.145-151
    • /
    • 2023
  • 3D-CNN is one of the deep learning techniques for learning time series data. Such three-dimensional learning can generate many parameters, so that high-performance machine learning is required or can have a large impact on the learning rate. When learning dynamic hand-gestures in spatiotemporal domain, it is necessary for the improvement of the efficiency of dynamic hand-gesture learning with 3D-CNN to find the optimal conditions of input video data by analyzing the learning accuracy according to the spatiotemporal change of input video data without structural change of the 3D-CNN model. First, the time ratio between dynamic hand-gesture actions is adjusted by setting the learning interval of image frames in the dynamic hand-gesture video data. Second, through 2D cross-correlation analysis between classes, similarity between image frames of input video data is measured and normalized to obtain an average value between frames and analyze learning accuracy. Based on this analysis, this work proposed two methods to effectively select input video data for 3D-CNN deep learning of dynamic hand-gestures. Experimental results showed that the learning interval of image data frames and the similarity of image frames between classes can affect the accuracy of the learning model.

An Efficient CT Image Denoising using WT-GAN Model

  • Hae Chan Jeong;Dong Hoon Lim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.5
    • /
    • pp.21-29
    • /
    • 2024
  • Reducing the radiation dose during CT scanning can lower the risk of radiation exposure, but not only does the image resolution significantly deteriorate, but the effectiveness of diagnosis is reduced due to the generation of noise. Therefore, noise removal from CT images is a very important and essential processing process in the image restoration. Until now, there are limitations in removing only the noise by separating the noise and the original signal in the image area. In this paper, we aim to effectively remove noise from CT images using the wavelet transform-based GAN model, that is, the WT-GAN model in the frequency domain. The GAN model used here generates images with noise removed through a U-Net structured generator and a PatchGAN structured discriminator. To evaluate the performance of the WT-GAN model proposed in this paper, experiments were conducted on CT images damaged by various noises, namely Gaussian noise, Poisson noise, and speckle noise. As a result of the performance experiment, the WT-GAN model is better than the traditional filter, that is, the BM3D filter, as well as the existing deep learning models, such as DnCNN, CDAE model, and U-Net GAN model, in qualitative and quantitative measures, that is, PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure) showed excellent results.

Automatic Extraction of Hangul Stroke Element Using Faster R-CNN for Font Similarity (글꼴 유사도 판단을 위한 Faster R-CNN 기반 한글 글꼴 획 요소 자동 추출)

  • Jeon, Ja-Yeon;Park, Dong-Yeon;Lim, Seo-Young;Ji, Yeong-Seo;Lim, Soon-Bum
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.8
    • /
    • pp.953-964
    • /
    • 2020
  • Ever since media contents took over the world, the importance of typography has increased, and the influence of fonts has be n recognized. Nevertheless, the current Hangul font system is very poor and is provided passively, so it is practically impossible to understand and utilize all the shape characteristics of more than six thousand Hangul fonts. In this paper, the characteristics of Hangul font shapes were selected based on the Hangul structure of similar fonts. The stroke element detection training was performed by fine tuning Faster R-CNN Inception v2, one of the deep learning object detection models. We also propose a system that automatically extracts the stroke element characteristics from characters by introducing an automatic extraction algorithm. In comparison to the previous research which showed poor accuracy while using SVM(Support Vector Machine) and Sliding Window Algorithm, the proposed system in this paper has shown the result of 10 % accuracy to properly detect and extract stroke elements from various fonts. In conclusion, if the stroke element characteristics based on the Hangul structural information extracted through the system are used for similar classification, problems such as copyright will be solved in an era when typography's competitiveness becomes stronger, and an automated process will be provided to users for more convenience.