• Title/Summary/Keyword: deep neural net

Search Result 327, Processing Time 0.025 seconds

Semantic Classification of DSM Using Convolutional Neural Network Based Deep Learning (합성곱 신경망 기반의 딥러닝에 의한 수치표면모델의 객체분류)

  • Lee, Dae Geon;Cho, Eun Ji;Lee, Dong-Cheon
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.37 no.6
    • /
    • pp.435-444
    • /
    • 2019
  • Recently, DL (Deep Learning) has been rapidly applied in various fields. In particular, classification and object recognition from images are major tasks in computer vision. Most of the DL utilizing imagery is primarily based on the CNN (Convolutional Neural Network) and improving performance of the DL model is main issue. While most CNNs are involve with images for training data, this paper aims to classify and recognize objects using DSM (Digital Surface Model), and slope and aspect information derived from the DSM instead of images. The DSM data sets used in the experiment were established by DGPF (German Society for Photogrammetry, Remote Sensing and Geoinformatics) and provided by ISPRS (International Society for Photogrammetry and Remote Sensing). The CNN-based SegNet model, that is evaluated as having excellent efficiency and performance, was used to train the data sets. In addition, this paper proposed a scheme for training data generation efficiently from the limited number of data. The results demonstrated DSM and derived data could be feasible for semantic classification with desirable accuracy using DL.

End-to-end non-autoregressive fast text-to-speech (End-to-end 비자기회귀식 가속 음성합성기)

  • Kim, Wiback;Nam, Hosung
    • Phonetics and Speech Sciences
    • /
    • v.13 no.4
    • /
    • pp.47-53
    • /
    • 2021
  • Autoregressive Text-to-Speech (TTS) models suffer from inference instability and slow inference speed. Inference instability occurs when a poorly predicted sample at time step t affects all the subsequent predictions. Slow inference speed arises from a model structure that forces the predicted samples from time steps 1 to t-1 to predict the sample at time step t. In this study, an end-to-end non-autoregressive fast text-to-speech model is suggested as a solution to these problems. The results of this study show that this model's Mean Opinion Score (MOS) is close to that of Tacotron 2 - WaveNet, while this model's inference speed and stability are higher than those of Tacotron 2 - WaveNet. Further, this study aims to offer insight into the improvement of non-autoregressive models.

Analysis of methods for the model extraction without training data (학습 데이터가 없는 모델 탈취 방법에 대한 분석)

  • Hyun Kwon;Yonggi Kim;Jun Lee
    • Convergence Security Journal
    • /
    • v.23 no.5
    • /
    • pp.57-64
    • /
    • 2023
  • In this study, we analyzed how to steal the target model without training data. Input data is generated using the generative model, and a similar model is created by defining a loss function so that the predicted values of the target model and the similar model are close to each other. At this time, the target model has a process of learning so that the similar model is similar to it by gradient descent using the logit (logic) value of each class for the input data. The tensorflow machine learning library was used as an experimental environment, and CIFAR10 and SVHN were used as datasets. A similar model was created using the ResNet model as a target model. As a result of the experiment, it was found that the model stealing method generated a similar model with an accuracy of 86.18% for CIFAR10 and 96.02% for SVHN, producing similar predicted values to the target model. In addition, considerations on the model stealing method, military use, and limitations were also analyzed.

Fashion Category Oversampling Automation System

  • Minsun Yeu;Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.1
    • /
    • pp.31-40
    • /
    • 2024
  • In the realm of domestic online fashion platform industry the manual registration of product information by individual business owners leads to inconvenience and reliability issues, especially when dealing with simultaneous registrations of numerous product groups. Moreover, bias is significantly heightened due to the low quality of product images and an imbalance in data quantity. Therefore, this study proposes a ResNet50 model aimed at minimizing data bias through oversampling techniques and conducting multiple classifications for 13 fashion categories. Transfer learning is employed to optimize resource utilization and reduce prolonged learning times. The results indicate improved discrimination of up to 33.4% for data augmentation in classes with insufficient data compared to the basic convolution neural network (CNN) model. The reliability of all outcomes is underscored by precision and affirmed by the recall curve. This study is suggested to advance the development of the domestic online fashion platform industry to a higher echelon.

Dual-stream Co-enhanced Network for Unsupervised Video Object Segmentation

  • Hongliang Zhu;Hui Yin;Yanting Liu;Ning Chen
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.4
    • /
    • pp.938-958
    • /
    • 2024
  • Unsupervised Video Object Segmentation (UVOS) is a highly challenging problem in computer vision as the annotation of the target object in the testing video is unknown at all. The main difficulty is to effectively handle the complicated and changeable motion state of the target object and the confusion of similar background objects in video sequence. In this paper, we propose a novel deep Dual-stream Co-enhanced Network (DC-Net) for UVOS via bidirectional motion cues refinement and multi-level feature aggregation, which can fully take advantage of motion cues and effectively integrate different level features to produce high-quality segmentation mask. DC-Net is a dual-stream architecture where the two streams are co-enhanced by each other. One is a motion stream with a Motion-cues Refine Module (MRM), which learns from bidirectional optical flow images and produces fine-grained and complete distinctive motion saliency map, and the other is an appearance stream with a Multi-level Feature Aggregation Module (MFAM) and a Context Attention Module (CAM) which are designed to integrate the different level features effectively. Specifically, the motion saliency map obtained by the motion stream is fused with each stage of the decoder in the appearance stream to improve the segmentation, and in turn the segmentation loss in the appearance stream feeds back into the motion stream to enhance the motion refinement. Experimental results on three datasets (Davis2016, VideoSD, SegTrack-v2) demonstrate that DC-Net has achieved comparable results with some state-of-the-art methods.

Channel Attention Module in Convolutional Neural Network and Its Application to SAR Target Recognition Under Limited Angular Diversity Condition (합성곱 신경망의 Channel Attention 모듈 및 제한적인 각도 다양성 조건에서의 SAR 표적영상 식별로의 적용)

  • Park, Ji-Hoon;Seo, Seung-Mo;Yoo, Ji Hee
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.24 no.2
    • /
    • pp.175-186
    • /
    • 2021
  • In the field of automatic target recognition(ATR) with synthetic aperture radar(SAR) imagery, it is usually impractical to obtain SAR target images covering a full range of aspect views. When the database consists of SAR target images with limited angular diversity, it can lead to performance degradation of the SAR-ATR system. To address this problem, this paper proposes a deep learning-based method where channel attention modules(CAMs) are inserted to a convolutional neural network(CNN). Motivated by the idea of the squeeze-and-excitation(SE) network, the CAM is considered to help improve recognition performance by selectively emphasizing discriminative features and suppressing ones with less information. After testing various CAM types included in the ResNet18-type base network, the SE CAM and its modified forms are applied to SAR target recognition using MSTAR dataset with different reduction ratios in order to validate recognition performance improvement under the limited angular diversity condition.

Deep Learning: High-quality Imaging through Multicore Fiber

  • Wu, Liqing;Zhao, Jun;Zhang, Minghai;Zhang, Yanzhu;Wang, Xiaoyan;Chen, Ziyang;Pu, Jixiong
    • Current Optics and Photonics
    • /
    • v.4 no.4
    • /
    • pp.286-292
    • /
    • 2020
  • Imaging through multicore fiber (MCF) is of great significance in the biomedical domain. Although several techniques have been developed to image an object from a signal passing through MCF, these methods are strongly dependent on the surroundings, such as vibration and the temperature fluctuation of the fiber's environment. In this paper, we apply a new, strong technique called deep learning to reconstruct the phase image through a MCF in which each core is multimode. To evaluate the network, we employ the binary cross-entropy as the loss function of a convolutional neural network (CNN) with improved U-net structure. The high-quality reconstruction of input objects upon spatial light modulation (SLM) can be realized from the speckle patterns of intensity that contain the information about the objects. Moreover, we study the effect of MCF length on image recovery. It is shown that the shorter the fiber, the better the imaging quality. Based on our findings, MCF may have applications in fields such as endoscopic imaging and optical communication.

Application of Deep Learning for Classification of Ancient Korean Roof-end Tile Images (딥러닝을 활용한 고대 수막새 이미지 분류 검토)

  • KIM Younghyun
    • Korean Journal of Heritage: History & Science
    • /
    • v.57 no.3
    • /
    • pp.24-35
    • /
    • 2024
  • Recently, research using deep learning technologies such as artificial intelligence, convolutional neural networks, etc. has been actively conducted in various fields including healthcare, manufacturing, autonomous driving, and security, and is having a significant influence on society. In line with this trend, the present study attempted to apply deep learning to the classification of archaeological artifacts, specifically ancient Korean roof-end tiles. Using 100 images of roof-end tiles from each of the Goguryeo, Baekje, and Silla dynasties, for a total of 300 base images, a dataset was formed and expanded to 1,200 images using data augmentation techniques. After building a model using transfer learning from the pre-trained EfficientNetB0 model and conducting five-fold cross-validation, an average training accuracy of 98.06% and validation accuracy of 97.08% were achieved. Furthermore, when model performance was evaluated with a test dataset of 240 images, it could classify the roof-end tile images from the three dynasties with a minimum accuracy of 91%. In particular, with a learning rate of 0.0001, the model exhibited the highest performance, with accuracy of 92.92%, precision of 92.96%, recall of 92.92%, and F1 score of 92.93%. This optimal result was obtained by preventing overfitting and underfitting issues using various learning rate settings and finding the optimal hyperparameters. The study's findings confirm the potential for applying deep learning technologies to the classification of Korean archaeological materials, which is significant. Additionally, it was confirmed that the existing ImageNet dataset and parameters could be positively applied to the analysis of archaeological data. This approach could lead to the creation of various models for future archaeological database accumulation, the use of artifacts in museums, and classification and organization of artifacts.

A Study on Similar Trademark Search Model Using Convolutional Neural Networks (합성곱 신경망(Convolutional Neural Network)을 활용한 지능형 유사상표 검색 모형 개발)

  • Yoon, Jae-Woong;Lee, Suk-Jun;Song, Chil-Yong;Kim, Yeon-Sik;Jung, Mi-Young;Jeong, Sang-Il
    • Management & Information Systems Review
    • /
    • v.38 no.3
    • /
    • pp.55-80
    • /
    • 2019
  • Recently, many companies improving their management performance by building a powerful brand value which is recognized for trademark rights. However, as growing up the size of online commerce market, the infringement of trademark rights is increasing. According to various studies and reports, cases of foreign and domestic companies infringing on their trademark rights are increased. As the manpower and the cost required for the protection of trademark are enormous, small and medium enterprises(SMEs) could not conduct preliminary investigations to protect their trademark rights. Besides, due to the trademark image search service does not exist, many domestic companies have a problem that investigating huge amounts of trademarks manually when conducting preliminary investigations to protect their rights of trademark. Therefore, we develop an intelligent similar trademark search model to reduce the manpower and cost for preliminary investigation. To measure the performance of the model which is developed in this study, test data selected by intellectual property experts was used, and the performance of ResNet V1 101 was the highest. The significance of this study is as follows. The experimental results empirically demonstrate that the image classification algorithm shows high performance not only object recognition but also image retrieval. Since the model that developed in this study was learned through actual trademark image data, it is expected that it can be applied in the real industrial environment.

Deep learning-based automatic segmentation of the mandibular canal on panoramic radiographs: A multi-device study

  • Moe Thu Zar Aung;Sang-Heon Lim;Jiyong Han;Su Yang;Ju-Hee Kang;Jo-Eun Kim;Kyung-Hoe Huh;Won-Jin Yi;Min-Suk Heo;Sam-Sun Lee
    • Imaging Science in Dentistry
    • /
    • v.54 no.1
    • /
    • pp.81-91
    • /
    • 2024
  • Purpose: The objective of this study was to propose a deep-learning model for the detection of the mandibular canal on dental panoramic radiographs. Materials and Methods: A total of 2,100 panoramic radiographs (PANs) were collected from 3 different machines: RAYSCAN Alpha (n=700, PAN A), OP-100 (n=700, PAN B), and CS8100 (n=700, PAN C). Initially, an oral and maxillofacial radiologist coarsely annotated the mandibular canals. For deep learning analysis, convolutional neural networks (CNNs) utilizing U-Net architecture were employed for automated canal segmentation. Seven independent networks were trained using training sets representing all possible combinations of the 3 groups. These networks were then assessed using a hold-out test dataset. Results: Among the 7 networks evaluated, the network trained with all 3 available groups achieved an average precision of 90.6%, a recall of 87.4%, and a Dice similarity coefficient (DSC) of 88.9%. The 3 networks trained using each of the 3 possible 2-group combinations also demonstrated reliable performance for mandibular canal segmentation, as follows: 1) PAN A and B exhibited a mean DSC of 87.9%, 2) PAN A and C displayed a mean DSC of 87.8%, and 3) PAN B and C demonstrated a mean DSC of 88.4%. Conclusion: This multi-device study indicated that the examined CNN-based deep learning approach can achieve excellent canal segmentation performance, with a DSC exceeding 88%. Furthermore, the study highlighted the importance of considering the characteristics of panoramic radiographs when developing a robust deep-learning network, rather than depending solely on the size of the dataset.