• Title/Summary/Keyword: convolution layer


Improved Handwritten Hangeul Recognition using Deep Learning based on GoogLenet (GoogLenet 기반의 딥 러닝을 이용한 향상된 한글 필기체 인식)

  • Kim, Hyunwoo;Chung, Yoojin
    • The Journal of the Korea Contents Association / v.18 no.7 / pp.495-502 / 2018
  • The advent of deep learning has brought rapid progress in handwritten character recognition for many languages. Handwritten Chinese recognition has improved to 97.2% accuracy, while handwritten Japanese recognition has approached 99.53% accuracy. Handwritten Hangeul contains many visually similar characters, and the available training data are small, so recognition has been difficult. A previous approach using hybrid learning employed a shallow LeNet-based model and achieved 96.34% accuracy on the handwritten Hangeul database PE92. In this paper, 98.64% accuracy was obtained by constructing a deep CNN (convolutional neural network) for handwritten Hangeul recognition. We designed a new network for handwritten Hangeul data based on GoogLenet, without using the data augmentation or multitasking techniques used in hybrid learning.

A Deep Neural Network Architecture for Real-Time Semantic Segmentation on Embedded Board (임베디드 보드에서 실시간 의미론적 분할을 위한 심층 신경망 구조)

  • Lee, Junyeop;Lee, Youngwan
    • Journal of KIISE / v.45 no.1 / pp.94-98 / 2018
  • We propose Wide Inception ResNet (WIR Net), an optimized neural network architecture for real-time semantic segmentation in autonomous driving. The architecture consists of an encoder that extracts features using residual connections and inception modules, and a decoder that increases the resolution using transposed convolution and a low-layer feature map. We further improved performance by applying the ELU activation function, and optimized the network by reducing the number of layers and increasing the number of filters. The performance evaluation used an NVIDIA GeForce GTX 1080 and a TX1 board to assess class and category IoU on the Cityscapes driving-environment dataset. The experimental results show a class IoU of 53.4 and a category IoU of 81.8, with execution speeds of 17.8 fps for 640×360 and 13.0 fps for 720×480 resolution images on the TX1 board.

Light weight architecture for acoustic scene classification (음향 장면 분류를 위한 경량화 모형 연구)

  • Lim, Soyoung;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.979-993 / 2021
  • Acoustic scene classification (ASC) categorizes an audio file according to the environment in which it was recorded, a problem long studied in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. In this study, we addressed the constraint that ASC models deployed in real-world applications must have low complexity, and compared several light-weight modeling techniques. First, a base CNN model was proposed using log mel-spectrogram, delta, and delta-delta features. Second, depthwise separable convolution and linear bottleneck inverted residual blocks were applied to the convolutional layers, and quantization was applied to the models to obtain low-complexity variants. The low-complexity models performed similarly to or slightly below the base model, but the model size was reduced significantly, from 503 KB to 42.76 KB.
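The parameter savings that motivate the depthwise separable convolution mentioned in this abstract can be illustrated with a simple weight count (a sketch only; the channel counts and kernel size below are illustrative, not the paper's actual configuration):

```python
def conv2d_params(c_in, c_out, k):
    """Weight count of a standard 2D convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution per channel, then a 1x1 pointwise convolution."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution mixing channels
    return depthwise + pointwise

# Illustrative layer: 64 -> 128 channels with a 3x3 kernel
standard = conv2d_params(64, 128, 3)                # 73,728 weights
separable = depthwise_separable_params(64, 128, 3)  # 8,768 weights
print(standard, separable, round(standard / separable, 1))  # 73728 8768 8.4
```

The roughly 8x reduction in weights at this layer size is why the technique is a standard choice for low-complexity models.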

Feature Extraction on a Periocular Region and Person Authentication Using a ResNet Model (ResNet 모델을 이용한 눈 주변 영역의 특징 추출 및 개인 인증)

  • Kim, Min-Ki
    • Journal of Korea Multimedia Society / v.22 no.12 / pp.1347-1355 / 2019
  • Deep learning based on convolutional neural networks (CNN) has been studied extensively in computer vision. However, periocular feature extraction with CNNs has received little attention, because it is practically impossible to collect a large volume of biometric data. This study uses a ResNet model pretrained on the ImageNet dataset. To overcome the shortage of training data, we focused on training a multi-layer perceptron (MLP) with a simple structure rather than training a complex CNN. The method first extracts features with the pretrained ResNet model, reduces the feature dimension by principal component analysis (PCA), and then trains an MLP classifier. Experimental results on the public periocular dataset UBIPr show that the proposed method is effective for person authentication using the periocular region. In particular, it has the advantage of being directly applicable to other biometric traits.
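The PCA dimension-reduction step in this pipeline can be sketched with plain NumPy (the random matrix stands in for pretrained ResNet features; the feature size of 512 and the reduced rank of 32 are illustrative assumptions, as the abstract does not state them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for features extracted by a pretrained ResNet:
# 100 periocular samples, 512-dimensional feature vectors.
features = rng.standard_normal((100, 512))

# PCA via SVD: center the data, then project onto the top-k right
# singular vectors (the principal components).
k = 32
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:k].T   # (100, 32) inputs for the MLP classifier

print(reduced.shape)  # (100, 32)
```

The reduced matrix would then be fed to a small MLP classifier, which is cheaper to train than fine-tuning the CNN itself.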

A comparison of methods to reduce overfitting in neural networks

  • Kim, Ho-Chan;Kang, Min-Jae
    • International journal of advanced smart convergence / v.9 no.2 / pp.173-178 / 2020
  • A common problem in neural network training is that the model fits the specifics of the training data too closely. In this paper, several methods for avoiding such overfitting were compared: regularization, dropout, different amounts of training data, and different types of neural networks. Comparative experiments evaluated the test accuracy of each method. We found that using more training data is more effective than regularization or dropout. Moreover, deep convolutional neural networks outperformed both multi-layer neural networks and simple convolutional neural networks.

Multiple Plankton Detection and Recognition in Microscopic Images with Homogeneous Clumping and Heterogeneous Interspersion

  • Soh, Youngsung;Song, Jaehyun;Hae, Yongsuk
    • Journal of the Institute of Convergence Signal Processing / v.19 no.2 / pp.35-41 / 2018
  • The distribution of plankton species in sea or fresh water is an important indicator of marine ecosystem health. Since manual analysis is infeasible, many automatic approaches have been proposed, usually using images from an in situ towed underwater imaging sensor or a specially designed, lab-mounted microscopic imaging system. These approaches normally assume that only a single plankton is present in an image, so they struggle when multiple plankton of the same species clump together (homogeneous clumping) or when multiple plankton of different species are scattered across an image (heterogeneous interspersion). In this work, we propose a deep learning based method that can detect and recognize individual plankton in images with homogeneous clumping, heterogeneous interspersion, or a combination of both.

Soil-Structure Interaction Analysis in the Time Domain Using Explicit Frequency-Dependent Two Dimensional Infinite Elements (명시적 주파수종속 2차원 무한요소를 사용한 지반-구조물 상호작용의 시간영역해석)

  • 윤정방;김두기
    • Proceedings of the Computational Structural Engineering Institute Conference / 1997.10a / pp.42-49 / 1997
  • In this paper, a method for soil-structure interaction analysis in the time domain is proposed. The far-field soil region outside the artificial boundary is modeled using explicit frequency-dependent two-dimensional infinite elements that can include multiple wave components propagating into the unbounded medium. Since the dynamic stiffness matrix of the far-field soil region obtained with the proposed infinite elements is expressed explicitly in terms of the exciting frequencies and constants in the frequency domain, the matrix can easily be transformed into the displacement unit-impulse response matrix, which corresponds to a convolution integral in the time domain. To verify the proposed method, the displacement responses due to an impulse load on the surface of a soil layer over rigid bedrock are compared with those obtained by the frequency-domain method and by models with extended finite element meshes. Good agreement was found between them.
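The convolution-integral relation this abstract relies on, response as the convolution of a unit-impulse response with the applied load, can be sketched in one dimension (the decaying-oscillatory impulse response below is illustrative only, not the paper's far-field soil model):

```python
import numpy as np

# Discrete convolution integral: u(t) = integral of h(t - tau) p(tau) d tau,
# approximated as u[n] = sum_k h[n - k] p[k] * dt.
dt = 0.01
t = np.arange(0, 1.0, dt)

h = np.exp(-5.0 * t) * np.sin(20.0 * t)  # illustrative unit-impulse response
p = np.zeros_like(t)
p[0] = 1.0 / dt                          # discrete approximation of an impulse load

u = np.convolve(h, p)[: len(t)] * dt     # displacement response in the time domain

# Sanity check: for an impulse load, the response reproduces the
# impulse response itself.
print(np.allclose(u, h))  # True
```

For a general load history p, the same `np.convolve` call yields the time-domain response, which is the step the paper performs with matrices rather than scalars.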


Performance Analysis of Multimedia CDMA Network with Concatenated Coding and RAKE Receiver

  • Roh Jae-Sung;Kim Choon-Gil;Cho Sung-Joon
    • Journal of information and communication convergence engineering / v.2 no.3 / pp.139-144 / 2004
  • In order to transmit various types of multimedia data (i.e., voice, video, and data) over a wireless channel, the coding and modulation scheme needs to be flexible and capable of providing variable quality of service, data rates, and latency. In this paper, we study a mobile multimedia CDMA network combined with a concatenated Reed-Solomon/Rate Compatible Punctured Convolutional code (RS/RCPC). This paper also proposes combining the concatenated RS/RCPC coder with a CDMA RAKE receiver for multimedia CDMA traffic sent over wireless channels. Using a frequency-selective Rayleigh fading channel model, the results show that a concatenated RS/RCPC coder at the wireless physical layer can be effective in providing a reliable wireless multimedia CDMA network. The proposed scheme combining the concatenated RS/RCPC coder and the CDMA RAKE receiver provides a significant gain in BER performance under multi-user interference and multipath frequency-selective fading channels.

Multi-focus Image Fusion using Fully Convolutional Two-stream Network for Visual Sensors

  • Xu, Kaiping;Qin, Zheng;Wang, Guolong;Zhang, Huidi;Huang, Kai;Ye, Shuxiong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.5 / pp.2253-2272 / 2018
  • We propose a deep learning method for multi-focus image fusion. Unlike most existing pixel-level fusion methods, whether in the spatial domain or in a transform domain, our method directly learns an end-to-end fully convolutional two-stream network. The framework maps a pair of differently focused images to a clean version through a chain of convolutional layers, a fusion layer, and deconvolutional layers. Our deep fusion model has the advantages of efficiency and robustness, while demonstrating state-of-the-art fusion quality. We explore different parameter settings to achieve trade-offs between performance and speed. Moreover, experimental results on our training dataset show that the network achieves good performance under both subjective visual perception and objective assessment metrics.

Compressed Representation of CNN for Image Compression in MPEG-NNR (MPEG-NNR의 영상 압축을 위한 CNN 의 압축 표현 기법)

  • Moon, HyeonCheol;Kim, Jae-Gon
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2019.06a / pp.84-85 / 2019
  • MPEG-NNR (Compression of Neural Network for Multimedia Content Description and Analysis) aims to define a compressed and interoperable representation of trained neural networks. In this paper, we present a low-rank approximation for compressing a CNN used for image compression, one of the MPEG-NNR use cases. In the presented method, the low-rank approximation decomposes the 2D kernel matrix of weights in each convolution layer into two 1D kernel matrices, reducing the amount of weight data. The evaluation results show that the model size of the original CNN is reduced by half and the inference runtime is reduced by up to about 30%, with negligible loss in PSNR.
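Decomposing one 2D kernel into two 1D kernels, as described above, amounts to a rank-1 approximation of the kernel matrix; a minimal NumPy sketch (the 3x3 kernel values are illustrative, not taken from the paper):

```python
import numpy as np

# A 3x3 convolution kernel (illustrative values; this one happens to be
# exactly rank 1, so the approximation is lossless here).
K = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 2.0],
              [1.0, 2.0, 1.0]])

# Rank-1 approximation via SVD: K ~ sigma * u v^T, i.e. a 3x1 vertical
# 1D kernel followed by a 1x3 horizontal 1D kernel.
u_mat, s, vt = np.linalg.svd(K)
vertical = u_mat[:, 0] * s[0]   # 1D column kernel
horizontal = vt[0]              # 1D row kernel
K_approx = np.outer(vertical, horizontal)

# Storage drops from 9 weights to 6 (two length-3 1D kernels).
print(np.allclose(K, K_approx))  # True
```

For kernels that are not exactly rank 1, the same construction gives the closest rank-1 factorization in the least-squares sense, which is the source of the "negligible loss" trade-off the abstract reports.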
