• Title/Summary/Keyword: Residual neural networks

Light Field Angular Super-Resolution Algorithm Using Dilated Convolutional Neural Network with Residual Network (잔차 신경망과 팽창 합성곱 신경망을 이용한 라이트 필드 각 초해상도 기법)

  • Kim, Dong-Myung;Suh, Jae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.12 / pp.1604-1611 / 2020
  • Light field images captured by a microlens array-based camera have many limitations in practical use because of their low spatial and angular resolution. High-spatial-resolution images can be readily obtained with single-image super-resolution techniques, which have been studied extensively in recent years. High-angular-resolution images, however, are distorted in the process of using the disparity information inherent among images, so it is difficult to obtain a high-quality result. In this paper, we propose a light field angular super-resolution method that extracts an initial feature map using a dilated convolutional neural network, in order to effectively extract the view difference information inherent among images, and generates the target image using a residual neural network. The proposed network showed superior PSNR and subjective image quality compared to existing angular super-resolution networks.
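
As a rough illustration of why a dilated convolution helps capture differences between distant samples, the sketch below implements a 1-D dilated convolution in plain Python. It is not the network from the paper; the function names, the kernel, and the dilation values are assumptions chosen for illustration only.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so the same number
    of weights covers a wider stretch of the input.
    """
    span = (len(kernel) - 1) * dilation  # input distance covered by the taps
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # hypothetical difference filter

dense = dilated_conv1d(signal, kernel, dilation=1)    # compares samples 2 apart
dilated = dilated_conv1d(signal, kernel, dilation=2)  # compares samples 4 apart
stacked_field = receptive_field(3, [1, 2, 4])         # three stacked layers
```

With the same three weights, the dilated call compares samples twice as far apart as the dense call, which is the property a dilated layer exploits when views are separated by larger disparities.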

Latent Shifting and Compensation for Learned Video Compression (신경망 기반 비디오 압축을 위한 레이턴트 정보의 방향 이동 및 보상)

  • Kim, Yeongwoong;Kim, Donghyun;Jeong, Se Yoon;Choi, Jin Soo;Kim, Hui Yong
    • Journal of Broadcast Engineering / v.27 no.1 / pp.31-43 / 2022
  • Traditional video compression has developed on the basis of hybrid compression methods that combine motion prediction, residual coding, and quantization. With the rapid development of artificial neural networks in recent years, research on neural network-based image and video compression is also progressing rapidly and is becoming competitive with traditional video compression codecs. In this paper, a new method capable of improving the performance of such an artificial neural network-based video compression model is presented. We adopt the rate-distortion optimization method using the auto-encoder and entropy model of existing learned video compression models, shift some components of the latent information that are difficult for the entropy model to estimate when the compressed latent representation is transmitted from the encoder side to the decoder side, and finally compensate for the distortion caused by the lost information. In this way, the existing neural network-based video compression framework MFVC (Motion Free Video Compression) is improved: the BDBR (Bjøntegaard Delta-Rate) calculated against H.264 is -27%, nearly twice the bit saving of MFVC (-14%). The proposed method has the advantage of being widely applicable to neural network-based image or video compression technologies, not only to MFVC but also to other models that use latent information and an entropy model.

A Hybrid Optimized Deep Learning Techniques for Analyzing Mammograms

  • Bandaru, Satish Babu;Deivarajan, Natarajasivan;Gatram, Rama Mohan Babu
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.73-82 / 2022
  • Early detection remains the mainstay of breast cancer control and of improved treatment. Even so, the absence of symptoms at onset makes early detection quite challenging. Researchers therefore continue to study cancer with the aim of improving diagnosis, prevention, and treatment. The chief goal of this research is to develop a deep learning system that classifies breast cancer as non-malignant or malignant from mammogram images. Two distinct approaches are used: the first uses patches of the Region of Interest (ROI), and the second uses the whole image. The proposed system consists of two stages: a pre-processing stage and a Convolutional Neural Network (CNN) building stage. Meta-heuristic optimization algorithms have recently made considerable progress on such problems. The Teaching-Learning Based Optimization (TLBO) meta-heuristic was originally employed for continuous optimization problems. This work proposes novel methods for training the Residual Network (ResNet) and the CNN based on TLBO and the Genetic Algorithm (GA). Breast cancer classification can be enhanced by directly applying the hybrid TLBO-GA. In this hybrid algorithm, TLBO, the core component, incorporates three GA operators: coding, crossover, and mutation. In TLBO, candidate solutions are represented as students; the hybrid TLBO-GA further divides the students into top students, ordinary students, and poor students. The experiments demonstrated that the proposed hybrid TLBO-GA is more effective than TLBO or GA alone.

Scalable Video Coding using Super-Resolution based on Convolutional Neural Networks for Video Transmission over Very Narrow-Bandwidth Networks (초협대역 비디오 전송을 위한 심층 신경망 기반 초해상화를 이용한 스케일러블 비디오 코딩)

  • Kim, Dae-Eun;Ki, Sehwan;Kim, Munchurl;Jun, Ki Nam;Baek, Seung Ho;Kim, Dong Hyun;Choi, Jeung Won
    • Journal of Broadcast Engineering / v.24 no.1 / pp.132-141 / 2019
  • The need to transmit video over narrow-bandwidth networks persists even though video service over broadband is common. In this paper, we propose a scalable video coding framework for low-resolution video transmission over a very narrow-bandwidth network. Decoded base-layer frames are upscaled with a convolutional neural network based super-resolution technique and used as a prediction for the enhancement layer, which improves coding efficiency. In contrast to the conventional scalable high efficiency video coding (SHVC) standard, in which upscaling is performed with a fixed filter, the proposed framework replaces the fixed up-scaling filter with a trained convolutional neural network for super-resolution. For this, we propose a neural network structure with skip connections and residual learning, and train it according to the application scenario of the video coding framework. For the application scenario in which a 352×288 video at 8 fps is encoded at 110 kbps, the quality of the proposed scalable video coding framework is higher than that of the SHVC framework.

Image Super-Resolution Using Deep Convolutional Neural Networks Based on Residual Blocks (잔차 블록 기반의 깊은 합성곱 신경망을 통한 단일 영상 초해상도 복원)

  • Kim, Ingu;Yu, Songhyun;Jeong, Jaechang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.62-65 / 2018
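  • As neural networks become deeper, problems that destabilize the network arise, such as vanishing and exploding gradients. Residual blocks can solve these problems. In this paper, we propose a single-image super-resolution method using a deep convolutional neural network based on residual blocks, which have shown excellent performance in image recognition. The proposed algorithm modifies the residual block used in EDSR so that convolutions of various sizes analyze the image features differently, and it is configured with a complexity similar to that of VDSR, yielding improved performance. In experiments, PSNR increased by up to 0.1 dB compared with VDSR.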

Emotion Transfer with Strength Control for End-to-End TTS (감정 제어 가능한 종단 간 음성합성 시스템)

  • Jeon, Yejin;Lee, Gary Geunbae
    • Annual Conference on Human and Language Technology / 2021.10a / pp.423-426 / 2021
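  • This paper introduces a method for controlling the strength of emotion based on the Global Style Token. Previous work on global style tokens synthesized speech using a reference audio clip containing the desired style. However, because speech could be synthesized only in the style of the reference audio, fine-grained emotion control was difficult. To solve this problem, we replace the reference encoder of the global style token model with residual blocks and AlexNet, a network used in computer vision. AlexNet consists of five convolutional layers, but this paper uses only four of them, excluding one. A listening evaluation (Mean Opinion Score) shows that the proposed method makes it possible to control emotion strength.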

Applicability of Image Classification Using Deep Learning in Small Area : Case of Agricultural Lands Using UAV Image (딥러닝을 이용한 소규모 지역의 영상분류 적용성 분석 : UAV 영상을 이용한 농경지를 대상으로)

  • Choi, Seok-Keun;Lee, Soung-Ki;Kang, Yeon-Bin;Seong, Seon-Kyeong;Choi, Do-Yeon;Kim, Gwang-Ho
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.38 no.1 / pp.23-33 / 2020
  • High-resolution images can now be acquired easily using UAVs (Unmanned Aerial Vehicles), making it possible to observe small areas and produce spatial information at low cost. In particular, research on generating cover maps of crop production areas is being actively conducted for monitoring the agricultural environment. Comparisons of classification performance among RF (Random Forest), SVM (Support Vector Machine), and CNN (Convolutional Neural Network) indicate that deep learning classification has many advantages in image classification. In particular, land cover classification of satellite images benefits in accuracy and classification time from satellite image datasets and pre-trained parameters. However, UAV images differ from satellite images in characteristics such as spatial resolution, which makes those resources difficult to apply. To solve this problem, we studied the application of deep learning algorithms to UAV datasets of agricultural lands in Korea, where small-scale, mixed land cover exists. In this study, we applied DeepLab V3+, FC-DenseNet (Fully Convolutional DenseNets), and FRRN-B (Full-Resolution Residual Networks), which are state-of-the-art semantic image classification algorithms, to a UAV dataset. As a result, DeepLab V3+ and FC-DenseNet achieved an overall accuracy of 97% and a Kappa coefficient of 0.92, higher than conventional classification. These results show the applicability of cover classification using UAV images of small areas.
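
The overall accuracy and Kappa coefficient reported above are both derived from a confusion matrix. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name and the 2×2 counts are made up for illustration.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of reference class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of row and column marginals, summed per class.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical two-class example (e.g. crop vs. non-crop pixels).
example = [[45, 5],
           [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(example)
```

Kappa discounts the agreement expected by chance, which is why it is lower than the overall accuracy for the same matrix.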

Cox Model Improvement Using Residual Blocks in Neural Networks: A Study on the Predictive Model of Cervical Cancer Mortality (신경망 내 잔여 블록을 활용한 콕스 모델 개선: 자궁경부암 사망률 예측모형 연구)

  • Nang Kyeong Lee;Joo Young Kim;Ji Soo Tak;Hyeong Rok Lee;Hyun Ji Jeon;Jee Myung Yang;Seung Won Lee
    • The Transactions of the Korea Information Processing Society / v.13 no.6 / pp.260-268 / 2024
  • Cervical cancer is the fourth most common cancer in women worldwide; more than 604,000 new cases were reported in 2020 alone, resulting in approximately 341,831 deaths. The Cox regression model is widely adopted in cancer research, but its linearity assumption limits it when nonlinear associations exist. To address this problem, this paper proposes ResSurvNet, a new model that improves the accuracy of cervical cancer mortality prediction using ResNet's residual learning framework. The model showed accuracy that outperformed the DNN, CPH, CoxLasso, Cox Gradient Boost, and RSF models compared in this study. This predictive performance demonstrates great value for early diagnosis and for establishing treatment strategies in the management of cervical cancer patients, and it represents significant progress in the field of survival analysis.

Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

  • Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
  • Bandwidth extension refers to restoring and expanding a narrowband (NB) signal, damaged in the encoding and decoding process because of limited channel capacity or the characteristics of the codec installed in a mobile communication device, into a wideband (WB) signal. Bandwidth extension research has mainly focused on voice signals; methods such as SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling) transform the signal into the frequency domain and restore missing or damaged high bands through complex feature extraction processes. In this paper, we propose a model that outputs a bandwidth-extended signal based on an autoencoder, one of the deep learning models. Using residual connections in one-dimensional convolutional neural networks (CNNs), the model extends the bandwidth from a time-domain input signal of a fixed length without complicated pre-processing. In addition, we confirmed that the damaged high band can be restored even when training on a dataset that contains various types of sound sources, including music, rather than speech alone.

The Effect of regularization and identity mapping on the performance of activation functions (정규화 및 항등사상이 활성함수 성능에 미치는 영향)

  • Ryu, Seo-Hyeon;Yoon, Jae-Bok
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.10 / pp.75-80 / 2017
  • In this paper, we describe the effect of the regularization method and the network with identity mapping on the performance of activation functions in deep convolutional neural networks. An activation function acts as a nonlinear transformation. Early convolutional neural networks used a sigmoid function. To overcome problems of existing activation functions, such as vanishing gradients, various activation functions were developed, including ReLU, Leaky ReLU, parametric ReLU, and ELU. To solve the overfitting problem, regularization methods such as dropout and batch normalization were developed alongside the activation functions. Additionally, data augmentation is usually applied in deep learning to avoid overfitting. The activation functions mentioned above have different characteristics, but the new regularization methods and the network with identity mapping were validated only with ReLU. Therefore, we experimentally show the effect of the regularization method and the network with identity mapping on the performance of the activation functions. Through this analysis, we present how the performance of activation functions tends to vary with regularization and identity mapping. These results will reduce the number of training trials needed to find the best activation function.
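
To make the terms in this abstract concrete, the sketch below defines the activation functions it names and a toy scalar unit with an identity mapping (output = input + transformed input). This is an illustrative assumption, not the experimental setup of the paper; `residual_unit`, the default slope, and the default alpha are hypothetical choices.

```python
import math


def relu(x):
    """ReLU: passes positive inputs, zeroes out negative ones."""
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: small fixed slope for negative inputs.

    Parametric ReLU has the same form, but the slope is learned.
    """
    return x if x > 0 else slope * x


def elu(x, alpha=1.0):
    """ELU: smooth exponential saturation toward -alpha for negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def residual_unit(x, weight, activation):
    """Toy scalar unit with an identity mapping: x + activation(weight * x)."""
    return x + activation(weight * x)


# Even when ReLU zeroes the transformed branch, the identity path keeps the input.
blocked_branch = relu(-2.0)
with_identity = residual_unit(-2.0, 1.0, relu)
```

The identity path carries the input through unchanged even when the activation blocks the transformed branch, which is one reason the choice of activation can matter less in networks with identity mappings.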