• Title/Summary/Keyword: FCN

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik;Bang, Junseong
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.148-154 / 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among them, Speech Emotion Recognition (SER) recognizes a speaker's emotions from speech information. The success of SER depends on selecting distinctive features and classifying them appropriately. In this paper, the performance of SER using neural network models (e.g., fully connected network (FCN), convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) model with MFCC showed the best performance, with an average accuracy of 88.54% for five emotions (anger, happiness, calm, fear, and sadness) across male and female speakers. In addition, the distribution of emotion recognition accuracies across the neural network models indicates that the 2D-CNN with MFCC can be expected to reach an overall accuracy of 75% or more.
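
As a rough illustration of the "Mel-frequency" part of MFCC, the sketch below converts between Hz and mel and spaces filter-bank edges evenly on the mel scale. It assumes the widely used HTK-style formula; the abstract does not state which mel variant, frequency range, or number of bands the authors used, so `mel_band_edges` and its arguments are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel scale (an assumption): 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list[float]:
    """Edge frequencies (Hz) of n_bands triangular filters spaced evenly in mel.

    n_bands filters need n_bands + 2 edge points.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]


if __name__ == "__main__":
    # Illustrative values only: 10 bands between 0 Hz and 8 kHz.
    for edge in mel_band_edges(0.0, 8000.0, 10):
        print(round(edge, 1))
```

The edges crowd together at low frequencies and spread out at high ones, which is the property that makes mel-based features a common choice for speech.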

Neural Network Design for Predicting Shear Modulus of Food Printability Enhancers (식품 인쇄 적성 증진제의 전단탄성률 예측 신경망 설계)

  • Yoo, Hyun-Ju;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.731-732 / 2021
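  • Printability enhancers are one of the factors that improve the printability of gelling materials in food 3D printing. They are evaluated on the basis of the shear modulus, which indicates the degree of deformation under shear stress. However, because food raw materials are so diverse, measuring the shear modulus for each material takes considerable time and cost. This study therefore proposes a neural network design that uses an FCN and an RNN to predict the shear modulus, with the aim of reducing the time and cost of measuring the shear modulus of printability enhancers.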

Design and Implementation of HRNet Model Combined with Spatial Information Attention Module of Polarized Self-attention (편광 셀프어텐션의 공간정보 강조 모듈을 결합한 HRNet 모델 설계 및 구현)

  • Jin-Seong Kim;Jun Park;Se-Hoon Jung;Chun-Bo Sim
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.485-487 / 2023
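  • Semantic segmentation, a subtask of computer vision, is studied in many fields, such as autonomous driving and ship detection at sea. Existing semantic segmentation models based on FCN (Fully Convolutional Networks) lose spatial information during downsampling, which lowers accuracy. To mitigate this loss, this paper adds the spatial-information emphasis module of PSA (Polarized Self-attention) between the convolution blocks of HRNet (High-resolution Networks). In the experiments, the parameter count increased by 3.1M and GFLOPs by 3.2G, while mIoU increased by 0.26%. This confirms that spatial information affects semantic segmentation accuracy.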

Effects of Glucose and Inorganic Phosphate on the Development of Rat 8-Cell Embryos In Vitro (Glucose와 Inorganic Phosphate가 Rat 8-세포기 난자의 체외배양에 미치는 영향)

  • 이홍미;진동일
    • Korean Journal of Animal Reproduction / v.20 no.3 / pp.251-258 / 1996
  • This study was designed to evaluate the potential inhibitory effects of glucose (5.56 mM vs. 0 mM) and/or phosphate (potassium phosphate, 1.19 mM vs. 0 mM) on the in vitro development of rat 8-cell embryos (n=345 embryos from 36 mature rats). Evaluation of embryos at 48 h for developmental stage (STG) indicated that 37% (31/84), 70% (64/91), 69% (59/85), and 77% (67/85) developed to the blastocyst stage in media with glucose+phosphate, glucose only, phosphate only, and no glucose or phosphate, respectively. Embryo development (2.90 ± 0.097 for STG) in medium with glucose+phosphate was significantly reduced (P<0.001), while no significant differences were observed among the other media (3.4-3.5 ± 0.093-0.097 for STG). Evaluation of embryos for final cell number (FCN) indicated that the greatest number of cells (nuclei) resulted in medium with glucose alone (29.3 ± 0.97 cells, P<0.001). No significant differences were observed in FCN among the remaining three media (17.5 ± 1.04 cells, 18.6 ± 1.01 cells, and 19.8 ± 1.01 cells for glucose+phosphate, phosphate only, and no glucose or phosphate, respectively). Our results suggest that glucose and phosphate together exert an inhibitory effect on 8-cell rat embryo development, while glucose alone was beneficial, yielding greater numbers of cells per embryo.

Real-time Segmentation of Black Ice Region in Infrared Road Images

  • Li, Yu-Jie;Kang, Sun-Kyoung;Jung, Sung-Tae
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.33-42 / 2022
  • In this paper, we propose a deep learning model based on multi-scale dilated convolution feature fusion that segments black ice regions in road images so that black ice warnings can be sent to drivers in real time. In the proposed network, convolutions with different dilation rates are connected in parallel in the encoder blocks, different dilation rates are used for feature maps of different resolutions, and multi-layer feature information is fused. The multi-scale dilated convolution feature fusion improves performance by diversifying and expanding the receptive field of the network, preserving detailed spatial information, and making the dilated convolutions more effective. The performance of the proposed network improved gradually as the number of dilated convolution branches increased. The mIoU of the proposed method is 96.46%, higher than that of existing networks such as U-Net, FCN, PSPNet, ENet, and LinkNet. The number of parameters is 1,858K, about 6 times fewer than the existing LinkNet model. In experiments on a Jetson Nano, the proposed method ran at 3.63 FPS, which allows black ice regions to be segmented in real time.
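
The idea of parallel branches with different dilation rates can be sketched in one dimension with plain Python. This is only an illustration under stated assumptions: the abstract does not say how the branches are fused, so summation is assumed here, and `dilated_conv1d` and `multi_scale_fusion` are hypothetical names, not the authors' code.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D convolution (cross-correlation form).

    Kernel taps are spaced `dilation` samples apart, so a 3-tap kernel with
    dilation d covers 2 * d + 1 input samples. Assumes an odd kernel length.
    """
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i - half + j * dilation
            if 0 <= idx < len(signal):  # outside the signal counts as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_fusion(signal, kernel, dilations):
    """Run one branch per dilation rate in parallel and fuse by summation.

    Summation is an assumption; concatenation would be another option.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    # The impulse spreads over 3 samples with dilation 1 and 5 with dilation 2.
    print(multi_scale_fusion(impulse, [1, 1, 1], dilations=(1, 2)))
```

Each branch uses the same three weights, so the wider coverage of the dilation-2 branch comes at no extra parameter cost.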

Development of Deep Learning Based Ensemble Land Cover Segmentation Algorithm Using Drone Aerial Images (드론 항공영상을 이용한 딥러닝 기반 앙상블 토지 피복 분할 알고리즘 개발)

  • Hae-Gwang Park;Seung-Ki Baek;Seung Hyun Jeong
    • Korean Journal of Remote Sensing / v.40 no.1 / pp.71-80 / 2024
  • This study proposes an ensemble learning technique to enhance the semantic segmentation performance of images captured by Unmanned Aerial Vehicles (UAVs). With the increasing use of UAVs in fields such as urban planning, techniques that apply deep learning segmentation methods to land cover segmentation have been actively developed. The study suggests a method that uses three prominent segmentation models, namely U-Net, DeepLabV3, and Fully Convolutional Network (FCN), to improve segmentation prediction performance. The proposed approach integrates the training loss, validation accuracy, and class score of the three segmentation models to enhance overall prediction performance. The method was applied and evaluated on a land cover segmentation problem involving seven classes (buildings, roads, parking lots, fields, trees, empty spaces, and areas with unspecified labels), using images captured by UAVs. The ensemble model was evaluated by mean Intersection over Union (mIoU), and comparison with the three existing segmentation methods showed that mIoU improved. Consequently, the study confirms that the proposed technique can enhance the performance of semantic segmentation models.
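
The mIoU metric used for evaluation can be sketched as follows. This is a minimal version that assumes flattened label maps and averages only over classes that appear in the prediction or the ground truth; the abstract does not specify how absent classes or the unlabeled class were handled, so that choice is an assumption.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flattened label maps.

    pred and target are equal-length sequences of integer class ids.
    Classes absent from both maps are skipped (an assumption).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    truth = [0, 1, 1, 1]
    # Class 0: IoU 1/2; class 1: IoU 2/3; the mean is 7/12.
    print(mean_iou(predicted, truth, num_classes=2))
```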

A New Object Region Detection and Classification Method using Multiple Sensors on the Driving Environment (다중 센서를 사용한 주행 환경에서의 객체 검출 및 분류 방법)

  • Kim, Jung-Un;Kang, Hang-Bong
    • Journal of Korea Multimedia Society / v.20 no.8 / pp.1271-1281 / 2017
  • For autonomous driving, it is essential to collect and analyze information about targets around the vehicle. Based on that analysis, environmental information such as location and direction must be processed in real time to control the vehicle. In particular, occlusion or truncation of objects in the image must be handled to provide accurate information about the vehicle's environment and to support safe operation. In this paper, we propose a method that simultaneously generates 2D and 3D bounding box proposals using LiDAR Edge, which is produced by filtering LiDAR sensor information. We classify each proposal by connecting it to Region-based Fully Convolutional Networks (R-FCN), a deep learning object classifier that takes two-dimensional images as input. Each 3D box is then rearranged using the class label and the subcategory information of each class to complete the 3D bounding box corresponding to the object. Because 3D bounding boxes are created in 3D space, object information such as spatial coordinates and object size can be obtained at once, and the 2D bounding boxes associated with the 3D boxes do not suffer from problems such as occlusion.

Performance Comparison of Gas Leak Region Segmentation Based on Transfer Learning (Transfer Learning 기법을 이용한 가스 누출 영역 분할 성능 비교)

  • Marshall, Marshall;Park, Jang-Sik;Park, Seong-Mi
    • Journal of the Korean Society of Industry Convergence / v.23 no.3 / pp.481-489 / 2020
  • Safety and security during the handling of hazardous materials are a major concern for anyone in the field. A key requirement in the security field is the ability to detect the source of danger and act against it as quickly as possible. A fully convolutional network can produce a label map of an input image, indicating which object occupies each area of the image. Instead of the original FCN, this research uses U-Net, which was originally built to segment cells in biomedical images. One challenge this research faces is the availability of precisely labeled ground truth for the dataset. Testing the network after training yielded some images in which the network produced finer detail than the expected label map. Whether a more detailed label map would allow the network to produce better segmentation is left for further research.
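
The label map mentioned above is commonly obtained by taking, for every pixel, the class with the highest score. The sketch below assumes per-class score maps indexed as `scores[class][row][column]`; the layout, the function name `label_map`, and the toy numbers are illustrative assumptions, not details from the paper.

```python
def label_map(scores):
    """Turn per-class score maps into a label map via a per-pixel argmax.

    scores[c][y][x] is the score of class c at pixel (y, x).
    Ties go to the lowest class id.
    """
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Two classes (0 = background, 1 = gas region) on a 2x2 image; toy values.
    toy_scores = [
        [[0.9, 0.2], [0.4, 0.1]],  # class 0
        [[0.1, 0.8], [0.6, 0.9]],  # class 1
    ]
    print(label_map(toy_scores))
```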

Improved Sliding Shapes for Instance Segmentation of Amodal 3D Object

  • Lin, Jinhua;Yao, Yu;Wang, Yanjie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5555-5567 / 2018
  • State-of-the-art instance segmentation networks succeed at generating 2D segmentation masks for the region proposals with the highest classification scores, yet the 3D object segmentation task is limited to geocentric embedding or the Sliding Shapes detector. To this end, we propose an amodal 3D instance segmentation network called A3IS-CNN, which extends the Deep Sliding Shapes detector to amodal 3D instance segmentation by adding a new 3D ConvNet branch called the A3IS-branch. The A3IS-branch, which takes 3D amodal ROIs as input and produces 3D semantic instances as output, is a fully convolutional network (FCN) that shares convolutional layers with the existing 3D RPN, which takes a 3D scene as input and produces 3D amodal proposals as output. Because the two branches share computation, our 3D instance segmentation network adds only a small overhead of 0.25 fps to Deep Sliding Shapes, trading off accurate detection and point-to-point segmentation of instances. Experiments show that our 3D instance segmentation network achieves a 10% to 50% improvement in running time over the state-of-the-art network, and outperforms state-of-the-art 3D detectors by at least 16.1 AP.

Improved Multi-modal Network Using Dilated Convolution Pyramid Pooling (팽창된 합성곱 계층 연산 풀링을 이용한 멀티 모달 네트워크 성능 향상 방법)

  • Park, Jun-Young;Ho, Yo-Sung
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2018.11a / pp.84-86 / 2018
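  • With recent advances in technologies such as autonomous driving, a deep understanding of captured scenes has become necessary. In particular, as machine learning has progressed, research on semantic segmentation of camera images has become active. FuseNet is a neural network model that uses an encoder-decoder structure to apply semantic segmentation to the objects in a scene. Unlike the conventional FCN, which takes only RGB input, FuseNet also uses depth information and implements a multi-modal structure through element-wise summation with the feature maps extracted from the RGB information. In semantic segmentation it is important to take the global context of objects into account, but stacking many layers to do so increases the amount of computation. To overcome this, dilated convolution, a newer operation that departs from conventional convolution, can effectively widen the receptive field while reducing computation. This paper proposes an optimized method that applies dilated convolution to improve the performance of a new multi-modal network for semantic segmentation, raising overall network performance while maintaining resolution, without stacking deeper layers and without increasing the number of parameters.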
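
The claim that dilation widens the receptive field without adding parameters can be checked with a short calculation. The sketch below uses the standard formula for stacked stride-1 convolutions; the layer counts, dilation rates, and channel sizes are illustrative assumptions, not the configuration used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with dilation d adds (kernel_size - 1) * d input positions.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def conv2d_params(kernel_size, in_channels, out_channels, bias=True):
    """Learnable parameters of one 2D convolution layer.

    The dilation rate does not appear: it changes tap spacing, not tap count.
    """
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


if __name__ == "__main__":
    # Three 3x3 layers, plain versus dilated (rates 1, 2, 4 are an example).
    print(receptive_field(3, [1, 1, 1]))  # plain stack
    print(receptive_field(3, [1, 2, 4]))  # dilated stack, same parameter count
    print(conv2d_params(3, 64, 64))       # per-layer parameters in either case
```

With these example settings the dilated stack sees 15 input positions per axis instead of 7, using exactly the same number of weights.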