• Title/Summary/Keyword: Feature extractor


ECG Pattern Classification Using Back-Propagation Neural Network (역전달 신경회로망을 이용한 심전도 패턴분류)

  • Lee, Je-Suk;Kwon, Hyuk-Je;Lee, Jung-Whan;Lee, Myoung-Ho
    • Proceedings of the KOSOMBE Conference / v.1992 no.11 / pp.47-50 / 1992
  • This paper describes a pattern classification algorithm for ECG signals using a back-propagation neural network. We present a new feature extractor, based on a second-order approximating function, that produces the input signals of the neural network. We use 9 significant parameters extracted by the feature extractor. The 5 most characteristic ECG signal patterns are classified accurately by the neural network. We use the AHA database to evaluate the performance of the proposed pattern classification algorithm.
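A minimal back-propagation sketch of the setup this abstract describes: a feed-forward network taking the 9 extracted parameters as input and classifying 5 ECG patterns. The hidden-layer width and learning rate are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (9, 16))   # input -> hidden weights (16 hidden units assumed)
W2 = rng.normal(0, 0.1, (16, 5))   # hidden -> output weights, one output per pattern

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, target, lr=0.1):
    """One gradient-descent update on a single (features, one-hot label) pair."""
    global W1, W2
    h = sigmoid(x @ W1)            # hidden activations
    y = sigmoid(h @ W2)            # output activations
    # Back-propagate the squared error through both layers.
    d_out = (y - target) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * np.outer(h, d_out)
    W1 -= lr * np.outer(x, d_hid)
    return float(np.sum((y - target) ** 2))

# Example: one update on a dummy 9-dimensional feature vector, class 2 of 5.
loss = train_step(rng.normal(size=9), np.eye(5)[2])
```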


FPGA Design of a SURF-based Feature Extractor (SURF 알고리즘 기반 특징점 추출기의 FPGA 설계)

  • Ryu, Jae-Kyung;Lee, Su-Hyun;Jeong, Yong-Jin
    • Journal of Korea Multimedia Society / v.14 no.3 / pp.368-377 / 2011
  • This paper explains the hardware structure of a SURF (Speeded Up Robust Features) based feature point extractor and its FPGA verification results. The SURF algorithm produces scale- and rotation-invariant feature points and descriptors, which can be used for object recognition, panorama image creation, and 3D image restoration. However, feature point extraction takes approximately 7,200 ms per VGA-resolution frame in an embedded environment with an ARM11 (667 MHz) processor and 128 MB of DDR memory, so real-time operation is not guaranteed. We analyzed the integral image memory access pattern, a key component of the SURF algorithm, to reduce memory accesses and memory usage for real-time operation. We verify that feature extraction on a Virtex-5 FPGA processes VGA images at 60 frames/sec at 100 MHz.
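The integral image the authors optimize is the standard SURF building block: any box filter response reduces to four memory lookups. A plain-numpy sketch of that data structure, for illustration only:

```python
import numpy as np

def integral_image(img):
    """ii[y, x] = sum of img[:y, :x] (exclusive), with a zero border row/column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1) via four corner lookups."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

# VGA-sized dummy frame, as in the paper's target resolution.
img = (np.arange(640 * 480, dtype=np.int64).reshape(480, 640)) % 256
ii = integral_image(img)
assert box_sum(ii, 10, 20, 50, 60) == img[10:50, 20:60].sum()
```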

Development of Emotion Recognition Model Using Audio-video Feature Extraction Multimodal Model (음성-영상 특징 추출 멀티모달 모델을 이용한 감정 인식 모델 개발)

  • Jong-Gu Kim;Jang-Woo Kwon
    • Journal of the Institute of Convergence Signal Processing / v.24 no.4 / pp.221-228 / 2023
  • Physical and mental changes caused by emotions can affect various behaviors, such as driving or learning. Recognizing these emotions is therefore an important task, with applications such as detecting and managing dangerous emotional states while driving. In this paper, we address the emotion recognition task with a multimodal model that uses both the audio and video modalities. Using the RAVDESS dataset, we extract the audio track from each video and obtain audio features with a 2D-CNN model, while video features are extracted with a SlowFast feature extractor. The information from the two modalities is then combined into a single feature vector that carries all the information, and emotion recognition is performed on the combined features with a classifier. Finally, we compare conventional late-fusion approaches, which combine or vote on the outputs of separate per-modality models, against this approach of unifying the modalities at the feature level before classification.
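A hedged sketch of the feature-level fusion described here: audio and video feature vectors are concatenated and classified jointly. The feature dimensions and the two-layer head are assumptions (RAVDESS has 8 emotion classes), and the random tensors stand in for outputs of the 2D-CNN and SlowFast extractors.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, audio_dim=128, video_dim=256, n_classes=8):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, audio_feat, video_feat):
        # Feature-level fusion: one vector carrying both modalities.
        fused = torch.cat([audio_feat, video_feat], dim=1)
        return self.head(fused)

model = FusionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 256))  # batch of 4 clips
```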

Pedestrian Classification using CNN's Deep Features and Transfer Learning (CNN의 깊은 특징과 전이학습을 사용한 보행자 분류)

  • Chung, Soyoung;Chung, Min Gyo
    • Journal of Internet Computing and Services / v.20 no.4 / pp.91-102 / 2019
  • In autonomous driving systems, the ability to classify pedestrians in images captured by cameras is very important for pedestrian safety. In the past, pedestrian features were extracted with HOG (Histogram of Oriented Gradients) or SIFT (Scale-Invariant Feature Transform) and then classified with an SVM (Support Vector Machine). However, extracting pedestrian characteristics in such a handcrafted manner has many limitations. This paper therefore proposes a method to classify pedestrians reliably and effectively using a CNN's (Convolutional Neural Network) deep features and transfer learning. We experimented with both the fixed feature extractor and fine-tuning methods, the two representative transfer learning techniques. In the fine-tuning method, we added a new scheme, called M-Fine (Modified Fine-tuning), which divides the layers into transferred and non-transferred parts in three different sizes and adjusts weights only for the layers in the non-transferred parts. Experiments on the INRIA Person dataset with five CNN models (VGGNet, DenseNet, Inception V3, Xception, and MobileNet) showed that the CNN's deep features perform better than handcrafted features such as HOG and SIFT, and that Xception achieves the highest accuracy, 99.61% (threshold = 0.5). MobileNet, which achieved performance similar to Xception with 80% fewer parameters, was the best in terms of efficiency. Among the three transfer learning schemes tested, fine-tuning performed best. The M-Fine method was comparable to or slightly below fine-tuning, but above the fixed feature extractor method.
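A sketch of the two transfer-learning modes being compared, using torchvision's MobileNetV2 as the backbone (one of the five models tested). The layer index at which M-Fine splits transferred from non-transferred parts is an assumption; the paper tests three different split sizes.

```python
import torch.nn as nn
from torchvision import models

def build(mode="fixed"):
    net = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
    net.classifier[1] = nn.Linear(net.last_channel, 2)  # pedestrian vs. not
    if mode == "fixed":
        # Fixed feature extractor: freeze the whole backbone, train the head only.
        for p in net.features.parameters():
            p.requires_grad = False
    elif mode == "m_fine":
        # M-Fine-style: freeze only the early (transferred) layers;
        # the split point 10 is an illustrative assumption.
        for p in net.features[:10].parameters():
            p.requires_grad = False
    return net  # mode == "fine": all weights remain trainable
```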

Super Resolution Performance Analysis of GAN according to Feature Extractor (특징 추출기에 따른 SRGAN의 초해상 성능 분석)

  • Park, Sung-Wook;Kim, Jun-Yeong;Park, Jun;Jung, Se-Hoon;Sim, Chun-Bo
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.501-503 / 2022
  • Super resolution is a technique that synthesizes a high-resolution image from a low-resolution one. Deep learning has been applied to super resolution, beginning with the SRCNN (Super Resolution Convolutional Neural Network) model published in 2014. Subsequently, SRCAE (Super Resolution Convolutional Autoencoders) was introduced with an autoencoder architecture, and SRGAN (Super Resolution Generative Adversarial Networks) with a GAN (Generative Adversarial Networks) architecture, which forces synthesized images to be statistically indistinguishable from real images. All of these models outperform SRCNN, but even SRGAN, the strongest among them, does not yet achieve perfect results. In this paper, to improve the performance of SRGAN, we replace its pre-trained feature extractor, the VGG (Visual Geometry Group)-19 model, and compare performance against the original model. From the experimental results, we expect to find a model that synthesizes images with sharper contours, closer to the real images, than the VGG-19 baseline.
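For reference, a minimal sketch of the SRGAN content loss in which the pre-trained feature extractor appears: super-resolved and ground-truth images are compared in VGG-19 feature space rather than pixel space. The layer cut-off (index 35, through conv5_4) is an assumption.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Frozen VGG-19 feature extractor, the baseline the paper varies.
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features[:35].eval()
for p in vgg.parameters():
    p.requires_grad = False

def content_loss(sr, hr):
    """MSE between feature maps of the super-resolved and real images."""
    return F.mse_loss(vgg(sr), vgg(hr))

loss = content_loss(torch.randn(1, 3, 96, 96), torch.randn(1, 3, 96, 96))
```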

Bottleneck-based Siam-CNN Algorithm for Object Tracking (객체 추적을 위한 보틀넥 기반 Siam-CNN 알고리즘)

  • Lim, Su-Chang;Kim, Jong-Chan
    • Journal of Korea Multimedia Society / v.25 no.1 / pp.72-81 / 2022
  • Visual object tracking is one of the most fundamental problems in computer vision: it localizes the region of a target object with a bounding box in a video. In this paper, a custom CNN is designed to extract strong and varied object features. The network is constructed as a Siamese network for use as a feature extractor. Input images pass through convolution blocks composed of bottleneck layers, which emphasize the features. The feature maps of the target object and the search area extracted by the Siamese network are fed into a region proposal network, which estimates the object region. The performance of the tracking algorithm was evaluated on the OTB2013 dataset, using the success plot and precision plot as evaluation metrics. The experiments achieved 0.611 on the success plot and 0.831 on the precision plot.
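A hedged sketch of the Siamese matching step: one shared extractor embeds both the target template and the search area, and cross-correlating the two feature maps scores candidate locations. The tiny convolution stack stands in for the paper's bottleneck-based CNN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

extractor = nn.Sequential(          # shared weights for both branches
    nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
)

template = extractor(torch.randn(1, 3, 64, 64))    # target object patch
search = extractor(torch.randn(1, 3, 128, 128))    # larger search area
# Correlate: use the template features as a convolution kernel;
# the response peak marks the most likely target location.
response = F.conv2d(search, template)
print(response.shape)
```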

A real-time vision system for SMT automation

  • Hwang, Shin-Hwan;Kim, Dong-Sik;Yun, Il-Dong;Choi, Jin-Woo;Lee, Sang-Uk;Choi, Jong-Soo
    • Proceedings of the Institute of Control, Robotics and Systems Conference / 1990.10b / pp.923-928 / 1990
  • This paper describes the design and implementation of a real-time, high-precision vision system and its application to SMT (surface mounting technology) automation. The vision system employs a 32-bit MC68030 as the main processor and consists of an image acquisition unit, a DSP56001-based vision processor, and several algorithmically dedicated hardware modules. The image acquisition unit provides 512*480*8-bit images for high-precision vision tasks. The DSP vision processor and hardware modules, such as the histogram extractor and feature extractor, are designed for real-time execution of vision algorithms. In particular, the multi-processing architecture based on DSP vision processors allows us to employ more sophisticated and flexible vision algorithms in real time. The developed vision system is combined with an Adept robot system to form a complete SMD assembly system. The vision-guided SMD assembly system has been found to provide satisfactory performance for SMT automation.
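A software sketch of what the histogram-extractor module computes in hardware: the grey-level histogram of a 512*480*8-bit frame, the starting point for thresholding-style vision algorithms. The frame contents here are dummy data.

```python
import numpy as np

frame = np.random.randint(0, 256, (480, 512), dtype=np.uint8)  # 8-bit image
hist = np.bincount(frame.ravel(), minlength=256)  # one bin per grey level
assert hist.sum() == frame.size                   # every pixel counted once
```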


Design of Lightweight Artificial Intelligence System for Multimodal Signal Processing (멀티모달 신호처리를 위한 경량 인공지능 시스템 설계)

  • Kim, Byung-Soo;Lee, Jea-Hack;Hwang, Tae-Ho;Kim, Dong-Sun
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.13 no.5 / pp.1037-1042 / 2018
  • Neuromorphic technology, which learns and processes information by imitating the human brain, has been researched for decades. Hardware implementations of neuromorphic systems are configured as highly parallel processing structures with many simple computational units, achieving high processing speed, low power consumption, and low hardware complexity. Recently, interest in neuromorphic technology for low-power, small embedded systems has been increasing rapidly. To implement low-complexity hardware, it is necessary to reduce the input data dimension without accuracy loss. This paper proposes a low-complexity artificial intelligence engine consisting of parallel neuron engines and a feature extractor. The engine contains a number of neuron engines and a controller to process multimodal sensor data. We verified the performance of the proposed design, including the artificial intelligence engines, the feature extractor, and a Micro Controller Unit (MCU).
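A sketch of the feature extractor's role here: shrinking the raw multimodal input dimension before the parallel neuron engines process it. A PCA projection is used purely as an example of dimension reduction; the abstract does not specify the extraction method, and the dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(1000, 64))   # 1000 samples of 64 raw sensor values

# Fit a linear projection onto the top 8 principal components.
centered = raw - raw.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
project = Vt[:8].T                  # 64 -> 8 feature dimensions

features = centered @ project       # what the neuron engines would consume
print(features.shape)               # (1000, 8)
```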

Face Recognition Evaluation of an Illumination Property of Subspace Based Feature Extractor (부분공간 기반 특징 추출기의 조명 변인에 대한 얼굴인식 성능 분석)

  • Kim, Kwang-Soo;Boo, Deok-Hee;Ahn, Jung-Ho;Kwak, Soo-Yeong;Byun, Hye-Ran
    • Journal of KIISE: Software and Applications / v.34 no.7 / pp.681-687 / 2007
  • Face recognition has become very popular in recent years for personal information security and user identification. However, face recognition systems are difficult to implement because of changes in illumination, pose, and facial expression. In this paper, we address illumination change, which causes large variation in face appearance, by generating virtual image data and adding it to the training of D-LDA, which was selected as the most suitable feature extractor. The result is a recognition system that is less sensitive to illumination. The method considers several illumination directions and generates virtual training images that reflect both the direction and the density of illumination. Experimental results on the ORL, Yale University, and Pohang University face databases show that D-LDA is less sensitive to illumination variation.
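A hedged sketch of the virtual-image idea: extra training faces are synthesized by applying directional illumination of varying density to each original image. The linear-ramp lighting model is an assumption; the abstract states only that illumination direction and density are varied.

```python
import numpy as np

def virtual_illumination(face, direction="left", strength=0.4):
    """Return a copy of `face` (H x W, values in [0, 1]) lit from one side."""
    h, w = face.shape
    ramp = np.linspace(1.0 - strength, 1.0 + strength, w)  # density gradient
    if direction == "left":
        ramp = ramp[::-1]                                  # flip light source
    return np.clip(face * ramp[None, :], 0.0, 1.0)

face = np.random.rand(64, 64)  # dummy face image
augmented = [virtual_illumination(face, d, s)
             for d in ("left", "right") for s in (0.2, 0.4)]
```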

Deep Learning-based Gaze Direction Vector Estimation Network Integrated with Eye Landmark Localization (딥 러닝 기반의 눈 랜드마크 위치 검출이 통합된 시선 방향 벡터 추정 네트워크)

  • Joo, Heeyoung;Ko, Min-Soo;Song, Hyok
    • Journal of Broadcast Engineering / v.26 no.6 / pp.748-757 / 2021
  • In this paper, we propose a gaze estimation network in which eye landmark detection and gaze direction vector estimation are integrated into a single deep learning network. The proposed network uses the Stacked Hourglass Network as a backbone and is composed of three parts: a landmark detector, a feature map extractor, and a gaze direction estimator. The landmark detector estimates the coordinates of 50 eye landmarks, and the feature map extractor generates a feature map of the eye image for estimating the gaze direction. The gaze direction estimator then estimates the final gaze direction vector by combining the two outputs. The network was trained on virtual synthetic eye images and landmark coordinates generated with the UnityEyes dataset, and the MPIIGaze dataset of real human eye images was used for performance evaluation. In the experiments, the gaze estimation error was 3.9, and the estimation speed of the network was 42 FPS (frames per second).
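A sketch of the three-part decomposition described above: landmark coordinates and the eye feature map are fused to regress a unit gaze direction vector. All dimensions and the fusion head are illustrative assumptions; the real network uses a Stacked Hourglass backbone.

```python
import torch
import torch.nn as nn

class GazeEstimator(nn.Module):
    def __init__(self, n_landmarks=50, feat_dim=64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(n_landmarks * 2 + feat_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 3),               # 3-D gaze direction vector
        )

    def forward(self, landmarks, feat):
        # Combine the landmark detector's coordinates with the feature map
        # extractor's pooled features, as the third stage does.
        x = torch.cat([landmarks.flatten(1), feat], dim=1)
        g = self.fuse(x)
        return g / g.norm(dim=1, keepdim=True)  # normalize to a unit vector

est = GazeEstimator()
gaze = est(torch.randn(2, 50, 2), torch.randn(2, 64))  # batch of 2 eye images
```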