• Title/Summary/Keyword: Deep Features


Building Change Detection Using Deep Learning for Remote Sensing Images

  • Wang, Chang;Han, Shijing;Zhang, Wen;Miao, Shufeng
    • Journal of Information Processing Systems / v.18 no.4 / pp.587-598 / 2022
  • To increase building change recognition accuracy, we present a deep learning-based building change detection method for remote sensing images. In the proposed approach, we create the difference image (DI) by merging pixel-level and object-level information from multitemporal remote sensing images, and a frequency-domain saliency technique is used to generate the DI saliency map. The fuzzy C-means clustering technique pre-classifies the coarse change detection map by defining the DI saliency map threshold. We then extract the neighborhood features of the unchanged pixels and the changed (building) pixels from pixel-level and object-level feature images, which are then used as valid deep neural network (DNN) training samples. The trained DNNs are then used to identify changes in the DI. The proposed method was evaluated on two datasets and compared with current detection methods. The results suggest that it can detect more building change information and improve change detection accuracy.
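The fuzzy C-means pre-classification step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the saliency map has been flattened to a list of scalar values, uses two clusters (unchanged, changed), and the function name `fuzzy_c_means_1d` and the sample values are hypothetical.

```python
def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=50):
    """Plain fuzzy C-means on scalar values (illustrative sketch only).

    Centers start evenly spaced between min and max, so cluster 0 is the
    low-saliency group and the last cluster is the high-saliency group.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * k / (n_clusters - 1) for k in range(n_clusters)]
    memberships = []
    for _ in range(n_iter):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if any(d == 0.0 for d in dists):
                # A point sitting exactly on a center belongs fully to it.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # Standard FCM membership: inverse distance to the power 2/(m-1).
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [x / total for x in inv]
            memberships.append(row)
        # Update each center as the membership-weighted mean of the values.
        new_centers = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            total = sum(weights)
            if total > 0.0:
                new_centers.append(sum(w * v for w, v in zip(weights, values)) / total)
            else:
                new_centers.append(centers[k])
        centers = new_centers
    return centers, memberships


# Hypothetical saliency values: three low (unchanged) and three high (changed).
saliency = [0.05, 0.10, 0.08, 0.90, 0.85, 0.95]
centers, memberships = fuzzy_c_means_1d(saliency)
labels = [row.index(max(row)) for row in memberships]
```

In a pipeline like the one described, pixels with high membership in either cluster would be natural candidates for DNN training samples; how the paper selects them is not stated in the abstract.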

Musical Genre Classification Based on Deep Residual Auto-Encoder and Support Vector Machine

  • Xue Han;Wenzhuo Chen;Changjian Zhou
    • Journal of Information Processing Systems / v.20 no.1 / pp.13-23 / 2024
  • Music brings pleasure and relaxation to people. Therefore, it is necessary to classify musical genres based on scenes. Identifying favorite musical genres from massive music data is a time-consuming and laborious task. Recent studies have suggested that machine learning algorithms are effective in distinguishing between various musical genres. However, meeting the actual requirements in terms of accuracy or timeliness is challenging. In this study, a hybrid machine learning model that combines a deep residual auto-encoder (DRAE) and support vector machine (SVM) for musical genre recognition was proposed. Eight manually extracted features from the Mel-frequency cepstral coefficients (MFCC) were employed in the preprocessing stage as the hybrid music data source. During the training stage, DRAE was employed to extract feature maps, which were then used as input for the SVM classifier. The experimental results indicated that this method achieved a 91.54% F1-score and 91.58% top-1 accuracy, outperforming existing approaches. This novel approach leverages deep architecture and conventional machine learning algorithms and provides a new horizon for musical genre classification tasks.

A General Distributed Deep Learning Platform: A Review of Apache SINGA

  • Lee, Chonho;Wang, Wei;Zhang, Meihui;Ooi, Beng Chin
    • Communications of the Korean Institute of Information Scientists and Engineers / v.34 no.3 / pp.31-34 / 2016
  • This article reviews Apache SINGA, a general distributed deep learning (DL) platform. It presents the system components and architecture, and explains how to configure and run SINGA for different types of distributed training using model/data partitioning. In addition, several features and the performance of SINGA are compared with those of other popular DL tools.

Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance (음성인식 성능 개선을 위한 다중작업 오토인코더와 와설스타인식 생성적 적대 신경망의 결합)

  • Kao, Chao Yuan;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.38 no.6 / pp.670-677 / 2019
  • Because background noise in an acoustic signal degrades the performance of speech or acoustic event recognition, extracting noise-robust acoustic features from a noisy signal remains challenging. In this paper, we propose a deep learning architecture that combines a Wasserstein Generative Adversarial Network (WGAN) with a MultiTask AutoEncoder (MTAE), integrating the strengths of both so that it estimates not only noise but also speech features from a noisy acoustic source. The proposed MTAE-WGAN structure is used to estimate the speech signal and the residual noise by employing a gradient penalty and a weight initialization method for Leaky Rectified Linear Unit (LReLU) and Parametric ReLU (PReLU). With the adopted gradient penalty loss function, the proposed MTAE-WGAN structure enhances the speech features and subsequently achieves substantial Phoneme Error Rate (PER) improvements over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED) and Recurrent MTAE (RMTAE) models for robust speech recognition.

Multimodal Biometrics Recognition from Facial Video with Missing Modalities Using Deep Learning

  • Maity, Sayan;Abdel-Mottaleb, Mohamed;Asfour, Shihab S.
    • Journal of Information Processing Systems / v.16 no.1 / pp.6-29 / 2020
  • Biometric identification using multiple modalities has attracted the attention of many researchers because it produces more robust and trustworthy results than single-modality biometrics. In this paper, we present a novel multimodal recognition system that trains a deep learning network to automatically learn features after extracting multiple biometric modalities from a single data source, i.e., facial video clips. Utilizing the different modalities present in the facial video clips, i.e., left ear, left profile face, frontal face, right profile face, and right ear, we train supervised denoising auto-encoders to automatically extract robust and non-redundant features. The automatically learned features are then used to train modality-specific sparse classifiers to perform the multimodal recognition. Moreover, the proposed technique has proven robust when some of the above modalities were missing during testing. The proposed system has three main components: detection, which consists of modality-specific detectors that automatically detect images of the different modalities present in facial video clips; feature selection, which uses a supervised denoising sparse auto-encoder network to capture discriminative representations that are robust to illumination and pose variations; and classification, which consists of a set of modality-specific sparse representation classifiers for unimodal recognition, followed by score-level fusion of the recognition results of the available modalities. Experiments conducted on the constrained facial video dataset (WVU) and the unconstrained facial video dataset (HONDA/UCSD) resulted in Rank-1 recognition rates of 99.17% and 97.14%, respectively. The multimodal recognition accuracy demonstrates the superiority and robustness of the proposed approach irrespective of the illumination, non-planar movement, and pose variations present in the video clips, even when modalities are missing.
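Score-level fusion over only the available modalities, as described in this abstract, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the fusion rule (a plain mean of per-class scores) and the function name `fuse_scores` are assumptions, and the score values are made up.

```python
def fuse_scores(modality_scores):
    """Average per-class scores over the modalities that were detected.

    `modality_scores` maps a modality name to a list of per-class scores,
    or to None when that modality is missing from the video clip.
    Returns (predicted class index, fused per-class scores).
    """
    available = [s for s in modality_scores.values() if s is not None]
    if not available:
        raise ValueError("at least one modality must be available")
    n_classes = len(available[0])
    fused = [
        sum(scores[c] for scores in available) / len(available)
        for c in range(n_classes)
    ]
    return fused.index(max(fused)), fused


# Hypothetical two-identity example in which the left ear was not detected.
scores = {
    "frontal_face": [0.2, 0.8],
    "left_ear": None,
    "right_ear": [0.6, 0.4],
}
identity, fused = fuse_scores(scores)
```

Because missing modalities are simply skipped, the same function works for any subset of the five modalities listed in the abstract.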

Feature Extraction and Recognition of Myanmar Characters Based on Deep Learning (딥러닝 기반 미얀마 문자의 특징 추출 및 인식)

  • Ohnmar, Khin;Lee, Sung-Keun
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.5 / pp.977-984 / 2022
  • With the recent economic development of Southeast Asia, the use of information devices is spreading widely, and the demand for application services using intelligent character recognition is increasing. This paper discusses deep learning-based feature extraction and recognition of the characters of Myanmar, a Southeast Asian country. The Myanmar alphabet (33 letters) and Myanmar numerals (10 digits) are used for feature extraction. In this paper, nine features are extracted, and more than three new features are proposed. The features extracted for each character and numeral are presented with successful results. In the recognition part, convolutional neural networks are used to assess performance on character discrimination. The algorithm is implemented on captured image datasets, and the implementation is evaluated. The precision of the models on the input data set is 96%, and real-time input images are used.

A New Face Morphing Method using Texture Feature-based Control Point Selection Algorithm and Parallel Deep Convolutional Neural Network (텍스처 특징 기반 제어점 선택 알고리즘과 병렬 심층 컨볼루션 신경망을 이용한 새로운 얼굴 모핑 방법)

  • Park, Jin Hyeok;Khan, Rafiul Hasan;Lim, Seon-Ja;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society / v.25 no.2 / pp.176-188 / 2022
  • In this paper, we propose a compact method for anthropomorphism that uses Deep Convolutional Neural Networks (DCNN) to detect the similarities between a human face and an animal face. We also apply texture feature-based morphing between them. We propose a basic texture feature-based morphing system for morphing between human faces only. The entire anthropomorphism process starts with the creation of an animal face classifier using a parallel DCNN that determines the animal face most similar to a given human face. The significance of our network is that it contains four sets of convolutional functions that run in parallel, allowing it to extract more features than a linear DCNN. Once the similarity is established, our texture feature algorithm-based automatic morphing system recognizes the facial features of the human face and selects the Control Points automatically, in contrast to the traditional manual morphing system that requires human assistance. The simulation results show that our suggested DCNN surpasses its competitors with a 92.0% accuracy rate. It also ensures that the most similar animal classes are found, and the texture-based morphing technology automatically completes the morphing process, ensuring a smooth transition from one image to another.

A study on skip-connection with time-frequency self-attention for improving speech enhancement based on complex-valued spectrum (복소 스펙트럼 기반 음성 향상의 성능 향상을 위한 time-frequency self-attention 기반 skip-connection 기법 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.2 / pp.94-101 / 2023
  • A deep neural network composed of encoders and decoders, such as U-Net, used for speech enhancement concatenates encoder features to the decoder through skip-connections. A skip-connection helps reconstruct the enhanced spectrum and compensates for lost information. However, the features of the encoder and the decoder connected by the skip-connection are incompatible with each other. In this paper, for complex-valued spectrum-based speech enhancement, a Self-Attention (SA) method is applied to the skip-connection to transform the encoder features to be compatible with the decoder features. SA is a technique that, when generating an output sequence in a sequence-to-sequence task, uses a weighted average of the input to focus attention on subsets of the input; it has been shown to eliminate noise effectively when applied to speech enhancement. Three models that use encoder and decoder features to apply SA to the skip-connection are studied. In experiments on the TIMIT database, the proposed methods show improvements in all evaluation metrics compared to the Deep Complex U-Net (DCUNET) with skip-connection only.
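The "weighted average of the input" that this abstract attributes to self-attention can be sketched with scaled dot-product attention. This is a simplified illustration, not the paper's time-frequency SA module: it assumes real-valued feature vectors and omits the learned query, key, and value projections, and the name `self_attention` and the sample frames are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Replace each frame with an attention-weighted average of all frames.

    Weights come from scaled dot products between frames; no learned
    projections are applied in this sketch.
    """
    dim = len(frames[0])
    outputs = []
    for query in frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, frames))
            for d in range(dim)
        ])
    return outputs


# Two hypothetical 2-dimensional feature frames.
frames = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(frames)
```

Each output frame is a convex combination of the input frames, so a frame attends most strongly to the inputs it is most similar to.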

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

  • Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
    • Journal of Information Processing Systems / v.19 no.3 / pp.289-301 / 2023
  • Due to the development and dissemination of modern technology, anyone can easily communicate using services such as social network service (SNS) through a personal computer (PC) or smartphone. The development of these technologies has caused many beneficial effects. At the same time, bad effects also occurred, one of which was the spam problem. Spam refers to unwanted or rejected information received by unspecified users. The continuous exposure of such information to service users creates inconvenience in the user's use of the service, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers are creating more malicious spam by distorting the image of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not serious yet. However, in the mail system, spammers (the person who sends spam) showed various modifications to the spam image for neutralizing OCR, and therefore, the same situation can happen with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but at the same time, methods of interfering with image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve the existing OCR-based spam image detection performance and compensate for vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts the text-related features, whether the image contains spam words, and the word embedding vector from the input image. 
Then, the convolutional neural network-based image model extracts image obfuscation features and image feature vectors from the input image. The final spam image classifier then determines from the extracted features whether the input is a spam image. In an evaluation of the F1-score, the proposed model performed about 14 points higher than OCR-based spam image detection.
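For readers unfamiliar with the metric used in this comparison, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for binary labels; the function name `f1_score` and the example labels are illustrative and are not data from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1-score for binary labels: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = spam image, 0 = normal image.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

A gain of "about 14 points" corresponds to a difference of roughly 0.14 on this 0 to 1 scale.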

Mortality Prediction of Older Adults Using Random Forest and Deep Learning (랜덤 포레스트와 딥러닝을 이용한 노인환자의 사망률 예측)

  • Park, Junhyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.9 no.10 / pp.309-316 / 2020
  • We predict the mortality of elderly patients over 65 years old visiting the emergency department using a Feed Forward Neural Network (FFNN) and a Convolutional Neural Network (CNN). The medical data consist of 99 features, including basic information such as sex, age, temperature, and heart rate, as well as past history, various blood tests, culture tests, and others. Among these, we used random forest to select features by measuring the importance of each feature in predicting mortality. As a result, using the top 80 features by importance gave the best mortality prediction. The performance of the FFNN and CNN is compared by training each neural network on the selected features. To train the CNN with images, we convert the medical data to fixed-size images. We obtain better results with the CNN than with the FFNN. With the CNN, the F1 score and the AUC on the test data are 56.9 and 92.1, respectively.
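One possible way to turn a selected feature vector into a fixed-size image for a CNN is sketched below. The abstract does not say how the conversion is done, so the layout here (row-major fill of the smallest square grid, padded with a constant) and the name `vector_to_image` are assumptions for illustration only.

```python
import math


def vector_to_image(features, fill=0.0):
    """Lay a flat feature vector out row by row on the smallest square grid.

    Cells beyond the end of the vector are padded with `fill`.
    """
    side = math.isqrt(len(features))
    if side * side < len(features):
        side += 1
    padded = list(features) + [fill] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# 80 selected features (placeholder values 0..79) become a 9 x 9 grid
# with one padded cell.
image = vector_to_image(list(range(80)))
```

With 80 features the grid is 9 x 9, and only the final cell is padding.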