• Title/Abstract/Keyword: Image to Speech

Search results: 190 items

딥 뉴럴 네트워크 지원을 위한 뉴로모픽 소프트웨어 플랫폼 기술 동향 (Trends in Neuromorphic Software Platform for Deep Neural Network)

  • 유미선;하영목;김태호
    • 전자통신동향분석
    • /
    • 제33권4호
    • /
    • pp.14-22
    • /
    • 2018
  • Deep neural networks (DNNs) are widely used in various domains such as speech and image recognition. DNN software frameworks such as Tensorflow and Caffe contributed to the popularity of DNNs because of their easy programming environments. In addition, many companies are developing neuromorphic processing units (NPUs), such as tensor processing units (TPUs) and graphics processing units (GPUs), to improve the performance of DNN processing. However, there is a large gap between NPUs and DNN software frameworks because the frameworks lack support for the various NPUs. A DNN software platform, comprising DNN-optimized compilers and DNN libraries, can bridge this gap. In this paper, we review the technical trends of DNN software platforms.

가변적 템플릿 메모리를 갖는 디지털 프로그래머블 CNN 구현에 관한 연구 (A study on implementation digital programmable CNN with variable template memory)

  • 윤유권;문성룡
    • 전자공학회논문지C
    • /
    • 제34C권10호
    • /
    • pp.59-66
    • /
    • 1997
  • Neural networks have been widely used for practical applications such as speech processing, image processing, and pattern recognition. A common analog approach implements the network with voltage-controlled current sources; the key feature of a cellular neural network (CNN) is that each cell is connected only to its neighbors. Because the architecture of the interconnection elements between cells is very simple and space invariant, CNNs are well suited to VLSI implementation. In this paper, the processing element (PE) of a digital programmable CNN with variable template memory was implemented using CMOS circuits. The CNN PE circuit was designed to control gain so as to obtain optimal solutions at the CNN output. The operation of a 4*4 CNN circuit with fixed and variable templates was analyzed through simulation with the HSPICE tool. The simulations verified that the proposed variable template method improves operational performance compared with the fixed template method.

  • PDF
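As context for the entry above, the cell dynamics of a cellular neural network can be sketched numerically. Below is a minimal, illustrative Euler-integration sketch of the standard CNN state equation dx/dt = -x + A*y + B*u + I with 3x3 templates; the template values, step size, and grid size are assumptions for illustration, not taken from the paper:

```python
import numpy as np

def cnn_step(x, u, A, B, I, dt=0.1):
    """One Euler step of the standard CNN state equation
    dx/dt = -x + A*y + B*u + I, with y = 0.5*(|x+1| - |x-1|)."""
    y = 0.5 * (np.abs(x + 1) - np.abs(x - 1))  # piecewise-linear cell output

    def corr(img, t):
        # 3x3 template correlation with zero padding at the borders
        p = np.pad(img, 1)
        out = np.zeros(img.shape)
        for i in range(img.shape[0]):
            for j in range(img.shape[1]):
                out[i, j] = np.sum(p[i:i + 3, j:j + 3] * t)
        return out

    dx = -x + corr(y, A) + corr(u, B) + I
    return x + dt * dx
```

With a positive self-feedback template (A centered at 2), cell states are driven toward the ±1 saturation levels of the piecewise-linear output, which is how the template choice shapes the network's steady-state image.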

Auxiliary Stacked Denoising Autoencoder based Collaborative Filtering Recommendation

  • Mu, Ruihui;Zeng, Xiaoqin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제14권6호
    • /
    • pp.2310-2332
    • /
    • 2020
  • In recent years, deep learning techniques have achieved tremendous successes in natural language processing, speech recognition, and image processing. Collaborative filtering (CF) is one of the most widely used recommendation methods and is highly effective, but it has limitations in dealing with poor scalability, cold start, data sparsity, and similar problems. Combining a traditional recommendation algorithm with a deep learning model offers a great opportunity for constructing a new recommender system. In this paper, we propose a novel collaborative recommendation model based on an auxiliary stacked denoising autoencoder (ASDAE) that effectively learns user preferences from auxiliary information. Firstly, we integrate auxiliary information with rating information. Then, we design a stacked denoising autoencoder based collaborative recommendation model to learn user preferences from the auxiliary and rating information. Finally, we conduct comprehensive experiments on three real datasets to compare our proposed model with state-of-the-art methods. Experimental results demonstrate that our proposed model is superior to other recommendation methods.
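As a sketch of the denoising-autoencoder idea the abstract builds on: the toy below corrupts its input with masking noise and trains a single-hidden-layer autoencoder (not a stacked one) to reconstruct the clean concatenation of ratings and auxiliary features. All sizes, the noise level, and the learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: a user-item rating matrix concatenated with auxiliary user features.
ratings = rng.random((8, 10))
aux = rng.random((8, 4))
X = np.hstack([ratings, aux])                # step 1: integrate aux info with ratings

n_in, n_hid = X.shape[1], 6
W1 = rng.normal(0, 0.1, (n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(0, 0.1, (n_hid, n_in)); b2 = np.zeros(n_in)

lr = 0.05
losses = []
for _ in range(500):
    noisy = X * (rng.random(X.shape) > 0.2)  # mask 20% of inputs ("denoising")
    h = sigmoid(noisy @ W1 + b1)             # encode the corrupted input
    out = sigmoid(h @ W2 + b2)               # decode
    err = out - X                            # reconstruct the *clean* input
    losses.append(float((err ** 2).mean()))
    d_out = err * out * (1 - out)            # backprop of squared-error loss
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;   b2 -= lr * d_out.sum(0)
    W1 -= lr * noisy.T @ d_h; b1 -= lr * d_h.sum(0)
```

A "stacked" version trains several such layers and feeds one encoder's hidden code into the next; the corruption/clean-target asymmetry shown here is the core of the technique.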

오디오 신호에 기반한 음란 동영상 판별 (Classification of Pornographic Videos Based on the Audio Information)

  • 김봉완;최대림;이용주
    • 대한음성학회지:말소리
    • /
    • 제63호
    • /
    • pp.139-151
    • /
    • 2007
  • As the Internet becomes prevalent in our lives, harmful content, such as pornographic videos, has been increasing on the Internet, which has become a very serious problem. To prevent access to such content, many filtering systems have been built, mainly based on keyword- or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on the audio information. As feature vectors we use the mel-frequency cepstral coefficients (MFCC) and the mel-cepstrum modulation energy (MCME), a modulation energy calculated on the time trajectories of the MFCC. For the classifier, we use the well-known Gaussian mixture model (GMM). The experimental results showed that the proposed system correctly classified 98.3% of pornographic data and 99.8% of non-pornographic data. We expect the proposed method can be applied to a more accurate classification system that uses both video and audio information.

  • PDF
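A sketch of the MCME idea from the abstract above: take the FFT of each MFCC coefficient's time trajectory and sum the spectral energy inside a modulation-frequency band. The band edges and frame rate here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def modulation_energy(mfcc, frame_rate=100.0, band=(2.0, 8.0)):
    """Mel-cepstrum modulation energy (MCME) sketch: for each cepstral
    coefficient, FFT its time trajectory and sum the energy inside a
    modulation-frequency band. `mfcc` is (n_frames, n_coeffs)."""
    n_frames, _ = mfcc.shape
    traj = mfcc - mfcc.mean(axis=0)                  # remove DC per trajectory
    spec = np.abs(np.fft.rfft(traj, axis=0)) ** 2    # modulation power spectrum
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return spec[mask].sum(axis=0)                    # one energy per coefficient
```

The resulting per-coefficient energies would then be appended to the MFCC vector and fed to the GMM classifier.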

고정 소수점 연산에 의한 고속 DCT 알고리듬의 오차해석 (A Fixed-Point Error Analysis of fast DCT Algorithms)

  • 연일동;이상욱
    • 대한전기학회논문지
    • /
    • 제40권4호
    • /
    • pp.331-341
    • /
    • 1991
  • The discrete cosine transform (DCT) is widely used in many signal processing areas, including image and speech data compression. In this paper, we investigate a fixed-point error analysis for fast DCT algorithms, namely, Lee [6], Hou [7] and Vetterli [8]. A statistical model for fixed-point error is analyzed to predict the output noise due to the fixed-point implementation. This paper deals with two's complement fixed-point data representation with truncation and rounding. For a comparison purpose, we also investigate the direct form DCT algorithm. We also propose a suitable scaling model for the fixed-point implementation to avoid an overflow occurring in the addition operation. Computer simulation results reveal that there is a close agreement between the theoretical and the experimental results. The result shows that Vetterli's algorithm is better than the other algorithms in terms of SNR.

  • PDF
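The rounding-versus-truncation comparison can be reproduced in miniature. The sketch below quantizes the inputs, cosine coefficients, and products of a direct-form DCT (not the fast algorithms the paper analyzes) and measures the output SNR against a double-precision reference; the word length and test signal are illustrative:

```python
import numpy as np

def dct2(x):
    """Direct-form orthonormal DCT-II, used as the floating-point reference."""
    N = len(x)
    k = np.arange(N)[:, None]; n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2)
    return C @ x

def quantize(v, frac_bits=8, mode="round"):
    """Fixed-point model: keep `frac_bits` fractional bits, with
    rounding or truncation (floor, as in two's-complement truncation)."""
    s = v * (1 << frac_bits)
    q = np.round(s) if mode == "round" else np.floor(s)
    return q / (1 << frac_bits)

def fixed_point_dct(x, frac_bits=8, mode="round"):
    # Quantize the input, the coefficients, and every product: a crude model
    # of a direct-form fixed-point implementation.
    N = len(x)
    k = np.arange(N)[:, None]; n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0] /= np.sqrt(2)
    prods = quantize(quantize(C, frac_bits, mode)
                     * quantize(x, frac_bits, mode)[None, :],
                     frac_bits, mode)
    return prods.sum(axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 16)
ref = dct2(x)
snr = {}
for mode in ("round", "trunc"):
    err = fixed_point_dct(x, 8, mode) - ref
    snr[mode] = 10 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2))
```

Truncation introduces a systematic bias (mean error of about half a quantization step per product) that accumulates across the summation, so its SNR is markedly lower than rounding's, consistent with the kind of comparison the paper performs.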

지능형 로봇 아이로비큐(IrobiQ)를 활용한 학교폭력 예방 프로그램 개발 (Contents Development of IrobiQ on School Violence Prevention Program for Young Children)

  • 현은자;이하원;연혜민
    • 한국콘텐츠학회논문지
    • /
    • 제13권9호
    • /
    • pp.455-466
    • /
    • 2013
  • The purpose of this study was to develop [Modu Jikimi], a school-violence prevention program for young children using the intelligent robot IrobiQ. First, the program covers types of violence that can actually occur in the field: group bullying (ostracism), sexual violence, and basic character education. Second, the activity formats suited to each topic are large-group, individual, and small-group activities, free-choice activities, and parent education intended to link school and home. Third, the activity types include poems, fairy tales, songs, art, and group discussion. Fourth, the contents were produced using images, TTS (text to speech), touch functions, volume recognition, and recording functions. The contents were applied to young children and demonstrated to 30 experts, and an acceptability survey showed positive responses. Based on these results, we suggest follow-up research to further enhance interactivity, as foundational data for optimizing the effectiveness of robot-based school-violence prevention programs.

웨이블릿 부호화 자기공명영상 (Wavelet Encoded MR Imaging)

  • 김응규;이수종
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2005년도 추계종합학술대회
    • /
    • pp.343-346
    • /
    • 2005
  • Wavelet encoding has been proposed as an alternative to Fourier-based phase encoding in magnetic resonance imaging; in this study, we present its basic concept and its advantages. In wavelet encoding, the RF pulse is designed to generate a wavelet-shaped excitation profile of the spins. From the resulting echo signals, the wavelet transform coefficients of the spin distribution are acquired, and the original spin density is reconstructed by wavelet expansion. Wavelet encoding has several advantages over phase encoding. By minimizing redundancy of the data acquisition in a dynamic series of images, we can skip some encoding steps without serious loss of quality in the reconstructed image. This strategy may be regarded as data compression during imaging. Although wavelet encoding has some limitations, it is a promising scheme for dynamic imaging.

  • PDF
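The "data compression during imaging" idea — skipping encoding steps with little quality loss — can be illustrated in software with a one-level orthonormal Haar wavelet transform (a numerical analogy only; in the paper the encoding is performed physically by RF excitation profiles):

```python
import numpy as np

def haar_fwd(x):
    """One level of the orthonormal Haar wavelet transform."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # approximation (coarse) coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # detail coefficients
    return a, d

def haar_inv(a, d):
    """Inverse of haar_fwd: perfect reconstruction from (a, d)."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

signal = np.linspace(0.0, 1.0, 16)        # a smooth "spin density" stand-in
a, d = haar_fwd(signal)
approx = haar_inv(a, np.zeros_like(d))    # skip the detail "encoding steps"
```

For a smooth signal the detail coefficients are small, so discarding them (i.e., never acquiring those encoding steps) changes the reconstruction only slightly.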

성도 자기공명 영상과 음향정보(F1/F2)를 이용한 한국어 단모음 [이, 에, 아, 오, 우, 으] 판별 (A Vowel Discrimination of Korean Monophthongs [i, e, a, o, u, ɯ] Using Vocal Tract Magnetic Resonance Image and F1/F2)

  • 성철재;박종원;김귀룡
    • 대한음성학회지:말소리
    • /
    • 제56호
    • /
    • pp.103-125
    • /
    • 2005
  • We present a new method of measuring the volume and cross-sectional area of the vocal tract from magnetic resonance images. The vocal tract was divided by the 2 constriction points on the horizontal and vertical planes. The ratios of the volumes of the segment vocal tracts to that of the entire vocal tract play a crucial role in discriminating Korean monophthongs in that vowels were successfully discriminated by the ratios. The discriminant analysis also demonstrated that the acoustic parameters F1 and F2, in addition to the segment volumes, serve as significant parameters in discriminating Korean monophthongs.

  • PDF
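As a toy stand-in for the discriminant analysis on F1/F2: a nearest-class-mean classifier over illustrative (not measured) formant values for three of the vowels. The mean values below are rough textbook-style figures, not the paper's data:

```python
import numpy as np

# Illustrative (F1, F2) class means in Hz — assumptions for demonstration only.
means = {"i": (300.0, 2300.0), "a": (800.0, 1300.0), "u": (320.0, 800.0)}

def classify(f1, f2):
    """Assign a vowel label by nearest class mean in (F1, F2) space,
    a simplified proxy for the paper's discriminant analysis."""
    x = np.array([f1, f2])
    return min(means, key=lambda v: np.linalg.norm(x - np.array(means[v])))
```

A full discriminant analysis would additionally weight the dimensions by within-class covariance and include the segment-volume ratios as extra features.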

IMT2000을 위한 하중 hadamard 변환의 다차원 reverse jacket 매트릭스의 서브샘플링 (Multidimensional Subsampling of the Reverse Jacket Matrix of the Weighted Hadamard Transform for IMT2000)

  • 박주용;이문호
    • 한국통신학회논문지
    • /
    • 제22권11호
    • /
    • pp.2512-2520
    • /
    • 1997
  • The classes of Reverse Jacket matrices [RJ]_N and the corresponding Restclass Reverse Jacket matrices [RRJ]_N are defined. The main property of [RJ]_N is that its inverse can be obtained very easily and has a special structure. [RJ]_N is derived from the weighted Hadamard transform corresponding to the Hadamard matrix [H]_N and a basic symmetric matrix D. The class [RJ]_2 can be used as a generalized quincunx subsampling matrix and as several polygonal subsampling matrices. In this paper, we present a systematic block-wise extension method for [RJ]_N. We deduce a new orthogonal matrix M_1 ∈ [RRJ]_N from a nonorthogonal matrix M_0 ∈ [RJ]_N. These matrices can be used to develop efficient algorithms in IMT2000 signal processing, multidimensional subsampling, spectrum analyzers, and signal scramblers, as well as in speech and image signal processing.

  • PDF
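The "easily obtained inverse" the abstract mentions can be demonstrated with a standard 4x4 weighted-Hadamard (Jacket-type) matrix, whose inverse is the transposed element-wise reciprocal scaled by 1/N. This is the common textbook Jacket example, not necessarily the paper's exact [RJ]_N construction:

```python
import numpy as np

# A 4x4 weighted-Hadamard matrix with the Jacket property: every entry is
# nonzero, and the inverse comes from element-wise reciprocals.
J = np.array([[1,  1,  1,  1],
              [1, -2,  2, -1],
              [1,  2, -2, -1],
              [1, -1, -1,  1]], dtype=float)

# "Easy inverse": reciprocal of each entry, transposed, scaled by 1/N.
J_inv = (1.0 / J).T / 4.0
```

No Gaussian elimination is needed; this structural inverse is what makes such matrices attractive for fast transform and subsampling algorithms.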

Convolutional Neural Networks for Character-level Classification

  • Ko, Dae-Gun;Song, Su-Han;Kang, Ki-Min;Han, Seong-Wook
    • IEIE Transactions on Smart Processing and Computing
    • /
    • 제6권1호
    • /
    • pp.53-59
    • /
    • 2017
  • Optical character recognition (OCR) automatically recognizes text in an image. OCR is still a challenging problem in computer vision, and a successful solution has important applications, such as text-to-speech conversion and automatic document classification. In this work, we analyze character recognition performance using current state-of-the-art deep-learning structures: AlexNet, LeNet, and SPNet. For this, we built our own dataset containing digits and upper- and lower-case characters. We experiment in the presence of salt-and-pepper or Gaussian noise, and report a performance comparison in terms of recognition error. Five-fold cross-validation indicates that the SPNet structure (our approach) outperforms AlexNet and LeNet in recognition error.
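The salt-and-pepper corruption used in such noise-robustness experiments can be sketched as follows; the noise amount and image values here are illustrative, not the paper's settings:

```python
import numpy as np

def salt_and_pepper(img, amount=0.1, rng=None):
    """Flip a fraction `amount` of pixels to pure black or pure white
    (roughly half each), leaving the rest of the image untouched.
    `img` is a float array with values in [0, 1]."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = img.copy()
    mask = rng.random(img.shape) < amount   # which pixels get corrupted
    salt = rng.random(img.shape) < 0.5      # salt vs pepper per pixel
    out[mask & ~salt] = 0.0                 # pepper (black)
    out[mask & salt] = 1.0                  # salt (white)
    return out
```

Applying this (or additive Gaussian noise) to each character image before classification is the standard way to measure how gracefully recognition error degrades with noise level.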