• Title/Summary/Keyword: speech communication

Search Results: 888

A Multimodal Emotion Recognition Using the Facial Image and Speech Signal

  • Go, Hyoun-Joo;Kim, Yong-Tae;Chun, Myung-Geun
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.5 no.1
    • /
    • pp.1-6
    • /
    • 2005
  • In this paper, we propose an emotion recognition method using facial images and speech signals. Six basic emotions are investigated: happiness, sadness, anger, surprise, fear, and dislike. Facial expression recognition is performed using multi-resolution analysis based on the discrete wavelet transform, and feature vectors are obtained through ICA (Independent Component Analysis). For the speech signal, the recognition algorithm is run independently on each wavelet subband, and the final result is obtained from a multi-decision making scheme. After merging the facial and speech emotion recognition results, we obtained better performance than previous methods.
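The abstract above does not spell out its fusion rule; as a minimal illustrative sketch (the weights and probability vectors are invented, not the paper's), decision-level fusion of the two modalities over the six basic emotions could look like:

```python
import numpy as np

EMOTIONS = ["happiness", "sadness", "anger", "surprise", "fear", "dislike"]

def fuse_decisions(p_face, p_speech, w_face=0.5):
    """Decision-level fusion: weighted sum of per-modality class
    probabilities, then argmax over the six basic emotions."""
    p_face = np.asarray(p_face, dtype=float)
    p_speech = np.asarray(p_speech, dtype=float)
    fused = w_face * p_face + (1.0 - w_face) * p_speech
    fused /= fused.sum()  # renormalize to a probability distribution
    return EMOTIONS[int(np.argmax(fused))], fused

# Hypothetical per-modality outputs for one utterance/frame pair.
label, fused = fuse_decisions([0.1, 0.1, 0.5, 0.1, 0.1, 0.1],
                              [0.2, 0.1, 0.4, 0.1, 0.1, 0.1])
```

A product rule or a trained combiner could replace the weighted sum; the sketch only shows where the two modality scores meet.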

Parts-Based Feature Extraction of Spectrum of Speech Signal Using Non-Negative Matrix Factorization

  • Park, Jeong-Won;Kim, Chang-Keun;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering
    • /
    • v.1 no.4
    • /
    • pp.209-212
    • /
    • 2003
  • In this paper, we proposed a new speech feature parameter obtained through parts-based feature extraction from the speech spectrum using Non-Negative Matrix Factorization (NMF). NMF can effectively reduce the dimensionality of multi-dimensional data through matrix factorization under non-negativity constraints, and the dimensionally reduced data represents parts-based features of the input data. For speech feature extraction, we applied Mel-scaled filter bank outputs as inputs to NMF, then used the NMF outputs as inputs to the speech recognizer. The recognition experiment results confirmed that the proposed feature parameter is superior in recognition performance to the commonly used mel frequency cepstral coefficient (MFCC).
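The core factorization step can be sketched with the standard Lee-Seung multiplicative updates; the toy matrix below stands in for mel filterbank outputs (the rank, iteration count, and data are invented for illustration):

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9):
    """Factor a non-negative matrix V (bands x frames) into W (basis 'parts')
    and H (activations) by Lee-Seung multiplicative updates that locally
    minimize the Frobenius reconstruction error."""
    m, n = V.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update basis parts
    return W, H

# Toy stand-in for mel filterbank outputs: 20 bands x 50 frames, non-negative.
V = np.abs(np.random.default_rng(1).normal(size=(20, 50)))
W, H = nmf(V, r=5)
```

The columns of `W` play the role of the spectral "parts"; the columns of `H` would feed the recognizer in place of MFCCs.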

Real-time implementation and performance evaluation of speech classifiers in speech analysis-synthesis

  • Kumar, Sandeep
    • ETRI Journal
    • /
    • v.43 no.1
    • /
    • pp.82-94
    • /
    • 2021
  • In this work, six voiced/unvoiced speech classifiers based on the autocorrelation function (ACF), average magnitude difference function (AMDF), cepstrum, weighted ACF (WACF), zero crossing rate and energy of the signal (ZCR-E), and neural networks (NNs) have been simulated and implemented in real time using the TMS320C6713 DSP starter kit. These speech classifiers have been integrated into a linear-predictive-coding-based speech analysis-synthesis system and their performance has been compared in terms of the percentage of the voiced/unvoiced classification accuracy, speech quality, and computation time. The results of the percentage of the voiced/unvoiced classification accuracy and speech quality show that the NN-based speech classifier performs better than the ACF-, AMDF-, cepstrum-, WACF- and ZCR-E-based speech classifiers for both clean and noisy environments. The computation time results show that the AMDF-based speech classifier is computationally simple, and thus its computation time is less than that of other speech classifiers, while that of the NN-based speech classifier is greater compared with other classifiers.
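Of the six classifiers compared above, the ZCR-E rule is the simplest to illustrate; a minimal sketch (thresholds and test signals are invented, not the paper's) is:

```python
import numpy as np

def zcr(frame):
    """Zero-crossing rate: fraction of adjacent sample pairs whose sign changes."""
    return float(np.mean(np.abs(np.diff(np.signbit(frame).astype(int)))))

def classify_vuv(frame, zcr_thresh=0.15, energy_thresh=0.01):
    """ZCR-E rule of thumb: voiced speech has high energy and low ZCR;
    unvoiced speech has low energy and/or high ZCR."""
    energy = float(np.mean(frame ** 2))
    return "voiced" if energy > energy_thresh and zcr(frame) < zcr_thresh else "unvoiced"

fs = 8000
t = np.arange(0, 0.03, 1 / fs)                  # one 30 ms frame
voiced = 0.5 * np.sin(2 * np.pi * 120 * t)      # strong 120 Hz tone, pitch-like
unvoiced = 0.02 * np.random.default_rng(0).normal(size=t.size)  # weak noise
```

The ACF, AMDF, and cepstrum classifiers replace this frame-level rule with a periodicity measure, and the NN classifier learns the decision boundary instead.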

Robust Entropy Based Voice Activity Detection Using Parameter Reconstruction in Noisy Environment

  • Han, Hag-Yong;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Journal of information and communication convergence engineering
    • /
    • v.1 no.4
    • /
    • pp.205-208
    • /
    • 2003
  • Voice activity detection is an important problem in speech recognition and speech communication. This paper introduces a new feature parameter, reconstructed from the spectral entropy of information theory, for robust voice activity detection in noisy environments, and then analyzes it and compares its performance with the energy-based method of voice activity detection. In experiments, we confirmed that spectral entropy and its reconstructed parameter are superior to the energy method for robust voice activity detection in various noise environments.
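The intuition behind spectral-entropy VAD is that speech concentrates energy in a few formant regions (low entropy) while broadband noise spreads it across the spectrum (high entropy). A minimal per-frame sketch, with invented test signals, could be:

```python
import numpy as np

def spectral_entropy(frame, n_fft=256):
    """Normalized spectral entropy of one frame: treat the power spectrum as
    a probability mass function and compute its Shannon entropy, scaled to
    [0, 1] by the maximum possible entropy."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    p = spec / (spec.sum() + 1e-12)
    h = -np.sum(p * np.log2(p + 1e-12))
    return float(h / np.log2(p.size))

fs = 8000
t = np.arange(0, 0.032, 1 / fs)                       # one 32 ms frame
tone = np.sin(2 * np.pi * 300 * t)                    # peaky, speech-like spectrum
noise = np.random.default_rng(0).normal(size=t.size)  # flat white-noise spectrum
```

A VAD would then threshold this entropy (or, as in the paper, a parameter reconstructed from it) frame by frame.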

Voice Portal based on SMS Authentication at CTI Module Implementation by Speech Recognition (SMS 인증 기반의 보이스포탈에서의 음성인식을 위한 CTI 모듈 구현)

  • Oh, Se-Il;Kim, Bong-Hyun;Koh, Jin-Hwan;Park, Won-Tea
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2001.04b
    • /
    • pp.1177-1180
    • /
    • 2001
  • Voice Portal services, which let users listen to Internet information over the telephone, are gaining popularity. In a Voice Portal service, the user speaks a command for the desired information to a Speech Recognition System and hears that information back over the telephone. The service requires an SMS (Short Message Service) server module that performs the authentication procedure, a CTI (Computer Telephony Integration) module that provides the interface between the PSTN and the database server, a Voice XML module between the CTI server and the WWW (World Wide Web), and a searching module for retrieving information. This paper implements a CTI module design based on speech recognition technology. In addition, by adopting SMS authentication based on a random one-time password, it aims to provide a more secure and stable service.

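The paper does not detail its one-time-password scheme; as an illustrative stdlib-only sketch (digit count and expiry window are assumptions), the SMS side could issue and verify a random OTP like this:

```python
import secrets, hmac, hashlib, time

def issue_otp(n_digits=6):
    """Generate a random one-time password to deliver via SMS, plus a
    server-side record holding only its hash and an expiry timestamp."""
    otp = "".join(secrets.choice("0123456789") for _ in range(n_digits))
    record = {"digest": hashlib.sha256(otp.encode()).hexdigest(),
              "expires": time.time() + 180}  # assumed 3-minute validity
    return otp, record

def verify_otp(candidate, record):
    """Constant-time comparison against the stored hash, rejecting expired codes."""
    if time.time() > record["expires"]:
        return False
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    return hmac.compare_digest(digest, record["digest"])

otp, record = issue_otp()
```

Storing only the hash means a leaked server record does not reveal the code still in transit over SMS.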

Application of the Wavelet transformation to denoising and analyzing the speech

  • Hung Phan Duy;Lan Huong Nguyen Thi;Ngoc Yen Pham Thi;Castelli Eric
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.249-253
    • /
    • 2004
  • Wavelet transform (WT) has attracted many engineers and scientists because of its excellent properties. The coherence of a practical approach and a theoretical basis not only solves currently important problems, but also offers the potential for formulating and solving completely new ones. It has been shown that the multi-resolution analysis of wavelet transforms is a good solution in speech analysis and that thresholding of wavelet coefficients has a near-optimal noise reduction property for many classes of signals. This paper proposes applications of wavelets in speech processing: pitch detection, voiced-unvoiced (V-UV) decision, and denoising, with detailed algorithms and results.

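The coefficient-thresholding idea can be sketched with a one-level Haar transform and soft thresholding (the wavelet choice, threshold, and signals below are invented for illustration; the paper's own algorithm may differ):

```python
import numpy as np

def haar_denoise(x, thresh):
    """One-level Haar wavelet shrinkage: split the signal into approximation
    and detail coefficients, soft-threshold the details (where broadband
    noise concentrates), and invert the transform."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:                            # Haar pairs samples: need even length
        x = x[:-1]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)      # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)      # detail coefficients
    d = np.sign(d) * np.maximum(np.abs(d) - thresh, 0.0)  # soft threshold
    y = np.empty_like(x)
    y[0::2] = (a + d) / np.sqrt(2)            # inverse Haar transform
    y[1::2] = (a - d) / np.sqrt(2)
    return y

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)             # slowly varying "signal"
noisy = clean + 0.3 * rng.normal(size=t.size)
denoised = haar_denoise(noisy, thresh=0.3)
```

A practical denoiser would recurse over several decomposition levels and estimate the threshold from the noise level, but the shrink-then-invert structure is the same.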

A Study on User Authentication for Wireless Communication Security in the Telematics Environment (텔레메틱스 환경에서 무선통신 보안을 위한 사용자 인증에 관한 연구)

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.9 no.2
    • /
    • pp.104-109
    • /
    • 2010
  • In this paper, we propose a user authentication technology to protect against wiretapping and attacks by others in the telematics environment, in which users in a vehicle can use Internet services on a local area network via a mobile device. In the proposed technology, the packet speech data is encrypted with a speech-based biometric key generated from the user's speech signal. The encrypted data packet is then submitted to the information communication server (ICS). At the ICS, the user's speech feature is reconstructed from the encrypted data packet and compared with the preregistered speech-based biometric key for user authentication. Based on an implementation of the proposed communication method, we confirm that it is secure against various attack methods.
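The paper's key-generation and cipher details are not given above; as a hedged stdlib-only sketch, one common pattern is to quantize the speech feature vector (so small extraction jitter maps to the same key) and hash the bins into a symmetric key. The XOR keystream and the feature values here are illustrative stand-ins, not the paper's method:

```python
import hashlib
import numpy as np

def key_from_features(feats, step=0.5):
    """Quantize a speech feature vector into coarse bins, then hash the bins
    into a 256-bit symmetric key; nearby feature vectors yield the same key."""
    bins = np.round(np.asarray(feats, dtype=float) / step).astype(np.int64)
    return hashlib.sha256(bins.tobytes()).digest()

def xor_cipher(data, key):
    """Toy stream cipher: XOR with a SHA-256-expanded keystream (illustration
    only; a real system would use an authenticated cipher such as AES-GCM)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

feats = [1.02, -0.48, 3.51, 0.07]     # hypothetical cepstral-style features
key = key_from_features(feats)
packet = xor_cipher(b"speech payload", key)
```

Because XOR is its own inverse, applying `xor_cipher` again with the same key recovers the plaintext, and a slightly perturbed feature vector still lands in the same bins and thus the same key.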

HUMAN MOTION AND SPEECH ANALYSIS TO CONSTRUCT DECISION MODEL FOR A ROBOT TO END COMMUNICATING WITH A HUMAN

  • Otsuka, Naoki;Murakami, Makoto
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.719-722
    • /
    • 2009
  • The purpose of this paper is to develop a robot that moves independently, communicates with a human, and explicitly extracts information from the human mind that is rarely expressed verbally. In a spoken dialog system for information collection, it is desirable to continue communicating with the user as long as possible, but not if the user does not wish to communicate. Therefore, the system should be able to terminate the communication before the user starts to object to using it. In this paper, to enable the construction of a decision model for a system to decide when to stop communicating with a human, we acquired speech and motion data from individuals who were asked many questions by another person. We then analyzed their speech and body motion when they did not mind answering the questions, and also when they wished the questioning to cease. From the results, we can identify differences in speech power, length of pauses, speech rate, and body motion.

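The cues identified above could feed a simple decision model; the rule and all thresholds below are invented for illustration, not taken from the study:

```python
def wants_to_stop(speech_power_db, pause_sec, speech_rate_wps,
                  power_floor=-30.0, pause_limit=1.5, rate_floor=1.0):
    """Illustrative termination rule: flag the wish to end the dialogue when
    the answer is quiet, slow, and preceded by a long pause -- the cues the
    study found to differ between willing and unwilling answers."""
    cues = [speech_power_db < power_floor,   # low speech power
            pause_sec > pause_limit,         # long pause before answering
            speech_rate_wps < rate_floor]    # slow speech rate
    return sum(cues) >= 2                    # majority vote over the cues
```

A learned model (e.g. logistic regression over these features plus body-motion measures) would replace the hand-set thresholds in practice.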