• Title/Summary/Keyword: Simulation speech

A DSP Implementation of Subband Sound Localization System

  • Park, Kyusik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.4E
    • /
    • pp.52-60
    • /
    • 2001
  • This paper describes a real-time implementation of a subband sound localization system on a floating-point DSP, the TI TMS320C31. The system determines the two-dimensional location of an active speaker in a closed-room environment with real noise present. It consists of a two-microphone array connected to a TI DSP hosted by a PC. The implemented sound localization algorithm is subband CPSP, an improved version of the traditional CPSP (Cross-Power Spectrum Phase) method. The algorithm first splits the input speech signal into an arbitrary number of subbands using subband filter banks and calculates the CPSP in each subband. It then averages the CPSP results across the subbands and computes a source location estimate. The proposed algorithm has an advantage over CPSP in that it minimizes the overall estimation error in source location by confining band-dominant noise to its own subband, which makes a robust real-time sound localization system possible. For real-time simulation, the input speech is captured with two microphones and digitized by the DSP at a sampling rate of 8192 Hz, 16 bits/sample. The source location is then estimated once per second to satisfy real-time computational constraints. The performance of the proposed system is confirmed by several real-time simulations of speech at distances of 1 m, 2 m, and 3 m with various source locations, and it shows over 5% accuracy improvement in source location estimation.

  • PDF
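The subband CPSP procedure the abstract describes (phase-normalize the cross-power spectrum, evaluate the correlation per subband, then average) can be sketched as follows. The band count, the FFT-mask filter bank, and the name `subband_cpsp_delay` are illustrative assumptions, not the paper's code:

```python
import numpy as np

def subband_cpsp_delay(x1, x2, n_subbands=4):
    """Estimate the inter-microphone delay (in samples) by averaging
    CPSP (Cross-Power Spectrum Phase) correlations over subbands."""
    n = len(x1)
    X1, X2 = np.fft.rfft(x1), np.fft.rfft(x2)
    cross = np.conj(X1) * X2
    phat = cross / (np.abs(cross) + 1e-12)         # keep phase only
    edges = np.linspace(0, len(phat), n_subbands + 1, dtype=int)
    corr = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(phat)
        band[lo:hi] = phat[lo:hi]                  # one subband's CPSP
        corr += np.fft.irfft(band, n)
    corr /= n_subbands                             # average across subbands
    return int(np.argmax(np.fft.fftshift(corr))) - n // 2
```

A band dominated by noise contributes a flat, uninformative correlation rather than a misleading peak, which is the robustness argument the abstract makes; the estimated delay is then converted to a bearing by the two-microphone geometry.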

Deep Learning-Based Speech Emotion Recognition Technology Using Voice Feature Filters (음성 특징 필터를 이용한 딥러닝 기반 음성 감정 인식 기술)

  • Shin Hyun Sam;Jun-Ki Hong
    • The Journal of Bigdata
    • /
    • v.8 no.2
    • /
    • pp.223-231
    • /
    • 2023
  • In this study, we propose a deep learning-based model that extracts and analyzes features from speech signals, generates filters, and uses these filters to recognize emotions in the speech. We evaluate the emotion recognition accuracy of the proposed model. According to the simulation results, the average emotion recognition accuracies of the DNN and RNN were very similar, at 84.59% and 84.52%, respectively. However, the simulation time of the DNN was approximately 44.5% shorter than that of the RNN, enabling quicker emotion prediction.

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.16 no.4
    • /
    • pp.276-280
    • /
    • 2016
  • A fixed-rate speech coder based on a filter bank and a non-uniform sampling technique is proposed. The non-uniform sampling is achieved by detecting inflection points (IPs). A speech block is band-passed by the filter bank, the subband signals are processed by the IP detector, and the detected IP patterns are compared with entries in an IP database. For each subband signal, the address of the closest database entry and the energy of the IP pattern are transmitted through the channel. In the receiver, the decoder recovers the subband signals from the received addresses and energy information and reconstructs the speech via filter bank summation. As a result, the coder provides a fixed data rate, in contrast to existing speech coders based on non-uniform sampling. Computer simulation confirms the usefulness of the proposed technique. The signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of uniformly sampled pulse code modulation (PCM) below a 20 kbps data rate.
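The IP detection step can be illustrated with a minimal sketch. The paper's exact IP criterion and its pattern database are not specified in the abstract; the version below simply flags sign changes of the discrete second difference, which is one common definition of an inflection point:

```python
import numpy as np

def inflection_points(x):
    """Candidate inflection points: indices where the discrete second
    difference of the signal changes sign. A minimal sketch of the
    detection step only; database matching is not modeled."""
    d2 = np.diff(x, n=2)
    change = np.signbit(d2[:-1]) != np.signbit(d2[1:])
    return (np.nonzero(change)[0] + 1).tolist()   # +1: double-diff offset
```

Only the detected IP pattern's database address and energy are then transmitted, which is what fixes the data rate despite the non-uniform sampling.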

Speech Enhancement Using Microphone Array with MMSE-STSA Estimator Based Post-Processing (MMSE-STSA 추정치에 기반한 후처리를 갖는 마이크로폰 배열을 이용한 음성 개선)

  • Kwon Hong Seok;Son Jong Mok;Bae Keun Sung
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.187-190
    • /
    • 2002
  • In this paper, a speech enhancement system using a microphone array with MMSE-STSA (Minimum Mean Square Error-Short Time Spectral Amplitude) estimator-based post-processing is proposed. Speech enhancement is first carried out by conventional delay-and-sum beamforming (DSB). A new MMSE-STSA estimator is then obtained by refining the MMSE-STSA estimators from each microphone, and it is applied to the output of the conventional DSB for additional speech enhancement. Computer simulations with white and pink noise show that the proposed system is superior to other approaches.

  • PDF
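The delay-and-sum front end of the system above can be sketched in a few lines; integer-sample steering delays and the helper name are assumptions, and the MMSE-STSA post-processing stage (a spectral gain applied to this output) is omitted:

```python
import numpy as np

def delay_and_sum(mics, delays):
    """Conventional delay-and-sum beamforming: time-align each channel
    by its (integer-sample) steering delay, then average the channels."""
    n = min(len(m) for m in mics)
    aligned = [np.roll(np.asarray(m[:n], dtype=float), -d)
               for m, d in zip(mics, delays)]
    return np.mean(aligned, axis=0)
```

Averaging the aligned channels reinforces the speech arriving from the steered direction while uncorrelated noise partially cancels; the refined MMSE-STSA gain then suppresses the residual noise in the beamformer output.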

An Proposal and Evaluation of the New formant Tracking Algorithm for Speech Recognition (음성인식을 위한 새로운 포만트트랙킹 알고리즘의 제안과 평가)

  • 송정영
    • Journal of Internet Computing and Services
    • /
    • v.3 no.4
    • /
    • pp.51-59
    • /
    • 2002
  • This paper proposes an improved formant tracking algorithm for speech recognition. The recognition data used for the simulations are Korean spoken digits. The improved algorithm achieves a recognition rate of 91% on 300 spoken digits. The effectiveness of this research has been confirmed through recognition simulations.

  • PDF

Multi-Pulse Amplitude and Location Estimation by Maximum-Likelihood Estimation in MPE-LPC Speech Synthesis (MPE-LPC음성합성에서 Maximum- Likelihood Estimation에 의한 Multi-Pulse의 크기와 위치 추정)

  • 이기용;최홍섭;안수길
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.9
    • /
    • pp.1436-1443
    • /
    • 1989
  • In this paper, we propose a maximum-likelihood estimation (MLE) method to obtain the locations and amplitudes of the pulses in MPE (multi-pulse excitation)-LPC speech synthesis, which uses multi-pulses as the excitation source. The MLE method computes the values of the unknown parameters (pulse amplitudes and positions) that maximize the likelihood function for the observed data sequence. Thus, in the case of overlapped pulses, the method is equivalent to Ozawa's cross-correlation method, requiring the same amount of computation and yielding the same sound quality. We show by computer simulation that the multi-pulses obtained by the MLE method are (1) pseudo-periodic in pitch for voiced sounds, (2) random for unvoiced sounds, and (3) changing from random to periodic in intervals where the original speech changes from unvoiced to voiced. The short-time power spectra of the original speech and of the speech synthesized using the multi-pulses as the excitation source are quite similar at the formants.

  • PDF
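The cross-correlation style multi-pulse search that the MLE method is compared against can be sketched as a greedy analysis-by-synthesis loop: repeatedly place the single pulse that best matches the remaining residual. The function name and the one-pulse-at-a-time strategy are illustrative, not the paper's exact formulation:

```python
import numpy as np

def find_pulses(target, h, n_pulses):
    """Greedy multi-pulse search: at each step pick the position and
    amplitude of one excitation pulse that best cancels the residual
    between the target and the synthesis-filter response h."""
    residual = target.astype(float).copy()
    energy = float(np.sum(h * h))
    pulses = []
    for _ in range(n_pulses):
        # corr[p] = <residual shifted to p, h>
        corr = np.correlate(residual, h, mode="full")[len(h) - 1:]
        pos = int(np.argmax(np.abs(corr)))
        amp = corr[pos] / energy               # least-squares amplitude
        pulses.append((pos, amp))
        # subtract this pulse's contribution from the residual
        end = min(len(residual), pos + len(h))
        residual[pos:end] -= amp * h[:end - pos]
    return pulses
```

The MLE formulation instead maximizes the likelihood over all pulse positions and amplitudes jointly, which is where the two methods can differ when pulse responses overlap.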

Fixed Point Implementation of the QCELP Speech Coder

  • Yoon, Byung-Sik;Kim, Jae-Won;Lee, Won-Myoung;Jang, Seok-Jin;Choi, Song_in;Lim, Myoung-Seon
    • ETRI Journal
    • /
    • v.19 no.3
    • /
    • pp.242-258
    • /
    • 1997
  • The Qualcomm code excited linear prediction (QCELP) speech coder was adopted to increase the capacity of the CDMA Mobile System (CMS). In this paper, we implemented the QCELP speech coding algorithm on a TMS320C50 fixed-point DSP chip. A fixed-point simulation was also carried out in the C language. The QCELP code size on the TMS320C50 was 10k words, and the data memory was 4k words. In a normal call test on the CMS, where a mobile-to-mobile call was made in bypass mode without double vocoding, the mean opinion score for the speech quality was 3.11.

  • PDF
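Porting a floating-point reference to a fixed-point DSP of this class typically means working in Q15 arithmetic (1 sign bit, 15 fractional bits). The snippet below is not the QCELP code; it is a hedged illustration of the arithmetic style such a port relies on:

```python
Q = 15  # Q15 format: values in [-1, 1) scaled by 2**15

def float_to_q15(x):
    """Quantize a float in [-1, 1) to a Q15 integer, with saturation."""
    return max(-32768, min(32767, int(round(x * (1 << Q)))))

def q15_mul(a, b):
    """Fixed-point multiply as on a 16-bit DSP: form the 32-bit
    product, shift right by Q, and saturate back to 16 bits."""
    return max(-32768, min(32767, (a * b) >> Q))
```

Saturation rather than wraparound on overflow is what keeps quantization artifacts graceful, which matters for matching the floating-point simulation bit patterns during verification.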

Korean ESL Learners' Perception of English Segments: a Cochlear Implant Simulation Study (인공와우 시뮬레이션에서 나타난 건청인 영어학습자의 영어 말소리 지각)

  • Yim, Ae-Ri;Kim, Dahee;Rhee, Seok-Chae
    • Phonetics and Speech Sciences
    • /
    • v.6 no.3
    • /
    • pp.91-99
    • /
    • 2014
  • Although it is well documented that patients with cochlear implants experience hearing difficulties when processing their first language, very little is known about whether, and to what extent, cochlear implant patients recognize segments in a second language. This preliminary study examines how Korean learners of English identify English segments under normal-hearing and cochlear-implant simulation conditions. Participants heard English vowels and consonants in three conditions: normal hearing, 12-channel noise vocoding with 0 mm spectral shift, and 12-channel noise vocoding with 3 mm spectral shift. The results confirmed that nonnative listeners can also retrieve spectral information from a vocoded speech signal, as they recognized vowel features fairly accurately despite the vocoding. In contrast, the intelligibility of consonant manner and place features was significantly decreased by vocoding. In addition, we found that spectral shift affected the listeners' vowel recognition, probably because information about F1 is diminished by spectral shifting. The results suggest that cochlear implant patients and normal-hearing second-language learners would experience different patterns of listening errors when processing their second language(s).
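The noise-vocoding manipulation used in such studies can be sketched as follows. The 12-channel count comes from the abstract; the log-spaced band edges, FFT-mask filters, cutoff frequencies, and 20 ms envelope smoother are assumptions, and the spectral-shift condition is not modeled:

```python
import numpy as np

def noise_vocode(x, fs, n_channels=12, f_lo=100.0, f_hi=3500.0):
    """Noise-vocoder sketch: split the signal into log-spaced bands,
    extract each band's amplitude envelope, and use it to modulate
    band-limited noise, discarding fine spectral structure."""
    n = len(x)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    X = np.fft.rfft(x)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    smooth = np.ones(max(1, int(0.02 * fs)))       # ~20 ms window
    smooth /= len(smooth)
    rng = np.random.default_rng(0)
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(X * mask, n)           # analysis band
        env = np.convolve(np.abs(band), smooth, mode="same")
        carrier = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * mask, n)
        out += env * carrier                       # envelope-modulated noise
    return out
```

Because only the per-band envelopes survive, coarse spectral cues (enough for many vowel contrasts) are preserved while fine structure is lost, consistent with the vowel-versus-consonant pattern the study reports.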

Comparison of Sound Pressure Level and Speech Intelligibility of Emergency Broadcasting System at T-junction Corridor Space (T자형 복도 공간의 비상 방송용 확성기 배치별 음압 레벨과 음성 명료도 비교)

  • Jeong, Jeong-Ho;Lee, Sung-Chan
    • Fire Science and Engineering
    • /
    • v.33 no.1
    • /
    • pp.105-112
    • /
    • 2019
  • In this study, an architectural acoustics simulation was conducted to examine the clear and uniform transmission of emergency broadcast sound in a T-junction corridor space. The sound absorption performance of the corridor and the location and spacing of the emergency broadcasting loudspeakers were varied, and the distributions of the sound pressure level and of the sound transmission indices (STI, RASTI) were compared. The simulation showed that, for clear voice transmission, the emergency broadcasting loudspeaker should be installed approximately 10 m from the center of the T-junction corridor connection. Narrowing the 25 m installation interval specified in the NFSC allows emergency broadcast sound that is even clearer, and at sufficient volume, to be delivered evenly.

Comparison of Sound Pressure Level and Speech Intelligibility of Emergency Broadcasting System at Longitudinal Corridor (장방향 복도 공간의 비상방송설비에 대한 음압 레벨과 음성 명료도 비교)

  • Jeong, Jeong-Ho;Lee, Sung-Chan
    • Fire Science and Engineering
    • /
    • v.32 no.4
    • /
    • pp.42-49
    • /
    • 2018
  • In this study, an architectural acoustics simulation was conducted to investigate whether the sound from emergency broadcasting loudspeakers, installed at intervals of 25 m in accordance with NFSC 202 along a longitudinal corridor, is clearly transmitted to occupants. The sound pressure level and speech intelligibility indices were analyzed as the building finishing materials were changed. With reflective finishing materials, the sound pressure level satisfied the standard while speech intelligibility was low. Applying sound-absorbing finishing materials improved clarity and the speech transmission index to a level occupants could understand, whereas the sound pressure level delivered to occupants in the same space decreased.