• Title/Summary/Keyword: Speech signals


An Acoustic Echo Canceler Using Multimedia PC (멀티미디어 PC를 이용한 음향반향제거기)

  • 박장식;손경식
    • Proceedings of the Korea Multimedia Society Conference / 1998.04a / pp.122-127 / 1998
  • The NLMS algorithm is widely used for acoustic echo cancellation, but its performance is degraded by ambient noise and near-end speech signals. In this paper, a robust acoustic echo cancellation algorithm is proposed. To enhance echo cancellation performance, the step size of the proposed algorithm is normalized by the sum of the powers of the reference and primary signals. A comparison of excess mean square errors shows that the proposed algorithm improves echo cancellation, and experiments carried out on a multimedia personal computer confirm that it outperforms conventional algorithms. A sketch of the modified step-size normalization follows below.

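As a concrete illustration of the normalization idea, the following NumPy sketch modifies standard NLMS so that the step size is divided by the combined power of the reference (far-end) and primary (microphone) signals; when near-end speech or noise appears in the primary channel, the effective step shrinks and the filter drifts less. The filter length, step size, and windowing are illustrative assumptions, not values from the paper.

```python
import numpy as np

def nlms_echo_canceller(x, d, filter_len=128, mu=0.5, eps=1e-8, robust=True):
    """Acoustic echo canceller. With robust=True the step size is
    normalized by the combined power of the reference (far-end) and
    primary (microphone) signals, as the abstract proposes; with
    robust=False this is standard NLMS."""
    w = np.zeros(filter_len)                 # adaptive filter weights
    e = np.zeros(len(x))                     # echo-cancelled output
    for i in range(filter_len, len(x)):
        x_vec = x[i - filter_len:i][::-1]    # most recent reference samples
        e[i] = d[i] - w @ x_vec              # residual after echo estimate
        norm = x_vec @ x_vec                 # reference-signal power
        if robust:
            d_vec = d[i - filter_len:i]
            norm += d_vec @ d_vec            # add primary-signal power
        w += mu * e[i] * x_vec / (norm + eps)
    return e, w
```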

Emotion Recognition using Short-Term Multi-Physiological Signals

  • Kang, Tae-Koo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.1076-1094 / 2022
  • Technology for emotion recognition is an essential part of human personality analysis. Existing approaches to defining personality characteristics rely on surveys, yet communication often cannot succeed without taking emotions into account, so emotion recognition technology is an essential element of communication and has been adopted in many other fields as well. A person's emotions are revealed in various ways, typically through facial expressions, speech, and biometric responses, and they can accordingly be recognized from images, voice signals, or physiological signals. Physiological signals are measured with biological sensors and analyzed to identify emotions. This study employed two sensor types. The existing binary arousal-valence scheme, which classifies each dimension only as high or low, was subdivided into four levels per dimension to classify emotions in more detail. Signal characteristics were then extracted with a 1-D convolutional neural network (CNN) and classified into sixteen emotions; although CNNs are typically used to learn from 2-D images, 1-D sensor data was used as the input in this paper. Finally, the proposed emotion recognition system was evaluated with measurements from actual sensors.
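
A minimal PyTorch sketch of the kind of 1-D CNN the abstract describes is given below. Only the sixteen-class output (four arousal levels by four valence levels) follows the abstract; the channel count, window length, and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class Emotion1DCNN(nn.Module):
    """Illustrative 1-D CNN for multi-channel physiological signals.
    Only the 16-class output (4 arousal x 4 valence levels) follows
    the abstract; all layer sizes are assumptions."""
    def __init__(self, in_channels=2, num_classes=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),         # collapse the time axis
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                    # x: (batch, channels, samples)
        z = self.features(x).squeeze(-1)
        return self.classifier(z)

# e.g., a batch of 8 two-channel windows of 1024 samples -> (8, 16) logits
logits = Emotion1DCNN()(torch.randn(8, 2, 1024))
```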

A New Feature for Speech Segments Extraction with Hidden Markov Models (숨은마코프모형을 이용하는 음성구간 추출을 위한 특징벡터)

  • Hong, Jeong-Woo;Oh, Chang-Hyuck
    • Communications for Statistical Applications and Methods / v.15 no.2 / pp.293-302 / 2008
  • In this paper we propose a new feature, average power, for speech segment extraction with hidden Markov models; it is based on the mel frequencies of speech signals. The average power is compared with the mel frequency cepstral coefficients (MFCC) and the power coefficient. To compare the performance of the three feature types, speech data were collected for words with plosives, which are generally known to be hard to detect. Experiments show that the average power is more accurate and efficient than MFCC and the power coefficient for speech segment extraction in environments with various levels of noise.
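
One plausible reading of the "average power based on mel frequencies" feature is the per-frame mean of mel-band powers, sketched below with librosa; the paper's exact definition may differ. The resulting one-dimensional feature sequence would then be fed to a two-state (speech/non-speech) hidden Markov model for segment extraction.

```python
import numpy as np
import librosa

def mel_average_power(y, sr=16000, n_fft=512, hop=160, n_mels=26):
    """Per-frame average power over mel bands -- one possible reading
    of the abstract's feature (sampling rate, FFT size, hop, and band
    count are assumptions)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_power = mel_fb @ S                   # (n_mels, frames)
    return mel_power.mean(axis=0)            # average across mel bands
```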

Pitch Detection by the Analysis of Speech and EGG Signals (2-채널 (음성 및 EGG) 신호 분석에 의한 피치검출)

  • Shin, Mu-Yong;Kim, Jeong-Cheol;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.15 no.5 / pp.5-12 / 1996
  • We propose a two-channel (speech and EGG) pitch detection algorithm. The EGG signal tracks the vibratory motion of the vocal folds very well, so by using the EGG signal together with the speech signal we obtain a reliable and robust pitch detection algorithm that minimizes the problems occurring in pitch detection from speech alone. The proposed algorithm gives precise pitch markers that are synchronized to the speech in the time domain. Experimental results demonstrate the superiority of the two-channel algorithm over the conventional method, and it can be used to obtain reference pitch values for evaluating other pitch detection algorithms. A sketch of a common EGG-based marker heuristic follows below.

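Because the EGG signal directly reflects vocal-fold contact, pitch markers are commonly obtained by peak-picking the differentiated EGG (dEGG) at glottal closure instants. The sketch below shows that common heuristic, not necessarily the paper's exact procedure; the peak threshold and F0 range are assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def egg_pitch_markers(egg, sr, f0_min=60.0, f0_max=400.0):
    """Pitch markers as glottal closure instants picked from the
    differentiated EGG (dEGG); threshold and F0 range are assumptions."""
    degg = np.diff(egg)                      # closures show as sharp peaks
    min_dist = int(sr / f0_max)              # no markers closer than 1/f0_max
    peaks, _ = find_peaks(degg, distance=min_dist, height=0.3 * np.max(degg))
    periods = np.diff(peaks) / sr            # marker-to-marker periods (s)
    f0 = 1.0 / periods
    return peaks, f0[(f0 >= f0_min) & (f0 <= f0_max)]
```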

On a Pitch Extraction of Speech Signal using Residual Signal of the Uniform Quantizer (균일양자화기의 잔여신호를 이용한 음성신호의 피치검출)

  • Bae, Myung-Jin;Han, Ki-Cheon;Cha, Jin-Jong
    • The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.36-40 / 1997
  • In speech signal processing it is necessary and important to detect the pitch accurately. The pitch extraction algorithms proposed so far have difficulty detecting pitch accurately over a wide range of speech signals. In this paper we therefore propose a new pitch detection algorithm that finds the fundamental period of the speech signal in the residual signal of a uniform quantizer, as used in PCM. The proposed method shows a low gross error of 0.25% on average for clean speech and 3.39% on average at an SNR of 0 dB. It also yields good pitch contours, improving the accuracy of pitch detection for transient phonemes and in noisy environments. A sketch of the general idea follows below.

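The following sketch illustrates the general idea under stated assumptions: coarsely quantize the signal with a uniform (PCM-style) quantizer, take the quantization residual, and search the residual's autocorrelation for the fundamental period. The autocorrelation search and the bit depth are assumptions, since the abstract does not spell out the final detection step.

```python
import numpy as np

def residual_pitch(x, sr, n_bits=4, f0_min=60.0, f0_max=400.0):
    """Pitch estimate from the quantization residual of a uniform
    quantizer; autocorrelation peak-picking is an assumed final step."""
    step = (x.max() - x.min()) / (2 ** n_bits)   # uniform quantizer step
    xq = np.round(x / step) * step               # quantized signal
    residual = x - xq                            # quantization residual
    # autocorrelation of the residual, searched over plausible lags
    r = np.correlate(residual, residual, mode="full")[len(residual) - 1:]
    lo, hi = int(sr / f0_max), int(sr / f0_min)
    lag = lo + np.argmax(r[lo:hi])
    return sr / lag                              # estimated F0 in Hz
```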

Spectral characteristics of resonance disorders in submucosal type cleft palate patients (점막하구개열 환자 공명장애의 스펙트럼 특성 연구)

  • Kim, Hyun-Chul;Lee, Jong-Seok;Leem, Dae-Ho;Baek, Jin-A;Shin, Hyo-Keum;Kim, Hyun-Ki
    • Proceedings of the KSPS conference / 2007.05a / pp.152-154 / 2007
  • Submucosal cleft palate is a subtype of cleft palate. Because it is detected late, treatment for submucosal cleft palate patients, such as surgery or speech therapy, is usually delayed. In this study we sought objective characteristics of submucosal cleft palate patients in comparison with normal subjects and complete cleft palate patients. The experimental groups were 10 submucosal cleft palate patients who underwent surgery at our hospital and 10 complete cleft palate patients, with 10 normal subjects as the control group. The speech material was the five simple vowels. Using the CSL program we evaluated formants and bandwidths, and we analyzed the spectral characteristics of the speech signals of the three groups before and after surgery. A sketch of the standard LPC formant computation follows below.

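The formant and bandwidth measurements reported here are the kind that standard LPC root-solving produces; the sketch below illustrates that textbook method. It is not the CSL program itself, and the model order and plausibility thresholds are assumptions.

```python
import numpy as np
import librosa

def formants_lpc(y, sr, order=12):
    """Formant frequencies (Hz) and bandwidths (Hz) from the roots of
    an LPC polynomial -- the standard textbook method; order and the
    plausibility filter are assumptions."""
    a = librosa.lpc(y, order=order)          # LPC polynomial coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]        # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi
    keep = (freqs > 90) & (bws < 400)        # crude formant plausibility filter
    idx = np.argsort(freqs[keep])
    return freqs[keep][idx], bws[keep][idx]
```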

The Interlanguage Speech Intelligibility Benefit (ISIB) of English Prosody: The Case of Focal Prominence for Korean Learners of English and Natives

  • Lee, Joo-Kyeong;Han, Jeong-Im;Choi, Tae-Hwan;Lim, Injae
    • Phonetics and Speech Sciences / v.4 no.4 / pp.53-68 / 2012
  • This study investigated the speech intelligibility of Korean-accented and native English focus speech for Korean and native English listeners. Three different types of focus in English, broad, narrow, and contrastive, were naturally induced in semantically optimal dialogues. Seven high-proficiency and seven low-proficiency Korean speakers and seven native speakers recorded the stimuli with another native speaker. Fifteen listeners from each of the Korean high-proficiency, Korean low-proficiency, and native groups judged audio signals of the focus sentences. Results showed that Korean listeners were more accurate at identifying focal prominence in Korean speakers' narrow focus speech than in that of native speakers, suggesting that the interlanguage speech intelligibility benefit for talkers (ISIB-T) held true for narrow focus regardless of the Korean speakers' and listeners' proficiency. However, Korean listeners did not outperform native listeners for Korean speakers' production of narrow focus, which did not support the ISIB for listeners (ISIB-L). Broad and contrastive focus speech provided evidence for neither the ISIB-T nor the ISIB-L. These findings are explained by the interlanguage shared by Korean speakers and listeners, in which they have established more L1-like common phonetic features and phonological representations. Once Korean narrow focus speech was semantically and syntactically interpreted at a higher level of processing, the narrow focus was phonetically realized in a way more intelligible to Korean listeners because of the interlanguage, which may elicit the ISIB. However, Korean speakers did not appear to gain complete semantic/syntactic access to either broad or contrastive focus, which might have detrimental effects on lower-level phonetic outputs in top-down processing. This is why Korean listeners had no advantage over native listeners for Korean talkers, and vice versa.

A Study on Processing of Speech Recognition Korean Words (한글 단어의 음성 인식 처리에 관한 연구)

  • Nam, Kihun
    • The Journal of the Convergence on Culture Technology / v.5 no.4 / pp.407-412 / 2019
  • In this paper we propose a technique for speech recognition of Korean words. Speech recognition is a technology that converts acoustic signals from sensors such as microphones into words or sentences. Most foreign languages pose less difficulty for speech recognition; Korean, on the other hand, is written in syllable blocks of vowels and consonants, including final consonants (batchim), so it is inappropriate to use the letters obtained from the voice synthesis system directly, and improving the conventional recognition structure is needed for correct word recognition. To solve this problem, a new algorithm was added to the existing speech recognition structure to increase the recognition rate. The word is first preprocessed and the result is tokenized. The tokens are processed by the Levenshtein distance algorithm combined with a hashing algorithm, and the normalized word is output through a consonant comparison algorithm. The final word is compared against a standardized table: it is output if it exists there and registered in the table if it does not. The experimental environment was a smartphone application. The proposed structure improves the recognition rate by 2% for standard language and 7% for dialect.
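
Of the components named in the abstract, the Levenshtein distance and a consonant comparison over Hangul syllables can be sketched directly; the hashing step and the standardized table are not shown, and extracting each syllable's initial consonant (choseong) is only one plausible basis for the consonant comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, as named in the abstract."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def initial_consonants(word: str) -> str:
    """Initial consonant (choseong) of each Hangul syllable -- one
    plausible basis for the abstract's 'consonant comparison' step."""
    CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
    out = []
    for ch in word:
        code = ord(ch) - 0xAC00              # Hangul syllables start at U+AC00
        out.append(CHOSEONG[code // 588] if 0 <= code <= 11171 else ch)
    return "".join(out)

# e.g., levenshtein("안녕", "안영") == 1; initial_consonants("한글") == "ㅎㄱ"
```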

Blind Noise Separation Method of Convolutive Mixed Signals (컨볼루션 혼합신호의 암묵 잡음분리방법)

  • Lee, Haeng-Woo
    • The Journal of the Korea institute of electronic communication sciences / v.17 no.3 / pp.409-416 / 2022
  • This paper concerns a blind noise separation method for time-delayed convolutive mixed signals. Since the mixing model for acoustic signals in a closed space is multi-channel, a convolutive blind signal separation method is applied, using time-delayed data samples of the two microphone input signals. For separation, the mixing coefficients are estimated with an inverse model rather than by calculating the separation coefficients directly, and the coefficients are updated by iterative calculations based on second-order statistical properties to estimate the speech signal. Many simulations were performed to verify the performance of the proposed method. The results show that noise separation with this method operates stably regardless of the convolutive mixing, and PESQ improves by 0.3 points compared with a conventional adaptive FIR filter structure.
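
A heavily simplified sketch of the inverse-model idea follows: the cross-coupling coefficients of a two-channel mixture are adapted until the outputs are decorrelated, i.e., using only second-order statistics. This instantaneous, single-lag version is an assumption for illustration; zero-lag decorrelation alone does not uniquely recover the true coefficients, which is why the paper's convolutive method relies on richer second-order statistics across time delays.

```python
import numpy as np

def decorrelate_2x2(x1, x2, mu=0.5, n_iter=100):
    """Inverse-model (feedback) separation driven by second-order
    statistics: cross-coupling estimates a, b are adapted until the
    two outputs are decorrelated at lag zero. An instantaneous
    simplification of the paper's convolutive setting."""
    a = b = 0.0
    y1, y2 = x1, x2
    for _ in range(n_iter):
        y1 = x1 - a * x2                     # estimate of source 1
        y2 = x2 - b * x1                     # estimate of source 2
        c = np.mean(y1 * y2)                 # residual cross-correlation
        a += mu * c / (np.mean(y2 ** 2) + 1e-12)
        b += mu * c / (np.mean(y1 ** 2) + 1e-12)
    return y1, y2
```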

Statistical Speech Feature Selection for Emotion Recognition

  • Kwon Oh-Wook;Chan Kwokleung;Lee Te-Won
    • The Journal of the Acoustical Society of Korea / v.24 no.4E / pp.144-151 / 2005
  • We evaluate the performance of emotion recognition from speech signals when an ordinary speaker talks to an entertainment robot. For each frame of a speech utterance, we extract frame-based features: pitch, energy, formants, band energies, mel frequency cepstral coefficients (MFCCs), and the velocity/acceleration of pitch and MFCCs. For discriminative classification, a fixed-length utterance-based feature vector is computed from the statistics of the frame-based features. Using a speaker-independent database, we evaluate the performance of two promising classifiers: the support vector machine (SVM) and the hidden Markov model (HMM). For angry/bored/happy/neutral/sad emotion classification, the SVM and HMM classifiers yield 42.3% and 40.8% accuracy, respectively. We show that this accuracy is significant compared with the performance of foreign human listeners.
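
The fixed-length utterance vector can be sketched as simple statistics over the frame-based features, followed by an SVM, as below with scikit-learn. The particular statistics (mean/std/min/max) and the random stand-in data are assumptions, since the abstract does not list them.

```python
import numpy as np
from sklearn.svm import SVC

def utterance_vector(frame_feats):
    """Fixed-length utterance vector from per-frame features
    (frames x dims). Mean/std/min/max are assumed statistics; the
    abstract does not list the exact ones used."""
    return np.concatenate([frame_feats.mean(axis=0), frame_feats.std(axis=0),
                           frame_feats.min(axis=0), frame_feats.max(axis=0)])

# Hypothetical stand-in data: 50 utterances of 13-dim frame features,
# labels 0..4 for angry/bored/happy/neutral/sad.
rng = np.random.default_rng(0)
utts = [rng.standard_normal((rng.integers(80, 200), 13)) for _ in range(50)]
labels = rng.integers(0, 5, size=50)
X = np.stack([utterance_vector(u) for u in utts])
clf = SVC(kernel="rbf").fit(X, labels)       # discriminative classifier
print(clf.predict(X[:5]))
```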