• Title/Summary/Keyword: Speech signals

Word Recognition Using VQ and Fuzzy Theory (VQ와 Fuzzy 이론을 이용한 단어인식)

  • Kim, Ja-Ryong;Choi, Kap-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.10 no.4
    • /
    • pp.38-47
    • /
    • 1991
  • Frequency variation among speakers is one of the main problems in speech recognition. This paper applies fuzzy theory to address this variation in frequency features. Reference patterns are expressed as fuzzified patterns built from the peak frequency and peak energy extracted from codebooks, which are generated from training words uttered by several speakers so that they capture features common to the speech signals. Words are recognized by fuzzy inference using the certainty factor between the reference patterns and the test fuzzified patterns, the latter produced from the peak frequency and peak energy of the power spectrum of the input speech. To reduce memory capacity and computation requirements when computing the certainty factor, we propose a new equation that calculates an improved certainty factor using only the difference between two fuzzy values. Experiments on Korean digits show that word recognition by fuzzy inference with the proposed equation resolves the frequency-variation problem while reducing memory capacity and computation requirements.

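The abstract does not spell out the improved certainty-factor equation, so the sketch below is only an illustration: a difference-based certainty factor over fuzzy membership vectors and the resulting word decision. The exact formula and the fuzzified peak-frequency/peak-energy patterns are assumptions, not the paper's definitions.

```python
import numpy as np

def certainty_factor(ref, test):
    """Simplified certainty factor: 1 minus the mean absolute
    difference between two fuzzy membership vectors (values in [0, 1]).
    Illustrative only; the paper defines its own improved equation."""
    ref = np.asarray(ref, dtype=float)
    test = np.asarray(test, dtype=float)
    return 1.0 - np.mean(np.abs(ref - test))

def recognize(test_pattern, reference_patterns):
    """Pick the reference word whose pattern gives the highest CF."""
    scores = {word: certainty_factor(pattern, test_pattern)
              for word, pattern in reference_patterns.items()}
    return max(scores, key=scores.get)
```

Because only subtractions and an average are involved, no multiplication table or min/max composition needs to be stored, which matches the abstract's claim of reduced memory and computation.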

Low Rate Speech Coding Using the Harmonic Coding Combined with CELP Coding (하모닉 코딩과 CELP방법을 이용한 저 전송률 음성 부호화 방법)

  • 김종학;이인성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.3
    • /
    • pp.26-34
    • /
    • 2000
  • In this paper, we propose a 4 kbps speech coder that combines harmonic vector excitation coding with time-separated transition coding. The harmonic vector excitation coder applies harmonic excitation coding in voiced frames and analysis-by-synthesis vector excitation coding in unvoiced frames. This two-mode scheme, however, is not effective for transition frames in which voiced and unvoiced signals are mixed, so a method beyond unvoiced/voiced mode coding is needed. We therefore designed a time-separated transition coding method in which a voiced/unvoiced decision algorithm separates the unvoiced and voiced durations within a frame, and harmonic-harmonic or vector-harmonic excitation coding is selected depending on the previous frame's U/V decision. In the decoder, the voiced excitation signals are generated efficiently through the inverse FFT of the harmonic magnitudes, and the unvoiced excitation signals are produced by inverse vector quantization. The reconstructed speech signal is synthesized by the overlap/add method.

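The decoder step described above, generating voiced excitation by an inverse FFT of harmonic magnitudes and joining frames by overlap/add, can be sketched as follows. Zero phases and a triangular window are simplifying assumptions here, not the paper's actual design.

```python
import numpy as np

def harmonic_excitation(magnitudes, pitch_hz, fs, frame_len):
    """Build one voiced excitation frame: place harmonic magnitudes
    at multiples of the pitch frequency in a half-spectrum, then take
    an inverse real FFT (zero phase for simplicity; real coders use
    measured or synthetic phases)."""
    spec = np.zeros(frame_len // 2 + 1, dtype=complex)
    for k, mag in enumerate(magnitudes, start=1):
        bin_idx = int(round(k * pitch_hz * frame_len / fs))
        if 0 < bin_idx < len(spec):
            spec[bin_idx] = mag
    return np.fft.irfft(spec, n=frame_len)

def overlap_add(frames, hop):
    """Overlap/add synthesis with a triangular (Bartlett) window."""
    frame_len = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    win = np.bartlett(frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += win * frame
    return out
```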

A STUDY ON THE IMPLEMENTATION OF ARTIFICIAL NEURAL NET MODELS WITH FEATURE SET INPUT FOR RECOGNITION OF KOREAN PLOSIVE CONSONANTS (한국어 파열음 인식을 위한 피쳐 셉 입력 인공 신경망 모델에 관한 연구)

  • Kim, Ki-Seok;Kim, In-Bum;Hwang, Hee-Yeung
    • Proceedings of the KIEE Conference
    • /
    • 1990.07a
    • /
    • pp.535-538
    • /
    • 1990
  • The main problem in speech recognition is the enormous variability of acoustic signals due to complex but predictable contextual effects. For plosive consonants in particular it is very difficult to find invariant cues because of these contextual effects, yet humans use them as helpful information in plosive consonant recognition. In this paper we experimented with three artificial neural net models for the recognition of plosive consonants. Model I used a multi-layer perceptron, Model II used a variation of the self-organizing feature map, and Model III used an interactive and competitive model to examine contextual effects. The recognition experiment was performed on the 9 Korean plosive consonants, using VCV speech chains to study contextual effects. Each chain consists of a Korean plosive consonant /g, d, b, K, T, P, k, t, p/ (/ㄱ, ㄷ, ㅂ, ㄲ, ㄸ, ㅃ, ㅋ, ㅌ, ㅍ/) and one of eight Korean monophthongs. The inputs to the neural net models were several temporal cues (duration of the silence, the transition, and the VOT), the extent of the VC formant transitions, the presence of voicing energy during closure, burst intensity, presence of aspiration, the amount of low-frequency energy present at voicing onset, and the extent of the CV formant transitions, all derived from the acoustic signals. Model I achieved about 55-67%, Model II about 60%, and Model III about 67% recognition rate.

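As a rough illustration of Model I, a one-hidden-layer perceptron mapping a cue vector to posteriors over the 9 plosive classes might look like the following. The layer sizes, the 8-dimensional cue input, and the random weights are all illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer perceptron: a cue vector (silence duration,
    VOT, burst intensity, ...) maps to softmax posteriors over the
    9 plosive classes."""
    h = np.tanh(x @ W1 + b1)                       # hidden layer
    z = h @ W2 + b2                                # class scores
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)

n_cues, n_hidden, n_classes = 8, 16, 9             # illustrative sizes
W1 = rng.normal(0.0, 0.1, (n_cues, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes)); b2 = np.zeros(n_classes)
probs = mlp_forward(rng.normal(size=(4, n_cues)), W1, b1, W2, b2)
```

Training (backpropagation over labeled VCV tokens) is omitted; the sketch only shows the forward classification step.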

Chaotic Speech Secure Communication Using Self-feedback Masking Techniques (자기피드백 마스킹 기법을 사용한 카오스 음성비화통신)

  • Lee, Ik-Soo;Ryeo, Ji-Hwan
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.6
    • /
    • pp.698-703
    • /
    • 2003
  • This paper presents an analog secure communication system for safe speech transmission using chaotic signals. We applied various conditions that arise in real communication environments by modifying chaotic synchronization and chaotic communication schemes, and analyzed the restoration performance of the speech signal through computer simulation. In the transmitter, a chaotic masking signal is formed by adding the voice signal to a chaotic signal using the PC (Pecora & Carroll) and SFB (self-feedback) control techniques, and the encrypted signal is transmitted over a noisy communication channel. To quantify restoration performance, we define the analog average power of the recovered error signals in the receiver's chaotic system. The simulation results show quantitatively that the self-feedback control technique yields restoration performance superior to the PC method in masking degree, sensitivity to parameters, and robustness to channel noise. We also experimentally tabulated the relation between parameter variation and restoration error rate when the encryption key values are applied to the chaotic secure communication system.
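A minimal simulation of chaotic masking in the classic Pecora-Carroll drive-response style can be sketched as below, assuming a Lorenz system, Euler integration, and a toy sinusoid standing in for speech. The paper's SFB coupling, parameters, and error-power measure are not reproduced.

```python
import numpy as np

def lorenz_step(state, drive, dt=1e-3, s=10.0, r=28.0, b=8.0 / 3.0):
    """One Euler step of a Lorenz system whose y- and z-equations are
    driven by an external signal (drive == x reproduces plain Lorenz)."""
    x, y, z = state
    dx = s * (y - x)
    dy = r * drive - y - drive * z
    dz = drive * y - b * z
    return np.array([x + dt * dx, y + dt * dy, z + dt * dz])

# Transmitter: mask a small "speech" signal with the chaotic x state.
n = 20000
t = np.arange(n) * 1e-3
speech = 0.05 * np.sin(2 * np.pi * 5 * t)      # toy stand-in for speech
tx = np.array([1.0, 1.0, 1.0])
masked = np.empty(n)
for i in range(n):
    tx = lorenz_step(tx, tx[0])                # self-driven: plain Lorenz
    masked[i] = tx[0] + speech[i]              # chaotic masking

# Receiver: drive a copy with the masked signal; after the response
# subsystem synchronizes, subtracting its x estimate recovers speech.
rx = np.array([5.0, 5.0, 5.0])                 # mismatched initial state
recovered = np.empty(n)
for i in range(n):
    rx = lorenz_step(rx, masked[i])
    recovered[i] = masked[i] - rx[0]           # estimate of the speech
```

The message amplitude must stay small relative to the chaotic carrier, otherwise synchronization degrades; this is the trade-off the paper's masking-degree analysis quantifies.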

A Novel Approach to COVID-19 Diagnosis Based on Mel Spectrogram Features and Artificial Intelligence Techniques

  • Alfaidi, Aseel;Alshahrani, Abdullah;Aljohani, Maha
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.195-207
    • /
    • 2022
  • COVID-19 has remained one of the most serious health crises in recent history, resulting in the tragic loss of lives and significant economic impacts on the entire world. The difficulty of controlling COVID-19 poses a threat to the global health sector. Considering that Artificial Intelligence (AI) has contributed to improving research methods and solving problems facing diverse fields of study, AI algorithms have also proven effective in disease detection and early diagnosis. Specifically, acoustic features offer a promising prospect for the early detection of respiratory diseases. Motivated by these observations, this study conceptualized a speech-based diagnostic model to aid in COVID-19 diagnosis. The proposed methodology uses speech signals from confirmed positive and negative cases of COVID-19 to extract features through the pre-trained Visual Geometry Group (VGG-16) model applied to Mel spectrogram images. The K-means algorithm then determines the effective features, and a Genetic Algorithm-Support Vector Machine (GA-SVM) classifier classifies the cases. The experimental findings demonstrate the proposed methodology's capability to distinguish COVID-19 from non-COVID-19 cases across speakers of varying ages and different languages. Because the methodology relies on deep features followed by dimension reduction, it produces better and more consistent performance than the handcrafted features used in previous studies.
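The front end of this pipeline, a log-Mel spectrogram, can be computed from scratch as below; the VGG-16 feature extraction, K-means selection, and GA-SVM classifier that follow in the paper are omitted. The frame sizes and filter count are illustrative choices, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_spectrogram(x, fs=16000, n_fft=512, hop=256, n_mels=40):
    """Log-Mel spectrogram: windowed STFT power projected onto a
    triangular Mel filterbank (HTK-style mel scale)."""
    frames = [x[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(x) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular Mel filterbank between 0 Hz and Nyquist.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2),
                                  n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(power @ fb.T + 1e-10)    # (n_frames, n_mels)
```

In the paper these spectrogram images are rendered and fed to the pre-trained VGG-16; here we stop at the feature map itself.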

Overlap and Add Sinusoidal Synthesis Method of Speech Signal Using the Damping Harmonic Magnitude Parameter (감쇄(damping) 하모닉 크기 파라미터를 이용한 음성의 중첩합산 정현파 합성 방법)

  • Park, Jong-Bae;Kim, Young-Joon;Lee, In-Sung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.34 no.3C
    • /
    • pp.251-256
    • /
    • 2009
  • In this paper, we propose an overlap-and-add speech synthesis method with improved continuity that uses a damping harmonic amplitude parameter. The existing method uses the average of the past and current parameters for the sinusoidal amplitude that serves as the weight of the phase error function. The proposed method instead extracts a more accurate sinusoidal amplitude by using the correlation between the original and synthesized signals for the amplitude used as the weight. To verify the performance of the proposed method, we observed the average differential error of the synthesized signals.
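The correlation-based amplitude idea can be illustrated with a least-squares projection of the original frame onto a unit sinusoid; this is a textbook form, and the paper's damping parameter and phase handling are not reproduced.

```python
import numpy as np

def optimal_amplitude(x, freq, fs):
    """Least-squares amplitude of a sinusoid at `freq` in frame `x`:
    project the original frame onto a unit cosine, a = <x, s> / <s, s>.
    This mirrors using the correlation between original and synthesized
    signals instead of averaging past and current amplitudes."""
    n = np.arange(len(x))
    s = np.cos(2 * np.pi * freq * n / fs)
    return float(np.dot(x, s) / np.dot(s, s))
```

For a frame that truly contains a 0.7-amplitude cosine at the analysis frequency, the projection returns 0.7 regardless of what the previous frame's amplitude was, which is the continuity advantage the abstract claims.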

SPECTRAL CHARACTERISTICS OF RESONANCE DISORDERS IN SUBMUCOSAL TYPE CLEFT PALATE PATIENTS (점막하 구개열 환자 공명장애의 스펙트럼 특성 연구)

  • Kim, Hyun-Chul;Leem, Dae-Ho;Baek, Jin-A;Shin, Hyo-Keun;Kim, Oh-Hwan;Kim, Hyun-Ki
    • Maxillofacial Plastic and Reconstructive Surgery
    • /
    • v.28 no.4
    • /
    • pp.310-319
    • /
    • 2006
  • Submucosal type cleft palate is a subdivision of cleft palate. It is very difficult to detect because the palate of these patients appears normal on examination; in fact, however, the palatal muscles show abnormal union. Because of late detection, treatment for submucosal cleft palate patients (for example, surgery or speech therapy) usually begins late. Some patients visited our hospital because of speech disorders despite a normal intraoral appearance, and precise intraoral examination revealed submucosal cleft palate. We evaluated the speech of these patients before and after surgery. In this study, we sought objective spectral characteristics of submucosal cleft palate patients in comparison with normal subjects and complete cleft palate patients. The experimental groups were 10 submucosal cleft palate patients and 10 complete cleft palate patients who underwent surgery in our hospital; the controls were 10 normal subjects. The speech material consisted of five simple vowels. Using the CSL program, we measured the formants and bandwidths and analyzed the spectral characteristics of the speech signals of the three groups before and after surgery. In most cases, the formant values were higher in the experimental groups (complete and submucosal cleft palate) than in the controls, with small differences for /a/, /i/, /e/ and large differences for /o/, /u/. After surgery, the formant values decreased in the experimental groups. Bandwidth values showed no significant differences between the experimental groups and the controls.
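Formant measurement of the kind performed here is commonly done with LPC; the study used the CSL program, so the sketch below is a generic textbook method, not the CSL algorithm.

```python
import numpy as np

def lpc_formants(x, fs, order=10):
    """Estimate candidate formant frequencies by LPC: solve the
    autocorrelation normal equations for the predictor, then convert
    the angles of the complex roots of the prediction polynomial to
    frequencies in Hz."""
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    # Toeplitz system R a = r[1..order]
    R = np.array([[r[abs(i - j)] for j in range(order)]
                  for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]          # one root per conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))
```

Real formant trackers also threshold the pole bandwidths (root magnitudes) to discard spurious resonances; that step is omitted here.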

16kbps Wideband Speech Codec (16kbps 광대역 음성 압축기 개발)

  • 박호종;송재종
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.1
    • /
    • pp.5-10
    • /
    • 2002
  • This paper proposes a new 16 kbps wideband speech codec with a bandwidth of 7 kHz. The proposed codec decomposes the input speech signal into low-band and high-band signals using a QMF (Quadrature Mirror Filter); an AMR (Adaptive Multi-Rate) speech codec then processes the low-band signal, while a new transform-domain codec based on the G.722.1 wideband codec compresses the high-band signal. The proposed codec adaptively allocates a different number of bits to each band according to the properties of the input signal, which provides better performance than a fixed bit-allocation scheme. In addition, the proposed codec processes the high-band signal with a wavelet transform for better performance. The performance of the proposed codec was measured subjectively, and simulations with various speech data show that it outperforms G.722 48 kbps SB-ADPCM.
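The QMF band split used by the codec can be sketched generically: the high-pass filter is the low-pass prototype with alternating signs, and each band is decimated by 2. The 2-tap prototype below is a toy for illustration; a real codec uses a properly designed filter bank (e.g. a 32-tap QMF).

```python
import numpy as np

def qmf_split(x, h0):
    """Two-band QMF analysis: h1[n] = (-1)^n * h0[n]; filter each
    branch and decimate by 2, yielding the low-band and high-band
    signals that the codec encodes separately."""
    h1 = h0 * (-1.0) ** np.arange(len(h0))
    low = np.convolve(x, h0)[::2]
    high = np.convolve(x, h1)[::2]
    return low, high

# Toy Haar-like prototype lowpass (illustrative, not a codec design).
h0 = np.array([0.5, 0.5])
```

A constant (DC) input lands almost entirely in the low band, which is the sanity check used below.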

A Study on Pitch Extraction Method using FIR-STREAK Digital Filter (FIR-STREAK 디지털 필터를 사용한 피치추출 방법에 관한 연구)

  • Lee, Si-U
    • The Transactions of the Korea Information Processing Society
    • /
    • v.6 no.1
    • /
    • pp.247-252
    • /
    • 1999
  • Pitch information is a useful parameter for realizing speech coding at low bit rates. When an average pitch is extracted from continuous speech, pitch errors appear in frames where a consonant and a vowel coexist, at the boundaries between adjoining frames, and at the beginning or ending of a sentence. In this paper, I propose an Individual Pitch (IP) extraction method using the residual signals of the FIR-STREAK digital filter in order to suppress these pitch extraction errors. The method does not average pitch intervals, so it can accommodate the changes in each pitch interval. As a result, with the IP extraction method using the FIR-STREAK digital filter, no pitch errors were found in frames where a consonant and a vowel coexist, at the boundaries between adjoining frames, or at the beginning or ending of a sentence. This method can be applied to many fields, such as speech coding, speech analysis, speech synthesis, and speech recognition.

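The FIR-STREAK residual processing and the per-period (individual) pitch tracking are specific to the paper, but the final step, finding the pitch lag as the autocorrelation peak within a plausible range, can be sketched generically:

```python
import numpy as np

def pitch_autocorr(x, fs, fmin=60.0, fmax=400.0):
    """Pitch estimate from the autocorrelation peak within the lag
    range corresponding to [fmin, fmax]. In the paper this search
    would run on FIR-STREAK residual signals, period by period."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```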

Speech Quality Estimation Algorithm using a Harmonic Modeling of Reverberant Signals (반향 음성 신호의 하모닉 모델링을 이용한 음질 예측 알고리즘)

  • Yang, Jae-Mo;Kang, Hong-Goo
    • Journal of Broadcast Engineering
    • /
    • v.18 no.6
    • /
    • pp.919-926
    • /
    • 2013
  • The acoustic signal from a distant sound source in an enclosed space often produces reverberant sound that varies depending on the room impulse response. Estimating the level of reverberation or the quality of the observed signal is important because it provides valuable information about the system's operating environment, and it is also useful for designing a dereverberation system. This paper proposes a speech quality estimation method based on the harmonicity of the received signal, a unique characteristic of voiced speech. We first show that harmonic signal modeling of a reverberant signal is reasonable. The ratio between the harmonically modeled signal and the estimated non-harmonic signal is then used as a measure of a standard room-acoustical parameter related to speech clarity. Experimental results show that the proposed method successfully estimates speech quality when the reverberation time varies from 0.2 s to 1.0 s. Finally, we confirm the superiority of the proposed method in both background-noise and reverberant environments.
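The harmonic-versus-non-harmonic energy ratio at the core of the method can be sketched with a least-squares harmonic fit over a voiced frame; the paper's mapping from this ratio to the room-acoustical clarity parameter is not reproduced, and the frame length, harmonic count, and f0 below are illustrative.

```python
import numpy as np

def harmonic_to_residual_ratio(x, f0, fs, n_harm=10):
    """Fit a harmonic model (least squares over sines/cosines at
    multiples of f0) to frame `x` and return the energy ratio of the
    modeled part to the residual, in dB. Reverberant tails raise the
    non-harmonic residual and lower this ratio."""
    n = np.arange(len(x))
    basis = []
    for k in range(1, n_harm + 1):
        w = 2 * np.pi * k * f0 / fs
        basis += [np.cos(w * n), np.sin(w * n)]
    B = np.array(basis).T                         # (N, 2*n_harm)
    coef, *_ = np.linalg.lstsq(B, x, rcond=None)
    harm = B @ coef
    resid = x - harm
    return 10 * np.log10(np.sum(harm ** 2) / (np.sum(resid ** 2) + 1e-12))
```

A clean harmonic frame scores a very high ratio, while adding a non-harmonic component (noise here, reverberant energy in the paper) pulls the score down, which is the behavior the quality measure exploits.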