• Title/Summary/Keyword: Noise speech data

Search Result 144

Chaotic Speech Secure Communication Using Self-feedback Masking Techniques (자기피드백 마스킹 기법을 사용한 카오스 음성비화통신)

  • Lee, Ik-Soo;Ryeo, Ji-Hwan
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.13 no.6
    • /
    • pp.698-703
    • /
    • 2003
  • This paper presents an analog secure communication system for safe speech transmission using chaotic signals. We modified chaotic synchronization and chaotic communication schemes to reflect conditions that arise in real communication environments, and analyzed the restoration performance of the speech signal through computer simulation. In the transmitter, a chaotic masking signal is produced by adding the voice signal to a chaotic signal using the PC (Pecora & Carroll) and SFB (self-feedback) control techniques, and the encrypted signal is transmitted over a noisy communication channel. To quantify restoration performance, we propose a definition of the analog average power of the recovered error signal in the receiver's chaotic system. The simulation results show that, with respect to masking degree, parameter sensitivity, and channel noise, the feedback control technique yields quantitatively better restoration performance than the PC method. We also experimentally tabulated the relation between parameter variation and restoration error rate, where the parameters serve as the encryption key values of the chaotic secure communication system.
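The masking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' PC/SFB scheme: here the receiver simply regenerates the chaotic carrier from a shared secret key (a logistic map, chosen for brevity), whereas the paper recovers the carrier by chaotic synchronization. All names and parameter values below are illustrative assumptions.

```python
import numpy as np

def chaotic_sequence(x0, r, n):
    """Generate n samples from the logistic map x -> r*x*(1-x).
    (x0, r) act as the shared secret key; for r near 4 the
    sequence is chaotic and looks noise-like to an eavesdropper."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return x

def mask(message, key, gain=10.0):
    """Transmitter: hide the message under a large chaotic carrier."""
    x0, r = key
    return message + gain * chaotic_sequence(x0, r, len(message))

def unmask(ciphertext, key, gain=10.0):
    """Receiver: regenerate the same carrier and subtract it."""
    x0, r = key
    return ciphertext - gain * chaotic_sequence(x0, r, len(ciphertext))

key = (0.4, 3.99)                          # shared secret
t = np.linspace(0.0, 1.0, 200)
speech = 0.1 * np.sin(2 * np.pi * 5 * t)   # toy "speech" signal
tx = mask(speech, key)
rx = unmask(tx, key)                       # exact recovery with the right key
```

Because the logistic map is sensitive to initial conditions, even a slightly wrong key (e.g. x0 = 0.41 instead of 0.4) yields a carrier that diverges within a few iterations, so the recovered signal is garbage.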

A Study for Acoustic Cues of Pyoung-An Do Dialect Using LPC (LPC를 이용한 평안방언의 음향지표에 관한 연구)

  • Song, Chul-Gyu;Lee, Myoung-Ho;Kim, Young-Bae
    • Journal of Biomedical Engineering Research
    • /
    • v.13 no.3
    • /
    • pp.195-200
    • /
    • 1992
  • This paper deals with the acoustic cues of the Pyoung-An Do dialect using linear prediction, and describes a statistical comparison between standard-tone speech data and Pyoung-An Do dialect speech. The analysis focuses mainly on the distribution of formants and pitch periods according to accent variation. For an objective comparison, the experiments extract formants from the LPC spectrum and pitch periods from average magnitude difference function (AMDF) waveforms. Summing up the results: in disyllabic words (VCV pattern), prepositioned vowels have a longer phonation time than postpositioned vowels, and the intrinsic phonation time is longer in the low vowels than in the high ones. The affricate consonants show mixed characteristics of the plosive and fricative consonants. The notable acoustic cues are the low-frequency noise-like waves just before the first formant in the plosive consonants and the high-frequency noise-like waves in the fricative consonants; phonation time is not affected by the kind of prepositioned or postpositioned vowel.
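The AMDF-based pitch extraction mentioned above has a simple generic form: the AMDF dips toward zero at lags equal to the pitch period. The sketch below is a textbook illustration, not the paper's implementation; the synthetic 100 Hz vowel-like signal and the pitch search range are assumptions.

```python
import numpy as np

def amdf(signal, max_lag):
    """Average magnitude difference function:
    D[k] = mean(|x[n] - x[n+k]|); D dips at multiples of the pitch period."""
    n = len(signal)
    return np.array([np.mean(np.abs(signal[:n - k] - signal[k:]))
                     for k in range(1, max_lag + 1)])

def pitch_period(signal, fs, f_lo=60.0, f_hi=400.0):
    """Estimate the pitch period (in samples) from the AMDF minimum,
    searching only lags corresponding to plausible pitch frequencies."""
    lag_min = int(fs / f_hi)
    lag_max = int(fs / f_lo)
    d = amdf(signal, lag_max)
    return lag_min + int(np.argmin(d[lag_min - 1:lag_max]))

fs = 8000
t = np.arange(int(0.05 * fs)) / fs
# toy voiced sound: 100 Hz fundamental plus one harmonic
voiced = np.sin(2 * np.pi * 100 * t) + 0.3 * np.sin(2 * np.pi * 200 * t)
period = pitch_period(voiced, fs)   # true period: fs / 100 = 80 samples
```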


A Study on Lip-reading Enhancement Using Time-domain Filter (시간영역 필터를 이용한 립리딩 성능향상에 관한 연구)

  • 신도성;김진영;최승호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.5
    • /
    • pp.375-382
    • /
    • 2003
  • Lip-reading based on bimodal input is used to enhance the speech recognition rate in noisy environments. Detecting the correct lip image is critical, but stable performance is hard to achieve in dynamic environments because many factors degrade lip-reading performance: illumination changes, speakers' pronunciation habits, the variability of lip shapes, and rotation or size changes of the lips. In this paper, we propose IIR filtering in the time domain to obtain stable performance; digital filtering in the time domain is well suited to removing noise and improving recognition performance. While lip-reading on the whole lip image produces massive amounts of data, Principal Component Analysis in the pre-processing stage reduces the data quantity by extracting features without loss of image information. To observe speech recognition performance using image information only, we ran a recognition experiment on 22 words usable in in-car services, comparing word recognition performance with Hidden Markov Models as the recognition algorithm. As a result, while the recognition rate of lip-reading using PCA alone is 64%, applying the time-domain filter raises the recognition rate to 72.4%.
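The two processing steps described above, PCA-based reduction of lip-image vectors followed by time-domain IIR filtering of the feature trajectories, can be sketched generically. The frame sizes, component count, and first-order filter coefficient below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def pca_reduce(frames, n_components):
    """Project flattened lip-image frames (one per row) onto the top
    principal components, shrinking the data per frame."""
    centered = frames - frames.mean(axis=0)
    # SVD of the centered data gives the principal axes in Vt
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def iir_smooth(features, a=0.6):
    """First-order IIR low-pass along time:
    y[n] = a*y[n-1] + (1-a)*x[n], damping frame-to-frame jitter."""
    out = np.empty_like(features)
    out[0] = features[0]
    for n in range(1, len(features)):
        out[n] = a * out[n - 1] + (1.0 - a) * features[n]
    return out

rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 16 * 8))   # 50 frames of toy 16x8 "lip images"
feats = pca_reduce(frames, n_components=6)
smoothed = iir_smooth(feats)
```

Smoothing the PCA trajectories rather than the raw images keeps the filter cheap: six coefficients per frame instead of 128 pixels.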

A study on the robust speaker recognition algorithm in noise surroundings (주변 잡음 환경에 강한 화자인식 알고리즘 연구)

  • Jung Jong-Soon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.10 no.6 s.38
    • /
    • pp.47-54
    • /
    • 2005
  • In most speaker recognition systems, the speaker's characteristics are extracted as acoustic parameters by speech analysis, from which a reference pattern for the speaker is built. The parameters used in a speaker recognition system should express the speaker's characteristics fully while varying little from utterance to utterance. To address this, this paper proposes using the spectrum characteristic, which is robust in noise-free conditions, together with prosodic information, which is robust in noisy conditions. In the codebook-building stage, we generate the required amount of data by combining the spectrum characteristic and the prosodic information, and we decide acceptance or rejection by comparing the distance between the test pattern and each model. As a result, we obtained a better recognition rate than when using the spectrum or prosodic information alone, and in particular a stable recognition rate in noisy conditions.
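The codebook-based accept/reject decision described above can be sketched with a simple vector-quantization speaker model: a k-means codebook is trained on a speaker's feature vectors, and a test utterance is accepted when its average distortion against the codebook falls below a threshold. The feature dimensions, codebook size, and toy Gaussian "speakers" below are illustrative assumptions.

```python
import numpy as np

def train_codebook(features, k, iters=20, seed=0):
    """Build a k-entry VQ codebook from a speaker's feature
    vectors with plain k-means."""
    rng = np.random.default_rng(seed)
    code = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword, then re-center
        d = np.linalg.norm(features[:, None] - code[None], axis=2)
        idx = d.argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                code[j] = features[idx == j].mean(axis=0)
    return code

def distortion(features, code):
    """Average distance from the test vectors to their nearest codeword."""
    d = np.linalg.norm(features[:, None] - code[None], axis=2)
    return d.min(axis=1).mean()

def verify(features, code, threshold):
    """Accept the claimed speaker if the average distortion is low enough."""
    return distortion(features, code) < threshold

rng = np.random.default_rng(1)
speaker_a = rng.normal(loc=0.0, size=(200, 12))   # toy cepstral features
speaker_b = rng.normal(loc=3.0, size=(200, 12))   # a different "speaker"
code_a = train_codebook(speaker_a, k=8)
```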


Signal Enhancement of a Variable Rate Vocoder with a Hybrid domain SNR Estimator

  • Park, Hyung Woo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.13 no.2
    • /
    • pp.962-977
    • /
    • 2019
  • The human voice is a convenient means of information transfer between parties: between people, between people and machines, and between machines. With the development of information and communication technology, the voice can be transferred farther than before. To communicate, the voice is converted into another form, transmitted, and then reconverted back into sound; in such a communication process, a vocoder performs this conversion and re-conversion of voice and sound. The CELP (Code-Excited Linear Prediction) vocoder, one of the voice codecs, has been adopted as a standard codec since it provides high-quality sound even though its transmission rate is relatively low. The EVRC (Enhanced Variable Rate CODEC) and QCELP (Qualcomm Code-Excited Linear Prediction), which are variable-bit-rate vocoders, are used for mobile phones in 3G environments. In the real-time implementation of a vocoder, reduced sound quality is a typical problem, and improving it requires knowing the size and shape of the noise. Existing sound-quality improvement methods either detect and exploit voice activity or apply statistical methods over large amounts of data; their disadvantage is that noise cannot be detected when the signal is continuous or when the noise changes greatly. This paper focused on finding a better way to limit the loss of sound quality in low-bit-rate transmission environments. Based on simulation results, this study proposes a preprocessor that estimates the SNR (Signal-to-Noise Ratio) using a spectral SNR estimation method; the estimator adopts IMBE (Improved Multi-Band Excitation) analysis rather than computing the SNR directly from the continuous speech signal. This preprocessor improves the quality of the vocoder by enhancing sound quality adaptively.
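SNR estimation of the general kind discussed above can be sketched with a simple frame-energy method: the quietest frames are assumed to be noise-only and define a noise floor, against which the louder frames give the signal power. This is a crude stand-in for the paper's IMBE-based spectral estimator; the frame length, floor fraction, and the synthetic half-silence test signal are assumptions.

```python
import numpy as np

def estimate_snr_db(noisy, frame_len=256, floor_fraction=0.1):
    """Rough SNR estimate from frame energies: treat the quietest
    frames as noise-only and the clearly louder ones as signal-plus-noise."""
    n_frames = len(noisy) // frame_len
    frames = noisy[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.mean(frames ** 2, axis=1)
    k = max(1, int(floor_fraction * n_frames))
    noise_power = np.sort(energies)[:k].mean()          # noise floor
    speech = energies[energies > 3.0 * noise_power]     # active frames
    if len(speech) == 0:
        return -np.inf
    signal_power = speech.mean() - noise_power
    return 10.0 * np.log10(signal_power / noise_power)

fs = 8000
t = np.arange(fs) / fs
noise = 0.1 * np.random.default_rng(2).normal(size=fs)  # noise throughout
tone = np.where(t >= 0.5, np.sin(2 * np.pi * 440 * t), 0.0)  # "speech" in 2nd half
snr_est = estimate_snr_db(tone + noise)
# true SNR over the active region: 10*log10(0.5 / 0.01) ≈ 17 dB
```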

Comparative Study on Acoustic Characteristics of Vocal Fold Paralysis and Benign Mucosal Disorders of Vocal Fold (성대마비와 양성 성대점막질환의 음향학적 특성비교)

  • Kong, Il-Seung;Cho, Young-Ju;Lee, Myung-Hee;Kim, Jong-Seung;Yang, Yun-Su;Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.18 no.2
    • /
    • pp.122-128
    • /
    • 2007
  • This study aims to analyze, from the perspective of acoustic phonetics, the voices of patients with voice disorders including vocal fold paralysis, vocal fold cyst, and vocal nodule/polyp, and to collect subsidiary acoustic data toward speech treatment and a standardization of vocal disorders. Subjects and Methods: The subjects were 64 adult patients who underwent indirect laryngoscopy and laryngostroboscopy and were diagnosed with vocal fold paralysis, vocal fold cyst, or vocal nodule/polyp. The experimental group consisted of 20 patients diagnosed with vocal fold paralysis; 21 patients diagnosed with vocal fold cyst, with an average age of 42.0 $({\pm}10.03)$; and 23 patients diagnosed with vocal nodule/polyp, with an average age of 40.9 $({\pm}13.75)$. Each patient was asked to sit in a comfortable position with the microphone 10 cm from the mouth and to phonate the vowel /e/ for the maximum phonation time with natural pitch and loudness, recording directly to a computer. The sampling rate was 44,100 Hz, and a 1-second segment from the stable zone of each patient's /e/ waveform, excluding its onset and offset, was analyzed. Results: First, there was no statistically significant difference in jitter and shimmer between vocal fold paralysis and vocal fold cyst, whereas the difference between vocal fold paralysis and vocal nodule/polyp was highly statistically significant. Second, in the mean values of the noise-related measures NNE, HNR, and SNR, the disease showing the most abnormal characteristics was vocal fold paralysis, followed by cyst and then nodule/polyp. For NNE, there was a statistically significant difference between vocal nodule/polyp and cyst or paralysis; that is, the NNE of vocal nodule/polyp was weaker than that of cyst or paralysis. HNR and SNR showed the same pattern: there was a statistically significant difference between vocal fold paralysis and vocal fold cyst or nodule/polyp, and the HNR and SNR values of vocal fold paralysis were lower than those of vocal fold cyst or nodule/polyp. Conclusion: For vocal fold paralysis, the abnormal values of the acoustic parameters associated with frequency, amplitude, and noise ratio were statistically significantly greater than those of vocal fold cyst and nodule/polyp, suggesting that the voices of patients with vocal fold paralysis are the most severely injured owing to less stable vocal fold movement, asymmetry, and incomplete glottic closure. In addition, there was no statistically significant difference in the acoustic tremor parameters among vocal fold paralysis, vocal fold cyst, and vocal nodule/polyp. Further studies are needed to identify suitable acoustic parameters for various vocal disorders and to clarify the correlation between acoustics-based objective tools and subjective evaluations.
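The jitter and shimmer measures used in this study have simple standard definitions: the mean absolute difference between consecutive cycle periods (or peak amplitudes), relative to the mean. The sketch below computes them from already-extracted period and amplitude sequences; the extraction step itself (done here by the analysis software) is assumed.

```python
import numpy as np

def jitter_percent(periods):
    """Relative jitter: mean absolute difference between consecutive
    glottal cycle periods, as a percentage of the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()

def shimmer_percent(amplitudes):
    """Relative shimmer: mean absolute difference between consecutive
    cycle peak amplitudes, as a percentage of the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amplitudes))) / amplitudes.mean()

steady = jitter_percent([10.0, 10.0, 10.0, 10.0])      # perfectly regular: 0%
perturbed = jitter_percent([10.0, 11.0, 10.0, 11.0])   # cycle-to-cycle variation
```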


The Effects of Changing the Respiratory Muscles and Acoustic Parameters on the Children With Spastic Cerebral Palsy (체간 조절을 통한 앉기 자세 교정이 경직형 뇌성마비 아동들의 호흡근과 음향학적 측정치들의 변화에 미치는 효과)

  • Kim, Sun-Hee;Ahn, Jong-Bok;Seo, Hye-Jung;Kwon, Do-Ha
    • Physical Therapy Korea
    • /
    • v.16 no.2
    • /
    • pp.16-23
    • /
    • 2009
  • The purpose of this study was to investigate the effects of postural changes on the respiratory muscles and acoustic parameters of children with spastic cerebral palsy. Nine children with spastic cerebral palsy who required assistance when walking, aged 6 to 9 years, were selected. The phonation of the sustained vowel /a/ and each child's voice qualities, such as fundamental frequency ($F_0$; Hz), pitch variation (Jitter; %), amplitude variation (Shimmer; %), and noise-to-harmonic ratio (NHR), were analyzed with the Multi-Dimensional Voice Program (MDVP). The activity of three major respiratory muscles, the pectoralis major, upper trapezius, and rectus abdominis, was measured as the root mean square (RMS) of the surface EMG to investigate the impact of changing each subject's adjusted sitting posture. Data were collected from each individual once before and once after the sitting-posture change and were analyzed with the Wilcoxon signed-ranks test using SPSS version 14.0 for Windows. The findings were as follows. First, the RMS of the upper trapezius and rectus abdominis muscles did not differ significantly between pre- and post-change, whereas the RMS of the pectoralis major showed a significant difference (p<.05). Second, there were no significant differences in $F_0$, Jitter, and Shimmer between pre- and post-change, but there was a significant difference in NHR (p<.05). From these results, it is concluded that adjusting the sitting posture reduces the abnormal respiratory patterns, characterized by hyperactive respiratory muscles during breathing, in children with spastic cerebral palsy, and that voice quality in these children improves.
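The RMS measure applied to the surface EMG above is a standard amplitude summary: the square root of the mean squared signal over a window. A minimal sliding-window sketch (the window length and the synthetic low/high-activity burst are illustrative assumptions):

```python
import numpy as np

def emg_rms(signal, window=100):
    """Sliding-window RMS of an EMG signal: sqrt(mean(x^2)) per
    window summarizes muscle-activity amplitude over time."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) // window
    frames = signal[:n * window].reshape(n, window)
    return np.sqrt(np.mean(frames ** 2, axis=1))

t = np.arange(1000) / 1000.0
# toy EMG: low-amplitude activity in the first half, tenfold in the second
burst = np.where(t < 0.5, 0.1, 1.0) * np.sin(2 * np.pi * 50 * t)
activity = emg_rms(burst)   # low RMS early, high RMS late
```

For a sinusoid of amplitude A measured over whole cycles, the RMS is A/√2, which the two halves of the toy signal reproduce.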


Characteristics of Noise Induced Hearing Loss of Fishermen Visiting a General Hospital (일개 종합병원을 방문한 어선원에서 발생한 소음성 난청의 특징)

  • You Sun Chung;Chang Hoi Kim
    • Journal of agricultural medicine and community health
    • /
    • v.48 no.1
    • /
    • pp.41-49
    • /
    • 2023
  • Objectives: To obtain basic audiologic data for diagnosing noise-induced hearing loss in fisheries workers. Methods: The charts of fishermen referred with noise-induced hearing loss to a general hospital from November 2022 to February 2023 were retrospectively reviewed. Pure-tone audiometry, speech audiometry, auditory brainstem response testing, and auditory steady-state response testing were conducted. Results: All subjects were men over 60 years of age; the average duration of noise exposure was 38.9 ± 10.8 years, and the average duration of hearing-loss symptoms was 13.4 ± 4.3 years. Although the hearing thresholds at high frequencies were higher than those at low frequencies, the audiograms showed a down-sloping pattern without a rebound at 8 kHz. 10.5% of the cases had thresholds greater than 75 dB at high frequencies, but 57.9% had thresholds greater than 40 dB at low frequencies. The fishermen's other hearing test results were similar to those of typical noise-induced hearing loss. Conclusions: Although the fishermen were exposed to noise for a long time, they recognized their hearing loss late, and their thresholds at the lower frequencies were higher than expected. Further studies are needed to analyze the audiologic characteristics of the fishermen's noise-induced hearing loss after confirming noise exposure through a survey of the working environment, such as noise levels and working hours.

CNN based dual-channel sound enhancement in the MAV environment (MAV 환경에서의 CNN 기반 듀얼 채널 음향 향상 기법)

  • Kim, Young-Jin;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.12
    • /
    • pp.1506-1513
    • /
    • 2019
  • Recently, as the industrial scope of multi-rotor unmanned aerial vehicles (UAVs) has greatly expanded, demand for data collection, processing, and analysis using UAVs is also increasing. However, acoustic data collected with a UAV is heavily corrupted by the UAV's motor noise and wind noise, which makes the data difficult to process and analyze. We therefore studied a method to enhance the target sound in the acoustic signal received through microphones attached to the UAV. In this paper, we extend the densely connected dilated convolutional network, an existing single-channel acoustic enhancement technique, to exploit the inter-channel characteristics of the acoustic signal. As a result, the extended model outperformed the existing model on all evaluation measures, including SDR, PESQ, and STOI.
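A dilated convolution, the building block of the network extended above, widens the receptive field by spacing the kernel taps d samples apart instead of adding taps. A minimal framework-independent 1-D sketch (not the paper's model):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal 1-D dilated convolution:
    y[n] = sum_k w[k] * x[n - k*dilation], zero-padded on the left.
    The receptive field is (len(w)-1)*dilation + 1 samples."""
    y = np.zeros_like(x, dtype=float)
    for k, wk in enumerate(w):
        shift = k * dilation
        if shift == 0:
            y += wk * x
        else:
            y[shift:] += wk * x[:-shift]
    return y

x = np.arange(8, dtype=float)          # toy input channel
w = np.array([1.0, -1.0])              # difference kernel
y1 = dilated_conv1d(x, w, dilation=1)  # adjacent-sample difference
y2 = dilated_conv1d(x, w, dilation=2)  # difference two samples apart
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) lets a small network see long noise contexts, which is why the architecture suits enhancement under broadband rotor noise.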

Decoding Brain States during Auditory Perception by Supervising Unsupervised Learning

  • Porbadnigk, Anne K.;Gornitz, Nico;Kloft, Marius;Muller, Klaus-Robert
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.2
    • /
    • pp.112-121
    • /
    • 2013
  • The last years have seen a rise of interest in using electroencephalography-based brain computer interfacing methodology for investigating non-medical questions, beyond the purpose of communication and control. One of these novel applications is to examine how signal quality is being processed neurally, which is of particular interest for industry, besides providing neuroscientific insights. As for most behavioral experiments in the neurosciences, the assessment of a given stimulus by a subject is required. Based on an EEG study on speech quality of phonemes, we will first discuss the information contained in the neural correlate of this judgement. Typically, this is done by analyzing the data along behavioral responses/labels. However, participants in such complex experiments often guess at the threshold of perception. This leads to labels that are only partly correct, and oftentimes random, which is a problematic scenario for using supervised learning. Therefore, we propose a novel supervised-unsupervised learning scheme, which aims to differentiate true labels from random ones in a data-driven way. We show that this approach provides a more crisp view of the brain states that experimenters are looking for, besides discovering additional brain states to which the classical analysis is blind.