• Title/Summary/Keyword: Non-speech

Relationship between Speech Perception in Noise and Phonemic Restoration of Speech in Noise in Individuals with Normal Hearing

  • Vijayasarathy, Srikar; Barman, Animesh
    • Korean Journal of Audiology / v.24 no.4 / pp.167-173 / 2020
  • Background and Objectives: Top-down restoration of distorted speech, tapped as phonemic restoration of speech in noise, may be a useful tool for understanding the robustness of perception in adverse listening situations. However, the relationship between phonemic restoration and speech perception in noise is not empirically clear. Subjects and Methods: Twenty adults (40-55 years) with normal audiometric findings took part in the study. Sentence perception in noise was studied at various signal-to-noise ratios (SNRs) to estimate the SNR yielding a 50% score. Performance was also measured for sentences interrupted with silence and for sentences interrupted by speech noise at -10, -5, 0, and 5 dB SNRs. The score in the silence-interruption condition was subtracted from the score in the noise-interruption condition to determine the phonemic restoration magnitude. Results: Fairly robust improvements in speech intelligibility were found when the sentences were interrupted with speech noise instead of silence. The improvement with increasing noise level was non-monotonic and reached a maximum at -10 dB SNR. A significant correlation was found between speech perception in noise performance and phonemic restoration of sentences interrupted with -10 dB SNR speech noise. Conclusions: It is possible that perception of speech in noise is associated with top-down processing of speech, tapped as phonemic restoration of interrupted speech. More research with a larger sample size is indicated, since restoration is affected by the type of speech material and noise used, age, working memory, and linguistic proficiency, and shows large individual variability.
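
The restoration measure described in this abstract is a simple difference score; a minimal sketch (the numbers are illustrative, not the study's data):

```python
def restoration_magnitude(noise_fill_score, silence_fill_score):
    """Phonemic restoration magnitude: intelligibility (percent correct) with
    noise-filled interruptions minus intelligibility with silent interruptions."""
    return noise_fill_score - silence_fill_score

# e.g. 68% correct with -10 dB SNR noise fill vs. 45% with silent gaps
print(restoration_magnitude(68.0, 45.0))  # → 23.0
```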

Comparison of Acoustic Characteristics between Seoul and Busan Dialect on Fricatives (서울 방언과 부산 방언의 마찰음에 대한 음향학적 특성 비교)

  • Lee, Kyung-Hee
    • Speech Sciences / v.9 no.3 / pp.223-235 / 2002
  • Unlike in the Seoul dialect, /ㅅ/ and /ㅆ/ are phonemically non-distinctive in the Busan dialect; the realization of tensing is non-productive, whereas that of voicing is productive. To discover the causes of these characteristics of the Busan dialect, this paper first compared the acoustic characteristics of the fricatives /ㅅ/ and /ㅆ/ in the Seoul and Busan dialects. The results showed that the Busan dialect has much shorter friction and aspiration intervals in word-initial and word-medial position than the Seoul dialect. These results are expected to be important keys to discovering the causes of the characteristics of the Busan dialect noted above, namely the non-distinction of /ㅅ/ and /ㅆ/, the non-productivity of tensing, and the productivity of voicing.


Weighted Finite State Transducer-Based Endpoint Detection Using Probabilistic Decision Logic

  • Chung, Hoon; Lee, Sung Joo; Lee, Yun Keun
    • ETRI Journal / v.36 no.5 / pp.714-720 / 2014
  • In this paper, we propose the use of data-driven probabilistic utterance-level decision logic to improve Weighted Finite State Transducer (WFST)-based endpoint detection. In general, endpoint detection is dealt with using two cascaded decision processes. The first process is frame-level speech/non-speech classification based on statistical hypothesis testing, and the second process is a heuristic-knowledge-based utterance-level speech boundary decision. To handle these two processes within a unified framework, we propose a WFST-based approach. However, a WFST-based approach has the same limitations as conventional approaches in that the utterance-level decision is based on heuristic knowledge and the decision parameters are tuned sequentially. Therefore, to obtain decision knowledge from a speech corpus and optimize the parameters at the same time, we propose the use of data-driven probabilistic utterance-level decision logic. The proposed method reduces the average detection failure rate by about 14% for various noisy-speech corpora collected for an endpoint detection evaluation.
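
The two cascaded decision processes the abstract mentions can be illustrated with a toy heuristic version (frame-level thresholding plus a hangover-style utterance-level boundary decision); this is a sketch of the conventional baseline the paper improves on, not the WFST formulation itself, and all parameter names are assumptions:

```python
def detect_endpoints(p_speech, thresh=0.5, min_speech=3, min_sil=5):
    """Stage 1: frame-level speech/non-speech decisions by thresholding
    per-frame speech probabilities.  Stage 2: heuristic utterance-level
    boundary decision -- start after `min_speech` consecutive speech frames,
    end after `min_sil` consecutive non-speech frames."""
    labels = [p > thresh for p in p_speech]
    start = end = None
    run = 0
    for i, is_speech in enumerate(labels):
        if start is None:
            run = run + 1 if is_speech else 0
            if run >= min_speech:
                start = i - min_speech + 1   # utterance onset (inclusive)
                run = 0
        else:
            run = run + 1 if not is_speech else 0
            if run >= min_sil:
                end = i - min_sil + 1        # utterance offset (exclusive)
                break
    if start is not None and end is None:
        end = len(labels)
    return start, end
```

Replacing the hand-set `thresh`/`min_speech`/`min_sil` values with parameters learned jointly from a corpus is the kind of sequential tuning the proposed data-driven probabilistic decision logic avoids.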

Speech Denoising via Low-Rank and Sparse Matrix Decomposition

  • Huang, Jianjun; Zhang, Xiongwei; Zhang, Yafei; Zou, Xia; Zeng, Li
    • ETRI Journal / v.36 no.1 / pp.167-170 / 2014
  • In this letter, we propose an unsupervised framework for speech noise reduction based on the recent development of low-rank and sparse matrix decomposition. The proposed framework directly separates the speech signal from noisy speech by decomposing the noisy speech spectrogram into three submatrices: the noise structure matrix, the clean speech structure matrix, and the residual noise matrix. Evaluations on the Noisex-92 dataset show that the proposed method achieves a signal-to-distortion ratio approximately 2.48 dB and 3.23 dB higher than that of the robust principal component analysis method and the non-negative matrix factorization method, respectively, when the input SNR is -5 dB.
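
The decomposition idea can be sketched with a generic robust-PCA-style alternation (singular-value thresholding for the low-rank part, entrywise soft-thresholding for the sparse part); this illustrates the underlying technique only, not the paper's exact three-matrix algorithm, and `lam`/`mu` are assumed regularization parameters:

```python
import numpy as np

def lowrank_sparse_split(M, lam, mu, iters=100):
    """Alternately shrink singular values (low-rank part L) and entries
    (sparse part S) so that M ~ L + S + small residual.  Applied to a
    noisy-speech magnitude spectrogram, L would capture structured noise
    and S the comparatively sparse speech activity."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(iters):
        # low-rank update: singular value thresholding of M - S
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # sparse update: entrywise soft-thresholding of M - L
        R = M - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
    return L, S
```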

A Single Channel Speech Enhancement for Automatic Speech Recognition

  • Lee, Jinkyu; Seo, Hyunson; Kang, Hong-Goo
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2011.07a / pp.85-88 / 2011
  • This paper describes a single-channel speech enhancement method used as a pre-processor for an automatic speech recognition system. The improvements are based on using the optimally modified log-spectral amplitude (OM-LSA) gain function with a non-causal a priori signal-to-noise ratio (SNR) estimate. Experimental results show that the proposed method gives a better perceptual evaluation of speech quality (PESQ) score, a lower log-spectral distance, and better word accuracy. In the enhancement system, the parameters were tuned for automatic speech recognition.
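
The a priori SNR estimation the abstract refers to builds on the classic decision-directed recursion; below is a minimal causal sketch (the paper's estimator is non-causal, and the variable names and the simple Wiener gain are stand-ins, not the OM-LSA gain function itself):

```python
import numpy as np

def decision_directed_snr(noisy_pow, noise_pow, prev_enhanced_pow, alpha=0.98):
    """Per-frequency a priori SNR estimate for one frame: a weighted mix of
    the previous frame's enhanced-speech SNR and the current instantaneous
    estimate max(gamma - 1, 0), where gamma is the a posteriori SNR."""
    gamma = noisy_pow / np.maximum(noise_pow, 1e-12)
    xi = (alpha * prev_enhanced_pow / np.maximum(noise_pow, 1e-12)
          + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
    return xi

def wiener_gain(xi):
    """Simple Wiener gain from the a priori SNR (stand-in for OM-LSA)."""
    return xi / (1.0 + xi)
```

The enhanced spectrum is then the gain applied to the noisy spectrum, frame by frame.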


Development of English Speech Recognizer for Pronunciation Evaluation (발성 평가를 위한 영어 음성인식기의 개발)

  • Park Jeon Gue; Lee June-Jo; Kim Young-Chang; Hur Yongsoo; Rhee Seok-Chae; Lee Jong-Hyun
    • Proceedings of the KSPS conference / 2003.10a / pp.37-40 / 2003
  • This paper presents preliminary results on automatic pronunciation scoring for non-native English speakers and describes the development of an English speech recognizer for educational and evaluation purposes. The proposed recognizer, featuring two refined acoustic model sets, implements noise-robust data compensation, phonetic alignment, highly reliable rejection, keyword and phrase detection, an easy-to-use language modeling toolkit, etc. The developed recognizer achieves an average correlation of 0.725 between the human raters' scores and the machine scores, based on the YOUTH speech database for training and K-SEC for testing.


How Different are Learner Speech and Loanword Phonology?

  • Kim, Jong-Mi
    • Phonetics and Speech Sciences / v.1 no.3 / pp.3-18 / 2009
  • Do loanword properties emerge in the acquisition of a foreign language, and if so, how? Classic studies in adult language learning assumed loanword properties ranging from near-ceiling to near-chance levels of appearance, depending on speech proficiency. The present research argues that such variation reflects different phonological types rather than speech proficiency. To investigate the difference between learner speech and loanword phonology, the current research analyzes speech data from five proficiency levels of 92 Korean speakers who read 19 pairs of English words and sentences containing loanwords. The experimental method is primarily acoustic, by which a phonological cause present in the loanwords (e.g., the insertion of [$\Box$] at the end of the word stamp) is tested for its appearance in learner speech, in comparison with native speech from 11 English speakers and 11 Korean speakers. The data investigated are cases of segment deletion, insertion, substitution, and alternation in both learner speech and native speech. The results indicate that learner speech does not present the loanword properties in many cases, but depends on the type of phonological cause. Relatively easy acquisition of the target pronunciation is evidenced in cases of segment deletion, insertion, substitution, and alternation, except when the loanword property involves successful command of the target phonology, such as the de-aspiration of [p] in apple. Such cases of difficult learning draw a sharp distinction from the cases of easy learning in the development of learner speech, particularly beyond the intermediate level of proficiency. Overall, learner speech departs from loanword phonology and develops toward the native-speech value, depending on the phonological contrasts between the native and foreign languages.


Robust Histogram Equalization Using Compensated Probability Distribution

  • Kim, Sung-Tak; Kim, Hoi-Rin
    • MALSORI / v.55 / pp.131-142 / 2005
  • A mismatch between training and test conditions often causes a drastic decrease in the performance of speech recognition systems. In this paper, non-linear transformation techniques based on histogram equalization in the acoustic feature space are studied for reducing this mismatch. The purpose of histogram equalization (HEQ) is to convert the probability distribution of the test speech into the probability distribution of the training speech. Conventional histogram equalization methods consider only the probability distribution of the test speech; however, for noise-corrupted test speech, this distribution is itself distorted. The transformation function obtained from the distorted distribution may mis-transform feature vectors, which degrades the performance of histogram equalization. Therefore, this paper proposes a new method of calculating a noise-removed probability distribution, based on the assumption that the CDF of noisy speech feature vectors consists of a component from speech feature vectors and a component from noise feature vectors; this compensated probability distribution is then used in the HEQ process. In the AURORA-2 framework, the proposed method reduced the error rate by over 44% in the clean training condition compared to the baseline system. In the multi-condition training setting, the proposed method is also better than the baseline system.
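
Plain HEQ, which the proposed compensation builds on, can be sketched as quantile mapping from the empirical test CDF to the inverse training CDF; this illustrates the conventional baseline only, not the paper's noise-component compensation, and the array names are assumptions:

```python
import numpy as np

def histogram_equalize(test_feat, train_feat):
    """Map each test feature value through the empirical test CDF and then
    the inverse training CDF (quantile function), so that the equalized
    test features follow the training distribution."""
    order = np.sort(test_feat)
    cdf = np.searchsorted(order, test_feat, side="right") / len(test_feat)
    return np.quantile(train_feat, np.clip(cdf, 0.0, 1.0))
```

As the abstract notes, if `test_feat` is noise-corrupted, this test CDF is itself distorted, which is what motivates compensating it before the mapping.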


A STUDY ON SPEECH PROBLEMS IN PATIENTS WITH VELOPHARYNGEAL INCOMPETENCY (연구개(軟口蓋) 인두간(咽頭間) 폐쇄부전(閉鎖不全)(Velopharyngeal Incompetency) 환자(患者)에 있어서 발음(發音) 장애(障碍)에 관한 연구(硏究))

  • Choi, Jin-Young; Min, Byoung-il
    • Maxillofacial Plastic and Reconstructive Surgery / v.14 no.1_2 / pp.22-39 / 1992
  • The purpose of this study was to evaluate hypernasality, nasal air emission, glottal stops, and articulation disorders in patients with velopharyngeal incompetency (V.P.I.) and to analyze speech improvement after pharyngoplasty. Sixty-one patients with velopharyngeal incompetency were tested, and in the patients who underwent pharyngoplasty, speech problems before pharyngoplasty were compared with those after. The results obtained are as follows: 1. There are few speech problems in pronouncing vowel sounds. 2. There are many speech problems in pronouncing pressure sounds and few in non-pressure sounds. 3. Speech problems in patients with cleft palate are influenced not by the anatomical defect but by the severity of velopharyngeal incompetence after palatorrhaphy. 4. Operation methods that decrease velopharyngeal incompetence must be considered to reduce speech problems. 5. Among the 61 cases with V.P.I., 19 cases (31%) showed nasal air emission and 24 cases (39%) showed glottal stops. 6. Pharyngoplasty benefits primary precipitating components such as hypernasality and nasal air emission, but not secondary compensating components such as glottal stops. 7. There was no significant difference in speech improvement between pre- and post-pharyngoplasty (p<0.05).


A Study of Articulation Abilities and Feeding Functions on the Children With Spastic Cerebral Palsy (경직형 뇌성마비 아동의 조음능력 관련 변인 연구)

  • Kim, Sun-Hee
    • Therapeutic Science for Rehabilitation / v.9 no.1 / pp.79-90 / 2020
  • Objective : The purpose of this study was to investigate age, motor function, and characteristics of feeding and articulation in children with spastic cerebral palsy. Methods : This study included 34 boys and 27 girls who lived in Seoul, Gyeonggi, Daegu, Daejeon, or Busan and had been diagnosed with cerebral palsy by rehabilitation doctors. For these 61 children, verbal intelligibility and diadochokinetic speed and intelligibility were set as dependent variables, and multiple regression analysis was performed. Results : Verbal intelligibility, among the articulation abilities of children with spastic cerebral palsy, was significantly associated with the non-speech mechanisms of cup drinking and spoon feeding. Diadochokinetic speed was significantly associated with the speech mechanism of porridge feeding, the non-speech mechanism of cup drinking, and motor function. Diadochokinetic intelligibility also showed significant associations with the speech mechanism of porridge feeding and the non-speech mechanism of cup drinking. Conclusion : When evaluating and treating the variables related to articulation ability in children with spastic cerebral palsy, it is useful to consider speech and non-speech mechanisms and motor function according to feeding function. In addition, it would be very useful to study the improvement of articulation abilities after developing and applying a feeding program for children with cerebral palsy.