• Title/Abstract/Keywords: Non-speech

468 search results (processing time: 0.026 seconds)

A Simple Speech/Non-speech Classifier Using Adaptive Boosting

  • Kwon, Oh-Wook;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 22, No. 3E
    • /
    • pp.124-132
    • /
    • 2003
  • We propose a new method for speech/non-speech classification based on the adaptive boosting (AdaBoost) algorithm, in order to detect speech for robust speech recognition. The method uses a combination of simple base classifiers through the AdaBoost algorithm and a set of optimized speech features combined with spectral subtraction. The key benefits of this method are simple implementation, low computational complexity, and avoidance of the over-fitting problem. We checked the validity of the method by comparing its performance with the speech/non-speech classifier used in a standard voice activity detector. For speech recognition purposes, additional performance improvements were achieved by adopting new features, including speech band energies and MFCC-based spectral distortion. For the same false alarm rate, the method reduced miss errors by 20-50%.
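The abstract does not specify the base classifiers or the feature set, so the following is only a minimal sketch of the general AdaBoost idea: weighted voting over one-dimensional threshold "stumps". The single per-frame feature, the toy data, and all function names are assumptions made for illustration, not the paper's implementation.

```python
import math

def train_stump(xs, ys, w):
    """Return (weighted_error, threshold, polarity) of the best threshold stump."""
    best = None
    for thr in sorted(set(xs)):
        for pol in (1, -1):
            preds = [pol if x >= thr else -pol for x in xs]
            err = sum(wi for wi, p, y in zip(w, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, pol)
    return best

def adaboost(xs, ys, rounds=5):
    """Train a list of (alpha, threshold, polarity) stumps; labels are +1 / -1."""
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, thr, pol = train_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, thr, pol))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * y * (pol if x >= thr else -pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Weighted vote of all stumps: +1 = speech, -1 = non-speech."""
    score = sum(a * (pol if x >= thr else -pol) for a, thr, pol in model)
    return 1 if score >= 0 else -1

# Hypothetical normalized frame energies: low = non-speech, high = speech.
features = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [-1, -1, -1, 1, 1, 1]
model = adaboost(features, labels)
```

Each stump is weak on its own; the weighted vote is what keeps the implementation simple and cheap, which matches the benefits the abstract claims.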

A Robust Non-Speech Rejection Algorithm

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 17, No. 1E
    • /
    • pp.10-13
    • /
    • 1998
  • We propose a robust non-speech rejection algorithm that uses three pitch-related parameters: (1) pitch range, (2) difference of the successive pitch range, and (3) the number of successive pitches satisfying constraints related to the previous two parameters. The acceptance rate of the speech commands was 95% on a speech database of 2,440 utterances at a signal-to-noise ratio (SNR) of -2.8 dB. In an office environment, the rejection rate of non-speech sounds was 100%, while the acceptance rate of speech commands was 97%.
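As a rough illustration of how three such pitch parameters could interact, the sketch below accepts a segment as speech when enough consecutive frames have a pitch inside an allowed range and change little from frame to frame. The thresholds, the function name, and the reading of parameter (2) as a frame-to-frame pitch difference are all assumptions; the abstract gives no numeric values.

```python
def is_speech(pitches_hz, lo=60.0, hi=400.0, max_delta=20.0, min_run=5):
    """Accept as speech if min_run successive pitch values are in [lo, hi]
    and each differs from the previous one by at most max_delta Hz.
    All thresholds are hypothetical."""
    run = 0
    prev = None
    for p in pitches_hz:
        in_range = lo <= p <= hi                                  # parameter (1)
        smooth = prev is None or abs(p - prev) <= max_delta       # parameter (2)
        if in_range and smooth:
            run += 1
        elif in_range:
            run = 1          # pitch jumped: start a new run at this frame
        else:
            run = 0          # out-of-range pitch breaks the run
        if run >= min_run:                                        # parameter (3)
            return True
        prev = p if in_range else None
    return False

voiced = [120, 122, 125, 123, 121, 119]      # smooth pitch track
erratic = [120, 300, 90, 350, 100, 380]      # in range but jumping
```

Here `is_speech(voiced)` accepts the smooth track, while `is_speech(erratic)` rejects the jumping one because no run of five stable frames is ever reached.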

Learners' Perceptions toward Non-speech Sounds Designed in e-Learning Contents

  • 김태현;나일주
    • 한국콘텐츠학회논문지
    • /
    • Vol. 10, No. 7
    • /
    • pp.470-480
    • /
    • 2010
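  • Although e-learning content includes a variety of auditory materials alongside visual materials, research on the design of auditory information in learning materials has been extremely limited. Given that non-speech sounds, one type of auditory information, can provide feedback to learners and prompt action immediately, their systematic design is needed. This study therefore used multidimensional scaling to explore empirically how learners perceive the non-speech sounds designed into e-learning content. Eleven representative non-speech sounds were selected from those designed into e-learning content provided by the Korea Education and Research Information Service. Sixty-six third-year students at University A were asked to rate the degree of similarity among the 11 non-speech sounds, and the results were represented in a multidimensional space. The results showed that learners distinguish non-speech sounds according to their length and the positive or negative mood they convey.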

Annotation of a Non-native English Speech Database by Korean Speakers

  • Kim, Jong-Mi
    • 음성과학
    • /
    • Vol. 9, No. 1
    • /
    • pp.111-135
    • /
    • 2002
  • An annotation model of a non-native speech database has been devised, wherein English is the target language and Korean is the native language. The proposed annotation model features overt transcription of predictable linguistic information in native speech by the dictionary entry and several predefined types of error specification found in native language transfer. The proposed model is, in that sense, different from previously explored annotation models in the literature, most of which are based on native speech. The validity of the newly proposed model is revealed in its consistent annotation of 1) salient linguistic features of English, 2) contrastive linguistic features of English and Korean, 3) actual errors reported in the literature, and 4) the newly collected data in this study. The annotation method in this model adopts two widely accepted conventions, the Speech Assessment Methods Phonetic Alphabet (SAMPA) and the TOnes and Break Indices (ToBI). In the proposed annotation model, SAMPA is exclusively employed for segmental transcription and ToBI for prosodic transcription. The annotation of non-native speech is used to assess the speaking ability of English as a Foreign Language (EFL) learners.

How Korean Learner's English Proficiency Level Affects English Speech Production Variations

  • Hong, Hye-Jin;Kim, Sun-Hee;Chung, Min-Hwa
    • 말소리와 음성과학
    • /
    • Vol. 3, No. 3
    • /
    • pp.115-121
    • /
    • 2011
  • This paper examines how L2 speech production varies according to the learner's L2 proficiency level. L2 speech production variations are analyzed by quantitative measures at the word and phone levels using a corpus of English spoken by Korean learners. Word-level variations are analyzed using correctness to explain how speech realizations differ from the canonical forms, while accuracy is used for analysis at the phone level to reflect phone insertions and deletions together with substitutions. The results show that the speech production of learners with different L2 proficiency levels differs considerably in terms of performance and individual realizations at the word and phone levels. These results confirm that the speech production of non-native speakers varies according to their L2 proficiency levels, even though they share the same L1 background. Furthermore, they will contribute to improving the non-native speech recognition performance of ASR-based English language education systems for Korean learners of English.
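The abstract does not spell out how correctness and accuracy are computed. A common convention in speech recognition scoring (for example, HTK-style results) is that correctness ignores insertions while accuracy also penalizes them; the sketch below assumes that convention, and the error counts are made up for illustration.

```python
def correctness(n_ref, subs, dels):
    """Fraction of reference units realized correctly (insertions ignored)."""
    return (n_ref - subs - dels) / n_ref

def accuracy(n_ref, subs, dels, ins):
    """Like correctness, but insertions are also counted as errors."""
    return (n_ref - subs - dels - ins) / n_ref

# Hypothetical counts: 10 canonical phones, 1 substitution, 1 deletion, 2 insertions.
corr = correctness(10, subs=1, dels=1)
acc = accuracy(10, subs=1, dels=1, ins=2)
```

With these counts, correctness is 0.8 and accuracy is 0.6, which shows why accuracy is the stricter measure when learners insert extra phones.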

Current Status and Perspectives of Telepractice in Voice and Speech Therapy

  • 이승진
    • 대한후두음성언어의학회지
    • /
    • Vol. 33, No. 3
    • /
    • pp.130-141
    • /
    • 2022
  • Voice and speech therapy can be performed in various ways depending on the situation, although it is generally performed face to face. Telepractice refers to the provision of specialized voice and speech therapy by speech-language pathologists for assessment, therapy, and counseling by applying telecommunication technology from a remote location. Recently, due to the pandemic and the active use of non-face-to-face platforms, interest in telepractice of voice and speech therapy has increased. Moreover, a growing body of literature has been advocating its clinical usefulness and non-inferiority to traditional face-to-face intervention. In this review, the existing discussions, guidelines, and preliminary studies on non-face-to-face voice and speech therapy were summarized, and recommendations on the tools for telepractice were provided.

A Low Bit Rate Speech Coder Based on the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 15, No. 4
    • /
    • pp.300-304
    • /
    • 2015
  • A low bit rate speech coder based on a non-uniform sampling technique is proposed. The non-uniform sampling technique is based on the detection of inflection points (IP). A speech block is processed by the IP detector, and the detected IP pattern is compared with entries of the IP database. The address of the closest member of the database is transmitted with the energy of the speech block. In the receiver, the decoder reconstructs the speech block using the received address and the energy information of the block. As a result, the coder has a fixed data rate, unlike existing speech coders based on non-uniform sampling. The usefulness of the proposed technique is shown through computer simulation. The SNR performance of the proposed method is approximately 5.27 dB at a data rate of 1.5 kbps.
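The abstract does not describe the IP detector itself. One simple way to locate inflection points in a sampled signal is to look for sign changes in the discrete second difference; the sketch below assumes that approach and covers only the detection step, not the database lookup or the decoder. The function name and test signal are illustrative.

```python
def inflection_points(x):
    """Return indices of samples at which the sign of the discrete second
    difference x[i-1] - 2*x[i] + x[i+1] has flipped relative to the last
    non-zero sign. Zero curvature samples are skipped."""
    points = []
    last_sign = 0
    for i in range(1, len(x) - 1):
        d2 = x[i - 1] - 2 * x[i] + x[i + 1]
        sign = (d2 > 0) - (d2 < 0)
        if sign != 0:
            if last_sign != 0 and sign != last_sign:
                points.append(i)     # first sample with the new curvature sign
            last_sign = sign
    return points

# A sampled cubic changes from concave to convex around its middle sample.
cubic = [t ** 3 for t in range(-3, 4)]
ips = inflection_points(cubic)
```

Because the detected points depend on the signal's shape rather than on a fixed clock, the resulting sampling is non-uniform, which is the property the coder builds on.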

Correlation analysis of linguistic factors in non-native Korean speech and proficiency evaluation

  • 양승희;정민화
    • 말소리와 음성과학
    • /
    • Vol. 9, No. 3
    • /
    • pp.49-56
    • /
    • 2017
  • Much research attention has been directed to identifying how native speakers perceive non-native speakers' oral proficiency. To investigate the generalizability of previous findings, this study examined segmental, phonological, accentual, and temporal correlates of native speakers' evaluation of the proficiency of L2 Korean speech produced by learners of various levels and nationalities. Our experimental results show that proficiency ratings by native speakers correlate significantly not only with rate of speech, but also with segmental accuracy. Segmental errors show the highest correlation with the proficiency of L2 Korean speech. We further verified this finding using substitution, deletion, and insertion error rates. Although phonological accuracy was expected to be highly correlated with the proficiency score, it was the least influential measure. Another new finding in this study is that the role of pitch and accent has so far been underemphasized in studies of non-native Korean speech perception. This work will serve as the groundwork for the development of an automatic assessment module in a Korean CAPT system.

Effect of Music Training on Categorical Perception of Speech and Music

  • L., Yashaswini;Maruthy, Sandeep
    • Journal of Audiology & Otology
    • /
    • Vol. 24, No. 3
    • /
    • pp.140-148
    • /
    • 2020
  • Background and Objectives: The aim of this study is to evaluate the effect of music training on the characteristics of auditory perception of speech and music. The perception of speech and music stimuli was assessed across their respective stimulus continua, and the resultant plots were compared between musicians and non-musicians. Subjects and Methods: Thirty musicians with formal music training and twenty-seven non-musicians participated in the study (age: 20 to 30 years). They were assessed for identification of consonant-vowel syllables (/da/ to /ga/), vowels (/u/ to /a/), vocal music notes (/ri/ to /ga/), and instrumental music notes (/ri/ to /ga/) across their respective stimulus continua. Each continuum contained 15 tokens with equal step size between adjacent tokens. The resultant identification scores were plotted against each token and analyzed for the presence of a categorical boundary. If a categorical boundary was found, the plots were analyzed using six parameters of categorical perception: the point of 50% crossover, lower edge of the categorical boundary, upper edge of the categorical boundary, phoneme boundary width, slope, and intercepts. Results: Overall, the results showed that both speech and music are perceived differently by musicians and non-musicians. In musicians, both speech and music are perceived categorically, while in non-musicians, only speech is perceived categorically. Conclusions: The findings of the present study indicate that music is perceived categorically by musicians, even if the stimulus is devoid of vocal tract features. The findings support the view that categorical perception is strongly influenced by training; the results are discussed in light of the motor theory of speech perception.
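The abstract names the 50% crossover point but not how it was estimated. A minimal sketch, assuming simple linear interpolation between the two adjacent tokens whose identification scores straddle 50% (a fitted psychometric curve would be a common alternative), with hypothetical scores:

```python
def crossover_50(scores):
    """Return the (fractional) token position, counted from 1, where the
    identification score crosses 50%, or None if it never does.
    scores[i] is the percentage of 'category A' responses for token i+1."""
    for i in range(len(scores) - 1):
        a, b = scores[i], scores[i + 1]
        if a != b and (a - 50) * (b - 50) <= 0:
            # Linear interpolation between token i+1 and token i+2.
            return (i + 1) + (50 - a) / (b - a)
    return None

# Hypothetical identification scores along a 7-token continuum.
scores = [100, 100, 90, 70, 30, 10, 0]
boundary = crossover_50(scores)
```

For these made-up scores the boundary falls midway between tokens 4 and 5; a flat identification function with no crossing returns `None`, which corresponds to the case where no categorical boundary is found.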
