• Title/Summary/Keyword: Acoustic Features


Acoustic Cues in Spoken French for the Pronunciation Assessment Multimedia System (발음평가용 멀티미디어 시스템 구현을 위한 구어 프랑스어의 음향학적 단서)

  • Lee, Eun-Yung; Song, Mi-Young
    • Speech Sciences / v.12 no.3 / pp.185-200 / 2005
  • The objective of this study is to examine acoustic cues in spoken French for the pronunciation assessment needed to realize the multimedia system. The corpus is composed of simple expressions that cover the French phonological system, including all phonemes. The experiment was conducted with 4 French native speakers (male and female) and 20 Korean speakers, university students who had learned French for more than two years. We analyzed the recordings with a spectrograph and expressed the comparative features as numerical values. First, we computed the mean and deviation of all phonemes, and then selected features that showed a high error frequency and large differences between French and Korean pronunciations. The selected data were simplified and compared. After judging, in articulatory and auditory terms, whether each Korean speaker's pronunciation problems were utterance mistakes or mother-tongue interference, we sought acoustic features that were as simple as possible. From this experiment, we extracted acoustic cues for the construction of a French pronunciation training system.

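The selection step described in this abstract (compute per-phoneme means and deviations, then keep the features where learners diverge most from natives) is easy to sketch. A minimal Python illustration, assuming per-phoneme formant measurements are already in hand; the phoneme /y/, the numbers, and the 2-sigma threshold are all hypothetical:

```python
import numpy as np

# Hypothetical per-phoneme measurements (Hz): rows = speakers, columns = F1, F2.
# In the study such values came from spectrographic analysis.
native = {"y": np.array([[250, 1800], [260, 1850], [240, 1900], [255, 1820]])}
learner = {"y": np.array([[300, 1400], [320, 1500], [310, 1350],
                          [290, 1450], [330, 1550]])}

def candidate_cues(native, learner, threshold=2.0):
    """Flag phoneme features whose learner mean deviates from the native
    mean by more than `threshold` native standard deviations."""
    cues = []
    for phoneme, nat in native.items():
        mean, std = nat.mean(axis=0), nat.std(axis=0, ddof=1)
        z = np.abs(learner[phoneme].mean(axis=0) - mean) / std
        cues += [(phoneme, f"F{i + 1}", round(float(v), 1))
                 for i, v in enumerate(z) if v > threshold]
    return cues

print(candidate_cues(native, learner))  # [('y', 'F1', 6.9), ('y', 'F2', 9.0)]
```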

An Analysis of Acoustic Features Caused by Articulatory Changes for Korean Distant-Talking Speech

  • Kim Sunhee; Park Soyoung; Yoo Chang D.
    • The Journal of the Acoustical Society of Korea / v.24 no.2E / pp.71-76 / 2005
  • Compared to normal speech, distant-talking speech is characterized by acoustic effects due to interfering sounds and echoes, as well as by articulatory changes resulting from the speaker's effort to be more intelligible. In this paper, the acoustic features of distant-talking speech due to articulatory changes are analyzed and compared with those of the Lombard effect. To examine the effect of different distances and articulatory changes, speech recognition experiments were conducted on normal speech as well as on distant-talking speech at different distances using HTK. The speech data consist of 4,500 distant-talking utterances and 4,500 normal utterances from 90 speakers (56 males and 34 females). The acoustic features selected for analysis were duration, formants (F1 and F2), fundamental frequency, total energy, and energy distribution. The results show that the acoustic-phonetic features of distant-talking speech correspond largely to those of Lombard speech: the main acoustic changes between normal and distant-talking speech are an increase in vowel duration, a shift in the first and second formants, an increase in fundamental frequency, an increase in total energy, and a shift in energy from the low frequency band to the middle or high bands.
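
A hedged sketch of the kind of measurements compared in this study, using librosa rather than the authors' toolchain; the 16 kHz rate, pitch range, band edges, and file names are assumptions:

```python
import numpy as np
import librosa

def acoustic_profile(path):
    """Duration, mean F0, total energy, and band energy distribution:
    the features this study compares across recording distances."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    power = np.abs(librosa.stft(y)) ** 2
    freqs = librosa.fft_frequencies(sr=sr)
    total = power.sum()
    bands = {"low": (0, 1000), "mid": (1000, 4000), "high": (4000, sr / 2)}
    ratio = {name: float(power[(freqs >= lo) & (freqs < hi)].sum() / total)
             for name, (lo, hi) in bands.items()}
    return {"duration_s": len(y) / sr,
            "mean_f0_hz": float(np.nanmean(f0[voiced])) if voiced.any() else None,
            "total_energy": float(total),
            "band_energy_ratio": ratio}

# Comparing paired recordings would surface the shifts reported above
# (longer vowels, higher F0, energy moving to mid/high bands), e.g.:
# print(acoustic_profile("normal.wav"), acoustic_profile("distant.wav"))
```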

Comparison of Classification Performance Between Adult and Elderly Using Acoustic and Linguistic Features from Spontaneous Speech (자유대화의 음향적 특징 및 언어적 특징 기반의 성인과 노인 분류 성능 비교)

  • SeungHoon Han; Byung Ok Kang; Sunghee Dong
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.365-370 / 2023
  • This paper compares the performance of classifying speech data into two groups, adult and elderly, based on acoustic and linguistic characteristics that change with aging, such as changes in respiratory patterns, phonation, pitch, frequency, and language expression ability. For acoustic features, we used attributes related to the frequency, amplitude, and spectrum of the speech signal. For linguistic features, we extracted hidden-state vector representations containing contextual information from transcriptions of the utterances using KoBERT, a Korean pre-trained language model that has shown excellent performance in natural language processing tasks. The classification performance of models trained on acoustic and on linguistic features was evaluated, and the F1 scores of each model for the two classes were examined after addressing the class imbalance problem by down-sampling. The experimental results show that linguistic features classify adult and elderly speakers better than acoustic features do, and that even when the class proportions were equal, classification performance for adults was higher than for the elderly.
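
The evaluation protocol (down-sample the majority class, then report per-class F1) can be sketched with scikit-learn. This is a structural illustration on synthetic vectors, not the paper's KoBERT or acoustic pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Placeholder feature matrix: rows = utterances, columns = acoustic (or
# KoBERT hidden-state) features; adults (0) heavily outnumber elderly (1).
X = rng.normal(size=(1200, 32))
y = np.array([0] * 1000 + [1] * 200)
X[y == 1] += 0.5  # give the elderly class a separable shift for the demo

# Down-sample the majority class so both classes are equally represented.
adult_idx = resample(np.where(y == 0)[0], n_samples=200, replace=False,
                     random_state=0)
keep = np.concatenate([adult_idx, np.where(y == 1)[0]])
X_bal, y_bal = X[keep], y[keep]

X_tr, X_te, y_tr, y_te = train_test_split(X_bal, y_bal, test_size=0.2,
                                          stratify=y_bal, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Per-class F1, as reported in the paper (average=None gives one score per class).
f1 = f1_score(y_te, clf.predict(X_te), average=None)
print({"adult": round(f1[0], 3), "elderly": round(f1[1], 3)})
```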

Acoustic Analysis of the Aging Voice: Focused on Baby Voice (음성 연령에 대한 음향학적 분석: 동음을 중심으로)

  • Kim, Ji-Chae; Han, Ji-Yeon; Jeong, Ok-Ran
    • Proceedings of the KSPS conference / 2006.11a / pp.127-130 / 2006
  • The purpose of this study is to examine the difference in acoustic features between Younger Voices and Aged Voices that actually come from the same age group. Twelve female subjects in their thirties participated and recorded a sustained vowel /a/, connected speech, and reading. Their voices were divided into Younger Voices and Aged Voices, that is, voices that sound like a younger person and voices that sound their age or older. Praat 4.4.22 was used to record the voices and analyze acoustic features such as F0, SFF, jitter, shimmer, HNR, and pitch range. Six female listeners guessed the subjects' ages and judged whether they sounded younger or like their actual age. We used an independent t-test to find significant differences between the two groups' acoustic features. The results show a significant difference in F0 and SFF. This and previous studies indicate that the group that sounds younger, or baby-like, has acoustic features similar to those of actually young people.

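Praat measurements like these can be scripted through the parselmouth package. A sketch assuming one WAV file per recording (file names are placeholders), with scipy supplying the independent t-test:

```python
import parselmouth
from parselmouth.praat import call
from scipy.stats import ttest_ind

def voice_measures(path):
    """Extract the measures compared in the study from one recording."""
    snd = parselmouth.Sound(path)
    pitch = snd.to_pitch()
    pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
    harm = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
    return {
        "f0": call(pitch, "Get mean", 0, 0, "Hertz"),
        "jitter": call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3),
        "shimmer": call([snd, pp], "Get shimmer (local)",
                        0, 0, 0.0001, 0.02, 1.3, 1.6),
        "hnr": call(harm, "Get mean", 0, 0),
    }

# Placeholder file lists for the two perceptual groups.
younger = [voice_measures(f) for f in ["y1.wav", "y2.wav", "y3.wav"]]
aged = [voice_measures(f) for f in ["a1.wav", "a2.wav", "a3.wav"]]

# Independent t-test per measure, as in the study.
for key in ("f0", "jitter", "shimmer", "hnr"):
    t, p = ttest_ind([m[key] for m in younger], [m[key] for m in aged])
    print(f"{key}: t = {t:.2f}, p = {p:.3f}")
```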

Intensified Sentiment Analysis of Customer Product Reviews Using Acoustic and Textual Features

  • Govindaraj, Sureshkumar; Gopalakrishnan, Kumaravelan
    • ETRI Journal / v.38 no.3 / pp.494-501 / 2016
  • Sentiment analysis incorporates natural language processing and artificial intelligence and has evolved into an important research area. Sentiment analysis of product reviews is used in widespread applications to improve customer retention and business processes. In this paper, we propose a method for performing intensified sentiment analysis on customer product reviews. The method extracts two feature sets from each customer product review: a set of acoustic features (representing emotions) and a set of lexical features (representing sentiments). These sets are then combined and used in a supervised classifier to predict the sentiments of customers. For our experimental evaluations, we use an audio speech dataset prepared from Amazon product reviews downloaded from the YouTube portal.
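
The fusion step, concatenating an acoustic (emotion) vector and a lexical (sentiment) vector per review before a single supervised classifier, reduces to a few lines. A sketch on placeholder features; the dimensions and the SVM choice are illustrative, not the paper's configuration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Placeholder inputs, one row per review: in the paper, acoustic features come
# from the review audio and lexical features from its transcript.
acoustic = rng.normal(size=(300, 12))   # e.g. pitch/energy emotion descriptors
lexical = rng.normal(size=(300, 50))    # e.g. sentiment-lexicon scores
labels = (lexical[:, 0] + 0.5 * acoustic[:, 0] > 0).astype(int)  # synthetic truth

# Early fusion: concatenate the two feature sets, then train one classifier.
X = np.hstack([acoustic, lexical])
clf = make_pipeline(StandardScaler(), SVC()).fit(X[:240], labels[:240])
print("held-out accuracy:", clf.score(X[240:], labels[240:]))
```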

Acoustic Analysis for Natural Pronunciation Programs

  • Lim Un
    • MALSORI / no.44 / pp.1-14 / 2002
  • Because accuracy and fluency are the essence of English speaking, both are very important in English teacher training and in-service English training programs. To attain accuracy and fluency, the causes and phenomena of unnatural pronunciation have to be diagnosed. Consequently, the problematic and unnatural pronunciation of Korean elementary and secondary English teachers was analyzed using acoustic analysis tools such as CSL, Multi-Speech, and Praat. In addition, an attempt was made to pinpoint the causes of the unnatural pronunciation. A procedure and steps were then proposed for in-service training programs that would cultivate fluency and accuracy. For elementary teachers, unnatural pronunciation was frequent in both segmental and suprasegmental features; therefore, segmental features should be emphasized at the beginning of pronunciation training courses, with suprasegmental features emphasized afterward. For secondary teachers, unnatural pronunciation was found mostly in suprasegmental features; therefore, segmental and suprasegmental features should be addressed at the same time. In other words, word-level features should come first for elementary English teachers, while word-level and beyond-word-level features should be trained together for secondary English teachers.

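One way to make the suprasegmental diagnosis concrete is to compare intonation contours between a model speaker and a trainee. A rough sketch using librosa's pYIN pitch tracker in place of CSL or Praat; the file names, pitch range, and 100-point resampling are assumptions:

```python
import numpy as np
import librosa

def pitch_contour(path, points=100):
    """Voiced-frame F0 contour, resampled to a fixed length so two
    renditions of the same sentence can be compared point by point."""
    y, sr = librosa.load(path, sr=16000)
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    contour = f0[voiced]
    idx = np.linspace(0, len(contour) - 1, points)
    return np.interp(idx, np.arange(len(contour)), contour)

# A large RMS gap between semitone-scaled contours points to a suprasegmental
# (intonation) problem rather than a segmental one:
# model, trainee = pitch_contour("model.wav"), pitch_contour("trainee.wav")
# gap = np.sqrt(np.mean((12 * np.log2(trainee / model)) ** 2))
# print(f"intonation gap: {gap:.1f} semitones RMS")
```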

F-ratio of Speaker Variability in Emotional Speech

  • Yi, So-Pae
    • Speech Sciences / v.15 no.1 / pp.63-72 / 2008
  • Various acoustic features were extracted and analyzed to estimate the inter- and intra-speaker variability of emotional speech. Tokens of the vowel /a/ from sentences spoken in different emotional modes (sadness, neutral, happiness, fear, and anger) were analyzed. All of the acoustic features (fundamental frequency, spectral slope, HNR, H1-A1, and formant frequency) contributed more to inter-speaker than to intra-speaker variability across all emotions. Each acoustic feature contributed to a different degree to speaker discrimination in different emotional modes. Sadness and neutral showed greater speaker discrimination than the other emotional modes (happiness, fear, and anger, in descending order of F-ratio). In other words, speaker specificity was better represented in sadness and neutral than in happiness, fear, and anger for all of the acoustic features.

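The F-ratio here is the between-speaker variance of a feature divided by its within-speaker variance, so a higher value means the feature separates speakers better. A small numpy illustration with invented F0 tokens, constructed so that sadness scores higher than anger:

```python
import numpy as np

def f_ratio(tokens_by_speaker):
    """Between-speaker variance of a feature (variance of the speaker means)
    divided by within-speaker variance (mean of the per-speaker variances)."""
    means = np.array([np.mean(t) for t in tokens_by_speaker])
    within = np.mean([np.var(t, ddof=1) for t in tokens_by_speaker])
    return np.var(means, ddof=1) / within

# Invented F0 values (Hz) of /a/ tokens for three speakers in two emotions.
sad = [[118, 121, 119], [145, 148, 150], [170, 168, 173]]
angry = [[160, 185, 150], [175, 150, 190], [165, 200, 155]]
print(f"F-ratio (sadness): {f_ratio(sad):.1f}")  # high: speakers stay distinct
print(f"F-ratio (anger): {f_ratio(angry):.2f}")  # low: speakers blur together
```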

Speech Developmental Link between Intelligibility and Phonemic Contrasts, and Acoustic Features in Putonghua-Speaking Children (표준 중국어의 구어 명료도와 음소 대조 및 음향 자질의 발달적 상관관계)

  • Han, Ji-Yeon
    • MALSORI / no.59 / pp.1-11 / 2006
  • This study was designed to investigate the developmental relationship between intelligibility, phonemic contrasts, and acoustic features. A total of 212 Putonghua-speaking children participated in the experiment. Several phonemic contrasts were significantly related to speech intelligibility: aspirated vs. fricative, retroflex vs. non-retroflex, and front vs. back nasal vowel contrasts. A regression analysis showed that 88% of the variance in speech intelligibility could be predicted from these phonemic contrasts. Acoustic values were also significantly related to the intelligibility of the Putonghua-speaking children's speech: the voice onset time of unaspirated stops and the duration of frication noise in fricatives.

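The regression finding can be illustrated directly: per-child contrast accuracies predict intelligibility, and R-squared plays the role of the 88% figure. All numbers below are simulated for the sake of the sketch:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 212  # matches the number of children in the study

# Simulated per-child accuracy (0-1) on the three significant contrasts:
# aspirated/fricative, retroflex/non-retroflex, front/back nasal vowel.
contrasts = rng.uniform(0.3, 1.0, size=(n, 3))
# Simulated intelligibility driven by those contrasts plus noise.
intelligibility = contrasts @ np.array([0.4, 0.3, 0.3]) + rng.normal(0, 0.05, n)

model = LinearRegression().fit(contrasts, intelligibility)
# ~0.85 on this synthetic data; the paper reports 88%.
print(f"R^2 = {model.score(contrasts, intelligibility):.2f}")
```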

Voice Coding Using Mouth Shape Features (입술형태 특성을 이용한 음성코딩)

  • Jang, Jong-Hwan
    • The Journal of Engineering Research / v.1 no.1 / pp.65-70 / 1997
  • To transmit a voice signal degraded by various surrounding acoustic noises, we extract the lips in the face image and then compare lip-edge features against a pre-stored DB holding features such as mouth height, width, area, and ratio. The method provides high security and is not affected by acoustic noise because it is not necessary to transmit the actual utterance.

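The matching step implied by this abstract, comparing measured lip-edge features against a pre-stored DB and transmitting only the index of the best match, might look as follows; the DB entries, units, and vowel labels are invented:

```python
import numpy as np

# Hypothetical pre-stored DB: one feature vector per speech unit, holding the
# mouth height, width, area, and height/width ratio named in the abstract.
db = {
    "a": np.array([28.0, 52.0, 1150.0, 0.54]),
    "i": np.array([10.0, 60.0, 480.0, 0.17]),
    "u": np.array([18.0, 34.0, 500.0, 0.53]),
}

def code_frame(features, db):
    """Transmit only the key of the closest stored mouth shape, so the
    acoustic channel (and its noise) is bypassed entirely."""
    scale = np.max(np.stack(list(db.values())), axis=0)  # normalize dimensions
    return min(db, key=lambda k: np.linalg.norm((db[k] - features) / scale))

# A measured frame (from lip-edge extraction) is coded as a DB index.
measured = np.array([26.0, 50.0, 1100.0, 0.52])
print(code_frame(measured, db))  # -> 'a'
```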

Time-Frequency Feature Extraction of Broadband Echo Signals from Individual Live Fish for Species Identification (활어 개체어의 광대역 음향산란신호로부터 어종식별을 위한 시간-주파수 특징 추출)

  • Lee, Dae-Jae; Kang, Hee-Young; Pak, Yong-Ye
    • Korean Journal of Fisheries and Aquatic Sciences / v.49 no.2 / pp.214-223 / 2016
  • Joint time-frequency images of the broadband acoustic echoes of six fish species were obtained using the smoothed pseudo-Wigner-Ville distribution (SPWVD). Acoustic features were extracted by varying the sliced window widths, dividing the time axis into 0.02-ms intervals and the frequency axis into 20-kHz bands. The 22 spectrum amplitudes obtained in the time and frequency domains of the SPWVD images were fed as input parameters into an artificial neural network (ANN) to verify their effectiveness as species-dependent features for fish species identification. The results show that the time-frequency approach improves the extraction of species-specific features from broadband echoes, compared with time-only or frequency-only features. The ANN classifier based on these acoustic feature components was correct in approximately 74.5% of the test cases. In future work, the identification rate will be improved by using time-frequency images of the broadband acoustic echoes with reduced dimensions as input to the ANN classifier.
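
For readers who want to reproduce a time-frequency image of this kind, a from-scratch SPWVD is sketched below: the instantaneous autocorrelation is smoothed in time by a window g, windowed in lag by h, and Fourier-transformed over the lag axis. Window lengths and the test signal are illustrative, not the paper's settings:

```python
import numpy as np
from scipy.signal import get_window, hilbert

def spwvd(x, n_freq=256, time_win=31, lag_win=127):
    """Discrete smoothed pseudo-Wigner-Ville distribution: smooth the
    instantaneous autocorrelation z[t+tau]z*[t-tau] in time (window g),
    window it in lag (window h), then FFT over the lag axis."""
    z = hilbert(x)                          # analytic signal
    g = get_window("hamming", time_win)     # time-smoothing window
    h = get_window("hamming", lag_win)      # lag window
    n, half, mid = len(z), lag_win // 2, time_win // 2
    tfr = np.zeros((n_freq, n), dtype=complex)
    for t in range(n):
        for tau in range(-min(t, n - 1 - t, half), min(t, n - 1 - t, half) + 1):
            mus = np.arange(-mid, mid + 1)
            ok = ((t + mus + tau >= 0) & (t + mus + tau < n) &
                  (t + mus - tau >= 0) & (t + mus - tau < n))
            mus = mus[ok]
            r = np.sum(g[mus + mid] * z[t + mus + tau] * np.conj(z[t + mus - tau]))
            tfr[tau % n_freq, t] = h[tau + half] * r
    return np.abs(np.fft.fft(tfr, axis=0))  # rows = frequency bins, cols = time

# Two-component test echo: the double smoothing suppresses the cross-terms
# that a plain Wigner-Ville image would place between the components.
t = np.linspace(0, 1, 512, endpoint=False)
echo = np.cos(2 * np.pi * (40 * t + 60 * t ** 2)) + np.cos(2 * np.pi * 160 * t)
print(spwvd(echo).shape)  # (256, 512)
```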