• Title/Summary/Keyword: speech features

An Analysis Method of Strange Attractor for the Feature Extraction (음성 특징 추출을 위한 스트레인지 어트랙터의 분석 방법)

  • Kim, Tae-Sik
    • Speech Sciences / v.9 no.2 / pp.147-155 / 2002
  • In the area of speech processing, raw signals have traditionally been represented in a 2D format. However, this kind of representation limits the characteristics that can be extracted from the signal, and generally little information can be detected from the 2D signal. The strange attractor from chaos theory provides a 3D representation method. In recognition problems, the signal representation is very important, because good features can be detected from a good representation. This paper discusses a new feature extraction method that extracts features from one cycle of the strange attractor. A neural network is used to check whether the method extracts suitable features. The results suggest that the method can be applied to some areas of signal processing.

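
The abstract does not say how the strange attractor is constructed. A common way to obtain a 3D attractor from a one-dimensional speech signal is time-delay embedding, sketched below in Python under that assumption; the delay `tau`, the function name `delay_embed_3d`, and the synthetic test tone are illustrative choices, not the paper's.

```python
import math


def delay_embed_3d(signal, tau):
    """Time-delay embedding: map x[n] to the 3-D point (x[n], x[n+tau], x[n+2*tau]).

    `tau` is a hypothetical delay in samples; in practice it is tuned per task.
    """
    if tau < 1:
        raise ValueError("tau must be a positive integer")
    n_points = max(len(signal) - 2 * tau, 0)
    return [(signal[n], signal[n + tau], signal[n + 2 * tau]) for n in range(n_points)]


# Synthetic example: a 100 Hz tone sampled at 8 kHz has a period of 80 samples,
# so its embedded trajectory closes on itself after 80 points.
fs, f0 = 8000, 100
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]
trajectory = delay_embed_3d(x, tau=20)
period = fs // f0                  # 80 samples per cycle
one_cycle = trajectory[:period]    # one loop of the attractor
```

Slicing out one pitch period of the trajectory gives a single attractor cycle; how the paper turns such a cycle into a feature vector for the neural network is not described in the abstract.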

The Lombard effect on the speech of children with intellectual disability (지적장애 아동의 롬바드 효과에 따른 말산출 특성)

  • Lee, Hyunju;Lee, Jiyun;Kim, Yukyung
    • Phonetics and Speech Sciences / v.8 no.4 / pp.115-122 / 2016
  • This study investigates the acoustic-phonetic features and speech intelligibility of Lombard speech in children with intellectual disability by examining the Lombard effect at three noise levels: no noise, 55 dB, and 65 dB. Eight children with intellectual disability read sentences and played speaking games, and their speech was analyzed in terms of intensity, pitch, the vowel space of /a/, /i/, and /u/, VAI(3), articulation rate, and speech intelligibility. The results showed, first, that intensity and pitch increased as the noise level increased; second, that VAI(3) increased as the noise level increased; third, that articulation rate decreased as the noise level increased; and finally, that speech intelligibility increased as the noise level increased. Lombard speech thus also changed the VAI(3), vowel space, articulation rate, and speech intelligibility of the children with intellectual disability. This study suggests that Lombard speech will be clinically useful for people with intellectual disability who have difficulty with self-control.
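
VAI(3) is not defined in the abstract. Assuming it is the usual three-vowel vowel articulation index, VAI = (F2/i/ + F1/a/) / (F1/i/ + F1/u/ + F2/u/ + F2/a/), where larger values indicate less centralized vowels, it can be computed together with the area of the /a/-/i/-/u/ triangle as in this sketch. The function names are hypothetical, and the formant values in the usage example are made up, not data from the study.

```python
def vai(f1, f2):
    """Three-vowel vowel articulation index from corner-vowel formants (Hz)."""
    return (f2["i"] + f1["a"]) / (f1["i"] + f1["u"] + f2["u"] + f2["a"])


def triangular_vsa(f1, f2):
    """Area (Hz^2) of the /a/-/i/-/u/ triangle in the F1-F2 plane (shoelace formula)."""
    return 0.5 * abs(
        f1["i"] * (f2["a"] - f2["u"])
        + f1["a"] * (f2["u"] - f2["i"])
        + f1["u"] * (f2["i"] - f2["a"])
    )


# Hypothetical mean formants (Hz) for one speaker in one noise condition.
F1 = {"a": 1000, "i": 300, "u": 350}
F2 = {"a": 1500, "i": 2500, "u": 900}
index = vai(F1, F2)
area = triangular_vsa(F1, F2)
```

Computing both measures per noise condition would let the two kinds of vowel-space change reported above be compared side by side.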

Pauses Characteristics in Slowed Speech of Treated Stutterer (치료 받은 말더듬 성인의 느린 구어에서 나타나는 휴지 특성)

  • Jeon, Hee-Sook
    • Speech Sciences / v.15 no.4 / pp.189-197 / 2008
  • In speech therapy, when a behavioral modification strategy that induces fluency by intentionally slowing the speech rate in the early stage is applied, fluency is acquired and the speech rate then increases over the course of treatment. The purpose of this study was therefore to investigate the pause characteristics of treated stutterers' intentionally slowed speech. Ten developmental stutterers whose speech fluency was well established participated. In a reading task, we collected a 200-syllable sample each of much slowed speech and slightly slowed speech. To characterize the pauses, we measured the total frequency of pauses, total pause duration, average pause duration, and proportion of pauses. Both the total duration and the total frequency of pauses were higher in much slowed speech than in slightly slowed speech. However, neither the average duration nor the proportion of pauses was significantly higher in much slowed speech than in slightly slowed speech.

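
The four pause measures named in the abstract can be computed from the list of pause durations found in one reading sample. This sketch assumes that "proportion of pauses" means total pause time divided by the sample's total duration; the abstract does not define it, and the class, function, and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PauseStats:
    count: int          # total frequency of pauses
    total_s: float      # total pause duration
    mean_s: float       # average pause duration
    proportion: float   # assumed: total pause time / sample duration


def pause_stats(pause_durations_s, sample_duration_s):
    """Summarize the pauses found in one reading sample (e.g. 200 syllables)."""
    count = len(pause_durations_s)
    total = sum(pause_durations_s)
    return PauseStats(
        count=count,
        total_s=total,
        mean_s=total / count if count else 0.0,
        proportion=total / sample_duration_s,
    )


# Hypothetical sample: three pauses in a 20-second reading.
stats = pause_stats([0.5, 0.25, 0.75], sample_duration_s=20.0)
```

Because the mean and the proportion are both ratios, they can stay flat even when the count and total duration grow, which is consistent with the pattern the study reports.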

Korean speakers hyperarticulate vowels in polite speech

  • Oh, Eunhae;Winter, Bodo;Idemaru, Kaori
    • Phonetics and Speech Sciences / v.13 no.3 / pp.15-20 / 2021
  • In line with recent attention to the multimodal expression of politeness, the present study examined the association between polite speech and acoustic features by analyzing vowels produced in casual and polite speech contexts in Korean. Fourteen adult native speakers of Seoul Korean produced utterances in two social conditions designed to elicit polite (professor) and casual (friend) speech. Vowel duration and the first (F1) and second (F2) formants of seven sentence- and phrase-initial monophthongs were measured. The results showed that polite speech shares acoustic similarities with vowel production in clear speech: speakers showed greater vowel space expansion in polite than in casual speech, in an effort to enhance perceptual intelligibility. In particular, female speakers hyperarticulated (front) vowels in polite speech, independently of speech rate. The implications for the acoustic encoding of social stance in polite speech are further discussed.
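
One common way to quantify vowel space expansion across several monophthongs is the area of the convex hull spanned by each vowel's mean (F1, F2). The sketch below assumes that measure (the abstract does not state which vowel space metric was used) and uses Andrew's monotone chain for the hull and the shoelace formula for the area. The function names are hypothetical, and the formant values in the usage example are made up.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given in vertex order."""
    total = 0.0
    for k in range(len(vertices)):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def vowel_space_area(formants):
    """formants: mapping vowel -> (mean F1, mean F2) in Hz; returns area in Hz^2."""
    return polygon_area(convex_hull(list(formants.values())))


# Hypothetical per-vowel means for one speaker, not data from the study.
casual = {"a": (800.0, 1300.0), "i": (300.0, 2300.0), "u": (350.0, 800.0), "e": (500.0, 1900.0)}
casual_area = vowel_space_area(casual)
```

Vowels that fall inside the hull do not change the area, so this metric reflects the peripheral vowels most strongly; comparing the polite and casual areas per speaker would then give one expansion measure.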

Effective Combination of Temporal Information and Linear Transformation of Feature Vector in Speaker Verification (화자확인에서 특징벡터의 순시 정보와 선형 변환의 효과적인 적용)

  • Seo, Chang-Woo;Zhao, Mei-Hua;Lim, Young-Hwan;Jeon, Sung-Chae
    • Phonetics and Speech Sciences / v.1 no.4 / pp.127-132 / 2009
  • The feature vectors used in conventional speaker recognition (SR) systems may be highly correlated with their neighbors. To improve SR performance, many researchers have adopted linear transformation methods such as principal component analysis (PCA). In general, the linear transformation is applied to the concatenation of the static features and their dynamic features. However, because of the high order of the concatenated features, such a transformation is more complex than one based on the static features alone. To overcome this problem, we propose an efficient method that combines a linear transformation with the temporal information of the features to reduce complexity and improve performance in speaker verification (SV). The proposed method first performs a linear transformation using PCA coefficients; the delta parameters that carry temporal information are then obtained from the transformed features. Because PCA is computed on the static features alone, whose dimension is half that of the concatenated static-plus-dynamic vector, the proposed method requires a covariance matrix only 1/4 the size of the one needed when PCA is applied to the concatenated features. In addition, the delta parameters are extracted from the linearly transformed features after the dimension of the static features has been reduced. Compared with PCA and conventional methods in terms of equal error rate (EER) in SV, the proposed method shows better performance while requiring less storage space and lower complexity.

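
The order of operations described in the abstract (PCA on the static features first, deltas computed afterwards in the reduced space) can be sketched with NumPy as follows. The delta formula is the common regression form d_t = sum_n n (y_{t+n} - y_{t-n}) / (2 sum_n n^2) with edge padding; that choice, the window half-width `N`, and the function names are assumptions, since the abstract does not specify them.

```python
import numpy as np


def pca_fit(static, n_components):
    """Fit PCA on static features only (frames x d); returns (mean, d x k projection)."""
    mean = static.mean(axis=0)
    cov = np.cov(static - mean, rowvar=False)   # d x d, vs (2d) x (2d) for [static, delta]
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, top]


def deltas(y, N=2):
    """Regression deltas along time: sum_n n*(y[t+n] - y[t-n]) / (2*sum_n n^2)."""
    T = y.shape[0]
    padded = np.pad(y, ((N, N), (0, 0)), mode="edge")  # repeat first/last frame
    out = np.zeros(y.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / (2 * sum(n * n for n in range(1, N + 1)))


def transform_then_delta(static, mean, proj, N=2):
    """Project static features with PCA first, then append deltas of the projection."""
    y = (static - mean) @ proj
    return np.hstack([y, deltas(y, N)])


# Random stand-in for 200 frames of 13-dimensional static features.
rng = np.random.default_rng(0)
static = rng.normal(size=(200, 13))
mean, proj = pca_fit(static, n_components=8)
features = transform_then_delta(static, mean, proj)   # shape (200, 16)
```

With d = 13 static coefficients, the covariance matrix here has 13 x 13 entries, whereas PCA on the concatenated 26-dimensional vector would need 26 x 26, which is the 1/4 ratio stated in the abstract.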

Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features

  • Park, Taejin;Beack, SeungKwan;Lee, Taejin
    • IEIE Transactions on Smart Processing and Computing / v.3 no.5 / pp.259-266 / 2014
  • In this paper, we propose a novel technique for noise-robust automatic speech recognition (ASR). Advances in ASR techniques have made it possible to recognize isolated words with a near-perfect word recognition rate. However, in highly noisy environments, a distinct mismatch between the training speech and the test data results in a significantly degraded word recognition rate (WRA). Unlike conventional ASR systems that employ Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM), this study applies histogram of oriented gradient (HOG) features and a support vector machine (SVM) to ASR to overcome this problem. The proposed ASR system is less vulnerable to external interference noise and achieves a higher WRA than a conventional ASR system equipped with MFCCs and an HMM. The performance of the proposed system was evaluated on a phonetically balanced word (PBW) set mixed with artificially added noise.
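
The abstract does not describe how HOG features are computed from speech. A plausible reading, assumed here, is to treat a time-frequency representation such as a spectrogram as an image and compute per-cell histograms of gradient orientations. The NumPy sketch below does that with unsigned orientations, magnitude-weighted bins, and simple per-cell L2 normalization (standard HOG normalizes over overlapping blocks instead); the function name, cell size, and bin count are hypothetical.

```python
import numpy as np


def hog_cells(image, cell=4, n_bins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                          # d/d(row), d/d(column)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned: [0, 180)
    bins = np.minimum((angle // (180.0 / n_bins)).astype(int), n_bins - 1)
    n_rows, n_cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_rows, n_cols, n_bins))
    for r in range(n_rows):
        for c in range(n_cols):
            block = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bins[block].ravel(),
                                     weights=magnitude[block].ravel(),
                                     minlength=n_bins)
    # Simplification: L2-normalize each cell instead of overlapping blocks.
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return (hist / np.maximum(norms, 1e-12)).ravel()


# A horizontal ramp stands in for an 8 x 8 patch of a (log) spectrogram.
patch = np.tile(np.arange(8.0), (8, 1))
features = hog_cells(patch)   # 2 x 2 cells x 9 bins = 36 values
```

A fixed-length vector of this kind, computed over a word's spectrogram, would then be the input to the SVM classifier mentioned in the abstract.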

A Comparison of Front-Ends for Robust Speech Recognition

  • Kim, Doh-Suk;Jeong, Jae-Hoon;Lee, Soo-Young;Kil, Rhee M.
    • The Journal of the Acoustical Society of Korea / v.17 no.3E / pp.3-11 / 1998
  • The zero-crossings with peak amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by applying conventional speech processing techniques to the model output. Spectral and cepstral representations of the ZCPA model output are compared, and the incorporation of dynamic features computed with time-derivative windows of several different lengths is evaluated. Comparative evaluations against other front-ends in real-world noisy environments are also performed, and they show the superiority of the ZCPA model.

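
At its core, ZCPA estimates frequency from the interval between successive upward zero-crossings of a band-pass filtered signal and weights each estimate by the peak amplitude within that interval. The single-channel Python sketch below illustrates only that idea; the cochlear filterbank that would produce each band, the `log1p` amplitude compression, the histogram bin edges, and the function name are assumptions rather than details taken from the paper.

```python
import math


def zcpa_band_histogram(band, fs, bin_edges_hz):
    """ZCPA-style frequency histogram for ONE band-pass channel (illustrative).

    Each interval between successive upward zero-crossings yields a frequency
    estimate fs / interval; the log-compressed peak amplitude inside that
    interval is added to the histogram bin containing the estimate.
    """
    hist = [0.0] * (len(bin_edges_hz) - 1)
    # Sample indices where the signal goes from negative to non-negative.
    ups = [n for n in range(1, len(band)) if band[n - 1] < 0 <= band[n]]
    for start, end in zip(ups, ups[1:]):
        freq = fs / (end - start)
        peak = max(band[start:end])
        for k in range(len(hist)):
            if bin_edges_hz[k] <= freq < bin_edges_hz[k + 1]:
                hist[k] += math.log1p(peak)  # assumed amplitude compression
                break
    return hist


# Synthetic band output: a 500 Hz tone at 8 kHz (16 samples per period).
fs = 8000
band = [math.sin(2 * math.pi * 500 * n / fs + 0.1) for n in range(800)]
hist = zcpa_band_histogram(band, fs, [0, 250, 750, 4000])
```

In a full front-end, the histograms from all channels would be summed into one frequency-domain feature vector per frame; crossing times are kept at integer samples here for simplicity.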

Performance Improvement of Robust Speaker Verification According to Various Standard Deviations of a Reference Distribution in Histogram Transformation (히스토그램 변환에서 기준분포의 표준편차 변경에 따른 강인한 화자인증 성능 개선)

  • Kwon, Chul-Hong
    • Phonetics and Speech Sciences / v.2 no.3 / pp.127-134 / 2010
  • Additive noise and channel mismatch strongly degrade the performance of speaker verification systems because they distort speech features. In this paper, a histogram transformation technique is presented to improve the robustness of text-independent speaker verification systems. The technique transforms the features extracted from speech so that their histogram conforms to a reference distribution, and the effect of using different standard deviations for the reference distribution is investigated. Experimental results indicate that, in channel-mismatched environments, the proposed technique offers significant improvements over existing techniques. We also verify the performance improvement of the proposed method statistically.

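
A histogram transformation onto a zero-mean Gaussian reference with a chosen standard deviation can be sketched with rank-based quantile mapping, using only Python's `statistics.NormalDist`. That the reference is Gaussian, and that the mapping is applied per feature dimension over one utterance, are assumptions; the abstract says only that the reference distribution's standard deviation is varied. The function name is hypothetical.

```python
from statistics import NormalDist


def histogram_transform(values, sigma=1.0):
    """Rank-based mapping of one feature stream onto N(0, sigma**2).

    The sample with rank r (0-based, ascending) out of n is mapped to the
    reference quantile at probability (r + 0.5) / n. Ties are broken by
    original position because sorted() is stable.
    """
    n = len(values)
    reference = NormalDist(mu=0.0, sigma=sigma)
    order = sorted(range(n), key=lambda i: values[i])
    mapped = [0.0] * n
    for rank, i in enumerate(order):
        mapped[i] = reference.inv_cdf((rank + 0.5) / n)
    return mapped


# Hypothetical values of one cepstral coefficient over a short utterance.
coeff = [3.0, -1.0, 10.0, 2.0]
narrow = histogram_transform(coeff, sigma=1.0)
wide = histogram_transform(coeff, sigma=2.0)
```

Only the ranks of the input survive the mapping, so additive or multiplicative channel distortions that preserve ordering are removed; `sigma` then scales every transformed value by the same factor, which is the parameter the study varies.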

The effects of speakers' age on temporal features of speech among healthy young, middle-aged, and older adults (연령세대에 따른 말 산출의 시간적 특성: 말속도와 쉼을 중심으로)

  • Kim, Yeji;Lee, Song-min;Choi, Min-kyung;Jung, Sang-min;Sung, Jee Eun;Lee, Youngmee
    • Phonetics and Speech Sciences / v.14 no.1 / pp.37-47 / 2022
  • The purpose of this study is to observe the effects of healthy adults' age on the temporal features of speech and to identify which features could differentiate older and young adults. Using the National Institute of Korean Language's open corpus, we examined speech rates (i.e., overall speaking rate and articulation rate), pause occurrence, and pause duration per utterance. A total of 30 healthy adults (10 young, 10 middle-aged, and 10 older adults) were selected. There were significant differences among the groups in overall speaking rate, articulation rate, total pause occurrence, pause occurrence between syntactic words, total pause duration, and pause duration between syntactic words. The older and middle-aged adults showed slower speech rates and longer, more frequent pauses than the young adults. However, there were no significant differences among the three groups in pauses within syntactic words. The overall speaking rate significantly differentiated older adults from young adults. These findings suggest that speakers' age is reflected in gradual changes in the temporal features of their speech.
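
The two speech-rate measures differ only in whether pause time is counted: overall speaking rate divides the syllable count by the total utterance duration, while articulation rate excludes pause time. A minimal Python sketch, assuming syllables per second as the unit and pauses labeled as falling between or within syntactic words; the labels and names are hypothetical, and the minimum pause length used by the study is not stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Pause:
    duration_s: float
    location: str  # hypothetical label: "between" or "within" syntactic words


def temporal_features(n_syllables, utterance_s, pauses):
    """Speech rates (syllables/s) and pause counts for one utterance."""
    pause_s = sum(p.duration_s for p in pauses)
    return {
        "speaking_rate": n_syllables / utterance_s,                  # pauses included
        "articulation_rate": n_syllables / (utterance_s - pause_s),  # pauses excluded
        "pause_count": len(pauses),
        "pauses_between_words": sum(p.location == "between" for p in pauses),
        "pauses_within_words": sum(p.location == "within" for p in pauses),
        "pause_total_s": pause_s,
    }


# Hypothetical utterance: 30 syllables in 10 s with three pauses.
feats = temporal_features(
    30, 10.0,
    [Pause(1.0, "between"), Pause(1.5, "between"), Pause(0.5, "within")],
)
```

Because articulation rate removes pause time, a speaker who pauses more can show a much lower speaking rate while the articulation rate changes less, which is why the study reports the two separately.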

A Validity Study on Measurement of Mental Fatigue Using Speech Technology (음성기술을 이용한 정신피로 측정에 관한 타당성 연구)

  • Song, Seungkyu;Kim, Jongyeol;Jang, Junsu;Kwon, Chulhong
    • Phonetics and Speech Sciences / v.5 no.1 / pp.3-10 / 2013
  • This study proposes a method for measuring mental fatigue using speech technology, which had not been used for this purpose in previous research and is simpler than existing complex and difficult methods. It aims to establish a relationship between the human voice and mental fatigue through experiments that measure the influence of mental fatigue on the voice. Two monotonous tasks of simple calculation, such as finding the sum of three one-digit numbers, were used to produce a feeling of monotony, and two sets of subjective questionnaires were used to measure mental fatigue. Questionnaire responses and speech data were collected while thirty subjects performed the experiment. Speech features related to the speech source and the vocal tract filter were extracted from the speech data. According to the results, the speech parameters most closely related to mental fatigue are the mean and standard deviation of fundamental frequency, jitter, and shimmer. This study shows that speech technology is a useful method for measuring mental fatigue.
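
The four parameters reported above can be computed from a sequence of glottal periods and their peak amplitudes once a pitch tracker has extracted them. The sketch below uses the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive periods or amplitudes, divided by their mean); the abstract does not say which jitter and shimmer variants were used, so that choice and the function names are assumptions.

```python
from statistics import mean, stdev


def local_perturbation(values):
    """Mean absolute difference of consecutive values, divided by the mean value."""
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def voice_measures(periods_s, peak_amplitudes):
    """F0 statistics, local jitter, and local shimmer from per-cycle measurements."""
    f0 = [1.0 / t for t in periods_s]
    return {
        "f0_mean_hz": mean(f0),
        "f0_sd_hz": stdev(f0),
        "jitter_local": local_perturbation(periods_s),
        "shimmer_local": local_perturbation(peak_amplitudes),
    }


# Hypothetical three-cycle excerpt: periods in seconds, amplitudes in arbitrary units.
measures = voice_measures([0.010, 0.012, 0.010], [1.0, 0.8, 1.0])
```

Real recordings would supply hundreds of cycles per utterance; the three-cycle input here only makes the arithmetic easy to follow.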