• Title/Summary/Keyword: speech rate characteristic

A New Endpoint Detection Method Based on Chaotic System Features for Digital Isolated Word Recognition System (음성인식을 위한 혼돈시스템 특성기반의 종단탐색 기법)

  • Zang, Xian;Chong, Kil-To
    • Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.5 / pp.8-14 / 2009
  • In speech recognition, pinpointing the endpoints of an utterance even in the presence of background noise is of great importance. Noise present during recording introduces disturbances that complicate the extraction of the stationary parameters corresponding to each speech section. One major cause of error in automatic recognition of isolated words is inaccurate detection of the beginning and end boundaries of the test and reference templates; hence the need for an effective method of removing the unnecessary regions of a speech signal. Conventional methods for speech endpoint detection are based on two linear time-domain measurements: the short-time energy and the short-time zero-crossing rate. They perform well for clean speech, but their precision is not guaranteed when noise is present, since the high energy and zero-crossing rate of the noise are mistaken for part of the uttered speech. This paper proposes a novel approach to finding a clear threshold between noise and speech based on Lyapunov exponents (LEs). The proposed method uses nonlinear features to analyze the chaotic characteristics of the speech signal instead of depending on the unreliable energy factor. Its performance advantage over the conventional methods lies in the fact that it detects the endpoints from the nonlinearity of the speech signal, which we believe is an important characteristic that the conventional methods have neglected. The proposed method extracts its features from the time-domain waveform of the speech signal alone, which keeps its complexity low. Simulations showed that the proposed method performs effectively in a noisy environment, with an average recognition rate of up to 92.85% for unspecified speakers.
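The two conventional measurements named in this abstract, short-time energy and short-time zero-crossing rate, can be sketched as follows. This is a minimal illustration of the baseline the paper argues against, not the Lyapunov-exponent method it proposes. The frame length, hop size, and the relative energy threshold `energy_ratio` are assumptions chosen for the example and do not come from the paper.

```python
def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(samples, frame_len=80, hop=40, energy_ratio=0.1):
    """Return (start_sample, end_sample) spanning the frames whose energy exceeds
    energy_ratio * (maximum frame energy), or None if the signal is all zeros.

    energy_ratio is a hypothetical threshold used only for this illustration.
    """
    frames = frame_signal(samples, frame_len, hop)
    energies = [short_time_energy(f) for f in frames]
    if not energies or max(energies) == 0:
        return None
    threshold = energy_ratio * max(energies)
    active = [i for i, e in enumerate(energies) if e > threshold]
    start = active[0] * hop
    end = active[-1] * hop + frame_len
    return start, end
```

On a clean signal a fixed energy threshold of this kind locates the utterance to within about one frame. Once the background noise carries energy comparable to the speech, the same threshold also fires on noise-only frames, which is the failure mode the abstract describes.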

An Emotion Recognition Technique using Speech Signals (음성신호를 이용한 감정인식)

  • Jung, Byung-Wook;Cheun, Seung-Pyo;Kim, Youn-Tae;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.494-500 / 2008
  • In the development of human interface technology, the interaction between human and machine is important, and research on emotion recognition supports this interaction. This paper presents an algorithm for emotion recognition based on personalized speech signals. The proposed approach extracts the characteristics of the speech signal for emotion recognition using PLP (perceptual linear prediction) analysis. PLP analysis was originally designed to suppress speaker-dependent components in features used for automatic speech recognition, but later experiments demonstrated that it is also effective for speaker recognition tasks. This paper therefore proposes an algorithm that can easily evaluate personal emotion from speech signals in real time using personalized emotion patterns built by PLP analysis. The experimental results show that the maximum recognition rate for the speaker-dependent system is above 90%, whereas the average recognition rate is 75%. The proposed system has a simple structure yet is efficient enough to be used in real time.

Time-Frequency Domain Impulsive Noise Detection System in Speech Signal (음성 신호에서의 시간-주파수 축 충격 잡음 검출 시스템)

  • Choi, Min-Seok;Shin, Ho-Seon;Hwang, Young-Soo;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.30 no.2 / pp.73-79 / 2011
  • This paper presents a new algorithm for detecting impulsive noise in speech signals. The proposed method employs the frequency-domain characteristics of impulsive noise to improve detection accuracy while avoiding false alarms caused by the pitch of the speech signal. Furthermore, we propose a time-frequency domain impulsive noise detector that uses both time-domain and frequency-domain parameters, which complement each other to minimize false alarms. As a result, the proposed time-frequency domain detector shows the best performance, with a detection accuracy of 99.33% and a false-alarm rate of 1.49%.

The Speaker Recognition System using the Pitch Alteration (피치변경을 이용한 화자인식 시스템)

  • Jung JongSoon;Bae MyungJin
    • Proceedings of the Acoustical Society of Korea Conference / spring / pp.115-118 / 2002
  • Parameters used in a speaker recognition system should fully express the speaker's characteristics contained in speech. That is, a characteristic whose inter-speaker variance is larger than its intra-speaker variance is useful for distinguishing between speakers. To minimize errors between speakers, improved recognition technology is required in addition to distinguishing characteristics. Recent simulation results show that more accurate performance is obtained by using dynamic characteristics together with the constant characteristics that arise from speaking habits. We therefore propose the following approach. Prosodic information is used as a feature vector of speech. The feature vector generally used in speaker recognition systems models spectral information and achieves high performance in noise-free conditions. However, this feature vector is distorted in noisy conditions, which reduces the recognition rate. In this paper, we alter the pitch contour, divided into segments from which a dynamic characteristic can be estimated, and use it as a recognition feature. Simulation confirmed that this dynamic characteristic is very robust in noisy conditions. Acceptance or rejection is decided by comparison with the test pattern, and the recognition rate obtained with the proposed algorithm improves on that obtained with spectral and prosodic information. In particular, the simulation shows that a stable recognition rate can be obtained in noisy conditions.

On a Performance Improvement of Speaker Recognition by using the Auditory Characteristics of Speech (음성의 청각특성을 이용한 화자식별시스템의 성능향상에 관한 연구)

  • 이윤주;오세영;배재옥;배명진
    • Proceedings of the IEEK Conference / 1998.10a / pp.1223-1226 / 1998
  • The conventional pre-emphasis filter emphasizes all high-frequency components, which reflect the speaker's characteristics. However, this filter does not reflect the auditory characteristics of the speaker's speech. To emphasize the perceptual characteristics, we propose a speaker recognition system that uses perceptual weighting as the preprocessor, because human hearing is sensitive to the formant peaks. This filter de-emphasizes the low formants and emphasizes the high formants. With the proposed method, the total recognition rate improves by 1.7% over the conventional method.
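For reference, the conventional pre-emphasis step that this abstract uses as its baseline is commonly written as a first-order difference, y[n] = x[n] - alpha * x[n-1]. The sketch below shows only that baseline; the perceptual weighting filter the paper proposes is not specified in the abstract and is not reproduced here. The coefficient `alpha=0.97` is a commonly used value and an assumption, not a figure from the paper.

```python
def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1], with y[0] = x[0].

    alpha is an assumed, commonly used coefficient; the abstract gives no value.
    """
    if not samples:
        return []
    out = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out
```

A slowly varying (low-frequency) input is strongly attenuated, while a rapidly alternating (high-frequency) input is nearly doubled. This is the uniform boost of all high-frequency components that the abstract contrasts with formant-sensitive weighting.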

Comparison of Voice Characteristics Before and After High-Caffeine Intake (고카페인 섭취 전·후 음성 특성 비교)

  • Lee, Areum;Kim, Eunyun;Yoo, Hyunji;Choi, Yaelin
    • Phonetics and Speech Sciences / v.7 no.4 / pp.59-65 / 2015
  • This study was conducted to identify differences in voice characteristic variables before and after a given amount of high-caffeine intake. A Linear PCM-M10 recorder (SONY) was used for recording, and the fundamental frequency of the voice (F0), frequency fluctuation rate (jitter), amplitude fluctuation rate (shimmer), and signal-to-noise ratio (SNR) were measured using TF-32 (University of Wisconsin-Madison, USA). First, analysis of prolonged phonation of /ah/ by male subjects showed that shimmer values increased significantly after high-caffeine intake compared with before the intake (p<.05), and SNR values decreased significantly (p<.05). Female subjects, on the other hand, showed no statistically significant differences in any variable. Second, male subjects showed significantly increased shimmer values after the intake at /ah/ of the syllable 'na' and /ah/ of 'ra' in the 'Autumn' paragraph (p<.05), and jitter values increased significantly at /ah/ in 'ah' (p<.05). Female subjects again showed no statistically significant differences in any variable. These results indicate that high-caffeine intake affects male subjects more than female subjects. In male subjects, shimmer and SNR changed during prolonged phonation of the vowel /ah/, and shimmer and SNR at /na/ and /ra/ and jitter at /ah/ in the 'Autumn' paragraph could be identified as variables that reveal the voice change.
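Jitter and shimmer, as measured in this study, are cycle-to-cycle perturbation measures. One common "local" definition divides the mean absolute difference between consecutive cycles by the mean value and expresses the result as a percentage. The sketch below uses that definition as an assumption; the abstract does not state which formula the analysis software applied, and the input lists of per-cycle periods and peak amplitudes are hypothetical values that a pitch tracker would supply.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive values, divided by the
    mean value, in percent (one common 'local' perturbation definition)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods):
    """Local jitter from a list of glottal cycle durations (any time unit)."""
    return relative_perturbation(periods)


def shimmer_percent(peak_amplitudes):
    """Local shimmer from a list of per-cycle peak amplitudes (linear scale)."""
    return relative_perturbation(peak_amplitudes)
```

Under this definition a perfectly periodic voice gives 0% for both measures, and larger cycle-to-cycle irregularity in period or amplitude raises jitter or shimmer respectively.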

The comparative Study of the Acoustic Representation between Pansori singer's and Spasmodic dysphonia patient's Voice (병적인 소리 떨림증과 소리꾼 떨림증의 음향학적인 비교연구)

  • Hong, K.H.;Kim, H.G.;Lee, J.K.;Choi, J.S.
    • Proceedings of the KSPS conference / 2007.05a / pp.143-145 / 2007
  • Muscle groups located in and around the vocal tract can produce audible changes in the frequency and/or intensity of the voice. Vocal vibrato is a characteristic feature of the singing of performers trained in the Western classical tradition and is generally considered to result from modulation in frequency, amplitude, and timbre. Vocal tremor is also characterized by periodic fluctuations in voice frequency or intensity and is a symptom of neurological diseases such as spasmodic dysphonia and Parkinson's disease. Vocal vibrato and vocal tremor may share many of the same origins and mechanisms in the voice production system. The purpose of this study is to find the acoustic characteristics of the vibrato of singers of Pansori, a Korean traditional song form, and of the vocal tremor of spasmodic dysphonia patients. Twelve Pansori singers and seven spasmodic dysphonia patients participated in this study. Power spectra and real-time spectrograms were used to analyze the acoustic characteristics of Pansori singing and of the voices of spasmodic dysphonia patients. The results are as follows. First, the vowel formant differences between Pansori singing and spasmodic dysphonia patients' voices appear as higher F1 and F3. Second, the vibrato rate differs between Pansori singing and spasmodic dysphonia patients: 4 to 6 per second and 5 to 6 per second, respectively. The vibrato rate of pitch is 5.7 Hz to 42.4 Hz for Pansori singing and 3.8 Hz to 27.9 Hz for spasmodic dysphonia patients; the vibrato rate of intensity ranges from 0.07 dB to 8.26 dB for Pansori singing and from 0.07 dB to 4.81 dB for spasmodic dysphonia patients.

Real Time Environmental Classification Algorithm Using Neural Network for Hearing Aids (인공 신경망을 이용한 보청기용 실시간 환경분류 알고리즘)

  • Seo, Sangwan;Yook, Sunhyun;Nam, Kyoung Won;Han, Jonghee;Kwon, See Youn;Hong, Sung Hwa;Kim, Dongwook;Lee, Sangmin;Jang, Dong Pyo;Kim, In Young
    • Journal of Biomedical Engineering Research / v.34 no.1 / pp.8-13 / 2013
  • People with sensorineural hearing impairment have trouble hearing in noisy environments because of their deteriorated hearing levels and the low spectral resolution of the auditory system, and they therefore use hearing aids to compensate for weakened hearing. Various algorithms for hearing loss compensation and environmental noise reduction have been implemented in hearing aids; however, the performance of these algorithms varies with the external sound situation, so it is important to tune the operation of the hearing aid appropriately for a wide variety of sound situations. In this study, a sound classification algorithm applicable to hearing aids is proposed. The proposed algorithm classifies speech situations into four categories: 1) speech-only, 2) noise-only, 3) speech-in-noise, and 4) music-only. It consists of two sub-parts: a feature extractor and a speech situation classifier. The former extracts seven characteristic features (short-time energy and zero-crossing rate in the time domain; spectral centroid, spectral flux, and spectral roll-off in the frequency domain; mel-frequency cepstral coefficients and power values of mel bands) from the recent input signals of two microphones, and the latter classifies the current speech situation. The experimental results showed that the proposed algorithm could classify the speech situations with an accuracy of over 94.4%. Based on these results, we believe that the proposed algorithm can be applied to hearing aids to improve speech intelligibility in noisy environments.
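Three of the frequency-domain features listed in this abstract (spectral centroid, spectral roll-off, and spectral flux) can be sketched from a magnitude spectrum as follows. This is an illustrative sketch under stated assumptions, not the paper's feature extractor: the naive DFT, the 85% roll-off fraction, the use of magnitudes rather than power, and all function names are choices made for the example.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a real frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags, sample_rate, n_fft):
    """Magnitude-weighted mean frequency in Hz (0.0 for an all-zero spectrum)."""
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n_fft * m for k, m in enumerate(mags)) / total


def spectral_rolloff(mags, sample_rate, n_fft, fraction=0.85):
    """Frequency in Hz of the lowest bin at which the cumulative magnitude
    reaches `fraction` of the total (fraction=0.85 is an assumed value)."""
    target = fraction * sum(mags)
    running = 0.0
    for k, m in enumerate(mags):
        running += m
        if running >= target:
            return k * sample_rate / n_fft
    return (len(mags) - 1) * sample_rate / n_fft


def spectral_flux(prev_mags, mags):
    """Sum of squared bin-wise magnitude differences between two frames."""
    return sum((m - p) ** 2 for p, m in zip(prev_mags, mags))
```

For a pure tone, the centroid and roll-off both sit at the tone frequency and the flux between identical frames is zero. Broadband noise and music spread these values differently, which is what lets a classifier separate the four sound situations.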

A Study on Real-time Implementing of Time-Scale Modification (음성 신호 시간축 변환의 실시간 구현에 관한 연구)

  • Han, Dong-Chul;Lee, Ki-Seung;Cha, Il-Hawan;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.14 no.2 / pp.50-61 / 1995
  • A time-scale modification method that yields rate-modified speech while conserving the characteristics of speech was implemented in real time using a general-purpose digital signal processor. Time-scale modification changes only the speaking rate, which produces a time difference between the input signal and the modified signal and makes real-time implementation impossible. In this thesis, a system was implemented to remove the time difference between the input and modified signals. Speech signals slowed down or sped up by a physical time-scale modification method, such as adjusting the motor speed of a cassette tape recorder, were used as the input signal. Physical modification that controlled only the speed of the cassette tape player distorted the pitch period of the original speech. In this study, a real-time system was implemented in which the pitch-distorted speech is restored to the original pitch by fractional-sampling pitch shifting using an FIR filter, and this signal is then time-scale modified to match the cassette tape recorder motor speed using SOLA time-scale modification. In experiments using speech signals modified by the proposed method, results obtained with a 16-bit ADSP2101 processor and with computer simulations using floating-point operations showed about the same average frame signal-to-noise ratio, approximately 20 dB.
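The overlap-add idea behind SOLA time-scale modification can be sketched as below. This is a simplified variant under stated assumptions, not the paper's real-time implementation: it keeps a fixed output hop and searches for the best-aligned frame on the input side (the fixed-synthesis form sometimes called SOLAFS), whereas the original SOLA formulation shifts frames on the output side. The frame length, overlap, search range, and linear cross-fade are illustrative choices.

```python
import math


def sola_time_scale(samples, rate, frame_len=400, overlap=100, search=50):
    """Change duration by roughly 1/rate without resampling.

    rate > 1 shortens the signal (faster speech); rate < 1 lengthens it.
    All default parameter values are assumptions for this sketch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    hop_out = frame_len - overlap                # new samples added per frame
    hop_in = max(1, int(round(hop_out * rate)))  # input advance per frame
    out = list(samples[:frame_len])
    pos = hop_in
    while pos + search + frame_len <= len(samples):
        tail = out[-overlap:]
        # Pick the offset whose frame head best matches the current output
        # tail, using energy-normalized cross-correlation.
        best_k, best_score = 0, -math.inf
        for k in range(search + 1):
            cand = samples[pos + k:pos + k + overlap]
            num = sum(a * b for a, b in zip(tail, cand))
            den = math.sqrt(sum(b * b for b in cand)) or 1.0
            score = num / den
            if score > best_score:
                best_k, best_score = k, score
        frame = samples[pos + best_k:pos + best_k + frame_len]
        # Linear cross-fade over the overlap region, then append the rest.
        for i in range(overlap):
            w = i / overlap
            out[-overlap + i] = (1.0 - w) * out[-overlap + i] + w * frame[i]
        out.extend(frame[overlap:])
        pos += hop_in
    return out
```

Because each frame is aligned to the waveform already written before it is cross-faded, pitch periods stay intact while the duration changes. This is the property that lets the system described in the abstract restore the original speaking rate after the separate pitch-correction step.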

The cultural characteristic of American film (genre drama) in The Blind Side (영화 <블라인드 사이드 Blind Side>에 나타난 '드라마' 장르의 미국 문화 특성)

  • Han, Yong Taek;Woo, Jung Gueon
    • Cross-Cultural Studies / v.26 / pp.273-296 / 2012
  • The purpose of this paper is to examine the characteristics of American film (genre drama) through an analysis of The Blind Side, which merits attention because its ratio of domestic to foreign gross earnings is four to one. This means that the cultural discount rate of this film is relatively higher than that of films belonging to other genres, for example adventure, action, fantasy, and SF. It would be correct to say that this film is typically American. What is the reason for this difference in cultural discount rate? And what allows this film to be defined as a typical American film? The analysis of The Blind Side shows that the difference does not result from the actant structure. In fact, the narrative structure of this film is similar to that of other films of the drama genre like [...] or [...]: the common structure of the drama genre is characterized by an encounter between subject (sujet) and helper (adjuvant) and the progress of their relationship. But drama is a genre in which the reflection of reality is more important than in other genres. In that sense, the story of The Blind Side is based on American cultural characteristics, because the process through which the relationship between the two protagonists progresses involves typically American elements such as the race problem, the adoption system, the concept of family, the education system, and going to college. As a result, the film may move worldwide audiences less than American audiences.