• Title/Summary/Keyword: VOCAL SIGNAL

85 search results

Emotion recognition in speech using hidden Markov model (은닉 마르코프 모델을 이용한 음성에서의 감정인식)

  • 김성일;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.3 no.3
    • /
    • pp.21-26
    • /
    • 2002
  • This paper presents a new approach to identifying human emotional states such as anger, happiness, normal, sadness, and surprise, using discrete-duration continuous hidden Markov models (DDCHMM). Emotional feature parameters are first defined from the input speech signals; in this study, prosodic parameters such as pitch, energy, and their respective derivatives were used and then modeled by HMMs for recognition. Speaker-adapted emotional models based on maximum a posteriori (MAP) estimation were also considered for speaker adaptation. Simulation results showed that vocal emotion recognition rates increased gradually with the number of adaptation samples. (A brief illustrative code sketch follows this entry.)

  • PDF
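
A minimal sketch of the prosody-plus-HMM recognition pipeline described above, assuming librosa and hmmlearn are available. A plain Gaussian HMM stands in for the paper's discrete-duration continuous HMM (DDCHMM), the MAP speaker-adaptation step is omitted, and the file paths and parameter values are hypothetical.

```python
# Prosodic features (pitch, energy, deltas) + one HMM per emotion, classified
# by maximum log-likelihood. GaussianHMM replaces the paper's DDCHMM.
import numpy as np
import librosa
from hmmlearn import hmm

def prosodic_features(path, sr=16000, frame=0.025, hop=0.010):
    """Pitch, energy, and their deltas per frame, as in the paper's feature set."""
    y, sr = librosa.load(path, sr=sr)
    hop_length = int(hop * sr)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr,
                     frame_length=int(frame * sr) * 2, hop_length=hop_length)
    rms = librosa.feature.rms(y=y, frame_length=int(frame * sr),
                              hop_length=hop_length)[0]
    n = min(len(f0), len(rms))
    feats = np.stack([f0[:n], rms[:n]], axis=1)
    deltas = np.vstack([np.zeros((1, 2)), np.diff(feats, axis=0)])
    return np.hstack([feats, deltas])          # shape: (frames, 4)

def train_emotion_models(files_by_emotion, n_states=5):
    """Train one HMM per emotion (anger, happiness, normal, sadness, surprise)."""
    models = {}
    for emotion, paths in files_by_emotion.items():
        seqs = [prosodic_features(p) for p in paths]
        X = np.vstack(seqs)
        lengths = [len(s) for s in seqs]
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[emotion] = m
    return models

def classify(path, models):
    """Pick the emotion whose HMM gives the highest log-likelihood."""
    X = prosodic_features(path)
    return max(models, key=lambda e: models[e].score(X))
```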

Analysis and synthesis of pseudo-periodicity on voice using source model approach (음성의 준주기적 현상 분석 및 구현에 관한 연구)

  • Jo, Cheolwoo
    • Phonetics and Speech Sciences
    • /
    • v.8 no.4
    • /
    • pp.89-95
    • /
    • 2016
  • The purpose of this work is to analyze and synthesize the pseudo-periodicity of voice using a source model. A speech signal has periodic characteristics, but it is not completely periodic. While periodicity contributes significantly to the production of prosody, emotional status, and so on, pseudo-periodicity contributes to the distinction between normal and abnormal status, to the naturalness of normal speech, and so on. Pseudo-periodicity is typically measured through parameters such as jitter and shimmer. When the pseudo-periodic nature of voice is studied only through collected natural voice, we can observe no more than the distributions of these parameters, which are limited by the size of the collected data; if voice samples can be generated in a controlled manner, more diverse experiments can be conducted. In this study, the probability distributions of vowel pitch variation are obtained from the speech signal. Based on the obtained distribution, vocal-fold pulses with a designated jitter value are synthesized, and the target and re-analyzed jitter values are compared to check the validity of the method. The jitter synthesis method was found to be useful for normal voice synthesis.
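
A minimal sketch of the synthesis-and-validation loop described above, using numpy only. The Gaussian period distribution, the impulse-train source, and the closed-form choice of the period standard deviation are assumptions for illustration rather than the paper's source model; the re-analyzed jitter is measured from the generated period sequence.

```python
# Draw pitch periods from a distribution, build a pulse train with a designated
# jitter, then re-measure jitter to check that the target value is hit.
import numpy as np

def local_jitter(periods):
    """Jitter (local): mean absolute difference of consecutive periods / mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def synthesize_pulses(f0=120.0, target_jitter=0.01, n_periods=200, sr=16000, seed=0):
    rng = np.random.default_rng(seed)
    t0 = 1.0 / f0
    # For i.i.d. Gaussian periods, E|T_i - T_{i-1}| = 2*sigma/sqrt(pi), so this
    # sigma makes the expected local jitter equal to the designated value.
    sigma = target_jitter * t0 * np.sqrt(np.pi) / 2.0
    periods = rng.normal(t0, sigma, n_periods)
    onsets = np.cumsum(periods)
    pulses = np.zeros(int(np.ceil(onsets[-1] * sr)) + 1)
    pulses[(onsets * sr).astype(int)] = 1.0    # unit impulse at each glottal onset
    return pulses, periods

target = 0.01
pulses, periods = synthesize_pulses(target_jitter=target)
print(f"target jitter {100 * target:.2f}% -> re-analyzed {100 * local_jitter(periods):.2f}%")
```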

Acoustic Analysis of Reinke Edema (라인케부종환자의 음성분석)

  • 김상균;최홍식;공석철;홍원표
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.7 no.1
    • /
    • pp.11-19
    • /
    • 1996
  • Reinke's edema refers to varying degrees of chronic swelling of the vocal folds. The acoustic analysis of Reinke's edema has not been reported so far in this country. The purpose of this study is to clarify the acoustic and aerodynamic characteristics of Reinke's edema. Several acoustic and aerodynamic evaluations were performed in 20 Reinke's edema patients, and the data were compared with those of 20 normal controls. Videolaryngoscopy was also performed to grade severity. We used C-Speech, Doctor Speech Science, and a phonatory function analyser. With C-Speech, we compared jitter, shimmer, and SNR (signal-to-noise ratio) of normal controls and Reinke's edema patients. With Doctor Speech Science, we compared NNE (glottal noise energy), speaking fundamental frequency, and voice quality between the two groups, and with the phonatory function analyser, used for the aerodynamic function test, we compared speech intensity, airflow rate, and expiratory pressure. In conclusion, Reinke's edema patients showed lower voice pitch than normal controls; in addition, jitter, shimmer, SNR, NNE, airflow rate, and expiratory pressure may be meaningful parameters for diagnosis and for the prognosis of treatment. (A brief sketch of the perturbation measures follows this entry.)

  • PDF
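
The perturbation measures compared in the study can be illustrated as below, assuming pitch-cycle boundaries have already been extracted from a sustained vowel. The SNR estimate via cycle averaging is an assumption added for illustration; the clinical packages named above (C-Speech, Doctor Speech Science) implement their own definitions.

```python
# Jitter (%), shimmer (%), and a crude cycle-averaging SNR from one recording.
import numpy as np

def perturbation_measures(x, cycle_starts, sr):
    starts = np.asarray(cycle_starts)
    periods = np.diff(starts) / sr                     # seconds per cycle
    cycles = [x[a:b] for a, b in zip(starts[:-1], starts[1:])]
    amps = np.array([np.max(np.abs(c)) for c in cycles])

    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods) * 100
    shimmer = np.mean(np.abs(np.diff(amps))) / np.mean(amps) * 100

    # SNR estimate: resample every cycle to a common length, treat the mean
    # cycle as "signal" and each cycle's deviation from it as "noise".
    n = int(np.median([len(c) for c in cycles]))
    resampled = np.array([np.interp(np.linspace(0, 1, n),
                                    np.linspace(0, 1, len(c)), c) for c in cycles])
    mean_cycle = resampled.mean(axis=0)
    noise = resampled - mean_cycle
    snr_db = 10 * np.log10(np.sum(mean_cycle ** 2) /
                           np.mean(np.sum(noise ** 2, axis=1)))
    return jitter, shimmer, snr_db
```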

A Proposition of the Fuzzy Correlation Dimension for Speaker Recognition (화자인식을 위한 퍼지상관차원 제안)

  • Yoo, Byong-Wook;Kim, Chang-Seok;Park, Hyun-Sook
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.1
    • /
    • pp.115-122
    • /
    • 1999
  • In this paper, we confirmed that a speech signal is chaotic and analyzed its chaos dimension in order to use it as a speaker recognition parameter. To improve speaker identification and pattern recognition, we propose a fuzzy correlation dimension, obtained by constructing a strange attractor that captures an individual's vocal tract characteristics and applying a fuzzy membership function to the correlation dimension. Because it estimates how the correlation of the points making up the attractor is bounded by the embedding-space dimension, the fuzzy correlation dimension absorbs the variation between the reference-pattern and test-pattern attractors. The validity of the fuzzy correlation dimension as a speaker recognition parameter was investigated by estimating distances based on the average discrimination error for each speaker and reference pattern. (An illustrative sketch follows this entry.)

  • PDF
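
A sketch of a correlation-dimension estimate in which a fuzzy membership function replaces the hard Heaviside count of the Grassberger-Procaccia correlation sum. The sigmoid-shaped membership and the embedding parameters are assumptions for illustration; the paper defines its own membership function over the attractor.

```python
# Delay-embed the speech frame, weight pairwise distances with a soft
# membership instead of a hard threshold, and fit the dimension as the slope
# of log C(r) versus log r.
import numpy as np
from scipy.spatial.distance import pdist

def delay_embed(x, dim, tau):
    """Reconstruct a strange attractor from a scalar series by delay embedding."""
    n = len(x) - (dim - 1) * tau
    return np.stack([x[i * tau:i * tau + n] for i in range(dim)], axis=1)

def fuzzy_correlation_sum(attractor, r, width=0.2):
    d = pdist(attractor)                     # pairwise distances on the attractor
    # A hard count would be mean(d < r); the fuzzy membership weights
    # near-threshold pairs smoothly, absorbing small attractor variations.
    membership = 1.0 / (1.0 + np.exp((d - r) / (width * r)))
    return membership.mean()

def fuzzy_correlation_dimension(x, dim=5, tau=4, radii=np.logspace(-1.5, 0, 8)):
    a = delay_embed((x - np.mean(x)) / np.std(x), dim, tau)   # use a short frame
    c = np.array([fuzzy_correlation_sum(a, r) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope                             # estimated (fuzzy) correlation dimension
```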

A Study on the Acoustic Characteristics of the Pansori by Voice Signals Analysis (음성신호 분석에 의한 판소리의 음성학적 특징 연구)

  • Kim, HyunSook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.7
    • /
    • pp.3218-3222
    • /
    • 2013
  • Pansori is a traditional Korean vocal art whose originality and excellence in narrative and gesture have been recognized worldwide as an intangible cultural heritage. In particular, Pansori is valued for its shrewd and humorous expression and for audience participation; it combines high artistic value with enjoyment across all social classes and has thus served a socially integrating function. In this paper, speech signal analysis techniques were applied to the five Pansori works (madang) to analyze their acoustic features and to study their correlation with the society and era they represent. The experiments analyzed spectrogram, pitch, stability, and intensity for the five works. The results show that, to keep the audience focused and interested in the comical stories, Pansori singing is expressed with a stable, loud voice and with large changes in vocal energy and in the width of vocal-fold vibration. A brief sketch of this kind of analysis follows.
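
A brief sketch of this kind of analysis (spectrogram, pitch contour, pitch stability, and intensity), assuming librosa is available; the file name and parameter values are hypothetical.

```python
# Log-magnitude spectrogram, F0 contour, relative pitch variation, and RMS
# intensity for one recording excerpt.
import numpy as np
import librosa

y, sr = librosa.load("pansori_excerpt.wav", sr=None)

S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y, n_fft=2048, hop_length=256)),
                               ref=np.max)

f0, voiced_flag, _ = librosa.pyin(y, fmin=80, fmax=800, sr=sr, hop_length=256)
voiced = f0[voiced_flag]                              # keep voiced frames only
pitch_stability = np.std(voiced) / np.mean(voiced)    # relative F0 variation

rms = librosa.feature.rms(y=y, hop_length=256)[0]     # proxy for vocal strength

print(f"median F0: {np.median(voiced):.1f} Hz, "
      f"F0 variation: {pitch_stability:.3f}, mean RMS: {rms.mean():.4f}")
```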

Intonation Conversion using the Other Speaker's Excitation Signal (他話者의 勵起信號를 이용한 抑揚變換)

  • Lee, Ki-Young;Choi, Chang-Seok;Choi, Kap-Seok;Lee, Hyun-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.4
    • /
    • pp.21-28
    • /
    • 1995
  • This paper presents an intonation conversion method as a basic study on converting original speech into artificially intoned speech. The method employs the other speaker's excitation signals as intonation information and the original vocal tract spectra, warped to the other speaker's spectra by DTW, as vocal features; intonation-converted speech is synthesized through the short-time inverse Fourier transform (STIFT) of their product. To evaluate the intonation-converted speech, we collected Korean single vowels and sentences spoken by 30 males and compared fundamental frequency contours, spectrograms, distortion measures, and MOS test results between the original and converted speech. The results show that this method can convert the original speech into speech with the other speaker's intonation. (A simplified source-filter sketch follows this entry.)

  • PDF
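
A simplified frame-wise sketch of the source-filter recombination idea: take the excitation (LPC residual) from the intonation-donor speaker and the vocal tract envelope (LPC filter) from the original speaker, then resynthesize. The DTW alignment and STIFT synthesis of the paper are replaced here by the assumptions that the two recordings are already time-aligned and that windowed overlap-add suffices; file names are hypothetical.

```python
# Per frame: inverse-filter the donor to get its excitation, then drive the
# original speaker's LPC envelope with it.
import numpy as np
import librosa
from scipy.signal import lfilter

def convert_intonation(orig, donor, sr, order=16, frame_len=512, hop=256):
    out = np.zeros(min(len(orig), len(donor)))
    win = np.hanning(frame_len)
    for start in range(0, len(out) - frame_len, hop):
        x_orig = orig[start:start + frame_len] * win
        x_donor = donor[start:start + frame_len] * win
        if np.max(np.abs(x_orig)) < 1e-6 or np.max(np.abs(x_donor)) < 1e-6:
            continue                                    # skip silent frames
        a_orig = librosa.lpc(x_orig, order=order)       # original vocal tract
        a_donor = librosa.lpc(x_donor, order=order)
        excitation = lfilter(a_donor, [1.0], x_donor)   # donor residual = intonation source
        out[start:start + frame_len] += lfilter([1.0], a_orig, excitation)
    return out

# Usage (hypothetical file names):
# orig, sr = librosa.load("speaker_a.wav", sr=16000)
# donor, _ = librosa.load("speaker_b.wav", sr=16000)
# converted = convert_intonation(orig, donor, sr)
```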

On a Pitch Alteration Method using Scaling the Harmonics Compensated with the Phase for Speech Synthesis (위상 보상된 고조파 스케일링에 의한 음성합성용 피치변경법)

  • Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.6
    • /
    • pp.91-97
    • /
    • 1994
  • In speech processing, waveform coding is concerned with simply preserving the waveform of the signal through a redundancy reduction process. In speech synthesis, high-quality waveform coding is mainly used for synthesis by analysis. Because the parameters of such coding are not separated into excitation and vocal tract components, it is difficult to apply waveform coding to synthesis by rule; to do so, it is necessary to alter the pitch. In this paper, we propose a new pitch alteration method that can change the pitch period in waveform coding by dividing the speech signal into vocal tract and excitation parameters. It is a time-frequency domain method that preserves the phase component of the waveform in the time domain and the magnitude component in the frequency domain, so that waveform coding can be used for synthesis by rule. Using the algorithm, a spectrum distortion of 2.94% is obtained, which is 5.06% lower than that of the time-domain pitch alteration method. (A rough illustration follows this entry.)

  • PDF
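
A rough illustration of altering pitch by scaling the harmonic (excitation) fine structure of the spectrum while keeping the spectral envelope and the frame phase, using cepstral liftering to separate envelope and fine structure. This is a simplified stand-in under those assumptions, not a reimplementation of the paper's method, and the parameter values are illustrative.

```python
# Per STFT frame: split log-magnitude into envelope (low quefrency) and
# harmonic fine structure, stretch the fine structure by the pitch ratio,
# and resynthesize with the original phase.
import numpy as np
import librosa

def scale_harmonics(y, sr, pitch_ratio=1.2, n_fft=1024, hop=256, lifter=32):
    D = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(D), np.angle(D)
    out_mag = np.empty_like(mag)
    freqs = np.arange(mag.shape[0])
    for t in range(mag.shape[1]):
        log_mag = np.log(mag[:, t] + 1e-9)
        ceps = np.fft.irfft(log_mag, n=2 * (mag.shape[0] - 1))
        ceps_env = ceps.copy()
        ceps_env[lifter:-lifter] = 0.0                  # keep low quefrency -> envelope
        envelope = np.fft.rfft(ceps_env).real
        fine = log_mag - envelope                       # harmonic / excitation part
        fine_scaled = np.interp(freqs, freqs * pitch_ratio, fine)
        out_mag[:, t] = np.exp(envelope + fine_scaled)
    return librosa.istft(out_mag * np.exp(1j * phase), hop_length=hop, length=len(y))
```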

Comparative Evaluation of Electroglottography and Aerodynamic Study in Trained Singers and Untrained Controls under Different Two Pitch (성악인과 일반인 발성의 전기성문검사 및 공기역학적 검사에 대한 연구)

  • Ahn, Sung-Yoon;Kim, Han-Soo;Kim, Young-Ho;Song, Kee-Jae;Choi, Seong-Hee;Lee, Sung-Eun;Choi, Hong-Shik
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.111-128
    • /
    • 2003
  • Aerodynamic study provides valuable information about vocal efficiency in translating airflow into an acoustic signal. The purpose of this study was to investigate the differences between trained singers and untrained controls at two different pitches by simultaneously using the airway interruption method and electroglottography (EGG). While singing the Korean lied 'Gene', 20 trained singers (10 male, 10 female) were studied on two tones one octave apart. Mean flow rate (MFR), subglottic pressure (Psub), and intensity were measured with an aerodynamic test using the phonatory function analyzer (Nagashima Ltd. Model PS 77H, Tokyo, Japan). Closed quotient (Qx), jitter, and shimmer were also investigated by electroglottography using Lx Speech Studio (Laryngograph Ltd, London, UK). These data were compared with those of normal controls. MFR and Psub increased on the high-pitch tone in all subject groups. Statistically significant increases in Qx and intensity were observed in male trained singers on the high-pitch tone (Qx: p = .025; intensity: p < .001). Because of these increases in Qx and intensity, vocal efficiency also increased significantly in male singers (p < .001). The trained singers' phonation was more efficient than that of the untrained controls. These results mean that trained singers can increase loudness with little change in mean flow rate and subglottic pressure but with a greater increase in glottal closed quotient. (An illustrative Qx computation follows this entry.)

  • PDF
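
An illustrative closed-quotient (Qx) computation from an EGG trace: segment the signal into cycles and take, for each cycle, the fraction of time the EGG lies above a criterion level between its minimum and maximum. The 35% criterion level, the zero-crossing cycle segmentation, and the polarity assumption (larger EGG amplitude corresponds to greater vocal-fold contact) are assumptions for illustration; Lx Speech Studio applies its own definition.

```python
# Per-cycle closed quotient from an electroglottographic (EGG) signal.
import numpy as np

def closed_quotients(egg, sr, f0_min=75.0, f0_max=500.0, criterion=0.35):
    egg = egg - np.mean(egg)
    rising = np.flatnonzero((egg[:-1] < 0) & (egg[1:] >= 0))   # crude cycle boundaries
    qx = []
    for a, b in zip(rising[:-1], rising[1:]):
        period = (b - a) / sr
        if not (1.0 / f0_max <= period <= 1.0 / f0_min):
            continue                                           # reject spurious cycles
        cycle = egg[a:b]
        level = cycle.min() + criterion * (cycle.max() - cycle.min())
        qx.append(np.mean(cycle > level))   # fraction of the cycle counted as "closed"
    return np.array(qx)
```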

Electromyographic evidence for a gestural-overlap analysis of vowel devoicing in Korean

  • Jun, Sun-A;Beckman, M.;Niimi, Seiji;Tiede, Mark
    • Speech Sciences
    • /
    • v.1
    • /
    • pp.153-200
    • /
    • 1997
  • In languages such as Japanese, it is very common to observe that short peripheral vowels are completely voiceless when surrounded by voiceless consonants. This phenomenon has also been reported in Montreal French, Shanghai Chinese, Greek, and Korean. Traditionally it has been described as a phonological rule that either categorically deletes the vowel or changes its [+voice] feature to [-voice]. This analysis was supported by the observation of Sawashima (1971) and Hirose (1971) that there are two distinct EMG patterns for voiced and devoiced vowels in Japanese. Close examination of the phonetic evidence based on acoustic data, however, shows that these phonological characterizations are not tenable (Jun & Beckman 1993, 1994). In this paper, we examined the vowel devoicing phenomenon in Korean using EMG, fiberscopic, and acoustic recordings of 100 sentences produced by one Korean speaker. The results show that there is variability in the degree of devoicing in both the acoustic and EMG signals, and in the patterns of glottal closing and opening across different devoiced tokens. There seems to be no categorical difference between devoiced and voiced tokens, for either EMG activity events or glottal patterns. All of these observations support the notion that vowel devoicing in Korean cannot be described as the result of the application of a phonological rule. Rather, devoicing seems to be a highly variable phonetic process, a more or less subtle variation in the specification of such phonetic metrics as the degree and timing of glottal opening, or of the associated subglottal pressure or intra-oral airflow associated with concurrent tone and stricture specifications. Some token-pair comparisons are amenable to an explanation in terms of gestural overlap and undershoot. However, the effect of gestural timing on vocal fold state seems to be a highly nonlinear function of the interaction among specifications for the relative timing of glottal adduction and abduction gestures, the amplitudes of the overlapped gestures, the aerodynamic conditions created by concurrent oral and tonal gestures, and so on. In summary, to understand devoicing, it will be necessary to examine its effect on the phonetic representation of events in many parts of the vocal tract, and at many stages of the speech chain between the motor intent and the acoustic signal that reaches the hearer's ear.

  • PDF

Robust Speech Hash Function

  • Chen, Ning;Wan, Wanggen
    • ETRI Journal
    • /
    • v.32 no.2
    • /
    • pp.345-347
    • /
    • 2010
  • In this letter, we present a new speech hash function based on the non-negative matrix factorization (NMF) of linear prediction coefficients (LPCs). First, linear prediction analysis is applied to the speech to obtain its LPCs, which represent the frequency-shaping attributes of the vocal tract. Then, NMF is performed on the LPCs to capture the speech's local features, which are then used for hash vector generation. Experimental results demonstrate the effectiveness of the proposed hash function in terms of discrimination and robustness against various types of content-preserving signal processing manipulations. A brief sketch of this pipeline follows.
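
A sketch of the LPC + NMF hashing pipeline, with librosa and scikit-learn standing in for the paper's implementation. Making the LPC matrix non-negative by taking absolute values, the frame sizes, and the median binarization are assumptions added for illustration, since NMF requires non-negative input.

```python
# Frame the speech, compute per-frame LPCs, factorize with NMF, and binarize
# the time-averaged activations into a hash vector.
import numpy as np
import librosa
from sklearn.decomposition import NMF

def speech_hash(y, sr, order=12, frame_len=400, hop=160, n_components=16):
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T
    frames = frames[np.sum(frames ** 2, axis=1) > 1e-8]       # drop silent frames
    # LPCs per frame describe the frequency-shaping attributes of the vocal tract
    lpcs = np.array([librosa.lpc(f * np.hanning(frame_len), order=order)[1:]
                     for f in frames])
    V = np.abs(lpcs)                                          # non-negativity assumption
    W = NMF(n_components=n_components, init="random", random_state=0,
            max_iter=500).fit_transform(V)                    # local features via NMF
    code = W.mean(axis=0)                                     # summarize activations over time
    return (code > np.median(code)).astype(int)               # binary hash vector

# Perceptually similar versions of an utterance (e.g., after MP3 coding) should
# give a small Hamming distance between their hash vectors.
```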