• Title/Summary/Keyword: speech quality evaluation


Two-Channel Noise Reduction Using Beamforming and DOA-Based Masking (빔포밍 및 DOA 기반의 마스킹을 이용한 2채널 잡음제거)

  • Kim, Youngil;Jeong, Sangbae
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.1 / pp.32-40 / 2013
  • In this paper, we propose a multi-channel speech enhancement algorithm using beamforming and direction-of-arrival (DOA)-based masking. The proposed algorithm first enhances noisy speech with the linearly constrained minimum variance (LCMV) algorithm and then applies a mel-scale Wiener filter, designed using DOA-based masking, to remove the remaining noise. To improve performance, we optimize the learning rate of the adaptive filters in LCMV and the DOA threshold used to detect the target speech spectrum. The perceptual evaluation of speech quality (PESQ) score and output SNR are measured as performance indices. Experimental results show that the proposed algorithm outperforms the conventional LCMV beamformer by 0.09 in PESQ score and by 5.75 dB in output SNR.
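The abstract reports an improvement in output SNR but does not say how it is computed. A minimal sketch, assuming the common waveform-domain definition (energy of the clean reference divided by the energy of the residual error, with time-aligned signals), is given below. The function names `output_snr_db` and `snr_gain_db` are hypothetical and not taken from the paper.

```python
import math


def output_snr_db(clean, enhanced):
    """SNR in dB of `enhanced` against a time-aligned `clean` reference.

    Assumed definition: 10 * log10(sum(s^2) / sum((s - y)^2)).
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    signal_power = sum(s * s for s in clean)
    noise_power = sum((s - y) ** 2 for s, y in zip(clean, enhanced))
    if signal_power == 0.0:
        raise ValueError("clean reference has zero energy")
    if noise_power == 0.0:
        return math.inf  # enhanced output is identical to the reference
    return 10.0 * math.log10(signal_power / noise_power)


def snr_gain_db(clean, baseline, proposed):
    """Difference in output SNR (dB) between two enhancement outputs."""
    return output_snr_db(clean, proposed) - output_snr_db(clean, baseline)
```

Under this definition, a figure such as the 5.75 dB quoted above would correspond to the value returned by `snr_gain_db` when `baseline` is the conventional beamformer output and `proposed` is the output of the full system.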

The Effects of Vocal Relaxation Training on Voice Improvement of Children with Vocal Nodules (성대접촉이완훈련이 성대결절아동의 음성개선에 미치는 효과)

  • Han, Ji Eun;Seong, Cheol Jae
    • Phonetics and Speech Sciences / v.4 no.4 / pp.147-154 / 2012
  • The purpose of this study is to examine the effect on voice improvement of vocal training that relaxes vocal contact, applied to children with vocal nodules. Subjects were 20 boys aged 5 to 12 years who had been diagnosed with vocal nodules in an otolaryngology department and advised to undergo voice therapy. The voice therapy was conducted for 40 minutes per week for a total of eight sessions. Results were evaluated by videostroboscopy, auditory-perceptual evaluation on the GRBAS scale, aerodynamic testing, and acoustic analysis before and after therapy. First, the size of the vocal nodules was reduced and the unstable pattern of vocal contact improved. Glottic closure increased and phase symmetry decreased during vocal vibration. The mucosal wave increased and muscle tension of the larynx was reduced. Second, auditory-perceptual evaluation showed that the subjects' overall voice quality improved. GRBAS scale evaluation showed that the rough, breathy, and strained characteristics of the subjects' voices were reduced after therapy. Third, the acoustic parameters showed a statistically significant improvement. The fundamental frequency of the subjects' voices increased, and the values of jitter, shimmer, NHR, and [H1-H2] decreased. Fourth, the children's maximum phonation time increased. These results imply that the vocal relaxation training conducted in this study has a very positive effect on the voices of children with vocal nodules.
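Jitter and shimmer recur as acoustic measures in several abstracts in this listing. A minimal sketch of the widely used "local" definitions is given below: the mean absolute difference between consecutive glottal periods (jitter) or peak amplitudes (shimmer), divided by the mean value, in percent. This is an assumption for illustration. The analysis software used in the studies may apply different variants, and the function names are hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute consecutive difference divided by the mean, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    return 100.0 * mean_abs_diff / mean_value


def jitter_percent(periods_s):
    """Local jitter from a sequence of glottal cycle durations (seconds)."""
    return local_perturbation_percent(periods_s)


def shimmer_percent(peak_amplitudes):
    """Local shimmer from a sequence of per-cycle peak amplitudes."""
    return local_perturbation_percent(peak_amplitudes)
```

Both measures are zero for a perfectly periodic signal with constant amplitude, so a decrease after therapy indicates more regular vocal fold vibration.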

Acoustic Analysis of Voice Change According to Extent of Thyroidectomy (갑상선 수술범위에 따른 음성의 음향적 분석)

  • Kang, Young Ae;Koo, Bon Seok
    • Phonetics and Speech Sciences / v.7 no.4 / pp.77-83 / 2015
  • Voice complications can occur after thyroidectomy even without laryngeal nerve injury. The purpose of this study is to investigate voice changes according to the extent of thyroidectomy using acoustic analysis. Thirty-five female patients with papillary thyroid carcinoma underwent voice evaluation before thyroidectomy and at 1 month and 3 months after it. The acoustic analysis parameters were speaking fundamental frequency (SFF), min $F_0$, max $F_0$, dynamic range $F_0$, jitter, shimmer, noise-to-harmonic ratio (NHR), and cepstral peak prominence (CPP). Repeated-measures analysis of variance was applied. Time-related voice changes showed significant differences in all parameters except NHR. At 1 month after surgery, voice quality had worsened and pitch had decreased, but both were improving at the 3-month follow-up. Voice changes according to the extent of surgery appeared in SFF, max $F_0$, and dynamic range $F_0$. A time-by-surgery interaction existed only in min $F_0$. The results showed that the severity of voice complications depended on the extent of thyroidectomy, which had a negative impact on $F_0$-related parameters. The deterioration of voice quality at 1 month after thyroidectomy may be affected by the loss of thyroid hormone in the blood. The decrease in $F_0$-related parameters may result from laryngeal fixation caused by adhesion at the surgical site.

Improvement of Speech Intelligibility in Noisy Environments (잡음 환경에서의 음성 명료도 향상 기술)

  • Yoon, Jae-Yul;Kim, Jung-Hoe;Oh, Eun-Mi;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.28 no.1 / pp.70-76 / 2009
  • In speech communication in noisy environments, speech intelligibility is seriously degraded by the masking effect of ambient noise. In this paper, a new method to improve speech intelligibility in noisy environments is proposed. Based on the perception theory that the temporal envelope plays a major role in determining intelligibility, the proposed method uses a novel operation that enhances the fluctuation of the band-wise temporal envelope and also includes pitch enhancement to improve speech naturalness. In addition, a new subjective evaluation scheme employing binaural listening is proposed in order to measure performance more reliably. The subjective performance measured with the proposed scheme shows that the proposed method improves both intelligibility and naturalness in various environments, and that a function parameter can control the trade-off between intelligibility and naturalness.

Psycho-acoustic evaluation of the indoor noise in cabins of a naval vessel using a back-propagation neural network algorithm

  • Han, Hyung-Suk
    • International Journal of Naval Architecture and Ocean Engineering / v.4 no.4 / pp.374-385 / 2012
  • The indoor noise of a ship is usually determined using the A-weighted sound pressure level. However, in order to better understand this phenomenon, evaluation parameters that more accurately reflect the human sense of hearing are required. To find the level of the satisfaction index of the noise inside a naval vessel such as "Loudness" and "Annoyance", psycho-acoustic evaluation of various sound recordings from the naval vessel was performed in a laboratory. The objective of this paper is to develop a single index of "Loudness" and "Annoyance" for noise inside a naval vessel according to a psycho-acoustic evaluation by using psychological responses such as Noise Rating (NR), Noise Criterion (NC), Room Criterion (RC), Preferred Speech Interference Level (PSIL) and loudness level. Additionally, in order to determine a single index of satisfaction for noise such as "Loudness" and "Annoyance", with respect to a human's sense of hearing, a back-propagation neural network is applied.
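Of the indices named in the abstract, the Preferred Speech Interference Level (PSIL) is the simplest to illustrate. A minimal sketch follows, assuming the common definition of PSIL as the arithmetic mean of the sound pressure levels in the 500, 1000, and 2000 Hz octave bands. The abstract does not state the formula used, and the name `psil_db` is hypothetical.

```python
# Octave-band center frequencies (Hz) assumed for PSIL.
PSIL_BANDS_HZ = (500, 1000, 2000)


def psil_db(octave_band_levels_db):
    """PSIL from a mapping of octave-band center frequency (Hz) to SPL (dB)."""
    missing = [f for f in PSIL_BANDS_HZ if f not in octave_band_levels_db]
    if missing:
        raise KeyError(f"missing octave bands: {missing}")
    total = sum(octave_band_levels_db[f] for f in PSIL_BANDS_HZ)
    return total / len(PSIL_BANDS_HZ)
```

Bands outside the three speech-relevant octaves are ignored, which is why PSIL can differ noticeably from an A-weighted level for noise dominated by low frequencies.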

Acoustic Features of Oral Vowels in the Esophagus Speakers (식도음성의 모음종류에 따른 음향학적 특성)

  • Yun, Eunmi;Mok, Eunhee;Minh, Phan huu Ngoc;Hong, Kihwan
    • Phonetics and Speech Sciences / v.7 no.4 / pp.85-92 / 2015
  • This study aimed to establish characteristics related to voice and speech through the natural base frequency analysis of esophagus vocalization. In the study, 8 subjects were selected for the esophagus vocal group, and 10 other subjects were selected for a control group. MDVP (Multi-dimensional Voice Program, Model 4800, USA, 2001) and Multi-Speech (Model 3700, KayPENTAX, USA, 2008) were used as experimental equipment. The speech samples selected for evaluation were vowels and sentences (both declarative and interrogative). For acoustic analysis, f0, jitter, energy, shimmer, HNR, and the intonation patterns of the speech samples were measured. The results were as follows. First, the natural intrinsic frequency of extended vowels in the esophagus vocal group was lower than that in the normal vocal group. In particular, the intrinsic frequency difference for the high vowel /i/ was much greater than that for the low vowel /a/. Second, the jitter values of the esophagus vocal group were higher than those of the control group. In particular, there was a large difference between the jitter values for /a/ and /i/, with the values being highest for /i/. Third, there was no significant difference in vocal strength between the esophagus vocal group and the control group. Fourth, the shimmer values of the esophagus vocal group were higher than those of the control group. In particular, there was a large difference in shimmer values for the low vowel /a/. Fifth, the HNR values of the esophagus vocal group were significantly lower than those of the control group. In particular, the largest difference in HNR values between the two groups was for the high vowel /i/. Sixth, the pitch contours of interrogative and declarative sentences in the esophagus vocal group either took a different form from those of the normal vocal group or differed from them only slightly, thus presenting an inconsistent pattern.

Real-time Implementation of Speech and Channel Coder on a DSP Chip for Radio Communication System (무선통신 적용을 위한 단일 DSP칩상의 음성/채널 부호화기 실시간 구현)

  • Kim Jae-Won;Sohn Dong-Chul
    • Journal of the Korea Institute of Information and Communication Engineering / v.9 no.6 / pp.1195-1201 / 2005
  • This paper deals with procedures and results for the real-time implementation of the G.729 speech coder and a channel coder, including a convolutional codec, Viterbi decoder, and interleaver, on a fixed-point DSP chip for radio communication systems. We describe the method for real-time implementation based on integer simulation results and report the implementation in terms of quality performance and the complexity required for real-time operation. The required complexity was 24 MIPS and 9 MIPS in computational load, and 12K words and 4K words in execution code length, for the speech and channel coders, respectively. The functional evaluation was performed in two steps: the first was a bit-exact comparison with fixed-point C code, and the second used actual speech samples and error test vectors. Unlike other results based on individual implementations, we implemented the speech and channel coders on a single DSP chip with 160 MIPS computation capability and 64K words of on-chip memory. These results surpass conventional methods in terms of system complexity and implementation cost for radio communication systems.

Transcoding Algorithm for SMV and G.723.1 Vocoders via Direct Parameter Transformation (SMV와 G.723.1 음성부호화기를 위한 파라미터 직접 변환 방식의 상호부호화 알고리듬)

  • 서성호;장달원;이선일;유창동
    • Proceedings of the IEEK Conference / 2003.07e / pp.2228-2231 / 2003
  • In this paper, a transcoding algorithm for the Selectable Mode Vocoder (SMV) and the G.723.1 speech coder via direct parameter transformation is proposed. In contrast to the conventional tandem transcoding algorithm, the proposed algorithm converts the parameters of one coder to those of the other without going through the decoding and encoding process. The proposed algorithm is composed of four parts: the parameter decoding, line spectral pair (LSP) conversion, pitch period conversion, excitation conversion and rate selection. The evaluation results show that the proposed algorithm achieves speech quality equivalent to that of tandem transcoding with reduced computational complexity and delay.


Voice conversion using low dimensional vector mapping (낮은 차원의 벡터 변환을 통한 음성 변환)

  • Lee, Kee-Seung;Doh, Won;Youn, Dae-Hee
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.4 / pp.118-127 / 1998
  • In this paper, we propose a voice personality transformation method that makes one person's voice sound like another person's voice. To transform the voice personality, the vocal tract transfer function is used as the transformation parameter. Compared with previous methods, the proposed method can obtain high-quality transformed speech with low computational complexity. Conversion between the vocal tract transfer functions is implemented by a linear mapping based on soft clustering. In this process, the mean LPC cepstrum coefficients and the mean-removed LPC cepstrum modeled by a low-dimensional vector are used as transformation parameters. To evaluate the performance of the proposed method, mapping rules are generated from 61 Korean words uttered by two male speakers and one female speaker. These rules are then applied to 9 sentences uttered by the same speakers, and objective evaluation and subjective listening tests are performed on the transformed speech.


Adaptive TCX Windowing Technology for Unified Structure MPEG-D USAC

  • Lee, Tae-Jin;Beack, Seung-Kwon;Kang, Kyeong-Ok;Kim, Whan-Woo
    • ETRI Journal / v.34 no.3 / pp.474-477 / 2012
  • The MPEG-D unified speech and audio coding (USAC) standardization process was initiated by MPEG to develop an audio codec that can provide consistent quality for mixed speech and music content. The current USAC reference model structure consists of frequency domain (FD) and linear prediction domain (LPD) core modules and is controlled using a signal classifier tool. In this letter, we propose an LPD single-mode USAC structure using an adaptive windowing-based transform-coded excitation module. We tested our system using official test items for all mono-evaluation modes. The results of the experiment show that the objective and subjective performances of the proposed single-mode USAC system are better than those of the FD/LPD dual-mode USAC system.