• Title/Summary/Keyword: 음성 품질의 지각평가 (perceptual evaluation of speech quality)

Search results: 7

A Study of Subjective Speech Quality Measurement in VoIP (VoIP 음질의 주관적 평가에 관한 연구)

  • 강영도;강진석;최연성;김장형
    • Journal of the Korea Institute of Information and Communication Engineering, v.5 no.2, pp.279-287, 2001
  • In this paper, we discuss a scale for subjective speech quality measurement over VoIP (Voice over IP) networks, which are a component of broadband networks. Objective parameters of multimedia services, such as PSNR or jitter, can easily be measured and defined, but they do not necessarily match the user's perception. We propose a speech quality measurement scale based on subjective measurement of end-to-end speech quality, which is composed of sender-side quality, transmission quality, and receiver-side quality; these represent, respectively, how correctly the speaker is represented, the degree of impairment caused by various factors, and how well the processed speech is recognized. We also examine the proposed method and verify its usefulness.

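The measurement scale described above rests on collecting subjective ratings for each quality component. One common way to summarize such ratings is a mean opinion score (MOS) with a confidence interval. The sketch below assumes a 1 to 5 rating scale, a normal-approximation 95% interval (for small listener panels a Student t quantile would be more appropriate), and made-up ratings for the three components; the paper's actual scale and how it combines the components are not given in the abstract.

```python
import math
from statistics import mean, stdev

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumes ratings lie on a 1-5 opinion scale (an assumption, not the paper's scale).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a confidence interval")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    m = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return m, half_width

# Hypothetical listener ratings for the three components named in the abstract.
scores = {
    "sender_side": [4, 4, 5, 3, 4],
    "transmission": [3, 3, 4, 2, 3],
    "receiver_side": [4, 3, 4, 4, 5],
}
for component, ratings in scores.items():
    m, ci = mos_with_ci(ratings)
    print(f"{component}: MOS = {m:.2f} ± {ci:.2f}")
```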

Effect of HRTF on Sound Localization (머리전달함수가 음상정위에 미치는 영향)

  • 김진욱
    • Proceedings of the Acoustical Society of Korea Conference, 1998.06c, pp.261-264, 1998
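  • In this paper, the effect of the head-related transfer function on sound image localization is compared and analyzed using the MIT head-related transfer function (HRTF) and the Neumann HRTF. To this end, the measurement conditions and the time- and frequency-domain characteristics of the two HRTFs were compared, subjective evaluations of sound image localization were carried out at $10^{\circ}$ intervals using headphone playback in a listening room, and the direction perception error (in degrees) was calculated for each listener and for the overall average from the subjective evaluation data. The results showed that sound image localization with the Neumann HRTF was better than with the MIT HRTF, and regarding sound quality, listeners reported that sound reproduced with the Neumann HRTF was more natural and clearer.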

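The per-listener and overall direction perception errors described above can be computed as mean absolute azimuth errors. A minimal sketch, assuming targets and responses are azimuths in degrees and that errors wrap around the circle (so 350° versus 10° counts as 20°); the listener data are hypothetical, and the paper's exact error definition is not given in the abstract.

```python
def angular_error(target_deg, response_deg):
    """Absolute azimuth error in degrees, wrapped to the range [0, 180]."""
    diff = (response_deg - target_deg) % 360.0
    return min(diff, 360.0 - diff)

def mean_error(trials):
    """Mean absolute angular error over (target, response) pairs."""
    return sum(angular_error(t, r) for t, r in trials) / len(trials)

# Hypothetical responses; targets lie on a 10-degree grid as in the listening test.
listeners = {
    "listener_a": [(0, 10), (90, 80), (180, 160)],
    "listener_b": [(0, 350), (90, 90), (180, 200)],
}
per_listener = {name: mean_error(trials) for name, trials in listeners.items()}
# Overall average pools every trial; with equal trial counts this equals
# the mean of the per-listener averages.
overall = mean_error([pair for trials in listeners.values() for pair in trials])
print(per_listener, overall)
```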

Quality Assessment and Predistortion Evaluation of the Multi-channel Audio Codec according to the bitrate changing (압축율 변화에 따른 멀티채널 오디오의 품질 및 Predistortion 의 영향 평가)

  • Cha, Kyung-Hwan;Jang, Dae-Young;Kim, Sung-Han;Kim, Chun-Duck
    • The Journal of the Acoustical Society of Korea, v.15 no.2, pp.55-60, 1996
  • This paper describes the subjective assessment of multi-channel audio quality as the bitrate changes, and evaluates the effect of predistortion in avoiding unmasked noise after the matrixing/dematrixing process used in the transmission and reproduction of multi-channel audio. The simulation uses perceptual coding, namely the MPEG-2 Audio Layer II algorithm. We evaluate the quality improvement with and without predistortion at 384, 320, 256, and 128 kbps. In the double-blind subjective assessment, scores on the 5-grade impairment scale stayed within minus one down to 320 kbps, so the audio quality of the 3/2 channel configuration was rated "perceptible, but not annoying." Predistortion improved quality by one grade at 128 kbps, and the speech test material in particular improved more than the music test materials.

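The five-grade impairment scale used above runs from 5 (imperceptible) to 1 (very annoying), and a difference grade is the grade given to the coded item minus the grade given to the reference, so a difference of about -1 corresponds to "perceptible, but not annoying." The sketch maps mean grades onto the scale labels; the per-bitrate grades are invented for illustration and are not the paper's results, and fixing the reference grade at 5 is a simplifying assumption.

```python
IMPAIRMENT_SCALE = {
    5: "imperceptible",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}

def diff_grade(coded_grade, reference_grade=5.0):
    """Difference grade: coded-item grade minus reference grade (0 down to -4)."""
    return coded_grade - reference_grade

def describe(mean_grade):
    """Label of the nearest integer grade, clamped to the 1-5 range."""
    nearest = min(5, max(1, round(mean_grade)))
    return IMPAIRMENT_SCALE[nearest]

# Hypothetical mean grades per bitrate in kbps (not the paper's data).
results = {384: 4.6, 320: 4.1, 256: 3.4, 128: 2.4}
for kbps, grade in results.items():
    print(f"{kbps} kbps: diff grade {diff_grade(grade):+.1f}, {describe(grade)}")
```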

A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate (특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea, v.42 no.6, pp.544-551, 2023
  • Speech enhancement, which aims to improve the perceptual quality and intelligibility of noisy speech, has progressed from methods using the magnitude spectrum to methods using the complex-valued spectrum, which can improve both magnitude and phase. In this paper, we study how to apply an attention mechanism to complex-valued spectrum-based speech enhancement systems to further improve the intelligibility and quality of noisy speech. The attention is based on additive attention, with the attention weights calculated in a way that takes the complex-valued spectrum into account. In addition, global average pooling is used to account for the importance of each feature map. Complex-valued spectrum-based speech enhancement was performed with the Deep Complex U-Net (DCUNET) model, and the proposed additive attention was applied following the Attention U-Net model. Experiments on noisy speech in a living-room environment showed that the proposed method outperforms the baseline model on evaluation metrics such as Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), and that it consistently improves performance across various background noise environments and low Signal-to-Noise Ratio (SNR) conditions. These results demonstrate the effectiveness of the proposed system in improving the intelligibility and quality of noisy speech.
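An additive attention gate in the style of Attention U-Net computes a spatial weight map from skip features x and gating features g as sigmoid(psi^T ReLU(W_x x + W_g g)), with the 1x1 convolutions written as matrix products over channels. The sketch adds a per-channel weight obtained by global average pooling, which is one plausible reading of "feature map importance" and not necessarily the authors' design. It also treats real and imaginary parts as ordinary real channels, omits biases, and uses random NumPy weights instead of a trained DCUNET; all of these are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def additive_attention_gate(x, g, w_x, w_g, psi, w_c):
    """Additive attention gate with a hypothetical GAP-based feature-map weight.

    x: skip-connection features, shape (C, H, W)
    g: gating features, shape (C, H, W) (assumed already resized to match x)
    w_x, w_g: (F, C) matrices acting as 1x1 convolutions
    psi: (F,) vector mapping intermediate features to one attention logit
    w_c: (C, C) matrix producing per-channel importance from pooled features
    """
    # 1x1 convolutions are matrix products over the channel axis.
    q = np.maximum(np.einsum("fc,chw->fhw", w_x, x) + np.einsum("fc,chw->fhw", w_g, g), 0.0)
    alpha = sigmoid(np.einsum("f,fhw->hw", psi, q))   # spatial attention map, (H, W)
    pooled = x.mean(axis=(1, 2))                     # global average pooling, (C,)
    beta = sigmoid(w_c @ pooled)                     # per-feature-map weight, (C,)
    return x * alpha[None, :, :] * beta[:, None, None]

rng = np.random.default_rng(0)
C, F, H, W = 4, 8, 6, 5
# Real and imaginary parts stacked as channels (an assumption for illustration).
x = rng.standard_normal((C, H, W))
g = rng.standard_normal((C, H, W))
out = additive_attention_gate(
    x, g,
    rng.standard_normal((F, C)), rng.standard_normal((F, C)),
    rng.standard_normal(F), rng.standard_normal((C, C)),
)
print(out.shape)  # (4, 6, 5)
```

Because both weights lie in (0, 1), the gate can only attenuate features, never amplify them.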

Evaluation of a signal segregation by FDBM (FDBM의 음원분리 성능평가)

  • Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences, v.8 no.12, pp.1793-1802, 2013
  • Various approaches to sound source segregation have been proposed. Among them, the frequency domain binaural model (FDBM) has the advantages of low computational load and effective howling cancellation. A binaural hearing assistance system based on FDBM has been proposed; it can enhance the desired signal based on directivity information. Although FDBM has been evaluated in terms of signal-to-noise ratio (SNR) and the coherence function, these results do not always agree with human impressions, because they are physical measures that do not take human perception into account. Since a binaural hearing assistance system is one of its major applications, the quality of the segregated sound must be kept at a sufficient level. In this paper, the signal segregation performance of FDBM is evaluated with three objective methods, namely SNR, coherence, and Perceptual Evaluation of Speech Quality (PESQ), to characterize FDBM's sound source segregation performance. The simulation results show that FDBM improves the quality of the left and right channel signals to an equivalent level, and suggest that PESQ may be a more useful measure of FDBM's segregation performance than SNR and coherence. The PESQ results show the effects of the segregation parameters and indicate appropriate parameters under the tested conditions.
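SNR and coherence, the two physical measures the abstract contrasts with PESQ, can be computed as follows. A minimal NumPy sketch, assuming a clean reference is available and using non-overlapping Hann-windowed segments for the magnitude-squared coherence (`scipy.signal.coherence` offers a Welch-based version with overlapping segments). The signals are synthetic, FDBM itself is not implemented, and PESQ (ITU-T P.862) is left to a dedicated implementation.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR of an estimate against a clean reference, in dB."""
    noise = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

def msc(x, y, nperseg=256):
    """Magnitude-squared coherence from averaged, Hann-windowed, non-overlapping segments."""
    win = np.hanning(nperseg)
    n_bins = nperseg // 2 + 1
    sxx = np.zeros(n_bins)
    syy = np.zeros(n_bins)
    sxy = np.zeros(n_bins, dtype=complex)
    for k in range(len(x) // nperseg):
        seg = slice(k * nperseg, (k + 1) * nperseg)
        X = np.fft.rfft(win * x[seg])
        Y = np.fft.rfft(win * y[seg])
        sxx += np.abs(X) ** 2          # auto-spectrum of x
        syy += np.abs(Y) ** 2          # auto-spectrum of y
        sxy += X * np.conj(Y)          # cross-spectrum
    return np.abs(sxy) ** 2 / (sxx * syy)

rng = np.random.default_rng(0)
clean = rng.standard_normal(8192)                     # hypothetical broadband target
estimate = clean + 0.1 * rng.standard_normal(8192)    # hypothetical segregated output
print(f"SNR = {snr_db(clean, estimate):.1f} dB")
print(f"mean coherence = {msc(clean, estimate).mean():.3f}")
```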

Optimization of Multi-time Scale Loss Function Suitable for DNN-based Audio Coder (심층신경망 기반 오디오 부호화기를 위한 Multi-time Scale 손실함수의 최적화)

  • Shin, Seung-Min;Byun, Joon;Park, Young-Cheol;Beack, Seung-kwon;Sung, Jong-mo
    • Proceedings of the Korean Society of Broadcast Engineers Conference, 2022.06a, pp.1315-1317, 2022
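  • Recently, deep neural network (DNN)-based audio coders have been actively studied. DNN-based audio coders are structurally simpler than traditional audio coders, but it is difficult to obtain perceptual improvements without increasing network complexity. To address this, techniques using loss functions based on psychoacoustic models, which exploit characteristics of human hearing, have been introduced. Audio coders trained with psychoacoustic-model-based loss functions control quantization noise well, but further perceptual improvement is still needed. This paper proposes optimizing the window size of the local loss functions in a multi-time scale loss function for DNN-based audio coders. The window size used to compute the local loss functions is varied, and the window size best suited to audio coding is determined from the results. The network was trained with the optimal multi-time scale loss function obtained in the experiments, and subjective evaluation confirmed that it yields better speech quality than the existing psychoacoustic-model-based loss function.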

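The abstract does not define the local loss, so the following is a hypothetical sketch of a multi-time scale loss: the signal is cut into windows of a given size, a per-window energy-normalized squared error is averaged, and the result is averaged over several window sizes. Sweeping the window-size tuple mirrors the window-size search the abstract describes. The sizes, the normalization, and the use of NumPy rather than a differentiable training framework are all assumptions.

```python
import numpy as np

def local_loss(target, estimate, win, eps=1e-8):
    """Mean per-window energy-normalized squared error for one window size (hypothetical)."""
    n = (len(target) // win) * win          # drop the incomplete tail window
    t = target[:n].reshape(-1, win)
    e = estimate[:n].reshape(-1, win)
    err = np.sum((t - e) ** 2, axis=1)
    energy = np.sum(t ** 2, axis=1)
    return float(np.mean(err / (energy + eps)))

def multi_time_scale_loss(target, estimate, windows=(64, 256, 1024)):
    """Average the local losses over several window sizes (hypothetical defaults)."""
    return float(np.mean([local_loss(target, estimate, w) for w in windows]))

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)                   # hypothetical reference signal
y = x + 0.05 * rng.standard_normal(4096)        # hypothetical decoded signal
for windows in [(32, 128), (64, 256, 1024), (512, 2048)]:
    print(windows, multi_time_scale_loss(x, y, windows))
```

Short windows weight quiet passages as heavily as loud ones, while long windows let loud segments dominate; this is one reason the window size could matter perceptually.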

A study on sound source segregation of frequency domain binaural model with reflection (반사음이 존재하는 양귀 모델의 음원분리에 관한 연구)

  • Lee, Chai-Bong
    • Journal of the Institute of Convergence Signal Processing, v.15 no.3, pp.91-96, 2014
  • For sound source direction estimation and separation, the Frequency Domain Binaural Model (FDBM) offers low computational cost and high separation performance. This method localizes and separates sound sources by obtaining the Interaural Phase Difference (IPD) and Interaural Level Difference (ILD) in the frequency domain. In practical environments, however, reflections cause problems. To reduce their effect, a method is presented that simulates the localization of the direct sound: it detects the first-arriving sound, checks its direction, and then separates the sound. Simulation results show that the estimated direction lies within 10% of the sound source direction and that, in the presence of reflections, source separation is improved, with higher coherence and PESQ (Perceptual Evaluation of Speech Quality) and lower directional damping than the existing FDBM. In the absence of reflections, the degree of separation was low.
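The IPD and ILD mentioned above are per-frequency-bin comparisons of the left and right spectra. A minimal sketch for a single frame, assuming phase is measured as left relative to right and level is expressed in dB; the test tone, delay, and gain are hypothetical, and the reflection handling of the proposed method is not modeled.

```python
import numpy as np

def ipd_ild(left, right, eps=1e-12):
    """Per-bin interaural phase difference (rad) and level difference (dB) of one frame."""
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    ipd = np.angle(L * np.conj(R))       # phase of left relative to right, in (-pi, pi]
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    return ipd, ild

fs, n = 16000, 512
t = np.arange(n) / fs
f0 = 1000.0                  # lands exactly on bin 32 (bin spacing 16000 / 512 = 31.25 Hz)
delay = 2                    # right channel lags by 2 samples (hypothetical)
left = np.sin(2 * np.pi * f0 * t)
right = 0.5 * np.sin(2 * np.pi * f0 * (t - delay / fs))
ipd, ild = ipd_ild(left, right)
k = int(f0 * n / fs)
print(ipd[k], ild[k])
```

With the tone on an exact FFT bin, the 2-sample lag at 16 kHz gives an IPD of π/4 rad at that bin, and the 0.5 gain gives an ILD of about 6.02 dB.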