• Title/Summary/Keyword: Subjective acoustic performance


Enhancement of Sound Image Localization on Vertical Plane for Three-Dimensional Acoustic Synthesis (3차원 음향 합성을 위한 수직면에서의 음상 정위 향상)

  • 김동현;정하영;김기만
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.3 no.3
    • /
    • pp.541-546
    • /
    • 1999
  • The head-related transfer function (HRTF), which expresses the acoustic process from the sound source to the human ears in the free field, contains critical information from which the location of the source can be traced. It also makes it possible to realize a multi-dimensional acoustic system that can approximately generate a non-existent sound source. The use of a non-individual, common HRTF degrades localization ability, causing front-back and elevation judgment errors. In this paper, we reduce the error on the vertical plane by increasing the spectral notch level. A subjective test showed that the proposed method improves the ability to locate both stationary and moving sources.
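The notch manipulation described above can be sketched as a simple operation on an HRTF magnitude spectrum. The function below is a hypothetical illustration only: the notch bin, width, and gain are made-up parameters, not the paper's actual values.

```python
import numpy as np

def emphasize_spectral_notch(mag_spectrum, notch_bin, width=5, gain_db=-6.0):
    """Deepen the spectral notch around notch_bin by gain_db (illustrative sketch)."""
    out = mag_spectrum.copy()
    lo = max(0, notch_bin - width)
    hi = min(len(out), notch_bin + width + 1)
    out[lo:hi] *= 10 ** (gain_db / 20.0)  # attenuate the notch region further
    return out

# toy HRTF magnitude spectrum with a shallow notch around bin 50
spec = np.ones(128)
spec[45:56] = 0.5
enhanced = emphasize_spectral_notch(spec, notch_bin=50, gain_db=-6.0)
```

Making the elevation-dependent notch more prominent is one plausible reading of "increasing the spectral notch level"; the exact processing in the paper may differ.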


STUDY ON THE OPTIMAL DESIGN OF A VEHICLE INTAKE SYSTEM USING THE BOOMING NOISE AND THE SOUND QUALITY EVALUATION INDEX

  • LEE J. K.;PARK Y. W.;CHAI J. B.
    • International Journal of Automotive Technology
    • /
    • v.7 no.1
    • /
    • pp.43-49
    • /
    • 2006
  • In this paper, indices for evaluating vehicle intake booming noise and intake sound quality were developed through correlation analysis and multiple-factor regression analysis of objective measurements and subjective evaluation data. First, intake orifice noise was measured at the wide-open-throttle test condition. An acoustic transfer function between the intake orifice noise and the interior noise was then estimated at the steady-state condition. In parallel, subjective evaluations were carried out on a 10-point scale by eight expert evaluators of intake noise and vibration. Next, correlation analysis was performed between the psychoacoustic parameters derived from the measured data and the subjective ratings. The most critical factors were determined, and the corresponding indices for intake booming noise and sound quality were obtained by multiple-factor regression. The optimal design of the intake system was then studied using these evaluation indices to predict system performance. Finally, optimal intake-system design parameters, from the viewpoints of both noise level and sound quality, were extracted by applying a comparative weighting between the booming-noise and sound-quality indices, streamlining the optimization process. This work can serve as a guideline for system, design, and test engineers on optimizing system performance while considering both noise level and sound quality.
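The index-building step, regressing subjective scores onto psychoacoustic parameters, can be sketched with ordinary least squares. The parameter names and data below are hypothetical placeholders, not the paper's measurements.

```python
import numpy as np

# hypothetical psychoacoustic measurements per sample: [loudness, sharpness, roughness]
X = np.array([[20.0, 1.2, 0.8],
              [25.0, 1.5, 1.1],
              [30.0, 1.9, 1.4],
              [35.0, 2.1, 1.8]])
y = np.array([8.5, 7.0, 5.5, 4.0])  # hypothetical 10-point subjective scores

# multiple linear regression: y ~ X @ w + b
A = np.hstack([X, np.ones((len(X), 1))])      # append bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit

def quality_index(loudness, sharpness, roughness):
    """Predicted subjective score from the fitted regression index."""
    return np.array([loudness, sharpness, roughness, 1.0]) @ coef

pred = quality_index(27.0, 1.7, 1.2)  # score estimate for a new measurement
```

The paper's actual index presumably uses the psychoacoustic factors found most critical in its correlation analysis; the three chosen here are only common examples.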

Study of Focusing Characteristics of Ultrasound for Designing Acoustic Lens in Ultrasonic Moxibustion Device (뜸 자극용 초음파 치료기기의 음향렌즈 설계를 위한 초음파 집속 특성 연구)

  • Bae, Jae-Hyun;Song, Sung-Jin;Kim, Hak-Joon;Kim, Ki-Bok
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.35 no.2
    • /
    • pp.134-140
    • /
    • 2015
  • Traditional moxibustion therapy can cause severe pain and leave burn scars at the moxibustion site, as it relies on the practitioner's subjective and qualitative treatment. Recently, ultrasound therapy has received attention as an alternative to moxibustion owing to its objective and quantitative nature. However, to convert ultrasound energy into heat energy, the ultrasound-focusing characteristics of the acoustic lens must be precisely understood. Therefore, in this study, an FEM simulation was performed for acoustic lenses with different geometries (a concave lens and a zone lens), as geometry critically influences ultrasound focusing. The acoustic pressure field, amplitude, and focal point were also calculated. Furthermore, the performance of the fabricated acoustic lens was verified by a sound-pressure measurement experiment.
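As a rough back-of-the-envelope companion to FEM results like these, the classical relation for a spherically focused aperture, G = πa²/(λF), estimates the on-axis pressure gain at the focus. This is a textbook approximation, not the paper's simulation, and the transducer numbers below are invented.

```python
import math

def focal_gain(aperture_radius_m, focal_length_m, freq_hz, c=1540.0):
    """Linear on-axis pressure gain at focus for a spherically focused
    aperture: G = pi * a^2 / (lambda * F). c defaults to soft tissue."""
    wavelength = c / freq_hz
    return math.pi * aperture_radius_m ** 2 / (wavelength * focal_length_m)

# hypothetical 2 MHz therapy transducer, 15 mm radius, 40 mm focal length
g = focal_gain(0.015, 0.040, 2.0e6)
```

An FEM model captures lens geometry and diffraction details that this single-number estimate cannot; the formula is only a sanity check on the order of magnitude.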

An Enhanced MELP Vocoder in Noise Environments (MELP 보코더의 잡음성능 개선)

  • 전용억;전병민
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.28 no.1C
    • /
    • pp.81-89
    • /
    • 2003
  • To improve noise-suppression performance in tactical communication environments, an enhanced MELP vocoder is proposed, in which an acoustic noise suppressor is integrated into the front end of the MELP algorithm and an FEC code into its channel side. The acoustic noise suppressor is a modified IS-127 EVRC noise suppressor adapted for the MELP vocoder. For FEC, a turbo code consisting of rate-1/3 encoding and BCJR-MAP decoding is utilized. In acoustic noise environments, the lower the SNR, the greater the noise-suppression effect. Moreover, the suggested system suppresses stationary noise more effectively than non-stationary noise, and outperforms the original MELP vocoder by 0.24 in the MOS test. When the interleaver size is one MELP frame, a BER of 10^-6 is achieved at a channel-bit SNR of 4.2 dB. Three decoding iterations offer a near-optimal trade-off between complexity and performance. In the subjective voice-quality test, synthetic quality above MOS 2.5 is achieved at a channel-bit SNR of 2 dB when the interleaver size is one MELP frame and decoding runs for at least three iterations.
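The front-end suppressor here is a modified IS-127 EVRC algorithm; as a stand-in, generic magnitude spectral subtraction illustrates the same front-end idea. This is a simplified sketch, not the EVRC suppressor: the spectral floor and the flat noise-magnitude estimate are assumptions.

```python
import numpy as np

def spectral_subtract(frame, noise_mag, floor=0.05):
    """Generic magnitude spectral subtraction: subtract a noise-magnitude
    estimate per bin, keep a small spectral floor, reuse the noisy phase."""
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec), np.angle(spec)
    clean = np.maximum(mag - noise_mag, floor * mag)
    return np.fft.irfft(clean * np.exp(1j * phase), n=len(frame))

rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 50 * np.arange(256) / 256)   # clean toy "speech"
noisy = tone + 0.1 * rng.standard_normal(256)
noise_mag = np.full(129, 0.1 * np.sqrt(256))           # crude flat noise estimate
denoised = spectral_subtract(noisy, noise_mag)
```

Real suppressors such as EVRC's adapt the noise estimate over time and smooth the gains across frames, which is what makes them usable on non-stationary noise.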

Wideband Speech Reconstruction Using Modular Neural Networks (모듈화한 신경 회로망을 이용한 광대역 음성 복원)

  • Woo Dong Hun;Ko Charm Han;Kang Hyun Min;Jeong Jin Hee;Kim Yoo Shin;Kim Hyung Soon
    • MALSORI
    • /
    • no.48
    • /
    • pp.93-105
    • /
    • 2003
  • Since the telephone channel has band-limited frequency characteristics, speech transmitted over it shows degraded quality. In this paper, we propose a neural-network algorithm to reconstruct wideband speech from its narrowband version. Although a single neural network is a good tool for direct mapping, it is difficult to train on vast and complicated data. To alleviate this problem, we modularize the neural networks based on an appropriate clustering of the acoustic space. We also introduce fuzzy computing to compensate for probable misclassification at the cluster boundaries. In our simulations, the proposed algorithm outperformed both a single neural network and the conventional codebook-mapping method in objective and subjective evaluations.
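The module-plus-fuzzy-gating idea can be sketched as membership-weighted blending of per-cluster mappings: near a cluster boundary, several modules contribute instead of a hard, possibly wrong, single choice. The centroids and the linear "modules" below are toy stand-ins for the trained networks.

```python
import numpy as np

def fuzzy_weights(x, centroids, m=2.0):
    """Fuzzy c-means style membership of x in each cluster (soft module gating)."""
    d = np.linalg.norm(centroids - x, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# hypothetical setup: two clusters, one trained mapping per cluster
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])
modules = [lambda x: 2.0 * x, lambda x: x + 1.0]

def modular_predict(x):
    w = fuzzy_weights(x, centroids)
    outs = np.stack([f(x) for f in modules])
    return (w[:, None] * outs).sum(axis=0)  # membership-weighted blend

y = modular_predict(np.array([0.5, 0.5]))  # dominated by module 0, softly mixed
```

With a hard (argmax) gate, a point just across a boundary would switch modules abruptly; the fuzzy weights make the reconstruction vary smoothly across the acoustic space.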


Investigation of the listening environment for lower grade students in elementary school using subjective tests (주관적 평가법을 이용한 초등학교 저학년 교실의 청취환경 조사)

  • Park, Chan-Jae;Haan, Chan-Hoon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.3
    • /
    • pp.201-212
    • /
    • 2021
  • The present study was conducted as a pilot investigation to suggest standards of acoustic performance for classrooms suitable for listeners whose hearing is still developing, such as children under 9 years of age. Subjective evaluations, comprising a questionnaire and a speech intelligibility test, were conducted with a total of 264 students at two elementary schools in Cheongju to analyze the listening environment in lower-grade classrooms and the students' satisfaction with it. Students responded that the most helpful type of information for understanding class content is the teacher's voice. In addition, they rated the current volume of the teacher's voice as normal and were highly satisfied with its clarity. As for the acoustic performance of the classrooms, the dominant opinions were that noise was normal and reverberation very short, with overall satisfaction with the listening environment. Meanwhile, the speech intelligibility test, using a word list selected for lower-grade elementary students, suggested that for 8-year-olds the longitudinal distance from the sound source is a factor affecting speech recognition.

Speech synthesis using acoustic Doppler signal (초음파 도플러 신호를 이용한 음성 합성)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.35 no.2
    • /
    • pp.134-142
    • /
    • 2016
  • In this paper, a method for synthesizing speech from 40 kHz ultrasonic signals reflected from the articulatory muscles is introduced and its performance is evaluated. When ultrasonic signals are radiated toward the articulating face, Doppler effects caused by movements of the lips, jaw, and chin are observed: the received signals contain frequencies different from the transmitted one. These acoustic Doppler signals (ADS) were used to estimate speech parameters. Prior to synthesis, a quantitative correlation analysis between the ADS and speech signals was carried out for each frequency bin, and the results validated the feasibility of ADS-based speech synthesis. ADS-to-speech transformation was achieved by joint Gaussian-mixture-model-based conversion rules. Experiments with five subjects showed that filter-bank energies and LPC (Linear Predictive Coefficient) cepstrum coefficients are the optimal features for the ADS and speech, respectively. In a subjective evaluation in which speech was synthesized using excitation sources extracted from the original speech, the ADS-to-speech conversion method yielded an average recognition rate of 72.2 %.
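The physics behind the sensing step is the two-way Doppler shift of the reflected carrier, f_d = 2·v·f0/c: articulator motion toward the sensor raises the received frequency. A minimal sketch, with an assumed (invented) articulator velocity:

```python
def doppler_shift(f0_hz, velocity_ms, c=343.0):
    """Frequency shift of a tone reflected from a surface moving at
    velocity_ms toward the transducer; the factor 2 accounts for the
    out-and-back path. c is the speed of sound in air."""
    return 2.0 * velocity_ms * f0_hz / c

# 40 kHz carrier, lips assumed to move at 0.1 m/s toward the sensor
shift = doppler_shift(40_000.0, 0.1)  # a few tens of Hz
```

Shifts of this size sit well inside the audio band around the 40 kHz carrier, which is why demodulated ADS spectra carry usable articulation information per frequency bin.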

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences
    • /
    • v.16 no.1
    • /
    • pp.67-76
    • /
    • 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level at which it closely imitates natural human speech. In particular, TTS models offering varied voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutoring, advertising, and video dubbing. Accordingly, in this paper we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach covers not only English but also Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the English and Korean systems showed a predicted MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves over the baseline models in both naturalness and speaker similarity.
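The NMOS/SMOS figures above are means of listener ratings. A minimal sketch of computing a mean opinion score with a normal-approximation 95 % confidence interval (the ratings below are invented, not the paper's data):

```python
import math

def mos(scores):
    """Mean opinion score plus half-width of a normal-approximation
    95% confidence interval (uses the sample standard deviation)."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    half = 1.96 * math.sqrt(var / n)
    return mean, half

# hypothetical naturalness ratings (1-5 scale) for one system
m, ci = mos([3, 4, 3, 4, 3, 3, 4, 3])
```

Published MOS results are usually reported with such intervals, since with few listeners a difference of a few tenths of a point may not be significant.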

Grading of Architectural Plumbing Noise using Psycho-Acoustic Experiment (청감실험을 이용한 건축설비소음의 등급화)

  • You, Hee-Jong;Jung, Chul-Woon;Kim, Jae-Soo
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2007.11a
    • /
    • pp.488-492
    • /
    • 2007
  • Equipment noise generated in machine rooms penetrates the walls and travels to adjacent rooms as both airborne and structure-borne sound, making it a main factor disturbing peaceful residential environments, and the resulting noise damage is increasing rapidly. Consequently, measures for sound insulation and soundproofing against equipment noise transmitted through walls are urgently required; however, because subjective evaluation of the psychological response to equipment noise has not been practiced, many complaints persist even after countermeasures are taken. From this point of view, this research measured on site the equipment noise that had permeated various wall types, examined its physical characteristics, and, based on these results, conducted psychoacoustic experiments to investigate the interrelation between physical evaluation values and psychological reaction values. The results could serve as useful material when establishing a grading standard for insulation performance against architectural plumbing noise.
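The interrelation step described above is, at its core, a correlation between measured physical values and listener ratings. A sketch with invented paired data (the paper's own metrics and scales are not reproduced here):

```python
import numpy as np

# hypothetical paired data: measured level (dBA) vs. mean annoyance rating (1-7)
level = np.array([35.0, 40.0, 45.0, 50.0, 55.0])
annoyance = np.array([1.8, 2.9, 4.1, 5.2, 6.4])

# Pearson correlation coefficient between the physical and subjective values
r = np.corrcoef(level, annoyance)[0, 1]
```

A strong correlation for a given metric justifies using it as the axis of a grading standard; a weak one signals that the metric misses what listeners actually object to.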


Human Laughter Generation using Hybrid Generative Models

  • Mansouri, Nadia;Lachiri, Zied
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1590-1609
    • /
    • 2021
  • Laughter is one of the most important nonverbal sounds humans generate; it is a means of expressing emotion. The acoustic and contextual features of this specific sound differ from those of speech, and many difficulties arise during its modeling. In this work, we propose an audio laughter generation system based on unsupervised generative models: the autoencoder (AE) and its variants. The procedure combines three main sub-processes: (1) analysis, which extracts the log-magnitude spectrogram from the laughter database; (2) generative-model training; and (3) synthesis, which involves an intermediate mechanism, the vocoder. To improve synthesis quality, we suggest three hybrid models (LSTM-VAE, GRU-VAE, and CNN-VAE) that combine the representation-learning capacity of the variational autoencoder (VAE) with the temporal-modeling ability of long short-term memory (LSTM) RNNs and the CNN's ability to learn invariant features. To assess the proposed audio laughter generation process, an objective evaluation (RMSE) and a perceptual audio-quality test (listening test) were conducted. According to these evaluation metrics, the GRU-VAE outperforms the other VAE models.
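The objective metric named in the abstract, RMSE between reference and generated log-magnitude spectrograms, can be sketched directly; the array shapes and values below are arbitrary placeholders.

```python
import numpy as np

def spectrogram_rmse(ref_logmag, gen_logmag):
    """Root-mean-square error between two log-magnitude spectrograms
    of equal shape (frames x frequency bins)."""
    return float(np.sqrt(np.mean((ref_logmag - gen_logmag) ** 2)))

# toy example: generated spectrogram off by a constant 0.5 everywhere
ref = np.zeros((10, 64))
gen = np.full((10, 64), 0.5)
err = spectrogram_rmse(ref, gen)
```

Lower RMSE means the generated spectrogram tracks the reference more closely, which is how the VAE variants are ranked alongside the listening test.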