Search | Korea Science

A DNN-Based Personalized HRTF Estimation Method for 3D Immersive Audio

Son, Ji Su;Choi, Seung Ho
- International Journal of Internet, Broadcasting and Communication / v.13 no.1 / pp.161-167 / 2021
This paper proposes a new personalized HRTF estimation method that is based on a deep neural network (DNN) model and improves elevation reproduction using a notch filter. A previous study proposed a DNN model that estimates the magnitude of the HRTF from anthropometric measurements [1]. However, because that method uses zero phase instead of estimating the phase, it causes internalization (i.e., inside-the-head localization) of sound when listening to spatial sound. We devise a method to estimate both the magnitude and the phase of the HRTF based on the DNN model. The personalized HRIR is estimated using anthropometric measurements, including detailed data of the head, torso, shoulders, and ears, as inputs to the DNN model. The estimated HRIR is then filtered with an appropriate notch filter to improve elevation reproduction. To evaluate performance, both objective and subjective evaluations are conducted. For the objective evaluation, the root mean square error (RMSE) and the log spectral distance (LSD) between the reference HRTF and the estimated HRTF are measured. For the subjective evaluation, a MUSHRA test and a preference test are conducted. The results indicate that the proposed method gives listeners a more immersive audio experience than the previous methods.
https://doi.org/10.7236/IJIBC.2021.13.1.161
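
The two objective measures named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so the definitions below are assumptions: RMSE is taken over sample or bin values, and LSD is taken as the root mean square of the dB difference between magnitude spectra. The function names and the `eps` guard are hypothetical, not taken from the paper.

```python
import math

def rmse(reference, estimate):
    """Root mean square error between two equal-length sequences."""
    if len(reference) != len(estimate):
        raise ValueError("sequences must have the same length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))

def log_spectral_distance(ref_mag, est_mag, eps=1e-12):
    """LSD in dB between two magnitude spectra (one value per frequency bin).

    Assumed definition: RMS over bins of 20*log10(|H_ref| / |H_est|).
    eps avoids log10(0) and is an illustrative choice.
    """
    if len(ref_mag) != len(est_mag):
        raise ValueError("spectra must have the same number of bins")
    diffs_db = [20.0 * math.log10((r + eps) / (e + eps))
                for r, e in zip(ref_mag, est_mag)]
    return math.sqrt(sum(d * d for d in diffs_db) / len(diffs_db))
```

Under this definition, an estimate whose magnitude is ten times too large in every bin yields an LSD of 20 dB, and identical spectra yield 0 dB.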

Exploration of Optimal Multi-Core Processor Architecture for Physical Modeling of Plucked-String Instruments (현악기의 물리적 모델링을 위한 최적의 멀티코어 프로세서 아키텍처 탐색)

Kang, Myeong-Su;Choi, Ji-Won;Kim, Yong-Min;Kim, Jong-Myon
- The Journal of the Acoustical Society of Korea / v.30 no.5 / pp.281-294 / 2011
Physics-based sound synthesis usually has a high computational cost, which restricts its use in real-time applications. This motivates us to implement the sound synthesis algorithm for plucked-string instruments on multi-core processor architectures and to determine the optimal processing element (PE) configuration for the target instruments. To determine the optimal PE configuration, we use architectural and workload simulations to evaluate the impact of the sample-per-processing-element (SPE) ratio, defined as the amount of sample data directly mapped to each PE, on system performance and on area and energy efficiency. For the acoustic guitar, the highest area and energy efficiencies are achieved at SPE ratios of 5,513 and 2,756, respectively, for the synthesis of musical sounds sampled at 44.1 kHz. For the classical guitar, the maximum area and energy efficiencies are achieved at SPE ratios of 22,050 and 5,513, respectively. In addition, the spectra of the synthetic sounds were very similar to those of the original sounds. Furthermore, we conducted a MUSHRA subjective listening test with ten subjects, nine graduate students and one professor from the University of Ulsan, and the synthetic sounds were rated excellent.
https://doi.org/10.7776/ASK.2011.30.5.281
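
As a rough illustration of the SPE ratio, the sketch below divides a block of samples evenly across PEs. It assumes the block is one second of audio at 44.1 kHz (44,100 samples) and that fractional results are rounded half up; neither assumption is stated in the abstract, and the PE counts used here are hypothetical. Under those assumptions, the reported ratios of 22,050, 5,513, and 2,756 would correspond to 2, 8, and 16 PEs.

```python
def spe_ratio(num_samples, num_pes):
    """Samples mapped to each PE, rounded half up (assumed rounding rule)."""
    if num_pes <= 0:
        raise ValueError("num_pes must be positive")
    # Integer arithmetic equivalent to floor(num_samples / num_pes + 0.5).
    return (2 * num_samples + num_pes) // (2 * num_pes)

SAMPLES_PER_SECOND = 44100  # 44.1 kHz sampling rate from the abstract

# Hypothetical PE counts; the abstract reports only the resulting ratios.
for pes in (2, 8, 16):
    print(pes, spe_ratio(SAMPLES_PER_SECOND, pes))
```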

Low-bitrate Multichannel Audio Coding (저비트율 멀티채널 오디오 부호화)

Jang, Inseon;Seo, Jeongil;Beak, Seungkwon;Kang, Kyeongok
- Journal of Broadcast Engineering / v.10 no.3 / pp.328-338 / 2005
Low-bitrate multichannel audio coding technology is being standardized owing to increasing consumer demand for multichannel audio content. In this paper we propose sound source location cue coding (SSLCC), which compresses multichannel audio heavily enough to suit narrow-bandwidth transmission environments. To improve on the compression capability of conventional binaural cue coding (BCC), SSLCC adopts virtual source location information (VSLI) as a spatial cue parameter, together with a symmetric uniform quantizer and a Huffman coder. The objective and subjective assessment results show that SSLCC provides a lower bitrate and better audio quality than the conventional BCC method.
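
To make the quantization step concrete, here is a minimal sketch of a uniform quantizer that is symmetric about zero, applied to a generic cue value such as an angle. It is not the SSLCC quantizer: the abstract does not give the step size, range, or exact design, so `step`, `max_index`, and the mid-tread layout are illustrative assumptions. The integer indices it produces are the kind of symbols an entropy coder such as a Huffman coder would then encode.

```python
import math

def quantize_symmetric(value, step, max_index):
    """Map a value to a signed integer index with a mid-tread uniform quantizer.

    Positive and negative inputs of equal magnitude get indices of equal
    magnitude, so the quantizer is symmetric about zero. Indices are clipped
    to [-max_index, max_index].
    """
    if step <= 0:
        raise ValueError("step must be positive")
    index = int(math.floor(abs(value) / step + 0.5))
    index = min(index, max_index)
    return index if value >= 0 else -index

def dequantize(index, step):
    """Reconstruct the quantized value from its index."""
    return index * step
```

For example, with a hypothetical 5-degree step, an input of 7.4 maps to index 1 (reconstructed as 5.0) and an input of -7.6 maps to index -2 (reconstructed as -10.0).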
