Search | Korea Science

A Study of Optimum Time-Spread Echo Audio Watermarking via Listening Test (청취실험에 의한 에코확산 오디오 워터마킹방법의 최적화에 관한 검토)

Ko Byeong-Seob
- Proceedings of the Acoustical Society of Korea Conference / autumn / pp.545-546 / 2004
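The time-spread echo audio watermarking method based on subband decomposition splits the host signal into specific frequency bands and, using the MPEG psychoacoustic model, sets the power of the watermark embedded in each band through a parameter-setting function. The robustness and imperceptibility of the method therefore depend on this parameter-setting function. In this study, listening tests were conducted to examine how the parameter-setting function should be chosen to achieve maximum robustness with minimum degradation of sound quality.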

Adaptive Watermarking for MP3 Copyright Protections Using Psychological Acoustics (심리음향 분석을 이용한 MP3 저작권 보안을 위한 적응적 워터마킹)

Lee, Kyeong-Hwan
- The Journal of the Acoustical Society of Korea / v.32 no.1 / pp.64-70 / 2013
This paper proposes a new audio watermarking method for copyright protection of audio content that efficiently protects against MP3 compression attacks. In the widely used spread-spectrum method of Cox, watermarks are inserted repeatedly into the DCT coefficients, from low frequencies to high frequencies. Because inserting into arbitrary coefficients is not effective, we introduce new weight functions, based on psychoacoustics, that keep the loss of the watermarked coefficients small under attack. Across a variety of sound clips, the proposed method gave better overall results than Cox's method, preserving the watermark while reducing distortion of the original sound.
https://doi.org/10.7776/ASK.2013.32.1.064
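
To illustrate the spread-spectrum baseline that the abstract above refers to, here is a minimal Python sketch that embeds a watermark multiplicatively in DCT coefficients, in the style of Cox's method (v' = v(1 + αx)). It is not the paper's implementation: the function names, the `alpha` value, and the `weights` argument are assumptions made for illustration, and the flat default weights are only a placeholder for the psychoacoustic weight functions, which the abstract does not specify.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of floats (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        out.append(s * math.sqrt((1.0 if k == 0 else 2.0) / n))
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            s += scale * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out

def embed(signal, mark, alpha=0.1, weights=None):
    """Scale coefficients 1..len(mark) by (1 + alpha * w_i * x_i); DC is skipped."""
    w = weights if weights is not None else [1.0] * len(mark)  # hypothetical weights
    c = dct(signal)
    for i, x in enumerate(mark):
        c[i + 1] *= 1.0 + alpha * w[i] * x
    return idct(c)

def extract(marked, original, n_mark, alpha=0.1, weights=None):
    """Non-blind extraction: compare marked and original coefficients."""
    w = weights if weights is not None else [1.0] * n_mark
    cm, co = dct(marked), dct(original)
    return [(cm[i + 1] / co[i + 1] - 1.0) / (alpha * w[i]) for i in range(n_mark)]

def similarity(extracted, mark):
    """Cox-style detector statistic: (X* . X) / sqrt(X* . X*)."""
    dot = sum(a * b for a, b in zip(extracted, mark))
    return dot / math.sqrt(sum(a * a for a in extracted))
```

In this sketch, a non-uniform `weights` list would let some coefficients carry more watermark energy than others, which is the role the abstract assigns to its weight functions. Extraction here needs the original signal and assumes the selected original coefficients are non-zero.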

Analysis and Synthesis of Audio Signals using a Sinusoidal Model with Psychoacoustic Criteria (정현파 모델을 이용한 오디오 신호의 심리음향적 분석 및 합성)

남승현;강경옥;홍진우
- The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.77-82 / 1999
The sinusoidal model has been widely used in the analysis and synthesis of speech and audio signals, and it has become one of the efficient candidates for high-quality, low-bit-rate audio coders. One of the crucial steps in analysis and synthesis with a sinusoidal model is the detection of tonal components. This paper proposes an efficient method for the analysis and synthesis of audio signals using a sinusoidal model that applies psychoacoustic criteria such as the masking effect, the masking index, and JNDf (Just Noticeable Difference in Frequency). Simulation results show that the proposed method significantly reduces the number of sinusoids without degrading the quality of the synthesized audio signals.

Selecting Sound-Field Control Factors in the Image Model Method Using Head-Related Transfer Function (머리전달함수를 이용한 영상 음원법에서 음장 제어 요소 결정)

임정빈
- Proceedings of the Acoustical Society of Korea Conference / 1998.06d / pp.56-59 / 1998
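We propose a method for determining the factors needed to control a three-dimensional sound field by applying the Image Model Method (IMM) with the Head-Related Transfer Function (HRTF). The control factors were determined on the basis of the theory of sound energy inside a rectangular enclosure. Each control factor was applied to a three-dimensional sound-field model, and psychoacoustic experiments were carried out with listeners using headphones. The results showed that, in the controlled sound field, out-of-head localization of the sound image, the sense of distance, and the sense of spaciousness were formed as naturally as in a real room.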

The Effect of the Cicadas' Songs on the Psychological Responses in Adolescents (매미과(科) 노랫소리가 청소년의 심리적 반응에 미치는 영향)

Yoon, Ki-Sang;Suh, Sang-Joon;Suh, Jae-Gap
- The Journal of the Acoustical Society of Korea / v.26 no.4 / pp.136-143 / 2007
This experiment was performed to determine the effects of cicadas' songs on the psychological responses of adolescents. As a basic step, an experiment to set up 'Acceptable & Unacceptable' was performed. As a further step, five kinds of frequently heard cicada songs were analyzed, adjectives expressing feelings toward the songs were factor-analyzed, and psychological responses to the auditory sensations were analyzed through regression equations. The results show that the song of Cryptotympana atrata, the song of Meimuna opalifera, and traffic noise disturb meditation to a similar degree, but all are less disturbing than white noise. The adjective experiment was performed because cicadas' songs may affect adolescents as noise. Cicadas' songs can be expressed with three factors: the first is [Annoyance], the second is [Strength], and the third is [Rhythm]. The first factor dominates in the songs of Cryptotympana atrata and Platypleura kaempferi, which generate steady sounds, and the third factor dominates in the songs of Meimuna opalifera, Leptosemia takanonis, and Oncotympana fuscata, which generate fluctuating sounds. Loudness did not affect the third factor, but the emotional values of the first and second factors are linearly proportional to loudness. Analysis of the first factor, which is associated with noise, showed that adolescents' annoyance increases in the order white noise, Platypleura kaempferi, Cryptotympana atrata when the sounds are presented at equal loudness.
https://doi.org/10.7776/ASK.2007.26.4.136

Optimization of Multi-time Scale Loss Function Suitable for DNN-based Audio Coder (심층신경망 기반 오디오 부호화기를 위한 Multi-time Scale 손실함수의 최적화)

Shin, Seung-Min;Byun, Joon;Park, Young-Cheol;Beack, Seung-kwon;Sung, Jong-mo
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1315-1317 / 2022
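Recently, deep neural network (DNN)-based audio coders have been actively studied. DNN-based audio coders are structurally simpler than conventional audio coders, but it is difficult to expect perceptual performance gains without increasing the complexity of the network. To address this problem, methods using psychoacoustic-model-based loss functions, which exploit the characteristics of human hearing, have been introduced. Audio coders that use a psychoacoustic-model-based loss function control quantization noise well, but perceptual improvement is still needed. In this paper, we propose optimizing the window size of the local loss function in the Multi-time Scale loss function for DNN-based audio coders. We vary the window size used to compute the local loss of the Multi-time Scale loss function and thereby determine a window size suitable for audio coding. The network was trained with the optimal Multi-time Scale loss function obtained experimentally, and subjective evaluation confirmed that it gives better speech quality than the existing psychoacoustic-model-based loss function.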

Analysis of Cognitive Psychology Creates in Sound Design Structure (영상음향의 사운드디자인구조가 수용자 감응도 에 미치는 영향(TV광고영상음향을 중심으로))

Yoo, Whoi-Jong;Moon, Nam-Mee
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2008.02a / pp.173-178 / 2008
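This study examined how differences in the sound design structure of audio for video change the responsiveness of the audience. Subjects listened to the audio of a 60-second TV commercial in two versions, a track designed with music only (A) and a track designed with sound effects and music (B), to determine which sound structure has the more significant effect on responsiveness (attention, activation, emotion). The experiment was conducted with five male and five female university students, and measurements were taken in the resting state and with a neurofeedback EEG method using frontal-lobe measurement centered on Fp1 and Fp2. For track A, designed with music only, attention (ATQ), activation (ACQ), and emotion (EQ) all increased but were lower than for track B. For track B, designed with music plus sound effects, attention (ATQ) and activation (ACQ) increased more than for track A, whereas emotion (EQ) was lower than for track A. Consequently, sound in video basically acts on brain changes in attention and activation, while music proved to be the effective element for the emotional aspect. However, although strong sound effects raise attention and concentration more than music does, they also produced a higher stress index (SQ). The sound structure of broadcasts and films is fundamentally designed to work in step with the picture; the significance of this study is that it used neural measurement to examine what psychological effect arises when visual and auditory information are actually combined at the same time.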

Sinusoidal Modeling of Audio Signals Using Perceptually Weighted Matching Pursuit (지각적으로 가중된 매칭 퍼슈잇을 이용한 오디오 신호의 정현파 모델링)

김연지;이인성
- The Journal of the Acoustical Society of Korea / v.22 no.2 / pp.96-103 / 2003
This paper describes a method for sinusoidal modeling of audio signals using perceptually weighted matching pursuit. Matching pursuit iteratively extracts the highest-energy components from the input signal until the residual between the original and the reconstructed signal is zero. In this paper, perceptual matching pursuit, which applies a psychoacoustic model to matching pursuit, iteratively extracts the components with the greatest perceived energy. To evaluate its performance, perceptual matching pursuit is compared with sinusoidal matching pursuit, which does not include perceptual weighting. Simulation results for various audio signals show that perceptual matching pursuit is superior to sinusoidal matching pursuit; in particular, it can synthesize the original signal even when the signal changes rapidly in the time domain.
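
The iterative extraction described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' algorithm: the dictionary of cosine and sine atoms, the function names, the stopping tolerance, and the optional `weights` list are all assumptions. In particular, `weights` is a hypothetical stand-in for a perceptual weighting, which in the paper would come from a psychoacoustic model that the abstract does not detail.

```python
import math

def make_dictionary(n, freqs):
    """Unit-norm cosine and sine atoms of length n at the given bin frequencies."""
    atoms = []
    for k in freqs:
        for fn in (math.cos, math.sin):
            atom = [fn(2 * math.pi * k * t / n) for t in range(n)]
            norm = math.sqrt(sum(a * a for a in atom))
            if norm > 1e-12:  # skip degenerate (all-zero) atoms
                atoms.append([a / norm for a in atom])
    return atoms

def matching_pursuit(signal, atoms, weights=None, max_iter=10, tol=1e-9):
    """Greedy decomposition: repeatedly remove the best-matching atom.

    weights[i] scales the selection score of atom i; all ones gives plain
    matching pursuit (largest-energy atom is chosen first).
    """
    if weights is None:
        weights = [1.0] * len(atoms)
    residual = list(signal)
    picks = []  # (atom index, coefficient) in order of extraction
    for _ in range(max_iter):
        corr = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: weights[i] * abs(corr[i]))
        if abs(corr[best]) < tol:
            break  # nothing significant left to extract
        picks.append((best, corr[best]))
        residual = [r - corr[best] * a for r, a in zip(residual, atoms[best])]
    return picks, residual
```

With all weights equal to one, the loop picks atoms purely by energy. Supplying larger weights for atoms that are more audible would change the extraction order toward perceived energy, which is the idea the abstract attributes to the perceptual variant.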

An Optimization on the Psychoacoustic Model for MPEG-2 AAC Encoder (MPEG-2 AAC Encoder의 심리음향 모델 최적화)

Park, Jong-Tae;Moon, Kyu-Sung;Rhee, Kang-Hyeon
- Journal of the Institute of Electronics Engineers of Korea CI / v.38 no.2 / pp.33-41 / 2001
Compression is currently one of the most important technologies in the multimedia society. Audio files are spreading rapidly across the Internet. Among the formats, the best known is MP3 (MPEG-1 Layer 3), which delivers CD-quality sound at 128 kbps, although its sound quality drops sharply below 64 kbps. MPEG-2 AAC (Advanced Audio Coding) is not compatible with MPEG-1, but it achieves 1.4 times the compression of MP3 and supports up to 7.1 channels and a 96 kHz sampling rate. In this paper, we propose an algorithm that reduces the computational load of AAC encoding and increases processing speed by optimizing the psychoacoustic model, which accounts for an enormous amount of computation in the MPEG-2 AAC encoder. The optimized psychoacoustic model algorithm was implemented in C++. In the experiment, the psychoacoustic model performs a 3048-point FFT (Fast Fourier Transform) at a 44.1 kHz sampling rate to obtain the SMR (Signal to Masking Ratio), and each entropy value is fed to the subband filters to control the encoder block. The proposed psychoacoustic model runs at high speed because the unpredictability value is optimized. In addition, when the unpredictability value is transformed into a tonality index, processing speed increases because the tonality index is optimized in the high-frequency range.

Voice Activity Detection Method Using Psycho-Acoustic Model Based on Speech Energy Maximization in Noisy Environments (잡음 환경에서 심리음향모델 기반 음성 에너지 최대화를 이용한 음성 검출 방법)

Choi, Gab-Keun;Kim, Soon-Hyob
- The Journal of the Acoustical Society of Korea / v.28 no.5 / pp.447-453 / 2009
This paper introduces a method for detecting voice and its exact end point at low SNR by maximizing voice energy. Conventional VAD (Voice Activity Detection) algorithms estimate the noise level, so they tend to detect the end point inaccurately. Moreover, because they use a relatively long analysis range to reflect temporal changes in the noise, their computational load is too high for practical applications. This paper introduces the SEM-VAD (Speech Energy Maximization-Voice Activity Detection) method, which uses psychoacoustic Bark-scale filter banks to maximize voice energy within frames. Stable threshold values are obtained in various noise environments (SNR 15 dB, 10 dB, 5 dB, 0 dB). In a voice detection test in a noisy car environment, the PHR (Pause Hit Rate) was 100% in every noise environment, and the FAR (False Alarm Rate) was 0% at SNR 15 dB and 10 dB, 5.6% at SNR 5 dB, and 9.5% at SNR 0 dB.
https://doi.org/10.7776/ASK.2009.28.5.447

