• Title/Summary/Keyword: perceptual distortion

Interval-based Audio Integrity Authentication Algorithm using Reversible Watermarking (가역 워터마킹을 이용한 구간 단위 오디오 무결성 인증 알고리즘)

  • Yeo, Dong-Gyu;Lee, Hae-Yeoun
    • The KIPS Transactions: Part B / v.19B no.1 / pp.9-18 / 2012
  • Many audio watermarking schemes that have been adapted for content authentication cannot recover the original media after the watermark is removed. Reversible watermarking is therefore an effective way to ensure the integrity of audio data in applications that handle highly confidential audio content. It inserts a watermark into digital media while preserving perceptual transparency, and it allows the original media to be restored from the watermarked version without any loss of quality. This paper presents a new interval-based audio integrity authentication algorithm that can detect malicious tampering. To provide complete reversibility, the algorithm uses differential histogram-based reversible watermarking. To authenticate audio in parts rather than as a whole, it divides the audio into intervals and verifies each interval separately. Experiments on multiple kinds of test data show that the algorithm achieves an authentication rate above 99%, complete reversibility, and higher perceptual quality, while keeping the induced distortion low.
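
The abstract above does not give the details of the embedding, so the following is only a minimal sketch of generic difference-histogram shifting on one interval of integer samples, not the authors' algorithm. The function names `embed` and `extract_and_restore`, the pairing of adjacent samples, and the choice of `peak=0` as the carrier bin are all assumptions made for illustration, and overflow at the limits of the sample range is ignored.

```python
def embed(samples, bits, peak=0):
    """Hide bits in one interval by shifting the pairwise difference histogram.

    Samples are taken in non-overlapping pairs (a, b) and d = b - a.
    Pairs with d == peak carry one bit each; pairs with d > peak are
    shifted by +1 to make room. The first sample of each pair is untouched.
    """
    marked = list(samples)
    payload = list(bits)
    used = 0
    for i in range(0, len(marked) - 1, 2):
        d = marked[i + 1] - marked[i]
        if d > peak:
            d += 1                      # shift to free the bin peak + 1
        elif d == peak and used < len(payload):
            d += payload[used]          # bit 0 stays at peak, bit 1 moves to peak + 1
            used += 1
        marked[i + 1] = marked[i] + d
    if used < len(payload):
        raise ValueError("not enough peak-bin pairs to carry the payload")
    return marked


def extract_and_restore(marked, n_bits, peak=0):
    """Recover the payload and the exact original samples of one interval."""
    restored = list(marked)
    bits = []
    for i in range(0, len(restored) - 1, 2):
        d = restored[i + 1] - restored[i]
        if d == peak or d == peak + 1:
            if len(bits) < n_bits:
                bits.append(d - peak)   # carrier pair: read the bit
            d = peak
        elif d > peak + 1:
            d -= 1                      # undo the shift
        restored[i + 1] = restored[i] + d
    return bits, restored


# One hypothetical interval of integer audio samples and a 3-bit payload.
interval = [10, 10, 12, 15, 7, 7, 3, 2, 5, 5]
payload = [1, 0, 1]
marked = embed(interval, payload)
bits, restored = extract_and_restore(marked, len(payload))
```

In this sketch each sample changes by at most 1, and `restored` equals `interval` exactly. An interval-wise authenticator could embed a digest of each interval in this way and compare it after restoration, but that step is an assumption and is not described in the abstract.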

Novel Robust High Dynamic Range Image Watermarking Algorithm Against Tone Mapping

  • Bai, Yongqiang;Jiang, Gangyi;Jiang, Hao;Yu, Mei;Chen, Fen;Zhu, Zhongjie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.9 / pp.4389-4411 / 2018
  • High dynamic range (HDR) images are becoming pervasive because they capture or render a wider range of luminance, but dedicated HDR displays are hard to popularize because of their high cost and technological problems. HDR images must therefore be adapted to conventional display devices through a tone mapping (TM) operation, which places higher demands on intellectual property protection for HDR images. Because robustness varies by region in the low dynamic range (LDR) watermarked image after TM, unlike in traditional watermarking, this paper defines a concept of watermarking activity and uses it to characterize the essential difference between watermarking LDR and HDR images. A novel robust HDR image watermarking algorithm against TM operations is then proposed. First, the watermark is embedded by modifying the structure information of the HDR image, based on hybrid processing with the redundant discrete wavelet transform and singular value decomposition. In contrast to LDR image watermarking, a high embedding strength causes more obvious distortion in the high-brightness regions of an HDR image than in the low-brightness regions. A low-complexity perceptual brightness mask is therefore designed to further improve imperceptibility. Experimental results show that the proposed algorithm is robust to existing TM operations while accounting for imperceptibility and embedding capacity, and that it is superior to current state-of-the-art HDR image watermarking algorithms.

A study on combination of loss functions for effective mask-based speech enhancement in noisy environments (잡음 환경에 효과적인 마스크 기반 음성 향상을 위한 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.40 no.3 / pp.234-240 / 2021
  • This paper improves mask-based speech enhancement for effective speech recognition in noisy environments. In mask-based speech enhancement, the enhanced spectrum is obtained by multiplying the noisy speech spectrum by a mask. The VoiceFilter (VF) model is used for mask estimation, and the Spectrogram Inpainting (SI) technique is used to remove residual noise from the enhanced spectrum. We propose a combined loss to further improve speech enhancement. To remove the residual noise in the speech effectively, the positive part of the triplet loss is used together with the component loss. For the experiments, the TIMIT database is reconstructed using NOISEX92 noise and background music samples under various signal-to-noise ratio (SNR) conditions. Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI) are used as performance metrics. When the VF model was trained with the mean squared error and the SI model was trained with the combined loss, SDR, PESQ, and STOI improved by 0.5, 0.06, and 0.002, respectively, compared to the system trained only with the mean squared error.
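
The masking step described in the abstract above (enhanced spectrum = noisy spectrum multiplied by a mask) can be sketched as follows. This is only an illustration of the element-wise product. The function name `apply_mask`, the nested-list layout (frames by frequency bins), and the numeric values are assumptions, and the mask here is hand-written rather than estimated by a model such as VoiceFilter.

```python
def apply_mask(noisy_spectrum, mask):
    """Multiply a magnitude spectrogram by a mask, bin by bin.

    Both arguments are lists of frames; each frame is a list of
    per-frequency-bin values. Mask values are expected in [0, 1].
    """
    if len(noisy_spectrum) != len(mask):
        raise ValueError("spectrum and mask must have the same number of frames")
    enhanced = []
    for frame, mask_frame in zip(noisy_spectrum, mask):
        if len(frame) != len(mask_frame):
            raise ValueError("each frame and its mask must have the same number of bins")
        enhanced.append([value * weight for value, weight in zip(frame, mask_frame)])
    return enhanced


# Two frames with three frequency bins each (made-up magnitudes).
noisy = [[4.0, 2.0, 1.0], [3.0, 0.5, 2.0]]
# A mask near 1 keeps a bin, a mask near 0 suppresses it.
mask = [[1.0, 0.5, 0.0], [0.5, 0.0, 1.0]]
enhanced = apply_mask(noisy, mask)
```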

A study on loss combination in time and frequency for effective speech enhancement based on complex-valued spectrum (효과적인 복소 스펙트럼 기반 음성 향상을 위한 시간과 주파수 영역 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.38-44 / 2022
  • Speech enhancement is performed to improve the intelligibility and quality of noise-corrupted speech. This paper compares speech enhancement performance using different loss functions in the time and frequency domains. It proposes a combination of loss functions that exploits the advantages of each domain by considering both the details of the spectrum and the speech waveform. Scale Invariant-Source to Noise Ratio (SI-SNR) is used as the time-domain loss function, and Mean Squared Error (MSE), calculated over the complex-valued spectrum and the magnitude spectrum, is used in the frequency domain. The phase loss is obtained using the sine function. The speech enhancement results are evaluated using Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). To confirm the results, the resulting spectrograms are also compared. Experiments on the TIMIT database show the highest performance when the SI-SNR and magnitude loss functions are combined.
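
As a rough illustration of the two loss terms named in the abstract above, the sketch below computes SI-SNR using its commonly used definition (zero-mean signals, projection of the estimate onto the target) and adds a magnitude-spectrum MSE. The function names, the weight `alpha`, the use of negative SI-SNR as the loss term, and the sample values are assumptions. The abstract does not state how the authors weight or combine the terms.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB for two equal-length waveforms."""
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("waveforms must be non-empty and of equal length")
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project the estimate onto the target to get the scaled target component.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    s_target = [scale * t for t in tgt]
    e_noise = [e - s for e, s in zip(est, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(e * e for e in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def combined_loss(estimate, target, est_magnitude, tgt_magnitude, alpha=1.0):
    """Negative SI-SNR (time domain) plus alpha * magnitude MSE (frequency domain)."""
    mse = sum((a - b) ** 2 for a, b in zip(est_magnitude, tgt_magnitude)) / len(tgt_magnitude)
    return -si_snr(estimate, target) + alpha * mse


# Made-up waveforms: the estimate is the target plus a small error.
target = [0.0, 1.0, 0.0, -1.0]
estimate = [0.1, 0.9, -0.1, -1.1]
score = si_snr(estimate, target)
score_scaled = si_snr([2.0 * x for x in estimate], target)
```

Because of the projection step, rescaling the estimate leaves the score essentially unchanged, so `score` and `score_scaled` agree up to the small `eps` term.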

A study on deep neural speech enhancement in drone noise environment (드론 소음 환경에서 심층 신경망 기반 음성 향상 기법 적용에 관한 연구)

  • Kim, Jimin;Jung, Jaehee;Yeo, Chaneun;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.342-350 / 2022
  • In this paper, actual drone noise samples are collected to build a noise-corrupted speech database for speech processing in disaster environments, and speech enhancement performance is evaluated by applying spectral subtraction and mask-based speech enhancement techniques. To improve the performance of VoiceFilter (VF), an existing deep neural network-based speech enhancement model, we apply the self-attention operation and use the estimated noise information as input to the attention model. Compared to the existing VF model, the experimental results show improvements of 3.77%, 1.66%, and 0.32% in Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), respectively. When the model is trained with a 75% mix of speech data with drone sounds collected from the Internet, the relative performance drops in SDR, PESQ, and STOI are 3.18%, 2.79%, and 0.96%, respectively, compared to using only actual drone noise. This confirms that data similar to real data can be collected and used effectively to train speech enhancement models for environments where real data is difficult to obtain.

Blind Image Quality Assessment on Gaussian Blur Images

  • Wang, Liping;Wang, Chengyou;Zhou, Xiao
    • Journal of Information Processing Systems / v.13 no.3 / pp.448-463 / 2017
  • Multimedia such as audio, images, and video is a ubiquitous and indispensable part of daily life and learning. Objective and subjective quality evaluations play an important role in various multimedia applications. Blind image quality assessment (BIQA) indicates the perceptual quality of a distorted image without considering or using its reference image. Blur is one of the most common image distortions. In this paper, we propose a novel BIQA index for Gaussian blur distortion, based on the fact that images with different degrees of blur change by different amounts when the same blur is applied to them. We describe this difference from three aspects: color, edge, and structure. For color, we adopt the color histogram; for edge, we use the edge intensity map, with a saliency map as the weighting function for consistency with the human visual system (HVS); for structure, we use the structure tensor and the structural similarity (SSIM) index. Extensive experiments on four benchmark databases show that the proposed index is highly consistent with subjective quality assessment.
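
The observation behind the index above (a sharp image changes more than an already blurred one when the same blur is applied) can be demonstrated on a single row of pixels. This is a deliberately simplified sketch and not the proposed index: it uses a box blur instead of a Gaussian, a mean absolute difference instead of the color, edge, and structure features, and the names `box_blur` and `reblur_change` are made up for the example.

```python
def box_blur(row, radius=1):
    """Blur a 1-D row of pixel values with a moving average (window clipped at the edges)."""
    n = len(row)
    blurred = []
    for i in range(n):
        window = row[max(0, i - radius):min(n, i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def reblur_change(row, radius=1):
    """Mean absolute change caused by applying the blur once more."""
    reblurred = box_blur(row, radius)
    return sum(abs(a - b) for a, b in zip(row, reblurred)) / len(row)


# A sharp step edge and a softened copy of the same edge.
sharp = [0.0, 0.0, 0.0, 0.0, 255.0, 255.0, 255.0, 255.0]
soft = box_blur(sharp)

change_sharp = reblur_change(sharp)  # large: the edge is smeared for the first time
change_soft = reblur_change(soft)    # smaller: the edge was already smeared
```

A small re-blur change therefore suggests that the input was already blurred, which is the cue a no-reference blur index can use in place of a reference image.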

Relative localization errors: The effect of reference location on the errors (상대적인 위치지각의 왜곡: 참조자극의 위치가 왜곡에 미치는 영향)

  • Li, Hyung-Chul
    • Korean Journal of Cognitive Science / v.15 no.3 / pp.15-24 / 2004
  • The perceived position of a flashing target object is generally biased in the direction of eye movement when there is no reference around the target. The current study examined the localization accuracy of a flashing target relative to a static reference. The perceived location of the target relative to the reference was distorted, and the pattern of perceptual distortion depended systematically on the position of the reference relative to the target. This result was observed consistently, regardless of the distance between the reference and the target and regardless of the direction of pursuit eye movement. We discuss how these results could be explained by theories previously proposed to explain the localization of objects.

Enhanced Spectral Hole Substitution for Improving Speech Quality in Low Bit-Rate Audio Coding

  • Lee, Chang-Heon;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea / v.29 no.3E / pp.131-139 / 2010
  • This paper proposes a novel spectral hole substitution technique for low bit-rate audio coding. Spectral holes, which frequently occur in relatively weak energy bands because of zero-bit quantization, cause severe quality degradation, especially for harmonic signals such as speech vowels. The enhanced aacPlus (EAAC) audio codec artificially adjusts the minimum signal-to-mask ratio (SMR) to reduce the number of spectral holes, but it still produces noisy sound. The proposed method selectively predicts the spectral shapes of hole bands using either intra-band correlation (nearby harmonically related coefficients) or inter-band correlation (previous frames). For bands with low prediction gain, only the energy term is quantized, and the spectral shapes are replaced by pseudo-random values in the decoding stage. To minimize the perceptual distortion caused by spectral mismatch, the just noticeable level difference (JNLD) criterion and the spectral similarity between the original and predicted shapes are used when quantizing the energy term. Simulation results show that the proposed method, implemented in the EAAC baseline coder, significantly improves speech quality at low bit rates while maintaining equivalent quality for mixed and music content.

Reliability-Based Deblocking Filter for Wyner-Ziv Video Coding

  • Dinh, Khanh Quoc;Shim, Hiuk Jae;Jeon, Byeungwoo
    • IEIE Transactions on Smart Processing and Computing / v.5 no.2 / pp.129-142 / 2016
  • In Wyner-Ziv coding, video signals are reconstructed by correcting side information generated by block-based motion estimation/compensation at the decoder. The correction is not always accurate, either because of the limited number of parity bits and the early stopping of low-density parity check accumulate (LDPCA) decoding in distributed video coding, or because of the limited number of measurements in distributed compressive video sensing. The blocking artifacts caused by block-based processing are usually conspicuous in smooth areas and degrade the perceptual quality of the reconstructed video. Conventional deblocking filters try to remove the artifacts by treating both sides of a block boundary equally; however, the coding errors generated by block-based processing are not necessarily the same on both sides of the boundary. This paper exploits that block-wise difference to improve deblocking for Wyner-Ziv frameworks by designing a filter whose deblocking strength can differ from block to block, depending on the reliability of the reconstructed pixels. Test results show that the proposed filter not only improves subjective quality by considerably reducing coding artifacts, but also improves rate-distortion performance.

PESQ-Based Selection of Efficient Partial Encryption Set for Compressed Speech

  • Yang, Hae-Yong;Lee, Kyung-Hoon;Lee, Sang-Han;Ko, Sung-Jea
    • ETRI Journal / v.31 no.4 / pp.408-418 / 2009
  • Adopting an encryption function in a voice over Wi-Fi service incurs problems such as additional power consumption and degraded communication quality. To overcome these problems, a partial encryption (PE) algorithm for compressed speech was recently introduced. From a security point of view, however, the partial encryption sets (PESs) of the conventional PE algorithm still leave much room for improvement. This paper proposes a new selection method for finding a smaller PES while maintaining the security level of the encrypted speech. The proposed PES selection method employs the perceptual evaluation of speech quality (PESQ) algorithm to measure the distortion of speech objectively. The method is applied to the ITU-T G.729 speech codec, and its content protection capability is verified by a range of tests and a reconstruction attack. The experimental results show that encrypting only 20% of the compressed bitstream is sufficient to hide the entire content of the speech effectively.