• Title/Summary/Keyword: audio signal processing

An Optimization on the Psychoacoustic Model for MPEG-2 AAC Encoder (MPEG-2 AAC Encoder의 심리음향 모델 최적화)

  • Park, Jong-Tae;Moon, Kyu-Sung;Rhee, Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.38 no.2 / pp.33-41 / 2001
  • Compression is currently one of the most important technologies in the multimedia society. Audio files are spreading rapidly across the Internet. Among the formats, the best known is MP3 (MPEG-1 Layer 3), which can deliver CD-quality sound at 128 kbps, but its sound quality drops sharply below 64 kbps. MPEG-2 AAC (Advanced Audio Coding) is not compatible with MPEG-1, but its compression is 1.4 times higher than that of MP3, and it supports up to 7.1 channels and a 96 kHz sampling rate. In this paper, we propose an algorithm that reduces the computational load of AAC encoding and increases processing speed by optimizing the psychoacoustic model, which accounts for an enormous amount of computation in the MPEG-2 AAC encoder. The optimized psychoacoustic model algorithm was implemented in C++. In the experiment, the psychoacoustic model carries out a 3048-point FFT (Fast Fourier Transform) at a 44.1 kHz sampling rate to obtain the SMR (Signal-to-Masking Ratio), and each entropy value is fed to the subband filters to control the encoder block. The proposed psychoacoustic model runs at high speed because the unpredictability value is optimized. In addition, when the unpredictability value is transformed into a tonality index, processing speed increases further because the tonality index is optimized in the high-frequency range.
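
The abstract does not give the optimized formulas. As a rough illustration of the step it refers to, the sketch below uses the form commonly quoted for the MPEG psychoacoustic model 2: each spectral line is predicted from its two previous frames, the prediction error becomes an unpredictability value, and that value is mapped to a tonality index clamped to [0, 1]. The function names are hypothetical, the constants -0.299 and -0.43 should be checked against the standard, and the per-partition energy weighting and spreading are omitted.

```python
import math


def unpredictability(r, phi, r1, phi1, r2, phi2):
    """Unpredictability of one spectral line.

    r, phi   : magnitude and phase in the current frame
    r1, phi1 : magnitude and phase one frame back
    r2, phi2 : magnitude and phase two frames back
    """
    # Linear extrapolation of magnitude and phase from the two previous frames.
    r_pred = 2.0 * r1 - r2
    phi_pred = 2.0 * phi1 - phi2
    # Euclidean distance between the actual and the predicted complex value.
    dist = math.hypot(
        r * math.cos(phi) - r_pred * math.cos(phi_pred),
        r * math.sin(phi) - r_pred * math.sin(phi_pred),
    )
    denom = r + abs(r_pred)
    return dist / denom if denom > 0.0 else 0.0


def tonality_index(c):
    """Map an unpredictability value c to a tonality index in [0, 1]."""
    if c <= 0.0:
        return 1.0  # perfectly predictable, treated as fully tonal
    tb = -0.299 - 0.43 * math.log(c)
    return min(1.0, max(0.0, tb))
```

A steady sinusoid (constant magnitude, linearly advancing phase) yields an unpredictability near 0 and a tonality index of 1, while a line whose phase jumps unpredictably yields a value near 1 and a tonality index of 0.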

Classification of General Sound with Non-negativity Constraints (비음수 제약을 통한 일반 소리 분류)

  • 조용춘;최승진;방승양
    • Journal of KIISE:Software and Applications / v.31 no.10 / pp.1412-1417 / 2004
  • Sparse coding or independent component analysis (ICA), which is a holistic representation, has been successfully applied to elucidate early auditory processing and to the task of sound classification. In contrast, parts-based representation is an alternative way of understanding object recognition in the brain. In this paper, we employ non-negative matrix factorization (NMF), which learns a parts-based representation, for the task of sound classification. We explain methods of extracting features from spectro-temporal sounds using NMF in the absence or presence of noise. Experimental results show that NMF-based features improve sound classification performance over ICA-based features.
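
The abstract does not say which NMF algorithm was used. As one common choice, the sketch below applies the multiplicative update rules for the squared-error objective to a small non-negative matrix V (for example, a magnitude spectrogram), factoring it as V ≈ WH with non-negative W (basis spectra) and H (activations). All names, the initialization, and the iteration count are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start keeps every later value non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what gives the parts-based (additive) character the abstract contrasts with ICA.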

Implementation of Low Complexity FFT, ADC and DAC Blocks of an OFDM Transmitter Receiver Using Verilog

  • Joshi, Alok;Gupta, Dewansh Aditya;Jaipuriyar, Pravriti
    • Journal of Information Processing Systems / v.15 no.3 / pp.670-681 / 2019
  • Orthogonal frequency division multiplexing (OFDM) encodes data on multiple carriers instead of the traditional single carrier. This method improves spectral efficiency (optimum use of bandwidth) and lessens the effects of fading and intersymbol interference (ISI). In 1995, digital audio broadcasting (DAB) became the first standard to adopt OFDM. Later, in 1997, it was adopted for digital video broadcasting (DVB), and it has since been adopted for the WiMAX and LTE standards. In this project, a Verilog design is employed to implement an OFDM transmitter (DAC block) and receiver (FFT and ADC blocks). Generally, OFDM uses the FFT and IFFT for modulation and demodulation. In this paper, a 16-point decimation-in-frequency (DIF) FFT with the radix-2 algorithm and the direct summation method are analyzed. The ADC and DAC, which OFDM uses to convert the signal from analog to digital and vice versa, are also analyzed. All the designs are simulated in Verilog on the ModelSim simulator. The output of the FFT block after Verilog simulation has also been verified with MATLAB.
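
To make the two methods named in the abstract concrete, here is a minimal software sketch (not the paper's Verilog design) of a radix-2 decimation-in-frequency FFT next to the direct-summation DFT. The function names are hypothetical, and the recursive formulation is chosen for readability rather than to mirror a hardware butterfly network.

```python
import cmath


def dft_direct(x):
    """Direct summation: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/N), O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT.

    len(x) must be a power of two (16 in the paper's case).
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly stage: sums feed the even-indexed outputs, twiddled
    # differences feed the odd-indexed outputs.
    top = [x[i] + x[i + half] for i in range(half)]
    bottom = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
              for i in range(half)]
    out = [0j] * n
    out[0::2] = fft_dif(top)     # X[0], X[2], X[4], ...
    out[1::2] = fft_dif(bottom)  # X[1], X[3], X[5], ...
    return out
```

For N = 16 the direct method needs 256 complex multiplications, while the radix-2 structure has 4 stages of 8 butterflies each, which is the kind of saving a low-complexity hardware design relies on.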

Infant cry recognition using a deep transfer learning method (딥 트랜스퍼 러닝 기반의 아기 울음소리 식별)

  • Bo, Zhao;Lee, Jonguk;Atif, Othmane;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.971-974 / 2020
  • Infants express their physical and emotional needs to the outside world mainly through crying. However, most parents find it challenging to understand the reason behind their babies' cries. Failure to correctly understand the cause of a baby's cry and take appropriate action can affect the cognitive and motor development of newborns, who are undergoing rapid brain development. In this paper, we propose an infant cry recognition system based on deep transfer learning to help parents identify crying babies' needs the same way a specialist would. The proposed system transforms the waveform of the cry signal into a log-mel spectrogram, then uses the VGGish model pre-trained on AudioSet to extract a 128-dimensional feature vector from the spectrogram. Finally, a softmax function is used to classify the extracted feature vector and recognize the corresponding type of cry. The experimental results show that our method achieves good performance, exceeding 0.96 in precision, recall, and F1-score.
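
The final stage described above can be sketched as a linear layer followed by softmax. This is only an illustration: the weights, biases, and class count are hypothetical, and the toy usage uses 3-dimensional features in place of the 128-dimensional VGGish embedding.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weights, biases):
    """Return (predicted class index, class probabilities).

    weights holds one row per class; each row has len(feature) entries.
    """
    logits = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs
```

In a transfer-learning setup of this kind, only the small classification layer needs training on cry data, while the pre-trained feature extractor is reused.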

Real data-based active sonar signal synthesis method (실데이터 기반 능동 소나 신호 합성 방법론)

  • Yunsu Kim;Juho Kim;Jongwon Seok;Jungpyo Hong
    • The Journal of the Acoustical Society of Korea / v.43 no.1 / pp.9-18 / 2024
  • Active sonar systems are growing in importance because underwater targets are becoming quieter and ambient noise is rising with maritime traffic. However, the low signal-to-noise ratio of the echo signal, caused by multipath propagation, clutter, ambient noise, and reverberation, makes it difficult to identify underwater targets using active sonar. Attempts have been made to apply data-driven methods such as machine learning or deep learning to improve underwater target recognition, but the nature of sonar datasets makes it difficult to collect enough training data. Methods based on mathematical modeling have mainly been used to compensate for insufficient active sonar data, but they are limited in how accurately they can simulate complex underwater phenomena. Therefore, in this paper, we propose a sonar signal synthesis method based on a deep neural network. To apply the neural network model to sonar signal synthesis, the proposed method adapts the attention-based encoder and decoder, the main modules of the Tacotron model widely used in speech synthesis, to sonar signals. Training the proposed model on a dataset collected by placing a simulated target in an actual marine environment makes it possible to synthesize signals that more closely resemble the actual signal. To verify the performance of the proposed method, a Perceptual Evaluation of Audio Quality test was conducted, and the score difference from the actual signal was within -2.3 in a total of four different environments. These results show that the active sonar signal generated by the proposed method approximates the actual signal.

A Scheduler for Multimedia Data and Evaluation Method (멀티미디어 데이터를 위한 스케쥴러 및 평가법 설계)

  • 유명련;김현철
    • Journal of the Institute of Convergence Signal Processing / v.3 no.2 / pp.1-7 / 2002
  • Since multimedia data such as video and audio must be displayed within a certain time constraint, their computation and manipulation must be handled under limited conditions. Traditional real-time scheduling algorithms are not directly applicable, because they are not suited to multimedia scheduling applications that support many clients at the same time. The Rate Regulating Proportional Share Scheduling Algorithm is a scheduling algorithm that considers the time constraints of multimedia data. It uses a rate regulator that prevents a task from receiving more resources than its share in a given period. However, this algorithm loses fairness and does not degrade gracefully under overload. This paper proposes a modified algorithm, the Modified Proportional Share Scheduling Algorithm, which considers characteristics of multimedia data such as continuity and time dependency. The proposed algorithm shows graceful degradation of performance under overload and reduces the number of context switches. Furthermore, a new evaluation method is proposed that can evaluate the flexibility of a scheduling algorithm.
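
The abstract gives no pseudocode for either algorithm. Purely to illustrate the general idea of proportional sharing with a per-period rate cap, the sketch below hands out time quanta in proportion to integer task weights and refuses to run a task once it has used its share for the period. The function name, the tie-breaking rule, and the budget formula are all assumptions, not the paper's method.

```python
def schedule_period(weights, quanta):
    """Allocate `quanta` time slots among tasks in proportion to `weights`.

    weights : dict mapping task name -> positive integer weight
    Returns (timeline, used): the task run in each slot (None = idle)
    and the number of slots each task received.
    """
    total = sum(weights.values())
    # Rate cap: a task may not exceed its proportional share in this period.
    budget = {task: quanta * w // total for task, w in weights.items()}
    used = {task: 0 for task in weights}
    timeline = []
    for _ in range(quanta):
        eligible = [t for t in weights if used[t] < budget[t]]
        if not eligible:
            timeline.append(None)  # every task has hit its cap
            continue
        # Run the task that is furthest behind relative to its weight;
        # ties are broken by task name to keep the result deterministic.
        nxt = min(eligible, key=lambda t: (used[t] / weights[t], t))
        used[nxt] += 1
        timeline.append(nxt)
    return timeline, used
```

With weights of 3 for a video task and 1 for an audio task over 8 slots, the video task receives 6 slots and the audio task 2, interleaved rather than run back to back.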

Design of a 96-dB SNR and Low-Pass Digital Oversampling Noise-Shaping Coder for Low Supply Voltage (저 전압용 96-dB 신호대잡음비를 갖는 저역통과 디지털 과표본화 잡음변형기의 설계)

  • 김대정;손영철
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.5 / pp.91-97 / 2004
  • A digital oversampling noise-shaping coder that achieves the processing accuracy required for the audio signal bandwidth is designed. To implement an optimized noise-shaping coder in the form of IP (intellectual property), circuit design techniques that optimize the multiplication and ROM architectures are proposed, with emphasis on low-voltage operation below 2.0 V and on minimizing hardware resources. In the design and verification methodology, the overall architecture and the internal bit widths were determined through behavioral simulations, and the overall performance, including timing margin, was estimated through transistor-level simulations. Furthermore, test results from a chip implemented in a standard 0.35-$\mu$m CMOS process confirmed the validity of the proposed circuits and the design methodology.
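
The abstract does not state the coder's order or structure. For orientation only, here is a minimal first-order error-feedback noise shaper with a 1-bit output, which is the simplest member of this family; the function name and the 1-bit quantizer are assumptions. The output satisfies y[n] = x[n] + e[n] - e[n-1], so the quantization error e is high-pass filtered and pushed away from the audio band.

```python
def noise_shape(samples):
    """First-order error-feedback coder.

    samples : input values in [-1.0, 1.0] at the oversampled rate
    Returns a list of 1-bit outputs encoded as +1.0 / -1.0.
    """
    out = []
    err = 0.0  # quantization error of the previous sample
    for x in samples:
        v = x - err                  # feed back the previous error
        y = 1.0 if v >= 0.0 else -1.0  # 1-bit quantizer
        err = y - v                  # error made on this sample
        out.append(y)
    return out
```

For a constant input of 0.25, the average of the 1-bit output converges to 0.25, since the accumulated error stays bounded while the number of samples grows.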

The Vocabulary Recognition Optimize using Acoustic and Lexical Search (음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.4 / pp.496-503 / 2010
  • Speech recognition systems have been developed as standalone systems; when used on a mobile terminal, they show a low recognition rate because of limited memory size and audio compression. This study proposes a system that improves vocabulary recognition performance by separating acoustic search from lexical search. Acoustic search is carried out on the mobile terminal, and lexical search is carried out on the server. Feature vectors are extracted from the speech signal and phonemes are recognized using a GMM; the recognized phoneme list is transmitted to the server, which performs lexical search using the Lexical Tree Search algorithm. The system achieved a vocabulary-dependent recognition rate of 98.01%, a vocabulary-independent recognition rate of 97.71%, and a recognition speed of 1.58 seconds.

Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • Proceedings of the IEEK Conference / summer / pp.92-96 / 2004
  • Research on emotions is currently of great interest in speech processing as well as in the human-machine interaction domain. In recent years, more and more research on emotion synthesis and emotion recognition has been developed for different purposes. Each approach uses its own methods and its own parameters measured on the speech signal. In this paper, we propose using a short-time parameter, MFCC (Mel-Frequency Cepstrum Coefficients), and a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero crossing, phonetic rate, LPC, ...) and their derivatives are also tested and combined with the MFCC coefficients in order to find the best combination. Other models, GMM and HMM (discrete and continuous Hidden Markov Models), are studied as well, in the hope that continuous distributions and the temporal behaviour of this feature set will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds 88% using only MFCC coefficients with the VQ model. This simple but efficient approach gives a result much better than that obtained on the same database in human evaluation, in which listeners judged each sentence without being allowed to replay it or compare sentences [8], and the result compares favorably with other approaches.
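
A VQ classifier of the kind the abstract describes is usually built from one codebook per class, with an utterance assigned to the class whose codebook quantizes its frames with the least average distortion. The sketch below shows only that decision rule, under assumptions: the codebooks are taken as already trained, squared Euclidean distance is used, and the names and toy 2-dimensional vectors stand in for real MFCC frames.

```python
def distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    total = 0.0
    for frame in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(frame, code))
                     for code in codebook)
    return total / len(frames)


def classify_emotion(frames, codebooks):
    """Return the label whose codebook gives the lowest average distortion.

    codebooks : dict mapping emotion label -> list of codeword vectors
    """
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))
```

Because each speaker gets their own codebooks in a speaker-dependent setup, the rule only has to separate that speaker's emotions, not emotions across speakers.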

Remotely controlled Interactive Magnetic Resonance Imaging in Network Environment (Network을 이용한 원격 핵자기 공명 영상)

  • Park, J.I.;Kim, C.Y.;Park, D.J.;Ryu, W.S.;Ahn, C.B.
    • Proceedings of the KIEE Conference / 1996.07b / pp.1383-1385 / 1996
  • A network-based interactive magnetic resonance imaging (MRI) system has been developed using the World Wide Web. For this purpose, an HTTP server was developed on the host computer of the MRI system. Video and audio conferencing capabilities are included for monitoring experiments. Using the developed system, MRI has been successfully carried out from the Signal Processing Lab at Kwangwoon University with the remote MRI system located at the Medical Image Research Center at KAIST in Daejon.
