Search | Korea Science

On-Line Audio Genre Classification using Spectrogram and Deep Neural Network (스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류)

Yun, Ho-Won;Shin, Seong-Hyeon;Jang, Woo-Jin;Park, Hochong
- Journal of Broadcast Engineering / v.21 no.6 / pp.977-985 / 2016
In this paper, we propose a new method for on-line audio genre classification using a spectrogram and a deep neural network. For on-line processing, the proposed method takes a 1-second segment of the audio signal as input and classifies it into one of three genres: speech, music, or effect. To keep the processing general, it uses the spectrogram as the feature vector instead of MFCC, which has been widely used for audio analysis. We measure genre classification performance on real TV audio signals and confirm that the proposed method outperforms the conventional method for all genres. In particular, it reduces the rate of misclassification between music and effect, which occurs often with the conventional method.
https://doi.org/10.5909/JBE.2016.21.6.977
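
The abstract above feeds a spectrogram of a 1-second segment to the classifier. A minimal sketch of that feature extraction step might look as follows; the sample rate, frame length, hop size, Hann window, and log scaling are all assumptions made for illustration and are not taken from the paper, and the neural network itself is omitted.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one row per frame, each holding log-magnitudes for bins 0..frame_len//2."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            # Naive DFT for clarity; a real system would use an FFT.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            row.append(math.log(abs(acc) + 1e-10))
        frames.append(row)
    return frames

SAMPLE_RATE = 1000  # Hz; a toy value assumed so the example runs quickly
# One second of a 125 Hz tone stands in for the 1-second input segment.
one_second = [math.sin(2 * math.pi * 125 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
spec = spectrogram(one_second)
```

Each row of `spec` is one time frame, so the whole list is the two-dimensional time-frequency feature that a classifier would receive in place of MFCCs.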

An Efficient Selective Method for Audio Watermarking Against De-synchronization Attacks

Mushgil, Baydaa Mohammad;Adnan, Wan Azizun Wan;Al-hadad, Syed Abdul-Rahman;Ahmad, Sharifah Mumtazah Syed
- Journal of Electrical Engineering and Technology / v.13 no.1 / pp.476-484 / 2018
High-capacity audio watermarking algorithms face a major challenge in remaining robust against attacks, especially de-synchronization attacks. In this paper, a robust, high-capacity algorithm is proposed that uses segment selection, the Stationary Wavelet Transform (SWT), and Quantization Index Modulation (QIM), along with a new synchronization mechanism. The proposed algorithm provides an improved trade-off between robustness, imperceptibility, and capacity. It improves on the reliability of available watermarking methods and shows high robustness to signal processing (manipulation) attacks, especially de-synchronization attacks such as cropping, jittering, and zero insertion. For imperceptibility evaluation, signal-to-noise ratio values above 22 dB have been achieved. A subjective test with volunteer listeners also shows that the proposed method has high imperceptibility, with a Subjective Difference Grade (SDG) of 4.76. Meanwhile, a high capacity of up to 176.4 bps is also achieved.
https://doi.org/10.5370/JEET.2018.13.1.476
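
Quantization Index Modulation, one of the building blocks named in the abstract above, can be illustrated with a short sketch. This is the textbook scalar form of QIM applied to plain numbers; the step size `delta`, the sample coefficients, and the noise level are assumed values, and the paper's segment selection, SWT, and synchronization mechanism are not modeled here.

```python
def qim_embed(value, bit, delta=0.5):
    """Quantize `value` onto the lattice for `bit`: multiples of delta, shifted by delta/2 for bit 1."""
    offset = bit * delta / 2.0
    return delta * round((value - offset) / delta) + offset

def qim_extract(value, delta=0.5):
    """Recover the bit by finding which of the two lattices lies closer to `value`."""
    dist0 = abs(value - qim_embed(value, 0, delta))
    dist1 = abs(value - qim_embed(value, 1, delta))
    return 0 if dist0 <= dist1 else 1

# Hypothetical host coefficients and watermark bits.
coefficients = [0.12, -0.8, 1.37, 2.05]
bits = [1, 0, 1, 1]
marked = [qim_embed(c, b) for c, b in zip(coefficients, bits)]

# A perturbation smaller than delta/4 should not flip any bit.
attacked = [m + 0.1 for m in marked]
recovered = [qim_extract(a) for a in attacked]
```

A larger `delta` tolerates stronger distortion but changes the host signal more, which is one face of the robustness versus imperceptibility trade-off the abstract mentions.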

An Accidental Position Detection Algorithm for High-Pressure Equipment using Microphone Array (Microphone Array를 이용한 고압설비의 고장위치인식 알고리즘)

Kim, Deuk-Kwon;Han, Sun-Sin;Ha, Hyun-Uk;Lee, Jang-Myung
- The Transactions of The Korean Institute of Electrical Engineers / v.57 no.12 / pp.2300-2307 / 2008
This study uses a microphone array to receive noise in a fixed audio frequency range, the kind of noise (like grease sizzling in a pan) that occurs on a power supply line because of faulty partial discharge (arcing). After a series of signal processing steps that remove unwanted noise, it measures the distance and direction to the source of the partial-discharge noise and displays the result in both analog and digital form for monitoring. It then determines the state for each signal magnitude and judges the distance and direction of the faulty part. When the sound from a faulty insulator is received at each microphone, the signal passes through an amplifier circuit and a 6th-order low-pass filter so that only the frequency range up to 20 kHz remains. The signal is then fed as a digital value into a digital signal processor (TMS320F2812) through 16-bit A/D conversion. The distance, direction, and coordinates of the faulty insulator can then be detected by implementing a correlation method that detects the arrival time difference between microphones and an algorithm that detects the maximum time difference.
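
The correlation method mentioned above, which finds the arrival time difference between two microphones, can be sketched as a brute-force cross-correlation search. The sample rate, the speed of sound, the test signal, and the function name `estimate_delay` are assumptions for illustration; the paper's actual implementation on the TMS320F2812 is not described in the abstract.

```python
import random

def estimate_delay(ref, other, max_lag):
    """Return the lag in samples by which `other` trails `ref`, chosen to maximize cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(other):
                score += x * other[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

SAMPLE_RATE = 48_000     # Hz, assumed
SPEED_OF_SOUND = 343.0   # m/s in air at room temperature, assumed

# Synthetic broadband burst at microphone 1; microphone 2 hears it 5 samples later.
rng = random.Random(0)
mic1 = [rng.gauss(0.0, 1.0) for _ in range(200)]
mic2 = [0.0] * 5 + mic1[:-5]

lag = estimate_delay(mic1, mic2, max_lag=20)
path_difference_m = lag / SAMPLE_RATE * SPEED_OF_SOUND
```

With several microphone pairs, the resulting path differences constrain the direction and position of the source, which is the role the abstract assigns to the arrival time difference.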

Ultra-low-power DSP for Audio Signal Processing (오디오 신호 처리를 위한 초저전력 DSP 프로세서)

Kwon, Kiseok;Ahn, Minwook;Jo, Seokhwan;Lee, Yeonbok;Lee, Seungwon;Park, Young-Hwan;Kim, Sukjin;Kim, Do-Hyung;Kim, Jaehyun
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2014.06a / pp.157-159 / 2014
In this paper, we introduce SlimSRP, an ultra-low-power digital signal processor (DSP) solution for mobile audio and voice applications. Until now, application processors (APs) have handled all tasks in mobile devices. However, they suffer from short battery life when dealing with complex usage scenarios, such as an always-on voice trigger with continuous audio playback. Based on extensive analysis of audio and voice application characteristics, SlimSRP is designed to relieve the performance and power burden on APs. It employs a three-issue VLIW architecture, and its major low-power and high-performance techniques include: (1) an optimized register-file architecture suited to constant generation, (2) a powerful instruction set that reduces the number of register-file accesses, and (3) a unique instruction compression scheme that reduces memory size and cache misses. An implementation of SlimSRP runs at up to 200 MHz, and the logic occupies 95K NAND2 gates in the Samsung 28LPP process. The experimental results demonstrate that an MP3 decoder application with a 128 kbps, 44.1 kHz input can run at 5.1 MHz and that the logic consumes only 22 µW/MHz.
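
The figures quoted above can be combined into a rough logic power estimate for the MP3 decoding case. This is a back-of-the-envelope sketch that assumes power scales linearly with clock frequency, as the µW/MHz unit suggests; the abstract itself does not state a total power figure.

```python
# Figures quoted in the abstract.
MP3_CLOCK_MHZ = 5.1          # clock needed for 128 kbps, 44.1 kHz MP3 decoding
LOGIC_POWER_UW_PER_MHZ = 22  # logic power per MHz
MAX_CLOCK_MHZ = 200          # maximum clock of the implementation

# Assumption: logic power is proportional to clock frequency.
mp3_logic_power_uw = MP3_CLOCK_MHZ * LOGIC_POWER_UW_PER_MHZ

# Ratio of the maximum clock to the clock that MP3 decoding needs.
clock_headroom = MAX_CLOCK_MHZ / MP3_CLOCK_MHZ

print(f"Estimated logic power for MP3 decoding: {mp3_logic_power_uw:.1f} uW")
print(f"Clock headroom over MP3 decoding: {clock_headroom:.1f}x")
```

Under that assumption the logic would draw roughly 112 µW while decoding, and this estimate covers logic only, not memory or I/O.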

Improving Fidelity of Synthesized Voices Generated by Using GANs (GAN으로 합성한 음성의 충실도 향상)

Back, Moon-Ki;Yoon, Seung-Won;Lee, Sang-Baek;Lee, Kyu-Chul
- KIPS Transactions on Software and Data Engineering / v.10 no.1 / pp.9-18 / 2021
Although Generative Adversarial Networks (GANs) have gained great popularity in computer vision and related fields, generating audio signals independently with GANs has yet to be demonstrated. Unlike images, an audio signal is a sequence of discrete samples, so it is not easy to learn such signals with CNN architectures, which are widely used in image generation tasks. To overcome this difficulty, GAN researchers proposed applying time-frequency representations of audio to existing image-generating GANs. Following this strategy, we propose an improved method for increasing the fidelity of audio signals synthesized by GANs. Our method is demonstrated on a public speech dataset and evaluated by Fréchet Inception Distance (FID). With our method the FID was 10.504, compared with 11.973 for the existing state-of-the-art method (a lower FID indicates better fidelity).
https://doi.org/10.3745/KTSDE.2021.10.1.9
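
FID, the metric used above, is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated samples. The sketch below is a simplified variant that assumes diagonal covariances so it can run on the standard library alone; a real FID uses full covariance matrices of Inception-network features, and the feature values here are made up.

```python
import math
from statistics import fmean, pvariance

def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two feature sets, each modeled as a diagonal-covariance Gaussian."""
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [row[d] for row in feats_a]
        col_b = [row[d] for row in feats_b]
        mean_gap = fmean(col_a) - fmean(col_b)
        std_gap = math.sqrt(pvariance(col_a)) - math.sqrt(pvariance(col_b))
        # Per dimension: squared mean difference plus squared std difference.
        total += mean_gap ** 2 + std_gap ** 2
    return total

# Hypothetical 2-D feature vectors for "real" and "generated" samples.
real_feats = [[0.0, 0.0], [2.0, 2.0]]
fake_feats = [[1.0, 1.0], [3.0, 3.0]]
distance = frechet_distance_diag(real_feats, fake_feats)
```

The distance is zero only when both means and variances match, which is why a lower value is read as generated audio being statistically closer to real audio.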

A Design of Multi format HD LCD Monitor for Broadcasting (방송용 Multi format HD LCD Monitor의 설계)

Han, Sung-Il;Jun, Eung-Sup;Noh, Hyung-Il
- Journal of the Korea Society of Computer and Information / v.15 no.3 / pp.37-43 / 2010
This study proposes a multi-format 8.4" LCD monitor for broadcasting, based on a simple circuit design and efficient processing techniques. The key topics of the study are a size-reduction technique, mixed-type video signal processing, a technique for merging multi-format HD signals onto one port, separation of analog and digital signals, and embedded audio signal processing. The proposed multi-format 8.4" LCD monitor targets the broadcasting field and is expected to be efficient in that field.
https://doi.org/10.9708/jksci.2010.15.3.037

Implementation of the automatic switching device for the voice communications between heterogeneous devices (이종 기기 간 음성통신을 위한 자동전환장치의 구현)

Lew, Chang-Guk;Lee, Bae-Ho
- The Journal of the Korea institute of electronic communication sciences / v.10 no.12 / pp.1321-1328 / 2015
A radio uses half-duplex voice communication with PTT (Push To Talk) and occupies a single line during transmission. For voice communication between heterogeneous devices, such as a telephone and a UHF or VHF radio, a device that automatically switches between the two devices is required as an interface. The performance of the voice switching apparatus, which detects the voice to be transmitted from an input signal, therefore has a significant influence on the loss of the transmitted audio signal. The conventional method sets a level simply on the amplitude of the input signal, in other words its energy level, and so has trouble responding to noise. In this paper, we use audio signal processing techniques to discriminate the voice within the input signal and implement a device for automatic voice transmission between heterogeneous devices. The proposal improved the performance of the automatic voice switching device and enabled lossless transmission of voice between heterogeneous devices.
https://doi.org/10.13067/JKIECS.2015.10.12.1321
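
The conventional energy-level approach that the abstract criticizes can be sketched to show why it struggles with noise. This is an illustration of a fixed-threshold detector only, not the paper's proposed method; the 8 kHz rate, 160-sample frames, threshold, and test signals are assumptions.

```python
import math
import random

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def energy_gate(samples, frame_len=160, threshold=0.01):
    """Conventional detector: open the transmit path when frame energy exceeds a fixed threshold."""
    return [energy > threshold for energy in frame_energies(samples, frame_len)]

# Toy input at an assumed 8 kHz rate: one frame each of silence, a 200 Hz tone, and noise.
rng = random.Random(1)
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
noise = [rng.uniform(-0.5, 0.5) for _ in range(160)]

decisions = energy_gate(silence + tone + noise)
```

The gate opens for the tone frame and for the noise frame alike, because energy alone cannot tell voice from loud noise; that is the gap a voice-discriminating detector is meant to close.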

A Study on the Audio Compensation System (음향 보상 시스템에 관한 연구)

Jeoung, Byung-Chul;Won, Chung-Sang
- The Journal of the Acoustical Society of Korea / v.32 no.6 / pp.509-517 / 2013
In this paper, we study a method for building a good acoustic-speech system using digital signal processing with a dynamic microphone as the transducer. A good acoustic-speech system should convert the original sound input into an electrical signal without distortion. We measure the frequency response of the microphone and obtain adjustment factors for each frequency band by comparing the measured data with the microphone's standard frequency response. The final sound levels are obtained by applying the adjustment factors derived from the frequency responses of the microphone and speaker, using digital signal processing, so that they match the original sound levels. We then minimize the changes in frequency response and level caused by variation in the source-to-microphone distance, using frequency responses measured at different distances.
https://doi.org/10.7776/ASK.2013.32.6.509
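
One plausible reading of the per-band adjustment factors described above is the difference, in decibels, between a reference response and the measured response. The sketch below works under that assumption; the band centers, the measured levels, and the flat reference are invented for illustration, and the paper's actual procedure may differ.

```python
def adjustment_factors_db(measured_db, reference_db):
    """Per-band correction in dB: how much to boost or cut so measured matches reference."""
    return {band: reference_db[band] - measured_db[band] for band in measured_db}

def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude gain."""
    return 10 ** (level_db / 20)

# Hypothetical microphone response by band center frequency (Hz), in dB.
measured = {125: -3.0, 1000: 0.0, 8000: -6.0}
reference = {125: 0.0, 1000: 0.0, 8000: 0.0}

factors = adjustment_factors_db(measured, reference)
gains = {band: db_to_gain(db) for band, db in factors.items()}
```

The linear gains in `gains` are what an equalizer stage would multiply each band by; a separate table per source-to-microphone distance would extend the idea to the distance compensation the abstract mentions.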

Optimization of Multi-time Scale Loss Function Suitable for DNN-based Audio Coder (심층신경망 기반 오디오 부호화기를 위한 Multi-time Scale 손실함수의 최적화)

Shin, Seung-Min;Byun, Joon;Park, Young-Cheol;Beack, Seung-kwon;Sung, Jong-mo
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.06a / pp.1315-1317 / 2022
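Recently, audio coders based on deep neural networks (DNNs) have been actively studied. DNN-based audio coders are structurally simpler than conventional audio coders, but it is difficult to expect improved perceptual performance without increasing network complexity. To address this problem, methods using loss functions based on psychoacoustic models, which exploit the characteristics of human hearing, have been introduced. Audio coders using psychoacoustic-model-based loss functions controlled quantization noise well, but perceptual improvement is still needed. In this paper, we propose optimizing the window size of the local loss function within the Multi-time Scale loss function for DNN-based audio coders. We adjust the window size used to compute the local loss of the Multi-time Scale loss function and thereby determine a window size suitable for audio coding. We trained the network using the optimal Multi-time Scale loss function obtained through experiments, and subjective evaluation confirmed that it gives better speech quality than the existing psychoacoustic-model-based loss function.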

Platform Library Development for Real-time Audio Communications in the Internet (인터넷을 위한 음성 통신 플랫폼 라이브러리 개발)

Seo, Dong-Won;Kim, Dong-Hyun;Lee, Myung-Jin
- Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.260-263 / 2005
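In this paper, we developed a real-time voice transmission platform library for evaluating the application-layer quality of real-time multimedia over next-generation wired and wireless broadband converged networks. The library includes voice codecs with different bit rates and compression schemes to provide the range of quality levels that users may require in real-time voice communication. It is based on PWLIB, which offers uniform I/O, multithreading, and Internet communication across different environments. Voice data is packetized using RTP/UDP/IP, and transmission quality is monitored using RTCP. Using the developed library, we implemented a simple voice communication system and ran send and receive tests over a network for each voice codec. The library can be combined with video codecs, signaling, and network resource reservation protocols for use in developing multimedia communication terminals.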
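
The RTP packetization step mentioned above can be sketched by building the 12-byte fixed RTP header defined in RFC 3550 in front of a voice payload. This is an illustration only and does not use PWLIB or the paper's library; the function name `build_rtp_packet`, the SSRC value, and the choice of payload type 0 (PCMU, 8 kHz) with 20 ms frames are assumptions.

```python
import struct

def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0, marker=False):
    """Prepend a fixed 12-byte RTP header (version 2, no padding, no extension, no CSRCs)."""
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                                    # network byte order
        byte0,
        byte1,
        seq & 0xFFFF,                                # 16-bit sequence number
        timestamp & 0xFFFFFFFF,                      # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,                           # 32-bit stream identifier
    )
    return header + payload

# 20 ms of 8 kHz mu-law audio is 160 bytes; the timestamp advances by 160 per packet.
frame = b"\xff" * 160
packets = [
    build_rtp_packet(frame, seq=i, timestamp=i * 160, ssrc=0x1234ABCD)
    for i in range(3)
]
```

Each packet would then be sent as a UDP datagram, and the receiver can use the sequence numbers and timestamps to detect loss and jitter, the kind of statistics that RTCP reports carry.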

Search Result 156, Processing Time 0.021 seconds
