Search | Korea Science

Music Genre Classification using Spikegram and Deep Neural Network (스파이크그램과 심층 신경망을 이용한 음악 장르 분류)

Jang, Woo-Jin;Yun, Ho-Won;Shin, Seong-Hyeon;Cho, Hyo-Jin;Jang, Won;Park, Hochong
- Journal of Broadcast Engineering / v.22 no.6 / pp.693-701 / 2017
In this paper, we propose a new method for music genre classification using a spikegram and a deep neural network. The human auditory system encodes input sound in the time and frequency domains so as to maximize the amount of sound information delivered to the brain while using minimal energy and resources. The spikegram is a waveform analysis method based on this encoding function of the auditory system. In the proposed method, we analyze the signal with a spikegram and extract a feature vector composed of the information most relevant to genre classification, which is then used as the input to the neural network. We measure music genre classification performance on the GTZAN dataset, which consists of 10 music genres, and confirm that, compared with current state-of-the-art methods, the proposed method performs well while using a low-dimensional feature vector.
https://doi.org/10.5909/JBE.2017.22.6.693
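
Spikegrams are commonly built by decomposing the waveform into a sparse set of time-shifted gammatone kernels, for example with matching pursuit. The abstract does not say which decomposition or which features the authors use, so the following is only a minimal sketch under those assumptions: the kernel center frequencies, kernel length, the ERB-based bandwidth, the helper names (`gammatone`, `matching_pursuit`, `spike_histogram`), and the per-kernel spike count used as a stand-in feature vector are all hypothetical.

```python
import math


def gammatone(fc, fs, length, order=4):
    """Unit-norm gammatone kernel t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    # Bandwidth b = 1.019 * ERB(fc), ERB from Glasberg & Moore (an assumed choice).
    b = 1.019 * (24.7 + 0.108 * fc)
    k = [(n / fs) ** (order - 1) * math.exp(-2 * math.pi * b * n / fs)
         * math.cos(2 * math.pi * fc * n / fs) for n in range(length)]
    norm = math.sqrt(sum(v * v for v in k))
    return [v / norm for v in k]


def matching_pursuit(signal, kernels, n_spikes):
    """Greedily pick (kernel, shift, amplitude) spikes; return spikes and residual."""
    residual = list(signal)
    spikes = []
    for _ in range(n_spikes):
        best_c, best_k, best_t = 0.0, None, None
        for ki, k in enumerate(kernels):
            for t in range(len(residual) - len(k) + 1):
                # Inner product of the residual with kernel ki shifted to time t.
                c = sum(residual[t + i] * k[i] for i in range(len(k)))
                if abs(c) > abs(best_c):
                    best_c, best_k, best_t = c, ki, t
        if best_k is None:  # residual is exactly zero, nothing left to explain
            break
        # Kernels are unit-norm, so subtracting c * kernel removes its projection.
        for i, v in enumerate(kernels[best_k]):
            residual[best_t + i] -= best_c * v
        spikes.append((best_k, best_t, best_c))
    return spikes, residual


def spike_histogram(spikes, n_kernels):
    """Hypothetical low-dimensional feature: number of spikes per kernel."""
    counts = [0] * n_kernels
    for ki, _, _ in spikes:
        counts[ki] += 1
    return counts
```

The exhaustive search over every kernel and shift grows with signal length times dictionary size, which is fine for a toy signal but far too slow for full music excerpts.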

An Image Watermarking Method for Embedding Copyrighter's Audio Signal (저작권자의 음성 삽입을 위한 영상 워터마킹 방법)

Choi Jae-Seung;Kim Chung-Hwa;Koh Sung-Shik
- The Journal of the Acoustical Society of Korea / v.24 no.4 / pp.202-209 / 2005
The rapid development of digital media and communication networks has created an urgent need for data certification technology to protect IPR (intellectual property rights). This paper proposes a new watermarking method for embedding the owner's audio signal. Because the method uses an audio signal as the embedded watermark, it is very useful for claiming ownership aurally. It also has the advantage that, by applying our LBX (Linear Bit-expansion) interleaving, it can restore an audio signal that has been modified, and especially one that has been partially removed, by image-removal attacks. Our watermarking consists of three basic stages: 1) encode the owner's analog audio signal by PCM to create a new digital audio watermark; 2) interleave the audio watermark with our LBX; and 3) embed the interleaved audio watermark in the low-frequency band of the image's Discrete Haar Wavelet Transform. The experimental results show that the method is resistant to lossy JPEG compression, the standard image compression, and especially to cropping and rotation, which remove part of the image.
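
The three stages can be sketched end to end. The abstract does not describe LBX interleaving or the exact embedding rule, so this sketch substitutes a plain block (row/column) interleaver for LBX and quantization index modulation (QIM) on the LL band of a one-level 2-D Haar transform for the embedding step. The 8-bit uniform PCM, the step size `delta`, and all function names are assumptions for illustration, not the authors' method.

```python
import math


def pcm_encode(samples, bits=8):
    """Uniform PCM: map samples in [-1, 1] to `bits`-bit codes, MSB first."""
    levels = 2 ** bits
    out = []
    for s in samples:
        code = min(levels - 1, int((max(-1.0, min(1.0, s)) + 1.0) / 2.0 * levels))
        out.extend((code >> (bits - 1 - i)) & 1 for i in range(bits))
    return out


def pcm_decode(bitlist, bits=8):
    """Inverse of pcm_encode; each sample is rebuilt at the center of its bin."""
    codes = [int("".join(map(str, bitlist[j:j + bits])), 2)
             for j in range(0, len(bitlist), bits)]
    return [(c + 0.5) / 2 ** bits * 2.0 - 1.0 for c in codes]


def interleave(bits, rows):
    """Stand-in for LBX: write row by row, read column by column."""
    cols = len(bits) // rows  # assumes len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows):
    cols = len(bits) // rows
    order = [r * cols + c for c in range(cols) for r in range(rows)]
    out = [0] * len(bits)
    for i, pos in enumerate(order):
        out[pos] = bits[i]
    return out


def haar2d(img):
    """One-level orthonormal 2-D Haar transform of an even-sized image -> [LL, LH, HL, HH]."""
    h, w = len(img) // 2, len(img[0]) // 2
    bands = [[[0.0] * w for _ in range(h)] for _ in range(4)]
    for i in range(h):
        for j in range(w):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            for band, v in zip(bands, (a + b + c + d, a - b + c - d,
                                       a + b - c - d, a - b - c + d)):
                band[i][j] = v / 2
    return bands


def ihaar2d(ll, lh, hl, hh):
    h, w = len(ll), len(ll[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            img[2 * i][2 * j] = (s + p + q + r) / 2
            img[2 * i][2 * j + 1] = (s - p + q - r) / 2
            img[2 * i + 1][2 * j] = (s + p - q - r) / 2
            img[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return img


def qim(v, bit, delta):
    """Quantization index modulation: snap v onto the lattice assigned to `bit`."""
    offset = bit * delta / 2
    return delta * round((v - offset) / delta) + offset


def embed(img, bits, delta=16.0):
    ll, lh, hl, hh = haar2d(img)
    cols = len(ll[0])
    if len(bits) > len(ll) * cols:
        raise ValueError("watermark does not fit in the LL band")
    for k, bit in enumerate(bits):  # one bit per low-frequency coefficient
        ll[k // cols][k % cols] = qim(ll[k // cols][k % cols], bit, delta)
    # Back to 8-bit pixels; without clipping, rounding moves each LL value by at most 1.
    return [[min(255, max(0, round(p))) for p in row] for row in ihaar2d(ll, lh, hl, hh)]


def extract(img, n_bits, delta=16.0):
    ll = haar2d(img)[0]
    cols = len(ll[0])
    out = []
    for k in range(n_bits):
        v = ll[k // cols][k % cols]
        out.append(0 if abs(v - qim(v, 0, delta)) <= abs(v - qim(v, 1, delta)) else 1)
    return out
```

QIM decodes correctly while each coefficient moves by less than `delta / 4`, so the pixel rounding above does not flip bits; robustness to JPEG, cropping, and rotation is not modeled in this sketch.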

A Study on Digital Image Watermarking for Embedding Audio Logo (음성로고 삽입을 위한 디지털 영상 워터마킹에 관한 연구)

Cho, Gang-Seok;Koh, Sung-Shik
- Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.3 / pp.21-27 / 2002
Digital watermarking methods have been proposed as a solution to illegal copying and proof of ownership for multimedia data. However, protecting ownership of multimedia data such as digital images, digital video, and digital audio remains difficult. This paper describes a watermarking algorithm that non-linearly embeds audio logo watermark data, converted from the owner's audio signal, into the pixel intensity components of an original image; ownership is then asserted by playing the audio signal recovered from the extracted audio logo through a speaker. Experimental results show that the proposed audio-logo algorithm is robust against attacks, particularly lossy JPEG image compression.

Adaptive Enhancement Algorithm of Perceptual Filter Using Variable Threshold (가변 임계값을 이용한 지각 필터의 적응적인 음질 개선 알고리즘)

차형태
- The Journal of the Acoustical Society of Korea / v.23 no.6 / pp.446-453 / 2004
In this paper, we propose a new adaptive perceptual filter that uses a variable threshold to enhance audio signals degraded by additive nonstationary noise. The adaptive perceptual filter continually updates the variable threshold according to the signal power and the effect of noise variation, so the noisy audio signal is enhanced while residual noise is effectively controlled. The proposed algorithm uses a perceptual filter that transforms the time-domain signal into the frequency domain and calculates the intensity energy and excitation energy in the Bark domain. In this method, the threshold decides at which stage the filter response is updated. Using the variable threshold, the proposed algorithm effectively controls residual noise based on the energy difference of audio signals degraded by additive nonstationary noise. The proposed method is tested with audio signals degraded by nonstationary noise at various signal-to-noise ratios (SNRs). We carry out NMR and MOS tests at input SNRs of 15 dB, 20 dB, 25 dB, and 30 dB, and achieve approximate improvements of 17.4 dB, 15.3 dB, 12.8 dB, and 9.8 dB in NMR and 2.9, 2.5, 2.3, and 1.7 in MOS, respectively.
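
The abstract names Bark-domain intensity and excitation energies without giving formulas. The sketch below assumes the Zwicker and Terhardt Hz-to-Bark approximation, one-Bark-wide bands, and the Schroeder spreading function (as used in Johnston's perceptual coder) to turn band energies into an excitation pattern. The variable-threshold update rule itself is not reproduced, and the function names are hypothetical.

```python
import cmath
import math


def bark(f):
    """Zwicker & Terhardt approximation of the Bark scale (f in Hz)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def power_spectrum(frame):
    """Naive DFT power for bins 0..N/2 (O(N^2), fine for a short illustrative frame)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def bark_band_energies(frame, fs):
    """'Intensity' energy: DFT power summed within one-Bark-wide bands."""
    n = len(frame)
    energies = [0.0] * (int(bark(fs / 2)) + 1)
    for k, p in enumerate(power_spectrum(frame)):
        energies[int(bark(k * fs / n))] += p
    return energies


def spreading_db(dz):
    """Schroeder spreading function in dB; dz = maskee band - masker band (Bark)."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def excitation(energies):
    """'Excitation' energy: each band's energy spread across neighbouring bands."""
    return [sum(e * 10 ** (spreading_db(i - j) / 10) for j, e in enumerate(energies))
            for i in range(len(energies))]
```

The spreading function is asymmetric (its slopes approach roughly +25 dB/Bark below the masker and -10 dB/Bark above it), so a tone raises the excitation more in the bands above it than in those below.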

Classification of General Sound with Non-negativity Constraints (비음수 제약을 통한 일반 소리 분류)

조용춘;최승진;방승양
- Journal of KIISE: Software and Applications / v.31 no.10 / pp.1412-1417 / 2004
Sparse coding or independent component analysis (ICA), which yields a holistic representation, has been successfully applied to elucidating early auditory processing and to sound classification. In contrast, parts-based representation is an alternative way of understanding object recognition in the brain. In this paper, we employ non-negative matrix factorization (NMF), which learns a parts-based representation, for the task of sound classification. We explain methods for extracting features from spectro-temporal representations of sounds using NMF, both in the absence and in the presence of noise. Experimental results show that NMF-based features improve sound classification performance over ICA-based features.
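
NMF factors a nonnegative spectrogram-like matrix V (frequency by time) into basis spectra W and activations H, with V ≈ WH. The abstract does not give the cost function or update rule; this sketch assumes the Lee and Seung multiplicative updates for the squared Euclidean cost, and uses the activations of new data, computed with W held fixed, as classification features. The names, rank, and iteration counts are illustrative assumptions.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH (the cost being minimized)."""
    return sum((x - y) ** 2 for rv, rwh in zip(v, matmul(w, h)) for x, y in zip(rv, rwh))


def _update_h(v, w, h, eps):
    """One multiplicative update H <- H * (W^T V) / (W^T W H)."""
    wt = transpose(w)
    num, den = matmul(wt, v), matmul(matmul(wt, w), h)
    return [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
            for i in range(len(h))]


def nmf(v, rank, iters=300, eps=1e-12, seed=0):
    """Lee-Seung multiplicative updates for V ~= W H with W, H >= 0."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in v]
    h = [[rng.random() + 0.1 for _ in v[0]] for _ in range(rank)]
    for _ in range(iters):
        h = _update_h(v, w, h, eps)
        # The W update is the H update applied to the transposed problem V^T ~= H^T W^T.
        w = transpose(_update_h(transpose(v), transpose(h), transpose(w), eps))
    return w, h


def encode(v, w, iters=300, eps=1e-12):
    """Feature extraction: activations H for new data with the learned basis W fixed."""
    h = [[1.0] * len(v[0]) for _ in w[0]]
    for _ in range(iters):
        h = _update_h(v, w, h, eps)
    return h
```

Each update multiplies by a ratio of nonnegative quantities, so W and H stay nonnegative without any explicit projection; `eps` only guards against division by zero.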

A Study on the Measure to Maximize the Effects of Functional Games in Relation to the Changes in Visual and Auditory Stimulations (시각 및 청각 자극 변화에 따른 기능성 게임의 효능 극대화 방안 연구)

Shin, Jeong-Hoon
- Journal of the Institute of Convergence Signal Processing / v.14 no.3 / pp.147-153 / 2013
Functional games, which combine play and learning and serve as a forward-looking tool, can minimize dysfunction and maximize beneficial functions, and they have taken root as a new alternative that can change the game industry and game culture. Recently, the focus of the game and education markets has been shifting toward developing more advanced learning content rather than emphasizing users' self-control and motivation. Along with this, the game market has excluded socially dysfunctional elements such as addiction and learning disabilities, and has diversified into human-friendly entertainment businesses that emphasize mental and physical health and pursue scientifically grounded educational effects. In addition, functional games are expanding their reach from professional sectors (such as medical aid and medical education, military simulation, health, assistive tools, special education, and learning tools) to everyday education, mental health, and other areas, and have grown steadily. However, most functional games currently being planned and developed to suit the particular characteristics of this market have not undergone rigorous scientific assessment of their functions and have not proven their effectiveness. The overwhelming majority of functional games are developed based on the intuition and experience of game developers. Moreover, games that involve repeating simple tasks or take the form of simple puzzles cannot effectively combine genuinely engaging elements with learning effects. Most games rely on unscientific methods that lead only to vague expectations of functional improvement, rather than on assessment of human functions.
In this paper, we present measures for maximizing the effects of functional games, i.e., immersion and concentration, through changes in visual and auditory stimulation. To compare the effects of visual stimulation, functional game content produced in 2D and 3D form was used. In addition, ultra sound and 3D functional game content were used to compare the effects of changes in auditory stimulation. The users' brainwaves were measured during three-stage experiments on their responses to changes in visual and auditory stimulation, and the results of the analysis were compared.

Analysis of source localization of P300 in college students with schizotypal traits (조현형 인격 성향을 가진 대학생의 P300 국소화 분석)

Jang, Kyoung-Mi;Kim, Bo-Mi;Na, Eun-Chan;An, Eun-Ji;Kim, Myung-Sun
- Korean Journal of Cognitive Science / v.28 no.1 / pp.1-26 / 2017
This study investigated the cortical generators of the P300 in college students with schizotypal traits using an auditory oddball paradigm, event-related potentials (ERPs), and the standardized low-resolution brain electromagnetic tomography (sLORETA) model. We also investigated the relationship between P300 current density and the clinical symptoms of schizophrenia. Based on scores on the Schizotypal Personality Questionnaire (SPQ), a schizotypal trait group (n=37) and a control group (n=42) were selected. To measure the P300, we used an auditory oddball paradigm in which frequent standard tones (1000 Hz) and rare target tones (1500 Hz) were presented in random order. Participants were required to count the target tones during the task and report the count at the end of the experiment. The two groups did not differ significantly in oddball task accuracy. The schizotypal trait group showed significantly smaller P300 amplitudes than the control group. In terms of source localization, both groups showed P300 current density over the bilateral frontal, parietal, temporal, and occipital lobes. However, compared with the control group, the schizotypal trait group showed significantly reduced activation in the left superior temporal gyrus and the right middle temporal gyrus, but increased activation in the left inferior frontal gyrus and the right superior frontal gyrus. Furthermore, in the schizotypal trait group, the current density of the right superior frontal gyrus was negatively correlated with the SPQ disorganization score. These findings indicate that, as observed in patients with schizophrenia, individuals with schizotypal traits have dysfunctions in the frontal and temporal areas known to be sources of the P300. In addition, the present results indicate that the SPQ disorganization score, rather than the total score, is useful for predicting the risk of future schizophrenia.
https://doi.org/10.19066/cogsci.2017.28.1.001
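
The oddball protocol and the ERP averaging it feeds can be sketched as follows. Only the two tone frequencies come from the abstract; the target probability, tone duration, onset ramps, epoch window, and function names are assumptions. Fixing the number of targets and shuffling makes the correct count known in advance, which is convenient for scoring the participants' reports.

```python
import math
import random

STANDARD_HZ, TARGET_HZ = 1000, 1500  # tone frequencies from the abstract


def oddball_sequence(n_trials, p_target=0.2, seed=None):
    """Randomly ordered trials with a fixed number of rare targets (p_target is assumed)."""
    n_targets = round(n_trials * p_target)
    seq = [TARGET_HZ] * n_targets + [STANDARD_HZ] * (n_trials - n_targets)
    random.Random(seed).shuffle(seq)
    return seq


def tone(freq, fs=44100, dur=0.1, ramp=0.01):
    """Sine tone with linear onset/offset ramps to avoid clicks (assumes ramp > 0)."""
    n, r = int(fs * dur), int(fs * ramp)
    return [min(1.0, i / r, (n - 1 - i) / r) * math.sin(2 * math.pi * freq * i / fs)
            for i in range(n)]


def erp_average(eeg, onsets, pre, post):
    """Average epochs eeg[t-pre : t+post] over stimulus onsets (assumes pre > 0)."""
    epochs = []
    for t in onsets:
        if t - pre < 0 or t + post > len(eeg):
            continue  # skip epochs that run off the recording
        seg = eeg[t - pre:t + post]
        base = sum(seg[:pre]) / pre  # baseline: mean of the pre-stimulus interval
        epochs.append([v - base for v in seg])
    return [sum(col) / len(epochs) for col in zip(*epochs)]
```

Averaging only the epochs time-locked to target onsets cancels activity that is not phase-locked to the stimulus, which is what makes the P300 visible.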

A Novel Speech Enhancement Based on Speech/Noise-dominant Decision in Time-frequency Domain (시간-주파수 영역에서 음성/잡음 우세 결정에 의한 새로운 잡음처리)

윤석현;유창동
- The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.48-55 / 2001
A novel method for reducing additive non-stationary noise is proposed. The method requires neither prior information about the noise nor an estimate of the noise statistics from pause regions. Enhancement is performed band by band for each time frame. Based on both a decision on whether a particular band in a frame is speech- or noise-dominant and the masking property of the human auditory system, an appropriate amount of noise is removed using spectral subtraction. The proposed method was tested under various noise conditions (car noise, F16 noise, white Gaussian noise, pink noise, tank noise, and babble noise). Comparison of segmental SNR with the spectral subtraction method, visual inspection of the enhanced spectrograms, and listening to the enhanced speech showed that the method effectively reduces various types of noise while minimizing speech distortion.
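
The abstract does not state how each band is judged speech- or noise-dominant without a noise estimate from pauses. As a stand-in, this sketch tracks a per-band minimum over recent frames (a minimum-statistics-style floor, which likewise needs no pause detection), calls a band speech-dominant when its power exceeds a multiple of that floor, and subtracts more aggressively in noise-dominant bands. The masking-based adjustment is omitted, and every parameter and the name `enhance` are hypothetical.

```python
def enhance(frames, window=8, thresh=2.0, alpha_speech=1.0, alpha_noise=3.0, beta=0.02):
    """Band-wise spectral subtraction on per-frame band powers (all parameters assumed).

    frames: list of frames, each a list of band powers. Returns enhanced powers.
    """
    out = []
    for t, frame in enumerate(frames):
        recent = frames[max(0, t - window + 1):t + 1]
        enhanced = []
        for b, p in enumerate(frame):
            # Noise floor: minimum power in this band over recent frames,
            # so no pause detection is needed.
            noise = min(f[b] for f in recent)
            speech_dominant = p > thresh * noise
            # Subtract gently where speech dominates, aggressively where noise does.
            alpha = alpha_speech if speech_dominant else alpha_noise
            # Spectral floor beta * p keeps the result positive after over-subtraction.
            enhanced.append(max(p - alpha * noise, beta * p))
        out.append(enhanced)
    return out
```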

Audio Stress Effect on Visual ERP Stimulated by 3-dimensional Environment (청각 스트레스가 3차원 시자극 유발전위에 미치는 영향 분석)

박찬희;홍철운;김남균
- Journal of Biomedical Engineering Research / v.23 no.4 / pp.301-308 / 2002
This research quantitatively analyzed how mental stress affects visually evoked ERPs in humans after an integrated visual and auditory environment was set up in three-dimensional space. We measured ERPs separately in a normal-state session and a mental-stress session. The subjects were 10 healthy men and women, and signals were recorded from the scalp at Fp1, Fz, Cz, Pz, O1, and O2. The experiment was conducted in an isolated room shielded from electromagnetic interference. The results showed that the P300 amplitude was slightly higher in the stress session and its latency was longer. By recording voltage variations, we captured the activity of the brain regions responsible for human perception, cognition, and action processing, and evaluated the effect of mental stress. We expect that the results of this research can be used to evaluate brain malfunction.
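
P300 amplitude and latency are conventionally read off the averaged ERP as the largest positive peak in a post-stimulus window, and comparing sessions then means comparing these two numbers. The abstract does not state the window or the electrode used, so this sketch assumes a 250 to 500 ms search window; the function name and parameters are illustrative.

```python
def p300_peak(erp, fs, stim_index, window=(0.25, 0.50)):
    """Largest positive value of an averaged ERP within a post-stimulus window.

    erp: averaged waveform for one electrode; fs: sampling rate in Hz;
    stim_index: sample index of stimulus onset. Returns (amplitude, latency in s).
    """
    start = stim_index + int(round(window[0] * fs))
    stop = min(len(erp), stim_index + int(round(window[1] * fs)) + 1)
    peak = max(range(start, stop), key=lambda i: erp[i])
    return erp[peak], (peak - stim_index) / fs
```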
