Search | Korea Science

Music Genre Classification based on Deep Neural Network using Spikegram (스파이크그램을 이용한 심층 신경망 기반의 음악 장르 분류)

Yun, Ho-Won;Jang, Woo-Jin;Shin, Seong-Hyeon;Jang, Won;Cho, Hyo-Jin;Park, Ho-Chong
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2017.06a / pp.29-30 / 2017
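This paper proposes a deep-neural-network-based music genre classification technique that uses the spikegram, a representation modeled on the human auditory system. The classification targets are the 10 genres of the GTZAN dataset. We obtain the spikegram with a method that models how the auditory system perceives sound, and propose a method for extracting a new feature vector from the spikegram. The proposed method yields feature vectors suited to deep neural networks, and a network trained on these feature vectors achieves higher performance than various previously used methods.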

Speech Synthesis using Diphone Clustering and Improved Spectral Smoothing (다이폰 군집화와 개선된 스펙트럼 완만화에 의한 음성합성)

Jang, Hyo-Jong;Kim, Kwan-Jung;Kim, Gye-Young;Choi, Hyung-Il
- The KIPS Transactions:PartB / v.10B no.6 / pp.665-672 / 2003
This paper describes a speech synthesis technique based on concatenating unit phonemes. A major problem with this approach is the discontinuity that arises at the junctions between unit phonemes, especially between unit phonemes recorded by different speakers. To solve this problem, the paper uses clustered diphones and proposes a spectral smoothing technique that uses not only the formant trajectory and the distribution characteristics of the spectrum but also human auditory characteristics. Specifically, the proposed technique clusters unit phonemes using the distribution characteristics of the spectrum at the junctions between unit phonemes, decides the amount and extent of smoothing by considering human auditory characteristics at those junctions, and then performs spectral smoothing using weights calculated along the time axis at the border of two diphones. The proposed technique removes the discontinuity and minimizes the distortion that spectral smoothing can cause. For performance evaluation, we test on five hundred diphones extracted from twenty sentences recorded by five speakers and present the experimental results.
https://doi.org/10.3745/KIPSTB.2003.10B.6.665
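
The abstract above mentions smoothing with weights calculated along the time axis at the border of two diphones. The following is only a minimal sketch of that general idea, not the authors' algorithm: it assumes each diphone is a list of spectral frames (lists of floats), uses a simple linear weight, and takes the smoothing width as a fixed, hypothetical parameter `width`, whereas the paper chooses the amount and extent of smoothing from human auditory characteristics.

```python
def smooth_join(left, right, width):
    """Pull the frames nearest the border toward a shared midpoint spectrum.

    left, right: lists of spectral frames (each frame is a list of floats).
    width: number of frames on each side of the border to adjust
           (hypothetical fixed value; the paper derives it perceptually).
    Returns adjusted copies of left and right.
    """
    if width < 1 or width > len(left) or width > len(right):
        raise ValueError("width must be between 1 and the shorter unit length")

    last, first = left[-1], right[0]
    # Target spectrum at the border: the average of the two boundary frames.
    mid = [(a + b) / 2.0 for a, b in zip(last, first)]
    d_left = [m - a for m, a in zip(mid, last)]
    d_right = [m - b for m, b in zip(mid, first)]

    out_left = [frame[:] for frame in left]
    out_right = [frame[:] for frame in right]
    for k in range(width):
        # Linear time weight: 1.0 at the border, decreasing away from it.
        w = (width - k) / width
        li = len(left) - 1 - k
        out_left[li] = [x + w * d for x, d in zip(left[li], d_left)]
        out_right[k] = [x + w * d for x, d in zip(right[k], d_right)]
    return out_left, out_right


# Two one-bin "spectra" with a jump from 0.0 to 4.0 at the border.
a, b = smooth_join([[0.0]] * 4, [[4.0]] * 4, width=2)
print(a, b)
```

With `width=2`, the jump at the border is spread over two frames on each side so that both units meet at the midpoint value.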

Perceptual and Adaptive Quantization of Line Spectral Frequency Parameters (선 스펙트럼 주파수의 청각 적응 부호화)

한우진;김은경;오영환
- The Journal of the Acoustical Society of Korea / v.19 no.8 / pp.68-77 / 2000
Line spectral frequency (LSF) parameters have been widely used in low bit-rate speech coding because they represent the short-time speech spectrum efficiently. Whereas most conventional quantization methods are based on the weighted Euclidean distance measure, this paper proposes a new distance measure for quantizing LSF parameters based on the masking properties of the human ear. The proposed method derives the perceptual distance measure from the definition of the noise-to-mask ratio (NMR), which corresponds closely to the actual distortion perceived by the human ear, and uses it to quantize LSF parameters. In addition, we propose an adaptive bit allocation scheme that allocates the minimum number of bits to the LSF parameters needed to maintain the perceptual transparency of a given speech frame, thereby reducing the average bit rate. For performance evaluation, we report the ratio of perceptually transparent frames and the corresponding average bit rates for the conventional and proposed methods. By combining the proposed distance measure with the adaptive bit allocation scheme, the proposed system requires only 770 bps to obtain 95.5% perceptually transparent frames, while the conventional systems produce 89.9% even at 1800 bps.

Recurrence Quantification Analysis of Auditory Evoked Related Potential in Inattention and Attention (비 집중.집중 상태에 따른 청각 유발 전위의 반복 정량 분석)

Kim, Hye-Jin;Yoo, Sun-Kook;Lee, Byung-Chae
- Science of Emotion and Sensibility / v.16 no.4 / pp.503-508 / 2013
This study uses recurrence quantification analysis (RQA), a nonlinear method, to analyze differences in the electroencephalogram between inattention and attention states in school-age children who need attention. The experiments were conducted with 21 healthy subjects (12 males and 9 females). The inattention state was defined as the 500 ms before the onset of the auditory stimulus, and the attention state as the 500 ms after the onset. The RQA parameters were greater in the attention state than in the inattention state, and the difference was statistically significant (p < 0.05). For the two states, the auditory evoked potentials were displayed as recurrence plot (RP) and cross recurrence plot (CRP) diagrams to confirm their nonlinear characteristics; brain dynamics were more complex in the attention state than in the inattention state. RQA may therefore be useful for analyzing the complex brain dynamics associated with auditory attention tasks.
https://doi.org/10.14695/KJSOS.2013.16.4.503
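
The abstract above does not say which RQA parameters were computed. As a rough illustration of what RQA works with, the sketch below builds a recurrence matrix for a scalar series and computes the recurrence rate, one commonly used RQA measure. It makes several assumptions: no time-delay embedding, an arbitrary threshold `eps`, and a recurrence rate that counts the main diagonal. It is not the authors' analysis pipeline.

```python
def recurrence_matrix(x, eps):
    """R[i][j] = 1 when samples i and j are within eps of each other, else 0."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def recurrence_rate(R):
    """Fraction of recurrent points in the matrix (main diagonal included)."""
    n = len(R)
    return sum(sum(row) for row in R) / (n * n)


# Toy series that alternates between two values.
R = recurrence_matrix([0.0, 1.0, 0.0, 1.0], eps=0.1)
print(recurrence_rate(R))
```

In this toy case each sample recurs only with samples of the same value, so half of the matrix entries are recurrent.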

Low frequency critical bandwidths of Korean normal hearing adults (한국 정상 성인의 저주파수 임계 주파수 대역 특성에 관한 연구)

Moon, Jihyun;Jeon, Kyongeon;Lim, Dukhwan
- The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.70-75 / 2022
The critical bandwidth represents response interactions between a signal tone and its neighboring bands. This study analyzed the critical bandwidths of a clinically important 500 Hz tone in young Korean male and female subjects (male = 10, female = 10) at a conversational level (60 dB HL). Data were measured with notched band noise and two-alternative forced choice methods. Results showed that the critical bandwidth was slightly greater (95 Hz) than the previous Western measures. There were no statistically significant differences between genders, nor were there any significant differences in lateralization of the ear (p > 0.05). These results may have implications for optimizing effective tinnitus masking or related clinical applications.
https://doi.org/10.7776/ASK.2022.41.1.070

Hearing Threshold of Children with Hearing Screening-Passed in Day Care Center and Speech-Language Pathology Clinic (청각선별을 통과한 주간 보호와 언어재활 서비스 수혜 소아의 가청역치)

Heo, Seung-Deok
- Journal of rehabilitation welfare engineering & assistive technology / v.10 no.4 / pp.273-278 / 2016
The response threshold level in hearing screening depends on the noise level of the test surroundings, the physiological characteristics of the hearing organs, excessive exposure to sound sources, and so on. The purpose of this study is to obtain basic information on the hearing threshold level at each frequency in children who passed hearing screening. Subjects were 110 children aged 3.3 to 16.3 years ($9.01 \pm 2.52$) who attended private speech-language pathology clinics and daycare centers. The hearing screening methods were tympanometry, acoustic reflex threshold, automated otoacoustic emission, and pure tone screening. All subjects met the normal criteria of hearing screening. Differences in hearing threshold among ages and frequencies were tested with repeated measures ANOVA. The mean hearing threshold levels were $16 \pm 6.49$, $11.5 \pm 4.79$, $6.86 \pm 4.99$, and $5.95 \pm 6.65$ dB HL in the right ear and $15.68 \pm 6.01$, $9.95 \pm 5.24$, $5.72 \pm 5.21$, and $5.63 \pm 7.04$ dB HL in the left ear at 500, 1,000, 2,000, and 4,000 Hz, respectively. There was a significant difference between 500 Hz and each of 1,000, 2,000, and 4,000 Hz (p=.000), and between 1,000 Hz and each of 2,000 and 4,000 Hz (p=.000).
https://doi.org/10.21288/resko.2016.10.4.273

Analysis of auditory temporal processing in within- and cross-channel gap detection thresholds for low-frequency pure tones (저주파수 순음에 대한 within- 및 cross-channel gap detection thresholds를 이용한 auditory temporal processing 특성 연구)

Koo, Sungmin;Lim, Dukhwan
- The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.58-63 / 2022
This study examined the characteristics of pitch perception and temporal resolution through Within-/Cross-Channel Gap Detection Thresholds (WC/CC GDTs) using low-frequency pure tones (such as 264 Hz, 373 Hz and 528 Hz, related to the C4, C4#, and C5 musical tones). Forty young people and 20 elderly people with normal hearing participated in this study. The WC GDTs were approximately 2 ms to 4 ms regardless of frequency in both groups, and there was no statistically significant difference in WC GDTs between groups. In both groups, CC GDTs were larger than WC GDTs, and as the frequency difference increased, the CC GDTs also increased. In particular, the CC GDTs of the elderly group were 8 to 10 times larger than those of the young group, and the difference between the groups was statistically significant. These data also showed a different trend of GDTs compared with previous data obtained from musical stimuli. This study suggests that GDTs may influence pitch perception mechanisms and can be used as psychoacoustic evidence for nonlinear responses of the auditory nervous system.
https://doi.org/10.7776/ASK.2022.41.1.058

Developing the Design Guideline of Auditory User Interface for Domestic Appliances (가전제품의 청각 사용자 인터페이스(AUI) 설계를 위한 가이드라인 개발 연구)

Lee, Ju-Hwan;Jeon, Myoung-Hoon;Ahn, Jeong-Hee;Han, Kwang-Hee
- HCI Society of Korea: Conference Proceedings (한국HCI학회:학술대회논문집) / 2006.02b / pp.1-8 / 2006
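This study aimed to establish a cognitive and affective "Auditory User Interface Design Guideline" that can be differentiated by domestic appliance product group and function, and to provide guidance for producing auditory signals that can be intuitively associated with information about an appliance's operating functions, so that a design method that extends GUI-centered product design by one dimension and applies users' multisensory characteristics can be used in practice. In particular, it also considered the goal of enhancing brand identity and corporate image by establishing a systematic framework for AUI. This research was needed because of the demand for an approach grounded in consumers' mental models of, and emotional responses to, domestic appliances. It starts from the frequent cases in which buzzer-type auditory signals are annoying because they were mapped arbitrarily rather than through a systematic application of AUI. It also reflects the need to upgrade AUI, which has not kept pace with the changes in and level of GUI, and the trend toward emotional marketing of domestic appliances. In addition, it is an attempt suited to a situation in which the rapid spread of multimedia environments calls for multimodal display. The study sought to extract the relationships by which the cognitive and affective attributes of a particular appliance or function are evoked by various attributes of auditory signals, and to present empirical data on the basic mechanisms that form these relationships, thereby providing guidelines useful for the AUI design of domestic appliances. However, rather than the specific and detailed results of the study, this paper introduces the overall plan and the procedures followed, in order to provide a reference framework for research in related fields.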

The Hearing Ability of Black Rockfish Sebastes inermis to Underwater Audible Sound -1. The Auditory Threshold- (수중 가청음에 의한 볼락의 청각 능력 -1. 청각 문턱치-)

LEE Chang-Heon;Seo Du-Ok
- Korean Journal of Fisheries and Aquatic Sciences / v.33 no.6 / pp.581-584 / 2000
To obtain fundamental data on methods of luring fish schools with underwater audible sound, the auditory threshold of black rockfish Sebastes inermis on the coast of Cheju Island was investigated by a heartbeat conditioning technique using pure tones coupled with a delayed electric shock. The audible range of black rockfish extended from 80 Hz to 800 Hz with a peak sensitivity at 300 Hz. The mean auditory thresholds of black rockfish at the frequencies of 80 Hz, 100 Hz, 200 Hz, 300 Hz, 500 Hz and 800 Hz were 102 dB, 103 dB, 99 dB, 96 dB, 116 dB and 122 dB, respectively. As the frequency rose above 300 Hz, the auditory threshold increased rapidly.

Improvement of Synthetic Speech Quality using a New Spectral Smoothing Technique (새로운 스펙트럼 완만화에 의한 합성 음질 개선)

장효종;최형일
- Journal of KIISE:Software and Applications / v.30 no.11 / pp.1037-1043 / 2003
This paper describes a speech synthesis technique that uses the diphone as the unit phoneme. Speech synthesis is basically accomplished by concatenating unit phonemes, and its major problem is discontinuity at the junctions between unit phonemes. To solve this problem, this paper proposes a new spectral smoothing technique that reflects not only formant trajectories but also the distribution characteristics of the spectrum and human auditory characteristics. Specifically, the proposed technique decides the amount and extent of smoothing by considering human auditory characteristics at the junctions of unit phonemes, and then performs spectral smoothing using weights calculated along the time axis at the border of two diphones. The proposed technique reduces the discontinuity and minimizes the distortion caused by spectral smoothing. For performance evaluation, we tested on five hundred diphones extracted from twenty sentences, using ETRI Voice DB samples and individually self-recorded samples.

Search Result 331, Processing Time 0.032 seconds
