• Title/Summary/Keyword: Overlapped sound


Recognition of Overlapped Sound and Influence Analysis Based on Wideband Spectrogram and Deep Neural Networks (광역 스펙트로그램과 심층신경망에 기반한 중첩된 소리의 인식과 영향 분석)

  • Kim, Young Eon;Park, Gooman
    • Journal of Broadcast Engineering
    • /
    • v.23 no.3
    • /
    • pp.421-430
    • /
    • 2018
  • Many voice recognition systems use methods such as MFCC and HMM to recognize the human voice. These methods are designed to analyze only a targeted sound, typically the speech exchanged between a human and a device. Their recognition capability is limited, however, when the input is a group of diverse sounds spanning a wider frequency range, such as dog barking and indoor sounds. The spectrum of such overlapped sound extends up to 20 kHz, well above the voice band. This paper proposes a new recognition method that covers this wider frequency range by combining a Wideband Sound Spectrogram (WSS) with a Keras Sequential Model (KSM) based on a DNN. The WSS is adopted to analyze diverse sounds over a wide frequency range and to extract their features, and the KSM performs pattern recognition on the features extracted from the WSS to improve sound recognition quality. Experiments verified that the proposed WSS and KSM classified the targeted sound well in noisy environments containing overlapped sounds such as dog barking and indoor sounds. The paper also presents a stage-by-stage analysis and comparison of how these factors influence recognition and its characteristics at various noise levels.
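
The pipeline described above (a wide-band spectrogram front end feeding a Keras Sequential DNN classifier) can be sketched roughly as follows. This is an illustrative sketch only: the sampling rate, FFT size, class labels, and layer sizes are assumptions, not the WSS parameters or network layout used in the paper.

```python
# Sketch: wideband log-spectrogram features + Keras Sequential DNN classifier.
# All concrete values (44.1 kHz, FFT size, 3 classes, layer widths) are assumptions.
import numpy as np
import librosa
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

def wideband_spectrogram_features(path, sr=44100, n_fft=2048, hop=512):
    """Log-magnitude spectrogram covering the full band up to sr/2 (about 22 kHz)."""
    y, sr = librosa.load(path, sr=sr)               # keep full bandwidth, no downsampling
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    log_S = librosa.amplitude_to_db(S, ref=np.max)  # shape: (n_fft//2 + 1, frames)
    return log_S.mean(axis=1)                       # crude time-averaged feature vector

n_features = 2048 // 2 + 1
n_classes = 3                                       # e.g. speech / dog barking / indoor sound

model = Sequential([
    Dense(256, activation="relu", input_shape=(n_features,)),
    Dropout(0.3),
    Dense(128, activation="relu"),
    Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50, validation_split=0.2)  # X_train: stacked feature vectors
```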

Model-based Clustering of DOA Data Using von Mises Mixture Model for Sound Source Localization

  • Dinh, Quang Nguyen;Lee, Chang-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.13 no.1
    • /
    • pp.59-66
    • /
    • 2013
  • In this paper, we propose a probabilistic framework for model-based clustering of direction of arrival (DOA) data to obtain stable sound source localization (SSL) estimates. Model-based clustering has been shown to handle highly overlapped and noisy datasets, such as those involved in DOA detection. Although the Gaussian mixture model is commonly used for model-based clustering, we propose the von Mises mixture model as a better fit for circular DOA data than a Gaussian distribution. The EM framework for the von Mises mixture model on a unit hypersphere is specialized to the 2D case and used in the proposed method. We also use a histogram of the dataset to initialize the number of clusters and the initial parameter values, saving computation time and improving efficiency. Experiments on simulated and real-world datasets demonstrate the performance of the proposed method.
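
For reference, EM for a von Mises mixture over circular DOA angles (the 2D case mentioned above) can be sketched as below. The histogram-based initialization of the paper is replaced here with a simple uniform initialization, and the concentration update uses the standard Best-Fisher approximation; this is an illustrative assumption rather than the authors' implementation.

```python
# EM for a von Mises mixture over circular DOA angles (radians); illustrative sketch only.
import numpy as np
from scipy.special import i0   # modified Bessel function of order 0

def vm_pdf(theta, mu, kappa):
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

def approx_kappa(r):
    """Best & Fisher approximation for solving I1(k)/I0(k) = r."""
    if r < 0.53:
        return 2 * r + r**3 + 5 * r**5 / 6
    if r < 0.85:
        return -0.4 + 1.39 * r + 0.43 / (1 - r)
    return 1 / (r**3 - 4 * r**2 + 3 * r)

def vm_mixture_em(theta, n_components=2, n_iter=100):
    n = len(theta)
    pi = np.full(n_components, 1.0 / n_components)                 # mixing weights
    mu = np.linspace(0, 2 * np.pi, n_components, endpoint=False)   # uniform init (paper: histogram init)
    kappa = np.full(n_components, 1.0)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each angle
        dens = np.stack([pi[k] * vm_pdf(theta, mu[k], kappa[k]) for k in range(n_components)])
        resp = dens / dens.sum(axis=0, keepdims=True)              # shape (K, n)
        # M-step: update weights, mean directions, and concentrations
        nk = resp.sum(axis=1)
        pi = nk / n
        for k in range(n_components):
            c = (resp[k] * np.cos(theta)).sum()
            s = (resp[k] * np.sin(theta)).sum()
            mu[k] = np.arctan2(s, c)
            r = np.sqrt(c**2 + s**2) / nk[k]                       # mean resultant length
            kappa[k] = approx_kappa(min(r, 0.999))
    return pi, mu, kappa
```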

Multi-Pulse Amplitude and Location Estimation by Maximum-Likelihood Estimation in MPE-LPC Speech Synthesis (MPE-LPC음성합성에서 Maximum- Likelihood Estimation에 의한 Multi-Pulse의 크기와 위치 추정)

  • 이기용;최홍섭;안수길
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.26 no.9
    • /
    • pp.1436-1443
    • /
    • 1989
  • In this paper, we propose a maximum-likelihood estimation (MLE) method to obtain the locations and amplitudes of the pulses in MPE (multi-pulse excitation)-LPC speech synthesis, which uses multi-pulses as the excitation source. The MLE method computes the values maximizing the likelihood function with respect to the unknown parameters (pulse amplitudes and positions) for the observed data sequence. In the case of overlapped pulses, the method is equivalent to Ozawa's cross-correlation method, requiring the same amount of computation and giving the same sound quality. Computer simulations show that the multi-pulses obtained by the MLE method are (1) pseudo-periodic in pitch for voiced sounds, (2) random for unvoiced sounds, and (3) change from random to periodic in intervals where the original speech changes from unvoiced to voiced. Short-time power spectra of the original speech and of the speech synthesized using the multi-pulses as the excitation source are quite similar at the formants.
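
For context, the classical multi-pulse search to which the abstract compares the MLE approach places pulses one at a time using the cross-correlation between the target signal and the impulse response of the LPC synthesis filter. A minimal sketch of that greedy search is given below; the frame length, number of pulses, and input signals are placeholders, and neither Ozawa's exact procedure nor the MLE variant is reproduced.

```python
# Greedy multi-pulse excitation search (cross-correlation style); illustrative sketch only.
# target: (weighted) speech frame; h: impulse response of the LPC synthesis filter.
import numpy as np

def multipulse_search(target, h, n_pulses=8):
    """Place n_pulses pulses one by one, each time maximizing the error reduction."""
    n = len(target)
    h = np.pad(h, (0, max(0, n - len(h))))[:n]          # align impulse response to frame length
    residual = target.astype(float).copy()
    # energy of the (truncated) impulse response for every candidate position
    energies = np.array([np.dot(h[: n - m], h[: n - m]) for m in range(n)])
    energies = np.maximum(energies, 1e-12)
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        corr = np.array([np.dot(residual[m:], h[: n - m]) for m in range(n)])
        crit = corr**2 / energies                       # error reduction per candidate position
        m_best = int(np.argmax(crit))
        g_best = corr[m_best] / energies[m_best]        # optimal amplitude for that position
        positions.append(m_best)
        amplitudes.append(g_best)
        residual[m_best:] -= g_best * h[: n - m_best]   # remove the new pulse's contribution
    return np.array(positions), np.array(amplitudes)
```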


Fabrication of Metallic Glass/metallic Glass Composites by Spark Plasma Sintering (방전플라즈마 소결법에 의한 비정질/비정질 복합재의 제조)

  • Lee, Jin-Kyu
    • Journal of Powder Materials
    • /
    • v.14 no.6
    • /
    • pp.405-409
    • /
    • 2007
  • Cu-based bulk metallic glass (BMG) composites containing a Zr-based metallic glass phase were consolidated by spark plasma sintering, using a mixture of Cu-based and Zr-based metallic glass powders in their overlapped supercooled liquid region. After consolidation, the Zr-based metallic glass phase is distributed homogeneously in the Cu-based metallic glass matrix. The successful consolidation of BMG composites with dual amorphous phases is attributed to the sound viscous flow of the two kinds of metallic glass powders in their overlapped supercooled liquid region.

Polyphonic sound event detection using multi-channel audio features and gated recurrent neural networks (다채널 오디오 특징값 및 게이트형 순환 신경망을 사용한 다성 사운드 이벤트 검출)

  • Ko, Sang-Sun;Cho, Hye-Seung;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.36 no.4
    • /
    • pp.267-272
    • /
    • 2017
  • In this paper, we propose an effective method of applying multi-channel audio features to GRNNs (Gated Recurrent Neural Networks) for polyphonic sound event detection. Real-life sounds often overlap with each other, making them difficult to distinguish with mono-channel audio features. The proposed method therefore uses multi-channel audio features to improve polyphonic sound event detection. In addition, we apply a gated recurrent neural network, which is simpler than the LSTM (Long Short-Term Memory) that currently shows the highest performance among recurrent neural networks. Experimental results show that the proposed method achieves better sound event detection performance than existing methods.
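
A minimal sketch of a GRU-based polyphonic detector along these lines is shown below, assuming Keras, a placeholder feature tensor shape, and stacked (non-bidirectional) GRU layers; frame-wise sigmoid outputs allow several overlapping events to be active at once. The actual features and layer sizes of the paper are not reproduced.

```python
# GRU-based polyphonic sound event detection sketch (frame-wise multi-label outputs).
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, GRU, TimeDistributed, Dense

n_frames, feat_dim, n_events = 500, 80, 6      # assumed shapes, not taken from the paper

model = Sequential([
    Input(shape=(n_frames, feat_dim)),         # stacked multi-channel audio features per frame
    GRU(64, return_sequences=True),
    GRU(64, return_sequences=True),
    TimeDistributed(Dense(n_events, activation="sigmoid")),  # sigmoid: overlapping events allowed
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["binary_accuracy"])
# model.fit(X, Y, ...)  # X: (batch, n_frames, feat_dim), Y: (batch, n_frames, n_events) in {0, 1}
```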

Static and dynamic spectral properties of the monophthong vowels in Seoul Korean: Implication on sound change (서울 방언 단모음의 소리 변화와 음향 단서 연구: 단일지점 포먼트와 궤적 양상)

  • Kang, Jieun;Kong, Eun Jong
    • Phonetics and Speech Sciences
    • /
    • v.8 no.4
    • /
    • pp.39-47
    • /
    • 2016
  • While acoustic studies in the past decade have documented a raised /o/ by showing that its lowered first formant (F1) almost overlaps with that of the high back vowel /u/, no consensus has been reached on how this /o/-raising affects the vowel system of Seoul Korean. The current study investigated age- and gender-related differences in the relative distances among the vowels to better understand the influence of this ongoing sound change on the vowel system. We measured the static and dynamic spectral characteristics (F1 and F2) of the seven Korean monophthong vowels /e a ʌ o u ɨ i/ in the spontaneous speech of the Seoul Corpus, and depicted the patterns of 30 individual speakers (10 speakers in each of the teens, 20s, and 40s groups) as a function of age and gender. The static spectral examination showed low F1 values of /o/ in the spontaneous speech corpus, confirming the raising of /o/, and also revealed greater F2 values of /u, ɨ/, suggesting more anterior articulations. These tendencies were stronger for younger and female speakers. The spectral trajectories further showed that the F1 and F2 of /o/ and /u/ remained differentiated through the vowel midpoint, although the trajectories gradually merged near the midpoint in the older male speakers' productions. This acoustic evidence of contrast among /o, u, ɨ/ supports the view that the raised /o/ does not indicate a merger with /u/ but rather a chain-like vowel shift in Seoul Korean.
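
The F1/F2 measurements described above require a formant estimator. A self-contained sketch of one common approach (autocorrelation LPC followed by root-finding) is given below; the sampling rate, LPC order, and thresholds are assumptions and not the settings used in the study.

```python
# Rough F1/F2 estimation for a single vowel frame via LPC root-finding; illustrative only.
import numpy as np

def lpc_coefficients(frame, order):
    """Autocorrelation method with the Levinson-Durbin recursion."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 : len(frame) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / e        # reflection coefficient
        a[: i + 1] += k * a[: i + 1][::-1]       # update prediction coefficients
        e *= 1 - k**2                            # update prediction error
    return a

def estimate_formants(frame, sr=16000, order=14, n_formants=2):
    frame = frame * np.hamming(len(frame))       # window the vowel frame
    a = lpc_coefficients(frame, order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]            # one root per complex-conjugate pair
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    freqs = freqs[freqs > 90]                    # drop near-DC roots
    return freqs[:n_formants]                    # approximate (F1, F2) in Hz
```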

Computations of Flows and Acoustic Wave Emitted from Moving Body by ALE Formulation in Finite Difference Lattice Boltzmann Model (차분격자볼츠만법에 ALE모델을 적용한 이동물체 주위의 흐름 및 유동소음의 수치모사)

  • KANG HO-KEUN
    • Journal of Ocean Engineering and Technology
    • /
    • v.20 no.1 s.68
    • /
    • pp.48-54
    • /
    • 2006
  • In this paper, the flow field and acoustic field around moving bodies are simulated using the Arbitrary Lagrangian Eulerian (ALE) formulation in the finite difference lattice Boltzmann method. The formulation is validated by comparing the flow around a square cylinder computed with the ALE formulation against that computed in fixed coordinates, and the two agree very well. The matching procedure between the moving grid and the fixed grid is also considered: the applied method, in which the two grids are connected through a buffer region, is shown to be superior to a moving overlapped grid. Dipole-like sound wave emissions from harmonically vibrating bodies are simulated in two- and three-dimensional cases.
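
This entry and the closely related one below use a finite difference lattice Boltzmann model with an ALE formulation for moving grids, which is beyond a short sketch. For background only, the core collision-and-streaming step of a standard D2Q9 BGK lattice Boltzmann model, on which such methods build, is sketched below; the grid size and relaxation time are arbitrary assumptions, and neither the finite-difference discretization nor the ALE moving-grid machinery is reproduced.

```python
# Standard D2Q9 BGK lattice Boltzmann step (collision + streaming); background sketch only.
import numpy as np

NX, NY, TAU = 200, 100, 0.6                      # assumed grid size and relaxation time
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)     # D2Q9 lattice weights
e = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])

def equilibrium(rho, ux, uy):
    """Second-order equilibrium distributions f_eq_i."""
    feq = np.empty((9, NX, NY))
    usq = ux**2 + uy**2
    for i in range(9):
        eu = e[i, 0] * ux + e[i, 1] * uy
        feq[i] = w[i] * rho * (1 + 3 * eu + 4.5 * eu**2 - 1.5 * usq)
    return feq

def step(f):
    """One BGK collision followed by streaming along the lattice directions (periodic box)."""
    rho = f.sum(axis=0)
    ux = (f * e[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * e[:, 1, None, None]).sum(axis=0) / rho
    f += -(f - equilibrium(rho, ux, uy)) / TAU                       # BGK collision
    for i in range(9):
        f[i] = np.roll(f[i], shift=(e[i, 0], e[i, 1]), axis=(0, 1))  # streaming
    return f

f = equilibrium(np.ones((NX, NY)), np.zeros((NX, NY)), np.zeros((NX, NY)))
for _ in range(10):
    f = step(f)
```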

Direct Simulation of Flows and Flow Noise around Moving Body by FDLBM with ALE Model (ALE모델을 갖는 차분격자볼츠만법에 의한 이동물체 주위의 유동장 및 유동소음의 직접계산)

  • Kang, Ho-Keun;Michihisa, Tsutahara;Kim, Myoung-Ho;Kim, Yu-Taek;Lee, Young-Ho
    • Proceedings of the Korean Society of Marine Engineers Conference
    • /
    • 2005.11a
    • /
    • pp.248-249
    • /
    • 2005
  • In this paper, the flow field and acoustic field around moving bodies are simulated using the Arbitrary Lagrangian Eulerian (ALE) formulation in the FDLBM. The effect of the ALE formulation is checked by comparing the flow around a square cylinder computed with the ALE formulation against that computed in fixed coordinates, and the results show good agreement. The matching procedure between the moving grid and the fixed grid is also considered: the applied method, in which the two grids are connected through a buffer zone, is shown to be superior to a moving overlapped grid. Dipole-like sound wave emissions from harmonically vibrating bodies are simulated in two- and three-dimensional cases.


Overlapped Subband-Based Independent Vector Analysis

  • Jang, Gil-Jin;Lee, Te-Won
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.1E
    • /
    • pp.30-34
    • /
    • 2008
  • This paper presents an improvement to existing blind signal separation (BSS) methods. The proposed method models the inherent signal dependency observed in acoustic objects to separate real-world convolutive sound mixtures. The frequency-domain approach requires solving the well-known permutation problem, which has been addressed by a vector representation of the sources whose multidimensional joint densities carry a certain amount of dependency expressed by non-spherical distributions. For speech signals in particular, we observe strong dependencies across neighboring frequency bins and a decrease of those dependencies as the bins become farther apart. The non-spherical joint density model proposed in this paper reflects this property of real-world speech signals. Experimental results show improved performance over spherical joint density representations.
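
For reference, the baseline frequency-domain IVA update that this work builds on (with the standard spherical dependency model, not the non-spherical model proposed in the paper) can be sketched as follows; the STFT tensor layout, step size, and iteration count are assumptions.

```python
# Natural-gradient frequency-domain IVA with the baseline spherical dependency model (sketch).
# X: STFT of the mixtures, shape (n_freq, n_ch, n_frames). The paper's non-spherical,
# overlapped-subband dependency model is NOT implemented here.
import numpy as np

def iva_spherical(X, n_iter=200, lr=0.1):
    n_freq, n_ch, n_frames = X.shape
    W = np.stack([np.eye(n_ch, dtype=complex) for _ in range(n_freq)])  # demixing matrix per bin
    for _ in range(n_iter):
        Y = np.einsum("fij,fjt->fit", W, X)                   # separated sources in every bin
        norm = np.sqrt((np.abs(Y) ** 2).sum(axis=0)) + 1e-12  # joint norm across bins, (n_ch, n_frames)
        phi = Y / norm[None]                                  # spherical multivariate score function
        for f in range(n_freq):
            corr = phi[f] @ Y[f].conj().T / n_frames          # E[phi(y) y^H] in bin f
            W[f] += lr * (np.eye(n_ch) - corr) @ W[f]         # natural-gradient update
    return np.einsum("fij,fjt->fit", W, X), W
```

The permutation alignment across bins is handled implicitly by the joint norm in the score function, which couples all frequency bins of a source; that cross-bin dependency is exactly what the paper generalizes with its non-spherical density model.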

Acoustic and phonological processes in the repetition tasks (따라 말하기 과제에서의 음향적 처리와 음운적 처리)

  • Yoo, Se-Jin;Lee, Kyoung-Min
    • Proceedings of the Korean Society for Cognitive Science Conference
    • /
    • 2010.05a
    • /
    • pp.42-47
    • /
    • 2010
  • Speech shares acoustic features with other sound-based processing, which makes it difficult to distinguish phonological processing from acoustic processing in speech. In this study, we examined the difference between acoustic and phonological processing during repetition tasks. By contrasting various stimuli of different lengths, we localized the neural correlates of acoustic processing in the bilateral superior temporal gyrus, consistent with previous studies. The activation patterns largely overlapped between words and pseudowords, i.e., they were content-free. In contrast, phonological processing showed left-lateralized activation in the middle temporal gyrus, located in anterior temporal areas. This implies that phonological processing is content-specific, as shown in our previous study, and at the same time more language-specific. We therefore suggest that phonological processing is distinguished from acoustic processing in that it is always accompanied by obligatory access to available phonological codes, which can serve as entries in the mental lexicon.
