Search | Korea Science

Speech/Music Discrimination Using Spectrum Analysis and Neural Network (스펙트럼 분석과 신경망을 이용한 음성/음악 분류)

Keum, Ji-Soo;Lim, Sung-Kil;Lee, Hyon-Soo
- The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.207-213 / 2007
In this research, we propose an efficient speech/music discrimination method that uses spectrum analysis and a neural network. The proposed method analyzes the spectrum to extract a duration feature parameter (MSDF) from a spectral peak track and combines it with the MFSC as the input features of a speech/music discriminator. A neural network was used as the discriminator, and we performed various experiments to evaluate the proposed method according to training pattern selection, training pattern size, and neural network architecture. The results showed improved performance and stability with respect to training pattern selection and model composition in comparison to the previous method. When the MSDF and MFSC were used together as feature parameters with more than 50 seconds of training patterns, the discrimination rate was 94.97% for speech and 92.38% for music. This is a performance improvement of 1.25% for speech and 1.69% for music compared to the use of MFSC.
https://doi.org/10.7776/ASK.2007.26.5.207

Analysis of Feature Parameter Variation for Korean Digit Telephone Speech according to Channel Distortion and Recognition Experiment (한국어 숫자음 전화음성의 채널왜곡에 따른 특징파라미터의 변이 분석 및 인식실험)

Jung Sung-Yun;Son Jong-Mok;Kim Min-Sung;Bae Keun-Sung
- MALSORI / no.43 / pp.179-188 / 2002
Improving the recognition performance of connected digit telephone speech remains an unsolved problem. As a basic study toward that goal, this paper analyzes how the feature parameters of Korean digit telephone speech vary with channel distortion. MFCC is used as the feature parameter for analysis and recognition. To analyze the effect of telephone channel distortion, which differs from call to call, MFCCs are first obtained from the connected digit telephone speech for each phoneme included in the Korean digits. Then CMN, RTCN, and RASTA are applied to the MFCC as channel compensation techniques. Using the feature parameters MFCC, MFCC+CMN, MFCC+RTCN, and MFCC+RASTA, the variances of the phonemes are analyzed and recognition experiments are carried out for each case. The experimental results are discussed along with our findings.
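Of the channel compensation techniques named in this abstract, CMN (cepstral mean normalization) is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes each utterance is given as a list of MFCC frames (lists of floats) and subtracts the per-coefficient mean over the utterance. The function name `cepstral_mean_normalize` is hypothetical.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of MFCC vectors (each a list of floats of equal length).
    Returns a new list of mean-normalized vectors.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across time.
    means = [sum(f[i] for f in frames) / n_frames for i in range(n_coeffs)]
    return [[f[i] - means[i] for i in range(n_coeffs)] for f in frames]


# Illustrative data only: two frames with two coefficients each.
clean = [[1.0, 2.0], [3.0, 6.0]]
# A time-invariant channel appears as a constant offset in the cepstral domain.
channel_offset = [0.5, -1.0]
distorted = [[c + o for c, o in zip(f, channel_offset)] for f in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_distorted = cepstral_mean_normalize(distorted)
```

Because a fixed channel adds the same offset to every frame, both inputs normalize to the same vectors, which is why CMN is a common first compensation step for telephone speech.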

Adaptive Feedback Cancellation Using by Independent Component Analysis for Digital Hearing Aid (독립성분분석을 이용한 디지털 보청기용 적응형 궤환 제거)

Ji, Yoon-Sang;Lee, Sang-Min;Jung, Sae-Young;Kim, In-Young;Kim, Sun-I
- Speech Sciences / v.12 no.3 / pp.79-89 / 2005
Acoustic feedback between the microphone and the receiver can be effectively cancelled by an adaptive feedback cancellation algorithm. Although many speech sounds have a non-Gaussian distribution, most algorithms have been tested with speech-like sounds whose distribution is Gaussian. In this paper, we propose an adaptive feedback cancellation algorithm based on independent component analysis (ICA) for digital hearing aids. The algorithm was tested with signals having not only a Gaussian distribution but also a Laplacian distribution. We verified that the proposed algorithm has better acoustic feedback cancelling performance than the conventional normalized least mean square (NLMS) algorithm, especially for speech-like sounds with a Laplacian distribution.
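For orientation, the conventional NLMS baseline mentioned in this abstract can be sketched as follows. This is a generic textbook NLMS update, not the ICA-based algorithm the paper proposes, and the names (`nlms_filter`, `true_path`), the tap count, and the step size are illustrative assumptions.

```python
import random


def nlms_filter(x, d, num_taps=4, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output tracks d given input x.

    x: reference signal (e.g. the receiver/loudspeaker signal).
    d: observed signal containing the feedback component (e.g. microphone).
    Returns (weights, errors); errors is the feedback-compensated signal.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated feedback
        e = dn - y
        norm = sum(b * b for b in buf) + eps  # input power normalization
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Synthetic, noise-free example: a hypothetical two-tap feedback path.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_path = [0.5, -0.3]
d = [true_path[0] * x[n] + (true_path[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
weights, errors = nlms_filter(x, d)
```

In this idealized setting the first two weights approach the assumed feedback path and the remaining taps stay near zero. Real hearing-aid signals are correlated and non-Gaussian, which is the situation the abstract says motivates the ICA-based alternative.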

Weighted filter bank analysis and model adaptation for improving the recognition performance of partially corrupted speech (부분 손상된 음성의 인식성능 향상을 위한 가중 필터뱅크 분석 및 모델 적응)

Cho Hoon-Young;Oh Yung-Hwan
- MALSORI / no.44 / pp.157-169 / 2002
We propose a weighted filter bank analysis and model adaptation (WFBA-MA) scheme to improve the utilization of uncorrupted or less severely corrupted frequency regions for robust speech recognition. A weighted mel-frequency cepstral coefficient is obtained by weighting log filter bank energies with reliability coefficients, and hidden Markov models are also modified to reflect the local reliabilities. Experimental results on the TIDIGITS database corrupted by band-limited noises and car noise indicate that the proposed WFBA-MA scheme utilizes the uncorrupted speech information well, significantly improving recognition performance in comparison to multi-band speech recognition systems.
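The weighting step described in this abstract can be illustrated with a short sketch. The abstract does not give the exact formula, so this assumes the weighted cepstrum is a DCT-II of the log filter bank energies after each band is multiplied by a reliability coefficient in [0, 1]. The function name `weighted_cepstrum` and the example values are hypothetical.

```python
import math


def weighted_cepstrum(log_energies, reliabilities, num_ceps):
    """DCT-II of reliability-weighted log filter bank energies.

    log_energies: log energy of each mel filter bank channel.
    reliabilities: one weight per channel (assumed to lie in [0, 1]).
    num_ceps: number of cepstral coefficients to return.
    """
    k_total = len(log_energies)
    weighted = [r * e for r, e in zip(reliabilities, log_energies)]
    return [
        sum(weighted[k] * math.cos(math.pi * n * (k + 0.5) / k_total)
            for k in range(k_total))
        for n in range(num_ceps)
    ]


# Illustrative values: four bands, the last one treated as fully corrupted.
log_e = [2.0, 1.5, 1.0, 4.0]
weights = [1.0, 1.0, 1.0, 0.0]
ceps = weighted_cepstrum(log_e, weights, num_ceps=3)

# Changing the energy of the zero-weighted band leaves the result unchanged.
ceps_other = weighted_cepstrum([2.0, 1.5, 1.0, 9.0], weights, num_ceps=3)
```

With all weights set to 1.0 this reduces to an ordinary (unnormalized) cepstrum, so the reliability coefficients only change how much each band contributes.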

Analysis of Speech Signals According to the Various Emotional Contents (정서정보의 변화에 따른 음성신호의 특성분석에 관한 연구)

Jo, Cheol-Woo;Jo, Eun-Kyung;Min, Kyung-Hwan
- The Journal of the Acoustical Society of Korea / v.16 no.3 / pp.33-37 / 1997
This paper describes experimental results from emotional speech materials, which are analysed by various signal processing methods. Speech materials carrying emotional information were collected from actors. The analysis focuses on variations in pitch information and duration. From the analysed results we can observe the characteristics of emotional speech. The materials from this experiment provide valuable resources for analysing emotional speech.

A Study on the Voice Onset Time of English Voiceless Stops in the Buckeye Corpus (벅아이 코퍼스를 이용한 영어 무성파열음의 VOT 연구)

Yoon, Kyu-Chul
- Phonetics and Speech Sciences / v.4 no.2 / pp.33-40 / 2012
The purpose of this paper is to investigate the voice onset time (VOT) of the English voiceless stops [p, t, k] found in the Buckeye Corpus of Conversational Speech [1]. Three young female speakers were chosen for this study and their VOT values were semi-automatically extracted along with other factors. The factors used for the analysis were place of articulation, location in word, syllabic stress, content word or not, word frequency calculated from the corpus, and the speech rate expressed in syllables per second. Results showed that, for the three places of articulation of each speaker, all the factors had a statistically significant effect on the VOT values. This paper has significance in that the materials used for the analysis were from a corpus of spontaneous natural English speech.
https://doi.org/10.13064/KSSS.2012.4.2.033

Metrical Foot in Korean Phonology (한국어 음운론의 음보)

Lee Sang-Jik
- MALSORI / no.25_26 / pp.38-51 / 1993
Korean phonology has not recognised the metrical foot as a phonological unit needed to account for certain phonological processes. This paper, however, suggests that an optional h-deletion process in Korean requires the notion of the metrical foot as an independent phonological domain. Previous analyses rely on the notion of speech speed to explain optional h-deletion: i.e., an intervocalic h is deleted in fast speech, but remains in slow speech. This paper claims that the notion of speech speed should be reinterpreted in terms of the metrical foot: i.e., foot-internal h is deleted, but foot-initial h remains. Such an analysis provides evidence that the metrical foot constitutes a phonological unit in Korean phonology. The notion of the metrical foot enables us to achieve a more detailed and accurate analysis of the optional h-deletion process in Korean.

Blind Classification of Speech Compression Methods using Structural Analysis of Bitstreams (비트스트림의 구조 분석을 이용한 음성 부호화 방식 추정 기법)

Yoo, Hoon;Park, Cheol-Sun;Park, Young-Mi;Kim, Jong-Ho
- Journal of the Korea Institute of Information and Communication Engineering / v.16 no.1 / pp.59-64 / 2012
This paper addresses a blind estimation and classification algorithm for speech compression methods based on analyzing the structure of compressed bitstreams. Various speech compression methods, including vocoders, have been developed to transmit or store speech signals at very low bitrates. As a key feature, vocoders inevitably have a block structure. To classify each compression method, we use the Measure of Inter-Block Correlation (MIBC) to check whether the bitstream includes a block structure and to estimate the block length. Moreover, for compression methods with the same block length, the proposed algorithm identifies the corresponding compression method by exploiting the fact that each method has different correlation characteristics at each bit location. Experimental results indicate that the proposed algorithm classifies speech compression methods robustly for various types and lengths of speech signals in noisy environments.
https://doi.org/10.6109/jkiice.2012.16.1.059
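The idea of detecting a block structure and estimating its length from a raw bitstream can be sketched as below. The abstract does not define the MIBC, so this uses a stand-in score (how far the bit-by-bit agreement between consecutive blocks deviates from chance) rather than the paper's measure. The function names, the candidate range, and the `keep` threshold used to prefer the shortest plausible length over its multiples are all assumptions.

```python
import random


def inter_block_agreement(bits, block_len):
    """Stand-in block-structure score for one candidate block length.

    For each bit position, measure how often consecutive blocks agree and
    return the mean absolute deviation from chance (0 = random, 1 = fixed).
    """
    blocks = [bits[i:i + block_len]
              for i in range(0, len(bits) - block_len + 1, block_len)]
    if len(blocks) < 2:
        return 0.0
    pairs = len(blocks) - 1
    score = 0.0
    for pos in range(block_len):
        agree = sum(blocks[b][pos] == blocks[b + 1][pos] for b in range(pairs))
        score += abs(2.0 * agree / pairs - 1.0)
    return score / block_len


def estimate_block_length(bits, max_len=16, keep=0.8):
    """Pick the shortest candidate whose score is close to the best score."""
    scores = {n: inter_block_agreement(bits, n) for n in range(2, max_len + 1)}
    best = max(scores.values())
    return min(n for n, s in scores.items() if s >= keep * best)


# Synthetic bitstream: 8-bit blocks with a fixed 4-bit pattern plus random bits.
rng = random.Random(0)
stream = []
for _ in range(1000):
    stream += [1, 0, 1, 1] + [rng.randint(0, 1) for _ in range(4)]
estimated = estimate_block_length(stream)
```

On this synthetic stream the estimate is 8. Multiples of the true length also score highly, which is why the sketch takes the shortest candidate near the maximum instead of the maximum itself.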

Entrepreneur Speech and User Comments: Focusing on YouTube Contents (기업가 연설문의 주제와 시청자 댓글 간의 관계 분석: 유튜브 콘텐츠를 중심으로)

Kim, Sungbum;Lee, Junghwan
- The Journal of the Korea Contents Association / v.20 no.5 / pp.513-524 / 2020
YouTube's growth has recently been drawing attention. YouTube is not only a content-consumption channel but also provides a space for consumers to express their intentions. Consumers share their opinions on YouTube through comments. The study focuses on the text of global entrepreneurs' speeches and the comments in response to those speeches on YouTube. A content analysis was conducted for each speech and its comments using the text mining software Leximancer. We analyzed the theme of each entrepreneurial speech and derived topics related to the propensity and characteristics of individual entrepreneurs. In the comments, we found the themes of money, work, and need to be common regardless of the content of each speech. Taking into account the different lengths of text, we additionally performed a Prominence Index analysis. We derived time, future, better, best, change, life, business, and need as common keywords for speech contents and viewer comments. Users who watched an entrepreneur's speech on YouTube responded equally to the topics of life, time, future, customer needs, and positive change.
https://doi.org/10.5392/JKCA.2020.20.05.513

An Effective Two-Step Model for Speech Act Analysis in a Schedule Management Domain (일정 관리 영역에서의 화행 분석을 위한 효과적인 2단계 모델)

Lee, Hyun-Jung;Kim, Hark-Soo;Seo, Jung-Yun
- Korean Journal of Cognitive Science / v.19 no.3 / pp.297-310 / 2008
Since speech acts imply speakers' intentions, it is essential to determine speakers' speech acts in order to implement an intelligent dialogue system. We propose a two-step model for effectively determining speakers' speech acts. In the first step, the proposed model obtains speech act candidates from two sources: a neural network model based on machine learning and a predictivity model based on statistics. In the second step, the model uses the candidates returned by the predictivity model to filter the candidates returned by the neural network model. It then selects the speech act with the maximum output value among the remaining candidates. In an experiment on a schedule management domain, the proposed two-step method showed better precision than previous methods that use only a machine learning model or a probability model.
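The second step of the two-step model described in this abstract can be sketched in a few lines. This assumes the neural network's candidates arrive as a mapping from speech act label to output value and the predictivity model's candidates as a set of labels. The function name, the labels, and the fallback used when the filter removes every candidate are hypothetical and not taken from the paper.

```python
def select_speech_act(nn_scores, predictivity_candidates):
    """Filter neural network candidates by the predictivity model's candidates,
    then return the surviving speech act with the highest output value.

    nn_scores: dict mapping speech act label -> neural network output value.
    predictivity_candidates: set of labels returned by the statistical model.
    """
    surviving = {act: score for act, score in nn_scores.items()
                 if act in predictivity_candidates}
    if not surviving:
        # Assumption: fall back to the unfiltered network output when the
        # filter removes everything; the abstract does not cover this case.
        surviving = nn_scores
    return max(surviving, key=surviving.get)


# Illustrative labels and scores for a single utterance.
nn_scores = {"request": 0.6, "inform": 0.3, "ask-ref": 0.1}
chosen = select_speech_act(nn_scores, {"inform", "ask-ref"})
fallback = select_speech_act(nn_scores, set())
```

Here the network's top candidate is removed by the filter, so the next-best surviving label is chosen. This shows how the statistical model can override a confident but contextually implausible network output.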

Search Result 1,592, Processing Time 0.021 seconds
