• Title/Summary/Keyword: Mel frequency cepstral coefficients

73 search results

Classification of Pornographic Video Using Multiple Audio Features (다중 오디오 특징을 이용한 유해 동영상의 판별)

  • Kim, Jung-Soo;Chung, Myung-Bum;Sung, Bo-Kyung;Kwon, Jin-Man;Koo, Kwang-Hyo;Ko, Il-Ju
    • Korean HCI Society Conference Proceedings / 2009.02a / pp.522-525 / 2009
  • This paper proposes a content-based method for classifying pornographic video, a serious harmful side effect of the internet in modern society. Audio data was used to extract features from the video; the audio features used in this paper are the frequency spectrum, autocorrelation, and MFCC. Sounds likely to indicate objectionable content were extracted, and a video was classified as pornographic by measuring what percentage of its whole audio track corresponded to such sounds. In experiments on the proposed method, classification performance was measured for each feature individually and compared with the result of using multiple features together; combining features produced better results than extracting and using any single audio feature.
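
As an illustration of the MFCC feature named in this and the following abstracts, here is a minimal numpy sketch of the core MFCC computation (mel filterbank, log, DCT) for a single frame; the filter count, frame size, and sample rate below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def hz_to_mel(f):
    # Standard (HTK-style) mel-scale mapping
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters equally spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc(frame, sr, n_filters=26, n_coeffs=13):
    # Power spectrum -> mel-band energies -> log -> DCT-II
    spec = np.abs(np.fft.rfft(frame)) ** 2
    energies = mel_filterbank(n_filters, len(frame), sr) @ spec
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_filters)
    return dct @ log_e

# Example: a 512-sample frame of a 1 kHz tone sampled at 16 kHz
sr = 16000
t = np.arange(512) / sr
coeffs = mfcc(np.sin(2 * np.pi * 1000 * t), sr)
print(coeffs.shape)  # (13,)
```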


Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (오디오 부호화기를 위한 스펙트럼 변화 및 MFCC 기반 음성/음악 신호 분류)

  • Sangkil Lee;In-Sung Lee
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.239-246 / 2023
  • In this paper, we propose an open-loop algorithm that classifies speech and music signals for an audio coder using spectral flux parameters and Mel-Frequency Cepstral Coefficients (MFCC). The MFCC was used as a short-term feature parameter to increase responsiveness, and spectral fluxes were used as long-term feature parameters to improve accuracy. The overall speech/music classification decision is made by combining the short-term and long-term classification methods. A Gaussian Mixture Model (GMM) was used for pattern recognition, and the optimal GMM parameters were estimated with the Expectation-Maximization (EM) algorithm. The proposed combined long-term/short-term classification method showed an average classification error rate of 1.5% on various audio sources, improving the error rate by 0.9% over the short-term method alone and by 0.6% over the long-term method alone. Compared with the Unified Speech and Audio Coding (USAC) audio classification method, the proposed method improved the classification error rate by 9.1% on percussive music signals with attacks and by 5.8% on speech signals.
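
The spectral-flux feature this abstract relies on can be sketched as follows; the framing, the per-frame normalization, and the toy "music-like vs. speech-like" signals are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def spectral_flux(frames):
    """Frame-to-frame change of the normalized magnitude spectrum.

    Speech tends to show large, fast flux (alternating voiced and
    unvoiced segments), while music spectra usually evolve more smoothly.
    frames: 2-D array with one windowed time frame per row.
    """
    mags = np.abs(np.fft.rfft(frames, axis=1))
    # Normalize each spectrum so flux measures spectral-shape change,
    # not loudness change
    mags /= mags.sum(axis=1, keepdims=True) + 1e-12
    return np.sqrt((np.diff(mags, axis=0) ** 2).sum(axis=1))

rng = np.random.default_rng(0)
n = np.arange(256)
# A steady tone (music-like) vs. frames of changing noise (speech-like)
tone = np.stack([np.sin(2 * np.pi * 0.05 * n) for _ in range(20)])
noise = rng.standard_normal((20, 256))
print(spectral_flux(tone).mean() < spectral_flux(noise).mean())  # True
```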

AI-based stuttering automatic classification method: Using a convolutional neural network (인공지능 기반의 말더듬 자동분류 방법: 합성곱신경망(CNN) 활용)

  • Jin Park;Chang Gyun Lee
    • Phonetics and Speech Sciences / v.15 no.4 / pp.71-80 / 2023
  • This study aimed to develop an automated stuttering identification and classification method using artificial intelligence, specifically a deep-learning identification model based on the convolutional neural network (CNN) algorithm for Korean speakers who stutter. Speech data were collected from 9 adults who stutter and 9 normally fluent speakers. The data were automatically segmented at the phrasal level using Google Cloud speech-to-text (STT), and labels such as 'fluent', 'blockage', 'prolongation', and 'repetition' were assigned. Mel frequency cepstral coefficients (MFCCs) and a CNN-based classifier were used to detect and classify each type of stuttered disfluency. Only five instances of prolongation were found, however, so this type was excluded from the classifier model. Results showed that the accuracy of the CNN classifier was 0.96, with F1-scores of 1.00 for 'fluent', 0.67 for 'blockage', and 0.74 for 'repetition'. Although the effectiveness of the CNN-based automatic classifier for detecting stuttered disfluencies was validated, its performance was inadequate for the blockage and prolongation types. Building a large speech database organized by disfluency type was therefore identified as a necessary foundation for improving classification performance.
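
To make the classifier structure concrete, here is a toy numpy forward pass of a CNN over an MFCC map (one convolution layer, ReLU, global average pooling, linear layer, softmax). The paper's actual architecture, training procedure, and weights are not reproduced; all shapes and the random weights here are assumptions.

```python
import numpy as np

def conv2d(x, kernel):
    # 'Valid' 2-D convolution (cross-correlation, as in most DL libraries)
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def forward(mfcc_map, kernels, weights):
    # Conv layer -> ReLU -> global average pooling -> linear -> softmax
    pooled = np.array([np.maximum(conv2d(mfcc_map, k), 0).mean()
                       for k in kernels])
    logits = weights @ pooled
    e = np.exp(logits - logits.max())
    return e / e.sum()

rng = np.random.default_rng(1)
mfcc_map = rng.standard_normal((13, 40))  # 13 MFCCs x 40 frames
kernels = rng.standard_normal((4, 3, 3))  # 4 random 3x3 filters
weights = rng.standard_normal((3, 4))     # 3 classes, e.g. fluent/blockage/repetition
probs = forward(mfcc_map, kernels, weights)
print(probs.shape)  # (3,) -- a probability distribution over classes
```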

A cable tension identification technology using percussion sound

  • Wang, Guowei;Lu, Wensheng;Yuan, Cheng;Kong, Qingzhao
    • Smart Structures and Systems / v.29 no.3 / pp.475-484 / 2022
  • The loss of cable tension in civil infrastructure reduces structural bearing capacity and causes harmful deformation of structures. Currently, most structural health monitoring (SHM) approaches for cables rely on contact transducers. This paper proposes a cable tension identification technology using percussion sound, which provides fast determination of steel cable tension without physical contact between cables and sensors. Notably, inspired by the concept of tensioning strings for piano tuning, the proposed technology predicts cable tension by deep-learning-assisted classification of the "percussion" sound produced by tapping a steel cable. To simulate the non-linear response of human hearing and to better quantify the minor changes that percussion induces in the high-frequency bands of the sound spectrum, Mel-frequency cepstral coefficients (MFCCs) were extracted as acoustic features to train the deep learning network. A convolutional neural network (CNN) with four convolutional layers and two global pooling layers was employed to identify cable tension within a designed range. Moreover, theoretical analysis and finite element method (FEM) simulations were conducted to prove the feasibility of the proposed technology, and its identification performance was experimentally investigated. Overall, the results show that the proposed percussion-based technology has great potential for estimating cable tension in in-situ structural safety assessment.

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters (MCE기반의 다중 특징 파라미터 스코어의 결합을 통한 화자인식 성능 향상)

  • Kang, Ji Hoon;Kim, Bo Ram;Kim, Kyu Young;Lee, Sang Hoon
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.6 / pp.679-686 / 2020
  • In this paper, an enhanced feature-extraction method for vocal source signals and a score-combination method using MCE-based weight estimation over multiple feature-vector scores are proposed to improve speaker recognition performance. The proposed feature vector is composed of perceptual linear predictive cepstral coefficients, skewness, and kurtosis extracted from low-pass-filtered glottal flow signals, eliminating the flat spectral region that carries no meaningful information. This feature was used to improve a conventional speaker recognition system based on mel-frequency cepstral coefficients and perceptual linear predictive cepstral coefficients extracted from the speech signals, with Gaussian mixture models. In addition, to increase the reliability of the estimated scores, instead of estimating weights from the probability distribution of the conventional score, the scores of the conventional vocal-tract features and the proposed feature are fused by the MCE-based score-combination method to find the optimal speaker. Experimental results showed that the proposed feature vectors contain valid information for recognizing the speaker, and that the recognition system combining the MCE-based multiple feature-parameter scores outperformed the conventional one, particularly with small numbers of Gaussian mixtures.
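
The score-combination step can be illustrated with a minimal sketch. The MCE-based weight training itself is omitted here; the fixed weights and the per-speaker scores below are hypothetical values, not the paper's data.

```python
import numpy as np

def combine_scores(score_lists, weights):
    """Weighted fusion of log-likelihood scores from multiple feature streams.

    score_lists: (n_streams, n_speakers) scores, one row per feature stream
                 (e.g. vocal-tract MFCC stream and vocal-source stream).
    weights: per-stream weights (trained by MCE in the paper; fixed here).
    Returns the index of the best-scoring enrolled speaker.
    """
    fused = np.tensordot(weights, np.asarray(score_lists), axes=1)
    return int(np.argmax(fused))

# Hypothetical scores for 3 enrolled speakers from two feature streams
mfcc_scores = np.array([-12.0, -9.5, -11.0])
source_scores = np.array([-8.0, -10.0, -7.5])
print(combine_scores([mfcc_scores, source_scores], np.array([0.6, 0.4])))  # 2
```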

Whale Sound Reconstruction using MFCC and L2-norm Minimization (MFCC와 L2-norm 최소화를 이용한 고래소리의 재생)

  • Chong, Ui-Pil;Jeon, Seo-Yun;Hong, Jeong-Pil;Jo, Se-Hyung
    • Journal of the Institute of Convergence Signal Processing / v.19 no.4 / pp.147-152 / 2018
  • Underwater transient signals are complex, variable, and nonlinear, which makes accurate modeling with reference patterns difficult. We analyze one type of underwater transient signal, whale sounds, using MFCC (Mel-Frequency Cepstral Coefficients) and synthesize them from the MFCC using weighted L2-norm minimization. The whale species in these experiments are Humpback, Right, Blue, Gray, and Minke whales. The first 20 MFCC coefficients were extracted from the original signals using MATLAB and the signals were reconstructed using weighted L2-norm minimization with the inverse MFCC. The optimal weighting factor for reconstructing whale sounds was found to be 3 to 4.
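
The reconstruction step, recovering a power spectrum from mel-domain energies by weighted L2-norm minimization, can be sketched as a ridge-regularized least-squares problem; the stand-in filterbank, the weight value, and the exact formulation below are assumptions, not the paper's MATLAB implementation.

```python
import numpy as np

def reconstruct_power(log_mel_energies, fb, weight=3.0):
    """Recover a power spectrum p from log mel energies e = log(fb @ p)
    by minimizing ||fb @ p - exp(e)||^2 + weight * ||p||^2
    (a weighted-L2 sketch; the inversion is underdetermined, so the
    weight term selects a small-norm solution)."""
    e = np.exp(log_mel_energies)
    n_bins = fb.shape[1]
    # Ridge-regularized least squares via the normal equations
    A = fb.T @ fb + weight * np.eye(n_bins)
    p = np.linalg.solve(A, fb.T @ e)
    return np.maximum(p, 0.0)  # power spectra are non-negative

rng = np.random.default_rng(2)
fb = np.maximum(rng.standard_normal((20, 129)), 0.0)  # stand-in mel filterbank
true_p = np.abs(np.fft.rfft(rng.standard_normal(256))) ** 2
est_p = reconstruct_power(np.log(fb @ true_p + 1e-10), fb)
print(est_p.shape)  # (129,)
```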

A Musical Genre Classification Method Based on the Octave-Band Order Statistics (옥타브밴드 순서 통계량에 기반한 음악 장르 분류)

  • Seo, Jin Soo
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.81-86 / 2014
  • This paper studies the effectiveness of spectral and temporal octave-band order statistics for musical genre classification. To represent the relative disposition of harmonic and non-harmonic components, we utilize octave-band order statistics of the power spectral distribution. Experiments were performed on two widely used music datasets; the results show that the octave-band order statistics improve genre classification accuracy by 2.61% on one dataset and 8.9% on the other, compared with mel-frequency cepstral coefficients and octave-band spectral contrast. These results indicate that octave-band order statistics are promising for musical genre classification.
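
A minimal sketch of octave-band order statistics over a power spectrum follows; the number of bands, the band edges, and the quantiles are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def octave_band_order_stats(power_spec, sr, quantiles=(0.1, 0.5, 0.9)):
    """Order statistics (quantiles) of the power spectrum within octave bands.

    Quantiles capture the relative disposition of strong harmonic peaks
    versus the non-harmonic floor inside each band.
    """
    n_bins = len(power_spec)
    freqs = np.linspace(0.0, sr / 2, n_bins)
    # 8 octave bands ending at the Nyquist frequency
    edges = [0.0] + [sr / 2 / 2 ** k for k in range(7, -1, -1)]
    feats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = power_spec[(freqs >= lo) & (freqs < hi)]
        if len(band) == 0:
            feats.extend([0.0] * len(quantiles))
        else:
            feats.extend(np.quantile(np.sort(band), quantiles))
    return np.array(feats)

sr = 16000
spec = np.abs(np.fft.rfft(np.random.default_rng(3).standard_normal(1024))) ** 2
feats = octave_band_order_stats(spec, sr)
print(feats.shape)  # (24,) -- 8 bands x 3 quantiles
```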

Utilization of Phase Information for Speech Recognition (음성 인식에서 위상 정보의 활용)

  • Lee, Chang-Young
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.9 / pp.993-1000 / 2015
  • Mel-Frequency Cepstral Coefficients (MFCC) are among the most widely used feature vectors for speech signal processing. An evident drawback of MFCC is that phase information is lost when the magnitude of the Fourier transform is taken. In this paper, we consider a method of utilizing the phase information by treating the magnitudes of the real and imaginary components of the FFT separately. Applying this method to speech recognition with FVQ/HMM, the recognition error rate is found to decrease compared to conventional MFCC. Numerical analysis also shows that the optimal number of feature components is 12, consisting of 6 each from the real and imaginary FFT components.
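
The core idea, keeping the real- and imaginary-part magnitudes of the FFT as separate features rather than collapsing them into |X|, can be sketched in a few lines; the downstream mel/cepstral processing is omitted.

```python
import numpy as np

def phase_aware_features(frame):
    """Split the FFT into |real| and |imag| magnitudes instead of a
    single magnitude |X|, so relative phase information survives."""
    X = np.fft.rfft(frame)
    return np.abs(X.real), np.abs(X.imag)

n = np.arange(64)
re_cos, im_cos = phase_aware_features(np.cos(2 * np.pi * 8 * n / 64))
re_sin, im_sin = phase_aware_features(np.sin(2 * np.pi * 8 * n / 64))
# The plain magnitude spectrum |X| is identical for these two signals,
# but the real/imag split distinguishes the 90-degree phase shift:
print(np.allclose(re_cos, re_sin))  # False
```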

An Efficient Voice Activity Detection Method using Bi-Level HMM (Bi-Level HMM을 이용한 효율적인 음성구간 검출 방법)

  • Jang, Guang-Woo;Jeong, Mun-Ho
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.8 / pp.901-906 / 2015
  • We present a VAD (Voice Activity Detection) method using a Bi-level HMM. Conventional methods require additional post-processing or rule-based delayed-frame settings. To address this problem, we apply to VAD a Bi-level HMM, which inserts an additional state layer into a typical HMM, and use the posterior ratio of voice states to detect voice segments. Using MFCCs (Mel-Frequency Cepstral Coefficients) as observation vectors, we performed experiments on voice data at different SNRs and achieved satisfactory results compared with well-known methods.
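
The Bi-level HMM itself is not reproduced here, but the posterior-based voice decision can be illustrated with a plain two-state (noise/voice) HMM smoothed by the forward-backward algorithm; the toy energy likelihoods and transition probabilities below are assumptions for the sketch.

```python
import numpy as np

def voice_posteriors(log_lik, trans, init):
    """Forward-backward smoothing for a 2-state (noise/voice) HMM.

    log_lik: (T, 2) per-frame log-likelihoods under each state.
    Returns P(state = voice | all frames) per frame; a frame is labeled
    voice when this posterior exceeds 0.5.
    """
    T = len(log_lik)
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    fwd = np.zeros((T, 2)); bwd = np.ones((T, 2))
    fwd[0] = init * lik[0]; fwd[0] /= fwd[0].sum()
    for t in range(1, T):                      # forward pass (normalized)
        fwd[t] = (fwd[t - 1] @ trans) * lik[t]
        fwd[t] /= fwd[t].sum()
    for t in range(T - 2, -1, -1):             # backward pass (normalized)
        bwd[t] = trans @ (lik[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    return (post / post.sum(axis=1, keepdims=True))[:, 1]

# Toy frame energies: quiet - loud - quiet; state 1 ("voice") favors high energy
energy = np.array([0.1, 0.2, 2.0, 2.2, 1.9, 0.1, 0.2])
log_lik = np.stack([-(energy - 0.2) ** 2, -(energy - 2.0) ** 2], axis=1)
trans = np.array([[0.9, 0.1], [0.1, 0.9]])  # sticky states smooth the decision
post = voice_posteriors(log_lik, trans, np.array([0.5, 0.5]))
print((post > 0.5).astype(int))  # [0 0 1 1 1 0 0]
```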

A Study on Serendipity-Oriented Music Recommendation Based on Play Information (재생 정보 기반 우연성 지향적 음악 추천에 관한 연구)

  • Ha, Taehyun;Lee, Sangwon
    • Journal of Korean Institute of Industrial Engineers / v.41 no.2 / pp.128-136 / 2015
  • With recent interest in culture technologies, many studies of recommendation systems have been conducted, and various music recommendation systems have been developed. However, they have often focused on technical aspects such as feature extraction and similarity comparison, and have not sufficiently addressed recommendation from a user-centered perspective. For users to be highly satisfied with recommended music, it is necessary to study how the recommended items connect to users' actual desires. To this end, our study proposes a novel music recommendation method based on serendipity, the freshness users feel for items that are otherwise familiar. Serendipity is measured by comparing users' past and recent listening tendencies. We utilize neural networks to apply these tendencies to the recommendation process, and extract the features of music items as MFCCs (Mel-frequency cepstral coefficients). Because the recommendation method is built on the characteristics of user behavior, user satisfaction with the recommended items is expected to increase.