• Title/Summary/Keyword: MEL

Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms

  • Hanaa Alamri;Hanan S. Alshanbari
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.9-16
    • /
    • 2023
  • Speech can actively convey feelings and attitudes through words, so it is important for researchers to identify the emotional content carried by speech signals as well as the type of emotion the speech elicited. In this study, we investigated an emotion recognition system using an Arabic-language database, specifically in the Saudi dialect, drawn from the YouTube channel Telfaz11. The four emotions examined were anger, happiness, sadness, and neutral. In our experiments, we extracted features such as the Mel Frequency Cepstral Coefficients (MFCC) and Zero-Crossing Rate (ZCR) from the audio signals, then classified emotions using several algorithms: machine learning algorithms (Support Vector Machine (SVM) and K-Nearest Neighbor (KNN)) and deep learning algorithms (Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM)). Our experiments showed that the MFCC features combined with the CNN model obtained the best accuracy, 95%, demonstrating the effectiveness of this classification system in recognizing emotions in spoken Arabic.
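
A minimal sketch of this kind of feature-extraction-plus-classifier pipeline, assuming librosa and scikit-learn; the file names and labels below are hypothetical placeholders, not the Telfaz11 data.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def extract_features(path, n_mfcc=13):
        y, sr = librosa.load(path, sr=16000)
        # MFCCs averaged over time frames give one fixed-length vector per clip
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)
        # Zero-crossing rate, likewise averaged over frames
        zcr = librosa.feature.zero_crossing_rate(y).mean(axis=1)
        return np.concatenate([mfcc, zcr])

    # Hypothetical clip list: (wav_path, emotion_label)
    clips = [("angry_01.wav", "anger"), ("happy_01.wav", "happiness")]
    X = np.stack([extract_features(p) for p, _ in clips])
    labels = [lab for _, lab in clips]
    clf = SVC(kernel="rbf").fit(X, labels)  # SVM, one of the four classifiers compared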

Preprocessing performance of convolutional neural networks according to characteristic of underwater targets (수중 표적 분류를 위한 합성곱 신경망의 전처리 성능 비교)

  • Kyung-Min, Park;Dooyoung, Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.6
    • /
    • pp.629-636
    • /
    • 2022
  • We present a preprocessing method for an underwater target detection model based on a convolutional neural network. The acoustic signature of a ship is expressed ambiguously because of the strong signal power at low frequencies. To address this problem, we combine feature preprocessing methods, pairing various feature scaling methods with spectrogram methods. We define a simple convolutional neural network model and train it to measure preprocessing performance. Through experiments, we found that combining the log mel-spectrogram with standardization and robust scaling gave the best classification performance.
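
A sketch of the best-performing combination reported above (log mel-spectrogram plus standardization and robust scaling), assuming librosa and scikit-learn; the parameter values are illustrative, and in practice the scalers would be fit on the training set only.

    import librosa
    from sklearn.preprocessing import StandardScaler, RobustScaler

    def preprocess(y, sr, n_mels=64):
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        log_mel = librosa.power_to_db(mel)            # log mel-spectrogram
        flat = log_mel.reshape(-1, 1)
        flat = StandardScaler().fit_transform(flat)   # zero mean, unit variance
        flat = RobustScaler().fit_transform(flat)     # median/IQR scaling, robust to outliers
        return flat.reshape(log_mel.shape)            # (n_mels, frames) input for the CNN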

Infant cry recognition using a deep transfer learning method (딥 트랜스퍼 러닝 기반의 아기 울음소리 식별)

  • Bo, Zhao;Lee, Jonguk;Atif, Othmane;Park, Daihee;Chung, Yongwha
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.11a
    • /
    • pp.971-974
    • /
    • 2020
  • Infants express their physical and emotional needs to the outside world mainly through crying. However, most parents find it challenging to understand the reason behind their babies' cries, and failure to correctly identify the cause of a baby's cry and take appropriate action can affect the cognitive and motor development of newborns undergoing rapid brain development. In this paper, we propose an infant cry recognition system based on deep transfer learning to help parents identify crying babies' needs the way a specialist would. The proposed system transforms the waveform of the cry signal into a log-mel spectrogram, then uses the VGGish model pre-trained on AudioSet to extract a 128-dimensional feature vector from the spectrogram. Finally, a softmax function is used to classify the extracted feature vector and recognize the corresponding type of cry. The experimental results show that our method achieves good performance, exceeding 0.96 in precision, recall, and F1-score.
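
A minimal PyTorch sketch of this transfer-learning setup. The vggish_embed callable stands in for an AudioSet-pretrained VGGish feature extractor (assumed to be available from a published checkpoint), and the number of cry classes is a hypothetical placeholder.

    import torch
    import torch.nn as nn

    NUM_CRY_TYPES = 5  # hypothetical number of cry categories

    class CryClassifier(nn.Module):
        def __init__(self, vggish_embed):
            super().__init__()
            self.embed = vggish_embed           # frozen: log-mel patches -> 128-d vectors
            self.head = nn.Linear(128, NUM_CRY_TYPES)

        def forward(self, log_mel):
            with torch.no_grad():               # keep the pretrained extractor fixed
                feat = self.embed(log_mel)      # (batch, 128)
            logits = self.head(feat)
            # class probabilities; use the raw logits with CrossEntropyLoss when training
            return torch.softmax(logits, dim=-1)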

Classification of infant cries using 3D feature vectors (3D 특징 벡터를 이용한 영아 울음소리 분류)

  • Park, JeongHyeon;Kim, MinSeo;Choi, HyukSoon;Moon, Nammee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.597-599
    • /
    • 2022
  • Infants express all of their needs through the nonverbal communication of crying, but interpreting an infant's cry is difficult, and much research has been conducted to decode it. In this paper, we propose classifying infant cries using 3D feature vectors. We use the Donate-a-corpus-cry dataset, whose data are labeled with five classes: belly pain, burping, discomfort, hunger, and tiredness. The data are augmented by tempo adjustment, creating versions at 90% and 110% of the original speed. After extracting spectrogram, mel-spectrogram, and MFCC features, the three 2D feature maps are stacked into a single 3D feature vector. The 3D feature vectors are then used to train ResNet and EfficientNet models. As a result, the 2D feature vectors achieved an F1-score of 0.89, while the 3D feature vectors achieved 0.98, an improvement of 0.09.
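
A sketch of the tempo augmentation and three-channel ("3D") feature stacking described above, assuming librosa; the common grid size and the simple cropping used to align the three maps are illustrative choices, not the paper's exact procedure.

    import numpy as np
    import librosa

    def augment_tempo(y):
        # 90% and 110% speed versions, as in the paper
        return [librosa.effects.time_stretch(y, rate=r) for r in (0.9, 1.1)]

    def three_d_features(y, sr, n_bins=128, n_frames=256):
        spec = np.abs(librosa.stft(y))                                   # linear spectrogram
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bins)  # mel-spectrogram
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_bins)           # MFCC map
        # Crop each 2D map to a common (n_bins, n_frames) grid, then stack as channels
        maps = [m[:n_bins, :n_frames] for m in (spec, mel, mfcc)]
        return np.stack(maps)   # (3, n_bins, n_frames), the input shape ResNet/EfficientNet expect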

Implementation of Melody Generation Model Through Weight Adaptation of Music Information Based on Music Transformer (Music Transformer 기반 음악 정보의 가중치 변형을 통한 멜로디 생성 모델 구현)

  • Seunga Cho;Jaeho Lee
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.18 no.5
    • /
    • pp.217-223
    • /
    • 2023
  • In this paper, we propose a new model for the conditional generation of music that considers key and rhythm, fundamental elements of music. MIDI sheet music is converted into WAV format, which is then transformed into a mel spectrogram using the Short-Time Fourier Transform (STFT). From this representation, key and rhythm are classified by two Convolutional Neural Networks (CNNs), and this information is then fed into the Music Transformer. The key and rhythm details are combined by differentially multiplying their weights with the embedding vectors of the MIDI events. Several experiments are conducted, including a procedure for determining the optimal weights. This research represents a new effort to integrate essential elements into music generation; it explains the detailed structure and operating principles of the model and verifies its effectiveness and potential through experiments. In this study, the accuracy of rhythm classification reached 94.7%, the accuracy of key classification reached 92.1%, and the negative log-likelihood obtained with the weighted embedding vectors was 3.01.
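
A hedged PyTorch sketch of one plausible reading of the conditioning step: key and rhythm class embeddings are scaled by separate weights and multiplied into the MIDI-event embeddings before the Music Transformer. All dimensions and weight values here are illustrative, not the paper's.

    import torch
    import torch.nn as nn

    d_model, n_keys, n_rhythms, vocab = 256, 24, 4, 512  # hypothetical sizes

    event_emb  = nn.Embedding(vocab, d_model)
    key_emb    = nn.Embedding(n_keys, d_model)
    rhythm_emb = nn.Embedding(n_rhythms, d_model)
    w_key, w_rhythm = 0.3, 0.2  # stand-ins for the tuned weights

    def condition(events, key, rhythm):
        x = event_emb(events)                                 # (batch, seq, d_model)
        cond = w_key * key_emb(key) + w_rhythm * rhythm_emb(rhythm)
        return x * (1.0 + cond.unsqueeze(1))                  # conditioned input to the Music Transformer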

Study on the Performance of Spectral Contrast MFCC for Musical Genre Classification (스펙트럼 대비 MFCC 특징의 음악 장르 분류 성능 분석)

  • Seo, Jin-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.4
    • /
    • pp.265-269
    • /
    • 2010
  • This paper proposes a novel spectral audio feature, the spectral contrast MFCC (SCMFCC), and studies its performance in musical genre classification. For a successful musical genre classifier, extracting features that allow direct access to the relevant genre-specific information is crucial. In this regard, features based on spectral contrast, which represents the relative distribution of the harmonic and non-harmonic components, have received increased attention. The proposed SCMFCC feature utilizes the spectral contrast on the mel-frequency cepstrum, thereby adapting the conventional MFCC in a way more relevant for musical genre classification. By performing classification tests on a widely used music DB, we compare the performance of the proposed feature with that of previous ones.
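
A rough sketch of the SCMFCC idea under stated assumptions: measure spectral contrast (local peak minus valley) across neighbouring mel bands, then decorrelate with a DCT as in MFCC. The band window and coefficient count are illustrative; the paper's exact formulation may differ.

    import numpy as np
    import librosa
    from scipy.fftpack import dct
    from scipy.ndimage import maximum_filter1d, minimum_filter1d

    def scmfcc(y, sr, n_mels=40, n_coef=13, window=5):
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        log_mel = np.log(mel + 1e-10)
        # Per-frame contrast: local peak minus local valley over `window` mel bands
        peaks   = maximum_filter1d(log_mel, size=window, axis=0)
        valleys = minimum_filter1d(log_mel, size=window, axis=0)
        contrast = peaks - valleys
        # DCT along the band axis, keeping the first n_coef coefficients (as in MFCC)
        return dct(contrast, type=2, axis=0, norm="ortho")[:n_coef]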

New Temporal Features for Cardiac Disorder Classification by Heart Sound (심음 기반의 심장질환 분류를 위한 새로운 시간영역 특징)

  • Kwak, Chul;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.2
    • /
    • pp.133-140
    • /
    • 2010
  • We improve the performance of cardiac disorder classification by adding new temporal features extracted from continuous heart sound signals. We add three kinds of novel temporal features to a conventional feature set based on mel-frequency cepstral coefficients (MFCC): the heart sound envelope, murmur probabilities, and murmur amplitude variation. In cardiac disorder classification and detection experiments, we evaluate the contribution of the proposed features to classification accuracy and select the proper temporal features using the sequential feature selection method. The selected features are shown to improve classification accuracy significantly and consistently for neural-network-based pattern classifiers such as the multi-layer perceptron (MLP), support vector machine (SVM), and extreme learning machine (ELM).
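
A minimal sketch of one of the proposed temporal features, the heart sound envelope, extracted here with a Hilbert transform followed by low-pass smoothing; this is a common envelope estimator assumed for illustration, not necessarily the paper's exact recipe.

    import numpy as np
    from scipy.signal import hilbert, firwin, filtfilt

    def heart_sound_envelope(x, fs, cutoff_hz=20.0):
        env = np.abs(hilbert(x))                          # instantaneous amplitude of the analytic signal
        lp = firwin(numtaps=101, cutoff=cutoff_hz, fs=fs) # low-pass FIR to smooth the envelope
        return filtfilt(lp, [1.0], env)                   # zero-phase smoothed envelope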

Design of Experiments for Enhanced Catalytic Activity: Cu-Embedded Covalent Organic Frameworks in 4-Nitrophenol Reduction

  • Sangmin Lee;Kye Sang Yoo
    • Applied Chemistry for Engineering
    • /
    • v.35 no.4
    • /
    • pp.346-351
    • /
    • 2024
  • Chemical reduction using catalysts and NaBH4 presents a promising approach for reducing 4-nitrophenol contamination while generating valuable byproducts. Covalent organic frameworks (COFs) have emerged as a versatile platform for supporting catalysts due to their unique properties, such as high surface area and tunable pore structures. This study employs design of experiments (DOE) to systematically optimize the synthesis of Cu-embedded COF (Cu/COF) catalysts for the reduction of 4-nitrophenol. Through a series of experimental designs, including definitive screening, a mixture design, and a central composite design, the main synthesis parameters influencing Cu/COF formation were identified and optimized: MEL:TPA:DMSO = 0.31:0.36:0.33. Furthermore, the optimal synthesis temperature and time were predicted to be 195 ℃ and 14.7 h. Statistical analyses reveal significant factors affecting Cu/COF synthesis, facilitating the development of tailored nanostructures with enhanced catalytic performance. The catalytic efficacy of the optimized Cu/COF materials was evaluated in the reduction of 4-nitrophenol, demonstrating promising results in line with the predictions from DOE.
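
A hedged sketch of the response-surface step implied by this DOE workflow: fit a quadratic model to (temperature, time) runs and locate the predicted optimum. The run data below are invented placeholders, not the paper's measurements.

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression

    X = np.array([[180, 10], [180, 18], [210, 10], [210, 18], [195, 14]])  # (temp in deg C, time in h)
    y = np.array([0.71, 0.78, 0.76, 0.74, 0.85])  # hypothetical responses (e.g., conversion)

    poly = PolynomialFeatures(degree=2)           # quadratic response surface
    model = LinearRegression().fit(poly.fit_transform(X), y)

    # Search the fitted surface over the design region for its maximum
    grid = np.array([[t, h] for t in np.linspace(180, 210, 61)
                             for h in np.linspace(10, 18, 41)])
    best = grid[np.argmax(model.predict(poly.transform(grid)))]
    print(best)  # predicted optimal (temperature, time)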

Cytotoxicity of a Novel Biphenolic Compound, Bis(2-hydroxy-3-tert-butyl-5-methylphenyl)methane against Human Tumor Cells In vitro

  • Choi, Sang-Un;Kim, Kwang-Hee;Kim, Nam-Young;Choi, Eun-Jung;Lee, Chong-Ock;Son, Kwang-Hee;Kim, Sung-Uk;Bok, Song-Hae;Kim, Young-Kook
    • Archives of Pharmacal Research
    • /
    • v.19 no.4
    • /
    • pp.286-291
    • /
    • 1996
  • Phenolic compounds are prevalent as toxins or environmental pollutants, but they are also widely used as drugs for various purposes, including anticancer agents. A novel biphenolic compound, bis(2-hydroxy-3-tert-butyl-5-methylphenyl)methane (GERI-BPO02-A), was previously isolated from the fermentation broth of Aspergillus fumigatus F93, and it has shown cytotoxicity against human solid tumor cells. Its effective doses causing 50% inhibition of cell growth in vitro against the non-small cell lung cancer cell line A549, the ovarian cancer cell line SK-OV-3, the skin cancer cell line SK-MEL-2, and the central nervous system cancer cell line XF498 were 8.24, 10.60, 8.83, and 9.85 μg/ml, respectively. GERI-BPO02-A also showed cytotoxicity against the P-glycoprotein-expressing human colon cancer cell line HCT15 and its multidrug-resistant subline HCT15/CL02, and its cytotoxicity was not affected by P-glycoprotein. To study the structure-activity relationship, we also tested the cytotoxicities of compounds structurally related to GERI-BPO02-A, such as diphenylmethane, 1,1-bis(3,4-dimethylphenyl)ethane, 2,2-diphenylpropane, 2-benzylpyridine, 3-benzylpyridine, 4,4'-di-tert-butylphenyl, bibenzyl, 2,2'-dimethylbibenzyl, cis-stilbene, trans-stilbene, 3-tert-butyl-4-hydroxy-5-methylphenyl sulfide, sulfadiazine, and sulfisomidine; from these data we could infer that the hydroxyl group of GERI-BPO02-A plays an important role in its cytotoxicity.

Sound event detection based on multi-channel multi-scale neural networks for home monitoring system used by the hard-of-hearing (청각 장애인용 홈 모니터링 시스템을 위한 다채널 다중 스케일 신경망 기반의 사운드 이벤트 검출)

  • Lee, Gi Yong;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.6
    • /
    • pp.600-605
    • /
    • 2020
  • In this paper, we propose a sound event detection method using multi-channel multi-scale neural networks for a sound-sensing home monitoring system for the hearing impaired. In the proposed system, the two channels with the highest signal quality are selected from several wireless microphone sensors in the home. Three features (the time difference of arrival, the pitch range, and the outputs obtained by applying a multi-scale convolutional neural network to the log mel spectrogram) extracted from the sensor signals are fed to a classifier based on a bidirectional gated recurrent neural network to further improve sound event detection performance. The detected sound event is converted into text, together with the sensor position of the selected channel, and provided to the hearing impaired. Experimental results show that the proposed system's sound event detection method is superior to the existing method and can effectively deliver sound information to the hearing impaired.
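
A hedged numpy sketch of the time-difference-of-arrival feature mentioned above, using GCC-PHAT, a standard TDOA estimator assumed here for illustration (the paper does not specify its exact method).

    import numpy as np

    def gcc_phat_tdoa(sig, ref, fs, max_tau=None):
        # Cross-power spectrum with PHAT weighting (phase transform)
        n = sig.shape[0] + ref.shape[0]
        cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
        cross /= np.abs(cross) + 1e-15       # keep phase information only
        cc = np.fft.irfft(cross, n=n)
        max_shift = n // 2 if max_tau is None else int(fs * max_tau)
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        shift = np.argmax(np.abs(cc)) - max_shift
        return shift / float(fs)             # inter-channel delay in seconds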