Title/Summary/Keyword: Audio Feature Extraction

Korean Traditional Music Genre Classification Using Sample and MIDI Phrases

  • Lee, JongSeol;Lee, MyeongChun;Jang, Dalwon;Yoon, Kyoungro
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.4 / pp.1869-1886 / 2018
  • This paper proposes a MIDI- and audio-based music genre classification method for Korean traditional music. There are many traditional instruments in Korea, and most of the traditional songs played on these instruments have similar patterns and rhythms. Although music information processing tasks such as music genre classification and audio melody extraction have been studied, most studies have focused on pop, jazz, rock, and other universal genres; there are few studies on Korean traditional music because of the lack of datasets. This paper analyzes raw audio and MIDI phrases in Korean traditional music performed on Korean traditional musical instruments. The samples and MIDI phrases classified by our system will be used to construct a database or to implement our Kontakt-based instrument library; thus, we can build a management system for a Korean traditional music library on top of this classification system. Appropriate feature sets for raw audio and MIDI phrases are proposed, and the classification results, based on machine learning algorithms such as support vector machine, multi-layer perceptron, decision tree, and random forest, are outlined in this paper.
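
As a rough illustration of the pipeline this abstract describes (a handcrafted audio feature set fed to the four listed classifiers), here is a minimal Python sketch; the specific features, libraries (librosa, scikit-learn), and hyperparameters are assumptions, not the authors' published configuration:

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def audio_feature_vector(path: str) -> np.ndarray:
    """Summarize one clip with MFCC and spectral statistics (assumed feature set)."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),
        centroid.mean(axis=1), zcr.mean(axis=1),
    ])

# X (n_clips x n_features) and y (genre labels) are assumed to exist.
classifiers = {
    "SVM": SVC(kernel="rbf"),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(n_estimators=200),
}
# for name, clf in classifiers.items():
#     print(name, cross_val_score(clf, X, y, cv=5).mean())
```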

Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min;Tang, Jun
    • Journal of Information Processing Systems / v.17 no.4 / pp.754-771 / 2021
  • In continuous-dimension emotion recognition, the parts that highlight emotional expression are not the same in each modality, and the influence of each modality on the emotional state also differs. This paper therefore studies the fusion of the two most important modalities in emotion recognition (voice and facial expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, audio features are first extracted using prior knowledge. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function is used to mitigate the missing-modality problem, improving the robustness of the model and the performance of emotion recognition. The experimental results show that the concordance correlation coefficients of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
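
The fusion step can be pictured with a small sketch. The PyTorch module below is a generic attention fusion over two modality embeddings; the dimensions, projection layers, and two-unit output head are assumptions, and it is not the paper's improved-AlexNet pipeline:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Weights each modality's embedding before fusing, so the more
    informative modality dominates the fused representation."""
    def __init__(self, audio_dim: int, visual_dim: int, fused_dim: int = 128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.attn = nn.Linear(fused_dim, 1)
        self.head = nn.Linear(fused_dim, 2)  # arousal, valence

    def forward(self, audio_feat, visual_feat):
        a = torch.tanh(self.audio_proj(audio_feat))    # (B, fused_dim)
        v = torch.tanh(self.visual_proj(visual_feat))  # (B, fused_dim)
        stacked = torch.stack([a, v], dim=1)           # (B, 2, fused_dim)
        weights = torch.softmax(self.attn(stacked), dim=1)  # (B, 2, 1)
        fused = (weights * stacked).sum(dim=1)         # (B, fused_dim)
        return self.head(fused)

model = AttentionFusion(audio_dim=88, visual_dim=256)
out = model(torch.randn(4, 88), torch.randn(4, 256))   # -> (4, 2)
```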

A New Tempo Feature Extraction Based on Modulation Spectrum Analysis for Music Information Retrieval Tasks

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.6 no.2 / pp.95-106 / 2007
  • This paper proposes an effective tempo feature extraction method for music information retrieval. The tempo information is modeled by the narrow-band temporal modulation components, which are decomposed into a modulation spectrum via joint frequency analysis. In implementation, the tempo feature is extracted directly from the modified discrete cosine transform coefficients, which are the output of a partial MP3 (MPEG-1 Layer 3) decoder. Different features are then extracted from the amplitudes of the modulation spectrum and applied to different music information retrieval tasks. The logarithmic-scale modulation frequency coefficients are employed in automatic music emotion classification and music genre classification, and the classification precision in both systems is improved significantly. The bit vectors derived from the adaptive modulation spectrum are used in an audio fingerprinting task, where they prove able to achieve high robustness. The experimental results in these tasks validate the effectiveness of the proposed tempo feature.
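
A modulation spectrum can be sketched as a second Fourier transform along the time axis of a time-frequency representation. The sketch below starts from an STFT rather than from the partial-MP3-decoder MDCT coefficients the paper uses, which is an assumption made to keep the example self-contained:

```python
import numpy as np
import librosa

def modulation_spectrum(y, sr, n_fft=1024, hop=512):
    """Joint frequency analysis: an FFT along time for each acoustic band
    yields energy as a function of modulation frequency (tempo)."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))  # (freq, time)
    S = S - S.mean(axis=1, keepdims=True)     # remove DC per acoustic band
    M = np.abs(np.fft.rfft(S, axis=1))        # modulation spectrum amplitudes
    frame_rate = sr / hop                     # spectrogram frames per second
    mod_freqs = np.fft.rfftfreq(S.shape[1], d=1.0 / frame_rate)  # in Hz
    return M, mod_freqs

# Tempo-related energy lives at low modulation frequencies (roughly 0.5-8 Hz);
# the paper derives logarithmic-scale modulation frequency coefficients from it.
```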

A Study on the Extraction of Specific Audio Feature In Basketball Video (농구 비디오에서 특정 음성 특징 추출에 관한 연구)

  • 공현장;김원필;김판구
    • Proceedings of the Korea Multimedia Society Conference / 2002.05d / pp.1075-1080 / 2002
  • Recently, research on the classification of audio and visual content in multimedia information systems has been active. This paper proposes a method for extracting specific audio information from basketball game video data and using it to detect important events in the game. The locations of important events can be predicted from crowd cheering detected using a combination of MFCC features and LPC entropy. By separating the crowd's cheering from the various other sounds of a basketball game and using it to detect important events in basketball video data, highly effective results were obtained.
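
A hedged sketch of such clip-level features: the paper combines MFCCs with LPC entropy, while the stand-in below substitutes per-frame spectral entropy for the LPC term, so that part is an assumption:

```python
import numpy as np
import librosa

def clip_features(y, sr):
    """MFCC statistics plus a spectral-entropy measure; crowd cheering tends
    to be noise-like (high entropy) compared with whistles or commentary."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    spec = np.abs(librosa.stft(y, n_fft=1024)) ** 2
    p = spec / (spec.sum(axis=0, keepdims=True) + 1e-12)  # per-frame pmf
    entropy = -(p * np.log2(p + 1e-12)).sum(axis=0)       # bits per frame
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [entropy.mean(), entropy.std()]])

# A detector can then threshold or classify these vectors to locate
# cheering segments, and hence candidate highlight events.
```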

Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

  • Seung-min Kim;So-hee Park;Dae-seon Choi
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.3 / pp.439-449 / 2024
  • Recent rapid advancements in voice generation technology have enabled the natural synthesis of voices from text alone. However, this progress has led to an increase in malicious activities, such as voice phishing (vishing), where generated voices are exploited for criminal purposes. Numerous models have been developed to detect the presence of synthesized voices, typically by extracting features from the voice and using them to estimate the likelihood that the voice was generated. This paper proposes a new model for extracting voice features to address misuse cases arising from generated voices. It utilizes a deep-learning-based audio codec model and the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed feature extraction model for voice detection, four generated-voice detection models were created using the extracted features, and performance evaluations were conducted. For comparison, three voice detection models based on Deepfeature, proposed in previous studies, were evaluated in terms of accuracy and equal error rate (EER). The model proposed in this paper achieved an accuracy of 88.08% and a low EER of 11.79%, outperforming the existing models. These results confirm that the voice feature extraction method introduced here can be an effective tool for distinguishing generated from real voices.
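
Independent of the codec/BERT feature extractor, the detection head and the EER metric can be sketched as follows; the logistic-regression classifier is a placeholder, not one of the paper's four detection models:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """EER: the operating point where false-accept and false-reject rates meet."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

# X_*: per-utterance feature vectors (whatever the extractor emits),
# y_*: 1 = generated voice, 0 = real voice -- all assumed to exist.
# clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# scores = clf.predict_proba(X_test)[:, 1]
# print(f"EER: {equal_error_rate(y_test, scores):.4f}")
```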

Prediction of Closed Quotient During Vocal Phonation using GRU-type Neural Network with Audio Signals

  • Hyeonbin Han;Keun Young Lee;Seong-Yoon Shin;Yoseup Kim;Gwanghyun Jo;Jihoon Park;Young-Min Kim
    • Journal of information and communication convergence engineering / v.22 no.2 / pp.145-152 / 2024
  • Closed quotient (CQ) represents the time ratio for which the vocal folds remain in contact during voice production. Because CQ values serve as an important reference point in vocal training for professional singers, they have been measured mechanically or electrically, either by inverse filtering of airflows captured by a circumferentially vented mask or by post-processing of electroglottography waveforms. In this study, we introduce a novel algorithm to predict CQ values from audio signals alone, eliminating the need for mechanical or electrical measurement. Our algorithm is based on a gated recurrent unit (GRU)-type neural network. To enhance efficiency, we pre-process the audio signal with a pitch feature extraction algorithm; a GRU-type network then extracts features, followed by a dense layer for the final prediction. The mean squared error between the predicted and measured CQ values demonstrates the capability of the proposed algorithm to predict CQ.
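
The GRU-plus-dense architecture maps onto a few lines of PyTorch; the input dimensionality and hidden size below are assumptions:

```python
import torch
import torch.nn as nn

class CQRegressor(nn.Module):
    """GRU over a per-frame pitch-feature sequence, dense head for CQ."""
    def __init__(self, input_dim: int = 1, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                     # x: (batch, frames, input_dim)
        _, h = self.gru(x)                    # h: (1, batch, hidden)
        return self.head(h[-1]).squeeze(-1)   # one predicted CQ per clip

model = CQRegressor()
loss_fn = nn.MSELoss()  # the paper reports MSE against measured CQ
pred = model(torch.randn(8, 200, 1))          # 8 clips, 200 pitch frames
```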

Feature Extraction and Selection for Emotion Classification of inter-persons (개인 내 정서판별을 위한 특징 추출 및 선택)

  • Yang, Heui-Kyung;Lee, Jeong-Whan;Lee, Young-Jae;Lee, Pil-Jae;Sohn, Jin-Hun;Heo, Jun-Hyoung
    • Proceedings of the KIEE Conference / 2011.07a / pp.1970-1971 / 2011
  • An approach currently under active study in emotion recognition is recognizing human emotions through various physiological signals. Psychophysiological research has shown a strong correlation between human emotional states and physiological responses. Physiological signals can be acquired relatively easily with sensors, and emotion recognition based on them has recently drawn attention because it is less sensitive to social and cultural differences. In this study, four emotions (joy, anger, surprise, and stress) were induced with audio-visual film clip stimuli, and physiological signals were measured as the responses. Features were extracted from the physiological signals, and feature reduction was performed with principal component analysis (PCA). Classifying the four emotions with Gaussian process classifiers over nine subjects yielded an average emotion classification rate of 64.85% (range 57.14-70.0%).
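
The PCA-plus-Gaussian-process stage maps directly onto scikit-learn; the number of retained components is an assumption:

```python
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: physiological feature vectors, y: labels for the four induced
# emotions (joy, anger, surprise, stress) -- both assumed to exist.
pipe = make_pipeline(PCA(n_components=10), GaussianProcessClassifier())
# scores = cross_val_score(pipe, X, y, cv=5)
# print(f"mean accuracy: {scores.mean():.3f}")
```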

Speech Emotion Recognition with SVM, KNN and DSVM

  • Hadhami Aouani ;Yassine Ben Ayed
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.40-48 / 2023
  • Speech emotion recognition has become an active research theme in speech processing and in applications based on human-machine interaction. In this work, our system is a two-stage approach consisting of feature extraction and a classification engine. First, two feature sets are investigated: one extracts only 13 Mel-frequency cepstral coefficients (MFCC) from emotional speech samples, and the other fuses the MFCC features with three additional features: zero crossing rate (ZCR), Teager energy operator (TEO), and harmonic-to-noise ratio (HNR). Second, we use two classification techniques, support vector machines (SVM) and k-nearest neighbors (k-NN), and compare their performance. We also investigate the importance of recent advances in machine learning, including deep kernel learning. A large set of experiments is conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. The experimental results show good accuracy compared with previous studies.
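
A sketch of the fused feature set and the two classical classifiers; HNR is omitted here (an assumption for brevity) and the Teager energy operator is reduced to its mean over the signal:

```python
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def fused_features(y, sr):
    """MFCC fused with ZCR and a mean Teager energy term."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    zcr = librosa.feature.zero_crossing_rate(y).mean()
    # Discrete Teager energy operator: x(n)^2 - x(n-1) * x(n+1)
    teo = np.mean(y[1:-1] ** 2 - y[:-2] * y[2:])
    return np.concatenate([mfcc, [zcr, teo]])

# Fit both classifiers on (X, y) built from fused_features per sample.
models = {"SVM": SVC(kernel="rbf"), "kNN": KNeighborsClassifier(5)}
```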

Similar Movie Contents Retrieval Using Peak Features from Audio (오디오의 Peak 특징을 이용한 동일 영화 콘텐츠 검색)

  • Chung, Myoung-Bum;Sung, Bo-Kyung;Ko, Il-Ju
    • Journal of Korea Multimedia Society / v.12 no.11 / pp.1572-1580 / 2009
  • Combing through entire video files to recognize and retrieve matching movies requires much time and memory. Instead, most current movie-matching methods analyze only part of each movie's video-image information. Yet these methods share a critical problem: matching videos that have merely been altered in resolution or converted with a different codec are erroneously recognized as different. This paper proposes an audio-information-based search algorithm by which similar movies can be identified. The proposed method builds and searches a database of each movie's spectral peak information, which remains relatively stable under changes in bit-rate, codec, or sample-rate. The method showed a 92.1% search success rate on a set of 1,000 video files whose audio bit-rate had been altered or which had been deliberately re-encoded with a different codec.
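
Peak-based fingerprinting can be sketched as taking the strongest spectral bin per frame; the paper's exact peak definition and matching rule may differ from this assumption:

```python
import numpy as np
import librosa

def peak_fingerprint(y, sr, n_fft=2048, hop=1024):
    """Index of the strongest bin per frame; peak positions survive
    bit-rate, codec, and sample-rate changes better than raw spectra."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    return S.argmax(axis=0).astype(np.int32)   # one peak bin per frame

def similarity(fp_a, fp_b):
    """Fraction of frames whose peak bins agree (assumed matching rule)."""
    n = min(len(fp_a), len(fp_b))
    return np.mean(fp_a[:n] == fp_b[:n])
```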

Shooting sound analysis using convolutional neural networks and long short-term memory (합성곱 신경망과 장단기 메모리를 이용한 사격음 분석 기법)

  • Kang, Se Hyeok;Cho, Ji Woong
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.312-318 / 2022
  • This paper proposes a deep neural network model that classifies the gun type and sound source location information. The proposed classification model is composed of convolutional neural networks (CNN) and long short-term memory (LSTM). To train and test the model, we use the Gunshot Audio Forensic Dataset generated by a project supported by the National Institute of Justice (NIJ). The acoustic signals are transformed into mel-spectrograms and provided as training and test data for the proposed model. The model is compared with a control model consisting of convolutional neural networks only. The proposed model achieves an accuracy of more than 90 %.
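
A minimal CNN-LSTM over mel-spectrogram input, with the class count and layer sizes as assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """CNN summarizes each mel-spectrogram column block; the LSTM models
    the temporal order of those summaries; a dense head classifies."""
    def __init__(self, n_mels: int = 64, n_classes: int = 6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                  # pool frequency axis only
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.lstm = nn.LSTM(32 * (n_mels // 4), 64, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (B, 1, n_mels, frames)
        z = self.conv(x)                       # (B, 32, n_mels//4, frames)
        z = z.permute(0, 3, 1, 2).flatten(2)   # (B, frames, 32 * n_mels//4)
        _, (h, _) = self.lstm(z)
        return self.head(h[-1])

logits = CNNLSTM()(torch.randn(2, 1, 64, 100))  # 2 clips -> (2, 6)
```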