• Title/Summary/Keyword: Vocal Detection

Search Result 34, Processing Time 0.024 seconds

A Karaoke system based on the vocal characteristics (음성 특성을 고려한 가라오케 시스템)

  • Kim, Yu-Seung;Kim, Rin-Chul
    • Journal of Broadcast Engineering / v.13 no.3 / pp.380-387 / 2008
  • This paper presents a karaoke system that employs a vocal region detection algorithm based on vocal characteristics. In the proposed system, an input song is classified into vocal and instrumental regions using the vocal region detection algorithm, and a vocal removal method is then applied only to the vocal regions. To detect vocal regions, a classification algorithm is designed based on the vocal characteristics in the TICFT (twice iterated composite Fourier transform) domain. For vocal removal, vocal components are extracted from a band-pass-filtered vocal region and subtracted from the original song, yielding a vocal-removed song. The performance of the proposed method is measured on four different songs.

Vocal Enhancement for Improving the Performance of Vocal Pitch Detection (보컬 피치 검출의 성능 향상을 위한 보컬 강화 기술)

  • Lee, Se-Won;Song, Chai-Jong;Lee, Seok-Pil;Park, Ho-Chong
    • The Journal of the Acoustical Society of Korea / v.30 no.6 / pp.353-359 / 2011
  • This paper proposes a vocal enhancement technique for improving the performance of vocal pitch detection in polyphonic music signals. The proposed technique predicts an accompaniment signal from the input signal and generates an accompaniment replica signal according to the vocal power. It then removes the accompaniment replica signal from the input signal, resulting in a vocal-enhanced signal. The performance of the proposed method was measured by applying the same vocal pitch extraction method to the original and the vocal-enhanced signals; the vocal pitch detection accuracy increased by 7.1 percentage points on average.

Vocal Effort Detection Based on Spectral Information Entropy Feature and Model Fusion

  • Chao, Hao;Lu, Bao-Yun;Liu, Yong-Li;Zhi, Hui-Lai
    • Journal of Information Processing Systems / v.14 no.1 / pp.218-227 / 2018
  • Vocal effort detection is important for both robust speech recognition and speaker recognition. In this paper, a spectral information entropy feature, which contains more salient information regarding the vocal effort level, is first proposed. Then, a model fusion method based on complementary models is presented to recognize the vocal effort level. Experiments are conducted on an isolated-word test set, and the results show that the spectral information entropy performs best among the three kinds of features. Meanwhile, the recognition accuracy over all vocal effort levels reaches 81.6%, which demonstrates the potential of the proposed method.
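
As a rough illustration of what a spectral entropy feature measures, the sketch below computes the Shannon entropy of one frame's normalized power spectrum. The abstract does not give the paper's exact feature definition, so this generic formulation, the naive DFT, and the names `power_spectrum` and `spectral_entropy` are assumptions made for illustration only.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum over the non-negative frequency bins."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy (in bits) of one frame's normalized power spectrum."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: entropy is defined as zero here (assumption)
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# Hypothetical test frames: a pure tone and a single impulse.
N = 64
tone = [math.sin(2 * math.pi * 8 * t / N) for t in range(N)]
impulse = [1.0] + [0.0] * (N - 1)
tone_entropy = spectral_entropy(tone)
impulse_entropy = spectral_entropy(impulse)
```

The tone concentrates its energy in one bin, so its entropy is close to zero, while the impulse spreads energy evenly over all 33 bins and reaches the maximum of log2(33) bits. A real feature extractor would use an FFT and windowed frames rather than this quadratic-time DFT.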

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

  • You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.2 / pp.729-748 / 2021
  • Vocal detection is one of the fundamental steps in music information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to further improve detection accuracy by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from 91.8% to 94.1% compared with the existing literature. Because the baseline accuracy on this dataset already exceeds 90%, the improvement of 2.3 percentage points is difficult to achieve and therefore valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote over seven proposed models, which increases the accuracy by about 1.1% on the Jamendo dataset.
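
The majority-vote ensemble mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes each model emits one binary label per frame (1 = vocal, 0 = non-vocal); the function name `majority_vote` and the sample votes are hypothetical and are not taken from the paper.

```python
def majority_vote(predictions):
    """Combine per-model binary labels (1 = vocal, 0 = non-vocal) frame by frame.

    `predictions` is a list of equal-length label lists, one per model.
    An odd number of models guarantees that no frame ends in a tie.
    """
    if len(predictions) % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    n_models = len(predictions)
    return [1 if 2 * sum(frame_votes) > n_models else 0
            for frame_votes in zip(*predictions)]


# Hypothetical outputs of seven models on four frames.
votes = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
combined = majority_vote(votes)
```

Here `combined` is `[1, 0, 1, 0]`: the first and third frames receive five and four vocal votes out of seven, while the other two frames receive only two each.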

A Clinical Study on 197 Cases of Vocal Cord Paralysis (성대마비 197례에 대한 임상적 고찰)

  • Park, Young-Hak;Choi, Ji-Young;Jung, Hyun-Chul;Lee, Seok-Eun;Kim, Min-Sik;Cho, Seung-Ho
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.17 no.2 / pp.138-142 / 2006
  • Objectives: Vocal cord paralysis (VCP) is a complex disorder that may result from numerous causes. We reviewed and analyzed trends in the clinical characteristics and causes of VCP in Korean patients. Methods: A total of 197 patients with VCP who visited St. Mary's Hospital from March 2000 to August 2006 were reviewed retrospectively. They were analyzed according to sex, age, cause of VCP, position of the paralyzed vocal fold, and treatment method. Results: The male-to-female ratio was 1.6:1. In unilateral cases, the paralyzed vocal fold was fixed in the paramedian position in 84% of the cases. The left vocal fold was paralyzed about 2.5 times as often as the right vocal fold. Among the causes of VCP, 30.9% of the cases were due to postoperative paralysis, most of which developed after lung or mediastinal surgery. Laryngeal EMG was performed in 47 patients to determine the prognosis and treatment method. Among patients with unilateral VCP, 90 were treated with injection laryngoplasty and 21 underwent type I thyroplasty. Conclusion: VCP has various causes, and many potentially fatal diseases are among them, so detection of the primary disease is very important; late detection can cause serious problems. VCP is not only a disease entity in itself but can also be a sign of an underlying disease.


A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting (음성 하모닉스 스펙트럼의 피크-피팅을 이용한 피치검출에 관한 연구)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Speech Sciences / v.10 no.2 / pp.85-95 / 2003
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. If the pitch is detected accurately, it can be used in analysis to obtain the vocal tract parameters properly, in synthesis to change or maintain naturalness and intelligibility easily, and in recognition to remove speaker-specific characteristics for speaker independence. In this paper, we propose a new pitch detection algorithm. First, in the time domain, positive center clipping is performed using the slope of the speech signal in order to emphasize the pitch period with a glottal component from which the vocal tract characteristic has been removed. Next, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal in the frequency domain, and a smoothed formant envelope is obtained from the rough envelope by linear interpolation. The flattened harmonics waveform is then obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Applying the inverse fast Fourier transform (IFFT) to the flattened harmonics finally yields a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With the proposed algorithm, pitch detection accuracy improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.
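
For orientation, the sketch below shows classic symmetric center clipping followed by an autocorrelation (ACF) pitch estimate, i.e. the kind of baseline the abstract compares against. It is not the paper's method: the "positive center clipping" variant and the peak-fitting spectral flattening are not specified in enough detail here to reproduce, and the clipping ratio, search range, and function names are assumptions.

```python
import math


def center_clip(frame, ratio=0.3):
    """Zero out samples below ratio * peak and shift the rest toward zero."""
    limit = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > limit:
            clipped.append(x - limit)
        elif x < -limit:
            clipped.append(x + limit)
        else:
            clipped.append(0.0)
    return clipped


def acf_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate F0 (Hz) as the lag that maximizes the autocorrelation."""
    clipped = center_clip(frame)
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)

    def autocorr(lag):
        return sum(clipped[t] * clipped[t + lag]
                   for t in range(len(clipped) - lag))

    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return sample_rate / best_lag


# Hypothetical voiced frame: 200 Hz fundamental plus one harmonic at 8 kHz.
RATE = 8000
frame = [math.sin(2 * math.pi * 200 * t / RATE)
         + 0.5 * math.sin(2 * math.pi * 400 * t / RATE)
         for t in range(400)]
f0 = acf_pitch(frame, RATE)
```

For this synthetic frame the autocorrelation peaks at a lag of 40 samples, so `f0` comes out as 200.0 Hz. Center clipping suppresses the low-amplitude parts of the waveform so that the pitch-period peak stands out from formant structure in the autocorrelation.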


Vocal Separation in Music Using SVM and Selective Frequency Subtraction (SVM과 선택적 주파수 차감법을 이용한 음악에서의 보컬 분리)

  • Kim, Hyun-Tae
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.10 no.1 / pp.1-6 / 2015
  • Recently, with growing interest in karaoke machines that use original recordings, manufacturers of MIDI-type karaoke machines have been seeking cheaper alternatives to the original recording method. One such method produces the original accompaniment by removing only the singer's voice from the singer's album. In this paper, a system that separates vocal components from the music accompaniment in stereo recordings is proposed. The proposed system consists of two stages. The first stage is vocal detection, which classifies the input into vocal and non-vocal portions using an SVM with MFCC features. In the second stage, selective frequency subtraction is performed at each frequency bin in the vocal portions. Listening tests on the vocal-removed music produced by the proposed system show a relatively high level of satisfaction.

Flattening Techniques for Pitch Detection (피치 검출을 위한 스펙트럼 평탄화 기법)

  • Kim, Jong-Kuk;Jo, Wang-Rae;Bae, Myung-Jin
    • Proceedings of the IEEK Conference / 2002.06d / pp.381-384 / 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, detecting pitch from a speech signal is very difficult because of the effects of formants and transition amplitudes. Therefore, in this paper, we propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the effects of formants and transition amplitudes. In the time domain, positive center clipping is performed in order to emphasize the pitch period with a glottal component from which the vocal tract characteristic has been removed. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. As a result, we obtain the flattened harmonics waveform as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With the proposed algorithm, pitch detection accuracy improved, and the gross error rate was reduced in voiced speech regions and in transition regions where the phoneme changes.


Detection of Glottal Closure Instant for Voiced Speech Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 성문폐쇄시점 검출)

  • Bae, Keun-Sung
    • Speech Sciences / v.7 no.3 / pp.153-165 / 2000
  • During the phonation of voiced sounds, there are instants at which the glottis opens or closes because of the periodic vibration of the vocal cords. The instant at which the glottis closes is called the glottal closure instant (GCI) or epoch. Correct detection of the GCI is one of the important problems in speech processing for pitch detection, pitch-synchronous analysis, and so on. Recently, it has been shown that the local maxima of the wavelet-transformed speech signal correspond to the GCIs of the speech signal. In this paper, we investigate the accuracy of the GCIs estimated from this wavelet-transformed speech signal. For this purpose, we compare them with the negative peaks of the differentiated EGG signal, which represent the actual GCIs of the speech signal.


Performance Assessment of Several Established Pitch Detection Algorithms in Voices of Benign Vocal Fold Lesions (양성후두 질환 음성에 대한 여러 기존 피치검출 알고리즘의 성능 평가)

  • Jang, Seung-Jin;Choi, Seong-Hee;Kim, Hyo-Min;Choi, Hong-Shik;Yoon, Young-Ro
    • Proceedings of the IEEK Conference / 2007.07a / pp.407-408 / 2007
  • Robust pitch estimation is an important topic in many areas of speech processing. In voice pathology, diverse statistics extracted from pitch are commonly used to assess voice quality. In this study, we compared several established pitch detection algorithms (PDAs) to verify their adequacy. Using a database of 99 pathological voices and 30 normal voices, pitch detection errors were compared between pathological and normal voices and among types of pathological voices with benign vocal fold lesions: polyps, nodules, and cysts. Consequently, the severity of the tested voice needs to be assessed in order to obtain accurate pitch estimates.
