• Title/Abstract/Keywords: Vocal Detection

34 results (processing time: 0.023 s)

A Karaoke System Based on Vocal Characteristics

  • 김유승;김인철
    • Journal of Broadcast Engineering
    • /
    • Vol. 13, No. 3
    • /
    • pp.380-387
    • /
    • 2008
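  • This paper presents a karaoke system that applies a vocal region detection algorithm based on vocal characteristics. In the proposed system, the vocal region detection algorithm classifies the input music into vocal segments and accompaniment segments. A vocal removal technique is then applied only to the vocal segments. Vocal region detection performs the classification in the TICFT (twice iterated composite Fourier transform) domain, taking the characteristics of vocals into account. For vocal removal, the vocal component is extracted from the band-pass-filtered vocal segments and subtracted from the original music, which yields music with the vocal component removed. The proposed technique is applied to four songs and its performance is evaluated.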

Vocal Enhancement for Improving the Performance of Vocal Pitch Detection

  • 이세원;송재종;이석필;박호종
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 30, No. 6
    • /
    • pp.353-359
    • /
    • 2011
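  • This paper proposes a preprocessing technique that enhances the vocal signal in a music signal in order to improve vocal pitch detection in polyphonic music. The proposed vocal enhancement technique predicts the accompaniment signal from the input polyphonic music signal, then processes the predicted accompaniment to match the level of the input vocal signal, producing an accompaniment replica signal. Finally, the accompaniment replica is removed from the original polyphonic music signal in the frequency domain to produce a vocal-enhanced output signal. The same vocal pitch detection method was applied to the original music signal and to the signal enhanced by the proposed method, and pitch detection accuracy was measured for each. The proposed technique improved pitch detection accuracy by 7.1 percentage points on average.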

Vocal Effort Detection Based on Spectral Information Entropy Feature and Model Fusion

  • Chao, Hao;Lu, Bao-Yun;Liu, Yong-Li;Zhi, Hui-Lai
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 1
    • /
    • pp.218-227
    • /
    • 2018
  • Vocal effort detection is important for both robust speech recognition and speaker recognition. In this paper, a spectral information entropy feature, which contains more salient information about the vocal effort level, is first proposed. Then, a model fusion method based on complementary models is presented to recognize the vocal effort level. Experiments are conducted on an isolated-word test set, and the results show that spectral information entropy performs best among the three kinds of features. Meanwhile, the recognition accuracy over all vocal effort levels reaches 81.6%. Thus, the potential of the proposed method is demonstrated.
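
The abstract does not define its spectral information entropy feature, so the following is only a minimal sketch of one common definition: the Shannon entropy of a frame's normalized power spectrum. The function names, the naive DFT, and the use of the whole half-spectrum (rather than sub-bands) are assumptions made for illustration, not the authors' method.

```python
import cmath
import math


def power_spectrum(frame):
    """Power spectrum for bins 0..N/2 via a naive DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy in bits of the normalized power spectrum of one frame.

    Assumed definition: p_k = P_k / sum(P), H = -sum(p_k * log2(p_k)).
    A silent frame is assigned an entropy of 0.0 by convention.
    """
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)
```

Under this definition, a frame whose energy sits in a single frequency bin scores close to zero, while a frame with a flat spectrum scores close to the maximum, log2 of the number of bins.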

Improvement of Vocal Detection Accuracy Using Convolutional Neural Networks

  • You, Shingchern D.;Liu, Chien-Hung;Lin, Jia-Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 15, No. 2
    • /
    • pp.729-748
    • /
    • 2021
  • Vocal detection is one of the fundamental steps in musical information retrieval. Typically, the detection process consists of feature extraction and classification steps. Recently, neural networks have been shown to outperform traditional classifiers. In this paper, we report our study on how to improve detection accuracy further by carefully choosing the parameters of the deep network model. Through experiments, we conclude that a feature-classifier model is still better than an end-to-end model. The recommended model uses a spectrogram as the input plane, and the classifier is an 18-layer convolutional neural network (CNN). With this arrangement, the proposed model improves the accuracy on the Jamendo dataset from 91.8% in the existing literature to 94.1%. Because accuracy on this dataset already exceeds 90%, the improvement of 2.3 percentage points is difficult to achieve and valuable. If even higher accuracy is required, ensemble learning may be used. The recommended setting is a majority vote over seven of the proposed models. Doing so increases the accuracy by about 1.1% on the Jamendo dataset.
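
The majority vote mentioned in the abstract can be illustrated with a short sketch. It assumes each model emits one binary label per audio frame (1 for vocal, 0 for non-vocal); the function name and the data layout are hypothetical and do not come from the paper.

```python
def majority_vote(model_predictions):
    """Combine per-model binary frame labels (1 = vocal, 0 = non-vocal).

    model_predictions: one list of frame labels per model, all the same length.
    An odd number of models is required so that no frame can end in a tie.
    """
    n_models = len(model_predictions)
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    if len({len(labels) for labels in model_predictions}) != 1:
        raise ValueError("all models must label the same number of frames")
    combined = []
    for frame_labels in zip(*model_predictions):
        # A frame is vocal when more than half of the models say so.
        combined.append(1 if sum(frame_labels) > n_models // 2 else 0)
    return combined


# Seven hypothetical models, three frames each.
predictions = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
]
labels = majority_vote(predictions)
```

With seven voters, a frame needs at least four votes to be labeled vocal.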

A Clinical Study on 197 Cases of Vocal Cord Paralysis

  • 박영학;최지영;정현철;이석은;김민식;조승호
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • Vol. 17, No. 2
    • /
    • pp.138-142
    • /
    • 2006
  • Objectives: Vocal cord paralysis (VCP) is a complex disorder that may result from numerous causes. We reviewed and analyzed trends in the clinical characteristics and causes of VCP in Korean patients. Method: A total of 197 patients with VCP who visited St. Mary's Hospital from March 2000 to August 2006 were reviewed retrospectively. They were analyzed according to sex, age, cause of VCP, position of the paralyzed vocal fold, and treatment method. Results: The male-to-female ratio was 1.6:1. The unilaterally paralyzed vocal fold was fixed at the paramedian position in 84% of the cases. The left vocal fold was paralyzed about 2.5 times as often as the right vocal fold. Among the causes of VCP, 30.9% of the cases were due to postoperative paralysis, most of which developed after lung or mediastinal surgery. Laryngeal EMG was performed in 47 patients to determine the prognosis and treatment method. Among patients with unilateral VCP, 90 were treated with injection laryngoplasty and 21 underwent type I thyroplasty. Conclusion: VCP has many possible causes, and detecting the primary disease is very important because the primary diseases include many fatal ones and late detection can cause serious problems. VCP is not only a disease entity in itself but can also be seen as a sign of an underlying disease.


A Study on the Pitch Detection of Speech Harmonics by the Peak-Fitting

  • 김종국;조왕래;배명진
    • Speech Sciences
    • /
    • Vol. 10, No. 2
    • /
    • pp.85-95
    • /
    • 2003
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. In analysis, an accurately detected pitch makes it possible to obtain the vocal tract parameters properly. In synthesis, it can be used to change or maintain naturalness and intelligibility easily, and in recognition, to remove speaker individuality for speaker independence. In this paper, we propose a new pitch detection algorithm. First, positive center clipping is performed in the time domain using the slope of the speech signal, in order to emphasize the pitch period with a glottal component from which the vocal tract characteristic has been removed. Next, a rough formant envelope is computed by peak-fitting the spectrum of the original speech signal in the frequency domain. A smoothed formant envelope is then obtained from the rough envelope by linear interpolation. The flattened harmonics waveform is obtained as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. An inverse fast Fourier transform (IFFT) is applied to the flattened harmonics, which finally yields a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, pitch detection accuracy improved and the gross error rate was reduced in voiced speech regions and in phoneme transition regions.
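
To make the time-domain step concrete, here is a minimal sketch of classical center clipping followed by an autocorrelation (ACF) pitch estimate. It is not the paper's algorithm: the abstract's "positive center clipping" and peak-fitting flattening are not specified in enough detail to reproduce, so standard two-sided center clipping and a plain ACF peak search are swapped in. The function names, the 30% clipping ratio, and the 60 to 400 Hz search range are all assumptions.

```python
import math


def center_clip(frame, ratio=0.3):
    """Zero every sample within +/- ratio * max|x| and shift the rest toward zero."""
    threshold = ratio * max(abs(x) for x in frame)
    clipped = []
    for x in frame:
        if x > threshold:
            clipped.append(x - threshold)
        elif x < -threshold:
            clipped.append(x + threshold)
        else:
            clipped.append(0.0)
    return clipped


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Return the pitch in Hz at the autocorrelation peak, or None if no peak is found."""
    if not frame:
        return None
    clipped = center_clip(frame)
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(clipped) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation of the clipped frame at this lag.
        value = sum(clipped[t] * clipped[t + lag]
                    for t in range(len(clipped) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None
```

Center clipping suppresses the low-amplitude formant ripple between pitch pulses, so the autocorrelation peak at the pitch period stands out more clearly than it would on the raw frame.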


Vocal Separation in Music Using SVM and Selective Frequency Subtraction

  • 김현태
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • Vol. 10, No. 1
    • /
    • pp.1-6
    • /
    • 2015
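  • As interest in original-sound karaoke machines has grown, cheaper alternatives to expensive direct studio recording are being tried. One concrete approach is to remove only the singer's voice from the singer's music album to produce an original-sound accompaniment track. This paper proposes a system that separates the vocals from stereo-recorded music with accompaniment. The proposed system consists of two stages. The first stage detects vocals: using MFCC features with an SVM, it separates the input signal into vocal and non-vocal segments. The second stage performs selective frequency subtraction on each frequency bin of the vocal segments. In listening tests, music with the vocals removed by the proposed method received relatively high satisfaction ratings.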

Flattening Techniques for Pitch Detection

  • 김종국;조왕래;배명진
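    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • Proceedings of the Institute of Electronics Engineers of Korea 2002 Summer Conference (4)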
    • /
    • pp.381-384
    • /
    • 2002
  • In speech signal processing, accurate pitch detection is very important for speech recognition, synthesis, and analysis. However, detecting pitch from a speech signal is very difficult because of the effects of formants and transition amplitude. In this paper, we therefore propose a pitch detection method using spectrum flattening techniques. Spectrum flattening eliminates the effects of formants and transition amplitude. In the time domain, positive center clipping is performed in order to emphasize the pitch period with a glottal component from which the vocal tract characteristic has been removed. A rough formant envelope is then computed by peak-fitting the spectrum of the original speech signal in the frequency domain. As a result, we get the flattened harmonics waveform as the algebraic difference between the spectrum of the original speech signal and the smoothed formant envelope. Finally, we obtain a residual signal from which the vocal tract component has been removed. The performance was compared with the LPC, cepstrum, and ACF methods. With this algorithm, pitch detection accuracy improved and the gross error rate was reduced in voiced speech regions and in phoneme transition regions.


Detection of Glottal Closure Instant for Voiced Speech Using Wavelet Transform

  • 배건성
    • Speech Sciences
    • /
    • Vol. 7, No. 3
    • /
    • pp.153-165
    • /
    • 2000
  • During the phonation of voiced sounds, there are instants at which the glottis opens or closes because of the periodic vibration of the vocal cords. The instant at which the glottis closes is called the glottal closure instant (GCI) or epoch. Correct detection of the GCI is one of the important problems in speech processing for pitch detection, pitch-synchronous analysis, and so on. Recently, it has been shown that the local maxima of the wavelet-transformed speech signal correspond to the GCIs of the speech signal. In this paper, we investigate the accuracy of the GCIs estimated from this wavelet-transformed speech signal. For this purpose, we compare them with the negative peak points of the differentiated EGG signal, which represent the actual GCIs of the speech signal.
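
The reference GCIs described above, taken as negative peaks of the differentiated EGG signal, can be sketched as follows. This is a minimal illustration under stated assumptions: the first difference stands in for differentiation, and the function name and the 50% threshold ratio are hypothetical choices that do not come from the paper.

```python
def gci_from_egg(egg, threshold_ratio=0.5):
    """Return indices of strong negative peaks in the first difference of an EGG signal.

    Index i refers to the step between samples i and i + 1 of the input.
    Only local minima deeper than threshold_ratio times the deepest negative
    step are kept, which discards small ripples.
    """
    degg = [egg[i + 1] - egg[i] for i in range(len(egg) - 1)]
    if not degg:
        return []
    floor = threshold_ratio * min(degg)
    if floor >= 0:
        return []  # the signal never falls, so there is no closure candidate
    gcis = []
    for i in range(1, len(degg) - 1):
        is_local_min = degg[i] <= degg[i - 1] and degg[i] < degg[i + 1]
        if is_local_min and degg[i] < floor:
            gcis.append(i)
    return gcis
```

Estimated GCIs from another method, such as wavelet-transform maxima, could then be compared against these indices within a small tolerance window.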


Performance Assessment of Several Established Pitch Detection Algorithms in Voices of Benign Vocal Fold Lesions

  • 장승진;최성희;김효민;최홍식;윤영로
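    • The Institute of Electronics Engineers of Korea: Conference Proceedings
    • /
    • Proceedings of the Institute of Electronics Engineers of Korea 2007 Summer Conference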
    • /
    • pp.407-408
    • /
    • 2007
  • Robust pitch estimation is an important topic in many areas of speech processing. In voice pathology, diverse statistics extracted from pitch are commonly used to assess voice quality. In this study, we compared several established pitch detection algorithms (PDAs) to verify their adequacy. Using a database of 99 pathological voices and 30 normal voices, pitch detection errors were analyzed between pathological and normal voices, and among types of pathological voices with benign vocal fold lesions: polyps, nodules, and cysts. Consequently, it is necessary to assess the severity of the tested voice in order to obtain accurate pitch estimates.
