• Title/Summary/Keyword: Voice Detection

Search Result 282, Processing Time 0.02 seconds

Voice-Pishing Detection Algorithm Based on 3GPP2 SMV (3GPP2 SMV 기반의 보이스 피싱 검출 알고리즘)

  • Lee, Kye-Hwan;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.4
    • /
    • pp.92-99
    • /
    • 2008
  • We propose an effective voice-pishing detection algorithm based on the 3GPP2 selectable mode vocoder (SMV). The detection of voice pishing is performed based on a Gaussian mixture model (GMM) using decoding parameters of the SMV directly extracted from the decoding process of the transmitted speech information in the mobile phone. The experimental results indicate that SMV decoding parameters are effective in discriminating between general voice and phisher's voice and the performance is significantly acceptable when the proposed technique is applied.

Statistical Model-Based Voice Activity Detection Based on Second-Order Conditional MAP with Soft Decision

  • Chang, Joon-Hyuk
    • ETRI Journal
    • /
    • v.34 no.2
    • /
    • pp.184-189
    • /
    • 2012
  • In this paper, we propose a novel approach to statistical model-based voice activity detection (VAD) that incorporates a second-order conditional maximum a posteriori (CMAP) criterion. As a technical improvement for the first-order CMAP criterion in [1], we consider both the current observation and the voice activity decision in the previous two frames to take full consideration of the interframe correlation of voice activity. This is clearly different from the previous approach [1] in that we employ the voice activity decisions in the second-order (previous two frames) CMAP, which has quadruple thresholds with an additional degree of freedom, rather than the first-order (previous single frame). Also, a soft-decision scheme is incorporated, resulting in time-varying thresholds for further performance improvement. Experimental results show that the proposed algorithm outperforms the conventional CMAP-based VAD technique under various experimental conditions.

Voice Activity Detection Algorithm using Fuzzy Membership Shifted C-means Clustering in Low SNR Environment (낮은 신호 대 잡음비 환경에서의 퍼지 소속도 천이 C-means 클러스터링을 이용한 음성구간 검출 알고리즘)

  • Lee, G.H.;Lee, Y.J.;Cho, J.H.;Kim, M.N.
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.3
    • /
    • pp.312-323
    • /
    • 2014
  • Voice activity detection is very important process that find voice activity from noisy speech signal for noise cancelling and speech enhancement. Over the past few years, many studies have been made on voice activity detection, it has poor performance for speech signal of sentence form in a low SNR environment. In this paper, it proposed new voice activity detection algorithm that has beginning VAD process using entropy and main VAD process using fuzzy membership shifted c-means clustering. We conduct an experiment in various SNR environment of white noise to evaluate performance of the proposed algorithm and confirmed good performance of the proposed algorithm.

Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

  • Seung-min Kim;So-hee Park;Dae-seon Choi
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.3
    • /
    • pp.439-449
    • /
    • 2024
  • Recent rapid advancements in voice generation technology have enabled the natural synthesis of voices using text alone. However, this progress has led to an increase in malicious activities, such as voice phishing (voishing), where generated voices are exploited for criminal purposes. Numerous models have been developed to detect the presence of synthesized voices, typically by extracting features from the voice and using these features to determine the likelihood of voice generation.This paper proposes a new model for extracting voice features to address misuse cases arising from generated voices. It utilizes a deep learning-based audio codec model and the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed voice feature extraction model for voice detection, four generated voice detection models were created using the extracted features, and performance evaluations were conducted. For performance comparison, three voice detection models based on Deepfeature proposed in previous studies were evaluated against other models in terms of accuracy and EER. The model proposed in this paper achieved an accuracy of 88.08%and a low EER of 11.79%, outperforming the existing models. These results confirm that the voice feature extraction method introduced in this paper can be an effective tool for distinguishing between generated and real voices.

Detection of Pathological Voice Using Linear Discriminant Analysis

  • Lee, Ji-Yeoun;Jeong, Sang-Bae;Choi, Hong-Shik;Hahn, Min-Soo
    • MALSORI
    • /
    • no.64
    • /
    • pp.77-88
    • /
    • 2007
  • Nowadays, mel-frequency cesptral coefficients (MFCCs) and Gaussian mixture models (GMMs) are used for the pathological voice detection. This paper suggests a method to improve the performance of the pathological/normal voice classification based on the MFCC-based GMM. We analyze the characteristics of the mel frequency-based filterbank energies using the fisher discriminant ratio (FDR). And the feature vectors through the linear discriminant analysis (LDA) transformation of the filterbank energies (FBE) and the MFCCs are implemented. An accuracy is measured by the GMM classifier. This paper shows that the FBE LDA-based GMM is a sufficiently distinct method for the pathological/normal voice classification, with a 96.6% classification performance rate. The proposed method shows better performance than the MFCC-based GMM with noticeable improvement of 54.05% in terms of error reduction.

  • PDF

Emotion Detecting Method Based on Various Attributes of Human Voice

  • MIYAJI Yutaka;TOMIYAMA Ken
    • Science of Emotion and Sensibility
    • /
    • v.8 no.1
    • /
    • pp.1-7
    • /
    • 2005
  • This paper reports several emotion detecting methods based on various attributes of human voice. These methods have been developed at our Engineering Systems Laboratory. It is noted that, in all of the proposed methods, only prosodic information in voice is used for emotion recognition and semantic information in voice is not used. Different types of neural networks(NNs) are used for detection depending on the type of voice parameters. Earlier approaches separately used linear prediction coefficients(LPCs) and time series data of pitch but they were combined in later studies. The proposed methods are explained first and then evaluation experiments of individual methods and their performances in emotion detection are presented and compared.

  • PDF

Voice Activity Detection employing the Generalized Normal-Laplace Distribution (일반화된 정규-라플라스 분포를 이용한 음성검출기)

  • Kim, Sang-Kyun;Kwon, Jang-Woo;Lee, Sangmin
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.3
    • /
    • pp.294-299
    • /
    • 2014
  • In this paper, we propose a novel algorithm to improve the performance of a voice activity detection(VAD) which is based on the generalized normal-Laplace(GNL) distribution. In our algorithm, the probability density function(PDF) of the noisy speech signal is represented by the GNL distribution and the variance of the speech and noise of GNL distribution are estimated using higher order moments. Experimental results show that the proposed algorithm yields better results compared to the conventional VAD algorithms.

Robust Voice Activity Detection Using the Spectral Peaks of Vowel Sounds

  • Yoo, In-Chul;Yook, Dong-Suk
    • ETRI Journal
    • /
    • v.31 no.4
    • /
    • pp.451-453
    • /
    • 2009
  • This letter proposes the use of vowel sound detection for voice activity detection. Vowels have distinctive spectral peaks. These are likely to remain higher than their surroundings even after severe corruption. Therefore, by developing a method of detecting the spectral peaks of vowel sounds in corrupted signals, voice activity can be detected as well even in low signal-to-noise ratio (SNR) conditions. Experimental results indicate that the proposed algorithm performs reliably under various noise and low SNR conditions. This method is suitable for mobile environments where the characteristics of noise may not be known in advance.

Noise-Robust Speech Detection Using The Coefficient of Variation of Spectrum (스펙트럼의 변동계수를 이용한 잡음에 강인한 음성 구간 검출)

  • Kim Youngmin;Hahn Minsoo
    • MALSORI
    • /
    • no.48
    • /
    • pp.107-116
    • /
    • 2003
  • This paper deals with a new parameter for voice detection which is used for many areas of speech engineering such as speech synthesis, speech recognition and speech coding. CV (Coefficient of Variation) of speech spectrum as well as other feature parameters is used for the detection of speech. CV is calculated only in the specific range of speech spectrum. Average magnitude and spectral magnitude are also employed to improve the performance of detector. From the experimental results the proposed voice detector outperformed the conventional energy-based detector in the sense of error measurements.

  • PDF

A Study on the Realization of Wireless Home Network System Using High-performance Speech Recognition in Variable Position (가변위치 고음성인식 기술을 이용한 무선 홈 네트워크 시스템 구현에 관한 연구)

  • Yoon, Jun-Chul;Choi, Sang-Bang;Park, Chan-Sub;Kim, Se-Yong;Kim, Ki-Man;Kang, Suk-Youb
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.4
    • /
    • pp.991-998
    • /
    • 2010
  • In realization of wireless home network system using speech recognition in indoor voice recognition environment, background noise and reverberation are two main causes of digression in voice recognition system. In this study, the home network system resistant to reverberation and background noise using voice section detection method based on spectral entropy in indoor recognition environment is to be realized. Spectral subtraction can reduce the effect of reverberation and remove noise independent from voice signal by eliminating signal distorted by reverberation in spectrum. For effective spectral subtraction, the correct separation of voice section and silent section should be accompanied and for this, improvement of performance needs to be done, applying to voice section detection method based on entropy. In this study, experimental and indoor environment testing is carried out to figure out command recognition rate in indoor recognition environment. The test result shows that command recognition rate improved in static environment and reverberant room condition, using voice section detection method based on spectral entropy.