Title/Summary/Keyword: Voice Features Extraction


Voice Synthesis Detection Using Language Model-Based Speech Feature Extraction (언어 모델 기반 음성 특징 추출을 활용한 생성 음성 탐지)

  • Seung-min Kim; So-hee Park; Dae-seon Choi
    • Journal of the Korea Institute of Information Security & Cryptology / v.34 no.3 / pp.439-449 / 2024
  • Recent rapid advances in voice generation technology have made it possible to synthesize natural-sounding voices from text alone. However, this progress has also led to a rise in malicious activities such as voice phishing (vishing), in which generated voices are exploited for criminal purposes. Numerous models have been developed to detect synthesized voices, typically by extracting features from the audio and using them to estimate the likelihood that the voice was generated. This paper proposes a new voice feature extraction model to address misuse cases arising from generated voices. It uses a deep learning-based audio codec model together with the pre-trained natural language processing model BERT to extract novel voice features. To assess the suitability of the proposed features for detection, four generated-voice detection models were built on the extracted features and evaluated. For comparison, three detection models based on the Deepfeature approach proposed in previous studies were evaluated against them in terms of accuracy and equal error rate (EER). The proposed model achieved an accuracy of 88.08% and a low EER of 11.79%, outperforming the existing models. These results confirm that the proposed voice feature extraction method can be an effective tool for distinguishing generated voices from real ones.
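The accuracy/EER comparison above hinges on the equal error rate: the operating point where the false-acceptance rate (spoofed audio accepted as real) equals the false-rejection rate (real audio rejected). As a rough illustration only — not the paper's code, and the score arrays are hypothetical — EER can be estimated from detector scores like this:

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Sweep candidate thresholds and return the EER: the point
    where false-acceptance and false-rejection rates balance."""
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    best_gap, best_eer = 1.0, 0.0
    for t in thresholds:
        far = np.mean(spoof_scores >= t)   # spoofed audio accepted as real
        frr = np.mean(genuine_scores < t)  # real audio rejected
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

With perfectly separated scores the EER is 0; as the genuine and spoofed score distributions overlap, it rises toward 0.5 (chance).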

Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

  • Hong, Jungpyo; Park, Sangjun; Jeong, Sangbae; Hahn, Minsoo
    • Phonetics and Speech Sciences / v.5 no.1 / pp.11-16 / 2013
  • This paper proposes robust feature extraction for accurate voice activity detection (VAD). VAD is a key module in speech signal processing tasks such as speech coding, speech enhancement, and speech recognition. Noisy environments contain nonstationary noises that drastically degrade VAD accuracy, because fluctuating features in the noise intervals raise the false alarm rate. To improve VAD performance, this paper proposes a harmonic-weighted energy feature. The feature extraction method focuses on voiced speech intervals and uses harmonic-to-noise ratios to weight the contribution of harmonicity to the frame energy. For performance evaluation, receiver operating characteristic curves and the equal error rate are measured.
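A minimal sketch of the idea behind harmonic-weighted energy — not the authors' implementation; the autocorrelation-based harmonicity estimate, the lag range, and the 16 kHz sampling rate are assumptions made here for illustration:

```python
import numpy as np

def harmonic_weighted_energy(frame):
    """Weight a frame's energy by a crude harmonicity estimate
    (peak of the normalized autocorrelation outside lag 0), so
    voiced frames stand out against nonstationary noise."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    energy = np.sum(frame ** 2)
    if energy == 0:
        return 0.0
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]          # normalize so lag 0 == 1
    lo = 20                  # skip tiny lags (~800 Hz F0 cap at 16 kHz, assumed)
    harmonicity = max(float(np.max(ac[lo:])), 0.0)
    return harmonicity * energy
```

A voiced (periodic) frame keeps most of its energy after weighting, while a noise frame of similar energy is suppressed, which is what lowers the false alarm rate in noise intervals.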

Emotion Recognition of Facial Expression using the Hybrid Feature Extraction (혼합형 특징점 추출을 이용한 얼굴 표정의 감성 인식)

  • Byun, Kwang-Sub; Park, Chang-Hyun; Sim, Kwee-Bo
    • Proceedings of the KIEE Conference / 2004.05a / pp.132-134 / 2004
  • Humans recognize each other's emotions through a combination of cues such as the face, voice, and gestures. Among these, the face reveals emotional expression most clearly, and humans express and recognize emotions through its complex and varied features. This paper proposes a hybrid feature extraction method for recognizing emotions from facial expressions. The hybrid method imitates the human emotion recognition process by combining geometry-based feature extraction with a color distribution histogram. By extracting many features of the facial expression, it can perform emotion recognition robustly.


Study for Extraction of Stable Vocal Features and Definition of the Features (음성의 안정적 변수 추출 및 변수의 의미 연구)

  • Kim, Keun-Ho; Kim, Sang-Gil; Kang, Nam-Sik; Kim, Jong-Yeol
    • Korean Journal of Oriental Medicine / v.17 no.3 / pp.97-104 / 2011
  • Objectives: In this paper, we propose a method for selecting reliable variables from various vocal features, such as frequency-derivative features, frequency band ratios, the intensities of five vowels, and the intensity of a sentence, since some features are sensitive to variation in a subject's utterance. Methods: To obtain reliable voice variables, the coefficient of variation (CV) was used as the index of reliability. Since the distributions of a few features are not Gaussian but skewed to the right or left, we transformed those features by taking the log or square root. The definitions of the variables suitable for representing vocal properties are also explained and analyzed. Results: We recorded the vowels and the sentence five times in the morning and five times in the afternoon of the same day, for ten recordings in total from each of six subjects (three males and three females). We then analyzed the CVs of each subject's voice to find the stable features with sufficient repeatability. Features with CVs below 20% for all six subjects were selected; as a result, 92 stable variables were extracted from the 222 features, including all the transformed variables. Conclusions: Voice can be widely used to classify the four constitution types and to assess one's health condition by extracting meaningful features as physical quantities in traditional Korean medicine or Western medicine. Stable voice variables can therefore be useful in u-Healthcare systems for personalized medicine and for improving diagnostic accuracy.
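The CV-based screening described above can be sketched in a few lines. This is an illustration of the selection rule only — the feature names and measurement values below are hypothetical, and the paper's per-subject aggregation and log/sqrt transforms are only noted in the comments:

```python
import numpy as np

def coefficient_of_variation(values):
    """CV = standard deviation / mean, expressed as a percentage."""
    values = np.asarray(values, dtype=float)
    return 100.0 * np.std(values) / np.mean(values)

def select_stable_features(repeated_measurements, threshold=20.0):
    """Keep features whose CV over repeated recordings stays under
    the threshold (20% in the paper).  Skewed features would first
    be log- or sqrt-transformed, which is omitted here."""
    return [name for name, vals in repeated_measurements.items()
            if coefficient_of_variation(vals) < threshold]
```

In the study this filter is applied per subject, and only features that pass for all six subjects survive.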

Voice Features Extraction of Lung Diseases Based on the Analysis of Speech Rates and Intensity (발화속도 및 강도 분석에 기반한 폐질환의 음성적 특징 추출)

  • Kim, Bong-Hyun; Cho, Dong-Uk
    • The KIPS Transactions: Part B / v.16B no.6 / pp.471-478 / 2009
  • Lung diseases, classed among the six major incurable diseases of modern times, are caused mostly by smoking and air pollution. These causes damage lung function and impair the exchange of carbon dioxide and oxygen in the alveoli, and interest in such life-threatening diseases grows as life expectancy increases. In this paper, we propose a diagnosis method for lung diseases that applies voice analysis parameters to extract vocal features. First, we sampled voice data from patients and from normal subjects of the same age and sex, forming two sample groups, and analyzed the collected voice data with various voice analysis parameters. Among the analyzed parameters, the significance of the difference between the patient and normal groups can be evaluated in terms of speech rate and intensity: the patient group showed slower speech rates and greater intensity than the normal group. On this basis, we propose a voice feature extraction method for lung diseases.
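The two discriminative parameters named above are simple to compute. As a hedged illustration (not the authors' analysis pipeline; the reference level and the idea of counting syllables from a fixed read sentence are assumptions), speech rate and intensity might be measured like this:

```python
import numpy as np

def rms_intensity_db(signal, ref=1.0):
    """Overall intensity of a speech signal as its RMS level in dB
    relative to a reference amplitude."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(signal ** 2))
    return 20.0 * np.log10(rms / ref)

def speech_rate(n_syllables, duration_s):
    """Speech rate as syllables per second; the syllable count is
    assumed known, e.g. from a read sentence of fixed text."""
    return n_syllables / duration_s
```

Under this scheme, the paper's finding corresponds to patients showing a lower `speech_rate` and a higher `rms_intensity_db` than controls.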

A Comparison of Effective Feature Vectors for Speech Emotion Recognition (음성신호기반의 감정인식의 특징 벡터 비교)

  • Shin, Bo-Ra; Lee, Soek-Pil
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.10 / pp.1364-1369 / 2018
  • Speech emotion recognition (SER), which aims to classify a speaker's emotional state from speech signals, is one of the essential tasks for making human-machine interaction (HMI) more natural and realistic. Vocal expression is one of the main information channels in interpersonal communication. However, existing speech emotion recognition technology has not achieved satisfactory performance, probably because of the lack of effective emotion-related features. This paper surveys the various features used for speech emotion recognition and discusses which features, or which combinations of features, are valuable and meaningful for emotion classification. The main aim is to compare the approaches used for feature extraction and to propose a basis for extracting useful features to improve SER performance.

A Study of the extraction of a Hand Vein Pattern (손정맥 패턴 추출에 관한 연구)

  • Kim, Jong-Seok; Baek, Han-Wook; Chung, Chin-Hyun
    • Proceedings of the KIEE Conference / 2000.07d / pp.3022-3024 / 2000
  • Biometrics is the electronic recognition of individuals achieved by extracting, and then verifying, features that are unique to each individual. It is a rapidly evolving technology that is being adopted in a broad range of applications. Many methods have been studied, including extraction of facial features, the voice, veins, and even a person's signature. Among biometrics, hand veins provide large, robust, stable, hidden features, and hand vein patterns have been shown to be unique by Cambridge Consultants Ltd. Because of these advantages, hand vein recognition is a developing area in the field of security.


A Voice-Activated Dialing System with Distributed Speech Recognition in WiFi Environments (무선랜 환경에서의 분산 음성 인식을 이용한 음성 다이얼링 시스템)

  • Park, Sung-Joon; Koo, Myoung-Wan
    • MALSORI / no.56 / pp.135-145 / 2005
  • In this paper, a WiFi phone system with distributed speech recognition is implemented, and the voice-activated dialing functions of the WiFi phone are explained. Features of the input speech are extracted and sent to an interactive voice response (IVR) server over the real-time transport protocol (RTP). Feature extraction follows the European Telecommunications Standards Institute (ETSI) standard front-end, modified to reduce processing time. The time for front-end processing on the WiFi phone is compared with that on a PC.


Extraction of Speech Features for Emotion Recognition (감정 인식을 위한 음성 특징 도출)

  • Kwon, Chul-Hong; Song, Seung-Kyu; Kim, Jong-Yeol; Kim, Keun-Ho; Jang, Jun-Su
    • Phonetics and Speech Sciences / v.4 no.2 / pp.73-78 / 2012
  • Emotion recognition is an important technology in the field of human-machine interfaces. To apply speech technology to emotion recognition, this study aims to establish a relationship between emotional groups and their corresponding voice characteristics by investigating various speech features, including features related to both the speech source and the vocal tract filter. Experimental results show that the statistically significant speech parameters for classifying the emotional groups are mainly related to the speech source, such as jitter, shimmer, F0 statistics (F0_min, F0_max, F0_mean, F0_std), harmonic parameters (H1, H2, HNR05, HNR15, HNR25, HNR35), and SPI.
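Several of the source parameters listed above have standard textbook definitions. As a sketch under that assumption (these are the common definitions, not necessarily the study's exact formulas; the sample contours are hypothetical), the F0 statistics and local jitter can be computed as:

```python
import numpy as np

def f0_statistics(f0_track):
    """Summary statistics of an F0 contour; unvoiced frames,
    conventionally marked with 0 Hz, are dropped first."""
    f0 = np.asarray(f0_track, dtype=float)
    f0 = f0[f0 > 0]
    return {"F0_min": f0.min(), "F0_max": f0.max(),
            "F0_mean": f0.mean(), "F0_std": f0.std()}

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive
    pitch periods, divided by the mean period, in percent."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)
```

Shimmer is defined analogously to jitter but over cycle-to-cycle amplitudes rather than periods.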

Proposed Efficient Architectures and Design Choices in SoPC System for Speech Recognition

  • Trang, Hoang; Hoang, Tran Van
    • Journal of IKEEE / v.17 no.3 / pp.241-247 / 2013
  • This paper presents the design of a System on Programmable Chip (SoPC) based on a Field Programmable Gate Array (FPGA) for speech recognition, using Mel-Frequency Cepstral Coefficients (MFCC) for speech feature extraction and Vector Quantization (VQ) for recognition. The speech recognition system proceeds through the following steps: feature extraction, codebook training, and recognition. In the feature extraction step, the input voice data are transformed into spectral components and the main features are extracted with the MFCC algorithm. In the recognition step, the spectral features obtained in the first step are processed and compared with the trained components using Vector Quantization. In our experiment, Altera's DE2 board with a Cyclone II FPGA is used to implement a recognition system that can recognize 64 words. The execution speed of the blocks in the speech recognition system is surveyed by counting the clock cycles taken to execute each block, and recognition accuracy is measured under different system parameters. These execution speed and accuracy results can help designers choose the best configurations for speech recognition on an SoPC.
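The VQ training and recognition steps can be illustrated with a toy NumPy stand-in for the FPGA pipeline — a sketch only, with the MFCC front-end omitted and 2-D dummy feature vectors in place of real cepstral coefficients: one codebook is trained per word, and at recognition time the word whose codebook quantizes the incoming features with the lowest distortion wins.

```python
import numpy as np

def train_codebook(features, k, iters=20, seed=0):
    """Toy VQ codebook training (k-means) over feature vectors,
    standing in for the per-word training stage."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword, then re-center
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def quantization_distortion(features, codebook):
    """Average distance to the nearest codeword; the word whose
    codebook yields the lowest distortion is recognized."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()
```

On the FPGA, the same nearest-codeword search is what dominates the clock-cycle counts the paper measures per block.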