• Title/Abstract/Keyword: Voice learning


Probabilistic Neural Network Based Learning from Fuzzy Voice Commands for Controlling a Robot

  • Jayawardena, Chandimal;Watanabe, Keigo;Izumi, Kiyotaka
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2004.08a
    • /
    • pp.2011-2016
    • /
    • 2004
  • The study of human-robot communication is one of the most important research areas. Among the various communication media, any useful law found in voice communication between humans is significant for human-robot interaction as well. The control strategy of most such systems available at present is on/off control: these robots activate a function if the particular word or phrase associated with that function is recognized in the user's utterance. Recently, there has been some research on controlling robots using information-rich fuzzy commands such as "go little slowly". However, although voice command interpretation has been considered in those works, learning from such commands has not been treated. In this paper, learning from such information-rich voice commands for controlling a robot is studied. The new concepts of the coach-player model and the sub-coach are proposed and demonstrated on a PA-10 redundant manipulator.
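
A probabilistic neural network reduces to a Parzen-window estimate of each class-conditional density. The sketch below shows that decision rule in Python on hypothetical feature vectors encoding a fuzzy command; the feature encoding and motion classes are illustrative assumptions, not the paper's coach-player implementation.

```python
import numpy as np

def pnn_classify(x, train_X, train_y, sigma=0.5):
    """Pick the class whose Parzen-window (Gaussian-kernel) density is highest at x."""
    best_class, best_score = None, -np.inf
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]
        d2 = np.sum((Xc - x) ** 2, axis=1)               # squared distances to class patterns
        score = np.mean(np.exp(-d2 / (2.0 * sigma**2)))  # class-conditional density estimate
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Hypothetical training patterns: [speed term, distance term] extracted from fuzzy commands,
# labelled with a coarse motion class the coach would teach the player robot.
train_X = np.array([[0.2, 0.3], [0.3, 0.2], [0.8, 0.7], [0.9, 0.8]])
train_y = np.array(["slow_short", "slow_short", "fast_long", "fast_long"])
print(pnn_classify(np.array([0.25, 0.2]), train_X, train_y))  # -> "slow_short"
```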

Gender Classification of Speakers Using SVM

  • Han, Sun-Hee;Cho, Kyu-Cheol
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.10
    • /
    • pp.59-66
    • /
    • 2022
  • This research classifies the gender of speakers by analyzing feature vectors extracted from voice data. It offers the convenience of automatically recognizing a customer's gender, without a manual classification step, when the customer requests a service by voice, for example over a phone call. Furthermore, after gender classification with the learning model, frequently requested services can be analyzed for each gender and customized recommendation services offered according to that analysis. Based on male and female voice data with silent segments removed, feature vectors are extracted from each recording using MFCC (Mel-Frequency Cepstral Coefficients), and machine learning is conducted with SVM (Support Vector Machine) models. When the trained model was used to classify the gender of the voice data, the gender recognition rate was 94%.
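
A minimal sketch of the pipeline the abstract describes, assuming librosa for MFCC extraction and scikit-learn for the SVM; the file list `wav_paths`, the `labels` array and the 16 kHz sample rate are placeholders, not details from the paper.

```python
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def mfcc_feature_vector(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=16000)
    y, _ = librosa.effects.trim(y)                      # drop the silent ("blank") segments
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # fixed-length summary

# wav_paths: list of recording paths, labels: 0 = male, 1 = female (placeholders).
X = np.stack([mfcc_feature_vector(p) for p in wav_paths])
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, stratify=labels)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("gender recognition rate:", accuracy_score(y_te, clf.predict(X_te)))
```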

Speaker verification with ECAPA-TDNN trained on new dataset combined with Voxceleb and Korean (Voxceleb과 한국어를 결합한 새로운 데이터셋으로 학습된 ECAPA-TDNN을 활용한 화자 검증)

  • Keumjae Yoon;Soyoung Park
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.209-224
    • /
    • 2024
  • Speaker verification is becoming popular as a method of non-face-to-face identity authentication. It involves determining whether two voice recordings belong to the same speaker. In cases where a criminal's voice remains at the crime scene, it is vital to establish a speaker verification system that can accurately compare the two pieces of voice evidence. In this study, a new speaker verification system was built for the Korean language using a deep learning model. High-dimensional voice data with high variability, such as background noise, makes deep learning-based methods necessary for speaker matching. For the matching algorithm, the ECAPA-TDNN model, one of the best-known deep learning systems for speaker verification, was selected. Voxceleb, a large voice dataset collected from people of various nationalities, contains no Korean speakers. To study the form of dataset appropriate for learning Korean, experiments were carried out to determine how Korean voice data affects matching performance. Comparing models trained only on Voxceleb with models trained on datasets combining Voxceleb and Korean data to maximize language and speaker diversity, the models whose training data included Korean showed improved performance on all test sets.
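
For reference, a pretrained ECAPA-TDNN verification model trained on Voxceleb is distributed through SpeechBrain; the sketch below assumes that checkpoint and uses placeholder file names, and is not the paper's retrained Voxceleb+Korean model.

```python
from speechbrain.pretrained import SpeakerRecognition

# ECAPA-TDNN embedding model trained on VoxCeleb, scored with cosine similarity.
verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Placeholder recordings: an enrolled speaker and a questioned sample.
score, same_speaker = verifier.verify_files("enrolled_speaker.wav", "questioned_sample.wav")
print(f"cosine score={float(score):.3f}, same speaker={bool(same_speaker)}")
```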

Interactive content development of voice pattern recognition (음성패턴인식 인터랙티브 콘텐츠 개발)

  • Na, Jong-Won
    • Journal of Advanced Navigation Technology
    • /
    • v.16 no.5
    • /
    • pp.864-870
    • /
    • 2012
  • This paper analyzes common problems in existing language-learning content and applies voice pattern recognition technology to solve them. The first problem is the learner's posture in online learning: students open other web pages, such as games, during a lesson, and their concentration drops. The second problem is that there is no way to determine whether the learner actually reads aloud during speaking practice. The third problem is the gap between the mechanical evaluation produced by the learning management system and the teacher's evaluation of the students and their learning progress. The final, and biggest, problem is that these issues must be solved while keeping the existing content intact. Against this background, a dedicated speaking-practice program was developed in which voice pattern recognition is used both for the learning process and for recognizing the learning itself. The learner's utterance is recorded as an audio file and transferred to a specified location on a server, or inserted into an SQL server, so that it can easily be incorporated into any system or program. Because the new features are added without damaging any component of content that has already been created, they apply to all existing content. The contribution of this paper is a more interactive teaching method that encourages active participation in class.
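
The transfer step described above (a recorded utterance saved as an audio file and sent to the server that keeps learning records) could look roughly like this; the endpoint URL, field names and use of the `requests` library are assumptions for illustration.

```python
import requests

def upload_utterance(wav_path, learner_id, lesson_id,
                     url="https://example.com/api/utterances"):   # hypothetical endpoint
    """Send one recorded utterance to the server that stores learning records."""
    with open(wav_path, "rb") as f:
        resp = requests.post(
            url,
            files={"audio": (wav_path, f, "audio/wav")},
            data={"learner_id": learner_id, "lesson_id": lesson_id},
            timeout=10,
        )
    resp.raise_for_status()
    return resp.status_code
```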

A Voice Controlled Service Robot Using Support Vector Machine

  • Kim, Seong-Rock;Park, Jae-Suk;Park, Ju-Hyun;Lee, Suk-Gyu
    • Institute of Control, Robotics and Systems: Conference Proceedings
    • /
    • 2004.08a
    • /
    • pp.1413-1415
    • /
    • 2004
  • This paper proposes an SVM (Support Vector Machine) training algorithm to control a service robot with voice commands. The service robot, equipped with a stereo vision system and dual four-degree-of-freedom manipulators, implements a user-dependent voice control system. Training the SVM, one of the statistical learning theories, leads to a QP (quadratic programming) problem. We present an efficient SVM speech recognition scheme that requires less training data than conventional approaches. The SVM discriminator decides whether to accept or reject the user's voice features extracted with MFCC (Mel Frequency Cepstrum Coefficients). Among several SVM kernels, the exponential RBF function gives the best classification and the most accurate user recognition. Numerical simulation and experiments verified the usefulness of the proposed algorithm.
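
A sketch of the accept/reject decision described above, assuming MFCC feature vectors and scikit-learn's RBF-kernel SVM; the training arrays, command labels and rejection threshold are placeholders rather than the paper's tuned values, and more than two command classes are assumed.

```python
import numpy as np
from sklearn.svm import SVC

# X_train: MFCC vectors of the registered user's command utterances,
# y_train: command labels ("forward", "stop", ...) -- both placeholders.
clf = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

def accept_or_reject(mfcc_vector, threshold=0.0):
    """Return the recognized command, or None when even the best score is too low."""
    scores = clf.decision_function([mfcc_vector])[0]   # one-vs-rest scores, shape (n_classes,)
    best = int(np.argmax(scores))
    return clf.classes_[best] if scores[best] > threshold else None
```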

Exploiting Korean Language Model to Improve Korean Voice Phishing Detection (한국어 언어 모델을 활용한 보이스피싱 탐지 기능 개선)

  • Boussougou, Milandu Keith Moussavou;Park, Dong-Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.10
    • /
    • pp.437-446
    • /
    • 2022
  • Text classification from natural language processing (NLP), combined with state-of-the-art (SOTA) machine learning (ML) and deep learning (DL) algorithms as the core engine, is widely used to detect and classify voice phishing call transcripts. Although numerous studies on classifying voice phishing call transcripts have been conducted and have demonstrated good performance, the increase in non-face-to-face financial transactions leaves room for improvement using the latest NLP technologies. This paper benchmarks the Korean voice phishing detection performance of the pre-trained Korean language model KoBERT against multiple other SOTA algorithms on the classification of transcripts from the labeled Korean voice phishing dataset KorCCVi. The experiments show that the KoBERT model outperforms all other models, with a classification accuracy of 99.60% on the test set.
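
A rough sketch of the kind of KoBERT classifier benchmarked here, using the Hugging Face transformers API. The checkpoint id, the two-label head and the sample transcript are assumptions (the SKT KoBERT checkpoint also ships its own tokenizer helper), not the paper's exact training setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "skt/kobert-base-v1"   # assumed KoBERT checkpoint; its tokenizer may need the kobert-tokenizer package
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)  # 0 = normal, 1 = phishing

transcript = "고객님 계좌가 범죄에 연루되어 안전계좌로 이체가 필요합니다."   # made-up example sentence
inputs = tokenizer(transcript, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print("voice phishing probability:", float(probs[0, 1]))   # meaningful only after fine-tuning on KorCCVi
```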

A Study on the Prediction Method of Voice Phishing Damage Using Big Data and FDS (빅데이터와 FDS를 활용한 보이스피싱 피해 예측 방법 연구)

  • Lee, Seoungyong;Lee, Julak
    • Korean Security Journal
    • /
    • no.62
    • /
    • pp.185-203
    • /
    • 2020
  • While overall crime has been declining since 2009, voice phishing has been on the rise. The government and academia have presented various countermeasures and conducted research to eradicate it, but these have not kept up with evolving voice phishing. This study focuses on catching criminals and preventing damage from voice phishing, which is difficult to recover from. In particular, based on the fact that victims engage in financial transaction activities such as account transfers, a voice phishing prediction method using the Fraud Detection System (FDS) already employed to detect financial fraud was studied. As a result, a concept was derived for combining big data related to voice phishing, such as call details, messenger details, abnormal accounts, voice phishing types and 112 reports, in a machine learning-based FDS. The research relied mainly on government measures and a literature review on the use of big data, and because of limitations in data collection and security concerns surrounding FDS, a specific model could not be provided. Nevertheless, it is meaningful that, in the absence of prior research, the concept of a voice phishing response that converges FDS with the types of data needed for machine learning was presented for the first time. Based on this research, it is hoped that a 'voice phishing damage prediction system' will be developed to prevent damage from voice phishing.
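
Since the paper stops at the conceptual level, the following is only an illustrative guess at what a machine-learning FDS over those data sources might look like; every feature name and the `transactions` table are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

FEATURES = [
    "minutes_with_unknown_numbers",   # from call details
    "clicked_messenger_link",         # from messenger details
    "transfer_to_flagged_account",    # from the abnormal-account list
    "matches_112_reported_number",    # from 112 report data
]

# transactions: pandas DataFrame with the columns above plus a boolean 'voice_phishing' label (placeholder).
model = GradientBoostingClassifier().fit(transactions[FEATURES], transactions["voice_phishing"])
transactions["damage_risk"] = model.predict_proba(transactions[FEATURES])[:, 1]  # per-transaction risk score
```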

Voice Conversion using Generative Adversarial Nets conditioned by Phonetic Posterior Grams (Phonetic Posterior Grams에 의해 조건화된 적대적 생성 신경망을 사용한 음성 변환 시스템)

  • Lim, Jin-su;Kang, Cheon-seong;Kim, Dong-Ha;Kim, Kyung-sup
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.369-372
    • /
    • 2018
  • This paper proposes a non-parallel voice conversion network that converts between an unpaired source voice and target voice. Conventional voice conversion research has used learning methods that minimize the distance error between spectrograms. These approaches not only lose spectrogram resolution because of pixel averaging, but also rely on parallel data, which is hard to collect. This work uses PPGs, which represent the phonetic content of the input voice, together with a GAN learning method to generate clearer voices. To evaluate the proposed method, we conducted a MOS test against a GMM-based model and found that performance improved compared to the conventional methods.
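
A toy PyTorch sketch of the conditioning idea: a generator maps PPG frames (the phonetic content) to target-speaker mel-spectrogram frames, and a discriminator judges them conditioned on the same PPGs. The dimensions and layer choices are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

PPG_DIM, MEL_DIM = 144, 80   # assumed sizes: phonetic posteriors in, mel bins out

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PPG_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, MEL_DIM),
        )

    def forward(self, ppg):                  # ppg: (batch, frames, PPG_DIM)
        return self.net(ppg)                 # mel: (batch, frames, MEL_DIM)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(MEL_DIM + PPG_DIM, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),
        )

    def forward(self, mel, ppg):             # real/fake score conditioned on the PPGs
        return self.net(torch.cat([mel, ppg], dim=-1))

G, D = Generator(), Discriminator()
fake_mel = G(torch.randn(1, 200, PPG_DIM))                # 200 frames of random "PPGs"
print(D(fake_mel, torch.randn(1, 200, PPG_DIM)).shape)    # -> torch.Size([1, 200, 1])
```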

Design and Implementation of Web Interworking Learning System Using VoiceXML (VoiceXML을 이용한 Web 연동 학습 시스템 설계 및 구현)

  • Kim Dong-Hyun;Cho Chang-Su;Shin Jeong-Hoon;Hong Kwang-Seok
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.42 no.2 s.302
    • /
    • pp.21-30
    • /
    • 2005
  • The development of multimedia technology and communication network technology has brought many changes to the field of learning systems. To build a more efficient and intelligent learning system, research is being done on using both the Web and the telephone network. Until now, however, implemented learning systems have been single systems, each with its own merits and demerits. A learning system accessed through the Web can only be used in a fixed setting with a computer, and users who are not familiar with computers must first learn how to use the new system. A system based on the telephone network has the merit that it can be used anywhere, anytime, by telephone, but it cannot transmit information very efficiently. For these reasons, this paper proposes a learning system that can be used efficiently and conveniently anywhere, anytime, by interworking the telephone network and the Web. We also propose a new algorithm for the learning system using VoiceXML: a registration function for the user ID, password and name, and a function for saving individual learning progress, both implemented with VoiceXML and the Web.
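
As a rough illustration of the VoiceXML side of such interworking, the sketch below serves a minimal registration dialog from a Python WSGI handler; the submit URL, the field set and the builtin `digits` grammars are assumptions, not the paper's dialog design.

```python
REGISTRATION_VXML = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="register">
    <field name="userid" type="digits">
      <prompt>Please say your user ID.</prompt>
    </field>
    <field name="password" type="digits">
      <prompt>Please say your password.</prompt>
    </field>
    <filled>
      <!-- hand the values back to the web application that keeps the learning-progress records -->
      <submit next="https://example.com/learning/register" namelist="userid password"/>
    </filled>
  </form>
</vxml>
"""

def voicexml_app(environ, start_response):
    """Minimal WSGI handler returning the dialog to a VoiceXML gateway."""
    start_response("200 OK", [("Content-Type", "application/voicexml+xml")])
    return [REGISTRATION_VXML.encode("utf-8")]
```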

Communication Support System for ALS Patient Based on Text Input Interface Using Eye Tracking and Deep Learning Based Sound Synthesis (눈동자 추적 기반 입력 및 딥러닝 기반 음성 합성을 적용한 루게릭 환자 의사소통 지원 시스템)

  • Park Hyunjoo;Jeong Seungdo
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.20 no.2
    • /
    • pp.27-36
    • /
    • 2024
  • Accidents or disease can lead to acquired dysphonia. For such cases, we propose a new input interface based on eye movements to facilitate communication for patients. Unlike existing methods that present the English alphabet as it is, we reorganized the keyboard layout to support the Korean alphabet and designed it so that patients can enter words by themselves using only eye movement, gaze and blinking. The proposed interface not only reduces fatigue by minimizing eye movement, but also allows easy and quick input through an intuitive arrangement. For natural communication, we also implemented a system that allows patients who are unable to speak to communicate in their own voice: it tracks eye movements to record what the patient is trying to say, then uses Glow-TTS and Multi-band MelGAN to synthesize speech from the learned voice, so the output sounds like the patient's own voice.
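
The Coqui TTS toolkit ships Glow-TTS acoustic models and MelGAN-family vocoders, so a synthesis step of the kind described could be sketched as below; the public English checkpoint name stands in for the patient-specific Korean voice the paper trains, and the exact model id is an assumption.

```python
from TTS.api import TTS

# Public Glow-TTS checkpoint; the paper would instead load a model fine-tuned on the
# patient's recorded voice and vocode it with Multi-band MelGAN.
tts = TTS(model_name="tts_models/en/ljspeech/glow-tts")

text_from_gaze_input = "I would like some water, please."   # sentence assembled by the eye-tracking keyboard
tts.tts_to_file(text=text_from_gaze_input, file_path="patient_voice.wav")
```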