• Title/Summary/Keyword: Voice learning

A Study on Voice Command Learning of Smart Toy using Convolutional Neural Network (합성곱 신경망을 이용한 스마트 토이의 음성명령 학습에 관한 연구)

  • Lee, Kyung-Min;Park, Chul-Won
    • The Transactions of The Korean Institute of Electrical Engineers, v.67 no.9, pp.1210-1215, 2018
  • Recently, as IoT (Internet of Things) and AI (Artificial Intelligence) technologies have developed, smart toys that can understand and act on human language are being studied. In this paper, we study voice learning using a CNN (Convolutional Neural Network) by applying AI-based voice assistant technology to a smart toy. When a human gives a voice command, the smart toy recognizes the voice, converts it into text, analyzes the morphemes, performs tagging, and conducts voice learning. In tests of a simulator program implemented in Python, no malfunctions occurred for single commands, and satisfactory results were obtained within the selected range of simulation conditions.
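
As a rough illustration of the pipeline this abstract describes (speech-to-text output, morpheme analysis and tagging, then CNN-based command learning), the following Python sketch wires a Korean morpheme analyzer to a small 1D CNN classifier. It is not the paper's code: the command set, labels, vocabulary handling, and network shape are all assumptions, and KoNLPy and TensorFlow are assumed to be installed.

```python
# Illustrative sketch, not the paper's code: text command -> morpheme analysis/tagging
# -> 1D CNN classifier. Commands, labels, and network shape are made up for demonstration.
import numpy as np
import tensorflow as tf
from konlpy.tag import Okt  # Korean morpheme analyzer (assumed installed)

okt = Okt()

def to_morpheme_ids(text, vocab, max_len=8):
    """Morpheme-analyze and tag a command, then map morphemes to integer ids."""
    morphs = [m for m, _tag in okt.pos(text)]
    ids = [vocab.setdefault(m, len(vocab) + 1) for m in morphs][:max_len]
    return ids + [0] * (max_len - len(ids))  # zero-pad to a fixed length

vocab = {}
commands = ["앞으로 가", "뒤로 가", "멈춰"]  # hypothetical single voice commands (as text)
labels = np.array([0, 1, 2])
x = np.array([to_morpheme_ids(c, vocab) for c in commands])

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=len(vocab) + 2, output_dim=16),
    tf.keras.layers.Conv1D(32, kernel_size=2, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, labels, epochs=50, verbose=0)
print(model.predict(x).argmax(axis=1))  # should approach [0, 1, 2] on this toy data
```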

Voice Recognition-Based on Adaptive MFCC and Deep Learning for Embedded Systems (임베디드 시스템에서 사용 가능한 적응형 MFCC 와 Deep Learning 기반의 음성인식)

  • Bae, Hyun Soo;Lee, Ho Jin;Lee, Suk Gyu
    • Journal of Institute of Control, Robotics and Systems, v.22 no.10, pp.797-802, 2016
  • This paper proposes a novel voice recognition method based on adaptive MFCC and deep learning for embedded systems. To enhance the recognition ratio of the proposed voice recognizer, ambient noise mixed into the voice signal has to be eliminated. However, noise filtering processes, which may damage voice data, diminish the recognition ratio. In this paper, a filter has been designed for the frequency range of the voice signal, and weights are imposed to reduce data deterioration. In addition, a deep learning algorithm that does not require a database in the recognition stage has been adapted for embedded systems, which inherently have small amounts of memory. The experimental results suggest that the proposed deep learning algorithm and an HMM voice recognizer, both utilizing the proposed adaptive MFCC algorithm, achieve a better recognition ratio in a noisy environment than conventional MFCC algorithms.
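
The core idea here, as described, is to weight the spectrum toward the voice band before computing MFCCs rather than hard-filtering the noise away. The sketch below is only an illustration of that idea with librosa, not the authors' adaptive MFCC; the 300-3400 Hz band, the 0.3 out-of-band weight, and all frame parameters are assumptions.

```python
# Illustrative frequency-weighted MFCC extraction (a reading of the idea, not the
# authors' adaptive MFCC). Assumes librosa and numpy; band limits and weights are assumptions.
import numpy as np
import librosa

def weighted_mfcc(y, sr, n_mfcc=13, lo=300.0, hi=3400.0, out_of_band_weight=0.3):
    S = np.abs(librosa.stft(y, n_fft=512, hop_length=160)) ** 2  # power spectrogram
    freqs = librosa.fft_frequencies(sr=sr, n_fft=512)
    # Emphasize the typical voice band instead of hard filtering, so less
    # voice information is thrown away along with the noise.
    w = np.where((freqs >= lo) & (freqs <= hi), 1.0, out_of_band_weight)
    mel = librosa.feature.melspectrogram(S=S * w[:, None], sr=sr)
    return librosa.feature.mfcc(S=librosa.power_to_db(mel), n_mfcc=n_mfcc)

# Placeholder audio: downloads one of librosa's example clips
y, sr = librosa.load(librosa.example("trumpet"), sr=16000)
print(weighted_mfcc(y, sr).shape)  # (13, number_of_frames)
```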

Artificial Intelligence for Clinical Research in Voice Disease (후두음성 질환에 대한 인공지능 연구)

  • Seok, Jungirl;Kwon, Tack-Kyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.33 no.3, pp.142-155, 2022
  • Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can be used as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms such as machine learning, led by the latest deep learning technology, began with binary classification that distinguishes normal from pathological voices; it has since contributed to improving the accuracy of multi-class classification of various types of pathological voices. However, no conclusions that can be applied in the clinical field have yet been reached. Most studies on pathological voice classification have used the sustained vowel /ah/, which is relatively easier to analyze than continuous or running speech. However, continuous speech has the potential to yield more accurate results, as additional information can be obtained from the change in the voice signal over time. In this review, terms related to artificial intelligence research are explained and the latest trends in machine learning and deep learning algorithms are reviewed; furthermore, recent research results and limitations are introduced to provide future directions for researchers.
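
For readers unfamiliar with the binary normal-versus-pathological setup the review refers to, a minimal sketch might look like the following: summary features from a sustained /ah/ recording fed to a standard classifier. The file names, labels, feature choice, and classifier are placeholders, not any study's actual protocol; librosa and scikit-learn are assumed.

```python
# Minimal sketch of the binary normal-vs-pathological setup (illustrative only).
# File names and labels are placeholders; assumes librosa and scikit-learn.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def vowel_features(path):
    """Summarize a sustained /ah/ recording as mean and std of its MFCCs."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical dataset: sustained-vowel recordings labeled 0 (normal) or 1 (pathological)
paths = ["normal_01.wav", "normal_02.wav", "dysphonic_01.wav", "dysphonic_02.wav"]
labels = np.array([0, 0, 1, 1])
X = np.stack([vowel_features(p) for p in paths])
print(cross_val_score(SVC(kernel="rbf"), X, labels, cv=2).mean())
```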

Real time instruction classification system

  • Sang-Hoon Lee;Dong-Jin Kwon
    • International Journal of Internet, Broadcasting and Communication, v.16 no.3, pp.212-220, 2024
  • With the recent advancement of society, AI technology has made significant strides, especially in the fields of computer vision and voice recognition. This study introduces a system that leverages these technologies to recognize users through a camera and relay commands within a vehicle based on voice commands. The system uses the YOLO (You Only Look Once) machine learning algorithm, widely used for object and entity recognition, to identify specific users. For voice command recognition, a machine learning model based on spectrogram voice analysis is employed to identify specific commands. This design aims to enhance security and convenience by preventing unauthorized access to vehicles and IoT devices by anyone other than registered users. We convert camera input data into YOLO system inputs to determine whether a person is present. Additionally, the system collects voice data through a microphone embedded in the device or computer and converts it into spectrogram data to be used as input for the voice recognition machine learning system. The camera image data and voice data undergo inference through pre-trained models, enabling the recognition of simple commands within a limited space based on the inference results. This study demonstrates the feasibility of constructing a device management system within a confined space that enhances security and user convenience through a simple real-time system model. Finally, our work aims to provide practical solutions in various application fields, such as smart homes and autonomous vehicles.
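
A minimal sketch of the gating logic described above (act on a voice command only when the camera confirms a person) could look like the following, assuming the ultralytics YOLO package and librosa. The spectrogram parameters, file paths, and the command classifier itself are placeholders; the paper's own models and registered-user matching are not reproduced here.

```python
# Sketch of the camera-gated voice command flow (illustrative; not the authors' system).
# Assumes the ultralytics and librosa packages; the command classifier is a placeholder.
import librosa
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")  # pretrained COCO model; class id 0 is "person"

def person_present(frame) -> bool:
    """Run YOLO on a camera frame and report whether any person was detected."""
    result = detector(frame, verbose=False)[0]
    return any(int(c) == 0 for c in result.boxes.cls)

def command_spectrogram(path, sr=16000):
    """Convert a short voice clip into a log-mel spectrogram (one common choice)."""
    y, _ = librosa.load(path, sr=sr, duration=1.0)
    return librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64))

def handle(frame, audio_path, classify_command):
    """classify_command stands in for a pretrained spectrogram command model."""
    if not person_present(frame):
        return "ignored: no user in view"
    return classify_command(command_spectrogram(audio_path))

# Example wiring (hypothetical):
# label = handle(camera_frame, "clip.wav", spectrogram_model.predict)
```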

Zero-shot voice conversion with HuBERT

  • Hyelee Chung;Hosung Nam
    • Phonetics and Speech Sciences, v.15 no.3, pp.69-74, 2023
  • This study introduces an innovative model for zero-shot voice conversion that utilizes the capabilities of HuBERT. Zero-shot voice conversion models can transform the speech of one speaker to mimic that of another, even when the model has not been exposed to the target speaker's voice during the training phase. Comprising five main components (HuBERT, feature encoder, flow, speaker encoder, and vocoder), the model offers remarkable performance across a range of scenarios. Notably, it excels in the challenging unseen-to-unseen voice-conversion tasks. The effectiveness of the model was assessed based on the mean opinion scores and similarity scores, reflecting high voice quality and similarity to the target speakers. This model demonstrates considerable promise for a range of real-world applications demanding high-quality voice conversion. This study sets a precedent in the exploration of HuBERT-based models for voice conversion, and presents new directions for future research in this domain. Despite its complexities, the robust performance of this model underscores the viability of HuBERT in advancing voice conversion technology, making it a significant contributor to the field.
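
Of the five components listed, the HuBERT content encoder is the one with a readily available pretrained checkpoint, so a sketch of that first stage is shown below using torchaudio. The flow module, speaker encoder, and vocoder are omitted, and the audio path is a placeholder; this is not the authors' model.

```python
# Sketch of the content-feature stage only, using a pretrained HuBERT from torchaudio.
# Not the authors' model: the flow, speaker encoder, and vocoder are omitted,
# and the audio path is a placeholder.
import torch
import torchaudio

bundle = torchaudio.pipelines.HUBERT_BASE
hubert = bundle.get_model().eval()

waveform, sr = torchaudio.load("source_speaker.wav")  # placeholder path
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # Layer-wise hidden states; zero-shot VC systems typically feed one of these
    # "content" representations to a decoder conditioned on a target-speaker embedding.
    features, _ = hubert.extract_features(waveform)
print(len(features), features[-1].shape)  # num_layers, then [1, frames, 768]
```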

A study on the vowel extraction from the word using the neural network (신경망을 이용한 단어에서 모음추출에 관한 연구)

  • 이택준;김윤중
    • Proceedings of the Korea Society for Industrial Systems Conference, 2003.11a, pp.721-727, 2003
  • This study designed and implemented a system to extract vowels from a word. The system comprises a voice feature extraction module and a neural network module. The voice feature extraction module uses an LPC (Linear Prediction Coefficient) model to extract voice features from a word. The neural network module comprises a learning module and a voice recognition module. The learning module sets up a learning pattern and builds a neural network to train. Using the information from the trained neural network, the voice recognition module extracts vowels from a word. A neural network was trained on selected vowels (a, eo, o, e, i) to test the performance of the implemented vowel extraction system. Through this experiment, it was confirmed that the speech recognition module could extract vowels from four words.
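
A rough Python sketch of the described pipeline (frame-wise LPC features into a small neural network, then reading off the vowels found in a word) is given below. It is illustrative only: the LPC order, frame sizes, file names, and the use of scikit-learn's MLPClassifier in place of the paper's network are all assumptions.

```python
# Rough sketch of the described pipeline: frame-wise LPC features -> small neural
# network -> vowel labels found in a word. Illustrative only; file names, LPC order,
# frame sizes, and the use of scikit-learn's MLPClassifier are assumptions.
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

VOWELS = ["a", "eo", "o", "e", "i"]

def lpc_frames(y, order=12, frame=400, hop=160):
    """LPC coefficients for each 25 ms frame (the leading coefficient 1.0 is dropped)."""
    frames = librosa.util.frame(y, frame_length=frame, hop_length=hop).T
    return np.stack([librosa.lpc(f.astype(float), order=order)[1:] for f in frames])

# Hypothetical training data: one recording per vowel
X, t = [], []
for label, path in enumerate(f"vowel_{v}.wav" for v in VOWELS):
    y, _sr = librosa.load(path, sr=16000)
    feats = lpc_frames(y)
    X.append(feats)
    t.extend([label] * len(feats))

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
clf.fit(np.concatenate(X), t)

# Extracting vowels from a word: classify each frame, keep the distinct labels
word, _sr = librosa.load("word.wav", sr=16000)
print([VOWELS[i] for i in sorted(set(clf.predict(lpc_frames(word))))])
```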

Increasing Persona Effects: Does It Matter the Voice and Appearance of Animated Pedagogical Agent

  • RYU, Jeeheon;KE, Fengfeng
    • Educational Technology International, v.19 no.1, pp.61-91, 2018
  • Animated pedagogical agents have been implemented to promote learning outcomes and motivation in multimedia learning. It has been claimed that one of the advantages of using a pedagogical agent is the persona effect: the personalization or social presence of the agent can enhance learning engagement and motivation. However, prior research is inconclusive as to whether and how the features of the pedagogical agent affect the persona effect. This study investigated whether similarity between a pedagogical agent and the real instructor in voice and appearance would improve students' perception of the agent's persona. The study also examined the effect of the size of the pedagogical agent on persona perception. Two experiments were conducted with a total of 115 college students. Experiment 1 indicated a significant main effect of voice on persona perception. Experiment 2 examined whether the size of the pedagogical agent would moderate the voice effect on persona perception. The results showed that the instructor-like voice yielded higher persona perception regardless of the pedagogical agent's size. Overall, the findings indicated that similarity in voice positively fostered the agent's persona.

Effects of Motor Learning Guided Laryngeal Motor Control Therapy for Muscle Misuse Dysphonia (운동학습이론에 기초한 발성운동조절법이 근오용성 발성장애의 음성에 미치는 효과)

  • Seo, In-Hyo;Lee, Ok-Bun;Lee, Sang-Joon;Chung, Phil-Sang
    • Phonetics and Speech Sciences, v.3 no.3, pp.133-140, 2011
  • Muscle misuse dysphonia (MMD) is defined as a behavioral voice disorder resulting from inappropriate contractions of intrinsic and/or extrinsic laryngeal muscles. The purpose of this study was to investigate the effect of motor learning guided laryngeal motor control therapy (MLG-LMCT), which is designed to improve on existing laryngeal manual therapy (LMT) and provide more effective voice treatment for people with muscle misuse dysphonia. Forty-six people with MMD (M:F = 16:30) participated in this study. The participants' voice samples were recorded before and after voice therapy to investigate the effect of MLG-LMCT. Voice samples were analyzed via electroglottography (EGG), and contact quotient (CQ), speed quotient (SQ), and waveform were reported. In addition, perceptual and acoustic evaluations were conducted to determine the change in voice after treatment. The experimenter massaged the tensed muscles around the neck. To help the subjects find more appropriate phonation, the experimenter showed them their EGG waveforms, indicating whether they were moving the vocal folds to the appropriate position; the EGG waveform thus served as a type of visual feedback. With the waveform, the experimenter helped subjects move the vocal folds and laryngeal muscles to find more appropriate voice production, and the sensory stimuli from the experimenter were gradually faded out. A paired t-test revealed significant differences in CQ between pre- and post-therapy. Perceptually, overall severity, roughness, breathiness, strain, and transition were significantly reduced. Acoustically, there were significant differences in F0, jitter, shimmer, and NHR. After MLG-LMCT, most of the subjects showed improvements in voice quality. These results led us to the following conclusion: motor learning guided laryngeal motor control therapy (MLG-LMCT) reduces muscle misuse dysphonia, possibly because visual feedback from the EGG waveform helps maintain the muscle tension reduction achieved by laryngeal manual therapy. For people with MMD whose muscle tension is reduced by LMT but who still do not appropriately position the larynx or adduct the vocal folds, MLG-LMCT may be an alternative therapy approach.
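
Since the study's main EGG measure is the contact quotient, a minimal sketch of how CQ might be estimated from an EGG trace with a simple threshold criterion is shown below. This is illustrative only and not the clinical analysis used in the study; the 35% threshold, the peak-picking settings, and the synthetic test signal are assumptions.

```python
# Minimal sketch of estimating a contact quotient (CQ) from an EGG trace with a simple
# threshold criterion; illustrative only, not the clinical analysis used in the study.
# The 35% threshold, peak-picking settings, and synthetic signal are assumptions.
import numpy as np
from scipy.signal import find_peaks

def contact_quotient(egg, sr, threshold=0.35):
    """Mean CQ: per glottal cycle, the fraction of the period the EGG stays above threshold."""
    egg = (egg - egg.min()) / (egg.max() - egg.min())      # normalize to 0..1
    # Approximate cycle boundaries by the EGG minima (maximum glottal opening)
    troughs, _ = find_peaks(-egg, distance=int(sr / 500))  # assumes f0 <= 500 Hz
    cqs = [np.mean(egg[a:b] > threshold) for a, b in zip(troughs[:-1], troughs[1:])]
    return float(np.mean(cqs))

# Synthetic EGG-like signal for demonstration: 150 Hz contact pattern, 1 s at 10 kHz
sr = 10000
t = np.arange(sr) / sr
egg = np.clip(np.sin(2 * np.pi * 150 * t), 0, None) ** 2
print(round(contact_quotient(egg, sr), 3))  # roughly 0.3 for this synthetic signal
```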

Voice Recognition Softwares: Their implications to second language teaching, learning, and research

  • Park, Chong-won
    • Speech Sciences, v.7 no.3, pp.69-85, 2000
  • Recently, Computer Assisted Language Learning (CALL) has received wide attention from diverse audiences. However, to the author's knowledge, relatively little attention has been paid to the educational implications of voice recognition (VR) software in language teaching in general, and in teaching and learning pronunciation in particular. This study explores and extends the applicability of VR software to second language research, addressing how VR software might facilitate the process of entering interview data. To aid readers' understanding in this field, the background of classroom interaction research and the rationale for why interview data, and therefore the role of VR software, becomes critical in this area of inquiry are discussed. The development of VR software and a brief report on the features of up-to-date VR software are then sketched. Finally, suggestions are offered for future studies investigating the impact of VR software on second language learning, teaching, and research.

Development of voice pen-pal application of global communication system by voice message

  • Lau, Shuai
    • Korean Journal of Artificial Intelligence, v.2 no.1, pp.1-3, 2014
  • These days, interest in and demand for smart learning have rapidly increased, and video English lessons and mobile English speaking services have become popular. This study presents a prototype application for sending and receiving voice messages with people around the world, offering a new concept of voice pen-pals that goes beyond the exchange of text messages. In a modern society with rapidly increasing demand for smart learning, users can study a foreign language on a smartphone and communicate with foreigners by voice anytime and anywhere, and the app enables this kind of global exchange for learning conversation. Recruiting initial users and establishing a profit model remain open problems, which we plan to address in further development.