• Title/Summary/Keyword: 음성 훈련

Search Result 281

Analysis and Implementation of Speech/Music Classification for 3GPP2 SMV Codec Based on Support Vector Machine (SMV코덱의 음성/음악 분류 성능 향상을 위한 Support Vector Machine의 적용)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • Journal of the Institute of Electronics Engineers of Korea SP, v.45 no.6, pp.142-147, 2008
  • In this paper, we propose a novel approach to improving the performance of speech/music classification for the selectable mode vocoder (SMV) of 3GPP2 using the support vector machine (SVM). The SVM makes it possible to build an optimal hyperplane that separates the classes without error, such that the distance between the closest vectors and the hyperplane is maximal. We first present an analysis of the features and the classification method adopted in the conventional SMV. Feature vectors applied to the SVM are then selected from relevant SMV parameters for efficient speech/music classification. The performance of the proposed algorithm is evaluated under various conditions and yields better results than the conventional SMV scheme.
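
The max-margin idea the abstract describes can be sketched with a linear SVM trained by stochastic subgradient descent on the hinge loss (a Pegasos-style solver, not the paper's implementation); the two synthetic feature clusters below merely stand in for SMV-derived speech and music parameters:

```python
import numpy as np

# Synthetic 2-D feature clusters standing in for SMV-derived parameters;
# these are NOT the features used in the paper.
rng = np.random.default_rng(0)
speech = rng.normal([-1.0, -1.0], 0.3, (100, 2))
music = rng.normal([1.0, 1.0], 0.3, (100, 2))
X = np.vstack([speech, music])
y = np.array([-1] * 100 + [1] * 100)      # SVM labels in {-1, +1}

# Linear SVM via stochastic subgradient descent on the regularized hinge
# loss (Pegasos-style); the regularizer drives w toward the maximal-margin
# hyperplane, i.e., maximal distance to the closest vectors.
w, lam = np.zeros(2), 0.01
for t in range(1, 3001):
    eta = 1.0 / (lam * t)
    i = rng.integers(len(X))
    if y[i] * (X[i] @ w) < 1:             # sample violates the margin
        w = (1 - eta * lam) * w + eta * y[i] * X[i]
    else:
        w = (1 - eta * lam) * w

predict = lambda pts: np.sign(np.asarray(pts) @ w)
print(predict([[-1.0, -1.0], [1.0, 1.0]]))
```

In the paper the feature vectors come from the SMV's internal parameters, and a kernel SVM could be substituted where the classes are not linearly separable.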

Multi-Emotion Regression Model for Recognizing Inherent Emotions in Speech Data (음성 데이터의 내재된 감정인식을 위한 다중 감정 회귀 모델)

  • Moung Ho Yi;Myung Jin Lim;Ju Hyun Shin
    • Smart Media Journal, v.12 no.9, pp.81-88, 2023
  • Recently, online communication has been increasing with the spread of non-face-to-face services during COVID-19. In non-face-to-face situations, the other person's opinions and emotions are recognized through modalities such as text, speech, and images. Research on multimodal emotion recognition that combines these modalities is currently active. Among these, emotion recognition from speech data is attracting attention as a means of understanding emotion through acoustic and linguistic information, but in most cases a single emotion is recognized from a single speech feature value. Because multiple emotions coexist in a conversation in complex ways, a method for recognizing multiple emotions is needed. Therefore, in this paper, we propose a multi-emotion regression model that preprocesses speech data, extracts feature vectors, and accounts for the passage of time in order to recognize complex, inherent emotions.
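
The idea of regressing several emotion intensities at once, rather than predicting a single label, can be sketched with a multi-output linear regression; the feature dimensions, emotion set, and model below are illustrative stand-ins, not the paper's architecture:

```python
import numpy as np

# Multi-output regression: one feature vector per utterance is mapped to a
# vector of emotion intensities jointly, instead of one emotion label.
rng = np.random.default_rng(1)
emotions = ["happiness", "sadness", "anger"]
X = rng.normal(size=(200, 5))                        # 5 stand-in acoustic features
W_true = rng.normal(size=(5, 3))                     # synthetic ground truth
Y = X @ W_true + 0.01 * rng.normal(size=(200, 3))    # 3 emotion intensities

# Least-squares fit for all emotions jointly: minimize ||X W - Y||
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X[:1] @ W                                     # intensities for utterance 0
print(dict(zip(emotions, np.round(pred[0], 2))))
```

A real system would replace the linear map with a learned network and the synthetic features with preprocessed speech features, but the output shape, one score per emotion, is the point being illustrated.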

A Study on the Impact of Speech Data Quality on Speech Recognition Models

  • Yeong-Jin Kim;Hyun-Jong Cha;Ah Reum Kang
    • Journal of the Korea Society of Computer and Information, v.29 no.1, pp.41-49, 2024
  • Speech recognition technology is continuously advancing and widely used in various fields. In this study, we investigated the impact of speech data quality on speech recognition models by dividing the dataset into the entire dataset and the top 70% by Signal-to-Noise Ratio (SNR). Using Seamless M4T and Google Cloud Speech-to-Text, we examined the text transcription results of each model and evaluated them with the Levenshtein distance. Experimental results revealed that Seamless M4T scored 13.6 on the high-SNR data, lower than its score of 16.6 on the entire dataset. However, Google Cloud Speech-to-Text scored 8.3 on the entire dataset, indicating lower performance there than on the high-SNR data. This suggests that using high-SNR data when training a new speech recognition model can affect performance, and that the Levenshtein distance can serve as a metric for evaluating speech recognition models.
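
The evaluation metric used here, the Levenshtein distance, counts the minimum number of single-character insertions, deletions, and substitutions needed to turn a model's transcript into the reference text. A minimal implementation:

```python
# Levenshtein (edit) distance with a rolling one-row dynamic-programming table.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))           # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution (or match)
        prev = cur
    return prev[-1]

print(levenshtein("speech data", "speech date"))  # one substitution -> 1
```

A lower distance means the transcript is closer to the reference, which is why the lower scores above indicate better recognition.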

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea, v.41 no.3, pp.359-366, 2022
  • In this paper, we present a Korean menu-ordering sentence Text-to-Speech (TTS) system using Conformer-based FastSpeech2. The Conformer is a convolution-augmented Transformer, originally proposed for speech recognition. By combining the two structures, the Conformer extracts both local and global features well. It comprises two half feed-forward modules at the front and the end, sandwiching a multi-head self-attention module and a convolution module. We introduce the Conformer to Korean TTS because it is known to work well in Korean speech recognition. To compare a Transformer-based TTS model with a Conformer-based one, we trained both FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used it to train our models. This corpus comprises not only general conversation but also menu-ordering conversation consisting mainly of loanwords, addressing current Korean TTS models' degradation on loanwords. When synthesized speech was generated with Parallel WaveGAN, the Conformer-based FastSpeech2 achieved a superior MOS of 4.04. We confirm that model performance improved when the same structure was changed from Transformer to Conformer in Korean TTS.
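
The block structure described above, two half feed-forward modules sandwiching multi-head self-attention and a convolution module, can be sketched as follows; every sub-module here is a toy stand-in chosen only to make the macaron ordering and residual connections concrete, not the actual Conformer layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W1, W2 = rng.normal(size=(d, 4 * d)) * 0.1, rng.normal(size=(4 * d, d)) * 0.1

def layer_norm(x):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + 1e-5)

def ffn(x):                      # position-wise feed-forward (toy weights)
    return np.maximum(layer_norm(x) @ W1, 0) @ W2

def self_attention(x):           # single head, no learned projections
    s = x @ x.T / np.sqrt(d)
    a = np.exp(s - s.max(-1, keepdims=True))
    return (a / a.sum(-1, keepdims=True)) @ x

def conv_module(x, k=3):         # local averaging over time, depthwise-style
    pad = np.pad(x, ((k // 2, k // 2), (0, 0)), mode="edge")
    return np.stack([pad[i:i + k].mean(0) for i in range(len(x))])

def conformer_block(x):
    x = x + 0.5 * ffn(x)                     # first half feed-forward (residual)
    x = x + self_attention(layer_norm(x))    # multi-head self-attention module
    x = x + conv_module(layer_norm(x))       # convolution module
    x = x + 0.5 * ffn(x)                     # second half feed-forward
    return layer_norm(x)

out = conformer_block(rng.normal(size=(10, d)))  # 10 frames, d features each
print(out.shape)
```

The attention path captures global context across all frames while the convolution path captures local context, which is the intuition behind the Conformer's "better local and global features".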

Voice therapy for pitch problems following thyroidectomy without laryngeal nerve injury (신경학적 손상이 없는 갑상선 술 후 음도문제의 음성치료)

  • Ji-sung Kim;Mi-jin Kim
    • Phonetics and Speech Sciences, v.15 no.3, pp.53-58, 2023
  • After thyroidectomy, some patients who show normal vocal cord movement still complain of subjective voice problems, which can reduce communication-related quality of life. This study investigates the effectiveness of a newly designed voice therapy applying neck exercise and semi-occluded vocal tract exercise (SOVTE) to improve voice problems after thyroidectomy without neurological injury. For this purpose, the voice therapy was randomly assigned to 10 women who underwent thyroidectomy. Acoustic analysis [fundamental frequency, jitter, shimmer, noise-to-harmonics ratio, min Voice Range Profile (VRP), max VRP, VRP] was performed before and after surgery and immediately after voice therapy to compare voice changes. The study showed a statistically significant increase in max VRP and VRP after voice therapy compared to before surgery. These results suggest that the voice therapy methods in this study effectively improve a major symptom of voice problems after thyroidectomy, specifically the reduction in the high-frequency range. However, this study was limited by its small number of participants and did not control for the type of surgery. Therefore, further research with larger sample sizes and controlled variables is needed to investigate the long-term effects of voice therapy.

Korean Speakers' Pronunciation and Pronunciation Training of English Stops (한국인의 영어 폐쇄음 발화와 발화 훈련)

  • Kim, Ji-Eun
    • Phonetics and Speech Sciences, v.2 no.3, pp.29-36, 2010
  • The purposes of this study are (1) to see whether a language transfer effect appears in Korean speakers' pronunciation of English stops and to correct the resulting errors, and (2) to investigate the effectiveness of mimicry training and Speech Analyzer training on subjects' pronunciation of English stops. For these purposes, the VOT values of 20 Korean speakers' English stops were measured using Speech Analyzer, and their post-training production was compared with their pre-training production. The results show that Korean speakers have no difficulty correcting pronunciation errors in English voiceless and voiced stops, indicating that the language transfer effect is not as noticeable as expected. In addition, the pronunciation training results show that training with Speech Analyzer is more effective than mimicry training.


A Study on DTW Reference Pattern Creation Using Genetic Algorithm (유전자 알고리듬을 이용한 DTW 참조패턴 생성에 관한 연구)

  • 서광석
    • Proceedings of the Acoustical Society of Korea Conference, 1998.06e, pp.385-388, 1998
  • In DTW-based speech recognition, the reference pattern has a decisive influence on the recognition rate, so generating the most suitable reference pattern is critical. One way to improve the recognition rate is to use multiple reference patterns, but this approach suffers from excessive computation and increased memory use. In this paper, therefore, to obtain a high recognition rate with fewer reference patterns, a genetic algorithm is used to generate better reference patterns for speech recognition. Two selection methods were compared experimentally: choosing, among the training data, the pattern whose accumulated DTW distance to the others is minimal, and selection by genetic algorithm. Using the minimum accumulated distance yielded a recognition rate of 98.33%, whereas the genetic algorithm achieved a speaker-dependent recognition rate of 100%.
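
The accumulated-DTW-distance selection can be sketched as follows; the sequences here are scalar-valued toys, whereas a real recognizer would compare frame-wise feature vectors:

```python
import numpy as np

# Classic DTW distance via a dynamic-programming table: each cell holds the
# cheapest warped alignment cost of the two prefixes ending there.
def dtw(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],        # stretch a
                                 D[i, j - 1],        # stretch b
                                 D[i - 1, j - 1])    # advance both
    return D[n, m]

patterns = [np.array([1., 2., 3., 4.]),
            np.array([1., 2., 2., 3., 4.]),
            np.array([0., 2., 3., 5.])]
# Minimum-accumulated-distance selection described in the abstract: pick the
# training pattern whose summed DTW distance to all others is smallest.
totals = [sum(dtw(p, q) for q in patterns) for p in patterns]
print(int(np.argmin(totals)))
```

The genetic-algorithm variant would instead evolve candidate reference patterns, using the same accumulated DTW distance (negated) as the fitness function.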


A Comparison of SaO2 and PaCO2 Changes Pre- and Post-Vocal Training in Classical Singers (발성훈련 전 후의 혈중산소포화도(SaO2)와 폐포 내 이산화탄소분압(PaCO2)의 비교 연구)

  • Nam, Do-Hyun;Ahn, Chul-Min
    • Proceedings of the KSPS conference, 2007.05a, pp.261-264, 2007
  • Five male trained singers (age: 25.0±1.4 years, career: 6.8±1.1 years) and five female trained singers (age: 22.0±1.0 years, career: 5.8±1.2 years) participated in this study. SaO2 (oxyhemoglobin saturation), measured with an Oxy-Pulse meter, and PaCO2 (partial pressure of alveolar CO2), measured with the Quick et CO2, were compared before and after vocal training. As a result, PaCO2 was lower than the normal range (36-40 mmHg) after vocal training, indicating hypocapnia, which can cause headache and dizziness.


Implementation of CNN in the view of mini-batch DNN training for efficient second order optimization (효과적인 2차 최적화 적용을 위한 Minibatch 단위 DNN 훈련 관점에서의 CNN 구현)

  • Song, Hwa Jeon;Jung, Ho Young;Park, Jeon Gue
    • Phonetics and Speech Sciences, v.8 no.2, pp.23-30, 2016
  • This paper describes implementation schemes for a CNN, viewed as mini-batch DNN training, for efficient second-order optimization. The parameters of the CNN are trained with the same update procedure as a DNN by simply arranging an input image as a sequence of local patches, which is equivalent to mini-batch DNN training. Through this conversion, second-order optimization, which provides higher performance, can be applied directly to train the CNN parameters. In both image recognition on the MNIST DB and syllable-level automatic speech recognition, the proposed CNN implementation shows better performance than a DNN-based one.
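
The rearrangement described above, unrolling an image into a batch of local patches so that a convolution becomes an ordinary fully connected multiplication, is the familiar im2col trick. A minimal sketch:

```python
import numpy as np

# im2col: extract every k-by-k window of the image and flatten it into a row,
# turning one image into a "mini-batch" of patch vectors.
def im2col(img, k):
    h, w = img.shape
    patches = [img[i:i + k, j:j + k].ravel()
               for i in range(h - k + 1) for j in range(w - k + 1)]
    return np.stack(patches)                # (num_patches, k*k)

img = np.arange(16.0).reshape(4, 4)
cols = im2col(img, 3)                       # 4 patches of 9 values each
kernel = np.ones(9) / 9.0                   # 3x3 mean filter, flattened
out = cols @ kernel                         # convolution as a DNN-style matmul
print(cols.shape, out.reshape(2, 2))
```

Because every patch row is updated by the same weight vector, the convolution layer's parameters can be trained with exactly the machinery (including second-order methods) used for a fully connected mini-batch.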

Remote Articulation Training System for the Deaf (청각장애자를 위한 원격조음훈련시스템의 개발)

  • 이재혁;유선국;박상희
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics, v.7 no.1, pp.43-49, 1996
  • In this study, a remote articulation training system that connects a hearing-impaired trainee and a speech therapist via B-ISDN is introduced. A hearing-impaired trainee has no auditory feedback on his own pronunciation, so the chance to watch the movement trajectory of his speech organs offers a means of self-training articulation. The system thus serves two purposes: self articulation training and the trainer's on-line checking from a remote place. We estimate the vocal-tract articulatory movements from the speech signal using inverse modelling and display the movement trajectory graphically on a side view of the human face. The trainee's articulation trajectory is displayed along with a reference trajectory, so the trainee can adjust his articulation to make the two trajectories overlap. For on-line communication and checking of training records, the system provides video conferencing and transfer of articulatory data.
