• Title/Summary/Keyword: voice quality features

A Real-time Implementation of G.729.1 Codec on an ARM Processor for the Improvement of VoWiFi Voice Quality (VoWiFi 음질 향상을 위한 G.729.1 광대역 코덱의 ARM 프로세서에의 실시간 구현)

  • Park, Nam-In;Kang, Jin-Ah;Kim, Hong-Kook
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2008.02a / pp.230-235 / 2008
  • This paper addresses issues associated with the real-time implementation of a wideband speech codec such as ITU-T G.729.1 on an ARM processor in order to improve the voice quality of a VoWiFi service. The real-time implementation involves optimizing the C source code of G.729.1 and replacing several parts of the codec algorithm with faster ones. The performance of the implementation is measured by the CPU time that G.729.1 consumes on the ARM926EJ processor used in a VoWiFi phone. The experiments show that the G.729.1 codec runs in real time and provides better voice quality than the G.729 codec conventionally used in VoIP and VoWiFi phones.
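
The real-time criterion behind this kind of measurement is simple arithmetic: a frame-based codec keeps up when the CPU time spent per frame is shorter than the frame itself. The Python sketch below assumes G.729.1's 20 ms frame; the function names, cycle counts, and 200 MHz clock are hypothetical placeholders, not figures from the paper.

```python
# Real-time feasibility check for a frame-based speech codec.
# Assumption: 20 ms frames (G.729.1); all cycle counts below are hypothetical.

FRAME_MS = 20.0

def required_mhz(cycles_per_frame: int, frame_ms: float = FRAME_MS) -> float:
    """Clock rate (MHz) needed to process one frame within its own duration."""
    frames_per_second = 1000.0 / frame_ms
    return cycles_per_frame * frames_per_second / 1e6

def cpu_load(cycles_per_frame: int, clock_mhz: float, frame_ms: float = FRAME_MS) -> float:
    """Fraction of the CPU consumed; values below 1.0 mean real-time operation."""
    return required_mhz(cycles_per_frame, frame_ms) / clock_mhz

if __name__ == "__main__":
    # Hypothetical per-frame cycle counts for encoder + decoder (full-duplex call).
    enc_cycles, dec_cycles = 1_500_000, 500_000
    total = enc_cycles + dec_cycles
    load = cpu_load(total, clock_mhz=200.0)
    print(f"needs {required_mhz(total):.1f} MHz, load {load:.0%}, real-time: {load < 1.0}")
```

Encoder and decoder are counted together because a phone in a call must do both for every frame; a real budget would also need headroom for the network stack and other tasks.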

Design of a variable rate speech codec for the W-CDMA system (W-CDMA 시스템을 위한 가변율 음성코덱 설계)

  • 정우성
    • Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.142-147 / 1998
  • Recently, the 8 kb/s CS-ACELP coder, G.729, was standardized by ITU-T SG15, and its speech quality has been reported to be better than or equal to that of 32 kb/s ADPCM. However, G.729 is a fixed-rate speech coder and does not exploit voice activity in two-way conversation. Exploiting voice activity can halve the average bit rate without any degradation of speech quality. In this paper, we propose an efficient variable-rate algorithm for G.729. The variable-rate algorithm consists of two main parts: the rate determination algorithm and the sub-rate coder design. For the rate determination algorithm, we combine the energy-thresholding method, a phonetic segmentation method that integrates various feature parameters obtained during analysis, and a variable hangover period method. Based on an analysis of noise features, a 1 kb/s sub-rate coder is designed for coding the background noise signal, and a 4 kb/s sub-rate coder is designed for unvoiced segments. The performance of the variable-rate algorithm is evaluated by comparing its speech quality and average bit rate with those of G.729, and a subjective MOS test is also conducted. The results verify that the proposed variable-rate CS-ACELP coder achieves the same speech quality as G.729 at an average bit rate of 4.4 kb/s.
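
Rate determination can be sketched as a per-frame decision: frames below an energy threshold are coded at the 1 kb/s background-noise rate, active frames get 8 kb/s or the 4 kb/s unvoiced rate, and a hangover keeps the full rate for a few frames after speech ends. This Python sketch is a simplification under stated assumptions: it uses a fixed hangover instead of the paper's variable one, a zero-crossing-rate test (a hypothetical stand-in) instead of the paper's phonetic segmentation, and illustrative thresholds.

```python
import numpy as np

# Sub-rates named in the abstract (kb/s); the decision logic itself is hypothetical.
FULL_RATE, UNVOICED_RATE, NOISE_RATE = 8.0, 4.0, 1.0

def energy_db(frame: np.ndarray) -> float:
    """Mean frame energy in dB (a small floor avoids log of zero)."""
    return 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame).astype(int)
    return float(np.mean(np.abs(np.diff(signs))))

def select_rates(frames, energy_thresh_db=-40.0, zcr_thresh=0.3, hangover=3):
    """Assign a bit rate to each frame using energy thresholding and a fixed hangover."""
    rates, hang = [], 0
    for frame in frames:
        if energy_db(frame) > energy_thresh_db:
            # Active frame: a high zero-crossing rate is treated as unvoiced (assumption).
            rate = UNVOICED_RATE if zero_crossing_rate(frame) > zcr_thresh else FULL_RATE
            hang = hangover
        elif hang > 0:
            # Hangover: keep full rate briefly so speech offsets are not clipped.
            rate = FULL_RATE
            hang -= 1
        else:
            rate = NOISE_RATE
        rates.append(rate)
    return rates

# Toy signal at 8 kHz in 10 ms frames: voiced tone, unvoiced-like tone, silence.
fs, n = 8000, 80
voiced = 0.5 * np.sin(2 * np.pi * 200 * np.arange(10 * n) / fs)
unvoiced = 0.1 * np.sin(2 * np.pi * 3500 * np.arange(5 * n) / fs)
silence = np.zeros(10 * n)
frames = np.concatenate([voiced, unvoiced, silence]).reshape(-1, n)
rates = select_rates(frames)
print(rates, np.mean(rates))
```

With 10 voiced, 5 unvoiced, and 10 silent frames, the hangover codes three of the silent frames at full rate, giving an average of 5.24 kb/s for this toy signal. Longer hangovers protect word endings at the cost of a higher average rate.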

The Effects of Pitch Increasing Training (PIT) on Voice and Speech of a Patient with Parkinson's Disease: A Pilot Study

  • Lee, Ok-Bun;Jeong, Ok-Ran;Shim, Hong-Im;Jeong, Han-Jin
    • Speech Sciences / v.13 no.1 / pp.95-105 / 2006
  • The primary goal of therapeutic intervention for dysarthric speakers is to increase speech intelligibility, so identifying the critical features that increase intelligibility is very important in speech therapy. The purpose of this study is to investigate the effects of pitch increasing training (PIT) on the speech of a subject with Parkinson's disease (PD). The PIT program focuses on raising pitch while a vowel is sustained at constant loudness, set somewhat higher than the habitual loudness. A 67-year-old female with PD participated in the study. Speech therapy was conducted in 4 sessions (200 minutes in total) over one week. Acoustic, perceptual, and speech naturalness evaluations were performed before and after the treatment, and the Speech and Voice Satisfaction Index (SVSI) was obtained after the treatment. The results showed improvements in voice quality and speech naturalness. In addition, the SVSI ratings indicated a positive relationship between improved speech production and the satisfaction of the patient and her caregivers.

Phonation types of Korean fricatives and affricates

  • Lee, Goun
    • Phonetics and Speech Sciences / v.9 no.4 / pp.51-57 / 2017
  • The current study compared the acoustic features of the two phonation types of Korean fricatives (plain: /s/, fortis: /s'/) and the three types of affricates (aspirated: /$ts^h$/, lenis: /ts/, and fortis: /ts'/) in order to determine the phonetic status of the plain fricative /s/. Considering the different manners of articulation of fricatives and affricates, we examined four acoustic parameters (rise time, intensity, fundamental frequency, and cepstral peak prominence (CPP)) in the productions of 20 native Korean speakers. The results showed that, unlike for the affricates, F0 cannot distinguish the two fricatives, and that voice quality (CPP values) distinguishes the phonation types of Korean fricatives and affricates only by grouping the non-fortis sibilants together. Therefore, based on the similarity found between /$ts^h$/ and /ts/ and the idiosyncratic pattern found in /s/, this research concludes that the non-fortis fricative /s/ cannot be categorized as belonging to either phonation type (aspirated or lenis).
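
CPP measures how strongly a frame's cepstrum peaks at the pitch period relative to its overall trend, so a clearer harmonic structure gives a larger value. The Python sketch below follows one simplified variant of the computation (dB spectrum, real cepstrum in dB, linear trend fitted over the F0 search range); the window, the 60 to 300 Hz search range, and the regression range are assumptions, and the abstract does not state which CPP implementation the study used.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=300.0):
    """CPP in dB: height of the cepstral peak above a linear trend line.

    Simplified sketch: the peak is searched in the quefrency range of
    plausible F0 values, and the trend is fitted over that same range.
    """
    x = frame * np.hanning(len(frame))
    log_spec = 20.0 * np.log10(np.abs(np.fft.fft(x)) + 1e-12)
    ceps = np.abs(np.fft.ifft(log_spec))          # real cepstrum magnitude
    ceps_db = 20.0 * np.log10(ceps + 1e-12)
    quefrency = np.arange(len(frame)) / fs        # seconds

    lo, hi = int(fs / f0_max), int(fs / f0_min)
    q, c = quefrency[lo:hi + 1], ceps_db[lo:hi + 1]
    peak = int(np.argmax(c))
    slope, intercept = np.polyfit(q, c, 1)
    return float(c[peak] - (slope * q[peak] + intercept))

fs, n = 16000, 2048
t = np.arange(n) / fs
# Harmonic "voice" at 125 Hz (63 harmonics below Nyquist) vs. white noise.
harmonic = sum(np.sin(2 * np.pi * k * 125 * t) for k in range(1, 64))
noise = np.random.default_rng(0).standard_normal(n)
cpp_h = cepstral_peak_prominence(harmonic, fs)
cpp_n = cepstral_peak_prominence(noise, fs)
print(f"harmonic: {cpp_h:.1f} dB, noise: {cpp_n:.1f} dB")
```

The periodic signal yields a larger CPP than the noise, which is the property that lets CPP serve as a voice-quality index.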

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences / v.15 no.2 / pp.53-59 / 2023
  • Speech-based detection of children with autism spectrum disorder (ASD) has relied on predefined feature sets because of their ease of use and their speech analysis capabilities. However, such sets may not adequately capture clinical impressions because of the broad range and large number of features they include. This paper demonstrates that knowledge-driven speech features (KDSFs), specifically tailored to the speech traits of ASD, are more effective and efficient than a predefined feature set, the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), for distinguishing the speech of children with ASD from that of children with typical development (TD). The KDSFs encompass speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to distinctive attributes of ASD speech. The speech dataset used in the experiments consists of 63 children with ASD and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. A support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained with the predefined set. This result underscores the importance of incorporating domain knowledge into the development of speech technologies for individuals with disorders.

The Correlation between The Size and Location of Vocal Polyp and Voice Quality, Before and After Laryngeal Microsurgery (후두미세수술 전후 성대 용종의 크기 및 위치가 음성의 질의 변화에 미치는 영향)

  • Han, Won Gue;Kim, Min-Su;Oh, Kyung Ho;Woo, Jeung Soo;Jung, Kwang Yoon;Kwon, Soon Young
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.27 no.2 / pp.102-107 / 2016
  • Background and Objectives: Vocal polyps are caused by inflammation induced by stress or irritation. Many patients with vocal polyps complain of voice discomfort, and surgery such as laryngeal microsurgery has been the mainstay of management. We analyzed the clinical features of vocal polyps and how their size and location affect surgical outcomes. Methods: We retrospectively reviewed 42 patients diagnosed with a single unilateral vocal polyp between March 2014 and December 2015. The size and location of each polyp were measured during laryngeal microsurgery. Voice quality was evaluated with the GRBAS scale, jitter, shimmer, NHR (noise-to-harmonics ratio), MPT (maximum phonation time), and VHI (Voice Handicap Index) before and 4 weeks after the operation. Results: When the patients were divided into a large-polyp group (longest dimension >3 mm) and a small-polyp group (longest dimension $\leq 3$ mm), the pre- to postoperative differences in all parameters tended to be greater in the large-polyp group. However, these differences were not statistically significant (p>0.05). When the patients were divided into two groups by polyp volume, no distinct tendency was found. When polyp location (anterior, mid, or posterior) was compared with the improvement in voice quality, greater change was found for mid-portion polyps in all parameters except VHI. However, these differences were also not statistically significant (p>0.05). Conclusion: The differences in all parameters tended to be greater for large polyps and for polyps located in the mid portion.
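
Jitter and shimmer are cycle-to-cycle perturbation measures. The Python sketch below implements the "local" variants as Praat defines them (mean absolute difference between consecutive cycles, divided by the mean); the abstract does not say which analysis software or jitter variant the study used, and the period and amplitude sequences are made-up examples.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean, in %."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value

def local_jitter(periods_ms):
    """Jitter (local), %: perturbation of consecutive glottal periods."""
    return local_perturbation(periods_ms)

def local_shimmer(peak_amplitudes):
    """Shimmer (local), %: perturbation of consecutive cycle peak amplitudes."""
    return local_perturbation(peak_amplitudes)

# Made-up pre- and postoperative period sequences (ms) for one patient.
pre = [10.0, 11.0, 10.0, 11.0]
post = [10.0, 10.2, 10.0, 10.2]
change = local_jitter(pre) - local_jitter(post)
print(f"jitter pre {local_jitter(pre):.2f}%, post {local_jitter(post):.2f}%, "
      f"change {change:.2f} points")
```

A positive change means the postoperative voice is less perturbed; comparing such changes between the >3 mm and $\leq 3$ mm groups corresponds to the size analysis described above.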

Diagnosis of Parkinson's disease based on audio voice using wav2vec (Wav2vec을 이용한 오디오 음성 기반의 파킨슨병 진단)

  • Yoon, Hee-Jin
    • Journal of Digital Convergence / v.19 no.12 / pp.353-358 / 2021
  • Parkinson's disease is the second most common degenerative brain disease in old age, after Alzheimer's disease. Its symptoms, such as hand tremor, slowed movement, and cognitive decline, reduce quality of life. Because early diagnosis can slow the progression of the disease, an algorithm was implemented for early diagnosis that extracts features using wav2vec and detects the presence or absence of Parkinson's disease with a deep-learning artificial neural network (ANN). In the experiment, the accuracy was 97.47%, which was better than the results of diagnosing Parkinson's disease with existing neural networks. Using audio voice files simplified the experimental process and yielded improved results.

Computation of Laryngeal Flow and Sound through a Dynamic Model of the Vocal Folds (동적 성대 모델을 이용한 후두 내 유동 및 음향장에 대한 수치 연구)

  • Bae, Young-Min;Moon, Young-J.
    • Proceedings of the Korean Society of Computational Fluids Engineering Conference (한국전산유체공학회:학술대회논문집) / 2008.03b / pp.21-24 / 2008
  • The present study numerically investigates the glottal airflow characteristics and acoustic features of phonation fully coupled with the dynamic behavior of the vocal folds. The vocal folds are described by a low-dimensional body-cover model characterized by biomechanical parameters such as glottal width, vocal fold stiffness, and subglottal pressure. The flow in the vocal tract is modeled by the incompressible, axisymmetric Navier-Stokes equations (INS), while the acoustic field is predicted by the linearized perturbed compressible equations (LPCE). The computed results show that a two-mass model of the vocal folds is sufficient to reproduce the temporal variations in oral airflow and glottal motion produced by female speakers. It is also found that i) the glottal width has a significant effect on the amplitude of the glottal flow, and thus on the amplitude of the acoustic wave in the vocal tract, ii) vocal fold tension is the main control parameter for the fundamental frequency of phonation, iii) the subglottal pressure plays an appreciable role in reproducing the self-sustained oscillation of the vocal folds, and iv) the strength of the pulsating airflow and vortical structures is primarily affected by glottal width and subglottal pressure and is closely related to pitch, loudness, and voice quality. Finally, a more comprehensive explanation of the differences between one- and two-mass models is presented, along with a discussion of their effectiveness in representing vocal fold oscillation and voice quality.
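
Finding ii) can be illustrated with the undamped natural frequencies of a generic two-mass oscillator, with stiffness standing in for fold tension. This Python sketch ignores damping, airflow, and the body-cover structure used in the paper; the masses and stiffnesses are assumed illustrative values chosen to give frequencies in the speech range, and "tension" is modeled crudely as a uniform stiffness scale.

```python
import numpy as np

def natural_frequencies(m1, m2, k1, k2, kc):
    """Undamped natural frequencies (Hz) of two masses coupled by a spring kc.

    Equations of motion: M x'' + K x = 0 with
    M = diag(m1, m2) and K = [[k1 + kc, -kc], [-kc, k2 + kc]].
    """
    M = np.diag([m1, m2])
    K = np.array([[k1 + kc, -kc], [-kc, k2 + kc]])
    omega_sq = np.linalg.eigvals(np.linalg.solve(M, K))  # eigenvalues of M^-1 K
    return np.sort(np.sqrt(omega_sq.real)) / (2 * np.pi)

# Illustrative parameters in SI units (kg, N/m); assumed, not from the paper.
params = dict(m1=1.25e-4, m2=2.5e-5, k1=80.0, k2=8.0, kc=25.0)
base = natural_frequencies(**params)
# Higher "tension": every stiffness doubled.
stiff = natural_frequencies(**{**params, "k1": 160.0, "k2": 16.0, "kc": 50.0})
print(base, stiff / base)  # the ratio is sqrt(2) for both modes
```

Doubling every stiffness raises both natural frequencies by a factor of √2, because the eigenvalues of the matrix M⁻¹K scale linearly with K and the frequencies go as their square roots.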

A Design of TDMA/TDD MAC Protocol for Full-Duplex Multi-User Voice Communication Systems Based on Sensor Network (센서 네트워크 기반의 다수 사용자간 Full-Duplex 음성 통신 시스템을 위한 TDMA/TDD MAC 프로토콜 설계)

  • Kim, Jisoo;Lee, Jae Hyoung;Cho, Sung Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.38C no.3 / pp.239-246 / 2013
  • IEEE 802.15.4 standardizes the PHY and MAC layers and features low-power, low-bandwidth, low-rate data communication. For this reason, IEEE 802.15.4 is used only in limited applications such as sensing and home networks; nevertheless, research on transmitting multimedia data such as voice packets over wireless sensor networks is widely conducted. In this paper, we propose a group communication system based on a sensor network. A TDMA/TDD MAC protocol based on the IEEE 802.15.4 PHY is designed for voice communication by improving existing peer-to-peer voice communication over sensor networks, and hardware is implemented for group communication. To evaluate the quality of the designed system, a mean opinion score (MOS) is obtained experimentally and verified using a sine-wave method. Based on the experimental results, we expect that many application solutions can be developed using the presented system.
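
Whether a TDMA/TDD superframe can carry several full-duplex voice streams is a slot-budget calculation: each user's voice payload plus frame overhead must fit in one slot, and all slots must fit in one superframe. The Python sketch below assumes the 250 kb/s 2.4 GHz O-QPSK PHY of IEEE 802.15.4, a 6-byte PHY header, an 11-byte MAC header and FCS (short addresses, PAN ID compression), and a hypothetical 8 kb/s codec, 20 ms superframe, and 500 µs guard time; the paper's actual codec and slot parameters are not given in the abstract.

```python
# Slot-budget sketch for a TDMA/TDD voice superframe on the IEEE 802.15.4 PHY.
# All codec and timing values are hypothetical; see the constants below.

PHY_RATE_BPS = 250_000       # 2.4 GHz O-QPSK PHY
PHY_OVERHEAD_BYTES = 6       # preamble (4) + SFD (1) + PHY header (1)
MAC_OVERHEAD_BYTES = 11      # frame control, seq. no., PAN ID, short addresses, FCS
MAX_PSDU_BYTES = 127         # aMaxPHYPacketSize

def slot_us(codec_bps: int, superframe_us: int, guard_us: int) -> int:
    """Length of one transmit slot in microseconds (airtime plus guard time)."""
    payload = codec_bps * superframe_us // 8_000_000  # voice bytes per superframe
    psdu = payload + MAC_OVERHEAD_BYTES
    if psdu > MAX_PSDU_BYTES:
        raise ValueError("voice payload does not fit in one IEEE 802.15.4 frame")
    airtime_us = (psdu + PHY_OVERHEAD_BYTES) * 8 * 1_000_000 // PHY_RATE_BPS
    return airtime_us + guard_us

def max_users(codec_bps=8_000, superframe_us=20_000, guard_us=500) -> int:
    """Each user transmits in its own slot (TDD) and listens in all other slots."""
    return superframe_us // slot_us(codec_bps, superframe_us, guard_us)

def schedule(n_users: int, slot: int) -> dict:
    """Start time (us) of each user's transmit slot within the superframe."""
    return {user: user * slot for user in range(n_users)}

slot = slot_us(8_000, 20_000, 500)
users = max_users()
print(slot, users, schedule(users, slot))
```

Under these assumptions each slot lasts 1684 µs, so 11 users fit in a 20 ms superframe; a lower-rate codec or a longer superframe would admit more users at the cost of more packetization delay.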

DEVS Simulation of Spam Voice Signal Detection in VoIP Service (VoIP 스팸 콜 탐지를 위한 음성신호의 DEVS 모델링 및 시뮬레이션)

  • Kim, Ji-Yeon;Kim, Hyung-Jong;Cho, Young-Duk;Kim, Hwan-Kuk;Won, Yoo-Jae;Kim, Myuhng-Joo
    • Journal of the Korea Society for Simulation / v.16 no.3 / pp.75-87 / 2007
  • As VoIP service quality improves and many of its shortcomings are overcome, users are becoming interested in the service, which also offers convenient additional features such as presence and instant messaging services. However, like two sides of a coin, security issues make users hesitate to use it. This paper addresses one of these issues: VoIP spam. Taking into account the signal pattern of voice messages in spam calls, we constructed voice signal models of a normal call, a normal call with noise, and a spam call. Each voice signal case is fed into our spam decision algorithm, which detects spam calls based on the amount of information in the call signal. We used DEVS-Java™ for modeling and simulation. The contribution of this work is a proposed method for detecting voice spam calls and its evaluation using a modeling and simulation methodology.
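
One way to quantify the "amount of information" in a call signal is the Shannon entropy of quantized sample amplitudes per frame. The Python sketch below computes that measure; the quantization, thresholds, and especially the decision rule are hypothetical, since the abstract does not describe the paper's algorithm. The rule assumes that a recorded spam message plays almost continuously, while the caller side of a normal conversation contains pauses while the caller listens.

```python
import math
from collections import Counter

def frame_entropy_bits(samples, levels=256):
    """Shannon entropy in bits per sample after uniform quantization.

    Samples are assumed to be normalized to [-1.0, 1.0].
    """
    bins = [min(levels - 1, int((s + 1.0) / 2.0 * levels)) for s in samples]
    n = len(bins)
    return sum((c / n) * math.log2(n / c) for c in Counter(bins).values())

def is_spam(frames, entropy_thresh=2.0, active_ratio_thresh=0.9):
    """Hypothetical rule: flag calls whose frames are almost all high-information."""
    active = sum(frame_entropy_bits(f) > entropy_thresh for f in frames)
    return active / len(frames) > active_ratio_thresh

silence = [0.0] * 256
two_level = [0.5, -0.5] * 128
ramp = [i / 128.0 - 1.0 for i in range(256)]
print(frame_entropy_bits(silence), frame_entropy_bits(two_level), frame_entropy_bits(ramp))
print(is_spam([ramp] * 10), is_spam([ramp] * 5 + [silence] * 5))
```

A silent frame has zero entropy, an alternating two-level frame carries 1 bit per sample, and a frame that visits all 256 levels once carries 8 bits per sample.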