• Title/Summary/Keyword: Speech Recording

Comparative Study on Acoustic Characteristics of Vocal Fold Paralysis and Benign Mucosal Disorders of Vocal Fold (성대마비와 양성 성대점막질환의 음향학적 특성비교)

  • Kong, Il-Seung;Cho, Young-Ju;Lee, Myung-Hee;Kim, Jong-Seung;Yang, Yun-Su;Hong, Ki-Hwan
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.18 no.2 / pp.122-128 / 2007
  • This study analyzes, from the standpoint of acoustic phonetics, the voices of patients with voice disorders including vocal fold paralysis, vocal fold cyst and vocal nodule/polyp. It aims to collect supplementary acoustic data to support speech treatment and the standardization of vocal disorders. Subjects and Methods: The subjects were 64 adult patients who underwent indirect laryngoscopy and laryngostroboscopy and were diagnosed with vocal fold paralysis, vocal fold cyst or vocal nodule/polyp. The experimental group consisted of 20 patients diagnosed with vocal fold paralysis; 21 patients diagnosed with vocal fold cyst, with an average age of 42.0 $(\pm 10.03)$; and 23 patients diagnosed with vocal nodule/polyp, with an average age of 40.9 $(\pm 13.75)$. The patients were asked to sit in a comfortable position with a microphone 10 cm from the mouth and to phonate the vowel /e/ for the maximum phonation time at a natural tone and volume; the sound was recorded directly onto a computer. The sampling rate was set to 44,100 Hz, and a 1-second stable portion of each patient's /e/ waveform, excluding the initial and final stages, was analyzed. Results: First, there was no statistically significant difference in jitter and shimmer between vocal fold paralysis and vocal fold cyst, while the difference between vocal fold paralysis and vocal nodule/polyp was highly significant. Second, according to the mean values of the noise-related measures NNE, HNR and SNR, the disease with the most abnormal characteristics was vocal fold paralysis, followed by cyst and then nodule/polyp. For NNE, there was a statistically significant difference between vocal nodule/polyp and cyst or paralysis; in other words, the NNE of vocal nodule/polyp was weaker than that of cyst or paralysis. HNR and SNR showed a similar pattern: there was a statistically significant difference between vocal fold paralysis and vocal fold cyst or nodule/polyp, and the HNR and SNR values of vocal fold paralysis were lower than those of vocal fold cyst or nodule/polyp. Conclusion: For vocal fold paralysis, the abnormal values of acoustic parameters associated with frequency, amplitude and noise ratio were statistically significantly higher than those of vocal fold cyst and nodule/polyp. This finding suggests that the voices of patients with vocal fold paralysis are the most severely impaired, owing to less stable vocal fold movement, asymmetry and incomplete glottic closure. In addition, there was no statistically significant difference in the acoustic parameters of tremor among vocal fold paralysis, vocal fold cyst and vocal nodule/polyp. Further studies are needed to identify suitable acoustic parameters for various vocal disorders and to clarify the correlation between acoustics-based objective tools and subjective evaluations.
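The jitter and shimmer measures compared in this abstract can be illustrated with a small calculation. The sketch below assumes the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean, as a percentage); the abstract does not say which variant or software the authors used, and the per-cycle values are made up for illustration.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, as a percentage of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


# Hypothetical per-cycle measurements from a stable vowel segment (not data from the study).
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.2]      # glottal cycle lengths in milliseconds
peak_amps = [0.80, 0.78, 0.82, 0.79, 0.81]  # peak amplitude of each cycle

jitter_percent = local_perturbation(periods_ms)   # cycle-to-cycle frequency perturbation
shimmer_percent = local_perturbation(peak_amps)   # cycle-to-cycle amplitude perturbation
print(f"jitter = {jitter_percent:.2f}%, shimmer = {shimmer_percent:.2f}%")
```

Higher values of either measure indicate less regular vocal fold vibration, which is the sense in which the abstract calls the paralysis group the most abnormal.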

Implementation of Music Signals Discrimination System for FM Broadcasting (FM 라디오 환경에서의 실시간 음악 판별 시스템 구현)

  • Kang, Hyun-Woo
    • The KIPS Transactions:PartB / v.16B no.2 / pp.151-156 / 2009
  • This paper proposes a Gaussian mixture model (GMM)-based music discrimination system for FM broadcasting. The objective of the system is to automatically archive music signals from audio broadcasting programs, which normally mix human voices, songs, commercial music, and other sounds. To make the system more robust and to cut the start and end points of each recording accurately, a post-processing module was also added. Experimental results on various input signals from FM radio programs in a PC environment show excellent performance of the proposed system. A fixed-point simulation shows the same results under 3 MIPS of computational power.
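As a rough illustration of GMM-based discrimination, the sketch below scores a segment of frames under two mixtures and picks the class with the higher total log-likelihood. It is not the paper's system: the scalar feature, the mixture parameters, and the names `MUSIC_GMM`, `SPEECH_GMM` and `classify_segment` are all assumptions made for the example, and no training or post-processing step is shown.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture.

    components is a list of (weight, mean, variance) tuples.
    """
    density = 0.0
    for weight, mu, var in components:
        density += weight * math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return math.log(max(density, 1e-300))  # floor avoids log(0) for far-off values


def classify_segment(frames, music_gmm, speech_gmm):
    """Label a segment by comparing summed frame log-likelihoods under each model."""
    music_score = sum(gmm_log_likelihood(x, music_gmm) for x in frames)
    speech_score = sum(gmm_log_likelihood(x, speech_gmm) for x in frames)
    return "music" if music_score > speech_score else "speech"


# Hypothetical, hand-picked mixtures over a single made-up frame feature.
MUSIC_GMM = [(0.6, 2.0, 0.5), (0.4, 3.5, 0.8)]
SPEECH_GMM = [(0.7, -1.0, 0.6), (0.3, 0.5, 0.4)]

label = classify_segment([2.1, 3.0, 1.8, 3.6], MUSIC_GMM, SPEECH_GMM)
print(label)
```

In practice the mixtures would be trained on labeled audio and the frame features would be multi-dimensional; the decision rule stays the same.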

Acoustic Characteristics of Korean Alveolar Sibilant 's', 's'' according to Phonetic Contexts of Children with Cerebral Palsy (뇌성마비 아동의 음성 환경에 따른 치경마찰음 'ㅅ', 'ㅆ'의 음향학적 특성)

  • Kim, Sookhee;Kim, Hyungi
    • Phonetics and Speech Sciences / v.5 no.2 / pp.3-10 / 2013
  • The purpose of this study is to characterize acoustically the Korean alveolar sibilants produced by children with cerebral palsy. Thirteen children with spastic cerebral palsy, aged 6 to 10 years, were selected by an articulation test and compared with a control group of thirty children. Measurements were taken from meaningless CV monosyllables, VCV disyllables (/asa/) and frame sentences containing the target CV syllables. C was drawn from /s, s'/, and V from the set /a, i, u, ɛ, o, ɯ, ʌ/. Multi-Speech was used for data recording and analysis. The frication duration of the lenis and glottalized alveolar sibilants of the children with cerebral palsy was significantly shorter than that of the control group in CV, VCV and frame sentences. The duration of the vowel following these sibilants was significantly longer in the children with cerebral palsy than in the control group in CV, VCV and frame sentences. The frequency and intensity of the friction intervals were significantly lower in the children with cerebral palsy than in the control group in CV, VCV and frame sentences. When the lenis and glottalized sibilants of the cerebral palsy group were compared by phonation type, the frication duration differed significantly between the phonation types in CV and VCV, and between the phonetic contexts. The glottalized sibilant was longer than the lenis sibilant in all the phonetic contexts. The subsequent vowel duration differed significantly between the phonation types in VCV and between the phonetic contexts (p<.05). The vowel following the glottalized sibilant was longer than the vowel following the lenis sibilant in all the phonetic contexts. Frequency differed significantly between the phonation types in CV, and intensity differed significantly between the phonation types in CV and VCV. The children with spastic cerebral palsy had difficulty articulating the alveolar sibilants because of poor control of the laryngeal, respiratory and articulatory movements that require fine motor coordination. This study quantitatively analyzes the acoustic parameters of the alveolar sibilants in various phonetic contexts. The results are therefore expected to provide fundamental data for articulation intervention for children with cerebral palsy.

The Effect of An Increase of Closed Quotient on Improvement of Voice Quality after Type I Thyroplasty in Patients with Unilateral Vocal Cord Paralysis (일측 성대마비 환자에서 성대내전술 후 성대접촉율의 증가가 음질 개선에 미치는 영향)

  • Kim, Han-Su;Choi, Seung-Hee;Lim, Jae-Yol;Choi, Hong-Shik
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.1 / pp.16-20 / 2004
  • Purpose: To assess perceptual, acoustic and aerodynamic measures of voice quality in patients with unilateral vocal cord paralysis before and after type I thyroplasty. Methods: The clinical records of patients who underwent type I thyroplasty in the Department of Otorhinolaryngology, Yongdong Severance Hospital, from November 2001 to November 2003 were reviewed. All patients underwent a vocal function evaluation including perceptual, acoustic and aerodynamic measures of voice preoperatively and on the 60th postoperative day. The perceptual and acoustic measures were obtained from recordings of the patients reading a 'Sanchak' passage. The perceptual evaluation was performed by 2 speech pathologists using a 4-point rating scale. Acoustic parameters (voice range profile low (RAL), voice range profile high (RAH), average fundamental frequency (AFx), closed quotient, harmonic-to-noise ratio, jitter and shimmer) were measured with Lx Speech Studio. Mean flow rate (MFR), subglottic pressure (Psub) and intensity were measured using the Phonatory Function Analyzer. The maximum phonation time (MPT) was also measured. A paired t-test (p<0.1) was used to compare preoperative and postoperative results, and multiple regression was used to find which parameter was most correlated with improvement of postoperative voice quality. Results: Among the aerodynamic parameters, Psub (88.11 mmH₂O → 58.7 mmH₂O), MPT (7.87 sec → 12.53 sec) and MFR (359.8 ml/sec → 161.06 ml/sec) improved significantly. AFx (205.5 Hz → 163.27 Hz), AQx (23.9% → 48.3%), RAL, RAH, jitter and shimmer also improved. In the multiple regression, AFx and AQx were the two parameters most correlated with improvement of postoperative breathiness, whereas the general grade of voice quality was more correlated with Psub and shimmer. Conclusion: Vocal fold medialization procedures effectively reduce the glottic gap. The increased contact area between the vocal folds improved the aerodynamic parameters and stabilized vocal fold vibration. This effect results in improvement in acoustic parameters (shimmer, jitter, signal-to-noise ratio, voice range profile) and voice quality.

Reliability of OperaVOX™ against Multi-Dimensional Voice Program to Assess Voice Quality before and after Laryngeal Microsurgery in Patient with Vocal Polyp (성대 용종 환자의 후두미세수술 전후 음성 평가에서 OperaVOX™와 Multi-Dimensional Voice Program 간의 신뢰도 연구)

  • Kim, Sun Woo;Kim, So Yean;Cho, Jae Kyung;Jin, Sung Min;Lee, Sang Hyuk
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.31 no.2 / pp.71-77 / 2020
  • Background and Objectives: OperaVOX™ (Oxford Wave Research Ltd.) is a portable voice analysis software package designed for use with iOS devices. As a relatively cheap, portable and easily accessible form of acoustic analysis, OperaVOX™ may be more clinically useful than laboratory-based software in many situations. The aim of this study was to evaluate the agreement between OperaVOX™ and the Multi-Dimensional Voice Program (MDVP; Computerized Speech Lab) in assessing voice quality before and after laryngeal microsurgery in patients with vocal polyp. Materials and Method: Twenty patients who had undergone laryngeal microsurgery for vocal polyp were enrolled in this study. Preoperative and postoperative voices were assessed by acoustic analysis using MDVP and OperaVOX™. A five-second recording of the vowel /a/ was used to measure fundamental frequency (F0), jitter, shimmer and noise-to-harmonic ratio (NHR). Results: Several acoustic parameters of MDVP and OperaVOX™ related to short-term variability showed significant improvement. In MDVP, the preoperative values of F0, jitter, shimmer and NHR were 155.75 Hz (male: 125.37 Hz, female: 183.37 Hz), 2.20%, 6.28% and 0.16, and the postoperative values were 164.34 Hz (male: 129.42 Hz, female: 199.26 Hz), 2.15%, 5.18% and 0.14. In OperaVOX™, the preoperative values of F0, jitter, shimmer and NHR were 168.26 Hz (male: 135.16 Hz, female: 201.37 Hz), 2.27%, 6.95% and 0.26, and the postoperative values were 162.72 Hz (male: 128.267 Hz, female: 197.18 Hz), 1.71%, 5.36% and 0.20. The intraclass correlation coefficient showed high intersoftware agreement for F0, jitter and shimmer. Conclusion: Our results showed that the short-term variability of acoustic parameters in both MDVP and OperaVOX™ was useful for the objective assessment of voice quality in patients who received laryngeal microsurgery. OperaVOX™ is comparable to MDVP and has high intersoftware reliability with MDVP in measuring F0, jitter and shimmer.

A study on English vowel duration with respect to the various characteristics of the following consonant (후행하는 자음의 여러 특성에 따른 영어 모음 길이에 관한 연구)

  • Yoo, Hyunbin;Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.14 no.1 / pp.1-11 / 2022
  • The purpose of this study is to investigate how vowel duration differs with the voicing of word-final consonants in English, and how this difference relates to the type of word-final consonant (stops vs. fricatives), (partial) devoicing, and stop release. Additionally, this study attempts to interpret the findings from the functional view that vowels before voiced consonants are produced with a longer duration in order to enhance the salience of the voicing of word-final consonants. A recording experiment was conducted with native English speakers, and the vowel duration, the degree of (partial) devoicing of word-final voiced consonants and the release of word-final stops were measured. First, the results showed that the ratio of the duration difference was not influenced by the type of word-final consonant. Second, the higher the degree of (partial) devoicing of word-final voiced consonants, the longer the vowel duration before them, which was compatible with the prediction based on the functional view. Lastly, the ratio of the duration difference was greater when the word-final stops were released than when they were unreleased, which was not consistent with the functional view. These results suggest that the voicing effect cannot be sufficiently explained by its function of distinguishing the voicing of word-final consonants.

Time-Synchronization Method for Dubbing Signal Using SOLA (SOLA를 이용한 더빙 신호의 시간축 동기화)

  • 이기승;지철근;차일환;윤대희
    • Journal of Broadcast Engineering / v.1 no.2 / pp.85-95 / 1996
  • The purpose of this paper is to propose a time-synchronization technique for dubbed signals based on the SOLA (Synchronized Overlap and Add) method, which has been widely used to modify the time scale of speech signals. In broadcast audio recording environments, a high level of background noise makes dubbing necessary. Since the time difference between the original and the dubbed signal is around 200 milliseconds, processing is required to synchronize the dubbed signal with the corresponding image. The proposed method finds the starting point of the dubbed signal using the short-time energy of the two signals. Thereafter, LPC cepstrum analysis and DTW (Dynamic Time Warping) are applied to synchronize the phoneme positions of the two signals. After the matched point is determined by the minimum mean square error between the original and dubbed LPC cepstra, the SOLA method is applied to the dubbed signal to maintain the consistency of the corresponding phase. The effectiveness of the proposed method is verified by comparing the waveforms and spectrograms of the original and the time-synchronized dubbed signals.
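The DTW alignment step mentioned in this abstract can be sketched in a few lines. The example below is a generic textbook DTW over scalar frame features with a squared-error local cost; the paper aligns LPC cepstrum vectors, so the scalar features, the function name `dtw_path` and the sample sequences are assumptions for illustration only, and the SOLA resynthesis step is not shown.

```python
def dtw_path(ref, dub):
    """Align two 1-D feature sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of (ref_index, dub_index) pairs.
    """
    n, m = len(ref), len(dub)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning ref[:i] with dub[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (ref[i - 1] - dub[j - 1]) ** 2  # squared error between frames
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Hypothetical frame features: the dubbed take starts one frame late.
ref = [0, 1, 2, 3, 2, 1]
dub = [0, 0, 1, 2, 3, 2, 1]
total_cost, path = dtw_path(ref, dub)
print(total_cost, path)
```

The resulting path tells which dubbed frame corresponds to each original frame; a time-scale modification method such as SOLA could then stretch or compress the dubbed signal along that mapping.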

Outcome of Pallidal Deep Brain Stimulation in Meige Syndrome

  • Ghang, Ju-Young;Lee, Myung-Ki;Jun, Sung-Man;Ghang, Chang-Ghu
    • Journal of Korean Neurosurgical Society / v.48 no.2 / pp.134-138 / 2010
  • Objective: Meige syndrome is the combination of blepharospasm and oromandibular dystonia. We assessed the surgical results of bilateral globus pallidus internus (GPi) deep brain stimulation (DBS) in patients with medically refractory Meige syndrome. Methods: Eleven patients with follow-ups of more than 12 months were retrospectively analyzed. The mean follow-up period was $23.1 \pm 6.4$ months. The mean age at time of surgery was $58.0 \pm 7.8$ years. The mean duration of symptoms was $8.7 \pm 7.6$ years. DBS electrodes were placed under local anesthesia using microelectrode recording and stimulation. After $2.4 \pm 1.3$ days of trial tests, the stimulation device was implanted under general anesthesia. Patients were evaluated using the Burke-Fahn-Marsden Dystonia Rating Scale (BFMDRS). Results: BFMDRS total movement scores improved by 59.8%, 63.5%, 74.1%, 74.5%, and 85.5% during the immediate postoperative test-stimulation period and at 3, 6, 12, and 24 months (n = 5) after surgery, respectively. The BFMDRS total movement scores decreased gradually, and the results reached statistical significance in the postoperative period (test period, p < 0.001; 3 months, p < 0.001; 6 months, p = 0.003; 12 months, p < 0.001; 24 months, p = 0.042). There was no statistical difference between 12 months and 24 months. BFM subscores improved by 63.3% for the eyes, 80.9% for the mouth, 68.4% for speech/swallowing, and 87.9% for the neck at 12 months after surgery. The adverse effects were insignificant. Conclusion: Bilateral GPi-DBS can be effective for the treatment of intractable Meige syndrome without significant side effects.

Clinical Observation of Streptomycin Ototoxicity (스트레프토마이신 중독성난청의 임상적 관찰)

  • 이종담;윤병용
    • Proceedings of the KOR-BRONCHOESO Conference / 1977.06a / pp.3.1-3 / 1977
  • Many antibiotics are now available for daily clinical use. Among them, streptomycin (SM) is widely used because of its effect on tuberculosis. Recently, however, the ototoxic effects of SM, especially on the vestibular sensory organs, have been noticed: so-called SM deafness. The authors clinically investigated the incidence of SM deafness in 198 patients admitted to Masan National Hospital for treatment of pulmonary tuberculosis. The results were as follows: 1. SM deafness occurred in 30 (15.2%) of the 198 cases. 2. SM deafness tended to increase with the amount of SM administered. 3. Administration of SM 1.0 gm daily resulted in a higher incidence (17.4%) than a single 1.0 gm dose twice a week (2.9%). 4. Complications due to SM administration occurred in 59 cases (2.98%). Among them, tinnitus was the most frequent symptom, recorded in 14 cases (46.7%). 5. SM deafness tended to begin at high frequencies and then progress gradually to the speech range. 6. By type of audiogram, the gradual descending type (40.0%) and the horizontal type (30.3%) were the most frequent.

A Review on Deep Learning Platform for Artificial Intelligence (인공지능 딥러링 학습 플랫폼에 관한 선행연구 고찰)

  • Jin, Chan-Yong;Shin, Seong-Yoon;Nam, Soo-Tai
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.169-170 / 2019
  • As artificial intelligence becomes a source of global competitiveness, the government is strategically fostering it as the base technology of future industries such as autonomous vehicles, drones, and robots. Domestic artificial intelligence research and services have been launched mainly by Naver and Kakao, but their scale and level are weak compared to those overseas. In recent years, deep learning has recorded innovative performance in various pattern recognition fields, including speech recognition and image recognition. In addition, deep learning has attracted great interest from industry since its inception, and global information technology companies such as Google, Microsoft, and Samsung have successfully applied deep learning technology to commercial products and are continuing research and development. Therefore, based on previous research, we review the artificial intelligence technology that is attracting this attention.
