• Title/Summary/Keyword: Speech Recording

Search Results: 97, Processing Time: 0.021 seconds

Analysis of sequential motion rate in dysarthric speakers using a software (소프트웨어를 이용한 마비말장애 화자의 일련운동속도 분석)

  • Park, Heejune;An, Sinwook;Shin, Bumjoo
    • Phonetics and Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.173-177
    • /
    • 2018
  • Purpose: The primary goal of this study was to determine whether the articulatory diadochokinesis (sequential motion rate, SMR) data collected using the Motor Speech Disorder Assessment (MSDA) software module can be used to diagnose dysarthria and determine its severity. Methods: Two subject groups were set up: one with spastic dysarthria (n=26) and a control group of speakers without neurological disease (n=30). The SMR of both groups was collected using MSDA and then analyzed with descriptive statistics. Results: For the parameters of syllable rate, jitter, and mean syllable length (MSL) at the front and back, the control group displayed better results than the dysarthria patients. Conclusions: At the level of articulatory diadochokinesis, the results showed that the MSDA software is generally suitable in clinical practice for quickly recording the parameters of syllable rate, jitter, and mean syllable length.
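The three parameters the abstract names can be illustrated with a small sketch. This is not the MSDA implementation (which is not published in the abstract); it assumes syllable-onset timestamps, in seconds, have already been extracted from a DDK recording.

```python
# Toy sketch (not the MSDA software): DDK metrics from hypothetical
# syllable-onset timestamps, in seconds.

def ddk_metrics(onsets):
    """Return (syllable rate in syll/s, jitter in %, mean syllable length in ms)."""
    periods = [b - a for a, b in zip(onsets, onsets[1:])]
    mean_p = sum(periods) / len(periods)
    # Jitter: mean absolute difference of consecutive periods,
    # relative to the mean period (a local perturbation measure).
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    jitter_pct = 100.0 * (sum(diffs) / len(diffs)) / mean_p
    return 1.0 / mean_p, jitter_pct, mean_p * 1000.0

# Invented onsets: repetitions at roughly 6.5 syllables per second
onsets = [0.00, 0.15, 0.31, 0.46, 0.62, 0.77, 0.93]
rate, jitter, msl = ddk_metrics(onsets)
```

On such input, a slower rate, higher jitter, and longer mean syllable length would point in the direction the study reports for the dysarthric group.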

Fast ab/adduction Rate of Articulation Valves in Normal Adults (정상 성인의 조음밸브에 대한 내·외전 비율)

  • Park, Hee-Jun;Han, Ji-Yeon
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.149-151
    • /
    • 2007
  • This study was designed to investigate the fast ab/adduction rate of articulation valves in normal adults. Measurement of the fast ab/adduction rate has traditionally been used for assessment, diagnosis, and therapy in patients with dysarthria, functional articulation disorders, or apraxia of speech. The fast ab/adduction rate reflects documented structural and physiological changes in the central nervous system and in the peripheral components of the oral and speech production mechanism. Fast ab/adduction rates were obtained from 20 normal subjects producing repetitions testing vocal function (/ihi/), tongue function (/tʌ/), velopharyngeal function (/m/), and labial function (/pʌ/). The Aerophone II was used for data recording. The findings were as follows: average fast ab/adduction rates were 6.21 cps for vocal function, 7.42 cps for tongue function, 5.23 cps for velopharyngeal function, and 6.93 cps for labial function. The results of this study serve as guidelines for normal diadochokinetic rates; in addition, they can indicate disease severity and support the evaluation of treatment.

Analysis of Voice Color Similarity for the development of HMM Based Emotional Text to Speech Synthesis (HMM 기반 감정 음성 합성기 개발을 위한 감정 음성 데이터의 음색 유사도 분석)

  • Min, So-Yeon;Na, Deok-Su
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.15 no.9
    • /
    • pp.5763-5768
    • /
    • 2014
  • Maintaining a consistent voice color is important when a single synthesizer must produce both a normal voice and various emotional voices. When a synthesizer is developed using recordings with overly expressed emotions, the voice color cannot be maintained, and the synthetic speech can sound as if it comes from different speakers. In this paper, speech data were recorded and the change in voice color was analyzed to develop an emotional HMM-based speech synthesizer. To build the synthesizer, voices were recorded and a database was constructed. The recording process is particularly important when realizing an emotional speech synthesizer: monitoring is needed because it is quite difficult to define an emotion and maintain it at a particular level. The realized synthesizer uses a normal voice and three emotional voices (happiness, sadness, anger), with each emotional voice consisting of two levels, high and low. To analyze the voice color of the normal and emotional voices, the average spectrum, i.e., the accumulated spectrum measured over vowels, was used, and the F1 (first formant) calculated from the average spectrum was compared. The voice similarity of low-level emotional data was higher than that of high-level emotional data, and the proposed method allows monitoring via the change in voice similarity.
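The core idea, averaging vowel spectra per speaking style and comparing where the dominant low-frequency peak sits as a rough F1 proxy, can be sketched minimally. The 8-bin spectra below are invented for illustration and are not the paper's data or exact procedure.

```python
# Minimal sketch of the idea (not the paper's exact method): accumulate
# vowel magnitude spectra per speaking style, average them bin by bin,
# and compare the spectral-peak position as a crude F1 stand-in.

def average_spectrum(frames):
    """Average a list of equally sized magnitude spectra, bin by bin."""
    n = len(frames)
    return [sum(f[i] for f in frames) / n for i in range(len(frames[0]))]

def peak_bin(spectrum):
    """Index of the largest bin: a crude stand-in for the F1 location."""
    return max(range(len(spectrum)), key=spectrum.__getitem__)

# Hypothetical 8-bin vowel spectra for a neutral and a high-level angry style
neutral = [[1, 4, 9, 5, 2, 1, 1, 0], [0, 5, 8, 6, 2, 1, 0, 0]]
angry_hi = [[0, 2, 5, 9, 6, 2, 1, 0], [1, 2, 4, 8, 7, 3, 1, 0]]

shift = abs(peak_bin(average_spectrum(angry_hi)) - peak_bin(average_spectrum(neutral)))
```

A larger peak shift between styles would correspond to lower voice-color similarity, which is what the paper proposes to monitor during recording.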

Wiener filtering-based ambient noise reduction technique for improved acoustic target detection of directional frequency analysis and recording sonobuoy (Directional frequency analysis and recording 소노부이의 표적 탐지 성능 향상을 위한 위너필터링 기반 주변 소음 제거 기법)

  • Hong, Jungpyo;Bae, Inyeong;Seok, Jongwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.2
    • /
    • pp.192-198
    • /
    • 2022
  • As an effective weapon system for anti-submarine warfare, the DIrectional Frequency Analysis and Recording (DIFAR) sonobuoy detects underwater targets via beamforming with three channels, composed of one omni-directional and two directional channels. However, ambient noise degrades the detection performance of the DIFAR sonobuoy in specific directions (0°, 90°, 180°, 270°). Thus, an ambient noise reduction technique is proposed to improve the acoustic target detection performance of the DIFAR sonobuoy. The proposed method is based on the order truncate average (OTA), widely used in sonar signal processing, for ambient noise estimation, and on Wiener filtering, widely used in speech signal processing, for noise reduction. For evaluation, we compared the mean square errors of the target bearing estimation results of the conventional and proposed methods, and we confirmed that the proposed method is effective at signal-to-noise ratios below 0 dB.
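The two ingredients the abstract names can be sketched together: an order-truncate-average noise-floor estimate over frequency bins, followed by a per-bin Wiener gain. The details below (truncation ratio, toy spectrum) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of OTA noise estimation followed by Wiener filtering.

def ota_noise_floor(power_bins, keep_ratio=0.7):
    """Estimate the noise power: sort the bins, discard the loudest
    (likely target/tonal) bins, and average the remainder."""
    kept = sorted(power_bins)[: max(1, int(len(power_bins) * keep_ratio))]
    return sum(kept) / len(kept)

def wiener_gains(power_bins, noise_power):
    """Per-bin Wiener gain H = max(P - N, 0) / P (spectral form)."""
    return [max(p - noise_power, 0.0) / p if p > 0 else 0.0 for p in power_bins]

# Toy power spectrum: flat noise near 1.0 with a target line at bin 3
power = [1.1, 0.9, 1.0, 9.0, 1.2, 0.8, 1.0, 1.1]
noise = ota_noise_floor(power)
gains = wiener_gains(power, noise)
```

The gain stays close to 1 on the strong target bin and collapses toward 0 on noise-only bins, which is the behavior that suppresses ambient noise before bearing estimation.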

The Interlanguage Speech Intelligibility Benefit (ISIB) of English Prosody: The Case of Focal Prominence for Korean Learners of English and Natives

  • Lee, Joo-Kyeong;Han, Jeong-Im;Choi, Tae-Hwan;Lim, Injae
    • Phonetics and Speech Sciences
    • /
    • v.4 no.4
    • /
    • pp.53-68
    • /
    • 2012
  • This study investigated the speech intelligibility of Korean-accented and native English focus speech for Korean and native English listeners. Three different types of focus in English, broad, narrow, and contrastive, were naturally induced in semantically optimal dialogues. Seven high-proficiency and seven low-proficiency Korean speakers and seven native speakers participated in recording the stimuli with another native speaker. Fifteen listeners from each of the Korean high-proficiency, Korean low-proficiency, and native groups judged audio signals of the focus sentences. Results showed that Korean listeners were more accurate at identifying focal prominence in Korean speakers' narrow focus speech than in that of native speakers, suggesting that the interlanguage speech intelligibility benefit for talkers (ISIB-T) held true for narrow focus regardless of the Korean speakers' and listeners' proficiency. However, Korean listeners did not outperform native listeners for Korean speakers' production of narrow focus, which did not support the ISIB for listeners (ISIB-L). Broad and contrastive focus speech provided evidence for neither the ISIB-T nor the ISIB-L. These findings are explained by the interlanguage shared by Korean speakers and listeners, in which they have established more L1-like common phonetic features and phonological representations. Once narrow focus was semantically and syntactically interpreted at a higher level of processing, it was phonetically realized in a way more intelligible to Korean listeners due to the interlanguage, which may elicit the ISIB. However, Korean speakers did not appear to gain complete semantic/syntactic access to either broad or contrastive focus, which might have detrimental effects on lower-level phonetic outputs in top-down processing. This is, therefore, attributed to the fact that Korean listeners had no advantage over native listeners for Korean talkers, and vice versa.

The Movements of Vocal Folds during Voice Onset Time of Korean Stops

  • Hong, Ki-Hwan;Kim, Hyun-Ki;Yang, Yoon-Soo;Kim, Bum-Kyu;Lee, Sang-Heon
    • Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.17-26
    • /
    • 2002
  • Voice onset time (VOT) is defined as the time interval from the oral release of a stop consonant to the onset of glottal pulsing in the following vowel. VOT is a temporal characteristic of stop consonants that reflects the complex timing of glottal articulation relative to supraglottal articulation. There have been many reports on efforts to clarify the acoustical and physiological properties that differentiate the three types of Korean stops, including acoustic, fiberscopic, aerodynamic, and electromyographic studies. In the acoustic and fiberscopic studies of stop consonants, the voice onset time and glottal width during the production of stops have been found to be longest and largest in the heavily aspirated type, followed by the slightly aspirated and unaspirated types. The thyroarytenoid and posterior cricoarytenoid muscles were physiologically inter-correlated in differentiating these types of stops. However, a review of the English literature shows that the fine movement of the mucosal edges of the vocal folds during the production of stops has not been well documented. In recent years, a new method for high-speed recording of laryngeal dynamics using a digital recording system has allowed observation with fine time resolution. The movements of the vocal fold edges were documented during the period of stop production using a fiberscopic system with high-speed digital images. By observing the glottal width and the visible vibratory movements of the vocal folds before voice onset, the heavily aspirated stop was characterized as more prominent and dynamic than the slightly aspirated and unaspirated stops.

Characteristics of Phoniatrics in Patients with Spastic Dysarthria (경직형 마비말장애의 음성언어의학적 특성)

  • Kim, Sook-Hee;Kim, Hyun-Gi
    • Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.159-170
    • /
    • 2008
  • The purpose of this study was to examine the coordination of articulatory movement and the control of respiration and the larynx in spastic dysarthria by acoustic analysis. The sustained vowel /a/ and repetitions of the syllable /pa/ were measured in 15 normal speakers and 10 speakers with spastic dysarthria. Multi-Speech, MDVP, and MSP were used for data recording and analysis. As a result, the mean DDK rate in the spastic group was significantly slower than in the normal group. The maximum phonation time in the spastic group (4.80±1.94) was shorter than in the normal group (11.20±3.72). The DDKjit in the spastic group was significantly higher than in the normal group, and the DDKsla was reduced in the spastic group. The mean syllable duration in the spastic group (146.2 ms) was significantly longer than in the normal group (75.8 ms). The mean energy was reduced in the spastic group, while the range of F0 was greater than in the normal group. The frequency perturbation (jitter, vF0) and amplitude perturbation (shimmer, vAm) were higher than in the normal group, as was the NHR. These parameters differed significantly between the spastic dysarthria and normal groups (p<0.05). In summary, spastic dysarthria involves short respiration, a slow speech rate, and voice quality problems. These results will help to establish treatment plans and interventions.
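Among the perturbation measures listed, shimmer is the amplitude counterpart of jitter. A minimal sketch, assuming cycle-peak amplitudes have already been extracted from a sustained /a/ (the values below are invented, not the study's data, and MDVP's exact formula may differ):

```python
# Companion sketch (not the study's software): local shimmer from
# hypothetical cycle-to-cycle peak amplitudes of a sustained /a/.

def shimmer_percent(amplitudes):
    """Mean absolute difference of consecutive cycle amplitudes,
    relative to the mean amplitude, in percent."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    mean_amp = sum(amplitudes) / len(amplitudes)
    return 100.0 * (sum(diffs) / len(diffs)) / mean_amp

stable = [1.00, 1.01, 0.99, 1.00, 1.01]    # normal-like phonation
unstable = [1.00, 1.20, 0.85, 1.15, 0.90]  # dysarthric-like phonation
```

The unstable series yields a much higher shimmer value, consistent with the elevated amplitude perturbation the study reports for the spastic group.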

Artificial Intelligence for Clinical Research in Voice Disease (후두음성 질환에 대한 인공지능 연구)

  • Seok, Jungirl;Kwon, Tack-Kyun
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.33 no.3
    • /
    • pp.142-155
    • /
    • 2022
  • Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can serve as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms such as machine learning, led by the latest deep learning technology, began with binary classification distinguishing normal from pathological voices and has since contributed to improving the accuracy of multi-class classification of various types of pathological voice. However, no conclusions applicable in the clinical field have yet been reached. Most studies on pathological voice classification have used the sustained vowel /ah/, which is easier to analyze than continuous or running speech. However, continuous speech has the potential to yield more accurate results, as additional information can be obtained from the change in the voice signal over time. In this review, terms related to artificial intelligence research are explained, the latest trends in machine learning and deep learning algorithms are reviewed, and the latest research results and limitations are introduced to provide future directions for researchers.
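The binary normal-vs-pathological setup the review describes can be illustrated with a deliberately simple classifier. The nearest-centroid rule and the (jitter %, shimmer %) feature pairs below are invented for illustration; the studies under review use far richer features and learned models.

```python
# Illustrative sketch of binary normal-vs-pathological voice
# classification with a nearest-centroid rule on made-up features.

def centroid(samples):
    """Component-wise mean of a list of feature vectors."""
    return [sum(s[i] for s in samples) / len(samples) for i in range(len(samples[0]))]

def classify(x, normal_c, pathological_c):
    """Assign x to the class with the nearer centroid (squared distance)."""
    d = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return "pathological" if d(x, pathological_c) < d(x, normal_c) else "normal"

# Hypothetical training data: (jitter %, shimmer %) per recording
normal_train = [[0.4, 2.1], [0.5, 2.5], [0.3, 1.9]]
patho_train = [[1.8, 6.0], [2.2, 7.5], [1.5, 5.2]]
nc, pc = centroid(normal_train), centroid(patho_train)

label = classify([1.9, 6.4], nc, pc)
```

Multi-class extensions of the same pattern, one centroid (or model) per disease category, correspond to the multi-class classification work the review surveys.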

Restoration for Speech Records Managed by the National Archives of Korea (국가기록원 음성 기록물의 복원과 분석)

  • Oh, Sejin;Kang, Hong-Goo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.3
    • /
    • pp.269-278
    • /
    • 2013
  • The speech recordings of the National Archives of Korea contain very important traces of modern Korean history. However, analogue recordings degrade easily over time, so they must be digitized for management and service, and a method for restoring distorted speech is therefore needed. We propose four classes of distortion and apply restoration algorithms for the cases of speech level, stationary noise, and abrupt noise. As a result, the speech level is adjusted to -26 dBov over the speech regions only, and the SNR improves by more than 10 dB. In particular, conventional manual noise removal is nearly impossible because every recording would have to be auditioned, whereas the process becomes far more efficient with the automatic restoration algorithm.
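One restoration step the abstract names, bringing the speech-region level to -26 dBov, can be sketched as RMS measurement plus a corrective gain. Full scale is taken as 1.0 here and the test signal is invented; the actual system's level-adjustment logic is not published in the abstract.

```python
import math

# Minimal sketch: scale samples so their RMS level reaches -26 dBov,
# with full scale taken as amplitude 1.0.

TARGET_DBOV = -26.0

def rms_dbov(samples):
    """RMS level in dB relative to full scale (0 dBov = RMS of 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms)

def normalize(samples, target_db=TARGET_DBOV):
    """Apply the linear gain that moves the RMS level to target_db."""
    gain = 10.0 ** ((target_db - rms_dbov(samples)) / 20.0)
    return [s * gain for s in samples]

# A quiet hypothetical signal: 440 Hz tone at amplitude 0.001, 8 kHz rate
quiet = [0.001 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
louder = normalize(quiet)
```

In the archive pipeline this gain would be computed and applied only over detected speech regions, as the abstract specifies, rather than over the whole recording.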

Forensic Automatic Speaker Identification System for Korean Speakers (과학수사를 위한 한국인 음성 특화 자동화자식별시스템)

  • Kim, Kyung-Wha;So, Byung-Min;Yu, Ha-Jin
    • Phonetics and Speech Sciences
    • /
    • v.4 no.3
    • /
    • pp.95-101
    • /
    • 2012
  • In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is a GMM (Gaussian mixture model)-UBM (universal background model) based automatic speaker recognition system developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics, and it allows users to manage reference models built from utterances recorded in various environments in order to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.
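GMM-UBM scoring, the core of systems like the one described, compares a test utterance's average log-likelihood under a speaker-adapted GMM against the UBM. A hedged one-dimensional sketch with invented parameters (real systems use high-dimensional cepstral features and many mixture components):

```python
import math

# 1-D GMM-UBM scoring sketch: score = mean log-likelihood under the
# speaker model minus mean log-likelihood under the UBM.

def gmm_logpdf(x, weights, means, stds):
    """Log density of x under a 1-D Gaussian mixture."""
    p = sum(w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, stds))
    return math.log(p)

def llr_score(frames, speaker, ubm):
    """Average log-likelihood ratio over the test frames."""
    avg = lambda g: sum(gmm_logpdf(x, *g) for x in frames) / len(frames)
    return avg(speaker) - avg(ubm)

# Invented models: (weights, means, stds)
ubm = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])       # broad background model
speaker = ([0.5, 0.5], [-0.5, 0.5], [0.5, 0.5])   # adapted toward one speaker

score = llr_score([-0.4, 0.3, 0.6, -0.6], speaker, ubm)
```

A positive score favors the hypothesized speaker and a negative one the background population; a threshold on this score is what an EER evaluation, as reported for SPO Verifier, sweeps over.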