• Title/Summary/Keyword: Spectrogram Analysis

An Acoustic Study of Prosodic Features of Korean Spoken Language and Korean Folk Song (Minyo) (언어와 민요의 운율 자질에 관한 음향음성학적 연구)

  • Koo, Hee-San
    • Speech Sciences / v.10 no.3 / pp.133-144 / 2003
  • The purpose of this acoustic experimental study was to investigate the interrelation between the prosodic features of spoken Korean and those of Korean folk songs. For the spoken-language analysis, the lyrics of Changbutaryoung were read aloud by three female graduate students; for the musical analysis, the song was sung by three Kyunggi Minyo singers. Pitch contours were analyzed from sound spectrograms produced with Pitch Works. The results showed that the special musical voices (breaking, tinkling, vibrating, etc.) and tunes (rising, falling, level, etc.) of the folk song occurred at the same positions as the accents of the spoken language. Although the pitch-contour patterns differed from each other, there appeared to be a positive interrelation between the prosodic features of spoken Korean and those of Korean folk songs.
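
The abstract does not describe how Pitch Works computes its pitch contours. As a rough illustration of what a frame-wise pitch contour is, here is a minimal sketch using a generic autocorrelation estimator; the function name, frame length, hop size, and F0 search range (75 to 500 Hz) are all assumptions, not the tool's actual method.

```python
import numpy as np

def pitch_contour(x, fs, frame_len=0.04, hop=0.01, fmin=75.0, fmax=500.0):
    """Frame-wise F0 (Hz) from the autocorrelation peak; 0.0 marks silent frames.

    Hypothetical generic estimator, not the algorithm used by Pitch Works.
    """
    n_frame, n_hop = int(round(frame_len * fs)), int(round(hop * fs))
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    if lag_max >= n_frame:
        raise ValueError("frame too short for the requested fmin")
    contour = []
    for start in range(0, len(x) - n_frame + 1, n_hop):
        frame = x[start:start + n_frame]
        frame = frame - np.mean(frame)
        # Autocorrelation for non-negative lags only.
        ac = np.correlate(frame, frame, mode="full")[n_frame - 1:]
        if ac[0] <= 1e-12:
            contour.append(0.0)
            continue
        # The strongest peak inside the allowed lag range gives the period.
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
        contour.append(fs / lag)
    return np.array(contour)

# Hypothetical test signal: 0.5 s at 200 Hz followed by 0.5 s at 250 Hz.
fs = 16_000
t = np.arange(fs // 2) / fs
signal = np.concatenate([np.sin(2 * np.pi * 200 * t), np.sin(2 * np.pi * 250 * t)])
f0 = pitch_contour(signal, fs)
```

On this synthetic signal, the contour starts at about 200 Hz and ends at about 250 Hz. Real speech and singing would need voicing decisions and octave-error handling, which this sketch omits.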

An Acoustic Analysis of Vowels for Severe-profound Hearing Impaired Children (최고도이상의 청력손실을 가진 아동의 모음음형대 분석)

  • Huh, Myung-Jin
    • Speech Sciences / v.14 no.2 / pp.65-71 / 2007
  • Severe-profound hearing-impaired children have various difficulties in everyday communication due to the lack of auditory feedback. In particular, their speech shows unstable voice, omission and distortion of articulation, pitch breaks, cul-de-sac voice, and so on, which makes it difficult for them to convey an intended message accurately. This study analyzes the acoustic characteristics of 4 vowels produced by 35 severe-profound hearing-impaired children using CSL (Computerized Speech Lab, Model 4300b). Formant data were obtained from the spectrogram and analyzed with a 12 formant filter and autocorrelation among the formants. The results showed that the hearing-impaired children's formant values were very high, and that they produced the vowels with excessive tension (hypertension) and an unstable voice. To improve their speech, they would need adequate auditory feedback.

Studies on the Chemical Components of Fruits of Forsythia koreana NAKAI (II) Occurrence of betaine in the fruits of Forsythia koreana (Forsythia koreana NAKAI 씨 (토연교)의 성분에 관한 연구 (II) (Betaine의 분리 및 확인))

  • Cang Sae Hee;Kim Jae Soon;Huh Tae Soung
    • Journal of the Korean Chemical Society / v.15 no.1 / pp.1-3 / 1971
  • The methanol extract of the fruits of Forsythia koreana NAKAI was separated and purified, yielding a quaternary base chloride. Based on the mass spectrogram, ultraviolet and infrared spectra, elemental analysis, and qualitative tests, it was identified as betaine hydrochloride.

The Computation Reduction Algorithm Independent of the Language for CELP Vocoders (각국 언어 특성에 독립적인 CELP 계열 보코더에서의 계산량 단축 알고리즘)

  • Ju, Sang-Gyu
    • Proceedings of the KAIS Fall Conference / 2010.05a / pp.257-260 / 2010
  • In this paper, we propose methods for reducing the computation of the LSP (line spectrum pairs) transformation, which is widely used in CELP vocoders. To decrease the computation time of the real root method, four algorithms are proposed. The first scheme reduces the LSP transformation time by using the mel scale. The second controls the search order according to the distribution characteristics of the LSP parameters. The third reduces the LSP transformation time by using voice characteristics. The fourth controls both the search interval and the search order according to the distribution characteristics of the LSP parameters. Evaluations of search time, computational load, transformed LSP parameters, SNR, MOS, synthesized-speech waveforms, and spectrograms show that the four methods reduce the search time by about 37.5%, 46.21%, 46.3%, and 51.29% on average, and the computational load by about 44.76%, 49.44%, 47.03%, and 57.40%, respectively. At the same time, the transformed LSP parameters of the proposed methods were identical to those of the real root method.
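
For readers unfamiliar with the search that these schemes speed up, the sketch below computes line spectral frequencies by forming the sum and difference polynomials P(z) and Q(z) from the LPC coefficients, scanning a uniform frequency grid for sign changes, and refining each root by bisection. It is a generic illustration under assumed parameters (a 512-point grid, 30 bisection steps, a hypothetical 6th-order filter); it does not implement the paper's mel-scale or distribution-based search orders, which would change the grid and the order in which intervals are visited.

```python
import numpy as np

def lsp_polynomials(a):
    """Symmetric P(z) and antisymmetric Q(z) from LPC coefficients a = [1, a1, ..., ap]."""
    a_ext = np.append(np.asarray(a, dtype=float), 0.0)
    return a_ext + a_ext[::-1], a_ext - a_ext[::-1]

def _real_part_on_circle(coeffs, w, antisymmetric):
    """Real function whose zeros in (0, pi) are the unit-circle roots of the polynomial."""
    k = np.arange(len(coeffs)) - (len(coeffs) - 1) / 2.0
    basis = np.sin if antisymmetric else np.cos
    return basis(np.outer(np.atleast_1d(w), k)) @ coeffs

def _grid_search(f, grid, n_bisect):
    vals = f(grid)
    roots = []
    for i in np.nonzero(vals[:-1] * vals[1:] < 0)[0]:  # sign change => a root inside
        lo, hi, f_lo = grid[i], grid[i + 1], vals[i]
        for _ in range(n_bisect):                      # refine the bracket by bisection
            mid = 0.5 * (lo + hi)
            f_mid = f(mid)[0]
            if f_lo * f_mid <= 0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        roots.append(0.5 * (lo + hi))
    return roots

def lsf_by_search(a, n_grid=512, n_bisect=30):
    """Line spectral frequencies (radians) by uniform grid search plus bisection."""
    p_poly, q_poly = lsp_polynomials(a)
    # Skip the endpoints, where the trivial roots lie for even order p.
    grid = np.linspace(0.0, np.pi, n_grid + 1)[1:-1]
    roots = _grid_search(lambda w: _real_part_on_circle(p_poly, w, False), grid, n_bisect)
    roots += _grid_search(lambda w: _real_part_on_circle(q_poly, w, True), grid, n_bisect)
    return np.sort(np.array(roots))

# Hypothetical 6th-order LPC filter with poles at radius 0.9.
poles = 0.9 * np.exp(1j * np.array([0.3, 1.2, 2.0]))
a = np.real(np.poly(np.concatenate([poles, poles.conj()])))
lsf = lsf_by_search(a)
```

Most of the cost lies in evaluating the two functions at every grid point, which is why restricting or reordering the searched intervals, as the abstract describes, can save time without changing the resulting LSP values.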

Formant Measurements of Complex Waves and Vowels Produced by Students (복합음과 대학생이 발음한 모음 포먼트 측정)

  • Yang, Byung-Gon
    • Speech Sciences / v.15 no.3 / pp.39-51 / 2008
  • Formant measurement is one of the most important means of objectively testing cross-linguistic differences among vowels produced by speakers of any given language. However, many speech analysis programs produce erroneous estimates, and some researchers use them without any verification. The purposes of this paper are to examine formant measurements of complex waves synthesized from the average formant values of five Korean vowels, using three default methods in Praat, and to verify the measured values of the five vowels produced by 20 students using one of those methods. Variation along the time axis is discussed after computing the sum of absolute differences from the point at one-third of the vowel duration. The results show that the burg method produced smaller measurement errors, whereas greater errors were observed with the sl and lpc methods, mostly caused by inappropriate formant settings. Formant measurement deviations were greater for the vowels produced by the female students than for those of the male students, mostly because of the settings for the vowels /o, u/. Formant settings can best be corrected by setting the number of formants to the number of visible dark bands on the spectrogram. These results suggest that researchers should check the validity of the estimates from speech analysis software. Further studies are recommended on perception tests comparing the original sounds with sounds synthesized from the estimated formant values.
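
The verification idea, synthesizing a complex wave from known formant values and checking whether an analysis method recovers them, can be sketched as follows. This is an assumption-laden illustration: the formant and bandwidth values are hypothetical rather than the paper's Korean vowel averages, the wave is an impulse train (F0 = 100 Hz) passed through cascaded second-order resonators, and formants are estimated with the autocorrelation (Levinson-Durbin) LPC method plus root solving, not with Praat's burg, sl, or lpc procedures.

```python
import numpy as np

FS = 10_000  # sampling rate in Hz (assumed for this sketch)

def resonator_cascade(freqs, bws, fs=FS):
    """Denominator of a cascade of 2nd-order all-pole resonators, one per formant."""
    den = np.array([1.0])
    for f, bw in zip(freqs, bws):
        radius = np.exp(-np.pi * bw / fs)            # pole radius from bandwidth
        theta = 2.0 * np.pi * f / fs                 # pole angle from frequency
        den = np.convolve(den, [1.0, -2.0 * radius * np.cos(theta), radius * radius])
    return den

def all_pole_filter(den, x):
    """Direct-form IIR: y[n] = x[n] - sum_{k>=1} den[k] * y[n-k], with den[0] == 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, min(len(den) - 1, n) + 1):
            acc -= den[k] * y[n - k]
        y[n] = acc
    return y

def lpc_autocorrelation(x, order):
    """LPC coefficients [1, a1, ..., a_order] via the autocorrelation method."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):                    # Levinson-Durbin recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants_from_lpc(a, fs=FS):
    """Formant frequencies (Hz) from the angles of LPC roots in the upper half-plane."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Hypothetical target formants (Hz) and bandwidths; not values from the paper.
targets = [700.0, 1200.0, 2600.0]
excitation = np.zeros(FS // 2)                       # 0.5 s
excitation[:: FS // 100] = 1.0                       # impulse train, F0 = 100 Hz
wave = all_pole_filter(resonator_cascade(targets, [80.0, 90.0, 120.0]), excitation)

segment = wave[1000:4000] * np.hamming(3000)         # skip the onset, analyze 0.3 s
estimates = formants_from_lpc(lpc_autocorrelation(segment, order=6))
```

Here the LPC order is matched to the three synthesized formants. Choosing an order that does not match the signal is the same kind of mismatch as the inappropriate "number of formants" settings that the abstract blames for the larger sl and lpc errors.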

Perturbation and Perceptual Analysis of Pathological Sustained Vowels according to Signal Typing

  • Lee, Ji-Yeoun;Choi, Seong-Hee;Jiang, Jack J.;Hahn, Min-Soo;Choi, Hong-Shik
    • Phonetics and Speech Sciences / v.2 no.2 / pp.109-115 / 2010
  • In this paper, we investigate signal typing based on the visual impression of distinctive spectrograms. Pathological voices are classified into signal types 1, 2, 3, and 4, and for each type perturbation parameters are estimated and perceptual ratings are assigned using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V). The results suggest that perturbation analysis can be applied only to type 1 and 2 signals, and that the perceptual ratings of overall severity generally increase with signal type. Inter-rater reliability among the three raters was good. We recommend that pathological voices be rated with both signal typing and CAPE-V to describe their characteristics clearly.
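
The abstract does not list which perturbation parameters were estimated. Two common ones are local jitter and local shimmer, defined (as in Praat) as the mean absolute difference between consecutive periods, or consecutive peak amplitudes, divided by their mean. The sketch below assumes that the period and amplitude sequences have already been extracted. That extraction step is exactly what breaks down for aperiodic signals, which is consistent with perturbation analysis being applicable only to type 1 and 2 signals.

```python
import numpy as np

def local_jitter_percent(periods):
    """Mean absolute difference of consecutive periods, as a percentage of the mean period."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def local_shimmer_percent(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, as a percentage of the mean."""
    amp = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amp))) / np.mean(amp)

# Hypothetical period (ms) and amplitude sequences for a nearly periodic (type 1) voice.
jitter = local_jitter_percent([10.0, 10.2, 9.8, 10.0])
shimmer = local_shimmer_percent([1.0, 0.9, 1.1, 1.0])
```

For these toy sequences, jitter is about 2.67% and shimmer about 13.3%.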

Determination of Arsenic in Korean human liver and of manganese and copper in Vitamin preparations by neutron activation analysis (중성자(中性子) 방사화(放射化) 분석법(分析法)에 의(依)한 한국인(韓國人) 간장중(肝臟中)의 비소(砒素) 및 Vitamin제제중(製劑中)의 금속(金屬)(CU, Mn)의 정량(定量))

  • Oh, Soo-Chang
    • Journal of Pharmaceutical Investigation / v.4 no.4 / pp.17-25 / 1974
  • 1. Neutron activation analysis of arsenic in Korean human liver was studied from the viewpoint of forensic chemistry, using 12 corpses. A 1 g sample was irradiated for 30 min in a neutron flux of $1.2\times10^{12}\,\mathrm{n/cm^2/s}$, followed by nitric-sulfuric acid digestion and then Gutzeit separation. Radioactivity was measured with a scintillation counter. The arsenic content of the liver was found to be $0.01\,\mu\mathrm{g/g}$ to $0.15\,\mu\mathrm{g/g}$. 2. A rapid and convenient method for the radiochemical determination of minerals by neutron activation analysis was established. After standard solutions of Cu and Mn were irradiated in a pneumatic tube (neutron flux: $1.2\times10^{12}\,\mathrm{n/cm^2/s}$), Cu and Mn were determined from the ratio of the areas under the energy peaks in the $\gamma$-ray spectrogram. When the standard solution of Mn and Cu was irradiated for 15 min to 18 h, a recovery test showed relative errors of 5.1% and 4.5% for copper and manganese, respectively.
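
For context, the activity induced in neutron activation analysis is usually described by the standard activation equation below. The abstract does not state which working equation the author used, so treat this as background rather than the paper's calculation. Here $N$ is the number of target nuclei, $\sigma$ is the capture cross-section, $\phi$ is the neutron flux (for example $1.2\times10^{12}\,\mathrm{n/cm^2/s}$ in this study), $\lambda$ is the decay constant of the product nuclide, $t_{\mathrm{irr}}$ is the irradiation time, and $t_d$ is the decay time before counting:

$$A = N\sigma\phi\left(1 - e^{-\lambda t_{\mathrm{irr}}}\right)e^{-\lambda t_d}$$

When a sample and a standard of the same element are irradiated and counted under identical conditions, $\sigma$, $\phi$, $\lambda$, and the time factors cancel. This gives the usual comparator relation, in which the element mass is proportional to the measured peak activity:

$$m_{\mathrm{sample}} = m_{\mathrm{std}}\,\frac{A_{\mathrm{sample}}}{A_{\mathrm{std}}}$$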

Comparative Analysis for General and Estrus-related Vocalizations in Sows (모돈의 일반 발성음과 발정기 특이음의 비교분석)

  • Jeon, J.H.;Yeon, S.C.;Chang, H.H.
    • Journal of Animal Science and Technology / v.47 no.1 / pp.133-140 / 2005
  • The aim of this study was to divide the vocalizations of sows into general vocalizations (GVs) and estrus-related vocalizations (EVs) and to identify their phonetic characteristics. Ten sows (Landrace) were recorded with digital video recorders twice daily (06:00 to 08:00 h and 17:00 to 19:00 h) during the anestrus and estrus periods. GVs and EVs were distinguished based on the shapes of their spectra and spectrograms, and were identified as 5 and 3 types, respectively. Pitch, formant 1, formant 2, and formant 3 did not differ significantly between GVs and EVs (P > 0.05), whereas intensity (P < 0.001), duration (P < 0.05), and formant 4 (P < 0.01) differed significantly. Three parameter groups (Group I: formant vector alone; Group II: formant vector + parameters from the time signal; Group III: formant vector + parameters from the time signal, minus the parameters eliminated by backward stepwise discriminant analysis) were compared by discriminant function analysis. The classification based on Group II yielded a higher discrimination rate than the other groups (Group I: 76.1%, Group II: 88.1%, Group III: 87.3%). These results suggest that EVs exist and that intensity, formant 2, and formant 4 are useful parameters for discriminating EVs in sows.

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.6 no.2 / pp.509-514 / 2020
  • A conventional TTS system consists of several modules, including text preprocessing, syntactic parsing, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by an acoustic model, and synthesized speech generation. A deep-learning TTS system, by contrast, consists of a Text2Mel stage, which generates a spectrogram from text, and a vocoder, which synthesizes the speech signal from that spectrogram. In this paper, to build an optimal Korean TTS system, we apply Tacotron2 to the Text2Mel stage, introduce WaveNet, WaveRNN, and WaveGlow as vocoders, and implement them to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and a trained model of several hundred megabytes, but its synthesis takes about 50 times real time. WaveRNN achieves MOS similar to WaveNet with a model of several tens of megabytes, but it also cannot run in real time. WaveGlow can run in real time, but its model is several GB in size and its MOS is the lowest of the three vocoders. Based on these results, the paper presents reference criteria for selecting an appropriate vocoder according to the hardware environment in which the TTS system is deployed.
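
One common way to express the speed comparison above is the real-time factor (RTF), the ratio of synthesis time to the duration of the generated audio. The abstract reports speed relative to real time rather than an RTF value, so the definition and the 10 s utterance below are illustrative assumptions:

$$\mathrm{RTF} = \frac{T_{\mathrm{synth}}}{T_{\mathrm{audio}}}, \qquad \mathrm{RTF}_{\mathrm{WaveNet}} \approx 50 \;\Rightarrow\; T_{\mathrm{synth}} \approx 50 \times 10\ \mathrm{s} = 500\ \mathrm{s}\ \text{for}\ T_{\mathrm{audio}} = 10\ \mathrm{s}$$

Real-time operation requires $\mathrm{RTF} \le 1$. Among the three vocoders, the abstract reports this only for WaveGlow.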

Feasibility of Deep Learning-Based Analysis of Auscultation for Screening Significant Stenosis of Native Arteriovenous Fistula for Hemodialysis Requiring Angioplasty

  • Jae Hyon Park;Insun Park;Kichang Han;Jongjin Yoon;Yongsik Sim;Soo Jin Kim;Jong Yun Won;Shina Lee;Joon Ho Kwon;Sungmo Moon;Gyoung Min Kim;Man-deuk Kim
    • Korean Journal of Radiology / v.23 no.10 / pp.949-958 / 2022
  • Objective: To investigate the feasibility of using a deep learning-based analysis of auscultation data to predict significant stenosis of arteriovenous fistulas (AVF) in patients undergoing hemodialysis requiring percutaneous transluminal angioplasty (PTA). Materials and Methods: Forty patients (24 male and 16 female; median age, 62.5 years) with dysfunctional native AVF were prospectively recruited. Digital sounds from the AVF shunt were recorded using a wireless electronic stethoscope before (pre-PTA) and after PTA (post-PTA), and the audio files were subsequently converted to mel spectrograms, which were used to construct various deep convolutional neural network (DCNN) models (DenseNet201, EfficientNetB5, and ResNet50). The performance of these models for diagnosing ≥ 50% AVF stenosis was assessed and compared. The ground truth for the presence of ≥ 50% AVF stenosis was obtained using digital subtraction angiography. Gradient-weighted class activation mapping (Grad-CAM) was used to produce visual explanations for DCNN model decisions. Results: Eighty audio files were obtained from the 40 recruited patients and pooled for the study. Mel spectrograms of "pre-PTA" shunt sounds showed patterns corresponding to abnormal high-pitched bruits with systolic accentuation observed in patients with stenotic AVF. The ResNet50 and EfficientNetB5 models yielded an area under the receiver operating characteristic curve of 0.99 and 0.98, respectively, at optimized epochs for predicting ≥ 50% AVF stenosis. However, Grad-CAM heatmaps revealed that only ResNet50 highlighted areas relevant to AVF stenosis in the mel spectrogram. Conclusion: Mel spectrogram-based DCNN models, particularly ResNet50, successfully predicted the presence of significant AVF stenosis requiring PTA in this feasibility study and may potentially be used in AVF surveillance.
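
The abstract does not give the mel-spectrogram settings used to convert the shunt recordings. The sketch below illustrates the conversion in general terms: a short-time Fourier transform with a Hann window, a triangular mel filterbank using the common 2595·log10(1 + f/700) mel formula, and a dB scale. The FFT size, hop length, and number of mel bands are hypothetical choices, and libraries such as librosa offer tuned equivalents.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, fs, fmin=0.0, fmax=None):
    """Triangular filters equally spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = fs / 2 if fmax is None else fmax
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_mels, len(bin_freqs)))
    for m in range(n_mels):
        lo, center, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (bin_freqs - lo) / (center - lo)
        falling = (hi - bin_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, fs, n_fft=1024, hop=256, n_mels=64):
    """Log-power mel spectrogram in dB, shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, fs).T       # (n_frames, n_mels)
    return 10.0 * np.log10(mel + 1e-10).T                   # small floor avoids log(0)

# Hypothetical 1 s test tone at 440 Hz sampled at 4 kHz.
fs = 4000
t = np.arange(fs) / fs
S = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), fs)
```

The resulting 2-D array (mel bands by frames) is the kind of image-like input that the DCNN models described above take. In this toy case, the energy concentrates in the mel band whose filter spans 440 Hz.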