• 제목/요약/키워드: Speech Spectrogram

검색결과 90건 처리시간 0.028초

한시의 평측법에 대한 음향음성학적 연구 (An Acoustic Study of Pitch Rules of Chinese Poetry)

  • 조성문
    • 음성과학
    • /
    • 제14권3호
    • /
    • pp.59-76
    • /
    • 2007
  • The purpose of this study is to investigate the pitch rules of Chinese poetry. Pitch rules are concerned with the high tone and the low tone. Because Chinese poetry is a fixed form of verse, it must keep pitch rules to compose Chinese poetry. But until now there has been no acoustic study of pitch rules of Chinese poetry. So, for the first time the present study investigates pitch rules of Chinese poetry acoustically. Pitch contours were analyzed from the sound spectrogram made by Praat. Results showed that actual pitch patterns did not coincide with theoretical pitch rules in reciting Chinese poetry. Therefore, in studying Chinese classics, the Chinese poetry, which has traditionally been considered to be recited according to original Chinese pitch rules, must now be considered in terms of how pitch rules may have changed over time in Korea since it was first introduce to Korean scholars.

  • PDF

스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류 (On-Line Audio Genre Classification using Spectrogram and Deep Neural Network)

  • 윤호원;신성현;장우진;박호종
    • 방송공학회논문지
    • /
    • 제21권6호
    • /
    • pp.977-985
    • /
    • 2016
  • 본 논문은 스펙트로그램과 심층 신경망을 이용한 온라인 오디오 장르 분류 방법을 제안한다. 제안한 방법은 온라인 동작을 위하여 1초 단위로 신호를 입력하여 speech, music, effect 중 하나의 장르로 분류하고, 동작의 범용성을 위하여 기존 오디오 분석에 널리 사용되는 MFCC 대신에 스펙트로그램 기반의 특성 벡터를 사용한다. 실제 TV 방송 신호를 사용하여 장르 분류 성능을 측정하였고, 제안 방법이 기존 방법보다 각 장르에 대하여 우수한 성능을 제공하는 것을 확인하였다. 특히 제안 방법은 기존 방법에서 나타나는 music과 effect 사이를 잘못 분류하는 문제점을 감소시킨다.

음소 음향학적 변화 정보를 이용한 한국어 음성신호의 자동 음소 분할 (Automatic Phonetic Segmentation of Korean Speech Signal Using Phonetic-acoustic Transition Information)

  • 박창목;왕지남
    • 한국음향학회지
    • /
    • 제20권8호
    • /
    • pp.24-30
    • /
    • 2001
  • 본 논문에서는 발음표기가 주어진 상황에서 음성 신호의 자동 음소 분할에 관한 것이며 음소의 경계를 음소 음향학적인 변화특성에 따라 3가지 형태로 분류하여 각각에 적합한 분할 알고리즘을 개발하였다. 형태 1은 묵음·유성음·무성음간의 분할이며 히스토그램분석으로 구한 문턱 값으로 초기 분할 후, 웨이블릿 계수의 SVF (Spectral Variation Function)를 이용하여 분할하였다. 형태 2는 연속적인 모음의 분할이며 각 모음변화특성을 템플릿으로 구성하여 분할에 활용하였다. 형태 3은 모음과 유성자음 혹은 유성화 자음의 분할이며 특성주파수대역의 진폭변화를 이용하여 후보구간을 정한 후, 캡스트럼 계수의 SVF를 이용하여 최종적인 분할을 수행하였다. 본 실험에서는 분할 성능을 테스트하기 위하여 한국어 PBWSpeech DB에서 342개의 단어를 자동으로 분할한 후, 수작업으로 분할한 결과와 비교하였다. 전체적인 자동 분할 성능은 20 msec내에서 81.5%의 분할성능을 보였다.

  • PDF

연속구어 내 발성 종결-개시의 음향학적 특징 - 말더듬 화자와 비말더듬 화자 비교 - (Acoustic Features of Phonatory Offset-Onset in the Connected Speech between a Female Stutterer and Non-Stutterers)

  • 한지연;이옥분
    • 음성과학
    • /
    • 제13권2호
    • /
    • pp.19-33
    • /
    • 2006
  • The purpose of this paper was to examine acoustical characteristics of phonatory offset-onset mechanism in the connected speech of female adults with stuttering and normal nonfluency. The phonatory offset-onset mechanism refers to the laryngeal articulatory gestures. Those gestures are required to mark word boundaries in phonetic contexts of the connected speech. This mechanism included 7 patterns based on the speech spectrogram. This study showed the acoustic features in the connected speech in the production of female adults with stuttering (n=1) and normal nonfluency (n=3). Speech tokens in V_V, V_H, and V_S contexts were selected for the analysis. Speech samples were recorded by Sound Forge, and the spectrographic analysis was conducted using Praat. Results revealed a stuttering (with a type of block) female exhibited more laryngealization gestures in the V_V context. Laryngealization gesture was more characterized by a complete glottal stop or glottal fry both in V_H and in V_S contexts. The results were discussed from theoretical and clinical perspectives.

  • PDF

A Novel Approach to COVID-19 Diagnosis Based on Mel Spectrogram Features and Artificial Intelligence Techniques

  • Alfaidi, Aseel;Alshahrani, Abdullah;Aljohani, Maha
    • International Journal of Computer Science & Network Security
    • /
    • 제22권9호
    • /
    • pp.195-207
    • /
    • 2022
  • COVID-19 has remained one of the most serious health crises in recent history, resulting in the tragic loss of lives and significant economic impacts on the entire world. The difficulty of controlling COVID-19 poses a threat to the global health sector. Considering that Artificial Intelligence (AI) has contributed to improving research methods and solving problems facing diverse fields of study, AI algorithms have also proven effective in disease detection and early diagnosis. Specifically, acoustic features offer a promising prospect for the early detection of respiratory diseases. Motivated by these observations, this study conceptualized a speech-based diagnostic model to aid in COVID-19 diagnosis. The proposed methodology uses speech signals from confirmed positive and negative cases of COVID-19 to extract features through the pre-trained Visual Geometry Group (VGG-16) model based on Mel spectrogram images. This is used in addition to the K-means algorithm that determines effective features, followed by a Genetic Algorithm-Support Vector Machine (GA-SVM) classifier to classify cases. The experimental findings indicate the proposed methodology's capability to classify COVID-19 and NOT COVID-19 of varying ages and speaking different languages, as demonstrated in the simulations. The proposed methodology depends on deep features, followed by the dimension reduction technique for features to detect COVID-19. As a result, it produces better and more consistent performance than handcrafted features used in previous studies.

성대형태 및 음향발현에서 성악 발성 및 판소리 발성의 비교 연구 (A Comparative Study of Western Singer's Voice and a Pansori Singer's Voice Based on Glottal Image and Acoustic Characteristics)

  • 김선숙
    • 음성과학
    • /
    • 제11권2호
    • /
    • pp.165-177
    • /
    • 2004
  • Western singers voice have been studied in music science since the early 20th century. However, Korean traditional singers voice have not yet been studied scientifically. This study is to find the physiological and acoustic characteristics of Pansori singers voices. Western singers participated for comparative purposes. Ten western singers and ten Pansori singers participated in this study. The subjects spoke and sung seven simple vowels /a, e, i, o, u, c, w/. An analysis of Glottal image was done by Scope View and acoustic characteristics of speech and singing voice were analyzed by CSL. The results are as follows: (1) Glottal gestures of Pansori singers showed asymmetric vocal folds. (2) Singing vowel formants of Pansori singers showed breathiness based on Spectrogram. (3) Music formant of western singers appeared in around 3kHz area, however, Pansori singers formant appeared in low frequency area. Modulation of vibrato showed 6 frequency per sec in case of western singers. Pansori singers showed no deep modulation of vibrato on spectrogram.

  • PDF

Complexity Reduction Algorithm of Speech Coder(EVRC) for CDMA Digital Cellular System

  • Min, So-Yeon
    • 한국멀티미디어학회논문지
    • /
    • 제10권12호
    • /
    • pp.1551-1558
    • /
    • 2007
  • The standard of evaluating function of speech coder for mobile telecommunication can be shown in channel capacity, noise immunity, encryption, complexity and encoding delay largely. This study is an algorithm to reduce complexity applying to CDMA(Code Division Multiple Access) mobile telecommunication system, which has a benefit of keeping the existing advantage of telecommunication quality and low transmission rate. This paper has an objective to reduce the computing complexity by controlling the frequency band nonuniform during the changing process of LSP(Line Spectrum Pairs) parameters from LPC(Line Predictive Coding) coefficients used for EVRC(Enhanced Variable-Rate Coder, IS-127) speech coders. Its experimental result showed that when comparing the speech coder applied by the proposed algorithm with the existing EVRC speech coder, it's decreased by 45% at average. Also, the values of LSP parameters, Synthetic speech signal and Spectrogram test result were obtained same as the existing method.

  • PDF

말소리장애 아동의 말명료도와 음향학적 측정치 간 상관관계 (The Correlation between Speech Intelligibility and Acoustic Measurements in Children with Speech Sound Disorders)

  • 강은영
    • 대한통합의학회지
    • /
    • 제6권4호
    • /
    • pp.191-206
    • /
    • 2018
  • Purpose : This study investigated the correlation between speech intelligibility and acoustic measurements of speech sounds produced by the children with speech sound disorders and children without any diagnosed speech sound disorder. Methods : A total of 60 children with and without speech sound disorders were the subjects of this study. Speech samples were obtained by having the subjects? speak meaningful words. Acoustic measurements were analyzed on a spectrogram using the Multi-speech 3700 program. Speech intelligibility was determined according to a listener's perceptual judgment. Results : Children with speech sound disorders had significantly lower speech intelligibility than those without speech sound disorders. The intensity of the vowel /u/, the duration of the vowel /${\omega}$/, and the second formant of the vowel /${\omega}$/ were significantly different between both groups. There was no difference in voice onset time between the groups. There was a correlation between acoustic measurements and speech intelligibility. Conclusion : The results of this study showed that the speech intelligibility of children with speech sound disorders was affected by intensity, word duration, and formant frequency. It is necessary to complement clinical setting results using acoustic measurements in addition to evaluation of speech intelligibility.

Speech Denoising via Low-Rank and Sparse Matrix Decomposition

  • Huang, Jianjun;Zhang, Xiongwei;Zhang, Yafei;Zou, Xia;Zeng, Li
    • ETRI Journal
    • /
    • 제36권1호
    • /
    • pp.167-170
    • /
    • 2014
  • In this letter, we propose an unsupervised framework for speech noise reduction based on the recent development of low-rank and sparse matrix decomposition. The proposed framework directly separates the speech signal from noisy speech by decomposing the noisy speech spectrogram into three submatrices: the noise structure matrix, the clean speech structure matrix, and the residual noise matrix. Evaluations on the Noisex-92 dataset show that the proposed method achieves a signal-to-distortion ratio approximately 2.48 dB and 3.23 dB higher than that of the robust principal component analysis method and the non-negative matrix factorization method, respectively, when the input SNR is -5 dB.

무제한 음성합성기를 위한음성 분석 장치 (Speech Analysis Tools for Text-to-Speech Synthesizer)

  • 김재인
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회 1995년도 제12회 음성통신 및 신호처리 워크샵 논문집 (SCAS 12권 1호)
    • /
    • pp.115-118
    • /
    • 1995
  • 무제한 음성합성기를 구현하기 위하여 꼭 필요한 음성분석장치의 개발에 대하여 논하엿다. 이 분석장치는 신호처리 보드를 사용하여 PC에서 사용할 수 있도록 되어 있으며, 음성의 A/D, D/A 및 spectrogram display는 물론 pitch pulse 위치를 Glottal instint closure에 맞추어 삽입할 수 있어 linear prediction base의 무제한 합성기에서 필요한 음성 data base를 구축하기 용이하도록 개발하였다. 또한 음성인식을 위한 음성 DB나 현재 사용중인 ARS를 구축하고자 할 때에도 적은 노력과 시간이 소요되도록 하였다.

  • PDF