• Title/Abstract/Keywords: speech analysis

1,592 search results (processing time: 0.026 s)

Performance Analysis of Speech Recognition Model based on Neuromorphic Architecture of Speech Data Preprocessing Technique

  • 조진성;김봉재
    • 한국인터넷방송통신학회논문지 / Vol. 22, No. 3 / pp.69-74 / 2022
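  • The SNN (Spiking Neural Network), which runs on a neuromorphic architecture, was designed to mimic the human neural network. Neuromorphic computing based on a neuromorphic architecture requires relatively less power than deep learning on GPUs. For this reason, research on supporting a variety of artificial intelligence models with neuromorphic architectures is active. This paper analyzes how the performance of a speech recognition model based on a neuromorphic architecture varies with the speech data preprocessing technique. The experiments show a recognition accuracy of up to about 84% when the speech data are preprocessed with a Fourier transform. The results therefore confirm that speech recognition services based on a neuromorphic architecture can be used effectively.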
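
As a rough illustration of what Fourier-transform-based preprocessing can look like, the sketch below splits a signal into overlapping frames and computes a magnitude spectrum for each. The abstract does not describe the actual pipeline, so the frame length, hop size, Hann window, and all function names here are assumptions made for the example.

```python
import cmath
import math


def hann_window(n):
    """Symmetric Hann window of length n (assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT of one windowed frame, bins 0..n/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann_window(n))]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len=64, hop=32):
    """List of per-frame magnitude spectra (hypothetical frame_len and hop)."""
    return [
        magnitude_spectrum(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    # Synthetic tone with 8 cycles per 64 samples, standing in for speech.
    tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(128)]
    frames = spectrogram(tone)
    print(len(frames), "frames,", len(frames[0]), "bins per frame")
```

A production pipeline would normally use an FFT library rather than this O(n²) DFT. How the resulting spectra are converted into spikes for the SNN is not stated in the abstract and is not covered here.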

Analysis of AMR Compressed Bit Stream for Insertion of Voice Data in QR Code

  • 오은주;조현지;정현아;배정은;유훈
    • 한국정보통신학회: Conference Proceedings / 한국정보통신학회 2018 Fall Conference / pp.490-492 / 2018
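  • This paper presents an analysis of AMR voice data, the format most widely used in everyday life, as a step toward techniques for embedding voice data in QR codes and transmitting it. AMR consists of a HEADER and Speech Data, is transmitted as a bit stream, and has a total of eight bit-rate modes. The HEADER contains the mode information of the Speech Data, and the length of the Speech Data depends on the mode. Among these modes, the one best suited for insertion into a QR code is selected and analyzed. The ultimate goal is to achieve a higher compression ratio for voice data through analysis of and experiments on each mode. This would improve performance in that voice data could be transmitted more efficiently.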
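
The relationship between the header's mode field and the Speech Data length can be sketched as follows. This assumes narrowband AMR in the single-channel storage format of RFC 4867, where each frame starts with a one-byte header carrying a 4-bit frame type and a quality bit. The abstract does not say which container or mode the authors used, and the function names are hypothetical.

```python
# Frame type -> (bit rate in kbit/s, speech-data bytes per 20 ms frame),
# excluding the 1-byte frame header. Values are for AMR-NB frame types 0-7.
AMR_NB_MODES = {
    0: (4.75, 12),
    1: (5.15, 13),
    2: (5.90, 15),
    3: (6.70, 17),
    4: (7.40, 19),
    5: (7.95, 20),
    6: (10.2, 26),
    7: (12.2, 31),
}


def parse_frame_header(header_byte):
    """Return (frame_type, kbit_per_s, data_bytes, quality_ok) for one header.

    Assumed header layout (RFC 4867 storage format): P | FT(4 bits) | Q | P | P.
    """
    frame_type = (header_byte >> 3) & 0x0F
    quality_ok = bool((header_byte >> 2) & 0x01)
    if frame_type not in AMR_NB_MODES:
        raise ValueError(f"frame type {frame_type} is not a speech mode")
    bitrate, data_bytes = AMR_NB_MODES[frame_type]
    return frame_type, bitrate, data_bytes, quality_ok


def bytes_per_second(frame_type):
    """Stored bytes per second of speech: 50 frames/s, header byte included."""
    return 50 * (1 + AMR_NB_MODES[frame_type][1])


if __name__ == "__main__":
    print(parse_frame_header(0x3C))  # header byte of a 12.2 kbit/s frame
    print(bytes_per_second(0), "bytes/s at 4.75 kbit/s")
```

Under these assumptions the lowest-rate mode needs 650 bytes per second of speech, while a version 40 QR code at error-correction level L holds at most 2,953 bytes in byte mode. That capacity limit is the constraint that makes mode selection and further compression matter.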

Development of Speech Training Aids Using Vocal Tract Profile

  • 박상희;김동준;이재혁;윤태성
    • 대한전기학회논문지 / Vol. 41, No. 2 / pp.209-216 / 1992
  • Deaf people train their articulation by observing a tutor's mouth, by sensing the motions of the vocal organs through touch, or by using speech training aids. Existing speech training aids for the deaf can measure only a single speech parameter, or display only frequency spectra as a pseudo-color histogram. In this study, we develop a speech training aid that displays the subject's articulation as a cross section of the vocal organs, together with other speech parameters, in a single system, so that the subject can see where correction is needed. To this end, the speech production mechanism is first assumed to be an AR model, so that the articulatory motions of the vocal organs can be estimated from the speech signal. Next, a vocal tract profile model based on LP analysis is constructed. Using this model, articulatory motions for Korean vowels are estimated and displayed as vocal tract profile graphics.
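
A textbook way to go from an AR (LP) model to a vocal tract shape is to run the Levinson-Durbin recursion on the autocorrelation of a speech frame and convert the resulting reflection coefficients into relative tube areas. The sketch below illustrates that generic approach. It is not the authors' implementation, and the sign and direction convention of the area formula is an assumption, since conventions differ between references.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one speech frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (a, k, err): prediction-error filter a[0..order] with a[0] = 1,
    reflection coefficients k[0..order-1], and the final prediction error.
    Assumes r[0] > 0 and a non-degenerate signal (err never reaches 0).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, ks, err


def tube_areas(reflection, first_area=1.0):
    """Relative areas of a lossless-tube model.

    Assumed convention: A[i+1] = A[i] * (1 - k[i]) / (1 + k[i]).
    """
    areas = [first_area]
    for k in reflection:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas


if __name__ == "__main__":
    # Autocorrelation of an ideal first-order AR process with pole at 0.5.
    a, ks, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
    print(a, ks, err, tube_areas(ks))
```

Only area ratios are meaningful here. Plotting them as a cross-section requires an additional mapping onto a vocal tract outline, which the abstract does not detail.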

Speech treatment of velopharyngeal insufficiency using biofeedback technique with NM II; A case report

  • 양지형;최진영
    • 대한구순구개열학회지 / Vol. 8, No. 1 / pp.45-52 / 2005
  • Velopharyngeal insufficiency (VPI), the failure of the velum, the lateral wall, and the posterior pharyngeal wall to separate the nasal cavity from the pharyngeal cavity during speech, can be caused by congenital conditions including cleft palate, submucous cleft palate, and congenital palatal insufficiency. The speech problems of VPI are characterized by hypernasality, nasal air emission, increased nasal air flow, and decreased intelligibility. They can be treated with surgery, a temporary prosthesis, and speech therapy. The biofeedback technique with a Nasometer is a speech treatment method for VPI that is commonly used as one component of a comprehensive procedure for improving speech in patients with VPI. This article describes a case of VPI treated with the Nasometer biofeedback technique, which showed satisfactory results in nasalance and formant analysis after 9 months of speech therapy.

Correlation Analysis of PESQ and MOS Evaluation for HMM-based Synthetic Korean Speech

  • 임창송;배건성
    • 말소리와 음성과학 / Vol. 2, No. 1 / pp.71-75 / 2010
  • PESQ is an objective speech quality measure that is known to correlate highly with subjective speech quality measures such as MOS. To examine whether it could be useful as an objective quality measure for synthetic speech, we carried out subjective evaluation tests with MOS and DMOS and an objective evaluation test with PESQ on HMM-based Korean synthetic speech signals, and analyzed the correlation between them. The experimental results show that PESQ has correlations of 0.87 with MOS and 0.92 with DMOS. This suggests that PESQ holds much promise for evaluating the quality of synthetic Korean speech.
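
A correlation of this kind is typically computed as a Pearson coefficient over paired scores, one pair per utterance or test condition. The sketch below shows that calculation. The abstract does not state which correlation coefficient was used, and the scores in the usage example are made-up placeholders, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-utterance scores, for illustration only.
    pesq_scores = [2.1, 2.6, 3.0, 3.4, 3.9]
    mos_scores = [2.4, 2.5, 3.2, 3.3, 4.1]
    print(round(pearson(pesq_scores, mos_scores), 3))
```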

Characteristics of the auditory evaluation of good impression using speech manipulation scripts

  • 권순복
    • 말소리와 음성과학 / Vol. 8, No. 4 / pp.131-138 / 2016
  • This study analyzes the characteristics of good impression using speech manipulation scripts and investigates the characteristics of preferred speaking voices. Forty male and female college students participated in this study. They had been exposed to the Gyeongsang dialect spoken by their friends and family for more than 15 years. Two sample voices (1 male and 1 female), considered to give a good impression, were subjected to voice analysis. The two students were asked to read the sample paragraph 'Walking', and their voice samples were analyzed with Praat. The collected speech data were manipulated into 4 different sets by changing pitch level, degree of loudness, and speech rate. First, both men and women formed a better impression of the pitch-lowered sound than of the original one. Second, men tended to form a better impression of a slightly louder voice than of the natural-pitched one. Third, men often felt more drawn to a voice at a slightly faster speech rate than at the original speech rate. Overall, both male and female listeners favored lower pitch over the original pitch. Men tended to prefer a louder voice, while women preferred a less loud one. Men formed a better impression at a lower speech rate, but women at a faster speech rate.

Knowledge-driven speech features for detection of Korean-speaking children with autism spectrum disorder

  • Seonwoo Lee;Eun Jung Yeo;Sunhee Kim;Minhwa Chung
    • 말소리와 음성과학 / Vol. 15, No. 2 / pp.53-59 / 2023
  • Detection of children with autism spectrum disorder (ASD) based on speech has relied on predefined feature sets due to their ease of use and the capabilities of speech analysis. However, clinical impressions may not be adequately captured due to the broad range and the large number of features included. This paper demonstrates that knowledge-driven speech features (KDSFs) specifically tailored to the speech traits of ASD are more effective and efficient for distinguishing the speech of ASD children from that of children with typical development (TD) than a predefined feature set, the extended Geneva Minimalistic Acoustic Standard Parameter Set (eGeMAPS). The KDSFs encompass various speech characteristics related to frequency, voice quality, speech rate, and spectral features that have been identified as corresponding to certain distinctive attributes of ASD speech. The speech dataset used for the experiments consists of 63 ASD children and 9 TD children. To alleviate the imbalance in the number of training utterances, a data augmentation technique was applied to the TD children's utterances. The support vector machine (SVM) classifier trained with the KDSFs achieved an accuracy of 91.25%, surpassing the 88.08% obtained using the predefined set. This result underscores the importance of incorporating domain knowledge in the development of speech technologies for individuals with disorders.

Speech extraction based on AuxIVA with weighted source variance and noise dependence for robust speech recognition

  • 신의협;박형민
    • 한국음향학회지 / Vol. 41, No. 3 / pp.326-334 / 2022
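  • This paper proposes a target speech enhancement method used as a preprocessing step for robust speech recognition in environments with background noise. Building on auxiliary-function-based independent vector analysis (AuxIVA), the weights in the weighted covariance matrix are determined by a time-varying variance. The magnitude of the variance is adjusted with a mask that represents the contribution of the target speech in each time-frequency bin. Such a mask can be estimated by a neural network trained for speech enhancement, or from diffuseness, which is used to find the contribution of the direct component from the target speaker. In addition, the outputs for the surrounding noise are made mutually dependent by introducing multidimensional independent component analysis, so that the noise components can be extracted stably. This AuxIVA-based target speech extraction algorithm can also be carried out within the framework of independent low-rank matrix analysis (ILRMA) by extending non-negative matrix factorization (NMF) to non-negative tensor factorization (NTF) for the noise. With this extension, the inter-channel dependence among the noise output channels is still maintained. Experimental results on the CHiME-4 dataset demonstrate the effectiveness of the presented algorithm.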

Strong (stressed) syllables in English and lexical segmentation by Koreans

  • 김선미;남기춘
    • 말소리와 음성과학 / Vol. 3, No. 1 / pp.3-14 / 2011
  • It has been posited that native listeners of English use the Metrical Segmentation Strategy (MSS) to segment continuous speech. Native English speakers tend to perceive strong syllables as potential word onsets, owing to the high proportion of word-initial strong syllables in the English vocabulary. This study investigates whether Koreans employ the same strategy when segmenting speech input in English. Word-spotting experiments were conducted using vowel-initial and consonant-initial bisyllabic targets embedded in nonsense trisyllables in Experiments 1 and 2, respectively. The effect of strong syllables was significant in the reaction time (RT) analysis but not in the error analysis. In both experiments, Korean listeners detected words more slowly when the word-initial syllable was strong (stressed) than when it was weak (unstressed). However, the error analysis showed no effect of initial stress in Experiment 1 or in the item (F2) analysis in Experiment 2. Only the subject (F1) analysis in Experiment 2 showed that the participants made more errors when the word started with a strong syllable. These findings suggest that Korean listeners do not use the Metrical Segmentation Strategy for segmenting English speech. They do not treat strong syllables as word beginnings, but rather have difficulty recognizing words that start with a strong syllable. These results are discussed in terms of the intonational properties of Korean prosodic phrases, which are found to serve as lexical segmentation cues in the Korean language.

CSL Computerized Speech Lab - Model 4300B Software version 5.X

  • Ahn, Cheol-Min
    • 대한음성언어의학회: Conference Proceedings / 대한음성언어의학회 1995 4th Conference, Symposium and Workshop / pp.154-164 / 1995
  • CSL Model 4300B is a highly flexible audio processing package designed to provide a wide variety of speech analysis operations for both new and sophisticated users. Operations include 1) data acquisition, 2) file management, 3) graphics, 4) numerical display, 5) audio output, 6) signal editing, and 7) a variety of analysis functions. The external module includes 1) input control, 2) output control, and 3) jacks. The software includes 1) a wide range of speech display manipulation, 2) editing, and 3) analysis (omitted)
