• Title/Summary/Keyword: speaker characteristics (화자 특징)

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

  • Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.7 no.1 / pp.3-10 / 2015
  • In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to a speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
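
The frequency scales compared in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the mel formula is one common textbook variant, the bilinear warp is the standard first-order all-pass phase function, and the function names and the `alpha` parameter are hypothetical. With `alpha = 0` the warp reduces to the linear scale used for LFCCs.

```python
import math

def hz_to_mel(f_hz):
    # One common mel-scale formula (an assumption, not taken from the paper).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def bilinear_warp(omega, alpha):
    # Phase response of a first-order all-pass filter: maps a normalized
    # angular frequency omega in [0, pi] onto a warped axis in [0, pi].
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def warped_center_frequencies(num_filters, sample_rate, alpha):
    # Space filter centers uniformly on the warped axis, then map them back
    # to Hz. The inverse of a warp with alpha is a warp with -alpha.
    nyquist = sample_rate / 2.0
    centers = []
    for k in range(1, num_filters + 1):
        warped = math.pi * k / (num_filters + 1)
        omega = bilinear_warp(warped, -alpha)
        centers.append(omega / math.pi * nyquist)
    return centers
```

A positive `alpha` pushes the filter centers toward low frequencies, which is the sense in which a bilinear warp can approximate a mel-like scale; the warping factor that works best is an experimental question, as the abstract notes.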

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

  • Chung, Kyungyong;Oh, SangYeob
    • Journal of Digital Convergence / v.16 no.11 / pp.335-340 / 2018
  • Commercial speech recognition systems with accurate recognition rates use a learning model built from speaker-dependent isolated data. However, their recognition performance decreases depending on the quantity of data in noisy environments. In this paper, we propose a vector-quantization-based method for improving speech recognition performance using maximum log likelihood in a Gaussian distribution. The proposed method configures a learning model that increases recognition accuracy for similar speech by combining vector quantization and maximum log likelihood with a speech feature extraction method. Speech features are extracted based on the hidden Markov model. The method can improve the accuracy of inaccurate speech models produced by the existing system, so the proposed system may constitute a robust model for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
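
The abstract does not describe how vector quantization and maximum log likelihood are combined. As a rough illustration only, the sketch below assumes that each codeword is a diagonal-covariance Gaussian (a mean and a variance vector) and that a feature vector is assigned to the codeword with the highest log likelihood. The function names and the codebook layout are hypothetical.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    # Log density of x under a diagonal-covariance Gaussian.
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll

def quantize_by_max_log_likelihood(x, codebook):
    # codebook: list of (mean, var) pairs (assumed layout).
    # Returns the index of the codeword with the highest log likelihood.
    scores = [gaussian_log_likelihood(x, m, v) for m, v in codebook]
    return max(range(len(codebook)), key=scores.__getitem__)
```

With equal variances for every codeword this reduces to ordinary nearest-neighbor quantization; the variances matter only when codewords differ in spread.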

Speech/Music Discrimination Using Spectrum Analysis and Neural Network (스펙트럼 분석과 신경망을 이용한 음성/음악 분류)

  • Keum, Ji-Soo;Lim, Sung-Kil;Lee, Hyon-Soo
    • The Journal of the Acoustical Society of Korea / v.26 no.5 / pp.207-213 / 2007
  • In this research, we propose an efficient speech/music discrimination method that uses spectrum analysis and a neural network. The proposed method extracts a duration feature parameter (MSDF) from a spectral peak track obtained by analyzing the spectrum, and uses it together with the MFSC as a feature for the speech/music discriminator. A neural network was used as the speech/music discriminator, and we performed various experiments to evaluate the proposed method with respect to training pattern selection, training pattern size, and neural network architecture. The speech/music discrimination results show improved performance and stability, depending on training pattern selection and model composition, in comparison with the previous method. When the MSDF and MFSC are used as feature parameters with more than 50 seconds of training patterns, the discrimination rate is 94.97% for speech and 92.38% for music. Finally, we achieved a performance improvement of 1.25% for speech and 1.69% for music compared with the use of the MFSC.

A Study on the Speech Recognition for DDD Area-Name Using Vector Quantization with Time Information (시간 정보와 VQ를 이용한 DDD 지역명 인식에 관한 연구)

  • LEE S. K.;LEE K. S.;ANN T. O.;CHO H. J.;BYON Y. C.;KIM S. H.
    • The Journal of the Acoustical Society of Korea / v.8 no.5 / pp.102-112 / 1989
  • In this paper, we study speaker-independent isolated word recognition of DDD area names using vector quantization, choosing a total of 146 DDD area names as the recognition vocabulary for a dialing system application. We built the codebook from 12th-order LPC cepstrum coefficients, used the minsum and minimax methods to find the centroid, and applied a 3 splitting rule to codebook generation. Single-section and multi-section codebooks with time information were generated, and an overlapped-section codebook was also used. The experimental results show that the minsum method was better than the minimax method, and the system achieved an accuracy of about 90 percent in the speaker-independent case.
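
The difference between the two centroid rules named in this abstract can be sketched as follows. The abstract does not define them, so this assumes a common reading: the centroid is chosen among the cluster members, with minsum minimizing the total distance to all members and minimax minimizing the largest distance to any member. The Euclidean distance and all function names are illustrative assumptions.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance; the paper's distortion measure may differ.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minsum_centroid(cluster):
    # Member whose summed distance to all cluster members is smallest.
    return min(cluster, key=lambda c: sum(euclidean(c, v) for v in cluster))

def minimax_centroid(cluster):
    # Member whose worst-case distance to any cluster member is smallest.
    return min(cluster, key=lambda c: max(euclidean(c, v) for v in cluster))
```

For the one-dimensional cluster `[[0.0], [1.0], [2.0], [3.0], [10.0]]`, minsum selects `[2.0]` while minimax selects `[3.0]`, because the outlier at 10 pulls the minimax choice toward it.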

A Discourse-Pragmatic Study of Preposing and Inversion in English (전치문과 도치문의 담화화용론적인 비교)

  • 박원경
    • Korean Journal of English Language and Linguistics / v.3 no.1 / pp.37-54 / 2003
  • The purpose of this paper is to investigate the phenomena of preposing and inversion in English from a discourse-pragmatic perspective. We claim that different types of preposing can be unified into a single process of topicalization. We also show that the diverse discourse functions of inversion can be subsumed under the ‘linking’ function with the prior discourse. This is followed by a comparative discussion of preposing and inversion to find what similarities the two constructions share and what differences exist between them. It is concluded that the choice of a syntactic form ultimately depends on the speaker's evaluation of the information status of the knowledge store of the hearer.

A Study on the Playback Theatre, the Improvisational Theatre based on Storytelling (Storytelling을 기반으로 한 즉흥연극, 플레이백 씨어터(Playback Theatre) 연구)

  • Jung, Sung Hee
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.532-540 / 2017
  • This study analyzes and compares the meaning and value of storytelling with the characteristics of the Playback Theatre, which is an improvisational play. On the one hand, the Playback Theatre, invented by Jonathan Fox in 1975, is an improvisational and communication-based theatre format; it usually draws on the storytelling of the audience and often involves no play script. On the other hand, storytelling requires imagination and creativity, and presupposes sympathy between the speaker and the listener. In fact, this theatre format utilizes the educational and healing effects of verbal cultures, group sacrificial ceremonies, and/or psychodrama, and contextualizes them in our modern society. The Playback Theatre provides audience members with opportunities to share their own stories within the play itself. The actors first listen to the stories of audience members and then turn them into an impromptu play, right in front of the speaker(s) and the rest of the audience. For this very reason, it is called a "play-back". In this process, the individuals and the community are invited to experience the educational and healing effects.

A Study on the Korean Syllable As Recognition Unit (인식 단위로서의 한국어 음절에 대한 연구)

  • Kim, Yu-Jin;Kim, Hoi-Rin;Chung, Jae-Ho
    • The Journal of the Acoustical Society of Korea / v.16 no.3 / pp.64-72 / 1997
  • In this paper, we study and experiment to find a recognition unit suitable for large-vocabulary recognition systems. Specifically, we select the phoneme, which is currently used as a recognition unit, and the syllable, which characterizes Korean well. By comparing recognition experiments, we examine whether the syllable can serve as the recognition unit of a Korean recognition system. To report an objective comparison, we collected speech data from a male speaker and hand-segmented and labeled the phoneme boundaries to construct a speech database. For HMM-based training and recognition, we used HTK (HMM Tool Kit) 2.0, a commercial tool from Entropic Co., so that the experiments ran under the same conditions. We applied two HMM topologies, 5 states with 3 emitting states and 8 states with 6 emitting states, in continuous HMM training of each recognition unit. We used 3 sets of PBW (Phonetically Balanced Words) and 1 set of POW (Phonetically Optimized Words) for training and another set of PBW for recognition, that is, "Speaker Dependent Medium Vocabulary Size Recognition." The experimental results show a recognition rate of 95.65% for the phoneme unit and 94.41% for the syllable unit, and decoding with the syllable unit is 25% faster than with the phoneme unit.

Development and Evaluation of an Address Input System Employing Speech Recognition (음성인식 기능을 가진 주소입력 시스템의 개발과 평가)

  • 김득수;황철준;정현열
    • The Journal of the Acoustical Society of Korea / v.18 no.2 / pp.3-10 / 1999
  • This paper describes the development and evaluation of a Korean address input system that employs automatic speech recognition as the user interface for entering Korean addresses. An address consists of cities, provinces, and counties. The system runs in a Windows 95 environment on a personal computer with a built-in sound card. In the speech recognition part, the continuous density hidden Markov model (CHMM) is used to build phoneme-like units (PLUs), and the One Pass Dynamic Programming (OPDP) algorithm is used for recognition. For address recognition, a finite state automaton (FSA) suited to the Korean address structure is constructed. To achieve acceptable performance against variation in speakers, microphones, and environmental noise, maximum a posteriori (MAP) estimation is implemented for adaptation. To improve recognition speed, a fast search method using a variable pruning threshold is newly proposed. In evaluation tests on 100 connected words uttered by 3 male speakers, the system showed an average recognition accuracy above 96.0% for connected words after adaptation and a recognition time within 2 seconds, demonstrating the effectiveness of the system.
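
The MAP adaptation mentioned in this abstract is not specified further. A minimal sketch of the textbook MAP update for a single Gaussian mean is given below, assuming every adaptation frame is hard-assigned to that Gaussian. The prior weight `tau` and the function name are hypothetical, and the paper's actual formulation may differ.

```python
def map_adapt_mean(prior_mean, frames, tau):
    # Textbook MAP mean update with hard assignment (an assumption):
    #   new_mean = (tau * prior_mean + sum(frames)) / (tau + N)
    # tau controls how strongly the prior (speaker-independent) mean is kept.
    n = len(frames)
    adapted = []
    for d, prior_d in enumerate(prior_mean):
        total = sum(frame[d] for frame in frames)
        adapted.append((tau * prior_d + total) / (tau + n))
    return adapted
```

With no adaptation frames the prior mean is returned unchanged, and as the number of frames grows the estimate moves toward the sample mean of the new speaker's data.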

Improvement of MLLR Algorithm for Rapid Speaker Adaptation and Reduction of Computation (빠른 화자 적응과 연산량 감소를 위한 MLLR알고리즘 개선)

  • Kim, Ji-Un;Chung, Jae-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.1C / pp.65-71 / 2004
  • We improved the MLLR speaker adaptation algorithm by reducing the order of the HMM parameters using PCA (principal component analysis) or ICA (independent component analysis). To find a smaller set of variables with less redundancy, we adopt PCA and ICA, which give as good a representation as possible, minimize the correlations between data elements, and remove the axes with less covariance or higher-order statistical independence. The ordinary MLLR algorithm needs more than 30 seconds of adaptation data for SD (speaker-dependent) models to reach a higher word recognition rate than SI (speaker-independent) models, whereas the proposed algorithm needs just over 10 seconds of adaptation data. With ICA and PCA, 10 components give performance similar to 36 components in the ordinary MLLR framework. Compared with the ordinary MLLR algorithm, the total computation required for speaker adaptation is therefore reduced by about 1/167 in the proposed MLLR algorithm.
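
For context, standard MLLR adapts each Gaussian mean with a shared affine transform, new_mean = A * mean + b. The sketch below shows that transform and counts its free parameters, to indicate why working with fewer components shrinks the estimation problem. It does not reproduce the paper's PCA/ICA step or its reported computation figure, and the function names are hypothetical.

```python
def mllr_adapt_mean(A, b, mean):
    # Standard MLLR mean transform: new_mean = A * mean + b,
    # with A a square matrix (list of rows) and b a bias vector.
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]

def mllr_transform_size(dim):
    # Free parameters in one full transform: dim*dim for A plus dim for b.
    return dim * (dim + 1)
```

Under these assumptions a full transform has 1332 parameters for 36 components and 110 for 10 components; the actual saving in the paper also depends on how the transform is estimated.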

De re context and some semantic traits of 'rago' (대물(de re) 문맥과 '-라고'의 몇 가지 의미론적 특성)

  • Min, Chanhong
    • Korean Journal of Logic / v.16 no.1 / pp.61-85 / 2013
  • After introducing the concept of de re belief and discussing de re/de dicto ambiguity in belief contexts and modal contexts, the author concludes that modal sentences in Korean do not show any distinctive traits compared with English. After discussing this ambiguity in negative sentences à la Russell, he tries to show that Korean provides two ways of constructing negation, one of which corresponds to de re negation (primary occurrence in Russell's terms). A de re reading creates a referentially transparent context and thus permits substitution of identicals salva veritate; a de dicto reading does not. The Korean ending 'rago', used with quotation verbs, speech act verbs, and cognitive attitude verbs, deserves attention in that it permits de re sentences in addition to de re/de dicto ambiguous sentences. 'Rago' also makes the speaker's commitment to the content of the intensionally contained clause 'neutral', in contrast with other Korean endings such as 'um/im' and 'raneun gut', which make the speaker's commitment positive. This explains why the maxim of western epistemology that knowledge presupposes truth does not hold in Korean 'rago' sentences.
