• Title/Summary/Keyword: Non-speech


The Variable Acquisition of Discourse Marker Use in Korean American Speakers of English

  • Lee, Hi-Kyoung
    • English Language & Literature Teaching / v.11 no.2 / pp.1-18 / 2005
  • This study is a preliminary investigation of the nature of discourse marker acquisition in Korean American speakers of English. Discourse markers are of interest because they are not an aspect of language taught through formal instruction to either native or non-native speakers. Therefore, discourse marker use serves as indirect evidence of face-to-face interaction with native speakers and as an indicator of integration. In this light, the present study examines the presence of discourse markers in the speech of Korean Americans. The markers chosen for analysis were you know, like, and I mean. The data consist of spontaneous speech elicited from interviews. Sociolinguistic variables such as age, sex, and generation (i.e., $1^{st}$, 1.5, $2^{nd}$) were examined. Results show that there appears to be an interaction between the variables and discourse marker use. While all speakers showed variable acquisition of markers, younger, female, and 1.5 generation speakers were found to use discourse markers more than other speakers. Although discourse marker use is optional and thus not a linguistic feature that must necessarily be acquired, it is clear that use is pervasive and acquired differentially by English speakers irrespective of whether they are native or not.


Dialogue Act Classification for Non-Task-Oriented Korean Dialogues (도메인에 비종속적인 대화에서의 화행 분류)

  • Kim, Min-Jeong;Han, Kyoung-Soo;Park, Jae-Hyun;Song, Young-In;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 2006.10e / pp.246-253 / 2006
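  • Most research on dialogue agents to date has restricted the target domain and focused on agents that converse with users in order to accomplish a specific goal. This study introduces a corpus of general-domain dialogues, not restricted to any particular domain, that was manually annotated with speech act information, and presents features that are useful for automatically classifying speech acts on the basis of this corpus. It also compares the results of automatic speech act classification experiments on a domain-restricted corpus and on a domain-unrestricted corpus.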


A Study on Number sounds Speaker recognition using the Pitch detection and the Fuzzified pattern (피치 검출과 퍼지화 패턴을 이용한 숫자음 화자 인식에 관한 연구)

  • 김연숙;김희주;김경재
    • Journal of the Korea Society of Computer and Information / v.8 no.3 / pp.73-79 / 2003
  • This paper proposes a speaker recognition algorithm that combines pitch detection with fuzzified pattern matching. The pitch pattern is derived from the detected pitch, and a binary spectrum is used as the speech parameter. A reference pattern is built with a fuzzy membership function so that it accommodates the range of time variation across non-utterance intervals, and vocal tract recognition of common characters is performed by fuzzified pattern matching.


A Reduction Method of Computational Complexity through Adjustment the Non-Uniform Interval in the Vocoder (음성 부호화기에서 불균등 간격조절을 통한 계산량 단축법)

  • Jun, Woo-Jin
    • Proceedings of the KAIS Fall Conference / 2010.05a / pp.277-280 / 2010
  • The LSP (Line Spectrum Pairs) parameter is used for speech analysis in vocoders and recognizers because it offers constant spectral sensitivity, low spectral distortion, and easy linear interpolation. However, transforming LPC (Linear Predictive Coding) coefficients into LSPs is complex and takes considerable time to compute. Among conventional methods, the real root method is considerably simpler than the others, but its computation time is still nondeterministic because the root search proceeds sequentially along the frequency axis. We suggest a method of reducing the LSP transformation time using voice characteristics.


Effects of Experience on the Production of English Unstressed Vowels

  • Lee, Bo-Rim;Guion Susan G.
    • MALSORI / no.60 / pp.47-66 / 2006
  • This study examined the effect of English-language experience on Korean- and Japanese-English late learners' production of English unstressed vowels in terms of four acoustic phonetic features: F0, duration, intensity and vowel reduction. The learners manifested some improvement with experience. The native-like attainment of a phonetic feature, however, was related to the phonological status of that feature in the speakers' native language. The results suggest that the extent to which the non-native speakers' production of English unstressed vowels improved with English-language experience varied as a function of their native language background.


Recognizing Hand Digit Gestures Using Stochastic Models

  • Sin, Bong-Kee
    • Journal of Korea Multimedia Society / v.11 no.6 / pp.807-815 / 2008
  • A simple and efficient method of spotting and recognizing hand gestures in video is presented, using a network of hidden Markov models and a dynamic programming search algorithm. The description starts from the design of a set of isolated trajectory models that are stochastic and robust enough to characterize highly variable patterns such as human motion, handwriting, and speech. Those models are interconnected to form a single large network, termed a spotting network or spotter, that models a continuous stream of gestures as well as non-gestures. Inference over the model is based on dynamic programming. The proposed model is highly efficient and can readily be extended to a variety of recurrent pattern recognition tasks. Test results obtained without any engineering show the potential for practical application. At the end of the paper we add some related experimental results obtained using a different model, a dynamic Bayesian network, which is also a type of stochastic model.


ELECTROGLOTTOGRAPH IN NORMAL ADULT ; PRELIMINARY STUDY FOR ELECTROGLOTTOGRAPHIC STUDY OF SWALLOWING DISORDER (정상 성인에서의 전기성문파형 검사 ; 연하장애 환자의 전기성문파형 검사를 위한 예비연구)

  • Kim, Young-Bin;Lee, Ju-Kyung;Leem, Dae-Ho;Baek, Jin-A;Ko, Seung-O;Im, Ik-Jae;Kim, Hyun-Ki;Shin, Hyo-Keun
    • Maxillofacial Plastic and Reconstructive Surgery / v.30 no.5 / pp.437-446 / 2008
  • Electroglottography (EGG) is a simple, non-invasive technique for analyzing the vibratory patterns of the vocal folds by detecting impedance changes across the larynx. An abnormal electroglottogram is seen in patients who have dysphagia associated with a neuromuscular disorder. Electroglottography offers reliable information for the diagnosis of swallowing disorders and yields quantitative data. The purpose of this study is to provide normal electroglottographic values for normal adults. We took electroglottograms of 80 adults who had no problems with swallowing or utterance. EGG data were analyzed with commercially available software to obtain the values of pitch, jitter, and closed quotient. There were significant differences between a usual voice and a loud voice in three measures of the EGG signal: mean pitch, average jitter, and mean closed quotient. For obtaining a proper electroglottogram, phonation in a usual voice was better than in a loud voice. Four measurements (S.D. pitch, average jitter, mean closed quotient, and S.D. closed quotient) were independent of sex in adults. Three measurements (mean pitch, S.D. pitch, and mean closed quotient) were independent of age in adults in their twenties to fifties. The average jitter of subjects in their twenties appeared to be lower than that of subjects in their forties and fifties. The S.D. closed quotient of subjects in their twenties appeared to be lower than that of subjects in their thirties, forties, and fifties.

A Study on the Signal Processing for Content-Based Audio Genre Classification (내용기반 오디오 장르 분류를 위한 신호 처리 연구)

  • 윤원중;이강규;박규식
    • Journal of the Institute of Electronics Engineers of Korea SP / v.41 no.6 / pp.271-278 / 2004
  • In this paper, we propose a content-based audio genre classification algorithm that automatically classifies the query audio into one of five genres (Classic, Hiphop, Jazz, Rock, and Speech) using a digital signal processing approach. The 20-second query audio file is segmented into 23 ms frames with a non-overlapping Hamming window, and a 54-dimensional feature vector, including Spectral Centroid, Rolloff, Flux, LPC, and MFCC, is extracted from each query audio. For classification, k-NN, Gaussian, and GMM classifiers are used. To choose the optimum features from the 54-dimensional feature vector, the SFS (Sequential Forward Selection) method is applied to derive a 10-dimensional optimum feature set, which is then used by the genre classification algorithm. The experimental results verify the superior performance of the proposed method, which provides a success rate of nearly 90% for genre classification, a 10% to 20% improvement over previous methods. For an actual user system environment, the feature vector is extracted from a random interval of the query audio, and the method shows an overall success rate of 80%, except in the extreme cases of the beginning and ending portions of the query audio file.
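
    The Sequential Forward Selection step named in this abstract can be illustrated with a minimal greedy sketch. The abstract does not say which selection criterion the authors used, so the function name `sequential_forward_selection`, the `score` callback, and the toy `usefulness` values below are all hypothetical stand-ins, not the paper's implementation.

```python
def sequential_forward_selection(num_features, k, score):
    """Greedily pick k feature indices that maximize score(subset).

    score is a caller-supplied function (an assumption of this sketch) that
    rates a list of feature indices, e.g. by classifier accuracy.
    """
    selected = []
    remaining = list(range(num_features))
    while len(selected) < k and remaining:
        # Try adding each unused feature and keep the one that helps most.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy criterion: each feature has a made-up, independent usefulness value.
usefulness = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]


def toy_score(subset):
    return sum(usefulness[f] for f in subset)


chosen = sequential_forward_selection(len(usefulness), 3, toy_score)
```

    In a setting like the one described, `num_features` would be 54 and `k` would be 10, with `score` replaced by a measure of classification performance on held-out data.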

Syllable Recognition of HMM using Segment Dimension Compression (세그먼트 차원압축을 이용한 HMM의 음절인식)

  • Kim, Joo-Sung;Lee, Yang-Woo;Hur, Kang-In;Ahn, Jum-Young
    • The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.40-48 / 1996
  • In this paper, 40-dimensional segment vectors with widths of 4 frames and 7 frames in every monosyllable interval were compressed into 10-, 14-, and 20-dimensional vectors using K-L expansion and neural networks, and these were used as speech recognition feature parameters for the CHMM. We also compared them with CHMMs to which the discrete duration time, the regression coefficients, and the mixture distribution were added as feature parameters. In a recognition test on 100 monosyllables, the recognition rates of CHMM + $\Delta$MCEP, CHMM + MIX, and CHMM + DD improved by 1.4%, 2.36%, and 2.78%, respectively, over the 85.19% of the CHMM. Recognition rates using vectors compressed by K-L expansion are lower than those of MCEP + $\Delta$MCEP, but those using K-L + MCEP and K-L + $\Delta$MCEP are almost the same. Neural networks reflect the dynamic variety of speech better than K-L expansion because they use the sigmoid function for the non-linear transform. Recognition rates using vectors compressed by neural networks are higher than those using K-L expansion and the other methods.


Robust Blind Source Separation to Noisy Environment For Speech Recognition in Car (차량용 음성인식을 위한 주변잡음에 강건한 브라인드 음원분리)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association / v.6 no.12 / pp.89-95 / 2006
  • The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in a reverberant environment. The post-processing method proposed in this paper is designed to remove the residual component precisely. The proposed method uses a modified NLMS (normalized least mean square) filter in the frequency domain to estimate the cross-talk path that causes residual cross-talk components. Residual cross-talk components in one channel correspond to direct components in the other channel. Therefore, the cross-talk path can be estimated by an adaptive filter that takes the other channel's signal as input. In the conventional NLMS filter the step size is normalized by the input signal power, whereas in the modified NLMS filter it is normalized by the sum of the input signal power and the error signal power. This prevents misadjustment of the filter weights. The estimated residual cross-talk components are subtracted by non-stationary spectral subtraction. Computer simulation results using speech signals show that the proposed method improves the noise reduction ratio (NRR) by approximately 3 dB over conventional FDICA.
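
    The step-size normalization described in this abstract can be sketched as follows. This is only a time-domain illustration under simplifying assumptions: the paper works in the frequency domain, and the function name `modified_nlms`, the tap count, the step size `mu`, and the toy `true_path` are hypothetical choices made for the example.

```python
import random


def modified_nlms(x, d, num_taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR weights so the filtered input x tracks the target d.

    Conventional NLMS divides the update by the input power alone; this
    variant divides by input power plus instantaneous error power, as the
    abstract describes. All parameter values are illustrative assumptions.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input samples, newest first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y
        input_power = sum(b * b for b in buf)
        norm = input_power + e * e + eps  # eps avoids division by zero
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Toy system identification: recover a made-up 4-tap cross-talk path.
random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
d = []
hist = [0.0] * len(true_path)
for xn in x:
    hist = [xn] + hist[:-1]
    d.append(sum(h * b for h, b in zip(true_path, hist)))

w, errors = modified_nlms(x, d)
```

    Because the error power sits in the denominator, updates shrink when the error is large relative to the input, which is one way to read the abstract's claim about preventing misadjustment; as the error decays, the rule approaches conventional NLMS.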
