• Title/Summary/Keyword: Speech speed


Design of pitch parameter search architecture for a speech coder using dual MACs (Dual MAC을 이용한 음성 부호화기용 피치 매개변수 검색 구조 설계)

  • 박주현;심재술;김영민
    • Journal of the Korean Institute of Telematics and Electronics A, v.33A no.5, pp.172-179, 1996
  • In this paper, QCELP (Qualcomm code excited linear prediction), the vocoder algorithm of CDMA (code division multiple access), was analyzed, and a pitch parameter search architecture for a 16-bit programmable DSP (digital signal processor) for QCELP was designed. Because the parameter search is sped up by a high-speed DSP using two MACs, the speech codec specification for digital cellular systems can be satisfied. A FIFO (first-in first-out) memory was also implemented with a register file to speed up data access. The DSP was designed with COMPASS, an ASIC design tool, using a top-down design methodology, making it possible to cope with rapid changes in the mobile communication market.

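The entry above centers on speeding up the QCELP pitch search with two multiply-accumulate units. As a rough illustration of what that search computes, here is a minimal NumPy sketch of an open-loop pitch lag search; the lag range and normalized-correlation scoring follow common QCELP descriptions rather than the paper itself, and the two dot products per candidate lag stand in for the two parallel MAC chains.

```python
import numpy as np

def pitch_lag_search(residual, min_lag=17, max_lag=143):
    """Exhaustive open-loop pitch lag search (illustrative).

    Scores each candidate lag by normalized autocorrelation between the
    current subframe and the lag-shifted past signal. On a dual-MAC DSP
    the two running sums below would be accumulated in parallel.
    """
    frame = residual[max_lag:]  # current subframe
    best_lag, best_score = min_lag, -np.inf
    for lag in range(min_lag, max_lag + 1):
        past = residual[max_lag - lag : max_lag - lag + len(frame)]
        corr = np.dot(frame, past)    # MAC chain 1: cross-correlation
        energy = np.dot(past, past)   # MAC chain 2: past-signal energy
        score = corr * corr / energy if energy > 0 else -np.inf
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```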

Influences of Unilateral Mandibular Block Anesthesia on Motor Speech Abilities (편측 하악전달마취가 운동구어능력에 미치는 영향)

  • Yang, Seung-Jae;Seo, In-Hyo;Kim, Mee-Eun;Kim, Ki-Suk
    • Journal of Oral Medicine and Pain, v.31 no.1, pp.59-67, 2006
  • In clinical settings there are patients who complain of speech problems due to dysesthesia or anesthesia following dental surgical procedures accompanied by local anesthesia. However, it is not clear whether sensory problems in the orofacial region influence motor speech abilities. The purpose of this study was to investigate whether transitory sensory impairment of the mandibular nerve by local anesthesia influences motor speech abilities, and thus to evaluate the possibility of distorted motor speech abilities due to dysesthesia of the mandibular nerve. The subjects were 7 men and 3 women whose right inferior alveolar nerve, lingual nerve, and long buccal nerve were anesthetized with 1.8 mL of lidocaine containing 1:100,000 epinephrine. All subjects were instructed to self-estimate the degree of anesthesia in the affected region and their speech discomfort on a VAS before anesthesia and at 30 seconds and 30, 60, 90, 120, and 150 minutes after anesthesia. To evaluate speech problems objectively, words and sentences for testing speech speed, diadochokinetic rate, intonation, tremor, and articulation were read, recorded at each time point, and evaluated with a Computerized Speech Lab®. Articulation was evaluated by a speech-language clinician. The results indicated that subjective speech discomfort and depth of anesthesia increased with time until 60 minutes after anesthesia and then decreased, and that the degree of subjective speech discomfort was correlated with the self-estimated depth of anesthesia. On the other hand, there was no significant difference in the objective assessment items, including speech speed, diadochokinetic rate, intonation, and tremor, and no anesthesia-related change in articulation. Based on these results, sensory impairment of the unilateral mandibular nerve does not appear to deteriorate motor speech abilities, despite individuals' complaints of speech discomfort.

Emotion Recognition Based on Frequency Analysis of Speech Signal

  • Sim, Kwee-Bo;Park, Chang-Hyun;Lee, Dong-Wook;Joo, Young-Hoon
    • International Journal of Fuzzy Logic and Intelligent Systems, v.2 no.2, pp.122-126, 2002
  • In this study, we find features of three emotions (happiness, anger, surprise) as fundamental research for emotion recognition. A speech signal with emotion has several elements: voice quality, pitch, formants, speech speed, etc. Until now, most researchers have used changes in pitch, the short-time average power envelope, or Mel-based speech power coefficients. Pitch is a very efficient and informative feature, so we used it in this study. Because pitch is very sensitive to subtle emotion, it changes easily whenever a person is in a different emotional state; we can therefore observe whether the pitch changes steeply, changes with a gentle slope, or does not change. This paper also extracts formant features from emotional speech. For each vowel, the formants occupy similar positions without large differences. Based on this fact, in the happiness case we extract features of laughter and use them to separate laughing, which simplifies the task; we likewise find features for anger and surprise.
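
Since the abstract's pitch feature is whether the contour changes steeply, gently, or not at all, a small sketch may make it concrete: fit a line to the voiced portion of an F0 track and bucket the slope. The hop size and Hz-per-second thresholds are invented for illustration; the paper does not state them.

```python
import numpy as np

def pitch_slope_feature(f0_track, hop_s=0.01, steep_hz_per_s=100.0):
    """Classify an F0 track (Hz per frame, 0 = unvoiced) by slope."""
    f0_track = np.asarray(f0_track, dtype=float)
    voiced = f0_track[f0_track > 0]
    if voiced.size < 2:
        return "flat"
    t = np.arange(voiced.size) * hop_s
    slope = np.polyfit(t, voiced, 1)[0]   # linear trend, Hz/s
    if abs(slope) > steep_hz_per_s:
        return "steep"
    return "gentle" if abs(slope) > steep_hz_per_s / 4 else "flat"
```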

A Study on Voice Web Browsing in a JavaBeans Component Architecture Automatic Speech Recognition Application System (JAVABeans Component 구조를 갖는 음성인식 시스템에서의 Voice Web Browsing에 관한 연구)

  • 장준식;윤재석
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2003.05a, pp.273-276, 2003
  • In this study, an Automatic Speech Recognition Application System is designed and implemented to support the transition from today's GUI-centered web services to VUI-centered web services. Because ASP is restricted on the web in speed and implementation, an Automatic Speech Recognition Application System with a JavaBeans component architecture is devised and studied. A voice web browser that can deliver voice and graphic information simultaneously is also studied, using Remote AWT (Abstract Window Toolkit).


Design of a variable rate speech codec for the W-CDMA system (W-CDMA 시스템을 위한 가변율 음성코덱 설계)

  • 정우성
    • Proceedings of the Acoustical Society of Korea Conference, 1998.08a, pp.142-147, 1998
  • Recently, the 8 kb/s CS-ACELP coder of G.729 was standardized by ITU-T SG15, and it has been reported that the speech quality of G.729 is better than or equal to that of 32 kb/s ADPCM. However, G.729 is a fixed-rate speech coder and does not consider the voice activity pattern of mutual conversation. If we exploit voice activity, we can cut the average bit rate in half without any degradation of speech quality. In this paper, we propose an efficient variable rate algorithm for G.729. The algorithm consists of two main parts: the rate determination algorithm and the sub-rate coders. For rate determination, we combine an energy-thresholding method, a phonetic segmentation method based on integrating various feature parameters obtained through the analysis procedure, and a variable hangover period method. Through the analysis of noise features, a 1 kb/s sub-rate coder is designed for coding the background noise signal, and a 4 kb/s sub-rate coder is designed for the unvoiced parts. The performance of the variable rate algorithm is evaluated by comparing speech quality and average bit rate with G.729; subjective quality is also assessed by a MOS test. In conclusion, it is verified that the proposed variable rate CS-ACELP coder produces the same speech quality as G.729 at an average bit rate of 4.4 kb/s.

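The rate decision the abstract assembles (energy thresholding plus a variable hangover period) can be sketched as below. The three rates match those named in the abstract, but the thresholds, the noise-floor handling, and the hangover length are placeholder assumptions, not the paper's values.

```python
import numpy as np

RATE_FULL, RATE_UNVOICED, RATE_NOISE = 8.0, 4.0, 1.0  # kb/s, per the abstract

def choose_rate(frame, noise_floor, hangover, hangover_frames=6):
    """Pick a coding rate for one frame; returns (rate, new hangover).

    frame: NumPy array of samples; noise_floor: running noise energy
    estimate; hangover: frames of full-rate coding still owed after the
    last active frame, so word endings are not clipped.
    """
    energy = float(np.mean(frame ** 2))
    if energy > 16.0 * noise_floor:       # clearly active speech
        return RATE_FULL, hangover_frames
    if hangover > 0:                      # trailing edge of speech
        return RATE_FULL, hangover - 1
    if energy > 4.0 * noise_floor:        # weak/unvoiced speech
        return RATE_UNVOICED, 0
    return RATE_NOISE, 0                  # background noise
```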

Speech Problems of English Laterals by Korean Learners Based on the Acoustic Characteristics (한국인 영어 학습자의 설측음 발화의 문제점: 음향음성학적 특성을 중심으로)

  • Kim, Chong-Gu;Kim, Hyun-Gi;Jeon, Byung-Man
    • Speech Sciences, v.7 no.3, pp.127-138, 2000
  • The aim of this paper is to find the speech problems of English laterals produced by Korean learners and to contribute to effective pronunciation education by visualizing pronunciation. We analyzed 18 words containing lateral sounds in five positions: initial, initial consonant cluster, intervocalic, final consonant cluster, and final. To analyze the words we used a high-speed speech analysis system, examining the acoustic characteristics of the English laterals on spectrograms through voice sustained time (ms) and the lateral formants FL1, FL2, and FL3. Before we started, we expected the results to show mother-tongue interference in the final sounds, because Korean has similar sounds. The results showed that, in initial position, voice sustained time differed considerably between Korean and native pronunciations. It was also seen that Korean learners applied the syllable structure of their mother tongue: for instance, in an initial consonant cluster of the shape CCVC, Koreans often treated CC as one syllable and VC as another, due to mother-tongue interference. For the same reason we saw differences between Korean and native speakers in the intervocalic and final positions. Therefore, a visualized analysis system should be adopted in pronunciation instruction.

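For readers who want to approximate the study's FL1-FL3 measurements without the authors' analysis system, formant-like estimates can be pulled from a voiced frame by standard LPC root-solving. This is a generic stand-in, not the tool used in the paper; the model order, pre-emphasis coefficient, and 90 Hz floor are conventional defaults.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def formants_lpc(frame, fs, order=12):
    """Estimate the first three formant frequencies (Hz) of one frame."""
    frame = np.asarray(frame, dtype=float)
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
    x *= np.hamming(x.size)
    r = np.correlate(x, x, "full")[x.size - 1 : x.size + order]
    a = solve_toeplitz(r[:-1], r[1:])        # LPC coefficients
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]        # one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    return freqs[freqs > 90.0][:3]           # rough FL1, FL2, FL3
```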

A Design of a Multi-channel Speech Pickup Embedded System for Hands-free Communication (핸즈프리 통신을 위한 다중채널 음성픽업 임베디드 시스템 설계)

  • Ju, Hyng-Jun;Park, Chan-Sub;Jeon, Jae-Kuk;Kim, Ki-Man
    • Journal of the Korea Institute of Information and Communication Engineering, v.11 no.2, pp.366-373, 2007
  • In this paper we propose a multi-channel speech pickup system, built on the Altera Nios II processor, for enhancing call quality in hands-free communication. The system uses a delay-and-sum beamformer with a zero-padding interpolator, implemented on the Nios II processor with real-time I/O data processing. The proposed embedded speech pickup system agrees well with the results of computer simulation (MATLAB) and of a conventional DSP processor (TMS320C6711), and the method is more effective than previous ones in cost and design time. The hardware uses 3,649 of 5,980 logic elements (LEs) on the chip (61%).
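
A minimal NumPy sketch of the processing named in this abstract: a delay-and-sum beamformer whose fractional steering delays are realized by zero-padding interpolation in the frequency domain. The upsampling factor, circular shifts, and function signature are illustrative assumptions, not the paper's fixed-point Nios II implementation.

```python
import numpy as np

def delay_and_sum(channels, delays_s, fs, upsample=4):
    """channels: (n_mics, n_samples); delays_s: steering delay per mic (s)."""
    n_mics, n = channels.shape
    m_len = n * upsample
    up = np.zeros((n_mics, m_len))
    for m in range(n_mics):
        spec = np.fft.rfft(channels[m])
        padded = np.zeros(m_len // 2 + 1, dtype=complex)
        padded[: spec.size] = spec                 # zero-pad the spectrum
        up[m] = np.fft.irfft(padded, m_len) * upsample
    out = np.zeros(m_len)
    for m in range(n_mics):
        shift = int(round(delays_s[m] * fs * upsample))
        out += np.roll(up[m], shift)               # apply delay at high rate
    return out[::upsample] / n_mics                # decimate and average
```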

Hyperparameter experiments on end-to-end automatic speech recognition

  • Yang, Hyungwon;Nam, Hosung
    • Phonetics and Speech Sciences, v.13 no.1, pp.45-51, 2021
  • End-to-end (E2E) automatic speech recognition (ASR) has achieved promising performance gains with the introduction of the self-attention network, the Transformer. However, because of the training time and the number of hyperparameters, finding the optimal hyperparameter set is computationally expensive. This paper investigates the impact of hyperparameters in the Transformer network to answer two questions: which hyperparameters play a critical role in task performance, and which in training speed. The network used for training combines encoder and decoder networks with Connectionist Temporal Classification (CTC). We trained the model on Wall Street Journal (WSJ) SI-284 and tested on dev93 and eval92. Seventeen hyperparameters were selected from the ESPnet training configuration, and varying ranges of values were used in the experiments. The results show that the "num blocks" and "linear units" hyperparameters in the encoder and decoder networks reduce the Word Error Rate (WER) significantly, with the performance gain more prominent when they are altered in the encoder network. Training duration also increased linearly as the values of "num blocks" and "linear units" grew. Based on the experimental results, we collected the optimal values of each hyperparameter and reduced the WER to 2.9/1.9 on dev93 and eval92, respectively.
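
As a concrete picture of this kind of experiment, here is a toy sweep over the two hyperparameters the paper singles out. The key names echo ESPnet's "num blocks" and "linear units" configuration entries, but the value grids and the train_and_eval callback are placeholders, not the paper's setup.

```python
from itertools import product

# Hypothetical value grids; the paper varied seventeen hyperparameters.
GRID = {
    "encoder_num_blocks": [6, 9, 12],
    "encoder_linear_units": [1024, 2048],
}

def sweep(train_and_eval):
    """train_and_eval(config) -> WER; returns the best (config, WER)."""
    results = {}
    for blocks, units in product(GRID["encoder_num_blocks"],
                                 GRID["encoder_linear_units"]):
        cfg = {"num_blocks": blocks, "linear_units": units}
        results[(blocks, units)] = train_and_eval(cfg)
    best = min(results, key=results.get)   # lowest WER wins
    return best, results[best]
```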

Control Rules of Synthetical Pauses and Syllable Duration depending on Pronunciation Speed in Korean Speech (발음속도에 따른 한국어의 휴지기 규칙 및 평균음절길이 조절규칙)

  • Kim, Jae-In;Kim, Jin-Young;Lee, Tae-Won
    • The Journal of the Acoustical Society of Korea, v.14 no.1, pp.56-64, 1995
  • In this paper we extracted control rules for synthetic pauses and syllable durations depending on pronunciation speed in Korean, from spoken sentences recorded by 18 professional announcers. The pause rules were divided into three categories: pauses between sentences (PBS), pauses between clauses (PBC), and pauses between intonational phrases (PBI). The analysis shows that, comparing slowly spoken sentences with fast spoken ones, the duration difference between them is due to increases in the synthetic pauses, especially PBS and PBC, while the increase in mean syllable duration is small. PBI, on the other hand, was not produced in the fast spoken sentences; it appeared only in sentences spoken more slowly than a certain pronunciation speed (PS).

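The abstract's finding translates naturally into a synthesis-side rule: when slowing speech down, stretch the between-sentence and between-clause pauses much more than the syllables, and insert intonational-phrase pauses only below some speed. A sketch follows; every scale factor and threshold here is invented for illustration, not taken from the paper's measurements.

```python
def scale_durations(syllable_ms, pbs_ms, pbc_ms, slow_factor):
    """slow_factor > 1.0 means slower speech; returns adjusted durations."""
    return {
        # syllables stretch only slightly...
        "syllable_ms": syllable_ms * (1.0 + 0.15 * (slow_factor - 1.0)),
        # ...while PBS and PBC pauses absorb most of the duration change
        "pbs_ms": pbs_ms * slow_factor,
        "pbc_ms": pbc_ms * slow_factor,
        # PBI pauses appear only in sufficiently slow speech
        "insert_pbi": slow_factor >= 1.3,
    }
```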

Korean Students' Repetition of English Sentences Under Noise and Speed Conditions (소음과 속도를 변화시킨 영어 문장 따라하기에 대한 연구)

  • Kim, Eun-Jee;Yang, Byung-Gon
    • Speech Sciences, v.11 no.2, pp.105-117, 2004
  • Recently, many scholars have emphasized the importance of English listening ability for smoother communication. Most audio materials, however, are recorded in a quiet sound-proof booth, so students who have spent much time listening to these ideal materials can be expected to have difficulty communicating with native speakers in real life. In this study, we examined how well thirty-three Korean university students and five native speakers repeated recorded English sentences under noise and speed conditions. The subjects' production was scored by listening to each recorded sentence, counting the words correctly produced, and computing the percentage of correctly produced words out of the total words in each sentence. Results showed that the student group correctly repeated around 65% of the words in each sentence, while the native speakers showed an almost perfect match. The students seemed to have difficulty perceiving and repeating function words in the various conditions, and the high-proficiency student group outperformed the low-proficiency group, particularly in its repetition of function words. In addition, the student subjects' repetition accuracy dropped remarkably when the normal sentences were both sped up and mixed with noise. Finally, the Korean students' percent-correct ratio fell as the stimulus sentences became longer.

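The scoring described in this abstract (the percentage of stimulus words correctly reproduced) is easy to state in code. The sketch below matches words by simple string comparison, whereas the study scored by ear, so treat it only as an approximation of the procedure.

```python
def percent_correct(stimulus: str, repetition: str) -> float:
    """Percentage of stimulus words present in the repetition."""
    target = stimulus.lower().split()
    produced = repetition.lower().split()
    hits = 0
    for word in target:
        if word in produced:
            produced.remove(word)   # each produced word counts once
            hits += 1
    return 100.0 * hits / len(target)

# Example: 4 of 6 words repeated -> 66.7
print(percent_correct("the cat sat on the mat", "cat sat on mat"))
```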