• Title/Summary/Keyword: continuous speech

Speech Perception Ability of Schizophrenics - A Comparative Study with Depressives & Normal Control - (정신분열병환자의 언어지각 능력 - 우울증 환자군, 정상인과의 비교 연구 -)

  • Chung, Young-Cho;Lee, Soon Jeong;Lee, Seung-Hwan
    • Korean Journal of Biological Psychiatry / v.9 no.2 / pp.112-119 / 2002
  • Objective: This study investigated differences in speech perception ability between schizophrenic patients and depressive patients in order to explore the trait-dependent speech perception ability of each disorder. Methods: Speech perception ability was assessed with the masked speech tracking test (MST) in schizophrenic patients (N=31), depressive patients (N=25), and normal controls (N=21). The continuous performance test (CPT) and sentence repetition test (SRT) were also used to assess attention and working memory. Results: The schizophrenic patients showed significantly impaired MST performance compared with depressive patients and normal controls. CPT and SRT performance was also more impaired in schizophrenic patients. The difference in MST performance between the two patient groups disappeared after differences in CPT and SRT performance were taken into account. Conclusions: These results imply that schizophrenic patients have impaired speech perception ability compared with depressive patients and normal controls. However, speech perception ability was significantly influenced by CPT and SRT performance. To evaluate pure speech perception ability, a more carefully controlled study that excludes factors such as attention, working memory, and intelligence is needed.

Aerodynamic Characteristics of Young and Elderly Adult Patients with Voice Disorders during Continuous Speech (젊은 성인 및 노인 음성장애 환자의 연속발화시 공기역학적 특성 비교)

  • Pyo, Hwa-young
    • The Journal of the Korea Contents Association / v.19 no.12 / pp.270-278 / 2019
  • This study compared the aerodynamic characteristics of young and elderly adult male patients with voice disorders during continuous speech. Aerodynamic measurements were obtained while 12 young male patients and 9 elderly male patients read a paragraph. The elderly group showed longer duration and lower airflow rate and air volume than the younger group, but the differences were not significant except for phonation time. Therefore, when interpreting the airflow and air volume measures of elderly patients with voice disorders, various conditions (e.g., reading materials, pulmonary function) should be taken into account in addition to age.

The Effect of Strong Syllables on Lexical Segmentation in English Continuous Speech by Korean Speakers (강음절이 한국어 화자의 영어 연속 음성의 어휘 분절에 미치는 영향)

  • Kim, Sunmi;Nam, Kichun
    • Phonetics and Speech Sciences / v.5 no.2 / pp.43-51 / 2013
  • English native listeners have a tendency to treat strong syllables in a speech stream as the potential initial syllables of new words, since the majority of lexical words in English have a word-initial stress. The current study investigates whether Korean (L1) - English (L2) late bilinguals perceive strong syllables in English continuous speech as word onsets, as English native listeners do. In Experiment 1, word-spotting was slower when the word-initial syllable was strong, indicating that Korean listeners do not perceive strong syllables as word onsets. Experiment 2 was conducted in order to avoid any possibilities that the results of Experiment 1 may be due to the strong-initial targets themselves used in Experiment 1 being slower to recognize than the weak-initial targets. We employed the gating paradigm in Experiment 2, and measured the Isolation Point (IP, the point at which participants correctly identify a word without subsequently changing their minds) and the Recognition Point (RP, the point at which participants correctly identify the target with 85% or greater confidence) for the targets excised from the non-words in the two conditions of Experiment 1. Both the mean IPs and the mean RPs were significantly earlier for the strong-initial targets, which means that the results of Experiment 1 reflect the difficulty of segmentation when the initial syllable of words was strong. These results are consistent with Kim & Nam (2011), indicating that strong syllables are not perceived as word onsets for Korean listeners and interfere with lexical segmentation in English running speech.

An Implementation of the Real Time Speech Recognition for the Automatic Switching System (자동 교환 시스템을 위한 실시간 음성 인식 구현)

  • 박익현;이재성;김현아;함정표;유승균;강해익;박성현
    • The Journal of the Acoustical Society of Korea / v.19 no.4 / pp.31-36 / 2000
  • This paper describes the implementation and evaluation of an automatic exchange system based on speech recognition. The system provides an exchange service using speech recognition technology to organizations with many members and departments, such as government or public offices, companies, and educational institutions. The recognizer is a speaker-independent, isolated-word, flexible-vocabulary recognizer based on the SCHMM (Semi-Continuous Hidden Markov Model). For real-time implementation, the TMS320C32 DSP from Texas Instruments is used. The system operating terminal, which supports diagnosis of the speech recognition DSP and changing the speech recognition candidates, makes operation easy. In the experiment, 8 speakers pronounced words from a 1,300-word vocabulary related to the automatic exchange system over a wired telephone network, and the recognition system achieved 91.5% word accuracy.

Development of a Stock Information Retrieval System using Speech Recognition (음성 인식을 이용한 증권 정보 검색 시스템의 개발)

  • Park, Sung-Joon;Koo, Myoung-Wan;Jhon, Chu-Shik
    • Journal of KIISE:Computing Practices and Letters / v.6 no.4 / pp.403-410 / 2000
  • This paper describes the development and features of a stock information retrieval system using speech recognition. The system is based on the DHMM (discrete hidden Markov model), and PLUs (phone-like units) are used as the basic unit for recognition. End-point detection and echo cancellation are included to facilitate speech input. A continuous speech recognizer is implemented to allow multi-word speech. Data collected over several months are analyzed.

Korean LVCSR for Broadcast News Speech

  • Lee, Gang-Seong
    • The Journal of the Acoustical Society of Korea / v.20 no.2E / pp.3-8 / 2001
  • In this paper, we examine a Korean large vocabulary continuous speech recognition (LVCSR) system for broadcast news speech. A combined vowel and implosive unit is included in the phone set together with other short phone units in order to obtain a longer-unit acoustic model. The effect of this unit is compared with conventional phone units. The dictionary units for language processing are automatically extracted from eojeols appearing in transcriptions. Triphone models are used for acoustic modeling and a trigram model is used for language modeling. Among the three major speaker groups in news broadcasts (anchors, journalists, and people other than anchors or journalists who are being interviewed), the speech of anchors and journalists, which has a lot of noise, was used for testing and recognition.
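
The trigram language model mentioned in this abstract can be illustrated with a minimal sketch. It shows only the generic maximum-likelihood estimate P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2); the function names, the `<s>`/`</s>` padding symbols, and the toy corpus are assumptions made for illustration and are not taken from the paper, which does not describe its estimation or smoothing method here.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word histories over tokenized sentences."""
    trigram_counts = Counter()
    history_counts = Counter()
    for tokens in sentences:
        # Pad so that the first real word also has a two-word history.
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            trigram_counts[history + (padded[i],)] += 1
            history_counts[history] += 1
    return trigram_counts, history_counts


def trigram_prob(word, history, trigram_counts, history_counts):
    """Maximum-likelihood P(word | history); 0.0 if the history was never seen."""
    history = tuple(history)
    total = history_counts[history]
    if total == 0:
        return 0.0
    return trigram_counts[history + (word,)] / total


# Toy corpus, invented for illustration only.
corpus = [["the", "news", "starts"], ["the", "news", "ends"]]
tri, hist = train_trigram(corpus)
p = trigram_prob("starts", ("the", "news"), tri, hist)
```

A practical recognizer would add smoothing or back-off so that unseen trigrams do not receive zero probability; that step is omitted here.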

Codebook design for subspace distribution clustering hidden Markov model (Subspace distribution clustering hidden Markov model을 위한 codebook design)

  • Cho, Young-Kyu;Yook, Dong-Suk
    • Proceedings of the KSPS conference / 2005.04a / pp.87-90 / 2005
  • Today's state-of-the-art speech recognition systems typically use continuous distribution hidden Markov models with mixtures of Gaussian distributions. To obtain higher recognition accuracy, these hidden Markov models typically require a huge number of Gaussian distributions. As a result, such systems require too much memory to run and are too slow for large applications. Many approaches have been proposed for designing compact acoustic models. One of them is the subspace distribution clustering hidden Markov model, which can represent the original full-space distributions as combinations of a small number of subspace distribution codebooks. How the codebook is built is therefore an important issue in this approach. In this paper, we report experimental results on various quantization methods for building more accurate models.

Development of FSN-based Large Vocabulary Continuous Speech Recognition System (FSN 기반의 대어휘 연속음성인식 시스템 개발)

  • Park, Jeon-Gue;Lee, Yun-Keun
    • Proceedings of the KSPS conference / 2007.05a / pp.327-329 / 2007
  • This paper presents an FSN-based LVCSR system and its application to a speech-based TV program guide. Unlike the most popular systems, which are based on statistical language models, we used an FSN grammar built with a graph-theory-based FSN optimization algorithm and knowledge-based advanced word boundary modeling. For memory and latency efficiency, we implemented dynamic pruning scheduling based on the histogram of active words and their likelihood distribution. We achieved a 10.7% word accuracy improvement with a 57.3% speedup.

A Study on a Searching, Extraction and Approximation-Synthesis of Transition Segment in Continuous Speech (연속음성에서 천이구간의 탐색, 추출, 근사합성에 관한 연구)

  • Lee, Si-U
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1299-1304 / 2000
  • In a speech coding system that uses voiced and unvoiced excitation sources, speech quality is distorted when a voiced sound and an unvoiced consonant coexist in a single frame. I therefore propose a method for searching, extracting, and approximately synthesizing the TSIUVC (Transition Segment Including UnVoiced Consonant) so that voiced sounds and unvoiced consonants do not coexist in a frame. The method is based on the zero-crossing rate and a pitch detector using the FIR-STREAK digital filter. The TSIUVC extraction rates are 84.8% (plosive), 94.9% (fricative), and 92.3% (affricate) for female voices, and 88% (plosive), 94.9% (fricative), and 92.3% (affricate) for male voices. High-quality approximation-synthesis waveforms within the TSIUVC are also obtained by using frequency information below 0.547 kHz and above 2.813 kHz. The method can be applied to low-bit-rate speech coding, speech analysis, and speech synthesis.
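
The zero-crossing rate that this abstract names as one basis of the method can be sketched as follows. This is only the generic per-frame measure (unvoiced consonants tend to produce a higher rate than voiced sounds); the FIR-STREAK pitch detector and the actual TSIUVC search procedure are not reproduced. The function names, frame length, hop size, and sample values are assumptions chosen for illustration.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def frame_signal(samples, frame_len, hop):
    """Split a sample sequence into frames of frame_len taken every hop samples."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


# Invented samples: a smooth first half and a rapidly alternating second half.
samples = [0.5, 0.4, 0.3, 0.2, 0.1, -0.1, 0.1, -0.1]
frames = frame_signal(samples, frame_len=4, hop=4)
rates = [zero_crossing_rate(f) for f in frames]
```

With these toy values the first frame has no sign changes and the second changes sign at every step, so a frame-level threshold on `rates` would separate them; any such threshold would need tuning on real speech.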

Harmonic Peak Picking-based MVF Estimation for Improvement of HMM-based Speech Synthesis System Using TBE Model (TBE 모델을 사용하는 HMM 기반 음성합성기 성능 향상을 위한 하모닉 선택에 기반한 MVF 예측 방법)

  • Park, Jihoon;Hahn, Minsoo
    • Phonetics and Speech Sciences / v.4 no.4 / pp.79-86 / 2012
  • In the two-band excitation (TBE) model, maximum voiced frequency (MVF) is the most important feature of the excitation parameter because the synthetic speech quality depends on MVF. Thus, this paper proposes an enhanced MVF estimation scheme based on the peak picking method. In the proposed scheme, the local peak and the peak lobe are picked from the spectrum of a linear predictive residual signal. The normalized distance between neighboring peak lobes is calculated and utilized as a feature to estimate MVF. Experimental results of both objective and subjective tests show that the proposed scheme improves synthetic speech quality compared with that of the conventional one.
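
The peak-picking idea in this abstract can be illustrated with a rough sketch. It picks local maxima from a magnitude spectrum and computes the spacing between neighbouring peaks, normalized here by an assumed harmonic spacing in bins. The normalization choice, the function names, the `f0_bins` parameter, and the toy spectrum are all assumptions made for illustration; the paper's peak-lobe definition and its MVF decision rule are not given in the abstract and are not modelled.

```python
def pick_local_peaks(spectrum):
    """Indices of bins that are strictly greater than both neighbours."""
    return [
        k
        for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]
    ]


def normalized_peak_distances(peaks, f0_bins):
    """Spacing between neighbouring peaks divided by the assumed harmonic spacing."""
    return [(b - a) / f0_bins for a, b in zip(peaks, peaks[1:])]


# Toy magnitude spectrum: three evenly spaced peaks, then one irregular peak.
spectrum = [0, 1, 5, 1, 0, 1, 6, 1, 0, 1, 4, 1, 0, 3, 0]
peaks = pick_local_peaks(spectrum)
distances = normalized_peak_distances(peaks, f0_bins=4)
```

Under this assumed normalization, distances close to 1.0 indicate regularly spaced (harmonic-like) peaks, while values that drift away from 1.0 indicate a less regular region of the spectrum.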