• Title/Abstract/Keyword: speech task

Consolidation of Subtasks for Target Task in Pipelined NLP Model

  • Son, Jeong-Woo;Yoon, Heegeun;Park, Seong-Bae;Cho, Keeseong;Ryu, Won
    • ETRI Journal / Vol. 36, No. 5 / pp. 704-713 / 2014
  • Most natural language processing tasks depend on the outputs of other tasks and thus involve those tasks as subtasks. The main problem of this type of pipelined model is that the optimality of subtasks trained on their own data is not guaranteed for the final target task, since the subtasks are not optimized with respect to the target task. As a solution to this problem, this paper proposes a consolidation of subtasks for a target task (CST²). In CST², all parameters of a target task and its subtasks are optimized to fulfill the objective of the target task, and CST² finds the optimized parameters through a backpropagation algorithm. In experiments in which text chunking is the target task and part-of-speech tagging is its subtask, CST² outperforms a traditional pipelined text chunker. The experimental results demonstrate the effectiveness of optimizing subtasks with respect to the target task.
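
Below is a minimal PyTorch sketch of the core idea (not the authors' exact CST² model): the part-of-speech subtask and the chunking target task are stacked, and the chunking loss is backpropagated through both modules so that the subtask's parameters are tuned with respect to the target task. All layer types, sizes, and the toy batch are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact CST^2 model): the POS-tagging subtask
# and the chunking target task are stacked, and the chunking loss is
# backpropagated through BOTH modules, so the subtask is optimized with
# respect to the target task. Layer sizes and names are illustrative.
import torch
import torch.nn as nn

EMB, HID, N_POS, N_CHUNK, VOCAB = 64, 128, 12, 23, 10_000

class PipelinedChunker(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        self.pos_encoder = nn.LSTM(EMB, HID, batch_first=True)      # subtask
        self.pos_out = nn.Linear(HID, N_POS)
        self.chunk_encoder = nn.LSTM(N_POS, HID, batch_first=True)  # target task
        self.chunk_out = nn.Linear(HID, N_CHUNK)

    def forward(self, tokens):
        h, _ = self.pos_encoder(self.embed(tokens))
        pos_scores = self.pos_out(h)                  # soft POS predictions
        g, _ = self.chunk_encoder(pos_scores.softmax(-1))
        return pos_scores, self.chunk_out(g)

model = PipelinedChunker()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, VOCAB, (8, 20))             # toy batch
chunk_gold = torch.randint(0, N_CHUNK, (8, 20))
_, chunk_scores = model(tokens)
loss = loss_fn(chunk_scores.reshape(-1, N_CHUNK), chunk_gold.reshape(-1))
loss.backward()   # gradients also flow into the POS subtask parameters
opt.step()
```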

Effects of F1/F2 Manipulation on the Perception of Korean Vowels /o/ and /u/ (F1/F2의 변화가 한국어 /오/, /우/ 모음의 지각판별에 미치는 영향)

  • Yun, Jihyeon;Seong, Cheoljae
    • Phonetics and Speech Sciences / Vol. 5, No. 3 / pp. 39-46 / 2013
  • This study examined the perception of two Korean vowels using F1/F2-manipulated synthetic vowels. Previous studies indicated that the acoustic spaces of Korean /o/ and /u/ overlap in terms of the first two formants. A continuum of eleven synthetic vowels was used as stimuli. The experiment consisted of three tasks: an /o/ identification task (yes-no), an /u/ identification task (yes-no), and a forced-choice identification task (/o/-/u/). ROC (receiver operating characteristic) analysis and logistic regression were performed to calculate the boundary criterion between the two vowels along the stimulus continuum and to predict the perceptual judgment from F1 and F2. The results indicated that the region between stimulus no. 5 (F1 = 342 Hz, F2 = 691 Hz) and no. 6 (F1 = 336 Hz, F2 = 700 Hz) was the perceptual boundary between /o/ and /u/, with stimulus no. 0 (F1 = 405 Hz, F2 = 666 Hz) and no. 10 (F1 = 321 Hz, F2 = 743 Hz) at opposite ends of the continuum. F2 had a predominant influence over F1 on the perception of the vowel categories.
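
As an illustration of the analysis described above, the sketch below fits a logistic regression predicting the /o/-vs-/u/ judgment from F1 and F2; the intermediate formant values and the listener responses are hypothetical, not the study's data.

```python
# Hypothetical illustration of the logistic-regression step: predicting the
# /o/-vs-/u/ judgment from the first two formants of each synthetic stimulus.
import numpy as np
from sklearn.linear_model import LogisticRegression

# F1/F2 along the 11-step continuum; only the endpoints and stimuli 5-6 are
# quoted in the abstract, the rest are interpolated here for illustration.
F1 = np.linspace(405, 321, 11)
F2 = np.linspace(666, 743, 11)
X = np.column_stack([F1, F2])

# Hypothetical forced-choice responses: 0 = /o/, 1 = /u/ (boundary near stim 5-6).
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients (F1, F2):", clf.coef_[0])            # relative formant weights
print("P(/u/) along continuum:", clf.predict_proba(X)[:, 1].round(2))
```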

The Effects of Group Therapy Using a Cooperative Learning in Aphasics (협력학습을 통한 실어증자의 그룹치료 효과)

  • Lee, Ok-Bun;Jeong, Ok-Ran;Ko, Do-Heung
    • Speech Sciences / Vol. 11, No. 2 / pp. 27-38 / 2004
  • This study attempted to determine the effects of cooperative, cognitive group therapy compared with individual therapy in 24 aphasic subjects. Two dependent variables were measured: overall language performance and functional communication skills. Eighteen subjects with different types and severities of aphasia participated in the group therapy, while six aphasic subjects received individual therapy and served as a control group. The subjects ranged in age from 27 to 59 years. The group therapy using cooperative learning followed these procedures. First, six aphasics formed one group in which each subject performed a task and the members monitored one another. Second, pairs of aphasics cooperated to perform a task. Third, three groups of two aphasics each competed against one another in a task in which the two aphasics had to cooperate. Finally, the investigator gave feedback to the group, and she and the subjects discussed the overall procedures of the therapy. The two measures mentioned above were administered pre- and post-treatment, and a two-way repeated-measures ANOVA was performed for the analysis. The results showed that the group therapy was more effective than the individual therapy in improving overall language performance and in increasing functional communication skills.

Differential Effect for Neural Activation Processes according to the Proficiency Level of Code Switching: An ERP Study (이중언어환경에서의 언어간 부호전환 수준에 따른 차별적 신경활성화 과정: ERP연구)

  • Kim, Choong-Myung
    • Phonetics and Speech Sciences / Vol. 2, No. 4 / pp. 3-10 / 2010
  • The present study aims to investigate neural activation according to the level of code switching in English-proficient bilinguals and to examine the relationship between language-switching performance and proficiency level using ERPs (event-related potentials). First, when comparing high-proficiency (HP) with low-proficiency (LP) bilinguals in the native-language environment, the N2 activation level was higher in the HP group than in the LP group, but only under two conditions: 1) the language-switching (between-language) condition, which indexes the attentional demands of code switching, and 2) the inhibition of the current language for L1. An N400 effect appeared in both groups only in the non-switching (within-language) condition, suggesting that both groups completed the semantic acceptability task well in their native-language environment, without the burden of language switching, irrespective of high or low performance. The N400 latencies were only about 100 ms earlier in the HP group than in the LP group, a difference that can be interpreted as facilitation of the given task. These results suggest that, in the L1-to-L2 switching condition, the HP group showed differential activation of the inhibitory system for L1, in contrast to the inactive inhibitory system of the LP group. Despite the absence of an N400 effect on the given task in both groups, the differential peak latencies were attributed to differences in the efficiency of semantic processing.
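
For readers unfamiliar with ERP latency measures, the sketch below shows a generic way to extract an N400-like peak latency from averaged epochs (not the study's actual analysis pipeline); the sampling rate, search window, and synthetic epochs are assumptions.

```python
# Generic sketch of ERP peak-latency measurement: epochs are averaged and the
# most negative deflection is located in a conventional 300-500 ms window.
# Sampling rate, window, and the synthetic epochs are illustrative assumptions.
import numpy as np

fs = 250                                   # sampling rate in Hz (assumed)
t = np.arange(-0.2, 0.8, 1 / fs)           # epoch time axis: -200 to 800 ms
rng = np.random.default_rng(0)

# Synthetic single-trial epochs (n_trials x n_samples) standing in for EEG data.
epochs = rng.normal(0.0, 2.0, size=(40, t.size))
epochs -= 5.0 * np.exp(-((t - 0.40) / 0.05) ** 2)   # injected N400-like dip

erp = epochs.mean(axis=0)                  # grand-average ERP

win = (t >= 0.30) & (t <= 0.50)            # N400 search window
peak_idx = np.argmin(erp[win])             # most negative point in the window
peak_latency_ms = t[win][peak_idx] * 1000
peak_amplitude = erp[win][peak_idx]
print(f"N400 peak: {peak_amplitude:.2f} uV at {peak_latency_ms:.0f} ms")
```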

On-Line Blind Channel Normalization for Noise-Robust Speech Recognition

  • Jung, Ho-Young
    • IEIE Transactions on Smart Processing and Computing / Vol. 1, No. 3 / pp. 143-151 / 2012
  • A new data-driven method is proposed for designing a blind modulation frequency filter that suppresses slowly varying noise components. The proposed method is based on the temporal local decorrelation of the feature vector sequence and operates on an utterance-by-utterance basis. Whereas conventional modulation frequency filtering uses the same filter form regardless of task and environment conditions, the proposed method provides an adaptive modulation frequency filter for each utterance that outperforms conventional methods. In addition, the method ultimately performs channel normalization in the feature domain when applied to log-spectral parameters. The performance was evaluated in speaker-independent isolated-word recognition experiments under additive noise. The proposed method achieved outstanding improvements in speech recognition in environments with significant noise and was also effective across a range of feature representations.
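
For context, the sketch below implements the conventional per-utterance baselines that such a method improves on, not the authors' data-driven decorrelation filter: log-spectral mean removal plus a fixed modulation-frequency high-pass filter; the frame rate and cutoff are illustrative assumptions.

```python
# Conventional baselines only (not the authors' data-driven filter):
# per-utterance log-spectral mean normalization plus a fixed high-pass
# modulation-frequency filter applied along the time axis of the feature
# trajectories. Cutoff and filter order are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def normalize_utterance(log_spec, frame_rate_hz=100.0, cutoff_hz=1.0):
    """log_spec: (n_frames, n_bands) log-spectral features of ONE utterance."""
    # 1) Blind channel normalization: remove the per-band mean (a constant
    #    channel appears as an additive offset in the log-spectral domain).
    x = log_spec - log_spec.mean(axis=0, keepdims=True)

    # 2) Fixed modulation-frequency high-pass filter to suppress slowly
    #    varying (low modulation frequency) noise components.
    b, a = butter(2, cutoff_hz / (frame_rate_hz / 2.0), btype="highpass")
    return filtfilt(b, a, x, axis=0)

# Toy usage: 300 frames x 23 log-mel bands of random "features".
utt = np.random.randn(300, 23) + 3.0                      # constant offset = channel
print(normalize_utterance(utt).mean(axis=0).round(3))     # means ~ 0 after filtering
```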

A Train Ticket Reservation Aid System Using Automated Call Routing Technology Based on Speech Recognition (음성인식을 이용한 자동 호 분류 철도 예약 시스템)

  • Shim Yu-Jin;Kim Jae-In;Koo Myung-Wan
    • MALSORI / No. 52 / pp. 161-169 / 2004
  • This paper describes an automated call routing system for train ticket reservation assistance based on speech recognition. We focus on the task of automatically routing telephone calls based on the user's fluently spoken responses instead of the touch-tone menus of an interactive voice response system. A vector-based call routing algorithm is investigated, and a mapping table for key terms is suggested. The Korail database collected by KT is used for the call routing experiments. We evaluate call-classification experiments on transcribed text from the Korail database. With a small amount of training data, an average call routing error reduction rate of 14% is observed when the mapping table is used.
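
A generic sketch of vector-based call routing (not necessarily the paper's exact algorithm or its key-term mapping table) is given below: each destination is represented by a TF-IDF prototype vector and a caller utterance is routed by cosine similarity; the example transcriptions and route names are hypothetical.

```python
# Generic vector-based call routing: one TF-IDF prototype vector per route,
# built from its training transcriptions; a new utterance goes to the route
# with the highest cosine similarity. Transcriptions and routes are hypothetical.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = [
    "I want to book a ticket to Busan tomorrow morning",
    "please reserve two seats on the evening train",
    "I need to cancel my reservation for Friday",
    "cancel the ticket I booked yesterday",
    "how much is a one way fare to Daejeon",
    "what does a first class ticket cost",
]
train_routes = ["reserve", "reserve", "cancel", "cancel", "fare", "fare"]

vec = TfidfVectorizer()
X = vec.fit_transform(train_texts).toarray()

# One prototype vector per route: the L2-normalized mean of its documents.
routes = sorted(set(train_routes))
protos = np.vstack([X[[i for i, r in enumerate(train_routes) if r == rt]].mean(0)
                    for rt in routes])
protos /= np.linalg.norm(protos, axis=1, keepdims=True)

def route(utterance):
    q = vec.transform([utterance]).toarray()[0]
    q /= np.linalg.norm(q) or 1.0
    return routes[int(np.argmax(protos @ q))]

print(route("I would like to reserve a seat to Busan"))   # -> "reserve"
```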

The Role of the Cricopharyngeus Muscle in Pitch Control - Electromyographic and radiographic studies

  • Hong, Ki-Hwan;Kim, Hyun-Ki;Yang, Yoon-Soo
    • Speech Sciences / Vol. 11, No. 1 / pp. 73-83 / 2004
  • Electromyographic studies of the cricopharyngeus muscle using hooked-wire electrodes were performed in thyroidectomized patients. The shape of the cricoid cartilage and the soft-tissue thickness in the postcricoid area were evaluated during pitch elevation and pitch lowering using conventional lateral neck films. The cricopharyngeus muscle was activated at the onset of the speech task and remained continuously active. Its activity lessened in the interrogative stress contrast of sentence terminals and increased in the pitch-lowered contrast of sentence terminals. On the radiologic findings, the cricoid cartilage was tilted backward during high-pitched phonation and tilted forward during low-pitched phonation. The soft tissue of the postcricoid area was thicker at low pitch than at high pitch. At low pitch, the cricoid cartilage lay parallel to the vertebral column. These results suggest that the bulging of the contracting cricopharyngeus muscle thickens the postcricoid area and exerts anterior pressure on the cricoid cartilage. This contraction of the cricopharyngeus muscle may result in shortening of the vocal folds and lowering of pitch.

Malay Syllables Speech Recognition Using Hybrid Neural Network

  • Ahmad, Abdul Manan;Eng, Goh Kia
    • Institute of Control, Robotics and Systems (ICROS): Conference Proceedings / ICCAS 2005 / pp. 287-289 / 2005
  • This paper presents a hybrid neural network system that uses a Self-Organizing Map (SOM) and a Multilayer Perceptron (MLP) for Malay syllable speech recognition. The novel idea in this system is the use of a two-dimensional self-organizing feature map as a sequential mapping function that transforms the phonetic similarities, or acoustic vector sequences, of the speech frames into trajectories in a square matrix whose elements take on binary values. This property simplifies the classification task. An MLP is then used to classify the trajectory corresponding to each syllable in the vocabulary. The system was evaluated on the recognition of 15 common Malay syllables, and the overall recognition performance was 91.8%.
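
The sketch below illustrates the general SOM-trajectory-plus-MLP idea, not the authors' exact configuration; the frame features, map size, and syllable labels are toy assumptions, and the third-party minisom package is used for the self-organizing map.

```python
# General SOM-trajectory-plus-MLP idea (not the authors' exact configuration):
# map each utterance's frames onto a 2-D SOM, record the visited cells as a
# binary trajectory matrix, and classify the flattened matrix with an MLP.
import numpy as np
from minisom import MiniSom
from sklearn.neural_network import MLPClassifier

GRID, N_MFCC = 8, 12
rng = np.random.default_rng(1)

# Toy "utterances": lists of MFCC-like frame vectors, one per syllable token.
utterances = [rng.normal(i % 3, 1.0, size=(30, N_MFCC)) for i in range(60)]
labels = [i % 3 for i in range(60)]          # 3 hypothetical syllable classes

# 1) Train a 2-D SOM on all frames.
frames = np.vstack(utterances)
som = MiniSom(GRID, GRID, N_MFCC, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(frames, 5000)

# 2) Map each utterance to a binary GRID x GRID trajectory matrix:
#    a cell is 1 if any frame's best-matching unit falls on it.
def trajectory(utt):
    m = np.zeros((GRID, GRID))
    for frame in utt:
        i, j = som.winner(frame)
        m[i, j] = 1.0
    return m.ravel()

X = np.array([trajectory(u) for u in utterances])

# 3) Classify the flattened trajectories with an MLP.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=1)
clf.fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```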

Performance Improvement of Speech Recognition Based on SPLICE in Noisy Environments (SPLICE 방법에 기반한 잡음 환경에서의 음성 인식 성능 향상)

  • Kim, Jong-Hyeon;Song, Hwa-Jeon;Lee, Jong-Seok;Kim, Hyung-Soon
    • MALSORI / No. 53 / pp. 103-118 / 2005
  • The performance of a speech recognition system is degraded by mismatch between the training and test environments. Recently, Stereo-based Piecewise LInear Compensation for Environments (SPLICE) was introduced to overcome environmental mismatch using stereo data. In this paper, we propose several methods to improve conventional SPLICE and evaluate them on the Aurora2 task. We generalize SPLICE to compensate for the covariance matrix as well as the mean vector in the feature space, yielding an error rate reduction of 48.93%. We also employ a weighted sum of correction vectors using the posterior probabilities of all Gaussians, achieving an error rate reduction of 48.62%. With the combination of these two methods, the error rate is reduced by 49.61% relative to the Aurora2 baseline system.
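
The sketch below shows the basic SPLICE mean-correction step with posterior weighting over all Gaussians (the paper's covariance compensation is not reproduced); the stereo training data and GMM size are toy assumptions.

```python
# Basic SPLICE mean correction with posterior weighting over all Gaussians:
# correction vectors are learned from stereo (noisy, clean) feature pairs, and
# a noisy vector is enhanced by adding their posterior-weighted sum.
# The toy stereo data and GMM size are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
DIM, K = 13, 4

# Toy stereo training data: clean features x and noisy features y = x + bias + noise.
x_clean = rng.normal(size=(2000, DIM))
y_noisy = x_clean + 1.5 + 0.3 * rng.normal(size=(2000, DIM))

# 1) Fit a GMM on the noisy features.
gmm = GaussianMixture(n_components=K, covariance_type="diag", random_state=0)
gmm.fit(y_noisy)

# 2) Correction vector per Gaussian: posterior-weighted mean of (clean - noisy).
post = gmm.predict_proba(y_noisy)                         # (N, K)
diff = x_clean - y_noisy                                  # (N, DIM)
r = (post.T @ diff) / post.sum(axis=0, keepdims=True).T   # (K, DIM)

# 3) Enhancement of a noisy test vector: x_hat = y + sum_k p(k|y) r_k.
def splice_enhance(y):
    p = gmm.predict_proba(y.reshape(1, -1))[0]            # posteriors p(k|y)
    return y + p @ r

y_test = rng.normal(size=DIM) + 1.5
print(splice_enhance(y_test).round(2))                    # bias largely removed
```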

The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea / Vol. 28, No. 2E / pp. 66-71 / 2009
  • In practical applications of speech recognition, one of the fundamental questions concerns the number of training data that should be provided for a specific task. Although plenty of training data would undoubtedly enhance system performance, it comes at a heavy cost. Therefore, it is of crucial importance to determine the smallest number of training data that will afford a certain level of accuracy. For this purpose, we investigate the effect of the number of training data on speaker-independent speech recognition of isolated words using FVQ/HMM. The results showed that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
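
The reported trend can be summarized by a one-parameter model, error rate ≈ c·V/N for vocabulary size V and N training tokens per word; the sketch below fits c by least squares from hypothetical measurements purely for illustration.

```python
# One-parameter model of the reported trend: error rate roughly inversely
# proportional to the number of training data N and linear in vocabulary size V.
# The (N, V, error) measurements below are hypothetical, for illustration only.
import numpy as np

def predicted_error_rate(n_train, vocab_size, c):
    # c is a task-dependent constant to be estimated from measurements.
    return c * vocab_size / n_train

N = np.array([5, 10, 20, 40])          # training tokens per word (hypothetical)
V = np.array([100, 100, 100, 100])     # vocabulary size (hypothetical)
err = np.array([8.1, 4.2, 2.0, 1.1])   # observed error rates in % (hypothetical)

# Closed-form least-squares estimate of c for the model err ~ c * V / N.
x = V / N
c_hat = np.sum(err * x) / np.sum(x ** 2)
print(f"fitted c = {c_hat:.3f}")
print("predicted error rates:", predicted_error_rate(N, V, c_hat).round(2))
```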