• Title/Summary/Keyword: speech database


Speaker-Dependent Emotion Recognition For Audio Document Indexing

  • Hung LE Xuan;QUENOT Georges;CASTELLI Eric
    • Proceedings of the IEEK Conference / summer / pp.92-96 / 2004
  • Emotion is currently of great interest in speech processing as well as in the human-machine interaction domain. In recent years, a growing number of studies on emotion synthesis and emotion recognition have been developed for different purposes, each using its own methods and parameters measured on the speech signal. In this paper, we propose using a short-time parameter, Mel-Frequency Cepstrum Coefficients (MFCC), together with a simple but efficient classification method, Vector Quantization (VQ), for speaker-dependent emotion recognition. Many other features (energy, pitch, zero-crossing rate, phonetic rate, LPC, etc.) and their derivatives are also tested and combined with the MFCC coefficients in order to find the best combination. Other models, GMM and HMM (discrete and continuous Hidden Markov Models), are studied as well, in the hope that continuous distributions and the temporal behaviour of this feature set will improve the quality of emotion recognition. The maximum accuracy in recognizing five different emotions exceeds 88 % using only MFCC coefficients with the VQ model. This is a simple but efficient approach, and the result is even much better than that obtained on the same database in human evaluation, where listeners judged sentences without being allowed to re-listen or compare between sentences [8]; the result also compares favorably with other approaches.

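As a sketch of the VQ approach described in the abstract above: one codebook is trained per emotion from MFCC frames, and an utterance is assigned to the emotion whose codebook reproduces its frames with the lowest average distortion. The toy k-means trainer and all names here are illustrative, not the paper's implementation.

```python
import numpy as np

def train_codebook(frames, k=4, iters=20, seed=0):
    """Toy k-means codebook: frames is an (N, D) array of MFCC vectors."""
    rng = np.random.default_rng(seed)
    code = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword, then re-estimate
        d = ((frames[:, None, :] - code[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(1)
        for j in range(k):
            if (lab == j).any():
                code[j] = frames[lab == j].mean(0)
    return code

def distortion(frames, code):
    """Average squared distance from each frame to its nearest codeword."""
    d = ((frames[:, None, :] - code[None, :, :]) ** 2).sum(-1)
    return d.min(1).mean()

def classify(frames, codebooks):
    """Pick the emotion whose codebook fits the utterance frames best."""
    return min(codebooks, key=lambda e: distortion(frames, codebooks[e]))
```

A speaker-dependent setup would train one `codebooks` dictionary per speaker from that speaker's labeled utterances.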

Recognition of Noise Quantity by Linear Predictive Coefficient of Speech Signal (음성신호의 선형예측계수에 의한 잡음량의 인식)

  • Choi, Jae-Seung
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.120-126 / 2009
  • To reduce the noise in a conversation under noisy conditions, a signal processing system must adapt to the amount of noise present in order to maintain its performance. This paper therefore presents a method for recognizing the noise quantity from the linear predictive coefficients of the speech signal, using a three-layer neural network trained on three kinds of speech degraded by various background noises. The performance of the proposed method was evaluated based on the recognition rates for the various noises. In the experiments, the average recognition rate was 98.4 % or higher for such noises using the Aurora2 database.
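The linear predictive coefficients that feed the paper's three-layer network can be computed with the standard autocorrelation method and the Levinson-Durbin recursion; the network itself is omitted here, and the function names are my own.

```python
import numpy as np

def lpc(x, order):
    """LPC via autocorrelation + Levinson-Durbin.
    Returns the prediction-error filter a (a[0] == 1) and the residual energy."""
    n = len(x)
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient from the current prediction residual
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        new = a.copy()
        new[i] = k
        new[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a = new
        err *= (1.0 - k * k)
    return a, err
```

For an AR(2) signal x[n] = 0.5·x[n-1] - 0.3·x[n-2] + e[n], `lpc(x, 2)` recovers approximately [1, -0.5, 0.3].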

Development of Digital Endoscopic Data Management System (디지탈 내시경 데이터 management system의 개발)

  • Song, C.G.;Lee, S.M.;Lee, Y.M.;Kim, W.K.
    • Proceedings of the KOSOMBE Conference / v.1996 no.11 / pp.304-306 / 1996
  • Endoscopy has become a crucial diagnostic and therapeutic procedure in clinical areas. Over the past three years, we have developed a computerized system to record and store clinical data pertaining to endoscopic surgery for laparoscopic cholecystectomy, pelviscopic endometriosis, and surgical arthroscopy. In this study, we developed a computer system composed of a frame grabber, sound board, VCR control board, LAN card, and EDMS (endoscopic data management software). The computer system also controls peripheral instruments such as a color video printer and video cassette recorder, along with the endoscopic input/output signals (image and doctor's speech). We also developed a single-body camera control unit including an endoscopic miniature camera and light source. Our system offers unsurpassed image quality in terms of resolution and color fidelity. The digital endoscopic data management system is based on an open architecture and a set of widely available industry standards, namely Windows 3.1 as the operating system, TCP/IP as the network protocol, and a time-sequence-based database that handles both images and the doctor's speech synchronized with the endoscopic images.


Development of robotic hands of signbot, advanced Malaysian sign-language performing robot

  • Al-Khulaidi, Rami Ali;Akmeliawati, Rini;Azlan, Norsinnira Zainul;Bakr, Nuril Hana Abu;Fauzi, Norfatehah M.
    • Advances in Robotics Research / v.2 no.3 / pp.183-199 / 2018
  • This paper presents the development of the 3D-printed humanoid robotic hands of SignBot, which can perform Malaysian Sign Language (MSL). The study is considered the first attempt to ease communication between the general community and hearing-impaired individuals in Malaysia. The signed motions performed by the developed robot are two-handed. Unlike previously conducted work, the designed system includes a speech recognition system that can feasibly be integrated with the robot's control platform. Furthermore, the design takes into account the grammar of MSL, which differs from that of spoken Malay; this reduces redundancy and makes the design more efficient and effective. The robot hands are built with detailed finger joints. Micro servo motors, controlled by an Arduino Mega, actuate the relevant joints for selected alphabetical and numerical signs as well as phrases for emergency contexts from MSL. A database for the selected signs stores the sequential movements of the servo motor arrays. The results showed that the system performed well, as the selected signs could be understood by hearing-impaired individuals.
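The sign database described above, which stores the sequential servo movements for each sign, might look like the following sketch. The sign names, channel count, and angle values are invented for illustration and are not taken from the paper.

```python
# Hypothetical sign database: each sign maps to a sequence of servo-angle
# frames (one angle per finger-joint servo channel), played back in order.
SIGN_DB = {
    "A": [[0, 90, 90, 90, 90]],                        # static hand shape
    "HELP": [[0, 0, 0, 0, 0], [90, 90, 90, 90, 90]],   # two-frame motion
}

def playback(sign, send=print):
    """Send each stored servo frame of a sign to the controller, channel
    by channel, in the stored order (stand-in for the Arduino commands)."""
    for frame in SIGN_DB[sign]:
        for channel, angle in enumerate(frame):
            send(f"servo {channel} -> {angle}")
```

On the real robot, `send` would write a position command to the corresponding micro servo instead of printing.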

A Study on Context-Dependent Acoustic Models to Improve the Performance of Korean Speech Recognition (한국어 음성인식 성능향상을 위한 문맥의존 음향모델에 관한 연구)

  • 황철준;오세진;김범국;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.4 / pp.9-15 / 2001
  • In this paper, we investigate context-dependent acoustic models to improve the performance of Korean speech recognition, using Korean phonological rules and decision trees. With the Successive State Splitting (SSS) algorithm, the Hidden Markov Network (HM-Net), an efficient representation of phoneme-context-dependent HMMs, can be generated automatically. SSS is a powerful technique for designing the topologies of tied-state HMMs, but it does not adequately handle contexts unseen in the training data and has some problems in determining the contextual domain. In this paper, we adopt a new state-clustering variant of SSS, called Phonetic Decision Tree-based SSS (PDT-SSS), in which context splits are based on the Korean phonological rules. This method combines the advantages of decision-tree clustering and SSS, and can generate a highly accurate HM-Net that can express any context. To verify the effectiveness of the adopted method, experiments were carried out using the KLE 452-word database and the YNU 200-sentence database. Through Korean phoneme, word, and sentence recognition experiments, we show that the new state-clustering algorithm produces better phoneme, word, and continuous speech recognition accuracy than conventional HMMs.

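A toy illustration of the decision-tree split at the heart of PDT-SSS: candidate phonetic questions partition the context-dependent samples pooled in a state, and the question with the largest gain is chosen. For brevity this sketch scores splits by the reduction in within-cluster scatter rather than the exact HMM likelihood gain the paper uses; all names are illustrative.

```python
import numpy as np

def sse(vecs):
    """Sum of squared deviations from the mean (single-Gaussian fit cost)."""
    v = np.asarray(vecs, float)
    return float(((v - v.mean(0)) ** 2).sum()) if len(v) else 0.0

def best_split(samples, questions):
    """samples: list of (context, feature_vector); questions: name -> set of
    contexts answering 'yes'. Returns the question giving the largest drop
    in within-cluster scatter, a stand-in for the likelihood gain."""
    parent = sse([v for _, v in samples])
    best, gain = None, 0.0
    for name, ctx in questions.items():
        yes = [v for c, v in samples if c in ctx]
        no = [v for c, v in samples if c not in ctx]
        if not yes or not no:
            continue  # a split must leave data on both sides
        g = parent - sse(yes) - sse(no)
        if g > gain:
            best, gain = name, g
    return best, gain
```

Applied recursively until no question yields a sufficient gain, this yields the tied-state tree; in PDT-SSS the question set would encode the Korean phonological rules.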

Korean Word Recognition Using Diphone-Level Hidden Markov Models (Diphone 단위의 hidden Markov model을 이용한 한국어 단어 인식)

  • Park, Hyun-Sang;Un, Chong-Kwan;Park, Yong-Kyu;Kwon, Oh-Wook
    • The Journal of the Acoustical Society of Korea / v.13 no.1 / pp.14-23 / 1994
  • In this paper, speech units appropriate for recognition of Korean have been studied. For better speech recognition, co-articulatory effects within an utterance should be considered when selecting a recognition unit; one way to model such effects is to use larger units of speech. The diphone has been found to be a good recognition unit because it models transitional regions explicitly. When diphones are used, stationary phoneme models may be inserted between them. Computer simulations of isolated word recognition were performed with a 7-word database spoken by seven male speakers. The best performance was obtained when transition regions between phonemes were modeled by two-state HMMs and stationary phoneme regions by one-state HMMs, excluding /b/, /d/, and /g/. By merging rarely occurring diphone units, the recognition rate increased from 93.98 % to 96.29 %. In addition, a local interpolation technique was used to smooth poorly modeled HMMs with well-trained ones. With this technique, we obtained a recognition rate of 97.22 % after merging some diphone units.

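The unit inventory described above, with one-state stationary phoneme models inserted between transition (diphone) models except for the stops /b/, /d/, /g/, can be sketched as follows; the unit naming convention is illustrative only.

```python
def to_diphones(phonemes, stationary_excluded=("b", "d", "g")):
    """Expand a phoneme sequence into diphone units covering each
    transition, inserting a stationary one-state unit for interior
    phonemes other than the excluded stops (/b/, /d/, /g/ in the paper)."""
    units = []
    for i in range(len(phonemes) - 1):
        units.append(phonemes[i] + "-" + phonemes[i + 1])  # transition model
        nxt = phonemes[i + 1]
        if i + 1 < len(phonemes) - 1 and nxt not in stationary_excluded:
            units.append(nxt)  # stationary model between two diphones
    return units
```

For example, ["k", "a", "m"] expands to ["k-a", "a", "a-m"], while an interior /d/ yields only the flanking diphones.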

Dialog System based on Speech Recognition for the Elderly with Dementia (음성인식에 기초한 치매환자 노인을 위한 대화시스템)

  • Kim, Sung-Il;Kim, Byoung-Chul
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.6 / pp.923-930 / 2002
  • This study aims at developing a dialog system to improve the quality of life of elderly people with dementia. The proposed system consists of three main modules: speech recognition, automatic search of a time-sorted dialog database, and agreeable responses played back in the recorded voices of caregivers. As a first step, the utterances that dementia patients often produce at a nursing home were investigated. The system was then organized to recognize these utterances in order to meet the patients' requests or demands, responding with the recorded voices of professional caregivers. For evaluation, a comparison study was carried out with and without the system. Occupational therapists evaluated a male subject's reaction to the system by photographing his behaviors. The evaluation results showed that the dialog system was more responsive in catering to the needs of the dementia patient than professional caregivers were. Moreover, the proposed system led the patient to talk more than the caregivers did in mutual communication.

Communication Aid System For Dementia Patients (치매환자를 위한 대화 보조 시스템)

  • Sung-Ill Kim;Byoung-Chul Kim
    • Journal of Biomedical Engineering Research / v.23 no.6 / pp.459-465 / 2002
  • The goal of the present research is to improve the quality of life of both elderly patients with dementia and their caregivers. For this purpose, we developed a communication aid system consisting of three modules: a speech recognition engine, a graphical agent, and a database organized by nursing schedule. The system was evaluated in the actual environment of a nursing facility by introducing it to an older male patient with dementia. A comparison study was then carried out with and without the system. Occupational therapists evaluated the subject's reaction to the system by photographing his behaviors. The evaluation results revealed that the proposed system was more responsive in catering to the subject's needs than professional caregivers were. Moreover, the frequency of the subject's utterances increased when the system was introduced.

Cepstral Feature Normalization Methods Using Pole Filtering and Scale Normalization for Robust Speech Recognition (강인한 음성인식을 위한 극점 필터링 및 스케일 정규화를 이용한 켑스트럼 특징 정규화 방식)

  • Choi, Bo Kyeong;Ban, Sung Min;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea / v.34 no.4 / pp.316-320 / 2015
  • In this paper, the pole filtering concept is applied to the Mel-frequency cepstral coefficient (MFCC) feature vectors within the conventional cepstral mean normalization (CMN) and cepstral mean and variance normalization (CMVN) frameworks. Additionally, the performance of cepstral mean and scale normalization (CMSN), which uses scale normalization instead of variance normalization, is evaluated in speech recognition experiments in noisy environments. Because CMN and CMVN are usually performed on a per-utterance basis, reliable estimation of the mean and variance is not guaranteed for short utterances. By applying the pole filtering and scale normalization techniques to the feature normalization process, however, this problem can be alleviated. Experimental results on the Aurora 2 database (DB) show that the feature normalization method combining pole filtering and scale normalization yields the best improvements.
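The three per-utterance normalizations compared above can be sketched as follows. The scale estimator in `cmsn` is one plausible reading of "scale normalization" (mean absolute deviation), and the pole-filtering step applied to the MFCC vectors before normalization is omitted; neither choice is confirmed by the paper.

```python
import numpy as np

def cmn(c):
    """Cepstral mean normalization: c is a (frames, coeffs) MFCC matrix."""
    return c - c.mean(0)

def cmvn(c):
    """Cepstral mean and variance normalization (zero mean, unit variance
    per coefficient, estimated over the single utterance)."""
    return (c - c.mean(0)) / (c.std(0) + 1e-8)

def cmsn(c):
    """Cepstral mean and *scale* normalization: divide by the mean absolute
    deviation instead of the standard deviation (an assumed estimator)."""
    m = c - c.mean(0)
    return m / (np.abs(m).mean(0) + 1e-8)
```

Because all three statistics are estimated from one utterance, their reliability drops for short utterances, which is the weakness the paper's pole filtering and scale normalization address.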

A study on Gaussian mixture model deep neural network hybrid-based feature compensation for robust speech recognition in noisy environments (잡음 환경에 효과적인 음성 인식을 위한 Gaussian mixture model deep neural network 하이브리드 기반의 특징 보상)

  • Yoon, Ki-mu;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.37 no.6 / pp.506-511 / 2018
  • This paper proposes a GMM (Gaussian Mixture Model)-DNN (Deep Neural Network) hybrid-based feature compensation method for effective speech recognition in noisy environments. In the proposed algorithm, the posterior probability used in the conventional GMM-based feature compensation method is calculated by a DNN. Experimental results using the Aurora 2.0 framework and database demonstrate that the proposed GMM-DNN hybrid-based feature compensation method is more effective in both known and unknown noisy environments than the GMM-based method. In particular, the experiments in unknown environments show a 9.13 % relative improvement in average WER (Word Error Rate) and considerable improvements at lower SNRs (Signal-to-Noise Ratios) such as 0 and 5 dB.
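The hybrid idea, replacing the GMM posterior with a DNN output inside an MMSE-style feature compensation, can be sketched as follows. The bias-subtraction form and all names are simplifications for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over mixture scores."""
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def compensate(y, bias, logits):
    """MMSE-style feature compensation: subtract a mixture-weighted noise
    bias from the noisy feature y. 'logits' (frames x mixtures) stands in
    for the DNN output that replaces the GMM posterior in the hybrid
    method; bias[k] is the noise bias associated with mixture k."""
    post = softmax(logits)        # p(k | y), one distribution per frame
    return y - post @ bias        # (T, D) - (T, K) @ (K, D)
```

In the GMM baseline, `post` would instead come from Gaussian component likelihoods; the hybrid swaps in the discriminatively trained DNN posterior while keeping the same compensation rule.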