• Title/Summary/Keyword: Speech signals

EEG based Vowel Feature Extraction for Speech Recognition System using International Phonetic Alphabet (EEG기반 언어 인식 시스템을 위한 국제음성기호를 이용한 모음 특징 추출 연구)

  • Lee, Tae-Ju;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.1 / pp.90-95 / 2014
  • Research on the brain-computer interface, a new interface system connecting humans to machines, has been conducted to implement user-assistance devices such as wheelchair controllers and character-input systems. Recent studies have attempted to build speech recognition systems based on brain waves and to achieve silent communication. In this paper, we studied how to extract vowel features based on the International Phonetic Alphabet (IPA) as a foundation step toward a speech recognition system based on electroencephalogram (EEG) signals. We conducted a two-step experiment with three healthy male subjects: the first step used speech imagery of a single vowel, and the second used imagery of two successive vowels. From the 64 acquired channels we selected 32, covering the frontal lobe (related to thinking) and the temporal lobe (related to speech). Eigenvalues of the signal were used as feature vectors, and a support vector machine (SVM) was used for classification (a minimal sketch of this pipeline follows this entry). The first step showed that a feature vector of order 10 or higher is needed to analyze the EEG signal of speech; with an 11th-order feature vector, the highest average classification rate was 95.63% (between /a/ and /o/) and the lowest was 86.85% (between /a/ and /u/). In the second step, we studied how the speech-imagery signals differ between a single vowel and two successive vowels.
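
A minimal sketch of the pipeline the abstract describes: eigenvalues of the multi-channel signal as features and an SVM as classifier. The epoch shape, RBF kernel, and cross-validation setup are assumptions; the abstract specifies only the 32 selected channels and the feature-vector order.

```python
# Sketch: eigenvalue features from 32-channel EEG epochs + SVM classification.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def eigen_features(epoch, order=11):
    """epoch: (n_channels, n_samples) EEG for one vowel-imagery trial.
    Returns the `order` largest eigenvalues of the channel covariance."""
    cov = np.cov(epoch)                    # (n_channels, n_channels)
    vals = np.linalg.eigvalsh(cov)         # ascending; real for symmetric cov
    return vals[::-1][:order]              # keep the top `order` eigenvalues

def vowel_pair_accuracy(epochs, labels, order=11):
    """epochs: list of (32, n_samples) arrays; labels: e.g. 0 = /a/, 1 = /o/."""
    X = np.array([eigen_features(e, order) for e in epochs])
    clf = SVC(kernel="rbf")                # kernel choice is an assumption
    return cross_val_score(clf, X, labels, cv=5).mean()
```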

Speech Visualization of Korean Vowels Based on the Distances Among Acoustic Features (음성특징의 거리 개념에 기반한 한국어 모음 음성의 시각화)

  • Pok, Gouchol
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.5 / pp.512-520 / 2019
  • Visual representation of speech is quite useful for foreign-language learners as well as for the hearing-impaired who cannot hear speech directly, and a number of studies have been presented in the literature. They remain, however, at the level of representing speech characteristics with colors or showing the changing shape of the lips and mouth with animation. As a result, such methods cannot tell users how far their pronunciation is from the standard one, and they make it technically difficult to build a system in which users correct their pronunciation interactively. To address these drawbacks, this paper proposes a speech visualization model based on the relative distance between the user's speech and the standard one, and suggests implementation directions by applying the model to the visualization of Korean vowels. The method extracts the three formants F1, F2, and F3 from speech signals and feeds them into Kohonen's SOM, which maps the results onto a 2-D screen and represents each utterance as a point (a minimal SOM sketch follows this entry). We present a real system, implemented with open-source formant-analysis software, applied to the speech of a Korean instructor and several foreign students of Korean, with the user interface for the screen display built in JavaScript.
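
The visualization step reduces each utterance's (F1, F2, F3) triple to a point on a 2-D grid via Kohonen's SOM. A hand-rolled minimal SOM under that reading is below; the formant extraction itself (the paper uses open-source formant-analysis software) is assumed to have already produced the `formants` array, and the grid size and learning schedule are illustrative.

```python
# Sketch: train a small Kohonen SOM on (F1, F2, F3) vectors, then project
# each utterance to its best-matching grid cell (the on-screen point).
import numpy as np

def train_som(formants, grid=(20, 20), iters=2000, lr0=0.5, sigma0=5.0, seed=0):
    """formants: (n_samples, 3) array of [F1, F2, F3] in Hz."""
    rng = np.random.default_rng(seed)
    x = (formants - formants.mean(0)) / formants.std(0)  # z-score each formant
    w = rng.standard_normal((grid[0], grid[1], 3))       # weight grid
    gy, gx = np.mgrid[0:grid[0], 0:grid[1]]
    for t in range(iters):
        v = x[rng.integers(len(x))]                      # random training vector
        d = ((w - v) ** 2).sum(-1)
        by, bx = np.unravel_index(d.argmin(), grid)      # best-matching unit
        lr = lr0 * np.exp(-t / iters)                    # decaying learning rate
        sig = sigma0 * np.exp(-t / iters)                # shrinking neighborhood
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sig ** 2))
        w += lr * h[..., None] * (v - w)                 # neighborhood update
    return w, x

def project(w, x):
    """Return the 2-D grid coordinate (screen point) for each utterance."""
    return [np.unravel_index(((w - v) ** 2).sum(-1).argmin(), w.shape[:2])
            for v in x]
```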

A Comparative Performance Study of Speech Coders for Three-Way Conferencing in Digital Mobile Communication Networks (이동통신망에서 삼자회의를 위한 음성 부호화기의 성능에 관한 연구)

  • Lee, Mi-Suk;Lee, Yun-Geun;Kim, Gi-Cheol;Lee, Hwang-Su;Jo, Wi-Deok
    • The Journal of the Acoustical Society of Korea / v.14 no.1E / pp.30-38 / 1995
  • In this paper, we evaluated the performance of vocoders for three-way conferencing using the signal summation technique in digital mobile communication networks. The signal summation technique yields a natural mode of three-way conferencing, in which the mixed voice signal from two speakers is transmitted to the third person, although no speech coding technique well suited to mixed voice signals has been available. We implemented Qualcomm code excited linear prediction (QCELP), vector sum excited linear prediction (VSELP), and regular pulse excitation-long term prediction (RPE-LTP) vocoders to provide three-way conferencing using the signal summation technique (a sketch of the summation step follows this entry). In addition, since conventional speech quality measures are not applicable to vocoders handling mixed voice signals, we proposed two subjective quality measures: the sentence discrimination (SD) test and the modified degraded mean opinion score (MDMOS) test. The experimental results show that the output speech quality of the VSELP vocoder is superior to that of the other two.
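
The signal summation technique itself reduces to adding the two decoded talkers' PCM streams before re-encoding for the third party. A sketch under that reading, with the codec calls left out (the QCELP/VSELP/RPE-LTP encoders are not reproduced here) and int16 saturation as an assumed guard against overflow:

```python
# Sketch: mix two decoded talkers by sample-wise summation for the third party.
import numpy as np

def mix_for_third_party(speech_a, speech_b):
    """speech_a, speech_b: int16 PCM arrays of equal length (decoded frames)."""
    mixed = speech_a.astype(np.int32) + speech_b.astype(np.int32)
    # Saturate instead of wrapping around on int16 overflow.
    return np.clip(mixed, -32768, 32767).astype(np.int16)
```

The mixed signal is then passed through the vocoder under test, which is where quality degrades: these low-rate coders assume a single-talker excitation model, hence the paper's need for new subjective measures.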

Speech Segmentation using Weighted Cross-correlation in CASA System (계산적 청각 장면 분석 시스템에서 가중치 상호상관계수를 이용한 음성 분리)

  • Kim, JungHo;Kang, ChulHo
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.5 / pp.188-194 / 2014
  • The feature extraction mechanism of a CASA (Computational Auditory Scene Analysis) system uses time continuity and frequency-channel similarity to compose a correlogram of auditory elements. In segmentation, a binary mask is composed using the cross-correlation function, where mask value 1 (speech) indicates the same periodicity and synchronization. However, when there is a delay between autocorrelation signals sharing the same periodicity, the region is still labeled as speech, which is a drawback. In this paper, we propose an algorithm that improves the discrimination of channel similarity by using a weighted cross-correlation in segmentation (a sketch follows this entry). We evaluated the speech segregation performance of the CASA system in background noise (siren, machine, white, car, crowd) at SNRs of 5 dB and 0 dB, comparing the proposed algorithm with the conventional one. The proposed algorithm improved performance by 2.75 dB at 5 dB SNR and by 4.84 dB at 0 dB SNR.
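
A sketch of the segmentation score under one reading of the abstract: cross-correlate the normalized autocorrelations of two adjacent channels, but weight the lags so that signals sharing a period only at a delay score lower. The exponential lag weight and the mask threshold are assumptions; the abstract does not give the authors' weighting function.

```python
# Sketch: weighted cross-correlation between adjacent-channel autocorrelations.
import numpy as np

def autocorr(frame):
    """Normalized autocorrelation for lags >= 0."""
    a = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return a / (a[0] + 1e-12)

def weighted_channel_similarity(ch1, ch2, decay=0.02):
    a1, a2 = autocorr(ch1), autocorr(ch2)
    w = np.exp(-decay * np.arange(len(a1)))   # assumed weight: emphasize low lags
    num = np.sum(w * a1 * a2)
    den = np.sqrt(np.sum(w * a1 ** 2) * np.sum(w * a2 ** 2)) + 1e-12
    return num / den                          # near 1.0: same periodicity, in sync

# Binary mask entry (assumed threshold):
# mask = weighted_channel_similarity(ch1, ch2) > 0.95
```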

Speech Compression by Non-uniform Sampling at the maxima and minima (극대 및 극소점에서의 비균일 표본화에 의한 음성압축)

  • Rheem, Jae-Yeol;Baek, Sung-Joon;Ann, Sou-Guil;Kim, Bum-Hoon
    • The Journal of the Acoustical Society of Korea / v.11 no.4 / pp.36-44 / 1992
  • To reduce the redundancy among samples produced by uniform sampling, nonuniform sampling or nonredundant-sample coding can be considered. It is well known, however, that when conventional nonuniform sampling methods are applied directly to speech, the amount of data required is comparable to or greater than that of uniform sampling methods such as PCM. To overcome this problem, we consider perceptual properties of the speech signal and suggest a nonuniform sampling method that samples at the maxima and minima of the speech waveform (a sketch follows this entry). The performance of the suggested method is analyzed in terms of compression ratio. We show that the compression ratio can be improved by silence detection, which cannot be implemented in conventional uniform-sampling methods. Experimentally, compression ratios of 1.54 without silence detection and 2.88 with silence detection were obtained for 8 kHz, 8-bit PCM signals.
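
A sketch of the sampling rule: keep only samples where the first difference changes sign (local maxima and minima), optionally gated by a frame-energy silence detector. The frame length, the threshold, and the fact that sample positions must also be encoded (which the reported ratios presumably account for) are assumptions.

```python
# Sketch: non-uniform sampling at waveform extrema with optional silence gating.
import numpy as np

def extrema_indices(x):
    d = np.diff(x)
    # A sign change in the first difference marks a local maximum or minimum.
    return np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0] + 1

def compress(x, frame=256, silence_thresh=None):
    """x: float waveform. Returns kept positions, amplitudes, and the ratio."""
    keep = extrema_indices(x)
    if silence_thresh is not None:
        energy = np.array([np.mean(x[i:i + frame] ** 2)
                           for i in range(0, len(x), frame)])
        keep = keep[energy[keep // frame] > silence_thresh]  # drop silent frames
    ratio = len(x) / max(len(keep), 1)       # input samples per kept sample
    return keep, x[keep], ratio
```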

Improvement of Signal-to-Noise Ratio for Speech under Noisy Environment (잡음환경 하에서의 음성의 SNR 개선)

  • Choi, Jae-Seung
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.7 / pp.1571-1576 / 2013
  • This paper proposes an algorithm for improving the signal-to-noise ratio (SNR) of speech signals in noisy environments. To improve the SNR of speech corrupted by background noise such as white noise and car noise, the proposed algorithm first estimates the SNR separately in low-, mid-, and high-SNR regions. It then subtracts the noise from the noisy speech in each band using a spectrum sharpening method. In experiments, better SNRs were obtained for white noise and car noise than with a conventional spectral subtraction method (sketched after this entry): the maximal improvement in output SNR was approximately 4.2 dB for white noise and 3.7 dB for car noise.
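
For reference, the conventional spectral-subtraction baseline the paper compares against can be sketched as below; the proposed band-wise SNR estimation and spectrum sharpening are not described in enough detail in the abstract to reproduce. Noise is assumed to be estimated from leading noise-only frames, and the spectral floor is an assumed tuning value.

```python
# Sketch: conventional magnitude spectral subtraction (the paper's baseline).
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=10, floor=0.02, nperseg=256):
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    # Estimate the noise magnitude from the first few (assumed noise-only) frames.
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    # Subtract, keeping a small spectral floor to limit musical noise.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    _, out = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=nperseg)
    return out
```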

Speech Recognition Using Linear Discriminant Analysis and Common Vector Extraction (선형 판별분석과 공통벡터 추출방법을 이용한 음성인식)

  • 남명우;노승용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.35-41 / 2001
  • This paper describes linear discriminant analysis (LDA) and common vector extraction for speech recognition. A voice signal contains psychological and physiological properties of the speaker as well as dialect differences, acoustic environment effects, and phase differences. For these reasons, the same word spoken by different speakers can sound very different, which makes it difficult to extract properties common to a speech class (word or phoneme). Linear-algebra methods such as the KLT (Karhunen-Loeve transformation) are generally used to extract common properties from speech signals, but this paper uses the common vector extraction method suggested by M. Bilginer et al. (a sketch follows this entry). That method extracts an optimized common vector from the training speech signals and achieves 100% recognition accuracy on the training data used for extraction. Despite these characteristics, the method has drawbacks: the number of usable training signals is limited, and the discriminant information among common vectors is not defined. This paper suggests an improved method that reduces the error rate by maximizing the discriminant information among common vectors, together with a novel method for normalizing the size of the common vector. The results show improved performance and recognition accuracy 2% better than the conventional method.
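
The common vector of a word class, in the style of M. Bilginer et al., is what remains of any training vector after projecting out the class's difference subspace. A minimal sketch, assuming fixed-length feature vectors per word (e.g. stacked cepstral frames) are already computed; the paper's added discriminant step is not reproduced.

```python
# Sketch: common vector extraction for one word class.
import numpy as np

def common_vector(A):
    """A: (m, n) matrix of m training feature vectors for one word class."""
    diffs = (A[1:] - A[0]).T                 # difference subspace basis, (n, m-1)
    Q, _ = np.linalg.qr(diffs)               # orthonormalize that span
    proj = Q @ (Q.T @ A[0])                  # component inside the subspace
    return A[0] - proj                       # the same for every row of A

# Classification (assumed scheme): project a test vector onto each class's
# complement subspace and pick the class whose common vector is closest.
```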

Robust Blind Source Separation to Noisy Environment For Speech Recognition in Car (차량용 음성인식을 위한 주변잡음에 강건한 브라인드 음원분리)

  • Kim, Hyun-Tae;Park, Jang-Sik
    • The Journal of the Korea Contents Association / v.6 no.12 / pp.89-95 / 2006
  • The performance of blind source separation (BSS) using independent component analysis (ICA) declines significantly in reverberant environments. The post-processing method proposed in this paper is designed to remove the residual cross-talk components precisely. It uses a modified NLMS (normalized least mean square) filter in the frequency domain to estimate the cross-talk path that causes the residual components. The residual cross-talk components in one channel correspond to the direct components in the other channel, so the cross-talk path can be estimated by an adaptive filter fed with the other channel's signal. In a conventional NLMS filter the step size is normalized by the input signal power; in the modified filter it is normalized by the sum of the input signal power and the error signal power, which prevents misadjustment of the filter weights (a sketch follows this entry). The estimated residual cross-talk components are then removed by non-stationary spectral subtraction. Computer simulations with speech signals show that the proposed method improves the noise reduction ratio (NRR) by approximately 3 dB over conventional FDICA.
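
The modification the abstract describes changes only the NLMS normalization term. A time-domain sketch for one channel pair is below, with the filter order and step size as assumptions; the paper applies the filter per frequency bin after FDICA.

```python
# Sketch: NLMS with the step size normalized by input power + error power.
import numpy as np

def modified_nlms(x, d, order=64, mu=0.5, eps=1e-8):
    """x: reference channel (direct components), d: channel carrying residual
    cross-talk. Returns e = d minus the estimated cross-talk."""
    w = np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order:n][::-1]            # most recent reference samples first
        y = w @ u                           # estimated cross-talk sample
        e[n] = d[n] - y
        norm = u @ u + e[n] ** 2 + eps      # input power + error power (modified)
        w += mu * e[n] * u / norm           # bounded update prevents misadjustment
    return e
```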

A Perceptual Study on the Temporal Cues of English Intervocalic Plosives for Various Groups Depending on Background Language, English Listening Ability, and Age (언어별, 연령별, 수준별 집단에 의한 모음간 영어 파열음 유/무성 인지 연구)

  • Kang, Seok-Han
    • Speech Sciences / v.13 no.2 / pp.133-145 / 2006
  • In order to understand how various groups perceive both VCV trochees and iambs, this study examined identification correctness and cue robustness for the unit intervals in light of background language, age, and English listening ability. Four groups took part in the experiments: native speakers of English, Korean college students with high listening achievement, Korean college students with low listening achievement, and Korean elementary students. Tokens of /dæpər, dæbər, dætər, dædər, dækər, dægər/ in trochee and /ðə pæd, ðə bæd, ðə tæd, ðə dæd, ðə kæd, ðə gæd/ in iambus were extracted and modified into experimental signals composed of two digits (voiced = 1, voiceless = 0) over the temporal intervals, the signals consisting of the preceding vowel, closure, VOT, and post-vowel. In the first experiment, on identification correctness, all groups showed almost 100% correctness in the VCV iambus environment, while in the trochee environment the groups differed (native speakers 87%, college high 74%, college low 70%, elementary 65%). In the second experiment, on cue robustness, all groups showed similar perceptual patterns in both environments. The order of cue robustness in VCV trochee was pre-vowel ≫ closure ≫ VOT ≫ post-vowel, while in VCV iambus it was VOT ≫ post-vowel ≫ closure ≫ pre-vowel. Under some conditions, however, moderately different perceptual patterns were found depending on language, age, and listening level.

Evaluating Impressions of Robots According to the Robot's Embodiment Level and Response Speed (로봇의 외형 구체화 정도 및 반응속도에 따른 로봇 인상 평가)

  • Kang, Dahyun;Kwak, Sonya S.
    • Design Convergence Study / v.16 no.6 / pp.153-167 / 2017
  • Nowadays, as many robots are developed for desktop use, users interact with them through speech. However, due to the technical limitations of speech-based interaction, an alternative is needed. We designed this research to develop a robot that interacts with the user by unconditionally reflecting the user's biological signals. To apply bio-signals to robots more effectively, we evaluated overall service evaluation, perceived intelligence, appropriateness, trustworthiness, and sociability according to the robot's embodiment level and response speed. The results showed that, in terms of intelligence and appropriateness, the 3D robot with the higher embodiment level was evaluated more positively than the 2D robot with the lower embodiment level. The robot with the faster response speed was also evaluated more favorably than the slower robot in overall service evaluation, intelligence, appropriateness, trustworthiness, and sociability. In addition, interaction effects between embodiment level and response speed were found for service evaluation, trustworthiness, and sociability.