• Title/Summary/Keyword: Speech signals


Generalized cross correlation with phase transform sound source localization combined with steered response power method (조정 응답 파워 방법과 결합된 generalized cross correlation with phase transform 음원 위치 추정)

  • Kim, Young-Joon;Oh, Min-Jae;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea / v.36 no.5 / pp.345-352 / 2017
  • We propose a method that reduces the direction-estimation error for a sound source in reverberant and noisy environments. The proposed algorithm divides the speech signal into voiced and unvoiced segments using a VAD and estimates the source direction only when the current frame is voiced. In that frame, the TDOA (Time Difference of Arrival) across the microphone array is estimated using the GCC-PHAT (Generalized Cross Correlation with Phase Transform) method. To improve the accuracy of the source location, the cross-correlation peak at the estimated time delay is then compared with the peaks at the other time delays in the time table. If the angle of the current frame differs greatly from those of the preceding and following frames within a run of successive voiced frames, it is replaced with the mean of the angles estimated in those neighboring frames.
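
    The GCC-PHAT delay estimate underlying this abstract can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the function name, default sampling rate, and the small constant guarding the division are our own choices.

    ```python
    import numpy as np

    def gcc_phat(sig, ref, fs=16000):
        """Estimate the time delay of `sig` relative to `ref` via GCC-PHAT."""
        n = len(sig) + len(ref)                    # zero-pad to avoid circular wrap
        SIG = np.fft.rfft(sig, n=n)
        REF = np.fft.rfft(ref, n=n)
        R = SIG * np.conj(REF)
        R /= np.abs(R) + 1e-12                     # PHAT weighting: keep phase only
        cc = np.fft.irfft(R, n=n)
        max_shift = n // 2
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        shift = np.argmax(np.abs(cc)) - max_shift  # lag of the correlation peak
        return shift / fs                          # delay in seconds
    ```

    Given the delay between two microphones, the bearing follows from the array geometry; the paper additionally gates this estimate with a VAD and smooths outlier angles across neighboring voiced frames.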

Efficient Implementation of IFFT and FFT for PHAT Weighting Speech Source Localization System (PHAT 가중 방식 음성신호방향 추정시스템의 FFT 및 IFFT의 효율적인 구현)

  • Kim, Yong-Eun;Hong, Sun-Ah;Chung, Jin-Gyun
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.1 / pp.71-78 / 2009
  • Sound source localization systems in service-robot applications estimate the direction of a human voice. Time-delay information obtained from a few separated microphones is widely used to estimate the sound direction: the correlation between two signals is computed to obtain their time delay, and a PHAT weighting function can be applied to significantly improve the accuracy of the estimation. However, the FFT and IFFT operations in the PHAT weighting function occupy more than half of the area of the sound source localization system, so efficient FFT and IFFT designs are essential for its IP implementation. In this paper, we propose an efficient FFT/IFFT design method based on the characteristics of the human voice.

Audio Segmentation and Classification Using Support Vector Machine and Fuzzy C-Means Clustering Techniques (서포트 벡터 머신과 퍼지 클러스터링 기법을 이용한 오디오 분할 및 분류)

  • Nguyen, Ngoc;Kang, Myeong-Su;Kim, Cheol-Hong;Kim, Jong-Myon
    • The KIPS Transactions: Part B / v.19B no.1 / pp.19-26 / 2012
  • The rapid increase of information imposes new demands on content management. The purpose of automatic audio segmentation and classification is to meet this rising need for efficient content management. For this reason, this paper proposes a high-accuracy algorithm that segments audio signals and classifies them into different classes such as speech, music, silence, and environmental sounds. The proposed algorithm utilizes a support vector machine (SVM) to detect audio-cuts, the boundaries between different kinds of sound, from the parameter sequence. We then extract feature vectors composed of statistical data, which are used as the input of a fuzzy c-means (FCM) classifier that partitions the audio segments into the different classes. To evaluate the segmentation and classification performance of the proposed SVM-FCM based algorithm, we consider precision and recall rates for segmentation, and accuracy for classification. Furthermore, we compare the proposed algorithm with other methods, including binary and FCM classifiers, in terms of segmentation performance. Experimental results show that the proposed algorithm outperforms the other methods in both precision and recall.
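
    The fuzzy c-means step used for the final partitioning is a standard alternating update of cluster centers and soft memberships. A minimal sketch, assuming Euclidean distance and the usual fuzziness exponent m = 2 (the paper's actual features and settings are not specified here):

    ```python
    import numpy as np

    def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
        """Partition rows of X into c fuzzy clusters; m > 1 controls fuzziness."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
        for _ in range(n_iter):
            Um = U ** m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            d = np.maximum(d, 1e-12)               # avoid division by zero
            inv = d ** (-2.0 / (m - 1))            # u_ik ∝ d_ik^(-2/(m-1))
            U = inv / inv.sum(axis=1, keepdims=True)
        return centers, U
    ```

    A hard class label per segment can then be read off as `U.argmax(axis=1)`, which is how a fuzzy partition is typically reduced to the speech/music/silence/environment decision.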

A Study on Classification of Waveforms Using Manifold Embedding Based on Commute Time (컴뮤트 타임 기반의 다양체 임베딩을 이용한 파형 신호 인식에 관한 연구)

  • Hahn, Hee-Il
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.148-155 / 2014
  • In this paper a commute time embedding is implemented by organizing patches according to a graph-based metric, and its properties are investigated by varying the number of nodes on the graph. It is shown that manifold-embedding methods recover the intrinsic geometric structure when waveforms such as speech or musical-instrument sounds are embedded in a low-dimensional Euclidean space. Basically, manifold-embedding algorithms only project the training samples on the graph into an embedding subspace and cannot generalize the learned result to test samples: they are very effective for data clustering but not appropriate for classification or recognition. In this paper a commute-time-guided transform is adopted to enhance the generalization ability, and its performance is analyzed by applying it to the classification of six kinds of musical-instrument sounds.
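
    The commute time embedding itself has a closed form from the graph Laplacian: scaling the Laplacian eigenvectors by the inverse square roots of their eigenvalues makes Euclidean distances in the embedding equal the square roots of commute times. A minimal sketch for a small connected graph (the paper's patch graph construction and node counts are not reproduced here):

    ```python
    import numpy as np

    def commute_time_embedding(W, dim=2):
        """Embed graph nodes so squared Euclidean distances equal commute times.

        W: symmetric nonnegative affinity matrix of a connected graph.
        """
        d = W.sum(axis=1)
        L = np.diag(d) - W                         # combinatorial graph Laplacian
        lam, U = np.linalg.eigh(L)                 # eigenvalues in ascending order
        lam, U = lam[1:dim + 1], U[:, 1:dim + 1]   # drop the constant eigenvector
        vol = d.sum()                              # graph volume (sum of degrees)
        return np.sqrt(vol) * U / np.sqrt(lam)     # rows are node coordinates
    ```

    With `dim = n - 1` the embedding is exact, reproducing the identity C(i, j) = vol · (L⁺ᵢᵢ + L⁺ⱼⱼ − 2L⁺ᵢⱼ); truncating to a few eigenvectors gives the low-dimensional embedding used for clustering.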

Development of Neck-Type Electrolarynx Blueton and Acoustic Characteristic Analysis (경부형 전기인공후두 Blueton의 개발과 음향학적 성능 분석)

  • Choi, Seong-Hee;Park, Young-Jae;Park, Young-Kwan;Kim, Tae-Jung;Nam, Do-Hyun;Lim, Sung-Eun;Lee, Sung-Eun;Kim, Han-Soo;Choi, Hong-Shik;Kim, Kwang-Moon
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.15 no.1 / pp.37-42 / 2004
  • The electrolarynx (EL), a battery-operated vibrator held against the neck and operated by an on-off button, has been widely used as a verbal communication method among post-laryngectomy patients. EL speech can be produced easily, without any additional surgery or special training, and can be combined with other methods. This institute developed a neck-type EL named "Blueton" in cooperation with the EL company Linkus; it consists of three parts: a vibrator, a control unit, and a battery. In this study we evaluated the acoustic characteristics of voices produced with the Blueton, compared with the Servox-inton, using MDVP. Three EL users (two full-time users, one part-time user) participated. The results revealed that NHR was higher with the Servox than with the Blueton, while intensity was higher with the Blueton than with the Servox. The spectra of vowels produced by EL speakers are mixed signals combining the talker's vocal output with electrolarynx noise, and the spectral pattern was similar for the two ELs. A high SPI index and the vowel spectra from MDVP demonstrated the noise-related characteristics of both electrolarynxes. These findings suggest that the Blueton provides a useful rehabilitation option for post-laryngectomy patients.


A DCT Adaptive Subband Filter Algorithm Using Wavelet Transform (웨이브렛 변환을 이용한 DCT 적응 서브 밴드 필터 알고리즘)

  • Kim, Seon-Woong;Kim, Sung-Hwan
    • The Journal of the Acoustical Society of Korea / v.15 no.1 / pp.46-53 / 1996
  • The adaptive LMS algorithm has been used in many application areas due to its low complexity. In this paper, the input signal is decomposed into subbands of arbitrary bandwidth. The dynamic range within each subband is reduced, so independent filtering in each subband converges faster than in a full-band system. DCT transform-domain LMS adaptive filtering whitens the input signal in each band, which greatly accelerates convergence by decreasing the eigenvalue spread. Finally, the filtered subband signals are synthesized so that the output signal covers the full frequency range; here a wavelet filter bank guarantees perfect reconstruction of the signal without any inter-spectral interference. In simulations on a speech signal with additive white Gaussian noise, the proposed algorithm shows better performance than the conventional NLMS algorithm at high SNR.
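
    For reference, the conventional NLMS baseline that the paper compares against can be sketched as follows; this is a generic textbook form, not the paper's subband/DCT variant, and the filter order and step size below are arbitrary illustration values.

    ```python
    import numpy as np

    def nlms(x, d, order=8, mu=0.5, eps=1e-8):
        """Normalized LMS: adapt w so that w·[x[n], x[n-1], ...] tracks d[n]."""
        w = np.zeros(order)
        e = np.zeros(len(x))
        for n in range(order, len(x)):
            u = x[n - order + 1:n + 1][::-1]       # newest sample first
            e[n] = d[n] - w @ u                    # a-priori error
            w += mu * e[n] * u / (u @ u + eps)     # step normalized by input power
        return w, e
    ```

    The eigenvalue-spread argument in the abstract is why this full-band update converges slowly on colored inputs such as speech; splitting into subbands (and whitening with the DCT) flattens the spectrum seen by each adaptive filter.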


Double Talk Detection before the Convergence of Echo Canceller (반향제거기의 수렴전 동시통화검출)

  • Yoo, Jae-Ha;Kim, Soo-Chan;Kim, Dong-Yon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.203-208 / 2013
  • In this paper, we propose a method for improving the performance of a double-talk detector that can operate before the echo canceller has converged. The microphone input signal is filtered by a linear prediction filter, and this filtered signal is used for detection. The coefficients of the linear prediction filter are obtained from the far-end talker signal. During single talk, the filtered signal has low power, since the characteristics of the echo signal are similar to those of the far-end talker signal. During double talk, however, the filtered signal does not have low power, because a signal with different characteristics is included in the microphone signal. Double talk is detected from this difference. Simulations using real speech signals verified that the proposed method outperforms conventional methods.
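
    The detection idea above can be sketched directly: fit linear-prediction coefficients on the far-end signal, run the prediction-error filter over the microphone signal, and compare residual powers. This is our own minimal reconstruction of the mechanism (autocorrelation-method LPC, no thresholding logic), not the authors' detector.

    ```python
    import numpy as np

    def lpc(x, order):
        """Linear-prediction coefficients via the autocorrelation method."""
        r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        return np.linalg.solve(R, r[1:order + 1])  # x[n] ≈ sum_k a[k]·x[n-1-k]

    def residual_power(mic, a):
        """Power of the mic signal after the far-end LPC model removes what it can."""
        order = len(a)
        e = np.array([mic[n] - a @ mic[n - order:n][::-1]
                      for n in range(order, len(mic))])
        return float(np.mean(e ** 2))
    ```

    During single talk the mic signal shares the far-end statistics, so the residual power stays low; near-end speech breaks the model and raises it, which is the cue a threshold would pick up.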

A Study on the Robust Sound Localization System Using Subband Filter Bank (서브밴드 필터 뱅크를 이용한 강인한 음원 추적시스템에 대한 연구)

  • Park, Kyu-Sik;Park, Jae-Hyun;Ohn, Syng-Yup;Oh, Sang-Hun
    • The Journal of the Acoustical Society of Korea / v.20 no.1 / pp.36-42 / 2001
  • This paper proposes a new sound localization algorithm that detects the bearing of a sound source in a closed office environment using a two-microphone array. The proposed Subband CPSP (Cross Power Spectrum Phase) algorithm develops the previous CPSP method with a subband approach: it first splits the received microphone signals into subbands and then calculates the CPSP within each subband, yielding candidate source bearings. Subband CPSP provides a more robust and reliable sound localization system because it confines the effect of environmental noise to individual subbands. To verify the performance of the proposed algorithm, a real-time simulation was conducted and it was compared with the previous CPSP method. The simulation results show that the proposed Subband CPSP exceeds the previous CPSP algorithm by more than 5% in average accuracy of sound source detection.
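
    CPSP is the phase-transform cross-correlation, so the subband variant amounts to restricting that correlation to slices of the spectrum. A rough sketch of the idea, with uniform bands and a per-band delay estimate (band edges, band count, and how the per-band results are fused into one bearing are our assumptions, not the paper's design):

    ```python
    import numpy as np

    def subband_cpsp(x1, x2, n_bands=4):
        """Per-subband CPSP: phase-transform correlation restricted to each band."""
        n = 2 * len(x1)                            # zero-pad against circular wrap
        X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12             # CPSP/PHAT normalization
        edges = np.linspace(0, len(cross), n_bands + 1).astype(int)
        delays = []
        for b in range(n_bands):
            band = np.zeros_like(cross)
            band[edges[b]:edges[b + 1]] = cross[edges[b]:edges[b + 1]]
            cc = np.fft.irfft(band, n)
            cc = np.concatenate((cc[-n // 2:], cc[:n // 2]))  # center zero lag
            delays.append(int(np.argmax(cc) - n // 2))        # samples, per band
        return delays
    ```

    A band dominated by noise produces an outlier delay, while clean bands agree, which is why confining the noise per subband makes the combined estimate more robust.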


Acoustic Echo Cancellation Based on Convolutive Blind Signal Separation Method (Convolutive 암묵신호분리방법에 기반한 음향반향 제거)

  • Lee, Haeng-Woo
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.13 no.5 / pp.979-986 / 2018
  • This paper deals with acoustic echo cancellation using a blind signal separation method, whose echo-cancellation performance does not degrade even during double talk. In a closed echo environment the mixing model of the acoustic signals is multi-channel, so a convolutive blind signal separation method is applied, and the mixing coefficients are calculated using a feedback model without directly computing the separation coefficients. The coefficient update is performed by iterative calculations based on second-order statistical properties, thereby estimating the near-end speech. A number of simulations were performed to verify the performance of the proposed blind signal separation method. The results show that an acoustic echo canceller using this method operates reliably regardless of the presence of double talk, and that PESQ improves by 0.6 points compared with a conventional adaptive FIR filter structure.

Deep Learning based Raw Audio Signal Bandwidth Extension System (딥러닝 기반 음향 신호 대역 확장 시스템)

  • Kim, Yun-Su;Seok, Jong-Won
    • Journal of IKEEE / v.24 no.4 / pp.1122-1128 / 2020
  • Bandwidth extension refers to restoring a narrowband (NB) signal, degraded in the encoding and decoding process due to limited channel capacity or the characteristics of the codec installed in a mobile communication device, and converting it into a wideband (WB) signal. Bandwidth-extension research has mainly focused on voice signals and, as in SBR (Spectral Band Replication) and IGF (Intelligent Gap Filling), works in the frequency domain, restoring missing or damaged high bands through complex feature-extraction processes. In this paper, we propose a model that outputs a bandwidth-extended signal using an autoencoder built from one-dimensional convolutional neural networks (CNNs) with residual connections; the bandwidth is extended by feeding in a time-domain signal of fixed length without complicated pre-processing. We also confirmed that the damaged high band can be restored even when training on a dataset containing various types of sound sources, including music, not limited to speech.
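
    The building block of such a model, a 1-D convolution with a residual (skip) connection, is easy to show in isolation. The numpy sketch below is only a structural illustration of one untrained residual block (single channel, 'same' padding, ReLU); the paper's actual architecture, channel counts, and training are not reproduced.

    ```python
    import numpy as np

    def conv1d(x, w, b):
        """'Same'-padded 1-D convolution of signal x (T,) with kernel w (k,)."""
        k = len(w)
        xp = np.pad(x, k // 2)
        return np.array([xp[t:t + k] @ w for t in range(len(x))]) + b

    def residual_block(x, w1, b1, w2, b2):
        """Two conv layers with a ReLU in between, plus the skip: y = x + F(x)."""
        h = np.maximum(conv1d(x, w1, b1), 0.0)     # ReLU nonlinearity
        return x + conv1d(h, w2, b2)               # residual connection
    ```

    The skip connection lets the network pass the narrowband waveform through unchanged and learn only the high-band correction F(x), which is why residual stacks suit time-domain bandwidth extension.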