• Title/Abstract/Keywords: Acoustical parameter

Search Result 241, Processing Time 0.022 seconds

Comparison of Integration Methods of Speech and Lip Information in the Bi-modal Speech Recognition (바이모달 음성인식의 음성정보와 입술정보 결합방법 비교)

  • 박병구;김진영;최승호
    • The Journal of the Acoustical Society of Korea / v.18 no.4 / pp.31-37 / 1999
  • Bimodal speech recognition using visual and audio information has been proposed and studied to improve the performance of automatic speech recognition (ASR) systems in noisy environments. Methods for integrating the two modalities are usually classified as early integration or late integration. The early integration methods include one that applies a fixed weight to the lip parameters and one that varies the weight according to speech SNR information. The four late integration methods are one that uses audio and visual information independently, one that uses the speech optimal path, one that uses the lip optimal path, and one that uses speech SNR information. Among these six methods, the one using a fixed weight for the lip parameters showed a better recognition rate.

Design of A 3V CMOS Fully-Balanced Complementary Current-Mode Integrator (3V CMOS Fully-Balanced 상보형 전류모드 적분기 설계)

  • Lee, Geun-Ho;Bang, Jun-Ho;Cho, Seong-Ik;Kim, Dong-Yong
    • The Journal of the Acoustical Society of Korea / v.16 no.3 / pp.106-113 / 1997
  • This paper presents the design of a 3V CMOS continuous-time fully-balanced integrator for low-voltage analog-digital mixed-mode signal processing. The basic architecture of the fully-balanced integrator is a complementary circuit composed of NMOS and PMOS transistors, which extends the transconductance of the integrator. As a result, the unity-gain frequency (UGF), pole, and zero of the integrator are increased. SPICE simulation and small-signal analysis show that the UGF, pole, and zero of the integrator are higher than those of the integrators used for comparison. A three-pole active low-pass filter is designed as an application circuit of the fully-balanced integrator, using 0.83V CMOS process parameters.

Speaker Identification Based on Vowel Classification and Vector Quantization (모음 인식과 벡터 양자화를 이용한 화자 인식)

  • Lim, Chang-Heon;Lee, Hwang-Soo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.8 no.4 / pp.65-73 / 1989
  • In this paper, we propose a text-independent speaker identification algorithm based on vector quantization (VQ) and vowel classification, and compare its performance with that of a conventional speaker identification algorithm using VQ. The proposed algorithm consists of three processes: vowel segmentation, vowel recognition, and average distortion calculation. Vowel segmentation is performed automatically using RMS energy, BTR (Back-to-Total cavity volume Ratio), and SFBR (Signed Front-to-Back maximum area Ratio) extracted from the input speech signal. When the input speech signal is noisy, particularly when the SNR is around 20 dB, the proposed algorithm performs better than the reference algorithm, provided that the vowel segmentation is correct. The same result is obtained when noisy telephone speech is used as the input.
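
The average distortion decision used in VQ-based speaker identification can be sketched as follows. This is a minimal illustration of the conventional VQ decision rule only, not the paper's implementation: the vowel segmentation and vowel recognition stages are omitted, and the speaker names, codebooks, and feature vectors are hypothetical toy values.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def average_distortion(frames: List[Vector], codebook: List[Vector]) -> float:
    """Mean distance from each frame to its nearest codeword in the codebook."""
    total = sum(min(squared_distance(f, c) for c in codebook) for f in frames)
    return total / len(frames)


def identify_speaker(frames: List[Vector], codebooks: Dict[str, List[Vector]]) -> str:
    """Return the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda name: average_distortion(frames, codebooks[name]))


# Hypothetical toy data: two speakers, each with a two-codeword codebook.
codebooks = {
    "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
    "speaker_b": [(5.0, 5.0), (6.0, 6.0)],
}
test_frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.1)]
print(identify_speaker(test_frames, codebooks))
```

In a vowel-based variant, one plausible arrangement (an assumption here) would keep a separate codebook per speaker and vowel class and accumulate the distortion only over frames assigned to that vowel.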

A Temporal Decomposition Method Based on a Rate-distortion Criterion (비트율-왜곡 기반 음성 신호 시간축 분할)

  • 이기승
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.315-322 / 2002
  • In this paper, a new temporal decomposition method is proposed that takes into consideration not only spectral distortion but also bit rate. The interpolation functions, which are among the parameters required for temporal decomposition, are obtained from a training speech corpus. Since the interval between two targets uniquely defines the interpolation function, the interpolation can be represented without additional information. The locations of the targets are determined by minimizing the bit rate while the maximum spectral distortion remains below a given threshold. The proposed method has been applied to compressing LSP coefficients, which are widely used as spectral parameters. Simulation results show that an average spectral distortion of about 1.4 dB can be achieved at an average bit rate of about 8 bits/frame.

Investigating the Properties of the Light Bulb Source in Shallow-Water Environments (천해 환경에서의 전구 음원의 음향학적 특성 연구)

  • Oh Taekhwan;Na Jungyul;Lee Seongwook;Kim Seongil;Park Joung-Soo
    • The Journal of the Acoustical Society of Korea / v.24 no.6 / pp.303-308 / 2005
  • In this paper, the acoustic properties of the light bulb source are presented, based on a new light bulb source system that continuously transmits implosive signals. We describe the results of an analysis of the bulb signals and compare them with previous work. The results show that the peak source level and the primary resonant frequency increase with increasing source depth. This bulb source can be used for geoacoustic parameter inversion and source tracking in shallow water via matched field processing.

Voice Conversion Using Linear Multivariate Regression Model and LP-PSOLA Synthesis Method (선형다변회귀모델과 LP-PSOLA 합성방식을 이용한 음성변환)

  • 권홍석;배건성
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.15-23 / 2001
  • This paper presents a voice conversion technique that modifies the utterance of a source speaker so that it sounds as if it were spoken by a target speaker. Feature parameter conversion methods that transform the vocal tract and prosodic characteristics between the source and target speakers are described. The vocal tract characteristics are transformed by modifying the LPC cepstral coefficients using Linear Multivariate Regression (LMR). The prosodic transformation is performed by changing the average pitch period between speakers, and it is applied to the residual signal using the LP-PSOLA scheme. Experimental results show that speech transformed by the LMR and LP-PSOLA synthesis method contains many characteristics of the target speaker.

A Syllabic Segmentation Method for the Korean Continuous Speech (우리말 연속음성의 음절 분할법)

  • 한학용;고시영;허강인
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.70-75 / 2001
  • This paper proposes a syllabic segmentation method for Korean continuous speech. The method consists of three major steps: (1) labeling vowel, consonant, and silence units and forming a token sequence from the speech data using the time-domain segmental parameters pitch, energy, ZCR, and PVR; (2) scanning the tokens against the structure of the Korean syllable using a parser designed as a finite state automaton; and (3) re-segmenting the syllable parts that contain two or more syllables using pseudo-syllable nucleus information. The evaluation results of the proposed method for continuous words and sentence units are 73.5% and 85.9%, respectively.
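
Two of the time-domain parameters named in step (1), short-time energy and zero-crossing rate (ZCR), can be computed per frame as in the sketch below. The labeling rule and its thresholds (`energy_floor`, `zcr_ceiling`) are hypothetical and serve only to illustrate how such parameters could yield vowel/consonant/silence tokens; the abstract does not state the paper's actual rules, and pitch and PVR are not modeled here.

```python
from typing import List


def short_time_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def label_frame(frame: List[float],
                energy_floor: float = 0.01,
                zcr_ceiling: float = 0.3) -> str:
    """Crude token label: 'S' silence, 'V' vowel-like, 'C' consonant-like.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if short_time_energy(frame) < energy_floor:
        return "S"
    return "V" if zero_crossing_rate(frame) <= zcr_ceiling else "C"


# Hypothetical toy frames: silence, a slowly varying wave, a rapidly alternating one.
silence = [0.0] * 8
vowel_like = [0.5, 0.7, 0.9, 0.7, 0.5, 0.3, 0.1, -0.1]
noise_like = [0.5, -0.5] * 4
tokens = "".join(label_frame(f) for f in (silence, noise_like, vowel_like, silence))
print(tokens)
```

A token string of this kind is what a finite-state parser for syllable structure would then scan.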

F0 Contour Model based on Temporal Decomposition (시간적 분해에 기반한 F0 궤적 모델에 관한 연구)

  • 변효진;김연준;오영환
    • The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.75-83 / 1999
  • This paper proposes a new F0 contour model for intonation control in speech synthesis. We assume that the F0 contour of an utterance can be described by a sequence of time-overlapping events, each modeled by an asymmetric Gaussian function, which determine the fluctuation of the contour. In addition, we propose a parameter estimation algorithm for the proposed model. The model is not developed with a particular phonological theory in mind and can be used in both F0 contour analysis and synthesis. To test the model, we collected 500 sentences from various genres and built a corresponding speech corpus uttered by a professional female announcer. In an F0 resynthesis experiment using the proposed model, the RMSE was 7.87 Hz for the given speech corpus.
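
A contour built from time-overlapping asymmetric Gaussian events can be sketched as below. This is one plausible parameterization, given here as an assumption: each event has a peak time, an amplitude, and separate left and right widths, and the events are summed on top of a constant `baseline`. The abstract does not give the paper's exact formulation, and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class F0Event:
    center: float       # time of the event peak, in seconds
    amplitude: float    # contribution at the peak, in Hz
    left_width: float   # Gaussian width before the peak, in seconds
    right_width: float  # Gaussian width after the peak, in seconds

    def value(self, t: float) -> float:
        """Asymmetric Gaussian: a different width on each side of the peak."""
        width = self.left_width if t < self.center else self.right_width
        return self.amplitude * math.exp(-0.5 * ((t - self.center) / width) ** 2)


def f0_contour(t: float, events: List[F0Event], baseline: float = 0.0) -> float:
    """F0 at time t as a baseline plus the sum of overlapping events."""
    return baseline + sum(e.value(t) for e in events)


def rmse(reference: List[float], estimate: List[float]) -> float:
    """Root-mean-square error between two equally long F0 tracks."""
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference))


# Hypothetical events: a fast rise with a slow fall, overlapped by a smaller event.
events = [F0Event(0.5, 40.0, 0.05, 0.15), F0Event(0.9, 20.0, 0.10, 0.10)]
times = [i * 0.1 for i in range(15)]
contour = [f0_contour(t, events, baseline=120.0) for t in times]
print([round(v, 1) for v in contour])
```

The `rmse` helper mirrors the kind of resynthesis error the abstract reports, comparing a measured F0 track with the model output sampled at the same times.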

A Design of Lowpass Active Filter for ADSL Tx/Rx Stage (ADSL 송수신단용 저역통과 능동필터 설계)

  • Lee Geun-Ho
    • The Journal of the Acoustical Society of Korea / v.24 no.1 / pp.38-42 / 2005
  • CMOS analog lowpass filters using the speech signal bandwidth for an Asymmetric Digital Subscriber Line (ADSL) modem are presented. The designed active lowpass filters are composed of a CMOS complementary high-swing cascode stage, which can increase the transconductance of an active element. As a result, their cutoff frequencies are 138 kHz and 1,100 kHz, respectively. A low-voltage high-swing cascode integrator with improved gain and unity-gain frequency is used to design the filters. The designed filters are verified by HSPICE simulation with $0.251\,\mu\mathrm{m}$ CMOS n-well process parameters and a single 2.5V power supply.

Korean Speech Recognition using DHMM (DHMM을 이용한 한국어 음성 인식)

  • Ann, T.O.;Lee, K.S.;Yoo, H.K.;Lee, H.J.;Cho, H.J.;Byun, Y.G.;Kim, S.H.
    • The Journal of the Acoustical Society of Korea / v.10 no.1 / pp.52-60 / 1991
  • This paper describes a study on isolated word recognition using a DHMM (Dynamic Hidden Markov Model), which takes the dynamic features of the spectrum as a parameter. It discusses speech recognition experiments based on an HMM that can evaluate not only instantaneous spectral features but also dynamic spectral features. LPC cepstrum parameters are used as the static feature, and the regression coefficients of the LPC cepstrum are used as the dynamic feature. Each of the two features is quantized with its own VQ codebook. The DHMM is modeled by taking the static vector and the dynamic vector as input. Across the experiments, recognition using the DHMM achieved a recognition rate of 92.7%, while recognition using a conventional HMM achieved 88.8%, showing that the DHMM is a useful model.
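
A regression coefficient over a cepstral trajectory is commonly computed as the least-squares slope across a few neighboring frames. The sketch below uses that widely used delta formula as an assumption; the abstract does not give the paper's window length or exact definition, and the `window` parameter and the toy input are hypothetical.

```python
from typing import List


def delta_features(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Per-coefficient regression slope over +/- `window` frames.

    delta[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), for k = 1..window.
    Frames beyond either end are replaced by the first or last frame.
    """
    denom = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            num = sum(
                k * (frames[min(t + k, last)][d] - frames[max(t - k, 0)][d])
                for k in range(1, window + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas


# Toy static features: two coefficients rising linearly with slopes 1 and 2 per frame.
static = [[float(t), 2.0 * t] for t in range(7)]
dynamic = delta_features(static)
print(dynamic[3])
```

In a two-stream setup such as the one described, the static and dynamic vectors would then each be quantized with a separate codebook before being passed to the model.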
