• Title/Summary/Keyword: Acoustical parameter

EM Algorithm with Initialization Based on Incremental $k$-means for GMM and Its Application to Speaker Identification (GMM을 위한 점진적 $k$-means 알고리즘에 의해 초기값을 갖는 EM알고리즘과 화자식별에의 적용)

  • Seo Changwoo;Hahn Hernsoo;Lee Kiyong;Lee Younjeong
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.141-149 / 2005
  • In general, a Gaussian mixture model (GMM) is used to estimate the speaker model from speech for speaker identification. The parameters of the GMM are estimated with the Expectation-Maximization (EM) algorithm for maximum likelihood (ML) estimation. However, the EM algorithm has two drawbacks: it depends heavily on the initialization, and it requires the number of mixtures to be known in advance. To solve these problems, we propose an EM algorithm whose initialization is based on incremental $k$-means for GMM. The proposed method increases the number of mixtures one at a time until the optimum number of mixtures is found. Whenever a mixture is added, we calculate the mutual relationship between it and each of the other mixtures. Finally, based on these mutual relationships, we estimate the optimal number of mixtures that are statistically independent. The effectiveness of the proposed method is shown by an experiment on artificial data. We also performed speaker identification with the proposed method and compared it with other approaches.
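
The idea of growing the number of mixtures one at a time and using the resulting clusters to initialize EM can be sketched roughly as follows. This is a minimal illustration on 1-D data, not the authors' implementation. In particular, the abstract does not give the formula for the "mutual relationship" between mixtures, so the stopping rule below (stop when an extra cluster no longer reduces the k-means distortion by a given fraction) is a stand-in assumption, and the names `incremental_kmeans_init`, `max_k`, and `min_gain` are hypothetical.

```python
import random


def assign(data, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for x in data:
        j = min(range(len(centers)), key=lambda i: (x - centers[i]) ** 2)
        clusters[j].append(x)
    return clusters


def kmeans_1d(data, centers, iters=50):
    """Plain Lloyd iterations on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(iters):
        clusters = assign(data, centers)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(data, centers)


def cluster_sse(clusters, centers):
    """Sum of squared errors of each cluster around its center."""
    return [sum((x - c) ** 2 for x in cl) for cl, c in zip(clusters, centers)]


def incremental_kmeans_init(data, max_k=8, min_gain=0.5):
    """Add clusters one by one; return (weights, means, variances) for EM.

    ASSUMPTION: the stopping rule is a relative-distortion threshold, used
    here in place of the paper's mutual-relationship criterion.
    """
    centers, clusters = kmeans_1d(data, [sum(data) / len(data)])
    cost = sum(cluster_sse(clusters, centers))
    while len(centers) < max_k:
        sse = cluster_sse(clusters, centers)
        j = max(range(len(centers)), key=lambda i: sse[i])
        spread = (sse[j] / max(len(clusters[j]), 1)) ** 0.5
        # Split the worst cluster into two candidate centers.
        trial = (centers[:j] + [centers[j] - spread, centers[j] + spread]
                 + centers[j + 1:])
        t_centers, t_clusters = kmeans_1d(data, trial)
        t_cost = sum(cluster_sse(t_clusters, t_centers))
        if cost - t_cost <= min_gain * cost:
            break  # the extra mixture does not help enough; keep current k
        centers, clusters, cost = t_centers, t_clusters, t_cost
    n = len(data)
    weights = [len(cl) / n for cl in clusters]
    variances = [max(s / max(len(cl), 1), 1e-6)
                 for s, cl in zip(cluster_sse(clusters, centers), clusters)]
    return weights, centers, variances


# Synthetic demo: two well-separated 1-D Gaussians.
random.seed(0)
demo = ([random.gauss(0.0, 1.0) for _ in range(200)]
        + [random.gauss(10.0, 1.0) for _ in range(200)])
weights, means, variances = incremental_kmeans_init(demo)
print(len(means), means)
```

The returned weights, means, and variances would then serve as the starting point for ordinary EM iterations, which is where the dependence on initialization mentioned in the abstract comes in.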

α-feature map scaling for raw waveform speaker verification (α-특징 지도 스케일링을 이용한 원시파형 화자 인증)

  • Jung, Jee-weon;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.441-446 / 2020
  • In this paper, we propose the α-Feature Map Scaling (α-FMS) method, which extends the FMS method designed to enhance the discriminative power of the feature maps of deep neural networks in Speaker Verification (SV) systems. The FMS derives a scale vector from a feature map and then adds it to the features, multiplies the features by it, or applies both operations sequentially. However, the FMS method not only uses an identical scale vector for both addition and multiplication, but is also limited to adding a value between zero and one in the case of addition. In this study, to overcome these limitations, we propose α-FMS, which adds a trainable parameter α to the feature map element-wise and then multiplies the result by a scale vector. We compare the performance of two variants: one where α is a scalar, and the other where it is a vector. Both α-FMS methods are applied after each residual block of the deep neural network. The proposed systems using the α-FMS methods are trained using RawNet2 and tested on the VoxCeleb1 evaluation set. The results show equal error rates of 2.47 % and 2.31 % for the two α-FMS methods, respectively.
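
The operation described above, adding α and then multiplying by a scale vector, can be written out in a few lines. The sketch below is only an illustration in plain Python: the abstract does not say how the scale vector is derived, so the average pooling over time followed by a linear layer and a sigmoid is an assumption, and the names `alpha_fms`, `scale_vector`, `weight`, and `bias` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale_vector(fmap, weight, bias):
    """One scale value in (0, 1) per channel.

    ASSUMPTION: time-average each channel, apply a linear layer, then a
    sigmoid. fmap is a list of channels, each a list of time steps.
    """
    pooled = [sum(ch) / len(ch) for ch in fmap]
    return [sigmoid(sum(w * p for w, p in zip(row, pooled)) + b)
            for row, b in zip(weight, bias)]


def alpha_fms(fmap, alpha, weight, bias):
    """Compute (x + alpha) * s for every element of the feature map.

    alpha may be a scalar (shared by all channels) or a per-channel list,
    matching the two variants compared in the abstract.
    """
    if isinstance(alpha, (int, float)):
        alphas = [float(alpha)] * len(fmap)
    else:
        alphas = list(alpha)
    s = scale_vector(fmap, weight, bias)
    return [[(x + a) * sc for x in ch] for ch, a, sc in zip(fmap, alphas, s)]


# Toy feature map with 2 channels and 2 time steps; zero weights make s = 0.5.
fmap = [[1.0, 3.0], [0.0, 0.0]]
zeros_w = [[0.0, 0.0], [0.0, 0.0]]
zeros_b = [0.0, 0.0]
print(alpha_fms(fmap, 1.0, zeros_w, zeros_b))         # scalar alpha
print(alpha_fms(fmap, [0.0, 2.0], zeros_w, zeros_b))  # vector alpha
```

In a real network α, `weight`, and `bias` would be trained jointly with the other layers; here they are fixed values chosen so the arithmetic is easy to follow. Because α is unconstrained, the additive term is no longer restricted to the range between zero and one.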

Identification of the Sectional Distribution of Sound Source in a Wide Duct (넓은 덕트 단면내의 음원 분포 규명)

  • Heo, Yong-Ho;Ih, Jeong-Guon
    • The Journal of the Acoustical Society of Korea / v.33 no.2 / pp.87-93 / 2014
  • If the detailed distribution of pressure and axial velocity at a source plane is identified, the position and strength of the major noise sources become known, and the propagation characteristics in the axial direction can be understood well enough to be used for low-noise design. Conventional techniques are usually limited to assuming constant source characteristics over the whole source surface, so the source activity cannot be known in detail. In this work, a method to estimate the pressure and velocity field distribution on the source surface with high spatial resolution is studied. A matrix formulation including the evanescent modes is given, and a nearfield measurement method is proposed. A validation experiment is conducted on a wide duct system in which part of the source plane is excited by an acoustic driver in the absence of airflow. As the number of evanescent modes increases, the prediction of the pressure spectrum becomes more precise, and the error is less than -25 dB with 26 converged evanescent modes within the Helmholtz number range of interest. By using the converged modal amplitudes, the source parameter distribution is restored, and the position of the driver is clearly identified at kR = 1. By applying a regularization technique to the restored result, the unphysical minor peaks at the source plane can be effectively suppressed by filtering the over-estimated pure radial modes.

Performance analysis of weakly-supervised sound event detection system based on the mean-teacher convolutional recurrent neural network model (평균-교사 합성곱 순환 신경망 모델을 이용한 약지도 음향 이벤트 검출 시스템의 성능 분석)

  • Lee, Seokjin
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.139-147 / 2021
  • This paper introduces and implements a Sound Event Detection (SED) system based on weakly-supervised learning, where only part of the data is labeled, and analyzes the effect of its parameters. The SED system estimates the classes and onset/offset times of events in the acoustic signal. In order to train the model, all information on the event class and onset/offset times must be provided. Unfortunately, the onset/offset times are hard to label exactly. Therefore, in the weakly-supervised task, the SED model is trained with "strongly labeled data" including the event class and activations, "weakly labeled data" including only the event class, and "unlabeled data" without any label. Recently, SED systems using the mean-teacher model, which has several parameters, have been widely used for this task. These parameters should be chosen carefully because they may affect the performance. In this paper, we analyze the performance with respect to parameters such as the feature, the moving average parameter, the weight of the consistency cost function, the ramp-up length, and the maximum learning rate, using the data of DCASE 2020 Task 4. The effects and optimal values of the parameters are discussed.
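
Three of the parameters named above (the moving average parameter, the weight of the consistency cost, and the ramp-up length) map directly onto a few lines of code in a mean-teacher setup. The sketch below is a minimal illustration, not the paper's system. The sigmoid-shaped ramp-up `exp(-5 (1 - t/T)^2)` and the mean-squared-error consistency cost are common choices in mean-teacher implementations and are assumptions here, and all function and parameter names are hypothetical.

```python
import math


def ema_update(teacher, student, decay):
    """Teacher weights follow an exponential moving average of the student.

    `decay` plays the role of the moving average parameter.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_weight(step, ramp_up_length, max_weight):
    """Weight of the consistency cost, ramped up from near zero.

    ASSUMPTION: sigmoid-shaped ramp-up exp(-5 * (1 - step / length) ** 2).
    """
    if ramp_up_length <= 0:
        return max_weight
    phase = min(max(step / ramp_up_length, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - phase) ** 2)


def consistency_cost(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def total_loss(class_loss, student_probs, teacher_probs, step,
               ramp_up_length, max_weight):
    """Supervised loss plus the weighted consistency term."""
    w = consistency_weight(step, ramp_up_length, max_weight)
    return class_loss + w * consistency_cost(student_probs, teacher_probs)


# Toy usage with made-up numbers.
teacher = ema_update([0.0, 1.0], [1.0, 1.0], decay=0.999)
for step in (0, 50, 100):
    print(step, consistency_weight(step, ramp_up_length=100, max_weight=2.0))
```

The ramp-up keeps the consistency term small early in training, when the teacher's predictions are still unreliable, which is one reason the ramp-up length and the maximum weight interact with the learning rate.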

Feasibility of hearing aid gain self-adjustment using speech recognition (말소리 인지를 이용한 보청기 이득 자가 조절의 실현)

  • Yun, Donghyeon;Shen, Yi;Zhang, Zhuohuang
    • The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.76-86 / 2022
  • Personal hearing devices, such as hearing aids, may be fine-tuned by allowing users to conduct self-adjustment. Two self-adjustment procedures were developed to collect listener-preferred gains in six octave-frequency bands from 0.25 kHz to 8 kHz. These procedures were designed to allow rapid exploration of a multi-dimensional parameter space using a simple, one-dimensional user control interface (i.e., a programmable knob). The two procedures differ in whether the user interface controls the gains in all frequency bands simultaneously (Procedure A) or only the gain in one frequency band (Procedure B) on a given trial. Monte-Carlo simulations suggested that, for both procedures, the gain preference identified by simulated listeners rapidly converged to the ground-truth preferred gain profile over the first 20 trials. Initial behavioral evaluations of the self-adjustment procedures, in terms of test-retest reliability, were conducted with 20 young, normal-hearing listeners. Each estimate of the preferred gain profile took less than 20 minutes. The deviation between two separate estimates of the preferred gain profile, conducted at least a week apart, was about 10 dB to 15 dB.

Acoustic outputs from clinical ballistic extracorporeal shock wave therapeutic devices (임상에서 사용중인 탄도형 체외충격파 치료기의 음향 출력)

  • Cho, Jin Sik;Kwon, Oh Bin;Jeon, Sung Joung;Lee, Min Young;Kim, Jong Min;Choi, Min Joo
    • The Journal of the Acoustical Society of Korea / v.41 no.5 / pp.570-588 / 2022
  • We scrutinized the acoustic outputs from the 70 shock wave generators of the 15 product models whose technical documents were available, among the 46 ballistic extracorporeal shock wave therapeutic devices of 11 domestic and 6 foreign manufacturers approved by the Ministry of Food & Drug Safety (Rep. Korea). We found that the acoustic Energy Flux Density (EFD), the most popular exposure parameter, differed by a factor of up to 563.64 among shock wave generators at their minimum output settings and by a factor of up to 74.62 at their maximum settings. Within the same product model, the EFD varied depending on the shock wave transmitter by a factor of up to 81.82 at the minimum output setting and up to 46.15 at the maximum setting. The lowest EFD at the maximum output settings, 0.013 mJ/mm², was much lower (2.1 %) than the highest value at the minimum settings, 0.62 mJ/mm². The large differences in acoustic output (tens to hundreds of times) among therapeutic devices approved for the same clinical indications imply that their therapeutic efficacy and safety may not be assured. The findings suggest that the regulatory authority should revise its guidelines to give clearer criteria for clinical approval and equivalence in performance, and should initiate post-approval surveillance as well as conformance testing between the data in the technical documents and the acoustic outputs actually used clinically.
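
The 2.1 % figure can be checked directly from the two EFD values quoted in the abstract. The short sketch below also shows how a "differs by N times" figure is obtained as a ratio of the largest to the smallest value. The helper name `fold_difference` and the list of sample EFD values passed to it are hypothetical, since the per-device data are not given here.

```python
def fold_difference(values):
    """Largest value divided by the smallest: how many 'times' they differ."""
    return max(values) / min(values)


# Values quoted in the abstract, in mJ/mm^2.
lowest_efd_at_max_setting = 0.013
highest_efd_at_min_setting = 0.62

percent = 100.0 * lowest_efd_at_max_setting / highest_efd_at_min_setting
print(f"{percent:.1f} %")  # 2.1 %

# Hypothetical EFD readings (mJ/mm^2), for illustration only.
sample_efd = [0.02, 0.1, 0.5]
print(fold_difference(sample_efd))
```

In other words, at least one device at its maximum setting delivers only about one fiftieth of what another device delivers at its minimum setting.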

Underwater acoustic communication performance in reverberant water tank (잔향음 우세 수조 환경에서의 수중음향 통신성능 분석)

  • Choi, Kang-Hoon;Hwang, In-Seong;Lee, Sangkug;Choi, Jee Woong
    • The Journal of the Acoustical Society of Korea / v.41 no.2 / pp.184-191 / 2022
  • Underwater acoustic waves in shallow water propagate along multiple paths with a large delay spread, which causes Inter-Symbol Interference (ISI) and degrades the performance of the communication system. In order to analyze the communication performance and investigate its correlation with the multipath delay spread in a reverberant environment, an underwater acoustic communication experiment using Binary Phase-Shift Keying (BPSK) signals with symbol rates from 100 sym/s to 8000 sym/s was conducted in a 5 × 5 × 5 m³ water tank. The acoustic channels in the well-controlled tank environment showed a dense multipath delay spread due to multiple reflections from the interfaces and walls of the tank, with a maximum excess delay of 40 ms or less and a Root Mean Squared (RMS) delay spread of 8 ms or less. In this paper, the Bit Error Rate (BER) and output Signal-to-Noise Ratio (SNR) were analyzed using four types of communication demodulation techniques. We also define the parameter Symbol interval to Delay spread Ratio in reverberant environment (SDRrev), which is the ratio of the symbol interval to the RMS delay spread in the reverberant environment. Finally, the SDRrev was compared with the BER and the output SNR. The results present the reference symbol rate at which high communication performance can be guaranteed.
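
Since SDRrev is defined as the symbol interval divided by the RMS delay spread, it can be computed in a couple of lines. The sketch below is an illustration under stated assumptions: the RMS delay spread is taken as the power-weighted standard deviation of the path delays (the usual textbook definition, not confirmed by the abstract), the 8 ms value is the upper bound quoted above rather than a measured per-channel value, and the function names are hypothetical.

```python
import math


def rms_delay_spread(delays_s, powers):
    """Power-weighted standard deviation of the path delays, in seconds.

    ASSUMPTION: standard definition sqrt(E[tau^2] - E[tau]^2) with the
    path powers as weights.
    """
    total = sum(powers)
    mean = sum(p * t for p, t in zip(powers, delays_s)) / total
    second = sum(p * t * t for p, t in zip(powers, delays_s)) / total
    return math.sqrt(max(second - mean * mean, 0.0))


def sdr_rev(symbol_rate, rms_spread_s):
    """Symbol interval (1 / symbol rate) divided by the RMS delay spread."""
    return (1.0 / symbol_rate) / rms_spread_s


# Two equal-power paths 2 ms apart give an RMS delay spread of 1 ms.
print(rms_delay_spread([0.0, 0.002], [1.0, 1.0]))

# Lowest and highest symbol rates from the abstract with an 8 ms spread.
for rate in (100, 8000):
    print(rate, sdr_rev(rate, 0.008))
```

With an 8 ms spread, the ratio is above one only at the lowest symbol rate (a 10 ms symbol interval); at 8000 sym/s each symbol is far shorter than the channel's delay spread, which is the regime where ISI dominates.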

Measurement of the Plane Wave Reflection Coefficient for the Saturated Granular Medium in the Water Tank and Comparison to Predictions by the Biot Theory (수조에서 입자 매질의 평면파 반사계수 측정과 Biot 이론에 의한 예측)

  • Lee Keun-Hwa
    • The Journal of the Acoustical Society of Korea / v.25 no.5 / pp.246-256 / 2006
  • The plane wave reflection coefficient is an acoustic property containing all the information concerning the ocean bottom and can be used as an input parameter to various acoustic propagation models. In this paper, we measure the plane wave reflection coefficient, the sound speed, and the attenuation of a saturated granular medium in a water tank. Three kinds of glass beads and natural sand are used as the granular medium. The reflection experiment is performed with sinusoidal tone bursts of 100 kHz at incident angles from 28 to 53 degrees, and the sound speed and attenuation experiments are performed with the same signal. From the measured reflection signal, the reflection coefficient is calculated with the self-calibration method, and the experimental uncertainties are discussed. The sound speed and attenuation measurements are used to estimate the porosity and permeability, the main Biot parameters. The estimated values are compared with the directly measured values and used as inputs to the Biot theory in order to calculate the theoretical reflection coefficient. Finally, the reflection coefficient predicted by the Biot theory is compared with the measured reflection coefficient, and their characteristics are discussed.

A Performance Improvement Method using Variable Break in Corpus Based Japanese Text-to-Speech System (가변 Break를 이용한 코퍼스 기반 일본어 음성 합성기의 성능 향상 방법)

  • Na, Deok-Su;Min, So-Yeon;Lee, Jong-Seok;Bae, Myung-Jin
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.155-163 / 2009
  • In text-to-speech systems, the conversion of text into prosodic parameters is composed of three steps: the placement of prosodic boundaries, the determination of segmental durations, and the specification of fundamental frequency contours. Prosodic boundaries, as the most important and basic parameter, affect the estimation of durations and fundamental frequency. Break prediction is an important step in text-to-speech systems, as break indices (BIs) have a great influence on how correctly prosodic phrase boundaries are represented. However, accurate prediction is difficult since BIs are often chosen according to the meaning of a sentence or the reading style of the speaker. In Japanese, the prediction of an accentual phrase boundary (APB) and a major phrase boundary (MPB) is particularly difficult. Thus, this paper presents a method to compensate for the prediction errors of APBs and MPBs. First, we define a subtle BI, for which it is difficult to decide clearly between an APB and an MPB, as a variable break (VB), and an explicit BI as a fixed break (FB). The VB is chosen using a classification and regression tree, and multiple prosodic targets in relation to the pitch and duration are then generated. Finally, unit selection is conducted using the multiple prosodic targets. In the MOS test, the original speech scored 4.99, while the proposed method scored 4.25 and the conventional method scored 4.01. The experimental results show that the proposed method improves the naturalness of synthesized speech.

Speech Reinforcement Based on G.729A Speech Codec Parameter Under Near-End Background Noise Environments (근단 배경 잡음 환경에서 G.729A 음성부호화기 파라미터에 기반한 새로운 음성 강화 기법)

  • Choi, Jae-Hun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea / v.28 no.4 / pp.392-400 / 2009
  • In this paper, we propose an effective speech reinforcement technique based on the ITU-T G.729A CS-ACELP codec under near-end background noise environments. In general, since the intelligibility of far-end speech for the near-end listener is significantly reduced under near-end noise, a far-end speech reinforcement approach is required to avoid this phenomenon. In contrast to the conventional speech reinforcement algorithm, we reinforce the excitation signal among the codec parameters received from the far-end speech signal, based on the G.729A speech codec, under various background noise environments. Specifically, we first estimate the excitation signal of the ambient noise at the near end through the encoder of the G.729A speech codec and then reinforce the excitation signal of the speech transmitted from the far end. In particular, we propose a novel approach that directly reinforces the excitation signal of the far-end speech signal based on the decoder of the G.729A. The performance of the proposed algorithm is evaluated by the Comparison Category Rating (CCR) test, a method for subjective determination of transmission quality in ITU-T P.800, under various noise environments, and the algorithm performs better than conventional SNR recovery methods.