• Title/Summary/Keyword: 음향음성학 (Acoustic Phonetics)

Search Result 749, Processing Time 0.023 seconds

Wiener filtering-based ambient noise reduction technique for improved acoustic target detection of directional frequency analysis and recording sonobuoy (Directional frequency analysis and recording 소노부이의 표적 탐지 성능 향상을 위한 위너필터링 기반 주변 소음 제거 기법)

  • Hong, Jungpyo;Bae, Inyeong;Seok, Jongwon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.2
    • /
    • pp.192-198
    • /
    • 2022
  • As an effective weapon system for anti-submarine warfare, the DIrectional Frequency Analysis and Recording (DIFAR) sonobuoy detects underwater targets via beamforming with three channels, composed of one omni-directional and two directional channels. However, ambient noise degrades the detection performance of the DIFAR sonobuoy in specific directions (0°, 90°, 180°, 270°). Thus, an ambient noise reduction technique is proposed to improve the acoustic target detection performance of the DIFAR sonobuoy. The proposed method is based on the Order Truncate Average (OTA), widely used in sonar signal processing, for ambient noise estimation, and on Wiener filtering, widely used in speech signal processing, for noise reduction. For evaluation, we compare the mean square errors of the target bearing estimates of the conventional and proposed methods, and we confirm that the proposed method is effective under a 0 dB signal-to-noise ratio.
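The two-stage pipeline this abstract describes (OTA for ambient noise estimation, then Wiener filtering for suppression) can be sketched in NumPy. This is an illustrative sketch under assumed parameters (window length, truncation ratio, gain floor), not the authors' implementation:

```python
import numpy as np

def ota_noise_estimate(power_spec, win=9, keep=0.6):
    """Order Truncate Average: for each frequency bin, sort the values in a
    sliding window across frequency, discard the largest (likely signal)
    values, and average the rest as the ambient-noise estimate."""
    n = len(power_spec)
    half = win // 2
    noise = np.empty(n)
    for k in range(n):
        seg = power_spec[max(0, k - half):min(n, k + half + 1)]
        kept = np.sort(seg)[:max(1, int(len(seg) * keep))]
        noise[k] = kept.mean()
    return noise

def wiener_gain(power_spec, noise_est, floor=1e-3):
    """Wiener filter gain G = max(1 - N/X, floor), applied per bin."""
    return np.maximum(1.0 - noise_est / np.maximum(power_spec, 1e-12), floor)
```

The gain would be applied per bin to the power spectrum of each beamformed channel before bearing estimation; a narrowband target peak keeps a gain near 1 while noise-dominated bins are attenuated toward the floor.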

Non-Stationary/Mixed Noise Estimation Algorithm Based on Minimum Statistics and Codebook Driven Short-Term Predictor Parameter Estimation (최소 통계법과 Short-Term 예측계수 코드북을 이용한 Non-Stationary/Mixed 배경잡음 추정 기법)

  • Lee, Myeong-Seok;Noh, Myung-Hoon;Park, Sung-Joo;Lee, Seok-Pil;Kim, Moo-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.3
    • /
    • pp.200-208
    • /
    • 2010
  • In this work, the minimum statistics (MS) algorithm is combined with codebook-driven short-term predictor parameter estimation (CDSTP) to design a speech enhancement algorithm that is robust against various background noise environments. The MS algorithm performs well for stationary noise but relatively poorly for non-stationary noise. CDSTP works efficiently for non-stationary noise, but not for noise that was not considered in the training stage. Thus, we propose to combine CDSTP and MS. Compared with the single use of MS or CDSTP, the proposed method produces a better perceptual evaluation of speech quality (PESQ) score, and it works especially well for background noise that mixes stationary and non-stationary components.
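The minimum-statistics half of the combination can be sketched as follows. This is a simplified illustration: the full MS algorithm also uses adaptive smoothing and bias compensation, and the CDSTP codebook search is not shown; the smoothing factor and window length are assumed values:

```python
import numpy as np

def minimum_statistics(power_frames, alpha=0.9, win=50):
    """Simplified minimum-statistics noise tracking: recursively smooth the
    power spectrogram over time, then take the running minimum over the
    last `win` frames as the noise estimate for each frequency bin.
    power_frames has shape (frames, bins)."""
    T, F = power_frames.shape
    smoothed = np.empty_like(power_frames)
    smoothed[0] = power_frames[0]
    for t in range(1, T):
        smoothed[t] = alpha * smoothed[t - 1] + (1 - alpha) * power_frames[t]
    noise = np.empty_like(power_frames)
    for t in range(T):
        noise[t] = smoothed[max(0, t - win + 1):t + 1].min(axis=0)
    return noise
```

Because speech energy fluctuates while stationary noise does not, the windowed minimum of the smoothed power tracks the noise floor and largely ignores short bursts.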

Lip-Synch System Optimization Using Class Dependent SCHMM (클래스 종속 반연속 HMM을 이용한 립싱크 시스템 최적화)

  • Lee, Sung-Hee;Park, Jun-Ho;Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.7
    • /
    • pp.312-318
    • /
    • 2006
  • The conventional lip-synch system has a two-step process: speech segmentation and recognition. However, the difficulty of the speech segmentation procedure and the inaccuracy of the training data set due to segmentation lead to significant performance degradation in the system. To cope with this, a connected vowel recognition method using the Head-Body-Tail (HBT) model is proposed. The HBT model, which is appropriate for handling relatively small vocabulary tasks, reflects the co-articulation effect efficiently. Moreover, the seven vowels are merged into three classes having similar lip shapes, and the system is optimized by employing a class-dependent SCHMM structure. Additionally, at both end sides of each word, which have large variations, an 8-component Gaussian mixture model is used directly to improve the representation ability. Though the proposed method shows performance similar to the CHMM based on the HBT structure, the number of parameters is reduced by 33.92%. This reduction makes it a computationally efficient method enabling real-time operation.

Design of the Noise Suppressor Using the Perceptual Model and Wavelet Packet Transform (인지 모델과 웨이블릿 패킷 변환을 이용한 잡음 제거기 설계)

  • Kim, Mi-Seon;Park, Seo-Young;Kim, Young-Ju;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.7
    • /
    • pp.325-332
    • /
    • 2006
  • In this paper, we propose a noise suppressor using a perceptual model and the wavelet packet transform. The objective is to enhance speech corrupted by colored or non-stationary noise. If the corrupting noise is colored, a subband approach is more efficient than a whole-band one. To avoid serious residual noise and speech distortion, the Wavelet Coefficient Threshold (WCT) must be adjusted. In this paper, the subbands are designed to match the critical bands, and the WCT is adapted using the noise masking threshold (NMT) and the segmental signal-to-noise ratio (seg_SNR). Consequently, the proposed method performs similarly to EVRC in PESQ-MOS, and it outperforms the wavelet packet transform with a universal threshold by about 0.289 in PESQ-MOS. Notably, it is more useful than EVRC for coded speech, where its PESQ-MOS is higher than EVRC's by about 0.23.
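The core mechanism (wavelet packet decomposition plus coefficient thresholding) can be sketched with a Haar wavelet packet in pure NumPy. This is an assumption-laden sketch: the paper adapts per-subband thresholds to the NMT and seg_SNR and matches subbands to critical bands, whereas here a single global soft threshold and a uniform packet tree are used for illustration:

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: approximation and detail halves."""
    s2 = np.sqrt(2.0)
    return (x[0::2] + x[1::2]) / s2, (x[0::2] - x[1::2]) / s2

def haar_merge(a, d):
    """Inverse of haar_split."""
    s2 = np.sqrt(2.0)
    x = np.empty(a.size * 2)
    x[0::2] = (a + d) / s2
    x[1::2] = (a - d) / s2
    return x

def wp_decompose(x, depth):
    """Full wavelet packet tree: split BOTH branches down to `depth`."""
    nodes = [x]
    for _ in range(depth):
        nxt = []
        for n in nodes:
            a, d = haar_split(n)
            nxt += [a, d]
        nodes = nxt
    return nodes

def wp_reconstruct(nodes):
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]

def soft_threshold(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise(x, depth=3, thresh=0.5):
    """Shrink every subband except the pure-approximation leaf."""
    leaves = wp_decompose(x, depth)
    leaves = [leaves[0]] + [soft_threshold(c, thresh) for c in leaves[1:]]
    return wp_reconstruct(leaves)
```

The input length must be divisible by 2**depth; in the paper the threshold would differ per subband according to the perceptual model.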

A Unit Selection Methods using Flexible Break in a Japanese TTS (일본어 합성기에서 유동 Break를 이용한 합성단위 선택 방법)

  • Song, Young-Hwan;Na, Deok-Su;Kim, Jong-Kuk;Bae, Myung-Jin;Lee, Jong-Seok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.8
    • /
    • pp.403-408
    • /
    • 2007
  • In a large corpus-based speech synthesizer, a break, which is a parameter influencing naturalness and intelligibility, is used as an important feature during the unit selection process. Japanese is a language having intonations that are indicated by relative differences in pitch heights; accentual phrases (APs) are placed according to changes of the accents, and a break occurs on a boundary of the APs. Although a break can be predicted by using J-ToBI (Japanese-Tones and Break Indices), which is a rule-based or statistical approach, it is very difficult to predict a break exactly due to its flexibility. Therefore, in this paper, a method is proposed that conducts a unit search by dividing breaks into two types, a fixed break and a flexible break, in order to use the advantages of a large-scale corpus that includes various types of prosody. Experimental results show that the proposed unit selection method enhanced the naturalness of the synthesized speech.
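The unit search underlying such a system is a Viterbi-style dynamic program over candidate units. The sketch below shows the generic search only; in the paper, the fixed/flexible break distinction would shape the candidate sets and costs, which is not modeled here, and the `target_cost`/`concat_cost` callables are hypothetical stand-ins:

```python
def unit_select(candidates, target_cost, concat_cost):
    """Viterbi unit selection: candidates[t] lists the database units for
    slot t; pick one unit per slot minimizing the total of target costs
    (fit to the specification) and concatenation costs (smooth joins)."""
    T = len(candidates)
    cost = [[target_cost(t, u) for u in candidates[t]] for t in range(T)]
    back = [[0] * len(candidates[t]) for t in range(T)]
    for t in range(1, T):
        for j, u in enumerate(candidates[t]):
            joins = [cost[t - 1][i] + concat_cost(p, u)
                     for i, p in enumerate(candidates[t - 1])]
            i = min(range(len(joins)), key=joins.__getitem__)
            cost[t][j] += joins[i]
            back[t][j] = i
    # backtrace from the cheapest final unit
    j = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return [candidates[t][path[t]] for t in range(T)]
```

With numeric stand-ins for units, a target cost of distance-to-goal and a small join penalty recover the intuitively best sequence.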

Method of Harmonic Magnitude Quantization for Harmonic Coder Using the Straight Line and DCT (Discrete Cosine Transform) (하모닉 코더를 위한 직선과 이산코사인변환 (DCT)을 이용한 하모닉 크기값 (Magnitude) 양자화 기법)

  • Choi, Ji-Wook;Jeong, Gyu-Hyeok;Lee, In-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.4
    • /
    • pp.200-206
    • /
    • 2008
  • This paper presents a quantization method that extracts quantization parameters using straight lines and the DCT (Discrete Cosine Transform) for two split frequency bands. As the number of harmonics varies from frame to frame, the harmonics in the low-frequency band are oversampled to fix the dimension; straight lines represent the spectral envelope, and the discontinuous points of the straight lines in the low band are sent to the quantizer. Thus, extracting quantization parameters with straight lines provides a fixed dimension. Harmonics in the high-frequency band use a variable DCT to obtain quantization parameters, and this paper proposes a quantization method that combines the straight lines with the DCT. The proposed method is evaluated with the spectral distortion (SD) of the spectral magnitudes. As a result, the proposed quantization method improves SD by 0.3 dB over HVXC.
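The split-band idea (straight-line envelope points for the low band, truncated DCT coefficients for the high band) can be sketched as follows. The breakpoint count, coefficient count, and uniform breakpoint spacing are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix, so that D @ D.T == I."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    D = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    D[0] /= np.sqrt(2.0)
    return D

def quantize_low(mag, n_points):
    """Low band: keep magnitudes at n_points breakpoints; the envelope
    between them is represented by straight-line segments."""
    idx = np.linspace(0, mag.size - 1, n_points).round().astype(int)
    return idx, mag[idx]

def dequantize_low(idx, vals, size):
    """Rebuild the low-band envelope by linear interpolation."""
    return np.interp(np.arange(size), idx, vals)

def quantize_high(mag, n_coef):
    """High band: keep only the first n_coef DCT coefficients."""
    return (dct_matrix(mag.size) @ mag)[:n_coef]

def dequantize_high(coef, size):
    full = np.zeros(size)
    full[:coef.size] = coef
    return dct_matrix(size).T @ full
```

Both branches yield fixed-dimension parameter vectors regardless of the per-frame harmonic count, which is the point of the scheme.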

Comparison of the Voice Outcome After Injection Laryngoplasty: Unilateral Vocal Fold Paralysis Due to Cancer Nerve Invasion and Iatrogenic Injury (성대주입술 후 음향학적 분석결과 비교: 암의 신경 침윤으로 인한 일측성 성대마비 환자와 수술 후 발생한 일측성 성대마비 환자)

  • Yongmin, Cho;Hyunseok, Choi;Kyoung Ho, Oh;Seung-Kuk, Baek;Jeong-Soo, Woo;Soon Young, Kwon;Kwang-Yoon, Jung;Jae-Gu, Cho
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics
    • /
    • v.33 no.3
    • /
    • pp.172-178
    • /
    • 2022
  • Background and Objectives: Injection laryngoplasty is a common method for the treatment of unilateral vocal fold paralysis. Unilateral vocal fold paralysis has various causes, including idiopathic causes, infection, stroke, neurologic conditions, surgery, and nerve invasion by cancer. To the knowledge of the authors, there has been no study on the relationship between the cause of vocal cord paralysis and the outcome of injection laryngoplasty. Therefore, we investigated the difference in the outcomes of injection laryngoplasty between a vocal cord paralysis after surgery group and a nerve invasion by cancer group. Materials and Method: A retrospective analysis was performed for 24 patients who underwent vocal cord injection due to unilateral vocal cord paralysis caused by surgery or nerve invasion by cancer. The objective quality of the voice was assessed by acoustic voice analysis with the Multi-Dimensional Voice Program. Results: Both groups showed an improvement in fundamental frequency (F0), jitter (percent), shimmer (percent), and noise-to-harmonic ratio (NHR) after injection laryngoplasty. The nerve invasion group showed more improvement in both the mean and median values of F0, shimmer (percent), and NHR than the surgery group, but the difference was not statistically significant. Conclusion: Our study did not show a statistically significant difference in outcome between the cancer invasion group and the surgery group, but a statistical tendency was suggested: the nerve invasion group showed more improvement in both the mean and median values of the acoustic voice analysis than the surgery group.
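For readers unfamiliar with the acoustic measures compared here, jitter and shimmer have simple local definitions. The sketch below shows those basic local variants only; the Multi-Dimensional Voice Program computes these and many related parameters from extracted pitch periods, which is beyond this illustration:

```python
import numpy as np

def jitter_percent(periods):
    """Local jitter: mean absolute difference between consecutive pitch
    periods, as a percentage of the mean period."""
    periods = np.asarray(periods, float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()

def shimmer_percent(amps):
    """Local shimmer: mean absolute difference between consecutive cycle
    peak amplitudes, as a percentage of the mean amplitude."""
    amps = np.asarray(amps, float)
    return 100.0 * np.mean(np.abs(np.diff(amps))) / amps.mean()
```

A perfectly periodic voice gives zero jitter and shimmer; higher values reflect the cycle-to-cycle instability that injection laryngoplasty aims to reduce.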

Diagnosis and Evaluation of Humanities Therapy: The Phonetic Analysis of Speech Rates and Fundamental Frequency According to Preferred Sensation Type (인문치료의 진단 및 평가: 감각유형에 따른 말속도와 기본주파수의 실험음성학적 분석)

  • Lee, Chan-Jong;Heo, Yun-Ju
    • The Journal of the Acoustical Society of Korea
    • /
    • v.30 no.4
    • /
    • pp.231-237
    • /
    • 2011
  • The purpose of this study is to examine the correlation between the preferred sensation type and speech sounds, especially $F_0$ and speech rate. Data for the sensation types and speech sounds were collected from 36 undergraduate and graduate students (17 male, 19 female). Subjects were asked to read a given text (400 syllables), describe a drawing, and give answers to some questions. We measured the speakers' $F_0$ and speech rates. The results show that type V (Visual) correlates with speech rate when type D (Digital) is ruled out, and type A (Auditory) correlates with speech rate when type D is included. Furthermore, the analysis of the mean values of V, A, K (Visual, Auditory, Kinesthetic) indicates that type V is characterized by faster speech rates and higher $F_0$ in all parts except for the interview, and the same is true for the analysis of V, A, K, D (Visual, Auditory, Kinesthetic, Digital) in all parts. In conclusion, this study shows that the preferred sensation type correlates with $F_0$ and speech rate. Based on these results, $F_0$ and speech rate can be used to analyze sensation types for individualized education as well as consultation. In addition, this study is significant in that it lays a foundation for research on the correlation between a preferred sensation type and speech sounds.

A Speaker Pruning Method for Reducing Calculation Costs of Speaker Identification System (화자식별 시스템의 계산량 감소를 위한 화자 프루닝 방법)

  • 김민정;오세진;정호열;정현열
    • The Journal of the Acoustical Society of Korea
    • /
    • v.22 no.6
    • /
    • pp.457-462
    • /
    • 2003
  • In this paper, we propose a speaker pruning method for real-time processing and improved performance of a speaker identification system based on the GMM (Gaussian Mixture Model). In conventional speaker identification methods, such as ML (Maximum Likelihood), WMR (Weighting Model Rank), and MWMR (Modified WMR), frame likelihoods are calculated using all frames of the input speech and all of the speaker models, and then the speaker with the largest accumulated likelihood is selected. However, in these methods, the calculation cost and processing time grow as the numbers of input frames and speakers increase. To solve this problem, in the proposed method only the speaker models that have higher likelihoods over a subset of the input frames are selected, and the identified speaker is decided by evaluating only those selected speaker models. The method can also maintain identification performance even when the number of speakers changes. In several experiments, the proposed method showed a 65% reduction in calculation cost and a 2% increase in identification rate compared with conventional methods. These results mean that the proposed method can be applied effectively for real-time processing and for performance improvement in speaker identification.
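The two-stage idea (cheap scoring on a few frames to prune candidates, then full scoring on the survivors) can be sketched as follows. For brevity, a single diagonal Gaussian per speaker stands in for a full GMM, and the early-frame count and survivor count are assumed values:

```python
import numpy as np

def log_gauss(x, mean, var):
    """Per-frame log-likelihood under a diagonal Gaussian speaker model
    (illustrative stand-in for a GMM)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var,
                         axis=-1)

def identify_with_pruning(frames, models, n_early=10, keep=3):
    """Stage 1: score ALL models on the first n_early frames and keep the
    top `keep` candidates. Stage 2: finish scoring the remaining frames
    on the survivors only, and return the best speaker."""
    early = frames[:n_early]
    scores = {s: log_gauss(early, m["mean"], m["var"]).sum()
              for s, m in models.items()}
    survivors = sorted(scores, key=scores.get, reverse=True)[:keep]
    rest = frames[n_early:]
    final = {s: scores[s] + log_gauss(rest, models[s]["mean"],
                                      models[s]["var"]).sum()
             for s in survivors}
    return max(final, key=final.get)
```

With S speakers and T frames, full scoring costs O(S·T) model evaluations; pruning reduces this toward O(S·n_early + keep·T).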

EM Algorithm with Initialization Based on Incremental $k$-means for GMM and Its Application to Speaker Identification (GMM을 위한 점진적 $k$-means 알고리즘에 의해 초기값을 갖는 EM알고리즘과 화자식별에의 적용)

  • Seo Changwoo;Hahn Hernsoo;Lee Kiyong;Lee Younjeong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.24 no.3
    • /
    • pp.141-149
    • /
    • 2005
  • In general, a Gaussian mixture model (GMM) is used to estimate the speaker model from speech for speaker identification. The parameter estimates of the GMM are obtained by using the Expectation-Maximization (EM) algorithm for maximum likelihood (ML) estimation. However, the EM algorithm has the drawbacks that it depends heavily on the initialization and that it needs the number of mixtures to be known. In this paper, to solve these problems of the EM algorithm, we propose an EM algorithm for the GMM with initialization based on incremental $k$-means. The proposed method dynamically increases the number of mixtures one by one until it finds the optimum number. Each time a mixture is added, we calculate the mutual relationship between it and each of the other mixtures. Finally, based on these mutual relationships, we estimate the optimal number of mixtures that are statistically independent. The effectiveness of the proposed method is shown by an experiment on artificial data. We also performed speaker identification by applying the proposed method and comparing it with other approaches.
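The incremental growth of the codebook can be sketched as follows. This sketch grows to a fixed `k_max` by splitting the highest-distortion cluster; the paper's mutual-relationship criterion for stopping at the optimal mixture count is not implemented here, and the split perturbation `eps` is an assumed value:

```python
import numpy as np

def lloyd(X, centers, iters=20):
    """Standard Lloyd (k-means) refinement of the given centers."""
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1),
                           axis=1)
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, labels

def incremental_kmeans(X, k_max, eps=1e-3):
    """Grow the codebook one centroid at a time: split the cluster with the
    largest distortion into a perturbed pair, then refine with Lloyd
    iterations. The resulting centers and cluster statistics can seed the
    means, variances, and weights of an EM run for a GMM."""
    centers = X.mean(0, keepdims=True).copy()
    labels = np.zeros(len(X), dtype=int)
    while len(centers) < k_max:
        dists = [((X[labels == j] - centers[j]) ** 2).sum()
                 for j in range(len(centers))]
        j = int(np.argmax(dists))
        centers = np.vstack([centers, centers[j] + eps])
        centers[j] -= eps
        centers, labels = lloyd(X, centers)
    return centers, labels
```

Seeding EM this way avoids the sensitivity to random initialization noted in the abstract, since each intermediate codebook is already a local k-means optimum.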