• Title/Summary/Keyword: Acoustic feature

A Broadband FIR Beamformer for Underwater Acoustic Communications (수중음향통신을 위한 광대역 FIR 빔형성기)

  • Choi, Young-Chol;Lim, Yong-Kon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.12
    • /
    • pp.2151-2156
    • /
    • 2006
  • Beamforming for underwater acoustic communication (UAC) is affected by the broadband nature of the UAC signal, whose carrier frequency is relatively low compared to the signal bandwidth; the narrowband assumption therefore does not hold in UAC. In this paper, we discuss a broadband FIR beamformer for UAC using the baseband equivalent array signal model. We consider the broadband FIR beamformer for QPSK UAC with a carrier frequency of 25 kHz and a symbol rate of 5 kHz. The array geometry is a uniform linear array with 8 omnidirectional elements, and the sensor spacing is half the carrier wavelength. The simulation results show that the broadband FIR beamformer achieves a nearly optimum signal-to-interference-and-noise ratio (SINR) and outperforms the conventional narrowband beamformer by 0.5 dB in SINR when a two-tap FIR filter is employed at each sensor and the inter-tap delay is a quarter of the symbol interval. The broadband FIR beamformer's performance degrades as the FIR filter length is increased beyond a certain value. If the inter-tap delay is not greater than half the symbol period, the SINR performance does not depend on the inter-tap delay. A longer training period is required when the inter-tap delay equals the symbol period.
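
The tapped-delay-line structure described above can be sketched compactly. The following is a minimal illustration, not the paper's implementation: it uses the abstract's parameters (8-element half-wavelength ULA, 2 taps per sensor, QPSK) but a simplified narrowband snapshot model, an assumed sound speed, assumed arrival angles, and a least-squares training rule.

```python
# Minimal sketch of a broadband FIR beamformer in the baseband equivalent
# model; the snapshot model, training rule, and inter-tap delay handling
# are simplifications, not the paper's exact method.
import numpy as np

rng = np.random.default_rng(0)
fc = 25e3                        # carrier frequency (abstract)
M, taps = 8, 2                   # sensors and FIR taps per sensor (abstract)
c = 1500.0                       # assumed sound speed in water (m/s)
d = c / fc / 2                   # half-wavelength element spacing
theta_s, theta_i = 0.0, np.deg2rad(30)   # hypothetical arrival angles

def qpsk(n):
    return (rng.choice([1, -1], n) + 1j * rng.choice([1, -1], n)) / np.sqrt(2)

N = 2000
s, i = qpsk(N), qpsk(N)          # desired and interfering symbol streams
tau = lambda th: np.arange(M) * d * np.sin(th) / c
a = lambda th: np.exp(-2j * np.pi * fc * tau(th))   # steering vector

# Baseband array snapshots: desired signal + interferer + sensor noise
x = np.outer(a(theta_s), s) + np.outer(a(theta_i), i)
x += 0.1 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

# Tapped delay line: stack delayed copies (one sample per tap here; the
# paper's quarter-symbol inter-tap delay would require oversampling)
X = np.vstack([np.roll(x, k, axis=1) for k in range(taps)])

# Train the 2-tap-per-sensor weights by least squares on known symbols
w = np.linalg.lstsq(X.T, s, rcond=None)[0]
y = w @ X
print("residual power after beamforming:", np.mean(np.abs(y - s) ** 2))
```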

Classification of Acoustic Emission Signals for Fatigue Crack Opening and Closure by Artificial Neural Network Based on Principal Component Analysis (주성분 분석과 인공신경망을 이용한 피로균열 열림.닫힘 시 음향방출 신호분류)

  • Kim, Ki-Bok;Yoon, Dong-Jin;Jeong, Jung-Chae;Lee, Seung-Seok
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.22 no.5
    • /
    • pp.532-538
    • /
    • 2002
  • This study classified fatigue crack opening and closure for three kinds of aluminum alloy using principal component analysis (PCA). A fatigue cyclic loading test was conducted to acquire AE signals arising from different source mechanisms such as crack opening and closure, rubbing, and fretting. To extract significant features from the AE signals, correlation analysis was performed. Over 94% of the variance of the AE parameters could be accounted for by the first two principal components. The results of the PCA on the AE parameters showed that the first principal component was associated with the size of the AE signals and the second with their shape. An artificial neural network (ANN) analysis was successfully used to classify the AE signals into six classes. The ANN classifier based on PCA appears to be a promising tool for classifying AE signals from fatigue crack opening and closure.
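
As a rough illustration of the pipeline the abstract describes (PCA on AE parameters, then a neural-network classifier), here is a minimal scikit-learn sketch; the AE parameter values and class labels are synthetic placeholders, not the paper's data.

```python
# PCA feature reduction followed by an ANN classifier, as a hedged sketch
# of the abstract's approach; data below are random placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.standard_normal((600, 8))    # hypothetical AE parameters per hit
y = rng.integers(0, 6, 600)          # six classes, as in the abstract

pca = PCA(n_components=2)            # the first two PCs carried >94% variance
clf = make_pipeline(StandardScaler(), pca,
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000))
clf.fit(X, y)
print("explained variance of PC1, PC2:", pca.explained_variance_ratio_)
```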

Performance Comparison of State-of-the-Art Vocoder Technology Based on Deep Learning in a Korean TTS System (한국어 TTS 시스템에서 딥러닝 기반 최첨단 보코더 기술 성능 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.6 no.2
    • /
    • pp.509-514
    • /
    • 2020
  • A conventional TTS system consists of several modules, including text preprocessing, parsing, grapheme-to-phoneme conversion, boundary analysis, prosody control, acoustic feature generation by an acoustic model, and synthesized speech generation. A deep-learning TTS system, by contrast, is composed of a Text2Mel process that generates a spectrogram from text and a vocoder that synthesizes the speech signal from the spectrogram. In this paper, to construct an optimal Korean TTS system, we apply Tacotron2 to the Text2Mel process and, as vocoders, introduce WaveNet, WaveRNN, and WaveGlow, implementing them to verify and compare their performance. Experimental results show that WaveNet has the highest MOS and its trained model is hundreds of megabytes in size, but its synthesis time is about 50 times real time. WaveRNN shows MOS performance similar to WaveNet's with a model size of several tens of megabytes, but it also cannot run in real time. WaveGlow can handle real-time processing, but its model is several gigabytes in size and its MOS is the worst of the three vocoders. Based on these results, this paper presents reference criteria for selecting the appropriate vocoder according to the hardware environment in which the TTS system is deployed.
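
The split the abstract describes (Text2Mel producing a spectrogram, then a vocoder producing the waveform) can be illustrated without a trained neural model. The sketch below uses librosa's Griffin-Lim mel inversion purely as a model-free stand-in for WaveNet/WaveRNN/WaveGlow, which all require trained weights, and a synthetic tone in place of Tacotron2 output.

```python
# Mel spectrogram -> waveform, with Griffin-Lim standing in for a neural
# vocoder; a TTS system would obtain `mel` from Tacotron2 instead.
import librosa

sr = 22050
y = librosa.tone(220, sr=sr, duration=1.0)                 # placeholder audio
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=80)

# A neural vocoder maps the mel spectrogram to speech samples; Griffin-Lim
# gives a rough approximation without any trained model.
y_hat = librosa.feature.inverse.mel_to_audio(mel, sr=sr)
print(y_hat.shape)
```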

Sound event detection model using self-training based on noisy student model (잡음 학생 모델 기반의 자가 학습을 활용한 음향 사건 검지)

  • Kim, Nam Kyun;Park, Chang-Soo;Kim, Hong Kook;Hur, Jin Ook;Lim, Jeong Eun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.5
    • /
    • pp.479-487
    • /
    • 2021
  • In this paper, we propose a Sound Event Detection (SED) model that uses self-training based on a noisy student model. The proposed SED model consists of two stages. In the first stage, a mean-teacher model based on a Residual Convolutional Recurrent Neural Network (RCRNN) is constructed to provide target labels for weakly labeled or unlabeled data. In the second stage, a self-training-based noisy student model is constructed by applying different noise types: feature noise, such as time-frequency shift, mixup, and SpecAugment, and dropout-based model noise. In addition, a semi-supervised loss function is applied to train the noisy student model, acting as label noise injection. The performance of the proposed SED model is evaluated on the validation set of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge Task 4. The experiments show that the single model and the ensemble model of the proposed noisy-student-based SED improve the F1-score by 4.6 % and 3.4 %, respectively, compared to the top-ranked model in DCASE 2020 Challenge Task 4.
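
The feature-noise types named in the abstract (time-frequency shift, mixup, SpecAugment) are straightforward to sketch on a log-mel feature matrix; the parameter values below are illustrative, not the paper's settings.

```python
# Hedged sketch of the noisy-student feature noises named in the abstract.
import numpy as np

rng = np.random.default_rng(0)

def time_shift(spec, max_shift=16):
    # shift along the time axis (wrap-around for simplicity)
    return np.roll(spec, rng.integers(-max_shift, max_shift + 1), axis=1)

def mixup(spec_a, spec_b, alpha=0.2):
    # convex combination with a Beta-distributed mixing weight
    lam = rng.beta(alpha, alpha)
    return lam * spec_a + (1.0 - lam) * spec_b

def spec_augment(spec, f_width=8, t_width=20):
    # zero out one frequency band and one time span (SpecAugment-style)
    out = spec.copy()
    f0 = rng.integers(0, out.shape[0] - f_width)
    t0 = rng.integers(0, out.shape[1] - t_width)
    out[f0:f0 + f_width, :] = 0.0
    out[:, t0:t0 + t_width] = 0.0
    return out

mel = rng.standard_normal((64, 500))     # placeholder log-mel features
noisy = mixup(time_shift(mel), spec_augment(mel))
print(noisy.shape)
```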

Hi, KIA! Classifying Emotional States from Wake-up Words Using Machine Learning (Hi, KIA! 기계 학습을 이용한 기동어 기반 감성 분류)

  • Kim, Taesu;Kim, Yeongwoo;Kim, Keunhyeong;Kim, Chul Min;Jun, Hyung Seok;Suk, Hyeon-Jeong
    • Science of Emotion and Sensibility
    • /
    • v.24 no.1
    • /
    • pp.91-104
    • /
    • 2021
  • This study explored users' emotional states as identified from the wake-up words "Hi, KIA!" using a machine learning algorithm, considering the voice user interface of passenger cars. We targeted four emotional states, namely excited, angry, desperate, and neutral, and created a total of 12 emotional scenarios in the context of car driving. Nine college students participated and recorded sentences as guided by the visualized scenarios. The wake-up words were extracted from the whole sentences, resulting in two data sets. We used the soundgen package and the svmRadial method of the caret package in open-source R code to collect acoustic features of the recorded voices and performed machine-learning-based analysis to determine the predictability of the modeled algorithm. We compared the accuracy for wake-up words (60.19%; range 22%~81%) with that for whole sentences (41.51%) across all nine participants and the four emotional categories. Individual differences in accuracy and sensitivity were noticeable, while the selected features were relatively constant. This study provides empirical evidence for the potential application of wake-up words in emotion-driven user experience for communication between users and artificial intelligence systems.
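
The study's classifier (caret's svmRadial in R) corresponds to an RBF-kernel SVM; the sketch below shows the analogous step in scikit-learn on placeholder feature vectors, since the study's soundgen features and recordings are not reproduced here.

```python
# RBF-kernel SVM over acoustic feature vectors, analogous to caret's
# svmRadial; the data are random placeholders, not the study's.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((108, 20))   # e.g., 9 speakers x 12 scenarios, 20 features
y = rng.integers(0, 4, 108)          # excited / angry / desperate / neutral

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```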

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park;Somin Park;Hyunki Hong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.7
    • /
    • pp.303-314
    • /
    • 2023
  • Voice conversion, a technology that regenerates an individual's speech with the acoustic properties (tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model, which uses one-hot vectors for the source and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained RawNet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, the Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. The Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it is based.
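
Two of the training ingredients the abstract mentions, a Wasserstein-style adversarial loss and the Two Time-Scale Update Rule (TTUR), reduce to a simple pattern: separate optimizers with different learning rates for generator and discriminator. The PyTorch sketch below uses placeholder networks, not the paper's StarGAN-VC architecture, and illustrative learning rates.

```python
# TTUR with a Wasserstein-style critic loss; networks and rates are
# placeholders for illustration only.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 80))
D = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 1))

# TTUR: the critic updates on a faster time scale than the generator
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_D = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.5, 0.9))

z, real = torch.randn(8, 64), torch.randn(8, 80)

loss_D = D(G(z).detach()).mean() - D(real).mean()   # critic minimizes this
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

loss_G = -D(G(z)).mean()                            # generator fools critic
opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```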

Cavitation signal detection based on time-series signal statistics (시계열 신호 통계량 기반 캐비테이션 신호 탐지)

  • Haesang Yang;Ha-Min Choi;Sock-Kyu Lee;Woojae Seong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.43 no.4
    • /
    • pp.400-405
    • /
    • 2024
  • When cavitation noise occurs on ship propellers, the level of underwater radiated noise increases abruptly, which can be a critical threat factor, particularly for naval vessels, because it raises the probability of detection. Accurately and promptly assessing cavitation signals is therefore crucial for improving the survivability of submarines. Traditionally, techniques for determining cavitation occurrence have relied mainly on checking whether acoustic/vibration levels measured by sensors exceed a certain threshold, or on the Detection of Envelope Modulation On Noise (DEMON) method. However, such techniques depend on a physical understanding of cavitation phenomena and on subjective, experience-based criteria, and involve multiple procedures, which motivates the development of techniques for early automatic recognition of cavitation signals. In this paper, we propose an algorithm that automatically detects cavitation occurrence based on simple statistical features, reflecting cavitation characteristics, extracted from acoustic signals measured by sensors attached to the hull. The performance of the proposed technique is evaluated for different numbers of sensors and model test conditions. We confirmed that, by sufficiently training on the cavitation characteristics reflected in signals measured by a single sensor, the occurrence of cavitation signals can be determined.
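
Simple statistics of the kind the abstract alludes to can be computed per analysis frame; the feature set below (RMS level, kurtosis, crest factor) is an assumption for illustration, since the paper's exact features are not listed here, and the signals are synthetic.

```python
# Frame-level statistics on a synthetic "quiet" vs. impulsive "cavitation"
# signal; the feature choice is illustrative, not the paper's.
import numpy as np
from scipy.stats import kurtosis

def frame_features(x, frame=4096, hop=2048):
    feats = []
    for i in range(0, len(x) - frame, hop):
        seg = x[i:i + frame]
        rms = np.sqrt(np.mean(seg ** 2))
        feats.append([rms,                         # overall level
                      kurtosis(seg),               # impulsiveness
                      np.max(np.abs(seg)) / rms])  # crest factor
    return np.array(feats)

rng = np.random.default_rng(0)
quiet = rng.standard_normal(100_000)
bursts = (rng.random(100_000) < 0.001) * rng.standard_normal(100_000) * 20.0
print(frame_features(quiet).mean(axis=0))
print(frame_features(quiet + bursts).mean(axis=0))   # higher kurtosis/crest
```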

Realization a Text Independent Speaker Identification System with Frame Level Likelihood Normalization (프레임레벨유사도정규화를 적용한 문맥독립화자식별시스템의 구현)

  • 김민정;석수영;김광수;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.3 no.1
    • /
    • pp.8-14
    • /
    • 2002
  • In this paper, we realized a real-time text-independent speaker recognition system using a Gaussian mixture model and applied a frame-level likelihood normalization method, which has shown its effectiveness in verification systems. The system has three parts: front-end, training, and recognition. In the front-end, cepstral mean normalization and silence removal were applied to account for variations in the speaker's speech. In training, a Gaussian mixture model was used for modeling the speaker's acoustic features, and maximum likelihood estimation was used for GMM parameter optimization. In recognition, the likelihood score was calculated from the speaker models and the test data at the frame level. Text-independent sentences were used as test material. The ETRI 445 and KLE 452 databases were used for training and testing, with cepstrum coefficients and regression coefficients as feature parameters. The experimental results show that the frame-level likelihood method achieves higher recognition rates than the conventional method, independent of the number of registered speakers.
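
The frame-level normalization idea can be sketched with scikit-learn's GaussianMixture: score every frame under every enrolled speaker's GMM, normalize per frame across models, then accumulate. The data, model sizes, and the particular normalization (subtracting the per-frame best score) are assumptions for illustration.

```python
# GMM speaker identification with a frame-level likelihood normalization;
# features are random placeholders, not ETRI 445 / KLE 452 cepstra.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
train = {spk: rng.standard_normal((500, 12)) + spk for spk in (0, 1)}
test = rng.standard_normal((200, 12)) + 1          # utterance by speaker 1

models = {spk: GaussianMixture(n_components=8, random_state=0).fit(X)
          for spk, X in train.items()}

# Per-frame log-likelihoods under every speaker model: shape (S, T)
L = np.stack([m.score_samples(test) for m in models.values()])

# Frame-level normalization: reference each frame's scores to the best
# competing model so no single frame dominates the utterance score
L_norm = L - L.max(axis=0, keepdims=True)
print("identified speaker:", int(np.argmax(L_norm.sum(axis=1))))
```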

HORIZON RUN 4 SIMULATION: COUPLED EVOLUTION OF GALAXIES AND LARGE-SCALE STRUCTURES OF THE UNIVERSE

  • KIM, JUHAN;PARK, CHANGBOM;L'HUILLIER, BENJAMIN;HONG, SUNGWOOK E.
    • Journal of The Korean Astronomical Society
    • /
    • v.48 no.4
    • /
    • pp.213-228
    • /
    • 2015
  • The Horizon Run 4 is a cosmological N-body simulation designed for the study of coupled evolution between galaxies and large-scale structures of the Universe, and for the test of galaxy formation models. Using 6300³ gravitating particles in a cubic box of L_box = 3150 h⁻¹ Mpc, we build a dense forest of halo merger trees to trace the halo merger history with a halo mass resolution scale down to M_s = 2.7 × 10¹¹ h⁻¹ M_⊙. We build a set of particle and halo data, which can serve as testbeds for comparison of cosmological models and gravitational theories with observations. We find that the FoF halo mass function shows a substantial deviation from the universal form with tangible redshift evolution of amplitude and shape. At higher redshifts, the amplitude of the mass function is lower, and the functional form is shifted toward larger values of ln(1/σ). We also find that the baryonic acoustic oscillation feature in the two-point correlation function of mock galaxies becomes broader with a peak position moving to smaller scales and the peak amplitude decreasing for increasing directional cosine μ compared to the linear predictions. From the halo merger trees built from halo data at 75 redshifts, we measure the half-mass epoch of halos and find that less massive halos tend to reach half of their current mass at higher redshifts. Simulation outputs including snapshot data, past lightcone space data, and halo merger data are available at http://sdss.kias.re.kr/astro/Horizon-Run4.

Stable Bottom Detection and Optimum Bottom Offset for Echo Integration of Demersal Fish (저서어자원량의 음향추정에 있어서 해저기준과 해저 오프셋의 최소화)

  • 황두진
    • Journal of the Korean Society of Fisheries and Ocean Technology
    • /
    • v.36 no.3
    • /
    • pp.195-201
    • /
    • 2000
  • This paper discusses methods for stable bottom detection and an optimum bottom offset that enable fish echoes to be separated from bottom echoes in the echo integration of demersal fish. In preprocessing the echo signal, the bottom must be detected stably against fluctuations in echo level, and the bottom offset must be set to the minimum height such that near-bottom fish echoes are included. Two methods of bottom detection, an echo level threshold method and a maximum echo slope method, were compared and analyzed. The echo level method works well if the ideal threshold level is given, but it sometimes misses the bottom because of echo fluctuation. The maximum echo slope method provides simple and stable bottom detection. In addition, the bottom offset has to be set close to the bottom without including the bottom echo. The optimum bottom offset should be set a few samples before the detected bottom echo, in a manner that relates the beginning of the pulse shape and the acoustic beam pattern to the bottom feature.
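
The two detection rules the abstract compares are easy to state on a sampled echo envelope: the threshold rule takes the first sample above a chosen level, while the maximum-slope rule takes the steepest rise. The envelope below is synthetic, and the offset of "a few samples" is illustrative.

```python
# Echo-level threshold vs. maximum echo slope on a synthetic echo envelope.
import numpy as np

rng = np.random.default_rng(0)
env = rng.random(1000) * 0.05                           # water-column returns
bottom = 700
env[bottom:bottom + 30] += np.linspace(1.0, 0.2, 30)    # strong bottom echo

thr_idx = int(np.argmax(env > 0.5))        # first sample above threshold
slope_idx = int(np.argmax(np.diff(env)))   # steepest rise: stabler per abstract

offset = 3                                 # a few samples, per the abstract
print(thr_idx, slope_idx, "integrate up to sample", slope_idx - offset)
```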
