• Title/Summary/Keyword: acoustic coupling (음향적 결합)


Low Rate Speech Coding Using the Harmonic Coding Combined with CELP Coding (하모닉 코딩과 CELP방법을 이용한 저 전송률 음성 부호화 방법)

  • 김종학;이인성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.3
    • /
    • pp.26-34
    • /
    • 2000
  • In this paper, we propose a 4 kbps speech coder that combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding applies harmonic excitation coding to voiced frames and analysis-by-synthesis vector excitation coding to unvoiced frames. This two-mode scheme, however, is ineffective for transition frames in which voiced and unvoiced signals are mixed, so a method beyond simple unvoiced/voiced mode coding is needed. We therefore designed a time-separated transition coding method in which a voiced/unvoiced decision algorithm separates the unvoiced and voiced durations within a frame, and either harmonic-harmonic or vector-harmonic excitation coding is selected depending on the previous frame's U/V decision. In the decoder, voiced excitation signals are generated efficiently through the inverse FFT of the harmonic magnitudes, unvoiced excitation signals are produced by inverse vector quantization, and the reconstructed speech is synthesized by the overlap/add method.
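The decoder steps described above (inverse FFT of harmonic magnitudes, then overlap/add) can be sketched as follows. This is a minimal illustration, not the paper's coder: the harmonic spacing, frame length, and window are invented for the example.

```python
import numpy as np

def overlap_add(frames, hop):
    """Reconstruct a signal from windowed frames by overlap-add."""
    frame_len = frames.shape[1]
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

# Hypothetical voiced excitation: place harmonic magnitudes on FFT bins
# and take the inverse real FFT to get one time-domain excitation frame.
fft_size = 256
mags = np.zeros(fft_size // 2 + 1)
mags[np.arange(1, 10) * 5] = 1.0          # harmonics every 5 bins (illustrative)
excitation = np.fft.irfft(mags, n=fft_size)

# Split into 50 %-overlapped Hann-windowed frames and rebuild by overlap-add.
win = np.hanning(128)
frames = np.array([excitation[s:s + 128] * win for s in range(0, 129, 64)])
rebuilt = overlap_add(frames, hop=64)
```

In the actual coder the harmonic magnitudes would come from the decoded bitstream per voiced frame; here a fixed comb of magnitudes stands in for them.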


Transmission waveform design for compressive sensing active sonar using the matrix projection from Gram matrix to identity matrix and a constraint for bandwidth (대역폭 제한 조건과 Gram 행렬의 단위행렬로의 사영을 이용한 압축센싱 능동소나 송신파형 설계)

  • Lee, Sehyun;Lee, Keunhwa;Lim, Jun-Seok;Cheong, Myoung-Jun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.5
    • /
    • pp.522-533
    • /
    • 2019
  • Range-Doppler estimation with compressive sensing can be modeled as an under-determined linear system y = Ax. To solve this system with compressive sensing, the matrix A should be sufficiently incoherent and x should be sparse. In this paper, we propose a transmission waveform design method that maintains the bandwidth required by the sonar system while lowering the mutual coherence of A so that A becomes incoherent. The proposed method combines two steps: optimizing the sensing matrix by alternating projection, and suppressing unwanted frequency bands using the DFT (Discrete Fourier Transform) matrix. We compare the range-Doppler estimation performance of the existing LFM (Linear Frequency Modulated) waveform and the designed waveform using both the matched filter and the compressive sensing method. Simulations show that the designed transmission waveform has better detection performance than the existing LFM waveform.
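The quantity the design lowers, the mutual coherence of A, is the largest off-diagonal entry of the Gram matrix of the normalized columns. A minimal sketch of that measurement (the random matrix stands in for a real sonar sensing matrix; the alternating-projection optimization itself is not shown):

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.conj().T @ An)          # Gram matrix of normalized columns
    np.fill_diagonal(G, 0.0)              # ignore self-correlations
    return G.max()

rng = np.random.default_rng(0)
A = rng.standard_normal((32, 128))        # under-determined: 32 equations, 128 unknowns
mu = mutual_coherence(A)                  # 0 would mean a perfectly incoherent A
```

An orthonormal A has coherence 0 (its Gram matrix is the identity), which is why the paper projects the Gram matrix toward the identity while re-imposing the bandwidth constraint.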

Performance of selective combining according to channel selection decision method of frequency diversity in underwater frequency selective channel (수중 주파수 선택적 채널에서 주파수 다이버시티의 채널 선택 판정법에 따른 선택 합성법의 성능)

  • Lee, Chaehui;Jeong, Hyunsoo;Park, Kyu-Chil;Park, Jihyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.3
    • /
    • pp.436-442
    • /
    • 2022
  • In this paper, the performance of selective combining according to the channel selection decision method of frequency diversity is evaluated in an underwater frequency-selective channel. The underwater acoustic channel in a shallow sea has complex multipath characteristics arising from the combination of environmental factors such as boundary-surface reflection and sound-wave refraction in the thermocline. In particular, frequency selectivity due to multipath causes energy fluctuation in the communication channel, which reduces the SNR (Signal-to-Noise Ratio) and degrades communication performance. We applied a frequency diversity technique using multiple channels to secure communication performance against this multipath-induced frequency selectivity. 4-FSK (Frequency Shift Keying) and selective combining were applied to each channel, and performance was evaluated with three decision rules for choosing the demodulation channel: the maximum value, the average value, and the majority decision of the signal.
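The three decision rules compared above can be sketched as follows. This is only an illustration of the max/mean/majority choices, under the assumption (not stated in the abstract) that each channel supplies per-symbol received tone magnitudes:

```python
import numpy as np

def select_channel(channel_mags, rule):
    """Pick the demodulation channel index from per-channel received magnitudes.

    channel_mags: array of shape (n_channels, n_symbols) holding received
    tone magnitudes per symbol (hypothetical layout, for illustration).
    """
    if rule == "max":        # channel holding the single strongest symbol
        return int(np.argmax(channel_mags.max(axis=1)))
    if rule == "mean":       # channel with the highest average magnitude
        return int(np.argmax(channel_mags.mean(axis=1)))
    if rule == "majority":   # channel that wins the most per-symbol contests
        wins = np.argmax(channel_mags, axis=0)
        return int(np.bincount(wins, minlength=len(channel_mags)).argmax())
    raise ValueError(rule)

mags = np.array([[0.9, 0.2, 0.3],     # channel 0: one strong peak, weak otherwise
                 [0.4, 0.5, 0.6]])    # channel 1: consistently moderate
```

On this toy input the max rule favors the channel with the single strongest peak, while the mean and majority rules favor the consistently better channel, which is exactly the behavioral difference the paper evaluates.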

Electroencephalogram-based emotional stress recognition according to audiovisual stimulation using spatial frequency convolutional gated transformer (공간 주파수 합성곱 게이트 트랜스포머를 이용한 시청각 자극에 따른 뇌전도 기반 감정적 스트레스 인식)

  • Kim, Hyoung-Gook;Jeong, Dong-Ki;Kim, Jin Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.5
    • /
    • pp.518-524
    • /
    • 2022
  • In this paper, we propose a method that combines convolutional neural networks and an attention mechanism to improve the recognition of emotional stress from Electroencephalogram (EEG) signals. In the proposed method, EEG signals are decomposed into five frequency bands, and spatial information of the EEG features is obtained by applying a convolutional layer to each band. Next, salient frequency information is learned in each band using a gated-transformer-based attention mechanism, and complementary frequency information is further learned through inter-frequency mapping and reflected in the final attention representation. Through an EEG stress recognition experiment on the DEAP dataset with six subjects, we show that the proposed method improves EEG-based stress recognition performance compared with existing methods.
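The first stage, decomposing EEG into five frequency bands, can be sketched with a simple FFT-mask filter bank. The band edges below are the conventional delta/theta/alpha/beta/gamma ranges, an assumption on my part since the abstract does not give the paper's exact bands:

```python
import numpy as np

BANDS = {  # conventional EEG bands in Hz (the paper's exact edges are not given)
    "delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
    "beta": (13, 30), "gamma": (30, 45),
}

def decompose(eeg, fs):
    """Split a 1-D EEG trace into five band-limited signals via FFT masking."""
    spec = np.fft.rfft(eeg)
    freqs = np.fft.rfftfreq(len(eeg), d=1.0 / fs)
    out = {}
    for name, (lo, hi) in BANDS.items():
        mask = (freqs >= lo) & (freqs < hi)   # zero out bins outside the band
        out[name] = np.fft.irfft(spec * mask, n=len(eeg))
    return out

fs = 128                                   # DEAP's preprocessed sampling rate
t = np.arange(fs * 4) / fs
eeg = np.sin(2 * np.pi * 10 * t)           # a pure 10 Hz tone falls in the alpha band
bands = decompose(eeg, fs)
```

Each of the five band signals would then feed its own convolutional layer before the gated-transformer attention described above.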

Performance comparison evaluation of speech enhancement using various loss functions (다양한 손실 함수를 이용한 음성 향상 성능 비교 평가)

  • Hwang, Seo-Rim;Byun, Joon;Park, Young-Cheol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.40 no.2
    • /
    • pp.176-182
    • /
    • 2021
  • This paper evaluates and compares the performance of Deep Neural Network (DNN)-based speech enhancement models under various loss functions. As the baseline we used a complex network that can exploit the phase information of speech. We consider two basic loss functions, the Mean Squared Error (MSE) and the Scale-Invariant Source-to-Noise Ratio (SI-SNR), and two perceptually motivated loss functions, the Perceptual Metric for Speech Quality Evaluation (PMSQE) and the Log Mel Spectra (LMS). Performance was compared through objective evaluation and listening tests on outputs obtained with various combinations of these loss functions. The results show that combining a perceptual loss with MSE or SI-SNR improves overall performance, and that the perceptual losses, even when they yield lower objective scores, perform better in the listening tests.
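Of the basic losses above, SI-SNR is the less familiar one; a minimal NumPy sketch of the standard zero-mean formulation (training would minimize its negative) follows. The signals here are synthetic stand-ins, not the paper's data:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-Invariant Source-to-Noise Ratio in dB (higher is better)."""
    ref = ref - ref.mean()                # standard SI-SNR uses zero-mean signals
    est = est - est.mean()
    # Project the estimate onto the reference; the residual counts as noise.
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

rng = np.random.default_rng(1)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
score = si_snr(noisy, clean)
```

The projection step is what makes the metric scale-invariant: rescaling the estimate rescales the target term identically, so `si_snr(2 * clean, clean)` is as high as `si_snr(clean, clean)`.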

Automatic Indexing Algorithm of Golf Video Using Audio Information (오디오 정보를 이용한 골프 동영상 자동 색인 알고리즘)

  • Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.5
    • /
    • pp.441-446
    • /
    • 2009
  • This paper proposes an automatic indexing algorithm for golf video using audio information. In the proposed algorithm, the input stream is demultiplexed into video and audio. By means of an AdaBoost-cascade classifier, the continuous audio stream is classified into the announcer's speech recorded in the studio, music accompanying players' names on screen, audience reaction to the play, the reporter's speech against the field background, and field noise such as wind or waves. Golf swing sounds, including drive, iron, and putting shots, are detected by impulse onset detection and modulation spectrum verification. The detected swings and applause are used to index action and highlight units. Compared with video-based semantic analysis, the main advantage of the proposed system is its small computational requirement, which makes it suitable for embedded consumer electronic devices for fast browsing.
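A crude stand-in for the impulse onset detection mentioned above is to flag frames whose short-time energy jumps sharply over the previous frame. The frame size, hop, and jump ratio below are illustrative choices, not the paper's parameters, and the modulation-spectrum verification step is omitted:

```python
import numpy as np

def detect_onsets(x, frame=256, hop=128, ratio=4.0):
    """Return indices of frames whose energy exceeds `ratio` times the
    previous frame's energy -- a simple impulse onset heuristic."""
    energies = np.array([np.sum(x[s:s + frame] ** 2)
                         for s in range(0, len(x) - frame, hop)])
    return [i for i in range(1, len(energies))
            if energies[i] > ratio * (energies[i - 1] + 1e-12)]

fs = 8000
x = 0.001 * np.random.default_rng(2).standard_normal(fs)   # quiet background
x[4000:4100] += 0.5                                        # synthetic "shot" impulse
onsets = detect_onsets(x)
```

In the paper's pipeline each candidate onset would then be verified against the modulation spectrum before being accepted as a swing sound.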

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences
    • /
    • v.16 no.1
    • /
    • pp.67-76
    • /
    • 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, reaching a level where it closely imitates natural human speech. In particular, TTS models offering varied voice characteristics and personalized speech are widely utilized in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from unseen target speakers' utterances. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. The proposed approach covers not only English but also Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. The objective evaluation of the proposed English and Korean systems showed a predicted MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that the proposed model improves over the baseline models in both naturalness and speaker similarity.

Voice Activity Detection using Motion and Variation of Intensity in The Mouth Region (입술 영역의 움직임과 밝기 변화를 이용한 음성구간 검출 알고리즘 개발)

  • Kim, Gi-Bak;Ryu, Je-Woong;Cho, Nam-Ik
    • Journal of Broadcast Engineering
    • /
    • v.17 no.3
    • /
    • pp.519-528
    • /
    • 2012
  • Voice activity detection (VAD) is generally performed by extracting features from the acoustic signal and applying a decision rule, so the performance of such acoustic VAD algorithms depends heavily on acoustic noise. When video signals are also available, VAD can be enhanced with visual information, which is unaffected by acoustic noise. Previous visual VAD algorithms usually detect lip activity with a single visual feature, such as active appearance models, optical flow, or intensity variation. Based on an analysis of each feature's weaknesses, we propose to combine an intensity-change measure with the optical flow in the mouth region, so that the two can compensate for each other's weaknesses. To minimize computational complexity, we use simple measures that avoid statistical estimation or modeling: the optical flow is the averaged motion vector of grid regions, and intensity variation is detected by simple thresholding. To extract the mouth region, we propose a simple algorithm that first detects the two eyes and then uses the intensity profile to locate the center of the mouth. Experiments show that the proposed combination of two simple measures achieves higher detection rates at a given false-positive rate than methods that use a single feature.
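One simple way to fuse the two thresholded cues described above is a per-frame OR, so that each measure covers the other's blind spots. This is only an illustrative fusion rule with made-up thresholds, not necessarily the paper's exact decision logic:

```python
import numpy as np

def mouth_activity(motion_mag, intensity_change, m_thresh=0.5, i_thresh=10.0):
    """Declare lip activity when EITHER cue fires.

    motion_mag:       per-frame mean motion-vector magnitude in the mouth region
    intensity_change: per-frame mean absolute intensity change
    (both thresholds are illustrative, not from the paper)
    """
    return (motion_mag > m_thresh) | (intensity_change > i_thresh)

motion = np.array([0.1, 0.8, 0.2, 0.6])       # frames 1 and 3 show mouth motion
delta_i = np.array([2.0, 3.0, 15.0, 1.0])     # frame 2 shows an intensity jump
active = mouth_activity(motion, delta_i)
```

Frame 2 here is caught by the intensity cue even though its motion cue misses it, which is the complementarity the combination is meant to provide.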

Turbulent-Induced Noise around a Circular Cylinder using Permeable FW-H Method (Permeable FW-H 방법을 이용한 원형 실린더 주변의 난류유동소음해석)

  • Choi, Woen-Sug;Hong, Suk-Yoon;Song, Jee-Hun;Kwon, Hyun-Wung;Jung, Chul-Min
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.20 no.6
    • /
    • pp.752-759
    • /
    • 2014
  • A variety of studies on turbulence-induced noise combine acoustic analogy methods with computational fluid dynamics to achieve efficient and accurate analysis. Applying the FW-H acoustic analogy without the turbulent noise contribution is the most popular approach because of its low computational cost. In this paper, turbulence-induced noise is predicted using a RANS turbulence model and the permeable FW-H method. For simplicity, the noise from a 2D circular cylinder is examined with three methods: the direct method of RANS, the FW-H method without turbulent noise, and the permeable FW-H method, which can account for turbulence-induced noise. Turbulent noise was well predicted by the permeable FW-H method at the same computational cost as the original FW-H method. We also show that the permeable FW-H method can predict turbulence-induced noise with high accuracy when an adequate permeable surface is applied. A procedure for predicting turbulence-induced noise with the permeable FW-H method is established and its usability demonstrated.

Sound Attenuation Coefficients and Biogenic Gas Content in the Offshore Surficial Sediments Around the Korean Peninsula (韓半島 周邊海域 海底 表層堆積物의 音波減衰係數와 生物起源 氣體含量)

  • 김한준;덕봉철
    • 한국해양학회지
    • /
    • v.25 no.1
    • /
    • pp.26-35
    • /
    • 1990
  • Sound velocities and attenuation coefficients of marine surface sediments were calculated from in situ acoustic experiments in four nearshore areas off Pohang, Pusan, Yeosu, and Kunsan around the Korean Peninsula. The relationship between these values and the physical properties of the sediments was examined, and the attenuation mechanism was analysed using the estimated gas content. Sound velocities and attenuation coefficients, ranging from 1470 to 1616 m/s and from 0.0565 to 0.6604 dB/kHz-m respectively, correlate well with sediment type. The attenuation coefficient is maximum in coarse silts, and the sound velocity increases with density. The estimated gas content, less than 8 ppm, increases with decreasing sediment grain size. When the sediment is coarser than fine sand, sound attenuation is mostly due to friction losses, and the probably negligible viscous loss remains unchanged with varying physical properties of the sediments. The maximum attenuation in coarse silts results from both friction loss and cohesion of finer sediments at the contacts of silt grains; cohesion becomes the dominant dissipative process as grain size decreases to medium and fine silts.
