• Title/Summary/Keyword: Overlap and Add

Search Results: 49

Low Rate Speech Coding Using the Harmonic Coding Combined with CELP Coding (하모닉 코딩과 CELP방법을 이용한 저 전송률 음성 부호화 방법)

  • 김종학;이인성
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.26-34 / 2000
  • In this paper, we propose a 4kbps speech coder that combines harmonic vector excitation coding with time-separated transition coding. Harmonic vector excitation coding uses harmonic excitation coding in voiced frames and analysis-by-synthesis vector excitation coding in unvoiced frames. However, this two-mode approach is not effective for transition frames that mix voiced and unvoiced signals, so a method beyond voiced/unvoiced mode coding is needed. We therefore designed a time-separated transition coding method in which a voiced/unvoiced decision algorithm separates the unvoiced and voiced durations within a transition frame, and either harmonic-harmonic excitation coding or vector-harmonic excitation coding is selected depending on the U/V decision of the previous frame. In the decoder, voiced excitation signals are generated efficiently through the inverse FFT of the harmonic magnitudes, and unvoiced excitation signals are generated by inverse vector quantization. The reconstructed speech signal is synthesized by the overlap/add method.
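The decoder-side steps (harmonic voiced excitation followed by overlap/add synthesis) can be illustrated with a minimal sketch. It assumes a fixed f0 and fixed harmonic magnitudes across frames, zero harmonic phases, a periodic Hann window and 50% overlap. It uses a direct cosine sum where the paper uses an inverse FFT, and the names `harmonic_frame` and `overlap_add` are hypothetical, not the authors' code.

```python
import math

def harmonic_frame(magnitudes, f0, fs, start, length):
    """Voiced excitation for one frame as a sum of harmonics of f0 (zero phase).

    A direct cosine sum stands in for the inverse FFT; `start` is the absolute
    sample index of the frame, so consecutive frames stay phase-continuous.
    """
    return [
        sum(a * math.cos(2 * math.pi * (h + 1) * f0 * (start + n) / fs)
            for h, a in enumerate(magnitudes))
        for n in range(length)
    ]

def overlap_add(frames, hop):
    """Window each frame with a periodic Hann window and overlap-add at `hop` spacing."""
    length = len(frames[0])
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]
    out = [0.0] * (hop * (len(frames) - 1) + length)
    for k, frame in enumerate(frames):
        for n, sample in enumerate(frame):
            out[k * hop + n] += window[n] * sample
    return out

# Example: 8 kHz, 160-sample frames, 50% overlap (hop = 80).
fs, f0, frame_len, hop = 8000, 200.0, 160, 80
mags = [1.0, 0.5, 0.25]
frames = [harmonic_frame(mags, f0, fs, k * hop, frame_len) for k in range(6)]
speech = overlap_add(frames, hop)
```

Because a periodic Hann window overlapped by half sums to one, the interior of `speech` (samples 80 to 479 here) equals the continuous harmonic signal; only the first and last half-frames are tapered.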

Time Domain Multiple-channel Signal Processing Method for Converting the Variable Frequency Band (가변 주파수 변환을 위한 시간 영역 다중채널 신호처리 알고리즘)

  • Yoo, Jae-Ho;Kim, Hyeon-Su;Lee, Kyu-Ha;Lee, Jung-Sub;Chung, Jae-Hak
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.1A / pp.71-79 / 2010
  • Multiple-channel signal processing algorithms require variable frequency bands, efficient allocation of transmission power, and flexible frequency-band reallocation to support various service types that require different transmission rates and frequency bands. This paper proposes an improved multiple-channel signal processing algorithm that efficiently converts the frequency band of multiple carrier signals using a window function and the DFT in the time domain. Whereas the previous multiple-channel algorithm performs band-pass signal processing in the frequency domain, the proposed algorithm performs block signal processing with a window function in the time domain. In addition, the proposed window-based algorithm has lower complexity than the previous frequency-domain algorithm and performs the frequency-band conversion efficiently. Computer simulation results show that the proposed algorithm achieves perfect reconstruction of the output signal and performs flexible frequency-band reallocation efficiently.
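Time-domain block processing with a window and a DFT can be sketched as follows, under assumptions of our own rather than the paper's: a periodic Hann analysis window with 50% overlap, a plain O(N²) DFT, and a band shift by an even number of DFT bins so the phase stays continuous from block to block. The function `shift_band` is hypothetical. With `bin_shift = 0` the overlap-added output reproduces the input away from the edges (perfect reconstruction); with an even shift it moves the whole band up by `bin_shift * fs / block_len` Hz.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def shift_band(x, block_len, bin_shift):
    """Window, DFT, rotate the bins by `bin_shift`, inverse DFT, and overlap-add."""
    if bin_shift % 2:
        raise ValueError("with 50% overlap the shift must be an even number of bins")
    hop = block_len // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / block_len) for n in range(block_len)]
    out = [0j] * len(x)
    for start in range(0, len(x) - block_len + 1, hop):
        spectrum = dft([window[n] * x[start + n] for n in range(block_len)])
        # New bin k takes old bin k - bin_shift (circularly), i.e. a frequency shift.
        rotated = [spectrum[(k - bin_shift) % block_len] for k in range(block_len)]
        for n, sample in enumerate(idft(rotated)):
            out[start + n] += sample
    return out

signal = [math.cos(0.3 * n) for n in range(64)]
same = shift_band(signal, 16, 0)
moved = shift_band(signal, 16, 2)
```

Only samples covered by two blocks (indices 8 to 55 here) see unit total window gain. The output is complex because a one-sided frequency shift of a real signal is complex.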

A Fast Normalized Cross-Correlation Computation for WSOLA-based Speech Time-Scale Modification (WSOLA 기반의 음성 시간축 변환을 위한 고속의 정규상호상관도 계산)

  • Lim, Sangjun;Kim, Hyung Soon
    • The Journal of the Acoustical Society of Korea / v.31 no.7 / pp.427-434 / 2012
  • The waveform similarity overlap-add (WSOLA) method is known to be an efficient, high-quality algorithm for time-scaling speech signals. The computational load of WSOLA is concentrated in the repeated normalized cross-correlation (NCC) calculations used to evaluate the similarity between two signal waveforms. To reduce the computational complexity of WSOLA, this paper proposes a fast NCC computation method in which the NCC is obtained from pre-calculated sum tables, eliminating the redundancy of repeated NCC calculations over adjacent regions. The denominator of the NCC has much redundancy regardless of the time-scale factor, whereas the numerator has less redundancy, and that redundancy depends on both the time-scale factor and the optimal shift value, so the numerator requires a more sophisticated algorithm for fast computation. Simulation results show that the proposed method reduces the WSOLA execution time by about 40%, 47%, and 52% for time-scale compression and for 2x and 3x time-scale expansion, respectively, while maintaining exactly the same speech quality as conventional WSOLA.
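The idea of a pre-calculated sum table for the NCC denominator can be shown with a small sketch. This is an illustration, not the paper's algorithm: it only tabulates the candidate-segment energy with a prefix sum, so the denominator costs O(1) per lag, while the numerator is still computed directly. The function names are hypothetical.

```python
import math

def ncc_direct(template, signal, lag):
    """Reference NCC between `template` and signal[lag:lag+len(template)]."""
    segment = signal[lag:lag + len(template)]
    num = sum(a * b for a, b in zip(template, segment))
    den = math.sqrt(sum(a * a for a in template) * sum(b * b for b in segment))
    return num / den if den else 0.0

def ncc_all_lags(template, signal):
    """NCC for every lag, with segment energies read from a prefix-sum table."""
    length = len(template)
    prefix = [0.0]  # prefix[i] = sum of signal[j]**2 for j < i
    for value in signal:
        prefix.append(prefix[-1] + value * value)
    template_energy = sum(a * a for a in template)
    scores = []
    for lag in range(len(signal) - length + 1):
        num = sum(template[i] * signal[lag + i] for i in range(length))
        segment_energy = prefix[lag + length] - prefix[lag]  # O(1) per lag
        den = math.sqrt(template_energy * segment_energy)
        scores.append(num / den if den else 0.0)
    return scores

signal = [math.sin(0.2 * n) + 0.3 * math.sin(1.1 * n) for n in range(200)]
template = signal[50:90]
scores = ncc_all_lags(template, signal)
best_lag = max(range(len(scores)), key=scores.__getitem__)
```

The prefix-sum scores agree with the direct computation, and the best lag is 50, where the template was cut out. On long signals the subtraction of two large prefix sums can lose precision, so keeping the table local to each analysis region limits this.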

A modified U-net for crack segmentation by Self-Attention-Self-Adaption neuron and random elastic deformation

  • Zhao, Jin;Hu, Fangqiao;Qiao, Weidong;Zhai, Weida;Xu, Yang;Bao, Yuequan;Li, Hui
    • Smart Structures and Systems / v.29 no.1 / pp.1-16 / 2022
  • Despite recent breakthroughs in the deep learning and computer vision fields, pixel-wise identification of tiny objects in high-resolution images with complex disturbances remains challenging. This study proposes a modified U-net for tiny crack segmentation in real-world steel-box-girder bridges. The modified U-net adopts the common U-net framework and uses a novel Self-Attention-Self-Adaption (SASA) neuron as the fundamental computing element. The Self-Attention module applies softmax and gate operations to obtain the attention vector, which enables the neuron to focus on the most significant receptive fields when processing large-scale feature maps. The Self-Adaption module consists of a multilayer perceptron subnet and achieves deeper feature extraction inside a single neuron. For data augmentation, a grid-based crack random elastic deformation (CRED) algorithm is designed to enrich the diversity and irregular shapes of distributed cracks. Grid-based uniform control nodes are first set on both the input images and the binary labels, random offsets are then applied to these control nodes, and bilinear interpolation is performed for the remaining pixels. The proposed SASA neuron and CRED algorithm are deployed together to train the modified U-net. A total of 200 raw images with a high resolution of 4928 × 3264 are collected, 160 for training and the remaining 40 for testing. Input patches of 512 × 512 pixels are generated from the original images by a sliding window with an overlap of 256 pixels. Results show that the average IoU between the recognized and ground-truth cracks reaches 0.409, which is 29.8% higher than that of the regular U-net. A five-fold cross-validation study is performed to verify that the proposed method is robust to different training and test images. Ablation experiments further demonstrate the effectiveness of the proposed SASA neuron and CRED algorithm.
The average-IoU improvements obtained by using the SASA and CRED modules individually add up to the improvement of the full model, indicating that the SASA and CRED modules contribute at different stages of the training process: the model and the data, respectively.
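The patching arithmetic and the evaluation metric can be made concrete with a short sketch. Assumptions: 4928 × 3264 is read as width × height, and when the stride does not land exactly on the border the last patch is clamped to the image edge (the abstract does not say how edges are handled). `patch_starts` and `iou` are hypothetical helper names; IoU here is the standard intersection over union of binary masks.

```python
def patch_starts(size, patch=512, stride=256):
    """Top-left offsets along one axis for a sliding window with overlap patch - stride."""
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] != size - patch:
        starts.append(size - patch)  # assumption: clamp the last patch to the border
    return starts

def iou(pred, truth):
    """Intersection over union of two binary masks given as lists of rows."""
    pairs = [(p, t) for prow, trow in zip(pred, truth) for p, t in zip(prow, trow)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    return intersection / union if union else 1.0

xs = patch_starts(4928)  # horizontal offsets
ys = patch_starts(3264)  # vertical offsets
patches_per_image = len(xs) * len(ys)
```

Under this edge rule each image yields 19 × 12 = 228 patches; a different edge rule (padding, or dropping the partial strip) would change that count.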

The Current Practices and Teacher's Perceptions of Highschool Home Economics Education -Focusing on Busan, Ulsan and Kyoungnam Area- (고등학교 가정과학의 운영실태 및 교과에 대한 담당교사들의 인식 -부산시, 울산시, 경남지역 일반계 고등학교 가정과학 담당교사를 대상으로-)

  • Kim Sang-Hee
    • Journal of Korean Home Economics Education Association / v.17 no.2 / pp.61-77 / 2005
  • This study examines the current practices and teachers' perceptions of high school Home Economics education in the Busan, Ulsan, and Kyungnam area. Data were collected from 70 teachers through a mailed questionnaire. The results were as follows: 1. Home Economics has been taught at schools with more than 31 classrooms, at women's high schools, and at public schools. More than 70% of teachers taught 4 or 5 of the 5 subject-matter sectors, with particular emphasis on the family·human development and food·nutrition sectors. The greatest difficulties were students' lack of interest and a shortage of reference books. 2. Teachers rated the connection between Technology·Home Economics and Home Economics highly, but rated career education low among the goals of Home Economics. 3. Teachers rated the necessity of Home Economics highly but, in their judgment, rated students' interest low. The topics of the clothing·textile and housing sectors need partial revision. 4. The content teachers wanted added in the upcoming curriculum revision was related to their dissatisfaction with current Home Economics content. Teachers perceived problems with excessive, overlapping, and outdated content and with inappropriate depth levels in the subject matter.

Extensibility of Visual Expression in Projection Mapping Installation Art; Focused on Examples and Projection Mapping Installation Artwork Domino (프로젝션맵핑 기반 영상 설치 미술의 시각적 표현 확장성 -사례 분석 및 작품 <Domino>을 중심으로-)

  • Fang, Bin-Zhou;Lim, Young-Hoon;Paik, Joon-Ki
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.207-220 / 2021
  • Recent advances in new media for sensory experiences, such as projection mapping and virtual reality, keep expanding the methods of visual expression in installation art. Artists can create and develop visual expression techniques based on such new media. Projection mapping is a new medium that continues to add various possibilities to visual expression in media art. In a projection mapping environment, artists can recompose an object or space with digital content by projecting video onto three-dimensional surfaces in the space. This paper focuses on the process by which visual expression with projection mapping technology leads to the viewer's sensory experience. To this end, the "reproducibility," "dissemination," "virtuality," and "interactivity" of media were analyzed to describe the meaning and definition of visual expression. Artworks are examined as examples to study visual expression techniques such as "repetition and overlap," "simulacrum and metaphor," and "displacement and conversion." Applying this analysis, I created Domino, a projection mapping artwork that supports research on visual expression techniques that lead to sensory experience and on the extensibility of visual expression.

Real-time Implementation of Variable Transmission Bit Rate Vocoder Integrating G.729A Vocoder and Reduction of the Computational Amount SOLA-B Algorithm Using the TMS320C5416 (TMS320C5416을 이용한 G.729A 보코더와 계산량 감소된 SOLA-B 알고리즘을 통합한 가변 전송율 보코더의 실시간 구현)

  • 함명규;배명진
    • Journal of the Institute of Electronics Engineers of Korea SP / v.40 no.6 / pp.84-89 / 2003
  • In this paper, we implemented in real time on the TMS320C5416 a variable-bit-rate vocoder that applies the SOLA-B algorithm by Henja to the 8kbps ITU-T G.729A vocoder. The proposed method uses the SOLA-B algorithm to shorten the duration of the speech during encoding and to restore normal playback speed by extending the duration of the speech during decoding. To reduce the computational load of the SOLA-B algorithm, the cross-correlation function is computed only at every third sample. The real-time implementation of the G.729A vocoder with the SOLA-B algorithm has a maximum complexity of 10.2 MIPS in the encoder and 2.8 MIPS in the decoder at 8kbps. At 6kbps, the maximum complexity is 18.5 MIPS in the encoder and 13.1 MIPS in the decoder, and at 4kbps it is likewise 18.5 MIPS in the encoder and 13.1 MIPS in the decoder. Memory usage is about 9.7 kwords of program ROM, 4.5 kwords of table ROM, and 5.1 kwords of RAM. The output waveform was verified to be bit-exact with the result of the C simulator. In a speech-quality evaluation of the real-time variable-bit-rate vocoder, a MOS of 3.69 was obtained at 4kbps.
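One reading of the "every third sample" shortcut is a decimated lag search, sketched below. The abstract does not say whether the skipping applies to the candidate lags or to the samples inside each correlation sum; this sketch assumes the lags, and `best_lag` is a hypothetical name, not part of G.729A or of the SOLA-B description.

```python
def cross_correlation(ref, search, lag, length):
    """Unnormalized cross-correlation of ref[0:length] with search[lag:lag+length]."""
    return sum(ref[i] * search[lag + i] for i in range(length))

def best_lag(ref, search, max_lag, length, step=1):
    """Return (lag with the highest correlation, number of correlations evaluated)."""
    best, best_value, evaluations = 0, float("-inf"), 0
    for lag in range(0, max_lag + 1, step):  # step=3 evaluates every third lag
        value = cross_correlation(ref, search, lag, length)
        evaluations += 1
        if value > best_value:
            best, best_value = lag, value
    return best, evaluations

ref = [1, 2, 3, 2, 1]
search = [0] * 40
search[9:14] = ref  # the pattern occurs at lag 9
full = best_lag(ref, search, max_lag=30, length=5)
decimated = best_lag(ref, search, max_lag=30, length=5, step=3)
```

Here the decimated search finds the same lag (9 is a multiple of 3) with 11 instead of 31 correlations. When the true lag falls between grid points, the decimated search returns a nearby lag, trading some alignment accuracy for roughly a threefold saving.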

A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
  • Speaker diarization, which labels "who spoke when?" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods for labeling overlapping speech and for optimizing speaker diarization models. Most deep neural network-based end-to-end speaker diarization systems solve a multi-label classification problem that predicts the labels of all speakers active in each frame of speech. However, the performance of a multi-label model varies greatly depending on how the threshold is set. This paper studies a speaker diarization system that uses single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates labels from the model output. To account for speaker-label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, adding residual connections to the model is studied for effective training of speaker diarization models with deep structures. The experiments used simulated noisy two-speaker data generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method performs labeling without a threshold and improves performance by about 20.7%.
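The single-label idea combined with PIT can be sketched as follows. Assumptions: each frame's multi-label speaker-activity vector is mapped to one class by treating it as a bit mask (a "powerset" encoding, so two speakers give four classes: silence, speaker 1, speaker 2, overlap), and the PIT loss is the minimum cross-entropy over speaker permutations. The paper's exact label mapping and loss weighting may differ, and all names are hypothetical.

```python
import math
from itertools import permutations

def to_single_label(frame):
    """Map a tuple of 0/1 speaker activities to one class index (bit mask)."""
    return sum(bit << i for i, bit in enumerate(frame))

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the target classes."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def pit_cross_entropy(probs, multi_labels):
    """Cross-entropy under the speaker permutation that fits the output best."""
    num_speakers = len(multi_labels[0])
    best = math.inf
    for perm in permutations(range(num_speakers)):
        relabeled = [tuple(frame[j] for j in perm) for frame in multi_labels]
        labels = [to_single_label(frame) for frame in relabeled]
        best = min(best, cross_entropy(probs, labels))
    return best

# Two frames, two speakers: the reference says "speaker 1 only", but the model
# puts most probability on class 2 ("speaker 2 only"), i.e. it swapped the speakers.
probs = [[0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.7, 0.1]]
reference = [(1, 0), (1, 0)]
loss = pit_cross_entropy(probs, reference)
```

Because the permutation search finds the swapped assignment, `loss` equals -log 0.7 rather than -log 0.1. At inference the class is an argmax over the four outputs, which is why no activity threshold is needed.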

A Study about the Users's Preferred Playing Speeds on Categorized Video Content using WSOLA method (WSOLA를 이용한 동영상 미세배속 재생 서비스에 대한 콘텐츠별 배속 선호도 분석 연구)

  • Kim, I-Gil
    • Journal of Digital Contents Society / v.16 no.2 / pp.291-298 / 2015
  • In a fast-paced information technology environment, video consumption is shifting from one-way television viewing to VOD (Video on Demand) playback anywhere, anytime, on any device. This trend gives additional importance to fine speed control of video playback, in addition to the strengths of the digital video signal. Many video players currently provide a fine-speed-control function that can speed up a video to skip a boring part or slow it down to focus on an exciting scene. Audio information is just as important as visual information for understanding the content of a speed-controlled video. Accordingly, a number of fine-speed-control playback algorithms have been proposed to solve the problem of pitch distortion in audio processing. In this study, WSOLA (Waveform-Similarity-Based Overlap-Add), a well-known technique for prosodic modification of speech signals, is applied to analyze users' needs for fine-speed-control video playback. By surveying users' preferred playback speeds for categorized video content and analyzing the results, this paper proposes that a variety of fine speed adjustments is needed to accommodate users' preferred ways of consuming video.
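A compact WSOLA sketch shows how playback speed can change without a pitch change. Assumptions: a periodic Hann window, 50% output overlap, similarity measured by plain cross-correlation against the natural continuation of the previously copied segment, and illustrative defaults (`frame_len=256`, `tolerance=64`). This is a textbook-style illustration with a hypothetical `wsola` function, not the player used in the study.

```python
import math

def wsola(x, speed, frame_len=256, tolerance=64):
    """Time-scale x by `speed` (>1 plays faster) using waveform-similarity overlap-add."""
    hop_out = frame_len // 2
    hop_in = speed * hop_out  # nominal analysis hop in input samples
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    num_frames = int((len(x) - frame_len) / hop_in) + 1
    out = [0.0] * ((num_frames - 1) * hop_out + frame_len)
    prev = 0
    for k in range(num_frames):
        nominal = int(round(k * hop_in))
        best = nominal
        if k > 0:
            # Natural continuation of the previously copied segment.
            target = x[prev + hop_out: prev + hop_out + frame_len]
            best_score = -math.inf
            for delta in range(-tolerance, tolerance + 1):
                start = nominal + delta
                if start < 0 or start + frame_len > len(x):
                    continue
                score = sum(a * b for a, b in zip(target, x[start:start + frame_len]))
                if score > best_score:
                    best, best_score = start, score
        for n in range(frame_len):
            out[k * hop_out + n] += window[n] * x[best + n]
        prev = best
    return out

tone = [math.sin(2 * math.pi * n / 32) for n in range(4096)]
faster = wsola(tone, 1.5)
unchanged = wsola(tone, 1.0)
```

With `speed=1.0` on a periodic test tone, the interior of the output matches the input sample for sample, which is a quick sanity check. With `speed=1.5` the output is about two-thirds as long, and because each segment is aligned to the continuation of the previous one, the tone keeps its 32-sample period and hence its pitch.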