• Title/Summary/Keyword: voice source model

Selective Low-Pass Filtering Method on Estimation of Voice Source Parameters (음원변수 추출에서 선택적 저역통과필터링)

  • 엄기완
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1998.08a
    • /
    • pp.238-241
    • /
    • 1998
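  • This paper proposes a method for extracting voice source parameters from the glottal waveform, together with a selective low-pass filtering method applied in the preceding step to remove high-frequency noise from the differentiated glottal waveform obtained by inverse filtering. The filter bandwidth is varied according to the segment of the source signal in order to minimize the errors that low-pass filtering can introduce during source parameter extraction. The method was compared and evaluated by synthesizing and filtering pulses of the LF model, one of the voice source models.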

A Feedback Control Model for ABR Traffic with Long Delays (긴 지연시간을 갖는 ABR 트래픽에 대한 피드백제어 모델)

  • O, Chang-Yun;Bae, Sang-Hyeon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.4
    • /
    • pp.1211-1216
    • /
    • 2000
  • Asynchronous transfer mode (ATM) can be used efficiently to transport packet data services. The switching system supports voice and packet data services simultaneously for end-to-end applications. To guarantee the quality of service (QoS) of the offered services, the source rate at which packet data is sent must be controlled under network overload conditions. Most existing control algorithms rely on threshold-based feedback control. However, real-time voice calls can be dynamically connected and released while data services are in progress. If the feedback control information is delayed, the quality of the voice service can degrade because of the time delay between source and destination on a high-speed link. An adaptive algorithm based on the optimal least mean square error technique is presented for predictive feedback control. The algorithm attempts to predict a future buffer size through weight (slope) adaptation of unknown functions, which are used for feedback control. Simulation results demonstrate the effectiveness of the algorithm.
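
The idea of predicting a future buffer size by adapting weights can be illustrated with a small sketch. This is not the paper's algorithm: it uses a generic normalized LMS one-step predictor, and the function name `nlms_predict`, the filter order, and the step size `mu` are all assumptions made for illustration.

```python
def nlms_predict(series, order=2, mu=0.5, eps=1e-12):
    """One-step-ahead prediction of a buffer-occupancy series.

    Generic normalized LMS predictor (an illustrative stand-in, not the
    algorithm from the paper). Returns (predictions, final_weights), where
    predictions[k] is the estimate of series[order + k] made before that
    sample was observed.
    """
    weights = [0.0] * order
    predictions = []
    for n in range(order, len(series)):
        # Most recent sample first.
        window = series[n - order:n][::-1]
        predicted = sum(w * x for w, x in zip(weights, window))
        error = series[n] - predicted
        # Normalizing by the input energy keeps the step size stable.
        norm = eps + sum(x * x for x in window)
        weights = [w + mu * error * x / norm for w, x in zip(weights, window)]
        predictions.append(predicted)
    return predictions, weights


if __name__ == "__main__":
    # Hypothetical buffer occupancy (in cells) that settles at a constant level.
    occupancy = [5.0] * 50
    preds, w = nlms_predict(occupancy)
    print(preds[-1], w)
```

In a feedback loop with long delays, a source could compare such a prediction against a threshold and reduce its rate before the buffer actually overflows.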

A study imitating human auditory system for tracking the position of sound source (인간의 청각 시스템을 응용한 음원위치 추정에 관한 연구)

  • Bae, Jeen-Man;Cho, Sun-Ho;Park, Chong-Kuk
    • Proceedings of the KIEE Conference
    • /
    • 2003.11c
    • /
    • pp.878-881
    • /
    • 2003
  • To acquire a clear voice signal from a designated speaker through a surveillance camera, a video conference system, or a hands-free microphone while eliminating interfering noise, the speaker's position must first be estimated automatically. The basic algorithm for estimating the position of a sound source measures the TDOA (Time Difference Of Arrival) of the same signal at two microphones. This work uses ADF (Adaptive Delay Filter) [4] and CPS (Cross Power Spectrum) [5], two of the most important methods of TDOA analysis. Building on these analyses, it proposes real-time sound source localization and an improved model, NI-ADF, which makes it possible to estimate the source position in both directions. NI-ADF draws on the observation that the human auditory system accepts a sound through activated nerves once the sound exceeds a certain level at a certain frequency. NI-ADF also offers a practical algorithm for real-time localization in both directions: when the microphones are mounted on a given system, it uses the sound level difference caused by diffraction around that system. The existing bidirectional adaptive filter algorithm more than doubles the amount of computation compared with measuring in one direction. To compensate for this weakness, this work proposes an improved algorithm that estimates the position in real time in both directions.
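
The TDOA principle mentioned in this abstract can be sketched as follows. The example uses plain time-domain cross-correlation rather than the ADF, CPS, or NI-ADF methods of the paper, and it assumes a far-field source and two microphones. The function names, the sample rate, and the microphone spacing are hypothetical.

```python
import math


def estimate_delay(ref, delayed, max_lag):
    """Return the lag in samples that maximizes the cross-correlation.

    A positive result means `delayed` arrives later than `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(delayed):
                score += r * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def bearing_from_delay(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Far-field angle of arrival in degrees, measured from broadside."""
    tau = lag / sample_rate
    # Clamp to the valid asin domain in case of measurement error.
    ratio = max(-1.0, min(1.0, speed_of_sound * tau / mic_spacing))
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    mic_a = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0]
    mic_b = [0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0]  # same pulse, 3 samples later
    lag = estimate_delay(mic_a, mic_b, max_lag=5)
    print(lag, bearing_from_delay(lag, sample_rate=16000, mic_spacing=0.2))
```

A single microphone pair yields only an angle relative to the pair's axis, and additional cues such as the level difference described in the abstract are needed to resolve which side the source is on.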

Noise Elimination Using Improved MFCC and Gaussian Noise Deviation Estimation

  • Sang-Yeob, Oh
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.1
    • /
    • pp.87-92
    • /
    • 2023
  • With the continuous development of speech recognition systems, recognition rates have improved rapidly, but such systems still cannot recognize speech accurately when various voices are mixed with noise in the use environment. To increase the vocabulary recognition rate when processing speech with environmental noise, the noise must be removed. Even with the existing HMM, CHMM, GMM, and DNN models used in AI, unexpected noise occurs or quantization noise is added to the digital signal. When this happens, the source signal is altered or corrupted, which lowers the recognition rate. To solve this problem, the MFCC was improved and applied so that the features of the speech signal are extracted efficiently for each voice frame. To remove noise from the speech signal, a noise removal method using a Gaussian model with noise deviation estimation was improved and applied. The proposed model was evaluated using the cross-correlation coefficient as a measure of speech accuracy. Evaluating the recognition rate of the proposed method confirmed that the difference in the average value of the correlation coefficient improved by 0.53 dB.

Influence of User Innovativeness and Knowledge Base on Acceptance of Voice Shopping (사용자의 혁신성 및 지식수준이 가상비서 기반 음성쇼핑의 이용에 미치는 영향)

  • Jo, Woong;Ahn, Suho;Chung, Doohee
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.15 no.2
    • /
    • pp.153-169
    • /
    • 2020
  • A new way of shopping based on virtual assistants, so-called voice shopping, is drawing attention. The voice shopping market is growing around the world, and Korea is on the verge of full-scale commercialization of this new form of shopping. To develop voice shopping-related industries, research is needed on specific issues such as service quality, efficient processes tailored to the new method, and ways to build customer relationships. As part of such an attempt, this study seeks to determine the factors that affect consumers' perceptions of and attitudes toward voice shopping. The analysis is based on survey responses from 171 online shopping users. In addition to the typical factors of the technology acceptance model (TAM), such as perceived usefulness and ease of use, the impact of perceived playfulness was included in analyzing the intention to accept voice shopping. In particular, this study focuses on the impact of user attributes. For voice shopping to spread, it is necessary to define a valid target customer and to understand users in order to establish effective customer relationships. Therefore, this study analyzes how perceptions of voice shopping (perceived usefulness, ease of use, and perceived playfulness) are affected by user attributes such as user innovativeness and user knowledge level. The analysis shows that user innovativeness has a positive relationship with perceived usefulness, ease of use, and perceived playfulness. The user knowledge base, however, was not significant for any of these three variables. The user knowledge base is shown to have a positive effect on user innovativeness, which in turn is a significant positive factor for the perceptions of voice shopping. Meanwhile, among the variables of the extended technology acceptance model, perceived usefulness and perceived playfulness have positive effects on the acceptance of voice shopping, while ease of use has no significant impact on acceptance. Ease of use has a positive relationship with perceived usefulness and playfulness. This study provides implications for the development of voice shopping platforms and related services and for the establishment of customer relationships.

I-vector similarity based speech segmentation for interested speaker to speaker diarization system (화자 구분 시스템의 관심 화자 추출을 위한 i-vector 유사도 기반의 음성 분할 기법)

  • Bae, Ara;Yoon, Ki-mu;Jung, Jaehee;Chung, Bokyung;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.5
    • /
    • pp.461-467
    • /
    • 2020
  • In noisy and multi-speaker environments, the performance of speech recognition is unavoidably lower than in a clean environment. To improve speech recognition, this paper extracts the signal of the speaker of interest from mixed speech signals containing multiple speakers. The VoiceFilter model is used to separate overlapped speech signals effectively. In this work, clustering by Probabilistic Linear Discriminant Analysis (PLDA) similarity score is employed to detect the speech of the speaker of interest, which is then used as the reference speaker for VoiceFilter-based separation. By utilizing the speaker feature extracted from the speech detected by the proposed clustering method, this paper proposes a speaker diarization system that uses only the mixed speech, without an explicit reference speaker signal. A telephone dataset consisting of two speakers is used to evaluate the performance of the speaker diarization system. The Source to Distortion Ratio (SDR) of the operator speech (Rx) and the customer speech (Tx) is 5.22 dB and -5.22 dB, respectively, before separation, and 11.26 dB and 8.53 dB, respectively, after separation with the proposed system.
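
To make the SDR figures easier to read, the following minimal sketch computes a simple energy-ratio form of the metric, $10 \log_{10}(\lVert s \rVert^2 / \lVert s - \hat{s} \rVert^2)$. This is an assumption for illustration: the abstract does not state which SDR definition was used, and toolkits such as BSS Eval apply a more elaborate decomposition. The function name `sdr_db` is hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Energy-ratio SDR in dB between a clean reference and an estimate.

    Simplified definition (assumed here): reference energy divided by the
    energy of the difference between reference and estimate.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(r * r for r in reference)
    if signal_energy == 0.0:
        raise ValueError("reference signal has zero energy")
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if distortion_energy == 0.0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / distortion_energy)


if __name__ == "__main__":
    clean = [1.0, 0.0, -1.0, 0.0]
    separated = [0.9, 0.0, -0.9, 0.0]
    print(sdr_db(clean, separated))
```

Under this definition a negative value, such as the -5.22 dB reported for the customer speech before separation, means the distortion energy exceeds the energy of the target signal.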

A Phonetic Study of 'Sasang Constitution' (음성학적으로 본 사상체질)

  • Moon, Seung-Jae;Tak, Ji-Hyun;Hwang, Hye-Jeong
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.63-66
    • /
    • 2005
  • Sasang Constitution, a branch of oriental medicine, claims that people can be classified into four 'constitutions': Taeyang, Taeum, Soyang, and Soeum. This study investigates whether the constitutions can be classified accurately from voice alone by analyzing data from 46 voices whose constitutions had already been determined. Seven source-related parameters and four filter-related parameters were analyzed phonetically, and a GMM (Gaussian mixture model) was applied to the data. Both the phonetic analyses and the GMM showed that all parameters except one failed to distinguish the constitutions. Even the single exception, the bandwidth of F2, did not provide sufficient grounds to serve as a basis for distinction. This result suggests one of two conclusions: either the Sasang constitutions cannot be substantiated reliably by the phonetic characteristics of people's voices, or other parameters that have not conventionally been proposed need to be found.

A study on deep neural speech enhancement in drone noise environment (드론 소음 환경에서 심층 신경망 기반 음성 향상 기법 적용에 관한 연구)

  • Kim, Jimin;Jung, Jaehee;Yeo, Chaneun;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.3
    • /
    • pp.342-350
    • /
    • 2022
  • In this paper, actual drone noise samples are collected to build a noise-corrupted speech database for speech processing in disaster environments, and speech enhancement performance is evaluated by applying spectral subtraction and mask-based speech enhancement techniques. To improve the performance of VoiceFilter (VF), an existing deep neural network-based speech enhancement model, we apply the Self-Attention operation and use the estimated noise information as input to the Attention model. Compared with the existing VF model, the experimental results show improvements of 3.77%, 1.66%, and 0.32% in Source to Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), respectively. When the model is trained with a 75% mix of speech data with drone sounds collected from the Internet, the relative performance drops for SDR, PESQ, and STOI are 3.18%, 2.79%, and 0.96%, respectively, compared with using only actual drone noise. This confirms that data similar to real data can be collected and used effectively to train speech enhancement models for environments where real data is difficult to obtain.

Design of Gesture based Interfaces for Controlling GUI Applications (GUI 어플리케이션 제어를 위한 제스처 인터페이스 모델 설계)

  • Park, Ki-Chang;Seo, Seong-Chae;Jeong, Seung-Moon;Kang, Im-Cheol;Kim, Byung-Gi
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.1
    • /
    • pp.55-63
    • /
    • 2013
  • User interfaces have evolved from CLI (Command Line Interfaces) through GUI (Graphical User Interfaces) to NUI (Natural User Interfaces). NUI uses many different input modalities, including multi-touch, motion tracking, voice, and stylus. To adapt legacy GUI applications to NUI, developers must add device libraries, modify the relevant source code, and debug it. In this paper, we propose a gesture-based interface model that can be applied to existing event-based GUI applications without modification, and we present an XML schema for specifying the proposed model. A prototype demonstrates how the proposed model is used.

A study on the characterization and traffic modeling of MPEG video sources (MPEG 비디오 소스의 특성화 및 트래픽 모델링에 관한 연구)

  • Jeon, Yong-Hee;Park, Jung-Sook
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.11
    • /
    • pp.2954-2972
    • /
    • 1998
  • The transport of compressed video is expected to become a significant part of total network traffic because of the widespread introduction of multimedia services such as VOD (video on demand). Accordingly, VBR (variable bit-rate) encoded video will be widely used because of its advantages in statistical multiplexing gain and consistent video quality. Since the transport of video traffic requires more bandwidth than that of voice and data, the characterization of video sources and traffic modeling is very important for designing a proper resource allocation scheme in ATM networks. Suitable statistical source models are also required to analyze performance metrics such as packet loss, delay, and jitter. In this paper, we analyze and describe the characterization and traffic modeling of MPEG video sources. The models are broadly classified into two categories: statistical models and deterministic models. The statistical models are categorized into five groups: AR (autoregressive), Markov, composite Markov and AR, TES, and self-similar models. The deterministic models are categorized into the $(\sigma, \rho)$-parameterized model, D-BIND, and Empirical Envelope models. Each model is analyzed for its characteristics along with its advantages and shortcomings, and the models are compared in terms of complexity.
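
As an illustration of the deterministic $(\sigma, \rho)$ model named in this abstract, the sketch below checks whether a frame-size trace satisfies the usual burstiness constraint $A(t) - A(s) \le \sigma + \rho\,(t - s)$ in discrete time. It assumes one arrival value per frame slot and uses the standard equivalence with the backlog of a queue drained at rate $\rho$. The function names and the sample trace are hypothetical and are not taken from the paper.

```python
def min_burst_for_rate(arrivals, rho):
    """Smallest sigma such that the trace is (sigma, rho)-constrained.

    Equals the maximum backlog of a virtual queue that receives arrivals[k]
    units in slot k and is drained at rho units per slot.
    """
    backlog = 0.0
    worst = 0.0
    for amount in arrivals:
        backlog = max(0.0, backlog + amount - rho)
        worst = max(worst, backlog)
    return worst


def conforms(arrivals, sigma, rho):
    """True if every window of the trace carries at most sigma + rho * length."""
    return min_burst_for_rate(arrivals, rho) <= sigma


if __name__ == "__main__":
    # Hypothetical frame sizes: large I-frames followed by smaller frames.
    frames = [5, 1, 1, 5, 5, 0]
    print(min_burst_for_rate(frames, rho=2))
    print(conforms(frames, sigma=7, rho=2), conforms(frames, sigma=6, rho=2))
```

Sweeping `rho` and recording the resulting minimum `sigma` gives the trade-off curve between sustained rate and burst allowance that a resource allocation scheme could use.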
