• Title/Summary/Keyword: Speaker verification (화자검증)

Acoustic Echo Cancellation Based on Convolutive Blind Signal Separation Method (Convolutive 암묵신호분리방법에 기반한 음향반향 제거)

  • Lee, Haeng-Woo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.5 / pp.979-986 / 2018
  • This paper deals with acoustic echo cancellation using a blind signal separation method, which does not degrade echo cancellation performance even during double-talk. In a closed echo environment, the mixing model of the acoustic signals is multi-channel, so a convolutive blind signal separation method is applied; the mixing coefficients are calculated using a feedback model rather than by directly calculating the separation coefficients. The coefficients are updated by iterative calculations based on second-order statistical properties, thereby estimating the near-end speech. A number of simulations were performed to verify the performance of the proposed method. The results show that the acoustic echo canceller using this method operates stably regardless of the presence of double-talk, and that the PESQ score improves by 0.6 points compared with a general adaptive FIR filter structure.

Wide Coverage Microphone System for Lecture Using Ceiling-Mounted Array Structure (천정형 배열 마이크를 이용한 강의용 광역 마이크 시스템)

  • Oh, Woojin
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.4 / pp.624-633 / 2018
  • While multimedia lecture systems have become smarter with emerging technology, the microphone still relies on classical approaches such as being held in the hand or attached to the body. In this paper, we propose a ceiling-mounted array microphone system that provides wide reception coverage and lets instructors move freely without wearing a microphone. Instead of a complicated beamforming method, the proposed system adopts the cell and handover concepts of mobile communication and implements a wide-range microphone over several cells at low cost. Since the characteristics of unvoiced speech are similar to pseudo noise, it is shown that soft handover is possible with 3 microphones connected to a delay-sum multipath receiver. The proposed system is tested in a $6.3 \times 1.5$ m area. For real-time processing, the correlation range can be reduced by 82% or more, and the output latency can be improved by using the delay adaptive filter.
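The delay-sum idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' receiver: the function names, the brute-force lag search, and the `max_lag` parameter (which loosely plays the role of the "correlation range") are all assumptions made for illustration.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (0..max_lag samples) by which sig trails ref,
    chosen as the lag that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate ref against sig advanced by `lag` samples.
        score = sum(r * s for r, s in zip(ref, sig[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + n] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


# Hypothetical two-microphone example: the second microphone hears the
# same source three samples later.
source = [1.0, -2.0, 3.0, -1.0, 2.0]
mic_a = list(source)
mic_b = [0.0, 0.0, 0.0] + source

lag_b = estimate_delay(mic_a, mic_b, max_lag=5)
combined = delay_and_sum([mic_a, mic_b], [0, lag_b])
```

A smaller `max_lag` means fewer correlation terms per update, which is one plausible reading of why narrowing the correlation range helps real-time processing.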

Online blind source separation and dereverberation of speech based on a joint diagonalizability constraint (공동 행렬대각화 조건 기반 온라인 음원 신호 분리 및 잔향제거)

  • Yu, Ho-Gun; Kim, Do-Hui; Song, Min-Hwan; Park, Hyung-Min
    • The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.503-514 / 2021
  • Reverberation in speech signals tends to significantly degrade the performance of the Blind Source Separation (BSS) system. Especially in online systems, the performance degradation becomes severe. Methods based on joint diagonalizability constraints have been recently developed to tackle the problem. To improve the quality of separated speech, in this paper, we add the proposed de-reverberation method to the online BSS algorithm based on the constraints in reverberant environments. Through experiments on the WSJCAM0 corpus, the proposed method was compared with the existing online BSS algorithm. The performance evaluation by the Signal-to-Distortion Ratio and the Perceptual Evaluation of Speech Quality demonstrated that SDR improved from 1.23 dB to 3.76 dB and PESQ improved from 1.15 to 2.12 on average.
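To put the reported SDR figures in perspective, here is a minimal sketch of a simplified Signal-to-Distortion Ratio. It assumes time-aligned reference and estimate signals and treats the whole residual as distortion; standard BSS evaluation toolkits use a more elaborate decomposition, so this is not the metric implementation used in the paper. The function names are hypothetical.

```python
import math


def sdr_db(reference, estimate):
    """Simplified SDR in dB: reference energy over residual energy."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


def db_gain_to_energy_ratio(gain_db):
    """Convert a gain in dB to the corresponding energy ratio."""
    return 10.0 ** (gain_db / 10.0)


# The abstract reports an average SDR change from 1.23 dB to 3.76 dB.
gain = 3.76 - 1.23
ratio = db_gain_to_energy_ratio(gain)  # roughly 1.79
```

Under this simplified definition, a gain of about 2.5 dB means the distortion energy relative to the signal falls by a factor of roughly 1.8.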

The usefulness of the depth images in image-based speech synthesis (영상 기반 음성합성에서 심도 영상의 유용성)

  • Ki-Seung Lee
    • The Journal of the Acoustical Society of Korea / v.42 no.1 / pp.67-74 / 2023
  • Images acquired from a speaker's mouth region reveal unique patterns that correspond to the voices being produced. Based on this principle, several methods have been proposed in which speech signals are recognized or synthesized from images of the speaker's lower face. In this study, an image-based speech synthesis method is proposed in which depth images are used cooperatively. Since depth images provide depth information that cannot be acquired from optical images, they can be used to supplement flat optical images. In this paper, the usefulness of depth images is evaluated from the perspective of speech synthesis. A validation experiment carried out on 60 Korean isolated words confirmed that the performance, in terms of both subjective and objective evaluation, was comparable to that of the optical image-based method. When the two types of images were used in combination, performance improved compared with using either image alone.

Development and Application of the Mode Choice Models According to Zone Sizes (분석대상 규모에 따른 수단분담모형의 추정과 적용에 관한 연구)

  • Kim, Ju-Yeong; Lee, Seung-Jae; Kim, Do-Gyeong; Jeon, Jang-U
    • Journal of Korean Society of Transportation / v.29 no.6 / pp.97-106 / 2011
  • A mode choice model is an essential element for estimating the demand for new means of transportation in the planning stage as well as in the establishment phase. In general, current demand analysis models developed for mode choice analysis apply common utility function parameters to every region, which causes inaccuracy in forecasting mode choice behavior. Using common parameters raises several critical problems. A common parameter set cannot reflect the different distributions of the travel time and travel cost coefficients across populations. Consequently, the resulting model fails to accurately explain policy variables such as travel time and travel cost. In particular, the nonlinear logit model applied to aggregate data is vulnerable to aggregation error. The purpose of this paper is to account for regional characteristics by adopting parameters fitted to each area, so as to reduce prediction errors and enhance the accuracy of the resulting mode choice model. To estimate the parameters for each area, this study used the Household Travel Survey data of the Metropolitan Transportation Authority. To verify the model, the value of time is evaluated as a marginal rate of substitution, and a statistical test of the resulting coefficients is carried out. To cross-check the applicability and reliability of the model, changes in mode choice are analyzed for the opening of Seoul subway line 9, and the results are compared with those from the existing model developed without considering regional characteristics.
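The value-of-time check mentioned in this abstract can be sketched for a linear-in-parameters logit utility, where the marginal rate of substitution between time and cost is the ratio of their coefficients. The coefficients, travel times, and costs below are made-up illustrative values, not estimates from the paper, and the function names are hypothetical.

```python
import math


def mode_shares(utilities):
    """Multinomial logit choice probabilities from systematic utilities."""
    top = max(utilities.values())  # subtract the max for numerical stability
    exp_u = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(exp_u.values())
    return {mode: value / total for mode, value in exp_u.items()}


def value_of_time(beta_time, beta_cost):
    """Marginal rate of substitution between travel time and travel cost."""
    return beta_time / beta_cost


# Hypothetical regional coefficients (utility per minute, utility per won).
BETA_TIME = -0.04
BETA_COST = -0.0004

# Hypothetical trip attributes: (travel time in minutes, cost in won).
trips = {"car": (40.0, 3000.0), "subway": (50.0, 1500.0)}
utilities = {
    mode: BETA_TIME * minutes + BETA_COST * won
    for mode, (minutes, won) in trips.items()
}

shares = mode_shares(utilities)
vot_won_per_minute = value_of_time(BETA_TIME, BETA_COST)  # about 100 won/min
```

Fitting a separate coefficient pair per region changes both the predicted shares and the implied value of time, which is why the latter is a convenient plausibility check on each regional model.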

Automatic speech recognition using acoustic doppler signal (초음파 도플러를 이용한 음성 인식)

  • Lee, Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.35 no.1 / pp.74-82 / 2016
  • In this paper, a new automatic speech recognition (ASR) method is proposed in which ultrasonic Doppler signals are used instead of conventional speech signals. The proposed method has advantages over conventional speech-based and non-speech-based ASR, including robustness against acoustic noise and the user comfort associated with a non-contact sensor. In the proposed method, a 40 kHz ultrasonic signal is radiated toward the mouth and the reflected ultrasonic signals are received. The frequency shift caused by the Doppler effect is used to implement ASR. Unlike the previous method, which employed a single-channel ultrasonic signal, the proposed method employs multi-channel ultrasonic signals acquired from various locations. PCA (Principal Component Analysis) coefficients are used as the ASR features, and a left-right hidden Markov model (HMM) is adopted. To verify the feasibility of the proposed ASR, a speech recognition experiment was carried out on 60 Korean isolated words obtained from six speakers. The experimental results showed that the overall word recognition rates were comparable with those of conventional speech-based ASR methods and that the proposed method outperformed the conventional single-channel ASR method. In particular, an average recognition rate of 90% was maintained in noisy environments.
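As a rough sense of scale for the Doppler shifts involved, the sketch below uses the standard two-way approximation for a wave reflected from a slowly moving surface, with the transmitter and receiver assumed to be co-located. The 0.1 m/s articulator speed and the speed of sound are illustrative assumptions, not values from the paper.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def reflected_doppler_shift(carrier_hz, surface_velocity, c=SPEED_OF_SOUND):
    """Approximate frequency shift (Hz) of a wave reflected from a surface
    moving toward the sensor at surface_velocity (m/s).

    Uses delta_f ~= 2 * v * f0 / c, which holds when v is much smaller than c.
    A negative velocity (surface moving away) gives a negative shift.
    """
    return 2.0 * surface_velocity * carrier_hz / c


# 40 kHz carrier as in the abstract; hypothetical lip speed of 0.1 m/s.
shift_hz = reflected_doppler_shift(40_000.0, 0.1)  # about 23.3 Hz
```

Shifts of this size sit in a narrow band around the carrier, far from the audible range, which is consistent with the robustness to acoustic noise claimed in the abstract.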

Development and validation of Speech Range Profile task (발화범위 프로파일 과제 개발 및 타당성 검증)

  • Kim, Jaeock; Lee, Seung Jin
    • Phonetics and Speech Sciences / v.11 no.3 / pp.77-87 / 2019
  • The study aimed to develop the Speech Range Profile (SRP) and to examine and validate its clinical application. Forty-five participants without voice disorders, aged 18-29 years, were assessed with both the SRP and the Voice Range Profile (VRP). The authors developed the "Fire!" paragraph as an SRP task comprising 14 sentences that include all Korean spoken phonemes and sentence types. To compare SRP and VRP results, the participants read the paragraph (reading) and counted from 21 to 30 (counting) as SRP tasks, and produced the vowel /a/ from low to high frequencies (gliding) and a shortened form of the VRP as VRP tasks. $F0_{max}$, $F0_{min}$, $F0_{range}$, $I_{max}$, $I_{min}$, and $I_{range}$ were measured for each task and compared, showing that $F0_{max}$, $F0_{min}$, $F0_{range}$, $I_{max}$, and $I_{range}$ did not differ between reading and gliding. $I_{min}$ had the lowest value in counting. It is concluded that the newly developed SRP task, reading the "Fire!" paragraph, can yield a maximum phonation range similar to that found by the VRP. Therefore, it is expected that voice evaluation can be performed effectively in a relatively short time by applying the SRP with the "Fire!" paragraph, a functional utterance task, in place of the VRP, which may be difficult to measure long term or in cases of severe voice disorders.

Implicit Interpretation of Advertising Content Language and Possible Connection of Media Literacy Education (미디어콘텐츠 언어의 암묵적 의미 해석과 미디어 리터러시 교육의 연계 가능성)

  • Lim, Ji-Won
    • Journal of Korea Entertainment Industry Association / v.15 no.3 / pp.243-250 / 2021
  • The purpose of this study is to discuss, from a communication perspective, the implicit meaning of advertising content, which uses highly persuasive language formats, and its interpretation process in relation to communication education, while developing interpretative codes for media literacy education in modern society. For this discussion, I treated the narrative content of advertising, which serves a special purpose, as a general conversational act and raised anew the issue that regularities exist in implicit semantic expressions. I also argue that, for media literacy education in today's society to proceed correctly, the linguistic interpretation of implicit meaning cannot be guided solely by the communication principles of prior research. As a solution, I confirm that shared socio-cultural knowledge and recognition are essential interpretation codes. For further discussion, special-purpose advertising media language was analyzed in terms of language usage to verify the process of interpreting its implicit meaning. After analyzing the implicit advertising language that I arbitrarily typified, I found that implicit linguistic meaning carrying the speaker's persuasive intent can mostly be offered in media literacy education as a framework for analysis based on various informational and cognitive effects. In other words, recipients should not rely on literal interpretation alone when interpreting the implicit meaning inherent in media language. If guidance includes native-language materials, background knowledge, socio-cultural customs, and general common knowledge, efficient media literacy education can be expected.

A method of wall absorption treatment for enhancing the speech intelligibility at a directional microphone array in a room (실내 공간 내 지향성 마이크 어레이에서의 음성 명료도 개선을 위한 벽면 흡음 처리 방법)

  • Ko, Byeong-Yun; Ih, Jeong-Guon; Cho, Wan-Ho
    • The Journal of the Acoustical Society of Korea / v.40 no.6 / pp.649-659 / 2021
  • Wall absorption treatment effectively reduces reverberation, but a live room requires a large treated area, and absorption on each wall affects speech intelligibility differently. In this study, we try to find the wall where absorption treatment is most effective for speech intelligibility at a beamforming microphone array. The absorption importance factor is defined using the number of collisions of reflected sounds with each wall. It allows estimating how much the speech signal will be enhanced by the absorption treatment. A cuboid room with a volume of 107 m³ and a reverberation time of 1.1 s is selected for the simulation. When a Helmholtz-type absorber is applied to the wall with the highest importance factor, the modified clarity at 500 Hz and 1 kHz improves by 5.1 dB and 4.8 dB, respectively, and the speech transmission index increases by 0.06. The difference in results between the proposed method and commercial simulation code is less than a Just-Noticeable Difference (JND). Absorption treatment on the wall with the highest importance factor yields a greater improvement than treatment on the wall with the largest area, and the difference exceeds one JND.

Comparison of Korean Speech De-identification Performance of Speech De-identification Model and Broadcast Voice Modulation (음성 비식별화 모델과 방송 음성 변조의 한국어 음성 비식별화 성능 비교)

  • Seung Min Kim; Dae Eol Park; Dae Seon Choi
    • Smart Media Journal / v.12 no.2 / pp.56-65 / 2023
  • In broadcasts such as news and coverage programs, voices are modulated to protect the identity of informants. Pitch adjustment is a commonly used voice modulation method, but the original voice can easily be restored by adjusting the pitch back. Because broadcast voice modulation methods therefore cannot properly protect the speaker's identity and are weak in terms of security, a new voice modulation method is needed to replace them. In this paper, using the Lightweight speech de-identification model as the evaluation target, we compare its speech de-identification performance with that of the broadcast voice modulation method based on pitch modulation. Of the six modulation methods in the Lightweight speech de-identification model, we used three, McAdams, Resampling, and Vocal Tract Length Normalization (VTLN), and evaluated their de-identification performance on Korean speech with a human test and an EER (Equal Error Rate) test, in comparison with broadcast voice modulation. Experimental results show that the VTLN modulation method achieved higher de-identification performance in both the human tests and the EER tests. As a result, the modulation methods of the Lightweight model have sufficient de-identification performance for Korean speech and will be able to replace the security-weak broadcast voice modulation.
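For readers unfamiliar with the EER test named in this abstract, the sketch below shows one simple way to compute an Equal Error Rate from speaker verification scores. It is a generic illustration with made-up scores, not the evaluation pipeline of the paper; the function name and the threshold sweep over observed scores are assumptions.

```python
def compute_eer(genuine_scores, impostor_scores):
    """Return (eer, threshold) at the score threshold where the false
    acceptance rate and false rejection rate are closest.

    A trial is accepted when its score is >= threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # Impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # Genuine trials wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Made-up similarity scores from a hypothetical speaker verification system.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]

eer, threshold = compute_eer(genuine, impostor)  # eer == 0.25 at 0.6
```

In de-identification studies, a higher EER on the modulated speech is generally read as better protection, since it means the verification system has a harder time matching the modulated voice to the original speaker.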