• Title/Summary/Keyword: 음성다중 (voice multiplex)

The usefulness of the depth images in image-based speech synthesis (영상 기반 음성합성에서 심도 영상의 유용성)

  • Ki-Seung Lee
    • The Journal of the Acoustical Society of Korea / v.42 no.1 / pp.67-74 / 2023
  • Images acquired from a speaker's mouth region reveal unique patterns that correspond to the sounds being produced. Based on this principle, several methods have been proposed that recognize or synthesize speech signals from images of the speaker's lower face. In this study, an image-based speech synthesis method is proposed that additionally uses depth images. Because depth images provide depth information that cannot be obtained from optical images, they can supplement flat optical images. This paper evaluates the usefulness of depth images for speech synthesis. A validation experiment on 60 isolated Korean words confirmed that performance in both subjective and objective evaluations was comparable to that of the optical image-based method. When the two image types were used in combination, performance improved compared with using either one alone.

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea / v.20 no.1 / pp.56-61 / 2001
  • This paper presents a real-time implementation of an AMR (Adaptive Multi Rate) vocoder that is included in the asynchronous International Mobile Telecommunication (IMT)-2000 mobile ASIC. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2 kbps down to 4.75 kbps. In addition to the encoder and decoder, which are the basic functions of the vocoder, VAD (Voice Activity Detection), SCR (Source Controlled Rate) operation, and frame structuring blocks for the system interface are also implemented. The DSP used for the AMR vocoder is a 16-bit fixed-point DSP that is based on the TeakLite core and consists of a memory block, a serial interface block, register files for the parallel interface with the CPU, and interrupt control logic. Through the implementation, we reduce the maximum operating complexity to 24 MIPS by efficiently managing the memory structure. The AMR vocoder is verified against all the test vectors provided by 3GPP, and stable operation on a real-time test board is also demonstrated.
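As a rough illustration of what "8 modes from 12.2 kbps down to 4.75 kbps" means per speech frame, the sketch below converts each mode's bit rate into bits per frame. The six intermediate rates and the 20 ms frame length are taken from the 3GPP AMR narrowband specification, not from the abstract, and the names `AMR_MODES_KBPS` and `bits_per_frame` are illustrative only.

```python
# AMR narrowband mode bit rates in kbps. The abstract states only the
# endpoints (12.2 and 4.75); the six intermediate rates are assumed here
# from the 3GPP AMR specification.
AMR_MODES_KBPS = [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75]

# Speech frame length in milliseconds (assumption from the 3GPP
# specification, not stated in the abstract).
FRAME_MS = 20


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Return the number of coded speech bits in one frame.

    kbit/s multiplied by ms gives bits directly; round() removes
    floating-point noise such as 158.99999999999997.
    """
    return round(rate_kbps * frame_ms)


# Map each mode's bit rate to its frame size in bits.
frame_bits = {rate: bits_per_frame(rate) for rate in AMR_MODES_KBPS}

for rate, bits in frame_bits.items():
    print(f"{rate:5.2f} kbps -> {bits} bits per {FRAME_MS} ms frame")
```

The frame sizes matter for a fixed-point DSP implementation because they set the buffer sizes that the frame structuring blocks have to handle in each mode.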

An Adjustable Round Robin Scheduling Algorithm for the High Data Rate Mobile Communication System (고속 이동 통신을 위한 적응 가능한 라운드 로빈 스케줄링 방식)

  • Bae, Jeong-Min;Song, Young-Keum;Kim, Dong-Woo
    • Journal of KIISE:Information Networking / v.34 no.1 / pp.27-32 / 2007
  • Next-generation wireless networks are expected to support a wide range of services, including high-rate data applications. Different service types require differentiated QoS (Quality of Service) guarantees such as minimum data rate, accuracy, and fairness. Although radio resources are limited, many applications require that certain QoS targets be met. In this paper, we propose a QoS-based scheduling algorithm for next-generation systems, based on an analysis of previous research, and we develop the proposed algorithm specifically for MIMO (Multi-Input Multi-Output) systems. Moreover, we prove that the proposed algorithm optimizes throughput relative to prespecified target values and converges to a certain throughput.

Design of Digital Transmultiplexing System for PRS Transmission (PRS 전송 방식을 위한 디지털 변환다중장치의 설계)

  • 오용선;강창언
    • The Journal of Korean Institute of Communications and Information Sciences / v.14 no.4 / pp.423-434 / 1989
  • In this paper, a PRS transmission system using TMRCP as the unit pulse is proposed, which solves the problems that occur when the PRS method is applied to each channel of a digital transmultiplexer. A design technique that applies this PRS method to the FFT polyphase filter transmultiplexer concept is also given. The TMRCP-PRS signal requires a bandwidth of about 2.5 kHz (including some guard band) for a 4-kHz band-limited voice channel. Therefore, on a 24-channel transmission line, it offers the same advantages as the ordinary PRS system and solves the inter-channel interference problem. Its good speed tolerance also reduces the timing errors caused by the environment, as well as the power loss, which makes the system stable. The total system, however, attaches filters for PCM-PRS and PRS-PCM conversion before and after the transmultiplexer, respectively.

Erlang Capacity for the Reverse Link of a DS/CDMA Cellular System Supporting Voice and Data Service in Rayleigh Fading Channel (레일레이 페이딩 채널에서 음성 및 데이터 서비스를 지원하는 DS/CDMA 셀룰라 시스템의 역방향 링크에 대한 얼랑 용량)

  • Kim, Hang-Rae;Kim, Nam
    • Journal of the Institute of Electronics Engineers of Korea TC / v.38 no.12 / pp.20-28 / 2001
  • In this paper, an extended blocking probability formula for the reverse link of a DS/CDMA cellular system supporting voice and data services is derived for the Rayleigh fading channel. Voice and data Erlang capacities considering shadowing only are also analyzed and compared with those considering both shadowing and multipath fading. With the blocking probability set to 1% in the Rayleigh fading channel, a voice Erlang capacity of 13.38 Erlangs and a data Erlang capacity of 8.92 Erlangs are supported at the data rate $R_b$=9.6 kbps, and a voice Erlang capacity of 7.47 Erlangs and a data Erlang capacity of 4.98 Erlangs are supported at $R_b$=14.4 kbps. These values are 21.4% lower for $R_b$=9.6 kbps and 24.9% lower for $R_b$=14.4 kbps than the Erlang capacity considering shadowing only. This shows that the effect of multipath fading must not be ignored. The paper also presents the accurate voice and data Erlang capacity that the DS/CDMA cellular system can support.

Analysis of Korean Spontaneous Speech Characteristics for Spoken Dialogue Recognition (대화체 연속음성 인식을 위한 한국어 대화음성 특성 분석)

  • 박영희;정민화
    • The Journal of the Acoustical Society of Korea / v.21 no.3 / pp.330-338 / 2002
  • Spontaneous speech is ungrammatical and exhibits severe phonological variations, which makes recognition extremely difficult compared with read speech. In this paper, for conversational speech recognition, we analyze transcriptions of real conversational speech and classify its characteristics from the viewpoint of speech recognition. Reflecting these features, we build a baseline system for conversational speech recognition. The classification consists of long silences, disfluencies, and phonological variations, each of which groups similar features. To deal with these characteristics, first, we update the silence model and add a filled pause model and a garbage model; second, we add multiple phonetic transcriptions to the lexicon for the most frequent phonological variations. In our experiments, the baseline morpheme error rate (MER) is 31.65%; we obtain MER reductions of 2.08% for the silence and garbage models, 0.73% for the filled pause model, and 0.73% for phonological variations. Finally, we obtain a 27.92% MER for conversational speech recognition, which will be used as a baseline for further study.

Multi-Modal Biometrics System for Ubiquitous Sensor Network Environment (유비쿼터스 센서 네트워크 환경을 위한 다중 생체인식 시스템)

  • Noh, Jin-Soo;Rhee, Kang-Hyeon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.44 no.4 s.316 / pp.36-44 / 2007
  • In this paper, we implement a speech and face recognition system to support various ubiquitous sensor network application services, such as switch control and authentication, using a wireless audio and image interface. The proposed system consists of hardware with audio and image sensors and software such as a speech recognition algorithm using a psychoacoustic model and a face recognition algorithm using PCA (Principal Components Analysis) and LDPC (Low Density Parity Check). The proposed speech and face recognition systems are placed in a host PC to use the sensor energy effectively. To improve the accuracy of speech and face recognition, we implement an FEC (Forward Error Correction) system. We also optimize the simulation coefficients and the test environment to effectively remove wireless channel noise and correct wireless channel errors. As a result, when the distance between the audio sensor and the voice source is less than 1.5 m, FAR and FRR are 0.126% and 7.5%, respectively. With the face recognition step limited to two repetitions, GAR and FAR are 98.5% and 0.036%, respectively.

Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법)

  • Kim, Min-A;Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Cho, Sung-Eui;Lee, Seong-Ro
    • MALSORI / no.65 / pp.93-103 / 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.
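The idea of a Levenshtein-based confusability measure can be sketched as follows. The abstract does not give the exact formula, so the length normalization in `confusability` is only one plausible way of taking the phoneme count into account; it is an assumption and should not be read as the authors' definition. The phoneme symbols in the example are likewise hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (lists of symbols).

    Standard dynamic programming with unit cost for insertion,
    deletion, and substitution, keeping only two rows in memory.
    """
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def confusability(a, b):
    """Hypothetical length-normalized confusability in [0, 1].

    1.0 means the two variants are identical and 0.0 means every
    phoneme differs. Dividing by the longer length is an assumed
    stand-in for the paper's phoneme-count compensation.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Two made-up pronunciation variants that differ in one phoneme.
variant_1 = ["h", "a", "k", "k", "yo"]
variant_2 = ["h", "a", "k", "g", "yo"]
print(levenshtein(variant_1, variant_2))
print(confusability(variant_1, variant_2))
```

Under this sketch, a pruning step would drop variants whose confusability with a variant of another word exceeds a chosen threshold; the threshold itself would have to be tuned on held-out data.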

Efficient Mixture IMM Algorithm for Speech Enhancement under Nonstationary Additive Colored Noise (시변가산유색잡음하의 음성 향상을 위한 효율적인 Mixture IMM 알고리즘)

  • 이기용;임재열
    • The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.42-47 / 1999
  • In this paper, a mixture interacting multiple model (MIMM) algorithm is proposed to enhance speech contaminated by additive nonstationary noise. In this approach, a mixture hidden filter model (HFM) is used to model the clean speech, and the noise process is modeled by a single hidden filter. The MIMM algorithm, however, requires a large computation time because it is a recursive method based on multiple Kalman filters with the mixture HFM. Therefore, a computationally efficient implementation of the algorithm is developed by exploiting the structure of the Kalman filtering equation. The simulation results show that the proposed method offers a performance gain over the previous results in [4,5] with slightly increased complexity.

Performance Evaluation of AAL2 Bandwidth Gain on $I_{ub}$ in UMTS Network (UMTS망의 $I_{ub}$에서 AAL2 대역이득 성능평가)

  • 이현진;김재현
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.8B / pp.739-746 / 2004
  • ATM/AAL2 is standardized to efficiently transmit delay-sensitive application services that use small packets. The AAL2 transmission scheme is used to deliver voice and data traffic on the $I_{ub}$ interface between the base station (Node-B) and the Radio Network Controller (RNC) in a UMTS network. To predict AAL2 performance, a detailed end-to-end UMTS network performance simulator was developed. We performed detailed simulations (cell packing density and bandwidth gain) for voice and data services in UTRAN. The results indicate that the maximum bandwidth gain in Node-B is about 17% and that the bandwidth gain of AAL2 multiplexing on $I_{ub}$ for data services is less than that for voice service. Furthermore, the more the offered load increases, the more the bandwidth gain decreases in a concentrator.