Search | Korea Science

Variational autoencoder for prosody-based speaker recognition

Starlet Ben Alex;Leena Mary
ETRI Journal / v.45 no.4 / pp.678-689 / 2023
This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F₀), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.
https://doi.org/10.4218/etrij.2021-0377
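The abstract does not state the training objective. As a minimal sketch, assuming the standard VAE formulation with a diagonal Gaussian posterior and a standard normal prior, the loss combines a reconstruction term with a closed-form KL regularizer. The squared-error reconstruction term and all function names are illustrative assumptions; `x` stands for a syllable-level prosodic feature vector.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def negative_elbo(x, x_hat, mu, log_var):
    """Squared-error reconstruction term plus the KL regularizer.

    Assumption: squared error stands in for the reconstruction likelihood;
    the paper may use a different term or weighting.
    """
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + kl_diag_gaussian_to_standard_normal(mu, log_var)
```

The KL term is zero only when the encoder outputs exactly the prior, which is what distinguishes this objective from that of a plain autoencoder.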

Context-Weighted Metrics for Example Matching (문맥가중치가 반영된 문장 유사 척도)

Kim, Dong-Joo;Kim, Han-Woo
Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.6 s.312 / pp.43-51 / 2006
This paper proposes a metric for example matching in example-based machine translation from English to Korean. The metric, which serves as a similarity measure, is based on the edit-distance algorithm and is used to retrieve the example sentences most similar to a given query. It relies on simple information, such as the lemma and part of speech of typographically mismatched words. The edit-distance algorithm cannot fully reflect the context of matched word units: as long as the matched word units are in order, it treats a full matching context as contributing the same similarity as a partial matching context in which mismatched word units intervene. To overcome this drawback, we propose a context-weighting scheme that uses the contiguity of matched word units to capture the full context. We normalize the edit-distance metric, which represents dissimilarity, in order to convert it into a similarity metric, apply the context-weighted metric to example matching, and rank examples by similarity. In addition, we generalize previous methods that use linguistic information into one representative system. To verify the proposed context-weighted metric, we carry out an experiment comparing it with the generalized previous methods.
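The drawback described in the abstract can be illustrated with a small sketch. It computes a word-level edit distance, normalizes it into a similarity, and blends in a contiguity term. The blend (`context_weight` and the longest-run ratio) is a hypothetical stand-in for illustration only, not the weighting scheme of the paper, and the lemma and part-of-speech handling is omitted.

```python
def word_edit_distance(a, b):
    """Levenshtein distance over word tokens (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def longest_contiguous_match(a, b):
    """Length of the longest run of consecutive words shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for wa in a:
        curr = [0]
        for j, wb in enumerate(b, start=1):
            run = prev[j - 1] + 1 if wa == wb else 0
            curr.append(run)
            best = max(best, run)
        prev = curr
    return best


def similarity(a, b, context_weight=0.0):
    """Normalized similarity in [0, 1]; context_weight is a hypothetical blend factor."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    plain = 1.0 - word_edit_distance(a, b) / longest
    contiguity = longest_contiguous_match(a, b) / longest
    return (1.0 - context_weight) * plain + context_weight * contiguity
```

For the query `a b c d`, the examples `a b c x` and `a x c d` both have edit distance 1, so the plain similarity cannot separate them. With `context_weight=0.5`, the first example scores higher because its matched words form one unbroken run.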

Unsupervised Incremental Learning of Associative Cubes with Orthogonal Kernels

Kang, Hoon;Ha, Joonsoo;Shin, Jangbeom;Lee, Hong Gi;Wang, Yang
Journal of the Korean Institute of Intelligent Systems / v.25 no.1 / pp.97-104 / 2015
An 'associative cube', a class of auto-associative memories, is revisited here, in which training data and hidden orthogonal basis functions, such as wavelet packets or Fourier kernels, are combined in the weight cube. This weight cube has hidden units in its depth, represented by a three-dimensional cubic structure. We develop an unsupervised incremental learning mechanism based on the adaptive least-squares method. Training data are mapped into orthogonal basis vectors in a least-squares sense by updating the weights that minimize an energy function. As a result, a prescribed orthogonal kernel is incrementally assigned to each incoming data item. Next, we show how a decoding procedure finds the closest match with a competitive network in the hidden layer. When noisy test data are applied to an associative cube, the nearest item among the original training data is restored in an optimal sense. The simulation results confirm the robustness of associative cubes even when the test data are heavily distorted by various types of noise.
https://doi.org/10.5391/JKIIS.2015.25.1.097

Stopband-Extended and Size-Miniaturized Low-Pass Filter with Three Transmission Zeros

Li, Lin;Bao, Jia;Du, Jing-Jing;Wang, Yaming
ETRI Journal / v.36 no.2 / pp.286-292 / 2014
This paper presents a compact structure composed of an upper high-impedance transmission line, a middle extended parallel coupled line, and a pair of inter-coupled symmetrical stepped impedance stubs. Detailed investigation into this structure based on an equivalent circuit analysis reveals that this proposed structure exhibits a quasi-elliptic low-pass filtering response with three transmission zeros. Moreover, the positions of the three transmission zeros can be tuned and reallocated flexibly by choosing the proper circuit parameters. Finally, the design concept is validated through the design, fabrication, and measurement of two exemplary low-pass filters (LPFs) with one single unit and two cascaded asymmetric units. The measured results agree well with the simulated results. In addition, in the range of $1.42f_c$ to $7.03f_c$, the fabricated quasi-elliptic LPFs experimentally demonstrate a very wide upper-stopband of 20 dB using a compact size of only $0.0089\lambda_g^2$, where $\lambda_g$ is the guided wavelength of a $50\ \Omega$ transmission line at the central frequency.
https://doi.org/10.4218/etrij.14.0113.0430

The Analysis of Flight Data Processing System (비행자료 처리시스템 분석)

Kim, Do-woo;Oh, Seung Hee;Lee, Deok Gyu;Lee, Seoung Hyeon;Han, Jong-wook
Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.05a / pp.785-788 / 2009
The flight data processing system processes and manages all flight-related data for aircraft control and performs trajectory modeling. It carries out the core function of the integrated information processing system for flight control. For safe flight, the transfer and exchange of information among air traffic control units through flight data processing is essential. Therefore, to support the development of a flight data processing system, this paper analyzes its functions and examines the considerations necessary in its design.

AB9: A neural processor for inference acceleration

Cho, Yong Cheol Peter;Chung, Jaehoon;Yang, Jeongmin;Lyuh, Chun-Gi;Kim, HyunMi;Kim, Chan;Ham, Je-seok;Choi, Minseok;Shin, Kyoungseon;Han, Jinho;Kwon, Youngsu
ETRI Journal / v.42 no.4 / pp.491-504 / 2020
We present AB9, a neural processor for inference acceleration. AB9 consists of a systolic tensor core (STC) neural network accelerator designed to accelerate artificial intelligence applications by exploiting the data reuse and parallelism characteristics inherent in neural networks while providing fast access to large on-chip memory. Complementing the hardware is an intuitive and user-friendly development environment that includes a simulator and an implementation flow that provides a high degree of programmability with a short development time. Along with a 40-TFLOP STC that includes 32k arithmetic units and over 36 MB of on-chip SRAM, our baseline implementation of AB9 consists of a 1-GHz quad-core setup with other various industry-standard peripheral intellectual properties. The acceleration performance and power efficiency were evaluated using YOLOv2, and the results show that AB9 has superior performance and power efficiency to that of a general-purpose graphics processing unit implementation. AB9 has been taped out in the TSMC 28-nm process with a chip size of 17 × 23 mm². Delivery is expected later this year.
https://doi.org/10.4218/etrij.2020-0134

A 3D facial Emotion Editor Using a 2D Comic Model (2D 코믹 모델을 이용한 3D 얼굴 표정 에디터)

이용후;김상운;청목유직
Proceedings of the IEEK Conference / 2000.06d / pp.226-229 / 2000
A 2D comic model, a comic-style line-drawing model having only eyebrows, eyes, a nose, and a mouth, makes it much easier to generate facial expressions with a small number of points than a 3D model does. In this paper, we propose a 3D emotion editor using a 2D comic model, in which emotional expressions are represented using the action units (AUs) of FACS. Experiments suggest that the proposed method could be used efficiently for intelligent sign-language communication between avatars of different languages in Internet cyberspace.

An introduction of FTTH Passive Optical Network and Deployment Strategy (FTTH 수동 광가입자망 기술 소개 및 진화 방안)

Kim, Chong-Ahn;Kim, Dae-Young
Proceedings of the IEEK Conference / 2005.11a / pp.253-256 / 2005
In this paper, we explain various fiber-to-the-home (FTTH) technologies and summarize the status of the important standardization efforts. Passive optical networks, namely WDM-PON, Ethernet PON, and/or Gigabit-PON, will mainly be deployed in populated subscriber areas and multiple dwelling units, where they offer a significant OPEX advantage. Finally, we discuss an FTTH deployment strategy with low capital cost.

Color Reproduction Simulator Using Standard Display (표준 디스플레이를 이용한 발색 시뮬레이터)

Park, Gyun-Deuk
Proceedings of the IEEK Conference / 2005.11a / pp.419-422 / 2005
In this paper, a display color characteristic simulation algorithm is proposed for nonstandard display units under development. In this algorithm, a signal transformation matrix is calculated from the transfer characteristic of a nonstandard display unit so that it reproduces the same color as a standard CRT display. The proposed algorithm can be used to simulate various color reproduction characteristics and to improve the performance of the nonstandard display.
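The abstract does not give the form of the signal transformation matrix. One common way to obtain such a matrix, sketched here under strong assumptions, is to model each display as a 3×3 matrix from linear RGB drive signals to a device-independent color space and to solve for the matrix `T` that makes one display show the colors of the other. The function names, the purely linear model (no gamma or tone curve), and which display plays which role are all assumptions, not details from the paper.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_inv(m):
    """Inverse of a 3x3 matrix via the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if det == 0:
        raise ValueError("display matrix is singular")
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def signal_transform(reference, shown_on):
    """Matrix T such that shown_on @ (T @ rgb) equals reference @ rgb.

    reference: RGB-to-color matrix of the display whose colors are wanted.
    shown_on:  RGB-to-color matrix of the display that actually emits light.
    """
    return mat_mul(mat_inv(shown_on), reference)
```

Applying `T` to the input signal before it reaches the emitting display makes its output match the reference display for every color both can produce. Signals that fall outside the range 0 to 1 after the transform indicate colors outside the emitting display's gamut.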

Power Operation Accelerator to speed up lighting in 3D graphics

Young-Su Kwon;In-
Proceedings of the IEEK Conference / 1998.10a / pp.1129-1132 / 1998
This paper presents the design of special hardware developed to speed up the floating-point power operations that are heavily used in the lighting stage to calculate the specular term in 3D graphics geometry engines. The power operation takes just 4 cycles in our floating-point multiplier, whereas it takes about 100-200 cycles in conventional floating-point units. Although an approximation algorithm is employed in the power operation to reduce the required hardware complexity, the error of the power value from the developed floating-point multiplier is so small that the human eye cannot perceive any difference.
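The abstract does not describe the approximation algorithm itself. The sketch below only shows where the power operation arises in lighting and the standard identity x^y = 2^(y·log2 x) by which a power can be reduced to a logarithm, a multiplication, and an exponential. The Phong-style specular formula and the function names are assumptions chosen for illustration.

```python
import math


def power_via_log2(base, exponent):
    """x**y computed as 2**(y * log2(x)) for base > 0.

    Non-positive bases are clamped to 0.0, a lighting convention that
    assumes a positive exponent.
    """
    if base <= 0.0:
        return 0.0
    return 2.0 ** (exponent * math.log2(base))


def specular_term(n_dot_h, shininess):
    """Phong-style specular factor: max(N.H, 0) raised to the shininess."""
    return power_via_log2(max(n_dot_h, 0.0), shininess)
```

A hardware unit can replace the exact logarithm and exponential with cheaper approximations, because the result only scales a highlight intensity, and small errors there are hard to see.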
