Search | Korea Science

Parameter Generation Algorithm for LSTM-RNN-based Speech Synthesis (LSTM-RNN 기반 음성합성을 위한 파라미터 생성 알고리즘)

Park, Sangjun;Hahn, Minsoo
- Proceedings of the Korean Society of Broadcast Engineers Conference / 2017.06a / pp.105-106 / 2017
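In this paper, we propose a method that improves the accuracy and naturalness of the acoustic parameter sequence output by an artificial neural network by applying a maximum-likelihood parameter generation algorithm. The network outputs dynamic feature vectors in addition to static feature vectors, and precomputed parameter variances are used in parameter generation. The means and variances of the estimated static and dynamic feature vectors are applied to the EM algorithm to estimate the maximum-likelihood parameters. By applying the dynamic feature vectors and variances together during parameter generation, the proposed algorithm improves naturalness along the time axis. For objective evaluation, we measured MCD and the RMSE of F0; for subjective evaluation, we conducted a preference test. The results confirm that the proposed algorithm improves both objective and subjective performance over the conventional algorithm.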
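The idea of generating a trajectory from static and dynamic means and variances can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional parameter, a single Gaussian per frame (so the maximum-likelihood trajectory has a closed-form weighted least-squares solution rather than requiring EM iterations), and a hypothetical delta window of 0.5 * (c[t+1] - c[t-1]) with clamped edges. The function names `build_window_matrix` and `mlpg` are illustrative.

```python
import numpy as np

def build_window_matrix(T):
    """Stack the static (identity) and delta windows into a (2T x T) matrix W."""
    static = np.eye(T)
    delta = np.zeros((T, T))
    for t in range(T):
        # Assumed delta window: 0.5 * (c[t+1] - c[t-1]), indices clamped at the edges.
        delta[t, min(t + 1, T - 1)] += 0.5
        delta[t, max(t - 1, 0)] -= 0.5
    return np.vstack([static, delta])

def mlpg(mean_static, mean_delta, var_static, var_delta):
    """Return the static trajectory c that maximizes the Gaussian likelihood of W @ c.

    Solves (W^T P W) c = W^T P mu, where P is the diagonal precision matrix
    built from the per-frame variances.
    """
    T = len(mean_static)
    W = build_window_matrix(T)
    mu = np.concatenate([mean_static, mean_delta])
    precision = 1.0 / np.concatenate([var_static, var_delta])
    WtP = W.T * precision          # equals W^T @ diag(precision) via broadcasting
    return np.linalg.solve(WtP @ W, WtP @ mu)

if __name__ == "__main__":
    # A single-frame spike with zero delta means and small delta variances
    # is pulled toward a smoother trajectory.
    spike = np.zeros(7)
    spike[3] = 1.0
    print(mlpg(spike, np.zeros(7), np.ones(7), np.full(7, 0.01)))
```

Smaller delta variances weight the dynamic constraints more heavily, which is how the variances influence smoothness along the time axis in this simplified setting.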

The Output SINR of the Linearly Constrained Broadband Beamformer (선형 제한 조건을 갖는 광대역 빔 형성기의 출력 SINR)

Gwak, Byeong-Jae;Kim, Gi-Man;Cha, Il-Hwan;Yun, Dae-Hui
- The Journal of the Acoustical Society of Korea / v.13 no.2E / pp.14-19 / 1994
In this paper, we derive expressions for the output signal-to-interference-plus-noise ratio (SINR) of the linearly constrained broadband beamformer in noncoherent situations using a vector approach. The incoming broadband signals are assumed to have flat spectra.

A Study on the Characteristics of Noise Comparison in Voice Warning System in the automobile indoors (차량실내에서 음성출력장치의 소음비교특성에 관한 연구)

한영출;김대열;오상기
- Transactions of the Korean Society of Automotive Engineers / v.11 no.2 / pp.196-202 / 2003
The objective of this article is to study the feasibility of applying a human voice warning system to automobiles. The human voice is considered the best medium for warning systems in automobiles. To understand the relationship between noise, the properties of the vehicle interior, and the voice warning system, the researchers performed an FRF test to examine the characteristics of the voice output and an FEM simulation to characterize the vehicle interior. In addition, they surveyed panel members with a written questionnaire to assess the quality of the voice output. The results show that applying a voice warning system to automobiles is feasible.
KSCI

FSF laser Development for the optical communication diagnosis and medical tomography application (광통신용 및 의용 계측을 위한 FSF Laser의 개발)

지명훈;이영우
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.05a / pp.514-517 / 2002
We developed a frequency-shifted feedback (FSF) laser using an acousto-optic modulator (AOM) inside the cavity. The feedback loop of the laser is formed by the first-order diffracted light from the AOM to the output mirror. The FSF laser output is shown to have a spectrum called a "chirped frequency comb" with an ultrafast frequency chirp rate of several hundred PHz/s. With the FSF laser as the source, the chirped frequency comb can be used to determine distance in optical range measurement.

Hybrid CTC-Attention Based End-to-End Speech Recognition Using Korean Grapheme Unit (한국어 자소 기반 Hybrid CTC-Attention End-to-End 음성 인식)

Park, Hosung;Lee, Donghyun;Lim, Minkyu;Kang, Yoseb;Oh, Junseok;Seo, Soonshin;Rim, Daniel;Kim, Ji-Hwan
- Annual Conference on Human and Language Technology / 2018.10a / pp.453-458 / 2018
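This paper proposes end-to-end speech recognition based on a hybrid CTC-Attention model that uses Korean graphemes as the recognition unit. End-to-end speech recognition handles, with a single DNN, a process that conventionally consists of several modules: a DNN-HMM acoustic model, an N-gram language model, and a WFST-based decoding network. In this paper, a grapheme-level output structure is used to estimate the output of the end-to-end model. When the network is built on graphemes, the number of output parameters to estimate drops from 11,172 to 49, which enables more efficient training. To implement this, we built the end-to-end model by combining CTC and attention network models, the DNN architectures most commonly used for end-to-end training. In experiments, the model achieved a syllable error rate of 10.05%.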
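The reduction from syllable units to grapheme units can be illustrated with standard Unicode arithmetic. The sketch below is an assumption-laden illustration, not the authors' preprocessing: it decomposes precomposed Hangul syllables into Unicode conjoining jamo (19 initials, 21 medials, 27 finals), which is not necessarily the 49-unit inventory used in the paper. It does show where 11,172 comes from: 19 x 21 x 28 precomposed syllables, counting "no final consonant" as one of the 28 final slots. The names `decompose` and `to_jamo` are hypothetical.

```python
import unicodedata

HANGUL_BASE = 0xAC00                          # first precomposed syllable, "가"
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28     # 28 includes "no final consonant"
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL  # 11,172 precomposed syllables

def decompose(syllable):
    """Return (initial, medial, final) indices of one precomposed Hangul syllable."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < N_SYLLABLES:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final

def to_jamo(syllable):
    """Map a syllable to its conjoining jamo string (final omitted when absent)."""
    initial, medial, final = decompose(syllable)
    jamo = chr(0x1100 + initial) + chr(0x1161 + medial)
    if final:
        jamo += chr(0x11A7 + final)
    return jamo

if __name__ == "__main__":
    print(N_SYLLABLES)
    print(decompose("한"))
    # The result matches Unicode canonical decomposition (NFD).
    print(to_jamo("한") == unicodedata.normalize("NFD", "한"))
```

Modeling graphemes instead of whole syllables shrinks the output layer, at the cost of longer target sequences per utterance.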

Underwater Noise Analysis for the Design of an Underwater Wireless Telephone (수중무선전화기 설계를 위한 수중소음분석)

박문갑;윤갑동;김석재;윤종락
- Proceedings of the Korean Society of Fisheries Technology Conference / 2001.05a / pp.73-74 / 2001
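Little research has been done on how ocean underwater noise affects the design of underwater wireless telephones. This study aims to select the optimal carrier frequency for an underwater wireless telephone used in diving fisheries and to estimate the receiver SNR at which communication is possible and the optimal acoustic output level of the transmitter. Underwater noise was measured and analyzed at the main fishing grounds of Korea's diving fisheries in order to analyze these design parameters of the underwater wireless telephone. (Abridged)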

Assessment of Plastic Deformation in Al6061 Alloy using Acoustic Nonlinearity of Laser-Generated Surface Wave (레이저 여기 표면파의 음향비선형성을 이용한 Al6061 합금의 소성변형 평가)

Kim, Chung-Seok;Nam, Tae-Hyung;Choi, Sung-Ho;Jhang, Kyung-Young
- Journal of the Korean Society for Nondestructive Testing / v.32 no.1 / pp.20-26 / 2012
The objective of this study is to assess plastic deformation in aluminum alloy using the acoustic nonlinearity of laser-generated surface waves. A line-arrayed laser beam, formed by a high-power pulsed laser and mask slits, is used to generate a narrowband surface wave, and the frequency characteristics of the laser-generated surface waves are controlled by varying the opening width and interval of the mask slits. Tensile tests were interrupted at various stages to obtain aluminum specimens with different degrees of plastic deformation. The experimental results show that the acoustic nonlinear parameter of the laser-generated surface wave increases with the level of tensile deformation and correlates well with the results of micro-Vickers hardness and electron backscatter diffraction (EBSD) tests. Consequently, the acoustic nonlinearity of laser-generated surface waves has the potential to characterize plastic deformation in aluminum alloy.
https://doi.org/10.7779/JKSNT.2012.32.1.020 KSCI

Analysis of the beam pattern of a thickness shear mode vibrator for vector hydrophones (벡터 하이드로폰을 위한 두께 전단형 진동자의 빔 패턴 해석)

Kim, Jungsuk;Kim, Hoeyong;Roh, Yongrae
- The Journal of the Acoustical Society of Korea / v.36 no.3 / pp.158-164 / 2017
Typical hydrophones in line array sensors for early detection of covert underwater targets can measure only the magnitude of sound pressure and therefore cannot identify the direction of an incoming wave. In this study, a thickness shear mode vibrator was proposed as the main component of an inertia type vector hydrophone to measure both the magnitude and direction of acoustic signals from targets. The equation for the output voltage of the vibrator in response to an external force was derived, and the validity of the equation was verified through finite element analysis of a PMN-PT single crystal vibrator. The analysis results from this study will be utilized in the future for the design of inertia type vector hydrophones made of thickness shear vibrators.
https://doi.org/10.7776/ASK.2017.36.3.158 KSCI

The Enhancement of the Acoustic Image by Combining Bases of Support for SFR (Spatial Frequency Response) (공간주파수응답의 기저대역 확장에 의한 초음파영상의 개선)

Song, Dae-Geon;Oh, Tong-In;Kim, Hyun;Jun, Kye-Suk
- The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.408-417 / 2003
In this paper, we study the enhancement of the acoustic image by combining bases of support for the SFR (Spatial Frequency Response) taken at multiple frequencies. The scanning acoustic microscope system was constructed using a quadrature detector that can measure the amplitude and phase of the reflected signal simultaneously. Both the real and quadrature components of the reflected signal were acquired reliably and accurately from 4.4 MHz to 5.6 MHz. The experimental results show that better depth resolution can be obtained by numerically combining images taken at several different frequencies. Image intensity at multiple frequencies was about 3.4 times higher than at a single frequency.
KSCI

Quality Improvement of Karaoke Mode in SAOC using Cross Prediction based Vocal Estimation Method (교차 예측 기반의 보컬 추정 방법을 이용한 SAOC Karaoke 모드에서의 음질 향상 기법에 대한 연구)

Lee, Tung Chin;Park, Young-Cheol;Youn, Dae Hee
- The Journal of the Acoustical Society of Korea / v.32 no.3 / pp.227-236 / 2013
In this paper, we present a vocal suppression algorithm that can enhance the quality of a music signal coded using Spatial Audio Object Coding (SAOC) in Karaoke mode. The residual vocal component in the coded music signal is estimated by a cross prediction method in which the music signal coded in Karaoke mode is used as the primary input and the vocal signal coded in Solo mode is used as the reference. However, because both signals are extracted from the same downmix signal and are highly correlated, the music signal can be severely damaged by the cross prediction. To prevent this, a psycho-acoustic disturbance rule is proposed, in which the level of disturbance applied to the reference input of the cross prediction filter is adapted according to the auditory masking property. Objective and subjective tests were performed, and the results confirm that the proposed algorithm offers improved quality.
https://doi.org/10.7776/ASK.2013.32.3.227 KSCI

