Search | Korea Science

On a Research of Improving the Performance of Voice Activity Detector in G.723.1

JANG KyungA;KIM JeongJin;Chang YoungOh;HONG SeongHoon;BAE MyungJin
- Proceedings of the Acoustical Society of Korea Conference
- autumn
- pp.53-56
- 1999
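The G.723.1 speech coder, developed by the ITU-T international standardization body for Internet telephony and video conferencing, uses a VAD (Voice Activity Detector) and a CNG (Comfort Noise Generator) to lower the transmission rate during noise intervals. The VAD makes its final speech/non-speech decision by comparing the energy level of the current frame. However, to keep its decisions stable, the G.723.1 VAD classifies almost all of the silence intervals that fall between speech-active intervals as speech-active. This paper therefore proposes a method that classifies these silence intervals more accurately, reducing the transmission rate further than the existing method. In experiments using sentences whose silence intervals were lengthened, the transmission rate was reduced by about 46.8% on average, and subjective quality evaluation showed almost no degradation in speech quality.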
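To illustrate why a stabilizing rule keeps short pauses marked as active, here is a minimal sketch of an energy-threshold VAD with a hangover counter. The function names, the fixed `threshold_db`, and the `hangover` length are hypothetical; this is not the G.723.1 Annex A algorithm (which adapts its noise estimate) and not the paper's proposed method, only the basic decision rule the abstract describes.

```python
import math
from typing import List, Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB; a small floor avoids log10(0)."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)


def energy_vad(frames: Sequence[Sequence[float]],
               threshold_db: float,
               hangover: int) -> List[bool]:
    """Energy-threshold VAD with a hangover counter (hypothetical parameters).

    A frame above threshold is speech and re-arms the hangover; a quiet frame
    is still reported as speech while the hangover counter is non-zero.
    """
    decisions = []
    counter = 0  # hangover frames still remaining
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            counter = hangover       # speech: re-arm the hangover
            decisions.append(True)
        elif counter > 0:
            counter -= 1             # quiet, but still inside the hangover
            decisions.append(True)
        else:
            decisions.append(False)  # declared inactive: CNG could take over
    return decisions
```

With a long hangover, every pause between words is reported as active; shortening it lets the detector declare more of those pauses inactive, which is the kind of rate saving the abstract targets (at the risk of clipping word endings).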

Measurement of Rhythmic Similarity for Auditory Memory Game

Kim, Ju-Wan;Lee, Se-Won;Park, Ho-Chong
- The Journal of the Acoustical Society of Korea
- v.30 no.3
- pp.136-141
- 2011
In this paper, a method for measuring the rhythmic similarity between two sound signals for an auditory memory game is proposed. The proposed method analyzes the energy fluctuation, the temporal duration of energy peaks, and the timbre of the two signals, and detects the beat positions of each signal. It then determines a rhythm vector for each signal after compensating for differences in tempo and in the number of beats between the two signals. Finally, the rhythmic similarity is defined as a function of the dissimilarity between the two rhythm vectors and the difference in the number of beats. The rhythmic similarity measured by the proposed method is compared with that obtained from a subjective listening test, and the two results show a correlation of 0.86.
https://doi.org/10.7776/ASK.2011.30.3.136
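The final scoring step can be sketched as below, assuming beat times have already been detected. Tempo is compensated by dividing inter-onset intervals by their mean, the beat-count mismatch by linear resampling to a common length, and the score combines mean absolute dissimilarity with a hypothetical `beat_penalty` on the beat-count difference. The resampling choice and the `1 / (1 + ...)` form are assumptions for illustration, not the paper's definition.

```python
from typing import List, Sequence


def rhythm_vector(beat_times: Sequence[float], length: int) -> List[float]:
    """Tempo-normalised inter-onset intervals, linearly resampled to `length`.

    Dividing by the mean interval removes the global tempo; resampling to a
    common length is one simple way to align signals with different beat counts.
    """
    iois = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean = sum(iois) / len(iois)
    norm = [x / mean for x in iois]
    if len(norm) == 1 or length == 1:
        return [norm[0]] * length
    out = []
    for i in range(length):
        pos = i * (len(norm) - 1) / (length - 1)  # fractional index into norm
        lo = int(pos)
        hi = min(lo + 1, len(norm) - 1)
        frac = pos - lo
        out.append(norm[lo] * (1.0 - frac) + norm[hi] * frac)
    return out


def rhythmic_similarity(beats_a: Sequence[float], beats_b: Sequence[float],
                        beat_penalty: float = 0.1) -> float:
    """Similarity in (0, 1]; 1 means identical tempo-normalised rhythm.

    Both inputs need at least two beat times. `beat_penalty` is a hypothetical
    weight on the difference in beat counts.
    """
    n = max(len(beats_a), len(beats_b)) - 1  # common rhythm-vector length
    va = rhythm_vector(beats_a, n)
    vb = rhythm_vector(beats_b, n)
    dissimilarity = sum(abs(x - y) for x, y in zip(va, vb)) / n
    beat_diff = abs(len(beats_a) - len(beats_b))
    return 1.0 / (1.0 + dissimilarity + beat_penalty * beat_diff)
```

For example, the same pattern played at half speed scores 1.0, while a pattern with evenly spaced beats scores lower against it.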

Design of a Variable half rate speech codec

성호상
- Proceedings of the Acoustical Society of Korea Conference
- 1998.06e
- pp.293-296
- 1998
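In this paper, a variable-rate half-rate speech coder is designed for various multimedia services. To distinguish voiced speech, unvoiced speech, and silence, an effective voicing decision algorithm based on frame energy and speech parameters is used. The half-rate coder for voiced speech uses a generalized AbS (analysis-by-synthesis) structure, which performs well at low bit rates. The LPC coefficients are converted to LSP coefficients and quantized with predictive 2-stage VQ; the excitation uses a shift-type algebraic fixed codebook structure that reduces complexity while minimizing quality degradation; and the adaptive and excitation codebook gains are quantized with VQ. The coder for unvoiced speech is largely identical to the one for voiced speech, but because the pitch correlation of unvoiced speech is very low, it does not use pitch interpolation: it finds the pitch lag in open loop and applies it to the entire frame. A 1 kb/s coder is used for silence and background-noise intervals; the signals in these intervals are restricted to background noise with weak pitch components, and a coder optimized for such signals is designed. The completed variable-rate half-rate coder achieved an average bit rate of about 2.6 kb/s on test speech with a voice activity factor (VAF) of 0.47. As part of the subjective quality evaluation, an A-B preference test was conducted against variable-rate 8 kb/s QCELP, the IS-96 standard codec. The proposed coder delivered better speech quality than variable-rate 8 kb/s QCELP, whose average bit rate is about twice as high.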

Fast Implementation Algorithms for EVRC

정성교;최용수;김남건;윤대희
- The Journal of the Acoustical Society of Korea
- v.20 no.1
- pp.43-49
- 2001
EVRC (Enhanced Variable Rate Codec) has been adopted as a standard coder for the CDMA digital cellular system in North America and Korea, and is known to provide good call quality at 8 kbps. In this paper, fast implementation algorithms for the EVRC encoder are proposed. The proposed algorithms are based on an efficient pitch detection scheme and a fast fixed codebook search algorithm. In the codebook search, the computational complexity is reduced to 70% of that of the original EVRC by limiting the number of pulse position combinations and by using a truncated impulse response. The proposed algorithms make it possible to implement EVRC with much less computation. Informal subjective tests also confirmed that the speech quality of the proposed method was indistinguishable from that of the original EVRC.

An Audio Watermarking Method Using the Attribute of the Tonal Masker

이희숙;이우선
- The Journal of the Acoustical Society of Korea
- v.22 no.5
- pp.367-374
- 2003
In this paper, we propose an audio watermarking method that uses the attributes of the tonal masker. First, the attributes of the tonal masker are analyzed as a basis for audio watermarking. According to existing research, the energies of the frequencies that make up a tonal masker can be modulated imperceptibly. Moreover, when the relation between the tone energy and the energy of its left or right neighboring frequency after various kinds of signal processing is compared with the relation before processing, very little change is observed. We propose an audio watermarking method that exploits these attributes. A watermark bit is embedded by modulating the difference between the energies of the two frequencies neighboring a tone. In detection, the modulated tonal masker is located using the key used during embedding, without the original audio, and the embedded watermark bit is extracted. After attacks of noise insertion, band-pass filtering, re-sampling, compression, echo, and equalization, the detection error ratios of the proposed method averaged 0.11% for classical music and 1.26% for pop music. An evaluation of the watermarked audio on the SDG (Subjective Difference Grade) scale gave an average SDG of -0.31.
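The embed/detect idea in this abstract can be sketched on a precomputed list of spectral bin energies. Here the "key" is assumed to be a search band `(start, stop)`, the tone is the strongest local maximum in that band, and a bit is written by splitting the two neighbors' combined energy unevenly (`margin` is a hypothetical strength parameter). These choices are illustrative assumptions; the paper's actual modulation rule and key structure may differ.

```python
from typing import List, Sequence


def find_tone(energies: Sequence[float], start: int, stop: int) -> int:
    """Strongest local maximum in energies[start:stop]: a crude tonal-peak pick."""
    best = None
    for k in range(max(start, 1), min(stop, len(energies) - 1)):
        if energies[k] > energies[k - 1] and energies[k] > energies[k + 1]:
            if best is None or energies[k] > energies[best]:
                best = k
    if best is None:
        raise ValueError("no tonal peak in the searched band")
    return best


def embed_bit(energies: Sequence[float], k: int, bit: int,
              margin: float = 0.2) -> List[float]:
    """Encode one bit in the sign of (left - right) neighbour energy of tone k.

    The neighbours' combined energy is preserved; only its split changes.
    """
    out = list(energies)
    total = out[k - 1] + out[k + 1]
    share = 0.5 + margin if bit else 0.5 - margin  # left neighbour's share
    out[k - 1] = total * share
    out[k + 1] = total * (1.0 - share)
    return out


def detect_bit(energies: Sequence[float], k: int) -> int:
    """Blind detection: no original audio needed, only the tone index."""
    return 1 if energies[k - 1] > energies[k + 1] else 0
```

Because detection compares the two neighbors against each other, a uniform gain change (one mild form of attack) leaves the bit intact, which loosely mirrors the robustness argument in the abstract.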

Sound Quality Evaluation Method of Korean Bell

Sung, Weonchan;Lee, HyeongRae;Kang, Yeonjune
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference
- 2013.10a
- pp.708-712
- 2013
The appearance of the Korean bell is exceptional compared with that of other bells, and its sound is also distinctive because of this remarkable shape. The sound of the Korean bell consists of a striking sound and a residual sound; this study focuses mainly on the striking sound. Its primary purpose is to suggest a proper sound quality evaluation method for the Korean bell. To this end, previously established methods were examined, namely a method based on rating and a method based on commonly used psychoacoustic metrics. A subjective sound quality evaluation was also conducted to compare the results of the rating-based method with those of the psychoacoustic-metric method.

The Research of Improving The Performance of the G.723.1 MP-MLQ Vocoder

Min SoYeon;Na DuckSn;Kim JeongJin;BAE MyungJin
- Proceedings of the Acoustical Society of Korea Conference
- autumn
- pp.49-52
- 1999
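Among the CELP-family speech coders that provide good speech quality at rates around 4.8 kbps, G.723.1, developed for Internet telephony and video conferencing, includes two coders: 5.3 kbps ACELP (Algebraic CELP) and 6.3 kbps MP-MLQ (Multi-Pulse Maximum Likelihood Quantization) [1]. Of these, MP-MLQ is difficult to implement in real time because its fixed codebook search requires heavy computation. To address this problem, this paper proposes a method that first separates voiced and unvoiced speech and then determines the grid bit before searching the codebook. Voiced and unvoiced frames are separated using the distribution characteristics of the LSP parameters; for unvoiced frames only the spectral information is transmitted, and the codebook search is performed only for voiced frames. During the codebook search, the grid bit is determined first, by comparing the DC-removed original speech with the speech synthesized using all pulses at even positions and at odd positions, respectively. In experiments, the total processing time decreased by about 20.55% on average, and subjective quality evaluation showed almost no degradation in speech quality.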
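The grid-bit decision described above can be sketched as follows: remove the DC component, synthesize one candidate from pulses on all even positions and one from pulses on all odd positions, and keep the grid whose gain-scaled synthesis is closer to the target. Using unit-magnitude pulses whose signs follow the target, a single optimal gain, and a given impulse response `h` are simplifying assumptions; this is not the G.723.1 reference search code.

```python
from typing import List, Sequence


def remove_dc(x: Sequence[float]) -> List[float]:
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def synthesize(excitation: Sequence[float], h: Sequence[float]) -> List[float]:
    """Convolve the excitation with impulse response h, truncated to frame length."""
    n = len(excitation)
    y = [0.0] * n
    for i, e in enumerate(excitation):
        if e:
            for j, hj in enumerate(h):
                if i + j >= n:
                    break
                y[i + j] += e * hj
    return y


def grid_error(target: Sequence[float], h: Sequence[float], grid: int) -> float:
    """Squared error after fitting one gain to the all-pulses-on-grid synthesis."""
    exc = [0.0] * len(target)
    for n in range(grid, len(target), 2):  # grid 0: even, grid 1: odd positions
        exc[n] = 1.0 if target[n] >= 0 else -1.0  # assumed: signs follow target
    y = synthesize(exc, h)
    yy = sum(v * v for v in y)
    if yy == 0.0:
        return sum(v * v for v in target)
    gain = sum(a * b for a, b in zip(target, y)) / yy
    return sum((a - gain * b) ** 2 for a, b in zip(target, y))


def choose_grid_bit(frame: Sequence[float], h: Sequence[float]) -> int:
    """Return 0 (even grid) or 1 (odd grid); the pulse search then runs on one grid."""
    target = remove_dc(frame)
    return 0 if grid_error(target, h, 0) <= grid_error(target, h, 1) else 1
```

Fixing the grid first means the subsequent pulse search only visits half of the candidate positions, which is where a saving in search time would come from.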

The Design of Adaptive Quantizer to Improve Image Quality of the H.263

신경철;이광형
- The Journal of the Acoustical Society of Korea
- v.18 no.6
- pp.77-83
- 1999
H.263 is an ITU-T international standard that enables services such as video telephony and video conferencing over transmission lines of less than 64 kbps. The recommendation uses motion estimation/compensation, transform coding, and quantization. TMN5, used for evaluating the performance of H.263, uses the DCT for transform coding and provides a quantizer for the DCT coefficients. This paper presents an adaptive quantizer that quantizes DCT coefficients effectively by taking human visual sensitivity into account while keeping the structure of TMN5. Because the proposed quantizer for DCT-based H.263 can transmit more frames than TMN5 at the same bit rate, it reduces the frame-drop effect. In the objective evaluation, the average PSNR of the luminance signal differed from that of TMN5 by -0.3 to +0.7 dB, and that of the chrominance signal improved by about 1.5 dB. As a result, the proposed quantizer achieves better subjective image quality than TMN5.
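One common way to build a quantizer that accounts for visual sensitivity is to scale the step size per DCT position, using coarser steps at high spatial frequencies where the eye is less sensitive. The sketch below assumes a uniform base step of `2 * qp` and a hypothetical linear weight controlled by `slope`; the paper's actual weighting is not given in the abstract, so treat both as placeholders.

```python
from typing import List

Block = List[List[float]]


def weight(u: int, v: int, slope: float = 0.5) -> float:
    """Hypothetical visual weight: 1.0 at DC, up to 1 + slope at (7, 7)."""
    return 1.0 + slope * (u + v) / 14.0


def quantize_block(coeffs: Block, qp: int, slope: float = 0.5) -> List[List[int]]:
    """Quantize an 8x8 DCT block with frequency-weighted step sizes."""
    base_step = 2 * qp  # assumed uniform base step
    levels = []
    for u, row in enumerate(coeffs):
        out_row = []
        for v, c in enumerate(row):
            step = base_step * weight(u, v, slope)
            out_row.append(int(c / step))  # int() truncates toward zero
        levels.append(out_row)
    return levels


def dequantize_block(levels: List[List[int]], qp: int,
                     slope: float = 0.5) -> Block:
    """Reconstruct coefficients with the same weighted step sizes."""
    return [[lvl * 2 * qp * weight(u, v, slope) for v, lvl in enumerate(row)]
            for u, row in enumerate(levels)]
```

Setting `slope=0.0` reduces this to a plain uniform quantizer, which makes it easy to compare the two; smaller high-frequency levels cost fewer bits, which is how such a quantizer could free bit rate for more frames.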

Chord-based stepwise Korean Trot music generation technique using RNN-GAN

Hwang, Seo-Rim;Park, Young-Cheol
- The Journal of the Acoustical Society of Korea
- v.39 no.6
- pp.622-628
- 2020
This paper proposes a technique that automatically generates trot music using a Generative Adversarial Network (GAN) model composed of Recurrent Neural Networks (RNNs). The proposed method first creates a chord progression as the skeleton of the piece, then generates the melody and bass in stages based on that progression, and attaches them to the corresponding chords to complete a structured piece. In addition, exploiting the fact that trot songs repeat a structure divided into sections such as intro, verse, and chorus, a new chorus chord progression is created from the verse chord progression, which extends the length of the generated trot. The quality of the generated music was assessed with subjective and objective evaluation methods, confirming that the generated music has characteristics similar to those of existing trot.
https://doi.org/10.7776/ASK.2020.39.6.622

The usefulness of the depth images in image-based speech synthesis

Ki-Seung Lee
- The Journal of the Acoustical Society of Korea
- v.42 no.1
- pp.67-74
- 2023
Images acquired from the speaker's mouth region show patterns specific to the speech being uttered. Based on this principle, several methods have been proposed that recognize or synthesize speech signals from images of the speaker's lower face. In this study, an image-based speech synthesis method is proposed in which depth images are used jointly with optical images. Because depth images provide depth information that cannot be acquired from optical images, they can supplement flat optical images. This paper evaluates the usefulness of depth images from the perspective of speech synthesis. In a validation experiment on 60 isolated Korean words, the performance in both subjective and objective evaluations was comparable to that of the optical image-based method. When the two types of images were used in combination, performance improved compared with using either type alone.
https://doi.org/10.7776/ASK.2023.42.1.067
