Search | Korea Science

An Automatic Diphone Segmentation for Korean Speech Synthesis-by-Rule (한국어 규칙 합성을 위한 다이폰의 자동 추출)

정인종;경연정;김한우;이양희
- The Journal of the Acoustical Society of Korea
- v.12 no.2E
- pp.63-72
- 1993
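This paper proposes an algorithm that automatically extracts diphones, the speech units for unlimited speech synthesis, from two-syllable natural speech. The input speech is analyzed with improved cepstrum parameters, from which the diphone extraction parameters are derived. The proposed parameters are the dynamic variation of the zeroth-order cepstrum, which represents the energy level; the temporal variation of the spectrum; the zero-crossing rate; and the Euclidean distance of the cepstrum. To detect phoneme boundaries more efficiently in cases such as vowel sequences, where the spectral envelope changes gradually, the temporal variation of the spectrum is split into a fine-structure part and a global-shape part, and each is used as a parameter. In experiments on two-syllable words of the types VV (vowel sequence), VCV (C: semivowel or consonant), and VCCV, phoneme boundaries were detected with about 85% accuracy even though vowel sequences were included. Listening tests on speech synthesized from the diphones extracted by this method confirmed high intelligibility.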
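Two of the parameters named in this abstract, the zero-crossing rate and the Euclidean distance between cepstra, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the fixed threshold, and the rule of flagging a boundary wherever adjacent frames differ strongly are all assumptions made for illustration.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


def boundary_candidates(cepstra, threshold):
    """Frame indices i where frames i-1 and i differ by more than threshold.

    The threshold is a hypothetical tuning value, not one from the paper.
    """
    return [
        i
        for i in range(1, len(cepstra))
        if cepstral_distance(cepstra[i - 1], cepstra[i]) > threshold
    ]
```

For example, `boundary_candidates([[0, 0], [0, 0.1], [3, 4], [3, 4.1]], 1.0)` flags only the jump between the second and third frames.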

A study on the simplification of HRTF within low frequency region (저역 주파수 영역에서 HRTF의 간략화에 관한 연구)

Lee, Chai-Bong
- The Journal of the Korea institute of electronic communication sciences
- v.5 no.6
- pp.581-587
- 2010
In this study, we investigated how simplifying the low-frequency region of the head-related transfer function (HRTF) affects sound localization. For this purpose, HRTFs were measured and analyzed. The standard deviation of the HRTF showed that directional dependence was smaller in the low-frequency region than in the high-frequency region, which suggests that the low-frequency region can be simplified. The simplification flattened the low-frequency amplitude characteristics by inserting a high-pass filter whose cutoff frequency was set to the boundary frequency. Auditory experiments were performed to evaluate the simplified HRTF. In terms of localization error, direction perception was not influenced by the simplification of the HRTF frequency characteristics. The rate of front-back confusion was likewise unaffected by simplifying the frequency characteristics within 1 kHz. In summary, sound localization was not affected by simplifying the HRTF frequency characteristics within 1 kHz. The result is expected to be useful for reducing the size of speech information without degrading the directional characteristics of the speech signal.

A study on the effect of leading sound and following sound on sound localization (선행음 및 후속음이 음원의 방향지각에 미치는 영향에 관한 연구)

Lee, Chai-Bong
- Journal of the Institute of Convergence Signal Processing
- v.16 no.2
- pp.40-43
- 2015
In this paper, the effects of single-frequency leading and following sounds on sound localization are investigated. Sounds with different levels and inter-stimulus intervals (ISIs) were used. The duration of the test sound is 2 ms, and that of the leading and following sounds is 10 ms. The test sound is a 1 kHz tone. The arrival time difference between the subject's ears is set to 0.5 ms. Four level differences are used for each ISI: 0, -10, -15, and -20 dB. The leading sound is found to affect sound localization more than the following sound does. The effect of the leading sound is also found to depend on the ISI. When the ISI is small, different effects on sound localization are observed.

Amplitude Panning Algorithm for Virtual Sound Source Rendering in the Multichannel Loudspeaker System (다채널 스피커 환경에서 가상 음원을 생성하기 위한 레벨 패닝 알고리즘)

Jeon, Se-Woon;Park, Young-Cheol;Lee, Seok-Pil;Youn, Dae-Hee
- The Journal of the Acoustical Society of Korea
- v.30 no.4
- pp.197-206
- 2011
In this paper, we propose a virtual sound source panning algorithm for multichannel systems. Recently, high-definition (HD) and ultrahigh-definition (UHD) video formats have been adopted for multimedia applications, providing high-resolution images and a wider viewing angle. The audio format likewise needs to generate a wider sound field and more immersive sound effects. However, the conventional stereo system cannot deliver the sound quality desired in the latest multimedia systems. Therefore, various multichannel systems that generate an improved sound field have been proposed. In a multichannel system, conventional panning algorithms have acoustic problems with the directivity and timbre of the virtual sound source. To solve these problems in a multichannel system with arbitrarily positioned loudspeakers, we propose a virtual sound source panning algorithm using multiple-vector-base nonnegative amplitude panning gains. The proposed algorithm can be easily controlled through a gain control function to localize the virtual sound source accurately, and it is applicable to both symmetric and asymmetric loudspeaker formats. Its sound localization performance is evaluated by subjective tests against conventional amplitude panning algorithms, e.g., VBAP and MDAP, in symmetric and asymmetric formats.
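The abstract names VBAP as one of the conventional baselines. As a rough illustration of that baseline only, and not of the proposed multiple-vector-base algorithm, the sketch below computes two-dimensional VBAP gains for a single loudspeaker pair. The function name `vbap_2d`, the azimuth-in-degrees interface, and the power normalization are assumptions chosen for this example.

```python
import math


def vbap_2d(source_deg, spk1_deg, spk2_deg):
    """Power-normalized gains (g1, g2) for one loudspeaker pair.

    Solves g1 * l1 + g2 * l2 = p, where l1 and l2 are unit vectors toward
    the loudspeakers and p is the unit vector toward the virtual source.
    """
    def unit(deg):
        rad = math.radians(deg)
        return math.cos(rad), math.sin(rad)

    p = unit(source_deg)
    l1 = unit(spk1_deg)
    l2 = unit(spk2_deg)

    det = l1[0] * l2[1] - l1[1] * l2[0]
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions are collinear")

    # Cramer's rule for the 2x2 system [l1 l2] [g1 g2]^T = p.
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (l1[0] * p[1] - l1[1] * p[0]) / det

    # Negative gains mean the source lies outside the arc of this pair.
    if g1 < -1e-9 or g2 < -1e-9:
        raise ValueError("source is outside the arc spanned by the pair")
    g1, g2 = max(g1, 0.0), max(g2, 0.0)

    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

With loudspeakers at +30 and -30 degrees, a source at 0 degrees receives equal gains, and a source at +30 degrees is reproduced by the first loudspeaker alone.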
https://doi.org/10.7776/ASK.2011.30.4.197

Time-Synchronization Method for Dubbing Signal Using SOLA (SOLA를 이용한 더빙 신호의 시간축 동기화)

이기승;지철근;차일환;윤대희
- Journal of Broadcast Engineering
- v.1 no.2
- pp.85-95
- 1996
The purpose of this paper is to propose a time-synchronization technique for dubbed signals based on the SOLA (Synchronized Overlap and Add) method, which has been widely used to modify the time scale of speech signals. In broadcast audio recording environments, a high level of background noise makes a dubbing process necessary. Since the time difference between the original and the dubbed signal is around 200 milliseconds, processing is required to synchronize the dubbed signal with the corresponding image. The proposed method finds the starting point of the dubbed signal using the short-time energy of the two signals. Thereafter, LPC cepstrum analysis and DTW (Dynamic Time Warping) are applied to synchronize the phoneme positions of the two signals. After the matched points are determined by the minimum mean square error between the original and dubbed LPC cepstra, the SOLA method is applied to the dubbed signal to maintain phase consistency. The effectiveness of the proposed method is verified by comparing the waveforms and spectrograms of the original and the time-synchronized dubbed signals.
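The DTW step mentioned in this abstract can be sketched in its textbook form. The code below aligns two non-empty sequences of feature vectors (standing in for per-frame LPC cepstra) with a squared Euclidean local cost. The step pattern, the cost function, and the name `dtw_align` are assumptions for illustration; the paper's exact constraints are not given in the abstract.

```python
def dtw_align(ref, dub):
    """Return (total_cost, path) aligning two non-empty vector sequences.

    path is a list of (ref_index, dub_index) pairs from start to end.
    """
    n, m = len(ref), len(dub)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning ref[:i] with dub[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((a - b) ** 2 for a, b in zip(ref[i - 1], dub[j - 1]))
            cost[i][j] = local + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance ref only
                cost[i][j - 1],      # advance dub only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

The matched index pairs in `path` are the kind of correspondence that a time-scale modification stage such as SOLA could then use to stretch or compress the dubbed signal.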

Evaluation of a signal segregation by FDBM (FDBM의 음원분리 성능평가)

Lee, Chai-Bong
- The Journal of the Korea institute of electronic communication sciences
- v.8 no.12
- pp.1793-1802
- 2013
Various approaches for sound source segregation have been proposed. Among them, the frequency domain binaural model (FDBM) has the advantages of low computational load and effective howling cancellation. A binaural hearing assistance system based on FDBM has been proposed; this system can enhance a desired signal based on directivity information. Although FDBM has been evaluated in terms of signal-to-noise ratio (SNR) and coherence function, the evaluation results do not always agree with human impressions. These evaluation methods provide physical measures and do not take human perception into account. Considering that a binaural hearing assistance system is one of the major applications, the quality of the segregated sound should be kept at a sufficient level. In this paper, the signal segregation performance of FDBM is evaluated by three objective methods, i.e., SNR, coherence, and Perceptual Evaluation of Speech Quality (PESQ), to discuss the characteristics of FDBM in sound source segregation. The simulation results show that FDBM improves the quality of the left- and right-channel signals to an equivalent level. The results also suggest that PESQ may provide a more useful measure of the segregation performance of FDBM than SNR and coherence. The PESQ results show the effects of the segregation parameters and indicate appropriate parameters under the tested conditions.
https://doi.org/10.13067/JKIECS.2013.8.11.1793

A Study of Acoustic Masking Effect from Formant Enhancement in Digital Hearing Aid (디지털 보청기에서의 포먼트 강조에 의한 마스킹 효과 연구)

Jeon, Yu-Yong;Kil, Se-Kee;Yoon, Kwang-Sub;Lee, Sang-Min
- Journal of the Institute of Electronics Engineers of Korea SC
- v.45 no.5
- pp.13-20
- 2008
Although digital hearing aid algorithms have been developed to compensate for hearing loss and to help hearing-impaired people communicate with others, digital hearing aid users still complain about difficulty hearing speech. One reason could be that the quality of speech through a digital hearing aid is insufficient for understanding because of feedback, residual noise, and so on. Another is the masking effect among formants, which lowers sound quality. In this study, we measured the masking characteristics of normal-hearing listeners and hearing-impaired listeners with presbyacusis to confirm the masking effect within speech itself. The experiment comprised five tests: a pure tone test, a speech reception threshold (SRT) test, a word recognition score (WRS) test, a pure tone masking test, and a speech masking test. In the speech masking test, each speech set contained 25 speech items. The log likelihood ratio (LLR) was introduced to evaluate the distortion of each speech item objectively. As a result, speech perception decreased as the amount of formant enhancement increased. The enhanced speech items within a set had statistically similar LLRs, but speech perception differed. This means that the acoustic masking effect, rather than distortion, influences speech perception. Indeed, frequency analysis of the speech that listeners could not identify correctly showed a level difference of about 35 dB between the first and second formants, similar to the result of the pure tone masking test (normal-hearing subjects: 36.36 dB; hearing-impaired subjects: 32.86 dB). The characteristics of the masking effect differ between normal-hearing and hearing-impaired listeners. It is therefore necessary to check the masking characteristics before a hearing aid is worn and to apply them to fitting.

Enhanced Grid-Based Trajectory Cloaking Method for Efficiency Search and User Information Protection in Location-Based Services (위치기반 서비스에서 효율적 검색과 사용자 정보보호를 위한 향상된 그리드 기반 궤적 클로킹 기법)

Youn, Ji-Hye;Song, Doo-Hee;Cai, Tian-Yuan;Park, Kwang-Jin
- KIPS Transactions on Computer and Communication Systems
- v.7 no.8
- pp.195-202
- 2018
With the development of location-based applications on devices such as smartphones and GPS navigation systems, active research is being conducted to protect location and trajectory privacy. To receive location-related services, users must disclose their exact location to the server. However, this disclosure exposes not only users' locations but also their trajectories to the server, which raises concerns about privacy violation. Furthermore, users request from the server not only location information but also multimedia information (photographs, reviews, etc. of the location), which increases both the processing cost of the server and the amount of information the user must receive. To solve these problems, this study proposes the EGTC (Enhanced Grid-based Trajectory Cloaking) technique. As with the existing GTC (Grid-based Trajectory Cloaking) technique, the EGTC method divides the user trajectory into grids at the user privacy level (UPL) and creates a cloaking region in which a random query sequence is determined. In the next step, the necessary information is received as an index by treating the sub-grid cell corresponding to the path along which the user wishes to move as c(x,y). The proposed method ensures trajectory privacy as the existing GTC method does while reducing the amount of information the user must receive. Experimental results demonstrate the advantage of the proposed method.
https://doi.org/10.3745/KTCCS.2018.7.8.195

Fire Alarm Sound Transmission in Apartment Units (공동주택에서의 화재경보음 전달)

Jeong, Jeong-Ho
- Fire Science and Engineering
- v.32 no.3
- pp.67-75
- 2018
To reduce the number of casualties in the case of fire, an alarm sound needs to be delivered to the people who remain in the apartment unit. However, it has been reported that the fire alarm sound generated in the elevator hall was not delivered sufficiently to the people staying in the apartment units. In this study, the background noise level and the noise level generated in an apartment unit were measured during the day and at night. In addition, the transmission of the fire alarm sound into each room of the apartment units was simulated and compared with the background noise level. The fire alarm sound generated in the elevator halls was attenuated by the fire door and interior doors and was not transmitted sufficiently into the internal spaces of the apartment units. Residents would therefore have difficulty starting evacuation on hearing a fire alarm sound generated outside the apartment unit. To improve the transmission of an alarm sound to the inner spaces of apartment units, an acoustic simulation was carried out for two cases: the alarm sound generator installed on a wall-pad in the living room, and the alarm sound generator installed on the ceiling of each room. The criteria of background noise + 15 dB and 75 dB(A) were satisfied when the alarm sound generator was installed on the ceiling of each room.
https://doi.org/10.7731/KIFSE.2018.32.3.067

Perceptive evaluation of Korean native speakers on the polysemic sentence final ending produced by Chinese Korean learners (KFL중국인학습자들의 한국어 동형다의 종결어미 발화문에 대한 원어민화자의 지각 평가 양상)

Yune, Youngsook
- Phonetics and Speech Sciences
- v.12 no.4
- pp.27-36
- 2020
The aim of this study is to investigate the perceptive aspects of the polysemic sentence-final ending "-(eu)lgeol" produced by Chinese learners of Korean. "-(Eu)lgeol" has two different meanings, a guess and a regret, and these meanings are expressed by different prosodic features of the last syllable of "-(eu)lgeol". To examine how native Korean speakers perceive "-(eu)lgeol" sentences produced by Chinese learners of Korean, and which prosodic variable is most salient for the semantic discrimination of "-(eu)lgeol" at the perceptive level, we performed a perceptual experiment. The analysed material consisted of four Korean sentences containing "-(eu)lgeol", of which two expressed guesses and the other two expressed regret. Twenty-five native Korean speakers participated in the perceptual experiment. Participants were asked to mark whether the "-(eu)lgeol" sentences they listened to were (1) definitely regrets, (2) probably regrets, (3) ambiguous, (4) probably guesses, or (5) definitely guesses, based on the prosodic features of the last syllable of "-(eu)lgeol". The analysed prosodic variables were sentence boundary tones, slopes of boundary tones, pitch difference between sentence-final and penultimate syllables, and pitch levels of boundary tones. The results show that all the analysed prosodic variables are significantly correlated with the semantic discrimination of "-(eu)lgeol" and that, among them, the pitch difference between the sentence-final and penultimate syllables plays the most salient role.
https://doi.org/10.13064/KSSS.2020.12.4.027
