• Title/Summary/Keyword: Pitch detect

Spectral Modeling of Haegeum Using Cepstral Analysis (캡스트럼 분석을 이용한 해금의 스펙트럼 모델링)

  • Hong, Yeon-Woo; Kang, Myeong-Su; Cho, Sang-Jin; Kim, Jong-Myon; Lee, Jung-Chul; Chong, Ui-Pil
    • The Journal of the Acoustical Society of Korea / v.29 no.4 / pp.243-250 / 2010
  • This paper proposes a spectral model of the Haegeum, a traditional Korean instrument, that uses cepstral analysis to describe its time-varying sounds naturally. To obtain precise cepstral analysis results, the frame size is set to three periods of the input signal, and more cepstral coefficients are used to extract formants. Performance is improved by flexibly adjusting the cutoff frequency of the bandpass filter according to the resonances during synthesis of the sinusoidal components, and by deleting the peaks that remain in the residual signal. To detect pitch changes, the input frames are divided into silence, attack, and sustain regions, and the region to which the current frame belongs is determined. The proposed method then readjusts the frame size according to the fundamental frequency when the current frame is in the attack region, and corrects fundamental-frequency extraction errors for frames in the sustain region. With these steps, the synthesized sounds are much closer to the originals. In a listening test, a Haegeum player judged the synthesized sounds to be 96 to 100% similar to the original sounds.
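
The cepstral idea behind the abstract above can be illustrated with a minimal sketch. This is a generic cepstrum-based fundamental-frequency estimator, not the authors' method: the function names (`fft`, `real_cepstrum`, `estimate_f0`), the Hann window, the fixed 1024-sample frame, and the synthetic 400 Hz test tone are all assumptions made for illustration. The paper itself adapts the frame size to three signal periods, which this sketch does not do.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(frame):
    """Inverse transform of the log-magnitude spectrum of a Hann-windowed frame."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    log_mag = [math.log(abs(c) + 1e-12) for c in fft(windowed)]
    # log_mag is real and even, so its inverse FFT equals Re(FFT) / n.
    return [c.real / n for c in fft(log_mag)]


def estimate_f0(frame, sample_rate, f_min, f_max):
    """Pick the strongest cepstral peak in the quefrency range of interest."""
    cep = real_cepstrum(frame)
    q_lo = int(sample_rate / f_max)  # shortest period considered, in samples
    q_hi = int(sample_rate / f_min)  # longest period considered, in samples
    q_peak = max(range(q_lo, q_hi + 1), key=lambda q: cep[q])
    return sample_rate / q_peak


# Synthetic test tone (illustrative only): 400 Hz with 19 decaying harmonics.
SAMPLE_RATE = 16000
frame = [sum(math.sin(2 * math.pi * 400 * h * i / SAMPLE_RATE) / h
             for h in range(1, 20))
         for i in range(1024)]
f0 = estimate_f0(frame, SAMPLE_RATE, f_min=250, f_max=800)
```

Because the peak is picked at an integer quefrency, the estimate is quantized to `sample_rate / q`; for the tone above it should land near 400 Hz. Restricting the search range to `f_min`..`f_max` keeps the low-quefrency envelope (the formant information) from being mistaken for the pitch peak.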

Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun; Park, Jun; Lee, Young-jik
    • The Journal of the Acoustical Society of Korea / v.20 no.3 / pp.24-33 / 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems that rely on a small amount of recorded speech still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modification. The corpus should contain source speech with natural prosody and multiple instances of each synthesis unit. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect fine pitch periods using laryngograph signals, which are used for prosodic feature extraction. For break strength allocation, four levels of break indices are determined from pause length and attached to phones to reflect prosodic variation at phrase boundaries. To predict break strength from text, we use statistical information on POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of concatenation distortion. To obtain high-quality synthetic speech suitable for commercial use, we introduce a domain-specific database. Adding the domain-specific database to the general-domain database greatly improves the quality of synthetic speech in that domain. In a subjective evaluation, the new corpus-based Korean TTS system showed better naturalness than the conventional demisyllable-based one.
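
The unit-selection step mentioned in the abstract can be sketched as a small dynamic program. This is a hedged illustration, not the system's actual implementation: the candidate layout `(unit_id, left_edge, right_edge)`, the function names, and the toy feature vectors are hypothetical, and only the concatenation cost is modeled (no target cost).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(candidates):
    """Viterbi search over candidate units.

    candidates[t] is a list of (unit_id, left_edge, right_edge) tuples for
    target position t. The join cost between consecutive units is the
    distance from the previous unit's right edge to the next unit's left edge.
    Returns (best unit_id sequence, accumulated cost).
    """
    # best[t][j] = (lowest accumulated cost ending in candidate j, backpointer)
    best = [[(0.0, None) for _ in candidates[0]]]
    for t in range(1, len(candidates)):
        row = []
        for _, left, _ in candidates[t]:
            costs = [best[t - 1][i][0] + euclidean(prev_right, left)
                     for i, (_, _, prev_right) in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            row.append((costs[i_min], i_min))
        best.append(row)

    # Backtrack from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for t in range(len(candidates) - 1, -1, -1):
        path.append(candidates[t][j][0])
        j = best[t][j][1]
    path.reverse()
    return path, total


# Toy data (made up for illustration): two candidates per position.
toy = [
    [("a1", (0.0, 0.0), (1.0, 0.0)), ("a2", (0.0, 0.0), (5.0, 5.0))],
    [("b1", (1.0, 1.0), (2.0, 2.0)), ("b2", (5.0, 5.0), (9.0, 9.0))],
    [("c1", (2.0, 2.0), (0.0, 0.0)), ("c2", (0.0, 0.0), (0.0, 0.0))],
]
path, total = select_units(toy)
```

In the toy data, `a2` joins `b2` at zero cost, but `b2` joins both final candidates poorly, so the search prefers the path through `a1` and `b1`. A greedy left-to-right choice would miss this, which is why the accumulated cost is minimized over the whole sequence.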

A COMPARISON OF PERIAPICAL RADIOGRAPHS AND THEIR DIGITAL IMAGES FOR THE DETECTION OF SIMULATED INTERPROXIMAL CARIOUS LESIONS (모의 인접면 치아우식병소의 진단을 위한 구내 표준방사선사진과 그 디지털 영상의 비교)

  • Kim Hyun; Chung Hyun-Dae
    • Journal of Korean Academy of Oral and Maxillofacial Radiology / v.24 no.2 / pp.279-290 / 1994
  • The purpose of this study was to compare the diagnostic accuracy of periapical radiographs and their digitized images for the detection of simulated interproximal carious lesions. A total of 240 interproximal surfaces were used. The sample comprised 80 anterior teeth, 80 bicuspids, and 80 molars, prepared so that the surfaces ranged from caries-free to simulated carious lesions of varying depths (0.5 mm, 0.8 mm, and 1.2 mm). The periapical radiographs were taken with the paralleling technique on Kodak Ektaspeed film (E group). Five dentists evaluated all radiographs to determine the true status of each simulated carious lesion, giving a score of 0, 1, 2, or 3. Digitized images were obtained with a commercial video processor (FOTOVIX II-XS). The computer system was a 486 DX PC with PC Vision and a frame grabber. The 17-inch display monitor had a resolution of 1280×1024 pixels (0.26 mm dot pitch), whereas one frame of the intraoral radiograph had a resolution of 700×480 pixels with 256 grey levels per pixel. All radiographs and digital images were viewed under uniform subdued lighting in the same reading room. A second interpretation was performed one week later under the same conditions. Lesion detection on the monitor was compared with the findings of simulated interproximal carious lesions on the film images. The results were as follows. 1. With dichotomous scoring (lesion present or absent): (1) the overall sensitivity, specificity, and diagnostic accuracy of periapical radiographs and their digital images showed no statistically significant difference; (2) sensitivity and specificity by tooth region and lesion grade showed no statistically significant difference between periapical radiographs and their digital images. 2. When the grade of the lesions was estimated (score 0, 1, 2, or 3): (1) the overall diagnostic accuracy was 53.3% on the intraoral films and 52.9% on the digital images, with no significant difference; (2) diagnostic accuracy by tooth region showed no statistically significant difference between periapical radiographs and their digital images. 3. Degree of agreement and reliability: (1) gamma values, used to measure the degree of agreement, were similar for periapical films and digital images; (2) the reliability between the two interpretations of periapical films and of digital images showed no statistically significant difference. In all cases the P value was greater than 0.05, indicating that both techniques can detect incipient and moderate interproximal carious lesions with similar accuracy.
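
The dichotomous measures reported above (sensitivity, specificity, diagnostic accuracy) can be computed as in the following minimal sketch. The function name, the rule that a score of 1 or higher counts as "lesion present", and the sample data are assumptions for illustration only; they are not taken from the study.

```python
def diagnostic_metrics(truth, scores, threshold=1):
    """Sensitivity, specificity, and accuracy for dichotomized reader scores.

    truth[i] is True when surface i really has a lesion; scores[i] is the
    reader's 0-3 rating. A rating >= threshold counts as "lesion present"
    (an assumed cutoff, not one stated in the abstract).
    """
    tp = fp = tn = fn = 0
    for has_lesion, score in zip(truth, scores):
        called_positive = score >= threshold
        if has_lesion and called_positive:
            tp += 1
        elif has_lesion:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),   # lesions correctly detected
        "specificity": tn / (tn + fp),   # sound surfaces correctly cleared
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up ratings for eight surfaces, purely to exercise the function.
truth = [True, True, True, False, False, False, True, False]
scores = [2, 0, 1, 0, 1, 0, 3, 0]
metrics = diagnostic_metrics(truth, scores)
```

Running the same function on film readings and on digital-image readings of the same surfaces would give the paired values whose difference the study tests for significance.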

Towards 3D Modeling of Buildings using Mobile Augmented Reality and Aerial Photographs (모바일 증강 현실 및 항공사진을 이용한 건물의 3차원 모델링)

  • Kim, Se-Hwan; Ventura, Jonathan; Chang, Jae-Sik; Lee, Tae-Hee; Hollerer, Tobias
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.2 / pp.84-91 / 2009
  • This paper presents an online partial 3D modeling methodology that uses a mobile augmented reality system and aerial photographs, and a tracking methodology that compares the 3D model with a video image. Instead of relying on models created in advance, the system generates a 3D model of a real building on the fly by combining frontal and aerial views. The user's initial pose is estimated from an aerial photograph, which is retrieved from a database according to the user's GPS coordinates, and from an inertial sensor that measures pitch. We detect the rooftop edges using graph cut, and find the edges and a corner of the building's base by minimizing the proposed cost function. To track the user's position and orientation in real time, feature-based tracking is carried out on salient points on the edges and sides of the building the user is keeping in view. We implemented camera pose estimators using both a least squares estimator and an unscented Kalman filter (UKF). We evaluated the speed and accuracy of both approaches and demonstrated the usefulness of our computations as building blocks for an Anywhere Augmentation scenario.