• Title/Summary/Keyword: 3-D Segmentation


Support Vector Machine Based Phoneme Segmentation for Lip Synch Application

  • Lee, Kun-Young;Ko, Han-Seok
    • Speech Sciences
    • /
    • v.11 no.2
    • /
    • pp.193-210
    • /
    • 2004
  • In this paper, we develop a real-time lip-synch system that drives a 2-D avatar's lip motion in synch with an incoming speech utterance. To achieve real-time operation, we limit the processing time by invoking merge and split procedures that perform coarse-to-fine phoneme classification. At each stage of phoneme classification, we apply a support vector machine (SVM) to reduce the computational load while retaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: first, each speech frame is acoustically analyzed for 3 classes of lip opening using Mel Frequency Cepstral Coefficients (MFCC) as a feature; second, each frame is further classified for detailed lip shape using formant information. We implemented the system with 2-D lip animation, which shows the effectiveness of the proposed two-stage procedure in accomplishing a real-time lip-synch task. The method using phoneme merging and SVM was observed to be about twice as fast in recognition as a method employing the Hidden Markov Model (HMM). A typical latency per frame for our method was on the order of 18.22 milliseconds, while an HMM method applied under identical conditions took about 30.67 milliseconds.


Assembly Performance Evaluation for Prefabricated Steel Structures Using k-nearest Neighbor and Vision Sensor (k-근접 이웃 및 비전센서를 활용한 프리팹 강구조물 조립 성능 평가 기술)

  • Bang, Hyuntae;Yu, Byeongjun;Jeon, Haemin
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.35 no.5
    • /
    • pp.259-266
    • /
    • 2022
  • In this study, we developed a deep learning and vision sensor-based assembly performance evaluation method for prefabricated steel structures. The assembly parts were segmented using a modified version of the receptive field block convolution module, which is inspired by the eccentric function of the human visual system. The quality of the assembly was evaluated by detecting the bolt holes in the segmented assembly part and calculating the bolt hole positions. To validate the performance of the evaluation, models of standard and defective assembly parts were produced using a 3D printer. The assembly part segmentation network was trained on images of the 3D models captured by a vision sensor. The bolt hole positions in the segmented assembly image were calculated using image processing techniques, and the assembly performance evaluation using the k-nearest neighbor algorithm was verified. The experimental results show that the assembly parts were segmented with high precision, and that the assembly performance, based on the positions of the bolt holes in the detected assembly part, was evaluated with a classification error of less than 5%.
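
The k-nearest neighbor step described in the abstract above can be illustrated with a minimal sketch. The feature choice (the x/y offset in millimetres of a detected bolt hole from its nominal position), the training values, and the function name `knn_classify` are all assumptions made for illustration; the abstract does not specify the features or the value of k used by the authors.

```python
import math
from collections import Counter


def knn_classify(query, samples, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    `samples` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (dx, dy) bolt-hole offsets in mm and a label.
training = [
    ((0.1, 0.0), "standard"),
    ((0.0, 0.2), "standard"),
    ((0.2, 0.1), "standard"),
    ((2.5, 1.9), "defective"),
    ((3.0, 2.2), "defective"),
    ((2.1, 2.8), "defective"),
]

print(knn_classify((0.15, 0.10), training))
print(knn_classify((2.60, 2.00), training))
```

With an odd k and two classes the vote cannot tie, which is why k=3 is used in this sketch.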

Depth-adaptive Sharpness Adjustments for Stereoscopic Perception Improvement and Hardware Implementation

  • Kim, Hak Gu;Kang, Jin Ku;Song, Byung Cheol
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.3
    • /
    • pp.110-117
    • /
    • 2014
  • This paper reports a depth-adaptive sharpness adjustment algorithm for stereoscopic perception improvement, and presents its field-programmable gate array (FPGA) implementation results. The first step of the proposed algorithm is to estimate the depth information of an input stereo video on a block basis. Second, the objects in the input video are segmented according to their depths. Third, the sharpness of the foreground objects is enhanced and that of the background is maintained or weakened. This paper proposes a new sharpness enhancement algorithm to suppress visually annoying artifacts, such as jagging and halos. The simulation results show that the proposed algorithm can improve stereoscopic perception without intentional depth adjustments. In addition, the hardware architecture of the proposed algorithm was designed and implemented on a general-purpose FPGA board. Real-time processing of full high-definition stereo videos was accomplished using 30,278 look-up tables, 24,553 registers, and 1,794,297 bits of memory at an operating frequency of 200 MHz.
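
The idea of sharpening by depth can be sketched on a single image row. This is not the authors' enhancement algorithm, which is designed to suppress jagging and halos; it swaps in plain unsharp masking with a depth-dependent gain. The depth convention (smaller value means nearer), the threshold, and the gains are assumptions.

```python
def box_blur(row):
    """3-tap box blur with edge replication."""
    n = len(row)
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


def depth_adaptive_sharpen(row, depth, near_threshold=0.5,
                           fg_gain=1.0, bg_gain=0.0):
    """Unsharp-mask each pixel with a gain chosen from its depth.

    Pixels nearer than `near_threshold` count as foreground and are
    sharpened; the rest keep their value when bg_gain is 0.
    """
    blurred = box_blur(row)
    result = []
    for pixel, blur, d in zip(row, blurred, depth):
        gain = fg_gain if d < near_threshold else bg_gain
        value = pixel + gain * (pixel - blur)
        result.append(min(255.0, max(0.0, value)))  # clamp to 8-bit range
    return result


row = [10, 10, 100, 100, 10, 10]
depth = [0.2, 0.2, 0.2, 0.9, 0.9, 0.9]  # hypothetical block depths
print(depth_adaptive_sharpen(row, depth))
```

A negative `bg_gain` would soften the background instead of leaving it unchanged, which corresponds to the "maintained or weakened" option in the abstract.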

A study on the Exhaust Noise Reduction of the Heavy Truck through the Muffler Redesign (소음기의 재설계를 통한 대형 상용차의 배출 소음 저감에 관한 연구)

  • 박기춘;전영두;김양한;강신일;강종민
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 1995.10a
    • /
    • pp.39-43
    • /
    • 1995
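  • The sound pressure level of noise emitted by internal combustion engines for heavy commercial vehicles can exceed 140 dBA in some cases, so a properly designed muffler is needed to bring exhaust noise within environmental noise emission regulations. Beyond its basic function of reducing emitted noise, a muffler raises back-pressure, which lowers engine efficiency, and its size and weight are constrained by the need to mount it on the vehicle body. Mufflers have therefore been designed on the basis of the designer's experience, an understanding of mufflers used in the past, acoustic analysis theory for the unit elements that make up a muffler, and experience and tuning during manufacturing. In this study, the transmission loss (TL) of the components of an existing muffler fitted to a heavy commercial vehicle is analyzed with the transfer matrix method to characterize their acoustic properties. Drawing on earlier results for individual muffler elements, the existing muffler is then redesigned using resonators and perforated elements, and the process by which exhaust noise was reduced is presented through an application case. Although the acoustic elements that can be considered in muffler design vary widely in function and form, this study takes resonators and perforated tubes, the elements chiefly usable in mufflers for heavy commercial vehicles, as the main design elements. Because a resonator attenuates noise in its resonance frequency band, it can be used effectively to reduce the firing-related component of engine exhaust noise at a constant engine speed; during acceleration, however, the firing period changes with engine speed (rpm), so the resonator must be designed with care. Methods proposed for analyzing the perforated elements frequently used in mufflers for internal combustion engines include the segmentation approach of Sullivan [1] and Kim and Yoon [2], and the decoupling approach of Jayaraman and Yam [3], Munjal [4], and Peat [5]; these analyses have been limited to the plane-wave region. In this paper, the segmentation approach is applied to the analysis of a muffler composed of perforated elements.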
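
As a minimal sketch of the transfer matrix method mentioned above, the following computes the plane-wave transmission loss of a simple expansion chamber, a far simpler element than the perforated elements and resonators analyzed in the paper. The air properties, the dimensions, and the function names are assumptions for illustration only. For equal inlet and outlet pipes the matrix result should agree with the textbook closed form:

$$TL = 10\log_{10}\left[1 + \tfrac{1}{4}\left(m - \tfrac{1}{m}\right)^{2}\sin^{2}(kL)\right], \qquad m = S_{\text{chamber}}/S_{\text{pipe}}$$

```python
import math

RHO = 1.21   # air density in kg/m^3 (assumed)
C = 343.0    # speed of sound in m/s (assumed)


def duct_matrix(freq_hz, length_m, area_m2):
    """Plane-wave transfer matrix of a uniform duct.

    State variables are acoustic pressure and volume velocity.
    """
    kl = 2.0 * math.pi * freq_hz / C * length_m
    z = RHO * C / area_m2  # characteristic impedance of the duct
    return ((math.cos(kl), 1j * z * math.sin(kl)),
            (1j * math.sin(kl) / z, math.cos(kl)))


def transmission_loss_db(matrix, pipe_area_m2):
    """TL of an element placed between identical inlet and outlet pipes."""
    (a, b), (c, d) = matrix
    z0 = RHO * C / pipe_area_m2
    return 20.0 * math.log10(abs(a + b / z0 + c * z0 + d) / 2.0)


def closed_form_tl_db(freq_hz, length_m, area_ratio):
    """Textbook expansion-chamber formula, used here as a cross-check."""
    kl = 2.0 * math.pi * freq_hz / C * length_m
    term = 0.25 * (area_ratio - 1.0 / area_ratio) ** 2 * math.sin(kl) ** 2
    return 10.0 * math.log10(1.0 + term)


pipe_area, chamber_area, chamber_length = 0.002, 0.018, 0.5  # hypothetical
for f in (50.0, 171.5, 343.0):
    tl = transmission_loss_db(
        duct_matrix(f, chamber_length, chamber_area), pipe_area)
    print(f, round(tl, 2),
          round(closed_form_tl_db(f, chamber_length, 9.0), 2))
```

The loss peaks where the chamber is a quarter wavelength long and falls to zero at half-wavelength multiples, which is one reason a single chamber is rarely sufficient on its own.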


4-Dimensional dose evaluation using deformable image registration in respiratory gated radiotherapy for lung cancer (폐암의 호흡동조방사선치료 시 변형영상정합을 이용한 4차원 선량평가)

  • Um, Ki Cheon;Yoo, Soon Mi;Yoon, In Ha;Back, Geum Mun
    • The Journal of Korean Society for Radiation Therapy
    • /
    • v.30 no.1_2
    • /
    • pp.83-95
    • /
    • 2018
  • Purpose: After planning respiratory gated radiotherapy for lung cancer, the movement and volume change of the spared normal structures near the target are often not considered during dose evaluation. This study carried out a 4-D dose evaluation that reflects the movement of normal structures at a certain phase of respiratory gated radiotherapy, using deformable image registration, which is widely used in adaptive radiotherapy. The study also discusses the need for such analysis and makes recommendations regarding the movement and volume change of normal structures caused by the patient's breathing pattern during evaluation of treatment plans. Materials and methods: The subjects were 10 lung cancer patients who received respiratory gated radiotherapy. Using Eclipse (Ver 13.6, Varian, USA), the structures seen in the top-phase CT image were set identically via the Propagation or Segmentation Wizard menu, and the movement and volume of the structures were analyzed by the center-to-center method. The image from each phase and its dose distribution were deformed onto the top-phase CT image for 4-dimensional dose evaluation using the VELOCITY program. In addition, the 4-D dose distribution was verified with a QUASAR™ phantom (Modus Medical Devices) and GAFCHROMIC™ EBT3 film (Ashland, USA) to obtain the 4-D gamma pass rate. Results: The movement between the inspiration and expiration phases was greatest in the axial direction of the right lung, at 0.989 ± 0.34 cm, and smallest in the lateral direction of the spinal cord, at -0.001 cm. The volume of the right lung showed the greatest rate of change, 33.5 %. The maximal and minimal differences in PTV Conformity Index and Homogeneity Index between the 3-dimensional and 4-dimensional dose evaluations were 0.076 and 0.021, and 0.011 and 0.0, respectively. A difference of 0.0045~2.76 % was found in normal structures using the 4-D dose evaluation. The 4-D gamma pass rate of every patient met the 95 % gamma pass rate reference. Conclusion: The PTV Conformity Index was more significant in all patients using the 4-D dose evaluation, but no significant difference was observed between the two dose evaluations for the Homogeneity Index. The 4-D dose distribution showed a more homogeneous dose than the 3-D dose distribution, because it considers the movement from breathing, which helps to fill out the PTV margin area. There was a difference of 0.004~2.76 % in the 4-D evaluation of normal structures, and there was a significant difference between the two evaluation methods in all normal structures except the spinal cord. This study shows that the dose to normal structures could be underestimated by the 3-D dose evaluation. Therefore, 4-D dose evaluation with deformable image registration should be considered when a dose change is expected in normal structures due to the patient's breathing pattern. 4-D dose evaluation with deformable image registration is considered a more realistic dose evaluation method because it reflects the movement of normal structures caused by the patient's breathing pattern.


Development of Building 3D Spatial Information Extracting System using HSI Color Model (HSI 컬러모델을 활용한 건물의 3차원 공간정보 추출시스템 개발)

  • Choi, Yun Woong;Yook, Wan Man;Cho, Gi Sung
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.21 no.4
    • /
    • pp.151-159
    • /
    • 2013
  • Building information must be kept up to date and propagated rapidly for urban modeling, terrain analysis, life information, navigation systems, and location-based services (LBS), so researchers require the most recent building data. This paper presents a system developed to extract 3-dimensional spatial information from aerial orthoimages and LiDAR data using the HSI color model. In particular, it presents an image processing algorithm that extracts the outline of specific buildings and generates the building polygon from the image using the HSI color model, a recursive backtracking algorithm, and a maze search algorithm. The paper also shows the effectiveness of the HSI color model in image segmentation.
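
For readers unfamiliar with the HSI color model used above, a minimal sketch of the standard RGB-to-HSI conversion follows, together with a simple hue and saturation test of the kind that could seed a building segmentation. The helper `is_roof_colour`, its hue range, and its saturation floor are hypothetical and are not taken from the paper.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (hue in degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    # max() guards against a tiny negative value from rounding.
    denominator = math.sqrt(max(0.0, (r - g) ** 2 + (r - b) * (g - b)))
    if denominator == 0:
        hue = 0.0  # hue is undefined for grays; 0 is used by convention
    else:
        ratio = max(-1.0, min(1.0, numerator / denominator))
        hue = math.degrees(math.acos(ratio))
        if b > g:
            hue = 360.0 - hue
    return hue, saturation, intensity


def is_roof_colour(pixel, hue_range=(0.0, 30.0), min_saturation=0.3):
    """Hypothetical test: reddish, reasonably saturated pixels."""
    hue, sat, _ = rgb_to_hsi(*pixel)
    return hue_range[0] <= hue <= hue_range[1] and sat >= min_saturation


print(rgb_to_hsi(1.0, 0.0, 0.0))
print(is_roof_colour((0.8, 0.3, 0.2)), is_roof_colour((0.2, 0.4, 0.9)))
```

Separating hue from intensity is what makes such a test less sensitive to shading than a threshold applied directly to RGB values.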

Intra-Rater and Inter-Rater Reliability of Brain Surface Intensity Model (BSIM)-Based Cortical Thickness Analysis Using 3T MRI

  • Jeon, Ji Young;Moon, Won-Jin;Moon, Yeon-Sil;Han, Seol-Heui
    • Investigative Magnetic Resonance Imaging
    • /
    • v.19 no.3
    • /
    • pp.168-177
    • /
    • 2015
  • Purpose: Brain surface intensity model (BSIM)-based cortical thickness analysis does not require complicated 3D segmentation of brain gray/white matter. Instead, this technique uses the local intensity profile to compute cortical thickness. The aim of the present study was to evaluate intra-rater and inter-rater reliability of BSIM-based cortical thickness analysis using images from elderly participants. Materials and Methods: Fifteen healthy elderly participants (ages, 55-84 years) were included in this study. High-resolution 3D T1-spoiled gradient recalled-echo (SPGR) images were obtained using 3T MRI. BSIM-based processing steps included an inhomogeneity correction, intensity normalization, skull stripping, atlas registration, extraction of intensity profiles, and calculation of cortical thickness. Processing steps were automatic, with the exception of semiautomatic skull stripping. Individual cortical thicknesses were compared to a database indicating mean cortical thickness of healthy adults, in order to produce Z-score thinning maps. Intra-class correlation coefficients (ICCs) were calculated in order to evaluate inter-rater and intra-rater reliabilities. Results: ICCs for intra-rater reliability were excellent, ranging from 0.751-0.940 in brain regions except the right occipital, left anterior cingulate, and left and right cerebellum (ICCs = 0.65-0.741). Although ICCs for inter-rater reliability were fair to excellent in most regions, poor inter-rater correlations were observed for the cingulate and occipital regions. Processing time, including manual skull stripping, was 17.07 ± 3.43 min. Z-score maps for all participants indicated that cortical thicknesses were not significantly different from those in the comparison databases of healthy adults. Conclusion: BSIM-based cortical thickness measurements provide acceptable intra-rater and inter-rater reliability. We therefore suggest BSIM-based cortical thickness analysis as an adjunct clinical tool to detect cortical atrophy.
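
The Z-score thinning maps mentioned in the abstract above compare each regional thickness with a normative mean and standard deviation. A minimal sketch follows; the region names, all numeric values, and the -2.0 cut-off are hypothetical and do not come from the study.

```python
def z_scores(thickness_mm, norm_mean_mm, norm_sd_mm):
    """Per-region Z-score of cortical thickness against normative data."""
    return {
        region: (thickness_mm[region] - norm_mean_mm[region])
        / norm_sd_mm[region]
        for region in thickness_mm
    }


def thinned_regions(scores, threshold=-2.0):
    """Regions whose Z-score is at or below the (assumed) cut-off."""
    return sorted(r for r, z in scores.items() if z <= threshold)


# Hypothetical subject and normative database values, in millimetres.
subject = {"left frontal": 2.4, "right occipital": 1.5}
norm_mean = {"left frontal": 2.5, "right occipital": 2.0}
norm_sd = {"left frontal": 0.2, "right occipital": 0.2}

scores = z_scores(subject, norm_mean, norm_sd)
print(scores)
print(thinned_regions(scores))
```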

Comparison of mastoid air cell volume in patients with or without a pneumatized articular tubercle

  • Adisen, Mehmet Zahit;Aydogdu, Merve
    • Imaging Science in Dentistry
    • /
    • v.52 no.1
    • /
    • pp.27-32
    • /
    • 2022
  • Purpose: The aim of this study was to compare mastoid air cell volumes in patients with or without a pneumatized articular tubercle (PAT) on cone-beam computed tomography (CBCT) images. Materials and Methods: The CBCT images of 224 patients were retrospectively analyzed for the presence of PAT. The Digital Imaging and Communications in Medicine data of 30 patients with PAT and 30 individuals without PAT were transferred to 3D Doctor Software. Mastoid air cell volumes were measured using semi-automatic segmentation on axial sections. Data were analyzed using SPSS version 20.0. Results: The patients with PAT and those without PAT had a mean mastoid volume of 6.31 ± 2.86 cm³ and 3.25 ± 1.99 cm³, respectively. There were statistically significant differences in mastoid air cell volumes between patients with and without PAT regardless of sex and mastoid air cell side (P < 0.05). Conclusion: The detection of PAT on routine dental radiographic examinations might be a potential prognostic factor that could be used to detect extensive pneumatization in the temporal bone. Clinicians should be aware that there may be widespread pneumatization of mastoid air cells in patients in whom PAT is detected. Advanced imaging should be performed in these cases, and possible complications due to surgical interventions should be considered.

Moving Object Extraction and Relative Depth Estimation of Background regions in Video Sequences (동영상에서 물체의 추출과 배경영역의 상대적인 깊이 추정)

  • Park Young-Min;Chang Chu-Seok
    • The KIPS Transactions:PartB
    • /
    • v.12B no.3 s.99
    • /
    • pp.247-256
    • /
    • 2005
  • One of the classic research problems in computer vision is that of stereo, i.e., the reconstruction of three-dimensional shape from two or more images. This paper deals with the problem of extracting depth information of non-rigid dynamic 3D scenes from general 2D video sequences taken by a monocular camera, such as movies, documentaries, and dramas. The depths of the blocks are extracted from the resultant block motions in two steps: (i) calculation of global parameters concerning camera translation and focal length, using the locations of the blocks and their motions; (ii) calculation of each block's depth relative to the average image depth, using the global parameters and the location of the block and its motion. Both singular and non-singular cases are tested with various video sequences. The resultant relative depths and ego-motion object shapes are virtually identical to those perceived by human vision.

A Study on the Spoken Korean Citynames Using Multi-Layered Perceptron of Back-Propagation Algorithm (오차 역전파 알고리즘을 갖는 MLP를 이용한 한국 지명 인식에 대한 연구)

  • Song, Do-Sun;Lee, Jae-Gheon;Kim, Seok-Dong;Lee, Haing-Sei
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.6
    • /
    • pp.5-14
    • /
    • 1994
  • This paper describes an experiment in speaker-independent automatic recognition of spoken Korean words using a Multi-Layered Perceptron and the error back-propagation algorithm. The target words are the 50 city names of the D.D.D area codes; 43 of them have 2 syllables and the remaining 7 have 3 syllables. The words were not segmented into syllables or phonemes; instead, feature components extracted from the words at equal intervals were applied to the neural network. This made the result independent of speech duration. PARCOR coefficients calculated from the frames using linear predictive analysis were employed as feature components. The paper sought the optimum conditions through 4 different experiments: a comparison between total and pre-classified training, the dependency of the recognition rate on the number of frames and the PARCOR order, the change in recognition due to the number of neurons in the hidden layer, and a comparison of methods for composing the output pattern of the output neurons. As a result, a recognition rate of 89.6% was obtained.
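
Taking frames at equal intervals, as the abstract above describes, yields a fixed-size network input regardless of how long the word was spoken. A minimal sketch follows, assuming each frame is already a feature vector (for example PARCOR coefficients); the function name and the rounding rule are assumptions, not the authors' procedure.

```python
def sample_equal_gap(frames, count):
    """Pick `count` frames spread evenly from the first to the last frame."""
    if count < 1 or not frames:
        raise ValueError("need at least one frame and count >= 1")
    if count == 1:
        return [frames[len(frames) // 2]]
    step = (len(frames) - 1) / (count - 1)
    return [frames[round(i * step)] for i in range(count)]


# Two utterances of different length give inputs of the same size.
short_word = list(range(10))   # stand-ins for 10 feature frames
long_word = list(range(25))    # stand-ins for 25 feature frames
print(sample_equal_gap(short_word, 4))
print(sample_equal_gap(long_word, 4))
```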
