• Title/Summary/Keyword: segmentation error rate

Search Result 55

Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging (비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.6 no.3 / pp.155-164 / 2014
  • This paper proposes a new method for determining the recognition units for Korean large vocabulary continuous speech recognition (LVCSR) by applying unsupervised segmentation and merging. In the proposed method, a text sentence is first segmented into morphemes, and position information is added to each morpheme. Submorpheme units are then obtained by splitting the morpheme units so as to maximize posterior probability terms, which are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Experiments are conducted using a Korean LVCSR system with a 100k-word vocabulary and a trigram language model trained on a 300-million-eojeol (word phrase) corpus. The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and to reduce the syllable error rate by 14.0% relative.
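The final merging step described in this abstract (sequentially merging the most frequent submorpheme pair) can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the authors' implementation: the function name `merge_most_frequent_pairs`, the fixed `num_merges` stopping rule, and the toy input are hypothetical, and the position tags and posterior-based splitting from the abstract are not modeled.

```python
from collections import Counter


def merge_most_frequent_pairs(sequences, num_merges):
    """Repeatedly merge the most frequent adjacent unit pair.

    `sequences` is a list of unit lists (e.g. submorpheme strings).
    The stopping rule (a fixed number of merges) is an assumption;
    the abstract does not state how merging is terminated.
    """
    seqs = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair across the whole corpus.
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break  # nothing left to merge
        (left, right), _count = pair_counts.most_common(1)[0]
        merges.append((left, right))
        # Rewrite each sequence, replacing the chosen pair by one unit.
        merged_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                    out.append(left + right)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            merged_seqs.append(out)
        seqs = merged_seqs
    return seqs, merges


if __name__ == "__main__":
    toy_corpus = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"]]
    print(merge_most_frequent_pairs(toy_corpus, 1))
```

In this toy corpus the pair ("a", "b") occurs three times, so it is merged first. Each merge adds one longer unit to the inventory, which is how a merging step of this kind trades vocabulary size against unit length.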

Pronunciation Variation Patterns of Loanwords Produced by Korean and Grapheme-to-Phoneme Conversion Using Syllable-based Segmentation and Phonological Knowledge (한국인 화자의 외래어 발음 변이 양상과 음절 기반 외래어 자소-음소 변환)

  • Ryu, Hyuksu;Na, Minsu;Chung, Minhwa
    • Phonetics and Speech Sciences / v.7 no.3 / pp.139-149 / 2015
  • This paper aims to analyze pronunciation variations of loanwords produced by Korean speakers and to improve the performance of loanword pronunciation modeling in Korean by using syllable-based segmentation and phonological knowledge. The loanword text corpus used for our experiment consists of 14.5k words extracted from frequently used words in the set-top box, music, and point-of-interest (POI) domains. First, pronunciations of loanwords in Korean are obtained by manual transcription and used as target pronunciations. The target pronunciations are compared with the standard pronunciations using confusion matrices to analyze pronunciation variation patterns of loanwords. Based on the confusion matrices, three salient pronunciation variations of loanwords are identified, including tensification of the fricative [s] and derounding of the rounded vowels [ɥi] and [wɛ]. In addition, a syllable-based segmentation method that takes phonological knowledge into account is proposed for loanword pronunciation modeling. Performance of the baseline and the proposed method is measured using phone error rate (PER), word error rate (WER), and F-score at various context spans. Experimental results show that the proposed method outperforms the baseline. We also observe that performance degrades when the training and test sets come from different domains, which implies that loanword pronunciations are influenced by the data domain. It is noteworthy that pronunciation modeling for loanwords is enhanced by reflecting phonological knowledge. The loanword pronunciation modeling proposed in this paper can be used for automatic speech recognition in application interfaces such as navigation systems and set-top boxes, and for computer-assisted pronunciation training for Korean learners of English.
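The abstract reports PER and WER without defining them. A minimal Python sketch under common conventions is given here: PER is taken as the Levenshtein distance between reference and predicted phone sequences divided by the reference length, and WER as the fraction of words whose predicted pronunciation is not an exact match. Both definitions, and the names `edit_distance`, `phone_error_rate`, and `word_error_rate`, are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phone_error_rate(ref_phones, hyp_phones):
    """Edit distance normalized by reference length (assumed PER definition)."""
    if not ref_phones:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref_phones, hyp_phones) / len(ref_phones)


def word_error_rate(ref_words, hyp_words):
    """Fraction of words whose whole pronunciation is wrong (assumed WER definition)."""
    if not ref_words or len(ref_words) != len(hyp_words):
        raise ValueError("need equally many, at least one, reference and hypothesis words")
    wrong = sum(1 for r, h in zip(ref_words, hyp_words) if list(r) != list(h))
    return wrong / len(ref_words)


if __name__ == "__main__":
    print(phone_error_rate(list("kitten"), list("sitting")))
    print(word_error_rate([["a", "b"], ["c"]], [["a", "b"], ["d"]]))
```

Under these definitions a single wrong phone makes the whole word count as an error for WER, so WER is typically higher than PER on the same data.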

Segmentation-based Image Coding Method without Need for Transmission of Contour Information (윤곽선 정보의 전송이 불필요한 분할기반 영상 부호화 방법)

  • Choi Jae Gark;Kang Hyun-Soo;Koh Chang-Rim;Kwon Oh-Jun;Lee Jong-Keuk
    • Journal of KIISE:Computer Systems and Theory / v.32 no.5 / pp.187-195 / 2005
  • A new segmentation-based image coding method that does not require transmission of contour data is proposed. Shape information is a bottleneck in segmentation-based video coding because it accounts for a large portion of the transmitted data. The proposed method segments the previously decoded frame instead of the current frame. As a result, there is no need to transmit contour information to the decoder, and the saved bits can be assigned to encoding other information such as error signals. Experimental results show that when the data rate increases sharply due to abrupt motion under very low bit rate coding with limited transmission bits, the PSNR of the conventional block-based method drops by about 20 dB, while the proposed method maintains good reconstruction quality without a rapid PSNR drop.
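For readers unfamiliar with the metric, PSNR can be computed as in the following minimal Python sketch. It uses the standard definition, 10·log10(peak² / MSE), and assumes 8-bit samples (peak value 255) passed as flat sequences. The function name `psnr` and the sample values are illustrative and are not taken from the paper.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists.

    `peak` is the maximum possible sample value; 255 assumes 8-bit images.
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    print(psnr([10, 20, 30, 40], [11, 21, 29, 39]))
```

Because the scale is logarithmic, a 20 dB drop such as the one reported for the block-based method corresponds to a hundredfold increase in mean squared error.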

The Estimation of Parameters to minimize the Energy Function of the Piecewise Constant Model Using Three-way Analysis of Variance (3원 변량분석을 이용한 구분적으로 일정한 모델의 에너지 함수 최소화를 위한 매개변수들 추정)

  • Joo, Ki-See;Cho, Deog-Sang;Seo, Jae-Hyung
    • Journal of Advanced Navigation Technology / v.16 no.5 / pp.846-852 / 2012
  • The result of image segmentation varies with the parameters of the segmentation algorithm; therefore, the parameters for optimal segmentation have usually been found by trial and error. In this paper, we propose a method for finding the best values of the parameters involved in the area-based active contour method using three-way ANOVA. The segmentation result obtained with three-way ANOVA is compared with the optimal segmentation drawn by a user. We use the global consistency rate to compare the two segmentations. Finally, we estimate the main effects and the interactions between the parameters using three-way ANOVA, and then calculate point and interval estimates to find the best values of the three parameters. The proposed method will be of great help in finding the optimal parameters before performing motion segmentation using the piecewise constant model.

Utilization of Syllabic Nuclei Location in Korean Speech Segmentation into Phonemic Units (음절핵의 위치정보를 이용한 우리말의 음소경계 추출)

  • 신옥근
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.13-19 / 2000
  • The blind segmentation method, which segments input speech data into recognition units without any prior knowledge, plays an important role in continuous speech recognition systems and corpus generation. Because no prior knowledge is required, this method is rather simple to implement, but in general it performs poorly compared with knowledge-based segmentation methods. In this paper, we introduce a method to improve the performance of blind segmentation of Korean continuous speech by postprocessing the segment boundaries obtained from the blind segmentation. In the preprocessing stage, candidate boundaries are extracted by a clustering technique based on the generalized likelihood ratio (GLR) distance measure. In the postprocessing stage, the final phoneme boundaries are selected from the candidates by using simple a priori knowledge of the syllabic structure of Korean, namely that the maximum number of phonemes between any two consecutive nuclei is limited. The experimental results were rather promising: the proposed method yields a 25% reduction in insertion error rate compared with blind segmentation alone.

Very Low Bit Rate Video Coding Algorithm Using Uncovered Region Prediction (드러난 영역 예측을 이용한 초저 비트율 동영상 부호화)

  • 정영안;한성현;최종수;정차근
    • The Journal of Korean Institute of Communications and Information Sciences / v.22 no.4 / pp.771-781 / 1997
  • To solve the problem of uncovered background regions that arises in region-based motion estimation, this paper presents a new method that generates an uncovered region memory using motion estimation, and shows its application to very low bit rate video coding. The proposed algorithm can be briefly described as follows: it detects the changed region using frame difference (FD) and segmentation information, and then performs backward motion estimation for that region only, without transmitting shape information. The uncovered background region memory is therefore generated and updated from motion information alone. The contents stored in the uncovered background region memory are referenced whenever an uncovered region appears. Regions with large prediction error are transformed and coded using the DCT. Simulation results show that the proposed algorithm improves both subjective and objective image quality, owing to a remarkable reduction in the bits transmitted for prediction error.

Augmentation of Hidden Markov Chain for Complex Sequential Data in Context

  • Sin, Bong-Kee
    • Journal of Multimedia Information System / v.8 no.1 / pp.31-34 / 2021
  • The classical HMM is defined by a parameter triple λ = (π, A, B), where the parameters represent, in order, the initial state, state transition, and output probability distributions. This paper proposes a new stationary parameter e = (e1, e2, …, eN), where N is the number of states and et = P(|xt = i, y), describing how an input pattern y ends in state xt = i at time t followed by nothing. It is often said that all is well that ends well. We argue here that all should end well. The paper sets out the framework for the theory and presents efficient inference and training algorithms based on dynamic programming and expectation-maximization. The proposed model is applicable to analyzing any sequential data in which two or more finite segmental patterns are concatenated, each forming a context for its neighbors. Experiments on online handwritten Hangul characters demonstrate the effect of the proposed augmentation in terms of highly intuitive segmentation as well as recognition performance, with a 13.2% error rate reduction.
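One plausible reading of the end parameter e is that it weights the final forward probabilities by how likely each state is to terminate a pattern. The Python sketch below illustrates that reading only; it is an assumption, not the formulation in the paper, whose exact definition of e is not recoverable from the abstract. The function name `forward_likelihood`, the `end` argument, and the toy model are hypothetical.

```python
def forward_likelihood(pi, A, B, obs, end=None):
    """Forward algorithm for a discrete HMM with optional end probabilities.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability that state i emits symbol o
    end[i]  : assumed probability that a pattern may end in state i;
              with end=None this is the classical forward likelihood.
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(pi)
    # Initialization: start in state i and emit the first symbol.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Recursion: propagate through transitions, then emit the next symbol.
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: plain sum, or a sum weighted by the end probabilities.
    if end is None:
        return sum(alpha)
    return sum(alpha[i] * end[i] for i in range(n))


if __name__ == "__main__":
    pi = [1.0, 0.0]
    A = [[0.5, 0.5], [0.0, 1.0]]
    B = [[1.0, 0.0], [0.0, 1.0]]  # state 0 emits symbol 0, state 1 emits symbol 1
    print(forward_likelihood(pi, A, B, [0, 1]))
    print(forward_likelihood(pi, A, B, [0, 1], end=[0.1, 0.8]))
```

Under this reading, a sequence that stops in a state with a low end probability is penalized, which would discourage segmentations that cut a pattern off before it is complete.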

Restoration of Bi-level Images via Iterative Semi-blind Wiener Filtering (반복 semi-blind 위너 필터링을 이용한 이진영상의 복원)

  • Kim, Jeong-Tae
    • The Transactions of The Korean Institute of Electrical Engineers / v.57 no.7 / pp.1290-1294 / 2008
  • We present a novel deblurring algorithm for bi-level images blurred by a parameterizable point spread function. The proposed method iteratively searches for the unknown parameters of the point spread function and the noise-to-signal ratio by minimizing an objective function based on the binariness of the restored image and the difference between its two intensity values. In simulations and experiments, the proposed method showed improved performance over the Wiener filtering based method in terms of bit error rate after segmentation.

Brain MR Multimodal Medical Image Registration Based on Image Segmentation and Symmetric Self-similarity

  • Yang, Zhenzhen;Kuang, Nan;Yang, Yongpeng;Kang, Bin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.3 / pp.1167-1187 / 2020
  • With the development of medical imaging technology, image registration has been widely used in disease diagnosis. Registration between different modal images of brain magnetic resonance (MR) is particularly important for the diagnosis of brain diseases. However, previous registration methods do not take advantage of prior knowledge of bilateral brain symmetry. Moreover, the difference in gray scale information between modal images increases the difficulty of registration. In this paper, a multimodal medical image registration method based on image segmentation and symmetric self-similarity is proposed. This method uses modal-independent self-similarity information and modal consistency information to register images. More specifically, we propose two novel symmetric self-similarity constraint operators to constrain the segmented medical images and convert each modal medical image into a unified modality for multimodal image registration. The experimental results show that the proposed method effectively reduces the error rate of brain MR multimodal medical image registration under rotation and translation transformations (averages of 0.43mm and 0.60mm, respectively), and that its accuracy is better than that of state-of-the-art image registration methods.

A Blind Segmentation Algorithm for Speaker Verification System (화자확인 시스템을 위한 분절 알고리즘)

  • 김지운;김유진;민홍기;정재호
    • The Journal of the Acoustical Society of Korea / v.19 no.3 / pp.45-50 / 2000
  • This paper proposes a delta energy method based on Parameter Filtering (PF), a speech segmentation algorithm for text-dependent speaker verification systems over telephone lines. Our parametric filter bank adopts a variable bandwidth along with a fixed center frequency. Compared with other methods, the proposed method turns out to be very robust to channel noise and background noise. Using this method, we segment an utterance into consecutive subword units and build a model for each subword unit. In terms of equal error rate (EER), the speaker verification system based on the whole-word model achieves 6.1%, whereas the system based on the subword model achieves 4.0%, an absolute improvement of about 2 percentage points.
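The EER figures quoted in this abstract refer to the operating point where the false acceptance rate equals the false rejection rate. A minimal Python sketch of one common way to estimate it from verification scores is given here. The threshold sweep, the name `equal_error_rate`, and the toy scores are assumptions for illustration and are unrelated to the paper's data.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping a decision threshold over all observed scores.

    A trial is accepted when its score is >= threshold. The estimate is the
    mean of FAR and FRR at the threshold where the two rates are closest.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(impostor_scores)):
        # False acceptance rate: impostor trials that are accepted.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials that are rejected.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine, impostor))
```

With few trials the two rates rarely cross exactly, which is why the sketch averages them at the closest threshold; larger evaluations often interpolate instead.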
