• Title/Abstract/Keywords: text input

Search results: 355 items (processing time: 0.022 s)

Performance Comparison of Text-Independent Speaker Recognizers Using VQ and GMM

  • 김성종;정훈;정익주
    • 음성과학 / Vol. 7, No. 2 / pp. 235-244 / 2000
  • This paper focuses on implementing a text-independent speaker recognizer using the VQ and GMM algorithms and on studying the characteristics of speaker recognizers that adopt these two algorithms. Because it was difficult to ascertain theoretically the effect the two algorithms have on the speaker recognizer, we performed recognition experiments using various parameters. The experiments showed that the GMM algorithm had better recognition performance than the VQ algorithm, as follows. The GMM performed better with small amounts of training data, and its recognition rate varied only slightly as the kind of feature vector and the length of the input data changed. Overall, the GMM showed better recognition performance than the VQ.

Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

  • Sung, Thaileang;Hwang, Insoo
    • Journal of Information Technology Applications and Management / Vol. 23, No. 2 / pp. 11-28 / 2016
  • In this paper, we propose a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionary being used is not fully reliable and cannot cover all the words of the Khmer language, which leads to unknown (out-of-vocabulary) words. Our approach is to extend the original dictionary with new words to make it more reliable. In addition, we use ternary decomposition for the segmentation process. We also introduce the invisible space of Khmer Unicode (char\u200B) in order to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and the invisible space, we can extract new words from our training text and then add them to the dictionary. To test an unannotated text, we used the extended wordlist and a segmentation algorithm that does not rely on the invisible space. Our results remarkably outperformed other approaches: we achieved precision, recall, and F-measure of 88.8%, 91.8%, and 90.6%, respectively.
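The dictionary-extension step described in this abstract can be illustrated with a minimal sketch. It assumes the training text is annotated with U+200B (zero-width space) between words, as the abstract states. The function names are hypothetical, and the greedy longest-match segmenter is only a simple stand-in: the abstract does not give enough detail to reproduce the authors' ternary decomposition.

```python
ZWSP = "\u200b"  # zero-width space: the "invisible space" of Khmer Unicode text


def extract_words(annotated_text):
    """Split ZWSP-annotated text into words, dropping empty pieces."""
    return [w for w in annotated_text.split(ZWSP) if w.strip()]


def extend_dictionary(dictionary, annotated_text):
    """Return a new word set: the dictionary plus every word found in the text."""
    return set(dictionary) | set(extract_words(annotated_text))


def segment(text, dictionary):
    """Greedy longest-match segmentation of unannotated text.

    Stand-in only (NOT the paper's ternary decomposition). Characters that
    match no dictionary word are emitted one at a time.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; size 1 always succeeds.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

For example, calling `extend_dictionary` on an annotated training text and passing the result to `segment` lets words learned from the corpus be recognized in text that carries no invisible spaces.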

Text-driven Speech Animation with Emotion Control

  • Chae, Wonseok;Kim, Yejin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 8 / pp. 3473-3487 / 2020
  • In this paper, we present a new approach to creating speech animation with emotional expressions using a small set of example models. To generate realistic facial animation, two kinds of example models, key visemes and key expressions, are used for lip-synchronization and facial expressions, respectively. The key visemes represent lip shapes of phonemes such as vowels and consonants, while the key expressions represent basic emotions of a face. Our approach utilizes a text-to-speech (TTS) system to create a phonetic transcript for the speech animation. Based on this phonetic transcript, a speech animation sequence is synthesized by interpolating the corresponding sequence of key visemes. Using an input parameter vector, the key expressions are blended by a method of scattered data interpolation. During the synthesis process, an importance-based scheme is introduced to combine both lip-synchronization and facial expressions into one animation sequence in real time (over 120Hz). The proposed approach can be applied to diverse types of digital content and applications that use facial animation with high accuracy (over 90%) in speech recognition.
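As a rough illustration of synthesizing animation by interpolating a sequence of key visemes, the sketch below blends neighboring key shapes linearly over time. Linear interpolation, the flat-vector representation of a viseme, and the function names are all assumptions made for illustration. The abstract does not state which interpolation scheme the authors use.

```python
def lerp(a, b, t):
    """Linearly blend two equal-length shape vectors; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_visemes(keys, times, t):
    """Return the blended shape at time t.

    keys  : list of key-viseme vectors (hypothetical representation)
    times : strictly increasing time stamps, one per key
    t     : query time; clamped to the first/last key outside the range
    """
    if t <= times[0]:
        return list(keys[0])
    if t >= times[-1]:
        return list(keys[-1])
    for i in range(len(times) - 1):
        if times[i] <= t <= times[i + 1]:
            u = (t - times[i]) / (times[i + 1] - times[i])
            return lerp(keys[i], keys[i + 1], u)
```

Sampling `interpolate_visemes` at the display rate yields one lip shape per frame between the timed phonemes of the transcript.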

Analysis and Interpretation of Intonation Contours of Slovene

  • Ales Dobnikar
    • 대한음성학회:학술대회논문집 / 대한음성학회 October 1996 Conference Proceedings / pp. 542-547 / 1996
  • Prosodic characteristics of natural speech, especially intonation, in many cases reflect the speaker's feelings at the time of the utterance, and speaking styles can vary widely over the same text. We analyzed a speech corpus recorded with ten Slovene speakers. We interpreted the observed intonation contours in order to model the intonation contour in the synthesis process. Based on the results of this analysis, we devised a scheme for modeling the intonation contour of different types of intonation units. The intonation scheme uses a superpositional approach, which defines the intonation contour as the sum of global (intonation unit) and local (accented syllables or syntactic boundaries) components. A near-natural intonation contour was obtained by rule, using only the text of the utterance as input.
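The superpositional definition in this abstract can be written compactly as below. The symbols are assumptions introduced here for illustration: $F_0(t)$ is the intonation contour, $G(t)$ the global component spanning the intonation unit, and $L_i(t)$ the $i$-th of $N$ local components tied to an accented syllable or a syntactic boundary.

```latex
F_0(t) = G(t) + \sum_{i=1}^{N} L_i(t)
```

The abstract does not specify the shapes of $G$ and $L_i$, so the formula fixes only the additive structure.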

Deriving TrueType Features for Letter Recognition in Word Images

  • SeongAh CHIN
    • 한국시뮬레이션학회논문지 / Vol. 11, No. 3 / pp. 35-48 / 2002
  • In the work presented here, we describe a method to extract TrueType features to support letter recognition. Although a variety of document processing techniques have been explored, few methods can recognize a letter together with its TrueType features without OCR, which would speed up image text retrieval. By reviewing how digital fonts are generated and how TrueType originated, we observe that each TrueType font is drawn from the contours in its glyph table. Hence, for a letter in a specific TrueType font, we can derive a segment density, defined as the number of occurrences per segment width. Certain occurrence counts appear frequently because of the fixed segment width. We perform letter recognition by comparing the TrueType feature library of a letter with the features extracted from input word images. Experiments were carried out to assess the robustness of the proposed method, with acceptable results.

Rapid and Brief Communication GPU implementation of neural networks

  • Oh, Kyoung-Su;Jung, Kee-Chul
    • 한국HCI학회:학술대회논문집 / 한국HCI학회 2007 Conference, Part 3 / pp. 322-325 / 2007
  • A graphics processing unit (GPU) is used to speed up an artificial neural network. The GPU implements the matrix multiplication of the neural network to improve the running time of a text detection system. Preliminary results showed a 20-fold performance improvement using an ATI RADEON 9700 PRO board. The parallelism of the GPU is fully utilized by accumulating many input feature vectors and weight vectors, then converting the many inner-product operations into one matrix operation. Further research areas include benchmarking the performance with various hardware and GPU-aware learning algorithms. (c) 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
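The reformulation mentioned in this abstract, turning many inner products into one matrix operation, can be sketched in plain Python. This only demonstrates that the two formulations give the same numbers. It runs on the CPU and shows no speedup, and the function names are hypothetical rather than taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def layer_one_by_one(weights, inputs):
    """One separate inner product per (neuron, input) pair."""
    return [[dot(w, x) for x in inputs] for w in weights]


def matmul(a, b):
    """Plain matrix product of a (n x d) and b (d x m)."""
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]


def layer_batched(weights, inputs):
    """Stack the accumulated inputs as columns of X, then compute W * X once."""
    x_matrix = [list(col) for col in zip(*inputs)]  # shape: features x batch
    return matmul(weights, x_matrix)
```

On a GPU, the single product `W * X` is what allows all neuron responses for the whole batch to be computed in parallel.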

The Possibility and Prospect for Developing a Sijo Munhwa Information System

  • 한창훈
    • 한국시조학회지:시조학논총 / Vol. 19, No. 1 / pp. 37-62 / 2003
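  • This paper discusses the possibility and prospects of developing a Sijo culture information system. The main points are summarized as follows. 1. As language material, a Sijo culture information system first requires the input and proofreading of materials, including original source texts. 2. When processing the constructed raw data, tasks such as TEI (Text Encoding Initiative) encoding and thesaurus construction should be considered. 3. This forms the groundwork for building a Topic Map. 4. As art material, a Sijo culture information system depends heavily on links to other materials such as images and video.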

Analyzing Customer Experience in Hotel Services Using Topic Modeling

  • Nguyen, Van-Ho;Ho, Thanh
    • Journal of Information Processing Systems / Vol. 17, No. 3 / pp. 586-598 / 2021
  • Nowadays, users' reviews and feedback on e-commerce sites, stored as text, are a huge source of information for analyzing customers' experience with the goods and services provided by a business. Collecting and analyzing this information is therefore necessary to better understand customer needs. In this study, we first collected a corpus of 99,322 customer comments and opinions in English. From this corpus we chose the best number of topics (K), using Perplexity and Coherence Score measurements, as the input parameter for the model. Finally, we conducted an experiment using the latent Dirichlet allocation (LDA) topic model with the chosen K to explore the topics. The model found hidden topics and high-probability keyword sets that are of interest to users. Applying the empirical results of the model will support decision-making, helping businesses improve products and services as well as business management and development in the field of hotel services.
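Selecting the number of topics K by perplexity can be sketched as follows. This is a minimal illustration that assumes per-token log-likelihoods are already available from fitted models. The function names are hypothetical, and the abstract does not say how the authors combine perplexity with the coherence score, so only the perplexity side is shown.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean per-token log-likelihood; lower is better."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def pick_k(perplexity_by_k):
    """Return the candidate K with the lowest perplexity.

    perplexity_by_k maps each candidate K to the held-out perplexity of a
    model fitted with that K (values are supplied by the caller).
    """
    return min(perplexity_by_k, key=perplexity_by_k.get)
```

In practice the perplexity for each candidate K would be computed on held-out documents, and the result checked against the coherence score before settling on K.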

Toward Sentiment Analysis Based on Deep Learning with Keyword Detection in a Financial Report

  • Jo, Dongsik;Kim, Daewhan;Shin, Yoojin
    • 한국정보통신학회논문지 / Vol. 24, No. 5 / pp. 670-673 / 2020
  • Recent advances in artificial intelligence have made sentiment analysis (e.g. a positive or negative forecast) of documents such as financial reports easier. In this paper, we investigate a method that applies text mining techniques with deep learning to extract keywords from a financial report, and we propose a model that accounts for the effects of sentiment values in financial information. For sentiment analysis with keyword detection in the financial report, we suggest an input layer of extracted keywords, hidden layers with learned weights, and an output layer that yields sentiment scores. Our approach can support more effective strategies for potential investors by serving as a professional guideline based on sentiment values.

Improving Transformer with Dynamic Convolution and Shortcut for Video-Text Retrieval

  • Liu, Zhi;Cai, Jincen;Zhang, Mengmeng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 7 / pp. 2407-2424 / 2022
  • Recently, the Transformer has made great progress in video retrieval tasks due to its high representation capability. In the structure of a Transformer, the cascaded self-attention modules are capable of capturing long-distance feature dependencies. However, the local feature details are likely to deteriorate. In addition, increasing the depth of the structure is likely to produce learning bias in the learned features. In this paper, an improved Transformer structure named TransDCS (Transformer with Dynamic Convolution and Shortcut) is proposed. A Multi-head Conv-Self-Attention module is introduced to model the local dependencies and improve the efficiency of local feature extraction. Meanwhile, an augmented shortcuts module based on a dual identity matrix is applied to enhance the conduction of input features and mitigate the learning bias. The proposed model is tested on the MSRVTT, LSMDC and Activity-Net benchmarks, and it surpasses all previous solutions for the video-text retrieval task. For example, on the LSMDC benchmark, a gain of about 2.3% MdR and 6.1% MnR is obtained over recently proposed multimodal-based methods.