• Title/Summary/Keyword: Parsing Algorithm

A Parallel Speech Recognition Model on Distributed Memory Multiprocessors (분산 메모리 다중프로세서 환경에서의 병렬 음성인식 모델)

  • 정상화;김형순;박민욱;황병한
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.44-51 / 1999
  • This paper presents a massively parallel computational model for the efficient integration of speech and natural language understanding. The phoneme model is based on a continuous hidden Markov model with context-dependent phonemes, and the language model is based on a knowledge-base approach. To construct the knowledge base, we adopt a hierarchically structured semantic network and a memory-based parsing technique that employs parallel marker-passing as its inference mechanism. Our parallel speech recognition algorithm is implemented on a multi-Transputer system using distributed-memory MIMD multiprocessors. Experimental results show that the parallel speech recognition system achieves higher recognition accuracy than a word-network-based speech recognition system. The recognition accuracy is further improved by applying code-phoneme statistics. In addition, speedup experiments demonstrate the possibility of constructing a real-time parallel speech recognition system.

Automatic Parsing of MPEG-Compressed Video (MPEG 압축된 비디오의 자동 분할 기법)

  • Kim, Ga-Hyeon;Mun, Yeong-Sik
    • The Transactions of the Korea Information Processing Society / v.6 no.4 / pp.868-876 / 1999
  • This paper describes an efficient automatic parsing technique for MPEG-compressed video, which is fundamental to content-based indexing. The proposed method detects scene changes regardless of the IPB picture composition. To detect abrupt changes, a difference measure based on the DC coefficients in I pictures and the macroblock reference features in P and B pictures are used. For gradual scene changes, we use the macroblock reference information in P and B pictures. Scene change detection can be handled efficiently by extracting the necessary data without fully decoding the MPEG sequence. The performance of the proposed algorithm is analyzed in terms of precision and recall. The experimental results verify the effectiveness of the method for detecting scene changes in various MPEG sequences.
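
The abrupt-change test on I pictures can be illustrated with a minimal sketch. The abstract does not give the actual difference measure or threshold, so the mean absolute difference, the function names `dc_difference` and `detect_cuts`, and the threshold value below are all assumptions made for illustration. Each "DC image" is taken to be a small 2D grid holding one DC coefficient per block.

```python
def dc_difference(dc_a, dc_b):
    """Mean absolute difference between two DC images of equal size (assumed measure)."""
    total = 0
    count = 0
    for row_a, row_b in zip(dc_a, dc_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def detect_cuts(dc_images, threshold):
    """Return indices i where the difference between image i-1 and image i exceeds threshold."""
    cuts = []
    for i in range(1, len(dc_images)):
        if dc_difference(dc_images[i - 1], dc_images[i]) > threshold:
            cuts.append(i)
    return cuts


# Toy data: the third image differs sharply from the second.
toy_dc_images = [
    [[10, 10], [10, 10]],
    [[12, 9], [10, 11]],
    [[200, 190], [210, 205]],
]
toy_cuts = detect_cuts(toy_dc_images, threshold=30)
```

Working on DC coefficients only is what lets a detector of this kind avoid full decoding. Gradual changes, and the macroblock reference features of P and B pictures, are not covered by this sketch.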

Customized Search System using Real-time Contexts of User (사용자의 실시간 상황정보를 이용한 사용자 맞춤 검색 시스템)

  • Kwon, Mi-Rim;Hong, Kwang-Jin;Jung, Kee-Chul
    • Journal of Korea Society of Industrial Information Systems / v.21 no.5 / pp.19-30 / 2016
  • These days, people can easily get information from the Internet. However, there is too much information, which makes searching for data distracting and inefficient. We therefore need a user-customized web search system that provides appropriate information. In this paper, we propose a search system that semi-automatically collects user context such as weather, location, and time, and provides essential information to users. Using this context data, the proposed system can understand what information users want in specific situations and can provide more useful information than existing systems. The proposed system is based on the 'Production/Sharing Service of Personal Korean Contents with Voluntary Sharing Economy System', and we add a data parsing algorithm to each of the input, storage, and search parts. In the experiments, we compare and analyze the results of an existing system and the proposed system using some general keywords.

A Sentence Reduction Method using Part-of-Speech Information and Templates (품사 정보와 템플릿을 이용한 문장 축소 방법)

  • Lee, Seung-Soo;Yeom, Ki-Won;Park, Ji-Hyung;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.35 no.5 / pp.313-324 / 2008
  • Sentence reduction is an information compression process that removes extraneous words and phrases while retaining the basic meaning of the original sentence. Most research on sentence reduction has required a large number of lexical and syntactic resources and has focused on extracting or removing extraneous constituents such as words, phrases, and clauses via a complicated parsing process. However, this research has some problems. First, the lexical resources that can be obtained from learning data are very limited. Second, it is difficult to reduce sentences in languages that have no reliable method for syntactic parsing, because of the ambiguity and exceptional expressions of sentences. To solve these problems, we propose a sentence reduction method that uses templates and POS (part-of-speech) information without a parsing process. In the proposed method, we create a new sentence using both Sentence Reduction Templates, which decide the form of the reduced sentence, and Grammatical POS-based Reduction Rules, which compose the grammatical sentence structure. In addition, we use the Viterbi algorithm with an HMM (hidden Markov model) to avoid the exponential computation that arises when applying the Sentence Reduction Templates. Finally, our experiments show that the proposed method achieves acceptable results in comparison with previous sentence reduction methods.
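
For readers unfamiliar with why the Viterbi algorithm avoids exponential computation, here is a minimal sketch of the textbook algorithm on a generic HMM. How the paper maps templates and POS tags onto states and observations is not stated in the abstract, so the state names, observation symbols, and probabilities below are toy placeholders, and `viterbi` is an illustrative name.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Only the best predecessor is kept, so the cost per step is
            # O(len(states)^2) instead of enumerating every full path.
            prob, back = max(
                (prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
            column[s] = (prob, back)
        best.append(column)

    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model (all values are placeholders, not taken from the paper).
toy_states = ("A", "B")
toy_start = {"A": 0.6, "B": 0.4}
toy_trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
toy_emit = {
    "A": {"x": 0.5, "y": 0.4, "z": 0.1},
    "B": {"x": 0.1, "y": 0.3, "z": 0.6},
}
toy_path, toy_prob = viterbi(["x", "y", "z"], toy_states, toy_start, toy_trans, toy_emit)
```

With n observations and k states, the table has n × k cells, whereas scoring every candidate path separately would require k^n evaluations.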

A Study on the Methodologies of Korean Language Processing Avoiding Dead-end State (통제불능 상태를 회피하는 한국어 정보처리 방법론 연구)

  • Kang, Seung-Shik
    • Speech Sciences / v.5 no.1 / pp.89-103 / 1999
  • It is relatively easy to develop a prototype of a Korean language processing system, but it is very difficult to turn it into an operational system. In this paper, we survey the current status and methodological issues of Korean language processing systems such as morphological analyzers, parsers, and machine translators. In most cases, a Korean language processing system easily reaches a dead-end state where its performance cannot be improved any further. The reason is that it adopts a general algorithm that covers similar problems as a whole, because specific low-level problems are not clearly defined and their algorithms are unclear. As a result, when we add restrictions to solve an individual linguistic problem, they are also applied to other linguistic phenomena as a side effect. This makes improving the algorithm very difficult. This paper proposes a two-step paradigm, a divide-and-conquer method based on functional modularization, a simplification method, and an exception handling technique for developing an operational system that does not fall into a dead-end state.

Korean Character processing: Part I. Theoretical Foundation (한글문자의 컴퓨터 처리: I. 이론)

  • 정원량
    • Journal of the Korean Institute of Telematics and Electronics / v.16 no.3 / pp.1-8 / 1979
  • This is Part I of a two-part article on Korean character processing by computer. In Part I, the problems in Korean character processing are identified and a theoretical foundation is laid out as a viable solution to them. The one- and two-dimensional syntactic structures of Korean characters are formally defined by means of BNF and "Patternal structure", respectively. A formal discussion of lexical and syntactic algorithms is given for character conversion. This character conversion algorithm is applicable to both input and output. For device independence and implementation independence, the concept of a "cardinal symbol set" is introduced. We will present a historical survey of Korean character processing and a discussion of implementation problems for the above algorithm in Part II.

A Study on the Pattern Recognition of Korean Characters by Syntactic Method (Syntactic법에 의한 한글의 패턴 인식에 관한 연구)

  • ;安居院猛
    • Journal of the Korean Institute of Telematics and Electronics / v.14 no.5 / pp.15-21 / 1977
  • The syntactic pattern recognition system for Korean characters is composed of three main functional parts: preprocessing, graph representation, and segmentation. In the preprocessing routine, the input pattern is thinned using Hilditch's thinning algorithm. Graph representation detects a number of nodes on the input pattern and encodes the branches between nodes using 8 directional components. Next, the segmentation routine, implemented by top-down nondeterministic parsing under the control of a tree grammar, identifies parts of the graph-represented pattern as basic components of Korean characters. The authors have confirmed, through recognition simulations on a digital computer, that this system is effective for recognizing Korean characters.

A Program Similarity Evaluation Algorithm (프로그램 유사도 평가 알고리즘)

  • Kim Young-Chul;Hwang Seog-Chan;Choi Jaeyoung
    • Journal of Internet Computing and Services / v.6 no.1 / pp.51-64 / 2005
  • In this paper, we introduce a system for evaluating the similarity of C program source code using a method that compares syntax trees with each other. This method has two characteristic features compared with other systems. First, it is not sensitive to program style such as indentation, white space, and comments, or to changes in the order of control structures such as statements, code blocks, and procedures. Second, it can detect syntax errors because it uses a parsing technique. We introduce algorithms for the similarity evaluation method and for a grouping method that reduces the number of comparisons. In the experimental section, we show test results of program similarity evaluation and the reduction in iterations achieved by the grouping algorithm.
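
The idea of comparing syntax trees rather than raw text can be sketched as follows. This is not the paper's algorithm: the paper targets C, while this sketch swaps in Python's standard `ast` module as the parser and `difflib.SequenceMatcher` as the comparison, and the function names `node_types` and `similarity` are illustrative. It shows only the insensitivity to layout, comments, and identifier names; it does not handle reordered blocks or the grouping step.

```python
import ast
import difflib


def node_types(source):
    """Parse source and return the sequence of syntax-tree node type names.

    ast.parse raises SyntaxError on invalid input, so syntax errors are
    detected as a side effect of building the tree.
    """
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def similarity(source_a, source_b):
    """Similarity in [0, 1] between the node-type sequences of two programs."""
    matcher = difflib.SequenceMatcher(None, node_types(source_a), node_types(source_b))
    return matcher.ratio()


# Same structure, different names, spacing, and comments.
program_a = "def f(x):\n    return x + 1\n"
program_b = "def g(y):\n    # add one\n    return y+1\n"
# Structurally different program.
program_c = "for i in range(3):\n    print(i)\n"

score_same = similarity(program_a, program_b)
score_diff = similarity(program_a, program_c)
```

Because comments and whitespace never reach the tree, and identifiers are ignored when only node types are compared, `program_a` and `program_b` produce identical sequences.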

A Distance Approach for Open Information Extraction Based on Word Vector

  • Liu, Peiqian;Wang, Xiaojie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.6 / pp.2470-2491 / 2018
  • Web-scale open information extraction (Open IE) plays an important role in NLP tasks such as acquiring common-sense knowledge, learning selectional preferences, and automatic text understanding. A large number of Open IE approaches have been proposed in the last decade, and the majority of them are based on supervised learning or dependency parsing. In this paper, we present a novel method for web-scale open information extraction that uses cosine distance based on Google word vectors as the confidence score of an extraction. The proposed method is a purely unsupervised learning algorithm that requires no hand-labeled training data or dependency parse features. We also present a mathematically rigorous proof for the new method using Bayesian inference and artificial neural network theory. It turns out that the proposed algorithm is equivalent to maximum likelihood estimation of the joint probability distribution over the elements of the candidate extraction. The proof itself also theoretically suggests a typical usage of word vectors for other NLP tasks. Experiments show that the distance-based method leads to further improvements over recently presented Open IE systems on three benchmark datasets, in terms of effectiveness and efficiency.
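
As a reminder of the quantity involved, cosine distance between two word vectors can be computed as in the minimal sketch below. The abstract does not say which vectors are compared or how the distance is turned into a confidence score, so this shows only the distance itself. The three-dimensional vectors are toy values rather than real word embeddings, and the function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


# Toy vectors standing in for word embeddings (placeholders only).
toy_vec_a = [1.0, 0.0, 0.0]
toy_vec_b = [2.0, 0.0, 0.0]   # same direction as toy_vec_a
toy_vec_c = [0.0, 3.0, 0.0]   # orthogonal to toy_vec_a
```

The distance depends only on direction, not magnitude, so `toy_vec_a` and `toy_vec_b` are at distance 0 while `toy_vec_a` and `toy_vec_c` are at distance 1.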

Design and Implementation of the VoiceXML Interpreter for Voice Web-service (음성 웹서비스를 위한 VoiceXML 해석기의 설계 및 구현)

  • 신현경;강동남;염세훈;유재우
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.42-47 / 2001
  • In this paper, we propose an interpreter that recognizes VoiceXML markup, validates the document, and interprets VoiceXML documents using a DI parser and the AST generated by the parser. The VoiceXML interpreter consists of the DI parser and an executor. The DI parser uses recursive descent parsing, and the executor uses the FIA (Form Interpretation Algorithm) proposed by the VXML Forum. The system is written in Java in order to develop the runtime environment for VoiceXML efficiently, which also makes the system portable.
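
Recursive descent parsing into an AST can be illustrated with a minimal sketch. The paper's parser handles VoiceXML markup and is written in Java. The example below instead uses a small arithmetic grammar as a stand-in and Python for consistency with the other sketches in this listing, so the grammar, the tuple-based AST, and all names are assumptions. Each grammar rule becomes one method that calls the methods for the rules it references.

```python
import re

# Grammar (stand-in, not VoiceXML):
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split text into number and operator tokens, rejecting anything else."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())   # left-associative AST node
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is not None and token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse text into a nested-tuple AST, raising SyntaxError on invalid input."""
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

A separate executor could then walk the returned tree, which mirrors the parser/executor split described in the abstract.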
