• Title/Summary/Keyword: automatic machine translation

Intelligent Pattern Matching Based on Geometric Features for Machine Vision Inspection (머신비전검사를 위한 기하학적 특징 기반 지능 패턴 정합)

  • Moon Soon-Hwan;Kim Gyung-Bum;Kim Tae-Hoon
    • The Journal of the Korea Contents Association / v.6 no.6 / pp.1-8 / 2006
  • This paper presents an intelligent pattern matching method for acquiring reliable calibration data for automatic PCB pattern inspection. Inaccurate calibration data often results from geometric pattern variations and from manually selecting an inappropriate model. This lowers confidence in the inspection and delays inspection processing. In this paper, the geometric features of PCB patterns are used to calculate accurate calibration data. An appropriate model is selected automatically based on the geometric features, and calibration data that is invariant to geometric variations (translation, rotation, scaling) is then calculated. The method saves inspection time by eliminating the need for manual model selection. As a result, it enables fast, accurate, and reliable inspection of PCB patterns.

Detection of Similar Answers to Avoid Duplicate Question in Retrieval-based Automatic Question Generation (검색 기반의 질문생성에서 중복 방지를 위한 유사 응답 검출)

  • Choi, Yong-Seok;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.8 no.1 / pp.27-36 / 2019
  • In this paper, we propose a method for finding the answer in the question-answer database that is most similar to the user's response, in order to avoid generating a redundant question in a retrieval-based automatic question generation system. Because the question paired with the answer most similar to the user's response may already be known to the user, that question should be removed from the set of question candidates. A similarity detector calculates the similarity between two answers by using identical words, paraphrases, and sentential meanings. Paraphrases can be acquired by building a phrase table of the kind used in statistical machine translation. The similarity of sentential meaning between two answers is calculated by an attention-based convolutional neural network. We evaluate the accuracy of the similarity detector on an evaluation set of 100 answers and obtain a Mean Reciprocal Rank (MRR) score of 71%.
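
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of how Mean Reciprocal Rank is commonly computed. The function name and the toy rankings are illustrative assumptions; they are not taken from the paper or its evaluation set.

```python
def mean_reciprocal_rank(ranked_lists, gold_answers):
    """Average of 1/rank of the gold item in each ranking (0 if it is absent)."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_answers):
        for rank, candidate in enumerate(ranked, start=1):
            if candidate == gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical rankings: the gold answer appears at rank 1, 2, and 3.
rankings = [["a1", "a2", "a3"], ["b2", "b1", "b3"], ["c3", "c2", "c1"]]
gold = ["a1", "b1", "c1"]
score = mean_reciprocal_rank(rankings, gold)  # (1 + 1/2 + 1/3) / 3
```

A score of 71% therefore means that, averaged over the evaluation queries, the reciprocal of the rank at which the correct answer appears is 0.71.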

Automatic Extraction of Paraphrases from a Parallel Bible Corpus (정렬된 성경 코퍼스로부터 바꿔쓰기표현(paraphrase)의 자동 추출)

  • Lee, Kong-Joo;Yun, Bo-Hyun
    • Korean Journal of Cognitive Science / v.17 no.4 / pp.323-336 / 2006
  • In this paper, we present a pilot system that can extract paraphrases from a parallel corpus using a co-training method. Paraphrases are useful for applications that should create varied and fluent text, such as machine translation, question-answering systems, and multi-document summarization systems. One of the difficulties in extracting paraphrases is finding a rich source from which they can be extracted. The Bible is a good source for extracting paraphrases, as it has several Korean versions in which every sentence can be easily aligned by chapter and verse. Using the co-training method, we can extract not only lexical-level paraphrases but also phrasal-level paraphrases from a parallel corpus consisting of these Bible versions.

Protocol Conformance Testing of INAP Protocol in SDL (SDL을 사용한 INAP 프로토콜 시험)

  • 도현숙;조준모;김성운
    • Journal of Korea Multimedia Society / v.1 no.1 / pp.109-119 / 1998
  • This paper describes research on the automatic generation of an Abstract Test Suite from the INAP protocol in formal specifications, applying existing related algorithms such as the Rural Chinese Postman Tour and UIO sequence concepts. We use the I/O FSM generated from SDL specifications, and a characterizing sequence, called the UIO sequence, is defined for the I/O FSM. The UIO sequence is combined with the concept of the Rural Chinese Postman Tour to obtain an optimal test sequence. The paper also proposes a methodology for estimating the fault coverage of the Test Suite obtained by our method, and describes the translation of the Test Suite into the standardized test notation TTCN.

Automatic Evaluation of Speech and Machine Translation Systems by Linguistic Test Points (자동통번역 시스템의 언어 현상별 자동 평가)

  • Choi, Sung-Kwon;Choi, Gyu-Hyun;Kim, Young-Gil
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.1041-1044 / 2019
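  • BLEU is the best-known automatic metric for evaluating the performance of automatic speech and machine translation. However, BLEU cannot show which parts of the translation output are strengths and which are weaknesses. This paper introduces a method for automatically evaluating speech and machine translation systems by linguistic phenomenon. The method provides the per-phenomenon evaluation that BLEU cannot, and lets developers intuitively grasp the strengths and weaknesses of a system for each linguistic phenomenon. Per-phenomenon accuracy was measured for Google and Naver Papago. Treating an accuracy of 40% or less as a weakness, the weakness of the Google English-Korean machine translator was style translation (32.50%), and the weaknesses of the Google English-Korean speech translator were speech recognition (30.00%) and discourse processing (30.00%). The weaknesses of the Google Korean-English machine translator were syntactic analysis (34.00%), ambiguity resolution (27.50%), and style translation (20.00%), and the weakness of the Google Korean-English speech translator was discourse processing (30.00%). The Papago English-Korean machine translator mostly scored 55% or higher, and the weakness of the Papago English-Korean speech translator was discourse processing (30.00%). The weaknesses of the Papago Korean-English machine translator were syntactic analysis (38.00%), ambiguity resolution (32.50%), and style translation (20.00%), and the weakness of the Google Korean-English speech translator was discourse processing (20.00%). The ultimate goal of per-phenomenon automatic evaluation is to find the various weaknesses of speech and machine translators, semi-automatically collect and build targeted corpora related to those weaknesses, and retrain the systems to improve their performance incrementally.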
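
The per-phenomenon scoring described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the input format (a phenomenon label plus a pass/fail flag per test point), the function names, and the sample data are hypothetical. Only the 40% weakness threshold comes from the abstract.

```python
from collections import defaultdict

WEAKNESS_THRESHOLD = 40.0  # percent; the abstract treats 40% or less as a weakness


def accuracy_by_phenomenon(results):
    """Map each phenomenon to the percentage of its test points that passed."""
    counts = defaultdict(lambda: [0, 0])  # phenomenon -> [passed, total]
    for phenomenon, passed in results:
        counts[phenomenon][1] += 1
        if passed:
            counts[phenomenon][0] += 1
    return {p: 100.0 * ok / total for p, (ok, total) in counts.items()}


def weaknesses(accuracies, threshold=WEAKNESS_THRESHOLD):
    """Return the phenomena whose accuracy is at or below the threshold."""
    return sorted(p for p, acc in accuracies.items() if acc <= threshold)


# Made-up test-point outcomes, not data from the paper.
results = [
    ("style", True), ("style", False), ("style", False), ("style", False),
    ("syntax", True), ("syntax", True), ("syntax", False),
    ("discourse", True), ("discourse", False),
]
accuracies = accuracy_by_phenomenon(results)
weak = weaknesses(accuracies)
```

Unlike a single corpus-level BLEU score, this breakdown points directly at the phenomena for which targeted training data would be collected.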

An Automatic Extraction of English-Korean Bilingual Terms by Using Word-level Presumptive Alignment (단어 단위의 추정 정렬을 통한 영-한 대역어의 자동 추출)

  • Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering / v.2 no.6 / pp.433-442 / 2013
  • A set of bilingual terms is one of the most important resources for building language-related applications such as a machine translation system and a cross-lingual information system. In this paper, we introduce a new approach that automatically extracts candidate English-Korean bilingual terms by using a bilingual parallel corpus and a basic English-Korean lexicon. The approach can be useful even when the parallel corpus is small. Sentence alignment is first performed on the document-level parallel corpus. Words in a pair of aligned sentences are then aligned by referring to the basic bilingual lexicon. For words that remain unaligned in a pair of aligned sentences, several assumptions are applied in order to align bilingual term candidates across the two languages. Examples of these assumptions include the location of a sentence, the relations between words, and linguistic information about the two languages. An experimental result shows approximately 71.7% accuracy for the English-Korean bilingual term candidates automatically extracted from a bilingual parallel corpus of size 1,000.
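
The lexicon-based alignment step in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the function names, the toy lexicon, the pre-tokenized input, and in particular the one-leftover-on-each-side rule, which merely stands in for the paper's assumptions (those are not specified in the abstract).

```python
def align_with_lexicon(en_tokens, ko_tokens, lexicon):
    """Align tokens whose translation is listed in the lexicon; return leftovers."""
    aligned, used_en, used_ko = [], set(), set()
    for i, en in enumerate(en_tokens):
        for j, ko in enumerate(ko_tokens):
            if j not in used_ko and ko in lexicon.get(en, ()):
                aligned.append((en, ko))
                used_en.add(i)
                used_ko.add(j)
                break
    left_en = [t for i, t in enumerate(en_tokens) if i not in used_en]
    left_ko = [t for j, t in enumerate(ko_tokens) if j not in used_ko]
    return aligned, left_en, left_ko


def candidate_terms(left_en, left_ko):
    """Hypothetical rule: pair the leftovers only when exactly one remains per side."""
    if len(left_en) == 1 and len(left_ko) == 1:
        return [(left_en[0], left_ko[0])]
    return []


# Toy lexicon and sentence pair (illustrative only).
lexicon = {"reads": {"읽다"}, "text": {"텍스트"}}
aligned, left_en, left_ko = align_with_lexicon(
    ["parser", "reads", "text"], ["파서", "텍스트", "읽다"], lexicon
)
candidates = candidate_terms(left_en, left_ko)
```

Here the lexicon accounts for two of the three words, so the remaining pair becomes a term candidate to be checked rather than an accepted translation.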

The Development of an Automatic Indexing System based on a Thesaurus (시소러스를 기반으로 하는 자동색인 시스템에 관한 연구)

  • 임형묵;정상철
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.213-242 / 1993
  • During the past decades, several automatic indexing systems have been developed, such as single term indexing, phrase indexing, and thesaurus-based indexing systems. Among these, single term indexing has been known to be superior to the others despite the simplicity with which it extracts meaningful terms. Thesaurus-based indexing, on the other hand, has been regarded as producing a low retrieval rate, mainly because thesauri usually do not have enough index terms, so much of the text data fails to be indexed when it does not match any index term in the thesaurus. This paper develops a thesaurus-based indexing system, THINS, that yields a higher retrieval rate than other systems by syntactically analyzing text data and partially matching it with index terms in thesauri. First, the system analyzes the input text syntactically using the machine translation system MATES/EK and extracts noun phrases. After deleting stop words from the noun phrases and stemming the remaining words, it tries to index them with similar index terms in the thesaurus as far as possible. We conduct an experiment with the CACM data set that compares the retrieval effectiveness of THINS with that of single term based indexing under HYKIS, a thesaurus-based information retrieval system. THINS yields about 10 percent higher precision than single term based indexing, while showing 8 to 9 percent lower recall. This is a considerable improvement over previous thesaurus-based systems, which yield 25 or 30 percent lower precision than single term based indexing. We also argue that the relatively low recall is caused by the fact that CRCS, the thesaurus included in the CACM data set, is very incomplete, having only slightly more than one thousand terms; THINS is therefore expected to produce a much higher retrieval rate if it is used with a currently available large thesaurus.
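
The stop-word removal, stemming, and partial matching steps described in this abstract can be sketched roughly as below. This is not the THINS implementation: the stop-word list, the crude suffix-stripping stemmer, the overlap threshold, and all names are assumptions chosen for illustration, and the syntactic analysis that extracts noun phrases is assumed to have happened already.

```python
STOP_WORDS = {"the", "of", "a", "an", "for", "in"}


def stem(word):
    """Very crude stemmer (assumption): strip a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(phrase):
    """Lowercase, drop stop words, and stem the remaining words."""
    return {stem(w) for w in phrase.lower().split() if w not in STOP_WORDS}


def index_phrase(noun_phrase, thesaurus_terms, min_overlap=0.5):
    """Return thesaurus terms that share enough stemmed words with the phrase."""
    phrase_words = normalize(noun_phrase)
    matches = []
    for term in thesaurus_terms:
        term_words = normalize(term)
        if not term_words:
            continue
        overlap = len(phrase_words & term_words) / len(term_words)
        if overlap >= min_overlap:
            matches.append(term)
    return matches


# Hypothetical thesaurus entries and noun phrase.
thesaurus = ["database systems", "automatic indexing", "operating systems"]
matched = index_phrase("the indexing of large databases", thesaurus)
```

With exact matching this phrase would match no thesaurus entry; partial matching lets it be indexed under two related terms, which is the effect the abstract attributes to THINS.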

LOTOS Protocol Conformance Testing for Formal Description Specifications (형식 기술 기법에 의한 LOTOS 프로토콜 적합성 시험)

  • Chin, Byoung-Moon;Kim, Sung-Un;Ryu, Young-Suk
    • The Transactions of the Korea Information Processing Society / v.4 no.7 / pp.1821-1841 / 1997
  • This paper presents automated protocol conformance test sequence generation based on formal methods for LOTOS specifications, applying existing related algorithms and techniques such as the testing framework and Rural Chinese Postman Tour concepts. We use the state-transition graphs obtained from LOTOS specifications by means of the CAESAR tool. This tool compiles a specification written in LOTOS into an extended Petri net, from which a transition graph of an event finite-state machine (EvFSM) including data is generated. A new characterizing sequence (CS), called the Unique Event sequence (UE sequence), is defined. A UE sequence for a state is a sequence of accepted gate events that is unique to that state. Some experience with UE sequences, partial UE sequences, and signatures is also described. These sequences are combined with the concept of the Rural Chinese Postman Tour to obtain an optimal test sequence, which is a minimum-cost tour of the reference transition graph of the EvFSM. The paper also presents experience with estimating the fault coverage of the automated method for optimized test sequence generation, and the translation into TTCN notation of the test sequences obtained with our tool. A prototype of the proposed framework has been built, with special attention to real applications, in order to generate executable test cases automatically. This formal method for conformance testing can be applied to protocols related to IN, PCS, and ATM to verify the correctness of an implementation with respect to the given specification.
