• Title/Summary/Keyword: automatic translation (자동 번역)

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • A named entity recognition method is used to improve the performance of information retrieval systems, question answering systems, machine translation systems, and so on. The targets of named entity recognition are usually PLOs (persons, locations, and organizations). They are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities: they sometimes consist of multiple phrases or a whole sentence, or contain special characters. This makes it difficult to find named entity candidates. In this paper, we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. To identify candidates, word phrases enclosed in special symbols in a sentence are first extracted and then verified by an SVM using feature words and their distances. To classify the extracted title candidates, an SVM is used with the mutual information of word contexts.
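
The first step of the candidate identification described in this abstract, pulling out phrases enclosed in special symbols, can be illustrated with a minimal sketch. The symbol pairs and the function name `extract_title_candidates` are assumptions made for illustration, since the abstract does not list the actual symbols; the SVM verification and classification steps are not shown.

```python
import re

# Assumed bracket pairs; the abstract does not say which symbols the authors used.
SYMBOL_PAIRS = [("《", "》"), ("「", "」"), ("<", ">"), ('"', '"')]

def extract_title_candidates(sentence):
    """Return phrases enclosed by any assumed symbol pair, in order of appearance."""
    candidates = []
    for open_sym, close_sym in SYMBOL_PAIRS:
        # Non-greedy match of the shortest text between an opening and closing symbol.
        pattern = re.escape(open_sym) + r"(.+?)" + re.escape(close_sym)
        for match in re.finditer(pattern, sentence):
            candidates.append((match.start(), match.group(1).strip()))
    # Sort by position so candidates come back in reading order.
    return [text for _, text in sorted(candidates)]

if __name__ == "__main__":
    print(extract_title_candidates(
        "The director of <Parasite> praised the novel 《Human Acts》 yesterday."
    ))
```

Every enclosed phrase is only a candidate: quoted speech or emphasized words would also match, which is presumably why the abstract adds a verification step.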

Automatically Extracting Unknown Translations Using Phrase Alignment (정렬기법을 이용한 미등록 대역어의 자동 추출)

  • Kim, Jae-Hoon;Yang, Sung-Il
    • The KIPS Transactions:PartB / v.14B no.3 s.113 / pp.231-240 / 2007
  • In this paper, we propose an automatic extraction model for unknown translations and implement an unknown translation extraction system using the proposed model. The proposed model is a phrase-alignment model that incorporates three models: a phrase-boundary model, a language model, and a translation model. The system built on this model consists of three parts: construction of parallel corpora, alignment of Korean and English words, and extraction of unknown translations. To evaluate the performance of the proposed system, we established a reference corpus for extracting unknown translations, which comprises 2,220 parallel sentences including about 1,500 unknown translations. Through several experiments, we observed that the proposed model is very useful for extracting unknown translations. In the future, research on objective evaluation and on building high-quality parallel corpora should be carried out, and work on improving the performance of unknown translation extraction should continue.

A Design and Implementation of HTML Filter for Wireless Internet Service (무선 인터넷 서비스를 위한 HTML Filter의 설계 및 개발)

  • Jeong, Sam-Jin
    • Journal of the Korea Academia-Industrial cooperation Society / v.9 no.6 / pp.1659-1665 / 2008
  • In this paper, we propose the design and implementation of an HTML Filter to provide wireless internet service efficiently. The HTML Filter automatically translates Web documents written in HTML into WML documents that suit the WAP environment of portable wireless devices. The HTML Filter consists of "the HTML lexical analyzer", which groups the HTML source into token units; "the HTML syntax analyzer", which takes the lexical units from the lexical analyzer and analyzes the grammar of the source; "the HTML abstract parser tree", an intermediate code that represents the meaning of the HTML source more effectively; and "the target code generator", which converts the abstract parser tree into WML documents. The proposed translation method, which uses ordinary compiler technology, is easier and more effective than existing converters and is expected to be applicable to translation for other wireless languages on the wireless internet.
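
The compiler-style pipeline in this abstract (tokenize, parse into a tree, generate target code) can be sketched roughly as follows. This is not the authors' implementation: it assumes Python's standard `html.parser` as the lexical analyzer, and the names `Node`, `TreeBuilder`, `generate`, and `html_to_wml` as well as the tag mapping are hypothetical. The output is a simplified WML-like deck that is not validated against the WML DTD.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

SKIP = {"head", "script", "style"}          # assumed: content dropped for small screens
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
VOID = {"br", "img", "hr", "meta", "link", "input"}  # elements with no closing tag

class Node:
    """One element of the intermediate tree; children are Nodes or text strings."""
    def __init__(self, tag):
        self.tag = tag
        self.children = []

class TreeBuilder(HTMLParser):
    """html.parser emits tokens (lexical step); the handlers build the tree (syntax step)."""
    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed tags in between.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())

def generate(node):
    """Target code generation: walk the tree and emit WML-like markup."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(escape(child))
        elif child.tag in SKIP:
            continue
        elif child.tag == "p":
            parts.append("<p>" + generate(child) + "</p>")
        elif child.tag in HEADINGS:
            parts.append("<p><b>" + generate(child) + "</b></p>")
        elif child.tag == "br":
            parts.append("<br/>")
        else:
            parts.append(generate(child))  # unknown tags: keep only their content
    return "".join(parts)

def html_to_wml(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return '<wml><card id="main">' + generate(builder.root) + "</card></wml>"

if __name__ == "__main__":
    print(html_to_wml("<html><body><h1>Hi</h1><p>A &amp; B<br>C</p></body></html>"))
```

Keeping the tree as a separate intermediate form means a different generator could, in principle, target another markup language without touching the parsing stages.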

BackTranScription (BTS)-based Jeju Automatic Speech Recognition Post-processor Research (BackTranScription (BTS)기반 제주어 음성인식 후처리기 연구)

  • Park, Chanjun;Seo, Jaehyung;Lee, Seolhwa;Moon, Heonseok;Eo, Sugyeong;Jang, Yoonna;Lim, Heuiseok
    • Annual Conference on Human and Language Technology / 2021.10a / pp.178-185 / 2021
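  • Training a sequence-to-sequence (S2S) speech recognition post-processor requires a parallel corpus of (speech recognition sentence, human post-edit sentence corrected by a phonetic transcriptor) pairs, and building it takes a great deal of human labor. BackTranScription (BTS) is a data construction methodology proposed to alleviate this limitation of existing S2S-based speech recognition post-processors; it combines Text-To-Speech (TTS) and Speech-To-Text (STT) technology to generate a pseudo parallel corpus. Because the methodology eliminates the role of the transcriptor and can automatically generate vast amounts of training data, it reduces the time and cost of data construction. Building on BTS, this paper compares two ways of improving a speech recognition post-processor specialized for the Jeju language domain: a model-centric approach, which improves performance through model modification, and a data-centric approach, which improves performance by considering the quantity and quality of data without modifying the model. The experiments showed that applying the data-centric approach without modifying the model was more helpful for improving performance, and we analyzed the negative results of the model-centric approach.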
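
The BTS idea summarized in this abstract, passing clean text through TTS and then STT to obtain pseudo parallel pairs, can be sketched as below. The function `build_bts_corpus` and the `tts` and `stt` callables are hypothetical placeholders for whatever speech engines are available; no particular TTS or STT library is assumed.

```python
from typing import Callable, Iterable, List, Tuple

def build_bts_corpus(
    sentences: Iterable[str],
    tts: Callable[[str], bytes],   # hypothetical: text -> synthesized audio
    stt: Callable[[bytes], str],   # hypothetical: audio -> recognized text
) -> List[Tuple[str, str]]:
    """Pair each clean sentence with its TTS -> STT round trip.

    The round-trip output plays the role of the noisy recognition result
    (source), and the original sentence plays the role of the corrected
    sentence (target), so no human transcriptor is involved.
    """
    corpus = []
    for target in sentences:
        audio = tts(target)
        source = stt(audio)
        corpus.append((source, target))
    return corpus

if __name__ == "__main__":
    # Toy stand-ins that only simulate a recognition error.
    fake_tts = lambda text: text.encode("utf-8")
    fake_stt = lambda audio: audio.decode("utf-8").replace("th", "d")
    print(build_bts_corpus(["the cat"], fake_tts, fake_stt))
```

In this framing, the data-centric approach from the abstract would change which sentences are fed in, while the post-processor model itself stays fixed.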

On the Sequences of Dialogue Acts and the Dialogue Flows-w.r.t. the appointment scheduling dialogues (대화행위의 연쇄관계와 대화흐름에 대하여 -[일정협의 대화] 중심으로)

  • 박혜은;이민행
    • Korean Journal of Cognitive Science / v.10 no.2 / pp.27-34 / 1999
  • The main purpose of this paper is to propose a general dialogue flow in 'the appointment scheduling dialogues' in German using the concept of dialogue acts. A basic assumption of this research is that dialogue acts contribute to the improvement of a translation system. They might be very useful for solving, with contextual knowledge, problems that the syntactic and semantic modules could not resolve. The classification of the dialogue acts was carried out as part of the VERBMOBIL project and was based on real dialogues transcribed by experts. The real dialogues were analyzed in terms of the dialogue acts. We empirically analyzed the sequences of dialogue acts not only across a series of dialogue turns but also within a single dialogue turn. We additionally analyzed the sequences within a single turn because the dialogue data used in this research differed somewhat from the data used in other existing research. By examining the sequences of dialogue acts, we proposed a dialogue flowchart for 'the appointment scheduling dialogues'. Because it is based on a statistical analysis of the sequences of the most frequent dialogue acts, the dialogue flowchart seems to represent 'the appointment scheduling dialogues' in general. Further research is required on the classification of dialogue acts, which was the basis for the analysis of the dialogues. In order to extract the most generalized model, we did not subcategorize each dialogue act and used a limited number of dialogue act items. However, the generally defined dialogue acts need to be defined more concretely, and new dialogue acts for specific situations should be added.
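
One simple way to analyze dialogue act sequences statistically, in the spirit of this abstract, is to count how often each act follows another and read the most frequent transitions as edges of a flowchart. The sketch below assumes that approach; the act labels and function names are illustrative and are not taken from the paper or from the VERBMOBIL tag set.

```python
from collections import Counter, defaultdict

def transition_counts(dialogues):
    """Count how often each dialogue act is immediately followed by another."""
    counts = defaultdict(Counter)
    for acts in dialogues:
        for prev_act, next_act in zip(acts, acts[1:]):
            counts[prev_act][next_act] += 1
    return counts

def most_likely_next(counts, act):
    """Return the most frequent follower of `act`, or None if it never has one."""
    followers = counts.get(act)
    return followers.most_common(1)[0][0] if followers else None

if __name__ == "__main__":
    # Illustrative act sequences for two scheduling dialogues.
    dialogues = [
        ["GREET", "SUGGEST", "REJECT", "SUGGEST", "ACCEPT", "BYE"],
        ["GREET", "SUGGEST", "ACCEPT", "BYE"],
    ]
    counts = transition_counts(dialogues)
    print(most_likely_next(counts, "SUGGEST"))
```

The same counting works whether each inner list holds the acts of a whole dialogue or of a single turn, which matches the two levels of analysis the abstract mentions.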

Emergence of Social Networked Journalism Model: A Case Study of Social News Site, "wikitree" (소셜 네트워크 저널리즘 모델의 출현: 소셜 뉴스사이트, "위키트리" 사례연구)

  • Seol, Jinah
    • Journal of Internet Computing and Services / v.16 no.1 / pp.83-90 / 2015
  • This paper examines the rising value of social networked journalism and analyzes the case of a social news site based on the theory of networked journalism. Social networked journalism allows the public to be involved in every aspect of journalism production through crowd-sourcing and interactivity. The networking effect with the public is driving journalism to transform into a more open, more networked, and more responsive venue. "wikitree" is a social networking news service on which anybody can write news and disseminate it via Facebook and Twitter. It is operated as an open-source program that incorporates "Google Translate" to automatically convert all its content, enabling any global citizen with Internet access to contribute to news production and share either their own creative content or content generated from other sources. Since its inception, the "wikitree global" site has been expanding its coverage rapidly, with access points arising from 160 countries. Analyzing its international coverage by country and by news category as well as by unique visit numbers via SNS, the results of the case study imply that networking with the global public can enhance news traffic to the social news site as well as to specific news items. The results also suggest that the use of Twitter and Facebook in social networked journalism can break the boundary between the local and global public by extending news-gathering ability while growing the audience's interest in the site, and can engender a feasible business model for local online journalism.

Circadian Clock Genes, PER1 and PER2, as Tumor Suppressors (체내 시계 유전자 PER1과 PER2의 종양억제자 기능)

  • Son, Beomseok;Do, Hyunhee;Kim, EunGi;Youn, BuHyun;Kim, Wanyeon
    • Journal of Life Science / v.27 no.10 / pp.1225-1231 / 2017
  • Disruptive expression patterns of the circadian clock genes are highly associated with many human diseases, including cancer. The cell cycle and proliferation are linked to a circadian rhythm; therefore, abnormal clock gene expression could result in tumorigenesis and malignant development. The molecular network of the circadian clock is based on transcriptional and translational feedback loops orchestrated by a variety of clock activators and clock repressors. The expression of 10-15% of the genome is controlled by the overall balance of circadian oscillation. Among the many clock genes, Period 1 (Per1) and Period 2 (Per2) are clock repressor genes that play an important role in the regulation of normal physiological rhythms. It has been reported that PER1 and PER2 are involved in the expression of cell cycle regulators including cyclins, cyclin-dependent kinases (CDKs), and CDK inhibitors. In addition, a correlation between the down-regulation of PER1 and PER2 and the development of many cancer types has been revealed. In this review, we focus on the molecular function of PER1 and PER2 in the circadian clock network and on the transcriptional and translational targets of PER1 and PER2 involved in the cell cycle and tumorigenesis. Moreover, we provide information suggesting that PER1 and PER2 could be promising therapeutic targets for cancer therapies and serve as potential prognostic markers for certain types of human cancers.

A Study on Verification of Back TranScription(BTS)-based Data Construction (Back TranScription(BTS)기반 데이터 구축 검증 연구)

  • Park, Chanjun;Seo, Jaehyung;Lee, Seolhwa;Moon, Hyeonseok;Eo, Sugyeong;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.109-117 / 2021
  • Recently, the use of speech-based interfaces has been increasing as a means of human-computer interaction (HCI). Accordingly, interest in post-processors for correcting errors in speech recognition results is also increasing. However, a lot of human labor is required to construct the data needed to build a sequence-to-sequence (S2S) based speech recognition post-processor. To alleviate this limitation of the existing construction methodology, a new data construction method called Back TranScription (BTS) was proposed. BTS refers to a technology that combines TTS and STT technology to create a pseudo parallel corpus. This methodology eliminates the role of the phonetic transcriptor and can automatically generate vast amounts of training data, saving cost. Extending the existing BTS research, this paper verified through experiments that data should be constructed with text style and domain in mind rather than without any criteria.