• Title/Summary/Keyword: Word alignment

Word class information in perception of prosodic prominence by Korean learners of English

  • Im, Suyeon
    • Phonetics and Speech Sciences / v.11 no.4 / pp.1-8 / 2019
  • This study investigates how Korean learners of English, compared with native English speakers, perceive prosodic prominence in public speech in relation to word class (part-of-speech) information. Both groups were asked to identify the words they perceived as prominent as they listened to a speech. Parts of speech and three acoustic cues (maximum F0, mean phone duration, and mean intensity) were analyzed for each word in the speech. The results showed that content words tended to be higher in pitch and longer in duration than function words. Both groups of listeners rated content words as prominent more frequently than function words. This tendency, however, was significantly greater for Korean learners of English than for native English speakers. Among the content words, Korean learners of English were more likely than native English speakers to judge nouns and verbs as prominent. This study presents evidence that Korean learners of English treat most, if not all, content words as landing sites for prosodic prominence, consistent with a previous study on the production of prominence.

An Alignment based technique for Text Translation between Traditional Chinese and Simplified Chinese

  • Sue J. Ker;Lin, Chun-Hsien
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.147-156 / 2002
  • Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mappings from a parallel corpus. While building our system and collecting data, we observed that three types of translation approaches can be used. We focus in particular on lexical translation between Traditional Chinese and Simplified Chinese text and on a method for extracting transfer mappings for machine translation.

Korean-English Sentence Alignment Based on Sentence Length and Word Alignment (문장 길이와 단어 정렬에 기반한 한-영 문장 정렬)

  • Lim, Jae-Soo;Seo, Hee-Cheol;Lee, Sang-Zoo;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 2001.10d / pp.302-309 / 2001
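  • As corpus-based statistical natural language processing is being actively studied in multilingual processing as well, this paper proposes an effective method for sentence alignment, which underlies the construction and use of parallel corpora. First, we apply an existing sentence-length-based method to Korean-English sentence alignment and point out the limitations of using length information alone. We then present a method that compensates for the problems of length-based alignment through word alignment using a dictionary and part-of-speech correspondence probabilities. Experiments show that the proposed method outperforms the length-based method. We also examine which factors can cause problems when lexical information is used for Korean-English sentence alignment.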
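The length-only baseline that this abstract starts from can be illustrated with a small dynamic program. This is a minimal sketch, not the authors' method: the cost function (absolute length difference plus a fixed penalty for anything other than a 1-1 match), the set of allowed groupings, and the names `align_by_length`, `ratio`, and `penalty` are all assumptions made for illustration.

```python
def align_by_length(src_lens, tgt_lens, ratio=1.0, penalty=2.0):
    """Align sentences using only their lengths (simplified, hypothetical cost).

    src_lens, tgt_lens: sentence lengths (e.g. in characters or words).
    ratio: assumed expected target length per unit of source length.
    penalty: assumed fixed cost for any grouping other than 1-1.
    Returns a list of (source indices, target indices) groups.
    """
    inf = float("inf")
    n, m = len(src_lens), len(tgt_lens)
    # Allowed groupings: (source sentences consumed, target sentences consumed).
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s = sum(src_lens[i:ni]) * ratio
                t = sum(tgt_lens[j:nj])
                extra = 0.0 if (di, dj) == (1, 1) else penalty
                c = cost[i][j] + abs(s - t) + extra
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the chosen groups.
    groups = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        groups.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    groups.reverse()
    return groups
```

For example, `align_by_length([10, 20, 5, 5], [10, 20, 10])` groups the last two source sentences with the last target sentence. Because the cost sees only lengths, two unrelated sentences of similar length are indistinguishable, which is the kind of limitation the abstract addresses with lexical information.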

A Postprocessing method for Statistical English-Korean Word Alignment Reflecting Alignment Tendency Between Parts-of-Speeches (품사간 정렬 경향을 반영한 통계 기반 영한 단어 정렬 후처리 방법)

  • Lee, Jae-Hee;Lee, Seung-Wook;Hwang, Young-Sook;Kim, Sang-Bum;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 2009.10a / pp.242-246 / 2009
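  • Word alignment, the task of finding corresponding words in a parallel corpus, is the most basic step in machine translation and is useful in a variety of fields. This paper identifies problems with existing statistical alignment models for English-Korean word alignment and, to address them, proposes a method that reflects the alignment tendencies between English and Korean parts of speech in word alignment. Experiments show that the proposed method improves precision, recall, and F-measure compared with existing statistical English-Korean word alignment results.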
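The three evaluation measures named in this abstract can be computed from sets of alignment links. The sketch below assumes each link is a (source index, target index) pair and uses the balanced F-measure; the abstract does not state the exact variant used, and the function name `alignment_prf` is hypothetical.

```python
def alignment_prf(predicted, gold):
    """Precision, recall, and balanced F-measure over alignment links.

    predicted, gold: iterables of (source_index, target_index) pairs.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy links, invented for illustration only.
p, r, f = alignment_prf(
    predicted=[(0, 0), (1, 2), (2, 1), (3, 3)],
    gold=[(0, 0), (1, 1), (2, 1)],
)
```

Here two of the four predicted links are correct and two of the three gold links are recovered, so precision is 1/2, recall is 2/3, and the F-measure is 4/7.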

Toward A Bilingual Legal Term Glossary from Context Profiles

  • Kwong, Oi-Yee
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.249-258 / 2002
  • We propose an algorithm for the automatic acquisition of a bilingual lexicon in the legal domain. We make use of a parallel corpus of bilingual court judgments, aligned at the sentence level, and analyse the bilingual context profiles to extract corresponding legal terms in both languages. Our method differs from those in past studies in that it does not require any prior knowledge source and extends naturally to multi-word terms in either language. A pilot test was done with a sample of ten legal terms, each with ten or more occurrences in the data. Encouraging results of about 75% average accuracy were obtained. This figure reflects not only the effectiveness of the method for bilingual lexicon acquisition, but also its potential for bilingual alignment at the word or expression level.

Using Statistical Correction Rule to Improve Word Alignment (통계적 수정규칙을 이용한 한국어-중국어 단어정렬 개선방법)

  • Jin, Chang-Hu;Li, Jin-Ji;Na, Hwidong;Kim, Dong-Il;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2009.10a / pp.231-236 / 2009
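  • This paper proposes a method for improving the word alignment results of a phrase-based Korean-Chinese statistical machine translation (PBSMT) system using statistically extracted correction rules. The correction rules were extracted statistically by comparing word alignment results with human-made reference alignments. Using these rules, we correct errors that occur on Korean function words in the word alignment output of the Korean-Chinese statistical machine translation system, thereby improving the word alignment results and, ultimately, machine translation performance.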

Word Alignment Using Chinese-Korean Linguistic Contrastive Information (중-한 대조분석정보를 이용한 단어정렬)

  • Li, Jin-Ji;Kim, Dong-Il;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2002.10e / pp.40-46 / 2002
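  • This paper proposes a word alignment method that can also be applied to general-purpose parallel corpora. Parallel corpora aligned at the word level help in various areas of natural language processing, for example building transfer patterns for transfer-based machine translation, automatic extraction of MWTUs (Multi Word Translation Units), dictionary construction, and word sense disambiguation. Because word alignment of a Chinese-Korean parallel corpus involves identifying relations between different language families, this paper presents a method for raising recall and precision that relies on existing resources, such as a Chinese-Korean bilingual dictionary, monolingual thesauri, part-of-speech information, and linguistic contrastive analysis information, rather than on a statistical model. In an evaluation on 500 aligned sentence pairs randomly extracted from the JoongAng Ilbo, the method achieved a precision of 82.2% and a recall of 64.8%.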

Pivot Discrimination Approach for Paraphrase Extraction from Bilingual Corpus (이중 언어 기반 패러프레이즈 추출을 위한 피봇 차별화 방법)

  • Park, Esther;Lee, Hyoung-Gyu;Kim, Min-Jeong;Rim, Hae-Chang
    • Korean Journal of Cognitive Science / v.22 no.1 / pp.57-78 / 2011
  • Paraphrasing is the act of writing a text using other words without altering the meaning. Paraphrases can be used in many fields of natural language processing. In particular, paraphrases can be incorporated into machine translation to improve the coverage and quality of translation. Recent approaches to paraphrase extraction use bilingual parallel corpora, which consist of aligned sentence pairs. In these approaches, paraphrases are identified from the word alignment result by pivot phrases, which are phrases in one language to which two or more phrases in the other language are connected. However, word alignment is itself a very difficult task, so there can be many alignment errors. Moreover, alignment errors can lead to the selection of incorrect pivot phrases. In this study, we propose a paraphrase extraction method that discriminates good pivot phrases from bad ones. Each pivot phrase is weighted according to its reliability, which is scored using lexical and part-of-speech information. Experimental results show that the proposed method achieves higher paraphrase extraction precision and recall than the baseline. We also show that the extracted paraphrases can increase the coverage of Korean-English machine translation.
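The pivot idea described in this abstract can be sketched in a few lines: phrases that link to the same pivot phrase become paraphrase candidates, and each pivot contributes a weight to the candidate's score. This is only an illustration under assumptions. The paper scores pivot reliability from lexical and part-of-speech information, whereas here the weights are simply passed in, and the names `extract_paraphrases`, `pivot_weight`, and `threshold` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def extract_paraphrases(phrase_pairs, pivot_weight=None, threshold=0.0):
    """Score paraphrase candidates that share a pivot phrase.

    phrase_pairs: (phrase, pivot) links taken from a word alignment result.
    pivot_weight: optional dict mapping a pivot to an assumed reliability
        weight; pivots missing from the dict get weight 0. With None, every
        pivot counts 1.0 (an unweighted baseline).
    Returns {(phrase_a, phrase_b): score} for scores above the threshold.
    """
    by_pivot = defaultdict(set)
    for phrase, pivot in phrase_pairs:
        by_pivot[pivot].add(phrase)

    scores = defaultdict(float)
    for pivot, phrases in by_pivot.items():
        weight = 1.0 if pivot_weight is None else pivot_weight.get(pivot, 0.0)
        # Every pair of phrases linked to the same pivot is a candidate.
        for a, b in combinations(sorted(phrases), 2):
            scores[(a, b)] += weight
    return {pair: s for pair, s in scores.items() if s > threshold}


# Toy links with placeholder phrases; "Y" stands for an unreliable pivot.
links = [("a", "X"), ("b", "X"), ("a", "Y"), ("c", "Y"), ("b", "Z")]
unweighted = extract_paraphrases(links)
weighted = extract_paraphrases(links, {"X": 0.9, "Y": 0.1, "Z": 1.0}, threshold=0.5)
```

In the unweighted run both ("a", "b") and ("a", "c") are returned, while the weighted run keeps only ("a", "b") because the pivot "Y" has a low weight. The pivot "Z" yields no candidate since only one phrase links to it.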

A Scalable Word-based RSA Cryptoprocessor with PCI Interface Using Pseudo Carry Look-ahead Adder (가상 캐리 예측 덧셈기와 PCI 인터페이스를 갖는 분할형 워드 기반 RSA 암호 칩의 설계)

  • Gwon, Taek-Won;Choe, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.8 / pp.34-41 / 2002
  • This paper describes a scalable implementation of a word-based RSA cryptoprocessor using a pseudo carry look-ahead adder. The modular multiplier consists of two layers of carry-save adders (CSA) and a reduced carry generation and propagation scheme, called the pseudo carry look-ahead adder, for high-speed final addition. The proposed modular multiplier does not need complicated shift and alignment blocks to generate the next word at each clock cycle. The proposed architecture therefore reduces hardware resources and speeds up modular computation. We implemented a single-chip 1024-bit RSA cryptoprocessor based on the word-based modular multiplier with 256 datapaths in 0.5 $\mu$m SOG technology after verifying the proposed architectures using an FPGA with a PCI bus.
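The carry-save adder mentioned in this abstract compresses three operands into a sum word and a carry word without propagating carries across bit positions, so only the final addition needs a carry-propagating adder. The bit-level sketch below models that general principle on Python integers. How the paper wires its two CSA layers is not stated in the abstract, so `reduce_four` is an assumed arrangement, and the pseudo carry look-ahead adder itself is not modeled (Python's `+` stands in for the final addition).

```python
def carry_save_add(a, b, c):
    """3:2 compression of non-negative integers: a + b + c == s + carry."""
    partial_sum = a ^ b ^ c                      # per-bit sum, carries ignored
    carry = ((a & b) | (b & c) | (a & c)) << 1   # per-bit majority, shifted left
    return partial_sum, carry


def reduce_four(a, b, c, d):
    """Two chained CSA layers reduce four operands to two (assumed wiring)."""
    s1, c1 = carry_save_add(a, b, c)
    return carry_save_add(s1, c1, d)


s, carry = carry_save_add(13, 7, 9)
s2, carry2 = reduce_four(13, 7, 9, 100)
# Only these final additions propagate carries across bit positions.
total_three = s + carry
total_four = s2 + carry2
```

Each CSA layer has a delay that does not depend on the operand width, which is why designs of this kind defer carry propagation to a single fast adder at the end.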

Korean isolated word recognizer using new time alignment method of speech signal (새로운 시간축 정규화 방법을 이용한 한국어 고립단어 인식기)

  • Nam, Myeong-U;Park, Gyu-Hong;No, Seung-Yong
    • Journal of the Institute of Electronics Engineers of Korea SP / v.38 no.5 / pp.567-575 / 2001
  • This paper proposes a new method for obtaining fixed-size parameters from voice signals of different lengths. The performance of a speech recognizer depends on how the similarity (the distance between patterns) of parameters extracted from voice signals is compared. However, variation in the voice signal and differences in speaking rate make it difficult to extract fixed-size parameters. The proposed method represents the parameters as a spectrogram and then normalizes them to a fixed size using the two-dimensional DCT (Discrete Cosine Transform). To validate the method, parameters extracted from a 32-channel auditory filter bank (which estimates auditory nerve firing probabilities) are processed by the 2D DCT and used as input to a neural network. For comparison, we also used a conventional method that solves the time alignment problem. The results show better performance and faster recognition than the conventional method in both speaker-dependent and speaker-independent isolated word recognition.
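One way to read the normalization step in this abstract is to apply a 2D DCT to the spectrogram and keep only a fixed block of low-order coefficients, so that utterances with different numbers of frames yield feature matrices of the same size. The sketch below follows that reading under stated assumptions: the 1/N scaling, the number of retained coefficients, and the names `dct_ii`, `fixed_size_features`, `k_time`, and `k_freq` are illustrative choices, not details taken from the paper.

```python
import math


def dct_ii(x, k_max):
    """First k_max DCT-II coefficients of x, scaled by 1/len(x) (assumed scaling).

    Requires k_max <= len(x).
    """
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)) / n
        for k in range(k_max)
    ]


def fixed_size_features(spectrogram, k_time, k_freq):
    """Reduce a (frames x bands) spectrogram to a k_time x k_freq matrix."""
    # DCT along the frequency axis within each frame.
    rows = [dct_ii(frame, k_freq) for frame in spectrogram]
    # DCT along the time axis for each retained frequency coefficient.
    cols = [dct_ii([row[f] for row in rows], k_time) for f in range(k_freq)]
    # Transpose so the result is indexed as [time coefficient][freq coefficient].
    return [[cols[f][t] for f in range(k_freq)] for t in range(k_time)]


# Two toy "utterances" of different durations over 32 bands.
short_utt = [[1.0] * 32 for _ in range(40)]
long_utt = [[1.0] * 32 for _ in range(95)]
feat_short = fixed_size_features(short_utt, k_time=8, k_freq=8)
feat_long = fixed_size_features(long_utt, k_time=8, k_freq=8)
```

Both calls return an 8 x 8 matrix regardless of the number of frames. With the 1/N scaling, a constant spectrogram gives the same leading coefficient at any duration, which keeps features from utterances of different lengths on a comparable scale.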
