Proceedings of the Korean Information Science Society Conference (한국정보과학회 학술대회논문집)
- Proceedings of the Korean Information Science Society 2004 Fall Conference, Vol. 31, No. 2 (1)
- Pages 787-789
- 2004
- 1598-5164 (pISSN)
웹 번역문서 판별과 병렬 말뭉치 구축
Judging Translated Web Document & Constructing Bilingual Corpus
- Jee-hyung, Kim (Dept. of Computer Science, Yonsei University) ;
- Yill-byung, Lee (Dept. of Computer Science, Yonsei University)
- Published: 2004.10.01
Abstract
People searching for information on the internet frequently need a general search tool that is free of language barriers. Such a tool requires a multilingual parallel corpus, so that a search keyword can be matched with its corresponding word in another language. A multilingual parallel corpus can be built and reused effectively through several processes: judging whether web documents are translations, sentence alignment, and word alignment. To build the corpus, a multilingual dictionary should be constructed for each language and the HTML should be simplified. Then, based on the meaning and statistics of the document structure, translated web documents are identified and the retrieved web pages are aligned at the sentence level.
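The abstract does not say how HTML is simplified or how document structure is compared, so the following is only a minimal sketch under stated assumptions. It reduces each page to a sequence of structural tags and scores two pages by how similar those sequences are. The tag set, the function names, and the 0.8 threshold are all hypothetical and are not taken from the paper; the similarity measure is Python's `difflib.SequenceMatcher`, used here as a stand-in for whatever statistics the authors applied.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Hypothetical set of structural tags; the paper does not list which tags it keeps.
STRUCTURAL_TAGS = {"h1", "h2", "h3", "p", "ul", "ol", "li", "table", "tr", "td"}


class TagSequenceParser(HTMLParser):
    """Collect the ordered list of structural opening tags in a page."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in STRUCTURAL_TAGS:
            self.tags.append(tag)


def simplify_html(html):
    """Reduce an HTML string to its sequence of structural tags."""
    parser = TagSequenceParser()
    parser.feed(html)
    parser.close()
    return parser.tags


def structure_similarity(html_a, html_b):
    """Return a score in [0, 1] comparing the two pages' tag sequences."""
    tags_a, tags_b = simplify_html(html_a), simplify_html(html_b)
    if not tags_a or not tags_b:
        # Pages with no structural tags give no evidence either way.
        return 0.0
    return SequenceMatcher(None, tags_a, tags_b).ratio()


def looks_like_translation(html_a, html_b, threshold=0.8):
    """Flag a page pair as a translation candidate (threshold is illustrative)."""
    return structure_similarity(html_a, html_b) >= threshold
```

A candidate pair flagged this way would still need the dictionary-based checks and the sentence alignment step described in the abstract; structural similarity alone cannot confirm that one page translates another.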
Keywords