Identification of Chinese Personal Names in Unrestricted Texts

  • Cheung, Lawrence (Language Information Sciences Research Ctr) ;
  • Tsou, Benjamin K. (Language Information Sciences Research Ctr) ;
  • Sun, Mao-Song (National AI Lab., Tsinghua University)
  • Published : 2002.02.01

Abstract

Automatic identification of Chinese personal names in unrestricted texts is a key task in Chinese word segmentation and, if not properly addressed, can affect other NLP tasks such as information retrieval. This paper (1) demonstrates the problems of Chinese personal name identification in some IT applications, (2) analyzes the structure of Chinese personal names, and (3) presents the relevant processing strategies. The geographical differences in Chinese personal names between Beijing and Hong Kong are highlighted at the end. The comparison shows that variation in names across different Chinese communities is a critical factor in designing Chinese personal name identification algorithms.

Keywords