Journal of the Korean Data and Information Science Society
- Volume 19 Issue 4
- Pages 1297-1304
- 2008
- 1598-9402 (pISSN)
Language-Independent Sentence Boundary Detection with Automatic Feature Selection
Abstract
This paper proposes a machine learning approach to language-independent sentence boundary detection. The proposed method requires neither heuristic rules nor language-specific features such as part-of-speech information or lists of abbreviations and proper names. Using only language-independent features, we run experiments on two languages with fairly different characteristics: an inflectional language (English) and an agglutinative language (Korean). The method achieves good performance in both languages. We also evaluate the method under a wide range of experimental conditions, focusing in particular on the selection of useful features.
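As an illustration of what "language-independent features" could look like in practice, the following minimal Python sketch extracts surface features around each candidate boundary character. The abstract does not list the features actually used, so everything here is an assumption made for illustration: the candidate set `CANDIDATES`, the helper names `char_class` and `extract_features`, the window size, and the feature names are all hypothetical. The point is only that such features need no part-of-speech tags, abbreviation lists, or proper-name lists, and that the same code runs unchanged on English and Korean text.

```python
# Hypothetical sketch: surface features for sentence boundary candidates.
# None of these names or feature choices come from the paper itself.

CANDIDATES = {".", "?", "!"}  # assumed set of possible sentence-final characters


def boundary_candidates(text):
    """Return the index of every character that could end a sentence."""
    return [i for i, ch in enumerate(text) if ch in CANDIDATES]


def char_class(ch):
    """Map a character to a coarse class using only Unicode properties."""
    if ch == "":
        return "none"      # position falls outside the text
    if ch.isspace():
        return "space"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        if ch.isupper():
            return "upper"
        if ch.islower():
            return "lower"
        return "letter"    # caseless scripts, e.g. Hangul syllables
    return "other"


def extract_features(text, i, window=2):
    """Build a feature dict for the candidate at index i."""
    feats = {}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        ch = text[j] if 0 <= j < len(text) else ""
        feats[f"class[{off}]"] = char_class(ch)
    # Length of the whitespace-delimited token that precedes the candidate.
    start = i
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    feats["prev_token_len"] = i - start
    return feats


if __name__ == "__main__":
    for sample in ("Dr. Kim left. He came back.", "갔다. 왔다."):
        for idx in boundary_candidates(sample):
            print(idx, extract_features(sample, idx))
```

Feature dictionaries of this kind could be passed to any standard classifier, and a feature selection step could then decide which of them to keep. Which learner and which selection procedure the paper uses is not stated in the abstract.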