History Document Image Background Noise and Removal Methods

  • Ganchimeg, Ganbold (School of E-Open Institute of The Mongolian University of Science and Technology)
  • Received : 2015.02.14
  • Reviewed : 2015.04.19
  • Published : 2015.12.30

Abstract

Archive libraries commonly provide public access to collections of historical and ancient document images. Such images often require specialized processing to remove background noise and improve legibility. Document images may be contaminated with noise during transmission, scanning, or conversion to digital form. Noise can be categorized by its features, and searching a document image for similar patterns helps in choosing an appropriate removal method. In this paper, we propose a hybrid binarization approach that combines global and local thresholding to improve the quality of old documents. The article also reviews the types of noise that may appear in scanned document images and discusses several noise removal methods.
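
The abstract does not say how the global and local thresholds are combined, so the following is only an illustrative sketch of one possible hybrid scheme, not the authors' method. It assumes an Otsu-style global threshold decides pixels that are clearly dark or clearly light, and a Niblack-style local threshold (window mean plus k times the window standard deviation) decides pixels whose gray level lies close to the global threshold. The function names and the parameters `margin`, `radius`, and `k` are hypothetical choices made for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, 0 (black) .. 255 (white)


def otsu_threshold(img: Image) -> int:
    """Global threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]            # pixels with value <= t
        if w_dark == 0:
            continue
        w_light = total - w_dark     # pixels with value > t
        if w_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def local_threshold(img: Image, y: int, x: int,
                    radius: int = 1, k: float = -0.2) -> float:
    """Niblack-style threshold: window mean + k * window std deviation."""
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - radius), min(h, y + radius + 1))
            for i in range(max(0, x - radius), min(w, x + radius + 1))]
    mean = sum(vals) / len(vals)
    std = (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5
    return mean + k * std


def hybrid_binarize(img: Image, margin: int = 30,
                    radius: int = 1, k: float = -0.2) -> Image:
    """Assumed combination: global decision for clear-cut pixels,
    local decision for pixels within `margin` of the global threshold."""
    t = otsu_threshold(img)
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, v in enumerate(row):
            if abs(v - t) > margin:
                limit = t                                   # global decision
            else:
                limit = local_threshold(img, y, x, radius, k)  # local decision
            out_row.append(0 if v <= limit else 255)
        out.append(out_row)
    return out


if __name__ == "__main__":
    # Tiny synthetic page: light background (200) with a dark stroke (20).
    page = [[200, 200, 200, 200],
            [200, 20, 20, 200],
            [200, 200, 200, 200]]
    for line in hybrid_binarize(page):
        print(line)
```

In this sketch the global pass keeps the cost low on clean regions, while the local pass only runs where the global threshold is ambiguous, which is where stains and uneven illumination tend to cause errors. A practical implementation would likely use a larger window and an image library rather than nested lists.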
