Scrambling Occurrence Frequency in HDB-3 in UTF-8 Coding of UNICODE Hangul Jamo

Unicode 한글낱자의 UTF-8 부호화에 따른 HDB-3 스크램블링 발생빈도

  • Hong, Wan-Pyo (Department of Information and Telecommunication Engineering, Hansei University)
  • 홍완표 (한세대학교 정보통신공학과)
  • Received : 2015.03.15
  • Accepted : 2015.04.10
  • Published : 2015.04.30

Abstract

This paper examines how often scrambling occurs in a line coder when the Hangul Jamo and Compatibility Hangul Jamo codes of Unicode, the international character coding system, are converted to UTF-8 for transmission over a communication network. The scrambling method considered is HDB-3, which is applied to AMI line coding and is the standard transmission method of the ITU and of Korea. The source coding rule was applied to analyze the scrambling that occurs in each Jamo code. The amount of scrambling was quantified in two ways: the number of scrambling occurrences in each Jamo code, and the frequency rate obtained by weighting those occurrences by the usage frequency of each Jamo. For Hangul Jamo, scrambling occurred 24 times (frequency rate 52%) in Unicode and 148 times (228%) in UTF-8. For Compatibility Hangul Jamo, it occurred 10 times (14%) in Unicode and 83 times (131%) in UTF-8. Thus, converting Hangul Jamo and Compatibility Hangul Jamo from Unicode to UTF-8 increased the scrambling frequency rate by 340% and 851%, respectively.
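The comparison described in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: that one scrambling event corresponds to one HDB-3 substitution, that is, one non-overlapping run of four consecutive zero bits; that each character is examined in isolation; and that the "Unicode" form is the 16-bit code point value. The function names `hdb3_substitutions`, `to_bits`, and `compare` are hypothetical, and the paper's source coding rule and usage-frequency weighting are not modelled.

```python
from __future__ import annotations


def hdb3_substitutions(bits: str) -> int:
    """Count non-overlapping runs of four consecutive '0' bits, scanning left to right.

    Assumption: each such run triggers exactly one HDB-3 substitution
    (000V or B00V), which is counted here as one scrambling event.
    """
    count = 0
    zeros = 0
    for bit in bits:
        if bit == "0":
            zeros += 1
            if zeros == 4:
                count += 1
                zeros = 0  # the substituted run is consumed; start counting afresh
        else:
            zeros = 0
    return count


def to_bits(data: bytes) -> str:
    """Render a byte string as a string of '0'/'1' characters, MSB first."""
    return "".join(f"{byte:08b}" for byte in data)


def compare(ch: str) -> tuple[int, int]:
    """Return (events in the 16-bit code point, events in the UTF-8 bytes).

    Assumption: the character lies in the Basic Multilingual Plane, so its
    code point fits in two bytes (true for U+1100..U+11FF and U+3130..U+318F).
    """
    code_point_bits = to_bits(ord(ch).to_bytes(2, "big"))
    utf8_bits = to_bits(ch.encode("utf-8"))
    return hdb3_substitutions(code_point_bits), hdb3_substitutions(utf8_bits)


if __name__ == "__main__":
    for ch in ("\u1100", "\u3131"):  # HANGUL CHOSEONG KIYEOK, HANGUL LETTER KIYEOK
        print(f"U+{ord(ch):04X}", compare(ch))
```

Under these assumptions, U+1100 (bytes E1 84 80 in UTF-8) yields two events as a 16-bit code point and three as UTF-8, while U+3131 (bytes E3 84 B1) yields zero and one. This matches the direction of the reported increase, but it does not reproduce the paper's totals, which also depend on its counting rule and on the usage frequency of each Jamo.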
