• Title/Summary/Keyword: Unicode characters


Problems with Chinese Ideographs Search in Unicode and Solutions to Them (유니코드 한자 검색의 문제점 및 개선방안)

  • Lee, Jeong-hyeon
    • Informatization Policy / v.19 no.3 / pp.50-63 / 2012
  • This paper analyzes how Chinese ideograph search is handled in Koreanology-related domestic databases, domestic library databases, domestic academic databases, and overseas library databases, with a view to identifying problems and suggesting solutions. The major obstacles to Chinese ideograph search in Unicode are classified as 'multicode characters', 'simplified characters', and 'variant characters', and three characters are chosen as samples to describe current practice. Thirteen Koreanology-related databases, five domestic library databases, five domestic academic databases, and two overseas library databases are analyzed in terms of Chinese ideograph search. To support search for multicode characters, the open-source resources of the Unicode Consortium must be applied. To improve search for simplified and variant characters, a matching table must be standardized and proposed to the Unicode Consortium.


An Anti-Forensic Technique for Hiding Data in NTFS Index Record with a Unicode Transformation (유니코드 변환이 적용된 NTFS 인덱스 레코드에 데이터를 숨기기 위한 안티포렌식 기법)

  • Cho, Gyu-Sang
    • Convergence Security Journal / v.15 no.7 / pp.75-84 / 2015
  • In the "NTFS Index Record Data Hiding" method, messages are hidden in file names. The Windows NTFS file naming convention forbids certain ASCII characters in a file name. When Hangul with the Roman alphabet or binary data is used as input and contains characters forbidden in file names, those codes are converted to designated Unicode code points to avoid a file creation error. This paper thereby fixes the problem of file creation errors caused by inadmissible file name characters in the index record data hiding method. For Hangul with the Roman alphabet, the characters that cause a file creation error are converted to arbitrary Unicode code points outside the Hangul and Roman alphabet areas. For binary data, all 256 codes are converted to a designated Unicode area outside the extended Unicode (surrogate pair) and ASCII code areas. The results of the two cases, i.e. the Hangul with Roman alphabet case and the binary case, show the applicability of the proposed method.
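
The byte-to-code-point conversion described in this abstract can be sketched roughly as follows. This is only an illustration, not the paper's implementation: the abstract does not name the designated Unicode area, so the Private Use Area base `PUA_BASE` and the helper names `bytes_to_name` and `name_to_bytes` are assumptions made here.

```python
# Hypothetical target range: the Private Use Area (U+E000..U+F8FF).
# The abstract does not state which Unicode area the paper actually uses.
PUA_BASE = 0xE000

# Characters that Windows rejects in file names, plus control codes 0..31.
FORBIDDEN = set('<>:"/\\|?*') | {chr(c) for c in range(32)}


def bytes_to_name(data: bytes) -> str:
    """Map each byte value 0..255 to the code point PUA_BASE + value."""
    return "".join(chr(PUA_BASE + b) for b in data)


def name_to_bytes(name: str) -> bytes:
    """Invert bytes_to_name; reject characters outside the mapped range."""
    out = bytearray()
    for ch in name:
        offset = ord(ch) - PUA_BASE
        if not 0 <= offset <= 0xFF:
            raise ValueError(f"unexpected character U+{ord(ch):04X}")
        out.append(offset)
    return bytes(out)


# A payload containing bytes that would be illegal as raw file name characters.
payload = b'a/b:c*\x00\xff'
hidden = bytes_to_name(payload)
recovered = name_to_bytes(hidden)
```

Because every byte lands outside the ASCII range, none of the forbidden file name characters can appear in the resulting string, and the mapping stays reversible.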

Development of a Font Processing System for GSM Mobile Phone (GSM 핸드폰을 위한 폰트 처리 시스템의 설계 및 구현)

  • Lee, Sang-Bum;Lee, Yong-Hun
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.3 / pp.951-957 / 2010
  • In this paper, we propose a font development system that can handle various fonts efficiently on GSM mobile terminals. ASCII was widely used to represent characters on early computers, but it can represent only a limited number of characters. Unicode was later developed to cover more characters, and research on code systems that express characters more efficiently is still ongoing. Attempts to apply Unicode to mobile terminals have not worked efficiently because there are too many characters across the various languages. To address this, we designed and developed a font system that shortens the processing time and effort needed to apply Unicode to mobile terminals. Our system saves processing time and effort because it reduces meaningless processing compared to other systems.

New Text Steganography Technique Based on Part-of-Speech Tagging and Format-Preserving Encryption

  • Mohammed Abdul Majeed;Rossilawati Sulaiman;Zarina Shukur
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.1 / pp.170-191 / 2024
  • The transmission of confidential data using cover media is called steganography. The three requirements of any effective steganography system are high embedding capacity, security, and imperceptibility. The structure of a text file, which makes syntax and grammar more visually obvious than in other media, contributes to its poor imperceptibility. Text is regarded as the most challenging carrier in which to hide secret data because it contains little redundant data compared to other digital objects. Unicode characters, especially non-printing or invisible ones, are employed for hiding data by mapping a specific number of secret data bits to each character and inserting the character into the spaces of the cover text. These characters are known to offer limited space for embedding secret data. Current studies that use Unicode characters in text steganography have focused on increasing the data hiding capacity despite the insufficient redundant data in a text file. A sequential embedding pattern is often selected and applied to all available positions in the cover text. This embedding pattern negatively affects the imperceptibility and security of the text steganography system. Thus, this study attempts to overcome these limitations using the part-of-speech (POS) tagging technique combined with the randomization concept in data hiding. Combining these two techniques allows the Unicode characters to be inserted in randomized patterns at specific positions in the cover text, increasing data hiding capacity with minimal effects on imperceptibility and security. Format-preserving encryption (FPE) is also used to encrypt the secret message without changing its size before the embedding process. A comparison of the proposed technique with existing ones demonstrates that it fulfils the capacity, imperceptibility, and security requirements of the cover file.
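
For orientation, the baseline that this abstract criticizes (invisible Unicode characters inserted sequentially at cover-text spaces) might look like the sketch below. It is not the paper's POS-tagging or FPE scheme. The choice of four zero-width characters carrying 2 bits each, and the names `embed` and `extract`, are assumptions made for illustration.

```python
# Four invisible characters, each standing for one 2-bit symbol (assumed set).
ZERO_WIDTH = ["\u200b", "\u200c", "\u200d", "\u2060"]


def to_symbols(secret: bytes):
    """Split each byte into four 2-bit symbols, most significant first."""
    for byte in secret:
        for shift in (6, 4, 2, 0):
            yield (byte >> shift) & 0b11


def embed(cover: str, secret: bytes) -> str:
    """Insert one invisible character after each space, in sequential order."""
    symbols = list(to_symbols(secret))
    if cover.count(" ") < len(symbols):
        raise ValueError("cover text has too few spaces")
    out = []
    i = 0
    for ch in cover:
        out.append(ch)
        if ch == " " and i < len(symbols):
            out.append(ZERO_WIDTH[symbols[i]])
            i += 1
    return "".join(out)


def extract(stego: str) -> bytes:
    """Collect the invisible characters and reassemble them into bytes."""
    symbols = [ZERO_WIDTH.index(ch) for ch in stego if ch in ZERO_WIDTH]
    out = bytearray()
    for j in range(0, len(symbols) - len(symbols) % 4, 4):
        a, b, c, d = symbols[j:j + 4]
        out.append(a << 6 | b << 4 | c << 2 | d)
    return bytes(out)


cover = "the quick brown fox jumps over the lazy dog"
stego = embed(cover, b"Hi")
```

The predictable one-per-space pattern is what makes this baseline easy to detect, which is the weakness the randomized placement in the abstract is meant to address.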

Support on Ideograph Characters Search of Unicode Based Information System (정보 시스템의 유니코드 기반 한자 검색 지원)

  • Yoon, So-Young
    • Journal of the Korean Society for information Management / v.24 no.4 / pp.375-391 / 2007
  • The Unicode Han ideograph character set differs from our principle of phonetic value ordering in that it follows the KangXi radical-stroke ordering of characters. Therefore, an information system should support ideograph search based on a precise analysis of materials that consist of Korean characters (Hangul) and ideographs (Hanja). The History Information System has been maintaining a Hanja (Chinese character) to Hangul dictionary, a terminology dictionary for composition, borrowing, and non-ideographic principles, a variant forms dictionary, and a list of recently discovered Chinese characters.

Hangul Encoding Standard based on Unicode (유니코드의 한글 인코딩 표준안)

  • Ahn, Dae-Hyuk;Park, Young-Bae
    • Journal of KIISE:Software and Applications / v.34 no.12 / pp.1083-1092 / 2007
  • In Unicode, two types of Hangul encoding schemes are currently in use, namely, the "precomposed modern Hangul syllables" model and the "conjoining Hangul characters" model. The current Unicode Hangul conjoining rules allow a precomposed Hangul syllable to be a member of a syllable which includes conjoining Hangul characters; this has resulted in a number of different Hangul encoding implementations. This unfortunate problem stems from an incomplete understanding of the Hangul writing system when the normalization and encoding schemes were originally designed. In particular, the extended use of old Hangul was not taken into consideration. As a result, there are different ways to represent Hangul syllables, and this causes problems in the processing of Hangul text, for instance in searching, comparison, and sorting functions. In this paper, we discuss the problems with the normalization of current Hangul encodings, and suggest a single efficient rule to correctly process the Hangul encoding in Unicode.
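
The two encoding models named in this abstract can be seen with Python's standard `unicodedata` module. The sketch below shows only standard Unicode normalization (NFC and NFD) on one modern syllable; it is not the rule the paper proposes, and the sample syllable is chosen here for illustration.

```python
import unicodedata

precomposed = "\uD55C"             # the syllable HAN as one precomposed code point
conjoining = "\u1112\u1161\u11AB"  # the same syllable as three conjoining jamo
mixed = "\uD558\u11AB"             # precomposed HA followed by a conjoining final

# The three strings render alike but differ as code point sequences,
# so a naive comparison or search treats them as different text.
naive_equal = precomposed == conjoining

# NFC composes to the precomposed form; NFD decomposes to conjoining jamo.
nfc_forms = {unicodedata.normalize("NFC", s) for s in (precomposed, conjoining, mixed)}
nfd_forms = {unicodedata.normalize("NFD", s) for s in (precomposed, conjoining, mixed)}
```

Normalizing both the index and the query to the same form before comparison is one common way to make searching and sorting consistent for modern syllables.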

Consideration of CJK Joint Hanja Unicode when is used in AMI/HDB-3 Line Coding (AMI/HDB-3 회선부호화와 한·중·일 한자 유니코드 체계 고찰)

  • Tai, Dong-Zhen;Hong, Wan Pyo
    • The Journal of the Korea institute of electronic communication sciences / v.8 no.7 / pp.1011-1015 / 2013
  • This paper analyses the rate at which the CJK joint Chinese character Unicode codes violate the source coding rule. The 150 Chinese characters in Unicode with a relatively high frequency of use were chosen for the study. These 150 characters account for about 50% of the total frequency of use of Chinese characters. The study applied AMI/HDB-3 line coding/scrambling and the HDLC protocol. According to the analysis, 77 of the 150 characters violate the rule, with a frequency of use of 29%. Therefore, when the 77 violating characters are replaced with character codes that conform to the source coding rule, the processing rate of the line coder can be improved by about 37%.

A study on Code System of Latin Character to Improve Transmission Efficiency in Data Communications (데이터통신 전송효율과 라틴어 부호 체계 고찰)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences / v.7 no.4 / pp.761-776 / 2012
  • This paper proposes a revised Roman character code system using Unicode 3.0. The question behind the paper is whether the Latin character code system used worldwide in Unicode 3.0 is appropriate from the standpoint of transmission efficiency in data communications. In data communications, when 4 or 8 consecutive "0" bits from an information device enter the line coder, those consecutive "0" bits are scrambled into predetermined bit patterns to avoid loss of synchronization. The paper is based on statistical data on the frequency of use of alphabet letters and on the character coding rule proposed in [1]. It focuses on improving both Unicode itself and the UTF-8 code system. As a result, when the proposed coding systems for Latin characters are applied to Unicode 3.0 itself and to the UTF-8 code system, the scrambler efficiency using HDB-3 in the line coder of a data transmission system could be improved by about 3645 ~ 31400% and 480 ~ 1700%, respectively.
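
To make the zero-run condition in this abstract concrete, the sketch below counts how many runs of four consecutive "0" bits (the run length at which HDB-3 substitutes a pattern) occur in a byte stream such as UTF-8 text. The function name `count_zero_runs` and the most-significant-bit-first transmission order are assumptions made here; the paper's own efficiency metric is not reproduced.

```python
def count_zero_runs(data: bytes, run_length: int = 4) -> int:
    """Count non-overlapping runs of `run_length` consecutive 0 bits.

    Bits are read most significant bit first (an assumption of this sketch).
    Each completed run corresponds to one substitution by the line coder.
    """
    count = 0
    zeros = 0
    for byte in data:
        for shift in range(7, -1, -1):
            if (byte >> shift) & 1:
                zeros = 0
            else:
                zeros += 1
                if zeros == run_length:
                    count += 1
                    zeros = 0
    return count


# 'A' is 0x41 (01000001) and contains one run of four zeros;
# 'e' is 0x65 (01100101) and contains none.
runs_upper_a = count_zero_runs("A".encode("utf-8"))
runs_lower_e = count_zero_runs("e".encode("utf-8"))
```

Under this view, a code assignment that gives frequent letters bit patterns without long zero runs would trigger fewer substitutions, which is the kind of effect the abstract's efficiency figures refer to.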

A study on Mapping the Unicode based Hangul-Hanja for prescription names in Korean Medicine (처방명 연계를 위한 유니코드 한자 기반의 한글-한자 매핑정보 구축에 관한 연구)

  • Jeon, Byoung-Uk;Kim, An-Na;Kim, Ji-Young;Oh, Yong-Taek;Kim, Chul;Song, Mi-Young;Jang, Hyun-Chul
    • Korean Journal of Oriental Medicine / v.18 no.3 / pp.133-139 / 2012
  • Objective: UMLS is an ontology that establishes a database of medical terminology by gathering various medical vocabularies that represent the same fundamental concepts. Method: Although Chinese characters are represented in the Chinese part of the Korean Unicode system in a computer, the way Chinese characters are written varies depending on Chinese input systems and writers' levels of knowledge. As a result, the representation of Chinese writing in a computer can differ considerably from that in an old Chinese document. Therefore, a meaningful relationship between digital Chinese terminology and its Korean translation is necessary in order to build an ontology for Chinese medical terms from Oriental medical prescriptions in a computer system. Result: This research will present 1:1 mapping information among the Chinese characters used in Oriental medical prescriptions, with an analysis of 'same character, different sound' and 'same meaning, different shape' in the Chinese part of Unicode systems. Conclusions: Furthermore, the research will provide a top-down menu of relationships between Chinese and Korean terms in medical prescriptions, under the assumption that each Oriental medical prescription has its own unique meaning.

Study on the Prerequisite Chinese Characters for Education of Traditional Korean Medicine (한의학 입문을 위한 필수한자 추출 및 분석연구)

  • Chae, Han;Hwang, Sang-Moon;Kwon, Young-Kyu;Baik, Yu-Sang;Shin, Sang-Woo;Yang, Gi-Young;Lee, Byung-Ryul;Kim, Jae-Kyu;Lee, Byung-Wook
    • Journal of Physiology & Pathology in Korean Medicine / v.24 no.3 / pp.373-379 / 2010
  • There has been a need to establish an operational curriculum for the Chinese characters and Chinese writing used in traditional Korean medicine (TKM), but this need has not been carefully recognized so far. We analysed the frequency of Unicode Chinese characters in five medical textbooks and identified prerequisite Chinese characters for TKM beginners. It was found that 之, 者, 不, 也, 而, 氣, 陽, 陰, 下, 其, 病, 爲, 人, 以, 中, 則, 於, 脈, 上, 故 are the 20 most frequently used Chinese characters. We also showed that adequate prerequisite Chinese characters should be designated for more efficient education in TKM. This study was the first systematic approach to obtaining essential and prerequisite Chinese characters for TKM education. The prerequisite characters identified by this study will be used for the development of KEET (Korean Medicine Education Eligibility Test), entrance exam to the Colleges of Oriental Medicine and textbooks, and educational curriculum of premed students.
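
A frequency count of the kind this abstract describes could be sketched as follows. The restriction to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), the function name `ideograph_frequencies`, and the short sample string are assumptions made for illustration; the abstract does not say which code point ranges or tooling the study used.

```python
from collections import Counter


def ideograph_frequencies(text: str) -> Counter:
    """Count characters in the basic CJK Unified Ideographs block only.

    Extension blocks and compatibility ideographs are ignored in this sketch.
    """
    return Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")


# Short sample string used only to exercise the function.
sample = "陰陽者，天地之道也，萬物之綱紀"
freq = ideograph_frequencies(sample)
top = freq.most_common(1)
```

Running the same count over full textbook texts and taking `most_common(20)` would yield a ranked list comparable in form to the one reported above.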