• Title/Summary/Keyword: Unicode characters

A Study of MeSH Compatibility between Korea and China (한국과 중국의 MeSH 호환성 연구)

  • Kwon, Young-Kyu;Lee, Byung-Wook
    • Journal of the Korean Institute of Oriental Medical Informatics / v.11 no.2 / pp.65-82 / 2005
  • The findings of this study are summarized as follows:
    1. Hangul 2004 contains 16,023 Chinese character codes. Of these, 15,231 can be searched in the database; the remaining codes cannot be searched.
    2. Of those 15,231 codes, 2,471 are converted into 2,232 Simplified Chinese character codes by the Traditional/Simplified Chinese character conversion program in Hangul 2004.
    3. The 5th edition of TCM-MeSH contains 6,385 thesaurus terms and 2,142 distinct Chinese characters.
    4. Searching TCM-MeSH with the Simplified Chinese characters of Hangul 2004 retrieves 94.3% of TCM-MeSH. Searching with the Traditional Chinese characters of Hangul 2004 retrieves only 34.2%.
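
Item 4 reports TCM-MeSH coverage as a percentage. The Python sketch below shows one way such a figure could be computed. It assumes that a thesaurus term counts as retrievable only when every one of its characters is in the searcher's character repertoire; the abstract does not state this rule. `term_coverage` and the toy data are hypothetical and are not taken from the study.

```python
def term_coverage(terms, repertoire):
    """Share of `terms` whose characters all belong to `repertoire`.

    Hypothetical metric: a term is counted as retrievable only if every
    character in it can be entered with the given character set.
    """
    if not terms:
        return 0.0
    available = set(repertoire)
    found = sum(1 for term in terms if set(term) <= available)
    return found / len(terms)


# Toy data (not from the study): two terms written in simplified form.
terms = ["气虚", "血瘀"]
simplified = "气虚血瘀"    # simplified forms of the characters
traditional = "氣虛血瘀"   # traditional forms; 血 and 瘀 are shared
print(term_coverage(terms, simplified))   # 1.0
print(term_coverage(terms, traditional))  # 0.5
```

A term written in one script and searched with the other script's forms is missed whenever even one of its characters differs. That is the effect item 4 quantifies at thesaurus scale.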

Source Coding Rule of Characters to Minimize HDB-3 Scrambling in Line Coder for UTF-8 code (UTF-8 부호의 HDB-3스크램블링 최소화를 위한 문자의 원천부호화 규칙)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.9 / pp.1019-1026 / 2015
  • This paper studies a source coding rule for characters that minimizes HDB-3 scrambling for UTF-8 code. The existing source coding rule for minimizing HDB-3 scrambling in the line coder applies to source codes that enter the line coder directly, without any transformation. It therefore cannot be applied when UTF-8 code is fed into the line coder, because the codes that trigger scrambling in the source code set are not the same as those in UTF-8. Under the existing rule, analyzing where scrambling occurs in UTF-8 and building scrambling-free UTF-8 codes would take three steps: build a UTF-8 code table for the source codes, identify the codes that cause scrambling, and then encode scrambling-free source codes. The source coding rule for UTF-8 code presented in this paper avoids this complicated procedure.
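
HDB-3 replaces every group of four consecutive zero bits with a substitution pattern. Whether a character triggers scrambling therefore depends on the exact bytes that reach the line coder. The following Python sketch counts those substitutions for a few characters under two encodings. It assumes that bytes are sent most-significant bit first and that zero runs continue across byte boundaries. `hdb3_substitutions` is a hypothetical helper, not the paper's coding rule.

```python
def hdb3_substitutions(data: bytes) -> int:
    """Count the four-zero groups an HDB-3 line coder would substitute.

    Assumes bytes are sent MSB first and that a run of zeros continues
    across byte boundaries; a run of n zeros yields n // 4 substitutions.
    """
    count = 0
    run = 0
    for byte in data:
        for shift in range(7, -1, -1):
            if (byte >> shift) & 1:
                run = 0
            else:
                run += 1
                if run == 4:  # one complete group of four zeros
                    count += 1
                    run = 0   # the next group starts counting afresh
    return count


# Compare the raw 16-bit code point (UTF-16BE) with the UTF-8 bytes.
for ch in ["A", "\u00e9", "\ud55c"]:  # A, é, 한
    for encoding in ("utf-16-be", "utf-8"):
        data = ch.encode(encoding)
        print(f"U+{ord(ch):04X}", encoding, data.hex(), hdb3_substitutions(data))
```

Under these assumptions, "A" causes three substitutions as the 16-bit code point 0x0041 but only one as the UTF-8 byte 0x41. The same character can therefore behave differently depending on which form reaches the line coder.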

A Text Processing Method for Devanagari Scripts in Android (안드로이드에서 힌디어 텍스트 처리 방법)

  • Kim, Jae-Hyeok;Maeng, Seung-Ryol
    • The Journal of the Korea Contents Association / v.11 no.12 / pp.560-569 / 2011
  • In this paper, we propose a text processing method for Hindi characters (Devanagari script) on Android. The method has two key parts. The first is to devise automata that define the rules for combining letters into syllables. The second is to implement a font rendering engine that retrieves and displays the glyph images corresponding to specific characters. In general, an automaton depends on the type and number of characters. For the soft keyboard, we designed automata with 14 consonants and 34 vowels based on Unicode. A combined syllable is then converted into a glyph index through a mapping table, and that index is used as a handle to load the glyph image. Following the multilingual framework of the FreeType font engine, Devanagari script can be supported at the system level by adding our method to the font engine as a Hindi module. The proposed method is verified with a simple messaging system.
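
The syllable-combining step can be illustrated with a small automaton over Unicode Devanagari code points. This Python sketch is a simplification, not the paper's 14-consonant, 34-vowel automaton. It attaches combining marks (vowel signs, nukta, virama, anusvara and similar, identified by Unicode category `Mn` or `Mc`) to the preceding cluster, and it joins a consonant that follows a virama into a conjunct. The glyph-index mapping table is out of scope, and `syllables` and `is_consonant` are hypothetical names.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def is_consonant(ch: str) -> bool:
    """Devanagari consonants KA..HA plus the precomposed nukta forms."""
    cp = ord(ch)
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def syllables(text: str) -> list[str]:
    """Group Devanagari text into orthographic syllables (aksharas)."""
    clusters: list[str] = []
    for ch in text:
        attach = False
        if clusters:
            if unicodedata.category(ch) in ("Mn", "Mc"):
                # Dependent vowel sign, nukta, virama, anusvara, visarga, ...
                attach = True
            elif is_consonant(ch) and clusters[-1].endswith(VIRAMA):
                # Consonant + virama + consonant forms a conjunct.
                attach = True
        if attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


clusters = syllables("\u0928\u092E\u0938\u094D\u0924\u0947")  # नमस्ते
print(len(clusters), [len(c) for c in clusters])  # 3 [1, 1, 4]
```

For example, "नमस्ते" splits into न, म and the conjunct cluster स्ते. A renderer would then look up each cluster in its glyph mapping table.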

A Study on Data Sharing Codes Definition of Chinese in CAI Application Programs (CAI 응용프로그램 작성시 자료공유를 위한 한자 코드 체계 정의에 관한 연구)

  • Kho, Dae-Ghon
    • Journal of The Korean Association of Information Education / v.2 no.2 / pp.162-173 / 1998
  • Writing a CAI program that contains Chinese characters requires a common Chinese character code so that information can be shared for educational purposes. Such a code needs to meet three requirements:
    - allow both reading (phonetic) order and stroke order;
    - represent the simplified Chinese and Japanese forms of the characters;
    - provide a conversion process for exchanging data among different Chinese character code sets.
Reading order is expected to waste code space, because heteronyms (characters with more than one reading) are treated as different characters. Stroke order, by contrast, prevents duplicate code generation and facilitates data recovery, although it does not follow the phonetic rule. We argue that the first- and second-level Chinese character code areas should be expanded as far as academia and industry have demanded. We also argue that, because of its interoperability, extensibility and expressiveness, Unicode can serve as a temporary code system for education.

Design and Implementation of Conversion System Between ISO/IEC 10646 and Multi-Byte Code Set (ISO/IEC 10646과 멀티바이트 코드 세트간의 변환시스템의 설계 및 구현)

  • Kim, Chul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.4 / pp.319-324 / 2018
  • In this paper, we designed and implemented a code conversion method between ISO/IEC 10646 and multi-byte code sets. The Universal Multiple-Octet Coded Character Set (UCS) provides codes for more than 65,000 characters, a huge increase over ASCII's capacity of 128 characters. It is applicable to the representation, transmission, interchange, processing, storage, input and presentation of the written forms of the world's languages. Customers therefore need guidance on code conversion methods in two situations: when their systems migrate to an environment that uses the UCS, and when the current code systems (ASCII PC code and EBCDIC host code) are used together with the UCS. To explain the conversion algorithm and its implementation, we present a code conversion utility that includes the mapping table between the UCS and the new IBM host code. The programs ran successfully in real system environments. They can therefore be delivered to customers during migration from the UCS to the current IBM code system and vice versa.
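
Table-driven conversion of the kind the abstract describes can be sketched in Python. Here Python's bundled codecs stand in for the paper's mapping tables: `euc_kr` represents a multi-byte code set based on KS X 1001. `cp037` is one EBCDIC code page, used only as an example; it is not the IBM host code the paper targets. The helpers `build_table`, `ucs_to_mbcs` and `mbcs_to_ucs` are hypothetical.

```python
def build_table(chars, encoding):
    """Map each character the target code set can represent to its bytes."""
    table = {}
    for ch in chars:
        try:
            table[ch] = ch.encode(encoding)
        except UnicodeEncodeError:
            pass  # no code for this character in the target set
    return table


def ucs_to_mbcs(text, table, substitute=b"?"):
    """UCS string -> multi-byte code via the table; unmapped chars substituted."""
    return b"".join(table.get(ch, substitute) for ch in text)


def mbcs_to_ucs(data, table):
    """Multi-byte code -> UCS by greedy longest match on the inverted table."""
    reverse = {code: ch for ch, code in table.items()}
    max_len = max(map(len, reverse), default=1)
    out, i = [], 0
    while i < len(data):
        for n in range(max_len, 0, -1):
            chunk = data[i:i + n]
            if chunk in reverse:
                out.append(reverse[chunk])
                i += n
                break
        else:
            out.append("\ufffd")  # byte with no mapping: replacement character
            i += 1
    return "".join(out)


text = "Unicode 한글"
kr_table = build_table(set(text), "euc_kr")
kr_bytes = ucs_to_mbcs(text, kr_table)
print(kr_bytes.hex(), mbcs_to_ucs(kr_bytes, kr_table) == text)

# cp037 has no Hangul, so those characters fall back to the EBCDIC question mark.
ebcdic_table = build_table(set(text), "cp037")
print(ucs_to_mbcs(text, ebcdic_table, "?".encode("cp037")).hex())
```

EUC-KR double-byte codes use only bytes 0xA1 and above, so they never collide with single-byte ASCII codes. Greedy longest matching on the inverted table therefore decodes unambiguously. A production converter would load the full published mapping table instead of building one per text.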

Distance Measures in HMM Clustering for Large-scale On-line Chinese Character Recognition (대용량 온라인 한자 인식을 위한 클러스터링 거리계산 척도)

  • Kim, Kwang-Seob;Ha, Jin-Young
    • Journal of KIISE:Software and Applications / v.36 no.9 / pp.683-690 / 2009
  • A major obstacle to building a good large-scale on-line Chinese character recognition system with HMMs is the growth in recognition time. In this paper, we propose a clustering method that addresses recognition speed, together with an efficient distance measure between HMMs. Experiments used the 20,902 Chinese characters defined in the Unicode CJK Unified Ideographs. Recognition was about twice as fast, with a top-10 candidate recognition accuracy of 95.37%, a decrease of only 0.9%.
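
The figure of 20,902 characters matches the original extent of the CJK Unified Ideographs block, U+4E00 through U+9FA5. Later Unicode versions appended further code points to the block and added extension blocks. A minimal Python check of that range follows; the constants and helper name are hypothetical.

```python
# Original CJK Unified Ideographs range; later Unicode versions extend past U+9FA5.
URO_FIRST, URO_LAST = 0x4E00, 0x9FA5


def in_original_cjk_block(ch: str) -> bool:
    """True if `ch` lies in the original U+4E00..U+9FA5 range."""
    return URO_FIRST <= ord(ch) <= URO_LAST


# One recognition class per code point in that range (an assumption).
classes = [chr(cp) for cp in range(URO_FIRST, URO_LAST + 1)]
print(len(classes))  # 20902
```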

Huffman Code Design and PSIP Structure of Hangul Data for Digital Broadcasting (디지털 방송용 한글 허프만 부호 설계 및 PSIP 구조)

  • 황재정;진경식;한학수;최준영;이진환
    • Journal of Broadcast Engineering / v.6 no.1 / pp.98-107 / 2001
  • In this paper, we derive an optimal Huffman code set with escape coding that maximizes coding efficiency for Hangul text data. Hangul text can be represented in the standard Wansung or Unicode format, and a set of Huffman codes can be generated for both. The current Korean DT standard does not define a Hangul compression algorithm, which may lead to a serious data rate problem for digital data broadcasting systems. The optimal Huffman code set is generated to solve this data transmission problem. A PSIP structure suited to the DTB standard is also proposed. As a result, characters with a probability below 0.0043 are escape-coded, which gives the optimum compression efficiency of 46%.
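
Escape coding of this kind can be sketched in Python. Characters whose relative frequency falls below a threshold (0.0043 in the abstract) share a single escape codeword, which is followed by a fixed-length code for the actual character. Several details are assumptions for illustration: the 16-bit escape payload (the character's code point, so Basic Multilingual Plane characters only), the helper names and the toy sample. The paper's actual code tables and payload format are not given here.

```python
import heapq
from collections import Counter
from itertools import count

ESC = None  # a single symbol that stands for every rare character


def build_codes(text: str, threshold: float = 0.0043) -> dict:
    """Huffman codes for frequent characters plus one escape symbol.

    Characters whose relative frequency is below `threshold` are merged
    into ESC. ESC is always kept so unseen characters can be escaped too.
    """
    weights = {ESC: 0}
    for ch, n in Counter(text).items():
        key = ch if n / len(text) >= threshold else ESC
        weights[key] = weights.get(key, 0) + n
    tie = count()  # tie-breaker so heap entries never compare dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # only ESC is left: give it a one-bit code
        return {ESC: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def encode(text: str, codes: dict) -> str:
    """Huffman bits; an escaped character adds its 16-bit code point."""
    return "".join(
        codes[ch] if ch in codes else codes[ESC] + format(ord(ch), "016b")
        for ch in text
    )


def decode(bits: str, codes: dict) -> str:
    """Walk the prefix-free code; after ESC, read a 16-bit code point."""
    reverse = {c: s for s, c in codes.items()}
    out, buf, i = [], "", 0
    while i < len(bits):
        buf += bits[i]
        i += 1
        if buf in reverse:
            sym = reverse[buf]
            if sym is ESC:
                out.append(chr(int(bits[i:i + 16], 2)))
                i += 16
            else:
                out.append(sym)
            buf = ""
    return "".join(out)


sample = "aaaaabbbc"
codes = build_codes(sample, threshold=0.2)  # "c" falls below 0.2 and is escaped
bits = encode(sample, codes)
print(len(bits), decode(bits, codes) == sample)  # 29 True
```

Raising the threshold shrinks the Huffman table but makes more characters pay for the longer escape sequence. The abstract reports that its chosen threshold gives the best compression for its Hangul data.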

Study on the ASCII Code in the side of the Transmission Efficiency in Data Communications (데이터통신 전송효율과 ASCII 부호체계 고찰)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.5 / pp.657-664 / 2011
  • This paper proposes a revised ASCII code. The study began by asking whether the ASCII code is appropriate from the standpoint of transmission efficiency in data communications. When consecutive "0" bits from an information device enter the line coder, they are scrambled into predetermined patterns so that a long run of "0" signals is not transmitted. The study used statistical data on the frequency of alphabet letters together with the character coding rule proposed in a cited reference. When the proposed ASCII code is applied, the operating efficiency of the scrambler in the line coder improves by an average of up to 30%.
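
One way to see which ASCII characters would trigger the scrambler on their own is to look for runs of four or more zero bits in their 8-bit codes. This Python sketch makes three assumptions: an HDB-3-style line coder that scrambles runs of four zeros, 8-bit transmission with a leading 0 bit sent MSB first, and no runs that span adjacent characters. The abstract does not state the four-zero threshold, and `max_zero_run` is a hypothetical helper.

```python
import string


def max_zero_run(code: int, width: int = 8) -> int:
    """Longest run of consecutive 0 bits in `code`, written MSB first."""
    longest = run = 0
    for bit in format(code, f"0{width}b"):
        run = run + 1 if bit == "0" else 0
        longest = max(longest, run)
    return longest


# Letters, digits and space whose 8-bit ASCII code alone holds >= 4 zeros in a row.
candidates = string.ascii_letters + string.digits + " "
triggers = [ch for ch in candidates if max_zero_run(ord(ch)) >= 4]
print(triggers)  # ['a', 'p', 'A', 'B', 'C', 'P', '0', ' ']
```

Under these assumptions, the space character and common letters such as "a" already contain four-zero runs. A frequency-based revision of the code table would target exactly this kind of mismatch between code assignment and letter frequency.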

A Study of the framework of search patterns for Hangul characters and its relationship with Hangeul code for Hangeul Character based Index (한글 글자 단위 인덱스를 위한 검색 유형 정의 및 한글 부호계와의 연관성에 관한 연구)

  • Lee, Jung-Hwa;Lee, Jong-Min;Kim, Seong-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.6 / pp.1083-1088 / 2007
  • In this paper, we investigate the search patterns that apply to character-based word search and develop a corresponding search algorithm. We applied this algorithm to two Hangeul coded character sets, the KS X 1001 Hangeul coded set and Unicode 3.0. In each case, we studied how the efficiency of the algorithm depends on the Hangeul coded set.
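
Unicode arranges its 11,172 precomposed Hangul syllables (U+AC00 to U+D7A3) algorithmically. A character-based index can therefore recover each syllable's jamo by arithmetic, S = 0xAC00 + (L × 21 + V) × 28 + T. KS X 1001, by contrast, lists its 2,350 syllables in a table and needs a lookup. The Python sketch below applies this decomposition to one plausible search pattern: matching words by their initial consonants. The abstract does not list the paper's search patterns, so this pattern and the function names are illustrative assumptions.

```python
# Hangul Compatibility Jamo used to report the decomposed parts.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 medial vowels
JONGSEONG = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]  # none + 27 finals

S_BASE, V_COUNT, T_COUNT = 0xAC00, 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed syllables


def decompose(ch):
    """Return (initial, medial, final) jamo, or None for non-syllables."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return None
    l_index, rest = divmod(index, V_COUNT * T_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    return CHOSEONG[l_index], JUNGSEONG[v_index], JONGSEONG[t_index]


def initials(word):
    """Initial consonant of each syllable; other characters pass through."""
    out = []
    for ch in word:
        parts = decompose(ch)
        out.append(parts[0] if parts else ch)
    return "".join(out)


def match_initials(words, query):
    """Hypothetical search pattern: words whose initials contain `query`."""
    return [w for w in words if query in initials(w)]


result = match_initials(["한글", "한국", "글자"], "ㅎㄱ")  # ["한글", "한국"]
```

The same search over KS X 1001 text would first need a table lookup, or a conversion to Unicode, to obtain each syllable's jamo. This is one concrete way the choice of coded set affects the cost of a character-based index.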