• Title/Summary/Keyword: Unicode


Improvement plan for 'Newly found ideographs(新出漢字)' in the digitalizing business of the old Korean documents (고전 자료 디지털화사업에서의 신출한자 처리 개선방안)

  • Lee, Jeong-Hwa
    • Korean Journal of Oriental Medicine
    • /
    • v.10 no.1
    • /
    • pp.1-14
    • /
    • 2004
  • As Korea enters the information age of the 21st century, many government-level projects are digitizing sources of Korean scholarship, building on the country's advanced digital technologies and distributing the results over internet networks. 'Newly found ideographs (新出漢字)' are characters identified and extracted from old Chinese-character documents during digitization that are not yet registered in the existing international standard, i.e., the Unicode CJK Unified Ideographs block and its extensions. Korea is currently computerizing old documents on a large scale, while the international standardization of the Chinese characters used across Asian countries proceeds through the IRG. It is therefore important for Korea to extract newly found ideographs precisely while building its databases, organize them as candidate international standard codes, submit them to the international organization, and ultimately have them registered in the standard.
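The screening step described above can be illustrated with a short sketch: checking whether each character of a digitized text already falls inside the registered CJK Unified Ideograph blocks. The block ranges below are from the Unicode standard; treating everything outside them as a submission candidate is a simplification of the workflow the paper describes.

```python
# Registered CJK Unified Ideograph block ranges (Unicode standard).
# A real pipeline would also consult later extension blocks and
# compatibility ideographs.
CJK_BLOCKS = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # Extension A
    (0x20000, 0x2A6DF),  # Extension B
    (0x2A700, 0x2B73F),  # Extension C
    (0x2B740, 0x2B81F),  # Extension D
]

def is_registered_ideograph(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_BLOCKS)

def candidate_new_ideographs(text):
    """Characters that would need IRG submission if they are truly ideographs."""
    return [ch for ch in text if not is_registered_ideograph(ch) and ch.isprintable()]

print(is_registered_ideograph("漢"))  # True: U+6F22 is in the base block
```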


Study on the ASCII Code in the side of the Transmission Efficiency in Data Communications (데이터통신 전송효율과 ASCII 부호체계 고찰)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.6 no.5
    • /
    • pp.657-664
    • /
    • 2011
  • This paper proposes a revised ASCII code. The study began by asking whether the ASCII code is appropriate from the standpoint of transmission efficiency in data communications. In data communications, when consecutive "0" bits from information devices enter the line coder, they are scrambled into predetermined patterns so that long runs of "0" signals do not occur. The study used statistical data on the frequency of the letters of the alphabet together with the character-coding rule proposed in the references. As a result, when the proposed ASCII code is applied, the operating efficiency of the scrambler in the line coder improves by an average of up to 30%.
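The scrambler's workload depends on runs of consecutive "0" bits in the transmitted codewords. A minimal sketch of the measurement behind such a study, counting the longest zero run in an ASCII bit stream, might look like this (the helper names are illustrative, not from the paper):

```python
def bitstream(data: bytes) -> str:
    """Render a byte string as its concatenated 8-bit binary codewords."""
    return "".join(f"{b:08b}" for b in data)

def longest_zero_run(bits: str) -> int:
    """Length of the longest run of consecutive '0' bits."""
    return max((len(run) for run in bits.split("1")), default=0)

# Two ASCII spaces (0x20 = 00100000) produce a run of seven zeros
# across the byte boundary -- exactly the pattern a scrambler must break up.
print(longest_zero_run(bitstream(b"  ")))
```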

A Study on the Language Independent Dictionary Creation Using International Phoneticizing Engine Technology (국제 음소 기술에 의한 언어에 독립적인 발음사전 생성에 관한 연구)

  • Shin, Chwa-Cheul;Woo, In-Sung;Kang, Heung-Soon;Hwang, In-Soo;Kim, Suk-Dong
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.1E
    • /
    • pp.1-7
    • /
    • 2007
  • One result of the trend towards globalization is an increased number of projects that focus on natural language processing. Automatic speech recognition (ASR) technologies, for example, hold great promise in facilitating global communications and collaborations. Unfortunately, to date, most research projects focus on single widely spoken languages. Therefore, the cost to adapt a particular ASR tool for use with other languages is often prohibitive. This work takes a more general approach. We propose an International Phoneticizing Engine (IPE) that interprets input files supplied in our Phonetic Language Identity (PLI) format to build a dictionary. IPE is language independent and rule based. It operates by decomposing the dictionary creation process into a set of well-defined steps. These steps reduce rule conflicts, allow for rule creation by people without linguistics training, and optimize run-time efficiency. Dictionaries created by the IPE can be used with the Sphinx speech recognition system. IPE defines an easy-to-use systematic approach that can lead to internationalization of automatic speech recognition systems.
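As a rough illustration of the rule-based, stepwise dictionary creation the abstract describes, the toy sketch below applies longest-match grapheme-to-phoneme rules to build a pronunciation dictionary. The rule table and phone names are invented for illustration; the real IPE consumes input files in the PLI format, which this does not reproduce.

```python
# Hypothetical grapheme-to-phoneme rules; the longest grapheme that
# matches at the current position wins.
RULES = {
    "ch": "CH", "sh": "SH", "a": "AA", "e": "EH", "i": "IY",
    "o": "OW", "u": "UW", "t": "T", "n": "N", "s": "S",
}

def phoneticize(word: str) -> list:
    phones, i = [], 0
    while i < len(word):
        for length in (2, 1):               # prefer the longer match
            chunk = word[i:i + length]
            if chunk in RULES:
                phones.append(RULES[chunk])
                i += length
                break
        else:
            i += 1                          # skip characters no rule covers
    return phones

def build_dictionary(words):
    return {w: " ".join(phoneticize(w)) for w in words}

print(build_dictionary(["chat", "shine"]))
```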

A Study of the framework of search patterns for Hangul characters and its relationship with Hangout code for Hangeul Character based Index (한글 글자 단위 인덱스를 위한 검색 유형 정의 및 한글 부호계와의 연관성에 관한 연구)

  • Lee, Jung-Hwa;Lee, Jong-Min;Kim, Seong-Woo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.6
    • /
    • pp.1083-1088
    • /
    • 2007
  • In this paper, we investigate the search patterns that apply to character-based word search and construct the search algorithm. We used various Hangeul coded character sets, namely the KS X 1001 Hangeul coded set and Unicode 3.0, for the character-based word search algorithm. In each case, we study the efficiency of the algorithms in relation to the Hangeul coded set.
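Character-based indexing of Hangul relies on the arithmetic structure of the precomposed syllable block in Unicode. A minimal sketch of decomposing a syllable into its jamo indices, using the formula defined in the Unicode standard (syllable = 0xAC00 + (L·21 + V)·28 + T):

```python
# Hangul syllable arithmetic from the Unicode standard.
S_BASE, L_COUNT, V_COUNT, T_COUNT = 0xAC00, 19, 21, 28

def decompose(syllable: str):
    """Return (leading, vowel, trailing) jamo indices of a precomposed syllable."""
    s = ord(syllable) - S_BASE
    if not 0 <= s < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    leading = s // (V_COUNT * T_COUNT)
    vowel = (s % (V_COUNT * T_COUNT)) // T_COUNT
    trailing = s % T_COUNT
    return leading, vowel, trailing

print(decompose("한"))  # (18, 0, 4): ㅎ + ㅏ + ㄴ
```

A jamo-level index built this way lets a search engine match at the letter level even though the text is stored as whole syllables.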

A Character Shape Encoding Method to Input Chinese Characters in Old Documents (고문헌 벽자(僻字) 입력을 위한 한자 자형 부호화 방법)

  • Kim, Kiwang
    • Journal of Korean Medical classics
    • /
    • v.32 no.1
    • /
    • pp.105-116
    • /
    • 2019
  • Objectives : Ancient classical literature contains many rare Chinese characters, so-called Byeokja (僻字), and characters not registered in Unicode, as well as variant (heterogeneous) characters that cannot be found in current font sets, frequently appear. In order to register all possible Chinese characters, including such characters, as units of information exchange, this study proposes a method to encode the morphological information of Chinese characters according to fixed rules. Methods : This study suggests methods to encode the connections between the nodes constituting a Chinese character together with the coordinates of those nodes. In addition, rules for expressing curve information, character aspect ratios, minimizing coordinate lines, and expressing the aggregation status of character components are added. Results : Through the proposed method, codes of a consistent length can be generated by extracting only the information that expresses the morphological configuration of a character. Conclusions : The character encoding method proposed in this study can be used to distinguish Byeokja variants with small differences, new Chinese characters, and character strokes, and to store and search them.
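A drastically simplified sketch of the encoding idea, nodes as grid coordinates plus their connections serialized into a code string, is shown below. The serialization format here is hypothetical and omits the paper's rules for curves, aspect ratio, and component grouping.

```python
# Hypothetical shape code: node coordinates on a 0-9 grid, then the
# edges (node-index pairs) that connect them.
def encode_shape(nodes, edges):
    node_part = "".join(f"{x}{y}" for x, y in nodes)          # coordinate list
    edge_part = "".join(f"{a}{b}" for a, b in sorted(edges))  # node connections
    return f"N{node_part}E{edge_part}"

# "十" (a cross): the four stroke endpoints plus the crossing point,
# with every endpoint connected to the centre node (index 4).
nodes = [(4, 0), (4, 8), (0, 4), (8, 4), (4, 4)]
edges = [(0, 4), (1, 4), (2, 4), (3, 4)]
print(encode_shape(nodes, edges))
```

Two glyphs whose skeletons differ only slightly would produce codes differing in only a few positions, which is what makes variant-character comparison and search possible.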

Source Coding Rule of Characters to Minimize HDB-3 Scrambling in Line Coder for UTF-8 code (UTF-8 부호의 HDB-3스크램블링 최소화를 위한 문자의 원천부호화 규칙)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.10 no.9
    • /
    • pp.1019-1026
    • /
    • 2015
  • This paper studies a source coding rule for characters that minimizes HDB-3 scrambling for UTF-8 code. The existing source coding rule for minimizing HDB-3 scrambling in the line coder applies to source codes that enter the line coder directly, without any transformation. It therefore cannot be applied when UTF-8 code is fed into the line coder, because the scrambling-prone patterns in the source codes are not the same as those in the UTF-8 codes. Thus, to analyze where scrambling occurs in UTF-8 and produce an unscrambled UTF-8 code, one would have to build a UTF-8 code table for the source codes, find the codes that trigger scrambling, and then encode an unscrambled source code. The source coding rule for UTF-8 presented in this paper eliminates this complicated procedure.
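HDB-3 substitutes each block of four consecutive zero bits with a violation pattern, so the cost a character code imposes on the line coder can be estimated by counting such blocks in its UTF-8 byte stream. A minimal sketch, assuming this simple block-of-four counting model rather than the paper's full rule:

```python
def utf8_bits(text: str) -> str:
    """UTF-8 encode the text and render it as a bit string."""
    return "".join(f"{b:08b}" for b in text.encode("utf-8"))

def hdb3_substitutions(bits: str) -> int:
    """Count blocks of four consecutive zeros, each of which HDB-3 replaces."""
    count = run = 0
    for bit in bits:
        run = run + 1 if bit == "0" else 0
        if run == 4:
            count += 1
            run = 0
    return count

print(hdb3_substitutions(utf8_bits("A")))  # 'A' = 01000001: one block of four zeros
```

Comparing these counts across candidate code assignments is the kind of analysis a source coding rule for UTF-8 aims to make unnecessary.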

A Text Processing Method for Devanagari Scripts in Andriod (안드로이드에서 힌디어 텍스트 처리 방법)

  • Kim, Jae-Hyeok;Maeng, Seung-Ryol
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.12
    • /
    • pp.560-569
    • /
    • 2011
  • In this paper, we propose a text processing method for Hindi characters (Devanagari script) on Android. The key points of the text processing are to devise automata that define the rules for combining alphabet characters into syllables, and to implement a font rendering engine that retrieves and displays the glyph images corresponding to specific characters. In general, an automaton depends on the type and number of characters. For the soft keyboard, we designed the automata with 34 consonants and 14 vowels based on Unicode. Finally, a combined syllable is converted into a glyph index using a mapping table, which is used as a handle to load its glyph image. Within the multilingual framework of the FreeType font engine, Devanagari script can be supported at the system level by adding our implementation to the font engine as a Hindi module. The proposed method is verified through a simple message system.
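A toy version of such a combining automaton can be sketched with the Devanagari block's code-point ranges: a dependent vowel sign is accepted only after a consonant. The ranges below are from the Unicode Devanagari block (U+0900..U+097F); real input handling also covers virama conjuncts, nukta, and modifiers, which this sketch omits.

```python
def is_consonant(ch: str) -> bool:
    return "\u0915" <= ch <= "\u0939"          # क..ह

def is_vowel_sign(ch: str) -> bool:
    return "\u093E" <= ch <= "\u094C"          # dependent vowel signs ा..ौ

def valid_syllable(seq: str) -> bool:
    """Accept a lone consonant/independent vowel, or consonant + vowel sign."""
    if len(seq) == 1:
        return is_consonant(seq) or "\u0904" <= seq <= "\u0914"
    return len(seq) == 2 and is_consonant(seq[0]) and is_vowel_sign(seq[1])

print(valid_syllable("\u0915\u093F"))  # क + ि : consonant + vowel sign -> True
```

Once a sequence is accepted, a mapping table from the combined syllable to a glyph index would take over, as the abstract describes.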

Inclusion of the Traditional Korean Medical Terms into the UMLS (한의학 용어의 UMLS 등재 - KIOM 용어정제연구 중 경혈명(經穴名)을 중심으로-)

  • Kim, Jin-hyun;Kim, Sang-kyun;Jang, Hyunchul;Kim, Minah;Oh, Yong-taek;Bae, Sunhee;Kim, Changseok;Jeon, Byounguk;Kim, Jae-Hun;Song, Mi-young
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2011.05a
    • /
    • pp.185-186
    • /
    • 2011
  • The purpose of this study is to register Traditional Korean Medicine terms in the Unified Medical Language System (UMLS) of the U.S. National Library of Medicine (NLM). As the first registration targets, the Concept Unique Identifiers (CUIs) of 360 acupoint (經穴) names belonging to the Conception (任) and Governing (督) Vessels and the twelve regular meridians (正經) were selected from the results of the terminology refinement study in progress at the Korea Institute of Oriental Medicine (KIOM). Terminology information on acupoint terms within the UMLS was collected and analyzed through Metathesaurus searches on the UMLS Knowledge Source Server (UMLSKS). Based on this, KIOM's acupoint terms were mapped to the UMLS acupoint terms through concept-level comparison. Finally, the data were stored in Rich Release Format (RRF), encoded in Unicode Transformation Format-8 (UTF-8), and sent to the NLM. After the NLM assessed the suitability of the terms for registration, Traditional Korean Medicine terms in Korean were registered in the UMLS for the first time, in the 2010AB version under the source name "TKMT2010". Continued research is needed so that Traditional Korean Medicine terminology can secure interoperability with various medical terminology systems, and be standardized and globalized, through linkage with international standard medical terminologies such as the UMLS.


A Study on Data Sharing Codes Definition of Chinese in CAI Application Programs (CAI 응용프로그램 작성시 자료공유를 위한 한자 코드 체계 정의에 관한 연구)

  • Kho, Dae-Ghon
    • Journal of The Korean Association of Information Education
    • /
    • v.2 no.2
    • /
    • pp.162-173
    • /
    • 1998
  • Writing a CAI program containing Chinese characters requires a common Chinese character code so that information can be shared for educational purposes. A Chinese character code set needs to allow mixed use of both phonetic (reading) order and stroke order, to represent the simplified Chinese and Japanese forms of Chinese characters, and to provide a conversion process for data exchange among different Chinese code sets. When phonetic order is used, waste in the code area is expected because heteronyms are treated as different characters. Using stroke order, on the other hand, facilitates data recovery by preventing duplicate code generation, though it does not follow the phonetic rule. We claim that the first- and second-level Chinese code areas need to be expanded as much as academic and industrial circles have demanded. We also assert that Unicode can serve as an interim educational code system owing to its interoperability, expandability, and the expressive range of its character sets.


Design and Implementation of Conversion System Between ISO/IEC 10646 and Multi-Byte Code Set (ISO/IEC 10646과 멀티바이트 코드 세트간의 변환시스템의 설계 및 구현)

  • Kim, Chul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.4
    • /
    • pp.319-324
    • /
    • 2018
  • In this paper, we designed and implemented a code conversion method between ISO/IEC 10646 and multi-byte code sets. The Universal Multiple-Octet Coded Character Set (UCS) provides codes for more than 65,000 characters, a huge increase over ASCII's capacity of 128 characters. It is applicable to the representation, transmission, interchange, processing, storage, input, and presentation of the written forms of languages throughout the world. It is therefore important to guide customers on code conversion methods while their systems are migrated to an environment that uses the UCS code system, or in which the current code systems, i.e., ASCII PC code and EBCDIC host code, are used together with the UCS. A code conversion utility, including the mapping table between the UCS and the IBM new host code, is presented to explain the code conversion algorithm and its implementation in the system. The programs execute successfully in real system environments and can therefore be delivered to customers during migration from the UCS to the current IBM code system and vice versa.
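Table-driven conversion of this kind can be sketched in a few lines. The example below uses Python's built-in cp037 codec (EBCDIC, US/Canada) as a stand-in for the IBM host code page; the paper's actual mapping table targets the IBM new host code, which this does not reproduce.

```python
# Round-trip conversion between UCS (Unicode) and an EBCDIC host code page,
# using cp037 as an illustrative stand-in for the IBM host code.
EBCDIC = "cp037"

def ucs_to_host(text: str) -> bytes:
    return text.encode(EBCDIC)

def host_to_ucs(data: bytes) -> str:
    return data.decode(EBCDIC)

host = ucs_to_host("HELLO")
print(host.hex())            # EBCDIC bytes differ from the ASCII encoding
print(host_to_ucs(host))     # round-trips back to the original string
```

A migration utility of the kind described would wrap such per-character mappings with validation and handling for characters absent from the target code set.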