• Title/Summary/Keyword: Unicode characters

Study on the Chinese Character Use in Acupuncture & Moxibustion Textbook (침구학 교재에서의 한자사용 분석연구)

  • Chae, Han;Hwang, Sang-Moon;Lee, Byung-Wook;Yang, Gi-Young;Lee, Byung-Ryul;Kim, Jae-Kyu
    • Journal of Acupuncture Research / v.27 no.4 / pp.187-194 / 2010
  • Objectives: There has been a need to establish an operational curriculum for the Chinese characters and Chinese writing used in traditional Korean medicine (TKM), but this need has not been thoroughly recognized so far. Methods: We analysed the usage of Unicode Chinese characters in an acupuncture and moxibustion textbook to identify the prerequisite Chinese characters for TKM studies from a clinical perspective. Results: The 20 most frequently used Chinese characters were 穴, 經, 鍼, 法, 寸, 部, 分, 刺, 下, 上, 中, 位, 氣, 陽, 灸, 脈, 陰, 治, 足, 主. We also showed that an adequate set of prerequisite Chinese characters should be designated for more efficient TKM education. Conclusions: This study was the first systematic approach to obtaining essential and prerequisite Chinese characters for TKM education, especially for acupuncture and moxibustion. The prerequisite characters identified in this study will be used for developing the KEET (Korean Medicine Education Eligibility Test), the entrance exam to the Colleges of Oriental Medicine, textbooks, and the educational curriculum for premed students.
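The kind of character-frequency count described in this abstract can be sketched in a few lines. This is only an illustration, not the authors' actual procedure: the sample sentence, the function names, and the choice to count only the CJK Unified Ideographs block and Extension A are assumptions made here.

```python
from collections import Counter


def is_cjk_ideograph(ch: str) -> bool:
    """True for code points in CJK Unified Ideographs or Extension A (assumed scope)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def top_characters(text: str, n: int = 20) -> list[tuple[str, int]]:
    """Count only Chinese characters and return the n most frequent ones."""
    counts = Counter(ch for ch in text if is_cjk_ideograph(ch))
    return counts.most_common(n)


# Hypothetical mixed Hangul/Hanja sample, not taken from the textbook.
sample = "足陽明胃經의 穴은 經脈을 따라 分布한다. 鍼法과 灸法은 經穴에 施術한다."
print(top_characters(sample, 3))
```

Hangul, punctuation and spaces are skipped by the range check, so only Chinese characters contribute to the ranking.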

A Character Shape Encoding Method to Input Chinese Characters in Old Documents (고문헌 벽자(僻字) 입력을 위한 한자 자형 부호화 방법)

  • Kim, Kiwang
    • Journal of Korean Medical classics / v.32 no.1 / pp.105-116 / 2019
  • Objectives: Ancient classical literature contains many rare Chinese characters, so-called Byeokja (僻字), and often includes Chinese characters that are not registered in Unicode as well as variant characters that cannot be found in current font sets. In order to register all possible Chinese characters, including these, as units of information exchange, this study proposes a method for encoding the morphological information of Chinese characters according to fixed rules. Methods: The study proposes ways to encode the connections between the nodes that make up a Chinese character and the coordinates of those nodes. It adds rules for expressing information about curves, expressing the aspect ratio of characters, minimizing coordinate lines, and expressing the aggregation status of character components. Results: With the proposed method, codes of a certain length can be generated by extracting only the information that expresses the morphological configuration of a character. Conclusions: The character encoding method proposed in this study can be used to distinguish variant characters with small variations among Byeokja, new Chinese characters, and character strokes, and to store and search them.

A Unicode based Deep Handwritten Character Recognition model for Telugu to English Language Translation

  • BV Subba Rao;J. Nageswara Rao;Bandi Vamsi;Venkata Nagaraju Thatha;Katta Subba Rao
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.101-112 / 2024
  • Telugu is considered the fourth most widely used language in India, spoken especially in Andhra Pradesh, Telangana, Karnataka and other regions. It is also increasingly spoken in other countries. The language comprises various dependent and independent vowels, consonants and digits. Despite this, Telugu Handwritten Character Recognition (HCR) has not been widely developed. HCR is a neural network technique that converts a document image into editable text, which can then be used in many other applications. This saves time and effort because the text does not have to be re-entered from scratch each time. In this work, a Unicode-based Handwritten Character Recognition (U-HCR) model is developed for translating handwritten Telugu characters into English. By using the Centre of Gravity (CG) in the model, a compound character can easily be divided into individual characters with the help of Unicode values. To train the model, both online and offline Telugu character datasets were used. To extract features from the scanned image, a convolutional neural network (CNN) was used along with machine learning classifiers such as Random Forest and Support Vector Machine. The Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMS-P) and Adaptive Moment Estimation (ADAM) optimizers are used to enhance the performance of U-HCR and to reduce the loss function value. The optimizers make this loss reduction possible when used with the CNN. On both the online and offline datasets, the proposed model showed promising results, achieving accuracies of 90.28% for SGD, 96.97% for RMS-P and 93.57% for ADAM.
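To illustrate what dividing a compound character "with the help of Unicode values" can mean at the text level, the sketch below lists the code points of a Telugu conjunct. It does not reproduce the paper's image-based Centre of Gravity method; the function name and the example conjunct are assumptions chosen here for illustration.

```python
import unicodedata

# The Telugu block occupies U+0C00..U+0C7F.
TELUGU_BLOCK = range(0x0C00, 0x0C80)


def split_code_points(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for each Telugu code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if ord(ch) in TELUGU_BLOCK
    ]


# A conjunct written as consonant + virama + consonant (క్ష).
conjunct = "\u0c15\u0c4d\u0c37"
for code_point, name in split_code_points(conjunct):
    print(code_point, name)
```

A single rendered compound glyph is stored as a sequence of code points, which is why a recognizer can map one handwritten shape to several Unicode values.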

Analysis of Korean Language to Optimize the Hangul Character Coding for Information Processing and Communication (한글의 정보처리 및 통신용 부호 최적화를 위한 한국어 분석)

  • Hong, Wan-Pyo
    • The Journal of the Korea institute of electronic communication sciences / v.10 no.3 / pp.375-380 / 2015
  • This paper studies the Korean language in order to optimize Hangul character coding for information processing in terminal devices and for transmission over networks. It analyzes the Hangul characters used in Korean and the use frequency of each character, and compares the results with the Hangul characters encoded in the Korean character standard and in Unicode. The study draws on the "Modern Korean Use Frequency Rate Survey Result" issued by the National Institute of the Korean Language, which contains 58,437 Korean words in total. The analysis shows that these 58,437 words consist of 1,540 Hangul characters. The most frequently used character is "다", which accounts for 15% of total use; the least frequently used is "휫", which accounts for 0.00003%. The 1,540 Hangul characters analyzed are 7.2 times and 1.5 times fewer than those in the Korean standard and the Unicode standard, respectively.

Hangul Font Editor based on Multiple Master Glyph Algorithm (다중 마스터 글리프 알고리즘을 적용한 한글 글꼴 에디터)

  • Lim, Soon-Bum;Kim, Hyun-Young;Chung, Hwaju;Park, Ki-Deok;Choi, Kyong-Sun
    • KIISE Transactions on Computing Practices / v.21 no.11 / pp.699-705 / 2015
  • Thousands of glyphs are needed to generate a Hangul font, and all required glyphs must be produced before the font can be built. This paper presents the "Multiple Master Glyph Algorithm", a process that automatically generates a target number of glyphs from a very small number of glyphs by using a combination rule setting and a glyph interpolation method. A font editor that can generate Hangul glyphs or fonts is developed based on this algorithm. The editor automatically generates a target number of fundamental glyphs from a combination rule setting and four master glyphs, which can be set up by the user. The automatically generated glyphs can then be combined automatically into a target font covering the 2,350 Hangul characters of the KS X 1001 standard or the 11,172 Hangul characters of the Unicode standard. The efficiency of the proposed Hangul editor is analyzed quantitatively through application to several commercial typefaces.
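The figure of 11,172 Unicode Hangul syllables follows from combining 19 initial consonants, 21 medial vowels and 28 final positions (27 final consonants plus "no final"). The sketch below shows that arithmetic for code points only. It says nothing about how the paper's editor combines glyph outlines, and the function names are illustrative assumptions.

```python
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
N_INITIAL, N_MEDIAL, N_FINAL = 19, 21, 28  # N_FINAL includes "no final consonant"
N_SYLLABLES = N_INITIAL * N_MEDIAL * N_FINAL


def compose(initial: int, medial: int, final: int = 0) -> str:
    """Build a precomposed syllable from zero-based jamo indices."""
    return chr(HANGUL_BASE + (initial * N_MEDIAL + medial) * N_FINAL + final)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Recover the (initial, medial, final) indices of a precomposed syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_SYLLABLES:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, N_MEDIAL * N_FINAL)
    medial, final = divmod(rest, N_FINAL)
    return initial, medial, final


print(N_SYLLABLES)          # number of precomposed syllables
print(compose(18, 0, 4))    # indices for the initial, vowel and final of one syllable
print(decompose("\ud55c"))  # inverse of the line above
```

Because every syllable is a regular combination of three index ranges, a rule-based generator only needs component shapes plus combination rules rather than thousands of hand-drawn glyphs.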

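A Comparative Study of Korean Hanja Character Forms and Taiwan's Standard Form of National Characters: Focusing on the 900 Basic Hanja for Middle School Education (한국한자자형(韓國漢字字形)과 대만국자표준자체(臺灣國字標準字體) 비교 연구 - 교육용(敎育用) 기초한자(基礎漢字) 중학교용(中學校用) 900자(字)를 중심(中心)으로)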

  • Gang, Hye-Geun
    • 중국학논총 / no.71 / pp.1-22 / 2021
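  • Although the Chinese characters used in Korea and Taiwan are both traditional characters, the forms of some characters differ. The author compared the forms of Korean Hanja (the 900 middle-school characters among Korea's basic Hanja for education) with Taiwan's Standard Form of National Characters. The first result of the comparison is that 628 characters (69.8%) have identical forms, 241 (26.8%) have similar forms, and 31 (3.4%) have different forms. The second result is that 19 of the characters used in Korea and Taiwan have different Unicode code points. Of these, 13 pairs differ in form: 脚腳, 强強, 擧舉, 鷄雞, 敎教, 旣既, 郞郎, 氷冰, 衆眾, 卽即, 窓窗, 靑青, 淸清; and 5 pairs are similar in form: 眞真, 産產, 尙尚, 顔顏, 飮飲. In addition, there is a case in which the forms are completely identical, such as "晩". These characters deserve special attention. The 5 basic principles, 40 general rules and 120 specific rules in the fourth section, "Principles and Examples for Establishing the Standard Forms", of the Taiwan Ministry of Education's "Teacher's Handbook for the Standard Form of National Characters" (《國字標準字體教師手冊》) show that every character form fixed by Taiwan is well grounded. They also show that a main direction of the Ministry's decisions is "conformity with the structural logic of the character" (合乎字理), based on the structure of the small seal script forms in the Shuowen Jiezi (《說文解字》), whereas Korean Hanja forms generally follow the forms of the entry headings in the Kangxi Dictionary (《康熙字典》).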
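The pairs with different Unicode code points can be inspected directly by printing the code point of each character. The sketch below copies the 18 pairs listed in the abstract (Korean form first, Taiwanese form second); the variable and function names are assumptions made for this illustration.

```python
# Pairs as listed in the abstract: Korean form followed by Taiwanese form.
DIFFERENT_FORM = ["脚腳", "强強", "擧舉", "鷄雞", "敎教", "旣既", "郞郎",
                  "氷冰", "衆眾", "卽即", "窓窗", "靑青", "淸清"]
SIMILAR_FORM = ["眞真", "産產", "尙尚", "顔顏", "飮飲"]
PAIRS = DIFFERENT_FORM + SIMILAR_FORM


def code_points(pair: str) -> tuple[str, ...]:
    """Format each character of a pair as a U+XXXX code point string."""
    return tuple(f"U+{ord(ch):04X}" for ch in pair)


for pair in PAIRS:
    korean, taiwanese = code_points(pair)
    print(f"{pair[0]} {korean}  {pair[1]} {taiwanese}")
```

Since the two members of each pair are separate code points, a plain string comparison or search treats them as different characters even when the printed forms look nearly the same.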

Study on the prerequisite Chinese characters for the education of traditional Korean medicine (한의학 교육을 위한 필수한자 추출 및 분석연구)

  • Hwang, Sang-Moon;Lee, Byung-Wook;Shin, Sang-Woo;Cho, Su-In;Yim, Yun-Kyoung;Chae, Han
    • Journal of Korean Medical classics / v.24 no.5 / pp.147-158 / 2011
  • There has been a need for an operational curriculum for teaching the Chinese characters used in traditional Korean medicine (TKM), but it has not been thoroughly reviewed so far. We analysed the frequency of Unicode Chinese characters in five textbooks of traditional Korean medicine used as a national standard. We found that 氣, 經, 陽, 陰, 不, 熱, 血, 脈, 病, 證, 寒, 中, 心, 痛, 虛, 大, 生, 治, 本, 之 are the 20 most frequently used Chinese characters, and also listed the 100 most frequently used characters for each textbook. We used a cumulative frequency analysis to propose a list of 1,000 prerequisite Chinese characters for TKM education (TKM 1000), which represents the current usage of Chinese characters in TKM and covers 99% of all textbook use when combined with MEST 1800. This study identified prerequisite and essential Chinese characters for the implementation of evidence-based teaching in TKM. The TKM 1000, the set of prerequisite characters derived in this study from the TKM textbooks, can be used for the development of the Korean Medicine Education Eligibility Test (KEET), the entrance exam to the Colleges of Oriental Medicine or textbooks, and the educational curriculum for premed students.
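A cumulative frequency analysis of this kind ranks characters by frequency and keeps adding them until a target share of all occurrences is covered. The sketch below is a minimal version under stated assumptions: the toy counts, the function name and the thresholds are invented for illustration and are not the study's data or exact method.

```python
from collections import Counter
from itertools import accumulate


def characters_for_coverage(counts: Counter, target: float) -> list[str]:
    """Return the most frequent characters whose cumulative share reaches target."""
    ranked = counts.most_common()
    total = sum(counts.values())
    running = accumulate(freq for _, freq in ranked)
    for rank, cumulative in enumerate(running, start=1):
        if cumulative / total >= target:
            return [ch for ch, _ in ranked[:rank]]
    return [ch for ch, _ in ranked]


# Hypothetical counts, not taken from the textbooks.
toy_counts = Counter({"氣": 50, "經": 30, "陽": 15, "陰": 4, "不": 1})
print(characters_for_coverage(toy_counts, 0.90))
```

In practice the counter would be built from the full textbook text, and the length of the returned list shows how many characters a student must know to read a given share of it.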

Development of EUC-KR based Locale and Application Program Supporting North Korean Collating Sequence (북한 한글 순서를 지원하는 EUC-KR 기반의 로캘과 응용 프로그램 개발)

  • Jung Il-dong;Lee Jung-hwa;Kim Yong-ho;Kim Kyongsok
    • The KIPS Transactions:PartB / v.11B no.7 s.96 / pp.875-884 / 2004
  • UCS (= ISO/IEC 10646 = Unicode) will be widely used as globalization proceeds. If UCS is used for official purposes in both Koreas, it solves the problem of differing hangeul codes between South and North Korea. However, UCS does not solve the problem that the same characters are ordered differently. ISO/IEC 14651:2000 (International String Ordering), an international standard for string ordering, defines a framework for sorting character strings consisting of multinational scripts. Because the Common Template Table in ISO/IEC 14651 defines the order of characters, the order can be changed without changing the character sequences in programs. Therefore, the ordering problem can be solved without unifying the hangeul order of South and North Korea. Functions related to ISO/IEC 14651 are included in the system libraries of Unix-based operating systems such as Linux, Solaris and FreeBSD. We implement an EUC-KR-based North Korean locale, which includes the North Korean hangeul order, on Linux so that the North Korean locale can be used in South Korea. We also develop a program that orders strings by both the South and North Korean hangeul orders.
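The central idea, changing the sort order through a table while leaving the encoded strings untouched, can be sketched with a table-driven sort key. This is a pure-Python stand-in, not the locale implementation described in the abstract. The alternative initial-consonant order in `ALT_INITIALS` is an assumption used only for illustration and should be checked against the actual North Korean standard; vowels and final consonants keep their Unicode order here.

```python
HANGUL_BASE, HANGUL_LAST = 0xAC00, 0xD7A3
PER_INITIAL = 21 * 28  # syllables per initial consonant (21 vowels x 28 finals)

# Initial consonants in Unicode (South Korean) order.
UNICODE_INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
# ASSUMED alternative order for illustration only: plain consonants first,
# then doubled consonants, then the silent initial.
ALT_INITIALS = "ㄱㄴㄷㄹㅁㅂㅅㅈㅊㅋㅌㅍㅎㄲㄸㅃㅆㅉㅇ"
INITIAL_RANK = [ALT_INITIALS.index(jamo) for jamo in UNICODE_INITIALS]


def collation_key(text: str) -> list[tuple[int, int, int]]:
    """Sort key that reorders Hangul syllables by the alternative initial table."""
    key = []
    for ch in text:
        cp = ord(ch)
        if HANGUL_BASE <= cp <= HANGUL_LAST:
            initial, rest = divmod(cp - HANGUL_BASE, PER_INITIAL)
            key.append((1, INITIAL_RANK[initial], rest))
        else:
            key.append((0, cp, 0))  # non-Hangul characters keep code point order
    return key


words = ["까치", "나무", "가방"]
print(sorted(words))                     # code point order
print(sorted(words, key=collation_key))  # table-driven order
```

Swapping the table changes the result of the sort without modifying the stored strings, which is the property the abstract attributes to the Common Template Table approach.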

Improvement plan for 'Newly found ideographs(新出漢字)' in the digitalizing business of the old Korean documents (고전 자료 디지털화사업에서의 신출한자 처리 개선방안)

  • Lee, Jeong-Hwa
    • Korean Journal of Oriental Medicine / v.10 no.1 / pp.1-14 / 2004
  • As Korea enters the information age of the 21st century, it is actively carrying out, at the government level, many projects to digitize the information sources of Korean scholarship, building on its advanced digital technologies and Internet networks. 'Newly found ideographs (新出漢字)' are defined as characters that are found and extracted from old Chinese-character documents during digitization and that are not yet registered in the Unicode blocks and extended Chinese character sets of the existing international standard. Korea is currently computerizing old documents on a very large scale. Meanwhile, the international standard for the Chinese characters used in most Asian countries is being processed and developed by the IRG. It is therefore very important for Korea to extract the 'newly found ideographs' precisely while building its databases, organize them as candidates for international standard codes, submit them to the international organization, and finally have them registered in the standard.

A Sorting of Unicode 3.0 CJK Chinese Characters (유니코드 3.0의 CJK 한자 정렬)

  • 윤지헌;변정용
    • Proceedings of the Korea Multimedia Society Conference / 2000.04a / pp.462-465 / 2000
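  • Large volumes of documents have recently been digitized, stored on computers and shared over the Internet, and this effort is now extending to old documents. However, most old documents from the Chinese-character cultural sphere are written with some 20,000 to 30,000 distinct Chinese characters, so character input runs into encoding problems. Unicode 3.0 encodes 27,786 Chinese characters, which is a great help to the countries of that cultural sphere. Yet these characters, selected from those commonly used in China, Japan and Korea, are sorted by radical and stroke count, which does not suit conditions in Korea, and environments in which Unicode Chinese characters can be entered are limited to little more than MS Word 2000. This paper proposes a method for sorting the characters assigned to the CJK ideograph area, which will serve as the basic character codes in a Unicode 3.0 Chinese character input method, and applies it to an operating-system-independent Chinese character input system.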
