• Title/Summary/Keyword: Unicode

67 search results

Study on the prerequisite Chinese characters for the education of traditional Korean medicine (한의학 교육을 위한 필수한자 추출 및 분석연구)

  • Hwang, Sang-Moon;Lee, Byung-Wook;Shin, Sang-Woo;Cho, Su-In;Yim, Yun-Kyoung;Chae, Han
    • Journal of Korean Medical classics / v.24 no.5 / pp.147-158 / 2011
  • There has been a need for an operational curriculum for teaching the Chinese characters used in traditional Korean medicine (TKM), but the topic has not been thoroughly reviewed so far. We analysed the frequency of Unicode Chinese characters in five TKM textbooks used as national standards. We found that 氣, 經, 陽, 陰, 不, 熱, 血, 脈, 病, 證, 寒, 中, 心, 痛, 虛, 大, 生, 治, 本, and 之 are the 20 most frequently used Chinese characters, and we also list the 100 most frequently used characters for each textbook. We used a cumulative frequency analysis to propose a list of 1,000 prerequisite Chinese characters for TKM education (TKM 1000), which represents the current usage of Chinese characters in TKM and, combined with MEST 1800, covers 99% of all textbook usage. This study identifies the prerequisite and essential Chinese characters for implementing evidence-based teaching in TKM. The TKM 1000, the prerequisite character list derived from the TKM textbooks in this study, can be used to develop the Korean Medicine Education Eligibility Test (KEET), entrance exams for the Colleges of Oriental Medicine, textbooks, and educational curricula for premed students.
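The kind of character-frequency count described in this abstract can be sketched with the standard library (a minimal illustration, not the authors' actual pipeline; the sample sentence and the use of the CJK Unified Ideographs range U+4E00–U+9FFF as the Hanja filter are my assumptions):

```python
from collections import Counter

# Count CJK Unified Ideographs (U+4E00-U+9FFF) in a text, ignoring Hangul,
# punctuation, and whitespace -- a toy stand-in for the textbook corpus.
def hanja_frequencies(text):
    return Counter(ch for ch in text if '\u4e00' <= ch <= '\u9fff')

sample = "氣血不足으로 인한 虛證은 補氣하여 치료한다"  # hypothetical sentence
freq = hanja_frequencies(sample)
print(freq.most_common(3))  # 氣 appears twice in this sample
```

Hangul syllables (U+AC00–U+D7A3) fall outside the filtered range, so only the Hanja are counted.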

A minimal pair searching tool based on dictionary (사전 기반 최소대립쌍 검색 도구)

  • Kim, Tae-Hoon;Lee, Jae-Ho;Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.2 / pp.117-122 / 2014
  • Minimal pairs are pairs of words whose sound sequences are identical except for one sound, yet which are distinct lexical items. This paper proposes a minimal-pair search tool to make phonological research on minimal pairs more efficient. By comparing it with other programs, we suggest a guide for developing Korean minimal-pair search programs. The proposed tool has a user-friendly interface that minimizes key input, intended for linguists who are not fluent with computer programs, and it can classify the words in a dictionary for more detailed research. For efficiency, it speeds up dictionary loading by separating syllables through Unicode analysis and optimizes the dictionary structure for searching. The search algorithm gains speed from a hashing scheme keyed on syllable counts. Compared with the earlier version, our tool is about 5 times faster at converting a dictionary and about 3 times faster at searching.
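The syllable-separation step mentioned in this abstract relies on the arithmetic structure of the Unicode Hangul Syllables block (U+AC00–U+D7A3); a minimal sketch of that decomposition (the jamo tables are standard Unicode data, but this is not the paper's actual code):

```python
# Decompose a precomposed Hangul syllable into (lead, vowel, tail) jamo using
# the Unicode composition formula: code = 0xAC00 + (L*21 + V)*28 + T.
CHOSEONG = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
JUNGSEONG = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
JONGSEONG = [''] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

def decompose(syllable):
    s = ord(syllable) - 0xAC00
    if not 0 <= s <= 11171:
        raise ValueError("not a precomposed Hangul syllable")
    return CHOSEONG[s // 588], JUNGSEONG[(s % 588) // 28], JONGSEONG[s % 28]

print(decompose('한'))  # ('ㅎ', 'ㅏ', 'ㄴ')
```

Comparing decomposed jamo sequences rather than whole syllables is what makes one-sound differences (and hence minimal pairs) directly visible.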

A Study on the Development of Digital Library Model for PUST in North Korea (북한 PUST 디지털도서관 모델 개발 연구)

  • Lee, Jong-Moon
    • Journal of the Korean Society for information Management / v.25 no.3 / pp.143-158 / 2008
  • This study was conducted to provide a model for constructing the library and the digital library at PUST, a joint project of South and North Korea. First, we identified the problems in constructing digital libraries in general, as well as issues that may arise specifically in building the digital library at PUST. The results showed imminent problems in operating a digital library, given the inadequate progress in the field of copyright. In addition, the differences in the language systems and knowledge foundations of the two countries will cause problems in homepage access, database construction, and information retrieval. To overcome these predictable problems, this research proposes the following: (1) parallel operation of the digital library and a conventional physical library; (2) duplexing the homepage by applying Unicode in the digital library; (3) developing and applying converted character codes by establishing an NCHAR data type; and (4) constructing an authority database.

Research on Methods for Processing Nonstandard Korean Words on Social Network Services (소셜네트워크서비스에 활용할 비표준어 한글 처리 방법 연구)

  • Lee, Jong-Hwa;Le, Hoanh Su;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.35-46 / 2016
  • Social network services (SNS), which help users build relationship networks and freely share particular interests or activities by posting comments, photos, videos, and so on to online communities such as blogs, have been widely adopted and have developed into a social phenomenon. Several studies have explored patterns and valuable information in social network data via text mining, such as opinion mining and semantic analysis. To improve the efficiency of text mining, keyword-based approaches have been applied, but most researchers point out their limitations with respect to the rules of Korean orthography. This research aims to construct a database of nonstandard Korean words that are difficult to handle in data mining, such as abbreviations, slang, unusual expressions, and emoticons, in order to overcome the limitations of keyword-based text mining techniques. Based on a study of subjective opinions about specific topics on blogs, this research extracted nonstandard words that proved useful in the text mining process.
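A database of nonstandard forms like the one described can serve as a lookup table during preprocessing; a minimal sketch (the entries and the `normalize` helper are hypothetical illustrations, not the paper's actual lexicon):

```python
# Hypothetical mini-lexicon mapping nonstandard SNS forms to standard Korean
# (or to a tag usable by a text-mining pipeline).
NONSTANDARD = {
    '낼': '내일',                      # abbreviation: "tomorrow"
    '솔까말': '솔직히 까놓고 말해서',  # slang acronym
    'ㅠㅠ': '[슬픔]',                  # emoticon mapped to a sentiment tag
}

def normalize(tokens):
    # Replace each nonstandard token with its standard form, if known.
    return [NONSTANDARD.get(t, t) for t in tokens]

print(normalize(['낼', '비', '오냐', 'ㅠㅠ']))
```

Keyword-based mining then runs on the normalized tokens, so slang and emoticons no longer fall outside the keyword vocabulary.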

A Research on Cool URI based on Internationalized Resource Identifier for Web 2.0 (웹 2.0을 위한 다국어 식별자 기반의 Cool URI에 대한 연구)

  • Jung, Eui-Hyun;Kim, Weon;Song, Kwan-Ho;Park, Chan-Ki
    • Journal of the Korea Society of Computer and Information / v.11 no.5 s.43 / pp.223-230 / 2006
  • Web 2.0 and Semantic Web technology will merge into a next-generation Web that moves from a presentation-oriented Web to a data-centric Web. In this next-generation Web, semantic processing, the Web platform, and data fusion are the most important technology factors. Among them, Cool URIs, used for data fusion, provide permanent and human-readable URIs and are already used in blog systems. However, Cool URIs are not well suited to I18N environments such as Hangul, and they are difficult to adopt because several encodings are mingled in the Korean Web community. In this paper, we discuss the technical issues and a Cool URI Web component for using Cool URIs with Internationalized Resource Identifiers (IRIs). The proposed approach provides the same functionality regardless of encoding type and supports both file-system-based and CGI-based methods so that it can easily be used by other applications. Results from several test environments show that the implemented Web component satisfies its design purpose.
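The core of mapping an IRI containing Hangul to a plain URI is percent-encoding the non-ASCII part under one fixed byte encoding; a minimal standard-library sketch (the path segment is a made-up example): the same Hangul yields different URIs under UTF-8 and EUC-KR, which is exactly the mixed-encoding problem this paper addresses.

```python
from urllib.parse import quote, unquote

path = '한글'  # hypothetical Hangul path segment of a Cool URI

# The same IRI percent-encodes to different URIs depending on the encoding.
utf8_uri = quote(path, encoding='utf-8')
euckr_uri = quote(path, encoding='euc-kr')

print(utf8_uri)               # %ED%95%9C%EA%B8%80
print(utf8_uri != euckr_uri)  # True -- same Hangul, different URI bytes
print(unquote(utf8_uri, encoding='utf-8') == path)  # True -- round-trips
```

RFC 3987 resolves this by defining the IRI-to-URI mapping over UTF-8 only, which is why encoding-independent behavior requires Unicode underneath.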


Language-Independent Word Acquisition Method Using a State-Transition Model

  • Xu, Bin;Yamagishi, Naohide;Suzuki, Makoto;Goto, Masayuki
    • Industrial Engineering and Management Systems / v.15 no.3 / pp.224-230 / 2016
  • The use of new words, colloquialisms, and abbreviations on the Internet is extensive, so automatically acquiring words for the purpose of analyzing Internet content is very difficult. In a previous study, we proposed a method for Japanese word segmentation using character N-grams. That method is based on a simple state-transition model established under the assumption that the input document is described by four states (denoted A, B, C, and D) specified beforehand: state A represents words (nouns, verbs, etc.); state B represents statement separators (punctuation marks, conjunctions, etc.); state C represents postpositions (words that follow nouns); and state D represents prepositions (words that precede nouns). Based on the states assigned to each pseudo-word, we search the document from beginning to end for admissible transition patterns; this transition process detects words during the search. In the present paper, we report experiments with the proposed word acquisition algorithm on Japanese and Chinese newspaper articles, obtained from Japan's Kyoto University and the Chinese People's Daily. The proposed method does not depend on language structure: if text documents are expressed in Unicode, the same algorithm can acquire words in both Japanese and Chinese, which do not put spaces between words. Hence, we demonstrate that the proposed method is language independent.
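A drastically simplified sketch of the state-transition idea (the state table below is a toy of my own with only states A and B; the paper's model uses four states and learned character N-gram statistics):

```python
# Tag each character with a state and emit maximal runs of "word" characters
# (state A); separator characters (state B) close the current word.
SEPARATORS = set('。、，！？ ')  # toy state-B table: punctuation and space

def extract_words(text):
    words, current = [], ''
    for ch in text:
        if ch in SEPARATORS:       # state B: statement separator
            if current:
                words.append(current)
                current = ''
        else:                      # state A: part of a word
            current += ch
    if current:
        words.append(current)
    return words

print(extract_words('今日、雨。'))  # ['今日', '雨']
```

Because the states are assigned per Unicode character rather than per language-specific token, the same loop runs unchanged on Japanese and Chinese text.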

Clustering Performance Analysis of Autoencoder with Skip Connection (스킵연결이 적용된 오토인코더 모델의 클러스터링 성능 분석)

  • Jo, In-su;Kang, Yunhee;Choi, Dong-bin;Park, Young B.
    • KIPS Transactions on Software and Data Engineering / v.9 no.12 / pp.403-410 / 2020
  • In addition to research on noise removal and super-resolution using the data restoration (output) function of autoencoders, research on improving clustering performance using the dimension-reduction function of autoencoders is being actively conducted. The clustering function and the data restoration function of an autoencoder have in common that both are improved through the same training. Based on this, we conducted an experiment to test whether an autoencoder model designed for excellent data restoration is also superior in clustering performance. A skip connection was used to design an autoencoder with excellent data restoration, and the reconstruction and clustering performance of the models with and without the skip connection were compared using graphs and visualizations. Reconstruction performance increased, but clustering performance decreased. This result indicates that for neural network models such as autoencoders, good reconstruction does not guarantee that each layer has learned the characteristics of the data well. Finally, the degradation in clustering performance was compensated for by using both the latent code and the skip connection. This study is a preliminary step toward solving the Hanja Unicode problem via clustering.
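The central observation here, that a skip connection can improve reconstruction without improving the latent code, can be seen in a deliberately tiny numeric sketch (toy fixed weights of my own construction, not the paper's model):

```python
# Toy "autoencoder": 2-D input -> 1-D latent -> 2-D reconstruction.
def encode(x):
    return (x[0] + x[1]) / 2           # lossy compression to one number

def decode(z, x=None):
    y = [z, z]                         # plain decoder output
    if x is not None:                  # skip connection: mix the input back in
        y = [(a + b) / 2 for a, b in zip(y, x)]
    return y

def l1_error(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

x = [1.0, 3.0]
z = encode(x)                          # latent code, identical in both cases
plain = l1_error(x, decode(z))         # reconstruction error without skip
skipped = l1_error(x, decode(z, x))    # reconstruction error with skip

print(plain, skipped)  # the skip halves the error, yet z is unchanged
```

The skip path carries information around the bottleneck, so output quality improves even though the latent code (the representation clustering actually uses) is exactly the same.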