• Title/Summary/Keyword: Abbreviations

Abbreviation Disambiguation using Topic Modeling (토픽모델링을 이용한 약어 중의성 해소)

  • Woon-Kyo Lee;Ja-Hee Kim;Junki Yang
    • Journal of the Korea Society for Simulation / v.32 no.1 / pp.35-44 / 2023
  • Recently, many studies have used text analysis to examine trends, including research trends. When documents for data analysis are collected by searching for an abbreviation as the keyword, the abbreviation must be disambiguated. In many studies, researchers classify documents by hand, reading them one by one to find the data needed for the study. Most prior work on abbreviation disambiguation clarifies word meaning and relies on supervised learning. Such methods are not well suited to classifying documents retrieved by an abbreviation search in order to find research data, and related studies are scarce. This paper proposes a method for semi-automatically classifying documents collected by abbreviation, applying topic modeling with Non-Negative Matrix Factorization, an unsupervised learning method, in the data pre-processing step. To verify the proposed method, papers were collected from an academic database using the abbreviation 'MSA'. The proposed method found 316 papers related to Micro Services Architecture among 1,401 papers. Its document classification accuracy was measured at 92.36%. The proposed method is expected to reduce the time and cost that researchers spend on manual work.

Searching for Variants Using Trie-Index (트라이 인덱스를 이용한 이형태 검색)

  • Park, In-Cheol
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.8 / pp.1986-1992 / 2009
  • Users often search for data by entering a variant, such as an abbreviation or substring of a word, or a misspelled word. The simple approach to searching for variants is to build a variant dictionary. However, this entails enormous cost and time and cannot handle variants caused by misspelling. Approximate searching, that is, searching by approximate string matching, is a good approach to the problem, but it cannot handle variants formed by abbreviation. This paper proposes a method for searching for various variants, including abbreviations and misspelled words, using trie indexing. First, the paper presents a variant matching method based on the calculation of a path weighted-metric. In addition, it provides a variant searching algorithm that reduces the search time.
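
The idea of approximate string matching over a trie index can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not define its path weighted-metric, so plain Levenshtein edit distance is swapped in here, and the names `TrieNode`, `build_trie`, and `search_variants` are hypothetical. As the abstract notes, edit distance alone catches misspellings but not abbreviations.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set only on the node that ends a stored word


def build_trie(words):
    """Insert every word into a character trie and return the root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.word = word
    return root


def search_variants(root, query, max_cost):
    """Return (word, distance) pairs within max_cost edits of query."""
    results = []
    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        _walk(child, ch, query, first_row, max_cost, results)
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


def _walk(node, ch, query, prev_row, max_cost, results):
    # Extend the edit-distance table by one row for the trie character ch.
    row = [prev_row[0] + 1]
    for col in range(1, len(query) + 1):
        insert_cost = row[col - 1] + 1
        delete_cost = prev_row[col] + 1
        replace_cost = prev_row[col - 1] + (query[col - 1] != ch)
        row.append(min(insert_cost, delete_cost, replace_cost))
    if node.word is not None and row[-1] <= max_cost:
        results.append((node.word, row[-1]))
    # Prune: descend only while some prefix alignment is still within budget.
    if min(row) <= max_cost:
        for next_ch, child in node.children.items():
            _walk(child, next_ch, query, row, max_cost, results)
```

Because words that share a prefix share the same table rows, and whole subtrees are pruned once every cell exceeds `max_cost`, the search usually visits far fewer nodes than comparing the query against each dictionary word separately.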

A review on the method of coined words by Korean and Chinese characters (한·중 인물지칭 신어 조어방식에 관한 고찰 - 2017년과 2018년을 중심으로 -)

  • Wang, Yan
    • Journal of Convergence for Information Technology / v.12 no.3 / pp.178-185 / 2022
  • This study compared and analyzed the characteristics of 197 Korean and Chinese new words coined in 2017 and 2018, classifying them by word-formation method into single, compound, derivative, abbreviated, and hybrid words. Among single words, all the Korean ones are borrowed from Chinese and English, whereas no single words appeared in Chinese. Among compound words, the Chinese compounding patterns were much more diverse and more productive than the Korean ones. Among derivatives, prefixes are rare in both languages; among Korean suffixes, those of Chinese origin are the most productive, while foreign and native suffixes are weakly productive. Foreign suffixes appeared relatively more often in Korean than in Chinese. Among abbreviations, the productivity of dark syllables is stronger for Korean abbreviations, and the productivity of empty syllables is stronger for Chinese abbreviations. Among hybrid words, the Korean hybrid forms were much more diverse than the Chinese ones. This study should help Chinese learners of Korean understand how new words are formed and develop their ability to guess the meaning of Korean words while learning new vocabulary.

Language-Independent Sentence Boundary Detection with Automatic Feature Selection

  • Lee, Do-Gil
    • Journal of the Korean Data and Information Science Society / v.19 no.4 / pp.1297-1304 / 2008
  • This paper proposes a machine learning approach for language-independent sentence boundary detection. The proposed method requires no heuristic rules or language-specific features, such as part-of-speech information or a list of abbreviations or proper names. Using only language-independent features, we perform experiments on both an inflectional language and an agglutinative language, which have fairly different characteristics (in this paper, English and Korean, respectively), and obtain good performance in both languages. We have also experimented with the method under a wide range of experimental conditions, especially for the selection of useful features.

An LSTM Method for Natural Pronunciation Expression of Foreign Words in Sentences (문장에 포함된 외국어의 자연스러운 발음 표현을 위한 LSTM 방법)

  • Kim, Sungdon;Jung, Jaehee
    • KIPS Transactions on Software and Data Engineering / v.8 no.4 / pp.163-170 / 2019
  • The Korean language has postpositions such as eul, reul, yi, ga, wa, and gwa, which are attached to nouns and add meaning to the sentence. When foreign notations or abbreviations are included in a sentence, the postposition appropriate to the pronunciation of the foreign word may not be used. Sometimes, to keep the sentence natural, two postpositions are written with one in parentheses, as in "eul(reul)", so that either is acceptable. This study finds examples of unnatural postpositions in Korean sentences that contain foreign words and proposes a method for choosing natural postpositions by learning the final consonant pronunciation of nouns. The proposed method uses a recurrent neural network model to naturally express postpositions connected to foreign words, and it is validated through training and testing. In the future, it should be useful for composing well-formed sentences in machine translation by choosing natural postpositions for English abbreviations or new foreign words included in Korean sentences.
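
For nouns written in Hangul, the dependence of the postposition on the final consonant can be sketched with a simple rule. This is not the paper's LSTM model, only a rule-based baseline under stated assumptions: it relies on the Unicode layout of precomposed Hangul syllables (28 final-consonant slots per block, starting at U+AC00), and the function names are hypothetical. It cannot handle a noun written in Latin letters, such as an English abbreviation, which is the gap the abstract says the learned model addresses.

```python
HANGUL_BASE = 0xAC00      # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3      # last precomposed Hangul syllable
FINALS_PER_BLOCK = 28     # 27 final consonants plus the "no final" slot


def has_final_consonant(word):
    """True if the last Hangul syllable of word ends in a consonant."""
    last = ord(word[-1])
    if not HANGUL_BASE <= last <= HANGUL_LAST:
        # Latin-script words need pronunciation knowledge this rule lacks.
        raise ValueError("last character is not a Hangul syllable")
    return (last - HANGUL_BASE) % FINALS_PER_BLOCK != 0


def object_postposition(word):
    """Romanized object marker: eul after a consonant, reul after a vowel."""
    return "eul" if has_final_consonant(word) else "reul"


def subject_postposition(word):
    """Romanized subject marker: yi after a consonant, ga after a vowel."""
    return "yi" if has_final_consonant(word) else "ga"
```

The functions return romanized forms to match the abstract's notation; a real system would emit the Hangul postpositions instead.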

Analysis and Implications of Twitter Data during the 2012 Election

  • Yun, Hongwon
    • Journal of Korea Society of Industrial Information Systems / v.19 no.6 / pp.7-13 / 2014
  • Twitter is a microblogging service that allows users to post short messages on a variety of topics in real time. In this work, we analyze Twitter messages posted during the 2012 elections and examine their implications. This study uses Twitter messages related to the 2012 South Korean presidential campaign. The three main candidates are represented by the abbreviations A, M, and P. According to the statistical analysis, the number of tweets and re-tweets for candidate P was relatively stable over the entire campaign period. Candidate P had the highest percentage of terms related to election pledges, and candidates A and M were judged to be somewhat weaker with respect to campaign promises. The ratio of positive terms for candidate P was higher than for the other two candidates, and the ratio of negative terms in the Twitter messages about P was considerably lower than for candidates A and M. Considering all these results, it is cautiously suggested that Twitter messages posted during an election campaign could be correlated with the outcome of the election.

Improved Algorithms for the Identification of Yeast Proteins and Significant Transcription Factor and Motif Analysis

  • Lee Seung-Won;Hong Seong-Eui;Lee Kyoo-Yeol;Choi Do-Il;Chung Hae-Young;Hur Cheol-Goo
    • Genomics & Informatics / v.4 no.2 / pp.87-93 / 2006
  • With the rapid development of MS technologies, the demand for more sophisticated MS interpretation algorithms has grown as well. We have developed a new protein fingerprinting method using a binomial distribution (fBIND). With fBIND, we improved the accuracy of protein fingerprinting by up to 49% over MOWSE and by 2% over a previous binomial distribution approach studied by Wool et al. Moreover, we suggest a statistical approach to define the significance of transcription factors and motifs in the identified proteins based on the Gene Ontology (GO). Abbreviations: fBIND, fingerprinting using binomial distribution; GO, Gene Ontology; MS, Mass Spectrometry; PMF, peptide mass fingerprinting; nr, nonredundant; SGD, Saccharomyces Genome Database

Normalization of Clinical Medical Records by Disambiguating Abbreviations and Acronyms (약어와 두문자어의 모호성 해결을 통한 임상 의무기록의 정규화)

  • Inho Bae;Jin-Sang Kim;Yoon-Nyun Kim
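    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.676-678 / 2008
  • The many acronyms that appear in clinical medical records greatly increase the ambiguity of the records during machine processing, so the records must be normalized as a preprocessing step before information extraction or text mining. In this study, we designed and implemented a normalization system that removes the ambiguity of abbreviations and acronyms used in discharge summaries, one type of clinical medical record. For normalization, we used context information to identify the type of medical record and the position within the record, and used these to learn and classify the meanings of abbreviations and acronyms. In experiments, the implemented normalization system achieved 94.7% accuracy on 16 senses of 6 acronyms.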

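A Study on the Necessity and Preparation of WWW Homepages for Departments of Library and Information Science: Focusing on Universities in Korea and North America (문헌정보학과 WWW홈페이지의 필요성과 준비에 관한 연구- 한국과 북미주 지역 대학을 중심으로 -)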

  • 박일종
    • Journal of Korean Library and Information Science Society / v.24 / pp.413-448 / 1996
  • Nowadays, the World Wide Web (WWW) has become an important source of timely information for information-related people such as information scientists, librarians, and students in the Library and Information Sciences area. They are information professionals who navigate information on the internet, and they also need to be information providers who build WWW homepages. This paper studies the necessity, preparation, and building of WWW homepage files for schools of Library and Information Sciences in an age of competition among disciplines. It focuses in particular on colleges and universities in the Republic of Korea (ROK) and North America. The purpose of this study is to provide various kinds of reference information for preparing a homepage in the future while using information on the internet effectively. Although a web page is necessary for a school of Library and Information Sciences to show news and to introduce the purpose of the discipline, the curriculum, the professors, and so on, such pages were not yet well prepared in the ROK. By comparison, web pages were well used and flourishing in North America (Canada, the United States, and Puerto Rico). Those web pages were analyzed in this paper to prepare for the design of good homepages for schools of Library and Information Sciences in Korea and for the age of competition among disciplines. Suggestions and guidelines for designing a good homepage were developed from both a literature review and the experience of the author, who currently serves in the School of Library and Information Sciences at Keimyung University and builds the homepage for the school. The major suggestions are as follows: (1) an English version of the homepage is necessary, (2) provide a multimedia presentation about the nature of the school, (3) incorporate a place for people to make suggestions on the contents of the homepage, (4) bear in mind that potential users must be familiar with the abbreviations you use, (5) absolutely do not use abbreviations that may make the content more difficult to understand, (6) add a feature on every page that takes the user back to the main page, (7) use clear, short, and well-structured sentences and remember to divide text into paragraphs, (8) include a creation or modification date on the homepage to indicate when it was last updated, (9) since the web is a multimedia environment, use colors effectively (guidelines were also suggested), and (10) put the college's name in the bookmark title so that the page can be found easily.

A Comparative Study on New Words of Korean and Chinese According to Changes in Popular Culture Contents (대중문화 콘텐츠 변화에 따른 한중 신조어 비교 연구)

  • Meng, Xiang-Shan;Lee, Kwang-Ho
    • Journal of Korea Entertainment Industry Association / v.14 no.6 / pp.125-137 / 2020
  • The purpose of this study is to analyze new words in Korean and Chinese based on changes in popular culture. As communication between China and Korea has grown increasingly close in recent years, their languages have influenced each other, and many new Korean and Chinese words have been found to share the same linguistic characteristics. New words are regarded as new developments of a language, and they are welcomed and widely used by young people in Korea and China. Therefore, in terms of the communicative function of language, it is worthwhile to study new words in Korean and Chinese academically. This study takes Chinese words created in 2018 as its research object. First, Chinese words created in 2018 are compared morphologically and semantically with those created in 2017 to extract the characteristic indicators of the 2018 words, with emphasis on compound words, abbreviations, substitutions, patterns, and rhetorical expressions. Second, the morphological similarities and differences between these Chinese words and Korean words created in 2018 are analyzed. Finally, after sample classification and comparison, conclusions are drawn about the characteristics of new Chinese and Korean words and the mechanism of their interaction under mutual influence. According to the study, the majority of the new words are created on the basis of existing words. Thus, it is important to explore the morphology of new words as a standard language.