A Study of automatic indexing based on the linguistic analysis for newspaper articles (언어학적 분석기법에 의한 신문기사 자동색인시스팀 설계에 관한 연구)

  • Seo, Gyeong-Ju;SaGong, Cheol
    • Journal of the Korean Society for Information Management
    • v.8 no.1
    • pp.78-99
    • 1991
  • So far, most newspaper indexing in Korea has been done manually using a thesaurus. In recent years, however, the need for an automatic indexing system has grown stronger so that indexers can save time, effort, and money. Some newspapers have also started building their own databases as they introduce electronic newspapers and CTS. This thesis is on establishing an automatic indexing system for the full text of the Korea Economic Daily's articles, which have been accumulated in its database, KETEL. In my thesis, I suggest methods to create a keyword file, a stopword list, an auxiliary word list and an inflected word list by applying linguistic analysis methods to Hangul, taking advantage of the language's morphological peculiarity. Through these studies, I was able to reach four conclusions as follows. First, we can obtain satisfactory keywords by automatic indexing methods based on morphological analysis. Second, an indexer can improve the efficiency of indexing work by controlling extracted vocabulary, as syntax analysis and semantic analysis are not complete in Hangul. Third, the keyword file in this system, which is made up of about 20,000 of the most frequently used newspaper terms, can be used in the future in compiling a thesaurus. Finally, the suggested methods to prepare an auxiliary word list and an inflected word list can be applied to designing other automatic systems.
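
The abstract does not give the contents of the word lists or the matching rules, so the following is only a minimal sketch of the general idea: strip trailing auxiliary words (particles) from each token, drop stopwords, and count what remains as keyword candidates. The lists `AUXILIARY_WORDS` and `STOPWORDS`, the function names, and the sample sentence are all hypothetical, and naive suffix stripping will mis-split some words that a real morphological analyzer would handle correctly.

```python
from collections import Counter

# Hypothetical, tiny lists for illustration only; the study's actual
# auxiliary word list and stopword list are not given in the abstract.
AUXILIARY_WORDS = ["에서는", "에서", "으로", "은", "는", "이", "가", "을", "를", "의", "에", "로"]
STOPWORDS = {"것", "수", "등"}


def strip_auxiliary(token):
    """Remove the longest matching trailing particle, keeping a non-empty stem."""
    for suffix in sorted(AUXILIARY_WORDS, key=len, reverse=True):
        if token.endswith(suffix) and len(token) > len(suffix):
            return token[: -len(suffix)]
    return token


def extract_keywords(text):
    """Count keyword candidates after particle stripping and stopword removal."""
    stems = (strip_auxiliary(token) for token in text.split())
    return Counter(stem for stem in stems if stem not in STOPWORDS)


# Made-up sample sentence, not taken from the newspaper database.
sample = "정부는 금리를 인상했다 금리의 인상은 시장에서 예상한 것"
keywords = extract_keywords(sample)
print(keywords.most_common(3))
```

In a setup like this, the indexer's vocabulary control described in the second conclusion would correspond to reviewing the counted candidates before they enter the keyword file.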

Nonlinear Vector Alignment Methodology for Mapping Domain-Specific Terminology into General Space (전문어의 범용 공간 매핑을 위한 비선형 벡터 정렬 방법론)

  • Kim, Junwoo;Yoon, Byungho;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • v.28 no.2
    • pp.127-146
    • 2022
  • Recently, as word embedding has shown excellent performance in various deep learning-based natural language processing tasks, research on the advancement and application of word, sentence, and document embedding is being actively conducted. Among these topics, cross-language transfer, which enables semantic exchange between different languages, is growing along with the development of embedding models. Academic interest in vector alignment is growing with the expectation that it can be applied to various embedding-based analyses. In particular, vector alignment is expected to be applied to mapping between specialized domains and generalized domains. In other words, it is expected that it will be possible to map the vocabulary of specialized fields such as R&D, medicine, and law into the space of a pre-trained language model trained on a huge volume of general-purpose documents, or to provide a clue for mapping vocabulary between different specialized fields. However, since the linear vector alignment that has mainly been studied in academia assumes statistical linearity, it tends to simplify the vector space. This essentially assumes that different types of vector spaces are geometrically similar, which inevitably causes distortion in the alignment process. To overcome this limitation, we propose a deep learning-based vector alignment methodology that effectively learns the nonlinearity of data. The proposed methodology consists of sequential learning of a skip-connected autoencoder and a regression model to align the specialized word embedding expressed in each space to the general embedding space. Finally, through the inference of the two trained models, the specialized vocabulary can be aligned in the general space.
To verify the performance of the proposed methodology, an experiment was performed on a total of 77,578 documents in the field of 'health care' among national R&D tasks performed from 2011 to 2020. As a result, it was confirmed that the proposed methodology showed superior performance in terms of cosine similarity compared to the existing linear vector alignment.
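
For readers unfamiliar with the linear baseline that the abstract contrasts with its nonlinear model, the sketch below shows one common form of linear vector alignment, orthogonal Procrustes, evaluated by mean cosine similarity. It is an illustration under assumptions, not the authors' implementation: the abstract does not say which linear method was used as the baseline, the data here is synthetic, and the names `fit_orthogonal_map` and `mean_cosine` are hypothetical.

```python
import numpy as np


def fit_orthogonal_map(source, target):
    """Orthogonal Procrustes: the orthogonal W minimizing ||source @ W - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def mean_cosine(a, b):
    """Mean row-wise cosine similarity between two equally shaped matrices."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_unit * b_unit, axis=1)))


# Synthetic stand-ins for the two embedding spaces (assumption: the
# "specialized" space is an exact rotation of the "general" one).
rng = np.random.default_rng(0)
general = rng.normal(size=(200, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
specialized = general @ rotation.T

w = fit_orthogonal_map(specialized, general)
aligned = specialized @ w
print(mean_cosine(aligned, general))
```

Because the synthetic spaces differ only by a rotation, the linear map recovers the alignment almost exactly. When the relationship between the spaces is not linear, a map of this kind cannot do so, which is the distortion the abstract attributes to linear alignment.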

A Study on the Classification System of KDC for School Libraries - Focused on Vocabulary Analysis of Elementary Materials - (학교도서관을 위한 KDC 분류체계에 관한 연구 - 초등학생관련 문헌의 어휘분석을 중심으로 -)

  • Kim, Jeong-Hyen
    • Journal of Korean Library and Information Science Society
    • v.35 no.4
    • pp.171-191
    • 2004
  • This study presents a revision scheme for the Korean Decimal Classification (KDC) suited to classifying children-related materials, centered mainly on social science (300) and pure science (400), which account for the majority of children-related materials in school libraries. Toward this goal, I have studied the development and use of classification systems for children-related materials in domestic and overseas school libraries and children's libraries, and researched how well 4th, 5th, and 6th grade elementary school students understand the classification item terms and children-related material terms used in KDC's social science and pure science classes. Based on the results of the analysis, I have presented a revision scheme for the Korean Decimal Classification item terms and class numbers for children-related materials.

Extraction method of spatial relation by analyzing location tag in folksonomy (폭소노미에서 위치태그 분석을 통한 공간관계 추출 기법)

  • Choi, Yun-Hee;Yong, Hwan-Seung
    • Journal of Korea Multimedia Society
    • v.12 no.8
    • pp.1043-1054
    • 2009
  • As the semantic web attracts growing attention, research on ontology, its core technology, has been carried out in various fields. Ontology has been adopted as an alternative for resolving the many problems that result from the insufficient vocabulary selection rules of folksonomy, which is widely accepted under Web 2.0. The importance of research that combines the two approaches, folksonomy and ontology, so that they complement each other has therefore increased. Based on this idea, this research proposes a system that uses open services to pull location information tags out of folksonomy-based metadata, analyzes the location information to extract spatial relationships among the tags, and in turn automatically constructs a self-correcting location information domain ontology. The system devised in this study will associate data derived from easily accessible folksonomy with meaningful and technological information from ontology.

A Comparative Study of the Chinese Characters education in Korea and China (한·중 한자교육 비교)

  • Yu, Hyuna
    • Cross-Cultural Studies
    • v.27
    • pp.415-434
    • 2012
  • The Hanja used in Korea are traditional Chinese characters, but what Chinese people use now are simplified characters. So, there are differences in pronunciation and meaning between the characters used by Koreans and Chinese. More than 70% of the Korean vocabulary is derived from or was influenced by Hanja. For the inheritance and development of traditional culture, and for communication among the countries of the Chinese character cultural sphere in Northeast Asia, we should build up an authentic Chinese character education system. But the government hasn't paid much attention to this work, and the government's policy can't deliver efficient education. Consequently, these days there are more and more Korean people who are functionally illiterate in Chinese characters. Recently, proficiency tests of Chinese characters have been expected to promote the development of Chinese character education. But most Koreans' motives for studying Chinese characters are usually to pass the college entrance exam or to compete for jobs, and after passing the test, the motive for studying gradually fades away. This is the basic problem faced by Korean Chinese character education. Since the 1950s, various character education methods have been studied in China, and the research results have been applied in textbooks and other materials. As a result, a well-organized and efficient step-by-step education system was built up. At present, China's literacy education in the textbooks utilizes a range of methods including revisional centralized and distributed. Unfortunately, there is still one shortcoming worthy of concern: how to solve the problems due to the simplification of traditional Chinese characters? Is it possible to revive traditional Chinese characters? Before adopting the results of research on China's literacy education and applying them to our character education, we should consider our specific situation carefully.
Adopting the research results with cautious review and objective criticism should have a positive impact on Korean Chinese character education.

A Study on Thesaurus Development Based on Women's Oral History Records in Modern Korea (한국 근대 여성 구술 기록물을 통한 시소러스 개발에 관한 연구)

  • Choi, Yoon Kyung;Chung, Yeon Kyoung
    • Journal of Korean Society of Archives and Records Management
    • 2014
  • The purpose of this study is to develop a thesaurus for women's oral history in modern Korea. A literature review and case studies of four thesauri were performed, and a thesaurus was then built based upon the index terms in oral history records. The process of developing the thesaurus consisted of five steps. First, 1,784 index terms from the oral history records of 53 modern Korean women were extracted and analyzed. Second, possible terms for the thesaurus were selected through regular meetings with experts in the fields of information organization and women's oral history. Third, relationships between terms were defined by focusing on equivalence, hierarchy, and association. Fourth, after developing a Web-based thesaurus management system, terms and relationships were input to the system. Fifth, terms and relationships were again reviewed by experts from the relevant fields. As a result, the thesaurus comprises 1,076 terms, which were classified into 39 broad subject areas, including proper nouns such as geographic names, places, personal names, corporate names, and others. It will be expanded with more oral history records from other people of the same period.
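
The three relationship types named in the third step (equivalence, hierarchy, association) can be pictured with a small in-memory structure. This is only a sketch under assumptions: the abstract does not describe the data model of the Web-based management system, and the `Thesaurus` class, its method names, and the example terms below are hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal store for equivalence, hierarchical, and associative relations."""

    def __init__(self):
        self.preferred = {}                # non-preferred term -> preferred term
        self.use_for = defaultdict(set)    # preferred term -> non-preferred terms
        self.broader = defaultdict(set)    # term -> broader terms
        self.narrower = defaultdict(set)   # term -> narrower terms
        self.related = defaultdict(set)    # term -> associated terms

    def add_equivalence(self, preferred, variant):
        self.preferred[variant] = preferred
        self.use_for[preferred].add(variant)

    def add_hierarchy(self, broader, narrower):
        self.narrower[broader].add(narrower)
        self.broader[narrower].add(broader)

    def add_association(self, term_a, term_b):
        # Associative relations are symmetric, so record both directions.
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)

    def resolve(self, term):
        """Map a non-preferred term to its preferred form; others pass through."""
        return self.preferred.get(term, term)


# Made-up example terms, not taken from the study's 1,076 terms.
thesaurus = Thesaurus()
thesaurus.add_equivalence("education", "schooling")
thesaurus.add_hierarchy("education", "girls' school")
thesaurus.add_association("education", "literacy")
print(thesaurus.resolve("schooling"))
```

Storing each relation in both directions keeps lookups simple for indexers, at the cost of having to update two entries whenever a relation is changed.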

Why A Multimedia Approach to English Education?

  • Keem, Sung-uk
    • Proceedings of the KSPS conference
    • 1997.07a
    • pp.176-178
    • 1997
  • To make a long story short, I made up my mind two years ago to experiment with a multimedia approach to my classroom presentations because my ways of giving instruction bored the pants off me as well as my students. My favorite ways used to be what are sometimes referred to as classical or traditional ones, heavily dependent on three elements: the teacher's mouth, books, and chalk. Some call it the 'MBC method'. To top it off, I tried audio-visuals such as tape recorders, cassette players, VTR, pictures, and you name it, that could help improve my teaching method. And yet I was unhappy with the results of this trial-and-error approach. I was determined to look for a better way that would ensure my satisfaction in the first place. What really turned me on was a multimedia CD ROM title, ELLIS (English Language Learning Instructional Systems), developed by Dr. Frank Otto. This is an integrated system of learning English based on advanced computer technology. Inspired by the utility and potential of such a multimedia system for regular classroom or lab instruction, I designed a simple but practical multimedia language learning laboratory in 1994 for the first time in Korea (perhaps for the first time in the world). It was high time that the conventional (audio-passive) type of language laboratory at Hahnnam be replaced because of wear and tear. Prior to this development, in 1991, I set up a first CALL (Computer Assisted Language Learning) laboratory equipped with 35 personal computers (286), where students were encouraged to practise English typing and word processing and to study English grammar, English vocabulary, and English composition.
The first multimedia language learning laboratory was composed of 1) a multimedia personal computer (486DX2 then, now 586), 2) VGA multipliers that enable simultaneous viewing of the screen under the instructor's control, 3) an amplifier, 4) loud speakers, 5) student monitors, 6) student tables to seat three students (a monitor for two students is more realistic, though), 7) student chairs, 8) an instructor table, and 9) cables. It was augmented later with an Internet hookup. The beauty of this type of multimedia language learning laboratory is the economy of furnishing and maintaining it. There is no need to darken the facilities, which is a must when an LCD/beam projector is preferred in the laboratory. It is free of headsets, which proved to exasperate students when worn for more than twenty minutes. In the previous semester I taught three different subjects: Freshman English Lab, English Phonetics, and Listening Comprehension Intermediate. I used CD ROM titles like ELLIS, Master Pronunciation, English Triple Play Plus, English Arcade, Living Books, Q-Steps, English Discoveries, and Compton's Encyclopedia. In addition, I managed to put all teaching materials into PowerPoint, where text, photo, graphic, animation, audio, and video files are stored in order as slides. It takes time for me to prepare my teaching materials in PowerPoint, but it is a wonderful tool for presentations. And it is worth trying as long as I can entertain my students in such a way. Once everything is put into the computer, I feel relaxed and a bit excited watching my students enjoy my presentations. It appears to be great fun for students because they have never experienced this type of instruction. This is how I freed myself from having to manipulate a cassette tape player and VTR and write on the board. The student monitors in front of them seem to help them concentrate on what they see, combined with what they hear.
All I have to do is simply click a mouse to give presentations and explanations when necessary. I use a remote mouse, which frees me from sitting at the instructor table. Instead, I can walk around the room and enjoy freer interactions with students. Using this instrument, I can also have my students participate in the presentation. In particular, I invite my students to manipulate the computer using the remote mouse from the student's seat, not from the instructor's seat. Every student appears to be fascinated with my multimedia approach to English teaching because of its unique nature as a new teaching tool as we face the 21st century. They all agree that the multimedia way is an interesting and fascinating way of learning that satisfies their needs. Above all, it helps lighten their drudgery in the classroom. They feel other subjects taught by other teachers should be treated in the same fashion. A multimedia approach to education would be impossible without the advent of hi-tech computers, whose many functions are integrated into a unified system, i.e., a personal computer. If you have computer-phobia, make friends with the computer quickly; the sooner, the better. It can be a wonderful assistant to you. It is the Internet that I pay close attention to in conjunction with the multimedia approach to English education. Via e-mail, I encourage my students to write to me in English. I encourage them to enjoy chatting with people all over the world. I also encourage them to visit sites that offer study courses in English conversation, vocabulary, idiomatic expressions, reading, and writing. I help them search for any subject they want via the World Wide Web. Some day in the near future it will be the hub of learning for everybody. It will eventually free students from books, teachers, libraries, classrooms, and boredom. I will keep exploring better ways to give satisfying instruction to my students, who deserve my entertainment.

Who are Identified through the Teacher Observation-recommendation System in the Aspects of Intelligence, Career Pattern, and Self-regulated Learning Ability? (관찰-추천제는 어떤 특성의 영재를 선발하는가?: 선발시험 vs. 교사관찰추천으로 본 영재들의 지능, 진로유형, 자기조절 학습능력)

  • Han, Ki-Soon;Yang, Tae-Youn;Park, In-Ho
    • Journal of Gifted/Talented Education
    • v.24 no.3
    • pp.445-462
    • 2014
  • The purpose of the present study is to compare the paper-and-pencil test used so far to identify gifted students with the recently introduced teacher observation-recommendation system. More specifically, this study compared the intelligence, career patterns, and self-regulated learning abilities of gifted students who were identified through these two different identification systems, in order to explore the potential of the newly introduced teacher observation-recommendation system. The results show that there was no significant difference in overall IQ score. However, students who were identified through the observation-recommendation system showed significantly higher scores on some subscores of the intelligence test, such as vocabulary application, comprehension, and schematization. In terms of career patterns, about 72% of the gifted students who were identified through the previous paper-and-pencil test belonged to the 'investigative' category of Holland. But more diverse career patterns, including enterprising, social, realistic, and conventional as well as investigative categories, were found among the students who were identified by the observation-recommendation system. There were also significant differences in self-regulated learning abilities between the two groups of students. Practical implications of the study were discussed in depth.

A Taxonomy of Workflow Architectures

  • Kim, Kwang-Hoon;Paik, Su-Ki
    • Proceedings of the Korea Database Society Conference
    • 1998.09a
    • pp.525-543
    • 1998
  • This paper proposes a conceptual taxonomy of architectures for workflow management systems. The systematic classification work is based on a framework for workflow architectures. The framework, consisting of generic-level, conceptual-level and implementation-level architectures, provides common architectural principles for designing a workflow management system. We define the taxonomy by considering the possibilities for centralization or distribution of data, control, and execution. That is, we take into account three criteria. How are the major components of a workflow model and system, like activities, roles, actors, and workcases, concretized in a workflow architecture? Which of the components are represented as software modules of the workflow architecture? And how are they configured and operated in the architecture? The workflow components might be embodied as active (processes or threads) modules or as passive (data) modules in the software architecture of a workflow management system. One or combinations of the components might become software modules in the software architecture. Finally, they might be centralized or distributed. The distribution of the components should be broken into three: vertically, horizontally, and fully distributed. Through the combination of these aspects, we can conceptually generate about 64 software architectures for a workflow management system. That is, it should be possible to comprehend and characterize all kinds of software architectures for workflow management systems, including currently existing systems as well as future systems. We believe that this taxonomy is a significant contribution because it adds clarity, completeness, and "global perspective" to workflow architectural discussions. The vocabulary suggested here includes workflow levels and aspects, allowing very different architectures to be discussed, compared, and contrasted.
Added clarity is obtained because similar architectures from different vendors that used different terminology and techniques can now be seen to be identical at the higher level. Much of the complexity can be removed by thinking of workflow systems. Therefore, it is used to categorize existing workflow architectures and suggest a plethora of new workflow architectures. Finally, the taxonomy can be used for sorting out gems and stones amongst the architectures possibly generated. Thus, it might be a guideline not only for characterizing the existing workflow management systems, but also for solving the long-term and short-term architectural research issues, such as dynamic changes in workflow, transactional workflow, dynamically evolving workflow, large-scale workflow, etc., that have been proposed in the literature.
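
One way to picture how combining the aspects yields a space of candidate architectures is to enumerate every assignment of a placement to each aspect. The sketch below is an assumption, not the paper's actual derivation: it reads data, control, and execution as three independent aspects, each either centralized or distributed in one of the three named ways, which happens to give 4 × 4 × 4 = 64 combinations. The abstract does not state how its figure of about 64 is counted, and the names `ASPECTS` and `PLACEMENTS` are hypothetical.

```python
from itertools import product

# Assumed reading of the abstract: three aspects, four placements each.
ASPECTS = ("data", "control", "execution")
PLACEMENTS = (
    "centralized",
    "vertically distributed",
    "horizontally distributed",
    "fully distributed",
)

# Each architecture assigns exactly one placement to every aspect.
architectures = [
    dict(zip(ASPECTS, combination))
    for combination in product(PLACEMENTS, repeat=len(ASPECTS))
]

print(len(architectures))
print(architectures[0])
```

Enumerating the space this way also makes it straightforward to filter it, for example to list only the architectures in which control is centralized.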

LRM's Characteristics and Application Plan Through Comparison with FRBR (FRBR과 비교를 통한 LRM의 특징 및 적용방안)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society
    • v.53 no.2
    • pp.355-375
    • 2022
  • This study aims to identify LRM's features and to propose a plan for reflecting LRM in cataloging-related standards and individual systems by comparing and analyzing LRM against the FR models in terms of entities, attributes, and relationships. The application plan is suggested as follows. First, entities can be extended by defining sub-entities of each entity in the standards and the individual systems in order to reflect LRM, even though entities such as families, groups, identifiers, authorized access points, concepts, objects, events, agency and rules have been deleted in LRM. Second, attributes should be subdivided in the standards and the individual systems in order to apply LRM, since many attributes have been changed to relationships for linked data and the number of attributes has decreased in LRM. In particular, more specific and detailed property names should be clearly presented in the standards and the individual systems, and the vocabulary encoding scheme corresponding to each property should also be developed, since properties with similar functions, properties repeated across various entities, and material-specific properties have been generalized and integrated into comprehensive property names. Third, relationships should be extended by newly declaring refinements or subtypes of each relationship and by considering multi-level relationships, since the relationships themselves are general and abstract even though their number has increased relative to the properties. This study can be put to practical use in cataloging-related standards and individual systems that apply LRM.