• Title/Abstract/Keywords: Text Concept


Text Classification Using Heterogeneous Knowledge Distillation

  • Yu, Yerin;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.29-41 / 2022
  • Recently, with advances in deep learning, a variety of very large, high-performing models have been developed by pre-training on massive amounts of text data. However, for such models to be deployed in real-world services, inference must be fast and computation must be low, so model compression techniques are attracting attention. Knowledge distillation, a representative model compression technique, transfers knowledge already learned by a teacher model to a relatively small student model and can be used in a variety of ways. However, knowledge distillation has a limitation: because the teacher model learns only the knowledge needed to solve a given problem and distills it to the student model from the same point of view, it is difficult to solve problems with low similarity to the previously learned data. Therefore, we propose a heterogeneous knowledge distillation method in which the teacher model learns a higher-level concept rather than the knowledge required for the task that the student model must solve, and then distills this knowledge to the student model. In classification experiments on about 18,000 documents, we confirmed that heterogeneous knowledge distillation outperformed traditional knowledge distillation in both learning efficiency and accuracy.
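For readers unfamiliar with the teacher-to-student transfer described in this abstract, the following is a minimal sketch of the conventional soft-target distillation loss (temperature-scaled softmax plus KL divergence). It is not the heterogeneous method proposed in the paper, whose details the abstract does not give. The function names, the temperature value, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a three-class document classifier.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different class

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose softened outputs match the teacher's gets a loss near zero, and the loss grows as the two distributions diverge. In practice this term is usually combined with an ordinary cross-entropy loss on the true labels.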

Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning (프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론)

  • Jinmyung Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.165-184 / 2023
  • Recently, deep learning analysis of unstructured text data using language models such as Google's BERT and OpenAI's GPT has shown remarkable results in various applications. Most language models learn generalized linguistic information from pre-training data and then update their weights for downstream tasks through fine-tuning. However, concerns have been raised that privacy may be violated when using these language models: data privacy may be violated when the data owner provides large amounts of data to the model owner for fine-tuning. Conversely, when the model owner discloses the entire model to the data owner, the structure and weights of the model are exposed, which may violate the privacy of the model. The concept of offsite tuning was recently proposed to fine-tune language models while protecting privacy in such situations. However, that study does not provide a concrete way to apply the methodology to text classification models. In this study, we propose a concrete method for applying offsite tuning with an additional classifier to protect the privacy of both the model and the data when performing multi-class classification fine-tuning on Korean documents. To evaluate the proposed methodology, we conducted experiments on about 200,000 Korean documents from five major fields (ICT, electrical, electronic, mechanical, and medical) provided by AIHub, and found that the proposed plug-in model outperforms the zero-shot model and the offsite model in classification accuracy.

Comparison of the Features of Science Language between Texts of Earth Science Articles and Earth Science Textbooks (지구과학 논문과 지구과학 교과서 텍스트의 과학 언어적 특성 비교)

  • Lee, Jeong-A;Kim, Chan-Jong;Maeng, Seung-Ho
    • Journal of The Korean Association For Science Education / v.27 no.5 / pp.367-378 / 2007
  • The purpose of this study is to investigate the features of science language in Earth science textbooks and Earth science research articles. We examined two Earth science textbooks and two Earth science articles using the taxonomy of scientific words, the text structure analysis of explanations, the analysis of conjunctive relations and reasoning, and the function of conjunction. The results showed that the school science language in Earth science textbooks had a high proportion of naming words, and that definition/exemplification and description structures dominated its text structures. Also, internal relations showing additional arrangement rather than logical inference were predominant in Earth science textbooks. In contrast, the scientists' science language in the Earth science articles had a higher proportion of process words and concept words than the textbooks and followed the schematic structure of explanation texts, such as orientation - implication sequence - conclusion. In addition, the text structure of each sentence in the implication sequence showed cause/effect or problem-solving after description structures. Each sentence also expressed causal or abductive reasoning through internal relations using verbs or adverbial inflection. We need to bridge the gap between the two languages so that students can use science language authentically. To that end, we propose an "interlanguage" that mediates between school science language and scientists' language.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, advances in IT and the increased penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis are continuously increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has spurred computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, with particular attention on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis are being used in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire document set must be analyzed at once to identify the topic of each document. This makes the analysis take a long time when topic modeling is applied to a large number of documents. It also causes a scalability problem: processing time increases exponentially as the number of analysis objects increases. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeatedly applying topic modeling to each unit. This method allows topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without being combined. However, despite its many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified in each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology must be established; that is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied as thoroughly as other areas of topic modeling. In this paper, we propose a topic modeling approach that solves these two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two methods.

Qualitative Research Investigating Patterns of Health Care Behavior among Korean Patients with Chronic Hepatitis B (B형 간염 환자의 건강관리 양상 탐색을 위한 질적 연구)

  • Yang, Jin-Hyang;Cho, Myung-Ok;Lee, Hae-Ok
    • Journal of Korean Academy of Nursing / v.39 no.6 / pp.805-817 / 2009
  • Purpose: This ethnography was conducted to explore patterns of health care behavior in patients with chronic health problems. Methods: The participants were 15 patients with chronic hepatitis B and 2 family members. Among the patients, 4 had progressed to liver cirrhosis and liver cancer. Data were collected from iterative fieldwork in a department of internal medicine of I hospital. Data were analyzed using text analysis and taxonomic methods. Results: Illness and disease, the relationship between health care givers and clients, and communication patterns between health professionals and clients were discussed as the context of health care behavior. The participants' health care behavior was categorized by its focus: everyday-work centered, body centered, organ centered, and pathology centered. Conclusion: Participants' health care behavior was guided by folk health concepts and constructed in the sociocultural context. Folk etiology, pathology, and interpretation of one's symptoms were influencing factors in illness behavior. These findings should serve as a cornerstone of culture-specific care for chronic diseases.

Mobile Message Platform Supporting Dynamic Services based on Templates (템플릿에 기반한 동적인 서비스를 지원하는 모바일 메시지 플랫폼)

  • Han, Hong-Taek;Kim, Nam-Yun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.2 / pp.19-27 / 2012
  • Although message services in mobile environments provide real-time transfer of various content such as text, multimedia, and location, they cannot provide dynamic message services based on analysis of message content. This paper proposes the "message as a service" concept and presents a design methodology for a message platform. A message in this paper is composed of a data part and a template part, which is in charge of the view and functional logic of the content. The two parts of a message are transferred separately. If a terminal device stores message templates, the message platform can transfer only the data part, reducing network traffic. Besides efficient network utilization, the message view and its functional logic can be updated dynamically by modifying templates.
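The split between a data part and a template part can be sketched as follows. This is only an illustration of the idea stated in the abstract, not the platform's actual design: the packet layout, the `Terminal` class, the `build_packet` helper, and the `weather-v1` template are all hypothetical, and the sketch covers only the view side of a template, not its functional logic.

```python
from string import Template

# Hypothetical server-side template store: template id -> view text.
TEMPLATES = {"weather-v1": "Weather in $city: $temp C"}


def build_packet(template_id, data, terminal_has_template):
    """Always send the data part; send the template part only if it is not cached."""
    packet = {"template_id": template_id, "data": data}
    if not terminal_has_template:
        packet["template"] = TEMPLATES[template_id]
    return packet


class Terminal:
    """Hypothetical terminal device that caches templates it has received."""

    def __init__(self):
        self.cache = {}

    def receive(self, packet):
        # Store (or update) the template when the packet carries one.
        if "template" in packet:
            self.cache[packet["template_id"]] = packet["template"]
        view = Template(self.cache[packet["template_id"]])
        return view.substitute(packet["data"])


terminal = Terminal()
first = build_packet("weather-v1", {"city": "Seoul", "temp": 21}, terminal_has_template=False)
second = build_packet("weather-v1", {"city": "Busan", "temp": 24}, terminal_has_template=True)

print(terminal.receive(first))   # rendered with the template sent alongside the data
print(terminal.receive(second))  # rendered from the cached template; data only on the wire
```

Because the second packet omits the template, it is smaller than the first. Sending a packet with a revised template for the same id would change how later data-only messages are rendered, which corresponds to the dynamic update mentioned in the abstract.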

Hangul-Oullim-Meotjit (한글-어울림-멋짓)

  • Ahn, Sang-Soo
    • Archives of design research / v.20 no.3 s.71 / pp.335-344 / 2007
  • Hunminjeongeum is the book of Hangul. Its contents are all about the philosophy and concept of Hangul design. It is a design text of world value. It is a design theory book, a typographic theory, and a design philosophy book. The word for 'design' is Meotjit in Korean. Design is 'doing or making with Meot' in the material, the non-material, and even in thinking. Visual communication design is 'Bom-Meotjit'; fashion design is 'Ot-Meotjit'. The substance of Meot is Oullim, the great harmony. The state of Meot is the identity of the Korean design spirit.


The Semiotic Meaning of Myth of Family and Gender Through the Corporate Advertisement: Focusing on the SK Advertisement (기업광고를 통해 본 가족신화와 젠더의 기호학적 의미: SK기업광고를 중심으로)

  • Cho, Hee-Sun;Baek, Seon-Gi;Yang, Da-Jin
    • Journal of the Korean Home Economics Association / v.48 no.9 / pp.27-40 / 2010
  • This study attempts to identify how the myth of family and gender images are reproduced and received by audiences, through a semiotic analysis of three series (children, husband, and housewife) of SK's corporate TV advertisements from the second half of 2009 to the first half of 2010. The analysis shows that each advertisement binds and stereotypes the concepts of family and gender to the myth of family, especially in the case of women; consequently, the text analysis indicates that corporate advertisements reproduce and restructure the traditional myth of family and gender roles. Going forward, family studies need to recognize the importance of the effects of mass media, especially TV advertising, and to pursue diverse case studies on the subject.

A study on the Development of Ubiquitous Performance (유비쿼터스 퍼포먼스의 발전과정에 대한 고찰)

  • Cho, Hyun-Il
    • Journal of Digital Contents Society / v.11 no.2 / pp.123-127 / 2010
  • Throughout its long history, innovation in technology has changed the form and meaning of performance work and its appreciation. Applied to performance art, the concepts of digital media and ubiquity have also created new implications in aesthetics and technology. In fact, the new elements of ubiquitous performance (a new relationship between creators and the audience, and the extension of the notion of 'audience') are not newly introduced. These elements have been tried throughout hundreds of years of performance history and have developed much further thanks to digital communication. In this study, we discuss how these elements developed from earlier text-based ubiquitous performance to the 3D virtual world and contributed to the construction of virtual culture.

A Study on the Design of Cyber lecture Component (가상강의 Component 설계에 관한 연구)

  • 강정배;김선경
    • Journal of Korea Society of Industrial Information Systems / v.8 no.3 / pp.42-50 / 2003
  • E-Learning is a major modern teaching method that originated from the concept of remote education. This research aims to propose a cyber lecture library system and to design a cyber lecture component that serves as a basis for an e-Learning system. The cyber lecture library is a storage system for cyber lectures that can supply high-quality data to developers who need it. The component consists of 5 categories: text, voice, image, animation, and flash. By using this system, developers can save the time and effort needed for developing educational content. This system also helps students, who can access various lecture data on a given subject and select the best fit for them.
