• Title/Summary/Keyword: 텍스트 연구 (text research)


Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions: Part D / v.18D no.5 / pp.329-338 / 2011
  • Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively in a limited range of domains, especially the biomedical domain. Because previous approaches relied on domain-specific resources, they proved difficult to apply to general domains. We therefore propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over the related approach C-value, which is widely used and is based on local domain frequencies. In a second experiment with various combinations of unithood features, the combination that included NGD (Normalized Google Distance) performed best, with an F-score of 81.8. We applied three machine learning methods (logistic regression, C4.5, and SVMs) and obtained the best score with the C4.5 decision tree.
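
The two scores named in this abstract have commonly cited definitions, sketched below under the assumption that search-engine hit counts (for NGD) and corpus frequencies of candidate terms (for C-value) are already available. This is not the authors' implementation; the function names and inputs are illustrative.

```python
import math


def ngd(f_x: float, f_y: float, f_xy: float, n: float) -> float:
    """Normalized Google Distance computed from hit counts.

    f_x, f_y: number of pages containing x (resp. y)
    f_xy:     number of pages containing both x and y
    n:        (estimated) total number of indexed pages
    All counts must be positive; smaller values mean more related terms.
    """
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def c_value(term: tuple[str, ...], freq: dict[tuple[str, ...], int]) -> float:
    """C-value of a multi-word candidate term, given candidate frequencies.

    Single-word terms get log2(1) = 0, since C-value targets multi-word terms.
    """
    length_weight = math.log2(len(term))
    # Longer candidates that contain `term` as a contiguous word sequence.
    containers = [
        other for other in freq
        if len(other) > len(term) and any(
            other[i:i + len(term)] == term
            for i in range(len(other) - len(term) + 1)
        )
    ]
    if not containers:
        return length_weight * freq[term]
    # Discount occurrences that are explained by the longer (nesting) terms.
    nested_penalty = sum(freq[b] for b in containers) / len(containers)
    return length_weight * (freq[term] - nested_penalty)
```

For example, with 100 hits for each of two terms, 10 hits for the pair, and an index of 10,000 pages, `ngd` returns 0.5; the C-value penalty lowers the score of "contact lens" when it mostly occurs inside a longer candidate such as "soft contact lens".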

Thick Description as a Methodology of Comparative Literature (비교문학연구방법론에 대한 소고: 길고 약하고 두껍게 비교하기)

  • Park, Seonjoo
    • Cross-Cultural Studies / v.50 / pp.347-370 / 2018
  • This paper proposes a new direction for Comparative Literature, a discipline that has been deeply Eurocentric and even colonial ever since its birth. 'Comparison' in Comparative Literature has in fact been the ideological mechanism for containing, classifying, and eventually controlling all differences in the world. Literature, carrying the hidden adjective "comparative", has naturally served as a national institution of the West at the epistemological and discursive levels. To re-conceptualize the discipline and practice of "Comparative Literature", we need to revolutionize methodology itself, drawing on Wai Chee Dimock's idea of "Weak Theory", Foucault's "disappearance of the author", and Clifford Geertz's "thick description". As a methodology of comparative literature, "thick description" re-establishes the discipline as a field of "weak theory", defusing the centrality of linguistic identity and remaking it as a "long network" of loose and missed connections. "Thick description" raises the question of the publicness of the nation-state within a "confusion of tongues", problematizes the legitimacy of modern knowledge, and calls (Western) nationalism into question. With this idea as a starting point, we can re-imagine Comparative Literature as a field of ceaseless discourse: longer, weaker, and thicker networks of interpretation and re-interpretation of differences.

Research Trends Review of Undergraduates' on Entrepreneurship Education Program to Develop the Entrepreneurship Program for Nursing College Students (간호대학생 창업교육프로그램 개발을 위한 대학생 대상 창업교육프로그램 연구 동향 고찰)

  • Noh, Wonjung;Kang, Jiwon;Lee, Youngjin
    • Journal of Convergence for Information Technology / v.9 no.2 / pp.148-154 / 2019
  • This study was performed to prepare basic data for developing entrepreneurship education programs for nursing students, through a literature review and text network analysis of studies on entrepreneurship education for college students. Studies were retrieved from the databases of the Korea Education and Research Information Service, the Korean Academic Information Service System, DBpia, and the National Assembly Library using keywords such as 'entrepreneur', 'student', 'education', 'program', and 'training'. The final selection comprised 35 Korean studies published between 2000 and September 2016. Most of the studies (85.71%) were conducted since 2011, and surveys were the most common design (88.57%). The major independent variables were entrepreneurial self-efficacy and entrepreneurship, and the major dependent variables were entrepreneurial intention and entrepreneurial self-efficacy. Based on these results, entrepreneurship education programs suited to the target population can be developed, which can promote entrepreneurship education for nursing students.
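
The abstract mentions a text network of the reviewed studies without specifying how it was built. A minimal sketch, assuming a keyword co-occurrence network in which two keywords are linked whenever they appear in the same study, might look like this; the keyword lists are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_network(docs: list[list[str]]) -> Counter:
    """Count how many documents each unordered keyword pair co-occurs in."""
    edges: Counter = Counter()
    for keywords in docs:
        # Deduplicate and sort so each pair is counted once per document
        # and ("a", "b") and ("b", "a") map to the same edge.
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return edges


def degree_centrality(edges: Counter) -> dict[str, float]:
    """Fraction of the other keywords each keyword is directly linked to."""
    neighbors: dict[str, set[str]] = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}
```

Edge weights (co-occurrence counts) and degree centrality are only two of many possible network measures; which ones the study reports is not stated in the abstract.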

Knowledge Trend Analysis of Uncertainty in Biomedical Scientific Literature (생의학 학술 문헌의 불확실성 기반 지식 동향 분석에 관한 연구)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for Information Management / v.36 no.2 / pp.175-199 / 2019
  • Uncertainty denotes an incomplete stage of knowledge about a proposition, resulting from a lack of consensus between information and existing knowledge. As the amount of academic literature grows exponentially, new knowledge is discovered as research develops. Although the flow of time may be an important factor in identifying patterns of uncertainty in scientific knowledge, existing studies have characterized uncertainty only by its frequency within a particular discipline and have not taken the flow of time into consideration. In this study, we therefore identify and analyze words that indicate uncertainty in the scientific literature and investigate the flow of knowledge. We examine patterns in biomedical knowledge over time, such as representative entity pairs, predicate types, and entities, and test their significance using linear regression analysis. Seven of the 17 entity pairs show a statistically significant decreasing pattern, and all 10 representative predicates decrease significantly over time. We also analyze the relative importance of representative entities by year and identify entities that display significant rising and falling patterns.
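
The significance test described above can be sketched as an ordinary least-squares regression of a yearly count on the year, testing whether the slope differs from zero. The counts below are hypothetical, and the exact model the authors fitted is not given in the abstract.

```python
import math


def trend_test(years: list[float], values: list[float]) -> tuple[float, float]:
    """Fit values = a + b * year by ordinary least squares.

    Returns (slope b, t statistic for H0: b == 0), to be compared against
    a t distribution with len(years) - 2 degrees of freedom.
    """
    n = len(years)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(residual_ss / (n - 2) / sxx)
    if std_err == 0:
        # Perfect fit: no residual variance to scale the slope by.
        return slope, 0.0 if slope == 0 else math.copysign(math.inf, slope)
    return slope, slope / std_err
```

For hypothetical yearly counts of 10, 9, 7, 6, 4 over 2010-2014, this gives a slope of -1.5 and t = -15; a two-sided p-value could then be obtained from a statistics library, for example `2 * scipy.stats.t.sf(abs(t), n - 2)`.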

Analysis of News Big Data for Deriving Social Issues in Korea (한국의 사회적 이슈 도출을 위한 뉴스 빅데이터 분석 연구)

  • Lee, Hong Joo
    • The Journal of Society for e-Business Studies / v.24 no.3 / pp.163-182 / 2019
  • In a modern society that grows more complicated over time, analyzing the frequency and correlation of news keywords is important research for discussing responses and solutions to social issues. This paper analyzed the relationship between the flow of social keywords and major issues through an analysis of 10 years of news big data (2009-2018). Political issues, education and social culture, gender conflicts, and social problems were identified as major issues. To study how issues changed and flowed over time, the analysis divided the period into five-year intervals, and through this, changes in social issues and countermeasures to them were studied. As a result, keywords closely related to people's lives (economy, police) were found to be very important in our society regardless of the period. In addition, the rate of increase of keywords such as 'safety' has declined relative to their frequency in recent years, which suggests that awareness of safety in our society needs to be improved.
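
The period comparison described above can be sketched as counting keyword frequencies within each five-year window and computing the relative change between windows. The article data and the exact definition of "increase rate" are assumptions for illustration.

```python
from collections import Counter


def period_counts(articles: list[tuple[int, list[str]]], start: int, end: int) -> Counter:
    """Total keyword frequency over articles published in years [start, end]."""
    counts: Counter = Counter()
    for year, keywords in articles:
        if start <= year <= end:
            counts.update(keywords)
    return counts


def increase_rate(before: Counter, after: Counter, keyword: str) -> float:
    """Relative change in a keyword's frequency from one period to the next."""
    if before[keyword] == 0:
        raise ValueError(f"{keyword!r} does not occur in the earlier period")
    return (after[keyword] - before[keyword]) / before[keyword]
```

Usage with hypothetical data: splitting 2009-2018 into `period_counts(articles, 2009, 2013)` and `period_counts(articles, 2014, 2018)` lets a keyword's raw frequency be compared with its rate of increase across the two windows.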

Survey on Out-Of-Domain Detection for Dialog Systems (대화시스템 미지원 도메인 검출에 관한 조사)

  • Jeong, Young-Seob;Kim, Young-Min
    • Journal of Convergence for Information Technology / v.9 no.9 / pp.1-12 / 2019
  • Dialog systems are becoming a new way of communication between humans and computers. A dialog system takes human voice as input and gives an appropriate response in voice or performs an action. Although there are several well-known dialog system products (e.g., Amazon Echo, Naver Wave), they commonly suffer from the problem of out-of-domain utterances. If a system detects out-of-domain utterances poorly, user satisfaction will be significantly harmed. Some studies have aimed to solve this problem, but it still requires intensive study. In this paper, we give an overview of previous studies of out-of-domain detection from three points of view: dataset, feature, and method. Because relatively few studies have addressed this topic owing to the lack of datasets, we believe the most important next research step is to construct and share a large dataset for dialog systems and then apply state-of-the-art techniques to it.
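
The survey does not name a specific detector here. As one widely used baseline (not necessarily one covered by the survey), an utterance can be flagged as out-of-domain when the top class probability of an in-domain intent classifier falls below a threshold. The logits and the threshold value in this sketch are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_out_of_domain(logits: list[float], threshold: float = 0.7) -> bool:
    """Flag an utterance as out-of-domain when the in-domain classifier's
    highest class probability is below `threshold` (hypothetical value)."""
    return max(softmax(logits)) < threshold
```

A confident prediction such as logits `[5.0, 0.0, 0.0]` is kept in-domain, while a flat distribution such as `[1.0, 1.0, 1.0]` (top probability 1/3) is flagged; in practice the threshold would be tuned on held-out data, which is exactly where the dataset shortage noted above becomes a bottleneck.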

A Study on the Accuracy Improvement of Movie Recommender System Using Word2Vec and Ensemble Convolutional Neural Networks (Word2Vec과 앙상블 합성곱 신경망을 활용한 영화추천 시스템의 정확도 개선에 관한 연구)

  • Kang, Boo-Sik
    • Journal of Digital Convergence / v.17 no.1 / pp.123-130 / 2019
  • One of the most commonly used web recommendation techniques is collaborative filtering, and many studies on collaborative filtering have suggested ways to improve its accuracy. This study proposes a movie recommendation method using Word2Vec and ensemble convolutional neural networks. First, user sentences and movie sentences are constructed from the user, movie, and rating information. The user sentences and movie sentences are fed into Word2Vec to obtain user vectors and movie vectors. The user vectors are input to a user convolution model, and the movie vectors to a movie convolution model. The user and movie convolution models are connected to a fully connected neural network, whose output layer produces predictions of users' movie ratings. Experimental results showed that the proposed technique improved accuracy compared with conventional collaborative filtering techniques and with a technique using Word2Vec and deep neural networks proposed in a similar study.
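
The first step above, building "sentences" from rating records, can be sketched as follows. The abstract does not say how the sentences are composed, so the token scheme here (an ID tagged with its rating) is a hypothetical choice for illustration only.

```python
from collections import defaultdict


def build_sentences(
    ratings: list[tuple[str, str, int]],
) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Turn (user, movie, rating) triples into token 'sentences'.

    Hypothetical token scheme: a user's sentence lists the movies they rated,
    each tagged with the rating; a movie's sentence lists its raters likewise.
    """
    user_sentences: dict[str, list[str]] = defaultdict(list)
    movie_sentences: dict[str, list[str]] = defaultdict(list)
    for user, movie, rating in ratings:
        user_sentences[user].append(f"{movie}_r{rating}")
        movie_sentences[movie].append(f"{user}_r{rating}")
    return dict(user_sentences), dict(movie_sentences)
```

The resulting token lists could then be passed to a Word2Vec implementation, such as gensim's `Word2Vec` class, to learn the vectors that feed the two convolution models.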

An Experimental Study on the Automatic Classification of Korean Journal Articles through Feature Selection (자질선정을 통한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.39 no.1 / pp.69-90 / 2022
  • To provide basic data that can systematically support and evaluate R&D activities, and help set current and future research directions by grasping specific trends in domestic academic research, I sought efficient ways to assign standardized subject categories (controlled keywords) to individual journal papers. To this end, I conducted various experiments on the major factors affecting automatic classification performance, focusing on feature selection techniques, with the aim of automatically assigning the categories of the National Research Foundation of Korea's Academic Research Classification Scheme to domestic journal papers. The results showed that, for domestic journal papers, which form an imbalanced real-world dataset, fairly good performance can be expected using simpler classifiers, feature selection techniques, and relatively small training sets.
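
The abstract does not say which feature selection criteria were compared. As one common option, the chi-square statistic scores each term by its association with a class from a 2x2 table of document counts, and the top-k terms are kept. The documents and labels in this sketch are hypothetical.

```python
def chi_square(docs: list[tuple[set[str], str]], term: str, label: str) -> float:
    """Chi-square statistic between presence of `term` and class `label`.

    Scores both positive and negative association with the class.
    """
    a = b = c = d = 0
    for terms, doc_label in docs:
        has_term = term in terms
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document outside class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document outside class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs: list[tuple[set[str], str]], label: str, k: int) -> list[str]:
    """Top-k terms by chi-square score (ties broken alphabetically)."""
    vocab = set().union(*(terms for terms, _ in docs))
    return sorted(vocab, key=lambda t: (-chi_square(docs, t, label), t))[:k]
```

Keeping only the top-scoring terms shrinks the feature space, which is one reason a simple classifier trained on a small set can still perform reasonably on imbalanced data.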

A Study on User Experience through Analysis of the Creative Process of Using Image Generative AI: Focusing on User Agency in Creativity (이미지 생성형 AI의 창작 과정 분석을 통한 사용자 경험 연구: 사용자의 창작 주체감을 중심으로)

  • Daeun Han;Dahye Choi;Changhoon Oh
    • The Journal of the Convergence on Culture Technology / v.9 no.4 / pp.667-679 / 2023
  • The advent of image generative AI has made it possible for people who are not experts in art and design to create finished artworks through text input. With the increasing availability of generated images and their impact on the art industry, research is needed on how users perceive the process of co-creating with AI. In this study, we conducted an experiment to investigate how general users expect and actually experience the process of creating with image generative AI, and to find out which parts of that process affect users' sense of creative agency. The results showed a gap between the expected and experienced creative processes, and users tended to perceive a low sense of creative agency. We recommend eight ways in which AI can act as an enabler that supports users' creative intentions so that they experience a higher sense of creative agency. By considering user-centered creative experiences, this study can contribute to the future development of image generative AI.

High-Quality Multimodal Dataset Construction Methodology for ChatGPT-Based Korean Vision-Language Pre-training (ChatGPT 기반 한국어 Vision-Language Pre-training을 위한 고품질 멀티모달 데이터셋 구축 방법론)

  • Jin Seong;Seung-heon Han;Jong-hun Shin;Soo-jong Lim;Oh-woog Kwon
    • Annual Conference on Human and Language Technology / 2023.10a / pp.603-608 / 2023
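  • This study addresses the need to construct a large-scale vision-language multimodal dataset for training Korean Vision-Language Pre-training models. Korean vision-language multimodal datasets are currently scarce, and high-quality data is difficult to obtain. We therefore use machine translation to translate foreign-language (English) vision-language data into Korean and, on that basis, propose a dataset construction methodology that uses generative AI. Among various caption generation methods, we propose a new method that uses ChatGPT to automatically generate natural, high-quality Korean captions. This ensures better caption quality than existing machine translation methods, and multiple translation results are ensembled to build the multimodal dataset effectively. In addition, we introduce Caption Projection Consistency, an evaluation method based on semantic similarity, compare the English-to-Korean caption projection performance of various translation systems, and present criteria for evaluating it. Ultimately, this study presents a new methodology for building a Korean image-text multimodal dataset with ChatGPT and demonstrates English-to-Korean caption projection performance superior to that of representative machine translators. Our work thus shows a way to automatically build, at scale, the high-quality Korean datasets that are currently lacking, and we expect this method to contribute to improving the performance of deep learning-based Korean Vision-Language Pre-training models.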
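
The abstract describes Caption Projection Consistency only as a semantic-similarity-based evaluation, so the exact formula is unknown. A minimal sketch, assuming each English caption and its Korean counterpart are embedded by some multilingual sentence encoder (stubbed here as plain vectors), averages their cosine similarity; the same similarity can pick the best candidate when ensembling several translations. All names and vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def projection_consistency(pairs: list[tuple[list[float], list[float]]]) -> float:
    """Mean similarity between English-caption and Korean-caption embeddings."""
    return sum(cosine(en, ko) for en, ko in pairs) / len(pairs)


def pick_best_candidate(source: list[float], candidates: dict[str, list[float]]) -> str:
    """Keep the candidate translation whose embedding is closest to the source."""
    return max(candidates, key=lambda name: cosine(source, candidates[name]))
```

A per-system score from `projection_consistency` would allow translation systems to be compared on the same caption set, while `pick_best_candidate` shows one simple way an ensemble could choose among their outputs.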
