• Title/Summary/Keyword: 텍스트 연구 ("text research")

Analysis of Research Trends in Data Curation Using Text Mining Techniques (텍스트 마이닝을 활용한 국외 데이터 큐레이션 연구 동향 분석)

  • Jaeeun Choi
    • Journal of the Korean Society for Information Management / v.41 no.3 / pp.85-107 / 2024
  • This study analyzes trends in data curation research. A total of 1,849 scholarly records were extracted from Scopus and WoS, with 1,797 papers selected after removing duplicates. Titles, keywords, and abstracts were analyzed through keyword frequency analysis, LDA topic modeling, and network analysis. Frequent keywords like 'research' and 'information' suggest that data curation is widely applied in medical research, biomedical research, data management, and infrastructure. LDA modeling identified five main topics: improving medical data quality, enhancing big data management, managing scientific data and repositories, annotating and modeling medical data, and gene/protein database research. Network analysis showed that 'analysis' was central in global discussions, while 'gene' and 'system' were locally central. These findings highlight the importance of data curation in various research areas.
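
The keyword-frequency and network findings above can be illustrated with a small sketch. The abstract does not say which centrality measures were used, so this assumes degree centrality for "local" importance and closeness centrality for "global" importance, computed over a keyword co-occurrence graph in which two keywords are linked when they appear in the same record. The keyword lists are hypothetical toy data, not the study's records, and the code uses only the Python standard library rather than a network-analysis package.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations

def keyword_frequencies(records):
    """Count how often each keyword occurs across all records."""
    return Counter(kw for record in records for kw in record)

def cooccurrence_graph(records):
    """Undirected graph: two keywords are linked if they appear in the same record."""
    graph = defaultdict(set)
    for record in records:
        for a, b in combinations(sorted(set(record)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Local centrality: fraction of the other keywords a keyword links to directly."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def closeness_centrality(graph):
    """Global centrality: inverse of the mean BFS distance to every reachable keyword."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores

# Hypothetical keyword lists (one per record), not data from the study.
records = [
    ["data curation", "analysis", "gene"],
    ["analysis", "system", "repository"],
    ["gene", "protein", "analysis"],
    ["system", "metadata"],
]
freq = keyword_frequencies(records)
graph = cooccurrence_graph(records)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(freq.most_common(3))
print(max(closeness, key=closeness.get))
```

A keyword can rank high on one measure and lower on the other, which is one way a term can be "locally" but not "globally" central.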

An Efficient Estimation of Place Brand Image Power Based on Text Mining Technology (텍스트마이닝 기반의 효율적인 장소 브랜드 이미지 강도 측정 방법)

  • Choi, Sukjae;Jeon, Jongshik;Subrata, Biswas;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.113-129 / 2015
  • Location branding is an important income-generating activity: it gives special meaning to a specific location while producing identity and communal value grounded in an understanding of the place's branding concept and methodology. Many other fields, such as marketing, architecture, and urban construction, also influence the creation of a strong brand image. A place brand that is widely recognized by both South Koreans and foreigners creates significant economic effects. There has been research on building strategic and detailed place brand images, the most representative being that of Anholt, who surveyed two million people from 50 countries. However, such investigations, including survey research, require considerable labor and expense. As a result, there is a need for more affordable, objective, and effective research methods. The purpose of this paper is to find a way to measure the intensity of a place's brand image objectively and at low cost through text mining. The proposed method extracts keywords and the factors that constitute a location's brand image from related web documents, and from these measures the brand image intensity of that location. The performance of the proposed methodology was verified by comparison with Anholt's image consistency index ranking of 50 cities around the world. Four methods were tested. First, the RANDOM method ranks the cities in the experiment randomly. Second, the HUMAN method prepares a questionnaire and selects nine volunteers who are familiar with both brand management and the cities to be evaluated; the volunteers rank the cities, and their rankings are compared with Anholt's results. Third, the TM method applies the proposed method to evaluate the cities on all evaluation criteria.
Fourth, TM-LEARN, an extension of TM, selects significant evaluation items within each criterion and then evaluates the cities on all the selected criteria. RMSE is used as the metric for comparing the evaluation results. The experimental results are as follows. First, the proposed method was more accurate than the evaluation method that targets ordinary people. Second, it requires far less time and cost than a traditional survey because it is automated. Third, it is timely, because the evaluation can be rerun at any time. Fourth, whereas Anholt's method evaluates only predetermined cities, the proposed methodology is applicable to any location. Finally, it has relatively high objectivity because it is based on open source data. Our text mining approach to city image evaluation therefore proved valid in terms of accuracy, cost-effectiveness, timeliness, scalability, and reliability. The proposed method provides managers with clear guidelines for brand management in the public and private sectors. In the public sector, local officials could use it to formulate strategies and enhance the image of their places efficiently. Rather than conducting heavy questionnaire surveys, they could first monitor the current place image quickly and then commission a formal place image survey only if the proposed method's results are unusual, whether those results indicate an opportunity or a threat to the place.
Moreover, by combining the proposed method with morphological analysis, extraction of meaningful place-brand facets from text, sentiment analysis, and similar techniques, marketing strategy planners and civil engineering professionals may obtain deeper and richer insights for building better place brand images. In the future, a prototype system will be implemented to demonstrate the feasibility of the idea proposed in this paper.
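
The comparison against Anholt's ranking can be sketched as an RMSE over ranks. This assumes each method assigns every city a rank and that RMSE is taken over the rank differences; the abstract does not spell out this detail, and the city names and ranks below are hypothetical.

```python
import math

def rank_rmse(reference, predicted):
    """Root-mean-square error between two rankings of the same cities.

    Both arguments map a city name to its rank (1 = strongest brand image).
    """
    if set(reference) != set(predicted) or not reference:
        raise ValueError("rankings must cover the same non-empty set of cities")
    squared = [(reference[city] - predicted[city]) ** 2 for city in reference]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical ranks for four cities, for illustration only.
reference_rank = {"City A": 1, "City B": 2, "City C": 3, "City D": 4}
method_rank = {"City A": 2, "City B": 1, "City C": 3, "City D": 4}
print(rank_rmse(reference_rank, method_rank))  # two cities swapped: about 0.707
```

Lower values mean a method's ranking sits closer to the reference ranking, so a randomly generated ranking gives a natural point of comparison for the other methods.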

A study on the improving and constructing the content for the Sijo database in the Period of Modern Enlightenment (계몽기·근대시조 DB의 개선 및 콘텐츠화 방안 연구)

  • Chang, Chung-Soo
    • Sijohaknonchong / v.44 / pp.105-138 / 2016
  • Recently, the "XML Digital Collection of Sijo Texts in the Period of Modern Enlightenment" database, with its search functions, has been made available through the Korean Research Memory (http://www.krm.or.kr), laying a foundation for developing content from Sijo of the Modern Enlightenment period. In this paper, I review the characteristics and problems of this digital collection, look for improvements, and seek ways to turn it into content. The database's primary significance is that it integrates the vast body of Sijo from this period, amounting to 12,500 pieces, and makes it viewable at a glance. It is also the first Sijo database to provide a variety of search features: by source literature, poet name, work title, original text, period, and so on. However, the database is limited for examining the overall aspects of Sijo of this period. Titles and original texts written in archaic Korean or Chinese characters cannot be searched, because no standardized modern-Korean text is provided. In addition, works and individual Sijo published after 1945 are missing from the database. Extracting data by poet is also inconvenient, because poets are recorded in various ways, such as real names and pen names. To solve these problems and improve the database's usability, I proposed providing standardized modern-Korean texts, assigning content index terms, and providing information on work format, among other measures. Furthermore, if the database were developed into a Sijo culture information system, it could be linked to academic and educational content.
For specific applications, I suggested the following:
    - learning support materials on modern history and on awareness of the national territory in the modern age
    - source materials on indigenous animals and plants for creating commercial characters
    - use as a Sijo learning tool, such as a Sijo game

The Aspects of "Children" in Saseolsijo and its Historical Implication in Korean Classical Poetry (사설시조에 나타난 '아이'의 양상과 그 시가사적 함의)

  • Park, Sang-Young
    • Sijohaknonchong / v.42 / pp.151-185 / 2015
  • The purpose of this study is to reveal the aspects of "children" in Saseolsijo and their historical implications in Korean classical poetry. The discussion can be summarized as follows. There are two types of children in Saseolsijo: the silent child and the speaking child. The silent child is characterized by being called and addressed by the poetic narrator, serving as a customary audience, and taking a passive attitude. The speaking child is characterized as a speaking subject with an active attitude, a sign of modernity. These phenomena expose differences in aesthetic order. The silent child is mainly used as a device to maximize the lyricism of the text, an ideological product of the poetic narrator's inner demands, and exhibits an identification discourse. The speaking child gives the text dynamism through heterogeneous discourse, creates aesthetic distance between "the reader and the text," and exhibits a distance discourse. Fragments of these aspects of Saseolsijo's children are also found in earlier genres. In Hyangga, 'children' speak to fulfill others' desires but are also objects of the poetic narrator. In Goryosokyo, 'children' act and try to break the silence imposed by the poetic narrator by speaking voluntarily. In Sijo, unlike other genres, some works deal with disciplining children and their growth; from a discourse perspective, however, children mostly appear as objects of the poetic narrator. These aspects of children in earlier genres, including some works of Saseolsijo, are mainly associated with the medieval image of children. By contrast, the new aspects of Saseolsijo's children show a cross-section of the signs of the contemporary transition from the medieval to the modern. Although such works are few, the active, speaking children show novelty relative to the 'child-ness' imposed in the medieval period by strongly expressing 'self' and 'individual desire.'
This novelty is far from the modern concept of children as naive and innocent, but these children are notable in that they show a part of modernity through the text's various voices, the comic (laughter), multiple points of view, and so on.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research on and investment in text mining are being actively pursued not only in academia but also across industries. Text mining is generally conducted in two steps. In the first step, the text of each collected document is tokenized and structured, converting the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are performed according to the purpose of the analysis. Until recently, text mining studies have focused on applications of the second step, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be fed directly into a variety of operations and traditional analysis techniques, unstructured text must first be transformed into a form that a computer can understand. Mapping arbitrary objects into a space of a specific dimension while preserving their algebraic properties, in order to structure text data, is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as demand for document embedding has increased rapidly, many supporting algorithms have been developed. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods such as doc2Vec generate a vector for each document from all the words it contains.
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, which makes it difficult to represent a complex document covering multiple subjects accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords. For documents without keywords, the method can be applied after extracting keywords through various analysis methods; however, since this is not the core subject of the proposed method, we describe its application to documents whose keywords are predefined. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a set of keyword vectors for that document. Next, each document's keyword vectors are clustered to identify the multiple subjects it contains. Finally, a vector is generated from the keyword vectors constituting each cluster, yielding the document's multiple vectors. Experiments on 3,147 academic papers revealed that the single-vector traditional approach cannot properly map complex documents because of interference among subjects within each vector.
With the proposed multi-vector method, we confirmed that complex documents can be vectorized more accurately by eliminating the interference among subjects.
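
Steps (3) to (5) of the multi-vector method can be sketched as follows. The abstract does not name the clustering algorithm, how the number of clusters is chosen, or how a cluster is turned into a vector, so this sketch assumes plain k-means with a fixed k and the mean of each cluster's keyword vectors. The 2-D vectors stand in for real word-embedding output (step 2) and are hypothetical.

```python
import math
import random

def mean(vectors):
    """Component-wise average of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def kmeans(vectors, k, iters=50, seed=0):
    """Plain k-means (Euclidean); returns clusters as lists of indices into `vectors`."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for idx, v in enumerate(vectors):
            nearest = min(range(k), key=lambda c: math.dist(v, centroids[c]))
            clusters[nearest].append(idx)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [mean([vectors[i] for i in members]) if members else centroids[c]
                         for c, members in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters

def multi_vector_embedding(keyword_vectors, k):
    """Cluster one document's keyword vectors and emit one averaged vector per cluster."""
    vectors = list(keyword_vectors.values())
    clusters = kmeans(vectors, k)
    return [mean([vectors[i] for i in members]) for members in clusters if members]

# Hypothetical 2-D "word embeddings" for one document's keywords.
keyword_vectors = {
    "neural network": [0.9, 0.1],
    "deep learning": [1.0, 0.0],
    "supply chain": [0.0, 1.0],
    "logistics": [0.1, 0.9],
}
single_vector = mean(list(keyword_vectors.values()))  # one vector for the whole document
multi_vectors = multi_vector_embedding(keyword_vectors, k=2)
print(single_vector, multi_vectors)
```

With these toy values the single averaged vector lands near [0.5, 0.5], between both subjects, while the two cluster vectors stay close to each subject; that is the interference effect the abstract describes.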

Research on Korea Text Recognition in Images Using Deep Learning (딥 러닝 기법을 활용한 이미지 내 한글 텍스트 인식에 관한 연구)

  • Sung, Sang-Ha;Lee, Kang-Bae;Park, Sung-Ho
    • Journal of the Korea Convergence Society / v.11 no.6 / pp.1-6 / 2020
  • In this study, research on character recognition, one of the fields of computer vision, was conducted. Optical character recognition, one of the most widely used character recognition techniques, suffers from a decreasing recognition rate when the recognition target deviates from a certain standard and format. Hence, this study aimed to address this limitation by applying deep learning techniques to character recognition. In addition, as most character recognition studies have been limited to English or numerals, the recognition range was expanded through additional training on Korean text data. As a result, this study derived a deep learning-based character recognition algorithm for Korean text. The algorithm obtained a score of 0.841 on the 1-NED evaluation metric, similar to results for English recognition. Further, based on an analysis of the results, major issues in Korean text recognition and possible future research tasks are introduced.
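
The abstract reports a 1-NED score without defining it. A common definition, sketched here under that assumption, normalizes the edit distance between a predicted and a ground-truth string by the length of the longer string and reports 1 minus the mean over all samples. The example strings are hypothetical. For precomposed Hangul, Python strings compare Korean at the syllable level, so an error in a single jamo counts as one full substitution; a jamo-level variant would score differently.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def one_minus_ned(predictions, ground_truths):
    """1-NED: 1 minus the mean edit distance, each normalized by the longer string."""
    if len(predictions) != len(ground_truths) or not ground_truths:
        raise ValueError("need equal-length, non-empty lists")
    total = 0.0
    for pred, gt in zip(predictions, ground_truths):
        longest = max(len(pred), len(gt))
        total += levenshtein(pred, gt) / longest if longest else 0.0
    return 1 - total / len(ground_truths)

# Hypothetical recognizer outputs vs. ground truth.
preds = ["한글", "텍스츠"]
truth = ["한글", "텍스트"]
score = one_minus_ned(preds, truth)
print(round(score, 3))
```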

Study on Life History of an Elderly Female North Korean Defector (북한이탈여성의 생애사 연구)

  • Yang, Min-Sook;Lee, Dong-Hun
    • The Journal of the Korea Contents Association / v.17 no.10 / pp.120-139 / 2017
  • The purpose of this research was to explore the life history of an elderly woman who fled North Korea, in order to understand female North Korean defectors and provide interventions for them. The participant is an 81-year-old woman who escaped from North Korea and has lived in South Korea for over 14 years. This life history study followed Mandelbaum's (1973) analysis, which identifies three perspectives on a life: dimensions, turnings, and adaptations. The study concluded that the participant had no protection while staying in China and North Korea and experienced the condition of Homo Sacer. Even after arriving in South Korea, she had to live with her past negative experiences in North Korea and China. Based on the results, discussion and implications are suggested.

Analysis on Status and Trends of SIAM Journal Papers using Text Mining (텍스트마이닝 기법을 활용한 미국산업응용수학 학회지의 연구 현황 및 동향 분석)

  • Kim, Sung-Yeun
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.212-222 / 2020
  • The purpose of this study is to understand the current status and trends of research published by the Society for Industrial and Applied Mathematics (SIAM), a leading organization in industrial mathematics worldwide. To this end, the titles and abstracts of 6,255 research articles published between 2016 and 2019 were collected, and R was used to fit an LDA topic model and a regression model. The analyses showed, first, that a variety of studies have been conducted in industrial mathematics, in areas such as algebra, discrete mathematics, geometry, topology, probability, and statistics. Second, the ascending research subjects were fluid mechanics, graph theory, and stochastic differential equations, while the descending subjects were computational theory and classical geometry. Based on an understanding of the overall flows and changes in the intellectual structure of industrial mathematics, the results are expected to provide researchers with implications for future research directions and to inform how to build an industrial mathematics curriculum that reflects the zeitgeist of the field in education.
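
The ascending/descending classification can be illustrated with a simple regression sketch. It assumes an LDA model has already been fit and that each topic's mean proportion per year is available; each topic is then labeled by the sign of its least-squares slope over the years. The study used R, and the abstract does not give its exact regression model or any significance test, so this Python version and the topic shares below are hypothetical.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

def classify_trends(yearly_share, tol=0.0):
    """Label each topic by the sign of its yearly-share slope."""
    labels = {}
    for topic, by_year in yearly_share.items():
        years = sorted(by_year)
        slope = ols_slope(years, [by_year[y] for y in years])
        if slope > tol:
            labels[topic] = "ascending"
        elif slope < -tol:
            labels[topic] = "descending"
        else:
            labels[topic] = "flat"
    return labels

# Hypothetical mean topic proportions per year, as an LDA model might produce.
shares = {
    "topic A": {2016: 0.08, 2017: 0.09, 2018: 0.11, 2019: 0.12},
    "topic B": {2016: 0.10, 2017: 0.09, 2018: 0.07, 2019: 0.06},
}
labels = classify_trends(shares)
print(labels)
```

A nonzero `tol` keeps topics whose share barely moves from being labeled as trends.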

A Qualitative Single Case Study on Change in Interactive Bibliotherapy Experience with A Middle-aged Widowed Spouse (배우자 사별 중년여성의 상호작용적 독서치료 경험에서 나타나는 변화에 대한 질적 단일 사례연구)

  • Park, Jung-Ae;Lee, Myng-Woo
    • Journal of Korean Library and Information Science Society / v.46 no.4 / pp.299-324 / 2015
  • This paper is a qualitative single case study that explores the changes in a middle-aged widow's experience of interactive bibliotherapy. A middle-aged widowed spouse took part as the research participant. We examined, session by session, the changes in the participant's experience during interactive bibliotherapy, as well as the changes in her inner and external experience after the sessions ended. As a result, interactive bibliotherapy, through writing techniques and reading materials including the participant's own poems, elicited the participant's reactions and helped her express them. Furthermore, the emotions evoked by the materials and the resulting self-awareness helped the participant understand and accept herself through therapeutic interaction. These changes appeared in both her inner and external life experience and persisted after the sessions ended. The participant's inner and external changes therefore indicate the effect of bibliotherapy, facilitated by the genuine responses of the counselor and the texts.

Ontology Construction of Technological Knowledge for R&D Trend Analysis (연구 개발 트렌드 분석을 위한 기술 지식 온톨로지 구축)

  • Hwang, Mi-Nyeong;Lee, Seungwoo;Cho, Minhee;Kim, Soon Young;Choi, Sung-Pil;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.12 no.12 / pp.35-45 / 2012
  • Researchers and scientists spend a huge amount of time analyzing previous studies and their results. To gain an advantageous position in a timely manner, they usually analyze various resources on recent research issues, such as papers, patents, and Web documents, in order to get ahead on newly emerging technologies. However, it is difficult to select research fields worth investing in from a huge corpus using traditional information search based on keywords and bibliographic information. In this paper, we propose a method for efficiently creating, storing, and using semantically related information among technologies, products, and research agents extracted from 'big data' by text mining. To implement the proposed method, we designed an ontology that creates technological knowledge for the semantic web environment based on relationships extracted by text mining techniques. The ontology was used in InSciTe Adaptive, an R&D trend analysis and forecasting service that supports searching for relevant technological knowledge.
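
The idea of storing text-mined relations among technologies, products, and research agents can be sketched as a small triple store. This is only an illustration: the predicate names ("usedIn", "improves", "researches") and the entities are hypothetical, and a semantic-web system like the one described would typically use RDF/OWL tooling rather than this in-memory structure.

```python
from collections import defaultdict

class TripleStore:
    """Tiny in-memory (subject, predicate, object) store, indexed both ways."""

    def __init__(self):
        self._out = defaultdict(set)  # subject -> {(predicate, object)}
        self._in = defaultdict(set)   # object  -> {(subject, predicate)}

    def add(self, subject, predicate, obj):
        self._out[subject].add((predicate, obj))
        self._in[obj].add((subject, predicate))

    def objects(self, subject, predicate):
        return sorted(o for p, o in self._out.get(subject, ()) if p == predicate)

    def subjects(self, predicate, obj):
        return sorted(s for s, p in self._in.get(obj, ()) if p == predicate)

# Hypothetical relations, as a text-mining step might extract them.
store = TripleStore()
store.add("lithium-ion battery", "usedIn", "electric vehicle")
store.add("solid-state electrolyte", "improves", "lithium-ion battery")
store.add("research_agent_X", "researches", "solid-state electrolyte")

# Two-hop query: who researches technologies that improve a given product?
improving = store.subjects("improves", "lithium-ion battery")
agents = sorted(a for t in improving for a in store.subjects("researches", t))
print(agents)
```

Indexing by both subject and object lets multi-hop questions like this one be answered without scanning every triple.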