
A Visitor Perception Study of Liaohe National Park Based on Python Data Visualization (파이썬 데이터 시각화를 이용한 랴오허 국립공원 관광객 인식 연구)

  • Jing-Qiwei;Zheng-Chengkang;Nam Kyung Hyeon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2023.01a
    • /
    • pp.439-441
    • /
    • 2023
  • National parks are one of the important types of protected-area management systems established by the IUCN and a management model for the effective conservation and sustainable use of natural and cultural heritage in countries around the world; they play important roles in conservation, scientific research, education, recreation, and driving community development. This study takes Liaohe National Park in China, a typical representative of global coastal wetlands, as a case study and uses Python to collect visitors' travelogues and reviews from Mafengwo.com, Ctrip.com, Go.com, Meituan.com, and Dianping.com as the source text, spanning 2015 to 2022. The results show that the wildlife resources, the natural landscape where river meets sea, the wetland ecology, and the fishing and hunting culture of northern China are fully reflected in visitors' perceptions of Liaohe National Park. However, there is still much room for improvement in supporting services and facilities, public education, and tourists' experience and participation. Using Python data visualization, the study examines public perception of wetland wildlife and captures visitors' satisfaction, spatial distribution, activity content, and emotional tendency, so that Liaohe National Park can improve the public experience while strictly adhering to ecological protection. This provides a scientific basis for the park to play a better role in ecological civilization construction and in educating ecological civilization awareness.
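
The abstract does not include code, but a Python pipeline of the kind it describes might look like the following minimal sketch, assuming the reviews have already been scraped into a local file; reviews.txt, the jieba segmenter, and the font path are illustrative assumptions, not details from the paper.

```python
# A minimal sketch (not the authors' code) of review-text visualization:
# Chinese word segmentation, frequency counting, and a word cloud.
from collections import Counter

import jieba                      # Chinese word segmentation
from wordcloud import WordCloud   # word-cloud rendering

with open("reviews.txt", encoding="utf-8") as f:
    reviews = [line.strip() for line in f if line.strip()]

# Segment each review and count word frequencies, skipping 1-char tokens.
tokens = [w for r in reviews for w in jieba.lcut(r) if len(w) > 1]
freq = Counter(tokens)

# Render the 200 most frequent words; font_path must point to a font
# with CJK glyphs (the path here is an assumption).
wc = WordCloud(font_path="SimHei.ttf", width=800, height=600)
wc.generate_from_frequencies(dict(freq.most_common(200)))
wc.to_file("liaohe_wordcloud.png")
```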

Analysis of Topics Related to Population Aging Using Natural Language Processing Techniques (자연어 처리 기술을 활용한 인구 고령화 관련 토픽 분석)

  • Hyunjung Park;Taemin Lee;Heuiseok Lim
    • Journal of Information Technology Services
    • /
    • v.23 no.1
    • /
    • pp.55-79
    • /
    • 2024
  • Korea, which is expected to enter a super-aged society in 2025, faces one of the most worrisome aging crises in the world. Efforts are urgently required to examine the problems and countermeasures from various angles and to remedy the shortcomings. In this regard, we intend to derive useful implications from a new viewpoint by applying recent natural language processing techniques to online articles. More specifically, we pose three research questions. First, what topics are being reported in the online media, and what is the public's response to them? Second, what is the relationship between these aging-related topics and individual happiness factors? Third, what strategic directions and benchmarking implications are discussed for solving the problem of population aging? To answer these, we collect Naver portal articles related to population aging along with their classification categories, comments, and comment counts, among other numerical data. From the data, we first derive 33 topics with semi-supervised BERTopic, reflecting article classification information that was not used in previous studies, and conduct sentiment analysis of the comments with a current open-source large language model. We also examine the relationship between the derived topics and personal happiness factors extended to Alderfer's ERG dimensions, carrying out additional 3~4-gram keyword frequency analysis, trend analysis, and text network analysis based on 3~4-gram keywords. Through this multifaceted approach, we present diverse fresh insights from practical and theoretical perspectives.
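
As a rough illustration of the semi-supervised BERTopic step described above, the following sketch passes article category labels to guide topic formation; the function and its inputs are placeholders under stated assumptions, not the authors' code.

```python
# A minimal sketch of semi-supervised BERTopic: category labels guide
# topic formation; -1 marks unlabeled documents.
from bertopic import BERTopic

def model_topics(docs, category_ids):
    """docs: list of article bodies; category_ids: int label per doc
    (-1 where unlabeled). Parameter values here are illustrative."""
    topic_model = BERTopic(language="multilingual", min_topic_size=20)
    # Passing y switches BERTopic into its semi-supervised mode, so the
    # article classification labels steer the derived topics.
    topics, probs = topic_model.fit_transform(docs, y=category_ids)
    return topic_model, topics, probs
```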

A Study on the Development Trend of Artificial Intelligence Using Text Mining Technique: Focused on Open Source Software Projects on Github (텍스트 마이닝 기법을 활용한 인공지능 기술개발 동향 분석 연구: 깃허브 상의 오픈 소스 소프트웨어 프로젝트를 대상으로)

  • Chong, JiSeon;Kim, Dongsung;Lee, Hong Joo;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.1-19
    • /
    • 2019
  • Artificial intelligence (AI) is one of the main driving forces of the Fourth Industrial Revolution. Technologies associated with AI have already shown abilities equal to or better than humans in many fields, including image and speech recognition. Because AI technologies can be utilized in a wide range of fields, including medicine, finance, manufacturing, services, and education, many efforts have been made to identify current technology trends and analyze their development directions. Major platforms for developing complex AI algorithms for learning, reasoning, and recognition have been opened to the public as open source projects, and technologies and services that utilize them have increased rapidly; this has been confirmed as one of the major reasons for the fast development of AI technologies. The spread of the technology is also greatly indebted to open source software developed by major global companies that supports natural language recognition, speech recognition, and image recognition. Therefore, this study aimed to identify the practical trend of AI technology development by analyzing open source software (OSS) projects associated with AI, which have been developed through online collaboration among many parties. We searched for and collected a list of major AI-related projects created on Github from 2000 to July 2018, and confirmed the development trends of major technologies in detail by applying text mining to the topic information that characterizes the collected projects and their technical fields. The analysis showed that the number of software development projects per year was below 100 until 2013, then increased to 229 projects in 2014 and 597 projects in 2015. The number of AI-related open source projects increased especially rapidly in 2016 (2,559 projects). The number of projects initiated in 2017 was 14,213, almost four times the total number of projects created from 2009 to 2016 (3,555 projects), and 8,737 projects were initiated from January to July 2018. The development trend of AI-related technologies was evaluated by dividing the study period into three phases, with the appearance frequency of topics indicating the technology trends of AI-related OSS projects. Natural language processing remained at the top in all years, implying that such OSS has been developed continuously. Until 2015, the programming languages Python, C++, and Java were among the ten most frequent topics; after 2016, the programming languages other than Python disappeared from the top ten and were replaced by platforms supporting AI algorithm development, such as TensorFlow and Keras. Reinforcement learning algorithms and convolutional neural networks, which are used in various fields, also appeared frequently. Topic network analysis showed that the topics with the highest degree centrality were largely similar to those with the highest appearance frequency (a small sketch of the centrality computation follows this abstract). The main difference was that visualization and medical imaging rose to the top of the centrality list, although they were not at the top from 2009 to 2012, indicating that OSS was developed in the medical field to utilize AI technology; moreover, although computer vision was in the top ten of the appearance frequency list from 2013 to 2015, it was not in the top ten by degree centrality, and the ranks of convolutional neural networks and reinforcement learning changed slightly. Examining the trend of technology development using topic appearance frequency and degree centrality, machine learning showed the highest frequency and the highest degree centrality in all years. It is noteworthy that although the deep learning topic showed low frequency and low degree centrality between 2009 and 2012, its rank rose abruptly between 2013 and 2015; in recent years both technologies have had high appearance frequency and degree centrality. TensorFlow first appeared during the 2013-2015 phase, and its appearance frequency and degree centrality soared between 2016 and 2018, placing it at the top of the lists after deep learning and Python. Computer vision and reinforcement learning did not show abrupt increases or decreases, and had relatively low appearance frequency and degree centrality compared with the above-mentioned topics. Based on these results, it is possible to identify the fields in which AI technologies are actively developed, and the results can be used as a baseline dataset for more empirical analysis of future technology trends.
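
The following sketch illustrates, under stated assumptions, the topic-network analysis the abstract describes: topics co-occurring on the same project form graph edges, and degree centrality ranks them. The sample topic lists are placeholders, not the study's data.

```python
# A minimal sketch (not the study's code) of degree-centrality ranking
# over a topic co-occurrence graph built from per-project topic lists.
from itertools import combinations

import networkx as nx

project_topics = [  # illustrative placeholder for GitHub topic lists
    ["machine-learning", "deep-learning", "tensorflow"],
    ["machine-learning", "computer-vision"],
    ["deep-learning", "reinforcement-learning", "python"],
]

G = nx.Graph()
for topics in project_topics:
    # Two topics co-occur when attached to the same project.
    for a, b in combinations(sorted(set(topics)), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Degree centrality: fraction of other topics each topic links to.
for topic, c in sorted(nx.degree_centrality(G).items(),
                       key=lambda kv: -kv[1]):
    print(f"{topic}: {c:.2f}")
```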

The Identification Framework for source code author using Authorship Analysis and CNN (작성자 분석과 CNN을 적용한 소스 코드 작성자 식별 프레임워크)

  • Shin, Gun-Yoon;Kim, Dong-Wook;Hong, Sung-sam;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.19 no.5
    • /
    • pp.33-41
    • /
    • 2018
  • As Internet technology has developed, various programs are being created by many different authors. In this context, some authors pass off programs or code written by others as their own, use other writers' code indiscriminately, or fail to indicate exactly which code they have used, which makes it increasingly difficult to protect code. In this paper, we propose an author identification framework that uses Authorship Analysis theory together with natural language processing (NLP) based on a convolutional neural network (CNN). We apply Authorship Analysis theory to extract features for author identification from source code, combine them with features used in text mining, and perform author identification using machine learning. In addition, we apply the CNN-based NLP method to source code for author classification. Identifying an author requires features that distinguish that author alone, and the CNN-based NLP method can be applied to a language with a special system, such as source code, to identify its author. The identification accuracy based on Authorship Analysis theory is 95.1%, and the identification accuracy of the CNN-based approach is 98%.
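
To make the CNN classification step concrete, the following is a minimal sketch of a CNN that treats source code as a token sequence and predicts its author; the vocabulary size, sequence length, author count, and layer sizes are illustrative assumptions, not the paper's architecture.

```python
# A minimal sketch of CNN-based author classification over tokenized
# source code. All hyperparameters below are assumed, not the paper's.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, MAXLEN, AUTHORS = 128, 512, 20  # assumed sizes

model = tf.keras.Sequential([
    layers.Input(shape=(MAXLEN,)),
    layers.Embedding(VOCAB, 64),               # token embeddings
    layers.Conv1D(128, 5, activation="relu"),  # n-gram-like filters
    layers.GlobalMaxPooling1D(),               # strongest style signals
    layers.Dense(128, activation="relu"),
    layers.Dense(AUTHORS, activation="softmax"),  # one class per author
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```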

Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

  • Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.38 no.4
    • /
    • pp.595-599
    • /
    • 2018
  • A sufficient understanding of the status of overseas construction markets is crucial to achieving profitability in international construction projects. Many researchers have regarded news articles as a good data source for assessing market conditions, since they contain market information on political, economic, and social issues. Because this text data is unstructured and very large, various text-mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles is limited by the variety of topics present in the data. This research aims to overcome these problems and contribute to summarizing market status by performing topic modeling with Latent Dirichlet Allocation; a minimal sketch of such a model follows this abstract. Assuming that 10 topics exist in the corpus, the derived topics included projects for user convenience (topic-2) and private support to solve poverty problems in Africa (topic-4), among others. By grouping the topics in the news articles, the results can improve the extraction of useful information and the summarization of market status.
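
As referenced above, here is a minimal LDA sketch with scikit-learn, assuming 10 topics as in the study; the three sample articles are illustrative placeholders, not the study's corpus.

```python
# A minimal sketch (not the study's code) of LDA topic modeling on
# news text, with 10 topics as assumed in the abstract.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

articles = [  # illustrative placeholder corpus
    "government funds new airport construction project",
    "private support tackles poverty in african region",
    "rail project improves commuter convenience downtown",
]

# Bag-of-words term counts, the standard LDA input.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(articles)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(X)

# Show the top words of each topic.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:5]]
    print(f"topic-{i}: {', '.join(top)}")
```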

A Study on the Pyo-bon(標本) concept based on the verse "The Principal and secondary aspects must first be decided(標本須明後先)." in the Sanghan(傷寒) Chapter of "Yixuerumen(醫學入門)" ("의학입문.상한편(醫學入門.傷寒篇)"의 "표본수명후선(標本須明後先)" 조문(條文)에서 나타난 삼음삼양병(三陰三陽病)의 표본(標本) 개념에 대한 고찰)

  • Shin, Sang-Won;Jeong, Chang-Hyun;Baik, You-Sang;Jang, Woo-Chang
    • Journal of Korean Medical classics
    • /
    • v.25 no.1
    • /
    • pp.1-16
    • /
    • 2012
  • "Yixuerumen" is a comprehensive medical text published in the Ming-dynasty by Li Chan(李梴). In this text, Sanghan(傷寒, cold damage) is categorized among external contraction(外感) with much emphasis. The subject of this study is the verse "The Principal and secondary aspects must first be decided." and its annotations in the in the Sanghan chapter of "Yixuerumen". The complex theoretical structure of this verse was firstly analyzed, together with the historical background of how and why Li Chan adopted this concept. The Pyo-Bon concept is the contrast between phenomena(標) and its underlying source of motivation(本). The methodology for this study was to compare and analyze this main verse with contents on Sanghan and Un-gi(運氣) within the text, while reviewing historical theories explaining the physiology and pathology of the human body in terms of the Pyo-bon(標本) concept. As a result, we discovered that the Pyo-bon(標本) concept used in the aforementioned verse of "Yixuerumen" matches the Three Eum Three Yang(三陰三陽)-標本中氣(pyo-bon-jung gi)-gi transformation(氣化) theory of Un-gi(運氣). Li Chan created the connecting link in understanding the Three Eum Three Yang diagnosis system through the viscera/bowels theory(臟腑論) by adopting the Three Eum Three Yang(三陰三陽)-標本中氣(pyo-bon-jung gi)-gi transformation(氣化) theory from Un-gi. Li's work lead to several changes in the field of Sanghan. First, Li understood the disease pattern of Sanghan by using the accumulated knowledge of the viscera/bowel theory during the Jin-Yuan dynasty, and developed a medical perspective that observes the disease pattern based on the body's essence gi(精氣). Second, he set the category of the Sanghan-Three Eum Three Yang disease pattern, establishing a separate guideline. Third, by adding knowledge of herbs to the accumulated knowledge of the viscera/bowel theory, the process of diagnosis and herbal application were made explicable. On the other hand, in the process of interpreting the 三陰三陽 diagnosis system with viscera/bowels theory, theoretical inconsistencies appeared, of which Li tried to mend by several means. The results of the research on "Yixuerumen(醫學入門) the Sanghan chapter(傷寒篇)" calls for further studies, as it has effected both "Dongeuibogam(東醫寶鑑) the Sanghan part(寒門)" and "Dongeuisoosebowon(東醫壽世保元)" as well.

A Study on the Korean Translation Strategy of 《Mu Yang Ai Hua, 牧羊哀話》 by Period (《목양애화(牧羊哀話)》의 시대별 한국어 번역 전략 연구)

  • Moon, dae-il
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.1
    • /
    • pp.377-382
    • /
    • 2021
  • 《Mu Yang Ai Hua, 牧羊哀話》 is known as the first novel in the history of modern Chinese literature to take Korea as its subject, and it is famous as a work the author created after visiting Korea and being inspired there. 《牧羊哀話》 has been retranslated repeatedly, yielding four versions. These translations reflect the characteristics of their respective periods, and the translation strategies used have their own characteristics. The results of the comparative analysis of the four translations in this study are as follows. Version A was published during the Japanese colonial period; some parts were reduced or omitted according to the translator's intent, and a foreignization strategy was used. Versions B, C, and D achieved content equivalence by making extensive use of domestication strategies and added supplementary explanations in places to help readers understand. Since translation is a process of communication, it should not merely convert the source text into the target text; the target reader's response to the work should be the same as that of the source-text reader. Therefore, translation must take into account the environment of the times and the readership, and it must use every possible method to elicit the same emotion and empathy a reader of the original would feel. Translators therefore need to use domestication and foreignization strategies together, based on their understanding of the target language and of the politics, economy, history, and culture of the target country.

Exploratory Study on the Specification of Content Knowledge Formation - Based on Analysis of University Writing Textbooks - (글쓰기 내용지식 구성의 세분화에 관한 탐색적 연구 - 대학 글쓰기교재 분석을 중심으로 -)

  • Lee, Ran
    • The Journal of the Korea Contents Association
    • /
    • v.22 no.7
    • /
    • pp.486-497
    • /
    • 2022
  • The aim of this study was to subdivide and present the units and standards of knowledge integration involved in creating students' integrated knowledge from content knowledge in college writing classes. To this end, it analyzed three typical writing textbooks used in colleges and examined how the formation of integrated knowledge is presented, using qualitative text analysis methods. The analysis procedure and presentation followed Creswell's spiral analysis model, a method that cycles repeatedly from material collection and analysis to presentation. The examination identified three dimensions of units in forming content knowledge and suggested that all three should be treated for more systematic education: the whole text, the paragraph, and the sentence. Next, the standards and contents of knowledge integration were suggested for each process. For the process of knowledge selection, the suitability of, and contradictions between, the text materials and the author's thesis were proposed as standards and contents. For the process of organization and integration, corresponsive integration, contradictive integration, background integration, and synthetic integration were suggested. Finally, for the process of expression and citation, procedural knowledge such as correct expression and spelling and source indication was presented. Furthermore, in terms of expression, the study showed that the paraphrasing process frequently practiced in writing textbooks needs to be exercised in three dimensions: summarization, connection, and interpretation (or transformation). This result, however, calls for further study on subdividing the processes to suit university-level writing textbooks and on a more refined syllabus for systematic knowledge integration; the study accordingly proposed these tasks for future research.

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.212-215
    • /
    • 2022
  • In today's era of digital transformation, the need to construct and utilize big data has increased in various fields. Much data today is produced and stored digitally, but for a long time in the past, data production and storage were dominated by printed books. Optical Character Recognition (OCR) technology is therefore needed to utilize the vast accumulation of printed books as big data. In this study, a system for digitizing the structure and content of document objects inside a scanned book image is proposed. The proposed system consists of three main steps: 1) recognition of region information for each document object (table, equation, picture, text body) in the scanned book image; 2) OCR processing of each region with the text body, table, or equation module according to the recognized document object regions; and 3) gathering the processed document information and returning it in JSON format. The model proposed in this study builds on an open-source project with additional training and improvements. The proposed intelligent OCR system showed performance at the level of commercial OCR software in processing the four types of document objects (table, equation, image, text body).
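
The three-step pipeline above can be pictured as follows; every function name here is hypothetical, standing in for the study's detection and OCR modules, and only the JSON-gathering step is concrete.

```python
# A minimal sketch of the detect -> OCR -> JSON pipeline the abstract
# describes. detect_regions and ocr_region are assumed interfaces,
# not the authors' code.
import json

def detect_regions(image):
    """Step 1 (assumed interface): return (object_type, bounding_box)
    pairs for each document object on the scanned page."""
    raise NotImplementedError  # stands in for the detection model

def ocr_region(image, object_type, box):
    """Step 2 (assumed interface): run the text/table/equation module
    that matches object_type on the cropped region."""
    raise NotImplementedError  # stands in for the OCR modules

def digitize_page(image):
    """Step 3: gather per-object results and return them as JSON."""
    results = [
        {"type": t, "bbox": box, "content": ocr_region(image, t, box)}
        for t, box in detect_regions(image)
    ]
    return json.dumps(results, ensure_ascii=False)
```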

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.481-492
    • /
    • 2023
  • This study explores how to build a Korean dataset for extracting information from text using generative large language models. In modern society, mixed information circulates rapidly, and effectively categorizing and extracting it is crucial to decision-making; however, Korean datasets for training are still lacking. To overcome this, this study extracts information with text-based zero-shot learning using a generative large language model in order to build a purpose-built Korean dataset. The language model is instructed to output the desired result through prompt engineering in the form "system"-"instruction"-"source input"-"output format", and the dataset is built by exploiting the in-context learning behavior of the language model through the input sentences. We validate our approach by comparing the generated dataset with an existing benchmark dataset, achieving 25.47% higher performance than the KLUE-RoBERTa-large model on the relation information extraction task. The results of this study are expected to contribute to AI research by showing the feasibility of extracting knowledge elements from Korean text. Furthermore, this methodology can be utilized for various fields and purposes and has potential for building diverse Korean datasets.
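
The four-part prompt layout described above might be realized as in the following sketch; the use of an OpenAI-compatible chat API, the model name, and the prompt wording are illustrative assumptions, not the authors' setup.

```python
# A minimal sketch of the "system"-"instruction"-"source input"-
# "output format" prompt structure for zero-shot relation extraction.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = "You are an information extraction assistant for Korean text."
INSTRUCTION = "Extract every (subject, relation, object) triple."
OUTPUT_FORMAT = ('Answer as a JSON list: '
                 '[{"subject": ..., "relation": ..., "object": ...}]')

def extract(source_text: str) -> str:
    """Run one zero-shot extraction query via the four-part prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": (
                f"{INSTRUCTION}\n\n"
                f"Source input:\n{source_text}\n\n"
                f"Output format:\n{OUTPUT_FORMAT}"
            )},
        ],
    )
    return response.choices[0].message.content
```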