• Title/Summary/Keyword: text data analysis


SCOPML and SCOPBrowser (SCOPML과 SCOPBrowser)

  • 윤형석;황의윤;안건태;김진홍;이명준
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.286-288 / 2002
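  • The foremost research in the post-genome era is to reveal structural similarities and taxonomic relationships among proteins. For this purpose, the SCOP protein structure classification provides detailed information on the structural and taxonomic relationships of proteins whose three-dimensional structures are known. However, SCOP data are provided only as plain text. As a result, building other analysis tools on top of them or extracting useful information is laborious and error-prone. This paper describes SCOPML, which was developed with XML, the standard for structured documents, so that protein structure researchers can use SCOP data more effectively. It also describes the development of SCOPBrowser, which uses SCOPML to support efficient searching of SCOP data.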

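The sketch below shows the kind of text-to-XML conversion the SCOPML abstract describes. It assumes a tab-separated record layout of sunid, entry type, sccs, sid, and description, modeled on SCOP's plain-text `dir.des` file. The tag and attribute names (`scop`, `entry`, `level`, ...) and the function `scop_lines_to_xml` are hypothetical; they are not the published SCOPML schema, and the sample records are illustrative only.

```python
import xml.etree.ElementTree as ET

def scop_lines_to_xml(lines):
    """Convert tab-separated SCOP-style description records
    (sunid, entry type, sccs, sid, description) into an XML element tree.
    Tag and attribute names are hypothetical, not the SCOPML schema."""
    root = ET.Element("scop")
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        sunid, level, sccs, sid, description = line.rstrip("\n").split("\t", 4)
        entry = ET.SubElement(root, "entry", sunid=sunid, level=level, sccs=sccs)
        if sid != "-":  # assumed: "-" marks records without a domain identifier
            entry.set("sid", sid)
        ET.SubElement(entry, "description").text = description
    return root

# illustrative records in the assumed layout
records = [
    "# sample header",
    "46456\tcl\ta\t-\tAll alpha proteins",
    "46457\tcf\ta.1\t-\tGlobin-like",
]
root = scop_lines_to_xml(records)
print(ET.tostring(root, encoding="unicode"))

# once the data is structured, a search becomes an XPath-style lookup
folds = root.findall("entry[@level='cf']")
print([e.get("sccs") for e in folds])
```

The main benefit of structuring the data shows up in the final query. Selecting every fold-level entry is a single `findall` call with an attribute predicate, with no ad hoc string parsing.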

Analysis of News Regarding New Southeastern Airport Using Text Mining Techniques (텍스트 마이닝 기법을 활용한 동남권 신공항 신문기사 분석)

  • Han, Mu Moung Cho;Kim, Yang Sok;Lee, Choong Kwon
    • Smart Media Journal / v.6 no.1 / pp.47-53 / 2017
  • Social issues are important factors in deciding government policy, and newspapers are critical channels that reflect them. Analyzing news articles can contribute to understanding social issues, but analyzing large volumes of unstructured news data manually is very difficult. Therefore, this study analyzes the differing views of stakeholders on a specific social issue using text analysis, word cloud analysis, and association analysis, methods that systematically transform unstructured news data into structured data. We analyzed a total of 115 news articles and 6,772 comments collected over two weeks from selected newspapers (Chosun-Il-bo, Joongang-Il-bo, Donga-Il-bo, Maeil Newspaper, Busan-Il-bo). We found significant differences in tone between newspapers. Nationwide daily newspapers focus on political relations with local areas, while local daily newspapers tend to write articles that represent local governments' interests.
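Association analysis over news text is commonly expressed through the support and confidence of keyword pairs across articles. The following is a minimal sketch, not the authors' implementation. It assumes each article has already been reduced to a set of keywords. The function name `pair_rules`, the `min_support` threshold, and the sample keyword sets are hypothetical.

```python
from itertools import combinations
from collections import Counter

def pair_rules(docs, min_support=0.5):
    """Return sorted (a, b, support, confidence) tuples for rules a -> b.

    docs: list of keyword sets, one per article.
    support(a, b)    = fraction of articles containing both a and b
    confidence(a->b) = articles containing a and b / articles containing a
    """
    n = len(docs)
    item_counts = Counter()
    pair_counts = Counter()
    for keywords in docs:
        item_counts.update(keywords)
        # sort so each unordered pair is always counted under the same key
        pair_counts.update(combinations(sorted(keywords), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # emit both directions; confidence differs by direction
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)

# hypothetical keyword sets for four articles
articles = [
    {"airport", "busan", "gadeokdo"},
    {"airport", "busan", "government"},
    {"airport", "miryang", "government"},
    {"airport", "busan"},
]
for a, b, s, c in pair_rules(articles):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

The same pair counts that drive the rules can also size the terms in a word cloud, which is why frequency and association analysis are often run together.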

The Analysis of Research Trends in Technology to the Fourth Industrial Revolution using SNA (소셜 네트워크 분석을 이용한 4차 산업혁명 기술 분야의 연구 동향 분석)

  • Kim, Hong-Gwang;Ahn, Jong-Wook
    • Journal of Cadastre & Land InformatiX / v.49 no.1 / pp.113-121 / 2019
  • Fourth Industrial Revolution technologies focus on the fusion of city-related infrastructure with various advanced technologies. Technical cooperation across many fields of research is therefore essential, and promoting these technologies requires research into the state of technology in each field. Accordingly, this paper analyzes domestic and foreign research trends in Fourth Industrial Revolution technologies using SNA and text mining of websites. We collected the text and date data of research papers and reports from websites over five years, from January 1, 2014 to December 31, 2018. Next, we derived the major keywords in the public data through morphological analysis. We then analyzed the core and related keyword lists through SNA. In Korea, the focus is on R&D and legal/institutional solutions related to Fourth Industrial Revolution technology. In contrast, foreign research focuses on practical, detailed technologies for urban services.
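The SNA step can be illustrated with a keyword co-occurrence network and normalized degree centrality. This is a sketch under stated assumptions. Morphological analysis is taken as already done, so each paper is a list of keywords. An edge joins two keywords that appear in the same paper. The functions `cooccurrence_graph` and `degree_centrality` and the sample keywords are hypothetical. The abstract does not say which centrality measures the authors used.

```python
from itertools import combinations
from collections import defaultdict

def cooccurrence_graph(docs):
    """Build an undirected keyword network: an edge joins two keywords
    that appear together in at least one document. Keywords that never
    co-occur with another keyword do not become nodes."""
    graph = defaultdict(set)
    for keywords in docs:
        for a, b in combinations(set(keywords), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Normalized degree centrality: degree / (number of nodes - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

# hypothetical keyword lists after morphological analysis
papers = [
    ["smart city", "iot", "big data"],
    ["iot", "ai", "r&d"],
    ["smart city", "ai", "iot"],
]
centrality = degree_centrality(cooccurrence_graph(papers))
for kw, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{kw}: {c:.2f}")
```

Keywords with the highest centrality are candidates for the "core" list, and their direct neighbors form the related-keyword lists.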

An Analysis of the 2017 Korean Presidential Election Using Text Mining (텍스트 마이닝을 활용한 2017년 한국 대선 분석)

  • An, Eunhee;An, Jungkook
    • Journal of the Korea Convergence Society / v.11 no.5 / pp.199-207 / 2020
  • Recently, big data analysis has drawn attention in various fields because it can generate value from large amounts of data; it is also used to run political campaigns and predict outcomes. However, existing research was limited in compiling high-level information about candidates because it analyzed only specific SNS data. This study therefore performs news trend analysis, topic extraction, sentiment analysis, keyword analysis, and comment analysis for the 2017 South Korean presidential election. The results show that various topics were generated, and online opinions were extracted for the trending keywords of each candidate. This study also shows that portal news and comments can serve as useful tools for predicting public opinion on social issues. By providing a method of analyzing public opinion across various fields, this paper supports building strategic courses of action.
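The abstract does not describe its sentiment method. As a rough illustration of one common approach, lexicon-based scoring, here is a minimal sketch. It assumes comments are already tokenized. The polarity lexicon, the function name `sentiment_score`, and the sample comments are all hypothetical.

```python
def sentiment_score(tokens, lexicon):
    """Lexicon-based polarity: mean score of the tokens found in the
    lexicon, or 0.0 when none of the tokens are in it."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

# hypothetical polarity lexicon (+1 positive, -1 negative)
lexicon = {"support": 1, "trust": 1, "great": 1, "scandal": -1, "distrust": -1}

# hypothetical tokenized comments
comments = [
    ["great", "candidate", "support"],
    ["scandal", "again", "distrust"],
    ["support", "but", "scandal"],
]
for c in comments:
    print(" ".join(c), "->", sentiment_score(c, lexicon))
```

Averaging over matched tokens keeps long and short comments on the same scale. A real system would also need to handle negation and Korean morphology.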

Analysis of Public Perception and Policy Implications of Foreign Workers through Social Big Data analysis (소셜 빅데이터분석을 통한 외국인근로자에 관한 국민 인식 분석과 정책적 함의)

  • Ha, Jae-Been;Lee, Do-Eun
    • Journal of Digital Convergence / v.19 no.11 / pp.1-10 / 2021
  • This paper examines public awareness of foreign workers on social platforms using text mining, a big data technique, and draws suggestions regarding foreign workers. To this end, data were collected with the search keyword 'Foreign Worker' from Jan. 1 to Dec. 31, 2020. Frequency analysis, TF-IDF analysis, and degree centrality analysis were then performed, and the top 100 keywords were drawn for comparison. Furthermore, Ucinet6.0 and Netdraw were used to analyze semantic networks, and CONCOR analysis clustered the data into eight groups: foreigner policy issues, regional community issues, business owners' perspective issues, employment issues, working environment issues, legal issues, immigration issues, and human rights issues. Based on these results, the study identifies national awareness of foreign workers and the main issues, and it provides basic data for policy proposals on foreign workers and related research.
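TF-IDF weighting favors terms that are frequent in one document but rare across the collection. The sketch below uses one common formulation: raw term count times log(N / df). That choice is an assumption, since the abstract does not specify the variant, and tools differ in smoothing and normalization. The function name `tf_idf` and the sample documents are hypothetical, and tokenization is assumed to be done.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

# hypothetical tokenized posts
docs = [
    ["foreign", "worker", "employment", "employment"],
    ["foreign", "worker", "policy"],
    ["foreign", "worker", "human rights"],
]
weights = tf_idf(docs)
for w in weights:
    print(max(w, key=w.get))  # highest-weighted term per document
```

Terms that occur in every document, such as the search keyword itself, receive an IDF of log(1) = 0. This is why TF-IDF rankings can differ sharply from plain frequency rankings.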

News Article Big Data Analysis based on Machine Learning in Distributed Processing Environments (분산 처리 환경에서의 기계학습 기반의 뉴스 기사 빅 데이터 분석)

  • Oh, Hee-bin;Lee, Jeong-cheol;Kim, Kyungsup
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.59-62 / 2017
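  • This paper presents a system that uses machine learning in a distributed processing environment to analyze text-form big data and produce meaningful data. We designed and implemented a distributed processing system that analyzes the relatedness between keywords in news articles, one type of big data, using machine learning (Word2Vec) in a distributed system environment (Spark). We also designed a visualization system that lets users see at a glance the keywords related to a search term they enter.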
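Training Word2Vec on Spark is beyond a short example. The query step the abstract describes, finding keywords related to a search term, usually reduces to ranking word vectors by cosine similarity. The sketch below assumes the embeddings are already trained and uses made-up 3-dimensional vectors. The function names `cosine` and `related_keywords` are hypothetical. In Spark itself, `pyspark.ml.feature.Word2VecModel` provides `findSynonyms(word, num)` for this kind of lookup.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def related_keywords(query, vectors, k=2):
    """Rank every other keyword by cosine similarity to the query keyword."""
    q = vectors[query]
    scores = [(w, cosine(q, v)) for w, v in vectors.items() if w != query]
    scores.sort(key=lambda ws: ws[1], reverse=True)
    return scores[:k]

# toy vectors (hypothetical; trained Word2Vec vectors have far more dimensions)
vectors = {
    "election": [0.9, 0.1, 0.0],
    "candidate": [0.8, 0.2, 0.1],
    "vote": [0.7, 0.0, 0.2],
    "weather": [0.0, 0.1, 0.9],
}
print(related_keywords("election", vectors))
```

The returned (keyword, similarity) pairs are the natural input for a visualization like the one described. For example, the similarity could serve as edge weight or distance from the query term.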

IoT-based Feature Selection Technique Research Trend (IoT 기반의 특징 선택 기법 연구 동향)

  • Lim, Hwan-Hee;Lee, Tae-Ho;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2018.07a / pp.41-42 / 2018
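  • Feature selection is a machine learning method that analyzes many features to find the subset of features that yields the best performance, in order to improve classification accuracy. Feature selection is studied mainly in application areas that use datasets with hundreds of thousands of variables. These areas typically analyze high-dimensional data, as in text processing and gene sequence analysis. In addition, because IoT environments process large amounts of data, feature selection techniques are essential for classifying and processing that data. This paper describes feature selection techniques and proposes a feature selection technique for IoT environments.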

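The abstract does not say which feature selection technique it proposes. As a generic illustration of the simplest family, filter methods, the sketch below keeps the k features with the highest variance. This is an assumed stand-in, not the paper's IoT technique. The function name `select_top_k_by_variance` and the sensor readings are hypothetical.

```python
from statistics import pvariance

def select_top_k_by_variance(rows, k):
    """Filter-style feature selection: keep the k columns with the highest
    population variance. Returns the selected column indices (in original
    order) and the rows reduced to those columns."""
    n_features = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    selected = sorted(ranked[:k])
    reduced = [[row[j] for j in selected] for row in rows]
    return selected, reduced

# hypothetical sensor readings: temperature, constant status flag, humidity, battery voltage
readings = [
    [21.0, 1, 40.0, 3.7],
    [23.5, 1, 55.0, 3.7],
    [19.0, 1, 35.0, 3.6],
    [25.0, 1, 60.0, 3.6],
]
cols, reduced = select_top_k_by_variance(readings, k=2)
print(cols)
```

Raw variance depends on each feature's units. In practice, features are usually standardized first, or a supervised score (for example, correlation with the class label) is used instead. The constant status flag shows the clearest win: a zero-variance feature carries no information and is always dropped.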

A Comparative Study on the Social Awareness of Metaverse in Korea and China: Using Big Data Analysis (한국과 중국의 메타버스에 관한 사회적 인식의 비교연구: 빅데이터 분석의 활용)

  • Ki-youn Kim
    • Journal of Internet Computing and Services / v.24 no.1 / pp.71-86 / 2023
  • The purpose of this exploratory study is to use big data analysis to compare public perceptions of the metaverse in Korean and Chinese societies. Owing to the environmental impact of the COVID-19 pandemic, technological progress, and the expansion of new consumer bases such as Generation Z and Generation Alpha, worldwide interest in the metaverse is growing, and related academic studies have been in full swing since 2021. In particular, Korea and China have emerged as leading countries in the metaverse industry. With mentions of the metaverse skyrocketing, investigating differences in social awareness through the big data accumulated in both countries is a timely research question. The analysis identifies the importance of keywords by analyzing the word frequency, N-grams, and TF-IDF of cleaned data through text mining. It then analyzes the density and centrality of semantic networks to determine the strength of connections between words and their semantic relevance. Python 3.9 (Anaconda data science platform 3) and Textom version 6 were used, and UCINET 6.759 was used for analysis and visualization in the semantic network analysis and structural CONCOR analysis. As a result, four blocks, each a group of similar words, were derived. These blocks represent different perspectives that reflect the types of social perception of the metaverse in both countries. Although studies on the metaverse are increasing, comparative studies between countries from a cross-cultural perspective have not yet been conducted. As a preceding study, this study can therefore provide theoretical grounds and meaningful insights for future studies.
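The N-gram step counts contiguous word sequences across the cleaned corpus, so that frequent phrases can be ranked like single words. The following is a minimal sketch, assuming the posts are already cleaned and tokenized. The function names `ngrams` and `top_ngrams` and the sample posts are hypothetical. The abstract's own pipeline used Python together with Textom.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-token tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(docs, n, k):
    """Count n-grams across all documents and return the k most frequent."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)

# hypothetical cleaned, tokenized posts
posts = [
    ["metaverse", "platform", "virtual", "world"],
    ["metaverse", "platform", "investment"],
    ["virtual", "world", "metaverse", "platform"],
]
print(top_ngrams(posts, n=1, k=2))  # word frequency
print(top_ngrams(posts, n=2, k=2))  # bigram frequency
```

Setting `n=1` gives plain word frequency. Higher `n` surfaces multi-word expressions such as "virtual world" that single-word counts would split apart.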

A Study on Improvement of Image Classification Accuracy Using Image-Text Pairs (이미지-텍스트 쌍을 활용한 이미지 분류 정확도 향상에 관한 연구)

  • Mi-Hui Kim;Ju-Hyeok Lee
    • Journal of IKEEE / v.27 no.4 / pp.561-566 / 2023
  • With the development of deep learning, it has become possible to solve various computing problems such as image processing. However, most image processing methods rely only on the visual information in the image itself. Text data related to images, such as descriptions and annotations, may provide additional tactile and visual information that is difficult to obtain from the image alone. In this paper, we aim to improve image classification accuracy with a deep learning model that analyzes both images and text using image-text pairs. The proposed model improved classification accuracy by approximately 11% over a deep learning model that uses only image information.

Self-Supervised Document Representation Method

  • Yun, Yeoil;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.25 no.5 / pp.187-197 / 2020
  • Recently, various text embedding methods based on deep learning algorithms have been proposed. In particular, pre-trained language models trained on tremendous amounts of text data are now the main approach for embedding new text data. However, traditional pre-trained language models have a limitation: they struggle to capture the unique context of new text data when the text has too many tokens. In this paper, we propose a self-supervised learning-based fine-tuning method for pre-trained language models that infers vectors for long texts. We also applied our method to news articles, classified them into categories, and compared the classification accuracy with that of traditional models. The results confirmed that the vectors generated by the proposed model express the inherent characteristics of documents more accurately than those generated by the traditional models.