• Title/Summary/Keyword: text data analysis (텍스트 데이터 분석)

Search Result 1,095, Processing Time 0.027 seconds

Sentimental Analysis using the Phoneme-level Embedding Model (음소 단위 임베딩 모형을 이용한 감성 분석)

  • Hyun, Kyeongseok;Choi, Woosung;Jung, Soon-young;Chung, Jaehwa
    • Proceedings of the Korea Information Processing Society Conference / 2019.10a / pp.1030-1032 / 2019
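  • Prior work has studied morpheme-level embedding and learning of Korean sentences through morphological analysis, but with the recent growth of unstructured text data, demand is rising for neural network training based on phoneme-level embedding. To improve sentiment analysis performance on unstructured text, this paper generates phoneme-level tokens, performs multidimensional embedding on them with a CNN-based model, and trains a bidirectional recurrent neural network model on unstructured YouTube text for sentiment analysis. The resulting model achieved 90% accuracy in classifying text as positive or negative.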
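
The phoneme-level tokenization step can be illustrated with a minimal sketch. It assumes that "phoneme-level tokens" correspond to Hangul jamo obtained through Unicode canonical decomposition (NFD); the abstract does not say how the authors tokenized, and the function name `to_jamo_tokens` is hypothetical.

```python
import unicodedata


def to_jamo_tokens(text: str) -> list[str]:
    """Split text into jamo-level tokens (assumed stand-in for phoneme tokens)."""
    # NFD decomposes each precomposed Hangul syllable into its conjoining
    # jamo: an initial consonant, a vowel, and an optional final consonant.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop whitespace so every token is one jamo or one other character.
    return [ch for ch in decomposed if not ch.isspace()]


tokens = to_jamo_tokens("한글")
print(len(tokens))  # 6: three jamo for each of the two syllables
```

Note that NFD also decomposes accented Latin letters, so mixed-script input may need separate handling. The resulting token sequence is what an embedding layer would consume in a pipeline like the one described.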

Development of big data based Skin Care Information System SCIS for skin condition diagnosis and management

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.137-147 / 2022
  • Diagnosing and managing skin conditions is a basic and important function for workers in the beauty and cosmetics industries. Accurate diagnosis and management require understanding customers' skin conditions and needs. In this paper, we developed SCIS, a big data-based skin care information system that uses social media big data to support skin condition diagnosis and management. The system analyzes text information and extracts the core information needed for diagnosis and management. SCIS consists of a big data collection stage, a text preprocessing stage, an image preprocessing stage, and a text word analysis stage. SCIS collected the big data needed for skin diagnosis and management and extracted key words and topics from the text through simple frequency analysis, relative frequency analysis, co-occurrence analysis, and correlation analysis of key words. In addition, analyzing the extracted key words and information and visualizing them in various ways, such as scatter plots, NetworkX, t-SNE, and clustering, makes the system useful for diagnosing and managing skin conditions.
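
As a rough illustration of the first two analyses named above, the sketch below computes simple and relative keyword frequencies. It assumes whitespace tokenization and takes "relative frequency" to mean a word's share of all tokens; the abstract defines neither, and the sample posts and the function name `keyword_frequencies` are made up.

```python
from collections import Counter


def keyword_frequencies(posts: list[str]) -> tuple[Counter, dict[str, float]]:
    """Return raw counts and each word's share of all tokens."""
    tokens = [word for post in posts for word in post.lower().split()]
    counts = Counter(tokens)
    total = sum(counts.values())
    # Relative frequency here is count / total tokens (an assumed definition).
    relative = {word: n / total for word, n in counts.items()} if total else {}
    return counts, relative


# Made-up posts standing in for collected social media text.
posts = ["dry skin needs moisture", "oily skin needs cleansing"]
counts, relative = keyword_frequencies(posts)
print(counts.most_common(2))
print(relative["skin"])
```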

A Probabilistic Method for Recognizing Unlabeled Text on Web Pages (웹페이지에서 레이블이 없는 텍스트 인식을 위한 확률 모델)

  • 정창후;이민호;주원균;맹성현
    • Proceedings of the Korean Information Science Society Conference / 2003.10a / pp.163-165 / 2003
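  • Domain knowledge uses a text's format and semantic information to help interpret the varied meanings of text on the Web. However, when a text carries no label expressing the meaning of its data, domain knowledge cannot recognize the text properly and becomes useless. To address this problem, this paper proposes an entity recognition model that can effectively infer the meaning of unlabeled text. The model combines a Bayesian model with context information and decides which entity each text token of a structurally analyzed HTML document belongs to. Experiments confirmed that the model effectively recognizes text that previously went unrecognized for lack of a label.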

Research Trends of Adversarial Attack Techniques in Text (텍스트 분야 적대적 공격 기법 연구 동향)

  • Kim, Bo-Geum;Kang, Hyo-Eun;Kim, Yongsu;Kim, Ho-Won
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.420-422 / 2022
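  • As artificial intelligence is applied across everyday life, in areas such as document classification, face recognition, and autonomous driving, techniques for identifying and guarding against vulnerabilities in AI models in advance are becoming more important. In the image domain, adversarial attacks, which fool a neural network by adding small perturbations to the input data, have been studied actively, but in the text domain the discrete nature of text data makes such research difficult. This paper analyzes adversarial attack techniques against text-domain AI and examines the need for further research.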
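
To make the discreteness point concrete, here is a toy sketch of a character-level perturbation. It is not a technique taken from the paper: the keyword-lookup `toy_sentiment` function is a hypothetical stand-in for a trained model, and the adjacent-character swap is just one simple example of a discrete edit.

```python
from typing import Optional


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a trained classifier: keyword lookup."""
    negative_words = {"terrible", "awful", "bad"}
    words = text.lower().split()
    return "negative" if any(w in negative_words for w in words) else "positive"


def swap_adjacent(word: str, i: int) -> str:
    """Swap characters i and i+1, a discrete character-level perturbation."""
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def attack(text: str) -> Optional[str]:
    """Return the first single-swap variant that flips the label, if any."""
    original_label = toy_sentiment(text)
    words = text.split()
    for wi, word in enumerate(words):
        for ci in range(len(word) - 1):
            candidate = words[:wi] + [swap_adjacent(word, ci)] + words[wi + 1:]
            perturbed = " ".join(candidate)
            if toy_sentiment(perturbed) != original_label:
                return perturbed
    return None


print(attack("the movie was terrible"))  # the movie was etrrible
```

Unlike an image perturbation, the edit cannot be made arbitrarily small: the search runs over a finite set of whole-character changes, which is why gradient-based methods from the image domain do not transfer directly.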

Identifying Research Trends in Big data-driven Digital Transformation Using Text Mining (텍스트마이닝을 활용한 빅데이터 기반의 디지털 트랜스포메이션 연구동향 파악)

  • Minjun, Kim
    • Smart Media Journal / v.11 no.10 / pp.54-64 / 2022
  • A big data-driven digital transformation is defined as a process that aims to innovate companies by triggering significant changes to their capabilities and designs through the use of big data and various technologies. For a successful big data-driven digital transformation, reviewing related literature, which enhances the understanding of research statuses and the identification of key research topics and relationships among key topics, is necessary. However, understanding and describing literature is challenging, considering its volume and variety. Establishing a common ground for central concepts is essential for science. To clarify key research topics on the big data-driven digital transformation, we carry out a comprehensive literature review by performing text mining of 439 articles. Text mining is applied to learn and identify specific topics, and the suggested key references are manually reviewed to develop a state-of-the-art overview. A total of 10 key research topics and relationships among the topics are identified. This study contributes to clarifying a systematized view of dispersed studies on big data-driven digital transformation across multiple disciplines and encourages further academic discussions and industrial transformation.

Analysis of VR Game Trends using Text Mining and Word Cloud -Focusing on STEAM review data- (텍스트마이닝과 워드 클라우드를 활용한 VR 게임 트렌드 분석 -스팀(steam) 리뷰 데이터를 중심으로-)

  • Na, Ji Young
    • Journal of Korea Game Society / v.22 no.1 / pp.87-98 / 2022
  • With advances in technologies related to the fourth industrial revolution and growing demand for non-face-to-face services, VR games are attracting attention. This study collected VR game review data from the online game platform STEAM and analyzed chronological trends using text mining and word cloud analysis. The results show that experience and perceived cost were the major trends from 2016 to 2017, increased demand for FPS and rhythm games from 2018 to 2019, and story and immersion from 2020 to 2021. The study aims to help broaden the base of VR games by identifying, period by period, the keywords that VR users are interested in.

TAGS: Text Augmentation with Generation and Selection (생성-선정을 통한 텍스트 증강 프레임워크)

  • Kim Kyung Min;Dong Hwan Kim;Seongung Jo;Heung-Seon Oh;Myeong-Ha Hwang
    • KIPS Transactions on Software and Data Engineering / v.12 no.10 / pp.455-460 / 2023
  • Text augmentation is a methodology that creates new augmented texts by transforming or generating original texts in order to improve the performance of NLP models. However, existing text augmentation techniques have limitations such as a lack of expressive diversity, semantic distortion, and a limited number of augmented texts. Recently, text augmentation using large language models and few-shot learning has been able to overcome these limitations, but it carries a risk of noise from incorrect generation. In this paper, we propose a text augmentation method called TAGS that generates multiple candidate texts and selects the appropriate ones as augmented texts. TAGS generates varied expressions using few-shot learning while effectively selecting suitable data, even from a small amount of original text, by using contrastive learning and similarity comparison. We applied this method to task-oriented chatbot data and achieved a more than sixtyfold quantitative increase. We also analyzed the generated texts and confirmed that they are semantically and expressively diverse compared to the original texts. Moreover, we trained and evaluated a classification model using the augmented texts and showed that performance improved by more than 0.1915, confirming that the method helps improve actual model performance.
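
The generate-then-select idea can be sketched as follows. This is not the TAGS implementation: the candidate list is hard-coded where a few-shot language model would generate text, bag-of-words cosine similarity stands in for the contrastively learned similarity, and the `low` and `high` thresholds are arbitrary assumptions.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (stand-in for a learned similarity)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_augmented(original: str, candidates: list[str],
                     low: float = 0.3, high: float = 0.95) -> list[str]:
    """Keep candidates close enough to preserve meaning but not near-copies."""
    return [c for c in candidates if low <= cosine(original, c) <= high]


original = "book a table for two"
# Hard-coded candidates; in the described method a few-shot LLM would generate them.
candidates = [
    "book a table for two",            # exact copy, adds no diversity
    "reserve a table for two people",  # paraphrase
    "what is the weather today",       # off-topic noise
]
print(select_augmented(original, candidates))
# ['reserve a table for two people']
```

The two thresholds express the trade-off the abstract describes: the lower bound filters semantically distorted or noisy generations, and the upper bound filters candidates that add no expressive diversity.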

Frequency and Social Network Analysis of the Bible Data using Big Data Analytics Tools R (빅데이터 분석도구 R을 이용한 성경 데이터의 빈도와 소셜 네트워크 분석)

  • Ban, ChaeHoon;Ha, JongSoo;Kim, Dong Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.166-171 / 2020
  • Big data processing technology, which can store and analyze data to obtain new knowledge, has grown in importance in many fields of society. Big data is emerging as an important problem in information and communication technology, and interest in the technology continues to grow. R, a tool for analyzing big data, is a language and environment for statistical analysis. In this paper, we use R to analyze Bible data, specifically the four Gospels of the New Testament. We collect the Bible data and filter it for analysis. We then use R to examine how frequently words occur in the text and to analyze the Bible through social network analysis, in which the words in each sentence are paired and the relationships between words are analyzed.
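
The word-pairing step behind the social network analysis can be sketched as below. The paper used R; Python is used here only for illustration. The sketch assumes that every pair of distinct words within a sentence forms an edge and uses weighted degree as the centrality measure, neither of which the abstract specifies. The sample sentences are made up, not Bible text.

```python
from collections import Counter
from itertools import combinations


def word_pair_network(sentences: list[str]) -> Counter:
    """Count how often each unordered word pair shares a sentence."""
    edges = Counter()
    for sentence in sentences:
        # Sort the distinct words so each pair has one canonical order.
        words = sorted(set(sentence.lower().split()))
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges: Counter) -> Counter:
    """Sum edge weights per word as a simple centrality measure."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Made-up sentences standing in for filtered text.
sentences = ["light world", "light world life"]
edges = word_pair_network(sentences)
print(edges[("light", "world")])         # 2
print(weighted_degree(edges)["life"])    # 2
```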

A Study on the Method for Extracting the Purpose-Specific Customized Information from Online Product Reviews based on Text Mining (텍스트 마이닝 기반의 온라인 상품 리뷰 추출을 통한 목적별 맞춤화 정보 도출 방법론 연구)

  • Kim, Joo Young;Kim, Dong soo
    • The Journal of Society for e-Business Studies / v.21 no.2 / pp.151-161 / 2016
  • In the era of Web 2.0, characterized by openness, sharing, and participation, internet users can easily produce and share data. Unstructured data, which accounts for most of the digital world's data, has grown exponentially. Personal online product reviews, one kind of unstructured data, are valuable both to the companies that make the products and to potential customers interested in them. Extracting useful information from large volumes of scattered review data requires collecting, storing, preprocessing, and analyzing the data and then drawing conclusions. We therefore introduce a text-mining methodology, implemented in R, that applies natural language processing to text data such as product reviews in order to extract structured data. We also introduce a data-mining step that derives purpose-specific customized information from the structured review information produced by text mining.

Learning Emotional States of Chatting Partners from Text Data (채팅 텍스트로부터의 회자 감정상태 학습)

  • 문현구;장벽탁
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.340-342 / 2001
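  • In today's Internet environment, text is the most widely used means of communication because it is easy to handle and imposes little load. Unlike video chat, however, it offers no way to convey facial expressions or gestures, which limits expressiveness. This paper describes a method that takes everyday conversation as text input and uses the naive Bayes algorithm to classify it into predefined emotion categories such as crying, laughing, and getting angry. Training data collected from a chat site is tagged by people with the corresponding emotion category; once the learning engine has built statistics from the tagged data, the emotion recognition engine analyzes input on a live chat site and classifies it into the matching emotion. In tests on 1,000 sentences from each of the chat rooms, which were divided into five age groups, the method achieved an average accuracy of 91.6%.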
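
The tag-then-classify workflow described above can be sketched with a small multinomial naive Bayes classifier. This is an illustration under assumptions, not the authors' engine: it uses whitespace tokens, add-one (Laplace) smoothing, and a handful of made-up English training sentences in place of the tagged chat data.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesEmotion:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        self.word_counts: defaultdict = defaultdict(Counter)
        self.vocab: set = set()

    def fit(self, samples: list[tuple[str, str]]) -> None:
        # Build the per-class word statistics from hand-tagged sentences.
        for text, label in samples:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in text.lower().split():
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up tagged sentences standing in for the chat-site training data.
training = [
    ("haha that is so funny", "laughing"),
    ("lol funny joke", "laughing"),
    ("i am so sad today", "crying"),
    ("sad and lonely", "crying"),
    ("stop it i am furious", "angry"),
    ("this makes me furious", "angry"),
]
model = NaiveBayesEmotion()
model.fit(training)
print(model.predict("that joke is funny"))  # laughing
```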