• Title/Summary/Keyword: 비정형분석 (unstructured analysis)


A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so it must be cleansed before it can be analyzed. It becomes analyzable structured data only after a heuristic pre-processing refinement step and a machine post-processing refinement step. In this study, the post-processing step uses a Korean dictionary and a stopword dictionary to extract the vocabulary for the frequency analysis behind a word cloud. "User-defined stopwords" are then used to efficiently remove the stopwords that were not caught. We propose a methodology for applying a "user-defined stopword thesaurus", which complements the shortcomings of the existing "stopword dictionary" method, and examine its pros and cons through a case analysis using R's word cloud technique. We present a comparative verification and discuss the practical effectiveness of the proposed methodology.
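
The post-processing step described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the R used in the paper, and the stopword dictionary, the thesaurus entries, and the function names are all hypothetical; it only shows the general idea of merging a general stopword dictionary with a user-defined thesaurus before counting word frequencies.

```python
from collections import Counter

# Hypothetical general stopword dictionary (not taken from the paper).
GENERAL_STOPWORDS = {"the", "a", "of", "and", "is"}

# Hypothetical user-defined stopword thesaurus:
# head stopword -> variants that should be removed together with it.
USER_STOPWORD_THESAURUS = {
    "data": {"datas", "dataset"},
}

def build_stopword_set(general, thesaurus):
    """Merge the general dictionary with every head word and variant of the thesaurus."""
    stopwords = set(general)
    for head, variants in thesaurus.items():
        stopwords.add(head)
        stopwords.update(variants)
    return stopwords

def word_frequencies(tokens, stopwords):
    """Count lower-cased tokens that are not stopwords (the input to a word cloud)."""
    return Counter(t.lower() for t in tokens if t.lower() not in stopwords)

tokens = "the analysis of data and the dataset is text analysis".split()
stopwords = build_stopword_set(GENERAL_STOPWORDS, USER_STOPWORD_THESAURUS)
freqs = word_frequencies(tokens, stopwords)
print(freqs.most_common())
```

Grouping variants under a head word means a single thesaurus entry removes a whole family of unwanted terms, which is one plausible reading of why a thesaurus is preferred over a flat stopword list.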

Sentiment Prediction using Emotion and Context Information in Unstructured Documents (비정형 문서에서 감정과 상황 정보를 이용한 감성 예측)

  • Kim, Jin-Su
    • Journal of Convergence for Information Technology / v.10 no.10 / pp.40-46 / 2020
  • With the development of the Internet, users share their experiences and opinions. Because related keywords are used without considering information such as the overall emotion or genre of an unstructured document such as a movie review, the accuracy of sentiment classification for the given emotional context is impaired. Therefore, we propose a system that predicts emotions based on information such as the genre of the user-created unstructured document or its overall emotion. First, representative keywords related to emotion sets such as Joy, Anger, Fear, and Sadness are extracted from the unstructured document, and the normalized weights of the emotional feature words, together with information about the unstructured document, are used as a training set for a system that combines CNN and LSTM. Finally, tests on the refined words extracted through movie information, a morpheme analyzer and n-grams, emoticons, and emojis showed that the accuracy and F-measure of emotion prediction improved. The proposed prediction system can predict sentiment appropriately for the situation by avoiding the error of judging a review as negative because sad words are used in sad movies and scary words in horror movies.

A Study on the Use of Stopword Corpus for Cleansing Unstructured Text Data (비정형 텍스트 데이터 정제를 위한 불용어 코퍼스의 활용에 관한 연구)

  • Lee, Won-Jo
    • The Journal of the Convergence on Culture Technology / v.8 no.6 / pp.891-897 / 2022
  • In big data analysis, raw text data mostly exists in various unstructured forms, so it becomes analyzable structured data only after heuristic pre-processing and computer post-processing cleansing. In this study, in order to apply the wordcloud of the R program, one of the text data analysis techniques, unnecessary elements are removed from the collected raw data in pre-processing, and stopwords are removed in post-processing. A case study of wordcloud analysis was then conducted, which calculates word frequencies and presents high-frequency words as key issues. To improve on the problems of the existing stopword processing method, the "nested stopword source code" method, we propose the use of a "general stopword corpus" and a "user-defined stopword corpus" with R's word cloud technique and conduct a case analysis. The advantages and disadvantages of the proposed "unstructured data cleansing process model" are comparatively verified and presented, along with the practical application of word cloud visualization analysis using the proposed "external corpus cleansing technique".

3D Object State Extraction Through Adjective Analysis from Informal Requirements Specs (비정형 요구사항 스펙에서 형용사 분석을 통한 3D 객체 상태 추출화)

  • Ye Jin Jin;Chae Yun Seo;Ji Hoon Kong;R. Young Chul Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.10 / pp.529-536 / 2024
  • Recent advancements in AI technology have led to its application across various fields. However, the lack of transparency in AI operations makes it challenging to guarantee the quality of its outputs. Therefore, we integrate requirements engineering in software engineering with conversational AI technology to ensure procedural fairness. Traditional requirements engineering research uses grammar-centered analysis, which often fails to fully interpret the semantic aspects of natural language. To solve this, we suggest combining Noam Chomsky's syntactic structure analysis with Charles Fillmore's semantic role theory. Additionally, we extend our previous research by analyzing adjectives in informal requirement sentence structures. This enables precise emotional analysis of the main characters in comics. Based on the results of the analysis, we apply the emotional states of the objects to the states in the UML state diagram. Then, we create the 3D object with Three.js based on the object that reflects the emotional states in the state diagram. With this approach, we expect to represent the emotional state of a 3D object.

A Study on the Relationship between the Eccentricity and the Level of Damage in the Seismic Response of Buildings with Plan Irregularities (지진 하중을 받는 평면 비정형 건물의 편심과 손상도의 상관관계에 대한 연구)

  • Jeong, Seoung-Hoon;Lee, Kwang-Ho
    • Journal of the Earthquake Engineering Society of Korea / v.14 no.3 / pp.49-57 / 2010
  • Most previous research on the seismic response of structures with plan irregularities has focused on the relationship between the eccentricity and the amount of torsion. This approach cannot provide a direct relationship between the irregularity and the damage. Therefore, the relationship between the eccentricities of buildings with plan irregularities and the damage index was investigated. Inelastic dynamic time-history analyses were performed on one-story buildings with various eccentricities. For the damage assessment, a 3D damage index was adopted to reflect the effect of the bi-directional response and torsion. Based on the analysis results, buildings with eccentricities of 10%, 20%, and 30% will suffer 3~5%, 13~18%, and 33~47% more damage than their regular counterparts, respectively.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.521-526 / 2016
  • Because online informal text data is massive in volume and unstructured in nature, traditional relational data model technologies have limitations for data storage and data analysis jobs. Moreover, real-time analysis of social users' reactions is hard to accomplish with dynamically generated, massive social data. In this paper, to easily capture the semantics of massive, informal online documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system according to the mass of the words that constitute a document. The input data set for the proposed system is first generated using an N-gram algorithm, which builds multi-word units to capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiment phase, TB-level input data is preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well in extracting meaningful topics in a timely manner, because intermediate results come directly from main memory instead of being read from an HDD.
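
The N-gram step mentioned in this abstract can be sketched in a few lines. This is a plain single-machine Python illustration under the assumption of whitespace tokenization; the function name and sample sentence are hypothetical, and the paper's own system runs on Hadoop and Spark rather than on code like this.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one multi-word unit."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical sample document, tokenized on whitespace.
doc = "real time social data analysis".split()

bigrams = ngrams(doc, 2)
trigrams = ngrams(doc, 3)
print(bigrams)
print(trigrams)
```

Multi-word units such as "social data" keep context that the single words "social" and "data" lose, which is the stated motivation for feeding N-grams, rather than single words, to the topic model.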

Multi-Dimensional Keyword Search and Analysis of Hotel Review Data Using Multi-Dimensional Text Cubes (다차원 텍스트 큐브를 이용한 호텔 리뷰 데이터의 다차원 키워드 검색 및 분석)

  • Kim, Namsoo;Lee, Suan;Jo, Sunhwa;Kim, Jinho
    • Journal of Information Technology and Architecture / v.11 no.1 / pp.63-73 / 2014
  • With the advance of the WWW, unstructured data, including text, is attracting more and more user interest. Because this unstructured data created by WWW users represents their subjective opinions, analyzing it appropriately yields very useful information such as users' personal tastes or perspectives. In this paper, we provide various efficient analyses of unstructured text documents by taking advantage of OLAP (On-Line Analytical Processing) multidimensional cube technology. OLAP cubes have been widely used for multidimensional analysis of structured data such as simple alphabetic and numeric data, but they have not been used for unstructured data consisting of long texts. To provide multidimensional analysis of unstructured text data, the Text Cube model was recently proposed. It incorporates term frequency and inverted index, which play key roles in information retrieval, as measures for searching and analyzing text databases. The primary goal of this paper is to apply the text cube model to a real data set from an Internet site that shares hotel information and to provide multidimensional analysis of users' written hotel reviews. To achieve this goal, we first build text cubes for the hotel review data. Using the text cubes, we design and implement a system that provides multidimensional keyword search features for searching and analyzing review texts along various dimensions. This system can help users easily obtain valuable summary information that reflects guests' subjective opinions. Furthermore, this paper evaluates the proposed system through various experiments, which reveal its effectiveness.
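
The idea of a text cube whose cells carry term frequency and an inverted index can be sketched as follows. This is a toy in-memory Python illustration, not the system built in the paper: the review records, the dimension names `city` and `stars`, and the functions `build_text_cube` and `keyword_search` are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical hotel reviews with two dimensions (city, stars) and a text field.
reviews = [
    {"id": 1, "city": "Seoul", "stars": 5, "text": "clean room friendly staff"},
    {"id": 2, "city": "Seoul", "stars": 2, "text": "noisy room"},
    {"id": 3, "city": "Busan", "stars": 5, "text": "clean beach view"},
]

def build_text_cube(docs, dims):
    """Group documents into cells keyed by their dimension values.

    Each cell stores two measures: term frequency ("tf") and an
    inverted index mapping a term to the ids of documents containing it.
    """
    cube = defaultdict(lambda: {"tf": Counter(), "index": defaultdict(set)})
    for doc in docs:
        cell = cube[tuple(doc[d] for d in dims)]
        for term in doc["text"].split():
            cell["tf"][term] += 1
            cell["index"][term].add(doc["id"])
    return cube

def keyword_search(cube, dims, term, **where):
    """Return sorted ids of documents containing term, restricted to cells matching where."""
    hits = set()
    for key, cell in cube.items():
        coords = dict(zip(dims, key))
        if all(coords[d] == v for d, v in where.items()):
            hits |= cell["index"].get(term, set())
    return sorted(hits)

dims = ("city", "stars")
cube = build_text_cube(reviews, dims)
print(keyword_search(cube, dims, "clean"))
print(keyword_search(cube, dims, "room", city="Seoul"))
```

Leaving a dimension out of the `where` filter acts as a roll-up over that dimension, while fixing it acts as a slice, which mirrors the OLAP operations the abstract refers to.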

Study on the Methodology for Extracting Information from SNS Using a Sentiment Analysis (SNS 감성분석을 이용한 정보 추출 방법론에 관한 연구)

  • Hong, Doopyo;Jeong, Harim;Park, Sangmin;Han, Eum;Kim, Honghoi;Yun, Ilsoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.6 / pp.141-155 / 2017
  • As the use of SNS becomes more active, many people post their thoughts about specific events on SNS in the form of text. As a result, SNS is used in various fields such as finance and distribution to conduct service satisfaction surveys and consumer monitoring. In the transportation area, however, there are few cases that use unstructured data analysis such as sentiment analysis. In this study, we developed a sentiment analysis methodology that can be used in transportation, based on highway VOC data, which is unstructured data collected by Korea Expressway Corporation. The developed methodology consists of morpheme analysis, sentiment dictionary construction, and sentiment discrimination of the collected unstructured data. The methodology was verified using highway-related tweet data. The analysis suggests that much of the highway-related information during the analysis period concerned construction and accidents. It also appears that users complained about the delays caused by construction and accidents.
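
The dictionary-based discrimination step of such a methodology can be sketched roughly as below. This Python example is only an illustration under strong assumptions: whitespace splitting stands in for morpheme analysis, and the dictionary entries, scores, and the `classify` function are hypothetical rather than taken from the study.

```python
# Hypothetical dictionary: word -> polarity score (negative < 0 < positive).
POLARITY_DICTIONARY = {
    "delay": -1,
    "congestion": -1,
    "accident": -1,
    "smooth": 1,
    "fast": 1,
}

def classify(text):
    """Sum the polarity of known words and map the total to a label."""
    # Assumption: lower-casing and whitespace splitting replace real morpheme analysis.
    tokens = text.lower().split()
    score = sum(POLARITY_DICTIONARY.get(tok, 0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for message in [
    "Long delay after the accident",
    "Traffic was smooth and fast",
    "Highway opened today",
]:
    print(message, "->", classify(message))
```

A domain-specific dictionary matters here because words such as "construction" or "delay" carry a polarity in highway complaints that a general-purpose lexicon may not assign.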

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.20 no.2 / pp.113-124 / 2015
  • Unlike structured data, which is gathered and saved in a predefined structure, unstructured text data, which is mostly written in natural language, has found wider application recently with the emergence of Web 2.0. Text mining, which extracts meaningful information from text, is one of the most important big data analysis techniques, because not only has the amount of text data increased but human emotion is also expressed directly in it. In this study, we used R, an open source software for statistical analysis, and studied algorithm implementations for conducting analyses such as Frequency Analysis, Cluster Analysis, Word Cloud, and Social Network Analysis. In particular, to keep within our research scope, we used a keyword extraction method based on a Data Dictionary. By applying it to real cases, we found that R is very useful as statistical analysis software that runs on a variety of operating systems and interfaces with other languages.

A Plan of Developing the Disaster Preparedness System through Text Analysis (비정형 데이터 분석을 통한 재난예방체계 발전방안)

  • Choi, Seon-Hwa;Choi, Woo-Jeong
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.13-15 / 2012
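  • With the recent emergence of mobile internet and social media, data is growing explosively, and the importance of analysis and prediction that uses this data in connection with political, social, economic, and other issues is increasing day by day. In particular, characteristics of mobile devices such as mobility, location awareness, and real-time operation make them a useful means of disaster and safety management, and they are used as a medium for acquiring and sharing emergency information when a disaster occurs. This paper introduces a system that analyzes unstructured data on the internet, such as disaster-related press reports, civil complaints, and tips, to identify disaster precursors in advance and remove risk factors, and it proposes the information technologies that should be adopted and the development plans needed to operate this system effectively.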