• Title/Summary/Keyword: Text data

Analysis on Review Data of Restaurants in Google Maps through Text Mining: Focusing on Sentiment Analysis

  • Shin, Bee;Ryu, Sohee;Kim, Yongjun;Kim, Dongwhan
    • Journal of Multimedia Information System / v.9 no.1 / pp.61-68 / 2022
  • Online reviews matter more as more people look up goods or places online before deciding to visit or purchase. However, such reviews usually take the form of short sentences or mere star ratings, which fail to give a general overview of customer preferences and decision factors. This study explored and broke down restaurant reviews found on Google Maps. After collecting and analyzing 5,427 reviews, we vectorized the importance of words using TF-IDF. We used a random forest machine learning algorithm to calculate the coefficient of positivity and negativity of the words used in reviews. As a result, we were able to build a dictionary of words for positive and negative sentiment using each word's coefficient. We classified words into four major evaluation categories and derived insights into sentiment for each criterion. We believe the dictionary of review words and the analysis of the major evaluation categories can help prospective restaurant visitors read between the lines of restaurant reviews found on the Web.
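
As a rough illustration of the TF-IDF weighting mentioned in this abstract, the sketch below computes per-document word weights in plain Python. It assumes the textbook form (term frequency times the natural log of inverse document frequency, without smoothing) and whitespace tokenization. The function name `tfidf` and the toy reviews are hypothetical, and the study's actual preprocessing and library are not stated in the abstract.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {word: weight} dict per document.

    Assumes tf = count / document length and idf = ln(N / df).
    Libraries often add smoothing, so their numbers will differ.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical toy reviews, not data from the study.
reviews = [
    "food was great",
    "food was cold",
    "great service",
]
weights = tfidf(reviews)
```

In this toy corpus a word that appears in only one review, such as "cold", receives a higher weight than a word shared across reviews, such as "food".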

Grammatical Structure Oriented Automated Approach for Surface Knowledge Extraction from Open Domain Unstructured Text

  • Tissera, Muditha;Weerasinghe, Ruvan
    • Journal of information and communication convergence engineering / v.20 no.2 / pp.113-124 / 2022
  • News in the form of web data generates increasingly large amounts of information as unstructured text. Only humans can understand the meaning of news, which leads to information overload and hinders the effective use of the knowledge embedded in such texts. Automatic Knowledge Extraction (AKE) has therefore become an integral part of the Semantic Web and Natural Language Processing (NLP). Although recent literature shows that AKE has progressed, the results still fall short of expectations. This study proposes a method to automatically extract surface knowledge from English news into a machine-interpretable semantic format (triples). The proposed technique was designed around the grammatical structure of the sentence, and 11 original rules were discovered. The initial experiment extracted triples from a Sri Lankan news corpus, of which 83.5% were meaningful. The experiment was then extended to the British Broadcasting Corporation (BBC) news dataset to demonstrate its generality, which yielded a higher meaningful-triple extraction rate of 92.6%. These results were validated using the inter-rater agreement method, which confirmed their high reliability.

Pilot Experiment for Named Entity Recognition of Construction-related Organizations from Unstructured Text Data

  • Baek, Seungwon;Han, Seung H.;Jung, Wooyong;Kim, Yuri
    • International conference on construction engineering and project management / 2022.06a / pp.847-854 / 2022
  • The aim of this study is to develop a Named Entity Recognition (NER) model that automatically identifies construction-related organizations in news articles. News articles were collected by web crawling, and construction-related organizations were labeled in a total of 1,000 articles. The Bidirectional Encoder Representations from Transformers (BERT) model was used to recognize clients, constructors, consultants, engineers, and others. In this pilot experiment, the best average F1 score for NER was 0.692. The results are expected to contribute to the establishment of international business strategies by collecting timely information and analyzing it automatically.

Data Analysis Web Application Based on Text Mining (텍스트 마이닝 기반의 데이터 분석 웹 애플리케이션)

  • Gil, Wan-Je;Kim, Jae-Woong;Park, Koo-Rack;Lee, Yun-Yeol
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.103-104 / 2021
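  • This paper proposes a topic modeling web application model based on text mining. The goal is to design and implement a web application that uses web crawling to save summarized paper information to a file when a keyword is entered, and that makes it easy to check research trends through keyword frequency analysis and topic modeling. With the proposed web application, users can try collecting and storing papers and analyzing text even if they lack knowledge of programming languages and data analysis techniques. In addition, because the web system was developed with Python libraries rather than relying on languages such as HTML, CSS, and JavaScript, it is expected to be adoptable as a project-based course curriculum when data analysis and machine learning are taught in Python.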

Skeleton-Based Data Learning Framework to Efficiently and Accurately Find Text Neck Posture (거북목 자세를 효율적이고 정확하게 찾기 위한 뼈대 기반 데이터 학습 프레임워크)

  • Na, Hong Eun;Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.361-364 / 2022
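  • This paper proposes a system that determines whether a user's posture while using a smart device is a text neck posture. Text neck syndrome is a condition in which holding the neck hunched forward for long periods straightens the neck's natural curve and causes pain in the back of the neck, shoulders, lower back, and elsewhere; correcting everyday posture habits is more effective than surgery or medication. In existing studies, CNN-based learning models that use a laptop's built-in webcam are strongly affected by image brightness and training data, and training data are hard to collect because of portrait-rights issues. To avoid these problems, this paper uses skeletons obtained with the open-source OpenPose to verify side-view sitting posture in real time with a learning model, and determines efficiently and accurately whether the posture is a text neck posture.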

Spark-Based Big Data Preprocessing for Text Summarization (텍스트 요약을 위한 스파크 기반 대용량 데이터 전처리)

  • Ji, Dong-Jun;Jun, Hee-Gook;Im, Dong-Hyuk
    • Annual Conference of KIPS / 2022.11a / pp.383-385 / 2022
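  • Text summarization is one of the major tasks in natural language processing (NLP). Building a highly accurate deep learning model for document summarization requires large-scale training data, but preprocessing such data raises problems such as processing time and memory management. This paper proposes a method that uses Apache Spark, a large-scale parallel processing platform, to improve the data preprocessing stage of an abstractive summarization deep learning model. Experimental results show that the proposed method improves data preprocessing time compared with the existing method.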

A Multilevel Project-Oriented Risk-Mining Framework for Overseas Construction Projects

  • Son, JeongWook;Lee, JeeHee;Yi, June-Seong
    • International conference on construction engineering and project management / 2015.10a / pp.39-40 / 2015
  • As the international construction market grows, the importance of risk management in international construction projects is emphasized. Unfortunately, current risk management practice does not sufficiently deal with project risks. Although many risk analysis techniques have been introduced, most of them focus on a project's unexpected external risks, such as country conditions and the owner's financial standing. However, because those external risks are difficult to manage and to act on preemptively, we need to concentrate on the risks inherent in the project. Based on this premise, this paper proposes a project-oriented risk mining approach that can detect and extract project risk factors automatically before they materialize. This study presents a methodology for extracting potential risks that exist in the owner's project requirements and project tender documents using state-of-the-art data analysis methods such as text mining. The project-oriented risk mining approach is expected to effectively reflect project characteristics in project risk management and could provide construction firms with valuable business intelligence.

Unraveling the relationship between the dimensions of user experience and user satisfaction in metaverse: A Mixed-methods Approach (메타버스 이용자 경험요인이 만족도에 미치는 영향: 텍스트 마이닝과 계량 분석 혼합방법론)

  • Jeong, Da Hyeon;Kim, Hee Woong;Yoon, Sang Hyeak
    • The Journal of Information Systems / v.32 no.3 / pp.19-39 / 2023
  • Purpose: This study aims to identify user experience factors that can enhance both metaverse utilization and satisfaction, based on the honeycomb model. To this end, we posed two research questions: first, what are the experience factors of metaverse users? Second, do metaverse user experience factors affect satisfaction? Design/methodology/approach: To address these questions, a mixed-methods approach is employed, combining text mining techniques to analyze online reviews with quantitative econometric analysis to reveal the relationship between user experience factors and satisfaction. A total of 69,880 reviews and ratings were collected. Findings: The analysis revealed eight metaverse user experience factors: entertainment, operability, virtual reality, immersion, economic activity, visual performance, avatar, and sociality, all of which were found to have a positive impact on user satisfaction.

Analysis of Research Trends in Mathematics Education regarding the Educational Environment based on Digital Technology (디지털 기술 교육 환경 기반 수학교육에 대한 국내 선행 연구의 경향성 분석 연구)

  • Ko, Ho Kyoung;Maeng, Unkyoung;Son, Bok Eun
    • East Asian mathematical journal / v.39 no.4 / pp.437-454 / 2023
  • At the core of the changes in the era of the 4th industrial revolution is a shift in the underlying 'digital technology'. These changes are incomparably large and are expected to affect our lives more than ever before. One of the major inflection points in the transition to the digital era is the education field, where IT has become an essential element. Accordingly, this study examines domestic research trends related to the educational environment based on digital technology, and aims to provide implications for establishing the digital-based educational environment of the future. To this end, semantic network analysis was conducted to quantitatively structure text data obtained from studies related to digital technology in mathematics education over the past 10 years, and the results are discussed.

Misinformation Detection and Rectification Based on QA System and Text Similarity with COVID-19

  • Insup Lim;Namjae Cho
    • Journal of Information Technology Applications and Management / v.28 no.5 / pp.41-50 / 2021
  • As COVID-19 spread widely and rapidly, the amount of misinformation also increased, a phenomenon the WHO has called an "infodemic". The purpose of this research is to develop a method for detecting and rectifying COVID-19 misinformation based on an open-domain QA system and text similarity. Nine testing conditions were used in this model. For the open-domain QA system, six conditions were applied using different dataset types (scientific, social media, and news, as well as the datasets combined) and two different methods of choosing the answer: taking the top answer generated by the QA system, and voting among the top three answers generated by the QA system. The other three conditions used the closed-domain QA system with different dataset types. The best result from the testing model was 76%, obtained using all datasets with voting among the top three answers, which outperformed the closed-domain model by 16%.
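
The answer-selection step that votes among the top three answers could look roughly like the sketch below. The abstract does not describe the voting rule, so this assumes a simple majority vote with ties broken in favor of the higher-ranked answer. The function name `vote_top_answers` and the sample answers are hypothetical.

```python
from collections import Counter

def vote_top_answers(ranked_answers, k=3):
    """Pick the most frequent answer among the top k.

    Assumes ranked_answers is non-empty and ordered best first.
    Ties go to the higher-ranked answer.
    """
    top = ranked_answers[:k]
    counts = Counter(top)
    best_count = max(counts.values())
    # Scanning in rank order makes the earliest-ranked answer win ties.
    for answer in top:
        if counts[answer] == best_count:
            return answer

# Hypothetical ranked outputs of a QA system, best first.
answers = ["false", "true", "true", "false"]
choice = vote_top_answers(answers)
```

With `k=1` the same function reduces to taking the top answer, which corresponds to the other selection method named in the abstract.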