• Title/Summary/Keyword: Text data


Classifying and Characterizing the Types of Gentrified Commercial Districts Based on Sense of Place Using Big Data: Focusing on 14 Districts in Seoul (빅데이터를 활용한 젠트리피케이션 상권의 장소성 분류와 특성 분석 -서울시 14개 주요상권을 중심으로-)

  • Young-Jae Kim;In Kwon Park
    • Journal of the Korean Regional Science Association
    • /
    • v.39 no.1
    • /
    • pp.3-20
    • /
    • 2023
  • This study aims to categorize the 14 major gentrified commercial areas of Seoul and analyze their characteristics based on their sense of place. To achieve this, we conducted hierarchical cluster analysis using text data collected from Naver Blog. We divided the districts into two dimensions: "experience" and "feature" and analyzed their characteristics using LDA (Latent Dirichlet Allocation) of the text data and statistical data collected from Seoul Open Data Square. As a result, we classified the commercial districts of Seoul into 5 categories: 'theater district,' 'traditional cultural district,' 'female-beauty district,' 'exclusive restaurant and medical district,' and 'trend-leading district.' The findings of this study are expected to provide valuable insights for policy-makers to develop more efficient and suitable commercial policies.

Identifying Consumer Response Factors in Live Commerce : Based on Consumer-Generated Text Data (라이브 커머스에서의 소비자 반응 요인 도출 : 소비자 생성 텍스트 데이터를 기반으로)

  • Park, Jae-Hyeong;Lee, Han-Sol;Kang, Ju-Young
    • Informatization Policy
    • /
    • v.30 no.2
    • /
    • pp.68-85
    • /
    • 2023
  • In this study, we collected data from live commerce streams. The streaming data were then categorized by the degree of chat activity, and the distribution of consumer-generated text responses was analyzed. From a total of 2,282 streams on NAVER Shopping Live, which has the largest share of the domestic live commerce market, we selected the 200 streams with the most active viewer responses and finally chose the streams that showed a steep increase or decrease in viewer responses. We synthesized variables from the existing literature on live commerce viewing intentions and participation motivations to create a table of variables for the purpose of the study, and then matched them with events in the broadcast. Through this study, we identified which components of the broadcast stimulate the consumer response variables found in previous studies. Moreover, we empirically identified consumers' motivations for participating in live commerce through data.

A Study on Unstructured text data Post-processing Methodology using Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.6
    • /
    • pp.935-940
    • /
    • 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refining process is required before analysis. The data become structured, analyzable data through a heuristic pre-processing refining step and a post-processing machine refining step. In this study, in the post-processing machine refining step, the Korean dictionary and the stopword dictionary are used to extract vocabulary for the frequency analysis behind word cloud analysis. In this process, "user-defined stopwords" are used to efficiently remove stopwords that the dictionary did not remove. We propose a methodology for applying a "thesaurus" to these stopwords and examine the pros and cons of the proposed refining method through a case analysis with R's word cloud technique, using the "user-defined stopword thesaurus" technique proposed to complement the problems of the existing "stopword dictionary" method. We present a comparative verification and suggest the practical effectiveness of the proposed methodology.
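
The post-processing step described in this abstract can be illustrated with a minimal sketch. The paper works in R and does not publish its thesaurus format here, so everything below is an assumption for illustration: the names `STOPWORD_DICTIONARY`, `STOPWORD_THESAURUS`, `build_stopword_set`, and `term_frequencies` are hypothetical, and the thesaurus is modeled simply as a mapping from a canonical stopword to its variants.

```python
from collections import Counter

# Hypothetical inputs: a general stopword dictionary plus a user-defined
# "stopword thesaurus" that groups variant forms under one canonical stopword.
STOPWORD_DICTIONARY = {"the", "a", "of", "and", "is"}
STOPWORD_THESAURUS = {
    "etc": {"etc", "etc.", "and-so-on"},
    "click": {"click", "clicks", "clicked"},
}

def build_stopword_set(dictionary, thesaurus):
    """Merge the base dictionary with every variant listed in the thesaurus."""
    merged = set(dictionary)
    for canonical, variants in thesaurus.items():
        merged.add(canonical)
        merged.update(variants)
    return merged

def term_frequencies(tokens, stopwords):
    """Count lowercased tokens that survive stopword removal."""
    lowered = (tok.lower() for tok in tokens)
    return Counter(tok for tok in lowered if tok not in stopwords)

# Toy token stream (not data from the study).
tokens = "The cafe review clicked etc. and the coffee review".split()
stopwords = build_stopword_set(STOPWORD_DICTIONARY, STOPWORD_THESAURUS)
freq = term_frequencies(tokens, stopwords)
print(freq.most_common())
```

The resulting counts are the kind of input a word cloud would be drawn from; grouping variants in a thesaurus means one entry removes every listed form instead of requiring each form to be added to the dictionary separately.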

A Study on the User Experience at Unmanned Cafe Using Big Data Analysis: Focus on text mining and semantic network analysis (빅데이터를 활용한 무인카페 소비자 인식에 관한 연구: 텍스트 마이닝과 의미연결망 분석을 중심으로)

  • Seung-Yeop Lee;Byeong-Hyeon Park;Jang-Hyeon Nam
    • Asia-Pacific Journal of Business
    • /
    • v.14 no.3
    • /
    • pp.241-250
    • /
    • 2023
  • Purpose - The purpose of this study was to investigate perceptions of 'unmanned cafes' online through big data analysis and to identify the latest trends in rapidly changing consumer perception. The results are intended to serve as basic data for revitalizing unmanned cafes and for differentiated marketing strategies. Design/methodology/approach - This study collected documents containing unmanned cafe keywords over about three years and analyzed the collected data with text mining techniques such as keyword frequency analysis, centrality analysis, and keyword network analysis. Findings - First, the top 10 words with a high frequency of appearance were identified in the order of unmanned cafes, unmanned cafes, start-up, operation, coffee, time, coffee machine, franchise, and robot cafes. Second, visualization of the semantic network confirmed that the key keyword "unmanned cafe" was at the center of the keyword cluster. Research implications or Originality - Using big data to collect and analyze keywords with high web visibility, we tried to identify new issues or trends in the perception of unmanned cafes. That perception consists largely of keywords related to start-ups: when unmanned cafes are mentioned online, the discussion mainly deals with start-up topics.

A Text Mining Study on Endangered Wildlife Complaints - Discovery of Key Issues through LDA Topic Modeling and Network Analysis - (멸종위기 야생생물 민원 텍스트 마이닝 연구 - LDA 토픽 모델링과 네트워크 분석을 통한 주요 이슈 발굴 -)

  • Kim, Na-Yeong;Nam, Hee-Jung;Park, Yong-Su
    • Journal of the Korean Society of Environmental Restoration Technology
    • /
    • v.26 no.6
    • /
    • pp.205-220
    • /
    • 2023
  • This study aimed to analyze the needs and interests of the public on endangered wildlife using complaint big data. We collected 1,203 complaints and their corresponding text data on endangered wildlife, pre-processed them, and constructed a document-term matrix for 1,739 text data. We performed LDA (Latent Dirichlet Allocation) topic modeling and network analysis. The results revealed that the complaints on endangered wildlife peaked in June-August, and the interest shifted from insects to various endangered wildlife in the living area, such as mammals, birds, and amphibians. In addition, the complaints on endangered wildlife could be categorized into 8 topics and 5 clusters, such as discovery report, habitat protection and response request, information inquiry, investigation and action request, and consultation request. The co-occurrence network analysis for each topic showed that the keywords reflecting the call center reporting procedure, such as photo, send, and take, had high centrality in common, and other keywords such as dung beetle, know, absence and think played an important role in the network. Through this analysis, we identified the main keywords and their relationships within each topic and derived the main issues for each topic. This study confirmed the increasing and diversifying public interest and complaints on endangered wildlife and highlighted the need for professional response. We also suggested developing and extending participatory conservation plans that align with the public's preferences and demands. This study demonstrated the feasibility of using complaint big data on endangered wildlife and its implications for policy decision-making and public promotion on endangered wildlife.
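
As a rough illustration of the two data structures this abstract relies on, the sketch below builds a document-term matrix and document-level co-occurrence counts. It is not the authors' pipeline: the three tokenized documents are invented, and counting co-occurrence once per document is an assumption, since the abstract does not state the window used.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-tokenized complaint texts (not data from the study).
docs = [
    ["photo", "send", "beetle"],
    ["photo", "take", "send"],
    ["habitat", "protection", "beetle"],
]

# Vocabulary in a fixed (alphabetical) order so that columns are stable.
vocab = sorted({term for doc in docs for term in doc})

# Document-term matrix: one row per document, one column per vocabulary term.
dtm = [[doc.count(term) for term in vocab] for doc in docs]

# Co-occurrence: number of documents in which a pair of terms appears together.
cooc = Counter()
for doc in docs:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1

print(vocab)
print(dtm)
print(cooc.most_common(3))
```

The matrix `dtm` is the usual input to LDA topic modeling, while the pair counts in `cooc` can be read as weighted edges of a keyword network.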

A Study on Measuring the Risk of Re-identification of Personal Information in Conversational Text Data using AI

  • Dong-Hyun Kim;Ye-Seul Cho;Tae-Jong Kim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.10
    • /
    • pp.77-87
    • /
    • 2024
  • With the recent advancements in artificial intelligence, various chatbots have emerged, efficiently performing everyday tasks such as hotel bookings, news updates, and legal consultations. Particularly, generative chatbots like ChatGPT are expanding their applicability by generating original content in fields such as education, research, and the arts. However, the training of these AI chatbots requires large volumes of conversational text data, such as customer service records, which has led to privacy infringement cases domestically and internationally due to the use of unrefined data. This study proposes a methodology to quantitatively assess the re-identification risk of personal information contained in conversational text data used for training AI chatbots. To validate the proposed methodology, we conducted a case study using synthetic conversational data and carried out a survey with 220 external experts, confirming the significance of the proposed approach.

A Study of Comparison between Cruise Tours in China and U.S.A through Big Data Analytics

  • Shuting, Tao;Kim, Hak-Seon
    • Culinary science and hospitality research
    • /
    • v.23 no.6
    • /
    • pp.1-11
    • /
    • 2017
  • The purpose of this study was to compare cruise tours in China and the U.S.A. through semantic network analysis of big data, collecting online data with SCTM (Smart Crawling & Text Mining), a data collecting and processing program. The data analysis period was from January 1, 2015 to August 15, 2017; "cruise tour, china" and "cruise tour, usa" were used as keywords to collect related data, and NetDraw packaged with UCINET 6.0 was used for data analysis. Currently, Chinese cruisers are concerned with cruising destinations, while American cruisers pay more attention to the onboard experience and cruising expenditure. CONCOR (convergence of iterated correlations) analysis produced three clusters for Chinese cruise tours: domestic destinations, international destinations, and hospitality tourism. For American cruise tours, four groups were segmented: cruise expenditure, onboard experience, cruise brand, and destinations. Since cruise tourism in America is highly developed, this study was also intended to provide significant, social-network-oriented suggestions for Chinese cruise tourism.

An Exploratory Study on the Semantic Network Analysis of Food Tourism through the Big Data (빅데이터를 활용한 음식관광관련 의미연결망 분석의 탐색적 적용)

  • Kim, Hak-Seon
    • Culinary science and hospitality research
    • /
    • v.23 no.4
    • /
    • pp.22-32
    • /
    • 2017
  • The purpose of this study was to explore awareness of food tourism using big data analysis. For this, this study collected data containing 'food tourism' keywords from Google web search, Google News, and Google Scholar for one year, from January 1 to December 31, 2016. Data were collected by using SCTM (Smart Crawling & Text Mining), a data collecting and processing program. From those data, degree centrality and eigenvector centrality were analyzed by utilizing NetDraw packaged with UCINET 6. The results showed that the web visibility of 'core service' and 'social marketing' was high. In addition, web visibility was also high for destination-related words, such as rural, place, ireland and heritage, and for 'socioeconomic circumstance' related words, such as economy, region, public, policy, and industry. Convergence of iterated correlations analysis showed 4 clusters named 'core service', 'social marketing', 'destinations' and 'social environment'. It is expected that this diagnosis of food tourism, based on web information and reflecting changes in the international business environment, will provide baseline data useful for establishing food tourism marketing strategies.
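
The two centrality measures named in this abstract can be sketched in a few lines. The study computed them in UCINET 6; the code below is only a plain-Python approximation on an invented four-keyword network, and its scaling may differ from UCINET's output. The function name `eigenvector_centrality` and the edge list are hypothetical.

```python
# Hypothetical undirected keyword network (edges are illustrative only).
edges = [
    ("food", "tourism"),
    ("food", "rural"),
    ("food", "heritage"),
    ("tourism", "rural"),
]

nodes = sorted({n for edge in edges for n in edge})
adj = {n: set() for n in nodes}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Degree centrality: share of the other keywords a keyword is directly linked to.
degree = {n: len(adj[n]) / (len(nodes) - 1) for n in nodes}

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I): a keyword scores high when its neighbours do.

    Adding the identity leaves the eigenvectors unchanged but prevents the
    iteration from oscillating on bipartite graphs.
    """
    score = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {n: v / norm for n, v in new.items()}
    return score

eigen = eigenvector_centrality(adj)
for n in nodes:
    print(n, round(degree[n], 3), round(eigen[n], 3))
```

Degree centrality only counts direct links, whereas eigenvector centrality also rewards being linked to well-connected keywords, which is why the two measures are usually reported side by side.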

Frequency Analysis of Scientific Texts on the Hypoxia Using Bibliographic Data (논문 서지정보를 이용한 빈산소수괴 연구 분야의 연구용어 빈도분석)

  • Lee, GiSeop;Lee, JiYoung;Cho, HongYeon
    • Ocean and Polar Research
    • /
    • v.41 no.2
    • /
    • pp.107-120
    • /
    • 2019
  • The frequency analysis of scientific terms using bibliographic information is a simple concept, but as relevant data become more widespread, manual analysis of all data is practically impossible or only possible to a very limited extent. In addition, as the scale of oceanographic research has expanded to become much more comprehensive and widespread, the allocation of research resources on various topics has become an important issue. In this study, the frequency analysis of scientific terms was performed using text mining. The data used in the analysis came from a general-purpose scholarly database and totaled 2,878 articles. Hypoxia, which is an important issue in the marine environment, was selected as a research field and the frequencies of related words were analyzed. The most frequently used words were 'Organic matter', 'Bottom water', and 'Dead zone', and specific areas showed high frequency. The results of this research can be used as a basis for allocating research resources according to the frequency of use of related terms in specific fields when planning a large research project represented by a single word.

Analysis of Business Performance of Local SMEs Based on Various Alternative Information and Corporate SCORE Index

  • HWANG, Sun Hee;KIM, Hee Jae;KWAK, Dong Chul
    • The Journal of Economics, Marketing and Management
    • /
    • v.10 no.3
    • /
    • pp.21-36
    • /
    • 2022
  • Purpose: The purpose of this study is to compare and analyze the enterprise's score index calculated from atypical data and corrected data. Research design, data, and methodology: In this study, news articles, which are non-financial but qualitative data, were collected from 2,432 SMEs that were extracted by "square proportional stratification" out of 18,910 enterprises with fixed data, and each enterprise's score index was compared and analyzed through text mining analysis methodology. Result: The analysis showed that qualitative data can be quantitatively evaluated by region, industry and period by collecting news from SMEs, and that there are concerns that it could be an element of alternative credit evaluation. Conclusion: News data cannot be collected when a small business is self-employed or has little or no news coverage. Data normalization or standardization should be considered to overcome the difference in scores due to the amount of reference. Furthermore, since keyword sentiment analysis may have different results depending on the researcher's point of view, it is also necessary to consider deep learning sentiment analysis, which is conducted by sentence.