• Title/Summary/Keyword: Online mining

A Study on the Perception of Data 3 Act through Big Data Analysis (빅데이터 분석을 통한 데이터 3법 인식에 관한 연구)

  • Oh, Jungjoo;Lee, Hwansoo
    • Convergence Security Journal
    • /
    • v.21 no.2
    • /
    • pp.19-28
    • /
    • 2021
  • Korea is promoting a Digital New Deal policy to accelerate the digital transformation and innovation of industry. However, strict existing data-related laws still restrict industry's use of data under this policy. To address this issue, a revised bill of the Data 3 Act has been proposed, but there has been insufficient discussion of how it will actually affect the activation of data use in industry. This study therefore analyzes public perception of the Data 3 Act and the implications of its revision. To this end, the revision of the Data 3 Act and related research trends were reviewed, and perception of the Act was analyzed using a big data analysis technique. The results show that, while the Data 3 Act promotes the vitalization of the data industry in line with the purpose of the revision, there is concern that it focuses on specific industries. This study is meaningful in that it offers implications for future improvement plans by analyzing, through big data analysis, online perceptions of the industrial impact of the Data 3 Act in the early stages of its implementation.

A Study on the Quantitative Evaluation of Initial Coin Offering (ICO) Using Unstructured Data (비정형 데이터를 이용한 ICO(Initial Coin Offering) 정량적 평가 방법에 대한 연구)

  • Lee, Han Sol;Ahn, Sangho;Kang, Juyoung
    • Smart Media Journal
    • /
    • v.11 no.5
    • /
    • pp.63-74
    • /
    • 2022
  • Initial public offerings (IPOs) have a legal framework for investor protection and various quantitative evaluation factors, so objective analysis is possible and many studies have been conducted. Crowdfunding likewise has several legal safeguards that protect investors by preventing indiscriminate funding. By contrast, the blockchain-based initial coin offering (ICO), which has recently been in the spotlight, has ambiguous legal means and standards for protecting investors and lacks quantitative methods for evaluating ICOs objectively. This study therefore collects ICO white papers published online, predicts ICO fraud with BERT, a text embedding technique, compares the results with an existing Random Forest machine learning technique, and demonstrates the feasibility of fraud detection. This study is expected to contribute to research on ICO fraud detection by showing that a quantitative approach based on unstructured data can be used to identify fraud in ICOs.

Analysis Method of User Review using Open Data (오픈 데이터를 이용한 사용자 리뷰 분석 방법)

  • Choi, Taeho;Hwang, Mansoo;Kim, Neunghoe
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.6
    • /
    • pp.185-190
    • /
    • 2022
  • Open data has considerable economic value. Korea and many other countries are introducing policies and making efforts to expand and utilize open data. However, although Korea has a large amount of data, it is not utilized effectively, so attempts to use it should be made across industries. In the fashion industry in particular, exchange and refund problems are the most common because consumers are unpredictable, and service providers need better feedback to solve this problem. We aim to address it by showing images that address the points of dissatisfaction, along with user reviews that include consumer needs. In this paper, user reviews on online shopping mall websites are analyzed to identify consumer needs, and product attributes are defined using the attributes of K-fashion data. Each user request is defined as a dissatisfaction attribute, and labeled data with the corresponding attribute are retrieved. The user request is then provided to the service provider in the form of text data or attributes, together with an image, to help improve the product.

An Analysis of Changes in Perception of Metaverse through Big Data - Comparing Before and After COVID-19 - (빅데이터 분석을 통한 메타버스에 대한 인식 변화 분석 - 코로나19 발생 전후 비교를 중심으로 -)

  • Kang, Yu Rim;Kim, Mun Young
    • Fashion & Textile Research Journal
    • /
    • v.24 no.5
    • /
    • pp.593-604
    • /
    • 2022
  • The purpose of this study is to analyze how the perception of the metaverse changed before and after COVID-19 through big data analysis. Textom was used to collect all data containing the keyword 'metaverse' for two years before COVID-19 (2018.1.1~2019.11.30) and after the COVID-19 outbreak (2020.1.11~2021.12.31), with Naver and Google selected as the collection channels. The collected data were processed with text mining, and word frequency, TF-IDF, word cloud, network, and sentiment analyses were conducted. The results are as follows. First, hotels, weddings, and glades were commonly extracted as social issues related to the metaverse before and after COVID-19, keywords such as robots and launches were also derived, and keywords related to hotels and weddings appeared with high frequency. Second, the pre-COVID-19 metaverse keyword clusters were platform-oriented, content-oriented, economy-oriented, and online promotion-oriented, whereas the post-COVID-19 clusters were event-oriented, ontact sales-oriented, stock-oriented, and new business-oriented. Third, positive keywords such as likes, interest, and joy were frequent before COVID-19, and positive keywords such as likes, joy, and interest were also frequent after COVID-19. In conclusion, this study found that the metaverse has firmly established itself as a new platform business model that uses smart technology and can be applied in various fields such as tourism, travel, festivals, and education.
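
The abstract above lists word frequency and TF-IDF among its text-mining steps. As a minimal sketch of what those two measures compute, the block below uses one common TF-IDF variant (relative term frequency times the logarithm of inverse document frequency). The weighting that Textom actually applies is not stated in the abstract, and the function name `tf_idf` and the toy corpus are hypothetical, for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)  # raw word frequency within this document
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, not data from the study.
docs = [
    "metaverse platform content platform".split(),
    "metaverse event stock".split(),
]
scores = tf_idf(docs)
```

Under this variant a term that occurs in every document (here "metaverse") receives a weight of zero, which is why TF-IDF is often reported alongside raw frequency rather than instead of it.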

A Study on the Analysis of Influx Factors in Urban Parks Using Data Mining - Focus on Yangjae Citizens' Forest Park - (데이터 마이닝을 활용한 도시공원 유입 요인 분석 연구 - 양재시민의 숲 공원을 대상으로 -)

  • Park Sang Hun
    • Journal of the Korean Regional Science Association
    • /
    • v.39 no.3
    • /
    • pp.35-48
    • /
    • 2023
  • This study analyzed the factors that draw visitors to Yangjae Citizens' Forest Park using social big data generated online. It aims to confirm the applicability of the emotional information analysis method as a way of analyzing perceptions of an urban park and identifying differences in the park's characteristics and use. The analysis is based on big data, and because keyword network analysis is the core of the study, the 'emotional information analysis method' patented by the author was applied. Among the influx factors of Yangjae Citizens' Forest recognized by citizens, the most positive emotional factor was related to 'park contents', and the negative emotional factor was related to 'park management'. These results suggest that more in-depth program development and operation are needed to discover 'park contents' when implementing urban park revitalization support projects in the future.

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.79-88
    • /
    • 2024
  • The use of social media has become part of our daily activities. Social web channels let users generate content and share their views, opinions, and experiences on particular topics, and researchers use this content in various research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting people's reviews, opinions, and sentiments. It is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area; it aims to determine whether the content writer is in favor of, against, or neutral towards the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification in public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic properties stance classification. This study explores different feature sets, such as lexical, sentiment-specific, and dialog-based features, extracted from the standard datasets in the area. Supervised learning approaches were applied: the generative algorithm Naïve Bayes; discriminative algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor; and then ensemble-based algorithms such as Random Forest and AdaBoost. The empirical results were evaluated using the standard performance measures of Accuracy, Precision, Recall, and F-measure.
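
The evaluation measures named at the end of this abstract can be illustrated with a short sketch. It computes accuracy over all labels and one-vs-rest precision, recall, and F1 for a single stance class. The function name `stance_metrics`, the label strings, and the toy predictions are hypothetical and do not come from the paper; how the authors averaged scores across classes is not stated in the abstract.

```python
def stance_metrics(y_true, y_pred, positive="favor"):
    """Accuracy over all labels, plus one-vs-rest precision/recall/F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical gold labels and classifier output for six posts.
y_true = ["favor", "against", "favor", "neutral", "favor", "against"]
y_pred = ["favor", "favor", "favor", "neutral", "against", "against"]
metrics = stance_metrics(y_true, y_pred)
```

Calling the function once per class and averaging the results would give macro-averaged scores, which is one common way to summarize a three-class stance task.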

Analysis of Regional Fertility Gap Factors Using Explainable Artificial Intelligence (설명 가능한 인공지능을 이용한 지역별 출산율 차이 요인 분석)

  • Dongwoo Lee;Mi Kyung Kim;Jungyoon Yoon;Dongwon Ryu;Jae Wook Song
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.47 no.1
    • /
    • pp.41-50
    • /
    • 2024
  • Korea is facing a significant problem with historically low fertility rates, which is becoming a major social issue affecting the economy, labor force, and national security. This study analyzes the factors contributing to the regional gap in fertility rates and derives policy implications. The government and local authorities are implementing a range of policies to address the issue of low fertility. To establish an effective strategy, it is essential to identify the primary factors that contribute to regional disparities. This study identifies these factors and explores policy implications through machine learning and explainable artificial intelligence. The study also examines the influence of media and public opinion on childbirth in Korea by incorporating news and online community sentiment, as well as sentiment fear indices, as independent variables. To establish the relationship between regional fertility rates and factors, the study employs four machine learning models: multiple linear regression, XGBoost, Random Forest, and Support Vector Regression. Support Vector Regression, XGBoost, and Random Forest significantly outperform linear regression, highlighting the importance of machine learning models in explaining non-linear relationships with numerous variables. A factor analysis using SHAP is then conducted. The unemployment rate, Regional Gross Domestic Product per Capita, Women's Participation in Economic Activities, Number of Crimes Committed, Average Age of First Marriage, and Private Education Expenses significantly impact regional fertility rates. However, the degree of impact of the factors affecting fertility may vary by region, suggesting the need for policies tailored to the characteristics of each region, not just an overall ranking of factors.

Exploring the phenomenon of veganphobia in vegan food and vegan fashion (비건 음식과 비건 패션에서 나타난 비건포비아 현상에 대한 탐구)

  • Yeong-Hyeon Choi;Sangyung Lee
    • The Research Journal of the Costume Culture
    • /
    • v.32 no.3
    • /
    • pp.381-397
    • /
    • 2024
  • This study investigates the negative perceptions (veganphobia) held by consumers toward vegan diets and fashion and aims to foster a genuine acceptance of ethical veganism in consumption. The textual data were web-crawled from Korean online posts, including news articles, blogs, forums, and tweets, containing keywords such as "contradiction," "dilemma," "conflict," "issues," "vegan food," and "vegan fashion" from 2013 to 2021. Data analysis was conducted through text mining, network analysis, and clustering analysis using Python and NodeXL programs. The analysis revealed distinct negative perceptions regarding vegan food. Key issues included the perception of hypocrisy among vegetarians, associations with specific political leanings, conflicts between environmental and animal rights, and contradictions between views on companion animals and livestock. Regarding the vegan fashion industry, the eco-friendliness of material selection and design processes were seen as the pivotal factors shaping negative attitudes. Furthermore, the study identified a shared negative perception regarding vegan food and vegan fashion. This negativity was characterized by confusion and conflicts between animal and environmental rights, biased perceptions linked to specific political affiliations, perceived self-righteousness among vegetarians, and general discomfort toward them. These factors collectively contributed to a broader negative perception of vegan consumption. In conclusion, this study is significant in understanding the complex perceptions and attitudes that consumers hold toward vegan food and fashion. The insights gained from this research can aid in the design of more effective campaign strategies aimed at promoting vegan consumerism, ultimately contributing to a more widespread acceptance of ethical veganism in society.

Analysis of the Landscape Characteristics of Island Tourist Site Using Big Data - Based on Bakji and Banwol-do, Shinan-gun - (빅데이터를 활용한 섬 관광지의 경관 특성 분석 - 신안군 박지·반월도를 대상으로 -)

  • Do, Jee-Yoon;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.49 no.2
    • /
    • pp.61-73
    • /
    • 2021
  • This study aimed to identify users' landscape perception and landscape characteristics by utilizing SNS data generated from their experiences. To this end, online text and photo data were used to analyze how the main places and scenery on the islands are perceived and what characterizes the main scenery. Text data were analyzed with text mining and network structure analysis, and photographic data with landscape identification models and color analysis. The results are as follows. First, frequency analysis of Bakji·Banwol-do topics yielded keywords for local landscapes such as 'Purple Bridge' and 'Doori Village', and analyzing them together revealed location, behavior, and landscape images. Second, the network structure analysis allowed the connections between key keywords and undrawn keywords to be analyzed more specifically, indicating that creating landscapes using colors is affecting regional activation. Third, the landscape identification model showed that, to create preferred landscapes around the main targets of 'Purple Bridge' and 'Doori Village', artificial elements should be excluded and it would be effective to set a viewpoint toward the sea and sky. Fourth, Bakji·Banwol-do were the first islands to be created under the theme of color; the colors used in artificial facilities were similar to the surrounding environment and were harmonized with contrasting lighting and saturation values. This study used online data uploaded directly by visitors in the landscape field to identify users' perceptions and objects of the landscape. Furthermore, the use of both text and photographic data to identify landscape recognition and characteristics is significant in that it can specifically identify which landscapes and resources visitors prefer and perceive. In addition, the use of quantitative big data analysis and qualitative landscape identification models in identifying visitors' perceptions of local landscapes will help in understanding the landscape more specifically through discussions based on results.

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research is divided into studies that use structured data and studies that use unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical analysis and fundamental analysis. In the big data era, the amount of information has rapidly increased, and artificial intelligence methodologies that can find meaning by quantifying text, an unstructured form of data that accounts for a large share of information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining to stock price forecasting. The methodology adopted in many papers is to forecast stock prices using news about the target company. However, according to previous research, a company's stock price is affected not only by news about that company but also by news about related companies. Finding a highly relevant company is not easy because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies based primarily on pre-determined international industry classification standards. However, according to recent research, the Global Industry Classification Standard shows varying homogeneity within sectors, so forecasting stock prices by taking all companies in a sector together, rather than only the relevant ones, can adversely affect predictive performance. To overcome this limitation, we are the first to use random matrix theory together with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced. Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering analysis, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features from the financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory performs better than previous studies when cluster analysis is based on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory, and it extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue. This suggests that it is important to study not only artificial intelligence algorithms but also how to adjust the input values on a theoretical basis. Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) might have low relevance, and we suggest that relevance should be defined theoretically rather than simply taken from the GICS.
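
The abstract does not spell out how random matrix theory separates market-wide effects and random signals from the true correlation. A common reading in the econophysics literature, assumed here, compares the eigenvalues of the empirical correlation matrix with the Marchenko-Pastur bounds: eigenvalues inside the bounds are treated as noise, the largest one as the market-wide mode, and the remaining ones above the upper bound as group structure. The sketch below takes the eigenvalues as given; the function names and the sample values are hypothetical, and the authors' actual procedure may differ.

```python
import math


def marchenko_pastur_bounds(n_series, n_obs, variance=1.0):
    """Eigenvalue range expected for a purely random correlation matrix.

    Assumes n_obs >= n_series and standardized returns (variance = 1).
    """
    q = n_series / n_obs
    lower = variance * (1 - math.sqrt(q)) ** 2
    upper = variance * (1 + math.sqrt(q)) ** 2
    return lower, upper


def split_eigenvalues(eigenvalues, n_series, n_obs):
    """Split eigenvalues into (market mode, group modes, noise)."""
    _, upper = marchenko_pastur_bounds(n_series, n_obs)
    ordered = sorted(eigenvalues, reverse=True)
    above = [v for v in ordered if v > upper]
    market = above[:1]   # largest eigenvalue, read as the market-wide mode
    groups = above[1:]   # remaining large eigenvalues, read as group structure
    noise = [v for v in ordered if v <= upper]
    return market, groups, noise


# Hypothetical setting: 100 stocks observed over 400 trading days.
lower, upper = marchenko_pastur_bounds(n_series=100, n_obs=400)
market, groups, noise = split_eigenvalues(
    [30.0, 4.1, 2.6, 1.9, 0.8, 0.3], n_series=100, n_obs=400
)
```

In this reading, a filtered correlation matrix would be rebuilt from the group modes only and then passed to the cluster analysis; that reconstruction step needs an eigendecomposition and is left out of the sketch.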