• Title/Summary/Keyword: Big data analysis method

Search Result 880, Processing Time 0.022 seconds

Comparison of Sentiment Analysis from Large Twitter Datasets by Naïve Bayes and Natural Language Processing Methods

  • Back, Bong-Hyun;Ha, Il-Kyu
    • Journal of information and communication convergence engineering
    • /
    • v.17 no.4
    • /
    • pp.239-245
    • /
    • 2019
  • Recently, effort to obtain various information from the vast amount of social network services (SNS) big data generated in daily life has expanded. SNS big data comprise sentences classified as unstructured data, which complicates data processing. As the amount of processing increases, a rapid processing technique is required to extract valuable information from SNS big data. We herein propose a system that can extract human sentiment information from vast amounts of SNS unstructured big data using the naïve Bayes algorithm and natural language processing (NLP). Furthermore, we analyze the effectiveness of the proposed method through various experiments. Based on sentiment accuracy analysis, experimental results showed that the machine learning method using the naïve Bayes algorithm afforded a 63.5% accuracy, which was lower than that yielded by the NLP method. However, based on data processing speed analysis, the machine learning method by the naïve Bayes algorithm demonstrated a processing performance that was approximately 5.4 times higher than that by the NLP method.

Analyzing Operation Deviation in the Deasphalting Process Using Multivariate Statistics Analysis Method

  • Park, Joo-Hwang;Kim, Jong-Soo;Kim, Tai-Suk
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.7
    • /
    • pp.858-865
    • /
    • 2014
  • In the case of system like MES, various sensors collect the data in real time and save it as a big data to monitor the process. However, if there is big data mining in distributed computing system, whole processing process can be improved. In this paper, system to analyze the cause of operation deviation was built using the big data which has been collected from deasphalting process at the two different plants. By applying multivariate statistical analysis to the big data which has been collected through MES(Manufacturing Execution System), main cause of operation deviation was analyzed. We present the example of analyzing the operation deviation of deasphalting process using the big data which collected from MES by using multivariate statistics analysis method. As a result of regression analysis of the forward stepwise method, regression equation has been found which can explain 52% increase of performance compare to existing model. Through this suggested method, the existing petrochemical process can be replaced which is manual analysis method and has the risk of being subjective according to the tester. The new method can provide the objective analysis method based on numbers and statistic.

Neo-Chinese Style Furniture Design Based on Semantic Analysis and Connection

  • Ye, Jialei;Zhang, Jiahao;Gao, Liqian;Zhou, Yang;Liu, Ziyang;Han, Jianguo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.8
    • /
    • pp.2704-2719
    • /
    • 2022
  • Lately, neo-Chinese style furniture has been frequently noticed by product design professionals for the big part it played in promoting traditional Chinese culture. This article is an attempt to use big data semantic analysis method to provide effective design research method for neo-Chinese furniture design. By using big data mining program TEXTOM for big data collection and analysis, the data obtained from typical websites in a set time period will be sorted and analyzed. On the basis of "neo-Chinese furniture" samples, key data will be compared, classification analysis of overall data, and horizontal analysis of typical data will be performed by the methods of word frequency analysis, connection centrality analysis, and TF-IDF analysis. And we tried to summarize according to the related views and theories of the design. The research results show that the results of data analysis are close to the relevant definitions of design. The core high-frequency vocabulary obtained under data analysis, such as popular, furniture, modern, etc., can provide a reasonable and effective focus of attention for the designs. The result obtained through the systematic sorting and summary of the data can be a reliable guidance in the direction of our design. This research attempted to introduce related big data mining semantic analysis methods into the product design industry, to supply scientific and objective data and channels for studies on design, and to provide a case on the practical application of big data analysis in the industry.

A Case Study on Big Data Analysis Systems for Policy Proposals of Engineering Education (공학교육 정책제안을 위한 빅데이터 분석 시스템 사례 분석 연구)

  • Kim, JaeHee;Yoo, Mina
    • Journal of Engineering Education Research
    • /
    • v.22 no.5
    • /
    • pp.37-48
    • /
    • 2019
  • The government has tried to develop a platform for systematically collecting and managing engineering education data for policy proposals. However, there have been few cases of big data analysis platform for policy proposals in engineering education, and it is difficult to determine the major function of the platform, the purpose of using big data, and the method of data collection. This study aims to collect the cases of big data analysis systems for the development of a big data system for educational policy proposals, and to conduct a study to analyze cases using the analysis frame of key elements to consider in developing a big data analysis platform. In order to analyze the case of big data system for engineering education policy proposals, 24 systems collecting and managing big data were selected. The analysis framework was developed based on literature reviews and the results of the case analysis were presented. The results of this study are expected to provide from macro-level such as what functions the platform should perform in developing a big data system and how to collect data, what analysis techniques should be adopted, and how to visualize the data analysis results.

A Big Data Analysis of Yumentingzheng: Weiwenqiju as an Example (어문청정 빅데이터 분석: 위문기거 일례)

  • Snowberger, Aaron Daniel;Lee, Choong Ho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.624-626
    • /
    • 2021
  • Yumentingzheng, which records the contents of the Qing dynasty's discussions with his subjects, is an important document like the Annals of Joseon in Korea. This paper describes the method and steps for big data analysis of Yumentingzheng written in Manchu alphabet. In big data analysis of documents written in Manchu characters, there are many problems that need to be solved in advance, and research on these should be preceded. In this paper, a method of big data analysis using the R language was proposed in the stage where the text written in Manchurian characters was transliterated into Latin characters through a preliminary study to be conducted in the future. In the proposed method, Apkai method was adopted for the transliteration of Wumentingzheng, and the results of big data analysis were presented using the text of Weiwenqiju.

  • PDF

Big Data Analysis Using Principal Component Analysis (주성분 분석을 이용한 빅데이터 분석)

  • Lee, Seung-Joo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.25 no.6
    • /
    • pp.592-599
    • /
    • 2015
  • In big data environment, we need new approach for big data analysis, because the characteristics of big data, such as volume, variety, and velocity, can analyze entire data for inferring population. But traditional methods of statistics were focused on small data called random sample extracted from population. So, the classical analyses based on statistics are not suitable to big data analysis. To solve this problem, we propose an approach to efficient big data analysis. In this paper, we consider a big data analysis using principal component analysis, which is popular method in multivariate statistics. To verify the performance of our research, we carry out diverse simulation studies.

MapReduce-Based Partitioner Big Data Analysis Scheme for Processing Rate of Log Analysis (로그 분석 처리율 향상을 위한 맵리듀스 기반 분할 빅데이터 분석 기법)

  • Lee, Hyeopgeon;Kim, Young-Woon;Park, Jiyong;Lee, Jin-Woo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.5
    • /
    • pp.593-600
    • /
    • 2018
  • Owing to the advancement of Internet and smart devices, access to various media such as social media became easy; thus, a large amount of big data is being produced. Particularly, the companies that provide various Internet services are analyzing the big data by using the MapReduce-based big data analysis techniques to investigate the customer preferences and patterns and strengthen the security. However, with MapReduce, when the big data is analyzed by defining the number of reducer objects generated in the reduce stage as one, the processing rate of big data analysis decreases. Therefore, in this paper, a MapReduce-based split big data analysis method is proposed to improve the log analysis processing rate. The proposed method separates the reducer partitioning stage and the analysis result combining stage and improves the big data processing rate by decreasing the bottleneck phenomenon by generating the number of reducer objects dynamically.

A Study on the de-identification of Personal Information of Hotel Users (호텔 이용 고객의 개인정보 비식별화 방안에 관한 연구)

  • Kim, Taekyung
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.12 no.4
    • /
    • pp.51-58
    • /
    • 2016
  • In the area of hotel and tourism sector, various research are analyzed using big data. Big data is being generated by any digital devices around us all the times. All the digital process and social media exchange produces the big data. In this paper, we analyzed the de-identification method of big data to use the personal information of hotel guests. Through the analysis of these big data, hotel can provide differentiated and diverse services to hotel guests and can improve the service and support the marketing of hotels. If the hotel wants to use the information of the guest, the private data should be de-identified. There are several de-identification methods of personal information such as pseudonymisation, aggregation, data reduction, data suppression and data masking. Using the comparison of these methods, the pseudonymisation is discriminated to the suitable methods for the analysis of information for the hotel guest. Also, among the pseudonymisation methods, the t-closeness was analyzed to the secure and efficient method for the de-identification of personal information in hotel.

Utilization of Social Media Analysis using Big Data (빅 데이터를 이용한 소셜 미디어 분석 기법의 활용)

  • Lee, Byoung-Yup;Lim, Jong-Tae;Yoo, Jaesoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.2
    • /
    • pp.211-219
    • /
    • 2013
  • The analysis method using Big Data has evolved based on the Big data Management Technology. There are quite a few researching institutions anticipating new era in data analysis using Big Data and IT vendors has been sided with them launching standardized technologies for Big Data management technologies. Big Data is also affected by improvements of IT gadgets IT environment. Foreran by social media, analyzing method of unstructured data is being developed focusing on diversity of analyzing method, anticipation and optimization. In the past, data analyzing methods were confined to the optimization of structured data through data mining, OLAP, statics analysis. This data analysis was solely used for decision making for Chief Officers. In the new era of data analysis, however, are evolutions in various aspects of technologies; the diversity in analyzing method using new paradigm and the new data analysis experts and so forth. In addition, new patterns of data analysis will be found with the development of high performance computing environment and Big Data management techniques. Accordingly, this paper is dedicated to define the possible analyzing method of social media using Big Data. this paper is proposed practical use analysis for social media analysis through data mining analysis methodology.

Design of Client-Server Model For Effective Processing and Utilization of Bigdata (빅데이터의 효과적인 처리 및 활용을 위한 클라이언트-서버 모델 설계)

  • Park, Dae Seo;Kim, Hwa Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.4
    • /
    • pp.109-122
    • /
    • 2016
  • Recently, big data analysis has developed into a field of interest to individuals and non-experts as well as companies and professionals. Accordingly, it is utilized for marketing and social problem solving by analyzing the data currently opened or collected directly. In Korea, various companies and individuals are challenging big data analysis, but it is difficult from the initial stage of analysis due to limitation of big data disclosure and collection difficulties. Nowadays, the system improvement for big data activation and big data disclosure services are variously carried out in Korea and abroad, and services for opening public data such as domestic government 3.0 (data.go.kr) are mainly implemented. In addition to the efforts made by the government, services that share data held by corporations or individuals are running, but it is difficult to find useful data because of the lack of shared data. In addition, big data traffic problems can occur because it is necessary to download and examine the entire data in order to grasp the attributes and simple information about the shared data. Therefore, We need for a new system for big data processing and utilization. First, big data pre-analysis technology is needed as a way to solve big data sharing problem. Pre-analysis is a concept proposed in this paper in order to solve the problem of sharing big data, and it means to provide users with the results generated by pre-analyzing the data in advance. Through preliminary analysis, it is possible to improve the usability of big data by providing information that can grasp the properties and characteristics of big data when the data user searches for big data. In addition, by sharing the summary data or sample data generated through the pre-analysis, it is possible to solve the security problem that may occur when the original data is disclosed, thereby enabling the big data sharing between the data provider and the data user. Second, it is necessary to quickly generate appropriate preprocessing results according to the level of disclosure or network status of raw data and to provide the results to users through big data distribution processing using spark. Third, in order to solve the problem of big traffic, the system monitors the traffic of the network in real time. When preprocessing the data requested by the user, preprocessing to a size available in the current network and transmitting it to the user is required so that no big traffic occurs. In this paper, we present various data sizes according to the level of disclosure through pre - analysis. This method is expected to show a low traffic volume when compared with the conventional method of sharing only raw data in a large number of systems. In this paper, we describe how to solve problems that occur when big data is released and used, and to help facilitate sharing and analysis. The client-server model uses SPARK for fast analysis and processing of user requests. Server Agent and a Client Agent, each of which is deployed on the Server and Client side. The Server Agent is a necessary agent for the data provider and performs preliminary analysis of big data to generate Data Descriptor with information of Sample Data, Summary Data, and Raw Data. In addition, it performs fast and efficient big data preprocessing through big data distribution processing and continuously monitors network traffic. The Client Agent is an agent placed on the data user side. It can search the big data through the Data Descriptor which is the result of the pre-analysis and can quickly search the data. The desired data can be requested from the server to download the big data according to the level of disclosure. It separates the Server Agent and the client agent when the data provider publishes the data for data to be used by the user. In particular, we focus on the Big Data Sharing, Distributed Big Data Processing, Big Traffic problem, and construct the detailed module of the client - server model and present the design method of each module. The system designed on the basis of the proposed model, the user who acquires the data analyzes the data in the desired direction or preprocesses the new data. By analyzing the newly processed data through the server agent, the data user changes its role as the data provider. The data provider can also obtain useful statistical information from the Data Descriptor of the data it discloses and become a data user to perform new analysis using the sample data. In this way, raw data is processed and processed big data is utilized by the user, thereby forming a natural shared environment. The role of data provider and data user is not distinguished, and provides an ideal shared service that enables everyone to be a provider and a user. The client-server model solves the problem of sharing big data and provides a free sharing environment to securely big data disclosure and provides an ideal shared service to easily find big data.