• Title/Summary/Keyword: Big data Processing


Network Anomaly Traffic Detection Using WGAN-CNN-BiLSTM in Big Data Cloud-Edge Collaborative Computing Environment

  • Yue Wang
    • Journal of Information Processing Systems / v.20 no.3 / pp.375-390 / 2024
  • Edge computing architecture has effectively alleviated the computing pressure on cloud platforms, reduced network bandwidth consumption, and improved the quality of service for the user experience; however, it has also introduced new security issues. Existing anomaly detection methods in big data scenarios with cloud-edge computing collaboration face several challenges, such as sample imbalance, difficulty in dealing with complex network traffic attacks, and difficulty in effectively training on large-scale data or overly complex deep-learning network models. A lightweight deep-learning model was proposed to address these challenges. First, normalization on the user side was used to preprocess the traffic data. On the edge side, a trained Wasserstein generative adversarial network (WGAN) was used to supplement the data samples, which effectively alleviated the imbalance of minority-class samples while occupying a small amount of edge-computing resources. Finally, a trained lightweight deep-learning network model was deployed on the edge side, and the preprocessed and expanded local data were used to fine-tune it, ensuring that each edge node better reflects local traffic characteristics and effectively improving the system's detection ability. In the designed lightweight model, two sets of convolutional pooling layers of a convolutional neural network (CNN) were used to extract spatial features, a bidirectional long short-term memory network (BiLSTM) was used to capture time-sequence features, and the weights of traffic features were adjusted through an attention mechanism, improving the model's ability to identify abnormal traffic. The proposed model was evaluated experimentally on the NSL-KDD, UNSW-NB15, and CIC-IDS2018 datasets. Its accuracies on the three datasets were 0.974, 0.925, and 0.953, respectively, surpassing the compared models. The proposed lightweight deep-learning network model has good application prospects for anomaly traffic detection in cloud-edge collaborative computing architectures.
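
As a rough illustration of the architecture this abstract describes, the sketch below wires two Conv1D + pooling blocks, a BiLSTM, and a simple attention layer into a Keras classifier. The input width (41 features, echoing NSL-KDD's feature count), layer sizes, and attention formulation are illustrative assumptions, not the authors' exact configuration, and the WGAN augmentation step is omitted.

```python
# Minimal Keras sketch of a CNN-BiLSTM-attention traffic classifier.
# Layer sizes and the 41-feature input are illustrative assumptions.
from tensorflow.keras import layers, models

n_features, n_classes = 41, 2

inputs = layers.Input(shape=(n_features, 1))

# Two convolution + pooling blocks extract spatial features.
x = layers.Conv1D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling1D(2)(x)
x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling1D(2)(x)

# A BiLSTM collects sequence features across the remaining steps.
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Simple additive attention re-weights the time steps.
scores = layers.Dense(1, activation="tanh")(x)     # (batch, steps, 1)
weights = layers.Softmax(axis=1)(scores)           # attention weights over steps
context = layers.Dot(axes=1)([weights, x])         # weighted sum -> (batch, 1, 128)
context = layers.Flatten()(context)

outputs = layers.Dense(n_classes, activation="softmax")(context)
model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```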

A Study of Comparison between Cruise Tours in China and U.S.A through Big Data Analytics

  • Shuting, Tao;Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.6 / pp.1-11 / 2017
  • The purpose of this study was to compare cruise tours in China and the U.S.A. through a semantic network analysis of big data, with online data collected by SCTM (Smart Crawling & Text Mining), a data collecting and processing program. The data covered the period from January 1, 2015 to August 15, 2017; "cruise tour, china" and "cruise tour, usa" were used as keywords to collect related data, and the packaged NetDraw along with UCINET 6.0 was used for data analysis. Currently, Chinese cruise passengers focus on cruising destinations, while American cruise passengers pay more attention to the onboard experience and cruising expenditure. After performing CONCOR (convergence of iterated correlations) analysis, three clusters were created for Chinese cruise tours, covering domestic destinations, international destinations, and hospitality tourism. For American cruise tours, four groups were segmented: cruise expenditure, onboard experience, cruise brand, and destinations. Since cruise tourism in America is highly developed, this study is also intended to provide meaningful, social network-oriented suggestions for Chinese cruise tourism.
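
A minimal sketch of how a keyword co-occurrence (semantic) network can be assembled from crawled texts before centrality or CONCOR analysis. The tokenization and the two sample "documents" are placeholders; the study itself used the SCTM crawler with UCINET/NetDraw rather than this Python pipeline.

```python
# Build a toy keyword co-occurrence network from crawled texts.
from itertools import combinations
from collections import Counter
import networkx as nx

documents = [
    "cruise tour china destination shanghai",
    "cruise tour usa onboard experience price",
]

cooccurrence = Counter()
for doc in documents:
    words = sorted(set(doc.split()))          # unique words per document
    for w1, w2 in combinations(words, 2):     # every word pair in the document
        cooccurrence[(w1, w2)] += 1

G = nx.Graph()
for (w1, w2), weight in cooccurrence.items():
    G.add_edge(w1, w2, weight=weight)

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```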

A Semantic Network Analysis of Big Data regarding Food Exhibition at Convention Center (전시컨벤션센터 식품박람회와 관련된 빅데이터의 의미연결망 분석)

  • Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.3 / pp.257-270 / 2017
  • The purpose of this study was to visualize the semantic network of big data related to food exhibitions at convention centers. For this, the study collected data containing the keywords 'coex food exhibition' and 'bexco food exhibition' from web pages and news on Google for one year, from January 1 to December 31, 2016. Data were collected using TEXTOM, a data collecting and processing program. From those data, degree centrality, closeness centrality, betweenness centrality, and eigenvector centrality were analyzed using the packaged NetDraw along with UCINET 6. The results showed that the web visibility of hospitality and destinations was high. In addition, web visibility was also high for convention center programs, such as festival, exhibition, k-pop, and event, and for hospitality-related words, such as tourists, service, hotel, cruise, cuisine, and travel. Convergence of iterated correlations showed four clusters, named "Coex", "Bexco", "Nations", and "Hospitality". This diagnosis of food exhibitions at convention centers, based on this web information and on changes in the domestic environment, is expected to serve as baseline data for establishing convention marketing strategies.
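
The four centrality measures reported above can be computed with networkx as in the sketch below; the toy graph and edge weights are placeholders standing in for the real keyword network built in UCINET.

```python
# Degree, closeness, betweenness, and eigenvector centrality on a toy keyword graph.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("exhibition", "food", 5), ("exhibition", "coex", 4),
    ("exhibition", "bexco", 3), ("food", "festival", 2),
    ("coex", "hotel", 1),
])

degree = nx.degree_centrality(G)
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)

for word in G.nodes:
    print(word,
          round(degree[word], 3), round(closeness[word], 3),
          round(betweenness[word], 3), round(eigenvector[word], 3))
```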

An Exploratory Study on the Semantic Network Analysis of Food Tourism through the Big Data (빅데이터를 활용한 음식관광관련 의미연결망 분석의 탐색적 적용)

  • Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.4 / pp.22-32 / 2017
  • The purpose of this study was to explore awareness of food tourism using big data analysis. For this, the study collected data containing the keyword 'food tourism' from Google web search, Google News, and Google Scholar for one year, from January 1 to December 31, 2016. Data were collected using SCTM (Smart Crawling & Text Mining), a data collecting and processing program. From those data, degree centrality and eigenvector centrality were analyzed using the packaged NetDraw along with UCINET 6. The results showed that the web visibility of 'core service' and 'social marketing' was high. In addition, web visibility was also high for destination-related words, such as rural, place, Ireland, and heritage, and for 'socioeconomic circumstance'-related words, such as economy, region, public, policy, and industry. Convergence of iterated correlations showed four clusters, named 'core service', 'social marketing', 'destinations', and 'social environment'. This diagnosis of food tourism, based on this web information and on changes in the international business environment, is expected to serve as baseline data for establishing food tourism marketing strategies.
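
The CONCOR step mentioned in these abstracts iteratively correlates the correlation matrix until every entry converges to +1 or -1 and then splits items by sign. The sketch below shows that procedure on a toy keyword-frequency matrix; the keywords and numbers are placeholders, and a full analysis would apply the split recursively to reach the four reported clusters.

```python
# Simplified CONCOR (convergence of iterated correlations) on toy keyword profiles.
import numpy as np

keywords = ["economy", "policy", "rural", "heritage"]
freq = np.array([            # toy keyword-by-document-group frequency profiles
    [4.0, 3.0, 0.0, 0.0],
    [3.0, 4.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [0.0, 0.0, 4.0, 5.0],
])

C = np.corrcoef(freq)                  # correlate keyword profiles
for _ in range(100):                   # re-correlate until entries reach +/-1
    if np.allclose(np.abs(C), 1.0):
        break
    C = np.corrcoef(C)

blocks = np.sign(C[0])                 # +1 / -1 block membership vs. keyword 0
for kw, b in zip(keywords, blocks):
    print(kw, "block", int(b))
```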

An Analysis of Game Strategy and User Behavior Pattern Using Big Data: Focused on Battlegrounds Game (빅데이터를 활용한 게임 전략 및 유저 행동 패턴 분석: 배틀그라운드 게임을 중심으로)

  • Kang, Ha-Na;Yong, Hye-Ryeon;Hwang, Hyun-Seok
    • Journal of Korea Game Society / v.19 no.4 / pp.27-36 / 2019
  • Approaches to finding hidden value in large and varied amounts of data are on the rise. As big data processing becomes easier, companies directly collect data generated by users and analyze it as needed to produce insights. User-based data are utilized to predict gameplay patterns and in-game symptoms, ultimately enhancing the gaming experience. Accordingly, in this study, we analyzed gaming strategies and user activity patterns using Battlegrounds in-game data to detect in-game hacks.
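
The abstract does not detail its detection method, so the sketch below only illustrates the general idea of flagging behavioural outliers from match telemetry. The feature names (headshot ratio, average kill distance) and the z-score rule are hypothetical, not taken from the paper.

```python
# Flag suspicious players as statistical outliers on hypothetical behaviour features.
import pandas as pd

matches = pd.DataFrame({
    "player":         ["a", "b", "c", "d", "e"],
    "headshot_ratio": [0.21, 0.18, 0.95, 0.25, 0.19],   # hypothetical feature
    "avg_kill_dist":  [55.0, 60.0, 410.0, 70.0, 48.0],  # hypothetical feature
})

features = ["headshot_ratio", "avg_kill_dist"]
z = (matches[features] - matches[features].mean()) / matches[features].std()

# Flag players whose behaviour is an extreme outlier on any feature.
matches["suspected_hack"] = (z.abs() > 1.5).any(axis=1)
print(matches[["player", "suspected_hack"]])
```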

Analysis of Social Media Utilization based on Big Data-Focusing on the Chinese Government Weibo

  • Li, Xiang;Guo, Xiaoqin;Kim, Soo Kyun;Lee, Hyukku
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.8 / pp.2571-2586 / 2022
  • The rapid popularity of government social media has generated huge amounts of text data, and the analysis of these data has gradually become the focus of digital government research. This study uses the Python language to analyze big data from Chinese provincial government Weibo accounts. First, it uses a web crawler approach to collect and statistically describe over 360,000 posts from 31 provincial government microblogs in China, covering the period from January 2018 to April 2022. Second, a word segmentation engine is constructed, and the text data are analyzed using word cloud word frequencies as well as semantic relationships. Finally, the text data are analyzed for sentiment using natural language processing methods, and the text topics are studied using the LDA algorithm. The results of this study show that, first, the number and scale of posts on Chinese government Weibo have grown rapidly. Second, government Weibo has certain social attributes, and epidemics, people's livelihood, and services have become its focus. Third, negative sentiment accounts for more than 30% of government Weibo content. The classified topics show that epidemics and epidemic prevention and control overshadow the other topics, which inhibits the diversification of government Weibo.
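
A minimal scikit-learn sketch of the LDA topic-modelling step described above, applied to already word-segmented posts (the segmentation itself, e.g. with jieba, is omitted). The sample posts, vocabulary, and number of topics are illustrative assumptions rather than the study's data or settings.

```python
# LDA topic modelling on toy, already-segmented posts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "epidemic prevention control vaccine notice",
    "livelihood employment housing service policy",
    "epidemic testing control notice residents",
    "government service hotline livelihood housing",
]

vectorizer = CountVectorizer()
dtm = vectorizer.fit_transform(posts)             # document-term matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}:", ", ".join(top))
```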

A Study on Big Data Based Method of Patient Care Analysis (빅데이터 기반 환자 간병 방법 분석 연구)

  • Park, Ji-Hun;Hwang, Seung-Yeon;Yun, Bum-Sik;Choe, Su-Gil;Lee, Don-Hee;Kim, Jeong-Joon;Moon, Jin-Yong;Park, Kyung-won
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.3 / pp.163-170 / 2020
  • With the development of information and communication technologies, the volume of data is growing exponentially, raising interest in big data. As technologies related to big data have developed, big data is being collected, stored, processed, analyzed, and utilized in many fields. Big data analytics in the health care sector, in particular, is receiving much attention because it can have a huge social and economic impact. It is predicted that big data technology can be used to analyze patients' diagnostic data and reduce the amount of money spent on simple hospital care. Therefore, in this thesis, patient data are analyzed to provide detailed care guidelines to patients who are unable to go to the hospital and to caregivers who do not have medical expertise. First, the collected patient data are stored in HDFS, and the data are processed and classified using R, a big data processing and analysis tool, in the Hadoop environment. The results are then visualized on a web server using R Shiny, which makes various functions of R available on the web.
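
The study's pipeline stores patient records in HDFS and processes them with R and R Shiny; the sketch below is only a rough Python/PySpark analogue of the load-and-summarize step, not the authors' R code, and assumes a hypothetical CSV of patient records in HDFS with hypothetical column names.

```python
# Load patient records from HDFS and summarize a vital sign by diagnosis group.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("patient-care-analysis").getOrCreate()

patients = (spark.read
            .option("header", True)
            .option("inferSchema", True)
            .csv("hdfs:///data/patients.csv"))        # hypothetical HDFS path

summary = (patients
           .groupBy("diagnosis")                      # hypothetical column names
           .agg(F.avg("body_temp").alias("avg_temp"),
                F.count("*").alias("n_patients")))
summary.show()
```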

Design and Implementation of HDFS Data Encryption Scheme Using ARIA Algorithms on Hadoop (하둡 상에서 ARIA 알고리즘을 이용한 HDFS 데이터 암호화 기법의 설계 및 구현)

  • Song, Youngho;Shin, YoungSung;Chang, Jae-Woo
    • KIPS Transactions on Computer and Communication Systems / v.5 no.2 / pp.33-40 / 2016
  • Due to the growth of social network services (SNS), big data have become a reality, and Hadoop was developed as a distributed platform for analyzing them. Enterprises analyze data containing users' sensitive information using Hadoop and utilize the results for marketing. Therefore, research on data encryption has been conducted to protect against the leakage of sensitive data stored in Hadoop. However, existing work supports only the AES encryption algorithm, the international standard for data encryption, whereas the Korean government has adopted the ARIA algorithm as its standard. In this paper, we propose an HDFS data encryption scheme using the ARIA algorithm on Hadoop. First, the proposed scheme provides an HDFS block-splitting component that performs ARIA encryption and decryption in the distributed computing environment of Hadoop. Second, the proposed scheme also provides a variable-length data processing component that performs encryption and decryption by adding dummy data when the last block contains fewer than 128 bits of data. Finally, we show through a performance analysis that our proposed scheme can be used effectively for both text string processing applications and scientific data analysis applications.
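
The variable-length handling described above pads the final block out to a full 128 bits with dummy data before block encryption. The sketch below shows that idea with PKCS#7-style padding; AES-CBC from the Python cryptography package stands in for ARIA (an off-the-shelf ARIA implementation is not assumed here), and the key/IV handling is illustrative rather than the paper's HDFS component.

```python
# Pad variable-length data to full 128-bit blocks, then encrypt block-wise.
# AES is used as a stand-in for the ARIA algorithm described in the paper.
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, iv = os.urandom(32), os.urandom(16)

def encrypt_block_split(data: bytes) -> bytes:
    # Fill the last block up to 128 bits with dummy bytes (PKCS#7 style).
    padder = padding.PKCS7(128).padder()
    padded = padder.update(data) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return encryptor.update(padded) + encryptor.finalize()

def decrypt_block_split(ciphertext: bytes) -> bytes:
    decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()
    return unpadder.update(padded) + unpadder.finalize()

assert decrypt_block_split(encrypt_block_split(b"variable-length record")) == \
       b"variable-length record"
```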

Implementation of Real-time Data Stream Processing for Predictive Maintenance of Offshore Plants (해양플랜트의 예지보전을 위한 실시간 데이터 스트림 처리 구현)

  • Kim, Sung-Soo;Won, Jongho
    • Journal of KIISE / v.42 no.7 / pp.840-845 / 2015
  • In recent years, big data has been a topic of great interest for the production and operation of offshore plants as well as for enterprise resource planning. The ability to predict future equipment performance based on historical results can be useful for shuttling assets to more productive areas. Specifically, a centrifugal compressor is one of the major pieces of equipment in offshore plants. This machinery is very dangerous because a failure can cause it to explode, so it is necessary to monitor its performance in real time. In this paper, we present a stream data processing architecture that can be used to compute the performance of a centrifugal compressor. Our system consists of two major components: a virtual tag stream generator and a real-time data stream manager. In order to provide scalability, we exploit a parallel programming approach that uses multi-core CPUs to process the massive amount of stream data. In addition, we provide experimental evidence demonstrating improvements in stream data processing for the centrifugal compressor.
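
A toy sketch of the parallel, multi-core processing idea described above: a batch of readings from the (virtual) tag stream is mapped across worker processes. The performance formula and field names are placeholders, not the paper's centrifugal-compressor model.

```python
# Process a batch of sensor readings in parallel across CPU cores.
from multiprocessing import Pool

def performance(reading):
    """Compute a toy performance indicator from one sensor reading."""
    suction_p, discharge_p, flow = reading
    return discharge_p / suction_p * flow        # placeholder formula

if __name__ == "__main__":
    # A batch of readings pulled from the (virtual) tag stream.
    readings = [(1.0 + i * 0.01, 5.0 + i * 0.02, 100.0 + i) for i in range(10_000)]

    with Pool(processes=4) as pool:              # spread work over CPU cores
        results = pool.map(performance, readings, chunksize=1_000)

    print("latest indicator:", results[-1])
```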

Analysis of the Influence Factors of Data Loading Performance Using Apache Sqoop (아파치 스쿱을 사용한 하둡의 데이터 적재 성능 영향 요인 분석)

  • Chen, Liu;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering / v.4 no.2 / pp.77-82 / 2015
  • Big data technology has attracted much attention for its fast data processing. Research on applying big data technology to process large-scale structured data stored in relational databases (RDB) much faster is also ongoing. Although there are many studies on measuring and analyzing performance, studies on structured-data loading performance, the step prior to analysis, are very rare. Thus, in this study, we tested the performance of loading structured data from an RDB into the distributed processing platform Hadoop using Apache Sqoop. In order to analyze the factors that influence data loading, the tests were repeated with different loading options, and loading performance was also compared among RDB-based servers. Although the data loading performance of Apache Sqoop in the test environment was low, much better performance can be expected in a large-scale Hadoop cluster environment with more hardware resources. This study is expected to serve as a basis for improving data loading performance and for analyzing the performance of the whole pipeline for processing structured data on the Hadoop platform.
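
The repeated loading tests with different options could be scripted roughly as below, varying the number of mappers across runs of the same Sqoop import. The JDBC URL, table, and credentials are placeholders; the flags shown are standard Sqoop import options, not the study's actual test configuration.

```python
# Time repeated Sqoop imports while varying the number of mappers.
import subprocess
import time

for mappers in (1, 2, 4, 8):
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost/testdb",    # placeholder RDB
        "--username", "tester", "--password", "****",
        "--table", "sales",                           # placeholder table
        "--target-dir", f"/loadtest/sales_m{mappers}",
        "--num-mappers", str(mappers),
    ]
    start = time.time()
    subprocess.run(cmd, check=True)
    print(f"{mappers} mappers: {time.time() - start:.1f}s")
```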