• Title/Summary/Keyword: Big data Processing


A Study on the NAS Storage-based Data Distributed Processing System Algorithm (NAS 스토리지 기반의 데이터 분산처리 시스템 알고리즘에 관한 연구)

  • Jang, Jae-Myung;Kang, Hee-beom;Jeong, Nahk-ju;Jung, Hoe-kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.643-645 / 2015
  • With the development of storage technology, storage is now actively used throughout daily life, from automobiles to the aviation field. Recently, big data has come to be stored across many storage devices, distributed data processing has emerged, and research on processing such data has been actively conducted. However, when data is requested simultaneously, bottlenecks arise or processing slows down. In this paper, for big data fields that must store and process large amounts of data, we propose a data processing system algorithm that handles data more efficiently and keeps the load manageable when the number of data requests is large.


Development Problems and Countermeasures of Rural E-Commerce Logistics in the Context of Big Data and Internet of Things

  • Xianfeng Zhu
    • Journal of Information Processing Systems / v.19 no.2 / pp.267-274 / 2023
  • With the expansion of the Internet and the continuous growth of online shopping in China, many rural areas now have sales outlets. Owing to economic conditions, rural locations have inadequate e-commerce logistics infrastructure: the outlets are few and scattered. For various reasons, the advancement of rural e-commerce logistics lags far behind that in urban areas. As the Internet of Things and big data grow in popularity, we can create and enhance the assurance system for the booming e-commerce in rural areas by building a support system for rural online shopping platforms and strengthening the joint distribution of logistics terminals based on data mining, so as to encourage the quick and healthy growth of rural online shopping.

Effectiveness of Normalization Pre-Processing of Big Data to the Machine Learning Performance (빅데이터의 정규화 전처리과정이 기계학습의 성능에 미치는 영향)

  • Jo, Jun-Mo
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.3 / pp.547-552 / 2019
  • Recently, the massive growth in the scale of data has become a major issue in big data. Because big data also serves as input to machine learning, it should be normalized in preprocessing to obtain high machine learning performance. That performance varies with many factors, such as the range of the columns in the data and the normalization method used. In this paper, various normalization preprocessing methods and column ranges are applied to an SVM (Support Vector Machine), used as the machine learning method, to find an efficient setting for normalization preprocessing. The machine learning experiment was programmed in Python using Jupyter Notebook.
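As a rough illustration of what "normalization preprocessing" can mean here, the sketch below shows two common choices, min-max scaling and z-score standardization, in plain Python. The abstract does not say which methods the paper compared, so both functions and the sample `incomes` column are assumptions made for illustration only.

```python
from statistics import mean, pstdev

def min_max_scale(column):
    """Rescale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def z_score_scale(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical column with a wide numeric range.
incomes = [20_000, 35_000, 50_000, 80_000]
scaled = min_max_scale(incomes)
standardized = z_score_scale(incomes)
```

Distance- and margin-based learners such as SVMs are sensitive to feature scale, which is why a column with a wide range can dominate the others unless it is rescaled first.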

Design of Spark SQL Based Framework for Advanced Analytics (Spark SQL 기반 고도 분석 지원 프레임워크 설계)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering / v.5 no.10 / pp.477-482 / 2016
  • As advanced analytics on big data become indispensable for agile decision-making and tactical planning in enterprises, distributed processing platforms such as Hadoop and Spark, which distribute and handle large volumes of data across multiple nodes, are receiving great attention in the field. In the Spark platform stack, Spark SQL was recently unveiled to let Spark support SQL-based distributed processing. However, Spark SQL cannot effectively handle advanced analytics involving machine learning and graph processing, in terms of iterative tasks and task allocation. Motivated by these issues, this paper proposes the design of an SQL-based optimal big data processing engine and a processing framework to support advanced analytics in Spark environments. The processing engine handles complex SQL queries that involve multiple parameters and join, aggregation, and sorting operations in a distributed and parallel manner, and the proposed framework optimizes the machine learning process in terms of relational operations.

Scalable Prediction Models for Airbnb Listing in Spark Big Data Cluster using GPU-accelerated RAPIDS

  • Muralidharan, Samyuktha;Yadav, Savita;Huh, Jungwoo;Lee, Sanghoon;Woo, Jongwook
    • Journal of information and communication convergence engineering / v.20 no.2 / pp.96-102 / 2022
  • We aim to build predictive models for Airbnb prices using GPU-accelerated RAPIDS in a big data cluster. The Airbnb Listings datasets are used for the predictive analysis. Several machine-learning algorithms have been adopted to build models that predict the price of Airbnb listings. We compare the results of traditional and big data approaches to machine learning for price prediction and discuss the performance of the models. We built big data models using a Databricks Spark cluster, a distributed parallel computing system. Furthermore, we implemented models on multiple GPUs using RAPIDS in the Spark cluster. The model was developed using the XGBoost algorithm, whereas other models were developed using traditional central processing unit (CPU)-based algorithms. This study compared all models in terms of accuracy metrics and computing time. We observed that the XGBoost model with RAPIDS using GPUs had the highest accuracy and computing time.

Big data distributed processing system using RHadoop (RHadoop을 이용한 빅데이터 분산처리 시스템)

  • Shin, Ji Eun;Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1155-1166 / 2015
  • It is almost impossible to store or analyze exponentially growing big data with traditional technologies, and Hadoop is a newer technology that makes this possible. Recently, R has been used as an engine for big data analysis based on distributed processing with Hadoop. With RHadoop, which integrates the R and Hadoop environments, we implemented parallel multiple regression analysis on actual and simulated data of various sizes. Experimental results showed that our RHadoop system became faster as the number of data nodes increased. We also compared the performance of our RHadoop system with the lm function and the biglm package available on bigmemory. The results showed that our RHadoop system was faster than the other packages owing to parallel processing, with the number of map tasks increasing as the size of the data increases.
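One common way to parallelize least-squares regression in a map/reduce style is to have each map task emit sufficient statistics for its chunk and let the reduce step add them up before solving. The sketch below illustrates that idea in plain Python with a single predictor for brevity; it is not the paper's RHadoop code, and the function names and sample chunks are hypothetical.

```python
from functools import reduce

def map_chunk(chunk):
    """Map step: sufficient statistics (n, sum x, sum y, sum x*x, sum x*y) for one chunk."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: the statistics are additive, so chunks combine element-wise."""
    return tuple(p + q for p, q in zip(a, b))

def fit(chunks):
    """Solve the normal equations of y = intercept + slope * x from combined statistics."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_chunk, chunks))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Two hypothetical chunks, as if stored on two data nodes; the points lie on y = 2x + 1.
chunks = [[(0, 1), (1, 3)], [(2, 5), (3, 7)]]
intercept, slope = fit(chunks)
```

Because only the small statistics tuple travels between map and reduce, adding map tasks as the data grows does not increase the size of the final solve.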

A Study on Traffic Big Data Mapping Using the Grid Index Method (그리드 인덱스 기법을 이용한 교통 빅데이터 맵핑 방안 연구)

  • Chong, Kyu Soo;Sung, Hong Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.6 / pp.107-117 / 2020
  • With the recent development of autonomous vehicles, various sensors installed in vehicles have become common, and big data generated from those sensors is increasingly being used in the transportation field. In this study, we proposed a grid index method to efficiently process real-time vehicle sensing big data and public data such as road weather. The applicability and effect of the proposed grid space division method and grid ID generation method were analyzed. We created virtual data based on DTG data and mapped it to road links based on coordinates. An analysis of data processing speed showed that the grid index method improved processing performance by more than 2,400 times compared with the existing link-unit processing method. In addition, to analyze the efficiency of the proposed technology, the virtually generated data was mapped and visualized.
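The general idea of a grid index can be sketched as follows: divide space into fixed-size cells, derive a cell ID from each coordinate, and bucket road links by cell so that a sensor point only needs to be compared with links in its own cell. The cell size, ID scheme, and link data below are assumptions for illustration; the abstract does not give the paper's actual grid division or ID generation rules.

```python
import math
from collections import defaultdict

CELL_SIZE = 0.01  # cell width in degrees; a hypothetical choice

def grid_id(lon, lat, cell=CELL_SIZE):
    """Return the (column, row) ID of the cell containing the point."""
    return (math.floor(lon / cell), math.floor(lat / cell))

def build_index(links):
    """Bucket link IDs by the cell of each link's representative point."""
    index = defaultdict(list)
    for link_id, (lon, lat) in links.items():
        index[grid_id(lon, lat)].append(link_id)
    return index

def candidate_links(index, lon, lat):
    """Look up the links sharing a cell with the point, without scanning all links."""
    return index.get(grid_id(lon, lat), [])

# Hypothetical road links keyed by ID, each with one representative coordinate.
links = {
    "L1": (127.0234, 37.5012),
    "L2": (127.0251, 37.5049),
    "L3": (127.0912, 37.5533),
}
index = build_index(links)
nearby = candidate_links(index, 127.0240, 37.5030)
```

A fuller version would also check the neighboring cells, since a point near a cell border may belong to a link indexed in the adjacent cell.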

New Medical Image Fusion Approach with Coding Based on SCD in Wireless Sensor Network

  • Zhang, De-gan;Wang, Xiang;Song, Xiao-dong
    • Journal of Electrical Engineering and Technology / v.10 no.6 / pp.2384-2392 / 2015
  • The technical development and practical application of big data for health is a hot topic under the banner of big data, and big-data medical image fusion is one of the key problems. This paper proposes a new fusion approach with coding based on the Spherical Coordinate Domain (SCD) in a Wireless Sensor Network (WSN) for big-data medical images. In this approach, the three high-frequency coefficients in the wavelet domain of the medical image are pre-processed. This pre-processing strategy can reduce the redundancy ratio of big-data medical images. First, the high-frequency coefficients are transformed to the spherical coordinate domain to reduce the correlation within the same scale. Then, a multi-scale model product (MSMP) is used to control the shrinkage function so that small wavelet coefficients and some noise are removed. The high-frequency parts in the spherical coordinate domain are coded by an improved SPIHT algorithm. Finally, based on the multi-scale edges of the medical image, the image can be fused and reconstructed. Experimental results indicate that the novel approach is effective and very useful for the transmission of big-data medical images (especially in a wireless environment).

Efficient K-Anonymization Implementation with Apache Spark

  • Kim, Tae-Su;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.23 no.11 / pp.17-24 / 2018
  • Today, we are living in the era of data and information. With the advent of the Internet of Things (IoT), the popularity of social networking sites, and the development of mobile devices, a large amount of data is being produced in diverse areas. The collection of such data generated in various areas is called big data. As the importance of big data grows, there has been a growing need to share big data containing information regarding individual entities. As big data contains sensitive information about individuals, directly releasing it for public use may violate existing privacy requirements. Thus, privacy-preserving data publishing (PPDP) has been actively studied to share big data containing personal information for public use while preserving the privacy of the individual. K-anonymity, the most popular method in the area of PPDP, transforms each record in a table such that at least k records have the same values for the given quasi-identifier attributes, so that each record is indistinguishable from the other records in the same class. As the size of big data continues to grow, there is a growing demand for methods that can efficiently anonymize vast amounts of data. Thus, in this paper, we develop an efficient k-anonymity method using the Spark distributed framework. Experimental results show that the developed method achieves significant gains in processing time.
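The k-anonymity condition described above can be checked directly: group records by their quasi-identifier values and confirm that every group holds at least k records. The sketch below does this in plain Python on a small hypothetical table and uses a simple age-bucketing generalization to reach k = 2. It illustrates the definition only, not the paper's Spark-based method.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

def generalize_age(record, width=10):
    """Replace an exact age with a range of the given width, e.g. 23 becomes '20-29'."""
    lo = (record["age"] // width) * width
    return {**record, "age": f"{lo}-{lo + width - 1}"}

# Hypothetical table; "age" and "zip" are treated as the quasi-identifiers.
records = [
    {"age": 23, "zip": "130**", "disease": "flu"},
    {"age": 27, "zip": "130**", "disease": "cold"},
    {"age": 31, "zip": "148**", "disease": "flu"},
    {"age": 38, "zip": "148**", "disease": "asthma"},
]
qi = ["age", "zip"]

before = is_k_anonymous(records, qi, k=2)       # exact ages make every record unique
generalized = [generalize_age(r) for r in records]
after = is_k_anonymous(generalized, qi, k=2)    # age ranges form two groups of two
```

The grouping step is the part that lends itself to distribution, since counting records per quasi-identifier combination is a standard group-by aggregation.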

Study on Enhancement of Data Processing Algorithm in SaaS Cloud Infrastructure to Monitor Wind Turbine Condition (풍력발전기 상태 감시를 위한 SaaS 클라우드 인프라 내 데이터 처리 알고리즘 개선 연구)

  • Lee, Gwang-Se;Choi, Jungchul;Kang, Minsang;Park, Sail;Lee, JinJae
    • New & Renewable Energy / v.16 no.1 / pp.25-30 / 2020
  • In this study, an SW for the analysis of the wind-turbine vibration characteristics was developed as an application of SaaS cloud infrastructure. A measurement system for power-performance, mechanical load, and gearbox vibration as type-test class was installed at a target MW-class wind turbine, and structural meta and raw data were then acquired into the cloud. Data processing algorithms were developed to provide cloud data to the SW. To operate the SW continuously, raw data was downloaded consistently based on the algorithms. During the SW test, an intermittent long time-delay occurred due to the communication load associated with frequent access to the cloud. To solve this, a compression service for the target raw data was developed in the cloud and more stable data processing was confirmed. Using the compression service, stable big data processing of wind turbines, including gearbox vibration analysis, is expected.
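The effect of compressing raw data before transfer can be sketched with Python's standard `zlib` module. The abstract does not name the codec or data layout used by the compression service, so the 32-bit float packing, the compression level, and the synthetic periodic signal below are all assumptions chosen for illustration.

```python
import math
import struct
import zlib

def pack_samples(samples):
    """Serialize samples as little-endian 32-bit floats (an assumed raw format)."""
    return struct.pack(f"<{len(samples)}f", *samples)

def compress_block(raw, level=6):
    """Compress one raw block before it is sent over the network."""
    return zlib.compress(raw, level)

def decompress_block(blob):
    """Restore the original bytes on the receiving side."""
    return zlib.decompress(blob)

# Synthetic periodic signal standing in for a vibration measurement.
samples = [math.sin(2 * math.pi * i / 50) for i in range(5000)]
raw = pack_samples(samples)
blob = compress_block(raw)
restored = decompress_block(blob)
ratio = len(blob) / len(raw)
```

Sending fewer, smaller payloads reduces the communication load per access, which is the kind of delay the abstract attributes to frequent cloud requests; real vibration data will compress less well than this idealized signal.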