• Title/Summary/Keyword: Big Data Processing Technology

Search Result 385, Processing Time 0.025 seconds

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems
    • /
    • v.20 no.2
    • /
    • pp.215-225
    • /
    • 2024
  • With the rapid development of Internet of Things (IoT) and big data technology, a large amount of data will be generated during the operation of related industries. How to classify the generated data accurately has become the core of research on data mining and processing in IoT industry chain. This study constructs a classification model of IoT industry chain based on improved random forest algorithm and text analysis, aiming to achieve efficient and accurate classification of IoT industry chain big data by improving traditional algorithms. The accuracy, precision, recall, and AUC value size of the traditional Random Forest algorithm and the algorithm used in the paper are compared on different datasets. The experimental results show that the algorithm model used in this paper has better performance on different datasets, and the accuracy and recall performance on four datasets are better than the traditional algorithm, and the accuracy performance on two datasets, P-I Diabetes and Loan Default, is better than the random forest model, and its final data classification results are better. Through the construction of this model, we can accurately classify the massive data generated in the IoT industry chain, thus providing more research value for the data mining and processing technology of the IoT industry chain.

Evaluation of Predictive Models for Early Identification of Dropout Students

  • Lee, JongHyuk;Kim, Mihye;Kim, Daehak;Gil, Joon-Min
    • Journal of Information Processing Systems
    • /
    • v.17 no.3
    • /
    • pp.630-644
    • /
    • 2021
  • Educational data analysis is attracting increasing attention with the rise of the big data industry. The amounts and types of learning data available are increasing steadily, and the information technology required to analyze these data continues to develop. The early identification of potential dropout students is very important; education is important in terms of social movement and social achievement. Here, we analyze educational data and generate predictive models for student dropout using logistic regression, a decision tree, a naïve Bayes method, and a multilayer perceptron. The multilayer perceptron model using independent variables selected via the variance analysis showed better performance than the other models. In addition, we experimentally found that not only grades but also extracurricular activities were important in terms of preventing student dropout.

Study on Big Data Linkage Method for Managing Port Infrastructure Disasters and Aging (항만 인프라 재해 및 노후화 관리를 위한 빅데이터 연계 방안 연구)

  • Choi, Woo-geun;Park, Sun-ho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2021.10a
    • /
    • pp.134-137
    • /
    • 2021
  • This study aims to develop a digital twin and big data-based port infrastructure control system that reflects smart maintenance technology. It is a technology that can evaluate aging and disaster risk by converting heterogeneous data such as sensing data and image data acquired from port infrastructure into big data, visualized in a digital twin-based control system, and comprehensively analyzed. The meaning of big data to express the physical world and processes by combining data, which are the core components of the virtual world, and the matters to be reflected in each stage of securing, processing, storing, analyzing and utilizing necessary big data, and we would like to define methods for linking with IT resources.

  • PDF

Compression-Friendly Low Power Test Application Based on Scan Slices Reusing

  • Wang, Weizheng;Wang, JinCheng;Cai, Shuo;Su, Wei;Xiang, Lingyun
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.4
    • /
    • pp.463-469
    • /
    • 2016
  • This paper presents a compression-friendly low power test scheme in EDT environment. The proposed approach exploits scan slices reusing to reduce the switching activity during shifting for test scheme based on linear decompressor. To avoid the impact on encoding efficiency from resulting control data, a counter is utilized to generate control signals. Experimental results obtained for some larger ISCAS'89 and ITC'99 benchmark circuits illustrate that the proposed test application scheme can improve significantly the encoding efficiency of linear decompressor.

Scalable Big Data Pipeline for Video Stream Analytics Over Commodity Hardware

  • Ayub, Umer;Ahsan, Syed M.;Qureshi, Shavez M.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.4
    • /
    • pp.1146-1165
    • /
    • 2022
  • A huge amount of data in the form of videos and images is being produced owning to advancements in sensor technology. Use of low performance commodity hardware coupled with resource heavy image processing and analyzing approaches to infer and extract actionable insights from this data poses a bottleneck for timely decision making. Current approach of GPU assisted and cloud-based architecture video analysis techniques give significant performance gain, but its usage is constrained by financial considerations and extremely complex architecture level details. In this paper we propose a data pipeline system that uses open-source tools such as Apache Spark, Kafka and OpenCV running over commodity hardware for video stream processing and image processing in a distributed environment. Experimental results show that our proposed approach eliminates the need of GPU based hardware and cloud computing infrastructure to achieve efficient video steam processing for face detection with increased throughput, scalability and better performance.

Feasibility Verification of Big Data Processing employing SmartX-mini Center with NUC Cluster (SmartX-mini Center를 통한 NUC 클러스터의 Big Data 처리 가능성 검증)

  • Song, Jiwon;Lee, Jungi;Kim, Seungryong;Kim, JongWon
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2015.04a
    • /
    • pp.73-74
    • /
    • 2015
  • IoT의 발달로 인해 새롭게 빅데이터와 그의 실시간 처리의 중요성이 증대되고 있다. 본 논문에서는 사물인터넷의 관제 및 데이터 처리 기능을 갖춘 SmartX-mini 센터를 통하여 NUC 클러스터의 빅데이터 처리 가능성을 제시하고, 이를 검증하기 위하여 SmartX-mini 테스트베드를 활용한다. SmartX-mini Center의 Spark 프레임워크를 이용한 실험을 통해 IoT 환경에서의 NUC 클러스터의 빅데이터 처리 가능에 대한 가능성을 검증하였다.

Big data distributed processing system using RHadoop (RHadoop을 이용한 빅데이터 분산처리 시스템)

  • Shin, Ji Eun;Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.5
    • /
    • pp.1155-1166
    • /
    • 2015
  • It is almost impossible to store or analyze big data increasing exponentially with traditional technologies, so Hadoop is a new technology to make that possible. In recent R is using as an engine for big data analysis based on distributed processing with Hadoop technology. With RHadoop that integrates R and Hadoop environment, we implemented parallel multiple regression analysis with various data sizes of actual data and simulated data. Experimental results showed our RHadoop system was faster as the number of data nodes increases. We also compared the performance of our RHadoop with lm function and biglm packages available on bigmemory. The results showed that our RHadoop was faster than other packages owing to paralleling processing with increasing the number of map tasks as the size of data increases.

Development of Information Technology Infrastructures through Construction of Big Data Platform for Road Driving Environment Analysis (도로 주행환경 분석을 위한 빅데이터 플랫폼 구축 정보기술 인프라 개발)

  • Jung, In-taek;Chong, Kyu-soo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.3
    • /
    • pp.669-678
    • /
    • 2018
  • This study developed information technology infrastructures for building a driving environment analysis platform using various big data, such as vehicle sensing data, public data, etc. First, a small platform server with a parallel structure for big data distribution processing was developed with H/W technology. Next, programs for big data collection/storage, processing/analysis, and information visualization were developed with S/W technology. The collection S/W was developed as a collection interface using Kafka, Flume, and Sqoop. The storage S/W was developed to be divided into a Hadoop distributed file system and Cassandra DB according to the utilization of data. Processing S/W was developed for spatial unit matching and time interval interpolation/aggregation of the collected data by applying the grid index method. An analysis S/W was developed as an analytical tool based on the Zeppelin notebook for the application and evaluation of a development algorithm. Finally, Information Visualization S/W was developed as a Web GIS engine program for providing various driving environment information and visualization. As a result of the performance evaluation, the number of executors, the optimal memory capacity, and number of cores for the development server were derived, and the computation performance was superior to that of the other cloud computing.

Semantic Computing for Big Data: Approaches, Tools, and Emerging Directions (2011-2014)

  • Jeong, Seung Ryul;Ghani, Imran
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.6
    • /
    • pp.2022-2042
    • /
    • 2014
  • The term "big data" has recently gained widespread attention in the field of information technology (IT). One of the key challenges in making use of big data lies in finding ways to uncover relevant and valuable information. The high volume, velocity, and variety of big data hinder the use of solutions that are available for smaller datasets, which involve the manual interpretation of data. Semantic computing technologies have been proposed as a means of dealing with these issues, and with the advent of linked data in recent years, have become central to mainstream semantic computing. This paper attempts to uncover the state-of-the-art semantics-based approaches and tools that can be leveraged to enrich and enhance today's big data. It presents research on the latest literature, including 61 studies from 2011 to 2014. In addition, it highlights the key challenges that semantic approaches need to address in the near future. For instance, this paper presents cutting-edge approaches to ontology engineering, ontology evolution, searching and filtering relevant information, extracting and reasoning, distributed (web-scale) reasoning, and representing big data. It also makes recommendations that may encourage researchers to more deeply explore the applications of semantic technology, which could improve the processing of big data. The findings of this study contribute to the existing body of basic knowledge on semantics and computational issues related to big data, and may trigger further research on the field. Our analysis shows that there is a need to put more effort into proposing new approaches, and that tools must be created that support researchers and practitioners in realizing the true power of semantic computing and solving the crucial issues of big data.

Optimization and Performance Analysis of Cloud Computing Platform for Distributed Processing of Big Data (대용량 데이터의 분산 처리를 위한 클라우드 컴퓨팅 환경 최적화 및 성능평가)

  • Hong, Seung-Tae;Shin, Young-Sung;Chang, Jae-Woo
    • Spatial Information Research
    • /
    • v.19 no.4
    • /
    • pp.55-71
    • /
    • 2011
  • Recently, interest in cloud computing which provides IT resources as service form in IT field is increasing. As a result, much research has been done on the distributed data processing that store and manage a large amount of data in many servers. Meanwhile, in order to effectively utilize the spatial data which is rapidly increasing day by day with the growth of GIS technology, distributed processing of spatial data using cloud computing is essential. Therefore, in this paper, we review the representative distributed data processing techniques and we analyze the optimization requirements for performance improvement of the distributed processing techniques for a large amount of data. In addition, we uses the Hadoop and we evaluate the performance of the distributed data processing techniques for their optimization requirements.