• Title/Summary/Keyword: Big data Processing

Construction of an Internet of Things Industry Chain Classification Model Based on IRFA and Text Analysis

  • Zhimin Wang
    • Journal of Information Processing Systems / v.20 no.2 / pp.215-225 / 2024
  • With the rapid development of the Internet of Things (IoT) and big data technology, related industries generate large amounts of data during operation. Classifying this data accurately has become central to research on data mining and processing in the IoT industry chain. This study constructs a classification model for the IoT industry chain based on an improved random forest algorithm and text analysis, aiming to classify IoT industry chain big data efficiently and accurately by improving on the traditional algorithm. The accuracy, precision, recall, and AUC of the traditional random forest algorithm and of the proposed algorithm are compared on different datasets. The experimental results show that the proposed model performs better across the datasets: its accuracy and recall on four datasets are better than those of the traditional algorithm, its accuracy on two datasets, P-I Diabetes and Loan Default, is better than that of the random forest model, and its final classification results are better. The model can accurately classify the massive data generated in the IoT industry chain, providing more research value for data mining and processing technology in the IoT industry chain.
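The evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes binary labels and uses toy values that are not taken from the paper; the function name `classification_metrics` is hypothetical, and AUC is omitted because it needs predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells for the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Toy labels for illustration only (not data from the paper).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Comparing two classifiers, as the abstract describes, would amount to calling this function once per model on the same held-out labels.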

Design of Extended Real-time Data Pipeline System Architecture (확장형 실시간 데이터 파이프라인 시스템 아키텍처 설계)

  • Shin, Hoseung;Kang, Sungwon;Lee, Jihyun
    • Journal of KIISE / v.42 no.8 / pp.1010-1021 / 2015
  • Big data systems are widely used to collect large-scale log data, so it is very important for these systems to operate with a high level of performance. However, the current Hadoop-based big data system architecture has a problem in that its performance is low as a result of redundant processing. This paper solves this problem by improving the design of the Hadoop system architecture. The proposed architecture uses the batch-based data collection of the existing architecture in combination with a single processing method. A high level of performance can be achieved by analyzing the collected data directly in memory to avoid redundant processing. The proposed architecture guarantees system expandability, which is an advantage of using the Hadoop architecture. This paper confirms that the proposed architecture is approximately 30% to 35% faster in analyzing and processing data than existing architectures and that it is also extendable.

An Insight Study on Keyword of IoT Utilizing Big Data Analysis (빅데이터 분석을 활용한 사물인터넷 키워드에 관한 조망)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.146-147 / 2017
  • Big data analysis is a technique for effectively analyzing unstructured data, such as Internet content, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in databases. Most big data analysis techniques, including data mining, machine learning, natural language processing, and pattern recognition, were already used in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011. Companies in most industries are therefore making efforts to create new value through the application of big data. In this study, we used Social Matrics, a big data analysis tool of Daum Communications, to analyze public perceptions of the keyword "Internet of things" over one month as of October 8, 2017. The results are as follows. First, the top related search keyword for "Internet of things" was found to be technology (995). This study suggests theoretical implications based on the results.

A Comparison of Starbucks between South Korea and U.S.A. through Big Data Analysis (빅데이터 분석을 통한 한국과 미국의 스타벅스 비교 분석)

  • Jo, Ara;Kim, Hak-Seon
    • Culinary science and hospitality research / v.23 no.8 / pp.195-205 / 2017
  • The purpose of this study was to compare Starbucks in South Korea with Starbucks in the U.S.A. through semantic network analysis of big data. Online data were collected with the SCTM (Smart Crawling & Text Mining) program, a data collecting and processing program developed by the big data research institute at Kyungsung University. The data collection period was from January 1st 2014 to December 7th 2017, and UCINET 6.0 with its packaged Netdraw was used for data analysis and visualization. After performing CONCOR (convergence of iterated correlation) analysis and centrality analysis, this study illustrated the current characteristics of Starbucks in Korea and the U.S.A. as reflected in the social network, and the differences between the two countries. Because Starbucks has grown greatly, especially in Korea, this study also aimed to provide significant, social-network-oriented suggestions for Starbucks USA, Starbucks Korea, and the coffee industry as a whole. This study also revealed that big data analytics can generate new insights into variables that have been extensively studied in the existing hospitality literature. In addition, implications for theory and practice as well as directions for future research are discussed.

iSSD-Based Collaborative Processing for Big Data Mining (효율적인 빅 데이터 마이닝을 위한 iSSD 기반 협업 처리 방안)

  • Jo, Yong-Yoen;Kim, Sang-Wook;Bae, Duck-Ho
    • The Journal of Korean Institute of Communications and Information Sciences / v.42 no.2 / pp.460-470 / 2017
  • We address how to handle big data mining effectively using the intelligent SSD (iSSD). The iSSD is a storage device with computing power inside the SSD, which reduces data transfer cost by processing data near where it is stored. We first introduce the structural characteristics of the iSSD for efficient data processing. Then, we present how to run data mining algorithms on the iSSD. Finally, we discuss how to significantly improve the performance of data mining algorithms by exploiting a heterogeneous computing environment in which host CPUs and GPUs coexist.

De-identification Techniques for Big Data and Issues (빅데이타 비식별화 기술과 이슈)

  • Woo, SungHee
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.750-753 / 2017
  • Recently, the processing and utilization of big data, generated by the spread of smartphones, SNS, and the Internet of Things, has been emerging as a new growth engine for the ICT field. However, to utilize such big data, personal information must be de-identified. De-identification removes identifying information from a data set so that individual data cannot be linked with specific individuals. It can reduce the privacy risk associated with collecting, processing, archiving, distributing, or publishing information, and thus attempts to balance the contradictory goals of using and sharing personal information while protecting privacy. De-identified information has in some cases been re-identified, which has made it controversial as a means of protecting personal information, but the number of cases in which personal information such as big data is de-identified and processed is increasing. In addition, many de-identification guidelines have been introduced and methods for de-identifying personal information have been proposed. Therefore, in this study, we describe the big data de-identification process and follow-up management, and then compare and analyze de-identification methods. Finally, we present personal information protection issues and solutions.
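As a rough illustration of what removing identifying information can look like in practice, the sketch below applies three commonly described operations: suppression, pseudonymization, and generalization. The field names, the salt, and the function `deidentify` are assumptions made for this example and are not methods taken from the paper. As the abstract notes, de-identified data has been re-identified, so a sketch like this does not by itself guarantee privacy.

```python
import hashlib

# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "phone"}


def deidentify(record, salt):
    """Return a copy of record with identifying fields reduced."""
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            # Suppression: drop the direct identifier entirely.
            continue
        if key == "user_id":
            # Pseudonymization: replace the ID with a truncated salted hash.
            digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
            out[key] = digest.hexdigest()[:12]
        elif key == "age":
            # Generalization: report a ten-year band instead of the exact age.
            low = (value // 10) * 10
            out[key] = f"{low}-{low + 9}"
        else:
            out[key] = value
    return out


# Made-up record for illustration only.
record = {"name": "Hong", "phone": "010-0000-0000",
          "user_id": 42, "age": 37, "city": "Seoul"}
print(deidentify(record, salt="example-salt"))
```

Fields left untouched, such as `city` here, can still act as quasi-identifiers in combination, which is one reason follow-up management matters.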

Yolo based Light Source Object Detection for Traffic Image Big Data Processing (교통 영상 빅데이터 처리를 위한 Yolo 기반 광원 객체 탐지)

  • Kang, Ji-Soo;Shim, Se-Eun;Jo, Sun-Moon;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.10 no.8 / pp.40-46 / 2020
  • As interest in traffic safety increases, research on autonomous driving, which reduces the incidence of traffic accidents, is increasing. Object recognition and detection are essential for autonomous driving. Therefore, research on object recognition and detection using traffic image big data is being actively conducted to determine road conditions. However, because most existing studies use only daytime data, it is difficult to recognize objects on night roads. In particular, for light source objects, daytime features are difficult to use as they are because of light smudging and whitening. Therefore, this study proposes Yolo-based light source object detection for traffic image big data processing. The proposed method performs image processing by applying color model transitions to night traffic images. The object group is determined by extracting the characteristics of the object through image processing. A deep learning model using the candidate group data can increase the recognition rate of light source object detection on night roads.

Design of Incremental FCM-based Recursive RBF Neural Networks Pattern Classifier for Big Data Processing (빅 데이터 처리를 위한 증분형 FCM 기반 순환 RBF Neural Networks 패턴 분류기 설계)

  • Lee, Seung-Cheol;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.6 / pp.1070-1079 / 2016
  • In this paper, the design of recursive radial basis function neural networks based on incremental fuzzy c-means is introduced for processing big data. Radial basis function neural networks consist of condition, conclusion, and inference phases. A Gaussian function is generally used as the activation function of the condition phase, but in this study, incremental fuzzy clustering is used for the activation function so that the network can process big data effectively. In the conclusion phase, the connection weights of the networks are given as a linear function and are then calculated by recursive least squares estimation. In the inference phase, the final output is obtained by a fuzzy inference method. Machine learning datasets are employed to demonstrate the superiority of the proposed classifier, and the results are described from the viewpoint of algorithm complexity and performance index.
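The recursive least squares estimation mentioned in this abstract can be sketched in its textbook form, where weights are updated one sample at a time without storing past data. This is not the authors' classifier: the function name `rls_update`, the forgetting factor fixed at 1, and the initial matrix `P = delta * I` are assumptions made for the example.

```python
def rls_update(w, P, x, y):
    """One recursive least squares step for a linear model y ~ w . x.

    w: current weights, P: current inverse-correlation matrix (symmetric),
    x: input vector, y: target value. Returns the updated (w, P).
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]
    error = y - sum(w[i] * x[i] for i in range(n))
    w_new = [w[i] + gain[i] * error for i in range(n)]
    # P is symmetric, so x^T P equals (P x)^T and Px can be reused here.
    P_new = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return w_new, P_new


# Toy data following y = 2 * x1 + 1; the second input is a constant bias term.
delta = 1e6  # large initial value so the starting weights carry little influence
w = [0.0, 0.0]
P = [[delta, 0.0], [0.0, delta]]
for x1 in range(5):
    w, P = rls_update(w, P, [float(x1), 1.0], 2.0 * x1 + 1.0)
print(w)
```

Because each step touches only one sample, the memory cost stays fixed as more data arrives, which is the property that makes the recursive form attractive for large datasets.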

An Efficient data management Scheme for Hierarchical Multi-processing using Double Hash Chain (이중 해쉬체인을 이용한 계층적 다중 처리를 위한 효율적인 데이터 관리 기법)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence / v.13 no.10 / pp.271-278 / 2015
  • Recently, because big data is collected via the Internet, it has become difficult to collect the desired data easily. In big data, the number of data types and the length of the collection period grow as the size of the data increases. In particular, because data differ according to intended use and type, processing accuracy and computational cost are important considerations. In this paper, we propose a data processing method using a double hash chain that minimizes computational cost while accurately extracting the many different kinds of data on the Internet that a user wants, through multi-layered processing. The proposed scheme classifies data hierarchically according to the intended use and the method of extracting the various kinds of data. The multi-processed data are tied together with the double hash chain to improve the accuracy of reading the data. In addition, the proposed method organizes data in the hash chain for easy access to the hierarchically classified data, reducing the cost of processing the data. In experiments, the proposed method improved data accuracy by an average of 7.8% over conventional techniques and reduced data processing costs by 4.9%.
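The abstract does not give the details of the double hash chain, so the sketch below shows only the general idea of a plain, single hash chain: each digest covers the previous digest plus the current item, so altering any item changes every later digest. The function names and the seed value are assumptions for this example, and the code should not be read as the scheme proposed in the paper.

```python
import hashlib


def build_chain(items, seed=b"genesis"):
    """Return hex digests where each one depends on all earlier items."""
    digests = []
    prev = hashlib.sha256(seed).digest()
    for item in items:
        # Link the current item to the running digest of everything before it.
        prev = hashlib.sha256(prev + item.encode("utf-8")).digest()
        digests.append(prev.hex())
    return digests


def verify_chain(items, digests, seed=b"genesis"):
    """Recompute the chain and compare it with the stored digests."""
    return build_chain(items, seed) == digests


# Made-up records for illustration only.
records = ["sensor:21.5", "sensor:21.7", "sensor:22.0"]
chain = build_chain(records)
print(verify_chain(records, chain))
```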

Proposed a consulting chatbot service for restaurant start-ups using social media big data

  • Jong-Hyun Park;Yang-Ja Bae;Jun-Ho Park;Ki-Hwan Ryu
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.1-7 / 2023
  • Since its first outbreak in 2019, COVID-19 has dealt a huge blow to the restaurant industry. However, after social distancing was lifted in April 2022, the restaurant industry gradually recovered, and interest in restaurant start-ups increased as a result. Therefore, in this paper, "restaurant start-up" was selected as the key keyword, social media big data was collected and analyzed using Textom, and word frequency and CONCOR analyses were conducted. Keywords were collected from May 1, 2022 to May 23, 2023, after the lifting of social distancing due to COVID-19, and based on the analysis, the development of a restaurant start-up consulting chatbot service is proposed.