• Title/Summary/Keyword: Big data Processing

An Intelligent Machine Learning Inspired Optimization Algorithm to Enhance Secured Data Transmission in IoT Cloud Ecosystem

  • Ankam, Sreejyothsna;Reddy, N.Sudhakar
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.6
    • /
    • pp.83-90
    • /
    • 2022
  • As the number of IoT sensors and physical devices connected to the Internet grows by the day, traditional cloud computing cannot host IoT data securely because of its high latency. Because processing all IoT big data on cloud facilities is difficult, little research has been done on automating the security of every component in the IoT-cloud ecosystem that deals with big data and real-time jobs. It is difficult, for example, to build automatic, secure data transfer from the IoT layer, which incorporates a large number of scattered devices, to the cloud layer. To address this issue, this article presents an intelligent algorithm that enhances security in the IoT-cloud ecosystem using the butterfly optimization algorithm.
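
A minimal sketch of the butterfly optimization algorithm (BOA) named in the abstract above, in its commonly published form: each butterfly emits a fragrance f = c·I^a and alternates between a global move toward the best solution and a local move between two random peers. The objective function, parameter values, and any mapping to the paper's security model are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def butterfly_optimization(objective, dim, bounds, n_butterflies=30,
                           n_iter=200, c=0.01, a=0.1, p=0.8, seed=0):
    """Generic butterfly optimization algorithm (BOA) sketch.

    objective : maps a position vector to a cost (lower is better).
    bounds    : (low, high) applied to every dimension.
    c, a      : sensory modality and power exponent of the fragrance f = c * I**a.
    p         : switch probability between global and local search.
    """
    rng = np.random.default_rng(seed)
    low, high = bounds
    pos = rng.uniform(low, high, size=(n_butterflies, dim))
    fitness = np.apply_along_axis(objective, 1, pos)
    best = pos[np.argmin(fitness)].copy()

    for _ in range(n_iter):
        # Stimulus intensity is tied to fitness; fragrance grows with intensity.
        intensity = 1.0 / (1.0 + fitness)          # lower cost -> higher intensity
        fragrance = c * intensity ** a
        for i in range(n_butterflies):
            r = rng.random()
            if rng.random() < p:
                # Global search: move toward the best butterfly found so far.
                step = (r ** 2) * best - pos[i]
            else:
                # Local search: move relative to two random peers.
                j, k = rng.integers(0, n_butterflies, size=2)
                step = (r ** 2) * pos[j] - pos[k]
            pos[i] = np.clip(pos[i] + step * fragrance[i], low, high)
        fitness = np.apply_along_axis(objective, 1, pos)
        if fitness.min() < objective(best):
            best = pos[np.argmin(fitness)].copy()
    return best

# Toy usage: minimize a sphere function standing in for some optimization cost.
print(butterfly_optimization(lambda x: float(np.sum(x ** 2)), dim=5, bounds=(-10, 10)))
```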

Distributed Processing Environment for Outlier Removal to Analyze Big Data (대용량 데이터 분석을 위한 이상치 제거용 분산처리 환경)

  • Hong, Yejin;Na, Eunhee;Jung, Yonghwan;Kim, Yangwoo
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2016.07a
    • /
    • pp.73-74
    • /
    • 2016
  • IoT data gains value only when it is processed and analyzed as unstructured data, which is why big data technology is attracting worldwide attention. Sensor data, which accounts for a large share of IoT data, is easy to collect and widely applicable, so it is used in many fields. However, when a sensor does not operate normally, the collected data can include outliers that differ from the true values, producing distorted results that cannot be used. In this paper, we therefore detect and remove outliers from the collected raw data before analysis in order to obtain accurate results. To process the ever-growing volume of data quickly, we also propose performing outlier detection and removal in a distributed processing environment using Spark, an in-memory framework. The MapReduce-based outlier detection and removal is implemented in four stages, and the proposed technique is evaluated by comparing three environments. The experiments show that as the data volume grows, processing with Spark in the distributed environment is the fastest approach.
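
A minimal sketch of the general idea above, removing sensor outliers in a Spark distributed environment before analysis. It applies a simple IQR rule to a single numeric column; the input path, column name, and rule are hypothetical, and the paper's four-stage MapReduce design is not reproduced.

```python
from pyspark.sql import SparkSession

# Minimal sketch: filter sensor outliers with an IQR rule before analysis.
spark = SparkSession.builder.appName("outlier-removal-sketch").getOrCreate()

# Hypothetical input: CSV files of raw sensor readings with a numeric 'value' column.
raw = spark.read.csv("hdfs:///sensors/raw/*.csv", header=True, inferSchema=True)

# Approximate quartiles computed on the cluster (relative error 0.01 keeps this cheap).
q1, q3 = raw.approxQuantile("value", [0.25, 0.75], 0.01)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only readings inside the IQR fence; the cleaned data feeds later analysis.
cleaned = raw.filter((raw["value"] >= low) & (raw["value"] <= high))
cleaned.write.mode("overwrite").parquet("hdfs:///sensors/cleaned")
```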

Design and Implementation of Potential Advertisement Keyword Extraction System Using SNS (SNS를 이용한 잠재적 광고 키워드 추출 시스템 설계 및 구현)

  • Seo, Hyun-Gon;Park, Hee-Wan
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.7
    • /
    • pp.17-24
    • /
    • 2018
  • One of the major issues in big data processing is extracting keywords from the Internet and using them to obtain the necessary information. Most proposed keyword extraction algorithms rely on the search function of a large portal site, and they extract keywords from already posted documents or fixed content. In this paper, we propose KAES (Keyword Advertisement Extraction System), which supports potential shopping-keyword marketing by extracting issue keywords and related keywords from dynamic, instant messages such as the issues, interests, and comments posted on SNS. The KAES system maintains a list of specific accounts and extracts the keywords and related keywords that appear most frequently in their SNS posts.
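
A rough illustration of the frequency-based extraction described above: count the most frequent keywords in a set of SNS messages and, for each, the words that most often co-occur with it. The messages, tokenizer, and stop-word list are hypothetical stand-ins, not the KAES implementation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical SNS messages collected from a list of monitored accounts.
messages = [
    "new sneakers drop today, limited edition sneakers",
    "limited edition drop sold out in minutes",
    "best running sneakers for summer",
]
stopwords = {"in", "for", "the", "a", "of", "to", "today"}

def tokenize(text):
    """Lowercase and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]

freq = Counter()
cooc = defaultdict(Counter)
for msg in messages:
    tokens = tokenize(msg)
    freq.update(tokens)
    for w in set(tokens):                       # co-occurrence within one message
        cooc[w].update(t for t in set(tokens) if t != w)

# Issue keywords: most frequent tokens; related keywords: top co-occurring tokens.
for keyword, count in freq.most_common(3):
    related = [w for w, _ in cooc[keyword].most_common(3)]
    print(keyword, count, related)
```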

Unstructured Data Analysis using Equipment Check Ledger: A Case Study in Telecom Domain (장비점검 일지의 비정형 데이터분석을 통한 고장 대응 효율화 사례 연구)

  • Ju, Yeonjin;Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services
    • /
    • v.21 no.1
    • /
    • pp.127-135
    • /
    • 2020
  • As the importance of using and analyzing big data grows, so does interest in natural language processing techniques for unstructured data such as news articles and comments. In particular, now that big data can be collected, data mining techniques capable of preprocessing and analyzing it are emerging. In this case study with a telecom company, we propose a methodology for structuring unstructured data using text mining. The domain is equipment failure, and the data set consists of about 2.2 million equipment check ledger records, with roughly 800,000 failure records accumulating in the ledger each year. The ledger contains both structured and unstructured data. While structured data can be used for analysis directly, unstructured data cannot, even though it is likely to hold important information that was never recorded in the structured fields. Therefore, in this study we develop a digital transformation method for the unstructured data in the equipment check ledger.
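
One minimal way such structuring is often done, sketched below under the assumption of pattern-based extraction: dictionary terms for equipment, symptom, and action are pulled from each free-text note into a tabular record. The dictionaries, field names, and example notes are hypothetical and not the case study's methodology.

```python
import re
import pandas as pd

# Hypothetical free-text check-ledger notes (in practice these would be Korean).
notes = [
    "Rectifier module overheating, replaced cooling fan",
    "Battery voltage low at site 12, battery replaced",
]

# Tiny illustrative term dictionaries; a real system would learn or curate these.
equipment_terms = ["rectifier", "battery", "cooling fan"]
symptom_terms = ["overheating", "voltage low", "alarm"]
action_terms = ["replaced", "rebooted", "cleaned"]

def extract(note, terms):
    """Return all dictionary terms that appear in the note (case-insensitive)."""
    return [t for t in terms if re.search(re.escape(t), note, re.IGNORECASE)]

records = [
    {
        "raw_text": n,
        "equipment": extract(n, equipment_terms),
        "symptom": extract(n, symptom_terms),
        "action": extract(n, action_terms),
    }
    for n in notes
]

# The structured table can now be analyzed alongside the ledger's formal fields.
print(pd.DataFrame(records))
```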

Proposition of balanced comparative confidence considering all available diagnostic tools (모든 가능한 진단도구를 활용한 균형비교신뢰도의 제안)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.3
    • /
    • pp.611-618
    • /
    • 2015
  • According to Wikipedia, big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Data mining is the computational process of discovering patterns in huge data sets using methods at the intersection of association rules, decision trees, clustering, artificial intelligence, and machine learning. Association rule mining is a well-researched method for discovering interesting relationships between itemsets in huge databases and has been applied in various fields. Association rules can be positive, negative, or inverse according to the direction of the association, so when setting evaluation criteria for association rules it is desirable to consider all three types at the same time. To this end, we propose a balanced comparative confidence that considers sensitivity, specificity, false positives, and false negatives, check the conditions of Piatetsky-Shapiro's association thresholds, and compare it with comparative confidence and inversely comparative confidence through a few experiments.
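
The measures mentioned above are all derived from the 2x2 contingency table of a rule A -> B. The sketch below computes those underlying quantities under the usual diagnostic mapping (A as the test, B as the condition); the exact definition of the proposed balanced comparative confidence is given in the paper and is not reproduced here.

```python
# Counts for a rule A -> B over n transactions (illustrative numbers, not from the paper):
#   f11 = A and B,  f10 = A and not B,  f01 = not A and B,  f00 = neither.
f11, f10, f01, f00 = 400, 100, 150, 350
n = f11 + f10 + f01 + f00

confidence = f11 / (f11 + f10)            # P(B | A): confidence of the positive rule A -> B
inverse_confidence = f00 / (f01 + f00)    # P(not B | not A): confidence of not-A -> not-B

sensitivity = f11 / (f11 + f01)           # P(A | B), treating A as the "test"
specificity = f00 / (f10 + f00)           # P(not A | not B)
false_positive_rate = f10 / (f10 + f00)   # 1 - specificity
false_negative_rate = f01 / (f11 + f01)   # 1 - sensitivity

# The paper's balanced comparative confidence combines quantities like these so that
# positive, negative, and inverse rules are evaluated together.
print(confidence, inverse_confidence, sensitivity, specificity,
      false_positive_rate, false_negative_rate)
```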

Implementation of Disease Search System Based on Public Data using Open Source (오픈 소스를 활용한 공공 데이터 기반의 질병 검색 시스템 구현)

  • Park, Sun-ho;Kim, Young-kil
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.11
    • /
    • pp.1337-1342
    • /
    • 2019
  • With the rapid spread of ICT convergence and the emergence of big data and the Internet of Things, medical institutions face the challenges of staying competitive and of managing data that is growing at an enormous rate. The big data paradigm in medicine is not just about large data sets or the tools and processes for handling and analyzing them; it also implies a computerized shift in the way people live, think, and study. As medical data has recently been opened to the public, demand for its use is increasing. We therefore implemented a disease search system based on public data, built with open source software, to support rational and efficient decision making. In the experiments, unlike the simple disease lookup or single-disease symptom lookup provided by public institutions, the system retrieves related diseases by symptom or cause.
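
A toy illustration of searching related diseases by symptom rather than by exact disease name: an inverted index from symptom terms to diseases, built over a small hypothetical data set. The actual system in the paper is built on open source components and real public data, which are not reproduced here.

```python
from collections import defaultdict

# Hypothetical records in the shape of a public disease/symptom data set.
diseases = [
    {"name": "influenza", "symptoms": ["fever", "cough", "muscle pain"]},
    {"name": "common cold", "symptoms": ["cough", "runny nose", "sore throat"]},
    {"name": "migraine", "symptoms": ["headache", "nausea"]},
]

# Build an inverted index: symptom -> diseases that list it.
index = defaultdict(set)
for d in diseases:
    for s in d["symptoms"]:
        index[s].add(d["name"])

def search_by_symptoms(symptoms):
    """Rank diseases by how many of the queried symptoms they match."""
    scores = defaultdict(int)
    for s in symptoms:
        for name in index.get(s, set()):
            scores[name] += 1
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# A query by symptoms returns related diseases instead of a single exact match.
print(search_by_symptoms(["cough", "fever"]))
```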

An Empirical Evaluation Analysis of the Performance of In-memory Bigdata Processing Platform (메모리 기반 빅데이터 처리 프레임워크의 성능개선 연구)

  • Lee, Jae hwan;Choi, Jun;Koo, Dong hun
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.21 no.3
    • /
    • pp.13-19
    • /
    • 2016
  • Spark, an in-memory big data processing framework, is popular for real-time processing workloads. Spark stores intermediate data in cluster memory so that it can minimize I/O access. However, when the resident memory of a workload is larger than the physical memory of the cluster, overall performance can drop dramatically. In this paper, we experimentally analyze the bottleneck factors of a memory-intensive PageRank application, deploy Spark with the Tachyon file system to manage memory and remove the bottleneck, and improve performance by about 18%.
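
A minimal PySpark sketch of the PageRank iteration that the paper profiles. The persist() call marks the link data whose memory footprint drives the bottleneck; in the paper's setting that data is kept in Tachyon rather than being spilled. The input path, storage level, and iteration count are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("pagerank-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical edge list: each line is "src dst".
edges = sc.textFile("hdfs:///graph/edges.txt").map(lambda line: tuple(line.split()))
links = edges.distinct().groupByKey()

# Persisting the link structure is where memory pressure hits a large graph;
# the paper keeps this data in a memory tier (Tachyon) instead of letting it spill.
links = links.persist(StorageLevel.MEMORY_AND_DISK)

ranks = links.mapValues(lambda _: 1.0)

for _ in range(10):                       # fixed number of iterations for the sketch
    contribs = links.join(ranks).flatMap(
        lambda kv: [(dst, kv[1][1] / len(kv[1][0])) for dst in kv[1][0]]
    )
    ranks = contribs.reduceByKey(lambda a, b: a + b).mapValues(
        lambda rank: 0.15 + 0.85 * rank
    )

print(ranks.top(5, key=lambda kv: kv[1]))
```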

Processing and Quality Control of Big Data from Korean SPAR (Soil-Plant-Atmosphere-Research) System (한국형 SPAR(Soil-Plant-Atmosphere-Research) 시스템에서 대용량 관측 자료의 처리 및 품질관리)

  • Sang, Wan-Gyu;Kim, Jun-Hwan;Shin, Pyong;Baek, Jae-Kyeong;Seo, Myung-Chul
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.22 no.4
    • /
    • pp.340-345
    • /
    • 2020
  • In this study, we developed, for the first time, a quality control and assurance method for measurement data from the SPAR (Soil-Plant-Atmosphere-Research) system, a climate change research facility. Among the many observations, precise processing of the CO2 flux data was found to be significantly important for increasing the accuracy of canopy photosynthesis measurements in the SPAR system. Errors and missing values should first be removed from the collected raw CO2 flux data and then replaced with values estimated from a photosynthetic light response curve model. Comparing the correlation between cumulative net assimilation and soybean biomass, quality control and assurance of the raw CO2 flux data improved the evaluation of canopy photosynthesis by increasing the coefficient of determination (R2) and lowering the root mean square error (RMSE). These data processing methods are expected to be useful for developing crop growth models based on the SPAR system.
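
The gap-filling step described above can be sketched with a light response curve fitted to the valid CO2 flux records. The sketch below assumes the common rectangular-hyperbola form P(I) = alpha·I·Pmax / (alpha·I + Pmax) - Rd; whether the paper uses this exact form, and the example data and starting parameters, are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.optimize import curve_fit

def light_response(par, alpha, p_max, r_d):
    """Rectangular-hyperbola light response: assimilation as a function of PAR."""
    return (alpha * par * p_max) / (alpha * par + p_max) - r_d

# Hypothetical quality-controlled records: PAR and CO2 flux, with removed values as NaN.
df = pd.DataFrame({
    "par":  [0, 100, 300, 600, 900, 1200, 1500, 1800],
    "flux": [-1.8, 3.0, 9.5, 15.0, np.nan, 20.5, np.nan, 22.0],
})

valid = df.dropna()
params, _ = curve_fit(light_response, valid["par"], valid["flux"],
                      p0=[0.05, 25.0, 2.0], maxfev=5000)

# Replace removed/missing flux values with model estimates at the observed PAR.
gaps = df["flux"].isna()
df.loc[gaps, "flux"] = light_response(df.loc[gaps, "par"], *params)
print(df)
```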

Speed-up of the Matrix Computation on the Ridge Regression

  • Lee, Woochan;Kim, Moonseong;Park, Jaeyoung
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.10
    • /
    • pp.3482-3497
    • /
    • 2021
  • Artificial intelligence has emerged as the core of the fourth industrial revolution, and large-scale data processing, such as big data technology and rapid data analysis, is inevitable. The most fundamental and universal data interpretation technique is analysis through regression, which is also a basis of machine learning. Ridge regression is a regression technique that decreases sensitivity to unique or outlier information. The time-consuming part of the matrix computation, however, is the inverse matrix it normally requires, and as the matrix grows, solving the system becomes a major challenge. In this paper, a new algorithm is introduced that speeds up the calculation of the ridge regression estimator through series expansion and computation recycling, without computing an inverse matrix or using other factorization methods. In addition, the performance of the proposed algorithm and the existing algorithm were compared across matrix sizes. Overall, the proposed algorithm demonstrated excellent speed-up with good accuracy.
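
The idea of avoiding an explicit inverse can be illustrated with a truncated Neumann series applied to the ridge solution beta = (X^T X + lambda I)^{-1} X^T y. This is a generic sketch under the convergence condition noted in the comments, not the authors' specific series-expansion and computation-recycling scheme.

```python
import numpy as np

def ridge_neumann(X, y, lam, n_terms=50):
    """Approximate the ridge estimator without forming an explicit inverse.

    Writes A = X.T @ X + lam * I, scales it so that M = I - c*A has spectral
    radius < 1, and uses the truncated Neumann series
        A^{-1} ~= c * (I + M + M^2 + ...),
    applied directly to b = X.T @ y so only matrix-vector products are needed
    after A is formed.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    b = X.T @ y
    c = 1.0 / np.linalg.norm(A, 2)        # puts the eigenvalues of c*A in (0, 1]
    M = np.eye(n_features) - c * A
    term = c * b                          # k = 0 series term applied to b
    beta = term.copy()
    for _ in range(n_terms):
        term = M @ term                   # recycle the previous term: M^k (c*b)
        beta += term
    return beta

# Compare against the direct solve on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.normal(size=200)
lam = 1.0
exact = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
print(np.max(np.abs(ridge_neumann(X, y, lam) - exact)))
```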

A Study on Prediction of Linear Relations Between Variables According to Working Characteristics Using Correlation Analysis

  • Kim, Seung Jae
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.14 no.4
    • /
    • pp.228-239
    • /
    • 2022
  • Many countries around the world are using ICT to keep pace with the fourth industrial revolution, and various algorithms and systems have been developed accordingly. Among them, many industries and researchers are investing in AI-based unmanned automation systems. As new technologies and algorithms are developed, decision making based on the big data analysis applied to AI systems must become more sophisticated. We apply Pearson's correlation analysis to six independent variables to examine the job satisfaction that office workers feel according to their job characteristics. First, a correlation coefficient is obtained to measure the degree of correlation for each variable. Second, the presence or absence of correlation for each pair is verified through hypothesis testing. Third, after visualizing the magnitudes of the correlation coefficients, the degree of correlation between the data is examined. Fourth, the degree of correlation between the variables is verified based on the correlation coefficients obtained through the experiment and the results of the hypothesis tests.
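
The steps above map onto standard tools: a correlation matrix for the coefficients, a p-value per pair for the hypothesis test, and the printed matrix standing in for the visual check. The survey columns below are hypothetical; the six variables used in the study are not reproduced.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical survey data: job-characteristic variables and job satisfaction.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "autonomy": rng.normal(3.5, 0.8, 100),
    "feedback": rng.normal(3.2, 0.7, 100),
    "workload": rng.normal(3.8, 0.9, 100),
})
df["satisfaction"] = 0.6 * df["autonomy"] + 0.3 * df["feedback"] + rng.normal(0, 0.5, 100)

# Step 1: Pearson correlation coefficients between every pair of variables.
print(df.corr(method="pearson").round(2))

# Step 2: hypothesis test (H0: rho = 0) for each predictor against satisfaction.
for col in ["autonomy", "feedback", "workload"]:
    r, p = stats.pearsonr(df[col], df["satisfaction"])
    print(f"{col}: r={r:.2f}, p={p:.4f}, {'significant' if p < 0.05 else 'not significant'}")
```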