• Title/Summary/Keyword: Big data Processing

A Study on the Improvement of Availability of Distributed Processing Systems Using Edge Computing (엣지컴퓨팅을 활용한 분산처리 시스템의 가용성 향상에 관한 연구)

  • Lee, Kun-Woo; Kim, Young-Gon
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.1 / pp.83-88 / 2022
  • Technologies related to the Internet of Things (IoT) continue to develop alongside recent advances in information and communication technology. An IoT system sends and receives data over a network from a variety of sensors. The data generated by IoT systems can be regarded as big data because it is produced in real time and its volume grows in proportion to the number of installed sensors. To date, IoT systems have performed data storage, processing, and computation with centralized processing methods. However, a centralized processing server can become overloaded by bottlenecks as the deployment grows and more sensors are added. This paper therefore proposes a distributed processing system that applies a data-importance-based algorithm, aimed at high system availability, to efficiently handle the real-time sensor data generated in IoT environments.

A Licence Plate Recognition System using Hadoop (하둡을 이용한 번호판 인식 시스템)

  • Park, Jin-Woo; Park, Ho-Hyun
    • Journal of IKEEE / v.21 no.2 / pp.142-145 / 2017
  • A current trend in image processing is toward higher quality and higher resolution, and advances in information and communication technology are causing the size and volume of image data to grow exponentially. License plate recognition on a single processor therefore cannot keep up with the growing data. This paper proposes a license plate recognition system built on Hadoop, a distributed processing framework. The images are stored in Hadoop's SequenceFile format, so each mapper performs license plate recognition on the many images contained in one data block. Experimental results show that recognition with 16 data nodes achieves a maximum speedup of 14.7 times over a single data node. On large datasets, recognition performance remains robust as the number of data nodes is gradually increased.
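
The reported figure can be put in perspective with two standard scaling metrics. The sketch below uses only the 16-node, 14.7x numbers from the abstract and computes parallel efficiency (speedup divided by node count) and the Karp-Flatt experimentally determined serial fraction; neither metric is claimed to appear in the paper, and the function names are illustrative.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / nodes


def karp_flatt(speedup: float, nodes: int) -> float:
    """Karp-Flatt metric: the experimentally determined serial fraction."""
    if nodes < 2:
        raise ValueError("need at least two nodes")
    return (1 / speedup - 1 / nodes) / (1 - 1 / nodes)


if __name__ == "__main__":
    s, p = 14.7, 16  # maximum speedup and node count reported in the abstract
    print(f"efficiency      = {parallel_efficiency(s, p):.3f}")  # 0.919
    print(f"serial fraction = {karp_flatt(s, p):.4f}")           # 0.0059
```

An efficiency above 0.9 with a serial fraction under 1% is what one would expect when each mapper works on independent images with little coordination, which is the situation SequenceFile packing is meant to create.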

A Basic Study on Trade-off Analysis of Downsampling for Indoor Point Cloud Data (실내 포인트 클라우드 데이터 Downsampling의 Trade-off 분석을 통한 기초 연구)

  • Kang, Nam-Woo; Oh, Sang-Min; Ryu, Min-Woo; Jung, Yong-Gil; Cho, Hun-hee
    • Proceedings of the Korean Institute of Building Construction Conference / 2020.06a / pp.40-41 / 2020
  • As 3D scanner capabilities have improved, reverse engineering with 3D scanners has gained emphasis in the construction industry as a way to obtain 3D geometric representations of buildings. However, the large size of the indoor point cloud data acquired by 3D scanners hinders efficient processing in reverse engineering. To address this inefficiency, several pre-processing methods have been developed that simplify and denoise raw point cloud data according to rough criteria, but these non-standard methods can cause inaccurate recognition and the removal of key points. This paper analyzes the correlation between wall recognition accuracy and data density and, on that basis, proposes an appropriate method for processing raw point cloud data. The results of this study could improve the efficiency of the data processing phase in reverse engineering of indoor point cloud data.
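
The abstract does not say which downsampling method the authors evaluated. As an assumption, the sketch below shows voxel-grid downsampling, one common way to reduce point density: every point that falls into the same cubic voxel is replaced by the voxel centroid, so the voxel size directly controls the density/accuracy trade-off the paper studies. The function name and the synthetic wall data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Iterable

Point = tuple[float, float, float]


def voxel_downsample(points: Iterable[Point], voxel: float) -> list[Point]:
    """Replace all points that fall into the same cubic voxel with their centroid."""
    if voxel <= 0:
        raise ValueError("voxel size must be positive")
    buckets = defaultdict(list)  # voxel index (i, j, k) -> points inside it
    for x, y, z in points:
        key = (math.floor(x / voxel), math.floor(y / voxel), math.floor(z / voxel))
        buckets[key].append((x, y, z))
    result: list[Point] = []
    for pts in buckets.values():
        n = len(pts)
        result.append(tuple(sum(p[i] for p in pts) / n for i in range(3)))
    return result


if __name__ == "__main__":
    # Hypothetical 1 m x 1 m wall sampled every 5 mm in the x-z plane.
    wall = [(i * 0.005, 0.0, j * 0.005) for i in range(200) for j in range(200)]
    for size in (0.01, 0.05, 0.1):
        print(f"voxel {size} m -> {len(voxel_downsample(wall, size))} points")
```

Sweeping the voxel size and re-running wall recognition on each output is one way to trace an accuracy-versus-density curve of the kind the paper analyzes.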

Design of a Sentiment Analysis System to Prevent School Violence and Student's Suicide (학교폭력과 자살사고를 예방하기 위한 감성분석 시스템의 설계)

  • Kim, YoungTaek
    • The Journal of Korean Association of Computer Education / v.17 no.6 / pp.115-122 / 2014
  • One problem facing today's youth is the increasing rate of violence and suicide in school life. This study designs a sentiment analysis system that uses big data processing to help prevent suicide. The main design goals are economical implementation and easy, fast processing for users, so the experiments use the open-source Hadoop system, running the MapReduce algorithm on HDFS (Hadoop Distributed File System). For the sentiment analysis, the study applies a word count method, a text mining approach, to informal SNS messages containing various violent words, thereby avoiding expensive and complex statistical analysis methods.
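
The word count method on MapReduce can be simulated in a few lines of plain Python. This is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop code; the lexicon `VIOLENT_WORDS` and the sample messages are hypothetical, since the paper's word list is not given.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical lexicon; the study's actual list of violent words is not published here.
VIOLENT_WORDS = {"hit", "bully", "hate"}


def mapper(message: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every lexicon word found in one message."""
    for token in message.lower().split():
        word = token.strip(".,!?\"'")
        if word in VIOLENT_WORDS:
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce phase: sum the partial counts for one word."""
    return word, sum(counts)


def word_count(messages: Iterable[str]) -> dict[str, int]:
    pairs = (pair for msg in messages for pair in mapper(msg))
    return dict(reducer(w, c) for w, c in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["I hate you", "they hit him, then hit again!", "nice day"]))
```

On Hadoop, `mapper` and `reducer` would run on separate nodes over HDFS blocks, while the grouping done by `shuffle` is handled by the framework.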

Determinants of Online Review Helpfulness for Korean Skincare Products in Online Retailing

  • OH, Yun-Kyung
    • Journal of Distribution Science / v.18 no.10 / pp.65-75 / 2020
  • Purpose: This study examines how the review content of experiential and utilitarian products (e.g., skincare products) affects review helpfulness by applying natural language processing techniques. Research design, data, and methodology: The study uses 69,633 online reviews of products that 13 Korean cosmetics firms registered on Amazon.com. The authors apply bigram analysis to identify key topics that emerge in consumers' descriptions of skincare product use, such as skin type and skin trouble. These review content variables are included in the review helpfulness model together with other important determinants. Results: The estimation results support a positive effect of review extremity and review content on helpfulness. In particular, information about the reviewer's skin type was perceived as highly useful when presented as the basis for a high rating. Moreover, content related to skin issues positively affects review helpfulness. Conclusions: The positive relationship between review extremity and review helpfulness challenges the findings of prior literature, implying that an in-depth study of how product type influences review helpfulness is needed. Furthermore, the positive effect of review content on helpfulness suggests that big data analytics can provide meaningful customer insights in the online retail industry.
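
Bigram analysis counts adjacent word pairs so that phrases such as "dry skin" surface as units rather than as two unrelated words. The minimal sketch below assumes simple lowercase tokenization with no stop-word removal or stemming, which the study's actual NLP pipeline may well include; `top_bigrams` and the sample reviews are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable


def bigrams(text: str) -> list[tuple[str, str]]:
    """Lowercase, tokenize on letters/apostrophes, and pair each token with the next."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(tokens, tokens[1:]))


def top_bigrams(reviews: Iterable[str], k: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Return the k most frequent bigrams across all reviews."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(bigrams(review))
    return counts.most_common(k)


if __name__ == "__main__":
    sample = ["Great for dry skin", "My dry skin loves it", "oily skin broke out"]
    print(top_bigrams(sample, 1))
```

Frequent bigrams such as skin-type phrases can then be turned into per-review indicator variables for a helpfulness regression.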

Job-aware Network Scheduling for Hadoop Cluster

  • Liu, Wen; Wang, Zhigang; Shen, Yanming
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.1 / pp.237-252 / 2017
  • In recent years, data centers have become the core infrastructure for big data processing, and for these big data applications network transmission has become one of the most important factors affecting performance. To improve network utilization and reduce job completion time, this paper proposes job-aware priority scheduling based on real-time monitoring at the application layer. The approach takes into account the correlation among flows belonging to the same job and assigns all of them the same priority, so that flows in the same job are expected to finish their transmissions at about the same time, avoiding lagging flows. To achieve load balancing, two approaches based on ECMP (Equal-Cost Multi-Path routing), Flow-based and Spray, are presented. The scheme was implemented in the NS-2 simulator, and the evaluation emulates a real network environment by introducing background traffic, scheduling delay, and link failures. The experimental results show that the approach improves the efficiency of the Hadoop shuffle stage and significantly reduces the network transmission time of the highest-priority job.
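
The two ideas in this abstract, one priority per job and ECMP load balancing at flow or packet granularity, can be sketched as follows. This is an illustration under stated assumptions, not the authors' NS-2 implementation: the ordering rule in `job_priorities` (earlier jobs first) is hypothetical, CRC32 stands in for a switch's ECMP hash, and "Spray" is modeled as round-robin per-packet spraying.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    job_id: str
    src: str
    dst: str
    sport: int
    dport: int


def job_priorities(job_order: list[str]) -> dict[str, int]:
    """Hypothetical policy: earlier jobs get higher priority (0 = highest)."""
    return {job: rank for rank, job in enumerate(job_order)}


def schedule(flows: list[Flow], priorities: dict[str, int]) -> list[Flow]:
    """Job-aware ordering: every flow inherits its job's priority, so a job's flows move together."""
    return sorted(flows, key=lambda f: priorities[f.job_id])


def flow_based_path(flow: Flow, n_paths: int) -> int:
    """Flow-based ECMP: hash the 5-tuple so all packets of one flow take the same path."""
    key = f"{flow.src}|{flow.dst}|{flow.sport}|{flow.dport}|tcp".encode()
    return zlib.crc32(key) % n_paths


def spray_paths(n_packets: int, n_paths: int, start: int = 0) -> list[int]:
    """Packet spraying (assumed round-robin): consecutive packets rotate over all equal-cost paths."""
    return [(start + i) % n_paths for i in range(n_packets)]


if __name__ == "__main__":
    a = Flow("job-1", "10.0.0.1", "10.0.0.2", 5000, 50010)
    b = Flow("job-2", "10.0.0.3", "10.0.0.4", 5001, 50010)
    prio = job_priorities(["job-2", "job-1"])
    print([f.job_id for f in schedule([a, b], prio)], flow_based_path(a, 4), spray_paths(6, 4))
```

The trade-off is that flow hashing keeps packets in order but can collide two large flows on one path, while spraying balances load evenly at the cost of possible reordering.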

Research on R&D Planning Through NLP Analysis of Patent Information: Focusing on Display Technology (특허정보의 NLP 분석을 통한 R&D 계획수립 방안 연구: 디스플레이 기술 분석을 중심으로)

  • Kim, Jung-Heui; Kim, Young-Min
    • Journal of the Korean Society of Industry Convergence / v.25 no.5 / pp.817-826 / 2022
  • Patent information records the history of technological progress in a field, so it is useful for identifying trends in technological development and change and for establishing R&D strategies. This study proposes a method that uses Natural Language Processing (NLP) to analyze patents, identifying the needs and problems of technology development at the planning stage of the R&D process and analyzing core technologies. As a big data source, the study collected patent documents registered in Google Patents for foldable technology, the latest technology in the display industry, and extracted keywords with an NLP analyzer. By classifying the extracted keywords into needs and problems of technology development, developed technologies, and materials, the study identified market and customer needs and analyzed the technologies under research and development. Unlike previous patent analysis studies, this methodology can quickly and conveniently analyze the latest technology trends from patent big data even without specialized knowledge or skills in text mining. This study contributes to the digitalization of the R&D process based on data analysis.
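
The classification step can be sketched as lexicon matching over extracted keywords. The abstract does not name the NLP analyzer or its category rules, so the tokenizer, stop-word list, and the seed terms in `CATEGORY_SEEDS` below are all hypothetical placeholders for foldable-display vocabulary.

```python
import re
from collections import Counter, defaultdict

# Hypothetical seed terms per category; the study's real categorization rules are not given.
CATEGORY_SEEDS = {
    "needs/problems": {"crease", "durability", "crack", "fatigue"},
    "technology": {"hinge", "touch", "encapsulation"},
    "materials": {"polyimide", "ultra-thin", "glass", "adhesive"},
}

STOPWORDS = frozenset({"the", "a", "of", "and", "to", "for", "in", "is"})


def extract_keywords(text: str) -> Counter:
    """Very small stand-in for an NLP analyzer: lowercase tokens minus stop words."""
    tokens = re.findall(r"[a-z][a-z-]*", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def classify(keywords: Counter) -> dict[str, Counter]:
    """Assign each keyword's frequency to every category whose seed set contains it."""
    buckets: dict[str, Counter] = defaultdict(Counter)
    for word, n in keywords.items():
        for category, seeds in CATEGORY_SEEDS.items():
            if word in seeds:
                buckets[category][word] += n
    return buckets


if __name__ == "__main__":
    doc = "A polyimide cover window reduces crease and crack at the hinge."
    for category, words in classify(extract_keywords(doc)).items():
        print(category, dict(words))
```

Aggregating these per-category counts over many patents and filing years is one way to see which problems and materials are rising, which is the kind of trend view the abstract describes.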

Big Data Analysis of Busan Civil Affairs Using the LDA Topic Modeling Technique (LDA 토픽모델링 기법을 활용한 부산시 민원 빅데이터 분석)

  • Park, Ju-Seop; Lee, Sae-Mi
    • Informatization Policy / v.27 no.2 / pp.66-83 / 2020
  • Local issues that arise in cities typically attract great public attention. Although local governments strive to resolve these issues, it is often difficult to eliminate them all, which leads to complaints. In tackling these issues, local governments need to use big data to identify the nature of complaints and provide solutions proactively. This study applies the LDA topic modeling technique to analyze trends and patterns in complaints filed online. To this end, 9,625 online complaints submitted to the city of Busan from 2015 to 2017 were analyzed, and 20 topics were identified. Key topics were then singled out, and an analysis of quarterly weighting trends highlighted four "hot" topics (Bus stops, Taxi drivers, Praises, and Administrative handling) and four "cold" topics (CCTV installation, Bus routes, Park facilities including parking, and Festivities issues). The study conducted a big data analysis to identify trends and patterns in civil complaints, and it contributes academically by encouraging follow-up research. Moreover, the text mining technique used for complaint analysis can be applied to other projects that require big data processing.
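
One common way to label topics "hot" or "cold" is to fit a linear trend to each topic's mean weight per period and rank topics by slope. The paper's exact criterion for its quarterly weighting trends is not stated, so treat the sketch below as an assumption; it takes per-quarter LDA topic weights that have already been computed and uses hypothetical topic names and values.

```python
from statistics import mean


def slope(values: list[float]) -> float:
    """Ordinary least-squares slope of values against period index 0, 1, 2, ..."""
    if len(values) < 2:
        raise ValueError("need at least two periods")
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


def hot_and_cold(topic_weights: dict[str, list[float]], k: int = 4) -> tuple[list[str], list[str]]:
    """Return up to k rising (hot) and k falling (cold) topics, strongest trend first."""
    slopes = {topic: slope(w) for topic, w in topic_weights.items()}
    ranked = sorted(slopes, key=slopes.get)  # most negative slope first
    cold = [t for t in ranked[:k] if slopes[t] < 0]
    hot = [t for t in reversed(ranked[-k:]) if slopes[t] > 0]
    return hot, cold


if __name__ == "__main__":
    quarterly = {  # hypothetical mean topic weights over four quarters
        "bus stops": [0.02, 0.04, 0.06, 0.08],
        "cctv installation": [0.08, 0.06, 0.04, 0.02],
        "other": [0.05, 0.05, 0.05, 0.05],
    }
    print(hot_and_cold(quarterly, k=1))
```

With 12 quarters (2015 to 2017) and 20 topics, the same function would return the four strongest rising and falling topics when called with `k=4`.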

Efficient Association Rule Mining based SON Algorithm for a Bigdata Platform (빅데이터 플랫폼을 위한 SON알고리즘 기반의 효과적인 연관 룰 마이닝)

  • Nguyen, Giang-Truong; Nguyen, Van-Quyet; Nguyen, Sinh-Ngoc; Kim, Kyungbaek
    • Journal of Digital Contents Society / v.18 no.8 / pp.1593-1601 / 2017
  • Association rule mining applications can bring benefits to a big data platform. For instance, on an agricultural big data platform, an association rule mining application could recommend specific products for farmers to grow, which could increase their income. The key step in association rule mining is frequent itemset mining, which finds sets of products that frequently occur together. Earlier approaches to this problem, such as Apriori, are not satisfactory because the huge number of possible itemsets can overload memory. To deal with this, the SON algorithm was proposed, which divides the dataset into many smaller chunks and processes them sequentially. On a single machine, however, the SON algorithm is very time-consuming. In this paper, we present a method for finding association rules on our Hadoop-based big data platform by parallelizing the SON algorithm. The entire association rule mining process, including pre-processing, SON-based frequent itemset mining, and association rule generation, is implemented on the Hadoop-based big data platform. Experiments with a real dataset confirm that the proposed method outperforms a brute-force method.
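
SON works in two passes: each chunk is mined in memory with the support threshold scaled to the chunk's size, the union of locally frequent itemsets becomes the candidate set, and a second pass counts the candidates over the whole dataset. Any globally frequent itemset must be locally frequent in at least one chunk, so no true result is lost. The single-process sketch below mirrors what the map and reduce stages of a Hadoop version would do; brute-force enumeration stands in for in-memory Apriori, and the absolute-support parameter `min_count` and the basket data are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(chunk, min_count, total, max_size):
    """Itemsets whose frequency in this chunk is at least min_count / total.
    Brute-force enumeration stands in for the in-memory Apriori step."""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    # Integer form of count / len(chunk) >= min_count / total (avoids float rounding).
    return {s for s, n in counts.items() if n * total >= min_count * len(chunk)}


def son(baskets, min_count, n_chunks, max_size=2):
    """Return {itemset: global count} for itemsets occurring in >= min_count baskets."""
    if not baskets:
        return {}
    size = ceil(len(baskets) / n_chunks)
    chunks = [baskets[i:i + size] for i in range(0, len(baskets), size)]
    # Pass 1 (map side): candidates are itemsets frequent in at least one chunk.
    candidates = set()
    for chunk in chunks:
        candidates |= local_frequent(chunk, min_count, len(baskets), max_size)
    # Pass 2 (reduce side): exact global counts for the candidates only.
    totals = Counter()
    for basket in baskets:
        items = set(basket)
        for itemset in candidates:
            if items.issuperset(itemset):
                totals[itemset] += 1
    return {s: n for s, n in totals.items() if n >= min_count}


if __name__ == "__main__":
    data = [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"],
            ["milk"], ["bread", "milk"]]
    print(son(data, min_count=3, n_chunks=2))
```

Association rules such as bread -> milk would then be derived from these counts by computing confidence, which is the final stage the abstract lists.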

CNN-LSTM Coupled Model for Prediction of Waterworks Operation Data

  • Cao, Kerang; Kim, Hangyung; Hwang, Chulhyun; Jung, Hoekyung
    • Journal of Information Processing Systems / v.14 no.6 / pp.1508-1520 / 2018
  • In this paper, we propose an improved model that provides users with better long-term prediction of waterworks operation data. Various types of prediction models have been studied, such as multiple linear regression models that account for time of day, day of the week, and seasonal characteristics, but their predictions of demand fluctuations and long-term behavior are insufficient. Among deep learning models in particular, the long short-term memory (LSTM) model has been applied to predict water purification plant data because its time series predictions are highly reliable. However, the correlations among various related factors must be reflected, and a supplementary model is needed to improve long-term predictability. In this paper, a convolutional neural network (CNN) model is introduced to select the input variables that have the necessary correlations and to improve the long-term prediction rate; coupling it with the LSTM predictions in a combined structure increases the prediction rate. In addition, a multiple linear regression model is applied to combine the CNN and LSTM predictions, and its output is taken as the final prediction.
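
The final combining step, a multiple linear regression over the CNN and LSTM outputs, amounts to fitting actual ≈ b0 + b1·cnn + b2·lstm by least squares. The stdlib-only sketch below assumes both models' predictions are already available as aligned lists; it solves the 3x3 normal equations directly, and the function names and data are illustrative rather than the authors' code.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_combiner(cnn_pred, lstm_pred, actual) -> list[float]:
    """Least-squares coefficients [b0, b1, b2] for actual ~ b0 + b1*cnn + b2*lstm."""
    rows = [(1.0, c, l) for c, l in zip(cnn_pred, lstm_pred)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, actual)) for i in range(3)]
    return solve(xtx, xty)


def combine(coef: list[float], cnn: float, lstm: float) -> float:
    """Final prediction from one pair of model outputs."""
    b0, b1, b2 = coef
    return b0 + b1 * cnn + b2 * lstm


if __name__ == "__main__":
    cnn = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical validation-set predictions
    lstm = [2.0, 1.0, 4.0, 3.0, 6.0]
    actual = [0.5 + 0.3 * c + 0.6 * l for c, l in zip(cnn, lstm)]
    print(fit_combiner(cnn, lstm, actual))
```

Fitting the weights on held-out predictions rather than on the training data keeps the combiner from simply favoring whichever model overfit more.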