• Title/Summary/Keyword: Big data Processing

Learning algorithms for big data logistic regression on RHIPE platform (RHIPE 플랫폼에서 빅데이터 로지스틱 회귀를 위한 학습 알고리즘)

  • Jung, Byung Ho;Lim, Dong Hoon
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.911-923 / 2016
  • Machine learning is becoming increasingly important in the big data era. Logistic regression is a classification method in machine learning and has been widely used in various fields, including medicine, economics, marketing, and the social sciences. RHIPE, which integrates the R and Hadoop environments, has not been discussed by many researchers owing to the difficulty of its installation and of MapReduce implementation. In this paper, we present MapReduce implementations of the gradient descent algorithm and the Newton-Raphson algorithm for logistic regression using RHIPE. The Newton-Raphson algorithm does not require a learning rate, whereas the gradient descent algorithm requires one to be chosen manually. We choose the learning rate by a mixed procedure of grid search and binary search so that big data can be processed efficiently. In the performance study, our Newton-Raphson algorithm outperforms the gradient descent algorithm on all the tested data.
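
The map/reduce structure described in this abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' RHIPE code: `functools.reduce` stands in for the Hadoop reduce phase, the model has a single feature plus an intercept, the data are made up, and gradient descent uses a fixed learning rate rather than the grid/binary search procedure the paper describes. All function names are hypothetical.

```python
import math
from functools import reduce

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def map_chunk(chunk, w):
    """Map step: partial gradient g and partial X^T W X matrix h for one chunk."""
    g = [0.0, 0.0]
    h = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in chunk:
        feats = (1.0, x)                      # intercept and one feature
        p = sigmoid(w[0] + w[1] * x)
        for i in range(2):
            g[i] += (y - p) * feats[i]        # gradient of the log-likelihood
            for j in range(2):
                h[i][j] += p * (1.0 - p) * feats[i] * feats[j]
    return g, h

def reduce_parts(a, b):
    """Reduce step: sum the partial results of two chunks."""
    (ga, ha), (gb, hb) = a, b
    g = [ga[i] + gb[i] for i in range(2)]
    h = [[ha[i][j] + hb[i][j] for j in range(2)] for i in range(2)]
    return g, h

def newton_step(w, g, h):
    """Newton-Raphson update: solve the 2x2 system h * delta = g (no learning rate)."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    d0 = (h[1][1] * g[0] - h[0][1] * g[1]) / det
    d1 = (h[0][0] * g[1] - h[1][0] * g[0]) / det
    return [w[0] + d0, w[1] + d1]

def gradient_step(w, g, n, lr):
    """Gradient step on the average log-likelihood with a hand-picked learning rate."""
    return [w[i] + lr * g[i] / n for i in range(2)]

def fit(chunks, method="newton", lr=0.5, iters=25):
    n = sum(len(c) for c in chunks)
    w = [0.0, 0.0]
    for _ in range(iters):
        g, h = reduce(reduce_parts, (map_chunk(c, w) for c in chunks))
        w = newton_step(w, g, h) if method == "newton" else gradient_step(w, g, n, lr)
    return w

# Made-up, non-separable data split into two "chunks" as (x, y) pairs.
chunks = [[(-2, 0), (-1, 0), (0, 1), (1, 0)],
          [(0, 0), (1, 1), (2, 1), (-1, 1)]]
w_newton = fit(chunks, "newton")
w_gd = fit(chunks, "gd", lr=0.5, iters=500)
```

On this toy data both methods reach the same estimate, but gradient descent needs many more passes over the chunks and a suitable `lr`, which is the trade-off the abstract points to.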

IP-Based Heterogeneous Network Interface Gateway for IoT Big Data Collection (IoT 빅데이터 수집을 위한 IP기반 이기종 네트워크 인터페이스 연동 게이트웨이)

  • Kang, Jiheon
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.2 / pp.173-178 / 2019
  • Recently, the types and amount of data generated, collected, and measured in IoT applications such as smart homes, security, and factories have been increasing. The technologies for IoT services include sensor devices to measure the desired data, embedded software, such as signal processing, to control the devices, wireless network protocols to transmit and receive the measured data, and big data and AI-based analysis. In this paper, we focus on developing a gateway for interfacing the heterogeneous sensor network protocols used in various IoT devices, and propose a heterogeneous network interface IoT gateway. We used OpenWrt-based wireless routers and a 6LoWPAN stack for IP-based communication via BLE and IEEE 802.15.4 adapters. We developed software that converts Z-Wave and LoRa packets into IP packets using our Python-based middleware. We expect the IoT gateway to be used as an effective device for collecting IoT big data.

A Study on the Domestic Fisheries Industry's Managerial Performance Analysis using Data Envelopment Analysis (자료표괄분석을 활용한 국내 수산산업의 경영성과 분석에 관한 연구)

  • Chun, Dongphil
    • The Journal of Fisheries Business Administration / v.48 no.1 / pp.1-16 / 2017
  • The fisheries industry has led the Korean economy and has achieved a high position in the world. However, the industry now faces aging, low growth, and low profit. To overcome this critical situation, it is necessary to understand the overall status of the industry. At the industry level, most previous research has focused on the ocean industry rather than on fisheries. In addition, scholars have paid considerable attention to fisheries cooperatives, fishing ports, fishing methods, and manufacturing processes in the fisheries sector. The aim of this research is to analyze the managerial performance of the domestic fisheries industry using data envelopment analysis (DEA), considering both operating and scale perspectives. Furthermore, a comparative analysis is performed by firm size and industry type. The results show that the managerial performance of the fisheries industry is not high overall. In more detail, most large firms are under decreasing returns to scale (DRS). The fishery processing industry performs poorly, and the fishery distribution industry performs best. This paper suggests transferring operating capability from large firms to small firms, and that policy support and firms' own activities should go together to create high added value in the fishery and fishery processing industries.

Cyclic Shift Based Tone Reservation PAPR Reduction Scheme with Embedding Side Information for FBMC-OQAM Systems

  • Shi, Yongpeng;Xia, Yujie;Gao, Ya;Cui, Jianhua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.8 / pp.2879-2899 / 2021
  • The tone reservation (TR) scheme is an attractive method for reducing the peak-to-average power ratio (PAPR) in filter bank multicarrier with offset quadrature amplitude modulation (FBMC-OQAM) systems. However, the high PAPR of the FBMC signal severely degrades system performance. To address this issue, a cyclic shift based TR (CS-TR) scheme with embedded side information (SI) is proposed to reduce the PAPR of FBMC signals. At the transmitter, four candidate signals are first generated based on cyclic shifts of the output of the inverse discrete Fourier transform (IDFT), and the SI of the selected signal, the one with minimum peak power among the four candidates, is embedded in sparse symbols with a quadrature phase-shift keying constellation. Then, TR weighted by an optimal scaling factor is employed to further reduce the PAPR of the selected signal. At the receiver, a reliable SI detector is presented that determines the phase rotation of the SI-embedding symbols, and the transmitted data blocks can be correctly demodulated according to the detected SI. Simulation results show that the proposed scheme significantly outperforms the existing TR schemes in both PAPR reduction and bit error rate (BER) performance. In addition, the proposed scheme with detected SI achieves the same BER performance as the one with perfect SI.
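
As a small illustration of the two quantities this abstract relies on, the sketch below computes the PAPR of a block of complex samples and picks the candidate with the smallest peak power. It does not model FBMC-OQAM filtering, cyclic shifting, or tone reservation; the function names and sample values are hypothetical, and the returned index merely plays the role of the side information.

```python
import math

def papr_db(samples):
    """Peak-to-average power ratio of a block of (complex) samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

def select_min_peak(candidates):
    """Return (index, signal) of the candidate with the smallest peak power."""
    idx = min(range(len(candidates)),
              key=lambda i: max(abs(s) ** 2 for s in candidates[i]))
    return idx, candidates[idx]

# Made-up candidate blocks; in the paper there are four, built from cyclic shifts.
candidates = [[2, 0], [1, 1], [1.5, 0.5]]
si_index, chosen = select_min_peak(candidates)
```

A constant-modulus block such as `[1, 1j, -1, -1j]` has a PAPR of 0 dB, while a block with all its energy in one sample has the highest PAPR for its length.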

The Impact of Product Review Usefulness on the Digital Market Consumers Distribution

  • Seung-Yong LEE;Seung-wha (Andy) CHUNG;Sun-Ju PARK
    • Journal of Distribution Science / v.22 no.3 / pp.113-124 / 2024
  • Purpose: This quantitative study analyzes the effect of the extremity and perceived usefulness of product reviews on sales performance, using text mining techniques on product review big data. We investigate whether the perceived helpfulness of product reviews serves as a mediating factor in the impact of product review extremity on sales performance. Research design, data and methodology: The analysis emphasizes customer interaction factors associated with both product review helpfulness and sales performance. From the 8.26 million Amazon product reviews in the book category collected by He & McAuley (2016), 300,000 product reviews were text-mined using natural language processing, and the hypothesis was verified through hierarchical regression analysis. Results: The extremity of product reviews had a negative impact on the evaluation of helpfulness. In addition, helpfulness played a mediating role between the extremity of product reviews and sales performance. Conclusion: Increased inclusion of extreme content in a product review's text correlates with a diminished evaluation of helpfulness. The evaluation of helpfulness exerts a negative mediating effect on sales performance. This study offers empirical insights for digital market distributors and sellers, contributing to the research field related to product reviews based on review ratings.

Combined time bound optimization of control, communication, and data processing for FSO-based 6G UAV aerial networks

  • Seo, Seungwoo;Ko, Da-Eun;Chung, Jong-Moon
    • ETRI Journal / v.42 no.5 / pp.700-711 / 2020
  • Because of the rapid increase of mobile traffic, flexible broadband supportive unmanned aerial vehicle (UAV)-based 6G mobile networks using free space optical (FSO) links have been recently proposed. Considering the advancements made in UAVs, big data processing, and artificial intelligence precision control technologies, the formation of an additional wireless network based on UAV aerial platforms to assist the existing fixed base stations of the mobile radio access network is considered a highly viable option in the near future. In this paper, a combined time bound optimization scheme is proposed that can adaptively satisfy the control and communication time constraints as well as the processing time constraints in FSO-based 6G UAV aerial networks. The proposed scheme controls the relation between the number of data flows, input data rate, number of worker nodes considering the time bounds, and the errors that occur during communication and data processing. The simulation results show that the proposed scheme is very effective in satisfying the time constraints for UAV control and radio access network services, even when errors in communication and data processing may occur.

Design and Implementation of a Flood Disaster Safety System Using Realtime Weather Big Data (실시간 기상 빅데이터를 활용한 홍수 재난안전 시스템 설계 및 구현)

  • Kim, Yeonwoo;Kim, Byounghoon;Ko, Geonsik;Choi, Minwoong;Song, Heesub;Kim, Gihoon;Yoo, Seunghun;Lim, Jongtae;Bok, Kyungsoo;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.17 no.1 / pp.351-362 / 2017
  • Recently, analysis techniques that extract new meaning through big data analysis, and various services that use them, have been developed. Among such services, disaster safety services have attracted attention as the most important. In this paper, we design and implement a flood disaster safety system using real-time weather big data. The proposed system retrieves and processes vast amounts of information collected in real time. In addition, it analyzes risk factors by aggregating the collected real-time and past data and then provides users with prediction information. The proposed system also provides users with risk prediction information by processing real-time data such as user messages and news, and by analyzing disaster risk factors such as typhoons and floods. As a result, users can prepare for potential disaster safety risks through the proposed system.

Proposal For Improving Data Processing Performance Using Python (파이썬 활용한 데이터 처리 성능 향상방법 제안)

  • Kim, Hyo-Kwan;Hwang, Won-Yong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.13 no.4 / pp.306-311 / 2020
  • This paper deals with how to improve the performance of the Python language, with its various libraries, when developing a model using big data. Python uses the Pandas library to process spreadsheet-format data such as Excel files. When processing data, Python operates in memory. There is no performance issue when processing small-scale data, but performance issues occur when processing large-scale data. Therefore, this paper introduces a method for distributing execution tasks across a single cluster and multiple clusters by using the Dask library, which can be used together with Pandas. The experiment compares the speed of processing a simple exponential model using only Pandas with the speed of processing it using Pandas together with Dask, on hardware of the same specification. This paper presents a method for developing a model by distributing large-scale data across CPU cores to improve performance, while retaining Python's advantage of easy access to various libraries.
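
The partition-then-combine idea behind this abstract can be sketched with the standard library alone. This is a stand-in and not Dask's API: the data are split into chunks, a simple exponential function is applied to each chunk by a worker, and the partial results are concatenated in order. The function names, the `rate` parameter, and the input values are hypothetical. Note that threads in CPython do not speed up pure-Python arithmetic because of the global interpreter lock, so the sketch shows the pattern only; spreading the chunks over CPU cores or cluster nodes is the part a library such as Dask handles.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def exp_model(chunk, rate=0.1):
    """Toy 'exponential model': y = exp(rate * x) for every value in the chunk."""
    return [math.exp(rate * x) for x in chunk]

def split(values, n_parts):
    """Split a list into at most n_parts contiguous chunks of similar size."""
    size = max(1, math.ceil(len(values) / n_parts))
    return [values[i:i + size] for i in range(0, len(values), size)]

def run_partitioned(values, n_parts=4):
    """Apply exp_model chunk by chunk and concatenate the results in order."""
    parts = split(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(exp_model, parts))  # map preserves chunk order
    return [y for part in results for y in part]

data = list(range(10))
serial = exp_model(data)                # everything in one in-memory pass
partitioned = run_partitioned(data, 3)  # same computation, chunk by chunk
```

Because each chunk is processed independently, the partitioned result equals the single-pass result; only the scheduling of the work differs.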

Concurrency processing comparison of large data list using GO language (GO언어를 이용한 대용량 데이터 리스트의 동시성 처리 비교)

  • Lee, Yoseb;Lim, Young-Han
    • The Journal of the Convergence on Culture Technology / v.8 no.2 / pp.361-366 / 2022
  • There are several ways to process large amounts of data, and the processing speed for creating a large data list differs greatly depending on the method. Typically, to make a large data list, the large data is converted into a normalized query, and the result of the query is stored in a List Map and converted into a printable form. Each step of this process lowers the processing speed. When the results of the query are stored in a List Map, the processing speed differs because the data is stored in a different format for each type of data. We aim to solve this difference in processing speed through the concurrency features of the Go language. In other words, this paper compares the existing method of storing data in a List Map with a method that uses Go concurrency to process large data lists faster, and shows how the two differ and how each proceeds.

A Study of Measuring Traffic Congestion for Urban Network using Average Link Travel Time based on DTG Big Data (DTG 빅데이터 기반의 링크 평균통행시간을 이용한 도심네트워크 혼잡분석 방안 연구)

  • Han, Yohee;Kim, Youngchan
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.5 / pp.72-84 / 2017
  • With the Big Data of the 4th Industrial Revolution, traffic information systems have changed from point detection systems to section detection systems. Using DTG (Digital Tachograph) data based on the Global Navigation Satellite System, we examined the properties of the raw data and of the data produced at each processing step. We identified the vehicle trajectory, the link travel time of individual vehicles, and the link average travel time, which are generated at successive processing steps. In this paper, we propose an application method for traffic management based on the characteristics of the processed data. We selected historical data in consideration of the data management status of the center and the data's current availability. We propose a method to generate the Travel Time Index from the historical link average travel time, which can be collected at all times over a wide range. We also propose a method to monitor traffic congestion using the Travel Time Index, and analyze the case of intersections where the traffic operation method changed. Finally, the current circumstances that make it difficult to fully utilize DTG data are presented as limitations.
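
The processing chain named in this abstract (individual link travel times, then link average travel time, then Travel Time Index) can be illustrated as follows. The sketch assumes the common definition of the index as the ratio of an observed travel time to a reference travel time, and it assumes the reference is the smallest historical link average; the abstract does not state how the authors choose the reference. The function names, time slots, and travel times are made up.

```python
from statistics import mean

def link_average_travel_time(vehicle_times_s):
    """Average the link travel times (seconds) of individual vehicles."""
    return mean(vehicle_times_s)

def travel_time_index(observed_avg_s, reference_s):
    """Ratio of observed to reference travel time; 1.0 means no extra delay."""
    return observed_avg_s / reference_s

# Made-up historical travel times (seconds) on one link, grouped by time slot.
historical = {
    "02:00": [58, 62, 60],
    "08:00": [95, 85, 90],
}
averages = {slot: link_average_travel_time(t) for slot, t in historical.items()}
reference = min(averages.values())   # assumed stand-in for free-flow conditions
tti = {slot: travel_time_index(avg, reference) for slot, avg in averages.items()}
```

With these numbers the 08:00 slot has an index of 1.5, meaning the link takes 50% longer to traverse than in the reference slot.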