• Title/Summary/Keyword: Big Data Processing


A Survey on the Performance Comparison of Map Reduce Technologies and the Architectural Improvement of Spark

  • Raghavendra, GS;Manasa, Bezwada;Vasavi, M.
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.5
    • /
    • pp.121-126
    • /
    • 2022
  • Hadoop and Apache Spark are Apache Software Foundation open-source projects, and both are premier large-scale data analytics tools. Hadoop led the big data industry for five years. Spark's processing velocity can be significantly higher, up to 100 times faster; however, the amount of data handled differs: Hadoop MapReduce can process data sets that are far bigger than Spark can. This article compares the performance of Spark and MapReduce and discusses the advantages and disadvantages of both technologies.
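To make the comparison concrete, here is a minimal single-machine sketch of the MapReduce programming model the abstract refers to; it is not Hadoop itself, and the function names are illustrative. The key architectural difference the survey discusses is that Hadoop writes intermediate pairs like these to disk between jobs, while Spark keeps them in memory across iterations.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit (word, 1) pairs, as a Hadoop mapper would."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce step: sum the counts per key, as a Hadoop reducer would."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Classic word count expressed as one map phase followed by one reduce phase.
docs = ["big data processing", "big data tools"]
result = reduce_phase(map_phase(docs))
```

In Spark the same computation would be a `flatMap` followed by `reduceByKey` on an in-memory RDD, which is where the speed advantage for iterative workloads comes from.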

Big Data Based Dynamic Flow Aggregation over 5G Network Slicing

  • Sun, Guolin;Mareri, Bruce;Liu, Guisong;Fang, Xiufen;Jiang, Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.4717-4737
    • /
    • 2017
  • Today, smart grids, smart homes, smart water networks, and intelligent transportation are infrastructure systems that connect our world more than we ever thought possible, and all are associated with a single concept, the Internet of Things (IoT). The number of devices connected to the IoT, and hence the number of traffic flows, increases continuously, as do new applications. Although cutting-edge hardware can be employed to handle these huge data streams quickly, there will always be a limit on the traffic volume a given architecture supports. Fortunately, recent cloud-based big data technologies offer an ideal environment for handling this issue. Moreover, the ever-increasing volume of traffic created on demand presents great challenges for flow management. As a solution, flow aggregation decreases the number of flows the network must process. Previous works in the literature show that most aggregation strategies designed for smart grids aim at optimizing system operation performance: they use a common identifier to aggregate traffic on each device, with each device applying its own independent static aggregation policy. In this paper, we propose a dynamic approach that aggregates flows based on traffic characteristics and device preferences. Our algorithm runs on a big data platform to provide end-to-end network visibility of flows, performing high-speed, high-volume computations to identify clusters of similar flows and aggregate a massive number of mice flows into a few meta-flows. Compared with existing solutions, our approach dynamically aggregates large numbers of such small flows into fewer flows based on traffic characteristics and access-node preferences. This alleviates the problem of processing a large number of micro flows and significantly improves the accuracy of meeting access-node QoS demands. We conducted experiments using a dataset of up to 100,000 flows and studied the performance of our algorithm analytically. The experimental results show the promising effectiveness and scalability of the proposed approach.
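The core idea of merging many "mice" flows into a few meta-flows can be sketched as follows. This is a toy stand-in, not the paper's algorithm: the field names (`bytes`, `qos_class`, `dst`) and the prefix-based grouping key are assumptions, replacing the clustering of traffic characteristics the authors run on a big data platform.

```python
from collections import defaultdict

def aggregate_flows(flows, mice_threshold=1000):
    """Merge small ("mice") flows that share a QoS class and destination
    prefix into one meta-flow; leave large ("elephant") flows untouched."""
    meta = defaultdict(lambda: {"bytes": 0, "count": 0})
    elephants = []
    for f in flows:
        if f["bytes"] >= mice_threshold:
            elephants.append(f)
        else:
            # Drop the last octet to form a /24-style prefix as the group key.
            key = (f["qos_class"], f["dst"].rsplit(".", 1)[0])
            meta[key]["bytes"] += f["bytes"]
            meta[key]["count"] += 1
    return elephants, dict(meta)
```

Each meta-flow then carries the aggregate byte count of its members, so the network processes a handful of meta-flows instead of thousands of micro flows.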

Detecting and Avoiding Dangerous Area for UAVs Using Public Big Data (공공 빅데이터를 이용한 UAV 위험구역검출 및 회피방법)

  • Park, Kyung Seok;Kim, Min Jun;Kim, Sung Ho
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.6
    • /
    • pp.243-250
    • /
    • 2019
  • Because a moving UAV has a lot of potential and kinetic energy, it can cause a great deal of impact if it falls to the ground. Since this can lead to human casualties, in this paper a high population density area on the UAV flight path is defined as a dangerous area. Conventional UAV path flight is a passive form in which the UAV follows a path preset by the user before the flight. Some UAVs include safety features such as an obstacle avoidance system during flight, but it is still difficult to respond to changes in the real-time flight environment. Using public Big Data for UAV path flight can improve the response to real-time flight environment changes by enabling detection of dangerous areas and avoidance of those areas. Therefore, in this paper, we propose a method to detect and avoid dangerous areas for UAVs by utilizing Big Data collected in real time. When a route to the destination is designated by the proposed method, dangerous areas are identified in real time and the UAV flies along an optimal bypass path. In further research, we will study ways to increase the quality of the images acquired while flying under the avoidance flight plan.
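A minimal sketch of the bypass idea, assuming the airspace is discretized into a grid where cells flagged from population-density data are marked dangerous; this is my own illustration, not the paper's planner. Breadth-first search then yields a shortest path that never enters a danger cell.

```python
from collections import deque

def bypass_path(grid, start, goal):
    """BFS over a grid where 1 marks a dangerous (high population density)
    cell; returns a shortest path avoiding all danger cells, or None."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None  # no safe route exists
```

Re-running the search whenever real-time data reclassifies a cell gives the dynamic replanning behavior the abstract describes.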

Data-Driven Technology Portfolio Analysis for Commercialization of Public R&D Outcomes: Case Study of Big Data and Artificial Intelligence Fields (공공연구성과 실용화를 위한 데이터 기반의 기술 포트폴리오 분석: 빅데이터 및 인공지능 분야를 중심으로)

  • Eunji Jeon;Chae Won Lee;Jea-Tek Ryu
    • The Journal of Bigdata
    • /
    • v.6 no.2
    • /
    • pp.71-84
    • /
    • 2021
  • Since small and medium-sized enterprises have fallen short in securing technological competitiveness in big data and artificial intelligence (AI), core technologies of the Fourth Industrial Revolution, it is important to strengthen the competitiveness of the overall industry through technology commercialization. In this study, we aimed to propose priorities for technology transfer and commercialization so that public research results can be put to practical use. We utilized public research performance information, imputing missing values of the 6T classification with a deep-learning model and an ensemble method. We then conducted topic modeling to derive the converging fields of big data and AI. We classified the technology fields into four segments of a technology portfolio based on technology activity and technology efficiency, estimating the commercialization potential of each field, and proposed commercialization priorities for 10 detailed technology fields that require long-term investment. Such systematic analysis can promote the active utilization of technology and efficient technology transfer and commercialization.
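The two-axis portfolio segmentation can be sketched as a simple quadrant rule. The segment labels below are illustrative placeholders of mine, not the paper's terminology; the paper's thresholds and names may differ.

```python
def portfolio_segment(activity, efficiency, act_median, eff_median):
    """Place a technology field into one of four portfolio quadrants
    by comparing its activity and efficiency scores to the medians."""
    high_act = activity >= act_median
    high_eff = efficiency >= eff_median
    if high_act and high_eff:
        return "star"               # active and efficiently transferred
    if high_act:
        return "activity-driven"    # active research, inefficient transfer
    if high_eff:
        return "efficiency-driven"  # efficient transfer, little activity
    return "long-term investment"   # the segment prioritized in the study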

Combination of Array Processing and Space-Time Coding In MC-CDMA System

  • Hung Nguyen Viet;Fernando W. A. C
    • Proceedings of the IEEK Conference
    • /
    • summer
    • /
    • pp.302-309
    • /
    • 2004
  • The transmission capacity of wireless communication systems may become dramatically higher by employing multiple transmit and receive antennas with space-time coding techniques appropriate to multiple transmit antennas. For a large number of transmit antennas and at high bandwidth efficiencies, the receiver may become too complex whenever correlation across transmit antennas is introduced. Reducing decoding complexity at the receiver by combining array processing and space-time codes (STC) helps a communication system using STC overcome the big obstacle that prevents it from achieving a desired high transmission rate. Multi-carrier CDMA (MC-CDMA) provides good performance in a channel with high inter-symbol interference. Antenna arrays, STC, and MC-CDMA systems share the characteristic that transmit and receive data streams are divided into sub-streams, so there may be a noticeable reduction of receiver complexity when they are combined. In this paper, the combination of array processing and STC in an MC-CDMA system over a slowly selective-fading channel is investigated and compared with the corresponding existing MC-CDMA system using STC. A refinement of this basic structure leads to a system design principle in which transmission rate, decoding complexity, and spreading-code length must be traded off to reach a given design goal.
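The spreading codes that CDMA systems rely on can be illustrated with orthogonal Walsh-Hadamard codes; this is a textbook baseband sketch of spreading and despreading, not the paper's MC-CDMA receiver, and it ignores the fading channel and multi-carrier modulation the paper actually studies.

```python
def walsh_matrix(n):
    """Recursively build an n x n Walsh-Hadamard matrix (n a power of 2);
    its rows are mutually orthogonal spreading codes."""
    if n == 1:
        return [[1]]
    h = walsh_matrix(n // 2)
    return ([row + row for row in h] +
            [row + [-x for x in row] for row in h])

def spread(bit, code):
    """Spread one data bit (+1/-1) over the chips of a user's code."""
    return [bit * c for c in code]

def despread(chips, code):
    """Correlate received chips with a user's code; the sign of the
    correlation recovers that user's bit despite the superposition."""
    corr = sum(ch * c for ch, c in zip(chips, code))
    return 1 if corr > 0 else -1

codes = walsh_matrix(4)
# Two users' spread bits superimposed on the same channel:
rx = [a + b for a, b in zip(spread(1, codes[1]), spread(-1, codes[2]))]
```

Because the rows are orthogonal, despreading `rx` with `codes[1]` recovers the first user's bit and with `codes[2]` the second's, which is the property that longer spreading codes buy at the cost of transmission rate, the trade-off the paper's design principle formalizes.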


Analysis of Factors Delaying on Waiting Time for Medical Examination of Outpatient on a Hospital (일 병원의 외래진료대기시간 지연요인 분석)

  • Park, Seong-Hi
    • Quality Improvement in Health Care
    • /
    • v.8 no.1
    • /
    • pp.56-72
    • /
    • 2001
  • Background: Shortening the processing time of the various medical affairs a patient faces at the outpatient clinic of a big hospital is very important to the quality of the patient's care. Accordingly, the patient's waiting time for medical examination is often used as a strong measure of patient satisfaction with the care provided. We performed this study to investigate factors related to delays in waiting time for medical examination. Methods: The data were collected from June 26 to July 30, 1999. A total of 275 medical treatment sessions and 5,634 patients who visited the outpatient clinics of a tertiary hospital were included in the evaluation of waiting time. The data were analyzed using frequency, t-test, ANOVA, and χ²-test with the SPSS for Windows 7.5 program. Results: The mean waiting time measured objectively (30.9 ± 33.9 min) was longer than that evaluated subjectively by patients (25.1 ± 26.2 min). Objectively measured waiting time was influenced by the starting time of the medical examination, consultation hours, and patient arrival time, as expected. The discrepancy between the two evaluations was influenced by several causative factors. The waiting time patients regarded as acceptable was about 20 minutes. Conclusion: The results show that, besides the starting time of the medical examination, consultation hours and patient arrival time influence the patient's subjective evaluation of waiting time and his or her satisfaction with the service in a big hospital. To improve patient satisfaction related to waiting time, managing the waiting time patients perceive may be more effective than simply shortening the actual processing time within the consultation room.


A Decision Tree-based Music Recommendation System Using the user experience (사용자 경험정보를 고려한 결정트리 기반 음악 추천 시스템)

  • Kim, Yu-ri;Kim, Seong-gi;Kim, Jeong-Ho;Jo, Jae-rim;Lee, Dong-wook;Kim, Seok-Jin;Jeon, Soo-bin;Seo, Dong-mahn
    • Annual Conference of KIPS
    • /
    • 2020.11a
    • /
    • pp.655-658
    • /
    • 2020
  • With recent advances in IT, music can easily be enjoyed on a variety of devices such as tablets and smartphones. Unlike these technological advances, however, the way users search for the music they want has not moved beyond its classical form. Existing music search methods include text-based, content-based, and consumer-emotion-based recommendation searches; they use stored metadata to return only the results of a user's query and do not consider the user's experience information. Likewise, existing platforms recommend music that suits the user by aggregating the artists, genres, and moods the user has recently listened to most, rather than by considering the user's experience information. In this paper, we propose a personalized music recommendation system that utilizes the user's experience information. The system takes the user's current mood and the surrounding weather as inputs, and builds a decision tree over this experience information to recommend music matching the user's needs.
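The decision-tree idea from the abstract can be sketched with a hand-built tree over the same inputs (mood, weather, recent listening). This is a toy stand-in with invented genres and branch rules, not the learned tree the paper constructs from real experience data.

```python
def recommend_genre(mood, weather, recent_genre):
    """Hand-built decision tree: branch first on the user's current mood,
    then on the weather, falling back on recent listening history."""
    if mood == "sad":
        return "ballad" if weather == "rainy" else "acoustic"
    if mood == "happy":
        return "dance" if weather == "sunny" else "pop"
    # Neutral mood: reuse the user's experience information directly.
    return recent_genre
```

A learned version would fit the same kind of branches from logged (mood, weather, chosen-track) triples instead of hard-coding them.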

Korean Machine Reading Comprehension for Patent Consultation Using BERT (BERT를 이용한 한국어 특허상담 기계독해)

  • Min, Jae-Ok;Park, Jin-Woo;Jo, Yu-Jeong;Lee, Bong-Gun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.4
    • /
    • pp.145-152
    • /
    • 2020
  • Machine reading comprehension (MRC) is the AI NLP task of predicting the answer to a user's query by understanding a relevant document; it can be used in automated consultation services such as chatbots. The BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) model, which recently achieved high performance in various fields of natural language processing, works in two phases: first, pre-training on big data from each domain; second, fine-tuning the model to solve each NLP task as a prediction problem. In this paper, we build a patent MRC dataset and show how to construct patent consultation training data for the MRC task. We also propose a method to improve MRC performance using a Patent-BERT model pre-trained on a patent consultation corpus and a language-processing algorithm suited to machine learning on patent counseling data. Experimental results show that the proposed method improves the ability to answer patent counseling queries.
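In BERT-style extractive MRC, the model emits a start score and an end score for every token of the document, and the answer is the span maximizing their sum. Here is a minimal sketch of that decoding step with toy scores standing in for the model's logits; the function name and `max_len` cap are my own.

```python
def best_span(start_scores, end_scores, max_len=10):
    """Pick the (start, end) token pair maximizing start + end score,
    with end >= start, as in BERT-style extractive MRC decoding."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        # Only consider spans up to max_len tokens long.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best
```

Fine-tuning Patent-BERT amounts to training the network so that, for a patent consultation query, these scores peak at the tokens bounding the correct answer in the patent document.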

A Comparative Analysis of Recursive Query Algorithm Implementations based on High Performance Distributed In-Memory Big Data Processing Platforms (대용량 데이터 처리를 위한 고속 분산 인메모리 플랫폼 기반 재귀적 질의 알고리즘들의 구현 및 비교분석)

  • Kang, Minseo;Kim, Jaesung;Lee, Jaegil
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.621-626
    • /
    • 2016
  • Recursive query algorithms are used in many social network services, e.g., for reachability queries in social networks. Recently, the size of social network data has increased as social network services have evolved; as a result, it is almost impossible to run a recursive query algorithm on a single machine. In this paper, we implement recursive queries on two popular in-memory distributed platforms, Spark and Twister, to solve this problem. We evaluate the performance of the two implementations using 50 machines on Amazon EC2 and the real-world data sets LiveJournal and ClueWeb. The results show that the recursive query algorithm performs better on Spark for the LiveJournal input data set, which has a relatively high average degree but fewer vertices, whereas the Twister implementation is superior for the ClueWeb input data set, which has a relatively low average degree but many vertices.
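The reachability query at the heart of the comparison iterates a frontier expansion until a fixpoint. Here is a single-machine sketch of that loop, assuming an edge-list input; on Spark or Twister each iteration of the `while` loop would instead be a distributed join between the frontier and the edge set.

```python
def reachable(edges, source):
    """Iteratively expand the reachable set until a fixpoint, mirroring
    the per-iteration join a distributed platform would perform."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    visited, frontier = {source}, {source}
    while frontier:
        # One "iteration" = join current frontier with the edge list.
        nxt = {v for u in frontier for v in adj.get(u, [])
               if v not in visited}
        visited |= nxt
        frontier = nxt
    return visited
```

The number of loop iterations equals the graph's depth from the source, which is why graph shape (average degree vs. vertex count) determines which platform wins.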

Medical Image Analysis Using Artificial Intelligence

  • Yoon, Hyun Jin;Jeong, Young Jin;Kang, Hyun;Jeong, Ji Eun;Kang, Do-Young
    • Progress in Medical Physics
    • /
    • v.30 no.2
    • /
    • pp.49-58
    • /
    • 2019
  • Purpose: Automated analytical systems have begun to emerge as database systems that enable medical images to be scanned on computers and big data to be constructed. Deep-learning artificial intelligence (AI) architectures have been developed and applied to medical images, making high-precision diagnosis possible. Materials and Methods: For diagnosis, the medical images need to be labeled and standardized. After the data are pre-processed and entered into the deep-learning architecture, the final diagnosis results can be obtained quickly and accurately. To solve the problem of overfitting caused by an insufficient amount of labeled data, data augmentation is performed through rotation and left-right flips to artificially increase the amount of data. Because various deep-learning architectures have been developed and publicized over the past few years, diagnosis results can be obtained simply by entering a medical image. Results: Classification and regression are performed by supervised machine-learning methods, and clustering and generation are performed by unsupervised machine-learning methods. When the convolutional neural network (CNN) method is applied in the deep-learning layers, feature extraction can be used to classify diseases very efficiently and thus to diagnose various diseases. Conclusions: AI using a deep-learning architecture has expertise in medical image analysis of the nerves, retina, lungs, digital pathology, breast, heart, abdomen, and musculoskeletal system.
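The rotation-and-flip augmentation described above can be sketched on a raw 2D pixel grid; real pipelines would operate on image tensors with a library, but the geometry is the same. The helper names are my own.

```python
def hflip(img):
    """Left-right flip of a 2D pixel grid."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def augment(img):
    """Generate extra labeled samples from one image, the strategy the
    review describes for combating overfitting with scarce labels."""
    return [img, hflip(img), rot90(img), hflip(rot90(img))]
```

Each transformed copy keeps the original diagnostic label, multiplying the effective training-set size without new annotation effort.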