• Title/Summary/Keyword: Data Scalability Problem

Cooperative Coevolution Differential Evolution Based on Spark for Large-Scale Optimization Problems

  • Tan, Xujie;Lee, Hyun-Ae;Shin, Seong-Yoon
    • Journal of information and communication convergence engineering, v.19 no.3, pp.155-160, 2021
  • Differential evolution is an efficient algorithm for solving continuous optimization problems. However, its performance deteriorates rapidly, and the runtime increases exponentially when differential evolution is applied for solving large-scale optimization problems. Hence, a novel cooperative coevolution differential evolution based on Spark (known as SparkDECC) is proposed. The divide-and-conquer strategy is used in SparkDECC. First, the large-scale problem is decomposed into several low-dimensional subproblems using the random grouping strategy. Subsequently, each subproblem can be addressed in a parallel manner by exploiting the parallel computation capability of the resilient distributed datasets model in Spark. Finally, the optimal solution of the entire problem is obtained using the cooperation mechanism. The experimental results on 13 high-benchmark functions show that the new algorithm performs well in terms of speedup and scalability. The effectiveness and applicability of the proposed algorithm are verified.
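
The three steps in this abstract (random grouping, independent subproblem optimization, cooperation) can be sketched roughly as follows. This is a sequential, single-machine illustration and not the authors' SparkDECC code: the function names, the sphere objective, and all parameter values are assumptions, and the loop over groups stands in for the part that would be distributed across Spark workers.

```python
import random

def sphere(x):
    """Stand-in benchmark (assumed): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def random_groups(dim, group_size, rng):
    """Randomly partition the variable indices into low-dimensional subproblems."""
    idx = list(range(dim))
    rng.shuffle(idx)
    return [idx[i:i + group_size] for i in range(0, dim, group_size)]

def evaluate(sub, group, context, f_obj):
    """Score a sub-vector by plugging it into the shared context vector."""
    trial = context[:]
    for j, g in enumerate(group):
        trial[g] = sub[j]
    return f_obj(trial)

def optimize_group(pop, group, context, f_obj, rng, gens, F, CR):
    """Run DE/rand/1/bin on one subproblem; return its best sub-vector and fitness."""
    subs = [[ind[g] for g in group] for ind in pop]
    fit = [evaluate(s, group, context, f_obj) for s in subs]
    n = len(subs)
    for _ in range(gens):
        for i in range(n):
            a, b, c = rng.sample([k for k in range(n) if k != i], 3)
            jrand = rng.randrange(len(group))
            trial = [
                subs[a][j] + F * (subs[b][j] - subs[c][j])
                if (rng.random() < CR or j == jrand) else subs[i][j]
                for j in range(len(group))
            ]
            trial_fit = evaluate(trial, group, context, f_obj)
            if trial_fit <= fit[i]:          # greedy DE selection
                subs[i], fit[i] = trial, trial_fit
    for ind, s in zip(pop, subs):            # write evolved components back
        for j, g in enumerate(group):
            ind[g] = s[j]
    best = min(range(n), key=fit.__getitem__)
    return subs[best], fit[best]

def cc_de(f_obj, dim=40, group_size=10, pop_size=20, cycles=30, gens=5,
          seed=0, bounds=(-5.0, 5.0), F=0.5, CR=0.9):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    context = min(pop, key=f_obj)[:]
    history = [f_obj(context)]
    for _ in range(cycles):
        groups = random_groups(dim, group_size, rng)
        base = f_obj(context)
        merged = context[:]
        # Each group only reads the frozen context, so this loop is the
        # part that could run in parallel.
        for group in groups:
            best_sub, best_fit = optimize_group(
                pop, group, context, f_obj, rng, gens, F, CR)
            if best_fit < base:
                for j, g in enumerate(group):
                    merged[g] = best_sub[j]
        # Cooperation step: keep the merged solution only if it is no worse.
        if f_obj(merged) <= base:
            context = merged
        history.append(f_obj(context))
    return context, history

best, history = cc_de(sphere)
print(f"initial best: {history[0]:.4f}, final best: {history[-1]:.4f}")
```

Freezing the context vector while the groups are optimized is what makes the subproblems independent of one another; the final acceptance check is a conservative choice made here so that the best fitness never worsens on non-separable objectives.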

Recommendation System Using Big Data Processing Technique (빅 데이터 처리 기법을 적용한 추천 시스템에 관한 연구)

  • Yun, So-Young;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering, v.21 no.6, pp.1183-1190, 2017
  • With the development of network and IT technology, people search for and purchase the items they want without being bound by location. As a result, various studies have addressed the scalability problem caused by the rapidly increasing data in recommendation systems. In this paper, we propose an item-based collaborative filtering method that uses tag weights, together with a recommendation technique based on MapReduce, a distributed parallel processing method. To improve speed and efficiency, the proposed method classifies items into categories during preprocessing and groups them according to the number of nodes. On each distributed node, the data is processed through four MapReduce steps. To recommend better items to users, item tag weights are used in the similarity calculation. The experimental results indicate that the proposed method recommends more appropriate items than the item-based method and runs efficiently on large amounts of data.
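
As a rough illustration of item-based similarity computed in map and reduce steps, the sketch below runs in a single process with plain Python generators standing in for a MapReduce framework. The abstract does not give the tag weight formula or the content of the four MapReduce steps, so the Jaccard-overlap `tag_weight`, the function names, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def map_by_user(ratings):
    """Map step: key each (item, rating) pair by user."""
    for user, item, rating in ratings:
        yield user, (item, rating)

def shuffle(pairs):
    """Group values by key, as a framework's shuffle phase would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def map_item_pairs(by_user):
    """Map step: emit a rating product for every item pair co-rated by a user."""
    for rated in by_user.values():
        for (i, ri), (j, rj) in combinations(sorted(rated), 2):
            yield (i, j), ri * rj

def item_norms(ratings):
    """Euclidean norm of each item's rating vector."""
    squares = defaultdict(float)
    for _, item, rating in ratings:
        squares[item] += rating * rating
    return {item: sqrt(total) for item, total in squares.items()}

def tag_weight(tags_a, tags_b):
    """Hypothetical tag weight: Jaccard overlap of the two items' tag sets."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0

def weighted_similarities(ratings, tags):
    norms = item_norms(ratings)
    dots = shuffle(map_item_pairs(shuffle(map_by_user(ratings))))
    sims = {}
    for (i, j), products in dots.items():    # reduce step: one key per item pair
        cosine = sum(products) / (norms[i] * norms[j])
        sims[(i, j)] = cosine * tag_weight(tags[i], tags[j])
    return sims

# Illustrative data only.
ratings = [
    ("u1", "A", 5), ("u1", "B", 5),
    ("u2", "A", 4), ("u2", "B", 4), ("u2", "C", 1),
    ("u3", "C", 5),
]
tags = {"A": {"rock", "guitar"}, "B": {"rock", "guitar"}, "C": {"jazz"}}
sims = weighted_similarities(ratings, tags)
for pair, score in sorted(sims.items()):
    print(pair, round(score, 3))
```

Multiplying the rating-based cosine by a tag overlap means that items sharing no tags get a similarity of zero even when some users rated both, which is one simple way tags could sharpen an item-based score.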

An Efficient Guitar Chords Classification System Using Transfer Learning (전이학습을 이용한 효율적인 기타코드 분류 시스템)

  • Park, Sun Bae;Lee, Ho-Kyoung;Yoo, Do Sik
    • Journal of Korea Multimedia Society, v.21 no.10, pp.1195-1202, 2018
  • Artificial neural networks are widely used for their excellent performance and implementability. However, a traditional neural network needs to learn the system from scratch whenever new input data is added, the observation environment varies, or the form of the input/output data changes. To resolve this problem, the technique of transfer learning has been proposed. Transfer learning constructs a new target system by partially updating an existing system and hence provides a much more efficient learning process. Until now, transfer learning has mainly been studied in the field of image processing and is not yet widely employed in acoustic data processing. In this paper, focusing on the scalability of transfer learning, we apply the concept of transfer learning to the problem of guitar chord classification and evaluate its performance. For this purpose, we build a target system, a convolutional neural network (CNN) based classifier for 48 guitar chords, by applying transfer learning to a source system, a CNN-based classifier for 24 guitar chords. We show that the system with transfer learning performs similarly to the conventional system but requires only half the learning time.

Comparison of estimating vegetation index for outdoor free-range pig production using convolutional neural networks

  • Sang-Hyon OH;Hee-Mun Park;Jin-Hyun Park
    • Journal of Animal Science and Technology, v.65 no.6, pp.1254-1269, 2023
  • This study aims to predict the change in corn share caused by the grazing of 20 gestational sows in a mature corn field by taking images with a camera-equipped unmanned aerial vehicle (UAV). Deep learning based on convolutional neural networks (CNNs) has been verified for its performance in various areas. It has also demonstrated high recognition accuracy and short detection time in agricultural applications such as pest and disease diagnosis and prediction. A large amount of data is required to train CNNs effectively. However, since UAVs capture only a limited number of images, we propose a data augmentation method that effectively increases the amount of data. Most occupancy prediction methods design a CNN-based object detector for an image and then count the recognized objects or calculate the number of pixels occupied by an object. These methods require complex occupancy rate calculations, and their accuracy depends on whether the object features of interest are visible in the image. In this study, by contrast, the CNN is treated not as a corn object detection and classification problem but as a function approximation and regression problem, so that the occupancy rate of corn objects in an image is represented directly as the CNN output. The proposed method effectively estimates occupancy from a limited number of cornfield photos, shows excellent prediction accuracy, and confirms the potential and scalability of deep learning.

Data Mining for High Dimensional Data in Drug Discovery and Development

  • Lee, Kwan R.;Park, Daniel C.;Lin, Xiwu;Eslava, Sergio
    • Genomics & Informatics, v.1 no.2, pp.65-74, 2003
  • Data mining differs from traditional data analysis primarily on one important dimension, namely the scale of the data. That is why not only statistical but also computer science principles are needed to extract information from large data sets. In this paper we briefly review data mining, its characteristics, typical data mining algorithms, and potential and ongoing applications of data mining in the biopharmaceutical industry. The distinguishing characteristics of data mining lie in its understandability, scalability, its problem-driven nature, and its analysis of retrospective or observational data in contrast to experimentally designed data. At a high level one can identify three types of problems for which data mining is useful: description, prediction, and search. The brief review of data mining algorithms includes decision trees and rules, nonlinear classification methods, memory-based methods, model-based clustering, and graphical dependency models. Application areas covered are discovery compound libraries, clinical trial and disease management data, genomics and proteomics, structural databases for candidate drug compounds, and other applications of pharmaceutical relevance.

Preference-Based Segment Buffer Replacement in Cluster VOD Servers (클러스터 VOD서버에서 선호도 기반 세그먼트 버퍼 대체 기법)

  • Seo, Dong-Mahn;Lee, Joa-Hyoung;Bang, Cheol-Seok;Lim, Dong-Sun;Jung, In-Bum;Kim, Yoon
    • Journal of KIISE:Computer Systems and Theory, v.33 no.11, pp.797-809, 2006
  • To support QoS streams for a large number of clients, the internal resources of VOD servers should be utilized according to the characteristics of the streaming media service. Among the server's resources, main memory serves as buffer space for media data loaded from disk, and the buffer hit ratio has a great impact on server performance. However, if buffered data with a high hit ratio is replaced by new media data as the number of clients and requested movie titles increases, the scalability of server performance suffers. To address this problem, the buffer replacement policy should consider the intrinsic characteristics of streaming media, such as sequential access to large volumes of data and highly disproportionate preference for specific movies. In this paper, a preference-based segment buffer replacement policy is proposed for cluster-based VOD servers to exploit these characteristics. Since the proposed method reflects both the temporal locality arising from clients' preferences and the spatial locality arising from sequential access to media data, it improves the buffer hit ratio compared with the existing buffer replacement policy. With the enhanced buffer hit ratio, the performance of the cluster-based VOD server scales linearly as the number of cluster nodes increases.
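
To make the idea of preference-driven replacement concrete, here is a minimal single-node sketch. It is not the policy from the paper: the `PreferenceBuffer` class, the use of per-movie request counts as the preference score, and the least-recently-used tie-break are all assumptions chosen for illustration.

```python
from collections import Counter

class PreferenceBuffer:
    """Segment buffer that evicts segments of the least-requested movie first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.preference = Counter()   # requests seen per movie title
        self.last_used = {}           # (movie, segment) -> logical time of last access
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def request(self, movie, segment):
        """Serve one segment request, loading it into the buffer on a miss."""
        self.clock += 1
        self.preference[movie] += 1
        key = (movie, segment)
        if key in self.last_used:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.last_used) >= self.capacity:
                self._evict()
        self.last_used[key] = self.clock

    def _evict(self):
        # Lowest movie preference goes first; ties fall back to least recently used.
        victim = min(
            self.last_used,
            key=lambda k: (self.preference[k[0]], self.last_used[k]),
        )
        del self.last_used[victim]

buf = PreferenceBuffer(capacity=2)
for _ in range(3):
    buf.request("hit", 0)     # popular title: one miss, then two hits
buf.request("rare", 0)        # fills the second slot
buf.request("new", 0)         # buffer full: the "rare" segment is evicted
buf.request("hit", 0)         # still buffered, so this is a hit
print(buf.hits, buf.misses, sorted(buf.last_used))
```

In this trace a plain least-recently-used policy would have evicted the popular segment when "new" arrived, because "rare" was touched more recently; ranking by preference first keeps the popular segment resident.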

Auxiliary Stacked Denoising Autoencoder based Collaborative Filtering Recommendation

  • Mu, Ruihui;Zeng, Xiaoqin
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.6, pp.2310-2332, 2020
  • In recent years, deep learning techniques have achieved tremendous success in natural language processing, speech recognition, and image processing. Collaborative filtering (CF) recommendation is one of the most widely used methods and is effective in implementing new recommendation functions, but it is limited by poor scalability, cold start, and data sparsity. Combining traditional recommendation algorithms with deep learning models has created a great opportunity for building new recommender systems. In this paper, we propose a novel collaborative recommendation model based on an auxiliary stacked denoising autoencoder (ASDAE), which effectively learns users' preferences from auxiliary information. First, we integrate auxiliary information with rating information. Then, we design a stacked denoising autoencoder based collaborative recommendation model to learn users' preferences from the auxiliary and rating information. Finally, we conduct comprehensive experiments on three real datasets to compare our proposed model with state-of-the-art methods. Experimental results demonstrate that our proposed model is superior to other recommendation methods.

A MapReduce-based Artificial Neural Network Churn Prediction for Music Streaming Service

  • Chen, Min
    • International Journal of Computer Science & Network Security, v.22 no.1, pp.55-60, 2022
  • Churn prediction is a critical long-term problem for many businesses, such as music, games, and magazines. The churn probability can be used to study many aspects of a business, including proactive customer marketing, sales prediction, and churn-sensitive pricing models. Designing a machine learning model that predicts customer churn accurately is quite challenging because of the large volume of time-series data and the temporal issues in the data. In this paper, a parallel artificial neural network is proposed to create a highly accurate customer churn model on a large customer dataset. The proposed model achieves a significant improvement in the accuracy of churn prediction. The scalability and effectiveness of the proposed algorithm are also studied.

A Fast and Scalable Image Retrieval Algorithms by Leveraging Distributed Image Feature Extraction on MapReduce (MapReduce 기반 분산 이미지 특징점 추출을 활용한 빠르고 확장성 있는 이미지 검색 알고리즘)

  • Song, Hwan-Jun;Lee, Jin-Woo;Lee, Jae-Gil
    • Journal of KIISE, v.42 no.12, pp.1474-1479, 2015
  • With mobile devices showing marked improvement in performance in the age of the Internet of Things (IoT), there is demand for rapid processing of extensive amounts of multimedia big data. However, because research on image searching has focused mainly on increasing accuracy despite environmental changes, progress on fast processing of high-resolution multimedia data queries has been slow. Hence, we suggest a new distributed image search algorithm that ensures both high accuracy and rapid response by extracting image features in a distributed manner on MapReduce, and that solves the problem of memory scalability by using BIRCH indexing. In addition, we conducted an experiment on the accuracy, processing time, and scalability of this algorithm to confirm its excellent performance.

Design and Implementation of Multidimensional Data Model for OLAP Based on Object-Relational DBMS (OLAP을 위한 객체-관계 DBMS 기반 다차원 데이터 모델의 설계 및 구현)

  • 김은영;용환승
    • The Journal of Korean Institute of Communications and Information Sciences, v.25 no.6A, pp.870-884, 2000
  • Among OLAP (On-Line Analytical Processing) approaches, ROLAP (Relational OLAP), which offers multidimensional analysis based on star and snowflake schemas, has a performance problem, and MOLAP (Multidimensional OLAP), which is based on a multidimensional database system, has a scalability problem. In this paper, to overcome the limitations of these approaches, we propose the design and implementation of a multidimensional data model based on an object-relational DBMS. The extensibility of the object-relational DBMS makes it possible to create a multidimensional data model that defines multidimensional concepts more expressively, along with analysis functions optimized for that model. In addition, using the hierarchy between data objects supported by the object-relational DBMS, we designed an aggregated data model that inherits from the super-table, the multidimensional data model. Once these data models and functions are defined, they behave just like built-in functions, with the full performance characteristics of the object-relational DBMS engine.