• Title/Summary/Keyword: Data Scalability Problem

Optimizing Similarity for User-based Collaborative Filtering

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information / v.29 no.11 / pp.243-250 / 2024
  • Collaborative filtering is one of the most widely known methods of implementing recommender systems; it recommends items that similar users have preferred in the past. Similarity measurement is therefore a key factor in the performance of the system. To address the shortcomings of existing single or integrated heuristic similarity measures, this study uses a genetic algorithm to calculate the optimal similarity between users for each item genre. To address the data scalability problem, the number of users considered when calculating similarity for each genre is limited by a preset threshold, and the average of the item ratings is used to address the data sparsity problem. Performance experiments yielded the optimal probabilities of the genetic operators and an analysis of prediction accuracy. The results confirm that the proposed method outperforms existing methods, especially in a sparse data environment.
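
The neighbor limit and the item-average fallback described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: plain cosine similarity stands in for the genetic-algorithm-optimized, per-genre similarity, and the names `cosine_similarity`, `predict`, and `max_neighbors` are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the items that both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(ratings, user, item, max_neighbors=2):
    """Predict a rating from at most `max_neighbors` similar users.

    `ratings` maps user -> {item: rating}. The cap on neighbors is an
    assumed stand-in for the preset threshold mentioned in the abstract.
    """
    candidates = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    # Keep only the most similar users to bound the work per prediction.
    neighbors = sorted(candidates, key=lambda pair: pair[0], reverse=True)
    neighbors = neighbors[:max_neighbors]
    weight = sum(sim for sim, _ in neighbors)
    if weight > 0:
        return sum(sim * rating for sim, rating in neighbors) / weight
    # Sparse case: no usable neighbor, so fall back to the item's average.
    item_ratings = [r[item] for r in ratings.values() if item in r]
    if not item_ratings:
        return None
    return sum(item_ratings) / len(item_ratings)
```

Calling `predict(ratings, "a", "z")` on a small dictionary of ratings returns a similarity-weighted average when user `a` overlaps with someone who rated `z`, and the plain item average otherwise.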

An Optimization Approach to Data Clustering

  • Kim, Ju-Mi;Olafsson, Sigurdur
    • Proceedings of the Korean Operations and Management Science Society Conference / 2005.05a / pp.621-628 / 2005
  • Scalability of clustering algorithms is a critical issue facing the data mining community. This is particularly true for computationally intense tasks such as data clustering. Random sampling of instances is one possible means of achieving scalability, but a pervasive problem with this approach is how to deal with the noise that it introduces in the evaluation of the learning algorithm. This paper develops a new optimization-based clustering approach using an algorithm specifically designed for noisy performance. Numerical results illustrate that this algorithm can achieve substantial savings in computational time without sacrificing solution quality.

Ranking-based Flow Replacement Method for Highly Scalable SDN (고확장성 SDN을 위한 랭킹 기반 플로우 교체 기법)

  • Tri, Hiep T. Nguyen;Kim, Kyungbaek
    • Annual Conference of KIPS / 2015.04a / pp.143-146 / 2015
  • Software-Defined Networking (SDN) separates the control plane and the data plane to achieve benefits such as centralized management, centralized provisioning, lower device cost, and more flexibility. In SDN, scalability is an important issue. A centralized controller can be a bottleneck, and much research has tried to solve this issue in the control plane. However, scalability issues arise not only in the control plane but also in the data plane. In the data plane, the flow table is an important component, and its size is limited. In a large network operated with SDN technology, network performance can be severely degraded by the size limitation of the flow table. In this paper, we propose a ranking-based flow replacement method, Flow Table Management (FTM), to overcome this problem.
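
The general idea of replacing flows by rank in a size-limited table can be sketched as below. The abstract does not say how FTM ranks flows, so this sketch assumes a simple hit count as the rank; the class `FlowTable` and its methods are hypothetical and do not reproduce the paper's algorithm or any real switch API.

```python
class FlowTable:
    """Fixed-capacity flow table that evicts the lowest-ranked entry when full.

    Rank is assumed to be the number of lookups that matched the entry.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = {}  # match key -> [action, hit_count]

    def lookup(self, match):
        """Return the action for `match`, or None on a table miss."""
        entry = self.entries.get(match)
        if entry is None:
            return None  # a real switch would consult the controller here
        entry[1] += 1  # a hit raises the flow's rank
        return entry[0]

    def install(self, match, action):
        """Install a flow, replacing the lowest-ranked one if the table is full."""
        if match in self.entries:
            self.entries[match][0] = action
            return
        if len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda m: self.entries[m][1])
            del self.entries[victim]
        self.entries[match] = [action, 0]
```

With a capacity of two, installing a third flow removes whichever existing flow has been matched least often, so frequently used flows stay in the table.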

A Non-Equal Region Split Method for Data-Centric Storage in Sensor Networks (데이타 중심 저장 방식의 센서 네트워크를 위한 비균등 영역 분할 기법)

  • Kang, Hong-Koo;Jeon, Sang-Hun;Hong, Dong-Suk;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society / v.8 no.3 / pp.105-115 / 2006
  • A sensor network that uses Data-Centric Storage (DCS) stores the same data in the same sensor node. It therefore suffers from a hot spot problem when the sensor network grows and the same data occur frequently. Previous research on sensor networks using DCS ignored the hot spot problem caused by network growth because it concentrated only on managing stored sensor data efficiently. In this paper, we propose a non-equal region split method that supports efficient scalability when storing multi-dimensional sensor data. As the sensor network grows, this method reduces the storage cost by dividing the whole space into regions that contain the same number of sensor nodes, according to the distribution of sensor nodes, and by storing and managing sensor data within each region. Moreover, the method distributes the energy consumption of sensor nodes by increasing the number of regions according to the size of the sensor network, the number of sensor nodes in it, and the quantity of sensor data. It can therefore help increase the lifetime and scalability of the sensor network.
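
One way to obtain regions of unequal size that each hold a similar number of sensor nodes is a recursive median split, sketched below. This is an assumed illustration of the idea only; the abstract does not give the paper's actual splitting rule, and `split_regions` and `max_per_region` are hypothetical names.

```python
def split_regions(nodes, max_per_region, depth=0):
    """Split 2-D node positions into regions with similar node counts.

    `nodes` is a list of (x, y) tuples. Each split cuts at the median along
    alternating axes, so dense areas end up with smaller regions.
    """
    if max_per_region < 1:
        raise ValueError("max_per_region must be at least 1")
    if len(nodes) <= max_per_region:
        return [nodes]
    axis = depth % 2  # alternate between x and y
    ordered = sorted(nodes, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (split_regions(ordered[:mid], max_per_region, depth + 1)
            + split_regions(ordered[mid:], max_per_region, depth + 1))
```

Because every cut is made at the median of the node positions rather than at the midpoint of the space, regions follow the node distribution instead of a uniform grid.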

Ontology BIM-based Knowledge Service Framework Architecture Development (온톨로지 BIM 기반 지식 서비스 프레임웍 아키텍처 개발)

  • Kang, Tae-Wook
    • Journal of KIBIM / v.12 no.4 / pp.52-60 / 2022
  • Recently, demand has been increasing for connecting various heterogeneous datasets with BIM as a construction data model hub. In the past, to connect BIM models with heterogeneous datasets, the related datasets were stored in an RDBMS, and the service was provided by programming a method to link them with BIM objects. This approach causes problems when requirements change, such as the need to modify the database schema and business logic and to migrate existing data. These problems adversely affect the scalability, reusability, and maintainability of model information. This study proposes an ontology BIM-based knowledge service framework that considers connectivity and scalability between BIM and heterogeneous datasets. Using the proposed framework, an ontology BIM mapping and a semantic information query method for linking knowledge-expressing datasets with BIM are presented. In addition, a prototype is developed to assess the effectiveness of the proposed method, and the effectiveness and considerations of the ontology BIM-based knowledge service framework are derived.

Methods to Enhance Service Scalability Using Service Replication and Migration (서비스 복제 및 이주를 이용한 서비스 확장성 향상 기법)

  • Kim, Ji-Won;Lee, Jae-Yoo;Kim, Soo-Dong
    • Journal of KIISE:Software and Applications / v.37 no.7 / pp.503-517 / 2010
  • Service-oriented computing, an effective paradigm for developing service applications from reusable services, has become popular. In service-oriented computing, service consumers have no responsibility for managing services; they simply invoke the services that service providers offer. Service providers, on the other hand, must manage all resources and data so that service consumers can use the services anytime and anywhere. However, it is hard for service providers to manage the quality of their services because the number of service consumers is unspecified. Therefore, service scalability, which is needed to provide services at the high quality of service specified in a service level agreement, becomes a potential problem in service-oriented computing. Much research has addressed scalability in the network, database, and distributed computing areas, but research on a definition of service scalability and on metrics for measuring it is still immature in the service engineering area. In this paper, we construct a service network that connects multiple service nodes and integrate all the resources to manage it. We also present a service scalability framework that manages service scalability using a mechanism of service migration or replication. In Section 3, we first present the structure of the scalability management framework and its basic functionalities. In Section 4, we propose the scalability enhancement mechanisms needed to realize the functionality of the framework. In Section 5, we design and implement the framework using the proposed mechanisms. In Section 6, we present the results of a case study that dynamically manages services in a multi-node environment by applying our framework. Through the case study, we show the applicability of our scalability management framework and mechanisms.

An Analytical Traffic Model of Control Plane and Application Plane in Software-Defined Networking based on Queuing Theory (대기행렬 이론 기반 SDN 제어 평면 및 응용 평면의 트래픽 성능 분석 모델)

  • Lee, Seungwoon;Roh, Byeong-hee
    • The Journal of Korean Institute of Next Generation Computing / v.15 no.4 / pp.80-88 / 2019
  • Software-Defined Networking (SDN) is a future network paradigm that decouples control and data functions. In the SDN structure, scalability is hard to address in large-scale networks because a single controller manages thousands of switches in a centralized fashion. Most previous studies have focused on horizontal scalability, where distributed controllers are assigned to network devices. However, they have abstracted the control plane and the application plane into a single controller. The common SDN architecture is layered into a data plane, a control plane, and an application plane, but the control plane and application plane have been modeled as a single controller although they are logically separate. In this paper, we propose an analytical traffic model, based on queuing theory, that considers both the application plane and the control plane. This model can be used to address scalability issues such as the controller placement problem without complicated simulations.
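
To illustrate why modeling the two planes separately matters, the sketch below treats the control plane and the application plane as two M/M/1 queues in series and adds their mean sojourn times. This is an assumed textbook model chosen for illustration, not the model derived in the paper; it further assumes that every request visits both planes, and the function names are hypothetical.

```python
def mm1_sojourn_time(arrival_rate, service_rate):
    """Mean time in an M/M/1 queue (waiting plus service): 1 / (mu - lambda)."""
    if arrival_rate < 0 or service_rate <= arrival_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for stability")
    return 1.0 / (service_rate - arrival_rate)


def tandem_delay(arrival_rate, control_rate, app_rate):
    """Mean delay through the control plane followed by the application plane.

    Assumes Poisson arrivals, exponential service at each plane, and that
    every request passes through both planes in sequence.
    """
    return (mm1_sojourn_time(arrival_rate, control_rate)
            + mm1_sojourn_time(arrival_rate, app_rate))
```

Under these assumptions, a slow application plane dominates the total delay even when the control plane has spare capacity, which a single-controller abstraction would hide.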

A Study of Multipath Routing based on Software-Defined Networking for Data Center Networking in Cloud Computing Environments (클라우드 컴퓨팅 환경에서 데이터 센터 네트워킹을 위한 소프트웨어 정의 네트워킹 기반 다중 경로 라우팅 연구)

  • Kang, Yong-Hyeog
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.563-564 / 2017
  • The data center is the core of cloud computing technology, and networking technology is important within it. Cloud data centers comprise tens or even hundreds of thousands of physical servers, so networking technology for high-speed data transfer is required. These networking technologies also require scalability, fault tolerance, and agility. To meet these requirements, many multipath-based schemes have been proposed. However, they have mainly been used for load balancing of traffic and select a path randomly. This paper proposes a scheme that constructs multiple paths using software-defined networking technology and transmits traffic in parallel over them to achieve fast transmission, solve the scalability problem, and provide fault tolerance.

Scalable P2P Botnet Detection with Threshold Setting in Hadoop Framework (하둡 프레임워크에서 한계점 가변으로 확장성이 가능한 P2P 봇넷 탐지 기법)

  • Huseynov, Khalid;Yoo, Paul D.;Kim, Kwangjo
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.4 / pp.807-816 / 2015
  • During the last decade, most coordinated security breaches have been carried out by means of botnets, which are large overlay networks of compromised computers controlled by a remote botmaster. Because of the high volume of traffic to be analyzed, the challenge lies in managing the tradeoff between system scalability and accuracy. We propose a novel Hadoop-based P2P botnet detection method that solves the scalability problem while achieving high accuracy. Moreover, our approach does not require labeled data and is also applicable to encrypted traffic.

ADMM for least square problems with pairwise-difference penalties for coefficient grouping

  • Park, Soohee;Shin, Seung Jun
    • Communications for Statistical Applications and Methods / v.29 no.4 / pp.441-451 / 2022
  • In the era of big data, scalability is a crucial issue in learning models. Among many others, the Alternating Direction Method of Multipliers (ADMM; Boyd et al., 2011) algorithm has gained great popularity for solving large-scale problems efficiently. In this article, we propose applying the ADMM algorithm to solve the least squares problem penalized by the pairwise-difference penalty, which is frequently used to identify group structures among coefficients. The ADMM algorithm enables us to solve the high-dimensional problem efficiently in a unified fashion and thus allows us to employ several different types of penalty functions, such as LASSO, Elastic Net, SCAD, and MCP, for the penalized problem. Additionally, the ADMM algorithm extends naturally to distributed computation and real-time updates, both desirable when dealing with large amounts of data.
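
The kind of problem and iteration this abstract refers to can be written out as below. This is the standard scaled-form ADMM for a penalty applied to a linear transform of the coefficients, given here as a sketch; the notation is assumed, not taken from the paper. It also assumes that the matrix inverted in the first update is nonsingular.

```latex
% Assumed notation: y response, X design matrix, \beta coefficients,
% D maps \beta to all pairwise differences \beta_j - \beta_k (j < k),
% p_\lambda is the penalty, \rho > 0 the ADMM parameter, u the scaled dual.
\begin{aligned}
&\min_{\beta,\,z}\ \tfrac{1}{2}\lVert y - X\beta\rVert_2^2
  + \sum_{j<k} p_\lambda\bigl(\lvert z_{jk}\rvert\bigr)
  \quad \text{subject to } z = D\beta, \\
&\beta^{t+1} = \bigl(X^\top X + \rho D^\top D\bigr)^{-1}
  \bigl(X^\top y + \rho D^\top (z^{t} - u^{t})\bigr), \\
&z^{t+1} = \operatorname{prox}_{p_\lambda/\rho}\bigl(D\beta^{t+1} + u^{t}\bigr), \\
&u^{t+1} = u^{t} + D\beta^{t+1} - z^{t+1}.
\end{aligned}
```

Only the $z$-update depends on the choice of penalty. For the LASSO penalty $p_\lambda(t) = \lambda t$, it reduces to elementwise soft-thresholding, $\operatorname{sign}(v)\max(\lvert v\rvert - \lambda/\rho,\, 0)$, which is why one algorithm can serve several penalty types.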