• Title/Summary/Keyword: large-memory data processing (대용량 메모리 데이타 처리)

Large-Memory Data Processing on a Remote Memory System using Commodity Hardware (대용량 메모리 데이타 처리를 위한 범용 하드웨어 기반의 원격 메모리 시스템)

  • Jung, Hyung-Soo;Han, Hyuck;Yeom, Heon-Y.
    • Journal of KIISE:Computer Systems and Theory / v.34 no.9 / pp.445-458 / 2007
  • This article presents a novel infrastructure for large-memory database processing using commodity hardware with operating system support. We exploit inexpensive PCs and a high-speed network capable of Remote Direct Memory Access (RDMA) operations to build a new memory hierarchy between fast volatile memory and slow disk storage. The new memory hierarchy guarantees a reasonable response time, and its storage size enables us to run large-memory database systems with little performance degradation. The proposed architecture has two main components: (1) a remote memory system inside the Linux kernel that manages other computers' memory pages efficiently and (2) a remote memory pager responsible for remote read/write operations on remote memory pages. We argue that the proposed architecture is practical enough to support the rigorous demands of commercial in-memory database systems, and we demonstrate this by running publicly available main-memory databases (e.g., MySQL) on our prototype system. We report experimental results from the TPC-C benchmark.

A Cell-based Clustering Method for Large High-dimensional Data in Data Mining (데이타마이닝에서 고차원 대용량 데이타를 위한 셀-기반 클러스터 링 방법)

  • Jin, Du-Seok;Chang, Jae-Woo
    • Journal of KIISE:Databases / v.28 no.4 / pp.558-567 / 2001
  • Recently, data mining applications have come to require large amounts of high-dimensional data. Most algorithms for data mining applications, however, do not work efficiently on large high-dimensional data because of the so-called curse of dimensionality [1] and the limitation of available memory. To overcome these problems, this paper proposes a new cell-based clustering method that is more efficient than existing algorithms for large high-dimensional data. Our clustering method provides a cell construction algorithm for dealing with large high-dimensional data and an index structure based on filtering. We compare the performance of our cell-based clustering method with the CLIQUE method in terms of clustering time, precision, and retrieval time. Finally, the results of our experiments show that our cell-based clustering method outperforms the CLIQUE method.

Implementation of GALIS SLDS prototype for managing large volumes of location data (대용량 위치 데이타 관리를 위한 GALIS의 SLDS 프로토타입 구현)

  • 이운주;이준우;나연묵
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.46-48 / 2004
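  • With recent advances in positioning and wireless communication technology, interest in location-based services has grown considerably. The single-node systems of previous studies have difficulty handling a large number of objects, such as mobile phone users. In this paper, we develop a prototype of the SLDS (Short-term Location Data Subsystem), which manages the current location information of objects, as part of the architecture of GALIS (Gracefully Aging Location Information System), a cluster-based distributed computing architecture proposed for managing the spatio-temporal information of a large number of moving objects. Because the implemented system uses a main-memory database, it incurs no disk access time, and because it separates current information from past information, it supports fast search. It can therefore respond effectively in situations that require managing a large number of moving objects with fast responses.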

An Efficient Grid Cell Based Spatial Clustering Algorithm for Spatial Data Mining (공간데이타 마이닝을 위한 효율적인 그리드 셀 기반 공간 클러스터링 알고리즘)

  • Moon, Sang-Ho;Lee, Dong-Gyu;Seo, Young-Duck
    • The KIPS Transactions:PartD / v.10D no.4 / pp.567-576 / 2003
  • Spatial data mining, i.e., the discovery of interesting characteristics and patterns that may implicitly exist in spatial databases, is a challenging task due to the huge amounts of spatial data. Clustering algorithms are attractive for the task of class identification in spatial databases. Several methods for spatial clustering have been presented in recent years, but they have the following drawbacks: they incur increased costs from computing distances among objects, and they process only memory-resident data. In this paper, we propose an efficient grid cell based spatial clustering method for spatial data mining. It focuses on resolving the disadvantages of existing clustering algorithms. In detail, it aims to further reduce cost for good efficiency on large databases. To do this, we devise a spatial clustering algorithm based on grid cell structures including cell relationships.
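
The general idea of clustering on grid cells instead of on pairwise object distances can be sketched as follows. This is a minimal illustration and not the algorithm from the paper: the function name `grid_cluster`, the parameters `cell_size` and `min_points`, and the rule of merging adjacent dense cells are all assumptions made for the example.

```python
from collections import defaultdict, deque
from itertools import product

def grid_cluster(points, cell_size, min_points):
    """Group points by merging adjacent dense grid cells (hypothetical sketch)."""
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for p in points:
        cells[tuple(int(c // cell_size) for c in p)].append(p)

    # A cell is "dense" when it holds at least min_points points (assumed rule).
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    clusters, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        # Breadth-first walk over neighbouring dense cells; no point-to-point
        # distances are computed at any step.
        while queue:
            cell = queue.popleft()
            members.extend(cells[cell])
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(members)
    return clusters

points = [(0.1, 0.2), (0.4, 0.3), (1.2, 0.5), (1.5, 0.1),
          (5.1, 5.2), (5.3, 5.4), (9.0, 0.0)]
clusters = grid_cluster(points, cell_size=1.0, min_points=2)
```

With these sample points, the two neighbouring dense cells near the origin merge into one cluster, the cell around (5, 5) forms a second cluster, and the isolated point is left out as noise.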

A Two-level Indexing Method in Flash Memory Environment (플래시 메모리 환경을 위한 이단계 인덱싱 방법)

  • Kim, Jong-Dae;Chang, Ji-Woong;Hwang, Kyu-Jeong;Kim, Sang-Wook
    • Journal of KIISE:Computing Practices and Letters / v.14 no.7 / pp.713-717 / 2008
  • Recently, as the capacity of flash memory increases rapidly, efficient indexing methods have become crucial for fast searching of large volumes of data stored in flash memory. Flash memory has unique characteristics: the write operation is much more costly than the read operation, and in-place updating is not allowed. In this paper, we propose a novel index structure that significantly reduces the number of write operations and thus supports efficient searches, insertions, and deletions. We verify the superiority of our method by performing extensive experiments.

Video Index Generation and Search using Trie Structure (Trie 구조를 이용한 비디오 인덱스 생성 및 검색)

  • 현기호;김정엽;박상현
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.610-617 / 2003
  • Similarity matching in video databases is of growing importance in many new applications such as video clustering and digital video libraries. In order to provide efficient access to relevant data in large databases, there have been many research efforts in video indexing with diverse spatial and temporal features. However, most of the previous works relied on sequential matching methods or memory-based inverted file techniques, making them unsuitable for large-volume video databases. To resolve this problem, this paper proposes an effective and scalable indexing technique using a trie, originally proposed for string matching, as an index structure. To build an index, we convert each frame into a symbol sequence using a window order heuristic and build a disk-resident trie from a set of symbol sequences. For query processing, we perform a depth-first search on the trie and execute a temporal segmentation. To verify the superiority of our approach, we perform several experiments with real and synthetic data sets. The results reveal that our approach consistently outperforms the sequential scan method, and the performance gain is maintained even with large-volume video databases.
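
The role of a trie as an index over symbol sequences can be illustrated with a small sketch. It is not the paper's index: it is an in-memory trie rather than a disk-resident one, it does exact prefix lookup rather than similarity matching, and the names `TrieNode`, `insert`, `search_prefix`, and the sample sequences are assumptions made for the example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}      # symbol -> child TrieNode
        self.video_ids = set()  # videos whose sequence ends at this node

def insert(root, symbols, video_id):
    """Add one symbol sequence to the trie, tagged with its video id."""
    node = root
    for s in symbols:
        node = node.children.setdefault(s, TrieNode())
    node.video_ids.add(video_id)

def search_prefix(root, prefix):
    """Return the ids of all videos whose sequence starts with `prefix`."""
    node = root
    for s in prefix:
        node = node.children.get(s)
        if node is None:
            return set()
    # Depth-first traversal of the subtree below the matched prefix.
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        found |= current.video_ids
        stack.extend(current.children.values())
    return found

root = TrieNode()
insert(root, "ABCA", "v1")
insert(root, "ABD", "v2")
insert(root, "BCA", "v3")
matches = search_prefix(root, "AB")
```

Sequences that share a prefix share a path in the trie, so a lookup touches only the matching subtree instead of scanning every stored sequence.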

Iceberg Query Processing by Materialized View (저장뷰를 통한 빙산 질의 처리)

  • Hong, Seok-Jin;Lee, Seok-Ho
    • Journal of KIISE:Databases / v.27 no.4 / pp.663-670 / 2000
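  • An iceberg query is an operation that applies an aggregate function to a large volume of data and returns the data whose aggregate value is at or above a given threshold. Because iceberg queries are applied to multidimensional, large-volume data with very large domains, situations arise in which the counters needed to compute the aggregate function cannot all be loaded into memory. This paper presents a method for processing iceberg queries efficiently through materialized views on iceberg queries. When the threshold of an iceberg query is covered by the materialized view, the result can be returned immediately; even when it is not, using the materialized view instead of sampling greatly reduces the number of candidates in the intermediate stages of the iceberg query and also shortens query execution time. We also present a method for processing ranked iceberg queries, enabling users to write more intuitive queries.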
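
The definition of an iceberg query, and the case in which a stored view can answer it directly, can be sketched in a few lines. This is a simplified illustration that assumes a COUNT aggregate and data small enough to count in memory. The names `build_view` and `answer_from_view` are hypothetical, and the paper's candidate-pruning step for the uncovered case is not reproduced.

```python
from collections import Counter

def build_view(rows, view_threshold):
    """Materialize the groups whose COUNT(*) is at least view_threshold."""
    counts = Counter(rows)
    return {group: n for group, n in counts.items() if n >= view_threshold}

def answer_from_view(view, view_threshold, query_threshold):
    """Answer an iceberg query from the view when the view covers it.

    Returns None when the query threshold is below the view threshold,
    because the view may then be missing qualifying groups.
    """
    if query_threshold < view_threshold:
        return None
    return {group: n for group, n in view.items() if n >= query_threshold}

rows = ([("seoul", "tv")] * 4) + ([("busan", "tv")] * 2) + [("busan", "radio")]
view = build_view(rows, view_threshold=2)
covered = answer_from_view(view, view_threshold=2, query_threshold=3)
uncovered = answer_from_view(view, view_threshold=2, query_threshold=1)
```

A query with threshold 3 is answered by filtering the stored view alone, while a query with threshold 1 is not covered and would need another processing path.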

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases / v.37 no.5 / pp.275-284 / 2010
  • GPUs are stream processors based on multiple cores, which can process large data at high speed with large memory bandwidth. Furthermore, GPUs are less expensive than multi-core CPUs. Recently, the use of GPUs in general-purpose computing has become widespread. The CUDA architecture from Nvidia is one of the efforts to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested loop structure. In order to employ the CUDA programming model, we apply our optimization techniques to make our skyline algorithm fit within the performance restrictions of the CUDA architecture. According to our experimental results, we improve the original skyline algorithm by 80% with our optimization techniques.
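
For reference, a skyline computed with a simple nested loop looks like the following sequential sketch. It assumes that smaller values are better in every dimension, and it shows only the baseline structure, not the paper's CUDA parallelization or optimizations. The names `dominates`, `skyline`, and the sample data are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every dimension and better in one.

    Smaller values are assumed to be better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def skyline(points):
    """Nested-loop skyline: keep each point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (price, distance) pairs.
hotels = [(50, 8), (60, 2), (80, 1), (70, 5), (55, 9)]
result = skyline(hotels)
```

Each outer iteration is independent of the others, which is what makes this loop structure a natural candidate for data-parallel execution.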

Compact Field Remapping for Dynamically Allocated Structures (동적으로 할당된 구조체를 위한 압축된 필드 재배치)

  • Kim, Jeong-Eun;Han, Hwan-Soo
    • Journal of KIISE:Software and Applications / v.32 no.10 / pp.1003-1012 / 2005
  • The most significant difference between embedded systems and general-purpose systems is that embedded systems can use only limited resources, including battery and memory. In particular, the number of applications that deal with multimedia data is increasing. In such data-intensive systems, memory access delay is one of the major bottlenecks hurting system performance. As a result, many researchers have investigated various techniques to reduce the memory access cost. Most programs exhibit locality in memory references. Temporal locality means that a resource accessed at one point will be used again in the near future. Spatial locality means that the likelihood of using a resource is higher if resources near it were just accessed. The latest embedded processors usually adopt cache memory to exploit these two types of locality. Processors access cache memory faster than off-chip memory, which reduces latency. In this paper, we propose an enhanced dynamic allocation technique for structure-type data that eliminates unused memory space and reduces both the cache miss rate and the application execution time. The proposed approach aggregates fields from multiple dynamically allocated records and remaps them consecutively in the memory space. Experiments on the Olden benchmarks show a 13.9% drop in L1 cache miss rate and a 15.9% drop in L2 cache miss rate on average, compared to previously proposed techniques. We also find that execution time is reduced by 10.9% on average, compared to the previous work.

AFTL: An Efficient Adaptive Flash Translation Layer using Hot Data Identifier for NAND Flash Memory (AFTL: Hot Data 검출기를 이용한 적응형 플래시 전환 계층)

  • Yun, Hyun-Sik;Joo, Young-Do;Lee, Dong-Ho
    • Journal of KIISE:Computer Systems and Theory / v.35 no.1 / pp.18-29 / 2008
  • NAND flash memory has become an increasingly popular storage device in recent years because of its low power consumption, fast access speed, shock resistance, and light weight. However, it has distinct characteristics such as an erase-before-write architecture, asymmetric read/write/erase speeds, and a limit on the number of erasures per block. Due to these limitations, various Flash Translation Layers (FTLs) have been proposed to use NAND flash memory effectively. Systems that adopt a conventional FTL may suffer severe performance degradation from hot data, i.e., data that is frequently overwritten at the same logical address. In this paper, we propose a novel FTL algorithm called Adaptive Flash Translation Layer (AFTL), which uses a sector mapping method for hot data and a log-based block mapping method for cold data. Our system removes redundant write and erase operations by separating hot data from cold data. Moreover, read performance is enhanced by sector translation, which tends to require only a few read operations. A series of experiments was conducted to evaluate the performance of the proposed method, and they show impressive results.