• Title/Summary/Keyword: 대용량계산 (large-scale computation)

Search results: 341

An Efficient Grid Cell Based Spatial Clustering Algorithm for Spatial Data Mining (공간데이타 마이닝을 위한 효율적인 그리드 셀 기반 공간 클러스터링 알고리즘)

  • Moon, Sang-Ho; Lee, Dong-Gyu; Seo, Young-Duck
    • The KIPS Transactions: Part D / v.10D no.4 / pp.567-576 / 2003
  • Spatial data mining, i.e., the discovery of interesting characteristics and patterns that may implicitly exist in spatial databases, is a challenging task due to the huge amounts of spatial data. Clustering algorithms are attractive for the task of class identification in spatial databases. Several methods for spatial clustering have been presented in recent years, but they have the following drawbacks: they incur high costs from computing distances among objects, and they process only memory-resident data. In this paper, we propose an efficient grid cell based spatial clustering method for spatial data mining that focuses on resolving these disadvantages of existing clustering algorithms. In detail, it aims to reduce cost further to achieve good efficiency on large databases. To do this, we devise a spatial clustering algorithm based on grid cell structures that include cell relationships.
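
The general idea of grid-cell clustering, replacing pairwise distance computation with cell membership and cell adjacency, can be sketched as below. This is a minimal illustration, not the authors' algorithm: it assumes 2D points, a hypothetical cell size `cell_size`, a hypothetical density threshold `min_points`, and 8-neighbour cell adjacency, and it keeps all data in memory rather than addressing disk-resident data.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_points):
    """Cluster 2D points by connecting adjacent dense grid cells.

    cell_size and min_points are illustrative parameters (assumptions),
    not values taken from the paper.
    """
    # 1. Assign each point to a grid cell in one pass: O(n), no distance computations.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    # 2. Keep only cells holding at least min_points points.
    dense = {c for c, members in cells.items() if len(members) >= min_points}

    # 3. Join touching dense cells (8-neighbourhood) with a breadth-first search.
    labels = {}  # point index -> cluster id; points in sparse cells stay unlabeled
    visited = set()
    cluster_id = 0
    for start in dense:
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in visited:
                        visited.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2), (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),
       (10.1, 10.1), (10.2, 10.5), (10.7, 10.3), (5.5, 5.5)]
labels = grid_cluster(pts, cell_size=1.0, min_points=3)
# Points 0-5 share one label, points 6-8 share another, point 9 is unlabeled.
```

Cluster ids depend on set iteration order, so only the grouping, not the numeric label, is meaningful.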

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu; Lee, Wan-Gon; Jagvaral, Batselem; Park, Young-Tack
    • Journal of KIISE / v.43 no.1 / pp.87-95 / 2016
  • Recently, due to the development of the Internet and electronic devices, there has been an enormous increase in the amount of available knowledge and information. Along with this growth, studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or a knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contain some degree of uncertainty, and reasoning over such data can make the reasoning results vague. To address this uncertainty, we propose an RDFS reasoning approach that utilizes confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark and applies uncertainty estimation methods to compute confidence values for the data inferred through RDFS-based reasoning. The computed confidence values thus represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data set, with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system can process the largest data set, LUBM3000, in 1179 seconds, inferring 350K triples.
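
To make confidence-carrying inference concrete, here is a small single-machine sketch of two RDFS rules, rdfs11 (subClassOf transitivity) and rdfs9 (type inheritance), applied until nothing changes. The confidence of an inferred triple is assumed to be the minimum of its premises' confidences (a Gödel t-norm), keeping the highest value when a triple is derived more than once; the paper's actual uncertainty-estimation method may differ, and the plain-Python nested loops merely stand in for the joins a Spark implementation would distribute.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply rdfs11 and rdfs9 to a fixpoint, propagating confidence values.

    triples: dict mapping (s, p, o) -> confidence in [0, 1].
    Combination rule (an assumption): conf(conclusion) = min(conf(premises)).
    """
    kb = dict(triples)

    def add(triple, conf):
        # Keep the highest confidence among alternative derivations.
        if conf > kb.get(triple, -1.0):
            kb[triple] = conf
            return True
        return False

    changed = True
    while changed:
        changed = False
        sub = [(s, o, c) for (s, p, o), c in kb.items() if p == SUBCLASS]
        typ = [(s, o, c) for (s, p, o), c in kb.items() if p == TYPE]
        # rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
        for a, b, c1 in sub:
            for b2, c, c2 in sub:
                if b == b2:
                    changed |= add((a, SUBCLASS, c), min(c1, c2))
        # rdfs9: (x type A), (A subClassOf B) -> (x type B)
        for x, a, c1 in typ:
            for a2, b, c2 in sub:
                if a == a2:
                    changed |= add((x, TYPE, b), min(c1, c2))
    return kb


kb = rdfs_closure({
    ("GraduateStudent", SUBCLASS, "Student"): 0.9,
    ("Student", SUBCLASS, "Person"): 0.8,
    ("alice", TYPE, "GraduateStudent"): 0.7,
})
```

Because min never produces a value outside the input confidences and updates only increase a triple's confidence, the loop is guaranteed to terminate.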

A Study on the Effects of Intermediate Data on the Performance of the MapReduce Framework (맵리듀스 프레임워크의 중간 데이터가 성능에 미치는 영향에 관한 연구)

  • Kim, Shin-gyu; Eom, Hyeonsang; Yeom, Heon Y.
    • Proceedings of the Korea Information Processing Society Conference / 2012.04a / pp.130-133 / 2012
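  • The MapReduce framework offers ease of development, high scalability, and fault tolerance, and it is used for a wide variety of large-scale data processing tasks. Moreover, the recent explosive growth of data further increases the need to adopt MapReduce, which provides high scalability. In such cases, the workload can exceed the computing capacity of a single cluster, so computing resources are borrowed from cloud computing services and similar sources. However, because current MapReduce frameworks were designed assuming a single-cluster environment, running them in an environment made up of multiple clusters lowers the utilization of the overall computing resources, and overall performance can end up low relative to the resources invested. In this study, we show that the cause lies in the transfer of intermediate results between the map and reduce phases, and we analyze its effect on the performance of the MapReduce framework as a whole.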
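
The cost of moving intermediate results can be illustrated with a toy word-count job. The sketch below is not from the paper: it simulates map tasks placed on two hypothetical clusters, hash-partitions the intermediate (word, 1) pairs to reducers that are also assigned to clusters, and counts how many intermediate records must cross a cluster boundary during the shuffle. `zlib.crc32` is used only to obtain a deterministic partitioner.

```python
import zlib
from collections import Counter, defaultdict


def partition(key, num_reducers):
    # Deterministic hash partitioner (Python's built-in hash() of str is salted per process).
    return zlib.crc32(key.encode()) % num_reducers


def run_job(splits, reducer_cluster):
    """splits: list of (cluster_name, text) map inputs.
    reducer_cluster: cluster hosting each reducer, indexed by reducer id.
    Returns (word counts, number of intermediate records sent across clusters).
    """
    num_reducers = len(reducer_cluster)
    shuffled = defaultdict(list)  # reducer id -> intermediate (word, 1) pairs
    cross_cluster = 0
    for cluster, text in splits:
        # Map phase: emit (word, 1) for every word in the split.
        for word in text.split():
            r = partition(word, num_reducers)
            shuffled[r].append((word, 1))
            # Shuffle: count records whose reducer lives in another cluster.
            if reducer_cluster[r] != cluster:
                cross_cluster += 1
    # Reduce phase: sum the values for each key.
    counts = Counter()
    for pairs in shuffled.values():
        for word, one in pairs:
            counts[word] += one
    return counts, cross_cluster


splits = [("A", "big data big compute"), ("B", "big data cloud")]
counts, crossed = run_job(splits, reducer_cluster=["A", "B"])
```

Adding a combiner, which pre-aggregates (word, 1) pairs inside each map task before the shuffle, is the standard way to shrink this cross-cluster traffic.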

Novel Kernel Design for Implementing Volume Rendering in the PyCUDA Framework (PyCUDA 프레임워크에서 볼륨 렌더링을 구현하기 위한 새로운 커널 디자인)

  • Lee, SooHo; Kim, Jong-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2022.01a / pp.349-351 / 2022
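  • This paper introduces a Python-based CUDA (Compute Unified Device Architecture) kernel design for implementing volume rendering, which is computationally expensive. Python has long since shed its image as a language used only for interfaces, as it is now applied not only in artificial intelligence but also in many fields such as servers, security, GUIs, data visualization, and big-data processing. In this paper, we design a kernel in a Python environment using NVIDIA's CUDA, a massively parallel processing technique, and show that computationally expensive volume rendering can be computed quickly. Through volume rendering, we show that GPU (Graphics Processing Unit)-based applications can be developed not only with C-based CUDA but also in Python, where development is relatively efficient.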

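
Volume rendering parallelizes naturally because each output pixel's ray can be composited independently, which is why a one-thread-per-pixel CUDA kernel fits it well. The pure-Python sketch below shows the per-ray work such a kernel would perform: orthographic rays along the z axis, a hypothetical linear transfer function mapping density to (intensity, opacity), front-to-back alpha compositing, and early ray termination. It does not use PyCUDA and is not the authors' kernel; in a PyCUDA version, the body of `render_pixel` would become the kernel body indexed by each thread's (x, y).

```python
def transfer(density, opacity_scale=0.5):
    # Hypothetical transfer function: intensity = density, opacity grows with density.
    return density, min(1.0, density * opacity_scale)


def render_pixel(volume, x, y, stop_alpha=0.99):
    """Composite one orthographic ray through volume[z][y][x], front to back."""
    color, alpha = 0.0, 0.0
    for z in range(len(volume)):
        c, a = transfer(volume[z][y][x])
        # Front-to-back "over" compositing.
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
        if alpha >= stop_alpha:  # early ray termination
            break
    return color


def render(volume):
    height, width = len(volume[0]), len(volume[0][0])
    # Every (x, y) is independent: this double loop is what a CUDA grid would replace.
    return [[render_pixel(volume, x, y) for x in range(width)] for y in range(height)]


image = render([[[0.0, 1.0]], [[0.0, 1.0]]])  # two z-slices of a 1x2 image
```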

Performance Improvement of Web Information Retrieval Using Sentence-Query Similarity (문장-질의 유사성을 이용한 웹 정보 검색의 성능 향상)

  • Park Eui-Kyu; Ra Dong-Yul; Jang Myung-Gil
    • Journal of KIISE: Software and Applications / v.32 no.5 / pp.406-415 / 2005
  • The growth of the Internet has led to a web containing a huge number of documents. Increasing importance is therefore given to web information retrieval technology that can provide users with documents containing exactly the information they want. This paper proposes several techniques that are effective for improving web information retrieval. Similarity between a document and the query is a major source of information exploited by conventional systems; we additionally propose a technique that makes use of the similarity between a sentence and the query. We introduce a technique to compute an approximate sentence-query similarity score even without mature natural language processing technology. The amount of computation for this task was shown to be linear in the number of documents in the whole collection, which implies that practical systems can use this technique. The next important technique proposed in this paper uses stratification of documents when re-ranking the documents to be output, and it was shown to yield a significant improvement in performance. We further showed that using hyperlinks, anchor texts, and titles enhances performance. To validate the proposed techniques, we developed a large-scale web information retrieval system and used it for experiments.
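
One simple way to approximate sentence-query similarity without deep NLP is to score each sentence by the fraction of distinct query terms it contains and let a document's best sentence contribute to its overall score. The sketch below assumes naive sentence splitting on `.`, `!`, `?`, word tokenization with `\w+`, and a hypothetical mixing weight `alpha`; it does not reproduce the paper's actual scoring function or its stratification scheme. Each document is scanned once, so the cost grows linearly with the size of the collection.

```python
import re


def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))


def sentence_query_score(document, query_terms):
    """Best fraction of distinct query terms found within a single sentence."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return 0.0
    return max(len(terms & tokenize(s)) / len(terms)
               for s in re.split(r"[.!?]+", document))


def document_query_score(document, query_terms):
    """Fraction of distinct query terms found anywhere in the document."""
    terms = {t.lower() for t in query_terms}
    return len(terms & tokenize(document)) / len(terms) if terms else 0.0


def rank(docs, query_terms, alpha=0.5):
    """Order document indices by a hypothetical linear mix of both scores."""
    scores = [alpha * document_query_score(d, query_terms)
              + (1 - alpha) * sentence_query_score(d, query_terms) for d in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


docs = ["Grid files index data. Clustering groups points.",
        "Grid clustering avoids distance computation.",
        "Spark runs in memory."]
print(rank(docs, ["grid", "clustering"]))  # [1, 0, 2]
```

Documents 0 and 1 both contain every query term, but only document 1 contains them in the same sentence, which is the signal the sentence-level score adds.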

Dependency relation analysis and mutual information technique for ASR rescoring (음성인식 리스코링을 위한 의존관계분석과 상호정보량 접근방법의 비교)

  • Chung, Euisok; Jeon, Hyung-Bae; Park, Jeon-Gue
    • Annual Conference on Human and Language Technology / 2014.10a / pp.164-166 / 2014
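  • Speech recognition can produce multiple candidate results. Each candidate carries combined information formed by joining an acoustic model score and a language model score. Recomputing the language model score to improve performance is one of the common approaches to improving speech recognition, and n-gram-based rescoring has been used for this purpose. To achieve an adequate performance improvement while taking into account the problems of using large n-gram models, this paper examines an approach based on dependency analysis of the words that make up a sentence and an approach that uses the mutual information of word pairs within a fixed distance.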

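
The mutual-information approach can be sketched as N-best rescoring: each hypothesis keeps its acoustic score and gains a bonus equal to the summed pointwise mutual information (PMI) of its word pairs that lie within a fixed distance. Everything here is an illustrative assumption: the toy corpus, the window `max_dist`, the weight `lam`, the treatment of unseen pairs as zero, and the PMI formulation itself. The dependency-analysis alternative is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def window_pairs(words, max_dist):
    """Yield unordered word pairs at most max_dist positions apart."""
    for i, j in combinations(range(len(words)), 2):
        if j - i <= max_dist:
            yield frozenset((words[i], words[j]))


def pair_counts(sentences, max_dist):
    unigrams, pairs = Counter(), Counter()
    for s in sentences:
        words = s.split()
        unigrams.update(words)
        pairs.update(window_pairs(words, max_dist))
    return unigrams, pairs


def pmi(pair, unigrams, pairs):
    """PMI of a word pair; 0 for unseen or same-word pairs (an assumption)."""
    joint = pairs[pair]
    if joint == 0 or len(pair) < 2:
        return 0.0
    w1, w2 = tuple(pair)
    n_uni, n_pair = sum(unigrams.values()), sum(pairs.values())
    return math.log((joint / n_pair) / ((unigrams[w1] / n_uni) * (unigrams[w2] / n_uni)))


def rescore(nbest, unigrams, pairs, max_dist=2, lam=1.0):
    """nbest: list of (hypothesis, acoustic_score); returns hypotheses best first."""
    def total(item):
        hyp, acoustic = item
        bonus = sum(pmi(p, unigrams, pairs) for p in window_pairs(hyp.split(), max_dist))
        return acoustic + lam * bonus
    return [hyp for hyp, _ in sorted(nbest, key=total, reverse=True)]


corpus = ["recognize speech with a model", "recognize speech quickly", "wreck a nice beach"]
unigrams, pairs = pair_counts(corpus, max_dist=2)
nbest = [("wreck speech", -1.0), ("recognize speech", -1.1)]
print(rescore(nbest, unigrams, pairs))  # ['recognize speech', 'wreck speech']
```

Summing PMI favours hypotheses with more word pairs, so a practical system would normalize the bonus or tune `lam`; the example uses hypotheses of equal length to sidestep this.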

Design of Memory-Efficient Octree to Query Large 3D Point Cloud (대용량 3차원 포인트 클라우드의 탐색을 위한 메모리 효율적인 옥트리의 설계)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.1 / pp.41-48 / 2013
  • The aim of the present study is to design a memory-efficient octree for querying a large 3D point cloud. This aim was achieved by omitting the variables for the minimum bounding hexahedron (MBH) of each octree node, expressed in the C++ language, and by passing the re-estimated MBH from parent nodes to child nodes. Further efficiency was obtained through a two-step process that first generates a pseudo tree and then a regular tree, so that a single array can be declared for all anticipated nodes instead of using the new operator to allocate each child node. Experiments were conducted by constructing tree structures and querying neighbor points in a real point cloud composed of more than 18 million points. Compared with conventional methods that store MBH information in each node, the suggested methods proved, despite a trade-off between speed and memory efficiency, to be more memory-efficient than the compared methods and to be practical alternatives for large 3D point clouds.
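
The central idea of not storing a bounding box in each node and instead deriving each child's box from its parent's during traversal can be sketched in Python. The sketch assumes root bounds supplied by the caller, a hypothetical leaf capacity `max_leaf`, and a radius (neighbour) query, and it allocates the eight children of a node together in one flat list. Python objects do not reproduce the C++ memory savings or the paper's pseudo-tree/regular-tree pre-sizing step; only the node layout and the bounds-passing logic are illustrated.

```python
import random


class Node:
    # No bounding-box fields: bounds are recomputed from the parent on the fly.
    __slots__ = ("first_child", "points")

    def __init__(self):
        self.first_child = -1  # index of the first of 8 contiguous children; -1 for a leaf
        self.points = []       # point indices (leaves only)


def child_bounds(lo, hi, mid, k):
    # Bit a of k selects the upper (1) or lower (0) half along axis a.
    new_lo = tuple(mid[a] if (k >> a) & 1 else lo[a] for a in range(3))
    new_hi = tuple(hi[a] if (k >> a) & 1 else mid[a] for a in range(3))
    return new_lo, new_hi


def build(points, bounds, max_leaf=8, max_depth=12):
    nodes = [Node()]
    stack = [(0, list(range(len(points))), bounds, 0)]
    while stack:
        idx, members, (lo, hi), depth = stack.pop()
        if len(members) <= max_leaf or depth == max_depth:
            nodes[idx].points = members
            continue
        mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
        first = len(nodes)
        nodes[idx].first_child = first
        nodes.extend(Node() for _ in range(8))  # the 8 children are allocated together
        buckets = [[] for _ in range(8)]
        for i in members:
            k = sum(1 << a for a in range(3) if points[i][a] >= mid[a])
            buckets[k].append(i)
        for k in range(8):
            stack.append((first + k, buckets[k], child_bounds(lo, hi, mid, k), depth + 1))
    return nodes


def radius_query(nodes, points, bounds, center, r):
    found, stack = [], [(0, bounds)]
    while stack:
        idx, (lo, hi) = stack.pop()
        # Squared distance from the query centre to this node's derived box.
        gap = sum(max(l - c, 0.0, c - h) ** 2 for c, l, h in zip(center, lo, hi))
        if gap > r * r:
            continue
        node = nodes[idx]
        if node.first_child < 0:
            found += [i for i in node.points
                      if sum((p - c) ** 2 for p, c in zip(points[i], center)) <= r * r]
        else:
            mid = tuple((l + h) / 2 for l, h in zip(lo, hi))
            for k in range(8):
                stack.append((node.first_child + k, child_bounds(lo, hi, mid, k)))
    return sorted(found)


random.seed(0)
pts = [(random.random(), random.random(), random.random()) for _ in range(2000)]
root = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
tree = build(pts, root)
near = radius_query(tree, pts, root, (0.5, 0.5, 0.5), 0.15)
```

The trade-off the abstract mentions shows up here as the extra midpoint and `child_bounds` arithmetic performed on every visited node in exchange for nodes that carry no box.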

Structural Change Detection Technique for RDF Data in MapReduce (맵리듀스에서의 구조적 RDF 데이터 변경 탐지 기법)

  • Lee, Taewhi; Im, Dong-Hyuk
    • KIPS Transactions on Software and Data Engineering / v.3 no.8 / pp.293-298 / 2014
  • Detecting and understanding the changes between RDF data sets is crucial for evolution processes, synchronization systems, and versioning systems on the web of data. However, current research on change detection remains unsatisfactory in that it neither considers the large scale of RDF data nor accurately produces RDF deltas. In this paper, we propose a scalable and effective change detection method using the MapReduce framework, which has been used in many fields to process and analyze large volumes of data. In particular, we focus on structure-based change detection, which adopts a strategy for comparing blank nodes in RDF data. To achieve this, we employ a method composed of two MapReduce jobs. The first job partitions the triples with blank nodes by grouping the triples that share the same blank node ID and then computes the incoming path to each blank node. The second job partitions the triples with the same path and matches blank nodes using the Hungarian method. In experiments, we show that our approach is more accurate and effective than the previous approach.
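
The matching step of the second job can be illustrated on a single partition: given blank nodes from the old and new graphs that share the same incoming path, build a cost matrix and solve the assignment problem with the Hungarian method. The sketch below uses a compact O(n³) Hungarian implementation; the cost function (one minus the Jaccard similarity of each blank node's outgoing (predicate, object) pairs), the `ex:` data, and `match_blank_nodes` are assumptions for illustration, and the MapReduce plumbing is omitted.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m matrix with n <= m.
    Returns assignment[i] = column matched to row i."""
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j (1-based, 0 = free)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [INF] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def match_blank_nodes(old, new):
    """old/new: dict blank-node id -> set of (predicate, object) pairs.
    Assumes len(old) <= len(new). Returns {old_id: new_id}."""
    old_ids, new_ids = list(old), list(new)

    def cost(a, b):
        union = a | b
        return 1.0 - (len(a & b) / len(union) if union else 1.0)

    matrix = [[cost(old[o], new[nid]) for nid in new_ids] for o in old_ids]
    return {old_ids[i]: new_ids[j] for i, j in enumerate(hungarian(matrix))}


old = {"_:a": {("ex:name", "Alice"), ("ex:age", "30")},
       "_:b": {("ex:name", "Bob"), ("ex:age", "25")}}
new = {"_:x": {("ex:name", "Bob"), ("ex:age", "26")},
       "_:y": {("ex:name", "Alice"), ("ex:age", "30")}}
print(match_blank_nodes(old, new))  # {'_:a': '_:y', '_:b': '_:x'}
```

Matching by minimum total cost, rather than greedily, keeps a near-identical pair from "stealing" a node that another blank node needs more.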

Design and Implementation of a Mobile Runtime Library for Execution of Large-scale Application (대용량 소프트웨어 실행을 위한 모바일 런타임 라이브러리 설계 및 구현)

  • Lee, Ye-In; Lee, Jong-Woo
    • Journal of Korea Multimedia Society / v.13 no.1 / pp.1-9 / 2010
  • The growth of today's mobile communication infrastructure has made mobile computing systems such as cellular phones rival or surpass desktop PCs in popularity, owing to their mobility. Although the performance of mobile devices is improving continuously, it is commonly held that compute-intensive large-scale applications can hardly run on any kind of mobile handset. To address this problem, we decided to exploit mobile cluster computing systems and first surveyed the existing ones. We found, however, that most of them are not actual implementations but mobile cluster infrastructure proposals or ideas for reliable mobile clustering. To let cell phones participate as cluster computing nodes, in this paper we propose a redesigned JPVM cluster computing engine and a set of WIPI mobile runtime functions that interface with it. We also present performance evaluation results of real parallel applications running on our Mobile-JPVM cluster computing system. The evaluation shows that large-scale applications can run sufficiently well on mobile devices such as cellular phones when using our mobile cluster computing engine.
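
The execution model the abstract relies on, a master that splits a compute-intensive job into tasks for cluster nodes and combines their partial results, can be sketched in the PVM master/worker style. This is a generic illustration, not JPVM or WIPI code: Python threads merely simulate the phone nodes (they give no real speedup here), and the workload, midpoint-rule integration of 4/(1+x²) over [0, 1], whose exact value is π, is a hypothetical stand-in for a real parallel application.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def worker(task):
    """Compute one slice of the midpoint-rule sum for the integral of 4/(1+x^2)."""
    start, stop, n = task
    h = 1.0 / n
    return sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2) for i in range(start, stop)) * h


def master(n_intervals, n_nodes):
    # Split the index range into one contiguous chunk per (simulated) node.
    chunk = math.ceil(n_intervals / n_nodes)
    tasks = [(s, min(s + chunk, n_intervals), n_intervals)
             for s in range(0, n_intervals, chunk)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(worker, tasks))  # "send" tasks, "receive" partial results
    return sum(partials)


estimate = master(n_intervals=100_000, n_nodes=4)
```

Because each task carries only three integers and returns one float, communication stays small relative to computation, which is the kind of workload that suits slow, wirelessly connected nodes.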

A Study of designing Parallel File System for Massive Information Processing (대규모 정보처리를 위한 병렬 화일시스템 설계에 관한 연구)

  • Jang, Si-Ung; Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society / v.4 no.5 / pp.1221-1230 / 1997
  • In this study, the performance of a parallel file system (N-PFS), implemented on a workstation cluster using conventional disks as disk arrays, is analyzed using an analytical method and values measured in experiments. N-PFS can be used as a high-performance file server in small-scale server systems and can efficiently process massive data I/O such as multimedia and scientific data. In this paper, an analytical model is proposed, and its correctness is verified by analyzing experimental values measured on a real system. For processing massive data on the workstation cluster with 8 disks, the appropriate striping unit is 64-128 Kbytes, and the maximum throughput is 15.8 Mbytes/s. In addition, the performance of the parallel file system on massive data is bounded by the time required to copy data between buffers.

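
The striping unit the study tunes determines how a logical file offset is spread across the disks. Assuming simple round-robin (RAID-0-style) striping, which the abstract does not spell out, the offset mapping and the split of one large request into per-disk pieces can be sketched as follows; the 8-disk, 64-Kbyte values mirror the configuration reported in the abstract, and the function names are hypothetical.

```python
def locate(offset, stripe_unit, num_disks):
    """Map a logical byte offset to (disk, byte offset on that disk)."""
    stripe_no, within = divmod(offset, stripe_unit)
    disk = stripe_no % num_disks
    row = stripe_no // num_disks  # number of full rounds across all disks before this unit
    return disk, row * stripe_unit + within


def split_request(offset, length, stripe_unit, num_disks):
    """Break one logical read into contiguous per-disk pieces (disk, disk_offset, size)."""
    pieces = []
    end = offset + length
    while offset < end:
        unit_end = (offset // stripe_unit + 1) * stripe_unit  # end of the current stripe unit
        size = min(end, unit_end) - offset
        disk, disk_offset = locate(offset, stripe_unit, num_disks)
        pieces.append((disk, disk_offset, size))
        offset += size
    return pieces


KB = 1024
pieces = split_request(offset=100 * KB, length=256 * KB, stripe_unit=64 * KB, num_disks=8)
```

A larger striping unit makes a request touch fewer disks, trading parallelism for lower per-request overhead; the study's measurements place the sweet spot at 64-128 Kbytes for its 8-disk cluster.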