• Title/Summary/Keyword: 데이타 병렬 (data parallelism)

Search results: 116 (processing time: 0.031 seconds)

Accelerated Learning of Latent Topic Models by Incremental EM Algorithm (점진적 EM 알고리즘에 의한 잠재토픽모델의 학습 속도 향상)

  • Chang, Jeong-Ho;Lee, Jong-Woo;Eom, Jae-Hong
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1045-1055
    • /
    • 2007
  • Latent topic models are statistical models that automatically capture salient patterns or correlations among features underlying a data collection in a probabilistic way. They are gaining popularity as an effective tool for automatic semantic feature extraction from text corpora, multimedia data analysis including image data, and bioinformatics. One of the key issues in applying latent topic models to massive data sets is efficient learning of the model. This paper proposes an accelerated learning technique for the PLSA model, one of the popular latent topic models, using an incremental EM algorithm instead of the conventional EM algorithm. The incremental EM algorithm is characterized by a series of partial E-steps, each performed on a subset of the entire data collection, unlike the conventional EM algorithm, in which a single batch E-step is carried out over the whole data set. By replacing the single batch E-step and M-step with a series of partial E-steps and M-steps, the inference result for the previous data subset is directly reflected in the next inference step, which speeds up learning over the entire data set. The algorithm is also advantageous in that it is guaranteed to converge to a local maximum and can be implemented with only a slight modification of an existing conventional EM implementation. We present the basic application of the incremental EM algorithm to learning PLSA and empirically evaluate the acceleration with several possible data partitioning methods for practical use. Experimental results on a real-world news data set show that the proposed approach achieves a meaningful improvement in the convergence rate when learning a latent topic model. We additionally present a result that suggests a synergistic effect of combining the incremental EM algorithm with parallel computing.
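
A minimal sketch, under assumptions, of the incremental EM scheme described in the abstract above: the E-step runs on one block of documents at a time, and the M-step immediately re-estimates the parameters from all accumulated expected counts, so each block's inference is reflected before the next block is processed. This is an illustration of the general idea, not the authors' implementation; the dense matrices, block count, and all function and variable names are invented.

```python
import numpy as np

def incremental_em_plsa(counts, n_topics, n_blocks=4, n_sweeps=20, seed=0):
    """counts: (n_docs, n_words) term-frequency matrix."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Model parameters P(w|z) and P(z|d), randomly initialized and normalized.
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    # Per-document expected counts, kept so a partial E-step on one block can
    # replace only that block's contribution to the global statistics.
    stats = np.zeros((n_docs, n_topics, n_words))
    blocks = np.array_split(np.arange(n_docs), n_blocks)
    for _ in range(n_sweeps):
        for block in blocks:
            # Partial E-step: posteriors P(z|d,w) for this block of documents only.
            post = p_z_d[block][:, :, None] * p_w_z[None, :, :]        # (B, Z, W)
            post /= post.sum(axis=1, keepdims=True) + 1e-12
            stats[block] = counts[block][:, None, :] * post            # expected counts
            # M-step: re-estimate parameters from all accumulated statistics,
            # so the freshly processed block is reflected before the next one.
            p_z_d[block] = stats[block].sum(axis=2)
            p_z_d[block] /= p_z_d[block].sum(axis=1, keepdims=True) + 1e-12
            topic_word = stats.sum(axis=0)                             # (Z, W)
            p_w_z = topic_word / (topic_word.sum(axis=1, keepdims=True) + 1e-12)
    return p_w_z, p_z_d

# Tiny synthetic example: 40 documents over a 30-word vocabulary, 3 topics.
docs = np.random.default_rng(1).integers(0, 5, size=(40, 30))
p_w_z, p_z_d = incremental_em_plsa(docs, n_topics=3)
print(p_w_z.shape, p_z_d.shape)
```

Batch EM would compute `post` for every document before any parameter update; the only structural change here is the per-block loop around the E- and M-steps.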

Task Creation and Allocation for Static Load Balancing in Parallel Spatial Join (병렬 공간 조인 시 정적 부하 균등화를 위한 작업 생성 및 할당 방법)

  • Park, Yun-Phil;Yeom, Keun-Hyuk
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.418-429
    • /
    • 2001
  • Recently, GIS technology has been applied to important computer applications such as urban information systems and transportation information systems. These applications require spatial operations for the efficient management of a large volume of data. In particular, the response time of a spatial join, one of the basic operations, increases exponentially with the number of spatial objects involved, so it is poorly suited to systems demanding fast response times. To satisfy these requirements, efficient parallel processing of spatial joins is needed. In this paper, we present an efficient method for creating and allocating tasks to statically balance the load on each processor during a parallel spatial join. A task graph is constructed in which each vertex weight is calculated by the proposed cost model, and the graph is then partitioned by a graph-partitioning algorithm. Experiments on the CC16 parallel machine show that our method improves static load balance by decreasing the variance of task execution time across processors.

  • PDF
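
The abstract above builds a task graph whose vertex weights come from the proposed cost model and partitions it with a graph-partitioning algorithm. As a much simpler stand-in for that partitioning step, the sketch below spreads cost-model task weights over processors with a greedy longest-processing-time assignment; the task names and weights are hypothetical.

```python
import heapq

def assign_tasks(task_weights, n_processors):
    """task_weights: list of (task_id, estimated_cost). Returns processor -> task ids."""
    assignment = {p: [] for p in range(n_processors)}
    load = [(0.0, p) for p in range(n_processors)]      # min-heap keyed on current load
    heapq.heapify(load)
    # Longest-processing-time first: place heavy tasks while loads are still even.
    for task_id, cost in sorted(task_weights, key=lambda t: -t[1]):
        total, p = heapq.heappop(load)                  # least-loaded processor
        assignment[p].append(task_id)
        heapq.heappush(load, (total + cost, p))
    return assignment

# Example: six join tasks whose weights would come from the cost model.
print(assign_tasks([("t0", 9.0), ("t1", 7.5), ("t2", 4.0),
                    ("t3", 3.5), ("t4", 2.0), ("t5", 1.0)], n_processors=2))
```

Graph partitioning additionally tries to keep tasks that share data on the same processor, which this greedy assignment ignores.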

PDOCM : Fast Text Compression on MasPar Machine (PDOCM : MasPar머쉰상의 새로운 압축기법과 빠른 텍스트 축약)

  • Min, Yong-Sik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.1
    • /
    • pp.40-47
    • /
    • 1995
  • Due to rapid progress in data communications, we can acquire the information we need with ease. One means of achieving this is a parallel machine such as the MasPar. Although such a machine makes it possible to receive and transmit enormous quantities of data, the growing volume of information that must be processed makes it necessary to transmit as few data bits as possible. This paper suggests a new coding method for the parallel machine, which compresses data by reducing redundancy. Parallel Dynamic Octal Compact Mapping (PDOCM) saves at least one byte per word compared with other coding techniques and achieves a 54.188-fold speedup with 64 processors when transmitting 10 million characters.

  • PDF

A Dynamic Signature Declustering Method using Signature Difference (요약 차이를 이용한 요약화일 동적 분산 기법)

  • Kang, Hyung-Il;Kang, Seung-Heon;Yoo, Jae-Soo;Im, Byoung-Mo
    • Journal of KIISE:Databases
    • /
    • v.27 no.1
    • /
    • pp.79-89
    • /
    • 2000
  • For processing a signature file in parallel, an effective signature file declustering method is needed. The Linear Code Decomposition Method (LCDM), used for the Hamming Filter, may give good performance in some cases, but due to its static nature it fails to decluster the signature file evenly when signatures are skewed. In addition, it has other problems such as limited scalability and non-determinism. In this paper we propose a new signature file declustering method, called the Inner-product method, which overcomes these problems of the LCDM. The Inner-product method declusters the signature file dynamically based on the signature difference, which is computed using the signature inner product. We show through simulation experiments that the Inner-product method outperforms the LCDM under various data workloads.

  • PDF
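
The abstract does not spell out the placement rule of the Inner-product method, so the sketch below is only one plausible reading of "declustering by signature difference": each incoming bit-vector signature goes to the node whose accumulated signatures are least similar to it (smallest normalized inner product), so that similar signatures, which a query tends to touch together, land on different nodes and can be scanned in parallel. All names and the tie-breaking rule are assumptions.

```python
import numpy as np

def decluster(signatures, n_nodes):
    """signatures: (n, b) array of 0/1 bit vectors. Returns a list of node ids."""
    counts = np.zeros((n_nodes, signatures.shape[1]))   # accumulated bit counts per node
    sizes = np.zeros(n_nodes)
    placement = []
    for sig in signatures:
        # Inner product of the new signature with each node's accumulated bits,
        # normalized by node size; ties (and empty nodes) favour the smallest node.
        sim = counts @ sig / np.maximum(sizes, 1)
        node = int(np.lexsort((sizes, sim))[0])          # least-similar node
        counts[node] += sig
        sizes[node] += 1
        placement.append(node)
    return placement

# Example: 12 random 16-bit signatures spread over 4 nodes.
rng = np.random.default_rng(1)
sigs = (rng.random((12, 16)) < 0.3).astype(int)
print(decluster(sigs, n_nodes=4))
```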

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기)

  • 김주익;박홍준;조영일
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.10
    • /
    • pp.569-578
    • /
    • 2003
  • Data dependencies are one of the major hurdles that limit ILP (Instruction Level Parallelism), and several related works have suggested that the limit imposed by data dependencies can be overcome to some extent with data value prediction. A hybrid value predictor can obtain high prediction accuracy by exploiting the advantages of several predictors, but it has the defect that the same instruction occupies overlapping entries in every predictor. In this paper, we propose a new hybrid value predictor that achieves high performance by using both static and dynamic classification information. The proposed predictor can improve prediction accuracy and efficiently decrease the prediction table size, because it allocates each instruction to the single best-suited predictor during the fetch stage using static classification information. It further improves prediction accuracy by selecting the best-suited prediction method for instructions with an "Unknown" pattern through a dynamic classification mechanism. Simulation results based on the SimpleScalar/PISA tool set and the SPECint95 benchmarks show an average correct prediction rate of 85.1% with the static classification mechanism alone, and 87.6% when static and dynamic classification are combined.
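
A rough sketch of the static-plus-dynamic classification idea from the abstract above, with only last-value and stride component predictors modeled: statically classified instructions are steered to one table, and instructions the static pass marks "Unknown" pick between the two components at run time with a small saturating counter. The class labels, table layout, and thresholds are assumptions, not the paper's design.

```python
class HybridValuePredictor:
    """Toy hybrid predictor with last-value and stride components."""

    def __init__(self):
        self.last = {}       # pc -> last produced value (last-value predictor)
        self.stride = {}     # pc -> (last value, last stride) (stride predictor)
        self.choice = {}     # pc -> 2-bit counter, used only for "unknown" instructions

    def predict(self, pc, static_class):
        use_last = static_class == "last" or (
            static_class == "unknown" and self.choice.get(pc, 1) < 2)
        if use_last:
            return self.last.get(pc, 0)
        value, stride = self.stride.get(pc, (0, 0))
        return value + stride

    def update(self, pc, static_class, actual):
        last_ok = self.last.get(pc, 0) == actual
        value, stride = self.stride.get(pc, (0, 0))
        stride_ok = value + stride == actual
        if static_class == "unknown":
            # Dynamic classification: nudge the counter toward whichever
            # component predictor would have been correct this time.
            c = self.choice.get(pc, 1)
            if stride_ok and not last_ok:
                c = min(3, c + 1)
            elif last_ok and not stride_ok:
                c = max(0, c - 1)
            self.choice[pc] = c
        self.stride[pc] = (actual, actual - value)
        self.last[pc] = actual

# Example: a loop counter (stride pattern) classified as "unknown" at fetch time.
vp = HybridValuePredictor()
hits = 0
for v in range(0, 40, 4):
    hits += vp.predict(0x400100, "unknown") == v
    vp.update(0x400100, "unknown", v)
print(hits, "of 10 predicted correctly")
```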

A Parallel Algorithm for Merging Heaps on MasPar Machine (MasPar 머쉰상의 병렬 힙 병합 알고리즘)

  • Min, Yong-Sik
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.4
    • /
    • pp.554-560
    • /
    • 1995
  • In this paper, we suggest a parallel algorithm to merge priority queues organized as two heaps, kheap and nheap, of sizes k and n, respectively. Employing $\max(2^{i-1}, \lceil (m+1)/4 \rceil)$ processors, the algorithm requires $O(\log(n/k) \cdot \log n)$ time on an EREW-PRAM, where $i$ is the height of the heap and $m$ is the sum of the sizes $n$ and $k$. When run on the MasPar machine, this method achieves a 33.934-fold speedup with 64 processors when merging 8 million data items organized as two heaps of different sizes, so our parallel algorithm's EPU is close to 1, which is considered an optimal speedup ratio.

  • PDF
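
For context only, and not the paper's EREW-PRAM algorithm: the sketch below is the plain sequential meld of two binary heaps (pushing the smaller heap's items into the larger one), the kind of O(k log(n+k)) baseline a parallel merge is compared against.

```python
import heapq

def merge_heaps(kheap, nheap):
    """Sequential baseline: push the items of the smaller heap into the larger."""
    small, large = (kheap, nheap) if len(kheap) <= len(nheap) else (nheap, kheap)
    merged = list(large)
    heapq.heapify(merged)          # make sure the copy is heap-ordered
    for x in small:
        heapq.heappush(merged, x)  # O(log(n + k)) per insertion
    return merged

merged = merge_heaps([5, 9, 7], [1, 3, 2, 8, 4, 6])
print(heapq.nsmallest(3, merged))  # [1, 2, 3]
```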

A Study on the Performance Analysis of an Extended Scan Path Architecture (확장된 스캔 경로 구조의 성능 평가에 관한 연구)

  • 손우정
    • Journal of the Korea Society of Computer and Information
    • /
    • v.3 no.2
    • /
    • pp.105-112
    • /
    • 1998
  • In this paper, we propose an ESP (Extended Scan Path) architecture for multi-board testing. The conventional architectures for board testing are the single scan path and the multi-scan path. In the single scan path architecture, the scan path for test data is a single chain, so if the scan path is broken by a short or open fault, the test data are invalid. In the multi-scan path architecture, additional signals are required for multi-board testing. Consequently, the conventional architectures are not well suited to multi-board testing. In the ESP architecture, even if one scan path is shorted or open, the remaining scan paths are not affected. Running parallel BIST and IEEE 1149.1 boundary-scan tests with the proposed ESP architecture, we observed that the test time is shorter than with the single scan path architecture. Comparing the ESP architecture with the single scan path in terms of scan path independence and test time, and with the multi-scan path in terms of signals and synchronization, we show that the proposed architecture yields improved results.

  • PDF

Design and implementation of a Shared-Concurrent File System in distributed UNIX environment (분산 UNIX 환경에서 Shared-Concurrent File System의 설계 및 구현)

  • Jang, Si-Ung;Jeong, Gi-Dong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.3
    • /
    • pp.617-630
    • /
    • 1996
  • In this paper, a shared-concurrent file system (S-CFS) is designed and implemented using conventional disks as disk arrays on a workstation cluster that can be used as a small-scale server. Since it is implemented on UNIX operating systems, S-CFS is not only portable and flexible but also efficient in resource usage, because it does not require additional I/O nodes. The results show that, on small-scale systems with enough disks, the performance of the concurrent file system on transaction processing applications is bounded by the CPUs' computing power, while its performance on massive data I/O is bounded by the time required to copy data between buffers. The concurrent file system, implemented on a workstation cluster with 8 disks, shows a throughput of 388 tps for transaction processing applications and provides a bandwidth of 15.8 Mbytes/sec for massive data processing applications. Moreover, the concurrent file system is designed to enhance the throughput of applications requiring high-performance I/O by letting users control the parallelism of the file system.

  • PDF

Design and Performance Analysis of a Parallel Cell-Based Filtering Scheme using Horizontally-Partitioned Technique (수평 분할 방식을 이용한 병렬 셀-기반 필터링 기법의 설계 및 성능 평가)

  • Chang, Jae-Woo;Kim, Young-Chang
    • The KIPS Transactions:PartD
    • /
    • v.10D no.3
    • /
    • pp.459-470
    • /
    • 2003
  • Research on high-dimensional index structures is required for efficiently retrieving high-dimensional data, because attribute vectors in data warehousing and feature vectors in multimedia databases are high-dimensional. Many high-dimensional index structures have been proposed, but they suffer from the so-called 'curse of dimensionality': retrieval performance degrades drastically as the dimensionality increases. To alleviate this problem, the cell-based filtering (CBF) scheme was proposed, but the CBF scheme still shows a linear decrease in performance as the dimensionality grows. To cope with this, parallel processing techniques are needed. In this paper, we propose a parallel CBF scheme that uses a horizontally-partitioned technique for declustering. To maximize the retrieval performance of the proposed parallel CBF scheme, we build it on an SN (Shared Nothing) cluster architecture. In addition, we present a data insertion algorithm, a range query processing algorithm, and a k-NN query processing algorithm suitable for the SN cluster architecture. Finally, we show that our parallel CBF scheme achieves retrieval performance that improves in proportion to the number of servers in the SN cluster architecture, compared with the conventional CBF scheme.
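
A hedged sketch of the horizontal-partitioning idea above: feature vectors are declustered round-robin across shared-nothing servers, each server answers a k-NN query over its own partition, and a coordinator merges the partial answers. The cell-based filtering step itself is omitted and all names are illustrative, not the paper's.

```python
import heapq
import numpy as np

def decluster_round_robin(vectors, n_servers):
    # Horizontal partitioning: row i goes to server i mod n_servers.
    return [vectors[i::n_servers] for i in range(n_servers)]

def local_knn(partition, query, k):
    # Each server scans only its own partition.
    dists = np.linalg.norm(partition - query, axis=1)
    idx = np.argsort(dists)[:k]
    return [(float(dists[i]), partition[i]) for i in idx]

def parallel_knn(partitions, query, k):
    # Each server would run local_knn concurrently; here we loop and let the
    # coordinator merge the k best candidates from every partition.
    candidates = [c for part in partitions for c in local_knn(part, query, k)]
    return [v for _, v in heapq.nsmallest(k, candidates, key=lambda c: c[0])]

data = np.random.default_rng(0).random((1000, 16))
parts = decluster_round_robin(data, n_servers=4)
print(len(parallel_knn(parts, query=np.zeros(16), k=5)))
```

A range query would be handled the same way: broadcast the range to every partition and take the union of the local answers.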

Optimizing Skyline Query Processing Algorithms on CUDA Framework (CUDA 프레임워크 상에서 스카이라인 질의처리 알고리즘 최적화)

  • Min, Jun;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Databases
    • /
    • v.37 no.5
    • /
    • pp.275-284
    • /
    • 2010
  • GPUs are stream processors built from many cores, which can process large data sets at high speed with a large memory bandwidth, and they are less expensive than multi-core CPUs. Recently, the use of GPUs in general-purpose computing has become widespread; the CUDA architecture from Nvidia is one such effort to help developers use GPUs in their application domains. In this paper, we propose techniques to parallelize a skyline algorithm that uses a simple nested-loop structure. To employ the CUDA programming model, we apply optimization techniques that fit our skyline algorithm to the performance constraints of the CUDA architecture. According to our experimental results, our optimization techniques improve the original skyline algorithm by 80%.
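
A minimal sequential version of the nested-loop skyline that the paper parallelizes, written in Python rather than CUDA for brevity: the dominance test for each point is independent of every other point's test, which is the structure that lets one GPU thread handle one point. Minimization in every dimension is assumed.

```python
import numpy as np

def dominates(a, b):
    """a dominates b if a is <= b in every dimension and < in at least one."""
    return np.all(a <= b) and np.any(a < b)

def skyline(points):
    result = []
    for i, p in enumerate(points):
        # A point is in the skyline if no other point dominates it.
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i):
            result.append(p)
    return np.array(result)

pts = np.random.default_rng(0).random((200, 3))
print(len(skyline(pts)))
```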