• Title/Summary/Keyword: 데이타 병렬 (data parallel)

A Hybrid Value Predictor using Speculative Update of the Predictor Table and Static Classification for the Pattern of Executed Instructions in Superscalar Processors (슈퍼스칼라 프로세서에서 예상 테이블의 모험적 갱신과 명령어 실행 유형의 정적 분류를 이용한 혼합형 결과값 예측기)

  • Park, Hong-Jun;Jo, Young-Il
    • Journal of KIISE:Computing Practices and Letters / v.8 no.1 / pp.107-115 / 2002
  • We propose a new hybrid value predictor that achieves high performance by combining several predictors. Because the proposed predictor updates the prediction table speculatively, it reduces the number of instructions mispredicted because of stale data. It also improves prediction accuracy and lowers the hardware cost of the predictor, because it assigns each instruction to the best-suited predictor during the instruction fetch stage, using static classification information obtained from a profile-based compiler implementation. For 16-issue superscalar processors, simulation results based on the SimpleScalar/PISA tool set show an average prediction rate of 73% with speculative update and 88% when static classification is added, for the SPECint95 benchmark programs.

Efficient Processing of Grouped Aggregation on Non-Uniformed Memory Access Architecture (비균등 메모리 접근 구조에서의 효율적인 그룹화 집단 연산의 처리)

  • Choe, Seongjun;Min, Jun-Ki
    • Database Research / v.34 no.3 / pp.14-27 / 2018
  • Non-Uniform Memory Access (NUMA) architecture was recently proposed to alleviate the memory bottleneck problem that occurs in Symmetric Multiprocessing (SMP) architecture. Because the aggregation operator is an important operator that provides properties and summaries of data, its efficiency is crucial to the overall performance of a system. In this paper, we therefore propose an efficient aggregation processing technique for NUMA architecture. The proposed technique consists of a partition phase and a merge phase. In the partition phase, the target relation is partitioned into several partial relations according to the grouping attribute. Because each thread can then process the aggregation operator on its partial relation independently, we prevent remote memory access during the merge phase. Furthermore, in the merge phase, we improve aggregation performance by letting each thread compute its aggregation with a local hash table and by avoiding lock contention when the aggregation results generated by all threads are merged into one.
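
The partition and merge phases described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of `SUM` as the aggregate, and the hash-based partitioning are assumptions, and Python threads do not control NUMA memory placement. The sketch only shows why disjoint partitions let the merge proceed without locks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_group(rows, num_partitions):
    """Partition phase: route each (key, value) row by its grouping key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def aggregate_partition(partition):
    """Each worker sums values per group in its own local hash table."""
    local = defaultdict(int)
    for key, value in partition:
        local[key] += value
    return dict(local)


def grouped_sum(rows, num_threads=4):
    """Hypothetical driver: partition, aggregate per thread, then merge."""
    partitions = partition_by_group(rows, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(aggregate_partition, partitions))
    # Merge phase: every group key lives in exactly one partition, so the
    # partial results have disjoint keys and can be combined without locks.
    result = {}
    for partial in partials:
        result.update(partial)
    return result


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
totals = grouped_sum(rows)
```

Because all rows of one group land in the same partition, no two workers ever update the same group, which is the property the abstract relies on to avoid lock contention.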

Workflow Pattern Extraction based on ACTA Formalism (ACTA 형식론에 기반한 워크플로우 패턴추출)

  • Lee Wookey;Bae Joonsoo;Jung Jae-yoon
    • Journal of KIISE:Databases / v.32 no.6 / pp.603-615 / 2005
  • As business environments change and become more complex, more efficient and effective business process management is needed. This paper proposes a new approach to the automatic execution of business processes using Event-Condition-Action (ECA) rules that can be triggered automatically by an active database. First, we propose the concept of blocks, which classify process flows into several patterns. A block is a minimal unit that can specify the behaviors represented in a process model. We develop an algorithm that detects blocks in a process definition network and transforms the network into a hierarchical tree model. The behaviors in each block type are modeled using the ACTA formalism, which provides a theoretical basis for identifying ECA rules. The proposed ECA rule-based approach shows that a workflow can be executed using the active capability of a database, without user intervention.

A Fresh Look on Workflow and Workflow Management System (워크플로우 및 워크플로우 관리 시스템의 새로운 조망)

  • Han, Dong-Soo;Shim, Jae-Yong
    • Journal of KIISE:Databases / v.28 no.3 / pp.395-405 / 2001
  • In this paper, we analyze workflow management systems from a programming language perspective. Workflow characteristics such as workflow relevant data, workflow control structures, and workflow application invocations are studied and compared with those of distributed parallel programs. Although there are minor differences, we found surprisingly many analogies between the two. Based on this observation, we suggest viewing a workflow management system as a development platform for distributed parallel programs. This view gives users a consistent picture of workflow and workflow management systems, and it lets a workflow management system designer handle arbitrary user requests while keeping the design consistent. Moreover, the analogy between workflow and program provides a basis for applying program analysis techniques to the analysis of workflow.

Enhanced NOW-Sort on a PC Cluster with a Low-Speed Network (저속 네트웍 PC 클러스터상에서 NOW-Sort의 성능향상)

  • Kim, Ji-Hyoung;Kim, Dong-Seung
    • Journal of KIISE:Computer Systems and Theory / v.29 no.10 / pp.550-560 / 2002
  • External sort on cluster computers requires not only fast internal sorting but also careful scheduling of disk input and output and of interprocessor communication over the network. This is because the overall execution time reflects the time of every job involved, and interprocessor communication and disk I/O account for a significant portion of it. In this paper, we improve the sorting performance (sorting throughput) on a cluster of PCs with a low-speed network by developing a new algorithm that distributes the load evenly among processors and optimizes disk read and write operations together with the other computation and communication activities during the sort. Experimental results support the effectiveness of the algorithm. We observe that the algorithm reduces the sort time by 45% compared to the previous NOW-sort[1], and that it also scales better as computing nodes are added to the cluster.

A Software VIA based PC Cluster System on SCI Network (SCI 네트워크 상의 소프트웨어 VIA기반 PC글러스터 시스템)

  • Shin, Jeong-Hee;Chung, Sang-Hwa;Park, Se-Jin
    • Journal of KIISE:Computer Systems and Theory / v.29 no.4 / pp.192-200 / 2002
  • The performance of a PC cluster system is limited by traditional communication protocols such as TCP/IP, because these protocols carry significant software overhead. To overcome this problem, systems based on user-level interfaces that pass messages without kernel intervention have been developed. VIA (Virtual Interface Architecture) is one of the representative user-level interfaces that provide low latency and high bandwidth. In this paper, a VIA system is implemented on a PC cluster based on an SCI (Scalable Coherent Interface) network. The system provides both message-passing and shared-memory programming environments and shows a maximum bandwidth of 84 MB/s and a latency of $8\,\mu\text{s}$. The system also performs better than other comparable computer systems when running parallel benchmark programs.

Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computing Practices and Letters / v.13 no.5 / pp.252-259 / 2007
  • With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance the performance and efficiency of color image and video processing. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism, whereas the proposed YUV-aware instructions support parallel operations on two packed 16-bit YUV values (6-bit Y, 5-bit U, 5-bit V) in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the ability to reduce the data format size reduces system cost. Experimental results on a representative dynamically scheduled embedded superscalar processor show that YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. In contrast, MMX (Intel's representative multimedia extension) achieves a speedup of only 2.1x over the same baseline. In addition, YUV-aware instructions outperform MMX instructions in energy reduction (75.8% with YUV-aware instructions versus 54.8% with MMX instructions, both relative to the baseline).
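
To make the data format concrete, the following sketch packs two 16-bit YUV pixels (6-bit Y, 5-bit U, 5-bit V) into one 32-bit word and applies a component-wise operation to both at once. The bit order (Y in the high bits, then U, then V), the function names, and the saturating add are assumptions chosen for illustration; the abstract does not specify the layout or the instruction set.

```python
def pack_yuv655(y, u, v):
    """Pack one pixel into 16 bits. Assumed layout: Y[15:10] U[9:5] V[4:0]."""
    if not (0 <= y < 64 and 0 <= u < 32 and 0 <= v < 32):
        raise ValueError("component out of range for 6/5/5-bit fields")
    return (y << 10) | (u << 5) | v


def unpack_yuv655(pixel):
    """Split a 16-bit pixel back into its (Y, U, V) components."""
    return (pixel >> 10) & 0x3F, (pixel >> 5) & 0x1F, pixel & 0x1F


def pack_two(pixel_hi, pixel_lo):
    """Place two 16-bit pixels side by side in one 32-bit word."""
    return ((pixel_hi & 0xFFFF) << 16) | (pixel_lo & 0xFFFF)


def unpack_two(word):
    """Recover the two 16-bit pixels held in a 32-bit word."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def add_saturating(word_a, word_b):
    """Hypothetical YUV-aware add: per-component, clamped to each field's max."""
    out = []
    for pa, pb in zip(unpack_two(word_a), unpack_two(word_b)):
        ya, ua, va = unpack_yuv655(pa)
        yb, ub, vb = unpack_yuv655(pb)
        out.append(pack_yuv655(min(ya + yb, 63), min(ua + ub, 31), min(va + vb, 31)))
    return pack_two(out[0], out[1])


a = pack_two(pack_yuv655(10, 3, 4), pack_yuv655(60, 30, 1))
b = pack_two(pack_yuv655(5, 1, 1), pack_yuv655(10, 5, 2))
summed = [unpack_yuv655(p) for p in unpack_two(add_saturating(a, b))]
```

A generic subword add would treat each 16-bit half as a single integer and let carries cross from V into U and from U into Y, which is why an operation that knows the field boundaries is needed for this format.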

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 체이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases / v.28 no.4 / pp.732-741 / 2001
  • In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce the execution time, and the multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing at each node of the allocation tree may be delayed in the tuple-probing phase, because the time to read one page of the outer relation differs from the time to process a page that has already been read. We previously solved this delay problem with the concept of synchronization of page execution time. In this paper, we extend the performance improvement obtained at each node to the whole allocation tree and evaluate the resulting performance. In addition, we propose an efficient algorithm for multiple hash joins in environments with a limited number of processors, based on the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally, we analyze the performance by building an analytical cost model and verify its validity through various performance comparisons with the previous method.

Robust, Low Delay Multi-tree Speech Coding at 9.6Kbits/sec (견실, 저지연 멀티트리 9.6Kbits/s 음성부호기에 관한 연구)

  • 우홍체;문병현;이채욱
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.3 / pp.348-354 / 1993
  • In this research, a multi-tree coder at 9.6 kbit/s is developed using a novel scheme for adapting the short-term coefficients. The overall delay of the tree coder is kept at 2.5 ms (16 samples at the 6.4 kHz sampling frequency). The coder produces good-quality speech over ideal channels and is very robust to channel errors up to a bit error rate (BER) of $10^{-3}$. This robustness is achieved by combining a parallel adaptation scheme with a smoothed version of the received excitation sequence for adapting the short-term prediction coefficients. The reconstructed output speech of the multi-tree coder is evaluated using signal-to-quantization noise ratios (SNR), segmental SNRs, and informal listening tests.
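
As a quick check of the figures quoted in this abstract, the delay follows directly from the block length and the sampling frequency, and the same two rates give the average bit budget per sample. The second quantity is not stated in the abstract; it is derived here only from the 9.6 kbit/s and 6.4 kHz values.

$$
\text{delay} = \frac{16\ \text{samples}}{6400\ \text{samples/s}} = 2.5\ \text{ms},
\qquad
\frac{9600\ \text{bits/s}}{6400\ \text{samples/s}} = 1.5\ \text{bits/sample}
$$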

Implementation of the Squared-Error Pattern Clustering Processor Using the Residue Number System (剩餘數體系를 이용한 자승오차 패턴 클러스터링 프로세서의 실현)

  • Kim, Hyeong-Min;Cho, Won-Kyung
    • Journal of the Korean Institute of Telematics and Electronics / v.26 no.2 / pp.87-93 / 1989
  • The squared-error pattern clustering algorithm, used in unsupervised pattern recognition and image processing applications, demands substantial processing time for operations on the feature vector matrix. This paper therefore proposes a fast squared-error pattern clustering processor based on the Residue Number System, which is inherently suited to parallel and pipelined processing. In image segmentation experiments, the proposed processor shows an acceptable error rate for cluster counts that divide the image into meaningful regions, and it runs about 200 times faster than an 80287 coprocessor. These results indicate that the processor is useful for real-time processing of large data.
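
The reason a Residue Number System suits parallel hardware can be illustrated with a small sketch: an integer is represented by its residues modulo several pairwise coprime moduli, and addition, subtraction, and multiplication act on each residue channel independently, with no carries between channels. The moduli, the function names, and the squared-error routine below are assumptions for illustration; the abstract does not give the processor's actual moduli set or datapath.

```python
from math import prod

# Assumed moduli: pairwise coprime, dynamic range = their product (about 4.1e9).
MODULI = (251, 253, 255, 256)


def to_rns(x):
    """Encode an integer as one residue per modulus."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Channel-wise addition; each channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_sub(a, b):
    """Channel-wise subtraction."""
    return tuple((x - y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Channel-wise multiplication."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Decode with the Chinese Remainder Theorem (needs Python 3.8+)."""
    total_range = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        partial = total_range // m
        total += r * partial * pow(partial, -1, m)  # modular inverse of partial
    return total % total_range


def squared_error_rns(vec_a, vec_b):
    """Sum of squared differences, computed entirely in residue form."""
    acc = to_rns(0)
    for x, y in zip(vec_a, vec_b):
        diff = rns_sub(to_rns(x), to_rns(y))
        acc = rns_add(acc, rns_mul(diff, diff))
    return from_rns(acc)


error = squared_error_rns([12, 200, 7], [10, 180, 9])
```

The result is correct as long as the true squared error stays below the product of the moduli. A negative difference wraps around in each channel, but squaring it still yields the right residue.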
