• Title/Summary/Keyword: 병렬성능 (parallel performance)

Search results: 1,947 (processing time: 0.025 seconds)

Implementation of Hardware Data Prefetcher Adaptable for Various State-of-the-Art Workload (다양한 최신 워크로드에 적용 가능한 하드웨어 데이터 프리페처 구현)

  • Kim, KangHee;Park, TaeShin;Song, KyungHwan;Yoon, DongSung;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.12 / pp.20-35 / 2016
  • In this paper, to reduce the delay and area of the partial product accumulation (PPA) of a parallel decimal multiplier, we propose a tree architecture composed of multi-operand decimal CSAs and an improved CLA. The proposed tree of multi-operand CSAs reduces the partial products quickly. Because the input range of the CSA's recoder is limited, the CSA can be implemented with the simplest logic. In addition, using multi-operand decimal CSAs to add decimal numbers whose range is limited at specific positions of the architecture reduces the partial products efficiently. The final BCD result is also obtained faster by improving the logic of the decimal CLA. To evaluate the proposed PPA, it was synthesized with Design Compiler using a 180 nm CMOS technology library. Synthesis results show that, compared with a PPA using the general method, the delay of the proposed PPA is reduced by 15.6% and its area by 16.2%. The total delay and area are reduced even though the delay and area of the CLA itself increase.
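To make the carry-save idea concrete, here is a minimal Python model (an illustration under simplifying assumptions, not the paper's gate-level design): the partial products of a decimal multiplication are reduced column by column with no carry propagation, then a single carry-propagate BCD addition with the standard +6 correction produces the final result. The helper names (`decimal_csa`, `bcd_add`) are hypothetical, and the final adder is a ripple model standing in for the paper's improved CLA.

```python
def to_digits(n, width):
    """Little-endian list of the decimal digits of a non-negative integer n."""
    return [(n // 10**i) % 10 for i in range(width)]


def from_digits(digits):
    return sum(d * 10**i for i, d in enumerate(digits))


def decimal_csa(operands):
    """One carry-save step: reduce k decimal operands to a sum vector and a carry vector.

    No carry propagates between columns; each column's overflow is saved as a
    carry digit in the next column. The carry stays a single decimal digit as
    long as k <= 11 (since 9 * 11 // 10 == 9).
    """
    width = max(len(op) for op in operands)
    sum_vec = [0] * (width + 1)
    carry_vec = [0] * (width + 1)
    for i in range(width):
        column = sum(op[i] for op in operands if i < len(op))
        sum_vec[i] = column % 10
        carry_vec[i + 1] = column // 10
    return sum_vec, carry_vec


def bcd_add(a, b):
    """Final carry-propagate BCD addition with the +6 correction (ripple model, not a CLA)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry          # binary sum of two BCD digits and a carry, at most 19
        if t > 9:
            t = (t + 6) & 0xF      # skip the six unused 4-bit codes 1010..1111
            carry = 1
        else:
            carry = 0
        out.append(t)
    out.append(carry)
    return out


# Partial products of 1234 x 5678: one per multiplier digit, shifted by its position.
x, y = 1234, 5678
partials = [to_digits(x * d * 10**j, 8) for j, d in enumerate(to_digits(y, 4))]
s, c = decimal_csa(partials)
product = from_digits(bcd_add(s, c))
```

All four partial products are compressed in one carry-free step, so only one carry-propagate addition remains; in hardware that last addition is the job of the CLA.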

PreSPI: Design and Implementation of Protein-Protein Interaction Prediction Service System (PreSPI:단백질 상호작용 예측 서비스 시스템 설계 및 구현)

  • Kim, Hong-Soog;Jang, Woo-Hyuk;Lee, Sung-Doke;Han, Dong-Soo
    • Proceedings of the Korean Society for Bioinformatics Conference / 2004.11a / pp.86-100 / 2004
  • As the importance of computational techniques for predicting protein-protein interactions has become clear, many such prediction techniques have been proposed. However, few of them are offered as services that ordinary users can easily use. In this paper, we designed and implemented, as a service system called PreSPI (Prediction System for Protein Interaction), the domain-combination-based protein interaction prediction technique, which among the techniques known to date is well developed and reported to have relatively high prediction accuracy. The functions of the implemented system fall into three groups: core functions, which deliver the domain-combination-based prediction technique as a service and center on predicting interactions for input protein pairs; additional functions derived from the core functions; and general functions useful to researchers studying protein interactions, such as retrieving domain information for a given protein. Because systems that predict protein interactions computationally often require large-scale computation, good performance is important. PreSPI improves system performance by parallelizing processing appropriately for each service, and it supports openness by deploying its functions as a web service API. The system is also designed with a layered architecture so that it can flexibly reflect changing information on protein interactions and domains in the Internet environment. Finally, this paper describes several representative PreSPI services with a focus on the user interface, to help new users understand and use the services PreSPI provides.
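PreSPI's exact domain-combination scoring is not described in the abstract, so the following Python sketch substitutes a simpler, commonly used domain-pair noisy-OR model (an assumption) to show how per-pair prediction can be parallelized: each input protein pair is scored independently, which is what makes the work easy to split. The names `interaction_probability` and `predict_pairs`, and the probability table, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import prod


def interaction_probability(domains_a, domains_b, pair_prob):
    """Noisy-OR (assumed model): P(interact) = 1 - prod(1 - p) over all domain pairs."""
    miss = prod(1.0 - pair_prob.get(frozenset((da, db)), 0.0)
                for da in domains_a for db in domains_b)
    return 1.0 - miss


def predict_pairs(protein_pairs, domains, pair_prob, workers=4):
    """Score each input protein pair independently, in parallel."""
    def score(pair):
        a, b = pair
        return pair, interaction_probability(domains[a], domains[b], pair_prob)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score, protein_pairs))


# Hypothetical domain annotations and domain-pair interaction probabilities.
domains = {"A": ["D1", "D2"], "B": ["D3"], "C": ["D4"]}
pair_prob = {frozenset(("D1", "D3")): 0.5, frozenset(("D2", "D3")): 0.5}
result = predict_pairs([("A", "B"), ("A", "C")], domains, pair_prob)
```

Threads keep the sketch short; for CPU-bound scoring in CPython, a process pool or separate machines would be needed to obtain a real speedup.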


Adaptive OFDM System Employing a New SNR Estimation Method (새로운 SNR 추정방법을 이용한 적응 OFDM 시스템)

  • Kim Myung-Ik;Ahn Sang-Sik
    • Journal of the Institute of Electronics Engineers of Korea TC / v.43 no.3 s.345 / pp.59-67 / 2006
  • OFDM (Orthogonal Frequency Division Multiplexing) systems convert a serial data stream into N parallel data streams and modulate them onto N orthogonal subcarriers. OFDM systems therefore achieve high spectral efficiency and make high-speed data transmission possible. However, when the same modulation method is used on all subcarriers, the error probability is dominated by the subcarriers that experience deep fades. To enhance system performance, adaptive modulation is therefore required, in which the modulation method of each subcarrier is chosen according to its estimated SNR. The IEEE 802.11a system selects transmission rates between 6 and 54 Mbps according to the modulation mode. There are three typical SNR estimation methods: the direct estimation method uses the frequency-domain symbols to estimate the SNR directly by minimizing the MSE (mean square error); the EVM method uses the distance between the demodulated constellation points and the received complex values; and the Viterbi-based method estimates the SNR indirectly from the cumulative minimum distance in the decoding process. Based on a comparative analysis of the three methods, we propose a new SNR estimation method that combines the EVM method and the Viterbi algorithm. Finally, extensive computer simulations based on IEEE 802.11a confirm the performance improvement of the proposed adaptive OFDM system.
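Of the three estimators, the EVM method is the simplest to show. The Python sketch below makes several assumptions not in the abstract: QPSK on every subcarrier, an AWGN channel, hard-decision (decision-directed) EVM, and a made-up SNR threshold table for choosing the modulation. It is not the proposed EVM+Viterbi estimator, and the helper names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2)
QPSK = [complex(i, q) / SQRT2 for i in (-1, 1) for q in (-1, 1)]


def hard_decision(r):
    """Nearest unit-energy QPSK point: the sign of each component."""
    return complex(math.copysign(1, r.real), math.copysign(1, r.imag)) / SQRT2


def evm_snr_db(received):
    """Decision-directed EVM estimate: SNR ~ 1 / EVM^2, returned in dB."""
    decided = [hard_decision(r) for r in received]
    err = sum(abs(r - s) ** 2 for r, s in zip(received, decided))
    ref = sum(abs(s) ** 2 for s in decided)
    return -10 * math.log10(err / ref)


# Illustrative thresholds only, NOT the IEEE 802.11a rate-selection rules.
MODE_TABLE = [(0.0, "BPSK"), (8.0, "QPSK"), (15.0, "16-QAM"), (22.0, "64-QAM")]


def choose_mode(snr_db, table=MODE_TABLE):
    """Pick the densest constellation whose SNR threshold is met."""
    chosen = table[0][1]
    for threshold, mode in table:
        if snr_db >= threshold:
            chosen = mode
    return chosen


def simulate(snr_db, n=20000, seed=1):
    """Random QPSK symbols through an AWGN channel at the given SNR."""
    rng = random.Random(seed)
    sigma = math.sqrt(10 ** (-snr_db / 10) / 2)   # per-component noise std
    return [rng.choice(QPSK) + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
            for _ in range(n)]


estimate = evm_snr_db(simulate(20.0))
```

A known limitation of decision-directed EVM is that, at low SNR, wrong decisions pull the error vector toward the wrong constellation point, so the noise is underestimated and the SNR estimate is biased upward.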

A 32×32-b Multiplier Using a New Method to Reduce a Compression Level of Partial Products (부분곱 압축단을 줄인 32×32 비트 곱셈기)

  • 홍상민;김병민;정인호;조태원
    • Journal of the Institute of Electronics Engineers of Korea SD / v.40 no.6 / pp.447-458 / 2003
  • A high-speed multiplier is an essential building block of today's digital signal processors. Signal processing applications are typically realized with iterative algorithms that require a large number of multiply, add, and accumulate operations. This paper describes a macro block for a parallel multiplier that adopts a 32×32-b regularly structured tree (RST). To speed up the tree, a modified partial product generation method was devised at the architecture level. It reduces the compression stages from 4 levels to 3 and, together with 4-2 compressors, reduces the propagation delay of the Wallace tree. Furthermore, it allows the tree to be built from four modular blocks forming a CSA (carry-save adder) tree, so the multiplier can be laid out regularly with identical modules composed of Booth selectors, compressors, and Modified Partial Product Generators (MPPG). At the circuit level, a new Booth selector with fewer transistors and a new encoder are proposed. Because Booth selectors are replicated across every partial-product bit, reducing the transistor count of the Booth selector has a large impact on the total transistor count. The designed selector uses 9 transistors in PTL (Pass Transistor Logic), 50% fewer than the conventional one. The multiplier, designed in a 0.25 µm, 2.5 V, 1-poly, 5-metal CMOS process, was simulated with HSPICE and Epic. Its delay is 4.2 ns and its average power consumption is 1.81 mW/MHz. This result is far better than conventional multipliers and equal to or better than the best published one.
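The abstract builds on standard multiplier blocks; the Python sketch below models two of them at the bit level: radix-4 (modified) Booth recoding, which turns a 32-bit multiplier operand into 16 partial products with digits in {-2, -1, 0, 1, 2}, and a 4-2 compressor built from two full adders. These are textbook forms used for illustration, not the paper's MPPG or PTL Booth selector circuits; the function names are hypothetical.

```python
def booth_radix4_digits(y, bits=32):
    """Radix-4 Booth digits d_i = -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0."""
    u = y & ((1 << bits) - 1)        # two's-complement bit pattern of y
    digits, prev = [], 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_partial_products(x, y, bits=32):
    """One partial product per Booth digit: d_i * x, shifted left by 2i bits."""
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, bits))]


def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor from two full adders; cout does not depend on cin."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout
```

Because `cout` depends only on `x1..x3`, a row of 4-2 compressors has no ripple path between neighboring columns, and each level halves the number of rows (for example 16 → 8 → 4 → 2).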

A Study on Improvement of the Human Posture Estimation Method for Performing Robots (공연로봇을 위한 인간자세 추정방법 개선에 관한 연구)

  • Park, Cheonyu;Park, Jaehun;Han, Jeakweon
    • Journal of Broadcast Engineering / v.25 no.5 / pp.750-757 / 2020
  • One of the basic tasks for robots that interact with humans is to grasp human behavior quickly and accurately. A robot estimating human pose therefore needs both high pose recognition accuracy and the fastest possible recognition. However, when human pose is estimated with deep learning, a representative artificial intelligence technique, high recognition accuracy and high speed cannot be achieved at the same time. It is therefore common to choose either a top-down method, which has high inference accuracy, or a bottom-up method, which has high processing speed. In this paper, we propose two methods that retain the advantages of both approaches while compensating for their disadvantages. The first performs parallel inference on a server with multiple GPUs, and the second combines the bottom-up method with one-class classification. In experiments, both methods improved speed. Applied to entertainment robots, these two methods are expected to enable highly reliable interaction with the audience.
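The first method, parallel inference on a multi-GPU server, can be sketched as sharding incoming frames across devices and reassembling the results in frame order. This is a hedged Python illustration: `model` is a hypothetical stand-in for a pose network loaded on each GPU, worker threads play the role of devices, and round-robin sharding is an assumption, since the paper's actual scheduling is not specified.

```python
from concurrent.futures import ThreadPoolExecutor


def infer_shard(device_id, shard, model):
    """Run every frame of one shard on one device, keeping each frame's index."""
    return [(i, model(frame, device_id)) for i, frame in shard]


def parallel_inference(frames, model, num_devices):
    """Round-robin frames over devices, then reassemble results in frame order."""
    indexed = list(enumerate(frames))
    shards = [indexed[d::num_devices] for d in range(num_devices)]
    results = [None] * len(frames)
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        futures = [pool.submit(infer_shard, d, shard, model)
                   for d, shard in enumerate(shards)]
        for future in futures:
            for i, pose in future.result():
                results[i] = pose
    return results
```

Keeping the frame index with each result is what lets the robot consume poses in capture order even when devices finish at different times.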

Fast Image Pre-processing Algorithms Using SSE Instructions (SSE 명령어를 이용한 영상의 고속 전처리 알고리즘)

  • Park, Eun-Soo;Cui, Xuenan;Kim, Jun-Chul;Im, Yu-Cheong;Kim, Hak-Il
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.65-77 / 2009
  • This paper proposes fast image processing algorithms using SSE (Streaming SIMD Extensions) instructions. CPUs that support SSE instructions have 128-bit XMM registers, and the data in these registers are processed simultaneously in SIMD (Single Instruction Multiple Data) mode. This paper develops new SIMD image processing algorithms for the mean filter, the Sobel horizontal edge detector, and morphological erosion, which are the operations most widely used in automated optical inspection systems, and compares their processing times. To evaluate processing time objectively, the developed algorithms are compared with OpenCV 1.0 running in SISD (Single Instruction Single Data) mode and with Intel's IPP 5.2 and MIL 8.0, fast image processing libraries that support SIMD mode. The experimental results show that the proposed algorithms are on average 8 times faster than the SISD-mode image processing library and 1.4 times faster than the SIMD fast image processing libraries. The proposed algorithms show that practical image processing systems can run at high speed without commercial image processing libraries or additional hardware.
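As a rough model of how SSE processes 16 8-bit pixels per 128-bit XMM register, the Python sketch below computes a 3×3 morphological erosion in 16-pixel chunks, where each chunk step mirrors one SSE2 `_mm_min_epu8` lane-wise minimum. It models the data layout only (Python lists give none of the speedup), and the edge-replication border policy is an assumption; the helper names are hypothetical.

```python
LANES = 16  # one 128-bit XMM register holds 16 unsigned 8-bit pixels


def lane_min(a, b):
    """Element-wise minimum, mirroring what SSE2's _mm_min_epu8 does on 16 lanes."""
    return [min(p, q) for p, q in zip(a, b)]


def erode3x3(img):
    """3x3 square-element erosion of an 8-bit image, up to 16 pixels per step.

    Border pixels are replicated (an assumption; the paper's border policy is not stated).
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x0 in range(0, w, LANES):
            xs = range(x0, min(x0 + LANES, w))      # the last chunk may be partial
            acc = [255] * len(xs)
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # one "vector load" of shifted neighbors, then one lane-wise min
                    acc = lane_min(acc, [px(y + dy, x + dx) for x in xs])
            out[y][x0:x0 + len(xs)] = acc
    return out
```

In an actual SSE implementation each neighbor list would be an unaligned 16-byte load (for example `_mm_loadu_si128`), and because a 3×3 square minimum is separable, a row pass followed by a column pass cuts the nine minimum operations per pixel to four.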

Improvement of Address Pointer Assignment in DSP Code Generation (DSP용 코드 생성에서 주소 포인터 할당 성능 향상 기법)

  • Lee, Hee-Jin;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.1 / pp.37-47 / 2008
  • Exploiting the address generation units typically provided in DSPs plays an important role in DSP code generation, since these units perform fast address computation in parallel with the central data path. Offset assignment optimizes the memory layout of program variables to take advantage of the capabilities of the address generation units, and consists of two steps: memory layout generation and address pointer assignment. In this paper, we propose an effective address pointer assignment method that minimizes the number of address calculation instructions in DSP code generation. The proposed approach reduces the time complexity of a conventional address pointer assignment algorithm for fixed memory layouts by using minimum-cost node breaking. To reduce memory size and processing time, we employ a powerful pruning technique. Moreover, because the memory layout affects the result of the address pointer assignment algorithm, our approach improves the initial solution iteratively by changing the memory layout in each iteration. We applied the proposed approach to about 3,000 sequences from the OffsetStone benchmarks to demonstrate its effectiveness. Experimental results show an average improvement of 25.9% in address code over previous work.
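To make the optimization objective concrete, here is a small Python cost model under common offset-assignment assumptions: each address register supports auto-increment/decrement by one, and every other change of offset costs one explicit address instruction. `address_cost` is a hypothetical helper; the paper's minimum-cost node breaking, pruning, and iterative layout changes are not reproduced.

```python
def address_cost(sequence, layout, pointer_of, max_step=1):
    """Count explicit address computations for a fixed memory layout.

    sequence   -- variable accesses in program order
    layout     -- list of variables; list index = memory offset
    pointer_of -- dict mapping each variable to an address-register index
    A transition on the same pointer is free when the offset changes by at
    most `max_step` (auto-increment/decrement); otherwise one extra address
    instruction is needed. Initial pointer loads are not counted.
    """
    offset = {v: i for i, v in enumerate(layout)}
    last = {}                       # pointer index -> offset it currently holds
    cost = 0
    for var in sequence:
        p = pointer_of[var]
        if p in last and abs(offset[var] - last[p]) > max_step:
            cost += 1
        last[p] = offset[var]
    return cost


seq = list("abcabc")
one_ptr = {"a": 0, "b": 0, "c": 0}
```

With the layout `["a", "b", "c"]` and a single pointer, only the `c → a` jump costs an instruction; moving `c` to its own pointer removes that cost entirely, which is the kind of trade-off pointer assignment searches over.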

Efficient Association Rule Mining based SON Algorithm for a Bigdata Platform (빅데이터 플랫폼을 위한 SON알고리즘 기반의 효과적인 연관 룰 마이닝)

  • Nguyen, Giang-Truong;Nguyen, Van-Quyet;Nguyen, Sinh-Ngoc;Kim, Kyungbaek
    • Journal of Digital Contents Society / v.18 no.8 / pp.1593-1601 / 2017
  • In a big data platform, association rule mining applications can bring benefits. For instance, in an agricultural big data platform, an association rule mining application could recommend specific products for farmers to grow, which could increase their income. The key step of association rule mining is frequent itemset mining, which finds sets of products that frequently occur together. Earlier approaches to this problem, such as Apriori, are unsatisfactory because the huge number of possible sets can overload memory. To deal with this, the SON algorithm was proposed; it divides the data into many smaller subsets and processes them sequentially. On a single machine, however, the SON algorithm is very time-consuming. In this paper, we present a method for finding association rules on our Hadoop-based big data platform by parallelizing the SON algorithm. The entire association rule mining process, including pre-processing, SON-based frequent itemset mining, and association rule finding, is implemented on the Hadoop-based big data platform. Experiments with a real data set confirm that the proposed method outperforms a brute-force method.
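The two-pass structure of SON is what makes it parallelizable: any itemset that is frequent in the whole data set must be frequent, at the same support fraction, in at least one chunk. The Python sketch below shows both passes; the names are hypothetical, a thread pool stands in for Hadoop map tasks, and local mining is brute force for brevity rather than Apriori.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_itemsets(baskets, max_size):
    """Brute-force counts of all itemsets up to max_size (fine for small chunks)."""
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return counts


def local_frequent(chunk, min_frac, max_size):
    """Pass 1 for one chunk: itemsets meeting the support fraction scaled to the chunk."""
    threshold = min_frac * len(chunk)
    return {s for s, c in count_itemsets(chunk, max_size).items() if c >= threshold}


def son(baskets, min_frac, num_chunks=2, max_size=3):
    size = -(-len(baskets) // num_chunks)                  # ceiling division
    chunks = [baskets[i:i + size] for i in range(0, len(baskets), size)]
    # Pass 1: mine each chunk independently (one map task per chunk on Hadoop).
    with ThreadPoolExecutor() as pool:
        local_sets = pool.map(lambda ch: local_frequent(ch, min_frac, max_size), chunks)
        candidates = set().union(*local_sets)
    # Pass 2: count every candidate over all data (a second MapReduce job; sequential here).
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        counts.update(c for c in candidates if items.issuperset(c))
    threshold = min_frac * len(baskets)
    return {c: n for c, n in counts.items() if n >= threshold}


baskets = [["milk", "bread"], ["milk", "bread", "eggs"], ["bread", "eggs"],
           ["milk", "eggs"], ["milk", "bread"], ["bread"]]
result = son(baskets, 0.5)
```

From these counts, a rule such as bread → milk has confidence 3/5 = 0.6, which is how the association-rule step builds on the frequent itemsets.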

A Study on the Digital Filter Design using Software for Analysis of Observation Data in Radio Astronomy (전파천문 관측데이터 분석을 위해 소프트웨어를 이용한 디지털필터 설계에 관한 연구)

  • Yeom, Jae-Hwan;Oh, Se-Jin;Roh, Duk-Gyoo;Oh, Chung-Sik;Jung, Dong-Kyu;Shin, Jae-Sik;Kim, Hyo-Ryoung;Hwang, Ju-Yeon
    • Journal of the Institute of Convergence Signal Processing / v.16 no.4 / pp.175-181 / 2015
  • In this paper, we propose a method for designing a digital filter in software to analyze radio astronomy observation data. With the advance of state-of-the-art computer systems, analysis in radio astronomy observing systems has recently been shifting from hardware to software. Existing hardware systems cannot easily change their specifications because they are built to meet special requirements, and changing them is costly and time-consuming. Software, by contrast, can be implemented at low cost when open software is used and can be changed flexibly to satisfy the desired specification. However, analyzing massive data such as radio astronomy data in software requires a high-performance computer system. This paper therefore proposes a software digital filter design method that achieves the same performance as the hardware digital filter in the observation system operated by the KVN (Korean VLBI Network). The filter is implemented in standard C, and its effectiveness is verified by simulation with GNU (GNU's Not Unix) Octave. In addition, for high-speed operation of the designed filter, the SSE (Streaming SIMD Extensions) library is adopted to enable parallel operation. Applying the proposed digital filter to wideband observation data in the KVN observation mode, the filtered narrowband output shows no ripple in the stop band, confirming the effectiveness of the proposed method.
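The abstract does not give the filter specification, so the following Python sketch shows one standard software design route as an illustration: a linear-phase FIR low-pass filter designed by the windowed-sinc method with a Hamming window. The tap count and cutoff are arbitrary choices, not KVN parameters, the helper names are hypothetical, and the paper itself implements its filter in C with SSE.

```python
import cmath
import math


def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR low-pass by the windowed-sinc method with a Hamming window.

    cutoff is normalized to the sampling rate (0 < cutoff < 0.5).
    """
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        t = n - m / 2
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]          # normalize to unity gain at DC


def magnitude_response(taps, f):
    """|H(f)| at normalized frequency f."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * n) for n, h in enumerate(taps)))


def fir_filter(taps, samples):
    """Direct-form convolution; the output has the same length as the input."""
    return [sum(h * samples[i - k] for k, h in enumerate(taps) if i - k >= 0)
            for i in range(len(samples))]


taps = windowed_sinc_lowpass(101, 0.1)
```

The inner multiply-accumulate over taps in `fir_filter` is the part that SIMD instructions such as SSE can process several samples at a time.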

Efficient Multiple Joins using the Synchronization of Page Execution Time in Limited Processors Environments (한정된 프로세서 환경에서 페이지 실행시간 동기화를 이용한 효율적인 다중 결합)

  • Lee, Kyu-Ock;Weon, Young-Sun;Hong, Man-Pyo
    • Journal of KIISE:Databases / v.28 no.4 / pp.732-741 / 2001
  • In relational database systems, the join operation is one of the most time-consuming query operations. Many parallel join algorithms have been developed to reduce execution time, and the multiple hash join algorithm using an allocation tree is one of the most efficient. However, processing at each node of the allocation tree can be delayed in the tuple-probing phase because of the difference between the time to read one page of the outer relation and the time to process a page already read. We previously solved this delay problem with the concept of page execution time synchronization. In this paper, the performance improvements at each node of the allocation tree are extended to the whole allocation tree, and the resulting performance is evaluated. In addition, we propose an efficient algorithm for multiple hash joins in environments with a limited number of processors, based on the relationship between the number of input relations in the allocation tree and the number of processors allocated to the tree. Finally, we analyze performance by building an analytical cost model and verify its validity through various performance comparisons with the previous method.
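For readers unfamiliar with multi-way hash joins, the sketch below shows the generic build/probe structure the abstract refers to: hash tables are built on the inner relations, and outer tuples are streamed through them in the tuple-probing phase. It assumes all relations join on the same key and is not the paper's allocation-tree algorithm or page-synchronization scheme; the function names are hypothetical.

```python
from collections import defaultdict


def build(relation):
    """Build phase: hash an inner relation of (key, payload) tuples on its key."""
    table = defaultdict(list)
    for key, payload in relation:
        table[key].append(payload)
    return table


def pipelined_hash_join(outer, inners):
    """Probe phase: stream outer tuples through every inner hash table in turn."""
    tables = [build(r) for r in inners]
    for key, payload in outer:
        partial = [(key, payload)]
        for table in tables:
            matches = table.get(key)         # .get does not insert into a defaultdict
            if not matches:
                partial = []
                break
            partial = [row + (m,) for row in partial for m in matches]
        yield from partial


r1 = [(1, "a"), (2, "b"), (3, "c")]
r2 = [(1, "x"), (1, "y"), (3, "z")]
r3 = [(1, "m"), (3, "n"), (4, "o")]
joined = list(pipelined_hash_join(r1, [r2, r3]))
```

Each outer tuple is probed independently, so the probe phase can be split across processors; the delay the paper addresses arises when reading the next outer page and probing the current one take different amounts of time.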
