• Title/Summary/Keyword: High-performance processor

Search Result 618, Processing Time 0.026 seconds

Design and Implementation of Initial OpenSHMEM Based on PCI Express (PCI Express 기반 OpenSHMEM 초기 설계 및 구현)

  • Joo, Young-Woong;Choi, Min
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.3
    • /
    • pp.105-112
    • /
    • 2017
  • PCI Express is a bus technology that connects the processor and the peripheral I/O devices that widely used as an industry standard because it has the characteristics of high-speed, low power. In addition, PCI Express is system interconnect technology such as Ethernet and Infiniband used in high-performance computing and computer cluster. PGAS(partitioned global address space) programming model is often used to implement the one-sided RDMA(remote direct memory access) from multi-host systems, such as computer clusters. In this paper, we design and implement a OpenSHMEM API based on PCI Express maintaining the existing features of OpenSHMEM to implement RDMA based on PCI Express. We perform experiment with implemented OpenSHMEM API through a matrix multiplication example from system which PCs connected with NTB(non-transparent bridge) technology of PCI Express. The PCI Express interconnection network is currently very expensive and is not yet widely available to the general public. Nevertheless, we actually implemented and evaluated a PCI Express based interconnection network on the RDK evaluation board. In addition, we have implemented the OpenSHMEM software stack, which is of great interest recently.

A Hybrid Value Predictor using Static and Dynamic Classification in Superscalar Processors (슈퍼스칼라 프로세서에서 정적 및 동적 분류를 사용한 혼합형 결과 값 예측기)

  • 김주익;박홍준;조영일
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.10
    • /
    • pp.569-578
    • /
    • 2003
  • Data dependencies are one of major hurdles to limit ILP(Instruction Level Parallelism), so several related works have suggested that the limit imposed by data dependencies can be overcome to some extent with use of the data value prediction. Hybrid value predictor can obtain the high prediction accuracy using advantages of various predictors, but it has a defect that same instruction has overlapping entries in all predictor. In this paper, we propose a new hybrid value predictor which achieves high performance by using the information of static and dynamic classification. The proposed predictor can enhance the prediction accuracy and efficiently decrease the prediction table size of predictor, because it allocates each instruction into single best-suited predictor during the fetch stage by using the information of static classification. Also, it can enhance the prediction accuracy because it selects a best- suited prediction method for the “Unknown”pattern instructions by using the dynamic classification mechanism. Simulation results based on the SimpleScalar/PISA tool set and the SPECint95 benchmarks show the average correct prediction rate of 85.1% by using the static classification mechanism. Also, we achieve the average correction prediction rate of 87.6% by using static and dynamic classification mechanism.

Reliability Analysis of The Mission-Critical Engagement Control Computer Using Active Sparing Redundancy (ASR 기법을 적용한 임무지향 교전통제 컴퓨터의 신뢰도 분석)

  • Shin, Jin-Beom;Kim, Sang-Ha
    • The KIPS Transactions:PartA
    • /
    • v.15A no.6
    • /
    • pp.309-316
    • /
    • 2008
  • The mission-critical engagement control computer for air defense has to maintain its operation without any fault for a long mission time. The mission performed by large-scale and complex embedded software is extremely critical in terms of dependability and safety of computer system, and it is very important that engagement control computer has high reliability. The engagement control computer was implemented using four processors. The distributed computer composed of four processors quarantees the dependability and safety, and ASR fault-tolerant technique applied to each processor guarantees the reliability. In this paper, the mechanism and performance of ASR fault-tolerant technique are analysed. And MTBF, reliability, availability, and cost-effectiveness for ASR, DMR and TMR techniques applied to the engagement control computer are analysed. The mission-critical engagement control computer using software-based ASR fault-tolerant technique provides high reliability and fast recovery time at a low cost. The mission reliability of the engagement control computer using ASR technique in 4 processors board is almost same the reliability of the computer using TMR technique in 6 processors board. ASR technique is most suitable to the mission-critical engagement control computer.

Development of Industrial Embedded System Platform (산업용 임베디드 시스템 플랫폼 개발)

  • Kim, Dae-Nam;Kim, Kyo-Sun
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.5
    • /
    • pp.50-60
    • /
    • 2010
  • For the last half a century, the personal computer and software industries have been prosperous due to the incessant evolution of computer systems. In the 21st century, the embedded system market has greatly increased as the market shifted to the mobile gadget field. While a lot of multimedia gadgets such as mobile phone, navigation system, PMP, etc. are pouring into the market, most industrial control systems still rely on 8-bit micro-controllers and simple application software techniques. Unfortunately, the technological barrier which requires additional investment and higher quality manpower to overcome, and the business risks which come from the uncertainty of the market growth and the competitiveness of the resulting products have prevented the companies in the industry from taking advantage of such fancy technologies. However, high performance, low-power and low-cost hardware and software platforms will enable their high-technology products to be developed and recognized by potential clients in the future. This paper presents such a platform for industrial embedded systems. The platform was designed based on Telechips TCC8300 multimedia processor which embedded a variety of parallel hardware for the implementation of multimedia functions. And open-source Embedded Linux, TinyX and GTK+ are used for implementation of GUI to minimize technology costs. In order to estimate the expected performance and power consumption, the performance improvement and the power consumption due to each of enabled hardware sub-systems including YUV2RGB frame converter are measured. An analytic model was devised to check the feasibility of a new application and trade off its performance and power consumption. The validity of the model has been confirmed by implementing a real target system. The cost can be further mitigated by using the hardware parts which are being used for mass production products mostly in the cell-phone market.

A Study on the Intelligent Service Selection Reasoning for Enhanced User Satisfaction : Appliance to Cloud Computing Service (사용자 만족도 향상을 위한 지능형 서비스 선정 방안에 관한 연구 : 클라우드 컴퓨팅 서비스에의 적용)

  • Shin, Dong Cheon
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.35-51
    • /
    • 2012
  • Cloud computing is internet-based computing where computing resources are offered over the Internet as scalable and on-demand services. In particular, in case a number of various cloud services emerge in accordance with development of internet and mobile technology, to select and provide services with which service users satisfy is one of the important issues. Most of previous works show the limitation in the degree of user satisfaction because they are based on so called concept similarity in relation to user requirements or are lack of versatility of user preferences. This paper presents cloud service selection reasoning which can be applied to the general cloud service environments including a variety of computing resource services, not limited to web services. In relation to the service environments, there are two kinds of services: atomic service and composite service. An atomic service consists of service attributes which represent the characteristics of service such as functionality, performance, or specification. A composite service can be created by composition of atomic services and other composite services. Therefore, a composite service inherits attributes of component services. On the other hand, the main participants in providing with cloud services are service users, service suppliers, and service operators. Service suppliers can register services autonomously or in accordance with the strategic collaboration with service operators. Service users submit request queries including service name and requirements to the service management system. The service management system consists of a query processor for processing user queries, a registration manager for service registration, and a selection engine for service selection reasoning. In order to enhance the degree of user satisfaction, our reasoning stands on basis of the degree of conformance to user requirements of service attributes in terms of functionality, performance, and specification of service attributes, instead of concept similarity as in ontology-based reasoning. For this we introduce so called a service attribute graph (SAG) which is generated by considering the inclusion relationship among instances of a service attribute from several perspectives like functionality, performance, and specification. Hence, SAG is a directed graph which shows the inclusion relationships among attribute instances. Since the degree of conformance is very close to the inclusion relationship, we can say the acceptability of services depends on the closeness of inclusion relationship among corresponding attribute instances. That is, the high closeness implies the high acceptability because the degree of closeness reflects the degree of conformance among attributes instances. The degree of closeness is proportional to the path length between two vertex in SAG. The shorter path length means more close inclusion relationship than longer path length, which implies the higher degree of conformance. In addition to acceptability, in this paper, other user preferences such as priority for attributes and mandatary options are reflected for the variety of user requirements. Furthermore, to consider various types of attribute like character, number, and boolean also helps to support the variety of user requirements. Finally, according to service value to price cloud services are rated and recommended to users. One of the significances of this paper is the first try to present a graph-based selection reasoning unlike other works, while considering various user preferences in relation with service attributes.

Design and Implementation of a Scalable Real-Time Sensor Node Platform (확장성 및 실시간성을 고려한 실시간 센서 노드 플랫폼의 설계 및 구현)

  • Jung, Kyung-Hoon;Kim, Byoung-Hoon;Lee, Dong-Geon;Kim, Chang-Soo;Tak, Sung-Woo
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.8B
    • /
    • pp.509-520
    • /
    • 2007
  • In this paper, we propose a real-time sensor node platform that guarantees the real-time scheduling of periodic and aperiodic tasks through a multitask-based software decomposition technique. Since existing sensor networking operation systems available in literature are not capable of supporting the real-time scheduling of periodic and aperiodic tasks, the preemption of aperiodic task with high priority can block periodic tasks, and so periodic tasks are likely to miss their deadlines. This paper presents a comprehensive evaluation of how to structure periodic or aperiodic task decomposition in real-time sensor-networking platforms as regard to guaranteeing the deadlines of all the periodic tasks and aiming to providing aperiodic tasks with average good response time. A case study based on real system experiments is conducted to illustrate the application and efficiency of the multitask-based dynamic component execution environment in the sensor node equipped with a low-power 8-bit microcontroller, an IEEE802.15.4 compliant 2.4GHz RF transceiver, and several sensors. It shows that our periodic and aperiodic task decomposition technique yields efficient performance in terms of three significant, objective goals: deadline miss ratio of periodic tasks, average response time of aperiodic tasks, and processor utilization of periodic and aperiodic tasks.

Numerical Analysis of Nuclear-Power Plant Subjected to an Aircraft Impact using Parallel Processor (병렬프로세서를 이용한 원전 격납건물의 항공기 충돌해석)

  • Song, Yoo-Seob;Shin, Sang-Shup;Jung, Dong-Ho;Park, Tae-Hyo
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.24 no.6
    • /
    • pp.715-722
    • /
    • 2011
  • In this paper, the behavior of nuclear-power plant subjected to an aircraft impact is performed using the parallel analysis. In the erstwhile study of an aircraft impact to the nuclear-power plant, it has been used that the impact load is applied at the local area by using the impact load-time history function of Riera, and the target structures have been restricted to the simple RC(Reinforced Concrete) walls or RC buildings. However, in this paper, the analysis of an aircraft impact is performed by using a real aircraft model similar to the Boeing 767 and a fictitious nuclear-power plant similar to the real structure, and an aircraft model is verified by comparing the generated history of the aircraft crash against the rigid target with another history by using the Riera's function which is allowable in the impact evaluation guide, NEI07-13(2009). Also, in general, it is required too much time for the hypervelocity impact analysis due to the contact problems between two or more adjacent physical bodies and the high nonlinearity causing dynamic large deformation, so there is a limitation with a single CPU alone to deal with these problems effectively. Therefore, in this paper, Message-Passing MIMD type of parallel analysis is performed by using self-constructed Linux-Cluster system to improve the computational efficiency, and in order to evaluate the parallel performance, the four cases of analysis, i.e. plain concrete, reinforced concrete, reinforced concrete with bonded containment liner plate, steel-plate concrete structure, are performed and discussed.

Direction-Embedded Branch Prediction based on the Analysis of Neural Network (신경망의 분석을 통한 방향 정보를 내포하는 분기 예측 기법)

  • Kwak Jong Wook;Kim Ju-Hwan;Jhon Chu Shik
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.42 no.1
    • /
    • pp.9-26
    • /
    • 2005
  • In the pursuit of ever higher levels of performance, recent computer systems have made use of deep pipeline, dynamic scheduling and multi-issue superscalar processor technologies. In this situations, branch prediction schemes are an essential part of modem microarchitectures because the penalty for a branch misprediction increases as pipelines deepen and the number of instructions issued per cycle increases. In this paper, we propose a novel branch prediction scheme, direction-gshare(d-gshare), to improve the prediction accuracy. At first, we model a neural network with the components that possibly affect the branch prediction accuracy, and analyze the variation of their weights based on the neural network information. Then, we newly add the component that has a high weight value to an original gshare scheme. We simulate our branch prediction scheme using Simple Scalar, a powerful event-driven simulator, and analyze the simulation results. Our results show that, compared to bimodal, two-level adaptive and gshare predictor, direction-gshare predictor(d-gshare. 3) outperforms, without additional hardware costs, by up to 4.1% and 1.5% in average for the default mont of embedded direction, and 11.8% in maximum and 3.7% in average for the optimal one.

Scheduling Algorithm using DAG Leveling in Optical Grid Environment (옵티컬 그리드 환경에서 DAG 계층화를 통한 스케줄링 알고리즘)

  • Yoon, Wan-Oh;Lim, Hyun-Soo;Song, In-Seong;Kim, Ji-Won;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.47 no.4
    • /
    • pp.71-81
    • /
    • 2010
  • In grid system, Task scheduling based on list scheduling models has showed low complexity and high efficiency in fully connected processor set environment. However, earlier schemes did not consider sufficiently the communication cost among tasks and the composition process of lightpath for communication in optical gird environment. In this thesis, we propose LSOG (Leveling Selection in Optical Grid) which sets task priority after forming a hierarchical directed acyclic graph (DAG) that is optimized in optical grid environment. To determine priorities of task assignment in the same level, proposed algorithm executes the task with biggest communication cost between itself and its predecessor. Then, it considers the shortest route for communication between tasks. This process improves communication cost in scheduling process through optimizing link resource usage in optical grid environment. We compared LSOG algorithm with conventional ELSA (Extended List Scheduling Algorithm) and SCP (Scheduled Critical Path) algorithm. We could see the enhancement in overall scheduling performance through increment in CCR value and smoothing network environment.

Design of MRI Spectrometer Using 1 Giga-FLOPS DSP (1-GFLOPS DSP를 이용한 자기공명영상 스펙트로미터 설계)

  • 김휴정;고광혁;이상철;정민영;장경섭;이동훈;이흥규;안창범
    • Investigative Magnetic Resonance Imaging
    • /
    • v.7 no.1
    • /
    • pp.12-21
    • /
    • 2003
  • Purpose : In order to overcome limitations in the existing conventional spectrometer, a new spectrometer with advanced functionalities is designed and implemented. Materials and Methods : We designed a spectrometer using the TMS320C6701 DSP capable of 1 giga floating point operations per second (GFLOPS). The spectrometer can generate continuously varying complicate gradient waveforms by real-time calculation, and select image plane interactively. The designed spectrometer is composed of two parts: one is DSP-based digital control part, and the other is analog part generating gradient and RF waveforms, and performing demodulation of the received RF signal. Each recover board can measure 4 channel FID signals simultaneously for parallel imaging, and provides fast reconstruction using the high speed DSP. Results : The developed spectrometer was installed on a 1.5 Tesla whole body MRI system, and performance was tested by various methods. The accurate phase control required in digital modulation and demodulation was tested, and multi-channel acquisition was examined with phase-array coil imaging. Superior image quality is obtained by the developed spectrometer compared to existing commercial spectrometer especially in the fast spin echo images. Conclusion : Interactive control of the selection planes and real-time generation of gradient waveforms are important functions required for advanced imaging such as spiral scan cardiac imaging. Multi-channel acquisition is also highly demanding for parallel imaging. In this paper a spectrometer having such functionalities is designed and developed using the TMS320C6701 DSP having 1 GFLOPS computational power. Accurate phase control was achieved by the digital modulation and demodulation techniques. Superior image qualities are obtained by the developed spectrometer for various imaging techniques including FSE, GE, and angiography compared to those obtained by the existing commercial spectrometer.

  • PDF