• Title/Summary/Keyword: parallelism control

SW-HW Co-design of a High-performance Dehazing System Using OpenCL-based High-level Synthesis Technique (OpenCL 기반의 상위 수준 합성 기술을 이용한 고성능 안개 제거 시스템의 소프트웨어-하드웨어 통합 설계)

  • Park, Yongmin;Kim, Minsang;Kim, Byung-O;Kim, Tae-Hwan
    • Journal of the Institute of Electronics and Information Engineers / v.54 no.8 / pp.45-52 / 2017
  • This paper presents a high-performance software-hardware dehazing system based on a dedicated hardware accelerator for haze removal. In the proposed system, the dedicated hardware accelerator performs the dark-channel-prior-based dehazing process, and the software performs the remaining control processes. For this purpose, the dehazing process is realized as an OpenCL kernel by exploiting the inherent parallelism in the algorithm and is synthesized into hardware using a high-level synthesis technique. The proposed system executes the dehazing process much faster than the previous software-only dehazing system: the performance improvement is up to 96.3% in terms of execution time.

Implementation of a Scoreboard Array and a Port Arbiter for In-order SMT Processors (순차적 SMT Processor를 위한 Scoreboard Array와 포트 중재 모듈의 구현)

  • Heo, Chang-Yong;Hong, In-Pyo;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.6 / pp.59-70 / 2004
  • SMT (Simultaneous Multithreading) architecture exploits TLP (Thread-Level Parallelism) to increase processor throughput by filling issue slots with instructions from multiple independent threads. Having multiple ready threads reduces the probability that a functional unit is left idle, which increases processor efficiency. To realize these advantages, the issue unit of an SMT processor must control the flow of instructions from different threads without creating conflicts among them, which makes the SMT issue logic extremely complex. The SMT architecture modeled in this paper therefore uses an in-order issue and completion scheme, which allows a simple issue mechanism based on a scoreboard array instead of register renaming or a reorder buffer. However, an SMT scoreboarding mechanism is still more complex and costlier than that of a conventional single-threaded processor. This paper proposes an optimal implementation of a scoreboarding mechanism for an ARM-based SMT architecture.

A VLSI Architecture of Systolic Array for FFT Computation (고속 퓨리어 변환 연산용 VLSI 시스토릭 어레이 아키텍춰)

  • 신경욱;최병윤;이문기
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.9 / pp.1115-1124 / 1988
  • A two-dimensional systolic array for the fast Fourier transform (FFT) with a regular and recursive VLSI architecture is presented. The array is constructed from identical processing elements (PEs) arranged in a mesh, and owing to its modularity it can be expanded to an arbitrary size. A processing element consists of two data routing units, a butterfly arithmetic unit, and a simple control unit. The array computes the FFT through three procedures: I/O pipelining, data shuffling, and butterfly arithmetic. By utilizing parallelism, pipelining, and local communication geometry during data movement, the two-dimensional systolic array eliminates the global and irregular communication problems that have been a limiting factor in VLSI implementations of FFT processors. The systolic array executes a half butterfly arithmetic based on distributed arithmetic, which can carry out multiplication with only adders. The systolic array also provides 100% PE activity, i.e., none of the PEs is idle at any time. A chip for half butterfly arithmetic, consisting of two BLC adders and registers, has been fabricated using a 3-μm single-metal P-well CMOS technology. With a half butterfly arithmetic execution time of about 500 ns, obtained by critical path delay simulation, the total FFT execution time for 1024 points is estimated at about 16.6 μs at a clock frequency of 20 MHz. A one-PE chip expandable to any array size is being fabricated using a 2-μm double-metal P-well CMOS process. The chip was laid out using a standard cell library and a BLC adder macrocell with the aid of auto-routing software. It consists of around 6000 transistors and 68 I/O pads on a 3.4 × 2.8 mm² area. A built-in self-testing circuit, BILBO (Built-In Logic Block Observation), was employed at the expense of 3% hardware overhead.

An Example of Changed Design through the Face Mapping and Slope Analysis (절토사면 현황도 작성 및 분석에 따른 설계변경 사례연구)

  • Lee, Byung-Joo;Chae, Byung-Gon;Lee, Kyoung-Mi
    • The Journal of Engineering Geology / v.24 no.1 / pp.137-146 / 2014
  • The geology of the study area, located in Samkoe-dong, Dong-gu, Daejeon city, comprises black slate, limestone, and pebble-bearing phyllitic rock as meta-sedimentary rocks, together with biotite granite and quartz porphyry intrusions. Face mapping revealed sliding at three or four sites containing coaly slate, where the dip of the foliation and other discontinuities is parallel to the surface slope. The slope sliding is caused by this parallelism as well as by the swelling of the coaly slate when wet. In contrast, the slope on the opposite side of the road is relatively stable because the dip of the foliation and other discontinuities is oblique or normal to the surface slope. To ensure slope stability, a cut-and-cover tunnel was designed and constructed for the new road.

Peak Power Minimization for Clustered VLIW Architectures (분산된 VLIW 구조에서의 최대 전력 최소화 방법)

  • 서재원;김태환;정기석
    • Journal of KIISE:Computer Systems and Theory / v.30 no.5_6 / pp.258-264 / 2003
  • VLIW architecture has emerged as one of the most effective architectures for multimedia applications. Multimedia applications offer ample potential for executing multiple operations in parallel, because they typically involve data-intensive processing with limited data and/or control dependencies. As the degree of instruction-level parallelism increases, non-clustered VLIW architectures scale poorly because of the tremendous register port pressure. Therefore, a clustered VLIW architecture is definitely preferred over a non-clustered one when a higher degree of parallelization is possible, as in multimedia processing. However, having multiple clusters in an architecture implies a large amount of hardware, and power consumption therefore becomes a crucial issue. In this paper, we propose an algorithm that minimizes the peak power consumption while incurring little or no delay penalty. The effectiveness of our algorithm has been verified by various sets of experiments, and up to 30.7% reduction in the peak power consumption is observed compared with results optimized to minimize resources only.

Analyzing Fine-Grained Resource Utilization for Efficient GPU Workload Allocation (GPU 작업 배치의 효율화를 위한 자원 이용률 상세 분석)

  • Park, Yunjoo;Shin, Donghee;Cho, Kyungwoon;Bahn, Hyokyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.111-116 / 2019
  • Recently, GPUs have expanded their application domains from graphics processing to various kinds of parallel workloads. However, current GPU systems focus on maximizing each workload's parallelism through simplified control rather than considering the varied characteristics of workloads. This paper classifies the resource usage characteristics of GPU workloads into computing-bound, memory-bound, and dependency-latency-bound, and quantifies the fine-grained bottleneck for efficient workload allocation. For example, we identify the exact bottleneck resource, such as the single function unit, double function unit, or special function unit, even among workloads that are all computing-bound. Our analysis implies that workloads can be allocated together if their fine-grained bottleneck resources differ, even when they are all computing-bound, which can eventually contribute to efficient workload allocation in GPUs.

The Effects of Exercise Education Programs in Mentally-Handicapped Children (정신지체아의 운동교육 Program 적용효과)

  • Kim Sang-Su;Cheon Jae-Kyun
    • The Journal of Korean Physical Therapy / v.6 no.1 / pp.23-35 / 1994
  • This study surveyed the actual exercise function of 40 trainable mentally-handicapped children at Kummi Hyedang Special Education School in order to examine the effect of exercise education programs on their physical exercise function. Children with similar exercise function in the pre-examination were assigned to an experimental group and a control group, and the programs were applied for 10 weeks. The results are as follows. First, the tested exercise ability of trainable mentally-handicapped children is greatly delayed compared with that of mentally normal children in both the top and bottom tests. The 5 exercise functions classified by domain develop in the order of eyesight exercise, softness, physical strength, quickness, parallelism, and eye-hand interaction, and correspond to the level of children between 6 and 12 years old. In the 13 bottom tests, bean-bag throwing is equal to the level of a normal 12-year-old boy, and the broad jump and sitting position / bending forward / closing are also equal to the level of a 12-year-old boy. Standing on one leg is at the level of a 9-year-old and threading pearls at that of a 7-year-old, while transferring the wood building, picking the upper body up, walking board, balancing on one leg with eyes open, fist / opening palm / palm, and bending and opening the arms while prostrate on a chair are at a level below 6 years old. Second, carrying out the exercise education program has a great effect on trainable mentally-handicapped children. In the experimental group, exercise function is elevated to the middle level of normal 12-year-old children. Classified by test, the broad jump, training, and the bean-bag are far above the level of normal 12-year-old children, and are elevated to the level of an 11-year-old boy. 
Balancing on one leg with eyes closed is below the level of 10-year-old boys, and fist / opening palm / palm is at the level of 9-year-old boys. There-and-back running and picking the upper body up are at the level of 9-year-old girls. Walking board is at the level of 8-year-old boys. Bending and opening the arms while prostrate on a chair is at the level of a 7-year-old boy. Balancing on one leg with eyes open is elevated to the level of f-year-old girls. These functions are more balanced than in the pre-examination. In the control group, the bottom tests show little change: the exercise function remains at the pre-examination level, and physical strength and quickness regress. Third, the exercise function learned through the exercise education program by trainable mentally-handicapped children appears to be retained continuously. By domain, the learned function is retained for softness, physical strength, quickness, and eyesight training, while eye-hand interaction and parallelism are slightly delayed. By bottom test, threading pearls, transferring the wood building, throwing the bean-bag, sitting position / rolling forward / reaching, the broad jump, picking the upper body up, there-and-back running, balancing on one leg with eyes open, and bending and opening the arms while prostrate on a chair are retained. Fist / opening palm / palm and balancing on one leg with eyes open are slightly delayed. The change of body position is elevated. These results indicate that exercise education programs that suit the actual condition of mentally-handicapped children and are carried out with voluntary participation have a very positive effect. 
Therefore, to develop the body exercise function of trainable mentally-handicapped children, measures must be sought so that exercise education programs can be practiced actively and the children can experience more body exercise.