Search | Korea Science

Performance and Scalability of OpenMP Programs on Chip-MultiThreading Server (칩 멀티쓰레딩 서버에서 OpenMP 프로그램의 성능과 확장성)

Lee Myung-Ho;Kim Yong-Kyu
- The KIPS Transactions: Part A / v.13A no.2 s.99 / pp.137-146 / 2006
Shared Memory Multiprocessor (SMP) systems adopting Chip-level MultiThreading (CMT) technology are becoming mainstream servers for both commercial and High Performance Computing (HPC) applications. OpenMP has become the standard paradigm for parallelizing applications on SMP systems, mostly because of its ease of use. As the demand for computing power in HPC applications grows rapidly, obtaining high performance and scalability for applications parallelized with the OpenMP API will become more important. In this paper, we study the performance and scalability of SPEC OMPL, a standard OpenMP benchmark suite of HPC applications, on the Sun Fire E25K server, which adopts CMT technology. We also study the effect of CMT on SPEC OMPL.
https://doi.org/10.3745/KIPSTA.2006.13A.2.137

Design and implementation of the synchronization circuit for OFDM system without synchronization preambles (동기 프리엠블이 없는 OFDM 시스템의 동기회로 설계 및 구현)

남우춘;한영열
- The Journal of Korean Institute of Communications and Information Sciences / v.22 no.5 / pp.1045-1057 / 1997
In this paper, we propose a block synchronization algorithm that uses the data itself, without synchronization preambles. A block synchronization system employing the proposed algorithm is implemented on a DSP chip. The data spread of the DFT blocks is proportional to the offset of the DFT block, and this information is used to achieve block synchronization in the receiver. The initial block synchronization and the clock synchronization between transmitter and receiver are achieved using the early-late removal of the guard interval. The hardware implementation is carried out on the TMS320C30 DSP chip to verify the proposed block synchronization algorithm at a data rate of 1200 bps. The DSP chip calculates the spread of the 128-point complex FFT in the receiver with a system clock of 30 MHz. We believe that the proposed synchronization algorithm can be used in the design of OFDM block synchronization with a high-speed DSP chip.

Determination of Flow Stress and Cutting Force Prediction of Ti-6Al-4V Material for 3D Printer using S-K Constitutive Equation (S-K 구성방정식을 이용한 프린터용 3D Ti-6Al-4V 재료의 유동응력 결정 및 절삭력 예측)

Park, Dae-Gyoun;Kim, Tae-Ho;Jeon, Eon-Chan
- Journal of the Korean Society of Manufacturing Process Engineers / v.17 no.6 / pp.68-74 / 2018
Studies on Ti-6Al-4V have been carried out using cutting simulation, and research on predicting cutting force and chip shape has been actively conducted under various conditions. Meanwhile, 3D printing with Ti-6Al-4V metal powder as a high-power method has been studied for prototyping, mold modification, and product modification while lowering the material removal rate. However, for products and parts made by 3D printers using powder materials, degraded surface quality may cause problems at contact surfaces during tolerance management and assembly. As a result, even when a 3D printer is used, post-processing by cutting is essential for surface quality improvement and tolerance management. In cutting simulation, the cutting force and the chip shape have been predicted based on the Johnson-Cook constitutive equation, but the shape of the shear-type chip could not be predicted. To solve this problem, a damage term or strain softening term was added to the Johnson-Cook constitutive equation to predict the chip shape. In this paper, we applied the constants of the S-K equation to the cutting simulation to predict the cutting force, compared the predictions with experimental data to verify the validity of the cutting simulation, and analyzed the machining characteristics under the conditions considered.
https://doi.org/10.14775/ksmpe.2018.17.6.068

An Efficient Block Cipher Implementation on Many-Core Graphics Processing Units

Lee, Sang-Pil;Kim, Deok-Ho;Yi, Jae-Young;Ro, Won-Woo
- Journal of Information Processing Systems / v.8 no.1 / pp.159-174 / 2012
This paper presents a study on a high-performance design for a block cipher algorithm implemented on modern many-core graphics processing units (GPUs). Recent advances in VLSI technology make it feasible to fabricate multiple processing cores on a single chip and enable general-purpose computation on a GPU (GPGPU). The GPU strategy offers significant performance improvements for general-purpose computation and can support a broad variety of applications, including cryptography. We propose an efficient implementation of the encryption/decryption operations of a block cipher algorithm, SEED, on off-the-shelf NVIDIA many-core graphics processors. In a thorough experiment, we achieved performance capable of supporting a network speed of up to 9.5 Gbps on an NVIDIA GTX285 system (which has 240 processing cores). Our implementation provides up to 4.75 times higher encoding and decoding throughput than an Intel 8-core system.
https://doi.org/10.3745/JIPS.2012.8.1.159

HW/SW Co-design For an Ultrasonic Signal Processing System Using Zynq SoC (Zynq SoC를 이용한 초음파 신호처리 시스템 HW/SW co-design)

Lim, Byung gyu;Kang, Moon Ho
- Journal of the Institute of Electronics and Information Engineers / v.51 no.8 / pp.148-155 / 2014
In this research, a signal processing system is designed for detecting the ultrasonic signal envelope using Xilinx's Zynq SoC (system on chip). As a design tool, the Vivado IDE (integrated design environment) is used to hierarchically design the whole signal processing system. The proposed system consists of a Zynq-internal ADC, an FIR (finite impulse response) BPF (band pass filter), an absolute value calculator, an FIR LPF (low pass filter), and a Kalman filter. Under this configuration, two design schemes, a HW design scheme with the LPF as the final stage and a HW/SW co-design scheme with the Kalman filter as the final stage, are compared in terms of performance and efficiency. As a result, the envelope detection performance of the two schemes is shown to be almost the same, but the HW/SW co-design is verified to be much more efficient than the HW design, considering its much shorter system design time.
https://doi.org/10.5573/ieie.2014.51.8.148

An Implementation of the Fault Detection System in the RFID Tag Manufacturing Automation (RFID 태그 생산 공정 자동화를 위한 부적합품 검출 시스템의 구현)

Jung, Min-Po;Cho, Hyuk-Gyu;Jung, Deok-Gil
- Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.47-53 / 2011
In most Korean domestic RFID manufacturing companies, defective tags are detected manually after chip bonding, so there is demand to reduce the time and cost of manufacturing RFID tags. In this paper, we therefore implement a system that detects defective tags after the chip bonding process, providing a software basis for an automated defective-tag detection system, as requested by the Korean domestic RFID industry. We developed the system using UML in the modeling phase and Java in the implementation phase to reduce development cost and ease maintenance. Compared with manual processing, the developed system shows greatly improved processing speed and a perfect detection rate for defective tags.
https://doi.org/10.9708/jksci.2011.16.2.047

Implementation of a 3D Graphics Hardwired T&L Accelerator based on a SoC Platform for a Mobile System (SoC 플랫폼 기반 모바일용 3차원 그래픽 Hardwired T&L Accelerator 구현)

Lee, Kwang-Yeob;Koo, Yong-Seo
- Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.9 / pp.59-70 / 2007
In this paper, we propose an effective T&L (Transform & Lighting) processor architecture for a real-time 3D graphics acceleration SoC (System on a Chip) in a mobile system. We designed floating-point arithmetic IPs for the T&L processor and verified them using a SoC platform. The designed T&L processor uses a 24-bit floating-point data format and a 16-bit fixed-point data format, and supports a pipeline that balances the Transform and Lighting processes through parallel computation of 3D graphics. The pipeline delay when processing only the Transform operation is almost the same as the delay when processing both the Transform and Lighting operations. The designed T&L processor was implemented and verified on a SoC platform. It operates at 80 MHz on a Xilinx Virtex-4 FPGA, and its measured processing speed is 20M vertices/sec.

Real-time FCWS implementation using CPU-FPGA architecture (CPU-FPGA 구조를 이용한 실시간 FCWS 구현)

Han, Sungwoo;Jeong, Yongjin
- Journal of IKEEE / v.21 no.4 / pp.358-367 / 2017
Advanced Driver Assistance Systems (ADAS), such as the Front Collision Warning System (FCWS), are currently being developed. An FCWS requires high processing speed because it must operate in real time while driving. In addition, low power consumption is required for operation in an automotive embedded system. In this paper, an FCWS is implemented on a CPU-FPGA architecture in an embedded system to enable real-time processing. For lane detection, Inverse Perspective Mapping (IPM) and sliding window methods are used to achieve fast operation. For vehicle detection, a Convolutional Neural Network (CNN) with a high recognition rate, accelerated by parallel processing in the FPGA, is used. The proposed architecture was verified on an Intel Cyclone V SoC (System on Chip) FPGA, which combines a low-power ARM Cortex-A9 core with on-board FPGA fabric. The FCWS achieves 44 FPS at HD resolution, which is real time, and its energy efficiency is about 3.33 times higher than that of a high-performance PC environment.
https://doi.org/10.7471/ikeee.2017.21.4.358

Design of Image Signal Processor greatly reduced chip area by role sharing of hardware and software (하드웨어와 소프트웨어의 역할 분담을 통해 칩 면적을 크게 줄인 Image Signal Processor의 설계)

Park, Jung-Hwan;Park, Jong-Sik;Lee, Seong-Soo
- Journal of the Korea Institute of Information and Communication Engineering / v.14 no.8 / pp.1737-1744 / 2010
An image sensor requires various image processing steps to improve image quality, and these are performed by an ISP (Image Signal Processor). Conventional vision cameras implement their own ISP functions in software and run them on a PC instead of using commercial ISP chips. However, this approach suffers from a large computational load for image processing. In this paper, we propose an ISP whose chip area is significantly reduced by efficient role sharing between hardware and software. Large operation blocks are implemented in hardware for high performance, and hardware is used together with software in consideration of hardware size. The implemented ISP can process VGA (640×480) images and has a gate count of 91,450 in a 0.35 µm process.
https://doi.org/10.6109/jkiice.2010.14.8.1737

Implementation of A Real Time Watermark Embedding System for Copyright Protection of Digital Broadcasting Contents (디지털 방송 콘텐츠 저작권 보호를 위한 실시간 워터마크 삽입 시스템 구현)

Jeong, Yong-Jae;Park, Sung-Mo;Kim, Jong-Nam;Moon, Kwang-Seok
- Journal of the Institute of Convergence Signal Processing / v.10 no.2 / pp.100-105 / 2009
Watermarking for copyright protection of digital broadcasting content must operate in real time. In this paper, we propose a real-time video watermarking chip and system, a hardware-based watermark embedding system for SD/HD video. Our chip is implemented on a Stratix FPGA from Altera, and our system uses GS1560A and GS1532 devices from Gennum for HD/SD video signal processing. In a subjective quality evaluation comparing the original video with the watermarked one, watermarking caused little visual artifact. The embedded watermark was fully extracted after robustness tests against natural video attacks such as A/D conversion and MPEG compression. Our watermarking hardware system can be useful for movie production and broadcasting companies that require a real-time copyright protection system.

Search Result 808, Processing Time 0.03 seconds
