• Title/Summary/Keyword: hardware architecture

The Performance Analysis of GPU-based Cloth Simulation according to the Change of Work Group Configuration

  • Choi, Young-Hwan;Hong, Min;Lee, Seung-Hyun;Choi, Yoo-Joo
    • Journal of Internet Computing and Services / v.18 no.3 / pp.29-36 / 2017
  • Today, 3D dynamic simulation is closely related to many industries. In the past, physically based 3D simulation was used mainly in fields such as car crash testing and construction, but today it also plays an important role in movies and games. Representing 3D objects realistically requires many mathematical computations, and it is difficult for CPU-based applications to process such a large amount of calculation in real time. Recently, with advanced graphics hardware and improved architectures, GPUs can be used for general-purpose computation as well as for graphics, and GPU-based approaches have been applied in various research fields. In this paper, to optimize the performance of GPU-based cloth simulation, we analyze how the performance of two GPU-based cloth simulation algorithms varies with the execution properties of GPU shaders. Cloth simulation is implemented with a spring-centric algorithm and a node-centric algorithm, using GPU parallel computing with GLSL 4.3 compute shaders. We compare the performance of these algorithms as the size and dimension of the work group change. Each test is repeated 10 times over 5,000 frames, and the results are reported as average FPS. The experimental results show that the node-centric algorithm runs faster than the spring-centric algorithm.
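
The difference between the two algorithms can be sketched on the CPU, assuming a simple mass-spring cloth with linear (Hooke's law) springs and no damping; the paper's actual force model and GLSL shaders are not given, and the function names below are hypothetical. The spring-centric version assigns one work item per spring and scatters the force to both end nodes, which on a GPU means concurrent writes to shared nodes. The node-centric version assigns one work item per node and gathers forces from its incident springs, so each work item writes only its own result. Both should produce the same forces.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = Tuple[float, float]
Spring = Tuple[int, int, float]  # (node i, node j, rest length)


def hooke(p: Vec, q: Vec, rest: float, k: float) -> Vec:
    """Force exerted on the node at p by a linear spring attached to q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = math.hypot(dx, dy)
    scale = k * (length - rest) / length  # positive when stretched: pulls p toward q
    return (scale * dx, scale * dy)


def spring_centric(pos: List[Vec], springs: List[Spring], k: float) -> List[Vec]:
    """One work item per spring; each scatters its force to both end nodes."""
    force = [[0.0, 0.0] for _ in pos]
    for i, j, rest in springs:
        fx, fy = hooke(pos[i], pos[j], rest, k)
        # On a GPU these read-modify-writes can collide with other springs that
        # share node i or j, so they would need atomics or a multi-pass scheme.
        force[i][0] += fx
        force[i][1] += fy
        force[j][0] -= fx
        force[j][1] -= fy
    return [(f[0], f[1]) for f in force]


def node_centric(pos: List[Vec], springs: List[Spring], k: float) -> List[Vec]:
    """One work item per node; each gathers the forces of its incident springs."""
    incident: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for i, j, rest in springs:
        incident[i].append((j, rest))
        incident[j].append((i, rest))
    result: List[Vec] = []
    for n, p in enumerate(pos):
        fx = fy = 0.0
        for other, rest in incident[n]:
            gx, gy = hooke(p, pos[other], rest, k)
            fx += gx
            fy += gy
        result.append((fx, fy))  # each work item writes only its own node
    return result


pos = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.5), (2.0, 1.5)]
springs = [(0, 1, 1.0), (0, 2, 1.0), (1, 3, 1.0), (2, 3, 1.0), (0, 3, 2.0)]
print(node_centric(pos, springs, 10.0)[0])  # approximately (14.0, 8.0)
```

On the GPU, each iteration of the outer loop would become one compute-shader invocation, and the work group size declared with GLSL's `layout(local_size_x = ...) in;` controls how those invocations are batched, which is the parameter the paper varies.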

Implementation of a Windows NT Based Stream Server for Multimedia School Systems

  • 손주영
    • Journal of Korea Multimedia Society / v.2 no.3 / pp.277-288 / 1999
  • A distributed multimedia school system was developed for multimedia classrooms in high schools and universities. The system is designed and implemented to help students improve learning efficiency through personalized multimedia content and a personalized pace of learning. Previously developed multimedia information retrieval systems have several limitations when applied to the multimedia classroom: high cost per stream or retrieval quality too poor for education, lack of scalability in system and service, unfamiliar proprietary client environments, and difficulty for teachers in using the authoring tools and managing the authored teaching materials. The system we developed overcomes these problems. It scales from a single segmented classroom to the worldwide Internet. The system consists of stream servers, clients, a service gateway system, and an authoring management system; this paper describes the design and implementation of the stream server. A single stream server can simultaneously play back as many multimedia streams as there are clients in one classroom. This is achieved entirely by the software engine, without any change to the hardware architecture. Systematic coupling with the other components provides system scalability and service flexibility.
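
The abstract does not say how the software engine serves many streams at once. One common approach is to interleave fixed-size chunks of every active stream in round-robin order, so no client starves while another is being served. The sketch below shows only that interleaving, assuming each client's stream is already in memory; the function name and chunking policy are hypothetical and are not the paper's engine.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def round_robin_schedule(streams: Dict[str, bytes], chunk_size: int) -> List[Tuple[str, bytes]]:
    """Return (client, chunk) pairs in send order, one chunk per client per round."""
    queue: Deque[Tuple[str, bytes]] = deque(
        (client, data) for client, data in streams.items() if data
    )
    schedule: List[Tuple[str, bytes]] = []
    while queue:
        client, data = queue.popleft()
        chunk, rest = data[:chunk_size], data[chunk_size:]
        schedule.append((client, chunk))
        if rest:  # client still has data: put it at the back of the line
            queue.append((client, rest))
    return schedule


order = round_robin_schedule({"A": b"aaaa", "B": b"bb", "C": b"cccccc"}, 2)
print([client for client, _ in order])  # ['A', 'B', 'C', 'A', 'C', 'C']
```

In a real server each round would be paced to the streams' playback rates rather than sent as fast as possible.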

A 16 bit FPGA Microprocessor for Embedded Applications

  • 차영호;조경연;최혁환
    • Journal of the Korea Institute of Information and Communication Engineering / v.5 no.7 / pp.1332-1339 / 2001
  • SoC (System on Chip) technology is widely used in embedded systems because it provides high flexibility for a specific application domain. An important aspect of developing any new embedded system is verification, which usually requires lengthy hardware/software co-design. To reduce development cost and design effort, the instruction set of the microprocessor must be suitable for a high-level language compiler, and an FPGA prototype system should be built and tested for design verification. In this paper, we propose a 16-bit FPGA microprocessor for embedded applications, tentatively named EISC16, based on the EISC (Extendable Instruction Set Computer) architecture. The proposed EISC16 has a 16-bit fixed-length instruction set with short offsets and small immediate operands; a full 16-bit offset or immediate operand can be formed by using an extension register and an extension flag. We developed a cross C/C++ compiler and development software for the EISC16 by porting GNU software to an IBM PC and a SUN workstation, and compared the object code size produced by compiling the C/C++ standard library, concluding that EISC16 achieves higher code density than existing 16-bit microprocessors. The EISC16 requires approximately 6,000 gates when designed in RTL-level VHDL and synthesized for a Xilinx Virtex XCV300 FPGA. We also designed a test board consisting of the EISC16, ROM, RAM, an LED/LCD panel, a periodic timer, an input keypad, and an RS-232C controller. It works normally at a 7 MHz clock.
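
The extension mechanism can be modeled as a small piece of CPU state: a preceding instruction loads upper bits into an extension register and sets an extension flag, and the next instruction that carries a short immediate concatenates those bits with its own field. Without a pending extension, the short field is simply sign-extended. The sketch below assumes a single prefix, a 16-bit datapath, and hypothetical names (`ExtensionUnit`, `load_extension`, `operand`); the actual EISC16 field widths and opcodes are not given in the abstract.

```python
class ExtensionUnit:
    """Hypothetical model of an EISC-style extension register and flag."""

    def __init__(self) -> None:
        self.er = 0        # extension register (upper operand bits)
        self.flag = False  # set by a prefix instruction, consumed by the next one

    def load_extension(self, bits: int) -> None:
        """Prefix instruction: stash upper bits for the next immediate."""
        self.er = bits & 0xFFFF
        self.flag = True

    def operand(self, imm: int, imm_bits: int) -> int:
        """Return the 16-bit operand seen by the ALU for a short immediate field."""
        imm &= (1 << imm_bits) - 1
        if self.flag:
            value = (self.er << imm_bits) | imm  # concatenate ER with the short field
            self.flag = False                    # an extension applies only once
        else:
            sign = 1 << (imm_bits - 1)
            value = (imm ^ sign) - sign          # plain sign extension
        return value & 0xFFFF


unit = ExtensionUnit()
unit.load_extension(0x12)
print(hex(unit.operand(0x34, 8)))  # 0x1234
```

Keeping short immediates in every instruction and paying for a prefix only when a large constant is needed is what allows the fixed 16-bit encoding to stay compact.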

The Viterbi decoder implementation with efficient structure for real-time Coded Orthogonal Frequency Division Multiplexing

  • Hwang Jong-Hee;Lee Seung-Yerl;Kim Dong-Sun;Chung Duck-Jin
    • Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.2 s.332 / pp.61-74 / 2005
  • Digital Multimedia Broadcasting (DMB) is a reliable multi-service system for reception by mobile and portable receivers. Using the COFDM modulation scheme, a DMB system allows interference-free reception under multipath propagation and transmission errors, and it also requires powerful channel error-correction capability. The Viterbi decoder of a DMB receiver decodes a punctured convolutional code and requires a large amount of computation for real-time operation, so a high-speed, low-power hardware architecture for the Viterbi decoder is desirable. This paper proposes a combined add-compare-select (ACS) and path metric normalization (PMN) unit to reduce the computational burden. Because the comparison tree used for normalization is a structural weak point for high-speed operation, the proposed PMN architecture alleviates the critical-path problem by using a fixed value in the selection algorithm. The proposed ACS uses decomposition and pre-computation techniques to reduce the complexity of the adder, the comparator, and the multiplexer. Simulation results for the punctured Viterbi decoder of the DMB system show reductions of $3.78\%$ in area, $12.22\%$ in power consumption, and $23.80\%$ in maximum gate delay.
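
The roles of ACS and path metric normalization can be seen in a small hard-decision Viterbi decoder. This sketch assumes a textbook rate-1/2, constraint-length-3 code with generators 7 and 5 (octal) rather than DMB's punctured code, and it contrasts two normalization strategies: subtracting the exact minimum, which needs a comparison tree in hardware, and subtracting a fixed constant once all metrics have reached it, which is one reading of the "fixed value" idea. Normalization shifts every metric by the same amount, so both variants decode identically. All names are hypothetical.

```python
from typing import Callable, List

K = 3                        # assumed constraint length
GENERATORS = (0b111, 0b101)  # assumed generators 7, 5 (octal)
N_STATES = 1 << (K - 1)
INF = 10**9


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def branch_output(state: int, bit: int) -> List[int]:
    reg = (bit << (K - 1)) | state  # newest input bit in the MSB
    return [parity(reg & g) for g in GENERATORS]


def encode(bits: List[int]) -> List[int]:
    state, out = 0, []
    for b in bits:
        out.extend(branch_output(state, b))
        state = ((b << (K - 1)) | state) >> 1
    return out


def normalize_min(pm: List[int]) -> List[int]:
    """Subtract the exact minimum (a comparison tree in hardware)."""
    m = min(pm)
    return [p - m for p in pm]


def make_normalize_fixed(threshold: int) -> Callable[[List[int]], List[int]]:
    """Subtract a fixed constant once every metric has reached it."""
    def normalize(pm: List[int]) -> List[int]:
        # With a power-of-two threshold, hardware can test this with an AND
        # of the metrics' top bits instead of a comparison tree.
        if all(p >= threshold for p in pm):
            return [p - threshold for p in pm]
        return pm
    return normalize


def viterbi_decode(received: List[int], normalize: Callable[[List[int]], List[int]]) -> List[int]:
    pm = [0] + [INF] * (N_STATES - 1)  # encoder starts in state 0
    history = []
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_pm = [INF] * N_STATES
        survivor = [(0, 0)] * N_STATES
        for prev in range(N_STATES):
            for b in (0, 1):
                bm = sum(e != x for e, x in zip(branch_output(prev, b), r))  # Hamming
                nxt = ((b << (K - 1)) | prev) >> 1
                cand = pm[prev] + bm       # add
                if cand < new_pm[nxt]:     # compare
                    new_pm[nxt] = cand     # select
                    survivor[nxt] = (prev, b)
        pm = normalize(new_pm)
        history.append(survivor)
    state = min(range(N_STATES), key=pm.__getitem__)
    decoded = []
    for survivor in reversed(history):  # traceback
        state, b = survivor[state]
        decoded.append(b)
    return decoded[::-1]


msg = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
noisy = encode(msg)
noisy[3] ^= 1  # inject one channel bit error
print(viterbi_decode(noisy, make_normalize_fixed(1)) == msg)  # True
```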

An Emulation System for Efficient Verification of ASIC Design

  • 유광기;정정화
    • Journal of the Korean Institute of Telematics and Electronics C / v.36C no.10 / pp.17-28 / 1999
  • In this paper, an ASIC emulation system called ACE (ASIC Emulator) is proposed. It can produce a prototype of the target ASIC in a short time and verify the function of the ASIC circuit immediately. ACE consists of emulation software, which includes an EDIF reader, a library translator, a technology mapper, a circuit partitioner, and an LDF generator, and emulation hardware, which includes an emulation board and a logic analyzer. Technology mapping consists of three steps: circuit partitioning and extraction of logic functions, minimization of logic functions, and grouping of logic functions. During these steps, the number of basic logic blocks and the maximum number of logic levels are minimized by assigning outputs that share product terms and input variables to the same block as much as possible. The circuit partitioner produces chip-level netlists that satisfy constraints imposed by the routing structure of the emulation board as well as by the FPGA chip architecture. A new partitioning algorithm is proposed whose objective is to minimize the number of interconnections among FPGA chips and among groups of FPGA chips. The routing structure of the emulation board combines complete-graph and partial-crossbar structures to minimize the interconnection delay between FPGA chips regardless of circuit size. The logic analyzer displays the waveforms of user-designated probe signals on a PC monitor. To evaluate the performance of the proposed emulation system, a video quad-splitter, a commercial ASIC, was implemented on the emulation board. Experimental results show that it operated in real time at 14.3 MHz and functioned perfectly.
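
The two-level partitioning objective can be written as a cost function over a netlist. This sketch assumes that an "interconnection" is a net spanning more than one chip (or more than one chip group), and combines the two counts with hypothetical weights `w_chip` and `w_group`; the paper's exact cost and its search procedure are not given in the abstract. A partitioner would evaluate such a cost while moving cells between chips under capacity constraints.

```python
from typing import Dict, List, Tuple


def interconnect_cost(
    nets: List[List[str]],
    cell_to_chip: Dict[str, int],
    chip_to_group: Dict[int, int],
    w_chip: float = 1.0,
    w_group: float = 1.0,
) -> Tuple[int, int, float]:
    """Count nets cut between chips and between chip groups, plus a weighted total."""
    chip_cut = 0
    group_cut = 0
    for net in nets:
        chips = {cell_to_chip[cell] for cell in net}
        groups = {chip_to_group[chip] for chip in chips}
        if len(chips) > 1:   # net needs at least one chip-to-chip wire
            chip_cut += 1
        if len(groups) > 1:  # net also crosses a group boundary
            group_cut += 1
    return chip_cut, group_cut, w_chip * chip_cut + w_group * group_cut


nets = [["a", "b"], ["b", "c", "d"], ["d", "e"], ["a", "e"]]
cells = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
groups = {0: 0, 1: 0, 2: 1}
print(interconnect_cost(nets, cells, groups))  # (3, 2, 5.0)
```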

Advanced Calendar Queue Scheduler Design Methodology

  • Kim, Jin-Sil;Chung, Won-Young;Lee, Jung-Hee;Lee, Yong-Surk
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.12B / pp.1380-1386 / 2009
  • In this paper, we propose a CQS (Calendar Queue Scheduler) architecture designed for processing multimedia and timing traffic in home networks. As traffic with diverse characteristics, such as VoIP, VOD, IPTV, and best-effort traffic, increasingly flows into the home, the need for QoS (Quality of Service) management is being discussed. Grouping traffic by application or service is an effective way to guarantee QoS under such constrained conditions. The proposed design targets the home gateway, which corresponds to the receiver-side end point in end-to-end QoS, and is suited to supporting multimedia traffic with limited network resources and optimized queue sizes. We then simulated the area of each module and each memory. The area of each module is expressed in equivalent two-input NAND gates (NAND $2{\times}1$ gate area: 11.09) when synthesized with Magnachip 0.18 µm CMOS libraries using Synopsys Design Compiler. We found that memory accounts for 85.38% of the entire CQS area. The size of each memory was extracted with CACTI 5.3 (in mm²). As the number of memory entries increases, the memory area grows progressively, and the choice of day size for one year clearly affects the total CQS area. We also discuss the design methodology and the operation of each module for a hardware CQS implementation.
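
For readers unfamiliar with the "day" and "year" terminology, the sketch below models the classic software calendar queue that such schedulers build on: `num_days` buckets of width `day_width`, so one year spans `num_days * day_width` time units, and an event at time t goes into bucket `(t // day_width) % num_days`. This is a behavioral model under that assumption, not the paper's hardware CQS; the class and parameter names are hypothetical. In hardware, `num_days` roughly sets the number of memory entries, which is why the day size chosen for one year drives the total area.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class CalendarQueue:
    """Software calendar queue: num_days buckets, each covering day_width time units."""

    def __init__(self, num_days: int, day_width: float) -> None:
        self.num_days = num_days
        self.day_width = day_width
        self.days: List[List[Tuple[float, int, Any]]] = [[] for _ in range(num_days)]
        self.count = 0
        self.seq = itertools.count()  # tie-breaker so payloads are never compared
        self.cur_day = 0
        self.day_end = day_width      # end time of the day currently being served

    def push(self, time: float, item: Any) -> None:
        day = int(time // self.day_width) % self.num_days
        heapq.heappush(self.days[day], (time, next(self.seq), item))
        self.count += 1

    def _take(self, day: int, day_end: float) -> Tuple[float, Any]:
        time, _, item = heapq.heappop(self.days[day])
        self.cur_day, self.day_end = day, day_end
        self.count -= 1
        return time, item

    def pop(self) -> Tuple[float, Any]:
        """Remove the earliest event; assumes no event is pushed into the past."""
        if self.count == 0:
            raise IndexError("pop from empty calendar queue")
        day, day_end = self.cur_day, self.day_end
        for _ in range(self.num_days):  # scan at most one year ahead
            bucket = self.days[day]
            if bucket and bucket[0][0] < day_end:  # event falls on this year's day
                return self._take(day, day_end)
            day = (day + 1) % self.num_days
            day_end += self.day_width
        # Nothing due within one year: jump directly to the globally earliest event.
        _, day = min((b[0], d) for d, b in enumerate(self.days) if b)
        time = self.days[day][0][0]
        return self._take(day, (time // self.day_width + 1) * self.day_width)


cq = CalendarQueue(num_days=4, day_width=10.0)
for t, name in [(25.0, "c"), (3.0, "a"), (12.0, "b"), (95.0, "d"), (41.0, "e")]:
    cq.push(t, name)
print([cq.pop()[1] for _ in range(5)])  # ['a', 'b', 'c', 'e', 'd']
```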

A Study of Future Internet Testbed Construction using NetFPGA/OpenFlow Switch on KOREN/KREONET

  • Park, Man-Kyu;Jung, Whoi-Jin;Lee, Jae-Yong;Kim, Byung-Chul
    • Journal of the Institute of Electronics Engineers of Korea TC / v.47 no.7 / pp.109-117 / 2010
  • Building a large-scale testbed for the Future Internet is very important for evaluating new protocols and new network architectures designed with a clean-slate approach. In Korea, a new Future Internet testbed project called FIRST (Future Internet Research for Sustainable Testbed) started in March 2009 to design and test new protocols. The project is a collaboration between ETRI and five universities. FIRST@PC aims to implement a virtualized, hardware-accelerated PC node by extending the functions of the NetFPGA card, and to build a Future Internet testbed on KOREN and KREONET for evaluating newly designed protocols and applications. In this paper, we first briefly introduce the FIRST@PC project and explain a 'MAC in IP Capsulator', a user-space program that uses raw sockets in Linux to interconnect OpenFlow-enabled switch sites on KOREN and KREONET. We then present TCP throughput test results for varying packet sizes. The results show that the software-based capsulator supports reasonable bandwidth for most applications.
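
The abstract does not describe the capsulator's packet format. As a hedged illustration, the sketch below borrows the EtherIP format (RFC 3378: IP protocol 97 with a 2-byte header whose version field is 3) to wrap a whole Ethernet frame inside an IPv4 packet and to unwrap it at the far site. The function names, TTL, and flag choices are hypothetical and only show the byte-level idea of MAC-in-IP tunneling.

```python
import socket
import struct

ETHERIP_PROTO = 97                       # IP protocol number for EtherIP (RFC 3378)
ETHERIP_HDR = struct.pack("!H", 0x3000)  # version 3, reserved bits zero


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header checksum."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(frame: bytes, src_ip: str, dst_ip: str, ident: int = 0) -> bytes:
    """Wrap an Ethernet frame in IPv4 + EtherIP headers."""
    payload = ETHERIP_HDR + frame
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(payload),  # version/IHL, TOS, total length
        ident, 0,                    # identification, flags/fragment offset
        64, ETHERIP_PROTO, 0,        # TTL, protocol, checksum placeholder
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
    )
    checksum = ipv4_checksum(header)
    header = header[:10] + struct.pack("!H", checksum) + header[12:]
    return header + payload


def decapsulate(packet: bytes) -> bytes:
    """Validate the outer headers and return the inner Ethernet frame."""
    ihl = (packet[0] & 0x0F) * 4
    if packet[9] != ETHERIP_PROTO:
        raise ValueError("not an EtherIP packet")
    if ipv4_checksum(packet[:ihl]) != 0:
        raise ValueError("bad IPv4 header checksum")
    total_len = struct.unpack("!H", packet[2:4])[0]
    payload = packet[ihl:total_len]
    if payload[:2] != ETHERIP_HDR:
        raise ValueError("unexpected EtherIP header")
    return payload[2:]


frame = bytes(range(60))  # stand-in for a captured Ethernet frame
packet = encapsulate(frame, "10.0.0.1", "10.0.0.2")
print(len(packet), decapsulate(packet) == frame)  # 82 True
```

To send such packets from user space, one would typically open `socket.socket(socket.AF_INET, socket.SOCK_RAW, 97)` with the `IP_HDRINCL` option set, which requires root privileges on Linux; the sketch itself only builds and parses bytes.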

Performance Analysis for the IPC Interface Part in a Distributed ATM Switching Control System

  • Yeo, Hwan-Geun;Song, Kwang-Suk;Ro, Soong-Hwan;Ki, Jang-Geun
    • Journal of the Korean Institute of Telematics and Electronics S / v.35S no.6 / pp.25-35 / 1998
  • The control system architecture of switching systems has undergone numerous changes to provide the various call-processing capabilities needed for telecommunication services. During call processing in a distributed switching control environment, the delay caused by communication among main processors or peripheral controllers is one of the limiting factors of system performance. In this paper, we propose a performance model for the IPC (Inter-Processor Communication) interface hardware block required for ATM cell-based message processing in a distributed ATM switching system, and use simulation to analyze the primary causes that affect processor performance. The simulation results, including the utilization of each processing component and the queue length and processing delay as functions of the input message rate, show that among the components (resources) involved in the IPC scheme, the local CPU is the bottleneck limiting maximum system performance. We also give results such as the maximum message-processing capacity as a function of local CPU performance and the maximum local CPU throughput as a function of average message length, which can serve as reference data for improving or expanding ATM control systems.
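
The kind of bottleneck reasoning reported above can be approximated analytically before running a simulation. The sketch below applies the utilization law (utilization = arrival rate × service demand) to identify the bottleneck and its saturation rate, and it models each component as an independent M/M/1 queue to estimate response time. That queueing assumption is a swapped-in simplification, not the paper's simulation model. The service demands and the linear CPU-cost model (fixed cost plus per-byte cost) are hypothetical numbers.

```python
import math
from typing import Dict, Tuple


def bottleneck_analysis(
    demand_s: Dict[str, float], rate_per_s: float
) -> Tuple[Dict[str, float], str, float, Dict[str, float]]:
    """Utilization per component, bottleneck, max message rate, M/M/1 response time."""
    utilization = {name: rate_per_s * d for name, d in demand_s.items()}
    bottleneck = max(demand_s, key=demand_s.get)  # largest service demand saturates first
    max_rate = 1.0 / demand_s[bottleneck]
    response = {
        name: (d / (1.0 - utilization[name]) if utilization[name] < 1.0 else math.inf)
        for name, d in demand_s.items()
    }
    return utilization, bottleneck, max_rate, response


def local_cpu_demand(avg_msg_bytes: float, fixed_s: float, per_byte_s: float) -> float:
    """Hypothetical linear cost model: per-message overhead plus per-byte handling."""
    return fixed_s + per_byte_s * avg_msg_bytes


demands = {"local_cpu": 0.0005, "cell_assembly": 0.0001, "dma": 0.0002}  # seconds/message
util, bottleneck, max_rate, resp = bottleneck_analysis(demands, rate_per_s=1000.0)
print(bottleneck, round(max_rate), round(util["local_cpu"], 3))  # local_cpu 2000 0.5
```

Sweeping `avg_msg_bytes` through `local_cpu_demand` and taking `1 / demand` gives a curve of maximum local CPU throughput against message length, analogous in shape to the reference data the abstract mentions.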

A Fast Inversion for Low-Complexity System over $GF(2^m)$

  • Kim, So-Sun;Chang, Nam-Su;Kim, Chang-Han
    • Journal of the Institute of Electronics Engineers of Korea SD / v.42 no.9 s.339 / pp.51-60 / 2005
  • The efficiency of a cryptosystem is determined mainly by the efficiency of its underlying finite field arithmetic, and among the basic finite field operations, multiplicative inversion is the most time-consuming. In this paper, we propose a fast inversion algorithm in the finite field $GF(2^m)$ with a standard basis representation, based on the extended binary GCD algorithm (EBGA). The proposed algorithm executes about $18.8\%$ and $45.9\%$ fewer iterations than EBGA and the Montgomery inverse algorithm (MIA), respectively. In practical applications where the field dimension is large or may vary, systolic array structures based on previous algorithms become costly in area and time complexity, or even impractical, and are therefore unsuitable for lightweight, low-power systems such as smart cards and mobile phones. We propose a new hardware architecture that provides an area-efficient, synchronized inverter for such low-complexity systems. It requires fewer addition and reduction operations than previous architectures for computing inverses in $GF(2^m)$. Furthermore, the proposed inversion can be applied over either prime or binary extension fields, specifically $GF(2^m)$ and $GF(p)$.
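
As a reference point, the sketch below implements the standard binary (extended binary GCD style) inversion algorithm for $GF(2^m)$ in polynomial basis, with polynomials stored as Python integers (bit i is the coefficient of $x^i$). It maintains the invariants $a \cdot g_1 \equiv u$ and $a \cdot g_2 \equiv v \pmod{f}$, dividing by $x$ whenever $u$ or $v$ is even and adding the lower-degree operand to the higher one otherwise. This is the baseline family the paper improves on, not the paper's proposed algorithm. The example field is the AES field with $f(x) = x^8 + x^4 + x^3 + x + 1$ (`0x11B`), chosen only because its inverses are easy to check.

```python
def gf2m_mul(a: int, b: int, f: int) -> int:
    """Multiply a*b in GF(2^m) defined by irreducible polynomial f (shift-and-add)."""
    m = f.bit_length() - 1
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:  # degree reached m: reduce modulo f
            a ^= f
    return result


def gf2m_inverse(a: int, f: int) -> int:
    """Binary inversion in GF(2^m): returns g with a*g = 1 (mod f)."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse")
    u, v = a, f
    g1, g2 = 1, 0  # invariants: a*g1 = u, a*g2 = v (mod f)
    while u != 1 and v != 1:
        while u & 1 == 0:  # x divides u
            u >>= 1
            g1 = g1 >> 1 if g1 & 1 == 0 else (g1 ^ f) >> 1  # divide g1 by x mod f
        while v & 1 == 0:  # x divides v
            v >>= 1
            g2 = g2 >> 1 if g2 & 1 == 0 else (g2 ^ f) >> 1
        if u.bit_length() > v.bit_length():  # deg(u) > deg(v)
            u ^= v
            g1 ^= g2
        else:
            v ^= u
            g2 ^= g1
    return g1 if u == 1 else g2


AES_POLY = 0x11B
print(hex(gf2m_inverse(0x53, AES_POLY)))  # 0xca
```

Each iteration uses only shifts and XORs, which is why algorithms of this family map well onto compact, area-efficient hardware.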

An Empirical Study on Linux I/O stack for the Lifetime of SSD Perspective

  • Jeong, Nam Ki;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.9 / pp.54-62 / 2015
  • Although NAND flash-based SSDs (Solid-State Drives) provide superior performance compared with HDDs (Hard Disk Drives), they have a major drawback in write endurance. As a result, SSD lifetime depends on the workload, which is a significant challenge given current technology trends such as the shift from SLC (Single-Level Cell) to MLC (Multi-Level Cell) and even TLC (Triple-Level Cell). Most previous studies have addressed wear leveling or improving SSD lifetime through hardware architecture. In this paper, we propose the optimal host I/O stack configuration, focusing on the file system, the I/O scheduler, and link power management, using JEDEC enterprise workloads and evaluating it by WAF (Write Amplification Factor), which measures how efficiently host writes are processed into flash memory from the SSD-lifetime perspective. Experimental analysis shows that the optimal I/O stack configuration for SSD lifetime is MinPower-Dead-XFS, which prolongs SSD lifetime approximately 2.6 times compared with MaxPower-Cfq-Ext4, the best-performing combination. Although performance is reduced by 13%, this result shows that I/O stack optimization has a considerable effect on SSD lifetime.