• Title/Abstract/Keywords: code optimization

Search Result 480, Processing Time 0.023 seconds

Efficient Design Methodology based on Hybrid Logic Synthesis for SoC (효율적인 SoC 논리합성을 위한 혼합방식의 설계 방법론)

  • Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.3 / pp.571-578 / 2012
  • In this paper, we address two main points. The first is the set of constraints for logic synthesis, and the second is an efficient logic synthesis method. Logic synthesis is the process of obtaining a gate-level netlist from RTL (register transfer level) code using logic mapping and optimization under specified constraints. The result of logic synthesis depends heavily on the constraints and on the synthesis method. Since size and timing can change dramatically with these, they must be considered carefully. In this paper, we present the items to consider during logic synthesis, based on our experience and experimental results. The proposed techniques were applied to a circuit with hardware resources of about 650K gates. The synthesis time of the hybrid method was 47% shorter than that of the bottom-up method, and its timing slack was better than that of the top-down method.

A New Register Allocation Technique for Performance Enhancement of Embedded Software (내장형 소프트웨어의 성능 향상을 위한 새로운 레지스터 할당 기법)

  • Jong-Yeol, Lee
    • Journal of the Institute of Electronics Engineers of Korea SD / v.41 no.10 / pp.85-94 / 2004
  • In this paper, a register allocation technique that translates memory accesses to register accesses is presented to enhance embedded software performance. In the proposed method, the source code is profiled to generate a memory trace. From the profiling results, target functions with high dynamic call counts are selected, and the proposed register allocation technique is applied only to these target functions to save compilation time. The memory trace of the target functions is searched for memory accesses that reduce the cycle count when replaced by register accesses, and these are translated to register accesses by modifying the intermediate code and allocating promotion registers. Experiments that measure performance in terms of cycle count on the MediaBench and DSPstone benchmark programs show that the proposed method increases performance by 14% and 18% on average for ARM and MCORE, respectively.

A Study on Clustered OFCDM with Transmit Antenna Diversity and Coding Associated with Frequency Spreading over Frequency Selective Fading Channel (주파수 선택적 페이딩 채널에서 주파수 확산과 결합된 코딩과 송신안테나 다이버시티를 가진 Clustered OFCDM 시스템에 관한 연구)

  • Ryu Kwan-Woong;Park Yong-Wan
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.3A / pp.267-273 / 2006
  • This paper improves the clustered OFCDM scheme by considering the frequency diversity effect over a frequency selective fading channel. In OFCDM with frequency-domain spreading, compared with OFDM, the number of uncorrelated symbols can be increased by a frequency allocation method for the corresponding symbols over the same antenna and different antennas after spreading. The simulation results show that the performance of the proposed system is improved by approximately 4 dB at $\sigma = 0.02\ \mu$s and by approximately 2.5 dB at large delay spread, in a 12-path Rayleigh fading channel with the overall root mean squared delay spread and a maximum Doppler frequency of 20 Hz. Also, the required average received Eb/No at an average BER of $10^{-3}$ with the optimum method is improved by approximately 2.0 dB compared with that of STA-OFCDM with frequency rearrangement. The new method requires neither bandwidth expansion nor feedback from the receiver to the transmitter, and its computational complexity is similar to that of clustered OFCDM.

Adaptive Execution Techniques for Parallel Programs (병렬 프로그램의 적응형 실행 기법)

  • 이재진
    • Journal of KIISE: Computer Systems and Theory / v.31 no.8 / pp.421-431 / 2004
  • This paper presents adaptive execution techniques that determine whether parallelized loops are executed in parallel or sequentially in order to maximize performance. The adaptation and performance estimation algorithms are implemented in a compiler preprocessor. The preprocessor inserts code that automatically determines, at compile time or at run time, how the parallelized loops are executed. Using a set of standard numerical applications written in Fortran77 and running them with our techniques on a distributed shared memory multiprocessor machine (SGI Origin2000), we find that our techniques are, on average, 26%, 20%, 16%, and 10% faster than the original parallel program on 32, 16, 8, and 4 processors, respectively. One of the applications runs more than twice as fast as its original parallel version on 32 processors.
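The run-time decision described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's preprocessor or its estimation algorithm: the cost model (per-iteration cost divided by the worker count, plus a fixed parallel start-up overhead), the names `choose_mode` and `adaptive_map`, and all default values are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def choose_mode(n_iters: int, iter_cost: float, workers: int, overhead: float) -> str:
    """Pick an execution mode from a simple, assumed cost model."""
    sequential_time = n_iters * iter_cost
    # Assumed model: perfect speedup plus a fixed start-up overhead.
    parallel_time = sequential_time / workers + overhead
    return "parallel" if parallel_time < sequential_time else "sequential"


def adaptive_map(
    body: Callable[[int], int],
    n_iters: int,
    iter_cost: float = 1e-6,   # hypothetical estimated seconds per iteration
    workers: int = 4,          # hypothetical number of workers
    overhead: float = 1e-3,    # hypothetical parallel start-up cost in seconds
) -> Tuple[str, List[int]]:
    """Run body(i) for i in range(n_iters), in parallel only if it is estimated to pay off."""
    mode = choose_mode(n_iters, iter_cost, workers, overhead)
    if mode == "parallel":
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(body, range(n_iters)))  # map preserves order
    else:
        results = [body(i) for i in range(n_iters)]
    return mode, results


def square(i: int) -> int:
    return i * i


small_mode, small_results = adaptive_map(square, 100)
large_mode, large_results = adaptive_map(square, 10_000)
```

The sketch shows only the decision logic: a short loop stays sequential because the estimated overhead outweighs the gain, while a long loop is dispatched to the pool. Python threads do not speed up CPU-bound loop bodies, so the timing benefit reported in the abstract is not reproduced here.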

A Tabu Search Algorithm for Node Reprogramming in Wireless Sensor Networks (무선 센서 네트워크에서 노드 재프로그래밍을 위한 타부 서치 알고리즘)

  • Jang, Kil-woong
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.5 / pp.596-603 / 2019
  • A reprogramming operation is necessary to update the software code of the node to change or update the functionality of the deployed node in wireless sensor networks. This paper proposes an optimization algorithm that minimizes the transmission energy of a node for the purpose of reprogramming a node in wireless sensor networks. We also design an algorithm that keeps energy consumption of all nodes balanced in order to maintain the lifetime of the network. In this paper, we propose a Tabu search algorithm with a new neighborhood generation method for minimizing transmission energy and energy consumption in wireless sensor networks with many nodes. The proposed algorithm is designed to obtain optimal results within a reasonable execution time. The performance of the proposed Tabu search algorithm was evaluated in terms of the node's transmission energy, remaining energy, and algorithm execution time. The performance evaluation results showed better performance than the previous methods.
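As a rough illustration of the general Tabu search idea mentioned in this abstract, the following Python sketch minimizes a toy energy cost. It uses a plain bit-flip neighborhood, not the paper's new neighborhood generation method, and the problem data (`ENERGY`, `COVERS`, `PENALTY`) and the function names are hypothetical.

```python
from collections import deque
from typing import Callable, List, Sequence, Tuple

# Hypothetical toy instance: x[i] = 1 means node i rebroadcasts the code image.
ENERGY = [4, 2, 3, 5, 1]                              # transmission energy per node
COVERS = [{0, 1, 2}, {1, 3}, {2, 4}, {3, 4}, {4}]     # nodes reached by each sender
PENALTY = 100                                         # cost per node left unreached


def energy_cost(x: Sequence[int]) -> int:
    """Total transmission energy plus a penalty for every unreached node."""
    reached = set()
    for i, selected in enumerate(x):
        if selected:
            reached |= COVERS[i]
    unreached = len(ENERGY) - len(reached)
    return sum(e for e, s in zip(ENERGY, x) if s) + PENALTY * unreached


def tabu_search(
    cost: Callable[[Sequence[int]], int],
    start: Sequence[int],
    iterations: int = 20,
    tenure: int = 3,
) -> Tuple[List[int], int]:
    """Minimize cost over bit vectors using single bit flips and a short tabu list."""
    current = list(start)
    best, best_cost = list(current), cost(current)
    tabu = deque(maxlen=tenure)  # positions flipped most recently
    for _ in range(iterations):
        candidates = []
        for i in range(len(current)):
            neighbor = current.copy()
            neighbor[i] ^= 1
            c = cost(neighbor)
            # Aspiration: a tabu move is allowed if it beats the best cost so far.
            if i not in tabu or c < best_cost:
                candidates.append((c, i, neighbor))
        if not candidates:
            break
        # Move to the best admissible neighbor, even if it is worse than current.
        c, i, current = min(candidates, key=lambda t: (t[0], t[1]))
        tabu.append(i)
        if c < best_cost:
            best, best_cost = list(current), c
    return best, best_cost


best_plan, best_energy = tabu_search(energy_cost, [0, 0, 0, 0, 0])
```

Accepting the best admissible neighbor even when it is worse than the current solution, while forbidding recently reversed moves, is what lets the search leave a local minimum. In this toy instance it moves past the first full-coverage plan (cost 9) to reach the plan `[1, 1, 0, 0, 1]` with cost 7.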

Discrete element analysis for design modification of leveling blade on motor grader vehicle (모터 그레이더 평탄작업용 블레이드의 설계개선을 위한 개별요소법 해석)

  • Song, Chang-Heon;Oh, Joo-Young;Cho, Jung-Woo;Kim, Mun-Gyu;Seok, Jeong-Ho
    • Journal of Korean Tunnelling and Underground Space Association / v.23 no.6 / pp.423-438 / 2021
  • The blade of a motor grader is used for scattering and leveling aggregates on the foundation of a road construction site. This paper presents design improvement research on the blade part to enhance the working efficiency of motor graders. The scattering of aggregates by blade driving was simulated with a dynamic DEM (discrete element method) code. Four design parameters were selected, and a specific leveling scenario for the simulation was determined. Nine blade models were tested numerically, and the sensitivity of each factor was analyzed. Next, the design factors that influence blade performance were selected by ANOVA, and these key design factors were applied to the progressive quadratic response surface method (PQRSM). Finally, the optimum set of design factors for the blade was proposed.

Design and Implementation of High-Performance Cryptanalysis System Based on GPUDirect RDMA (GPUDirect RDMA 기반의 고성능 암호 분석 시스템 설계 및 구현)

  • Lee, Seokmin;Shin, Youngjoo
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.6 / pp.1127-1137 / 2022
  • Cryptographic analysis and decryption technology utilizing the parallel operation of GPUs has been studied with the aim of shortening the computation time of password analysis systems. These studies focus on optimizing the code to improve the speed of cryptographic analysis operations on a single GPU, or on simply increasing the number of GPUs to enhance parallel operation. However, using a large number of GPUs without optimizing data transmission causes longer data transmission latency than using a single GPU and increases the overall computation time of the cryptographic analysis system. In this paper, we investigate GPUDirect RDMA and related technologies for high-performance data processing in deep learning or HPC research fields in GPU clustering environments. In addition, we present a method of designing a high-performance cryptanalysis system using these technologies. Furthermore, based on the suggested system topology, we present a method of implementing a cryptanalysis system using password cracking and GPU reduction. Finally, performance evaluation results are presented for the case in which the high-performance technology is applied to the implemented cryptanalysis system, and the expected effects of the proposed system design are shown.

Optimized Implementation of Lightweight Block Cipher PIPO Using T-Table (T-table을 사용한 경량 블록 암호 PIPO의 최적화 구현)

  • Minsig Choi;Sunyeop Kim;Insung Kim;Hanbeom Shin;Seonggyeom Kim;Seokhie Hong
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.3 / pp.391-399 / 2023
  • In this paper, we present for the first time an implementation using T-tables for PIPO-64/128, 256, which are lightweight block ciphers. While our proposed implementation requires 16 T-tables, we show that the two types of T-tables are circulant and obtain variant implementations that require a smaller number of T-tables. We then discuss the trade-off between the number of required T-tables (code size) and throughput by evaluating the throughput of the variant implementations on an Intel Core i7-9700K processor. The throughput-optimized versions for PIPO-64/128, 256 provide better throughput than the TLU (Table-Look-Up) reference implementation by factors of 3.11 and 2.76, respectively, and than the bit-slice reference implementation by factors of 3.11 and 2.76, respectively.

Lip-Synch System Optimization Using Class Dependent SCHMM (클래스 종속 반연속 HMM을 이용한 립싱크 시스템 최적화)

  • Lee, Sung-Hee;Park, Jun-Ho;Ko, Han-Seok
    • The Journal of the Acoustical Society of Korea / v.25 no.7 / pp.312-318 / 2006
  • The conventional lip-synch system has a two-step process: speech segmentation and recognition. However, the difficulty of the speech segmentation procedure and the inaccuracy of the training data set caused by segmentation lead to significant performance degradation in the system. To cope with this, a connected vowel recognition method using the Head-Body-Tail (HBT) model is proposed. The HBT model, which is appropriate for handling relatively small vocabulary tasks, reflects the co-articulation effect efficiently. Moreover, the 7 vowels are merged into 3 classes with similar lip shapes, and the system is optimized by employing a class dependent SCHMM structure. Additionally, at both ends of each word, where variations are large, an 8-component Gaussian mixture model is used directly to improve the ability of representation. Although the proposed method shows performance similar to that of the CHMM based on the HBT structure, the number of parameters is reduced by 33.92%. This reduction makes it a computationally efficient method that enables real-time operation.

Optimization of Cookies Prepared with Hizikia fusiformis Powder Using Response Surface Methodology (톳 분말 첨가 쿠키의 최적화)

  • Kim, Hyun-Sook;Shin, Eun-Soo;Lyu, Eun-Soon
    • Korean journal of food and cookery science / v.26 no.5 / pp.627-635 / 2010
  • This study was performed to determine the optimal manufacturing conditions of cookies prepared with Hizikia fusiformis. The variables in Hizikia fusiformis cookie production were the amounts of Hizikia fusiformis powder, butter, and sugar. Six sensory characteristics were used for sensory evaluations: color, appearance, taste, brittleness, hardness, and overall acceptability. According to the response surface method, the optimal amounts of the three variables were 6.00 g of Hizikia fusiformis powder, 100.00 g of butter, and 70.00 g of sugar per 200 g of flour, which satisfy a target sensory score (7.0/9.0). Cookies with these optimal amounts of Hizikia fusiformis powder, butter, and sugar were tasted by 114 consumers and showed a high acceptability score (7.21). Females preferred the flavor of Hizikia fusiformis significantly more than males did (p<0.05). Older consumers ($\geq 30$ years old) scored significantly higher than younger consumers ($\leq 29$ years old) in texture (p<0.05). Spread ratio and hardness scores of Hizikia fusiformis cookies were lower than those of control cookies. Color analysis results of Hizikia fusiformis cookies were significantly lower in brightness and redness, and higher in yellowness, than those of control cookies (p<0.01).