• Title/Summary/Keyword: 메모리 효율

Search Result 1,782, Processing Time 0.027 seconds

Design and Implementation of the Central Queue Based Loop Scheduling Method (중앙 큐 기반의 루프 스케쥴링 기법의 설계 및 구현)

  • Kim, Hyun-Chul;Kim, Hyo-Cheol;Yoo, Kee-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.38 no.5
    • /
    • pp.16-26
    • /
    • 2001
  • In this paper, we present a new scheduling method called CDSS(Carried-Dependence Self-Scheduling) for efficiently execution of the loop with intra dependency between iterations based on the central queue. We also implemented it on shared memory system using Java language. Also, we study the modification that converts the existing self-scheduling method based on the central task queue for parallel loops onto the same form applied to loop with loop-carried dependences. The proposed method is self scheduling and assigns the loops in three-level considering the synchronization point according to the dependence distance of the loops. To adapt the proposed scheme and modified methods into various platforms, including a uni-processor system, we use threads for implementation. Compared to other assignment algorithms with various changes of application and system parameters, CDSS is found to be more efficient than other methods in overall execution time including scheduling overheads. CDSS shows improved performance over modified SS, Factoring, GSS and CSS by about 0.02, 40.5, 46.1 and 53.6%, respectively. In CDSS, we achieve the best performance on varying application programs using a few threads, which equal the dependence distance.

  • PDF

Fast Motion Estimation for Variable Motion Block Size in H.264 Standard (H.264 표준의 가변 움직임 블록을 위한 고속 움직임 탐색 기법)

  • 최웅일;전병우
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.41 no.6
    • /
    • pp.209-220
    • /
    • 2004
  • The main feature of H.264 standard against conventional video standards is the high coding efficiency and the network friendliness. In spite of these outstanding features, it is not easy to implement H.264 codec as a real-time system due to its high requirement of memory bandwidth and intensive computation. Although the variable block size motion compensation using multiple reference frames is one of the key coding tools to bring about its main performance gain, it demands substantial computational complexity due to SAD (Sum of Absolute Difference) calculation among all possible combinations of coding modes to find the best motion vector. For speedup of motion estimation process, therefore, this paper proposes fast algorithms for both integer-pel and fractional-pel motion search. Since many conventional fast integer-pel motion estimation algorithms are not suitable for H.264 having variable motion block sizes, we propose the motion field adaptive search using the hierarchical block structure based on the diamond search applicable to variable motion block sizes. Besides, we also propose fast fractional-pel motion search using small diamond search centered by predictive motion vector based on statistical characteristic of motion vector.

A Traffic Pattern Matching Hardware for a Contents Security System (콘텐츠 보안 시스템용 트래픽 패턴 매칭 하드웨어)

  • Choi, Young;Hong, Eun-Kyung;Kim, Tae-Wan;Paek, Seung-Tae;Choi, Il-Hoon;Oh, Hyeong-Cheol
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.46 no.1
    • /
    • pp.88-95
    • /
    • 2009
  • This paper presents a traffic pattern matching hardware that can be used in high performance network applications. The presented hardware is designed for a contents security system which is to block various kinds of information drain or intrusion activities. The hardware consists of two parts: the header lookup and string pattern matching parts. For implementing the header lookup part in hardware, the TCAMs(ternary CAMs) are popularly used. Since the TCAM approach is inefficient in terms of the hardware and memory costs and the power consumption, however, we adopt and modify an alternative approach based on the comparator arrays and the HiCuts tree. Our implementation results, using Xilinx FPGA XC4VSX55, show that our design can reduce the usage of the FPGA slices by about 26%, and the Block RAM by about 58%. In the design of string pattern matching part, we design and use a hashing module based on cellular automata, which is hardware efficient and consumes less power by adaptively changing its configuration to reduce the collision rates.

Detection of Complex Event Patterns over Interval-based Events (기간기반 복합 이벤트 패턴 검출)

  • Kang, Man-Mo;Park, Sang-Mu;Kim, Sank-Rak;Kim, Kang-Hyun;Lee, Dong-Hyeong
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.12 no.4
    • /
    • pp.201-209
    • /
    • 2012
  • The point-based complex event processing handled an instantaneous event by using one time stamp in each event. However, the activity period of the event plays the important role in the field which is the same as the finance, multimedia, medicine, and meteorology. The point-based event is insufficient for expressing the complex temporal relationship in this field. In the application field of the real-time world, the event has the period. The events more than two kinds can be temporally overlapped. In addition, one event can include the other event. The relation about the events of kind of these can not be successive like the point-based event. This thesis designs and implements the method detecting the patterns of the complex event by using the interval-based events. The interval-based events can express the overlapping relation between events. Furthermore, it can include the others. By using the end point of beginning and end point of the termination, the operator of interval-based events shows the interval-based events. It expresses the sequence of the interval-based events and can detect the complex event patterns. This thesis proposes the algorithm using the active instance stack in order to raise efficiency of detection of the complex event patterns. When comprising the event sequence, this thesis applies the window push down technique in order to reduce the number of intermediate results. It raises the utility factor of the running time and memory.

An Efficient Test Compression Scheme based on LFSR Reseeding (효율적인 LFSR 리시딩 기반의 테스트 압축 기법)

  • Kim, Hong-Sik;Kim, Hyun-Jin;Ahn, Jin-Ho;Kang, Sung-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.3
    • /
    • pp.26-31
    • /
    • 2009
  • A new LFSR based test compression scheme is proposed by reducing the maximum number of specified bits in the test cube set, smax, virtually. The performance of a conventional LFSR reseeding scheme highly depends on smax. In this paper, by using different clock frequencies between an LFSR and scan chains, and grouping the scan cells, we could reduce smax virtually. H the clock frequency which is slower than the clock frequency for the scan chain by n times is used for LFSR, successive n scan cells are filled with the same data; such that the number of specified bits can be reduced with an efficient grouping of scan cells. Since the efficiency of the proposed scheme depends on the grouping mechanism, a new graph-based scan cell grouping heuristic has been proposed. The simulation results on the largest ISCAS 89 benchmark circuit show that the proposed scheme requires less memory storage with significantly smaller area overhead compared to the previous test compression schemes.

An Empirical Study on Linux I/O stack for the Lifetime of SSD Perspective (SSD 수명 관점에서 리눅스 I/O 스택에 대한 실험적 분석)

  • Jeong, Nam Ki;Han, Tae Hee
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.9
    • /
    • pp.54-62
    • /
    • 2015
  • Although NAND flash-based SSD (Solid-State Drive) provides superior performance in comparison to HDD (Hard Disk Drive), it has a major drawback in write endurance. As a result, the lifetime of SSD is determined by the workload and thus it becomes a big challenge in current technology trend of such as the shifting from SLC (Single Level Cell) to MLC (Multi Level cell) and even TLC (Triple Level Cell). Most previous studies have dealt with wear-leveling or improving SSD lifetime regarding hardware architecture. In this paper, we propose the optimal configuration of host I/O stack focusing on file system, I/O scheduler, and link power management using JEDEC enterprise workloads in terms of WAF (Write Amplification Factor) which represents the efficiency perspective of SSD life time especially for host write processing into flash memory. Experimental analysis shows that the optimum configuration of I/O stack for the perspective of SSD lifetime is MinPower-Dead-XFS which prolongs the lifetime of SSD approximately 2.6 times in comparison with MaxPower-Cfq-Ext4, the best performance combination. Though the performance was reduced by 13%, this contributions demonstrates a considerable aspect of SSD lifetime in relation to I/O stack optimization.

RSP-DS: Real Time Sequential Patterns Analysis in Data Streams (RSP-DS: 데이터 스트림에서의 실시간 순차 패턴 분석)

  • Shin Jae-Jyn;Kim Ho-Seok;Kim Kyoung-Bae;Bae Hae-Young
    • Journal of Korea Multimedia Society
    • /
    • v.9 no.9
    • /
    • pp.1118-1130
    • /
    • 2006
  • Existed pattern analysis algorithms in data streams environment have researched performance improvement and effective memory usage. But when new data streams come, existed pattern analysis algorithms have to analyze patterns again and have to generate pattern tree again. This approach needs many calculations in real situation that needs real time pattern analysis. This paper proposes a method that continuously analyzes patterns of incoming data streams in real time. This method analyzes patterns fast, and thereafter obtains real time patterns by updating previously analyzed patterns. The incoming data streams are divided into several sequences based on time based window. Informations of the sequences are inputted into a hash table. When the number of the sequences are over predefined bound, patterns are analyzed from the hash table. The patterns form a pattern tree, and later created new patterns update the pattern tree. In this way, real time patterns are always maintained in the pattern tree. During pattern analysis, suffixes of both new pattern and existed pattern in the tree can be same. Then a pointer is created from the new pattern to the existed pattern. This method reduce calculation time during duplicated pattern analysis. And old patterns in the tree are deleted easily by FIFO method. The advantage of our algorithm is proved by performance comparison with existed method, MILE, in a condition that pattern is changed continuously. And we look around performance variation by changing several variable in the algorithm.

  • PDF

An Efficient Parallelization Implementation of PU-level ME for Fast HEVC Encoding (고속 HEVC 부호화를 위한 효율적인 PU레벨 움직임예측 병렬화 구현)

  • Park, Soobin;Choi, Kiho;Park, Sang-Hyo;Jang, Euee Seon
    • Journal of Broadcast Engineering
    • /
    • v.18 no.2
    • /
    • pp.178-184
    • /
    • 2013
  • In this paper, we propose an efficient parallelization technique of PU-level motion estimation (ME) in the next generation video coding standard, high efficiency video coding (HEVC) to reduce the time complexity of video encoding. It is difficult to encode video in real-time because ME has significant complexity (i.e., 80 percent at the encoder). In order to solve this problem, various techniques have been studied, and among them is the parallelization, which is carefully concerned in algorithm-level ME design. In this regard, merge estimation method using merge estimation region (MER) that enables ME to be designed in parallel has been proposed; but, parallel ME based on MER has still unconsidered problems to be implemented ideally in HEVC test model (HM). Therefore, we propose two strategies to implement stable parallel ME using MER in HM. Through experimental results, the excellence of our proposed methods is shown; the encoding time using the proposed method is reduced by 25.64 percent on average of that of HM which uses sequential ME.

Development of the Efficient DAML+OIL Document Management System to support the DAML-S Services in the Embedded Systems (내장형 시스템에서 DAML-S서비스 지원을 위한 효율적인 DAML+OIL문서 관리 시스템)

  • Kim Hag Soo;Jung Moon-young;Cha Hyun Seok;Son Jin Hyun
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.11 no.1
    • /
    • pp.36-49
    • /
    • 2005
  • Recently, many researchers have given high attention to the semantic web services based on the semantic web technology While existing web services use the XML-based web service description language, WSDL, semantic web services are utilizing web service description languages such as DAML-S in ontology languages. The researchers of semantic web services are generally focused on web service discovery, web service invocation, web service selection and composition, and web service execution monitoring. Especially, the semantic web service discovery as the basis to accomplish the ultimate semantic web service environment has some different properties from previous information discovery areas. Hence, it is necessary to develop the storage system and discovery mechanism appropriate to the semantic well description languages. Even though some related systems have been developed, they are not appropriate for the embedded system environment, such as intelligent robotics, in which there are some limitations on memory disk space, and computing power In this regard, we in the embedded system environment have developed the document management system which efficiently manages the web service documents described by DAML-S for the purpose of the semantic web service discovery, In addition, we address the distinguishing characteristics of the system developed in this paper, compared with the related researches.

Distributed Assumption-Based Truth Maintenance System for Scalable Reasoning (대용량 추론을 위한 분산환경에서의 가정기반진리관리시스템)

  • Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1115-1123
    • /
    • 2016
  • Assumption-based truth maintenance system (ATMS) is a tool that maintains the reasoning process of inference engine. It also supports non-monotonic reasoning based on dependency-directed backtracking. Bookkeeping all the reasoning processes allows it to quickly check and retract beliefs and efficiently provide solutions for problems with large search space. However, the amount of data has been exponentially grown recently, making it impossible to use a single machine for solving large-scale problems. The maintaining process for solving such problems can lead to high computation cost due to large memory overhead. To overcome this drawback, this paper presents an approach towards incrementally maintaining the reasoning process of inference engine on cluster using Spark. It maintains data dependencies such as assumption, label, environment and justification on a cluster of machines in parallel and efficiently updates changes in a large amount of inferred datasets. We deployed the proposed ATMS on a cluster with 5 machines, conducted OWL/RDFS reasoning over University benchmark data (LUBM) and evaluated our system in terms of its performance and functionalities such as assertion, explanation and retraction. In our experiments, the proposed system performed the operations in a reasonably short period of time for over 80GB inferred LUBM2000 dataset.