• Title/Summary/Keyword: in-memory computing

Search Result 766, Processing Time 0.028 seconds

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association
    • /
    • v.15 no.7
    • /
    • pp.1-11
    • /
    • 2015
  • In recent high performance computing system, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since GPU can provide optimized computational resources for massive parallel processing. Unfortunately, GPGPU doesn't exploit computational resources on GPU in executing general-purpose applications fully, because the applications cannot be optimized to GPU architecture. Therefore, we provide GPU research guideline to improve the performance of computing systems using GPGPU. To accomplish this, we analyze the negative factors on GPU performance. In this paper, in order to clearly classify the cause of the negative factors on GPU performance, GPU core status are defined into 5 status: fully active status, partial active status, idle status, memory stall status and GPU core stall status. All status except fully active status cause performance degradation. We evaluate the ratio of each GPU core status depending on the characteristics of benchmarks to find specific reasons which degrade the performance of GPU. According to our simulation results, partial active status, idle status, memory stall status and GPU core stall status are induced by computational resource underutilization problem, low parallelism, high memory requests, and structural hazard, respectively.

Multi-Dimensional Record Scan with SIMD Vector Instructions (SIMD 벡터 명령어를 이용한 다차원 레코드 스캔)

  • Cho, Sung-Ryong;Han, Hwan-Soo;Lee, Sang-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.6
    • /
    • pp.732-736
    • /
    • 2010
  • Processing a large amount of data becomes more important than ever. Particularly, the information queries which require multi-dimensional record scan can be efficiently implemented with SIMD instruction sets. In this article, we present a SIMD record scan technique which employs row-based scanning. Our technique is different from existing SIMD techniques for predicate processes and aggregate operations. Those techniques apply SIMD instructions to the attributes in the same column of the database, exploiting the column-based record organization of the in-memory database systems. Whereas, our SIMD technique is useful for multi-dimensional record scanning. As the sizes of registers and the memory become larger, our row-based SIMD scan can have bigger impact on the performance. Moreover, since our technique is orthogonal to the parallelization techniques for multi-core processors, it can be applied to both uni-processors and multi-core processors without too many changes in the software architectures.

New Memristor-Based Crossbar Array Architecture with 50-% Area Reduction and 48-% Power Saving for Matrix-Vector Multiplication of Analog Neuromorphic Computing

  • Truong, Son Ngoc;Min, Kyeong-Sik
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.14 no.3
    • /
    • pp.356-363
    • /
    • 2014
  • In this paper, we propose a new memristor-based crossbar array architecture, where a single memristor array and constant-term circuit are used to represent both plus-polarity and minus-polarity matrices. This is different from the previous crossbar array architecture which has two memristor arrays to represent plus-polarity and minus-polarity connection matrices, respectively. The proposed crossbar architecture is tested and verified to have the same performance with the previous crossbar architecture for applications of character recognition. For areal density, however, the proposed crossbar architecture is twice better than the previous architecture, because only single memristor array is used instead of two crossbar arrays. Moreover, the power consumption of the proposed architecture can be smaller by 48% than the previous one because the number of memristors in the proposed crossbar architecture is reduced to half compared to the previous crossbar architecture. From the high areal density and high energy efficiency, we can know that this newly proposed crossbar array architecture is very suitable to various applications of analog neuromorphic computing that demand high areal density and low energy consumption.

A Context-Aware Engine for Mobile Platforms (모바일 플랫폼 상황이해엔진)

  • Lee Sun A;Lee Keon Myung;Lee Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.15 no.3
    • /
    • pp.300-305
    • /
    • 2005
  • Context-aware intelligent services are essential in ubiquitous computing and intelligent robots environments, which make decisions on which services to start with the consideration of surrounding contexts. In the ubiquitous and intelligent environments, context-aware service engines should be light-weighted due to the resource restrictions on the devices. This paper presents a context-aware service engine which is designed for light-weighted devices. The context-aware service engine has been designed with special attention to improve the execution speed and to minimize the memory requirement.

Design and Implementation of Flash Cryptographic File System Based on YAFFS (YAFFS 기반의 암호화 플래시 파일 시스템의 설계 및 구현)

  • Kim, Seok-Hyun;Cho, Yoo-Kun
    • Convergence Security Journal
    • /
    • v.7 no.4
    • /
    • pp.15-21
    • /
    • 2007
  • As the amount of flash memory being used in embedded device is increased and embedded devices become more important in many computing environments, embedded file system security becomes more important issue. Moreover embedded devices can be easily stolen or lost because of it's high portability. If the lost embedded device has very important information, there's no means to protect it except data encryption. For improving embedded devices' security this paper propose design and implementation of flash cryptographic file system. For this purpose YAFFS is used. By the modified YAFFS cryptographic file system, the security of embedded devices can be improved.

  • PDF

Encoding Method for Olfactory Information (후각 정보의 부호화 방법)

  • Lee, Keun-Hee;Lee, Sang-Wook;Kim, Eung-Soo
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.7
    • /
    • pp.1275-1282
    • /
    • 2007
  • With the rapid spread of smart mobile products and practical use of wireless network, it became possible to offer the computing environment people can use anytime anywhere and it is predicted that portable computer as the next generation computing is to be put to practical use. This computer will be expected to have the new interface which provides realistic service through human's five senses as well as the interface through the visual and auditory senses. Accordingly, in this study, we are going to talk about the technology that expresses and reproduces what is based on olfactory information - closely related with memory among honan's five senses - in order to embody the lifelike user interface.

Improving the I/O Performance of Disk-Based Graph Engine by Graph Ordering (디스크 기반 그래프 엔진의 입출력 성능 향상을 위한 그래프 오더링)

  • Lim, Keunhak;Kim, Junghyun;Lee, Eunjae;Seo, Jiwon
    • KIISE Transactions on Computing Practices
    • /
    • v.24 no.1
    • /
    • pp.40-45
    • /
    • 2018
  • With the advent of big data and social networks, large-scale graph processing becomes popular research topic. Recently, an optimization technique called Gorder has been proposed to improve the performance of in-memory graph processing. This technique improves performance by optimizing the graph layout on memory to have better cache locality. However, since it is designed for in-memory graph processing systems, the technique is not suitable for disk-based graph engines; also the cost for applying the technique is significantly high. To solve the problem, we propose a new graph ordering called I/O Order. I/O Order considers the characteristics of I/O accesses for SSDs and HDDs to improve the performance of disk-based graph engine. In addition, the algorithmic complexity of I/O Order is simple compared to Gorder, hence it is cheaper to apply I/O Ordering. I/O order reduces the cost of pre-processing up to 9.6 times compared to that of Gorder's, still its performance is 2 times higher compared to the Random in low-locality graph algorithms.

Design and Implementation of a Host Interface for a Regular Expression Processor (정규표현식 프로세서를 위한 호스트 인터페이스 설계 및 구현)

  • Kim, JongHyun;Yun, SangKyun
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.2
    • /
    • pp.97-103
    • /
    • 2017
  • Many hardware-based regular expression matching architectures have been proposed for high-performance matching. In particular, regular expression processors, which perform pattern matching by treating the regular expressions as the instruction sequence like general purpose processors, have been proposed. After instruction sequence and data are provided in the instruction memory and data memory, respectively, a regular expression processor can perform pattern matching. To use a regular expression processor as a coprocessor, we need the host interface to transfer the instruction and data into the memory of a regular expression processor. In this paper, we design and implement the host interface between a host and a regular expression processor in the DE1-SoC board and the application program interface. We verify the operations of the host interface and a regular expression processor by executing the application programs which perform pattern matching using the application program interface.

An Adaptive Garbage Collection Policy for NAND Flash Memory (낸드 플래시 메모리를 위한 적응형 가비지 컬렉션 정책)

  • Han, Gyu-Tae;Kim, Sung-Jo
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.5
    • /
    • pp.322-330
    • /
    • 2009
  • In order to utilize NAND flash memory as storage media which does not allow update-in-place and limits the number of block erase count, various garbage collection policies supporting wear-leveling have been investigated. Conventional garbage collection policies require cleaning-index calculation for the entire blocks to choose a block to be garbage-collected to support wear-leveling whenever a garbage collection is required, which results in performance degradation of system. This paper proposes a garbage collection policy which supports wear-leveling using a threshold value, which is in fact a variance of erase counts and by the maximum erase count of all blocks, without calculating the cleaning-index. During garbage collection, the erase cost is minimized by using the Greedy Policy if the variance is less than the threshold value. It achieves wear-leveling by excluding the block with the largest erase count from erase target blocks if the variance is larger than threshold value. The proposed scheme shows that a standard deviation approaches to zero as the erase count of blocks approaches to its upper limit and the measured speed of garbage collection is two times faster than the conventional schemes.

A Reconfigurable Load and Performance Balancing Scheme for Parallel Loops in a Clustered Computing Environment (클러스터 컴퓨팅 환경에서 병렬루프 처리를 위한 재구성 가능한 부하 및 성능 균형 방법)

  • 김태형
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.1
    • /
    • pp.49-56
    • /
    • 2004
  • Load imbalance is a serious impediment to achieving good performance in parallel processing. Global load balancing schemes cannot adequately manage to balance parallel tasks generated from a single application. Dynamic loop scheduling methods are known to be useful in balancing parallel loops on shared-memory multiprocessor machines. However, their centralized nature causes a bottleneck for the relatively small number of processors in a network of workstations because of order-of-magniture differences in communication overheads. Moreover, improvements of basis loops scheduling methods have not effectively dealt with irregularly distributed workloads in parallel loops, which commonly occur in applications for a network of workstation. In this paper, we present a new reconfigurable and decentralized balancing method for parallel loops on a network of workstations. Since our method supplements performance balancing with those tranditional load balancing methods, it minimizes the overall execution time.