• Title/Summary/Keyword: in-memory computing


Efficient Kernel Integrity Monitor Design for Commodity Mobile Application Processors

  • Heo, Ingoo;Jang, Daehee;Moon, Hyungon;Cho, Hansu;Lee, Seungwook;Kang, Brent Byunghoon;Paek, Yunheung
    • JSTS:Journal of Semiconductor Technology and Science / v.15 no.1 / pp.48-59 / 2015
  • Rootkits that undermine system integrity by manipulating the OS kernel have become an increasing threat in recent years. To cope with rootkits, Vigilare proposed snoop-based monitoring, which snoops the memory traffic of the host system. Although that work demonstrated its detection capability with negligible performance loss, the proposed design is not applicable to recent commodity mobile application processors (APs), which have become the de facto standard computing platforms for smart devices. To address this problem and bring snoop-based monitoring to commercial products, this paper proposes a snoop-based monitor design called S-Mon, designed for AP platforms. In designing S-Mon, we consider two design constraints of APs that were not addressed in Vigilare: the unified memory model and the crossbar switch interconnect. Taking these into account, we derive a more realistic architecture for snoop-based monitoring and propose a new hardware module called the region controller. Experiments on a simulation framework modeling a production-quality device show that S-Mon can detect rootkit attacks with negligible runtime overhead.

Study of the semi-segregation algorithms of the incompressible Navier-Stokes equations using P2P1 finite element formulation (P2P1 유한요소 공식을 이용한 비압축성 Navier-Stokes 방정식의 반-분리 해법에 관한 연구)

  • Cho, Myung-H.;Choi, Hyoung-G.;Yoo, Jung-Y.;Park, Jae-I.
    • 유체기계공업학회:학술대회논문집 / 2006.08a / pp.349-352 / 2006
  • The conventional segregated finite element formulation produces a smaller and simpler matrix at each step than an integrated formulation. Its memory and computational costs are also significantly reduced because the pressure equation for the mass conservation of the Navier-Stokes equations is constructed only once if the mesh is fixed. However, the segregated formulation solves a Poisson equation of elliptic type, so it always needs a pressure boundary condition along a boundary even when physical information on pressure is not provided. The conventional integrated finite element formulation, in which the governing equations are treated simultaneously, has an advantage over the segregated formulation in that it gives more robust convergence behavior because all variables are implicitly coupled. It also needs very few iterations to converge. However, the saddle-point-type matrix (SPTM) in the integrated formulation is assembled and preconditioned at every time step, so it requires a large amount of memory and computing time. We therefore propose the P2P1 semi-segregation formulation. To exploit the fact that the pressure equation is assembled and preconditioned only once in the segregated formulation, a fixed symmetric SPTM is obtained for the continuity constraint of the present semi-segregation formulation. The momentum equation in the semi-segregation formulation is separated from the continuity equation, so the saddle-point-type matrix is assembled and preconditioned only once during the whole computation as long as the mesh does not change. To compare the CPU time, accuracy, and condition number of the two methods, both are applied to a well-known benchmark problem. The results show that the proposed semi-segregation finite element formulation outperforms the conventional integrated finite element formulation in terms of computation time.


Scalable Ontology Reasoning Using GPU Cluster Approach (GPU 클러스터 기반 대용량 온톨로지 추론)

  • Hong, JinYung;Jeon, MyungJoong;Park, YoungTack
    • Journal of KIISE / v.43 no.1 / pp.61-70 / 2016
  • In recent years, there has been a need for large-scale ontology inference techniques that infer new knowledge from existing knowledge at high speed and support a diversity of semantic services. With recent advances in distributed computing, ontology inference engines have mostly been developed on the Hadoop or Spark frameworks running on large clusters. Parallel programming techniques using GPGPU, which offers many more cores than a CPU, are also used for ontology inference. In this paper, we combine the advantages of both techniques and propose a new method for reasoning over large RDFS ontology data that uses a Spark in-memory framework and performs inference on the distributed data at high speed using GPGPU. With GPGPU, ontology reasoning over high-capacity data can be performed at a low cost and with higher efficiency than conventional inference methods. In addition, we show that GPGPU can reduce the data workload on each node of the Spark cluster. To evaluate our approach, we used LUBM datasets ranging from 10 to 120. Our experimental results show that the proposed reasoning engine is 7 times faster than a conventional approach that uses a Spark in-memory inference engine.
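
As a rough illustration of what RDFS reasoning computes, the sketch below applies two standard RDFS entailment rules (subclass transitivity, rdfs11, and type inheritance, rdfs9) until no new triples appear. It is a single-machine toy, not the Spark/GPGPU method of the paper; the function name `rdfs_closure` and the sample class names are hypothetical.

```python
# Toy forward-chaining RDFS reasoner (assumption: illustrative only,
# not the distributed Spark/GPGPU engine described in the abstract).
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Return the input triples plus everything entailed by rdfs9 and rdfs11."""
    inferred = set(triples)
    while True:
        sub = {(s, o) for s, p, o in inferred if p == SUBCLASS}
        types = {(s, o) for s, p, o in inferred if p == TYPE}
        new = set()
        # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: (x type C), (C subClassOf D) => (x type D)
        for x, c in types:
            for c2, d in sub:
                if c == c2:
                    new.add((x, TYPE, d))
        if new <= inferred:  # fixpoint reached: nothing new was derived
            return inferred
        inferred |= new


facts = {
    ("GraduateStudent", SUBCLASS, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("alice", TYPE, "GraduateStudent"),
}
closure = rdfs_closure(facts)
```

The nested loops are joins over the triple set, which is the part a distributed or GPU-based engine would parallelize.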

A Kernel Module to Support High-Performance Intra-Node Communication for Multi-Core Systems (멀티 코어 시스템을 위한 고속 노드내 통신 지원 모듈)

  • Jin, Hyun-Wook;Kang, Hyun-Goo;Kim, Jong-Soon
    • Journal of KIISE:Computer Systems and Theory / v.34 no.9 / pp.407-415 / 2007
  • In parallel cluster computing systems, the efficiency of communication between computing nodes is one of the important factors that determine overall system performance. Accordingly, many researchers have studied high-performance inter-node communication. The recently launched multi-core processors, however, increase the importance of intra-node communication as well, because the more cores a node has, the more parallel processes run in the same node. Although there have been studies on intra-node communication, they give limited consideration to state-of-the-art systems. In this paper, we propose a Linux kernel module that minimizes the number of data copies by exploiting the memory mapping mechanism for high-performance intra-node communication. The proposed kernel module supports Linux kernel version 2.6. Performance measurements on a multi-core system show that the proposed kernel module achieves up to 62% lower latency and up to 144% higher throughput than an existing kernel module approach. In addition, the measurements reveal that the performance of intra-node communication can vary significantly depending on whether the cores running the communicating processes belong to the same processor package (i.e., share the L2 cache).

On Reducing False Positives of a Bloom Filter in Trie-Based Algorithms

  • Mun, Ju Hyoung;Lim, Hyesook
    • IEIE Transactions on Smart Processing and Computing / v.4 no.3 / pp.163-168 / 2015
  • Many IP address lookup approaches employ Bloom filters to obtain high-speed search performance. In particular, recent studies have shown that the search performance of trie-based algorithms can be significantly improved by adding Bloom filters. In such algorithms, the number of trie accesses can be greatly reduced because Bloom filters can determine whether a node exists in a trie without actually accessing the trie. Bloom filters do not have false negatives but do have false positives, and false positives can lead to unnecessary trie accesses. The false positive rate must therefore be reduced to enhance the performance of lookup algorithms that apply Bloom filters. One important characteristic of trie-based algorithms is that all the ancestors of a node are also stored. The proposed algorithm exploits this characteristic to reduce the false positive rate of a Bloom filter without increasing the memory size of the Bloom filter. When a Bloom filter produces a positive result for a node of a trie, we propose checking whether the ancestors of the node are also positive. Because Bloom filters have no false negatives, a negative result for any ancestor means that the positive result for the node is false. In other words, we propose using more Bloom filter queries to reduce the false positive rate of a Bloom filter in trie-based algorithms. Simulation results show that querying one ancestor of a node can reduce the false positive rate by up to 67% with exactly the same architecture and the same memory requirement. The proposed approach can be applied to other trie-based algorithms that employ Bloom filters.
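
The ancestor check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: trie nodes are represented as binary prefix strings, and the `BloomFilter` class, its SHA-256-based hashing, and the helper names `insert_prefix` and `query_with_ancestors` are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes and hash scheme are illustrative assumptions."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def query(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def insert_prefix(bf, prefix):
    """Program a trie node and all of its ancestors into the filter."""
    for length in range(1, len(prefix) + 1):
        bf.add(prefix[:length])


def query_with_ancestors(bf, node, num_ancestors=1):
    """Report positive only if the node and its nearest ancestors are positive."""
    if not bf.query(node):
        return False
    for k in range(1, num_ancestors + 1):
        ancestor = node[:-k]
        if not ancestor:  # reached the root, which is not stored in this sketch
            break
        if not bf.query(ancestor):
            # No false negatives: a missing ancestor proves the node is absent.
            return False
    return True
```

Because every ancestor of a stored node is also programmed, the extra queries can only turn false positives into negatives; they never reject a node that is actually in the trie.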

Dynamic Threads Stack Management Scheme for Sensor Operating Systems under Space-Constrained (공간 제약하의 센서 운영체제를 위한 동적 쓰레드 스택관리 기법)

  • Yi, Sang-Ho;Cho, Yoo-Kun;Hong, Ji-Man
    • Journal of KIISE:Computer Systems and Theory / v.34 no.11 / pp.572-580 / 2007
  • Wireless sensor networks are sensing, computing, and communication infrastructures that allow us to monitor, instrument, observe, and respond to phenomena in harsh environments. Wireless sensor networks are generally composed of many deployed sensor nodes designed to be very inexpensive to produce. For example, UC Berkeley's MICA motes have only an 8-bit CPU, 4KB of RAM, and 128KB of flash memory. Sensor operating systems that run on such nodes should therefore manage resources efficiently. In this paper, we present a dynamic thread stack management scheme for space-constrained, multi-threaded sensor operating systems. In this scheme, the stack space needed by each function is measured at compile time. This information is then used to dynamically allocate and release each function's stack space at run time. The scheme was implemented in the Nano-Qplus sensor operating system. Our experimental results show that the proposed scheme outperforms the existing fixed-size stack allocation mechanism.

High-Performance FFT Using Data Reorganization (데이터 재구성 기법을 이용한 고성능 FFT)

  • Park, Neungsoo;Choi, Yungho
    • The KIPS Transactions:PartA / v.12A no.3 s.93 / pp.215-222 / 2005
  • The efficient utilization of cache memories is a key factor in achieving high performance when computing large signal transforms. Non-unit stride access in the computation of large DFTs causes cache conflict misses, resulting in poor cache performance and a severe degradation in overall performance. In this paper, we propose a dynamic data layout approach that takes the memory hierarchy into account. In our approach, data reorganization is performed between computation stages to reduce the number of cache misses. We also develop an efficient search algorithm that determines the optimal tree, the one with the minimum execution time among the possible factorization trees, considering the size of the DFT and the data access stride. Our approach is applied to compute the fast Fourier transform (FFT). Experiments were performed on the Pentium 4, Athlon™ 64, Alpha 21264, and UltraSPARC III. The experimental results show that our FFT achieves a performance improvement of up to 3.37 times over previous FFT packages.

An Efficient Distributed Shared Memory System for Parallel GIS (병렬 GIS를 위한 효율적인 분산공유메모리 시스템)

  • Jeong, Sang-Hwa;Ryu, Gwang-Yeol;Go, Yun-Yeong;Gwak, Min-Seok
    • Journal of KIISE:Computing Practices and Letters / v.5 no.6 / pp.700-707 / 1999
  • In this paper, we propose a distributed shared memory (DSM) based parallel processing system that processes GIS-related computations efficiently in real time. The system is built by installing a software DSM module on top of a message-passing, distributed-memory MIMD computer. In the DSM system, the spatial object, which is the fundamental structure for representing GIS data, is used as the basic unit of sharing, and a read-only shared data type is added to reflect the characteristics of GIS data. In addition, bulk access, which reads multiple shared objects at once, is supported to reduce network overhead. Guided self-scheduling is used as the load-balancing method for distributing GIS data efficiently to the parallel processors. The experimental results show that, through efficient utilization of the network cache, the DSM system performs better than an MPI-based message-passing system in spite of the overhead of the software-based DSM.
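
For readers unfamiliar with guided self-scheduling, the sketch below shows the classic chunk-size rule, in which each idle worker takes ceil(remaining / P) of the remaining items, so chunks start large and shrink toward the end. It illustrates the general technique only; how the paper adapts it to spatial objects is not stated in the abstract, and the function name `guided_self_scheduling` is hypothetical.

```python
import math


def guided_self_scheduling(num_items, num_workers):
    """Return the sequence of chunk sizes handed out under classic GSS.

    Assumption: each request takes ceil(remaining / num_workers) items,
    which is the textbook rule, not necessarily the paper's variant.
    """
    chunks = []
    remaining = num_items
    while remaining > 0:
        size = math.ceil(remaining / num_workers)
        chunks.append(size)
        remaining -= size
    return chunks


# Example: 100 work items (e.g. spatial objects) shared among 4 processors.
schedule = guided_self_scheduling(100, 4)
```

Large early chunks keep scheduling overhead low, while the small final chunks even out the finishing times of the processors.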

Damage detection in plate structures using frequency response function and 2D-PCA

  • Khoshnoudian, Faramarz;Bokaeian, Vahid
    • Smart Structures and Systems / v.20 no.4 / pp.427-440 / 2017
  • Damage-index-based methods are among the suitable approaches to structural damage detection using vibrational characteristics. This study presents a damage index for identifying damage in plate structures using frequency response function (FRF) data. A significant challenge in identifying damage in plate structures is the high number of degrees of freedom, which reduces identification accuracy. In addition, FRF data are voluminous, which dramatically decreases computing speed and increases the memory needed to store the data, making the method difficult to use. In this study, FRF data are compressed using two-dimensional principal component analysis (2D-PCA) and then converted into damage index vectors. The damage indices, each of which represents a specific condition of the intact or damaged structure, are stored in a database. After the damage index of a structure with unknown damage is computed, a lookup-table algorithm identifies the structural damage, including its severity and location. Damage detection accuracy using the proposed damage index is studied for square structural plates with dimensions of 3, 7 and 10 meters and with boundary conditions of four simply supported edges (4S), three clamped edges (3C), and four clamped edges (4C) under various single- and multiple-element damage scenarios. Furthermore, to model measurement uncertainty, the insensitivity of the method to noise is discussed by applying 5, 10, 15 and 20 percent Gaussian noise to the FRF values.

Design and Implementation of a Spatio-Temporal Middleware for Ubiquitous Environments (유비쿼터스 환경을 위한 시공간 미들웨어의 설계 및 구현)

  • Kim, Jeong-Joon;Jeong, Yeon-Jong;Kim, Dong-Oh;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society / v.11 no.1 / pp.43-54 / 2009
  • As research and development (R&D) actively pursues technologies for the ubiquitous computing environment, the human-oriented computing environment of the future, GIS dealing with spatio-temporal data is emerging as a promising technology. This also increases the need for middleware that provides interoperability across various heterogeneous environments. The core technologies of such middleware are real-time processing of the data streams arriving continuously from positioning systems and data stream processing developed for non-spatio-temporal data. However, these technologies have difficulty processing queries on spatio-temporal data efficiently. Accordingly, this paper designs and implements a spatio-temporal middleware that provides interoperability between a mobile spatio-temporal DBMS (DataBase Management System) and a server spatio-temporal MMDBMS (Main Memory DataBase Management System). The spatio-temporal middleware maintains interoperability among heterogeneous devices and guarantees data integrity in query processing through real-time processing of continuous spatio-temporal data streams and two-way synchronization of the spatio-temporal DBMSs. In addition, it manages a session for the connection of each spatio-temporal DBMS and manages resources for stable operation. Finally, this paper demonstrates the usability of the spatio-temporal middleware by applying it to a real-time position tracking system.
