• Title/Summary/Keyword: in-memory computing


Analysis of Switched Reluctance Motors using Parameters obtained by Finite Element Method (유한요소법으로 계산한 파라메타를 이용한 스위치드 릴럭턴스 모타의 해석)

  • Lee, Joon-Ho;Lee, Sang-Ho;Lee, Ki-Sik
    • Proceedings of the KIEE Conference / 1997.07a / pp.47-50 / 1997
  • The FEM (Finite Element Method) can be used to analyze an SRM (Switched Reluctance Motor) because it can account for the salient-pole geometry of the stator and rotor and the nonlinear properties of the magnetic materials. However, FEM requires a large amount of computer memory and computing time, since there are many kinds of SRM drives and many possible switching strategies for a single drive. In this paper we propose an analysis method for SRMs whose results are close to those of FEM but whose computing time is very short. The inductance and torque for each phase current at each rotor position are calculated by a two-dimensional nonlinear FEM analysis. Using the look-up tables of inductance and torque together with the voltage equations of the SRM, we then obtain the phase current and torque. To verify the proposed algorithm, a 3-phase 6/4 SRM is analyzed; the results agree well with the FEM results, while the computing time is about 1/1600 of that of the FEM analysis.
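
The core of the proposed approach, stepping the phase voltage equation with FEM-derived look-up tables instead of re-running the field solution, can be sketched as follows. The tables, grid sizes, switching pattern, and motor constants below are placeholders rather than values from the paper, and saturation-dependent terms are dropped for brevity.

```python
import numpy as np

def bilerp(table, xs, ys, x, y):
    """Bilinear interpolation of table[ix, iy] defined on grids xs and ys."""
    i = int(np.clip(np.searchsorted(xs, x) - 1, 0, len(xs) - 2))
    j = int(np.clip(np.searchsorted(ys, y) - 1, 0, len(ys) - 2))
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    return ((1 - tx) * (1 - ty) * table[i, j] + tx * (1 - ty) * table[i + 1, j]
            + (1 - tx) * ty * table[i, j + 1] + tx * ty * table[i + 1, j + 1])

# Placeholder look-up tables L(theta, i) and T(theta, i); in the paper these come
# from a two-dimensional nonlinear FEM sweep over rotor position and phase current.
thetas = np.linspace(0.0, np.pi / 2, 46)      # one rotor pole pitch of a 6/4 SRM [rad]
currents = np.linspace(0.0, 20.0, 41)         # phase current grid [A]
L_tab = 0.01 + 0.04 * (1 - np.cos(4 * thetas))[:, None] * np.ones(len(currents))
T_tab = 0.05 * np.sin(4 * thetas)[:, None] * currents[None, :] ** 2

R, omega, dt = 1.0, 100.0, 1e-5               # resistance [ohm], speed [rad/s], time step [s]
theta, i_ph = 0.0, 0.0
for _ in range(5000):
    pos = theta % (np.pi / 2)
    v = 150.0 if pos < np.pi / 6 else -150.0  # crude single-pulse switching pattern
    L = bilerp(L_tab, thetas, currents, pos, i_ph)
    dLdth = (bilerp(L_tab, thetas, currents, min(pos + 1e-4, thetas[-1]), i_ph) - L) / 1e-4
    di = (v - R * i_ph - i_ph * omega * dLdth) / L   # simplified phase voltage equation
    i_ph = max(i_ph + di * dt, 0.0)
    theta += omega * dt
    torque = bilerp(T_tab, thetas, currents, pos, i_ph)  # instantaneous torque from the table
```

Each time step costs only a few table look-ups, so once the tables have been built, many drive circuits and switching strategies can be swept cheaply, which is where the reported speed-up over repeated FEM runs comes from.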


Performance Evaluation of Hypervisor VMs and Nested VMs Overcommitting Memory in Nested Virtualization Environments (중첩 가상화 환경에서 메모리 오버커밋을 하는 하이퍼바이저 VM과 중첩 VM의 성능 평가)

  • Lyoo, Taemuk;Lim, JongBeom;Chung, Kwang-Sik;Suh, Teaweon;Yu, Heonchang
    • Annual Conference of KIPS / 2013.11a / pp.61-64 / 2013
  • Virtualization is a technology that lets virtual resources access physical resources; by installing multiple VMs (virtual machines), as many operating systems can be run as there are VMs. Virtualization is used to avoid wasting resources and to reduce management costs. Virtualization techniques can be divided into CPU, memory, and I/O virtualization; among these, memory virtualization enables memory resources to be used efficiently. Several VMs can be allocated, and can use, more memory than the physical machine actually has, which is called an overcommitted state. Nested virtualization allows a VM to use hardware virtualization features, providing an environment in which another VM can run on top of a VM. In such a (nested) virtualization environment, memory accesses are generally handled through hardware-assisted nested paging. In this paper, we show experimentally the performance difference between nested VMs and hypervisor VMs when memory overcommit occurs.

Data Replication and Migration Scheme for Load Balancing in Distributed Memory Environments (분산 인-메모리 환경에서 부하 분산을 위한 데이터 복제와 이주 기법)

  • Choi, Kitae;Yoon, Sangwon;Park, Jaeyeol;Lim, Jongtae;Bok, Kyoungsoo;Yoo, Jaesoo
    • KIISE Transactions on Computing Practices / v.22 no.1 / pp.44-49 / 2016
  • Recently, data has been growing dramatically along with the spread of social media and digital devices. Distributed in-memory processing systems have been used to process such large amounts of data efficiently. However, if the load is concentrated on a particular node in a distributed environment, the performance of that node degrades significantly. In this paper, we propose a load balancing scheme that distributes the load in a distributed in-memory environment. The proposed scheme replicates hot data to multiple nodes to manage node load, and migrates the data by considering the load of the nodes when nodes are added or removed. A client reduces the number of accesses to the central server by accessing the data node directly through the metadata of the hot data. To show the superiority of the proposed scheme, we compare it with an existing load balancing scheme through performance evaluation.
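
A toy sketch of the access path described above: hot keys are replicated to several data nodes, and a client that holds the metadata for a hot key contacts a lightly loaded replica directly instead of going through the central server. The class and method names are illustrative only, not taken from the paper.

```python
class CentralServer:
    """Keeps the authoritative key -> nodes mapping and answers metadata lookups."""
    def __init__(self, placement):
        self.placement = placement            # key -> list of node ids holding a copy

    def lookup(self, key):
        return self.placement[key]

class Client:
    def __init__(self, server, node_load):
        self.server = server
        self.node_load = node_load            # node id -> current load (e.g., request count)
        self.hot_metadata = {}                # locally cached metadata for hot keys

    def read(self, key):
        nodes = self.hot_metadata.get(key)
        if nodes is None:                     # cold key: one round trip to the central server
            nodes = self.server.lookup(key)
            if len(nodes) > 1:                # replicated keys are treated as hot and cached
                self.hot_metadata[key] = nodes
        node = min(nodes, key=lambda n: self.node_load[n])   # pick the least-loaded replica
        self.node_load[node] += 1
        return node

server = CentralServer({"hot_item": ["n1", "n2", "n3"], "cold_item": ["n2"]})
client = Client(server, {"n1": 0, "n2": 5, "n3": 2})
print([client.read("hot_item") for _ in range(4)])        # reads spread over the replicas
```

Migration is the counterpart of this read path: in the paper's scheme, data is moved between nodes according to node load when nodes are added or removed.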

Design and Evaluation of the Internet-Of-Small-Things Prototype Powered by a Solar Panel Integrated with a Supercapacitor

  • Park, Sangsoo
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.11-19 / 2021
  • In this paper, we propose a prototype platform combined with a power management system that uses, as an auxiliary energy storage device, a supercapacitor that can be charged and discharged quickly with high power efficiency and has a semi-permanent charge/discharge cycle life. For the proposed platform, we designed a technique that detects, through an interrupt attached to the micro-controller, whether the power supplied from the solar panel has been cut off or resumed as the physical environment changes. To prevent data loss in a computing environment in which a continuous power supply is not guaranteed, we implemented low-level system software on the micro-controller that transfers the program context and the data in volatile memory to nonvolatile memory when the power supply is cut off. Experimental results show that the supercapacitor effectively supplies temporary power as an auxiliary energy storage device. Various benchmarks also confirm that power-state detection and the transfer of program context and data from volatile to nonvolatile memory incur low overhead.
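
The power-failure path can be imitated in ordinary Python by using a POSIX signal as a stand-in for the micro-controller interrupt and a file as a stand-in for the nonvolatile memory. The real platform in the paper uses an MCU interrupt line and on-chip nonvolatile storage, so the signal choice, file name, and state layout below are all simplifying assumptions.

```python
import os, pickle, signal, sys, time

CHECKPOINT = "checkpoint.bin"                 # stand-in for nonvolatile memory
state = {"step": 0, "accumulator": 0.0}       # "volatile" program context and data

def on_power_loss(signum, frame):
    """Interrupt handler: flush volatile state to stable storage, then stop."""
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())                  # make sure the bytes really reach the medium
    sys.exit(0)

def restore():
    """On (re)boot, resume from the last checkpoint if one exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "accumulator": 0.0}

if __name__ == "__main__":
    state = restore()
    signal.signal(signal.SIGTERM, on_power_loss)   # SIGTERM models the power-cutoff interrupt
    while True:                                     # the sensing/compute loop
        state["step"] += 1
        state["accumulator"] += 0.1
        time.sleep(0.01)
```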

FAST : A Log Buffer Scheme with Fully Associative Sector Translation for Efficient FTL in Flash Memory (FAST :플래시 메모리 FTL을 위한 완전연관섹터변환에 기반한 로그 버퍼 기법)

  • Park Dong-Joo;Choi Won-Kyung;Lee Sang-Won
    • The KIPS Transactions: Part A / v.12A no.3 s.93 / pp.205-214 / 2005
  • Flash memory is increasingly being used as the storage of personal information utilities, ubiquitous computing environments, mobile phones, electronic goods, and so on, because it offers low power consumption, non-volatile storage, high performance, physical stability, and portability. Unlike hard disks, however, it has the drawback that a block that has already been written cannot simply be overwritten: an erase operation on the written block must be performed before the overwrite, which severely lowers the performance of flash memory. To solve this problem, the flash memory controller maintains a system software module called the flash translation layer (FTL). Among the many proposed FTL schemes, the log block buffer scheme is the best known so far. This scheme uses a small number of log blocks of flash memory as a write buffer, which reduces the number of erase operations caused by overwrites and thus yields good performance. However, it suffers from low page utilization of the log blocks. In this paper, we propose an enhanced log block buffer scheme, FAST (Fully Associative Sector Translation), which improves the page utilization of each log block by fully associating the sectors written by overwrites with the entire set of log blocks. We also show that the FAST scheme outperforms the log block buffer scheme.
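
The difference full associativity makes can be shown with a small mapping-table simulation: any overwritten sector may go to the next free page of any log block, and a single mapping table remembers where the latest copy of each sector lives. The page and block sizes and the merge policy below are simplifications chosen for the sketch, not the parameters used in the paper.

```python
PAGES_PER_BLOCK = 4

class FastLogBuffer:
    """Log buffer with fully associative sector translation (simplified)."""
    def __init__(self, num_log_blocks):
        self.log_blocks = [[] for _ in range(num_log_blocks)]   # each entry: written sector numbers
        self.sector_map = {}                                     # sector -> (log block, page index)

    def overwrite(self, sector):
        for b, block in enumerate(self.log_blocks):
            if len(block) < PAGES_PER_BLOCK:                     # first log block with a free page
                block.append(sector)
                self.sector_map[sector] = (b, len(block) - 1)
                return
        self._merge_oldest()                                     # all log blocks full: reclaim one
        self.overwrite(sector)

    def _merge_oldest(self):
        # In the real scheme the victim's valid pages are merged back into data blocks;
        # here we only drop the victim block and fix up the mapping table.
        self.log_blocks.pop(0)
        self.log_blocks.append([])
        self.sector_map = {s: (b - 1, p) for s, (b, p) in self.sector_map.items() if b > 0}

ftl = FastLogBuffer(num_log_blocks=2)
for s in [7, 7, 7, 12, 12, 3, 3, 3]:                             # a skewed overwrite pattern
    ftl.overwrite(s)
print(ftl.log_blocks, ftl.sector_map)
```

In the earlier block-associative style of log buffer, each log block serves overwrites for one particular data block, so a skewed pattern like the one above fills a single log block and forces merges while the others stay mostly empty; letting every log block absorb any sector is what raises page utilization.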

Performance Enhancement and Evaluation of a Deep Learning Framework on Embedded Systems using Unified Memory (통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가)

  • Lee, Minhak;Kang, Woochul
    • KIISE Transactions on Computing Practices / v.23 no.7 / pp.417-423 / 2017
  • Recently, many embedded devices that have the computing capability required for deep learning have become available; hence, many new applications using these devices are emerging. However, these embedded devices have an architecture different from that of PCs and high-performance servers. In this paper, we propose a method that improves the performance of a deep-learning framework by considering the architecture of an embedded device that shares memory between the CPU and the GPU. The proposed method is implemented in Caffe, an open-source deep-learning framework, and is evaluated on an NVIDIA Jetson TK1 embedded device. In the experiments, we investigate the image recognition performance of several state-of-the-art deep-learning networks, including AlexNet, VGGNet, and GoogLeNet. Our results show that the proposed method can achieve significant performance gains. For instance, for AlexNet, we could reduce image recognition latency by about 33% and energy consumption by about 50%.

An Out of Core Linear Direct Solution Method for Large Scale Structural Analysis (대규모 구조해석을 위한 보조기억장치 활용 선형 직접해법)

  • Kim, Min-Ki;Kim, Seung Jo
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.42 no.6 / pp.445-452 / 2014
  • This paper discusses a multifrontal direct solution method with out-of-core storage for large-scale structural analysis on limited computing resources. Large-scale structural analysis requires a huge amount of memory and computation, so an out-of-core solution method is needed when computing resources are limited. In this research, an out-of-core multifrontal solution algorithm is introduced that works within a small physical memory and minimizes the amount of access to slow out-of-core storage. Three ideas are proposed: using the lower triangular part of the square factorization matrix as stack space, an inverse stack data structure, and selective data caching and recovery by data block size.
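
The spill-and-reload pattern behind an out-of-core direct solver can be illustrated with a dense blocked Cholesky factorization. The paper itself uses a multifrontal method for sparse structural matrices; the left-looking dense variant below is only meant to show the idea of keeping one block column in memory while finished panels live in secondary storage.

```python
import os, tempfile
import numpy as np

def panel_path(d, k): return os.path.join(d, f"panel_{k}.npy")

def out_of_core_cholesky(A, nb, workdir):
    """Left-looking blocked Cholesky; finished block columns are spilled to disk
    and re-read only when they are needed to update a later block column."""
    n = A.shape[0]
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        for j0 in range(0, k0, nb):                      # update with previously factored panels
            Lp = np.load(panel_path(workdir, j0))        # reload one spilled panel at a time
            A[k0:, k0:k1] -= Lp[k0:, :] @ Lp[k0:k1, :].T
        Lkk = np.linalg.cholesky(A[k0:k1, k0:k1])
        A[k0:k1, k0:k1] = Lkk
        if k1 < n:
            A[k1:, k0:k1] = np.linalg.solve(Lkk, A[k1:, k0:k1].T).T
        np.save(panel_path(workdir, k0), A[:, k0:k1])    # spill the finished block column
    return np.tril(A)

rng = np.random.default_rng(0)
M = rng.standard_normal((12, 12))
A = M @ M.T + 12 * np.eye(12)                            # a well-conditioned SPD test matrix
with tempfile.TemporaryDirectory() as d:
    L = out_of_core_cholesky(A.copy(), nb=4, workdir=d)
print(np.allclose(L @ L.T, A))                           # True
```

Only one block column and one reloaded panel are touched at a time, so the in-memory working set is proportional to the block size rather than to the whole factor, which is the essence of working within a small physical memory.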

A design of GPU container co-execution framework measuring interference among applications (GPU 컨테이너 동시 실행에 따른 응용의 간섭 측정 프레임워크 설계)

  • Kim, Sejin;Kim, Yoonhee
    • KNOM Review / v.23 no.1 / pp.43-50 / 2020
  • As general-purpose graphics processing units (GPGPUs) have recently come to play an essential role in high-performance computing, several cloud service providers now offer GPU services. Most container-based cluster orchestration platforms in cloud environments allocate an integer number of GPUs to a job and do not allow a node's GPUs to be shared with other jobs. In this case, the resource utilization of a GPU node can be low if a job does not intensively use either many GPU cores or a large amount of GPU memory. GPU virtualization brings opportunities to realize kernel concurrency and to share resources. However, performance may vary depending on the characteristics of the applications running concurrently and the interference among them caused by resource contention on a node. This paper proposes a GPU container co-execution framework, based on the Kubernetes container orchestration platform, that creates and runs multiple servers in order to measure the interference that may occur when GPU resources are shared. Performance changes under different scheduling policies were investigated by executing several jobs on a GPU. The results show that optimal scheduling is not possible when only GPU memory and computing resource usage are considered. The interference caused by co-executing applications is measured using the framework.

Performance Improvement for PVM by Zero-copy Mechanism (Zero-copy 기술을 이용한 PVM의 성능 개선)

  • 임성택;심재홍;최경희;정기현;김재훈;문성근
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.5B / pp.899-912 / 2000
  • PVM provides users with a single image of a high-performance parallel computing machine by pooling machines distributed over a network. Low communication overhead is essential to run applications effectively on PVM-based platforms. In the original PVM, three memory copies are required for a PVM task to send a message to a remote task, which degrades performance. We propose a zero-copy model using global shared memory that can be accessed by PVM tasks, the PVM daemon, and the network interface card (NIC). In this scheme, a task packs data into the global shared memory and notifies the daemon that the data is ready to be sent; the daemon then routes the data to the remote destination task with no additional data copy overhead. Experimental results reveal that the message round-trip time between two machines is reduced significantly by the proposed zero-copy scheme.
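
The intent of the zero-copy path, having the sender place the payload once in memory that every party can see instead of copying it from task to daemon to network buffer, can be imitated with Python's multiprocessing shared memory. This is only an analogy for the mechanism; PVM's actual task/daemon/NIC path is not reproduced here.

```python
import numpy as np
from multiprocessing import Process
from multiprocessing import shared_memory

def sender(shm_name, n):
    """The 'task' packs its message directly into the shared segment (no intermediate copy)."""
    shm = shared_memory.SharedMemory(name=shm_name)
    buf = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    buf[:] = np.arange(n)                       # write the payload in place
    shm.close()

if __name__ == "__main__":
    n = 1024
    shm = shared_memory.SharedMemory(create=True, size=n * 8)
    p = Process(target=sender, args=(shm.name, n))
    p.start(); p.join()
    view = np.ndarray((n,), dtype=np.float64, buffer=shm.buf)
    print(view[:5])                             # the receiving side reads the very same bytes
    shm.close(); shm.unlink()
```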


Instruction-corruption-less Binary Modification Mechanism for Static Stack Protections (이진 조작을 통한 정적 스택 보호 시 발생하는 명령어 밀림현상 방지 기법)

  • Lee, Young-Rim;Kim, Young-Pil;Yoo, Hyuck
    • Journal of KIISE: Computing Practices and Letters / v.14 no.1 / pp.71-75 / 2008
  • Many sensor node operating systems have tight memory constraints; therefore, the stack areas of all threads reside in a single memory space. Because most target platforms do not have a hardware MMU (Memory Management Unit), it is difficult to protect each stack area. One way to solve this problem is to replace the original stack-handling instructions in the binary code with calls to wrapper routines that protect the stack area. During this replacement, an instruction corruption problem occurs because of the difference in length between the stack-handling instructions and the branch instructions. In this paper, we propose an algorithm that calls a target routine without causing the instruction corruption problem; it reaches the target routine by chaining branch instructions that each cover only a short range. Our solution makes it easy to apply security patches and to maintain and upgrade the software of a sensor node.
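
The abstract's algorithm reaches the target routine by repeating short-range branches; the address calculation behind such a chain can be sketched as below. Real binary patching must also find free code space for every intermediate branch, which is glossed over here, and max_offset is an arbitrary stand-in for the platform's branch reach.

```python
def plan_branch_chain(src, target, max_offset):
    """Return the intermediate branch targets needed to get from `src` to `target`
    when a single branch instruction can only cover `max_offset` bytes."""
    hops, addr = [], src
    while abs(target - addr) > max_offset:
        addr += max_offset if target > addr else -max_offset
        hops.append(addr)
    hops.append(target)
    return hops

# Example: a stack-handling instruction at 0x0120 must be redirected to a wrapper at
# 0x2F00, but one branch can only cover 0x0800 bytes, so several hops are planted.
print([hex(a) for a in plan_branch_chain(0x0120, 0x2F00, 0x0800)])
```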