Search | Korea Science

Distributed In-Memory Caching Method for ML Workload in Kubernetes (쿠버네티스에서 ML 워크로드를 위한 분산 인-메모리 캐싱 방법)

Dong-Hyeon Youn; Seokil Song
- Journal of Platform Technology, v.11 no.4, pp.71-79, 2023
In this paper, we analyze the characteristics of machine learning workloads and, based on this analysis, propose a distributed in-memory caching technique to improve their performance. The core of a machine learning workload is model training, which is computationally intensive. Running machine learning workloads in a Kubernetes-based cloud environment, where the computing framework and storage are separated, allows resources to be allocated effectively, but delays can occur because I/O must be performed over the network. The proposed cache targets workloads running in such an environment. In particular, we propose a new method that precaches the data required by machine learning workloads into the distributed in-memory cache by taking into account Kubeflow Pipelines, a Kubernetes-based tool for managing machine learning pipelines.
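The precaching idea above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' method: the pipeline is simplified to an ordered list of steps (real Kubeflow Pipelines form a DAG), each step's input dataset keys are assumed to be known in advance, and the hypothetical `InMemoryCache` is a single-process LRU cache standing in for the distributed cache. The hypothetical helper `precache_next_step` loads the next step's inputs while the current step runs, so that step can read from memory instead of over the network.

```python
from collections import OrderedDict
from typing import Callable, List, Optional


class InMemoryCache:
    """Toy LRU cache standing in for one distributed in-memory cache (assumption)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def precache_next_step(
    step_inputs: List[List[str]],
    current: int,
    cache: InMemoryCache,
    load: Callable[[str], bytes],
) -> List[str]:
    """While step `current` runs, warm the cache with the next step's inputs.

    `step_inputs[i]` lists the dataset keys read by step i (hypothetical,
    simplified pipeline model). Returns the keys that had to be fetched.
    """
    if current + 1 >= len(step_inputs):
        return []  # no next step to prepare for
    fetched = []
    for key in step_inputs[current + 1]:
        if cache.get(key) is None:
            cache.put(key, load(key))  # one read from remote storage
            fetched.append(key)
    return fetched
```

In a real deployment the `load` callback would read from the remote storage service and the cache would be sharded across nodes; both are outside the scope of this sketch.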

Improving Flash Translation Layer for Hybrid Flash-Disk Storage through Sequential Pattern Mining based 2-Level Prefetching Technique (하이브리드 플래시-디스크 저장장치용 Flash Translation Layer의 성능 개선을 위한 순차패턴 마이닝 기반 2단계 프리패칭 기법)

Chang, Jae-Young; Yoon, Un-Keum; Kim, Han-Joon
- The Journal of Society for e-Business Studies, v.15 no.4, pp.101-121, 2010
This paper presents an intelligent prefetching technique that significantly improves the performance of hybrid flash-disk storage, which combines flash memory and a hard disk. Because the flash memory embedded in a hybrid device performs I/O operations much faster than the hard disk, it can be used as a cache to improve system performance. The basic prefetching strategy uses sequential pattern mining, which extracts the access patterns of objects from historical access sequences. We use two techniques to enhance the performance of hybrid storage with prefetching. The first modifies the FAST mapping algorithm for flash memory. The second extends the unit of prefetching from the file level to the block level as well, so that flash memory space is used effectively. To evaluate the proposed technique, we perform experiments using synthetic data and real UCC (user-created content) data, and demonstrate the usability of our technique.
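A simplified sketch of mining-based prefetching follows. It is not the paper's algorithm: it mines only adjacent length-2 patterns (object a immediately followed by object b) that meet a minimum support count, whereas general sequential pattern mining also finds longer patterns with gaps, and it omits the FAST mapping changes and block-level prefetching. `mine_successors` and `prefetch_candidates` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def mine_successors(histories: Sequence[Sequence[str]], min_support: int) -> Dict[str, List[str]]:
    """Count 'a then b' pairs across access histories and keep the frequent ones.

    Successors of each object are listed from most to least frequent.
    """
    counts: Counter = Counter()
    for seq in histories:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    rules: Dict[str, List[str]] = defaultdict(list)
    for (a, b), n in counts.most_common():
        if n >= min_support:
            rules[a].append(b)
    return dict(rules)


def prefetch_candidates(rules: Dict[str, List[str]], accessed: str, limit: int = 2) -> List[str]:
    """Objects to copy into the flash cache after `accessed` is read."""
    return rules.get(accessed, [])[:limit]
```

Raising `min_support` trades coverage for accuracy: fewer objects are prefetched, but those that are have been observed more often.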

Efficient Interface circuits of Embedded Memory for RISC-based DSP Microprocessor (RISC-based DSP의 효율적인 임베디드 메모리 인터페이스)

Kim, You-Jin; Cho, Kyoung-Rok; Kim, Sung-Sik; Cheong, Eui-Seok
- Journal of the Korean Institute of Telematics and Electronics C, v.36C no.9, pp.1-12, 1999
In this paper, we design an embedded processor with a 128-Kbyte EPROM and a 4-Kbyte SRAM based on the GMS30C2132, a RISC processor with DSP functions. We also propose a new bus-sharing architecture that controls the embedded memory and the external memory unit, aiming at one-cycle access between the memories and the CPU. For the embedded 128-Kbyte EPROM, we design a new expansion interface for data size and data ordering according to the memory organization, as well as an efficient test interface. The embedded SRAM supports an extended stack area for high-speed DSP operation, an instruction cache, and variable data-length control, and it is accessed with a 4K modulo addressing scheme. The proposed architecture and circuits reduced the memory access cycle time from 40 ns and doubled operation speed on a program benchmark test. The chip occupies 108.68 mm² in 0.6 μm CMOS technology.
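The 4K modulo addressing mentioned above can be modeled in software as below. This is a behavioral sketch, assuming addresses are byte offsets into the 4-Kbyte SRAM; the function names are hypothetical and nothing here describes the actual circuit. Because 4096 is a power of two, the modulo reduces to a bit mask, which is why such schemes are cheap in hardware.

```python
SRAM_BYTES = 4 * 1024  # 4-Kbyte embedded SRAM, per the abstract


def modulo_address(base: int, offset: int, size: int = SRAM_BYTES) -> int:
    """Wrap base + offset into [0, size), as modulo addressing does."""
    return (base + offset) % size


def masked_address(base: int, offset: int, size: int = SRAM_BYTES) -> int:
    """Same result for power-of-two sizes using a bit mask (the hardware shortcut)."""
    assert size & (size - 1) == 0, "mask trick requires a power-of-two size"
    return (base + offset) & (size - 1)
```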

A Design of Fractional Motion Estimation Engine with 4×4 Block Unit of Interpolator & SAD Tree for 8K UHD H.264/AVC Encoder (8K UHD(7680×4320) H.264/AVC 부호화기를 위한 4×4블럭단위 보간 필터 및 SAD트리 기반 부화소 움직임 추정 엔진 설계)

Lee, Kyung-Ho; Kong, Jin-Hyeung
- Journal of the Institute of Electronics and Information Engineers, v.50 no.6, pp.145-155, 2013
In this paper, we propose a 4×4 block-parallel interpolation architecture for high-performance H.264/AVC fractional motion estimation in real-time 8K UHD (7680×4320) video processing. To improve throughput, we design 4×4 block-parallel interpolation. To supply the 10×10 reference data needed for interpolation, we design a 2D cache buffer consisting of 10×10 memory arrays. We minimize redundant storage of reference pixels by applying the Search Area Stripe Reuse (SASR) scheme, and implement a high-speed plane interpolator with a 3-stage pipeline (horizontal/vertical 1/2 interpolation, diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated with a 0.13 μm standard cell library, and the gate count is 436.5K gates. The proposed H.264/AVC fractional motion estimation engine supports 8K UHD at 30 frames per second when running at 187 MHz.
https://doi.org/10.5573/ieek.2013.50.6.145
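As a rough sanity check on the reported figures, the arithmetic below computes the luma pixel rate for 8K UHD at 30 frames per second and the resulting clock budget per 4×4 block at 187 MHz. It assumes luma only and one pass per 4×4 block, ignoring chroma, partition sizes, and the number of fractional candidates evaluated, so it is an order-of-magnitude budget rather than a description of the engine's schedule.

```python
# 8K UHD luma throughput budget (resolution, frame rate, and clock from the abstract)
WIDTH, HEIGHT = 7680, 4320
FPS = 30
CLOCK_HZ = 187_000_000
BLOCK = 4 * 4  # pixels per 4x4 block

pixels_per_second = WIDTH * HEIGHT * FPS
blocks_per_second = pixels_per_second // BLOCK
cycles_per_block = CLOCK_HZ / blocks_per_second

print(f"{pixels_per_second:,} luma pixels/s")
print(f"{blocks_per_second:,} 4x4 blocks/s")
print(f"{cycles_per_block:.2f} cycles per 4x4 block")
```

At these figures the engine has roughly three clock cycles per 4×4 luma block, which suggests why the design interpolates a whole 4×4 block in parallel rather than one pixel at a time.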

An Effective Employment and Execution Performance Improvement Method of Mobile Web Widget Resources Based on the OMTP BONDI (OMTP BONDI 기반 모바일 웹 위젯 리소스의 효율적 운용 및 구동 성능 개선 기법 연구)

Bang, Ji-Woong; Kim, Dae-Won
- Journal of Korea Multimedia Society, v.14 no.2, pp.153-170, 2011
OMTP (Open Mobile Terminal Platform) is a global forum formed by telecommunications providers to promote user-oriented mobile services and data business. BONDI, devised by OMTP, is a browser-based application and mobile web run-time platform that helps widgets use the functions of mobile devices securely. BONDI enables applications written with web standard technologies such as HTML, JavaScript, CSS, and AJAX to access the internal functions of mobile devices. Because BONDI is not just a simple network application but can access a device's internal resources in standard ways, applications and widgets can be developed regardless of the OS or platform. Web browser-based widgets are vulnerable to the network environment, and their execution can slow down as the widgets or applications become heavier. Nevertheless, such web widgets will continue to be used because of their simple, user-friendly interface and because they use web resources faster than the native widgets inside the device. This study proposes a method to effectively operate and manage the resources of OMTP BONDI web widgets and presents improved results from a running-performance evaluation experiment. The experiment aimed to reduce total operating time by speeding up module loading. To this end, only indispensable modules were loaded while a BONDI widget was running. For this purpose, the widget resource list was redefined to speed up BONDI widget operation, and a widget cache was employed. In addition, a widget box, a management tool for removed widgets, was devised to store idle widgets temporarily.
https://doi.org/10.9717/kmms.2011.14.2.153
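The load-on-demand strategy described above (load only indispensable modules at start-up, then cache any other module after its first use) can be sketched as follows. This is a language-neutral illustration written in Python, not BONDI's API: `WidgetModuleCache`, the module names, and the loader callables are all hypothetical.

```python
from typing import Callable, Dict, Iterable


class WidgetModuleCache:
    """Load widget modules lazily and keep them for reuse (hypothetical design)."""

    def __init__(self, loaders: Dict[str, Callable[[], object]], required: Iterable[str]) -> None:
        self._loaders = loaders
        self._cache: Dict[str, object] = {}
        self.load_count = 0  # how many real loads have happened
        for name in required:  # only indispensable modules load at start-up
            self.get(name)

    def get(self, name: str) -> object:
        if name not in self._cache:
            self._cache[name] = self._loaders[name]()  # first use: real load
            self.load_count += 1
        return self._cache[name]  # later uses: served from the cache
```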

A File System for User Special Functions using Speed-based Prefetch in Embedded Multimedia Systems (임베디드 멀티미디어 재생기에서 속도기반 미리읽기를 이용한 사용자기능 지원 파일시스템)

Choe, Tae-Young; Yoon, Hyeon-Ju
- Journal of KIISE:Computing Practices and Letters, v.14 no.7, pp.625-635, 2008
Portable multimedia players differ from general multimedia file servers in several ways, including single-user ownership, relatively low hardware performance, I/O bursts caused by user special functions, and short software development cycles. General multimedia file systems, though suited to processing many user requests at once, are not efficient for special user functions such as fast-forward and rewind. Some methods have been proposed to improve performance and functionality in which application programs give prediction hints to the file system. Unfortunately, they require all applications to be modified and recompiled. In this paper, we present a file system that efficiently supports user special functions in embedded multimedia systems using file block allocation, a buffer cache, and prefetching. The prefetch algorithm SPRA (SPeed-based PRefetch Algorithm) predicts the next block from I/O patterns instead of hints from applications, and because it resides in the file system, it does not affect the application development process. In an experimental file system implementation compared with Linux readahead-based algorithms, the proposed system shows a 4.29% to 52.63% improvement in turnaround time and 1.01 to 3.09 times the throughput on average.
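A simplified speed-based predictor is sketched below. The abstract does not give SPRA's details, so this is a hypothetical stand-in, not SPRA itself: it assumes that during fast-forward or rewind the player reads blocks at a constant stride, extrapolates that stride when the last two strides agree, and otherwise falls back to plain sequential read-ahead.

```python
from typing import List, Sequence


def predict_next_blocks(history: Sequence[int], depth: int) -> List[int]:
    """Predict the next `depth` block numbers from recent block accesses.

    A stable nonzero stride (e.g. +4 while fast-forwarding, -4 while
    rewinding) is extrapolated; otherwise the next blocks in order are used.
    Negative block numbers are dropped.
    """
    if not history:
        return []
    last = history[-1]
    stride = 1  # default: sequential read-ahead
    if len(history) >= 3:
        s1 = history[-1] - history[-2]
        s2 = history[-2] - history[-3]
        if s1 == s2 and s1 != 0:
            stride = s1  # playback "speed" in blocks per request
    candidates = [last + stride * i for i in range(1, depth + 1)]
    return [b for b in candidates if b >= 0]
```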

An Efficient MBR Compression Technique for Main Memory Multi-dimensional Indexes (메인 메모리 다차원 인덱스를 위한 효율적인 MBR 압축 기법)

Kim, Joung-Joon; Kang, Hong-Koo; Kim, Dong-Oh; Han, Ki-Joon
- Journal of Korea Spatial Information System Society, v.9 no.2, pp.13-23, 2007
Recently, there has been growing interest in LBS (Location Based Services) requiring real-time service and in spatial main memory DBMSs for efficient telematics services. To optimize the existing disk-based multi-dimensional indexes of spatial main memory DBMSs for main memory, multi-dimensional index structures have been proposed that reduce cache misses by reducing the entry size. However, because reducing the entry size requires compressing MBRs relative to the parent node's MBR or removing redundant MBRs, the cost of MBR reconstruction increases during index updates and search efficiency drops during index searches. To reduce the cost of MBR reconstruction, this paper proposes the RSMBR (Relative-Sized MBR) compression technique, which chooses the base point of compression differently for broad and narrow distributions. For a broad distribution, compression is based on the bottom-left point of the parent node's extended MBR; for a narrow distribution, the whole MBR is divided into cells of equal size and compression is based on the bottom-left point of each cell. In addition, MBRs are compressed using relative coordinates and sizes to reduce the cost of index search. Finally, we evaluated the performance of the proposed RSMBR compression technique using real data and demonstrated its superiority.
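Compressing an MBR with relative coordinates and sizes can be sketched as below. It is a generic quantization under assumed parameters (8 bits per value, a bottom-left reference point, and a child box lying inside the quantized region), not the exact RSMBR encoding; `base` plays the role of either the parent's extended-MBR corner (broad distribution) or a cell corner (narrow distribution). Rounding the lower corner down and the upper corner up keeps the decoded box conservative, so a search may return extra candidates but should not miss one.

```python
import math
from typing import NamedTuple, Tuple


class MBR(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def compress_mbr(child: MBR, base: Tuple[float, float],
                 extent: Tuple[float, float], bits: int = 8) -> Tuple[int, int, int, int]:
    """Encode `child` as quantized (x, y, width, height) relative to `base`.

    `extent` is the size of the region being quantized (assumed to contain
    `child`). Lower corners round down and upper corners round up.
    """
    levels = (1 << bits) - 1
    sx, sy = extent[0] / levels, extent[1] / levels
    qx = math.floor((child.xmin - base[0]) / sx)
    qy = math.floor((child.ymin - base[1]) / sy)
    qw = math.ceil((child.xmax - base[0]) / sx) - qx  # relative size
    qh = math.ceil((child.ymax - base[1]) / sy) - qy
    return qx, qy, qw, qh


def decompress_mbr(code: Tuple[int, int, int, int], base: Tuple[float, float],
                   extent: Tuple[float, float], bits: int = 8) -> MBR:
    """Rebuild a box that covers the original child MBR."""
    levels = (1 << bits) - 1
    sx, sy = extent[0] / levels, extent[1] / levels
    qx, qy, qw, qh = code
    return MBR(base[0] + qx * sx, base[1] + qy * sy,
               base[0] + (qx + qw) * sx, base[1] + (qy + qh) * sy)
```

Four 8-bit codes occupy 4 bytes, compared with 16 bytes for four 32-bit floats, which is the kind of entry-size reduction that lets more index entries fit in each cache line.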

Segment-based Cache Replacement Policy in Transcoding Proxy (트랜스코딩 프록시에서 세그먼트 기반 캐쉬 교체 정책)

Park, Yoo-Hyun; Kim, Hag-Young; Kim, Kyong-Sok
- The KIPS Transactions:PartA, v.15A no.1, pp.53-60, 2008
Streaming media accounts for a significant share of today's Internet traffic. Like traditional web objects, rich media objects can benefit from proxy caching, but caching streaming media is more challenging than caching simple web objects because streaming media objects are large and require high bandwidth. To support the varying bandwidth requirements of heterogeneous ubiquitous devices, a transcoding proxy is usually needed that both adapts multimedia streams to the client by transcoding and caches them for later use. A traditional proxy considers only a single version of each object when deciding whether to cache it. A transcoding proxy, however, must evaluate the aggregate effect of caching multiple versions of the same object to determine an optimal set of cached objects. Recent studies of multimedia caching frequently store the initial parts of videos on the proxy to reduce playback latency and achieve better performance, and many manage content in segments for efficient storage management. In this paper, we define nine events of a transcoding proxy using four atomic events; based on these events, the transcoding proxy determines its next actions. We then propose a segment-based caching policy for the transcoding proxy system. Performance results show that the proposed policy achieves a short start-up delay, a high byte-hit ratio, and less transcoded data.
https://doi.org/10.3745/KIPSTA.2008.15-A.1.53
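Two caching ideas named above, serving lower-bitrate requests from a cached higher-bitrate version and keeping video prefixes cached, can be combined in a toy policy. This is a hypothetical sketch, not the paper's nine-event policy: `SegmentCache` keeps at most one version per segment, measures capacity in segments, and evicts the highest-numbered segment first.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (video id, segment index)


class SegmentCache:
    """Hypothetical segment cache for a transcoding proxy."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # max number of cached segments
        self.store: Dict[Key, int] = {}  # segment -> cached bitrate

    def lookup(self, video: str, segment: int, bitrate: int) -> str:
        cached = self.store.get((video, segment))
        if cached is None or cached < bitrate:
            return "miss"  # absent, or cached quality too low to transcode down
        return "hit" if cached == bitrate else "transcode-hit"

    def insert(self, video: str, segment: int, bitrate: int) -> bool:
        key = (video, segment)
        if key in self.store:
            self.store[key] = max(self.store[key], bitrate)  # keep the richer version
            return True
        if self.capacity <= 0:
            return False
        if len(self.store) >= self.capacity:
            victim = max(self.store, key=lambda k: k[1])  # latest segment in any video
            if segment >= victim[1]:
                return False  # new segment is no closer to a prefix; skip it
            del self.store[victim]
        self.store[key] = bitrate
        return True
```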

A 3-D Vision Sensor Implementation on Multiple DSPs TMS320C31 (다중 TMS320C31 DSP를 사용한 3-D 비젼센서 Implementation)

Oksenhendler, V.; Bensrhair, Abdelaziz; Miche, Pierre; Lee, Sang-Goog
- Journal of Sensor Science and Technology, v.7 no.2, pp.124-130, 1998
High-speed 3D vision systems are essential for autonomous robot and vehicle control applications. In our study, we developed a stereo vision process consisting of three steps: extracting edges in the right and left images, matching corresponding edges, and calculating the 3D map. The process is implemented on a VME 150/40 Imaging Technology vision system, a modular system composed of display and acquisition boards, a 4-Mbyte image frame memory, and three computational cards. The programmable accelerator computational modules run at 40 MHz and are based on the TMS320C31 DSP, which has a 64×32-bit instruction cache and two 1024×32-bit internal RAMs. Each module is equipped with 512 Kbytes of static RAM, 4 Mbytes of image memory, 1 Mbyte of flash EEPROM, and a serial port. Data transfers and communication between modules are provided by three 8-bit global video buses and three locally configurable 8-bit pipeline video buses. The VME bus is dedicated to system management. Tasks are distributed among the DSPs as follows: two DSPs perform edge detection, one for the right image and one for the left, and the last processor performs the matching and the 3D calculation. With 512×512-pixel images, the sensor generates dense 3D maps at a rate of about 1 Hz, depending on scene complexity. Results could be improved by using specially suited multiprocessor cards.
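For the final step, computing the 3D map from matched edges, rectified stereo geometry gives depth as Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity of a matched edge. The abstract does not state the exact formulation the authors used, so treat this as a textbook sketch assuming a rectified pinhole stereo pair; the function name and example values are hypothetical.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for one matched edge point in a rectified stereo pair."""
    if disparity_px <= 0:
        # zero disparity means a point at infinity; negative means a bad match
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

Because Z is inversely proportional to d, a one-pixel matching error matters far more for distant points (small d) than for near ones.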

A Design of 4×4 Block Parallel Interpolation Motion Compensation Architecture for 4K UHD H.264/AVC Decoder (4K UHD급 H.264/AVC 복호화기를 위한 4×4 블록 병렬 보간 움직임보상기 아키텍처 설계)

Lee, Kyung-Ho; Kong, Jin-Hyeung
- Journal of the Institute of Electronics and Information Engineers, v.50 no.5, pp.102-111, 2013
In this paper, we propose a 4×4 block-parallel interpolation architecture for high-performance H.264/AVC motion compensation in real-time 4K UHD (3840×2160) video processing. To improve throughput, we design 4×4 block-parallel interpolation. To supply the 9×9 reference data needed for interpolation, we design a 2D cache buffer consisting of 9×9 memory arrays. We minimize redundant storage of reference pixels by applying the Search Area Stripe Reuse (SASR) scheme, and implement a high-speed plane interpolator with a 3-stage pipeline (horizontal/vertical 1/2 interpolation, diagonal 1/2 interpolation, 1/4 interpolation). The proposed architecture was simulated with a 0.13 μm standard cell library. The maximum operating frequency is 150 MHz, and the gate count is 161K gates. The proposed H.264/AVC motion compensation unit supports 4K UHD at 72 frames per second when running at 150 MHz.
https://doi.org/10.5573/ieek.2013.50.5.102
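The 9×9 reference window follows from the H.264/AVC luma half-sample filter, whose taps are (1, -5, 20, 20, -5, 1) with rounding (x + 16) >> 5 and clipping to 8 bits: a 4-sample row needs 2 extra samples on one side and 3 on the other, so 4 + 5 = 9. The sketch below computes the horizontal half-sample positions for one 4×4 block from such a window. The window layout (integer block at offset 2) and the function names are assumptions of this sketch, and it covers only the horizontal part of the first pipeline stage, not the paper's hardware.

```python
from typing import List

TAPS = (1, -5, 20, 20, -5, 1)  # H.264/AVC luma 6-tap half-sample filter


def clip1(x: int) -> int:
    """Clip to the 8-bit sample range [0, 255]."""
    return max(0, min(255, x))


def half_pel_horizontal(ref: List[List[int]]) -> List[List[int]]:
    """Horizontal half-sample values for a 4x4 block taken from a 9x9 window.

    The block's integer samples sit at rows/cols 2..5 of `ref`; each half
    sample lies between columns c+2 and c+3, so its taps span columns c..c+5.
    """
    assert len(ref) == 9 and all(len(row) == 9 for row in ref)
    out = []
    for r in range(4):
        row = ref[r + 2]
        vals = []
        for c in range(4):
            acc = sum(t * row[c + k] for k, t in enumerate(TAPS))
            vals.append(clip1((acc + 16) >> 5))  # round, scale by 1/32, clip
        out.append(vals)
    return out
```

On a linear ramp the filter returns exact midpoints, and on a flat area it returns the flat value, which is a quick way to check the tap alignment.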
