• Title/Summary/Keyword: 메모리 효율

Search Result 1,782, Processing Time 0.027 seconds

Efficient Schemes for Scaling Ring Bandwidth in Ring-based Multiprocessor System (링 구조 다중프로세서 시스템에서 링 대역폭 확장을 위한 효율적인 방안)

  • Jang, Byoung-Soon;Chung, Sung-Woo;Jhang, Seong-Tae;Jhon, Chu-Shik
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.2
    • /
    • pp.177-187
    • /
    • 2000
  • In the past several years, many systems which adopted ring topology with high-speed unidirectional point-to-point links have emerged to overcome the limit of bus for interconnection network of clustered multiprocessor system. However, rapid increase of processor speed and performance improvement of local bus and memory system limit scalability of system with point-to-point link of standard bandwidth. Therefore, necessity to extend bandwidth is emphasized. In this paper, we adopt PANDA system as base model, which is clustering-based multiprocessor system. By simulating a model adopting commercial processor and local bus specification, we show that point-to-point link is bottleneck of system performance, and bandwidth expansion by more than 200% is needed. To expand bandwidth of interconnection network, it needs excessive design cost and time to develop new point-to-point link with doubled bandwidth. As an alternative to double bandwidth, we propose several ways to implement dual ring -simple dual ring, transaction-separated dual ring, direction-separated dual ring- by using off-the-shelf point-to-point links with IEEE standard bandwidth. We analyze pros. and cons. of each model compared with doubled-bandwidth single ring by simulation.

  • PDF

Semantic Segmentation of Drone Imagery Using Deep Learning for Seagrass Habitat Monitoring (잘피 서식지 모니터링을 위한 딥러닝 기반의 드론 영상 의미론적 분할)

  • Jeon, Eui-Ik;Kim, Seong-Hak;Kim, Byoung-Sub;Park, Kyung-Hyun;Choi, Ock-In
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.2_1
    • /
    • pp.199-215
    • /
    • 2020
  • A seagrass that is marine vascular plants plays an important role in the marine ecosystem, so periodic monitoring ofseagrass habitatsis being performed. Recently, the use of dronesthat can easily acquire very high-resolution imagery is increasing to efficiently monitor seagrass habitats. And deep learning based on a convolutional neural network has shown excellent performance in semantic segmentation. So, studies applied to deep learning models have been actively conducted in remote sensing. However, the segmentation accuracy was different due to the hyperparameter, various deep learning models and imagery. And the normalization of the image and the tile and batch size are also not standardized. So,seagrass habitats were segmented from drone-borne imagery using a deep learning that shows excellent performance in this study. And it compared and analyzed the results focused on normalization and tile size. For comparison of the results according to the normalization, tile and batch size, a grayscale image and grayscale imagery converted to Z-score and Min-Max normalization methods were used. And the tile size isincreased at a specific interval while the batch size is allowed the memory size to be used as much as possible. As a result, IoU was 0.26 ~ 0.4 higher than that of Z-score normalized imagery than other imagery. Also, it wasfound that the difference to 0.09 depending on the tile and batch size. The results were different according to the normalization, tile and batch. Therefore, this experiment found that these factors should have a suitable decision process.

Fast Multi-GPU based 3D Backprojection Method (다중 GPU 기반의 고속 삼차원 역전사 기법)

  • Lee, Byeong-Hun;Lee, Ho;Kye, Hee-Won;Shin, Yeong-Gil
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.2
    • /
    • pp.209-218
    • /
    • 2009
  • 3D backprojection is a kind of reconstruction algorithm to generate volume data consisting of tomographic images, which provides spatial information of the original 3D data from hundreds of 2D projections. The computational time of backprojection increases in proportion to the size of volume data and the number of projection images since the value of every voxel in volume data is calculated by considering corresponding pixels from hundreds of projections. For the reduction of computational time, fast GPU based 3D backprojection methods have been studied recently and the performance of them has been improved significantly. This paper presents two multiple GPU based methods to maximize the parallelism of GPU and compares the efficiencies of two methods by considering both the number of projections and the size of volume data. The first method is to generate partial volume data independently for all projections after allocating a half size of volume data on each GPU. The second method is to acquire the entire volume data by merging the incomplete volume data of each GPU on CPU. The in-complete volume data is generated using the half size of projections after allocating the full size of volume data on each GPU. In experimental results, the first method performed better than the second method when the entire volume data can be allocated on GPU. Otherwise, the second method was efficient than the first one.

  • PDF

Real-Time Shadow Generation using Image Warping (이미지 와핑을 이용한 실시간 그림자 생성 기법)

  • Kang, Byung-Kwon;Ihm, In-Sung
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.5
    • /
    • pp.245-256
    • /
    • 2002
  • Shadows are important elements in producing a realistic image. Generation of exact shapes and positions of shadows is essential in rendering since it provides users with visual cues on the scene. It is also very important to be able to create soft shadows resulted from area light sources since they increase the visual realism drastically. In spite of their importance. the existing shadow generation algorithms still have some problems in producing realistic shadows in real-time. While image-based rendering techniques can often be effective1y applied to real-time shadow generation, such techniques usually demand so large memory space for storing preprocessed shadow maps. An effective compression method can help in reducing memory requirement, only at the additional decoding costs. In this paper, we propose a new image-barred shadow generation method based on image warping. With this method, it is possible to generate realistic shadows using only small sizes of pre-generated shadow maps, and is easy to extend to soft shadow generation. Our method will be efficiently used for generating realistic scenes in many real-time applications such as 3D games and virtual reality systems.

Utilizing Channel Bonding-based M-n and Interval Cache on a Distributed VOD Server (효율적인 분산 VOD 서버를 위한 Channel Bonding 기반 M-VIA 및 인터벌 캐쉬의 활용)

  • Chung, Sang-Hwa;Oh, Soo-Cheol;Yoon, Won-Ju;kim, Hyun-Pil;Choi, Young-In
    • The KIPS Transactions:PartA
    • /
    • v.12A no.7 s.97
    • /
    • pp.627-636
    • /
    • 2005
  • This paper presents a PC cluster-based distributed video on demand (VOD) server that minimizes the load of the interconnection network by adopting channel bonding-based MVIA and the interval cache algorithm Video data is distributed to the disks of each server node of the distributed VOD server and each server node receives the data through the interconnection network and sends it to clients. The load of the interconnection network increases because of the large volume of video data transferred. We adopt two techniques to reduce the load of the interconnection network. First, an Msupporting channel bonding technique is adopted for the interconnection network. n which is a user-level communication protocol that reduces the overhead of the TCP/IP protocol in cluster systems, minimizes the time spent in communicating. We increase the bandwidth of the interconnection network using the channel bonding technique with MThe channel bonding technique expands the bandwidth by sending data concurrently through multiple network cards. Second, the interval cache reduces traffic on the interconnection network by caching the video data transferred from the remote disks in main memory Experiments using the distributed VOD server of this paper showed a maximum performance improvement of $30\%$ compared with a distributed VOD server without channel bonding-based MVIA and the interval cache, when used with a four-node PC cluster.

Tuple Pruning Using Bloom Filter for Packet Classification (패킷 분류를 위한 블룸 필터 이용 튜플 제거 알고리즘)

  • Kim, So-Yeon;Lim, Hye-Sook
    • Journal of KIISE:Information Networking
    • /
    • v.37 no.3
    • /
    • pp.175-186
    • /
    • 2010
  • Due to the emergence of new application programs and the fast growth of Internet users, Internet routers are required to provide the quality of services according to the class of input packets, which is identified by wire-speed packet classification. For a pre-defined rule set, by performing multi-dimensional search using various header fields of an input packet, packet classification determines the highest priority rule matching to the input packet. Efficient packet classification algorithms have been widely studied. Tuple pruning algorithm provides fast classification performance using hash-based search against the candidate tuples that may include matching rules. Bloom filter is an efficient data structure composed of a bit vector which represents the membership information of each element included in a given set. It is used as a pre-filter determining whether a specific input is a member of a set or not. This paper proposes new tuple pruning algorithms using Bloom filters, which effectively remove unnecessary tuples which do not include matching rules. Using the database known to be similar to actual rule sets used in Internet routers, simulation results show that the proposed tuple pruning algorithm provides faster packet classification as well as consumes smaller memory amount compared with the previous tuple pruning algorithm.

Instructions and Data Prefetch Mechanism using Displacement History Buffer (변위 히스토리 버퍼를 이용한 명령어 및 데이터 프리페치 기법)

  • Jeong, Yong Su;Kim, JinHyuk;Cho, Tae Hwan;Choi, SangBang
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.10
    • /
    • pp.82-94
    • /
    • 2015
  • In this paper, we propose hardware prefetch mechanism with an efficient cache replacement policy by giving priority to the trigger block in which a spatial region and producing a spatial region by using the displacement field. It could be taken into account the sequence of the program since a history is based on the trigger block of history record, and it could be quickly prefetching the instructions or data address by adding a stored value to the trigger address and displacement field since a history is stored as a displacement value. Also, we proposed a method of replacing at random by the cache replacement policy from the low priority block when the cache area is full after giving priority to the trigger block. We analyzed using the memory simulator program gem5 and PARSEC benchmark to assess the performance of the hardware prefetcher. As a result, compared to the existing hardware prefecture to generate the spatial region using a bit vector, L1 data cache miss rate was reduced about 44.5% on average and an average of 26.1% of L1 instruction misses occur. In addition, IPC (Instruction Per Cycle) showed an improvement of about 23.7% on average.

A Mesh Partitioning Using Adaptive Vertex Clustering (적응형 정점 군집화를 이용한 메쉬 분할)

  • Kim, Dae-Young;Kim, Jong-Won;Lee, Hae-Young
    • Journal of the Korea Computer Graphics Society
    • /
    • v.15 no.3
    • /
    • pp.19-26
    • /
    • 2009
  • In this paper, a new adaptive vertex clustering using a KD-tree is presented for 3D mesh partitioning. A vertex clustering is used to divide a huge 3D mesh into several partitions for various mesh processing. An octree-based clustering and K-means clustering are currently leading techniques. However, the octree-based methods practice uniform space divisions and so each partitioned mesh has non-uniformly distributed number of vertices and the difference in its size. The K-means clustering produces uniformly partitioned meshes but takes much time due to many repetitions and optimizations. Therefore, we propose to use a KD-tree to efficiently partition meshes with uniform number of vertices. The bounding box region of the given mesh is adaptively subdivided according to the number of vertices included and dynamically determined axis. As a result, the partitioned meshes have a property of compactness with uniformly distributed vertices.

  • PDF

Gate-Level Conversion Methods between Boolean and Arithmetic Masks (불 마스크와 산술 마스크에 대한 게이트 레벨 변환기법)

  • Baek, Yoo-Jin
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.11
    • /
    • pp.8-15
    • /
    • 2009
  • Side-channel attacks including the differential power analysis attack are often more powerful than classical cryptanalysis and have to be seriously considered by cryptographic algorithm's implementers. Various countermeasures have been proposed against such attacks. In this paper, we deal with the masking method, which is known to be a very effective countermeasure against the differential power analysis attack and propose new gate-level conversion methods between Boolean and arithmetic masks. The new methods require only 6n-5 XOR and 2n-2 AND gates with 3n-2 gate delay for converting n-bit masks. The basic idea of the proposed methods is that the carry and the sum bits in the ripple adder are manipulated in a way that the adversary cannot detect the relation between these bits and the original raw data. Since the proposed methods use only bitwise operations, they are especially useful for DPA-securely implementing cryptographic algorithms in hardware which use both Boolean and arithmetic operations. For example, we applied them to securely implement the block encryption algorithm SEED in hardware and present its detailed implementation result.

A 10b 100MS/s 27.2mW $0.8mm^2$ 0.18um CMOS Pipeline ADC with Various Circuit Sharing Schemes (다양한 회로 공유기법을 사용하는 10비트 100MS/s 27.2mW $0.8mm^2$ 0.18um CMOS Pipeline ADC)

  • Yoon, Kun-Yong;Lee, Se-Won;Choi, Min-Ho;Lee, Seung-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.4
    • /
    • pp.53-63
    • /
    • 2009
  • This work proposes a 10b 100MS/s 27.2mW $0.8mm^2$ 0.18um CMOS ADC for WLAN such as an IEEE 802.11n standard. The proposed ADC employs a three-stage pipeline architecture and minimizes power consumption and chip area by sharing as many circuits as possible. Two multiplying DACs share a single amplifier without MOS switches connected in series while the shared amplifier does not show a conventional memory effect. All three flash ADCs use only one resistor ladder while the second and third flash ADCs share all pre-amps to further reduce power consumption and chip area. The interpolation circuit employed in the flash ADCs halves the required number of pre-amps and an input-output isolated dynamic latch reduces the increased kickback noise caused by the pre-amp sharing. The prototype ADC implemented in a 0.18um n-well 1P6M CMOS process shows the DNL and INL within 0.83LSB and 1.52LSB at 10b, respectively. The ADC measures an SNDR of 52.1dB and an SFDR of 67.6dB at a sampling rate of 100MS/s. The ADC with an active die area of $0.8mm^2$ consumes 27.2mW at 1.8V and 100MS/s.