Search | Korea Science

Performance Evaluation and Prediction on a Clustered SMP System for Aerospace CFD Applications with Hybrid Paradigm

Matsuo Yuichi;Sueyasu Naoki;Inari Tomohide
- 한국전산유체공학회:학술대회논문집
- /
- 2006.05a
- /
- pp.275-278
- /
- 2006
Japan Aerospace Exploration Agency has introduced a new terascale clustered SMP system as the main compute engine of Numerical Simulator III for aerospace science and engineering research. The system is based on the Fujitsu PRIMEPOWER HPC2500; it has a peak performance of 9.3 Tflop/s and 3.6 TB of user memory, with about 1,800 scalar processors for computation. In this paper, we first present performance evaluation results for aerospace CFD applications that use the hybrid programming paradigm at JAXA. Next, we propose a performance prediction formula for hybrid codes based on a simple extension of Amdahl's law, and discuss the predicted and measured performance of some typical hybrid CFD codes.
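The abstract does not reproduce the prediction formula itself. As a rough illustration only, classic Amdahl's law, speedup = 1 / ((1 - f) + f / N), can be nested once to model a hybrid code that runs several processes with several threads each. The two parallel fractions `f_proc` and `f_thread`, the function name, and the nesting itself are assumptions made for this sketch, not the authors' formula.

```python
def hybrid_speedup(f_proc: float, f_thread: float, procs: int, threads: int) -> float:
    """Two-level Amdahl-style speedup estimate (illustrative assumption).

    f_proc   -- fraction of the run time that is parallel across processes
    f_thread -- fraction of each process's share that is parallel across threads
    """
    if not (0.0 <= f_proc <= 1.0 and 0.0 <= f_thread <= 1.0):
        raise ValueError("parallel fractions must lie in [0, 1]")
    if procs < 1 or threads < 1:
        raise ValueError("procs and threads must be at least 1")

    # Thread level: relative time of one process's share when run on `threads` threads.
    thread_time = (1.0 - f_thread) + f_thread / threads
    # Process level: the process-parallel part is split over `procs` processes,
    # and each share is further shortened by the thread-level factor.
    total_time = (1.0 - f_proc) + f_proc * thread_time / procs
    return 1.0 / total_time
```

With `threads=1` the expression reduces to ordinary Amdahl's law, which is a convenient sanity check when comparing such a model against measured timings.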

Memory Organization for a Fuzzy Controller.

Jee, K.D.S.;Poluzzi, R.;Russo, B.
- Proceedings of the Korean Institute of Intelligent Systems Conference
- /
- 1993.06a
- /
- pp.1041-1043
- /
- 1993
Fuzzy-logic-based control theory has gained much interest in the industrial world thanks to its ability to formalize and solve, in a very natural way, many problems that are very difficult to quantify analytically. This paper presents a solution for handling membership functions inside hardware circuits. The proposed hardware structure optimizes the memory size by using a particular form of the vectorial representation. Memorizing fuzzy sets, i.e. their membership functions, has always been one of the more problematic issues for hardware implementation because of the rather large memory space required. To simplify such an implementation, it is common [1,2,8,9,10,11] to limit the membership functions either to those having a triangular or trapezoidal shape, or to a predefined shape. These kinds of functions can cover a large spectrum of applications with limited memory usage, since they can be memorized by specifying very few parameters (height, base, critical points, etc.). This, however, results in a loss of computational power due to computation on the intermediate points. A solution to this problem is obtained by discretizing the universe of discourse U, i.e. by fixing a finite number of points and memorizing the values of the membership functions at those points [3,10,14,15]. Such a solution provides satisfactory computational speed and very high precision of definition, and gives users the opportunity to choose membership functions of any shape. However, it can also waste a significant amount of memory: for each of the given fuzzy sets, many elements of the universe of discourse may have a membership value equal to zero. It has also been noticed that in almost all cases the common points among fuzzy sets, i.e. points with non-null membership values, are very few.
More specifically, in many applications, for each element u of U there exist at most three fuzzy sets for which the membership value is not null [3,5,6,7,12,13]. Our proposal is based on this hypothesis. Moreover, we use a technique that, although it does not restrict the shapes of the membership functions, strongly reduces the computation time for the membership values and optimizes the function memorization. Figure 1 shows a term set whose characteristics are common for fuzzy controllers and to which we refer in the following. This term set has a universe of discourse with 128 elements (to give good resolution), 8 fuzzy sets that describe the term set, and 32 levels of discretization for the membership values. Clearly, the numbers of bits necessary for these specifications are 5 for 32 truth levels, 3 for 8 membership functions and 7 for 128 levels of resolution. The memory depth is given by the dimension of the universe of discourse (128 in our case) and is represented by the memory rows. The length of a memory word is defined by $Length = nfm \cdot (dm(m) + dm(fm))$, where nfm is the maximum number of non-null values in every element of the universe of discourse, dm(m) is the dimension of the values of the membership function m, and dm(fm) is the dimension of the word that represents the index of the highest membership function. In our case, then, Length = 3 × (5 + 3) = 24, and the memory dimension is therefore 128*24 bits. If we had chosen to memorize all values of the membership functions, we would have needed to memorize on each memory row the membership value of each element for every fuzzy set; the word dimension would then be 8*5 bits and the memory dimension 128*40 bits. Consistent with our hypothesis, in fig. 1 each element of the universe of discourse has a non-null membership value on at most three fuzzy sets.
Focusing on elements 32, 64 and 96 of the universe of discourse, they will be memorized as follows: The computation of the rule weights is done by comparing the bits that represent the index of the membership function with the word of the program memory. The output bus of the Program Memory (μCOD) is given as input to a comparator (combinatorial network). If the index is equal to the bus value, then one of the non-null weights derives from the rule and is produced as output; otherwise the output is zero (fig. 2). It is clear that the memory dimension of the antecedent is reduced in this way, since only non-null values are memorized. Moreover, the time performance of the system is equivalent to that of a system using vectorial memorization of all weights. The dimensioning of the word is influenced by some parameters of the input variable. The most important parameter is the maximum number of membership functions (nfm) having a non-null value in each element of the universe of discourse. From our study in the field of fuzzy systems, we see that typically nfm ≤ 3 and there are at most 16 membership functions. At any rate, this value can be increased up to the physical dimensional limit of the antecedent memory. A less important role in the optimization of the word dimension is played by the number of membership functions defined for each linguistic term. The table below shows the required word dimension as a function of these parameters and compares our proposed method with the method of vectorial memorization [10]. Summing up, the characteristics of our method are: Users are not restricted to membership functions with specific shapes. The number of fuzzy sets and the resolution of the vertical axis have a very small influence on memory space. Weight computations are done by a combinatorial network, and therefore the time performance of the system is equivalent to that of the vectorial method.
The number of non-null membership values on any element of the universe of discourse is limited. Such a constraint is usually not very restrictive, since many controllers obtain good precision with only three non-null weights. The method briefly described here has been adopted by our group in the design of an optimized version of the coprocessor described in [10].
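The word-length arithmetic in this abstract can be checked with a short sketch. It computes the compact word length nfm × (dm(m) + dm(fm)) and the full vectorial word length for the stated example (128 elements, 8 fuzzy sets, 32 truth levels, nfm = 3), and mimics the comparator lookup in software. The function names, the list-of-pairs row layout and the sample row values are assumptions made for illustration; the abstract describes a hardware circuit, not this code.

```python
def bits_for(levels: int) -> int:
    """Number of bits needed to encode `levels` distinct values."""
    return max(1, (levels - 1).bit_length())


def compact_word_length(nfm: int, truth_levels: int, n_sets: int) -> int:
    """Word length when only the nfm non-null (value, set index) pairs are stored."""
    return nfm * (bits_for(truth_levels) + bits_for(n_sets))


def full_word_length(truth_levels: int, n_sets: int) -> int:
    """Word length when a membership value is stored for every fuzzy set."""
    return n_sets * bits_for(truth_levels)


def rule_weight(row: list[tuple[int, int]], set_index: int) -> int:
    """Software stand-in for the comparator: return the stored value whose
    set index matches the one requested by the rule, otherwise zero."""
    for index, value in row:
        if index == set_index:
            return value
    return 0


# Example from the abstract: 128 rows, 8 fuzzy sets, 32 truth levels, nfm = 3.
ROWS = 128
compact_bits = ROWS * compact_word_length(3, 32, 8)   # 128 * 24
full_bits = ROWS * full_word_length(32, 8)            # 128 * 40

# Hypothetical memory row holding two non-null memberships (set index, value).
sample_row = [(2, 17), (3, 9)]
```

For these parameters the compact layout needs 3,072 bits against 5,120 bits for the full vectorial layout, which matches the 128*24 and 128*40 figures given in the abstract.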

Analyses of Key Management Protocol for Wireless Sensor Networks in Wireless Sensor Networks (무선 센서 네트워크망에서의 효율적인 키 관리 프로토콜 분석)

Kim, Jung-Tae
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- /
- v.9 no.2
- /
- pp.799-802
- /
- 2005
In this paper, we analyze key management protocols for wireless sensor networks. Wireless sensor networks have a wide spectrum of civil and military applications that call for security, such as target surveillance in hostile environments. Typical sensors possess limited computation, energy, and memory resources; therefore, security mechanisms that consume large amounts of resources cannot be used. We propose a cryptographic key management protocol based on identity-based symmetric keying.

A Study on Effect of Code Distribution and Data Replication for Multicore Computing Architectures

Cho, Doosan
- International Journal of Advanced Culture Technology
- /
- v.9 no.4
- /
- pp.282-287
- /
- 2021
A multicore system must be able to take full advantage of the program's instruction and data parallelism. This study introduces the data replication technique as a support technique to maximize the program's instruction and data parallelism. Instruction level parallelism can be limited by data dependency. In this case, if data is replicated to each processor core and used, instruction level parallelism can be used to the maximum. The technique proposed in this study can maximize the performance improvement effect when applied to scientific applications such as matrix multiplication operation.
https://doi.org/10.17703/IJACT.2021.9.4.282

Speeding up the 3D Model Rendering on Android Device (안드로이드 디바이스에서의 3 차원 모델 렌더링 속도 향상)

Ng, Cong Jie;Kang, Dae-Ki
- Proceedings of the Korea Information Processing Society Conference
- /
- 2011.04a
- /
- pp.72-74
- /
- 2011
Rendering a complex 3D model on a smart mobile device with limited processing power and memory is challenging. Without optimization, a complex 3D model cannot be rendered smoothly, and special techniques are required to speed up the processing. In this paper, we discuss some approaches to alleviate the problem.
https://doi.org/10.3745/PKIPS.y2011m04a.72

A wear-leveling improving method by periodic exchanging of cold block areas and hot block areas (Cold 블록 영역과 hot 블록 영역의 주기적 교환을 통한 wear-leveling 향상 기법)

Jang, Si-Woong
- Proceedings of the Korean Institute of Information and Communication Sciences Conference
- /
- 2008.05a
- /
- pp.175-178
- /
- 2008
While read operations on flash memory are fast and unconstrained, flash memory cannot be overwritten in place: updated data are written to a new area. If data are frequently updated, garbage collection, which is performed by erasing blocks, must run to reclaim space. Because the number of erase operations is limited by the characteristics of flash memory, every block should be written and erased evenly. However, if data with access locality are processed by the cost-benefit algorithm with separation of hot and cold blocks, processing performance is high but wear is not leveled evenly. In this paper, we propose the CB-MG (Cost Benefit between Multi Group) algorithm, in which hot data are allocated to one group and cold data to another, and the roles of the hot and cold groups are exchanged every period. Experimental results show that CB-MG provides better performance and wear-leveling than CB-S.
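The abstract does not give the details of CB-MG, so the following is only a sketch of the two ingredients it names. `cost_benefit_score` uses the widely cited cost-benefit cleaning score age × (1 − u) / (2u), where u is the fraction of valid pages in a block; whether the paper uses exactly this variant is an assumption. `TwoGroupAllocator` illustrates exchanging the hot and cold roles of two block groups; counting the period in writes, and all class and field names, are likewise assumptions.

```python
from dataclasses import dataclass


@dataclass
class Block:
    valid_pages: int
    total_pages: int
    age: float  # time since the block was last modified


def cost_benefit_score(block: Block) -> float:
    """Higher score = better garbage-collection victim."""
    u = block.valid_pages / block.total_pages
    if u == 0:
        return float("inf")  # nothing to copy out, erasing is pure gain
    return block.age * (1 - u) / (2 * u)


def pick_victim(blocks: list[Block]) -> int:
    """Index of the block with the highest cost-benefit score."""
    return max(range(len(blocks)), key=lambda i: cost_benefit_score(blocks[i]))


class TwoGroupAllocator:
    """Routes hot and cold writes to separate groups and swaps the roles
    of the two groups after every `period` writes (assumed period unit)."""

    def __init__(self, group_a: str, group_b: str, period: int) -> None:
        self.hot, self.cold = group_a, group_b
        self.period = period
        self.writes = 0

    def group_for(self, is_hot: bool) -> str:
        target = self.hot if is_hot else self.cold
        self.writes += 1
        if self.writes % self.period == 0:
            # Exchange roles so that both groups see hot data over time.
            self.hot, self.cold = self.cold, self.hot
        return target
```

Swapping roles periodically means the blocks that absorbed frequent updates in one period hold rarely updated data in the next, which is the intuition behind evening out erase counts.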

Design of Fast Operation Method In NAND Flash Memory File System (NAND 플래시 메모리 파일 시스템에 빠른 연산을 위한 설계)

Jin, Jong-Won;Lee, Tae-Hoon;Chung, Ki-Dong
- Journal of KIISE:Computing Practices and Letters
- /
- v.14 no.1
- /
- pp.91-95
- /
- 2008
Flash memory is widely used in embedded systems because of benefits such as non-volatility, shock resistance, and low power consumption. However, NAND flash memory suffers from out-of-place updates, limited erase cycles, and page-based read/write operations. To address these problems, log-structured file systems such as YAFFS have been proposed. However, YAFFS sequentially scans an array of all block information to allocate a free block for a write operation. Before the write operation, YAFFS also reads the array of block information to find an invalid block to erase. These scans can reduce the performance of the file system. This paper proposes a fast operation method for NAND flash file systems that solves the above-mentioned problems. We implemented the proposed method in YAFFS and measured its performance against the original technique.
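The abstract does not describe the data structures of the proposed method. One plausible way to avoid the sequential scans it criticizes is to keep free and erasable block numbers in queues, as sketched below. The baseline `find_free_linear` stands in for the linear scan over the block-information array; `BlockLists`, the `State` enum and all method names are hypothetical and are not taken from YAFFS or from the paper.

```python
from collections import deque
from enum import Enum


class State(Enum):
    FREE = 0
    IN_USE = 1
    DIRTY = 2  # holds only invalid pages, ready to be erased


def find_free_linear(states):
    """Baseline: scan the whole block-information array, O(n) per allocation."""
    for i, state in enumerate(states):
        if state is State.FREE:
            return i
    return None


class BlockLists:
    """Keeps free and dirty block numbers in queues so lookups are O(1)."""

    def __init__(self, states):
        self.states = list(states)
        self.free = deque(i for i, s in enumerate(self.states) if s is State.FREE)
        self.dirty = deque(i for i, s in enumerate(self.states) if s is State.DIRTY)

    def allocate(self):
        """Take the next free block for writing, or None if none is left."""
        if not self.free:
            return None
        i = self.free.popleft()
        self.states[i] = State.IN_USE
        return i

    def mark_dirty(self, i):
        """Record that block i no longer holds any valid page."""
        self.states[i] = State.DIRTY
        self.dirty.append(i)

    def erase_one(self):
        """Erase the oldest dirty block and return it to the free queue."""
        if not self.dirty:
            return None
        i = self.dirty.popleft()
        self.states[i] = State.FREE
        self.free.append(i)
        return i
```

The trade-off is extra RAM for the two queues and the need to rebuild them at mount time, in exchange for constant-time allocation and erase-candidate lookup.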

Experimental investigation of Scalability of DDR DRAM packages

Crisp, R.
- Journal of the Microelectronics and Packaging Society
- /
- v.17 no.4
- /
- pp.73-76
- /
- 2010
A two-facet approach was used to investigate the parametric performance of functional high-speed DDR3 (Double Data Rate) DRAM (Dynamic Random Access Memory) die placed in different types of BGA (Ball Grid Array) packages: wire-bonded BGA (FBGA, Fine Ball Grid Array), flip-chip (FCBGA) and lead-bonded microBGA®. In the first section, packaged live DDR3 die were tested using automatic test equipment using high-resolution shmoo plots. It was found that the best timing and voltage margin was obtained using the lead-bonded microBGA, followed by the wire-bonded FBGA with the FCBGA exhibiting the worst performance of the three types tested. In particular the flip-chip packaged devices exhibited reduced operating voltage margin. In the second part of this work a test system was designed and constructed to mimic the electrical environment of the data bus in a PC's CPU-Memory subsystem that used a single DIMM (Dual In Line Memory Module) socket in point-to-point and point-to-two-point configurations. The emulation system was used to examine signal integrity for system-level operation at speeds in excess of 6 Gb/pin/sec in order to assess the frequency extensibility of the signal-carrying path of the microBGA considered for future high-speed DRAM packaging. The analyzed signal path was driven from either end of the data bus by a GaAs laser driver capable of operation beyond 10 GHz. Eye diagrams were measured using a high speed sampling oscilloscope with a pulse generator providing a pseudo-random bit sequence stimulus for the laser drivers. The memory controller was emulated using a circuit implemented on a BGA interposer employing the laser driver while the active DRAM was modeled using the same type of laser driver mounted to the DIMM module. A custom silicon loading die was designed and fabricated and placed into the microBGA packages that were attached to an instrumented DIMM module.
It was found that 6.6 Gb/sec/pin operation appears feasible in both point-to-point and point-to-two-point configurations when the input capacitance is limited to 2 pF.

Boosting up the photoconductivity and relaxation time using a double layered indium-zinc-oxide/indium-gallium-zinc-oxide active layer for optical memory devices

Lee, Minkyung;Jaisutti, Rawat;Kim, Yong-Hoon
- Proceedings of the Korean Vacuum Society Conference
- /
- 2016.02a
- /
- pp.278-278
- /
- 2016
Solution-processed metal-oxide semiconductors have been considered as next-generation semiconducting materials for transparent and flexible electronics due to their high electrical performance. Moreover, since oxide semiconductors show high sensitivity to light illumination and possess persistent photoconductivity (PPC), these properties can be utilized in realizing optical memory devices, which can transport information much faster than electrons. In previous works, metal-oxide semiconductors were utilized as memory devices by using light (i.e. illumination performs the "writing" and no-gate-bias recovery the "reading" operation) [1]. The key issue for realizing optical memory devices is to have high photoconductivity and a long lifetime of free electrons in the oxide semiconductor. However, mono-layered indium-zinc-oxide (IZO) and mono-layered indium-gallium-zinc-oxide (IGZO) have limited photoconductivity and relaxation time: 570 nA and 122 sec for IZO, and 190 nA and 53 sec for IGZO, respectively. Here, we boosted the photoconductivity and relaxation time using a double-layered IZO/IGZO active layer structure. Solution-processed IZO (top) and IGZO (bottom) layers were prepared on a Si/SiO2 wafer using the conventional thermal annealing method. To investigate the photoconductivity and relaxation time, we exposed the devices to light with an intensity of 9 mW/cm2 for 30 sec and evaluated the decay behavior. It was found that the double-layered IZO/IGZO showed a high photoconductivity of 28 μA and a relaxation time of 1048 sec.

RFFS : Design of a Reliable NAND Flash File System for Embedded system (임베디드 시스템을 위한 신뢰성 있는 NAND 플래시 파일 시스템의 설계)

Lee Tae-hoon;Park Song-hwa;Kim Tae-hoon;Lee Sang-gi;Lee Joo-Kyong;Chung Ki-Dong
- The KIPS Transactions:PartA
- /
- v.12A no.7 s.97
- /
- pp.571-582
- /
- 2005
NAND flash memory has the advantages of non-volatility, low power consumption and fast access time. However, it does not support update-in-place, and the number of erase cycles is limited. Moreover, the unit of read and write operations is a page. A NAND flash file system called YAFFS has been proposed, but YAFFS has several problems to be addressed. In this paper, the Reliable Flash File System (RFFS) for NAND flash memory is designed and evaluated. In designing a file system, the following issues must be considered in particular: (i) to minimize repair time when a system fault occurs, (ii) to balance the number of block erase operations by offering a wear-leveling policy, and (iii) to reduce the turnaround time of memory operations by reducing the amount of data written. We demonstrate and evaluate the performance of the proposed schemes.
https://doi.org/10.3745/KIPSTA.2005.12A.7.571
