• Title/Summary/Keyword: memory optimization


An Implementation of Cutting-Ironbar Manufacturing Software using Dynamic Programming (동적계획법을 이용한 철근가공용 소프트웨어의 구현)

  • Kim, Seong-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.14 no.4
    • /
    • pp.1-8
    • /
    • 2009
  • In this paper, we present an implementation of software that produces sub-optimal solutions to the cutting-ironbar planning problem using dynamic programming. In general, an optimization algorithm must be designed to meet the practical requirements of cutting-ironbar manufacturing. However, this problem is a multiple-sized one-dimensional cutting stock problem, and Linear Programming approaches to the optimal solution are difficult to apply because of explosive computation and memory limitations. To overcome this, we reformulate the problem for dynamic programming and propose a cutting-ironbar planning algorithm that searches for a sub-optimal solution within a fixed-size space of combined columns using heuristics. We then design graphical user interfaces and screen displays for convenient operation in the industrial workplace and implement the software using the open-source GUI toolkit GTK+.
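To make the cutting-stock setting concrete, the sketch below shows a common greedy heuristic built on a length-wise dynamic program. It is an editorial illustration with made-up lengths and demands, not the paper's algorithm, which additionally restricts the search to a fixed number of combined columns.

```python
# Illustrative sketch of a one-dimensional cutting stock heuristic:
# repeatedly pick the least-waste cutting pattern for one stock bar via DP.

def best_pattern(stock_len, lengths):
    """Exact-length DP: find the reachable length <= stock_len with the least trim loss."""
    reach = [False] * (stock_len + 1)
    parent = [-1] * (stock_len + 1)
    reach[0] = True
    for l in range(1, stock_len + 1):
        for i, w in enumerate(lengths):
            if w <= l and reach[l - w]:
                reach[l], parent[l] = True, i
                break
    best = max(l for l in range(stock_len + 1) if reach[l])
    counts, l = [0] * len(lengths), best
    while l > 0:
        i = parent[l]
        counts[i] += 1
        l -= lengths[i]
    return counts, stock_len - best          # pieces per type, trim loss

def plan_cuts(stock_len, lengths, demand):
    """Greedy sequential heuristic: cut one stock bar at a time with the least-waste pattern."""
    demand, bars = list(demand), []
    while any(d > 0 for d in demand):
        active = [w for w, d in zip(lengths, demand) if d > 0]
        counts, _ = best_pattern(stock_len, active)
        pattern, k, progressed = [], 0, False
        for i, (w, d) in enumerate(zip(lengths, demand)):
            if d > 0:
                take = min(counts[k], d)
                demand[i] -= take
                progressed = progressed or take > 0
                pattern.append((w, take))
                k += 1
        if not progressed:                    # no remaining piece fits into a stock bar
            break
        bars.append(pattern)
    return bars

if __name__ == "__main__":
    # Hypothetical data: 12 m stock bars (1200 cm) and three demanded piece lengths.
    plan = plan_cuts(stock_len=1200, lengths=[450, 360, 210], demand=[7, 5, 4])
    for n, bar in enumerate(plan, 1):
        print(f"bar {n}: " + ", ".join(f"{t} x {w} cm" for w, t in bar if t))
```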

An Optimization Technique in Memory System Performance for RealTime Embedded Systems (실시간 임베디드 시스템을 위한 메모리 시스템 성능 최적화 기법)

  • Yongin Kwon;Doosan Cho;Jongwon Lee;Yongjoo Kim;Jonghee Youn;Sanghyun Park;Yunheung Paek
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2008.11a
    • /
    • pp.882-884
    • /
    • 2008
  • When data tens to hundreds of times larger than the hardware cache are accessed randomly, cache memory performance degrades sharply due to low memory access locality. For example, in the in-vehicle General Positioning System (GPS) programs commonly used today, the module that receives data from up to 32 satellites and computes the receiver's position is one of the core modules and accounts for more than 50% of total execution time. This module stores satellite signals in a buffer memory in real time, and because the required data are not stored sequentially, they are read back in random order. As a result, the low locality makes it difficult to complete the data processing within real-time constraints. Improving the low memory access locality inherent in the algorithms of such communication applications requires an algorithm-level approach, which is costly; in this study, we instead improve locality by transforming the data structures used. As a result, we achieved a twofold speedup in the core module and a 14% improvement in overall system performance.
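The data-layout idea in this abstract can be illustrated with a short, self-contained sketch: rather than changing the algorithm, the randomly accessed buffer is gathered once into the order it will be consumed, so the hot loop runs sequentially. Names and data are hypothetical, not the paper's code.

```python
# A minimal sketch of improving locality by rearranging data, not the algorithm.

import numpy as np

def process_scattered(buffer, access_order):
    # Poor locality: the loop follows 'access_order' and jumps around 'buffer'.
    acc = 0.0
    for idx in access_order:
        acc += buffer[idx] * 0.5          # stand-in for the real per-sample work
    return acc

def process_packed(buffer, access_order):
    # Better locality: one gather up front, then a sequential (vectorizable) pass.
    packed = buffer[access_order]         # contiguous copy in the order it will be used
    return float(np.sum(packed * 0.5))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buffer = rng.standard_normal(200_000)
    order = rng.permutation(buffer.size)
    assert np.isclose(process_scattered(buffer, order), process_packed(buffer, order))
```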

MAGICal Synthesis: Memory-Efficient Approach for Generative Semiconductor Package Image Construction (MAGICal Synthesis: 반도체 패키지 이미지 생성을 위한 메모리 효율적 접근법)

  • Yunbin Chang;Wonyong Choi;Keejun Han
    • Journal of the Microelectronics and Packaging Society
    • /
    • v.30 no.4
    • /
    • pp.69-78
    • /
    • 2023
  • With the rapid growth of artificial intelligence, the demand for semiconductors is increasing enormously. To ensure manufacturing quality and quantity simultaneously, the importance of automatic defect detection during the packaging process has been revisited, and various deep learning-based methodologies have been adapted for automatic packaging defect inspection. Deep learning (DL) models require a large amount of data for training, but because security is critical in the semiconductor industry, sharing and labeling relevant data is challenging, which makes model training difficult. In this study, we propose a new framework for securing sufficient data for DL models with fewer computing resources through a divide-and-conquer approach. The proposed method divides high-resolution images into pre-defined sub-regions, assigns conditional labels to each region, and then trains the individual sub-regions and their boundaries with a boundary loss that induces globally coherent and seamless images. Afterwards, the full-size image is reconstructed by combining the divided sub-regions. The experimental results show that the images obtained through this research have high efficiency, consistency, quality, and generality.
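A minimal sketch of the divide-and-reconstruct bookkeeping described above: a high-resolution image is split into conditionally labeled sub-regions and later stitched back into the full-size image. The generative model and boundary loss are not reproduced here, and the tile sizes are illustrative assumptions.

```python
# Tiling and reassembly only; the per-region generator and boundary loss are omitted.

import numpy as np

def split_into_tiles(image, tile_h, tile_w):
    """Return a list of (label, tile) pairs; label = (row, col) grid position."""
    H, W = image.shape[:2]
    assert H % tile_h == 0 and W % tile_w == 0, "image must divide evenly into tiles"
    tiles = []
    for r in range(H // tile_h):
        for c in range(W // tile_w):
            tile = image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
            tiles.append(((r, c), tile))
    return tiles

def reconstruct(tiles, tile_h, tile_w):
    """Stitch labeled tiles back into the full-size image."""
    rows = max(r for (r, _), _ in tiles) + 1
    cols = max(c for (_, c), _ in tiles) + 1
    sample = tiles[0][1]
    out = np.zeros((rows * tile_h, cols * tile_w) + sample.shape[2:], dtype=sample.dtype)
    for (r, c), tile in tiles:
        out[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w] = tile
    return out

if __name__ == "__main__":
    img = np.random.randint(0, 256, size=(512, 512), dtype=np.uint8)   # stand-in image
    tiles = split_into_tiles(img, 128, 128)       # 16 conditionally labeled sub-regions
    assert np.array_equal(reconstruct(tiles, 128, 128), img)
```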

Life prediction of IGBT module for nuclear power plant rod position indicating and rod control system based on SDAE-LSTM

  • Zhi Chen;Miaoxin Dai;Jie Liu;Wei Jiang;Yuan Min
    • Nuclear Engineering and Technology
    • /
    • v.56 no.9
    • /
    • pp.3740-3749
    • /
    • 2024
  • To reduce the losses caused by aging failure of the insulated gate bipolar transistor (IGBT), a core component of the nuclear power plant rod position indicating and rod control (RPC) system, it is necessary to study its life prediction. The selection of IGBT failure characteristic parameters in existing research relies heavily on failure principles and expert experience. Moreover, the analysis and learning of time-domain degradation data have not been fully exploited, so the extracted degradation features show poor monotonicity, weak time correlation, and poor anti-interference ability, resulting in low prediction accuracy. This paper exploits the adaptive feature extraction and denoising capabilities of the stacked denoising autoencoder (SDAE) network to perform adaptive feature extraction on IGBT time-domain degradation data; establishes a long short-term memory (LSTM) prediction model and optimizes the learning rate, the number of nodes in the hidden layer, and the number of hidden layers using the Gray Wolf Optimization (GWO) algorithm; conducts verification experiments on the IGBT accelerated aging dataset provided by the NASA PCoE Research Center; and selects performance evaluation indicators to compare and analyze the prediction results of the SDAE-LSTM, PSO-LSTM, and BP models. The results show that the SDAE-LSTM model achieves more accurate and stable IGBT life prediction.
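A hedged sketch of an SDAE-to-LSTM pipeline of the kind described above, written with Keras. The layer sizes, window length, and synthetic data are illustrative assumptions, not the paper's settings, and the GWO hyperparameter search is omitted.

```python
# Stage 1: a denoising autoencoder extracts compact features from raw degradation data.
# Stage 2: an LSTM predicts a remaining-useful-life style target from feature windows.

import numpy as np
from tensorflow.keras import layers, models

WINDOW, N_RAW, N_FEAT = 20, 8, 3          # time steps per sample, raw signals, encoded features

raw = np.random.rand(2000, N_RAW).astype("float32")          # stand-in degradation data
dae = models.Sequential([
    layers.Input(shape=(N_RAW,)),
    layers.GaussianNoise(0.1),                                # corrupt input during training (denoising)
    layers.Dense(16, activation="relu"),
    layers.Dense(N_FEAT, activation="relu", name="code"),     # bottleneck = extracted features
    layers.Dense(16, activation="relu"),
    layers.Dense(N_RAW),                                      # reconstruct the clean input
])
dae.compile(optimizer="adam", loss="mse")
dae.fit(raw, raw, epochs=5, batch_size=64, verbose=0)

encoder = models.Model(dae.input, dae.get_layer("code").output)
features = encoder.predict(raw, verbose=0)

rul = np.linspace(1.0, 0.0, len(features)).astype("float32")  # toy remaining-useful-life target
X = np.stack([features[i:i + WINDOW] for i in range(len(features) - WINDOW)])
y = rul[WINDOW:]

lstm = models.Sequential([
    layers.Input(shape=(WINDOW, N_FEAT)),
    layers.LSTM(32),
    layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X, y, epochs=5, batch_size=64, verbose=0)
print("predicted RUL for the last window:", float(lstm.predict(X[-1:], verbose=0)[0, 0]))
```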

Deep Learning Architectures and Applications (딥러닝의 모형과 응용사례)

  • Ahn, SungMahn
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.127-142
    • /
    • 2016
  • A deep learning model is a kind of neural network that allows multiple hidden layers. There are various deep learning architectures, such as convolutional neural networks, deep belief networks, and recurrent neural networks. They have been applied to fields like computer vision, automatic speech recognition, natural language processing, audio recognition, and bioinformatics, where they have been shown to produce state-of-the-art results on various tasks. Among these architectures, convolutional neural networks and recurrent neural networks are classified as supervised learning models. In recent years, these supervised learning models have gained more popularity than unsupervised learning models such as deep belief networks, because they have shown successful applications in the fields mentioned above. Deep learning models can be trained with the backpropagation algorithm. Backpropagation is an abbreviation for "backward propagation of errors" and is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent. The method calculates the gradient of an error function with respect to all the weights in the network. The gradient is fed to the optimization method, which in turn uses it to update the weights in an attempt to minimize the error function. Convolutional neural networks use a special architecture which is particularly well adapted to classifying images. Using this architecture makes convolutional networks fast to train; this, in turn, helps us train deep, multi-layer networks, which are very good at classifying images. These days, deep convolutional networks are used in most neural networks for image recognition. Convolutional neural networks use three basic ideas: local receptive fields, shared weights, and pooling. By local receptive fields, we mean that each neuron in the first (or any) hidden layer is connected to a small region of the input (or previous layer's) neurons. Shared weights mean that the same weights and bias are used for each of the local receptive fields, so all the neurons in a hidden layer detect exactly the same feature, just at different locations in the input image. In addition to the convolutional layers just described, convolutional neural networks also contain pooling layers, which are usually used immediately after convolutional layers. What the pooling layers do is simplify the information in the output from the convolutional layer. Recent convolutional network architectures have 10 to 20 hidden layers and billions of connections between units. Training deep networks took weeks several years ago, but thanks to progress in GPUs and algorithmic enhancements, training time has been reduced to several hours. Neural networks with time-varying behavior are known as recurrent neural networks, or RNNs. A recurrent neural network is a class of artificial neural network in which connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. Early RNN models turned out to be very difficult to train, harder even than deep feedforward networks. The reason is the unstable gradient problem, such as vanishing and exploding gradients. The gradient can get smaller and smaller as it is propagated back through the layers, which makes learning in the early layers extremely slow. The problem actually gets worse in RNNs, since gradients are not only propagated backward through layers but also backward through time. If the network runs for a long time, the gradient can become extremely unstable and hard to learn from. It has become possible to incorporate an idea known as long short-term memory units (LSTMs) into RNNs. LSTMs make it much easier to get good results when training RNNs, and many recent papers make use of LSTMs or related ideas.
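The three ideas named in the abstract, local receptive fields, shared weights, and pooling, can be seen directly in a minimal Keras convolutional network. The layer sizes below are assumptions for illustration, not taken from the article.

```python
# A tiny convolutional classifier illustrating local receptive fields,
# shared weights, and pooling.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),                 # e.g. a 28x28 grayscale image
    # Conv2D: each output neuron sees only a 5x5 local receptive field, and the
    # same 5x5 kernel (shared weights + bias) is slid across the whole image.
    layers.Conv2D(filters=16, kernel_size=5, activation="relu"),
    # Pooling: summarize each 2x2 block of the convolutional output.
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),          # class scores
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```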

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between the objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can only generate vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that comprises Korean texts. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm as well as more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras based on Theano. After pre-processing the texts, the dataset included 74 unique characters including vowels, consonants, and punctuation marks. We then constructed an input vector of 20 consecutive characters with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all the optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which are clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm took the longest to train for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, but the validation loss and perplexity were not significantly improved and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were grammatically almost perfect. The results of this study are expected to be widely used for the processing of Korean in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
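A compact sketch of the data preparation and model structure the abstract describes: windows of 20 units predict the 21st, fed to a stacked-LSTM Keras model. The toy corpus, vocabulary, and layer sizes are illustrative stand-ins, not the paper's Korean phoneme data or its 3- and 4-layer configurations.

```python
# Character-level stand-in for the phoneme-level LSTM language model setup.

import numpy as np
from tensorflow.keras import layers, models

corpus = "the quick brown fox jumps over the lazy dog " * 200     # stand-in text
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}
WINDOW, V = 20, len(chars)

# Build (input window, next character) pairs.
X = np.array([[idx[c] for c in corpus[i:i + WINDOW]]
              for i in range(len(corpus) - WINDOW)], dtype="int32")
y = np.array([idx[corpus[i + WINDOW]] for i in range(len(corpus) - WINDOW)])

model = models.Sequential([
    layers.Input(shape=(WINDOW,), dtype="int32"),
    layers.Embedding(input_dim=V, output_dim=16),     # one-hot input could be used instead
    layers.LSTM(64, return_sequences=True),           # two stacked LSTM layers here;
    layers.LSTM(64),                                  # the paper uses 3 or 4
    layers.Dense(V, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=2, batch_size=128, verbose=0)

# Greedy generation from a seed window.
seed = list(corpus[:WINDOW])
for _ in range(40):
    window = np.array([[idx[c] for c in seed[-WINDOW:]]], dtype="int32")
    probs = model.predict(window, verbose=0)[0]
    seed.append(chars[int(np.argmax(probs))])
print("".join(seed))
```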

MPI-OpenMP Hybrid Parallelization for Multibody Peridynamic Simulations (다물체 페리다이나믹 해석을 위한 MPI-OpenMP 혼합 병렬화)

  • Lee, Seungwoo;Ha, Youn Doh
    • Journal of the Computational Structural Engineering Institute of Korea
    • /
    • v.33 no.3
    • /
    • pp.171-178
    • /
    • 2020
  • In this study, we develop an MPI-OpenMP hybrid parallelization for multibody peridynamic simulations. Peridynamics is suitable for analyzing complicated dynamic fractures and various discontinuities. However, compared with a conventional finite element method, the nonlocal interactions in peridynamics cost more time and memory. In multibody peridynamic analysis, the costs increase further due to the additional interactions that arise when computing the nonlocal contact and ghost interlayer models between adjacent bodies, and they become excessive when further refinement and smaller time steps are required, as in high-velocity impact fracturing. Thus, high computational efficiency and performance can be achieved by parallelizing and optimizing multibody peridynamic simulations. The analysis code is developed with the Intel Fortran MPI compiler and OpenMP on Nurion at the KISTI HPC center and is parallelized through MPI-OpenMP hybrid parallelization: communication operations are minimized by a model-based decomposition across MPI processes, and further parallelization is obtained with OpenMP threads within each MPI process. The numerical results for the impact fracturing of multiple bodies show that computing performance improves significantly with the MPI-OpenMP hybrid parallelization.
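As a rough Python analogue of the hybrid scheme described above (the paper's code is Fortran with MPI and OpenMP), the sketch below assigns one body per MPI process via mpi4py and splits the per-particle work across a thread pool inside each process, standing in for the OpenMP level. All names and data are illustrative.

```python
# Run with e.g. `mpiexec -n 4 python hybrid_sketch.py` (requires mpi4py and numpy).

from concurrent.futures import ThreadPoolExecutor
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

# Model-based decomposition: rank i owns body i (toy particle data).
rng = np.random.default_rng(rank)
positions = rng.random((100_000, 3))

def chunk_energy(block):
    # Stand-in for the per-particle peridynamic force/energy kernel.
    return float(np.sum(block ** 2))

# Thread-level parallelism inside the MPI process (the "OpenMP" level).
blocks = np.array_split(positions, 8)
with ThreadPoolExecutor(max_workers=8) as pool:
    local_energy = sum(pool.map(chunk_energy, blocks))

# Process-level reduction across bodies (the "MPI" level).
total_energy = comm.allreduce(local_energy, op=MPI.SUM)
if rank == 0:
    print(f"{nprocs} processes, total energy = {total_energy:.3f}")
```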

A Hardwired Location-Aware Engine based on Weighted Maximum Likelihood Estimation for IoT Network (IoT Network에서 위치 인식을 위한 가중치 방식의 최대우도방법을 이용한 하드웨어 위치인식엔진 개발 연구)

  • Kim, Dong-Sun;Park, Hyun-moon;Hwang, Tae-ho;Won, Tae-ho
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.53 no.11
    • /
    • pp.32-40
    • /
    • 2016
  • IEEE 802.15.4 is one of the protocols for radio communication in a personal area network. Because IoT communication demands low cost and low power, its implementation requires a high level of optimization. Recently, studies of location-aware algorithms based on the IEEE 802.15.4 standard have been conducted. Location estimation is basically performed by giving equal consideration to reference node information and blind node information. However, this approach does not account for the estimation error, even though the estimated coordinates of the blind node contain one. In this paper, we enhance conventional maximum likelihood estimation with weighting coefficients and implement a hardwired location-aware engine for small code size and low power consumption. In field tests on test-beds, the proposed hardware-based location awareness method improves accuracy by 10 percent and reduces both computation and memory accesses by 30 percent, which improves the system's power consumption.
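The weighted estimation idea can be illustrated with a small numerical sketch: the range equations from the reference (anchor) nodes are linearized and solved by weighted least squares, with larger weights for measurements assumed to be more reliable. This is an editorial illustration, not the paper's hardwired engine or its exact estimator.

```python
# Weighted least-squares position estimate of a blind node from ranged anchors.

import numpy as np

def locate(anchors, dists, weights):
    """Estimate (x, y) of a blind node from anchor positions and ranged distances."""
    anchors, dists, weights = map(np.asarray, (anchors, dists, weights))
    x0, y0, d0 = anchors[0, 0], anchors[0, 1], dists[0]
    # Subtract the first range equation to linearize: 2(ai - a0) . p = b_i
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (d0**2 - dists[1:]**2
         + anchors[1:, 0]**2 - x0**2 + anchors[1:, 1]**2 - y0**2)
    W = np.diag(weights[1:])                 # per-measurement confidence
    # Weighted least squares: (A^T W A) p = A^T W b
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    true_pos = np.array([3.0, 4.0])
    rng = np.random.default_rng(1)
    dists = [np.linalg.norm(true_pos - np.array(a)) + rng.normal(0, 0.1) for a in anchors]
    weights = [1.0 / d for d in dists]       # e.g. trust nearer anchors more
    print("estimated position:", locate(anchors, dists, weights))
```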

ZnO nanostructures for e-paper and field emission display applications

  • Sun, X.W.
    • Proceedings of the Korean Information Display Society Conference
    • /
    • 2008.10a
    • /
    • pp.993-994
    • /
    • 2008
  • Electrochromic (EC) devices are capable of reversibly changing their optical properties upon charge injection and extraction induced by an external voltage. The characteristics of EC devices, such as low power consumption, high coloration efficiency, and memory effects under open-circuit conditions, make them suitable for a variety of applications including smart windows and electronic paper. Coloration due to reduction or oxidation of redox chromophores can be used for EC devices (e-paper), but the switching time is slow (on the order of seconds). Recently, with increasing demand for low-cost, lightweight flat panel displays with paper-like readability (electronic paper), an EC display technology based on a dye-modified TiO₂ nanoparticle electrode was developed. A well-known organic dye molecule, viologen, was adsorbed on the surface of a mesoporous TiO₂ nanoparticle film to form the EC electrode. On the other hand, ZnO is a wide-bandgap II-VI semiconductor which has been applied in many fields such as UV lasers, field-effect transistors, and transparent conductors. The bandgap of bulk ZnO is about 3.37 eV, which is close to that of TiO₂ (3.4 eV). As a traditional transparent conductor, ZnO has excellent electron transport properties, even in ZnO nanoparticle films. In the past few years, one-dimensional (1D) nanostructures of ZnO have attracted extensive research interest. In particular, 1D ZnO nanowires provide much better electron transport capability by offering a direct conduction path for electrons and greatly reducing the number of grain boundaries. These unique advantages make ZnO nanowires a promising matrix electrode for EC dye molecule loading. ZnO nanowires grow vertically from the substrate and form a dense array (Fig. 1). The ZnO nanowires show a regular hexagonal cross section and an average diameter of about 100 nm. The cross-sectional image of the ZnO nanowire array (Fig. 1) indicates that the length of the nanowires is about 6 μm. From one on/off cycle of the ZnO EC cell (Fig. 2), we can see that the switching times of a ZnO nanowire electrode EC cell with an active area of 1 × 1 cm² are 170 ms and 142 ms for coloration and bleaching, respectively. The coloration and bleaching are faster than in mesoporous TiO₂ EC devices, where both coloration and bleaching take about 250 ms for a device with an active area of 2.5 cm². With further optimization, it is possible that the response time could reach tens of milliseconds, i.e., fast enough to display video. Fig. 3 shows a prototype with two different transmittance states; good contrast was obtained, and the retention was at least a few hours for these prototypes. Being an oxide, ZnO is oxidation resistant, i.e., it is more durable as a field emission cathode. ZnO nanotetrapods were also applied to realize the first prototype triode field emission device, making use of scattered surface-conduction electrons for field emission (Fig. 4). The device has a high efficiency (ratio of field-emitted electrons to total electrons) of about 60%. With this high efficiency, we were able to fabricate some prototype displays (Fig. 5 shows some alphanumeric symbols). ZnO tetrapods have four legs, which guarantees that one leg always points upward, even when the cathode is fabricated by screen printing.


Pre-Packing, Early Fixation, and Multi-Layer Density Analysis in Analytic Placement for FPGAs (FPGA를 위한 분석적 배치에서 사전 패킹, 조기 배치 고정 및 밀도 분석 다층화)

  • Kim, Kyosun
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.10
    • /
    • pp.96-106
    • /
    • 2014
  • Previous academic research on FPGA tools has relied on simplified, imaginary models of the target architecture. As a first step toward overcoming this restriction, the issues of analytic placement and legalization as applied to commercial FPGAs are brought up, and several techniques to remedy them are presented and evaluated. First, the center of gravity of the placed cells may drift far from the center of the chip during analytic placement; a term is proposed to be added to the objective function to minimize this displacement. Then, the density map is expanded into multiple layers to accurately calculate the density distribution for each of the cell types. Early fixation is also proposed for the memory blocks, which can be placed only at a limited number of sites. Since two flip-flops share control pins in a slice, a compatibility constraint is introduced during legalization; pre-packing compatible flip-flops is proposed as a proactive step. The proposed techniques are implemented on the K-FPGA fabric evaluation framework, in which commercial architectures can be precisely modeled and modified for enhancement, and are validated on twelve industrial-strength examples. The placement results show that the proposed techniques reduce wire length by 22% and slice usage by 5% on average. This research is expected to serve as a development basis for optimization CAD tools for state-of-the-art as well as new FPGA architectures.
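Since the abstract only names the center-of-gravity term without giving its form, the sketch below assumes a simple quadratic penalty that pulls the mean cell position toward the chip center and is added to the analytic placement objective. Function names and weights are illustrative, not the paper's formulation.

```python
# Hypothetical center-of-gravity penalty for an analytic placer's objective.

import numpy as np

def cog_penalty(xy, chip_center, weight=1.0):
    """Penalty and gradient for displacement of the placement's center of gravity.

    xy          : (N, 2) array of cell coordinates
    chip_center : (2,) array, center of the die
    """
    cog = xy.mean(axis=0)                       # center of gravity of all placed cells
    disp = cog - chip_center
    penalty = weight * float(disp @ disp)       # ||cog - center||^2
    grad = weight * 2.0 * disp / len(xy)        # identical gradient contribution per cell
    return penalty, np.tile(grad, (len(xy), 1))

def objective(xy, chip_center, wirelength_fn, density_fn, alpha=0.1):
    """Augmented analytic-placement objective: wirelength + density + CoG term."""
    p, _ = cog_penalty(xy, chip_center, weight=alpha)
    return wirelength_fn(xy) + density_fn(xy) + p
```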