• Title/Summary/Keyword: and Parallel Processing

Effective Graph-Based Heuristics for Contingent Planning (조건부 계획수립을 위한 효과적인 그래프 기반의 휴리스틱)

  • Kim, Hyun-Sik;Kim, In-Cheol;Park, Young-Tack
    • The KIPS Transactions:PartB / v.18B no.1 / pp.29-38 / 2011
  • Deriving domain-independent heuristics from the specification of a planning problem requires relaxing the given problem and then solving the relaxed one. In this paper, we present a new planning graph, the Merged Planning Graph (MPG), and GD heuristics for solving contingent planning problems with both uncertainty about the initial state and non-deterministic action effects. The merged planning graph extends the relaxed planning graph, a common means of obtaining effective heuristics for classical planning problems, so that it can be applied to contingent planning problems. To obtain heuristics for contingent planning problems with sensing actions and non-deterministic actions, the new graph uses effect-merge relaxations of these actions in addition to the traditional delete relaxations. The GD heuristic is computed in parallel with the forward expansion of the merged planning graph; by analyzing interdependencies among goals or subgoals, it excludes unnecessary redundant cost from the estimate of the minimal reachability cost of achieving the overall set of goals. GD heuristics therefore usually require less computation time than the overlap heuristics while being more informative than the max and additive heuristics. We present an experimental analysis of the accuracy and search efficiency of the GD heuristics.

Planning Evacuation Routes with Load Balancing in Indoor Building Environments (실내 빌딩 환경에서 부하 균등을 고려한 대피경로 산출)

  • Jang, Minsoo;Lim, Kyungshik
    • KIPS Transactions on Computer and Communication Systems / v.5 no.7 / pp.159-172 / 2016
  • This paper presents a novel algorithm for finding evacuation paths in indoor disaster environments. The proposed method significantly improves the time complexity of finding paths to the evacuation exit by introducing a Disaster Evacuation Graph (DEG) for a building that is lightweight in terms of graph size. Using the DEG, the method also considers load balancing and the bottleneck capacity of the paths to the evacuation exit simultaneously. The algorithm runs in two phases: horizontal tiering (HT) and vertical tiering (VT). The HT phase finds a possible optimal path from anywhere on a given floor to the evacuation stairs of that floor. After the HT phases of all floors have finished in parallel, the VT phase integrates their results to determine an evacuation path from anywhere on a floor to the safety zone of the building, which could be the entrance or the roof. It should be noted that the path produced by the algorithm. The tiering scheme is also used to define the range of the graph to process. To test the performance of the method, its computing times and evacuation times are compared with those of existing path-search algorithms. The results show that the proposed method outperforms the existing algorithms in both computing time and evacuation time. It is useful for quickly finding evacuation routes for evacuees in large-scale buildings.

DNN based Robust Speech Feature Extraction and Signal Noise Removal Method Using Improved Average Prediction LMS Filter for Speech Recognition (음성 인식을 위한 개선된 평균 예측 LMS 필터를 이용한 DNN 기반의 강인한 음성 특징 추출 및 신호 잡음 제거 기법)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology / v.11 no.6 / pp.1-6 / 2021
  • In speech recognition, the adoption of DNNs has increased the use of speech recognition, but parallel training requires more computation than the conventional GMM, and overfitting occurs when the amount of data is small. To solve this problem, we propose an efficient method for robust speech feature extraction and speech signal noise removal that works even when the amount of data is small. Speech feature extraction efficiently extracts speech energy by applying the difference in frame energy for speech together with the zero-crossing ratio and level-crossing ratio, which are affected by the speech signal. Noise is removed from the speech signal with an improved average-prediction LMS filter, which loses little speech information and maintains the intrinsic characteristics of speech while the speech signal is being detected. The improved LMS filter processes noise in the input speech signal by adjusting the active parameter threshold for the input signal. Comparing the proposed method with the conventional frame energy method confirmed that the error rate at the start point of speech is 7% and that the error rate at the end point is improved by 11%.
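
For readers unfamiliar with LMS filtering, the following is a minimal sketch of the textbook LMS update only. It is not the improved average-prediction variant proposed in the paper, whose details the abstract does not give. The function name `lms_filter`, the step size `mu`, the tap count, and the synthetic test signal are all assumptions made for illustration.

```python
import random


def lms_filter(x, d, n_taps=2, mu=0.05):
    """Textbook LMS: adapt FIR weights w so that the filter output tracks d."""
    w = [0.0] * n_taps
    errors = []
    for n in range(len(x)):
        # Most recent n_taps input samples, newest first, zero-padded at the start.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))   # filter output
        e = d[n] - y                                     # instantaneous error
        # Gradient-descent step on the squared error.
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
        errors.append(e)
    return w, errors


# Hypothetical test signal: the desired signal is a known 2-tap FIR response
# to white input, so the adapted weights should approach [0.5, 0.25].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] + 0.25 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w, errors = lms_filter(x, d)
```

In a noise-removal setting, `x` would be a noise reference and the error sequence would carry the cleaned signal; how the paper adjusts its threshold parameter on top of this update is not specified in the abstract.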

Real-Virtual Fusion Hologram Generation System using RGB-Depth Camera (RGB-Depth 카메라를 이용한 현실-가상 융합 홀로그램 생성 시스템)

  • Song, Joongseok;Park, Jungsik;Park, Hanhoon;Park, Jong-Il
    • Journal of Broadcast Engineering / v.19 no.6 / pp.866-876 / 2014
  • Generating a digital hologram of video content that includes computer graphics (CG) requires natural fusion of real and virtual 3D information. In this paper, we propose a system that fuses real and virtual 3D information naturally and quickly generates the digital hologram of the fused result using a computer-generated hologram (CGH) computing part based on multiple GPUs. The system calculates the camera projection matrix of an RGB-Depth camera and estimates the 3D information of the virtual object. The 3D information of the virtual object, obtained from the projection matrix, and that of the real space are transmitted to a Z buffer, which fuses the 3D information naturally. The fused result in the Z buffer is then transmitted to the multi-GPU CGH computing part, where the digital hologram of the fused result is calculated quickly. In experiments, the 3D information of the virtual object produced by the proposed system has a mean relative error (MRE) of about 0.5138% relative to the real 3D information; in other words, its accuracy is about 99%. In addition, we verify that the proposed system can quickly generate the digital hologram of the fused result by using multi-GPU CGH calculation.
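
The Z-buffer fusion step can be pictured as a per-pixel depth test. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function name `zbuffer_fuse`, the nested-list image layout, and the use of `None` for pixels the virtual object does not cover are all hypothetical.

```python
def zbuffer_fuse(real_depth, real_color, virt_depth, virt_color):
    """Per pixel, keep whichever surface (real or virtual) is nearer to the camera."""
    fused_depth, fused_color = [], []
    for rd_row, rc_row, vd_row, vc_row in zip(real_depth, real_color,
                                              virt_depth, virt_color):
        d_row, c_row = [], []
        for rd, rc, vd, vc in zip(rd_row, rc_row, vd_row, vc_row):
            # None marks a pixel that the virtual object does not cover.
            if vd is not None and vd < rd:
                d_row.append(vd)
                c_row.append(vc)
            else:
                d_row.append(rd)
                c_row.append(rc)
        fused_depth.append(d_row)
        fused_color.append(c_row)
    return fused_depth, fused_color


# Tiny 2x2 example: "R" = real pixel, "V" = virtual pixel, depths in metres.
real_depth = [[2.0, 2.0], [1.0, 3.0]]
real_color = [["R", "R"], ["R", "R"]]
virt_depth = [[1.5, None], [1.5, 2.5]]
virt_color = [["V", None], ["V", "V"]]
fused_depth, fused_color = zbuffer_fuse(real_depth, real_color,
                                        virt_depth, virt_color)
```

The virtual object appears only where it is nearer than the real surface, which is how a depth test produces correct occlusion between real and virtual content.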

Complexity-based Sample Adaptive Offset Parallelism (복잡도 기반 적응적 샘플 오프셋 병렬화)

  • Ryu, Eun-Kyung;Jo, Hyun-Ho;Seo, Jung-Han;Sim, Dong-Gyu;Kim, Doo-Hyun;Song, Joon-Ho
    • Journal of Broadcast Engineering / v.17 no.3 / pp.503-518 / 2012
  • In this paper, we propose a complexity-based parallelization method for the sample adaptive offset (SAO) algorithm, one of the HEVC in-loop filters. The SAO algorithm can be regarded as a region-based process in which the regions are obtained and represented with a quad-tree scheme. An offset that minimizes the reconstruction error is sent for each partitioned region. The SAO of HEVC can be parallelized at the data level. However, because the sizes and complexities of the SAO regions are not regular, workload imbalance occurs on multi-core platforms. We therefore propose an LCU-based SAO algorithm and a complexity prediction algorithm for each LCU. With the proposed complexity-based LCU processing, the proposed algorithm is faster than the sequential implementation by a factor of 2.38. In addition, it is 21% faster than a regular parallel implementation of SAO.
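
One common way to turn per-unit complexity predictions into a balanced schedule is greedy longest-processing-time-first assignment. The abstract does not say which scheduling rule the paper uses, so the sketch below is a swapped-in standard technique, and `assign_lcus`, the cost values, and the core count are hypothetical.

```python
import heapq


def assign_lcus(predicted_cost, n_cores):
    """Greedy LPT: give the next most expensive LCU to the least-loaded core."""
    heap = [(0, core) for core in range(n_cores)]   # (current load, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_cores)]
    # Visit LCU indices from highest to lowest predicted cost.
    order = sorted(range(len(predicted_cost)), key=lambda i: -predicted_cost[i])
    for lcu in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(lcu)
        heapq.heappush(heap, (load + predicted_cost[lcu], core))
    loads = [sum(predicted_cost[i] for i in lcus) for lcus in assignment]
    return assignment, loads


# Hypothetical predicted complexities for five LCUs, scheduled on two cores.
assignment, loads = assign_lcus([7, 5, 4, 3, 1], n_cores=2)
```

Here both cores end up with a total predicted cost of 10, whereas splitting the LCUs by position (the first three to one core, the last two to the other) would give loads of 16 and 4.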

Improvement of Address Pointer Assignment in DSP Code Generation (DSP용 코드 생성에서 주소 포인터 할당 성능 향상 기법)

  • Lee, Hee-Jin;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.1 / pp.37-47 / 2008
  • Exploiting the address generation units typically provided in DSPs plays an important role in DSP code generation, since these units perform fast address computation in parallel with the central data path. Offset assignment optimizes the memory layout of program variables by taking advantage of the capabilities of the address generation units, and consists of two steps: memory layout generation and address pointer assignment. In this paper, we propose an effective address pointer assignment method that minimizes the number of address calculation instructions in DSP code generation. The proposed approach reduces the time complexity of a conventional address pointer assignment algorithm with fixed memory layouts by using minimum cost-nodes breaking. To reduce memory size and processing time, we employ a powerful pruning technique. Moreover, because the memory layout affects the result of the address pointer assignment algorithm, our approach iteratively improves the initial solution by changing the memory layout in each iteration. We applied the proposed approach to about 3,000 sequences of the OffsetStone benchmarks to demonstrate its effectiveness. Experimental results on the benchmarks show an average improvement of 25.9% in the address codes over previous works.

Low Power TLB System by Using Continuous Accessing Distinction Algorithm (연속적 접근 판별 알고리즘을 이용한 저전력 TLB 구조)

  • Lee, Jung-Hoon
    • The KIPS Transactions:PartA / v.14A no.1 s.105 / pp.47-54 / 2007
  • In this paper, we present a translation lookaside buffer (TLB) system with low power consumption for embedded processors. The proposed TLB is constructed as multiple banks, each with an associated block buffer and a corresponding comparator. Either the block buffer or the main bank is selectively accessed on the basis of two bits in the block buffer (tag buffer). Dynamic power savings are achieved by reducing the number of entries accessed in parallel, because the tag buffer acts as a filtering mechanism. The performance overhead of the proposed TLB is negligible compared with other hierarchical TLB structures. For example, the two-cycle overhead of the proposed TLB is only about 1%, compared with 5% for a filter (micro)-TLB and 14% for the same structure without the continuous accessing distinction algorithm. We show that the average hit ratios of the block buffers and the main banks of the proposed TLB are 95% and 5%, respectively. Dynamic power is reduced by about 95% relative to a fully associative TLB, 90% relative to a filter-TLB, and 40% relative to the same structure without the continuous accessing distinction algorithm.
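
The filtering idea can be illustrated with a toy model in which each bank keeps a one-entry block buffer, and the main bank is searched only when that buffer misses. This is a simplified sketch, not the paper's design: the two-bit selection logic is not modelled, and the class `BankedTLB`, the modulo bank selection, and the assumption that every main-bank lookup hits are all hypothetical.

```python
class BankedTLB:
    """Toy banked TLB: each bank has a one-entry block buffer in front of it."""

    def __init__(self, page_table, n_banks=2):
        self.page_table = page_table           # vpn -> pfn, stands in for main-bank contents
        self.n_banks = n_banks
        self.block_buffer = [None] * n_banks   # last (vpn, pfn) translated by each bank
        self.buffer_hits = 0                   # translations served by a block buffer
        self.bank_accesses = 0                 # translations that searched a main bank

    def translate(self, vpn):
        bank = vpn % self.n_banks              # assumed bank-selection rule
        cached = self.block_buffer[bank]
        if cached is not None and cached[0] == vpn:
            # Same page as the previous access to this bank: skip the main bank.
            self.buffer_hits += 1
            return cached[1]
        self.bank_accesses += 1
        pfn = self.page_table[vpn]             # simplification: the main bank always hits
        self.block_buffer[bank] = (vpn, pfn)
        return pfn


tlb = BankedTLB({1: 101, 2: 102})
frames = [tlb.translate(vpn) for vpn in [1, 1, 1, 2, 2, 1]]
```

With this access pattern, four of the six translations are served by a block buffer and only two search a main bank, which is the kind of locality that lets the buffer reduce the number of entries compared in parallel.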

Cycle Extendability of Torus Sub-Graphs in the Enhanced Pyramid Network (개선된 피라미드 네트워크에서 토러스 부그래프의 사이클 확장성)

  • Chang, Jung-Hwan
    • Journal of Korea Multimedia Society / v.13 no.8 / pp.1183-1193 / 2010
  • The pyramid graph is well known in parallel processing as an interconnection network topology based on regular square mesh and tree architectures. The enhanced pyramid graph is an alternative architecture that replaces the mesh with the corresponding torus on the base to improve performance over the pyramid. In this paper, we classify the edges of the regular square torus, the basic sub-graph constituting each layer of the enhanced pyramid graph, into two disjoint groups. The edge set of the torus graph is divided into two disjoint subsets, NPC (candidate edges for neighbor-parent) and SPC (candidate edges for shared-parent), according to whether the parent vertices adjacent to the two end vertices of an edge are neighbors or are shared in the upper layer of the enhanced pyramid graph. We also introduce the notion of a shrink graph, which focuses only on the NPC-edges by hiding the SPC-edges within shrunk super-vertices. We show that the lower and upper bounds on the number of NPC-edges in a Hamiltonian cycle constructed on a $2^n{\times}2^n$ torus are $2^{2n-2}$ and $3{\cdot}2^{2n-2}$, respectively. By extending this result to the enhanced pyramid graph, we also prove that the maximum number of NPC-edges that a Hamiltonian cycle can contain is $4^{n-1}-2n+1$ in the n-dimensional enhanced pyramid.
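
As a quick sanity check on the torus bounds (this worked arithmetic is an added illustration, not taken from the paper): a Hamiltonian cycle visits every vertex exactly once, so on a $2^n{\times}2^n$ torus it has exactly $2^{2n}$ edges. The stated bounds are therefore one quarter and three quarters of the cycle's edges:

$$
2^{2n-2} = \tfrac{1}{4}\cdot 2^{2n}, \qquad 3{\cdot}2^{2n-2} = \tfrac{3}{4}\cdot 2^{2n}
$$

For $n = 2$ (a $4{\times}4$ torus with 16 vertices), a Hamiltonian cycle has 16 edges, of which between 4 and 12 are NPC-edges.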

Analysis on the Active/Inactive Status of Computational Resources for Improving the Performance of the GPU (GPU 성능 저하 해결을 위한 내부 자원 활용/비활용 상태 분석)

  • Choi, Hongjun;Son, Dongoh;Kim, Jongmyon;Kim, Cheolhong
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.1-11 / 2015
  • In recent high-performance computing systems, GPGPU has been widely used to process general-purpose applications as well as graphics applications, since GPUs provide computational resources optimized for massively parallel processing. Unfortunately, GPGPU does not fully exploit the computational resources of the GPU when executing general-purpose applications, because the applications cannot be optimized for the GPU architecture. We therefore provide research guidelines for improving the performance of computing systems that use GPGPU. To accomplish this, we analyze the factors that negatively affect GPU performance. To clearly classify the causes of these factors, we define five GPU core statuses: fully active, partial active, idle, memory stall, and GPU core stall. All statuses except fully active cause performance degradation. We evaluate the ratio of each GPU core status according to the characteristics of the benchmarks to find the specific reasons for GPU performance degradation. According to our simulation results, the partial active, idle, memory stall, and GPU core stall statuses are induced by computational resource underutilization, low parallelism, high memory requests, and structural hazards, respectively.

AS B-tree: A study on the enhancement of the insertion performance of B-tree on SSD (AS B-트리: SSD를 사용한 B-트리에서 삽입 성능 향상에 관한 연구)

  • Kim, Sung-Ho;Roh, Hong-Chan;Lee, Dae-Wook;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.18D no.3 / pp.157-168 / 2011
  • Flash memory has recently come into use as the main storage device in mobile devices, and flash SSDs are gaining popularity as a major storage device in laptop and desktop computers and even in enterprise-level servers. Unlike on HDDs, an overwrite operation on flash memory cannot be performed unless it is preceded by an erase operation on the same block. To address this, an FTL (Flash memory Translation Layer) is employed on flash memory. Even when a modified data block is overwritten at the same logical address, the FTL writes the updated data block to a physical address different from the previous one and maps the logical address to the new physical address. This enables flash memory to avoid the high block-erase cost. A flash SSD has an array of NAND flash memory packages, so it can access one or more flash memory packages in parallel. To take advantage of this internal parallelism, it is beneficial for DBMSs to request I/O operations on sequential logical addresses. However, the B-tree, a representative index scheme in current relational DBMSs, produces excessive I/O operations in random order when its node structures are updated. The original B-tree is therefore not well suited to SSDs. In this paper, we propose the AS (Always Sequential) B-tree, which, for every update operation, writes the updated node to the logical address contiguous to the previously written node. In experiments, the AS B-tree improved the insertion performance of the B-tree by 21%.
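
The "always sequential" write pattern can be sketched as an append-only node store with a mapping table from node identifiers to their newest logical address. This is a minimal illustration under assumptions, not the AS B-tree itself: the class `SequentialNodeStore`, its methods, and the node contents are hypothetical, and tree logic, space reclamation, and crash recovery are left out.

```python
class SequentialNodeStore:
    """Append-only store: every node update is written to the next logical address."""

    def __init__(self):
        self.log = []      # log[i] holds the (node_id, data) image at logical address i
        self.latest = {}   # node_id -> logical address of its newest version

    def write_node(self, node_id, data):
        address = len(self.log)          # always contiguous to the previous write
        self.log.append((node_id, data))
        self.latest[node_id] = address   # older versions become stale
        return address

    def read_node(self, node_id):
        return self.log[self.latest[node_id]][1]


store = SequentialNodeStore()
addresses = [
    store.write_node("root", [10]),
    store.write_node("leaf", [1, 2]),
    store.write_node("root", [10, 20]),   # update of an existing node
]
```

Even the update to the existing `root` node lands at the next address rather than at its old location, so the device sees a purely sequential write stream. The cost is that stale versions accumulate and must be reclaimed by some mechanism not shown here.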