• Title/Summary/Keyword: parallel method (병렬 방법)

Reconfigurable Architecture Design for H.264 Motion Estimation and 3D Graphics Rendering of Mobile Applications (이동통신 단말기를 위한 재구성 가능한 구조의 H.264 인코더의 움직임 추정기와 3차원 그래픽 렌더링 가속기 설계)

  • Park, Jung-Ae;Yoon, Mi-Sun;Shin, Hyun-Chul
    • Journal of KIISE:Computer Systems and Theory / v.34 no.1 / pp.10-18 / 2007
  • Mobile communication devices such as PDAs and cellular phones need to perform several kinds of computation-intensive functions, including H.264 encoding/decoding and 3D graphics processing. This paper describes a new reconfigurable architecture that can perform either motion estimation for H.264 or rendering for 3D graphics. The proposed motion estimation techniques use a new efficient SAD computation ordering, DAU, and FDVS algorithms. The new approach reduces computation by 70% on average compared with JM 8.2, without affecting quality. In 3D rendering, a midline traversal algorithm is used for parallel processing to increase throughput. Memories are partitioned into 8 blocks so that 2.4 Mbits (47%) of memory is shared and selective power shutdown is possible during motion estimation and 3D graphics rendering. Processing elements are also shared to further reduce the chip area by 7%.

Numerical Analysis of Off-Gas Flow in Hot Area of the Vitrification Plant (유리화공정 고온영역에서의 방사성 배기체 유동해석)

  • Park Seung-Chul;Kim Byong-Ryol;Shin Sang-Woon;Lee Jin Wook;Kang Won Gu;Hong Seok Jin
    • Proceedings of the Korean Radioactive Waste Society Conference / 2005.11a / pp.69-78 / 2005
  • This study develops numerical models for simulating off-gas flow in the hot area of the vitrification plant. The models have been applied to analyze the effect of the design parameters of a real plant, and numerical analyses have been performed for the CCM (Cold Crucible Melter), the pipe cooler, and the HTF (High Temperature Filter). First, the effect of excess oxygen and of the oxygen distribution ratio on combustion characteristics in the CCM has been studied. Next, the solidification behavior of radionuclides in the pipe cooler has been numerically modeled and scrutinized. Finally, flow patterns for different locations of the off-gas entrance of the HTF have been compared.

Bio-Denitrification of the Nitrate Waste Solution from the Lagoon Sludge in a Batch Fermenter (회분식 발효조에서 미생물을 이용한 라군 슬러지 질산염 폐액의 탈질 공정 평가)

  • Oh Jong-Hyeok;Lee O-Mi;Hwang Doo-Seong;Choi Yun-Dong;Hwang Sung-Tae;Jo Byung-Real;Park Jin-Ho
    • Journal of Nuclear Fuel Cycle and Waste Technology(JNFCWT) / v.4 no.2 / pp.153-159 / 2006
  • Demolition of the lagoon sludge is a serious task in the decommissioning of the uranium conversion plant. The main component of the sludge is ammonium nitrate, which is a highly explosive material. Bio-denitrification is therefore an attractive process for removing the nitrate. In this work, several process variables were tested, including incubation temperature, nitrate concentration, electron donor, C/N ratio, seeding ratio, and pH, using the anaerobic bacterium Pseudomonas halodenitrificans. The results will serve as basic data for a continuous bio-denitrification process.

Distributed processing for the Load Minimization of an SIP Proxy Server (SIP 프록시 서버의 부하 최소화를 위한 분산 처리)

  • Lee, Young-Min;Roh, Young-Sup;Cho, Yong-Karp;Oh, Sam-Kweon;Hwang, Hee-Yeung
    • Journal of the Korea Academia-Industrial cooperation Society / v.9 no.4 / pp.929-935 / 2008
  • As internet telephony services based on the Session Initiation Protocol (SIP) enter the spotlight as a marketable technology, many SIP-based products have been developed and used for home and office telephony services. The call connection of an internet phone is classified into specific call connections and group call connections. Group call connections have a forking function which delivers the message to all of the group members. This function requires excessive message control for a call connection and creates heavy traffic in the network. In the internet call system model, most of the call-setup messages are directed to the proxy server during a short time period. This heavy message load causes unwanted delay in message processing and, as a result, call setup cannot be completed. To solve the delay problem, we simplified the analysis of the call-setup message in the proxy server and distributed the processing of the forking function for group call-setup messages. In this thesis, a new system model to minimize the load is proposed, and the subsequent implementation of this model demonstrates the performance improvement.

A Top-down based Control Tree Construction Mechanism for Reliable Multicast Transport Protocols (신뢰적인 멀티캐스트 전송 프로토콜을 위한 Top-Down 기반의 제어 트리 구축 방안)

  • Kim, Eun-Sook;Koh, Seok-Joo;Kang, Shin-Gak;Choe, Jong-Won
    • Journal of KIISE:Information Networking / v.28 no.4 / pp.611-620 / 2001
  • To meet the requirements of reliable service for various applications, a Reliable Multicast Transport Protocol should be implemented over IP Multicast, which currently provides only best-effort service. Among current research efforts, a hierarchical tree-based mechanism has been proposed and actively studied. This mechanism is known to provide high scalability as well as reliability, but it needs an additional tree configuration mechanism for building an efficient logical tree in the transport layer. A bottom-up approach has been used for creating such a tree. This method benefits from parallel tree construction by receivers, but it has drawbacks: it does not guarantee a loop-free tree, and it incurs heavy message overhead during tree creation. Therefore, this paper proposes a top-down based mechanism for constructing a control tree, which guarantees loop-freeness through step-wise tree building. Experimental simulations show that the proposed mechanism has less message overhead. It is recommended that the bottom-up approach and the proposed top-down approach be used selectively in real networks, according to the requirements of the multicast applications concerned.

Implementation of a GPU Cluster System using Inexpensive Graphics Devices (저가의 그래픽스 장치를 이용한 GPU 클러스터 시스템 구현)

  • Lee, Jong-Min;Lee, Jung-Hwa;Kim, Seong-Woo
    • Journal of Korea Multimedia Society / v.14 no.11 / pp.1458-1466 / 2011
  • Research on GPGPU has recently been active as GPU performance has increased rapidly. In this paper, we propose a system architecture, obtained by benchmarking existing supercomputer architectures, for a cost-effective system using the GPUs in low-cost graphics devices, and we implement a GPU cluster system with eight GPUs. We also build a software development environment suited to the GPU cluster system and use it for performance evaluation by implementing the n-body problem. The results show that, because of communication cost, using multiple GPUs is efficient when the problem size is large. In addition, we could calculate up to eight million celestial bodies by applying a block-by-block calculation method to mitigate the problem size constraint caused by the limited resources of GPUs.
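
The block-by-block idea in the abstract above can be illustrated with a minimal CPU-only sketch in Python. It is not the authors' GPU implementation: the function name `accelerations`, the 2-D setting, and the `block` parameter are assumptions made for illustration. The point is only that source bodies are visited one block at a time, so the working set per pass is bounded by the block size.

```python
def accelerations(pos, mass, block=2, g=1.0, eps=0.0):
    """Direct-sum gravitational accelerations in 2-D, processed block by block.

    pos   : list of (x, y) positions
    mass  : list of masses, same length as pos
    block : number of source bodies handled per pass (hypothetical tuning knob)
    eps   : softening term added to r^2; with eps == 0, bodies must not coincide
    """
    n = len(pos)
    acc = [[0.0, 0.0] for _ in range(n)]
    for start in range(0, n, block):
        # Only this slice of source bodies is "resident" during the pass.
        sources = range(start, min(start + block, n))
        for i in range(n):
            for j in sources:
                if i == j:
                    continue
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                r2 = dx * dx + dy * dy + eps
                inv_r3 = r2 ** -1.5
                acc[i][0] += g * mass[j] * dx * inv_r3
                acc[i][1] += g * mass[j] * dy * inv_r3
    return acc
```

Because each pass only adds partial sums into `acc`, the result does not depend on the block size; on a device with limited memory, each pass would correspond to one transfer of a block of source bodies.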

VLSI Implementation of Low-Power Motion Estimation Using Reduced Memory Accesses and Computations (메모리 호출과 연산횟수 감소기법을 이용한 저전력 움직임추정 VLSI 구현)

  • Moon, Ji-Kyung;Kim, Nam-Sub;Kim, Jin-Sang;Cho, Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences / v.32 no.5A / pp.503-509 / 2007
  • Low-power motion estimation is required for video coding in portable information devices. In this paper, we propose a low-power motion estimation algorithm and a 1-D systolic array VLSI architecture using the full search block matching algorithm (FSBMA). The main sources of power dissipation in FSBMA are complex computations and frequent memory accesses for data in the search area. In the proposed algorithm, memory accesses and computations are reduced by using a 1-D PE (processing element) array architecture that performs motion estimation of two neighboring blocks in parallel and by skipping unnecessary computations during motion estimation. The VLSI implementation results show that the proposed architecture reduces power dissipation by 9.3% and operates twice as fast as an existing low-power motion estimator.
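
As a rough software illustration of full-search block matching with skipped computations, the sketch below uses the sum of absolute differences (SAD) and abandons a candidate as soon as its partial SAD can no longer beat the best one found. This early-exit rule is a common technique chosen here as an assumption; the abstract does not say which computations the proposed hardware skips. The function names and the list-of-lists frame layout are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n, best):
    """SAD between the n x n block of `cur` at (bx, by) and the block of `ref`
    displaced by (dx, dy). Stops after any row once the sum reaches `best`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        if total >= best:
            return total  # candidate cannot win; skip the remaining rows
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return ((dx, dy), cost) of the best match within +/- search_range."""
    h, w = len(ref), len(ref[0])
    best_cost = float("inf")
    best_mv = (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Ignore candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n, best_cost)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Calling `full_search(cur, ref, bx, by, n, search_range)` for each block of the current frame yields one motion vector per block; two neighboring blocks share most of their search area, which is the data reuse the abstract exploits.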

Test Time Reduction for BIST by Parallel Divide-and-Conquer Method (분할 및 병렬 처리 방법에 의한 BIST의 테스트 시간 감소)

  • Choe, Byeong-Gu;Kim, Dong-Uk
    • The Transactions of the Korean Institute of Electrical Engineers D / v.49 no.6 / pp.322-329 / 2000
  • BIST (Built-in Self Test) has been considered the most promising DFT (design-for-test) scheme for present and future test strategies. The most serious problem in applying BIST to a large circuit is the excessive increase in test time. This paper focuses on this problem. We propose a new BIST construction scheme which uses a parallel divide-and-conquer method. The circuit is divided with respect to some internal nodes called test points. The test points are selected by considering the nodal connectivity of the circuit rather than the testability of each node. The test patterns are generated by a single linear feedback shift register (LFSR) and are shared by all the divided circuits. Thus, the tests for the divided circuits are performed in parallel. Test responses are collected from the test points as well as the primary outputs. Even though the divide-and-conquer scheme is used and test patterns are generated by one LFSR, the proposed scheme does not lose its pseudo-exhaustive property. We propose a selection procedure to find the test points and implemented it in C/C++. Several example circuits were applied to this procedure, and the results showed that test time was reduced to as little as 1/2151 while the increases in hardware overhead and delay were small. Because the rates of increase in hardware overhead and delay overhead tended to be smaller than the rate of test time reduction as circuit size increases, the scheme is expected to be used efficiently for large circuits such as VLSI and ULSI.
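
The idea of one LFSR feeding several divided circuits at once can be sketched in Python as follows. The 4-bit width, the tap positions (bits 3 and 2, corresponding to the polynomial x^4 + x^3 + 1), and the two toy sub-circuits `sub_a` and `sub_b` are assumptions for illustration only; they are not taken from the paper.

```python
def lfsr_patterns(seed=0b0001, width=4, taps=(3, 2)):
    """Yield the states of a Fibonacci LFSR until it returns to the seed.

    The feedback bit is the XOR of the tapped bits; the register shifts left
    and the feedback bit enters at bit 0. The seed must be non-zero.
    """
    mask = (1 << width) - 1
    state = seed
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask
        if state == seed:
            return


def sub_a(p):
    """Toy sub-circuit: AND of pattern bits 0 and 1."""
    return (p & 1) & ((p >> 1) & 1)


def sub_b(p):
    """Toy sub-circuit: XOR of pattern bits 2 and 3."""
    return ((p >> 2) & 1) ^ ((p >> 3) & 1)


patterns = list(lfsr_patterns())
# Both sub-circuits see the same pattern in the same step, so their
# responses can be collected in parallel from a single generator.
responses = [(sub_a(p), sub_b(p)) for p in patterns]
```

With these taps the register cycles through all 15 non-zero 4-bit states, so each 2-input toy sub-circuit receives every combination of its inputs, which is the pseudo-exhaustive behavior the abstract refers to.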

An R-tree Index Scheduling Method for kNN Query Processing in Multiple Wireless Broadcast Channels (다중 무선 방송채널에서 kNN 질의 처리를 위한 R-tree 인덱스 스케줄링 기법)

  • Jung, Eui-Jun;Jung, Sung-Won
    • Journal of KIISE:Databases / v.37 no.2 / pp.121-126 / 2010
  • This paper proposes an efficient index scheduling technique for kNN query processing in multiple wireless broadcast channel environments. In previous works, a client has to wait for the next cycle if the required child nodes of the same parent node are allocated in the same time slot on multiple channels. The proposed method computes the access frequency of each R-tree node at the server before generating the R-tree index broadcast schedule. Nodes with high access frequencies are allocated serially on a single channel, while nodes with low access frequencies are allocated in parallel on multiple channels. As a result, index node access conflicts are reduced and the broadcast cycle is shortened. The performance evaluation shows that the proposed scheme performs better than existing schemes.

Performance Enhancement of Scaling Filter and Transcoder using CUDA (CUDA를 활용한 스케일링 필터 및 트랜스코더의 성능향상)

  • Han, Jae-Geun;Ko, Young-Sub;Suh, Sung-Han;Ha, Soon-Hoi
    • Journal of KIISE:Computing Practices and Letters / v.16 no.4 / pp.507-511 / 2010
  • In this paper, we propose to enhance the performance of a software transcoder by using a GPGPU for scaling filters. Video transcoding is a technique that translates a video file into another video file that has a different coding algorithm and/or a different frame size. Demand for it increases as more multimedia devices with different specifications coexist in our daily life. Since transcoding is computationally intensive, a software transcoder that runs on a CPU takes a long processing time. We achieve a significant speed-up by parallelizing the scaling filter on a GPGPU, which provides substantially more computation power. Extensive experiments with various video scripts of different sizes and with various scaling filter options verify that the enhanced transcoder achieves a 36% performance improvement with the default option, and up to 101% with a certain option.
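
To show why a scaling filter parallelizes well, here is a minimal bilinear scaler in plain Python. Bilinear interpolation with corner-aligned sampling is an assumed filter choice, and `bilinear_scale` is a hypothetical name; the abstract does not state which scaling filters the transcoder uses. Each output pixel depends only on four input pixels, so the two loops could in principle be mapped to one GPU thread per output pixel.

```python
def bilinear_scale(img, out_w, out_h):
    """Scale a grayscale image (list of rows) to out_w x out_h using
    bilinear interpolation with corner-aligned sampling."""
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            # Map the output pixel back to fractional source coordinates.
            sx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            sy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, in_w - 1), min(y0 + 1, in_h - 1)
            fx, fy = sx - x0, sy - y0
            # Interpolate horizontally on the two rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[oy][ox] = top * (1 - fy) + bottom * fy
    return out
```

No output pixel reads another output pixel, which is the independence a GPU kernel would rely on.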