• Title/Summary/Keyword: 병렬 구현 (parallel implementation)

Search Result 1,474, Processing Time 0.025 seconds

Design and Implementation of An I/O System for Irregular Application under Parallel System Environments (병렬 시스템 환경하에서 비정형 응용 프로그램을 위한 입출력 시스템의 설계 및 구현)

  • No, Jae-Chun;Park, Seong-Sun;Gwon, O-Yeong
    • Journal of KIISE:Computer Systems and Theory / v.26 no.11 / pp.1318-1332 / 1999
  • In this paper we present the design, implementation, and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two designs: "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, which makes scheduling of I/O requests simpler but creates a possibility of contention at the I/O nodes. In the second scheme, processors are divided into several groups so that only one group performs I/O at a time while the next group performs communication to rearrange data; the entire process is pipelined to reduce I/O node contention dynamically. In other words, the design provides support for dynamic contention management. We then present a software caching method for collective I/O that reduces I/O cost by reusing data already present in the memory of other nodes. Finally, chunking and on-line compression mechanisms are included in both models. We demonstrate that these techniques obtain significantly higher I/O performance than has been possible so far. Performance results are presented for an Intel Paragon and the ASCI/Red teraflops machine. Application-level I/O bandwidth of up to 55% of the peak is observed.

Design and Implementation of an Efficient Web Services Data Processing Using Hadoop-Based Big Data Processing Technique (하둡 기반 빅 데이터 기법을 이용한 웹 서비스 데이터 처리 설계 및 구현)

  • Kim, Hyun-Joo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.726-734 / 2015
  • Relational databases, which store data in structured form, are currently the most widely used means of data management. However, in relational databases, service becomes slower as the amount of data increases because of constraints on the read and write operations used to save or query data. Furthermore, when a new task is added, the database grows and, consequently, requires additional infrastructure, such as parallel configuration of hardware, CPU, memory, and network, to support smooth operation. In this paper, in order to improve Web information services that are slowing down because of the growth of data in relational databases, we implemented a model that extracts a large amount of data quickly and safely for users by sending data to the Hadoop Distributed File System (HDFS), unifying and reconstructing the data, and then processing the HDFS files. We implemented our model in a Web-based civil affairs system that stores image files, a case of irregular data processing. Our proposed system's data processing was found to be 0.4 sec faster than that of a relational database system. Thus, we found that it is possible to support Web information services with a Hadoop-based big data processing technique in order to process a large amount of data, as in conventional relational databases. Furthermore, since Hadoop is open source, our model has the advantage of reducing software costs. The proposed system is expected to serve as a model for Web services that provide fast information processing for organizations that require efficient processing of big data because of the increase in the size of conventional relational databases.

Design and Implementation of Digital Electrical Impedance Tomography System (디지털 임피던스 영상 시스템의 설계 및 구현)

  • 오동인;백상민;이재상;우응제
    • Journal of Biomedical Engineering Research / v.25 no.4 / pp.269-275 / 2004
  • Different biological tissues have different values of electrical resistivity. In EIT (electrical impedance tomography), we try to provide cross-sectional images of the resistivity distribution inside an electrically conducting subject such as the human body, mainly for functional imaging. However, it is well known that the image reconstruction problem in EIT is ill-posed and that the quality of a reconstructed image depends strongly on the measurement error. This requires us to develop a high-performance EIT system. In this paper, we describe the development of a 16-channel digital EIT system including a single constant current source, 16 voltmeters, a main controller, and a PC. The system was designed and implemented using FPGA-based digital technology. The current source injects a 50 kHz sinusoidal current with a THD (total harmonic distortion) of 0.0029% and an amplitude stability of 0.022%. The single current source and switching circuit reduce the measurement error associated with imperfect matching of multiple current sources at the expense of a reduced data acquisition time. The digital voltmeter measuring the induced boundary voltage consists of a differential amplifier, an ADC, and an FPGA (field programmable gate array). A digital phase-sensitive demodulation technique was implemented in the voltmeter to maximize the SNR (signal-to-noise ratio). Experimental results for the 16-channel digital voltmeters showed an SNR of 90 dB. We used the developed EIT system to reconstruct resistivity images of a saline phantom containing banana objects. Based on the results, we suggest future improvements for a 64-channel multi-frequency EIT system for three-dimensional dynamic imaging of bio-impedance distributions inside the human body.
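
The digital phase-sensitive demodulation mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's FPGA implementation: the function name `demodulate`, the 1 MHz sampling rate used in the usage note, and the signal model v(t) = A·cos(2πft + φ) are assumptions made for illustration. Only the 50 kHz excitation frequency comes from the abstract. The sketch multiplies the sampled voltage by in-phase and quadrature references and averages over a whole number of periods.

```python
import math


def demodulate(samples, freq_hz, sample_rate_hz):
    """Estimate (amplitude, phase_rad) of the freq_hz component of samples.

    Assumes the window spans a whole number of periods of freq_hz, so the
    double-frequency terms average out to zero.
    """
    n = len(samples)
    i_sum = 0.0  # correlation with the in-phase (cosine) reference
    q_sum = 0.0  # correlation with the quadrature (sine) reference
    for k, v in enumerate(samples):
        theta = 2.0 * math.pi * freq_hz * k / sample_rate_hz
        i_sum += v * math.cos(theta)
        q_sum += v * math.sin(theta)
    # For v = A*cos(theta + phi): mean(v*cos) = (A/2)*cos(phi),
    # mean(v*sin) = -(A/2)*sin(phi); the factor 2 restores A.
    i_avg = 2.0 * i_sum / n
    q_avg = 2.0 * q_sum / n
    amplitude = math.hypot(i_avg, q_avg)
    phase = math.atan2(-q_avg, i_avg)
    return amplitude, phase
```

For example, with an assumed 1 MHz sampling rate, 2000 samples cover exactly 100 periods of a 50 kHz signal, and `demodulate(samples, 50e3, 1e6)` returns the amplitude and phase of that component. Averaging over many periods is what suppresses uncorrelated noise.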

Thermodynamics-Based Weight Encoding Methods for Improving Reliability of Biomolecular Perceptrons (생체분자 퍼셉트론의 신뢰성 향상을 위한 열역학 기반 가중치 코딩 방법)

  • Lim, Hee-Woong;Yoo, Suk-I.;Zhang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.34 no.12 / pp.1056-1064 / 2007
  • Biomolecular computing is a new computing paradigm that uses biomolecules such as DNA for information representation and processing. The huge number of molecules in a small volume and the innate massive parallelism inspired a novel computation method, and various computation models and molecular algorithms were developed for problem solving. In the meantime, the use of biomolecules for information processing supports the possibility of DNA computing as an application for biological problems. It has the potential as an analysis tool for biochemical information such as gene expression patterns. In this context, a DNA computing-based model of a biomolecular perceptron has been proposed and the result of its experimental implementation was presented previously. The weight encoding and weighted sum operation, which are the main components of a biomolecular perceptron, are based on the competitive hybridization reactions between the input molecules and weight-encoding probe molecules. However, thermodynamic symmetry in the competitive hybridizations is assumed, so there can be some error in the weight representation depending on the probe species in use. Here we suggest a generalized model of hybridization reactions considering the asymmetric thermodynamics in competitive hybridizations and present a weight encoding method for the reliable implementation of a biomolecular perceptron based on this model. We compare the accuracy of our weight encoding method with that of the previous one via computer simulations and present the condition of probe composition to satisfy the error limit.

A Kernel Module to Support High-Performance Intra-Node Communication for Multi-Core Systems (멀티 코어 시스템을 위한 고속 노드내 통신 지원 모듈)

  • Jin, Hyun-Wook;Kang, Hyun-Goo;Kim, Jong-Soon
    • Journal of KIISE:Computer Systems and Theory / v.34 no.9 / pp.407-415 / 2007
  • In parallel cluster computing systems, the efficiency of communication between computing nodes is one of the important factors that determine overall system performance. Accordingly, many researchers have studied high-performance inter-node communication. The recently launched multi-core processors, however, increase the importance of intra-node communication as well, because the more cores a node has, the more parallel processes run in the same node. Though there have been studies on intra-node communication, they give limited consideration to state-of-the-art systems. In this paper, we propose a Linux kernel module that minimizes the number of data copies by exploiting the memory mapping mechanism for high-performance intra-node communication. The proposed kernel module supports Linux kernel version 2.6. Performance measurements on a multi-core system show that the proposed kernel module can achieve up to 62% lower latency and up to 144% higher throughput than an existing kernel module approach. In addition, the measurements reveal that the performance of intra-node communication can vary significantly depending on whether the cores that run the communicating processes belong to the same processor package (i.e., share the L2 cache).

Dual Fuel Generator Modeling and Simulation for Development of PMS HILS (PMS HILS 구축을 위한 Dual Fuel Generator 모델링 및 시뮬레이션)

  • Hwang, Joon-Tae;Hong, Suk-Yoon;Kwon, Hyun-Wung;Lee, Kwang-Kook;Song, Jee-Hun
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.3 / pp.613-619 / 2017
  • In this paper, a DF (Dual Fuel) generator, which uses both conventional diesel fuel and LNG fuel, is modeled, and a monitoring system is developed in MATLAB/SIMULINK for the development of a PMS (Power Management System) HILS (Hardware In the Loop Simulation). The principal components of the DF generator model are the DF engine, which provides mechanical power, and the synchronous generator, which converts the mechanical power into electrical power. Submodels such as the throttle body, intake manifold, torque generation, and the mass of LNG and diesel are used to model the DF engine. A governor is used for load sharing between paralleled DF generators so that they can share a total load exceeding the capacity of a single generator. To verify the DF generator model, a simulation with a designated ship lumped load is carried out. The validity of the DF generator model is verified by comparing the simulation results with the estimated results for the designated lumped load.

Implementation of RTOS Simulator With Execution Time Estimation (실행시간 추정 가능한 RTOS 시뮬레이터의 구현)

  • 김방현;류성준;김종현;남영광;이광용
    • Proceedings of the Korea Society for Simulation Conference / 2002.05a / pp.125-129 / 2002
  • An RTOS simulator, one of the tools provided by real-time operating system (RTOS) development environments, offers a target simulation environment that lets developers build and debug applications on the host even when the target hardware is not connected to it. This helps developers produce applications quickly and allows application development to begin before the hardware is finished. For this reason, most commercial RTOS development environments now provide an RTOS simulator. However, most current commercial RTOS simulators are implemented so that only the functional parts of the RTOS run on the host, so they cannot estimate the actual time taken when the RTOS or an RTOS application runs on the real target. Given that a real-time system must produce its results within a fixed time, this is the most serious shortcoming of an RTOS simulator, so a simulator that can estimate execution time and is also practical to deploy is needed. In this work we address this problem by developing a machine instruction-based RTOS simulator that can estimate the execution time of the RTOS and RTOS applications on the actual target and that is suitable for commercialization. We further improve the accuracy of the estimate by accounting for the effects of the pipeline and the cache, which are major components of execution time. The RTOS used in this study is Q+, developed by the Electronics and Telecommunications Research Institute (ETRI) in 2000, and the target hardware on which Q+ runs is the EBSA-285 board, which carries an ARM-family StrongARM SA-110 microprocessor and a 21285 main controller.

A 12Bit 80MHz CMOS D/A Converter with active load inverter switch driver (능동부하 스위치 구동 회로를 이용한 12비트 80MHz CMOS D/A 변환기 설계)

  • Nam, Tae-Kyu;Seo, Sung-Uk;Shin, Sun-Hwa;Joo, Chan-Yang;Kim, Soo-Jae;Lee, Sang-Min;Yoon, Kwang-S.
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.8 / pp.38-44 / 2007
  • This paper describes a 12-bit 80 MHz CMOS D/A converter for wireless transceivers. The proposed circuit employs a segmented structure consisting of four stages of 3-bit thermometer decoders. The proposed D/A converter is fabricated in a 0.35 µm CMOS n-well digital standard process, and measurement results show an INL/DNL of $\pm 1.36$ LSB / $\pm 0.62$ LSB and a glitch energy of $46\,\mathrm{pV{\cdot}s}$. SNR and SFDR are measured to be 58.5 dB and 64.97 dB at Fs = 80 MHz and Fin = 19 MHz, with a total power consumption of 99 mW. These results show that our work has low power consumption, high linearity, low glitch, and improved dynamic performance. Therefore, our work can be applied to various high-speed and high-performance circuits.

Implementation of High-Throughput SHA-1 Hash Algorithm using Multiple Unfolding Technique (다중 언폴딩 기법을 이용한 SHA-1 해쉬 알고리즘 고속 구현)

  • Lee, Eun-Hee;Lee, Je-Hoon;Jang, Young-Jo;Cho, Kyoung-Rok
    • Journal of the Institute of Electronics Engineers of Korea SD / v.47 no.4 / pp.41-49 / 2010
  • This paper proposes a new high-speed SHA-1 architecture using multiple unfolding and pre-computation techniques. We unfold the iterative hash operations into 2 continuous hash stages and reschedule the computation timing. Part of the critical path is then computed in the previous hash operation round and the rest is performed in the present round. These techniques reduce the critical path from 3 additions to 2 additions. This yields a maximum clock frequency of 118 MHz, which provides a throughput of 5.9 Gbps. The proposed architecture shows 26% higher throughput with a 32% smaller hardware size compared to its counterparts. This paper also introduces an analytical model of a multiple SHA-1 architecture at the system level that maps large input data onto SHA-1 blocks in parallel. The model gives the required number of SHA-1 blocks for processing large multimedia data, which helps in deciding the hardware configuration. The high-speed SHA-1 is useful for generating a condensed message and may strengthen the security of mobile communication and internet services.
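
As a software analogy of the unfolding and pre-computation idea in this abstract, the following Python sketch processes two SHA-1 rounds per loop iteration. It is an illustration only, not the paper's hardware architecture: the function names and the exact split of terms are assumptions. The point it shows is that in the second round of each pair, every term except the rotated output of the first round is available beforehand, so it can be summed early (`partial`).

```python
import struct

MASK = 0xFFFFFFFF


def _rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def _f_k(t, b, c, d):
    """Return the SHA-1 round function value and round constant for round t."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (b & d) | (c & d), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6


def sha1_unfolded(message: bytes) -> str:
    """SHA-1 with the 80-round loop unfolded into 40 two-round iterations."""
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Standard SHA-1 padding: 0x80, zeros, then the 64-bit message length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[off:off + 64]))
        for t in range(16, 80):
            w.append(_rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(0, 80, 2):
            b30 = _rotl(b, 30)
            # Terms of round t+1 that do not depend on the result of round t.
            f1, k1 = _f_k(t + 1, a, b30, c)
            partial = (f1 + d + k1 + w[t + 1]) & MASK
            # Round t.
            f0, k0 = _f_k(t, b, c, d)
            a1 = (_rotl(a, 5) + f0 + e + k0 + w[t]) & MASK
            # Round t+1 only needs one more addition once a1 is known.
            a2 = (_rotl(a1, 5) + partial) & MASK
            a, b, c, d, e = a2, a1, _rotl(a, 30), b30, c
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{x:08x}" for x in h)
```

The digest can be compared against a reference such as `hashlib.sha1(data).hexdigest()` to confirm that unfolding changes only the scheduling of the computation, not its result.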

A New Hardware Design for Generating Digital Holographic Video based on Natural Scene (실사기반 디지털 홀로그래픽 비디오의 실시간 생성을 위한 하드웨어의 설계)

  • Lee, Yoon-Hyuk;Seo, Young-Ho;Kim, Dong-Wook
    • Journal of the Institute of Electronics and Information Engineers / v.49 no.11 / pp.86-94 / 2012
  • In this paper we propose a hardware architecture for a high-speed CGH (computer generated hologram) generation processor that, in particular, reduces the number of memory accesses to avoid the bottleneck in the memory access operation. For this, we use three main schemes. The first is pixel-by-pixel calculation rather than light source-by-source calculation. The second is a parallel calculation scheme derived by modifying the previous recursive calculation scheme. The last is a fully pipelined calculation scheme with precisely structured timing scheduling achieved by adjusting the hardware. The proposed hardware is structured to calculate a row of a CGH in parallel, and each hologram pixel in a row is calculated independently. It consists of an input interface, an initial parameter calculator, hologram pixel calculators, a line buffer, and a memory controller. The implemented hardware, which calculates a row of a $1{,}920 \times 1{,}080$ CGH in parallel, uses 168,960 LUTs, 153,944 registers, and 19,212 DSP blocks in an Altera FPGA environment. It operates stably at 198 MHz. Because of the three schemes, the time to access the external memory is reduced to about 1/20,000 of that of the previous approaches at the same calculation speed.
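
The difference between pixel-by-pixel and light source-by-source calculation can be sketched in Python as two loop orders over the same sum. The abstract does not give the hologram formula, so the point-source model used here, I(x, y) = Σ_j A_j·cos(2π·r_j/λ) with r_j the distance from source j to the pixel, is an assumption, as are the function names and parameters.

```python
import math


def _contribution(x, y, source, wavelength):
    """Contribution of one point source (sx, sy, sz, amp) at pixel (x, y)."""
    sx, sy, sz, amp = source
    r = math.sqrt((x - sx) ** 2 + (y - sy) ** 2 + sz ** 2)
    return amp * math.cos(2.0 * math.pi * r / wavelength)


def cgh_row_by_pixel(sources, y, width, pitch, wavelength):
    """Pixel-by-pixel order: accumulate locally, store each pixel once."""
    row = []
    for px in range(width):
        acc = 0.0
        for source in sources:
            acc += _contribution(px * pitch, y, source, wavelength)
        row.append(acc)  # a single store per hologram pixel
    return row


def cgh_row_by_source(sources, y, width, pitch, wavelength):
    """Source-by-source order: update every stored pixel for every source."""
    row = [0.0] * width
    for source in sources:
        for px in range(width):
            # One read-modify-write of the stored pixel per light source.
            row[px] += _contribution(px * pitch, y, source, wavelength)
    return row
```

Both functions return the same row. In the pixel-by-pixel order each pixel is stored once, whereas the source-by-source order revisits every stored pixel once per light source, which is the kind of memory traffic the abstract says the design avoids.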