• Title/Abstract/Keyword: Parallel Data Communication

Search results: 341 items (processing time: 0.028 s)

전류 컷 기법을 적용한 저전력형 직병렬/병직렬 변환기 설계 (Design of Low-power Serial-to-Parallel and Parallel-to-Serial Converter using Current-cut method)

  • 박용운;황성호;차재상;양충모;김성권
    • 한국통신학회논문지
    • /
    • Vol. 34, No. 10A
    • /
    • pp.776-783
    • /
    • 2009
  • This paper proposes a design technique that is effective when the serial-to-parallel/parallel-to-serial converter, one of the baseband signal-processing blocks of high-capacity wireless transmission schemes such as OFDM, is implemented as a current-mode circuit. The proposed method removes the unnecessary hold-mode current of the current-mode serial-to-parallel/parallel-to-serial converter that handles the parallel input/output of a current-mode analog Fourier transform (FFT) LSI for OFDM (Orthogonal Frequency Division Multiplexing). On this basis, we present a new current-mode serial-to-parallel/parallel-to-serial converter that is essential for achieving low power consumption in current-mode analog signal-processing systems, and confirm that measurements of the fabricated chip agree with the simulation results. These results demonstrate the feasibility of a current-mode analog system on which the baseband stage of a low-power, high-capacity wireless communication system can be built.

DEVS 형식론을 이용한 다중프로세서 운영체제의 모델링 및 성능평가 (Modeling and Performance Evaluation of a Multiprocessor Operating System Using the DEVS Formalism)

  • 홍준성
    • 한국시뮬레이션학회:학술대회논문집
    • /
    • 한국시뮬레이션학회 1994년도 추계학술발표회 및 정기총회
    • /
    • pp.32-32
    • /
    • 1994
  • In this example, a message-passing-based multicomputer system with a general interconnection network is considered. Since multicomputer systems are now built with wormhole-routing networks, the topology of the interconnection network is no longer a major consideration for process management and resource sharing. There is an independent operating-system kernel on each node, which communicates with the other kernels through a message-passing mechanism. Based on this architecture, the problem is how much performance degradation occurs when processors are shared on a multicomputer system. Whether to share processors between application programs is a very important decision for system performance. In most cases, application programs running on massively parallel computer systems are not very user-interactive, so the main performance index is system throughput. Each application program has its own communication pattern, and processor sharing causes serious performance degradation in the worst case, for example when one processor is shared by two processes while other processes are waiting for messages from them. Considering this problem is therefore important, since it indicates whether the system should allow processor sharing or not. The input data for this simulation has many parameters: the number of threads per task, the communication patterns between threads, data generation, and defects in the random input data. Many parallel application programs have their own specific communication patterns, with distinct computation and communication phases, and this phase information cannot be obtained from random input data. If trace data from real applications were available, the problem could be simulated more realistically; on the other hand, simulation results will be wasteful unless sufficient trace data with various communication patterns is gathered. In this project, random input data are used for the simulation, and the only controllable parameters are the number of threads per task and the mapping strategy. First, each task runs independently; after that, each task shares one or more processors with other tasks. As more processors are shared, performance degrades, and from this degradation rate the overhead of processor sharing can be estimated. The process-scheduling policy can affect the simulation results; for process scheduling, a priority queue and a FIFO queue are implemented to support round-robin scheduling and priority scheduling.

  • PDF
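
The scheduling policies mentioned at the end of the abstract can be illustrated with a small sketch. The snippet below is a minimal, hypothetical model in Python (not the paper's DEVS implementation): threads competing for one shared processor are dispatched either round-robin from a FIFO queue or run-to-completion from a priority queue, and comparing the finishing times gives a rough feel for the processor-sharing overhead the simulation measures.

```python
import heapq
from collections import deque

# Hypothetical thread workloads: (name, priority, work in time units).
threads = [("t1", 2, 30), ("t2", 1, 20), ("t3", 3, 10)]
QUANTUM = 5  # round-robin time slice on the shared processor

def round_robin(threads):
    """Dispatch from a FIFO queue, one quantum at a time."""
    queue = deque((name, work) for name, _, work in threads)
    clock, finish = 0, {}
    while queue:
        name, work = queue.popleft()
        run = min(QUANTUM, work)
        clock += run
        if work - run > 0:
            queue.append((name, work - run))   # not finished: back of the queue
        else:
            finish[name] = clock
    return finish

def priority_scheduling(threads):
    """Dispatch from a priority queue; a lower number means higher priority."""
    heap = [(prio, name, work) for name, prio, work in threads]
    heapq.heapify(heap)
    clock, finish = 0, {}
    while heap:
        _, name, work = heapq.heappop(heap)
        clock += work                          # run to completion
        finish[name] = clock
    return finish

print("round-robin:", round_robin(threads))
print("priority   :", priority_scheduling(threads))
```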

A Study on Distributed System Construction and Numerical Calculation Using Raspberry Pi

  • Ko, Young-ho;Heo, Gyu-Seong;Lee, Sang-Hyun
    • International journal of advanced smart convergence
    • /
    • Vol. 8, No. 4
    • /
    • pp.194-199
    • /
    • 2019
  • As system performance increases, data is increasingly processed in parallel rather than item by item. Today's CPUs are designed around multiple cores, and data-processing methods are therefore being developed to exploit parallel processing. In recent years the core count of desktop CPUs has grown, data volumes are growing exponentially, and the need for data processing is also increasing as artificial intelligence develops. The neural networks used in artificial intelligence consist of matrices, which makes them well suited to parallel processing. Against this backdrop, this paper aims to speed up processing by using Raspberry Pi boards to build a cluster and a parallel-processing system. The Raspberry Pi is a credit-card-sized single-board computer made by the Raspberry Pi Foundation in England, developed for education in schools and in developing countries. It is cheap, and because many people use it, the necessary information is easy to obtain. A distributed-processing system must be supported by programs that connect multiple computers in parallel and run on the assembled system. The Raspberry Pi boards are connected to a switch hub, each board communicates over the internal network, and parallel processing is implemented internally using the Message Passing Interface (MPI). The parallel programs can be written in Python, and C or Fortran can also be used. The system was tested for parallel processing by multiplying a two-dimensional array of size 10000 by 0.1. The tests showed a reduction in computation time, with the speedup scaling up to the maximum number of cores in the system. The system in this paper was built on Linux-based single-board computers, and testing on systems in other environments is thought to be necessary.
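
The test case described above can be approximated with a short MPI sketch. The snippet below is a minimal illustration under stated assumptions only (mpi4py and NumPy installed on every node, a square 10000x10000 array, and a row count divisible by the number of ranks); it is not the paper's code. Rank 0 scatters row blocks, every rank scales its block by 0.1, and the blocks are gathered back.

```python
# Run with e.g.:  mpiexec -n 4 python scale_array.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 10000                 # dimension used in the paper's test; shrink on low-memory nodes
rows_per_rank = N // size  # assumes N is divisible by the number of ranks

# Rank 0 owns the full array; the other ranks allocate only their block.
full = np.ones((N, N), dtype=np.float64) if rank == 0 else None
block = np.empty((rows_per_rank, N), dtype=np.float64)

comm.Scatter(full, block, root=0)   # distribute contiguous row blocks
block *= 0.1                        # each rank scales its block in parallel
comm.Gather(block, full, root=0)    # collect the scaled rows on rank 0

if rank == 0:
    print("all elements scaled:", np.allclose(full, 0.1))
```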

PC 클러스터를 위한 정렬 중첩 격자의 병렬처리 (PARALLEL IMPROVEMENT IN STRUCTURED CHIMERA GRID ASSEMBLY FOR PC CLUSTER)

  • 김유진;권장혁
    • 한국전산유체공학회:학술대회논문집
    • /
    • 한국전산유체공학회 2005년도 추계 학술대회논문집
    • /
    • pp.157-162
    • /
    • 2005
  • Parallel implementation and performance assessment of the grid assembly in a structured chimera grid approach are studied. The grid-assembly process, involving hole cutting and donor search, is parallelized on a PC cluster. A message-passing programming model based on the MPI library is implemented using the single-program multiple-data (SPMD) paradigm. The coarse-grained communication is optimized with minimal memory allocation, because in a distributed-memory system such as a PC cluster the parallel grid assembly can access the decomposed geometry data on other processors only through message passing. The grid-assembly workload follows a static load balancing tied to the flow solver. The goal of this work is the development of a parallelized grid assembly suited to handling multiple moving-body problems with large grid sizes.

  • PDF

시스템 온 칩 내 eDRAM을 사용한 Tightly Coupled Memory의 병렬 테스트 구조 (A Parallel Test Structure for eDRAM-based Tightly Coupled Memory in SoCs)

  • 국인성;이재민
    • 한국정보전자통신기술학회논문지
    • /
    • Vol. 4, No. 3
    • /
    • pp.209-216
    • /
    • 2011
  • Designs that embed TCM (Tightly Coupled Memory) for high-speed operation of on-chip memory have recently increased significantly. This paper proposes a new parallel memory test structure for eDRAM-based TCM in a system-on-chip. In the proposed scheme, the memory under test is reconfigured into a parallel structure in test mode, and together with a boundary-scan chain the testability of the embedded memory is greatly improved. Because the memory under the parallel test scheme is organized so that each memory element performs a specific function, it can be partitioned into modules and tested module by module, and dynamic test evaluation is possible based on the input/output data. The validity of the proposed scheme is verified through simulation.

A Container Orchestration System for Process Workloads

  • Jong-Sub Lee;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 15, No. 4
    • /
    • pp.270-278
    • /
    • 2023
  • We propose a container orchestration system for process workloads that combines big data and machine learning technologies to integrate enterprise process-centric workloads. The proposed system analyzes big data generated from industrial automation to identify hidden patterns and build a machine-learning prediction model. For each machine-learning case, training data is loaded into a data store and preprocessed for model training. In the next step, an appropriate model is selected and applied using the training data, and the model is then evaluated with test data. This step, called model construction, can be performed in a deployment framework. Additionally, a visual hierarchy is constructed to display prediction results and facilitate big data analysis. To implement the parallel computation of PCA in the proposed system, the required cluster was built by creating multiple virtual machines in a big data cluster, and this setup was used for evaluation and analysis. The proposed system is modeled as layers of individual components that can be connected together; the advantage of such a system is that components can be added, replaced, or reused without affecting the rest of the system.
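
The PCA workload mentioned in the abstract can be sketched in a few lines. The snippet below is a hypothetical single-node reference using NumPy only (not the authors' clustered deployment): it centers the data, takes an SVD, and projects onto the leading principal axes; in the proposed system the same computation is spread across virtual machines.

```python
import numpy as np

def pca(X, n_components=2):
    """Plain PCA via SVD: center the data, then project onto the
    leading right singular vectors (the principal axes)."""
    Xc = X - X.mean(axis=0)                  # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # principal axes
    scores = Xc @ components.T               # data projected onto the axes
    explained = S[:n_components] ** 2 / (len(X) - 1)  # variance per axis
    return scores, components, explained

# Toy data standing in for the preprocessed training set in the data store.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
scores, components, explained = pca(X, n_components=3)
print(scores.shape, components.shape, explained)
```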

MB-OFDM UWB 통신 시스템을 위한 고속 2-Parallel Radix-$2^4$ FFT 프로세서의 설계 (A High-Speed 2-Parallel Radix-$2^4$ FFT Processor for MB-OFDM UWB Systems)

  • 이지성;이한호
    • 대한전자공학회:학술대회논문집
    • /
    • 대한전자공학회 2006년도 하계종합학술대회
    • /
    • pp.533-534
    • /
    • 2006
  • This paper presents the architecture design of a high-speed, low-complexity 128-point radix-$2^4$ FFT processor for ultra-wideband (UWB) systems. The proposed architecture provides a high throughput rate and low hardware complexity by using a 2-parallel data-path scheme and a single-path delay-feedback (SDF) structure. The paper presents the key ideas applied to the design, in particular those for achieving a high throughput rate and reducing hardware complexity. The proposed FFT processor has been designed and implemented in a 0.18-μm CMOS technology with a supply voltage of 1.8 V. Its throughput rate is up to 1 Gsample/s while requiring much smaller hardware complexity.

  • PDF
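
As a purely functional reference for the 128-point transform discussed above (not the 2-parallel SDF hardware pipeline), a plain radix-2 decimation-in-time FFT can be written in a few lines and checked against NumPy; the radix-$2^4$ factorization used in the paper computes the same result while turning many twiddle factors into trivial multiplications, which is what lowers the hardware complexity.

```python
import numpy as np

def fft_radix2(x):
    """Recursive radix-2 DIT FFT; the length of x must be a power of two
    (128 for the MB-OFDM UWB case)."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x
    even = fft_radix2(x[0::2])               # FFT of even-indexed samples
    odd = fft_radix2(x[1::2])                # FFT of odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

x = np.random.default_rng(1).normal(size=128) + 0j
print(np.allclose(fft_radix2(x), np.fft.fft(x)))   # expected: True
```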

대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현 (Design of a Single-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation)

  • 김종문;송윤선;김명원
    • 전자공학회논문지B
    • /
    • Vol. 33B, No. 2
    • /
    • pp.149-158
    • /
    • 1996
  • In this paper, we describe the design and implementation of a digital neural chip and a parallel neural machine for simulating large-scale neural networks. The chip is a single-chip multiprocessor containing four digital neural processors (DNP-II) of the same architecture. Each DNP-II has its own program memory and data memory, and the chip operates as an MIMD (multiple-instruction, multiple-data) parallel processor. The DNP-II has an instruction set tailored to neural computation, which can be used to simulate various neural network models effectively, including on-chip learning. The DNP-II provides four-way data-driven communication, supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board, and an interface board. Each processor board consists of an 8*8 array of DNP-IIs (equivalently, 2*2 neural chips). The processor boards can be configured into topologies including a linear array, a 2-D mesh, and a 2-D torus. This flexibility supports efficient mapping of neural network models onto the parallel structure. The neural system achieves a maximum performance of 40 GCPS (giga connections per second) with 16 processor boards.

  • PDF

Novel Parallel Approach for SIFT Algorithm Implementation

  • Le, Tran Su;Lee, Jong-Soo
    • Journal of information and communication convergence engineering
    • /
    • Vol. 11, No. 4
    • /
    • pp.298-306
    • /
    • 2013
  • The scale invariant feature transform (SIFT) is an effective algorithm used in object recognition, panorama stitching, and image matching. However, due to its complexity, real-time processing is difficult to achieve with current software approaches. The increasing availability of parallel computers makes parallelizing these tasks an attractive approach. This paper proposes a novel parallel approach for SIFT algorithm implementation using a block filtering technique in a Gaussian convolution process on the SIMD Pixel Processor. This implementation fully exposes the available parallelism of the SIFT algorithm process and exploits the processing and input/output capabilities of the processor, which results in a system that can perform real-time image and video compression. We apply this implementation to images and measure the effectiveness of such an approach. Experimental simulation results indicate that the proposed method is capable of real-time applications, and our parallel approach delivers outstanding processing performance.
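
The block-filtering idea in the Gaussian convolution stage can be sketched in plain Python. The snippet below is a simplified, hypothetical illustration (SciPy's gaussian_filter stands in for the SIMD pixel-processor kernels, and the block split is over rows only): each block is filtered independently with a halo wide enough to cover the kernel radius, so the stitched result matches whole-image filtering and the blocks could be handed to separate processing elements.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blocked_gaussian(image, sigma, n_blocks=4):
    """Filter an image in independent row blocks. Each block carries a halo
    of ceil(4*sigma) rows, which covers SciPy's default kernel radius, so
    the stitched output matches filtering the whole image at once."""
    halo = int(np.ceil(4 * sigma))
    h = image.shape[0]
    bounds = np.linspace(0, h, n_blocks + 1, dtype=int)
    out = np.empty_like(image, dtype=float)
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        lo_h, hi_h = max(0, lo - halo), min(h, hi + halo)   # block + halo rows
        filtered = gaussian_filter(image[lo_h:hi_h].astype(float), sigma)
        out[lo:hi] = filtered[lo - lo_h : lo - lo_h + (hi - lo)]  # drop the halo
    return out

img = np.random.default_rng(2).random((256, 256))
print(np.allclose(blocked_gaussian(img, sigma=1.6), gaussian_filter(img, 1.6)))
```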

예측정확도 향상 전략을 통한 예측기반 병렬 게이트수준 타이밍 시뮬레이션의 성능 개선 (Performance Improvement of Prediction-Based Parallel Gate-Level Timing Simulation Using Prediction Accuracy Enhancement Strategy)

  • 양세양
    • 정보처리학회논문지:컴퓨터 및 통신 시스템
    • /
    • Vol. 5, No. 12
    • /
    • pp.439-446
    • /
    • 2016
  • This paper proposes an efficient prediction-accuracy enhancement strategy for improving the performance of prediction-based parallel event-driven gate-level timing simulation. The proposed technique not only makes dual predictions of the input and output values of the parallel event-driven local simulations, but also allows dynamic prediction in particular situations. In dual prediction, when the first prediction turns out to be wrong, a new prediction is attempted with a second set of static prediction data; in dynamic prediction, the simulation results accumulated dynamically during the actual parallel simulation run are used as prediction data. The two proposed prediction-accuracy enhancement techniques greatly reduce the synchronization overhead and communication overhead that limit the performance gain of parallel simulation. With these two key methods, prediction-based parallel event-driven gate-level timing simulation of six designs achieved roughly a fivefold performance improvement over conventional commercial parallel multi-core simulation.
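
The dual-prediction idea can be paraphrased in a short control-flow sketch. This is a heavily simplified, hypothetical rendering (not the author's simulator): a local simulation first runs against a static input prediction; if the realized inputs differ, it retries with a dynamic prediction built from results accumulated during the run, and only when both miss does it fall back to a synchronized re-simulation, which is where the synchronization and communication overhead arises.

```python
def simulate_locally(inputs):
    """Stand-in for an event-driven local gate-level simulation."""
    return [v ^ 1 for v in inputs]          # toy logic: invert each input bit

def run_with_prediction(actual_inputs, static_prediction, history):
    """Dual prediction: try the static guess, then a dynamic guess built
    from previously observed inputs, then fall back to resimulation."""
    dynamic_prediction = history[-1] if history else None

    for guess in (static_prediction, dynamic_prediction):
        if guess is not None and guess == actual_inputs:
            # Prediction hit: the speculative local result is reused and no
            # synchronization with the other partitions is needed.
            return simulate_locally(guess), "prediction hit"

    # Both predictions missed: resimulate with the actual inputs
    # (the synchronized, communication-heavy path).
    return simulate_locally(actual_inputs), "resimulated"

history = [[0, 1, 1, 0]]                    # inputs observed earlier in the run
result, path = run_with_prediction([0, 1, 1, 0], [1, 1, 0, 0], history)
print(result, path)                         # the dynamic prediction hits here
```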