• Title/Summary/Keyword: Parallel Data Communication

Search Result 342, Processing Time 0.031 seconds

Spatial Computation on Spark Using GPGPU (GPGPU를 활용한 스파크 기반 공간 연산)

  • Son, Chanseung;Kim, Daehee;Park, Neungsoo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.8
    • /
    • pp.181-188
    • /
    • 2016
  • Recently, as the amount of spatial information increases, an interest in the study of spatial information processing has been increased. Spatial database systems extended from the traditional relational database systems are difficult to handle large data sets because of the scalability. SpatialHadoop extended from Hadoop system has a low performance, because spatial computations in SpationHadoop require a lot of write operations of intermediate results to the disk, resulting in the performance degradation. In this paper, Spatial Computation Spark(SC-Spark) is proposed, which is an in-memory based distributed processing framework. SC-Spark is extended from Spark in order to efficiently perform the spatial operation for large-scale data. In addition, SC-Spark based on the GPGPU is developed to improve the performance of the SC-Spark. SC-Spark uses the advantage of the Spark holding intermediate results in the memory. And GPGPU-based SC-Spark can perform spatial operations in parallel using a plurality of processing elements of an GPU. To verify the proposed work, experiments on a single AMD system were performed using SC-Spark and GPGPU-based SC-Spark for Point-in-Polygon and spatial join operation. The experimental results showed that the performance of SC-Spark and GPGPU-based SC-Spark were up-to 8 times faster than SpatialHadoop.

[ $8{\sim}10.9$ ]-GHz-Band New LC Oscillator with Low Phase-Noise and Wide Tuning Range for SONET communication (SONET 통신 시스템을 위한 $8{\sim}10.9$ GHz 저 위상 잡음과 넓은 튜닝 범위를 갖는 새로운 구조의 LC VCO 설계)

  • Kim, Seung-Hoon;Cho, Hyo-Moon;Cho, Sang-Bock
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.1
    • /
    • pp.50-55
    • /
    • 2008
  • In this paper, New LC VCO with $8{\sim}10.9$ GHz Band has been designed using commercial $0.35-{\mu}m$ CMOS technology. This proposed circuit is consisted of the parallel construction of the typical NMOS and PMOS cross-coupled pair which is based on the LC tank, MOS cross-coupled pair which has same tail current of complementary NMOS and PMOS, and output buffer. The designed LC VCO, which is according to proposed structure in this paper, takes a 29% improvement of the wide tuning range as 8 GHz to 10.9 GHz, and a 6.48mW of low power dissipation. Its core size is $270{\mu}m{\times}340{\mu}m$ and its phase noise is as -117dBc Hz and -137dBc Hz at 1-MHz and 10-MHz offset, respectively. FOM of the new proposed LC VCO gets -189dBc/Hz at a 1-MHz offset from a 10GHz center frequency. This design is very useful for the 10Gb/s clock generator and data recovery integrated circuit(IC) and SONET communication applications.

Immersive Smart Balance Board with Multiple Feedback (다중 피드백을 지원하는 몰입형 스마트 밸런스 보드)

  • Seung-Yong Lee;Seonho Lee;Junesung Park;Min-Chul Shin;Seung-Hyun Yoon
    • Journal of the Korea Computer Graphics Society
    • /
    • v.30 no.3
    • /
    • pp.171-178
    • /
    • 2024
  • Exercises using a Balance Board (BB) are effective in developing balance, strengthening core muscles, and improving physical fitness and concentration. In particular, the Smart Balance Board (SBB), which integrates with various digital content, provides appropriate feedback compared to traditional balance boards, maximizing the effectiveness of the exercise. However, most systems only offer visual and auditory feedback, failing to evaluate the impact on user engagement, interest, and the accuracy of exercise postures. This study proposes an Immersive Smart Balance Board (I-SBB) that utilizes multiple sensors to enable training with various feedback mechanisms and precise postures. The proposed system, based on Arduino, consists of a gyro sensor for measuring the board's posture, a communication module for wired/wireless communication, an infrared sensor to guide the user's foot placement, and a vibration motor for tactile feedback. The board's posture measurements are smoothly corrected using a Kalman Filter, and the multi-sensor data is processed in real-time using FreeRTOS. The proposed I-SBB is shown to be effective in enhancing user concentration and engagement, as well as generating interest, by integrating with diverse content.

Design and Implementation of An I/O System for Irregular Application under Parallel System Environments (병렬 시스템 환경하에서 비정형 응용 프로그램을 위한 입출력 시스템의 설계 및 구현)

  • No, Jae-Chun;Park, Seong-Sun;;Gwon, O-Yeong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.11
    • /
    • pp.1318-1332
    • /
    • 1999
  • 본 논문에서는 입출력 응용을 위해 collective I/O 기법을 기반으로 한 실행시간 시스템의 설계, 구현 그리고 그 성능평가를 기술한다. 여기서는 모든 프로세서가 동시에 I/O 요구에 따라 스케쥴링하며 I/O를 수행하는 collective I/O 방안과 프로세서들이 여러 그룹으로 묶이어, 다음 그룹이 데이터를 재배열하는 통신을 수행하는 동안 오직 한 그룹만이 동시에 I/O를 수행하는 pipelined collective I/O 등의 두 가지 설계방안을 살펴본다. Pipelined collective I/O의 전체 과정은 I/O 노드 충돌을 동적으로 줄이기 위해 파이프라인된다. 이상의 설계 부분에서는 동적으로 충돌 관리를 위한 지원을 제공한다. 본 논문에서는 다른 노드의 메모리 영역에 이미 존재하는 데이터를 재 사용하여 I/O 비용을 줄이기 위해 collective I/O 방안에서의 소프트웨어 캐슁 방안과 두 가지 모형에서의 chunking과 온라인 압축방안을 기술한다. 그리고 이상에서 기술한 방안들이 입출력을 위해 높은 성능을 보임을 기술하는데, 이 성능결과는 Intel Paragon과 ASCI/Red teraflops 기계 상에서 실험한 것이다. 그 결과 응용 레벨에서의 bandwidth는 peak point가 55%까지 측정되었다.Abstract In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications. We present two designs, namely, "Collective I/O" and "Pipelined Collective I/O". In the first scheme, all processors participate in the I/O simultaneously, making scheduling of I/O requests simpler but creating a possibility of contention at the I/O nodes. In the second approach, processors are grouped into several groups, so that only one group performs I/O simultaneously, while the next group performs communication to rearrange data, and this entire process is pipelined to reduce I/O node contention dynamically. In other words, the design provides support for dynamic contention management. Then we present a software caching method using collective I/O to reduce I/O cost by reusing data already present in the memory of other nodes. Finally, chunking and on-line compression mechanisms are included in both models. We demonstrate that we can obtain significantly high-performance for I/O above what has been possible so far. The performance results are presented on an Intel Paragon and on the ASCI/Red teraflops machine. Application level I/O bandwidth up to 55% of the peak is observed.he peak is observed.

Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors (고성능, 저전력 임베디드 비디오 프로세서를 위한 YUV 인식 명령어의 시뮬레이션)

  • Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.13 no.5
    • /
    • pp.252-259
    • /
    • 2007
  • With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. In this regard, this paper introduces YUV-aware instructions that enhance the performance and efficiency in the processing of color image and video. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) depend solely on generic subword parallelism whereas the proposed YUV-aware instructions support parallel operations on two-packed 16-bit YUV (6-bit Y, 5-bits U, V) values in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the ability to reduce data format size reduces system cost. Experiment results on a representative dynamically scheduled embedded superscalar processor show that YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar performance. This is in contrast to MMX (a representative Intel#s multimedia extension), which achieves a speedup of only 2.1x over the same baseline superscalar processor. In addition, YUV-aware instructions outperform MMX instructions in energy reduction (75.8% reduction with YUV-aware instructions, but only 54.8% reduction with MMX instructions over the baseline).

Design and Implementation of 8b/10b Encoder/Decoder for Serial ATA (직렬 ATA용 8b/10b 인코더와 디코더 설계 및 구현)

  • Heo Jung-Hwa;Park Nho-Kyung;Park Sang-Bong
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1A
    • /
    • pp.93-98
    • /
    • 2004
  • Serial ATA interface Is inexpensive comparatively and performance is superior. So it is suitable technology in demand that now require data transmission and throughput of high speed. This paper describes a design and implementation of Serial ATA Link layer about error detection and 8b/10b encoder/decoder for DC balance in frequency 150MHz. The 8b/10b Encoder is partitioned into a 5b/6b plus a 3b/4b coder. The logical model of the block is described by using Verilog HDL at register transistor level and the verified HDL is synthesized using standard cell libraries. And it is fabricated with $0.35{\mu}m$ Standard CMOS Cell library and the chip size is about $1500{\mu}m\;*\;1500{\mu}m$. The function of this chip has been verified and tested using testboard with FPGA equipment and IDEC ATS2 test equipment. It is used to frequency of 100MHz in verification processes and supply voltage 3.3V. The result of testing is well on the system clock 100MHz. The designed and verified each blocks may be used IP in the field of high speed serial data communication.

Design of Binary Constant Envelope System using the Pre-Coding Scheme in the Multi-User CDMA Communication System (다중 사용자 CDMA 통신 시스템에서 프리코딩 기법을 사용한 2진 정진폭 시스템 설계)

  • 김상우;유흥균;정순기;이상태
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.15 no.5
    • /
    • pp.486-492
    • /
    • 2004
  • In this paper, we newly propose the binary CA-CDMA(constant amplitude CDMA) system using pre-coding method to solve the high PAPR problem caused by multi-user signal transmission in the CDMA system. 4-user CA-CDMA, the basis of proposed binary CA-CDMA system, makes binary output signal for 4 input users. It produces the output of binary(${\pm}$2) amplitude by using a parity signal resulting from the XOR operation of 4 users data. Another sub-channel or more bandwidth is not necessary because it is transmitted together with user data and can be easily recovered in the receiver. The extension of the number of users can be possible by the simple repetition of the basic binary 4-user CA-CDMA. For example, binary 16-user CA-CDMA is made easily by allocating the four 4-user CA-CDMA systems in parallel and leading the four outputs to the fifth 4-user CA-CDMA system as input, because the output signal of each 4-user CA-CDMA is also binary. By the same extension procedure, binary 64 and 256-user CA-CDMA systems can be made with the constant amplitude. As a result, the code rate of this proposed CA-CDMA system is just 1 and binary CA-CDMA does not change the transmission rate with the constant output signal(PAPR = 0 ㏈). Therefore, the power efficiency of the HPA can be maximized without the nonlinear distortion. From the simulation results, it is verified that the conventional CDMA system has multi-level output signal, but the proposed binary CA-CDMA system always produces binary output. And it is also found that the BER of conventional CDMA system is increased by nonlinear HPA, but the BER of proposed binary CA-CDMA system is not changed.

A Study on Motion Estimation Encoder Supporting Variable Block Size for H.264/AVC (H.264/AVC용 가변 블록 크기를 지원하는 움직임 추정 부호기의 연구)

  • Kim, Won-Sam;Sohn, Seung-Il
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.12 no.10
    • /
    • pp.1845-1852
    • /
    • 2008
  • The key elements of inter prediction are motion estimation(ME) and motion compensation(MC). Motion estimation is to find the optimum motion vectors, not only by using a distance criteria like the SAD, but also by taking into account the resulting number of 비트s in the 비트 stream. Motion compensation is compensate for movement of blocks of current frame. Inter-prediction Encoding is always the main bottleneck in high-quality streaming applications. Therefore, in real-time streaming applications, dedicated hardware for executing Inter-prediction is required. In this paper, we studied a motion estimator(ME) for H.264/AVC. The designed motion estimator is based on 2-D systolic array and it connects processing elements for fast SAD(Sum of Absolute Difference) calculation in parallel. By providing different path for the upper and lower lesion of each reference data and adjusting the input sequence, consecutive calculation for motion estimation is executed without pipeline stall. With data reuse technique, it reduces memory access, and there is no extra delay for finding optimal partitions and motion vectors. The motion estimator supports variable-block size and takes 328 cycles for macro-block calculation. The proposed architecture is local memory-free different from paper [6] using local memory. This motion estimation encoder can be applicable to real-time video processing.

Analysis of Distributed Computational Loads in Large-scale AC/DC Power System using Real-Time EMT Simulation (대규모 AC/DC 전력 시스템 실시간 EMP 시뮬레이션의 부하 분산 연구)

  • In Kwon, Park;Yi, Zhong Hu;Yi, Zhang;Hyun Keun, Ku;Yong Han, Kwon
    • KEPCO Journal on Electric Power and Energy
    • /
    • v.8 no.2
    • /
    • pp.159-179
    • /
    • 2022
  • Often a network becomes complex, and multiple entities would get in charge of managing part of the whole network. An example is a utility grid. While the entire grid would go under a single utility company's responsibility, the network is often split into multiple subsections. Subsequently, each subsection would be given as the responsibility area to the corresponding sub-organization in the utility company. The issue of how to make subsystems of adequate size and minimum number of interconnections between subsystems becomes more critical, especially in real-time simulations. Because the computation capability limit of a single computation unit, regardless of whether it is a high-speed conventional CPU core or an FPGA computational engine, it comes with a maximum limit that can be completed within a given amount of execution time. The issue becomes worsened in real time simulation, in which the computation needs to be in precise synchronization with the real-world clock. When the subject of the computation allows for a longer execution time, i.e., a larger time step size, a larger portion of the network can be put on a computation unit. This translates into a larger margin of the difference between the worst and the best. In other words, even though the worst (or the largest) computational burden is orders of magnitude larger than the best (or the smallest) computational burden, all the necessary computation can still be completed within the given amount of time. However, the requirement of real-time makes the margin much smaller. In other words, the difference between the worst and the best should be as small as possible in order to ensure the even distribution of the computational load. Besides, data exchange/communication is essential in parallel computation, affecting the overall performance. However, the exchange of data takes time. Therefore, the corresponding consideration needs to be with the computational load distribution among multiple calculation units. If it turns out in a satisfactory way, such distribution will raise the possibility of completing the necessary computation in a given amount of time, which might come down in the level of microsecond order. This paper presents an effective way to split a given electrical network, according to multiple criteria, for the purpose of distributing the entire computational load into a set of even (or close to even) sized computational loads. Based on the proposed system splitting method, heavy computation burdens of large-scale electrical networks can be distributed to multiple calculation units, such as an RTDS real time simulator, achieving either more efficient usage of the calculation units, a reduction of the necessary size of the simulation time step, or both.

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park;Somin Park;Hyunki Hong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.7
    • /
    • pp.303-314
    • /
    • 2023
  • Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties(tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model that utilizes one-hot vectors of original and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of Rawnet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.