• Title/Summary/Keyword: Parallel processing model

Efficient Collaboration Method Between CPU and GPU for Generating All Possible Cases in Combination (조합에서 모든 경우의 수를 만들기 위한 CPU와 GPU의 효율적 협업 방법)

  • Son, Ki-Bong;Son, Min-Young;Kim, Young-Hak
    • KIPS Transactions on Computer and Communication Systems / v.7 no.9 / pp.219-226 / 2018
  • One systematic way to generate all possible cases of a combination is to construct a combination tree, whose time complexity is $O(2^n)$. Combination trees are used for various purposes, such as the graph homogeneity problem and as the initial model for calculating frequent item sets. However, algorithms that must search all cases of a combination are difficult to use in practice because of this high time complexity. Nevertheless, as data volumes grow and more studies seek to exploit that data, the number of problems that require searching all cases is increasing. Recently, as GPU environments have become popular and easily accessible, various attempts have been made to reduce execution time by parallelizing algorithms that have high time complexity in a serial environment. Because the method of generating all cases of a combination is sequential and the sizes of its sub-tasks are skewed, it is not well suited to parallel implementation; the efficiency of a parallel algorithm is maximized when all threads have tasks of similar size. In this paper, we propose a method for efficient collaboration between the CPU and GPU to parallelize the problem of generating all cases. To evaluate the performance of the proposed algorithm, we analyze its time complexity theoretically and compare its measured execution time with that of other algorithms in CPU and GPU environments. Experimental results show that the proposed CPU and GPU collaboration algorithm balances the execution time of the CPU and GPU better than the previous algorithms, and that the execution time improves remarkably as the number of elements increases.
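
The skew in sub-task sizes mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' CPU/GPU algorithm: `subtree_sizes`, `balanced_ranges`, and `subsets_in_range` are hypothetical helper names, and splitting the bitmask range is a generic, swapped-in way of producing equal-sized tasks, shown only for contrast.

```python
def subtree_sizes(n):
    """Sizes of the n top-level subtrees of a combination tree over n items.

    The subtree whose smallest chosen index is i holds 2**(n - 1 - i)
    subsets, so the first subtree holds more than all the others combined.
    """
    return [2 ** (n - 1 - i) for i in range(n)]


def balanced_ranges(n, workers):
    """Split the bitmask range [0, 2**n) into near-equal contiguous chunks."""
    total = 2 ** n
    base, extra = divmod(total, workers)
    ranges = []
    start = 0
    for w in range(workers):
        # The first `extra` workers take one additional mask each.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def subsets_in_range(items, start, stop):
    """Decode every bitmask in [start, stop) into the subset it represents."""
    n = len(items)
    return [
        tuple(items[i] for i in range(n) if (mask >> i) & 1)
        for mask in range(start, stop)
    ]


if __name__ == "__main__":
    print(subtree_sizes(4))       # skewed task sizes per top-level subtree
    print(balanced_ranges(4, 3))  # near-equal task sizes per worker
```

Assigning one top-level subtree per thread gives the first thread about half of all the work, whereas the range split keeps chunk sizes within one mask of each other.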

A Design of the OOPP(Optimized Online Portfolio Platform) using Enterprise Competency Information (기업 직무 정보를 활용한 OOPP(Optimized Online Portfolio Platform)설계)

  • Jung, Bogeun;Park, Jinuk;Lee, ByungKwan
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.493-506 / 2018
  • This paper proposes the design of the OOPP (Optimized Online Portfolio Platform), which lets job seekers search for the job competencies necessary for employment and write and manage portfolios online efficiently. The OOPP consists of three modules. First, the JDCM (Job Data Collection Module) stores the help-wanted advertisements of job information sites in a spreadsheet. Second, the CSM (Competency Statistical Model) classifies the core competencies for each job by text-mining the collected help-wanted ads. Third, the OBBM (Optimize Browser Behavior Module) lets users look up data rapidly by improving the processing speed of the browser. The OBBM in turn consists of the PSES (Parallel Search Engine Sub-Module), which optimizes the computation of the search engine, and the OILS (Optimized Image Loading Sub-Module), which optimizes the loading of images, text, etc. The performance analysis of the CSM shows little difference between the CSM results and the actual advertisements, with a data accuracy of 99.4~100%. When the browser is optimized using the OBBM, working time is reduced by about 68.37%. Therefore, the OOPP analyzes the help-wanted ads of job information sites accurately and lets users look up the analyzed results in the web page rapidly.

Design of a CAM-Type Traffic Policing Controller with minimum additional delay (시간지연을 최소화한 CAM형 트래픽 폴리싱 장치 설계)

  • 정윤찬;홍영진
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.4B / pp.604-612 / 2000
  • To satisfy the desired QoS level associated with each existing connection, ATM networks require traffic policing for the duration of a connection. Users who respect the contract should experience transparent traffic policing without any interruption, whereas contract violations should be detected and handled immediately. We therefore propose a CAM-type policing controller that minimizes the additional delay imposed on user cell streams. The proposed policing scheme controls policing actions, including traffic shaping, by suitably spacing cells on each virtual circuit. This policing action is based on parallel processing of the multiple cell streams that arrive on ATM multiplexed virtual circuits. We have developed an analytical model of the proposed policing scheme to examine the amount of cell loss and delay, which depends on the traffic load, the size of the policing buffers, and the minimum cell spacing time.
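
How cell loss and delay depend on buffer size and minimum spacing can be sketched for a single virtual circuit. This is a minimal software illustration under stated assumptions, not the CAM-based hardware design or the analytical model of the paper: the function name `space_cells` and its parameters are hypothetical, arrival times are assumed sorted, and a cell that can leave at its arrival instant is assumed not to occupy the buffer.

```python
def space_cells(arrivals, min_spacing, buffer_size):
    """Enforce a minimum spacing between cells of one virtual circuit.

    arrivals    -- sorted arrival times of cells
    min_spacing -- minimum allowed gap between consecutive departures
    buffer_size -- maximum number of cells allowed to wait at once
    Returns (departure times, per-cell delays, number of dropped cells).
    """
    departures = []
    delays = []
    dropped = 0
    for t in arrivals:
        # Cells accepted earlier that have not yet departed at time t.
        waiting = sum(1 for d in departures if d > t)
        if waiting >= buffer_size:
            dropped += 1  # policing buffer is full: the cell is lost
            continue
        # Earliest instant that respects the spacing after the previous cell.
        earliest = departures[-1] + min_spacing if departures else t
        depart = max(t, earliest)
        departures.append(depart)
        delays.append(depart - t)
    return departures, delays, dropped


if __name__ == "__main__":
    burst = [0, 0, 0, 10]
    print(space_cells(burst, min_spacing=2, buffer_size=2))
    print(space_cells(burst, min_spacing=2, buffer_size=1))
```

A conforming stream, whose cells already arrive at least `min_spacing` apart, passes through with zero added delay, which matches the transparency requirement stated in the abstract.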

Design and Implementation of KDSM(KAIST Distributed Shared Memory) System (KDSM(KAIST Distributed Shared Memory) 시스템의 설계 및 구현)

  • Lee, Sang-Kwon;Yun, Hee-Chul;Lee, Joon-Won;Maeng, Seung-Ryoul
    • Journal of KIISE:Computer Systems and Theory / v.29 no.5 / pp.257-264 / 2002
  • In this paper, we give a detailed description of the KDSM (KAIST Distributed Shared Memory) system. KDSM is implemented as a user-level library running on Linux 2.2.13 and uses TCP/IP for communication. KDSM uses a page-based invalidation protocol and a multiple-writer protocol, and supports the HLRC (Home-based Lazy Release Consistency) memory consistency model. To evaluate the performance of KDSM, we executed 4 scientific applications and compared the results with JIAJIA. The results showed that the performance of KDSM is almost equal to that of JIAJIA for 2 applications and better than that of JIAJIA for the other 2 applications.

Intelligent Tuning Of a PID Controller Using Immune Algorithm (면역 알고리즘을 이용한 PID 제어기의 지능 튜닝)

  • Kim, Dong-Hwa
    • The Transactions of the Korean Institute of Electrical Engineers D / v.51 no.1 / pp.8-17 / 2002
  • This paper suggests that the immune algorithm can be used effectively for tuning a PID controller. The artificial immune network always presents a new parallel, decentralized processing mechanism for various situations, since different species of antibodies/B-cells communicate with each other through the stimulation and suppression chains among antibodies, which form a large-scale network. In addition, the structure of the network is not fixed but varies continuously. That is, the artificial immune network flexibly self-organizes according to dynamic changes in the external environment (the meta-dynamics function). However, until now, models based on the conventional crisp approach have been used to describe the dynamic relationship between antibody and antigen, so their results respond less flexibly to external behavior. On the other hand, a number of tuning techniques have been considered for PID controllers. Less commonly, fuzzy logic, neural networks, or combinations of the two are applied. However, the latter have not yet been applied in practice, while the former require considerable experience and skill during the tuning procedure. In addition, tuning performance cannot be guaranteed for a plant with non-linear characteristics or many kinds of disturbances. For these reasons, this paper uses an immune algorithm so that a PID controller can be controlled more adaptively against external conditions, including noise or disturbance of the plant. The P, I, and D parameters, encoded in antibodies, are randomly allocated during the selection process to obtain the optimal gains required for the plant. The results show that the artificial immune network can be used effectively for tuning, since it fits the modes or parameters of the PID controller better than conventional tuning methods.
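
The idea of encoding P, I, and D gains in antibodies and improving them through selection can be sketched as a generic clonal-selection loop. This is an illustrative assumption, not the algorithm of the paper: the first-order plant `dy/dt = -y + u`, the cost function, the gain ranges, and the names `step_cost` and `tune` are all hypothetical, and the stimulation and suppression network described in the abstract is not modeled.

```python
import random


def step_cost(gains, steps=200, dt=0.05):
    """Integrated absolute error of a unit-step response.

    Assumed plant (hypothetical): dy/dt = -y + u, simulated with Euler steps.
    """
    kp, ki, kd = gains
    y, integral, prev_err, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        err = 1.0 - y
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv  # PID control law
        y += (-y + u) * dt
        prev_err = err
        cost += abs(err) * dt
        if abs(y) > 1e6:  # diverging loop: treat as worst possible antibody
            return float("inf")
    return cost


def tune(generations=30, pop_size=20, keep=5, clones=2, seed=0):
    """Clonal-selection style search over (kp, ki, kd) antibodies."""
    rng = random.Random(seed)

    def random_antibody():
        return (rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 0.5))

    population = [random_antibody() for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        population.sort(key=step_cost)
        survivors = population[:keep]  # highest-affinity antibodies
        history.append(step_cost(survivors[0]))
        # Clone survivors; better-ranked antibodies mutate less.
        mutants = [
            tuple(max(0.0, g + rng.gauss(0, 0.1 * (rank + 1))) for g in ab)
            for rank, ab in enumerate(survivors)
            for _ in range(clones)
        ]
        # Fresh random antibodies keep the population diverse.
        newcomers = [random_antibody()
                     for _ in range(pop_size - keep - len(mutants))]
        population = survivors + mutants + newcomers
    return min(population, key=step_cost), history


if __name__ == "__main__":
    best, history = tune()
    print("best gains:", best)
    print("cost: first generation", history[0], "final", step_cost(best))
```

Because the survivors are carried over unchanged, the best cost never increases from one generation to the next.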

A Study on the Full-HD HEVC Encoder IP Design (고해상도 비디오 인코더 IP 설계에 대한 연구)

  • Lee, Sukho;Cho, Seunghyun;Kim, Hyunmi;Lee, Jehyun
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.12 / pp.167-173 / 2015
  • This paper presents a study on the design of a Full-HD HEVC (High Efficiency Video Coding) encoder IP (Intellectual Property). The designed IP targets HEVC main profile 4.1 and encodes full high definition video at 60 fps. Before the hardware and software design, an overall reference model was developed in C, and we proposed a parallel processing architecture for low power consumption. We also wrote the firmware and driver programs for the IP. A platform for verifying the developed IP was built, and we verified its function and performance for various pictures under several encoding conditions by implementing the designed IP on an FPGA board. Compared to HM-13.0, a bit-rate decrease of about 35% at the same PSNR was achieved, and power consumption decreased by about 25% in low-power mode.

A Study on the Design of a RISC core with DSP Support (DSP기능을 강화한 RISC 프로세서 core의 ASIC 설계 연구)

  • 김문경;정우경;이용석;이광엽
    • The Journal of Korean Institute of Communications and Information Sciences / v.26 no.11C / pp.148-156 / 2001
  • This paper proposes an embedded application-specific microprocessor (YS-RDSP) whose structure includes an additional DSP processor on chip. The YS-RDSP can execute a maximum of four instructions in parallel. To reduce program size, the YS-RDSP supports both 16-bit and 32-bit instruction lengths. The YS-RDSP provides programmability, controllability, and DSP processing ability, and includes eight kilobytes of on-chip ROM and eight kilobytes of RAM. The on-chip system controller provides three power-down modes for low-power operation, and the SLEEP instruction changes the operating status of the CPU core and peripherals. The YS-RDSP processor was implemented in Verilog HDL using a top-down methodology, and it was improved and verified with a cycle-based simulator written in C. The verified model was synthesized with a 0.7 µm, 3.3 V CMOS standard cell library, and the layout, produced with automatic P&R software, measured 10.7 mm × 8.4 mm.

The Study of fire Driven flow and Smoke Exhaust Efficiency for PSD Installation Subway Station (PSD 설치역사의 화재유동 및 배연 효율 연구)

  • Jang, Yong-Jun;Lee, Chang-Hyun;Kim, Hag-Beom;Kim, Jin-Ho
    • Proceedings of the KSR Conference / 2009.05a / pp.1054-1061 / 2009
  • This research focuses on fire-driven flow behavior and smoke exhaust efficiency as they depend on the presence of platform screen doors (PSDs), which are being installed domestically and overseas. Jung-ang-ro station of the Dae-gu subway was chosen as the simulation model, and the fire-driven flow was analyzed using FDS as the flow analysis code. Because modeling the entire station increases the number of grid cells and therefore the required calculation time, the simulation was run with a parallel processing technique. Fire scenarios were composed and analyzed case by case to compare how fire-driven flow and smoke exhaust efficiency change with the presence of PSDs. For the fire scale, a fire strength of 10 MW was studied with reference to NFPA-130. The calculation results were analyzed with a focus on passenger safety, also with reference to NFPA-130.

Real-time Eye Contact System Using a Kinect Depth Camera for Realistic Telepresence (Kinect 깊이 카메라를 이용한 실감 원격 영상회의의 시선 맞춤 시스템)

  • Lee, Sang-Beom;Ho, Yo-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.37 no.4C / pp.277-282 / 2012
  • In this paper, we present a real-time eye contact system for realistic telepresence using a Kinect depth camera. To generate the eye contact image, we capture a pair of color and depth videos. Then, the single foreground user is separated from the background. Since the raw depth data includes several types of noise, we apply a joint bilateral filtering method. We then apply a discontinuity-adaptive depth filter to the filtered depth map to reduce the disocclusion area. From the color image and the preprocessed depth map, we construct a user mesh model at the virtual viewpoint. The entire system is implemented with GPU-based parallel programming for real-time processing. Experimental results show that the proposed system is efficient in realizing eye contact and provides realistic telepresence.
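
The joint bilateral filtering step named in the abstract above can be sketched as a plain CPU reference. This is a textbook form of the filter under stated assumptions, not the paper's GPU implementation: the function name, the parameter defaults, and the use of a single-channel guide image are hypothetical choices made for brevity.

```python
import math


def joint_bilateral_filter(depth, guide, radius=1, sigma_s=1.0, sigma_r=10.0):
    """Smooth `depth` with weights taken from a guide (e.g. gray color) image.

    depth, guide -- 2D lists of equal shape
    sigma_s      -- spatial falloff; sigma_r -- guide-intensity falloff
    Pixels are averaged only with neighbors that look similar in the guide,
    so depth noise is reduced while edges seen in the guide are preserved.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            weight_sum = 0.0
            value_sum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue  # skip neighbors outside the image
                    spatial = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    diff = guide[ny][nx] - guide[y][x]
                    similarity = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                    weight = spatial * similarity
                    weight_sum += weight
                    value_sum += weight * depth[ny][nx]
            # weight_sum >= 1 because the center pixel always has weight 1.
            out[y][x] = value_sum / weight_sum
    return out


if __name__ == "__main__":
    depth = [[1.0, 1.0, 4.0, 1.0, 9.0, 9.0]]  # spike at index 2 is noise
    guide = [[0, 0, 0, 0, 255, 255]]          # guide edge between 3 and 4
    print(joint_bilateral_filter(depth, guide))
```

Each output pixel depends only on a small input neighborhood, so the per-pixel loop body maps naturally onto one GPU thread per pixel, which is consistent with the GPU-based implementation the abstract mentions.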

Two-Level Hierarchical Production Planning for a Semiconductor Probing Facility (반도체 프로브 공정에서의 2단계 계층적 생산 계획 방법 연구)

  • Bang, June-Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.4 / pp.159-167 / 2015
  • We consider a wafer lot transfer/release planning problem from semiconductor wafer fabrication facilities to probing facilities, with the objective of minimizing the deviation of workload and the total tardiness of customers' orders. Because of the complexity of the problem, we propose a two-level hierarchical production planning method for the lot transfer problem between two parallel facilities to obtain an executable production plan and schedule. At the higher level, the solution of a reduced mathematical model, obtained with a Lagrangian relaxation method, serves as a coarse but good lot transfer/release plan with a daily time bucket. At the lower level, discrete-event simulation with a priority-rule-based scheduling method produces detailed lot processing schedules at the machines, and the lot transfer/release plan is evaluated. To evaluate the performance of the suggested planning method, we provide computational tests on problems obtained from a set of real data and on additional test scenarios in which several levels of variation are added to the customers' demands. The results show that the proposed lot transfer/planning architecture generates executable plans within a computational time acceptable in real factories, and that the total tardiness of orders can be reduced more effectively by using more sophisticated lot transfer methods, such as considering, within the mathematical formulation, the due dates and ready times of lots associated with the same order. The proposed method may also be applied to job assignment problems in back-end processes, such as the assignment of chips to be tested from assembly facilities to final test facilities. In addition, the proposed method can be improved by considering sequence-dependent setups in the probing facilities.