• Title/Summary/Keyword: science and technology data


An Implementation of Efficient Quicksort Utilizing SIMD-Based VBP Technique (SIMD 기반의 VBP 기법을 적용한 효율적인 퀵정렬의 구현)

  • Hong, Gilseok;Kim, Hongyeon;Kang, Seonghyeon;Min, Jun-Ki
    • KIISE Transactions on Computing Practices / v.23 no.8 / pp.498-503 / 2017
  • SIMD (Single Instruction Multiple Data) is a representative parallel architecture that processes multiple data elements loaded in a SIMD register with a single instruction. Quicksort is a sorting algorithm that picks an element of the array as a pivot, reorders the array so that all elements smaller than the pivot lie to its left and all elements greater than the pivot lie to its right, and then recursively performs the same task on both sublists. In this paper, we propose an efficient Quicksort algorithm that uses SIMD instructions and invokes as few conditional branches as possible, avoiding the performance degradation caused by branch misprediction in a pipelined architecture. In addition, we improve the performance of Quicksort by fetching data into a SIMD register in byte units so that VBP (Vertical Bit Parallel) and the early pruning technique can be applied.
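
The branch-avoidance idea in this abstract can be illustrated with a scalar sketch. The Python code below is not the paper's SIMD/VBP implementation; it is a plain Lomuto-style partition, chosen here as an assumption, whose loop body contains no data-dependent `if`. The function names `partition` and `quicksort` are illustrative only.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi] and return the pivot's final index.

    Every iteration performs the same two stores; the boundary index advances
    by 0 or 1 according to the comparison result instead of taking a branch.
    """
    pivot = a[hi]
    i = lo  # invariant: a[lo:i] holds elements smaller than the pivot
    for j in range(lo, hi):
        x = a[j]
        a[j] = a[i]     # move the first "not smaller" element out of the way
        a[i] = x        # place the current element at the boundary
        i += x < pivot  # a bool counts as 0 or 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place and return it."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)
    return a


print(quicksort([5, 2, 9, 1, 5, 6]))
```

In an interpreted language the benefit is only conceptual; the pattern matters on pipelined hardware, where the same comparison-as-arithmetic step can be mapped onto compare and select instructions.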

Performance Optimization in GlusterFS on SSDs (SSD 환경 아래에서 GlusterFS 성능 최적화)

  • Kim, Deoksang;Eom, Hyeonsang;Yeom, Heonyoung
    • KIISE Transactions on Computing Practices / v.22 no.2 / pp.95-100 / 2016
  • In the current era of big data and cloud computing, the amount of data in use is increasing, and various systems are being developed to process it rapidly. A distributed file system is often used to store the data, and GlusterFS is one of the popular distributed file systems. As computer technology has advanced, NAND flash SSDs (Solid State Drives), which are high-performance storage devices, have become cheaper. For this reason, datacenter operators are attempting to use SSDs in their systems, including as the storage for GlusterFS. However, since GlusterFS was designed for HDDs (Hard Disk Drives), performance is degraded by structural problems when SSDs are used instead. The problems include the use of the I/O-cache, Read-ahead, and Write-behind translators. By removing these features, which do not suit SSDs and their advantage in random I/O, we achieved performance improvements of up to 255% for 4KB random reads and up to 50% for 64KB random reads.
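
The abstract does not say how the three translators were removed. One plausible route, assumed here, is to switch them off through GlusterFS volume options; the Python sketch below only builds the corresponding `gluster volume set` command lines and does not run them. The volume name `testvol` and the helper `disable_translator_cmds` are hypothetical.

```python
# Volume options that control the three translators named in the abstract.
TRANSLATOR_OPTIONS = [
    "performance.io-cache",
    "performance.read-ahead",
    "performance.write-behind",
]


def disable_translator_cmds(volume):
    """Return one `gluster volume set <volume> <option> off` command per translator."""
    return [["gluster", "volume", "set", volume, opt, "off"]
            for opt in TRANSLATOR_OPTIONS]


# "testvol" is a placeholder volume name, not taken from the paper.
for cmd in disable_translator_cmds("testvol"):
    print(" ".join(cmd))
```

Building the commands as argument lists keeps them ready for `subprocess.run` on a host where the GlusterFS CLI is installed, should one want to apply them.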

Experimental Analysis of Recent Works on the Overlap Phase of De Novo Sequence Assembly (De novo 시퀀스 어셈블리의 overlap 단계의 최근 연구 실험 분석)

  • Lim, Jihyuk;Kim, Sun;Park, Kunsoo
    • Journal of KIISE / v.45 no.3 / pp.200-210 / 2018
  • Given a set of DNA read sequences, de novo sequence assembly reconstructs a target sequence without a reference sequence. For reconstruction, the assembly needs the overlap phase, which computes the overlaps between every pair of reads. Since the overlap phase is the most time-consuming part of the whole assembly, the performance of the assembly depends on that of the overlap phase. There have been extensive studies on the overlap phase in various fields. Among them, three state-of-the-art results for the overlap phase are Readjoiner, SOF, and the Lim-Park algorithm. Recently, the rapid development of sequencing technology has made it possible to produce large read datasets at low cost, and many platforms for generating DNA read datasets have been developed. Since the platforms produce datasets with different statistical characteristics, a performance evaluation of the overlap phase should consider datasets with these characteristics. In this paper, we compare and analyze the performance of the three algorithms on various large datasets.
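
To make the overlap phase concrete, the following Python sketch computes exact suffix-prefix overlaps between every ordered pair of reads by brute force. It is a naive quadratic baseline for illustration, not Readjoiner, SOF, or the Lim-Park algorithm; the function names and the `min_len` threshold are assumptions.

```python
def overlap_length(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    shortest = max(min_len, 1)  # a zero-length overlap is never reported
    for k in range(min(len(a), len(b)), shortest - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def all_overlaps(reads, min_len=3):
    """Map each ordered pair (i, j), i != j, to its overlap length when it is at least min_len."""
    result = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = overlap_length(a, b, min_len)
                if k:
                    result[(i, j)] = k
    return result


reads = ["ACGTAC", "TACGGA", "GGATTT"]
print(all_overlaps(reads))  # {(0, 1): 3, (1, 2): 3}
```

The pairwise loop is what makes this phase the bottleneck on large datasets, which is why the compared algorithms rely on index structures rather than direct comparison.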

Gene filtering based on fuzzy pattern matching for whole genome micro array data analysis (마이크로어레이 데이터의 게놈수준 분석을 위한 퍼지 패턴 매칭에 의한 유전자 필터링)

  • Lee, Sun-A;Lee, Keon-Myung;Lee, Seung-Joo;Kim, Wun-Jea;Kim, Yong-June;Bae, Suk-Cheol
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.4 / pp.471-475 / 2008
  • Microarray technology enables molecular-level observation and analysis of biological phenomena by measuring RNA expression profiles in cells. Microarray data analysis is applied for various purposes, such as identifying significant genes that react to drug treatment and understanding genome-scale phenomena. In drug response experiments, microarray-based gene expression analysis can provide meaningful information. It is sometimes necessary to identify the genes that show different expression behavior in the treatment group and the normal group. When the normal group shows medium-level expression, it is not easy to discriminate between the groups by comparing expression levels alone. This paper proposes a method that selects group-wise representative values for each gene and sets the value range of each group in order to filter out the genes with a specific pattern. Some experimental results are also presented.

Flash Memory File System for Mobile Devices (이동 기기를 위한 플래시 메모리 파일 시스템)

  • Bae Young Hyun;Choi Jongmoo;Lee Donghee;Noh Sam H.;Min Sang Lyul
    • Journal of KIISE:Computing Practices and Letters / v.11 no.4 / pp.368-380 / 2005
  • File systems for flash memory, which is widely used as a storage device in mobile devices, should provide not only high-performance reads and writes but also guaranteed data integrity even on a power failure. In this paper, we explain the design of a file system for flash memory that takes into account the physical characteristics of flash memory and the data layout in the file system to optimize write performance. The file system guarantees reliability against various system failures, including power failure, by applying the transaction concept to write processing. In addition, it minimizes memory usage by using a simple static mapping. We also describe the implementation of the file system and compare its performance with that of existing flash memory file systems.

Database Generation and Management System for Small-pixelized Airborne Target Recognition (미소 픽셀을 갖는 비행 객체 인식을 위한 데이터베이스 구축 및 관리시스템 연구)

  • Lee, Hoseop;Shin, Heemin;Shim, David Hyunchul;Cho, Sungwook
    • Journal of Aerospace System Engineering / v.16 no.5 / pp.70-77 / 2022
  • This paper proposes a database generation and management system for small-pixelized airborne target recognition. The proposed system has five main features: 1) image extraction from in-flight test video frames, 2) automatic image archiving, 3) image data labeling and metadata annotation, 4) virtual image data generation based on color channel conversion and seamless cloning, and 5) HOG/LBP-based augmented image data for tiny-pixelized targets. The proposed framework is built on Python with PyQt5 and has an interface that includes OpenCV. Using video files collected from flight tests, an image dataset for airborne target recognition is generated using the proposed system and the system input.

A Distributed Activity Recognition Algorithm based on the Hidden Markov Model for u-Lifecare Applications (u-라이프케어를 위한 HMM 기반의 분산 행위 인지 알고리즘)

  • Kim, Hong-Sop;Yim, Geo-Su
    • Journal of the Korea Society of Computer and Information / v.14 no.5 / pp.157-165 / 2009
  • In this paper, we propose a distributed model that recognizes the ADLs (activities of daily living) that can occur in everyday living spaces. We collect and analyze the user's environmental, location, and activity information through simple sensors attached to home devices and utensils. Based on this information, we provide lifecare services by inferring the user's life pattern and health condition. However, providing lifecare services requires well-refined activity recognition data, and without enough inferred information it is very hard to build an ADL recognition model for high-level situation awareness. Because the sequences generated by sensors are very helpful for inferring activities, we use them to analyze activity patterns and propose a distributed linear-time inference algorithm. This algorithm is suitable for recognizing activities in small areas such as a home, office, or hospital. For performance evaluation, we test it with open data from the MIT Media Lab, and the recognition results show over 75% accuracy.
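
As a rough illustration of inferring activities from a sensor sequence with a Hidden Markov Model, the Python sketch below runs the standard Viterbi algorithm, whose cost grows linearly with the sequence length. It is not the paper's distributed algorithm, and every state name, sensor name, and probability in the example is a made-up placeholder.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a non-empty observation sequence."""
    # best[t][s] = (probability of the best path ending in s at step t, predecessor state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            p, arg = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], arg)
        best.append(column)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-activity model driven by three simple sensors.
states = ["resting", "cooking"]
start_p = {"resting": 0.6, "cooking": 0.4}
trans_p = {"resting": {"resting": 0.7, "cooking": 0.3},
           "cooking": {"resting": 0.4, "cooking": 0.6}}
emit_p = {"resting": {"bed": 0.5, "fridge": 0.4, "stove": 0.1},
          "cooking": {"bed": 0.1, "fridge": 0.3, "stove": 0.6}}

print(viterbi(["bed", "fridge", "stove"], states, start_p, trans_p, emit_p))
# ['resting', 'resting', 'cooking']
```

For long sequences the products of probabilities underflow, so a practical version would add log-probabilities instead of multiplying.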

An Index Structure for Efficiently Handling Dynamic User Preferences and Multidimensional Data (다차원 데이터 및 동적 이용자 선호도를 위한 색인 구조의 연구)

  • Choi, Jong-Hyeok;Yoo, Kwan-Hee;Nasridinov, Aziz
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.7 / pp.925-934 / 2017
  • The R-tree is an index structure that is frequently used for handling spatial data. However, if the number of dimensions increases, or if only some of the dimensions are used to search for data according to user preference, the indexing time increases greatly and the efficiency of the generated R-tree drops considerably. Hence, it is not suitable for multidimensional data whose dimensions are continuously increasing. In this paper, we propose the multidimensional hash index, a new multidimensional index structure based on a hash index. The multidimensional hash index classifies data into buckets of Euclidean space through a hash function and then, when an actual search is requested, generates a hash search tree for effective searching. The generated hash search tree can handle user preferences in the selected dimensional space. Experimental results show that the proposed method has better indexing performance than the R-tree while maintaining similar search performance.
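
The bucketing idea can be sketched with a simple grid hash, under the assumption that the hash function maps each coordinate to a fixed-size cell. The Python class below is illustrative only: `GridHashIndex`, `cell_size`, and `range_search` are hypothetical names, and the search scans bucket keys directly rather than building the hash search tree described in the paper.

```python
from collections import defaultdict
from math import floor


class GridHashIndex:
    """Hash points into axis-aligned cells; query on any subset of dimensions."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.buckets = defaultdict(list)

    def _cell(self, value):
        return floor(value / self.cell_size)

    def insert(self, point):
        key = tuple(self._cell(x) for x in point)
        self.buckets[key].append(point)

    def range_search(self, lows, highs):
        """lows and highs map a dimension index to its bound; other dimensions are unconstrained."""
        hits = []
        for key, points in self.buckets.items():
            # Skip buckets whose cell cannot intersect the query on a selected dimension.
            if any(key[d] < self._cell(lows[d]) or key[d] > self._cell(highs[d])
                   for d in lows):
                continue
            hits.extend(p for p in points
                        if all(lows[d] <= p[d] <= highs[d] for d in lows))
        return hits


index = GridHashIndex(cell_size=2.0)
for point in [(1.0, 9.0, 3.0), (2.5, 0.5, 7.0), (8.0, 4.0, 1.0)]:
    index.insert(point)

# Search using only dimensions 0 and 2, as a user preference might select.
print(index.range_search({0: 0.0, 2: 5.0}, {0: 3.0, 2: 8.0}))  # [(2.5, 0.5, 7.0)]
```

Insertion costs one hash computation per point regardless of the number of dimensions, which is the property that makes indexing cheaper than maintaining an R-tree.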

Korean Hedge Detection Using Word Usage Information and Neural Networks (단어 쓰임새 정보와 신경망을 활용한 한국어 Hedge 인식)

  • Ren, Mei-Ying;Kang, Sin-jae
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.9 / pp.317-325 / 2017
  • In this paper, we classify Korean hedge sentences, which are regarded as unimportant because they express uncertainty or personal assumptions. From previous research on English, we found that the dependency information of words has been an important feature in hedge classification, but that it has not been used in Korean research. Additionally, we found that word embedding vectors include word usage information. We assume that word usage information can, to some extent, represent dependency information. Therefore, we applied word embedding and neural networks to hedge sentence classification. We used more than one and a half million sentences as the word embedding dataset and manually constructed a 12,517-sentence hedge classification dataset from online news. We used SVM and CRF as our baseline systems, and the proposed system outperformed SVM by 7.2%p and CRF by 1.2%p. This indicates that word usage information has a positive impact on Korean hedge classification.

A Study on the Characteristics of Cyanobacteria in the Downstream of Nakdong River Considering the Meteorological Effects (기상학적 영향을 고려한 낙동강 하류 녹조 발생특성 연구)

  • Jung, Woo Suk;Kim, Young Do;Kim, Sung Eun;Ki, Seo Jin
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.110-110 / 2020
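  • Recently, in the Nakdong River basin, large-scale algal bloom warnings have been issued because of summer heat waves and droughts, and the water quality environment is changing rapidly. In the Nakdong River, the basin studied here, algal blooms caused by drought also led to algae alerts being issued. Mass outbreaks of cyanobacteria cause problems such as oxygen depletion and increased organic matter in the water body as the cells proliferate and die off in large numbers. The toxic substances secreted by cyanobacteria are also harmful to aquatic ecosystems and the human body. In addition, cyanobacteria secrete geosmin and 2-MIB, odorous compounds that have been found harmless to humans but cause unpleasant odors such as an earthy smell in tap water, which adversely affects the water purification and supply system. The Nakdong River, the study site, shows the characteristics of stagnant waters because the construction of multifunctional weirs has increased the water depth and slowed the flow velocity. This means that the river is taking on lake-like characteristics and that changes in the water quality environment, such as algal blooms, are occurring. In this study, we analyzed the occurrence of cyanobacteria in the lower Nakdong River through visualization analysis and used random forests to derive the main factors influencing cyanobacteria occurrence at each site. Under the algae alert system, the occurrence grades are divided by issuance criteria into Attention, Danger, and Mass Outbreak. In the training data, measurements below the Attention threshold of 1,000 cells/mL of cyanobacteria were treated as below-Attention data and assigned to the Normal grade. The resulting grades were set as a categorical variable, a model was built from the training data, and its accuracy was evaluated with validation data. Through this study, we derived the main factors influencing algal occurrence and, by evaluating the importance of each variable, compared and analyzed the algal bloom characteristics of each site.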
