• Title/Summary/Keyword: over-clustering

A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

  • Hyeonwoo Kim;Jiwon Kim;Ji Won Cho;Kwang-Sung Ahn;Dong-Il Park;Sangsoo Kim
    • Genomics & Informatics / v.21 no.3 / pp.40.1-40.11 / 2023
  • Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with a large number of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.

Local structural alignment and classification of TIM barrel domains

  • Keum, Chang-Won;Kim, Ji-Hong;Jung, Jong-Sun
    • Bioinformatics and Biosystems / v.1 no.2 / pp.123-127 / 2006
  • The TIM barrel domain is widely studied because it is one of the most common structures and mediates diverse functions while maintaining its overall structure. The function of a TIM barrel domain is determined by the local structural environment at the C-terminal end of the barrel structure. We classified TIM barrel domains with a local structural alignment tool, LSHEBA, to understand the characteristics of their functional variation. TIM barrel domains classified into the same cluster share common structure, function, and ligands. Over 80% of TIM barrels in clusters share exactly the same catalytic function. Comparing our clustering result with that of SCOP, we found that knowing the local structural environment of TIM barrel domains, rather than the overall structure, is important for understanding the specific structural details of TIM barrel function. Non-TIM barrel domains were associated with TIM barrel domains in different domain combinations to form different functions. From the relationship between domain combinations, we suggested an expected evolutionary history. Finally, we analyzed the characteristics of amino acids around the ligand interface.

Fast LBG Algorithm to Reduce the Computational Complexity

  • Kim Dong-Hyun;Kang Chul-Ho
    • The Journal of the Acoustical Society of Korea / v.24 no.4E / pp.123-127 / 2005
  • In this paper, we propose a new method for reducing the number of distance calculations in the LBG (Linde, Buzo, Gray) algorithm, which is a widely used method for constructing a codebook in the vector quantization stage of speech recognition systems. The proposed algorithm reduces the distance calculations between input vectors and codewords by exploiting the observation that codewords stabilize quickly as the number of iterations increases. Simulation results show that the running time can be reduced by over $43.77\%$ on average in comparison with the current LBG algorithm, without sacrificing codebook performance.
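
For orientation, the baseline LBG procedure that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard split-and-refine scheme, not the authors' accelerated variant: the abstract does not describe how stabilized codewords are detected, so that step is omitted. The names `lbg`, `nearest`, `centroid`, `eps`, and `tol` are assumptions chosen for this sketch, as is the additive perturbation used for splitting.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(codebook[k], x)))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(data, size, eps=0.01, tol=1e-6, max_iter=100):
    """Baseline LBG: start from one codeword, double the codebook by
    splitting, and refine after each split. The codebook doubles until it
    has at least `size` entries, so `size` should be a power of two."""
    codebook = [centroid(data)]
    while len(codebook) < size:
        # Split every codeword into a pair perturbed by +eps and -eps.
        codebook = [[c + s * eps for c in cw] for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            dist = 0.0
            for x in data:
                # These nearest-codeword searches are the distance
                # calculations the paper aims to reduce.
                k = nearest(codebook, x)
                cells[k].append(x)
                dist += sum((a - b) ** 2 for a, b in zip(codebook[k], x))
            # Move each codeword to the centroid of its cell (keep empty cells as-is).
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            dist /= len(data)
            if prev - dist <= tol * max(dist, 1e-12):
                break  # average distortion has stopped improving
            prev = dist
    return codebook
```

Every refinement pass compares each input vector against every codeword, which is why skipping comparisons against codewords that no longer move can save a large share of the running time.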

The Development of the Vehicles Information Detector (Development of Vehicle Information Collection Equipment Using AI Techniques)

  • Moon, Hak-Yong;Ryu, Seung-Ki;Kim, Young-Chun;Byeon, Sang-Cheol;Choi, Do-Hyuk
    • Proceedings of the KIEE Conference / 2002.07b / pp.1283-1285 / 2002
  • This study developed a vehicle information detector using loop and piezo sensors. It analyzes the overall problems concerning our road conditions, environmental factors, and the unique features of our traffic; based on these, it develops the hardware, the software, a car classification algorithm based on artificial intelligence, and a traffic monitoring program that can be easily fixed. The work can be divided into a traffic detection algorithm and a car classification algorithm. In particular, we developed the car classification algorithm using the fuzzy c-means clustering method.
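
As a rough illustration of the fuzzy c-means method named in the abstract, the sketch below alternates the standard membership and center updates. The abstract does not say which sensor features are clustered, so the single feature used in the usage note (vehicle length in meters) is hypothetical, as are the names `memberships` and `fuzzy_c_means` and the fuzzifier default `m=2.0`.

```python
def memberships(points, centers, m=2.0):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for x in points:
        # Distance to each center, clipped to avoid division by zero.
        d = [max(sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5, 1e-12)
             for c in centers]
        rows.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                     for di in d])
    return rows

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Run a fixed number of fuzzy c-means iterations from the given
    initial centers and return (centers, memberships)."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        # Each center is the mean of all points weighted by membership**m.
        centers = [
            [sum(u[n][i] ** m * x[k] for n, x in enumerate(points)) /
             sum(u[n][i] ** m for n in range(len(points)))
             for k in range(dim)]
            for i in range(len(centers))
        ]
    return centers, memberships(points, centers, m)
```

With hypothetical lengths such as `[[4.0], [4.5], [4.2], [12.0], [11.5], [12.5]]` and initial centers `[[3.0], [10.0]]`, the two centers settle near the short and long groups, and each vehicle receives a graded membership in both classes rather than a hard label.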

Performance of Seamless Handoff Scheme with Fast Moving Detection

  • Kim Dong Ok;Yoon Hong;Yoon Chong Hoo
    • Proceedings of the IEEK Conference / 2004.08c / pp.588-591 / 2004
  • This paper describes a new approach to Internet host mobility. We argue that, for local mobility, the performance of existing mobile host protocols can be significantly improved. We propose a Fast Moving Detection scheme based on neighbor AP channel information and a moving detection table. The scheme composes a Local Area Clustering Path (LACP) domain from the APs' channel information and the MN interface information, and stores a roaming table that includes channel information and moving detection. Deployments that use the proposed scheme need to put LACP information into the beacon or probe frame. Each AP uses the scheme to inform the MN of the available channel information. Simulation results show that the proposed scheme is advantageous over the legacy schemes in terms of burst blocking probability and link utilization.

A clustered cyclic product code for the burst error correction in the DVCR systems

  • 이종화;유철우;강창언;홍대식
    • Journal of the Korean Institute of Telematics and Electronics S / v.34S no.2 / pp.1-10 / 1997
  • In this paper, an improved lower bound on the burst-error correcting capability of the cyclic product code is presented, and through the analysis of this new bound a clustered cyclic product (CCP) code is proposed. To improve the burst-error correcting capability, the CCP code combines the idea of clustering with the transmission method of the cyclic product code. That is, a cluster, defined in this paper as a group of consecutive code symbols, is employed as a new transmission unit in the code array transmission of the cyclic product code. The burst-error correcting capability of the CCP code is improved without a loss in random-error correcting capability, and a performance comparison in the digital video cassette recorder (DVCR) system shows the superiority of the proposed CCP code over conventional product codes.

Exact BER Expression of 2-1-1 Relaying Scheme in Wireless Sensor Networks

  • Kong, Hyung-Yun
    • Journal of electromagnetic engineering and science / v.9 no.3 / pp.111-117 / 2009
  • This paper presents an energy-efficient and bandwidth-efficient 2-1-1 relaying scheme in which a sensor node (SN) assists two others in their data transmission to a clusterhead in wireless sensor networks (WSNs) using LEACH (Low-Energy Adaptive Clustering Hierarchy). We derive the closed-form BER expression of this scheme, which is also a general BER expression for the decode-and-forward cooperative protocol, and prove that the proposed scheme performs the same as the conventional relaying scheme but achieves higher channel utilization efficiency. A variety of numerical results reveal that relaying can save network energy by up to 11 dB over single-hop transmission at a BER of $10^{-3}$.

Network Anomaly Detection using Hybrid Feature Selection

  • Kim Eun-Hye;Kim Se-Hun
    • Proceedings of the Korea Institutes of Information Security and Cryptology Conference / 2006.06a / pp.649-653 / 2006
  • In this paper, we propose a hybrid feature extraction method in which principal component analysis is combined with an optimized k-means clustering technique. Our approach hierarchically reduces the redundancy of features with high explanatory power in principal component analysis in order to choose a good subset of features critical to improving the performance of classifiers. Based on this result, we evaluate intrusion detection performance using a Support Vector Machine and a nonparametric approach based on k-Nearest Neighbor over data sets with reduced features. Experimental results on the KDD Cup 1999 dataset show several advantages in terms of computational complexity, and our method achieves a significant detection rate, which indicates that attacks can be detected successfully.

Efficient Superpixel Generation Method Based on Image Complexity

  • Park, Sanghyun
    • Journal of Multimedia Information System / v.7 no.3 / pp.197-204 / 2020
  • Superpixel methods are widely used in the preprocessing stage of computer vision applications to reduce computational complexity by simplifying images while maintaining their characteristics. It is common to generate superpixels of similar size and shape based on the pixel values rather than considering the characteristics of the image. In this paper, we propose a method to control the sizes and shapes of generated superpixels, considering the contents of an image. The proposed method consists of two steps. The first step is to over-segment an image so that the boundary information of the image is well preserved. In the second step, the generated superpixels are merged based on similarity to produce the target number of superpixels, where the shapes of superpixels are controlled by limiting the maximum size and by the proposed roundness metric. Experimental results show that the proposed method preserves the boundaries of the objects in an image more accurately than the existing method.

LARGE-SCALE CLUSTERING OF GALAXIES IN THE CFA SURVEY

  • Park, Chang-Bom
    • Publications of The Korean Astronomical Society / v.7 no.1 / pp.9-17 / 1992
  • The power spectrum of the galaxy distribution is accurately measured up to wavelengths over $100\,h^{-1}$ Mpc from the CfA 1 and 2 catalogs. We find that our results agree with power spectra calculated by others from smaller samples of optical, radio and infrared galaxies. The power spectrum of an open CDM model (${\Omega}h$ = 0.2 and ${\delta}_8$ = 1; see below for definitions) best approximates the observed power spectrum. The power spectrum of the standard CDM model (${\Omega}h$ = 0.5 and ${\delta}_8$ = 1) is inconsistent with the observed one at the 99% confidence level. Our best estimate of the corresponding correlation function in real space is $\xi(r) = (r/6.2\,h^{-1}\,\mathrm{Mpc})^{-1.8}$ for $r < 20\,h^{-1}$ Mpc.
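
To make the quoted power law concrete, the short sketch below evaluates $\xi(r) = (r/r_0)^{-\gamma}$ with the abstract's values $r_0 = 6.2\,h^{-1}$ Mpc and $\gamma = 1.8$. The function name `xi` and its parameter names are assumptions for illustration; the fit is stated only for $r < 20\,h^{-1}$ Mpc, so values beyond that range are extrapolations.

```python
def xi(r, r0=6.2, gamma=1.8):
    """Power-law two-point correlation function (r / r0) ** -gamma.

    r and r0 are separations in h^-1 Mpc; the defaults are the best-fit
    values quoted in the abstract, valid for r < 20 h^-1 Mpc.
    """
    if r <= 0:
        raise ValueError("separation r must be positive")
    return (r / r0) ** -gamma
```

By construction `xi(6.2)` equals 1, so $r_0$ is the separation at which the correlation function drops to unity, and doubling the separation scales the value by $2^{-1.8}$.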
