• Title/Summary/Keyword: Candidate Clustering

Search Result 83, Processing Time 0.024 seconds

misMM: An Integrated Pipeline for Misassembly Detection Using Genotyping-by-Sequencing and Its Validation with BAC End Library Sequences and Gene Synteny

  • Ko, Young-Joon;Kim, Jung Sun;Kim, Sangsoo
    • Genomics & Informatics
    • /
    • v.15 no.4
    • /
    • pp.128-135
    • /
    • 2017
  • As next-generation sequencing technologies have advanced, enormous amounts of whole-genome sequence information in various species have been released. However, it is still difficult to assemble the whole genome precisely, due to inherent limitations of short-read sequencing technologies. In particular, the complexities of plants are incomparable to those of microorganisms or animals because of whole-genome duplications, repeat insertions, and Numt insertions, etc. In this study, we describe a new method for detecting misassembly sequence regions of Brassica rapa with genotyping-by-sequencing, followed by MadMapper clustering. The misassembly candidate regions were cross-checked with BAC clone paired-ends library sequences that have been mapped to the reference genome. The results were further verified with gene synteny relations between Brassica rapa and Arabidopsis thaliana. We conclude that this method will help detect misassembly regions and be applicable to incompletely assembled reference genomes from a variety of species.

Design Pattern Base4 Component Classification and Retrieval using E-SARM (설계 패턴 기반 컴포넌트 분류와 E-SARM을 이용한 검색)

  • Kim, Gui-Jung;Han, Jung-Soo;Song, Young-Jae
    • The KIPS Transactions:PartD
    • /
    • v.11D no.5
    • /
    • pp.1133-1142
    • /
    • 2004
  • This paper proposes a method to classify and retrieve components in repository using the idea of domain orientation for the successful reuse of components. A design pattern was applied to existing systems and a component classification method is suggested here to compare the structural similarity between each component in relevant domain and criterion patterns. Classifying reusable components by their functionality and then depicting their structures with a diagram can increase component reusability and portability between platforms. Efficiency of component reuse can be raised because the most appropriate component to query and similar candidate components are provided in priority by use of-SARM algorithm.

Data Mining for High Dimensional Data in Drug Discovery and Development

  • Lee, Kwan R.;Park, Daniel C.;Lin, Xiwu;Eslava, Sergio
    • Genomics & Informatics
    • /
    • v.1 no.2
    • /
    • pp.65-74
    • /
    • 2003
  • Data mining differs primarily from traditional data analysis on an important dimension, namely the scale of the data. That is the reason why not only statistical but also computer science principles are needed to extract information from large data sets. In this paper we briefly review data mining, its characteristics, typical data mining algorithms, and potential and ongoing applications of data mining at biopharmaceutical industries. The distinguishing characteristics of data mining lie in its understandability, scalability, its problem driven nature, and its analysis of retrospective or observational data in contrast to experimentally designed data. At a high level one can identify three types of problems for which data mining is useful: description, prediction and search. Brief review of data mining algorithms include decision trees and rules, nonlinear classification methods, memory-based methods, model-based clustering, and graphical dependency models. Application areas covered are discovery compound libraries, clinical trial and disease management data, genomics and proteomics, structural databases for candidate drug compounds, and other applications of pharmaceutical relevance.

Licence Plate Recognition Using Improved IAFC Fuzzy Neural Network (개선된 IAFC 퍼지 신경회로망을 이용한 차량 번호판 인식)

  • Lee, Si-Hyun;Choi, Si-Young;Lee, Se-Yul;Kim, Yong-Soo
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.1
    • /
    • pp.6-12
    • /
    • 2009
  • In this paper, we propose a system that extracts licence plate and recognizes numerals in the licence plate. The candidate area of licence plate is extracted using the improved IAFC(Integrated Adaptive Fuzzy Clustering) fuzzy neural network. And the morphological filters are used to reduce noise from the extracted licence plate. The extracted licence plate is standardized using Hough transform and geometric transform. Backpropagation neural network is used to recognize numerals that are separated using the projection technique.

EDGE: An Enticing Deceptive-content GEnerator as Defensive Deception

  • Li, Huanruo;Guo, Yunfei;Huo, Shumin;Ding, Yuehang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.5
    • /
    • pp.1891-1908
    • /
    • 2021
  • Cyber deception defense mitigates Advanced Persistent Threats (APTs) with deploying deceptive entities, such as the Honeyfile. The Honeyfile distracts attackers from valuable digital documents and attracts unauthorized access by deliberately exposing fake content. The effectiveness of distraction and trap lies in the enticement of fake content. However, existing studies on the Honeyfile focus less on this perspective. In this work, we seek to improve the enticement of fake text content through enhancing its readability, indistinguishability, and believability. Hence, an enticing deceptive-content generator, EDGE, is presented. The EDGE is constructed with three steps: extracting key concepts with a semantics-aware K-means clustering algorithm, searching for candidate deceptive concepts within the Word2Vec model, and generating deceptive text content under the Integrated Readability Index (IR). Furthermore, the readability and believability performance analyses are undertaken. The experimental results show that EDGE generates indistinguishable deceptive text content without decreasing readability. In all, EDGE proves effective to generate enticing deceptive text content as deception defense against APTs.

A detailed analysis of nearby young stellar moving groups

  • Lee, Jinhee
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.44 no.2
    • /
    • pp.63.3-63.3
    • /
    • 2019
  • Nearby young moving groups (NYMGs hereafter) are gravitationally unbound loose young stellar associations located within 100 pc of the Sun. Since NYMGs are crucial laboratories for studying low-mass stars and planets, intensive searches for NYMG members have been performed. For identification of NYMG members, various strategies and methods have been applied. As a result, the reliability of the members in terms of membership is not uniform, which means that a careful membership re-assessment is required. In this study, I developed a NYMG membership probability calculation tool based on Bayesian inference (Bayesian Assessment of Moving Groups: BAMG). For the development of the BAMG tool, I constructed ellipsoidal models for nine NYMGs via iterative and self-consistent processes. Using BAMG, memberships of claimed members in the literature (N~2000) were evaluated, and 35 per cent of members were confirmed as bona fide members of NYMGs. Based on the deficiency of low-mass members appeared in mass function using these bona fide members, low mass members from Gaia DR2 are identified. About 2000 new M dwarf and brown dwarf candidate members were identified. Memberships of ~70 members with RV from Gaia were confirmed, and the additional ~20 members were confirmed via spectroscopic observation. Not relying on previous knowledge about the existence of nine NYMGs, unsupervised machine learning analyses were applied to NYMG members. K-means and Agglomerative Clustering algorithms result in similar trends of grouping. As a result, six previously known groups (TWA, beta-Pic, Carina, Argus, AB Doradus, and Volans-Carina) were rediscovered. Three the other known groups are recognized as well; however, they are combined into two new separate groups (ThOr+Columba and TucHor+Columba).

  • PDF

Railway Track Extraction from Mobile Laser Scanning Data (모바일 레이저 스캐닝 데이터로부터 철도 선로 추출에 관한 연구)

  • Yoonseok, Jwa;Gunho, Sohn;Jong Un, Won;Wonchoon, Lee;Nakhyeon, Song
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.33 no.2
    • /
    • pp.111-122
    • /
    • 2015
  • This study purposed on introducing a new automated solution for detecting railway tracks and reconstructing track models from the mobile laser scanning data. The proposed solution completes following procedures; the study initiated with detecting a potential railway region, called Region Of Interest (ROI), and approximating the orientation of railway track trajectory with the raw data. At next, the knowledge-based detection of railway tracks was performed for localizing track candidates in the first strip. In here, a strip -referring the local track search region- is generated in the orthogonal direction to the orientation of track trajectory. Lastly, an initial track model generated over the candidate points, which were detected by GMM-EM (Gaussian Mixture Model-Expectation & Maximization) -based clustering strip- wisely grows to capture all track points of interest and thus converted into geometric track model in the tracking by detection framework. Therefore, the proposed railway track tracking process includes following key features; it is able to reduce the complexity in detecting track points by using a hypothetical track model. Also, it enhances the efficiency of track modeling process by simultaneously capturing track points and modeling tracks that resulted in the minimization of data processing time and cost. The proposed method was developed using the C++ program language and was evaluated by the LiDAR data, which was acquired from MMS over an urban railway track area with a complex railway scene as well.

Automatic Change Detection Based on Areal Feature Matching in Different Network Data-sets (이종의 도로망 데이터 셋에서 면 객체 매칭 기반 변화탐지)

  • Kim, Jiyoung;Huh, Yong;Yu, Kiyun;Kim, Jung Ok
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.31 no.6_1
    • /
    • pp.483-491
    • /
    • 2013
  • By a development of car navigation systems and mobile or positioning technology, it increases interest in location based services, especially pedestrian navigation systems. Updating of digital maps is important because digital maps are mass data and required to short updating cycle. In this paper, we proposed change detection for different network data-sets based on areal feature matching. Prior to change detection, we defined type of updating between different network data-sets. Next, we transformed road lines into areal features(block) that are surrounded by them and calculated a shape similarity between blocks in different data-sets. Blocks that a shape similarity is more than 0.6 are selected candidate block pairs. Secondly, we detected changed-block pairs by bipartite graph clustering or properties of a concave polygon according to types of updating, and calculated Fr$\acute{e}$chet distance between segments within the block or forming it. At this time, road segments of KAIS map that Fr$\acute{e}$chet distance is more than 50 are extracted as updating road features. As a result of accuracy evaluation, a value of detection rate appears high at 0.965. We could thus identify that a proposed method is able to apply to change detection between different network data-sets.

Facial Expression Control of 3D Avatar by Hierarchical Visualization of Motion Data (모션 데이터의 계층적 가시화에 의한 3차원 아바타의 표정 제어)

  • Kim, Sung-Ho;Jung, Moon-Ryul
    • The KIPS Transactions:PartA
    • /
    • v.11A no.4
    • /
    • pp.277-284
    • /
    • 2004
  • This paper presents a facial expression control method of 3D avatar that enables the user to select a sequence of facial frames from the facial expression space, whose level of details the user can select hierarchically. Our system creates the facial expression spare from about 2,400 captured facial frames. But because there are too many facial expressions to select from, the user faces difficulty in navigating the space. So, we visualize the space hierarchically. To partition the space into a hierarchy of subspaces, we use fuzzy clustering. In the beginning, the system creates about 11 clusters from the space of 2,400 facial expressions. The cluster centers are displayed on 2D screen and are used as candidate key frames for key frame animation. When the user zooms in (zoom is discrete), it means that the user wants to see mort details. So, the system creates more clusters for the new level of zoom-in. Every time the level of zoom-in increases, the system doubles the number of clusters. The user selects new key frames along the navigation path of the previous level. At the maximum zoom-in, the user completes facial expression control specification. At the maximum, the user can go back to previous level by zooming out, and update the navigation path. We let users use the system to control facial expression of 3D avatar, and evaluate the system based on the results.

Extraction of Classes and Hierarchy from Procedural Software (절차지향 소프트웨어로부터 클래스와 상속성 추출)

  • Choi, Jeong-Ran;Park, Sung-Og;Lee, Moon-Kun
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.9
    • /
    • pp.612-628
    • /
    • 2001
  • This paper presents a methodology to extract classes and inheritance relations from procedural software. The methodology is based on the idea of generating all groups of class candidates, based on the combinatorial groups of object candidates, and their inheritance with all possible combinations and selecting a group of object candidates, and their inheritance with all possible combinations and selecting a group with the best or optimal combination of candidates with respect to the degree of relativity and similarity between class candidates in the group and classes in a domain model. The methodology has innovative features in class candidates in the group and classes in a domain model. The methodology has innovative features in class and inheritance extraction: a clustering method based on both static (attribute) and dynamic (method) clustering, the combinatorial cases of grouping class candidate cases based on abstraction, a signature similarity measurement for inheritance relations among n class candidates or m classes, two-dimensional similarity measurement for inheritance relations among n class candidates or m classes, two-dimensional similarity measurement, that is, the horizontal measurement for overall group similarity between n class candidates and m classes, and the vertical measurement for specific similarity between a set of classes in a group of class candidates and a set of classes with the same class hierarchy in a domain model, etc. This methodology provides reengineering experts with a comprehensive and integrated environment to select the best or optimal group of class candidates.

  • PDF