• Title/Summary/Keyword: 유전자 데이터 (gene data)

Search results: 660

Gene Expression Data Analysis Using Seed Clustering (시드 클러스터링 방법에 의한 유전자 발현 데이터 분석)

  • Shin Myoung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.1 / pp.1-7 / 2005
  • Cluster analysis of microarray data is often used to find biologically relevant groups of genes based on their expression levels. Because many functionally related genes tend to be co-expressed, identifying groups of genes with similar expression profiles allows the functions of unknown genes to be inferred from those of known genes in the same group. In this paper we present a novel clustering approach, called seed clustering, and investigate its applicability to microarray data analysis. In seed clustering, seed genes are first extracted by computational analysis of their expression profiles, and clusters are then generated by taking the seed genes as prototype vectors for the target clusters. Because it has strong mathematical foundations, the seed clustering method produces stable and consistent results in a systematic way. Our empirical results also indicate that the automatically extracted seed genes are good representatives of the potential clusters hidden in the data, and that the method compares favorably with current approaches.
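
The two-step idea (extract seeds, then use them as cluster prototypes) can be sketched as follows. This is an illustrative sketch only: the abstract does not say how seeds are extracted, so the greedy farthest-point selection on Pearson-correlation distance used in the hypothetical `extract_seeds` is a stand-in, not the authors' procedure. Genes with constant profiles (zero variance) are assumed to have been filtered out beforehand.

```python
import numpy as np

def extract_seeds(X, k):
    """Stand-in seed extraction (assumption): greedy farthest-point selection
    on correlation distance between gene expression profiles (rows of X)."""
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    corr = Z @ Z.T / X.shape[1]          # Pearson correlation between genes
    dist = 1.0 - corr                    # 0 = identical shape, 2 = opposite shape
    seeds = [int(np.argmax(X.var(axis=1)))]  # start from the most variable gene
    while len(seeds) < k:
        # next seed: the gene farthest from all seeds chosen so far
        seeds.append(int(np.argmax(dist[:, seeds].min(axis=1))))
    return seeds, dist

def seed_clustering(X, k):
    """Assign every gene to the seed (prototype) with the closest profile."""
    seeds, dist = extract_seeds(X, k)
    labels = dist[:, seeds].argmin(axis=1)
    return seeds, labels
```

For a genes-by-time-points matrix `X`, `seed_clustering(X, 2)` returns the indices of the two seed genes and a cluster label per gene; genes rising over time and genes falling over time end up in separate clusters.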

A fast feature selection technique for microarray data (마이크로어레이 데이터 분석을 위한 선형 특징 선별 기법)

  • Lee, Jae-Seong;Kim, Dae-Won
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2008.04a / pp.188-189 / 2008
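  • Microarray data consist of large volumes of data obtained by testing a large number of genes in a short time. However, the number of genes represented as features in such data is very large, and the genes are not independent of one another, so applying traditional data mining techniques to find biomarkers is not easy. Various feature selection methodologies have emerged to overcome these characteristics of microarray data and the problems that derive from them, but they have shortcomings that make them difficult to apply to real-world problems. This paper introduces a method that removes non-orthogonal features using a cosine inner-product matrix and its determinant, and analyzes the results.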
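
One way to use a cosine inner-product matrix and its determinant for this purpose is sketched below, under stated assumptions: for unit-length feature vectors the determinant of their cosine (Gram) matrix is 1 when they are mutually orthogonal and approaches 0 when one is a linear combination of the others. The greedy column-order scan and the `threshold` parameter are hypothetical choices, not the paper's exact procedure. All-zero feature columns are assumed to be removed first.

```python
import numpy as np

def select_orthogonal_features(X, threshold=0.5):
    """Greedy sketch (assumption): keep feature j only if the determinant of the
    cosine matrix of the kept features plus j stays above `threshold`.
    X is samples x features."""
    U = X / np.linalg.norm(X, axis=0)   # scale each feature column to unit length
    C = U.T @ U                         # cosine inner-product matrix between features
    kept = []
    for j in range(X.shape[1]):
        cand = kept + [j]
        # det = 1 for orthogonal features, near 0 for (almost) dependent ones
        if np.linalg.det(C[np.ix_(cand, cand)]) > threshold:
            kept.append(j)
    return kept
```

Because the scan is greedy, the result depends on column order; ranking features by some relevance score before the scan would be a natural refinement.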

Cancer driver gene using multi-omics data and biological network information (멀티 오믹스 데이터 및 생물학적 네트워크 정보를 이용한 드라이버 유전자 분류)

  • Jeong-Ho Park;Kyuri Jo
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.490-492 / 2023
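  • Advances in sequencing technology have led to the accumulation of diverse omics data, and together with advances in artificial intelligence, many driver gene classification techniques have been proposed. More recently, as cancer data have accumulated at large scale, a variety of machine-learning-based techniques have been actively proposed. In particular, there have been many attempts to achieve high accuracy on high-dimensional data that combine multiple types of omics data. In this paper, we present a deep learning model that classifies driver genes, which play an important role in cancer proliferation and development, based on multi-omics and network-related features. After training the model on The Cancer Genome Atlas (TCGA) data, we confirmed that its performance improved over existing statistical and machine-learning-based techniques.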
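
The input side of such a model, combining per-gene multi-omics features with network-related features, can be sketched as below. This is a minimal sketch assuming that node degree in a gene interaction network (scaled by the maximum degree) is the network feature; the paper's actual feature set and its deep learning classifier are not shown, and the gene names and values in the usage line are made up.

```python
from collections import defaultdict

def degree_features(edges):
    """Network feature (assumption): degree of each gene in an undirected
    gene/protein interaction network given as (gene, gene) pairs."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def build_features(omics, edges):
    """Concatenate each gene's omics feature vector with its scaled degree."""
    deg = degree_features(edges)
    max_deg = max(deg.values(), default=1)
    return {gene: list(values) + [deg.get(gene, 0) / max_deg]
            for gene, values in omics.items()}

# Hypothetical usage: two omics features per gene plus one network feature.
features = build_features({"g1": [0.12, 2.3], "g2": [0.01, -0.4]}, [("g1", "g2")])
```

The resulting per-gene vectors would then be fed to a classifier trained on known driver and non-driver genes.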

Detection of Gene Interactions based on Syntactic Relations (구문관계에 기반한 유전자 상호작용 인식)

  • Kim, Mi-Young
    • The KIPS Transactions:PartB / v.14B no.5 / pp.383-390 / 2007
  • Interactions between proteins and genes are often considered essential in describing biomolecular phenomena, and networks of interactions are considered an entry point for a systems biology approach. Recently, many studies have tried to extract such information by analyzing biomolecular text with natural language processing technology. Previous studies claim that linguistic information is useful for improving performance in detecting gene interactions. However, previous systems have not performed well because of low recall. To improve recall without sacrificing precision, this paper proposes a new method for detecting gene interactions based on syntactic relations. Without using biomolecular knowledge, our method achieves reasonable performance with only a small training set. Using data in the LLL05 (ICML05 Workshop on Learning Language in Logic) format, we detect the agent gene and the target gene that interact with each other. In the first phase, we detect the encapsulation type of each agent and target candidate. In the second phase, we construct lists of verbs that indicate interaction between two genes. In the last phase, we learn direction information to determine which of the two genes is the agent and which is the target. In experiments on the LLL05 data, the proposed method achieved an F-measure of 88% on the training data and 70.4% on the test data, significantly outperforming previous methods. We also describe the contribution of each phase to overall performance and show that the first phase improves recall, while the second and last phases improve precision.
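
The second and last phases (an interaction-verb list and a direction decision) can be illustrated with the toy rule-based sketch below. It is an assumption-laden simplification: the verb stems are a hypothetical hand-written list rather than one built from training data, the direction rule (a "by" between the genes signals passive voice) stands in for the learned direction information, and the first phase (encapsulation types) is not modelled.

```python
# Hypothetical interaction-verb stems; the paper builds its verb lists from training data.
INTERACTION_STEMS = ("activat", "repress", "regulat", "inhibit")

def detect_interaction(sentence, gene1, gene2):
    """Return (agent, target) if an interaction verb occurs between the two
    gene mentions, else None. Raises ValueError if a gene is not in the sentence."""
    words = sentence.replace(".", "").split()
    i, j = words.index(gene1), words.index(gene2)
    (left, left_gene), (right, right_gene) = sorted([(i, gene1), (j, gene2)])
    between = [w.lower() for w in words[left + 1:right]]
    if not any(w.startswith(INTERACTION_STEMS) for w in between):
        return None                      # no interaction verb links the genes
    if "by" in between:
        return right_gene, left_gene     # passive voice: gene after "by" is the agent
    return left_gene, right_gene         # active voice: left gene is the agent
```

For example, both "geneA activates geneB." and "geneB is repressed by geneA." yield `("geneA", "geneB")`.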

A Method of Identifying Disease-related Significant Pathways Using Time-Series Microarray Data (시간열 마이크로어레이 데이터를 이용한 질병 관련 유의한 패스웨이 유전자 집합의 검출)

  • Kim, Jae-Young;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.5 / pp.17-24 / 2010
  • Research on identifying biomarkers for disease diagnosis and prognosis has recently been very active. In particular, much attention has been paid to finding pathway gene sets that are differentially expressed in disease patients, rather than individual gene markers. In this paper we propose a novel method to identify disease-related pathway gene sets from time-series microarray data. We first compute individual gene scores using maSigPro (microarray Significant Profiles) and then sort all genes in decreasing order of score. The rank of each gene in the full list is used to evaluate the statistical significance of candidate gene sets with the Wilcoxon rank-sum test. Candidate gene sets are generated from MSigDB (Molecular Signatures Database) pathway information. In an experiment with prostate cancer time-series microarray data, the proposed method correctly identified 6 of 7 biological pathways already known to be related to prostate cancer, demonstrating its usefulness.
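
The gene-set scoring step can be sketched with a one-sided Wilcoxon rank-sum test on the gene ranks. This sketch assumes the normal approximation without tie correction (ranks in a sorted list are unique) and takes the ranked list as given; the maSigPro scoring itself is not reproduced. `rank_sum_pvalue` is a hypothetical helper name.

```python
import math

def rank_sum_pvalue(ranked_genes, gene_set):
    """One-sided Wilcoxon rank-sum test (normal approximation): are members of
    gene_set ranked closer to the top (rank 1 = highest score) than other genes?"""
    N = len(ranked_genes)
    ranks = [r for r, g in enumerate(ranked_genes, start=1) if g in gene_set]
    n1, n2 = len(ranks), N - len(ranks)
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    W = sum(ranks)                           # rank-sum statistic of the gene set
    mu = n1 * (N + 1) / 2                    # expected W under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (N + 1) / 12)
    z = (W - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # lower tail: small W = near the top
```

Given a dict `scores` of gene scores, the ranked list would be `sorted(scores, key=scores.get, reverse=True)`, and each candidate pathway gene set is then tested against it.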

Automatic Test Data Generation for Mutation Testing Using Genetic Algorithms (유전자 알고리즘을 이용한 뮤테이션 테스팅의 테스트 데이터 자동 생성)

  • 정인상;창병모
    • The KIPS Transactions:PartD / v.8D no.1 / pp.81-86 / 2001
  • One key goal of software testing is to generate a "good" test data set, which is considered the most difficult and time-consuming task. This paper discusses how genetic algorithms (GAs) can be used to generate test data sets automatically for software testing. We employ mutation testing to show the effectiveness of GAs in automatic test data generation. The approach presented in this paper differs from others in that the test generation process requires no knowledge of the implementation details of the program under test. In addition, to show its effectiveness, we conducted experiments comparing our approach with random testing, which is also regarded as a black-box test generation technique.
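
A GA whose fitness is the mutation score (the number of mutants a test set kills) can be sketched as follows. Everything here is hypothetical: the program under test, its four hand-written mutants, the test-set size, and the GA settings (truncation selection, one-point crossover, point mutation) are illustrative choices, not the paper's setup. The GA only executes the program and mutants, so it treats them as black boxes.

```python
import random

def program(x):
    """Hypothetical program under test."""
    return "big" if x > 50 else "small"

# Hypothetical mutants: each applies one small syntactic change to `program`.
MUTANTS = [
    lambda x: "big" if x >= 50 else "small",  # > replaced by >=
    lambda x: "big" if x > 51 else "small",   # constant 50 replaced by 51
    lambda x: "big" if x > 49 else "small",   # constant 50 replaced by 49
    lambda x: "big" if x < 50 else "small",   # > replaced by <
]

def killed(test_set):
    """Fitness: number of mutants whose output differs from program on some input."""
    return sum(any(m(x) != program(x) for x in test_set) for m in MUTANTS)

def run_ga(set_size=3, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 100) for _ in range(set_size)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=killed, reverse=True)
        if killed(pop[0]) == len(MUTANTS):
            break                                 # all mutants killed
        parents = pop[: pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, set_size - 1)
            child = a[:cut] + b[cut:]             # one-point crossover
            if rng.random() < 0.3:
                child[rng.randrange(set_size)] = rng.randint(0, 100)  # point mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=killed)
```

Here the boundary inputs 50 and 51 are needed to kill the first three mutants, which is the kind of input a random tester tends to miss.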

A Dynamic Weapon Allocation Algorithm using Genetic Algorithm in Mapreduce Environments (맵리듀스 환경에서 유전자 알고리즘 기반의 동적 무기할당 알고리즘)

  • Park, Junho;Kim, Jieun;Cho, Kilseok
    • Proceedings of the Korea Contents Association Conference / 2014.11a / pp.469-470 / 2014
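  • The dynamic weapon allocation problem is a typical NP-complete problem of appropriately assigning friendly weapons to threatening targets. It is highly time-constrained: an appropriate weapon allocation and response must be derived as quickly as possible, which is difficult in a highly fluid battlefield environment. For applications based on such highly complex big data, research on analysis and processing with distributed processing systems has recently attracted great attention, and MapReduce is used as a representative framework. However, because MapReduce provides only batch processing over the entire data set, it is difficult to run a genetic algorithm on dynamic data, and deriving the final result still takes a long time. In this paper, we propose a dynamic weapon allocation algorithm based on a genetic algorithm in the MapReduce environment. To support continuous data processing by the genetic algorithm in MapReduce, the proposed technique analyzes only newly added and removed weapon-target data and combines the results with previously analyzed data to derive the final result. This enables rapid dynamic weapon allocation.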
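
The incremental idea (evaluate only added or removed weapon-target pairs and merge them with cached results) can be sketched in map/reduce style in plain Python. This sketch illustrates only the delta-merging step, not the genetic algorithm or a real MapReduce job; the scoring function (kill probability times target value) and the "best weapon per target" reduce, which ignores weapon capacity, are hypothetical simplifications.

```python
def map_pairs(pairs, kill_prob, target_value):
    """Map step: score each (weapon, target) pair (assumed score:
    kill probability x target value)."""
    for weapon, target in pairs:
        yield (weapon, target), kill_prob[(weapon, target)] * target_value[target]

def update_cache(cache, added, removed, kill_prob, target_value):
    """Incremental step: drop removed pairs, score only newly added pairs."""
    for pair in removed:
        cache.pop(pair, None)
    for pair, score in map_pairs(added, kill_prob, target_value):
        cache[pair] = score
    return cache

def reduce_best(cache):
    """Reduce step: pick the highest-scoring weapon for each target."""
    best = {}
    for (weapon, target), score in cache.items():
        if target not in best or score > best[target][0]:
            best[target] = (score, weapon)
    return {target: weapon for target, (score, weapon) in best.items()}
```

Only the changed pairs are re-scored on each update, while the cached scores from earlier rounds are reused when the final allocation is recomputed.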

For Gene Disease Analysis using Data Mining Implement MKSV System (데이터마이닝을 활용한 유전자 질병 분석을 위한 MKSV시스템 구현)

  • Jeong, Yu-Jeong;Choi, Kwang-Mi
    • The Journal of the Korea institute of electronic communication sciences / v.14 no.4 / pp.781-786 / 2019
  • To achieve the objectives of today's disease research, which deals with diverse vital phenomena, we should extract realistic value from the large amounts of relevant data obtained from such studies. In the proposed MKSV algorithm, an optimal probability distribution is estimated and the input pattern is determined from it. After the data are classified through data mining, an efficient computational cost and a good recognition rate can be obtained. By simulating the probabilistic flow of gene data and classifying data quickly and effectively through a data mining process for big data, the MKSV algorithm is useful for studying the relationship between diseases and genes.

Analysis of Combined Yeast Cell Cycle Data by Using the Integrated Analysis Program for DNA chip (DNA chip 통합분석 프로그램을 이용한 효모의 세포주기 유전자 발현 통합 데이터의 분석)

  • 양영렬;허철구
    • KSBB Journal / v.16 no.6 / pp.538-546 / 2001
  • An integrated data analysis program for DNA chips, covering normalization, FDM analysis, various clustering methods, PCA, and SVD, was applied to analyze combined yeast cell cycle data. This paper compares several clustering algorithms, such as K-means, SOM, and fuzzy c-means, and their results. For further analysis, the clustering results from the integrated analysis program were used to assign functions to each cluster and for motif analysis. These results provide an integrated view of DNA chip data analysis.
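
Fuzzy c-means, one of the clustering methods compared, alternates between two updates: cluster centers as membership-weighted means, $v_j = \sum_i u_{ij}^m x_i / \sum_i u_{ij}^m$, and memberships $u_{ij} = 1 / \sum_k (d_{ij}/d_{ik})^{2/(m-1)}$, where $d_{ij}$ is the distance from gene $i$ to center $j$. The minimal NumPy sketch below implements these textbook updates with random initial memberships; the fuzzifier $m = 2$ and the stopping rule are common defaults assumed here, not settings taken from the paper.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on rows of X (e.g. gene expression profiles).
    Returns (centers, U) where U[i, j] is gene i's membership in cluster j."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each gene sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)                # avoid division by zero
        inv = dist ** (-2.0 / (m - 1))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # u_ij = 1 / sum_k (d_ij/d_ik)^(2/(m-1))
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

Unlike K-means, each gene keeps a graded membership in every cluster; a hard assignment, if needed, is `U.argmax(axis=1)`.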

Informatics Network Representation Using Probabilistic Graphical Models of Network Genetics (유전자 네트워크에서 확률적 그래프 모델을 이용한 정보 네트워크 추론)

  • Ra Sang-Dong;Park Dong-Suk;Youn Young-Ji
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.8 / pp.1386-1392 / 2006
  • This study is a numerical modelling analysis that applies the process used in genetics to unravel networks between cells to the World Wide Web in informatics. Using a probabilistic graphical model, insight from data describing biological networks is used to construct a probability function. Rather than modelling a complex network of cells, we reconstruct a simple lower-level model and show the level of genetic representation that follows from gene-based network logic. We built probabilistic graphical models from genetic data and extended them to genetic representation data using the network modelling methods of informatics.
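
A probabilistic graphical model turns a network structure into a probability function by factorizing the joint distribution along the edges. The sketch below shows this for a hypothetical three-gene chain A → B → C with binary states, where $P(A,B,C) = P(A)\,P(B \mid A)\,P(C \mid B)$; the structure and all probabilities are made-up illustrative values, not taken from the paper.

```python
from itertools import product

# Hypothetical chain A -> B -> C; state 1 = gene expressed, 0 = not expressed.
P_A = {1: 0.3, 0: 0.7}
P_B_given_A = {1: 0.9, 0: 0.2}   # P(B=1 | A=a)
P_C_given_B = {1: 0.7, 0: 0.1}   # P(C=1 | B=b)

def bernoulli(p_one, value):
    """Probability of a binary value given P(value = 1)."""
    return p_one if value == 1 else 1 - p_one

def joint(a, b, c):
    """Factorized joint distribution P(A, B, C) = P(A) P(B|A) P(C|B)."""
    return P_A[a] * bernoulli(P_B_given_A[a], b) * bernoulli(P_C_given_B[b], c)

def marginal_c(value=1):
    """P(C = value), obtained by summing the joint over A and B."""
    return sum(joint(a, b, value) for a, b in product((0, 1), repeat=2))
```

With these numbers, $P(B=1) = 0.3 \cdot 0.9 + 0.7 \cdot 0.2 = 0.41$ and $P(C=1) = 0.41 \cdot 0.7 + 0.59 \cdot 0.1 = 0.346$, which `marginal_c(1)` reproduces.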