• Title/Summary/Keyword: gene prediction (유전자 예측)

Search Result 504, Processing Time 0.037 seconds

Classification of Cancer-related Gene Expression Data Using Neural Network Classifiers (신경망 분류기를 이용한 암 관련 유전자 발현정보를 분류)

  • 권영준;류중원;조성배
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04b
    • /
    • pp.295-297
    • /
    • 2001
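  • Recently, the need for appropriate tools to analyze biological gene information effectively has been growing. This paper seeks the best feature extraction and classification methods for predicting a patient's cancer type by classifying DNA microarray gene data obtained from the bone marrow of leukemia patients. To this end, seven feature extraction methods were used: Pearson correlation, Euclidean distance, cosine coefficient, Spearman correlation, information gain, mutual information, and signal-to-noise ratio. Classification experiments were run with machine-learning classifiers including a backpropagation neural network, a decision tree, a structure-adaptive self-organizing map, and $k$-nearest neighbors. In the experiments, the combination of Pearson correlation and the backpropagation neural network reached a recognition rate of 97.1%.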
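
The Pearson-correlation feature extraction step named in the abstract above can be illustrated with a minimal sketch. It assumes binary class labels and ranks genes by the absolute correlation between each gene's expression values and the labels; the function names, the toy data, and the choice of `top_k` are illustrative assumptions, not details taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant gene carries no class information
    return cov / math.sqrt(var_x * var_y)

def rank_genes(expression, labels, top_k):
    """Return indices of the top_k genes by |Pearson correlation| with the labels.

    expression: list of samples, each a list of gene expression values.
    labels: list of 0/1 class labels, one per sample (assumed encoding).
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((abs(pearson(column, labels)), g))
    # Highest score first; ties broken by gene index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]

# Made-up toy data: 4 samples x 3 genes. Gene 0 tracks the label,
# gene 1 is constant, gene 2 is only weakly related to the label.
toy_expression = [
    [1.0, 5.0, 0.3],
    [1.2, 5.0, 0.9],
    [3.1, 5.0, 0.4],
    [2.9, 5.0, 1.0],
]
toy_labels = [0, 0, 1, 1]
selected = rank_genes(toy_expression, toy_labels, top_k=2)
print(selected)
```

The selected gene indices would then be the inputs to whichever classifier is being compared.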

Sentence segmentation of KeyGraph using genetic algorithm (유전자 알고리즘을 이용한 KeyGraph 알고리즘의 데이터 분할)

  • Lee, Young-Seol;Cho, Sung-Bae
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2007.10c
    • /
    • pp.352-356
    • /
    • 2007
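  • KeyGraph is an algorithm for discovering rare events in data patterns: events that occur infrequently but influence human decision-making or upcoming changes. KeyGraph has been applied to earthquake prediction, papers, file search, and the extraction of important URLs. Cluster formation through data segmentation is one of the factors that most strongly affects KeyGraph's performance. This paper proposes a method that uses a genetic algorithm to find the data segmentation that best improves KeyGraph's performance. To show the feasibility of the proposed method, it is applied to visited-place data collected from mobile device users, and KeyGraph's performance is shown to improve.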

Design and Application of Genetic-Fuzzy System based on Grammatical Encoding (문법 코딩에 기반한 유전적 퍼지 시스템의 설계 및 응용)

  • Gil, Jun-Min;Go, Myeong-Suk;Hwang, Jong-Seon
    • Journal of KIISE:Software and Applications
    • /
    • v.28 no.1
    • /
    • pp.31-45
    • /
    • 2001
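  • When designing a fuzzy system, it is very important to select optimal fuzzy rules and to define fuzzy membership functions simply, without degrading the system's performance. To this end, this paper proposes a fuzzy rule structure that copes flexibly with growth of the input space by selecting as fuzzy rules only those that strongly affect the input space. In addition, to obtain an optimized fuzzy system structure through the evolutionary search of a genetic algorithm, the paper proposes a genetic-fuzzy system based on grammatical encoding, in which the grammar rules that generate the fuzzy system structure are encoded as individuals. Because grammar rules express the complex structure of fuzzy rules as a simple modular structure, encoding the grammar rules ensures fast convergence and efficient search by the genetic algorithm. The proposed method is applied to the Iris data problem, which has a large input space, and to a time series prediction problem to show its applicability and analyze its performance. In the experiments, the proposed method performed better than other design methods that use direct encoding.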

Prediction and Analysis of Charge Density Using Neural Network (신경망을 이용한 전하밀도의 예측과 해석)

  • Kwon, Sang-Hee;Hwang, Bo-Kwang;Lee, Kyu-Sang;Uh, Hyung-Soo;Kim, Byung-Whan
    • Proceedings of the Korean Institute of Electrical and Electronic Material Engineers Conference
    • /
    • 2007.11a
    • /
    • pp.111-112
    • /
    • 2007
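  • Silicon nitride (SiN) thin films were deposited by plasma-enhanced chemical vapor deposition (PECVD). The charge density of the SiN films was modeled using a generalized regression neural network and a genetic algorithm. The PECVD process was carried out according to a Box-Wilson experimental design. The effect of temperature under varying $SiH_4$ flow rate was negligible. However, the charge density rose sharply as the temperature increased at low power (or as the power increased at low temperature), which was interpreted as being due to an increase in [N-H]. The charge density decreased as the $SiH_4$ flow rate increased (or as the power increased at high temperature), which is understood to be due to an increase in [Si-H].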
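
For readers unfamiliar with the model named in the abstract above, a generalized regression neural network predicts by taking a Gaussian-kernel weighted average of the training targets, with a single spread parameter. The sketch below shows only that prediction step. The inputs and targets are invented, the parameter name `sigma` is an assumption, and the abstract does not say how the genetic algorithm was coupled to the network, so tuning is left out.

```python
import math

def grnn_predict(train_x, train_y, query, sigma):
    """Generalized regression neural network prediction for one query point.

    The output is the average of train_y weighted by a Gaussian kernel on
    the squared Euclidean distance between the query and each training input.
    """
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    total = sum(weights)
    if total == 0.0:
        # All kernels underflowed; fall back to the plain mean of the targets.
        return sum(train_y) / len(train_y)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Invented one-dimensional toy data, not measurements from the paper.
toy_x = [[0.0], [1.0]]
toy_y = [1.0, 3.0]
midpoint = grnn_predict(toy_x, toy_y, [0.5], sigma=1.0)   # equidistant query
near_first = grnn_predict(toy_x, toy_y, [0.0], sigma=0.1)  # query on a training point
print(midpoint, near_first)
```

A smaller `sigma` makes the prediction follow the nearest training points more closely, so the spread is the natural quantity for a search procedure to tune.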

The Brassica rapa Tissue-specific EST Database (배추의 조직 특이적 발현유전자 데이터베이스)

  • Yu, Hee-Ju;Park, Sin-Gi;Oh, Mi-Jin;Hwang, Hyun-Ju;Kim, Nam-Shin;Chung, Hee;Sohn, Seong-Han;Park, Beom-Seok;Mun, Jeong-Hwan
    • Horticultural Science & Technology
    • /
    • v.29 no.6
    • /
    • pp.633-640
    • /
    • 2011
  • Brassica rapa is an A genome model species for Brassica crop genetics, genomics, and breeding. With the completion of sequencing of the B. rapa genome, functional analysis of the genome is a forthcoming issue. Expressed sequence tags (ESTs) are fundamental resources supporting annotation and functional analysis of the genome, including identification of tissue-specific genes and promoters. As of July 2011, 147,217 ESTs from 39 cDNA libraries of B. rapa had been reported in the public database. However, little information can be retrieved from the sequences due to the lack of organized databases. To leverage the sequence information and to maximize the use of publicly available EST collections, the Brassica rapa tissue-specific EST database (BrTED) was developed. BrTED includes sequence information for 23,962 unigenes assembled by the StackPack program. The unigene set is used as a query unit for various analyses such as BLAST against the TAIR gene model, functional annotation using MIPS and UniProt, gene ontology analysis, and prediction of tissue-specific unigene sets based on statistical tests. The database is composed of two main units: an EST sequence processing and information retrieval unit, and a tissue-specific expression profile analysis unit. Information and data in both units are tightly interconnected through a web-based browsing system. RT-PCR evaluation of 29 selected unigene sets successfully amplified amplicons from the target tissues of B. rapa. BrTED allows the user to identify and analyze the expression of genes of interest and aids efforts to interpret the B. rapa genome through functional genomics. In addition, it can be used as a public resource providing reference information for studying the genus Brassica and other closely related crucifer crops.

Bio-marker Detector and Parkinson's disease diagnosis Approach based on Samples Balanced Genetic Algorithm and Extreme Learning Machine (균형 표본 유전 알고리즘과 극한 기계학습에 기반한 바이오표지자 검출기와 파킨슨 병 진단 접근법)

  • Sachnev, Vasily;Suresh, Sundaram;Choi, YongSoo
    • Journal of Digital Contents Society
    • /
    • v.17 no.6
    • /
    • pp.509-521
    • /
    • 2016
  • This paper presents a novel Samples Balanced Genetic Algorithm combined with an Extreme Learning Machine (SBGA-ELM) for Parkinson's disease (PD) diagnosis and bio-marker detection. The proposed approach uses expression data for 22,283 genes from the open-source ParkDB database for accurate PD diagnosis and bio-marker detection. The proposed SBGA-ELM includes two major steps: feature (gene) selection and classification. The feature selection procedure is based on the proposed Samples Balanced Genetic Algorithm, designed specifically for gene expression data from ParkDB. The proposed SBGA searches for a robust subset of genes among the 22,283 genes available in ParkDB for further analysis. In the classification step, the chosen set of genes is used to train an Extreme Learning Machine (ELM) classifier for accurate PD diagnosis. The discovered robust subset of genes yields an ELM classifier with stable generalization performance for PD diagnosis. In this research the robust subset of genes is also used to discover 24 bio-markers probably responsible for Parkinson's disease. The discovered robust subset of genes was verified using existing PD diagnosis approaches such as SVM and PBL-McRBFN. Both tested methods achieved maximum generalization performance.

Cancer subtype's classifier based on Hybrid Samples Balanced Genetic Algorithm and Extreme Learning Machine (하이브리드 균형 표본 유전 알고리즘과 극한 기계학습에 기반한 암 아류형 분류기)

  • Sachnev, Vasily;Suresh, Sundaram;Choi, Yong Soo
    • Journal of Digital Contents Society
    • /
    • v.17 no.6
    • /
    • pp.565-579
    • /
    • 2016
  • This paper presents a novel cancer subtype classifier based on a Hybrid Samples Balanced Genetic Algorithm with an Extreme Learning Machine (hSBGA-ELM). The proposed classifier uses expression data for 16,063 genes from the open Global Cancer Map (GCM) database for accurate cancer subtype classification. The proposed method efficiently classifies 14 subtypes of cancer (breast, prostate, lung, colorectal, lymphoma, bladder, melanoma, uterus, leukemia, renal, pancreas, ovary, mesothelioma, and CNS). The proposed hSBGA-ELM unifies the gene selection procedure and cancer subtype classification in one framework. The proposed Hybrid Samples Balanced Genetic Algorithm searches for a reduced, robust set of genes responsible for cancer subtype classification among the 16,063 genes available in the GCM database. The selected reduced set of genes is used to build a cancer subtype classifier using an Extreme Learning Machine (ELM). As a result, the reduced set of robust genes guarantees stable generalization performance of the proposed classifier. The proposed hSBGA-ELM discovers 95 genes probably responsible for cancer. Comparison with existing cancer subtype classifiers clearly indicates the efficiency of the proposed method.

Effects of Genetic and Environmental Factors on the Depression in Early Adulthood (초기 성인기 우울증에 대한 유전적, 환경적 요인의 영향)

  • Kim, Sie-Kyeong;Lee, Sang-Ick;Shin, Chul-Jin;Son, Jung-Woo;Eom, Sang-Yong;Kim, Heon
    • Korean Journal of Biological Psychiatry
    • /
    • v.15 no.1
    • /
    • pp.14-22
    • /
    • 2008
  • Objectives: The authors aimed to present data explaining the gene-environment interaction that causes depressive disorder by examining the effects of genetic factors related to the serotonin system and environmental factors such as stressful life events in early adulthood. Methods: The subjects were 150 young adults (mean age 25.0 ± 0.54), a subset of 534 freshmen who had completed a previous study genotyping the TPH1 gene. We assessed characteristics of life events and depression and anxiety scales, and checked whether the subjects had a depressive disorder using the DSM-IV SCID interview. Along with the TPH1 A218C genotype confirmed in the previous study, the TPH2 -1463G/A and 5HTR2A -1438A/G genes were genotyped using the SNaPshot™ method. Results: In comparison with the group without the C allele of the TPH1 gene, the number of life events had a significant effect on the probability of depressive disorder in the group with the C allele. Other alleles or genotypes did not have a significant effect on the relationship between life events and depressive disorder. Conclusion: The results of this study suggest that the TPH1 C allele is a significant predictor of the onset of depressive disorder following environmental stress. This means that the TPH1 gene may affect the gene-environment interaction in depressive disorder.

Association of a c.1084A>G (p.Thr362Ala) Variant in the DCTN4 Gene with Wilson Disease

  • Lee, Robin Dong-Woo;Kim, Jae-Jung;Kim, Joo-Hyun;Lee, Jong-Keuk;Yoo, Han-Wook
    • Journal of Genetic Medicine
    • /
    • v.8 no.1
    • /
    • pp.53-57
    • /
    • 2011
  • Purpose: Wilson disease is an autosomal recessive disorder that causes excessive copper accumulation in the hepatic region. To date, ATP7B is the only gene known to cause Wilson disease. However, ATP7B mutations have not been found in ~15% of patients. This study was performed to identify any causative gene in Wilson disease patients without an ATP7B mutation in either allele. Materials and Methods: The sequences of the coding regions and exon-intron boundaries of five ATP7B-interacting genes, ATOX1, COMMD1, GLRX, DCTN4, and ZBTB16, were analyzed in 12 patients with Wilson disease. Results: Three nonsynonymous variants, including c.1084A>G (p.Thr362Ala) in exon 12 of the DCTN4 gene, were identified in the patients examined. Among these, only p.Thr362Ala was predicted by in silico analysis to be possibly damaging to protein function. Examination of the allele frequency of the c.1084A>G (p.Thr362Ala) variant in 176 patients with Wilson disease and in 414 normal subjects revealed that the variant was more prevalent in the Wilson disease patients (odds ratio [OR]=3.14, 95% confidence interval=1.36-7.22, P=0.0094). Conclusion: Our results suggest that c.1084A>G (p.Thr362Ala) in the ATP7B-interacting DCTN4 gene may be associated with the pathogenesis of Wilson disease.
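
An odds ratio and its 95% confidence interval, as reported in the abstract above, are commonly computed from a 2x2 table of allele counts. The sketch below uses the Woolf (log-scale) interval, which is an assumption: the abstract does not state which method the authors used. The counts are hypothetical and do not reproduce the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: case alleles carrying the variant,    b: case alleles without it
    c: control alleles carrying the variant, d: control alleles without it
    z: normal quantile; 1.96 gives an approximate 95% interval.
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only; they are not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR={or_value:.2f}, 95% CI={ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound is above 1, as in the reported 1.36-7.22, indicates that the variant is more frequent among cases at the chosen confidence level.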

Validation of diacylglycerol O-acyltransferase1 gene effect on milk yield using Bayesian regression (베이지안 회귀를 이용한 국내 홀스타인 젖소의 유량형질 관련 DGAT1유전자 효과 검증)

  • Cho, Kwang-Hyun;Cho, Chung-Il;Park, Kyong-Do;Lee, Joon-Ho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.6
    • /
    • pp.1249-1258
    • /
    • 2015
  • The DGAT1 (diacylglycerol O-acyltransferase 1) gene is well known as a major gene for milk production in dairy cattle. This study was conducted to investigate how the effect of the DGAT1 gene on milk yield appears in a genome-wide association (GWA) analysis using a high-density whole-genome SNP chip. The data set consisted of 353 Korean Holstein sires with 50k SNP genotypes and deregressed estimated breeding values for milk yield. After quality control, 41,051 SNPs were selected and their chromosomal locations were mapped using UMD 3.1. Bayesian regression with the BayesB method (pi=0.99) was used to estimate the SNP effects and genomic breeding values. The percentages of variance explained by 1 Mb non-overlapping windows were calculated to detect the QTL region. As a result, the windows ranked first and third among 2,516 windows were located around the DGAT1 gene region, and these two windows explained 0.51% and 0.48% of the genetic variance, respectively. Although SNPs in the DGAT1 gene region are excluded from the commercial 50k SNP chip, the effect of the DGAT1 gene seems to be reflected in the GWA by SNPs that are in linkage disequilibrium with the DGAT1 gene.
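
The window-based summary described in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each SNP contributes 2p(1-p)a² to the genetic variance (allele frequency p, estimated effect a), ignores covariance between SNPs in linkage disequilibrium, and uses invented SNP records.

```python
from collections import defaultdict

WINDOW_BP = 1_000_000  # 1 Mb non-overlapping windows, as in the abstract

def window_variance_percent(snps):
    """Percentage of total SNP variance falling in each 1 Mb window.

    snps: iterable of (chromosome, position_bp, allele_freq, effect) tuples.
    Returns a dict mapping (chromosome, window_index) to a percentage.
    """
    window_var = defaultdict(float)
    for chrom, pos, p, effect in snps:
        window = (chrom, pos // WINDOW_BP)
        # Simplifying assumption: per-SNP variance 2p(1-p)a^2, no LD covariance.
        window_var[window] += 2.0 * p * (1.0 - p) * effect ** 2
    total = sum(window_var.values())
    if total == 0.0:
        return {w: 0.0 for w in window_var}
    return {w: 100.0 * v / total for w, v in window_var.items()}

# Invented SNP records for illustration; positions and effects are not real.
toy_snps = [
    (14, 1_800_000, 0.5, 0.2),
    (14, 1_900_000, 0.5, 0.1),
    (14, 5_200_000, 0.5, 0.1),
    (3, 400_000, 0.5, 0.1),
]
percent = window_variance_percent(toy_snps)
top_window = max(percent, key=percent.get)
print(top_window, round(percent[top_window], 2))
```

Ranking the windows by this percentage is what lets a region such as the one around DGAT1 stand out even when the causal SNPs themselves are not on the chip.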