• Title/Abstract/Keywords: Dimensionality Reduction

204 search results (processing time: 0.028 s)

Power and major gene-gene identification of dummy multifactor dimensionality reduction algorithm

  • 여정수;라부미;이호근;이성원;이제영
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 24, No. 2
    • /
    • pp.277-287
    • /
    • 2013
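  • In genome-wide association studies, identifying gene-gene interactions is very important, and many studies on identifying them have been conducted recently. One such method is dummy multifactor dimensionality reduction. The purpose of this study is to evaluate, through simulation, the power of dummy multifactor dimensionality reduction for detecting gene-gene interactions. The method was also applied to a Hanwoo (Korean cattle) population to identify interaction effects of single nucleotide polymorphisms on economic traits.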

Support vector machine and multifactor dimensionality reduction for detecting major gene interactions of continuous data

  • 이제영;이종형
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 21, No. 6
    • /
    • pp.1271-1280
    • /
    • 2010
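  • Traditional statistical methods have been used to identify interactions among genes that affect human diseases and livestock traits, but they are not well suited to high-dimensional data such as genetic data. Multifactor dimensionality reduction (MDR) was proposed in response. MDR is a nonparametric method that requires no model assumptions and can be applied to binary data, but it cannot be applied to continuous data. In this study, continuous data were therefore processed with the support vector machine (SVM) algorithm, which has strong generalization performance in classification, and then passed to MDR. In addition, SVM-based MDR was applied to real economic traits of Hanwoo, which are continuous data, for six candidate single nucleotide polymorphisms on Hanwoo chromosome 6, identifying the superior gene-interaction combinations associated with those traits.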
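
For orientation, the cell-labeling step of classic MDR on binary (case/control) data can be sketched as below. This is only an illustration of the standard binary method, not the SVM-based variant described in the abstract; the function names, the 0/1/2 genotype coding, and the toy data are assumptions introduced here.

```python
from collections import defaultdict

def mdr_label_cells(genotypes, status):
    """Label each multi-locus genotype combination as high risk (1) or low risk (0).

    genotypes: list of tuples, one genotype code per selected SNP (hypothetical coding)
    status:    list of 0 (control) / 1 (case), same length as genotypes
    Assumes the data contain at least one control.
    """
    cases = defaultdict(int)
    controls = defaultdict(int)
    for g, s in zip(genotypes, status):
        if s == 1:
            cases[g] += 1
        else:
            controls[g] += 1
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls  # overall case:control ratio
    cells = set(cases) | set(controls)
    # A cell is high risk when its case:control ratio exceeds the overall ratio.
    return {g: 1 if cases[g] > threshold * controls[g] else 0 for g in cells}

def mdr_predict(labels, genotypes):
    """Collapse the multi-locus genotypes into a single binary attribute."""
    return [labels.get(g, 0) for g in genotypes]

# Toy data with a pure interaction effect between two loci.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
status = [1, 1, 0, 0, 0, 0, 1, 1]
labels = mdr_label_cells(genotypes, status)
predicted = mdr_predict(labels, genotypes)
```

The many genotype combinations are reduced to one high/low-risk variable, which is the "dimensionality reduction" in the method's name.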

Human Action Recognition Based on 3D Human Modeling and Cyclic HMMs

  • Ke, Shian-Ru;Thuc, Hoang Le Uyen;Hwang, Jenq-Neng;Yoo, Jang-Hee;Choi, Kyoung-Ho
    • ETRI Journal
    • /
    • Vol. 36, No. 4
    • /
    • pp.662-672
    • /
    • 2014
  • Human action recognition is used in areas such as surveillance, entertainment, and healthcare. This paper proposes a system to recognize both single and continuous human actions from monocular video sequences, based on 3D human modeling and cyclic hidden Markov models (CHMMs). First, for each frame in a monocular video sequence, the 3D coordinates of the joints of a human object, through actions of multiple cycles, are extracted using 3D human modeling techniques. The 3D coordinates are then converted into a set of geometrical relational features (GRFs) to reduce dimensionality and increase discrimination. For further dimensionality reduction, k-means clustering is applied to the GRFs to generate clustered feature vectors. These vectors are used to train CHMMs separately for different types of actions, based on the Baum-Welch re-estimation algorithm. For recognition of continuous actions that are concatenated from several distinct types of actions, a designed graphical model is used to systematically concatenate the separately trained CHMMs. The experimental results show that the proposed system performs effectively in both single and continuous action recognition problems.

Investigation of gene-gene interactions of clock genes for chronotype in a healthy Korean population

  • Park, Mira;Kim, Soon Ae;Shin, Jieun;Joo, Eun-Jeong
    • Genomics & Informatics
    • /
    • Vol. 18, No. 4
    • /
    • pp.38.1-38.9
    • /
    • 2020
  • Chronotype is an important moderator of psychiatric illnesses and seems to be controlled in part by genetic factors. Clock genes are the most relevant genes for chronotype. In addition to the roles of individual genes, gene-gene interactions of clock genes substantially contribute to chronotype. We investigated genetic associations and gene-gene interactions of the clock genes BHLHB2, CLOCK, CSNK1E, NR1D1, PER1, PER2, PER3, and TIMELESS for chronotype in 1,293 healthy Korean individuals. Regression analysis was conducted to find associations between single nucleotide polymorphisms (SNPs) and chronotype. For gene-gene interaction analyses, the quantitative multifactor dimensionality reduction (QMDR) method, a nonparametric model-free method for quantitative phenotypes, was used. No individual SNP or haplotype showed a significant association with chronotype in either the regression analysis or the single-locus QMDR model. QMDR analysis identified NR1D1 rs2314339 and TIMELESS rs4630333 as the best SNP pair among two-locus interaction models associated with chronotype (cross-validation consistency [CVC] = 8/10, p = 0.041). For the three-locus interaction model, the SNP combination of NR1D1 rs2314339, TIMELESS rs4630333, and PER3 rs228669 showed the best results (CVC = 4/10, p < 0.001). However, because the mean differences between genotype combinations were minor, the clinical roles of clock gene interactions are unlikely to be critical.
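
As a rough sketch of how QMDR is commonly described for quantitative traits, each genotype combination is labeled "high" or "low" by comparing its mean trait value with the overall mean. The abstract does not give these details, so the labeling rule, the function name, and the toy data below are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

def qmdr_label_cells(genotypes, trait):
    """Label each genotype combination 1 ("high") or 0 ("low").

    genotypes: list of tuples, one genotype code per selected SNP (hypothetical coding)
    trait:     list of quantitative phenotype values, same length as genotypes
    """
    overall = mean(trait)
    cells = defaultdict(list)
    for g, y in zip(genotypes, trait):
        cells[g].append(y)
    # A cell is "high" when its mean trait value exceeds the overall mean.
    return {g: 1 if mean(ys) > overall else 0 for g, ys in cells.items()}

# Toy example with two loci and six individuals.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
trait = [10, 12, 18, 20, 30, 32]
labels = qmdr_label_cells(genotypes, trait)
```

The resulting high/low grouping is then usually scored by comparing trait values between the two groups, for example with a two-sample test statistic inside cross-validation.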

Classification of Imbalanced Data Based on MTS-CBPSO Method: A Case Study of Financial Distress Prediction

  • Gu, Yuping;Cheng, Longsheng;Chang, Zhipeng
    • Journal of Information Processing Systems
    • /
    • Vol. 15, No. 3
    • /
    • pp.682-693
    • /
    • 2019
  • Traditional classification methods mostly assume that the class distribution of the data is balanced, whereas imbalanced data are common in the real world, so classification with imbalanced data is an important problem. In the Mahalanobis-Taguchi system (MTS) algorithm, the classification model is constructed from a reference space and a measurement reference scale that come from a single normal group, which makes it suitable for imbalanced data. In this paper, an improved method, MTS-CBPSO, is constructed by introducing chaotic mapping and the binary particle swarm optimization algorithm, instead of the orthogonal array and signal-to-noise ratio (SNR), to select the valid variables, with G-means, F-measure, and dimensionality reduction as the classification optimization targets. The proposed method is also applied to the financial distress prediction of Chinese listed companies. Compared with the traditional MTS and common classification methods such as SVM, C4.5, and k-NN, the MTS-CBPSO method is shown to give better prediction accuracy and dimensionality reduction.
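
The two imbalance-aware metrics named in the abstract can be computed from a binary confusion matrix as sketched below, using their usual definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the weighted harmonic mean of precision and recall). The function name, the `beta` parameter, and the example counts are assumptions, not values from the paper.

```python
import math

def g_mean_and_f_measure(tp, fn, fp, tn, beta=1.0):
    """Return (G-mean, F-measure) for the positive (minority) class.

    tp, fn, fp, tn: counts from a binary confusion matrix
    beta:           weight of recall relative to precision (1.0 gives F1)
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    precision = tp / (tp + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    f_measure = ((1 + beta ** 2) * precision * sensitivity
                 / (beta ** 2 * precision + sensitivity))
    return g_mean, f_measure

# Hypothetical imbalanced test set: 10 positives, 90 negatives.
g, f = g_mean_and_f_measure(tp=8, fn=2, fp=10, tn=80)
accuracy = (8 + 80) / 100
```

In this made-up example accuracy is 0.88 while the F-measure is much lower, which is why such metrics are preferred over plain accuracy when classes are imbalanced.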

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 10
    • /
    • pp.1-10
    • /
    • 2022
  • Software defect prediction (SDP) is currently one of the most active research areas in software engineering. Early detection of defects lowers the cost of software and also improves reliability. Machine learning techniques are widely used to create SDP models based on programming measures. The majority of defect prediction models in the literature have problems with class imbalance and high dimensionality. In this paper, we propose the Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique, which considers dataset distribution characteristics to generate symmetry between defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed by using Ant Colony Optimization (ACO) to choose relevant features. We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository, and seven performance measures to evaluate them. The results show that the proposed CNNCIR method with ACO-based feature selection outperforms SMOTE in the majority of cases.

Machine Learning-based Classification of Hyperspectral Imagery

  • Haq, Mohd Anul;Rehman, Ziaur;Ahmed, Ahsan;Khan, Mohd Abdul Rahim
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 4
    • /
    • pp.193-202
    • /
    • 2022
  • The classification of hyperspectral imagery (HSI) is essential in Earth surface observation. Because of its large number of contiguous bands, HSI data provide rich information about the object of study; however, they suffer from the curse of dimensionality. Dimensionality reduction is an essential aspect of machine learning classification. Algorithms based on feature extraction can overcome the data dimensionality issue, thereby allowing classifiers to use comprehensive models at reduced computational cost. This paper assesses and compares three HSI classification techniques. The first is based on the Joint Spatial-Spectral Stacked Autoencoder (JSSSA) method, the second is based on a shallow Artificial Neural Network (SNN), and the third uses the SVM model. The JSSSA technique performs better than the SNN classification technique in terms of overall accuracy and Kappa coefficient. The JSSSA-based method surpasses the SNN technique with an overall accuracy of 96.13% and a Kappa coefficient of 0.95. SNN also achieved a good accuracy of 92.40% and a Kappa coefficient of 0.90, and SVM achieved an accuracy of 82.87%. The study suggests that both the JSSSA- and SNN-based techniques are efficient methods for hyperspectral classification of snow features. This work classified labeled/ground-truth snow datasets into multiple classes. The labeled/ground-truth data can be valuable for applying deep neural networks such as CNN, hybrid CNN, and RNN to glaciology and snow-related hazard applications.
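
The two evaluation measures quoted in the abstract, overall accuracy and Cohen's Kappa coefficient, can both be derived from a confusion matrix. The sketch below uses the standard definitions; the function name and the 2x2 example matrix are assumptions and are unrelated to the paper's data.

```python
def overall_accuracy_and_kappa(confusion):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(n)) / total  # overall accuracy
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(row[j] for row in confusion) for j in range(n)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical two-class confusion matrix.
accuracy, kappa = overall_accuracy_and_kappa([[45, 5], [10, 40]])
```

Kappa discounts the agreement that would occur by chance, so it is typically lower than the overall accuracy for the same classifier.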

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders

  • 박노진;고한석
    • Journal of Korea Multimedia Society
    • /
    • Vol. 23, No. 1
    • /
    • pp.1-7
    • /
    • 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional clustering methods such as stacked auto-encoder based k-means are not effective, since they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality effectively using a convolutional auto-encoder, which captures and reflects the local information, and then accurately clusters similar data samples using a hierarchical clustering approach. The experimental results confirm that the proposed model improves the clustering results in terms of clustering accuracy and normalized mutual information.

Clustering Performance Analysis for Time Series Data: Wavelet vs. Autoencoder

  • 황우성;임효상
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2018 Fall Conference
    • /
    • pp.585-588
    • /
    • 2018
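  • When features of time series data are extracted and analyzed, the high dimensionality of the data makes it difficult to find the valid information in them, owing to the curse of dimensionality. Dimensionality reduction techniques are widely used to address this problem, but the dilution of information that occurs during reduction changes the performance of tasks such as clustering of time series data. To observe this effect, this paper uses the discrete wavelet transform (DWT) and an autoencoder as dimensionality reduction techniques to compress time series data, then applies the K-means algorithm to the compressed data and compares clustering effectiveness. The comparison shows that DWT gives good clustering performance when an adequate number of compressed dimensions is ensured, and the autoencoder does so when sufficient training on the time series data is ensured.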
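
To illustrate how a DWT can act as a dimensionality reduction step, the sketch below keeps only the approximation coefficients of a Haar transform, halving the series length at each level. The abstract does not say which wavelet was used, so the Haar choice, the function name, and the sample series are assumptions.

```python
import math

def haar_approximation(series, levels=1):
    """Return the Haar DWT approximation coefficients after `levels` levels.

    Each level replaces adjacent pairs (a, b) with (a + b) / sqrt(2),
    so the length is halved; detail coefficients are discarded.
    """
    coeffs = list(series)
    for _ in range(levels):
        if len(coeffs) % 2 != 0:
            raise ValueError("series length must be divisible by 2 at every level")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
                  for i in range(0, len(coeffs), 2)]
    return coeffs

# An 8-point series compressed to 2 coefficients with a 2-level transform.
compressed = haar_approximation([1, 1, 3, 3, 5, 5, 7, 7], levels=2)
```

The compressed vectors would then be passed to a clustering algorithm such as K-means; more levels give fewer dimensions but discard more detail.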

Datawise Discriminant Analysis For Feature Extraction

  • 박명수;최진영
    • Journal of Korean Institute of Intelligent Systems
    • /
    • Vol. 19, No. 1
    • /
    • pp.90-95
    • /
    • 2009
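  • This paper proposes a new feature extraction algorithm that addresses the problems of linear discriminant analysis (LDA), a feature extraction algorithm widely used for linear dimensionality reduction. The scatter matrices in LDA, which are based on mean-to-data and mean-to-mean distances, lead to computational problems and limit the number of features that can be extracted, because of matrix inversion and rank limitations. LDA also assumes that each class of data is drawn from a unimodal normal distribution, and it does not give adequate results otherwise. This paper defines a new matrix that is based on data-to-data distances and carries appropriate weights, and proposes a feature extraction method based on it, in an attempt to resolve these problems of LDA. The performance of the proposed method was verified through experiments.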
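
For context, the limitations mentioned in the abstract follow from the standard LDA formulation, written out below. The notation is an assumption introduced here (C classes, class c with sample set D_c, n_c samples and mean mu_c, overall mean mu); it covers only classical LDA, not the matrix proposed in the paper.

```latex
% Within-class scatter (mean-to-data distances)
S_W = \sum_{c=1}^{C} \sum_{x_i \in \mathcal{D}_c} (x_i - \mu_c)(x_i - \mu_c)^{\top}

% Between-class scatter (mean-to-mean distances)
S_B = \sum_{c=1}^{C} n_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top}

% LDA projection: leading eigenvectors of S_W^{-1} S_B
W^{*} = \arg\max_{W} \frac{\lvert W^{\top} S_B W \rvert}{\lvert W^{\top} S_W W \rvert}

% Rank limit on the between-class scatter
\operatorname{rank}(S_B) \le C - 1
```

Solving for the projection requires inverting S_W, which fails when S_W is singular, and because S_B has rank at most C - 1, classical LDA can extract at most C - 1 features.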