Search | Korea Science

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

Kim, Dong-Sung
- Korean Journal of Cognitive Science / v.23 no.3 / pp.295-322 / 2012
In this paper, we cluster similar word senses by applying the vector space model and HAL (Hyperspace Analog to Language). HAL measures correlation among words within a context window of a fixed size (Lund and Burgess 1996). The similarity between a word pair is measured by cosine similarity based on the vector space model, which reduces the distortion of the space between high-frequency and low-frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce the large number of dimensions produced by the similarity matrix. For sense similarity clustering, we adopt supervised and unsupervised learning methods. For the unsupervised method, we use clustering. For the supervised methods, we use SVM (Support Vector Machine), the Naive Bayes classifier, and the Maximum Entropy method.
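
The pairing of HAL context vectors with cosine similarity described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names `hal_vectors` and `cosine`, the window size of 2, and the linear distance weighting are all assumptions chosen for brevity.

```python
import math
from collections import defaultdict

def hal_vectors(tokens, window=2):
    """Build HAL-style co-occurrence vectors (simplified, assumed weighting).

    A context word at distance d (1 <= d <= window) contributes weight
    window - d + 1. Left and right contexts are kept as separate dimensions.
    """
    vecs = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d < 0:
                break
            ctx = tokens[i - d]
            weight = window - d + 1
            vecs[word][("L", ctx)] += weight  # ctx appears to the left of word
            vecs[ctx][("R", word)] += weight  # word appears to the right of ctx
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

tokens = "the cat sat on the mat the dog sat on the rug".split()
vectors = hal_vectors(tokens, window=2)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))
print(round(cosine(vectors["cat"], vectors["on"]), 3))
```

Because cosine similarity normalizes by vector length, words that occur in similar contexts score high regardless of how often each one occurs, which is the frequency-distortion point the abstract makes.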

A Non-linear Variant of Improved Robust Fuzzy PCA (잡음 민감성이 향상된 주성분 분석 기법의 비선형 변형)

Heo, Gyeong-Yong;Seo, Jin-Seok;Lee, Im-Geun
- Journal of the Korea Society of Computer and Information / v.16 no.4 / pp.15-22 / 2011
Principal component analysis (PCA) is a well-known method for dimensionality reduction and feature extraction that retains most of the variation in the data. Although PCA has been applied successfully in many areas, it is sensitive to outliers and only valid for Gaussian distributions. Several variants of PCA have been proposed to resolve the noise sensitivity and, among these variants, improved robust fuzzy PCA (RF-PCA2) has demonstrated promising results. RF-PCA2, however, is still a linear algorithm that cannot accommodate non-Gaussian distributions. In this paper, a non-linear algorithm that combines RF-PCA2 and kernel PCA (K-PCA), called improved robust kernel fuzzy PCA (RKF-PCA2), is introduced. The kernel method allows it to accommodate non-Gaussian distributions. RKF-PCA2 inherits noise robustness from RF-PCA2 and non-linearity from K-PCA. RKF-PCA2 outperforms previous methods in handling non-Gaussian distributions in a noise-robust way. Experimental results also support this.
https://doi.org/10.9708/jksci.2011.16.4.015

Binary classification by the combination of Adaboost and feature extraction methods (특징 추출 알고리즘과 Adaboost를 이용한 이진분류기)

Ham, Seaung-Lok;Kwak, No-Jun
- Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.4 / pp.42-53 / 2012
In the pattern recognition and machine learning community, classification is a classical problem and the most widely researched area. Adaptive boosting, also known as Adaboost, has been successfully applied to binary classification problems. It is a boosting algorithm capable of constructing a strong classifier through a weighted combination of weak classifiers. On the other hand, the PCA and LDA algorithms are the most popular linear feature extraction methods, used mainly for dimensionality reduction. In this paper, the combination of Adaboost and feature extraction methods is proposed for efficient classification of two-class data. Conventionally, in classification problems, the roles of feature extraction and classification have been distinct, i.e., a feature extraction method and a classifier are applied sequentially to classify input variables into several categories. In this paper, these two steps are combined into one, resulting in good classification performance. More specifically, each projection vector is treated as a weak classifier in the Adaboost algorithm to constitute a strong classifier for binary classification problems. The proposed algorithm was applied to the UCI and FRGC datasets and showed better recognition rates than the sequential application of feature extraction and classification methods.
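
The idea of treating each projection vector as a weak classifier inside Adaboost can be sketched roughly as below. This is an assumption-laden illustration rather than the authors' algorithm: the candidate `directions` are supplied by hand (in the paper they would come from PCA or LDA), each weak classifier is a simple threshold on the one-dimensional projection, and all function names are hypothetical.

```python
import math

def project(x, direction):
    """Dot product of a sample with a projection vector."""
    return sum(a * b for a, b in zip(x, direction))

def best_stump(X, y, w, direction):
    """Lowest weighted-error threshold/sign on the projection X . direction."""
    proj = [project(x, direction) for x in X]
    best = None
    for thr in sorted(set(proj)):
        for sign in (1, -1):
            preds = [sign if p >= thr else -sign for p in proj]
            err = sum(wi for wi, pi, yi in zip(w, preds, y) if pi != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(X, y, directions, rounds=5):
    """Discrete AdaBoost; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        # Pick the projection vector whose stump has the lowest weighted error.
        cands = [(best_stump(X, y, w, d), d) for d in directions]
        (err, thr, sign), d = min(cands, key=lambda c: c[0][0])
        err = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, d, thr, sign))
        # Re-weight: misclassified samples gain weight, then normalize.
        preds = [sign if project(x, d) >= thr else -sign for x in X]
        w = [wi * math.exp(-alpha * yi * pi) for wi, yi, pi in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all weak classifiers."""
    score = sum(alpha * (sign if project(x, d) >= thr else -sign)
                for alpha, d, thr, sign in model)
    return 1 if score >= 0 else -1

X = [(0, 0), (1, 0), (0, 1), (3, 3), (4, 3), (3, 4)]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, directions=[(1, 0), (0, 1), (1, 1)])
print([predict(model, x) for x in X])
```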

Polymorphisms of XRCC1 and XRCC2 DNA Repair Genes and Interaction with Environmental Factors Influence the Risk of Nasopharyngeal Carcinoma in Northeast India

Singh, Seram Anil;Ghosh, Sankar Kumar
- Asian Pacific Journal of Cancer Prevention / v.17 no.6 / pp.2811-2819 / 2016
Multiple genetic and environmental factors have been reported to play a key role in the development of nasopharyngeal carcinoma (NPC). Here, we investigated interactions of XRCC1 Arg399Gln and XRCC2 Arg188His polymorphisms and environmental factors in modulating susceptibility to NPC in Northeast India. One hundred NPC patients, 90 first-degree relatives of patients and 120 controls were enrolled in the study. XRCC1 Arg399Gln and XRCC2 Arg188His polymorphisms were determined using PCR-RFLP, and the results were confirmed by DNA sequencing. Logistic regression (LR) and multifactor dimensionality reduction (MDR) approaches were applied for statistical analysis. The XRCC1 Gln/Gln genotype showed increased risk (OR=2.76; P<0.024) of NPC. However, individuals with both XRCC1 and XRCC2 polymorphic variants had a 3.2-fold elevated risk (P<0.041). An enhanced risk of NPC was also observed in smoked meat consumers (OR=4.07; P=0.004), fermented fish consumers (OR=4.34; P=0.001), and tobacco-betel quid chewers (OR=7.00; P=0.0001) carrying XRCC1 polymorphic variants. However, smokers carrying the defective XRCC1 gene showed the highest risk (OR=7.47; P<0.0001). On MDR analysis, the best model for NPC risk was the five-factor combination of XRCC1 variant genotype, fermented fish, smoked meat, smoking and chewing (CVC=10/10; TBA=0.636; P<0.0001), whereas in interaction entropy graphs, smoked meat and tobacco chewing showed synergistic interactions with XRCC1. These findings suggest that the interaction of genetic and environmental factors might increase susceptibility to NPC in Northeast Indian populations.

A Feature Selection Method Based on Fuzzy Cluster Analysis (퍼지 클러스터 분석 기반 특징 선택 방법)

Rhee, Hyun-Sook
- The KIPS Transactions:PartB / v.14B no.2 / pp.135-140 / 2007
Feature selection is a preprocessing technique commonly used on high-dimensional data. It studies how to select a subset or list of attributes used to construct models that describe the data. Feature selection methods attempt to explore the intrinsic properties of data by employing statistics or information theory. Recent developments have involved approaches such as correlation methods, dimensionality reduction, and mutual information techniques. Feature selection has become the focus of much research in application areas with massive and complex data sets. In this paper, we provide a feature selection method that considers data characteristics and generalization capability. It provides a computational approach to feature selection based on fuzzy cluster analysis of its attribute values and its performance measures. We apply it to a system for classifying computer viruses and compare it with a heuristic method using the contrast concept. Experimental results show that the proposed approach can produce a feature ranking, select features, and improve system performance.
https://doi.org/10.3745/KIPSTB.2007.14-B.2.135

Hydrodynamic Aspects on Three-dimensional Effects of Vertical-axis Tidal Stream Turbine (조류발전용 수직축 터빈의 유체동력학적 3차원 효과에 관한 연구)

Hyun, B.S.;Lee, J.K.
- Journal of the Korean Society for Marine Environment & Energy / v.16 no.2 / pp.61-70 / 2013
The hydrodynamic aspects of three-dimensional effects were investigated in this study for simple and convenient conversion of tidal stream energy using a Vertical-Axis Turbine (VAT). A numerical approach was used to reveal the differences in flow physics between 2-D estimation and rigorous 3-D simulation. The 3-D effects were shown to be dominant, mainly due to the variation of tip vortices around the tip region of the rotor blade, which causes a loss of lift for a steadily translating hydrofoil and a reduction of torque for a rotating turbine blade. The 3-D effect was found to be rather prominent for the typical VATs considered in this paper. A simple yet efficient 2-D approach with a correction for three-dimensionality was also proposed for the practical design and analysis of VATs.
https://doi.org/10.7846/JKOSMEE.2013.16.2.61

Random Forest Based Abnormal ECG Dichotomization using Linear and Nonlinear Feature Extraction (선형-비선형 특징추출에 의한 비정상 심전도 신호의 랜덤포레스트 기반 분류)

Kim, Hye-Jin;Kim, Byeong-Nam;Jang, Won-Seuk;Yoo, Sun-K.
- Journal of Biomedical Engineering Research / v.37 no.2 / pp.61-67 / 2016
This paper presents a random forest based method for arrhythmia classification using both heart rate (HR) and heart rate variability (HRV) features. We analyzed the MIT-BIH arrhythmia database, which contains half-hour ECG recordings from 48 subjects. This study included not only linear features but also non-linear features to improve classification performance. We classified abnormal ECG using mean_NN (mean of heart rate), SD1/SD2 (geometrical feature of the Poincaré HRV plot), SE (spectral entropy), and pNN100 (percentage of a heart rate longer than 100 ms), the features among the combined linear and nonlinear features that affect accurate classification. We compared our proposed method with neural networks to evaluate the accuracy of the algorithm. When the features extracted from the HRV were used as input variables for the classifier, random forest, unlike the neural networks, used only the variables that contributed most to classification. This characteristic of random forest enables dimensionality reduction of the input variables, increases the efficiency of the classifier, and yields faster results with 11.1% higher accuracy than the neural networks.
https://doi.org/10.9718/JBER.2016.37.2.61
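
Three of the features named in this abstract (mean_NN, SD1/SD2 and a pNN-style percentage) can be computed from a list of NN intervals as in the sketch below. It uses the commonly cited Poincaré identities SD1² = Var(ΔNN)/2 and SD2² = 2·Var(NN) − Var(ΔNN)/2, and it assumes the conventional pNNx definition (percentage of successive interval differences larger than x ms). Whether the paper uses exactly these definitions is not stated in the abstract, and the function name `hrv_features` is hypothetical.

```python
import math
from statistics import mean, pvariance

def hrv_features(nn_ms, threshold_ms=100.0):
    """Compute simple HRV features from NN intervals given in milliseconds.

    Assumed definitions (not confirmed by the abstract):
      SD1^2 = Var(diff) / 2
      SD2^2 = 2 * Var(NN) - Var(diff) / 2
      pNN   = % of successive differences with |diff| > threshold_ms
    """
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]
    var_nn = pvariance(nn_ms)
    var_diff = pvariance(diffs)
    sd1 = math.sqrt(var_diff / 2.0)
    # Guard against tiny negative values caused by rounding.
    sd2 = math.sqrt(max(2.0 * var_nn - var_diff / 2.0, 0.0))
    pnn = 100.0 * sum(abs(d) > threshold_ms for d in diffs) / len(diffs)
    return {
        "mean_NN": mean(nn_ms),
        "SD1": sd1,
        "SD2": sd2,
        "SD1/SD2": sd1 / sd2 if sd2 else float("inf"),
        "pNN": pnn,
    }

features = hrv_features([800, 810, 790, 1000, 805])
print(features["mean_NN"], features["pNN"])
```

A feature dictionary like this would then be one row of the input to a classifier such as a random forest; the spectral entropy feature is omitted here because it needs a spectral estimate of the interval series.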

Data anomaly detection and Data fusion based on Incremental Principal Component Analysis in Fog Computing

Yu, Xue-Yong;Guo, Xin-Hui
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.10 / pp.3989-4006 / 2020
Intelligent agriculture monitoring is based on the perception and analysis of environmental data, which enables the monitoring of the production environment and the control of environmental regulation equipment. As the scale of the application continues to expand, a large amount of data will be generated from the perception layer and uploaded to the cloud service, which will bring challenges of insufficient bandwidth and processing capacity. A fog-based hybrid data analysis architecture is proposed in this paper, which combines offline and real-time analysis to enable real-time data processing on resource-constrained IoT devices. Furthermore, we propose a data processing algorithm based on incremental principal component analysis, which can achieve data dimensionality reduction and update the principal components. We also introduce the concept of the Squared Prediction Error (SPE) value and realize anomaly detection through the combination of the SPE value and a data fusion algorithm. To ensure the accuracy and effectiveness of the algorithm, we design a regular-SPE hybrid model update strategy, which enables the principal components to be updated on demand when data anomalies are found. In addition, this strategy can significantly reduce the growth in resource consumption caused by the data analysis architecture. Simulations based on practical datasets have confirmed that the proposed algorithm can perform data fusion and exception processing in real time on resource-constrained devices. Our model update strategy can reduce the overall system resource consumption while ensuring the accuracy of the algorithm.
https://doi.org/10.3837/tiis.2020.10.004
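
The Squared Prediction Error mentioned in this abstract is the squared distance between a sample and its reconstruction from the retained principal components. The sketch below illustrates it with a batch PCA that keeps only the first component, found by power iteration. It does not implement the incremental update or the regular-SPE hybrid strategy from the paper, and the function names and the single-component choice are assumptions.

```python
import math

def fit_first_pc(rows, iters=200):
    """Return (mean, first principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # assumed start vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mu, v

def spe(x, mu, v):
    """Squared prediction error of x after projecting onto direction v."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    score = sum(ci * vi for ci, vi in zip(c, v))
    residual = [ci - score * vi for ci, vi in zip(c, v)]
    return sum(r * r for r in residual)

# Training data lying on the line y = 2x (hypothetical sensor readings).
train = [(float(t), 2.0 * t) for t in range(10)]
mu, v = fit_first_pc(train)
print(spe((3.0, 6.0), mu, v))   # on the line, so the error is near zero
print(spe((3.0, 10.0), mu, v))  # off the line, so the error is larger
```

A sample would be flagged as anomalous when its SPE exceeds a threshold chosen from the training data; the threshold rule itself is not given in the abstract.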

Genetic Design of Granular-oriented Radial Basis Function Neural Network Based on Information Proximity (정보 유사성 기반 입자화 중심 RBF NN의 진화론적 설계)

Park, Ho-Sung;Oh, Sung-Kwun;Kim, Hyun-Ki
- The Transactions of The Korean Institute of Electrical Engineers / v.59 no.2 / pp.436-444 / 2010
In this study, we introduce and discuss the concept of granular-oriented radial basis function neural networks (GRBF NNs). In contrast to the typical architectures encountered in radial basis function neural networks (RBF NNs), our main objective is to develop a design strategy for GRBF NNs as follows: (a) The architecture of the network fully reflects the structure encountered in the training data, which are granulated with the aid of clustering techniques. More specifically, the output space is granulated using K-Means clustering, while the information granules in the multidimensional input space are formed using a so-called context-based Fuzzy C-Means, which takes into account the structure already formed in the output space. (b) The innovative development facet of the network involves a dynamic reduction of the dimensionality of the input space: the information granules are formed in a subspace of the overall input space, obtained by selecting a suitable subset of input variables so that this subspace retains the structure of the entire space. As this search is combinatorial in character, we use genetic optimization to determine the optimal input subspaces. A series of numeric studies using nonlinear process data and a dataset from the machine learning repository provides detailed insight into the nature of the algorithm and its parameters and offers some comparative analysis.
https://doi.org/10.5370/KIEE.2010.59.2.436

Sonar Target Classification using Generalized Discriminant Analysis (일반화된 판별분석 기법을 이용한 능동소나 표적 식별)

Kim, Dong-wook;Kim, Tae-hwan;Seok, Jong-won;Bae, Keun-sung
- Journal of the Korea Institute of Information and Communication Engineering / v.22 no.1 / pp.125-130 / 2018
Linear discriminant analysis (LDA) is a statistical analysis method generally used for dimensionality reduction of feature vectors or for classification. For a data set that cannot be linearly separated, however, linear separation becomes possible by mapping the feature vectors into a higher-dimensional space using a nonlinear function. This method is called generalized discriminant analysis (GDA) or kernel discriminant analysis. In this paper, we carried out target classification experiments on active sonar target signals available on the Internet using both linear discriminant analysis and generalized discriminant analysis. The experimental results are analyzed, compared and discussed. On 104 test samples, the LDA method showed a correct recognition rate of 73.08%, whereas the GDA method achieved 95.19%, which is also better than the conventional MLP or kernel-based SVM.
https://doi.org/10.6109/jkiice.2018.22.1.125

