• Title/Summary/Keyword: multivariate statistical techniques (다변량통계기법)


Document Thematic words Extraction using Principal Component Analysis (주성분 분석을 이용한 문서 주제어 추출)

  • Lee, Chang-Beom;Kim, Min-Soo;Lee, Ki-Ho;Lee, Guee-Sang;Park, Hyuk-Ro
    • Journal of KIISE:Software and Applications / v.29 no.10 / pp.747-754 / 2002
  • In this paper, we propose a method for extracting thematic words from documents using principal component analysis (PCA), one of the multivariate statistical methods. The proposed PCA model captures the flow of words in a document through eigenvalues and eigenvectors and extracts thematic words. The model is evaluated by applying it to document summarization. Experimental results on newspaper articles show that the proposed model is superior to models using either word frequency or an information retrieval thesaurus. We expect that the proposed model can be applied to information retrieval, information extraction and document summarization.

A Note on Relationship between Strengths of Heavy Metals Contamination in Scalp Hair and Organs from Autopsy Subjects (부검체 두발과 장기의 중금속 오염농도 관련성)

  • Lee, Won-Kee;Song, Myung-Unn;Song, Jae-Kee;Lee, Sung-Kook;Park, Sung-Hwa
    • Journal of the Korean Data and Information Science Society / v.10 no.1 / pp.215-222 / 1999
  • Scalp hair is well known as an indicator of the concentration of heavy metal contamination. In this paper, using multivariate methods, we examine the relationship between the concentrations of cadmium and mercury in scalp hair and in the cerebellum, cerebrum, heart, kidney, liver, lung and spleen of 61 Korean autopsy subjects. We find a statistically significant relationship between the concentration of mercury contamination in scalp hair and that in secondarily contaminated organs.


A Study on the Relationship between Player Characteristic Factors and Competitive Factors of Tennis Grand Slams Competition Using Canonical Correlation Biplot and Procrustes Analysis (테니스 그랜드슬램대회의 선수특성요인과 경기요인에 대한 분석연구 -정준상관 행렬도와 프로크러스티즈 분석의 응용-)

  • Choi, Tae-Hoon;Choi, Yong-Seok;Shin, Sang-Min
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.855-864 / 2009
  • The canonical correlation biplot is a two-dimensional plot for graphically investigating the relationship between two sets of variables, and between observations and variables, in canonical correlation analysis. Recently, Choi and Choi (2008) suggested a method for investigating the relationship between skill factors and competition score factors of KLPGA players using the canonical correlation biplot and cluster analysis. Procrustes analysis is a very useful tool for comparing the shapes of configurations. In this study, we therefore provide a method for investigating the relationship between player characteristic factors and competitive factors in tennis Grand Slam tournaments using the canonical correlation biplot and Procrustes analysis.

Multidimensional scaling of categorical data using the partition method (분할법을 활용한 범주형자료의 다차원척도법)

  • Shin, Sang Min;Chun, Sun-Kyung;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.67-75 / 2018
  • Multidimensional scaling (MDS) is an exploratory analysis of multivariate data to represent the dissimilarity among objects in the geometric low-dimensional space. However, a general MDS map only shows the information of objects without any information about variables. In this study, we used MDS based on the algorithm of Torgerson (Theory and Methods of Scaling, Wiley, 1958) to visualize some clusters of objects in categorical data. For this, we convert given data into a multiple indicator matrix. Additionally, we added the information of levels for each categorical variable on the MDS map by applying the partition method of Shin et al. (Korean Journal of Applied Statistics, 28, 1171-1180, 2015). Therefore, we can find information on the similarity among objects as well as find associations among categorical variables using the proposed MDS map.
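The pipeline described in this abstract (convert categorical records to an indicator matrix, then apply Torgerson's classical MDS) can be sketched roughly as follows. This is a minimal illustration only: the Euclidean distance on indicator rows, the helper names and the toy records are all assumptions, and the partition method of Shin et al. for adding variable levels to the map is not reproduced here.

```python
import numpy as np

def indicator_matrix(rows):
    """One-hot encode categorical records into a 0/1 indicator matrix."""
    n_vars = len(rows[0])
    levels = [sorted({r[j] for r in rows}) for j in range(n_vars)]
    cols = [(j, lev) for j in range(n_vars) for lev in levels[j]]
    Z = np.array([[1.0 if r[j] == lev else 0.0 for (j, lev) in cols]
                  for r in rows])
    return Z, cols

def euclidean_distances(Z):
    """Pairwise Euclidean distances between the rows of Z (assumed dissimilarity)."""
    return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)

def classical_mds(D, k=2):
    """Torgerson's classical MDS: double-center squared distances, then eigendecompose."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]           # indices of the k largest
    vals_k = np.clip(vals[order], 0.0, None)     # guard against tiny negatives
    return vecs[:, order] * np.sqrt(vals_k)

# Hypothetical toy data: four objects described by two categorical variables.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "x")]
Z, cols = indicator_matrix(rows)
D = euclidean_distances(Z)
X = classical_mds(D, k=2)                        # one 2-D coordinate per object
```

Objects with identical category profiles land on the same point of the map, which is what makes clusters of objects visible.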

Modified Recursive PC (수정된 반복 주성분 분석 기법에 대한 연구)

  • Kim, Dong-Gyu;Kim, Ah-Hyoun;Kim, Hyun-Joong
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.963-977 / 2011
  • Principal component analysis (PCA) is a well-studied statistical technique and an important tool for handling multivariate data. Although many algorithms exist for PCA, most of them are unsuitable for real-time applications or high-dimensional problems. Since it is desirable to avoid extensive matrix operations in such cases, alternative solutions are required to calculate the eigenvalues and eigenvectors of the sample covariance matrix. Erdogmus et al. (2004) proposed Recursive PCA (RPCA), a fast adaptive online solution for PCA based on first-order perturbation theory. It facilitates the real-time implementation of PCA by recursively approximating the updated eigenvalues and eigenvectors. However, the performance of the RPCA method becomes questionable as the size of the newly added data increases. In this paper, we modify the RPCA method by taking advantage of the mathematical relation between the eigenvalues and eigenvectors of the sample covariance matrix. We compare the performance of the proposed algorithm with that of RPCA and find that the accuracy of the proposed method is remarkably improved.
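To make the idea of a first-order perturbation update concrete, the sketch below applies the textbook first-order formulas for a symmetric matrix with distinct eigenvalues. It is not the RPCA algorithm of Erdogmus et al. nor the modification proposed in the paper; the function name and the example matrices are assumptions chosen for illustration.

```python
import numpy as np

def first_order_eig_update(eigvals, eigvecs, E):
    """Approximate the eigenpairs of C + E from those of C (symmetric, distinct eigenvalues)."""
    P = eigvecs.T @ E @ eigvecs                   # perturbation in the current eigenbasis
    new_vals = eigvals + np.diag(P)               # lambda_i + v_i^T E v_i
    gaps = eigvals[None, :] - eigvals[:, None]    # gaps[j, i] = lambda_i - lambda_j
    np.fill_diagonal(gaps, 1.0)                   # placeholder to avoid division by zero
    coeff = P / gaps                              # mixing weight of v_j in the new v_i
    np.fill_diagonal(coeff, 0.0)                  # no self-mixing term
    new_vecs = eigvecs + eigvecs @ coeff
    new_vecs /= np.linalg.norm(new_vecs, axis=0)  # renormalize each column
    return new_vals, new_vecs

# Hypothetical covariance matrix and a small symmetric change to it.
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 1.0]])
E = 1e-3 * np.array([[1.0, 2.0, 0.0],
                     [2.0, -1.0, 1.0],
                     [0.0, 1.0, 3.0]])

vals, vecs = np.linalg.eigh(C)
approx_vals, approx_vecs = first_order_eig_update(vals, vecs, E)
exact_vals, exact_vecs = np.linalg.eigh(C + E)    # full recomputation, for comparison
```

The approximation error grows with the size of E relative to the eigenvalue gaps, which is consistent with the abstract's remark that accuracy becomes questionable as more new data is added at once.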

A Multivariate Analysis of Korean Professional Players Salary (한국 프로스포츠 선수들의 연봉에 대한 다변량적 분석)

  • Song, Jong-Woo
    • The Korean Journal of Applied Statistics / v.21 no.3 / pp.441-453 / 2008
  • We analyzed the salaries of Korean professional basketball and baseball players under the assumption that salary depends on a player's personal records and contribution to the team in the previous year. We made extensive use of data visualization tools to check the relationships among the variables, to find outliers and to perform model diagnostics. We used multiple linear regression and regression trees to fit the model and used cross-validation to find an optimal model. We checked the relationships between variables carefully and chose a subset of variables for the stepwise regression instead of using all variables. We found that points per game, number of assists, number of free throws made and career length are important variables for basketball players. For baseball pitchers, career length, number of strikeouts per 9 innings, ERA and number of home runs are important variables. For baseball hitters, career length, number of hits and FA are important variables.

Using GA based Input Selection Method for Artificial Neural Network Modeling Application to Bankruptcy Prediction (유전자 알고리즘을 활용한 인공신경망 모형 최적입력변수의 선정: 부도예측 모형을 중심으로)

  • 홍승현;신경식
    • Journal of Intelligence and Information Systems / v.9 no.1 / pp.227-249 / 2003
  • Prediction of corporate failure using past financial data is a well-documented topic. Early studies of bankruptcy prediction used statistical techniques such as multiple discriminant analysis, logit and probit. Recently, however, numerous studies have demonstrated that artificial intelligence methods such as neural networks can be an alternative methodology for classification problems to which traditional statistical methods have long been applied. In building a neural network model, the selection of independent and dependent variables should be approached with great care and treated as part of the model construction process. Irrespective of the efficiency of a learning procedure in terms of convergence, generalization and stability, the ultimate performance of the estimator depends on the relevance of the selected input variables and the quality of the data used. Approaches developed in statistics, such as correlation analysis and stepwise selection, are often very useful. These methods, however, may not be optimal for developing a neural network model. In this paper, we propose a genetic algorithm approach to finding optimal or near-optimal input variables for neural network modeling. The proposed approach is demonstrated by application to bankruptcy prediction modeling. Our experimental results show that this approach significantly increases the overall classification accuracy rate.


A Development of Hotel Bankruptcy Prediction Model on Artificial Neural Network (인공신경망 기반 호텔 부도예측모형 개발)

  • Choi, Sung-Ju;Lee, Sang-Won
    • Journal of the Korea Society of Computer and Information / v.19 no.10 / pp.125-133 / 2014
  • This paper develops a bankruptcy prediction model based on an artificial neural network for hotel management. The model's distinguishing feature is that it predicts bankruptcy of the whole hotel business after evaluating the bankruptcy possibility of each branch on the basis of its business performance data. There are many traditional statistical models for bankruptcy prediction, such as multivariate discriminant analysis and logit analysis. However, we chose an artificial neural network because its prediction accuracy rates are better than those of other methods. We first selected 100 sound enterprises and 100 bankrupt enterprises as experimental data and set up a bankruptcy prediction model using NeuroShell, a tool for artificial neural networks. The model and its experiments, which demonstrated high efficiency, can help decision making in hotel management and in assessing the bankruptcy risk or financial soundness of each branch of a serviced residence hotel.

A Study on the Genomic Patterns of SARS coronavirus using Bioinformtaics Techniques (바이오인포매틱스 기법을 활용한 SARS 코로나바이러스의 유전정보 연구)

  • Ahn, Insung;Jeong, Byeong-Jin;Son, Hyeon S.
    • Proceedings of the Korea Contents Association Conference / 2007.11a / pp.522-526 / 2007
  • Since the newly emerged disease Severe Acute Respiratory Syndrome (SARS) spread rapidly from Asia to North America and Europe in 2003, many researchers have tried to determine where the virus came from. From a phylogenetic point of view, the SARS virus is known to belong to the genus Coronavirus, but the overall SARS virus sequence is not highly similar to those of known coronaviruses. The natural reservoirs of SARS-CoV have not yet been clearly determined. In the present study, the genomic sequences of SARS-CoV were analyzed by bioinformatics techniques such as multiple sequence alignment and phylogenetic analysis, as well as multivariate statistical analysis. All calculations, including those of the relative synonymous codon usage (RSCU) and other genomic parameters using 30,305 coding sequences from the two genera Coronavirus and Lentivirus and one family, Orthomyxoviridae, were performed on an SMP cluster at the KISTI Supercomputing Center. As a result, SARS-CoV showed RSCU patterns very similar to those of feline coronavirus on both axes of the correspondence analysis, and this result agreed better with serological results for SARS-CoV than the phylogenetic result alone. In addition, SARS-CoV, human immunodeficiency virus and influenza A virus all showed very low RSCU differences within each synonymous codon group, and this low RSCU bias might give them some advantage in being transmitted from other species to humans. Large-scale genomic analysis using bioinformatics techniques may be useful in the field of genetic epidemiology.
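For readers unfamiliar with RSCU, the sketch below computes it for a single coding sequence using the usual definition: a codon's observed count divided by the count expected if all synonymous codons for its amino acid were used equally. The standard genetic code, the function name and the toy sequence are assumptions; the abstract does not describe the authors' actual pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds):
    """Return {codon: RSCU} for every amino acid that occurs in the coding sequence."""
    usable = len(cds) - len(cds) % 3              # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*":                             # skip stop codons
            continue
        total = sum(counts[c] for c in family)
        if total == 0:                            # amino acid absent from the sequence
            continue
        for c in family:
            # observed count / (family total / number of synonymous codons)
            result[c] = counts[c] * len(family) / total
    return result

# Hypothetical toy sequence: Ala (GCT), Ala (GCT), Ala (GCC), Lys (AAA).
values = rscu("GCTGCTGCCAAA")
```

An RSCU of 1.0 means a codon is used exactly as often as expected under uniform usage, so values close to 1.0 across a family correspond to the low codon bias the abstract describes.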


Charge Neutralization of Wet-end (습부공정에 전하 중화개념의 도입)

  • 신종호;김동호;류정용;김용환;송봉근
    • Proceedings of the Korea Technical Association of the Pulp and Paper Industry Conference / 2001.11a / pp.59-59 / 2001
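  • As reported in our previous paper, the wet-end operating conditions of the linerboard mill under study had deteriorated severely, and the papermaking system could not buffer shocks such as grade changes at all. We attributed this to a failure to control the charge characteristics of the stock, including the process water. In particular, because the retention aid, the only cationic polymer in use, did not work properly, retention fell, and the large amount of fines not retained in the product remained in the white water, which further degraded the electrical characteristics of the stock in a vicious cycle. The usual way of using wet-end additives to control the electrical characteristics of such strongly negatively charged stock is to increase the dosage of the cationic polymer or to change its charge density or molecular weight. We therefore judged that applying an appropriate retention system was the most urgent issue for improving the operating conditions of the wet end and ran a mill trial of a new retention aid. The COD and fines content of the white water dropped sharply and drainage improved, so the wet-end operating conditions improved. However, no notable change was observed in the properties of the linerboard produced during the mill trial, which lasted more than two months. This was attributed to a substantial part of the applied retention aid being consumed by interaction with fines in the system. In this study, four polyelectrolytes with lower molecular weight and higher charge density than the retention aid were added before the retention aid to neutralize the charge of the linerboard stock, and the changes in the properties of the linerboard produced by this process were observed. The properties examined were burst strength, compressive strength, wet tensile strength and dye fixation capacity. It was confirmed that the cyanoethylated PYA maintained a stable molecular structure. The storage modulus and loss modulus were analyzed to evaluate the viscoelasticity of the cyanoethylated PYA solution. General rheological evaluation showed that the PYA solution was shear-thinning and pseudoplastic, confirming its applicability to the surface sizing process. principal component regression, one of the statistical techniques in use, was carried out. Principal component analysis is a technique that simplifies dimensionality by reducing and summarizing the multidimensional variables of multivariate data obtained for several response variables, and that also analyzes the complex structure among mutually correlated response variables. In this presentation, we use process data with artificial neural networks and principal component analysis to estimate more realistically the factors that affect the occurrence of process troubles, and we introduce ways to minimize such troubles by seeking countermeasures. when surface treatments such as plating and thermal spraying are applied, attempts are being made to obtain a higher-quality surface layer by overlay-welding a material suited to plating and thermal spraying onto the surface of an arbitrary substrate before the surface treatment. We therefore review the current domestic and international application of overlay welding, representative application cases, and the development status of overlay welding techniques and welding materials, with the aim of broadening the use of this technology, which is not yet widely known in Korea. within minimum time from beginning of the shutdown. and 12.36%; 12.78% and 12.44% for $101{\sim}200$ days; and 13.17% and 11.30% for 201 days or more, so that only in the case (유기) of 201 days or more did the control and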
