• Title/Abstract/Keywords: Multivariate Statistical Analysis


A Automatic Document Summarization Method based on Principal Component Analysis

  • Kim, Min-Soo;Lee, Chang-Beom;Baek, Jang-Sun;Lee, Guee-Sang;Park, Hyuk-Ro
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 9, No. 2
    • /
    • pp.491-503
    • /
    • 2002
  • In this paper, we propose an automatic document summarization method based on Principal Component Analysis (PCA), one of the multivariate statistical methods. After extracting thematic words using PCA, we select the sentences containing the extracted thematic words and build the document summary from them. Experimental results on newspaper articles show that the proposed method outperforms methods based on either word frequency or an information retrieval thesaurus.
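The abstract above gives only the outline of the procedure (PCA, then thematic words, then sentence selection), so the following is a minimal sketch under stated assumptions and not the authors' implementation. It assumes a sentence-by-word count matrix, takes the word with the largest absolute loading on each leading principal component as a "thematic word", and keeps every sentence containing one of them. The toy sentences, the stopword list, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "on", "as", "near", "more", "a"}

def tokenize(sentence):
    """Lowercase, split on whitespace, drop stopwords."""
    return [w for w in sentence.lower().split() if w not in STOPWORDS]

def thematic_words(sentences, n_components=2):
    """Pick the word with the largest absolute loading on each leading PC."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    index = {w: j for j, w in enumerate(vocab)}
    counts = np.zeros((len(sentences), len(vocab)))
    for i, s in enumerate(sentences):
        for w in tokenize(s):
            counts[i, index[w]] += 1
    centered = counts - counts.mean(axis=0)
    cov = centered.T @ centered / (len(sentences) - 1)
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    leading = eigvecs[:, ::-1][:, :n_components]
    words = []
    for k in range(leading.shape[1]):
        word = vocab[int(np.argmax(np.abs(leading[:, k])))]
        if word not in words:
            words.append(word)
    return words

def summarize(sentences, n_components=2):
    """Keep the sentences that contain at least one thematic word."""
    words = thematic_words(sentences, n_components)
    summary = [s for s in sentences if any(w in tokenize(s) for w in words)]
    return summary, words

# Hypothetical toy document.
SENTENCES = [
    "heavy rain caused the river flood",
    "the flood damaged houses near the river",
    "stock prices fell on the market",
    "the market recovered as stock prices rose",
    "officials warned of more rain",
]
SUMMARY, WORDS = summarize(SENTENCES, n_components=2)
```

Using the absolute loading sidesteps the arbitrary sign of each eigenvector; how the paper itself ranks words and sentences is not stated in the abstract.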

A Study on the Categorization of Korean Foot Shapes

  • 성덕현;정의승;조용주
    • 대한인간공학회지
    • /
    • Vol. 25, No. 2
    • /
    • pp.107-118
    • /
    • 2006
  • Three-dimensional foot data of Koreans have recently been collected extensively through the 5th national anthropometric survey, known as 'Size Korea'. In this study, Korean foot shapes were investigated and then classified, based on the existing standard for foot shaping. Although the data used were limited to those of Korean adults, the major factors affecting foot shape were derived by multivariate statistical analysis and then grouped into categories. For subjects aged 14 to 70, the major factors affecting foot shape for males were related to foot breadth, ankle thickness, 1st toe shape, malleolus height, heel-to-top-of-foot length, the ratio between the toe side and the heel side, and 5th toe shape. For females, ball-of-foot height was added to these factors. From the extracted factors, Korean foot shapes were categorized into three groups for males and four groups for females. The three male groups were the ladder type, the inverted triangle type, and the square type; for females, the triangular type was added to these three. These findings will serve as useful information for the footwear production industry in Korea.

Study on the Relationship between Weather Conditions, Sewage and Operational Variables of WWTPs using Multivariate Statistical Methods

  • 이재현
    • 한국물환경학회지
    • /
    • Vol. 28, No. 2
    • /
    • pp.285-291
    • /
    • 2012
  • In combined sewer systems, rainfall and the influent of wastewater treatment plants (WWTPs) are generally strongly related. Because variation in influent quantity and sewage quality is the most common and significant disturbance, the factors that affect sewage characteristics need to be identified. In this paper, the relationship between weather conditions (humidity, temperature, and rainfall) and influent flowrate and contaminant concentration was analysed using factor analysis. In addition, three influent types were derived using cluster analysis, and the distributions of operational variables were compared across the groups by one-way ANOVA. The dataset was clustered into three groups with similar weather and influent conditions. These different conditions can lead to different operating conditions at WWTPs. Group 1 corresponds to high humidity and rainfall, so the DO concentration in the reactor was very high but the MLSS concentration was very low because of the excessive flowrate. Group 3, by contrast, corresponds to low humidity, temperature, and rainfall; its SRT was the longest and its SVI the highest because settleability is at its worst of the year in winter.

KCYP data analysis using Bayesian multivariate linear model

  • 이인선;이근백
    • 응용통계연구
    • /
    • Vol. 35, No. 6
    • /
    • pp.703-724
    • /
    • 2022
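  • The analysis of multivariate longitudinal data must correctly estimate the correlations present in repeatedly measured data. Longitudinal studies mostly generate multivariate longitudinal data, but most existing statistical models analyze each response univariately and therefore fail to explain the complex correlations in such data. This paper therefore reviews several methods of modeling the covariance matrix in order to explain these complex correlations. Among them, we examine the modified Cholesky decomposition, the modified Cholesky block decomposition, and the hypersphere decomposition. To address the sparsity problem of the generalized autoregressive parameter matrix, we analyze the youth panel data using a Bayesian method. The youth panel data are multivariate longitudinal data, and school adaptation, academic achievement, and mobile phone dependence are considered as the response variables. Several models are compared under different assumptions about the autocorrelation structure and the innovation standard deviation structure. In the best-fitting model, all explanatory variables are significant for school adaptation and academic achievement, and when mobile phone dependence is the response variable, all explanatory variables except private tutoring time are significant.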
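As an illustration of the modified Cholesky decomposition named in this abstract, the sketch below factors a covariance matrix Σ so that T Σ Tᵀ = D, where T is unit lower triangular and D is diagonal. The negated below-diagonal entries of T are the generalized autoregressive parameters and the diagonal of D holds the innovation variances. The 3×3 AR(1)-style covariance matrix is a hypothetical example, not data from the paper, and the Bayesian estimation step is not shown.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T = diag(d)."""
    L = np.linalg.cholesky(sigma)   # sigma = L @ L.T, with L lower triangular
    d_sqrt = np.diag(L)             # innovation standard deviations
    C = L / d_sqrt                  # divide column j by d_sqrt[j]: unit diagonal
    T = np.linalg.inv(C)            # inverse of unit lower triangular is the same
    return T, d_sqrt ** 2

# Hypothetical covariance of three repeated measures (AR(1), rho = 0.5).
SIGMA = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])

T, D = modified_cholesky(SIGMA)
GARP = -np.tril(T, k=-1)            # generalized autoregressive parameters
```

For this AR(1) example each measurement depends only on the one before it, so the only nonzero parameters sit on the first subdiagonal (0.5 each) and the innovation variances are 1, 0.75, and 0.75. Longer-lag entries being zero is the kind of sparsity the abstract refers to.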

Estimation of Water Quality of Fish Farms using Multivariate Statistical Analysis

  • Ceong, Hee-Taek;Kim, Hae-Ran
    • Journal of information and communication convergence engineering
    • /
    • Vol. 9, No. 4
    • /
    • pp.475-482
    • /
    • 2011
  • In this research, we attempted to estimate the water quality of fish farms in terms of parameters such as water temperature, dissolved oxygen, pH, and salinity, using observational data from a coastal ocean observatory of a national institution located close to the fish farm. We requested and received marine data comprising nine factors, including water temperature, from the Korea Hydrographic and Oceanographic Administration. To verify our results, we also established an experimental fish farm in which we placed a sensor module of an optical mode, YSI-6920V2, used for self-cleaning, directly inside fish tanks, and we used the data measured and recorded by an environment monitoring system that communicated serially with the sensor module. We investigated the differences in water temperature and salinity among three areas: Goheung Balpo, Yeosu Odongdo, and the experimental fish farm, Keumho. Water temperature did not differ significantly, but salinity did (significant at the 5% level). Further, multiple regression analysis was performed to estimate the water quality of the fish farm at Keumho based on the data of Goheung Balpo. The multiple linear regression models for water temperature and dissolved oxygen had coefficients of determination of 98% and 89%, respectively. However, for pH and salinity estimated from the nine oceanic environment factors, the adjusted coefficient of determination was very low, at less than 10%, so these values were difficult to predict. We plotted the predicted and measured values using the estimated regression equation and found them to fit very well; the values were close to the regression line. We have demonstrated that, if well-fitting statistical model equations are used, the expense of installing, maintaining, and repairing fish-farm sensors and systems, which is a major issue with existing environmental information monitoring systems for marine farming areas, can be reduced, making it easier for fish farmers to monitor aquaculture and mariculture environments.
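The coefficient of determination and its adjusted form, which this abstract uses to judge the regression models, can be computed as in the following sketch. It fits an ordinary least-squares model with an intercept and reports R² and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors. The two-predictor dataset is synthetic and only stands in for the nine observatory factors; none of the numbers come from the paper.

```python
import numpy as np

def fit_r2(X, y):
    """Ordinary least squares with intercept; returns (beta, r2, adjusted_r2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])        # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adjusted_r2

# Synthetic example: two hypothetical predictors and a nearly linear response.
x1 = np.arange(10.0)
x2 = x1 % 3
noise = np.where(np.arange(10) % 2 == 0, 0.1, -0.1)
y = 2.0 + 3.0 * x1 - x2 + noise

BETA, R2, ADJ_R2 = fit_r2(np.column_stack([x1, x2]), y)
```

The adjusted value penalizes each added predictor, which is why it is the more informative figure when nine factors are used to predict a response such as pH or salinity.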

Knowledge Extraction from Affective Data using Rough Sets Model and Comparison between Rough Sets Theory and Statistical Method

  • 홍승우;박재규;박성준;정의승
    • 대한인간공학회지
    • /
    • Vol. 29, No. 4
    • /
    • pp.631-637
    • /
    • 2010
  • The aim of affective engineering is to develop new products by translating customer affections into design factors. Affective data have so far been analyzed using multivariate statistical analysis, but such data do not always have the linear features assumed under a normal distribution. The rough sets model is an effective method for knowledge discovery under uncertainty, imprecision, and fuzziness, and it can deal with any type of data regardless of linearity. This study therefore uses the rough sets model to extract affective knowledge from affective data. Four types of scent alternatives and four types of sounds were designed, and an experiment was performed to examine affective differences in subjects' preferences for air conditioners. The study also aims to clarify the relationship between the rough sets based affective engineering method and the statistical one. The result of a case study shows that the proposed approach can effectively extract affective knowledge from affective data and can discover the relationships between customer affections and design factors. The rough sets model and the statistical method gave similar results, and the study could be made more valuable by a comparison with fuzzy theory, neural network, and multivariate statistical methods.
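The core idea behind the rough sets model mentioned here can be sketched with lower and upper approximations. Objects that share the same attribute values are indiscernible; the lower approximation of a target set collects the classes that lie entirely inside it (certain members), and the upper approximation collects the classes that overlap it (possible members). The decision table below, with scent and sound attributes and a "preferred" set, is a hypothetical illustration and not the experimental data of the paper.

```python
from collections import defaultdict

def approximations(table, target):
    """Return (lower, upper) rough-set approximations of `target`.

    `table` maps each object to a tuple of condition-attribute values.
    """
    classes = defaultdict(set)              # indiscernibility classes
    for obj, attrs in table.items():
        classes[attrs].add(obj)
    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:               # class entirely inside the target
            lower |= members
        if members & target:                # class overlaps the target
            upper |= members
    return lower, upper

# Hypothetical decision table: object id -> (scent, sound).
TABLE = {
    1: ("floral", "soft"),
    2: ("floral", "soft"),
    3: ("citrus", "soft"),
    4: ("citrus", "loud"),
}
PREFERRED = {1, 3}                          # hypothetical set of preferred designs

LOWER, UPPER = approximations(TABLE, PREFERRED)
# LOWER == {3}; UPPER == {1, 2, 3}
```

Objects 1 and 2 have identical attributes but different preference outcomes, so they fall in the boundary region (upper minus lower). No distributional or linearity assumption is needed to compute these sets.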

Comparing Role of Two Chemotherapy Regimens, CMF and Anthracycline-Based, on Breast Cancer Survival in the Eastern Mediterranean Region and Asia by Multivariate Mixed Effects Models: a Meta-Analysis

  • Ghanbari, Saeed;Ayatollahi, Seyyed Mohammad Taghi;Zare, Najaf
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 16, No. 14
    • /
    • pp.5655-5661
    • /
    • 2015
  • Purpose: To assess, by a meta-analysis approach, the role of two adjuvant chemotherapy regimens, anthracycline-based and CMF, in the disease free survival and overall survival of breast cancer patients in Eastern Mediterranean and Asian countries, to determine which is more effective, and to evaluate the appropriateness and efficiency of two proposed statistical models. Materials and Methods: Survival curves were digitized, and the survival proportions and times were extracted and modeled on appropriate covariates by two multivariate mixed effects models. Studies published in English in the Eastern Mediterranean region and Asia that reported disease free survival and overall survival curves for anthracycline-based or CMF adjuvant chemotherapy were included in this systematic review. Two transformations of the survival probabilities, Ln(-Ln(S)) and Ln(S/(1-S)), were modeled as dependent variables by a multivariate mixed model on the same covariates in order to obtain precise estimates with high power and an appropriate interpretation of covariate effects. The analysis was carried out with SAS Proc MIXED and STATA software. Results: A total of 32 studies from the published literature were analysed for disease free survival, covering 4,092 patients who received anthracycline-based regimens and 2,501 treated with CMF. For overall survival, 13 studies reported survival curves, in which 2,050 cases were treated with anthracycline-based regimens and 1,282 with CMF. Conclusions: The model with dependent variable Ln(-Ln(S)) gave more precise estimates of the covariate effects and showed a significant difference between the effects of the two adjuvant chemotherapy regimens. Anthracycline-based treatment gave better disease free survival and overall survival. An IPD meta-analysis in Italy by Angelo et al. (2011) also confirmed that anthracycline-based regimens were more effective for the survival of breast cancer patients. The findings of Zare et al. (2012) on disease free survival curves in Asia provided similar evidence.
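The two transformations of the survival probability S named in this abstract, Ln(-Ln(S)) (complementary log-log) and Ln(S/(1-S)) (logit), map the interval (0, 1) onto the whole real line so that a linear mixed model can be fitted. A minimal sketch with their inverses follows; the survival proportions are hypothetical, and the mixed-model fitting itself (done in the paper with SAS Proc MIXED and STATA) is not reproduced.

```python
import math

def _check(s):
    if not 0.0 < s < 1.0:
        raise ValueError("survival probability must lie strictly between 0 and 1")

def cloglog(s):
    """Complementary log-log transform: ln(-ln(S))."""
    _check(s)
    return math.log(-math.log(s))

def logit(s):
    """Log-odds transform: ln(S / (1 - S))."""
    _check(s)
    return math.log(s / (1.0 - s))

def inv_cloglog(u):
    """Back-transform ln(-ln(S)) to S."""
    return math.exp(-math.exp(u))

def inv_logit(v):
    """Back-transform ln(S / (1 - S)) to S."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical survival proportions read off a digitized curve.
SURVIVAL = [0.95, 0.80, 0.60]
CLOGLOG = [cloglog(s) for s in SURVIVAL]
LOGIT = [logit(s) for s in SURVIVAL]
```

Fitted values on either scale are converted back to survival probabilities with the matching inverse function.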

Robust Design for Multiple Quality Characteristics using Principal Component Analysis

  • Kwon, Yong-Man;Hong, Yeon-Woong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 14, No. 3
    • /
    • pp.545-551
    • /
    • 2003
  • Robust design identifies settings of control factors that make a system's performance robust to changes in the noise factors that represent sources of variation. In this paper, we propose a way to simultaneously optimize multiple quality characteristics using principal component analysis, a multivariate statistical method. An example illustrates the approach and compares it with a previously proposed method.

Resistant Principal Factor Analysis

  • Park, Youg-Seok;Byun, Ho-Seon
    • Journal of the Korean Statistical Society
    • /
    • Vol. 25, No. 1
    • /
    • pp.67-80
    • /
    • 1996
  • Factor analysis is a multivariate technique for describing the interrelationship among many variables in terms of a few underlying but unobservable random variables called factors. There are various approaches to factor analysis, and principal factor analysis is one of the most popular. It follows the mathematical algorithm of principal component analysis, which is based on the singular value decomposition. However, the singular value decomposition is known not to be resistant; that is, it is very sensitive to small changes in the input data. In this article, using the resistant singular value decomposition of Choi and Huh (1994), we derive a resistant principal factor analysis that is relatively little influenced by notable observations.
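To illustrate why the ordinary singular value decomposition is not resistant, the sketch below computes principal components from the SVD of centered data and shows how one notable observation can rotate the leading component. The six data points are hypothetical, and the resistant decomposition of Choi and Huh (1994) is not implemented here.

```python
import numpy as np

def principal_components(data):
    """Return (first_direction, component_variances) from the SVD of centered data."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2 / (len(data) - 1)    # eigenvalues of the sample covariance
    return vt[0], variances

# Hypothetical points spread along the x-axis with tiny vertical scatter.
CLEAN = np.array([
    [-2.0, 0.1],
    [-1.0, -0.1],
    [0.0, 0.0],
    [1.0, 0.1],
    [2.0, -0.1],
])
# The same points plus one notable observation far off in y.
WITH_OUTLIER = np.vstack([CLEAN, [0.0, 50.0]])

PC_CLEAN, VAR_CLEAN = principal_components(CLEAN)
PC_OUTLIER, VAR_OUTLIER = principal_components(WITH_OUTLIER)
```

With the clean data the first component lies almost along the x-axis; after adding the single outlier it lies almost along the y-axis, so a factor solution built on this decomposition would be dominated by that one observation.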

A Study on Hydrologic Clustering for Standard Watersheds of Korea Water Resources Unit Map Using Multivariate Statistical Analysis

  • 안소라;김상호;김성준
    • 한국지리정보학회지
    • /
    • Vol. 17, No. 1
    • /
    • pp.91-106
    • /
    • 2014
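  • This study performed hydrologic clustering of the 795 standard watersheds nationwide in the Korea Water Resources Unit Map using multivariate statistical analysis. To derive comprehensive characteristic factors for domestic watersheds, 30 watershed characteristic factors related to topography, streams, weather, soil, land use, and hydrology were selected. Factor analysis, a multivariate statistical technique, was used to analyze the correlations among these factors and to extract 16 representative watershed characteristic factors; the factors that determine the character of a watershed were those related to soil characteristics, watershed location, watershed size, and weather and hydrologic characteristics. For the cluster analysis, data from weather, rainfall, and water-level stations nationwide were collected and the availability of good-quality data was reviewed, which identified 73 gauged watersheds. Using these 73 gauged watersheds as references, hydrologic clustering was performed by computing the Euclidean distances between them and the remaining ungauged watersheds over the 16 representative watershed characteristic factors. As a result, the similarity among standard watersheds within the same basin was 87% for the Han River, 69% for the Nakdong River, 41% for the Geum River, 52% for the Seomjin River, and 27% for the Yeongsan River.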
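The distance-based grouping described in this abstract can be sketched as a nearest-reference assignment: each ungauged watershed joins the gauged watershed whose characteristic-factor vector is closest in Euclidean distance. The watershed names and the three-factor vectors below are hypothetical (the study uses 16 factors and 73 gauged watersheds), and whether the factors were rescaled before computing distances is not stated in the abstract, so no scaling is applied here.

```python
import math

def assign_to_nearest(gauged, ungauged):
    """Map each ungauged watershed to its nearest gauged watershed.

    Both arguments map a watershed name to a tuple of characteristic factors.
    """
    assignment = {}
    for name, vector in ungauged.items():
        nearest = min(gauged, key=lambda ref: math.dist(vector, gauged[ref]))
        assignment[name] = nearest
    return assignment

# Hypothetical factor vectors (for example soil index, size, rainfall index).
GAUGED = {
    "G1": (0.2, 1.0, 3.0),
    "G2": (0.9, 4.0, 1.0),
}
UNGAUGED = {
    "U1": (0.3, 1.2, 2.8),
    "U2": (1.0, 3.5, 1.1),
}

CLUSTERS = assign_to_nearest(GAUGED, UNGAUGED)
# CLUSTERS == {"U1": "G1", "U2": "G2"}
```

Because Euclidean distance is dominated by factors with large numeric ranges, factors on very different scales would normally be standardized first; that step is left out because the source does not describe it.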