• Title/Summary/Keyword: Variance of Analysis

On principal component analysis for interval-valued data (구간형 자료의 주성분 분석에 관한 연구)

  • Choi, Soojin;Kang, Kee-Hoon
    • The Korean Journal of Applied Statistics, v.33 no.1, pp.61-74, 2020
  • Interval-valued data, one type of symbolic data, are observed in the form of intervals rather than single values. Each interval-valued observation has an internal variation. Principal component analysis reduces the dimension of data by maximizing the variance of the data. Therefore, the principal component analysis of interval-valued data should account for the variance between observations as well as the variation within the observed intervals. In this paper, three principal component analysis methods for interval-valued data are summarized. In addition, a new method is proposed that uses a truncated normal distribution instead of the uniform distribution of the conventional quantile method, because we believe there is more information near the center point of the interval. The methods are compared using simulations and a relevant data set from the OECD. For the quantile method, we draw a scatter plot of the principal components and then identify the position and distribution of the quantiles using the arrow line representation method.
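
The contrast between uniform and truncated-normal sampling inside an interval can be illustrated with a minimal sketch. It is not the authors' implementation: the rejection sampler, the standard deviation set to one sixth of the interval width, and the example interval are all assumptions made for illustration.

```python
import random

def sample_truncated_normal(low, high, rng, sd_fraction=1 / 6):
    """Draw one value from a normal centered on the interval midpoint,
    truncated to [low, high] by simple rejection sampling.
    sd_fraction is an assumed tuning choice, not taken from the paper."""
    mid = (low + high) / 2
    sd = (high - low) * sd_fraction
    if sd == 0:
        return mid  # degenerate interval: a single point
    while True:
        x = rng.gauss(mid, sd)
        if low <= x <= high:
            return x

def central_share(draws, low, high):
    """Fraction of draws that fall in the middle third of the interval."""
    third = (high - low) / 3
    return sum(low + third <= x <= high - third for x in draws) / len(draws)

rng = random.Random(0)      # fixed seed so the sketch is repeatable
low, high = 2.0, 8.0        # hypothetical interval-valued observation
normal_draws = [sample_truncated_normal(low, high, rng) for _ in range(2000)]
uniform_draws = [rng.uniform(low, high) for _ in range(2000)]

normal_share = central_share(normal_draws, low, high)
uniform_share = central_share(uniform_draws, low, high)
```

Under these assumptions the truncated-normal draws concentrate near the interval center, while the uniform draws spread evenly, which is the intuition the abstract gives for replacing the uniform distribution.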

Evaluation of Water Quality Characteristics and Grade Classification of Yeongsan River Tributaries (영산강 수계 지류.지천의 수질 특성 평가 및 등급화 방안)

  • Jung, Soojung;Kim, Kapsoon;Seo, Dongju;Kim, Junghyun;Lim, Byungjin
    • Journal of Korean Society on Water Environment, v.29 no.4, pp.504-513, 2013
  • Water quality trends for major tributaries (66 sites) in the Yeongsan River basin of Korea were examined for 12 parameters based on water quality data collected monthly over a period of 12 months. The complex data matrix was analyzed with multivariate methods: principal component analysis (PCA), factor analysis (FA), and cluster analysis (CA). PCA/FA identified four factors that together explain 78.2% of the total variance. The first factor, accounting for 27.3% of the total variance, was correlated with BOD, TN, TP, and TOC, and weights were assigned to these parameters for grade classification. CA produced a dendrogram in which the monitoring sites were grouped into 5 clusters. Cluster 2 corresponds to high pollution from domestic wastewater, wastewater treatment, and run-off from livestock farms. For grade classification of the tributaries, scores for 10 indexes were calculated with weights applied to three parameters (BOD, TN, and TP) that loaded on the first factor in FA. The most polluted group included 10 tributaries, such as Gwangjucheon, Jangsucheon, Daejeoncheon, Gamjungcheon, and Yeongsancheon. The results indicate that the grade classification method suggested in this study is useful for reliably classifying tributaries in the study area.

Numerical taxonomy of Rhus sensu lato (Anacardiaceae) in Korea (한국산 광의의 붉나무속(Rhus L. sensu lato)의 수리분류학적 연구)

  • Tho, Jae-Hwa;Kim, Joo-Hwan
    • Korean Journal of Plant Taxonomy, v.34 no.3, pp.205-220, 2004
  • Numerical analysis based on 67 morphological characters from 28 populations of 6 species of Korean Rhus sensu lato (Anacardiaceae) was performed for taxonomic delimitation. Based on the results of PCA with 47 quantitative characters, the three major principal components together contributed 77.9% of the total variance (PC1 35.2%, PC2 22.5%, and PC3 20.2%). Based on the results of PCA with 20 qualitative characters, the three major principal components together contributed 90.7% of the total variance (PC1 37.7%, PC2 33.0%, and PC3 20.0%). The two-dimensional plot of the PCA results distinguished six distinct species. A UPGMA phenogram based on the simple matching coefficient showed clear taxonomic delimitation among the six taxa. In the cluster analysis, qualitative characters were more useful for grouping the species treated. Numerical analysis was very valuable for delimiting the Korean taxa of Rhus s.l.
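
As a small illustration of the simple matching coefficient mentioned above, the sketch below computes the share of characters on which two taxa have the same coded state. The character codes are hypothetical and are not data from the paper.

```python
def simple_matching(a, b):
    """Simple matching coefficient: the fraction of characters on which
    two equally long coded character vectors agree."""
    if len(a) != len(b):
        raise ValueError("character vectors must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Hypothetical binary codes for five qualitative characters of two taxa.
taxon_a = [1, 0, 1, 1, 0]
taxon_b = [1, 1, 1, 0, 0]

similarity = simple_matching(taxon_a, taxon_b)  # 3 of 5 characters agree
```

A pairwise matrix of such similarities is the kind of input a UPGMA clustering would take.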

Classification of Weather Patterns in the East Asia Region using the K-means Clustering Analysis (K-평균 군집분석을 이용한 동아시아 지역 날씨유형 분류)

  • Cho, Young-Jun;Lee, Hyeon-Cheol;Lim, Byunghwan;Kim, Seung-Bum
    • Atmosphere, v.29 no.4, pp.451-461, 2019
  • Medium-range forecasting is highly dependent on ensemble forecast data. However, operational weather forecasters do not have enough time to digest all of the detailed features revealed in ensemble forecast data. To use the ensemble data effectively in medium-range forecasting, this study defines representative weather patterns in East Asia. K-means clustering analysis is applied to classify the weather patterns objectively. The input data are daily Mean Sea Level Pressure (MSLP) anomalies from the ECMWF ReAnalysis-Interim (ERA-Interim) for 1981~2010 (30 years), provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Using the Explained Variance (EV), the optimal study area is defined as 20~60°N, 100~150°E. The number of clusters determined by the Explained Cluster Variance (ECV) is thirty (k = 30). The 30 representative weather patterns and their frequencies are summarized. Weather pattern #1 occurred in all seasons, with about 56% in summer (June~September). The relatively rare weather pattern #30 occurred mainly in winter. Additionally, we investigate the relationship between weather patterns and extreme weather events such as heat waves, cold waves, heavy rainfall, and snowfall. The weather patterns associated with heavy rainfall exceeding 110 mm day⁻¹ were #1, #4, and #9, each with more than 10% of days. Heavy snowfall events exceeding 24 cm day⁻¹ mainly occurred in weather patterns #28 (4%) and #29 (6%). High and low temperature events (> 34℃ and < -14℃) were associated with weather patterns #1~4 (14~18%) and #28~29 (27~29%), respectively. These results suggest that the classification of weather patterns can serve as a reference for grouping ensemble forecast data, which will be useful for scenario-based medium-range ensemble forecasting in the future.
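
The clustering step can be sketched in a few lines. This is an illustrative toy, not the study's procedure: the points are hypothetical two-dimensional data rather than MSLP anomaly fields, the initial centers are chosen by hand, and the explained variance is computed as 1 minus the ratio of within-cluster to total sum of squares, which is a common definition and only assumed to resemble the paper's EV/ECV.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

def explained_variance(points, centers, labels):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    grand = mean(points)
    total = sum(sq_dist(p, grand) for p in points)
    within = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return 1 - within / total

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, [points[0], points[3]])
ev = explained_variance(points, centers, labels)
```

Repeating this for increasing k and watching where the explained variance levels off is one common way to choose the number of clusters.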

Comparison study of modeling covariance matrix for multivariate longitudinal data (다변량 경시적 자료 분석을 위한 공분산 행렬의 모형화 비교 연구)

  • Kwak, Na Young;Lee, Keunbaik
    • The Korean Journal of Applied Statistics, v.33 no.3, pp.281-296, 2020
  • Repeated outcomes from the same subjects are referred to as longitudinal data. Analyzing such data requires methods different from those used for cross-sectional data. It is important to model the covariance matrix because the correlation between the repeated outcomes must be considered when estimating the effects of covariates on the mean response. However, modeling the covariance matrix is difficult because there are many parameters to be estimated and the estimated covariance matrix must be positive definite. In this paper, we analyze multivariate longitudinal data using two methodologies for modeling the covariance matrix. Both methods describe serial correlations of multivariate longitudinal outcomes using a modified Cholesky decomposition. However, the two methods use different decompositions to explain the correlation between simultaneous responses. The first method uses enhanced linear covariance models so that the covariance matrix satisfies the positive definiteness condition; in addition, principal component analysis and the maximization-minimization algorithm (MM algorithm) are used to estimate the model parameters. The second method uses a variance-correlation decomposition and a hypersphere decomposition to model the covariance matrix. Simulations are used to compare the performance of the two methodologies.
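
To make the variance-correlation decomposition concrete, the following sketch splits a covariance matrix into standard deviations and a correlation matrix and then rebuilds it. The 2×2 matrix is a hypothetical example, and the sketch covers only this one decomposition, not the paper's full models.

```python
import math

def variance_correlation(cov):
    """Split a covariance matrix into standard deviations and a
    correlation matrix, so that cov[i][j] = sd[i] * corr[i][j] * sd[j]."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]
    return sd, corr

def rebuild_covariance(sd, corr):
    """Recombine standard deviations and correlations into a covariance matrix."""
    n = len(sd)
    return [[sd[i] * corr[i][j] * sd[j] for j in range(n)] for i in range(n)]

# Hypothetical covariance of two simultaneous responses.
cov = [[4.0, 1.2],
       [1.2, 9.0]]

sd, corr = variance_correlation(cov)   # sd = [2.0, 3.0]; off-diagonal correlation 0.2
rebuilt = rebuild_covariance(sd, corr)
```

Separating the two parts lets variances and correlations be modeled with different structures.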

Finite Element Analysis and Optimal Design of Shape Memory Composite Material Stents using Taguchi Method (다구찌 방법을 이용한 형상기억 복합재료 스텐트 유한요소 해석 및 최적설계)

  • Young Bin Kim;Suji Kim;Heechan Song;Heoung-Jae Chun
    • Composites Research, v.37 no.4, pp.301-309, 2024
  • Shape memory stents are used for treating vascular conditions such as myocardial infarction, angina, and arteriosclerosis through their shape memory behavior. These stents are inserted into blood vessels to expand them, and their performance in terms of flexibility, elastic recovery, and deformation is influenced by their design. In this study, the parameters affecting stent structural design were analyzed using the Taguchi method, with the aim of designing structures that account for flexibility, elastic recovery, and deformation. To reflect the actual conditions faced by stents, ISO standards were incorporated, and finite element analysis was conducted using the shape memory composite material properties obtained from tensile tests, specifically the hyperelastic properties. Finally, the statistical significance of the stent structural design parameters was evaluated through analysis of variance (ANOVA), and an optimal design model improved over the existing one was proposed.
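
The ANOVA step can be illustrated with a minimal one-way F statistic. This is only a sketch: the response values grouped by three levels of a single design factor are hypothetical, and the study's actual Taguchi analysis (for example, its orthogonal array and response measures) is not reproduced here.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA: between-group mean square
    divided by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical responses at three levels of one design factor.
level_responses = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(level_responses)  # (26 / 2) / (6 / 6) = 13
```

A large F relative to the F distribution with (2, 6) degrees of freedom would indicate that the factor level has a significant effect on the response.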

Probabilistic Load Analysis for Tailplane Considering Uncertainties in Design Variables (설계변수의 불확실성을 고려한 미익 하중의 확률론적 해석)

  • Choi, Yong-Joon;Kim, In-Gul;Lee, Seok-Je
    • Journal of the Korea Institute of Military Science and Technology, v.13 no.6, pp.1043-1050, 2010
  • This paper examines probabilistic load analysis for the tailplane during pitching maneuvers in the conceptual aircraft design phase. The flight load analysis based on the probabilistic distribution of design variables is compared with the results of the deterministic analysis. Two forms of variable distribution are used in this paper. One is the standard normal distribution; the other is calculated from the results of the short-period longitudinal equation of aircraft motion. The influence of the distribution parameters on the probabilistic load analysis was investigated, and the significant design variables that affect the mean and variance of the probabilistic load were identified. The comparison indicates that probabilistic load analysis provides a more reliable load distribution for structural design than the traditional deterministic analysis.

Efficient strategy for the genetic analysis of related samples with a linear mixed model (선형혼합모형을 이용한 유전체 자료분석방안에 대한 연구)

  • Lim, Jeongmin;Sung, Joohon;Won, Sungho
    • Journal of the Korean Data and Information Science Society, v.25 no.5, pp.1025-1038, 2014
  • Linear mixed models have often been used for genetic association analysis with family-based samples. The correlation matrix for family-based samples is constructed from kinship coefficients, and it assumes that parental phenotypes are independent and that the correlation between parent and offspring is the same as the correlation between siblings. However, there are, for instance, positive correlations between parental heights, which indicates that this assumption about the correlation matrix is often violated. Statistical validity and power are affected by the appropriateness of the assumed variance-covariance matrix, and in this thesis we provide a linear mixed model with a flexible variance-covariance matrix. Our results show that the proposed method is usually more efficient than existing approaches, and its application to a genome-wide association study of body mass index illustrates its practical value in real data analysis.
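
The assumption criticized above can be seen by building the kinship-based correlation matrix for a nuclear family. The sketch below uses the standard kinship coefficients for non-inbred relatives (0.25 for parent-offspring and for full siblings, 0 for unrelated parents) and takes the correlation as twice the kinship; the family layout and names are hypothetical.

```python
def relationship_matrix(members, kinship):
    """Correlation matrix implied by kinship coefficients: twice the
    pairwise kinship off the diagonal, and 1 on the diagonal
    (assuming non-inbred individuals)."""
    n = len(members)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(members):
        for j, b in enumerate(members):
            if i == j:
                matrix[i][j] = 1.0
            else:
                # Look the pair up in either order; unrelated pairs default to 0.
                phi = kinship.get((a, b), kinship.get((b, a), 0.0))
                matrix[i][j] = 2 * phi
    return matrix

members = ["father", "mother", "child1", "child2"]
kinship = {
    ("father", "mother"): 0.0,    # parents assumed unrelated
    ("father", "child1"): 0.25,
    ("father", "child2"): 0.25,
    ("mother", "child1"): 0.25,
    ("mother", "child2"): 0.25,
    ("child1", "child2"): 0.25,   # full siblings
}

R = relationship_matrix(members, kinship)
```

In this matrix the parents' entry is 0 and the parent-offspring and sibling entries are both 0.5, which is exactly the structure that a flexible variance-covariance matrix would relax.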

Effects of Health Behaviors on Perceived Physical and Psychological Job Stress Among Korean Manufacturing Workers (제조업 근로자의 건강행위와 직무로 인한 스트레스 자각증상의 관련성)

  • 박경옥;김인석;오영아
    • Korean Journal of Health Education and Promotion, v.21 no.3, pp.195-211, 2004
  • Stress is a primary health promotion issue in worksite research because psychological distress is closely related not only to workers' health status but also to their job performance. This study identified the health behaviors that significantly affect workers' job-related stress in the Korean manufacturing industry, using national survey data collected by the Korean Occupational Safety and Health Agency in 2003. A total of 7,818 factory workers in 1,562 manufacturing companies participated in the Korean nationwide occupational health survey; of these, the 3,390 workers who answered that they had stressors in their workplace were included in the final analysis. Participants were selected by stratified proportional sampling by manufacturing industry classification, company size, and company location (8 metropolitan and 8 non-metropolitan regions) in Korea. Trained interviewers visited the target companies and interviewed factory workers randomly selected in each company. Smoking, drinking, weight control, exercise, sleeping, break time at work, and perceived fatigue were included in the health behavior construct. Stress symptoms consisted of physical and psychological stress, measured with 8 items. All survey responses were anonymously coded into the SPSS statistical program and tested using stepwise multiple regression analysis. Male workers made up 73.5% of participants, and those in their 30s made up 40.0% of the age groups. Married workers and high school graduates were the majority, at 52.1% and 61.8%, respectively. Current smokers were 44.7%, and more than 50% of the participants drank alcohol sometimes. The no-exercise group was 59.3%, and participants dissatisfied with their daily sleeping hours were 43.5%. In t-tests and analysis of variance, the general characteristics significantly associated with physical and psychological job stress were young age (p<0.001), single marital status (p<0.001), and a short working period at the present company (p<0.001). The health behaviors related to physical job stress were current smoking, weight change during the past year (p<0.001), weight control effort (p<0.001), exercise (p<0.001), daily sleeping dissatisfaction (p<0.001), break time, and perceived fatigue (p<0.001). All 10 health behavior factors were significantly associated with psychological job stress (p<0.05). Weight change, weight control effort, exercise, daily sleeping dissatisfaction, little break time at work, and high perceived fatigue were significant factors affecting job stress. In the stepwise multiple regression analysis, daily sleeping dissatisfaction, little break time at work, little exercise, weight change during the past year, and young age were selected as the significant health behavior and general factors affecting physical job stress symptoms. These five factors explained 18.9% of the variance in the physical stress score. Six factors were selected as the significant health behaviors affecting psychological job stress: daily sleeping dissatisfaction, little exercise, frequent alcohol drinking, high perceived fatigue, little break time at work, and little weight control effort. These six factors explained 10.6% of the variance in the psychological stress score.

Lip Contour Detection by Multi-Threshold (다중 문턱치를 이용한 입술 윤곽 검출 방법)

  • Kim, Jeong Yeop
    • KIPS Transactions on Software and Data Engineering, v.9 no.12, pp.431-438, 2020
  • In this paper, a method to extract the lip contour using multiple thresholds is proposed. Spyridonos et al. proposed a method to extract the lip contour. The first step is to obtain the Q image by transforming RGB into YIQ. The second step is to find the lip corner points by change point detection and to split the Q image into upper and lower parts at the corner points. Candidate lip contours are obtained by applying thresholds to the Q image. For each candidate contour, the feature variance is calculated, and the contour with the maximum variance is adopted as the final contour. The feature variance 'D' is based on the absolute difference near the contour points. The conventional method has three problems. The first is related to the lip corner points: the variance calculation includes many skin pixels, so the accuracy decreases and the split of the Q image is affected. Second, there is no analysis of color systems other than YIQ. YIQ is a good choice; however, other color systems such as HSV, CIELUV, and YCrCb should be considered. The final problem is related to the selection of the optimal contour. In the selection process, they used the maximum of the average feature variance for the pixels near the contour points. Using the maximum variance makes the extracted contour smaller than the ground-truth contour. To solve the first problem, the proposed method excludes some of the skin pixels, yielding a 30% performance increase. For the second problem, the HSV, CIELUV, and YCrCb coordinate systems were tested, and the performance of the conventional method was found not to depend on the color system. For the final problem, the maximum of the total sum of the feature variance is adopted rather than the maximum of the average feature variance, yielding a 46% performance increase. Combining all the solutions, the proposed method is twice as accurate and stable as the conventional method.
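
The first two steps described above (RGB to YIQ, then thresholding the Q image into candidate regions) can be sketched with Python's standard `colorsys` module. The pixel values and thresholds are hypothetical, and the feature variance 'D' used to pick the final contour is not implemented because the abstract does not define it in detail.

```python
import colorsys

def q_channel(image):
    """Q component of each RGB pixel (channel values in [0, 1])."""
    return [[colorsys.rgb_to_yiq(r, g, b)[2] for (r, g, b) in row] for row in image]

def threshold_mask(q_image, threshold):
    """True where the Q value exceeds the threshold (candidate lip pixels)."""
    return [[q > threshold for q in row] for row in q_image]

# Hypothetical colors: a reddish "lip" pixel and a paler "skin" pixel.
lip = (0.8, 0.2, 0.3)
skin = (0.9, 0.7, 0.6)
image = [[lip, skin],
         [skin, lip]]

q_image = q_channel(image)

# One candidate mask per threshold; a full method would score each
# candidate and keep the best one.
candidates = {t: threshold_mask(q_image, t) for t in (0.05, 0.10, 0.15)}
```

In this toy image the reddish pixels have a clearly higher Q value than the skin-colored ones, which is why thresholding Q gives usable lip candidates.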