• Title/Summary/Keyword: Multivariate Statistical Analysis

Multiple Testing in Genomic Sequences Using Hamming Distance

  • Kang, Moonsu
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.6
    • /
    • pp.899-904
    • /
    • 2012
  • High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences that involve count (or discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a large number of genes. This requires an appropriate test statistic and a corresponding multiple testing procedure, because a multivariate analysis of variance is not feasible in this setting. The family-wise error rate (FWER) is not appropriate for testing thousands of genes simultaneously in a multiple testing procedure; the false discovery rate (FDR) is better suited than the FWER to such multiple testing problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure combined with the proposed test statistic, which is based on a pseudo-marginal approach with Hamming distance, performs better.
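The two generic ingredients named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's pseudo-marginal test statistic: it assumes the standard Benjamini-Hochberg step-up rule as the "conventional FDR procedure", and the sequences and p-values are hypothetical.

```python
from typing import List, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return one reject flag per hypothesis, controlling the FDR at level q."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical inputs, for illustration only.
print(hamming_distance("ACGTAC", "ACCTAG"))
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.20, 0.74], q=0.05))
```

With thousands of positions, the same step-up rule is applied to the full vector of per-position p-values.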

Quality and Productivity Improvement by Clustering Product Database Information in Semiconductor Testing Floor

  • Lim, Ik-Sung;Koo, Il-Sup;Kim, Tae-Sung
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.23 no.60
    • /
    • pp.73-81
    • /
    • 2000
  • The testing processes for finished VLSI devices are considerably complex because they require different types of ATE to be linked together. Because of interaction effects between two or more linked ATEs, it is difficult to track down the cause of the unexpectedly long ATE setup times and random yields that frequently occur in the VLSI circuit-testing laboratory. The goal of this paper is to develop and demonstrate a methodology designed to eliminate the possible interaction factors that might cause random yields and/or unexpectedly long setup times, and to increase productivity. Statistical methods such as design of experiments or multivariate analysis cannot be applied directly to the final testing floor because of environmental constraints. Expanded product data information (PDI) is constructed by combining product data information and ATE control information. An architecture utilizing the expanded PDI is designed, which enables the engineer to conduct statistical investigations, reduce the setup time, and increase yield.

A class of accelerated sequential procedures with applications to estimation problems for some distributions useful in reliability theory

  • Joshi, Neeraj;Bapat, Sudeep R.;Shukla, Ashish Kumar
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.5
    • /
    • pp.563-582
    • /
    • 2021
  • This paper develops a general class of accelerated sequential procedures and obtains the associated second-order approximations for the expected sample size and the 'regret' function (the difference between the risks of the proposed accelerated sequential procedure and the optimum fixed sample size procedure). We establish that estimation problems based on various lifetime distributions can be tackled with the help of the proposed class of accelerated sequential procedures. An extensive simulation analysis using the Pareto distribution is presented in support of the accuracy of the proposed methodology, and a real data set on carbon fibers is also analyzed to demonstrate its practical utility. We also provide brief details of some other inferential problems that can be seen as applications of the proposed class of accelerated sequential procedures.

A Kullback-Leibler divergence based comparison of approximate Bayesian estimations of ARMA models

  • Amin, Ayman A
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.4
    • /
    • pp.471-486
    • /
    • 2022
  • Autoregressive moving average (ARMA) models involve nonlinearity in the model coefficients because of unobserved lagged errors, which complicates the likelihood function and makes the posterior density analytically intractable. To overcome this problem of posterior analysis, some approximation methods have been proposed in the literature. In this paper we first review the main analytic approximations proposed to make the posterior density of ARMA models analytically tractable, which include the Newbold, Zellner-Reynolds, and Broemeling-Shaarawy approximations. We then use the Kullback-Leibler divergence to study the relation between these three analytic approximations and to measure the distance between their derived approximate posteriors for ARMA models. In addition, we evaluate the impact of the distance between the approximate posteriors on Bayesian estimates of the mean and precision of the model coefficients by generating a large number of Monte Carlo simulations from the approximate posteriors. Simulation study results show that the approximate posteriors of Newbold and Zellner-Reynolds are very close to each other, and their estimates have higher precision than those of the Broemeling-Shaarawy approximation. The same results are obtained from the application to real-world time series datasets.
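As a rough illustration of how the Kullback-Leibler divergence measures the distance between two densities, the sketch below uses the closed form for two univariate normal distributions. The normal form is an assumption made for simplicity; the abstract does not state the form of the approximate posteriors, and the parameter values are hypothetical.

```python
import math


def kl_normal(mu_p: float, sigma_p: float, mu_q: float, sigma_q: float) -> float:
    """KL(P || Q) for P = N(mu_p, sigma_p^2) and Q = N(mu_q, sigma_q^2)."""
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical parameters: the divergence is zero for identical densities
# and is not symmetric in its arguments.
print(kl_normal(0.0, 1.0, 0.0, 1.0))
print(kl_normal(0.0, 1.0, 0.0, 2.0))
print(kl_normal(0.0, 2.0, 0.0, 1.0))
```

Because the divergence is asymmetric, a comparison of two approximations should state which density plays the role of the reference.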

Discrimination model of cultivation area of Corni Fructus using a GC-MS-Based metabolomics approach (GC-MS 기반 대사체학 기법을 이용한 산수유의 산지판별모델)

  • Leem, Jae-Yoon
    • Analytical Science and Technology
    • /
    • v.29 no.1
    • /
    • pp.1-9
    • /
    • 2016
  • It is believed that traditional Korean medicines can be managed more scientifically through the development of logical criteria to verify their region of cultivation, and that this could contribute to the advancement of the traditional herbal medicine industry. This study attempted to determine such criteria for Sansuyu. The volatile compounds were obtained from 20 samples of domestic Corni Fructus (Sansuyu) and 45 samples of Chinese Sansuyu by steam distillation. The metabolites were identified in the NIST Mass Spectral Library from the gas chromatography/mass spectrometry (GC/MS) data of 53 training samples. Data binning at 0.2 min intervals was performed to normalize the number of variables used in the statistical analysis. Multivariate statistical analyses, such as principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), and orthogonal partial least squares-discriminant analysis (OPLS-DA), were performed using the SIMCA-P software package. Significant variables with a variable importance in the projection (VIP) score higher than 1.0 were obtained from OPLS-DA, and variables with a p-value of less than 0.05 in one-way ANOVA were selected to verify the marker compounds. Finally, among the 11 variables extracted, 1-ethylbutyl-hydroperoxide (9.089 min), nonadecane (20.170 min), butylated hydroxytoluene (25.319 min), 5β,7βH,10α-eudesm-11-en-1α-ol (25.921 min), 7,9-bis(2-methyl-2-propanyl)-1-oxaspiro[4.5]deca-6,9-diene-2,8-dione (34.257 min), and 2-decyldodecyl-benzene (54.717 min) were selected as markers to indicate the origin of Sansuyu. The statistical model developed was suitable for determining the geographical origin of Sansuyu. The cultivation areas of four Korean and eight Chinese Sansuyu samples were predicted with the established OPLS-DA model, and 11 of the 12 samples were classified correctly.
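The binning step mentioned in this abstract can be sketched as below. This is only an assumed reading of "data binning at 0.2 min intervals": peaks are grouped by retention time into fixed-width bins and their intensities are summed. The function name, the choice of summing, and the intensity values are hypothetical; only the 0.2 min width and two of the retention times come from the abstract.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def bin_chromatogram(
    peaks: Iterable[Tuple[float, float]], bin_width_min: float = 0.2
) -> Dict[float, float]:
    """Sum peak intensities within fixed-width retention-time bins.

    Returns a mapping from the start of each bin (in minutes) to the
    summed intensity of the peaks that fall inside it.
    """
    # Work in integer thousandths of a minute to avoid float edge effects.
    width = round(bin_width_min * 1000)
    binned: Dict[int, float] = defaultdict(float)
    for retention_time, intensity in peaks:
        index = round(retention_time * 1000) // width
        binned[index] += intensity
    return {index * width / 1000: total for index, total in sorted(binned.items())}


# Hypothetical intensities; 9.089 and 20.170 min are retention times from the text.
peaks = [(9.089, 10.0), (9.150, 5.0), (9.210, 2.0), (20.170, 7.0)]
print(bin_chromatogram(peaks))
```

Binning every sample onto the same grid gives each sample the same set of variables, which is what the multivariate methods listed above require.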

Predicting Unknown Composition of a Mixture Using Independent Component Analysis

  • Lee, Hye-Seon;Park, Hae-Sang;Jun, Chi-Hyuck
    • Proceedings of the Korean Data and Information Science Society Conference (한국데이터정보과학회:학술대회논문집)
    • /
    • 2005.04a
    • /
    • pp.127-134
    • /
    • 2005
  • A suitable, conceptually simple representation of the data is essential in statistics and signal processing for subsequent analyses such as prediction, pattern recognition, and spatial analysis. Independent component analysis (ICA) is a statistical method for transforming observed high-dimensional multivariate data into statistically independent components. ICA has been applied increasingly across a wide range of spectral applications because it can extract the unknown components of a mixture from spectra. We focus on applying ICA to separate independent sources and to predict each composition using the extracted components. The theory of ICA is introduced and an application to metal surface spectra data is described, in which a subsequent analysis using the non-negative least squares method is performed to predict the composition ratio of each sample. Furthermore, some simulation experiments are performed to demonstrate the performance of the proposed approach.

The Building Strategies of Natural Park Integration Monitoring System Based on Geographic Information Analysis System

  • Bae, Min-Ki;Lee, Ju-Hee
    • Journal of Korean Society of Forest Science
    • /
    • v.95 no.5
    • /
    • pp.605-613
    • /
    • 2006
  • The goal of this study was to propose strategies for building a web-based national park monitoring system (WNPMS) using a geographic information analysis system. First, this study selected and constructed integrated management indicators that consider the physical, ecological, and socio-psychological carrying capacity of national parks. Second, this study built an integrated management system with a statistical analysis program for carrying out various multivariate and spatial analyses. Finally, the WNPMS could identify the relationships among visitors, natural resources, and recreation facilities in national parks, and forecast the future management status of each national park in Korea. The results of this study will help to prevent damage to natural resources and facilities, improve visitor satisfaction, prevent carrying capacity from being exceeded in national parks, and establish tailored management strategies for each national park.

Loss of Expression of PTEN is Associated with Worse Prognosis in Patients with Cancer

  • Qiu, Zhi-Xin;Zhao, Shuang;Li, Lei;Li, Wei-Min
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.11
    • /
    • pp.4691-4698
    • /
    • 2015
  • Background: The tumor suppressor phosphatase and tensin homolog (PTEN) is an important negative regulator of cell-survival signaling. However, available results for the prognostic value of PTEN expression in patients with cancer remain controversial. Therefore, a meta-analysis of published studies investigating this issue was performed. Materials and Methods: A literature search of the PubMed and EMBASE databases was conducted. Statistical analysis was performed using STATA 12.0 (STATA Corp., College Station, TX). Data from eligible studies were extracted and included in the meta-analysis using a random effects model. Results: A total of 3,810 patients from 27 studies were included in the meta-analysis, 22 investigating the relationship between PTEN expression and overall survival (OS) using univariate analysis, and nine using multivariate analysis. The pooled hazard ratio (HR) for OS was 1.64 (95% confidence interval (CI): 1.32-2.05) by univariate analysis and 1.56 (95% CI: 1.20-2.03) by multivariate analysis. In addition, eight papers including two disease-free-survival analyses (DFSs), four relapse-free-survival analyses (RFSs), three progression-free-survival analyses (PFSs) and one metastasis-free-survival analysis (MFS) reported the effect of PTEN on survival. The results showed that loss of PTEN expression was significantly correlated with poor prognosis, with a combined HR of 1.74 (95% CI: 1.24-2.44). Furthermore, in the analysis stratified by year of publication, ethnicity, cancer type, method, cut-off value, median follow-up time and neoadjuvant therapy, we found that ethnicity, cancer type, method, median follow-up time and neoadjuvant therapy were associated with prognosis. Conclusions: Our study shows that negative expression or loss of expression of PTEN is associated with worse prognosis in patients with cancer. However, adequately designed prospective studies need to be performed for confirmation.
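The random effects pooling of hazard ratios described in this abstract can be sketched as follows. The abstract does not say which estimator was used, so the DerSimonian-Laird estimator here is an assumption, as is recovering each study's standard error from its 95% CI on the log scale. The study values are hypothetical and are not taken from the meta-analysis.

```python
import math
from typing import List, Tuple

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def pool_random_effects(
    studies: List[Tuple[float, float, float]]
) -> Tuple[float, float, float, float]:
    """Pool (HR, CI low, CI high) triples with DerSimonian-Laird weights.

    Returns (pooled HR, CI low, CI high, tau^2 on the log-HR scale).
    """
    y = [math.log(hr) for hr, _, _ in studies]
    # Standard error of log(HR), recovered from the 95% CI width.
    v = [((math.log(hi) - math.log(lo)) / (2 * Z_95)) ** 2 for _, lo, hi in studies]
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return math.exp(mu), math.exp(mu - Z_95 * se), math.exp(mu + Z_95 * se), tau2


# Hypothetical studies: (hazard ratio, 95% CI lower bound, 95% CI upper bound).
studies = [(1.4, 0.9, 2.2), (2.1, 1.3, 3.4), (1.2, 0.8, 1.8)]
print(pool_random_effects(studies))
```

When the studies agree closely, the estimated between-study variance is zero and the result coincides with a fixed effects (inverse-variance) pooled estimate.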

Service Quality assessment for Food & Beverage Product of Hotel (관광호텔 식음료상품 서비스품질 평가)

  • 김승희
    • Culinary science and hospitality research
    • /
    • v.5 no.2
    • /
    • pp.447-467
    • /
    • 1999
  • Most published work on product quality focuses on manufactured goods. The subject of service quality has received less attention. This distinction is important because some of the quality-improving strategies available to manufacturers may be inappropriate for service firms. Services are performances, not objects. They are often produced in the presence of the customer; in the case of hotel restaurant services, quality occurs during service delivery, usually in an interaction between the customer and the contact personnel of the service firm. For this reason, service quality is highly dependent on the performance of employees, an organizational resource that cannot be controlled to the degree that components of tangible goods can be engineered. This study began as a basic study for customer satisfaction-oriented management, aiming to understand the service quality of food and beverage products through a systematic analysis. The major purpose of the study was to examine the relationship between customer satisfaction and service quality in terms of reliability, empathy, responsiveness, tangibility and assurance. Empirical research was conducted based on previous theoretical studies. A total of 286 customers at first-class hotels in Seoul were selected as the sample for this study. The research period was from February through March 1999, and answers were processed with SAS to yield frequency analysis, multivariate statistical analysis and regression analysis. The statistical treatments were frequency analysis, factor analysis, multiple regression analysis, and path analysis. The SERVQUAL method was used to evaluate service quality. Factor analysis yielded three factors: factor 1 (assurance, empathy, responsiveness), factor 2 (reliability), and factor 3 (tangibility). The findings of the statistical treatment are as follows. First, the attribute measurement of performance service quality was affected by customer satisfaction. Second, the attribute measurement of performance service quality was affected by repurchase intention. Third, the attribute measurement of performance customer satisfaction was affected by repurchase intention. According to the study model, service quality affected repurchase intention more than customer satisfaction did. Through indirect effects, service quality and customer satisfaction affected repurchase intention.

Forecasting Korean housing price index: application of the independent component analysis (부동산 매매지수와 전세지수 예측: 독립성분분석을 활용한 분석)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.2
    • /
    • pp.271-280
    • /
    • 2017
  • Real-estate values and related economic news are often the first newspaper category that people read. We pay close attention to expert opinions on real estate price forecasts. The Box-Jenkins ARIMA model is a commonly used statistical method for predicting housing prices. In this article, we tried to predict housing prices by combining independent component analysis (ICA), a multivariate data analysis technique, with the Box-Jenkins ARIMA model. Two independent components were extracted from the selling price index and the long-term rental price index and used to predict the future values of both indices. In conclusion, the forecast indices obtained using ICA were closer to the actual indices than the forecasts of the ARIMA model alone.