• Title/Abstract/Keywords: Statistical Analysis Data


Quantitative Linguistic Analysis on Literary Works

  • Choi, Kyung-Ho
    • Journal of the Korean Data and Information Science Society / Vol. 18, No. 4 / pp. 1057-1064 / 2007
  • From the viewpoint of natural language processing, quantitative linguistic analysis is a linguistic study that relies on statistical methods; it is a form of mathematical linguistics that attempts to discover various linguistic characteristics by interpreting linguistic facts quantitatively. In this study, I introduce a method of quantitative linguistic analysis of literary works that uses a computer and statistical methods. I also introduce the use of SynKDP, a synthesized Korean data processor, and show the relationship between the distribution of the linguistic unit elements used by the hero of the novel Sassinamjunggi and thematic analysis of literary works.
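As a rough illustration of the kind of counting such an analysis starts from, the sketch below computes token and type counts, the type-token ratio, and the most frequent word forms in a passage. It is not SynKDP, whose interface is not described here; the function name `frequency_profile` is hypothetical, and splitting text on `\w+` runs is a simplifying assumption, since real Korean text would normally need morphological analysis first.

```python
import re
from collections import Counter

def frequency_profile(text: str, top_n: int = 5):
    """Hypothetical helper: token count, type count, type-token ratio, top_n word forms."""
    # Simplifying assumption: a "token" is any run of word characters
    # (in Python 3, \w also matches Hangul syllables).
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    n_types = len(counts)
    # Type-token ratio: distinct word forms divided by running words.
    ttr = n_types / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, ttr, counts.most_common(top_n)
```

For example, `frequency_profile("the hero speaks and the hero weeps", top_n=2)` reports 7 tokens, 5 types, and `[('the', 2), ('hero', 2)]` as the two most frequent forms. Running it separately on the passages spoken by each character is one simple way to compare their vocabularies.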


TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data

  • Lim, Jae Hyun;Lee, Soo Youn;Kim, Ju Han
    • Genomics & Informatics / Vol. 15, No. 1 / pp. 51-53 / 2017
  • High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools each focus on specific aspects of the analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods for confirming the integrity of the data and the process. In this paper, we propose TRAPR, an integrated R-based RNA-Seq analysis pipeline that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides functions for data management, filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization, allowing researchers to build customized analysis pipelines.
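TRAPR's own function names are not given in the abstract, so the sketch below does not use them. It shows one common way to implement two of the listed steps: filtering low-count genes, and normalizing for library size as counts per million (CPM) followed by a log2 transform. The thresholds `min_cpm` and `min_samples`, the prior count of 1, and all function names are illustrative assumptions.

```python
import math

def cpm(sample):
    """Counts per million for one sample (a list of per-gene read counts)."""
    lib_size = sum(sample)
    if lib_size == 0:
        raise ValueError("sample has zero total reads")
    return [c * 1e6 / lib_size for c in sample]

def filter_low_expression(samples, min_cpm=1.0, min_samples=2):
    """Indices of genes whose CPM is >= min_cpm in at least min_samples samples."""
    cpms = [cpm(s) for s in samples]
    n_genes = len(samples[0])
    return [g for g in range(n_genes)
            if sum(1 for c in cpms if c[g] >= min_cpm) >= min_samples]

def log_cpm(samples, keep, prior=1.0):
    """log2(CPM + prior) for the kept genes, one list per sample."""
    # The prior count avoids log2(0) for genes with zero reads.
    return [[math.log2(c[g] + prior) for g in keep]
            for c in (cpm(s) for s in samples)]
```

With two samples `[[999_999, 1, 0], [999_998, 2, 0]]`, the third gene (zero reads everywhere) is filtered out and the remaining values are log2-scaled CPMs.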

Review of statistical methods for survival analysis using genomic data

  • Lee, Seungyeoun;Lim, Heeju
    • Genomics & Informatics / Vol. 17, No. 4 / pp. 41.1-41.12 / 2019
  • Survival analysis mainly deals with the time to an event, such as death, onset of disease, or bankruptcy. A common characteristic of survival data is censoring: the time to event cannot be completely observed and instead represents only a lower bound on the time to event. For each subject, only whichever occurs first, the event time or the censoring time, is observed. Many traditional statistical methods have been used effectively to analyze survival data with censored observations. However, with the development of high-throughput technologies for producing "omics" data, more advanced statistical methods, such as regularization, are required to construct predictive survival models from high-dimensional genomic data. Furthermore, machine learning approaches have been adapted for survival analysis to fit nonlinear and complex interaction effects between predictors and to achieve more accurate prediction of individual survival probabilities. Since most clinicians and medical researchers can now easily access statistical programs for analyzing survival data, a review article is helpful for understanding the statistical methods used in survival analysis. We review traditional survival methods and regularization methods with various penalty functions for the analysis of high-dimensional genomic data, and describe machine learning techniques that have been adapted to survival analysis.
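Censoring is what the traditional methods must handle. As one illustration, chosen here rather than taken from the review, the sketch below computes the Kaplan-Meier product-limit estimate S(t) = product over event times t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i is the number of subjects still at risk just before t_i. A censored subject counts as at risk up to its censoring time and then leaves the risk set without contributing an event. The function name is hypothetical.

```python
def kaplan_meier(times, events):
    """times: observed times; events: 1 if the event occurred, 0 if censored.
    Returns (t, S(t)) at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects (events and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        # Everyone observed at t leaves the risk set afterwards.
        n_at_risk -= removed
    return curve
```

For times `[1, 2, 2, 3, 4]` with event indicators `[1, 1, 0, 1, 0]`, the estimate steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3; the censored observation at time 2 shrinks the risk set without lowering the curve.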

Development of Web Contents for Statistical Analysis Using Statistical Package and Active Server Page

  • 강태구;이재관;김미아;박찬근;허태영
    • Journal of the Korea Industrial Information Systems Research / Vol. 15, No. 1 / pp. 109-114 / 2010
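  • In this paper, we developed web content for statistical analysis using a statistical package and Active Server Pages (ASP). Statistical packages are difficult for non-statisticians both to use and to learn, yet non-statisticians want to analyze data without having to learn packages such as SAS, S-PLUS, and R. In this study, we therefore developed statistical analysis web content using ASP together with S-PLUS, a widely used statistical package. As a practical application, we developed web content for various analyses of water pollution data, such as exploratory data analysis, analysis of variance, and time series analysis. The resulting web-based statistical analysis is a very useful tool for non-experts in statistics, such as public officials and researchers. Through this web-based content, users can analyze data easily and quickly over the Internet.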

Iterative integrated imputation for missing data and pathway models with applications to breast cancer subtypes

  • Linder, Henry;Zhang, Yuping
    • Communications for Statistical Applications and Methods / Vol. 26, No. 4 / pp. 411-430 / 2019
  • Tumor development is driven by complex combinations of biological elements. Recent advances suggest that molecularly distinct subtypes of breast cancers may respond differently to pathway-targeted therapies. Thus, it is important to dissect pathway disturbances by integrating multiple molecular profiles, such as genetic, genomic and epigenomic data. However, missing data are often present in the -omic profiles of interest. Motivated by genomic data integration and imputation, we present a new statistical framework for pathway significance analysis. Specifically, we develop a new strategy for imputation of missing data in large-scale genomic studies, which adapts low-rank, structured matrix completion. Our iterative strategy enables us to impute missing data in complex configurations across multiple data platforms. In turn, we perform large-scale pathway analysis integrating gene expression, copy number, and methylation data. The advantages of the proposed statistical framework are demonstrated through simulations and real applications to breast cancer subtypes. We demonstrate superior power to identify pathway disturbances, compared with other imputation strategies. We also identify differential pathway activity across different breast tumor subtypes.
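The paper's method is low-rank, structured matrix completion across several data platforms. The sketch below shows only the basic idea behind low-rank imputation, in its simplest form: it fits a rank-1 model X ≈ u v^T to the observed entries by alternating least squares and fills each missing entry with u_i v_j. The rank-1 restriction, the function name `rank1_complete`, and the fixed iteration count are assumptions for illustration; real multi-omic data would call for a higher rank and regularization.

```python
def rank1_complete(matrix, n_iter=1000):
    """matrix: list of rows, with None marking missing entries.
    Fits X ~ u v^T on observed entries by alternating least squares
    and returns a copy with missing entries replaced by u[i] * v[j]."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [0.0] * n_rows
    v = [1.0] * n_cols  # nonzero start so the first u-update is defined
    for _ in range(n_iter):
        # Update u with v fixed: least squares over observed columns of row i.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            u[i] = sum(matrix[i][j] * v[j] for j in obs) / den if den > 0 else 0.0
        # Update v with u fixed: least squares over observed rows of column j.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            v[j] = sum(matrix[i][j] * u[i] for i in obs) / den if den > 0 else 0.0
    return [[matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
             for j in range(n_cols)] for i in range(n_rows)]
```

On the exactly rank-1 matrix `[[1, 2, 3], [2, 4, 6], [3, 6, None]]`, the missing entry is recovered as approximately 9, while observed entries are returned unchanged.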

Robustness, Data Analysis, and Statistical Modeling: The First 50 Years and Beyond

  • Barrios, Erniel B.
    • Communications for Statistical Applications and Methods / Vol. 22, No. 6 / pp. 543-556 / 2015
  • We present a survey of contributions that have defined the nature and extent of robust statistics over the last 50 years. Starting from the pioneering work of Tukey, Huber, and Hampel on robust estimation of location parameters, we present various generalizations of these estimation procedures that cover a wide variety of models and data analysis methods. These extensions include linear models, clustered and dependent observations, time series data, binary and discrete data, models for spatial data, nonparametric methods, and forward search methods for outliers. We also present current interests in robust statistics and conclude with suggestions on possible future directions for this area of statistical science.
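Huber's M-estimator of location, one of the starting points the survey names, can be computed by iteratively reweighted averaging: an observation whose residual r exceeds c = k * s in absolute value gets weight c / |r| instead of 1, so gross outliers are down-weighted rather than discarded. In the sketch below, the scale s is held fixed at the normalized median absolute deviation (1.4826 times the MAD) and k = 1.345, a common tuning constant; both choices, and the function name, are assumptions rather than details taken from the survey.

```python
import statistics

def huber_location(data, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location with scale fixed at the normalized MAD."""
    mu = statistics.median(data)  # robust starting value
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad
    if scale == 0:
        return mu  # more than half the data coincide; the median is returned
    c = k * scale
    for _ in range(max_iter):
        # Weight 1 inside [-c, c], c/|r| outside: the Huber psi divided by r.
        weights = [1.0 if abs(x - mu) <= c else c / abs(x - mu) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

For `[1, 2, 3, 4, 100]`, the sample mean is 22, while the Huber estimate stays close to 3 because the outlier's weight drops to about 0.02.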

A Study for the Features of Data Analysis Methods Used in Medical Research

  • 신재경;장덕준;문승호
    • Journal of the Korean Data and Information Science Society / Vol. 14, No. 2 / pp. 257-264 / 2003
  • In Korean medical research, awareness of the importance of statistical methods for processing medical data, and the practical use of such methods, remain insufficient. To examine the characteristics of the data analysis methods used in Korean and American medical journals, we examined research papers published in leading medical journals of both countries. We found large differences between Korea and America in both the quantity and the quality of the statistical methods used; in particular, the use of statistical methods in Korean medical research was comparatively low. Researchers in the medical field are therefore encouraged to make greater use of statistical methods when processing medical data.


An Evaluation of the Statistical Techniques Used in the 1995-2007 Editions of the Korea Institute of Oriental Medicine

  • 강경원;강병갑;고미미;신선화;최선미
    • The Journal of Korea Institute of Oriental Medicine / Vol. 13, No. 2 (Serial No. 20) / pp. 121-125 / 2007
  • Background and Purpose: This study investigated which statistical techniques have been used to analyze data in oriental medicine research. Methods: From the articles published in The Journal of Korea Institute of Oriental Medicine (JKIOM) between 1995 and 2007, we selected 135 original articles that used statistical techniques in their data analysis. Results: Of the 135 articles, 59 used descriptive statistics and 76 used inferential statistics. Among those 76 articles, the data were analyzed with the two-sample t-test (33 articles), analysis of variance (29 articles), regression (9 articles), the chi-square test (5 articles), nonparametric tests (4 articles), Fisher's exact test (3 articles), and other tests (9 articles). SAS and SPSS were the most commonly used statistical software (82.50%). Nonparametric tests were used in 4 (6.97%) of 67 articles and parametric tests in 63 (93.03%) of 67 articles. Of the 29 articles that used analysis of variance, multiple comparisons were performed with Duncan's test (8 articles), Dunnett's test (4), the Bonferroni method (4), Tukey's test (3), and Scheffé's test (1); 9 articles performed no multiple comparison. Conclusions: The use of statistical packages and statistical analyses has so far been limited, and advanced statistical analyses have rarely been used in oriental medicine research.
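Of the multiple comparison procedures counted above, the Bonferroni method is the simplest to state: with m comparisons, each p-value is multiplied by m (capped at 1) and compared with the overall significance level alpha. A minimal sketch follows; the function name is hypothetical and the p-values in the usage note are made up for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: returns (adjusted p-values, reject flags)."""
    m = len(p_values)
    # Multiplying by m bounds the family-wise error rate by alpha.
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject
```

For three comparisons with p-values 0.01, 0.04, and 0.5, the adjusted values are about 0.03, 0.12, and 1.0, so only the first comparison is significant at alpha = 0.05. The method is conservative; procedures such as Tukey's test exploit the structure of all pairwise comparisons to gain power.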


A Statistical Analysis of Professional Baseball Team Data: The Case of the Lotte Giants

  • Cho, Young-Seuk;Han, Jun-Tae;Park, Chan-Keun;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / Vol. 23, No. 6 / pp. 1191-1199 / 2010
  • Knowing which factors determine a player's ability to affect the outcome of a game is crucial: this knowledge helps determine each team member's relative contribution and set appropriate annual salaries. This study uses statistical analysis to investigate how much the outcome of a professional baseball game is influenced by the records of individual players. We used the Lotte Giants' data on 252 games played between 2007 and 2008, which included environmental data (home or away games and opponents) as well as pitchers' and batters' data. Using SAS Enterprise Miner, we performed logistic regression analysis and decision tree analysis on the data. The results obtained through the two analytic methods are compared and discussed.
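The study fitted its models in SAS Enterprise Miner. For readers without that tool, the sketch below fits a logistic regression of a win/loss indicator on numeric predictors by batch gradient ascent on the log-likelihood. The learning rate, iteration count, and function names are hypothetical, and the toy data in the usage note are made up; they are not the Lotte Giants data.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """P(y = 1 | x); w[0] is the intercept, w[1:] the coefficients."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Batch gradient ascent on the average log-likelihood."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # gradient term for one game
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w
```

With the toy data `X = [[0], [1], [2], [3], [4], [5]]` and outcomes `y = [0, 0, 1, 0, 1, 1]`, the fitted slope is positive, so the predicted win probability rises with the predictor. A positive coefficient on a real predictor (for example, a hypothetical batting statistic) would be read the same way.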

Classification via principal differential analysis

  • Jang, Eunseong;Lim, Yaeji
    • Communications for Statistical Applications and Methods / Vol. 28, No. 2 / pp. 135-150 / 2021
  • We propose classification methods based on principal differential analysis (PDA). We review the computation of the squared multiple correlation function (RSQ) and of PDA scores, and we combine PDA results with logistic regression for binary classification. In a numerical study, we compare the PDA-based classification methods with classification based on functional principal component analysis. Various scenarios are considered in a simulation study, and the PDA-based methods classify the functional data well. Gene expression data are used for the real data analysis, where the PDA score-based method also performs well.