• Title/Summary/Keyword: R statistical package (R 통계패키지)

An Application of R Commander on Probability and Statistics Education in Middle and High School Mathematics (중.고등학교 확률과 통계영역 교육에서의 R Commander의 활용)

  • Jang, Dae-Heung
    • Communications of Mathematical Education / v.21 no.3 / pp.541-557 / 2007
  • Jang (2007a, b) gave an overview of the R statistical package and its application to probability and statistics education. With reference to the contents of the 7th national mathematics curriculum, we suggest a plan for applying R Commander to probability and statistics education in middle and high school mathematics.

Independence tests using coin package in R (coin 패키지를 이용한 독립성 검정)

  • Kim, Jinheum;Lee, Jung-Dong
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1039-1055 / 2014
  • The distribution of a test statistic under a null hypothesis depends on the unknown distribution of the data and is therefore unknown as well. Conditional tests replace the unknown null distribution with the conditional null distribution, that is, the distribution of the test statistic given the observed data. This approach, known as permutation testing, was developed by Fisher (1935). A theoretical framework for permutation tests was given by Strasser and Weber (1999). The coin package developed by Hothorn et al. (2006, 2008) implements a unified approach to conditional inference via a generic independence test. Because convenient functions are available for the most prominent problems, users will rarely need to call this extremely flexible procedure directly. In this article we briefly review the underlying theory from Strasser and Weber (1999) and explain how to transform the data in order to apply the generic independence test function. Finally, we illustrate the approach with a few real data sets.
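
The conditional (permutation) idea described in this abstract can be illustrated with a minimal sketch. It is written in Python rather than R, does not use the coin package, and assumes the simplest case: two groups compared by their difference in means. The function name `permutation_test` and its parameters are hypothetical and chosen only for illustration.

```python
import random


def permutation_test(x, y, n_resamples=10000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means.

    The null distribution is approximated by repeatedly shuffling the pooled
    observations, i.e. by conditioning on the observed data.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)

    def mean_diff(values):
        left, right = values[:n_x], values[n_x:]
        return sum(left) / len(left) - sum(right) / len(right)

    observed = abs(mean_diff(pooled))
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random reassignment of observations to groups
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    print(permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

Because only the group labels are reshuffled, no distributional assumption on the data is needed; the price is that the p-value is a Monte Carlo approximation whose precision depends on `n_resamples`.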

The Use of a Biplot in Studying the Career Maturity of College Freshmen (행렬도를 이용한 대학 신입생의 진로의식 분석)

  • Choi, Hye-Mi;Park, Chan-Yong;Lee, Sang-Hyeop;Chung, Sung-Suk
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.933-941 / 2010
  • The biplot is a modern graphical methodology that projects high-dimensional data onto a low-dimensional subspace rich in information on the variation in the data, the correlation among variables, and class separation. To construct biplots, we use the BiplotGUI package in R, a free statistical software environment of growing popularity. Using data from questionnaires given to Chonbuk National University freshmen in 2009, we study the relationship between career goals and career maturity by applying the biplot method.

A study on high dimensional large-scale data visualization (고차원 대용량 자료의 시각화에 대한 고찰)

  • Lee, Eun-Kyung;Hwang, Nayoung;Lee, Yoondong
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1061-1075 / 2016
  • In this paper, we discuss various methods for visualizing high-dimensional large-scale data and review some issues associated with visualizing this type of data. High-dimensional data can be presented in a 2-dimensional space using a few selected important variables. We can visualize more variables by mapping them to various aesthetic attributes in graphics, or use the projection pursuit method to find an interesting low-dimensional view. For large-scale data, we discuss jittering and alpha blending methods that address the problem of overlapping points. We also review the R packages tabplot and scagnostics, as well as other R packages for interactive web applications with visualization.

Rhipe Platform for Big Data Processing and Analysis (빅데이터 처리 및 분석을 위한 Rhipe 플랫폼)

  • Jung, Byung Ho;Shin, Ji Eun;Lim, Dong Hoon
    • The Korean Journal of Applied Statistics / v.27 no.7 / pp.1171-1185 / 2014
  • Rhipe, which integrates the R and Hadoop environments, makes it possible to process and analyze massive amounts of data in a distributed processing environment. In this paper, we implement multiple regression analysis using Rhipe with actual and simulated data of various sizes. Experiments comparing the computing speeds of the pseudo-distributed and fully-distributed modes of a Hadoop cluster show that the fully-distributed mode is faster than the pseudo-distributed mode, and that the fully-distributed mode becomes faster as the number of data nodes increases. We also compare the performance of Rhipe with the stats and biglm packages available on bigmemory. The results show that Rhipe is faster than the other packages, owing to parallel processing in which the number of map tasks increases with the size of the data.
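
The abstract does not describe how the regression is split across map tasks. One common way, shown here only as an assumed sketch in plain Python (not the Rhipe API and not necessarily the authors' implementation), is to have each map task compute the partial sums X'X and X'y for its chunk, add them up in the reduce step, and solve the normal equations once. All function names below are hypothetical.

```python
from functools import reduce


def map_chunk(rows):
    """Map step: partial X'X and X'y for one non-empty chunk.

    Each row is (features, response); an intercept column is added.
    """
    p = len(rows[0][0]) + 1
    xtx = [[0.0] * p for _ in range(p)]
    xty = [0.0] * p
    for features, response in rows:
        x = [1.0] + list(features)
        for i in range(p):
            xty[i] += x[i] * response
            for j in range(p):
                xtx[i][j] += x[i] * x[j]
    return xtx, xty


def combine(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [u + v for u, v in zip(a[1], b[1])]
    return xtx, xty


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def chunked_least_squares(chunks):
    """Least-squares coefficients (intercept first) from chunked data."""
    xtx, xty = reduce(combine, map(map_chunk, chunks))
    return solve(xtx, xty)
```

Since the partial sums are small (p by p, independent of the number of rows), only the map step touches the full data, which is why adding map tasks can help as the data grows.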

Application of functional ANOVA and functional MANOVA (단변량 및 다변량 함수 데이터에 대한 분산분석의 활용)

  • Kim, Mijeong
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.579-591 / 2022
  • Functional data are collected in various fields, and it is often necessary to test whether groups of functional data differ. In this case the point-wise ANOVA method is not appropriate; an integrated result should be presented rather than point-wise results. Various methods for the analysis of variance of functional data have been proposed, and these methods have recently been implemented in the R package fdANOVA. In this paper, I first explain ANOVA and multivariate ANOVA, and then introduce various recently proposed methods of analysis of variance for univariate and multivariate functional data. I also describe how to use the R package fdANOVA. The package is used to test the equality of weekly temperatures in Seoul and Busan through univariate functional data ANOVA, and to test the equality of multivariate functional data corresponding to handwritten images through multivariate functional data ANOVA.

Simulation Modeling of Profit Optimization and Output Analysis using R (R을 활용한 이윤 최적화 시뮬레이션 모델링 및 결과 분석)

  • Cho, Min-Ho;Jeon, Yong-Ho
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.9 no.8 / pp.883-888 / 2014
  • Simulation is now used in various areas as an effective decision analysis tool in today's complex environment. However, the focus tends to be on developing and running the simulation model rather than on analyzing the results. This article emphasizes the importance of result analysis, as distinct from model development, in simulation, and uses R for a profit optimization simulation. R offers a variety of functions for statistical analysis, data manipulation, and graphical display, so this research demonstrates the value of R as a tool for simulation.

Implementation of R-language-based REST API and Solution for Security Issues (R 언어 기반의 REST API 구현 및 보안문제의 해결 방안)

  • Kang, DongHoon;Oh, Sejong
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.9 no.1 / pp.387-394 / 2019
  • Recently, the importance of big data has grown, and so has the demand for analyzing it. The R language was developed for data analysis, and users analyze data with the algorithms provided by its various statistics, machine learning, and data mining packages. However, it is difficult to develop an application using R. An earlier study proposed a method for calling R scripts from another language such as PHP or Java, but developing in R combined with another language is cumbersome. In this study, we introduce how to write an API using only the R language, without another language, by using the Plumber package. We also propose a solution for security issues related to R APIs. Using the proposed technology to develop web applications, we can expect high productivity, ease of use, and ease of maintenance.

A Tutorial on Covariance-based Structural Equation Modeling using R: focused on "lavaan" Package (R을 이용한 공분산 기반 구조방정식 모델링 튜토리얼: Lavaan 패키지를 중심으로)

  • Yoon, Cheol-Ho;Choi, Kwang-Don
    • Journal of Digital Convergence / v.13 no.10 / pp.121-133 / 2015
  • This tutorial presents an approach to covariance-based structural equation modeling using R. To this end, it defines criteria for covariance-based structural equation modeling by reviewing previous studies, and shows with an example how to analyze a research model using "lavaan", the R package that supports covariance-based structural equation modeling. As results, the tutorial proposes a covariance-based structural equation modeling technique using R, together with R scripts for the example model. It will help researchers who are encountering covariance-based structural equation modeling for the first time to begin studying it, and will give researchers already familiar with the method a knowledge base for in-depth analysis using R, an integrated statistical software environment.

Variable Selection in Frailty Models using FrailtyHL R Package: Breast Cancer Survival Data (frailtyHL 통계패키지를 이용한 프레일티 모형의 변수선택: 유방암 생존자료)

  • Kim, Bohyeon;Ha, Il Do;Noh, Maengseok;Na, Myung Hwan;Song, Ho-Chun;Kim, Jahae
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.965-976 / 2015
  • Determining the relevant variables for a regression model is an important part of regression analysis. Recently, variable selection methods using a penalized likelihood with various penalty functions (e.g. LASSO and SCAD) have been widely studied in simple statistical models such as linear models and generalized linear models. The advantage of these methods is that they select important variables and estimate regression coefficients simultaneously; that is, they delete insignificant variables by estimating their coefficients as zero. We study how to select proper variables based on the penalized hierarchical likelihood (HL) in semi-parametric frailty models, allowing three penalty functions: LASSO, SCAD and HL. For the variable selection we develop a new function in the "frailtyHL" R package. Our methods are illustrated with breast cancer survival data from the Medical Center at Chonnam National University in Korea. We compare the results from the three variable-selection methods and discuss their advantages and disadvantages.
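
How a penalty can estimate a coefficient as exactly zero is easiest to see in the textbook single-coefficient case (equivalently, an orthonormal design), where the penalized estimate is a thresholded version of the unpenalized one. The sketch below, in Python, shows the standard LASSO soft-thresholding rule and the SCAD thresholding rule with the conventional a = 3.7. It is not the penalized h-likelihood procedure of frailtyHL, and the function names are hypothetical.

```python
def soft_threshold(z, lam):
    """LASSO rule: shrink z toward zero by lam; zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def scad_threshold(z, lam, a=3.7):
    """SCAD rule (requires a > 2): soft-threshold small values,
    shrink moderate values less, and leave large values untouched."""
    magnitude = abs(z)
    sign = 1.0 if z >= 0 else -1.0
    if magnitude <= 2.0 * lam:
        return soft_threshold(z, lam)
    if magnitude <= a * lam:
        return ((a - 1.0) * z - sign * a * lam) / (a - 2.0)
    return z


if __name__ == "__main__":
    for estimate in (0.5, 1.5, 3.0, 5.0):
        print(estimate, soft_threshold(estimate, 1.0), scad_threshold(estimate, 1.0))
```

Both rules set small estimates to exactly zero, which is what performs the variable selection; they differ for large estimates, where LASSO keeps shrinking by `lam` while SCAD leaves the estimate unchanged.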