• Title/Summary/Keyword: Statistical data


Statistical Issues in Genomic Cohort Studies (유전체 코호트 연구의 주요 통계학적 과제)

  • Park, So-Hee
  • Journal of Preventive Medicine and Public Health / v.40 no.2 / pp.108-113 / 2007
  • When conducting large-scale cohort studies, numerous statistical issues arise across study design, data collection, data analysis and interpretation. In genomic cohort studies these problems become more complicated and need to be dealt with carefully. Rapid technical advances in genomics produce enormous amounts of data to be analyzed, and traditional statistical methods are no longer sufficient to handle them. In this paper, we review several important statistical issues that occur frequently in large-scale genomic cohort studies, including measurement error and the relevant correction methods, cost-efficient design strategies for main cohort and validation studies, inflated Type I error, gene-gene and gene-environment interaction, and time-varying hazard ratios. It is very important to employ appropriate statistical methods in order to make the best use of valuable cohort data and produce valid and reliable study results.
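The inflated Type I error mentioned in this abstract is easiest to see with a small simulation. Below is a minimal, hypothetical Python sketch (not from the paper): p-values are simulated under the null for many variant tests, and Bonferroni and Benjamini-Hochberg corrections are applied. The test count and significance level are illustrative assumptions.

```python
# Minimal sketch: how Type I error inflates across many genomic tests,
# and two standard corrections. Counts and alpha are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_tests, alpha = 10_000, 0.05

# Simulate p-values for 10,000 null association tests.
p = rng.uniform(size=n_tests)

print("uncorrected false positives:", np.sum(p < alpha))           # ~500 expected
print("Bonferroni false positives:", np.sum(p < alpha / n_tests))  # ~0 expected

# Benjamini-Hochberg FDR control: reject the k smallest p-values
# with p_(i) <= (i / m) * alpha.
order = np.argsort(p)
thresh = alpha * (np.arange(1, n_tests + 1) / n_tests)
below = p[order] <= thresh
k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
print("BH rejections:", k)
```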

Statistical Methods for Gene Expression Data

  • Kim, Choongrak
  • Communications for Statistical Applications and Methods / v.11 no.1 / pp.59-77 / 2004
  • Since the introduction of the DNA microarray, a revolutionary high-throughput biological technology, many papers have been published on the analysis of microarray gene expression data. In this paper we review most papers relevant to cDNA microarray data, classify them from a statistical-methods point of view, and present some statistical methods deserving consideration and future study.
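One of the basic analyses commonly applied to such expression data is a per-gene two-sample test. The sketch below uses synthetic data and is not taken from the review; group sizes and effect sizes are assumptions.

```python
# Minimal sketch: a per-gene Welch t-test on a synthetic expression matrix,
# a typical first-pass differential expression analysis for microarray data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_genes = 500
control = rng.normal(0.0, 1.0, size=(n_genes, 5))   # 5 control arrays
treated = rng.normal(0.0, 1.0, size=(n_genes, 5))   # 5 treated arrays
treated[:20] += 2.0                                  # 20 truly changed genes

# Welch t-test for each gene across the two groups.
t, p = stats.ttest_ind(treated, control, axis=1, equal_var=False)
print("genes with p < 0.01:", np.sum(p < 0.01))
```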

REGRESSION WITH CENSORED DATA BY LEAST SQUARES SUPPORT VECTOR MACHINE

  • Kim, Dae-Hak; Shim, Joo-Yong; Oh, Kwang-Sik
  • Journal of the Korean Statistical Society / v.33 no.1 / pp.25-34 / 2004
  • In this paper we propose a prediction method for the regression model when observations in the training data set are randomly censored. Least squares support vector machine regression is applied to predict the regression function by incorporating weights, assessed for each observation, into the optimization problem. Numerical examples are given to show the performance of the proposed prediction method.
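As a rough illustration of the weighted LS-SVM machinery the abstract refers to, the sketch below fits an RBF-kernel LS-SVM by solving its linear system, with per-observation weights on the diagonal. The weights here are placeholders; the paper derives its weights from the censoring mechanism, which is not reproduced in this sketch.

```python
# Minimal sketch of weighted least-squares SVM regression with an RBF kernel.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_fit(X, y, w, gamma=10.0, sigma=1.0):
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    # Linear system of the LS-SVM dual: observation weights enter the diagonal.
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (gamma * w))
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return lambda Xnew: rbf_kernel(Xnew, X, sigma) @ alpha + b

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 60)
w = np.ones(60)   # placeholder weights (e.g., down-weight censored cases)
predict = lssvm_fit(X, y, w)
print(predict(np.array([[0.0], [1.5]])))
```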

An Introduction to Data Analysis (자료 분석의 기초)

  • Pak, Son-Il; Lee, Young-Won
  • Journal of Veterinary Clinics / v.26 no.3 / pp.189-199 / 2009
  • With the growing importance of evidence-based medicine, clinical and biomedical research relies critically on the validity and reliability of data; only then can the subsequent statistical inferences lead to valid conclusions for medical decision-making. Despite the widespread use of analytical techniques in papers published in the Journal of Veterinary Clinics, statistical errors are commonly encountered, particularly in experimental design, research methodology and data analysis. These flaws often lead to misinterpretation of the data and, thereby, to inappropriate conclusions. This article is the first in a series of nontechnical introductions, designed not as a systematic review of medical statistics but to give the journal's readers an understanding of common statistical concepts, including data scales, selection of appropriate statistical methods, descriptive statistics, data transformation, confidence intervals, the principles of hypothesis testing, sampling distributions, and interpretation of results.
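Two of the concepts listed here, descriptive statistics and confidence intervals, are shown in the minimal sketch below. The sample values are invented for illustration and are not from the article.

```python
# Minimal sketch: descriptive statistics and a 95% t-based confidence
# interval for a sample mean, using hypothetical measurements.
import numpy as np
from scipy import stats

weights_kg = np.array([4.1, 3.8, 4.6, 5.0, 4.3, 3.9, 4.7, 4.4])  # hypothetical sample

mean = weights_kg.mean()
sd = weights_kg.std(ddof=1)                  # sample standard deviation
se = sd / np.sqrt(len(weights_kg))           # standard error of the mean
ci = stats.t.interval(0.95, df=len(weights_kg) - 1, loc=mean, scale=se)

print(f"mean = {mean:.2f}, sd = {sd:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```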

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
  • Advances in nano research / v.14 no.3 / pp.225-234 / 2023
  • In this study, we aim to use big data resources and statistical analysis to obtain reliable guidance toward high-quality, high-yield agricultural production. To this end, soil type, rainfall and temperature data, as well as the wheat production of each year, are collected for a specific region. Using statistical methodology, the acquired data were cleaned to remove incomplete and defective records. Afterwards, several machine learning classification methods were used to distinguish between different factors and their influence on the final crop yields. The efficacy of the machine learning methods is discussed by comparing the models' predictions with the actual crop yields using the correlation factor and the mean squared error. The results of the analysis show the high accuracy of machine learning methods in predicting crop yields. Moreover, the random forest (RF) approach provides the best results among the classification methods utilized in this study.
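A rough analogue of the workflow described here is sketched below with synthetic data: a random forest trained on soil and weather features, scored by mean squared error and by the correlation between predicted and actual yields. The feature names and the data-generating formula are assumptions, not the study's data.

```python
# Minimal sketch: random forest yield prediction evaluated with MSE and
# the predicted-vs-actual correlation, on synthetic soil/weather features.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n = 300
rainfall = rng.uniform(200, 800, n)     # seasonal rainfall in mm (assumed feature)
temperature = rng.uniform(10, 30, n)    # mean temperature in deg C (assumed feature)
soil_type = rng.integers(0, 4, n)       # encoded soil category (assumed feature)
yield_t = 0.01 * rainfall - 0.05 * (temperature - 20) ** 2 + soil_type + rng.normal(0, 0.5, n)

X = np.column_stack([rainfall, temperature, soil_type])
X_tr, X_te, y_tr, y_te = train_test_split(X, yield_t, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)
print("MSE:", mean_squared_error(y_te, pred))
print("correlation:", np.corrcoef(y_te, pred)[0, 1])
```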

Optimization Methodology Integrated Data Mining and Statistical Method (데이터 마이닝과 통계적 기법을 통합한 최적화 기법)

  • Song, Suh-Ill; Shin, Sang-Mun; Jung, Hey-Jin
  • Journal of Korean Society for Quality Management / v.34 no.4 / pp.33-39 / 2006
  • Manufacturing technology and manufacturing environments are changing rapidly. With the development of computing and the expansion of available techniques, most manufacturing fields have been computerized. To remain internationally competitive, it is important for companies to extract useful information from vast amounts of data quickly. Statistical process control (SPC) techniques have been used as problem-solving tools in manufacturing processes, but these statistical methods are not applied more widely because they face many restrictions in realistic problems, particularly when large numbers of observations and factors must be analyzed. In this paper, we propose a new, more practical and efficient statistical design technique that integrates data mining (DM) and statistical methods as an alternative. The first step selects significant factors from manufacturing process data containing many factors, using a DM feature selection algorithm. The second step finds the process optimum after estimating the response function through response surface methodology (RSM), a statistical technique.
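The two-step idea can be illustrated with the following sketch on synthetic data. It is not the paper's algorithm: step 1 screens factors with a random-forest importance ranking, and step 2 fits a quadratic response surface to the retained factors and locates its optimum.

```python
# Minimal sketch: feature-importance screening followed by a second-order
# response surface fit and numerical optimization of the fitted surface.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from scipy.optimize import minimize

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 6))                        # 6 candidate factors
y = -(X[:, 0] - 0.3) ** 2 - (X[:, 2] + 0.5) ** 2 + rng.normal(0, 0.05, 200)

# Step 1: keep the two most important factors.
importances = RandomForestRegressor(random_state=0).fit(X, y).feature_importances_
keep = np.argsort(importances)[-2:]
Xs = X[:, keep]

# Step 2: fit a quadratic response surface and maximize it.
poly = PolynomialFeatures(degree=2)
rsm = LinearRegression().fit(poly.fit_transform(Xs), y)
neg_response = lambda x: -rsm.predict(poly.transform(x.reshape(1, -1)))[0]
opt = minimize(neg_response, x0=np.zeros(2), bounds=[(-1, 1)] * 2)
print("selected factors:", keep, "optimum setting:", opt.x)
```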

Improved Statistical Language Model for Context-sensitive Spelling Error Candidates (문맥의존 철자오류 후보 생성을 위한 통계적 언어모형 개선)

  • Lee, Jung-Hun; Kim, Minho; Kwon, Hyuk-Chul
  • Journal of Korea Multimedia Society / v.20 no.2 / pp.371-381 / 2017
  • The performance of statistical context-sensitive spelling error correction depends on the quality and quantity of the data used for the statistical language model. In general, the performance of a statistical language model grows with the amount of data; however, as the data grow, processing becomes slower and storage requirements increase. We suggest an improved statistical language model to address this problem, and we propose an effective spelling error candidate generation method based on it. The proposed statistical model and the correction method built on it improve both the accuracy and the processing speed of spelling error correction.
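To make the underlying idea concrete, the toy sketch below ranks correction candidates for a word slot with a count-based bigram language model. The corpus, smoothing, and scoring are illustrative assumptions, not the paper's improved model.

```python
# Minimal sketch: a count-based bigram language model used to rank
# context-sensitive spelling correction candidates.
from collections import Counter

corpus = "the cat sat on the mat the cat ate the fish".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word, alpha=1.0):
    # Add-alpha smoothing so unseen bigrams get a small nonzero probability.
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * len(unigrams))

def score(prev, candidate, nxt):
    # Score a candidate word by the two bigrams it participates in.
    return bigram_prob(prev, candidate) * bigram_prob(candidate, nxt)

# Rank candidates for the slot in "the ___ sat".
for cand in ["cat", "mat", "fish"]:
    print(cand, score("the", cand, "sat"))
```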

Statistical micro matching using a multinomial logistic regression model for categorical data

  • Kim, Kangmin; Park, Mingue
  • Communications for Statistical Applications and Methods / v.26 no.5 / pp.507-517 / 2019
  • Statistical matching is a method of combining multiple sources of data that are extracted or surveyed from the same population. It can be used in situations where the variables of interest are not jointly observed, and it is a low-cost way to achieve a large benefit by creating synthetic data from existing sources. In this paper, we propose several statistical micro matching methods using a multinomial logistic regression model for the case where all variables of interest are categorical or categorized, which is common in sample surveys. Under the conditional independence assumption (CIA), a mixed statistical matching method is proposed that is useful when auxiliary information is not available. We also propose a statistical matching method with auxiliary information that reduces the bias of the conventional matching methods suggested under the CIA. Through a simulation study, the proposed micro matching methods and conventional ones are compared. The simulation shows that the suggested methods outperform the existing ones, especially when the CIA does not hold.
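The basic micro-matching setting can be sketched as follows with synthetic data: a multinomial logistic model is fitted on a donor file that observes a categorical variable Z, and Z is then drawn for recipient records that share only the common covariates X. This CIA-style imputation is a simplification, not the paper's full method, and the variable layout is assumed.

```python
# Minimal sketch: multinomial-logistic micro matching under the CIA.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)

# Donor file observes (X, Z); recipient file observes only X.
# Categorical covariates are kept as integer codes for brevity.
X_donor = rng.integers(0, 3, size=(500, 2))
z_donor = (X_donor.sum(axis=1) + rng.integers(0, 2, 500)) % 3
X_recip = rng.integers(0, 3, size=(200, 2))

model = LogisticRegression(max_iter=1000).fit(X_donor, z_donor)

# Draw Z for each recipient record from the predicted category probabilities.
probs = model.predict_proba(X_recip)
z_imputed = np.array([rng.choice(model.classes_, p=p) for p in probs])
print(z_imputed[:10])
```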

Statistical Estimation of Specified Concrete Strength by Applying Non-Destructive Test Data (비파괴시험 자료를 적용한 콘크리트 기준강도의 통계적 추정)

  • Paik, Inyeol
  • Journal of the Korean Society of Safety / v.30 no.1 / pp.52-59 / 2015
  • The aim of this paper is to introduce the statistical definition of the specified compressive strength of concrete used for the safety evaluation of existing structures in domestic practice, and to present a practical method for obtaining the specified strength from non-destructive test data together with a limited number of core test data. The statistical definition of the specified compressive strength of concrete in the design codes is reviewed, and consistent formulations for statistically estimating the specified strength for assessment are described. To avoid estimating an unrealistically small specified strength from a limited number of data, it is proposed that the information from the non-destructive test data be combined with that of the minimum core test data; the sample mean, standard deviation and total number of concrete tests are then obtained from the combined data. The proposed procedures are applied to example test data composed of artificial numerical values and to actual evaluation data collected from bridge assessment reports. The results show that the proposed statistical estimation procedures yield reasonable values of the specified strength for assessment when the non-destructive test data are used in addition to the limited number of core test data.
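The flavor of such an estimate is sketched below. The numbers are invented and the lower-bound formula is a generic characteristic-value form (mean minus a fractile factor times the standard deviation), not the paper's exact code formulation.

```python
# Minimal sketch: pool core and NDT-based strength values, then compute a
# generic characteristic-value estimate of the form mean - k * s.
import numpy as np

core = np.array([27.5, 30.1, 28.8])                   # few core strengths, MPa (assumed)
ndt = np.array([29.0, 31.2, 27.9, 30.5, 28.4, 29.8])  # NDT-estimated strengths, MPa (assumed)

combined = np.concatenate([core, ndt])
n = combined.size
mean = combined.mean()
s = combined.std(ddof=1)

k = 1.64   # approximate 5% fractile factor (assumption, not the paper's value)
specified = mean - k * s
print(f"n = {n}, mean = {mean:.1f} MPa, s = {s:.2f} MPa, estimate = {specified:.1f} MPa")
```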

R programming: Language and Environment for Statistical Computing and Data Visualization (R 프로그래밍: 통계 계산과 데이터 시각화를 위한 환경)

  • Lee, D.H.; Ren, Ye
  • Electronics and Telecommunications Trends / v.28 no.1 / pp.42-51 / 2013
  • The R language is an open-source programming language and software environment for statistical computing and data visualization. It is widely used by statisticians and data scientists to develop statistical software and perform data analysis. R provides a variety of statistical and graphical techniques, including basic descriptive statistics, linear and nonlinear modeling, conventional and advanced statistical tests, time series analysis, clustering, simulation, and others. In this paper, we first introduce the R language and investigate its features as a data analytics tool, and from this we explore the potential applications of R in the field of data analytics.
