• Title/Abstract/Keywords: technique of statistics

879 search results, processing time 0.026 s

Logistic Regression Method in Interval-Censored Data

  • Yun, Eun-Young;Kim, Jin-Mi;Ki, Choong-Rak
    • 응용통계연구 (The Korean Journal of Applied Statistics)
    • /
    • Vol. 24, No. 5
    • /
    • pp.871-881
    • /
    • 2011
  • In this paper we propose a logistic regression method to estimate the survival function and the median survival time in interval-censored data. The proposed method is motivated by the data augmentation technique with no sacrifice in augmenting data. In addition, we develop a cross validation criterion to determine the size of data augmentation. We compare the proposed estimator with other existing methods such as the parametric method, the single point imputation method, and the nonparametric maximum likelihood estimator through extensive numerical studies to show that the proposed estimator performs better than others in the sense of the mean squared error. An illustrative example based on a real data set is given.

Supervised text data augmentation method for deep neural networks

  • Jaehwan Seol;Jieun Jung;Yeonseok Choi;Yong-Seok Choi
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 30, No. 3
    • /
    • pp.343-354
    • /
    • 2023
  • Recently, general language models have improved greatly through architectures such as GPT-3, proposed by Brown et al. (2020). Nevertheless, complex models can hardly be trained when very little data is available. Data augmentation, which addresses this problem, has been highly successful for image data: image augmentation significantly improves model performance without any additional data or architectural changes (Perez and Wang, 2017). Applying the technique to text is challenging, however, because it is unclear what noise should be added. We have therefore developed a novel method for augmenting text data. We divide the data into signals, which carry positive or negative meaning, and noise, which carries neither, and then apply k-doc augmentation, which randomly combines signals and noise from all the data to generate new data.

A Note on the Use of Peer Assessment to Improve Pupil's Performance

  • Lee, Kyung-Koo;Mun, Gil-Seong;Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 19, No. 2
    • /
    • pp.443-450
    • /
    • 2008
  • Peer assessment, in which students are assessed by other students, is one form of innovative assessment. It actively involves students in the assessment process, and it is generally agreed that such involvement enhances the quality and effectiveness of learning, since assessing and benchmarking a piece of work is a powerful aid to mastering it oneself. It is more effective in courses that students find hard to understand. In this article we present a peer assessment technique that was applied to students enrolled in a mathematical statistics course and a historical course. To measure the effectiveness of the technique, students evaluated their colleagues against predefined criteria, and the instructor assessments are compared with the peer assessments.

유전 프로그래밍 기반 단기 기온 예보의 보정 기법 (Genetic Programming Based Compensation Technique for Short-range Temperature Prediction)

  • 현병용;현수환;이용희;서기성
    • 전기학회논문지 (The Transactions of the Korean Institute of Electrical Engineers)
    • /
    • Vol. 61, No. 11
    • /
    • pp.1682-1688
    • /
    • 2012
  • This paper introduces a robust technique based on genetic programming (GP) for temperature compensation in short-range prediction. Because forecast models do not reliably determine weather conditions, an efficient MOS (Model Output Statistics) is needed to correct their systematic errors. Most MOS schemes use linear regression to compensate the prediction model, which makes the irregular nature of prediction hard to handle. To solve this problem, a nonlinear symbolic regression method using GP is suggested. The purpose of this study is to evaluate the accuracy of a GP-based nonlinear MOS in estimating 3-day temperatures in Korean regions. Compared with the UM model, the method shows superior results. Data from the summers of 2007-2009 are used for training, and data from summer 2010 for verification.
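
The linear-regression MOS that the abstract describes as the usual baseline can be sketched roughly as follows. This is a minimal illustration, not the authors' GP method: the function names `fit_linear_mos` and `apply_mos` and the sample temperatures are hypothetical, and a single predictor (the raw model forecast) is assumed.

```python
def fit_linear_mos(forecasts, observations):
    """Fit observation = intercept + slope * forecast by ordinary least squares."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    sxx = sum((f - mean_f) ** 2 for f in forecasts)
    sxy = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(forecasts, observations))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_f
    return intercept, slope


def apply_mos(forecast, intercept, slope):
    """Correct a raw model forecast with the fitted linear relation."""
    return intercept + slope * forecast


# Hypothetical training pairs (raw model forecast, observed temperature), in deg C.
train_forecasts = [24.0, 27.5, 30.0, 26.0]
train_observed = [25.1, 28.9, 31.2, 27.0]

intercept, slope = fit_linear_mos(train_forecasts, train_observed)
print(apply_mos(28.0, intercept, slope))
```

A correction of this form can only remove a linear systematic error, which is the limitation the abstract gives as the motivation for a nonlinear, symbolic-regression MOS.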

The Use of Generalized Gamma-Polynomial Approximation for Hazard Functions

  • Ha, Hyung-Tae
    • 응용통계연구 (The Korean Journal of Applied Statistics)
    • /
    • Vol. 22, No. 6
    • /
    • pp.1345-1353
    • /
    • 2009
  • We introduce a simple methodology, the so-called generalized gamma-polynomial approximation, which is based on a moment-matching technique, to approximate survival and hazard functions in the context of parametric survival analysis. We use the generalized gamma-polynomial approximation to approximate the density and distribution functions of convolutions and finite mixtures of random variables, from which the approximated survival and hazard functions are obtained. This technique provides very accurate approximations to the target functions and is computationally efficient and easy to implement. In addition, the generalized gamma-polynomial approximations are very stable in the middle range of the target distributions, whereas saddlepoint approximations are often unstable in a neighborhood of the mean.
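
The moment-matching idea can be illustrated with a deliberately simplified sketch. It assumes an ordinary two-parameter gamma base density matched on the mean and variance only, and it omits both the generalized gamma family and the polynomial adjustment that the paper adds. All function names are hypothetical, and the survival function is obtained by plain numerical integration.

```python
import math


def gamma_moment_match(mean, var):
    """Return (shape, scale) of the gamma density with the given mean and variance."""
    shape = mean ** 2 / var
    scale = var / mean
    return shape, scale


def gamma_pdf(x, shape, scale):
    """Gamma density at x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def gamma_survival(x, shape, scale, steps=20000):
    """S(x) = 1 - F(x), with F computed by the midpoint rule on [0, x]."""
    h = x / steps
    cdf = h * sum(gamma_pdf((i + 0.5) * h, shape, scale) for i in range(steps))
    return 1.0 - cdf


def gamma_hazard(x, shape, scale):
    """Hazard function h(x) = f(x) / S(x)."""
    return gamma_pdf(x, shape, scale) / gamma_survival(x, shape, scale)


# Convolution example: the sum of two independent Exp(1) variables
# has mean 2 and variance 2.
shape, scale = gamma_moment_match(2.0, 2.0)
print(shape, scale, gamma_hazard(1.0, shape, scale))
```

In this example the target is itself a gamma distribution, so matching two moments recovers it exactly; for targets outside the gamma family, two moments give only a rough base fit.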

EWM 통계량을 이용한 스테가노그래픽 자료 감지 기법 (Detecting Steganographic Contents Using EWM Statistics)

  • 지선수
    • 한국산업정보학회논문지 (Journal of the Korea Industrial Information Systems Research)
    • /
    • Vol. 13, No. 3
    • /
    • pp.54-62
    • /
    • 2008
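  • Steganography is the most common and effective hybrid information-hiding technology, and research is needed on applying statistical techniques to data-hiding methods that send and receive data while concealing the existence of the communication. That is, research is needed on detection techniques that most effectively manage and find altered stego images in which a secret (hidden) message has been embedded in an arbitrary cover image on the Internet. In this paper, we examine a technique that uses RGB, DCT, and EWM statistics to detect hidden data and find its location in steganography that hides secret data in a cover image. We then compare it with the existing method based on the chi-square test.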

DCT 영역에서 블록 기반의 통계적 분석을 이용한 강인한 워터마킹 (Robust Watermarking Using a Block-based Statistical Analysis in DCT Domain)

  • 임현;김귀현;박순영;방만원
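    • 대한전자공학회:학술대회논문집 (Institute of Electronics Engineers of Korea: Conference Proceedings)
    • /
    • 대한전자공학회 2001년도 제14회 신호처리 합동 학술대회 논문집 (Proceedings of the 14th Joint Conference on Signal Processing, Institute of Electronics Engineers of Korea, 2001)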
    • /
    • pp.657-660
    • /
    • 2001
  • In this paper, a robust watermarking technique is presented that uses block-based statistics in the DCT domain. First, the proposed technique calculates the JND threshold value J using global statistics in the DCT domain. Then one watermark is inserted into the coefficients that are above the threshold value J within a $2 \times 2$ block. Finally, the watermark is estimated by averaging the watermarks extracted from the coefficients that are above the threshold in a window. Experiments show that the proposed technique can enhance perceptual invisibility and robustness against additive noise and JPEG compression attacks by using the characteristics of JND.

진화프로그래밍을 이용한 이상 유전자 분류 방법 제안 (Suggestion Method of Classific System of Abnormal Genetic using EP)

  • 김영지;배상현
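    • 한국정보통신학회:학술대회논문집 (Korea Institute of Information and Communication Engineering: Conference Proceedings)
    • /
    • 한국해양정보통신학회 2008년도 춘계종합학술대회 A (Korea Institute of Maritime Information and Communication Sciences, 2008 Spring Conference, A)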
    • /
    • pp.776-779
    • /
    • 2008
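  • Expectations are growing for microarray technology, a method that allows the large volume of genetic information obtained through advances in DNA technology to be used for easy and accurate classification and diagnosis of genes with abnormal values. Accurate classification requires extracting only the genes with abnormal values, that is, the considerable noise contained in the extracted genes. In this paper, we therefore survey the various gene extraction methods used in existing studies on three datasets and propose a new data classification method and modeling method using an evolutionary program implemented in Matlab.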

회귀모형의 선형성에 대한 커널붓스트랩검정 (A Bootstrap Test for Linear Relationship by Kernel Smoothing)

  • 백장선;김민수
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 9, No. 2
    • /
    • pp.95-103
    • /
    • 1998
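  • As a method for testing the linearity of a regression model, Azzalini and Bowman proposed a nonparametric approach, a pseudo-likelihood ratio test based on the kernel regression estimator, under the assumption that the error term of the regression model follows a normal distribution. We introduce the bootstrap to modify their test, present a new test called the kernel bootstrap test, and examine its power through simulation. The proposed method is applicable even when the error distribution is not normal.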
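
A bootstrap linearity test of this kind might be sketched as below. The details are assumptions rather than the authors' procedure: the statistic is taken as (RSS0 - RSS1) / RSS1, where RSS0 comes from a least-squares line and RSS1 from a Nadaraya-Watson smoother with a Gaussian kernel, and the null distribution is approximated by resampling residuals of the fitted line. The function names, the bandwidth, and the sample data are hypothetical.

```python
import math
import random


def fit_line(x, y):
    """Fitted values of the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [intercept + slope * a for a in x]


def kernel_smooth(x, y, bandwidth):
    """Nadaraya-Watson fitted values with a Gaussian kernel."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((a - x0) / bandwidth) ** 2) for a in x]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted


def rss(y, fitted):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, fitted))


def plr_statistic(x, y, bandwidth):
    """(RSS0 - RSS1) / RSS1: linear fit versus kernel fit."""
    rss0 = rss(y, fit_line(x, y))
    rss1 = rss(y, kernel_smooth(x, y, bandwidth))
    return (rss0 - rss1) / rss1


def kernel_bootstrap_test(x, y, bandwidth, n_boot=200, seed=0):
    """Return (observed statistic, bootstrap p-value) for H0: linear regression."""
    rng = random.Random(seed)
    line = fit_line(x, y)
    residuals = [a - b for a, b in zip(y, line)]
    observed = plr_statistic(x, y, bandwidth)
    exceed = 0
    for _ in range(n_boot):
        # Generate data under H0: fitted line plus resampled residuals.
        y_star = [f + rng.choice(residuals) for f in line]
        if plr_statistic(x, y_star, bandwidth) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


# Hypothetical data with a clearly nonlinear mean function.
xs = [i / 29 for i in range(30)]
ys = [math.sin(2 * math.pi * a) for a in xs]
print(kernel_bootstrap_test(xs, ys, bandwidth=0.1))
```

Because the reference distribution is built from resampled residuals rather than from a normal model, a sketch like this does not rely on normally distributed errors, which matches the property the abstract reports.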

DISTRIBUTIONS OF PATTERNS OF TWO FAILURES SEPARATED BY SUCCESS RUNS OF LENGTH $k$

  • Sen, Kanwar;Goyal, Babita
    • Journal of the Korean Statistical Society
    • /
    • Vol. 33, No. 1
    • /
    • pp.35-58
    • /
    • 2004
  • For fixed positive integers $n$ and $k$ ($n \geq k + 2$), the exact probability distributions of non-overlapping and overlapping patterns of two failures separated by (i) exactly $k$ successes, (ii) at least $k$ successes and (iii) at most $k$ successes have been obtained for Bernoulli independent and Markov dependent trials using combinatorial techniques. The waiting time distributions for the first occurrence and the $r$th ($r > 1$) occurrence of the patterns have also been obtained.
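
For small $n$, case (i) can be checked by brute-force enumeration, as in the sketch below. This is only a numerical cross-check under stated assumptions, not the paper's combinatorial derivation: it covers independent Bernoulli trials with success probability `p`, it assumes that in overlapping counting the closing failure of one pattern may open the next, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import product


def count_patterns(seq, k, overlapping):
    """Count occurrences of failure, k successes, failure (1 = success, 0 = failure)."""
    pattern = (0,) + (1,) * k + (0,)
    m = len(pattern)
    count, i = 0, 0
    while i + m <= len(seq):
        if tuple(seq[i:i + m]) == pattern:
            count += 1
            # Overlapping counting restarts at the closing failure;
            # non-overlapping counting restarts after it.
            i += m - 1 if overlapping else m
        else:
            i += 1
    return count


def pattern_distribution(n, k, p, overlapping=False):
    """Exact distribution of the pattern count over all 2**n outcomes of n trials."""
    dist = defaultdict(float)
    for seq in product((0, 1), repeat=n):
        successes = sum(seq)
        prob = p ** successes * (1 - p) ** (n - successes)
        dist[count_patterns(seq, k, overlapping)] += prob
    return dict(dist)


print(pattern_distribution(5, 1, 0.5, overlapping=True))
```

Enumeration costs $2^n$ evaluations, so it is useful only for validating closed-form results at small $n$.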