• Title/Summary/Keyword: Statistics technique

Supervised text data augmentation method for deep neural networks

  • Jaehwan Seol;Jieun Jung;Yeonseok Choi;Yong-Seok Choi
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.343-354 / 2023
  • Recently, there have been many improvements in general language models using architectures such as GPT-3, proposed by Brown et al. (2020). Nevertheless, complex models are hard to train when very little data is available. Data augmentation, which addresses this problem, has been remarkably successful for image data: image augmentation significantly improves model performance without any additional data or architectural changes (Perez and Wang, 2017). However, applying this technique to text data poses many challenges because it is unclear what noise should be added. We therefore developed a novel method for performing data augmentation on text data. We divide the data into signals, which carry positive or negative meaning, and noise, which carries neither, and then perform k-doc augmentation, which randomly combines signals and noise from all data to generate new data.
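The abstract describes k-doc augmentation only at a high level. As a minimal sketch, assuming each document has already been split into signal sentences and noise sentences (the splitting step is not shown), one new document could be built by pooling signals from k documents with the target label and noise from k documents of any label. The names `k_doc_augment` and `augment`, the dictionary layout, and this exact sampling rule are hypothetical illustrations, not the authors' implementation.

```python
import random


def k_doc_augment(docs, label, k, rng):
    """Create one synthetic document with the given label.

    Hypothetical reading of k-doc augmentation: signal sentences are pooled
    from k randomly chosen documents sharing the label, noise sentences from
    k randomly chosen documents of any label, and the two are mixed.
    """
    same_label = [d for d in docs if d["label"] == label]
    signal_srcs = rng.sample(same_label, min(k, len(same_label)))
    noise_srcs = rng.sample(docs, min(k, len(docs)))
    signals = [s for d in signal_srcs for s in d["signal"]]
    noises = [s for d in noise_srcs for s in d["noise"]]
    # Keep roughly one source document's worth of each kind of sentence.
    picked = (rng.sample(signals, len(signals) // len(signal_srcs))
              + rng.sample(noises, len(noises) // len(noise_srcs)))
    rng.shuffle(picked)  # random order of signal and noise sentences
    return {"label": label, "text": " ".join(picked)}


def augment(docs, n_new, k=2, seed=0):
    """Generate n_new synthetic documents, drawing each label uniformly."""
    rng = random.Random(seed)
    labels = sorted({d["label"] for d in docs})
    return [k_doc_augment(docs, rng.choice(labels), k, rng) for _ in range(n_new)]
```

Restricting the signal pool to the target label while drawing noise from every label is what keeps the generated label consistent; whether the original method restricts the signal pool in exactly this way is an assumption.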

Suggestion Method of Classific System of Abnormal Genetic using EP (Proposal of an Abnormal Gene Classification Method Using Evolutionary Programming)

  • Kim, Young-Gie;Bae, Sang-Hyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.05a / pp.776-779 / 2008
  • Microarray techniques are expected to enable direct classification and diagnosis from genetic data, but because of the DNA measurement technique such data contain abnormal values. Sampled genetic data therefore contain a great deal of noise, that is, abnormal data. This paper reviews the sampling methods of existing studies and then proposes a new data classification system and modeling method using evolutionary programming (EP), implemented in Matlab and applied to three datasets.
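The abstract does not give the EP configuration. The following is a generic evolutionary-programming sketch in Python rather than the authors' Matlab code: each individual is the parameter vector of a hypothetical linear classifier, offspring are produced by self-adaptive Gaussian mutation only (no crossover, which is characteristic of EP), and the best half of parents plus offspring survive by training accuracy. The classifier form, the truncation selection (classical EP often uses stochastic tournaments instead), and all parameter values are assumptions.

```python
import math
import random


def predict(params, x):
    """Hypothetical linear rule: class 1 if w.x + b > 0, else class 0."""
    *w, b = params
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def accuracy(params, X, y):
    return sum(predict(params, xi) == yi for xi, yi in zip(X, y)) / len(y)


def evolve(X, y, pop_size=20, generations=60, seed=0):
    rng = random.Random(seed)
    dim = len(X[0]) + 1  # weights plus bias
    tau = 1 / math.sqrt(2 * math.sqrt(dim))  # per-coordinate learning rate
    tau_prime = 1 / math.sqrt(2 * dim)       # global learning rate
    # Each individual: (parameter vector, per-coordinate mutation step sizes).
    pop = [([rng.gauss(0, 1) for _ in range(dim)], [1.0] * dim)
           for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for params, sigmas in pop:
            # Mutation only (no crossover), with self-adaptive step sizes.
            g = rng.gauss(0, 1)
            new_sigmas = [s * math.exp(tau_prime * g + tau * rng.gauss(0, 1))
                          for s in sigmas]
            new_params = [p + s * rng.gauss(0, 1)
                          for p, s in zip(params, new_sigmas)]
            offspring.append((new_params, new_sigmas))
        # (mu + mu) truncation selection on training accuracy.
        pool = pop + offspring
        pool.sort(key=lambda ind: accuracy(ind[0], X, y), reverse=True)
        pop = pool[:pop_size]
    return pop[0][0]  # best parameter vector found
```

Because parents compete with their offspring, the best training accuracy never decreases from one generation to the next.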

A Note on the Use of Peer Assessment to Improve Pupil's Performance

  • Lee, Kyung-Koo;Mun, Gil-Seong;Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society / v.19 no.2 / pp.443-450 / 2008
  • Peer assessment, in which students are assessed by other students, is one form of innovative assessment. It actively involves students in the assessment process, and it is generally agreed that such involvement enhances the quality and effectiveness of learning, since assessing and benchmarking the work of others is a powerful aid to mastering the material oneself. It is especially effective in courses that students find hard to understand. In this article we present a peer assessment technique that was applied to students enrolled in a mathematical statistics course and a history course. To measure the effectiveness of the technique, students evaluated their colleagues against predefined criteria, and the instructor assessments are compared with the peer assessments.

Robust Watermarking Using a Block-based Statistical Analysis in DCT Domain (Robust watermarking using block-based statistical analysis in the DCT domain)

  • Lim, Hyun;Kim, Gui-Hyun;Park, Soon-Young;Bang, Man-Won
    • Proceedings of the IEEK Conference / 2001.09a / pp.657-660 / 2001
  • In this paper, a robust watermarking technique based on block-wise statistics in the DCT domain is presented. First, the proposed technique calculates a just-noticeable-difference (JND) threshold from the global statistics in the DCT domain. The watermark is then inserted by embedding one watermark value into the coefficients within each 2×2 block that exceed the threshold value J. Finally, the watermark is estimated by averaging the watermark values extracted from the coefficients above the threshold within a window. Experiments show that, by exploiting the characteristics of the JND, the proposed technique improves perceptual invisibility and robustness against additive noise and JPEG compression attacks.
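The embedding and estimation steps can be sketched as follows, assuming the DCT coefficients have already been computed (the transform itself is omitted) and that the extractor has the original coefficients. The threshold formula J = mean|C| + c·std|C|, the multiplicative embedding C' = C + α·b·|C| with b = ±1, and the use of the 2×2 block itself as the averaging window are hypothetical choices standing in for the paper's JND model, which the abstract does not specify.

```python
import statistics


def jnd_threshold(coeffs, c=0.5):
    """Hypothetical global threshold from DCT statistics: J = mean|C| + c * std|C|."""
    mags = [abs(v) for row in coeffs for v in row]
    return statistics.fmean(mags) + c * statistics.pstdev(mags)


def strong_blocks(coeffs, c=0.5):
    """Positions above J, grouped by non-overlapping 2x2 block (empty blocks skipped)."""
    J = jnd_threshold(coeffs, c)
    h, w = len(coeffs), len(coeffs[0])
    groups = []
    for r in range(0, h - 1, 2):
        for k in range(0, w - 1, 2):
            cells = [(r + i, k + j) for i in (0, 1) for j in (0, 1)]
            strong = [(i, j) for i, j in cells if abs(coeffs[i][j]) > J]
            if strong:
                groups.append(strong)
    return groups


def embed(coeffs, bits, alpha=0.1, c=0.5):
    """Add alpha * b * |C| to every strong coefficient of a block, one bit b per block."""
    marked = [row[:] for row in coeffs]
    for b, group in zip(bits, strong_blocks(coeffs, c)):
        for i, j in group:
            marked[i][j] += alpha * b * abs(coeffs[i][j])
    return marked


def extract(original, marked, n_bits, alpha=0.1, c=0.5):
    """Non-blind estimate: average (C' - C) / (alpha * |C|) over each block, then take the sign."""
    bits = []
    for group in strong_blocks(original, c)[:n_bits]:
        est = statistics.fmean(
            (marked[i][j] - original[i][j]) / (alpha * abs(original[i][j]))
            for i, j in group)
        bits.append(1 if est > 0 else -1)
    return bits
```

Averaging over several strong coefficients per block is what gives robustness: independent additive noise on each coefficient partly cancels before the sign decision.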

A Bootstrap Test for Linear Relationship by Kernel Smoothing (Kernel bootstrap test for the linearity of regression models)

  • Baek, Jang-Sun;Kim, Min-Soo
    • Journal of the Korean Data and Information Science Society / v.9 no.2 / pp.95-103 / 1998
  • Azzalini and Bowman proposed a pseudo-likelihood ratio test for checking a linear relationship using a kernel regression estimator when the errors of the regression model follow a normal distribution. We modify their method with the bootstrap to construct a new test and examine the power of our test through simulation. Our method can also be applied when the error distribution is not normal.
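A minimal sketch of a bootstrap linearity test in this spirit: compare the residual sum of squares of a least-squares line (RSS0) with that of a Gaussian-kernel Nadaraya-Watson smoother (RSS1) through the statistic (RSS0 - RSS1)/RSS1, and calibrate it by resampling centred residuals from the linear fit, so no normality assumption is needed. The residual (rather than wild) bootstrap, the bandwidth `h`, the number of resamples `B`, and the function names are assumptions, not necessarily the authors' choices.

```python
import math
import random


def ols_fit(x, y):
    """Fitted values of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [a + b * xi for xi in x]


def nadaraya_watson(x, y, h):
    """Gaussian-kernel regression estimate evaluated at the design points."""
    fitted = []
    for x0 in x:
        w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
        fitted.append(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))
    return fitted


def rss(y, fit):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fit))


def f_stat(x, y, h):
    """Pseudo-likelihood-ratio-type statistic (RSS_linear - RSS_kernel) / RSS_kernel."""
    r0 = rss(y, ols_fit(x, y))
    r1 = rss(y, nadaraya_watson(x, y, h))
    return (r0 - r1) / r1


def bootstrap_linearity_test(x, y, h=0.1, B=199, seed=0):
    """Return (observed statistic, bootstrap p-value) for H0: E[y|x] is linear."""
    rng = random.Random(seed)
    observed = f_stat(x, y, h)
    lin_fit = ols_fit(x, y)
    resid = [yi - fi for yi, fi in zip(y, lin_fit)]
    mean_r = sum(resid) / len(resid)
    resid = [r - mean_r for r in resid]  # centre the residuals
    exceed = 0
    for _ in range(B):
        # Generate data under H0: linear mean plus resampled residuals.
        y_star = [fi + rng.choice(resid) for fi in lin_fit]
        if f_stat(x, y_star, h) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (B + 1)
```

Because the null distribution comes from resampled residuals rather than a normal-theory reference, the same code applies unchanged when the errors are skewed or heavy-tailed.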

The Use of Generalized Gamma-Polynomial Approximation for Hazard Functions

  • Ha, Hyung-Tae
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1345-1353 / 2009
  • We introduce a simple methodology, the so-called generalized gamma-polynomial approximation, based on a moment-matching technique, to approximate survival and hazard functions in the context of parametric survival analysis. We use the generalized gamma-polynomial approximation to approximate the density and distribution functions of convolutions and finite mixtures of random variables, from which the approximated survival and hazard functions are obtained. This technique provides very accurate approximations to the target functions and is computationally efficient and easy to implement. In addition, the generalized gamma-polynomial approximations are very stable in the middle range of the target distributions, whereas saddlepoint approximations are often unstable in a neighborhood of the mean.
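The moment-matching idea can be sketched with an ordinary gamma base density, which is the special case of the generalized gamma with power parameter 1; the paper's generalized gamma base is not reproduced here. The base parameters match the target mean and variance, and the polynomial coefficients ξ_j solve Σ_j ξ_j E_ψ[X^(k+j)] = μ_k for k = 0, 1, ..., d, so the approximant reproduces the first d+1 target moments. The survival function then comes from numerical integration and the hazard from h = f/S. Function names are hypothetical, and the Simpson-rule survival assumes a density that stays bounded near zero (base shape α ≥ 1 is safest).

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gamma_poly(moments, degree):
    """Gamma-polynomial density approximant.

    moments[k] = E[X^k] of the target for k = 0..max(2, degree), with moments[0] = 1.
    """
    mean = moments[1]
    var = moments[2] - mean ** 2
    alpha, beta = mean ** 2 / var, var / mean  # gamma base matches mean and variance
    # Raw moments of the gamma base: beta^k * Gamma(alpha + k) / Gamma(alpha).
    base = [beta ** k * math.exp(math.lgamma(alpha + k) - math.lgamma(alpha))
            for k in range(2 * degree + 1)]
    A = [[base[k + j] for j in range(degree + 1)] for k in range(degree + 1)]
    xi = solve(A, moments[:degree + 1])

    def pdf(x):
        if x <= 0:
            return 0.0
        log_base = ((alpha - 1) * math.log(x) - x / beta
                    - math.lgamma(alpha) - alpha * math.log(beta))
        return math.exp(log_base) * sum(c * x ** j for j, c in enumerate(xi))

    return pdf


def hazard_function(pdf, n=2000):
    """h(x) = f(x) / S(x), with S(x) = 1 - integral of f over [0, x]."""
    def h(x):
        return pdf(x) / (1 - simpson(pdf, 0.0, x, n))
    return h
```

For example, the convolution of independent exponentials with rates 1 and 2 has moments E[(X1+X2)^k] = Σ_i C(k,i) i! (k-i)! / 2^(k-i); passing the first five of these to `gamma_poly` with degree 4 gives a hazard at x = 1 close to the exact value 2(e^-1 - e^-2)/(2e^-1 - e^-2) ≈ 0.775.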

Genetic Programming Based Compensation Technique for Short-range Temperature Prediction (Correction technique for short-range temperature forecasts based on genetic programming)

  • Hyeon, Byeong-Yong;Hyun, Soo-Hwan;Lee, Yong-Hee;Seo, Ki-Sung
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.11 / pp.1682-1688 / 2012
  • This paper introduces a robust technique based on genetic programming (GP) for temperature compensation in short-range prediction. Because forecast models do not reliably determine weather conditions, an efficient model output statistics (MOS) scheme is needed to correct the systematic errors of the model. Most MOS schemes use linear regression to correct the prediction model, which makes it hard to handle the irregular nature of the prediction. To solve this problem, a nonlinear symbolic regression method using GP is proposed. The purpose of this study is to evaluate the accuracy of 3-day temperature estimates produced by a GP-based nonlinear MOS for Korean regions. The method was compared with the UM model and showed superior results. The summers of 2007-2009 were used for training, and the summer of 2010 was used for verification.
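For context, the classical linear MOS that the paper improves on can be sketched as a least-squares fit of observed temperature on the raw model forecast over a training period, then applied to later forecasts. This is only the linear baseline, not the GP-based nonlinear MOS; the function names are hypothetical.

```python
import math


def fit_linear_mos(forecasts, observed):
    """Least-squares fit of observed = a + b * forecast (classical linear MOS)."""
    n = len(forecasts)
    mf = sum(forecasts) / n
    mo = sum(observed) / n
    sff = sum((f - mf) ** 2 for f in forecasts)
    b = sum((f - mf) * (o - mo) for f, o in zip(forecasts, observed)) / sff
    a = mo - b * mf
    return a, b


def apply_mos(coeffs, forecasts):
    """Correct new raw forecasts with the fitted linear relation."""
    a, b = coeffs
    return [a + b * f for f in forecasts]


def rmse(pred, obs):
    """Root-mean-square error, e.g. for comparing raw and corrected forecasts."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))
```

A single straight-line correction like this removes constant and proportional biases but cannot follow a bias that varies nonlinearly with the forecast, which is the limitation the GP symbolic regression targets.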

Detecting Steganographic Contents Using EWM Statistics (Technique for detecting steganographic data using EWM statistics)

  • Ji, Seon-Su
    • Journal of Korea Society of Industrial Information Systems / v.13 no.3 / pp.54-62 / 2008
  • For a message-hiding technique to be effective, it needs availability, confidentiality, and integrity. Steganography is the science of hiding one message within other types of digital content. Embedding leaves discrepancies between the statistics of the composite image and those of the cover, and these discrepancies expose the fact that hidden communication is taking place; schemes therefore attempt to defeat steganalysis by restoring the statistics of the composite image to resemble those of the cover. In this paper, I present a steganography scheme capable of concealing a piece of secret information in a host image, based on OCT, RGB, and statistical restoration techniques.

Semiparametric Bayesian multiple comparisons for Poisson Populations

  • Cho, Jang Sik;Kim, Dal Ho;Kang, Sang Gil
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.427-434 / 2001
  • In this paper, we consider a nonparametric Bayesian approach to the multiple comparisons problem for I Poisson populations using Dirichlet process priors. We describe a Gibbs sampling algorithm and use this Markov chain Monte Carlo method to calculate the posterior probabilities of the hypotheses. We also provide a numerical example to illustrate the developed technique.
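One way to compute such posterior probabilities is a Pólya-urn Gibbs sampler, sketched below under assumptions not stated in the abstract: population i contributes a total count s_i over n_i observations, θ_i ~ G with G ~ DP(M, Gamma(a, rate b)), and the hypothesis θ_i = θ_j is scored by the fraction of post-burn-in draws in which the two populations share a value. The hyperparameters, the extra remixing step (redrawing each distinct value from its cluster posterior), and the function names are illustrative choices rather than the authors' algorithm.

```python
import math
import random


def log_marginal(s, n, a, b):
    """log of the integral of theta^s * exp(-n * theta) against Gamma(shape a, rate b)."""
    return (a * math.log(b) + math.lgamma(a + s) - math.lgamma(a)
            - (a + s) * math.log(b + n))


def dp_poisson_gibbs(sums, sizes, M=1.0, a=1.0, b=0.1, iters=2000, burn=500, seed=0):
    """Polya-urn Gibbs sampler for theta_i ~ G, G ~ DP(M, Gamma(a, rate b)).

    sums[i]  = total count observed in population i
    sizes[i] = number of observations from population i
    Returns the estimated P(theta_i == theta_j | data) for every pair i < j.
    """
    rng = random.Random(seed)
    I = len(sums)
    # Python's gammavariate takes (shape, scale), so scale = 1 / rate.
    theta = [rng.gammavariate(a + s, 1 / (b + n)) for s, n in zip(sums, sizes)]
    together = [[0] * I for _ in range(I)]
    for it in range(iters):
        for i in range(I):
            s, n = sums[i], sizes[i]
            # Candidates: the other populations' current values, or a fresh draw.
            others = [theta[j] for j in range(I) if j != i]
            logw = [s * math.log(t) - n * t for t in others]
            logw.append(math.log(M) + log_marginal(s, n, a, b))
            top = max(logw)
            w = [math.exp(lw - top) for lw in logw]
            k = rng.choices(range(len(w)), weights=w)[0]
            if k < len(others):
                theta[i] = others[k]
            else:
                theta[i] = rng.gammavariate(a + s, 1 / (b + n))
        # Remix step: redraw each distinct value from its cluster's posterior.
        for value in set(theta):
            members = [i for i in range(I) if theta[i] == value]
            new = rng.gammavariate(a + sum(sums[i] for i in members),
                                   1 / (b + sum(sizes[i] for i in members)))
            for i in members:
                theta[i] = new
        if it >= burn:
            for i in range(I):
                for j in range(i + 1, I):
                    together[i][j] += theta[i] == theta[j]
    kept = iters - burn
    return {(i, j): together[i][j] / kept for i in range(I) for j in range(i + 1, I)}
```

The factorial terms of the Poisson likelihood are the same for every candidate value, so they cancel in the weights and are omitted.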

Recalibration Estimation for Unit Nonresponse at the Two Levels Auxiliary Information

  • Yum, Joon Keun;Son, Chang Kyoon;Jeung, Young Mee
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.665-678 / 2003
  • In this paper we propose a new calibration estimator, called the recalibration estimator, together with its variance estimator, using a two-phase sampling technique with auxiliary information that is strongly correlated with the variable of interest under unit nonresponse. In this unit nonresponse setting, auxiliary information may be available at the level of the whole population or of the first-phase sample. The proposed recalibration estimator is derived from the first-phase and second-phase weights, respectively.
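A two-step linear calibration conveys the general idea, under assumptions not taken from the paper: respondents start from the product of their first- and second-phase weights, are first calibrated on an auxiliary variable x1 to its estimated first-phase total, and are then recalibrated on a variable x2 to its known population total, each step using chi-square-distance (linear) calibration on a single auxiliary variable with no intercept. The paper's exact estimator and its variance estimator are not reproduced, and all names are hypothetical.

```python
def linear_calibrate(weights, x, total):
    """Chi-square-distance calibration on one auxiliary variable (no intercept):
    w_i = d_i * (1 + lam * x_i), with lam chosen so that sum(w_i * x_i) == total."""
    sdx = sum(d * xi for d, xi in zip(weights, x))
    sdx2 = sum(d * xi * xi for d, xi in zip(weights, x))
    lam = (total - sdx) / sdx2
    return [d * (1 + lam * xi) for d, xi in zip(weights, x)]


def recalibration_estimate(resp, d1_first, x1_first, total_x2):
    """Hypothetical two-step estimator of the population total of y.

    resp      : respondents, dicts with keys d1, d2 (phase weights), x1, x2, y
    d1_first  : first-phase design weights for the whole first-phase sample
    x1_first  : x1 values for the whole first-phase sample
    total_x2  : known population total of x2
    """
    w0 = [r["d1"] * r["d2"] for r in resp]
    # Step 1: calibrate to the first-phase estimate of the x1 total.
    t1_hat = sum(d * x for d, x in zip(d1_first, x1_first))
    w1 = linear_calibrate(w0, [r["x1"] for r in resp], t1_hat)
    # Step 2: recalibrate to the population-level total of x2.
    w2 = linear_calibrate(w1, [r["x2"] for r in resp], total_x2)
    return sum(w * r["y"] for w, r in zip(w2, resp)), w2
```

In this sketch the second step generally breaks the first constraint; calibrating on both variables jointly is an alternative design.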