• Title/Abstract/Keywords: Statistics data

Search results: 13,842 items; processing time: 0.031 seconds

Supervised text data augmentation method for deep neural networks

  • Jaehwan Seol;Jieun Jung;Yeonseok Choi;Yong-Seok Choi
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 30, No. 3
    • /
    • pp.343-354
    • /
    • 2023
  • Recently, there have been many improvements in general language models using architectures such as GPT-3, proposed by Brown et al. (2020). Nevertheless, complex models can hardly be trained when very little data is available. Data augmentation, which addresses this problem, has been highly successful for image data. Image augmentation significantly improves model performance without any additional data or architectural changes (Perez and Wang, 2017). However, applying this technique to text data poses many challenges because it is unclear what noise should be added. We have therefore developed a novel method for data augmentation on text data. We divide the data into signals, which carry positive or negative meaning, and noise, which carries neither, and then perform k-doc augmentation, which randomly combines signals and noise from all of the data to generate new data.
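
The signal/noise recombination described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the polarity lexicon, the function names `split_signal_noise` and `augment`, and the reading of "k-doc" as "draw signals from k source documents with the same label" are all assumptions made for illustration only.

```python
import random

# Hypothetical polarity lexicon; the paper's actual signal/noise split is not given here.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"awful", "hate", "boring"}
LEXICON = POSITIVE | NEGATIVE


def split_signal_noise(tokens):
    """Separate tokens that carry polarity (signal) from the rest (noise)."""
    signal = [t for t in tokens if t in LEXICON]
    noise = [t for t in tokens if t not in LEXICON]
    return signal, noise


def augment(docs, n_new, k=2, seed=0):
    """Create n_new synthetic (text, label) pairs.

    Assumption: each new document takes the signals of k randomly chosen
    source documents that share one label, plus noise tokens drawn from
    the noise of all documents regardless of label.
    """
    rng = random.Random(seed)
    signals_by_label = {}
    noise_pool = []
    for text, label in docs:
        signal, noise = split_signal_noise(text.lower().split())
        signals_by_label.setdefault(label, []).append(signal)
        noise_pool.extend(noise)

    labels = sorted(signals_by_label)
    new_docs = []
    for _ in range(n_new):
        label = rng.choice(labels)
        # Sample k source documents (with replacement) and merge their signals.
        picked = rng.choices(signals_by_label[label], k=k)
        tokens = [t for signal in picked for t in signal]
        if noise_pool:
            # Pad with label-free noise tokens taken from the whole corpus.
            tokens += rng.choices(noise_pool, k=len(tokens) + 2)
        rng.shuffle(tokens)
        new_docs.append((" ".join(tokens), label))
    return new_docs


docs = [("I love this great movie", "pos"), ("What an awful boring plot", "neg")]
for text, label in augment(docs, n_new=3):
    print(label, "|", text)
```

Because noise tokens carry no polarity under this assumption, mixing them across classes should leave the label of each generated document unchanged, which is what makes the recombination usable as supervised augmentation.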

Testing for stochastic order in interval-valued data

  • 최혜정;임요한;곽민정;박성오
    • 응용통계연구
    • /
    • Vol. 32, No. 6
    • /
    • pp.879-887
    • /
    • 2019
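  • This study proposes a testing procedure for stochastic order in two-sample interval-valued data. The proposed test statistic is a U-statistic, and we derive its asymptotic distribution under the null hypothesis. Using real data and simulation studies, we compare the performance of the proposed method with that of the one-sided bivariate Kolmogorov-Smirnov test.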

Detection of Hotspots for Geospatial Lattice Data

  • Moon, Sung-Ho;Kim, Jong-Duk
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 17, No. 1
    • /
    • pp.131-139
    • /
    • 2006
  • Statistical analysis of spatial data is important in many fields. Spatial data are taken at specific locations or within specific regions, and their relative positions are recorded. Lattice data are synoptic observations covering an entire spatial region, such as cancer rates for each county in a state. The main purpose of this paper is to detect hotspots, that is, regions with significantly high or low rates. Kulldorff (1997) detected hotspots based on circular spatial scan statistics. We propose a new method that finds hotspots of arbitrary shape by using echelon analysis with spatial scan statistics.
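
As background for the scan statistic mentioned in this abstract, the following is a minimal sketch of Kulldorff's Poisson log-likelihood ratio evaluated over a list of candidate zones. It does not implement the paper's echelon analysis: the candidate zones are simply passed in, the function names and toy data are hypothetical, only high-rate hotspots are scored, and the Monte Carlo significance test is omitted.

```python
import math


def poisson_llr(c, e, total):
    """Log-likelihood ratio for one zone under the Poisson scan model.

    c: observed cases in the zone
    e: expected cases in the zone under the null (assumed > 0)
    total: total observed cases in the whole region
    """
    if c <= e:
        # Only zones with more cases than expected count as high-rate hotspots.
        return 0.0
    inside = c * math.log(c / e)
    # When every case lies inside the zone, the outside term vanishes.
    outside = (total - c) * math.log((total - c) / (total - e)) if total > c else 0.0
    return inside + outside


def most_likely_zone(zones, cases, population):
    """Return the candidate zone with the largest LLR and that LLR."""
    total_cases = sum(cases)
    total_pop = sum(population)
    best_zone, best_llr = None, 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        # Expected cases are proportional to the zone's share of the population.
        e = total_cases * sum(population[i] for i in zone) / total_pop
        llr = poisson_llr(c, e, total_cases)
        if llr > best_llr:
            best_zone, best_llr = zone, llr
    return best_zone, best_llr


# Toy lattice of four cells; zones are lists of cell indices (hypothetical data).
cases = [10, 2, 3, 1]
population = [100, 100, 100, 100]
zones = [[0], [1], [0, 1], [2, 3]]
print(most_likely_zone(zones, cases, population))
```

A circular scan would generate `zones` from circles of growing radius around each cell, whereas a shape-free method would supply connected sets of cells; the scoring step shown here is the same in either case.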

A Comparative Study on Statistics Education Between Korea and USA

  • Kim, Sang-Lyong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 17, No. 4
    • /
    • pp.1107-1117
    • /
    • 2006
  • In this paper, we conduct a comprehensive analysis of the current situation and the inherent problems of modern statistics education in Korea. We investigate the American probability and statistics curriculum currently used in Wisconsin and discuss the overall state of statistics education in the United States. By comparing the Korean and Wisconsin models, we explore the future direction of statistics education.

A Study on Data Base of Region Statistics

  • 이화영;이희춘;홍기학
    • 품질경영학회지
    • /
    • Vol. 22, No. 1
    • /
    • pp.179-187
    • /
    • 1994
  • This study proposes a database scheme for regional statistics, demand for which has increased rapidly with the introduction of local self-government in Korea. A program for managing regional statistics (registration, reference, modification, deletion) has been developed and can be used on personal computers.

Recurrence Relations in the Transformed Exponential Distributions

  • Choi, Jeen-Kap;Mo, Kap-Jong
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 14, No. 4
    • /
    • pp.1031-1044
    • /
    • 2003
  • In this paper, we establish some recurrence relations of the moments, product moments, percentage points, and modes of order statistics from the transformed exponential distribution.

A Study on Data Mining Using the Spline Basis

  • Lee, Sun-Geune;Sim, Songyong;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 11, No. 2
    • /
    • pp.255-264
    • /
    • 2004
  • Because of computerized data processing, we often encounter huge data sets. At the same time, advances in computing technology make it possible to handle such data sets. One important area is data mining. In this paper we consider data mining when the dependent variable is binary. The proposed method uses the poly-class model when the independent variables consist of both continuous and discrete variables. An example is provided.

Usage of Statistics in Clinical Trials

  • 안홍엽
    • Journal of Hospice and Palliative Care
    • /
    • Vol. 13, No. 1
    • /
    • pp.1-6
    • /
    • 2010
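  • Clinical trials aim to verify the effect of a drug or treatment in human subjects. For a clinical trial to succeed, the use of statistics should not be limited to simple data analysis but should be broadened to a variety of areas. So that the use of statistics is considered concretely and systematically from the study planning stage onward, we review the overall application of statistics, including the definition of the effect, determination of an appropriate sample size, and statistical analysis methods.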

An Exploration of the Reform Direction of Teaching Statistics

  • 우정호
    • 대한수학교육학회지:학교수학
    • /
    • Vol. 2, No. 1
    • /
    • pp.1-27
    • /
    • 2000
  • In the past half century, little effort has been made to improve the teaching and learning of statistics compared with other parts of school mathematics. Recently, however, data analysis has begun to play a prominent role in the national reform efforts of mathematics curricula in the United States of America and the United Kingdom. In this paper we give an overview of modern statistical thinking, as distinct from mathematical thinking, and examine the problems of the current old-style teaching of statistics. We also discuss the current emphasis on data handling (or data analysis) in the national mathematics curricula of the countries mentioned above. We explore the reform direction of statistics teaching: changing the philosophy of teaching statistics, teaching real data analysis, emphasizing the use of computers, and teaching statistical inference not as mathematics but as an intuitive, data-centered approach.

A Study on the Surveyed Courses which are Related with or Support the Subject of Statistics

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 15, No. 4
    • /
    • pp.837-843
    • /
    • 2004
  • Recently, the status of statistics in university curricula has reached a critical condition. This is largely due to the decrease in university applicants and a growing tendency to prefer departments that are advantageous for finding a job after graduation. Another reason for the present crisis is that statistics scholars and related scholarly associations have failed to prepare measures to revitalize statistics. To address these problems, this study surveyed the courses that are related to or support the subject of statistics in departments other than statistics at two universities in the Yeongnam region. Based on the survey, this study suggests some ways to revitalize statistics.
