• Title/Summary/Keyword: unbalanced data

Data De-weighting in Matrix Pencil Method (매트릭스 팬슬 방법의 데이터 불균형 제거 기법)

  • Koh, Jin-Hwan;Xu, Xiaowen;Ryu, Beong-Ju;Lee, Jae-Hun;Lee, Jung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences / v.36 no.8A / pp.741-747 / 2011
  • The Matrix Pencil method is a promising method for estimating direction of arrival (DOA) in non-stationary, multipath coherent environments. Not only does it offer better resolution than the conventional covariance-matrix approach, it is also computationally very efficient. In this paper, we present the effect of unbalanced data weighting in the formulation of the Matrix Pencil method and suggest a new formulation to mitigate that effect. Numerical simulation demonstrates that the proposed method can successfully eliminate the problem of unbalanced data weighting.

BAYESIAN INFERENCE FOR FIELLER-CREASY PROBLEM USING UNBALANCED DATA

  • Lee, Woo-Dong;Kim, Dal-Ho;Kang, Sang-Gil
    • Journal of the Korean Statistical Society / v.36 no.4 / pp.489-500 / 2007
  • In this paper, we consider a Bayesian approach to the Fieller-Creasy problem using noninformative priors. Specifically, we extend the results of Yin and Ghosh (2000) to the unbalanced case. We develop noninformative priors such as the first- and second-order matching priors and reference priors. We also prove posterior propriety under the derived noninformative priors. We compare these priors in light of how accurately the coverage probabilities of Bayesian credible intervals match the corresponding frequentist coverage probabilities.

Tests for Panel Regression Model with Unbalanced Data

  • Song, Suck-Heun;Jung, Byoung-Cheol
    • Journal of the Korean Statistical Society / v.30 no.3 / pp.511-527 / 2001
  • This paper considers the problem of testing variance components in the unbalanced two-way error component model. We provide a conditional LM test statistic for testing zero individual (time) effects assuming that the other time-specific (individual) effects are present. This test is an extension of Baltagi, Chang and Li (1998, 1992). Monte Carlo experiments are conducted to study the performance of this LM test.

Confidence Interval for the Variance Component in a Unbalanced One-way Random Effects Model

  • Song, Gyu-Moon
    • Journal of the Korean Data and Information Science Society / v.13 no.2 / pp.329-340 / 2002
  • Two methods are proposed for constructing a confidence interval on the among-group variance component in an unbalanced one-way random effects model. Computer simulation is used to compare these methods with alternative procedures. The results indicate that Method 1 and Method 2 perform well for small group sizes and large sample sizes, respectively.

On the Fitting ANOVA Models to Unbalanced Data

  • Jong-Tae Park;Jae-Heon Lee;Byung-Chun Kim
    • Communications for Statistical Applications and Methods / v.2 no.1 / pp.48-54 / 1995
  • A direct method for fitting analysis-of-variance models to unbalanced data is presented. This method exploits sparsity and rank deficiency of the matrix and is based on Gram-Schmidt orthogonalization of a set of sparse columns of the model matrix. A computational algorithm for the sums of squares used to test estimable hypotheses is given.

Detecting Malicious Social Robots with Generative Adversarial Networks

  • Wu, Bin;Liu, Le;Dai, Zhengge;Wang, Xiujuan;Zheng, Kangfeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5594-5615 / 2019
  • Malicious social robots, which are disseminators of malicious information on social networks, seriously affect information security and network environments. The detection of malicious social robots is a hot topic and a significant concern for researchers. A method based on classification has been widely used for social robot detection. However, this method of classification is limited by an unbalanced data set in which legitimate, negative samples outnumber malicious robots (positive samples), which leads to unsatisfactory detection results. This paper proposes the use of generative adversarial networks (GANs) to extend the unbalanced data sets before training classifiers to improve the detection of social robots. Five popular oversampling algorithms were compared in the experiments, and the effects of imbalance degree and the expansion ratio of the original data on oversampling were studied. The experimental results showed that the proposed method achieved better detection performance compared with other algorithms in terms of the F1 measure. The GAN method also performed well when the imbalance degree was smaller than 15%.
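
The abstract above reports results in terms of the F1 measure. As a rough illustration of why F1 is more informative than accuracy on an unbalanced data set, here is a minimal sketch. The toy labels, the 95/5 class split, and the function names are assumptions made for this example and are not taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero/undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 95 legitimate accounts (0) and 5 malicious robots (1).
y_true = [0] * 95 + [1] * 5

# A classifier that never flags anything still looks accurate.
always_negative = [0] * 100

# A classifier with 2 false alarms that catches 4 of the 5 robots.
better = [1] * 2 + [0] * 93 + [1] * 4 + [0] * 1

print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))
print(accuracy(y_true, better), f1_score(y_true, better))
```

The always-negative classifier scores high on accuracy but zero on F1, which is why F1 is a common choice when positive samples are rare.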

Time series representation for clustering using unbalanced Haar wavelet transformation (불균형 Haar 웨이블릿 변환을 이용한 군집화를 위한 시계열 표현)

  • Lee, Sehun;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.707-719 / 2018
  • Various time series representation methods have been proposed for efficient time series clustering and classification. Lin et al. (DMKD, 15, 107-144, 2007) proposed a symbolic aggregate approximation (SAX) method based on symbolic representations after approximating the original time series using piecewise local means. The performance of SAX therefore depends heavily on how well the piecewise local averages approximate the features of the original time series. SAX divides the entire series equally into an arbitrary number of segments; however, this is not sufficient to capture key features of complex, large-scale time series data. Therefore, this paper considers a data-adaptive local constant approximation of the time series using the unbalanced Haar wavelet transformation. The proposed method is shown to outperform SAX in many real-world data applications.
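
The baseline SAX procedure described in the abstract above (equal-length segments, piecewise local means, then symbols) can be sketched as follows. This is a minimal illustration of standard SAX, not the unbalanced Haar method the paper proposes. The function name `sax`, its parameters, and the toy series are assumptions; the equiprobable Gaussian breakpoints follow the usual SAX convention.

```python
from statistics import NormalDist

import numpy as np


def sax(series, n_segments, alphabet_size):
    """Return a SAX word for `series` (hypothetical helper for illustration)."""
    x = np.asarray(series, dtype=float)

    # 1. z-normalize so the Gaussian breakpoints below are meaningful.
    std = x.std()
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)

    # 2. Piecewise aggregate approximation: mean of each (near-)equal segment.
    paa = np.array([seg.mean() for seg in np.array_split(z, n_segments)])

    # 3. Breakpoints that cut N(0, 1) into `alphabet_size` equiprobable bins.
    breakpoints = [
        NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)
    ]

    # 4. Map each segment mean to the index of its bin, then to a letter.
    symbols = np.searchsorted(breakpoints, paa)
    return "".join(chr(ord("a") + int(s)) for s in symbols)


# Toy step series: low for the first half, high for the second half.
word = sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2, alphabet_size=4)
print(word)
```

Because the segment boundaries are fixed in advance, a change point that falls inside a segment gets averaged away, which is the limitation a data-adaptive segmentation aims to address.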

Methods and Techniques for Variance Component Estimation in Animal Breeding - Review -

  • Lee, C.
    • Asian-Australasian Journal of Animal Sciences / v.13 no.3 / pp.413-422 / 2000
  • In the class of models that include random effects, variance component estimates are important for obtaining accurate predictors and estimators. Variance component estimation is straightforward for balanced data but not for unbalanced data. Since orthogonality among factors is absent in unbalanced data, various methods for variance component estimation are available. REML estimation is the most widely used method in animal breeding because of its attractive statistical properties. Recently, the Bayesian approach has become feasible through Markov chain Monte Carlo methods and increasingly powerful computers. Furthermore, advances in variance component estimation with complicated models such as generalized linear mixed models have enabled animal breeders to analyze non-normal data.

Machine Learning-based landslide susceptibility mapping - Inje area, South Korea

  • Chanul Choi;Le Xuan Hien;Seongcheon Kwon;Giha Lee
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.248-248 / 2023
  • In recent years, the number of landslides in Korea has been increasing due to extreme weather events such as localized heavy rainfall and typhoons. Landslides often occur together with debris flows, land subsidence, and earthquakes, and they cause significant damage to life and property. Because 64% of Korea's land area is mountainous, the government has sought to predict landslides in order to reduce damage. In response, the Korea Forest Service has established a 'Landslide Information System' to predict the likelihood of landslides. This system selects a total of 13 landslide factors based on past landslide events and uses logistic regression (LR) to predict the possibility of a landslide, with a reported accuracy of 0.75. However, most of the data used for learning in the current system covers landslides that occurred from 2005 to 2011, and it does not reflect recent typhoons or heavy rain. Therefore, in this study, we apply a total of six machine learning techniques (KNN, LR, SVM, XGB, RF, GNB) to predict the occurrence of landslides based on data for Inje, Gangwon-do, recently produced by the National Institute of Forest. To predict the occurrence of landslides, the landslide event and factor data must be converted into a form suitable for machine learning techniques using ArcGIS and Python. In addition, there is a large difference between the number of samples from areas where landslides occurred and areas where they did not. Therefore, the prediction was performed after correcting the unbalanced data using the Tomek Links and Near Miss techniques. Moreover, to control the unbalanced data, a model that reflects soil properties will be used to remove absolutely safe areas.
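
One of the two undersampling techniques named in the abstract above, Tomek links, can be sketched as follows. A Tomek link is a pair of samples from opposite classes that are each other's nearest neighbor; removing the majority-class member of each link cleans the class boundary. This is a minimal NumPy sketch and not the study's pipeline. The function name, the one-dimensional toy data, and the label encoding are assumptions, and Near Miss is not shown.

```python
import numpy as np


def tomek_link_undersample(X, y, majority_label):
    """Drop majority-class samples that form a Tomek link (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise squared Euclidean distances; O(n^2) memory, fine for a sketch.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor

    nn = d2.argmin(axis=1)  # index of each sample's nearest neighbor
    idx = np.arange(len(X))

    mutual = nn[nn] == idx             # i and nn[i] are nearest to each other
    is_link = mutual & (y != y[nn])    # ...and belong to different classes
    drop = is_link & (y == majority_label)

    return X[~drop], y[~drop]


# Toy data: label 0 = no landslide (majority), label 1 = landslide (minority).
X = [[0.0], [1.0], [2.5], [5.0], [5.4]]
y = [0, 0, 0, 0, 1]

X_res, y_res = tomek_link_undersample(X, y, majority_label=0)
print(X_res.ravel(), y_res)
```

In this toy set only the majority sample at 5.0 is removed, because it and the minority sample at 5.4 are mutual nearest neighbors from opposite classes.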

Analysis of induced voltage of CCPU with unbalanced current from Distribution Line on Underground Transmission Cable System (지중송전계통에서 배전선 불평형전류 유입에 따른 영향 검토)

  • Kang, J.W.;Jang, T.I.;Hong, D.S.;Jung, C.K.;Yoon, D.S.;Yoon, J.K.;Kim, H.H.
    • Proceedings of the KIEE Conference / 2005.07a / pp.459-461 / 2005
  • This paper analyses the induced voltage characteristics of CCPU with unbalanced current from a distribution line on underground transmission power cable systems. To obtain data on the induced voltage and current on CCPU in switching surge strokes, an actual proof test was carried out. This paper is expected to contribute to the establishment of proper protection methods for CCPU against unbalanced current from distribution lines on underground transmission power cable systems.
