• Title/Summary/Keyword: oversampling algorithm


A CMOS Oversampling Data Recovery Circuit With the Vernier Delay Generation Technique

  • Jun-Young Park
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.10A
    • /
    • pp.1590-1597
    • /
    • 2000
  • This paper describes a CMOS data recovery circuit that uses an oversampling technique. Digital oversampling is performed by a delay-locked loop circuit locked to multiple clock periods. The delay-locked loop generates a vernier delay resolution finer than the gate delay of the delay chain. A transition and non-transition counting algorithm for 4x oversampling was implemented for data recovery and verified on an FPGA. The chip was fabricated in 0.6 um CMOS technology, and measured results are presented.
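
The abstract does not spell out the transition and non-transition counting algorithm, so the following is only a simplified sketch of the general idea behind phase picking in a 4x oversampled receiver. It assumes a clean 0/1 sample stream; the function name `recover_bits` and the choice to sample half a bit after the busiest edge phase are illustrative assumptions, not the circuit described in the paper.

```python
def recover_bits(samples, oversampling=4):
    """Recover bits from an oversampled 0/1 stream by counting transitions.

    Simplified illustration only; not the algorithm from the paper.
    """
    # Count transitions between neighbouring samples, binned by the phase
    # (index modulo the oversampling ratio) of the sample that follows them.
    counts = [0] * oversampling
    for i in range(len(samples) - 1):
        if samples[i] != samples[i + 1]:
            counts[(i + 1) % oversampling] += 1

    # The phase that collects the most transitions is taken as the bit boundary.
    edge = counts.index(max(counts))

    # Sample half a bit period after the boundary, away from the edges.
    pick = (edge + oversampling // 2) % oversampling
    return [samples[i] for i in range(pick, len(samples), oversampling)]


# Usage: eight bits, each held for four samples, with one sample of phase offset.
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stream = [b for b in bits for _ in range(4)][1:]
recovered = recover_bits(stream)
```

Counting over a window rather than reacting to a single edge is what lets this kind of scheme tolerate some jitter: one misplaced transition does not move the chosen sampling phase.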


A study on the characteristics of applying oversampling algorithms to Fosberg Fire-Weather Index (FFWI) data

  • Sang Yeob Kim;Dongsoo Lee;Jung-Doung Yu;Hyung-Koo Yoon
    • Smart Structures and Systems
    • /
    • v.34 no.1
    • /
    • pp.9-15
    • /
    • 2024
  • Oversampling algorithms are methods used in machine learning to address constraints on data quantity. This study explores how reliability varies as data volume is progressively increased through oversampling. For this purpose, the synthetic minority oversampling technique (SMOTE) and the borderline synthetic minority oversampling technique (BSMOTE) are chosen. The input data, which include air temperature, humidity, and wind speed, are the parameters used in the Fosberg Fire-Weather Index (FFWI). Starting from a base of 52 entries, new data sets are generated by increasing the data volume in 10% increments up to a total increase of 100%. The augmented data are then used to predict FFWI with a deep neural network. The coefficient of determination ($R^2$) is calculated for predictions made with both the original and the augmented datasets. The results suggest that increasing the data volume by more than 50% of the original dataset quantity yields more reliable outcomes. This study introduces a methodology to alleviate the challenge of establishing a standard for data augmentation when employing oversampling algorithms, as well as a means to assess reliability.
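
As a rough illustration of how SMOTE-style augmentation grows a small table of weather inputs, the sketch below interpolates between a randomly chosen row and one of its nearest neighbours. It is a minimal pure-Python sketch, not the authors' pipeline: the function name `smote`, the neighbour count `k`, and the example rows are all hypothetical, and every row is treated as belonging to the group being augmented.

```python
import math
import random


def smote(points, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        # k nearest neighbours of the base row, excluding the row itself.
        neighbours = sorted(
            (p for p in points if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical rows of (air temperature, humidity, wind speed); values are made up.
rows = [(20.0, 40.0, 3.0), (25.0, 30.0, 5.0), (30.0, 20.0, 8.0), (22.0, 35.0, 4.0)]
augmented = rows + smote(rows, n_new=len(rows) // 2, k=2)  # a 50% increase
```

Because each synthetic row lies on a segment between two existing rows, the augmented set never leaves the range of the original data. That is one reason the usable amount of augmentation has to be checked empirically, as the study does.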

2X Converse Oversampling 1.65Gb/s/ch CMOS Semi-digital Data Recovery (2X Converse Oversampling 1.65Gb/s/ch CMOS 준 디지털 데이터 복원 회로)

  • Kim, Gil-Su;Kim, Kyu-Young;Shon, Kwan-Su;Kim, Soo-Won
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.44 no.6 s.360
    • /
    • pp.1-7
    • /
    • 2007
  • This paper proposes a CMOS semi-digital data recovery circuit with 2X converse oversampling to reduce the power consumption and chip area of high-definition multimedia interface (HDMI) receivers. The proposed circuit reduces power and effective area by using the converse oversampling algorithm and a semi-digital architecture. It was fabricated in a 0.18 um CMOS process, and measured results demonstrate a power consumption of 14.4 mW, an effective area of $0.152\,\mathrm{mm}^2$, and a jitter tolerance of 0.7 UIpp with a 1.8 V supply voltage.

Optimal Ratio of Data Oversampling Based on a Genetic Algorithm for Overcoming Data Imbalance (데이터 불균형 해소를 위한 유전알고리즘 기반 최적의 오버샘플링 비율)

  • Shin, Seung-Soo;Cho, Hwi-Yeon;Kim, Yong-Hyuk
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.1
    • /
    • pp.49-55
    • /
    • 2021
  • With recent advances in database technology, large volumes of data generated in finance, security, and networks can now be stored. These data are analyzed with classifiers based on machine learning, where the main problem is data imbalance. When a classifier is trained on imbalanced data, classification accuracy may degrade because of over-fitting to the majority class. To overcome data imbalance, oversampling strategies that increase the quantity of minority class data are widely used, but they require tuning to find a method and parameters suited to the data distribution. To improve this process, we propose a strategy that explores and optimizes oversampling combinations and ratios through genetic algorithms, drawing on methods such as the synthetic minority oversampling technique and generative adversarial networks. Using credit card fraud detection, a representative case of data imbalance, we sample data with the proposed strategy and with single oversampling strategies, and compare the performance of classifiers trained on each. The strategy optimized by exploring the ratio of each method with genetic algorithms was superior to previous strategies.

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.4028-4042
    • /
    • 2021
  • To address the difficulty of software defect prediction caused by insufficient labelled defect samples and class imbalance, a semi-supervised software defect prediction model is proposed that combines feature normalization, oversampling, and the Tri-training algorithm. First, feature normalization is used to smooth the feature data and eliminate the influence of excessively large or small feature values on the model's classification performance. Second, oversampling is used to expand the data, which resolves the class imbalance among labelled samples. Finally, the Tri-training algorithm learns from the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling, and the Tri-training algorithm to solve both the under-labelled sample and class imbalance problems. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure values.

Data Construction through Oversampling Techniques and Outlier Removal Methods (Oversampling 기법 및 이상치 제거 방법을 통한 데이터 구축 연구)

  • Jang, Byeong-Su;Go, Gyu-Hyun;Kim, YoungSeok;Kim, Sewon;Choi, Hyun-Jun;Yoon, Hyung-Koo
    • Journal of the Korean Geotechnical Society
    • /
    • v.40 no.5
    • /
    • pp.93-101
    • /
    • 2024
  • Numerical analysis methods are widely used to assess the safety of hydrogen storage facilities; however, obtaining data under various conditions poses significant challenges. This study aims to expand the dataset using oversampling algorithms and utilize these enhanced datasets as diverse input parameters for numerical analysis. The oversampling techniques applied include SMOTE, Borderline-SMOTE, ADASYN, and CTGAN, with data amplified by factors of 2, 5, and 100 relative to the original dataset. This approach increases data volume based on the characteristics of the existing data, which may consequently introduce outliers. To address this, statistical methods such as the 3-sigma rule and the confidence level method are employed to identify and remove outliers beyond the normal distribution range. The reliability of the conditions generated through data amplification and outlier analysis is evaluated by comparing them with trends observed in the original dataset. Additionally, the SHAP algorithm is utilized to analyze changes in the importance values of each parameter. The SHAP values derived from the original dataset and those processed through AI techniques and outlier analysis exhibit similar trends, validating the proposed methodologies. The methods proposed in this paper are applicable not only to hydrogen storage facilities but also to the systematic construction of data for assessing the stability of various geotechnical structures.
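
The 3-sigma rule mentioned in this abstract can be sketched in a few lines. This is a minimal illustration for a single numeric column, assuming the population standard deviation is used; the function name `remove_outliers_3sigma` and the sample values are hypothetical and not taken from the paper.

```python
import statistics


def remove_outliers_3sigma(values, n_sigma=3.0):
    """Keep only values within n_sigma standard deviations of the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) <= n_sigma * std]


# Hypothetical column: twenty plausible values and one implausible synthetic one.
column = [10.0] * 20 + [1000.0]
cleaned = remove_outliers_3sigma(column)
```

Note that the mean and standard deviation are themselves affected by the outliers, so on very small samples the rule may be unable to flag anything. With many synthetic rows, as in the amplified datasets described above, that limitation matters less.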

A divide-oversampling and conquer algorithm based support vector machine for massive and highly imbalanced data (불균형의 대용량 범주형 자료에 대한 분할-과대추출 정복 서포트 벡터 머신)

  • Bang, Sungwan;Kim, Jaeoh
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.2
    • /
    • pp.177-188
    • /
    • 2022
  • The support vector machine (SVM) has been successfully applied to various classification areas with a high level of classification accuracy. However, it is infeasible to use the SVM to analyze massive data because of its heavy computational cost. Furthermore, when analyzing imbalanced data with different class sizes, the classification accuracy of the SVM in the minority class may drop significantly because its classifier can be biased toward the majority class. To overcome this problem, we propose the DOC-SVM method, which uses divide-oversampling and conquer techniques. The proposed DOC-SVM divides the majority class into a few subsets and applies an oversampling technique to the minority class in order to produce balanced subsets. The DOC-SVM then obtains the final classifier by aggregating all SVM classifiers obtained from the balanced subsets. Simulation studies are presented to demonstrate the satisfactory performance of the proposed method.
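
The divide and oversample steps described here can be sketched as follows. The abstract does not say which oversampling technique DOC-SVM uses, so this sketch assumes plain random oversampling with replacement; the function name `balanced_subsets` is hypothetical, and the SVM training and aggregation steps are left out.

```python
import random


def balanced_subsets(majority, minority, n_subsets, seed=0):
    """Split the majority class and pair each part with an oversampled minority."""
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    subsets = []
    for i in range(n_subsets):
        # Divide step: every n_subsets-th row goes into the same part.
        part = shuffled[i::n_subsets]
        # Oversample step (assumed: random draws with replacement), keeping
        # every original minority row and padding up to the size of the part.
        extra = max(len(part) - len(minority), 0)
        boosted = minority + [rng.choice(minority) for _ in range(extra)]
        subsets.append((part, boosted))
    return subsets


# Usage: 100 majority rows and 5 minority rows split into 4 balanced problems.
pairs = balanced_subsets(list(range(100)), ["a", "b", "c", "d", "e"], n_subsets=4)
```

Each pair is a much smaller, balanced training problem, which is what makes fitting one classifier per subset and aggregating them cheaper than fitting a single SVM on the full data.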

Comparison of Anomaly Detection Performance Based on GRU Model Applying Various Data Preprocessing Techniques and Data Oversampling (다양한 데이터 전처리 기법과 데이터 오버샘플링을 적용한 GRU 모델 기반 이상 탐지 성능 비교)

  • Yoo, Seung-Tae;Kim, Kangseok
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.32 no.2
    • /
    • pp.201-211
    • /
    • 2022
  • With the recent change in the cybersecurity paradigm, research on anomaly detection methods using machine learning and deep learning is increasing. This study compares data preprocessing techniques that can improve the anomaly detection performance of a GRU (Gated Recurrent Unit) neural network-based intrusion detection model, using the open dataset NGIDS-DS (Next Generation IDS Dataset). In addition, to address the class imbalance between normal data and attack data, detection performance was compared and analyzed across oversampling ratios, with oversampling performed by a DCGAN (Deep Convolutional Generative Adversarial Network). In the experiments, preprocessing the system call and process execution path features with the Doc2Vec algorithm showed good performance, and oversampling with DCGAN improved detection performance.

Blind MMSE Equalization of FIR/IIR Channels Using Oversampling and Multichannel Linear Prediction

  • Chen, Fangjiong;Kwong, Sam;Kok, Chi-Wah
    • ETRI Journal
    • /
    • v.31 no.2
    • /
    • pp.162-172
    • /
    • 2009
  • A linear-prediction-based blind equalization algorithm for single-input single-output (SISO) finite impulse response/infinite impulse response (FIR/IIR) channels is proposed. The new algorithm is based on second-order statistics, and it does not require channel order estimation. By oversampling the channel output, the SISO channel model is converted to a special single-input multiple-output (SIMO) model. Two forward linear predictors with consecutive prediction delays are applied to the subchannel outputs of the SIMO model. It is demonstrated that the partial parameters of the SIMO model can be estimated from the difference between the prediction errors when the length of the predictors is sufficiently large. The sufficient filter length for achieving the optimal prediction is also derived. Based on the estimated parameters, both batch and adaptive minimum-mean-square-error equalizers are developed. The performance of the proposed equalizers is evaluated by computer simulations and compared with existing algorithms.
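
The conversion from a SISO model to a SIMO model by oversampling can be pictured as a polyphase split: sampling the channel output at `factor` times the symbol rate and collecting every `factor`-th sample gives one symbol-rate subchannel per phase. The sketch below shows only that re-indexing step, under the assumption of an integer oversampling factor; the function name `to_subchannels` is hypothetical, and the linear predictors and equalizers from the paper are not reproduced.

```python
def to_subchannels(oversampled, factor):
    """Split an oversampled sequence into `factor` symbol-rate subchannels."""
    # Drop trailing samples that do not fill a complete symbol period.
    usable = len(oversampled) - len(oversampled) % factor
    # Subchannel j holds samples j, j + factor, j + 2 * factor, ...
    return [oversampled[j:usable:factor] for j in range(factor)]


# Usage: seven samples at twice the symbol rate give two subchannels of three.
sub = to_subchannels([0, 1, 2, 3, 4, 5, 6], factor=2)
```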


Study on failure mode prediction of reinforced concrete columns based on class imbalanced dataset

  • Mingyi Cai;Guangjun Sun;Bo Chen
    • Earthquakes and Structures
    • /
    • v.27 no.3
    • /
    • pp.177-189
    • /
    • 2024
  • Accurately predicting the failure modes of reinforced concrete (RC) columns is essential for structural design and assessment. In this study, the challenges of imbalanced datasets and complex feature selection in machine learning (ML) methods were addressed through an optimized ML approach. By combining feature selection and oversampling techniques, the prediction of seismic failure modes in rectangular RC columns was improved. Two feature selection methods were used to identify six input parameters. To tackle class imbalance, the Borderline-SMOTE1 algorithm was employed, enhancing the learning capabilities of the models for minority classes. Eight ML algorithms were trained and fine-tuned using k-fold shuffle split cross-validation and grid search. The results showed that the artificial neural network model achieved 96.77% accuracy, while k-nearest neighbor, support vector machine, and random forest models each achieved 95.16% accuracy. The balanced dataset led to significant improvements, particularly in predicting the flexure-shear failure mode, with accuracy increasing by 6%, recall by 8%, and F1 scores by 7%. The use of the Borderline-SMOTE1 algorithm significantly improved the recognition of samples at failure mode boundaries, enhancing the classification performance of models like k-nearest neighbor and decision tree, which are highly sensitive to data distribution and decision boundaries. This method effectively addressed class imbalance and selected relevant features without requiring complex simulations like traditional methods, proving applicable for discerning failure modes in various concrete members under seismic action.
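
What distinguishes Borderline-SMOTE from plain SMOTE is that it first looks for minority samples near the class boundary and synthesizes new points only around those. The sketch below covers just that selection step, following the usual rule that a minority sample is in "danger" when at least half, but not all, of its m nearest neighbours belong to the majority class. The function name `danger_samples`, the value of `m`, and the toy points are illustrative assumptions, and the dataset is assumed to contain more than `m` points.

```python
import math


def danger_samples(minority, majority, m=5):
    """Return the minority points that lie near the class boundary."""
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    danger = []
    for p in minority:
        # m nearest neighbours of p among all other points, of either class.
        nearest = sorted(
            (item for item in labelled if item[0] is not p),
            key=lambda item: math.dist(item[0], p),
        )[:m]
        n_major = sum(label for _, label in nearest)
        # All neighbours majority: treated as noise. Fewer than half: safe.
        if m / 2 <= n_major < m:
            danger.append(p)
    return danger


# Toy data: three minority points far from the majority and one next to it.
minority = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.0, 0.0)]
majority = [(1.1, 0.0), (1.2, 0.0), (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
borderline = danger_samples(minority, majority, m=3)
```

In the Borderline-SMOTE1 variant named in the abstract, new samples are then interpolated between each danger point and its minority-class neighbours, which concentrates the synthetic data where the classifier is most likely to misclassify.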