• Title/Summary/Keyword: Oversampling Technique


Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil; Sellappan, Palaniappan; Sanjoy Kumar, Debnath; Muhammad, Naseem; Susama, Bagchi; Asiah, Lokman
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.53-63 / 2023
  • Many researchers are working to minimize the incidence of cancers, particularly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC) it is almost 90%. Predicting the onset of stomach cancer from risk factors allows for earlier diagnosis and more effective treatment. Although several models for predicting stomach cancer exist, most are built on unbalanced datasets, which favour the majority class. However, it is imperative to correctly identify cancer patients, who are in the minority class. This research applies three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and a Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. The validation data was used to test the classifiers under each balancing approach, and the final prediction model was built on the one that performed best overall.
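The SMOTE step named in this abstract can be illustrated with a minimal sketch. It assumes numeric features stored as tuples and Euclidean distance. The function name `smote`, the toy data, and the parameter defaults are hypothetical and are not taken from the paper, which does not describe its implementation here.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Toy minority class (illustrative values only)
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Undersampling the majority class afterwards would give the hybrid variant the abstract mentions.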

3.125Gbps Reference-less Clock and Data Recovery using 4X Oversampling (4X 오버샘플링을 이용한 3.125Gbps급 기준 클록이 없는 클록 데이터 복원 회로)

  • Jang, Hyung-Wook;Kang, Jin-Ku
    • Journal of IKEEE / v.10 no.1 s.18 / pp.10-15 / 2006
  • In this paper, a clock and data recovery (CDR) circuit for a serial link with a half-rate 4X oversampling phase and frequency detector structure without a reference clock is described. The phase detector (PD) and frequency detector (FD) are designed using a 4X oversampling method. The PD, which uses the bang-bang method, finds the phase error by generating four up/down signals, and the FD, which uses the rotational method, finds the frequency error by generating up/down signals from the PD output. The six signals of the PD and the FD control the amount of current that flows through the charge pump. The VCO, composed of four differential buffer stages, generates eight differential clocks. The proposed circuit is designed in a 0.18 um CMOS technology with an operating voltage of 1.8 V. With the 4X oversampling PD and FD technique, a tracking range of 24% at 3.125 Gbps is achieved.


A divide-oversampling and conquer algorithm based support vector machine for massive and highly imbalanced data (불균형의 대용량 범주형 자료에 대한 분할-과대추출 정복 서포트 벡터 머신)

  • Bang, Sungwan;Kim, Jaeoh
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.177-188 / 2022
  • The support vector machine (SVM) has been successfully applied to various classification areas with a high level of classification accuracy. However, it is infeasible to use the SVM to analyze massive data because of its significant computational cost. Furthermore, when analyzing imbalanced data with different class sizes, the classification accuracy of the SVM in the minority class may drop significantly because its classifier can be biased toward the majority class. To overcome these problems, we propose the DOC-SVM method, which uses divide-oversampling and conquer techniques. The proposed DOC-SVM divides the majority class into a few subsets and applies an oversampling technique to the minority class in order to produce balanced subsets. The DOC-SVM then obtains the final classifier by aggregating all SVM classifiers obtained from the balanced subsets. Simulation studies are presented to demonstrate the satisfactory performance of the proposed method.
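The divide, oversample, and aggregate steps described in this abstract can be sketched as follows. This is not the authors' DOC-SVM. To stay self-contained, the sketch swaps the SVM for a nearest-centroid classifier, uses random oversampling with replacement, and aggregates by majority vote. All three choices, the function names, and the toy data are assumptions made for illustration.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def balanced_subsets(majority, minority, n_subsets, rng):
    """Divide the majority class, then oversample the minority class to match each part."""
    shuffled = majority[:]
    rng.shuffle(shuffled)
    subsets = []
    for i in range(n_subsets):
        part = shuffled[i::n_subsets]
        # Random oversampling with replacement (assumed stand-in for the paper's oversampler)
        extra = [rng.choice(minority) for _ in range(len(part) - len(minority))]
        subsets.append((part, minority + extra))
    return subsets

def fit(neg, pos):
    """Stand-in base learner: nearest centroid instead of an SVM."""
    return centroid(neg), centroid(pos)

def predict(model, x):
    c_neg, c_pos = model
    return 1 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0

def ensemble_predict(models, x):
    """Conquer step: aggregate the base learners by majority vote (an assumption)."""
    votes = sum(predict(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0

# Toy imbalanced data: 300 majority points near (0, 0), 10 minority points near (3, 3)
rng = random.Random(0)
majority = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

subsets = balanced_subsets(majority, minority, n_subsets=3, rng=rng)
models = [fit(neg, pos) for neg, pos in subsets]
print(ensemble_predict(models, (3.0, 3.0)), ensemble_predict(models, (0.0, 0.0)))
```

Each base learner sees only one third of the majority class, which is what keeps the per-model training cost small on massive data. The oversampled minority class keeps each subset balanced.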

Comparison of Anomaly Detection Performance Based on GRU Model Applying Various Data Preprocessing Techniques and Data Oversampling (다양한 데이터 전처리 기법과 데이터 오버샘플링을 적용한 GRU 모델 기반 이상 탐지 성능 비교)

  • Yoo, Seung-Tae;Kim, Kangseok
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.201-211 / 2022
  • With the recent shift in the cybersecurity paradigm, research on anomaly detection methods using machine learning and deep learning techniques, which are AI implementation technologies, is increasing. This study compares data preprocessing techniques that can improve the anomaly detection performance of a GRU (Gated Recurrent Unit) neural network-based intrusion detection model on NGIDS-DS (Next Generation IDS Dataset), an open dataset. In addition, to address the class imbalance between normal data and attack data, detection performance was compared and analyzed across oversampling ratios using an oversampling technique based on DCGAN (Deep Convolutional Generative Adversarial Networks). In the experiments, preprocessing the system call feature and the process execution path feature with the Doc2Vec algorithm showed good performance, and oversampling with DCGAN improved detection performance.

Simulated Annealing for Overcoming Data Imbalance in Mold Injection Process (사출성형공정에서 데이터의 불균형 해소를 위한 담금질모사)

  • Dongju Lee
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.233-239 / 2022
  • Injection molding is a process in which thermoplastic resin is heated to a fluid state, injected under pressure into the cavity of a mold, and then cooled in the mold to produce a product identical in shape to the cavity. It enables mass production and complex shapes, and various factors such as resin temperature, mold temperature, injection speed, and pressure affect product quality. In data collected at the manufacturing site, there is a lot of data on good products but little on defective products, resulting in serious data imbalance. To address this imbalance efficiently, undersampling, oversampling, and composite sampling are usually applied. In this study, oversampling techniques that enlarge the minority class relative to the majority class, such as random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN), among others, are applied, together with composite sampling that uses both undersampling and oversampling. For composite sampling, SMOTE+ENN and SMOTE+Tomek were used. Artificial neural network techniques are used to predict product quality. In particular, MLP and RNN are applied, and both require optimization of various parameters. In this study, we propose a simulated annealing (SA) technique that optimizes the choice of sampling method, the minority-class ratio for the sampling method, and, as MLP and RNN parameters, the batch size and the number of hidden layer units. The existing sampling methods and the proposed SA method were compared using accuracy, precision, recall, and F1 score to demonstrate the superiority of the proposed method.
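A minimal sketch of simulated annealing over the four settings listed in this abstract (sampling method, minority ratio, batch size, hidden units) might look like the following. The search-space values, the cooling schedule, and especially `toy_score` are hypothetical. `toy_score` is a made-up stand-in for "resample the data, train the network, return a validation score", and it does not reflect the paper's results.

```python
import math
import random

# Hypothetical search space; the paper's actual candidate values are not given here.
SEARCH_SPACE = {
    "sampler": ["ROS", "SMOTE", "ADASYN", "SMOTE+ENN", "SMOTE+Tomek"],
    "minority_ratio": [0.25, 0.5, 0.75, 1.0],
    "batch_size": [16, 32, 64, 128],
    "hidden_units": [8, 16, 32, 64],
}

def toy_score(cfg):
    """Made-up objective standing in for a real train-and-validate run."""
    score = 0.5
    score += 0.1 if cfg["sampler"] == "SMOTE+ENN" else 0.0
    score -= 0.2 * abs(cfg["minority_ratio"] - 0.75)
    score -= 0.001 * abs(cfg["batch_size"] - 32)
    score -= 0.002 * abs(cfg["hidden_units"] - 32)
    return score

def neighbour(cfg, rng):
    """Change one randomly chosen setting to a different allowed value."""
    key = rng.choice(list(SEARCH_SPACE))
    new_cfg = dict(cfg)
    new_cfg[key] = rng.choice([v for v in SEARCH_SPACE[key] if v != cfg[key]])
    return new_cfg

def anneal(start, score_fn, steps=2000, t0=0.1, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current, current_score = start, score_fn(start)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        candidate_score = score_fn(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T)
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling  # geometric cooling
    return best, best_score

start = {key: values[0] for key, values in SEARCH_SPACE.items()}
best_cfg, best_score = anneal(start, toy_score)
print(best_cfg, round(best_score, 3))
```

Treating the sampler choice as one more discrete setting lets a single search tune the resampling and the network together, instead of tuning the network separately for each sampler.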

Centroid and Nearest Neighbor based Class Imbalance Reduction with Relevant Feature Selection using Ant Colony Optimization for Software Defect Prediction

  • B., Kiran Kumar;Gyani, Jayadev;Y., Bhavani;P., Ganesh Reddy;T, Nagasai Anjani Kumar
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.1-10 / 2022
  • Software defect prediction (SDP) is currently among the most active research topics in software engineering. Early detection of defects lowers the cost of the software and also improves reliability. Machine learning techniques are widely used to create SDP models based on programming measures. The majority of defect prediction models in the literature have problems with class imbalance and high dimensionality. In this paper, we propose a Centroid and Nearest Neighbor based Class Imbalance Reduction (CNNCIR) technique that considers dataset distribution characteristics to generate symmetry between defective and non-defective records in imbalanced datasets. The proposed approach is compared with SMOTE (Synthetic Minority Oversampling Technique). The high-dimensionality problem is addressed by using the Ant Colony Optimization (ACO) technique to choose relevant features. We used nine different classifiers to analyze six open-source software defect datasets from the PROMISE repository, and seven performance measures are used to evaluate them. The results show that the proposed CNNCIR method with ACO-based feature selection outperforms SMOTE in the majority of cases.

A Design of 1V Delta-Sigma Modulator (델타-시그마 변조기의 1V 설계)

  • 김정민;임신일;최종찬
    • Proceedings of the IEEK Conference / 2002.06e / pp.87-90 / 2002
  • This paper describes a design technique for a switched-capacitor 1 V delta-sigma modulator. To solve incomplete switching operation at low voltage, a bootstrapping technique is used. For the PMOS input pair of the 1 V operational amplifier, a simple common-mode level-down technique is used. The designed 2nd-order single-loop modulator has an oversampling ratio of 64 and obtains a peak SNR of 71 dB and a dynamic range of 73 dB, with a power consumption of 350 uW at a 1 V power supply.
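The behaviour of a 2nd-order single-loop modulator can be illustrated with a small discrete-time model. The oversampling ratio of 64 comes from the abstract. The 1-bit quantizer, the two delaying integrators with gains of 0.5, and the DC test input are textbook-style assumptions, not details of the circuit in the paper.

```python
def second_order_modulator(samples):
    """Behavioural model of a 1-bit, second-order, single-loop delta-sigma modulator.

    Assumed topology: two delaying integrators, each with gain 0.5, and
    feedback of the quantizer output to both integrator inputs.
    """
    x1 = x2 = 0.0  # integrator states
    bits = []
    for u in samples:
        v = 1.0 if x2 >= 0.0 else -1.0  # 1-bit quantizer
        bits.append(v)
        x2 += 0.5 * (x1 - v)  # second integrator sees the previous x1
        x1 += 0.5 * (u - v)   # first integrator accumulates the input error
    return bits

OSR = 64         # oversampling ratio quoted in the abstract
dc_input = 0.3   # test input as a fraction of full scale (assumed)
bits = second_order_modulator([dc_input] * (OSR * 64))

# Averaging the bitstream is the crudest possible decimation filter.
estimate = sum(bits) / len(bits)
print(estimate)
```

The output is a stream of +1 and -1 values whose average tracks the input. The loop pushes quantization noise toward high frequencies, and a real decimation filter would remove it more effectively than this plain average.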


Prediction of CDOM absorption coefficient using Oversampling technique and Machine Learning in upstream reach of Baekje weir (백제보 상류하천구간의 Oversampling technique과 Machine Learning을 활용한 CDOM 흡수계수 예측)

  • Kim, Jinuk;Jang, Wonjin;Kim, Jinhwi;Park, Yongeun;Kim, Seongjoon
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.46-46 / 2022
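  • CDOM (Colored or Chromophoric Dissolved Organic Matter), a complex mixture of organic matter, is closely related to BOD (Biological Oxygen Demand), COD (Chemical Oxygen Demand), and organic pollutants in rivers. CDOM absorbs light in the visible range, and recent studies have sought to monitor it with remote sensing. This study aims to develop a machine-learning-based CDOM estimation algorithm using hyperspectral images acquired on 13 days over three years (2016-2018) along a 23 km reach upstream of Baekje weir. The hyperspectral images were collected with an airborne AsiaFENIX hyperspectral sensor with a spectral resolution of 127 bands at 4 nm intervals over 400-970 nm and a spatial resolution of 2 m. CDOM was obtained from samples filtered through a Millipore polycarbonate filter (Φ47, 0.2 µm) and extracted as absorption coefficient spectra over 200-800 nm. CDOM values ranged from 2.0 to 11.0 m⁻¹ over the whole period, and the data are imbalanced: only 21 of the 153 samples fall in the high-concentration range of 5 m⁻¹ or more. Therefore, synthetic data generated by the ADASYN (Adaptive Synthetic Sampling Approach) oversampling method were used to resolve the minority-class imbalance in the original data and to improve model prediction performance. Using the generated synthetic data as input, a CDOM prediction algorithm based on an ANN (Artificial Neural Network) was built. Synthetic data from ADASYN can resolve the imbalance in the observed data and improve the CDOM detection performance of machine learning models, and are expected to be useful in supporting designs for managing organic pollutants in reservoirs.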
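What sets ADASYN apart from plain SMOTE is how it decides where to place synthetic samples: minority points with more majority-class neighbours receive a larger share. The sketch below covers only that allocation step. It assumes Euclidean distance, and the function name `adasyn_allocation` and the toy 2-D data are hypothetical and unrelated to the study's CDOM dataset.

```python
import math

def adasyn_allocation(minority, majority, n_synthetic, k=3):
    """Return how many synthetic samples to generate around each minority point."""
    labelled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    ratios = []
    for p in minority:
        others = [(q, label) for q, label in labelled if q is not p]
        nearest = sorted(others, key=lambda item: math.dist(p, item[0]))[:k]
        # Share of majority-class points among the k nearest neighbours
        ratios.append(sum(1 for _, label in nearest if label == 0) / k)
    total = sum(ratios)
    if total == 0:
        return [0] * len(minority)  # no minority point borders the majority class
    # Normalise the ratios into a distribution and scale to the requested count
    return [round(r / total * n_synthetic) for r in ratios]

# Toy data: the first minority point is surrounded by majority points,
# the other three form a tight cluster with one majority point nearby.
minority = [(0.0, 0.0), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
majority = [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (5.2, 5.2)]
allocation = adasyn_allocation(minority, majority, n_synthetic=12)
print(allocation)
```

The isolated minority point, which is hardest to learn, receives the largest share. The synthetic samples themselves would then be generated by SMOTE-style interpolation between each minority point and its minority-class neighbours.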


A 12-Bit 2nd-order Noise-Shaping D/A Converter (12-Bit 2차 Noise-Shaping D/A 변환기)

  • 김대정;김성준;박재진;정덕균;김원찬
    • Journal of the Korean Institute of Telematics and Electronics A / v.30A no.12 / pp.98-107 / 1993
  • This paper describes the design of a multi-bit oversampling noise-shaping D/A converter that achieves a resolution of 12 bits using an oversampling technique. In this architecture, the essential block that determines the overall accuracy is the analog internal D/A converter, and the designed charge-integration internal D/A converter adopts a differential structure to minimize the loss of resolution due to process variation. Because the proposed circuit is driven by signal clocks that carry the information on the data variation from the noise-shaping coder, it minimizes the disadvantage of a charge-integration circuit in the time axis. To verify the circuit, it was integrated with an active area of $950 \times 650\ {\mu}m^{2}$ in a double-metal 1.5-$\mu$m CMOS process, and measurement with a spectrum analyzer confirmed that it achieves an S/N ratio of 75 dB and an S/(N+D) ratio of 60 dB for a signal bandwidth of 9.6 kHz.
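The role of a noise-shaping coder ahead of a coarse internal D/A converter can be sketched with a second-order error-feedback requantizer. The abstract does not give the coder's structure, so this is an assumed textbook form. The quantizer step of 1, the input expressed in units of the coarse step, and the name `noise_shaping_coder` are all illustrative.

```python
def noise_shaping_coder(samples):
    """Requantize to integer levels with second-order error feedback.

    Output: y[n] = x[n] + e[n] - 2*e[n-1] + e[n-2], so the quantization
    error e is shaped by (1 - z^-1)^2 and pushed toward high frequencies.
    """
    e1 = e2 = 0.0  # quantization errors from one and two samples ago
    out = []
    for x in samples:
        w = x - 2.0 * e1 + e2   # input corrected by the shaped past errors
        y = round(w)            # coarse multi-level quantizer (step = 1)
        e2, e1 = e1, y - w      # store e[n] = y[n] - w[n]
        out.append(y)
    return out

# A constant input that falls between two coarse levels (assumed test signal)
codes = noise_shaping_coder([2.3] * 1000)
mean_code = sum(codes) / len(codes)
print(sorted(set(codes)), mean_code)
```

The coder emits only a few integer levels, yet their long-run average matches the fine-resolution input. This is how an oversampled converter reaches 12-bit resolution with a much coarser internal D/A converter.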


Sigma-Delta A/D Converter for ADSL Modems (ADSL 모뎀용 시그마-델타 아날로그/디지털 변환기)

  • Han, Seung-Yub;Yu, Sang-Dae;Lee, Ju-Sang
    • Proceedings of the KIEE Conference / 2003.11c / pp.950-953 / 2003
  • In this paper, a sigma-delta A/D converter for ADSL modems using an oversampling technique is designed. Conventionally, an oversampling A/D converter consists of opamps, switched capacitors, quantizers, internal D/A converters, and decimation filters. A 3-bit flash A/D converter, 3-bit thermometer-based D/A converters, and sub-blocks are used for high-speed operation. The HSPICE simulator and CADENCE tools are used for verification and layout of the designed modulator. The internal A/D converter and D/A converters operate at 130 MHz. For the decimation filter, Matlab is used to calculate coefficients, and ModelSim and VHDL are used for the design.
