Search | Korea Science

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

Hana Babiker, Nassar
- International Journal of Computer Science & Network Security / v.23 no.1 / pp.89-95 / 2023
Analyzing breast cancer patient files is becoming an active area of medical information analysis, especially as the number of patient files grows. In this paper, breast cancer data are collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data are imbalanced, meaning that one of the two classes has more samples than the other. Several pre-processing techniques (resampling, attribute selection, and handling of missing values) are applied to this imbalanced data, and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used, and in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random Subspace) are used. Finally, the ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), which reached the highest accuracy among all the algorithms, 95.2797%, followed by Bagging with J48 (90.559%) and Random Subspace with J48 (84.2657%). The imbalanced breast cancer dataset was thus classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.
https://doi.org/10.22937/IJCSNS.2023.23.1.12
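
The abstract above does not say which resampling method was applied, so the following is only a minimal sketch of one common option, random oversampling of the minority class. The function name `random_oversample` and the toy rows and labels are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (assumed method)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical toy data: 8 "no-recurrence" rows and 2 "recurrence" rows.
rows = [[i] for i in range(10)]
labels = ["no-recurrence"] * 8 + ["recurrence"] * 2
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.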

Object Classification Method Using Dynamic Random Forests and Genetic Optimization

Kim, Jae Hyup;Kim, Hun Ki;Jang, Kyung Hyun;Lee, Jong Min;Moon, Young Shik
- Journal of the Korea Society of Computer and Information / v.21 no.5 / pp.79-89 / 2016
In this paper, we propose an object classification method using a genetic algorithm and a dynamic random forest consisting of an optimal combination of unit trees. A random forest is an ensemble classification model that combines unit decision trees based on bagging; by randomizing the training samples and the feature selection allocated to each decision tree, it can ensure good generalization performance when a large number of trees are combined. However, because the random forest is composed of randomly built unit trees, it shows excellent classification performance only when a sufficient number of trees are combined. There is no quantitative method for determining the number of trees, so there is no choice but to keep generating random tree structures repeatedly. The proposed algorithm composes a random forest from an optimal combination of trees while maintaining the generalization performance of the random forest. To achieve this, the problem of improving classification performance is cast as an optimization problem of finding the optimal tree combination, and a genetic algorithm is applied to solve it. Experiments show that the proposed algorithm improves classification performance by about 3~5% over the existing random forest in specific cases, such as a common database and an in-house infrared database. In addition, we show that the optimal tree combination is reached at about 55~60% of the maximum number of trees.
https://doi.org/10.9708/jksci.2016.21.5.079
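
The idea of searching for a good tree combination with a genetic algorithm can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the bitmask encoding, the majority-vote accuracy fitness, the one-point crossover, the mutation rate, and the simulated tree predictions are all hypothetical choices, since the abstract gives none of these details.

```python
import random

def vote_accuracy(mask, preds, truth):
    """Accuracy of the majority vote over trees where mask[i] == 1.
    preds[i][j] is tree i's 0/1 prediction on validation sample j."""
    chosen = [p for p, keep in zip(preds, mask) if keep]
    if not chosen:
        return 0.0
    correct = 0
    for j, y in enumerate(truth):
        ones = sum(p[j] for p in chosen)
        vote = 1 if 2 * ones > len(chosen) else 0  # ties go to class 0
        correct += (vote == y)
    return correct / len(truth)

def ga_select(preds, truth, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Search for a tree subset (bitmask) with high vote accuracy."""
    rng = random.Random(seed)
    n = len(preds)

    def fit(mask):
        return vote_accuracy(mask, preds, truth)

    # Start from the full forest plus random subsets.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        nxt = pop[:2]  # elitism: the two best masks survive unchanged
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from top half
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fit)

# Hypothetical validation data: each simulated "tree" flips the true
# label with its own error rate.
data_rng = random.Random(1)
truth = [data_rng.randint(0, 1) for _ in range(40)]
preds = [[y ^ (data_rng.random() < err) for y in truth]
         for err in (0.1, 0.2, 0.3, 0.4, 0.45, 0.45, 0.5, 0.5)]

best = ga_select(preds, truth)
full = [1] * len(preds)
print(sum(best), "of", len(preds), "trees kept")
print("subset accuracy:", vote_accuracy(best, preds, truth))
print("full-forest accuracy:", vote_accuracy(full, preds, truth))
```

Because the full forest is seeded into the initial population and the best masks are carried over unchanged, the selected subset in this sketch can never score below the full forest on the validation data it was tuned on; whether that gain carries over to unseen data is a separate question.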

An Ensemble Classifier using Two Dimensional LDA

Park, Cheong-Hee
- Journal of Korea Multimedia Society / v.13 no.6 / pp.817-824 / 2010
Linear Discriminant Analysis (LDA) has been successfully applied for dimension reduction in face recognition. However, LDA requires transforming a face image into a one-dimensional vector, and this process can cause the correlation information among neighboring pixels to be disregarded. In contrast, 2D-LDA uses 2D images directly without a transformation process, and it has been shown to be superior to traditional LDA. Nevertheless, 2D-LDA has some problems. First, it is difficult to determine the optimal number of feature vectors in the reduced-dimensional space. Second, the size of the rectangular windows used in 2D-LDA strongly affects classification accuracy, but there is no reliable way to determine an optimal window size. In this paper, we propose a new algorithm to overcome these problems in 2D-LDA. We adopt an ensemble approach that combines several classifiers obtained with various window sizes. A practical method for determining the number of feature vectors is also presented. Experimental results demonstrate that the proposed method can overcome the difficulties of choosing an optimal window size and number of feature vectors.

Ensemble variable selection using genetic algorithm

Seogyoung, Lee;Martin Seunghwan, Yang;Jongkyeong, Kang;Seung Jun, Shin
- Communications for Statistical Applications and Methods / v.29 no.6 / pp.629-640 / 2022
Variable selection is one of the most crucial tasks in supervised learning, such as regression and classification. The best subset selection is straightforward and optimal but not practically applicable unless the number of predictors is small. In this article, we propose directly solving the best subset selection via the genetic algorithm (GA), a popular stochastic optimization algorithm based on the principle of Darwinian evolution. To further improve the variable selection performance, we propose running multiple GAs to solve the best subset selection and then synthesizing the results, which we call ensemble GA (EGA). The EGA significantly improves variable selection performance. In addition, the proposed method is essentially the best subset selection and is hence applicable to a variety of models with different selection criteria. We compare the proposed EGA to existing variable selection methods under various models, including linear regression, Poisson regression, and Cox regression for survival data. Both simulation and real data analysis demonstrate the promising performance of the proposed method.
https://doi.org/10.29220/CSAM.2022.29.6.629

Study on the Functional Architecture and Improvement Accuracy for Auto Target Classification on the SAR Image by using CNN Ensemble Model based on the Radar System for the Fighter (전투기용 레이다 기반 SAR 영상 자동표적분류 기능 구조 및 CNN 앙상블 모델을 이용한 표적분류 정확도 향상 방안 연구)

Lim, Dong Ju;Song, Se Ri;Park, Peom
- Journal of the Korean Society of Systems Engineering / v.16 no.1 / pp.51-57 / 2020
A fighter pilot uses the radar mounted on the fighter to obtain high-resolution SAR (Synthetic Aperture Radar) images of a specific area at a distance, and then visually classifies targets within the image. However, the target shape captured in the SAR image is relatively small, and its distortion varies with the depression angle, making it difficult for the pilot to classify the type of target. In addition, the presence of various types of clutter is likely to cause errors in target classification, and the pilot's performance is likely to degrade further when tasks such as navigation and situational awareness are carried out simultaneously. In this paper, the concept of operation and the functional structure of a radar system for fighter jets are presented in order to transfer the SAR image target classification task from the fighter pilot to the radar system, and a target classification method using a CNN ensemble model is studied to achieve higher classification accuracy than a single CNN model.
https://doi.org/10.14248/JKOSSE.2020.16.1.051

Extreme Learning Machine Ensemble Using Bagging for Facial Expression Recognition

Ghimire, Deepak;Lee, Joonwhoan
- Journal of Information Processing Systems / v.10 no.3 / pp.443-458 / 2014
An extreme learning machine (ELM) is a recently proposed learning algorithm for a single-layer feedforward neural network. In this paper, we study an ensemble of ELMs built with a bagging algorithm for facial expression recognition (FER). Facial expression analysis is widely used in the behavioral interpretation of emotions, in cognitive science, and in social interactions. This paper presents a method for FER based on histogram of oriented gradients (HOG) features using an ELM ensemble. First, the HOG features are extracted from the face image by dividing it into a number of small cells. A bagging algorithm is then used to construct many different bags of training data, and a separate ELM is trained on each of them. To recognize the expression of an input face image, its HOG features are fed to each trained ELM and the results are combined using a majority voting scheme. The ELM ensemble using bagging improves the generalization capability of the network significantly. Two available facial expression datasets (JAFFE and CK+) were used to evaluate the performance of the proposed classification system. Although an individual ELM performed worse, the ELM ensemble using a bagging algorithm improved the recognition performance significantly.
https://doi.org/10.3745/JIPS.02.0004
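
The bagging and majority-voting steps described in the abstract above can be sketched in a few lines. This is only an illustration: the base learner here is a simple one-dimensional threshold classifier standing in for an ELM, and the function names, the number of models, and the toy data are hypothetical. No HOG feature extraction is shown.

```python
import random
from collections import Counter

def train_stump(xs, ys):
    """Stand-in base learner (NOT an ELM): a 1-D threshold classifier
    that picks the threshold and orientation with fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((lo if x < t else hi) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x < t else hi

def bagging_fit(xs, ys, n_models=15, seed=0):
    """Train one base learner per bootstrap sample (a 'bag')."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: one scalar feature, two classes.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(xs, ys)
low = bagging_predict(models, 0.05)
high = bagging_predict(models, 0.95)
print(low, high)
```

An odd number of models is used so that a two-class vote cannot tie; with more than two classes, as in facial expression recognition, a tie-breaking rule would also be needed.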

On the Classification by an Improved Pairwise Coupling Algorithm (향상된 PAIRWISE COUPLING 알고리즘에 의한 자료의 분류)

최대우;윤중식
- The Korean Journal of Applied Statistics / v.13 no.2 / pp.415-425 / 2000
We propose a new classification algorithm based on bootstrap sampling and the pairwise coupling method. To compare the accuracy of the proposed algorithm with that of existing methods, we carried out classification experiments on waveform data and other datasets.

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

Kim, So Hyeon;Kim, Han Joon
- KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.279-284 / 2017
Currently, in the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. In this paper, we therefore propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model, using logistic regression and an ensemble, that can decide whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of the documents that have tags containing the main body text.
https://doi.org/10.3745/KTSDE.2017.6.5.279

Pattern Selection Using the Bias and Variance of Ensemble (앙상블의 편기와 분산을 이용한 패턴 선택)

Shin, Hyunjung;Cho, Sungzoon
- Journal of Korean Institute of Industrial Engineers / v.28 no.1 / pp.112-127 / 2002
A useful pattern is a pattern that contributes much to learning. For a classification problem those patterns near the class boundary surfaces carry more information to the classifier. For a regression problem the ones near the estimated surface carry more information. In both cases, the usefulness is defined only for those patterns either without error or with negligible error. Using only the useful patterns gives several benefits. First, computational complexity in memory and time for learning is decreased. Second, overfitting is avoided even when the learner is over-sized. Third, learning results in more stable learners. In this paper, we propose a pattern 'utility index' that measures the utility of an individual pattern. The utility index is based on the bias and variance of a pattern trained by a network ensemble. In classification, the pattern with a low bias and a high variance gets a high score. In regression, on the other hand, the one with a low bias and a low variance gets a high score. Based on the distribution of the utility index, the original training set is divided into a high-score group and a low-score group. Only the high-score group is then used for training. The proposed method is tested on synthetic and real-world benchmark datasets. The proposed approach gives a better or at least similar performance.

Malwares Attack Detection Using Ensemble Deep Restricted Boltzmann Machine

K. Janani;R. Gunasundari
- International Journal of Computer Science & Network Security / v.24 no.5 / pp.64-72 / 2024
In recent times, cyber attackers have been able to use Artificial Intelligence (AI) to boost the sophistication and scope of attacks. On the defense side, AI is used to enhance defense plans and to boost the robustness, flexibility, and efficiency of defense systems, which means adapting to environmental changes to reduce impacts. With increased developments in the field of information and communication technologies, various exploits pose a danger to cyber security, and these exploits are changing rapidly. Cyber criminals use new, sophisticated tactics to boost the speed and size of their attacks. Consequently, there is a need for more flexible, adaptable, and strong cyber defense systems that can identify a wide range of threats in real time. In recent years, the adoption of AI approaches has increased and has come to play a vital role in the detection and prevention of cyber threats. In this paper, an Ensemble Deep Restricted Boltzmann Machine (EDRBM) is developed for the classification of cybersecurity threats in large-scale network environments. The EDRBM acts as a classification model that classifies malicious flowsets in the large-scale network. A simulation is conducted to test the efficacy of the proposed EDRBM under various malware attacks. The simulation results show that the proposed method achieves a higher classification rate than other methods in classifying the malware in the flowsets, i.e., the malicious flowsets.
https://doi.org/10.22937/IJCSNS.2024.24.5.7
