Search | Korea Science

Ensemble Gene Selection Method Based on Multiple Tree Models

Mingzhu Lou
- Journal of Information Processing Systems, v.19 no.5, pp.652-662, 2023
Identifying highly discriminating genes is a critical step in tumor recognition tasks based on microarray gene expression profile data and machine learning. Gene selection based on tree models has been the subject of several studies. However, these methods rely on a single tree model and are often not robust to ultra-high-dimensional microarray datasets, resulting in the loss of useful information and unsatisfactory classification accuracy. Motivated by the limitations of single-tree-based gene selection, this study investigates ensemble gene selection methods based on multiple tree models to improve the classification performance of tumor identification. Specifically, we selected the three most representative tree models: ID3, random forest, and gradient boosting decision tree. Each tree model selects the top-n genes from the microarray dataset based on its intrinsic mechanism. Subsequently, three ensemble gene selection methods were investigated, namely multiple-tree model intersection, multiple-tree module union, and multiple-tree module cross-union. Experimental results on five benchmark public microarray gene expression datasets showed that the multiple-tree module union is significantly superior to gene selection based on a single tree model and to other competitive gene selection methods in classification accuracy.
https://doi.org/10.3745/JIPS.04.0290
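The intersection and union ensembles described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `mode` parameter, and the importance scores are all hypothetical, and the cross-union variant is left out because the abstract does not define it.

```python
from typing import Dict, List, Set


def top_n(importances: Dict[str, float], n: int) -> Set[str]:
    """Return the n genes with the highest importance score (ties broken by name)."""
    ranked = sorted(importances, key=lambda gene: (-importances[gene], gene))
    return set(ranked[:n])


def ensemble_select(per_model: List[Dict[str, float]], n: int, mode: str) -> Set[str]:
    """Combine the top-n gene subsets chosen by several tree models.

    mode="intersection" keeps genes that every model selected;
    mode="union" keeps genes that at least one model selected.
    """
    subsets = [top_n(scores, n) for scores in per_model]
    if mode == "intersection":
        return set.intersection(*subsets)
    if mode == "union":
        return set.union(*subsets)
    raise ValueError(f"unknown mode: {mode}")


# Made-up importance scores standing in for ID3, random forest, and GBDT.
id3 = {"g1": 0.9, "g2": 0.5, "g3": 0.1, "g4": 0.0}
rf = {"g1": 0.7, "g2": 0.1, "g3": 0.6, "g4": 0.2}
gbdt = {"g1": 0.8, "g2": 0.3, "g3": 0.2, "g4": 0.4}

print(sorted(ensemble_select([id3, rf, gbdt], n=2, mode="intersection")))
print(sorted(ensemble_select([id3, rf, gbdt], n=2, mode="union")))
```

With these toy scores the intersection keeps only the gene all three models agree on, while the union keeps every gene that any model ranks in its top two, which is why a union can retain information that a single model discards.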

Multiple SVM Classifier for Pattern Classification in Data Mining (데이터 마이닝에서 패턴 분류를 위한 다중 SVM 분류기)

Kim Man-Sun;Lee Sang-Yong
- Journal of the Korean Institute of Intelligent Systems, v.15 no.3, pp.289-293, 2005
Pattern classification extracts various types of pattern information that express objects in the real world and decides their class. The top priority of pattern classification technologies is to improve classification performance, and to this end many studies have tried various approaches over the last 40 years. Classification methods used in pattern classification include Bayes classifiers based on the probabilistic inference of patterns, decision trees, methods based on distance functions, neural networks, and clustering, but they are not efficient at analyzing large amounts of multi-dimensional data. Thus, there is active research on multiple classifier systems, which improve classification performance by combining a number of mutually complementary classifiers. The present study identifies problems in previous research on multiple SVM classifiers and proposes BORSE, a model that, based on a 1:M policy for extending SVM to a multi-class classifier, regards each SVM output as a signal with a non-linear pattern, trains a neural network on that pattern, and combines the final classification results.
https://doi.org/10.5391/JKIIS.2005.15.3.289

Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

Choi, Sung-Pil;Myaeng, Sung-Hyon;Cho, Hyun-Yang
- KSII Transactions on Internet and Information Systems (TIIS), v.3 no.3, pp.285-307, 2009
This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each component of the system is clearly modularized and encapsulated for extensibility. The clear modular architecture allows for simple and continuous verification and facilitates changes in multiple cycles, even after its major development period is complete. Those who want to make use of DICE can easily implement their ideas on this test bed and optimize it for a particular domain by simply adjusting the configuration file. Unlike other publicly available toolkits or development environments targeted at general-purpose classification models, DICE specializes in text classification and offers a number of useful functions specific to it. This paper focuses on ways to locate the optimal states of a practical text classification framework by using the adaptation methods provided by the system, such as feature selection, lemmatization, and classification models.
https://doi.org/10.3837/tiis.2009.03.005

Parallel Multiple Hashing for Packet Classification

Jung, Yeo-Jin;Kim, Hye-Ran;Lim, Hye-Sook
- Proceedings of the IEEK Conference, 2004.06a, pp.171-174, 2004
Packet classification is an essential architectural component in implementing quality of service (QoS) in today's Internet, which provides a best-effort service to all of its applications. In packet classification, multiple header fields of incoming packets are compared against a set of rules, the highest-priority rule among the matched rules is selected, and the packet is treated according to the action of that rule. In this paper, we propose a new packet classification scheme based on parallel multiple hashing on tuple spaces. Simulation results using real classifiers show that the proposed scheme performs very well in terms of the required number of memory accesses and memory size compared with previous work.
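The general idea of hashing on tuple spaces can be sketched as below. This is a simplified two-field illustration under stated assumptions, not the scheme proposed in the paper: the class name, the rule layout, and the convention that a larger number means higher priority are all hypothetical, and the per-tuple tables are probed sequentially here, whereas a hardware design could probe them in parallel.

```python
from typing import Dict, Optional, Tuple

Rule = Tuple[int, str]  # (priority, action); larger priority wins (assumed convention)


def ip(a: int, b: int, c: int, d: int) -> int:
    """Pack four octets into a 32-bit integer address."""
    return (a << 24) | (b << 16) | (c << 8) | d


def prefix(addr: int, length: int) -> int:
    """Keep only the top `length` bits of a 32-bit address."""
    return addr >> (32 - length) if length else 0


class TupleSpaceClassifier:
    """One hash table per (source prefix length, destination prefix length) tuple."""

    def __init__(self) -> None:
        self.tables: Dict[Tuple[int, int], Dict[Tuple[int, int], Rule]] = {}

    def add_rule(self, src: int, src_len: int, dst: int, dst_len: int,
                 priority: int, action: str) -> None:
        table = self.tables.setdefault((src_len, dst_len), {})
        key = (prefix(src, src_len), prefix(dst, dst_len))
        current = table.get(key)
        # If two rules share a key, keep the higher-priority one.
        if current is None or priority > current[0]:
            table[key] = (priority, action)

    def classify(self, src: int, dst: int) -> Optional[str]:
        """Probe every tuple's table once and return the best matching action."""
        best: Optional[Rule] = None
        for (src_len, dst_len), table in self.tables.items():
            hit = table.get((prefix(src, src_len), prefix(dst, dst_len)))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return best[1] if best is not None else None


clf = TupleSpaceClassifier()
clf.add_rule(ip(10, 0, 0, 0), 8, 0, 0, priority=1, action="allow")
clf.add_rule(ip(10, 1, 0, 0), 16, ip(192, 168, 0, 0), 16, priority=5, action="deny")
print(clf.classify(ip(10, 1, 2, 3), ip(192, 168, 1, 1)))
```

Each lookup costs one hash probe per distinct tuple rather than one comparison per rule, which is the property that makes the number of memory accesses the natural performance metric.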

A Geostatistical Study Using Qualitative Information for Multiple Rock Classification II. Application (다분적 암반분류를 위한 정성적 자료의 지구통계학적 연구- II. 응용)

유광호
- Geotechnical Engineering, v.14 no.1, pp.29-36, 1998
The application of a multiple rock classification method, which is a generalization of binary rock classification, is studied in this paper. In particular, this paper shows how to incorporate qualitative data through a case study. The method suggested in this paper can be used effectively for a systematic multiple rock classification such as the RMR system developed by Bieniawski. It will be very useful for rock classifications. In addition, it is known that the expected cost of errors can be adopted to indicate how well an investigation plan is made.

Automatic Text Categorization Using Hybrid Multiple Model Schemes (하이브리드 다중모델 학습기법을 이용한 자동 문서 분류)

명순희;김인철
- Journal of the Korean Society for Information Management, v.19 no.4, pp.35-51, 2002
Inductive learning and classification techniques have been employed in various research and applications that organize textual data to solve the problem of information access. In this study, we develop hybrid model combination methods that incorporate the concepts and techniques of multiple modeling algorithms to improve the accuracy of text classification, and conduct experiments to evaluate the performance of the proposed schemes. Boosted stacking, one of the extended stacking schemes proposed in this study, yields higher accuracy than the conventional model combination methods and single classifiers.
https://doi.org/10.3743/KOSIM.2002.19.4.035

Alternative accuracy for multiple ROC analysis

Hong, Chong Sun;Wu, Zhi Qiang
- Journal of the Korean Data and Information Science Society, v.25 no.6, pp.1521-1530, 2014
ROC analysis is considered for multiple-class diagnosis. There exist many criteria for finding optimal thresholds and measuring the accuracy of diagnostic tests in k-dimensional ROC analysis. In this paper, we propose a diagnostic accuracy measure called the correct classification simple rate, which is defined as the summation of the true rates for each classification distribution and is expressed as a function of the summation of sequential true rates for two consecutive distributions. This measure does not weight accuracy across categories by category prevalence and is comparable across populations for multiple-class diagnosis. It is found that this accuracy measure not only is related to the Kolmogorov-Smirnov statistic but can also be represented as a linear function of some optimal threshold criteria. Given these facts, the suggested measure could be applied to tests comparing multiple distributions.
https://doi.org/10.7465/jkdi.2014.25.6.1521
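One plausible reading of "the summation of true rates for each classification distribution" is sketched below. This is an assumption-laden illustration rather than the paper's exact definition: it assumes k ordered classes labeled 0 to k-1, a single diagnostic score per subject, and k-1 increasing thresholds that cut the score axis into k intervals. The function name and the data are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def correct_classification_simple_rate(scores: Sequence[float],
                                       labels: Sequence[int],
                                       thresholds: Sequence[float],
                                       k: int) -> float:
    """Sum of per-class true rates for k ordered classes.

    A score s is assigned to class i when thresholds[i-1] <= s < thresholds[i].
    Each class contributes its own true rate, so the sum is not weighted by
    class prevalence and ranges from 0 to k.
    """
    hits = [0] * k
    totals = [0] * k
    for score, label in zip(scores, labels):
        predicted = bisect_right(thresholds, score)  # number of thresholds <= score
        totals[label] += 1
        if predicted == label:
            hits[label] += 1
    return sum(h / t for h, t in zip(hits, totals) if t > 0)


# Made-up scores for three classes with thresholds at 1.0 and 2.0.
scores = [0.2, 0.8, 1.1, 1.5, 0.9, 2.5, 2.2, 1.9]
labels = [0, 0, 0, 1, 1, 2, 2, 2]
rate = correct_classification_simple_rate(scores, labels, [1.0, 2.0], k=3)
print(round(rate, 4))
```

In the toy data the per-class true rates are 2/3, 1/2, and 2/3, so the measure is 11/6; because each class contributes equally, changing the class proportions alone does not change the value.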

Image classification methods applicable multiple satellite imagery

Jeong, Jae-Jun;Kim, Kyung-Ok;Lee, Jong-Hun
- Proceedings of the KSRS Conference, 2002.10a, pp.81-81, 2002
Classification is considered one of the processes for extracting attributes from satellite imagery and is a standard function in commercial satellite image processing software. The accuracy of classification plays a key role in deciding how its results can be used. Considerable effort has gone into achieving higher accuracy in areas such as training area selection and classification algorithms. Our research is one such effort, taken in a different manner. In this research, we conduct classification using multiple satellite image data and an evidential approach. We statistically consider the posterior probabilities and certainty in maximum likelihood classification, and methodologically use Dempster's orthogonal sums. Accuracy for the whole dataset has not been assessed yet, but accuracy assessments in training fields and check fields show an improvement of over 10% in overall accuracy and over 0.1 in the kappa index.
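Dempster's orthogonal sum, the combination rule named in this abstract, can be sketched as follows. The mass values and class names are made up for illustration and are not taken from the paper, which does not say how its masses were derived from the posterior probabilities.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]  # maps a set of candidate classes to its belief mass


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's orthogonal sum."""
    combined: Mass = {}
    conflict = 0.0
    for a, weight_a in m1.items():
        for b, weight_b in m2.items():
            common = a & b
            if common:
                combined[common] = combined.get(common, 0.0) + weight_a * weight_b
            else:
                # Mass assigned to contradictory hypotheses is discarded.
                conflict += weight_a * weight_b
    if 1.0 - conflict <= 1e-12:
        raise ValueError("sources are in total conflict")
    # Renormalize so the remaining masses sum to 1.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}


WATER = frozenset({"water"})
FOREST = frozenset({"forest"})
EITHER = WATER | FOREST  # mass left uncommitted between the two classes

# Hypothetical evidence from two satellite images for one pixel.
image_a = {WATER: 0.6, FOREST: 0.1, EITHER: 0.3}
image_b = {WATER: 0.5, FOREST: 0.3, EITHER: 0.2}

fused = dempster_combine(image_a, image_b)
print({tuple(sorted(h)): round(w, 4) for h, w in fused.items()})
```

Two sources that each lean toward "water" reinforce one another here: the fused mass on water (0.57 / 0.77, about 0.74) is higher than either source assigned on its own.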

Multiple image classification using label mapping (레이블 매핑을 이용한 다중 이미지 분류)

Jeon, Seung-Je;Lee, Dong-jun;Lee, DongHwi
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2022.05a, pp.367-369, 2022
In this paper, while implementing multi-class image classification, the predicted results were checked by label mapping for each class in order to confirm accurate results for images that the trained model failed to classify. A CNN model was constructed and trained using Kaggle's Intel Image Classification dataset, and the images of the test dataset were label-mapped so that the mapped label values of the multi-class images could be compared with the values classified by the model.

Classification of Multi-temporal SAR Data by Using Data Transform Based Features and Multiple Classifiers (자료변환 기반 특징과 다중 분류자를 이용한 다중시기 SAR자료의 분류)

Yoo, Hee Young;Park, No-Wook;Hong, Sukyoung;Lee, Kyungdo;Kim, Yeseul
- Korean Journal of Remote Sensing, v.31 no.3, pp.205-214, 2015
In this study, a novel land-cover classification framework for multi-temporal SAR data is presented that can combine multiple features extracted through data transforms with multiple classifiers. First, data transforms using principal component analysis (PCA) and a 3D wavelet transform are applied to the multi-temporal SAR dataset to extract new features that differ from the original dataset. Then, three different classifiers, namely a maximum likelihood classifier (MLC), a neural network (NN), and a support vector machine (SVM), are applied to three different datasets, comprising the data-transform-based features and the original backscattering coefficients, which generates diverse preliminary classification results. These results are combined via a majority voting rule to generate a final classification result. In an experiment with a multi-temporal ENVISAT ASAR dataset, the preliminary classification results showed very different classification accuracies depending on the feature and classifier used. The final classification result combining the nine preliminary classification results showed the best classification accuracy because each preliminary result provided complementary information on land cover. The improvement in classification accuracy in this study was mainly attributed to the diversity gained by combining not only different features based on data transforms but also different classifiers. Therefore, the land-cover classification framework presented in this study could be applied effectively to the classification of multi-temporal SAR data and could also be extended to multi-sensor remote sensing data fusion.
https://doi.org/10.7780/kjrs.2015.31.3.1
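The majority voting step described in this abstract can be sketched as below. This is a minimal illustration with made-up labels for three pixels; the nine rows stand in for the nine preliminary results (three feature sets times three classifiers), and the function name is hypothetical. How the paper resolves ties is not stated, so this sketch simply takes whatever `Counter.most_common` returns first.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Fuse per-pixel labels from several classifiers by majority vote.

    `predictions[i][j]` is the label that preliminary result i gives to pixel j.
    """
    fused = []
    for votes in zip(*predictions):  # all votes cast for one pixel
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Nine hypothetical preliminary classification results for three pixels.
preliminary = [
    ["rice", "water", "urban"],
    ["rice", "water", "forest"],
    ["rice", "forest", "urban"],
    ["forest", "water", "urban"],
    ["rice", "water", "urban"],
    ["rice", "water", "forest"],
    ["water", "water", "urban"],
    ["rice", "forest", "urban"],
    ["rice", "water", "urban"],
]

print(majority_vote(preliminary))
```

Individual rows disagree on every pixel, yet the fused result settles on the label most rows share, which is how complementary errors among the preliminary results can cancel out.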

