Search | Korea Science

Modified Version of SVM for Text Categorization

Jo, Tae-Ho
- International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.52-60 / 2008
This research proposes a new strategy for text categorization in which documents are encoded into string vectors and SVM is modified to operate on string vectors. When the traditional version of SVM is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area. For example, in text categorization, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and apply the modified version of SVM, adapted to string vectors, to text categorization.
https://doi.org/10.5391/IJFIS.2008.8.1.052

Inverted Index based Modified Version of KNN for Text Categorization

Jo, Tae-Ho
- Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
This research proposes a new strategy for text categorization in which documents are encoded into string vectors and KNN is modified to operate on string vectors. When the traditional version of KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding may be difficult, depending on the application area. For example, in text categorization, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm so that it accepts string vectors for text categorization.
https://doi.org/10.3745/JIPS.2008.4.1.017

Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets (구문 패턴과 키워드 집합을 이용한 통계적 자동 문서 분류의 성능 향상)

Han, Jeong-Gi;Park, Min-Gyu;Jo, Gwang-Je;Kim, Jun-Tae
- The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1150-1159 / 2000
This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. In our model, we apply the knowledge-based method first and then apply the statistical method to the texts that the knowledge-based method did not categorize. With this combined method, we can improve the accuracy of categorization while still categorizing every text. For statistical categorization, the vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and pattern matching is then performed. Experimental results on new articles show that the accuracy of categorization can be improved by combining the two different categorization methods.

Effects of categorization training and expertise on cognitive problem solving (범주화 훈련과 전문성이 인지 문제 해결에 미치는 영향)

Lee Hee Seung;Sohn Young Woo
- Korean Journal of Cognitive Science / v.16 no.1 / pp.53-67 / 2005
The present study identified differences in categorization patterns between experts and novices and examined whether categorization training has positive effects on problem solving. In Experiment 1, we examined categorization differences between groups of differing expertise using mathematical equation problems. Experts classified problems based on deep structure related to solution methods, whereas novices classified problems based on surface features. However, in the labeled categorization condition, the novices' categorization pattern did not differ from the experts'. These results suggest that novices have difficulty identifying the deep structure of problems. In Experiment 2, we examined whether categorization training that explicitly shows subjects the deep structure of problems increases transfer performance. The results showed that solution training was more effective for the expert group, whereas categorization training was more effective for the novice group. We discuss why different training methods should be applied according to expertise.

A Robust Pattern-based Feature Extraction Method for Sentiment Categorization of Korean Customer Reviews (강건한 한국어 상품평의 감정 분류를 위한 패턴 기반 자질 추출 방법)

Shin, Jun-Soo;Kim, Hark-Soo
- Journal of KIISE:Software and Applications / v.37 no.12 / pp.946-950 / 2010
Many sentiment categorization systems based on machine learning methods use morphological analyzers to extract linguistic features from sentences. However, morphological analyzers generally do not perform well in the customer review domain because online customer reviews include many spacing and spelling errors. The poor performance of these underlying analyzers degrades the performance of the sentiment categorization systems. To resolve this problem, we propose a feature extraction method based on simple longest matching of Eojeol (a Korean spacing unit) patterns and phoneme patterns. The two kinds of patterns are automatically constructed from a large POS (part-of-speech) tagged corpus. Eojeol patterns consist of Eojeols that include content words such as nouns and verbs. Phoneme patterns consist of the leading consonant and vowel pairs of predicate words such as verbs and adjectives, because spelling errors seldom occur in leading consonants and vowels. To evaluate the proposed method, we implemented a sentiment categorization system using an SVM (Support Vector Machine) as the machine learner. In an experiment with Korean customer reviews, the sentiment categorization system using the proposed method outperformed one using a morphological analyzer as the feature extractor.

A Study on Handwritten Digit Categorization of RAM-based Neural Network (RAM 기반 신경망을 이용한 필기체 숫자 분류 연구)

Park, Sang-Moo;Kang, Man-Mo;Eom, Seong-Hoon
- The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.3 / pp.201-207 / 2012
A RAM-based neural network is a weightless neural network based on the binary neural network (BNN), an efficient neural network with one-shot learning. The RAM-based neural network has multiple information bits and stores training counts in the BNN. Supervised learning with the RAM-based neural network performs excellently in pattern recognition, but the network has been unsuitable for pattern categorization with unsupervised learning. In this paper, we propose an unsupervised learning algorithm for the RAM-based neural network to perform pattern categorization. With the proposed algorithm, the RAM-based neural network creates categories by itself depending on the input pattern. Therefore, the RAM-based neural network, for both supervised and unsupervised learning, should be verified against all possible complex models. The training data for the experiments are the MNIST offline handwritten digits, which consist of multiple patterns for the digits 0 to 9.
https://doi.org/10.7236/JIWIT.2012.12.3.201

A Design of SPO for the Conceptual Systematization of Software Patterns (소프트웨어 패턴의 개념적 체계화를 위한 SPO 설계)

Hong, Hyeun-Sool;Han, Sung-Kook
- Journal of the Institute of Electronics Engineers of Korea TE / v.39 no.3 / pp.71-82 / 2002
A software pattern is a knowledge representation derived from verified solutions or the experience of experts. Because software development admits many design variations, however, it is not easy to find the most suitable software pattern. This situation requires that software patterns be categorized in terms of their innate concepts. This paper proposes a software pattern ontology (SPO) for the systematic categorization of software patterns by their conceptual properties, following a comparative analysis of the association between software patterns and ontology. The SPO presented in this paper can establish the basis for a software pattern management system at the conceptual level. This paper also presents an idea for an application that unifies the conceptual properties of software patterns and ontology.

Efficient Implementing of DNA Computing-inspired Pattern Classifier Using GPU (GPU를 이용한 DNA 컴퓨팅 기반 패턴 분류기의 효율적 구현)

Choi, Sun-Wook;Lee, Chong-Ho
- The Transactions of The Korean Institute of Electrical Engineers / v.58 no.7 / pp.1424-1434 / 2009
DNA computing-inspired pattern classification based on the hypernetwork model is a novel approach to pattern classification problems. The hypernetwork model has been shown to be a powerful tool for multi-class data analysis. However, the ordinary hypernetwork model has limitations, such as operating only sequentially. In this paper, we propose an efficient method of implementing a DNA computing-inspired pattern classifier on a GPU. For performance evaluation, we show simulation results of multi-class pattern classification on hand-written digit data, DNA microarray data, and 8-category scene data. We also compare the operation time of the proposed classifier in different operating environments, namely CPU and GPU. Experimental results show diagnosis results that are competitive with other conventional machine learning algorithms. We confirm that the proposed DNA computing-inspired pattern classifier, implemented on a GPU using the CUDA platform, is suitable for multi-class data classification. Its operating speed is fast enough for point-of-care diagnostics, real-time scene categorization, and hand-written digit classification.

A Study on the Demand Prediction Model for Repair Parts of Automotive After-sales Service Center Using LSTM Artificial Neural Network (LSTM 인공신경망을 이용한 자동차 A/S센터 수리 부품 수요 예측 모델 연구)

Jung, Dong Kun;Park, Young Sik
- The Journal of Information Systems / v.31 no.3 / pp.197-220 / 2022
Purpose: This study identifies the demand pattern categories of repair parts in automotive after-sales service (A/S) and proposes a demand prediction model for auto repair parts using Long Short-Term Memory (LSTM) artificial neural networks (ANN). A model for predicting the optimal parts inventory quantity is implemented by applying daily, weekly, and monthly parts demand data to the LSTM model for Lumpy demand, that is, demand among repair parts that occurs irregularly within a specific period. Design/methodology/approach: This study classified 2 years of demand time-series data for repair parts into four demand pattern categories according to the average demand interval (ADI) and the squared coefficient of variation (CV²) of demand size. Of the 16,295 parts in the A/S service shop studied, 96.5% had a Lumpy demand pattern, in which large quantities are demanded in a specific period. Demand for the repair parts with a Lumpy pattern over the last three years is predicted by applying daily, weekly, and monthly time-series data to the LSTM. MAPE, RMSE, and RMSLE, which measure the error between the predicted and actual values, were used as the indices for evaluating prediction performance. Findings: Daily time-series data were predicted best, with the lowest MAPE, RMSE, and RMSLE values, followed by weekly and monthly time-series data. This is due to the smaller amount of training data available for the weekly and monthly series. Even if the demand period is extended to obtain more training data, prediction performance remains low because the discontinuation of current vehicle models and the use of alternative parts eliminate further demand. Therefore, sufficient training data is important, but the choice of the demand period used for prediction is also a critical factor.
https://doi.org/10.5859/KAIS.2022.31.3.197
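
The four-way ADI/CV² categorization mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cutoffs 1.32 and 0.49 are the values commonly cited for this kind of scheme and are an assumption here, since the abstract does not state which thresholds were used. The function name `classify_demand` and the simplified ADI definition (periods divided by periods with demand) are also hypothetical.

```python
from statistics import mean, pstdev

# Assumed cutoffs (commonly cited for ADI/CV^2 schemes); not taken from the abstract.
ADI_CUTOFF = 1.32
CV2_CUTOFF = 0.49


def classify_demand(series):
    """Label a per-period demand series as smooth, erratic, intermittent, or lumpy."""
    nonzero = [d for d in series if d > 0]
    if not nonzero:
        raise ValueError("series contains no demand")
    # ADI: average number of periods per period with non-zero demand (simplified).
    adi = len(series) / len(nonzero)
    # CV^2: squared coefficient of variation of the non-zero demand sizes.
    cv2 = (pstdev(nonzero) / mean(nonzero)) ** 2
    if adi < ADI_CUTOFF:
        return "smooth" if cv2 < CV2_CUTOFF else "erratic"
    return "intermittent" if cv2 < CV2_CUTOFF else "lumpy"


if __name__ == "__main__":
    print(classify_demand([0, 0, 10, 0, 0, 1, 0, 0, 0, 30]))
```

A series with long gaps between demands and widely varying demand sizes falls in the lumpy category, which is the pattern the study reports for most parts.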

A Coupled-ART Neural Network Capable of Modularized Categorization of Patterns (복합 특징의 분리 처리를 위한 모듈화된 Coupled-ART 신경회로망)

우용태;이남일;안광선
- The Journal of Korean Institute of Communications and Information Sciences / v.19 no.10 / pp.2028-2042 / 1994
Properly defining signal and noise in a self-organizing system such as the ART (Adaptive Resonance Theory) neural network model raises a number of subtle issues. Pattern context must enter the definition so that input features treated as irrelevant noise when embedded in one input pattern may be treated as informative signals when embedded in a different input pattern. ART networks automatically self-scale their computational units to embody context- and learning-dependent definitions of signal and noise, so there is no problem in categorizing input patterns whose features are similar in nature. However, when input patterns have features that differ in size and nature, a single vigilance parameter is not enough to differentiate signal from noise for good categorization. For example, if the value of the vigilance parameter is large, noise may be processed as an informative signal and unnecessary categories are generated; if the value of the vigilance parameter is small, an informative signal may be ignored and treated as noise. Hence it is not easy to achieve good pattern categorization. To overcome such problems, a Coupled-ART neural network capable of modularized categorization of patterns is proposed. The Coupled-ART has two layers of tightly coupled modules, the upper and the lower. The lower layer processes the global features and the structural features of a pattern separately, in parallel. The upper layer combines the categorized outputs from the lower layer and categorizes the combined output. Hence, due to the modularized categorization of patterns, the Coupled-ART classifies patterns more efficiently than the ART1 model.
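
The effect of the vigilance parameter described in the abstract above can be illustrated with the standard ART1 match test for binary inputs, which accepts a category when the fraction of active input bits shared with its prototype reaches the vigilance value. This is a sketch of the plain ART1 test only, not of the Coupled-ART model; the function name `passes_vigilance` is hypothetical.

```python
def passes_vigilance(input_bits, prototype_bits, rho):
    """ART1 match test for binary vectors: |I AND w| / |I| >= rho."""
    if len(input_bits) != len(prototype_bits):
        raise ValueError("input and prototype must have the same length")
    active = sum(input_bits)
    if active == 0:
        raise ValueError("input has no active bits")
    # Count the bits that are set in both the input and the category prototype.
    overlap = sum(i & w for i, w in zip(input_bits, prototype_bits))
    return overlap / active >= rho


if __name__ == "__main__":
    pattern = [1, 1, 1, 0]
    prototype = [1, 1, 0, 0]
    print(passes_vigilance(pattern, prototype, 0.5))  # low vigilance
    print(passes_vigilance(pattern, prototype, 0.9))  # high vigilance
```

With a high vigilance value the same input is rejected by the existing category, which in ART1 leads to a new category being created; this is the over-generation of categories that the abstract attributes to a large vigilance value.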

Search Result 53, Processing Time 0.024 seconds
