Search | Korea Science

Text Classification for Patents: Experiments with Unigrams, Bigrams and Different Weighting Methods

Im, ChanJong;Kim, DoWan;Mandl, Thomas
- International Journal of Contents / v.13 no.2 / pp.66-74 / 2017
Patent classification is becoming more critical as patent filings have been increasing over the years. Despite comprehensive studies in the area, several issues remain in classifying patents at the hierarchical levels of the IPC. Both the structural complexity of the hierarchy and the shortage of patents at its lower levels cause a decline in classification performance. Therefore, we propose a new classification method based on different criteria: categories defined by domain experts in trend analysis reports, i.e. the Patent Landscape Report (PLR). Several experiments were conducted to identify the types of features and weighting methods that lead to the best classification performance using a Support Vector Machine (SVM). Two types of features (nouns and noun phrases) and five weighting schemes (TF-idf, TF-rf, TF-icf, TF-icf-based, and TF-idcef-based) were tested.
https://doi.org/10.5392/IJoC.2017.13.2.066
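
As a rough illustration of the term-weighting idea in the abstract above, the sketch below computes TF-idf weights for tokenized documents. It assumes raw term frequency and idf = log(N / df), a common textbook variant; the paper's exact formulas, the other four schemes, and the SVM step are not reproduced here, and the toy documents are made up.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: weight = raw term frequency * log(N / df). This is a
    common textbook variant, not necessarily the paper's formula.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Toy documents (hypothetical, already tokenized).
docs = [
    ["patent", "claim", "sensor"],
    ["patent", "battery", "battery"],
    ["patent", "sensor", "circuit"],
]
weights = tf_idf(docs)
```

A term that occurs in every document (here "patent") gets weight zero, while a term concentrated in one document (here "battery") gets the largest weight.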

A Performance Enhancement of Osteoporosis Classification in CT images (CT 영상에서 골다공증 판별 방법의 성능 향상)

Jung, Sung-Tae
- Journal of Korea Multimedia Society / v.19 no.8 / pp.1248-1259 / 2016
Classification methods based on dual energy X-ray absorptiometry, ultrasonic waves, and quantitative computed tomography have been proposed. A classification method based on machine learning with bone mineral density (BMD) and structural indicators extracted from CT images has also been proposed. We propose a method that enhances the performance of the existing classification method based on bone mineral density and structural indicators by extending the structural indicators and using principal component analysis. Experimental results show that the proposed method improves the accuracy of osteoporosis classification by 2.8% with extended structural indicators only and by 4.8% with both extended structural indicators and principal component analysis. In addition, this paper proposes a method of automatic phantom analysis, which is needed to convert CT values to BMD values. While the existing method requires manual operation to mark the bone region within the phantom, the proposed method detects the bone region automatically by detecting circles in the CT image. The proposed method and the existing method gave the same formula for converting CT values to bone mineral density.
https://doi.org/10.9717/kmms.2016.19.8.1248

A Data-centric Analysis to Evaluate Suitable Machine-Learning-based Network-Attack Classification Schemes

Huong, Truong Thu;Bac, Ta Phuong;Thang, Bui Doan;Long, Dao Minh;Quang, Le Anh;Dan, Nguyen Minh;Hoang, Nguyen Viet
- International Journal of Computer Science & Network Security / v.21 no.6 / pp.169-180 / 2021
Since machine learning was invented, many machine-learning-based algorithms, from shallow learning to deep learning models, have been developed to solve classification tasks. This abundance makes it difficult to choose a classification algorithm that improves classification and detection efficiency in a given network context. It also raises the questions of whether an algorithm performs well and why it works on some problems but not on others. In this paper, we present a data-centric analysis that provides a way to select a suitable classification algorithm. This data-centric approach offers a new viewpoint for exploring the relationships between classification performance and the facts and figures of data sets.
https://doi.org/10.22937/IJCSNS.2021.21.6.23

Tuning the Architecture of Neural Networks for Multi-Class Classification (다집단 분류 인공신경망 모형의 아키텍쳐 튜닝)

Jeong, Chulwoo;Min, Jae H.
- Journal of the Korean Operations Research and Management Science Society / v.38 no.1 / pp.139-152 / 2013
The purpose of this study is to argue for the validity of tuning the architecture of neural network models for multi-class classification. A neural network model for multi-class classification is typically constructed by building a series of neural network models for binary classification. When building a neural network model, we must set parameter values such as the number of hidden nodes and the weight decay parameter in advance. This step deserves special attention because the performance of the model can differ considerably depending on these values. For better model performance, it is necessary to tune the parameters every time a neural network model is built. Nonetheless, previous studies have neither mentioned the necessity of the tuning process nor proved its validity. In this study, we argue that the parameters should be tuned every time a neural network model for multi-class classification is built. Through empirical analysis using wine data, we show that the model with tuned parameters outperforms untuned models.
https://doi.org/10.7737/JKORMS.2013.38.1.139
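
The tuning step described in the abstract above can be pictured as an exhaustive search over the two named parameters, the number of hidden nodes and the weight decay. The sketch below is only an illustration under assumptions: the `evaluate` callback, the grids, and the `toy_score` stand-in are hypothetical, and the abstract does not say which search procedure the authors used.

```python
from itertools import product

def grid_search(evaluate, hidden_nodes_grid, weight_decay_grid):
    """Return (best_params, best_score) over all parameter combinations.

    `evaluate` is a caller-supplied function (hypothetical here) that
    trains a model with the given settings and returns a validation
    score, where higher is better.
    """
    best_params, best_score = None, float("-inf")
    for hidden, decay in product(hidden_nodes_grid, weight_decay_grid):
        score = evaluate(hidden, decay)
        if score > best_score:
            best_params, best_score = (hidden, decay), score
    return best_params, best_score

# Stand-in scoring function so the sketch runs without a real network;
# by construction it peaks at 4 hidden nodes and a weight decay of 0.01.
def toy_score(hidden, decay):
    return -((hidden - 4) ** 2) - (100 * (decay - 0.01)) ** 2

best_params, best_score = grid_search(toy_score, [2, 4, 8], [0.0, 0.01, 0.1])
```

In practice `evaluate` would train the network and score it on held-out data, and the search would be repeated each time a model is built.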

Text Classification with Heterogeneous Data Using Multiple Self-Training Classifiers

William Xiu Shun Wong;Donghoon Lee;Namgyu Kim
- Asia pacific journal of information systems / v.29 no.4 / pp.789-816 / 2019
Text classification is a challenging task, especially when dealing with a huge amount of text data. The performance of a classification model can vary depending on the types of words contained in the document corpus and the types of features generated for classification. Rather than proposing a modified version of an existing algorithm or creating a new algorithm, we attempt to modify the use of data. Classifier performance is usually affected by the quality of the training data on which the classifier is built. We assume that data from different domains might have different noise characteristics, which can be utilized in the process of learning the classifier. Therefore, we attempt to enhance the robustness of the classifier and improve classification accuracy by artificially injecting heterogeneous data into the learning process. A semi-supervised approach was applied to utilize the heterogeneous data in learning the document classifier. However, the performance of the document classifier might be degraded by the unlabeled data. Therefore, we further propose an algorithm that extracts only the documents that contribute to improving the accuracy of the classifier.
https://doi.org/10.14329/apjis.2019.29.4.789

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles (단행본 서명의 단어 임베딩에 따른 자동분류의 성능 비교)

Yong-Gu Lee
- Journal of the Korean Society for information Management / v.40 no.4 / pp.307-327 / 2023
To analyze the impact of word embedding on book titles, this study used word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as features for automatic classification. The classifier was the k-nearest neighbors (kNN) algorithm, and the categories were based on the DDC (Dewey Decimal Classification) main class 300 assigned to books by libraries. In the automatic classification experiment applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText gave the kNN classifier better classification performance than TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of the fastText model demonstrated good performance overall. Specifically, this model performed better with hierarchical softmax and larger embedding dimensions. From a performance perspective, fastText can generate embeddings for substrings or subwords using the n-gram method, which has been shown to increase recall. The Skip-gram architecture of the Word2vec model generally showed good performance at low dimensions (size 300) and with small negative sampling sizes (3 or 5).
https://doi.org/10.3743/KOSIM.2023.40.4.307
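
To make the classification step in the abstract above concrete, the sketch below runs a kNN vote over embedding vectors. Cosine similarity, k = 3, the two-dimensional vectors, and the class labels are all assumptions chosen for illustration; the abstract does not state the distance measure or the value of k, and real title embeddings would come from a trained Word2vec, GloVe, or fastText model.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_predict(query, train_vectors, train_labels, k=3):
    """Majority label among the k training vectors most similar to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D "title embeddings" with made-up DDC-style labels.
train_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
train_labels = ["330", "330", "370", "370"]
predicted = knn_predict([0.8, 0.3], train_vectors, train_labels, k=3)
```

The query vector lies closest to the two vectors labeled "330", so that label wins the vote.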

Learning Deep Representation by Increasing ConvNets Depth for Few Shot Learning

Fabian, H.S. Tan;Kang, Dae-Ki
- International journal of advanced smart convergence / v.8 no.4 / pp.75-81 / 2019
Although recent advances in deep learning methods have provided satisfactory results in large-data domains, they still yield poor performance on few-shot classification tasks. Training a model with strong performance, such as a deep convolutional neural network, depends heavily on a huge dataset, and the number of labeled classes in the dataset can be extremely large. The cost of human annotation and the scarcity of data among the classes have drastically limited the capability of current image classification models. In contrast, humans are excellent at learning or recognizing new, unseen classes from merely a small set of labeled examples. Few-shot learning aims to train a classification model with limited labeled samples to recognize new classes that were never seen during the training process. In this paper, we increase the backbone depth of the embedding network in order to learn the intra-class variation. By increasing the network depth of the embedding module, we are able to achieve competitive performance due to the minimized intra-class variation.
https://doi.org/10.7236/IJASC.2019.8.4.75

A Study on the Dynamic Flow Classification for IP Switching (IP 스위칭에서 동적 흐름 분류에 관한 연구)

이우승;정운석;박광채
- Proceedings of the IEEK Conference / 2000.06c / pp.169-172 / 2000
IP Switching is a new routing technology proposed to improve the performance of IP routers. Flow classification is one of the key issues in IP Switching. To achieve better performance, flow classification should be matched to the varying IP traffic and an IP switch should make use of its hardware switching resources as fully as possible. This paper proposes an adaptive flow classification algorithm for IP Switching. By dynamically adjusting the values of its control parameters in response to the present usage of the hardware switching resources, this adaptive algorithm can efficiently match the varying IP traffic and thus improve the performance of an IP switch.

A New Hybrid Algorithm for Invariance and Improved Classification Performance in Image Recognition

Shi, Rui-Xia;Jeong, Dong-Gyu
- International journal of advanced smart convergence / v.9 no.3 / pp.85-96 / 2020
Extracting the salient object image and solving the invariance problem are important for image recognition. In this paper, we propose a new hybrid algorithm for invariance and improved classification performance in image recognition. The algorithm combines the FT (Frequency-tuned Salient Region Detection) algorithm, a guided filter, Zernike moments, and a simple artificial neural network (Multi-layer Perceptron). The conventional FT algorithm is used to extract the initial salient object image, guided filtering to preserve edge details, Zernike moments to solve the invariance problem, and the Multi-layer Perceptron to classify the extracted image. Experimental results show that this algorithm achieves superior performance in extracting the salient object image and the invariant moment features. The results also show that the algorithm classifies the extracted object image with an improved recognition rate.
https://doi.org/10.7236/IJASC.2020.9.3.85

An Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods (자질 선정 기준과 가중치 할당 방식간의 관계를 고려한 문서 자동분류의 개선에 대한 연구)

Lee Jae-Yun
- Journal of the Korean Society for Library and Information Science / v.39 no.2 / pp.123-146 / 2005
This study aims to find consistent strategies for feature selection and feature weighting methods that can improve the effectiveness and efficiency of the kNN text classifier. Feature selection criteria and feature weighting methods are as important as classification algorithms for achieving good performance in text categorization systems. Most previous studies chose conflicting strategies for feature selection criteria and weighting methods. In this study, the performance of several feature selection criteria is measured considering the storage space for inverted index records and the classification time. The classification experiments examine the performance of IDF as a feature selection criterion and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results of these experiments suggest that by using measures that prefer low-frequency features both as the feature selection criterion and as the feature weighting method, we can increase the classification speed by a factor of three to five without losing classification accuracy.
https://doi.org/10.4275/KSLIS.2005.39.2.123
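
The idea of using IDF as a feature selection criterion, as described in the abstract above, can be sketched as ranking terms by inverse document frequency and keeping the top few. This is a minimal illustration under assumptions: idf = log(N / df) is one common definition, and the `keep` cutoff, the tie-breaking rule, and the toy documents are hypothetical rather than taken from the study.

```python
import math
from collections import Counter

def select_by_idf(docs, keep):
    """Keep the `keep` terms with the highest IDF (lowest document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    # Sort by descending IDF; break ties alphabetically for determinism.
    ranked = sorted(idf, key=lambda term: (-idf[term], term))
    return ranked[:keep]

# Toy documents (hypothetical, already tokenized).
docs = [
    ["library", "index", "catalog"],
    ["library", "index", "query"],
    ["library", "ranking", "query"],
]
selected = select_by_idf(docs, keep=3)
```

Because low-frequency terms have short posting lists in an inverted index, a criterion of this kind tends to shrink the index, which is consistent with the storage and speed considerations the abstract mentions.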

Search Result 3,704, Processing Time 0.028 seconds
