Search | Korea Science

Inverted Index based Modified Version of K-Means Algorithm for Text Clustering

Jo, Tae-Ho
- Journal of Information Processing Systems / v.4 no.2 / pp.67-76 / 2008
This research proposes a strategy for text clustering in which documents are encoded into string vectors and the k-means algorithm is modified to operate on string vectors. Traditionally, when the k-means algorithm is used for pattern classification, raw data must be encoded into numerical vectors. This encoding can be difficult, depending on the application area. For example, in text clustering, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the k-means algorithm so that it can be applied to string vectors for text clustering.
https://doi.org/10.3745/JIPS.2008.4.2.067
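
The "huge dimensionality and sparse distribution" problem mentioned in the abstract can be illustrated with a small sketch. It encodes a toy corpus as plain term-count vectors and measures the fraction of zero entries. The function names and the sample documents are hypothetical, chosen only for illustration; the sketch shows the conventional numerical encoding, not the string-vector encoding proposed in the paper.

```python
def bag_of_words(docs):
    """Encode each document as a term-count vector over the shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word in doc.lower().split():
            vec[index[word]] += 1
        vectors.append(vec)
    return vocab, vectors


def sparsity(vectors):
    """Return the fraction of zero entries across all vectors."""
    total = sum(len(vec) for vec in vectors)
    zeros = sum(1 for vec in vectors for count in vec if count == 0)
    return zeros / total


# Hypothetical toy corpus; real corpora have vocabularies of many thousands of terms.
docs = [
    "clustering groups similar documents",
    "k means needs numerical vectors",
    "string vectors avoid sparse encoding",
]
vocab, vectors = bag_of_words(docs)
print(f"dimensions: {len(vocab)}, sparsity: {sparsity(vectors):.2f}")
```

Even with three short documents, most entries are zero, and both the dimensionality and the share of zeros grow as the vocabulary grows.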

Inverted Index based Modified Version of KNN for Text Categorization

Jo, Tae-Ho
- Journal of Information Processing Systems / v.4 no.1 / pp.17-26 / 2008
This research proposes a strategy for text categorization in which documents are encoded into string vectors and KNN is modified to operate on string vectors. Traditionally, when KNN is used for pattern classification, raw data must be encoded into numerical vectors. This encoding can be difficult, depending on the application area. For example, in text categorization, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. In this research, we encode full texts into string vectors and modify the supervised learning algorithm so that it can be applied to string vectors for text categorization.
https://doi.org/10.3745/JIPS.2008.4.1.017

Building a Hierarchy of Product Categories through Text Analysis of Product Description (텍스트 분석을 통한 제품 분류 체계 수립방안: 관광분야 App을 중심으로)

Lim, Hyuna;Choi, Jaewon;Lee, Hong Joo
- Knowledge Management Research / v.20 no.3 / pp.139-154 / 2019
With the increasing use of smartphones, many apps are being released in various fields. Analyzing the current status and trends of apps in a specific field requires a classification scheme. Various schemes that consider users' behavior and the characteristics of apps have been proposed, but because new apps are released continually, a fixed classification scheme must be updated over time. Although many aspects need to be considered when establishing a classification scheme, a scheme based on the characteristics of the apps makes it possible to grasp app trends. This research proposes a method of establishing an app classification scheme from the descriptions written by app developers. For this purpose, we collected descriptions of apps in the tourism field and identified major categories through topic modeling. Using only the apps corresponding to each topic, we constructed a network of the words contained in the descriptions and identified subcategories based on these word networks. Six topics were selected, and the Clauset-Newman-Moore algorithm was applied to each topic to identify subcategories. Four or five subcategories were identified for each topic.
https://doi.org/10.15813/kmr.2019.20.3.009

Word-Level Embedding to Improve Performance of Representative Spatio-temporal Document Classification

Byoungwook Kim;Hong-Jun Jang
- Journal of Information Processing Systems / v.19 no.6 / pp.830-841 / 2023
Tokenization is the process of segmenting input text into smaller units, and it is a preprocessing task performed mainly to improve the efficiency of the machine learning process. Various tokenization methods have been proposed in the field of natural language processing, but studies have primarily focused on segmenting text efficiently. Few studies on the Korean language have explored which tokenization methods are suitable for document classification tasks. In this paper, an exploratory study was performed to find the most suitable tokenization method for improving the performance of a representative spatio-temporal document classifier in Korean. A convolutional neural network model was used for the experiment, and for the final performance comparison, document classification tasks were selected in which performance depends largely on the tokenization method. The commonly used Jamo, Character, and Word units were adopted as the tokenization methods for the comparative experiments. The experiment confirmed that word-unit tokenization performed best on the representative spatio-temporal document classification task, where the semantic embedding ability of the token itself is important.
https://doi.org/10.3745/JIPS.04.0296
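
The three tokenization units compared in the abstract (Jamo, Character, and Word) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's preprocessing pipeline: word units are approximated by whitespace splitting, Jamo units are obtained through Unicode NFD normalization (which decomposes each precomposed Hangul syllable into its conjoining jamo), and the function names and sample string are hypothetical.

```python
import unicodedata


def word_tokens(text):
    """Word units, approximated here by splitting on whitespace (an assumption)."""
    return text.split()


def char_tokens(text):
    """Character units: one token per non-space character (Hangul syllable)."""
    return [ch for ch in text if not ch.isspace()]


def jamo_tokens(text):
    """Jamo units: NFD normalization splits each Hangul syllable into its jamo."""
    decomposed = unicodedata.normalize("NFD", text)
    return [ch for ch in decomposed if not ch.isspace()]


# Hypothetical sample: "Korean document classification"
sample = "한국 문서 분류"
print(len(word_tokens(sample)), len(char_tokens(sample)), len(jamo_tokens(sample)))
```

The finer the unit, the longer the token sequence and the less meaning each token carries on its own, which is the trade-off the study examines.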

An Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods (자질 선정 기준과 가중치 할당 방식간의 관계를 고려한 문서 자동분류의 개선에 대한 연구)

Lee Jae-Yun
- Journal of the Korean Society for Library and Information Science / v.39 no.2 / pp.123-146 / 2005
This study aims to find consistent strategies for feature selection and feature weighting that can improve the effectiveness and efficiency of a kNN text classifier. Feature selection criteria and feature weighting methods are as important as classification algorithms for achieving good performance in text categorization systems. Most earlier studies chose conflicting strategies for feature selection criteria and weighting methods. In this study, the performance of several feature selection criteria is measured with respect to the storage space for inverted index records and the classification time. The classification experiments examine the performance of IDF as a feature selection criterion and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results suggest that by using measures that prefer low-frequency features both as the feature selection criterion and as the feature weighting method, we can increase the classification speed by three to five times without losing classification accuracy.
https://doi.org/10.4275/KSLIS.2005.39.2.123
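
Using IDF as a feature selection criterion, as the abstract describes, can be sketched in a few lines. The sketch assumes the common definition idf(t) = log(N / df(t)) and keeps the k highest-scoring terms, which favors low-frequency features. The exact IDF variant, the cut-off, and all function names and sample documents here are assumptions for illustration rather than the study's actual setup.

```python
import math


def document_frequency(docs):
    """Count in how many documents each term occurs."""
    df = {}
    for doc in docs:
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return df


def idf_scores(docs):
    """idf(t) = log(N / df(t)); one common variant, assumed here."""
    n = len(docs)
    return {term: math.log(n / count) for term, count in document_frequency(docs).items()}


def select_features(docs, k):
    """Keep the k terms with the highest IDF; ties are broken alphabetically."""
    scores = idf_scores(docs)
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Hypothetical toy corpus.
docs = [
    "text classifier uses features",
    "text features need weights",
    "inverted index stores text",
]
print(select_features(docs, 3))
```

A term that appears in every document, such as "text" in this toy corpus, scores zero and is the first to be dropped, so its long posting list never has to be stored in the inverted index.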

Language Identification in Handwritten Words Using a Convolutional Neural Network

Tung, Trieu Son;Lee, Gueesang
- International Journal of Contents / v.13 no.3 / pp.38-42 / 2017
Documents of the last few decades typically include more than one language, so identifying the language of each word is essential, especially for English and Korean in handwritten documents. Traditional methods mostly rely on conventional structural or stroke features, but they sometimes fail to capture many characteristics of words because of the complexity introduced by handwriting. As a result, traditional methods make the task considerably more complicated and can produce poor results. In this study, a convolutional neural network (CNN) is used to classify English and Korean handwritten words in text documents. Experimental results show that the proposed method works effectively compared to previous methods.
https://doi.org/10.5392/IJoC.2017.13.3.038

Automatic Text Categorization Using Hybrid Multiple Model Schemes (하이브리드 다중모델 학습기법을 이용한 자동 문서 분류)

명순희;김인철
- Journal of the Korean Society for Information Management / v.19 no.4 / pp.35-51 / 2002
Inductive learning and classification techniques have been employed in various research and applications that organize textual data to solve the problem of information access. In this study, we develop hybrid model combination methods that incorporate the concepts and techniques of multiple modeling algorithms to improve the accuracy of text classification, and we conduct experiments to evaluate the performance of the proposed schemes. Boosted stacking, one of the extended stacking schemes proposed in this study, yields higher accuracy than conventional model combination methods and single classifiers.
https://doi.org/10.3743/KOSIM.2002.19.4.035

An Algorithm for Text Image Watermarking based on Word Classification (단어 분류에 기반한 텍스트 영상 워터마킹 알고리즘)

Kim Young-Won;Oh Il-Seok
- Journal of KIISE:Software and Applications / v.32 no.8 / pp.742-751 / 2005
This paper proposes a novel text image watermarking algorithm based on word classification. The words are classified into K classes using simple features. Several adjacent words are grouped into a segment, and the segments are also classified using the word class information. The same amount of information is inserted into each of the segment classes. The signal is encoded by modifying some inter-word space statistics of the segment classes. Subjective comparisons with conventional word-shift algorithms are presented under several criteria.

Intention Classification for Retrieval of Health Questions

Liu, Rey-Long
- International Journal of Knowledge Content Development & Technology / v.7 no.1 / pp.101-120 / 2017
Healthcare professionals have edited many health questions (HQs) and their answers for healthcare consumers on the Internet. The HQs provide both readable and reliable health information, and hence retrieval of those HQs that are relevant to a given question is essential for health education and promotion through the Internet. However, retrieval of relevant HQs needs to be based on recognizing the intention of each HQ, which is difficult to do by predefining syntactic and semantic rules. We thus model the intention recognition problem as a text classification problem and develop two techniques to improve a learning-based text classifier for the problem. The two techniques improve the classifier by location-based and area-based feature weightings, respectively. Experimental results show that the two techniques can work together to significantly improve a Support Vector Machine classifier in both the recognition of HQ intentions and the retrieval of relevant HQs.
https://doi.org/10.5865/IJKCT.2017.7.1.101

Action recognition, hand gesture recognition, and emotion recognition using text classification method (Text classification 방법을 사용한 행동 인식, 손동작 인식 및 감정 인식)

Kim, Gi-Duk
- Proceedings of the Korean Society of Computer Information Conference / 2021.01a / pp.213-216 / 2021
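In this paper, we propose a method for action recognition, hand gesture recognition, and emotion recognition that applies deep learning models used for text classification. First, features are extracted from video using a library, and an equation is then applied to store the feature vectors. These vectors are used to train a model that combines Conv1D, Transformer, and GRU. With this method, a single deep learning model can be applied to a variety of fields. Using the proposed method, we obtained classification accuracies of 99.66% on the SYSU 3D HOI dataset, 99.0% on the eNTERFACE' 05 dataset, and 95.48% on the DHG-14 dataset.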

