Search | Korea Science

Issues and Empirical Results for Improving Text Classification

Ko, Young-Joong;Seo, Jung-Yun
- Journal of Computing Science and Engineering / v.5 no.2 / pp.150-160 / 2011
Automatic text classification has a long history, and many studies have been conducted in this field. In particular, many machine learning algorithms and information retrieval techniques have been applied to text classification tasks. Although much technical progress has been made, there is still room for improvement. This paper discusses the remaining issues in improving text classification. Three issues are presented (automatic training data generation, noisy data treatment, and term weighting and indexing), and four studies addressing them are introduced together with their empirical results. First, a semi-supervised learning technique is applied to text classification to create training data efficiently. Second, for effective noisy data treatment, a noisy data reduction method and a text classifier that is robust to noisy data are developed. Finally, the term weighting and indexing technique is revised by incorporating sentence importance, estimated with summarization techniques, into the term weight calculation.
https://doi.org/10.5626/JCSE.2011.5.2.150
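The sentence-importance idea in the last sentence can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes that a sentence's importance is the fraction of title terms it contains (a simple summarization heuristic chosen here for illustration) and that each term occurrence contributes 1 + alpha × importance to the term's weight. The function name `sentence_weighted_tf` and the parameter `alpha` are hypothetical.

```python
from collections import Counter

def sentence_weighted_tf(sentences, title_terms, alpha=1.0):
    """Term weights where each occurrence counts 1 + alpha * importance(sentence).

    Assumed importance score: the fraction of title terms that also appear in
    the sentence (an illustrative heuristic, not the paper's exact measure).
    """
    title = set(title_terms)
    weights = Counter()
    for sentence in sentences:
        importance = len(title & set(sentence)) / len(title) if title else 0.0
        for term in sentence:
            weights[term] += 1.0 + alpha * importance
    return weights

doc = [["svm", "text", "classification"], ["results", "vary"]]
print(sentence_weighted_tf(doc, ["text", "classification"]))
```

Terms from sentences that overlap the title receive up to double weight (with `alpha=1.0`), while terms from unrelated sentences keep a plain count of 1 per occurrence.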

Speech/Music Signal Classification Based on Spectrum Flux and MFCC For Audio Coder (Speech/Music Signal Classification Based on Spectral Change and MFCC for Audio Coders)

Sangkil Lee;In-Sung Lee
- The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.5 / pp.239-246 / 2023
In this paper, we propose an open-loop algorithm that classifies speech and music signals for an audio coder using spectral flux and Mel Frequency Cepstral Coefficient (MFCC) parameters. To improve responsiveness, MFCCs were used as short-term feature parameters, and to improve accuracy, spectral flux was used as a long-term feature parameter. The final speech/music classification decision is made by combining the short-term and long-term classification methods. A Gaussian Mixture Model (GMM) was used for pattern recognition, and the optimal GMM parameters were estimated with the Expectation-Maximization (EM) algorithm. On various audio sources, the proposed combined method showed an average classification error rate of 1.5%, improving the error rate by 0.9% compared with the short-term-only method and by 0.6% compared with the long-term-only method. Compared with the Unified Speech and Audio Coding (USAC) audio classification method, the proposed method improved the classification error rate by 9.1% for percussive music signals with attacks and by 5.8% for speech signals.
https://doi.org/10.17661/jkiiect.2023.16.5.239
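Spectral flux measures how much the magnitude spectrum changes from one frame to the next. The sketch below uses one common definition (the sum of squared bin-wise differences between consecutive magnitude spectra) and averages it over a window of frames to form a long-term feature. The normalization, window length, and function names (`spectral_flux`, `mean_flux`) are assumptions for illustration, not details taken from the paper.

```python
def spectral_flux(prev_mag, cur_mag):
    """Sum of squared differences between two consecutive magnitude spectra."""
    if len(prev_mag) != len(cur_mag):
        raise ValueError("spectra must have the same number of bins")
    return sum((c - p) ** 2 for p, c in zip(prev_mag, cur_mag))

def mean_flux(frames):
    """Average flux over a window of frames (an assumed long-term feature form)."""
    fluxes = [spectral_flux(a, b) for a, b in zip(frames, frames[1:])]
    return sum(fluxes) / len(fluxes) if fluxes else 0.0

# Three frames of a toy 2-bin magnitude spectrum.
print(mean_flux([[0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]))
```

Averaging over many frames smooths the per-frame values, which fits the abstract's use of flux as a slower but more accurate long-term cue alongside frame-level MFCCs.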

A Study of Classification Systems in the Internet Shopping Malls (A Study on the Product Classification Systems of Internet Shopping Malls)

곽철완
- Journal of the Korean Society for information Management / v.18 no.4 / pp.201-215 / 2001
The purpose of this study is to identify how to construct a product classification system for internet shopping malls based on library classification theories. To this end, the study focused on Ranganathan's canons of classification, specifically the canons for characteristics and the canons for terms. The study derives six principles for constructing an internet shopping mall classification system: product characteristics, inclusiveness, multiple access points, consistency of category sequence and terms, currency and obviousness of terms, and avoidance of term duplication. Product search patterns and their relationship to the interface are suggested as topics for future research.

Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

Kim, Minyoung
- International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.2 / pp.81-86 / 2016
Term weighting is a popular technique that weights term features to improve accuracy in document classification. Although several successful term weighting algorithms have been proposed, none of them appears to perform consistently well across different data domains. In this paper, we propose several methods for combining different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically, we suggest two approaches: (i) learning a single weight vector that lies in the convex hull of the base vectors while minimizing the class prediction loss, and (ii) a minimax classifier that seeks robustness across the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, on which they significantly outperform existing term weighting methods.
https://doi.org/10.5391/IJFIS.2016.16.2.81
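Approach (i) restricts the combined vector to the convex hull of the base vectors, i.e. w = sum_k alpha_k w_k with alpha_k >= 0 and sum_k alpha_k = 1. The minimal sketch below shows that constraint, assuming weight vectors are stored as term-to-weight dicts. `project_to_simplex` is the standard sort-based Euclidean projection onto the probability simplex, which is one way a projected-gradient solver could keep alpha feasible; the abstract does not describe the paper's own solver, and both function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_k >= 0, sum_k a_k = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # holds for j = 1..rho; the last such threshold is kept
            theta = t
    return [max(x - theta, 0.0) for x in v]

def combine_weights(base_vectors, alpha):
    """Convex combination sum_k alpha_k * w_k of term-weight vectors (dicts term -> weight)."""
    combined = {}
    for a_k, w_k in zip(alpha, base_vectors):
        for term, value in w_k.items():
            combined[term] = combined.get(term, 0.0) + a_k * value
    return combined

alpha = project_to_simplex([0.9, 0.4])  # e.g. mixing coefficients after an unconstrained step
print(alpha)
print(combine_weights([{"svm": 1.0}, {"svm": 0.2, "kernel": 2.0}], alpha))
```

Approach (ii) would instead score each base vector separately and minimize the largest of those losses; that optimization is not sketched here.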

Emotion Classification based on EEG signals with LSTM deep learning method (Emotion Classification Technique Based on EEG Signals Using an Attention-Mechanism-Based Long Short-Term Memory Network)

Kim, Youmin;Choi, Ahyoung
- Journal of Korea Society of Industrial Information Systems / v.26 no.1 / pp.1-10 / 2021
This study proposed a Long Short-Term Memory (LSTM) network to account for changes in emotion over time and applied an attention mechanism to weight the emotional states that appear at specific moments. We used 32-channel EEG data from the DEAP database. A 2-level classification experiment (Low and High) and a 3-level classification experiment (Low, Middle, and High) were performed on the Valence and Arousal dimensions of the emotion model. The 2-level classification achieved an accuracy of 90.1% for Valence and 88.1% for Arousal, and the 3-level classification achieved 83.5% for Valence and 82.5% for Arousal.
https://doi.org/10.9723/jksiis.2021.26.1.001
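The attention step can be sketched as a softmax over per-time-step scores followed by a weighted sum of the LSTM hidden states. This is a generic dot-product attention pooling assumed for illustration; the abstract does not give the paper's scoring function, and `attention_pool` and `score_vector` are hypothetical names (in a trained model the score vector would be a learned parameter).

```python
import math

def attention_pool(hidden_states, score_vector):
    """Softmax-weighted sum of per-time-step hidden states using dot-product scores."""
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, score_vector)) for h in hidden_states]
    m = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights

# Two time steps with 2-dimensional hidden states; this score vector favours step 0.
context, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
print(weights, context)
```

The resulting context vector emphasizes the time steps with the highest scores, which is how an attention layer can focus on the moments where an emotional state is most evident.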

Relationship between Resource Utilization and Long-term Care Classification Level for Residents in Nursing Homes (Analysis of Care Services and Resource Utilization by Long-Term Care Grade among Residents of Nursing Homes for the Elderly)

Lee, Min-Kyung;Kim, Eun-Kyung
- Journal of Korean Academy of Nursing / v.40 no.6 / pp.903-912 / 2010
Purpose: This study was conducted to examine whether the classification level for long-term care services under long-term care insurance reflects the resource utilization of nursing home residents. Methods: The researchers selected 95 participants from 2 long-term care facilities and recorded the type and duration of care services provided by nurses, certified caregivers, physical therapists, and social workers over a 24-hour period. Results: Resource utilization was 281.04 for level 1, 301.05 for level 2, and 270.87 for level 3, and it was not correlated with level. Within each level, resource utilization varied to a similar degree, with coefficients of variation of 22.7-27.1%. Physical function was the most influential factor on long-term care scores (r=.88, p<.001). Thus, the long-term care level did not reflect differences in resource utilization among residents covered by long-term care insurance. Conclusion: These results indicate that the current grading system for long-term care services needs to be reconsidered. Further study is needed to adjust the long-term care classification system so that it reflects the resource utilization of care recipients under long-term care insurance.
https://doi.org/10.4040/jkan.2010.40.6.903
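The coefficient of variation reported in the Results is the standard deviation divided by the mean, expressed as a percentage. A minimal sketch follows; the abstract does not say whether the study used the sample or the population standard deviation, so the sample version (`statistics.stdev`) is an assumption, and the input values below are toy numbers rather than study data.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation / mean * 100 (sample SD assumed)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100.0

print(coefficient_of_variation([1, 2, 3]))  # 50.0
```

Because the CV is scale-free, it lets within-level spread be compared across levels whose mean resource utilization differs.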

A Study on the Features of the <Classification-Search Term Dictionary>, the Library Classification Scheme in North Korea (An Analysis of the Characteristics of North Korea's Library Classification Scheme <Classification-Search Term Dictionary>)

Jae-Hwang Choi
- Journal of Korean Library and Information Science Society / v.53 no.4 / pp.123-142 / 2022
In 2000, North Korea developed and published the two-volume <Classification-Search Term Dictionary>, which is currently used throughout the country. The purpose of this study is to examine how North Korea's classification schemes developed after liberation and to understand the contents, composition, and principles of the <Classification-Search Term Dictionary>, published in 2000 and revised in 2014. Until now, all studies of North Korean classification schemes have dealt with the <Book Classification Scheme> published in North Korea in 1964, and North Korea's classification schemes since then have not been discussed. The first volume of the <Classification-Search Term Dictionary> is arranged as 'classification symbols - search terms', and the second volume as 'search terms - classification symbols'. Volume 1 is based on the <Books and Bibliography Classification Scheme (1996)> and has a total of 41 main classes in five categories: 1 main class (11/19) for 'revolutionary ideas and theories', 8 main classes (20~27) for 'natural sciences', 19 main classes (30~69) for 'engineering technology and applied sciences', 12 main classes (70~85) for 'social sciences', and 1 main class (90) for 'total sciences'. Volume 2 resembles a subject-heading list. This is the first time North Korea's <Classification-Search Term Dictionary> has been introduced in South Korea, and the study is expected to serve as a starting point for future work on establishing a standard classification scheme for unification.
https://doi.org/10.16981/kliss.53.4.202212.123
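The two volumes are mutually inverse lookups: Volume 1 maps classification symbols to search terms, and Volume 2 maps search terms back to classification symbols. The sketch below derives the second table from the first. The symbols and terms are hypothetical placeholders, not entries from the actual dictionary, and `invert_index` is an illustrative name.

```python
def invert_index(symbol_to_terms):
    """Build a 'search term -> classification symbols' table from a
    'classification symbol -> search terms' table."""
    term_to_symbols = {}
    for symbol, terms in symbol_to_terms.items():
        for term in terms:
            term_to_symbols.setdefault(term, []).append(symbol)
    return {term: sorted(symbols) for term, symbols in term_to_symbols.items()}

# Hypothetical placeholder entries, not taken from the actual dictionary.
volume1 = {"S1": ["term-a", "term-b"], "S2": ["term-b"]}
print(invert_index(volume1))  # {'term-a': ['S1'], 'term-b': ['S1', 'S2']}
```

A term that is indexed under several classification symbols collects all of them in the inverted table, which is the role a subject-heading-style second volume plays.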

Development of a mid-term preceding observation model for radish (Development of a Mid-Term Preceding Observation Model for Radish)

Cho, Jae-Hwan;Lee, Han-Sung
- Korean Journal of Agricultural Science / v.38 no.3 / pp.571-581 / 2011
This study develops a mid-term preceding observation model for radish to complement the existing short-term agricultural observation model. The first purpose of the study is to extend the three-season classification (spring, summer, fall) of fruit-vegetables to a four-season classification that also includes winter. This makes it possible to identify the causes of the supply-demand imbalance and unstable price of radish. The second purpose is to construct a mid-term preceding observation model that can be used to forecast planted area, output, monthly shipments, and price. To achieve these purposes, several multiple regression models are estimated. The system consists of a planted-area equation, a yield equation, a monthly shipment distribution equation, and a monthly price equation. An auxiliary equation is included in the system to calculate output, and the consumer price index, among other variables, is treated as exogenous.
https://doi.org/10.7744/cnujas.2011.38.3.571
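The recursive structure of the system (planted area, then output through an auxiliary equation, then monthly shipments and prices) can be sketched as a forecasting chain. This is a simplified illustration: every coefficient is a hypothetical placeholder rather than an estimate from the paper, output is assumed to equal planted area × yield, yield is held constant, and the consumer price index enters the price equation as the only exogenous variable.

```python
from dataclasses import dataclass

@dataclass
class RadishCoefficients:
    """Hypothetical placeholder coefficients, NOT estimates from the paper."""
    area_const: float
    area_on_lag_price: float
    yield_per_area: float
    price_const: float
    price_on_shipment: float
    price_on_cpi: float

def forecast(lag_price, monthly_shares, cpi, c):
    """Recursive forecast: planted area -> output -> monthly shipments -> monthly prices."""
    area = c.area_const + c.area_on_lag_price * lag_price  # planted-area equation
    output = area * c.yield_per_area  # auxiliary output identity (assumed form)
    shipments = [output * share for share in monthly_shares]  # shipment distribution
    prices = [c.price_const + c.price_on_shipment * q + c.price_on_cpi * cpi
              for q in shipments]  # monthly price equation with exogenous CPI
    return area, output, shipments, prices

coefs = RadishCoefficients(100.0, 0.5, 2.0, 50.0, -0.25, 0.5)
print(forecast(lag_price=20.0, monthly_shares=[0.5, 0.5], cpi=100.0, c=coefs))
# (110.0, 220.0, [110.0, 110.0], [72.5, 72.5])
```

In the actual model each equation would be estimated by multiple regression; chaining them this way is what lets a forecast of planted area propagate through to monthly prices.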

Automatic Classification of Blog Posts using Various Term Weighting (Automatic Classification of Blog Posts Using Various Term Weights)

Kim, Su-Ah;Jho, Hee-Sun;Lee, Hyun Ah
- Journal of Advanced Marine Engineering and Technology / v.39 no.1 / pp.58-62 / 2015
Most blog sites provide predefined classes based on content or topics, but few bloggers assign classes to their posts because doing so manually is cumbersome. This paper proposes an automatic blog post classification method that combines term frequency, document frequency, and class frequency for each class in various ways to find an appropriate weighting scheme. In experiments, the combination of term frequency, category term frequency, and inverse document frequency computed over documents outside the category achieved a classification precision of 77.02%.
https://doi.org/10.5916/jkosme.2015.39.1.58
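The best-performing combination can be sketched as a per-term weight tf × ctf × idf_other. This sketch assumes that ctf is the term's total count across the category's training posts and that idf_other is an add-one-smoothed inverse document frequency computed only over posts outside the category; the paper's exact definitions and smoothing may differ, and `blog_term_weight` is a hypothetical name.

```python
import math

def blog_term_weight(term, doc_tokens, category, corpus):
    """tf * ctf * idf_other for one term of one post with respect to one category.

    corpus: list of (category, tokens) pairs for the labeled training posts.
    """
    tf = doc_tokens.count(term)
    ctf = sum(tokens.count(term) for cat, tokens in corpus if cat == category)
    others = [tokens for cat, tokens in corpus if cat != category]
    df_other = sum(1 for tokens in others if term in tokens)
    idf_other = math.log((len(others) + 1) / (df_other + 1))  # add-one smoothing (assumed)
    return tf * ctf * idf_other

corpus = [("sports", ["goal", "team"]), ("sports", ["goal"]),
          ("tech", ["code"]), ("tech", ["code", "team"])]
print(blog_term_weight("goal", ["goal", "goal"], "sports", corpus))  # 4 * ln(3)
```

A term that is frequent inside the category but rare in other categories gets a high weight, while a term found in every post outside the category gets an idf_other of zero.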

Document classification using a deep neural network in text mining (Document Classification Using a Deep Neural Network in Text Mining)

Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
- The Korean Journal of Applied Statistics / v.33 no.5 / pp.615-625 / 2020
In text mining, a document-term frequency matrix records the frequencies of terms extracted from documents whose group (class) information is known. In this study, we generated a document-term frequency matrix for classifying documents by research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to this matrix, and we also applied term frequency-inverse gravity moment (TF-IGM). To improve classification accuracy, we additionally generated a document-keyword weighted matrix from extracted keywords and classified documents with a deep neural network based on this keyword matrix. To find the optimal deep neural network model, classification accuracy was evaluated while varying the number of hidden layers and hidden nodes. The model with eight hidden layers showed the highest accuracy, and TF-IGM yielded higher classification accuracy than TF-IDF under every parameter setting tested. The deep neural network was also more accurate than a support vector machine. We therefore propose applying TF-IGM with a deep neural network for document classification.
https://doi.org/10.5351/KJAS.2020.33.5.615
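TF-IGM replaces IDF with the inverse gravity moment, which rewards terms concentrated in few classes. A minimal sketch of the commonly cited definition: sort the term's class-specific document frequencies in descending order f_1 >= f_2 >= ... >= f_m, compute igm = f_1 / sum_r (f_r × r), and weight the term as tf × (1 + lambda × igm). The default lambda = 7.0 is an assumption (it is usually treated as a tunable coefficient), and the function names are hypothetical.

```python
def igm(class_doc_freqs):
    """Inverse gravity moment: f_1 / sum_r (f_r * r), frequencies sorted in descending order."""
    f = sorted(class_doc_freqs, reverse=True)
    denom = sum(fr * r for r, fr in enumerate(f, start=1))
    return f[0] / denom if denom else 0.0

def tf_igm(tf, class_doc_freqs, lam=7.0):
    """TF-IGM weight: tf * (1 + lam * igm); lam = 7.0 is an assumed default."""
    return tf * (1.0 + lam * igm(class_doc_freqs))

print(tf_igm(2, [10, 0, 0]))  # term found in one class only: 2 * (1 + 7 * 1.0) = 16.0
print(tf_igm(2, [5, 5]))      # term spread evenly over two classes: igm = 1/3, lower weight
```

Unlike IDF, which ignores class labels, IGM uses the class distribution of a term, which is one plausible reason a supervised scheme such as TF-IGM can outperform TF-IDF when class information is available.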
