• Title/Summary/Keyword: Term Classification

Search Result 753, Processing Time 0.023 seconds

Time-Frequency Analysis of Electrohysterogram for Classification of Term and Preterm Birth

  • Ryu, Jiwoo;Park, Cheolsoo
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.4 no.2
    • /
    • pp.103-109
    • /
    • 2015
  • In this paper, a novel method for the classification of term and preterm birth is proposed based on time-frequency analysis of electrohysterogram (EHG) using multivariate empirical mode decomposition (MEMD). EHG is a promising study for preterm birth prediction, because it is low-cost and accurate compared to other preterm birth prediction methods, such as tocodynamometry (TOCO). Previous studies on preterm birth prediction applied prefilterings based on Fourier analysis of an EHG, followed by feature extraction and classification, even though Fourier analysis is suboptimal to biomedical signals, such as EHG, because of its nonlinearity and nonstationarity. Therefore, the proposed method applies prefiltering based on MEMD instead of Fourier-based prefilters before extracting the sample entropy feature and classifying the term and preterm birth groups. For the evaluation, the Physionet term-preterm EHG database was used where the proposed method and Fourier prefiltering-based method were adopted for comparative study. The result showed that the area under curve (AUC) of the receiver operating characteristic (ROC) was increased by 0.0351 when MEMD was used instead of the Fourier-based prefilter.

A Genre-based Classification of Digital Documents by using Deviation Statistic of Genre-revealing Term and Subject-revealing Term (장르와 주제 범주간 용어 편차정보를 이용한 디지털 문서의 장르기반 분류)

  • 이용배;맹성현
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.11
    • /
    • pp.1062-1071
    • /
    • 2003
  • A genre-based classification means classifying documents by the purpose for which they were written, not by the semantics or subject areas. Most genre classifying methods in the past were based on the existing documents categorization algorithms and ineffective for feature selections, resulting in low quality classification results. In this research, we propose a new method for automatic classification of digital documents by genre. The genre classifier we developed uses the deviation statistic between the genre-revealing term frequencies and between the subject-revealing term frequencies within a genre. We collected Web documents to evaluate the proposed genre classification method. The experimental results show that the proposed method outperforms a direct application of a kai-square feature selection and bayesian classifier often used for subject classification by proving an excellent accuracy of about 30 percent.

Comparison of term weighting schemes for document classification (문서 분류를 위한 용어 가중치 기법 비교)

  • Jeong, Ho Young;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.2
    • /
    • pp.265-276
    • /
    • 2019
  • The document-term frequency matrix is a general data of objects in text mining. In this study, we introduce a traditional term weighting scheme TF-IDF (term frequency-inverse document frequency) which is applied in the document-term frequency matrix and used for text classifications. In addition, we introduce and compare TF-IDF-ICSDF and TF-IGM schemes which are well known recently. This study also provides a method to extract keyword enhancing the quality of text classifications. Based on the keywords extracted, we applied support vector machine for the text classification. In this study, to compare the performance term weighting schemes, we used some performance metrics such as precision, recall, and F1-score. Therefore, we know that TF-IGM scheme provided high performance metrics and was optimal for text classification.

Two-stage Deep Learning Model with LSTM-based Autoencoder and CNN for Crop Classification Using Multi-temporal Remote Sensing Images

  • Kwak, Geun-Ho;Park, No-Wook
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.4
    • /
    • pp.719-731
    • /
    • 2021
  • This study proposes a two-stage hybrid classification model for crop classification using multi-temporal remote sensing images; the model combines feature embedding by using an autoencoder (AE) with a convolutional neural network (CNN) classifier to fully utilize features including informative temporal and spatial signatures. Long short-term memory (LSTM)-based AE (LAE) is fine-tuned using class label information to extract latent features that contain less noise and useful temporal signatures. The CNN classifier is then applied to effectively account for the spatial characteristics of the extracted latent features. A crop classification experiment with multi-temporal unmanned aerial vehicle images is conducted to illustrate the potential application of the proposed hybrid model. The classification performance of the proposed model is compared with various combinations of conventional deep learning models (CNN, LSTM, and convolutional LSTM) and different inputs (original multi-temporal images and features from stacked AE). From the crop classification experiment, the best classification accuracy was achieved by the proposed model that utilized the latent features by fine-tuned LAE as input for the CNN classifier. The latent features that contain useful temporal signatures and are less noisy could increase the class separability between crops with similar spectral signatures, thereby leading to superior classification accuracy. The experimental results demonstrate the importance of effective feature extraction and the potential of the proposed classification model for crop classification using multi-temporal remote sensing images.

Vocabulary Expansion Technique for Advertisement Classification

  • Jung, Jin-Yong;Lee, Jung-Hyun;Ha, Jong-Woo;Lee, Sang-Keun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.6 no.5
    • /
    • pp.1373-1387
    • /
    • 2012
  • Contextual advertising is an important revenue source for major service providers on the Web. Ads classification is one of main tasks in contextual advertising, and it is used to retrieve semantically relevant ads with respect to the content of web pages. However, it is difficult for traditional text classification methods to achieve satisfactory performance in ads classification due to scarce term features in ads. In this paper, we propose a novel ads classification method that handles the lack of term features for classifying ads with short text. The proposed method utilizes a vocabulary expansion technique using semantic associations among terms learned from large-scale search query logs. The evaluation results show that our methodology achieves 4.0% ~ 9.7% improvements in terms of the hierarchical f-measure over the baseline classifiers without vocabulary expansion.

An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification

  • Mikawa, Kenta;Ishida, Takashi;Goto, Masayuki
    • Industrial Engineering and Management Systems
    • /
    • v.11 no.1
    • /
    • pp.87-93
    • /
    • 2012
  • This paper discusses a new weighting method for text analyzing from the view point of supervised learning. The term frequency and inverse term frequency measure (tf-idf measure) is famous weighting method for information retrieval, and this method can be used for text analyzing either. However, it is an experimental weighting method for information retrieval whose effectiveness is not clarified from the theoretical viewpoints. Therefore, other effective weighting measure may be obtained for document classification problems. In this study, we propose the optimal weighting method for document classification problems from the view point of supervised learning. The proposed measure is more suitable for the text classification problem as used training data than the tf-idf measure. The effectiveness of our proposal is clarified by simulation experiments for the text classification problems of newspaper article and the customer review which is posted on the web site.

Potential of Bidirectional Long Short-Term Memory Networks for Crop Classification with Multitemporal Remote Sensing Images

  • Kwak, Geun-Ho;Park, Chan-Won;Ahn, Ho-Yong;Na, Sang-Il;Lee, Kyung-Do;Park, No-Wook
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.4
    • /
    • pp.515-525
    • /
    • 2020
  • This study investigates the potential of bidirectional long short-term memory (Bi-LSTM) for efficient modeling of temporal information in crop classification using multitemporal remote sensing images. Unlike unidirectional LSTM models that consider only either forward or backward states, Bi-LSTM could account for temporal dependency of time-series images in both forward and backward directions. This property of Bi-LSTM can be effectively applied to crop classification when it is difficult to obtain full time-series images covering the entire growth cycle of crops. The classification performance of the Bi-LSTM is compared with that of two unidirectional LSTM architectures (forward and backward) with respect to different input image combinations via a case study of crop classification in Anbadegi, Korea. When full time-series images were used as inputs for classification, the Bi-LSTM outperformed the other unidirectional LSTM architectures; however, the difference in classification accuracy from unidirectional LSTM was not substantial. On the contrary, when using multitemporal images that did not include useful information for the discrimination of crops, the Bi-LSTM could compensate for the information deficiency by including temporal information from both forward and backward states, thereby achieving the best classification accuracy, compared with the unidirectional LSTM. These case study results indicate the efficiency of the Bi-LSTM for crop classification, particularly when limited input images are available.

Development of Patient Classification System in Long-term Care Hospitals (요양병원 환자분류체계 개발)

  • Lee, Ji-Yun;Yoon, Ju-Young;Kim, Jung-Hoe;Song, Seong-Hee;Joo, Ji-Soo;Kim, Eun-Kyung
    • Journal of Korean Academy of Nursing Administration
    • /
    • v.14 no.3
    • /
    • pp.229-240
    • /
    • 2008
  • Purpose: To develop the patient classification system based on the resource utilization for reimbursement of long-term care hospitals in Korea. Method: Health Insurance Review & Assessment Service (HIRA) conducted a survey in July 2006 that included 2,899 patients from 35 long-term care hospitals. To calculate resource utilization, we measured care time of direct care staff (physicians, nursing personnel, physical and occupational therapists, social workers). The survey of patient characteristics included ADL, cognitive and behavioral status, diseases and treatments. Major category criteria was developed by modified delphi method from 9 experts. Each category was divided into 2-3 groups by ADL using tree regression. Relative resource use was expressed as a case mix index (CMI) calculated as a proportion of mean resource use. Result: This patient classification system composed of 6 major categories (ultra high medical care, high medical care, medium medical care, behavioral problem, impaired cognition and reduced physical function) and 11 subgroups by ADL score. The differences of CMI between groups were statistically significant (p<.0001). Homogeneity of groups was examined by total coefficient of variation (CV) of CMI. The range of CV was 29.68-40.77%. Conclusions: This patient classification system is feasible for reimbursement of long-term care hospitals.

  • PDF

Network Traffic Classification Based on Deep Learning

  • Li, Junwei;Pan, Zhisong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.11
    • /
    • pp.4246-4267
    • /
    • 2020
  • As the network goes deep into all aspects of people's lives, the number and the complexity of network traffic is increasing, and traffic classification becomes more and more important. How to classify them effectively is an important prerequisite for network management and planning, and ensuring network security. With the continuous development of deep learning, more and more traffic classification begins to use it as the main method, which achieves better results than traditional classification methods. In this paper, we provide a comprehensive review of network traffic classification based on deep learning. Firstly, we introduce the research background and progress of network traffic classification. Then, we summarize and compare traffic classification based on deep learning such as stack autoencoder, one-dimensional convolution neural network, two-dimensional convolution neural network, three-dimensional convolution neural network, long short-term memory network and Deep Belief Networks. In addition, we compare traffic classification based on deep learning with other methods such as based on port number, deep packets detection and machine learning. Finally, the future research directions of network traffic classification based on deep learning are prospected.

A Study on Negation Handling and Term Weighting Schemes and Their Effects on Mood-based Text Classification (감정 기반 블로그 문서 분류를 위한 부정어 처리 및 단어 가중치 적용 기법의 효과에 대한 연구)

  • Jung, Yu-Chul;Choi, Yoon-Jung;Myaeng, Sung-Hyon
    • Korean Journal of Cognitive Science
    • /
    • v.19 no.4
    • /
    • pp.477-497
    • /
    • 2008
  • Mood classification of blog text is an interesting problem, with a potential for a variety of services involving the Web. This paper introduces an approach to mood classification enhancements through the normalized negation n-grams which contain mood clues and corpus-specific term weighting(CSTW). We've done experiments on blog texts with two different classification methods: Enhanced Mood Flow Analysis(EMFA) and Support Vector Machine based Mood Classification(SVMMC). It proves that the normalized negation n-gram method is quite effective in dealing with negations and gave gradual improvements in mood classification with EMF A. From the selection of CSTW, we noticed that the appropriate weighting scheme is important for supporting adequate levels of mood classification performance because it outperforms the result of TF*IDF and TF.

  • PDF