• Title/Summary/Keyword: Domain classification

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice / v.1 no.1 / pp.10-26 / 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built upon one data domain generally do not perform well in another data domain. This is especially a challenge for tasks such as opinion classification, which often has to deal with insufficient quantities of labeled data. This study investigates the feasibility of self-training in dealing with the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. Findings of this study suggest that, when there are limited labeled data, self-training is a promising approach for opinion classification, although the contributions vary across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.
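
The self-training idea summarized above (train on labeled non-target data, label confident unlabeled target examples, retrain) can be sketched as follows. This is only an illustrative sketch, not the paper's implementation: the nearest-centroid classifier, the margin-based confidence score, the `threshold` value, and the toy 2-D data are all assumptions made for the example.

```python
import math


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(cents, p):
    """Return (label, confidence); confidence is the distance margin
    between the nearest and second-nearest class centroid."""
    ranked = sorted((math.dist(p, c), y) for y, c in cents.items())
    best_d, best_y = ranked[0]
    margin = ranked[1][0] - best_d if len(ranked) > 1 else float("inf")
    return best_y, margin


def self_train(src_x, src_y, tgt_x, threshold=1.0, max_rounds=10):
    """Start from labeled source data, then repeatedly add target points
    that the current model labels with confidence >= threshold."""
    train_x, train_y = list(src_x), list(src_y)
    pool = list(tgt_x)
    for _ in range(max_rounds):
        cents = centroids(train_x, train_y)
        confident, rest = [], []
        for p in pool:
            y, conf = predict(cents, p)
            (confident if conf >= threshold else rest).append((p, y))
        if not confident:
            break  # nothing new to learn from; stop early
        for p, y in confident:
            train_x.append(p)
            train_y.append(y)
        pool = [p for p, _ in rest]
    return centroids(train_x, train_y)


# Hypothetical toy data: labeled source domain, unlabeled (shifted) target domain.
source_x = [(2, 2), (3, 3), (-2, -2), (-3, -3)]
source_y = ["pos", "pos", "neg", "neg"]
target_x = [(4, 5), (5, 4), (-1, -4), (-4, -1)]

model = self_train(source_x, source_y, target_x)
print(predict(model, (4, 4))[0])
```

A higher `threshold` admits fewer pseudo-labeled target examples per round, which trades slower adaptation for a lower risk of reinforcing early mistakes.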

Domain Adaptation Image Classification Based on Multi-sparse Representation

  • Zhang, Xu;Wang, Xiaofeng;Du, Yue;Qin, Xiaoyan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.5 / pp.2590-2606 / 2017
  • Generally, research on classical image classification algorithms assumes that training data and testing data are derived from the same domain with the same distribution. Unfortunately, in practical applications, this assumption is rarely met. To address this problem, this paper proposes a domain adaptation image classification approach based on multi-sparse representation. Intermediate domains are hypothesized to exist between the source and target domains, and each intermediate subspace is modeled through online dictionary learning with target data updating. On the one hand, the reconstruction error of the target data is guaranteed; on the other, the transition from the source domain to the target domain is as smooth as possible. An augmented feature representation produced by invariant sparse codes across the source, intermediate, and target domain dictionaries is employed for cross-domain recognition. Experimental results verify the effectiveness of the proposed algorithm.

A Modified Domain Deformation Theory for Signal Classification (함수의 정의역 변형에 의한 신호간의 거리 측정 방법)

  • Kim, Sung-Soo
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.3 / pp.342-349 / 1999
  • The metric defined on the domain deformation space better measures the similarity between bounded and continuous signals for the purpose of classification via the metric distances between signals. In this paper, a modified domain deformation theory is introduced for one-dimensional signal classification. A new metric defined on a modified domain deformation for measuring the distance between signals is employed. By introducing a newly defined metric space via the newly defined Integra-Normalizer, the assumption that domain deformation is applicable only to continuous signals is removed, so that any kind of integrable signal can be classified. Like the previously introduced domain deformation, the metric on the modified domain deformation has an advantage over the $L^2$ metric.

Optimization of Domain-Independent Classification Framework for Mood Classification

  • Choi, Sung-Pil;Jung, Yu-Chul;Myaeng, Sung-Hyon
    • Journal of Information Processing Systems / v.3 no.2 / pp.73-81 / 2007
  • In this paper, we introduce a domain-independent classification framework based on both k-nearest neighbor and Naive Bayesian classification algorithms. The architecture of our system is simple and modularized in that each sub-module of the system can be changed or improved efficiently. Moreover, it provides various feature selection mechanisms that can be applied to optimize the general-purpose classifiers for a specific domain. To enhance classification performance, our system provides a conditional probability boosting (CPB) mechanism that can be used in various domains. In the mood classification domain, our optimized framework using the CPB algorithm showed a 1% improvement in precision and a 2% improvement in recall compared with the baseline.

Comparison of Deep Learning-based Unsupervised Domain Adaptation Models for Crop Classification (작물 분류를 위한 딥러닝 기반 비지도 도메인 적응 모델 비교)

  • Kwak, Geun-Ho;Park, No-Wook
    • Korean Journal of Remote Sensing / v.38 no.2 / pp.199-213 / 2022
  • Unsupervised domain adaptation can remove the impractical need to collect high-quality training data every year for annual crop classification. This study evaluates the applicability of deep learning-based unsupervised domain adaptation models for crop classification. Three unsupervised domain adaptation models, a deep adaptation network (DAN), a deep reconstruction-classification network, and a domain adversarial neural network (DANN), are quantitatively compared via a crop classification experiment using unmanned aerial vehicle images in Hapcheon-gun and Changnyeong-gun, the major garlic and onion cultivation areas in Korea. As source baseline and target baseline models, convolutional neural networks (CNNs) are additionally applied to evaluate the classification performance of the unsupervised domain adaptation models. The three unsupervised domain adaptation models outperformed the source baseline CNN, but their classification performance differed depending on the degree of inconsistency between data distributions in source and target images. The classification accuracy of DAN was higher than that of the other two models when the inconsistency between source and target images was low, whereas DANN had the best classification performance when the inconsistency between source and target images was high. Therefore, the extent to which data distributions of the source and target images match should be considered when selecting the unsupervised domain adaptation model to generate reliable classification results.

Comparison of wavelet-based decomposition and empirical mode decomposition of electrohysterogram signals for preterm birth classification

  • Janjarasjitt, Suparerk
    • ETRI Journal / v.44 no.5 / pp.826-836 / 2022
  • Signal decomposition is a computational technique that dissects a signal into its constituent components, providing supplementary information. In this study, the capability of two common signal decomposition techniques, wavelet-based decomposition and empirical mode decomposition, for preterm birth classification was investigated. Ten time-domain features were extracted from the constituent components of electrohysterogram (EHG) signals, including EHG subbands and EHG intrinsic mode functions, and employed for preterm birth classification. Preterm birth classification and anticipation are crucial tasks that can help reduce preterm birth complications. The computational results show that the preterm birth classification obtained using wavelet-based decomposition is superior. This implies that EHG subbands decomposed through wavelet-based decomposition provide more applicable information for preterm birth classification. Furthermore, an accuracy of 0.9776 and a specificity of 0.9978, the best performance on preterm birth classification among state-of-the-art signal processing techniques, were obtained using the time-domain features of EHG subbands.

A Domain Action Classification Model Using Conditional Random Fields (Conditional Random Fields를 이용한 영역 행위 분류 모델)

  • Kim, Hark-Soo
    • Korean Journal of Cognitive Science / v.18 no.1 / pp.1-14 / 2007
  • In a goal-oriented dialogue, speakers' intentions can be represented by domain actions that consist of pairs of a speech act and a concept sequence. Therefore, to implement an intelligent dialogue system, it is very important to correctly infer the domain actions from surface utterances. In this paper, we propose a statistical model that determines speech acts and concept sequences simultaneously using conditional random fields. To avoid biased learning problems, the proposed model uses low-level linguistic features such as lexical items and parts of speech. It then filters out uninformative features using the chi-square statistic. In experiments in a schedule arrangement domain, the proposed system performed well (a precision of 93.0% on speech act classification and 90.2% on concept sequence classification).
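
The chi-square filtering step mentioned above can be illustrated with a small sketch. It assumes a binary feature and a binary class summarized in a 2x2 contingency table; the feature names, the counts, and the cutoff of 3.84 (roughly the 0.05 critical value for one degree of freedom) are hypothetical choices for the example, not values from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 contingency table.

    a: feature present, utterance in class
    b: feature present, utterance not in class
    c: feature absent,  utterance in class
    d: feature absent,  utterance not in class
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def select_features(tables, min_score=3.84):
    """Keep features whose score reaches min_score (an assumed cutoff)."""
    return [name for name, t in tables.items() if chi_square(*t) >= min_score]


# Hypothetical counts for two lexical features against one speech-act class.
tables = {
    "please": (30, 10, 10, 50),  # strongly associated with the class
    "the": (20, 30, 20, 30),     # occurs at the same rate in and out of the class
}

selected = select_features(tables)
print(selected)
```

A feature that occurs at the same rate inside and outside the class scores zero and is filtered out, which is the sense in which it is uninformative.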

Guiding Practical Text Classification Framework to Optimal State in Multiple Domains

  • Choi, Sung-Pil;Myaeng, Sung-Hyon;Cho, Hyun-Yang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.3 no.3 / pp.285-307 / 2009
  • This paper introduces DICE, a Domain-Independent text Classification Engine. DICE is robust, efficient, and domain-independent in terms of software and architecture. Each module of the system is clearly modularized and encapsulated for extensibility. The clear modular architecture allows for simple and continuous verification and facilitates changes in multiple cycles, even after its major development period is complete. Those who want to make use of DICE can easily implement their ideas on this test bed and optimize it for a particular domain by simply adjusting the configuration file. Unlike other publicly available toolkits or development environments targeted at general-purpose classification models, DICE specializes in text classification with a number of useful functions specific to it. This paper focuses on ways to locate the optimal states of a practical text classification framework by using various adaptation methods provided by the system, such as feature selection, lemmatization, and classification models.

A Composite Cluster Analysis Approach for Component Classification (컴포넌트 분류를 위한 복합 클러스터 분석 방법)

  • Lee, Sung-Koo
    • The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.89-96 / 2007
  • Various classification methods have been developed to reuse components. These classification methods enable the user to access the needed components quickly and easily. Conventional classification approaches have the following problems: they require a labor-intensive domain analysis effort to build a classification structure, they have trouble representing inter-component relationships, they are difficult to maintain as the domain evolves, and they apply only to a limited domain. To solve these problems, this paper describes a composite cluster analysis approach for component classification. The approach combines a hierarchical cluster analysis method, which generates a stable clustering structure automatically, with a non-hierarchical cluster analysis concept, which classifies new components automatically. The clustering information generated by the proposed approach can support the domain analysis process.

Feature Selection with PCA based on DNS Query for Malicious Domain Classification (비정상도메인 분류를 위한 DNS 쿼리 기반의 주성분 분석을 이용한 성분추출)

  • Lim, Sun-Hee;Cho, Jaeik;Kim, Jong-Hyun;Lee, Byung Gil
    • KIPS Transactions on Computer and Communication Systems / v.1 no.1 / pp.55-60 / 2012
  • Recent botnets widely use DNS services when connecting to the C&C server in order to evade detection. DNS analysis is therefore needed to counteract anomalous techniques that exploit the DNS. This paper studies the collection of DNS traffic as experimental data and supervised learning for DNS traffic-based malicious domain classification, such as queries from zombies for domain names corresponding to a C&C server. In particular, this paper aims to determine the significant features of a DNS-based classification system for malicious domain extraction using Principal Component Analysis (PCA).
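
As a rough illustration of how PCA can point to significant features, the sketch below computes the first principal component of a small feature matrix with power iteration and reads off the per-feature loadings. It is not the paper's method or data: the two feature columns (imagined here as per-domain query count and distinct client count), the sample values, and the use of power iteration are all assumptions made for the example.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iters=200):
    """Dominant eigenvector of `cov` by power iteration.

    Assumes the starting vector is not orthogonal to the dominant
    eigenvector; a production implementation would use a linear
    algebra library instead.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero covariance: no direction of variance to find
        v = [x / norm for x in w]
    return v


# Hypothetical per-domain features: (query count, distinct client count).
samples = [(1, 2), (2, 4), (3, 6), (4, 8)]

pc1 = first_component(covariance_matrix(samples))
print(pc1)
```

Features with larger absolute loadings in the leading components account for more of the variance, which is one common way to rank candidate features before training a classifier. In practice the features would usually be standardized first so that scale alone does not dominate the loadings.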