• Title/Summary/Keyword: multi-category classification (다범주 분류)

Search Result 34, Processing Time 0.023 seconds

A Naive Bayes Classifier for Category Disambiguation of Features (자질의 범주 모호성 해소를 위한 Naive Bayes 분류기 설계)

  • 유현숙;정영미
    • Proceedings of the Korean Information Science Society Conference / 2001.04b / pp.364-366 / 2001
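  • Document categorization is a highly useful information-processing tool in electronic information environments, and research on categorization techniques and on improving their performance continues steadily. Most studies, however, have concentrated only on reducing the dimensionality of the word feature space used for categorization, and have not considered the category ambiguity of multi-category word features, which strongly affects the learning stage. This study proposes a category disambiguation weight W, which improves document categorization performance by resolving the category ambiguity of multi-category features, and verifies it experimentally. For the experiments, a Naive Bayes classifier and a Naive Bayes-W classifier that applies the weight W were built and used to compare categorization performance. The results show that the weight W compensates for the category ambiguity of feature representation, a weakness of current classifiers, and improves classifier performance, so it can be used to raise the retrieval efficiency of information retrieval systems.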

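The baseline classifier named in the abstract above can be sketched as a multinomial Naive Bayes with add-one smoothing. This is only a minimal illustration on made-up toy data: the abstract does not define the weight W, so it is not reproduced here, and the function names `train_naive_bayes` and `classify` are hypothetical. The word "bank" plays the role of a multi-category feature because it occurs in both categories.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs):
    """docs: list of (tokens, category). Returns the counts the classifier needs."""
    class_docs = Counter()              # number of documents per category
    term_counts = defaultdict(Counter)  # term frequencies per category
    vocab = set()
    for tokens, cat in docs:
        class_docs[cat] += 1
        term_counts[cat].update(tokens)
        vocab.update(tokens)
    return class_docs, term_counts, vocab

def classify(tokens, model):
    """Return the category with the highest log posterior (add-one smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_cat, best_score = None, -math.inf
    for cat in class_docs:
        score = math.log(class_docs[cat] / total_docs)       # log prior
        denom = sum(term_counts[cat].values()) + len(vocab)  # smoothed denominator
        for t in tokens:
            if t in vocab:  # terms unseen in training are ignored
                score += math.log((term_counts[cat][t] + 1) / denom)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

# Toy training set (assumption, not from the paper).
toy_docs = [
    (["stock", "market", "bank"], "finance"),
    (["bank", "loan", "rate"], "finance"),
    (["river", "bank", "water"], "nature"),
    (["forest", "river", "tree"], "nature"),
]
model = train_naive_bayes(toy_docs)
```

A feature weight such as the paper's W would presumably enter where the per-term log likelihood is added, but its exact form would have to come from the paper itself.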

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique used to identify subjective information or opinions of individuals from documents such as blogs, reviews, articles, or social networks. In the literature, only the problem of binary classification of ratings based on online review texts has been considered. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers, such as support vector machines and the proportional odds model, to compare their predictive performance.
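The chi-square word extraction step mentioned in the abstract above can be illustrated with a small sketch. It scores each word by the Pearson chi-square statistic of the (word present/absent) by (rating class) contingency table and keeps the top-scoring words. The toy reviews, the three class labels, and the function names `chi_square` and `select_words` are assumptions for illustration, not the authors' data or code.

```python
from collections import Counter

def chi_square(term, docs):
    """Pearson chi-square of term presence vs. rating class.

    docs: list of (set_of_words, rating_class).
    """
    n = len(docs)
    class_totals = Counter(label for _, label in docs)
    present = Counter(label for words, label in docs if term in words)
    n_present = sum(present.values())
    stat = 0.0
    for label, n_label in class_totals.items():
        # Two cells per class: documents containing the term, and those without it.
        for observed, row_total in ((present[label], n_present),
                                    (n_label - present[label], n - n_present)):
            expected = row_total * n_label / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def select_words(docs, k):
    """Return the k words with the largest chi-square statistic."""
    vocab = sorted(set().union(*(words for words, _ in docs)))
    return sorted(vocab, key=lambda w: -chi_square(w, docs))[:k]

# Toy reviews (assumption): two per rating class.
reviews = [
    ({"great", "love", "the"}, "positive"),
    ({"great", "the", "fast"}, "positive"),
    ({"okay", "the", "average"}, "neutral"),
    ({"the", "okay", "fine"}, "neutral"),
    ({"bad", "the", "broken"}, "negative"),
    ({"bad", "slow", "the"}, "negative"),
]
top_words = select_words(reviews, 3)
```

A word such as "the", which occurs in every review, scores zero and is dropped, while words concentrated in one rating class rise to the top.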

A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization (mRMR과 수정된 입자군집화 방법을 이용한 다범주 분류를 위한 최적유전자집단 구성)

  • Lee, Sunho
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.683-696 / 2020
  • The aim of this research is to find an optimal gene set that provides highly accurate multi-class classification with a minimum number of genes. A two-stage procedure is proposed. In the first stage, within the minimum redundancy and maximum relevance (mRMR) framework, several statistics for ranking differentially expressed genes and K-means clustering for reducing redundancy between genes are used to filter the data. In the second stage, a modified particle swarm optimization selects a small subset of informative genes. Two well-known multi-class microarray data sets, ALL and SRBCT, are analyzed to demonstrate the effectiveness of this hybrid method.

A comparative study of feature screening methods for ultrahigh dimensional multiclass classification (초고차원 다범주분류를 위한 변수선별 방법 비교 연구)

  • Lee, Kyungeun;Kim, Kyoung Hee;Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.793-808 / 2017
  • We compare various variable screening methods for multiclass classification problems with ultrahigh-dimensional data. Two different approaches are considered: (1) pairwise extension from binary classification via one-versus-one or one-versus-rest comparisons and (2) direct classification of multiclass responses. We conducted extensive simulation studies under different conditions: heavy-tailed explanatory variables, correlated signal and noise variables, correlated joint distributions but uncorrelated marginals, and unbalanced response variables. We then analyzed real data to examine the performance of the methods. The results show that model-free methods perform better for multiclass classification problems as well as binary ones.
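Approach (1) in the abstract above, extending a binary screening score to multiple classes through one-versus-rest or one-versus-one comparisons, can be sketched as follows. The abstract does not say which screening statistics were compared, so `binary_utility` below (an absolute mean difference divided by the standard deviation of the combined sample) is a hypothetical stand-in, as are the toy data and all function names.

```python
from itertools import combinations
from statistics import mean, pstdev

def binary_utility(x_a, x_b):
    """Hypothetical binary screening score: |mean difference| / sd of combined sample."""
    spread = pstdev(x_a + x_b)
    return abs(mean(x_a) - mean(x_b)) / spread if spread > 0 else 0.0

def one_vs_rest_score(x, y):
    """Largest binary score over the splits 'class c' vs. 'all other classes'."""
    scores = []
    for c in sorted(set(y)):
        inside = [v for v, label in zip(x, y) if label == c]
        outside = [v for v, label in zip(x, y) if label != c]
        scores.append(binary_utility(inside, outside))
    return max(scores)

def one_vs_one_score(x, y):
    """Largest binary score over all pairs of classes."""
    scores = []
    for a, b in combinations(sorted(set(y)), 2):
        x_a = [v for v, label in zip(x, y) if label == a]
        x_b = [v for v, label in zip(x, y) if label == b]
        scores.append(binary_utility(x_a, x_b))
    return max(scores)

def screen(features, y, k, score=one_vs_rest_score):
    """Keep the k feature names with the highest marginal score."""
    ranked = sorted(features, key=lambda name: score(features[name], y), reverse=True)
    return ranked[:k]

# Toy data (assumption): three classes, one informative and one uninformative feature.
y = [0, 0, 1, 1, 2, 2]
features = {
    "signal": [0.1, 0.2, 1.1, 1.0, 2.1, 2.0],
    "noise": [0.5, 0.4, 0.4, 0.5, 0.5, 0.4],
}
kept = screen(features, y, 1)
```

Taking the maximum over the binary splits is one possible way to aggregate; summing or averaging the pairwise scores would be an equally plausible choice.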

Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation (기계번역을 이용한 교차언어 문서 범주화의 분류 성능 분석)

  • Lee, Yong-Gu
    • Journal of the Korean Society for Library and Information Science / v.43 no.1 / pp.313-332 / 2009
  • Cross-language text categorization (CLTC) can classify documents automatically using a training set in another language. In this study, collections appropriate for CLTC were extracted from KTSET. The classification performance of various CLTC methods that use machine translation was compared with an SVM classifier. The results showed that classification performance ranked in the order of the poly-lingual training method, training-set translation, and test-set translation. However, training-set translation could be regarded as the most useful CLTC method, because it was efficient for machine translation and easily adapted to a general environment. On the other hand, low performance was shown to result from feature reduction or from features with no subject characteristics, both of which arose during machine translation in CLTC.

Theoretical Categorization of the Meanings of Interaction in Interactive Media (인터랙티브 미디어에 적용되는 인터랙션 의미의 범주화)

  • Rhee, Hyun-jung
    • Proceedings of the Korea Contents Association Conference / 2015.05a / pp.85-86 / 2015
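  • The term "interaction", which has emerged as a major topic in the cultural content market, is interpreted differently in each subfield of media because of the breadth of the word's meaning. This makes convergence across industries and multidisciplinary research difficult. To lay a foundation for better development of interactive technologies and scholarship, this study attempts to categorize the concept of interaction with a focus on content media. Drawing on the various perspectives from which interactive media interpret the meaning of interaction, the study first built an overall classification scheme, then conducted focus group interviews (FGI) with industry and academic experts on interaction to revise and supplement it, and thereby completed a categorization of meanings according to the classification scheme.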


Comparison of Multinomial Logit and Logistic Regression on Disability Pensioners' Characteristic (다범주 자료의 다항로짓 모형과 로지스틱 회귀모형 비교;장애연금 특성분석 중심으로)

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.589-602 / 2008
  • This article studies the characteristics of disability pensioners using multinomial logit and logistic regression models. Seven factors are examined to determine whether each is reflected in the degree of disability in the disability pension. Using the multinomial logit and logistic regression models, the effect and characteristics of the seven factors on the degree of disability are investigated. The results show that all seven factors are significant for the degree of disability. Five of them (age, sex, type of coverage, type of category, and insured duration) show a trend in the degree of disability, while the other two (cause of disability and class of standard monthly income) do not. The results of the analyses may be useful for disability pension management.

Exploratory Study on Experience in Cultural Competence of Multicultural Counselors Working with Female Immigrants by Marriage (결혼이주여성 대상 다문화 상담자들의 문화적 역량 관련 경험에 관한 탐색적 연구)

  • Lee, Hyun Jung
    • Journal of Digital Convergence / v.12 no.2 / pp.519-530 / 2014
  • The purpose of this study was to explore the experiences related to cultural competence of multicultural counselors working with female marriage immigrants. In-depth interviews with 10 multicultural counselors were conducted and analyzed phenomenologically. Six themes emerged from the data analysis: facing difficulties due to linguistic and cultural differences, doubting oneself and one's ability, reflecting on oneself, making efforts to learn about other cultures and groups, realizing changes, and still feeling insecure. These six themes fell into three categories: difficulties, efforts to change, and change and limitations. Based on the results, social work suggestions for increasing the cultural competence of multicultural counselors are discussed.

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.45-69 / 2016
  • Recently, sentiment analysis using open Internet data has been actively performed for various purposes. As online communication channels become popular, companies try to capture public sentiment about themselves from open online information sources. This research analyzes public sentiment toward the Korean Top-10 companies using a multi-categorical sentiment lexicon. Whereas existing research on measuring public sentiment with a big data approach classifies sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has been commonly applied in sentiment analysis across applications because it is academically established and has the clear advantage of capturing the degree of sentiment and the interrelation of each dimension. However, the dimensional structure is not effective for measuring public sentiment, because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training in order to feel. They do not express their feelings in dimensional terms such as positive/negative or active/passive; rather, they express them in categorical terms such as sadness, rage, and happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research proposes a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. The multi-categorical sentiment structure classifies sentiments the way ordinary people do, although it may involve some subjectivity.
In this research, nine categories ('Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom', and 'Pain') are used as the multi-categorical sentiment structure. To capture public sentiment toward the Korean Top-10 companies, Internet news data on the companies were collected over the past 25 months from a representative Korean portal site. Based on sentiment words extracted from previous research, we created a sentiment lexicon and analyzed the frequency of those words in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words in order to rank the distributions. Sentiment comparisons among the top four companies, 'Samsung', 'Hyundai', 'SK', and 'LG', were visualized separately. As a next step, the research tested hypotheses to demonstrate the usefulness of the multi-categorical sentiment lexicon, namely how effectively categorical sentiment can serve as a relative comparison index in cross-sectional and time-series analysis. To test the lexicon as a cross-sectional comparison index, pairwise t-tests and a Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin' and 'SK' and 'Hanjin', were chosen to check whether each categorical sentiment differs significantly in the pairwise t-test. Because the category 'Sadness' has the largest vocabulary, it was chosen to determine whether subgroups of the companies differ significantly in the Duncan test. Five sentiment categories differed significantly between Samsung and Hanjin, and four between SK and Hanjin. In the category 'Sadness', six significantly different subgroups were found. To test the lexicon as a time-series comparison index, the 'nut rage' incident of Hanjin was selected as an example case.
The term frequency of sentiment words in the month when the incident happened was compared with that of the month before. Sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the variation in the word list for the sentiment 'Rage' was shown for a more concrete picture. As a result, there was a large before-and-after difference in the sentiment that ordinary people felt toward the company. Both hypotheses turned out to be statistically significant, which supports the use of multi-categorical sentiment lexicons for sentiment analysis in the business domain. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when assessing public sentiment in a business environment.
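The frequency-ratio step described in the abstract above (counting lexicon words in news text and expressing each sentiment category as a share of all sentiment words) can be sketched roughly as follows. The miniature English lexicon, the sample sentence, and the function names are made up for illustration; the study's actual lexicon and news data are not given in the abstract.

```python
from collections import Counter

# Hypothetical miniature lexicon mapping sentiment words to categories.
LEXICON = {
    "sad": "Sadness", "grief": "Sadness",
    "furious": "Anger", "outrage": "Anger",
    "delighted": "Happiness", "glad": "Happiness",
}

def category_ratios(tokens, lexicon=LEXICON):
    """Share of each sentiment category among all lexicon words found in tokens."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}

def rank_categories(ratios):
    """Categories ordered from most to least frequent (ties broken alphabetically)."""
    return sorted(ratios, key=lambda c: (-ratios[c], c))

# Toy news snippet (assumption).
tokens = "the public was furious and the outrage left staff sad".split()
ratios = category_ratios(tokens)
ranking = rank_categories(ratios)
```

Computing the same ratios separately for each company or each month would give the kind of cross-sectional and before-and-after comparisons the abstract describes.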

A Cognitive Study on the Usability of Cross-referencing link and Multiple hierarchies (교차적 연결과 다계층구조의 유용성에 관한 인지적 연구 : 사이버쇼핑몰의 커스터머 인터페이스를 중심으로)

  • 이정원;김진우
    • Korean Journal of Cognitive Science / v.10 no.1 / pp.25-43 / 1999
  • The focus of this study is on the elements of structure design that facilitate user interaction with applications within cyberspace. Structure design entails decisions regarding the optimal classification and hierarchical organization of information into successively higher units, i.e., the grouping of highly related information in the form of nodes of a site and the subsequent connection of nodes that are inter-related. The decisions are based on the designer's subjective classification framework, which is not always compatible with that of the user. We propose that the ensuing cognitive dissonance can be reduced via the employment of multiple hierarchies and cross-referencing links. Multiple hierarchies represent a single information space in terms of a number of single hierarchies, each of which represents a different perspective. Cross-referencing refers to the inter-connection between the constituent hierarchies by providing a link to the alternate hierarchy for information that is most likely to be categorized in diverse manners by users with differing perspectives. In this study we conducted two empirical studies to gauge the effectiveness of multiple hierarchies and cross-referencing links in the domain of cyber shopping malls. In the first phase, an experiment was conducted to determine how subjects classified given products with respect to two different perspectives for categorization. Experimental cyber malls were then developed based on the results from the first phase to test the effectiveness of multiple hierarchies and cross-referencing links. Results show that the ease of navigation was higher for cyber malls that had implemented cross-referencing links, and that cross-referencing links are of greater value when used in conjunction with single hierarchical designs rather than multiple hierarchies. User satisfaction and ease of navigation were higher for cyber malls that had not implemented multiple hierarchies. This paper concludes with a discussion of these results and their implications for designers of cyber malls.
