• Title/Abstract/Keywords: Categorization

Evaluation and Functionality Stems Extraction for App Categorization on Apple iTunes Store by Using Mixed Methods: Data Mining for Categorization Improvement

  • Zhang, Chao;Wan, Lili
    • 한국IT서비스학회지
    • /
    • Vol. 17, No. 2
    • /
    • pp.111-128
    • /
    • 2018
  • About 3.9 million apps across 24 primary categories have been approved on the Apple iTunes Store. Accurate categorization can bring many benefits to developers, app stores, and users, such as improved discoverability and long-term revenue. However, current categorization problems, especially cross-attribution, may cause inefficient usage and confusion. This study evaluated the reliability of app categorization on the Apple iTunes Store using several rounds of inter-rater reliability statistics, located categorization problems using machine learning, and suggested more accurate representative functionality stems for each primary category. A mixed-methods study was performed on a total of 4,905 popular apps. The original categorization proved to be substantially reliable but in need of further improvement. The representative functionality stems for each category were identified. This paper may offer fusion research experience and methodological suggestions for the categorization research field and may help improve discoverability through app store categorization.
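
The abstract above mentions inter-rater reliability statistics without naming a specific one. As a minimal sketch, assuming two raters and Cohen's kappa (an assumption, not necessarily the statistic used in the paper), agreement on category labels could be computed as follows. The function name `cohens_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items.

    Assumption: the paper may have used a different reliability statistic;
    this only illustrates the general idea of chance-corrected agreement.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four apps, for illustration only.
labels_a = ["Games", "Games", "Music", "Music"]
labels_b = ["Games", "Games", "Music", "Games"]
kappa = cohens_kappa(labels_a, labels_b)
```

On the commonly cited Landis and Koch scale, kappa values between 0.61 and 0.80 are described as "substantial" agreement, which matches the wording of the abstract.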

Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets

  • 한정기;박민규;조광제;김준태
    • 한국정보처리학회논문지
    • /
    • Vol. 7, No. 4
    • /
    • pp.1150-1159
    • /
    • 2000
  • This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. In our model, we apply the knowledge-based method first and then apply the statistical method to the texts that the knowledge-based method does not categorize. With this combined method, we can improve categorization accuracy while categorizing all texts without failure. For statistical categorization, the vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and pattern matching is then performed. Experimental results on new articles show that categorization accuracy can be improved by combining the two different categorization methods.
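
The two-stage scheme described in this abstract (knowledge-based matching first, statistical fallback second) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: ICF is assumed to be log(number of categories / number of categories containing the term), by analogy with IDF; keyword sets are assumed to match when all of their words occur in the text; phrasal patterns are omitted; and all function names and sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_icf_model(labeled_docs):
    """Build ICF weights and one weighted term vector per category.

    labeled_docs: list of (tokens, category) pairs.
    Assumed ICF definition: log(n_categories / category_frequency(term)).
    """
    per_category = defaultdict(Counter)
    for tokens, category in labeled_docs:
        per_category[category].update(tokens)
    category_freq = Counter()
    for counts in per_category.values():
        category_freq.update(counts.keys())
    n_categories = len(per_category)
    icf = {t: math.log(n_categories / cf) for t, cf in category_freq.items()}
    vectors = {
        c: {t: tf * icf[t] for t, tf in counts.items()}
        for c, counts in per_category.items()
    }
    return icf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, keyword_sets, icf, vectors):
    """Try keyword-set matching first; fall back to the ICF vector model."""
    present = set(tokens)
    for category, sets in keyword_sets.items():
        if any(ks <= present for ks in sets):
            return category, "knowledge"
    doc = {t: tf * icf.get(t, 0.0) for t, tf in Counter(tokens).items()}
    best = max(vectors, key=lambda c: cosine(doc, vectors[c]))
    return best, "statistical"


# Hypothetical training data and keyword sets, for illustration only.
docs = [
    (["goal", "match", "team"], "sports"),
    (["stock", "market", "team"], "finance"),
]
icf, vectors = train_icf_model(docs)
keyword_sets = {"finance": [frozenset({"interest", "rate"})]}
by_rule = classify(["interest", "rate", "rises"], keyword_sets, icf, vectors)
by_stats = classify(["goal", "team"], keyword_sets, icf, vectors)
```

Because the statistical stage always returns the most similar category, every text receives a label, which is the "without failure" property the abstract describes.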

Explicit Categorization Ability Predictor for Biology Classification using fMRI

  • Byeon, Jung-Ho;Lee, Il-Sun;Kwon, Yong-Ju
    • 한국과학교육학회지
    • /
    • Vol. 32, No. 3
    • /
    • pp.524-531
    • /
    • 2012
  • Categorization is an important human function used to process different stimuli. It is also one of the most important factors affecting measurement of a person's classification ability. Explicit categorization, the representative system by which categorization ability is measured, can verbally describe the categorization rule. The purpose of this study was to develop a prediction model for categorization ability as it relates to the classification process of living organisms using fMRI. Fifty-five participants were divided into two groups: a model generation group of twenty-seven subjects and a model verification group of twenty-eight subjects. During prediction model generation, functional connectivity was used to analyze temporal correlations between brain activation regions. A classification ability quotient (CQ) was calculated to identify the verbal categorization ability distribution of each subject. Additionally, the connectivity coefficient (CC) was calculated to quantify the functional connectivity for each subject. Hence, it was possible to generate a prediction model through regression analysis based on participants' CQ and CC values. The resulting regression model was statistically significant; the researchers nevertheless went on to verify its predictive power. To do so, they used the regression model and the subjects' CC values to predict CQ values for the twenty-eight verification subjects. Correlation between the predicted CQ values and the observed CQ values was confirmed. The results of this study suggest that explicit categorization ability differs between individuals at the brain network level. The findings also suggest that differences in functional connectivity between individuals reflect differences in categorization ability. Finally, the researchers have provided a new method for predicting an individual's categorization ability by measuring brain activation.

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 2
    • /
    • pp.418-434
    • /
    • 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers for text categorization. However, kNN suffers from limitations such as high computation cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods for improving kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets. In addition, their performance has not been examined in the text categorization domain, where the dimensionality and size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
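
To make the idea of instance selection for kNN concrete, here is a minimal sketch. The abstract does not say which selection techniques were evaluated, so this example assumes Hart's Condensed Nearest Neighbor rule as one well-known representative, together with cosine similarity over raw term counts. The function names and the tiny training set are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, tokens, k=3):
    """Majority vote among the k training documents most similar to tokens."""
    query = Counter(tokens)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def condense(train):
    """Condensed Nearest Neighbor (assumed technique): keep only instances
    that 1-NN over the currently kept subset would misclassify."""
    kept = [0]
    changed = True
    while changed:
        changed = False
        for i, (vec, label) in enumerate(train):
            if i in kept:
                continue
            nearest = max(kept, key=lambda j: cosine(vec, train[j][0]))
            if train[nearest][1] != label:
                kept.append(i)
                changed = True
    return [train[i] for i in kept]


# Hypothetical training documents, for illustration only.
train = [
    (Counter("cheap pills buy".split()), "spam"),
    (Counter("buy cheap pills now".split()), "spam"),
    (Counter("meeting agenda today".split()), "ham"),
    (Counter("project meeting agenda".split()), "ham"),
]
reduced = condense(train)
prediction = knn_predict(reduced, ["cheap", "pills"], k=1)
```

Classifying against `reduced` instead of `train` lowers the number of similarity computations per query, which is the efficiency gain the abstract refers to; whether accuracy is preserved on real text collections is the question the paper investigates.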

Representation of Texts into String Vectors for Text Categorization

  • Jo, Tae-Ho
    • Journal of Computing Science and Engineering
    • /
    • Vol. 4, No. 2
    • /
    • pp.110-127
    • /
    • 2010
  • In this study, we propose a method for encoding documents into string vectors, instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. The usual method of encoding documents therefore causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization, where string vectors are received as input vectors, instead of numerical vectors. As a result, we can improve text categorization performance by avoiding these two problems.

Table based Matching Algorithm for Soft Categorization of News Articles in Reuter 21578

  • Jo, Tae-Ho
    • 한국멀티미디어학회논문지
    • /
    • Vol. 11, No. 6
    • /
    • pp.875-882
    • /
    • 2008
  • This research proposes an alternative to machine learning-based approaches for text categorization. To use machine learning-based approaches for any text mining task, documents must be encoded into numerical vectors, which causes two problems: huge dimensionality and sparse distribution. Although text mining covers various tasks, such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea of this research is to avoid the two problems by encoding a document or documents into a table instead of numerical vectors. The goal of this research is therefore to improve the performance of text categorization by proposing approaches that are free from the two problems.

Semantic Word Categorization using Feature Similarity based K Nearest Neighbor

  • Jo, Taeho
    • Journal of Multimedia Information System
    • /
    • Vol. 5, No. 2
    • /
    • pp.67-78
    • /
    • 2018
  • This article proposes a modified KNN (K Nearest Neighbor) algorithm that considers feature similarity and applies it to word categorization. The texts given as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and a synergy effect between word categorization and text categorization is expected from combining the two. In this research, we define a similarity metric between two vectors that includes feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is empirically validated as the better approach for categorizing words in news articles and opinions. The significance of this research lies in improving classification performance by utilizing feature similarities.

Effects of categorization training and expertise on cognitive problem solving

  • 이희승;손영우
    • 인지과학
    • /
    • Vol. 16, No. 1
    • /
    • pp.53-67
    • /
    • 2005
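  • This study examined how categorization patterns differ with expertise and how categorization training affects cognitive problem solving depending on expertise. Experiment 1 used systems of simultaneous equations in mathematics to identify differences in problem categorization between groups at different levels of expertise. Experts mainly used the structural features of the problems, which relate to the solution method, as the basis for categorization, whereas novices categorized on the basis of surface information. However, in a conditional categorization situation in which the structure of the problem was expressed explicitly, the novices' categorization patterns changed to resemble those of the experts. The difference between novices' and experts' categorization patterns appears to arise because novices have difficulty grasping the deep structure of problems. Experiment 2 examined transfer test performance, in comparison with a group that received problem-solving training, to determine whether conditional categorization training with explicitly expressed problem structure can improve problem-solving ability. The results showed that problem-solving training was effective for the expert group, whereas problem classification training was more effective for the novice group. This appears to be because novices have difficulty grasping the deep structure of problems, so training that shows this structure explicitly helps them solve problems. Therefore, different forms of instruction should be used depending on the level of expertise.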

A Study on the Effect of Complementary Bundling Based on the Categorization of the New Hybrid IT Product

  • 박윤서;김용식
    • 한국IT서비스학회지
    • /
    • Vol. 13, No. 4
    • /
    • pp.19-43
    • /
    • 2014
  • Categorization is the process of labeling or identifying an object based on what people already know or on its similarity to known objects, so that it can be easily perceived in the external environment. Once an object is categorized, its properties are inferred schematically from the typical characteristics of the category. In this sense, the categorization of new products has an important effect on market performance. Nevertheless, the categorization of innovative new products is not easy and is occasionally very ambiguous. In this study, we discuss how to strengthen the categorization strategy of new hybrid IT products through complementary bundling. The model of this study is based on the Technology Acceptance Model (TAM) with a resistance variable, and its statistical significance is verified through a survey of consumers' awareness. In addition, we review the moderating effects of prior knowledge in the adoption process of complementary bundling. Through this analysis, we identify the structural relationships among the factors affecting the adoption of complementary bundling. The results also show that the influence of prior knowledge on the adoption process is greater when there is significant heterogeneity between strategic categories and complements. In conclusion, these findings suggest the following managerial implication: the categorization strategy of a new hybrid IT product can be enhanced by complementary bundling, but the fit between the strategic category and the complements should be evaluated thoroughly.

BPNN Algorithm with SVD Technique for Korean Document Categorization

  • 리청화;변동률;박순철
    • 한국산업정보학회논문지
    • /
    • Vol. 15, No. 2
    • /
    • pp.49-57
    • /
    • 2010
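  • This paper proposes a Korean document classification system that uses a Back Propagation Neural Network (BPNN) algorithm and Singular Value Decomposition (SVD). BPNN performs document classification using a network built through training. The difficulty with this approach is that the feature space fed to the classifier is too large. SVD can reduce high-dimensional vectors to a lower dimension and can also build a meaningful vector space that captures important relationships between words. To evaluate the performance of the proposed BPNN, we used the datasets of the 한국일보-2000/한국일보-40075 document categorization test collections. The experimental results show that the system using BPNN and SVD achieves excellent performance in Korean document classification.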