• Title/Summary/Keyword: categorization

Evaluation and Functionality Stems Extraction for App Categorization on Apple iTunes Store by Using Mixed Methods : Data Mining for Categorization Improvement

  • Zhang, Chao;Wan, Lili
    • Journal of Information Technology Services / v.17 no.2 / pp.111-128 / 2018
  • About 3.9 million apps in 24 primary categories have been approved on the Apple iTunes Store. Accurate categorization can bring many benefits to developers, app stores, and users, such as improved discoverability and long-term revenue. However, current categorization problems, especially cross-attribution, may cause confusion and inefficient use. This study evaluated the reliability of app categorization on the Apple iTunes Store using several rounds of inter-rater reliability statistics, located categorization problems using machine learning, and made more accurate suggestions about representative functionality stems for each primary category. A mixed-methods study was performed on a total of 4,905 popular apps. The original categorization proved to be substantially reliable but in need of further improvement. The representative functionality stems for each category were identified. This paper may offer experience in fusion research and methodological suggestions for the categorization research field, and may help improve discoverability through better app store categorization.
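
The abstract does not say which inter-rater reliability statistic was used. As a minimal sketch, assuming Cohen's kappa for two raters (one common choice), agreement between the store's category and a human rater's category could be computed as below. The labels are hypothetical, not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical example: store category vs. one rater's category for six apps.
store = ["Games", "Games", "Utilities", "Health", "Utilities", "Games"]
rater = ["Games", "Utilities", "Utilities", "Health", "Utilities", "Games"]
kappa = cohens_kappa(store, rater)
print(round(kappa, 3))
```

Under the widely used Landis and Koch reading, kappa values between 0.61 and 0.80 are described as "substantial" agreement, which may be the sense in which the abstract uses the word.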

Improving the Performance of Statistical Automatic Text Categorization by using Phrasal Patterns and Keyword Sets (구문 패턴과 키워드 집합을 이용한 통계적 자동 문서 분류의 성능 향상)

  • Han, Jeong-Gi;Park, Min-Gyu;Jo, Gwang-Je;Kim, Jun-Tae
    • The Transactions of the Korea Information Processing Society / v.7 no.4 / pp.1150-1159 / 2000
  • This paper presents an automatic text categorization model that improves accuracy by combining statistical and knowledge-based categorization methods. In our model, we apply the knowledge-based method first, and then apply the statistical method to the texts that the knowledge-based method did not categorize. With this combined method, we can improve the accuracy of categorization while categorizing all the texts without failure. For statistical categorization, the vector model with Inverted Category Frequency (ICF) weighting is used. For knowledge-based categorization, Phrasal Patterns and Keyword Sets are introduced to represent sentence patterns, and then pattern matching is performed. Experimental results on new articles show that the accuracy of categorization can be improved by combining the two different categorization methods.
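
The two-stage idea in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: ICF is assumed to be log(number of categories / number of categories containing the term), by analogy with IDF; only keyword sets are modelled (phrasal patterns are omitted); and all function names and data are hypothetical.

```python
import math
from collections import defaultdict


def train_icf(labelled_docs):
    """Build per-category term frequencies and assumed ICF weights."""
    cat_tf = defaultdict(lambda: defaultdict(int))
    for tokens, category in labelled_docs:
        for term in tokens:
            cat_tf[category][term] += 1
    cats_with_term = defaultdict(int)
    for tf in cat_tf.values():
        for term in tf:
            cats_with_term[term] += 1
    n_cats = len(cat_tf)
    # Assumed ICF: terms spread over many categories get low weight.
    icf = {t: math.log(n_cats / cf) for t, cf in cats_with_term.items()}
    return cat_tf, icf


def rule_category(tokens, keyword_sets):
    """Knowledge-based stage: fire when every keyword of a set is present."""
    present = set(tokens)
    for category, sets in keyword_sets.items():
        if any(ks <= present for ks in sets):
            return category
    return None  # no rule matched


def statistical_category(tokens, cat_tf, icf):
    """Statistical stage: cosine between ICF-weighted document and category vectors."""
    doc = defaultdict(float)
    for term in tokens:
        doc[term] += icf.get(term, 0.0)

    def cosine(category):
        vec = {t: f * icf[t] for t, f in cat_tf[category].items()}
        dot = sum(w * vec.get(t, 0.0) for t, w in doc.items())
        norm = math.sqrt(sum(w * w for w in doc.values())) * math.sqrt(
            sum(v * v for v in vec.values()))
        return dot / norm if norm else 0.0

    return max(sorted(cat_tf), key=cosine)


def categorize(tokens, keyword_sets, cat_tf, icf):
    """Rules first; fall back to the statistical model so every text gets a label."""
    return rule_category(tokens, keyword_sets) or statistical_category(tokens, cat_tf, icf)


# Hypothetical training data and keyword sets.
training = [
    ("stock market shares fell".split(), "economy"),
    ("bank raises interest rate".split(), "economy"),
    ("team wins final match".split(), "sports"),
    ("striker scores in match".split(), "sports"),
]
keyword_sets = {"sports": [{"world", "cup"}]}
cat_tf, icf = train_icf(training)
by_rule = categorize("world cup opens today".split(), keyword_sets, cat_tf, icf)
by_stats = categorize("interest rate and stock prices".split(), keyword_sets, cat_tf, icf)
print(by_rule, by_stats)
```

Because the statistical stage always returns some category, no text is left unlabelled, which matches the abstract's claim of categorizing all texts without failure.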

Explicit Categorization Ability Predictor for Biology Classification using fMRI

  • Byeon, Jung-Ho;Lee, Il-Sun;Kwon, Yong-Ju
    • Journal of The Korean Association For Science Education / v.32 no.3 / pp.524-531 / 2012
  • Categorization is an important human function used to process different stimuli. It is also one of the most important factors affecting measurement of a person's classification ability. Explicit categorization, the representative system by which categorization ability is measured, can verbally describe the categorization rule. The purpose of this study was to develop a prediction model for categorization ability as it relates to the classification process of living organisms using fMRI. Fifty-five participants were divided into two groups: a model generation group of twenty-seven subjects and a model verification group of twenty-eight subjects. During prediction model generation, functional connectivity was used to analyze temporal correlations between brain activation regions. A classification ability quotient (CQ) was calculated to identify the verbal categorization ability distribution of each subject. Additionally, the connectivity coefficient (CC) was calculated to quantify the functional connectivity for each subject. Hence, it was possible to generate a prediction model through regression analysis based on participants' CQ and CC values. The resulting regression model was statistically significant; the researchers then proceeded to verify its predictive power. To do so, they used the regression model and the subjects' CC values to predict CQ values for the twenty-eight subjects in the verification group. Correlation between the predicted CQ values and the observed CQ values was confirmed. The results of this study suggest that explicit categorization ability differs at the brain network level of individuals. The findings also suggest that differences in functional connectivity between individuals reflect differences in categorization ability. Finally, the researchers provide a new method for predicting an individual's categorization ability by measuring brain activation.

Impact of Instance Selection on kNN-Based Text Categorization

  • Barigou, Fatiha
    • Journal of Information Processing Systems / v.14 no.2 / pp.418-434 / 2018
  • With the increasing use of the Internet and electronic documents, automatic text categorization has become imperative. Several machine learning algorithms have been proposed for text categorization. The k-nearest neighbor algorithm (kNN) is known to be one of the best state-of-the-art classifiers when used for text categorization. However, kNN suffers from limitations such as high computational cost when classifying new instances. Instance selection techniques have emerged as highly competitive methods to improve kNN through data reduction. However, previous works have evaluated those approaches only on structured datasets. In addition, their performance has not been examined in the text categorization domain, where the dimensionality and size of the dataset are very high. Motivated by these observations, this paper investigates and analyzes the impact of instance selection on kNN-based text categorization in terms of classification accuracy, classification efficiency, and data reduction.
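
The abstract does not name the instance selection techniques that were evaluated. As a minimal sketch, assuming Hart's condensed nearest neighbor rule (one classic instance selection method, chosen here only for illustration) and cosine similarity over bag-of-words counts, data reduction before kNN might look like this. The documents are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts for a whitespace-tokenized text."""
    return Counter(text.split())


def cosine(u, v):
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query, instances, k=1):
    """Majority vote among the k stored instances most similar to the query."""
    ranked = sorted(instances, key=lambda inst: cosine(query, inst[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def condense(instances):
    """Hart's condensed nearest neighbor: keep only the instances that the
    current subset misclassifies under 1-NN, repeating until nothing changes."""
    kept = [instances[0]]
    changed = True
    while changed:
        changed = False
        for inst in instances:
            if inst in kept:
                continue
            if knn_predict(inst[0], kept, k=1) != inst[1]:
                kept.append(inst)
                changed = True
    return kept


# Hypothetical training set of six short documents.
train = [
    (bow("goal match team"), "sports"),
    (bow("match team win"), "sports"),
    (bow("team goal win"), "sports"),
    (bow("stock bank rate"), "economy"),
    (bow("bank rate market"), "economy"),
    (bow("market stock rate"), "economy"),
]
reduced = condense(train)
prediction = knn_predict(bow("team win match today"), reduced, k=1)
print(len(train), len(reduced), prediction)
```

Each query is now compared against the reduced set rather than the full training set, which is the efficiency gain the abstract refers to; whether accuracy holds up on real text collections is the question the paper investigates.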

Representation of Texts into String Vectors for Text Categorization

  • Jo, Tae-Ho
    • Journal of Computing Science and Engineering / v.4 no.2 / pp.110-127 / 2010
  • In this study, we propose a method for encoding documents into string vectors instead of numerical vectors. A traditional approach to text categorization usually requires encoding documents into numerical vectors. This usual encoding causes two main problems: huge dimensionality and sparse distribution. In this study, we modify or create machine learning-based approaches to text categorization that receive string vectors, instead of numerical vectors, as input. As a result, we can improve text categorization performance by avoiding these two problems.

Table based Matching Algorithm for Soft Categorization of News Articles in Reuter 21578

  • Jo, Tae-Ho
    • Journal of Korea Multimedia Society / v.11 no.6 / pp.875-882 / 2008
  • This research proposes an alternative to machine learning-based approaches for text categorization. To use machine learning-based approaches for any text mining task, documents must be encoded into numerical vectors; this causes two problems: huge dimensionality and sparse distribution. Although text mining covers various tasks, such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea of this research is to avoid the two problems by encoding a document or documents into a table instead of numerical vectors. The goal of this research is therefore to improve the performance of text categorization by proposing approaches that are free from the two problems.

Semantic Word Categorization using Feature Similarity based K Nearest Neighbor

  • Jo, Taeho
    • Journal of Multimedia Information System / v.5 no.2 / pp.67-78 / 2018
  • This article proposes a modified KNN (K Nearest Neighbor) algorithm that considers feature similarity and applies it to word categorization. The texts given as features for encoding words into numerical vectors are semantically related entities rather than independent ones, and a synergy effect between word categorization and text categorization is expected from combining the two. In this research, we define a similarity metric between two vectors that includes the feature similarity, modify the KNN algorithm by replacing the existing similarity metric with the proposed one, and apply it to word categorization. The proposed KNN is empirically validated as the better approach for categorizing words in news articles and opinions. The significance of this research is that it improves classification performance by utilizing the feature similarities.
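
The abstract does not give the formula of the proposed metric. As a rough sketch of the general idea only, assuming a cosine-style similarity in which a feature similarity matrix weights every pair of features (with the identity matrix it reduces to the ordinary cosine), the effect can be illustrated as follows. The function name, the matrix, and the vectors are all hypothetical.

```python
import math


def feature_weighted_similarity(x, y, feature_sim):
    """Cosine-style similarity that also credits pairs of different but
    similar features. feature_sim[i][j] is the assumed similarity between
    features i and j, with 1.0 on the diagonal."""

    def inner(u, v):
        return sum(feature_sim[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    norm = math.sqrt(inner(x, x)) * math.sqrt(inner(y, y))
    return inner(x, y) / norm if norm else 0.0


# Hypothetical setup: each word is encoded by its weight in three texts.
# Texts 0 and 1 are assumed to be semantically close; text 2 is unrelated.
related = [[1.0, 0.8, 0.0],
           [0.8, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
independent = [[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]]

word_a = [1.0, 0.0, 0.0]  # occurs only in text 0
word_b = [0.0, 1.0, 0.0]  # occurs only in text 1

plain = feature_weighted_similarity(word_a, word_b, independent)
weighted = feature_weighted_similarity(word_a, word_b, related)
print(plain, weighted)
```

With independent features the two words look unrelated, while the feature-aware version gives them a high similarity because they occur in similar texts; a KNN classifier would use such a function in place of its usual similarity when ranking neighbors.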

Effects of categorization training and expertise on cognitive problem solving (범주화 훈련과 전문성이 인지 문제 해결에 미치는 영향)

  • Lee Hee Seung;Sohn Young Woo
    • Korean Journal of Cognitive Science / v.16 no.1 / pp.53-67 / 2005
  • The present study identified differences in categorization patterns between experts and novices and examined whether categorization training has positive effects on problem solving. In Experiment 1, we examined categorization differences between groups according to expertise using mathematical equation problems. Experts classified problems based on deep structure related to solution methods, whereas novices classified problems based on surface features. However, in the labeled categorization condition, the novices' categorization pattern did not differ from the experts'. These results suggest that novices have difficulty identifying the deep structure of problems. In Experiment 2, we examined whether categorization training that explicitly shows subjects the deep structure of problems increases transfer performance. The results showed that solution training was more effective for the expert group, whereas categorization training was more effective for the novice group. We argue that different training methods should be applied according to expertise.

A Study on the Effect of Complementary Bundling Based on the Categorization of the New Hybrid IT Product (하이브리드 IT신제품의 범주화에 따른 보완재 번들링의 효과성에 관한 연구)

  • Park, Yoonseo;Kim, Yongsik
    • Journal of Information Technology Services / v.13 no.4 / pp.19-43 / 2014
  • Categorization is the process of labeling or identifying an object based on what people already know or on its similarity to known objects, so that it can be easily perceived in the external environment. Once an object is categorized, its properties are inferred schematically from the typical characteristics of the category. In this sense, the categorization of new products has an important effect on market performance. Nevertheless, the categorization of innovative new products is not easy and is occasionally very ambiguous. In this study, we discuss how to strengthen the categorization strategy of new hybrid IT products through complementary bundling. The model of this study is based on the Technology Acceptance Model (TAM) with a resistance variable, and its statistical significance is verified through a survey of consumers' awareness. In addition, we review the moderating effects of prior knowledge in the adoption process of complementary bundling. Through this analysis, we identify the structural relationships among the factors affecting adoption of complementary bundling. The analysis also shows that the influence of prior knowledge on the adoption process is greater when there is significant heterogeneity between strategic categories and complements. In conclusion, these findings suggest the following managerial implication: the categorization strategy of a new hybrid IT product can be enhanced by complementary bundling, but the fit between the strategic category and the complements should be evaluated thoroughly.

BPNN Algorithm with SVD Technique for Korean Document categorization (한글문서분류에 SVD를 이용한 BPNN 알고리즘)

  • Li, Chenghua;Byun, Dong-Ryul;Park, Soon-Choel
    • Journal of Korea Society of Industrial Information Systems / v.15 no.2 / pp.49-57 / 2010
  • This paper proposes a Korean document categorization algorithm using a Back Propagation Neural Network (BPNN) with Singular Value Decomposition (SVD). BPNN builds a network through its learning process and classifies documents using that network. The main difficulty in applying BPNN to document categorization is the high dimensionality of the feature space of the input documents. SVD projects the original high-dimensional vectors into low-dimensional vectors, captures the important associative relationships between terms, and constructs the semantic vector space. The categorization algorithm is tested and compared on the HKIB-20000/HKIB-40075 Korean Text Categorization Test Collections. Experimental results show that the BPNN algorithm with SVD achieves high effectiveness for Korean document categorization.