• Title/Abstract/Keywords: Sparsity

Nonparametric logistic regression based on sparse triangulation over a compact domain

  • Seoyeon Kim;Kwan-Young Bak
    • Communications for Statistical Applications and Methods / v.31 no.5 / pp.557-569 / 2024
  • This study investigates logistic regression models based on sparse triangulation over a compact domain in ℝ², addressing the limited research on extending the triogram model to logistic regression. A primary challenge arises from the potential instability induced by a large number of vertices, which hinders the effective modeling of complex relationships. To mitigate this challenge, we propose introducing sparsity to the boundary vertices of the triangulation based on the Ramer-Douglas-Peucker algorithm and employing the K-means algorithm for adaptive vertex initialization. A second-order coordinate-wise descent algorithm is adopted to implement the proposed method. The stability and performance of the proposed algorithm are assessed using synthetic and handwritten digit data (LeCun et al., 1989). Results demonstrate the advantages of our method over existing methodologies, particularly when dealing with non-rectangular data domains.
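
The abstract above names the Ramer-Douglas-Peucker algorithm as the tool for thinning boundary vertices. The following is a minimal sketch of the textbook algorithm on a 2D polyline, not the authors' implementation; the function names and the tolerance parameter `epsilon` are illustrative assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Simplify a polyline, keeping vertices farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist > epsilon:
        # Keep the farthest vertex and recurse on both halves.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    # Every interior vertex is within tolerance: keep only the endpoints.
    return [first, last]
```

A larger `epsilon` removes more vertices, so in this sketch it acts as the knob that controls how sparse the simplified boundary becomes.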

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms (중립도 기반 선택적 단어 제거를 통한 유용 리뷰 분류 정확도 향상 방안)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.129-142 / 2016
  • Customer product reviews have become an important factor in purchase decision making. Customers believe that reviews written by others who have already used a product offer more reliable information than that provided by sellers. However, because there are so many products and reviews, the advantages of e-commerce can be outweighed by increasing search costs. Reading all of the reviews to find the pros and cons of a product can be exhausting. To help users find the most useful information about products without much difficulty, e-commerce companies provide various ways for customers to write and rate product reviews. To assist potential customers, online stores have devised various ways to provide useful customer reviews. Different methods have been developed to classify and recommend useful reviews to customers, primarily using feedback provided by customers about the helpfulness of reviews. Most shopping websites provide customer reviews and offer the following information: the average preference for a product, the number of customers who have participated in preference voting, and the preference distribution. Most information on the helpfulness of product reviews is collected through a voting system. Amazon.com asks customers whether a review of a certain product is helpful, and it places the most helpful favorable review and the most helpful critical review at the top of the list of product reviews. Some companies also predict the usefulness of a review based on attributes such as length, author(s), and the words used, publishing only reviews that are likely to be useful. Text mining approaches have been used to classify useful reviews in advance. To apply a text mining approach to all reviews of a product, we need to build a term-document matrix: we extract all words from the reviews and build a matrix holding the number of occurrences of each term in each review.
Because there are many reviews, the term-document matrix becomes very large, which makes it difficult to apply text mining algorithms. Researchers therefore delete sparse terms, since sparse words have little effect on classification or prediction. The purpose of this study is to suggest a better way of building the term-document matrix by deleting terms that are useless for review classification. We propose a neutrality index for selecting the words to delete. Many words still appear in both classes (useful and not useful), and these words have little or even a negative effect on classification performance. We therefore define these words as neutral terms and delete those that appear with similar frequency in both classes. After deleting sparse words, we select further words to delete according to their neutrality. We tested our approach with Amazon.com review data from five product categories: Cellphones & Accessories; Movies & TV program; Automotive; CDs & Vinyl; and Clothing, Shoes & Jewelry. We used reviews that received more than four votes from users, and a 60% ratio of useful votes to total votes served as the threshold for classifying reviews as useful or not useful. We randomly selected 1,500 useful reviews and 1,500 not-useful reviews for each product category. We then applied Information Gain and Support Vector Machine algorithms to classify the reviews and compared classification performance in terms of precision, recall, and F-measure. Although performance varies with product category and data set, deleting terms by both sparsity and neutrality gave the best F-measure for both classification algorithms. However, deleting terms by sparsity alone gave the best recall for Information Gain, and using all terms gave the best precision for SVM.
Thus, term deletion methods and classification algorithms need to be selected carefully according to the data set.
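
The two-stage term filtering described above (drop sparse terms, then drop neutral ones) can be sketched as follows. The abstract does not give the formula for the neutrality index, so the index used here (one minus the normalized gap between the two class document frequencies) is purely an assumption for illustration, as are the function names, the thresholds `min_df` and `max_neutrality`, and the choice of `>=` at the 60% boundary.

```python
from collections import Counter


def label_review(helpful_votes, total_votes, min_votes=4, threshold=0.6):
    """Label a review; None means it has too few votes to be used."""
    if total_votes <= min_votes:
        return None
    # Assumption: a ratio exactly at the threshold counts as useful.
    return "useful" if helpful_votes / total_votes >= threshold else "not_useful"


def document_frequency(docs):
    """Fraction of documents in which each term occurs at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return {term: n / len(docs) for term, n in counts.items()}


def select_terms(useful_docs, other_docs, min_df=0.3, max_neutrality=0.8):
    """Keep terms that are neither sparse nor neutral (hypothetical index)."""
    df_useful = document_frequency(useful_docs)
    df_other = document_frequency(other_docs)
    n_total = len(useful_docs) + len(other_docs)
    kept = []
    for term in sorted(set(df_useful) | set(df_other)):
        u = df_useful.get(term, 0.0)
        o = df_other.get(term, 0.0)
        overall = (u * len(useful_docs) + o * len(other_docs)) / n_total
        if overall < min_df:
            continue  # stage 1: sparse term
        neutrality = 1.0 - abs(u - o) / (u + o)
        if neutrality > max_neutrality:
            continue  # stage 2: appears about equally often in both classes
        kept.append(term)
    return kept
```

The surviving terms would then become the columns of the reduced term-document matrix passed to a classifier.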

Recommender Systems using Structural Hole and Collaborative Filtering (구조적 공백과 협업필터링을 이용한 추천시스템)

  • Kim, Mingun;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.107-120 / 2014
  • This study proposes a novel recommender system that uses structural hole analysis to reflect qualitative and emotional information in the recommendation process. Although collaborative filtering (CF) is known as the most popular recommendation algorithm, it has limitations, including scalability and sparsity problems. The scalability problem arises when the number of users and items becomes very large: CF cannot scale up because the computation time needed to find neighbors in the user-item matrix grows as the number of users and items increases in real-world e-commerce sites. Sparsity is a common problem of most recommender systems because users generally rate only a small portion of all items. The cold-start problem is a special case of the sparsity problem in which users or items are newly added to the system with no ratings at all. When a user's preference data is sparse, two users or items are unlikely to have ratings in common, so CF predicts ratings using a very limited number of similar users. Moreover, it may produce biased recommendations because similarity weights may be estimated from only a small portion of the rating data. In this study, we point out a further limitation of conventional CF: it does not consider qualitative and emotional information about users in the recommendation process because it uses only the preference scores in the user-item matrix. To address this limitation, this study proposes a cluster-indexing CF model with structural hole analysis for recommendations. In general, a structural hole is a location that connects two separate actors without any redundant connections in the network. The actor who occupies the structural hole can easily access non-redundant, varied, and fresh information.
Therefore, the actor who occupies the structural hole may be an important person in the focal network and may be the representative person of the focal subgroup in the network. Thus, his or her characteristics may represent the general characteristics of the users in the focal subgroup. In this sense, we can distinguish friends and strangers of the focal user using structural hole analysis. This study uses structural hole analysis to select the structural holes in subgroups as initial seeds for a cluster analysis. First, we gather data about users' preference ratings for items and their social network information, using a data collection system developed for this purpose. Then, we perform structural hole analysis and find the structural holes of the social network. Next, we use these structural holes as cluster centroids for the clustering algorithm. Finally, we make recommendations using CF within each user's cluster and compare the recommendation performance with that of comparative models. The experimental results combine two experiments. The first is the structural hole analysis, for which this study employs UCINET version 6, a software package for the analysis of social network data. The second performs the modified clustering and then CF using the result of the cluster analysis; for this we develop an experimental system using VBA (Visual Basic for Applications) in Microsoft Excel 2007. For the modified clustering experiment, clustering is based on a novel similarity measure, the Pearson correlation between user preference rating vectors. In addition, this study uses the 'all-but-one' approach for the CF experiment. To validate the effectiveness of the proposed model, we apply three comparative CF models to the same dataset.
The experimental results show that the proposed model outperforms the other comparative models. In particular, according to the statistical significance test, the proposed model performs significantly better than the two comparative models that use cluster analysis. However, the difference between the proposed model and the naive model is not statistically significant.
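
Two ingredients named in the abstract above, Pearson correlation between user rating vectors and an 'all-but-one' evaluation, can be sketched in a few lines. This is a generic user-based CF sketch and not the authors' VBA system; the function names, the dictionary-based rating format, and the variant that holds out every rating in turn (rather than one randomly chosen rating) are assumptions.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)) * \
        math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(target, item, neighbours):
    """Mean-centered, similarity-weighted prediction of target's rating for item."""
    target_mean = sum(target.values()) / len(target)
    num = den = 0.0
    for other in neighbours:
        if item not in other:
            continue
        sim = pearson(target, other)
        other_mean = sum(other.values()) / len(other)
        num += sim * (other[item] - other_mean)
        den += abs(sim)
    return target_mean + num / den if den else target_mean


def all_but_one_mae(user, neighbours):
    """Hide each rating in turn, predict it from the rest, and average the error."""
    errors = []
    for item, actual in user.items():
        held_out = {k: v for k, v in user.items() if k != item}
        if not held_out:
            continue
        errors.append(abs(predict(held_out, item, neighbours) - actual))
    return sum(errors) / len(errors) if errors else 0.0
```

In a cluster-based variant such as the one the abstract describes, `neighbours` would be restricted to the users in the target user's cluster.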

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on recommender systems that provide information tailored to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only mitigates the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. This tradeoff is attributed to reduced coverage, which is a type of sparsity issue. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to the high costs involved. Cumulative changes in the domain environment that have not been reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with CBCF to propose predictive clustering-based CF (PCCF), which addresses the issues of reduced coverage and unstable performance. The method reduces performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users.
Furthermore, the issue of reduced coverage is also mitigated by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing its robustness against performance instability and the scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems enabled by IBCF, CBCF, ICFEC and PCCF, respectively, in an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in system performance. The test results revealed that the suggested method produced insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation, which indicates the degree of data fluctuation. Nevertheless, it resulted in marked improvement over the existing techniques in terms of range, which indicates the level of performance fluctuation. The level of performance fluctuation before and after model generation improved by 51.31% in the initial test. In the following test, the level of performance fluctuation driven by changes in the number of clusters improved by 36.05%. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability than the existing techniques.
Further research will be directed toward enhancing recommendation performance, which failed to demonstrate significant improvement over the existing techniques. Future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or a deep learning-based model in order to improve recommendation performance.
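
The abstract above relies on a Markov model of transition probabilities between user preference states but does not say how the probabilities are estimated. As a rough illustration only, the sketch below uses the standard count-based (maximum-likelihood) estimate from sequences of cluster labels; the function name, the integer encoding of clusters, and the uniform fallback for unseen states are all assumptions.

```python
def transition_matrix(sequences, n_states):
    """Estimate P[i][j] = Pr(next cluster is j | current cluster is i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for seq in sequences:
        # Count each observed move between consecutive cluster labels.
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        if total:
            matrix.append([c / total for c in row])
        else:
            # Assumption: a state never left in the data gets a uniform row.
            matrix.append([1.0 / n_states] * n_states)
    return matrix
```

Each row sums to one, so row `i` can be read as the predicted distribution over a user's next preference cluster given that the user is currently in cluster `i`.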

Weighted Window Assisted User History Based Recommendation System (가중 윈도우를 통한 사용자 이력 기반 추천 시스템)

  • Hwang, Sungmin;Sokasane, Rajashree;Tri, Hiep Tuan Nguyen;Kim, Kyungbaek
    • KIPS Transactions on Software and Data Engineering / v.4 no.6 / pp.253-260 / 2015
  • When we buy items in online stores, it is common to be shown recommended items that match our interests. Such recommendation systems help users not only to find related items but also to discover new things that may interest them. Recommendation systems have been widely studied, and various models have been suggested, such as collaborative filtering and content-based filtering. Although collaborative filtering performs well at predicting users' preferences, there are conditions under which it cannot be applied. Sparsity in user data causes problems when comparing users. Collaborative filtering is also hard to apply to newly started systems or to companies with a small number of users. Content-based filtering should be used in these conditions, but it has drawbacks: it tends to recommend similar items, and keeping a user's history makes recommendations simple and unable to follow changes in the user's preferences. To overcome these drawbacks and limitations, we suggest a weighted window assisted user history based recommendation system, which captures a user's purchase patterns and applies them to window weight adjustment. The system is capable of following the current preferences of a user, removing useless recommendations, and suggesting items that users cannot easily find on their own. To examine the performance in an environment with sparse users and data, we applied data from a start-up trading company. Through the experiments, we evaluate the operation of the proposed recommendation system.

A Study for Improving Computational Efficiency in Method of Moments with Loop-Star Basis Functions and Preconditioner (루프-스타(Loop-Star) 기저 함수와 전제 조건(Preconditioner)을 이용한 모멘트법의 계산 효율 향상에 대한 연구)

  • Yeom, Jae-Hyun;Park, Hyeon-Gyu;Lee, Hyun-Suck;Chin, Hui-Cheol;Kim, Hyo-Tae;Kim, Kyung-Tae
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.23 no.2 / pp.169-176 / 2012
  • This paper uses loop-star basis functions to overcome the low frequency breakdown problem in the method of moments (MoM) based on the electric field integral equation (EFIE). In addition, the p-Type Multiplicative Schwarz preconditioner (p-MUS) technique is employed to reduce the number of iterations required by the conjugate gradient method (CGM). Low frequency instability with Rao-Wilton-Glisson (RWG) basis functions in the EFIE can be resolved using loop-star basis functions and frequency normalization techniques. However, loop-star basis functions, consisting of irrotational and solenoidal components of RWG basis functions, require a large number of iterations to compute a solution through iterative methods such as CGM, due to a high condition number. To circumvent this problem, this paper proposes the p-MUS preconditioner technique to reduce the number of iterations in CGM. Simulation results show that the p-MUS preconditioner is much faster than the block diagonal preconditioner (BDP) when the sparsity of p-MUS is the same as that of BDP.
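
To make the role of a preconditioner in the conjugate gradient method concrete, here is a minimal sketch of preconditioned CG on a small real symmetric positive definite system. It uses a diagonal (Jacobi) preconditioner, which is the simplest case of a block diagonal preconditioner with 1x1 blocks; it is not the p-MUS technique from the paper, and real MoM matrices are complex-valued and need a suitably adapted solver. All names here are illustrative assumptions.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_preconditioner(A):
    """Return a function that applies the inverse of diag(A) to a vector."""
    diagonal = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, diagonal)]


def pcg(A, b, apply_preconditioner, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the zero initial guess
    if dot(r, r) == 0.0:
        return x, 0
    z = apply_preconditioner(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = apply_preconditioner(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

Passing `lambda r: list(r)` as `apply_preconditioner` gives plain CG, so the returned iteration counts of the two variants can be compared on an ill-conditioned matrix.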

A Multi-Agent framework for Distributed Collaborative Filtering (분산 환경에서의 협력적 여과를 위한 멀티 에이전트 프레임워크)

  • Ji, Ae-Ttie;Yeon, Cheol;Lee, Seung-Hun;Jo, Geun-Sik;Kim, Heung-Nam
    • Journal of Intelligence and Information Systems / v.13 no.3 / pp.119-140 / 2007
  • Recommender systems enable a user to decide which information is interesting and valuable in our world of information overload. As studies of distributed computing environments have progressed, recommender systems, most of which were centralized, have moved toward a peer-to-peer approach. Collaborative Filtering (CF), one of the most successful technologies in recommender systems, has several limitations despite its popularity, namely sparsity, scalability, cold start, and the shilling problem. The move from centralized systems to distributed approaches can partially alleviate two issues: distrust of recommendations and abuse of personal information. However, distributed systems can be vulnerable to attackers, who may inject biased profiles to force systems to adapt to their objectives. In this paper, we consider both effective CF in a P2P environment, to improve the overall performance of the system, and an efficient solution to the problems of personal data abuse and attacks by malicious users. To deal with these issues, we propose a multi-agent framework for distributed CF that focuses on the trust relationships between individuals, i.e., the web of trust. We employ an agent-based approach to improve the efficiency of distributed computing and to propagate trust information among users effectively. The experimental evaluation shows that the proposed method brings significant improvement in the distributed computing of similarity model building and in the robustness of the system against malicious attacks. In future work, we plan to study trust propagation mechanisms that take the trust decay problem into consideration.

Preference Prediction System using Similarity Weight granted Bayesian estimated value and Associative User Clustering (베이지안 추정치가 부여된 유사도 가중치와 연관 사용자 군집을 이용한 선호도 예측 시스템)

  • 정경용;최성용;임기욱;이정현
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.316-325 / 2003
  • Preference prediction methods based on existing collaborative filtering techniques use the nearest-neighbor method on users' item preferences and obtain user similarity from the Pearson correlation coefficient. As a result, they do not reflect any information about item content, nor do they solve the sparsity problem. To complement these shortcomings, this study suggests a preference prediction system that uses similarity weights granted Bayesian estimated values together with associative user clustering. The proposed method groups users by genre using the Association Rule Hypergraph Partitioning algorithm and classifies a new user into one of these genres with a Naive Bayes classifier, to solve the sparsity problem in the collaborative filtering system. In addition, to obtain the similarity between a new user and the users belonging to the classified genre, this study assigns different estimated values, obtained through Naive Bayes learning, to the items that users have rated. Applying the preferences with estimated values to the existing Pearson correlation coefficient improves prediction precision by reducing the prediction error caused by missing values. To evaluate its performance, the suggested method is compared with existing collaborative filtering techniques. The results show that the proposed method improves prediction accuracy by solving problems of existing collaborative filtering techniques.

Video Replay by Frame Receive Order Relocation Method in the Wire and Wireless Network (유무선 네트워크에서 프레임 수신 순서 재할당 방법을 사용한 동영상 재생)

  • Kang, Dong-Jin;Kim, Dong-Hoi
    • Journal of Digital Contents Society / v.17 no.3 / pp.135-142 / 2016
  • When a video service is simulated using NS-2 (Network Simulator 2), video is replayed in the order in which frames are received. In this existing replay method, the frame orders at the receiver and the transmitter differ, so the receiver buffer loses the regular packet size between frames that the transmitter buffer maintains; the unpredictable reordering of some received frames makes packet sizes irregular and produces dense and sparse packet regions in the receiver buffer. This dense-and-sparse phenomenon increases the probability of buffer overflow and underflow. To prevent these problems, the proposed frame receive order relocation method adds an extra replay buffer that rearranges received frames into transmit order, so that the packets between frames keep the regular size of the transmitted frames. Through simulation using NS-2 and JSVM (Joint Scalable Video Model), the number of buffer overflows and underflows and the PSNR (Peak Signal to Noise Ratio) performance of the existing and proposed methods were compared. The results show that the proposed method performs better than the existing method.
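
The core idea of the extra replay buffer, releasing frames in transmit order even when they arrive out of order, can be sketched with a min-heap keyed on a sequence number. This is only an illustration of the reordering step under assumed inputs (pairs of a zero-based sequence number and a payload); it does not model NS-2, JSVM, buffer capacity, or frame loss, and the function name is hypothetical.

```python
import heapq


def reorder_frames(received):
    """Yield (sequence_number, payload) pairs in transmit order.

    `received` is an iterable of pairs in arrival order. A frame is held
    back until every frame with a smaller sequence number has been released.
    """
    pending = []   # min-heap of frames waiting for their turn
    next_seq = 0   # sequence number that should be replayed next
    for seq, payload in received:
        heapq.heappush(pending, (seq, payload))
        # Release every frame that is now contiguous with what was replayed.
        while pending and pending[0][0] == next_seq:
            yield heapq.heappop(pending)
            next_seq += 1
```

In this sketch a frame that never arrives stalls every later frame, so a practical buffer would also need a timeout or skip policy, which the abstract does not describe.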

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label (약한 레이블을 이용한 확장 합성곱 신경망과 게이트 선형 유닛 기반 음향 이벤트 검출 및 태깅 알고리즘)

  • Park, Chungho;Kim, Donghyun;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.414-423 / 2020
  • In this paper, we propose a Dilated Convolution Gate Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation map extraction approaches have shown improved performance in noisy environments. However, these methods must maintain the size of the feature map in order to extract the segmentation map, so the model is constructed without a pooling operation. As a result, their performance deteriorates because of a lack of sparsity and a small receptive field. To mitigate these problems, we utilize a GLU to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learning parameters. For the performance evaluation, we employ the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that our proposed DCGLU model outperforms the other baselines. In particular, our method is shown to be robust against natural sound noise at three Signal to Noise Ratio (SNR) levels (20 dB, 10 dB and 0 dB).
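
The two building blocks named above can be illustrated in one dimension: a dilated convolution widens the receptive field without adding weights, and a gated linear unit multiplies one branch by the sigmoid of another. The sketch below is a plain-Python illustration with assumed function names and zero padding; it is not the DCGLU architecture itself, whose layer layout the abstract does not specify.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated cross-correlation with zero padding."""
    pad_left = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - pad_left + j * dilation  # taps are `dilation` samples apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def glu(values, gates):
    """Gated linear unit: values * sigmoid(gates), element-wise."""
    return [v / (1.0 + math.exp(-g)) for v, g in zip(values, gates)]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, three stacked layers with kernel size 3 and dilations 1, 2 and 4 cover 15 input samples, compared with 7 for three undilated layers, while using the same number of weights.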