• Title/Summary/Keyword: Graph Dataset

Road network data matching using the network division technique (네트워크 분할 기법을 이용한 도로 네트워크 데이터 정합)

  • Huh, Yong;Son, Whamin;Lee, Jeabin
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.4 / pp.285-292 / 2013
  • This study proposes a network matching method based on a network division technique. The proposed method generates polygons enclosed by the links of the original network dataset and detects corresponding polygon group pairs using intersection-based graph clustering. Corresponding sub-network pairs are then obtained from the polygon group pairs. To perform geometric correction between them, the Iterative Closest Points algorithm is applied to the nodes of each corresponding sub-network pair. Finally, Hausdorff distance analysis is applied to find matching link pairs between the networks. To assess the feasibility of the algorithm, we apply it to networks from the KTDB center and a commercial CNS company. In the experiments, Hausdorff distance thresholds from 3 m to 18 m at 3 m intervals are tested, and an F-measure of 0.99 is obtained with a threshold of 15 m.
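
The final matching step in the abstract above can be illustrated with a minimal sketch. It assumes each link is given as a list of (x, y) vertices in metres and approximates the Hausdorff distance on those vertices only, which is a simplification of a true polyline distance. The function names (`hausdorff`, `match_links`, `f_measure`) are hypothetical and not taken from the paper.

```python
import math

def directed_hausdorff(a, b):
    """Largest distance from any vertex of `a` to its nearest vertex of `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

def match_links(links_a, links_b, threshold):
    """Pair each link of A with its closest link of B if within `threshold`.

    Assumption: one-directional nearest-link matching; the paper may use
    a different pairing rule.
    """
    pairs = []
    for i, link_a in enumerate(links_a):
        distances = [hausdorff(link_a, link_b) for link_b in links_b]
        best = min(range(len(links_b)), key=distances.__getitem__)
        if distances[best] <= threshold:
            pairs.append((i, best))
    return pairs

def f_measure(predicted, truth):
    """Harmonic mean of precision and recall over sets of link pairs."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)
```

Sweeping `threshold` over 3, 6, ..., 18 and evaluating `f_measure` against a reference matching would mirror the threshold experiment described in the abstract.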

Ant Colony Hierarchical Cluster Analysis (개미 군락 시스템을 이용한 계층적 클러스터 분석)

  • Kang, Mun-Su;Choi, Young-Sik
    • Journal of Internet Computing and Services / v.15 no.5 / pp.95-105 / 2014
  • In this paper, we present a novel ant-based hierarchical clustering algorithm in which ants repeatedly hop from one node to another over a weighted directed k-nearest-neighbor graph built from a given dataset. We introduce the notion of node pheromone, which is the sum of the pheromone on the arcs entering a node. The node pheromone can be regarded as a relative density measure in a local region. After a finite number of ant hops, we remove nodes with a small amount of node pheromone from the directed graph and obtain a group of strongly connected components as clusters. We repeat this removal process from a low threshold value to a high one, yielding a hierarchy of clusters. We demonstrate the performance of the proposed algorithm on synthetic and real datasets, comparing it with traditional clustering methods. Experimental results show the superiority of the proposed method over the traditional methods.
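
The pruning step described above might be sketched as follows. The sketch assumes the ants have already finished hopping and that the resulting arc pheromone is given as a dictionary keyed by (source, target); how the ants deposit pheromone is not specified in the abstract and is not modelled here. All function names are hypothetical, and Kosaraju's algorithm is used as one standard way to find strongly connected components.

```python
def node_pheromone(arc_pheromone):
    """Sum the pheromone on the arcs entering each node."""
    totals = {}
    for (src, dst), amount in arc_pheromone.items():
        totals.setdefault(src, 0.0)
        totals[dst] = totals.get(dst, 0.0) + amount
    return totals

def strongly_connected_components(nodes, arcs):
    """Kosaraju's algorithm with iterative depth-first searches."""
    forward = {n: [] for n in nodes}
    backward = {n: [] for n in nodes}
    for src, dst in arcs:
        forward[src].append(dst)
        backward[dst].append(src)
    order, seen = [], set()
    for start in nodes:  # first pass: record finishing order
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()
    components, assigned = [], set()
    for root in reversed(order):  # second pass: reversed graph
        if root in assigned:
            continue
        assigned.add(root)
        component, stack = set(), [root]
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components

def clusters_at(arc_pheromone, threshold):
    """Drop nodes whose pheromone is below `threshold`, then return the SCCs."""
    totals = node_pheromone(arc_pheromone)
    kept = {n for n, total in totals.items() if total >= threshold}
    arcs = [(s, d) for (s, d) in arc_pheromone if s in kept and d in kept]
    return strongly_connected_components(sorted(kept), arcs)
```

Calling `clusters_at` with an increasing sequence of thresholds gives progressively finer clusters, which is the hierarchy the abstract refers to.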

Student Group Division Algorithm based on Multi-view Attribute Heterogeneous Information Network

  • Jia, Xibin;Lu, Zijia;Mi, Qing;An, Zhefeng;Li, Xiaoyong;Hong, Min
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3836-3854 / 2022
  • Student group division helps universities manage students based on group profiles. With the widespread use of student smart cards on campus, especially where students live in campus residence halls, students' daily activities are recorded with information such as card swiping time and location. It is therefore feasible to characterize students by their daily activity data and to group them using objective measures of campus behavior together with the regular student attributes collected in the management system. However, feature representation is challenging because student data take diverse forms. To represent students' behaviors effectively and comprehensively for group division, we propose to use activity data from student smart cards and student attributes as input, taking into account activity and attribute relationship types from different perspectives. Specifically, we propose a novel student group division method based on a multi-view student attribute heterogeneous information network (MSA-HIN). The network nodes in MSA-HIN represent students with their multi-dimensional attribute information, while the edges characterize different relationships between students, such as co-major, co-occurrence, and co-borrowing of books. Based on the MSA-HIN, embedded representations of students are learned and a deep graph clustering algorithm is applied to divide students into groups. Comparative experiments were conducted on a real-life campus dataset collected from a university. The experimental results demonstrate that our method effectively reveals the variability of student attributes and relationships and accordingly achieves the best clustering results for group division.

A Max-Flow-Based Similarity Measure for Spectral Clustering

  • Cao, Jiangzhong;Chen, Pei;Zheng, Yun;Dai, Qingyun
    • ETRI Journal / v.35 no.2 / pp.311-320 / 2013
  • In most spectral clustering approaches, a Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which satisfies the requirement for similarity in the clustering method. Additionally, the new similarity carries both the global and local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experimental results show the superiority of the new similarity: 1) the max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) it is robust and not sensitive to the parameters.
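
As a rough illustration of using maximum flow as a pairwise similarity, the sketch below runs the Edmonds-Karp algorithm on a small capacity graph. The graph, its capacities, and the function name `max_flow` are assumptions for illustration only; the abstract does not state how the paper builds the graph or assigns edge capacities.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity map.

    For an undirected graph, list each edge in both directions.
    """
    # Residual graph: a copy of the capacities plus zero-capacity reverse arcs.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

Evaluating `max_flow` for every pair of points would fill an affinity matrix that could then be passed to a standard spectral clustering routine in place of the Gaussian kernel matrix.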

A New Perspective to Stable Marriage Problem in Profit Maximization of Matrimonial Websites

  • Bhatnagar, Aniket;Gambhir, Varun;Thakur, Manish Kumar
    • Journal of Information Processing Systems / v.14 no.4 / pp.961-979 / 2018
  • For many years, matching in a bipartite graph has been widely used in various assignment problems, such as the stable marriage problem (SMP). As an application of bipartite matching, the stable marriage problem is defined over equally sized sets of men and women and seeks a stable matching in which each person is assigned a partner of the opposite gender according to their preferences. The classical SMP proposed by Gale and Shapley uses a preference list for each individual, which is infeasible in real-world applications with a large population of men and women, such as matrimonial websites. In this paper, we propose an enhancement to the SMP that computes a weighted score for the users registered at matrimonial websites. The enhancement is formulated as profit maximization for matrimonial websites in terms of their ability to provide a suitable match for their users, which leads to a combinatorial optimization problem. We propose greedy and genetic algorithm based approaches to solve this optimization problem, and show that the genetic algorithm based approaches outperform the existing Gale-Shapley algorithm on a dataset crawled from matrimonial websites.
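
For reference, the classical Gale-Shapley baseline mentioned above can be sketched as follows. This is the textbook deferred-acceptance procedure, not the weighted-score enhancement the paper proposes, and it assumes two equally sized groups with complete preference lists. The function and argument names are illustrative.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return a stable matching as a dict mapping proposer -> receiver.

    Both arguments map each person to a list of all members of the
    other group, ordered from most to least preferred.
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next index to propose to
    engaged_to = {}                               # receiver -> proposer
    free = list(proposer_prefs)
    while free:
        proposer = free.pop()
        receiver = proposer_prefs[proposer][next_choice[proposer]]
        next_choice[proposer] += 1
        current = engaged_to.get(receiver)
        if current is None:
            engaged_to[receiver] = proposer
        elif rank[receiver][proposer] < rank[receiver][current]:
            # The receiver prefers the new proposer and releases the old one.
            engaged_to[receiver] = proposer
            free.append(current)
        else:
            free.append(proposer)
    return {p: r for r, p in engaged_to.items()}
```

Each proposer stores a full ranking of the other group, so memory grows quadratically with the number of users, which is the scalability concern the abstract raises for large matrimonial sites.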

Transitive Similarity Evaluation Model for Improving Sparsity in Collaborative Filtering (협업필터링의 희박 행렬 문제를 위한 이행적 유사도 평가 모델)

  • Bae, Eun-Young;Yu, Seok-Jong
    • The Journal of Korean Institute of Information Technology / v.16 no.12 / pp.109-114 / 2018
  • Collaborative filtering has been widely used in recommender systems as a typical algorithm because of its outstanding performance. Because it structurally depends on item rating history, the sparser the rating matrix is, the lower its recommendation accuracy becomes, and sometimes it is entirely useless. A variety of hybrid approaches have tried to combine collaborative filtering with content-based methods to mitigate the sparsity issue in the rating matrix. This study suggests a new method for the same purpose but from a different perspective: it deals with the no-match situation in person-to-person similarity evaluation. The method is called the transitive similarity model because it is based on a relation graph of people, and its recommendation accuracy is compared by applying it to the MovieLens open dataset.

A Synthetic Dataset for Korean Knowledge Graph-to-Text Generation (한국어 지식 그래프-투-텍스트 생성을 위한 데이터셋 자동 구축)

  • Dahyun Jung;Seungyoon Lee;SeungJun Lee;Jaehyung Seo;Sugyeong Eo;Chanjun Park;Yuna Hur;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.219-224 / 2022
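  • Recently, research on generating natural language text from knowledge graphs has become important as a way to address the limitations that deep learning cannot infer commonsense information and is not interpretable. However, this requires a large number of knowledge graphs paired with corresponding sentences, and building such pairs takes considerable time and cost. In addition, because multiple sentences can be generated from a single graph, quality varies between annotators and the data become inconsistent. This paper therefore proposes a methodology for building the data automatically, easily, and quickly, without expert help, using the public knowledge graph DBpedia. Based on this dataset, we conducted sentence generation experiments with large language models that cover Korean, such as KoBART, mBART, and mT5. The experiments showed that the model fine-tuned from mBART performed well and was effective at generating natural sentences.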

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia Pacific Journal of Information Systems / v.21 no.2 / pp.89-116 / 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to by more higher-scored pages. HITS differs from PageRank in that it uses two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS based on the links to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to the way humans evaluate, where different items are assigned specific weights, which are then combined into a weighted average. We can check for missing properties more easily with this approach than with other predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his or her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Because the popularity of a topic in social media is temporary, recent data should have more weight than old data.
We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through Twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms.
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed with the ranking factors combined, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
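
To make the HITS baseline discussed in this abstract concrete, here is a minimal sketch of the standard hub and authority iteration on a directed edge list. It illustrates plain HITS only, not the paper's mutual-interaction framework or the earlier HITS-based folksonomy algorithm it compares against. The function name, the edge-list input, and the fixed iteration count are illustrative choices.

```python
import math

def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (source, target) links."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    authority = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        authority = {n: 0.0 for n in nodes}
        for src, dst in edges:
            authority[dst] += hub[src]
        # Hub update: sum the authority scores of the pages linked to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += authority[dst]
        # Normalize both vectors so the scores stay bounded.
        for scores in (authority, hub):
            norm = math.sqrt(sum(value * value for value in scores.values()))
            if norm > 0:
                for n in scores:
                    scores[n] /= norm
    return hub, authority
```

Each iteration performs two passes over the links, one for authorities and one for hubs, which corresponds to the two matrix products per step that the abstract points to when comparing computation cost.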

Personalized Session-based Recommendation for Set-Top Box Audience Targeting (셋톱박스 오디언스 타겟팅을 위한 세션 기반 개인화 추천 시스템 개발)

  • Jisoo Cha;Koosup Jeong;Wooyoung Kim;Jaewon Yang;Sangduk Baek;Wonjun Lee;Seoho Jang;Taejoon Park;Chanwoo Jeong;Wooju Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.323-338 / 2023
  • TV advertising based on deep analysis of audiences' watching patterns is important for set-top box audience targeting. Previous studies showed that applying a session-based recommendation (SBR) model to internet commercials, or recommending based on a user's search history, is effective, but applying SBR to TV advertising has been difficult in South Korea because the data were unavailable. In addition, traditional SBR is limited in handling user preferences, especially in data with user identification information. To tackle these problems, we first obtain set-top box data from three major broadcasting companies in South Korea (SKB, KT, LGU+) through collaboration with the Korea Broadcast Advertising Corporation (KOBACO); the data contain the watching sequences of 4,847 anonymized users over 6 months, respectively. Second, we develop a personalized session-based recommendation model to handle the hierarchical user-session-item data. Experiments were conducted on the set-top box audience dataset and two other public datasets for validation. As a result, our proposed model outperformed the baseline model on some criteria.

Building change detection in high spatial resolution images using deep learning and graph model (딥러닝과 그래프 모델을 활용한 고해상도 영상의 건물 변화탐지)

  • Park, Seula;Song, Ahram
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.3 / pp.227-237 / 2022
  • The most critical factors for detecting changes in very high-resolution satellite images are building positional inconsistencies and relief displacements caused by the satellite's side view. To resolve these problems, additional processing using a digital elevation model and a deep learning approach have been proposed. Unfortunately, these approaches are not sufficiently effective. This study proposes a change detection method that considers both the positional and topological information of buildings. Mask R-CNN (Region-based Convolutional Neural Network) was trained on the SpaceNet building detection v2 dataset, and the central point of each building was extracted as a building node. Triangulated irregular network graphs were then created on the building nodes from the temporal images. To extract the areas where the two graphs differ structurally, a change index reflecting the similarity of the graphs and the differences in the location of building nodes was proposed. Finally, newly built or deleted buildings were detected by comparing the two graphs. Three pairs of test sites were selected to evaluate the effectiveness of the proposed method, and the results showed that changed buildings were detected in side-view satellite images with building positional inconsistencies.