1. Introduction
Network embedding, or graph embedding, refers to the task of embedding the nodes of a network into a low-dimensional vector space. All nodes of a network such as a social network [1], mobile application (app) network [2, 3], or scientific paper network [4, 5] can be embedded through explicit or indirect connections. Embedding nodes enables similarities between nodes to be calculated and supports a variety of tasks such as node labeling, clustering, link prediction, visualization, and graph analysis [1]. Clustering is a representative task that benefits from embeddings: it groups similar nodes or documents and extracts insights from large document collections without an expensive prebuilt training set. Recently, clustering techniques have become frequently used for big data analysis.
Network-based clustering groups nodes quickly and effectively by analyzing the importance of nodes and the strength of the relationships between them on a graph. Network-based clustering methods generally perform better because nodes are treated as connected objects rather than as independent ones [3].
In recent years, research that applies the word2vec algorithm [6], widely used for word embedding, to network embedding has become mainstream. Word2vec-based methods embed a node by predicting its neighbor nodes; a randomized procedure such as a random walk is used to sample those neighbors. DeepWalk [7] assumes that nodes sampled together more frequently during random walks over the graph are closer. Node2vec [8] improves performance by combining breadth-first and depth-first search strategies.
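For illustration, a minimal truncated random-walk sampler in the spirit of DeepWalk might look as follows (a sketch under assumed inputs: `adj` is an adjacency-list dictionary, and the walk length is illustrative):

```python
import random

def random_walk(adj, start, walk_length=10):
    """One truncated random walk from `start`; `adj` maps a node to its neighbor list."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return walk

# Toy graph: the resulting walks act as "sentences" for a skip-gram model.
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
walks = [random_walk(adj, v) for v in adj for _ in range(5)]
```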
Although existing methods build graphs through explicit connections, the connections between nodes can be determined in various ways. Fig. 1 shows graphs constructed differently according to various information about a mobile app. The graph on the left forms links between apps that share the same word in the title, and the middle graph connects apps created by the same developer. The third graph forms links between apps according to two topics: “constellation” and “camera.” All three graphs were built with different views of the same apps, reflecting three different characteristics. In this paper, we propose a multi-channel embedding method that considers various types of graphs simultaneously. When embedding a node, the proposed method learns various latent characteristics of the node. When using multi-channel information, the importance of each channel should be considered: it is intuitive to give more weight to channels with more important information and to ignore channels that are of little help. To this end, we propose a method of embedding a node by adding a gate layer, including a reset gate, to the neural network.
Fig. 1. Example of graphs built using different information.
The contributions of this paper are as follows. First, we propose a novel method that uses different types of interconnections simultaneously to embed a node through gated multi-channel embedding; this not only enables the method to learn several characteristics of the node but also forces the node embeddings to focus on more important information channels, leading to better performance on the benchmark test set. Second, we propose an indirect interconnection method that can be applied to all domains: using topic analysis results derived from another method, we were able to create a graph suitable for clustering tasks. Lastly, we show that the proposed method is effective for big data analysis by applying it to a large dataset.
2. Literature Review
2.1 Neural network-based embedding
The word2vec algorithm, which is based on a neural network, converts a high-dimensional integer vector represented by one-hot encoding into a low-dimensional real vector. CBOW and skip-gram are the two methods that embed words with the word2vec algorithm [6]. CBOW learns to predict a word from its surrounding words, whereas skip-gram learns to predict the surrounding words from an input word. The learned projection layer is used to encode a one-hot encoded word vector into a low-dimensional vector. As extensions of word-based representations, low-dimensional vector representations have been studied for sentences, documents, and semantic units [6, 9, 10, 11, 12, 13, 14]. An embedding method called doc2vec, which is similar to word2vec, has been proposed to embed documents [9] and outperforms word2vec when comparing the similarity of documents [15]. Other studies have used knowledge bases to improve embeddings: some have improved performance using the WordNet knowledge base [13, 16], whereas others have done so in a semi-supervised way [2]. In this paper, we improve the embedding performance using meta information and the topic analysis results of the latent Dirichlet allocation (LDA) model [17].
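For concreteness, both variants are available in the gensim library; a minimal sketch on a toy corpus (the corpus contents and hyperparameters are illustrative, not those of the cited papers):

```python
from gensim.models import Word2Vec

# Each document is a list of tokens; real corpora are of course much larger.
corpus = [["star", "camera", "photo"], ["star", "map", "constellation"]]

# sg=1 selects skip-gram (predict context words from the input word);
# sg=0 would select CBOW (predict the input word from its context).
model = Word2Vec(corpus, vector_size=50, window=2, sg=1, negative=5, min_count=1)
vec = model.wv["star"]  # the learned low-dimensional vector for "star"
```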
2.2 Graph Embedding
Some studies have investigated embedding nodes in graphs by extending text embedding methods [7, 8, 18]. Various types of graphs for social networks [1], scientific papers [4, 5], protein interactions [8], drug-target interactions [19], and mobile apps [2, 3] have been embedded to predict, cluster, or visualize nodes. Many previous approaches focus on the sampling strategy used to select target nodes for prediction. DeepWalk [7] and node2vec [8] use randomized sampling methods to select target nodes. LINE [18] is an edge-sampling approach for embedding large networks that takes both first- and second-order proximity into account to avoid underfitting. In this paper, we use only first-order proximity to reduce complexity and rely on multi-channel information to avoid underfitting. Text is one of the primary sources for node representations: a graph can be built by interconnecting nodes that share words in their texts [4]. PTE [20] is a semi-supervised embedding method that uses text networks instead of an unsupervised text embedding method such as skip-gram. Yoon et al. [3] also used a text network to embed a large number of mobile apps with a description set. In this study, we likewise use a text network as one of the basic channels for embedding nodes.
Recently, heterogeneous network embedding methods have been studied to exploit information from multiple sources. NEDTP [19] builds a node similarity network from 15 heterogeneous information networks to predict drug-target interactions. Duong et al. [21] introduced a heterogeneous network embedding method using message passing [22] for an information retrieval system. MixSp [23] studies multiple networks and forces node embeddings to be similar across the networks with cross-view co-regularization. However, these methods treat all networks with equal weight and do not distinguish important information from negligible information. In this paper, we propose a method that assigns high weight to important information and low weight to unnecessary information.
2.3 Network-based Clustering
Network clustering is an unsupervised approach to grouping network objects by affinity [4, 24, 25, 26, 27]; document networks and social networks are often clustered to analyze datasets without much human effort. Network-based clustering differs from content-based clustering in that it clusters documents based on connectivity. The network-based approach has outperformed content-based clustering in certain tasks [20, 3] because it uses human-labeled information such as citations and tags. Academic papers with citations [4, 5], web documents with hypertext [25, 28], social networks [29], and news [30] with text networks have been clustered using the network-based approach. In this paper, we apply network-based clustering to a large volume of mobile apps using three channels: text links, developer links, and topic links.
3. Multi-channel Graph Embedding
In this paper, multi-channel graph embedding means learning a mapping \(f: V \to \mathbb{R}^d\) when edge subsets \(E_k \in E\) and graphs \(G_k \in G\) are given for a node set \(V\), an edge set \(E\), and a graph set \(G\). We propose a method for multi-channel embedding. The DeepWalk algorithm embeds a network based on negative sampling and, as shown in Fig. 2, can learn to predict neighboring nodes for every channel, so it is easily extended to multiple channels. However, when embedding a node, it does not consider the difference in the amount of information contained in each channel. The proposed method adds a gate layer to the negative sampling algorithm and considers the importance of each channel.
Fig. 2. DeepWalk with multi-channel information.
3.1 DeepWalk with Multiple Channels
The DeepWalk algorithm with multiple channels learns a node embedding function \(V \to \mathbb{R}^d\) when graphs \(G_k = (V, E_k)\), \(G_k \in G\), are given over the node set \(V\). The function \(f\) is shared across all graphs, as shown in Fig. 2. For the given graphs \(G_k = (V, E_k)\), \(G_k \in G\), the algorithm minimizes the negative log-likelihood of the following summation and learns \(k\) embedding functions \(f_i \in f\), each \(f_i: V \to \mathbb{R}^d\), as follows:
\(\begin{aligned}\underset{f}{\operatorname{argmin}} \sum_{f_{i} \in f,\, N_{i}^{s} \in N^{s}} \sum_{v \in V}-\log P\left(N_{i}^{s}(v) \mid f_{i}(v)\right)\end{aligned}\)
Here, \(N^s(v)\) denotes the set of neighbor nodes of \(v\), and \(N_i^s(v)\) denotes the set of neighbor nodes of \(v\) in the \(i\)-th graph. In this method, neighbor nodes are sampled from a graph using a sequence-based method [7]: the neighbors \(N^s(v)\) are selected within a window \(\omega\) of the node sequence generated by a random walk. Because \(v\) and \(f(v)\) have a one-to-one mapping relationship, the probability of predicting the neighbor set \(N^s(v)\) from the embedding of \(v\) can be defined as follows:
\(\begin{aligned}\log P\left(N^{s}(v) \mid f(v)\right)=\sum_{n_{i} \in N^{s}(v)} \log P\left(n_{i} \mid v\right)\end{aligned}\)
In the embedding space, the symmetry between node 𝑣 and neighbor nodes must be satisfied. This is because the similarity between nodes 𝑣1 and 𝑣2 should be constant. The softmax function satisfies this property.
\(\begin{aligned}P(n \mid v)=\frac{\exp (f(n) \cdot f(v))}{\sum_{u \in V} \exp (f(u) \cdot f(v))}\end{aligned}\)
Using the above formulas, the summation to be optimized can be calculated.
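Because the softmax denominator runs over all of \(V\), it is in practice approximated with negative sampling, as in word2vec [6]. A minimal sketch of the resulting per-pair loss (tensor shapes and the number of negatives are assumptions):

```python
import torch
import torch.nn.functional as F

def neg_sampling_loss(f_v, f_n, f_neg):
    """Negative-sampling approximation of -log P(n | v).
    f_v: (d,) center-node embedding; f_n: (d,) neighbor embedding;
    f_neg: (m, d) embeddings of m sampled negative nodes."""
    pos = F.logsigmoid(torch.dot(f_n, f_v))   # pull the true neighbor closer
    neg = F.logsigmoid(-(f_neg @ f_v)).sum()  # push sampled negatives away
    return -(pos + neg)

d = 16
loss = neg_sampling_loss(torch.randn(d), torch.randn(d), torch.randn(5, d))
```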
3.2 Gated Multi-channel Network Embedding
Arevalo et al. [31] proposed a multimodal gate unit to classify movie genres using both image and text modalities. Inspired by this work, the proposed method selects among channels when embedding nodes by adding a multi-channel gate layer to the negative sampling algorithm for network embedding.

Learning from multiple sources is closely related to information fusion, which seeks the optimal way to select the necessary information from various sources to predict the correct answer. The simplest and most effective fusion method is an ensemble that takes a weighted sum of, or vote over, the predictions of models trained on each source; however, obtaining a result requires running several models simultaneously. Late fusion, which feeds multi-source information into a single model and fuses it at the final decision stage, has therefore been studied recently. Another approach, early fusion, combines information from each source at the feature level: information selected from each source is projected onto a shared latent vector space and used for prediction. Fusion-based methods are commonly used for classification tasks, typically with neural networks such as CNNs and RNNs augmented with an additional layer that handles the fusion.

In this paper, a novel information fusion method for node embedding is introduced. It embeds network nodes by selecting important information from multiple sources through a gate layer added to the negative sampling algorithm; that is, a gate layer similar to those used in GRUs or LSTMs is added to the multi-channel DeepWalk algorithm introduced in Section 3.1. Fig. 3(a) shows the negative sampling algorithm with the proposed gate layer. In the DeepWalk algorithm, a node with multiple graphs is simply projected onto a shared latent semantic space using negative sampling; this assumes that all graphs have the same weight and does not distinguish node representations between graphs. To solve this problem, we propose the gate layer shown in Fig. 3(b). Each \(x_i\) represents node \(x\) as represented in graph \(i\). The update gate adjusts how much \(x_i\) contributes to the output vector, and the reset gate determines how much information to ignore when \(x_i\) contains unnecessary information. The equations governing the multi-channel gate layer are as follows:
Fig. 3. Network embedding with a multi-channel gate layer.
\(\begin{aligned}z_{i} &= \sigma\Big(\sum_{j=1}^{k} W_{j}^{(z)} \cdot x_{j}\Big) \\ r_{i} &= \sigma\Big(\sum_{j=1}^{k} W_{j}^{(r)} \cdot x_{j}\Big) \\ h_{i} &= \tanh\Big(x_{i}+z_{i}+r_{i} \otimes \sum_{j \neq i}^{k} x_{j}\Big) \\ h &= \sum_{i=1}^{k} W_{i}^{(h)} \cdot h_{i}\end{aligned}\)
\(\Theta = \{W_i^{(z)}, W_i^{(r)}, W_i^{(h)} \mid i \in \{1, \ldots, k\}\}\)
\(z_i\) and \(r_i\) denote the outputs of the update gate and the reset gate, respectively, and \(\Theta\) denotes the parameters to be learned. With the gate layer, the probability of predicting a neighbor \(n\) of \(v\) can be defined as follows:
\(\begin{aligned}P(n \mid v)=\frac{\exp (f(MG(n)) \cdot f(MG(v)))}{\sum_{u \in V} \exp (f(MG(u)) \cdot f(MG(v)))}\end{aligned}\)
where \(MG\) denotes the multi-channel gate layer.
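A minimal sketch of the gate layer under our reading of the equations above (the layer shapes, gates shared across channels, and the final linear combination are assumptions, not the reference implementation):

```python
import torch
import torch.nn as nn

class MultiChannelGate(nn.Module):
    """Update gate z scales each channel's own contribution; reset gate r scales
    the cross-channel term summed over the other channels."""
    def __init__(self, k, d):
        super().__init__()
        self.Wz = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(k))
        self.Wr = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(k))
        self.Wh = nn.ModuleList(nn.Linear(d, d, bias=False) for _ in range(k))

    def forward(self, xs):  # xs: list of k channel vectors x_i, each of shape (d,)
        z = torch.sigmoid(sum(W(x) for W, x in zip(self.Wz, xs)))  # update gate
        r = torch.sigmoid(sum(W(x) for W, x in zip(self.Wr, xs)))  # reset gate
        hs = [torch.tanh(x + z + r * sum(xj for j, xj in enumerate(xs) if j != i))
              for i, x in enumerate(xs)]
        return sum(W(h) for W, h in zip(self.Wh, hs))              # fused vector h

mg = MultiChannelGate(k=3, d=16)
h = mg([torch.randn(16) for _ in range(3)])
```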
4. Evaluations
4.1 Dataset and Experimental Setup
To evaluate the effectiveness of the proposed multi-channel embedding methods, we applied the methods to a fine-grained clustering task. For the graph set, we used a mobile app dataset collected by web crawling the Google Play Store.
For the test dataset, we used a fine-grained clustering dataset [3]. The Google Play Store has fewer than 60 default categories from which developers select when registering their apps. The following categories were selected for the test dataset: lifestyle, education, travel, tools, and entertainment. The details of the datasets are given in Table 1. The test set was divided into two sets to evaluate the single-labeling and multi-labeling clustering performance of the multi-channel embedding model: in the first set, each app is assigned to a single cluster; in the second set, each app can be assigned to multiple clusters. The mobile app datasets are formally defined as follows:
Table 1. Test dataset. Five categories were selected in which various types of apps were registered.
• MobileApp Graphs: \(G_k \in G\) and \(G_k = \langle V, E_k \rangle\). Each \(G_k\) has its own rule for building the edges \(E_k\) between nodes, as follows (a construction sketch appears after the list):
– 𝐸1: Connect apps sharing some title words;
– 𝐸2: Connect apps sharing a topic;
– 𝐸3: Connect apps built by the same developer.
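A sketch of how the three edge sets might be built (the field names, toy records, and exact matching rules are assumptions for illustration):

```python
from itertools import combinations
import networkx as nx

# Toy app records standing in for the crawled metadata.
apps = [
    {"id": "a1", "title": "star camera", "developer": "d1", "topic": "camera"},
    {"id": "a2", "title": "night camera", "developer": "d2", "topic": "camera"},
    {"id": "a3", "title": "star map", "developer": "d1", "topic": "constellation"},
]

def build_graph(apps, share):
    """Connect two apps whenever the predicate `share` holds for the pair."""
    g = nx.Graph()
    g.add_nodes_from(a["id"] for a in apps)
    for a, b in combinations(apps, 2):
        if share(a, b):
            g.add_edge(a["id"], b["id"])
    return g

g1 = build_graph(apps, lambda a, b: bool(set(a["title"].split()) & set(b["title"].split())))  # E1
g2 = build_graph(apps, lambda a, b: a["topic"] == b["topic"])          # E2: shared topic
g3 = build_graph(apps, lambda a, b: a["developer"] == b["developer"])  # E3: same developer
```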
For our experiments, we used a system equipped with an Intel i9 CPU clocked at 4.6 GHz and 128 GiB of main memory.
4.2 Clustering Performance
To evaluate the mobile app clustering performance of the multi-channel network embedding algorithm GMNE, we compared it with several existing embedding algorithms, including not only neural-network-based embedding algorithms but also traditional frequency-based algorithms. All methods compared in the evaluation are listed below:
• TF: Nodes are embedded based on the term frequency in a description. The dimension of the embedding is decided by the total number of words 𝑁.
• LDA: To embed nodes as vectors, a topic-score vector is computed from the description of each mobile app using LDA [17]. The number of topics was set to 124 via parameter tuning.
• Doc2vec: The doc2vec algorithm [9] is used for node embedding. For this evaluation, we set the dimension size to 300, sampled five negative examples, and set the window to 8.
• DeepWalk: Network embedding using the word2vec algorithm [7]. This algorithm selects neighbors with a random-walk strategy; nodes sharing title words are connected as neighbors.
• MDeepWalk: As shown in Fig. 2, we applied the DeepWalk algorithm for multi-channel embedding using the three channels introduced in Section 4.1. The algorithm predicts neighboring nodes in each of the three channels and reflects all their characteristics in the final node embeddings.
• GMNE: Fig. 3 depicts the proposed multichannel embedding algorithm with a gate layer. This layer assigns higher weights to more important nodes and channels, addressing the issues of the MDeepWalk algorithm, which treats all nodes equally.
To evaluate the effectiveness of the embedding algorithms for graph clustering, each embedding was fed to the two following clustering algorithms (a usage sketch follows the list):
• K-means clustering: This algorithm minimizes the within-cluster variance. Each cluster has one centroid, and each object is assigned to the nearest centroid; objects assigned to the same centroid form a cluster. To compare algorithms fairly, the gold-standard number of clusters 𝑘 is given. The lifestyle category was used as the development set, and the hyperparameters were determined using the irace package [32].
• Keyword-based clustering: This clustering algorithm [3] clusters nodes by extracting representative words from a document. This algorithm calculates the probability that each app will be assigned to a specific cluster and assigns the app to the cluster when the probability exceeds a threshold. The advantage of this method is that it can be used for multiple labels.
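For reference, the k-means setup reduces to a few lines with scikit-learn (the embedding matrix and the gold-standard \(k\) below are stand-ins):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(100, 64)  # node embeddings from any method above (stand-in values)
gold_k = 10                  # gold-standard number of clusters, given for fairness
labels = KMeans(n_clusters=gold_k, n_init=10, random_state=0).fit_predict(X)
```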
We used purity and entropy as the evaluation measures to compare the clustering performances. Purity is an external evaluation index for measuring the quality of the clusters and is determined as follows:
\(\begin{aligned}Purity=\frac{1}{N} \sum_{i=1}^{k} \max _{j}\left|c_{i} \cap t_{j}\right|\end{aligned}\)
where 𝑁 denotes the total number of documents, \(c_i\) denotes the \(i\)-th cluster predicted by the system, and \(t_j\) denotes the set of documents carrying the \(j\)-th gold-standard tag; for each cluster, the most frequent tag among its documents is counted. Purity has a value between 0 and 1, and a higher value indicates better performance. Entropy measures the degree of unexpectedness and is determined as follows:
\(\begin{aligned}Entropy=\sum_{i=1}^{k} \frac{N_{i}}{N} \sum_{j}-p\left(t_{j}\right) \log p\left(t_{j}\right)\end{aligned}\)
where 𝑁𝑖 is the number of documents that belong to 𝑐𝑖, and 𝑝(𝑡𝑗) is the probability that a document that belongs to 𝑐𝑖 is classified as 𝑡𝑗.
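Both measures are straightforward to compute from predicted and gold labels; a sketch follows (the logarithm base is our assumption, as the text does not fix it):

```python
import math
from collections import Counter

def purity(pred, gold):
    """Sum, over clusters, of the count of the majority gold tag, divided by N."""
    hit = sum(Counter(g for p, g in zip(pred, gold) if p == c).most_common(1)[0][1]
              for c in set(pred))
    return hit / len(gold)

def entropy(pred, gold):
    """Size-weighted entropy of the gold-tag distribution within each cluster."""
    n, total = len(gold), 0.0
    for c in set(pred):
        tags = [g for p, g in zip(pred, gold) if p == c]
        probs = [v / len(tags) for v in Counter(tags).values()]
        total += (len(tags) / n) * -sum(p * math.log2(p) for p in probs)
    return total

pred = [0, 0, 1, 1, 1]
gold = ["a", "a", "a", "b", "b"]
print(purity(pred, gold))  # 0.8: clusters 0 and 1 contribute 2 + 2 correct of 5
```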
Table 2 shows the performance of each embedding method. The simple term frequency-based embedding obtains average purity and entropy values of 0.42 and 2.83, respectively, the lowest in all categories. The doc2vec and LDA methods have similar average purity; LDA performs better in the lifestyle category, and doc2vec performs better in the entertainment category. The proposed GMNE method outperforms DeepWalk in both purity and entropy; in particular, it improves on DeepWalk by 8% in purity and by 0.15 in entropy, indicating that a model considering various graphs with a gate layer is more effective than a model considering only one graph. Furthermore, GMNE performs better in all categories, indicating that considering multiple graphs and using the gate layer improve the quality of the embeddings.

Table 3 shows the keyword-based clustering performance. The MDeepWalk method also outperforms the existing methods in these experiments. In MDeepWalk, the topic allocations obtained by LDA were used as the input of the topic channel; this indirectly has the effect of an ensemble model and improves performance compared with using LDA or DeepWalk alone. When multiple channels are used, GMNE is more effective than MDeepWalk for both the k-means algorithm and the keyword-based algorithm, suggesting that GMNE better captures the diverse sources of information present in the network and uses them to generate high-quality embeddings for clustering tasks.
Table 2. K-means clustering performance of each embedding method.
Table 3. Keyword-based clustering performance of each embedding method.
4.3 Multi-Labeling Performance
An app can be assigned to multiple clusters depending on its nature. For example, a “star camera” app can be assigned to a constellation topic cluster or a camera topic cluster. Set 2 is the dataset that includes apps with multiple labels; we hence used it to evaluate the performance of the proposed method on the multi-labeling task. The F-measure is used as the evaluation metric. The F-measure of a clustering \(C\), \(F(C)\), is defined as follows:
\(\begin{aligned}F(C)=\sum_{x_{j} \in X} \frac{\left|x_{j}\right|}{|D|} \max _{c_{i} \in C} F, \quad \text{where } F=\frac{2 P R}{P+R},\; P=\frac{\left|c_{i} \cap x_{j}\right|}{\left|c_{i}\right|},\; \text{and } R=\frac{\left|c_{i} \cap x_{j}\right|}{\left|x_{j}\right|}\end{aligned}\)
In general, a high \(F(C)\) value indicates that the clustering is of good quality. In this study, a document \(x\) was allocated to every cluster \(C_i\) with \(p(C_i \mid x) > \text{threshold}\) using the proposed clustering method.
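A direct transcription of \(F(C)\), with clusters and classes represented as document-id sets (the toy values are illustrative):

```python
def f_of_c(clusters, classes, n_docs):
    """F(C): for each gold class x_j, take the best F over predicted clusters c_i,
    weighted by |x_j| / |D|."""
    total = 0.0
    for x in classes:
        best = 0.0
        for c in clusters:
            overlap = len(c & x)
            if overlap:
                p, r = overlap / len(c), overlap / len(x)
                best = max(best, 2 * p * r / (p + r))
        total += len(x) / n_docs * best
    return total

clusters = [{1, 2, 3}, {4, 5}]       # predicted clusters c_i
classes = [{1, 2}, {3, 4, 5}]        # gold multi-label classes x_j
print(f_of_c(clusters, classes, 5))  # 0.8 on this toy example
```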
Fig. 4 shows the average F-measure scores of the embedding algorithms. The GMNE algorithm reaches an F-measure of 0.75 at a threshold of 0.5, higher than doc2vec (0.66 at 0.45), DeepWalk (0.67 at 0.4), and MDeepWalk (0.68 at 0.45). The precision and recall at the highest F-measure are 0.77 and 0.73 for GMNE; 0.72 and 0.61 for doc2vec; 0.70 and 0.64 for DeepWalk; and 0.68 and 0.69 for MDeepWalk. Hence, both the precision and recall of GMNE are higher than those of the other methods. These results demonstrate that the proposed GMNE method is effective for multi-labeling tasks: its ability to capture the multi-channel characteristics of networks enables it to assign apps to multiple clusters accurately, which is useful in applications such as recommendation and content classification.
Fig. 4. Multi-labeling performance.
4.4 Ablation Test
We conducted ablation studies to evaluate the influence of each channel in our model. As expected, MDeepWalk and GMNE obtain the best performance when all channels are used. In k-means clustering, the performance drops most drastically without the text network, while in keyword-based clustering it falls most significantly without the topic network. This reflects the nature of the keyword-based clustering algorithm: because it itself uses textual information, it retains some of the properties of a text network even when the explicit text network is removed.
These results demonstrate that all channels play a crucial role in achieving high performance in the proposed model. Overall, the ablation studies support the effectiveness of the proposed model in app clustering and suggest that all channels should be considered in app clustering tasks.
Table 4. Ablation test results with individual channels removed.
5. Conclusion
In this study, we proposed two methods for embedding nodes using multiple graphs: MDeepWalk, a multi-channel extension of DeepWalk, and GMNE, which adds a multi-channel gate layer; both obtained better results than the baseline methods. The contributions of this study are as follows. First, unlike previous approaches, we used different types of interconnections to embed a node with multi-channel embedding methods; this enables the embedding vector to learn several characteristics of the node. Second, we proposed an indirect interconnection method that can be applied in all domains: by using topic analysis results derived from another method, we were able to create a graph suitable for clustering tasks. Lastly, we proposed a multi-channel gate layer that addresses the differing importance of information across graphs and obtains better results than methods without the gate layer. We believe this study suggests interesting directions for research on network embedding, and there remains considerable room to improve the proposed methods by discovering new types of graphs.
Acknowledgment
This work was supported by the research grant of Jeju National University in 2022.
References
- Goyal, P., Ferrara, E, "Graph embedding techniques, applications, and performance: A survey," Knowledge-Based Systems, 151, 78-94, 2018. https://doi.org/10.1016/j.knosys.2018.03.022
- Yoon, Y.C., Lee, J., Park, S.Y., Lee, C, "Fine-grained mobile application clustering model using retrofitted document embedding," ETRI Journal, 39, 443-454, 2017. https://doi.org/10.4218/etrij.17.0116.0936
- Yoon, Y.C., Gee, H.K., Lim, H, "Network-Based Document Clustering Using External Ranking Loss for Network Embedding," IEEE Access, 7, 155412-155423, 2019. https://doi.org/10.1109/ACCESS.2019.2948662
- Sun, Y., Yu, Y., Han, J, "Ranking-Based Clustering of Heterogeneous Information Networks with Star Network Schema," in Proc. of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797-806, 2009.
- Zhang, S., Xu, Y., Zhang, W, "Clustering scientific document based on an extended citation model," IEEE Access, 7, 57037-57046, 2019. https://doi.org/10.1109/ACCESS.2019.2913995
- Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J, "Distributed representations of words and phrases and their compositionality," in Proc. of the 26th International Conference on Neural Information Processing Systems, vol. 2, pp. 3111-3119, 2013.
- Perozzi, B., Al-Rfou, R., Skiena, S, "Deepwalk: Online learning of social representations," in Proc. of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701-710, 2014.
- Grover, A., Leskovec, J, "node2vec: Scalable feature learning for networks," in Proc. of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 855-864, 2016.
- Le, Q., Mikolov, T, "Distributed representations of sentences and documents," in Proc. of International conference on machine learning, pp. 1188-1196, 2014.
- Dai, A.M., Olah, C., Le, Q.V, "Document embedding with paragraph vectors," arXiv preprint arXiv:1507.07998, 2015.
- Kiros, R., Zhu, Y., Salakhutdinov, R.R., Zemel, R., Urtasun, R., Torralba, A., Fidler, S, "Skip-thought vectors," in Proc. of the 28th International Conference on Neural Information Processing Systems, vol. 2, pp. 3294-3302, 2015.
- Wang, S., Tang, J., Aggarwal, C., Liu, H, "Linked document embedding for classification," in Proc. of the 25th ACM international on conference on information and knowledge management, pp. 115-124, 2016.
- Johansson, R., Pina, L.N, "Embedding a semantic network in a word space," in Proc. of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1428-1433, 2015.
- Rothe, S., Schutze, H, "AutoExtend: Extending word embeddings to embeddings for synsets and lexemes," in Proc. of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 1793-1803, 2015.
- Lau, J.H., Baldwin, T, "An empirical evaluation of doc2vec with practical insights into document embedding generation," arXiv preprint arXiv:1607.05368, 2016.
- Dai, Y., Wang, S., Xiong, N.N., Guo, W, "A Survey on Knowledge Graph Embedding: Approaches, Applications and Benchmarks," Electronics, 9, 750, 2020.
- Blei, D.M., Ng, A.Y., Jordan, M.I, "Latent dirichlet allocation," Journal of machine Learning research, 3, 993-1022, 2003.
- Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q, "LINE: Large-scale information network embedding," in Proc. of the 24th International Conference on World Wide Web, pp. 1067-1077, 2015.
- An, Q., Yu, L, "A heterogeneous network embedding framework for predicting similarity-based drug-target interactions," Briefings in Bioinformatics, 22.6, bbab275, 2021.
- Qiu, J., Dong, Y., Ma, H., Li, J., Wang, K., Tang, J, "Network embedding as matrix factorization: Unifying deepwalk, line, pte, and node2vec," in Proc. of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459-467, 2018.
- Duong, Chi Thang, et al, "Efficient and effective multi-modal queries through heterogeneous network embedding," IEEE Transactions on Knowledge and Data Engineering, 34.11, 5307-5320, 2022. https://doi.org/10.1109/TKDE.2021.3052871
- Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl, "Neural message passing for quantum chemistry," in Proc. of International conference on machine learning, PMLR, pp. 1263-1272, 2017.
- Xu, L., Wang, J., He, L., Cao, J., Wei, X., Philip, S. Y., & Yamanishi, K., "MixSp: a framework for embedding heterogeneous information networks with arbitrary number of node and edge types," IEEE Transactions on Knowledge and Data Engineering, 33.6, 2627-2639, 2021. https://doi.org/10.1109/TKDE.2019.2955945
- Sutanto, T.E., Nayak, R, "Ranking based clustering for social event detection," in Working Notes Proc. of the MediaEval 2014 Workshop (CEUR Workshop Proceedings, vol. 1263), pp. 1-2, 2014.
- Jon M Kleinberg, "Authoritative sources in a hyperlinked environment," Journal of the ACM (JACM), 46.5, 604-632, 1999. https://doi.org/10.1145/324133.324140
- Yang Liu and Songhua Xu, "A local context-aware lda model for topic modeling in a document network," Journal of the Association for Information Science and Technology, 68.6, 1429-1448, 2017. https://doi.org/10.1002/asi.23822
- Pei, Xiaobing, Chuanbo Chen, and Weihua Gong, "Concept factorization with adaptive neighbors for document clustering," IEEE Transactions on Neural Networks and Learning Systems, 29.2, 343-352, 2018. https://doi.org/10.1109/TNNLS.2016.2626311
- Page, L., Brin, S., Motwani, R., Winograd, T, "The PageRank citation ranking: Bringing order to the web," Tech. report, Stanford InfoLab, vol. 98, pp. 161-172, 1999.
- Rozemberczki, B., Davies, R., Sarkar, R., & Sutton, C., "Gemsec: Graph embedding with self clustering," in Proc. of the 2019 IEEE/ACM international conference on advances in social networks analysis and mining, pp. 65-72, 2019.
- Xu, Z., Wei, X., Luo, X., Liu, Y., Mei, L., Hu, C., & Chen, L., "Knowle: a semantic link network based system for organizing large scale online news events," Future Generation Computer Systems, 43-44, 40-50, 2015. https://doi.org/10.1016/j.future.2014.04.002
- Arevalo, J., Solorio, T., Montes-y-Gomez, M., & Gonzalez, F. A, "Gated multimodal units for information fusion," arXiv preprint arXiv:1702.01992, 2017.
- Lopez-Ibanez, Manuel, et al., "The irace package: Iterated racing for automatic algorithm configuration," Operations Research Perspectives, 3, 43-58, 2016. https://doi.org/10.1016/j.orp.2016.09.002