• Title/Summary/Keyword: Semantic Similarity

Search Result 280, Processing Time 0.033 seconds

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

  • Choi, Youngseok;Park, Jinsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.111-123
    • /
    • 2013
  • Semantic similarity/relatedness measure between two concepts plays an important role in research on system integration and database integration. Moreover, current research on keyword recommendation or tag clustering strongly depends on this kind of semantic measure. For this reason, many researchers in various fields including computer science and computational linguistics have tried to improve methods to calculating semantic similarity/relatedness measure. This study of similarity between concepts is meant to discover how a computational process can model the action of a human to determine the relationship between two concepts. Most research on calculating semantic similarity usually uses ready-made reference knowledge such as semantic network and dictionary to measure concept similarity. The topological method is used to calculated relatedness or similarity between concepts based on various forms of a semantic network including a hierarchical taxonomy. This approach assumes that the semantic network reflects the human knowledge well. The nodes in a network represent concepts, and way to measure the conceptual similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words(i.e,. two nodes in a network). Topological method can be categorized as node-based or edge-based, which are also called the information content approach and the conceptual distance approach, respectively. The node-based approach is used to calculate similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy while edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both of two approaches have assumed that the semantic network is static. That means topological approach has not considered the change of semantic relation between concepts in semantic network. However, as information communication technologies make advantage in sharing knowledge among people, semantic relation between concepts in semantic network may change. To explain the change in semantic relation, we adopt the cognitive semantics. The basic assumption of cognitive semantics is that humans judge the semantic relation based on their cognition and understanding of concepts. This cognition and understanding is called 'World Knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge means the knowledge from personal experience. Everyone can have different Personal Knowledge of same concept. Cultural Knowledge is the knowledge shared by people who are living in the same culture or using the same language. People in the same culture have common understanding of specific concepts. Cultural knowledge can be the starting point of discussion about the change of semantic relation. If the culture shared by people changes for some reasons, the human's cultural knowledge may also change. Today's society and culture are changing at a past face, and the change of cultural knowledge is not negligible issues in the research on semantic relationship between concepts. In this paper, we propose the future directions of research on semantic similarity. In other words, we discuss that how the research on semantic similarity can reflect the change of semantic relation caused by the change of cultural knowledge. We suggest three direction of future research on semantic similarity. First, the research should include the versioning and update methodology for semantic network. Second, semantic network which is dynamically generated can be used for the calculation of semantic similarity between concepts. If the researcher can develop the methodology to extract the semantic network from given knowledge base in real time, this approach can solve many problems related to the change of semantic relation. Third, the statistical approach based on corpus analysis can be an alternative for the method using semantic network. We believe that these proposed research direction can be the milestone of the research on semantic relation.

Korean Semantic Similarity Measures for the Vector Space Models

  • Lee, Young-In;Lee, Hyun-jung;Koo, Myoung-Wan;Cho, Sook Whan
    • Phonetics and Speech Sciences
    • /
    • v.7 no.4
    • /
    • pp.49-55
    • /
    • 2015
  • It is argued in this paper that, in determining semantic similarity, Korean words should be recategorized with a focus on the semantic relation to ontology in light of cross-linguistic morphological variations. It is proposed, in particular, that Korean semantic similarity should be measured on three tracks, human judgements track, relatedness track, and cross-part-of-speech relations track. As demonstrated in Yang et al. (2015), GloVe, the unsupervised learning machine on semantic similarity, is applicable to Korean with its performance being compared with human judgement results. Based on this compatability, it was further thought that the model's performance might most likely vary with different kinds of specific relations in different languages. An attempt was made to analyze them in terms of two major Korean-specific categories involved in their lexical and cross-POS-relations. It is concluded that languages must be analyzed by varying methods so that semantic components across languages may allow varying semantic distance in the vector space models.

A Text Similarity Measurement Method Based on Singular Value Decomposition and Semantic Relevance

  • Li, Xu;Yao, Chunlong;Fan, Fenglong;Yu, Xiaoqiang
    • Journal of Information Processing Systems
    • /
    • v.13 no.4
    • /
    • pp.863-875
    • /
    • 2017
  • The traditional text similarity measurement methods based on word frequency vector ignore the semantic relationships between words, which has become the obstacle to text similarity calculation, together with the high-dimensionality and sparsity of document vector. To address the problems, the improved singular value decomposition is used to reduce dimensionality and remove noises of the text representation model. The optimal number of singular values is analyzed and the semantic relevance between words can be calculated in constructed semantic space. An inverted index construction algorithm and the similarity definitions between vectors are proposed to calculate the similarity between two documents on the semantic level. The experimental results on benchmark corpus demonstrate that the proposed method promotes the evaluation metrics of F-measure.

A Method of Service Refinement for Network-Centric Operational Environment

  • Lee, Haejin;Kang, Dongsu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.12
    • /
    • pp.97-105
    • /
    • 2016
  • Network-Centric Operational Environment(NCOE) service becomes critical in today's military environment network because reusability of service and interaction are being increasingly important as well in business process. However, the refinement of service by semantic similarity and functional similarity at the business process was not detailed yet. In order to enhance accuracy of refining of business service, in this study, the authors introduce a method for refining service by semantic similarity and functional similarity in BPMN model. The business process are designed in a BPMN model. In this model, candidated services are refined through binding related activities by the analysis result of semantic similarity based on word-net and functional similarity based on properties specification between activities. Then, the services are identified through refining the candidated service. The proposed method is expected to enhance the service identification with accuracy and modularity. It also can accelerate more standardized service refinement developments by the proposed method.

A Comparison between Factor Structure and Semantic Representation of Personality Test Items Using Latent Semantic Analysis (잠재의미분석을 활용한 성격검사문항의 의미표상과 요인구조의 비교)

  • Park, Sungjoon;Park, Heeyoung;Kim, Cheongtag
    • Korean Journal of Cognitive Science
    • /
    • v.30 no.3
    • /
    • pp.133-156
    • /
    • 2019
  • To investigate how personality test items are understood by participants, their semantic representations were explored by Latent Semantic Analysis, In this thesis, Semantic Similarity Matrix was proposed, which contains cosine similarity of semantic representations between test items and personality traits. The matrix was compared to traditional factor loading matrix. In preliminary study, semantic space was constructed from the passages describing the five traits, collected from 154 undergraduate participants. In study 1, positive correlation was observed between the factor loading matrix of Korean shorten BFI and its semantic similarity matrix. In study 2, short personality test was constructed from semantic similarity matrix, and observed that its factor loading matrix was positively correlated with the semantic similarity matrix as well. In conclusion, the results implies that the factor structure of personality test can be inferred from semantic similarity between the items and factors.

Development of the Recommender System of Arabic Books Based on the Content Similarity

  • Alotaibi, Shaykhah Hajed;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.175-186
    • /
    • 2022
  • This research article develops an Arabic books' recommendation system, which is based on the content similarity that assists users to search for the right book and predict the appropriate and suitable books pertaining to their literary style. In fact, the system directs its users toward books, which can meet their needs from a large dataset of Information. Further, this system makes its predictions based on a set of data that is gathered from different books and converts it to vectors by using the TF-IDF system. After that, the recommendation algorithms such as the cosine similarity, the sequence matcher similarity, and the semantic similarity aggregate data to produce an efficient and effective recommendation. This approach is advantageous in recommending previously unrated books to users with unique interests. It is found to be proven from the obtained results that the results of the cosine similarity of the full content of books, the results of the sequence matcher similarity of Arabic titles of the books, and the results of the semantic similarity of English titles of the books are the best obtained results, and extremely close to the average of the result related to the human assigned/annotated similarity. Flask web application is developed with a simple interface to show the recommended Arabic books by using cosine similarity, sequence matcher similarity, and semantic similarity algorithms with all experiments that are conducted.

Assessment of performance of machine learning based similarities calculated for different English translations of Holy Quran

  • Al Ghamdi, Norah Mohammad;Khan, Muhammad Badruddin
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.4
    • /
    • pp.111-118
    • /
    • 2022
  • This research article presents the work that is related to the application of different machine learning based similarity techniques on religious text for identifying similarities and differences among its various translations. The dataset includes 10 different English translations of verses (Arabic: Ayah) of two Surahs (chapters) namely, Al-Humazah and An-Nasr. The quantitative similarity values for different translations for the same verse were calculated by using the cosine similarity and semantic similarity. The corpus went through two series of experiments: before pre-processing and after pre-processing. In order to determine the performance of machine learning based similarities, human annotated similarities between translations of two Surahs (chapters) namely Al-Humazah and An-Nasr were recorded to construct the ground truth. The average difference between the human annotated similarity and the cosine similarity for Surah (chapter) Al-Humazah was found to be 1.38 per verse (ayah) per pair of translation. After pre-processing, the average difference increased to 2.24. Moreover, the average difference between human annotated similarity and semantic similarity for Surah (chapter) Al-Humazah was found to be 0.09 per verse (Ayah) per pair of translation. After pre-processing, it increased to 0.78. For the Surah (chapter) An-Nasr, before preprocessing, the average difference between human annotated similarity and cosine similarity was found to be 1.93 per verse (Ayah), per pair of translation. And. After pre-processing, the average difference further increased to 2.47. The average difference between the human annotated similarity and the semantic similarity for Surah An-Nasr before preprocessing was found to be 0.93 and after pre-processing, it was reduced to 0.87 per verse (ayah) per pair of translation. The results showed that as expected, the semantic similarity was proven to be better measurement indicator for calculation of the word meaning.

A Multi-Agent Improved Semantic Similarity Matching Algorithm Based on Ontology Tree (온톨로지 트리기반 멀티에이전트 세만틱 유사도매칭 알고리즘)

  • Gao, Qian;Cho, Young-Im
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.18 no.11
    • /
    • pp.1027-1033
    • /
    • 2012
  • Semantic-based information retrieval techniques understand the meanings of the concepts that users specify in their queries, but the traditional semantic matching methods based on the ontology tree have three weaknesses which may lead to many false matches, causing the falling precision. In order to improve the matching precision and the recall of the information retrieval, this paper proposes a multi-agent improved semantic similarity matching algorithm based on the ontology tree, which can avoid the considerable computation redundancies and mismatching during the entire matching process. The results of the experiments performed on our algorithm show improvements in precision and recall compared with the information retrieval techniques based on the traditional semantic similarity matching methods.

KNN-based Image Annotation by Collectively Mining Visual and Semantic Similarities

  • Ji, Qian;Zhang, Liyan;Li, Zechao
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.9
    • /
    • pp.4476-4490
    • /
    • 2017
  • The aim of image annotation is to determine labels that can accurately describe the semantic information of images. Many approaches have been proposed to automate the image annotation task while achieving good performance. However, in most cases, the semantic similarities of images are ignored. Towards this end, we propose a novel Visual-Semantic Nearest Neighbor (VS-KNN) method by collectively exploring visual and semantic similarities for image annotation. First, for each label, visual nearest neighbors of a given test image are constructed from training images associated with this label. Second, each neighboring subset is determined by mining the semantic similarity and the visual similarity. Finally, the relevance between the images and labels is determined based on maximum a posteriori estimation. Extensive experiments were conducted using three widely used image datasets. The experimental results show the effectiveness of the proposed method in comparison with state-of-the-arts methods.

GORank: Semantic Similarity Search for Gene Products using Gene Ontology (GORank: Gene Ontology를 이용한 유전자 산물의 의미적 유사성 검색)

  • Kim, Ki-Sung;Yoo, Sang-Won;Kim, Hyoung-Joo
    • Journal of KIISE:Databases
    • /
    • v.33 no.7
    • /
    • pp.682-692
    • /
    • 2006
  • Searching for gene products which have similar biological functions are crucial for bioinformatics. Modern day biological databases provide the functional description of gene products using Gene Ontology(GO). In this paper, we propose a technique for semantic similarity search for gene products using the GO annotation information. For this purpose, an information-theoretic measure for semantic similarity between gene products is defined. And an algorithm for semantic similarity search using this measure is proposed. We adapt Fagin's Threshold Algorithm to process the semantic similarity query as follows. First, we redefine the threshold for our measure. This is because our similarity function is not monotonic. Then cluster-skipping and the access ordering of the inverted index lists are proposed to reduce the number of disk accesses. Experiments with real GO and annotation data show that GORank is efficient and scalable.