• Title/Summary/Keyword: Content Similarity

Semantic-Based K-Means Clustering for Microblogs Exploiting Folksonomy

  • Heu, Jee-Uk
    • Journal of Information Processing Systems / v.14 no.6 / pp.1438-1444 / 2018
  • With the development of Internet technologies and the spread of smart devices, the use of microblogs such as Facebook, Twitter, and Instagram has been increasing rapidly. Many users check microblogs for new information because the content on their timelines is continually updated. Clustering algorithms are therefore needed to organize microblog content into groups for users who want the newest information. However, microblogs have word limits, so there is not enough information to analyze for content clustering. In this paper, we propose a semantic-based K-means clustering algorithm that measures not only the similarity between data represented in a vector space model but also the semantic similarity between the data, by exploiting the TagCluster for clustering. Experimental results on the RepLab2013 Twitter dataset show the effectiveness of the semantic-based K-means clustering algorithm.
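
The idea of combining a vector-space similarity with a semantic one can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the abstract does not say how TagCluster scores semantic similarity, so a Jaccard overlap of tag sets stands in for it, and the names `combined_similarity`, `make_post`, `assign` and the weight `alpha` are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_overlap(tags_a: set, tags_b: set) -> float:
    """Jaccard overlap of tag sets (assumed stand-in for the semantic term)."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


def combined_similarity(post_a, post_b, alpha: float = 0.5) -> float:
    """Blend both similarities; alpha is a hypothetical mixing weight."""
    vec_a, tags_a = post_a
    vec_b, tags_b = post_b
    lexical = cosine_similarity(vec_a, vec_b)
    semantic = tag_overlap(tags_a, tags_b)
    return alpha * lexical + (1.0 - alpha) * semantic


def make_post(text: str, tags):
    """Represent a post as (term-frequency vector, tag set)."""
    return Counter(text.lower().split()), set(tags)


def assign(posts, centers, alpha: float = 0.5):
    """K-means assignment step: index of the most similar center per post."""
    return [
        max(range(len(centers)),
            key=lambda k: combined_similarity(post, centers[k], alpha))
        for post in posts
    ]
```

With short posts the lexical term is often close to zero, so the tag-based term is what keeps related posts in the same cluster; `alpha` controls that trade-off.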

Hybrid Video Information System Supporting Content-based Retrieval and Similarity Retrieval (비디오의 의미검색과 유사성검색을 위한 통합비디오정보시스템)

  • Yun, Mi-Hui;Yun, Yong-Ik;Kim, Gyo-Jeong
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2031-2041 / 1999
  • In this paper, we present HVIS (Hybrid Video Information System), which supports semantic retrieval for a wide range of users by integrating feature-based retrieval and annotation-based retrieval of unstructured, massive video data. HVIS divides a video into video document, sequence, scene, and object to model the metadata, and proposes the Two-layered Hybrid Object-oriented Metadata Model (THOMM), which is composed of a raw-data layer for the physical video stream and a metadata layer that supports annotation-based retrieval, content-based retrieval, and similarity retrieval. Based on this model, we present a video query language that makes annotation-based, content-based, and similarity queries possible, together with a Video Query Processor and a query processing algorithm. In particular, we present a similarity expression that represents the degree of similarity while taking the user's interest into account. The proposed system is implemented with Visual C++, ActiveX, and ORACLE.


Patent Document Similarity Based on Image Analysis Using the SIFT-Algorithm and OCR-Text

  • Park, Jeong Beom;Mandl, Thomas;Kim, Do Wan
    • International Journal of Contents / v.13 no.4 / pp.70-79 / 2017
  • Images are an important element of patents, and many experts use images to analyze a patent or to check differences between patents. However, there is little research on image analysis for patents, partly because image processing is an advanced technology and patent images typically consist of visual parts as well as text and numbers. This study suggests two methods for using image processing: the Scale Invariant Feature Transform (SIFT) algorithm and Optical Character Recognition (OCR). The first method, which works with SIFT, uses image feature points. Through feature matching, it can be applied to calculate the similarity between documents containing these images. In the second method, OCR is used to extract text from the images. Using numbers extracted from an image, it is possible to extract the corresponding related text within the text passages. Document similarity can then be calculated based on the extracted text. Comparing the suggested methods with an existing method that calculates similarity from text only demonstrates their feasibility. Additionally, the correlation between the two similarity measures is low, which shows that they capture different aspects of the patent content.

Comparison of Ginsenoside Contents and Pattern Similarity Between Root Parts of New Cultivars in Panax ginseng C.A. Meyer (인삼 신품종의 뿌리부위별 진세노사이드 함량 및 패턴비교)

  • Ahn, In-Ok;Lee, Sung-Sik;Lee, Jang-Ho;Lee, Mi-Ja;Jo, Byung-Gu
    • Journal of Ginseng Research / v.32 no.1 / pp.15-18 / 2008
  • This study was carried out to obtain basic information on ginsenoside contents and pattern similarity in five cultivars of Panax ginseng C.A. Meyer. Among the five cultivars, the unit content and total content of ginsenosides were highest in the Gopoong cultivar, at 18.9 mg/g and 596 mg/root, respectively. The unit content and total content of ginsenosides decreased in the order of the Yunpoong, Gumpoong, Seonpoong, and Chunpoong cultivars. Ginsenoside pattern similarity between tap root and lateral root was high, at 0.95, but that between tap root and fine root was low, at 0.72. The correlation of ginsenoside contents between tap root and lateral root was the highest, at 0.843, and decreased in the order of main root, fine root, and rhizome. The correlation between unit content and total content of ginsenosides was very high, at 0.933.

A Similarity Computation Algorithm Based on the Pitch and Rhythm of Music Melody (선율의 음높이와 리듬 정보를 이용한 음악의 유사도 계산 알고리즘)

  • Mo, Jong-Sik;Kim, So-Young;Ku, Kyong-I;Han, Chang-Ho;Kim, Yoo-Sung
    • The Transactions of the Korea Information Processing Society / v.7 no.12 / pp.3762-3774 / 2000
  • Advances in computer hardware and information processing technologies have raised the need for multimedia information retrieval systems. To date, multimedia information systems have been developed for text and image information. Nowadays, multimedia information systems for video and audio information, especially musical information, are growing in number. Recent music information retrieval systems support not only retrieval based on meta-information such as composer and title but also content-based retrieval. Content-based retrieval in music information retrieval systems uses the similarity value between the user query and the music information stored in the music database. In this paper, we therefore develop a similarity computation algorithm in which the pitches and lengths of each corresponding pair of notes are used as the fundamental factors for computing the similarity between pieces of musical information. We also run an experiment with the proposed algorithm to validate its appropriateness. The experimental results show that the proposed similarity computation algorithm can correctly determine, based on melody, whether two music files are similar to each other.

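
A note-by-note comparison on pitch and length, as described in the abstract above, might look like the sketch below. The scoring formula is an assumption, since the abstract does not give one: notes are taken to be `(midi_pitch, duration)` pairs with positive durations, and `pitch_weight` and `max_interval` are hypothetical parameters.

```python
def note_similarity(a, b, pitch_weight=0.5, max_interval=12):
    """Score one pair of notes in [0, 1] from pitch and length closeness."""
    pitch_a, len_a = a
    pitch_b, len_b = b
    # Pitch closeness: 1.0 for equal pitch, 0.0 at max_interval semitones or more.
    pitch_score = max(0.0, 1.0 - abs(pitch_a - pitch_b) / max_interval)
    # Length closeness: ratio of the shorter to the longer duration.
    length_score = min(len_a, len_b) / max(len_a, len_b)
    return pitch_weight * pitch_score + (1.0 - pitch_weight) * length_score


def melody_similarity(melody_a, melody_b, pitch_weight=0.5):
    """Average the pairwise scores of corresponding notes.

    Dividing by the longer melody's length penalizes unmatched notes.
    """
    if not melody_a or not melody_b:
        return 0.0
    total = sum(
        note_similarity(a, b, pitch_weight)
        for a, b in zip(melody_a, melody_b)
    )
    return total / max(len(melody_a), len(melody_b))
```

This sketch compares notes strictly by position; a query that starts mid-melody would need an alignment step first, which is outside what the abstract describes.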

A Study on the efficiency of similarity and clustering measure in Historical Writing Document (역사적 기록 문서에서 효율적인 유사도 및 클러스터링 측정에 관한 연구)

  • 한광덕
    • Journal of the Korea Society of Computer and Information / v.7 no.4 / pp.94-101 / 2002
  • Many changes are expected in mass media and document representation as documents on the web become more diverse, complex, and massive. The Annals of the Chosun Dynasty is a very important document used for researching historical facts and is published on CD-ROM. However, the CD-ROM is content-based and uses a simple search method, so it is very difficult to determine event relationships between documents. For this reason, we studied how to discover event relationships between documents in the Annals of the Chosun Dynasty through clustering and an efficient similarity method. As the research method, we identified the best similarity method for historical written documents by simulating similarity measures on documents from the Annals of the Chosun Dynasty. We then simulated document clustering based on similarity probability. In the evaluation of the clustered documents, the results were the same as those obtained manually.


A Method for Recommending Learning Contents Using Similarity and Difficulty (유사도와 난이도를 이용한 학습 콘텐츠 추천 방법)

  • Park, Jae-Wook;Lee, Yong-Kyu
    • Journal of the Korea Society of Computer and Information / v.16 no.7 / pp.127-135 / 2011
  • An e-learning system needs a content recommendation component that helps a learner choose an item. To predict items that match a learner's interests, collaborative filtering and content-based filtering have been the most widely used methods. These methods recommend items to a learner based on other learners' interests without considering the learner's knowledge level. As a result, the effectiveness of the recommendation can be reduced when the overall number of users is relatively small. It is also not easy to recommend a newly added item. To address these problems, we propose a content recommendation method based on the similarity and the difficulty of an item. By using a recommendation function that reflects both characteristics of items, a higher-level learner can choose more difficult but less similar items, while a lower-level learner can select less difficult but more similar items. Thus, a learner can be presented with items according to his or her level of achievement, independently of other learners' interests.
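
One way a recommendation function could trade similarity against difficulty by learner level is sketched below. The linear blend is an assumption made for illustration; the abstract does not give the paper's function. All three inputs are assumed to be normalized to [0, 1], and the item dictionary keys are hypothetical.

```python
def recommendation_score(similarity, difficulty, learner_level):
    """Blend similarity and difficulty, weighted by learner level.

    A low learner_level favors similar items; a high one favors difficult items.
    """
    return (1.0 - learner_level) * similarity + learner_level * difficulty


def recommend(items, learner_level, top_n=1):
    """Return the names of the top_n items for the given learner level."""
    ranked = sorted(
        items,
        key=lambda item: recommendation_score(
            item["similarity"], item["difficulty"], learner_level
        ),
        reverse=True,
    )
    return [item["name"] for item in ranked[:top_n]]


items = [
    {"name": "easy_review", "similarity": 0.9, "difficulty": 0.2},
    {"name": "hard_new_topic", "similarity": 0.3, "difficulty": 0.9},
]
```

Because the score depends only on the item and the learner's own level, a newly added item can be ranked without any interaction history from other learners.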

A Prospective Extension Through an Analysis of the Existing Movie Recommendation Systems and Their Challenges (기존 영화 추천시스템의 문헌 고찰을 통한 유용한 확장 방안)

  • Cho Nwe Zin, Latt;Muhammad, Firdaus;Mariz, Aguilar;Kyung-Hyune, Rhee
    • KIPS Transactions on Computer and Communication Systems / v.12 no.1 / pp.25-40 / 2023
  • Users frequently rely on recommendation systems to generate intelligent automatic decisions. In the study of movie recommendation systems, existing approaches largely use collaborative and content-based filtering techniques. Collaborative filtering considers user similarity, while content-based filtering focuses on the activity of a single user. Hybrid filtering approaches that combine collaborative filtering and content-based filtering are also used so that each compensates for the other's limitations. Recently, several AI-based similarity techniques have been used to find similarities between users in order to provide better recommendation services. This paper aims to propose a prospective extension by deriving possible solutions through an analysis of various existing movie recommendation systems and their challenges.

Genetic Diversity of Barley Cultivars as Revealed by SSR Marker

  • Kim, Hong-Sik;Park, Kwang-Geun;Baek, Seong-Bum;Suh, Sae-Jung;Nam, Jung-Hyun
    • Korean Journal of Crop Science / v.47 no.5 / pp.379-383 / 2002
  • Allelic diversity of 44 microsatellite marker loci originating from the coding regions of specific genes or the non-coding regions of the barley genome was analyzed for 19 barley genotypes. Multi-allelic variation was observed at most marker loci, except for HVM13, HVM15, HVM22, and HVM64. The number of different alleles ranged from 2 to 12, with a mean of 4.0 alleles per microsatellite. Twenty-one alleles derived from 10 marker loci were specific to certain genotypes. The level of polymorphism (Polymorphic Information Content, PIC) based on the band pattern frequencies among genotypes was relatively high at several loci, such as HVM3, HVM5, HVM14, HVM36, HVM62, and HVM67. In the cluster analysis using a genetic similarity matrix calculated from microsatellite-derived DNA profiles, two major groups were classified, and the spike-row type was a major factor for clustering. The correlation between genetic similarity matrices based on microsatellite markers and pedigree data was highly significant ($r=0.57^{**}$), but these two parameters were only moderately associated with each other. On the other hand, the RAPD-based genetic similarity matrix was more highly associated with microsatellite-based genetic similarity ($r=0.63^{**}$) than the coefficient of parentage.
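
The abstract does not state how PIC was computed. A simplified form often used for marker data is PIC = 1 - Σ p_i², where p_i is the frequency of the i-th band pattern at a locus; the sketch below assumes that form, and the function name `pic` is hypothetical.

```python
def pic(frequencies, tolerance=1e-6):
    """Simplified PIC for one locus: 1 minus the sum of squared frequencies.

    `frequencies` are the band-pattern (allele) frequencies at the locus
    and are expected to sum to 1.
    """
    total = sum(frequencies)
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"frequencies must sum to 1, got {total}")
    return 1.0 - sum(p * p for p in frequencies)
```

Under this form a locus with a single pattern scores 0, and the score rises toward 1 as more patterns occur at similar frequencies, which is why highly multi-allelic loci tend to rank as the most informative.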

Learning Similarity with Probabilistic Latent Semantic Analysis for Image Retrieval

  • Li, Xiong;Lv, Qi;Huang, Wenting
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.4 / pp.1424-1440 / 2015
  • Searching for the intended images among a large number of candidates is a challenging problem. Content-based image retrieval (CBIR) is the most promising way to tackle this problem, where the most important topic is measuring the similarity of images so as to cover variance in shape, color, pose, illumination, etc. While previous works have made significant progress, their ability to adapt to a dataset has not been fully explored. In this paper, we propose a similarity learning method based on a probabilistic generative model, namely probabilistic latent semantic analysis (PLSA). It first derives a Fisher kernel, a function over the parameters and variables, based on PLSA. Then, the parameters are determined by simultaneously maximizing the log-likelihood function of PLSA and the retrieval performance over the training dataset. The main advantages of this work are twofold: (1) deriving a similarity measure based on PLSA that fully exploits the data distribution and Bayesian inference; (2) learning model parameters by simultaneously maximizing the fit of the model to the data and the retrieval performance. The proposed method (PLSA-FK) is empirically evaluated on three datasets, and the results show promising performance.