• Title/Summary/Keyword: Page Similarity


An Automatic Web Page Classification System Using Meta-Tag (메타 태그를 이용한 자동 웹페이지 분류 시스템)

  • Kim, Sang-Il;Kim, Hwa-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.38B no.4 / pp.291-297 / 2013
  • The number of web pages has grown drastically with the explosive increase in WWW usage. This has created a need for web page classification, which groups pages so that they are easier to access and search. Web page classification assigns pages scattered across the web to groups according to document similarity or the keywords the documents contain. It can be applied to areas such as web page search, group search, and e-mail filtering. However, the sheer number of pages makes manual classification infeasible, and automatic classification suffers from an accuracy problem: it cannot distinguish web pages written in different forms without classification errors. In this paper, we propose an automatic web page classification system that uses meta-tags obtained from the web pages to solve the problem of inaccurate web page retrieval.
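As a rough illustration of meta-tag-based classification, the sketch below reads the `keywords` meta-tag with Python's standard `html.parser` and picks the category whose keyword set overlaps it most. The abstract does not describe the actual classifier, so the overlap rule, the `classify` helper, and the category dictionary are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = attr.get("content") or ""
        if name and content:
            self.meta[name] = content


def meta_keywords(html):
    """Return the lower-cased keywords listed in the keywords meta-tag."""
    parser = MetaTagParser()
    parser.feed(html)
    raw = parser.meta.get("keywords", "")
    return {k.strip().lower() for k in raw.split(",") if k.strip()}


def classify(html, categories):
    """Hypothetical rule: choose the category sharing the most keywords."""
    words = meta_keywords(html)
    best = max(categories, key=lambda c: len(words & categories[c]))
    # Return None when no category shares any keyword with the page.
    return best if words & categories[best] else None


page = '<html><head><meta name="keywords" content="Soccer, League, Goal"></head></html>'
cats = {"sports": {"soccer", "goal"}, "finance": {"stock", "bond"}}
print(classify(page, cats))
```

Pages without a usable `keywords` meta-tag return `None` here; a real system would need a fallback for them.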

Local Similarity based Document Layout Analysis using Improved ARLSA

  • Kim, Gwangbok;Kim, SooHyung;Na, InSeop
    • International Journal of Contents / v.11 no.2 / pp.15-19 / 2015
  • In this paper, we propose an efficient document layout analysis algorithm that includes table detection. Typical document layout analysis methods use the height of, and the gaps between, words or columns. To handle documents of various styles and sizes, we propose an algorithm that uses the mean value of the distance transform, which represents thickness, and compares it with that of components in the local area. We combine this algorithm with a table detection algorithm that uses the same feature as the text classifier. Table candidates, separators, and big components are isolated from the image using Connected Component Analysis (CCA) and the distance transform. The key idea of text classification is that text is characterized by neighboring components of similar thickness and height. To estimate local similarity, we detect text regions using an adaptive search window size. An improved adaptive run-length smoothing algorithm (ARLSA) is proposed to create proper boundaries between text and non-text zones. Experiments on the ICDAR2009 page segmentation competition test set and our own dataset demonstrate the superiority of the proposed method through an F-measure comparison with other algorithms.

Graph based KNN for Optimizing Index of News Articles

  • Jo, Taeho
    • Journal of Multimedia Information System / v.3 no.3 / pp.53-61 / 2016
  • This research frames index optimization as a classification task and applies a graph based KNN to it. Index optimization is an important task for maximizing information retrieval performance. Encoding words into numerical vectors causes problems such as huge dimensionality and sparse distribution, so we encode words into graphs as an alternative representation. In this research, index optimization is viewed as a classification task, a similarity measure between graphs is defined, the KNN is modified into a graph based version built on that similarity measure, and the modified KNN is applied to the index optimization task. As benefits of modifying the KNN in this way, we expect improved classification performance, more graphical representations of words, which are inherent in graphs, and the ability to trace the results of classifying words more easily. We empirically validate the proposed version on index optimization using two text collections: NewsPage.com and 20NewsGroups.

Content-based Recommendation Based on Social Network for Personalized News Services (개인화된 뉴스 서비스를 위한 소셜 네트워크 기반의 콘텐츠 추천기법)

  • Hong, Myung-Duk;Oh, Kyeong-Jin;Ga, Myung-Hyun;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.57-71 / 2013
  • Over a billion people around the world generate news minute by minute. Some news can be anticipated, but most arises from unexpected events such as natural disasters, accidents, and crimes. People spend a great deal of time watching the huge volume of news delivered by many media outlets because they want to understand what is happening now, predict what might happen in the near future, and share and discuss the news. People make better daily decisions by obtaining useful information from the news they watch. However, it is difficult for people to choose news that suits them and to obtain useful information from it, because there are so many news media, such as portal sites and broadcasters, and because most news articles consist of gossip and breaking news. User interest changes over time, and many people have no interest in outdated news. A personalized news service therefore needs to reflect users' recent interests, which means that it should manage user profiles dynamically. In this paper, a content-based news recommendation system is proposed to provide a personalized news service. A personalized service requires the user's personal information, and a social network service is used to extract it. The proposed system constructs a dynamic user profile from recent user information on Facebook, one such social network service. The user information comprises personal information, recent articles, and Facebook Page information. Facebook Pages are used by businesses, organizations, and brands to share their content and connect with people, and Facebook users can add a Facebook Page to indicate their interest in it. The proposed system uses this Page information to create the user profile and to match user preferences to news topics. However, some Pages cannot be matched directly to a news topic because a Page deals with an individual object and does not provide topic information suited to news. Freebase, a large collaborative database of well-known people, places, and things, is used to match Pages to news topics through the hierarchy information of its objects. By using the recent Page information and articles of Facebook users, the proposed system maintains a dynamic user profile, which is used to measure user preferences for news. To generate a news profile, the news category predefined by the news media is used, and keywords of news articles are extracted by analyzing the news content, including title, category, and script. The TF-IDF technique, which reflects how important a word is to a document in a corpus, is used to identify the keywords of each news article. The user profile and the news profile share the same format so that the similarity between user preferences and news can be measured efficiently. The proposed system calculates the similarity between every user profile and every news profile. Existing similarity calculations in the vector space model do not cover synonyms, hypernyms, and hyponyms because they handle only the given words. The proposed system applies WordNet to the similarity calculation to overcome this limitation. The top-N news articles with the highest similarity values for a target user are recommended to that user. To evaluate the proposed news recommendation system, user profiles are generated from Facebook accounts with the participants' consent, and a Web crawler is implemented to extract news information from PBS, a non-profit public broadcasting television network in the United States, and to construct news profiles. We compare the performance of the proposed method with that of two benchmark algorithms: a traditional method based on TF-IDF, and the 6Sub-Vectors method, which divides the points used to obtain keywords into six parts. Experimental results demonstrate that, in terms of the prediction error of recommended news, the proposed system provides useful news to users by applying the users' social network information and WordNet functions.
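The TF-IDF baseline mentioned in this abstract can be sketched as follows: the user profile and the news articles are represented as TF-IDF vectors in the same format, and the top-N articles are chosen by cosine similarity. This is a minimal sketch of the plain vector space baseline only. The WordNet-based handling of synonyms, hypernyms, and hyponyms is omitted, and the function names and sample tokens are assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dict."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_n(user_vec, news_vecs, n):
    """Return indices of the n news vectors most similar to the user vector."""
    ranked = sorted(
        range(len(news_vecs)),
        key=lambda i: cosine(user_vec, news_vecs[i]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical tokenized news articles and a user profile in the same format.
news = [
    ["election", "vote", "senate"],
    ["storm", "flood", "rescue"],
    ["vote", "ballot", "election"],
]
user = ["election", "vote"]
vectors = tf_idf_vectors(news + [user])
news_vecs, user_vec = vectors[:-1], vectors[-1]
print(top_n(user_vec, news_vecs, 2))
```

Because the vectors match only on exact terms, an article about "polls" would score zero against this user profile. That is the limitation the abstract attributes to existing vector space methods.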

Peptide Sequence Analysis of the CNBr-Digested 34-36 kd Sperminogen

  • Yu, Hyunkyung;Yi, Lee-S.-H.
    • Animal cells and systems / v.5 no.3 / pp.199-203 / 2001
  • Sperminogen was purified from acid extracts of boar spermatozoa, and a partial peptide sequence of the 34-36 kd sperminogen was determined. Acid extracts of boar spermatozoa were gel-filtered through Sephadex G-75, and the 34-36 kd sperminogen was purified by preparative SDS-PAGE. The sperminogen bands were sliced out, and the 34-36 kd sperminogen was eluted from the gel fragments and subjected to peptide sequencing. Since the amino termini were blocked for the Edman degradation method, internal amino acid sequences of the eluted 34-36 kd sperminogen were obtained from CNBr-digested peptides of sperminogen. Among several bands resolved on tricine SDS-PAGE, the 14, 22 and 26 kd peptides were subjected to peptide sequencing. The analyzed amino acid sequences of the 26 and 22 kd peptides showed high homology with that of the zona pellucida binding protein, Sp38, whereas the analyzed amino acid sequence of the 14 kd peptide showed neither sequence homology nor similarity with any known protein.


Cloning and Sequencing of the $\alpha$-1$\rightarrow$6 Dextransucrase Gene from Leuconostoc mesenteroides B-742CB

  • Kim, Ho-Sang;Kim, Do-Man;Ryu, Hwa-Ja;Robyt, John-F.
    • Journal of Microbiology and Biotechnology / v.10 no.4 / pp.559-563 / 2000
  • A dextransucrase gene (dsrB742) that expresses a dextransucrase synthesizing mostly $\alpha$-1$\rightarrow$6 linked dextran with a low amount (3-5%) of $\alpha$-1$\rightarrow$3 branching was cloned and sequenced from Leuconostoc mesenteroides B-742CB. The 6.1-kb PstI fragments were ligated with pGEM-3Zf(-) and transformed into E. coli DH5$\alpha$. The recombinant clone (pDSRB742) synthesized dextran on an agar plate containing 2% (w/v) sucrose. The synthesized dextran was hydrolyzed with Penicillium endo-dextranase. The hydrolyzate was composed of glucose, isomaltose, isomaltotriose, and branched pentasaccharide. The nucleotide sequence of dsrB742 showed one open reading frame (ORF) of 4,524 bp encoding dextransucrase. The deduced amino acid sequence revealed a calculated molecular mass of 168.6 kDa. It also showed an activity band of 184 kDa on a non-denaturing SDS-PAGE (10%). The amino acid sequence of DSRB742 exhibited a 50% similarity with DSRA from L. mesenteroides B-1299, a 70% similarity with DSRS from L. mesenteroides B-512 (F, FMCM) and a 45-56% similarity with Streptococcal GTFs.


Summarizing the Differences in Chinese-Vietnamese Bilingual News

  • Wu, Jinjuan;Yu, Zhengtao;Liu, Shulong;Zhang, Yafei;Gao, Shengxiang
    • Journal of Information Processing Systems / v.15 no.6 / pp.1365-1377 / 2019
  • Summarizing the differences in Chinese-Vietnamese bilingual news plays an important supporting role in the comparative analysis of news views between China and Vietnam. To address the cross-language problems that arise when analyzing the differences between Chinese and Vietnamese bilingual news, we propose a new method of summarizing the differences based on an undirected graph model. The method extracts elements to represent the sentences and builds a bridge between the languages based on Wikipedia's multilingual concept description pages. First, we calculate the similarity between Chinese and Vietnamese news sentences and filter the bilingual sentences accordingly. Then we use the filtered sentences as nodes and the similarity grade as the edge weight to construct an undirected graph model. Finally, using the random walk algorithm, the weight of each node is calculated from the weights of its edges, and the sentences with the highest weights are extracted as the difference summary. Experimental results show that the proposed approach achieved the highest score of 0.1837 on the annotated test set, outperforming the state-of-the-art summarization models.
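The graph step can be illustrated with a PageRank-style random walk over a weighted undirected sentence graph. The abstract does not specify the exact update rule, so the damping factor, the convergence test, and the sample weight matrix below are assumptions. This is a sketch of the general technique, not the authors' algorithm.

```python
def random_walk_scores(weights, damping=0.85, iters=100, tol=1e-9):
    """Score nodes of a weighted undirected graph by a damped random walk.

    weights is a symmetric matrix (list of lists) whose entry [i][j] is the
    similarity-based edge weight between sentences i and j.
    """
    n = len(weights)
    if n == 0:
        return []
    # Total edge weight attached to each node, used to normalize transitions.
    strength = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            inflow = sum(
                scores[j] * weights[j][i] / strength[j]
                for j in range(n)
                if strength[j] > 0
            )
            new.append((1 - damping) / n + damping * inflow)
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


# Hypothetical similarity grades between three filtered sentences.
W = [
    [0.0, 0.9, 0.8],
    [0.9, 0.0, 0.1],
    [0.8, 0.1, 0.0],
]
scores = random_walk_scores(W)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

Sentences would then be ranked by score and the highest-scoring ones taken as the summary.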

Semantic Similarity Search using the Signature Tree (시그니처 트리를 사용한 의미적 유사성 검색 기법)

  • Kim, Ki-Sung;Im, Dong-Hyuk;Kim, Cheol-Han;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.34 no.6 / pp.546-553 / 2007
  • As ontologies become widely used, interest in semantic similarity search is also increasing. In this paper, we suggest a query evaluation scheme for the k-nearest neighbor query, which retrieves the k objects most similar to the query object. We use the best match method to calculate the semantic similarity between objects and use the signature tree to index the annotation information of objects in the database. The signature tree is usually used for set similarity search. To use the signature tree in similarity search, we must predict the upper bound of similarity for a node, that is, the highest similarity value that can be found by traversing into the node. We therefore suggest a prediction function for the best match similarity function and prove the correctness of the prediction. We also modify the original signature tree structure so that identical signatures are not stored redundantly. This improved structure not only reduces the size of the signature tree but also increases the efficiency of query evaluation. For our experiments we use the Gene Ontology (GO), which provides large ontologies and a large amount of annotation data. Using GO, we show that the proposed method improves query efficiency, and we present several experimental results obtained by varying the page size and using several node-splitting methods.

Genetic Variation and Polymorphism in Rainbow Trout, Oncorhynchus mykiss Analysed by Amplified Fragment Length Polymorphism

  • Yoon, Jong-Man;Yoo, Jae-Young;Park, Jae-Il
    • Journal of Aquaculture / v.17 no.1 / pp.69-80 / 2004
  • The objective of the present study was to analyze genetic distances, variation and characteristics of individuals in rainbow trout, Oncorhynchus mykiss, using the amplified fragment length polymorphism (AFLP) method as a molecular genetic technique, to detect AFLP band patterns as genetic markers, and to compare the efficiency of agarose gel electrophoresis (AGE) and polyacrylamide gel electrophoresis (PAGE). Using 9 primer combinations, a total of 141 AFLP bands were produced, 108 bands (82.4%) of which were polymorphic in AGE. In PAGE, a total of 288 bands were detected, and 220 bands (76.4%) were polymorphic. The AFLP fingerprints of AGE were different from those of PAGE. Separation of the fragments with low molecular weight and genetic polymorphisms revealed a distinct pattern in the two gel systems. In the present study, the average bandsharing values of the individuals between two populations from geographically separate sites in Kangwon-do ranged from 0.084 to 0.738 in AGE and PAGE. The bandsharing values between individuals No. 9 and No. 10 showed the highest level within the population, whereas the bandsharing values between individuals No. 5 and No. 7 showed the lowest level. As calculated by bandsharing analysis, the average genetic difference (mean $\pm$ SD) of individuals was approximately 0.590 $\pm$ 0.125 in this population. In AGE, the single linkage dendrogram resulted from two primers (M11+H11 and M13+H11), indicating six genetic groupings composed of group 1 (No. 9 and 10), group 2 (No. 1, 4, 5, 7, 10, 11, 16 and 17), group 3 (No. 2, 3, 6, 8, 12, 15 and 16), group 4 (No. 9, 14 and 17), group 5 (No. 13, 19, 20 and 21) and group 6 (No. 23). In AGE, the genetic distances among individuals between populations ranged from 0.108 to 0.392. In AGE, the shortest genetic distance (0.108) displaying significant molecular differences was between individuals No. 9 and No. 10. Notably, the genetic distance between individual No. 23 and the remaining individuals within the population was the highest (0.392). Additionally, in the cluster analysis using the PAGE data, the single linkage dendrogram resulted from two primers (M12+H13 and M11+H13), indicating seven genetic groupings composed of group 1 (No. 15), group 2 (No. 14), group 3 (No. 11 and 12), group 4 (No. 5, 6, 7, 8, 10 and 13), group 5 (No. 1, 2, 3 and 4), group 6 (No. 9) and group 7 (No. 16). Among the individuals compared in PAGE, the genetic distance between No. 10 and No. 7 was the shortest (0.071), and that between No. 16 and No. 14 was the highest (0.242). In the PAGE analysis, genetic differences were clearly apparent, with 13 of 16 individuals showing greater than 80% AFLP-based similarity to their closest neighbor. The three individuals (No. 14, No. 15 and No. 16) of rainbow trout between the two populations from geographically separate sites in Kangwon-do showed distinct genetic distances compared with the other individuals. These results indicate that AFLP markers of this fish could be used as genetic information for species identification, analysis of genetic relationships or genome structure, and as selection aids for the genetic improvement of economically important traits in fish species.
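Bandsharing values of the kind reported here are commonly computed as BS = 2·N_ab / (N_a + N_b), where N_ab is the number of bands shared by two individuals and N_a, N_b are their total band counts. The abstract does not state which formula the authors used, so the sketch below assumes this common definition. The band positions are invented for illustration and are not data from the study.

```python
from itertools import combinations


def band_sharing(a, b):
    """Bandsharing value between two fingerprints given as sets of band positions."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_band_sharing(fingerprints):
    """Bandsharing value for every pair of individuals, keyed by sorted label pair."""
    return {
        (x, y): band_sharing(fingerprints[x], fingerprints[y])
        for x, y in combinations(sorted(fingerprints), 2)
    }


# Hypothetical band positions (fragment sizes in bp) for three individuals.
fingerprints = {
    "No.9": {100, 200, 300, 400},
    "No.10": {100, 200, 300, 500},
    "No.5": {600, 700},
}
for pair, value in pairwise_band_sharing(fingerprints).items():
    print(pair, value)
```

A value of 1 means identical band patterns and 0 means no shared bands, so a dissimilarity such as 1 − BS is one possible input for single linkage clustering.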

Responses of Bacteria to TNT: Cells' Survival, SDS-PAGE and 2-D Electrophoretic Analyses of Stress-Induced Proteins (TNT에 대한 세균의 반응기작: 생존율, 스트레스 유도단백질의 SDS-PAGE 및 2-D 전기영동 분석)

  • 오계헌;장효원;강형일;김승일
    • Korean Journal of Microbiology / v.38 no.2 / pp.67-73 / 2002
  • The cellular responses of a soil-borne bacterium, Pseudomonas sp. HK-6, to the explosive 2,4,6-trinitrotoluene (TNT) were examined. Two stress shock proteins (SSPs), an approximately 70-kDa DnaK and a 60-kDa GroEL, were found in HK-6 cells in response to TNT. SDS-PAGE and Western blot analyses using anti-DnaK and anti-GroEL revealed that SSPs were induced in HK-6 cells exposed to 0.5 M of TNT for 6-12 hrs. The maximum induction of the proteins was reached 8 hrs after the HK-6 cells' exposure to TNT. Similar SSPs were induced in HK-6 cells by heat shock (temperature shift from $30^{\circ}C$ to $42^{\circ}C$) or cold shock (temperature shift from $30^{\circ}C$ to $4^{\circ}C$). 2D-PAGE of soluble protein fractions from the culture of Pseudomonas sp. HK-6 exposed to TNT showed approximately 450 spots on silver-stained gels ranging from pH 3 to pH 10. Among them, 12 spots that were significantly induced and expressed in response to TNT were selected and analyzed. An approximately 60-kDa protein, which appeared to be highly expressed on the gel, was used for amino acid sequencing. N-terminal microsequencing with in-gel digestion showed that the N-terminal sequence of the TNT-induced protein, $^{1}$XXAKDVKFGDSARKKML$^{17}$, shared extensive similarity with $^{1}$XXAKDVKFGDSARKKML$^{17}$, the N-terminal sequence of GroEL (P48216) of Pseudomonas putida.